Asynchronous Programming with asyncio in Python 3.7

Asynchronous programming with asyncio in Python rests on three main building blocks:

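  • Event loop: every task that needs to run asynchronously is registered with the event loop, which manages the flow of execution among those tasks.
  • Coroutine: a function that carries out a specific asynchronous task. The await keyword in its body hands control back to the event loop.
  • Future: represents the result of a task that may or may not have finished yet.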

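In an asynchronous program, all code runs inside the event loop, which can run multiple coroutines concurrently. Each coroutine runs until it reaches an await expression, at which point it yields control to the event loop so that other coroutines get a chance to run.
Note that await cannot be used inside an ordinary (synchronous) function; it is only valid inside a function defined with async def.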
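The hand-off at each await can be made visible with a minimal sketch. The names step and events are hypothetical, introduced only for illustration; asyncio.sleep(0) is used simply as the smallest possible point at which a coroutine yields to the event loop.

```python
import asyncio

events = []  # records the order in which the coroutines make progress


async def step(name):
    events.append(f"{name}: start")
    # Yield control to the event loop so that another coroutine can run.
    await asyncio.sleep(0)
    events.append(f"{name}: end")


async def main():
    # Run both coroutines concurrently on the same event loop.
    await asyncio.gather(step("a"), step("b"))


asyncio.run(main())
print(events)
# => ['a: start', 'b: start', 'a: end', 'b: end']
```

Both coroutines start before either one finishes, because each gives up control at its await.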

1. Hello World

Here is a simple Hello World program that uses the async keyword:

import asyncio

async def hello(first_print, second_print):
    print(first_print)
    # Suspend this coroutine for one second and let the event loop run.
    await asyncio.sleep(1)
    print(second_print)

asyncio.run(hello("Welcome", "Good-bye"))
# => Welcome
# => Good-bye

This code behaves much like synchronous code: it prints Welcome, waits one second, and then prints Good-bye.
Before going further, here are the basic concepts that appear in it:

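  • In Python, any function defined with async def (such as hello() above) is a coroutine function, commonly just called a coroutine. The object returned by calling a coroutine function is a coroutine object.
  • asyncio.run() is the main entry point for asynchronous code and should ideally be called only once. It runs the coroutine object passed to it and manages the asyncio event loop.
  • The await keyword suspends the current coroutine and hands the control it was holding back to the event loop.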
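The distinction between a coroutine function and a coroutine object can be checked with the standard inspect module. This is a small sketch; the function name hello_value is hypothetical.

```python
import asyncio
import inspect


async def hello_value():
    return "hi"


# Calling the coroutine function does not run its body; it only builds a
# coroutine object that still has to be driven by the event loop.
coro = hello_value()
is_function = inspect.iscoroutinefunction(hello_value)
is_object = inspect.iscoroutine(coro)

# asyncio.run() creates an event loop, runs the coroutine object to
# completion, and returns its result.
result = asyncio.run(coro)
print(is_function, is_object, result)
# => True True hi
```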

A more realistic asynchronous program looks like this:

import asyncio
import time

async def say_something(delay, words):
    print(f"Started: {words}")
    await asyncio.sleep(delay)
    print(f"Finished: {words}")

async def main():
    print(f"Starting Tasks: {time.strftime('%X')}")
    # create_task() schedules both coroutines on the event loop right away.
    task1 = asyncio.create_task(say_something(1, "First task"))
    task2 = asyncio.create_task(say_something(2, "Second task"))

    await task1
    await task2

    print(f"Finished Tasks: {time.strftime('%X')}")

asyncio.run(main())

# => Starting Tasks: 20:32:28
# => Started: First task
# => Started: Second task
# => Finished: First task
# => Finished: Second task
# => Finished Tasks: 20:32:30

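Under synchronous logic, task1 would start, wait one second, and finish; then task2 would start, wait two seconds, and finish, for a total of at least 3 seconds.
In the asynchronous program, task1 and task2 start at almost the same time, each waits for its own delay, and they finish one after the other, for a total of about 2 seconds. In detail: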

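  • The say_something coroutine in task1 starts running.
  • When say_something reaches the await expression (await asyncio.sleep(delay)), the coroutine is suspended for 1 second and hands control to the event loop.
  • task2 receives control from the event loop and starts running. It too is suspended at its await expression, for 2 seconds, and hands control back to the event loop.
  • When task1's delay has elapsed, the event loop hands control to task1, whose coroutine resumes and runs to completion.
  • With task1 finished, task2's delay elapses; task2 receives control, resumes, and runs to completion. Both tasks are now done.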
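The sequence above can be checked with a scaled-down sketch. It assumes shorter delays (0.2 s and 0.4 s, chosen here only to keep the run short) and collects the messages in a list named log, which is not part of the original program.

```python
import asyncio
import time

log = []  # messages in the order they were produced


async def say_something(delay, words):
    log.append(f"Started: {words}")
    await asyncio.sleep(delay)
    log.append(f"Finished: {words}")


async def main():
    task1 = asyncio.create_task(say_something(0.2, "first"))
    task2 = asyncio.create_task(say_something(0.4, "second"))
    await task1
    await task2


start = time.perf_counter()
asyncio.run(main())
elapsed = time.perf_counter() - start

print(log)
print(f"elapsed: {elapsed:.2f}s (the delays add up to 0.60s)")
```

The elapsed time should be close to the longer delay rather than to the sum of the two, since both tasks wait at the same time.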

2. Awaitable Objects

The await keyword hands control back to the event loop and suspends the current coroutine. The following rules apply to it:

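  • It can only be used inside a function defined with async def; using it in an ordinary function raises a SyntaxError.
  • Once a coroutine is awaited, the awaiting coroutine stays suspended until the awaited one finishes and returns its result.
  • In await func(), the value of func() must be an awaitable object: either a coroutine object (the result of calling a coroutine function) or an object that implements an __await__() method returning an iterator.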
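The first and third rules can be illustrated with a short sketch. The class Ready and the source string bad_source are hypothetical and exist only for this example; delegating to a coroutine's own __await__() is one simple way to obtain the required iterator, not the only one.

```python
import asyncio


class Ready:
    """Hypothetical awaitable that produces a stored value."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        # Delegate to a coroutine object, whose __await__() returns an iterator.
        return self._get().__await__()

    async def _get(self):
        await asyncio.sleep(0)
        return self.value


async def main():
    # Ready(5) is neither a coroutine nor a Task, yet it can be awaited.
    return await Ready(5)


result = asyncio.run(main())

# await inside a plain def is rejected when the source is compiled.
bad_source = "def plain():\n    await something()\n"
try:
    compile(bad_source, "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True

print(result, raised)
# => 5 True
```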

Awaitable objects include coroutines, Tasks, and Futures.

Coroutines

The following code illustrates the second rule above, coroutines that are awaited directly:

import asyncio

async def mult(first, second):
    print(f"Calculating multiply of {first} and {second}")
    await asyncio.sleep(1)
    num_mul = first * second
    print(f"Multiply is {num_mul}")
    return num_mul

# Note: this name shadows the built-in sum() within this module.
async def sum(first, second):
    print(f"Calculating sum of {first} and {second}")
    await asyncio.sleep(1)
    num_sum = first + second
    print(f"Sum is {num_sum}")
    return num_sum

async def main(first, second):
    # Each await finishes completely before the next line runs.
    await sum(first, second)
    await mult(first, second)

asyncio.run(main(7, 8))
# => Calculating sum of 7 and 8
# => Sum is 15
# => Calculating multiply of 7 and 8
# => Multiply is 56

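In the code above, the coroutine objects returned by sum and mult and passed to await are awaitable objects. The output shows that sum runs to completion and prints its result first, and only then does mult run and print its result.
A coroutine that is awaited directly must finish before the next await statement can run, which does not look like asynchronous behavior at all.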
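Awaiting a coroutine directly is still useful when the next step needs the previous result, because the await expression evaluates to the coroutine's return value. A minimal sketch, with hypothetical function names add and multiply and a short delay standing in for real work:

```python
import asyncio


async def add(first, second):
    await asyncio.sleep(0.01)
    return first + second


async def multiply(first, second):
    await asyncio.sleep(0.01)
    return first * second


async def main(first, second):
    # Each await suspends main() until the awaited coroutine has returned.
    total = await add(first, second)
    product = await multiply(first, second)
    return total, product


result = asyncio.run(main(7, 8))
print(result)
# => (15, 56)
```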

Tasks

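Tasks are the key to running coroutines concurrently.
When a coroutine is wrapped by a function such as asyncio.create_task(), it is automatically scheduled to run on the event loop. In asyncio terms, a coroutine whose execution is driven by the event loop is called a task.
In most cases, writing asynchronous code means using create_task() to put coroutines onto the event loop.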

Consider the following code:

import asyncio

async def mul(first, second):
    print(f"Calculating multiply of {first} and {second}")
    await asyncio.sleep(1)
    num_mul = first * second
    print(f"Multiply is {num_mul}")
    return num_mul

async def sum(first, second):
    print(f"Calculating sum of {first} and {second}")
    await asyncio.sleep(1)
    num_sum = first + second
    print(f"Sum is {num_sum}")
    return num_sum

async def main(first, second):
    # Both coroutines are scheduled before either one is awaited.
    sum_task = asyncio.create_task(sum(first, second))
    mul_task = asyncio.create_task(mul(first, second))
    await sum_task
    await mul_task

asyncio.run(main(7, 8))

# => Calculating sum of 7 and 8
# => Calculating multiply of 7 and 8
# => Sum is 15
# => Multiply is 56

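Compared with the previous example, the output shows that the two tasks sum_task and mul_task now follow the logic of an asynchronous program.
When sum_task reaches await asyncio.sleep(1), it does not make the whole program wait for its result. Instead it is suspended, and control passes through the event loop to mul_task. Both tasks start and enter their waits in turn, and each prints its result once its own delay has elapsed.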
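A Task object can also be inspected while it runs. The sketch below uses the Task methods done() and result(); the coroutine name add and the short delay are assumptions made for illustration.

```python
import asyncio


async def add(first, second):
    await asyncio.sleep(0.01)
    return first + second


async def main():
    task = asyncio.create_task(add(7, 8))
    done_before = task.done()  # False: scheduled, but not finished yet
    value = await task         # suspend main() until the task completes
    done_after = task.done()   # True: the result is now available
    return done_before, value, done_after, task.result()


state = asyncio.run(main())
print(state)
# => (False, 15, True, 15)
```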

Besides create_task(), asyncio.gather() can also be used to run coroutines concurrently:

import asyncio
import time

async def greetings():
    print("Welcome")
    await asyncio.sleep(1)
    print("Good-bye")

async def main():
    # gather() schedules both coroutines and waits for all of them.
    await asyncio.gather(greetings(), greetings())

def say_greet():
    start = time.perf_counter()
    asyncio.run(main())
    elapsed = time.perf_counter() - start
    print(f"Total time elapsed: {elapsed}")

say_greet()

# => Welcome
# => Welcome
# => Good-bye
# => Good-bye
# => Total time elapsed: 1.0213364

The two tasks finish in slightly more than 1 second, not 2 seconds.
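gather() also collects return values: it returns a list whose order matches the order of the awaitables passed in, not the order in which they finish. A small sketch, where the coroutine name delayed and the delays are hypothetical:

```python
import asyncio


async def delayed(value, delay):
    await asyncio.sleep(delay)
    return value


async def main():
    # "fast" finishes first, but the results keep the argument order.
    return await asyncio.gather(
        delayed("slow", 0.05),
        delayed("fast", 0.01))


results = asyncio.run(main())
print(results)
# => ['slow', 'fast']
```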

Futures

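A Future represents the eventual result of an asynchronous operation, which may or may not have completed yet. Normally there is no need to manage Future objects explicitly in application code; the asyncio library handles them implicitly.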

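A newly created Future represents an operation that has not completed yet but is expected to produce a result at some point in the future.
asyncio provides asyncio.wait_for(aw, timeout), which sets a timeout on an awaitable. If the operation has not completed when the timeout expires, the task is cancelled and asyncio.TimeoutError is raised.
If timeout is None, wait_for() waits until the operation behind the Future returns a result, however long that takes.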

import asyncio

async def long_time_taking_method():
    await asyncio.sleep(4000)
    print("Completed the work")

async def main():
    try:
        # Give up after 2 seconds; the inner coroutine is cancelled.
        await asyncio.wait_for(long_time_taking_method(),
                               timeout=2)
    except asyncio.TimeoutError:
        print("Timeout occurred")

asyncio.run(main())
# => Timeout occurred
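Although Futures are rarely created by hand, doing so once makes the concept concrete. The sketch below uses the low-level calls loop.create_future() and loop.call_later(); the 0.01 second delay and the value "ready" are arbitrary choices for illustration.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    pending_before = not future.done()  # no result has been set yet

    # Ask the event loop to fill in the result a little later.
    loop.call_later(0.01, future.set_result, "ready")

    value = await future  # suspend until set_result() has been called
    return pending_before, value, future.done()


outcome = asyncio.run(main())
print(outcome)
# => (True, 'ready', True)
```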

3. Async Code Examples

Running shell commands asynchronously in subprocesses:

import asyncio


async def run(cmd):
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)

    # Wait for the process to exit and collect its output.
    stdout, stderr = await proc.communicate()

    print(f'[{cmd!r} exited with {proc.returncode}]')
    if stdout:
        print(f'[stdout]\n{stdout.decode()}')
    if stderr:
        print(f'[stderr]\n{stderr.decode()}')


async def main():
    await asyncio.gather(
        run('sleep 2; echo "world"'),
        run('sleep 1; echo "hello"'),
        run('ls /zzz'))

asyncio.run(main())

# => ['ls /zzz' exited with 2]
# => [stderr]
# => ls: cannot access '/zzz': No such file or directory

# => ['sleep 1; echo "hello"' exited with 0]
# => [stdout]
# => hello

# => ['sleep 2; echo "world"' exited with 0]
# => [stdout]
# => world
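A variant of the same idea returns the results instead of printing them. This sketch swaps in asyncio.create_subprocess_exec() and runs the current Python interpreter rather than a shell, so that it does not depend on sleep or ls being installed; the helper name run_python is hypothetical, and the sketch assumes a platform and event loop on which asyncio subprocesses are supported.

```python
import asyncio
import sys


async def run_python(code):
    # Start the current interpreter as a child process without a shell.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    stdout, stderr = await proc.communicate()
    return proc.returncode, stdout.decode().strip(), stderr.decode().strip()


async def main():
    # Both child processes run concurrently.
    return await asyncio.gather(
        run_python("print('hello')"),
        run_python("import sys; sys.exit(3)"))


ok, failed = asyncio.run(main())
print(ok[0], ok[1])   # exit code and captured stdout of the first child
print(failed[0])      # exit code of the second child
```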

Distributing a workload among several concurrently running Tasks through a Queue:

import asyncio
import random
import time


async def worker(name, queue):
    while True:
        # Get a "work item" out of the queue.
        sleep_for = await queue.get()

        # Sleep for the "sleep_for" seconds.
        await asyncio.sleep(sleep_for)

        # Notify the queue that the "work item" has been processed.
        queue.task_done()

        print(f'{name} has slept for {sleep_for:.2f} seconds')


async def main():
    # Create a queue that we will use to store our "workload".
    queue = asyncio.Queue()

    # Generate random timings and put them into the queue.
    total_sleep_time = 0
    for _ in range(20):
        sleep_for = random.uniform(0.05, 1.0)
        total_sleep_time += sleep_for
        queue.put_nowait(sleep_for)

    # Create three worker tasks to process the queue concurrently.
    tasks = []
    for i in range(3):
        task = asyncio.create_task(worker(f'worker-{i}', queue))
        tasks.append(task)

    # Wait until the queue is fully processed.
    started_at = time.monotonic()
    await queue.join()
    total_slept_for = time.monotonic() - started_at

    # Cancel our worker tasks.
    for task in tasks:
        task.cancel()
    # Wait until all worker tasks are cancelled.
    await asyncio.gather(*tasks, return_exceptions=True)

    print('====')
    print(f'3 workers slept in parallel for {total_slept_for:.2f} seconds')
    print(f'total expected sleep time: {total_sleep_time:.2f} seconds')


asyncio.run(main())
# => worker-2 has slept for 0.12 seconds
# => worker-1 has slept for 0.28 seconds
# => worker-1 has slept for 0.12 seconds
# => worker-0 has slept for 0.46 seconds
# => worker-0 has slept for 0.49 seconds
# => worker-2 has slept for 0.90 seconds
# => worker-1 has slept for 0.62 seconds
# => worker-1 has slept for 0.67 seconds
# => worker-0 has slept for 0.85 seconds
# => worker-2 has slept for 0.94 seconds
# => worker-1 has slept for 0.45 seconds
# => worker-2 has slept for 0.19 seconds
# => worker-0 has slept for 0.99 seconds
# => worker-2 has slept for 0.86 seconds
# => worker-1 has slept for 0.97 seconds
# => worker-0 has slept for 0.74 seconds
# => worker-1 has slept for 0.58 seconds
# => worker-2 has slept for 0.73 seconds
# => worker-1 has slept for 0.27 seconds
# => worker-0 has slept for 0.57 seconds
# => ====
# => 3 workers slept in parallel for 4.10 seconds
# => total expected sleep time: 11.80 seconds
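The same queue pattern can be reduced to a deterministic sketch in which the workers produce values instead of sleeping for random times. The doubling step and the shared results list are assumptions made for illustration; the try/finally ensures task_done() is called even if processing an item fails.

```python
import asyncio


async def worker(queue, results):
    while True:
        item = await queue.get()
        try:
            await asyncio.sleep(0)    # stand-in for real asynchronous work
            results.append(item * 2)
        finally:
            # Always mark the item as processed so that join() can return.
            queue.task_done()


async def main():
    queue = asyncio.Queue()
    for n in range(5):
        queue.put_nowait(n)

    results = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(2)]

    await queue.join()                # wait until every item is processed
    for task in workers:
        task.cancel()                 # the workers loop forever, so stop them
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(results)


results = asyncio.run(main())
print(results)
# => [0, 2, 4, 6, 8]
```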

References

asyncio — Asynchronous I/O