Introduction to multitasking with Python #003 gevent (2020 tutorial)
1. Review of Iterator
In Python, we can also achieve multitasking by using the gevent module. However, before introducing gevent, we have to review the concept of an iterator.
You may have used the Python for loop millions of times:
for i in Iterable:
    print(i)
We have learned several iterable objects: str, list, tuple, dict and set. Those iterable objects can all be used with the for loop, and we take that for granted. But look at the following example:
>>> for i in 100:
        print(i)
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
for i in 100:
TypeError: 'int' object is not iterable
>>>
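Why can't an int be used with a for loop in Python, and how can you make an object iterable so that it works with a for loop?
Now, let's make an iterable object that can be used with a for loop.
We can use the built-in isinstance() function to check whether an object is iterable. Note that Iterable lives in collections.abc; importing it directly from collections no longer works since Python 3.10.
>>> from collections.abc import Iterable
>>> isinstance([11,22,33], Iterable)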
True
>>>
As you can see, our list object [11,22,33] is an Iterable object.
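The same check can be run against several built-in types at once. The following is only a small sketch; the names samples and results are made up for illustration.

```python
from collections.abc import Iterable

# A few built-in objects; only the two numbers at the end are not iterable.
samples = [[11, 22, 33], "abc", (1, 2), {"a": 1}, {1, 2}, 100, 3.14]

# Map the repr of each object to the result of the isinstance() check.
results = {repr(obj): isinstance(obj, Iterable) for obj in samples}

for text, ok in results.items():
    print(f"{text:>14} -> {ok}")
```

Every container type reports True, while int and float report False, which is why they cannot appear after "in" in a for loop.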
Let's create an iterable object ourselves:
from collections.abc import Iterable

class NameList(object):
    def __init__(self):
        self.name_list = []

    def add(self, name):
        self.name_list.append(name)

    def __iter__(self):
        return None

if __name__ == '__main__':
    name_list = NameList()
    print(isinstance(name_list, Iterable))
We create a class named 'NameList' to which you can add names. A list is used to actually store the names that you add. The main part is the following:
def __iter__(self):
    return None
In order to make your NameList object iterable, you must define this special __iter__() method. When you run this program, it outputs True when we test our newly created name_list object.
Is this enough for our NameList object to be used with a for loop? Of course not. You can see the result in the following program:
from collections.abc import Iterable

class NameList(object):
    def __init__(self):
        self.name_list = []

    def add(self, name):
        self.name_list.append(name)

    def __iter__(self):
        return None

if __name__ == '__main__':
    name_list = NameList()
    name_list.add('Pete')
    name_list.add('Joe')
    name_list.add('Gill')
    name_list.add('Henry')
    for name in name_list:
        print(name)
After running this program, we see the following output:
File "E:\02 python\02 multitasking\03-gevent\iterator_demo01.py", line 21, in <module>
for name in name_list:
TypeError: iter() returned non-iterator of type 'NoneType'
It raises a TypeError. The reason is that our __iter__() method should return an object of type Iterator, but in this case it returns None, which is a NoneType object. A little confused? Don't worry, let me explain:
Iterator and Iterable are two different kinds of objects. As you can see from the example above, as long as we define an __iter__() method, our object is an Iterable object. However, that is not enough for an Iterator object: you must define both the __iter__() and the __next__() methods. So let's do it:
from collections.abc import Iterable, Iterator

class NameList(object):
    def __init__(self):
        self.name_list = []

    def add(self, name):
        self.name_list.append(name)

    def __iter__(self):
        return None

class NameListIterator(object):
    def __iter__(self):
        pass

    def __next__(self):
        pass

if __name__ == '__main__':
    name_list = NameList()
    name_list_iterator = NameListIterator()
    print(isinstance(name_list, Iterator))
    print(isinstance(name_list_iterator, Iterator))
As you can see, we define both the __iter__() and __next__() methods in our NameListIterator class. If you run this program, isinstance(name_list, Iterator) returns False whereas isinstance(name_list_iterator, Iterator) returns True.
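The same distinction holds for built-in types. As a small sketch (the variable names are only for illustration), a list is Iterable but not an Iterator, while the object returned by the built-in iter() function is an Iterator:

```python
from collections.abc import Iterable, Iterator

names = ['Pete', 'Joe']
names_iter = iter(names)  # iter() calls names.__iter__()

list_is_iterable = isinstance(names, Iterable)       # lists define __iter__()
list_is_iterator = isinstance(names, Iterator)       # lists have no __next__()
iter_is_iterator = isinstance(names_iter, Iterator)  # the list iterator has both

first = next(names_iter)  # next() calls names_iter.__next__()

print(list_is_iterable, list_is_iterator, iter_is_iterator, first)
```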
Now let's make sure that our NameList object can be used with a for loop.
from collections.abc import Iterable, Iterator

class NameList(object):
    def __init__(self):
        self.name_list = []

    def add(self, name):
        self.name_list.append(name)

    def __iter__(self):
        return NameListIterator(self)

class NameListIterator(object):
    def __init__(self, obj):
        self.obj = obj
        self.n = 0

    def __iter__(self):
        # an iterator conventionally returns itself here
        return self

    def __next__(self):
        if self.n < len(self.obj.name_list):
            value = self.obj.name_list[self.n]
            self.n += 1
            return value
        else:
            raise StopIteration

if __name__ == '__main__':
    name_list = NameList()
    name_list.add('Joe')
    name_list.add('Peter')
    name_list.add('Henry')
    name_list.add('Coo')
    name_list.add('Lay')
    name_list.add('Tugo')
    for name in name_list:
        print(name)
def __iter__(self):
    return NameListIterator(self)
Instead of returning None, we return an instance of the Iterator class that we just defined. Our iterator accepts one constructor argument, which is the NameList object itself.
def __next__(self):
    if self.n < len(self.obj.name_list):
        value = self.obj.name_list[self.n]
        self.n += 1
        return value
    else:
        raise StopIteration
The __next__() method is the core of the Iterator object NameListIterator. When we use a for loop to iterate through our name_list object, Python actually invokes the built-in function next(), which in turn invokes the __next__() method defined in the NameListIterator class.
We create an instance variable n to keep track of the current position in name_list. When it reaches the end of name_list, __next__() raises a StopIteration exception and the for loop terminates.
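What the for loop does behind the scenes can be sketched by hand with iter(), next() and StopIteration. The helper name manual_for is hypothetical; it only imitates the loop and collects the items in a list so the result is easy to inspect.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    collected = []
    iterator = iter(iterable)      # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # a for loop ends silently at this point
        collected.append(item)
    return collected

print(manual_for(['Joe', 'Peter', 'Henry']))
```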
There is also a shorter way to achieve this instead of the lengthy approach above.
from collections.abc import Iterable, Iterator

class NameList(object):
    def __init__(self):
        self.name_list = []
        self.n = 0

    def add(self, name):
        self.name_list.append(name)

    def __iter__(self):
        return self

    def __next__(self):
        if self.n < len(self.name_list):
            value = self.name_list[self.n]
            self.n += 1
            return value
        else:
            raise StopIteration

if __name__ == '__main__':
    name_list = NameList()
    name_list.add('Joe')
    name_list.add('Peter')
    name_list.add('Henry')
    name_list.add('Coo')
    name_list.add('Lay')
    name_list.add('Tugo')
    for name in name_list:
        print(name)
We can just put the __next__() method inside our NameList class and have __iter__() return the object itself instead.
def __iter__(self):
    return self
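One consequence of returning self is worth knowing: the object is then its own iterator, so its position is never reset and it can only be traversed once. The class OneShotNames below is a hypothetical stand-in with the same structure, written only to show this behaviour.

```python
class OneShotNames:
    """Hypothetical container that acts as its own iterator."""

    def __init__(self, names):
        self.names = list(names)
        self.n = 0  # position is stored on the container itself

    def __iter__(self):
        return self

    def __next__(self):
        if self.n < len(self.names):
            value = self.names[self.n]
            self.n += 1
            return value
        raise StopIteration

names = OneShotNames(['Joe', 'Peter'])
first_pass = list(names)   # consumes every name
second_pass = list(names)  # self.n is already at the end, so nothing is left

print(first_pass, second_pass)
```

The two-class version does not have this limitation, because every for loop gets a fresh iterator object with its own counter.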
1.1 Application of iterator
We can use an iterator to create a data structure that provides values without actually creating all of them and holding them in memory. Instead, whenever you need those values, for example when iterating over them with a for loop, it generates them on the fly.
The example above uses a list to hold its values, so it does not show off this property well. Let's see another example:
M | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
P | 1 | 1 | 2 | 3 | 5 | 8 | 13 | 21 | 34 | 55 | 89 | 144 | 233 | 377 | 610 | 987 |
The table above shows the first part of the famous Fibonacci sequence. We can create an Iterator object that generates those numbers dynamically instead of storing them in a list or tuple.
# Fibonacci iterator
from collections.abc import Iterable, Iterator

class Fib(object):
    def __init__(self, n):
        self.n = n      # index of the last Fibonacci number; n + 1 numbers are generated
        self.cur_i = 0  # track the current iteration number
        self.f0 = 1     # the first two Fibonacci numbers are 1
        self.f1 = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.cur_i < self.n+1:
            if self.cur_i == 0 or self.cur_i == 1:
                self.cur_i += 1
                return 1
            else:
                temp = self.f0 + self.f1
                self.f0 = self.f1
                self.f1 = temp
                self.cur_i += 1
                return self.f1
        else:
            raise StopIteration

if __name__ == '__main__':
    fib = Fib(15)
    for n in fib:
        print(n, end='|')
As you can see from the program above, we did not use any list or tuple to store all the numbers.
Because of the __iter__() and __next__() methods, the Fib object is an iterator, so we can use a for loop on it.
fib = Fib(15)
for n in fib:
    print(n)
We pass 15 to the Fib constructor, so it will generate the Fibonacci numbers from index 0 to 15 (16 numbers).
if self.cur_i < self.n+1:
    if self.cur_i == 0 or self.cur_i == 1:
        self.cur_i += 1
        return 1
    else:
        temp = self.f0 + self.f1
        self.f0 = self.f1
        self.f1 = temp
        self.cur_i += 1
        return self.f1
else:
    raise StopIteration
On each iteration, the __next__() method is invoked. In the first two iterations, __next__() just returns 1. From the third iteration on, we calculate the next number as the sum of the previous two numbers.
f2 = f1 + f0
f3 = f1 + f2
...
f[n] = f[n-1] + f[n-2]
Once a number has been returned, we no longer need to keep it and can discard it. So we only need two variables, f0 and f1, and overwrite them with the values that follow them in the sequence.
For instance, starting from the third iteration:
f0 = 1, f1 = 1
temp = f0 + f1 = 2 # use temp to hold the value of next place, namely f2
f0 = f1 # f0 is assigned the value of f1
f1 = temp # f1 is assigned with value of temp or f2
return f1
The fourth iteration goes through the same process, until there is no value to generate. And if this is the case, raise StopIteration.
The output of the program should be:
1|1|2|3|5|8|13|21|34|55|89|144|233|377|610|987|
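The two-variable update can also be written without the temp variable by using tuple assignment. The class below is an alternative sketch, not the version used in this article; the name CompactFib and the remaining counter are assumptions made for illustration. It produces the same n + 1 numbers.

```python
class CompactFib:
    """Iterator over Fibonacci numbers with indices 0..n (sketch)."""

    def __init__(self, n):
        self.remaining = n + 1      # how many numbers are still to be produced
        self.f0, self.f1 = 1, 1     # the sequence starts 1, 1 as in the table

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        value = self.f0
        # Shift the window one step forward in a single statement.
        self.f0, self.f1 = self.f1, self.f0 + self.f1
        return value

for number in CompactFib(15):
    print(number, end='|')
```

Because the right-hand side of the tuple assignment is evaluated before either name is rebound, no temporary variable is needed.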
2. Review of generator
A generator is a special kind of iterator, and it has its own syntax for creating one. There are two ways to create a generator:
- generator expression
- generator iterator function
2.1 generator expression
You may have already created a list using a list comprehension:
>>> [x**2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
To create a generator, just replace the brackets "[]" with parentheses "()":
>>> g = (x**2 for x in range(10))
>>> type(g)
<class 'generator'>
>>>
After creating the generator, you can use the built-in next() function or a for loop to fetch elements from it, just as you would with a normal iterator object.
>>> next(g)
0
>>> next(g)
1
>>> next(g)
4
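Two properties of a generator expression are easy to miss: values are produced only when requested, and each value can be consumed only once. A short sketch (the variable names are illustrative):

```python
g = (x**2 for x in range(10))

first_three = [next(g), next(g), next(g)]  # pulls 0, 1 and 4 on demand
rest = list(g)                             # consumes everything that is left
leftover = list(g)                         # the generator is now exhausted

# A generator expression can be passed straight to a function such as sum(),
# so the ten squares are never stored in a list.
total = sum(x**2 for x in range(10))

print(first_three, rest, leftover, total)
```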
2.2 generator iterator function
Next, let's create the same generator as (x**2 for x in range(10)) with a generator function.
def num_gen(n):
    i = 0
    while i < n:
        yield i**2
        i += 1

# create a generator object
g = num_gen(10)
for e in g:
    print(e)
As you can see from the code above, we define a normal function, but because the keyword yield appears inside it, this function is a generator function. It is used quite differently from a normal function.
g = num_gen(10)
When you invoke a generator function, it does not execute the instructions inside. Instead, it creates a generator object, much like calling a class creates an instance. We then assign the newly created generator object to the variable g.
for e in g:
    print(e)
Because a generator object is a special kind of iterator, we can use a for loop or the next() function to fetch elements from it.
Here is what really happens when iterating over a generator:
In the first iteration, code execution starts from the first line:
i = 0
Then it enters the while loop. The yield statement calculates its value, which is zero, and returns it; the for loop assigns that value to the variable e and prints 0.
After the value is printed, the second iteration begins. Execution does not start again from i = 0 (if it did, we would get 0 every time); it resumes right after the yield statement, at i += 1. Now i becomes 1.
The loop comes around to yield again, so this time the returned value is 1, and 1 is printed.
The for loop continues until there is no value left to return, and then the iteration stops. We can use the following debugging code to check this execution order.
def num_gen(n):
    print('----1----')
    i = 0
    while i < n:
        print('----2----')
        yield i**2
        print('----3----')
        i += 1

# create a generator object
g = num_gen(5)
for e in g:
    print(e)
The output is:
----1----
----2----
0
----3----
----2----
1
----3----
----2----
4
----3----
----2----
9
----3----
----2----
16
----3----
As you can see, the statement i = 0 is executed only once for the entire loop.
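The same generator can also be stepped by hand with next(). Once it is exhausted, next() raises StopIteration, unless a default value is supplied as the second argument. A small sketch:

```python
def num_gen(n):
    i = 0
    while i < n:
        yield i**2
        i += 1

g = num_gen(2)
values = [next(g), next(g)]  # runs the body up to each yield in turn

try:
    next(g)                  # the while loop has ended, nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True

fallback = next(g, 'done')   # with a default, no exception is raised

print(values, exhausted, fallback)
```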
Instead of using the next() function to fetch elements, we can do the same thing in an object-oriented way with the generator.send() method.
def num_gen(n):
    i = 0
    while i < n:
        yield i**2
        i += 1

# create a generator object
g = num_gen(5)
print(g.send(None))
print(g.send(None))
print(g.send(None))
print(g.send(None))
The output is:
0
1
4
9
Instead of passing None, you can pass other arguments to the send() method. However, the first call on a freshly created generator must pass None, because the generator has not yet reached a yield that could receive a value.
def num_gen(n):
    i = 0
    while i < n:
        res = yield i**2
        if res == 9:
            break
        i += 1

# create a generator object
g = num_gen(10)
print(g.send(None))
print(g.send(None))
print(g.send(9))
The output of this program is:
0
1
Traceback (most recent call last):
File "E:/02 python/02 multitasking/03-gevent/generator_demo01.py", line 14, in <module>
print(g.send(9))
StopIteration
After printing 0 and 1, it throws a StopIteration exception. Here is what is really happening in the background.
When we first invoke g.send(None), execution starts from the statement i = 0 and then reaches res = yield i**2. The right-hand side runs first: yield i**2 hands the value 0 back to the caller, which prints it, and the generator pauses there. Nothing has been assigned to res yet.
The next iteration starts when you invoke g.send(None) again. Execution resumes at the paused yield, and the argument of this send() call, None, becomes the value of the yield expression and is assigned to res. The condition res == 9 is False, so i += 1 runs, the loop comes around, and yield i**2 hands back 1.
In the third call things change: execution resumes at res = yield i**2 and res is assigned the value 9 that you passed to send(). The condition is now satisfied, so the code breaks out of the while loop, the generator function returns, and StopIteration is raised.
In practice, we usually just use next(), and use send() only when we want to pass a value into the generator. Just remember to pass None in your first send() call.
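A typical use of send() is a generator that keeps state between calls. The sketch below uses a hypothetical running_total generator; it also shows the TypeError Python raises when a non-None value is sent to a generator that has not been started yet.

```python
def running_total():
    """Yield the sum of all values sent in so far (illustrative example)."""
    total = 0
    while True:
        value = yield total  # pause here; send() supplies the next value
        total += value

acc = running_total()
start = next(acc)            # same as acc.send(None): runs to the first yield
after_five = acc.send(5)     # value = 5, total becomes 5
after_ten = acc.send(10)     # value = 10, total becomes 15

fresh = running_total()
try:
    fresh.send(5)            # no yield is waiting yet to receive the 5
    error = None
except TypeError as exc:
    error = exc

print(start, after_five, after_ten, type(error).__name__)
```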
3. Achieve multitasking using generator
Let's see an example first:
def task_01():
    while True:
        print('----task01----')

def task_02():
    while True:
        print('----task02----')

def main():
    task_01()
    task_02()

if __name__ == '__main__':
    main()
Now we have two tasks. If we run them as above, the output would be:
----task01----
----task01----
----task01----
...
Only task_01 is executed; task_02 never gets a chance to run, because the loop in task_01 never ends.
Let's change it:
def task_01():
    while True:
        print('----task01----')
        yield

def task_02():
    while True:
        print('----task02----')
        yield

def main():
    t1 = task_01()
    t2 = task_02()
    while True:
        next(t1)
        next(t2)

if __name__ == '__main__':
    main()
The output should be:
----task01----
----task02----
----task01----
----task02----
...
As you can see from the output above, we have achieved multitasking in a single thread.
def task_01():
    while True:
        print('----task01----')
        yield
We have put yield into the function, so the function becomes a generator function. When we call next() on the generator object t1, task_01 runs until it reaches the yield statement and pauses there; then next(t2) lets task_02 run in the same way.
Compared with multithreading and multiprocessing, this approach uses the fewest resources, because everything runs in a single thread of a single process.
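The alternating next() calls can be generalised into a small round-robin scheduler that also copes with tasks that finish. This is only a sketch under stated assumptions: the helpers task and run_round_robin are hypothetical names, and the tasks record their steps in a list instead of printing forever.

```python
from collections import deque

def task(name, steps, log):
    """A finite task: record one step, then hand control back."""
    for step in range(steps):
        log.append(f'{name}-{step}')
        yield

def run_round_robin(tasks):
    """Advance each generator in turn until all of them are finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # task is done, do not reschedule it
        queue.append(current)      # otherwise put it at the back of the queue

log = []
run_round_robin([task('task01', 2, log), task('task02', 3, log)])
print(log)
```

The two tasks take turns step by step; once task01 is finished, task02 simply runs its remaining step on its own.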
4. Introduction to greenlet and gevent
Before we move on, let's install gevent and greenlet first.
pip install gevent
You only have to install gevent, because greenlet is installed automatically as a dependency.
4.1 greenlet module
The greenlet module works like a wrapper around the yield statement. By using greenlet, we can write normal functions without the yield statement and achieve the same result. Let's look at the example with greenlet:
from greenlet import greenlet

def task_01():
    global t2
    while True:
        print('----task01----')
        t2.switch()  # switch to task2

def task_02():
    global t1
    while True:
        print('----task02----')
        t1.switch()  # switch to task1 again

def main():
    global t1, t2
    t1 = greenlet(task_01)
    t2 = greenlet(task_02)
    t1.switch()  # switch to task1 first

if __name__ == '__main__':
    main()
global t1, t2
t1 = greenlet(task_01)
t2 = greenlet(task_02)
We create two greenlet objects for task_01 and task_02.
t1.switch()
Then we switch to task_01 first, and task_01 executes and prints '----task01----'.
while True:
    print('----task01----')
    t2.switch()  # switch to task2
Next, we invoke t2.switch() to switch to task_02, so task_01 pauses and task_02 starts.
while True:
    print('----task02----')
    t1.switch()  # switch to task1 again
Then, after printing '----task02----', we switch to task_01 again. Our two tasks execute alternately, one after another.
The output should be the same as in the example above.
4.2 gevent
The gevent module is another higher level module that wraps around the greenlet module. It adds more features to the greenlet module. Its usage is simple too.
from gevent import spawn, sleep

def task_01():
    while True:
        print('----task01----')
        sleep(0.0001)

def task_02():
    while True:
        print('----task02----')
        sleep(0.0001)

def main():
    t1 = spawn(task_01)
    t2 = spawn(task_02)
    t1.join()
    t2.join()

if __name__ == '__main__':
    main()
t1 = spawn(task_01)
t2 = spawn(task_02)
We use the spawn() function to create two <class 'gevent._gevent_cgreenlet.Greenlet'> objects, which schedules both tasks to run. Calling join() makes the main program wait for a task, and that wait is what gives the tasks a chance to run. task_01 keeps running until it reaches sleep(), then control switches to task_02 until it reaches sleep() as well, and then back to task_01.
Warning: the sleep() used here is gevent.sleep(), not time.sleep(). The standard time.sleep() blocks the whole thread, so no switch to another greenlet takes place, unless the standard library has been monkey patched with gevent.monkey.patch_all().
The output of this program is also the same as above.