Introduction to multitasking with Python #003 gevent (2020 tutorial)


1. Review of Iterator

In Python, we can also achieve multitasking by using the gevent module. However, before introducing gevent, we have to review the concept of the iterator.

You may have used the Python for loop countless times:

for i in Iterable:

    print(i)

We have learned several iterable types: str, list, tuple, dict and set. Objects of these types can be used with the for loop, and we take that for granted. But look at the following example:

>>> for i in 100:
        print(i)

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    for i in 100:
TypeError: 'int' object is not iterable
>>> 

Why can't the int type be used with the for loop in Python, and how can you make an object iterable so that it works with the for loop?

Now, let's make an iterable object that could be used with for loop.

We can use the built-in isinstance() function to check whether an object is iterable or not.

>>> from collections.abc import Iterable
>>> isinstance([11,22,33], Iterable)
True
>>> 

As you could see, our list object [11,22,33] is an Iterable object.
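As a small sketch of the same check applied to the other types mentioned earlier (the samples dictionary and its values are made up for illustration), we can test several objects at once. Note that in recent Python versions Iterable is imported from collections.abc:

```python
from collections.abc import Iterable

# Illustrative sample objects; the values themselves are arbitrary.
samples = {
    'str': 'abc',
    'list': [11, 22, 33],
    'tuple': (11, 22, 33),
    'dict': {'a': 1},
    'set': {11, 22, 33},
    'int': 100,
}

# Map each type name to whether its sample object is iterable.
results = {label: isinstance(value, Iterable) for label, value in samples.items()}

for label, is_iterable in results.items():
    print(label, is_iterable)
```

Only the int sample should report False, which matches the TypeError we saw earlier.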

Let's create an iterable object ourselves:

from collections.abc import Iterable

class NameList(object):
      
      def __init__(self):
            self.name_list = []

      def add(self, name):
            self.name_list.append(name)

      def __iter__(self):
            return None

if __name__ == '__main__':
      name_list = NameList()
      print(isinstance(name_list, Iterable))

We create a class named 'NameList' to which you can add names. A list actually stores the names that you added. The main part is the following:

def __iter__(self):
    return None

In order to make your NameList object iterable, you must define the __iter__() special method. The isinstance() check only looks for the presence of this method, so when you run this program it outputs True for our newly created name_list object, even though __iter__() returns None.

Is this enough for our NameList object to be used with the for loop? Of course not. You can see the result in the following program:

from collections.abc import Iterable

class NameList(object):
      
      def __init__(self):
            self.name_list = []

      def add(self, name):
            self.name_list.append(name)

      def __iter__(self):
            return None

if __name__ == '__main__':
      name_list = NameList()
      name_list.add('Pete')
      name_list.add('Joe')
      name_list.add('Gill')
      name_list.add('Henry')

      for name in name_list:
            print(name)
      

After running this program, we see the following output:

Traceback (most recent call last):
  File "E:\02 python\02 multitasking\03-gevent\iterator_demo01.py", line 21, in <module>
    for name in name_list:
TypeError: iter() returned non-iterator of type 'NoneType'

It raises a TypeError. The reason is that our __iter__() method should return an object of type Iterator, whereas in this case it returns None, an object of type NoneType. A little confused? Don't worry, let me explain:

Iterator and Iterable are two different kinds of objects. As you can see from the example above, as long as we define an __iter__() method, our object is an Iterable. However, that is not enough for an Iterator, which must define both the __iter__() and the __next__() methods. So let's do it:

from collections.abc import Iterable, Iterator

class NameList(object):
      
      def __init__(self):
            self.name_list = []

      def add(self, name):
            self.name_list.append(name)

      def __iter__(self):
            return None

class NameListIterator(object):
      def __iter__(self):
            pass
      def __next__(self):
            pass

if __name__ == '__main__':
      name_list = NameList()
      name_list_iterator = NameListIterator()
      print(isinstance(name_list, Iterator))
      print(isinstance(name_list_iterator, Iterator))
      

As you can see, we define both the __iter__() and __next__() methods in our NameListIterator class. If you run this program, isinstance(name_list, Iterator) returns False whereas isinstance(name_list_iterator, Iterator) returns True.
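The built-in types follow the same rule. As a quick sketch (the variable names are only for illustration), a list is an Iterable but not an Iterator, while the object returned by calling iter() on that list is both:

```python
from collections.abc import Iterable, Iterator

names = ['Pete', 'Joe', 'Gill']
names_iterator = iter(names)  # ask the list for an iterator over itself

list_is_iterable = isinstance(names, Iterable)
list_is_iterator = isinstance(names, Iterator)
iterator_is_iterable = isinstance(names_iterator, Iterable)
iterator_is_iterator = isinstance(names_iterator, Iterator)

print('list:    ', list_is_iterable, list_is_iterator)
print('iterator:', iterator_is_iterable, iterator_is_iterator)
```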

Now let's make sure that our NameList object can be used with the for loop.

from collections.abc import Iterable, Iterator

class NameList(object):
      
      def __init__(self):
            self.name_list = []

      def add(self, name):
            self.name_list.append(name)

      def __iter__(self):
            return NameListIterator(self)

class NameListIterator(object):
      def __init__(self, obj):
            self.obj = obj
            self.n = 0
            
      def __iter__(self):
            # an iterator returns itself from __iter__()
            return self
      
      def __next__(self):
            if self.n < len(self.obj.name_list):
                  value = self.obj.name_list[self.n]
                  self.n += 1
                  return value
            else:
                  raise StopIteration
                        
if __name__ == '__main__':
      name_list = NameList()
      name_list.add('Joe')
      name_list.add('Peter')
      name_list.add('Henry')
      name_list.add('Coo')
      name_list.add('Lay')
      name_list.add('Tugo')

      for name in name_list:
            print(name)
      

def __iter__(self):
    return NameListIterator(self)

Instead of returning None, we return an Iterator type that we just defined. Our iterator will accept one constructor argument which is the NameList object itself.

def __next__(self):
     if self.n < len(self.obj.name_list):
          value = self.obj.name_list[self.n]
          self.n += 1
          return value
     else:
          raise StopIteration

The __next__() method is the core of the Iterator object NameListIterator. When we use a for loop to iterate through our name_list object, Python first calls the built-in function iter() on it, which invokes __iter__() and gives back a NameListIterator. It then repeatedly calls the built-in function next(), which in turn invokes the __next__() method defined in the NameListIterator class.

We create an instance variable n to keep track of the current position in name_list. When n reaches the end of name_list, __next__() raises the StopIteration exception and the for loop terminates.
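To make this mechanism more concrete, here is a rough sketch of what a for loop does behind the scenes, written out by hand. The helper name manual_for is made up for this illustration, and the real for loop is implemented inside the interpreter rather than like this:

```python
def manual_for(iterable):
    """Collect the items of an iterable the way a for loop would visit them."""
    collected = []
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls iterator.__next__()
        except StopIteration:
            break                      # the iterator is exhausted, so the loop ends
        collected.append(item)         # this line plays the role of the loop body
    return collected

print(manual_for(['Joe', 'Peter', 'Henry']))
```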

There is also a shorter way to achieve this instead of the lengthy method above.

from collections.abc import Iterable, Iterator

class NameList(object):
      
      def __init__(self):
            self.name_list = []
            self.n = 0

      def add(self, name):
            self.name_list.append(name)

      def __iter__(self):
            return self

      def __next__(self):
            if self.n < len(self.name_list):
                  value = self.name_list[self.n]
                  self.n += 1
                  return value
            else:
                  raise StopIteration
      

if __name__ == '__main__':
      name_list = NameList()
      name_list.add('Joe')
      name_list.add('Peter')
      name_list.add('Henry')
      name_list.add('Coo')
      name_list.add('Lay')
      name_list.add('Tugo')

      for name in name_list:
            print(name)

We can simply put the __next__() method inside our NameList class and have __iter__() return the object itself instead.

def __iter__(self):
    return self
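One thing to keep in mind with this shorter form, shown here as a sketch with a hypothetical OneShotNameList class: because the object is its own iterator and the counter n is never reset, it can only be looped over once. A second loop finds the counter already at the end and produces nothing.

```python
class OneShotNameList:
    """Hypothetical variant of NameList that acts as its own iterator."""

    def __init__(self, names):
        self.name_list = list(names)
        self.n = 0  # current position; never reset after a full pass

    def __iter__(self):
        return self

    def __next__(self):
        if self.n < len(self.name_list):
            value = self.name_list[self.n]
            self.n += 1
            return value
        raise StopIteration

names = OneShotNameList(['Joe', 'Peter', 'Henry'])
first_pass = list(names)    # consumes every name
second_pass = list(names)   # the position is already at the end

print(first_pass)
print(second_pass)
```

The two-class version does not have this limitation, since every for loop receives a fresh iterator object.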

1.1 Application of iterator

We can use an iterator to create a data structure that provides values without actually creating them all and putting them into memory up front. Instead, whenever you need the values, for example when iterating over them with a for loop, it generates them on the fly.

The NameList example above uses a list to hold its values, so it does not demonstrate this property well. Let's see another example:

M (index) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
P (value) | 1 | 1 | 2 | 3 | 5 | 8 | 13 | 21 | 34 | 55 | 89 | 144 | 233 | 377 | 610 | 987

The above is a portion of the famous Fibonacci sequence. We can create an Iterator object that generates those numbers dynamically instead of storing them in a list, tuple, etc.

# Fibonacci iterator
from collections.abc import Iterable, Iterator

class Fib(object):
      
      def __init__(self, n):
            self.n = n # index of the last Fibonacci number to generate (positions 0..n, n+1 numbers)
            self.cur_i = 0 # track the current iteration number
            self.f0 = 1 # the first two fibonaccis are 1
            self.f1 = 1

      def __iter__(self):
            return self

      def __next__(self):
            if self.cur_i < self.n+1:
                  if self.cur_i == 0 or self.cur_i == 1:
                        self.cur_i += 1
                        return 1
                  else:
                        temp = self.f0 + self.f1
                        self.f0 = self.f1
                        self.f1 = temp
                        self.cur_i += 1
                        return self.f1
            else:
                  raise StopIteration
                        
                  
if __name__ == '__main__':
      fib = Fib(15)
      for n in fib:
            print(n, end='|')

As you could see from the program above, we did not use any list or tuple to store all the numbers. 
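To get a rough feeling for the difference, here is a small sketch that uses the built-in range as a stand-in. A range is a lazy sequence rather than an iterator, but like Fib it computes its values on demand instead of storing them. The exact byte counts depend on the platform and Python version, so they are not shown here:

```python
import sys

count = 100_000  # arbitrary size chosen for this illustration

lazy_numbers = range(count)          # computes each value on demand
eager_numbers = list(range(count))   # stores a reference to every value

# getsizeof reports only the container itself, not the int objects it refers to,
# so the real gap is even larger than these two numbers suggest.
lazy_size = sys.getsizeof(lazy_numbers)
eager_size = sys.getsizeof(eager_numbers)

print('range object:', lazy_size, 'bytes')
print('list object: ', eager_size, 'bytes')
```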

Because of the __iter__() and __next__() methods, the Fib object is an iterator, so we can use a for loop on it.

fib = Fib(15)
for n in fib:
    print(n)

We pass 15 to the Fib constructor, so it will generate the Fibonacci numbers at positions 0 to 15 (16 numbers).

if self.cur_i < self.n+1:
      if self.cur_i == 0 or self.cur_i == 1:
            self.cur_i += 1
            return 1
      else:
            temp = self.f0 + self.f1
            self.f0 = self.f1
            self.f1 = temp
            self.cur_i += 1
            return self.f1
else:
     raise StopIteration

On each iteration, the __next__() method is invoked. In the first two iterations, __next__() just returns 1. From the third iteration on, we calculate the next number as the sum of the previous two numbers.

f[2] = f[1] + f[0]

f[3] = f[2] + f[1]

...

f[n] = f[n-1] + f[n-2]

Once a number has been returned, we no longer need to keep every earlier value. We only need the two most recent numbers, so f0 and f1 are simply overwritten with the values that follow them in the sequence.

For instance, starting from the third iteration,

f0 = 1, f1 = 1

temp = f0 + f1 = 2 # use temp to hold the value of next place, namely f2

f0 = f1 # f0 is assigned the value of f1

f1 = temp # f1 is assigned with value of temp or f2

return f1

The fourth iteration goes through the same process, until there is no value to generate. And if this is the case, raise StopIteration.

The output of the program should be:

1|1|2|3|5|8|13|21|34|55|89|144|233|377|610|987|
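As a side note, the temp variable is not strictly needed. The sketch below (fib_numbers is a hypothetical helper, not part of the program above) uses tuple assignment to update both values in one statement. It collects the numbers in a list only so that they can be compared with the output above:

```python
def fib_numbers(count):
    """Return the first `count` Fibonacci numbers, starting 1, 1."""
    f0, f1 = 1, 1
    numbers = []
    for _ in range(count):
        numbers.append(f0)
        # the right-hand side is evaluated first, so no temp variable is needed
        f0, f1 = f1, f0 + f1
    return numbers

print('|'.join(str(n) for n in fib_numbers(16)))
```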

2. Review of generator

A generator is a special kind of iterator, with its own syntax for creating one. There are two ways to create a generator:

  1. generator expression
  2. generator function

2.1 generator expression

You may have already created a list using list comprehension:

>>> [x**2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In order to create a generator, just replace the brackets "[]" with parentheses "()":

>>> g = (x**2 for x in range(10))
>>> type(g)
<class 'generator'>
>>> 

After creating the generator, you can use the next() function or a for loop to fetch elements from it, just like you do with a normal iterator object.

>>> next(g)
0
>>> next(g)
1
>>> next(g)
4
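A generator can only be consumed once. The following sketch repeats the idea with a fresh generator (the variable names are illustrative): after three calls to next(), list() collects the remaining values, and a second list() call finds nothing left.

```python
g = (x**2 for x in range(10))

first_three = [next(g) for _ in range(3)]  # fetch three values one at a time
remaining = list(g)                        # drain whatever is left
after_exhaustion = list(g)                 # nothing more to produce

print(first_three)
print(remaining)
print(after_exhaustion)
```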

2.2 generator function

Next, let's create the same generator as (x**2 for x in range(10)) with a generator function.

def num_gen(n):
      i = 0
      while i < n:
            yield i**2
            i += 1

# create a generator object
g = num_gen(10)
for e in g:
      print(e)

As you can see from the code above, we define what looks like a normal function, but because the keyword yield appears inside it, this function is a generator function. Its usage is totally different from that of a normal function.

g = num_gen(10)

When you invoke a generator function, it does not execute the instructions inside. Instead, it creates a generator object, much like calling a class creates an instance. We then assign the newly created generator object to the variable g.
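One way to convince yourself of this is the sketch below, which records in a list when the function body actually starts running. The events list and the snapshot variable names are additions made only for this illustration:

```python
events = []  # records when the generator body actually runs

def num_gen(n):
    events.append('body started')
    i = 0
    while i < n:
        yield i**2
        i += 1

g = num_gen(3)
events_after_call = list(events)   # snapshot taken right after calling num_gen()

first_value = next(g)
events_after_next = list(events)   # snapshot taken after the first next()

print(events_after_call)
print(first_value, events_after_next)
```

The first snapshot is empty because calling num_gen() only builds the generator object; the body begins to run on the first next().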

for e in g:
      print(e)

Because a generator object is a special kind of iterator, we can use a for loop or the next() function to fetch elements from it.

Here is what is really happening when iterating a generator:

In the first iteration, execution starts from the first line of the function body, i = 0.

Then it enters the while loop and reaches the yield statement, which evaluates i**2 (zero at this point), hands that value back to the for loop and pauses the generator. The for loop assigns the value to the variable e and prints 0.

When the for loop asks for the next value, the generator does not start over from i = 0. Instead, it resumes right after the "yield i**2" statement where it paused, so the next statement to run is i += 1 and i becomes 1.

The while condition is checked again, "yield i**2" now produces 1, and the for loop prints 1. If the second iteration had started from i = 0 again, we would still get 0.

The for loop continues until the generator has no more values to return, at which point iteration stops. We can use the following debugging code to check this execution order.

def num_gen(n):
      print('----1----')
      i = 0
      while i < n:
            print('----2----')
            yield i**2
            print('----3----')
            i += 1

# create a generator object
g = num_gen(5)
for e in g:
      print(e)

The output is:

----1----
----2----
0
----3----
----2----
1
----3----
----2----
4
----3----
----2----
9
----3----
----2----
16
----3----

As you could see, the statement i = 0 is only executed once for the entire loop.

Instead of using the next() function to fetch elements, we can also call the generator's own send() method, which does the same thing when it is given None.

def num_gen(n):
      i = 0
      while i < n:
            yield i**2
            i += 1

# create a generator object
g = num_gen(5)

print(g.send(None))
print(g.send(None))
print(g.send(None))
print(g.send(None))

The output is:

0
1
4
9

Instead of passing None, you can pass other arguments to the send() method. On the first call, however, we must pass None, because the generator has not reached a yield yet and has nowhere to receive the value.

def num_gen(n):
      i = 0
      while i < n:
            res = yield i**2
            if res == 9:
                  break
            i += 1

# create a generator object
g = num_gen(10)

print(g.send(None))
print(g.send(None))
print(g.send(9))

The output of this program is:

0
1
Traceback (most recent call last):
  File "E:/02 python/02 multitasking/03-gevent/generator_demo01.py", line 14, in <module>
    print(g.send(9))
StopIteration

After printing 0 and 1, it throws a StopIteration exception. Here is what is really happening in the background.

When we first invoke g.send(None), execution starts from the statement i = 0 and then reaches res = yield i**2. The right-hand side runs first: the generator yields the calculated value 0, which we print, and pauses there. Nothing has been assigned to res yet.

The next iteration starts when you invoke g.send(None) again. Execution resumes at the paused yield, and the argument of send() becomes the value of the yield expression, so res is assigned None. The condition res == 9 is False, so i += 1 runs, the loop comes around, and the generator yields 1 and pauses again.

In the third call, things change. Execution resumes at res = yield i**2 and res is assigned the value 9 that you passed into send(). The condition is now satisfied, so the generator breaks out of the while loop and the function body ends without yielding another value, which raises StopIteration in the caller.
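The sketch below makes these steps visible by recording every value that res receives. The received list and the finished flag are additions for this illustration only, and the StopIteration is caught so that the script ends normally:

```python
received = []  # every value that the yield expression evaluates to

def num_gen(n):
    i = 0
    while i < n:
        res = yield i**2
        received.append(res)   # runs only after the generator is resumed
        if res == 9:
            break
        i += 1

g = num_gen(10)
outputs = [g.send(None), g.send(None)]  # yields 0, then 1

finished = False
try:
    g.send(9)                  # res becomes 9 and the loop breaks
except StopIteration:
    finished = True

print(outputs)
print(received)
print(finished)
```

Note that received holds only two entries: the first send(None) starts the generator and never assigns anything to res.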

In practice, we usually just use the next() function; if you want to pass arguments to the generator, use the send() method. Just remember to pass None in your first send() invocation.
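As a short sketch of why the first call needs None: sending any other value to a generator that has not started yet raises a TypeError. A common pattern is to start ("prime") the generator with next() and only then use send(). The variable names below are illustrative:

```python
def num_gen(n):
    i = 0
    while i < n:
        yield i**2
        i += 1

# Sending a real value before the generator has started is rejected.
unstarted = num_gen(5)
error = None
try:
    unstarted.send(9)
except TypeError as exc:
    error = exc
print(type(error).__name__)

# Priming with next() first makes later send() calls valid.
primed = num_gen(5)
first = next(primed)        # runs up to the first yield
second = primed.send(None)  # resumes and runs up to the next yield
print(first, second)
```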

3. Achieve multitasking using generator

Let's see an example first:

def task_01():
      while True:
            print('----task01----')

def task_02():
      while True:
            print('----task02----')

def main():
      task_01()
      task_02()

if __name__ == '__main__':
      main()

Now we have two tasks. If we run them the way shown above, the output would be:

----task01----
----task01----
----task01----
...

Only task01 is executed; task02 never runs, because the call to task_01() never returns.

Let's change it:

def task_01():
      while True:
            print('----task01----')
            yield

def task_02():
      while True:
            print('----task02----')
            yield

def main():
      t1 = task_01()
      t2 = task_02()

      while True:
            next(t1)
            next(t2)
      
if __name__ == '__main__':
      main()

The output should be:

----task01----
----task02----
----task01----
----task02----
...

As you can see from the output above, we have achieved multitasking in a single thread.

def task_01():
      while True:
            print('----task01----')
            yield

We have put yield into the function, so the function becomes a generator function. When we call next() on the generator object t1, task_01 runs until it reaches the yield statement and pauses there. Control then returns to main(), which resumes task_02 in the same way.

Compared with multithreading and multiprocessing, this method uses the fewest resources, since everything runs in a single thread.
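The same idea can be extended to tasks that eventually finish. Below is a minimal sketch of a round-robin scheduler, assuming each task is a generator that yields whenever it is willing to give up control. The names task, run_round_robin and log are made up for this illustration:

```python
from collections import deque

def task(name, steps, log):
    """A task that does `steps` units of work, yielding after each one."""
    for step in range(steps):
        log.append(f'{name} step {step}')
        yield  # hand control back to the scheduler

def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # the task is done, so drop it
        queue.append(current)      # not done yet, so put it at the back

log = []
run_round_robin([task('task01', 2, log), task('task02', 3, log)])
for line in log:
    print(line)
```

The two tasks take turns until task01 runs out of work, after which task02 finishes alone.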

4. Introduction to greenlet and gevent

Before we move on, let's install gevent and greenlet first.

pip install gevent

You only have to install gevent, because greenlet is installed automatically as one of its dependencies.

4.1 greenlet module

The greenlet module works much like a wrapper around the yield statement. With greenlet, we can write normal functions without the yield statement and achieve the same result. Let's look at the example with greenlet:

from greenlet import greenlet

def task_01():
      global t2
      while True:
            print('----task01----')
            t2.switch() # switch to task2

def task_02():
      global t1
      while True:
            print('----task02----')
            t1.switch() # switch to task1 again

def main():
      global t1, t2
      t1 = greenlet(task_01)
      t2 = greenlet(task_02)

      t1.switch() # switch to task1 first
      
if __name__ == '__main__':
      main()

global t1, t2
t1 = greenlet(task_01)
t2 = greenlet(task_02)

We create two greenlet objects for task_01 and task_02. 

t1.switch()

Then we switch to task_01 first; task_01 executes and prints '----task01----'.

while True:
    print('----task01----')
    t2.switch() # switch to task2

Next, we invoke t2.switch() to switch to task_02; task_01 pauses and task_02 starts.

while True:
    print('----task02----')
    t1.switch() # switch to task1 again

Then, after printing '----task02----', we switch to task_01 again. Our two tasks execute alternately, one after another.

The output should be the same as in the example above.

4.2 gevent

The gevent module is another higher level module that wraps around the greenlet module. It adds more features to the greenlet module. Its usage is simple too.

from gevent import spawn, sleep

def task_01():
      while True:
            print('----task01----')
            sleep(0.0001)            

def task_02():
      while True:
            print('----task02----')
            sleep(0.0001)

def main():
      t1 = spawn(task_01)
      t2 = spawn(task_02)

      t1.join()
      t2.join()
      
      
if __name__ == '__main__':
      main()

t1 = spawn(task_01)
t2 = spawn(task_02)

We use the spawn() function to create two <class 'gevent._gevent_cgreenlet.Greenlet'> objects and schedule them to run. Calling join() then makes main() wait for a greenlet to finish, which gives the scheduled greenlets the chance to run. task_01 keeps running until it reaches sleep(), then control switches to task_02 until it reaches sleep() too, and then back to task_01.

Warning: the sleep() used here is gevent.sleep(), not time.sleep(). The standard time.sleep() blocks the whole thread without switching to another greenlet, unless gevent's monkey patching has been applied.

The output of this program is also the same as above.

