Introduction to multitasking with Python #001 multithreading (2020 tutorial)


1. What is multitasking?

So far, all the programs we have written in Python run their instructions sequentially, one after another. Let's look at a simple example:

# multithreading_example01.py
# basic program without using threading
import time

def sing():
      for i in range(5):
            print('Singing...')
            time.sleep(1)
            
def dance():
      for i in range(5):
            print('Dancing...')
            time.sleep(1)

def main():
      # singing a song
      sing()
      # dancing
      dance()

if __name__ == '__main__':
      main()

This program takes about ten seconds to run: five seconds of singing followed by five seconds of dancing.

But what if we want to dance and sing at the same time instead of doing one after the other? How would we change the code?

In Python, we can use the threading module to run the sing() and dance() functions at the same time. Here is how:

# multithreading_example02.py
# basic program with threading
import time, threading

def sing():
      for i in range(5):
            print('Singing...')
            time.sleep(1)
            
def dance():
      for i in range(5):
            print('Dancing...')
            time.sleep(1)

def main():

      # create a thread for singing
      t1 = threading.Thread(target=sing)
      # create a thread for dancing
      t2 = threading.Thread(target=dance)
      # start both threads
      t1.start()
      t2.start()

if __name__ == '__main__':
      main()
 
# create a thread for singing
t1 = threading.Thread(target=sing)
# create a thread for dancing
t2 = threading.Thread(target=dance)

In this program, we create two threads, one for sing() and one for dance(). After we invoke the start() method on each thread, both functions run at roughly the same time.
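One way to check that the two functions really overlap is to time both versions. The sketch below is only an illustration: the DELAY constant and the run_sequential() and run_threaded() helpers are names made up for this example, and join() is used to wait until a thread has finished.

```python
import threading
import time

DELAY = 0.05  # hypothetical short delay so the demo finishes quickly

def sing():
    for _ in range(3):
        time.sleep(DELAY)

def dance():
    for _ in range(3):
        time.sleep(DELAY)

def run_sequential():
    # Run one function after the other and measure the total time
    start = time.perf_counter()
    sing()
    dance()
    return time.perf_counter() - start

def run_threaded():
    # Run both functions in their own threads and measure the total time
    start = time.perf_counter()
    t1 = threading.Thread(target=sing)
    t2 = threading.Thread(target=dance)
    t1.start()
    t2.start()
    # join() blocks until the thread has finished
    t1.join()
    t2.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    print('sequential: %.2f s' % run_sequential())
    print('threaded:   %.2f s' % run_threaded())
```

The threaded version should take roughly half as long, because the two sleeping loops overlap instead of running back to back.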


2. Introduction to the threading module

In order to create a thread, we could use: threading.Thread(target=name_of_function)

A Thread object represents an activity that can run in a separate thread of control. After a Thread object is created, you must invoke its start() method to start the activity.
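The difference between creating and starting a thread can be observed with the is_alive() method. This is a small sketch rather than part of the original examples: wait_for_signal() and lifecycle() are hypothetical helper names, and a threading.Event is used only to keep the child thread alive until we have checked its state.

```python
import threading

def wait_for_signal(stop_event):
    # Block until the main thread sets the event
    stop_event.wait()

def lifecycle():
    stop_event = threading.Event()
    t = threading.Thread(target=wait_for_signal, args=(stop_event,))
    states = [t.is_alive()]      # created but not started yet
    t.start()
    states.append(t.is_alive())  # started and waiting on the event
    stop_event.set()             # allow the child thread to finish
    t.join()                     # wait until it has finished
    states.append(t.is_alive())  # finished
    return states

if __name__ == '__main__':
    print(lifecycle())  # [False, True, False]
```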

There is always a main thread that could spawn many child threads. Let's see another example:

# multithreading_example03.py
# basic program with threading
import time, threading

def cooking():
      for i in range(5):
            print('child thread t1...')
            print('cooking...')
            time.sleep(1)
      

def watching_tv():
      for i in range(5):
            print('child thread t2...')
            print('watching tv...')
            time.sleep(1)
      

def main():
      # start two child threads
      t1 = threading.Thread(target=cooking)
      t2 = threading.Thread(target=watching_tv)

      t1.start()
      t2.start()

      print('main thread')


if __name__ == '__main__':
      main()

When we run this program, the main thread starts first.

In the main thread, instructions are executed in order:

  1. import modules
  2. check the condition __name__ == '__main__'; if it holds, move on
  3. invoke main() 
  4. create a thread for the activity of cooking
  5. create a thread for the activity of watching tv
  6. start activity t1
  7. start activity t2
  8. print the main-thread message

In step 6, the cooking activity starts in a separate child thread and begins to run while the main thread is still running. It needs much more time to finish than the main thread does, but it does not block the main thread, which moves straight on to step 7. The same thing happens with the watching-TV activity: it does not block the main thread either.

After the main thread finishes its last statement, the two child threads are still running. The program exits only when both child threads have finished executing.


The order in which the main thread and the two child threads print their messages is not fixed and can change from run to run. It is determined by the operating system's scheduler. Let's see another example.

# multithreading_example04.py
# basic program with threading
import threading, time

def cooking():
      for i in range(5):
            print('child thread t1...')
            print('cooking...')
            
      

def watching_tv():
      for i in range(5):
            print('child thread t2...')
            print('watching tv...')
            
      

def main():
      # start two child threads
      t1 = threading.Thread(target=cooking)
      t2 = threading.Thread(target=watching_tv)

      t1.start()
      t2.start()

      print(threading.enumerate())
      print('main thread')
      time.sleep(1)
      print(threading.enumerate())


if __name__ == '__main__':
      main()

During the execution of the program, we can use threading.enumerate() to list the threads that are currently alive. Let's walk through what the program above prints.


A child thread starts to run when we invoke its start() method, not when the Thread object is created. In our run, the first threading.enumerate() call displayed three threads in a list: the main thread and the two child threads. (The child threads do very little work here, so on some runs one or both may already have finished by the time of this call.) We then use time.sleep(1) to let the main thread sleep for one second, during which the two child threads finish their tasks and end. So the second threading.enumerate() call shows only the main thread.
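Because the child threads above finish so quickly, what threading.enumerate() shows can vary between runs. A more predictable sketch is given below. It assumes a hypothetical worker() that waits on a threading.Event, so the child thread is guaranteed to be alive at the first check and finished at the second.

```python
import threading

def worker(stop_event):
    # Stay alive until the main thread sets the event
    stop_event.wait()

def alive_before_and_after():
    stop_event = threading.Event()
    t = threading.Thread(target=worker, args=(stop_event,), name='cooking')
    t.start()
    # The child thread is running, so it is in the list of alive threads
    before = t in threading.enumerate()
    stop_event.set()
    t.join()
    # The child thread has finished, so it is no longer listed
    after = t in threading.enumerate()
    return before, after

if __name__ == '__main__':
    print(alive_before_and_after())  # (True, False)
```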

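By the way, if the whole process is killed, all of its child threads die with it. When the main thread merely reaches the end of its code, Python waits for the ordinary (non-daemon) child threads to finish before the program exits; only daemon threads are stopped abruptly at that point.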

2.1 An object-oriented approach to creating a Thread object

Besides the approach in the previous section, we can subclass the threading.Thread class and override its run() method. Let's see an example.

# multithreading_example05.py
# basic program with threading
import threading, time

class CookingThread(threading.Thread):
      def run(self):
            for i in range(5):
                  print('child thread t1...')
                  print('cooking...')
                  time.sleep(1)

class WatchingTVThread(threading.Thread):
      def run(self):
            for i in range(5):
                  print('child thread t2...')
                  print('watching tv...')
                  time.sleep(1)

def main():
      # Create two child threads
      t1 = CookingThread()
      t2 = WatchingTVThread()

      t1.start()
      t2.start()

      print('main thread')

if __name__ == '__main__':
      main()

In the subclass we only need to override the run() method. After we create an instance of the Thread subclass, we invoke its start() method, which in turn arranges for run() to be called in the new child thread.
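If the subclass needs its own data, it can also define a constructor, as long as it calls the base class constructor first. The following is a minimal sketch under that assumption; the dish and results parameters are invented for this illustration.

```python
import threading

class CookingThread(threading.Thread):
    def __init__(self, dish, results):
        # Call the base constructor before doing anything else with the thread
        super().__init__()
        self.dish = dish
        self.results = results

    def run(self):
        # This method is executed in the child thread after start()
        self.results.append('cooking %s' % self.dish)

def main():
    results = []
    t = CookingThread('noodles', results)
    t.start()
    t.join()  # wait for the child thread to finish
    return results

if __name__ == '__main__':
    print(main())  # ['cooking noodles']
```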

3. The relationship between global variables and threads

Global variables are variables defined outside functions, and they are shared by all threads. Let's see an example:

# multithreading_example06.py
# basic program with threading
import threading, time

# define a global variable
num = 100

def test01():
      global num
      num += 1
      print('----test01----')
      print('num=%d' % num)
      
def test02():
      print('----test02----')
      print('num=%d' % num)

def main():
      # Create two child threads
      t1 = threading.Thread(target=test01)
      t2 = threading.Thread(target=test02)

      t1.start()
      time.sleep(1)
      t2.start()
      time.sleep(1)

      print('----main thread----')
      print('num=%d' % num)

if __name__ == '__main__':
      main()

# define a global variable
num = 100

We define a global variable that is visible to all threads.

t1 = threading.Thread(target=test01)
t2 = threading.Thread(target=test02)

t1.start()
time.sleep(1)
t2.start()
time.sleep(1)

We use time.sleep(1) after each start() so that test01 runs and finishes first, then test02, and finally the rest of the main thread.


At the start num is 100. The child thread t1 changes it to 101, then the child thread t2 prints 101, and finally the main thread also prints 101.

So let's see another example in which we define a global variable of type list.

# multithreading_example07.py
# basic program with threading
import threading, time

# define a global variable
nums = [11,22]

def test01(temp):
      temp.append(33)
      print('----test01----')
      print('temp=%s' % str(temp))
      
def test02(temp):
      print('----test02----')
      print('temp=%s' % str(temp))

def main():
      # Create two child threads
      t1 = threading.Thread(target=test01, args=(nums,))
      t2 = threading.Thread(target=test02, args=(nums,))

      t1.start()
      time.sleep(1)
      t2.start()
      time.sleep(1)

      print('----main thread----')
      print('nums=%s' % str(nums))

if __name__ == '__main__':
      main()

# define a global variable
nums = [11,22]

We define a global list. We can pass this list to our task functions through the args keyword argument of the Thread constructor: args=(nums,). The value of args must be a tuple, which is why there is a comma after nums.

Let's look at what this program prints:

Although temp is a local variable, the list [11,22] is not copied into it; temp is just a reference to the original list. So temp points to the same list inside each thread, and after test01 appends 33, test01, test02 and the main thread all print [11, 22, 33].
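To make the difference between sharing a list and copying it concrete, here is a small sketch. The append_item() helper and the copy_of_nums variable are assumptions made for this illustration: the first thread receives the original list, the second receives a copy created with list().

```python
import threading

def append_item(temp, value):
    # temp refers to whatever list object was passed in
    temp.append(value)

def main():
    nums = [11, 22]

    # The thread receives a reference to nums, so nums itself changes
    t1 = threading.Thread(target=append_item, args=(nums, 33))
    t1.start()
    t1.join()

    # The thread receives a copy, so only the copy changes
    copy_of_nums = list(nums)
    t2 = threading.Thread(target=append_item, args=(copy_of_nums, 44))
    t2.start()
    t2.join()

    return nums, copy_of_nums

if __name__ == '__main__':
    print(main())  # ([11, 22, 33], [11, 22, 33, 44])
```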


3.1 There is a race condition

When more than one thread tries to modify the same global variable at the same time, there is a race condition. What does that mean? Let's see an example:

# multithreading_example08.py
# basic program with threading
import threading, time

# define a global variable
num = 0

def test01(n):
      global num
      for i in range(n):
            num += 1
      print('----test01---- num = %d' % num)

def test02(n):
      global num
      for i in range(n):
            num += 1
      print('----test02---- num = %d' % num)

def main():
      # Create two child threads
      t1 = threading.Thread(target=test01, args=(100,))
      t2 = threading.Thread(target=test02, args=(100,))

      t1.start()
      t2.start()
      time.sleep(5)

      print('----main thread---- num = %d' % num)

if __name__ == '__main__':
      main()

In this program, we start two child threads alongside the main thread. Each child thread adds 1 to the global variable num in a loop. test01() adds 1 a hundred times and test02() adds 1 another hundred times, so the final value of num should be 200.

With 100 iterations each, the result is indeed 200. But if we let each function add 1 a total of 1,000,000 times, the final value should be 2,000,000:

t1 = threading.Thread(target=test01, args=(1000000,))
t2 = threading.Thread(target=test02, args=(1000000,))

In our run, num was not 2,000,000 but a weird 1092354; the exact value varies from run to run and between Python versions. What happened? To explain this, let's break the code down into smaller pieces.

We have two child threads, and both do the same thing: add 1 to the global variable num. To achieve multithreading here, the CPU divides its time between the two threads, switching back and forth so quickly that both seem to run at the same time. Let's break down the for loop a little further.

The statement num += 1 can be broken down into three steps:

  1. read the value of num
  2. add 1 to that value
  3. assign the new value to num

In reality, the CPU may execute only the first two steps in thread 1 before switching away:


  1. read the value of num, which is 0
  2. add 1 to 0, which gives 1

After these first two steps, the value of num is still 0; nothing has been written back yet.

Then the CPU stops executing thread 1 and starts executing the first two steps of thread 2.

  1. read the value of num, which is 0
  2. add 1 to 0, which gives 1

Because thread 1 was interrupted before its third step, num is still 0 when thread 2 reads it. The CPU then executes the second step of thread 2 (add 1). Once again the CPU does not execute the third step, this time for thread 2; it switches back to thread 1 and resumes where it left off.



Now the CPU executes the third step of thread 1, and num becomes 1. After that, the CPU switches to thread 2 again and resumes where it left off.
Because the value of num that thread 2 read earlier was 0, its third step also sets num to 1 instead of 2. So after two add operations, one from thread 1 and one from thread 2, the value of num is still 1.

That is why such a weird number appears in the example above.
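The interleaving described above can be replayed by hand in a single thread. This is only a simulation, not real multithreading: the variables t1_value and t2_value are hypothetical stand-ins for the temporary value each thread holds between reading num and writing it back.

```python
def simulate_lost_update():
    num = 0

    # thread 1: steps 1 and 2 (read num, add 1), then it is interrupted
    t1_value = num
    t1_value = t1_value + 1

    # thread 2: steps 1 and 2 (it still reads 0), then it is interrupted
    t2_value = num
    t2_value = t2_value + 1

    # thread 1: step 3 (write back), num becomes 1
    num = t1_value

    # thread 2: step 3 (write back), num is set to 1 again
    num = t2_value

    return num

if __name__ == '__main__':
    print(simulate_lost_update())  # 1, although 1 was added twice
```

One of the two additions is lost. With a million iterations per thread, this kind of lost update can happen many times, so the final total falls short of 2,000,000.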

3.2 How to solve this race condition

In the previous example, the race condition is caused by the statement num += 1. It actually breaks down into three steps that are executed one by one, and a thread switch can happen between them. The solution is to make sure that thread 1 finishes all three steps before thread 2 can begin them, and vice versa.

A threading.Lock object lets us do exactly this, and its usage is simple:

# multithreading_example09.py
# basic program with threading
import threading, time

# define a global variable
num = 0
# create a Lock object
mutex = threading.Lock()

def test01(n):
      global num
      for i in range(n):
            mutex.acquire()
            num += 1
            mutex.release()
      print('----test01---- num = %d' % num)

def test02(n):
      global num
      for i in range(n):
            mutex.acquire()
            num += 1
            mutex.release()
      print('----test02---- num = %d' % num)

def main():
      # Create two child threads
      t1 = threading.Thread(target=test01, args=(1000000,))
      t2 = threading.Thread(target=test02, args=(1000000,))

      t1.start()
      t2.start()
      time.sleep(5)

      print('----main thread---- num = %d' % num)

if __name__ == '__main__':
      main()

# create a Lock object
mutex = threading.Lock()

We create a Lock object called mutex.

mutex.acquire()
num += 1
mutex.release()

Then we use this Lock object to protect the statement num += 1, so that no other thread can run it until all three steps above have completed. The mutex.acquire() method acquires the lock if it is currently released; otherwise it blocks and waits until another thread releases it. Afterwards, we must release the lock so that other threads can acquire it. If you forget to release the lock in thread 1, thread 2 will block forever and never execute the statement num += 1.
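Forgetting to release a lock is easy to avoid, because a Lock can be used in a with statement: the lock is acquired on entry and released on exit, even if an exception is raised. The sketch below is a possible variation on the example above, not a replacement for it. The add() helper and the smaller default count are choices made for this illustration, and join() is used instead of time.sleep(5) to wait for the child threads.

```python
import threading

num = 0
mutex = threading.Lock()

def add(n):
    global num
    for _ in range(n):
        # The lock is acquired here and released automatically
        # when the with block ends
        with mutex:
            num += 1

def main(n=100000):
    global num
    num = 0
    threads = [threading.Thread(target=add, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    # Wait for both child threads instead of sleeping a fixed time
    for t in threads:
        t.join()
    return num

if __name__ == '__main__':
    print(main())  # 200000
```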

Running the program several times, the result is 2,000,000 every time, so our race condition problem is solved.

3.3 Avoid deadlock

In one program, we can create several locks, just as we can create several threads. Let's look at the following example:

# multithreading_example10.py
# basic program with threading
import threading, time

mutex01 = threading.Lock()
mutex02 = threading.Lock()

class Thread01(threading.Thread):

      def run(self):
            print('----thread 1---')

            # acquire mutex01
            mutex01.acquire()
            print('----mutex01 acquired----')
            time.sleep(1)

            # acquire mutex02
            mutex02.acquire()
            print('----mutex02 acquired----')
            time.sleep(1)

            # release both locks (normally never reached because of the deadlock)
            mutex02.release()
            mutex01.release()

class Thread02(threading.Thread):

      def run(self):
            print('----thread 2---')

            # acquire mutex02
            mutex02.acquire()
            print('----mutex02 acquired----')
            time.sleep(1)

            # acquire mutex01
            mutex01.acquire()
            print('----mutex01 acquired----')
            time.sleep(1)

            # release both locks (normally never reached because of the deadlock)
            mutex01.release()
            mutex02.release()
      

def main():
      # Create two child threads
      t1 = Thread01()
      t2 = Thread02()

      t1.start()
      t2.start()

if __name__ == '__main__':
      main()

In this program we create two locks, mutex01 and mutex02. When we run it, each thread reports that it has acquired its first lock, and then nothing more is printed.

The program never ends: it is in a deadlock. Let's see what is wrong with it.

When the two threads are started, each executes the statements of its run() method from top to bottom, one by one.

At the first step, mutex01 and mutex02 are both in the released state, so thread 1 acquires mutex01 and thread 2 acquires mutex02 successfully. Then each thread sleeps for one second and continues.

At the next step, thread 1 tries to acquire mutex02 and thread 2 tries to acquire mutex01, but both locks are already held by the other thread. Neither thread can proceed to release its lock, so the program blocks forever.
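One common way to avoid this kind of deadlock is to make every thread acquire the locks in the same order. The following is a minimal sketch of that idea, not the only possible fix: the task() function and the finished list are names invented for this example.

```python
import threading

mutex01 = threading.Lock()
mutex02 = threading.Lock()

def task(name, finished):
    # Both threads take the locks in the same order: mutex01, then mutex02.
    # A thread that cannot get mutex01 waits without holding any lock,
    # so the two threads can never block each other forever.
    with mutex01:
        with mutex02:
            finished.append(name)

def main():
    finished = []
    t1 = threading.Thread(target=task, args=('thread 1', finished))
    t2 = threading.Thread(target=task, args=('thread 2', finished))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return sorted(finished)

if __name__ == '__main__':
    print(main())  # ['thread 1', 'thread 2']
```

Another option is to pass a timeout to acquire(), for example mutex02.acquire(timeout=1), and to release the first lock and retry if the call returns False.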

4. Application: a UDP chatter with multithreading

In our UDP tutorial, we wrote a single-threaded UDP chatter that could only send and receive messages one at a time. Now we can write a UDP chatter that sends and receives messages continuously, using just one socket.

# UDP_chatter_multithreading_version.py

import socket, threading

def send_to(udp_socket, dest_ip, dest_port):
      while True:
            msg = input('Enter message: ')
            udp_socket.sendto(msg.encode('utf-8'), (dest_ip, dest_port))

def recv_from(udp_socket):
      while True:
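            # recvfrom() returns a (data, address) pair; decode the bytes for display
            data, addr = udp_socket.recvfrom(1024)
            print('%s: %s' % (str(addr), data.decode('utf-8')))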

def main():
      # create a UDP socket
      udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

      # bind a local address to udp_socket
      udp_socket.bind(('', 7878))

      # remote addresses
      dest_ip = input('Enter the remote IP: ')
      dest_port = int(input('Enter the remote port: '))

      # send message
      send_task = threading.Thread(target=send_to, args=(udp_socket, dest_ip, dest_port))
      # receive message
      recv_task = threading.Thread(target=recv_from, args=(udp_socket,))

      # start the threads
      send_task.start()
      recv_task.start()

if __name__ == '__main__':
      main()

In this program, we create two threads, one for the send_to() function and another for the recv_from() function. While the program is running, it can send and receive messages at the same time.
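The chatter needs a second machine or terminal to talk to. To try the receive-in-a-thread idea on a single machine, a loopback sketch like the one below may help. It is an illustration only: recv_once() and loopback_demo() are hypothetical helpers, the address 127.0.0.1 with port 0 lets the operating system pick a free local port, and a timeout is set so the receiving thread cannot wait forever.

```python
import socket
import threading

def recv_once(udp_socket, inbox):
    # Wait for a single datagram and store its decoded text
    try:
        data, addr = udp_socket.recvfrom(1024)
        inbox.append(data.decode('utf-8'))
    except socket.timeout:
        pass  # nothing arrived within the timeout

def loopback_demo():
    # Receiving socket bound to the local machine; port 0 means
    # "let the operating system choose a free port"
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(('127.0.0.1', 0))
    receiver.settimeout(2.0)
    port = receiver.getsockname()[1]

    inbox = []
    recv_task = threading.Thread(target=recv_once, args=(receiver, inbox))
    recv_task.start()

    # Send one message to the receiving socket from the main thread
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto('hello'.encode('utf-8'), ('127.0.0.1', port))

    recv_task.join()
    sender.close()
    receiver.close()
    return inbox

if __name__ == '__main__':
    print(loopback_demo())
```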







