Introduction to network programming with Python #0003 HTTP (2020 Tutorial)

 


Table of Contents

1. Introduction to HTTP

2. Write a TCP server that serves as a HTTP server

3. A HTTP server demo

4. A HTTP server with multiprocessing

5. A HTTP server with multithreading

6. A HTTP server with gevent

7. A multitasking TCP server without threading, multiprocessing, and gevent

8. HTTP Persistent connection Vs multiple connections

9. Improve our server with epoll (Linux only)


1. Introduction to HTTP

Let's explain HTTP in a simple way.

HTTP (Hypertext Transfer Protocol) is the protocol that lets you enter a website address in a web browser and get back the web page you want.

For example:


HTTP is built on top of TCP/IP and defines how a web browser communicates with an HTTP server.

Let's open Chrome and enter google.com.  Then press "F12" to enter the dev panel and select Network tab.


Then refresh the page with "F5" and open the www.google.com item in the dev panel.





As you can see in the Headers section, there are response headers and request headers. When we request a page from the google.com server, our browser sends request headers with all the information required by the HTTP protocol, and the Google server answers with response headers alongside the content.

Now let's open the Net Assistant and start a TCP server. We can then connect to this server through our web browser.


When we enter 127.0.0.1:8080 into our browser, our Net Assistant will display the HTTP header information sent by the browser.


The most important part is the first line:

GET / HTTP/1.1

As you can see, "GET" is the method for getting content from the web server. If you post a form, the method is "POST". The "/" that follows is the path of the file your browser wants. Because we only entered "127.0.0.1:8080" without a file name, the path is a single "/". If you enter "127.0.0.1:8080/index.html", the first line becomes:

GET /index.html HTTP/1.1

The last part is the HTTP version HTTP/1.1.
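The three parts of the request line can be pulled apart with a plain split. This is only a minimal sketch: the name parse_request_line is made up for illustration, and it assumes the request is well formed (exactly three space-separated parts on the first line).

```python
def parse_request_line(raw_request):
    """Split the first line of an HTTP request into (method, path, version)."""
    # The request line is everything before the first line break.
    first_line = raw_request.splitlines()[0]
    # e.g. "GET /index.html HTTP/1.1" -> ["GET", "/index.html", "HTTP/1.1"]
    method, path, version = first_line.split(" ")
    return method, path, version


if __name__ == "__main__":
    sample = "GET /index.html HTTP/1.1\r\nHost: 127.0.0.1:8080\r\n\r\n"
    print(parse_request_line(sample))
```

The servers later in this tutorial use a regular expression for the same job, which also copes with text that is not a valid request.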

As for the response, if the server finds the file that the browser requested, it sends the file back to the browser.

To complete the demo, we can send the following back to the browser.

HTTP/1.1 200 OK

<h1>Hello</h1>

There is an empty line between the first line and the content. The first line is the most important header in an HTTP response: it tells the browser that the file exists and that its content follows. The actual content comes after the headers, and the browser uses the empty line to tell the two apart. Strictly speaking, HTTP ends each line with \r\n, but browsers also accept the bare \n used throughout this tutorial.



After you enter HTTP/1.1 200 OK\n\n<h1>Hello</h1> and send it, the browser will display the Hello message.
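The same response can be assembled in code. The helper below is a hypothetical sketch (build_response is not part of the tutorial's servers); it uses \r\n line endings, which is what the HTTP specification asks for, while the rest of this tutorial relies on browsers accepting a bare \n.

```python
def build_response(body, status="200 OK"):
    """Return a complete HTTP response as bytes: status line, empty line, body."""
    status_line = f"HTTP/1.1 {status}\r\n"
    # A line containing only "\r\n" marks the end of the headers.
    head = status_line + "\r\n"
    return head.encode("utf-8") + body.encode("utf-8")


if __name__ == "__main__":
    print(build_response("<h1>Hello</h1>"))
```

Everything before the empty line is treated as headers by the browser, and everything after it is treated as the page content.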

2. Write a TCP server that serves as a HTTP server

Now, let's write a TCP server that could accept requests from a web browser.

# a simple http server
import socket, re


def main():
      http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

      http_socket.bind(("", 8080))
      http_socket.listen(128)
      while True:
            client_socket, client_address = http_socket.accept()
            print(f"{client_address}: connected")
            content = client_socket.recv(1024).decode('utf-8')
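            res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", content)
            if res is None:
                  # no request line (e.g. the browser sent nothing): skip this client
                  client_socket.close()
                  continue
            file_name = res.group(2)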
            print(content)
            if file_name == "/":
                  response = 'HTTP/1.1 200 OK\n\n<h1>Hello, Welcome!</h1>'
            else:
                  response = 'HTTP/1.1 404 Not Found\n\n<h1>Nothing found</h1>'
            client_socket.send(response.encode('utf-8'))
            client_socket.close()
      http_socket.close()

if __name__ == "__main__":
      main()

It is a simple program that we have written several times in the previous topics. One thing to notice is that:

res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", content)
file_name = res.group(2)

We use a regular expression to get the first line and get the file name requested by the browser.

If the file name is not specified, return "Hello, Welcome!", otherwise return "Nothing found".
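To see what the regular expression captures, it can be tried on a few sample requests outside the server. The wrapper name requested_path is an assumption made for this sketch; the pattern itself is the one used in the code above.

```python
import re

# Same pattern as in the server: group 1 is the method, group 2 is the path.
REQUEST_LINE = re.compile(r"(POST|GET) (/.*) HTTP/1\.[10]")


def requested_path(request_text):
    """Return the requested path, or None when no request line is found."""
    match = REQUEST_LINE.search(request_text)
    return match.group(2) if match else None


if __name__ == "__main__":
    print(requested_path("GET /news.html HTTP/1.1\r\nHost: 192.168.123.64:8080\r\n\r\n"))
    print(requested_path("GET / HTTP/1.0\r\n\r\n"))
    print(requested_path(""))
```

The last call shows why the result of re.search should be checked before calling group(): when the text contains no request line, the search returns None.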

http://192.168.123.64:8080/news.html

http://192.168.123.64:8080




3. A HTTP server demo

We are going to improve the code above so that the browser can request a specific page. To test the server, we will use a directory that contains a lot of HTML files and pictures.

All those files are in a directory named "demo", and our server script sits next to that directory.

# a simple http server
import socket, re

def handle_client(http_socket):
      client_socket, client_address = http_socket.accept()
      
      print(f"{client_address}: connected")
      request_content = client_socket.recv(1024).decode('utf-8')
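      res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", request_content)
      if res is None:
            # no request line (e.g. the browser sent nothing): drop this client
            client_socket.close()
            return
      file_name = res.group(2)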

      response_header = b"HTTP/1.1 200 OK\n\n"
      # open the file_name
      try:
            if file_name == "/":
                  f = open(f"demo/index.html", 'rb')
            else:
                  f = open(f"demo{file_name}", 'rb')
      except:
            response_header = b"HTTP/1.1 404 NOT FOUND\n\n"
            content = b"File Not Found"
      else:
            content = f.read()
            f.close()

      client_socket.send(response_header)
      client_socket.send(content)
      client_socket.close()

def main():
      http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

      http_socket.bind(("", 8080))
      http_socket.listen(128)
      while True:
            handle_client(http_socket)
      http_socket.close()

if __name__ == "__main__":
      main()
     
      try:
            if file_name == "/":
                  f = open(f"demo/index.html", 'rb')
            else:
                  f = open(f"demo{file_name}", 'rb')
      except:
            response_header = b"HTTP/1.1 404 NOT FOUND\n\n"
            content = b"File Not Found"
      else:
            content = f.read()
            f.close()

If we do not specify a file name, index.html is opened. If we request a specific file, it is opened, read, and finally returned to the browser.

Whenever an exception occurs while opening the file, a 404 NOT FOUND header is returned along with the "File Not Found" content.
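The lookup above trusts whatever path the browser sends, so a request such as "/../secret.txt" could reach files outside the demo directory. One possible hardening, which is not part of the original server, is to resolve the path first and reject anything that leaves the root. The function name resolve_path and its root parameter are assumptions made for this sketch.

```python
import os


def resolve_path(file_name, root="demo"):
    """Map a request path such as "/news.html" to a file path under root.

    Returns None when the path would escape root (for example "/../secret.txt").
    """
    if file_name == "/":
        file_name = "/index.html"
    root_abs = os.path.abspath(root)
    # abspath also collapses any ".." components in the joined path.
    candidate = os.path.abspath(os.path.join(root_abs, file_name.lstrip("/")))
    # Only accept paths that are still inside the root directory.
    if os.path.commonpath([root_abs, candidate]) != root_abs:
        return None
    return candidate


if __name__ == "__main__":
    print(resolve_path("/"))
    print(resolve_path("/../secret.txt"))
```

A server could call this before open() and answer with the 404 response whenever it returns None.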

client_socket.send(response_header)
client_socket.send(content)

Then we send both the header and response content back to the browser.

Now run the server; we should be able to navigate the demo site:


4. A HTTP server with multiprocessing

We are going to use the multiprocessing module. If you are not familiar with multitasking, please refer to my posts on multitasking.

# a simple http server
import socket, re, multiprocessing

def handle_client(client_socket):
      
      request_content = client_socket.recv(1024).decode('utf-8')
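      res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", request_content)
      if res is None:
            # no request line (e.g. the browser sent nothing): drop this client
            client_socket.close()
            return
      file_name = res.group(2)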

      response_header = b"HTTP/1.1 200 OK\n\n"
      # open the file_name
      try:
            if file_name == "/":
                  f = open(f"demo/index.html", 'rb')
            else:
                  f = open(f"demo{file_name}", 'rb')
      except:
            response_header = b"HTTP/1.1 404 NOT FOUND\n\n"
            content = b"File Not Found"
      else:
            content = f.read()
            f.close()

      client_socket.send(response_header)
      client_socket.send(content)
      client_socket.close()

def main():
      http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

      http_socket.bind(("", 8080))
      http_socket.listen(128)
      while True:
            
            client_socket, client_address = http_socket.accept()
            print(f"{client_address}: connected")

            t = multiprocessing.Process(target=handle_client, args=(client_socket,))
            t.start()
            
            client_socket.close()
            
      http_socket.close()

if __name__ == "__main__":
      main()

It is that simple to write a multiprocessing HTTP web server. One thing to point out:

As the code above shows, client_socket has to be closed twice: once in the child process and once in the main process. When a new process is forked, the child receives its own copy of the resources held by the main process, including the socket, and the connection is only really closed once every copy has been closed.
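The "close it twice" rule can be observed without forking at all. In the sketch below, socket.dup() stands in for the copy that a forked child would hold (that substitution is an assumption of this example, as is the helper name peer_sees_eof), and a socketpair plays the roles of server and browser.

```python
import socket


def peer_sees_eof(sock):
    """Return True if the other side has fully closed, False if it is still open."""
    sock.setblocking(False)
    try:
        # recv() returns b"" only once every copy of the peer socket is closed.
        return sock.recv(1024) == b""
    except BlockingIOError:
        # Nothing to read and no end-of-stream: the other side is still open.
        return False


if __name__ == "__main__":
    server_side, browser_side = socket.socketpair()
    child_copy = server_side.dup()      # stands in for the child's copy after fork

    child_copy.close()                  # the "child" finishes and closes its copy
    print(peer_sees_eof(browser_side))  # the main process still holds its copy

    server_side.close()                 # the "main process" closes its copy too
    print(peer_sees_eof(browser_side))
    browser_side.close()
```

As long as one copy stays open, the browser keeps waiting for more data, which is why the main process must also call client_socket.close().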

5. A HTTP server with multithreading

# a simple http server
import socket, re, threading

def handle_client(client_socket):
      
      request_content = client_socket.recv(1024).decode('utf-8')
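      res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", request_content)
      if res is None:
            # no request line (e.g. the browser sent nothing): drop this client
            client_socket.close()
            return
      file_name = res.group(2)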

      response_header = b"HTTP/1.1 200 OK\n\n"
      # open the file_name
      try:
            if file_name == "/":
                  f = open(f"demo/index.html", 'rb')
            else:
                  f = open(f"demo{file_name}", 'rb')
      except:
            response_header = b"HTTP/1.1 404 NOT FOUND\n\n"
            content = b"File Not Found"
      else:
            content = f.read()
            f.close()

      client_socket.send(response_header)
      client_socket.send(content)
      client_socket.close()

def main():
      http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

      http_socket.bind(("", 8080))
      http_socket.listen(128)
      while True:
            
            client_socket, client_address = http_socket.accept()
            print(f"{client_address}: connected")

            t = threading.Thread(target=handle_client, args=(client_socket,))
            t.start()
            
            
      http_socket.close()

if __name__ == "__main__":
      main()

Just replace the multiprocessing module with the threading module and delete the client_socket.close() call in the main thread. Threads share the resources of the main thread, so there is only one socket object, and closing it in the main thread would also close it for the handler.
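That sharing is easy to check: when a thread closes the socket it was given, the main thread sees the same object as closed. This is a small sketch in which a socketpair replaces a real browser connection and handle_client is reduced to a stand-in that only closes the socket.

```python
import socket
import threading


def handle_client(client_socket):
    # Stand-in for the real handler: just close the connection.
    client_socket.close()


if __name__ == "__main__":
    client_socket, other_end = socket.socketpair()
    worker = threading.Thread(target=handle_client, args=(client_socket,))
    worker.start()
    worker.join()
    # fileno() returns -1 for a closed socket: the thread closed the very
    # same object that the main thread is holding.
    print(client_socket.fileno())
    other_end.close()
```

With multiprocessing the main process would still hold an open copy at this point, which is the difference described above.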

6. A HTTP server with gevent

# a simple http server
import socket, re, gevent
from gevent import monkey

monkey.patch_all()

def handle_client(client_socket):
      
      request_content = client_socket.recv(1024).decode('utf-8')
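      res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", request_content)
      if res is None:
            # no request line (e.g. the browser sent nothing): drop this client
            client_socket.close()
            return
      file_name = res.group(2)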

      response_header = b"HTTP/1.1 200 OK\n\n"
      # open the file_name
      try:
            if file_name == "/":
                  f = open(f"demo/index.html", 'rb')
            else:
                  f = open(f"demo{file_name}", 'rb')
      except:
            response_header = b"HTTP/1.1 404 NOT FOUND\n\n"
            content = b"File Not Found"
      else:
            content = f.read()
            f.close()

      client_socket.send(response_header)
      client_socket.send(content)
      client_socket.close()

def main():
      http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

      http_socket.bind(("", 8080))
      http_socket.listen(128)
      while True:
            
            client_socket, client_address = http_socket.accept()
            print(f"{client_address}: connected")

            gevent.spawn(handle_client, client_socket)
            
      http_socket.close()

if __name__ == "__main__":
      main()

7. A multitasking TCP server without threading, multiprocessing, and gevent

# http server single thread version
import socket, time

# create a tcp socket
tcp_socket_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# set tcp socket in non-blocking mode
tcp_socket_server.setblocking(False)

# bind a local address
tcp_socket_server.bind(("", 8080))

# set the server socket into listen mode
tcp_socket_server.listen(128)

# a list to store all client sockets
client_socket_list = list()

while True:
      # time.sleep(0.5)
      # wait for incoming clients
      try:
            client_socket, client_address = tcp_socket_server.accept()
      except:
            print("No incoming connections...")
      else:
            # set client socket in non-blocking mode
            client_socket.setblocking(False)
            print(f"{client_address}: connected")
            client_socket_list.append(client_socket)

      # iterate over a copy, because closed sockets are removed inside the loop
      for client_socket in list(client_socket_list):

            try:
                  # recv data from the client socket
                  data_recv = client_socket.recv(1024)
            except:
                  print('No data coming from the socket...')
            else:
                  if data_recv:
                        print(data_recv)
                  else:
                        print('client closed')
                        client_socket.close()
                        client_socket_list.remove(client_socket)

In this program, we put the sockets into non-blocking mode and use try...except...else blocks to handle the exception (BlockingIOError) that a non-blocking socket raises when there is nothing to accept or to read yet.
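The behaviour of a non-blocking accept() can be tried in isolation. The sketch below names the exception explicitly instead of using a bare except; the helper name try_accept and the use of port 0 (let the operating system pick a free port) are choices made for this example only.

```python
import socket


def try_accept(server_socket):
    """Return (client_socket, address), or None when nobody is waiting."""
    try:
        return server_socket.accept()
    except BlockingIOError:
        # Raised by a non-blocking socket when the call would have to wait.
        return None


if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setblocking(False)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen(128)
    print(try_accept(server))       # no client has connected yet
    server.close()
```

Catching only BlockingIOError keeps real errors, such as a reset connection, visible instead of silently treating them as "no data yet".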

8. HTTP Persistent connection Vs multiple connections

An HTTP persistent connection uses a single TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for each HTTP request/response.

As the left graph in the picture above shows, if we want to send and receive three HTTP requests/responses, we have to open a separate connection for each of them, which is a heavy burden for modern services.

For example, if the page you request contains over 100 pictures, the browser requests each of them after retrieving the page itself. The server then has to open over 100 sockets to handle all those requests, which is a huge waste.

With a persistent connection, a single TCP connection is enough to carry multiple HTTP requests/responses.

8.1 HTTP/1.0 vs HTTP/1.1

In the previous topics, we used HTTP/1.1, which supports persistent connections by default.

HTTP/1.0, the older version of the protocol, does not support persistent connections by default. If you want the connection to be kept alive, you have to send the header:

Connection: keep-alive

Then the connection will not be dropped until the client or the server decides to end the conversation.
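The two defaults can be written down as a small decision function. This is a simplified sketch: the name wants_keep_alive is made up, the headers are assumed to be already parsed into a dict, and the Connection header is assumed to carry a single value.

```python
def wants_keep_alive(http_version, headers):
    """Decide whether the connection should stay open after the response.

    headers is a dict mapping header names to values.
    """
    connection = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "connection":
            connection = value.strip().lower()
    if http_version == "HTTP/1.1":
        # Persistent by default, unless the client explicitly asks to close.
        return connection != "close"
    # HTTP/1.0: persistent only when the client asks for it.
    return connection == "keep-alive"


if __name__ == "__main__":
    print(wants_keep_alive("HTTP/1.1", {}))
    print(wants_keep_alive("HTTP/1.0", {}))
    print(wants_keep_alive("HTTP/1.0", {"Connection": "keep-alive"}))
```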

8.2 Modern web browser

All modern web browsers, such as Chrome, Firefox, and Edge, support persistent connections.

8.3 Improve our HTTP server to support persistent connection

In the previous topics, we wrote several HTTP server demos, but none of them supports persistent connections: after receiving a request from a client and sending back the response, we immediately close the socket. Ironically, we still answer with HTTP/1.1.

So let's look at the following code. It is a single-process, single-threaded multitasking HTTP server.

# http server single thread version
import socket, time, re

def service_client(client_socket, client_socket_list):

      while True:
            try:
                  request_content = client_socket.recv(1024).decode('utf-8')
            except:
                  print("No data coming...")
                  break
            else:
                  if request_content:
                        res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", request_content)
                        if res is None:
                              # not a request line we understand: stop serving this client for now
                              break
                        file_name = res.group(2)

                        response_header = "HTTP/1.1 200 OK\n"
                        # open the file_name
                        try:
                              if file_name == "/":
                                    f = open(f"demo/index.html", 'rb')
                              else:
                                    f = open(f"demo{file_name}", 'rb')
                        except:
                              response_header = "HTTP/1.1 404 NOT FOUND\n"
                              content = b"File Not Found"
                        else:
                              content = f.read()
                              f.close()

                        response_header += f"Content-Length: {len(content)}\n\n"
                        response_header = response_header.encode('utf-8')
                        client_socket.send(response_header)
                        client_socket.send(content)
                  else:
                        # empty data means the browser closed the connection
                        client_socket.close()
                        client_socket_list.remove(client_socket)
                        break

 

def main():
      
      # create a tcp socket
      tcp_socket_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

      # set tcp socket in non-blocking mode
      tcp_socket_server.setblocking(False)

      # bind a local address
      tcp_socket_server.bind(("", 8080))

      # set the server socket into listen mode
      tcp_socket_server.listen(128)

      # a list to store all client sockets
      client_socket_list = list()

      while True:
            #time.sleep(0.5)
            # wait for incoming clients
            try:
                  client_socket, client_address = tcp_socket_server.accept()
            except:
                  print("No incoming connections...")
            else:
                  # set client socket in non-blocking mode
                  client_socket.setblocking(False)
                  print(f"{client_address}: connected")
                  client_socket_list.append(client_socket)

            # iterate over a copy, because service_client may remove sockets from the list
            for client_socket in list(client_socket_list):
                  service_client(client_socket, client_socket_list)

      tcp_socket_server.close()           


if __name__ == "__main__":
      main()

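We do not close the client socket after serving the client once. The socket is closed only when the browser closes the connection; until then it stays in client_socket_list, waiting to handle the next HTTP request/response. Because the connection stays open, the server now sends a Content-Length header so that the browser knows where each response ends.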
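To illustrate why Content-Length matters on a persistent connection, the sketch below takes the bytes of two responses sent back to back and separates them again, roughly the way a client has to. The function name split_responses is hypothetical, and the code assumes every response carries a Content-Length header and uses an empty line made of bare \n characters, as the server above does.

```python
def split_responses(stream):
    """Split back-to-back HTTP responses into a list of (head, body) pairs."""
    responses = []
    while stream:
        # Everything before the first empty line is the status line plus headers.
        head, _, rest = stream.partition(b"\n\n")
        length = 0
        for line in head.split(b"\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        # Content-Length says exactly how many bytes belong to this body.
        responses.append((head, rest[:length]))
        stream = rest[length:]
    return responses


if __name__ == "__main__":
    data = (b"HTTP/1.1 200 OK\nContent-Length: 5\n\nhello"
            b"HTTP/1.1 404 NOT FOUND\nContent-Length: 14\n\nFile Not Found")
    for head, body in split_responses(data):
        print(head.split(b"\n")[0], body)
```

Without the header, the only way for the browser to find the end of a body would be to wait for the server to close the connection, which is exactly what a persistent connection avoids.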

















