Introduction to network programming with Python #0003 HTTP (2020 Tutorial)
Table of Contents
1. Introduction to HTTP
2. Write a TCP server that serves as a HTTP server
3. A HTTP server demo
4. A HTTP server with multiprocessing
5. A HTTP server with multithreading
6. A HTTP server with gevent
7. A multitasking TCP server without threading, multiprocessing, or gevent
8. HTTP Persistent connection Vs multiple connections
9. Improve our server with epoll (Linux only)
1. Introduction to HTTP
Let's just explain HTTP in a simple way.
HTTP (HyperText Transfer Protocol) is what lets you enter a website address in a web browser and get back the web page you want.
HTTP is built on top of TCP/IP. It defines how a web browser communicates with an HTTP server.
Let's open Chrome and enter google.com. Then press "F12" to open the developer tools and select the Network tab.
In the Headers section you can see both the request headers and the response headers. When we request a page from the google.com server, our browser sends request headers containing the information that HTTP requires. The Google server then answers with response headers along with the content.
Now let's open Net Assistant and start a TCP server on port 8080. We can then connect to this server from our web browser.
When we enter 127.0.0.1:8080 into our browser, our Net Assistant will display the HTTP header information sent by the browser.
The most important part is the first line:
GET / HTTP/1.1
As you can see, "GET" means the browser wants to get content from the web server; if you submit a form with the POST method, the method is "POST". The "/" that follows is the path of the file the browser wants. Because we entered only "127.0.0.1:8080", without a file name such as "127.0.0.1:8080/index.html", the path is just "/". If you enter "127.0.0.1:8080/index.html", the first line becomes:
GET /index.html HTTP/1.1
The last part is the HTTP version, HTTP/1.1.
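The sketch below splits a raw request, like the one Net Assistant displays, into its request line and its headers. The parse_request helper and the sample request are hypothetical illustrations; they are not part of the tutorial's server. In HTTP, each header line ends with \r\n, and an empty line marks the end of the header section.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, path, version, headers)."""
    # the header section ends at the first empty line
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    # "GET /index.html HTTP/1.1" -> three space-separated parts
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # header names are case-insensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


sample = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: 127.0.0.1:8080\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
method, path, version, headers = parse_request(sample)
print(method, path, version)
print(headers["host"])
```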
As for the response, if the server finds the file the browser requested, it sends that file back in the response.
To complete the demo, we can send the following response back to the browser:
HTTP/1.1 200 OK

<h1>Hello</h1>
There is an empty line between the first line and the content. The first line is the most important header in an HTTP response: "200 OK" tells the browser that the file exists and that its content follows. The actual content comes after the headers, and the browser uses the empty line to tell where the headers end and the content begins.
After you enter HTTP/1.1 200 OK\n\n<h1>Hello</h1> in Net Assistant and click send, the browser will display the Hello message.
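Typing \n\n works because browsers are lenient. The HTTP specification itself uses \r\n (CRLF) line endings.

The sketch below assembles a response in that stricter form. A few things to note:
- build_response is a hypothetical helper.
- Content-Type and Content-Length are standard HTTP headers, but the tutorial's servers do not send them.
- Content-Length first appears in Section 8.

```python
def build_response(body, status="200 OK"):
    """Assemble a minimal HTTP/1.1 response: status line, headers, blank line, body."""
    header = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # the empty line that separates the headers from the content
    )
    return header.encode("ascii") + body


print(build_response(b"<h1>Hello</h1>"))
```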
2. Write a TCP server that serves as a HTTP server
Now, let's write a TCP server that could accept requests from a web browser.
# a simple http server
import socket, re


def main():
    http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    http_socket.bind(("", 8080))
    http_socket.listen(128)
    while True:
        client_socket, client_address = http_socket.accept()
        print(f"{client_address}: connected")
        content = client_socket.recv(1024).decode('utf-8')
        print(content)
        res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", content)
        if res is None:
            # empty or non-HTTP data (e.g. the browser closed a spare connection)
            client_socket.close()
            continue
        file_name = res.group(2)
        if file_name == "/":
            response = 'HTTP/1.1 200 OK\n\n<h1>Hello, Welcome!</h1>'
        else:
            response = 'HTTP/1.1 200 OK\n\n<h1>Nothing found</h1>'
        # sendall() keeps sending until every byte has been written
        client_socket.sendall(response.encode('utf-8'))
        client_socket.close()
    http_socket.close()


if __name__ == "__main__":
    main()
This is a simple TCP server, like the ones we wrote in previous posts. One thing to notice is:
res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", content)
file_name = res.group(2)
We use a regular expression to match the request line and extract the path requested by the browser (the second group).
If the path is just "/", the server returns "Hello, Welcome!"; otherwise it returns "Nothing found".
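A quick way to see what the pattern captures is to run it against a few sample requests. The helper name extract_path is hypothetical.

Three things stand out:
- A query string such as ?q=python ends up inside the captured path.
- re.search() returns None for an empty request, so the result must be checked before calling group(). Browsers can open a connection and close it without sending anything, which produces exactly this case.
- Methods other than GET and POST, such as HEAD, are not matched at all.

```python
import re

# the same pattern used by the server above
REQUEST_LINE = re.compile(r"(POST|GET) (/.*) HTTP/1\.[10]")


def extract_path(raw_request):
    res = REQUEST_LINE.search(raw_request)
    # search() returns None when nothing matches, so check before calling group()
    return res.group(2) if res else None


for raw in [
    "GET / HTTP/1.1\r\nHost: 127.0.0.1:8080\r\n\r\n",
    "GET /index.html HTTP/1.1\r\n\r\n",
    "GET /search?q=python HTTP/1.0\r\n\r\n",
    "",  # a connection that closed without sending anything
]:
    print(repr(extract_path(raw)))
```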
3. A HTTP server demo
We are going to improve the code above so that the browser can request a specific page. To test the server, we use a demo website: a set of HTML files and pictures.
All those files are in a directory named "demo", and the server script sits next to that directory (run it from there so that the relative path "demo/..." resolves).
# a simple http server
import socket, re


def handle_client(http_socket):
    client_socket, client_address = http_socket.accept()
    print(f"{client_address}: connected")
    request_content = client_socket.recv(1024).decode('utf-8')
    res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", request_content)
    if res is None:
        # empty or malformed request: nothing to serve
        client_socket.close()
        return
    file_name = res.group(2)
    response_header = b"HTTP/1.1 200 OK\n\n"
    # open the file_name
    try:
        if file_name == "/":
            f = open("demo/index.html", 'rb')
        else:
            f = open(f"demo{file_name}", 'rb')
    except OSError:
        response_header = b"HTTP/1.1 404 NOT FOUND\n\n"
        content = b"File Not Found"
    else:
        content = f.read()
        f.close()
    client_socket.sendall(response_header)
    client_socket.sendall(content)
    client_socket.close()


def main():
    http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    http_socket.bind(("", 8080))
    http_socket.listen(128)
    while True:
        handle_client(http_socket)
    http_socket.close()


if __name__ == "__main__":
    main()
try:
    if file_name == "/":
        f = open("demo/index.html", 'rb')
    else:
        f = open(f"demo{file_name}", 'rb')
except OSError:
    response_header = b"HTTP/1.1 404 NOT FOUND\n\n"
    content = b"File Not Found"
else:
    content = f.read()
    f.close()
If the path is just "/", demo/index.html is opened. If a specific file is requested, that file is opened, read, and finally returned to the browser.
Whenever opening the file raises an exception (for example, because the file does not exist), a 404 NOT FOUND header is returned along with the content "File Not Found".
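Because the file name comes straight from the request, f"demo{file_name}" has two weaknesses:
- A query string becomes part of the file name.
- A raw client such as Net Assistant could ask for /../something and read files outside demo. Browsers normalise ".." segments before sending, so this mainly comes from raw clients.

Here is a minimal sketch of a safer mapping, assuming the files live under demo. resolve_path and DOC_ROOT are hypothetical names, not part of the original server.

```python
import os
from urllib.parse import unquote, urlsplit

DOC_ROOT = "demo"  # assumption: the same folder the server serves files from


def resolve_path(url_path, root=DOC_ROOT):
    """Map a request path to a file inside root, or return None if it escapes root."""
    # drop any "?query" or "#fragment" and decode %xx escapes
    path = unquote(urlsplit(url_path).path)
    if path.endswith("/"):
        path += "index.html"
    root_abs = os.path.abspath(root)
    # lstrip("/") stops os.path.join from discarding root_abs
    full = os.path.abspath(os.path.join(root_abs, path.lstrip("/")))
    # reject paths such as "/../secret.txt" that climb out of the root
    if os.path.commonpath([root_abs, full]) != root_abs:
        return None
    return full


print(resolve_path("/"))
print(resolve_path("/images/logo.png?v=2"))
print(resolve_path("/../server.py"))
```

The server would then open resolve_path(file_name) and answer 404 when it returns None or the file is missing.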
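client_socket.sendall(response_header)
client_socket.sendall(content)
Then we send both the header and the content back to the browser. sendall() is used rather than send() because send() may write only part of a large file, while sendall() keeps sending until every byte is out.
Now run the server, and we should be able to browse the demo site at http://127.0.0.1:8080.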
4. A HTTP server with multiprocessing
We are going to use the multiprocessing module. If you are not familiar with multitasking, please refer to my earlier posts on multitasking.
# a simple http server
import socket, re, multiprocessing


def handle_client(client_socket):
    request_content = client_socket.recv(1024).decode('utf-8')
    res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", request_content)
    if res is None:
        # empty or malformed request: nothing to serve
        client_socket.close()
        return
    file_name = res.group(2)
    response_header = b"HTTP/1.1 200 OK\n\n"
    # open the file_name
    try:
        if file_name == "/":
            f = open("demo/index.html", 'rb')
        else:
            f = open(f"demo{file_name}", 'rb')
    except OSError:
        response_header = b"HTTP/1.1 404 NOT FOUND\n\n"
        content = b"File Not Found"
    else:
        content = f.read()
        f.close()
    client_socket.sendall(response_header)
    client_socket.sendall(content)
    client_socket.close()


def main():
    http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    http_socket.bind(("", 8080))
    http_socket.listen(128)
    while True:
        client_socket, client_address = http_socket.accept()
        print(f"{client_address}: connected")
        p = multiprocessing.Process(target=handle_client, args=(client_socket,))
        p.start()
        # the child process has its own copy of client_socket,
        # so the parent must close its copy as well
        client_socket.close()
    http_socket.close()


if __name__ == "__main__":
    main()
Writing a multiprocessing HTTP server is that simple. One thing to point out:
As you can see from the code above, client_socket must be closed in both the main process and the child process. When a new process is forked, the child gets its own copy of the parent's file descriptors. The TCP connection is only closed once every copy has been closed, so if the parent kept its copy open, the browser would keep waiting for more data.
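The effect of the duplicated descriptor can be reproduced without forking. In the sketch below:
- socket.socketpair() stands in for a browser connection.
- dup() plays the role of the copy a forked child inherits.
- The variable names are illustrative.

The browser side only sees end-of-file (recv() returning b"") once every copy has been closed. The expected values in the comments assume Linux or macOS, where socketpair() uses Unix domain sockets.

```python
import socket


def peer_sees_eof(sock):
    """Return True if the other end has been fully closed (recv() returns b"")."""
    sock.setblocking(False)
    try:
        return sock.recv(1) == b""
    except BlockingIOError:
        # no data and no end-of-file yet: the connection is still open
        return False


# browser_side plays the browser; server_side is the accepted client_socket
server_side, browser_side = socket.socketpair()
# dup() creates a second descriptor for the same connection,
# just like the copy a child process inherits after fork()
child_copy = server_side.dup()

server_side.close()                  # the "main process" closes its copy
print(peer_sees_eof(browser_side))   # expected: False, child_copy still holds it open

child_copy.close()                   # the "child process" closes its copy
print(peer_sees_eof(browser_side))   # expected: True, the last descriptor is gone

browser_side.close()
```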
5. A HTTP server with multithreading
# a simple http server
import socket, re, threading


def handle_client(client_socket):
    request_content = client_socket.recv(1024).decode('utf-8')
    res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", request_content)
    if res is None:
        # empty or malformed request: nothing to serve
        client_socket.close()
        return
    file_name = res.group(2)
    response_header = b"HTTP/1.1 200 OK\n\n"
    # open the file_name
    try:
        if file_name == "/":
            f = open("demo/index.html", 'rb')
        else:
            f = open(f"demo{file_name}", 'rb')
    except OSError:
        response_header = b"HTTP/1.1 404 NOT FOUND\n\n"
        content = b"File Not Found"
    else:
        content = f.read()
        f.close()
    client_socket.sendall(response_header)
    client_socket.sendall(content)
    client_socket.close()


def main():
    http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    http_socket.bind(("", 8080))
    http_socket.listen(128)
    while True:
        client_socket, client_address = http_socket.accept()
        print(f"{client_address}: connected")
        t = threading.Thread(target=handle_client, args=(client_socket,))
        t.start()
        # no client_socket.close() here: the thread shares this socket
    http_socket.close()


if __name__ == "__main__":
    main()
Just replace the multiprocessing module with the threading module and remove the client_socket.close() call from the main thread. Threads share the same sockets as the main thread, so closing client_socket there would close it for the worker thread too.
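Starting one thread per connection means a burst of connections creates a burst of threads. One alternative, which is not what the code above does, is a fixed-size pool from the standard library's concurrent.futures.

The sketch below makes a few simplifying choices:
- It uses a stand-in handle_client that always answers "Hello", instead of the demo-folder handler.
- It binds to port 0 so that the OS picks a free port.
- It fetches a page with http.client to check that the server works.
- max_workers=8 is an arbitrary choice.

```python
import http.client
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_client(client_socket):
    # stand-in handler: read the request, always answer with a small page
    with client_socket:
        client_socket.recv(1024)
        body = b"<h1>Hello</h1>"
        header = (f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n"
                  "Connection: close\r\n\r\n")
        client_socket.sendall(header.encode("ascii") + body)


def serve(http_socket, max_workers=8):
    # a fixed pool of worker threads instead of one new thread per connection
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            client_socket, _ = http_socket.accept()
            pool.submit(handle_client, client_socket)


# port 0 asks the OS for any free port, so the demo does not clash with 8080
http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
http_socket.bind(("127.0.0.1", 0))
http_socket.listen(128)
port = http_socket.getsockname()[1]
threading.Thread(target=serve, args=(http_socket,), daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
print(response.status, response.read())
conn.close()
```

With a pool, connections beyond max_workers wait in the executor's queue instead of spawning new threads. The trade-off is that a few slow clients can hold up everyone else.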
6. A HTTP server with gevent
# a simple http server
from gevent import monkey

# patch blocking standard-library calls (socket, time, ...) so that they
# yield to other greenlets; patch as early as possible, before other imports
monkey.patch_all()

import socket, re, gevent


def handle_client(client_socket):
    request_content = client_socket.recv(1024).decode('utf-8')
    res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", request_content)
    if res is None:
        # empty or malformed request: nothing to serve
        client_socket.close()
        return
    file_name = res.group(2)
    response_header = b"HTTP/1.1 200 OK\n\n"
    # open the file_name
    try:
        if file_name == "/":
            f = open("demo/index.html", 'rb')
        else:
            f = open(f"demo{file_name}", 'rb')
    except OSError:
        response_header = b"HTTP/1.1 404 NOT FOUND\n\n"
        content = b"File Not Found"
    else:
        content = f.read()
        f.close()
    client_socket.sendall(response_header)
    client_socket.sendall(content)
    client_socket.close()


def main():
    http_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    http_socket.bind(("", 8080))
    http_socket.listen(128)
    while True:
        client_socket, client_address = http_socket.accept()
        print(f"{client_address}: connected")
        # run handle_client in a new greenlet
        gevent.spawn(handle_client, client_socket)
    http_socket.close()


if __name__ == "__main__":
    main()
7. A multitasking TCP server without threading, multiprocessing, or gevent
# http server single thread version
import socket, time

# create a tcp socket
tcp_socket_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# set tcp socket in non-blocking mode
tcp_socket_server.setblocking(False)
# bind a local address
tcp_socket_server.bind(("", 8080))
# set the server socket into listen mode
tcp_socket_server.listen(128)
# a list to store all client sockets
client_socket_list = list()

while True:
    # time.sleep(0.5)
    # wait for incoming clients
    try:
        client_socket, client_address = tcp_socket_server.accept()
    except BlockingIOError:
        print("No incoming connections...")
    else:
        # set client socket in non-blocking mode
        client_socket.setblocking(False)
        print(f"{client_address}: connected")
        client_socket_list.append(client_socket)

    # iterate over a copy so that removing a closed socket is safe
    for client_socket in list(client_socket_list):
        try:
            # recv data from the client socket
            data_recv = client_socket.recv(1024)
        except BlockingIOError:
            print('No data coming from the socket...')
        else:
            if data_recv:
                print(data_recv)
            else:
                print('client closed')
                client_socket.close()
                client_socket_list.remove(client_socket)
In this program, we put the sockets into non-blocking mode. We then use try...except...else blocks to handle the BlockingIOError that a non-blocking accept() or recv() raises when it has nothing to return yet.
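The sketch below shows the three outcomes the loop distinguishes. poll_client is a hypothetical name, and socket.socketpair() stands in for a browser connection. The expected outputs in the comments assume Linux or macOS, where socketpair() uses Unix domain sockets.

Keep in mind that the loop above polls in a tight cycle. It keeps a CPU core busy even when nothing is happening, which is the motivation for epoll later on.

```python
import socket


def poll_client(client_socket):
    """Classify one non-blocking recv() the way the loop above does."""
    try:
        data = client_socket.recv(1024)
    except BlockingIOError:
        return "no data yet"        # nothing has arrived; try again later
    if data:
        return data                 # some bytes arrived
    return "closed"                 # b"" means the peer closed its end


server_end, client_end = socket.socketpair()
server_end.setblocking(False)

print(poll_client(server_end))      # expected: no data yet
client_end.sendall(b"GET / HTTP/1.1\r\n\r\n")
print(poll_client(server_end))      # expected: b'GET / HTTP/1.1\r\n\r\n'
client_end.close()
print(poll_client(server_end))      # expected: closed
server_end.close()
```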
8. HTTP Persistent connection Vs multiple connections
An HTTP persistent connection uses a single TCP connection to send and receive multiple HTTP requests and responses, as opposed to opening a new connection for every request/response pair.
As you can see from the left graph in the picture above, without persistent connections we have to open three separate connections to send and receive three HTTP requests/responses. That is a heavy burden for modern services. For example, suppose the page you request references over 100 pictures, which the browser fetches after the page content arrives. The server then has to accept over 100 connections to handle all those requests, which is wasteful.
If we use persistent connection, we only have to open a single TCP connection to achieve multiple HTTP requests/responses.
8.1 HTTP/1.0 vs HTTP/1.1
In the previous sections, we used HTTP/1.1, which uses persistent connections by default.
HTTP/1.0, the older version of the protocol, does not use persistent connections by default. To keep the connection alive, the client has to send the header:
Connection: keep-alive
If the server agrees, it sends the same header back. The connection then stays open until either the client or the server decides to close it.
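The rule can be written as a small function. The notes below apply to this sketch:
- It is a simplified version of the persistence rules in the HTTP/1.1 specification (RFC 7230, section 6.3).
- keeps_alive is a hypothetical name.
- headers is assumed to be a dict with lower-case names.
- None of the tutorial's servers implement this check.

```python
def keeps_alive(version, headers):
    """Decide whether the connection stays open after this request.

    headers: dict with lower-case names, e.g. {"connection": "keep-alive"}.
    """
    # the Connection header may carry several comma-separated options
    tokens = {t.strip().lower() for t in headers.get("connection", "").split(",")}
    if version == "HTTP/1.1":
        # persistent by default, unless the client asks to close
        return "close" not in tokens
    # HTTP/1.0: closed by default, unless the client asks for keep-alive
    return "keep-alive" in tokens


print(keeps_alive("HTTP/1.1", {}))
print(keeps_alive("HTTP/1.0", {"connection": "Keep-Alive"}))
```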
8.2 Modern web browser
All modern web browsers, such as Chrome, Firefox, and Edge, support persistent connections.
8.3 Improve our HTTP server to support persistent connection
We have written several HTTP server demos so far, but none of them supports persistent connections. After receiving a request and sending back the response, we immediately close the socket, even though our responses claim to be HTTP/1.1.
The following code is a single-process, single-threaded, multitasking HTTP server that keeps each connection open.
# http server single thread version
import socket, time, re


def service_client(client_socket, client_socket_list):
    while True:
        try:
            request_content = client_socket.recv(1024).decode('utf-8')
        except BlockingIOError:
            # no more data for now; come back on the next pass
            print("No data coming...")
            break
        except ConnectionError:
            # a reset connection is handled like a normal close
            request_content = ""
        if not request_content:
            # an empty recv means the browser closed the connection
            client_socket.close()
            client_socket_list.remove(client_socket)
            break
        res = re.search(r"(POST|GET) (/.*) HTTP/1\.[10]", request_content)
        if res is None:
            continue
        file_name = res.group(2)
        response_header = "HTTP/1.1 200 OK\n"
        # open the file_name
        try:
            if file_name == "/":
                f = open("demo/index.html", 'rb')
            else:
                f = open(f"demo{file_name}", 'rb')
        except OSError:
            response_header = "HTTP/1.1 404 NOT FOUND\n"
            content = b"File Not Found"
        else:
            content = f.read()
            f.close()
        # Content-Length tells the browser where this response ends
        response_header += f"Content-Length: {len(content)}\n\n"
        response_header = response_header.encode('utf-8')
        # block while sending so that large files are sent completely,
        # then switch back to non-blocking mode
        client_socket.setblocking(True)
        client_socket.sendall(response_header)
        client_socket.sendall(content)
        client_socket.setblocking(False)


def main():
    # create a tcp socket
    tcp_socket_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # set tcp socket in non-blocking mode
    tcp_socket_server.setblocking(False)
    # bind a local address
    tcp_socket_server.bind(("", 8080))
    # set the server socket into listen mode
    tcp_socket_server.listen(128)
    # a list to store all client sockets
    client_socket_list = list()
    while True:
        # time.sleep(0.5)
        # wait for incoming clients
        try:
            client_socket, client_address = tcp_socket_server.accept()
        except BlockingIOError:
            print("No incoming connections...")
        else:
            # set client socket in non-blocking mode
            client_socket.setblocking(False)
            print(f"{client_address}: connected")
            client_socket_list.append(client_socket)
        # iterate over a copy: service_client may remove closed sockets
        for client_socket in list(client_socket_list):
            service_client(client_socket, client_socket_list)
    tcp_socket_server.close()


if __name__ == "__main__":
    main()
We no longer close the client socket after serving a single request. The socket stays in client_socket_list, ready to handle further requests on the same connection. Once the browser closes the connection, recv() returns an empty result, and we close the socket and remove it from the list. Notice also the new Content-Length header: because the connection stays open, the browser relies on it to know where each response body ends.
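On a persistent connection, responses arrive back to back on the same byte stream. This is why the browser needs Content-Length to find where one body ends and the next response begins.

Below is a simplified sketch of that client-side logic. It makes a few simplifying assumptions:
- split_responses is a hypothetical name.
- It assumes every response carries Content-Length. Real HTTP also allows chunked transfer encoding, which this ignores.
- It accepts both \r\n and the bare \n line endings used by the server above.

```python
import re

HEADER_END = re.compile(rb"\r?\n\r?\n")
CONTENT_LENGTH = re.compile(rb"content-length:\s*(\d+)", re.IGNORECASE)


def split_responses(stream):
    """Split back-to-back responses into (head, body) pairs plus leftover bytes."""
    responses = []
    while stream:
        end = HEADER_END.search(stream)
        if end is None:
            break  # headers not complete yet: wait for more data
        head = stream[:end.start()]
        length = CONTENT_LENGTH.search(head)
        if length is None:
            break  # no Content-Length: the end of the body cannot be determined
        body_start = end.end()
        body_end = body_start + int(length.group(1))
        if len(stream) < body_end:
            break  # body not fully received yet
        responses.append((head, stream[body_start:body_end]))
        stream = stream[body_end:]
    return responses, stream


stream = (b"HTTP/1.1 200 OK\nContent-Length: 5\n\nHello"
          b"HTTP/1.1 404 NOT FOUND\nContent-Length: 14\n\nFile Not Found")
responses, leftover = split_responses(stream)
for head, body in responses:
    print(head.split(b"\n")[0], body)
```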