How I wrote a Python client for HTTP/3 proxies
My talk from EuroPython 2022.
MASQUE (Multiplexed Application Substrate over QUIC Encryption) is a draft of a new protocol that allows running proxy or VPN services indistinguishable from HTTPS servers. Akamai built a managed proxy service based on the MASQUE protocol to provide egress proxy for iCloud Private Relay.
While working on the proxy at Akamai, I wrote a Python client for testing the proxy service. The MASQUE protocol can tunnel traffic through HTTP/3 or HTTP/2, but common Python libraries only support HTTP/1.1. The tunneled traffic can use any protocol on top of TCP or UDP, including all HTTP versions, so MASQUE can be proxied through MASQUE for onion routing.
In this talk, I will show that the MASQUE proxy design is simple and yet client implementations are complex. To put everything into context, I will recap how HTTP proxies operate and how HTTP versions differ. I will highlight lessons learned from designing a low-level HTTP client using Python asyncio.
Here is a transcript of my talk.
A PDF version of my slides is available too, but the slide deck is meant for illustration, so it may not be useful on its own.
I would like to share with you how I wrote a Python client for HTTP/3 proxies.
I will give you a short intro to the latest HTTP version, I will tell you what I learnt about proxies, and we will look at how all this fits into the Python ecosystem.
My name is Miloslav and I am from Prague.
I work for Akamai, where lots of my work over the last 5 years was related to HTTP/3.
HTTP/3 is important to Akamai because we run one of the largest CDNs, with more than 300,000 servers.
But today, I won’t talk about the CDN. I will talk about proxies.
Last year, in 2021, Akamai built a new network of proxies.
And as all software needs testing, my job was to write a client that would connect to these proxies.
If you are wondering how a network of proxies can be useful, look for iCloud Private Relay on your iPhone or Mac.
Private Relay is described in a paper from Apple, where you can find that the service is using the MASQUE protocol.
MASQUE is an IETF draft, so I won’t be talking about some proprietary service.
We will look at modern open standards for proxying.
MASQUE means: Multiplexed Application Substrate over QUIC Encryption.
I guess that somebody just wanted a nice acronym.
MASQUE proxies are normal HTTP proxies, just using HTTP/3 instead of common HTTP/1.
So, if MASQUE proxies are just HTTP/3 proxies, the question is: What is HTTP/3? What makes it special?
HTTP/3 is HTTP over QUIC.
OK. So, now the question is: What is QUIC?
That’s a little bit difficult to explain, and I don’t have much time, so I will give you a brief intro only.
TCP is the protocol that has powered the Internet since the 1980s. TCP is implemented in operating systems, so apps can open a TCP socket using a system call, write data to it, and the data will get to the remote side. Complete and in order.
But TCP is not the only Internet protocol. For practical purposes, one other is interesting: UDP. Compared to TCP, UDP is dull. When you send a UDP datagram, you cannot be sure whether it gets to the remote side.
Ten years ago, somebody at Google had an idea: What if we used UDP primitives to build something like TCP, just improved? And they did it. They made QUIC.
Now, with TCP, you can add TLS for secure connections. QUIC has TLS built in; it’s always fully encrypted.
Back to HTTP: HTTP/1 and HTTP/2 use TCP for transport. HTTP/3 uses QUIC (UDP).
An important advantage of QUIC is that it is multiplexed. Multiplexing means that you can send multiple requests in parallel over a single connection.
You may know that even with HTTP/2, you can open one TCP connection and use it to send multiple parallel requests. But internally, the requests must be serialized into one TCP flow. So, if something gets stuck (a packet is lost), it will block all other traffic over the connection.
QUIC supports multiplexing at the transport level. Each HTTP request gets its own independent stream. If a packet carrying data for one stream is lost, other streams are not affected.
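To make the difference concrete, here is a toy simulation, not real networking. It only models the delivery rule: a single ordered flow stalls at the first missing packet, while independent streams stall only the stream that lost a packet. The packet layout and the function names `deliver_single_flow` and `deliver_independent_streams` are made up for this illustration.

```python
# Each packet: (packet_number, stream_id, payload). Toy model, not real TCP or QUIC.
PACKETS = [
    (0, "A", "a1"),
    (1, "B", "b1"),
    (2, "A", "a2"),
    (3, "C", "c1"),
    (4, "B", "b2"),
]


def deliver_single_flow(packets, lost):
    """One ordered byte stream (TCP-like): delivery stops at the first gap."""
    delivered = []
    for number, _stream, payload in packets:
        if number in lost:
            break  # everything after the gap waits for a retransmission
        delivered.append(payload)
    return delivered


def deliver_independent_streams(packets, lost):
    """Per-stream ordering (QUIC-like): a gap blocks only its own stream."""
    blocked = set()
    delivered = []
    for number, stream, payload in packets:
        if number in lost:
            blocked.add(stream)
        elif stream not in blocked:
            delivered.append(payload)
    return delivered


# Packet 1 (stream B) is lost on the way.
print(deliver_single_flow(PACKETS, lost={1}))          # ['a1']
print(deliver_independent_streams(PACKETS, lost={1}))  # ['a1', 'a2', 'c1']
```

With the single flow, requests A and C wait although none of their packets were lost. With independent streams, only B waits for the retransmission.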
This is great for the proxy use case because you will likely open many connections to different origins through one proxy.
So we know that the MASQUE proxies are just HTTP/3 proxies.
But proxy is quite a broad term, so before we dig deeper, I would like to clarify what kind of proxies we are talking about.
We should distinguish forward and reverse proxies.
The reverse proxies are very common because they are utilized by websites.
For example, when I visit the EuroPython website, it’s quite likely that my requests go through one or more reverse proxies. But as the user, I am not aware of that, and I don’t care.
They’re not my proxies.
By contrast, forward proxies are chosen by users.
I explicitly connect to a proxy and ask it to act on my behalf. I can use my proxy for all websites I visit and the websites do not need to know that I am using the proxy.
It is my proxy.
Another source of confusion could be the difference between HTTP and SOCKS proxies.
The SOCKS proxies are low-level and (quite) simple.
They are great for private networks and development. For example, SSH clients can run a local SOCKS proxy that tunnels traffic through an SSH connection.
But I would not expose a SOCKS proxy over the public Internet. The protocol is simple, traffic to a proxy is not encrypted, and only basic authentication is supported.
Compared to SOCKS, HTTP proxies are much more powerful, as they inherit the whole HTTP stack.
- Do you want encryption? No problem. Use HTTPS instead of HTTP.
- Do you need authentication? There are many options in HTTP.
- You can employ all your favorite tools or toys. For example, you can add a reverse proxy in front of a forward proxy. Why not? It’s all HTTP.
HTTP forward proxies
MASQUE proxies are HTTP forward proxies. I will not talk about SOCKS proxies and I will not talk about reverse proxies.
I would like to show you what a proxied request looks like, but it would be difficult to read a binary protocol like HTTP/3. So I will use HTTP/1.1 for illustration.
I’m sure that most (if not all) of you know what an HTTP request looks like:
- I open a TCP connection to a server, for example using netcat,
- I write my request to the connection,
- and I read a response from the server.
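The three steps above can be reproduced in Python with a raw socket instead of netcat. This is a minimal sketch: to stay self-contained it starts a throwaway local server from the standard library as a stand-in for a real website, and `HelloHandler` and `raw_get` are names invented for the example.

```python
import socket
import socketserver
import threading
from http.server import BaseHTTPRequestHandler


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in origin server that answers every GET with 'hello'."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def raw_get(host, port, path="/"):
    """Open a TCP connection, write a request, read the whole response."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        chunks = []
        while chunk := sock.recv(4096):  # read until the server closes
            chunks.append(chunk)
    return b"".join(chunks)


server = socketserver.TCPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
response = raw_get("127.0.0.1", port)
server.shutdown()
server.server_close()
print(response.decode("latin-1"))
```

The request is nothing more than a few lines of text terminated by an empty line, which is why HTTP/1.1 is convenient for illustration.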
A proxy is a server that can return resources from 3rd-party hosts.
BTW the 3rd party host is usually called an origin. A proxy is between the origin and me.
- I connect to a proxy running at my localhost,
- and request a resource from some origin.
- The proxy will issue a request on my behalf and forward a response to me.
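In HTTP/1.1, the visible difference between a direct request and a forwarded one is the request target: a client talking to a proxy puts the absolute URL on the request line instead of just the path. The helper below is a small sketch of that; `forwarding_request` is a hypothetical name, and `example.org` is a placeholder origin.

```python
from urllib.parse import urlsplit


def forwarding_request(url):
    """Build the HTTP/1.1 request a client would send to a forward proxy."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        # https:// traffic needs the tunneling mode instead.
        raise ValueError("forwarding is shown here for plain http:// URLs only")
    return (
        f"GET {url} HTTP/1.1\r\n"    # absolute URL, not just the path
        f"Host: {parts.netloc}\r\n"  # the origin, not the proxy
        "\r\n"
    ).encode("ascii")


print(forwarding_request("http://example.org/index.html").decode("ascii"))
```

The bytes are written to the connection opened to the proxy, so the proxy can read the full URL and fetch the resource on the client's behalf.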
This is called proxy forwarding. And it is almost useless today.
These days, all important websites use HTTPS, meaning that traffic is encrypted using TLS. And we don’t want some proxies to peek into our traffic.
To support encrypted traffic, proxies need a completely different mode of operation. This mode is called tunneling.
In the tunneling mode, I connect to a proxy, but I do not send my request. Instead, I ask the proxy to set up a tunnel to an origin.
To achieve that, we have a special HTTP method called CONNECT. The proxy opens the tunnel and responds 200 to indicate success.
From now on, everything I send goes unmodified (is tunneled) to the origin server; and everything from the origin goes unmodified to me.
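The whole exchange fits into a short asyncio sketch. Everything here is a toy running on localhost: `handle_proxy_client` is a deliberately naive tunneling proxy, `handle_origin` is a stand-in origin that upper-cases one line, and all names are invented for the example. A real proxy would add authentication, timeouts, and error handling.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes one way until EOF, then close the other side."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def handle_proxy_client(reader, writer):
    # Read the request head: "CONNECT host:port HTTP/1.1" plus headers.
    head = await reader.readuntil(b"\r\n\r\n")
    request_line = head.split(b"\r\n")[0].decode("ascii")
    method, target, _version = request_line.split(" ")
    if method != "CONNECT":
        writer.write(b"HTTP/1.1 405 Method Not Allowed\r\n\r\n")
        await writer.drain()
        writer.close()
        return
    host, _, port = target.rpartition(":")
    origin_reader, origin_writer = await asyncio.open_connection(host, int(port))
    writer.write(b"HTTP/1.1 200 OK\r\n\r\n")  # the tunnel is open
    await writer.drain()
    # From here on, the proxy only copies bytes in both directions.
    await asyncio.gather(pipe(reader, origin_writer), pipe(origin_reader, writer))


async def handle_origin(reader, writer):
    # Stand-in origin: upper-case one line, then close.
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()


async def demo():
    origin = await asyncio.start_server(handle_origin, "127.0.0.1", 0)
    proxy = await asyncio.start_server(handle_proxy_client, "127.0.0.1", 0)
    origin_port = origin.sockets[0].getsockname()[1]
    proxy_port = proxy.sockets[0].getsockname()[1]

    # The client connects to the proxy, never directly to the origin.
    reader, writer = await asyncio.open_connection("127.0.0.1", proxy_port)
    target = f"127.0.0.1:{origin_port}"
    writer.write(f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii"))
    await writer.drain()
    status = await reader.readuntil(b"\r\n\r\n")

    # Anything written now reaches the origin unmodified.
    writer.write(b"hello through the tunnel\n")
    await writer.drain()
    reply = await reader.readline()

    writer.close()
    proxy.close()
    origin.close()
    return status, reply


status, reply = asyncio.run(demo())
print(status, reply)
```

Note that the proxy never looks at the tunneled bytes. After the `200` response it does not care whether they are HTTP, TLS, or something else.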
Now you should ask: Where is HTTPS, where is the encryption?
I’ve told you that the CONNECT method is used to tunnel HTTPS traffic, but I’ve shown you plain HTTP.
The truth is that we can tunnel any protocol. For the proxy, the traffic is just bytes; the proxy does not understand your data. So, we can tunnel plain HTTP (as in my example), TLS for secure connections, or anything else we want.
In the tunneling mode, HTTP is used to set up a tunnel, but once the tunnel is established, its payload can be anything.
When I say anything, that includes other proxy connections. This is called onion routing:
- I can connect to one proxy.
- And ask it to set up a tunnel to a second proxy.
- Then, through the first and the second proxies, connect to a third proxy.
And so on.
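Seen from the client, onion routing is just a sequence of CONNECT requests written to the same connection, each one travelling through the tunnels opened so far. The sketch below only lists who interprets which request; it ignores the TLS handshake that would normally follow each CONNECT. The host names and the helpers `connect_request` and `onion_plan` are hypothetical.

```python
def connect_request(target):
    """CONNECT request asking a proxy to open a tunnel to target ("host:port")."""
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def onion_plan(hops):
    """Pair every hop with the CONNECT request it will receive.

    The client opens one connection to hops[0]. Each following hop is
    reached through the tunnel that the previous hops have built.
    """
    return [(hops[i], connect_request(hops[i + 1])) for i in range(len(hops) - 1)]


HOPS = [
    "proxy1.example:443",
    "proxy2.example:443",
    "proxy3.example:443",
    "origin.example:443",
]

for handled_by, request in onion_plan(HOPS):
    first_line = request.splitlines()[0].decode("ascii")
    print(f"{handled_by} receives: {first_line}")
```

Each proxy only learns the address of the next hop, which is the point of chaining them.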
But let’s get back to the MASQUE proxies:
- I’ve already told you that they are normal proxies, just using HTTP/3. Actually, they should support HTTP/2 as a fallback for networks where UDP is blocked.
- MASQUE proxies work in the tunneling mode only. That is logical: it does not make sense to support the forwarding mode for a small fraction of insecure traffic.
- The MASQUE spec explicitly mentions onion routing. It’s there by design.
- And there are some other goodies that normal proxies do not have, but I won’t go into that today.
OK. I want a client that supports MASQUE, meaning that I need something that supports: HTTP/3 and proxies.
So, what are my options? Let’s look at HTTP libraries in Python.
Unfortunately, both these libraries support HTTP/1 only.
For HTTP/2, we can use a nice library called httpx. But I am not aware of any ready-to-use client that would support HTTP/3 or QUIC.
SansIO or “Bring your own IO” libraries are distilled protocol implementations. They can convert requests to bytes and bytes to responses, but they do not transport anything over a network. That’s something you must add.
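To show the shape of the idea without depending on any particular library, here is a hand-written toy parser in the SansIO style. It is not the API of h11 or aioquic; `TinyResponseParser` and its event tuples are invented for this sketch, and it handles only a response head followed by raw body bytes.

```python
class TinyResponseParser:
    """Toy sans-IO parser: bytes in, events out, no sockets anywhere."""

    def __init__(self):
        self._buffer = b""
        self._head_done = False

    def receive_data(self, data):
        """Feed bytes from any transport and return the events parsed so far."""
        self._buffer += data
        events = []
        if not self._head_done and b"\r\n\r\n" in self._buffer:
            head, self._buffer = self._buffer.split(b"\r\n\r\n", 1)
            status_line, *header_lines = head.decode("ascii").split("\r\n")
            _version, status, _reason = status_line.split(" ", 2)
            headers = dict(line.split(": ", 1) for line in header_lines)
            events.append(("response", int(status), headers))
            self._head_done = True
        if self._head_done and self._buffer:
            events.append(("data", self._buffer))
            self._buffer = b""
        return events


# The caller decides where the bytes come from: a socket, a tunnel, a test.
parser = TinyResponseParser()
events = []
for chunk in (b"HTTP/1.1 200 OK\r\nContent-Le", b"ngth: 5\r\n\r\nhel", b"lo"):
    events += parser.receive_data(chunk)
print(events)
```

Because the parser never touches the network, the same object works whether the bytes arrive from a TCP socket or from a tunnel through a proxy.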
SansIO is great if you don’t want to start from scratch.
On the other hand, I am not sure that it’s such a great idea to implement networking protocols in Python. At least, not if we care about performance. (And HTTP/2 and HTTP/3 are often chosen for their performance.)
We should look at libraries like nghttp2, which is a C library implementing HTTP/2. It’s used in projects like curl or Apache httpd.
In any case, I think that we should consider our options for wrapping compiled libraries in Python.
At this point, I want to make clear that I am speaking about very low-level stuff. I am aware that most users don’t care about this detail; they want something “for humans”. But that’s my topic today.
I am looking for internals that would allow me to combine protocols for proxies and origins.
Let’s finally look at what I made.
What does my MASQUE client look like? What did I write?
The core of my client is based on the aioquic library. Honestly, I did not have many options here, as aioquic is the only QUIC implementation available in Python.
As aioquic follows the SansIO approach, I had to bring my own IO, so I combined it with asyncio from the standard library.
Together, I got a simple H3Client. At this layer, the client was able to speak HTTP, but there was nothing related to proxies.
The proxy-related logic is one layer above. ProxyClient uses an H3Client to tunnel bytes through an HTTP/3 proxy.
At this layer, I have something like netcat with proxy support. So, I can write bytes and I can read bytes, but that’s not enough. I want to test real HTTP requests.
So, I added one more layer. I took h11, a SansIO implementation of HTTP/1, and combined it with my ProxyClient.
With this composition, I can finally tunnel HTTP/1 traffic through an HTTP/3 proxy.
Now, do you see any pattern in my class structure?
I see that:
- I have two layers and at each layer I have a SansIO library. Once for HTTP/3, once for HTTP/1.
- So at both layers, I had to provide IO. Once I used the standard library, once I used my own class.
- Together, in both cases, I got simple HTTP clients. Something that can send HTTP requests.
Looking at this pattern, I was considering how to properly generalize it.
What about HTTP/2 proxies? What about HTTP/2 origins? And other combinations? What about SOCKS proxies?
Two years ago, I gave a talk at EuroPython where I said that there is only one HTTP. Its concepts remain the same for all versions. HTTP/1, HTTP/2, and HTTP/3 are just implementations, encoding HTTP to TCP or UDP.
I think that the next version of my client will follow exactly this design. I should have an interface for HTTP connections. Then I will implement it for different HTTP versions, most likely by combining SansIO and IO parts.
Or, for the best performance, the interface could be implemented as a wrapper on top of a C library.
To support proxies, IO in my design should be injectable. I do not want hard-coded sockets.
I think that we should have two interfaces: one for TCP-like connections and one for UDP-like flows.
Obviously, the most common implementations will just wrap sockets. But we will be able to plug in implementations that tunnel traffic through a proxy, be it HTTP or SOCKS. And UDP can be tunneled through proxies too.
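One possible way to express such injectable IO in Python is with `typing.Protocol`. The sketch below is an assumption about how the interfaces could look, not the code of the client described in the talk: `ByteStream`, `DatagramFlow`, `SocketStream`, `InMemoryStream`, and `fetch_status` are all hypothetical names. The in-memory implementation stands in for a stream tunneled through a proxy.

```python
import asyncio
from typing import Protocol


class ByteStream(Protocol):
    """TCP-like connection: ordered bytes in both directions."""

    async def send(self, data: bytes) -> None: ...
    async def receive(self, max_bytes: int = 65536) -> bytes: ...


class DatagramFlow(Protocol):
    """UDP-like flow: whole datagrams, no delivery or ordering guarantee."""

    async def send(self, datagram: bytes) -> None: ...
    async def receive(self) -> bytes: ...


class SocketStream:
    """The common case: wraps a direct connection made with asyncio streams."""

    def __init__(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
        self._reader, self._writer = reader, writer

    async def send(self, data: bytes) -> None:
        self._writer.write(data)
        await self._writer.drain()

    async def receive(self, max_bytes: int = 65536) -> bytes:
        return await self._reader.read(max_bytes)


class InMemoryStream:
    """Stand-in for a tunneled stream: replays canned bytes, records writes."""

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self.sent = []

    async def send(self, data: bytes) -> None:
        self.sent.append(data)

    async def receive(self, max_bytes: int = 65536) -> bytes:
        return self._incoming.pop(0) if self._incoming else b""


async def fetch_status(stream: ByteStream, host: str) -> int:
    """HTTP logic that does not know or care how the bytes travel."""
    await stream.send(f"GET / HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii"))
    head = b""
    while b"\r\n" not in head:
        chunk = await stream.receive()
        if not chunk:
            break
        head += chunk
    return int(head.split(b" ", 2)[1].decode("ascii"))


stream = InMemoryStream([b"HTTP/1.1 204 No ", b"Content\r\n\r\n"])
status_code = asyncio.run(fetch_status(stream, "origin.example"))
print(status_code, stream.sent)
```

`fetch_status` accepts anything with matching `send` and `receive` methods, so a direct socket, an HTTP tunnel, or a SOCKS connection could be swapped in without touching the HTTP code.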
There are a few design details (lessons learnt) that I would like to mention:
- The main advantage of HTTP/2 and HTTP/3 is the support for multiplexing. To make use of that, our code should be async.
- QUIC, unlike TCP, is implemented in user space. Whenever you have a QUIC connection open, there must be a background task that handles all networking stuff: sending packets, acknowledging packets, retransmitting packets, …
- I also learnt that writing low-level asyncio code is difficult. So, I would consider using trio or anyio streams next time.
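The first two points can be pictured with a small asyncio sketch: one background task owns all the "network" work, and several requests share the connection concurrently. This is a toy with no real packets. `ToyConnection` and its methods are invented for the illustration, and the comments mark where a real QUIC stack would do its work.

```python
import asyncio


class ToyConnection:
    """Toy user-space connection: a background task owns all network work."""

    def __init__(self):
        self._outgoing = asyncio.Queue()
        self.sent_packets = []
        self._task = None

    async def __aenter__(self):
        # The task must keep running for as long as the connection is open.
        self._task = asyncio.create_task(self._pump())
        return self

    async def __aexit__(self, *exc_info):
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass

    async def _pump(self):
        # A real QUIC stack would also read datagrams, acknowledge packets,
        # and fire retransmission timers in this loop.
        while True:
            stream_id, payload, done = await self._outgoing.get()
            await asyncio.sleep(0)  # stand-in for real socket IO
            self.sent_packets.append((stream_id, payload))
            done.set_result(f"response on stream {stream_id}")

    async def request(self, stream_id, payload):
        done = asyncio.get_running_loop().create_future()
        await self._outgoing.put((stream_id, payload, done))
        return await done


async def main():
    async with ToyConnection() as conn:
        # Three requests in flight at once over one connection.
        # 0, 4, 8 are the IDs QUIC gives to client-initiated bidirectional streams.
        return await asyncio.gather(
            *(conn.request(stream_id, b"GET /") for stream_id in (0, 4, 8))
        )


responses = asyncio.run(main())
print(responses)
```

Tying the task to `async with` keeps its lifetime equal to the connection's lifetime, which is the kind of bookkeeping that structured-concurrency libraries such as trio or anyio are designed to make easier.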
A logical question is: Where can you find my code? Did I contribute to some open-source project? I hacked a (very limited) solution into httpx, but I haven’t published it, as it’s not that easy.
HTTP/3 alone is complex. And in my case, the task was even more difficult: I wanted to support new protocols for both proxy and origin connections.
The existing libraries have very small overlap with the code that I have written.
This talk is my two cents on what the internals (the very low-level layers) of HTTP libraries could look like. I hope that I will be able to share more in the future.
Now that I have presented my vision, it’s time to conclude the talk.
What should you remember?
Please remember that HTTP tunnels are simple.
You just send a CONNECT request and from that point everything is tunneled through.
Proxies are not complicated, but HTTP/2 and HTTP/3 are different from the first version of the protocol.
Maybe, the existing abstractions are not sufficient, especially if we consider combinations of protocols for proxies and origins.
Speaking about abstractions:
- Do not forget HTTP itself is an interface.
- HTTP versions are implementations of the interface.
SansIO libraries are great if you don’t want to start from scratch. For production workloads, I would not forget about battle-tested native libraries.
That’s all from my side. Thank you for your attention, and thank you for letting me share my experience with you.