How I wrote a Python client for HTTP/3 proxies

My talk from EuroPython 2022.

Abstract

MASQUE (Multiplexed Application Substrate over QUIC Encryption) is a draft of a new protocol that allows running proxy or VPN services indistinguishable from HTTPS servers. Akamai built a managed proxy service based on the MASQUE protocol to provide egress proxy for iCloud Private Relay.

While working on the proxy at Akamai, I wrote a Python client for testing the proxy service. The MASQUE protocol can tunnel traffic through HTTP/3 or HTTP/2, but common Python libraries only support HTTP/1.1. The tunneled traffic can use any protocol on top of TCP or UDP, including all HTTP versions, so MASQUE can be proxied through MASQUE for onion routing.

In this talk, I will show that the MASQUE proxy design is simple and yet client implementations are complex. To put everything into context, I will recap how HTTP proxies operate and how HTTP versions differ. I will highlight lessons learned from designing a low-level HTTP client using Python asyncio.

Video

Transcript

MASQUE - Slide 1

Here is a transcript of my talk.

A PDF version of my slides is available too, but the slide deck is meant for illustration, so it may not be useful on its own.

Introduction

MASQUE - Slide 2

I would like to share with you how I wrote a Python client for HTTP/3 proxies.

I will give you a short intro to the latest HTTP version, I will tell you what I learnt about proxies, and we will look at how all this fits into the Python ecosystem.

MASQUE - Slide 3

My name is Miloslav and I am from Prague.

I work for Akamai, where much of my work over the last five years has been related to HTTP/3.

MASQUE - Slide 4

HTTP/3 is important to Akamai because we run one of the largest CDNs, with more than 300,000 servers.

But today, I won’t talk about the CDN. I will talk about proxies.

MASQUE - Slide 5

Last year, in 2021, Akamai built a new network of proxies.

And as all software needs testing, my job was to write a client that would connect to these proxies.

MASQUE - Slide 6

If you are wondering how a network of proxies can be useful, look for iCloud Private Relay on your iPhone or Mac.

Private Relay is described in a paper from Apple, where you can read that the service uses the MASQUE protocol.

MASQUE

MASQUE - Slide 7

MASQUE is a draft from the IETF, so I am not talking about some proprietary service here.

We will look at modern open standards for proxying.

MASQUE - Slide 8

MASQUE means: Multiplexed Application Substrate over QUIC Encryption.

I guess that somebody just wanted a nice acronym.

MASQUE - Slide 9

MASQUE proxies are normal HTTP proxies, just using HTTP/3 instead of the common HTTP/1.

So, if MASQUE proxies are just HTTP/3 proxies, the question is: What is HTTP/3? What makes it special?

MASQUE - Slide 10

HTTP/3 is HTTP over QUIC.

OK. So, now the question is: What is QUIC?

That’s a little bit difficult to explain, and I don’t have much time, so I will give you a brief intro only.

MASQUE - Slide 11

TCP is a protocol that has powered the Internet since the 1980s. TCP is implemented in operating systems, so apps can open a TCP socket using a system call, write data to it, and the data will get to the remote side. Complete and in order.

But TCP is not the only Internet protocol. For practical purposes, one other is interesting: UDP. Compared to TCP, UDP is dull. When you send a UDP datagram, you cannot be sure whether it gets to the remote side.
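To make the difference concrete, here is a minimal sketch that uses only the standard `socket` module on the loopback interface. The function names are mine, chosen for illustration. On loopback the datagram practically always arrives, but nothing in the UDP API promises that, which is why the receiver sets a timeout.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram to a local UDP socket and read it back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(1.0)         # UDP gives no delivery guarantee
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(65535)
        return data


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send bytes over a local TCP connection; they arrive complete and in order."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as client:
            server, _addr = listener.accept()
            with server:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal end of stream
                chunks = []
                while chunk := server.recv(4096):
                    chunks.append(chunk)
                return b"".join(chunks)
```

Note that the TCP receiver reads a stream until it ends, while the UDP receiver gets exactly one self-contained message per call.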

Ten years ago, somebody at Google got an idea: What if we used UDP primitives to build something like TCP, just improved? And they did it. They made QUIC.

Now, with TCP, you can add TLS for secure connections. QUIC has TLS built in, so it is always encrypted.

Back to HTTP: HTTP/1 and HTTP/2 use TCP for transport. HTTP/3 uses QUIC (UDP).

MASQUE - Slide 12

An important advantage of QUIC is that it is multiplexed. Multiplexing means that you can send multiple requests in parallel over a single connection.

You may know that even with HTTP/2, you can open one TCP connection and use it to send multiple parallel requests. But internally, the requests must be serialized into one TCP flow. So, if something gets stuck (a packet is lost), it will block all other traffic over the connection.

QUIC supports multiplexing at the transport level. Each HTTP request gets its own independent stream. If a packet carrying data for one stream is lost, the other streams are not affected.
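The effect can be shown with a toy model. This is only a sketch under simplifying assumptions: packets are numbered by their position in a list, each packet carries data for exactly one stream, and retransmission is ignored. The function names are hypothetical and this is not how a real TCP or QUIC stack is structured.

```python
from collections import defaultdict

# A packet is (stream_id, payload); its packet number is its list index.
Packet = tuple[int, str]


def deliver_single_ordered_flow(packets: list[Packet], lost: set[int]) -> dict[int, list[str]]:
    """TCP-like: one ordered flow, so nothing after the first gap is delivered."""
    delivered = defaultdict(list)
    for number, (stream_id, payload) in enumerate(packets):
        if number in lost:
            break  # everything behind the gap waits for the retransmission
        delivered[stream_id].append(payload)
    return dict(delivered)


def deliver_independent_streams(packets: list[Packet], lost: set[int]) -> dict[int, list[str]]:
    """QUIC-like: ordering is per stream, so a gap blocks only its own stream."""
    delivered = defaultdict(list)
    blocked = set()
    for number, (stream_id, payload) in enumerate(packets):
        if number in lost:
            blocked.add(stream_id)
        elif stream_id not in blocked:
            delivered[stream_id].append(payload)
    return dict(delivered)
```

With two interleaved streams and the very first packet lost, the single ordered flow delivers nothing, while the independent streams still deliver everything that belongs to the unaffected stream.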

This is great for the proxy use case because you will likely open many connections to different origins through one proxy.

Proxies

MASQUE - Slide 13

So we know that the MASQUE proxies are just HTTP/3 proxies.

But “proxy” is quite a broad term, so before we dig deeper, I would like to clarify what kind of proxies we are talking about.

MASQUE - Slide 14

We should distinguish forward and reverse proxies.

MASQUE - Slide 15

Reverse proxies are very common because websites use them.

For example, when I visit the EuroPython website, it’s quite likely that my requests go through one or more reverse proxies. But as the user, I am not aware of that, and I don’t care.

They’re not my proxies.

MASQUE - Slide 16

By contrast, forward proxies are chosen by users.

I explicitly connect to a proxy and ask it to act on my behalf. I can use my proxy for all websites I visit and the websites do not need to know that I am using the proxy.

It is my proxy.

MASQUE - Slide 17

Another source of confusion could be the difference between HTTP and SOCKS proxies.

MASQUE - Slide 18

SOCKS proxies are low-level and (quite) simple.

They are great for private networks and development. For example, SSH clients can run a local SOCKS proxy that tunnels traffic through an SSH connection.

But I would not expose a SOCKS proxy over the public Internet. The protocol is simple, traffic to a proxy is not encrypted, and only basic authentication is supported.

MASQUE - Slide 19

Compared to SOCKS, HTTP proxies are much more powerful, as they inherit the whole HTTP stack.

MASQUE - Slide 20

HTTP forward proxies

MASQUE proxies are HTTP forward proxies. I will not talk about SOCKS proxies and I will not talk about reverse proxies.

I would like to show you what a proxied request looks like, but it would be difficult to read a binary protocol like HTTP/3. So I will use HTTP/1.1 for illustration.

MASQUE - Slide 21

I’m sure that most (if not all) of you know what an HTTP request looks like:
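As a minimal sketch (this is my own illustration, not the original slide), a plain HTTP/1.1 request is a request line, a `Host` header, and an empty line. The helper name `build_request` and the host `example.com` are assumptions chosen for the example.

```python
def build_request(method: str, target: str, host: str) -> bytes:
    """Serialize a minimal HTTP/1.1 request without a body."""
    lines = [
        f"{method} {target} HTTP/1.1",  # request line
        f"Host: {host}",                # the only header HTTP/1.1 requires
        "",                             # empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_request("GET", "/", "example.com").decode("ascii"))
```

The bytes produced here are what a client would write to a TCP socket connected directly to the server.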

That’s all.

MASQUE - Slide 22

A proxy is a server that can return resources from 3rd-party hosts.

By the way, the third-party host is usually called an origin. The proxy sits between the origin and me.

This is called proxy forwarding. And it is almost useless today.
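In HTTP/1.1, a forwarded request differs from a direct one mainly in the request line: the client sends the absolute URL to the proxy instead of just the path. Here is a small sketch of that, assuming a hypothetical helper `build_forwarded_request` and a placeholder URL.

```python
from urllib.parse import urlsplit


def build_forwarded_request(url: str) -> bytes:
    """Serialize a GET request as it would be sent to a forwarding proxy."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.netloc:
        # Forwarding only makes sense for plain HTTP; HTTPS needs a tunnel.
        raise ValueError("expected an absolute http:// URL")
    request = (
        f"GET {url} HTTP/1.1\r\n"    # absolute URL in the request line
        f"Host: {parts.netloc}\r\n"  # Host names the origin, not the proxy
        "\r\n"
    )
    return request.encode("ascii")
```

The proxy reads the request, fetches the resource from the origin, and returns the response, which means it sees everything in clear text.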

MASQUE - Slide 23

These days, all important websites use HTTPS, meaning that traffic is encrypted using TLS. And we don’t want some proxies to peek into our traffic.

MASQUE - Slide 24

To support encrypted traffic, proxies need a completely different mode of operation. This mode is called tunneling.

In the tunneling mode, I connect to a proxy, but I do not send my request. Instead, I ask the proxy to set up a tunnel to an origin.

To achieve that, we have a special HTTP method called CONNECT. The proxy opens the tunnel and responds with 200 to indicate success.

From now on, everything I send goes unmodified (is tunneled) to the origin server; and everything from the origin goes unmodified to me.
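The handshake can be sketched with asyncio streams. This is a simplified illustration for an HTTP/1.1 proxy, not my actual client: the function name `request_tunnel` is hypothetical, proxy authentication is left out, and the reader and writer are assumed to be already connected to the proxy (for example via `asyncio.open_connection`).

```python
import asyncio


async def request_tunnel(
    reader: asyncio.StreamReader,
    writer: asyncio.StreamWriter,
    origin_host: str,
    origin_port: int,
) -> None:
    """Ask an HTTP/1.1 proxy to open a tunnel to origin_host:origin_port."""
    authority = f"{origin_host}:{origin_port}"
    writer.write(
        f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n".encode("ascii")
    )
    await writer.drain()

    # Read the response head; anything after it already belongs to the tunnel.
    head = await reader.readuntil(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    _version, status, *_reason = status_line.split(" ", 2)
    if not status.startswith("2"):
        writer.close()
        raise ConnectionError(f"proxy refused the tunnel: {status_line}")
```

Once the coroutine returns, the same reader and writer carry the tunneled bytes, so the caller can start a TLS handshake or send a plain HTTP request to the origin over them.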

MASQUE - Slide 25

Now you should ask: Where is HTTPS, where is the encryption?

I’ve told you that the CONNECT method is used to tunnel HTTPS traffic, but I’ve shown you plain HTTP.

The truth is that we can tunnel any protocol. For the proxy, the traffic is just bytes; the proxy does not understand your data. So, we can tunnel plain HTTP (as in my example), TLS for secure connections, or anything else we want.

In the tunneling mode, HTTP is used to set up a tunnel, but once the tunnel is established, its payload can be anything.

MASQUE - Slide 26

When I say anything, that includes other proxy connections. This is called onion routing:

And so on.
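The chaining can be written down as a short plan. This is only a sketch of the idea, with hypothetical host names and a hypothetical helper `tunnel_plan`: I connect directly to the first proxy, and every CONNECT request extends the tunnel by one hop.

```python
def tunnel_plan(proxies: list[str], origin: str) -> list[tuple[str, str]]:
    """Return (receiver, CONNECT target) pairs for a chain of proxies.

    The first request goes directly to the first proxy; each following
    request travels through the tunnel built so far.
    """
    hops = proxies + [origin]
    return [(hops[i], hops[i + 1]) for i in range(len(proxies))]


for receiver, target in tunnel_plan(
    ["proxy1.example:443", "proxy2.example:443"], "example.com:443"
):
    print(f"{receiver} receives: CONNECT {target}")
```

In this plan, the first proxy only learns about the second proxy, and the second proxy only learns about the origin.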

MASQUE - Slide 27

But let’s get back to the MASQUE proxies:

HTTP clients

MASQUE - Slide 28

OK. I want a client that supports MASQUE, meaning that I need something that supports: HTTP/3 and proxies.

So, what are my options? Let’s look at HTTP libraries in Python.

MASQUE - Slide 29

Python has batteries included, so we have an HTTP client in the standard library. But the most popular option today is requests.

Unfortunately, both of these libraries support HTTP/1 only.

For HTTP/2, we can use a nice library called httpx. But I am not aware of any ready-to-use client that would support HTTP/3 or QUIC.

MASQUE - Slide 30

I carefully say ready-to-use because we have so-called SansIO libraries. And we have them for all three HTTP versions, including the latest one.

SansIO or “Bring your own IO” are libraries with distilled protocol implementations. They can convert requests to bytes and bytes to responses, but they do not transport anything over a network. That’s something you must add.

SansIO is great if you don’t want to start from scratch.
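To show what that means in practice, here is a tiny sans-IO style parser written from scratch. It is a sketch for illustration only: the class `ResponseHeadParser` and its method names are my own and it is far from a complete HTTP/1.1 implementation. The point is that it only consumes and produces Python objects, and never touches a socket.

```python
from typing import Optional


class ResponseHeadParser:
    """Feed bytes in, get a parsed response head out. No IO inside."""

    def __init__(self) -> None:
        self._buffer = b""

    def receive_data(self, data: bytes) -> None:
        """Accept bytes that the caller read from any transport."""
        self._buffer += data

    def next_event(self) -> Optional[tuple[int, dict[str, str]]]:
        """Return (status, headers) once the head is complete, else None."""
        head, separator, rest = self._buffer.partition(b"\r\n\r\n")
        if not separator:
            return None  # incomplete: the caller must feed more data
        self._buffer = rest  # keep body bytes for later processing
        status_line, *header_lines = head.decode("ascii").split("\r\n")
        _version, status, *_reason = status_line.split(" ", 2)
        headers = {}
        for line in header_lines:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        return int(status), headers
```

Because the parser does not care where the bytes come from, the same object works with a TCP socket, a tunnel through a proxy, or a test that feeds it byte strings directly.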

MASQUE - Slide 31

On the other hand, I am not sure that it’s such a great idea to implement networking protocols in Python. At least, if we care about performance. (And HTTP/2 and HTTP/3 are often chosen for their performance.)

We should look at libraries like nghttp2, which is a C library implementing HTTP/2. It’s used in projects like curl or Apache httpd.

And there are competing HTTP/3 implementations, written in C, C++, or Rust. (Two of them are called Quiche, so just be careful to distinguish them.)

In any case, I think that we should consider our options for wrapping compiled libraries for Python.

MASQUE - Slide 32

At this point, I want to make clear that I am speaking about very low-level stuff. I am aware that most users don’t care about this detail; they want something “for humans”. But that’s my topic today.

I am looking for internals that would allow me to combine protocols for proxies and origins.

My client

MASQUE - Slide 33

Let’s finally look at what I made.

What does my MASQUE client look like? What did I write?

MASQUE - Slide 34

The core of my client is based on the aioquic library. Honestly, I did not have many options here, as aioquic is the only QUIC implementation available in Python.

As aioquic follows the SansIO approach, I had to bring my own IO, so I combined it with asyncio from the standard library.

Together, I got a simple H3Client. At this layer, the client was able to speak HTTP, but there was nothing related to proxies.

MASQUE - Slide 35

The proxy-related logic is one layer above. ProxyClient uses an H3Client to tunnel bytes through an HTTP/3 proxy.

At this layer, I have something like netcat with proxy support. So, I can write bytes and I can read bytes, but that’s not enough. I want to test real HTTP requests.

MASQUE - Slide 36

So, I added one more layer. I took h11, a SansIO implementation of HTTP/1, and combined it with my ProxyClient.

With this composition, I can finally tunnel HTTP/1 traffic through an HTTP/3 proxy.

MASQUE - Slide 37

Now, do you see any pattern in my class structure?

I see that:

Vision

MASQUE - Slide 38

Looking at this pattern, I was considering how to properly generalize it.

What about HTTP/2 proxies? What about HTTP/2 origins? And other combinations? What about SOCKS proxies?

MASQUE - Slide 39

Two years ago, I gave a talk at EuroPython, where I said that there is only one HTTP. Its concepts remain the same for all versions. HTTP/1, HTTP/2, and HTTP/3 are just implementations, encoding HTTP to TCP or UDP.

New RFCs from this year are structured exactly like that. We have one RFC 9110 about HTTP semantics. Then we have a separate RFC (9112, 9113, 9114) for each HTTP version.

I think that the next version of my client will follow exactly this design. I should have an interface for HTTP connections. Then I will implement it for different HTTP versions, most likely by combining SansIO and IO parts.

Or, for the best performance, the interface could be implemented as a wrapper on top of a C library.

MASQUE - Slide 40

To support proxies, IO in my design should be injectable. I do not want hard-coded sockets.

I think that we should have two interfaces: one for TCP-like connections and one for UDP-like flows.

Obviously, the most common implementations will just wrap sockets. But we will be able to plug in implementations that tunnel traffic through a proxy. Be it HTTP or SOCKS. And UDP can be tunneled through proxies too.
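One possible shape for such interfaces, sketched with `typing.Protocol`. All names here (`ByteStream`, `DatagramFlow`, `InMemoryFlow`, `ping`) are hypothetical and only illustrate the idea; this is not code from my client.

```python
import asyncio
from typing import Protocol


class ByteStream(Protocol):
    """TCP-like: an ordered, reliable stream of bytes."""

    async def send(self, data: bytes) -> None: ...
    async def receive(self, max_bytes: int) -> bytes: ...


class DatagramFlow(Protocol):
    """UDP-like: individual messages with preserved boundaries."""

    async def send(self, datagram: bytes) -> None: ...
    async def receive(self) -> bytes: ...


class InMemoryFlow:
    """A loopback implementation of DatagramFlow, handy for tests.

    Create it inside a running event loop.
    """

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def send(self, datagram: bytes) -> None:
        await self._queue.put(datagram)

    async def receive(self) -> bytes:
        return await self._queue.get()


async def ping(flow: DatagramFlow) -> bytes:
    """Protocol code depends on the interface only, never on a socket."""
    await flow.send(b"ping")
    return await flow.receive()
```

A socket-backed class and a proxy-tunneling class could both satisfy `DatagramFlow`, and `ping` would work with either one without any change.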

MASQUE - Slide 41

There are a few design details (lessons learnt) that I would like to mention:

  1. The main advantage of HTTP/2 and HTTP/3 is support for multiplexing. To make use of that, our code should be async.
  2. QUIC, unlike TCP, is implemented in user space. Whenever you have a QUIC connection open, there must be a background task that handles all networking stuff: sending packets, acknowledging packets, retransmitting packets, …
  3. I also learnt that writing low-level asyncio code is difficult. So, I would consider using trio or anyio streams next time.
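The second point can be sketched in plain asyncio. This is a simplified illustration, not aioquic code: the class `UserSpaceConnection` is hypothetical, and appending to a list stands in for the real work of sending, acknowledging, and retransmitting packets.

```python
import asyncio


class UserSpaceConnection:
    """A connection whose networking work runs in a background task."""

    def __init__(self) -> None:
        self._outgoing: asyncio.Queue = asyncio.Queue()
        self.sent: list[bytes] = []
        self._task = None

    async def __aenter__(self) -> "UserSpaceConnection":
        # The background task lives exactly as long as the connection.
        self._task = asyncio.create_task(self._run())
        return self

    async def __aexit__(self, *exc_info) -> None:
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass  # expected: we cancelled the task ourselves

    def send(self, packet: bytes) -> None:
        """Queue a packet; the background task does the actual work."""
        self._outgoing.put_nowait(packet)

    async def flush(self) -> None:
        """Wait until everything queued so far has been processed."""
        await self._outgoing.join()

    async def _run(self) -> None:
        while True:
            packet = await self._outgoing.get()
            self.sent.append(packet)  # stand-in for real packet handling
            self._outgoing.task_done()
```

Tying the task to `async with` is one way to make sure it does not outlive the connection; structured concurrency in trio or anyio gives the same guarantee more directly.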

MASQUE - Slide 42

A logical question is: Where can you find my code? Did I contribute to some open-source project? I hacked a (very limited) solution into httpx, but I haven’t published it, as it’s not that easy.

HTTP/3 alone is complex. And in my case, the task was even more difficult: I want to support new protocols for both proxy and origin connections.

The existing libraries have very small overlap with the code that I have written.

This talk is my two cents on how I think the internals (very low-level layers) of HTTP libraries could look. I hope that I will be able to share more in the future.

Summary

MASQUE - Slide 43

Now that I have presented my vision, it’s time to conclude the talk.

What should you remember?

MASQUE - Slide 44

Please remember that HTTP tunnels are simple.

You just send a CONNECT request and from that point everything is tunneled through.

MASQUE - Slide 45

Proxies are not complicated, but HTTP/2 and HTTP/3 are different from the first version of the protocol.

Maybe the existing abstractions are not sufficient, especially if we consider combinations of protocols for proxies and origins.

MASQUE - Slide 46

Speaking about abstractions:

SansIO libraries are great if you don’t want to start from scratch. For production workloads, I would not forget about battle-tested native libraries.

MASQUE - Slide 47

That’s all from my side. Thank you for your attention, and thank you for letting me share my experience with you.