Implementing QUIC from Scratch with Rust: Trying to Dive into the QUIC Handshake 😂

on 2024-12-31

Why Is the QUIC Handshake So Complex


Compared with TCP or other UDP-based protocols, the QUIC handshake is much more complicated. The first reason that comes to mind is that QUIC builds TLS 1.3 directly into the transport layer to keep application traffic safe. QUIC did not follow the old pattern of stacking a transport layer and a crypto layer (for example TLS over TCP, or UDP-based protocols like UDT running over DTLS). Instead it fused TLS 1.3 into the protocol itself, so we now have QUIC-TLS.

Why Do We Need an Encryption Layer

Before talking about the handshake design, I want to restate why we need an encryption layer. The main reason is still security: if we follow the TLS rules, communication between two peers can stay safe from man-in-the-middle attacks.

Another benefit is that it prevents protocol ossification. TCP and TLS are decoupled, so many middleboxes on the path (for monitoring, security, QoS, etc.) will tweak TCP packets. Even if we ship a new TCP option that follows the spec, those old boxes may break the traffic, and we cannot upgrade them quickly. Once they are placed in a network, cost and reliability make upgrades rare.

QUIC traffic is encrypted end-to-end, so middleboxes cannot look inside and do “smart” things. That helps QUIC stay flexible and evolve even if middleboxes stay frozen in time.

The Difference Between DTLS and TLS

Most UDP-based transports rely on DTLS, while TCP-based transports use TLS. DTLS sits between UDP and the protocol above it, and TLS sits on top of TCP. DTLS is derived from TLS and aims to provide the same level of security; the biggest difference is that DTLS has to adapt TLS so that it works on top of an unreliable datagram transport.

[Figure: arch-01]

The TLS handshake mainly performs an asymmetric key exchange and then derives the symmetric keys that protect later traffic. Only the early handshake messages are sent in plain text; once the symmetric keys are ready, the remaining handshake messages are encrypted too.

[Figure: arch-01]

So DTLS first has to make sure those symmetric-key-protected messages still work on an unreliable transport. Regular TLS uses a MAC that depends on the TLS Record layer sequence number. TLS Record messages run on a reliable stream, so the receiver can compute that sequence number by itself. DTLS cannot do that because UDP does not provide reliability, so DTLS Record messages carry explicit sequence numbers.

The sequence number also feeds the sliding window and protects against replay attacks. UDP has no three-way handshake, so even an off-path attacker can forge traffic. DTLS must also ensure that one UDP datagram carries a complete TLS Record, otherwise the sequence number means nothing. Reliability for Application Data still has to come from the protocol above DTLS, such as UDT or KCP.
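
As a small aside, the replay protection that explicit sequence numbers enable is usually a sliding-window check. Here is a tiny sketch of the general idea (the field and method names are mine, not DTLS code):

```rust
// A minimal sliding-window replay check: remember the highest sequence
// number seen plus a bitmap of the most recent ones.
#[derive(Default)]
struct ReplayWindow {
    highest: u64,
    bitmap: u64, // bit i set => sequence number (highest - i) was already seen
}

impl ReplayWindow {
    /// Returns true if the record should be accepted, false if it is a
    /// replay or too old to judge.
    fn check_and_update(&mut self, seq: u64) -> bool {
        if seq > self.highest {
            let shift = seq - self.highest;
            self.bitmap = if shift >= 64 { 0 } else { self.bitmap << shift };
            self.bitmap |= 1;
            self.highest = seq;
            return true;
        }
        let offset = self.highest - seq;
        if offset >= 64 || self.bitmap & (1 << offset) != 0 {
            return false;
        }
        self.bitmap |= 1 << offset;
        true
    }
}
```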

DTLS also has to implement a simple ARQ to keep the handshake in order. Data reliability can be handled above DTLS, but the handshake is DTLS’s own job, so it must retransmit handshake messages, confirm delivery, and face all the usual ARQ problems. I plan to describe those issues again when I work on QUIC Reliability.

TLS itself assumes a reliable stream, so DTLS needs more changes. For instance, DTLS 1.2 is based on TLS 1.2, where ChangeCipherSpec is the signal that the following data will be encrypted. On an unreliable transport, records can be reordered or lost, so the receiver needs an explicit hint. The DTLS record header therefore adds epoch and sequence number fields; the epoch tells the peer which keys (if any) protect this record, replacing the ChangeCipherSpec signal.

TLS handshake messages, especially the ones that carry certificates, can easily exceed the UDP MTU. Since each DTLS record must fit inside a single UDP datagram, DTLS has to fragment and reassemble handshake messages itself to avoid IP fragmentation. Protocols built above DTLS can probe and respect the path MTU once the handshake is done, but during the handshake that is DTLS’s own responsibility.

QUIC could have reused DTLS, but it picked a cooler design. I am describing these DTLS details because QUIC faces similar problems when it designs its own handshake.

The Role of TCP’s Three-Way Handshake

Let us revisit the famous TCP three-way handshake. Many people are tired of this question, but looking at it from the point of view of a transport designer helps. When a transport needs reliable and ordered delivery, it needs sequence numbers. DTLS numbers each message, while TCP numbers the bytes. The first task is therefore to agree on where the sequence number should start. If TCP decided to start at zero every time, two things would go wrong:

  1. Attackers could guess the sequence number of a connection that is identified by a four-tuple, then send fake packets to disrupt it (tools like tcpkill rely on forged RST packets).
  2. A new TCP connection might receive old packets from a previous connection that used the same four-tuple, especially on a busy server or when some kernel options are turned off.

I once worked on a project that pulled KCP into NGINX. KCP has no built-in handshake, so its sequence numbers always start from zero. During a late-night call a customer reported that reconnecting broke the service: they kept the old KCP context and reused it on the new connection, so the new requests did not start at zero. Without a handshake, that bookkeeping was theirs to manage.

Back to TCP: the handshake also confirms both ends are real. If an attacker spoofs the source address in a TCP SYN and there is no handshake, the server will send its responses to an innocent host and waste its own CPU and memory. With three steps, both sides prove they control their addresses before real data is sent.

The handshake also negotiates transport parameters such as window size, MSS, and options like SACK. These parameters are key to performance.

Next we can look at what TCP taught us. TCP suffers from weak security. On-path attackers can still mess with TCP packets, which is why TLS appeared. TLS uses DH-style key agreement (e.g., ECDH) and certificates to block those attacks, but TLS only protects the payload on top of TCP. Attacks on the TCP link itself still work. On top of that, a client needs one RTT for TCP and one RTT for TLS 1.3 before it can send real data. Two RTTs feel slow.

Can We Merge TCP and TLS Handshakes

People have proposed merging the two handshakes, but TCP has hardened over time and kernel upgrades are hard, not to mention backward compatibility. Most proposals stayed as drafts.

It is still fun to think about. If the traffic were encrypted (even though only the later part of the TLS handshake is protected), sequence numbers could safely start from zero, because an attacker could no longer read or forge them. Stale packets from an earlier connection on the same four-tuple would also be harmless, because the MAC used by the encryption would reject them. But simply stacking the TLS handshake on top of the TCP handshake does not unlock those benefits.

If we insist on merging them and still want a three-step handshake, new problems appear. TCP’s three steps verify that both sides are real, but TLS would have to trust the peer before the third step, which is unsafe. Think about SYN-flood attacks: the attacker wants the server to burn CPU. TLS consumes even more CPU than TCP, so a combined handshake would be risky.

So a simple merge does not fix much and brings more trouble. QUIC was designed in the last decade, so it could absorb these lessons and start fresh. Let us see how QUIC handles the handshake.

How QUIC Builds Its Handshake


The problem statement can be said in one sentence: QUIC must provide a reliable byte stream for QUIC-TLS, and QUIC-TLS must provide security for QUIC. QUIC’s answer is to fully merge with TLS 1.3 so that each part helps the other.

How QUIC Provides a Reliable Byte Stream for TLS 1.3

After the handshake, TLS uses the negotiated symmetric keys to encrypt and decrypt QUIC payloads. You can think of QUIC as the TLS Record layer. QUIC’s ARQ implementation keeps the TLS handshake and later data running on a reliable byte stream.

During the handshake, QUIC carries TLS handshake data inside QUIC Crypto Frames. Each frame has offset and length so that TLS 1.3 sees the same ordered stream it expects from TCP.
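
To make that concrete, here is a minimal sketch, not feather-quic’s actual code, of how CRYPTO fragments keyed by offset can be stitched back into the ordered byte stream that TLS consumes:

```rust
use std::collections::BTreeMap;

// Buffers out-of-order CRYPTO fragments and hands TLS contiguous bytes.
#[derive(Default)]
struct CryptoStream {
    pending: BTreeMap<u64, Vec<u8>>, // fragments keyed by stream offset
    consumed: u64,                   // contiguous bytes already given to TLS
}

impl CryptoStream {
    fn on_crypto_frame(&mut self, offset: u64, data: &[u8]) {
        // Real code must handle overlapping and duplicate ranges properly.
        self.pending.insert(offset, data.to_vec());
    }

    /// Pop the next contiguous chunk, or None if there is still a gap.
    fn next_contiguous(&mut self) -> Option<Vec<u8>> {
        let off = *self.pending.keys().next()?;
        if off > self.consumed {
            return None; // gap: wait for the missing bytes
        }
        let data = self.pending.remove(&off).unwrap();
        let skip = (self.consumed - off) as usize;
        if skip >= data.len() {
            return None; // fully duplicate fragment, drop it
        }
        let chunk = data[skip..].to_vec();
        self.consumed += chunk.len() as u64;
        Some(chunk)
    }
}
```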

QUIC also provides a frame abstraction. Many control signals that live inside the TCP header (SYN, FIN, ACK, sequence numbers) are split into different frames in QUIC, so control and data are decoupled. QUIC packet numbers never try to describe the order of the carried data. Each frame that needs ordering—for example a Crypto frame—tracks its own offsets. This helps multiplexed streams, RTT calculation, reliability logic, and much more (I will cover those in later posts).
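
A rough sketch of that abstraction (the variants and fields below are illustrative, not the exhaustive list from RFC 9000):

```rust
// Control signals that TCP packs into header bits become separate frame
// types in QUIC, so a packet is just a container of frames.
enum Frame {
    Padding,
    Ping,
    Ack {
        largest_acked: u64,
        ack_delay: u64,
        ranges: Vec<(u64, u64)>,
    },
    Crypto {
        offset: u64,
        data: Vec<u8>,
    },
    ConnectionClose {
        error_code: u64,
        reason: String,
    },
    // ... STREAM, MAX_DATA, NEW_CONNECTION_ID, and so on
}
```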

[Figure: arch-01]

As a transport, QUIC still has to negotiate transport parameters to improve performance. Instead of storing them in a packet header, QUIC sends them through TLS as Transport Parameters. The TLS handshake is the only way those parameters travel, which again shows the tight fusion between QUIC and TLS.
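
For illustration, a transport parameter on the wire is just a variable-length integer id, a varint length, and the value bytes, all wrapped into a TLS extension. Below is a simplified sketch of that encoding; the helper names are mine, and a real encoder must also reject values of 2^62 or more.

```rust
// QUIC's variable-length integer encoding (RFC 9000 §16): the two high bits
// of the first byte say whether the value occupies 1, 2, 4, or 8 bytes.
fn encode_varint(v: u64, out: &mut Vec<u8>) {
    if v < 1 << 6 {
        out.push(v as u8);
    } else if v < 1 << 14 {
        out.extend_from_slice(&((v as u16 | 0x4000).to_be_bytes()));
    } else if v < 1 << 30 {
        out.extend_from_slice(&((v as u32 | 0x8000_0000).to_be_bytes()));
    } else {
        out.extend_from_slice(&((v | 0xc000_0000_0000_0000).to_be_bytes()));
    }
}

// One transport parameter: varint id, varint length, value bytes.
fn encode_transport_parameter(id: u64, value: &[u8], out: &mut Vec<u8>) {
    encode_varint(id, out);
    encode_varint(value.len() as u64, out);
    out.extend_from_slice(value);
}

// Example: initial_max_data (id 0x04), whose value is itself a varint.
fn encode_initial_max_data(limit: u64, out: &mut Vec<u8>) {
    let mut value = Vec::new();
    encode_varint(limit, &mut value);
    encode_transport_parameter(0x04, &value, out);
}
```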

How QUIC Uses TLS 1.3 to Encrypt

TLS messages can be roughly grouped into TLS unencrypted Handshake, TLS encrypted Handshake, and Application Data (ignoring 0-RTT and Key Update for now). QUIC follows the same idea but separates packets into three spaces: initial, handshake, and application. Each space has its own packet number sequence and its own ARQ state.
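
One possible way to model that (the names here are illustrative, not feather-quic’s actual types): each space keeps its own packet number counter, its own ACK state, and its own keys.

```rust
// Each encryption level gets its own packet number space with independent
// counters, ACK tracking, and keys.
#[derive(Clone, Copy)]
enum PacketNumberSpace {
    Initial = 0,
    Handshake = 1,
    Application = 2,
}

#[derive(Default)]
struct SpaceState {
    next_pn: u64,               // next packet number to send in this space
    largest_acked: Option<u64>, // highest packet number the peer acknowledged
    // ... plus this space's keys and its sent-but-unacked frames
}

struct Connection {
    spaces: [SpaceState; 3],
}

impl Connection {
    fn space_mut(&mut self, space: PacketNumberSpace) -> &mut SpaceState {
        &mut self.spaces[space as usize]
    }
}
```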

QUIC encrypts each packet payload with the symmetric keys that TLS 1.3 negotiates. The three spaces use different keys. The initial space is almost in plain text because the key is derived from a public salt in the RFC—think of it as “keep honest people honest.” The handshake and application keys come from TLS’s handshake and application secrets and are derived again with HKDF as required by QUIC-TLS.
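
Here is a minimal sketch of the client-side Initial key derivation with ring’s HKDF, roughly following RFC 9001 Section 5.2. The helper names (OutLen, hkdf_expand_label, derive_client_initial_keys) are mine, not feather-quic’s; real code also needs the server secret, error handling, and key updates.

```rust
use ring::hkdf;

// ring sizes HKDF output through a KeyType, so wrap the desired length.
struct OutLen(usize);
impl hkdf::KeyType for OutLen {
    fn len(&self) -> usize {
        self.0
    }
}

// TLS 1.3 HKDF-Expand-Label (RFC 8446 Section 7.1): the info is a HkdfLabel
// structure made of the output length, "tls13 " + label, and an empty context.
fn hkdf_expand_label(prk: &hkdf::Prk, label: &[u8], out: &mut [u8]) {
    let mut info = Vec::new();
    info.extend_from_slice(&(out.len() as u16).to_be_bytes());
    let full_label = [b"tls13 ".as_slice(), label].concat();
    info.push(full_label.len() as u8);
    info.extend_from_slice(&full_label);
    info.push(0); // zero-length context
    prk.expand(&[info.as_slice()], OutLen(out.len()))
        .expect("hkdf expand")
        .fill(out)
        .expect("fill output");
}

/// Derive (key, iv, hp) for client Initial packets from the client's
/// Destination Connection ID, per RFC 9001 Section 5.2 (QUIC v1).
fn derive_client_initial_keys(dcid: &[u8]) -> ([u8; 16], [u8; 12], [u8; 16]) {
    // Fixed public salt for QUIC v1 from RFC 9001 Section 5.2.
    const INITIAL_SALT: [u8; 20] = [
        0x38, 0x76, 0x2c, 0xf7, 0xf5, 0x59, 0x34, 0xb3, 0x4d, 0x17,
        0x9a, 0xe6, 0xa4, 0xc8, 0x0c, 0xad, 0xcc, 0xbb, 0x7f, 0x0a,
    ];
    // initial_secret = HKDF-Extract(initial_salt, client_dcid)
    let initial_secret = hkdf::Salt::new(hkdf::HKDF_SHA256, &INITIAL_SALT).extract(dcid);

    // client_initial_secret = HKDF-Expand-Label(initial_secret, "client in", "", 32)
    let mut client_secret = [0u8; 32];
    hkdf_expand_label(&initial_secret, b"client in", &mut client_secret);

    // Re-wrap the secret as a PRK so the traffic keys can be expanded from it.
    let prk = hkdf::Prk::new_less_safe(hkdf::HKDF_SHA256, &client_secret);
    let (mut key, mut iv, mut hp) = ([0u8; 16], [0u8; 12], [0u8; 16]);
    hkdf_expand_label(&prk, b"quic key", &mut key); // AEAD key (AES-128-GCM)
    hkdf_expand_label(&prk, b"quic iv", &mut iv);   // AEAD IV, XORed with the packet number
    hkdf_expand_label(&prk, b"quic hp", &mut hp);   // header protection key
    (key, iv, hp)
}
```

The output can be sanity-checked against the sample packets in Appendix A of RFC 9001. The handshake and application keys use the same expand-label step, just starting from the secrets that TLS hands over instead of the salted Destination Connection ID.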

Protecting the payload is not enough, so QUIC also adds Header Protection (HP) to hide the packet number and some flag bits. HP uses AES-ECB, which is lighter than the AEAD mode used for payloads. It has some security limits (I could not find AES-ECB inside ring because it is considered unsafe), but it is fast and good enough for headers.

[Figure: arch-01]

When building a packet, QUIC encrypts the payload first and then applies header protection. When parsing, it removes HP before decrypting the payload. This way the QUIC layer can know the packet type and packet number early, which helps with duplicate detection and other transport logic.
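
As an illustration of that step, here is what the header-protection mask can look like with a generic AES crate (the post only says “an AES crate”, so treat these imports as an assumption, not feather-quic’s real dependencies). The mask comes from AES-ECB-encrypting a 16-byte sample of the already encrypted payload; its first byte hides a few flag bits and the rest hide the packet number bytes.

```rust
use aes::Aes128;
use aes::cipher::{generic_array::GenericArray, BlockEncrypt, KeyInit};

// RFC 9001 §5.4: the 16-byte sample is taken from the ciphertext starting
// 4 bytes past the packet number offset; encrypting it with the hp key
// yields the mask.
fn header_protection_mask(hp_key: &[u8; 16], sample: &[u8; 16]) -> [u8; 5] {
    let cipher = Aes128::new(GenericArray::from_slice(hp_key));
    let mut block = GenericArray::clone_from_slice(sample);
    cipher.encrypt_block(&mut block); // single-block AES-ECB
    let mut mask = [0u8; 5];
    mask.copy_from_slice(&block[..5]);
    mask
}

/// Apply the mask after the payload has been encrypted; the receiver
/// computes the same mask and XORs it off before decrypting the payload.
fn apply_header_protection(packet: &mut [u8], pn_offset: usize, pn_len: usize, mask: &[u8; 5]) {
    if (packet[0] & 0x80) != 0 {
        packet[0] ^= mask[0] & 0x0f; // long header: mask the 4 low bits
    } else {
        packet[0] ^= mask[0] & 0x1f; // short header: mask the 5 low bits
    }
    for i in 0..pn_len {
        packet[pn_offset + i] ^= mask[1 + i];
    }
}
```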

QUIC Retry

So far the handshake can provide transport semantics and TLS security, but one more problem pops up. A TCP three-way handshake can verify that the peer is real, but the QUIC-TLS handshake merges TLS messages into those three steps, so when we run the TLS 1.3 handshake the server still does not know if the client is real. A TLS 1.3 handshake costs a lot of server resources. QUIC does limit the server to three times the client’s handshake bytes, yet attackers can still try amplification attacks.

AEAD’s MAC check cannot help during the handshake because the peers have not agreed on keys yet. QUIC therefore adds a Retry step for address validation. It is similar in spirit to TCP’s request/response check, but instead of relying on sequence numbers, QUIC uses tokens and an integrity tag. The tag is an AEAD-style authenticity check, so a client can confirm a Retry packet really came from the server it contacted.
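
Here is a hedged sketch of how that tag can be computed with ring’s AES-128-GCM, following the Retry pseudo-packet layout in RFC 9001 Section 5.8. The fixed key and nonce for QUIC v1 are published in that section; I pass them in as parameters instead of hardcoding them, and the function name is mine.

```rust
use ring::aead::{Aad, LessSafeKey, Nonce, UnboundKey, AES_128_GCM};

/// Compute the 16-byte Retry integrity tag. The tag is an AEAD tag over an
/// empty plaintext, with the "Retry pseudo-packet" (original DCID length,
/// original DCID, then the Retry packet without its tag) as the AAD.
fn compute_retry_tag(
    fixed_key: &[u8; 16],   // fixed key from RFC 9001 §5.8
    fixed_nonce: &[u8; 12], // fixed nonce from RFC 9001 §5.8
    odcid: &[u8],
    retry_without_tag: &[u8],
) -> [u8; 16] {
    // Build the pseudo-packet the tag covers.
    let mut pseudo = Vec::with_capacity(1 + odcid.len() + retry_without_tag.len());
    pseudo.push(odcid.len() as u8);
    pseudo.extend_from_slice(odcid);
    pseudo.extend_from_slice(retry_without_tag);

    let key = LessSafeKey::new(UnboundKey::new(&AES_128_GCM, fixed_key).unwrap());
    let nonce = Nonce::assume_unique_for_key(*fixed_nonce);
    // Empty plaintext: the "ciphertext" is just the 16-byte tag.
    let mut empty = Vec::new();
    let tag = key
        .seal_in_place_separate_tag(nonce, Aad::from(pseudo.as_slice()), &mut empty)
        .unwrap();
    let mut out = [0u8; 16];
    out.copy_from_slice(tag.as_ref());
    out
}
```

The client recomputes the tag over the Retry it received, using the Destination Connection ID it originally sent as the ODCID, and drops the packet if the tags do not match.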

When we shipped our own UDP-based transport years ago, we made the same mistake and left an amplification hole; a teammate spotted it and fixed it quickly. DTLS solves this in a similar way: DTLS 1.2 uses HelloVerifyRequest with a cookie to validate the client’s address, and DTLS 1.3 folds that cookie into HelloRetryRequest, which also re-negotiates crypto parameters. The idea is quite close to QUIC Retry.

[Figure: arch-01]

Coding Notes and Ending


Now to the code. I did not use rustls or OpenSSL for QUIC-TLS. Instead I used ring and an AES crate to build the needed crypto pieces. As I wrote before, QUIC handshakes are fun, and using a ready-made SSL stack would remove too much joy. The cost is that I cannot reuse the stability and security guarantees of a real SSL library, but that matches the spirit of this side project. Reading the NGINX QUIC implementation in the past gave me the confidence to try this route because NGINX can work even when OpenSSL does not support QUIC.

My first task was to derive the initial keys according to the RFC. That part was painful because I am not familiar with every crypto detail. During debugging the NGINX server told me my keys were wrong, so I read OpenSSL and rustls code side-by-side until I found the bug. Once the initial key derivation worked, the same function could derive the handshake and application keys by using the TLS-provided secrets instead of the RFC salt.

When I implemented QUIC Retry, I still had no TLS 1.3 handshake code, so for debugging I forged the TLS payload inside Crypto frames. Luckily NGINX did not check the early Crypto contents strictly (or OpenSSL’s SSL_do_handshake had not had a chance to complain yet), so I could verify my Retry logic easily.

There are many other QUIC details (packet length encoding, packet number compression, and so on) that I have to study carefully in the RFC even if I did not mention them here. I also added many CLI arguments to the client so that I can test different scenarios, such as customizing the first handshake packet length or setting the Source Connection ID and the Original Destination Connection ID.
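
As one example of those details, packet number compression means the wire carries only the low bits of the packet number, and the receiver reconstructs the full value from the largest number it has seen so far. A sketch of the decoding logic described in RFC 9000 Appendix A (the function name is mine):

```rust
/// Reconstruct a full packet number from its truncated form.
/// `pn_nbits` is 8 times the number of packet number bytes on the wire.
fn decode_packet_number(largest_pn: u64, truncated_pn: u64, pn_nbits: u32) -> u64 {
    let expected_pn = largest_pn + 1;
    let pn_win = 1u64 << pn_nbits;
    let pn_hwin = pn_win / 2;
    let pn_mask = pn_win - 1;
    // Keep the high bits of the expected number, replace the low bits.
    let candidate_pn = (expected_pn & !pn_mask) | truncated_pn;
    if candidate_pn + pn_hwin <= expected_pn && candidate_pn < (1u64 << 62) - pn_win {
        candidate_pn + pn_win // candidate fell a whole window behind
    } else if candidate_pn > expected_pn + pn_hwin && candidate_pn >= pn_win {
        candidate_pn - pn_win // candidate jumped a whole window ahead
    } else {
        candidate_pn
    }
}
```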

Although this post only covers the handshake, I still had to design the basic send queue for feather-quic. I had to choose whether the queue tracks QUIC packets or QUIC frames. Because QUIC will need retransmissions for ARQ and QUIC-TLS allows Key Update, retransmitting a packet may require re-encrypting it. Therefore the send queue must store frames, not packets. Frames can be packed by priority into any packet and each frame can track whether the peer acknowledged it, which makes future features possible.
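
A rough sketch of that design, reusing the Frame and PacketNumberSpace sketches from earlier; none of this is feather-quic’s exact code:

```rust
// The send queue tracks frames, not packets, so a lost frame can be packed
// into a fresh packet and encrypted with whatever keys are current.
struct QueuedFrame {
    frame: Frame,
    space: PacketNumberSpace,
    acked: bool,               // set once a packet carrying this frame is ACKed
    last_sent_in: Option<u64>, // packet number of the latest transmission
}

struct SendQueue {
    frames: Vec<QueuedFrame>,
}

impl SendQueue {
    /// When a packet is declared lost, pull its still-unacked frames back out;
    /// they will be re-packed into a new packet and re-encrypted at send time.
    fn frames_to_retransmit(&mut self, lost_pn: u64) -> Vec<&mut QueuedFrame> {
        self.frames
            .iter_mut()
            .filter(|f| !f.acked && f.last_sent_in == Some(lost_pn))
            .collect()
    }
}
```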

In short, the QUIC handshake is complex but worth learning. This post only talks about the QUIC-TLS integration and not the TLS 1.3 handshake itself. The related code sits in PR #1 and the handshake branch. The next post will cover the TLS 1.3 handshake implementation.