Implement QUIC from Scratch with Rust: Implement TLS 1.3 Handshake and QUIC-TLS Key Update

on 2025-01-10

A Simple Introduction to TLS 1.3

Although I frequently use TLS in my work and sometimes have to dig into OpenSSL to troubleshoot issues, I’m far from being an expert in security. Here, I’ll briefly describe the core aspects of TLS, why they work the way they do, and the security algorithms involved based on my limited understanding. For advanced algorithms, I will fully rely on the ring crate for implementation.

If I were to summarize TLS in one sentence: TLS uses asymmetric encryption to negotiate symmetric keys and then employs symmetric encryption to ensure upper-layer security. This raises the questions: what are asymmetric and symmetric keys? Why not just use symmetric keys directly? And how does TLS ensure security?

Symmetric encryption algorithms use the same key for both encrypting and decrypting data. Since they are primarily based on bitwise operations, they are computationally efficient, but both sides must already share the key, and anyone who learns that key can read the traffic. In contrast, asymmetric cryptography employs a pair of keys: a public key that can be shared openly and a private key that is kept secret. It solves the problem of establishing a key over an untrusted network, but it is computationally complex and slower. Therefore, TLS uses asymmetric cryptography during the handshake phase to negotiate a symmetric key for subsequent communication, combining safe key establishment with the performance benefits of symmetric encryption.

The most common asymmetric algorithms for key exchange are RSA and ECDH. The standout feature of ECDH with ephemeral keys (ECDHE) is its forward secrecy. Forward secrecy ensures that even if the long-term public-private key pair (the one bound to the certificate) is later compromised, the security of past communications remains intact. How is this achieved? In addition to the long-term key pair, each side generates a fresh ephemeral key pair for every connection. The ephemeral private keys are stored only in memory (never transmitted over the network or saved); only the ephemeral public keys are exchanged, and the math of Diffie-Hellman lets each side compute the same shared secret from its own private key and the peer's public key. In TLS 1.2, RSA key exchange and ECDHE are both options during negotiation, whereas TLS 1.3 removes static RSA key exchange and requires an ephemeral (EC)DHE exchange for a full handshake.

Symmetric encryption algorithms include AES-GCM, AES-CBC+HMAC, and ChaCha20-Poly1305. A key takeaway is that AEAD algorithms have become the mainstream choice. AEAD algorithms provide encryption and integrity protection in a single operation, unlike AES-CBC+HMAC, where encryption and authentication are separate steps, so it is not considered AEAD. In TLS 1.3, AEAD cipher suites are mandatory, and less secure or less efficient algorithms such as 3DES and AES-CBC are removed. Out of curiosity, I also explored the differences between ChaCha20-Poly1305 and AES-GCM. While AES-GCM is typically preferred due to hardware acceleration, ChaCha20-Poly1305 tends to outperform it in environments lacking AES hardware acceleration, such as some mobile devices or low-power systems.

TLS employs key derivation algorithms to expand the negotiated key. Before starting this project, I wasn’t even aware of such a concept. On reflection, it makes perfect sense: asymmetric key negotiation usually produces only a single key. However, TLS requires multiple encryption phases with distinct keys for better security, and using the same negotiation process for each would be impractical. Key derivation algorithms not only offer flexibility but also enhance security by iteratively deriving keys from the original key. TLS 1.3 adopts the HKDF key derivation algorithm, an improvement over TLS 1.2’s PRF algorithm. Both are HMAC-based, but HKDF is reportedly more secure and flexible. However, analyzing their specific differences is currently beyond my bandwidth.

Lastly, digital certificates are essential for ensuring the security of public keys during asymmetric key negotiation. While indispensable to TLS, I skipped implementing certificate verification in my project. Given the challenges of correctly using HKDF to derive TLS 1.3 keys, I opted to sidestep the complexity of digital certificate validation.

TLS 1.3 vs. TLS 1.2 Handshake Details

Let’s compare handshake details between TLS 1.3 and TLS 1.2. This is critical since I’m developing a simple TLS 1.3 client, where the handshake process is paramount. The biggest difference is TLS 1.3 simplifies the handshake, requiring only one RTT (Round Trip Time), whereas TLS 1.2 requires two. I believe this optimization isn’t solely due to TLS 1.3 using ECDH but also because TLS 1.3 employs a "weak negotiation" approach compared to TLS 1.2's "strong negotiation."

To elaborate: TLS 1.2 lists all supported asymmetric algorithms and cipher suites at the handshake initiation and waits for the server to confirm before proceeding with key negotiation. TLS 1.3, on the other hand, lists supported algorithms but proactively initiates negotiation with a selected subset instead of waiting for server confirmation.

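This proactive approach streamlines the process. Extensions like key_share facilitate this early negotiation by carrying the client's ephemeral public key in the very first message. Moreover, removing insecure algorithms in TLS 1.3 reduces choices, increasing the likelihood that the client's guess is accepted. The HelloRetryRequest mechanism acts as a fallback: if the server does not support the group the client guessed, it asks the client to try again with a different key share, at the cost of an extra round trip.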
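To make the "proactive" part concrete, here is a minimal sketch of how a client might encode a key_share extension carrying a single X25519 public key. The function name and constants are my own illustration, not code from feather-quic, and the public key itself would come from a crypto library such as ring.

```rust
/// Named group identifier for X25519 in the TLS registry.
const GROUP_X25519: u16 = 0x001d;
/// Extension type number for key_share.
const EXT_KEY_SHARE: u16 = 51;

/// Encode a ClientHello key_share extension with one X25519 entry.
/// (Hypothetical helper for illustration.)
fn encode_key_share_extension(public_key: &[u8; 32]) -> Vec<u8> {
    // KeyShareEntry: group (2 bytes) + key_exchange length (2 bytes) + key.
    let mut entry = Vec::with_capacity(4 + public_key.len());
    entry.extend_from_slice(&GROUP_X25519.to_be_bytes());
    entry.extend_from_slice(&(public_key.len() as u16).to_be_bytes());
    entry.extend_from_slice(public_key);

    // client_shares is a list with a 2-byte length prefix.
    let mut body = Vec::with_capacity(2 + entry.len());
    body.extend_from_slice(&(entry.len() as u16).to_be_bytes());
    body.extend_from_slice(&entry);

    // Extension header: type (2 bytes) + extension_data length (2 bytes).
    let mut ext = Vec::with_capacity(4 + body.len());
    ext.extend_from_slice(&EXT_KEY_SHARE.to_be_bytes());
    ext.extend_from_slice(&(body.len() as u16).to_be_bytes());
    ext.extend_from_slice(&body);
    ext
}
```

Because the public key is already in the first flight, the server can compute the shared secret as soon as it accepts the group, which is where the saved round trip comes from.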

Additionally, the handshake process in TLS 1.3 is more straightforward. TLS 1.2 includes a ChangeCipherSpec message to indicate that subsequent handshake messages will be encrypted using the negotiated symmetric key. TLS 1.3 no longer needs this signal (the message survives only as an optional dummy for middlebox compatibility), because the switch to encryption is implied by the handshake itself. After the client_hello and server_hello messages (the plaintext key exchange), both sides can derive the handshake keys, and subsequent messages like encrypted_extensions are encrypted. Once the finished message is verified, the handshake is complete, rendering ChangeCipherSpec unnecessary.

Finally, TLS 1.3 mandates strict key disposal after each phase, enhancing forward secrecy. For example, QUIC-TLS specifies explicit disposal times for Initial and Handshake keys.

Key Update

Key Update: Principles and Purpose

The Key Update mechanism was introduced in TLS 1.3 to periodically refresh the symmetric keys used for communication. This enhances the security of TLS by preventing prolonged use of the same key. A concrete reason is that AEAD encryption algorithms have usage limits: encrypting too much data under a single key weakens their security guarantees. You can refer to the AEAD encryption algorithm limits for more details. The principle behind Key Update is straightforward: it uses the HKDF key derivation algorithm to derive a new traffic secret from the current one, following specific protocol rules, and the new keys are then derived from that secret.
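As a rough illustration of how a usage limit can drive Key Update, the sketch below counts packets protected with the current send key and reports when an update is due. The 2^23 figure is the AES-GCM confidentiality limit given in the QUIC-TLS specification; the struct, its method names, and the "update at half the limit" policy are assumptions of mine, not something the specification or feather-quic prescribes.

```rust
/// Confidentiality limit for AES-GCM packet protection in QUIC:
/// 2^23 protected packets per key.
const AES_GCM_CONFIDENTIALITY_LIMIT: u64 = 1 << 23;

/// Hypothetical counter for packets protected with the current send key.
struct SendKeyUsage {
    packets_protected: u64,
    update_threshold: u64,
}

impl SendKeyUsage {
    /// Assumption: start a key update at half the limit to leave headroom.
    fn new(confidentiality_limit: u64) -> Self {
        Self {
            packets_protected: 0,
            update_threshold: confidentiality_limit / 2,
        }
    }

    /// Record one protected packet; returns true once a key update is due.
    fn record_packet(&mut self) -> bool {
        self.packets_protected += 1;
        self.packets_protected >= self.update_threshold
    }

    /// Call after switching to the next key generation.
    fn reset(&mut self) {
        self.packets_protected = 0;
    }
}
```

A sender would create this with `SendKeyUsage::new(AES_GCM_CONFIDENTIALITY_LIMIT)`, call `record_packet` for every packet it protects, and initiate a Key Update the first time it returns true.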

Key Update Implementation in TLS 1.3 vs. QUIC

While this could conclude the explanation of Key Update, there’s a significant difference between how it is implemented in TLS 1.3 and QUIC, which is worth exploring. QUIC optimizes the process, and I spent extra time understanding its design details while working on a QUIC Key Update implementation. Let me summarize my findings.

Above, I explained how Key Update replaces symmetric keys using the HKDF derivation algorithm. However, a critical question remains: What triggers a Key Update?

TLS 1.3 Key Update Trigger Mechanism

In TLS 1.3, Key Update can be triggered after the handshake is complete—that is, after both sides have received their respective finished messages. At this point, either party can send a key_update message to notify the peer that the key needs updating.

The sender begins encrypting subsequent data with the updated key immediately after sending the key_update message.
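The receiver, upon receiving this message, updates its receiving key. If the message asks for it (request_update set to update_requested), the receiver also replies with its own key_update message and updates its sending key.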
This ensures both parties use the new key for all future communications.

Since TLS operates over a reliable, full-duplex channel, the explicit exchange of key_update messages ensures clarity. Even if both sides coincidentally trigger a Key Update simultaneously, the protocol remains unaffected. The worst-case scenario is both sides updating the key twice, which is harmless.
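The exchange described above can be sketched as a small piece of bookkeeping. This is only an illustration under my own assumptions: the KeyGenerations struct and its method names are hypothetical, and the actual key derivation is left out so that only the message flow is visible. The wire encoding (handshake type 24, a 3-byte length, and a 1-byte request_update field) follows the TLS 1.3 message layout.

```rust
/// request_update values carried in a TLS 1.3 KeyUpdate message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KeyUpdateRequest {
    UpdateNotRequested = 0,
    UpdateRequested = 1,
}

/// Hypothetical bookkeeping: one key generation counter per direction.
#[derive(Debug, Default)]
struct KeyGenerations {
    send: u64,
    recv: u64,
}

/// Encode a KeyUpdate handshake message:
/// type 24, 3-byte length (1), then the 1-byte request_update field.
fn encode_key_update(request: KeyUpdateRequest) -> [u8; 5] {
    [24, 0, 0, 1, request as u8]
}

impl KeyGenerations {
    /// Local update: the message goes out under the old key,
    /// then the sending key moves to the next generation.
    fn initiate(&mut self, request: KeyUpdateRequest) -> [u8; 5] {
        let message = encode_key_update(request);
        self.send += 1;
        message
    }

    /// Peer's KeyUpdate arrived: advance the receiving key and
    /// answer with our own KeyUpdate only if the peer asked for one.
    fn on_receive(&mut self, request: KeyUpdateRequest) -> Option<[u8; 5]> {
        self.recv += 1;
        match request {
            KeyUpdateRequest::UpdateRequested => {
                Some(self.initiate(KeyUpdateRequest::UpdateNotRequested))
            }
            KeyUpdateRequest::UpdateNotRequested => None,
        }
    }
}
```

If both endpoints call `initiate` with `UpdateRequested` at the same moment, each one also answers the other's request, so both counters on each side end up at 2. That is the harmless "updated twice" case.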

Challenges in Applying TLS Key Update to QUIC

However, the TLS 1.3 Key Update mechanism cannot be directly reused in QUIC-TLS because QUIC integrates TLS tightly within its transport layer and does not rely on a reliable byte stream. Using explicit key_update messages like TLS 1.3 would be inappropriate.

Instead, QUIC adopts an elegant solution. In the QUIC Short Header Packet, which is used for application-level data, there’s a dedicated key_phase bit that indicates whether a key update has occurred. Since Key Update only happens after the handshake, only short packets carry this bit.

For security reasons, the key_phase bit is protected by QUIC’s Header Protection. A common question here is: Does Key Update affect the Header Protection key? The answer is no—Key Update does not change the Header Protection key. This ensures the receiver can correctly process the key_phase bit.
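Here is a minimal sketch of reading the key_phase bit once the header protection mask is known. The function names are hypothetical; the bit positions follow the QUIC short header layout, where header protection covers the low five bits of the first byte and key_phase is the 0x04 bit. Computing the mask itself (from the header protection key and a sample of the ciphertext) is out of scope here.

```rust
/// Top bit of the first byte: 1 means long header, 0 means short header.
const HEADER_FORM_LONG: u8 = 0x80;
/// Key phase bit in the first byte of a short header packet.
const KEY_PHASE_BIT: u8 = 0x04;
/// For short headers, header protection masks the low five bits.
const SHORT_HEADER_PROTECTED_BITS: u8 = 0x1f;

/// Remove header protection from the first byte of a short header packet.
/// `mask0` is the first byte of the header protection mask.
fn unmask_short_header_first_byte(protected_first: u8, mask0: u8) -> u8 {
    protected_first ^ (mask0 & SHORT_HEADER_PROTECTED_BITS)
}

/// Read the key phase from an unprotected first byte.
/// Returns None for long header packets, which carry no key phase bit.
fn key_phase(unprotected_first: u8) -> Option<bool> {
    if (unprotected_first & HEADER_FORM_LONG) != 0 {
        return None;
    }
    Some((unprotected_first & KEY_PHASE_BIT) != 0)
}
```

Since the mask comes from the header protection key, which stays the same across updates, the receiver can always recover this bit before deciding which packet protection key to use.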

Handling Packet Reordering in QUIC Key Update

A key challenge with QUIC’s approach is handling packet reordering. For example, if the sender triggers a Key Update and signals the receiver via the key_phase bit, but an old packet encrypted with the previous key arrives late, how should this be handled?

The solution suggested in the QUIC RFC is to delay discarding the old key. For example, the old key can be retained for a period, such as three times the Probe Timeout (PTO), to allow for late packets to be decrypted. This minimizes the impact on transmission performance.
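The retention window can be sketched with a simple deadline. The OldReadKeys type and its methods are hypothetical names for illustration; the key material is omitted and only the timing logic is shown, assuming the PTO value is supplied by loss recovery.

```rust
use std::time::{Duration, Instant};

/// Hypothetical holder for the previous generation of read keys.
/// Only the discard deadline is modeled; the key material is omitted.
struct OldReadKeys {
    discard_at: Instant,
}

impl OldReadKeys {
    /// Start the retention window when the first packet protected with
    /// the new keys is received: keep the old keys for three times the PTO.
    fn retire(first_new_key_packet_at: Instant, pto: Duration) -> Self {
        Self {
            discard_at: first_new_key_packet_at + pto * 3,
        }
    }

    /// True while late packets may still be decrypted with the old keys.
    fn usable_at(&self, now: Instant) -> bool {
        now < self.discard_at
    }
}
```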

Ensuring Correct Key Update Behavior in QUIC

Another question I had early on was whether the key_phase bit alone could handle all scenarios. What happens if a delayed packet encrypted with an old key arrives after the receiver has already updated its key?

The answer lies in a simple rule: a flipped key_phase bit is only a hint. The receiver derives the next-generation key and tries to decrypt the packet with it, and only a packet that decrypts successfully with that next key triggers a Key Update.

If a packet shows a flipped key_phase bit but cannot be decrypted with the next key, it is dropped and no key update is triggered.
Similarly, a late packet from the previous key phase (recognizable by its lower packet number) is decrypted with the retained old key; if the old key has already been discarded, the packet is dropped.

This rule ensures that even if both sides simultaneously initiate a Key Update, the process remains seamless. Following these principles, QUIC effectively manages Key Update and ensures robust communication.
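Below is a minimal sketch of the receive-side decision, based on my reading of the QUIC-TLS specification: the key_phase bit and the packet number together select which key generation to try, and state only changes after a successful decryption. All type and method names are hypothetical, and the AEAD decryption itself is left out.

```rust
/// Which generation of read keys to try for an incoming packet.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KeyChoice {
    Previous,
    Current,
    Next,
}

/// Hypothetical receive-side state for QUIC key updates.
struct ReceiveKeyState {
    current_phase: bool,
    /// Lowest packet number decrypted so far in the current phase.
    lowest_pn_in_current_phase: Option<u64>,
    /// Whether the previous generation of keys is still retained.
    previous_keys_retained: bool,
}

impl ReceiveKeyState {
    fn new() -> Self {
        Self {
            current_phase: false,
            lowest_pn_in_current_phase: None,
            previous_keys_retained: false,
        }
    }

    /// Pick the keys to try. None means the packet cannot be decrypted
    /// (the old keys it needs are already gone) and should be dropped.
    fn select_keys(&self, key_phase: bool, packet_number: u64) -> Option<KeyChoice> {
        if key_phase == self.current_phase {
            return Some(KeyChoice::Current);
        }
        match self.lowest_pn_in_current_phase {
            // Older than anything in the current phase: a reordered packet.
            Some(lowest) if packet_number < lowest => {
                if self.previous_keys_retained {
                    Some(KeyChoice::Previous)
                } else {
                    None
                }
            }
            // Otherwise the peer may have started a key update.
            _ => Some(KeyChoice::Next),
        }
    }

    /// Call only after the packet was decrypted successfully.
    fn on_decrypt_success(&mut self, choice: KeyChoice, packet_number: u64) {
        match choice {
            KeyChoice::Next => {
                // The update is accepted: the next keys become current.
                self.current_phase = !self.current_phase;
                self.lowest_pn_in_current_phase = Some(packet_number);
                self.previous_keys_retained = true;
            }
            KeyChoice::Current => {
                let lowest = self
                    .lowest_pn_in_current_phase
                    .map_or(packet_number, |l| l.min(packet_number));
                self.lowest_pn_in_current_phase = Some(lowest);
            }
            KeyChoice::Previous => {}
        }
    }
}
```

The important property is that `select_keys` never mutates anything: a forged or corrupted packet with a flipped bit fails decryption and leaves the state exactly as it was.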

SSLKEYLOG

When debugging TLS traffic, you’ve probably used an SSL library to export an SSLKEYLOG file and then loaded it into tools like Wireshark to decrypt the TLS traffic. It’s incredibly handy. Of course, many modern eBPF-based tools can directly capture unencrypted traffic from within the SSL library or application itself, but that’s not our focus today.

To make debugging feather-quic easier, I also added support for generating SSLKEYLOG files. Honestly, I never paid much attention to the specific contents of this file when using it; I just knew that it stores symmetric keys. To implement SSLKEYLOG support, I referred to its specification document.

The document clearly states that each line in the file represents a key, with three fields in each line:

Label: Indicates the type of secret, such as whether it’s for the handshake (for example CLIENT_HANDSHAKE_TRAFFIC_SECRET) or for application data after the handshake (for example CLIENT_TRAFFIC_SECRET_0). Because Key Update replaces keys, the application-layer labels end in a number that counts the updates sequentially. Thinking back, I’ve occasionally seen a lot of keys in SSLKEYLOG files and never paid much attention. Now it’s clear: Key Update is the culprit.
Client Random: This corresponds to the Random field in the client_hello message. It’s used to uniquely identify the TLS connection.
Secret: The actual secret, represented in hexadecimal. For TLS 1.3 this is the traffic secret from which the symmetric keys are derived, not the final key itself.

With this structure, it’s straightforward to generate SSLKEYLOG files during TLS 1.3 handshakes. This is invaluable for real-time debugging and troubleshooting.
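A minimal sketch of producing one such line is shown here. The helper names are hypothetical and this is not the feather-quic code; it only illustrates the three space-separated fields, with the client random and the secret written as lowercase hex.

```rust
/// Hex-encode bytes in lowercase, two characters per byte.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        out.push_str(&format!("{:02x}", b));
    }
    out
}

/// Format one key log line: "<label> <client_random> <secret>".
/// (Hypothetical helper for illustration.)
fn keylog_line(label: &str, client_random: &[u8; 32], secret: &[u8]) -> String {
    format!("{} {} {}\n", label, to_hex(client_random), to_hex(secret))
}
```

A client would append such a line to the file named by the SSLKEYLOGFILE environment variable each time a new secret is derived, for example with labels like CLIENT_HANDSHAKE_TRAFFIC_SECRET or SERVER_TRAFFIC_SECRET_0.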


Implementation Details

First, I genuinely appreciate the simplifications TLS 1.3 introduced compared to TLS 1.2—it made the implementation process much easier for me.

I also owe a lot to the ring crate, which handled the security algorithms I didn’t want to dive into myself. For instance, it provides X25519 for the ECDH asymmetric key exchange, as well as other complex algorithms like AES-GCM and HKDF, which I mentioned earlier.

The Pitfall with HKDF

However, I ran into a frustrating issue with HKDF, which cost me quite a bit of time. Looking back, the main reason was my unfamiliarity with the algorithm’s specifics. This led to mistakes when calling the interface provided by ring. My initial thought was simple: just supply the necessary inputs for HKDF, and it should work.

But the Prk::expand interface turned out to be unexpectedly cumbersome. Its info parameter, a list of byte slices, has to be assembled by hand in a somewhat archaic way. If this were C code, I wouldn’t have minded. But since it’s Rust, I feel like ring could at least provide a simple wrapper instead of leaving me to figure out how to construct it myself. 😭

It took me some time digging through the implementation in the rustls crate to get a better idea of how to use it. The documentation for the related interface was also quite sparse, offering no explanation about the info parameter and only linking to the HKDF RFC. It seems ring assumes only security professionals would use it—clearly not expecting someone like me, a security novice, to dabble with it.
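For reference, this is roughly what the info bytes look like for TLS 1.3's HKDF-Expand-Label: a 2-byte output length, a length-prefixed label that always starts with "tls13 ", and a length-prefixed context. The function below is a hypothetical helper written with the standard library only; it builds the bytes as one buffer and does not call ring.

```rust
/// Build the HkdfLabel bytes used as HKDF `info` in TLS 1.3:
///   uint16 length; opaque label<7..255>; opaque context<0..255>
/// where the label on the wire is "tls13 " followed by `label`.
/// (Hypothetical helper for illustration.)
fn build_hkdf_label(out_len: u16, label: &str, context: &[u8]) -> Vec<u8> {
    const PREFIX: &[u8] = b"tls13 ";
    let full_label_len = PREFIX.len() + label.len();
    // Both variable-length fields use a single length byte.
    assert!(full_label_len <= 255 && context.len() <= 255);

    let mut info = Vec::with_capacity(2 + 1 + full_label_len + 1 + context.len());
    info.extend_from_slice(&out_len.to_be_bytes()); // desired output length
    info.push(full_label_len as u8); // label length prefix
    info.extend_from_slice(PREFIX);
    info.extend_from_slice(label.as_bytes());
    info.push(context.len() as u8); // context length prefix
    info.extend_from_slice(context);
    info
}
```

For example, `build_hkdf_label(32, "client in", &[])` yields the bytes 00 20 0f, then "tls13 client in", then 00, which is the info value used to derive the QUIC client Initial secret. Since ring's expand accepts a list of slices, the same pieces could presumably be passed separately instead of being concatenated first.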

Challenges with the TLS 1.3 RFC

Another source of frustration was the TLS 1.3 RFC’s description of HKDF derivation details. It’s very concise, which isn’t exactly beginner-friendly for someone who just wants to quickly implement a prototype of TLS 1.3 HKDF key derivation.

To be clear, I’m not saying the RFC is poorly written—on the contrary, it’s excellent. But for me, understanding it required extra time. I also had to print every key at each stage and compare them with OpenSSL’s implementation to get the code working.

A particularly helpful resource was the TLS 1.3 key generation example. This document saved me a lot of effort. To be honest, I found it by brute-forcing a Google search with the hex representation of a fixed key used in the early stages of TLS 1.3. 😉

Cutting Corners for Learning

In the end, to complete the handshake process, I ensured the client’s finished message was generated strictly according to the RFC. This guaranteed no validation issues on the peer’s side.

However, I skipped verifying the certificate and finished messages sent by the server. As I mentioned earlier, this project is just a learning tool. Perhaps one day I’ll get serious and adopt a robust, efficient, and well-known SSL library to support feather-quic properly.

Epilogue

At this point, the QUIC handshake is almost complete. The only remaining part related to QUIC-TLS is the most challenging and interesting aspect: 0-RTT implementation. However, before diving into 0-RTT, I want to finish implementing QUIC ARQ because, without an acknowledgment and retransmission mechanism, QUIC can't truly be considered functional. Once I complete the basic functionality of QUIC, I'll revisit and tackle the highly intriguing 0-RTT.

Also, I need to carefully plan how to test all the edge cases, as many scenarios are quite difficult to test. But I'll tackle that work as I move forward with implementing additional features.

Here are the related PR and branch for this article’s implementation. The progress has been delayed a bit, mainly because Overwatch China server relaunched, so I’ve been playing a bit every day. Additionally, I’ve been trying out Marvel Duel (learning new games is way harder than learning new technologies, really 😅), and I still haven’t completed my Diablo IV Season 6 battle pass tasks. This past weekend, I couldn’t resist and stayed up late watching Slow Horses Season 3 and 4—it was exhausting 😭.