Implementing QUIC from Scratch with Rust: Connection Close and Error Handling

on 2025-05-03

First, as usual, I take a few swings at TCP and explain why QUIC's connection-close design is much better. Then I implement the cases where a QUIC connection needs to close because of errors. Finally, the first phase of this project is roughly complete, so I write down a few thoughts from the journey so far.

How TCP Closes Connections

Starting with the TCP Four-Way Close

When people mention TCP's four-way close, the first thing many of us picture is that painful state transition diagram. When I first learned it, I really wondered why it had to be so complicated. This reminds me of the time when I was interning in Xierqi and preparing for campus recruiting interviews. Every morning, squeezed into the subway, I would review the TCP state machine. The Xierqi subway was ridiculously crowded. Staff would even push people from outside to make sure each train was full enough.

To be honest, if you ask me to draw the state machine from memory today, I still cannot guarantee it is 100% correct. But if you understand the key problem TCP close is trying to solve, you do not need to memorize it mechanically. You can usually reconstruct it well enough. As dog250 once said, we are essentially technical workers. Knowing TCP is just a required brick-laying skill. TCP is not scripture, so there is no need to nitpick every corner.

TCP four-way close state machine

The reason TCP close is complicated is simple: TCP is a full-duplex transport protocol. In theory, the side that actively starts closing can only close its own write direction. This means TCP close has to consider whether the peer has also started closing its transmission direction by sending a FIN packet. That is the difference between FIN WAIT and CLOSING. Just imagine how nice it would be if a transport connection-close design only needed to consider closing the local connection and notifying the peer that the connection has closed. We will praise QUIC for that shortly.

Another thing everyone notices is that the active closer eventually enters TIME WAIT. This is another classic interview topic. TIME WAIT makes the kernel keep TCP connection state, so the corresponding four-tuple remains occupied. When there are too many TIME WAIT sockets, new sockets may fail during connect, which is a common engineering problem in high-concurrency systems and therefore appears often in interviews.

But TIME WAIT exists mainly to ensure that packets from an old TCP connection do not affect a new connection. For example, when one side closes, it must send a FIN packet, and the peer must explicitly acknowledge it. After the active closer receives the peer's FIN and enters TIME WAIT, it must reply with an ACK for that FIN. If that ACK is lost, the peer retransmits its FIN, and TIME WAIT keeps enough state around to acknowledge it again. Without that state, those retransmitted FIN packets could land on a later TCP connection and affect it.

At this point, we can see that the conditions for affecting a new connection are actually quite strict. First, the four-tuples of the old and new connections must be exactly the same. Second, the sequence numbers at the end of the old connection must fall inside the receive window of the new connection, whose initial sequence numbers are chosen randomly. Only then can the FIN affect the new connection. Of course, it is not only FIN packets. Some final data packets from the old connection may also affect the new one.
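To make those two conditions concrete, here is a minimal Rust sketch. The names `FourTuple`, `seq_in_window`, and `old_segment_can_interfere` are made up for illustration and do not come from any real TCP stack. Real stacks apply more checks than this, so treat it as a model of the argument rather than of an implementation.

```rust
use std::net::SocketAddrV4;

/// The TCP four-tuple that identifies a connection (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FourTuple {
    local: SocketAddrV4,
    remote: SocketAddrV4,
}

/// True if `seq` falls inside the receive window [rcv_nxt, rcv_nxt + rcv_wnd).
/// Sequence numbers wrap around at 2^32, hence the wrapping subtraction.
fn seq_in_window(seq: u32, rcv_nxt: u32, rcv_wnd: u32) -> bool {
    seq.wrapping_sub(rcv_nxt) < rcv_wnd
}

/// A stray segment from an old connection can only disturb a new one when
/// the four-tuples match AND its sequence number lands in the new window.
fn old_segment_can_interfere(
    old: FourTuple,
    new: FourTuple,
    seq: u32,
    rcv_nxt: u32,
    rcv_wnd: u32,
) -> bool {
    old == new && seq_in_window(seq, rcv_nxt, rcv_wnd)
}
```

Both checks have to pass at the same time, which is why the collision is rare but still possible, and why TCP cannot simply ignore it.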

In any case, since it is possible, TCP has to consider this edge case instead of ignoring it. TCP's solution is TIME WAIT, kept for twice the MSL, so packets from the old connection can fully disappear and not affect any possible new connection.

I went through these familiar details only to make it easier to criticize TCP's close design. First, TCP's connection identity is too simple. The four-tuple design still leaves collision possibilities. TCP suffered from this during handshake, and it still has to solve it during close. Second, TCP insists that FIN must be acknowledged before the close is considered complete. That is a little too perfectionist and not simple enough in engineering practice. In the real world, many implementations do not follow every step neatly, and lost FIN acknowledgments are common.

Abnormal Ways to Close a TCP Connection

Let me also mention other TCP close paths.

The first is TCP reset. It complements the normal four-way close. If the four-way close is the happy path, reset is a quick close mechanism for abnormal situations. A classic example: if a TCP client starts a handshake to an address where nothing is listening, kernel TCP stacks usually respond quickly with a TCP RST packet. TCP stacks do apply validation to RST packets, such as requiring the sequence number to be inside the receive window. But there are still security risks. The tcpkill tool I mentioned earlier is a classic example.

The second is TCP keepalive. If TCP keepalive is enabled, the stack sends probes after the configured idle time. If the peer does not respond after the configured number of probes, the TCP stack tears the connection down unilaterally. It does not run the four-way close. Depending on the stack, it may fire off a single best-effort RST (Linux does), but it never waits for any reply. An interesting detail is that if the peer's TCP state was destroyed but the machine itself is still reachable, the peer may respond to a probe with an RST according to the TCP reset mechanism, helping the local side destroy the connection quickly.

Another interesting detail is how TCP performs keepalive probing. The implementation is simple: it sends an empty packet with zero length. The trick is how to make the peer definitely reply with an ACK. The empty packet's sequence number is deliberately set one below the next byte the peer expects, so it points at data the peer has already acknowledged. The peer treats it as an old, out-of-order packet and quickly sends an ACK with its latest receive state.
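The probe trick fits in a few lines. This sketch assumes the Linux convention of using `snd_una - 1` as the probe's sequence number. The receiver side is heavily simplified, and `RxVerdict` and `classify` are hypothetical names, not kernel code.

```rust
/// Sequence number for a zero-length keepalive probe: one below the oldest
/// unacknowledged byte, so it always points at already acknowledged data.
fn keepalive_probe_seq(snd_una: u32) -> u32 {
    snd_una.wrapping_sub(1)
}

#[derive(Debug, PartialEq)]
enum RxVerdict {
    /// Segment starts before rcv_nxt: answer with an immediate ACK.
    OldDataAckNow,
    /// Segment starts at or after rcv_nxt: normal receive processing.
    NewOrFuture,
}

/// Simplified receiver-side check (an assumption, not a full TCP input path).
fn classify(seq: u32, rcv_nxt: u32) -> RxVerdict {
    // Signed view of the wrapped distance: negative means "before rcv_nxt".
    if (seq.wrapping_sub(rcv_nxt) as i32) < 0 {
        RxVerdict::OldDataAckNow
    } else {
        RxVerdict::NewOrFuture
    }
}
```

On an idle connection the sender's `snd_una` equals the receiver's `rcv_nxt`, so the probe always lands exactly one byte "in the past" and forces the ACK.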

QUIC's Connection-Close Design

How QUIC Solves the Problems TCP Close Faces

In the previous post about multiplexed streams, I spent a lot of space explaining QUIC Stream close design. So it should already be clear that QUIC connection close does not need to handle the full-duplex problem. Full-duplex behavior belongs to data transport channels, and QUIC Stream has already solved it. At the connection layer, QUIC only needs to consider closing the connection itself, notifying the peer, and handling being closed by the peer.

TCP identifies connections by the four-tuple, or five-tuple if you include the protocol: TCP, source IP, source port, destination IP, destination port. Handling the possibility that packets from old and new connections interfere with each other is a required course in TCP design. QUIC basically does not need to worry about that.

First, QUIC does not identify connections by the UDP four-tuple. It uses the source and destination connection IDs that the two sides choose and exchange, so the collision probability is much lower than with TCP. Second, and more importantly, QUIC went through the trouble of integrating TLS into the transport layer. A QUIC packet must be decrypted successfully before its QUIC frames can be processed by the stack. AEAD encryption guarantees that only packets encrypted with keys derived for the current connection can be decrypted successfully. Even if packets from another QUIC connection accidentally enter the current connection's processing path, they cannot pass the final decryption gate.

Concrete Design Details

Now we can talk about the concrete details of QUIC close. Similar to TCP, the QUIC RFC starts by stating that QUIC connections close in only three cases: idle timeout, immediate close, and stateless reset. These roughly correspond to TCP keepalive timeout, normal close, and reset. Because QUIC's protocol design avoids many of the problems mentioned above, its connection-close design is much simpler than TCP's.

Idle Timeout

QUIC idle timeout differs from TCP keepalive in several ways.

First, QUIC idle timeout is negotiated. Each side advertises its max_idle_timeout value as a transport parameter during the QUIC handshake, and the smaller value becomes effective (if only one side advertises a value, that one is used). TCP keepalive must be enabled by the application and takes effect unilaterally.

Second, QUIC idle timeout does not send special probes after the idle duration is reached, the way TCP keepalive does. It directly treats the connection as something that should be closed immediately.

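They also share similarities. After the idle timer fires, both choose silent close instead of the normal close flow. QUIC idle timeout can also be refreshed by ACK-eliciting frames, such as PING. The definition of idleness is similar too, with one subtlety: per RFC 9000, the timer restarts whenever a packet from the peer is received and processed successfully, and when the local side sends its first ACK-eliciting packet since the last packet it received. Sending more and more packets into silence does not keep pushing the deadline out.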
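The idle-timeout rules in RFC 9000, Section 10.1 can be sketched as below. `effective_idle_timeout` and `IdleTimer` are illustrative names and not feather-quic's actual code. A zero duration stands for "this side did not advertise max_idle_timeout".

```rust
use std::time::{Duration, Instant};

/// Effective idle timeout: the smaller of the two advertised values.
/// Zero means "not advertised"; if both are zero there is no idle timeout.
fn effective_idle_timeout(local: Duration, peer: Duration) -> Option<Duration> {
    match (local.is_zero(), peer.is_zero()) {
        (true, true) => None,
        (true, false) => Some(peer),
        (false, true) => Some(local),
        (false, false) => Some(local.min(peer)),
    }
}

struct IdleTimer {
    timeout: Duration,
    deadline: Instant,
    /// True once an ACK-eliciting packet was sent since the last receive.
    sent_since_receive: bool,
}

impl IdleTimer {
    fn new(timeout: Duration, now: Instant) -> Self {
        IdleTimer { timeout, deadline: now + timeout, sent_since_receive: false }
    }

    /// Any successfully processed packet from the peer restarts the timer.
    fn on_packet_received(&mut self, now: Instant) {
        self.deadline = now + self.timeout;
        self.sent_since_receive = false;
    }

    /// Only the first ACK-eliciting packet sent after a receive restarts it.
    fn on_ack_eliciting_sent(&mut self, now: Instant) {
        if !self.sent_since_receive {
            self.deadline = now + self.timeout;
            self.sent_since_receive = true;
        }
    }

    fn expired(&self, now: Instant) -> bool {
        now >= self.deadline
    }
}
```

A real stack would additionally raise the timeout to at least three PTOs, as the RFC requires. That step is left out here to keep the sketch short.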

Finally, the QUIC RFC recommends that implementations provide users with a way for the lower layer to automatically keep connections alive. For example, when the application has no data, the QUIC stack can send ACK-eliciting frames to keep the connection from being killed by idle timeout. I think this kind of keepalive should be maintained by the application through its own heartbeat, rather than implemented by the transport stack. To the application, the transport stack is a black box and not fully controllable. For behavior that may affect transmission efficiency, letting the application handle it is often more flexible and efficient.

Normal Close, or QUIC's Close Flow

Client-server flow for QUIC connection close

As mentioned above, QUIC no longer needs to consider full-duplex close semantics or interference between old and new connection packets. So the normal close flow is straightforward.

As shown in the diagrams, QUIC mainly has two states during close: Closing Connection State and Draining Connection State. The former is entered by the side that actively starts closing. The latter is entered by the side that is notified by the peer. Once a QUIC connection enters Closing Connection State, any new packet it receives is simply answered with the same Connection close frame again, and nothing else is processed. For the QUIC stack that enters Draining Connection State after receiving a close message, things are even simpler: it may reply once with a no-error Connection close frame, and after that it must stop sending anything.

QUIC Closing and Draining state transitions
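The two states can be modeled as a small state machine. This is a sketch under simplifying assumptions: `Conn`, `CloseState`, and `Action` are hypothetical names, both states last three PTOs, and the optional single reply before draining is left out.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum CloseState {
    Open,
    Closing { until: Instant },
    Draining { until: Instant },
    Closed,
}

#[derive(Debug, PartialEq)]
enum Action {
    Process,     // normal packet handling
    ResendClose, // answer with the stored close frame again
    Ignore,      // stay silent
}

struct Conn {
    state: CloseState,
    pto: Duration,
}

impl Conn {
    fn new(pto: Duration) -> Self {
        Conn { state: CloseState::Open, pto }
    }

    /// The local side decides to close: enter Closing for three PTOs.
    fn close_locally(&mut self, now: Instant) {
        if self.state == CloseState::Open {
            self.state = CloseState::Closing { until: now + self.pto * 3 };
        }
    }

    /// The peer's close frame arrived: enter Draining and stop sending.
    fn on_peer_close(&mut self, now: Instant) {
        self.state = match self.state {
            CloseState::Open => CloseState::Draining { until: now + self.pto * 3 },
            // Already closing: keep the existing deadline.
            CloseState::Closing { until } => CloseState::Draining { until },
            other => other,
        };
    }

    fn on_packet(&mut self, now: Instant) -> Action {
        if let CloseState::Closing { until } | CloseState::Draining { until } = self.state {
            if now >= until {
                self.state = CloseState::Closed;
            }
        }
        match self.state {
            CloseState::Open => Action::Process,
            CloseState::Closing { .. } => Action::ResendClose,
            CloseState::Draining { .. } | CloseState::Closed => Action::Ignore,
        }
    }
}
```

Compared with the TCP diagram, there is no state that depends on what the peer has acknowledged, which is the whole point of the design.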

QUIC defines the Connection close frame so that an endpoint can tell its peer that the connection is being closed. There are two excellent design details here.

First, it can carry a custom error code and error description. At first glance, that may not sound impressive. But if a transport protocol can easily expose error-code dashboards, its stability and quality after launch can be much easier to maintain. TCP cannot carry this kind of error information by itself and has to rely on extra application-layer design. QUIC can do this because, from the beginning, it separated control messages from transport data. Everything is an independent QUIC Frame. TCP only reserved one bit for FIN in its header, so it cannot do much here.

Even better, Connection close frame separates errors into QUIC transport-layer errors and application-layer errors. This is very considerate and saves the application layer a lot of work. The QUIC RFC also defines QUIC transport error codes in detail, which is very helpful for quickly locating QUIC-related problems. For example, we can use Wireshark to quickly inspect why a QUIC connection was closed. The error codes defined by TLS are also mapped into QUIC transport error codes. I can only say: respect to the QUIC designers. They really covered the details.
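For reference, the two variants have frame types 0x1c (transport error) and 0x1d (application error) in RFC 9000, and TLS alerts map into the 0x0100 to 0x01ff range of transport error codes. The encoder below is a minimal sketch. `CloseReason`, `encode_close`, and `put_varint` are illustrative names, not feather-quic's API.

```rust
/// Frame types and one error code from RFC 9000.
const CLOSE_TRANSPORT: u64 = 0x1c;
const CLOSE_APPLICATION: u64 = 0x1d;
const PROTOCOL_VIOLATION: u64 = 0x0a;

/// TLS alerts map to CRYPTO_ERROR codes: 0x0100 + alert value.
fn crypto_error(tls_alert: u8) -> u64 {
    0x0100 + u64::from(tls_alert)
}

enum CloseReason<'a> {
    /// Transport error; `frame_type` names the offending frame (0 if unknown).
    Transport { code: u64, frame_type: u64, reason: &'a str },
    /// Application error; the code space belongs to the application protocol.
    Application { code: u64, reason: &'a str },
}

/// QUIC variable-length integer: the top two bits select 1, 2, 4 or 8 bytes.
fn put_varint(buf: &mut Vec<u8>, v: u64) {
    assert!(v < (1 << 62), "varint out of range");
    if v < (1 << 6) {
        buf.push(v as u8);
    } else if v < (1 << 14) {
        buf.extend_from_slice(&((v as u16) | 0x4000).to_be_bytes());
    } else if v < (1 << 30) {
        buf.extend_from_slice(&((v as u32) | 0x8000_0000).to_be_bytes());
    } else {
        buf.extend_from_slice(&(v | 0xC000_0000_0000_0000).to_be_bytes());
    }
}

fn encode_close(close: &CloseReason) -> Vec<u8> {
    let mut buf = Vec::new();
    let text = match close {
        CloseReason::Transport { code, frame_type, reason } => {
            put_varint(&mut buf, CLOSE_TRANSPORT);
            put_varint(&mut buf, *code);
            put_varint(&mut buf, *frame_type);
            reason
        }
        CloseReason::Application { code, reason } => {
            put_varint(&mut buf, CLOSE_APPLICATION);
            put_varint(&mut buf, *code);
            reason
        }
    };
    put_varint(&mut buf, text.len() as u64);
    buf.extend_from_slice(text.as_bytes());
    buf
}
```

The only structural difference between the two variants is the extra frame-type field in the transport form, which tells the peer which frame triggered the error.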

The second excellent detail is that Connection close frame does not need to be ACKed. This greatly simplifies QUIC's normal close flow. The side that actively closes the QUIC connection does not need to care about the peer's state. If the application decided to close the connection, it presumably has already finished what it needed to do.

Even if the peer misses the Connection close frame, every packet it sends while the local side is still in Closing Connection State triggers another copy. If the peer has no data to send and also misses the local side's Closing Connection State, which lasts about three PTOs, it can still close itself through idle timeout. Someone may ask: what if idle timeout is not configured, and by coincidence the peer misses all Connection close frame messages? How can the peer close in time and avoid wasting resources? That is why QUIC has an abnormal close mechanism, described below. In real engineering, however, QUIC idle timeout should still be configured, because abnormal close is not a cure-all.

Abnormal Close

QUIC's abnormal close mechanism is the stateless reset message. When a QUIC connection receives the corresponding reset message, it immediately enters Draining Connection State, then destroys itself. But the QUIC RFC spends a lot of text explaining how to use randomly generated, negotiated, and validated tokens to strengthen reset-message security. The natural question is: doesn't QUIC encryption already ensure messages are trusted? Why does reset need extra security validation?

The answer is simple: QUIC stateless reset is a close mechanism used after the QUIC stack has destroyed active connection state. The encryption/decryption context for that connection is gone, so the stack can no longer send a normal Connection close frame. Recall the TCP example above: when the kernel TCP stack receives a handshake request to a target with no listener, it often replies with an RST packet. QUIC wants to cover a similar scenario. Even without an active QUIC connection, it can help the peer's QUIC stack close quickly and avoid wasting resources.

Some people may say: QUIC stateless reset has a security validation mechanism, which is good, but a QUIC stack with no active connection should not keep stateless reset tokens for a long time either, because that also costs resources.

QUIC already considered this. RFC 9000 suggests that reset tokens can be generated from a fixed static key and the DCID of the invalid request packet with a keyed pseudorandom function, for example HMAC or HKDF. With this design, the QUIC stack does not need to waste resources recording tokens. In a QUIC cluster, there is also no need to synchronize token records through extra middleware, as long as the static key is shared. Each stack that receives an invalid packet can generate the same token by itself, matching the token the peer had previously recorded, and then safely and quickly close the peer connection.
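On the receiving side, detecting a stateless reset comes down to comparing the last 16 bytes of a datagram against the tokens the peer handed out earlier. The sketch below shows only that check. `is_stateless_reset` and `tokens_equal` are hypothetical names, and token derivation is omitted because it needs an HMAC or HKDF implementation that the standard library does not provide.

```rust
const RESET_TOKEN_LEN: usize = 16;
/// Anything shorter than 21 bytes cannot be a stateless reset.
const MIN_RESET_LEN: usize = 21;

/// Compare two tokens without an early exit, so the time taken does not
/// depend on where the first mismatching byte is.
fn tokens_equal(a: &[u8; RESET_TOKEN_LEN], b: &[u8; RESET_TOKEN_LEN]) -> bool {
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// True if the datagram ends with one of the tokens recorded for this peer.
fn is_stateless_reset(datagram: &[u8], known: &[[u8; RESET_TOKEN_LEN]]) -> bool {
    if datagram.len() < MIN_RESET_LEN {
        return false;
    }
    let mut tail = [0u8; RESET_TOKEN_LEN];
    tail.copy_from_slice(&datagram[datagram.len() - RESET_TOKEN_LEN..]);
    // Check every known token; do not stop at the first match.
    let mut matched = false;
    for token in known {
        matched |= tokens_equal(&tail, token);
    }
    matched
}
```

A stack would typically run this check only for datagrams it could not decrypt or associate with a connection, and move to Draining Connection State on a match.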

This reminds me of a previous UDP-based custom protocol project I worked on. Its token implementation used the same idea, because synchronizing machine state inside the cluster would have been too heavy.

But as emphasized above, this is still not a cure-all. Even if QUIC has considered everything, what if the whole cluster is down? So an idle timeout value that fits the business scenario is still necessary.

Edge Cases

Preventing Amplification Attacks

There are basically two places in the QUIC close flow where amplification attacks need to be considered. If I remember correctly, earlier posts already discussed what amplification attacks are, so I will directly explain how QUIC defends against them during close.

First, after QUIC enters Closing Connection State, the spec says it should actively respond to peer packets with a Connection close frame, helping the peer close quickly. At that point, the QUIC stack has two choices. One is to keep normal QUIC connection state and only respond with Connection close frame when it receives packets that can be decrypted correctly. Since those packets are trusted, there is no amplification risk. This is the implementation chosen by feather-quic.

The other choice is to reclaim resources faster by destroying most connection state, including decryption capability, and only keeping the previous Connection close frame. In that case, the QUIC connection can no longer distinguish whether incoming packets are trustworthy, so it must limit how often it sends Connection close frame to avoid amplification attacks. Although Closing Connection State usually lasts only a short time, around three PTOs, we still need rate-limiting.
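One simple way to rate-limit in the second case is to answer only after a progressively increasing number of received packets, which is one of the options RFC 9000 mentions. This sketch doubles the threshold each time. `ClosingResponder` is a made-up name, and the doubling policy is an assumption.

```rust
/// Decides when a connection in the closing state may resend its stored
/// close frame: on the 1st, 2nd, 4th, 8th, ... received packet.
struct ClosingResponder {
    received: u64,
    next_reply_at: u64,
}

impl ClosingResponder {
    fn new() -> Self {
        ClosingResponder { received: 0, next_reply_at: 1 }
    }

    /// Call once per incoming packet; returns true if a reply should be sent.
    fn on_packet(&mut self) -> bool {
        self.received += 1;
        if self.received >= self.next_reply_at {
            // Double the threshold so replies get exponentially rarer.
            self.next_reply_at = self.next_reply_at.saturating_mul(2);
            true
        } else {
            false
        }
    }
}
```

With this policy, n incoming packets trigger only about log2(n) replies, so an attacker flooding the closing connection gets very little out of it.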

The other place is QUIC stateless reset. As long as an invalid packet arrives and no active connection matches it, the QUIC stack may compute a 16-byte reset token and assemble a reset message in response. There are usually two attack scenarios.

In the first, the attacker spoofs the packet's source IP as the victim's address and sends invalid traffic, tricking a legitimate peer into sending QUIC reset messages to the victim. If the QUIC implementation follows one rule, that the response reset message must never be larger than the triggering packet, the amplification problem is naturally solved.

The second scenario is more interesting, and I did not think of it at first. It is mentioned in the QUIC RFC. The attacker spoofs the packet's source IP as another cluster that also supports QUIC. In that case, the reset response may trigger another reset response from the other QUIC cluster, creating an infinite loop where both sides keep hitting each other. The attacker spends almost no resources and gains leverage.

Therefore, the QUIC spec goes further than "no larger than the triggering packet": unless an endpoint keeps enough state to prevent loops some other way, every reset message must be strictly smaller than the packet that triggered it. Since a reset message is at least 21 bytes long, each bounce shrinks the packet until it is too small to trigger another reset, and the loop dies out. The QUIC stack should also apply send-rate limits when sending reset messages to provide another safety layer and break the loop faster.
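The size rule is easy to express in code. In this sketch, `stateless_reset_len` is a hypothetical helper, and the 42-byte cap is my own assumption for the "normal" reset size. Only the 21-byte minimum and the strictly-smaller rule come from the RFC.

```rust
/// Smallest possible stateless reset: 5 header bytes plus a 16-byte token.
const MIN_STATELESS_RESET_LEN: usize = 21;
/// Size used when the trigger is large enough (assumed value).
const PREFERRED_RESET_LEN: usize = 42;

/// Length of the reset to send for a trigger of `trigger_len` bytes,
/// or None if no valid reset is strictly smaller than the trigger.
fn stateless_reset_len(trigger_len: usize) -> Option<usize> {
    let len = PREFERRED_RESET_LEN.min(trigger_len.checked_sub(1)?);
    if len < MIN_STATELESS_RESET_LEN {
        None
    } else {
        Some(len)
    }
}
```

If two endpoints following this rule bounce resets off each other, the packet shrinks by at least one byte per hop until it reaches 21 bytes, at which point the next endpoint has to stay silent.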

Closing During Handshake

Both QUIC Initial and Handshake spaces support carrying Connection close frame. This does not surprise me. When I implemented the handshake earlier, I was sent close messages by production-grade QUIC stacks many times. This time, I needed to actively close in certain situations during handshake, so the QUIC RFC gave several useful details to watch.

First, send Connection close frame in packets with the highest available data protection level. This is to avoid the peer having already discarded lower-level symmetric keys. At the same time, we also need to handle the case where the peer has not yet negotiated higher-level keys and cannot decrypt higher-level packets. The solution is simple: send in both Initial and Handshake spaces when appropriate, and put them into the same UDP datagram.

Second, the RFC recommends caution when an error happens in an Initial packet. Do not directly send Connection close frame, because this may still leave room for attackers. Initial packets are protected only with keys derived from a connection ID that is visible on the wire, so anyone who can observe the handshake can forge them and disrupt a normal handshake. My implementation therefore requires that the packet be decrypted successfully, and that the error happen outside the Initial space, before replying with Connection close frame.
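Both handshake rules can be sketched as two small decision functions. This is a simplified model and not feather-quic's actual code: `Space`, `Keys`, `close_spaces`, and `may_reply_with_close` are hypothetical names, and the policy "send in every space whose keys are still held" is a simplification of the RFC's per-role recommendations.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Space {
    Initial,
    Handshake,
    OneRtt,
}

/// Which packet protection keys the local side currently holds.
struct Keys {
    initial: bool,
    handshake: bool,
    one_rtt: bool,
}

/// Packet number spaces in which to send the close frame. The packets are
/// meant to be coalesced into one UDP datagram in the returned order.
fn close_spaces(keys: &Keys, handshake_confirmed: bool) -> Vec<Space> {
    if handshake_confirmed {
        // The peer is known to have 1-RTT keys; one packet is enough.
        return vec![Space::OneRtt];
    }
    let mut spaces = Vec::new();
    if keys.initial {
        spaces.push(Space::Initial);
    }
    if keys.handshake {
        spaces.push(Space::Handshake);
    }
    if keys.one_rtt {
        spaces.push(Space::OneRtt);
    }
    spaces
}

/// Reply with a close frame only for errors found in packets that decrypted
/// successfully and that did not arrive in the Initial space.
fn may_reply_with_close(decrypted_ok: bool, error_space: Space) -> bool {
    decrypted_ok && error_space != Space::Initial
}
```

Ordering the spaces from Initial to 1-RTT matches how coalesced packets are laid out in a datagram, where a short-header packet has to come last.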

Stage Reflections

I used tokei to check the code size of the project. I did not expect that implementing only the basic features of a QUIC stack would already require this much code, although a fair amount of it is integration tests. This made me feel that I probably could have optimized more and reduced the amount of code if I had invested more effort.

Although I have written quite a bit of Rust, I have only gained a basic understanding of it so far. Many interesting parts of Rust have not been deeply experienced. For example, my project still uses a classic networking model: single thread plus event loop. So I did not get much practical experience with Rust's thread-safety design. I have not yet tried using Rust async/await to turn feather-quic into a more modern protocol library. I also rarely used Rust macro metaprogramming. I hope I can use these more later and experience Rust more deeply.

Screenshot of feather-quic project code statistics

The basic QUIC functionality is finally done. It took much longer than I expected. I originally thought the earlier parts would be easy to slice through quickly, but the process of writing code, testing it, and then writing technical blog posts consumed far more energy than I had expected. In the blog posts, I tried to explain classic problems from a fresh angle. I think the key to freshness is to stand in the designer's position, understand the difficulties and historical baggage they faced, and then look at the actual solution. That kind of explanation should be better than a plain chronological walkthrough. Of course, limited by my own ability, some parts are still not explained as well as I would like. I can only polish them later when I get the chance.

Another small regret is that the project is still not very complete. The code probably still has quite a few bugs, and some implementation details need repeated thinking. As I said before, I originally thought I could finish quickly, because I also wanted to start an investigation-tool project I had been thinking about for a long time, probably an eBPF and DWARF based debugging tool for user-space processes. But I overestimated my technical ability and available energy, so the battle line has stretched out for a long time.

Finally, I can move toward some advanced QUIC features. Surely this will not take longer than the earlier work, right? Most likely it will take much longer. As usual, here is the PR related to this post, plus some bug fixes I made after noticing something felt wrong while writing the blog. Also, EA FC 25 has started the Team of the Season event. Should I give EA another chance?