While we were using an Apache ActiveMQ broker at my job. It has been reported by the operators last week that the broker could not open new sockets because it has reached the maximum limit of open file descriptors. The operators have also mentioned that the majority of the TCP sockets where in the CLOSE_WAIT state. My initial reaction has been to confuse the TCP state CLOSE_WAIT with the TIME_WAIT state.
Yesterday, someone else reported the same problem on the Apache ActiveMQ developers’ mailing list and I have noted by reading posts from people trying to suggest workarounds that I was not the only one to mix up the TCP state CLOSE_WAIT and TIME_WAIT. So this gave me the idea to take this opportunity to write about the differences between these 2 states since apparently there is a widespread need for clarifications on that matter.
A TCP connection goes into the CLOSE_WAIT state when it receives a FIN segment from its peer. From that point the connection becomes half-duplex and the TCP connection will not receive any new data from its peer but it will still be able to send any amount of data back to the peer. Hence, the socket will stay there as long as the server does not call close() explicitly on the socket.
I am not sure on this but I think this is exactly what the state name means. The socket is waiting for the application to call close() before going away.
TIME_WAIT is the TCP state a connection will go when it performs an active close (it is initiating the connection shutdown) after having received the peer FIN segment and sent it back an acknowledgment for the FIN.
The connection will stay in that state for 2 times the maximum segment live (MSL). As an interesting side note, the MSL has nothing to do with IP time to live (TTL) or the TCP connection round time trip (RTT). The MSL value is a constant and varies from one TCP implementation to another. RFC 793 suggests using 2 minutes for the MSL and the implementation shown in this book is using 2 minutes but 4.4BSD is using 75 seconds and Linux close to 4 minutes. The reason why MSL is unrelated to IP TTL and RTT is that a TCP segment can outlive the IP packet carrying it even if routers takes longer than usual to deliver it since a TCP stack will retransmit it.
TIME_WAIT is needed because it is possible that the TCP acknowledgment for the received FIN gets lost (or any segment containing data prior the FIN). In that case, the peer will try to retransmit its FIN for the MSL period. Waiting for a longer period than MSL is the only choice TCP can make as there are no acknowledgements for ACKs (that would be recursive anyway and solve nothing if ACKs were acknowledged).
I have been wondering for some time why TCP does not wait 2 times MSL before leaving the LAST_ACK state like it does for TIME_WAIT since there are scenarios where TCP could receive packets from the previous connection after leaving LAST_ACK and I think that I have understand why. In TIME_WAIT, you want to handle peer retransmission otherwise it could think that its last segments have never been received. In LAST_ACK, there is nothing else to acknowledge so it does not matter if TCP reset the connection after leaving LAST_ACK.
For instance, let’s say that the stack retransmits its FIN and both the original FIN and the retransmitted one get acknowledged. What will happen is that at the reception of the first ACK, TCP will release its Transmission Control Block (TCB) associated to this connection and at the reception of the second ACK, it will reply with a RST. It is no big deal. The other peer will be in the TIME_WAIT state and will just conclude that the peer is gone when it will read the RST.
Comments are closed for this post.
I want you to find in this blog informations about C++ programming that I had a hard time to find in the first place on the web.
Sun | Mon | Tue | Wed | Thu | Fri | Sat |
---|---|---|---|---|---|---|
<< < | > >> | |||||
1 | 2 | 3 | 4 | 5 | 6 | 7 |
8 | 9 | 10 | 11 | 12 | 13 | 14 |
15 | 16 | 17 | 18 | 19 | 20 | 21 |
22 | 23 | 24 | 25 | 26 | 27 | 28 |
29 | 30 | 31 |