Olivier Langlois's blog

07/06/08

12:22:15 pm, by lano1106, 249 words, 4860 views

Categories: TCP/IP

UNIX Network Programming: Networking APIs: Sockets and XTI; Volume 1, Second edition

Many consider this book the TCP/IP application programming bible, and I am among them. It is simply the most complete and detailed book on socket programming: it describes every option down to its smallest details. This makes the book a lengthy and sometimes tedious read, but it also makes it an excellent reference. Even experienced socket programmers will most likely learn something from it. Personally, I gained a better understanding of the purpose of the listen() backlog parameter and of socket lingering behavior, and I go back to the book from time to time to refresh my memory on topics such as how to time out a TCP connection attempt.

After borrowing the second edition from a colleague, I decided to get my own copy and purchased the third edition. At the time of writing this review, a used copy of the second edition costs $6, compared to $60 for the third edition. Since I had the chance to compare both editions, you might be interested to know that, apart from one bug fix in the sample code that I noticed, the content of the two editions is about 90% identical by my estimate. The changes are very minor: some less important topics from the second edition, such as XTI, have been replaced by very specialized new topics. In my opinion, this makes the older second edition, which is still very accurate, a very good purchase.

06/10/08

07:47:43 pm, by lano1106, 208 words, 3027 views

Categories: TCP/IP

Why is TCPINCR set to 904?

In the book Internetworking with TCP/IP volume 2 - Design, implementation, and internals, chapter 13 (TCP: Output processing), section 13.16 (Other TCP Procedures) presents the source code of tcpiss(), the function called to obtain the Initial Send Sequence (ISS). The function works by initializing a static variable with a clock counter value and then incrementing that value by TCPINCR, whose value is 904.
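The logic described above can be sketched roughly as follows. This is not the book's source code: the names next_iss() and clock_counter(), the use of std::chrono as the clock source, and the 32-bit wraparound are assumptions made for illustration. Only the constant 904 and the "seed once from a clock, then add TCPINCR" behavior come from the description.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Increment between successive initial sequence numbers (value quoted from the book).
constexpr std::uint32_t TCPINCR = 904;

// Hypothetical stand-in for the clock counter used as a seed.
inline std::uint32_t clock_counter() {
    using namespace std::chrono;
    return static_cast<std::uint32_t>(
        duration_cast<seconds>(steady_clock::now().time_since_epoch()).count());
}

// Seeds a static variable from the clock on the first call only,
// then advances it by TCPINCR on every call.
inline std::uint32_t next_iss() {
    static std::uint32_t seq = clock_counter();
    seq += TCPINCR;  // unsigned arithmetic wraps modulo 2^32
    return seq;
}

// Usage sketch: two consecutive calls differ by exactly TCPINCR.
inline void check_increment() {
    const std::uint32_t first = next_iss();
    const std::uint32_t second = next_iss();
    assert(second - first == TCPINCR);
}
```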

The book gives no explanation for the value 904, and there is even an exercise asking the reader to contact coauthor David Stevens by e-mail to find out why they use 904. I first tried to find the answer on the net before contacting them, but the information was unavailable.

So here is the answer from Douglas E. Comer, coauthor of the book, for the next person looking for the same answer:

It's an inside joke. The TCP standard says to choose a value other than 1.
My co-author, Dave Stevens, and I tossed a coin, and we used his birthday (September 4th). There are a couple of other fun items in the book (hint: look in the index).

Regards,
Doug Comer

After reading Mr. Comer's reply, I checked the index and found a list of well-known constants, such as the world-renowned 1.3.6.1.2.1!

06/05/08

11:08:40 pm, by lano1106, 669 words, 27904 views

Categories: TCP/IP

CLOSE_WAIT vs TIME_WAIT

We use an Apache ActiveMQ broker at my job. Last week, the operators reported that the broker could not open new sockets because it had reached the maximum number of open file descriptors. They also mentioned that the majority of the TCP sockets were in the CLOSE_WAIT state. My initial reaction was to confuse the CLOSE_WAIT state with the TIME_WAIT state.

Yesterday, someone else reported the same problem on the Apache ActiveMQ developers' mailing list, and from the workarounds people suggested I noticed that I was not the only one to mix up CLOSE_WAIT and TIME_WAIT. This gave me the idea to write about the differences between these 2 states, since there is apparently a widespread need for clarification on the matter.

A TCP connection goes into the CLOSE_WAIT state when it receives a FIN segment from its peer. From that point on, the connection is half-closed: it will not receive any new data from its peer, but it can still send any amount of data back. The socket stays in that state for as long as the local application does not explicitly call close() on it, which is how a server that never closes such sockets ends up exhausting its file descriptors.

I am not sure about this, but I think this is exactly what the state name means: the socket is waiting for the application to call close() before going away.
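On the application side, getting a socket out of CLOSE_WAIT comes down to noticing the end of stream and closing the descriptor. Here is a minimal sketch, assuming a blocking POSIX stream socket; the function name drain_then_close() is hypothetical, and a real server would normally do this from its own event loop.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Reads until the peer's FIN is seen (recv() returns 0), then closes the socket.
// Returns the number of bytes consumed, or -1 on a receive error.
inline long drain_then_close(int fd) {
    assert(fd >= 0);
    char buf[512];
    long total = 0;
    for (;;) {
        const ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;  // payload sent before the FIN
            continue;
        }
        if (n < 0 && errno == EINTR) {
            continue;  // interrupted by a signal, retry
        }
        // n == 0: the peer has sent its FIN, so a TCP socket is now in CLOSE_WAIT.
        // n < 0: receive error. Either way the descriptor must be released.
        close(fd);
        return n == 0 ? total : -1;
    }
}
```

Forgetting the close() call in the n == 0 branch is exactly what leaves sockets piling up in CLOSE_WAIT.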

TIME_WAIT is the state a connection enters when it performs an active close (that is, it initiates the connection shutdown), after it has received the peer's FIN segment and sent back an acknowledgment for it.

The connection stays in that state for 2 times the maximum segment lifetime (MSL). As an interesting side note, the MSL has nothing to do with the IP time to live (TTL) or the TCP connection round-trip time (RTT). The MSL is a constant whose value varies from one TCP implementation to another. RFC 793 suggests 2 minutes for the MSL, and the implementation shown in this book uses 2 minutes, but Berkeley-derived stacks such as 4.4BSD have traditionally used 30 seconds, and Linux keeps sockets in TIME_WAIT for a fixed 60 seconds. The reason why the MSL is unrelated to the IP TTL and the RTT is that a TCP segment can outlive the IP packet carrying it: even if routers take longer than usual to deliver a packet, or drop it, the TCP stack will retransmit the segment.

TIME_WAIT is needed because the acknowledgment for the received FIN (or any segment carrying data sent before the FIN) may get lost. In that case, the peer will retransmit its FIN for up to the MSL period. Waiting longer than the MSL is the only choice TCP has, since ACKs are not themselves acknowledged (acknowledging ACKs would be recursive and would solve nothing anyway).

I have been wondering for some time why TCP does not wait 2 times the MSL before leaving the LAST_ACK state, as it does for TIME_WAIT, since there are scenarios where TCP could receive packets from the previous connection after leaving LAST_ACK. I think I have understood why. In TIME_WAIT, you want to handle the peer's retransmissions; otherwise, the peer could think that its last segments were never received. In LAST_ACK, there is nothing left to acknowledge, so it does not matter if TCP resets the connection after leaving LAST_ACK.

For instance, let's say that the stack retransmits its FIN and both the original FIN and the retransmitted one get acknowledged. On reception of the first ACK, TCP releases the Transmission Control Block (TCB) associated with the connection, and on reception of the second ACK, it replies with a RST. It is no big deal: the other peer is in the TIME_WAIT state and will simply conclude that the peer is gone when it reads the RST.
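To see at a glance where CLOSE_WAIT, LAST_ACK and TIME_WAIT sit relative to each other, the teardown part of the TCP state machine can be sketched as a small transition function. This is a simplified illustration based on the RFC 793 state diagram; the State and Event names are my own, and corner cases such as a FIN and its ACK arriving in the same segment are deliberately left out.

```cpp
#include <cassert>

enum class State { Established, CloseWait, LastAck, FinWait1, FinWait2,
                   Closing, TimeWait, Closed };
enum class Event { AppClose, RecvFin, RecvAckOfFin, TwoMslTimeout };

// Returns the next state; (state, event) pairs not listed leave the state unchanged.
inline State next_state(State s, Event e) {
    switch (s) {
    case State::Established:
        if (e == Event::RecvFin)  return State::CloseWait;  // passive close
        if (e == Event::AppClose) return State::FinWait1;   // active close
        break;
    case State::CloseWait:  // waiting for the application to call close()
        if (e == Event::AppClose) return State::LastAck;
        break;
    case State::LastAck:    // no 2*MSL wait on this side
        if (e == Event::RecvAckOfFin) return State::Closed;
        break;
    case State::FinWait1:
        if (e == Event::RecvAckOfFin) return State::FinWait2;
        if (e == Event::RecvFin)      return State::Closing;  // simultaneous close
        break;
    case State::FinWait2:
        if (e == Event::RecvFin) return State::TimeWait;
        break;
    case State::Closing:
        if (e == Event::RecvAckOfFin) return State::TimeWait;
        break;
    case State::TimeWait:   // left only when the 2*MSL timer expires
        if (e == Event::TwoMslTimeout) return State::Closed;
        break;
    case State::Closed:
        break;
    }
    return s;
}

// Usage sketch: the passive closer passes through CLOSE_WAIT, the active one through TIME_WAIT.
inline void check_close_paths() {
    State passive = next_state(State::Established, Event::RecvFin);
    assert(passive == State::CloseWait);
    State active = next_state(State::Established, Event::AppClose);
    active = next_state(active, Event::RecvAckOfFin);
    active = next_state(active, Event::RecvFin);
    assert(active == State::TimeWait);
}
```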

05/14/08

12:36:56 pm, by lano1106, 286 words, 4689 views

Categories: C++

C/C++ register keyword usefulness

I have received a comment on this blog post to the effect that the 'register' keyword is useless and that all compilers ignore it. I do not agree. If you search the Internet, you will find that a lot of people speculate and have opinions on the subject, but the truth is that unless you are a compiler writer, you have no idea whether a given compiler does something with the 'register' keyword or not. Even if you know the answer for a specific compiler, you cannot generalize it to other compilers. Here is what the C++ standard says in section 7.1.1:

A register specifier has the same semantics as an auto specifier together with a hint to the implementation that the object so declared will be heavily used. [Note: the hint can be ignored and in most implementations it will be ignored if the address of the object is taken. -end note]

It is true that today's compiler optimizers are very clever and can do a decent job of managing CPU registers on their own, but let's imagine how the compiler's register allocation algorithm might work. Registers are a scarce resource, so the compiler must analyze the usage of each variable and assign each one a score based on various criteria. If the compiler needs to spill a variable back to memory to reuse a register, it would choose the variable with the lowest score. What happens if more than one variable has the same low score? In such a situation, a compiler could use the 'register' keyword hint to make its decision.
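The thought experiment above can be written down as a toy spill-victim selection. To be clear, this is purely hypothetical: the Candidate structure, the integer score, and pick_spill_victim() are invented for illustration and do not describe how any real compiler allocates registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Candidate {
    const char* name;
    int score;           // hypothetical usage score computed by the allocator
    bool register_hint;  // true if the variable was declared with 'register'
};

// Picks the variable to spill to memory: the lowest score loses its register;
// when scores are equal, a variable without the hint is spilled first.
inline std::size_t pick_spill_victim(const std::vector<Candidate>& live) {
    assert(!live.empty());
    std::size_t victim = 0;
    for (std::size_t i = 1; i < live.size(); ++i) {
        const Candidate& c = live[i];
        const Candidate& v = live[victim];
        const bool lower_score = c.score < v.score;
        const bool tie_broken_by_hint =
            c.score == v.score && v.register_hint && !c.register_hint;
        if (lower_score || tie_broken_by_hint) {
            victim = i;
        }
    }
    return victim;
}
```

In this sketch the hint never overrides a strictly lower score; it only matters when the allocator would otherwise have no reason to prefer one variable over another.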

Situations where the 'register' keyword could be used:

temporary variables used in bitwise logical operations
loop counters

05/06/08

09:16:37 pm, by lano1106, 791 words, 2686 views

Categories: C++

Formatted I/O in C++

In this post, I want to explain why using ostream to format output is a better option than using printf. I intend to make my point by showing how these 2 methods work. By the way, an important milestone in my programming career was realizing that standard library functions are no more magical than your own functions. I believe that once you stop considering system functions as black boxes and start getting interested in how they work, your programming skills will improve. This is especially true since the source code of many of these libraries is available for you to consult at your leisure.

printf was extremely versatile when it was introduced with the C standard library. However, it suffers from serious problems:

Performance issue: printf interprets the format string at run time. It scans the string looking for the next '%' character. Once found, it goes through a switch statement on the following character to determine the type of the next argument to extract from the stack and to format its value into its string representation.
Safety: printf is unsafe because the type and size of the arguments passed to it are lost; they are pulled from the stack based solely on the content of the format string. printf is the source of many security exploits in networked servers. With a carefully crafted input string containing '%' characters (in an HTTP User-Agent string, for example), it is possible to crash a server that uses printf incorrectly, typically by passing the input itself as the format string. Some compilers, such as gcc, try to warn you about potential problems with printf, but these warnings usually go unnoticed, so even with them printf remains very error-prone.
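To make the run-time interpretation concrete, here is a toy formatter in the spirit of printf. It is a sketch of the general technique, not the source of any actual C library: mini_format() is a hypothetical name and it only understands %d, %s and %%.

```cpp
#include <cassert>
#include <cstdarg>
#include <string>

// Toy printf-like formatter: scans for '%' and switches on the next character.
inline std::string mini_format(const char* fmt, ...) {
    assert(fmt != nullptr);
    std::string out;
    va_list args;
    va_start(args, fmt);
    for (const char* p = fmt; *p != '\0'; ++p) {
        if (*p != '%') {
            out += *p;  // ordinary character, copied as-is
            continue;
        }
        ++p;
        if (*p == '\0') {
            break;  // lone trailing '%': stop instead of reading past the end
        }
        // The argument type is decided here, at run time, from the format string alone.
        switch (*p) {
        case 'd': out += std::to_string(va_arg(args, int)); break;
        case 's': out += va_arg(args, const char*); break;
        case '%': out += '%'; break;
        default:  out += '%'; out += *p; break;  // unknown directive, echoed back
        }
    }
    va_end(args);
    return out;
}
```

Nothing in this function can check that the caller really passed an int for %d or a string for %s. If untrusted input is used as fmt, every '%' in it makes the loop fetch an argument that was never supplied, which is the essence of the safety problem described above.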

std::ostream does not suffer from any of printf's problems. ostream relies on function overloading to determine the type of the variable to format. Because the formatting function is selected by the compiler, there is no format string to interpret at run time as there is with printf. A good implementation of C++ I/O streams can offer a substantial performance advantage over printf-based code. Of course, there are bad implementations, mostly old ones. The standard library shipped with VC++ 6 is one of them: its I/O streams are implemented under the hood with printf. Obviously, such an implementation can only be slower than using printf directly, but that type of implementation is becoming quite rare. You also get safety, since type checking is performed by the compiler.
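The following short sketch shows overload resolution doing the work that the format string does for printf. The Point type and the describe() helper are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Point { int x; int y; };  // hypothetical user-defined type

// User-provided overload: the compiler selects it whenever a Point is streamed.
inline std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

// Each << below is bound at compile time to the overload matching its operand type.
inline std::string describe(const std::string& label, int count, const Point& at) {
    std::ostringstream os;
    os << label << ": " << count << " at " << at;
    return os.str();
}

// Usage sketch.
inline void check_describe() {
    assert(describe("hits", 3, Point{1, 2}) == "hits: 3 at (1, 2)");
}
```

Passing an argument of the wrong type simply selects a different overload or fails to compile; there is no way to make the stream read an argument that does not match what was passed.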

Also, when using C++ output string streams, just as you would initialize a string with its initial value rather than create an empty string and then assign to it, you can do the same thing with std::ostringstream. If the first element of the text to format is a string, pass it to the ostringstream constructor:

explicit
basic_ostringstream(const basic_string<Ch,Tr,A>& s,
                    openmode m = out);

use (std::ios_base::ate makes subsequent writes append to the initial string rather than overwrite it):

std::ostringstream ost("First string chunk:",
                       std::ios_base::ate);
ost << 123;

instead of:

std::ostringstream ost;
ost << "First string chunk:" << 123;
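The std::ios_base::ate flag in the first form matters. Here is a small sketch comparing the constructor call with and without it; the function names with_ate() and without_ate() are just for this example.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Initial content plus 'ate': the write position starts at the end of the string.
inline std::string with_ate() {
    std::ostringstream ost("First string chunk:", std::ios_base::ate);
    ost << 123;
    return ost.str();
}

// Initial content without 'ate': the write position starts at the beginning,
// so the first inserted characters overwrite the start of the initial string.
inline std::string without_ate() {
    std::ostringstream ost("First string chunk:");
    ost << 123;
    return ost.str();
}

// Usage sketch.
inline void check_modes() {
    assert(with_ate() == "First string chunk:123");
    assert(without_ate() == "123st string chunk:");
}
```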

And finally, there is, in my opinion, the boost abomination: the format class. This is a glorified object-oriented printf. It is a slight improvement over printf in the sense that the types of the parameters are validated, so using format is safer than using printf. However, it has the same performance problem as printf because it has to evaluate the format string at run time, exactly like printf. I guess that the motivation for writing this class was that there are some niche situations where using C++ streams is rather messy. My problem with this class is that having multiple options to achieve the same outcome is confusing. This is especially true for people learning C++. The language itself is already complex enough without the unnecessary complexity of multiple options. Another category of programmers for whom having format as an option can be problematic is the blind followers of boost. Programmers in this category are usually junior; they have been told that the boost library is the next cool thing in C++, and since then they have been incorporating boost classes everywhere they can without understanding the implications or looking at how the boost classes work.

I am concerned about this class because boost is considered the playground of the people designing the future standard libraries. In my opinion, only the best option for performing a given task in most cases should find its way into mainstream libraries. We need to keep printf for backward compatibility, but format, which is only a better printf and not quite as good as I/O streams, should not have that status.
