Olivier Langlois's blog

BOOKS i'm reading

Napoleon Hill Keys to Success: The 17 Principles of Personal Achievement, Napoleon Hill, ISBN: 978-0452272811

The 4-Hour Workweek: Escape 9-5, Live Anywhere, and Join the New Rich (Expanded and Updated), Timothy Ferriss, ISBN: 978-0307465351

The Fountainhead, Ayn Rand, ISBN: 0452273331

Categories: C++, tutorials

05/06/08

09:16:37 pm, by lano1106, 791 words, 2249 views

Categories: C++

Formatted I/O in C++

In this post, I want to explain why using ostream for formatting output is a better option than using printf. I intend to make my point by showing how these 2 methods work. By the way, an important milestone in my programming career was to come to the realization that standard libraries functions are no more magical than your own functions. I believe that once you stop considering system functions as black boxes and start getting interested in how they work, your programming skill will improve. This is especially true, since a lot of these libraries source code is available for your leisure to consult it.

printf was extremely versatile when it has been introduced with the C language standard library. However, it suffers serious problems:

Performance issue: printf is interpreting the format string at run-time. It scans the string looking for the next '%' character. Once found, it goes in a switch case on the following char to determine the type of the next variable to extract from the stack and to format its value in its string representation.
Safety: printf is unsafe from the fact that the variables type and size passed to printf are lost and they are removed from the stack solely based on the content of format string. printf is a source of many security exploits in networked server. With a carefully crafted input string containing '%' chars in it (like in a HTTP user-agent string), it is possible to crash a server that is using printf incorrectly. Some compilers such as gcc tries to warn you about potential problems with printf but usually these warnings go unnoticed so even with them, printf is still very error prone.

std::ostream does not suffer from any of printf problems. ostream is using the function overloading language feature to determine what is the type of the variable to format. Because all the work is performed by the compiler, there is no runtime cost as with printf. A good implementation of C++ I/O streams can offer a substantial performance advantage compared to printf based code. Of course, there are bad implementations like old implementations. The standard library shipped with VC++ 6 is one of these bad implementations where I/O streams are implemented under the hood with printf. Obviously, with such implementation you can only have slower I/O than using directly printf but that type of implementation is becoming quite rare. You also have safety since type checking is performed by the compiler.

Also, when using C++ output string streams, as you would initialize a string with its initial value rather than creating an empty string followed by assigning it its initial value, you can do the same thing ostrstring. If the first element of string to format is a string, pass it to the ostrstring constructor:

explicit
basic_stringstream(const basic_string<Ch,Tr,A>& s,
                   openmode m = out|in);

use:

std::ostringstream ost("First string chunk:",
                       std::ios_base::ate);
ost << 123;

instead of:

std::ostringstream ost;
ost << "First string chunk:" << 123;

And finally, there is, in my opinion, the boost abomination. The format class. This is a glorified object oriented printf. It is slightly improved compared to printf in the sense that the parameters type is validated and hence using format is safer than printf. However, it has the same performance problem than printf because it has to evaluate the format string at runtime exactly like printf. I guess that the motivation for writing this class was that there is some niche situations where using C++ streams are rather messy. My problem with this class is that having multiple options to achieve the same outcome is confusing. This is especially true for people learning C++. The language itself is already complex enough without adding an unnecessary complexity by adding multiple options. Another category of programmers for whom having format as an option can be problematic is the blind followers of boost. Programmers of this category are usually junior and they have been told that the next C++ cool thing was the boost library and since then they started to incorporate boost classes everywhere they can without understanding the implications or looking how boost classes are working.

I am concerned about this class because boost is considered as the playground for the persons designing what will be future standard libraries. In my opinion, only the best option to perform a certain task for most of the cases should find its way into mainstream libraries. We need to keep printf for backward compatibility but format that is only a better printf but not quite as good as I/O streams should not have that status.

Permalink

05/04/08

06:20:38 pm, by lano1106, 168 words, 2845 views

Categories: C++

C++ function overloading

This is the C++ language feature that allows multiple functions to have the same name. Having multiple functions with the same name was not possible in C. The compiler is able to choose the right function by checking its parameters:

int f(int)
{
  printf("This is f(int)\n");
}

int f(char *)
{
  printf("This is f(char *)\n");
}

int test()
{
  f(1); // This is f(int)
  f("test"); // This is f(char *)
}

The way it works is that the compiler mangles function names. Mangling means that the function parameters type gets encoded in the exported function name. How a compiler mangles a function name is not specified by the standard. Hence this is one of the reasons why it is not possible to use an object file generated with compiler X with compiler Y. Most tools manipulating C++ object files (like nm) perform the reverse operation: demangling. That is extracting the function signature (including the parameters) from the exported C++ mangled function name.

Permalink

04/28/08

09:55:06 pm, by lano1106, 680 words, 4656 views

Categories: C++

ODR, inline functions and C++ compiler generated functions

ODR is the One Definition Rule. This is the rule that stipulates that a name (a function, a class or global variable and much more other types) must be defined not more than once in the same program. The compiler will emit an error if a name is defined more than once in the same translation unit (TU) or if the name is defined more than once but in differents TUs, it is the linker that will catch the error and it will emit a duplicate symbol error.

The ODR is the reason why header files guards are used (#ifndef XXX #define XXX #endif). If a header file was not protected by guards and included more than once in a TU, the ODR would be violated.

All C++ programmers have a formal knowledge of the ODR or at least an intuition of its details learned by experience but this rule gets very interesting when you start to study the exceptions to it. Studying these things makes you more appreciate the complexity of the C++ language.

For instance, there are the inline functions. Usually, when we think about inline functions, we consider them as fancy macros with strong type checks. However the inline keyword can be considered only as a hint given to the compiler and the compiler is free to completely ignore the inline specifier. If it does ignore it, it will generate a regular function and export it from the resulting object file. If the inline functions are defined in many TUs and linked together then according to the ODR, this would be an error but inline functions are exempted from the ODR and the linker only keeps one copy of the inline functions defined multiple times. I would be very eager to figure out how the linker knows that a given function is exempted from the ODR because as you will see next, there are no apparent hints to this effect.

The next exception to the ODR is the C++ compiler generated functions for a class. Those functions include the copy constructor and the assignment operator. The C++ standard says that these functions are always implicitly declared if not explicitly done by the user but implicitly defined only if they are used. They will also be considered as inline hence also exempted from the ODR. Another property of the generated functions is that they are classified as trivial or non-trivial. Essentially, a trivial copy constructor simply means that the class looks more like a C POD (Plain Old Data) structure than a class and the compiler can use the same C compiler technique to perform the deep copy (probably something like a memcpy). Otherwise, it will need to create a real constructor function. Here is a simple example to demonstrate these notions:

C.h: 

class C 
{ 
public: 
   C() : a(0), b(0), c(0), d(0), e(0) {} 
//  virtual ~C() {} 
private: 
   int a; 
   int b; 
   int c; 
   int d; 
   int e; 
}; 

f.cpp: 

#include "C.h" 

C f() 
{ 
     C a; 
     C b = a; 
     return b; 
} 

g.cpp: 

#include "C.h" 

C g() 
{ 
     C a; 
     C b = a; 
     return b; 
} 

main.cpp: 

#include "C.h" 

C f(); 
C g(); 

int main(int argc, char *argv[]) 
{ 
   f(); 
   g(); 
   return 0; 
} 

g++ -c f.cpp 
nm -a -C f.o 

00000000 t 
00000000 d 
00000000 b 
00000000 t 
00000000 n 
00000000 n 
00000000 a f.cpp 
00000000 T f() 
00000000 W C::C()

No sign of the copy constructor. I am guessing that the compiler make the copy constructor inline because it is trivial. I am adding a virtual destructor to make the copy constructor non-trivial:

00000000 T f() 
          U operator delete(void*) 
00000000 W C::C(C const&) 
00000000 W C::C() 
00000000 W C::~C() 
00000000 W C::~C() 
00000000 V typeinfo for C 
00000000 V typeinfo name for C 
00000000 V vtable for C 
          U vtable for __cxxabiv1::__class_type_info

Now g.o and f.o have a copy of C copy constructor and the linker will link fine and only one copy of the function will find its way in the final executable file because the implicitly defined copy constructor is inline.

Permalink

04/12/08

03:22:31 pm, by lano1106, 265 words, 2577 views

Categories: C++

shift operator undefined behavior

I was expecting:

int main( int argc, char *argv[] ) 
{ 
   unsigned char m = 32; 
   register unsigned mask = (1<<m); 
   std::cout << std::hex << mask << '\n'; 
   return 0; 
}

to print 0 but instead this program compiled with g++ (and VC++.NET2003 too) prints 1!

If I change (1<<m) by (1<<32) or if change the program for:

int main( int argc, char *argv[] ) 
{ 
   unsigned char m = 31; 
   register unsigned mask = (1<<m)<<1; 
   std::cout << std::hex << mask << '\n'; 
   return 0; 
}

it gives me the expected 0.

In the C++ standard document, section 5.8. It is written

"The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand."

The root for this behavior probably originates from C and the safe way to perform a bit shift when the number of bits to shift may exceed the length of the left operand is to implement the shift operation in a function:

Example, almost but not universally portable:

#include <climits> 

unsigned int safe_uint_shift(unsigned int value,
                             unsigned int bits) 
{ 
    if( bits > (CHAR_BIT*std::sizeof(unsigned int) )
    {
      return 0;
    }
    else
    {
        return value << bits; 
    }
}

Put it in a header and make it inline if you like.

This solution has been proposed by Jack Klein.

The other way that I have used is to promote the left operand to an integer type having enough bits:

register unsigned mask =
  (static_cast<unsigned long long>(1)<<m)<<1;

Permalink

09:04:01 am, by lano1106, 279 words, 1786 views

Categories: C++

operator<< for a private inner class

Suppose we have

class A
{
  private:
  class B
  {
  };
};

and we would like to define operator<< for class B. I had this situation yesterday and it took me few tries to make it work. I first tried:

class A
{
...
};
std::ostream operator<<( ostream &, const A::B & );

It did not work because the compiler complained that B was private. My second attempt:

class A
{
  private:
  class B
  {
  };
  static std::ostream &operator<<( ostream &, const B & );
};

I was getting closer to the solution but this was still not quite right. Apparently, you do not have the right to declare operator<< as a static member of another class or as soon as an operator is defined as a class member, automatically, the compiler expect the left operand to be of this class type.

Then I tried this:

class A
{
  private:
  class B
  {
  };
  friend std::ostream &operator<<( ostream &, const B & );
};
std::ostream &operator<<( ostream &o, const A::B &b )
{
}

This time, I am not sure why but the linker now was complaining about duplicate symbols for my operator<< function. Not sure if it is the code that has a problem. It looks good to me. I suspect that maybe it is parameter type mangling problem and the compiler does not recognize 'const B &' as the same as 'const A::B &' but anyhow, I did not pursue the investigation I have finally made the code work like this:

class A
{
  private:
  class B
  {
  };
  friend std::ostream &operator<<( ostream &o, const B &b )
  {
  }
};

Permalink

<< Previous Page :: Next Page >>

Sun	Mon	Tue	Wed	Thu	Fri	Sat
<< <				> >>
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Categories: C++, tutorials

05/06/08

Formatted I/O in C++

05/04/08

C++ function overloading

04/28/08

ODR, inline functions and C++ compiler generated functions

04/12/08

shift operator undefined behavior

operator<< for a private inner class

Olivier Langlois's blog

Search

Categories

Olivier Langlois's blog

Archives

Misc

XML Feeds

Who's Online?