Olivier Langlois's blog

BOOKS i'm reading

Napoleon Hill Keys to Success: The 17 Principles of Personal Achievement, Napoleon Hill, ISBN: 978-0452272811

The 4-Hour Workweek: Escape 9-5, Live Anywhere, and Join the New Rich (Expanded and Updated), Timothy Ferriss, ISBN: 978-0307465351

The Fountainhead, Ayn Rand, ISBN: 0452273331

Archives for: November 2007

11/25/07

05:46:39 pm, by lano1106, 366 words, 6862 views

Categories: C++

Optimizing C++ code for Intel processors (and possibly many others)

Pipelined processors do not like conditional branches because when conditional branch is entering in the pipeline, the processor does not know yet what will be the condition outcome but it needs to continue to fill the pipeline with new instructions. In that situation the processor will try to predict what will be the condition outcome and start filling the pipeline from its prediction. This is very costly performance wise when the processor mispredicts the condition outcome. When that happens, the processor has to flush the pipeline content and start filling it again from the branch point.

The conditional branch prediction module has 2 components. There is the branch outcome cache that will help the processor to remember what was the outcome of branches that it has encountered recently and a static branch prediction algorithm.

The static branch algorithm is very simple:

If the conditional branch goes further in the code, it will not be taken
If the conditional branch goes back in the code, it will be taken

If you read the Intel processor optimization guide, few simple guidelines are easy to apply:

1. Eliminate branches

Some conditional branches are simply useless as whether they are taken or not, it will not make any differences on the end result. 2 examples of these are:

if( i != 0 )
  i = 0;

if( ptr != NULL )
{
  delete ptr;
  ptr = NULL;
}

should be changed to:

i = 0;

and

delete ptr;
ptr = NULL;

It is faster to reassign the same value to a variable than conditionally assign it. The operator delete and the function free() already contain safeguards against NULL pointers and it should be quite exceptional that they are called anyway with NULL pointers.

2. Refactor if statement conditions to be true most of the time.

Here is an example:

if(error)
{
  // exceptional error handling
}
else
{
  // normal operation
}

would be faster written like:

if(!error)
{
  // normal operation
}
else
{
  // exceptional error handling
}

To conclude, these guidelines are specifically for Intel processors but since the static branch prediction algorithm used by Intel processors is widely used by other pipelined processors (PowerPC is one of them), applying them on other platforms should be beneficial as well. Also, if you are looking for other optimization techniques, you can consult these sources.

Permalink 1 comment

11/21/07

10:10:47 pm, by lano1106, 271 words, 5374 views

Categories: C++

Undefined behavior from a pthread C++ program when calling exit()

I have done pthread programming for a while and read a couple of books on the topic too but nothing have prepared me to the undefined behavior that I have discovered the hard way.

The POSIX standard states that when the main thread exits the process, the remaining threads will be automatically terminated. What the standard does not specify is in which order the different process cleanup will occur hence if you leave the process without terminating yourself the threads, you will get undefined behavior. In my case, it was a segmentation fault.

The undefined behavior is real especially if this is a C++ program because part of the termination cleanup, there is the destruction of the program global objects. If your threads are still running and happen to use global objects, this is a sure way to a segmentation fault.

Even in fixing the problem, I have learned again something new about pthreads programming. To fix my problem, I just sent to myself the SIGINT signal that initiates the correct termination. In my first attempt, I have replaced my exit() function call with raise(SIGINT) which I have read in my UNIX programming textbook was the equivalent to kill(getpid(), SIGINT) but by doing so, the signal was never received. After some investigation, I have found in a book, that in a multithread program, raise(SIGINT) was equivalent to pthread_kill(pthread_self(),SIGINT). In my case, calling raise was doing nothing because the signal was blocked in the thread from where the call was done. Everything finally worked fine when I replaced the raise call with kill(getpid(), SIGINT).

Permalink

12:04:37 am, by lano1106, 485 words, 2535 views

Categories: C++

Prefer calling the prefix form of the ++ and -- operators

This is an advice that can be found in many books such as:

C++ Coding Standards, Exceptional C++, More Effective C++ or The C++ Standard Library

However, despite the abundance of literature giving this advice and explaining why, a lot of people are not aware of this. So if you are one of these persons and you are someone who is spending more time browsing the web than reading books and you have found this blog, this is for you.

At first look, statements like

++i;
i++;

seem equivalent but you have to remember that if i is an object, the increment operator is implemented with a function. Seeing the form of typical operator functions implementation should explain everything:

// Prefix version
T& T::operator++()
{
  //Perform increment
  return *this; // Return a reference on this
}

// Postfix version
T T::operator++(int)
{
  T res(*this); // Make a copy of the current value
  ++*this;      // Call the prefix version
  return res;   // Return the value by result
}

So, as you can see, most of the time the postfix increment operator is implemented in terms of the prefix one so even if you call the postfix operator, you will end up calling the prefix operator function anyway. The rest is just overhead:

The copy constructor will be invoked at res creation.
A function call will be done if operator++() is not inline
The copy constructor might be called a second time because res is returned by value

You'll notice that at the third point, I used 'might'. If you have a good compiler, it might be able to get rid of the construction of the return value by using a technique called the Return Value Optimization (RVO). Most compilers are able to pull off that optimization with an unnamed object (temporary object) like:

return T;

If a return statement is returning an automatic variable (local variable) declared before the return statement as it is done in the operator++(int) function, a lot fewer compilers are able to apply the RVO. Those who are able are said to have named RVO (NRVO). Visual C++ has it only since version 2005 according to this blog entry. GCC supports it since version 3.1 according to this Boost mailing list archive entry.

To conclude, it should now be obvious that unless you absolutely need the postfix increment semantic, you should use the prefix version. This should represents 99% of the increments in a program since usually, they stand alone in their own statement and the return value is just not used at all. This is usually what happens in code using STL containers iterators. One could argue that it does not make a perceptable difference performance wise in a program. Maybe but why choosing the slowest option when both ways are equally easy to write? Herb Sutter and Andrei Alexandrescu think the same and they give the excellent advice (Don't pessimize prematurely) in their book C++ Coding Standards.

Permalink

11/19/07

11:38:11 pm, by lano1106, 575 words, 7365 views

Categories: C++

How to remove dynamically allocated pointers from STL set

Before clearing or destructing a STL container containing dynamically allocated pointers, it is important to delete the pointers or else, you are going to leak memory and resources (See item 7 in Effective STL). The usual way to this is:

for( vector<MyClass*>::iterator it = v.begin(),
     e = v.end(); it != e; ++it )
{
    delete *it;
}
v.clear();

There is a small twist with sets. This code would be fine with a set using its default compare function but since sorting pointers with their values is not very useful, most of the time, such containers are configured with a custom compare function that dereference the pointers to perform its comparison. The standard does not specify which set methods use the compare function and which are not. So it is looking for trouble to free the memory and keep the pointers on it in the set and at best your set will be in an invalid state.

The easy way is to store boost::smart_ptr<MyClass*> into the set. All you have to do is to clear the set and the smart pointers will take care of deallocating the memory. My opinion, however, is that this solution is like using a machine gun to kill a mosquito. The real purpose of smart pointers is for dealing with situations where the pointers ownership is fuzzy. If you know that it is safe without any doubt to delete the pointers in your container when you want to clear it, you should not be using smart pointers.

Another option is to use boost::ptr_set but since I am not very familiar with this class, this is not something I will recommend.

The best solution that I have seen is simply doing this:

typedef std::set<MyClass*,
                 CompFuncThatDerefencePointers> MySet;
 
for( MySet::iterator it = s.begin(),
     e = s.end(); it != e; )
{
    MyClass *p = *it;
    s.erase(it++);
    delete p;
}

There are 2 more details that I want to discuss. Some STL containers (such as set and map) erase() method signature is

void erase(iterator it);

and some other containers (list and vector) have

iterator erase(iterator it);

The second signature makes sense for containers that upon insertion or removal of elements invalidates all iterators on it but I have no idea why list has its erase returning an iterator. If you think that you might switch from a list to an associative container, you might want to erase elements from the list with

l.erase(iter++);

to keep your code compatible with the associative containers interface. As I am writing, I think that the STL designers decided to make the list erase signature compatible with the vector erase because they considered that this was the most likely switch if you had to change your container strategy.

The other important point that I wanted to mention is that my recommendation to explicitly clear the pointers rather than using boost::smart_ptr assumes that your container is a data member of a class and that you are using the RAII idiom. That is that you are deleting the container elements memory in the class destructor. This is to be sure that your code is exception-safe. If your container is an automatic variable in a function, then this might be an appropriate situation to use boost::smart_ptr because if you do not and an exception is thrown before your container clearing code is executed, you will leak memory.

Permalink

11/13/07

10:56:52 pm, by lano1106, 327 words, 2616 views

Categories: C++

How to optimize memory layout/usage of structures and classes

In the book The C++ Programming language from Bjarne Stroustrup, you can read in section 5.7 (Structures):

The size of an object of a structure type is not necessarily the sum of the sizes of its members. This is because many machines require objects of certain types to be allocated on architecture dependant boundaries or handle such objects much more efficiently if they are. For example, integers are often allocated on word boundaries. On such machines, objects are said to have to be aligned properly. This leads to "holes" in the structures...You can minimize wasted space by simply ordering members by size (largest member first).

As an example, on my platform if I write:

struct MyPOD1
{
  bool a;
  int  b;
  bool c
  int  d;
};

struct MyPOD2
{
  int  b;
  int  d;
  bool a;
  bool c;
};

sizeof(MyPOD1) is 16 and sizeof(MyPOD2) is 12. which means that 4 bytes are wasted by the MyPOD1 layout. Bjarne continues by adding that you should prefer readability over memory usage optimization but in my opinion taking the habit of always placing 1 byte variables such as bools and chars at the end of a class data members declaration can hardly obfuscate the code especially if the class encapsulation is tight.

Now that the demonstration that by just reordering data members around in a class declaration can save space has been done, let me explain to you the advantages of saving few bytes. Just imagine that you have a std::vector<MyPOD2> of 25K elements. You just saved 100KB of space to your program for free and since more elements will fit in the processor cache which should make the vector traversal faster.

I recommend taking this habit because it is easy to do and once you get used to do it, it will become an automatism and you will get the best performance from your data structures without having to stop and think about how you could optimize them before you need to.

Permalink

Sun	Mon	Tue	Wed	Thu	Fri	Sat
<< <		Current			> >>
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30

Archives for: November 2007

11/25/07

Optimizing C++ code for Intel processors (and possibly many others)

11/21/07

Undefined behavior from a pthread C++ program when calling exit()

Prefer calling the prefix form of the ++ and -- operators

11/19/07

How to remove dynamically allocated pointers from STL set

11/13/07

How to optimize memory layout/usage of structures and classes

Olivier Langlois's blog

Search

Categories

Olivier Langlois's blog

Archives

Misc

XML Feeds

Who's Online?