The most-voted question on Stack Overflow so far is about branch prediction.

The code here tests the performance of the machine code that different compilers generate for it.

It also includes some modified versions of the originally posted sum loop, inspired by the answers to that question.

I have tried GCC 4.8.2 and ICC 14.0.1.

Here are the outputs:

GCC 4.8.2 with -O2:

ICC 14.0.1 with -O2:

I understand ICC's results now after reading Mysticial's answer: ICC swaps the inner and outer loops for me, without my having to rewrite the code.
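To make the loop swap concrete, here is a minimal sketch of what interchanging the two loops looks like when done by hand. It assumes the shape of the sum loop from the Stack Overflow question (an outer repeat loop, an inner pass over the array, and a `data[c] >= 128` test); the function names `sum_original` and `sum_interchanged` and the `repeats` parameter are mine, for illustration only, and this is not the code ICC actually emits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Original order: the data-dependent branch is evaluated on every
// iteration of the inner loop, repeats * data.size() times in total.
std::int64_t sum_original(const std::vector<int>& data, unsigned repeats) {
    std::int64_t sum = 0;
    for (unsigned i = 0; i < repeats; ++i) {
        for (std::size_t c = 0; c < data.size(); ++c) {
            if (data[c] >= 128) {
                sum += data[c];
            }
        }
    }
    return sum;
}

// Interchanged order: the branch is evaluated once per element, and the
// repeat loop runs only for the elements that pass the test.
std::int64_t sum_interchanged(const std::vector<int>& data, unsigned repeats) {
    std::int64_t sum = 0;
    for (std::size_t c = 0; c < data.size(); ++c) {
        if (data[c] >= 128) {
            for (unsigned i = 0; i < repeats; ++i) {
                sum += data[c];
            }
        }
    }
    return sum;
}
```

Both functions return the same sum for the same input, so the interchange is safe here. In the second form the branch no longer depends on whether the data is sorted in any way that matters for timing, because it is taken only `data.size()` times.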

But GCC's 0s are surprising.

I modified the code further by changing

to

and ICC gives me 0s just like GCC.

So ICC intelligently optimizes the code by interchanging the loops when it is poorly written. GCC does not do this.

However, GCC recognizes that the benchmark loop is written even worse once it has been moved into the inner loop, and simply removes the loop.

Smart compilers cannot completely fix poorly written loops, at least for now.

The source code can be found on GitHub.