From a performance perspective, version 1 is faster for two reasons: it only pushes 2 local variables onto the stack instead of 3, and it returns immediately when the if condition is true, whereas the second version has to jump out of the if block to return (1 extra jump).
You will want to be very careful with such assumptions. While you may be right, you are assessing performance at the micro level, yet making assumptions from high-level code.
First of all, the Intel architecture, which is most probably what this code runs on, is not a native stack-based architecture, so there is no pushing of local variables onto the stack for this function. In simplified terms, besides the return instruction pointer that is pushed onto the stack during a subroutine call, the other stack values are the parameters. This function takes no parameters, so there should be no other explicit stack values we can identify from this high-level code assessment.
Allocating stack space for local variables is not a pushing mechanism. It merely shifts the stack pointer (SP/ESP register) by the amount of local space required. Between 2 and 3 local variables, there is no substantial performance difference between the two implementations.
For the sake of argument, your assumption also neglects a lot of what goes on beneath the source code: cache locality, and compiler optimisations like dead code elimination, loop unrolling, register allocation, code inlining, etc. All of these contribute to the actual assembly code that is generated.
Allow me to show you what happens practically.
Code:
$ cat test.cpp
#include <iostream>
#include <climits>

using namespace std;

int findMax() {
    int x, y;
    //cout << "Enter values for x and y: " << endl;
    //cin >> x >> y;
    x = 4; y = 3;
    if (x > y)
        return x;
    else
        return y;
}

int main(int argc, char** argv) {
    int x = 0;
    for (int i = 0; i < INT_MAX; i++)
        x += findMax();
    return 0;
}
$ cat test2.cpp
#include <iostream>
#include <climits>

using namespace std;

int findMax() {
    int x, y, maxNo;
    //cout << "Enter values for x and y: " << endl;
    //cin >> x >> y;
    x = 4; y = 3;
    if (x > y)
        maxNo = x;
    else
        maxNo = y;
    return maxNo;
}

int main(int argc, char** argv) {
    int x = 0;
    for (int i = 0; i < INT_MAX; i++)
        x += findMax();
    return 0;
}
Your claim is that test.cpp will be faster.
Below are my results:
Code:
$ time ./test; time ./test2
real 0m8.671s
user 0m8.457s
sys 0m0.010s
real 0m8.587s
user 0m8.480s
sys 0m0.008s
$ time ./test; time ./test2
real 0m8.589s
user 0m8.419s
sys 0m0.009s
real 0m8.726s
user 0m8.532s
sys 0m0.009s
What does it say? It contradicts your finding. Are you wrong? No, you are not wrong. If you run it a sufficient number of times, you might be statistically right. But you have also neglected the power of the compiler.
During compilation, I explicitly used NO OPTIMIZATION:
Code:
$ g++ -O0 -o test test.cpp
$ g++ -O0 -o test2 test2.cpp
However, most of the time this is not the case.
Normally the default is to use "-O2" for optimisation. Then I reran the code:
Code:
$ time ./test; time ./test2
real 0m0.005s
user 0m0.001s
sys 0m0.002s
real 0m0.005s
user 0m0.001s
sys 0m0.002s
Observe the stunning difference? What I want to warn you about is to never perform such micro-benchmarking. It is not a real reflection of performance. At the level of high-level code, we can only assess performance with Big-O notation. Going further down requires real testing and real benchmarking, taking into account many factors, not just how the source code is arranged.
The Big-O performance approach always assumes a large N, or a large number of repetitions. The shape of the performance curve gives a good indication of the efficiency of the algorithm. However, even in real practical cases, we cannot assume N will always be large. A bubble sort, which is O(N^2), can work better for small N than a quicksort, which is O(N log N).
Drilling further down into performance is a mistake unless you consider other details that go beyond the source code. Please be mindful of this.