From a performance perspective, version 1 is faster for two reasons: it only pushes 2 local variables onto the stack instead of 3, and it returns immediately when the if condition is true, whereas the second version has to jump out of the if block to return (1 extra jump).
You will want to be very careful with such assumptions. While you may be right, you are assessing performance at the micro level, yet making assumptions from high-level code.
First of all, the Intel architecture, which is most probably what this code runs on, is not a native stack-based architecture, so there is no pushing of local variables onto the stack for this function. In simplified terms, besides the return instruction pointer that is pushed onto the stack during a subroutine call, the other stack values are the parameters. This function takes no parameters, so there should be no other explicit stack values we can identify from this high-level code assessment.
Allocating stack space for local variables is not a pushing mechanism. It merely shifts the stack pointer (SP/ESP register) by the amount of local space required. Between 2 and 3 local variables, there is no substantial performance difference between the two implementations.
For the sake of argument, your assumption also neglects a lot of what goes on beneath the source code: cache locality, and compiler optimisations like dead code elimination, loop unrolling, register allocation, code inlining, etc. All of these contribute to the actual assembly code that is generated.
Allow me to show you what happens practically.
Code:
$ cat test.cpp
#include <iostream>
#include <climits>

using namespace std;

int findMax() {
    int x, y;
    //cout << "Enter values for x and y: " << endl;
    //cin >> x >> y;
    x = 4; y = 3;
    if (x > y)
        return x;
    else
        return y;
}

int main(int argc, char** argv) {
    int x = 0;
    for (int i = 0; i < INT_MAX; i++)
        x += findMax();
    return 0;
}
$ cat test2.cpp
#include <iostream>
#include <climits>

using namespace std;

int findMax() {
    int x, y, maxNo;
    //cout << "Enter values for x and y: " << endl;
    //cin >> x >> y;
    x = 4; y = 3;
    if (x > y)
        maxNo = x;
    else
        maxNo = y;
    return maxNo;
}

int main(int argc, char** argv) {
    int x = 0;
    for (int i = 0; i < INT_MAX; i++)
        x += findMax();
    return 0;
}
Your claim is that test.cpp will be faster.
Below are my results:
Code:
$ time ./test; time ./test2
real 0m8.671s
user 0m8.457s
sys 0m0.010s
real 0m8.587s
user 0m8.480s
sys 0m0.008s
$ time ./test; time ./test2
real 0m8.589s
user 0m8.419s
sys 0m0.009s
real 0m8.726s
user 0m8.532s
sys 0m0.009s
What does it say? It contradicts your finding. Are you wrong? No, you are not wrong. If you run it a sufficient number of times, you might be statistically right. But you have also neglected the power of the compiler.
During compilation, I explicitly used NO OPTIMIZATION:
Code:
$ g++ -O0 -o test test.cpp
$ g++ -O0 -o test2 test2.cpp
However, most of the time this is not the case.
Normally the default is to use "-O2" for optimisation. Then I reran the code:
Code:
$ time ./test; time ./test2
real 0m0.005s
user 0m0.001s
sys 0m0.002s
real 0m0.005s
user 0m0.001s
sys 0m0.002s
Observe the stunning difference? What I want to warn you about is to never perform such micro-benchmarking. It is not a real reflection of performance. At the level of high-level code, we can only assess performance with Big-O notation. Going further down requires real testing and real benchmarking, taking into account many factors, not just how the source code is arranged.
The Big-O performance approach always assumes a large N, or a large number of repetitions. The shape of the performance curve gives a good indication of the efficiency of the algorithm. However, even in real practical cases, we cannot assume N will always be large. A bubble sort, which is O(N^2), can work better for small N than a quicksort, which is O(N log N).
Drilling further down into performance is a mistake unless you consider other details that go beyond the source code. Please be mindful of this.