REPOST: Why Sequence Points in C Are Important

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,391
Reaction score
1,180
Following on from http://forums.hardwarezone.com.sg/97506184-post4723.html. I wanted to repost it here so that anyone who stumbles upon this kind of C code again will understand better what is going on.

int main(void)
{
int a = 5;
a = a-- - --a * (a = -3) * a++ + ++a;
printf("a = %i\n", a);

return 0;
}

david, can you help me see why this evaluates to 22? Thank you

Okay, as promised, let's follow up on your question. What really happened? Let's see if we can even answer why you get "22". As of writing, this is the first time I'm looking at the assembly of your compiled C code.

[Image: aac5n4.png]


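Looking at your code, what concerns us is the complex expression that applies pre- and post-increment/decrement, plus an assignment, to the same variable with no sequence point in between. The C standard leaves the behaviour of such an expression undefined, so the result depends entirely on what a particular compiler happens to generate.

GCC compiles the C code into the equivalent assembly language before assembling it into machine opcodes.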

@0008: the literal 5 ($5) is stored to -4(%rbp). %rbp is a register, namely the base pointer. It is one of the two pointers, the stack pointer and the base pointer, that Intel x86 code commonly uses to refer to the stack. You know what a stack is, I hope. -4(%rbp) is like an array access %rbp[-4] counted in bytes: the address %rbp points to, minus 4 bytes, which is the size of a 32-bit integer. Stacks are inverted, meaning they start at high memory addresses and grow toward lower memory addresses.

@000f: load the value 5 into %eax, one of the most commonly referenced registers in assembly language. It is one of the arithmetic registers: RAX is the 64-bit counterpart and EAX the 32-bit counterpart of AX. So we now know that the memory referenced by -4(%rbp) holds the value 5. Remember that a variable is just a placeholder for a value in a high-level programming language. What really matters is not where "a" lives, but the manipulation of the value you keep in "a", which you might later pass over the network, print out, or simply discard. What matters here is what happens to the value 5.

@0012, @0015: compute 5 - 1 = 4 into EDX, then store that value back to -4(%rbp), which now contains 4.

@0018: subtract 1 from -4(%rbp), so 4 becomes 3.

@001c: store the value -3 to -4(%rbp). Up to this point something odd has happened: everything done to -4(%rbp) so far is effectively rendered useless, because -3 has overwritten the earlier value 3. Now -4(%rbp) holds -3.

@0023: load -4(%rbp), which is now -3, into EDX. Again the earlier operations are rendered useless.

@0026: copy EDX, which is -3, into ESI. Hence ESI is -3.

@0028: multiply ESI by the value stored at -4(%rbp) and keep the result in ESI. That means ESI <-- -3 * -3 = 9.

@002c: load -4(%rbp), which is -3, into EDX. Hence EDX holds -3.

@002f: compute %rdx + 1 and put the result in ECX, so ECX = -3 + 1 = -2. EDX itself is not changed.

@0032: store ECX to -4(%rbp), which now holds -2.

@0035: multiply %edx by %esi, which means %edx <-- 9 * -3 = -27. Hence %edx holds -27.

@0038: subtract %edx from %eax and keep the result in %eax, which means %eax <-- 5 - (-27) = 32. Remember that %eax has held the value 5, undisturbed, since @000f.

@003a: add 1 to -4(%rbp), which now holds -1 (see @0032 for why).

@003e: finally, add the value in %eax to -4(%rbp), which gives 32 + (-1) = 31.

So after tracing for so long, what do you see? "a", which lives at -4(%rbp), holds the value 31.

I hope you see what is really going on. The assembly the compiler generated from the C does not correspond to what you think is happening at the source code level. Simply put, the machine follows a different order from the semantics you assumed when reading the source.

In the end, "a" holds a different value.

Below is the output when run on my Linux server.

Code:
$ cat abc.c
#include <stdio.h>

int main(void)
{
    int a = 5;
    a = a--
        - --a
        * (a = -3)
        * a++
        + ++a;
    printf("a = %i\n", a);

    return 0;
}

$ cat abc.s
	.file	"abc.c"
	.section	.rodata
.LC0:
	.string	"a = %i\n"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	$5, -4(%rbp)
	movl	-4(%rbp), %eax
	leal	-1(%rax), %edx
	movl	%edx, -4(%rbp)
	subl	$1, -4(%rbp)
	movl	$-3, -4(%rbp)
	movl	-4(%rbp), %edx
	movl	%edx, %esi
	imull	-4(%rbp), %esi
	movl	-4(%rbp), %edx
	leal	1(%rdx), %ecx
	movl	%ecx, -4(%rbp)
	imull	%esi, %edx
	subl	%edx, %eax
	addl	$1, -4(%rbp)
	addl	%eax, -4(%rbp)
	movl	-4(%rbp), %eax
	movl	%eax, %esi
	movl	$.LC0, %edi
	movl	$0, %eax
	call	printf
	movl	$0, %eax
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
	.section	.note.GNU-stack,"",@progbits
$ ./abc
a = 31
$

So why do you get "22" and not "31"? I compiled the same C code on my Mac OS X machine, also with the gcc command, and the answer is different again: I get "-23". (Judging by the style of the listing below, gcc on that machine is probably Clang under the hood.)

So you see why my earlier answer to you was that the result is inconsistent.

Reading my Mac OS X assembly below, you will see that a different set of instructions was generated.
Code:
$ cat ./abc.s
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_main
	.align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## BB#0:
	pushq	%rbp
Ltmp2:
	.cfi_def_cfa_offset 16
Ltmp3:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Ltmp4:
	.cfi_def_cfa_register %rbp
	subq	$16, %rsp
	leaq	L_.str(%rip), %rdi
	movl	$0, -4(%rbp)
	movl	$5, -8(%rbp)
	movl	-8(%rbp), %eax
	movl	%eax, %ecx
	addl	$4294967295, %ecx       ## imm = 0xFFFFFFFF
	movl	%ecx, -8(%rbp)
	movl	-8(%rbp), %ecx
	addl	$4294967295, %ecx       ## imm = 0xFFFFFFFF
	movl	%ecx, -8(%rbp)
	movl	$-3, -8(%rbp)
	imull	$4294967293, %ecx, %ecx ## imm = 0xFFFFFFFD
	movl	-8(%rbp), %edx
	movl	%edx, %esi
	addl	$1, %esi
	movl	%esi, -8(%rbp)
	imull	%edx, %ecx
	subl	%ecx, %eax
	movl	-8(%rbp), %ecx
	addl	$1, %ecx
	movl	%ecx, -8(%rbp)
	addl	%ecx, %eax
	movl	%eax, -8(%rbp)
	movl	-8(%rbp), %esi
	movb	$0, %al
	callq	_printf
	movl	$0, %ecx
	movl	%eax, -12(%rbp)         ## 4-byte Spill
	movl	%ecx, %eax
	addq	$16, %rsp
	popq	%rbp
	retq
	.cfi_endproc

	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"a = %i\n"


.subsections_via_symbols

I hope you now have a deeper understanding of how much sequence points matter in C. In fact, evaluation-order rules matter in every programming language. Java handles this expression consistently because its semantics are different: the language promises to evaluate operands from left to right, so the bytecode generated is consistent.
 

KnightNiwrem

Senior Member
Joined
Jun 1, 2014
Messages
1,056
Reaction score
0
The assembly compiled from the C program is actually not helpful.

The correct takeaway should be to never write this kind of line in your C program.

The C standard does not guarantee how a compiler behaves when you modify the same variable more than once between two sequence points; the behaviour is undefined.

That is to say, the compiler could make the value of a come out as "yellow", and that would be correct too. In fact, it is not just across operating systems that the above program produces inconsistent values. It can also produce different values on the same operating system when compiled with different C compilers.
 

davidktw

Arch-Supremacy Member
Joined
Apr 15, 2010
Messages
13,391
Reaction score
1,180
I'm not suggesting that one must read the assembly. But it helps to explain what is going on.

Of course, in the end it is the compilers' implementations that differ, since the C standard does not state how such an expression should be evaluated between two sequence points.

Anyway, I really urge anyone who wants to know more about computer architecture to challenge themselves by reading assembly now and then, and by reading up more on the Internet too.

It's not as if you will end up doing only assembly; you get a chance to relate it to the high-level languages and see what is really happening.
 