davidktw
Following up from http://forums.hardwarezone.com.sg/97506184-post4723.html. I wanted to repost it here so that anyone who stumbles upon C code like this again can get a deeper understanding of what is going on.
int main(void)
{
    int a = 5;
    a = a-- - --a * (a = -3) * a++ + ++a;
    printf("a = %i\n", a);
    return 0;
}
David, can you help me see why this evaluates to 22? Thank you.
Okay, as promised, let's follow up on this question of yours. What really happened? Let's see if we can even explain why you get "22". As of writing, this is the first time I'm looking at the assembly of your compiled C code.
What we want to focus on in your code is the expression that mixes pre- and post-increment/decrement (plus an assignment) on the same variable with no sequence point in between. The C standard leaves the behaviour of such an expression undefined, so the compiler is free to order the side effects however it likes.
GCC first compiles the C code into the equivalent assembly language, which is then assembled into machine opcodes. The @ numbers below are the byte offsets of each instruction from the start of main.
@0008: the literal 5 ($5) is stored to -4(%rbp). %rbp is a register, namely the base pointer. It is one of the two pointers, the stack pointer and the base pointer, commonly used on Intel x86 to point into the stack. You know what a stack is, I hope. -4(%rbp) means the memory at the address held in %rbp minus 4 bytes, much like indexing backwards from a pointer, and movl stores 4 bytes there, which is the size of a 32-bit integer. The stack is inverted, meaning it starts at a high memory address and grows towards lower addresses.
@000f: load the value 5 from -4(%rbp) into %eax, one of the most commonly used registers in assembly language. It is one of the arithmetic registers. RAX is the 64-bit and EAX the 32-bit counterpart of AX. So both -4(%rbp) and EAX now hold 5. Remember that a variable is just a high-level-language placeholder for a value. What really matters is not where "a" lives, but how the value in "a" is manipulated before you pass it over the network, print it out, or just discard it. What matters here is what happens to the value 5.
@0012, @0015: compute 5 - 1 = 4 into EDX, then store it back to -4(%rbp), which now contains 4.
@0018: subtract 1 from -4(%rbp), so 4 becomes 3.
@001c: store -3 to -4(%rbp). Up to this point something weird has happened: everything done to -4(%rbp) so far is effectively rendered useless, because -3 has overwritten the earlier value 3. Now -4(%rbp) holds -3.
@0023: load -4(%rbp), which is now -3, into EDX, again discarding the result of the earlier operations.
@0026: copy EDX into ESI, so ESI is -3.
@0028: multiply ESI by the value at -4(%rbp) and store the result in ESI: ESI <-- -3 * -3 = 9.
@002c: load -4(%rbp), which is -3, into EDX. EDX holds -3.
@002f: compute %rdx + 1 and put it in ECX, so ECX = -3 + 1 = -2. leal only computes the sum, so EDX itself is still -3.
@0032: store ECX to -4(%rbp), which now holds -2.
@0035: multiply EDX by ESI: EDX <-- 9 * -3 = -27.
@0038: subtract EDX from EAX: EAX <-- 5 - (-27) = 32. Remember that EAX has been holding 5, undisturbed, since @000f?
@003a: add 1 to -4(%rbp), which now holds -1 (see @0032 for why).
@003e: finally, add EAX to -4(%rbp): 32 + (-1) = 31.
So after tracing for so long, what do you see? "a", which lives at -4(%rbp), ends up holding the value 31.
I hope you see what is really going on. The way the compiler translates C into assembly does not correspond to what you think is happening at the source level. Simply put, the machine follows a different order from the semantics you assumed when reading the source.
In the end, "a" holds a different value from the one you expected.
Below is the output from my Linux server.
Code:
$ cat abc.c
#include <stdio.h>
int main(void)
{
    int a = 5;
    a = a--
        - --a
        * (a = -3)
        * a++
        + ++a;
    printf("a = %i\n", a);
    return 0;
}
$ cat abc.s
    .file "abc.c"
    .section .rodata
.LC0:
    .string "a = %i\n"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    subq $16, %rsp
    movl $5, -4(%rbp)       # @0008: a = 5
    movl -4(%rbp), %eax     # @000f: eax = 5
    leal -1(%rax), %edx     # @0012: edx = 4
    movl %edx, -4(%rbp)     # @0015: a = 4
    subl $1, -4(%rbp)       # @0018: a = 3
    movl $-3, -4(%rbp)      # @001c: a = -3
    movl -4(%rbp), %edx     # @0023: edx = -3
    movl %edx, %esi         # @0026: esi = -3
    imull -4(%rbp), %esi    # @0028: esi = 9
    movl -4(%rbp), %edx     # @002c: edx = -3
    leal 1(%rdx), %ecx      # @002f: ecx = -2, edx unchanged
    movl %ecx, -4(%rbp)     # @0032: a = -2
    imull %esi, %edx        # @0035: edx = -27
    subl %edx, %eax         # @0038: eax = 32
    addl $1, -4(%rbp)       # @003a: a = -1
    addl %eax, -4(%rbp)     # @003e: a = 31
    movl -4(%rbp), %eax
    movl %eax, %esi
    movl $.LC0, %edi
    movl $0, %eax
    call printf
    movl $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
    .section .note.GNU-stack,"",@progbits
$ ./abc
a = 31
$
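So why do you get "22" and not "31"? I compiled the same C code on my Mac OS X, also with the gcc command, and the answer is different again: I get "-23". (Judging by the "## BB#0" and "## @main" comments in the listing below, gcc on that Mac looks to be Clang/LLVM underneath, which would explain the different code.)
So you see why my earlier answer to you was that the result is inconsistent.
Reading my Mac OS X assembly below, you will see that a different set of instructions is generated.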
Code:
$ cat ./abc.s
    .section __TEXT,__text,regular,pure_instructions
    .globl _main
    .align 4, 0x90
_main: ## @main
    .cfi_startproc
## BB#0:
    pushq %rbp
Ltmp2:
    .cfi_def_cfa_offset 16
Ltmp3:
    .cfi_offset %rbp, -16
    movq %rsp, %rbp
Ltmp4:
    .cfi_def_cfa_register %rbp
    subq $16, %rsp
    leaq L_.str(%rip), %rdi
    movl $0, -4(%rbp)
    movl $5, -8(%rbp)               ## a = 5 (a lives at -8(%rbp) here)
    movl -8(%rbp), %eax             ## eax = 5
    movl %eax, %ecx                 ## ecx = 5
    addl $4294967295, %ecx ## imm = 0xFFFFFFFF
    movl %ecx, -8(%rbp)             ## a = 4
    movl -8(%rbp), %ecx             ## ecx = 4
    addl $4294967295, %ecx ## imm = 0xFFFFFFFF
    movl %ecx, -8(%rbp)             ## a = 3, ecx = 3
    movl $-3, -8(%rbp)              ## a = -3
    imull $4294967293, %ecx, %ecx ## imm = 0xFFFFFFFD
    movl -8(%rbp), %edx             ## ecx = 3 * -3 = -9, edx = -3
    movl %edx, %esi                 ## esi = -3
    addl $1, %esi                   ## esi = -2
    movl %esi, -8(%rbp)             ## a = -2
    imull %edx, %ecx                ## ecx = -9 * -3 = 27
    subl %ecx, %eax                 ## eax = 5 - 27 = -22
    movl -8(%rbp), %ecx             ## ecx = -2
    addl $1, %ecx                   ## ecx = -1
    movl %ecx, -8(%rbp)             ## a = -1
    addl %ecx, %eax                 ## eax = -22 + (-1) = -23
    movl %eax, -8(%rbp)             ## a = -23
    movl -8(%rbp), %esi
    movb $0, %al
    callq _printf
    movl $0, %ecx
    movl %eax, -12(%rbp) ## 4-byte Spill
    movl %ecx, %eax
    addq $16, %rsp
    popq %rbp
    retq
    .cfi_endproc
    .section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
    .asciz "a = %i\n"
    .subsections_via_symbols
I hope you now have a deeper understanding of how much the semantics of sequence points in C matter. In fact, evaluation-order rules matter in every programming language. Java does this consistently because its semantics are different: the language promises that operands are evaluated from left to right, so the bytecode generated is consistent.