Suppose you have code that looks like this:

    int func1();
    int func2();
    int func3();
    int func4();

    int func5() {
        int i = 0;
        i += func2();
        i += func3();
        i += func4();
        return i + func1();
    }

    int func6() {
        int i = 1;
        i += func3();
        i += func3();
        i += func4();
        return i + func1();
    }

With `-Os` it compiles to the following assembly:

    func5():
        pushq   %rbx
        call    func2()
        movl    %eax, %ebx
        call    func3()
        addl    %eax, %ebx
        call    func4()
        addl    %eax, %ebx
        call    func1()
        addl    %ebx, %eax
        popq    %rbx
        ret
    func6():
        pushq   %rbx
        call    func3()
        leal    1(%rax), %ebx
        call    func3()
        addl    %eax, %ebx
        call    func4()
        addl    %eax, %ebx
        call    func1()
        addl    %ebx, %eax
        popq    %rbx
        ret

However, the two functions end with the same eight instructions, from `call func3()` through `ret`. The code could therefore be compiled into the following, which behaves identically but takes less space:

    func5():
        pushq   %rbx
        call    func2()
        movl    %eax, %ebx
    common_tail:
        call    func3()
        addl    %eax, %ebx
        call    func4()
        addl    %eax, %ebx
        call    func1()
        addl    %ebx, %eax
        popq    %rbx
        ret
    func6():
        pushq   %rbx
        call    func3()
        leal    1(%rax), %ebx
        jmp     common_tail

Testing with Compiler Explorer shows that none of GCC, Clang, MSVC or ICC performs this optimization, but some embedded compilers do.
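To make the transformation concrete, here is a minimal C++ sketch of the tail-merging step (in the spirit of what is usually called tail merging or cross-jumping), applied across two functions. It is a toy, not how any of the compilers mentioned implement anything: each function is assumed to be a plain list of instruction strings, and the names `Code`, `common_suffix_len`, `merge_tails` and the `min_len` threshold are hypothetical. It finds the longest common suffix of the two instruction lists, places a label where that suffix starts in the function being kept, and replaces the suffix in the other function with a `jmp` to the label.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One instruction per string, in program order (a toy stand-in for real IR).
using Code = std::vector<std::string>;

// Counts how many trailing instructions a and b have in common.
std::size_t common_suffix_len(const Code& a, const Code& b) {
    std::size_t n = 0;
    while (n < a.size() && n < b.size() &&
           a[a.size() - 1 - n] == b[b.size() - 1 - n]) {
        ++n;
    }
    return n;
}

// Makes `other` reuse the tail of `keep`: a label is placed where the shared
// tail starts in `keep`, and `other` loses that tail and jumps to the label.
// `min_len` is a hypothetical profitability threshold: merging is skipped
// (and both lists are left unchanged) when the shared tail is shorter.
bool merge_tails(Code& keep, Code& other, const std::string& label,
                 std::size_t min_len = 2) {
    // With an empty shared tail the label would land after the last
    // instruction of `keep`, so at least one instruction must be shared.
    assert(min_len >= 1);
    const std::size_t n = common_suffix_len(keep, other);
    if (n < min_len) {
        return false;
    }
    // Label the first instruction of the shared tail in `keep`.
    keep.insert(keep.end() - static_cast<std::ptrdiff_t>(n), label + ":");
    // Drop the shared tail from `other` and jump into `keep` instead.
    other.resize(other.size() - n);
    other.push_back("jmp " + label);
    return true;
}
```

Fed the two instruction lists from the question (`func5` as `keep`, `func6` as `other`, label `common_tail`), the shared tail is eight instructions long and the result matches the hand-written version: `func5` gains `common_tail:` just before its `call func3()`, and `func6` shrinks to `pushq`, `call func3()`, `leal` and `jmp common_tail`. Comparing instruction text and counting instructions are simplifications; a real compiler would compare its own IR or encoded machine instructions and weigh byte sizes, since the added `jmp` itself costs space.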