Clang does not support GCC's flags -mapcs and -mapcs-frame. The Linux perf tool uses a frame-pointer-based unwinder to gather call-graph profiles (perf -g). On ARM this requires APCS frames (-fno-omit-frame-pointer -mapcs-frame) to ensure that frame pointers are stored in predictable locations. As a result, call-graph profiling on ARM does not work with Clang-generated binaries. DWARF-based unwinding is not an option due to performance issues. AddressSanitizer would benefit greatly from this, too. It needs to gather stack traces of all memory allocations. Without FP we have to use the libc DWARF-based unwinder, and it's a huge performance hit (10x easily, depending on the application).
After some more research: Clang stores frame pointers in a stable location, which is the second word of the stack frame. This seems to be a conscious decision (see the definitions of CSR_AAPCS and CSR_iOS in ARMCallingConv.td). This is consistent with GCC behaviour in the default mode (without -mapcs-frame). This behaviour does not seem to be standardized. The only standard that mentions frame pointers on ARM is APCS, which is long deprecated. The Linux kernel follows APCS, which means that Clang is incompatible with "perf -g". There is no way to emit frame pointers in leaf functions (or is there?). It looks like it would be possible to implement an AAPCS frame-pointer-based unwinder for AddressSanitizer. It is only used in interceptors, therefore we don't care about unwinding from leaf functions.
Since r199725 AddressSanitizer does FP-based unwind following AAPCS frame layout (with {fp, lr} at the top of the frame). The only remaining practical issue from the lack of -mapcs-frame support is incompatibility with linux perf. Seeing that APCS is deprecated, maybe this should be fixed in linux kernel instead?
As mentioned, LLVM and GCC both store the frame pointer in the second word of the stack frame, after the return address. However, the position in the stack frame that the frame pointer points to appears to differ...

LLVM stack frame looks like...

  return address (LR)
  saved frame pointer (R7 or R11)
  :                                  <- frame pointer set to here
  :

GCC stack frame looks like...

  return address (LR)
  saved frame pointer (R7 or R11)    <- frame pointer set to here
  :
  :

This may be significant!
Ouch. Thanks for pointing this out. A minor correction:

LLVM:

  return address (LR)
  saved frame pointer (R7 or R11)    <- frame pointer set to here
  :
  :

GCC:

  return address (LR)                <- frame pointer set to here
  saved frame pointer (R7 or R11)
  :
  :

With LLVM, like on x86, fp_next = fp[0]. With GCC, fp_next = fp[-1]. Both can be supported if we can tell stack pointers from code pointers, which we can do in AddressSanitizer, since we always know the stack limits of the current thread.
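To make the "tell stack pointers from code pointers" idea concrete, here is a minimal sketch in C. It is not the AddressSanitizer implementation; the names unwind_either_layout and in_stack are hypothetical, and it assumes the caller supplies the stack limits [lo, hi) of the current thread. At each frame it looks at the word fp points to: if that word is an aligned address inside the stack, the frame is treated as LLVM-style (fp_next = fp[0], return address in fp[1]); otherwise the word is treated as the return address of a GCC-style frame (fp_next = fp[-1]).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if p is a word-aligned address in [lo, hi). */
static int in_stack(uintptr_t p, uintptr_t lo, uintptr_t hi) {
    return p >= lo && p < hi && (p % sizeof(uintptr_t)) == 0;
}

/* Walks the frame chain starting at fp and stores up to max return
   addresses in pcs. Returns the number of entries written.
   Assumption: the stack grows down, so callers live at higher addresses. */
size_t unwind_either_layout(uintptr_t fp, uintptr_t lo, uintptr_t hi,
                            uintptr_t *pcs, size_t max) {
    const uintptr_t w = sizeof(uintptr_t);
    size_t n = 0;
    assert(pcs != NULL || max == 0);
    while (n < max && in_stack(fp, lo, hi)) {
        const uintptr_t *frame = (const uintptr_t *)fp;
        uintptr_t pc, next;
        if (in_stack(frame[0], lo, hi)) {
            /* LLVM-style: fp points at the saved frame pointer. */
            if (hi - fp < 2 * w) break;   /* fp[1] would be out of bounds */
            next = frame[0];
            pc = frame[1];
        } else {
            /* GCC-style: fp points at the return address. */
            if (fp - lo < w) break;       /* fp[-1] would be out of bounds */
            pc = frame[0];
            next = frame[-1];
        }
        if (pc == 0) break;               /* zero word: end of the chain */
        pcs[n++] = pc;
        if (next <= fp) break;            /* chain must move towards callers */
        fp = next;
    }
    return n;
}
```

One limitation of this sketch: in an outermost LLVM-style frame whose saved frame pointer is zero, the zero word is read as a null return address and the walk stops without reporting that frame. A real unwinder would need an explicit policy for the last frame.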
And Thumb frame pointers are completely messed up.

Clang:

   0: e92d 4ff0   stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   4: af03        add r7, sp, #12

Frame pointer in r7, r7_next = *r7, but offset from frame pointer to the return address is not fixed (in this example it's 5*4=20 bytes, depends on the set of spilled registers).

GCC:

   4: e92d 4ff0   stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   8: b08b        sub sp, #44 ; 0x2c
  ...
   e: af00        add r7, sp, #0
  ...
  a2: f107 072c   add.w r7, r7, #44 ; 0x2c
  a6: 46bd        mov sp, r7
  a8: e8bd 8ff0   ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}

Here r7 points to the top of the local variable area. There is no way to find even the address of the previous frame.
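The 5*4=20 figure for the Clang prologue can be reproduced from the register list alone. The sketch below is illustrative only (slot_offset and fp_to_lr_distance are hypothetical names). It assumes 4-byte slots and relies on the fact that a push/stmdb stores lower-numbered registers at lower addresses; the mask 0x4ff0 is the register list taken from the e92d 4ff0 encoding above (r4-r11 and lr).

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset from the post-push SP to the slot holding register reg.
   Lower-numbered registers are stored at lower addresses, so the offset
   is 4 bytes per pushed register numbered below reg. */
unsigned slot_offset(uint16_t mask, unsigned reg) {
    unsigned below = 0;
    assert(reg < 16 && (mask & (1u << reg)));
    for (unsigned r = 0; r < reg; ++r)
        if (mask & (1u << r))
            ++below;
    return 4 * below;
}

/* Distance in bytes from the saved frame pointer slot up to the saved LR
   slot (LR is r14). Every register pushed between them adds 4 bytes. */
unsigned fp_to_lr_distance(uint16_t mask, unsigned fp_reg) {
    return slot_offset(mask, 14) - slot_offset(mask, fp_reg);
}
```

For mask 0x4ff0 and r7 this gives a slot offset of 12 (matching add r7, sp, #12) and a distance of 20 bytes. With only {r7, lr} pushed (mask 0x4080), the distance is the fixed 4 bytes that a simple unwinder expects.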
Hi... we're considering doing some custom unwind code in Chromium and I am wondering what might be left of this bug? I see this: https://github.com/llvm-mirror/llvm/blob/master/lib/Target/ARM/ARMCallingConv.td#L257 which shows that, at least in some cases in thumb mode, r7 is indeed pushed in front of LR instead of r11. Does anyone know if this bug is still valid? If so, would there be objections if I tried to fix it?
Interesting, it appears to be fixed by this patch: http://llvm.org/viewvc/llvm-project?rev=269459&view=rev Now, both -marm and -mthumb store {fp, lr} at the top of the frame (with fp==r7 for thumb and r11 for arm), and fp_next==*fp. As long as you don't need GCC interoperability, thumb frame pointers can be used for stack unwinding.
Nice! I think that means this bug is fixed from the llvm perspective then? Should we close?
FYI, it looks like as of gcc 6.3, they seem to also order r7 next to lr? https://godbolt.org/g/koMcqu
Yes, but the new frame pointer does not point to the store location of the previous frame pointer. Looks like this can not be used for unwinding.

Parent::VirtualBig(int, BigStruct):
  sub sp, sp, #8
  push {r4, r5, r7, lr}
  add r7, sp, #0
Ah, I see. Whereas in the llvm generated version:

Disassembly of section .text._ZN6Parent10VirtualBigEi9BigStruct:
   0: b081   sub sp, #4
   2: b5d8   push {r3, r4, r6, r7, lr}
   4: af03   add r7, sp, #12

r7 is adjusted back to account for the extra register pushes. Good to know...
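The difference between the two prologues can be stated as a small check, sketched here in C with hypothetical names (regs_below, fp_points_at_saved_fp). It assumes the "add r7, sp, #imm" immediately follows the push and that slots are 4 bytes wide: the frame pointer is usable for fp_next == *fp unwinding only if the immediate equals the offset of the slot where r7 was just saved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of registers in the push list numbered below reg. */
static unsigned regs_below(uint16_t mask, unsigned reg) {
    unsigned count = 0;
    for (unsigned r = 0; r < reg; ++r)
        if (mask & (1u << r))
            ++count;
    return count;
}

/* True if "add fp_reg, sp, #add_imm" right after "push <mask>" leaves the
   frame pointer aimed at the slot holding the caller's frame pointer. */
bool fp_points_at_saved_fp(uint16_t mask, unsigned fp_reg, unsigned add_imm) {
    assert(fp_reg < 16);
    if (!(mask & (1u << fp_reg)))
        return false;             /* frame pointer was not saved at all */
    return add_imm == 4 * regs_below(mask, fp_reg);
}
```

For the GCC prologue above, push {r4, r5, r7, lr} with add r7, sp, #0 (mask 0x40b0), the check fails because r7 was saved at offset 8. For the LLVM prologue, push {r3, r4, r6, r7, lr} with add r7, sp, #12 (mask 0x40d8), it passes.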
Anyway, there is nothing left to be done on the LLVM side, so I'm closing this bug.
See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92172