
Bug 18505 - Missing support for APCS frame layout
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: ARM
Version: trunk
Hardware: PC Linux
Importance: P normal
Assignee: Unassigned LLVM Bugs
Reported: 2014-01-16 03:03 PST by Evgenii Stepanov
Modified: 2020-06-02 15:23 PDT

Description Evgenii Stepanov 2014-01-16 03:03:02 PST
Clang does not support gcc's flags -mapcs and -mapcs-frame.

The Linux perf tool uses a frame-pointer-based unwinder to gather call-graph profiles (perf -g). On ARM this requires APCS frames (-fno-omit-frame-pointer -mapcs-frame) to ensure that frame pointers are stored in predictable locations. As a result, call-graph profiling on ARM does not work with Clang-generated binaries.

DWARF-based unwinding is not an option due to performance issues.

AddressSanitizer would benefit greatly from this, too. It needs to gather stack traces of all memory allocations. Without FP we have to use the libc DWARF-based unwinder, and it's a huge performance hit (10x easily, depending on the application).
Comment 1 Evgenii Stepanov 2014-01-20 05:46:25 PST
After some more research, Clang stores frame pointers in a stable location, which is the second word of the stack frame. This seems to be a conscious decision (see definitions of CSR_AAPCS and CSR_iOS in ARMCallingConv.td).

This is consistent with GCC behaviour in the default mode (without -mapcs-frame). This behaviour does not seem to be standardized.

The only standard that mentions frame pointers on ARM is APCS, which is long deprecated. The Linux kernel follows APCS, which means that Clang is incompatible with "perf -g".

There is no way to emit frame pointers in leaf functions (or is there?).

It looks like it would be possible to implement AAPCS frame pointer based unwinder for AddressSanitizer. It is only used in interceptors, therefore we don't care about unwinding from leaf functions.
Comment 2 Evgenii Stepanov 2014-01-21 05:24:24 PST
Since r199725 AddressSanitizer does FP-based unwind following AAPCS frame layout (with {fp, lr} at the top of the frame).

The only remaining practical issue from the lack of -mapcs-frame support is incompatibility with Linux perf. Seeing that APCS is deprecated, maybe this should be fixed in the Linux kernel instead?
Comment 3 Keith Walker 2014-01-21 08:26:05 PST
As mentioned, LLVM and GCC both store the frame pointer in the second word of the stack frame, after the return address. However, the position in the stack frame that the frame pointer points to appears to differ...

LLVM stack frame looks like...

   return address (LR)
   saved frame pointer (R7 or R11)
   :                                <- frame pointer set to here
   :

GCC stack frame looks like...

   return address (LR)
   saved frame pointer (R7 or R11)   <- frame pointer set to here
   :
   :

This may be significant!
Comment 4 Evgenii Stepanov 2014-01-22 01:39:36 PST
Ouch. Thanks for pointing this out.

A minor correction, LLVM:
   return address (LR)
   saved frame pointer (R7 or R11)   <- frame pointer set to here
   :
   :

                                   
GCC:
   return address (LR)               <- frame pointer set to here
   saved frame pointer (R7 or R11)
   :
   :

With LLVM, like on x86, fp_next = fp[0].
With GCC, fp_next = fp[-1].

Both can be supported if we can tell stack pointers from code pointers. Which we can do in AddressSanitizer, we always know stack limits of the current thread.
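
A minimal sketch of how one walker could handle both layouts from Comment 4, assuming the stack limits of the current thread are known. This is not AddressSanitizer's actual code: the names word, stack_bounds, on_stack and fp_unwind are hypothetical, and the sketch further assumes that the outermost frame stores a zero saved frame pointer. A slot holding a stack address is taken to be a saved fp (LLVM layout); anything else is taken to be a return address (GCC layout, with the saved fp one word below).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One stack slot. On 32-bit ARM this is 4 bytes; uintptr_t keeps the
   sketch portable enough to exercise on other hosts. */
typedef uintptr_t word;

/* Hypothetical description of the current thread's stack, [lo, hi). */
typedef struct { word lo, hi; } stack_bounds;

/* True if v is a word-aligned address with a whole word inside the stack. */
static int on_stack(const stack_bounds *b, word v) {
    return v % sizeof(word) == 0 && v >= b->lo && v < b->hi &&
           b->hi - v >= sizeof(word);
}

/* Follow the frame-pointer chain starting at fp and store up to max
   return addresses into pcs. Returns the number of entries stored. */
size_t fp_unwind(word fp, const stack_bounds *b, word *pcs, size_t max) {
    size_t n = 0;
    assert(b != NULL && pcs != NULL);
    while (n < max && on_stack(b, fp)) {
        const word *rec = (const word *)fp;
        word next, pc;
        if (rec[0] == 0 || on_stack(b, rec[0])) {
            /* LLVM layout: fp[0] = caller's fp, fp[1] = return address.
               A zero saved fp is assumed to mark the outermost frame. */
            if (!on_stack(b, fp + sizeof(word))) break;
            next = rec[0];
            pc = rec[1];
        } else {
            /* GCC layout: fp[0] = return address, fp[-1] = caller's fp. */
            if (!on_stack(b, fp - sizeof(word))) break;
            pc = rec[0];
            next = rec[-1];
        }
        pcs[n++] = pc;
        if (next <= fp) break; /* callers live at higher addresses */
        fp = next;
    }
    return n;
}
```

The check that next is above fp guards against loops on a corrupted chain. The heuristic only works where code addresses never fall inside the stack range, which is the property relied on above.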
Comment 5 Evgenii Stepanov 2014-01-22 05:37:15 PST
And Thumb frame pointers are completely messed up.

Clang:
   0:   e92d 4ff0       stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   4:   af03            add     r7, sp, #12
The frame pointer is in r7 and r7_next = *r7, but the offset from the frame pointer to the return address is not fixed: in this example it is 5*4 = 20 bytes, and it depends on the set of spilled registers.

GCC:
   4:   e92d 4ff0       stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   8:   b08b            sub     sp, #44 ; 0x2c
   ...
   e:   af00            add     r7, sp, #0

   ...

  a2:   f107 072c       add.w   r7, r7, #44     ; 0x2c
  a6:   46bd            mov     sp, r7
  a8:   e8bd 8ff0       ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}
Here r7 points to the top of the local variable area. There is no way to find even the address of the previous frame.
Comment 6 Albert J. Wong 2017-12-06 11:08:20 PST
Hi... we're considering writing some custom unwind code in Chromium and are wondering what might be left of this bug?

I see this:

https://github.com/llvm-mirror/llvm/blob/master/lib/Target/ARM/ARMCallingConv.td#L257

which shows that at least in some cases, in thumb mode, r7 is indeed pushed in front of LR instead of r11.

Does anyone know if this bug is still valid?  If not, would there be objections if I tried to fix it?
Comment 7 Evgenii Stepanov 2017-12-06 13:24:20 PST
Interesting, it appears to be fixed by this patch:
http://llvm.org/viewvc/llvm-project?rev=269459&view=rev

Now, both -marm and -mthumb store {fp, lr} at the top of the frame (with fp==r7 for thumb and r11 for arm), and fp_next==*fp.

As long as you don't need GCC interoperability, thumb frame pointers can be used for stack unwinding.
Comment 8 Albert J. Wong 2017-12-06 13:38:32 PST
Nice! I think that means this bug is fixed from the llvm perspective then? Should we close?
Comment 9 Albert J. Wong 2017-12-06 13:46:41 PST
FYI, it looks like as of gcc 6.3, they seem to also order r7 next to lr?

https://godbolt.org/g/koMcqu
Comment 10 Evgenii Stepanov 2017-12-06 13:51:18 PST
Yes, but the new frame pointer does not point to the store location of the previous frame pointer. Looks like this can not be used for unwinding.

Parent::VirtualBig(int, BigStruct):
  sub sp, sp, #8
  push {r4, r5, r7, lr}
  add r7, sp, #0
Comment 11 Albert J. Wong 2017-12-06 14:05:44 PST
Ah, I see.  Whereas in the llvm generated version:

Disassembly of section .text._ZN6Parent10VirtualBigEi9BigStruct:
   0:   b081            sub     sp, #4
   2:   b5d8            push    {r3, r4, r6, r7, lr}
   4:   af03            add     r7, sp, #12

r7 is adjusted back to account for the extra register pushes. Good to know...
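
The slot arithmetic behind the prologues quoted in this thread can be checked with a small sketch. It relies only on the fact that push/stmdb stores lower-numbered registers at lower addresses; the helper names slot_offset, fp_to_lr and fp_points_at_saved_fp are hypothetical, and the sketch assumes the frame pointer is set by an "add fp, sp, #imm" issued right after the push, before any further sp adjustment.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset from the post-push sp to the slot holding register reg,
   for a push of the registers in mask (bit i = r<i>, bit 14 = lr).
   Returns -1 if reg is not in the list. */
int slot_offset(uint16_t mask, int reg) {
    int off = 0;
    assert(reg >= 0 && reg < 16);
    if (!(mask & (1u << reg))) return -1;
    for (int i = 0; i < reg; i++)
        if (mask & (1u << i)) off += 4; /* each lower register takes a word */
    return off;
}

/* Distance in bytes from the address held in fp to the saved lr,
   given "add fp, sp, #fp_imm" right after the push. */
int fp_to_lr(uint16_t mask, int fp_imm) {
    return slot_offset(mask, 14) - fp_imm;
}

/* 1 if fp lands exactly on its own saved copy, so that fp_next == *fp. */
int fp_points_at_saved_fp(uint16_t mask, int fp_reg, int fp_imm) {
    return slot_offset(mask, fp_reg) == fp_imm;
}
```

With mask 0x4FF0 ({r4-r11, lr}) and #12 from Comment 5, fp_to_lr gives 20. With mask 0x40D8 ({r3, r4, r6, r7, lr}) and #12 from the LLVM output above, it gives 4 and r7 lands on its saved copy. With mask 0x40B0 ({r4, r5, r7, lr}) and #0 from the GCC output in Comment 10, r7 lands on the saved r4 instead, so the chain cannot be followed.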
Comment 12 Evgenii Stepanov 2017-12-06 14:40:27 PST
Anyway, there is nothing left to be done on the LLVM side, so I'm closing this bug.
Comment 13 Nick Desaulniers 2020-06-02 15:23:58 PDT
See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92172