When run in a TARGET_ARCH=powerpc buildworld-based environment built with clang 3.8.0 from FreeBSD's projects/clang380-import sources, the following 8-line program gets a SEGV. Before it does, it ignores the catch clause and calls std::terminate.

    #include <exception>
    int main(void)
    {
        try {
            throw std::exception();
        } catch (std::exception& e) {} // same result without &
        return 0;
    }

(The above is a simplification of the original discovery context. The actual problem code is not in the above source but in supporting FreeBSD library code when compiled via clang 3.8.0.)

I've tracked the problem down to misbehavior of clang 3.8.0 code generation for __builtin_dwarf_cfa () as used in:

    #define uw_init_context(CONTEXT)                                        \
      do                                                                    \
        {                                                                   \
          /* Do any necessary initialization to access arbitrary stack frames. \
             On the SPARC, this means flushing the register windows.  */    \
          __builtin_unwind_init ();                                         \
          uw_init_context_1 (CONTEXT, __builtin_dwarf_cfa (),               \
                             __builtin_return_address (0));                 \
        }                                                                   \
      while (0)
    . . .
    85  _Unwind_Reason_Code
    86  _Unwind_RaiseException(struct _Unwind_Exception *exc)
    87  {
    88    struct _Unwind_Context this_context, cur_context;
    89    _Unwind_Reason_Code code;
    90
    91    /* Set up this_context to describe the current stack frame.  */
    92    uw_init_context (&this_context);

In the code below, r4 ends up holding the __builtin_dwarf_cfa () value supplied to uw_init_context_1:

    Dump of assembler code for function _Unwind_RaiseException:
       0x419a8fd8 <+0>:   mflr r0
       0x419a8fdc <+4>:   stw r31,-148(r1)
       0x419a8fe0 <+8>:   stw r30,-152(r1)
       0x419a8fe4 <+12>:  stw r0,4(r1)
       0x419a8fe8 <+16>:  stwu r1,-2992(r1)
       0x419a8fec <+20>:  mr r31,r1
    . . .
       0x419a9094 <+188>: mr r4,r31
       0x419a9098 <+192>: mflr r30
       0x419a909c <+196>: lwz r5,2996(r31)
       0x419a90a0 <+200>: mr r3,r28
       0x419a90a4 <+204>: bl 0x419a929c <uw_init_context_1>

r4 ends up holding the stack pointer value from after the stack pointer has been decremented. It does not point at the boundary with the caller's frame.
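To make the off-by-one-frame error concrete, here is a hypothetical C++ replay of the prologue above (Ppc32 and replay_unwind_raise_prologue are illustrative names, not FreeBSD or LLVM code). It assumes only the 2992-byte frame and the save offsets shown in the dump; the entry stack pointer and link register values are arbitrary. It also shows that clang's own "lwz r5,2996(r31)" fetches the return address from r31 + 2992 + 4, i.e. relative to the entry stack pointer, while r4 gets plain r31.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical toy model of a 32-bit PowerPC stack (not FreeBSD or LLVM code):
// byte address -> stored 32-bit word, plus the registers the prologue touches.
struct Ppc32 {
    std::map<uint32_t, uint32_t> mem;
    uint32_t r0 = 0, r1 = 0, r31 = 0;

    // stw value,disp(base)
    void stw(uint32_t value, int32_t disp, uint32_t base) { mem[base + disp] = value; }
    // lwz disp(base)
    uint32_t lwz(int32_t disp, uint32_t base) const { return mem.at(base + disp); }
    // stwu r1,disp(r1): store the old r1 (the back chain) at r1+disp, then update r1.
    void stwu_r1(int32_t disp) {
        const uint32_t new_sp = r1 + disp;
        mem[new_sp] = r1;
        r1 = new_sp;
    }
};

struct PrologueResult {
    uint32_t passed_as_cfa;  // r4 after "mr r4,r31" (clang 3.8.0)
    uint32_t cfa;            // stack pointer on entry: the high-address boundary
    uint32_t return_addr;    // r5 after "lwz r5,2996(r31)"
};

// Replays the _Unwind_RaiseException prologue from the dump.
PrologueResult replay_unwind_raise_prologue(uint32_t entry_sp, uint32_t lr) {
    const uint32_t kFrameSize = 2992;      // from "stwu r1,-2992(r1)"
    Ppc32 m;
    m.r1 = entry_sp;
    m.r0 = lr;                             // mflr r0
    m.stw(m.r31, -148, m.r1);              // stw r31,-148(r1): below r1, before stwu
    m.stw(0, -152, m.r1);                  // stw r30,-152(r1): r30's value is irrelevant here
    m.stw(m.r0, 4, m.r1);                  // stw r0,4(r1): LR save word at entry SP + 4
    m.stwu_r1(-static_cast<int32_t>(kFrameSize));  // stwu r1,-2992(r1)
    m.r31 = m.r1;                          // mr r31,r1
    PrologueResult res;
    res.passed_as_cfa = m.r31;             // mr r4,r31
    res.cfa = m.r31 + kFrameSize;          // where .eh_frame expects the CFA to be
    res.return_addr = m.lwz(2996, m.r31);  // lwz r5,2996(r31) == entry SP + 4
    return res;
}
```

For any entry stack pointer, passed_as_cfa comes out exactly 2992 bytes below cfa, which is the frame size of _Unwind_RaiseException.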
The .eh_frame information and the unwind code are set up for the CFA to point at the boundary with the caller's frame, so the CFA-relative addressing does not match what is actually extracted.

Contrast this with another compiler's TARGET_ARCH=powerpc64 code (again for FreeBSD's projects/clang380-import source code), where r4 is made to point at the boundary with the caller's frame:

    Dump of assembler code for function _Unwind_RaiseException:
       0x00000000501cb810 <+0>:   mflr r0
       0x00000000501cb814 <+4>:   stdu r1,-5648(r1)
    . . .
       0x00000000501cb8d0 <+192>: addi r4,r1,5648
       0x00000000501cb8d4 <+196>: stw r12,5656(r1)
       0x00000000501cb8d8 <+200>: mr r28,r3
       0x00000000501cb8dc <+204>: addi r31,r1,2544
       0x00000000501cb8e0 <+208>: mr r3,r27
       0x00000000501cb8e4 <+212>: addi r29,r1,112
       0x00000000501cb8e8 <+216>: bl 0x501cae60 <uw_init_context_1>

(When I last checked, clang 3.8.0 was unable to complete a FreeBSD buildworld, hence my use of another compiler.)

NOTE: The powerpc (32-bit) issue may be associated in some way with clang 3.8.0's violation of the FreeBSD powerpc ABI in how it handles the stack pointer: TARGET_ARCH=powerpc builds currently use a "red zone" in the stack, decrementing the stack pointer late and incrementing it early compared to the FreeBSD ABI rules. (This is similar to the official FreeBSD ABI for TARGET_ARCH=powerpc64.)
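The "red zone" point can be checked mechanically against the clang 3.8.0 powerpc prologue of _Unwind_RaiseException: it stores r31 and r30 at -148(r1) and -152(r1) before the stwu moves r1. The sketch below is a hypothetical helper (StackStep and count_red_zone_stores are not from any real tool) that counts stores landing below the stack pointer. The zone sizes are assumptions: 0 bytes for an ABI without a red zone, and 288 bytes as the protected area I understand the 64-bit PowerPC ELF ABI to reserve.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one prologue instruction that touches the stack.
struct StackStep {
    bool updates_sp;  // true for stwu/stdu r1,disp(r1), false for a plain store disp(r1)
    int32_t disp;     // displacement relative to the r1 value at that point
};

// Counts stores that land more than `zone_bytes` below the current stack pointer.
// zone_bytes == 0 models an ABI with no red zone; 288 is an assumed value for
// the 64-bit PowerPC ELF ABI's protected area.
int count_red_zone_stores(const std::vector<StackStep>& steps, int32_t zone_bytes) {
    int violations = 0;
    for (const StackStep& s : steps) {
        // stwu/stdu write the back chain at the new r1 itself, never below it.
        if (!s.updates_sp && s.disp < -zone_bytes)
            ++violations;
    }
    return violations;
}
```

With no red zone, the two saves at -148(r1) and -152(r1) are flagged; with a 288-byte zone they would be allowed, which is why the same pattern is acceptable for powerpc64.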
(In reply to comment #0)

Here is a two-line self-contained program that shows the problem when the .o is examined with objdump. I provide comparisons with .o files from g++49 or g++5.

    # more builtin_dwarf_cfa.cpp
    extern void g(void*);
    void f() { g(__builtin_dwarf_cfa()); }

In a TARGET_ARCH=powerpc64 context:

    # clang++ -c -g -std=c++11 -Wall -pedantic builtin_dwarf_cfa.cpp
    # /usr/local/bin/objdump -d --prefix-addresses builtin_dwarf_cfa.o
    builtin_dwarf_cfa.o: file format elf64-powerpc-freebsd
    Disassembly of section .text:
    0000000000000000 <._Z1fv> mflr r0
    0000000000000004 <._Z1fv+0x4> std r31,-8(r1)
    0000000000000008 <._Z1fv+0x8> std r0,16(r1)
    000000000000000c <._Z1fv+0xc> stdu r1,-128(r1)
    0000000000000010 <._Z1fv+0x10> mr r31,r1
    0000000000000014 <._Z1fv+0x14> mr r3,r31
    0000000000000018 <._Z1fv+0x18> bl 0000000000000018 <._Z1fv+0x18>
    000000000000001c <._Z1fv+0x1c> nop
    0000000000000020 <._Z1fv+0x20> addi r1,r1,128
    0000000000000024 <._Z1fv+0x24> ld r0,16(r1)
    0000000000000028 <._Z1fv+0x28> ld r31,-8(r1)
    000000000000002c <._Z1fv+0x2c> mtlr r0
    0000000000000030 <._Z1fv+0x30> blr
    ...

r3 does not point to the boundary with the caller's stack frame.
By contrast, for g++49:

    # g++49 -c -g -std=c++11 -Wall -pedantic builtin_dwarf_cfa.cpp
    # /usr/local/bin/objdump -d --prefix-addresses builtin_dwarf_cfa.o | more
    builtin_dwarf_cfa.o: file format elf64-powerpc-freebsd
    Disassembly of section .text:
    0000000000000000 <._Z1fv> mflr r0
    0000000000000004 <._Z1fv+0x4> std r0,16(r1)
    0000000000000008 <._Z1fv+0x8> std r31,-8(r1)
    000000000000000c <._Z1fv+0xc> stdu r1,-128(r1)
    0000000000000010 <._Z1fv+0x10> mr r31,r1
    0000000000000014 <._Z1fv+0x14> addi r9,r31,128
    0000000000000018 <._Z1fv+0x18> mr r3,r9
    000000000000001c <._Z1fv+0x1c> bl 000000000000001c <._Z1fv+0x1c>
    0000000000000020 <._Z1fv+0x20> nop
    0000000000000024 <._Z1fv+0x24> addi r1,r31,128
    0000000000000028 <._Z1fv+0x28> ld r0,16(r1)
    000000000000002c <._Z1fv+0x2c> mtlr r0
    0000000000000030 <._Z1fv+0x30> ld r31,-8(r1)
    0000000000000034 <._Z1fv+0x34> blr
    0000000000000038 <._Z1fv+0x38> .long 0x0
    000000000000003c <._Z1fv+0x3c> .long 0x90001
    0000000000000040 <._Z1fv+0x40> lwz r0,1(r1)

r3 does point to the boundary with the caller's stack frame.
For TARGET_ARCH=powerpc, clang 3.8.0 first:

    # clang++ -c -g -std=c++11 -Wall -pedantic builtin_dwarf_cfa.cpp
    # /usr/local/bin/objdump -d --prefix-addresses builtin_dwarf_cfa.o
    builtin_dwarf_cfa.o: file format elf32-powerpc-freebsd
    Disassembly of section .text:
    00000000 <_Z1fv> mflr r0
    00000004 <_Z1fv+0x4> stw r31,-4(r1)
    00000008 <_Z1fv+0x8> stw r0,4(r1)
    0000000c <_Z1fv+0xc> stwu r1,-16(r1)
    00000010 <_Z1fv+0x10> mr r31,r1
    00000014 <_Z1fv+0x14> mr r3,r31
    00000018 <_Z1fv+0x18> bl 00000018 <_Z1fv+0x18>
    0000001c <_Z1fv+0x1c> addi r1,r1,16
    00000020 <_Z1fv+0x20> lwz r0,4(r1)
    00000024 <_Z1fv+0x24> lwz r31,-4(r1)
    00000028 <_Z1fv+0x28> mtlr r0
    0000002c <_Z1fv+0x2c> blr

Then g++5 (5.3):

    # g++5 -c -g -std=c++11 -Wall -pedantic builtin_dwarf_cfa.cpp
    # /usr/local/bin/objdump -d --prefix-addresses builtin_dwarf_cfa.o
    builtin_dwarf_cfa.o: file format elf32-powerpc-freebsd
    Disassembly of section .text:
    00000000 <_Z1fv> stwu r1,-16(r1)
    00000004 <_Z1fv+0x4> mflr r0
    00000008 <_Z1fv+0x8> stw r0,20(r1)
    0000000c <_Z1fv+0xc> stw r31,12(r1)
    00000010 <_Z1fv+0x10> mr r31,r1
    00000014 <_Z1fv+0x14> addi r9,r31,16
    00000018 <_Z1fv+0x18> mr r3,r9
    0000001c <_Z1fv+0x1c> bl 0000001c <_Z1fv+0x1c>
    00000020 <_Z1fv+0x20> nop
    00000024 <_Z1fv+0x24> addi r11,r31,16
    00000028 <_Z1fv+0x28> lwz r0,4(r11)
    0000002c <_Z1fv+0x2c> mtlr r0
    00000030 <_Z1fv+0x30> lwz r31,-4(r11)
    00000034 <_Z1fv+0x34> mr r1,r11
    00000038 <_Z1fv+0x38> blr
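The four listings of f() can each be reduced to one number: how far the pointer passed to g() lies from f()'s CFA (the stack pointer f() was entered with). The hypothetical helper below (Listing and distance_from_cfa are illustrative names) takes the frame size from the stwu/stdu and the constant, if any, added to r31 before the call.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Summary of one disassembly of f(): the SP decrement done by stwu/stdu and
// the constant added to r31 (== r1 after the decrement) before calling g().
struct Listing {
    std::string compiler;  // label only
    int64_t frame_size;    // N in "stwu/stdu r1,-N(r1)"
    int64_t added_to_r31;  // 0 for "mr r3,r31"; N for "addi r9,r31,N"
};

// Distance in bytes between what f() passes to g() and the CFA.
// 0 means f() passed the CFA; a negative value means it passed a lower address.
int64_t distance_from_cfa(const Listing& l) {
    const int64_t entry_sp = 0;                   // measure relative to the entry SP
    const int64_t r31 = entry_sp - l.frame_size;  // mr r31,r1 after the decrement
    const int64_t passed = r31 + l.added_to_r31;  // value that ends up in r3
    return passed - entry_sp;
}
```

clang 3.8.0 lands exactly one frame size low on both targets, while g++49 and g++5 land on the CFA.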
(In reply to comment #1)

I should have been explicit: the stack frame boundary that I reference in the 2-line examples is between:

A) f's frame and
B) f's caller's frame

(Not between f and g.) (The external g function just avoided any potential optimization that might eliminate the code I was trying to produce.)

(B) was left rather implicit in comment #1, which could lead to confusion; hence this note.
Looks like arm has the same sort of distinction vs. g++:

    # clang++ -c -g -std=c++11 -Wall -pedantic builtin_dwarf_cfa.cpp
    # /usr/local/bin/objdump -d --prefix-addresses builtin_dwarf_cfa.o
    builtin_dwarf_cfa.o: file format elf32-littlearm
    Disassembly of section .text:
    00000000 <_Z1fv> push {fp, lr}
    00000004 <_Z1fv+0x4> mov fp, sp
    00000008 <_Z1fv+0x8> mov r0, fp
    0000000c <_Z1fv+0xc> bl 00000000 <_Z1gPv>
    00000010 <_Z1fv+0x10> pop {fp, pc}

    # g++5 -c -g -std=c++11 -Wall -pedantic builtin_dwarf_cfa.cpp
    # /usr/local/bin/objdump -d --prefix-addresses builtin_dwarf_cfa.o
    builtin_dwarf_cfa.o: file format elf32-littlearm
    Disassembly of section .text:
    00000000 <_Z1fv> push {fp, lr}
    00000004 <_Z1fv+0x4> add fp, sp, #4, 0
    00000008 <_Z1fv+0x8> add r3, fp, #4, 0
    0000000c <_Z1fv+0xc> mov r0, r3
    00000010 <_Z1fv+0x10> bl 00000000 <_Z1gPv>
    00000014 <_Z1fv+0x14> nop ; (mov r0, r0)
    00000018 <_Z1fv+0x18> pop {fp, pc}
Here is what the "ABI for the ARM 32-bit Architecture" document "DWARF for the ARM Architecture" says about the CFA:

    3.4 Canonical Frame Address
    The term Canonical Frame Address (CFA) is defined in [GDWARF], §6.4, Call Frame Information. This ABI adopts the typical definition of CFA given there.
    The CFA is the value of the stack pointer (r13) at the call site in the previous frame.

Together with the armv6 code I've shown via "objdump -d", this indicates that on armv6 the value returned by clang++'s __builtin_dwarf_cfa() is not the value the official ARM ABI specifies, while what g++ returns does match the official ARM ABI.
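Applying that definition to the two armv6 listings: bl does not change sp, so the stack pointer at the call site is the one f() is entered with. The sketch below (ArmRegs, clang_arm_cfa and gcc_arm_cfa are hypothetical names written only for this illustration) replays each sequence on plain integers, assuming push {fp, lr} lowers sp by 8.

```cpp
#include <cassert>
#include <cstdint>

// Minimal register model for the ARM sequences (hypothetical, not LLVM code).
struct ArmRegs { uint32_t sp, fp, r0, r3; };

// clang++ 3.8.0: push {fp, lr}; mov fp, sp; mov r0, fp
uint32_t clang_arm_cfa(uint32_t entry_sp) {
    ArmRegs r{entry_sp, 0, 0, 0};
    r.sp -= 8;         // push {fp, lr}: two 4-byte registers
    r.fp = r.sp;       // mov fp, sp
    r.r0 = r.fp;       // mov r0, fp  -> argument passed to g()
    return r.r0;
}

// g++5: push {fp, lr}; add fp, sp, #4; add r3, fp, #4; mov r0, r3
uint32_t gcc_arm_cfa(uint32_t entry_sp) {
    ArmRegs r{entry_sp, 0, 0, 0};
    r.sp -= 8;         // push {fp, lr}
    r.fp = r.sp + 4;   // add fp, sp, #4
    r.r3 = r.fp + 4;   // add r3, fp, #4
    r.r0 = r.r3;       // mov r0, r3  -> argument passed to g()
    return r.r0;
}
```

The g++ value equals the stack pointer at the call site; the clang++ value is 8 bytes below it, the size of the pushed registers.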
I do not claim that the following is the proper, global __builtin_dwarf_cfa () fix, given that it has mismatched gcc/g++ ever since it was added to clang (around clang 2.7). But this workaround on a powerpc FreeBSD box has let me investigate later issues in C++ exception handling while using FreeBSD's libgcc_s. Thanks go to Roman Divacky for finding what I needed to look at in clang/llvm for this.

For case Intrinsic::eh_dwarf_cfa in SelectionDAGBuilder::visitIntrinsicCall . . .

    # svnlite diff contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    Index: contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    ===================================================================
    --- contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (revision 296011)
    +++ contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (working copy)
    @@ -4618,7 +4618,7 @@
                                      CfaArg);
         SDValue FA = DAG.getNode(
             ISD::FRAMEADDR, sdl, TLI.getPointerTy(DAG.getDataLayout()),
    -        DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout())));
    +        DAG.getConstant(1, sdl, TLI.getPointerTy(DAG.getDataLayout())));
         setValue(&I, DAG.getNode(ISD::ADD, sdl, FA.getValueType(),
                                  FA, Offset));
         return nullptr;

In other words: use Frame Depth 1 instead of Frame Depth 0. That yields the frame/stack boundary between the frame for the routine using __builtin_dwarf_cfa () (_Unwind_RaiseException here) and the frame for its caller (the throw code), matching what gcc/g++ produces when compiling that same code.

For TARGET_ARCH=powerpc (and likely powerpc64?) this allowed getting much farther into exception handling for some types of contexts. [__builtin_dwarf_cfa () is not the only issue overall.]

FreeBSD has not yet used clang for TARGET_ARCH=powerpc (or powerpc64), so the clang history matters less there for those architectures, and matching gcc/g++ helps by allowing a mix of clang and gcc use overall.
(Actually, some users of the lang/clang* ports might well notice the difference.)

It is less clear to me what is appropriate for TARGET_ARCHs that have been using clang for buildworld in FreeBSD land for some time, or for contexts outside FreeBSD: the Frame Depth 0 use here has been around for a long time. Since other code uses the lower-level interface involved, the change above avoids altering the "API" results for those other uses by changing the calling code instead. (I have not checked whether any of the other uses of the lower-level code also have off-by-one problems compared to gcc/g++.)
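The effect of changing the FRAMEADDR depth from 0 to 1 can be modeled with PowerPC back chains: every stwu stores the previous r1 at offset 0 of the new frame, so depth 0 is the current (post-decrement) frame address and depth 1 is one load away, namely the caller's stack pointer, which is the CFA. This is a hypothetical C++ model (frame_address, build_frames and dwarf_cfa are illustrative names, not LLVM functions). It assumes the Offset term added in the diff is zero, as the powerpc listings suggest (r31 is passed unmodified), and uses a made-up 64-byte caller frame.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Memory = std::map<uint32_t, uint32_t>;  // byte address -> 32-bit word

// Illustrative stand-in for ISD::FRAMEADDR on PowerPC: depth 0 is the
// current frame pointer (r31 == r1 after the stwu), and each extra level
// follows one back-chain word, like "lwz rX,0(rX)".
uint32_t frame_address(const Memory& mem, uint32_t fp, unsigned depth) {
    uint32_t addr = fp;
    for (unsigned i = 0; i < depth; ++i)
        addr = mem.at(addr);
    return addr;
}

// Pushes frames of the given sizes below top_sp, outermost first, the way a
// sequence of "stwu r1,-size(r1)" prologues would; returns the innermost r1.
uint32_t build_frames(Memory& mem, uint32_t top_sp, const std::vector<uint32_t>& sizes) {
    uint32_t sp = top_sp;
    for (uint32_t size : sizes) {
        const uint32_t new_sp = sp - size;
        mem[new_sp] = sp;  // back chain: the old r1 stored at offset 0 of the new frame
        sp = new_sp;
    }
    return sp;
}

// __builtin_dwarf_cfa () modeled as FRAMEADDR(depth) + Offset.
uint32_t dwarf_cfa(const Memory& mem, uint32_t fp, unsigned depth) {
    const uint32_t offset = 0;  // assumption: matches "mr r3,r31" in the listings
    return frame_address(mem, fp, depth) + offset;
}
```

Depth 0 reproduces the clang 3.8.0 value (one frame size below the CFA); depth 1 yields the CFA itself.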
Adjusting the example source shows that the __builtin_dwarf_cfa() result depends on where it is used:

    # more builtin_dwarf_cfa.cpp
    #include <stdlib.h>
    extern void g(void*);
    void f0() { g(__builtin_dwarf_cfa()); }
    void f1() { auto f1_cfa = __builtin_dwarf_cfa(); g(f1_cfa); }

f0 and f1 pass g different offsets from the frame pointer. See below for a TARGET_ARCH=powerpc example. g++ behaves like f1 for both f1 and f0.

    # clang++ -c -g -std=c++11 -Wall -pedantic builtin_dwarf_cfa.cpp

results in:

    Disassembly of section .text:
    00000000 <_Z2f0v> mflr r0
    00000004 <_Z2f0v+0x4> stw r31,-4(r1)
    00000008 <_Z2f0v+0x8> stw r0,4(r1)
    0000000c <_Z2f0v+0xc> stwu r1,-16(r1)
    00000010 <_Z2f0v+0x10> mr r31,r1
    00000014 <_Z2f0v+0x14> mr r3,r31
    00000018 <_Z2f0v+0x18> lwz r3,0(r3)
    0000001c <_Z2f0v+0x1c> bl 0000001c <_Z2f0v+0x1c>
    00000020 <_Z2f0v+0x20> addi r1,r1,16
    00000024 <_Z2f0v+0x24> lwz r0,4(r1)
    00000028 <_Z2f0v+0x28> lwz r31,-4(r1)
    0000002c <_Z2f0v+0x2c> mtlr r0
    00000030 <_Z2f0v+0x30> blr
    00000034 <_Z2f1v> mflr r0
    00000038 <_Z2f1v+0x4> stw r31,-4(r1)
    0000003c <_Z2f1v+0x8> stw r0,4(r1)
    00000040 <_Z2f1v+0xc> stwu r1,-16(r1)
    00000044 <_Z2f1v+0x10> mr r31,r1
    00000048 <_Z2f1v+0x14> mr r3,r31
    0000004c <_Z2f1v+0x18> lwz r3,0(r3)
    00000050 <_Z2f1v+0x1c> stw r3,8(r31)
    00000054 <_Z2f1v+0x20> bl 00000054 <_Z2f1v+0x20>
    00000058 <_Z2f1v+0x24> addi r1,r1,16
    0000005c <_Z2f1v+0x28> lwz r0,4(r1)
    00000060 <_Z2f1v+0x2c> lwz r31,-4(r1)
    00000064 <_Z2f1v+0x30> mtlr r0
    00000068 <_Z2f1v+0x34> blr
(In reply to comment #6)

Ignore comment 6 (I wish I could just delete it to avoid creating confusion).

> Adjusting the example source shows that the __builtin_dwarf_cfa() result
> depends on where it is used:
> . . .

WRONG! I misread where an offset was used, and I was also using a clang++ 3.8.0 that had a local workaround in it. Not one of my better days.
Another way of seeing which boundary of a frame (low-address side vs. high-address side) the CFA refers to is the sign of the offsets in the DW_CFA_offset entries that apply after the stack pointer has been adjusted. A powerpc example (from dwarfdump -v -v -F) is:

    DW_CFA_offset r28 -160 (40 * -4)

Negative offsets correspond to a CFA at the high-address boundary; positive offsets would correspond to a CFA at the low-address boundary. So the .eh_frame information for powerpc (and powerpc64) indicates that the high-address side is supposed to be used for the CFA. This is also how the official documents read: the CFA is the stack pointer value on entry, before the adjustment for the local frame. (Armv6/armv7 also get negative offsets. Others likely do as well.)

PPCTargetLowering::LowerFRAMEADDR returns the low-address side for depth zero, in part because PPCTargetLowering::LowerRETURNADDR uses PPCTargetLowering::LowerFRAMEADDR with a numbering where that is the zero position: as things stand, both count Frame Depth the same way. PPCTargetLowering::LowerRETURNADDR works correctly as is, but it would have to be adjusted if the PPCTargetLowering::LowerFRAMEADDR Frame Depth numbering were changed.

Thus it appears that "case Intrinsic::eh_dwarf_cfa" in SelectionDAGBuilder::visitIntrinsicCall should convert from the ISD::FRAMEADDR "low-address side" view to the CFA "high-address side" view by requesting the "low-address side" of the "Depth 1 Frame" (i.e., the prior Frame Pointer register value), since ISD::FRAMEADDR returns the "low-address side" for the given depth.
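The "(40 * -4)" in that dwarfdump line is the DWARF encoding at work: DW_CFA_offset packs the register number into the low 6 bits of the opcode (high 2 bits 0b10), followed by a ULEB128 factored offset that is multiplied by the CIE's signed (SLEB128) data alignment factor. The decoder below is a minimal sketch of just that instruction (the function and struct names are my own, not from dwarfdump or libgcc), assuming the bytes 0x9c 0x28 for "DW_CFA_offset r28, factored offset 40" and 0x7c for a data alignment factor of -4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes an unsigned LEB128 value starting at bytes[pos]; advances pos.
uint64_t read_uleb128(const std::vector<uint8_t>& bytes, size_t& pos) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t b;
    do {
        b = bytes.at(pos++);
        if (shift < 64)
            result |= uint64_t(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    return result;
}

// Decodes a signed LEB128 value (used for the CIE's data alignment factor).
int64_t read_sleb128(const std::vector<uint8_t>& bytes, size_t& pos) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t b;
    do {
        b = bytes.at(pos++);
        if (shift < 64)
            result |= uint64_t(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    if (shift < 64 && (b & 0x40))
        result |= ~uint64_t(0) << shift;  // sign-extend from bit 6 of the last byte
    return static_cast<int64_t>(result);
}

struct SavedReg { unsigned reg; int64_t cfa_offset; };  // register saved at CFA + cfa_offset

// Decodes one DW_CFA_offset instruction and scales its factored offset.
SavedReg decode_cfa_offset(const std::vector<uint8_t>& bytes, size_t& pos,
                           int64_t data_alignment_factor) {
    const uint8_t op = bytes.at(pos++);
    assert((op & 0xc0) == 0x80 && "not a DW_CFA_offset instruction");
    const unsigned reg = op & 0x3f;
    const int64_t factored = static_cast<int64_t>(read_uleb128(bytes, pos));
    return {reg, factored * data_alignment_factor};
}
```

A negative cfa_offset places the saved r28 below the CFA, i.e. inside the current frame only if the CFA is the frame's high-address boundary.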
Patch posted for review: https://reviews.llvm.org/D24038
(In reply to comment #9)
> Patch posted for review: https://reviews.llvm.org/D24038

r280350. Also, PR30231 filed to track the potential issue on ARM.
(In reply to comment #10)
> (In reply to comment #9)
> > Patch posted for review: https://reviews.llvm.org/D24038
>
> r280350. Also, PR30231 filed to track the potential issue on ARM.

Thanks Hal.

Dimitry Andric (dim at FreeBSD.org) has written:

> I merged the upstream fix to projects/clang390-import:
>
> https://svnweb.freebsd.org/changeset/base/305683

So FreeBSD stable/12 will be adopting your changes.

As for my activity: I'll not have access to powerpc64s/powerpcs for a few weeks yet.
(In reply to comment #11)
> So FreeBSD stable/12 will be adopting your changes.

That should have been head (current) for FreeBSD 12.
(In reply to comment #12)
> (In reply to comment #11)
> > So FreeBSD stable/12 will be adopting your changes.
>
> That should have been head (current) for FreeBSD 12.

powerpc64 notes (and only ppc64 for now). . .

With my amd64 -> TARGET_ARCH=powerpc64 buildworld and the FreeBSD clang 3.9.0 that it includes, the simple 2-line example works fine (by code inspection of the .o file). Thanks!

There are other problems that still prevent the following from working overall, but the issue in 26761 itself is fixed.

    #include <exception>
    int main(void)
    {
        try {
            throw std::exception();
        } catch (std::exception& e) {} // same result without &
        return 0;
    }

An inspection of the produced code in gdb shows:

       0x00000000501c72bc <+0>:   mflr r0
       0x00000000501c72c0 <+4>:   mfcr r12
       0x00000000501c72c4 <+8>:   std r31,-152(r1)
       0x00000000501c72c8 <+12>:  std r0,16(r1)
       0x00000000501c72cc <+16>:  stw r12,8(r1)
       0x00000000501c72d0 <+20>:  stdu r1,-5840(r1)
       0x00000000501c72d4 <+24>:  mr r31,r1
    . . .
       0x00000000501c7394 <+216>: addi r4,r31,5840
    . . .
       0x00000000501c7414 <+344>: bl 0x501c76dc <uw_init_context_1>

So that much is now correct (the part covered by 26761). But overall it gets:

    Program terminated with signal SIGABRT, Aborted.
    #0  0x00000000502f8868 in .__sys_thr_kill () from /lib/libc.so.7
    (gdb) bt
    #0  0x00000000502f8868 in .__sys_thr_kill () from /lib/libc.so.7
    #1  0x00000000502f8818 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
    #2  0x00000000502f8748 in abort () at /usr/src/lib/libc/stdlib/abort.c:65
    #3  0x00000000501c9cbc in _Unwind_GetGR (context=<optimized out>, index=65) at /usr/src/gnu/lib/libgcc/../../../contrib/gcc/unwind-dw2.c:180
    #4  uw_update_context_1 (context=<optimized out>, fs=<optimized out>) at /usr/src/gnu/lib/libgcc/../../../contrib/gcc/unwind-dw2.c:1353
    #5  0x00000000501c78b0 in uw_init_context_1 (context=0xffffffffffffd1e0, outer_cfa=0xffffffffffffd940, outer_ra=0x50179ea0 <__cxa_throw(void*, std::type_info*, void (*)(void*))+248>) at /usr/src/gnu/lib/libgcc/../../../contrib/gcc/unwind-dw2.c:1442
    #6  0x00000000501c7418 in _Unwind_RaiseException (exc=<optimized out>) at /usr/src/gnu/lib/libgcc/../../../contrib/gcc/unwind.inc:92
    #7  0x0000000050179ea0 in throw_exception (ex=<optimized out>) at /usr/src/lib/libcxxrt/../../contrib/libcxxrt/exception.cc:774
    #8  __cxa_throw (thrown_exception=<optimized out>, tinfo=<optimized out>, dest=<optimized out>) at /usr/src/lib/libcxxrt/../../contrib/libcxxrt/exception.cc:801
    #9  0x0000000010000cf0 in .main ()

because of other issues with C++ exception handling.

Note: that gdb can now produce the backtrace at all is, as I remember, a big improvement for powerpc64. More is definitely working than when I reported 26761.

Again: Thanks!