LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 26519 - Clang 4.0.0's "Target: powerpc-unknown-freebsd11.0" code generation is violating the SVR4 ABI (SEGV can result)
Summary: Clang 4.0.0's "Target: powerpc-unknown-freebsd11.0" code generation is violat...
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: PowerPC (show other bugs)
Version: 4.0
Hardware: Other FreeBSD
: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks: 25780
  Show dependency tree
 
Reported: 2016-02-07 14:35 PST by Mark Millard
Modified: 2017-11-12 11:42 PST (History)
6 users (show)

See Also:
Fixed By Commit(s):


Attachments
.zip of preprocessed numeric.c from perl5 for comments 24 & 25 (64.09 KB, application/zip)
2017-05-08 11:37 PDT, Mark Millard
Details
Patch for release_40 (2.11 KB, patch)
2017-05-09 15:19 PDT, Krzysztof Parzyszek
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Millard 2016-02-07 14:35:42 PST
Comparing clang 3.8.0 (via FreeBSD's projects/clang380-import svn) generated code for TARGET_ARCH=powerpc (32-bit) to gcc 4.2.1 generated code. . .

clang 3.8.0 based Str_Match preamble (from make):

0x181a4a8 <Str_Match>:	mflr    r0
0x181a4ac <Str_Match+4>:	stw     r31,-4(r1) # Clang's frame pointer (r31) 
                                                   # saved before stack pointer changed.
0x181a4b0 <Str_Match+8>:	stw     r0,4(r1)   # lr saved before stack pointer changed.
0x181a4b4 <Str_Match+12>:	stwu    r1,-32(r1) # Stack pointer finally saved and
                                                   # changed.
0x181a4b8 <Str_Match+16>:	mr      r31,r1     # r31 is the frame pointer under clang.
0x181a4bc <Str_Match+20>:	stw     r30,24(r31)

gcc 4.2.1 based Str_Match preamble:

0x1819cb8 <Str_Match>:	mflr    r0
0x1819cbc <Str_Match+4>:	stwu    r1,-32(r1) # Stack pointer saved and changed first.
0x1819cc0 <Str_Match+8>:	stw     r31,28(r1) # r31 saved after stack pointer changed.
0x1819cc4 <Str_Match+12>:	mr      r31,r3     # gcc 4.2.1 does not reserve
                                                   # r31 for use as a frame pointer.
0x1819cc8 <Str_Match+16>:	stw     r30,24(r1)
0x1819ccc <Str_Match+20>:	stw     r0,36(r1)  # lr saved after stack pointer changed.

Picking a different example for postamble code, showing just clang 3.8.0's code:

0x1801b8c <Buf_AddBytes+104>:	lwz     r30,24(r31)
0x1801b90 <Buf_AddBytes+108>:	lwz     r29,20(r31)
0x1801b94 <Buf_AddBytes+112>:	lwz     r28,16(r31)
0x1801b98 <Buf_AddBytes+116>:	lwz     r27,12(r31)
0x1801b9c <Buf_AddBytes+120>:	lwz     r26,8(r31)
0x1801ba0 <Buf_AddBytes+124>:	addi    r1,r1,32   # Stack pointer adjusted first
0x1801ba4 <Buf_AddBytes+128>:	lwz     r0,4(r1)
0x1801ba8 <Buf_AddBytes+132>:	lwz     r31,-4(r1) # Then Frame Pointer load happens
                                                   # "outside" the new stack range.
0x1801bac <Buf_AddBytes+136>:	mtlr    r0
0x1801bb0 <Buf_AddBytes+140>:	blr

In other words: clang 3.8.0's generated 32-bit powerpc code is based on there being a safe scratch area below the stack ("below" by memory address). So similar to the 224 byte "red zone" area that 32-bit AIX powerpc and 32-bit Darwin powerpc use.


I'm told by Nathan Whithorn that "the 32-bit ELF ABI does not require any such red zone" and so that clang 3.8.0 is violating the ABI that is supposed to be involved.

I do not have specific document or section references (or web links) to list for the ABI details at this time. I'm just reporting what I'm told by FreeBSD folks.


I used "make" code as the example above because something like "make -j 6 buildworld" uses signal delivery extensively (SIGCHLD) and such a build its gets a SEGV in a make process within the 1st few minutes (on a "Quad core G5 PowerMac" using a FreeBSD powerpc 32-bit installation). The signal delivery is sometimes replacing the value at "-4(r1)" in the above code before it is loaded back into r31 (the clang 3.8.0 framepointer register for powerpc as it is currently generating code). The FreeBSD signal delivery for 32-bit powerpc does not have/use a "red zone" on the smaller-address side of the stack.
Comment 1 Mark Millard 2016-02-07 16:08:07 PST
FYI, additional ABI issue possibility:

On FreeBSD 11.0-CURRENT (well, projects/clang380-import) when I try "clang++ -std=c++11 -dM -E just_main.cpp" I get. . .

#define __BIGGEST_ALIGNMENT__ 8
. . .
#define __NATURAL_ALIGNMENT__ 1

on both TARGET_ARCH=powerpc and TARGET_ARCH=powerpc64 contexts.

But for the special port devel/powerpc64-gcc used for modern cross compiles of FreeBSD ( /usr/local/bin/powerpc64-portbld-freebsd11.0-g++ -std=c++11 -dM -E just_main.cpp ) I get:

#define __BIGGEST_ALIGNMENT__ 16

(Natural is not referenced. There is no such special port for 32-bit powerpc to see its output.)

So there may be an ABI alignment mismatch someplace as well unless the ABI has some optional-status alignment rules.

Without a ABI reference document I'm unsure which __BIGGEST_ALIGNMENT__ would be correct for the FreeBSD powerpc ABI (if either one is). (Similarly for powerpc64.)



Side notes:

powerpc64-portbld-freebsd11.0-g++ also reports:

#define __CMODEL_MEDIUM__ 1
. . .
#define __cpp_rvalue_reference 200610

But clang++ 3.8.0 reports nothing analogous to the first and:

#define __cpp_rvalue_references 200610

(gcc and clang have spelling differences here.)
Comment 2 Mark Millard 2016-02-09 15:02:03 PST
"FreeBSD's" Justin Hibbits added a note in the FreeBSD bug entry (206990) that I repeat here:

There is no provision in the ABI for a redzone in 32-bit powerpc.  LLVM is broken for 32-bit PowerPC regarding this, and there are comments in the source code to this regard, to the effect:

(PPCFrameLowering.cpp):
    // FIXME: On PPC32 SVR4, we must not spill before claiming the stackframe.

If a signal interrupts the thread at the precise wrong time (when creating the stack frame, but before adjusting %r1), Bad Things will happen.
Comment 3 Mark Millard 2016-02-14 01:29:51 PST
It appears that I originally forgot to set the Component for my submittal. Looking around I estimate that LLVM Codegen is the closest category: PPCFrameLowering.cpp is not under tools/clang/ at all so C++ would seem to be over specific.
Comment 4 Mark Millard 2016-08-30 14:05:41 PDT
(In reply to comment #1)

Just a note on __cpp_rvalue_reference vs. __cpp_rvalue_references :

> Side notes:
> 
> powerpc64-portbld-freebsd11.0-g++ also reports:
> 
> #define __CMODEL_MEDIUM__ 1
> . . .
> #define __cpp_rvalue_reference 200610
> 
> But clang++ 3.8.0 reports nothing analogous to the first and:
> 
> #define __cpp_rvalue_references 200610
> 
> (gcc and clang have spelling differences here.)

https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00175.html says:

> as PR71214 points out gcc uses a wrong feature test macro for C++11
> rvalue references: __cpp_rvalue_reference instead of the correct 
> __cpp_rvalue_references.

So this specific point is a gcc problem but libraries targeting supporting what gcc has done historically will now have 2 names to test the value of.
Comment 5 Krzysztof Parzyszek 2016-09-01 09:22:49 PDT
Patch for review: https://reviews.llvm.org/D24093
Comment 6 Krzysztof Parzyszek 2016-09-06 07:31:17 PDT
Committed in r280705.
Comment 7 Mark Millard 2016-09-10 16:08:46 PDT
(In reply to comment #6)
> Committed in r280705.

Thanks Krzysztof.

Dimitry Andric (dim at FreeBSD.org) has written:

> I merged the upstream fix to projects/clang390-import:
> 
> https://svnweb.freebsd.org/changeset/base/305686


So FreeBSD head (current) for 12 will be adopting your changes.



As for my activity:

I'll not have access to powerpc64s/powerpcs for a few weeks yet.
Comment 8 Mark Millard 2016-09-10 20:04:09 PDT
(In reply to comment #6)
> Committed in r280705.

Did the commit also fix the stack pointer adjustment timing for the post-amble code?

The wording that I see in the review and commit talks about the "claim" side of things, which I interpret to be for the pre-amble side of things. If I interpret the code right that is the side fixed. (I'm not clang/llvm code literate so I could easily be wrong.)

My original submittal also noted the timing problem existed on the post-amble side in 3.8.0's code generation:

0x1801b8c <Buf_AddBytes+104>:	lwz     r30,24(r31)
0x1801b90 <Buf_AddBytes+108>:	lwz     r29,20(r31)
0x1801b94 <Buf_AddBytes+112>:	lwz     r28,16(r31)
0x1801b98 <Buf_AddBytes+116>:	lwz     r27,12(r31)
0x1801b9c <Buf_AddBytes+120>:	lwz     r26,8(r31)
0x1801ba0 <Buf_AddBytes+124>:	addi    r1,r1,32   # Stack pointer adjusted first
0x1801ba4 <Buf_AddBytes+128>:	lwz     r0,4(r1)
0x1801ba8 <Buf_AddBytes+132>:	lwz     r31,-4(r1) # Then Frame Pointer load happens
                                                   # "outside" the new stack range.
0x1801bac <Buf_AddBytes+136>:	mtlr    r0
0x1801bb0 <Buf_AddBytes+140>:	blr

If such code can still be generated there is is a time frame needing a red-zone to protect stack contents.

Hopefully I'm just wrong and this was fixed too.
Comment 9 Krzysztof Parzyszek 2016-09-10 20:37:55 PDT
The post-amble has not been fixed.
Comment 10 Krzysztof Parzyszek 2016-09-12 13:16:52 PDT
The epilogue part of the fix: https://reviews.llvm.org/D24466

Hopefully there is nothing else missing.
Comment 11 Krzysztof Parzyszek 2016-09-22 12:23:21 PDT
Committed in r282174.
Comment 12 Mark Millard 2016-11-25 00:42:15 PST
(In reply to comment #11)
> Committed in r282174.

projects/clang390-import has been merged to head for FreeBSD --and is
now scheduled for an MFC from there into stable/11 in a month. It does
not yet have this fix.

But there is a chance for later 11.x-RELEASE's to support powerpc64 and/or
powerpc via clang, not just 12.0-RELEASE. (I'm not implying in a month
specifically, just that stable/11 might progress as llvm does as long
as the fixes backport into FreeBSD's 3.9.0 reasonably --and that 11.x
would pick up such changes from stable/11 .)

I'm still hoping that llvm 26519's and 26970's fixes will be applied to
clang 3.9.0 in FreeBSD's head soon. That would enable some significant
testing. I could try buildworld for powerpc64 and I could remove my "red
zone" signal delivery hack for powerpc to see if it is no longer needed
[lack of ABI violations].

[I want to be able to have good runs/results of the kyua tests before I
try to deal with testing buildkernel. This requires signal handling and
C++ exception handling to be working well even if the kernel does not
require all of that.]
Comment 13 emaste 2016-11-25 12:24:16 PST
For reference this is now in FreeBSD as https://svnweb.freebsd.org/changeset/base/309147
Comment 14 Mark Millard 2016-11-27 02:00:28 PST
(In reply to comment #13)
> For reference this is now in FreeBSD as
> https://svnweb.freebsd.org/changeset/base/309147

Unfortunately there are problems handling use of r30, at least when floating
point is involved.

The used code from the compliler_rt __floatdidf seems to be just wrong in part
of its use of r30. The below initially uses the "df -m" example failure and its
code.

At first is saves r30 on the stack as part of the function preamble:

Dump of assembler code for function __floatdidf:
0x018029b8 <__floatdidf+0>:	mflr    r0
0x018029bc <__floatdidf+4>:	stw     r0,4(r1)
0x018029c0 <__floatdidf+8>:	stwu    r1,-32(r1)
0x018029c4 <__floatdidf+12>:	stw     r31,28(r1)
0x018029c8 <__floatdidf+16>:	stw     r30,24(r1)

Later it uses r30 for other things.

Then it restores it from the stack:

0x01802a08 <__floatdidf+80>:	lwz     r30,24(r1)

What it does with r30 next effectively makes r30 a parameter to the routine and
not just saved/restored:

0x01802a10 <__floatdidf+88>:	lwz     r3,-8(r30)

In the above the r30 from the caller's context is being used as the base for accessing memory and putting that memory content in r3.

What it does next with r3 is one of the points where SIGSEGV is happening: For
"df -m" r3 ends up being 0 and the following fails.

0x01802a18 <__floatdidf+96>:	lfs     f12,0(r3)


In the fsck_ufs example it is the same sort of thing but r30 happens to be
initially 0 so it fails at an earlier stage.

Again there is the preamble that saves r30:

0x0181afcc <__floatdidf+0>:	mflr    r0
0x0181afd0 <__floatdidf+4>:	stw     r0,4(r1)
0x0181afd4 <__floatdidf+8>:	stwu    r1,-32(r1)
0x0181afd8 <__floatdidf+12>:	stw     r31,28(r1)
0x0181afdc <__floatdidf+16>:	stw     r30,24(r1)

Later it uses r30 for other things.

Then it restores it from the stack:

0x0181b01c <__floatdidf+80>:	lwz     r30,24(r1)

What it does with r30 next effectively makes r30 a parameter to the routine and
not just saved/restored:

0x0181b024 <__floatdidf+88>:	lwz     r3,-12(r30)

Again: In the above the r30 from the caller's context is being used as the base for accessing memory and putting that memory content in r3.

But this time it turns out that r30 is 0 and the above also fails.

If the code had gotten that far it would have done the same thing with r3 that
"df -m" did in its failure:

0x0181b02c <__floatdidf+96>:	lfs     f12,0(r3)


For reference compiler-rt/lib/builtins/floatdidf.c has:

COMPILER_RT_ABI double
__floatdidf(di_int a)
{
    static const double twop52 = 4503599627370496.0; // 0x1.0p52
    static const double twop32 = 4294967296.0; // 0x1.0p32

    union { int64_t x; double d; } low = { .d = twop52 };

    const double high = (int32_t)(a >> 32) * twop32;
    low.x |= a & INT64_C(0x00000000ffffffff);

    const double result = (high - twop52) + low.d;
    return result;
}

and the matching assembler code for the fsck_ufs example is (from gdb):

0x0181afcc <__floatdidf+0>:	mflr    r0
0x0181afd0 <__floatdidf+4>:	stw     r0,4(r1)
0x0181afd4 <__floatdidf+8>:	stwu    r1,-32(r1)
0x0181afd8 <__floatdidf+12>:	stw     r31,28(r1)
0x0181afdc <__floatdidf+16>:	stw     r30,24(r1)
0x0181afe0 <__floatdidf+20>:	bl      0x182e96c <.got+20>
0x0181afe4 <__floatdidf+24>:	mr      r31,r1
0x0181afe8 <__floatdidf+28>:	xoris   r3,r3,32768
0x0181afec <__floatdidf+32>:	lis     r5,17200
0x0181aff0 <__floatdidf+36>:	mflr    r30
0x0181aff4 <__floatdidf+40>:	stw     r3,12(r31)
0x0181aff8 <__floatdidf+44>:	stw     r5,8(r31)
0x0181affc <__floatdidf+48>:	lwz     r3,-16(r30)
0x0181b000 <__floatdidf+52>:	lwz     r6,-20(r30)
0x0181b004 <__floatdidf+56>:	lfd     f1,8(r31)
0x0181b008 <__floatdidf+60>:	stw     r4,20(r31)
0x0181b00c <__floatdidf+64>:	stw     r5,16(r31)
0x0181b010 <__floatdidf+68>:	lfd     f13,16(r31)
0x0181b014 <__floatdidf+72>:	lwz     r0,36(r1)
0x0181b018 <__floatdidf+76>:	lwz     r31,28(r1)
0x0181b01c <__floatdidf+80>:	lwz     r30,24(r1)
0x0181b020 <__floatdidf+84>:	lfs     f2,0(r3)
0x0181b024 <__floatdidf+88>:	lwz     r3,-12(r30)
0x0181b028 <__floatdidf+92>:	lfs     f0,0(r6)
0x0181b02c <__floatdidf+96>:	lfs     f12,0(r3)
0x0181b030 <__floatdidf+100>:	fsub    f0,f1,f0
0x0181b034 <__floatdidf+104>:	fmul    f0,f0,f2
0x0181b038 <__floatdidf+108>:	fadd    f0,f0,f12
0x0181b03c <__floatdidf+112>:	fadd    f1,f13,f0
0x0181b040 <__floatdidf+116>:	addi    r1,r1,32
0x0181b044 <__floatdidf+120>:	mtlr    r0
0x0181b048 <__floatdidf+124>:	blr
Comment 15 Mark Millard 2016-11-27 16:39:37 PST
(In reply to comment #14)

My interpretation of this is that the "lwz r30,24(r1)" is simply out
of sequence and should be later in the sequence:

> 0x0181b01c <__floatdidf+80>:	lwz     r30,24(r1)  <<<=== not here
> 0x0181b020 <__floatdidf+84>:	lfs     f2,0(r3)
> 0x0181b024 <__floatdidf+88>:	lwz     r3,-12(r30)
> 0x0181b028 <__floatdidf+92>:	lfs     f0,0(r6)
> 0x0181b02c <__floatdidf+96>:	lfs     f12,0(r3)
> 0x0181b030 <__floatdidf+100>:	fsub    f0,f1,f0
> 0x0181b034 <__floatdidf+104>:	fmul    f0,f0,f2
> 0x0181b038 <__floatdidf+108>:	fadd    f0,f0,f12
> 0x0181b03c <__floatdidf+112>:	fadd    f1,f13,f0
lwz     r30,24(r1)  <<<<<<<<<<=============== instead here
> 0x0181b040 <__floatdidf+116>:	addi    r1,r1,32
> 0x0181b044 <__floatdidf+120>:	mtlr    r0
> 0x0181b048 <__floatdidf+124>:	blr

(addresses not adjusted for the change in the above)

So that r30 would be restored between the main activity being
finished and r1 being adjusted (popping the stack frame). Of
course any place after the last local r30 use [after -12(r30)]
up to where I show it would also work.

I reported this issue as a comment on 26519 because I'm guessing
that the ordering problem is tied to trying to trying to avoid the
stack-handling ABI violations for FreeBSD: another incompleteness
in the coverage of the stack handling. If it is not that then a
separate submittal would likely be appropriate.

Since I'm allowed to I'm going to change the status to reopened so
that it does not look to have been fully fixed. Let me know if this
is inappropriate since I would not be the one fixing it.
Comment 16 Mark Millard 2017-04-06 00:26:07 PDT
FYI: FreeBSD has merged clang/llvm 4.0 into stable/11 (-r316423)
in addition to head (12) some time ago. release/11.1.0 will be
clang/llvm 4.0 based.

TARGET_ARCH=powerpc64 and TARGET_ARCH=powerpc still does not
handle thrown exceptions in this clang version.

TARGET_ARCH=powerpc still has the R30 vs. tack handling vs.
floating point code problem: restored R30 and then generates
floating point cleanup code that expects R30 to not have
changed yet.

(That last is why this was reopened, although at the time
the clang was an earlier version.)
Comment 17 Krzysztof Parzyszek 2017-05-04 11:50:29 PDT
Here's what I tried:

--- float-r30.c ---
double __floatdidf(long long int a) {
    static const double twop52 = 4503599627370496.0;
    static const double twop32 = 4294967296.0;

    union { long long int x; double d; } low = { .d = twop52 };

    const double high = (int)(a >> 32) * twop32;
    low.x |= a & 0x00000000ffffffffL;

    const double result = (high - twop52) + low.d;
    return result;
}
-------------------

clang -target powerpc -S -O2 float-r30.c -fpic

Output:

--- float-r30.s ---
        .text
        .file   "float-r30.c"
        .section        .rodata.cst4,"aM",@progbits,4
        .p2align        2
.LCPI0_0:
        .long   1501560836              # float 4.50360177E+15
.LCPI0_1:
        .long   1333788672              # float 4.2949673E+9
.LCPI0_2:
        .long   3649044480              # float -4.50359963E+15
        .text
        .globl  __floatdidf
        .p2align        2
        .type   __floatdidf,@function
__floatdidf:                            # @__floatdidf
.Lfunc_begin0:
# BB#0:                                 # %entry
        mflr 0
        stw 0, 4(1)
        stwu 1, -32(1)
        stw 31, 28(1)
        stw 30, 24(1)
        bl _GLOBAL_OFFSET_TABLE_@local-4
        mr 31, 1
        lis 5, 17200
        xoris 3, 3, 32768
        mflr 30
        stw 5, 8(31)
        stw 3, 12(31)
        lwz 3, .LCPI0_1@GOT(30)
        lwz 6, .LCPI0_0@GOT(30)
        lfd 1, 8(31)
        stw 4, 20(31)
        stw 5, 16(31)
        lfd 13, 16(31)
        lwz 0, 36(1)
        lwz 31, 28(1)
        lwz 30, 24(1)
        lfs 2, 0(3)
        lwz 3, .LCPI0_2@GOT(30)
        lfs 0, 0(6)
        lfs 12, 0(3)
        fsub 0, 1, 0
        fmul 0, 0, 2
        fadd 0, 0, 12
        fadd 1, 0, 13
        addi 1, 1, 32
        mtlr 0
        blr
.Lfunc_end0:
        .size   __floatdidf, .Lfunc_end0-.Lfunc_begin0


        .ident  "clang version 5.0.0 (http://llvm.org/git/clang.git fcb52251e4b32f8f4ab0a4d434c3f9c5a5ffe008) (http://llvm.org/git/llvm.git 624e695cd90a7b570df26159f225f79fc8a86976)"
        .section        ".note.GNU-stack","",@progbits
-------------------


The r30 is used as a PIC base pointer, and it is restored before it's used. This is actually correct.
Comment 18 Krzysztof Parzyszek 2017-05-04 11:56:31 PDT
I take it back.  I just noticed that it's set to GOT at the entry to the function, not before it.
Comment 19 Krzysztof Parzyszek 2017-05-04 12:26:02 PDT
Committed a fix in r302183.
Comment 20 Krzysztof Parzyszek 2017-05-04 12:42:26 PDT
Opened PR32930 to request merging this into 4.0.1.
Comment 21 Mark Millard 2017-05-05 01:02:16 PDT
(In reply to Krzysztof Parzyszek from comment #19)
> Committed a fix in r302183.

Thanks. FreeBSD's head merged it in as -r317810
and I've built and booted -r317820 based on:

A) buildworld built via the updated system-clang 4
B) buildkernel built via gcc 4.2.1

It appears for (A) that even C++ exceptions work to
some extent (possibly fully) and various previously
broken commands now work (no stack problem).

Thanks much. Big improvement.

stable/11 will probably merge the change in
a few days.

Booting a kernel from a system-clang 4 based
buildkernel fails with "exec /sbin/init:
error 13". This is not new. But it does get
to the point of attempting /sbin/init .

(Note: powerpc64 could use the 2 line
patch listed in llvm bugzilla 31716's Comment
1 to get to a somewhat similar state of
usability on FreeBSD. As stands I use a
personal patch for the issue but that is just
me.)
Comment 22 Mark Millard 2017-05-05 18:13:43 PDT
(In reply to Mark Millard from comment #21)
> (In reply to Krzysztof Parzyszek from comment #19)
> > Committed a fix in r302183.
> 
> Thanks. FreeBSD's head merged it in as -r317810
> and I've built and booted -r317820 based on:
> 
> A) buildworld built via the updated system-clang 4
> B) buildkernel built via gcc 4.2.1
> 
> It appears for (A) that even C++ exceptions work to
> some extent (possibly fully) and various previously
> broken commands now work (no stack problem).

I got testing clang vs. gcc 4.2.1 confused: only the
gcc 4.2.1 buildworld ends up with a c++ compiler where
handling thrown C++ exceptions works. When clang++ is
used handling thrown C++ exceptions fails in the program
that results, like before.

(powerpc64 has the same sort of C++ exception handling
issues.)

But the stack handling fix is still a great improvement
for powerpc: thanks again.
Comment 23 Mark Millard 2017-05-05 21:41:49 PDT
(In reply to Krzysztof Parzyszek from comment #19)
> Committed a fix in r302183.

buildworld buildkernel makes extensive use
of signals and its failure is how I discovered
the original stack handling problems. I used to
have to patch in so-called "red zone" handling
to avoid the issue.

No more: a running a kernel that was built
without a "red zone" and running a world based
on clang now allows buildworld buildkernel to
complete just fine.

Not needing a "red zone" is a great
improvement.
Comment 24 Mark Millard 2017-05-06 02:45:41 PDT
(In reply to Krzysztof Parzyszek from comment #19)
> Committed a fix in r302183.

Bad news: Another code generation error, this
time demonstrated in compiling part of perl5. . .
(I tried to build a port that indirectly
tried build perl5 but perl5's miniperl crashes.)

/usr/obj/portswork/usr/ports/lang/perl5.24/work/perl-5.24.1/numeric.c

has Perl_cast_iv(NV f) for which clang double stores
two different things to one address [24(r1)]. Below
the => lines are the double store, the second
destroying the r30 value that was saved in the first:

Dump of assembler code for function Perl_cast_iv:
   0x0196a114 <+0>:	mflr    r0
   0x0196a118 <+4>:	stw     r0,4(r1)
   0x0196a11c <+8>:	stwu    r1,-32(r1)
   0x0196a120 <+12>:	stw     r31,28(r1)
=>   0x0196a124 <+16>:	stw     r30,24(r1)
   0x0196a128 <+20>:	mr      r31,r1
   0x0196a12c <+24>:	mfcr    r12
=>   0x0196a130 <+28>:	stw     r12,24(r31)

Note: r31 == r1 for that second "=>" line.

The return code sequence has a similar
problem: two loads from the same address.

Note: r31 == r1 here too.

=>   0x0196a1bc <+168>:	lwz     r12,24(r31)
   0x0196a1c0 <+172>:	lwz     r0,36(r1)
   0x0196a1c4 <+176>:	lwz     r31,28(r1)
=>   0x0196a1c8 <+180>:	lwz     r30,24(r1)
   0x0196a1cc <+184>:	mtcrf   32,r12
   0x0196a1d0 <+188>:	addi    r1,r1,32
   0x0196a1d4 <+192>:	mtlr    r0
   0x0196a1d8 <+196>:	blr

The Perl_cast_iv source code looks like:

IV
Perl_cast_iv(NV f)
{
  if (f < IV_MAX_P1)
    return f < IV_MIN ? IV_MIN : (IV) f;
  if (f < UV_MAX_P1) {
#if CASTFLAGS & 2
    /* For future flexibility allowing for sizeof(UV) >= sizeof(IV)  */
    if (f < UV_MAX_P1_HALF)
      return (IV)(UV) f;
    f -= UV_MAX_P1_HALF;
    return (IV)(((UV) f) | (1 + (UV_MAX >> 1)));
#else
    return (IV)(UV) f;
#endif
  }
  return f > 0 ? (IV)UV_MAX : 0 /* NaN */;
}
Comment 25 Krzysztof Parzyszek 2017-05-08 05:41:37 PDT
Could you attach a preprocessed file?
Comment 26 Mark Millard 2017-05-08 11:37:19 PDT
Created attachment 18417 [details]
.zip of preprocessed numeric.c from perl5 for comments 24 & 25

The file in the .zip was produced with:

cd /usr/obj/portswork/usr/ports/lang/perl5.24/work/perl-5.24.1

cpp -c -DPERL_CORE -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_FORTIFY_SOURCE=2 -O2 -pipe -g -fno-strict-aliasing -Wall -Werror=declaration-after-statement -Wextra -Wc++-compat -Wwrite-strings -Wthread-safety -DPIC -fPIC numeric.c > ~/perl_numeric_powerpc.txt

The original compile had cc instead of cpp and did not have the I/O
redirection: I edited the line shown in the log file to produce the
above line. Luckily clang's cpp ignores irrelevant compiler options,
making a correctly matching cpp command easy to produce.

You likely want to rename perl_numeric_powerpc.txt to compile it.
Comment 27 Mark Millard 2017-05-08 11:45:05 PDT
(In reply to Mark Millard from comment #26)
> Created attachment 18417 [details]
> .zip of preprocessed numeric.c from perl5 for comments 24 & 25

The Perl_cast_iv code ends up looking like:

. . .
typedef long long IV;
typedef unsigned long long UV;
. . .
typedef double NV;
. . .

IV
Perl_cast_iv(NV f)
{
  if (f < (2.0 * (1 + (((UV)((IV) ((~(UV)0) >> 1))) >> 1))))
    return f < (-((IV) ((~(UV)0) >> 1)) - ((3 & -1) == 3)) ? (-((IV) ((~(UV)0) >> 1)) - ((3 & -1) == 3)) : (IV) f;
  if (f < (4.0 * (1 + (((~(UV)0)) >> 2)))) {







    return (IV)(UV) f;

  }
  return f > 0 ? (IV)(~(UV)0) : 0 ;
}
Comment 28 Krzysztof Parzyszek 2017-05-09 15:19:12 PDT
I posted a patch for master in https://reviews.llvm.org/D33017, but it doesn't apply cleanly to release_40.

I'm attaching the version for 4.0.  Could you try if it works?
Comment 29 Krzysztof Parzyszek 2017-05-09 15:19:53 PDT
Created attachment 18423 [details]
Patch for release_40
Comment 30 Mark Millard 2017-05-09 15:58:26 PDT
(In reply to Krzysztof Parzyszek from comment #28)
> I posted a patch for master in https://reviews.llvm.org/D33017, but it
> doesn't apply cleanly to release_40.
> 
> I'm attaching the version for 4.0.  Could you try if it works?

I'll see about updating to such a clang and retrying the
port builds with it, for example.

I'll note that Roman Divacky has me trying for clang 4:

# svnlite diff /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
Index: /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
===================================================================
--- /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp	(revision 317820)
+++ /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp	(working copy)
@@ -1187,7 +1187,7 @@
       // For SVR4, don't emit a move for the CR spill slot if we haven't
       // spilled CRs.
       if (isSVR4ABI && (PPC::CR2 <= Reg && Reg <= PPC::CR4)
-          && !MustSaveCR)
+          && (!MustSaveCR && isPPC64))
         continue;
 
       // For 64-bit SVR4 when we have spilled CRs, the spill location

for having the CFI information written out for condition code
register use for powerpc (later code not shown). He is looking
into some of the issues for C++ handling thrown exceptions.
Comment 31 Mark Millard 2017-05-09 17:21:18 PDT
(In reply to Krzysztof Parzyszek from comment #28)
> I posted a patch for master in https://reviews.llvm.org/D33017, but it
> doesn't apply cleanly to release_40.
> 
> I'm attaching the version for 4.0.  Could you try if it works?

I've done a buildworld based on the patch, installed it,
booted it, and: built and installed /usr/ports/lang/perl5.24
with it. That includes the internal miniperl build that
previous created a miniperl that crashed --and the use of
the new miniperl in the later stages of building perl:
no crashes and things seem to be operating.

So it looks good so far for what the patch was intended
to fix (32-bit powerpc context).
Comment 32 Krzysztof Parzyszek 2017-05-10 05:41:06 PDT
(In reply to Mark Millard from comment #31)
> 
> So it looks good so far for what the patch was intended
> to fix (32-bit powerpc context).

That is great news. Thank you for doing all this work.
Comment 33 Krzysztof Parzyszek 2017-05-17 06:45:55 PDT
The fix has been committed in master in r303257.

I opened PR33070 to request merging it into 4.0.1.
Comment 34 emaste 2017-11-12 11:04:27 PST
There are a couple of different issues discussed in this PR so I'm not sure if they are all resolved at this point or not. Mark?
Comment 35 Mark Millard 2017-11-12 11:35:30 PST
(In reply to emaste from comment #34)
> There are a couple of different issues discussed in this PR so I'm not sure
> if they are all resolved at this point or not. Mark?

Overall I expect closing 226519 as fixed again is
appropriate.

Supporting details:

With multiple problems at the same time making notes
indicating what is working vs. what is not can be a
mess: when one thing is fixed things can still fail
overall.

The particular type of issue 26519 directly applies
to is just code generation and its stack handling
properties, not other things tied to C++ exceptions
working (even though I note failures in thrown C++
exceptions).

I've not found any more evidence of the code generation
mishandling the stack or ordering the code in a way that
violates the ABI. My testing has been with FreeBSD's
clang 5 in head for some time now.

However, at this point I've not investigated the /sbin/init
failure that was noted at all to know if code generation
was involved. If code generation for stack operations was
involved then this could be re-opened if closed. (As has
happened before for incomplete stack handling fixes.)

The kernel fails to build for powerpc via clang 5 so I
do not get as far as testing /sbin/init based on clang 5
code generation.
Comment 36 emaste 2017-11-12 11:42:57 PST
Closing per feedback from Mark.