LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 26605 - FreeBSD projects/clang380-import for TARGET_ARCH=powerpc : clang 3.8.0 sometimes mishandles va_list overflow_arg_area vs. reg_save_area
Summary: FreeBSD projects/clang380-import for TARGET_ARCH=powerpc : clang 3.8.0 someti...
Status: RESOLVED FIXED
Alias: None
Product: clang
Classification: Unclassified
Component: C++ (show other bugs)
Version: 3.8
Hardware: Other FreeBSD
: P normal
Assignee: Unassigned Clang Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-13 20:45 PST by Mark Millard
Modified: 2016-02-20 03:00 PST (History)
4 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Millard 2016-02-13 20:45:33 PST
In a projects/clang380-import context libxo code use by "ls -l -n /" gets a segmentation violation (SEGV) for TARGET_ARCH=powerpc for buildworld/installworld. The later simplified code reproduces the problem.

The issue is complicated alignment/placement handling of va_list content between overflow_arg_area and reg_save_area that can happen when things larger than 4 bytes in size are involved in the mix, such as intmax_t.

It can be that only 4 bytes are left in reg_save_area when the next things is, say, 8 bytes in size. The next thing after that can be 4 bytes in size --which place does the 4 byte item go and which is it extracted from?

The code generated is not consistent with itself. The following short program demonstrates the problem.

#include <stdarg.h> // for va_list, va_start, va_arg, va_end
#include <stdint.h> // for intmax_t

intmax_t
va_test (char *s, ...)
{
    va_list vap;

    va_start(vap, s);

    char*        t0 = va_arg(vap, char*);
    unsigned int o0 = va_arg(vap, unsigned int);
    int          c0 = va_arg(vap, int);
    unsigned int u0 = va_arg(vap, unsigned int);
    int          c1 = va_arg(vap, int);
    char *       t1 = va_arg(vap, char*);
 
    intmax_t     j0 = va_arg(vap, intmax_t); // This spans into overflow_arg_area.

    int          c2 = va_arg(vap, int);      // A copy was put in the 
                                             // overflow_arg_area because of the
                                             // above.
                                             // But this tries to extract from the
                                             // last 4 bytes of the reg_save_area.
                                             // It does not increment the
                                             // overflow_arg_area position pointer
                                             // past the copy that is there.

    char *       t2 = va_arg(vap, char*);    // The lack of increment before makes
                                             // this extraction off by 4 bytes.

    char         t2fc = *t2;  // <<< This gets SEGV. t2 actually got what should be
                              //     the c2 value.

    intmax_t     j1 = va_arg(vap, intmax_t);

    va_end(vap);

    return (intmax_t) ((s-t2)+(t0-t1)+o0+u0+j0+j1+c0+c1+c2+t2fc);
    // Avoid any optimize-away for lack of use.
}

int main(void)
{
    char         s[1025] = "test string for this";

    char*        t0 = s + 5;
    unsigned int o0 = 3;
    int          c0 = 1;
    unsigned int u0 = 1;
    int          c1 = 3;
    char *       t1 = s + 12;
    intmax_t     j0 = 314159265358979323;
    int          c2 = 4;
    char *       t2 = s + 16;
    intmax_t     j1 = ~314159265358979323;

    intmax_t      result = va_test(s,t0,o0,c0,u0,c1,t1,j0,c1,t2,j1);

    return (int) (result - (intmax_t) ((s-t2)+(t0-t1)+o0+u0+j0+j1+c0+c1+c2+*t2));
    // Avoid any optimize-away for lack of use.
}

Context:

# freebsd-version -ku; uname -aKU
11.0-CURRENT
11.0-CURRENT
FreeBSD FBSDG4C1 11.0-CURRENT FreeBSD 11.0-CURRENT #3 r295351M: Sat Feb  6 16:27:58 PST 2016     markmi@FreeBSDx64:/usr/obj/clang_gcc421/powerpc.powerpc/usr/src/sys/GENERICvtsc-NODEBUG  powerpc 1100097 1100097
Comment 1 Mark Millard 2016-02-14 16:42:26 PST
From an exchange in FreeBSD's areas (but with some light editing). . .

On 2016-Feb-14, at 11:29 AM, Roman Divacky <rdivacky at vlakno.cz> wrote:

Fwiw, the code to handle the vaarg is in 
tools/clang/lib/CodeGen/TargetInfo.cpp:PPC32_SVR4_ABIInfo::EmitVAArg()

You can take a look to see whats wrong.


I then replied with:

Clang's code base is not familiar material for me nor do I have solid reference material for the FreeBSD TARGET_ARCH=powerpc ABI rules so the below has my guess work involved.

The following code appears to have hard wired a global, unvarying constant (8) into the test for picking UsingRegs vs. UsingOverflow.



 llvm::Value *NumRegs = Builder.CreateLoad(NumRegsAddr, "numUsedRegs");
. . .
 llvm::Value *CC =
     Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond");

 llvm::BasicBlock *UsingRegs = CGF.createBasicBlock("using_regs");
 llvm::BasicBlock *UsingOverflow = CGF.createBasicBlock("using_overflow");
 llvm::BasicBlock *Cont = CGF.createBasicBlock("cont");

 Builder.CreateCondBr(CC, UsingRegs, UsingOverflow);
. . .
 // Case 1: consume registers.
 Address RegAddr = Address::invalid();
 {
. . .
   // Increase the used-register count.
   NumRegs =
     Builder.CreateAdd(NumRegs,
                       Builder.getInt8((isI64 || (isF64 && IsSoftFloatABI)) ? 2 : 1));
   Builder.CreateStore(NumRegs, NumRegsAddr);. . .
. . .
 }

 // Case 2: consume space in the overflow area.
 Address MemAddr = Address::invalid();
 {
. . . (no adjustments to NumRegs) . . .



If so the means of counting NumRegs (a.k.a. gpr) then needs to take into account an allocated but unused last UsingRegs "slot" sometimes. Imagine. . .

r3, r4, r5, r6, r7, r8, r9 in use already so r10 is the last possible "UsingRegs" context.
(0  1   2   3   4   5   6, leaving r10 as position 7, the last < 8 value)

Then the next two arguments are a 8 byte integer then a 4 byte integer (in that order). That results in what should be:

r10 "UsingRegs" slot reserved and un-accessed
In other words: counted as allocated so that the rest goes in in the overflow area
(so no position 7 usage)

then

overflow with the 8 byte integer then the 4 byte integer.


And, in fact, the memory content reflects this in the overflow area.


But the va_arg access code does not count r10's slot as allocated in "UsingRegs" after the 8 byte integer. So later it tries to use r10's slot for the 4 byte integer that is actually in the UsingOverflow area.

One fix of sorts is to have "Case 2: consume space in the overflow area." set NumRegs (a.k.a. gpr) to the bound from the Builder.CreateICmpULT (8 in this context). Then the first (or any/every) use of the UsingOverflow area forces no more use of the UsingRegs area (for the involved va_list).
Comment 2 Mark Millard 2016-02-15 02:51:41 PST
(In reply to comment #1)

On Sun Feb 14 23:46:14 UTC 2016 Nathan Whitehorn wrote:

On 02/14/16 14:34, Mark Millard wrote:
> clang's code base is not familiar material for me nor do I have solid 
> reference material for the FreeBSD TARGET_ARCH=powerpc ABI rules so 
> the below has my guess work involved. The following code appears to 
> have hard wired a global, unvarying constant (8) into the test for 
> picking UsingRegs vs. UsingOverflow.

For reference, we use the standard ELF ABI 
(https://uclibc.org/docs/psABI-ppc.pdf).
-Nathan

Reviewing the Parameter Passing material in that document shows that the problem is in the original specification.

And there is a more modern specification that has a fix in its wording. (Which shows that I'm not likely to be wrong.) I'll reference and quote it later.

First I'll explain the problem that is in psABI-ppc.pdf (the old SunSoft 1995 document).

First a numbering point: psABI-ppc.pdf uses "gr" matching the numeral in r3, r4, . . . , r10, starting at r3 (i.e, 3). And gr indicates the next register to be used, not the last one already used.

The document splits the algorithm for placement of parameters into 3 stages with the following structure, intended as they have it in the document but various less interesting details for my "8byte then 4byte" example omitted:

INITIALIZING:
     Set fr=1, gr=3, and starg to the address of
     parameter word 1.
SCAN:
     If there are no more arguments, terminate.
     Otherwise, select one of the following
     depending on the type of the next argument:

     DOUBLE_OR_FLOAT
        If fr>8 ( . . .), go to OTHER. Otherwise,
        . . .

     SIMPLE_ARG
        If gr>10, go to OTHER. Otherwise, load the
        argument value into general register gr,
        set gr to gr+1, can goto SCAN. . . .

     LONG_LONG
        If gr>9, go to OTHER. Otherwise, . . .

OTHER:
       Arguments not otherwise handled above are
       passed in the parameter words of the
       caller’s stack frame. . . . Set starg to
       starg+size, then go to SCAN.

Note that gr is not incremented by LONG_LONG or by the later OTHER usage when gr>9. (That would be my example's 8 byte integer that is later followed by a 4 byte one.)

That OTHER's "go to SCAN" would then lead to the following 4 byte integer in my example to be put in r10 and gr then being set to 11 instead of it being stored in a parameter word on the stack.

The nasty thing about this for va_list/va_arg use is that the stored information does not indicate which was before vs. after in the argument order: the 4 byte r10 content or the 8 byte "OTHER" content: the two orders produce identical results.

This can not be correct.

The Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf is more modern and explicitly deals with VR and other modern things. (Its terminology matching LONG_LONG above is DUAL_GP.) But for what I'm dealing with here it has the following extra wording at the very end of its OTHER section:

If gr>9 and the type is DUAL_GP ,or . . ., or . . ., then set gr = 11 (to prevent subsequent SINGLE_GPs from being placed in registers after DUAL_GP, QUAD_GP, or EIGHT_GP arguments that would no longer fit in the registers).
Comment 3 Mark Millard 2016-02-18 17:43:27 PST
Roman Divacky (rdivacky at vlakno.cz) came up with the following patch [other than some discussion of what turned into "(8)" in "getIn8(8)" below]. The svnlite diff below is from my testing area for projects/clang380-import in FreeBSD's svn. So the revision number is from there too.

Index: /usr/src/contrib/llvm/tools/clang/lib/CodeGen/TargetInfo.cpp
===================================================================
--- /usr/src/contrib/llvm/tools/clang/lib/CodeGen/TargetInfo.cpp	(revision 295601)
+++ /usr/src/contrib/llvm/tools/clang/lib/CodeGen/TargetInfo.cpp	(working copy)
@@ -3569,6 +3569,8 @@
  {
    CGF.EmitBlock(UsingOverflow);

+    Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);
+
    // Everything in the overflow area is rounded up to a size of at least 4.
    CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4);

The "(8)" is intended to match the "(8)" from:

  llvm::Value *CC =
      Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond");


My simple test cases and FreeBSD's clang380-import "buildworld" materials (built with the udpate clang 3.8.0) all indicate that this fixes the problem reported.

For example each of the following now work based on code generation from the patched clang 3.8.0:

la -l -n /

svnlite update -295601 /usr/src

Both are from using the patched clang 3.8.0 to "buildworld" and then install it and reboot and use the software. The FreeBSD vintage started from is projects/clang380-import -r295601 .

I did some testing with doubles in use as a variant of my simplified example as well in order to test the fpr related handling, not just gpr related. The mix worked fine.


[Note: I run with the signal delivery modified to have a "red zone" to deal with other aspects of clang 3.8.0 code generation that are currently not ABI compliant for when the stack pointer is moved. Having a "red zone" is still operationally correct for an ABI compliant code generation, it just temporarily wastes more bytes. Also: the kernel was built with gcc 4.2.1 but world was built with the patched clang 3.8.0. At this stage I've only been identifying issues with "world" vs. clang 3.8.0. The kernel is a much more involved issue.]
Comment 4 Mark Millard 2016-02-20 02:43:52 PST
On 2016-Feb-20, at 12:34 AM, Roman Divacky <rdivacky at vlakno.cz> wrote (on a FreeBSd list):

Fwiw, I've just committed the patch to clang in r261422. You might want
to keep using a local modification or ask dim@ to import that patch
to our copy of 3.8.

Thanks for your diagnosis and testing!

Roman
Comment 5 Roman Divacky 2016-02-20 03:00:52 PST
Fixed in r261422.