for the following IR : ; ModuleID = '/Volumes/sc3_src/test-ppc/small-struct.c' target datalayout = "E-m:o-p:32:32-f64:32:64-n32" target triple = "powerpc-apple-macosx10.5.0" %struct.sm = type { i8, i8 } ; Function Attrs: nounwind ssp define void @foo(%struct.sm* byval %s) #0 { entry: %a = getelementptr inbounds %struct.sm* %s, i32 0, i32 0 %0 = load i8* %a, align 1, !tbaa !1 %conv2 = zext i8 %0 to i32 %add = add nuw nsw i32 %conv2, 3 %conv1 = trunc i32 %add to i8 store i8 %conv1, i8* %a, align 1, !tbaa !1 call void @bar(%struct.sm* byval %s, %struct.sm* byval %s) #2 ret void } declare void @bar(%struct.sm* byval, %struct.sm* byval) #1 attributes #0 = { nounwind ssp "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #2 = { nounwind } !llvm.ident = !{!0} !0 = metadata !{metadata !"clang version 3.5.0 (trunk 212556)"} !1 = metadata !{metadata !2, metadata !3, i64 0} !2 = metadata !{metadata !"sm", metadata !3, i64 0, metadata !3, i64 1} !3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0} !4 = metadata !{metadata !"Simple C/C++ TBAA"} ==== the following printout shows that the store of r2 gets delayed (and the copy to the active register is lost). # *** IR Dump After If Converter *** ( LOOKS OK ): # Machine code for function foo: Post SSA Frame Objects: fi#-2: size=4, align=4, fixed, at location [SP-4] fi#-1: size=2, align=2, fixed, at location [SP+26] Function Live Ins: %R3 in %vreg0 BB#0: derived from LLVM BB %entry Live Ins: %R3 %R0<def> = MFLR %LR<imp-use> STW %R31, -4, %R1 STW %R0, 8, %R1 %R1<def,tied3> = STWU %R1, -64, %R1<tied0> %R31<def> = OR %R1, %R1 STH %R3<kill>, 90, %R31; mem:ST2[%s] %R2<def> = LBZ 90, %R31; mem:LD1[%a3](align=2)(tbaa=<badref>) %R2<def> = ADDI %R2<kill>, 3 STB %R2<kill>, 90, %R31; mem:ST1[%a3](align=2)(tbaa=<badref>) %R3<def> = LHZ 90, %R31; mem:LD2[FixedStack-1](align=4) %R4<def> = OR %R3, %R3 BL <ga:@bar>, <regmask>, %LR<imp-def>, %RM<imp-use>, %R3<imp-use>, %R4<imp-use,kill>, %R1<imp-def> %R1<def> = ADDI %R1, 64 %R0<def> = LWZ 8, %R1 %R31<def> = LWZ -4, %R1 MTLR %R0, %LR<imp-def> BLR %LR<imp-use>, %RM<imp-use> # End machine code for function foo. # *** IR Dump After Post RA top-down list latency scheduler *** (LOOKS like wrong code): # Machine code for function foo: Post SSA Frame Objects: fi#-2: size=4, align=4, fixed, at location [SP-4] fi#-1: size=2, align=2, fixed, at location [SP+26] Function Live Ins: %R3 in %vreg0 BB#0: derived from LLVM BB %entry Live Ins: %R3 %R0<def> = MFLR %LR<imp-use> STW %R31, -4, %R1 STW %R0<kill>, 8, %R1 %R1<def,tied3> = STWU %R1, -64, %R1<tied0> %R31<def> = OR %R1, %R1 STH %R3<kill>, 90, %R31; mem:ST2[%s] %R3<def> = LHZ 90, %R31; mem:LD2[FixedStack-1](align=4) %R2<def> = LBZ 90, %R31; mem:LD1[%a3](align=2)(tbaa=<badref>) %R2<def> = ADDI %R2<kill>, 3 %R4<def> = OR %R3, %R3 STB %R2<kill>, 90, %R31; mem:ST1[%a3](align=2)(tbaa=<badref>) ^^^^ note that the store is now ordered after the registers are setup for the call to bar - so the input to bar appears incorrect. BL <ga:@bar>, <regmask>, %LR<imp-def>, %RM<imp-use>, %R3<imp-use,kill>, %R4<imp-use,kill>, %R1<imp-def> %R1<def> = ADDI %R1, 64 %R0<def> = LWZ 8, %R1 %R31<def> = LWZ -4, %R1 MTLR %R0<kill>, %LR<imp-def> BLR %LR<imp-use>, %RM<imp-use> # End machine code for function foo.
Adding Bill and Uli, because they've touched similar ABI code for PPC64 recently, and might have some insight on what's supposed to happen. The problem here is that the parameter, %struct.sm* byval %s, is marked as byval, and in the Darwin ABI code, the stack object for this parameter is created like this: 3057 int FI = MFI->CreateFixedObject(ObjSize, CurArgOffset, true); Note that last parameter, which is true, is the parameter isImmutable. By setting it to true, you're promising that nothing will modify it, and so the scheduler is free not to add dependency chains from any loads of that parameter to anything else (which is what happens here, resulting in your miscompile). I'll note that in the PPC64 ABI code, we also tag some of our byval parameters as immutable (such as those coming from the caller's stack frame), and someone should audit/confirm that's really appropriate.
Reviewing LowerFormalArguments_64SVR4, I see four places where CreateFixedObject is currently being called with immutable = true: - For the dummy object representing a zero-sized byval parameter This looks OK, since the object must not be accessed anyway. - For all other byval parameters This is wrong, since byval parameters may be written to - For non-byval parameters that don't fit in registers (and we don't have tail call) This looks OK; those parameters will only be read out of their stack slots once - For unnamed vararg parameters This looks OK; va_arg will only copy a parameter out, and not write to it I'll check in a fix for the one incorrect case shortly.
Fix for the case found in the previous comment is now checked in as revision 214517.
As it turns out, there were two things wrong here. First was the immutable property. Second, there was an assumption that all fixed-offset stack objects were not pointed-to by IR values (which is false for byvals). This should be fixed by r215793/r215794/r215795.
(In reply to comment #4) > As it turns out, there were two things wrong here. First was the immutable > property. Second, there was an assumption that all fixed-offset stack > objects were not pointed-to by IR values (which is false for byvals). > > This should be fixed by r215793/r215794/r215795. Hal, can you recall why the "isAliased" part of this fix was PPC only? On pretty much all targets, byval is used when the argument lives in some fixed stack object that is aliased by an LLVM value (the pointer argument). Also, if I revert the change (r215795), the checked in test case still passes. I'd like to remove the isAliased boolean from StackObject if possible.