LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 20280 - Load-Store ordering issue for top-down list latency scheduler.
Summary: Load-Store ordering issue for top-down list latency scheduler.
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: PowerPC (show other bugs)
Version: trunk
Hardware: Macintosh MacOS X
: P normal
Assignee: Hal Finkel
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-10 10:32 PDT by Iain Sandoe
Modified: 2017-01-03 14:38 PST (History)
6 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Iain Sandoe 2014-07-10 10:32:36 PDT
for the following IR :
; ModuleID = '/Volumes/sc3_src/test-ppc/small-struct.c'
target datalayout = "E-m:o-p:32:32-f64:32:64-n32"
target triple = "powerpc-apple-macosx10.5.0"
 
%struct.sm = type { i8, i8 }
 
; Function Attrs: nounwind ssp
define void @foo(%struct.sm* byval %s) #0 {
entry:
  %a = getelementptr inbounds %struct.sm* %s, i32 0, i32 0
  %0 = load i8* %a, align 1, !tbaa !1
  %conv2 = zext i8 %0 to i32
  %add = add nuw nsw i32 %conv2, 3
  %conv1 = trunc i32 %add to i8
  store i8 %conv1, i8* %a, align 1, !tbaa !1
  call void @bar(%struct.sm* byval %s, %struct.sm* byval %s) #2
  ret void
}
 
declare void @bar(%struct.sm* byval, %struct.sm* byval) #1
 
attributes #0 = { nounwind ssp "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
 
!llvm.ident = !{!0}
 
!0 = metadata !{metadata !"clang version 3.5.0 (trunk 212556)"}
!1 = metadata !{metadata !2, metadata !3, i64 0}
!2 = metadata !{metadata !"sm", metadata !3, i64 0, metadata !3, i64 1}
!3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0}
!4 = metadata !{metadata !"Simple C/C++ TBAA"}
 
==== the following printout shows that the store of r2 gets delayed (and the copy to the active register is lost).
 
# *** IR Dump After If Converter *** ( LOOKS OK ):
# Machine code for function foo: Post SSA
Frame Objects:
  fi#-2: size=4, align=4, fixed, at location [SP-4]
  fi#-1: size=2, align=2, fixed, at location [SP+26]
Function Live Ins: %R3 in %vreg0
 
BB#0: derived from LLVM BB %entry
    Live Ins: %R3
	%R0<def> = MFLR %LR<imp-use>
	STW %R31, -4, %R1
	STW %R0, 8, %R1
	%R1<def,tied3> = STWU %R1, -64, %R1<tied0>
	%R31<def> = OR %R1, %R1
	STH %R3<kill>, 90, %R31; mem:ST2[%s]
	%R2<def> = LBZ 90, %R31; mem:LD1[%a3](align=2)(tbaa=<badref>)
	%R2<def> = ADDI %R2<kill>, 3
	STB %R2<kill>, 90, %R31; mem:ST1[%a3](align=2)(tbaa=<badref>)
	%R3<def> = LHZ 90, %R31; mem:LD2[FixedStack-1](align=4)
	%R4<def> = OR %R3, %R3
	BL <ga:@bar>, <regmask>, %LR<imp-def>, %RM<imp-use>, %R3<imp-use>, %R4<imp-use,kill>, %R1<imp-def>
	%R1<def> = ADDI %R1, 64
	%R0<def> = LWZ 8, %R1
	%R31<def> = LWZ -4, %R1
	MTLR %R0, %LR<imp-def>
	BLR %LR<imp-use>, %RM<imp-use>
 
# End machine code for function foo.
 
# *** IR Dump After Post RA top-down list latency scheduler *** (LOOKS like wrong code):
# Machine code for function foo: Post SSA
Frame Objects:
  fi#-2: size=4, align=4, fixed, at location [SP-4]
  fi#-1: size=2, align=2, fixed, at location [SP+26]
Function Live Ins: %R3 in %vreg0
 
BB#0: derived from LLVM BB %entry
    Live Ins: %R3
	%R0<def> = MFLR %LR<imp-use>
	STW %R31, -4, %R1
	STW %R0<kill>, 8, %R1
	%R1<def,tied3> = STWU %R1, -64, %R1<tied0>
	%R31<def> = OR %R1, %R1
	STH %R3<kill>, 90, %R31; mem:ST2[%s]
	%R3<def> = LHZ 90, %R31; mem:LD2[FixedStack-1](align=4)
	%R2<def> = LBZ 90, %R31; mem:LD1[%a3](align=2)(tbaa=<badref>)
	%R2<def> = ADDI %R2<kill>, 3
	%R4<def> = OR %R3, %R3
	STB %R2<kill>, 90, %R31; mem:ST1[%a3](align=2)(tbaa=<badref>)
^^^^ note that the store is now ordered after the registers are setup for the call to bar - so the input to bar appears incorrect.
	BL <ga:@bar>, <regmask>, %LR<imp-def>, %RM<imp-use>, %R3<imp-use,kill>, %R4<imp-use,kill>, %R1<imp-def>
	%R1<def> = ADDI %R1, 64
	%R0<def> = LWZ 8, %R1
	%R31<def> = LWZ -4, %R1
	MTLR %R0<kill>, %LR<imp-def>
	BLR %LR<imp-use>, %RM<imp-use>
 
# End machine code for function foo.
Comment 1 Hal Finkel 2014-07-30 07:19:06 PDT
Adding Bill and Uli, because they've touched similar ABI code for PPC64 recently, and might have some insight on what's supposed to happen.

The problem here is that the parameter, %struct.sm* byval %s, is marked as byval, and in the Darwin ABI code, the stack object for this parameter is created like this:

3057	      int FI = MFI->CreateFixedObject(ObjSize, CurArgOffset, true);

Note that last parameter, which is true, is the parameter isImmutable. By setting it to true, you're promising that nothing will modify it, and so the scheduler is free not to add dependency chains from any loads of that parameter to anything else (which is what happens here, resulting in your miscompile).

I'll note that in the PPC64 ABI code, we also tag some of our byval parameters as immutable (such as those coming from the caller's stack frame), and someone should audit/confirm that's really appropriate.
Comment 2 Ulrich Weigand 2014-08-01 08:26:40 PDT
Reviewing LowerFormalArguments_64SVR4, I see four places where CreateFixedObject is currently being called with immutable = true:

- For the dummy object representing a zero-sized byval parameter
  This looks OK, since the object must not be accessed anyway.

- For all other byval parameters
  This is wrong, since byval parameters may be written to

- For non-byval parameters that don't fit in registers (and we don't have tail call)
  This looks OK; those parameters will only be read out of their stack slots once

- For unnamed vararg parameters
  This looks OK; va_arg will only copy a parameter out, and not write to it

I'll check in a fix for the one incorrect case shortly.
Comment 3 Ulrich Weigand 2014-08-01 09:36:31 PDT
Fix for the case found in the previous comment is now checked in as revision 214517.
Comment 4 Hal Finkel 2014-08-15 19:20:22 PDT
As it turns out, there were two things wrong here. First was the immutable property. Second, there was an assumption that all fixed-offset stack objects were not pointed-to by IR values (which is false for byvals).

This should be fixed by r215793/r215794/r215795.
Comment 5 Reid Kleckner 2017-01-03 14:38:25 PST
(In reply to comment #4)
> As it turns out, there were two things wrong here. First was the immutable
> property. Second, there was an assumption that all fixed-offset stack
> objects were not pointed-to by IR values (which is false for byvals).
> 
> This should be fixed by r215793/r215794/r215795.

Hal, can you recall why the "isAliased" part of this fix was PPC only? On pretty much all targets, byval is used when the argument lives in some fixed stack object that is aliased by an LLVM value (the pointer argument).

Also, if I revert the change (r215795), the checked in test case still passes. I'd like to remove the isAliased boolean from StackObject if possible.