Created attachment 14637 [details]
.c and .sh files produced after assertion failure

Found during FreeBSD/arm64 ports build and reported in FreeBSD PR 201762; I reproduced on a recent SVN build.

fatal error: error in backend: fixup value out of range
cc: error: clang frontend command failed with exit code 70 (use -v to see invocation)
FreeBSD clang version 3.6.1 (tags/RELEASE_361/final 237755) 20150525
Target: aarch64-unknown-freebsd11.0
Thread model: posix
cc: note: diagnostic msg: PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
cc: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
cc: note: diagnostic msg: /tmp/cockney-d900cb.c
cc: note: diagnostic msg: /tmp/cockney-d900cb.sh
This happens because LLVM tries to emit a tbnz instruction, but after calculating the fixup it finds the value is too large to fit into the 14-bit field. In the attached case LLVM generates the tbnz fixup with an offset of 34612 bytes, 1842 bytes past where it could branch to. I expect this most likely comes from the large switch statement.
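To make the range limit concrete, here is a minimal sketch of the architectural constraint, not LLVM's own check. It assumes the A64 encoding of TB[N]Z, where the 14-bit field is signed and stores the byte offset divided by 4. The function name is hypothetical.

```python
# Sketch only: models the A64 TB[N]Z encoding, not LLVM's backend code.
IMM14_BITS = 14  # width of the branch offset field
SCALE = 4        # the field stores byte_offset / 4 (instructions are 4 bytes)


def tbz_offset_fits(byte_offset: int) -> bool:
    """Return True if byte_offset can be encoded as a TB[N]Z branch target."""
    if byte_offset % SCALE != 0:
        return False  # branch targets must be 4-byte aligned
    imm = byte_offset // SCALE
    half = 1 << (IMM14_BITS - 1)  # signed field covers [-half, half - 1]
    return -half <= imm < half


print(tbz_offset_fits(32764))  # largest encodable forward offset
print(tbz_offset_fits(34612))  # the offset reported in this comment
```

Under these assumptions the reported offset of 34612 bytes is rejected, while anything within roughly 32K of the branch is accepted.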
Created attachment 15548 [details]
Reduced bitcode for the cockney testcase

I invested a significant amount of compute time to get this reduction :-) It's still not very useful, since the disassembly is over 7500 lines, but it runs a bit quicker for testing.

Here's how I'm reproducing the fault with the attached file:

$ llc -filetype=obj -O1 -relocation-model=pic cockney-reduced.bc

This fault does not occur when generating assembly, or when run in -O0 mode.
Created attachment 15581 [details]
Reattach the original testcases.

The reduced one no longer triggers with ToT Clang.
I'm not able to continue investigating this ticket, so I'll do my best to hand over what I learned while trying (and failing) to fix it.

I initially thought this would be a bug in AArch64's branch relaxation pass [lib/Target/AArch64/AArch64BranchRelaxation.cpp], the purpose of which is to transform branch instructions whose targets are out of range for the instruction encoding. Transforms like:

    tbz LBL_TOO_FAR_AWAY
  ==XFORM=>
    tbnz NEXT_BB
    B LBL_TOO_FAR_AWAY

I checked that this invariant wasn't being broken by inspecting the basic block offsets. Some offsets were within a few thousand bytes of 32K (the limit for TB[N]Z), but none were over it.

So we then go into MC. My first hack was to catch this oversized fixup value in ELFAArch64AsmBackend::processFixupValue and emit a relocation. That's almost certainly not the right fix, because the relocation might be truncated, but I didn't verify any of this. It's just one possible way of curing the symptom and not getting the crash; I doubt it solves the real problem!

Properties of the bug I've noticed:
- Does not show up in -O0 or -O1 from Clang.
- You have to compile in -O2 mode and above in Clang, but -O1 and above in llc.
- Only shows up when emitting an object, not assembly. So the problem is in the object streamer bits.
- Only shows up with -fPIC (i.e., -relocation-model=pic for llc).

This command will show the error without having to use the provided "sh" file:

$ clang -target aarch64-unknown -c -fPIC -O1 cockney-d900cb.c

Once a bitcode file has been produced, this is the llc rune:

$ llc -O1 -relocation-model=pic -filetype=obj cockney-d900cb.bc

Using bugpoint can reduce the test case significantly, but it's still too big for human analysis of the source. It does make interactive runs go faster. This is the bugpoint command I used:

$ bugpoint -llc-safe cockney-d900cb.bc --tool-args -relocation-model=pic -filetype=obj

That took several hours on my machine.
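For readers unfamiliar with the tbz-to-tbnz-plus-B transform described in this comment, here is a toy model of it. This is a sketch under stated assumptions, not the code in AArch64BranchRelaxation.cpp: the Block type, the field names, and the choice to fold the new unconditional B into the same block are all hypothetical simplifications, and each branch is assumed to be the last 4-byte instruction of a block that is not the final one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed encodable byte range for TB[N]Z (signed 14-bit field scaled by 4).
TBZ_MIN, TBZ_MAX = -32768, 32764
INVERT = {"tbz": "tbnz", "tbnz": "tbz"}


@dataclass
class Block:
    name: str
    size: int                                  # bytes, terminator included
    branch: Optional[Tuple[str, str]] = None   # (opcode, target block name)
    far: Optional[str] = None                  # target of an added "b", if any


def block_offsets(blocks):
    """Map each block name to its starting byte offset."""
    offsets, pos = {}, 0
    for b in blocks:
        offsets[b.name] = pos
        pos += b.size
    return offsets


def relax(blocks):
    """Rewrite out-of-range test branches until every one is in range."""
    changed = True
    while changed:
        changed = False
        offsets = block_offsets(blocks)
        for i, b in enumerate(blocks):
            if b.branch is None or b.far is not None:
                continue
            opcode, target = b.branch
            branch_addr = offsets[b.name] + b.size - 4
            disp = offsets[target] - branch_addr
            if TBZ_MIN <= disp <= TBZ_MAX:
                continue
            # Out of range: invert the test so it skips to the next block,
            # and reach the far label with an unconditional "b" instead.
            b.branch = (INVERT[opcode], blocks[i + 1].name)
            b.far = target
            b.size += 4      # the added "b" grows the block
            changed = True
            break            # offsets are stale, so recompute them
    return blocks


blocks = relax([
    Block("entry", 8, ("tbz", "far_away")),
    Block("next_bb", 40000),
    Block("far_away", 4),
])
print(blocks[0])
```

The loop restarts after every rewrite because growing one block by 4 bytes shifts everything after it, which can push a previously in-range branch over the limit.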
(Un)fortunately, the test case I reduced a few days ago got fixed upstream; it no longer fails on trunk. So if you want a smaller reproducer, use the above.

As to what is causing the bug, my only hunch is that there's something wrong in the symbol generation. I see MCValues whose "A symbol" is larger than 32K, but that might be OK, because the fixup offsets are supposed to account for that, if I've understood correctly. On my machine, the value that causes the crash has an "A symbol" with an offset of 45084 and a corresponding fixup with offset 9660, giving a difference of 35424; 2657 bytes too big.

The fixup offsets are computed in what appears to be a sensible fashion in MCELFStreamer::EmitInstToData. A fixup's offset points to the start of the corresponding instruction's code in the data fragment.

Another hunch was that relaxation is somehow pushing the symbol offsets over the instruction encoding's maximums. I know the assembler is supposed to do branch shortening, so maybe there's a bug there?

I've run out of time to load more context in to solve this one, so I'll have to drop it.
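The arithmetic in this comment can be checked with a few lines. This is only a worked example of the numbers quoted here; the function name is hypothetical, and the limit of 32767 is the upper bound used elsewhere in this thread rather than something taken from the LLVM sources.

```python
LIMIT = 32767  # upper bound for the TB[N]Z fixup value, as quoted in this thread


def fixup_overshoot(symbol_offset, fixup_offset, limit=LIMIT):
    """Return (fixup value, bytes by which it exceeds the limit)."""
    value = symbol_offset - fixup_offset
    return value, max(0, value - limit)


value, over = fixup_overshoot(45084, 9660)
print(value, over)  # 35424 2657
```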
Created attachment 16806 [details]
Reduced bitcode testcase

This is what bugpoint has reduced now. It's pretty fragile, because it's only 1 byte over the limit, so any change that reduces that huge function by 1 byte might avoid the assertion.
Created attachment 16807 [details]
Reduced bitcode testcase

This is what bugpoint managed to reduce this time around. It's pretty fragile because it's only 1 byte over the limit of 32767, so any change that makes that function smaller will avoid this error.
Potential fix: https://reviews.llvm.org/D22870

Note that this bug is not in the object streaming code, as previously assumed in the comments. In fact, the corresponding assembly file (clang -O2 -S) crashes gas as well as llvm-mc with the same out-of-range errors.
r277331