Created attachment 10841
picojpeg.c code generates this backend problem.

While compiling the open source picojpeg code with llvm 3.3, it generates code like this in thumb mode:

    mov lr, pc
    bx r6

and the code it is calling with the bx uses this:

    push {...,lr}
    pop {...,pc}

which is just bad form: a bx should be used (pop {rn}; bx rn), not pop {pc}. In either case the lr does not have the lsbit set, because mov lr,pc does not set the lsbit (the pc does not have the lsbit set in thumb mode; it is stripped by bl, bx, blx). So consider what happens when the callee returns after this combination:

    mov lr, pc
    bx r6
    ;returns here

The return address is ideally thumb code. If the processor is a cortex-m then it is game over, because you tried to switch to arm mode (in that case I would hope to see an exception, but I didn't test it there). If it is not a cortex-m, then it returns in arm mode and tries to execute the thumb instructions as arm instructions, and unpredictable results will occur.

Naturally, with more code in the project, and with llvm-linking the project and optimizing the whole thing, the problem can move around, but the attached code with this makefile generates the mov lr,pc with clang+llvm 3.3:

    OOPS = -std-compile-opts -strip-debug -disable-simplify-libcalls
    LOPS = -Wall -m32 -emit-llvm
    LLCOPS = -march=thumb -disable-simplify-libcalls

    picojpeg.s : picojpeg.c
        clang $(LOPS) -c picojpeg.c -o picojpeg.bc
        opt $(OOPS) picojpeg.bc -o picojpeg.opt.bc
        llc $(LLCOPS) picojpeg.opt.bc -o picojpeg.s
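To make the lsbit argument above concrete, here is a minimal sketch in C of how the return address ends up selecting arm mode. It is only a model for illustration: the names thumb_mov_lr_pc, bx_target_state and isa_state are hypothetical and do not come from the report, from picojpeg, or from LLVM, and it assumes a core that supports interworking through bx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model for illustration only; not taken from the report. */
typedef enum { STATE_ARM, STATE_THUMB } isa_state;

/* Value that `mov lr, pc` writes in thumb mode: reading the pc yields the
   address of the current instruction plus 4, and bit 0 is not set. With a
   2-byte mov followed by a 2-byte bx, that is the address right after the bx. */
static uint32_t thumb_mov_lr_pc(uint32_t insn_addr)
{
    assert((insn_addr & 1u) == 0); /* thumb instructions are halfword aligned */
    return insn_addr + 4u;
}

/* Mode selected by `bx rn`: bit 0 set means thumb, bit 0 clear means arm. */
static isa_state bx_target_state(uint32_t target)
{
    return (target & 1u) ? STATE_THUMB : STATE_ARM;
}
```

With a mov lr, pc placed at the made-up address 0x8000, thumb_mov_lr_pc returns 0x8004, and bx_target_state reports arm mode for that value. Setting bit 0 first (for example by or-ing in 1) would make the same return land in thumb mode.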
From the ARM ARM:

> The general-purpose registers loaded can include the PC. If they do, the word loaded for the PC is treated as an address and a branch occurs to that address. In ARMv5 and above, bit[0] of the loaded value determines whether execution continues after this branch in ARM state or in Thumb state, as though the following instruction had been executed:
>
>     BX (loaded_value)
>
> In T variants of ARMv4, bit[0] of the loaded value is ignored and execution continues in Thumb state, as though the following instruction had been executed:
>
>     MOV PC,(loaded_value)

So replacing pop {pc} with a bx is only needed on ARMv4T, I guess.

I was using my instruction set simulator when I detected this, and then verified the problem on an ARMv6.
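The quoted rule can be written down as a small C model, which shows why the same return address behaves differently on ARMv4T and on ARMv5 and above. This is a sketch under assumptions: pop_pc_state, arm_arch and cpu_state are hypothetical names invented for the example, and only the two cases from the quoted text are modelled (cortex-m behaviour is not).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the quoted ARM ARM text; names are made up. */
typedef enum { ARCH_V4T, ARCH_V5_OR_LATER } arm_arch;
typedef enum { CPU_ARM, CPU_THUMB } cpu_state;

/* State in which execution continues after a thumb `pop {...,pc}`. */
static cpu_state pop_pc_state(uint32_t loaded_value, arm_arch arch)
{
    switch (arch) {
    case ARCH_V4T:
        /* bit[0] is ignored, as though MOV PC,(loaded_value) had executed */
        return CPU_THUMB;
    case ARCH_V5_OR_LATER:
        /* bit[0] selects the state, as though BX (loaded_value) had executed */
        return (loaded_value & 1u) ? CPU_THUMB : CPU_ARM;
    }
    assert(!"unknown architecture"); /* not reached for the two values above */
    return CPU_THUMB;
}
```

For a return address with the lsbit clear, such as the made-up value 0x8004, the model stays in Thumb state for ARCH_V4T and switches to ARM state for ARCH_V5_OR_LATER, which matches the failure seen on the ARMv6.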
Okay, I see this is valid ARMv4T code. Adding mcpu=something changes the code generation.