LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 52323 - New code-gen options for retpolines and straight line speculation
Status: NEW
Alias: None
Product: clang
Classification: Unclassified
Component: C
Version: unspecified
Hardware: PC Linux
Importance: P enhancement
Assignee: Unassigned Clang Bugs
URL:
Keywords:
Depends on:
Blocks: 4068
Reported: 2021-10-26 08:55 PDT by Andrew Cooper
Modified: 2021-11-22 14:00 PST

See Also:
Fixed By Commit(s):


Description Andrew Cooper 2021-10-26 08:55:39 PDT
Hello

[FYI, this is being cross-requested of GCC too]

Linux and other kernel-level software make use of -mindirect-branch=thunk-extern to be able to alter the handling of indirect branches at boot.  It turns out to be advantageous to inline the thunks when retpoline is not in use.  https://lore.kernel.org/lkml/20211026120132.613201817@infradead.org/ is some infrastructure to make this work.

In some cases, we want to be able to inline an `lfence; jmp *%reg` thunk.  This is fine for the low 8 registers, but not for %r{8..15}, where the REX prefix pushes the replacement size to 6 bytes, one more than the 5-byte `call` it replaces.

It would be very useful to have a code-gen option to write out `call %cs:__x86_indirect_thunk_r{8..15}` where the redundant %cs prefix will increase the instruction length to 6, allowing the non-retpoline form to be inlined.
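
As a rough illustration of the byte counts involved (not part of the original report; the helper names are hypothetical, while the opcode bytes are the standard x86-64 encodings), the following Python sketch compares the thunk call with the sequence that would be inlined over it:

```python
# Standard x86-64 encodings; helper names are hypothetical, for illustration only.
LFENCE = bytes([0x0F, 0xAE, 0xE8])   # lfence
CS_PREFIX = bytes([0x2E])            # %cs segment-override prefix


def jmp_reg(reg: int) -> bytes:
    """Encode `jmp *%reg` (opcode FF /4) for register number 0..15."""
    rex = bytes([0x41]) if reg >= 8 else b""   # REX.B selects r8..r15
    return rex + bytes([0xFF, 0xE0 | (reg & 7)])


def call_rel32(cs_prefix: bool = False) -> bytes:
    """Encode `call rel32` with a placeholder displacement, optionally %cs-prefixed."""
    return (CS_PREFIX if cs_prefix else b"") + bytes([0xE8, 0, 0, 0, 0])


def fits(reg: int, cs_prefix: bool = False) -> bool:
    """True if `lfence; jmp *%reg` fits in the space occupied by the thunk call."""
    return len(LFENCE + jmp_reg(reg)) <= len(call_rel32(cs_prefix))
```

Under these encodings, `fits(0)` holds for %rax, `fits(8)` fails for %r8, and `fits(8, cs_prefix=True)` holds once the redundant prefix is added.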


Relatedly, x86 straight line speculation has been discussed before, but without any action taken.  It would be helpful to have a code gen option which would emit `int3` following any `ret` instruction, and any indirect jump, as neither of these two cases have following architectural execution.

The reason these two are related is that if both options are in use, we want an extra byte of replacement space to be able to inline `lfence; jmp *%reg; int3`.
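
To make the requested `int3` behaviour concrete, here is a toy sketch that post-processes AT&T-syntax assembly text and appends `int3` after each `ret` and each indirect `jmp`. It is only an illustration: `harden_sls` is a hypothetical name, and a real implementation would presumably live in the compiler back end rather than operate on text.

```python
import re

# Matches `ret`/`retq`/`retl` and indirect jumps such as `jmp *%rax` or `jmpq *(%rax)`.
# Direct jumps, calls, and labels that merely start with "ret" do not match.
_NEEDS_INT3 = re.compile(r"^\s*(ret[ql]?\b|jmp[ql]?\s+\*)")


def harden_sls(asm_lines):
    """Return a copy of asm_lines with `int3` inserted after every ret / indirect jmp."""
    out = []
    for line in asm_lines:
        out.append(line)
        if _NEEDS_INT3.match(line):
            # Nothing executes architecturally after these, so the trap is never reached.
            out.append("\tint3")
    return out
```

For example, `harden_sls(["\tjmp *%r11"])` returns `["\tjmp *%r11", "\tint3"]`, while a direct `jmp .L1` or an indirect `call` is left alone.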


Third, Clang has been observed to emit conditional tail calls as `Jcc __x86_indirect_thunk_*`.  This is a 6 byte source size, but needs up to 9 bytes of space for inlining, including an `int3` for straight line speculation reasons (see https://lore.kernel.org/lkml/20211026120310.359986601@infradead.org/ for full details).  It might be enough to simply prohibit an optimisation like this when trying to pad retpolines for inlineability.
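
A sketch of the size arithmetic, assuming the inlined form is an inverted short `Jcc` that skips over `lfence; jmp *%reg; int3` (that decomposition is an assumption chosen to match the 9-byte figure; helper names are hypothetical, opcode bytes are the standard x86-64 encodings):

```python
# Standard x86-64 encodings; helper names are hypothetical, for illustration only.
LFENCE = bytes([0x0F, 0xAE, 0xE8])   # lfence
INT3 = bytes([0xCC])                 # int3


def jmp_reg(reg: int) -> bytes:
    """Encode `jmp *%reg` (opcode FF /4) for register number 0..15."""
    rex = bytes([0x41]) if reg >= 8 else b""   # REX.B selects r8..r15
    return rex + bytes([0xFF, 0xE0 | (reg & 7)])


def jcc_rel32(cc: int) -> bytes:
    """Encode near `Jcc rel32` (0F 80+cc) with a placeholder displacement."""
    return bytes([0x0F, 0x80 | cc, 0, 0, 0, 0])


def inline_jcc(cc: int, reg: int) -> bytes:
    """Encode `Jncc 1f; lfence; jmp *%reg; int3; 1:` using a short (rel8) jump."""
    body = LFENCE + jmp_reg(reg) + INT3
    # Flipping the low bit of an x86 condition code inverts the condition.
    return bytes([0x70 | (cc ^ 1), len(body)]) + body
```

With these encodings the source `Jcc` is 6 bytes, while the inlined sequence is 8 bytes for the low registers and 9 bytes for %r{8..15}.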
Comment 1 Andrew Cooper 2021-10-26 08:56:32 PDT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 for GCC cross-request.
Comment 2 Nick Desaulniers 2021-11-18 14:01:03 PST
It looks like GCC has added support for -mindirect-branch-cs-prefix:

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=2196a681d7810ad8b227bf983f38ba716620545e

This is being used when available in the Linux kernel:

https://lore.kernel.org/lkml/20211118185421.GK174703@worktop.programming.kicks-ass.net/
Comment 3 Eli Friedman 2021-11-19 10:49:31 PST
(In reply to Andrew Cooper from comment #0)
> Relatedly, x86 straight line speculation has been discussed before, but
> without any action taken.  It would be helpful to have a code gen option
> which would emit `int3` following any `ret` instruction, and any indirect
> jump, as neither of these two cases have following architectural execution.

Is there documentation somewhere describing this mitigation? In particular:

1. What unconditional branches can lead to straight-line speculation?
2. What instructions can be used to stop speculation?  (Is int3 actually effective? Are there other instructions that would also work?)
Comment 4 Andrew Cooper 2021-11-20 05:17:48 PST
(In reply to Eli Friedman from comment #3)
> (In reply to Andrew Cooper from comment #0)
> > Relatedly, x86 straight line speculation has been discussed before, but
> > without any action taken.  It would be helpful to have a code gen option
> > which would emit `int3` following any `ret` instruction, and any indirect
> > jump, as neither of these two cases have following architectural execution.
> 
> Is there documentation somewhere describing this mitigation? In particular:
> 
> 1. What unconditional branches can lead to straight-line speculation?

For AMD, it is discussed here https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf, mitigation G-5 on the final page:

  Place an LFENCE after an indirect branch instruction (RET, JMP reg or mem,
  CALL reg or mem) to help prevent possible sequential speculation.

For Intel, notes are included in SDM Vol2 for the CALL and JMP instructions:

  Certain situations may lead to the next sequential instruction after a 
  near indirect CALL being speculatively executed. If software needs to
  prevent this (e.g., in order to prevent a speculative execution side
  channel), then an LFENCE instruction opcode can be placed after the near
  indirect CALL in order to block speculative execution.

> 2. What instructions can be used to stop speculation?  (Is int3 actually
> effective? Are there other instructions that would also work?)

As you can see, LFENCE is the official recommendation.  It is about the only option for halting speculation which is safe to actually execute, and doesn't otherwise impact program state.

CALL has architectural execution following it.  However, the code following a CALL instruction is typically preservation of the return value and a pile of dead registers wanting reloading, and is typically not a pointer dereference involving a callee-clobbered register.  Therefore, CALLs are unlikely to have subsequent instructions which are vulnerable to speculative type confusion, and are therefore uninteresting to protect.

JMP and RET are different.  They are followed by arbitrary unrelated basic blocks, which could contain anything.

We could use LFENCE everywhere.  However, as we don't architecturally execute the instruction, we don't care about architectural side effects.  Basically any instruction which causes a decode exception, or is microcoded, halts speculation.  INT3 is safe to use, and is 1/3 of the length of LFENCE, so has less of an impact on code size.
Comment 5 Eli Friedman 2021-11-22 12:36:46 PST
(In reply to Andrew Cooper from comment #4)
> CALL has architectural execution following it.  However, the code following
> a CALL instruction is typically preservation of the return value and a pile
> of dead registers wanting reloading, and is typically not a pointer
> dereference involving a callee-clobbered register.

I'm a bit skeptical of heuristics like this; it's making very specific assumptions about how the compiler generates code, which might not hold for different codebases and/or optimizations.

> We could use LFENCE everywhere.  However, as we don't architecturally
> execute the instruction, we don't care about architectural side effects. 
> Basically any instruction which causes a decode exception, or is microcoded,
> halts speculation.  INT3 is safe to use, and is 1/3 of the length of LFENCE,
> so has less of an impact on code size.

It looks like the current version of the Intel manual actually explicitly mentions INT3, so I guess that's fine.
Comment 6 Andrew Cooper 2021-11-22 14:00:07 PST
(In reply to Eli Friedman from comment #5)
> It looks like the current version of the Intel manual actually explicitly
> mentions INT3, so I guess that's fine.
Ah great - I'd missed that update coming through.  I'll pester the other guys to document too.

> > CALL has architectural execution following it.  However, the code following
> > a CALL instruction is typically preservation of the return value and a pile
> > of dead registers wanting reloading, and is typically not a pointer
> > dereference involving a callee-clobbered register.
> 
> I'm a bit skeptical of heuristics like this; it's making very specific
> assumptions about how the compiler generates code, which might not hold for
> different codebases and/or optimizations.
Nevertheless, protecting JMP/RET with an INT3 is easy and cheap, while protecting CALL with LFENCE is very much not, and the risk profiles of the code are very different.

My gut feeling is that anyone wanting protection in the CALL case would probably be using Speculative Load Hardening instead.