LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 37400 - MSVC and Clang use different values for _MM_HINT constants; Windows SDK hardcodes MSVC values
Status: NEW
Alias: None
Product: clang
Classification: Unclassified
Component: Headers
Version: 6.0
Hardware: PC Windows NT
Importance: P normal
Assignee: Unassigned Clang Bugs
 
Reported: 2018-05-09 15:44 PDT by Fabian Giesen
Modified: 2018-05-17 13:31 PDT

Description Fabian Giesen 2018-05-09 15:44:14 PDT
With clang-cl 6.0:

// ---- begin
#include <emmintrin.h>
#include <Windows.h> // <--- only with this present!

void f(const void *p)
{
    _mm_prefetch((const char *)p, _MM_HINT_T0);
}
// ---- end

"clang-cl -c -O2 -FA prefetch.cpp" yields (only relevant parts):

# ---- begin
"?f@@YAXPEBX@Z":                        # @"\01?f@@YAXPEBX@Z"
# %bb.0:
	prefetcht2	(%rcx)
	retq
# ---- end

...huh? Some grepping later, it turns out that "um\winnt.h" in the Windows SDK 10.0.16299.0 (and presumably other versions as well, but I didn't check) contains this:

C:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0>rg MM_HINT_T0
um\winnt.h
3266:#define _MM_HINT_T0     1
3296:#define PF_TEMPORAL_LEVEL_1 _MM_HINT_T0
7349:#define _MM_HINT_T0     1
7366:#define PF_TEMPORAL_LEVEL_1 _MM_HINT_T0

and indeed, the MSVC version of xmmintrin.h has _MM_HINT_T0 #defined to 1.

Long story short, for any translation unit that includes Windows.h, _MM_HINT_* end up re-#defined to MSVC-specific values, which produce the wrong instructions with clang-cl.

If the goal is for clang-cl to be able to compile apps using unmodified Windows headers, then Clang needs to use the same values for _MM_HINT_* as MSVC does (presumably with some remapping done in the frontend). Sigh.
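
For illustration, such a remapping could look roughly like the sketch below. The names kMsvcToClangHint and msvc_hint_to_clang are made up for this example and are not anything in Clang; the only facts used are the two numberings (MSVC: NTA=0, T0=1, T1=2, T2=3; Clang/GCC: NTA=0, T2=1, T1=2, T0=3). Only the four read hints are covered.

```cpp
// Indexed by MSVC-style hint (NTA=0, T0=1, T1=2, T2=3); each entry is the
// Clang/GCC-style value for the same hint (NTA=0, T2=1, T1=2, T0=3).
constexpr int kMsvcToClangHint[4] = {0, 3, 2, 1};

// Hypothetical helper: translate an MSVC-style _MM_HINT_* value into the
// Clang/GCC-style value. Returns -1 for anything outside the four read hints.
constexpr int msvc_hint_to_clang(int msvc_hint) {
    return (msvc_hint >= 0 && msvc_hint < 4) ? kMsvcToClangHint[msvc_hint] : -1;
}

// The case from the example above: T0 arrives as 1 and should become 3.
static_assert(msvc_hint_to_clang(1) == 3, "MSVC T0 maps to Clang T0");
```

Applied to the example above, _MM_HINT_T0 arrives as 1 once Windows.h is included, maps to 3, and would then select prefetcht0 instead of prefetcht2.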
Comment 1 Craig Topper 2018-05-09 16:02:51 PDT
We use the same encodings as gcc, which don't match icc's and, based on this bug, don't match MSVC's either.

Related bug: https://bugs.llvm.org/show_bug.cgi?id=32411
Comment 2 Reid Kleckner 2018-05-17 10:44:58 PDT
It looks like these prefetch values are used by more than just _mm_prefetch / __builtin_prefetch. They're also used by scatter/gather intrinsics. That makes it hard to just change the numbering everywhere in MSVC environments.

I think we might want to do something nasty like ignore definitions of _MM_HINT_TN that use "incorrect" values in the pre-processor.
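
Separately from any change inside the preprocessor, a user-side stopgap might be to put the Clang/GCC numbering back right after including Windows.h. This is only a sketch under assumptions: the first group of #defines stands in for what winnt.h leaves behind so the snippet compiles on its own, and real code would replace it with the actual #include and probably wrap the fix-up in a compiler check.

```cpp
// Stand-in for the state after #include <Windows.h>: the hints have been
// re-#defined to the MSVC numbering. In a real translation unit, replace
// this group with the actual #include.
#define _MM_HINT_NTA 0
#define _MM_HINT_T0  1
#define _MM_HINT_T1  2
#define _MM_HINT_T2  3

// Fix-up: restore the numbering used by Clang's xmmintrin.h.
// (Assumption: this is only wanted when compiling with Clang or GCC.)
#undef _MM_HINT_NTA
#undef _MM_HINT_T0
#undef _MM_HINT_T1
#undef _MM_HINT_T2
#define _MM_HINT_NTA 0
#define _MM_HINT_T0  3
#define _MM_HINT_T1  2
#define _MM_HINT_T2  1

static_assert(_MM_HINT_T0 == 3 && _MM_HINT_T2 == 1,
              "Clang/GCC numbering restored");
```

This only affects uses that come after the fix-up in the same translation unit; anything in the headers that already expanded the macros keeps the MSVC values.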
Comment 3 Fabian Giesen 2018-05-17 13:31:18 PDT
You mean the AVX512PF gather/scatter prefetch instructions?

I just checked, and they appear to work (in Clang 6.0) with both the Clang xmmintrin.h and the overrides from Windows.h, so I got curious.

X86ISelLowering.cpp LowerINTRINSIC_W_CHAIN (which seems to be the place that handles AVX512 gather/scatter prefetches) has:

  case PREFETCH: {
    SDValue Hint = Op.getOperand(6);
    unsigned HintVal = cast<ConstantSDNode>(Hint)->getZExtValue();
    assert((HintVal == 2 || HintVal == 3) &&
           "Wrong prefetch hint in intrinsic: should be 2 or 3");
    unsigned Opcode = (HintVal == 2 ? IntrData->Opc1 : IntrData->Opc0);
    SDValue Chain = Op.getOperand(0);
    SDValue Mask  = Op.getOperand(2);
    SDValue Index = Op.getOperand(3);
    SDValue Base  = Op.getOperand(4);
    SDValue Scale = Op.getOperand(5);
    return getPrefetchNode(Opcode, Op, DAG, Mask, Base, Index, Scale, Chain,
                           Subtarget);
  }

Opc1 is the opcode to use for an L2 cache prefetch (=T1 hint), Opc0 is the opcode to use for an L1 cache prefetch (=T0 hint).

MSVC (and presumably ICC too) has:

/* constants for use with _mm_prefetch */
#define _MM_HINT_NTA    0
#define _MM_HINT_T0     1
#define _MM_HINT_T1     2
#define _MM_HINT_T2     3
#define _MM_HINT_ENTA   4

matching the values that go into the corresponding ModRM field, see e.g. X86InstrSSE.td:

3082:def PREFETCHT0   : I<0x18, MRM1m, (outs), (ins i8mem:$src),
3084:def PREFETCHT1   : I<0x18, MRM2m, (outs), (ins i8mem:$src),
3086:def PREFETCHT2   : I<0x18, MRM3m, (outs), (ins i8mem:$src),
3088:def PREFETCHNTA  : I<0x18, MRM0m, (outs), (ins i8mem:$src),
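
To make the ModRM point concrete, here is a small sketch that computes the ModRM byte for a prefetch with a plain (%rcx) operand, as in the example from the description. The helper name prefetch_modrm_rcx is made up for this illustration; the layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0, following the 0F 18 opcode bytes) is the standard x86 encoding.

```cpp
#include <cstdint>

// Hypothetical helper: ModRM byte for "prefetch<hint> (%rcx)".
// reg_field is the /digit from the instruction definition (MRM0m..MRM3m):
// 0 = NTA, 1 = T0, 2 = T1, 3 = T2, i.e. exactly the MSVC numbering.
constexpr std::uint8_t prefetch_modrm_rcx(unsigned reg_field) {
    // mod = 00 (register-indirect, no displacement), r/m = 001 (rcx)
    return static_cast<std::uint8_t>((0u << 6) | ((reg_field & 7u) << 3) | 1u);
}

static_assert(prefetch_modrm_rcx(1) == 0x09, "prefetcht0 (%rcx) is 0F 18 09");
static_assert(prefetch_modrm_rcx(3) == 0x19, "prefetcht2 (%rcx) is 0F 18 19");
```

So the MSVC value of _MM_HINT_T0 (1) is literally the reg field of prefetcht0, while the same value interpreted with Clang's numbering selects the instruction whose reg field is 3.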

Clang xmmintrin.h has:

#define _MM_HINT_ET0 7
#define _MM_HINT_ET1 6
#define _MM_HINT_T0  3
#define _MM_HINT_T1  2
#define _MM_HINT_T2  1
#define _MM_HINT_NTA 0

Note that _MM_HINT_T1 has the same value (2) in both.

I.e., with Clang/GCC-style _MM_HINT values, the gather/scatter prefetch intrinsics should get either _MM_HINT_T0 (3) or _MM_HINT_T1 (2), which is what the assert above tests for.

With MSVC-style _MM_HINT values, it would instead see either _MM_HINT_T0 (1) or _MM_HINT_T1 (2). So presumably that assert would fire for T0 if I were using a debug build of Clang, but either way it still produces the right instructions, because the opcode selection itself only tests "is this 2 or not".
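
To spell that out, here is the hint handling boiled down to a standalone sketch. Only the two conditions are taken from the PREFETCH case quoted above; the PrefetchLevel enum, the Selection struct and the select_prefetch name are made up for illustration and do not exist in LLVM.

```cpp
enum class PrefetchLevel { L1, L2 };

struct Selection {
    PrefetchLevel level;     // Opc0 = L1 prefetch, Opc1 = L2 prefetch
    bool assert_would_fire;  // true if the hint is neither 2 nor 3
};

// Mirrors the assert condition and the opcode choice from the quoted code.
constexpr Selection select_prefetch(unsigned hint_val) {
    return Selection{
        (hint_val == 2) ? PrefetchLevel::L2 : PrefetchLevel::L1,
        !(hint_val == 2 || hint_val == 3)};
}

// Clang/GCC numbering: T0 = 3, T1 = 2. Both pass the assert.
static_assert(select_prefetch(3).level == PrefetchLevel::L1, "Clang T0");
static_assert(!select_prefetch(3).assert_would_fire, "Clang T0 passes");

// MSVC numbering: T0 = 1. Same opcode choice, but the assert condition fails.
static_assert(select_prefetch(1).level == PrefetchLevel::L1, "MSVC T0");
static_assert(select_prefetch(1).assert_would_fire, "MSVC T0 trips the assert");
```

T1 is 2 in both numberings, so it picks the L2 opcode and passes the assert either way.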