With clang-cl 6.0:

// ---- begin
#include <emmintrin.h>
#include <Windows.h> // <--- only with this present!

void f(const void *p)
{
    _mm_prefetch((const char *)p, _MM_HINT_T0);
}
// ---- end

"clang-cl -c -O2 -FA prefetch.cpp" yields (only relevant parts):

# ---- begin
"?f@@YAXPEBX@Z":                        # @"\01?f@@YAXPEBX@Z"
# %bb.0:
        prefetcht2      (%rcx)
        retq
# ---- end

...huh?

Some grepping later, it turns out that "um\winnt.h" in the Windows SDK 10.0.16299.0 (and presumably other versions as well, but I didn't check) contains this:

C:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0>rg MM_HINT_T0 um\winnt.h
3266:#define _MM_HINT_T0 1
3296:#define PF_TEMPORAL_LEVEL_1 _MM_HINT_T0
7349:#define _MM_HINT_T0 1
7366:#define PF_TEMPORAL_LEVEL_1 _MM_HINT_T0

and indeed, the MSVC version of xmmintrin.h has _MM_HINT_T0 #defined to 1.

Long story short, for any translation unit that includes Windows.h, _MM_HINT_* end up re-#defined to MSVC-specific values, which produce the wrong instructions with clang-cl.

If the goal is to make clang-cl be able to compile apps using unmodified Windows headers, then Clang needs to use the same values for _MM_HINT_* as MSVC does. (Presumably with some remapping done in the frontend.) Sigh.
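To make the mismatch concrete, here is a rough sketch of how the two numberings relate for the four basic hints. It is only an illustration, not how Clang actually handles this: the enum and both function names are made up for this example, and the numeric values are the ones from the MSVC and Clang headers quoted in this report. _MM_HINT_ENTA, _MM_HINT_ET0 and _MM_HINT_ET1 are deliberately left out.

```cpp
// Hypothetical names throughout; only the numeric values come from the headers.
enum class Prefetch { NTA, T2, T1, T0, Unknown };

// Which instruction a hint value selects under Clang/GCC-style numbering
// (NTA = 0, T2 = 1, T1 = 2, T0 = 3).
constexpr Prefetch clangStyleInstruction(int hint) {
    return hint == 0 ? Prefetch::NTA
         : hint == 1 ? Prefetch::T2
         : hint == 2 ? Prefetch::T1
         : hint == 3 ? Prefetch::T0
                     : Prefetch::Unknown;
}

// Sketch of a remapping from MSVC-style numbering
// (NTA = 0, T0 = 1, T1 = 2, T2 = 3) to Clang/GCC-style numbering.
// Returns -1 for values this sketch does not handle (e.g. _MM_HINT_ENTA = 4).
constexpr int remapMsvcHint(int msvcHint) {
    return msvcHint == 0 ? 0   // NTA stays 0
         : msvcHint == 1 ? 3   // T0
         : msvcHint == 2 ? 2   // T1 is 2 in both numberings
         : msvcHint == 3 ? 1   // T2
                         : -1;
}

// MSVC's _MM_HINT_T0 (1), read as a Clang-style value, selects prefetcht2.
static_assert(clangStyleInstruction(1) == Prefetch::T2, "unremapped T0 becomes T2");
// After remapping, the same value selects prefetcht0 as intended.
static_assert(clangStyleInstruction(remapMsvcHint(1)) == Prefetch::T0, "remapped T0");
```

The first static_assert matches the prefetcht2 seen in the assembly output above: the value 1 from winnt.h is simply interpreted in the other numbering.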
We use the same encodings as gcc, which don't match icc's and, based on this bug, don't match MSVC's either. Related bug: https://bugs.llvm.org/show_bug.cgi?id=32411
It looks like these prefetch values are used by more than just _mm_prefetch / __builtin_prefetch. They're also used by scatter/gather intrinsics. That makes it hard to just change the numbering everywhere in MSVC environments. I think we might want to do something nasty like ignore definitions of _MM_HINT_TN that use "incorrect" values in the pre-processor.
You mean the AVX512PF gather/scatter prefetch instructions? I just checked, and they appear to work (in Clang 6.0) with both the Clang xmmintrin.h and the overrides from Windows.h, so I got curious.

X86ISelLowering.cpp LowerINTRINSIC_W_CHAIN (which seems to be the place that handles AVX512 gather/scatter prefetches) has:

    case PREFETCH: {
      SDValue Hint = Op.getOperand(6);
      unsigned HintVal = cast<ConstantSDNode>(Hint)->getZExtValue();
      assert((HintVal == 2 || HintVal == 3) &&
             "Wrong prefetch hint in intrinsic: should be 2 or 3");
      unsigned Opcode = (HintVal == 2 ? IntrData->Opc1 : IntrData->Opc0);
      SDValue Chain = Op.getOperand(0);
      SDValue Mask = Op.getOperand(2);
      SDValue Index = Op.getOperand(3);
      SDValue Base = Op.getOperand(4);
      SDValue Scale = Op.getOperand(5);
      return getPrefetchNode(Opcode, Op, DAG, Mask, Base, Index, Scale, Chain,
                             Subtarget);
    }

Opc1 is the opcode to use for a L2 cache prefetch (=T1 hint), Opc0 is the opcode to use for a L1 cache prefetch (=T0 hint).

MSVC (and presumably ICC too) has:

    /* constants for use with _mm_prefetch */
    #define _MM_HINT_NTA 0
    #define _MM_HINT_T0 1
    #define _MM_HINT_T1 2
    #define _MM_HINT_T2 3
    #define _MM_HINT_ENTA 4

matching the values that go into the corresponding ModRM field, see e.g. X86InstrSSE.td:

    3082:def PREFETCHT0 : I<0x18, MRM1m, (outs), (ins i8mem:$src),
    3084:def PREFETCHT1 : I<0x18, MRM2m, (outs), (ins i8mem:$src),
    3086:def PREFETCHT2 : I<0x18, MRM3m, (outs), (ins i8mem:$src),
    3088:def PREFETCHNTA : I<0x18, MRM0m, (outs), (ins i8mem:$src),

Clang xmmintrin.h has:

    #define _MM_HINT_ET0 7
    #define _MM_HINT_ET1 6
    #define _MM_HINT_T0 3
    #define _MM_HINT_T1 2
    #define _MM_HINT_T2 1
    #define _MM_HINT_NTA 0

note _MM_HINT_T1 is the same value (2) for both.

i.e. with Clang/GCC-style _MM_HINT values, prefetch intrinsics should get either _MM_HINT_T0 (3) or _MM_HINT_T1 (2), which is what the assert above tests for.

with MSVC-style _MM_HINT values, it would see either _MM_HINT_T0 (1) or _MM_HINT_T1 (2).
So presumably that assert would hit if I were using a debug build of Clang, but either way it still produces the right instructions because the test is only "is this 2 or not".
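To spell that out, here is a tiny stand-alone model of the two checks in the lowering code quoted above. This is not LLVM code: PrefetchLevel, selectGatherPrefetch and passesHintAssert are names invented for this sketch, with L1 standing in for Opc0 and L2 for Opc1.

```cpp
// Hypothetical stand-ins for IntrData->Opc0 (L1 / T0) and IntrData->Opc1 (L2 / T1).
enum class PrefetchLevel { L1, L2 };

// Mirrors: Opcode = (HintVal == 2 ? IntrData->Opc1 : IntrData->Opc0)
constexpr PrefetchLevel selectGatherPrefetch(unsigned hintVal) {
    return hintVal == 2 ? PrefetchLevel::L2 : PrefetchLevel::L1;
}

// Mirrors the condition inside the assert: HintVal == 2 || HintVal == 3
constexpr bool passesHintAssert(unsigned hintVal) {
    return hintVal == 2 || hintVal == 3;
}

// Clang/GCC-style T0 (3) and MSVC-style T0 (1) both select the L1 opcode...
static_assert(selectGatherPrefetch(3) == PrefetchLevel::L1, "Clang-style T0");
static_assert(selectGatherPrefetch(1) == PrefetchLevel::L1, "MSVC-style T0");
// ...and T1 is 2 in both numberings, so it always selects the L2 opcode.
static_assert(selectGatherPrefetch(2) == PrefetchLevel::L2, "T1 in either style");
// Only the assert condition tells the two T0 values apart.
static_assert(passesHintAssert(3) && !passesHintAssert(1), "assert rejects MSVC-style T0");
```

So in this model the opcode choice is the same for both numberings, and the only visible difference is whether the assert condition holds, which only matters in a build with assertions enabled.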