Bug 43230 - [X86] Incorrect shuffle of shift optimization on haswell
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: X86
Version: 9.0
Hardware: PC Linux
Importance: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks: release-9.0.0
Reported: 2019-09-05 13:26 PDT by Nikita Popov
Modified: 2019-09-09 02:37 PDT

See Also:
Fixed By Commit(s): 371305,371307


Description Nikita Popov 2019-09-05 13:26:42 PDT
define <16 x i16> @test(<16 x i16> %a, <16 x i16> %b) {
  %shr = lshr <16 x i16> %a, %b
  %shuf = shufflevector <16 x i16> zeroinitializer, <16 x i16> %shr, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 30, i32 15>
  ret <16 x i16> %shuf
} 

With -mcpu=haswell results in:

	vpxor	%xmm2, %xmm2, %xmm2
	vpunpckhwd	%ymm2, %ymm1, %ymm1 # ymm1 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]
	vpunpckhwd	%ymm0, %ymm2, %ymm0 # ymm0 = ymm2[4],ymm0[4],ymm2[5],ymm0[5],ymm2[6],ymm0[6],ymm2[7],ymm0[7],ymm2[12],ymm0[12],ymm2[13],ymm0[13],ymm2[14],ymm0[14],ymm2[15],ymm0[15]
	vpsrlvd	%ymm1, %ymm0, %ymm0
	vpand	.LCPI0_0(%rip), %ymm0, %ymm0

While in LLVM 7 it was:

        vpxor   xmm2, xmm2, xmm2
        vpunpckhwd      ymm1, ymm1, ymm2 # ymm1 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]
        vpunpckhwd      ymm0, ymm2, ymm0 # ymm0 = ymm2[4],ymm0[4],ymm2[5],ymm0[5],ymm2[6],ymm0[6],ymm2[7],ymm0[7],ymm2[12],ymm0[12],ymm2[13],ymm0[13],ymm2[14],ymm0[14],ymm2[15],ymm0[15]
        vpsrlvd ymm0, ymm0, ymm1
        vpand   ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]

Godbolt: https://godbolt.org/z/4SyEhQ

I *think* this transformation is not correct, though maybe my vector-fu is too weak.

The debug log has:

With: t53: v32i8 = BUILD_VECTOR undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, undef:i8, Constant:i8<26>, Constant:i8<27>, undef:i8, undef:i8


Combining: t50: v32i8 = X86ISD::PSHUFB t48, t53
Creating new node: t54: v8i32 = undef
Creating new node: t55: v16i16 = bitcast t23
Creating constant: t56: i8 = Constant<-28>
Creating new node: t57: v16i16 = X86ISD::PSHUFLW t55, Constant:i8<-28>
Creating new node: t58: v32i8 = bitcast t57
 ... into: t58: v32i8 = bitcast t57

It looks like a non-identity pshufb is being replaced with an identity pshuflw.
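
To make that concrete, here is a small Python sketch that decodes the two masks from the log. The helper names are hypothetical and `None` stands for an undef element; the immediate `-28` is `0xE4` as an unsigned byte, and the PSHUFB mask is the one from t53 (bytes 28 and 29 read bytes 26 and 27).

```python
# Hypothetical helpers for decoding the masks seen in the debug log.

def decode_pshuflw_imm(imm8):
    # Each 2-bit field of the immediate selects the source word
    # for one of the four low words of a 128-bit lane.
    imm = imm8 & 0xFF
    return [(imm >> (2 * i)) & 3 for i in range(4)]

def is_identity(mask):
    # None models an undef element, which is compatible with any value.
    return all(m is None or m == i for i, m in enumerate(mask))

# Immediate from "X86ISD::PSHUFLW t55, Constant:i8<-28>".
pshuflw_mask = decode_pshuflw_imm(-28)

# Byte mask from t53: everything undef except bytes 28 and 29.
pshufb_mask = [None] * 32
pshufb_mask[28] = 26
pshufb_mask[29] = 27

print(pshuflw_mask, is_identity(pshuflw_mask))
print(is_identity(pshufb_mask))
```

The PSHUFLW immediate decodes to [0, 1, 2, 3], and PSHUFLW leaves the high four words of each lane untouched, so the new node moves nothing. The PSHUFB mask, on the other hand, moves word 13 into word 14.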

This happens via matchUnaryPermuteShuffle(), though I haven't looked further.
Comment 1 Nikita Popov 2019-09-05 13:33:46 PDT
Err sorry, I ended up copy&pasting the wrong outputs.

This is LLVM trunk:

        vpxor   xmm2, xmm2, xmm2
        vpunpckhwd      ymm1, ymm1, ymm2 # ymm1 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]
        vpunpckhwd      ymm0, ymm2, ymm0 # ymm0 = ymm2[4],ymm0[4],ymm2[5],ymm0[5],ymm2[6],ymm0[6],ymm2[7],ymm0[7],ymm2[12],ymm0[12],ymm2[13],ymm0[13],ymm2[14],ymm0[14],ymm2[15],ymm0[15]
        vpsrlvd ymm0, ymm0, ymm1
        vpand   ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]

This is LLVM 7:

        vpxor   xmm2, xmm2, xmm2
        vpunpckhwd      ymm1, ymm1, ymm2 # ymm1 = ymm1[4],ymm2[4],ymm1[5],ymm2[5],ymm1[6],ymm2[6],ymm1[7],ymm2[7],ymm1[12],ymm2[12],ymm1[13],ymm2[13],ymm1[14],ymm2[14],ymm1[15],ymm2[15]
        vpunpckhwd      ymm0, ymm2, ymm0 # ymm0 = ymm2[4],ymm0[4],ymm2[5],ymm0[5],ymm2[6],ymm0[6],ymm2[7],ymm0[7],ymm2[12],ymm0[12],ymm2[13],ymm0[13],ymm2[14],ymm0[14],ymm2[15],ymm0[15]
        vpsrlvd ymm0, ymm0, ymm1
        vpshufb ymm0, ymm0, ymmword ptr [rip + .LCPI0_0] # ymm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm0[26,27],zero,zero
Comment 2 Simon Pilgrim 2019-09-05 13:57:38 PDT
I'll take a look
Comment 3 Nikita Popov 2019-09-07 03:32:10 PDT
Looks like this is due to a typo... The PSHUFLW/PSHUFHW code is constructing LoMask and HiMask from Mask rather than RepeatedMask. In this case the low half of the original mask is all-undef, so it ends up constructing an identity shuffle.
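
A simplified Python model of that mix-up, for illustration only. This is not the LLVM source: the function names are hypothetical, and the 16-bit element mask (element 14 reads element 13, everything else undef) is my reading of the PSHUFB byte mask from the description.

```python
UNDEF = -1

def repeated_lane_mask(mask, lane_elts=8):
    """Per-lane mask if every 128-bit lane applies the same in-lane shuffle, else None."""
    repeated = [UNDEF] * lane_elts
    for i, m in enumerate(mask):
        if m == UNDEF:
            continue
        if m // lane_elts != i // lane_elts:
            return None  # element crosses a 128-bit lane
        slot, local = i % lane_elts, m % lane_elts
        if repeated[slot] == UNDEF:
            repeated[slot] = local
        elif repeated[slot] != local:
            return None  # lanes disagree
    return repeated

def sequential_or_undef(mask, start):
    return all(m == UNDEF or m == start + i for i, m in enumerate(mask))

def match_pshuf_lo_hi(source):
    # 'source' is whichever mask LoMask/HiMask are sliced from.
    lo, hi = source[0:4], source[4:8]
    lo_in_range = all(m == UNDEF or 0 <= m < 4 for m in lo)
    hi_in_range = all(m == UNDEF or 4 <= m < 8 for m in hi)
    if lo_in_range and sequential_or_undef(hi, 4):
        # Undef elements are filled with their identity index.
        return ("PSHUFLW", [i if m == UNDEF else m for i, m in enumerate(lo)])
    if hi_in_range and sequential_or_undef(lo, 0):
        return ("PSHUFHW", [i if m == UNDEF else m - 4 for i, m in enumerate(hi)])
    return None

# v16i16 view of the shuffle: element 14 reads element 13, the rest is undef.
mask = [UNDEF] * 16
mask[14] = 13
repeated = repeated_lane_mask(mask)

buggy = match_pshuf_lo_hi(mask)      # slices the full mask (the typo)
fixed = match_pshuf_lo_hi(repeated)  # slices the per-lane repeated mask
print(buggy)
print(fixed)
```

Slicing the full mask only ever sees the first eight elements, which are all undef here, so the match degenerates to an identity PSHUFLW. Slicing the repeated mask sees the 5 in slot 6 and selects a non-identity PSHUFHW instead.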
Comment 4 Nikita Popov 2019-09-07 03:53:54 PDT
https://reviews.llvm.org/D67314
Comment 5 Nikita Popov 2019-09-07 05:22:53 PDT
Test in https://reviews.llvm.org/rL371305 and fix in https://reviews.llvm.org/rL371307.

If possible, it would be good to have this in LLVM 9. This bug has the dubious honor of miscompiling an AES implementation :(
Comment 6 Hans Wennborg 2019-09-09 02:37:02 PDT
Merged to release_90 in r371378.