LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 18478 - Overzealous strength reduction on _mm_mullo_epi16 producing inferior code
Status: RESOLVED FIXED
Alias: None
Product: clang
Classification: Unclassified
Component: -New Bugs
Version: trunk
Hardware: PC Windows NT
Importance: P normal
Assignee: Andrea Di Biagio
Reported: 2014-01-14 15:11 PST by Fabian Giesen
Modified: 2018-11-07 00:17 PST
CC List: 3 users


Attachments
Small repro with generated code (2.31 KB, text/plain)
2014-01-14 15:11 PST, Fabian Giesen

Description Fabian Giesen 2014-01-14 15:11:42 PST
Created attachment 11871
Small repro with generated code

Both of the functions in this file should be equivalent; the former ("good_...") does generate the expected code, but the latter ("bad_...") ends up "de-SIMDifying" the function almost completely.

The only difference between the two is that the former declares the magic value at global scope whereas the latter uses the _mm_setr_epi16 intrinsic.

This was tested using the official Clang 3.4 release binaries, not trunk, but the "Version" field doesn't have 3.4 yet.
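
The attachment itself is not reproduced on this page, so the following is only a rough sketch of what the two functions might look like, inferred from the description above and from the assembly quoted later in the thread. The multiplier values, the shift by 14, and the `unsigned short` parameter come from that listing; the full name `good_unpack_2bits_to_16`, the `kMultipliers` table, the return type, and the `check_equivalent` helper are assumptions made for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Assumed name: the "magic value" declared at global scope.
alignas(16) static const uint16_t kMultipliers[8] = {
    16384, 4096, 1024, 256, 64, 16, 4, 1};

// Variant with the constant at global scope (the "good_..." form).
__m128i good_unpack_2bits_to_16(uint16_t x) {
    __m128i v = _mm_set1_epi16(static_cast<short>(x));  // broadcast x to all 8 lanes
    __m128i m = _mm_load_si128(reinterpret_cast<const __m128i*>(kMultipliers));
    return _mm_srli_epi16(_mm_mullo_epi16(v, m), 14);
}

// Variant that builds the constant with _mm_setr_epi16 (the "bad_..." form).
__m128i bad_unpack_2bits_to_16(uint16_t x) {
    __m128i v = _mm_set1_epi16(static_cast<short>(x));
    __m128i m = _mm_setr_epi16(16384, 4096, 1024, 256, 64, 16, 4, 1);
    return _mm_srli_epi16(_mm_mullo_epi16(v, m), 14);
}

// Hypothetical helper: exhaustively confirm that both forms agree.
void check_equivalent() {
    for (unsigned x = 0; x <= 0xFFFF; ++x) {
        __m128i g = good_unpack_2bits_to_16(static_cast<uint16_t>(x));
        __m128i b = bad_unpack_2bits_to_16(static_cast<uint16_t>(x));
        // All eight 16-bit lanes equal means all 16 mask bits are set.
        assert(_mm_movemask_epi8(_mm_cmpeq_epi16(g, b)) == 0xFFFF);
    }
}
```

Since the two variants return identical results for every input, any difference in the generated code is purely a code-quality matter, which is what this report is about.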
Comment 1 Andrea Di Biagio 2014-02-12 20:20:25 PST
Hi Fabian,

Trunk revision 201271 fixed this issue.
Now the compiler produces the following SSE code for function 'bad_unpack_2bits_to_16' from your reproducer:

###
.LCPI1_0:
	.short	16384                   # 0x4000
	.short	4096                    # 0x1000
	.short	1024                    # 0x400
	.short	256                     # 0x100
	.short	64                      # 0x40
	.short	16                      # 0x10
	.short	4                       # 0x4
	.short	1                       # 0x1
	.text
	.globl	_Z22bad_unpack_2bits_to_16t
	.align	16, 0x90
	.type	_Z22bad_unpack_2bits_to_16t,@function
_Z22bad_unpack_2bits_to_16t:            # @_Z22bad_unpack_2bits_to_16t
	.cfi_startproc
# BB#0:                                 # %entry
	movd	%edi, %xmm0
	punpcklwd	%xmm0, %xmm0    # xmm0 = xmm0[0,0,1,1,2,2,3,3]
	pshufd	$0, %xmm0, %xmm0        # xmm0 = xmm0[0,0,0,0]
	pmullw	.LCPI1_0(%rip), %xmm0
	psrlw	$14, %xmm0
	retq
###
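
For readers less familiar with the instruction sequence, here is a scalar sketch of what the `pmullw` / `psrlw $14` pair in the listing above computes per lane once the input has been broadcast. It assumes lane `i` uses the `i`-th `.short` constant from `.LCPI1_0`; the names `unpack_2bits_scalar` and `check_model` are hypothetical and not part of the original report.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of one pmullw + psrlw $14 step applied to a broadcast input.
std::array<uint16_t, 8> unpack_2bits_scalar(uint16_t x) {
    static const uint16_t kMul[8] = {16384, 4096, 1024, 256, 64, 16, 4, 1};
    std::array<uint16_t, 8> out{};
    for (int i = 0; i < 8; ++i) {
        // pmullw keeps only the low 16 bits of each 16x16 product.
        uint16_t lo = static_cast<uint16_t>(static_cast<uint32_t>(x) * kMul[i]);
        // psrlw $14 is a logical right shift, leaving the top two bits.
        out[i] = static_cast<uint16_t>(lo >> 14);
    }
    return out;
}

// Hypothetical check: lane i should hold the 2-bit field at bits [2i+1 : 2i].
void check_model() {
    for (unsigned x = 0; x <= 0xFFFF; ++x) {
        std::array<uint16_t, 8> r = unpack_2bits_scalar(static_cast<uint16_t>(x));
        for (int i = 0; i < 8; ++i) {
            assert(r[i] == ((x >> (2 * i)) & 3u));
        }
    }
}
```

Each multiplier is a power of two, so the multiply acts as a per-lane left shift by a different amount. That is presumably why a single `pmullw` against a constant vector is preferable here to a per-element sequence of shifts.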
Comment 2 Andrea Di Biagio 2014-02-14 05:29:34 PST
Resolving this bug as FIXED.
Trunk revision 201271 fixed this issue and I don't think there is more work to be done on this.
Fabian, could you please verify that this now works for you as well?
Thanks.
Comment 3 Fabian Giesen 2014-02-18 13:02:46 PST
Yep, bug is fixed. Thanks!