llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	e95550f508	[X86][AVX] Add VMOVDDUP-VPBROADCASTQ execution domain mapping Noticed in D57514. Differential Revision: https://reviews.llvm.org/D57519 llvm-svn: 352922	2019-02-01 21:41:30 +00:00
Simon Pilgrim	9f4dea8c06	[X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine Repeat of the generic SimplifyDemandedBits shift combine llvm-svn: 350399	2019-01-04 15:43:43 +00:00
Craig Topper	ec096a1dae	[X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64. The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away. By custom legalizing it we can avoid this churn and maybe produce better code. llvm-svn: 348085	2018-12-02 05:46:48 +00:00
Craig Topper	bc8148f7b0	[X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54671 llvm-svn: 347171	2018-11-18 17:59:28 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Craig Topper	17d64c71c5	[X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx. With avx512f but not avx512bw we need to extend to v16i32 then truncate that to to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs. llvm-svn: 346552	2018-11-09 20:09:53 +00:00
Craig Topper	731ea7dbc1	[X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded. This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND. I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch. llvm-svn: 346539	2018-11-09 19:05:51 +00:00
Craig Topper	06aea1720a	[X86] Move promotion of vector and/or/xor from legalization to DAG combine Summary: I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues one around matching AND,ANDN,OR into VSELECT because I had to create ANDN as vXi64, but the other nodes hadn't legalized yet, I didn't look too hard at fixing that. This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better. In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53107 llvm-svn: 344487	2018-10-15 01:51:58 +00:00
Simon Pilgrim	29279f29c8	[X86][SSE] Add extract_subvector(PSHUFB) -> PSHUFB(extract_subvector()) combine Fixes PR32160 by reducing the size of PSHUFB if we only use one of the lanes. This approach can probably be generalized to handle any target shuffle (and any subvector index) but we have no test coverage at the moment. llvm-svn: 344336	2018-10-12 12:10:34 +00:00
Simon Pilgrim	8191d63c3b	[X86] Add initial SimplifyDemandedVectorEltsForTargetNode support This patch adds an initial x86 SimplifyDemandedVectorEltsForTargetNode implementation to handle target shuffles. Currently the patch only decodes a target shuffle, calls SimplifyDemandedVectorElts on its input operands and removes any shuffle that reduces to undef/zero/identity. Future work will need to integrate this with combineX86ShufflesRecursively, add support for other x86 ops, etc. NOTE: There is a minor regression that appears to be affecting further (extractelement?) combines which I haven't been able to solve yet - possibly something to do with how nodes are added to the worklist after simplification. Differential Revision: https://reviews.llvm.org/D52140 llvm-svn: 342564	2018-09-19 18:11:34 +00:00
Craig Topper	cdb0ed2910	[X86] Add custom execution domain fixing for 128/256-bit integer logic operations with AVX512F, but not AVX512DQ. AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions. Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences. This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used. llvm-svn: 337137	2018-07-15 23:32:36 +00:00
Simon Pilgrim	a6afa310c9	[X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplication Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code. This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting). I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking. llvm-svn: 334289	2018-06-08 13:59:11 +00:00
Simon Pilgrim	ad45efc445	[X86][SSE] Consistently prefer lowering to PACKUS over PACKSS We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS. PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines. The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets. llvm-svn: 334276	2018-06-08 10:29:00 +00:00
Simon Pilgrim	c0dbdb86c3	[TargetLowering] SimplifyDemandedVectorElts - pass demanded elts through TRUNCATE ops llvm-svn: 326043	2018-02-24 19:28:34 +00:00
Simon Pilgrim	4e2f757dc1	[X86][SSE] Allow float domain crossing if we are merging 2 or more shuffles and the root started as a float domain shuffle llvm-svn: 325349	2018-02-16 14:57:25 +00:00
Simon Pilgrim	80663ee986	[SelectionDAG] Add initial implementation of TargetLowering::SimplifyDemandedVectorElts This is mainly a move of simplifyShuffleOperands from DAGCombiner::visitVECTOR_SHUFFLE to create a more general purpose TargetLowering::SimplifyDemandedVectorElts implementation. Further features can be moved/added in future patches. Differential Revision: https://reviews.llvm.org/D42896 llvm-svn: 325232	2018-02-15 12:14:15 +00:00
Simon Pilgrim	2ec8373633	[X86][SSE] truncateVectorWithPACK - Use src type instead of dst to select between PACKSDW/PACKSWB Try to keep PACKSDW/PACKSWB as wide as possible, this helps ComputeNumSignBits as it can only peek through bitcasts to wider types, pre-AVX2 codegen was already doing this as it could peek through bitcasts/subvectors more easily than AVX2 could through shuffles. This shouldn't affect existing results as calls to truncateVectorWithPACK ensure we have enough sign bits to pack to the same value, but it should make it possible to use truncateVectorWithPACK chains to perform saturation in combineTruncateWithSat with a future patch. llvm-svn: 325149	2018-02-14 18:23:58 +00:00
Puyan Lotfi	43e94b15ea	Followup on Proposal to move MIR physical register namespace to '$' sigil. Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922	2018-01-31 22:04:26 +00:00
Craig Topper	b2868233b7	[X86] Use ISD::TRUNCATE instead of X86ISD::VTRUNC when input and output types have the same number of elements. llvm-svn: 322455	2018-01-14 08:11:36 +00:00
Zvi Rackover	72b0bb1405	X86 Tests: Update more isel tests with FastVariableShuffle feature Summary: Added the FastVariableShuffle feature to cases that resembled processors for which this fearure is on. For AVX2 there are processors with and w/o this fearue enable. For AVX512 only KNL does enable this feature so cases which only have +avx512f were left without the FastVariableShuffle enabled. Reviewers: RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41851 llvm-svn: 322090	2018-01-09 16:26:06 +00:00
Simon Pilgrim	e337268df7	[X86][SSE] Add test case from PR32160 llvm-svn: 321620	2018-01-01 13:04:04 +00:00
Simon Pilgrim	d873b6f6ba	[X86][AVX512] Attempt target shuffle combining to different types instead of early-out We try to prevent shuffle combining to value types that would stop the folding of masked operations, but by just returning early, we were failing to try different shuffle types. The TODOs are all still relevant here to improve codegen but we're lacking test examples. llvm-svn: 321085	2017-12-19 16:54:07 +00:00
Francis Visoiu Mistrih	a8a83d150f	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/<def>//g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<def[ ],[ ]dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ],[ ]dead>/implicit-def dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name "*.s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022	2017-12-07 10:40:31 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . $ -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Simon Pilgrim	ae1f013495	[X86][SSE] Add PACKUS support to combineVectorTruncation Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW llvm-svn: 317315	2017-11-03 11:33:48 +00:00
Simon Pilgrim	e152c2c447	[X86][SSE] Add PACKUS support to LowerTruncate Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW llvm-svn: 317128	2017-11-01 21:52:29 +00:00
Simon Pilgrim	f657ba0cb6	[X86][SSE] Truncate with PACKSS any input with sufficient sign-bits So far we've only been using PACKSS truncations with 'all-bits or zero-bits' patterns (vector comparison results etc.). When really we can safely use it for any case as long as the number of sign bits reach down to the last 16-bits (or 8-bits if we're truncating to bytes). The next steps after this is add the equivalent support for PACKUS and to support packing to sub-128 bit vectors for truncating stores etc. Differential Revision: https://reviews.llvm.org/D39476 llvm-svn: 317086	2017-11-01 11:47:44 +00:00
Simon Pilgrim	b37a24e82f	[SelectionDAG] Add support for INSERT_SUBVECTOR to computeKnownBits llvm-svn: 316847	2017-10-28 22:10:40 +00:00
Simon Pilgrim	294f88dfa0	[X86][SSE] Combine 128-bit target shuffles to PACKSS/PACKUS. llvm-svn: 316845	2017-10-28 20:51:27 +00:00
Simon Pilgrim	7cd4e2c96f	[X86][SSE] Tests packuswb/truncation codegen from PR34773 llvm-svn: 316033	2017-10-17 21:14:53 +00:00
Simon Pilgrim	73f143e774	[X86][SSE] Improve shuffling combining with horizontal operations Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs. Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead. Differential Revision: https://reviews.llvm.org/D38506 llvm-svn: 315150	2017-10-07 12:42:23 +00:00
Simon Pilgrim	b47b3f2564	[X86][SSE] Add support for lowering v8i16 binary shuffles to PACKSS/PACKUS Missed in D38472 llvm-svn: 314916	2017-10-04 17:31:28 +00:00
Craig Topper	6fb55716e9	[X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64 This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted. This should fix PR33079 Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results. Differential Revision: https://reviews.llvm.org/D38449 llvm-svn: 314914	2017-10-04 17:20:12 +00:00
Simon Pilgrim	f5f291d129	[X86][SSE] Add support for lowering shuffles to PACKSS/PACKUS If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles. Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773 Differential Revision: https://reviews.llvm.org/D38472 llvm-svn: 314788	2017-10-03 12:01:31 +00:00
Simon Pilgrim	a8dd6f4f30	[X86][SSE] Fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 Remove sign extend in register style pattern if the sign is already extended enough llvm-svn: 314599	2017-09-30 17:57:34 +00:00
Simon Pilgrim	5bd43bce07	[X86][SSE] Add vector truncation cases inspired by PR34773 We should be using PACKSS/PACKUS more aggressively when we know the state of the upper bits llvm-svn: 314597	2017-09-30 16:14:59 +00:00
Craig Topper	a80949feb5	[X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing table. llvm-svn: 313610	2017-09-19 04:39:55 +00:00
Craig Topper	cb0e74975a	[AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer vmovdqa64/vmovdqu64 instead. These were taking priority over the aligned load instructions since there is no vmovda8/16. I don't think there is really a difference between aligned and unaligned on newer cpus so I don't think it matters which instructions we use. But with this change we reduce the size of the isel table a little and we allow the aligned information to pass through to the evex->vec pass and produce the same output has avx/avx2 in some cases. I also generally dislike patterns rooted in a bitcast which these were. Differential Revision: https://reviews.llvm.org/D35977 llvm-svn: 309589	2017-07-31 17:35:44 +00:00
Ayman Musa	d9fb157845	[X86][SSE2] Fix asm string for movq (Move Quadword) instruction. Replace "mov{d\|q}" with "movq". Differential Revision: https://reviews.llvm.org/D32220 llvm-svn: 301386	2017-04-26 07:08:44 +00:00
Amjad Aboud	4f97751798	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859	2017-03-03 09:03:24 +00:00
Craig Topper	464b8cb244	[X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available. Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available. This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp. Overall I think this produces better results in the modified test cases. llvm-svn: 294824	2017-02-11 05:32:57 +00:00
Craig Topper	6a35a81fc5	[X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. This will be lowered by regular shuffle lowering to a PSHUFB later. Similar was already done for several other shuffles in this function. The test changes are because the old code used explicity zeroing for elements that could have been undef. While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us. llvm-svn: 294130	2017-02-05 18:33:14 +00:00
Craig Topper	fa875a1d3d	[AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions. llvm-svn: 290869	2017-01-03 05:46:18 +00:00
Gadi Haber	19c4fc5e62	This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663	2016-12-28 10:12:48 +00:00
Sanjay Patel	a0d8a278a7	[x86] use a single shufps when it can save instructions This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 My motivating case looks like this: - vpshufd {{.#+}} xmm1 = xmm1[0,1,0,2] - vpshufd {{.#+}} xmm0 = xmm0[0,2,2,3] - vpblendw {{.#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7] + vshufps {{.#+}} xmm0 = xmm0[0,2],xmm1[0,2] And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win. So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining. Differential Revision: https://reviews.llvm.org/D27692 llvm-svn: 289837	2016-12-15 18:03:38 +00:00
Simon Pilgrim	b56c731f18	[X86][AVX512] Add AVX512VL/AVX512BWVL vector truncation tests llvm-svn: 286105	2016-11-07 13:34:29 +00:00
Simon Pilgrim	02666ac9c3	[X86][SSE] Drop unnecessary -mcpu argument from trunc tests cpu/triple duplication llvm-svn: 286104	2016-11-07 13:28:20 +00:00
Michael Kuperstein	7cc2123847	[DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs This generalizes the build_vector -> vector_shuffle combine to support any number of inputs. The idea is to create a binary tree of shuffles, where the first layer performs pairwise shuffles of the input vectors placing each input element into the correct lane, and the rest of the tree blends these shuffles together. This doesn't try to be smart and create any sort of "optimal" shuffles. The assumption is that even a "poor" shuffle sequence is better than extracting and inserting the elements one by one. Differential Revision: https://reviews.llvm.org/D24683 llvm-svn: 283480	2016-10-06 18:58:24 +00:00
Igor Breger	74813fc19c	[AVX512BW] Change truncStore action (v16i16->v16i18). It can be legal only with AVX512VL. Differential Revision: http://reviews.llvm.org/D24547 llvm-svn: 281445	2016-09-14 08:04:28 +00:00

1 2

68 Commits