llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	814339454d	[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. Fold SHUFFLE(BINOP(SHUFFLE(X),SHUFFLE(Y))) -> BINOP(SHUFFLE'(X),SHUFFLE'(Y)) style patterns as well as the existing shuffles of constants.	2021-03-15 15:01:29 +00:00
Simon Pilgrim	75a184dacf	Revert rG9ba577eca2e339726bfaad4e615c6324a705b292 "[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. NFCI." Sorry this wasn't supposed to be committed yet (and certainly not tagged as NFCI....)	2021-03-15 12:23:44 +00:00
Simon Pilgrim	9ba577eca2	[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. NFCI. Fold SHUFFLE(BINOP(SHUFFLE(X),SHUFFLE(Y))) -> BINOP(SHUFFLE'(X),SHUFFLE'(Y)) style patterns as well as the existing shuffles of constants.	2021-03-15 11:59:25 +00:00
Simon Pilgrim	2885d1251f	[X86] Fold bitcast(logic(bitcast(X), Y)) --> logic'(X, bitcast(Y)) for int-int bitcasts Extend the existing combine that handles bitcasting for fp-logic ops to also help remove logic ops across bitcasts to/from the same integer types. This helps improve AVX512 predicate handling for D/Q logic ops and also allows DAGCombine's scalarizeExtractedBinop to remove some annoying gpr->simd->gpr transfers. The concat_vectors regression in pr40891.ll will be addressed in a followup commit on this patch. Differential Revision: https://reviews.llvm.org/D96206	2021-02-21 14:40:54 +00:00
Simon Pilgrim	ff51bcee4a	[X86] KnownBits - use llvm min/max intrinsics instead of (deprecated) sse intrinsics. NFCI. These are auto-upgraded to the equivalent llvm variants now.	2021-02-20 12:07:02 +00:00
Simon Pilgrim	c33d36e066	[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle unary vperm2x128(permute/shift(x,c),undef) cases Fold vperm2x128(permute/shift(x,c),undef) -> permute/shift(vperm2x128(x,undef),c)	2021-01-22 15:47:23 +00:00
Simon Pilgrim	69bc0990a9	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED). Add DemandedElts support inside the TRUNCATE analysis. REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d Differential Revision: https://reviews.llvm.org/D56387	2021-01-21 13:01:34 +00:00
Hans Wennborg	a51226057f	Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE" It caused "Vector shift amounts must be in the same as their first arg" asserts in Chromium builds. See the code review for repro instructions. > Add DemandedElts support inside the TRUNCATE analysis. > > Differential Revision: https://reviews.llvm.org/D56387 This reverts commit `cad4275d69`.	2021-01-20 20:06:55 +01:00
Simon Pilgrim	cad4275d69	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE Add DemandedElts support inside the TRUNCATE analysis. Differential Revision: https://reviews.llvm.org/D56387	2021-01-20 15:39:58 +00:00
Simon Pilgrim	21d02dc595	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts. We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op, but the patch generalizes this by manipulating the depth handling of combineX86ShufflesRecursively - calling with a new Depth = 0 and reducing the maximum shuffle combine depth accordingly. Differential Revision: https://reviews.llvm.org/D66004	2020-09-02 09:24:46 +01:00
Simon Pilgrim	0c005be6eb	[X86][SSE] getV4X86ShuffleImm8 - canonicalize broadcast masks If the mask input to getV4X86ShuffleImm8 only refers to a single source element (+ undefs) then canonicalize to a full broadcast. getV4X86ShuffleImm8 defaults to inline values for undefs, which can be useful for shuffle widening/narrowing but does leave SimplifyDemanded* calls thinking the shuffle depends on unnecessary elements. I'm still investigating what we should do more generally to avoid these undemanded elements, but broadcast cases was a simpler win.	2020-07-29 11:32:44 +01:00
Simon Pilgrim	17eafe0841	[X86][SSE] lowerV2I64Shuffle - use undef elements in PSHUFD mask widening If we lower a v2i64 shuffle to PSHUFD, we currently clamp undef elements to 0, (elements 0,1 of the v4i32) which can result in the shuffle referencing more elements of the source vector than expected, affecting later shuffle combines and KnownBits/SimplifyDemanded calls. By ensuring we widen the undef mask element we allow getV4X86ShuffleImm8 to use inline elements as the default, which are more likely to fold.	2020-07-26 16:04:22 +01:00
Simon Pilgrim	f54402b63a	[X86][AVX] Attempt to fold extract_subvector(shuffle(X)) -> extract_subvector(X) If we're extracting a subvector from a shuffle that is shuffling entire subvectors we can peek through and extract the subvector from the shuffle source instead. This helps remove some cases where concat_vectors(extract_subvector(),extract_subvector()) legalizations has resulted in BLEND/VPERM2F128 shuffles of the subvectors.	2020-07-09 14:09:24 +01:00
Simon Pilgrim	ecc5d7ee0d	[DAG] SimplifyMultipleUseDemandedBits - drop unnecessary *_EXTEND_VECTOR_INREG cases For little endian targets, if we only need the lowest element and none of the extended bits then we can just use the (bitcasted) source vector directly. We already do this in SimplifyDemandedBits, this adds the SimplifyMultipleUseDemandedBits equivalent.	2020-06-22 12:35:32 +01:00
Simon Pilgrim	e53d4869a0	[X86][AVX] combineVectorSignBitsTruncation - avoid complex vXi64->vXi32 PACKSS truncations (PR45794) Unless we're truncating an 'all-bits' result, using PACKSS for vXi64->vXi32 truncation causes problems with later combines as ComputeNumSignBits struggles to see through BITCASTs to smaller types. If we don't use PACKSS in these cases then we fallback to shuffles which are usually just as good.	2020-05-05 11:57:25 +01:00
Simon Pilgrim	371a69ac9a	[X86][AVX] Add PR45794 sitofp v4i64-v4f32 test case	2020-05-05 11:32:48 +01:00
Simon Pilgrim	3daa71ee00	[SelectionDAG] ComputeNumSignBits - add DemandedElts support for MIN/MAX ops	2020-01-25 20:21:14 +00:00
Simon Pilgrim	481b79668c	[X86] Add tests showing ComputeNumSignBits's failure to use DemandedElts for MIN/MAX opcodes	2020-01-25 19:28:57 +00:00
Simon Pilgrim	c6fcd5d115	[SelectionDAG] ComputeNumSignBits add getValidMaximumShiftAmountConstant() for ISD::SHL support Allows us to handle non-uniform SHL shifts to determine the minimum number of sign bits remaining (based off the maximum shift amount value)	2020-01-13 18:02:37 +00:00
Simon Pilgrim	ffc05d0dbc	[X86][SSE] Add sitofp(shl(sext(x),y)) test case with non-uniform shift value Shows that for non-uniform SHL shifts we fail to determine the minimum number of sign bits remaining (based off the maximum shift amount value)	2020-01-13 17:34:40 +00:00
Simon Pilgrim	38e2c01221	[SelectionDAG] ComputeNumSignBits add getValidMinimumShiftAmountConstant() ISD::SRA support Allows us to handle more non-uniform SRA sign bits cases	2020-01-13 16:55:02 +00:00
Simon Pilgrim	7afaa0099b	[X86][SSE] Add sitofp(ashr(x,y)) test case with non-uniform shift value	2020-01-13 16:55:02 +00:00
Simon Pilgrim	ee4aa1a228	[X86] Add AVX2 known signbits codegen tests	2020-01-13 16:55:01 +00:00
Simon Pilgrim	03cde3a7cc	[X86] Regenerate known-signbits-vector.ll tests. Use X86 instead of X32 and add a common CHECK prefix	2019-11-04 15:12:01 +00:00
Sanjay Patel	eb99165b97	[x86] try to keep FP casted+truncated+extracted vector element out of GPRs inttofp (trunc (extelt X, 0)) --> inttofp (extelt (bitcast X), 0) We have pseudo-vectorization of scalar int to FP casts, so this tries to make that more likely by replacing a truncate with a bitcast. I didn't see any test diffs starting from 'uitofp', so I left that as a TODO. We can't only match the shorter trunc+extract pattern because there's an opposing transform somewhere, so we infinite loop. Waiting to try this during lowering is another possibility. A motivating case is shown in PR39975 and included in the test diffs here: https://bugs.llvm.org/show_bug.cgi?id=39975 Differential Revision: https://reviews.llvm.org/D64710 llvm-svn: 366098	2019-07-15 18:17:23 +00:00
Simon Pilgrim	83e1a1e79b	[TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support. llvm-svn: 364548	2019-06-27 14:25:54 +00:00
Simon Pilgrim	c692a8dc51	[TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts llvm-svn: 364541	2019-06-27 13:48:43 +00:00
Sanjay Patel	d14389c0a5	[x86] split 256-bit vector selects if operands are vector concats This is similar logic/motivation to the select splitting in D62969. In D63233, the pattern changes so that we no longer have an extract_subvector of vselect, but the operands of the select are still being concatenated. The closest case is represented in either the first or last test diffs here - we have an extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions. I think that's the right trade-off for most AVX1 targets. In the example based on PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 ...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx). Differential Revision: https://reviews.llvm.org/D63364 llvm-svn: 363508	2019-06-16 14:04:49 +00:00
Craig Topper	d10a200ceb	[X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing. We require d/q suffixes on the memory form of these instructions to disambiguate the memory size. We don't require it on the register forms, but need to support parsing both with and without it. Previously we always printed the d/q suffix on the register forms, but it's redundant and inconsistent with gcc and objdump. After this patch we should support the d/q for parsing, but not print it when its unneeded. llvm-svn: 360085	2019-05-06 21:39:51 +00:00
Simon Pilgrim	38a0616c1d	[DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y)) If scalar truncates are free, attempt to pre-truncate build_vectors source operands. Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering. Differential Revision: https://reviews.llvm.org/D59654 llvm-svn: 357161	2019-03-28 11:34:21 +00:00
Simon Pilgrim	e24441aab0	[TargetLowering] Add SimplifyDemandedBits support for ISD::INSERT_VECTOR_ELT This helps us relax the extension of a lot of scalar elements before they are inserted into a vector. Its exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions. Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes. Differential Revision: https://reviews.llvm.org/D59484 llvm-svn: 356989	2019-03-26 12:32:01 +00:00
Simon Pilgrim	d33e62c826	[X86][SSE] Fold scalar_to_vector(i64 anyext(x)) -> bitcast(scalar_to_vector(i32 anyext(x))) Reduce the size of an any-extended i64 scalar_to_vector source to i32 - the any_extend nodes are often introduced by SimplifyDemandedBits. llvm-svn: 356292	2019-03-15 19:14:28 +00:00
Simon Pilgrim	8fbe439345	[SelectionDAG] Add SimplifyDemandedBits handling for ISD::SCALAR_TO_VECTOR Fixes a lot of constant folding mismatches between i686 and x86_64 llvm-svn: 356273	2019-03-15 17:00:55 +00:00
Craig Topper	be9eeb5526	Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" And its follow ups r354511, r354640. A follow patch will fix the issue that caused it to be reverted. llvm-svn: 354737	2019-02-23 21:41:42 +00:00
Reid Kleckner	e3876637cf	Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" r354363 caused https://crbug.com/934963#c1, which has a plain C reduced test case. I also had to revert some dependent changes: - r354648 - r354647 - r354640 - r354511 llvm-svn: 354713	2019-02-23 01:19:42 +00:00
Sanjay Patel	234a5e8ea4	[x86] vectorize more cast ops in lowering to avoid register file transfers This is a follow-up to D56864. If we're extracting from a non-zero index before casting to FP, then shuffle the vector and optionally narrow the vector before doing the cast: cast (extelt V, C) --> extelt (cast (extract_subv (shuffle V, [C...]))), 0 This might be enough to close PR39974: https://bugs.llvm.org/show_bug.cgi?id=39974 Differential Revision: https://reviews.llvm.org/D58197 llvm-svn: 354619	2019-02-21 20:40:39 +00:00
Simon Pilgrim	0b3b9424ca	[X86][SSE] Generalize X86ISD::BLENDI support to more value types D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required. With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors. This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts. I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case). The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV. Reapplied after reversion at rL353699 - AVX2 isel fix was applied at rL354358, additional test at rL354360/rL354361 Differential Revision: https://reviews.llvm.org/D57888 llvm-svn: 354363	2019-02-19 18:05:42 +00:00
Sam McCall	e825ba9165	Revert "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" This reverts commit r353610. It causes a miscompile visible in macro expansion in a bootstrapped clang. http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190211/626590.html llvm-svn: 353699	2019-02-11 14:05:36 +00:00
Simon Pilgrim	690a2889d8	[X86][SSE] Generalize X86ISD::BLENDI support to more value types D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required. With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors. This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts. I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case). The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV. Differential Revision: https://reviews.llvm.org/D57888 llvm-svn: 353610	2019-02-09 13:13:59 +00:00
Sanjay Patel	e84fbb67a1	[x86] vectorize cast ops in lowering to avoid register file transfers The proposal in D56796 may cross the line because we're trying to avoid vectorization transforms in generic DAG combining. So this is an alternate, later, x86-specific translation of that patch. There are several potential follow-ups to enhance this: 1. Allow extraction from non-zero element index. 2. Peek through extends of smaller width integers. 3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P Differential Revision: https://reviews.llvm.org/D56864 llvm-svn: 353302	2019-02-06 14:59:39 +00:00
Craig Topper	cfeb1cf9af	[X86] Add INSERT_SUBVECTOR to ComputeNumSignBits This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits. My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common. Differential Revision: https://reviews.llvm.org/D56283 llvm-svn: 350432	2019-01-04 20:50:59 +00:00
Craig Topper	58c61dce1d	[X86] Add test case for D56283. This tests a case where we need to be able to compute sign bits for two insert_subvectors that is a liveout of a basic block. The result is then used as a boolean vector in another basic block. llvm-svn: 350359	2019-01-03 22:31:07 +00:00
Sanjay Patel	9633d76a40	[DAGCombiner][x86] scalarize binop followed by extractelement As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354	2019-01-03 21:31:16 +00:00
Simon Pilgrim	1e1fd9c761	[TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts Differential Revision: https://reviews.llvm.org/D55600 llvm-svn: 349264	2018-12-15 11:36:36 +00:00
Simon Pilgrim	b5aaa673c6	[X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode llvm-svn: 349057	2018-12-13 16:39:29 +00:00
Simon Pilgrim	f6c898e12f	[TargetLowering] Add ISD::AND handling to SimplifyDemandedVectorElts If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef). Differential Revision: https://reviews.llvm.org/D55558 llvm-svn: 348926	2018-12-12 13:43:07 +00:00
Simon Pilgrim	f6371f5f23	[TargetLowering] Add ISD::EXTRACT_VECTOR_ELT support to SimplifyDemandedBits Let SimplifyDemandedBits attempt to simplify all elements of a vector extraction. Part of PR39689. llvm-svn: 348839	2018-12-11 11:08:40 +00:00
Simon Pilgrim	b356d0463e	[TargetLowering] Improve SimplifyDemandedVectorElts/SimplifyDemandedBits support For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask. I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem. Differential Revision: https://reviews.llvm.org/D54679 llvm-svn: 347301	2018-11-20 12:02:16 +00:00
Craig Topper	9a7e19b8f2	[DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used. Differential Revision: https://reviews.llvm.org/D54283 llvm-svn: 346530	2018-11-09 18:04:34 +00:00
Craig Topper	ed6a0a817f	[X86] Add vector shift by immediate to SimplifyDemandedBitsForTargetNode. Summary: This also enables some constant folding from KnownBits propagation. This helps on some cases vXi64 case in 32-bit mode where constant vectors appear as vXi32 and a bitcast. This can prevent getNode from constant folding sra/shl/srl. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54069 llvm-svn: 346102	2018-11-04 17:31:27 +00:00

1 2

79 Commits