llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	2a419a0b99	[X86][SSE] combineX86ShuffleChain - check if we're blending with zero into already zero elements Add a SelectionDAG::MaskedElementsAreZero helper that wraps SelectionDAG::MaskedValueIsZero testing for entirely zero vector elements	2021-04-20 17:09:49 +01:00
Simon Pilgrim	9d57a77b81	[X86] combineCMP - fold cmpEQ/NE(TRUNC(X),0) -> cmpEQ/NE(X,0) If we are truncating from an i32 source before comparing the result against zero, then see if we can directly compare the source value against zero. If the upper (truncated) bits are known to be zero then we can compare against that, hopefully increasing the chances of us folding the compare into an EFLAG result of the source's operation. Fixes PR49028. Differential Revision: https://reviews.llvm.org/D100491	2021-04-15 13:55:51 +01:00
Simon Pilgrim	4fbe761572	[X86][SSE] canonicalizeShuffleWithBinOps - check for more combos of merge-able binary shuffles. In the fold SHUFFLE(BINOP(X,Y),BINOP(Z,W)) -> BINOP(SHUFFLE(X,Z),SHUFFLE(Y,W)), check if both X/Z AND Y/W have at least one merge-able shuffle in which case the total number of shuffles should still fall. Helps with instruction count regressions we saw while fixing PR48823	2021-04-14 15:24:41 +01:00
Simon Pilgrim	73737fe990	[X86] Fold cmpeq/ne(trunc(x),0) --> cmpeq/ne(x,0) Relax the fold from rGbaadbe04bf75 to compare any op, not just logic ops, now that the movmsk regressions have been handled.	2021-04-14 11:02:02 +01:00
Simon Pilgrim	016ceb8382	[X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in MOVMSK(SHUFFLE(X,u)) -> MOVMSK(X) fold Extension to rG74f98391a7a4, we can also include any of the upper (known zero) bits in the comparison in the shuffle removal fold, just as long as we demand all the elements of the movmsk source vector.	2021-04-14 11:02:01 +01:00
Simon Pilgrim	74f98391a7	[X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in CMP(MOVMSK(PACKSS())) -> CMP(MOVMSK()) fold We already allow the comparison of the upper bits of 'IsAllOf' (allbits) patterns, but we can safely compare the known zero bits for 'IsAnyOf' (zerobits) patterns as well. This fixes an issue where we are comparing a type wider than the number of vector elements, which avoids a regression mentioned in rGbaadbe04bf75.	2021-04-13 17:37:24 +01:00
Simon Pilgrim	baadbe04bf	[X86] Fold cmpeq/ne(trunc(logic(x)),0) --> cmpeq/ne(logic(x),0) Fixes the issues noted in PR48768, where the and/or/xor instruction had been promoted to avoid i8/i16 partial-dependencies, but the test against zero had not. We can almost certainly relax this fold to work for any truncation, although it breaks a number of existing folds (notably movmsk folds which tend to rely on the truncate to determine the demanded bits/elts in the source vector). There is a reverse combine in TargetLowering.SimplifySetCC so we must wait until after legalization before attempting this.	2021-04-12 16:05:34 +01:00
Simon Pilgrim	231b87618b	[X86][AVX512] Fold not(kmov(x)) -> kmov(not(x)) and not(widen_subvector(x)) -> widen_subvector(not(x)) Improve AVX512 mask inversion, rG38c799bce801 exposed some missing opportunities to move scalar not() back onto the boolvector types for folding with setcc etc.	2021-04-11 20:07:09 +01:00
Simon Pilgrim	13bdac5709	[X86] combineXor - Pull out repeated getOperand() calls. NFCI.	2021-04-11 19:01:59 +01:00
Simon Pilgrim	38c799bce8	[X86] Fold cmpeq/ne(and(X,Y),Y) --> cmpeq/ne(and(~X,Y),0) Followup to D100177, handle a similar (demorgan inverse style) case from PR47797 as well. The AVX512 test cases could be further improved if we folded not(iX bitcast(vXi1)) -> (iX bitcast(not(vXi1))) Alive2: https://alive2.llvm.org/ce/z/AnA_-W	2021-04-11 18:42:01 +01:00
Simon Pilgrim	d8bc4de3cf	[X86] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) on non-BMI targets (PR44136) Followup to D100177, enable the fold for non-BMI targets as well.	2021-04-09 16:11:11 +01:00
Simon Pilgrim	245036950a	[X86][BMI] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) (PR44136) I've initially just enabled this for BMI which has the ANDN instruction for i32/i64 - the i16/i8 cases give an idea of what we'd get when we enable it in all cases (I'll do this as a later commit). Additionally, the i16/i8 cases could be freely promoted to i32 (as the args are already zeroext) and we could then make use of ANDN + the free cmp0 there as well - this has come up in PR48768 and PR49028 so I'm going to look at this soon. https://alive2.llvm.org/ce/z/QVWHP_ https://alive2.llvm.org/ce/z/pLngT- Vector cases do not appear to benefit from this as we end up with having to generate the zero vector as well - this is one of the reasons I didn't try to tie this into hasAndNot/hasAndNotCompare. Differential Revision: https://reviews.llvm.org/D100177	2021-04-09 15:52:03 +01:00
Simon Pilgrim	3ae0a405fc	[X86] combineHorizOpWithShuffle - peek through one use bitcasts when decoding shuffles. Checking for one use, peek through bitcasts of the horizop args to allow us to merge shuffles of different widths through the horizop.	2021-04-09 10:51:04 +01:00
Simon Pilgrim	53283cc2f1	[X86][SSE] canonicalizeShuffleWithBinOps - add MOVSD/MOVSS handling.	2021-04-06 16:42:18 +01:00
Simon Pilgrim	ddbb58736a	[KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI. As promised in D98866	2021-04-06 10:11:41 +01:00
Simon Pilgrim	36d4f6d7f8	[X86] Fold xor(zext(xor(x,c1)),c2) -> xor(zext(x),xor(zext(c1),c2)) Fixes PR47603 (second case) by extending rG89afec348dbd3e5078f176e978971ee2d3b5dec8	2021-04-05 11:40:37 +01:00
Simon Pilgrim	89afec348d	[X86] Fold xor(truncate(xor(x,c1)),c2) -> xor(truncate(x),xor(truncate(c1),c2)) Fixes PR47603 This should probably be transferable to DAGCombine - the main limitation with the existing trunc(logicop) DAG fold is we don't know if legalization has tried to promote truncated logicops already. We might be able to peek through extensions as well.	2021-04-03 12:43:05 +01:00
Simon Pilgrim	7c17f1ea84	[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper (REAPPLIED) Use the getTargetShuffleInputs helper for all shuffle decoding Reapplied (after reversion in rGfa0aff6d6960) with fix+test for subvector splitting - we weren't accounting for peeking through bitcasts changing the vector element count of the shuffle sources.	2021-04-03 11:59:19 +01:00
Nico Weber	fa0aff6d69	Revert "[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper" This reverts commit `500969f1d0`. Makes clang assert compiling avx2 code, see https://bugs.chromium.org/p/chromium/issues/detail?id=1195353#c4 for a standalone repro.	2021-04-02 09:55:55 -04:00
Simon Pilgrim	500969f1d0	[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper Use the getTargetShuffleInputs helper for all shuffle decoding	2021-04-02 11:50:18 +01:00
Yang Fan	bc6001ce1e	[X86] Fix -Wunused-function warning (NFC) GCC warning: ``` /llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:9212:13: warning: ‘bool isHorizOp(unsigned int)’ defined but not used [-Wunused-function] 9212 \| static bool isHorizOp(unsigned Opcode) { \| ^~~~~~~~~ ```	2021-04-02 09:38:12 +08:00
Simon Pilgrim	abbe80fa52	[X86][SSE] Fold HOP(HOP(X,X),HOP(Y,Y)) -> HOP(PERMUTE(HOP(X,Y)),PERMUTE(HOP(X,Y))) For slow-hop targets, attempt to merge HADD/SUB pairs used in chains.	2021-04-01 11:54:10 +01:00
Simon Pilgrim	301319840e	[X86][SSE] Enable (F)HADD/SUB handling to SimplifyMultipleUseDemandedVectorElts Attempt to bypass unused horiz-op operands. This is very similar to the PACKSS/PACKUS handling - we should try to merge these.	2021-04-01 11:54:09 +01:00
Simon Pilgrim	f7aeaced65	[X86][SSE] Add isHorizOp helper function. NFCI.	2021-04-01 11:54:09 +01:00
Craig Topper	437958d9fd	[X86] Improve SMULO/UMULO codegen for vXi8 vectors. The default expansion creates a MUL and either a MULHS/MULHU. Each of those separately expand to sequences that use one or more PMULLW instructions as well as additional instructions to extend the types to vXi16. The MULHS/MULHU expansion computes the whole 16-bit product, but only keeps the high part. We can improve the lowering of SMULO/UMULO for some cases by using the MULHS/MULHU expansion, but keep both the high and low parts. And we can use those parts to calculate the overflow. For AVX512 we might have vXi1 overflow outputs. We can improve those by using vpcmpeqw to produce a k register if AVX512BW is enabled. This is a little better than truncating the high result to use vpcmpeqb. If we don't have avx512bw we can extend up to v16i32 to use vpcmpeqd to produce a k register. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97624	2021-03-31 10:13:50 -07:00
Tomas Matheson	a9968c0a33	[NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions Currently needsStackRealignment returns false if canRealignStack returns false. This means that the behavior of needsStackRealignment does not correspond to its name and description; a function might need stack realignment, but if it is not possible then this function returns false. Furthermore, needsStackRealignment is not virtual and therefore some backends have made use of canRealignStack to indicate whether a function needs stack realignment. This patch attempts to clarify the situation by separating them and introducing new names: - shouldRealignStack - true if there is any reason the stack should be realigned - canRealignStack - true if we are still able to realign the stack (e.g. we can still reserve/have reserved a frame pointer) - hasStackRealignment = shouldRealignStack && canRealignStack (not target customisable) Targets can now override shouldRealignStack to indicate that stack realignment is required. This change will make it easier in a future change to handle the case where we need to realign the stack but can't do so (for example when the register allocator creates an aligned spill after the frame pointer has been eliminated). Differential Revision: https://reviews.llvm.org/D98716 Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87	2021-03-30 17:31:39 +01:00
Sanjay Patel	e694e19a79	[x86] enhance matching of pmaddwd This was crashing with the example from: https://llvm.org/PR49716 ...and that was avoided with `a283d72583` , but as we can see from the SSE vs. AVX test code diff, we can try harder to match the pattern. This matcher code was adapted from another pmadd pattern match in D49636, but it needs different ops to deal with size mismatches. Differential Revision: https://reviews.llvm.org/D99531	2021-03-30 07:28:33 -04:00
Simon Pilgrim	805148eaf2	[X86][SSE] combineHorizOpWithShuffle - consistently use getTargetShuffleInputs to decode shuffles Minor cleanup before I start trying to merge the unary/binary shuffle combining paths.	2021-03-29 11:31:19 +01:00
Craig Topper	69bdf35dc7	[X86] Optimize vXi8 MULHS on targets where we can't sign_extend to the next register size. For these cases we need to extract the upper or lower elements, multiply them using 16-bit multiplies and repack them. Previously we used punpcklbw/punpckhbw+psraw or pmovsxbw+pshufd to extract and sign extend so we could use pmullw to compute the 16-bit product and then shift down the high bits. We can avoid the need to sign extend if we unpack the bytes into the high byte of each word and fill the lower byte with 0 using pxor. This puts the sign bit of each byte into the sign bit of each word. Since the LHS and RHS have 8 trailing zeros, the full 32-bit product of those 16-bit values will have 16 trailing zeros. This means the 16-bit product of the original bytes is in the upper 16 bits which we can calculate using pmulhw. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98587	2021-03-28 11:41:29 -07:00
Simon Pilgrim	2a0d5da917	[X86][SSE] foldShuffleOfHorizOp - remove broadcast handling. Remove VBROADCAST/MOVDDUP/splat-shuffle handling from foldShuffleOfHorizOp This can all be handled by canonicalizeShuffleMaskWithHorizOp as long as we check that the HADD/SUB are only used once (to prevent infinite loops on slow-horizop targets which will try to reuse the nodes again followed by a post-hop shuffle).	2021-03-27 15:09:23 +00:00
Simon Pilgrim	41146bfe82	[X86][SSE] combineX86ShuffleChain - attempt to recognise 'hidden' identity shuffles See if the combined shuffle mask is equivalent to an identity shuffle, typically this is due to repeated LHS/RHS ops in horiz-ops, but isTargetShuffleEquivalent might see other patterns as well. This is another small step towards getting rid of foldShuffleOfHorizOp and relying on canonicalizeShuffleMaskWithHorizOp and generic shuffle combining.	2021-03-27 11:09:30 +00:00
Sanjay Patel	a283d72583	[x86] prevent crashing while matching pmaddwd This could crash in 2 ways: either one or both of the input vectors could be a different size than the math ops. https://llvm.org/PR49716	2021-03-27 05:27:14 -04:00
Simon Pilgrim	c769ba9514	[X86][AVX] combineHorizOpWithShuffle - improve SHUFFLE(HOP(LOSUBVECTOR(X),HISUBVECTOR(X))) folding Peek through bitcasts to find subvector splits and use getTargetShuffleInputs to decode target shuffles as well as ShuffleVectorSDNode	2021-03-26 17:23:54 +00:00
Simon Pilgrim	36e3c6c841	[X86][AVX] Truncate vectors with PACKSS/PACKUS on AVX2 targets Until AVX512 we don't have any vector truncation instructions, and always lower using shuffles instead. combineVectorTruncation performs this earlier than lowering as it makes it easier to use any sign/zero-extended bits in the truncated bits with PACKSS/PACKUS to perform the shuffle. We currently don't attempt to use combineVectorTruncation on AVX2 targets as in the past 256-bit PACKSS/PACKUS tended to cause 128-bit lane shuffle regressions - but these should now be all resolved with combineHorizOpWithShuffle and in all cases we now reduce the amount of cross-lane shuffling and variable shuffle mask usage. Differential Revision: https://reviews.llvm.org/D96609	2021-03-25 10:34:34 +00:00
Simon Pilgrim	9fde88c3e2	[X86][AVX] splitIntVSETCC - handle separate (canonicalized) SETCC operands LowerVSETCC calls splitIntVSETCC after canonicalizing certain patterns, in particular (X & CPow2 != 0) -> (X & CPow2 == CPow2). Unfortunately if we're splitting for AVX1/non-AVX512BW cases, we lose these canonicalizations as we call the split with the original SetCC node, and when the split nodes are later lowered in LowerVSETCC the patterns are lost behind extract_subvector etc. But if we pass the canonicalized operands for splitting we retain the optimizations. Differential Revision: https://reviews.llvm.org/D99256	2021-03-25 10:18:44 +00:00
Simon Pilgrim	7920527796	[X86][AVX] combineBitcastvxi1 - improve handling of vectors truncated to vXi1 If we're truncating to vXi1 from a wider type, then prefer the original wider vector as it simplifies folding the separate truncations + extensions. On AVX1 this is only worth it for v8i1 cases, not v4i1 where we're always better off truncating down to v4i32 for movmsk. Helps with some regressions encountered in D96609	2021-03-24 14:05:59 +00:00
Simon Pilgrim	e9015bd595	[X86][AVX] lowerShuffleAsBroadcast - MOVDDUP(SCALAR_TO_VECTOR(X)) -> BROADCAST(X) Prefer broadcast from scalar on AVX targets as this makes it easier for later folds to strip away bitcasts etc. This helps a lot with the AVX1 poor codegen from PR49658. There's a trivial regression in bitcast-int-to-vector-bool-*ext.ll tests due to SimplifyDemandedBits not being able to see a multi-use case, but there's bigger existing codegen issues to be addressed first in those tests (unnecessary NOTs).	2021-03-24 11:31:56 +00:00
Simon Pilgrim	c1ef642ad8	[X86] Remove unused 'OneUse' option from IsNOT helper. NFCI.	2021-03-24 11:14:38 +00:00
Simon Pilgrim	080cb83e52	[X86][AVX] Narrow VPBROADCASTQ->VPBROADCASTD if we don't need the upper bits. Helps fix cases where we've splatted smaller types to a wider vector element type without needing the upper bits. Avoid this on AVX512 targets as that can affect broadcast folding.	2021-03-23 09:41:02 +00:00
Simon Pilgrim	3179588947	[X86][AVX] ComputeNumSignBitsForTargetNode - add X86ISD::VBROADCAST handling for scalar sources The target shuffle code handles vector sources, but X86ISD::VBROADCAST can also accept a scalar source for splatting. Added as an extension to PR49658	2021-03-21 12:22:51 +00:00
Simon Pilgrim	297b9bc3fa	[X86][AVX] computeKnownBitsForTargetNode - add X86ISD::VBROADCAST handling for scalar sources The target shuffle code handles vector sources, but X86ISD::VBROADCAST can also accept a scalar source for splatting. Suggested by @craig.topper on PR49658	2021-03-21 10:40:57 +00:00
Simon Pilgrim	54a05f2ec8	[X86] computeKnownBitsForTargetNode - add X86ISD::PMULUDQ handling Reuse the existing KnownBits multiplication code to handle what is effectively an ISD::UMUL_LOHI variant	2021-03-21 09:57:20 +00:00
Simon Pilgrim	64687f2cc3	[X86][SSE] canonicalizeShuffleWithBinOps - add PERMILPS/PERMILPD + PERMPD/PERMQ + INSERTPS handling. Bail if the INSERTPS would introduce zeros across the binop.	2021-03-16 13:52:08 +00:00
Simon Pilgrim	772155793b	[X86][SSE] isHorizontalBinOp - ensure we clear any unused source operands to improve HADD/SUB matching Our shuffle matching for HADD/SUB patterns wasn't clearing repeated ops in 'fake unary' style shuffle masks (unpack(x,x) etc.), preventing matching of add(fakeunary(),fakeunary()) style patterns.	2021-03-15 16:24:29 +00:00
Simon Pilgrim	814339454d	[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. Fold SHUFFLE(BINOP(SHUFFLE(X),SHUFFLE(Y))) -> BINOP(SHUFFLE'(X),SHUFFLE'(Y)) style patterns as well as the existing shuffles of constants.	2021-03-15 15:01:29 +00:00
Simon Pilgrim	07232f4507	[X86][SSE] canonicalizeShuffleWithBinOps - add X86ISD::PSHUFB handling. Recommit rGcd938ab162b0ac560dd0e9fee290980c7e0e47e5 with an early-out if the pshufb would introduce zeros across the binop.	2021-03-15 12:43:30 +00:00
Simon Pilgrim	75a184dacf	Revert rG9ba577eca2e339726bfaad4e615c6324a705b292 "[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. NFCI." Sorry this wasn't supposed to be committed yet (and certainly not tagged as NFCI....)	2021-03-15 12:23:44 +00:00
Simon Pilgrim	9ba577eca2	[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. NFCI. Fold SHUFFLE(BINOP(SHUFFLE(X),SHUFFLE(Y))) -> BINOP(SHUFFLE'(X),SHUFFLE'(Y)) style patterns as well as the existing shuffles of constants.	2021-03-15 11:59:25 +00:00
Simon Pilgrim	6878be5dc3	[X86][SSE] Attempt to merge single-op hops for slow targets. For slow-hop targets, see if any single-op hops are duplicating work already done on another (dual-op) hop, which can sometimes occur as isHorizontalBinOp tries to find potential duplicates (but can't merge them itself). If so, reuse the other hop and shuffle the result.	2021-03-15 09:30:20 +00:00
Simon Pilgrim	6cb7dddaf4	[X86][AVX] Insert zeros byte elements into 256/512-bit vectors using shuffle/and Avoid extracting/inserting subvectors which makes it more difficult for shuffle combining to merge them together.	2021-03-12 15:16:36 +00:00
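
Several of the messages above rest on small arithmetic identities: the cmpeq/ne(or(X,Y),X) and cmpeq/ne(and(X,Y),Y) folds to cmpeq/ne(and(~X,Y),0), and the vXi8 MULHS trick of placing each byte in the high byte of a 16-bit word. The sketch below is only a sanity check of that arithmetic on scalar i8 values, written in Python rather than taken from LLVM. Every function name in it is hypothetical, and the final "take the high byte" step of the MULHS model is an assumption about how the result is consumed, not something the commit message states.

```python
# Sanity checks for identities quoted in the commit messages above.
# Hypothetical helper names; this models the arithmetic only, not LLVM code.

MASK8 = 0xFF  # treat X and Y as unsigned i8 bit patterns


def andn_is_zero(x, y):
    """cmpeq(and(~X,Y),0): true when Y has no bit set outside X."""
    return (~x & y & MASK8) == 0


def or_fold_holds(x, y):
    """cmpeq(or(X,Y),X) must agree with cmpeq(and(~X,Y),0)."""
    return ((x | y) == x) == andn_is_zero(x, y)


def and_fold_holds(x, y):
    """cmpeq(and(X,Y),Y) must agree with cmpeq(and(~X,Y),0)."""
    return ((x & y) == y) == andn_is_zero(x, y)


def mulhs_i8_reference(a, b):
    """High byte of the signed 16-bit product of two signed bytes."""
    return (a * b) >> 8  # Python's >> on ints is an arithmetic (floor) shift


def mulhs_i8_via_high_byte(a, b):
    """Model of the trick: put each byte in the high byte of a word."""
    wa, wb = a * 256, b * 256    # low byte zero, sign bit preserved
    high_word = (wa * wb) >> 16  # the half a pmulhw-style multiply keeps
    return high_word >> 8        # assumed final step: high byte of that word


def check_all():
    """Exhaustively compare both sides of each identity over all i8 inputs."""
    unsigned_bytes = range(256)
    signed_bytes = range(-128, 128)
    folds_ok = all(or_fold_holds(x, y) and and_fold_holds(x, y)
                   for x in unsigned_bytes for y in unsigned_bytes)
    mulhs_ok = all(mulhs_i8_reference(a, b) == mulhs_i8_via_high_byte(a, b)
                   for a in signed_bytes for b in signed_bytes)
    return folds_ok, mulhs_ok


if __name__ == "__main__":
    print(check_all())
```

The checks are exhaustive over 8-bit inputs only; they say nothing about profitability, legality after type legalization, or the vector cases discussed in the messages.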

7615 Commits