Commit Graph

8142 Commits

Author SHA1 Message Date
Simon Pilgrim e8305c0b8f [X86] combineX86ShuffleChain - don't fold to truncate(concat(V1,V2)) if it was already a PACK op
Fixes #55050
2022-04-25 17:13:44 +01:00
Craig Topper c6fdb1de47 [X86] Move some hasOneUse checks after checking what the opcode is.
Calling hasOneUse can be expensive on nodes with multiple results,
especially when some results are Chains. By checking the opcode first,
we can avoid walking the uses if it isn't an interesting node,
and thus avoid calling hasOneUse on a node that might have many uses.

Found by profiling the IR given in D123857.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D123881
2022-04-16 14:18:58 -07:00
Craig Topper 9d86bf825c [X86] Move hasOneUse check after opcode check. NFC
Checking opcode is cheap. hasOneUse might not be if the node has
multiple results. By checking the opcode we can rule out nodes
with multiple results we aren't interested in.
2022-04-15 17:20:57 -07:00
Liu, Chen3 bf60a5af0a [X86] Convert unsigned int 0 to floating-point with FILD instruction.
Unsigned int 0 is converted to float/double -0.0 when the rounding
mode is set to 'FE_DOWNWARD'. Use the FILD instruction instead of SSE
instructions on 32-bit targets if strictfp is enabled.

Differential Revision: https://reviews.llvm.org/D123660
2022-04-13 20:06:15 +08:00
Simon Pilgrim 0488c6638b [X86] getFauxShuffleMask - remove use DemandedElts TODO
Most of the getTargetShuffleInputs recursive calls have now gone and the remaining uses aren't likely to benefit from a DemandedElts mask
2022-04-12 15:36:30 +01:00
Simon Pilgrim 1e803d305a Revert rG88ff6f70c45f2767576c64dde28cbfe7a90916ca "[X86] Extend vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) to include inner or(pshufb(x), pshufb(y)) chains"
Reverting while I investigate reports of internal test regressions/failures
2022-04-11 10:42:43 +01:00
Simon Pilgrim 88ff6f70c4 [X86] Extend vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) to include inner or(pshufb(x), pshufb(y)) chains 2022-04-10 13:04:53 +01:00
Simon Pilgrim c74d729bd6 [X86] combineExtractSubvector - fold extract_subvector(insert_subvector(V,X,C1),C1)
extract_subvector(insert_subvector(V,X,C1),C1) -> insert_subvector(extract_subvector(V,C1),X,0)

More aggressively attempt to reduce the width of an extract_subvector source - we currently only do this if we're inserting into a zero vector (i.e. canonicalizing to the AVX implicit zero upper elts pattern).

But if we're extracting from the same point as the inner insert_subvector then the fold is still relatively trivial - we can probably do even better if we can ensure the subvector isn't badly split.
2022-04-10 11:03:08 +01:00
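The fold in the commit above can be checked on plain lists. This is a minimal sketch, assuming list-based stand-ins for the DAG nodes; insert_subvector and extract_subvector here are hypothetical Python helpers, not the LLVM API, and the element values are made up for illustration.

```python
# Hypothetical list-based models of the DAG nodes; not LLVM API.
def insert_subvector(vec, sub, idx):
    """Return a copy of vec with sub written at element index idx."""
    out = list(vec)
    out[idx:idx + len(sub)] = sub
    return out

def extract_subvector(vec, idx, num_elts):
    """Return num_elts elements of vec starting at element index idx."""
    return list(vec[idx:idx + num_elts])

V = [10, 11, 12, 13, 14, 15, 16, 17]   # wide source, e.g. 8 elements
X = [90, 91]                            # narrower subvector being inserted
C1 = 4                                  # shared insert/extract index

# extract_subvector(insert_subvector(V,X,C1),C1)
before = extract_subvector(insert_subvector(V, X, C1), C1, 4)
# insert_subvector(extract_subvector(V,C1),X,0)
after = insert_subvector(extract_subvector(V, C1, 4), X, 0)
```

Both forms yield the same four elements, but the second one only ever touches the narrower extracted vector.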
Simon Pilgrim 30a01bccda [X86] Fold concat(pshufb(x,y),pshufb(z,w)) -> pshufb(concat(x,z),concat(y,w)) 2022-04-09 16:05:50 +01:00
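A rough model of why this concat fold holds: PSHUFB shuffles bytes independently within each 128-bit lane, so two narrow shuffles behave like one wide shuffle of the concatenated operands. The pshufb helper is a hypothetical Python model written for this sketch, and the inputs are arbitrary.

```python
LANE = 16  # PSHUFB works on 16-byte (128-bit) lanes

def pshufb(src, mask):
    """Byte shuffle: mask bit 7 zeroes the byte, otherwise the low 4 bits
    select a byte from the same 128-bit lane of src."""
    assert len(src) == len(mask) and len(src) % LANE == 0
    out = []
    for i, m in enumerate(mask):
        lane_base = (i // LANE) * LANE
        out.append(0 if m & 0x80 else src[lane_base + (m & 0x0F)])
    return out

x = list(range(0, 16))
z = list(range(100, 116))
y = [15 - i for i in range(16)]                 # byte-reverse mask
w = [0x80 if i % 2 else i for i in range(16)]   # keep even bytes, zero odd bytes

narrow = pshufb(x, y) + pshufb(z, w)            # concat(pshufb(x,y),pshufb(z,w))
wide = pshufb(x + z, y + w)                     # pshufb(concat(x,z),concat(y,w))
```

The two results match byte for byte because no mask index can reach across a lane boundary.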
Simon Pilgrim 97ee923248 [X86] lowerV64I8Shuffle - attempt to fold to SHUFFLE(ALIGNR(X,Y)) and OR(PSHUFB(X),PSHUFB(Y)) 2022-04-09 14:09:39 +01:00
Simon Pilgrim 3d4bb78fbe [X86][SSE] combineSelect - more aggressively create zero elements in the or(pshufb(x), pshufb(y)) fold
When we fold vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)), ensure we convert all undef elements to zero elements - this should help us expose more known zero elements for deeper chains of these cases.

Noticed while triaging Issue #54819
2022-04-09 12:53:00 +01:00
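The role of the zero elements in the vselect -> or fold can be sketched as follows. This assumes a single 128-bit lane and masks built so that each byte is zeroed on the side the select does not pick; pshufb and vselect are hypothetical Python models, not LLVM or intrinsic calls.

```python
def pshufb(src, mask):
    """Single-lane byte shuffle: mask bit 7 zeroes the byte."""
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]

def vselect(cond, a, b):
    """Per-element select: a[i] where cond[i] is true, else b[i]."""
    return [av if c else bv for c, av, bv in zip(cond, a, b)]

x = list(range(1, 17))
y = list(range(101, 117))
cond = [i % 3 == 0 for i in range(16)]

# Each mask zeroes (0x80) exactly the bytes that the select takes from the other side.
mask_x = [i if c else 0x80 for i, c in enumerate(cond)]
mask_y = [0x80 if c else i for i, c in enumerate(cond)]

selected = vselect(cond, pshufb(x, mask_x), pshufb(y, mask_y))
ored = [a | b for a, b in zip(pshufb(x, mask_x), pshufb(y, mask_y))]
```

Because one side of every byte is known zero, the OR reproduces the select; an undef byte on the unselected side would not give that guarantee, which is presumably why the commit forces such elements to zero.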
chenglin.bi f72b3a506b [x86] Replace getNodeIfExists with doesNodeExist when only checking whether a node exists
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123224
2022-04-08 00:33:05 +08:00
Wei Xiao 842d0bf931 [x86] Improve select lowering for smin(x, 0) & smax(x, 0)
smin(x, 0):
  (select (x < 0), x, 0) -> ((x >> (size_in_bits(x)-1))) & x

smax(x, 0):
  (select (x > 0), x, 0) -> (~(x >> (size_in_bits(x)-1))) & x
  The comparison is testing for a positive value, we have to invert the sign
  bit mask, so only do that transform if the target has a bitwise 'and not'
  instruction (the invert is free).

The transform is performed only when CMP has a single user to avoid
increasing the total instruction count.

https://alive2.llvm.org/ce/z/euUnNm
https://alive2.llvm.org/ce/z/37339J

Differential Revision: https://reviews.llvm.org/D123109
2022-04-07 15:53:24 +08:00
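The two identities in the commit above can be sanity-checked with ordinary integers. This is a sketch assuming 32-bit signed values; Python's >> on negative ints is an arithmetic shift, so it stands in for the sign-bit smear. The function names are made up for illustration.

```python
BITS = 32  # assumed operand width

def smin_zero(x):
    """(x >> (bits-1)) & x: the shift gives all-ones for negative x, zero otherwise."""
    return (x >> (BITS - 1)) & x

def smax_zero(x):
    """~(x >> (bits-1)) & x: inverted sign mask, i.e. an 'and not' of the mask with x."""
    return ~(x >> (BITS - 1)) & x

samples = [-(1 << 31), -7, -1, 0, 1, 42, (1 << 31) - 1]
smin_ok = all(smin_zero(x) == min(x, 0) for x in samples)
smax_ok = all(smax_zero(x) == max(x, 0) for x in samples)
```

The smax form needs the extra inversion, which matches the commit's note that it is only used when the target has a bitwise 'and not' instruction.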
Matt Arsenault c4ea925f50 AtomicExpand: Change return type for shouldExpandAtomicStoreInIR
Use the same enum as the other atomic instructions for consistency, in
preparation for addition of another strategy.

Introduce a new "Expand" option, since the store expansion does not
use cmpxchg. Alternatively, the existing CmpXChg strategy could be
renamed to Expand.
2022-04-06 22:34:04 -04:00
Roman Lebedev 9be6e7b0f2 [X86] `lowerBuildVectorAsBroadcast()`: with AVX512VL, allow i64->XMM broadcasts from constant pool
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D123221
2022-04-06 18:33:40 +03:00
Pierre Gousseau a3d5f1cf5d [x86] Fix infinite loop inside DAG combiner with lzcnt feature.
The issue affects targets supporting fast-lzcnt such as btver2.
This removes extraneous zext/trunc node insertions to fix the infinite
loop.
This fixes Issue https://github.com/llvm/llvm-project/issues/54694

Differential Revision: https://reviews.llvm.org/D122900

Reviewed By: RKSimon, spatel, lebedev.ri
2022-04-05 17:32:10 +01:00
Simon Pilgrim 623d4b5787 [X86] Support optional NOT stages in the AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) fold
Extension to D122891, peek through NOT() ops, adjusting the condcode as we go.
2022-04-04 10:51:26 +01:00
Simon Pilgrim fbfd78f7aa [X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow v16i32 sub-lane permutes for v64i8 shuffles
Without VBMI, we are better off permuting v16i32 sub-lanes, even though it's a variable shuffle, if it allows us to then shuffle v64i8 inlane repeated masks (PSHUFB etc.)

Fixes #54658
2022-04-03 10:05:10 +01:00
Simon Pilgrim b8652fbcbb [X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) (RECOMMITTED)
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.

Recommitted with a fix to ensure we zext/trunc the SETCC result to the original type.

Differential Revision: https://reviews.llvm.org/D122891
2022-04-01 16:59:06 +01:00
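A small model of the AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) equivalence, assuming 32-bit operands and an in-range bit index. The helpers are hypothetical stand-ins for the DAG patterns, not compiler code.

```python
BITS = 32
MASK = (1 << BITS) - 1

def and_srl(x, y):
    """AND(SRL(X,Y),1): logical shift right by a variable amount, then mask bit 0."""
    return ((x & MASK) >> y) & 1

def setcc_bt(x, y):
    """BT copies bit y of x into the carry flag; SETCC turns that flag into 0 or 1."""
    carry = bool((x & MASK) & (1 << y))
    return int(carry)

samples = [0, 1, 0x80000000, 0xDEADBEEF, 0x12345678, MASK]
equivalent = all(and_srl(x, y) == setcc_bt(x, y)
                 for x in samples for y in range(BITS))
```

Both forms read the same single bit; the BT form avoids the variable shift itself.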
Simon Pilgrim 5a457bd2fa Revert rGa5f637bcbb7d1e08ce637f113fc117c3f4b2b110 "[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))"
Investigating a sanitizer-windows buildbot breakage
2022-04-01 16:48:24 +01:00
Simon Pilgrim 9afa6811ad [X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow 64-bit sublane shuffling on AVX512BW v64i8 shuffles
We were only performing this on 256-bit vectors on AVX2 targets

Noticed while triaging Issue #54658
2022-04-01 16:40:10 +01:00
Simon Pilgrim a5f637bcbb [X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.

Differential Revision: https://reviews.llvm.org/D122891
2022-04-01 16:07:56 +01:00
Simon Pilgrim 3245cfb8d3 [X86] Add getBT helper node for attempting to create a X86ISD::BT node
Avoids repeating all the extension/legalization wrappers in every use
2022-04-01 11:48:25 +01:00
Simon Pilgrim 919b657080 Revert rGff2d1bb2b749bd8a5697c25d2380b7c97a59ae06 "[X86] Add getBT helper node for attempting to create a X86ISD::BT node"
Typo means that this doesn't return a value in all cases.
2022-04-01 11:21:00 +01:00
Simon Pilgrim ff2d1bb2b7 [X86] Add getBT helper node for attempting to create a X86ISD::BT node
Avoids repeating all the extension/legalization wrapper in every use
2022-04-01 11:12:23 +01:00
Simon Pilgrim cb5c4a5917 [X86] lowerV8I16Shuffle - use explicit SmallVector<SDValue, 4> width to avoid MSVC AVX alignment bug
As discussed on Issue #54645 - building llc with /AVX can result in incorrectly aligned structs
2022-04-01 10:54:24 +01:00
Simon Pilgrim 535211c3eb [X86] Remove redundant FIXME
lowerV64I8Shuffle has been extended a lot since this was added.
2022-03-31 18:05:52 +01:00
Simon Pilgrim fac1729924 [X86] lowerV64I8Shuffle - don't use lowerShuffleWithPERMV until we've tried simpler options
Shuffle combining will still lower to this with better fast cross lane checks.

Noticed while triaging Issue #54658
2022-03-31 18:05:51 +01:00
Sanjay Patel 4a54e3eed3 [x86] try to replace 0.0 in fcmp with negated operand
This inverts a fold recently added to IR with:
3491f2f4b0

We can put -bidirectional on the Alive2 examples to show that
the reverse transforms work:
https://alive2.llvm.org/ce/z/8iVQwB

The motivation for the IR change was to improve matching to
'fabs' in IR (see https://github.com/llvm/llvm-project/issues/38828 ),
but it regressed x86 codegen for 'not-quite-fabs' patterns like
(X > -X) ? X : -X.
I.e., when there is no fast-math (nsz), the cmp+select is not a proper
fabs operation, but it does map nicely to the unusual NAN semantics
of MINSS/MAXSS.

I drafted this as a target-independent fold, but it doesn't appear to
help any other targets and seems to cause regressions for SystemZ at
least.

Differential Revision: https://reviews.llvm.org/D122726
2022-03-31 09:17:49 -04:00
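The comparison swap discussed in the commit above can be illustrated with ordinary doubles. This sketch assumes IEEE-754 behaviour as provided by Python floats; maxss_like is a hypothetical model of the MAXSS operand rule (the second operand is returned unless the first compares greater), not an intrinsic.

```python
import math

def select_vs_zero(x):
    """(X > 0.0) ? X : -X, the compare-against-zero form."""
    return x if x > 0.0 else -x

def select_vs_negated(x):
    """(X > -X) ? X : -X, with 0.0 replaced by the negated operand."""
    return x if x > -x else -x

def maxss_like(a, b):
    """Model of the MAXSS operand rule: b is returned unless a > b,
    so NaN inputs and equal zeros yield b."""
    return a if a > b else b

def same_float(a, b):
    """Equality that also distinguishes +0.0 from -0.0 and treats NaN as equal to NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

samples = [1.5, -1.5, 0.0, -0.0, math.inf, -math.inf, math.nan]
agree = all(same_float(select_vs_zero(x), select_vs_negated(x)) for x in samples)
matches_max = all(same_float(select_vs_negated(x), maxss_like(x, -x)) for x in samples)
sign_at_plus_zero = math.copysign(1.0, select_vs_negated(0.0))
```

For an input of +0.0 the select returns -0.0, which is why the pattern is only 'not-quite-fabs' without nsz, yet it still lines up with the max-style operand rule.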
Simon Pilgrim 481b185620 [X86] combineCarryThroughADD - recognise X86ISD::ADD(AND(X,1),-1) pattern can be folded to X86ISD::BT
As mentioned on D122482, if we've generated a masked overflow test see if we can fold it to X86ISD::BT to feed a X86ISD::ADC/SBB

Differential Revision: https://reviews.llvm.org/D122572
2022-03-31 09:52:55 +01:00
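Why ADD(AND(X,1),-1) acts as a bit test: adding all-ones to a 0/1 value carries out exactly when the value is 1. A minimal sketch assuming 32-bit arithmetic, with hypothetical helper names:

```python
BITS = 32
MASK = (1 << BITS) - 1

def add_carry_out(a, b):
    """Carry flag produced by an unsigned BITS-wide add."""
    return ((a & MASK) + (b & MASK)) >> BITS

def masked_overflow_test(x):
    """Carry of ADD(AND(X,1),-1); -1 is all-ones at this width."""
    return add_carry_out(x & 1, -1)

def bt_carry(x, bit=0):
    """Carry flag a bit test of the given bit would produce."""
    return ((x & MASK) >> bit) & 1

samples = [0, 1, 2, 3, 0xFFFFFFFE, 0xFFFFFFFF]
same_carry = all(masked_overflow_test(x) == bt_carry(x, 0) for x in samples)
```

Since only the carry feeds the following ADC/SBB, a bit test of bit 0 can supply the same flag.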
Simon Pilgrim 6697e3354f [X86] combineADC - fold ADC(C1,C2,Carry) -> ADC(0,C1+C2,Carry)
If we're not relying on the flag result, we can fold the constants together into the RHS immediate operand and set the LHS operand to zero, simplifying for further folds.

We could do something similar if the flag result is in use and the constant fold doesn't affect it, but I don't have any real test cases for this yet.

As suggested by @davezarzycki on Issue #35256

Differential Revision: https://reviews.llvm.org/D122482
2022-03-30 09:11:55 +01:00
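The condition 'if we're not relying on the flag result' in the ADC constant fold can be seen in a small model. This sketch assumes 32-bit add-with-carry semantics; adc and folded_adc are illustrative helpers only.

```python
BITS = 32
MASK = (1 << BITS) - 1

def adc(lhs, rhs, carry_in):
    """Return (result, carry_out) of a BITS-wide add-with-carry."""
    total = (lhs & MASK) + (rhs & MASK) + carry_in
    return total & MASK, total >> BITS

def folded_adc(c1, c2, carry_in):
    """ADC(0, C1+C2, Carry): constants pre-added (with wraparound) into the RHS."""
    return adc(0, (c1 + c2) & MASK, carry_in)

constants = [0, 1, 5, 0x7FFFFFFF, 0xFFFFFFFF]
values_match = all(adc(c1, c2, c)[0] == folded_adc(c1, c2, c)[0]
                   for c1 in constants for c2 in constants for c in (0, 1))

# The carry-out can differ, e.g. when C1+C2 itself wraps:
original = adc(0xFFFFFFFF, 1, 0)
folded = folded_adc(0xFFFFFFFF, 1, 0)
```

The value result always agrees, while the carry-out of the folded form can be lost when the constant sum wraps.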
Simon Pilgrim 1ec109ec58 [X86] combineCarryThroughADD - remove unused peek through of SEXT/AEXT nodes. 2022-03-29 17:22:50 +01:00
Shao-Ce SUN 662b9fa02c [NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122557
2022-03-29 09:53:24 +08:00
Simon Pilgrim 8a1956dfa5 [X86] lowerV64I8Shuffle - attempt to match with lowerShuffleAsLanePermuteAndPermute
Fixes #54562
2022-03-28 17:21:27 +01:00
Phoebe Wang 674d52e8ce [X86] Refactor X86ScalarSSEf16/32/64 with hasFP16/SSE1/SSE2. NFCI
This is used for f16 emulation. We emulate f16 for SSE2 targets and
above. The refactoring makes future code cleaner.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D122475
2022-03-27 12:24:02 +08:00
Simon Pilgrim 43a969debd [X86] combineADC - pull out repeated dyn_cast<ConstantSDNode> calls. NFC. 2022-03-25 12:53:08 +00:00
Simon Pilgrim 3db858c58c [X86] combineAdd - fold ADD(ADC(Y,0,W),X) -> ADC(X,Y,W)
This also exposed a missed ADC canonicalization of constant ops to the RHS
2022-03-25 10:52:10 +00:00
Simon Pilgrim 33b214b711 [X86] combineSub - fold SUB(X,ADC(Y,0,W)) -> SBB(X,Y,W) 2022-03-24 18:00:00 +00:00
Simon Pilgrim 438ac282db [X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y) (REAPPLIED)
As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use a ADC/SBB op.

Reapply with extra type legality checks - LowerAndToBT was originally only used during lowering, now that it can occur earlier we might encounter illegal types that we can either promote to i32 or just bail.

Differential Revision: https://reviews.llvm.org/D122084
2022-03-21 21:37:42 +00:00
Nikita Popov 1533682839 Revert "[X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y)"
This reverts commit 81569f5b6e.

This causes a segfault when building consumer-typeset in
ReleaseLTO-g configuration:
https://llvm-compile-time-tracker.com/show_error.php?commit=81569f5b6ef531a48023f28133481262ee1509a3
2022-03-21 21:52:36 +01:00
Simon Pilgrim 5fd9451668 [X86][AVX512] lower1BitShuffle - fold broadcast(setcc(x,y)) -> setcc(broadcast(x),broadcast(y)) (PR52500)
AVX512 has excellent broadcast ops for everything but vXi1 bool vectors - so if we're broadcasting a comparison result, see if we can broadcast the comparison operands instead.
2022-03-21 17:42:49 +00:00
Simon Pilgrim b6e2832fc2 [X86] Don't fold SUB(X,SBB(0,0,W)) -> SUB(ADC(0,0,W),Y)
This will further fold to a AND(SETCC_CARRY(),1) pattern which tends to prevent further folds.
2022-03-21 15:54:48 +00:00
Simon Pilgrim 315896d3ac [X86] Fold SUB(X,SBB(Y,Z,W)) -> SUB(ADC(X,Z,W),Y)
Prefer the commutable ADC over SBB to improve load folding opportunities
2022-03-21 14:20:46 +00:00
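The SUB(X,SBB(Y,Z,W)) -> SUB(ADC(X,Z,W),Y) rewrite is plain modular arithmetic on the value results. A quick check under an assumed 32-bit width, with hypothetical helper names and flags ignored:

```python
import itertools

BITS = 32
MASK = (1 << BITS) - 1

def adc(a, b, carry):
    """Value result of add-with-carry: a + b + carry, wrapped to BITS bits."""
    return (a + b + carry) & MASK

def sbb(a, b, borrow):
    """Value result of subtract-with-borrow: a - b - borrow, wrapped to BITS bits."""
    return (a - b - borrow) & MASK

def sub(a, b):
    """Wrapped BITS-wide subtraction."""
    return (a - b) & MASK

samples = [0, 1, 7, 0x80000000, 0xFFFFFFFF]
rewrite_ok = all(
    sub(x, sbb(y, z, w)) == sub(adc(x, z, w), y)
    for x, y, z in itertools.product(samples, repeat=3)
    for w in (0, 1)
)
```

X - (Y - Z - W) and (X + Z + W) - Y are the same value modulo 2^32, so the rewrite only changes which instruction consumes the carry.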
Simon Pilgrim ed51e26ab4 [X86] combineAddOrSubToADCOrSBB - commute + neg subtraction patterns
Handle SUB(AND(SRL(Y,Z),1),X) -> NEG(SBB(X,0,BT(Y,Z))) folds

I'll address the X86 lost folded-load regressions in a follow-up patch
2022-03-21 13:55:35 +00:00
Simon Pilgrim 5e9365c5eb [X86] combineAddOrSubToADCOrSBB - bail for illegal types
Ensure we don't attempt to fold to illegal types to ADC/SBB nodes.

After D122084 it's possible for ADD(X,AND(SRL(Y,Z),1)) patterns to be matched before type legalization.
2022-03-21 13:31:21 +00:00
Simon Pilgrim 81569f5b6e [X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y)
As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use a ADC/SBB op.

Differential Revision: https://reviews.llvm.org/D122084
2022-03-21 10:57:12 +00:00
Simon Pilgrim 1ae3c4e948 [X86] combineAddOrSubToADCOrSBB - split to more cleanly handle commuted variants.
Split combineAddOrSubToADCOrSBB into wrapper (which handles ADDs with commuted args) and the real combine, which no longer has to account for commutation.

I'm intending to extend combineAddOrSubToADCOrSBB to detect patterns other than just X86ISD::SETCC, so we need to detect all patterns without detecting them as part of a commutation swap.
2022-03-20 09:14:21 +00:00
Shengchen Kan 076a9dc99a [X86][NFC] Rename hasCMOV() to canUseCMOV(), hasLAHFSAHF() to canUseLAHFSAHF()
To make them less like other feature functions.
This is a follow-up patch for D121978.
2022-03-20 12:00:25 +08:00
Craig Topper 57b41af838 [X86] Rename FeatureCMPXCHG8B/FeatureCMPXCHG16B to FeatureCX8/CX16 to match CPUID.
Rename hasCMPXCHG16B() to canUseCMPXCHG16B() to make it less like other
feature functions. Add a similar canUseCMPXCHG8B() that aliases
hasCX8() to keep similar naming.

Differential Revision: https://reviews.llvm.org/D121978
2022-03-19 12:34:06 -07:00
Simon Pilgrim 34110a7320 [X86] combineAddOrSubToADCOrSBB - pull out repeated Y.getOperand(1) calls. NFC. 2022-03-19 17:56:11 +00:00
Simon Pilgrim b90478d422 [X86] createShuffleMaskFromVSELECT - handle BLENDV constant masks as well as VSELECT constant masks
Handle constant masks for both vselect nodes (mask != 0) and blendv nodes (mask < 0)
2022-03-19 16:51:07 +00:00
Simon Pilgrim a6c18bfbe3 [X86] combineSelect - don't constant fold BLENDV nodes like VSELECT
If a X86ISD::BLENDV op appears before legalization (in this test case due to the icmp_slt x, 0) its constant mask was being treated as a vselect mask (mask != 0) instead of blendv (mask < 0)

This just prevents constant folding entirely for non-VSELECT ops.
2022-03-19 16:31:19 +00:00
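The difference between the two mask conventions mentioned above (vselect: mask != 0, blendv: mask < 0) shows up as soon as a constant mask element is positive. A sketch with signed 8-bit mask elements and hypothetical helpers:

```python
def vselect(mask, a, b):
    """VSELECT-style: pick a[i] where mask[i] is non-zero."""
    return [av if m != 0 else bv for m, av, bv in zip(mask, a, b)]

def blendv(mask, a, b):
    """BLENDV-style: pick a[i] where the sign bit of mask[i] is set (mask[i] < 0)."""
    return [av if m < 0 else bv for m, av, bv in zip(mask, a, b)]

mask = [-1, 0, 1, -128]   # signed i8 constant mask elements
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

as_vselect = vselect(mask, a, b)
as_blendv = blendv(mask, a, b)
```

Element 2 (mask value 1) is taken from a under the vselect rule but from b under the blendv rule, so folding a BLENDV constant as if it were a VSELECT would pick the wrong element.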
Simon Pilgrim 56ad791f46 [X86] LowerAndToBT - fold BT(NOT(X),Y) -> BT(X,Y) and flip the CondCode 2022-03-19 14:03:03 +00:00
Simon Pilgrim c7ba5a9aff [X86][SSE] Add initial support for extracting non-constant bool vector elements
We can use MOVMSK+TEST/BT to extract individual bool elements even if the index isn't constant

This relies on combineBitcastvxi1 so some AVX512 cases still aren't optimized as they avoid MOVMSK usage.
2022-03-19 13:31:05 +00:00
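The MOVMSK+BT idea can be modeled as follows, assuming the bool vector is held as sign-extended bytes (0x00 or 0xFF). movmsk and extract_bool are hypothetical Python helpers, not intrinsics.

```python
def movmsk(vec, elt_bits=8):
    """Collect the sign bit of each element into a scalar bitmask (element i -> bit i)."""
    sign = 1 << (elt_bits - 1)
    mask = 0
    for i, v in enumerate(vec):
        if v & sign:
            mask |= 1 << i
    return mask

def extract_bool(vec, index):
    """Extract element 'index' (not necessarily a constant) by testing one bit of the mask."""
    return (movmsk(vec) >> index) & 1

bools = [0xFF, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0xFF, 0x00]
mask = movmsk(bools)
extracted = [extract_bool(bools, i) for i in range(len(bools))]
```

Once the vector is reduced to a scalar mask, a variable index is just a variable bit position.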
Shengchen Kan 920c2e5763 [X86][NFC] Rename target feature hasCMov->hasCMOV
This is a follow-up patch for D121975.
2022-03-18 14:05:52 +08:00
Craig Topper 6cfe41dcc8 [X86] Rename more target feature related things for consistency. NFC
-Rename Mode*Bit to Is*Bit to match X86Subtarget.
-Rename FeatureLAHFSAHF to FeatureLAHFSAHF64 to match X86Subtarget.
-Use consistent capitalization

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D121975
2022-03-17 22:27:17 -07:00
Simon Pilgrim e3deb7d88b [X86] computeKnownBitsForTargetNode - add X86ISD::AND KnownBits handling
Fixes #54171
2022-03-16 11:05:36 +00:00
Shengchen Kan 052d37dc7c [NFC][X86] Rename some variables and functions about target features
This is preparation for D121768. The member's name should align with
the interface for trivial target features.
2022-03-16 13:08:52 +08:00
Simon Pilgrim f591231cad [X86] combineSelect - canonicalize (vXi1 bitcast(iX Cond)) with combineToExtendBoolVectorInReg before legalization
This replaces the attempt in 20af71f8ec to use combineToExtendBoolVectorInReg to create X86ISD::BLENDV masks directly. Instead, we use it to canonicalize the iX bitcast to a sign-extended mask and then truncate it back to vXi1 prior to legalization breaking it apart.

Fixes #53760
2022-03-15 12:16:11 +00:00
Simon Pilgrim ad3a7654dc [X86] combineCMP - peek through zero-extensions for X86cmp(zext(x0),0) zero tests (PR38960)
If we're comparing a value against zero, strip away any zero-extension and perform the comparison on the pre-extended value

Fixes #38308

Differential Revision: https://reviews.llvm.org/D121472
2022-03-13 11:38:40 +00:00
Simon Pilgrim e4ab2024a6 [X86] convertIntLogicToFPLogic - enable fp-logic on pre-AVX targets for supported fp predicates (PR34563)
If the SETCC fp-condcode is supported on SSE as a single CMPPS/PD op then we can use convertIntLogicToFPLogic to reduce EFLAGS and XMM->GPR traffic like we do for AVX targets.

Differential Revision: https://reviews.llvm.org/D121210
2022-03-08 18:06:27 +00:00
Simon Pilgrim 9119eefe5f [X86] Add cheapX86FSETCC_SSE helper. NFC.
Identify FP CondCode that can be performed by a non-AVX SSE CMP op

Pulled out of D121210
2022-03-08 18:06:27 +00:00
Simon Pilgrim d0aa77440c [X86] convertIntLogicToFPLogic - pull out condcodes. NFCI. 2022-03-08 13:31:17 +00:00
Simon Pilgrim 588d97e246 [X86] getTargetVShiftNode - peek through any zext node
If the shift amount has been zero-extended, peek through as this might help us further canonicalize the shift amount.

Fixes regression mentioned in rG147cfcbef1255ba2b4875b76708dab1a685085f5
2022-03-04 17:41:45 +00:00
Simon Pilgrim 147cfcbef1 [X86] LowerShiftByScalarVariable - find splat patterns with getSplatSourceVector instead of getSplatValue
This completes the removal of uses of SelectionDAG::getSplatValue started in D119090 - by avoiding extracting the splatted element we make it a lot easier to zero-extend the bottom 64-bits of the shift amount and fixes issues we had on 32-bit targets where i64 isn't legal.

I've removed the old version of getTargetVShiftNode that took the scalar shift amount argument and LowerRotate can finally efficiently handle vXi16 rotates-by-scalar (using the same code as general funnel-shifts).

The only regression we see is in the X86-AVX2 PR52719 test case in vector-shift-ashr-256.ll - this is now hitting the same problem as the X86-AVX1 case (failure to simplify a multi-use X86ISD::VBROADCAST_LOAD) which I intend to address in a follow up patch.
2022-03-04 16:47:35 +00:00
Simon Pilgrim 940d7cd59f [X86] SimplifyDemandedVectorElts - adjust X86ISD::ANDNP demanded elts based off constant masks
Similar to what we already do in combineAndnp, if either operand is a constant then we can improve the demanded elts/bits.
2022-03-04 13:40:56 +00:00
Paul Robinson 7b85f0f32f [PS4] isPS4 and isPS4CPU are not meaningfully different 2022-03-03 11:36:59 -05:00
Simon Pilgrim 75c4a92706 [X86] Enable v32i16 FSHL/FSHR support
Now that we've improved splat detection we no longer see regressions in the funnel-shift-by-splat-amount test cases
2022-03-02 17:32:38 +00:00
Simon Pilgrim ab2cbb8466 [X86] LowerShiftByScalarVariable - remove 32-bit vXi64 bitcast shift amount handling
This was handled generically (and better) by D120553
2022-03-02 13:52:14 +00:00
Phoebe Wang e03d216c28 [X86] Use bit test instructions to optimize some logic atomic operations
This is to match GCC's optimizations: https://gcc.godbolt.org/z/3odh9e7WE

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120199
2022-03-01 09:57:08 +08:00
Simon Pilgrim 2b46417aa2 [X86][SSE] Attempt to lower vec_reduce_add patterns with PSADBW for zero-extended vXi8 sources
For i16/32/64 vectors, if the upper bits are known to be zero, then we can try to truncate to vXi8 (if it's worth it) and perform this as a PSADBW to add+zext each v4i8 subvector to a i64 sum, which we can then reduce together.

This addresses some of the PR42674 test cases where the source data was vXi8 but had been extended to match a wider unsigned integer accumulator.

Differential Revision: https://reviews.llvm.org/D120193
2022-02-27 15:17:42 +00:00
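A rough model of the PSADBW reduction trick: a sum of absolute differences against an all-zero vector is just a byte sum per 64-bit lane. The sketch assumes the inputs already fit in a byte (upper bits known zero); psadbw_zero and reduce_add_via_psadbw are hypothetical helpers.

```python
def psadbw_zero(byte_elts):
    """Sum |b - 0| over each group of 8 bytes, giving one 64-bit sum per lane."""
    assert len(byte_elts) % 8 == 0
    return [sum(byte_elts[i:i + 8]) for i in range(0, len(byte_elts), 8)]

def reduce_add_via_psadbw(values):
    """Reduce zero-extended byte values: per-lane PSADBW sums, then add the lanes."""
    assert all(0 <= v <= 0xFF for v in values)  # upper bits must be known zero
    return sum(psadbw_zero(list(values)))

values = list(range(240, 256))      # 16 values that each fit in a byte
lane_sums = psadbw_zero(values)
total = reduce_add_via_psadbw(values)
```

The per-lane sums are wider than a byte, so nothing overflows even though the sources are vXi8.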
Paweł Bylica eb1ff70fc5 [X86] Combine ADC(ADD(X,Y),0,Carry) -> ADC(X,Y,Carry)
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120435
2022-02-25 14:31:20 +01:00
Simon Pilgrim 748bf545dc Revert rG87753cebf5f861eee418d6bce155dfa0b00f9878 "[X86] combineX86ShufflesRecursively - don't bother widening inputs before calling combineX86ShuffleChain"
Reverting while we investigate codegen regression reports
2022-02-25 08:59:53 +00:00
Simon Pilgrim a636801a36 [X86] LowerRotate - enable v8i16 ROTL/ROTR on all pre-SSE41 targets
We're still better off expanding this once we have PMOVZX
2022-02-24 14:14:08 +00:00
Simon Pilgrim 0ea50bee83 [X86] SimplifyDemandedVectorEltsForTargetNode - add X86ISD::ANDNP handling 2022-02-24 13:51:51 +00:00
Simon Pilgrim e41a138520 [X86] LowerShiftByScalarVariable - use getSplatSourceVector for vXi8 shift expansion
Using getSplatValue causes poor codegen due to not always being able to remove the EXTRACT_VECTOR_ELT created inside getSplatValue.

The vXi16 shifts/rotates are still showing occasional regressions but vXi8 is a definite improvement.
2022-02-24 11:24:06 +00:00
Simon Pilgrim 427d9f60db [X86] combineX86ShufflesRecursively - pull out repeated getValueType/getSimpleValueType calls. 2022-02-23 18:45:28 +00:00
Simon Pilgrim 87753cebf5 [X86] combineX86ShufflesRecursively - don't bother widening inputs before calling combineX86ShuffleChain
combineX86ShuffleChain no longer has to assume that the shuffle inputs are the right size, so don't create unnecessary nodes messing up oneuse limits as detailed on Issue #45319
2022-02-23 17:29:41 +00:00
Simon Pilgrim 22d0453128 [X86] combineX86ShuffleChainWithExtract - don't bother widening inputs after peeking through ISD::EXTRACT_SUBVECTOR nodes
combineX86ShuffleChain no longer has to assume that the shuffle inputs are the right size, so don't create unnecessary nodes messing up oneuse limits as detailed on Issue #45319

Removing widening from combineX86ShufflesRecursively will be the next step, followed by removing combineX86ShuffleChainWithExtract entirely
2022-02-23 15:44:24 +00:00
Sanjay Patel ad7214f23d [x86] add load folding restriction to pushAddIntoCmovOfConsts()
With only a load-fold the diffs look neutral. If there's a load and store (rmw)
fold opportunity as shown in the test based on #53862, then we end up with an
extra instruction.

Fixes #53862

Differential Revision: https://reviews.llvm.org/D120281
2022-02-22 08:02:11 -05:00
Simon Pilgrim ec910751fe [X86] combineX86ShufflesRecursively - attempt to fold ISD::EXTRACT_SUBVECTOR into a shuffle chain
Peek through if we're extracting a non-zero'th subvector in an attempt to fold the extract into a lane-crossing shuffle

This also exposes a failure to fold extract_subvector(movddup(x),c) -> movddup(extract_subvector(x,c))
2022-02-20 18:50:33 +00:00
Simon Pilgrim 8ef3e895ad [X86] combineX86ShufflesRecursively - add TODO not to generate temporary nodes
Extension to PR45974, unless we actually combine the target shuffles we shouldn't be generating temporary nodes as they may interfere with the one use checks in the shuffle recursions
2022-02-20 15:59:23 +00:00
Simon Pilgrim ab069f37e8 [X86] combineArithReduction - pull out repeated getVectorNumElements() calls 2022-02-19 19:41:20 +00:00
Simon Pilgrim de2c0a2e61 [X86] combineADC/SBB - pull out repeated getOperand calls. NFC. 2022-02-18 11:21:44 +00:00
Nick Desaulniers 027c16bef4 [X86ISelLowering] permit BlockAddressSDNode "i" constraints for PIC
When building 32b x86 code as PIC, the existing handling of "i"
constraints is conservative since generally we have to go through the
GOT to find references to functions.

But generally, BlockAddresses from C code refer to the Function in the
current TU.  Permit BlockAddresses to be used with the "i" constraint
for those cases.

I regressed this in
commit 4edb9983cb ("[SelectionDAG] treat X constrained labels as i for asm")

Fixes: https://github.com/llvm/llvm-project/issues/53868

Reviewed By: efriedma, MaskRay

Differential Revision: https://reviews.llvm.org/D119905
2022-02-17 10:54:46 -08:00
Simon Pilgrim 2808743cbd [X86] LowerVSETCC - always split 512-bit vectors before lowering to PCMPEQ/GT (PR53842)
Extend the existing split where we already do this for v32i16/v64i8

We can end up trying to use PCMPEQ/GT if the result needs to be sign-extended (typically due to the DAGCombiner::foldSextSetcc fold).

Fixes #53842
2022-02-15 14:21:12 +00:00
Simon Pilgrim 890beda4e1 [X86] combineArithReduction - pull out (near) duplicate v4i8/v8i8 widening code. NFC. 2022-02-13 21:02:50 +00:00
Sanjay Patel c486b82cfb [x86] try harder to scalarize a vector load with extracted integer op uses
This is a retry of b4b97ec813 - that was reverted because it
could cause miscompiles by illegally reordering memory operations.
A new test based on #53695 is added here to verify we do not have
that same problem.

extract_vec_elt (load X), C --> scalar load (X+C)

As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.

Fixes #50310
Fixes #53695

Differential Revision: https://reviews.llvm.org/D118376
2022-02-13 08:32:21 -05:00
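The scalarization 'extract_vec_elt (load X), C --> scalar load (X+C)' can be pictured with a byte buffer. This sketch assumes little-endian i32 elements and scales the index C by the element size when forming the address; the memory contents and helper names are made up.

```python
import struct

ELT_SIZE = 4  # bytes per i32 element (assumed element type)
memory = struct.pack('<8i', 3, -1, 4, 1, -5, 9, 2, 6)

def vector_load(mem, addr, num_elts=4):
    """Load num_elts consecutive i32 values starting at byte address addr."""
    return list(struct.unpack_from(f'<{num_elts}i', mem, addr))

def scalar_load(mem, addr):
    """Load a single i32 from byte address addr."""
    return struct.unpack_from('<i', mem, addr)[0]

X = 8   # byte address of the vector
C = 2   # constant element index

via_vector = vector_load(memory, X)[C]
via_scalar = scalar_load(memory, X + C * ELT_SIZE)
```

Both paths read the same four bytes; the scalar form simply never materializes the other vector elements. The sketch says nothing about memory-operation ordering, which is the part the commit message notes had to be fixed.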
Phoebe Wang 2aa732a918 [X86][MS] Fix the wrong alignment of vector variable arguments on Win32
D108887 fixed alignment mismatch by changing the caller's alignment in
ABI. However, we found some cases that still assume the alignment is
vector size. This patch fixes them to avoid the runtime crash.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D114536
2022-02-13 10:23:18 +08:00
Simon Pilgrim 9c55b0e121 [X86] LowerFunnelShift - enable v16i16 support 2022-02-12 17:04:59 +00:00
Simon Pilgrim a4ed0c2f03 [X86] combineAndnp - if an input has a zero (after inversion for Op0) in a vector element, then we don't demand that bit/element in the other input
Similar to what we already perform in combineAnd
2022-02-12 16:49:05 +00:00
Simon Pilgrim 1f43367377 [X86] getTargetVShiftNode - Fix Wparentheses gcc warning. 2022-02-12 16:37:24 +00:00
Simon Pilgrim 6320c3e77c [X86] combineAndnp - pull out repeated operands. NFC. 2022-02-12 16:35:24 +00:00
Simon Pilgrim dcf465731d [X86] combineAnd - add SimplifyMultipleUseDemandedBits handling to masked vector element analysis
Extend the existing fold to use SimplifyMultipleUseDemandedBits as well as SimplifyDemandedVectorElts/SimplifyDemandedBits when attempting to simplify based off known zero vector elements.
2022-02-12 15:30:53 +00:00
Simon Pilgrim 1e1b60138c [X86] Improve uniform funnelshift/rotation amount handling
To find uniform shift/rotation amounts, we currently use SelectionDAG::getSplatValue which creates a node that extracts the scalar value from the source vector, this makes it more difficult for later combines to remove the extraction and stay on the SIMD unit, and can be a problem when the scalar type is illegal (i.e. i64 vs v2i64 on 32-bit targets).

This patch begins to use SelectionDAG::getSplatSourceVector (which SelectionDAG::getSplatValue uses internally) and adds a new variant of getTargetVShiftNode that takes the source vector and the splat index, and adjusts the vector in place to create the zero-extended value suitable for the SSE PSLL/PSRL/PSRA uniform instructions.

I'm still addressing a number of regressions when used for normal vector shifts, so I've just handled the funnelshift/rotation lowering for this first patch. I can then focus on the yak shaving (SimplifyDemandedBits/Elts in particular) necessary to always use SelectionDAG::getSplatSourceVector.

Differential Revision: https://reviews.llvm.org/D119090
2022-02-12 14:46:30 +00:00
Simon Pilgrim 37cf7275cd [X86] Enable vector splitting of ISD::AVGCEILU nodes on AVX1 and non-BWI targets 2022-02-12 14:04:55 +00:00
David Green f810b40c3b [X86] Replace X86ISD::AVG with generic ISD::AVGCEILU
Pulled out of D106237, this replaces the X86ISD::AVG DAG node with the
generic ISD::AVGCEILU. It doesn't remove the detectAVGPattern method,
but the extra generic ISel matching does alter the existing test.

Differential Revision: https://reviews.llvm.org/D119073
2022-02-11 18:57:18 +00:00
Simon Pilgrim 20af71f8ec [X86] combineVSelectToBLENDV - handle vselect(vXi1,A,B) -> blendv(sext(vXi1),A,B)
For pre-AVX512 targets, attempt to sign-extend a vXi1 condition mask to pass to a X86ISD::BLENDV node

Fixes Issue #53760
2022-02-11 18:38:17 +00:00
Simon Pilgrim 48e1434a0a [X86] Move combineToExtendBoolVectorInReg before the select combines. NFC.
Avoid the need for a forward declaration.

Cleanup prep for Issue #53760
2022-02-11 16:51:46 +00:00
Simon Pilgrim 827d0c51be [X86] combineToExtendBoolVectorInReg - use explicit arguments. NFC.
Replace the *_EXTEND node with the raw operands, this will make it easier to use combineToExtendBoolVectorInReg for any boolvec extension combine.

Cleanup prep for Issue #53760
2022-02-11 16:40:29 +00:00
Sanjay Patel a68e098024 [SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC
This is no-functional-change-intended because only the
x86 target enables the TLI hook currently.

We can add fmul/fdiv opcodes to the switch similar to the
proposal D119111, but we don't need to make other changes
like enabling target-specific combines.

We can also add integer opcodes (add, or, shl, etc.) to
the switch because this function is called from all of the
generic binary opcodes.

The goal is to incrementally enable the profitable diffs
from D90113 while avoiding regressions.

Differential Revision: https://reviews.llvm.org/D119150
2022-02-08 09:55:05 -05:00
Simon Pilgrim d7be2bff16 [X86] combineShiftRightArithmetic - break if-else chain as they all return (style). NFC. 2022-02-07 09:54:34 +00:00
Simon Pilgrim 74b98ab1db [X86] Fold ZERO_EXTEND_VECTOR_INREG(BUILD_VECTOR(X,Y,?,?)) -> BUILD_VECTOR(X,0,Y,0)
Helps avoid some unnecessary shift-by-splat-amount extensions before shuffle combining gets limited by one-use checks
2022-02-06 12:53:11 +00:00
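The BUILD_VECTOR fold above follows from the little-endian layout of the widened elements. A sketch assuming a v4i16 source zero-extended in-register to v2i32 and then viewed as v4i16 again; the helper name mirrors the node but is only a Python model.

```python
import struct

def zero_extend_vector_inreg(elts16):
    """Zero-extend the low half of the i16 elements to i32 (little-endian),
    then reinterpret the result as i16 elements of the original count."""
    low = elts16[:len(elts16) // 2]
    wide = struct.pack(f'<{len(low)}I', *low)
    return list(struct.unpack(f'<{len(elts16)}H', wide))

X, Y = 0x1234, 0xBEEF
# The upper two source elements are the '?' lanes; their values never matter.
result = zero_extend_vector_inreg([X, Y, 0xAAAA, 0x5555])
```

The extension interleaves a zero after each kept element, which is exactly BUILD_VECTOR(X,0,Y,0) at the narrow element type.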
Sanjay Patel 7b03725097 Revert "[x86] try harder to scalarize a vector load with extracted integer op uses"
This reverts commit b4b97ec813.

As discussed in post-commit feedback at:
https://reviews.llvm.org/D118376
...there's a stage 2 failure on a Mac running a clang-refactor tool test.
2022-02-04 07:45:57 -05:00
Sanjay Patel 6592bcecd4 [x86] invert a vector select IR canonicalization with a binop identity constant
This is an intentionally limited/different form of D90113.
That patch bravely tries to generalize folds where we pull
a binop into the arms of a select:
N0 + (Cond ? 0 : FVal) --> Cond ? N0 : (N0 + FVal)
...but it is not universally profitable.

This is the inverse of IR canonicalization as discussed in
D113442.

We know that this transform is not entirely profitable even
within x86, so we only handle x86 vector fadd/fsub as a 1st
step. The intent is to prevent AVX512 regressions as mentioned
in D113442.

The plan is to port this to DAGCombiner (so it will eventually
look more like D90113) and add more types/cases in pieces with
many more tests to verify that we are seeing improvements.

Differential Revision: https://reviews.llvm.org/D118644
2022-02-02 08:17:53 -05:00
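The identity behind the fold in 6592bcecd4 can be checked with a small scalar model. This is only a sketch of the algebra quoted in the commit message, using Python integers; the function names are hypothetical and it does not model floating-point signed zeros or the x86 vector types the patch actually targets.

```python
# Scalar model of: N0 + (Cond ? 0 : FVal)  -->  Cond ? N0 : (N0 + FVal)
# Function names are illustrative only; integers are used so that the
# identity constant 0 is exact.

def binop_of_select(n0, cond, fval):
    # The binop takes a select whose "true" arm is the identity constant.
    return n0 + (0 if cond else fval)

def select_of_binop(n0, cond, fval):
    # The binop is pulled into the "false" arm of the select.
    return n0 if cond else (n0 + fval)

if __name__ == "__main__":
    for cond in (True, False):
        print(cond, binop_of_select(7, cond, 3), select_of_binop(7, cond, 3))
```

Both forms compute the same value; which one is cheaper depends on the target, which is why the commit restricts itself to a narrow set of cases.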
Simon Pilgrim 5aa2acc86b [DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper
None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.
2022-02-02 12:04:49 +00:00
Simon Pilgrim 7ec8fc2932 [X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails
We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements.

This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.
2022-01-31 13:58:00 +00:00
Simon Pilgrim 156f83adc2 [X86] combineVectorTruncation - use PACKUSDW(BLENDW(X,0),BLENDW(Y,0)) for v8i32->v8i16 truncation
Limit this to SSE41 - AVX1 targets to avoid UNPCKL(PSHUFB,PSHUFB); pre-SSE41 we don't have PACKUSDW/BLENDW, and with AVX2 we can perform this as PERMQ(PSHUFB()).
2022-01-30 20:07:04 +00:00
Simon Pilgrim b7e04ccd99 [X86][AVX] matchUnaryShuffle - avoid creation of on-the-fly nodes (PR45974)
Don't extract the ANY/ZERO_EXTEND_VECTOR_INREG subvector source until we're definitely combining to a new node.
2022-01-30 17:59:14 +00:00
Simon Pilgrim 2cdbaca394 [X86] Attempt to fold MOVMSK(CMPEQ(AND(X,C1),0)) -> MOVMSK(NOT(SHL(X,C2)))
Allows pow2 mask tests to avoid an unnecessary constant load.

Noticed while investigating how to extend MatchVectorAllZeroTest to support more allof/anyof patterns.
2022-01-30 15:53:21 +00:00
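A lane-level model of the fold in 2cdbaca394 may help show why the constant load disappears. The sketch below assumes 8-bit lanes and assumes C2 is the shift that moves the single set bit of C1 into the lane's sign bit (C2 = width - 1 - log2(C1)); that relationship and all function names are assumptions for illustration, not taken from the patch.

```python
# Model of MOVMSK(CMPEQ(AND(X,C1),0)) -> MOVMSK(NOT(SHL(X,C2))) on 8-bit lanes.
WIDTH = 8
MASK = (1 << WIDTH) - 1

def movmsk(lanes):
    # Collect the sign bit of each lane into a bitmask (lane 0 -> bit 0).
    result = 0
    for i, v in enumerate(lanes):
        result |= ((v >> (WIDTH - 1)) & 1) << i
    return result

def movmsk_cmpeq_and(lanes, c1):
    # CMPEQ yields all-ones in a lane when (lane & c1) == 0, else zero.
    return movmsk([MASK if (v & c1) == 0 else 0 for v in lanes])

def movmsk_not_shl(lanes, c1):
    # Assumed shift amount: move the tested bit into the sign bit position.
    c2 = WIDTH - 1 - (c1.bit_length() - 1)
    return movmsk([~(v << c2) & MASK for v in lanes])

if __name__ == "__main__":
    lanes = [0x00, 0x04, 0xFB, 0xFF]
    print(bin(movmsk_cmpeq_and(lanes, 0x04)), bin(movmsk_not_shl(lanes, 0x04)))
```

Only the sign bit of each lane reaches MOVMSK, so the remaining bits produced by the shift and NOT do not matter.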
Simon Pilgrim ee9eeed773 [X86] LowerFunnelShift - enable v8i16 lowering 2022-01-29 16:20:36 +00:00
Simon Pilgrim 6777289dd9 [X86] lowerShuffleAsBlend - pull out repeated getVectorNumElements() calls. NFC. 2022-01-29 16:16:29 +00:00
Simon Pilgrim f1305f2369 [X86] combinePredicateReduction - always use PMOVMSKB(PCMPEQB()) for allof(icmp_eq()) reductions
This greatly simplifies the codegen for recognising PTEST patterns and matches the codegen from the very similar LowerVectorAllZero
2022-01-29 15:16:59 +00:00
Simon Pilgrim 67a399fd57 [X86] SimplifyDemandedBits - add X86ISD::BLENDV SimplifyMultipleUseDemandedBits handling
Lets us see through multiple use operands
2022-01-29 14:26:41 +00:00
Simon Pilgrim 7e849fd97b [X86] LowerFunnelShift - allow non-constant vXi8 unpack(y,x) << zext(z) lowering pre-AVX512
Without AVX512 (which can efficiently extend/truncate to vXi16/vXi32), unpacking/packing to vXi16 is more efficient than relying on the (uops-heavy) PBLENDV shift expansion
2022-01-29 13:58:30 +00:00
Luo, Yuanke be44177ede [X86][avx512fp16] Promote fp16 to fp32 for frem.
Promote fp16 to fp32 for frem.

Differential Revision: https://reviews.llvm.org/D118470
2022-01-29 11:41:27 +08:00
Sanjay Patel b4b97ec813 [x86] try harder to scalarize a vector load with extracted integer op uses
extract_vec_elt (load X), C --> scalar load (X+C)

As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.

Fixes #50310

Differential Revision: https://reviews.llvm.org/D118376
2022-01-28 10:22:52 -05:00
Simon Pilgrim c7bb3665a1 [X86] SimplifyDemandedBitsForTargetNode - fold MOVMSK(YMM) -> MOVMSK(XMM)
If we don't demand the upper elements of the 256-bit vector, then just perform as a 128-bit vector
2022-01-28 14:42:53 +00:00
Simon Pilgrim 2a13beaa70 [X86] combineSetCCMOVMSK - don't fold MOVMSK(BITCAST(PCMPEQ(X,0))) -> PTESTZ(X,X) if we're not testing every element comparison 2022-01-28 13:22:37 +00:00
Simon Pilgrim cce6490eca [X86] combineSetCCMOVMSK - match all_of patterns with X86ISD::CMP as well as X86ISD::SUB
Previous folds by combineSetCCMOVMSK might have converted these to CMP when changing the bitwidth, and the CMP->SUB fold might not have happened (or will happen)
2022-01-28 11:43:10 +00:00
Simon Pilgrim 93c9b39d25 [X86] Fix MOVMSK(CONCAT(X,Y)) -> MOVMSK(AND/OR(X,Y)) fold for float types and demanded elements
rG9103b73fe052 was assuming that we could OR/AND with the source vector, but that will fail on float/double vectors without bitcasting - it also missed the case that any_of checks might be testing less than all the source elements
2022-01-28 11:01:47 +00:00
Simon Pilgrim 9103b73fe0 [X86] Fold MOVMSK(CONCAT(X,Y)) -> MOVMSK(AND/OR(X,Y)) for all_of/any_of patterns
Makes it easier for later folds and avoids unnecessary 256-bit ops (especially on AVX1-only targets where we miss a lot of integer instructions)
2022-01-27 18:28:09 +00:00
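The all_of/any_of reasoning in 9103b73fe0 can be sketched with integer lanes. The model below assumes 32-bit integer lanes and hypothetical helper names; it ignores the float/bitcast and partial-demand cases that the follow-up fix 93c9b39d25 addresses.

```python
# Model of MOVMSK(CONCAT(X,Y)) -> MOVMSK(AND/OR(X,Y)) for all_of/any_of tests.
WIDTH = 32

def movmsk(lanes):
    # Collect the sign bit of each lane into a bitmask (lane 0 -> bit 0).
    result = 0
    for i, v in enumerate(lanes):
        result |= ((v >> (WIDTH - 1)) & 1) << i
    return result

def all_of(lanes):
    # True when every lane has its sign bit set.
    return movmsk(lanes) == (1 << len(lanes)) - 1

def any_of(lanes):
    # True when at least one lane has its sign bit set.
    return movmsk(lanes) != 0

def all_of_concat_folded(x, y):
    # all_of over concat(x, y) only needs the half-width AND.
    return all_of([a & b for a, b in zip(x, y)])

def any_of_concat_folded(x, y):
    # any_of over concat(x, y) only needs the half-width OR.
    return any_of([a | b for a, b in zip(x, y)])
```

The folded forms operate on vectors of half the width, which is the property the commit relies on to avoid 256-bit operations.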
Simon Pilgrim ccda0f2226 [X86][SSE] Add combineBitOpWithShift for BITOP(SHIFT(X,Z),SHIFT(Y,Z)) -> SHIFT(BITOP(X,Y),Z) vector folds
InstCombine performs this more generally with SimplifyUsingDistributiveLaws, but we don't need anything that complex here - this is mainly to fix up cases where logic ops get created late on during lowering, often in conjunction with sext/zext ops for type legalization.

https://alive2.llvm.org/ce/z/gGpY5v
2022-01-27 14:54:41 +00:00
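The distributive identity used by combineBitOpWithShift (ccda0f2226) can be checked on a single lane. This sketch assumes 8-bit lanes, logical shifts only, and hypothetical function names; it is a model of the algebra rather than of the DAG combine itself.

```python
# Model of BITOP(SHIFT(X,Z),SHIFT(Y,Z)) -> SHIFT(BITOP(X,Y),Z) on one 8-bit lane.
import operator

WIDTH = 8
MASK = (1 << WIDTH) - 1

def shl(x, z):
    # Logical shift left, truncated to the lane width.
    return (x << z) & MASK

def lshr(x, z):
    # Logical shift right on an unsigned lane value.
    return x >> z

BITOPS = (operator.and_, operator.or_, operator.xor)
SHIFTS = (shl, lshr)

def bitop_of_shifts(bitop, shift, x, y, z):
    # Two shifts followed by one bitwise op.
    return bitop(shift(x, z), shift(y, z))

def shift_of_bitop(bitop, shift, x, y, z):
    # One bitwise op followed by a single shift.
    return shift(bitop(x, y), z)
```

The rewritten form needs one shift instead of two, provided both operands are shifted by the same amount Z.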
Simon Pilgrim 389ae775e4 [X86] Fold TESTZ(OR(LO(X),HI(X)),OR(LO(Y),HI(Y))) -> TESTZ(X,Y)
Helps fix a number of poor codegen cases for allof(cmp()) with 256-bit vectors on AVX1
2022-01-27 13:20:36 +00:00
Benjamin Kramer f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef82063207.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a consequence, move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).
2022-01-26 16:17:45 +01:00
Simon Pilgrim 99ae5c13f6 [X86] Add 'getSplitVectorSrc' helper to determine if subvectors all come from the same source
Helps determine if the subvector ops come from the same larger vector and match the lower/upper extractions
2022-01-26 15:17:21 +00:00
Simon Pilgrim 157f9b68a3 [X86] combineVectorSignBitsTruncation - fix indentation. NFC. 2022-01-25 11:54:22 +00:00
Simon Pilgrim 902184e6cc [X86] combinePredicateReduction - generalize allof(cmpeq(x,0)) handling to allof(cmpeq(x,y))
There are no further reasons to limit this to cmpeq-with-zero; the outstanding regressions with lowering to PTEST have now been addressed

Improves codegen for Issue #53379
2022-01-25 00:24:06 +00:00
Simon Pilgrim 11bb4a1111 [X86] combinePredicateReduction - split vXi16 allof(cmpeq()) to vXi8 allof(cmpeq())
vXi16 allof(cmp()) reduction patterns will have to pack the comparison results to vXi8 to use PMOVMSKB.

If we're reducing cmpeq(), then we can compare the vXi8 halves directly - similar to what we already do for vXi64 -> vXi32 for cases without PCMPEQQ.
2022-01-24 22:43:29 +00:00
Simon Pilgrim 8d298355ca [X86] combineSetCCMOVMSK - detect and(pcmpeq(),pcmpeq()) ptest pattern.
Handle cases where we've split an allof(cmpeq()) pattern to a legal vector type
2022-01-24 21:42:03 +00:00
Simon Pilgrim 6997f4d07f [X86] combineSetCCMOVMSK - fold allof(cmpeq(x,y)) -> ptest(sub(x,y)) (PR53379)
As suggested on PR53379, for all-of icmp-eq patterns, we can use ptest(sub(x,y)) on SSE41+ targets

This is a generalization of the existing allof(cmpeq(x,0)) -> ptest(x) pattern

We can probably extend this further, in particular to handle 256-bit cases on pre-AVX2 targets, but this part of the generalization is pretty trivial

Fixes Issue #53379
2022-01-24 16:44:37 +00:00
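The equivalence behind allof(cmpeq(x,y)) -> ptest(sub(x,y)) in 6997f4d07f can be modelled per lane. The sketch assumes 32-bit wrapping lanes and models only the zero-flag result of PTEST(v,v), which is set when v & v has no bits set; the function names are hypothetical.

```python
# Model of allof(cmpeq(x,y)) -> ptest(sub(x,y)) using lists of 32-bit lanes.
WIDTH = 32
MASK = (1 << WIDTH) - 1

def all_of_cmpeq(x, y):
    # Reference form: every lane pair compares equal.
    return all(a == b for a, b in zip(x, y))

def ptest_zf(a, b):
    # Zero flag of PTEST: set when (a & b) is zero across the whole vector.
    return all((p & q) == 0 for p, q in zip(a, b))

def all_of_via_ptest(x, y):
    # A wrapping per-lane subtract is zero exactly when the lanes are equal.
    diff = [(a - b) & MASK for a, b in zip(x, y)]
    return ptest_zf(diff, diff)

if __name__ == "__main__":
    print(all_of_via_ptest([1, 2, 3, 4], [1, 2, 3, 4]))
    print(all_of_via_ptest([1, 2, 3, 4], [1, 2, 3, 5]))
```

With y fixed to zero the subtract is a no-op, which recovers the existing allof(cmpeq(x,0)) -> ptest(x) pattern mentioned in the commit.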
Simon Pilgrim 577a6dc9a1 [X86] getVectorMaskingNode - fix indentation. NFC.
clang-format
2022-01-24 11:08:41 +00:00
Kazu Hirata bf039a8620 [Target] Use range-based for loops (NFC) 2022-01-23 22:53:15 -08:00
Simon Pilgrim 4762c077e7 [X86] LowerFunnelShift - always lower vXi8 fshl by constant amounts as unpack(y,x) << zext(z)
This can always be lowered as PMULLW+PSRLWI+PACKUSWB
2022-01-23 21:35:05 +00:00
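The lowering in 4762c077e7 can be illustrated on a single i8 element. The sketch below is a scalar model under the assumption that unpack(y,x) places y in the low byte and x in the high byte of a 16-bit lane; the multiply stands in for PMULLW, the shift for PSRLWI, and the final mask for PACKUSWB (which cannot saturate here because the value already fits in 8 bits). Function names are hypothetical.

```python
# Scalar model of i8 fshl(x, y, z) lowered as unpack(y,x) << zext(z).

def fshl8(x, y, z):
    # Reference funnel shift left: top 8 bits of (x:y) shifted left by z % 8.
    z %= 8
    if z == 0:
        return x
    return ((x << z) | (y >> (8 - z))) & 0xFF

def fshl8_via_unpack(x, y, z):
    wide = (x << 8) | y                        # unpack(y, x) into an i16 lane
    wide = (wide * (1 << (z % 8))) & 0xFFFF    # PMULLW by 2**z (a left shift)
    return (wide >> 8) & 0xFF                  # PSRLWI by 8, then pack to i8

if __name__ == "__main__":
    print(hex(fshl8(0x81, 0x80, 1)), hex(fshl8_via_unpack(0x81, 0x80, 1)))
```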
Simon Pilgrim 32dc14f876 [X86] LowerFunnelShift - use supportedVectorShiftWithBaseAmnt to check for supported scalar shifts
Allows us to reuse the ISD shift opcode instead of a mixture of ISD/X86ISD variants
2022-01-23 21:13:58 +00:00
David Green b27e5459d5 [DAG] Convert truncstore(extend(x)) back to store(x)
Pulled out of D106237, this folds truncstore(extend(x)) back to store(x)
if the original store was legal. This can come up due to the order we
fold nodes. A fold from X86 needs to be adjusted to prevent infinite
loops, to have it pick the operand of a trunc more directly.

Differential Revision: https://reviews.llvm.org/D117901
2022-01-22 13:20:36 +00:00
Simon Pilgrim 866311e71c [X86] lowerToAddSubOrFMAddSub - lower 512-bit ADDSUB patterns to blend(fsub,fadd)
AVX512 doesn't provide an ADDSUB instruction, but if we've built this from a build vector of scalar fsub/fadd elements we can still lower to blend(fsub,fadd)
2022-01-20 15:16:05 +00:00
Simon Pilgrim 304cfc706a [X86] combineConcatVectorOps - remove superfluous Subtarget.hasAVX() check
This function only ever gets called by AVX targets, and we already assert for this at the top of the function
2022-01-20 12:56:09 +00:00
Simon Pilgrim c4f5fd76da [X86] combineConcatVectorOps - add handling for X86ISD::VSHL/VSRL/VSRA
These can be handled the same as the vector shift by immediate variants that are already handled.
2022-01-20 12:56:08 +00:00
Luo, Yuanke 5dea7a865e Combine to vpdpbusd when operand is constant and small enough.
Differential Revision: https://reviews.llvm.org/D116363
2022-01-20 11:10:49 +08:00
Simon Pilgrim a8890995ee [X86][AVX] LowerFunnelShift - improve FSHL/FSHR per-element lowering
Similar to LowerRotate, see if we can either unpack or extend to a wider type and use that type's per-element shift instruction
2022-01-19 10:15:43 +00:00
Simon Pilgrim ce2345d8c1 [X86] getTargetShuffleInputs - ensure we limit the maximum recursion depth to match SelectionDAG::MaxRecursionDepth
Regressions were pre-handled by rG62e36b120749

Fixes Issue #52960
2022-01-18 15:25:21 +00:00
Simon Pilgrim 62e36b1207 [X86] canLowerByDroppingEvenElements - generalize to drop even or odd elements
This allows us to match shuffle<1,3,5,7,9,11,13,15> style shift+trunc/pack patterns as well as the existing shuffle<0,2,4,6,8,10,12,14> style shuffle trunc/pack patterns

In the future, interleaving patterns might benefit from an even more general implementation for higher strides
2022-01-18 15:07:24 +00:00
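The two mask shapes named in 62e36b1207 correspond to a plain truncation and a shift followed by a truncation. The sketch below models this on sixteen 16-bit elements viewed as eight little-endian 32-bit lanes; the helper names are hypothetical and the model says nothing about which PACK or truncate instruction is actually selected.

```python
# Model: shuffle<0,2,4,...> == trunc(wide), shuffle<1,3,5,...> == trunc(wide >> 16).
BITS = 16
NARROW_MASK = (1 << BITS) - 1

def shuffle(elts, mask):
    # Select elements by index, like a single-input shuffle mask.
    return [elts[i] for i in mask]

def as_wide(elts):
    # Reinterpret pairs of 16-bit elements as little-endian 32-bit lanes.
    return [elts[i] | (elts[i + 1] << BITS) for i in range(0, len(elts), 2)]

def trunc(wide):
    # Keep the low 16 bits of each wide lane.
    return [w & NARROW_MASK for w in wide]

def shift_trunc(wide):
    # Shift the high half down, then truncate.
    return trunc([w >> BITS for w in wide])

if __name__ == "__main__":
    elts = list(range(0x100, 0x110))
    print(shuffle(elts, range(0, 16, 2)) == trunc(as_wide(elts)))
    print(shuffle(elts, range(1, 16, 2)) == shift_trunc(as_wide(elts)))
```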
Simon Pilgrim c41ca1be7d [X86] LowerFunnelShift - enable vXi32 handling 2022-01-15 15:03:24 +00:00
Fangrui Song 254302021b [X86] Fix -Wunused-lambda-capture 2022-01-14 10:07:20 -08:00
Simon Pilgrim 67076ebb60 [X86][AVX] lowerShuffleAsLanePermuteAndShuffle - don't split repeated mask patterns
Generalize 57a551a8df - if the inlane mask is a repeated mask, we're better off performing the lane permute instead of splitting
2022-01-14 17:10:37 +00:00
Simon Pilgrim 9b72e0f9a2 [X86] combineConcatVectorOps - fold concat(permilpd(x),permilpd(y)) -> permilpd(concat(x,y)) 2022-01-14 15:48:57 +00:00
Simon Pilgrim 7500b4c7e4 [X86] combineConcatVectorOps - fold concat(movs*dup(x),movs*dup(y)) -> movs*dup(concat(x,y)) 2022-01-14 15:48:56 +00:00
Simon Pilgrim 7d0ea3f41a [X86] combineConcatVectorOps - fold concat(movddup(x),movddup(y)) -> movddup(concat(x,y))
For AVX2+ targets this requires us to also recognise v4f64 concat(broadcast(x),broadcast(y)) -> movddup(concat(x,y))
2022-01-14 14:49:57 +00:00