Commit Graph

8130 Commits

Author SHA1 Message Date
Simon Pilgrim 63e3035dbe [X86] LowerGC_TRANSITION - remove redundant SDLoc(). 2022-06-07 10:57:58 +01:00
Shilei Tian 0c3e6e5717 [NFC] Remove trailing whitespace 2022-06-06 18:59:13 -04:00
Eric Christopher 93cb6b9c83 Revert "[X86] combineConcatVectorOps - add support for concatenation VSELECT/BLENDV nodes"
See the original commit for a testcase.

This reverts commit ea8fb3b601.
2022-06-03 12:31:11 -07:00
Simon Pilgrim de2b543505 [X86] LowerVSETCC - merge getConstant() calls with flipped/unflipped sign masks. NFCI. 2022-06-01 15:09:48 +01:00
Sanjay Patel 3a503a4a9c [x86] fix miscompile from wrongly identified fneg
We may need to peek through a bitcast when identifying an fneg idiom
via its pool constant, but we can't allow a different-sized constant
in that match.

This is noted in issue #55758 with an example that needs fast-math,
but as the test here shows, this has potential to miscompile more
generally (no fast-math required).

Differential Revision: https://reviews.llvm.org/D126775
2022-06-01 09:56:33 -04:00
Simon Pilgrim f6dbb0b6fb [X86] Fix typo in extraction type introduced in rGed0303aa2251e4484a2b4ff7f236c9f7cdfb2092
It doesn't look like we have test coverage for this at the moment :(
2022-06-01 12:31:27 +01:00
Simon Pilgrim ea8fb3b601 [X86] combineConcatVectorOps - add support for concatenation VSELECT/BLENDV nodes
If the LHS/RHS selection operands can be cheaply concatenated back together, then replace two 128-bit selection nodes with a single 256-bit node.

Addresses the regression introduced in the bug fix from rGd5af6a38082b39ae520a328e44dc29ebcb036bb2
2022-06-01 10:46:06 +01:00
Simon Pilgrim d5af6a3808 [X86] LowerMINMAX - split v4i64 types on AVX1 targets (Issue #55648)
Originally we tried to use default expansion for v4i64 types to make it easier to concatenate the results back together, but this can cause infinite loop issues with existing VSELECT splitting code in narrowExtractedVectorSelect if we have other uses of the VSELECT results (e.g. reduction patterns).

To fix the infinite loop, this patch always splits MIN/MAX v4i64 nodes during lowering and I've added a TODO for combineConcatVectorOps to investigate when we can cheaply concatenate VSELECT/BLENDV nodes together.

Fixes #55648 - regression test case will be added in a follow up.
2022-05-31 17:28:56 +01:00
Simon Pilgrim af0113cf77 [X86] combineEXTRACT_SUBVECTOR - pull out repeated getVectorNumElements() calls. NFC. 2022-05-31 16:13:54 +01:00
Simon Pilgrim b9443cb6fa [X86] narrowExtractedVectorSelect - don't peek through bitcasts to find source vector
We don't seem to need this for any test coverage, and it was making it more difficult to track the uses() of the source vector.

Noticed while investigating Issue #55648
2022-05-31 14:57:18 +01:00
Simon Pilgrim ed0303aa22 [X86] LowerTRUNCATE - avoid creating extract_subvector(bitcast(vec)) patterns
We have a generic DAG combine to attempt to fold extract_subvector(bitcast(vec)) -> bitcast(extract_subvector(vec)) but if we create these patterns late in lowering then we often miss them.

Noticed while investigating Issue #55648 which gets caught in an infinite loop trying to split extract_subvector(bitcast(vselect())) patterns - this doesn't fix the issue yet but reduces the regressions from the WIP fix.
2022-05-31 14:30:56 +01:00
Zongwei Lan ad73ce318e [Target] use getSubtarget<> instead of static_cast<>(getSubtarget())
Differential Revision: https://reviews.llvm.org/D125391
2022-05-26 11:22:41 -07:00
Craig Topper 06fee478d2 [X86] Add isSimple check to the load combine in combineExtractVectorElt.
I think we need to be sure the load isn't volatile before we
duplicate and shrink it.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126353
2022-05-25 09:11:11 -07:00
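A minimal sketch of the guard described above, assuming the usual SelectionDAG combine context (the surrounding code and the Src value are illustrative, not the actual combineExtractVectorElt body):

    // Only duplicate/shrink a load that is "simple" (neither volatile nor
    // atomic); duplicating a volatile load would change observable behavior.
    if (auto *Ld = dyn_cast<LoadSDNode>(Src)) {
      if (!Ld->isSimple())
        return SDValue(); // bail out: unsafe to duplicate or shrink
      // ... proceed to build the narrowed scalar load ...
    }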
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
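A small illustration of the new behavior, assuming LLVM's ADT headers are available (same-width zext is now a legal no-op, so the OrSelf variants are redundant):

    #include "llvm/ADT/APInt.h"
    #include <cassert>

    void demo() {
      llvm::APInt X(32, 42);
      llvm::APInt Same = X.zext(32); // same-width extend: now a legal no-op
      llvm::APInt Wide = X.zext(64); // ordinary widening still works
      assert(Same == X && Wide.getBitWidth() == 64);
    }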
Simon Pilgrim 320545b577 [X86] Rename combineCONCAT_VECTORS\INSERT_SUBVECTOR\EXTRACT_SUBVECTOR to match Opcode name. NFCI.
It's a lot easier to quickly search for the combine when it actually contains the name of the opcode it combines.
2022-05-17 18:37:53 +01:00
Simon Pilgrim c64f5d44ad [X86] Attempt to fold EFLAGS into X86ISD::ADD/SUB ops
We already use combineAddOrSubToADCOrSBB to fold extended EFLAGS results into ISD::ADD/SUB ops as X86ISD::ADC/SBB carry ops.

This patch extends that to also try to fold EFLAGS results into X86ISD::ADD/SUB ops.

Differential Revision: https://reviews.llvm.org/D125642
2022-05-17 10:59:24 +01:00
Simon Pilgrim b3077f563d [X86] Move combineAddOrSubToADCOrSBB earlier. NFC.
Make it easier to reuse in X86 ADD/SUB combines in an upcoming patch.
2022-05-15 22:06:33 +01:00
Simon Pilgrim fd1f0c51ef [X86] lowerShuffleAsLanePermuteAndSHUFP always succeeds, so just return the result. NFC. 2022-05-15 15:53:36 +01:00
Simon Pilgrim c0f59be358 [X86] Pull out repeated isShuffleMaskInputInPlace calls. NFC. 2022-05-15 15:35:09 +01:00
Simon Pilgrim 32162cf291 [X86] lowerV4I64Shuffle - try harder to lower to PERMQ(BLENDD(V1,V2)) pattern 2022-05-15 14:57:58 +01:00
Simon Pilgrim bc90bbb759 [X86] LowerAVG - fix cut+paste typo. NFC. 2022-05-14 17:42:09 +01:00
Simon Pilgrim 98f82d69bd [X86] LowerStore - use is64BitVector() wrapper. NFCI. 2022-05-13 15:30:18 +01:00
Matthias Braun cd19af74c0 Avoid 8 and 16bit switch conditions on x86
This adds a `TargetLoweringBase::getSwitchConditionType` callback to
give targets a chance to control the type used in
`CodeGenPrepare::optimizeSwitchInst`.

Implement callback for X86 to avoid i8 and i16 types where possible as
they often incur extra zero-extensions.

This is NFC for non-X86 targets.

Differential Revision: https://reviews.llvm.org/D124894
2022-05-10 10:00:10 -07:00
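A sketch of what such an override might look like; the hook name and signature are an assumption paraphrased from the commit message, so check TargetLoweringBase in the tree for the real declaration:

    // Widen narrow switch conditions to i32 to avoid repeated zero-extends.
    MVT X86TargetLowering::getSwitchConditionType(LLVMContext &Context,
                                                  EVT ConditionVT) const {
      if (ConditionVT.getSizeInBits() < 32)
        return MVT::i32;
      return getRegisterType(Context, ConditionVT); // default behaviour
    }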
Simon Pilgrim 980f41d7c4 [X86] (style) Use auto for dyn_cast<> results 2022-05-01 17:15:18 +01:00
Simon Pilgrim d4f06ec874 [X86] (style) Don't use auto for non obvious types 2022-05-01 17:10:21 +01:00
Simon Pilgrim 92235e3bf4 [X86] lowerShuffleAsRepeatedMaskAndLanePermute - permit 32-bit sublane permute for unary v32i8 cases
Increase the likelihood that we can lower to a permd(pshufb()) pattern, but only after we've first attempted 64-bit sublane permutes.

Fixes #55066
2022-04-30 11:00:28 +01:00
Simon Pilgrim b424055b52 [X86] lowerShuffleAsRepeatedMaskAndLanePermute - move the sublane split code into a lambda helper. NFC.
This is an NFC cleanup as part of the work on #55066 - the idea being that we will be able to check for multiple sub-lane scales.
2022-04-29 16:03:50 +01:00
Simon Pilgrim 3562f855b7 [X86] SimplifyDemandedVectorEltsForTargetNode - fold (uniform) shift(0,x) -> 0 2022-04-29 12:08:47 +01:00
Simon Pilgrim 336a1233b2 [X86] SimplifyDemandedVectorEltsForTargetNode - fold shift(0,x) -> 0 2022-04-29 11:32:54 +01:00
Simon Pilgrim 6c44e398ec [X86] combineShuffle - reuse SDLoc. NFCI. 2022-04-29 10:30:11 +01:00
Simon Pilgrim 2d7f0b1c22 [X86] Fold ANDNP(undef,x)/ANDNP(x,undef) -> 0
Matches the fold in DAGCombiner::visitANDLike.
2022-04-29 10:20:48 +01:00
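The fold above as an illustrative fragment (N an X86ISD::ANDNP node; the surrounding combine context is assumed):

    // ANDNP(undef,x) / ANDNP(x,undef) -> 0, mirroring DAGCombiner::visitANDLike.
    SDValue N0 = N->getOperand(0), N1 = N->getOperand(1);
    if (N0.isUndef() || N1.isUndef())
      return DAG.getConstant(0, SDLoc(N), N->getValueType(0));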
Simon Pilgrim ab17ed0723 [X86] Don't fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) on BMI2 targets
With BMI2 we have SHRX which is a lot quicker than regular x86 shifts.

Fixes #55138
2022-04-28 21:28:16 +01:00
Simon Pilgrim 9e3b7e8e65 [X86] getTargetVShiftByConstNode - use SelectionDAG::FoldConstantArithmetic to perform constant folding. NFCI.
Remove some unnecessary code duplication.
2022-04-28 17:10:20 +01:00
Simon Pilgrim de7cee24b6 [X86] getBT - attempt to peek through aext(and(trunc(x),c)) mask/modulo
Ideally we'd fold this with generic DAGCombiner, but that only works for !isTruncateFree cases - we might be able to adapt IsDesirableToPromoteOp to find truncated src ops in the future, but for now just use this peephole.

Noticed in Issue #55138
2022-04-28 16:10:26 +01:00
Simon Pilgrim ed8dffef4c [X86] getFauxShuffle - don't assume an UNDEF src element for AND/ANDNP results in an UNDEF shuffle mask index
The other src element might be zero, guaranteeing a zero result.

Fixes #55157
2022-04-28 12:32:58 +01:00
Simon Pilgrim e378577524 [X86] Use is128BitLaneRepeatedShuffleMask wrapper. NFC.
We don't need to know the actual repeated mask.
2022-04-27 21:09:57 +01:00
Simon Pilgrim 03482bccad [X86] collectConcatOps - add ability to collect from vector 'widening' patterns
Recognise insert_subvector(undef, x, lo/hi) patterns where we double the width of a vector - creating an UNDEF subvector on the fly.
2022-04-27 15:38:58 +01:00
Xiang1 Zhang c430f0f532 [X86] Add use condition for combineSetCCMOVMSK
Reviewed by RKSimon, LuoYuanke

Differential Revision: https://reviews.llvm.org/D123652
2022-04-26 16:42:50 +08:00
Simon Pilgrim e8305c0b8f [X86] combineX86ShuffleChain - don't fold to truncate(concat(V1,V2)) if it was already a PACK op
Fixes #55050
2022-04-25 17:13:44 +01:00
Craig Topper c6fdb1de47 [X86] Move some hasOneUse checks after checking what the opcode is.
Calling hasOneUse can be expensive on nodes with multiple results.
Especially when some results are Chains. By checking the opcode first,
we can avoid walking the uses if it isn't an interesting node,
and thus avoid calling hasOneUse on a node that might have many uses.

Found by profiling the IR given in D123857.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D123881
2022-04-16 14:18:58 -07:00
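The check-ordering idea above, as an illustrative fragment (the opcode and names are placeholders):

    // The opcode test is a single field compare -- do it first.
    if (Op.getOpcode() != ISD::TRUNCATE)
      return SDValue();
    // hasOneUse() may walk a long use list (especially with chain results),
    // so only pay for it once the node is known to be interesting.
    if (!Op.hasOneUse())
      return SDValue();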
Craig Topper 9d86bf825c [X86] Move hasOneUse check after opcode check. NFC
Checking opcode is cheap. hasOneUse might not be if the node has
multiple results. By checking the opcode we can rule out nodes
with multiple results we aren't interested in.
2022-04-15 17:20:57 -07:00
Liu, Chen3 bf60a5af0a [X86] Convert unsigned int 0 to floating-point with the FILD instruction.
An unsigned int 0 will be converted to float/double -0.0 when the rounding
mode is set to 'FE_DOWNWARD'. Use the FILD instruction instead of SSE
instructions on 32-bit targets if strictfp is enabled.

Differential Revision: https://reviews.llvm.org/D123660
2022-04-13 20:06:15 +08:00
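A standalone C++ demonstration of the behavior being fixed (assumes an x86 host; a correct conversion prints +0 with a +inf reciprocal even under FE_DOWNWARD):

    #include <cfenv>
    #include <cstdio>

    int main() {
      std::fesetround(FE_DOWNWARD);  // the rounding mode from the commit
      unsigned u = 0;
      volatile float f = (float)u;   // must be +0.0, never -0.0
      std::printf("f = %g, 1/f = %g\n", (double)f, 1.0 / (double)f);
    }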
Simon Pilgrim 0488c6638b [X86] getFauxShuffleMask - remove use DemandedElts TODO
Most of the getTargetShuffleInputs recursive calls have now gone, and the remaining uses aren't likely to benefit from a DemandedElts mask.
2022-04-12 15:36:30 +01:00
Simon Pilgrim 1e803d305a Revert rG88ff6f70c45f2767576c64dde28cbfe7a90916ca "[X86] Extend vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) to include inner or(pshufb(x), pshufb(y)) chains"
Reverting while I investigate reports of internal test regressions/failures
2022-04-11 10:42:43 +01:00
Simon Pilgrim 88ff6f70c4 [X86] Extend vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) to include inner or(pshufb(x), pshufb(y)) chains 2022-04-10 13:04:53 +01:00
Simon Pilgrim c74d729bd6 [X86] combineExtractSubvector - fold extract_subvector(insert_subvector(V,X,C1),C1)
extract_subvector(insert_subvector(V,X,C1),C1) -> insert_subvector(extract_subvector(V,C1),X,0)

More aggressively attempt to reduce the width of an extract_subvector source - we currently only do this if we're inserting into a zero vector (i.e. canonicalizing to the AVX implicit zero upper elts pattern).

But if we're extracting from the same point as the inner insert_subvector then the fold is still relatively trivial - we can probably do even better if we can ensure the subvector isn't badly split.
2022-04-10 11:03:08 +01:00
Simon Pilgrim 30a01bccda [X86] Fold concat(pshufb(x,y),pshufb(z,w)) -> pshufb(concat(x,z),concat(y,w)) 2022-04-09 16:05:50 +01:00
Simon Pilgrim 97ee923248 [X86] lowerV64I8Shuffle - attempt to fold to SHUFFLE(ALIGNR(X,Y)) and OR(PSHUFB(X),PSHUFB(Y)) 2022-04-09 14:09:39 +01:00
Simon Pilgrim 3d4bb78fbe [X86][SSE] combineSelect - more aggressively create zero elements in the or(pshufb(x), pshufb(y)) fold
When we fold vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)), ensure we convert all undef elements to zero elements - this should help us expose more known zero elements for deeper chains of these cases.

Noticed while triaging Issue #54819
2022-04-09 12:53:00 +01:00
chenglin.bi f72b3a506b [x86] Replace getNodeIfExists with doesNodeExist when only checking whether a node exists
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123224
2022-04-08 00:33:05 +08:00
Wei Xiao 842d0bf931 [x86] Improve select lowering for smin(x, 0) & smax(x, 0)
smin(x, 0):
  (select (x < 0), x, 0) -> ((x >> (size_in_bits(x)-1))) & x

smax(x, 0):
  (select (x > 0), x, 0) -> (~(x >> (size_in_bits(x)-1))) & x
  Since the comparison tests for a positive value, we have to invert the sign
  bit mask, so we only do this transform if the target has a bitwise 'and not'
  instruction (making the invert free).

The transform is performed only when CMP has a single user to avoid
increasing the total instruction count.

https://alive2.llvm.org/ce/z/euUnNm
https://alive2.llvm.org/ce/z/37339J

Differential Revision: https://reviews.llvm.org/D123109
2022-04-07 15:53:24 +08:00
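The two identities above written out for int32_t (a minimal, self-checking sketch; it relies on arithmetic right shift of negative values, which mainstream x86 compilers provide):

    #include <cassert>
    #include <cstdint>

    int32_t smin0(int32_t x) { return (x >> 31) & x; }  // smin(x, 0)
    int32_t smax0(int32_t x) { return ~(x >> 31) & x; } // smax(x, 0)

    int main() {
      for (int32_t x : {-7, -1, 0, 1, 42}) {
        assert(smin0(x) == (x < 0 ? x : 0));
        assert(smax0(x) == (x > 0 ? x : 0));
      }
    }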
Matt Arsenault c4ea925f50 AtomicExpand: Change return type for shouldExpandAtomicStoreInIR
Use the same enum as the other atomic instructions for consistency, in
preparation for the addition of another strategy.

Introduce a new "Expand" option, since the store expansion does not
use cmpxchg. Alternatively, the existing CmpXChg strategy could be
renamed to Expand.
2022-04-06 22:34:04 -04:00
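Approximate shape of the API after this change (paraphrased; the enumerator list here is abbreviated, not exhaustive):

    // In TargetLoweringBase (sketch):
    enum class AtomicExpansionKind { None, CmpXChg, Expand /* , ... */ };

    // Previously returned bool; now returns the shared strategy enum:
    // virtual AtomicExpansionKind
    // shouldExpandAtomicStoreInIR(StoreInst *SI) const;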
Roman Lebedev 9be6e7b0f2 [X86] `lowerBuildVectorAsBroadcast()`: with AVX512VL, allow i64->XMM broadcasts from constant pool
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D123221
2022-04-06 18:33:40 +03:00
Pierre Gousseau a3d5f1cf5d [x86] Fix infinite loop inside DAG combiner with lzcnt feature.
The issue affects targets supporting fast-lzcnt such as btver2.
This removes extraneous zext/trunc node insertions to fix the infinite
loop.
This fixes Issue https://github.com/llvm/llvm-project/issues/54694

Differential Revision: https://reviews.llvm.org/D122900

Reviewed By: RKSimon, spatel, lebedev.ri
2022-04-05 17:32:10 +01:00
Simon Pilgrim 623d4b5787 [X86] Support optional NOT stages in the AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) fold
Extension to D122891, peek through NOT() ops, adjusting the condcode as we go.
2022-04-04 10:51:26 +01:00
Simon Pilgrim fbfd78f7aa [X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow v16i32 sub-lane permutes for v64i8 shuffles
Without VBMI, we are better off permuting v16i32 sub-lanes, even though it's a variable shuffle, if it allows us to then shuffle v64i8 in-lane repeated masks (PSHUFB etc.)

Fixes #54658
2022-04-03 10:05:10 +01:00
Simon Pilgrim b8652fbcbb [X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) (RECOMMITTED)
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead, to avoid moving the shift amount into the ECX register and using slow x86 shift ops, etc.

Recommitted with a fix to ensure we zext/trunc the SETCC result to the original type.

Differential Revision: https://reviews.llvm.org/D122891
2022-04-01 16:59:06 +01:00
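The source pattern being matched, in plain C++: a bit extracted at a non-constant index, which x86 can implement as BT+SETCC rather than a variable shift through CL:

    #include <cassert>
    #include <cstdint>

    bool extract_bit(uint32_t x, uint32_t y) { return (x >> y) & 1u; }

    int main() {
      assert(extract_bit(0b1010u, 1) && !extract_bit(0b1010u, 2));
    }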
Simon Pilgrim 5a457bd2fa Revert rGa5f637bcbb7d1e08ce637f113fc117c3f4b2b110 "[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))"
Investigating a sanitizer-windows buildbot breakage
2022-04-01 16:48:24 +01:00
Simon Pilgrim 9afa6811ad [X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow 64-bit sublane shuffling on AVX512BW v64i8 shuffles
We were only performing this on 256-bit vectors on AVX2 targets

Noticed while triaging Issue #54658
2022-04-01 16:40:10 +01:00
Simon Pilgrim a5f637bcbb [X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead, to avoid moving the shift amount into the ECX register and using slow x86 shift ops, etc.

Differential Revision: https://reviews.llvm.org/D122891
2022-04-01 16:07:56 +01:00
Simon Pilgrim 3245cfb8d3 [X86] Add getBT helper node for attempting to create a X86ISD::BT node
Avoids repeating all the extension/legalization wrappers in every use
2022-04-01 11:48:25 +01:00
Simon Pilgrim 919b657080 Revert rGff2d1bb2b749bd8a5697c25d2380b7c97a59ae06 "[X86] Add getBT helper node for attempting to create a X86ISD::BT node"
Typo means that this doesn't return a value in all cases.
2022-04-01 11:21:00 +01:00
Simon Pilgrim ff2d1bb2b7 [X86] Add getBT helper node for attempting to create a X86ISD::BT node
Avoids repeating all the extension/legalization wrappers in every use
2022-04-01 11:12:23 +01:00
Simon Pilgrim cb5c4a5917 [X86] lowerV8I16Shuffle - use explicit SmallVector<SDValue, 4> width to avoid MSVC AVX alignment bug
As discussed on Issue #54645 - building llc with /AVX can result in incorrectly aligned structs
2022-04-01 10:54:24 +01:00
Simon Pilgrim 535211c3eb [X86] Remove redundant FIXME
lowerV64I8Shuffle has been extended a lot since this was added.
2022-03-31 18:05:52 +01:00
Simon Pilgrim fac1729924 [X86] lowerV64I8Shuffle - don't use lowerShuffleWithPERMV until we've tried simpler options
Shuffle combining will still lower to this with better fast cross-lane checks.

Noticed while triaging Issue #54658
2022-03-31 18:05:51 +01:00
Sanjay Patel 4a54e3eed3 [x86] try to replace 0.0 in fcmp with negated operand
This inverts a fold recently added to IR with:
3491f2f4b0

We can put -bidirectional on the Alive2 examples to show that
the reverse transforms work:
https://alive2.llvm.org/ce/z/8iVQwB

The motivation for the IR change was to improve matching to
'fabs' in IR (see https://github.com/llvm/llvm-project/issues/38828),
but it regressed x86 codegen for 'not-quite-fabs' patterns like
(X > -X) ? X : -X.
I.e., when there is no fast-math (nsz), the cmp+select is not a proper
fabs operation, but it does map nicely to the unusual NaN semantics
of MINSS/MAXSS.

I drafted this as a target-independent fold, but it doesn't appear to
help any other targets and seems to cause regressions for SystemZ at
least.

Differential Revision: https://reviews.llvm.org/D122726
2022-03-31 09:17:49 -04:00
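The 'not-quite-fabs' pattern in plain C++; without nsz it is not fabs (e.g. a NaN input compares false and yields -x), but it lines up with MAXSS, which returns its second operand on an unordered compare:

    #include <cstdio>

    float not_quite_fabs(float x) { return x > -x ? x : -x; }

    int main() {
      std::printf("%g %g\n", not_quite_fabs(-3.5f), not_quite_fabs(2.0f));
    }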
Simon Pilgrim 481b185620 [X86] combineCarryThroughADD - recognise X86ISD::ADD(AND(X,1),-1) pattern can be folded to X86ISD::BT
As mentioned on D122482, if we've generated a masked overflow test, see if we can fold it to X86ISD::BT to feed an X86ISD::ADC/SBB

Differential Revision: https://reviews.llvm.org/D122572
2022-03-31 09:52:55 +01:00
Simon Pilgrim 6697e3354f [X86] combineADC - fold ADC(C1,C2,Carry) -> ADC(0,C1+C2,Carry)
If we're not relying on the flag result, we can fold the constants together into the RHS immediate operand and set the LHS operand to zero, simplifying for further folds.

We could do something similar if the flag result is in use and the constant fold doesn't affect it, but I don't have any real test cases for this yet.

As suggested by @davezarzycki on Issue #35256

Differential Revision: https://reviews.llvm.org/D122482
2022-03-30 09:11:55 +01:00
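The arithmetic behind the fold above, as a self-checking sketch: with wrap-around arithmetic, C1 + C2 + carry equals 0 + (C1 + C2) + carry, so the constants can be merged into the RHS and the LHS zeroed:

    #include <cassert>
    #include <cstdint>

    uint32_t adc(uint32_t a, uint32_t b, bool carry) { return a + b + carry; }

    int main() {
      uint32_t C1 = 0x1234u, C2 = 0xFFFF0000u;
      for (bool W : {false, true})
        assert(adc(C1, C2, W) == adc(0, C1 + C2, W));
    }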
Simon Pilgrim 1ec109ec58 [X86] combineCarryThroughADD - remove unused peek through of SEXT/AEXT nodes. 2022-03-29 17:22:50 +01:00
Shao-Ce SUN 662b9fa02c [NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122557
2022-03-29 09:53:24 +08:00
Simon Pilgrim 8a1956dfa5 [X86] lowerV64I8Shuffle - attempt to match with lowerShuffleAsLanePermuteAndPermute
Fixes #54562
2022-03-28 17:21:27 +01:00
Phoebe Wang 674d52e8ce [X86] Refactor X86ScalarSSEf16/32/64 with hasFP16/SSE1/SSE2. NFCI
This is used for f16 emulation. We emulate f16 for SSE2 targets and
above. This refactoring makes future code cleaner.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D122475
2022-03-27 12:24:02 +08:00
Simon Pilgrim 43a969debd [X86] combineADC - pull out repeated dyn_cast<ConstantSDNode> calls. NFC. 2022-03-25 12:53:08 +00:00
Simon Pilgrim 3db858c58c [X86] combineAdd - fold ADD(ADC(Y,0,W),X) -> ADC(X,Y,W)
This also exposed a missed ADC canonicalization of constant ops to the RHS
2022-03-25 10:52:10 +00:00
Simon Pilgrim 33b214b711 [X86] combineSub - fold SUB(X,ADC(Y,0,W)) -> SBB(X,Y,W) 2022-03-24 18:00:00 +00:00
Simon Pilgrim 438ac282db [X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y) (REAPPLIED)
As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use an ADC/SBB op.

Reapply with extra type legality checks - LowerAndToBT was originally only used during lowering, now that it can occur earlier we might encounter illegal types that we can either promote to i32 or just bail.

Differential Revision: https://reviews.llvm.org/D122084
2022-03-21 21:37:42 +00:00
Nikita Popov 1533682839 Revert "[X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y)"
This reverts commit 81569f5b6e.

This causes a segfault when building consumer-typeset in
ReleaseLTO-g configuration:
https://llvm-compile-time-tracker.com/show_error.php?commit=81569f5b6ef531a48023f28133481262ee1509a3
2022-03-21 21:52:36 +01:00
Simon Pilgrim 5fd9451668 [X86][AVX512] lower1BitShuffle - fold broadcast(setcc(x,y)) -> setcc(broadcast(x),broadcast(y)) (PR52500)
AVX512 has excellent broadcast ops for everything but vXi1 bool vectors - so if we're broadcasting a comparison result, see if we can broadcast the comparison operands instead.
2022-03-21 17:42:49 +00:00
Simon Pilgrim b6e2832fc2 [X86] Don't fold SUB(X,SBB(0,0,W)) -> SUB(ADC(0,0,W),Y)
This will further fold to an AND(SETCC_CARRY(),1) pattern, which tends to prevent further folds.
2022-03-21 15:54:48 +00:00
Simon Pilgrim 315896d3ac [X86] Fold SUB(X,SBB(Y,Z,W)) -> SUB(ADC(X,Z,W),Y)
Prefer the commutable ADC over SBB to improve load folding opportunities
2022-03-21 14:20:46 +00:00
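The rewrite above in plain modular arithmetic: SBB(Y,Z,W) computes Y - Z - W and ADC(X,Z,W) computes X + Z + W, so X - (Y - Z - W) == (X + Z + W) - Y (a quick self-check below):

    #include <cassert>
    #include <cstdint>

    int main() {
      uint32_t X = 100, Y = 37, Z = 5;
      for (uint32_t W : {0u, 1u})
        assert(X - (Y - Z - W) == (X + Z + W) - Y);
    }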
Simon Pilgrim ed51e26ab4 [X86] combineAddOrSubToADCOrSBB - commute + neg subtraction patterns
Handle SUB(AND(SRL(Y,Z),1),X) -> NEG(SBB(X,0,BT(Y,Z))) folds

I'll address the X86 lost folded-load regressions in a follow-up patch
2022-03-21 13:55:35 +00:00
Simon Pilgrim 5e9365c5eb [X86] combineAddOrSubToADCOrSBB - bail for illegal types
Ensure we don't attempt to fold to illegal types to ADC/SBB nodes.

After D122084 its possible for ADD(X,AND(SRL(Y,Z),1) patterns to be matched before type legalization.
2022-03-21 13:31:21 +00:00
Simon Pilgrim 81569f5b6e [X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y)
As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use an ADC/SBB op.

Differential Revision: https://reviews.llvm.org/D122084
2022-03-21 10:57:12 +00:00
Simon Pilgrim 1ae3c4e948 [X86] combineAddOrSubToADCOrSBB - split to more cleanly handle commuted variants.
Split combineAddOrSubToADCOrSBB into wrapper (which handles ADDs with commuted args) and the real combine, which no longer has to account for commutation.

I'm intending to extend combineAddOrSubToADCOrSBB to detect patterns other than just X86ISD::SETCC, so we need to detect all patterns without detecting them as part of a commutation swap.
2022-03-20 09:14:21 +00:00
Shengchen Kan 076a9dc99a [X86][NFC] Rename hasCMOV() to canUseCMOV(), hasLAHFSAHF() to canUseLAHFSAHF()
To make them less like other feature functions.
This is a follow-up patch for D121978.
2022-03-20 12:00:25 +08:00
Craig Topper 57b41af838 [X86] Rename FeatureCMPXCHG8B/FeatureCMPXCHG16B to FeatureCX8/CX16 to match CPUID.
Rename hasCMPXCHG16B() to canUseCMPXCHG16B() to make it less like other
feature functions. Add a similar canUseCMPXCHG8B() that aliases
hasCX8() to keep similar naming.

Differential Revision: https://reviews.llvm.org/D121978
2022-03-19 12:34:06 -07:00
Simon Pilgrim 34110a7320 [X86] combineAddOrSubToADCOrSBB - pull out repeated Y.getOperand(1) calls. NFC. 2022-03-19 17:56:11 +00:00
Simon Pilgrim b90478d422 [X86] createShuffleMaskFromVSELECT - handle BLENDV constant masks as well as VSELECT constant masks
Handle constant masks for both vselect nodes (mask != 0) and blendv nodes (mask < 0)
2022-03-19 16:51:07 +00:00
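The two mask conventions above, reduced to a scalar predicate (an illustrative helper, not the actual createShuffleMaskFromVSELECT code):

    #include <cstdint>

    // VSELECT selects operand 1 where the mask element is nonzero;
    // BLENDV selects it where the sign bit is set (element < 0).
    bool laneSelectsOp1(int64_t MaskElt, bool IsBLENDV) {
      return IsBLENDV ? (MaskElt < 0) : (MaskElt != 0);
    }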
Simon Pilgrim a6c18bfbe3 [X86] combineSelect - don't constant fold BLENDV nodes like VSELECT
If a X86ISD::BLENDV op appears before legalization (in this test case due to the icmp_slt x, 0) its constant mask was being treated as a vselect mask (mask != 0) instead of blendv (mask < 0)

This just prevents constant folding entirely for non-VSELECT ops.
2022-03-19 16:31:19 +00:00
Simon Pilgrim 56ad791f46 [X86] LowerAndToBT - fold BT(NOT(X),Y) -> BT(X,Y) and flip the CondCode 2022-03-19 14:03:03 +00:00
Simon Pilgrim c7ba5a9aff [X86][SSE] Add initial support for extracting non-constant bool vector elements
We can use MOVMSK+TEST/BT to extract individual bool elements even if the index isn't constant

This relies on combineBitcastvxi1 so some AVX512 cases still aren't optimized as they avoid MOVMSK usage.
2022-03-19 13:31:05 +00:00
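The MOVMSK+TEST/BT idea above in SSE intrinsics (a minimal sketch, assuming an SSE-capable x86 host): pack the compare result's sign bits into a GPR once, then any lane, even at a non-constant index, is one bit-test away:

    #include <immintrin.h>
    #include <cassert>

    bool get_lane(__m128 a, __m128 b, int i) {
      int mask = _mm_movemask_ps(_mm_cmplt_ps(a, b)); // 4 sign bits -> bits 0..3
      return (mask >> i) & 1;                         // TEST/BT-style extract
    }

    int main() {
      __m128 a = _mm_setr_ps(1, 5, 3, 7), b = _mm_setr_ps(2, 4, 6, 8);
      assert(get_lane(a, b, 0) && !get_lane(a, b, 1));
    }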
Shengchen Kan 920c2e5763 [X86][NFC] Rename target feature hasCMov->hasCMOV
This is a follow-up patch for D121975.
2022-03-18 14:05:52 +08:00
Craig Topper 6cfe41dcc8 [X86] Rename more target feature related things consistency. NFC
-Rename Mode*Bit to Is*Bit to match X86Subtarget.
-Rename FeatureLAHFSAHF to FeatureLAHFSAHF64 to match X86Subtarget.
-Use consistent capitalization

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D121975
2022-03-17 22:27:17 -07:00
Simon Pilgrim e3deb7d88b [X86] computeKnownBitsForTargetNode - add X86ISD::AND KnownBits handling
Fixes #54171
2022-03-16 11:05:36 +00:00
Shengchen Kan 052d37dc7c [NFC][X86] Rename some variables and functions about target features
This is preparation for D121768. The member's name should align with
the interface for the trivial target feature.
2022-03-16 13:08:52 +08:00
Simon Pilgrim f591231cad [X86] combineSelect - canonicalize (vXi1 bitcast(iX Cond)) with combineToExtendBoolVectorInReg before legalization
This replaces the attempt in 20af71f8ec to use combineToExtendBoolVectorInReg to create X86ISD::BLENDV masks directly, instead we use it to canonicalize the iX bitcast to a sign-extended mask and then truncate it back to vXi1 prior to legalization breaking it apart.

Fixes #53760
2022-03-15 12:16:11 +00:00
Simon Pilgrim ad3a7654dc [X86] combineCMP - peek through zero-extensions for X86cmp(zext(x0),0) zero tests (PR38960)
If we're comparing a value against zero, strip away any zero-extension and perform the comparison on the pre-extended value

Fixes #38308

Differential Revision: https://reviews.llvm.org/D121472
2022-03-13 11:38:40 +00:00
Simon Pilgrim e4ab2024a6 [X86] convertIntLogicToFPLogic - enable fp-logic on pre-AVX targets for supported fp predicates (PR34563)
If the SETCC fp-condcode is supported on SSE as a single CMPPS/PD op then we can use convertIntLogicToFPLogic to reduce EFLAGS and XMM->GPR traffic like we do for AVX targets.

Differential Revision: https://reviews.llvm.org/D121210
2022-03-08 18:06:27 +00:00
Simon Pilgrim 9119eefe5f [X86] Add cheapX86FSETCC_SSE helper. NFC.
Identify FP CondCode that can be performed by a non-AVX SSE CMP op

Pulled out of D121210
2022-03-08 18:06:27 +00:00