Commit Graph

7035 Commits

Author SHA1 Message Date
Simon Pilgrim e5e719d885 [X86][SSE] lowerV8I16Shuffle - lower compaction shuffles using PACKUSDW(PBLENDW,PBLENDW) on SSE41+
Similar to the lowerV16I8Shuffle implementation, for binary compaction v8i16 shuffles we can avoid the PUNPCKLDQ(PSHUFB,PSHUFB) pattern on SSE41+ targets by using PACKUSDW and PBLENDW. Before SSE41 we would need to use PACKSSDW, but that requires a sign extension that seems to destroy any gains, even on targets without PSHUFB.

This is a bigger gain on AMD than Intel targets but should never be a regression, and avoiding the shuffle mask load(s) is always useful.

Noticed in codegen while dealing with PR31443.
2020-04-04 13:08:25 +01:00
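
A minimal sketch of the pattern described above, using SSE4.1 intrinsics (the helper name is illustrative, not from the patch): PBLENDW against zero clears the odd 16-bit lanes so each 32-bit lane holds a zero-extended u16, which PACKUSDW can then pack without saturating.

  #include <immintrin.h>

  // Binary even-element compaction of two v8i16 inputs, following the
  // PACKUSDW(PBLENDW,PBLENDW) shape described above:
  // result = { a0,a2,a4,a6, b0,b2,b4,b6 }
  __m128i compact_even_epi16(__m128i a, __m128i b) {
    const __m128i zero = _mm_setzero_si128();
    __m128i lo = _mm_blend_epi16(a, zero, 0xAA); // zero odd lanes of a
    __m128i hi = _mm_blend_epi16(b, zero, 0xAA); // zero odd lanes of b
    return _mm_packus_epi32(lo, hi);             // u32 -> u16, never saturates here
  }
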
Simon Pilgrim 34a497b765 [X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations
Extend lowerShuffleWithPACK/matchShuffleWithPACK/createPackShuffleMask to handle compaction-style shuffle masks that can be lowered to chains of PACKSS/PACKUS if their inputs are suitably sign/zero extended.

This helps avoid PSHUFB (and its mask load) for short shuffle chains; shuffle combining will still replace this with a PSHUFB if we have enough shuffles, as getFauxShuffleMask should recognise the PACKSS/PACKUS chains.
2020-04-03 18:26:10 +01:00
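
As a rough illustration of chained PACKs with intrinsics (illustrative helper name; assumes the i32 inputs are already zero-extended so every value fits in a u8, meaning neither pack saturates):

  #include <immintrin.h>

  // 4:1 compaction i32 -> i8 as a chain of unsigned packs:
  // PACKUSDW (i32 -> u16) feeding PACKUSWB (i16 -> u8).
  __m128i truncate_epi32_to_epi8(__m128i a, __m128i b, __m128i c, __m128i d) {
    __m128i ab = _mm_packus_epi32(a, b); // SSE4.1
    __m128i cd = _mm_packus_epi32(c, d);
    return _mm_packus_epi16(ab, cd);     // SSE2
  }
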
Scott Constable 5b519cf1fc [X86] Add Indirect Thunk Support to X86 to mitigate Load Value Injection (LVI)
This pass replaces each indirect call/jump with a direct call to a thunk that looks like:

lfence
jmpq *%r11

This ensures that if the value in register %r11 was loaded from memory, then
the value in %r11 is (architecturally) correct prior to the jump.
Also adds a new target feature to X86: +lvi-cfi
("cfi" meaning control-flow integrity).
The feature can be enabled via the clang CLI using -mlvi-cfi.

This is an alternate implementation to https://reviews.llvm.org/D75934, which merges the thunk insertion functionality with the existing X86 retpoline code.

Differential Revision: https://reviews.llvm.org/D76812
2020-04-03 00:34:39 -07:00
Scott Constable 71e8021d82 [X86][NFC] Generalize the naming of "Retpoline Thunks" and related code to "Indirect Thunks"
There are applications for indirect call/branch thunks other than retpoline for Spectre v2, e.g.,

https://software.intel.com/security-software-guidance/software-guidance/load-value-injection

Therefore it makes sense to refactor X86RetpolineThunks as a more general capability.

Differential Revision: https://reviews.llvm.org/D76810
2020-04-02 21:55:13 -07:00
Craig Topper 4fdb63bbf0 [X86] Enable combineExtSetcc for vectors larger than 256 bits when we've disabled 512 bit vectors.
The compares are going to be type legalized to 256 bits, so we
might as well fold the extend.
2020-04-02 12:44:27 -07:00
Guillaume Chatelet fc63c4d8ce [Alignment][NFC] Remove remaining uses of MachineFrameInfo::setObjectAlignment
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77217
2020-04-01 14:38:05 +00:00
Simon Pilgrim eb8880562e [X86][SSE] combinePTESTCC - fold TESTZ(X,~Y) -> TESTC(Y,X) 2020-04-01 15:10:53 +01:00
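
A small demonstration of that fold with intrinsics (illustrative, assuming SSE4.1): both sides evaluate ((X & ~Y) == 0), so the rewrite saves the vector NOT.

  #include <immintrin.h>
  #include <cassert>

  void check_testz_not(__m128i x, __m128i y) {
    __m128i noty = _mm_xor_si128(y, _mm_set1_epi32(-1));
    // TESTZ(X,~Y): ZF = ((X & ~Y) == 0); TESTC(Y,X): CF = ((~Y & X) == 0)
    assert(_mm_testz_si128(x, noty) == _mm_testc_si128(y, x));
  }
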
Guillaume Chatelet 3a78f44daf [Alignment][NFC] Convert SelectionDAG::InferPtrAlignment to MaybeAlign
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77212
2020-04-01 13:22:11 +00:00
Simon Pilgrim 481413d394 [X86][SSE] matchShuffleWithPACK - generalize zero/signbits matching for any packed src type
First step toward making use of canLowerByDroppingEvenElements to match chains of PACKSS/PACKUS for compaction shuffles.

At the moment we still only match a single stage, but the MatchPACK is now more general.
2020-04-01 14:10:32 +01:00
Simon Pilgrim 918ccb64b0 [X86][SSE] Handle basic inversion of PTEST/TESTP operands (PR38522)
PTEST/TESTP sets EFLAGS as:
TESTZ: ZF = (Op0 & Op1) == 0
TESTC: CF = (~Op0 & Op1) == 0
TESTNZC: ZF == 0 && CF == 0

If we are inverting the 0'th operand of a PTEST/TESTP instruction we can adjust the comparisons to correctly handle the inversion implicitly.

Additionally, for the "TESTZ" (ZF) case, the all-ones pattern PTEST(X,-1) can be simplified to PTEST(X,X).

We can expand this for the TESTZ(X,~Y) pattern and also handle KTEST/KORTEST in the future.

Differential Revision: https://reviews.llvm.org/D76984
2020-04-01 11:33:28 +01:00
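
The operand-0 inversion can be sanity-checked with intrinsics (a sketch of the flag identities, not the combine itself; the function name is illustrative):

  #include <immintrin.h>
  #include <cassert>

  void check_ptest_inversion(__m128i x, __m128i y) {
    __m128i ones = _mm_set1_epi32(-1);
    __m128i notx = _mm_xor_si128(x, ones);
    assert(_mm_testz_si128(notx, y) == _mm_testc_si128(x, y)); // ZF(~X,Y) == CF(X,Y)
    assert(_mm_testc_si128(notx, y) == _mm_testz_si128(x, y)); // CF(~X,Y) == ZF(X,Y)
    assert(_mm_testz_si128(x, ones) == _mm_testz_si128(x, x)); // PTEST(X,-1) -> PTEST(X,X)
  }
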
Bjorn Pettersson ef49895da8 [X86] Do not assume types are legal in getFauxShuffleMask
Summary:
Make sure we do not assert on value types not being
simple in getFauxShuffleMask when analysing operations
such as "v8i16 = truncate v8i24".

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77136
2020-04-01 11:40:18 +02:00
Guillaume Chatelet c7468c1696 [Alignment][NFC] Use Align in SelectionDAG::getMemIntrinsicNode
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, nemanjai, hiraditya, kbarton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77149
2020-04-01 09:32:05 +00:00
Craig Topper f92563f907 [VectorUtils][X86] De-templatize scaleShuffleMask and 2 X86 shuffle mask helpers and move their implementation to cpp files
Summary: These were templated due to SelectionDAG using int masks for shuffles and IR using unsigned masks for shuffles. But now that D72467 has landed, we have an int mask version of IRBuilder::CreateShuffleVector, so just use int instead of a template.

Reviewers: spatel, efriedma, RKSimon

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D77183
2020-04-01 00:46:48 -07:00
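
For context, the semantics being de-templatized, sketched in plain C++ (assumed behaviour based on the description; the real helper lives in VectorUtils and this name is illustrative):

  #include <vector>

  // Widen each mask element M into Scale consecutive narrower elements
  // Scale*M .. Scale*M+Scale-1; undef (-1) expands to Scale copies of -1.
  // e.g. Scale=2: {1,0} -> {2,3,0,1}
  std::vector<int> scaleMask(int Scale, const std::vector<int> &Mask) {
    std::vector<int> Scaled;
    Scaled.reserve(Mask.size() * Scale);
    for (int M : Mask)
      for (int i = 0; i != Scale; ++i)
        Scaled.push_back(M < 0 ? -1 : Scale * M + i);
    return Scaled;
  }
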
Simon Pilgrim f9f401dba1 [X86][AVX] Add additional 256/512-bit test cases for PACKSS/PACKUS shuffle patterns
Also add a lowerShuffleWithPACK call to lowerV32I16Shuffle - shuffle combining was catching it, but we avoid a lot of temporary shuffle creations if we catch it at lowering first.
2020-04-01 08:19:03 +01:00
Simon Pilgrim 7e0e5fa499 Revert rGefe59d6717dcdf7777acb9b7a734e1a520bdf22a "[X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations"
This might be causing an issue on the fuchsia-x86_64-linux buildbot - reverting to see what happens.
2020-03-31 15:47:30 +01:00
Simon Pilgrim 38619fa7da Fix enumeral mismatch warning. NFCI.
Don't mix llvm::ISD and llvm::X86ISD.
2020-03-31 15:38:02 +01:00
Simon Pilgrim efe59d6717 [X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations
If canLowerByDroppingEvenElements indicates that the shuffle is an N:1 compaction pattern and the inputs are suitably sign/zero extended then we can use a chain of PACKSS/PACKUS to compact.

This helps avoid PSHUFB (and its mask load) for short shuffle chains; shuffle combining will still replace this with a PSHUFB if we have enough shuffles, as getFauxShuffleMask can recognise PACKSS/PACKUS chains.
2020-03-31 14:48:48 +01:00
Simon Pilgrim 98357dee1c [X86] Combine concat(palignr,palignr) -> palignr(concat,concat)
combineX86ShufflesRecursively should handle this someday.
2020-03-31 11:06:35 +01:00
Simon Pilgrim 7a4a98a9c4 [X86] Move canLowerByDroppingEvenElements earlier to be with matchShuffleWithPACK. NFCI.
Make sure it's defined earlier so more shuffle lowering methods can use it.
2020-03-31 10:56:35 +01:00
Guillaume Chatelet bdf77209b9 [Alignment][NFC] Use Align version of getMachineMemOperand
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77059
2020-03-30 15:46:27 +00:00
Simon Pilgrim e95d04f4f1 [X86][AVX] lowerV4X128Shuffle - attempt to widen to 2x256 to simplify shuffles
If we are lowering to X86ISD::SHUF128 we are going to lose track of individual 128-bit lanes that are UNDEF, so if we can widen these to guarantee that they are sequential with their neighbour, we should. This helps with later shuffle combines.
2020-03-30 12:22:26 +01:00
Simon Pilgrim 9c8ec99c80 [X86][AVX] Combine 128/256-bit lane shuffles with zeroable upper subvectors to EXTRACT_SUBVECTOR (PR40720)
As explained on PR40720, EXTRACTF128 is always as good as or better than VPERM2F128/SHUF128, and we can use the implicit zeroing of the uppers.
2020-03-29 19:51:38 +01:00
Simon Pilgrim 8206c50cde [X86] Add isAnyZero shuffle mask helper 2020-03-29 19:51:37 +01:00
Simon Pilgrim 7734e4b3a3 [X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720)
As explained on PR40720, EXTRACTF128 is always as good as or better than VPERM2F128, and we can use the implicit zeroing of the upper half.

I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.
2020-03-29 16:41:59 +01:00
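
The equivalence being exploited, sketched with AVX intrinsics (illustrative; imm 0x81 asks VPERM2F128 for the source's high lane with a zeroed upper half):

  #include <immintrin.h>

  // _mm256_permute2f128_ps(v, v, 0x81) -- high lane to lane 0, upper zeroed --
  // is just an EXTRACTF128 whose result lands in a zeroed 256-bit value.
  __m256 high_lane_zero_upper(__m256 v) {
    __m128 hi = _mm256_extractf128_ps(v, 1);
    return _mm256_insertf128_ps(_mm256_setzero_ps(), hi, 0);
  }
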
Simon Pilgrim da4c7db793 [X86] Rename matchShuffleAsByteRotate to matchShuffleAsElementRotate. NFC.
This was an inner helper function for the real matchShuffleAsByteRotate function, but it is more generic and is used directly for VALIGN lowering, which doesn't work at the byte level.
2020-03-29 16:41:58 +01:00
Simon Pilgrim 10439f9e32 [X86][AVX] Add X86ISD::VALIGN target shuffle decode support
Allows us to combine VALIGN instructions with other shuffles - the combiner doesn't create VALIGN yet though.
2020-03-29 16:41:58 +01:00
Craig Topper 9f7d4150b9 [X86] Move combineLoopMAddPattern and combineLoopSADPattern to an IR pass before SelectionDAG.
These transforms rely on a vector reduction flag on the SDNode
set by SelectionDAGBuilder. This flag exists because SelectionDAG
can't see across basic blocks so SelectionDAGBuilder is looking
across and saving the info. X86 is the only target that uses this
flag currently. By removing the X86 code we can remove the flag
and the SelectionDAGBuilder code.

This patch adds a dedicated IR pass for X86 that looks across the
blocks and transforms the IR into a form that the X86 SelectionDAG
can finish.

An advantage of this new approach is that we can enhance it to
shrink the phi nodes and final reduction tree based on the zeroes
that we need to concatenate to bring the partially reduced
vector back up to the original width.

Differential Revision: https://reviews.llvm.org/D76649
2020-03-26 14:10:20 -07:00
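
For reference, the kind of source loop this targets (an illustrative example, not from the patch): a widening i16 multiply with i32 accumulation, which X86 can vectorize with PMADDWD once the reduction is recognized.

  // sext(i16) * sext(i16) accumulated into i32 -- the "loop MAdd" shape.
  int dot_i16(const short *a, const short *b, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += a[i] * b[i];
    return sum;
  }
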
Simon Pilgrim ad36491ebb [X86] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on all targets
Extends rG9d1721ce3926 to support AVX2+ targets.
2020-03-26 20:46:24 +00:00
Simon Pilgrim 39a52a19ed [X86] lowerV16I8Shuffle - create v8i16 mask for PACKUS(AND(),AND()) patterns.
We can improve computeKnownBits results by avoiding excess bitcasts.

For this pattern we were doing:

  (v16i8 PACKUS(v8i16 BITCAST(v16i8 AND(V1, MASK)), v8i16 BITCAST(v16i8 AND(V2, MASK))))

By performing the MASK/AND with a v8i16 type and bitcasting V1/V2 directly, we can help computeKnownBits see that the mask is clearing the upper bits, which allows shuffle combining to peek through later on.

This will be necessary to extend rG9d1721ce3926 to AVX2+ targets in a future patch.
2020-03-26 19:59:57 +00:00
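
The preferred pattern, sketched with SSE2 intrinsics (illustrative helper name): performing the AND with a v8i16-typed mask makes it clear that the upper byte of every lane is zero, so PACKUSWB never saturates.

  #include <immintrin.h>

  // Binary truncation i16 -> i8 of two vectors via PACKUS(AND(),AND()).
  __m128i truncate_concat_epi16(__m128i v1, __m128i v2) {
    const __m128i mask = _mm_set1_epi16(0x00FF); // clear the upper byte of each lane
    return _mm_packus_epi16(_mm_and_si128(v1, mask), _mm_and_si128(v2, mask));
  }
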
Simon Pilgrim 9d1721ce39 [X86][SSE] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on pre-AVX2 targets
As discussed on PR31443, we should be trying to use PACKUS for binary truncation patterns to reduce the number of shuffles.

The plan is to support AVX2+ targets once we've worked around PR45315 - we fail to peek through a VBROADCAST_LOAD mask to recognise zero upper bits in a PACKUS pattern.

We should also be able to add support for v8i16 and possibly 256/512-bit vectors as well.
2020-03-26 15:47:43 +00:00
Simon Pilgrim e30d29ebc1 [X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT())
As long as we extract from a source vector with smaller elements and zero-extend the element in the final shuffle mask, we can safely peek through truncations and any/zero-extensions to find the source extraction.
2020-03-26 11:57:45 +00:00
Simon Pilgrim c6e5531f9b [X86][AVX] Combine shuffles to TRUNCATE/VTRUNC patterns
Add support for combining shuffles to AVX512 truncate instructions - another step toward fixing D56387/D66004. It also fixes SKX code on PR31443.

We could probably extend this further to handle non-VLX truncation cases.
2020-03-25 17:41:51 +00:00
Reid Kleckner 597718aae0 Re-land "Avoid emitting unreachable SP adjustments after `throw`"
This reverts commit 4e0fe038f4. Re-lands
65b21282c7.

After landing 5ff5ddd0ad to add int3 into
trailing unreachable blocks, we can now remove these extra stack
adjustments without confusing the Win64 unwinder. See
https://llvm.org/45064#c4 or X86AvoidTrailingCall.cpp for a full
explanation.

Fixes PR45064.
2020-03-24 12:04:43 -07:00
Simon Pilgrim 714402147d [X86][SSE1] Add support for logic+movmsk patterns (PR42870)
rL368506 handled the basic case, but we need to account for boolean logic patterns as well.
2020-03-24 14:28:40 +00:00
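
The shape of pattern meant here, in SSE1 intrinsics (illustrative): compare results combined with boolean logic before the MOVMSKPS test.

  #include <xmmintrin.h>

  // Logic on compare masks feeding MOVMSKPS (all-lanes-true check shown).
  bool all_lt(__m128 a, __m128 b, __m128 c, __m128 d) {
    __m128 m = _mm_and_ps(_mm_cmplt_ps(a, b), _mm_cmplt_ps(c, d));
    return _mm_movemask_ps(m) == 0xF;
  }
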
Guillaume Chatelet 3ba550a05a [Alignment][NFC] Use TFL::getStackAlign()
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: dylanmckay, sdardis, nemanjai, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76551
2020-03-23 13:48:29 +01:00
Craig Topper e2cb121374 [X86] Remove maximum vector length limit from combineBasicSADPattern.
createPSADBW uses SplitsOpsAndApply, so it should be able to handle
any size.

Restrict the extract result type to i32 or i64 since that's what
we have coverage for today and probably matches what the
isSimple() check gave us before.

Differential Revision: https://reviews.llvm.org/D76560
2020-03-22 15:02:05 -07:00
Craig Topper b89ae50795 [X86] Remove maximum vector width restriction from combineLoopSADPattern.
SplitsOpsAndApply will take care of any needed splitting correctly.
All that we need to check is that the vector element count is a
power of 2.

Differential Revision: https://reviews.llvm.org/D76558
2020-03-22 11:09:14 -07:00
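
For reference, the SAD loop shape in question (illustrative source, not from the patch): u8 absolute differences accumulated into i32, which lowers to PSADBW once matched.

  #include <cstdlib>

  int sad_u8(const unsigned char *a, const unsigned char *b, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += std::abs(a[i] - b[i]); // u8 operands promote to int here
    return sum;
  }
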
Simon Pilgrim 25eb9056d7 [X86] getTargetShuffleAndZeroables - add insert_subvector(undef, sub, c) handling.
We often widen xmm/ymm vectors to ymm/zmm by insertion into an undef base vector. By letting getTargetShuffleAndZeroables track the undef elts we can help avoid a lot of unnecessary cross-lane shuffles.

Fixes PR44694
2020-03-21 19:11:42 +00:00
Simon Pilgrim 4ceade0428 [X86] Combine concat(shufps,shufps) -> shufps(concat,concat)
Now that rG18c19441d105 has improved VPERM2X128 handling, we can perform this to improve x64->x32 truncation without introducing poor cross-lane shuffles.

Someday combineX86ShufflesRecursively will handle this, but we're still really bad at dealing with different vector widths.
2020-03-21 12:44:10 +00:00
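
The per-lane property behind this combine, sketched with intrinsics (illustrative): 256-bit SHUFPS shuffles each 128-bit lane independently, so concatenating two 128-bit SHUFPS results equals one 256-bit SHUFPS of the concatenated inputs (even-element mask shown, as in x64->x32 truncation).

  #include <immintrin.h>

  // concat(shufps(a0,b0), shufps(a1,b1)) == shufps(concat(a0,a1), concat(b0,b1))
  __m256 concat_shufps_even(__m128 a0, __m128 a1, __m128 b0, __m128 b1) {
    __m256 a = _mm256_set_m128(a1, a0); // low lane a0, high lane a1
    __m256 b = _mm256_set_m128(b1, b0);
    return _mm256_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0));
  }
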
Simon Pilgrim f424d51c3e Revert rGe6a7e3b5e3e7 "[X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles."
This reverts commit e6a7e3b5e3.

Avoids the register pressure regression reported at PR45263.
2020-03-21 12:14:19 +00:00
Craig Topper 32fbea1548 [X86] Prevent (bitcast (broadcast_load)) combine from producing vXf16 broadcast instructions.
The combine tries to put the broadcast in either the integer or
fp domain to match the bitcast domain. But we can only do this
if the broadcast size is 32 or larger.
2020-03-20 09:15:07 -07:00
Nico Weber 4e0fe038f4 Revert "Avoid emitting unreachable SP adjustments after `throw`"
This reverts commit 65b21282c7.
Breaks sanitizer bots (https://reviews.llvm.org/D75712#1927668)
and causes https://crbug.com/1062021 (which may or may not
be a compiler bug, not clear yet).
2020-03-17 20:49:22 -04:00
Simon Pilgrim c9656a3b31 [DAGCombiner] matchRotateSub - handle shift amount truncation
Under certain circumstances we'll end up in the position where the negated shift amount will get truncated to the type specified by getScalarShiftAmountTy(), so we need to test for a truncated version of the shift amount as well.

This allows us to remove half of the remaining patterns tested for by X86ISelLowering's combineOrShiftToFunnelShift.
2020-03-17 16:01:23 +00:00
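
For context, the source-level rotate idiom involved (a hedged example; the (32 - n) term is the negated amount that may be truncated to the shift-amount type during legalization):

  #include <cstdint>

  uint32_t rotl32(uint32_t x, uint32_t n) {
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31)); // OR-of-shifts rotate idiom
  }
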
Simon Pilgrim ebb181cf40 [X86] matchScalarReduction - add support for partial reductions
Add opt-in support for partial reduction cases by providing an optional partial mask to indicate which elements have been extracted for the scalar reduction.
2020-03-16 18:01:02 +00:00
Simon Pilgrim e43a085781 [X86] X86::isConstantSplat - enable partial undef bit handling by default.
We currently only ever use this for lowering constant uniform values (shift/rotate by immediate), so we can safely enable it by default (it treats the undef bits as zero when extracting constants).

This is necessary for an upcoming patch that will use SimplifyDemandedBits more aggressively on funnel shift amounts; without it that patch causes regressions in vXi64 constant cases.
2020-03-16 12:56:24 +00:00
Simon Pilgrim ac4609cb1d [X86] LowerRotate - use X86::isConstantSplat to detect constant splat rotation amounts.
Avoid code duplication and matches what we do for the similar LowerFunnelShift and LowerScalarImmediateShift methods.
2020-03-16 12:56:23 +00:00
Simon Pilgrim ee862adf60 Fix signed/unsigned comparison warning. 2020-03-14 18:42:27 +00:00
Simon Pilgrim 0cb2f089c1 [X86] getFauxShuffleMask - pull out repeated byte size variables. NFC. 2020-03-14 17:36:17 +00:00
Simon Pilgrim f47f4c137b [X86] getFauxShuffleMask - merge insertelement paths
Merge the INSERT_VECTOR_ELT/SCALAR_TO_VECTOR and PINSRW/PINSRB shuffle mask paths - they both do the same thing (find source vector + handle implicit zero extension). The PINSRW/PINSRB path also handled the insertion-of-zero case, which needed to be added to the general case as well.
2020-03-14 13:11:03 +00:00
Craig Topper 755e00876c [X86] Remove isel patterns for X86VBroadcast+trunc+extload. Replace with DAG combines.
This is a little more complicated than I'd like it to be. We have
to manually match a trunc+srl+load pattern that generic DAG
combine won't do for us due to isTypeDesirableForOp.
2020-03-13 18:12:16 -07:00