Commit Graph

Eli Friedman cfd2c5ce58 Untangle the mess which is MachineBasicBlock::hasAddressTaken().
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code.  Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.

Mixing these together causes a problem: when target-specific passes
mark arbitrary blocks "address-taken" and we also have a BlockAddress,
we can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.

So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
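
For reference, the IR-level sense is the one tied to `blockaddress` constants; a minimal illustrative sketch:

  define void @f() {
  entry:
    ; the address of %target escapes here, so the corresponding
    ; MachineBasicBlock must be identifiable from the BlockAddress
    indirectbr ptr blockaddress(@f, %target), [label %target]
  target:
    ret void
  }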

Discovered while trying to sort out related stuff on D102817.

Differential Revision: https://reviews.llvm.org/D124697
2022-08-16 16:15:44 -07:00
Bing1 Yu 807b8cb06c [X86] Fix a lowering issue of mask.compress which has undef float passthrough
Previously, LegalizeDAG didn't check that mask.compress's passthrough might be a float, which led to a getConstant crash since getConstant doesn't support FP

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131947
2022-08-16 17:54:45 +08:00
Simon Pilgrim a7b85e4c0c [X86] Freeze shl(x,1) -> add(x,x) vector fold (PR50468)
Vector fold shl(x,1) -> add(freeze(x),freeze(x)) to avoid the undef issues identified in PR50468
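
An IR-level sketch of the issue (the fold itself is on DAG nodes; values are illustrative):

  ; x << 1 must be even, but add(x, x) only guarantees that when both
  ; operands are the same value - an undef lane could differ per use
  %r0 = shl <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
  ; so freeze the operand first, then both uses see the same value
  %f  = freeze <4 x i32> %x
  %r1 = add <4 x i32> %f, %f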

Differential Revision: https://reviews.llvm.org/D106675
2022-08-15 16:17:21 +01:00
Simon Pilgrim 41bdb8cd36 [X86] Fold insert_vector_elt(undef, elt, 0) --> scalar_to_vector(elt)
I had hoped to make this a generic fold in DAGCombine, but there are quite a few regressions in Thumb2 MVE that need addressing first.
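
The IR shape being matched, roughly (the fold itself rewrites DAG nodes):

  ; the upper lanes are already undef, so this is just a scalar placed
  ; in lane 0 - scalar_to_vector at the DAG level
  %v = insertelement <4 x i32> undef, i32 %e, i64 0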

Fixes regressions from D106675.
2022-08-15 14:56:30 +01:00
Simon Pilgrim 8b47e29fa0 [X86] combineVectorShiftImm - fold (shl (add X, X), C) -> (shl X, (C + 1))
Noticed while investigating the regressions in D106675
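
A quick check of the identity with wrapping arithmetic (illustrative type and shift amount):

  %a  = add <4 x i32> %x, %x                            ; X * 2
  %r0 = shl <4 x i32> %a, <i32 3, i32 3, i32 3, i32 3>  ; (X * 2) << 3
  ; equals
  %r1 = shl <4 x i32> %x, <i32 4, i32 4, i32 4, i32 4>  ; X << 4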
2022-08-14 17:42:02 +01:00
Phoebe Wang 8b69549dc5 [X86][FP16] Promote FP16->[U]INT to FP16->FP32->[U]INT
This is to avoid f16->i64 being lowered to `__fixhfdi/__fixunshfdi` on 32-bit targets since neither libgcc nor compiler-rt provide them. https://godbolt.org/z/cjWEsea5v

It also helps to improve the performance by promoting the vector type.
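
In IR terms the promotion looks roughly like this (the extension to f32 is exact, so the conversion result is unchanged):

  ; before: on 32-bit targets this would become a __fixhfdi libcall
  %i0 = fptosi half %h to i64
  ; after: widen losslessly to f32, then convert
  %f  = fpext half %h to float
  %i1 = fptosi float %f to i64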

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D131828
2022-08-14 09:37:33 +08:00
Simon Pilgrim 6ba5fc2dee [X86] lowerShuffleWithVPMOV - support direct lowering to VPMOV on VLX targets
lowerShuffleWithVPMOV currently only matches shuffle(truncate(x)) patterns, but on VLX targets the truncate isn't usually necessary to make the VPMOV node worthwhile (as we're only targeting v16i8/v8i16 shuffles, we almost always end up with a PSHUFB node instead). PACKSS/PACKUS are still preferred over VPMOV due to their lower uop count.

Fixes the remaining regression from the fixes in rG293899c64b75
2022-08-11 17:40:07 +01:00
Simon Pilgrim 5dcf0c342b [X86] lowerShuffleWithVPMOV - remove oneuse constraints on shuffle(trunc(x),undef) -> vpmov(x) lowering
These were added in rG057bdd63 but shuffle combining has gotten a lot better at folding different vector widths since then.
2022-08-11 14:06:42 +01:00
aqjune 02e56e2533 [CodeGen] Generate efficient assembly for freeze(poison) version of `mm*_cast*` intel intrinsics
This patch makes the variants of `mm*_cast*` Intel intrinsics that use `shufflevector(freeze(poison), ..)` emit efficient assembly.
(These intrinsics are planned to use `shufflevector(freeze(poison), ..)` after shufflevector's semantics update; relevant thread: D103874)

To do so, this patch

1. Updates `LowerAVXCONCAT_VECTORS` in X86ISelLowering.cpp to recognize a `FREEZE(UNDEF)` operand of `CONCAT_VECTORS` in addition to `UNDEF`
2. Updates X86InstrVecCompiler.td to recognize `insert_subvector` with a `FREEZE(UNDEF)` vector as its first operand (see the sketch below).
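
For illustration, a sketch of the IR these intrinsic variants are expected to produce once the semantics update lands (values are illustrative):

  ; widening cast: the new upper lanes come from freeze(poison)
  %fp   = freeze <4 x i32> poison
  %wide = shufflevector <4 x i32> %v, <4 x i32> %fp,
                        <8 x i32> <i32 0, i32 1, i32 2, i32 3,
                                   i32 4, i32 5, i32 6, i32 7>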

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130339
2022-08-11 13:36:21 +09:00
Amaury Séchet 9bceb8981d [X86] (0 - SetCC) | C -> (zext (not SetCC)) * (C + 1) - 1 if we can get a LEA out of it.
This addresses various regressions in D131260, and is a useful optimization in itself.
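
Sanity-checking the identity on both boolean values, with C = 7 so that C + 1 = 8 is an LEA-friendly scale (illustrative scalar IR; SetCC is assumed to already be a 0/1 i32 value):

  %s = sub i32 0, %setcc      ; 0 - SetCC
  %o = or i32 %s, 7           ; (0 - SetCC) | 7
  ; becomes
  %n = xor i32 %setcc, 1      ; not SetCC for a 0/1 value
  %m = mul i32 %n, 8          ; * (C + 1)
  %r = add i32 %m, -1         ; - 1
  ; SetCC = 1: -1 | 7 = -1  and  0 * 8 - 1 = -1
  ; SetCC = 0:  0 | 7 =  7  and  1 * 8 - 1 =  7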

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131358
2022-08-10 15:12:00 +00:00
Simon Pilgrim b92c7dc211 [X86] Use DAG.getFreeze() to create freeze node. NFC. 2022-08-10 15:03:56 +01:00
Phoebe Wang c7ec6e19d5 [X86][BF16] Make backend type bf16 to follow the psABI
The X86 psABI has been updated to support the __bf16 type, whose ABI is the
same as FP16. See https://discourse.llvm.org/t/patch-add-optional-bfloat16-support/63149

This is an alternative of D129858, which has less code modification and
supports the vector type as well.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D130832
2022-08-10 08:58:56 +08:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Dawid Jurczak 1bd31a6898 [NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T
Extracted from https://reviews.llvm.org/D129781; addresses the comment:
https://reviews.llvm.org/D129781#3655571

Differential Revision: https://reviews.llvm.org/D130268
2022-08-05 13:35:41 +02:00
Craig Topper ff91b2d9df [X86] Promote i16 CTTZ/CTTZ_ZERO_UNDEF always.
If we're going to emit a rep prefix before bsf as proposed in
D130956, it makes sense to promote i16 operations to i32 to avoid
the false dependency of tzcntw.
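
A sketch of the usual promotion trick for cttz (setting a bit just above the original width keeps the result correct, including the zero-input case):

  declare i32 @llvm.cttz.i32(i32, i1)

  ; i16 cttz promoted to i32: or in bit 16 so an all-zero i16 input
  ; still yields cttz = 16
  %z = zext i16 %x to i32
  %o = or i32 %z, 65536
  %c = call i32 @llvm.cttz.i32(i32 %o, i1 false)
  %r = trunc i32 %c to i16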

Reviewed By: skan, pengfei

Differential Revision: https://reviews.llvm.org/D130995
2022-08-03 13:12:20 -07:00
David Truby 9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Phoebe Wang 23021d4d8c [X86][FP16] Fix vector_shuffle and lowering without f16c feature problems
The problem Alexander reported on D127982 was caused by an optimization
for AVX512-FP16 instructions. We must limit it to when the feature is enabled.

During the investigation, I found we didn't expand fp_round/fp_extend
without F16C. This may result in a runtime crash, so change them too.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130817
2022-08-02 22:26:41 +08:00
Simon Pilgrim acb5abb7d3 [X86] getFauxShuffleMask - use DemandedElts variant of getTargetShuffleInputs. NFCI.
We don't specify the demanded elts yet; this patch just rewires the getTargetShuffleInputs calls and passes an "all demanded elts" mask.
2022-07-31 12:15:04 +01:00
Simon Pilgrim 9cdba33337 [X86] combineX86ShufflesRecursively - determine demanded elts to pass to getTargetShuffleInputs
Only PACKSS/PACKUS faux shuffles make use of the demanded elts at the moment, but this at least improves the handling of a couple of truncation patterns.
2022-07-31 11:30:40 +01:00
Simon Pilgrim df457f583a [X86] Use std::tie so we can have more meaningful variable names for demanded bits/elts pairs. NFCI.
*.first + *.second were proving difficult to keep track of.
2022-07-30 18:57:15 +01:00
Simon Pilgrim a14f94c20c [X86] computeKnownBitsForTargetNode - out of range X86ISD::VSRAI doesn't fold to zero
Noticed by inspection and I can't seem to make a test case, but SSE arithmetic bit shifts clamp to the max shift amount (i.e. create a sign splat) - combineVectorShiftImm already does something similar.
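
A sketch via the SSE intrinsic, which has the same clamping behaviour as X86ISD::VSRAI (assuming the documented psrad semantics):

  declare <4 x i32> @llvm.x86.sse2.psrai.d(<4 x i32>, i32)

  ; shift amount 33 is out of range, but the hardware clamps it:
  ; every lane becomes a splat of its sign bit (0 or -1), not zero
  %r = call <4 x i32> @llvm.x86.sse2.psrai.d(<4 x i32> %x, i32 33)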
2022-07-30 17:55:39 +01:00
Simon Pilgrim 813459ed2b [X86] combineSelect fold 'smin' style pattern select(pcmpgt(RHS, LHS), LHS, RHS) -> select(pcmpgt(LHS, RHS), RHS, LHS) if pcmpgt(LHS, RHS) already exists
Avoids repeated commuted comparisons when we're performing min/max and clamp patterns
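
Both forms compute smin(LHS, RHS); written as IR for illustration:

  %c0 = icmp sgt <4 x i32> %rhs, %lhs
  %r0 = select <4 x i1> %c0, <4 x i32> %lhs, <4 x i32> %rhs
  ; commuted form - preferred when pcmpgt(LHS, RHS) already exists
  %c1 = icmp sgt <4 x i32> %lhs, %rhs
  %r1 = select <4 x i1> %c1, <4 x i32> %rhs, <4 x i32> %lhs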
2022-07-30 15:31:36 +01:00
Simon Pilgrim bc2c4f6c85 [X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X) (REAPPLIED)
If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.

Updated version - original version rG057db2002bb3 didn't correctly account for multiple uses of the mask that might be folding "OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y))" in canonicalizeBitSelect
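
The fold itself is De Morgan-style constant folding; with a known per-lane mask C (illustrative constant):

  ; ANDNP(C, X) computes ~C & X; with C = 15 per lane, ~C = -16
  %r = and <4 x i32> %x, <i32 -16, i32 -16, i32 -16, i32 -16>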
2022-07-29 15:12:26 +01:00
Florian Hahn f912bab111
Revert "[X86][DAGISel] Don't widen shuffle element with AVX512"
This reverts commit 5fb4134210.

This patch is causing crashes when building llvm-test-suite when
optimizing for CPUs with AVX512.

Reproducer crashing with llc:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-apple-macosx"

    define i32 @test(<32 x i32> %0) #0 {
    entry:
      %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
      %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1)
      ret i32 %2
    }

    ; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
    declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

    attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" }
    attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }
2022-07-28 15:26:42 +01:00
Luo, Yuanke 5fb4134210 [X86][DAGISel] Don't widen shuffle element with AVX512
Currently the X86 shuffle lowering would widen the element type for a
shuffle if the mask element values are adjacent. For the example below

  %t2 = add nsw <16 x i32> %t0, %t1
  %t3 = sub nsw <16 x i32> %t0, %t1
  %t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
                      <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
                       i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
                       i32 11, i32 12, i32 13, i32 14, i32 15>

  ret <16 x i32> %t4

The compiler would transform the shuffle to
  %t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
                      <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4,
                                 i32 5, i32 6, i32 7>
This may lose the opportunity to let ISel select a mask instruction when
avx512 is enabled.

This patch prevents the transform when the avx512 feature is enabled.
Thanks to Simon for the idea.

Differential Revision: https://reviews.llvm.org/D129537
2022-07-26 11:56:03 +08:00
Craig Topper 00060a7b97 [X86] Custom type legalize v2i32 smulo/umulo to use a single pmuldq/pmuludq.
With SSE4.1 and above we were using 3 multiply instructions. This
was due to type legalization widening to v4i32 and the low half
being done with pmulld while the high half used two pmuldq/pmuludq.

Instead of that, we can use a single pmuludq/pmuldq to calculate
the full product at once, extract the high and low bits and compare
to check for overflow.

I've restricted SMULO to sse4.1 to get pmuldq. We can probably
do a fixup to pmuludq on earlier targets, but that's for another day.
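
The operation being legalized, for reference (the lowering itself produces x86 nodes; this is just the IR-level form):

  declare { <2 x i32>, <2 x i1> } @llvm.smul.with.overflow.v2i32(<2 x i32>, <2 x i32>)

  ; one pmuldq can compute both full 64-bit products; the low halves are
  ; the result and overflow is set when the high half isn't the
  ; sign-extension of the low half
  %res = call { <2 x i32>, <2 x i1> } @llvm.smul.with.overflow.v2i32(<2 x i32> %a, <2 x i32> %b)
  %mul = extractvalue { <2 x i32>, <2 x i1> } %res, 0
  %ovf = extractvalue { <2 x i32>, <2 x i1> } %res, 1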

I was going through my git stash and found an early version of this patch
from a year or two ago so I went ahead and finished it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130432
2022-07-25 09:12:35 -07:00
Simon Pilgrim 0708771cce [DAG] MaskedVectorIsZero - don't bother with (-1).isSubsetOf mask check. NFC.
Just use KnownBits::isZero() to ensure all the bits are known zero.
2022-07-24 13:12:21 +01:00
Simon Pilgrim 69d1e805ce [X86] combineAndnp - remove unused variable. NFC. 2022-07-24 11:32:44 +01:00
Simon Pilgrim ce81a0df67 [X86][SSE] Enable X86ISD::ANDNP constant folding 2022-07-24 11:07:34 +01:00
Simon Pilgrim 293899c64b [X86] Don't assume an AND/ANDNP element is undef/undemanded just because one element is undef
For mask ops like these, the other operand's corresponding element might be zero (result = zero) - so we must demand all the bits and that element.

This appears to be what D128570 was trying to fix - both sides of the funnel shift mask of the vXi64 (legalized to v2Xi32) were incorrectly simplifying the upper 32-bit halves to undef, resulting in bad folds later on.

I intend to address the test case regressions, but this close to the release branch I'd prefer to get a fix in first.
2022-07-24 10:53:38 +01:00
Simon Pilgrim 676a03d8a5 [X86] matchBinaryShuffle - limit SHUFFLE(X,Y) -> OR(X,Y) cases to where X + Y are the same width as the result
Minor bit of prep work toward not unnecessarily widening shuffle operands in combineX86ShufflesRecursively, instead only widening in combineX86ShuffleChain if we actually find a match - see Issue #45319
2022-07-23 16:56:45 +01:00
Arnold Schwaighofer 58e6ee0e1f llvm.swift.async.context.addr cannot be modeled as NoMem because we don't want it to be CSE'd across async suspends
An async suspend models the split between two partial async functions.
`llvm.swift.async.context.addr` will have a different value in the two
partial functions so it is not correct to generally CSE the instruction.

rdar://97336162

Differential Revision: https://reviews.llvm.org/D130201
2022-07-22 11:50:58 -07:00
Phoebe Wang 02fe96b240 [X86][FP16] Do not split FP64->FP16 to FP64->FP32->FP16
Truncation from double to half is not always identical to truncating to float first and then to half. https://godbolt.org/z/56s9517hd
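
A worked example of the double-rounding hazard, assuming round-to-nearest-even (the value is constructed for illustration):

  ; let %d = 1.0 + 2^-11 + 2^-40 as a double
  %h0 = fptrunc double %d to half   ; direct: just above the halfway
                                    ; point, rounds up to 1.0009765625
  %f  = fptrunc double %d to float  ; the 2^-40 bit is lost: result is
                                    ; 1.00048828125, exactly halfway
  %h1 = fptrunc float %f to half    ; ties-to-even now rounds down to 1.0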

On the other hand, expanding to float and then to double is always identical to expanding to double directly. https://godbolt.org/z/Ye8vbYPnY

Reviewed By: RKSimon, skan

Differential Revision: https://reviews.llvm.org/D130151
2022-07-22 08:36:05 +08:00
Bing1 Yu e01bf5a3e2 [X86] Promote v32f16's fadd into v32f32's fadd when it is avx512 without avx512fp16
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D130059
2022-07-19 14:37:50 +08:00
Benjamin Kramer 9234a7c0df [X86][FP16] Don't crash when lowering SELECT on fp16 vectors
This is a regression from f187948162
2022-07-18 13:41:00 +02:00
Phoebe Wang f187948162 [X86][FP16] Enable vector support for FP16 emulation
This is a follow-up of D107082, which enables vector support according to the psABI.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D127982
2022-07-16 09:38:58 +08:00
Simon Pilgrim 66bfd1ba8c [X86] Move isInRange(ArrayRef<int>) inside assert to fix NDEBUG builds. NFC.
Fix unused static function warning introduced by D129207
2022-07-12 21:51:07 +01:00
Xiang1 Zhang a45dd3d814 [X86] Support -mstack-protector-guard-symbol
Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D129346
2022-07-12 10:17:00 +08:00
Xiang1 Zhang 643786213b Revert "[X86] Support -mstack-protector-guard-symbol"
This reverts commit efbaad1c4a due to missing review info.
2022-07-12 10:14:32 +08:00
Xiang1 Zhang efbaad1c4a [X86] Support -mstack-protector-guard-symbol 2022-07-12 10:13:48 +08:00
Simon Pilgrim 97868fb972 [X86] isTargetShuffleEquivalent - attempt to match SM_SentinelZero shuffle mask elements using known bits
If the combined shuffle mask requires zero elements, we don't currently have much chance of matching them against the expected source vector. This patch uses the SelectionDAG::MaskedVectorIsZero wrapper to attempt to determine if the expected element we want to use is already known to be zero.

I've also tightened up the ExpectedMask assertion to always be in range - we never give it a target shuffle mask that contains sentinels at all - allowing us to remove some of the confusing bounds checks.

This attempts to address some of the regressions uncovered by D129150 where we more aggressively fold shuffles as AND / 'clear' masks which results in more combined shuffles using SM_SentinelZero.

Differential Revision: https://reviews.llvm.org/D129207
2022-07-11 15:29:44 +01:00
Phoebe Wang 8fb083d33e [X86][FP16] Add constrained FP support for scalar emulation
This is a follow-up patch to support constrained FP in FP16 emulation.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D128114
2022-07-08 20:33:42 +08:00
Phoebe Wang 6c535f9f1b [X86][FP16] Fix crash when lowering copysign for f16
This is to address the assertion fail reported in https://reviews.llvm.org/D107082#3635612
Not sure if it is a problem of promoting FCOPYSIGN + libcall FP_ROUND.
The promotion sets the rounding mode to 1 (a442c62888, llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp, L4810-L4814),
while the libcall path cannot handle a rounding mode equal to 1 (a442c62888, llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp, L4324-L4328).
So change the action to Expand to work around the problem.

Reviewed By: clementval, MaskRay

Differential Revision: https://reviews.llvm.org/D129294
2022-07-07 19:17:26 -07:00
Simon Pilgrim fbb51ac0ba [X86] LowerShift - lower some shuffles directly to X86ISD::PSHUFLW nodes.
These are expected to lower to X86ISD::PSHUFLW but we were seeing some regressions in D129150 because it'd managed to exploit the masking of the shift amounts to create unintended clear masks instead.
2022-07-06 18:01:03 +01:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds support for `fmax` and `fmin` operations in the `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to a CAS loop. There are already a couple of targets supporting the feature;
I'll create follow-up patches to enable them accordingly.
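
The new forms, for reference:

  ; expanded to a compare-and-swap loop for now
  %old0 = atomicrmw fmax ptr %p, double %v seq_cst
  %old1 = atomicrmw fmin ptr %p, double %v seq_cst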

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Paul Robinson 08e4fe6c61 [X86] Add RDPRU instruction
Add support for the RDPRU instruction on Zen2 processors.

User-facing features:

- Clang option -m[no-]rdpru to enable/disable the feature
- Support is implicit for znver2/znver3 processors
- Preprocessor symbol __RDPRU__ to indicate support
- Header rdpruintrin.h to define intrinsics
- "rdpru" mnemonic supported for assembler code

Internal features:

- Clang builtin __builtin_ia32_rdpru
- IR intrinsic @llvm.x86.rdpru

Differential Revision: https://reviews.llvm.org/D128934
2022-07-06 07:17:47 -07:00
Craig Topper 2bfca35614 [X86] Disable combineVectorSizedSetCCEquality for soft float.
The vector types aren't legal with soft float.
Also disable under NoImplicitFloat for good measure.

Fixes PR56351.

Differential Revision: https://reviews.llvm.org/D129060
2022-07-04 08:33:30 -07:00
Simon Pilgrim 26708fa166 Revert rG057db2002bb3: [X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X)
If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.

Reverted due to reports from @alexfh about it causing an infinite loop (repro still pending).
2022-07-01 10:36:09 +01:00
Simon Pilgrim e961e05d59 [SLP][X86] Add 32-bit vector stores to help vectorization opportunities
Building on the work on D124284, this patch tags v4i8 and v2i16 vector loads as custom, enabling SLP to try to vectorize chains of these types that end in a partial store (using the SSE MOVD instruction) - we already do something similar for 64-bit vector types.
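
A sketch of the kind of pattern this enables SLP to form (illustrative):

  ; four i8 lanes moved as one 32-bit vector load/store (lowered via MOVD)
  %v = load <4 x i8>, ptr %src, align 4
  store <4 x i8> %v, ptr %dst, align 4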

Differential Revision: https://reviews.llvm.org/D127604
2022-06-30 20:25:50 +01:00
Craig Topper 3706bdad4a [X86] Remove unnecessary COPY from EmitLoweredCascadedSelect.
I believe we already checked that the destination of the first
CMOV is only used by the second CMOV, so I don't think there is any
reason we need the PHI to write the register that was used by the
first CMOV. We can directly use the second CMOV's destination and
avoid the copy.

This may be a left over from when the cascaded select handling
was part of the main algorithm before it was refactored in D35685.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D128124
2022-06-28 09:33:33 -07:00