Commit Graph

2296 Commits

Author SHA1 Message Date
Simon Pilgrim e2d13fd096 [DAG] canCreateUndefOrPoison - add freeze(shl(x,y)) -> shl(freeze(x),y) support
These are guaranteed not to create undef/poison if the shift amount is known to be in range
2022-08-14 14:38:10 +01:00
Simon Pilgrim a621d38bcb [DAG] canCreateUndefOrPoison - add freeze(and/or/xor(x,y)) -> and/or/xor(freeze(x),y) support
These are guaranteed not to create undef/poison
2022-08-14 13:14:53 +01:00
Simon Pilgrim 60534b8879 [DAG] canCreateUndefOrPoison - add freeze(add/sub/mul(x,y)) -> add/sub/mul(freeze(x),y,z) support
These are guaranteed not to create undef/poison as long as there are no poison generating flags
2022-08-13 20:58:00 +01:00
aqjune 02e56e2533 [CodeGen] Generate efficient assembly for freeze(poison) version of `mm*_cast*` intel intrinsics
This patch makes the variants of `mm*_cast*` intel intrinsics that use `shufflevector(freeze(poison), ..)` emit efficient assembly.
(These intrinsics are planned to use `shufflevector(freeze(poison), ..)` after shufflevector's semantics update; relevant thread: D103874)

To do so, this patch

1. Updates `LowerAVXCONCAT_VECTORS` in X86ISelLowering.cpp to recognize `FREEZE(UNDEF)` operand of `CONCAT_VECTOR` in addition to `UNDEF`
2. Updates X86InstrVecCompiler.td to recognize `insert_subvector` of `FREEZE(UNDEF)` vector as its first operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130339
2022-08-11 13:36:21 +09:00
Simon Pilgrim bbc27d0148 [DAG] canCreateUndefOrPoison - add freeze(truncate(x)) -> truncate(freeze(x)) support 2022-08-10 11:27:22 +01:00
Simon Pilgrim 2724143551 [DAG] canCreateUndefOrPoison - add freeze(ctpop(x)) -> ctpop(freeze(x)) and freeze(parity(x)) -> parity(freeze(x)) support
Both are guaranteed not to create undef/poison
2022-08-09 10:10:29 +01:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Simon Pilgrim 6f2bee667a [DAG] canCreateUndefOrPoison - add freeze(bswap(x)) -> bswap(freeze(x)) and freeze(bitreverse(x)) -> bitreverse(freeze(x)) support
Both are guaranteed not to create undef/poison
2022-08-08 17:27:17 +01:00
Simon Pilgrim e4b2c52420 [DAG] canCreateUndefOrPoison - add freeze(sext(x)) -> sext(freeze(x)) and freeze(zext(x)) -> zext(freeze(x)) support
Both are guaranteed not to create undef/poison
2022-08-08 16:43:40 +01:00
Simon Pilgrim 9641a201a5 [DAG] Add initial SelectionDAG::canCreateUndefOrPoison support
This patch adds basic support for a DAG variant of the canCreateUndefOrPoison call and updates DAGCombiner::visitFREEZE to use it, further Opcodes (including target specific Opcodes) can be handled when we have test coverage.

So far, I've left visitFREEZE to just use this for unary nodes (which currently means the existing BITCAST/FREEZE cases) - later patches will add other unary opcodes (with test coverage) and we can also refactor visitFREEZE to support a general number of operands like we do in InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating.

I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place. Similarly we will be able to handle poison generating SDNodeFlags as and when it becomes an issue.

Part of the work for D106675 / PR50468

Differential Revision: https://reviews.llvm.org/D130646
2022-08-08 15:16:06 +01:00
Simon Pilgrim b334709467 Remove superfluous ; outside of a function 2022-08-08 12:14:03 +01:00
Simon Pilgrim e5e93b6130 [DAG] FoldConstantArithmetic - add initial support for undef elements in bitcasted binop constant folding
FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks).

This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero.

The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead?

Differential Revision: https://reviews.llvm.org/D130839
2022-08-08 11:53:56 +01:00
Dawid Jurczak 1bd31a6898 [NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T
Extracted from https://reviews.llvm.org/D129781 and address comment:
https://reviews.llvm.org/D129781#3655571

Differential Revision: https://reviews.llvm.org/D130268
2022-08-05 13:35:41 +02:00
David Truby 9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Simon Pilgrim 9082c13106 [Support] Add KnownBits::concat method
Add a method for the various cases where we need to concatenate 2 KnownBits together (BUILD_PAIR and SHIFT_PARTS in particular) - uses the existing APInt::concat 'HiBits.concat(LoBits)' convention

Differential Revision: https://reviews.llvm.org/D130557
2022-07-29 11:06:39 +01:00
Simon Pilgrim 8c99cef1e7 [DAG] Remove SelectionDAG::GetDemandedBits and use SimplifyMultipleUseDemandedBits directly.
GetDemandedBits is mainly a wrapper around SimplifyMultipleUseDemandedBits now, and is only used by DAGCombiner::visitSTORE so I've moved all remaining functionality there.

visitSTORE was making use of this to 'simplify' constants for a trunc-store. Just removing this code left to a mixture of regressions and gains - it came down to whether a target preferred a sign or zero extended constant for materialization/truncation. I've just moved the code over for now, but a next step would be to move this to targetShrinkDemandedConstant, but some targets that override the method expect a basic binop, and might react badly to a store node.....
2022-07-28 17:03:44 +01:00
Simon Pilgrim ea7f14dad0 [DAG] SelectionDAG::GetDemandedBits - don't simplify opaque constants
I'm actually trying to get rid of GetDemandedBits - but while dismantling it I noticed that we were altering opaque constants. Fixing that causes a FP_TO_INT_SAT regression that should be addressed separately - I'll raise a bug.
2022-07-28 14:46:59 +01:00
Simon Pilgrim 69d5a038b9 [DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits.

There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP.

Differential Revision: https://reviews.llvm.org/D77804
2022-07-28 14:10:44 +01:00
Simon Pilgrim c0b3f7a50f [DAG] SimplifyDemandedBits - ensure we clear known One bits that AssertZext asserts are really known Zero
Matches ComputeKnownBits behaviour

Thanks to @uabelho for the fuzz regression report on D129765
2022-07-27 13:57:47 +01:00
Paul Walker e5c892dd85 [SVE][SelectionDAG] Use INDEX to generate matching instances of BUILD_VECTOR.
This patch starts small, only detecting sequences of the form
<a, a+n, a+2n, a+3n, ...> where a and n are ConstantSDNodes.

Differential Revision: https://reviews.llvm.org/D125194
2022-07-26 15:28:37 +00:00
Simon Pilgrim 428c0f2adc [DAG] getNode - assert that SMUL_LOHI/UMUL_LOHI nodes have the correct ops + types 2022-07-24 15:30:57 +01:00
Simon Pilgrim 0708771cce [DAG] MaskedVectorIsZero - don't bother with (-1).isSubsetOf mask check. NFC.
Just use KnownBits::isZero() to ensure all the bits are known zero.
2022-07-24 13:12:21 +01:00
Simon Pilgrim ac8be21365 [DAG] isSplatValue - don't attempt to merge any BITCAST sub elements if they contain UNDEFs
We still haven't found a solution that correctly handles 'don't care' sub elements properly - given how close it is to the next release branch, I'm making this fail safe change and we can revisit this later if we can't find alternatives.

NOTE: This isn't a reversion of D128570 - it's the removal of undef handling across bitcasts entirely

Fixes #56520
2022-07-23 18:38:48 +01:00
Simon Pilgrim 8937252465 [DAG] computeKnownBits - add basic shift-by-parts handling
Concat KnownBits from ISD::SHL_PARTS / ISD::SRA_PARTS / ISD::SRL_PARTS lo/hi operands and perform the KnownBits calculation by the shift amount on the extended type, before splitting the KnownBits based on the requested lo/hi result.
2022-07-23 09:46:30 +01:00
David Green 23d6186be0 [SelectionDAG] Fix fptoi.sat scalable vector lowering
Vector fptosi_sat and fptoui_sat were being expanded by unrolling the
vector operation. This doesn't work for scalable vector, so this patch
adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable.

Scalable tests are added for AArch64 and RISCV. Some of the AArch64
fptoi_sat operations should be legal, but that will be handled in
another patch.

Differential Revision: https://reviews.llvm.org/D130028
2022-07-21 08:00:22 +01:00
Simon Pilgrim 029e83b401 [DAG] getNode - don't bother creating ADDO(X,0) or SUBO(X,0) nodes.
Similar to what we already do in getNode for basic ADD/SUB nodes, return the X operand directly, but here we know that there will be no/zero overflow as well.

As noted on D127115 - this path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen without a topological worklist.
2022-07-20 12:04:33 +01:00
Simon Pilgrim 766cd95481 [DAG] getNode - assert that ADDO/SUBO nodes have the correct ops + types 2022-07-20 11:23:58 +01:00
Matt Arsenault 8d0383eb69 CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
2022-07-18 17:23:41 -04:00
Simon Pilgrim 26ce33706f [DAG] computeKnownBits - move UDIV handling to same place as UREM/SREM. NFC. 2022-07-17 11:59:42 +01:00
Simon Pilgrim 5ec47c6dc5 [DAG] Add MERGE_VALUE computeKnownBits/ComputeNumSignBits handling.
Just forward the value tracking to the operand specified by the ResNo
2022-07-17 11:58:08 +01:00
Kazu Hirata 9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
Philip Reames dde2a7fb6d [RISCV] Exploit fact that vscale is always power of two to replace urem sequence
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.

vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)

We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, the must be a power of two numbers of blocks. (For everything other than VLEN<=32, but that's already broken.)

It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.

Differential Revision: https://reviews.llvm.org/D129609
2022-07-13 10:54:47 -07:00
Nicolai Hähnle ede600377c ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)

Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 10:29:15 +02:00
Nicolai Hähnle e9ce1a5880 Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f06245.

Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
2022-07-10 09:54:30 +02:00
Nicolai Hähnle e6f1f06245 ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 09:15:08 +02:00
Sergei Barannikov 2247fdc84d [SelectionDAG] computeKnownBits / ComputeNumSignBits for the remaining overflow-aware nodes
Some overflow-aware nodes were missing from the switches in
computeKnownBits and ComputeNumSignBits.
2022-07-08 09:19:19 +01:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Xiang1 Zhang 72a23cef7e [ISel] Match all bits when merge undefs for DAG combine
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128570
2022-07-01 09:09:43 +08:00
Xiang1 Zhang 64f44a90ef Revert "[ISel] Match all bits when merge undef(s) for DAG combine"
This reverts commit 5fe5aa284e.
2022-07-01 08:59:04 +08:00
Xiang1 Zhang 5fe5aa284e [ISel] Match all bits when merge undef(s) for DAG combine 2022-07-01 08:58:00 +08:00
Tim Northover 4aafebce52 SelectionDAG: allow FP extensions when folding extract/insert.
Before, we were trying to sign extend half -> float, and asserted in getNode.
2022-06-28 12:08:35 +01:00
Simon Pilgrim 2c3a4a9334 [DAG] SelectionDAG::GetDemandedBits - don't recurse back into GetDemandedBits
Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.
2022-06-22 13:48:57 +01:00
Simon Pilgrim 8cecb6be56 [DAG] Remove SelectionDAG::GetDemandedBits DemandedElts variant. NFC.
We're slowly removing SelectionDAG::GetDemandedBits and replacing it with SimplifyMultipleUseDemandedBits, we no longer have any uses for the vector demanded elt variant.
2022-06-21 21:23:10 +01:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Guillaume Chatelet 03036061c7 [Alignment] Use 'previous()' method instead of scalar division
This is in preparation of integration with D128052.

Differential Revision: https://reviews.llvm.org/D128169
2022-06-20 11:01:43 +00:00
Simon Pilgrim ba3f2667b6 [DAG] Add MaskedVectorIsZero helper
Equivalent to MaskedValueIsZero, except its checking if all of the demanded vectors elements are known to be zero
2022-06-19 17:56:30 +01:00
Adrian Tong 55311801f0 Allow bitwidth difference when checking for isOneOrOneSplat.
This helps handling a case where the BUILD_VECTOR has i16 element type
and i32 constant operands

t2: v8i16 = setcc t8, t17, setult:ch
t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ...
   t4: v8i16 = and t2, t3
      t5: v8i16 = add t8, t4

This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove
t3 and t4 from the DAG.

Differential Revision: https://reviews.llvm.org/D127354
2022-06-16 16:04:20 +00:00
Benjamin Kramer ca50cb120b [SelectionDAG] Constant fold FP_TO_BF16 and BF16_TO_FP. 2022-06-15 18:51:32 +02:00
Guillaume Chatelet 95083fa3b8 [NFC] Remove deadcode 2022-06-10 15:13:42 +00:00
Guillaume Chatelet 38637ee477 [clang] Add support for __builtin_memset_inline
In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`.

The idea is to get support from the compiler to easily write efficient memory function implementations.

This patch could be split in two:
 - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics.
 - and another one for the Clang part providing the instrinsic as a builtin.

Differential Revision: https://reviews.llvm.org/D126903
2022-06-10 13:13:59 +00:00