Commit Graph

12514 Commits

Author SHA1 Message Date
Craig Topper d4facda414 [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D135541
2022-10-10 11:02:22 -07:00
Craig Topper 9f67047cf0 [VP][RISCV] Add vp.smax/smin/umax/umin intrinsics
Differential Revision: https://reviews.llvm.org/D135418
2022-10-07 17:14:31 -07:00
Amaury Séchet 62ea6c5be7
[DAGCombine] Deduplicate addcarry node using commutativity.
The first two parameters of addcarry are commutative. We may face a situation where both variant are present in the DAG, in which case we benefit from using just one.

Depends on D57302 and D33587

Reviewed By: RKSimon, chfast

Differential Revision: https://reviews.llvm.org/D57317
2022-10-08 00:55:14 +02:00
eopXD dbc681c98e [VP][RISCV] Add vp.roundtozero and its RISC-V support
The scalar instruction of this is `llvm.trunc`. However the naming of
ISD::VP_TRUNC is already taken by `trunc` of the LLVM IR. Naming this as
`vp.ftrunc` would likely cause confusion with `vp.fptrunc`. So adding
`vp.roundtozero` that will look similar to `vp.roundeven`.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D135233
2022-10-07 02:15:23 -07:00
Philip Reames 04bb32e58a [DAG] Extract helper for (neg x) [nfc]
This is a frequently reoccurring pattern, let's factor it out.

Differential Revision: https://reviews.llvm.org/D135301
2022-10-06 13:23:52 -07:00
Pierre van Houtryve 3ec0085c3f [DAG] Update `isKnownNeverNaN` for `FMA/FMAD`
We can still get a NaN even if none of the operands are NaN,
e.g. from +inf/-inf. D50804 didn't catch that.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134854
2022-10-06 06:52:36 +00:00
jeff cebec42089 [DAGCombiner] [AMDGPU] Allow vector loads in MatchLoadCombine
Since SROA chooses promotion based on reaching load / stores of allocas, we may run into scenarios in which we alloca a vector, but promote it to an integer. The result of which is the familiar LoadCombine pattern (i.e. ZEXT, SHL, OR). However, instead of coming directly from distinct loads, the elements to be combined are coming from ExtractVectorElements which stem from a shared load.

This patch identifies such a pattern and combines it into a load.

Change-Id: I0bc06588f11e88a0a975cde1fd71e9143e6c42dd
2022-10-04 12:16:00 -07:00
Sanjay Patel 17dcbd8165 [SDAG] don't hoist div/rem through a select with neutral constant
This bug was introduced with D134966.
2022-10-04 13:15:01 -04:00
Jay Foad af947d9fcb [ISel] Fix crash in new FMA DAG combine
Fix a crash in the FMA combine added by D132837 and amended by D134810.
In cases where the newly created node could be folded, the combiner
would fail this assertion:

llc: DAGCombiner.cpp:268: void (anonymous namespace)::DAGCombiner::AddToWorklist(llvm::SDNode *): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.

Differential Revision: https://reviews.llvm.org/D135150
2022-10-04 15:13:18 +01:00
Philip Reames a200b0fc25 [DAG] Introduce getSplat utility for common dispatch pattern [nfc]
We have a very common pattern of dispatching between BUILD_VECTOR and SPLAT_VECTOR creation repeated in many cases in code.  Common the pattern into a utility function.
2022-10-03 12:49:39 -07:00
Philip Reames 21f97fdc97 [DAG] Use getSplatBuildVector in a couple more places [nfc] 2022-10-03 09:48:49 -07:00
Markus Böck 36af4c8418 [SelectionDAG] Fix use-after-free introduced in D130881
The code introduced in https://reviews.llvm.org/D130881 has a bug as it may cause a use-after-free error that can be caught by ASAN.
The bug essentially boils down to iterator invalidation of `DenseMap`. The expression `SDEI[To] = I->second;` may cause `SDEI` to grow if `To` is inserted for the very first time. When that happens, all existing iterators to the map are invalidated as their backing storage has been freed. Accessing `I->second` is then invalid and attempts to access freed memory (as `I` is an iterator of `SDEI`).

This patch fixes that quite simply by first making a copy of `I->second`, and then moving into the possibly newly inserted KV of the ` DenseMap`.

No test attached as I am not sure it is practible to test.

Differential revision: https://reviews.llvm.org/D135019
2022-10-03 15:09:14 +02:00
Petar Avramovic 1fa2019828 [SelectionDAG] Add check for BUILD_VECTOR in isKnownNeverNaN
Includes handling of constants with vector type in isKnownNeverNaN.
For AMDGPU results in not making fcanonicalize during legalization
for vector inputs to fmaxnum_ieee and fminnum_ieee. Does not affect
end result since there is a combine that eliminates fcanonicalize.

Differential Revision: https://reviews.llvm.org/D88573
2022-10-03 12:47:07 +02:00
David Green 3651635eca [ARM][DAG] BF16 constant handling.
Much like f16 and f32, we shouldn't try to shrink bf16 to smaller fp
constant.  The code may not be optimal, but this allows us to legalize
bf16 constants under Arm without errors.
2022-10-02 11:51:08 +01:00
Yeting Kuo cefb7aab61 [VP][RISCV] Add vp.copysign and RISC-V support.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134935
2022-10-01 10:19:10 +08:00
Simon Pilgrim 61dc5014ac [DAG] Update foldSelectWithIdentityConstant to use llvm::isNeutralConstant
D133866 added the llvm::isNeutralConstant helper to track neutral/passthrough constants

This patch updates foldSelectWithIdentityConstant to use the helper instead of maintaining its own opcode handling

Differential Revision: https://reviews.llvm.org/D134966
2022-09-30 17:46:52 +01:00
Amaury Séchet 031a7ad575 [NFC] Fix erroneous indentation. 2022-09-30 12:30:27 +00:00
Yeting Kuo 1cc02b05b7 [SelectionDAG] Add helper function to check whether a SDValue is neutral element. NFC.
Using this helper makes work about neutral elements more easier. Although I only
find one case now, I think it will have more chance to be used since so many
combine works are related to neutral elements.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133866
2022-09-30 11:29:11 +08:00
Amaury Séchet 923909afbe [DAG] Simplify the select of constant combine code. NFC 2022-09-30 01:03:14 +00:00
Amaury Séchet d7600c7ccb [DAG] select Cond, C, -1 --> or (sext (not Cond)), C when C is MVT::i1
In the spirit of D130765 . Get rid of cbranches and/or cmov. Usually shorter, but sometime not, becaus eit's hard to prededict when dependency breaking xor will be introduced.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134736
2022-09-30 00:36:58 +00:00
Thomas Symalla a41dde2c62 [AMDGPU] Add use check in v_fma combine.
In D132837, an existing v_fma combine was extended to regard nested
fma instructions. Originally, the inner FMA was checked for being used
only once. In its current state, this check is missing, which causes
some regressions.

In this patch, this check was added.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D134856
2022-09-29 12:25:03 +02:00
Jay Foad 2c12a04bba [ISel] Fix DAG divergence after new FMA combine
D132837 introduced a new DAG combine that used MorphNodeTo to morph an
FMUL into an FMA. It turns out that MorphNodeTo does not properly update
the divergence bit for users of the morphed node, causing an assertion
failure on the new test case:

llc: SelectionDAG.cpp:10486: void llvm::SelectionDAG::VerifyDAGDivergence(): Assertion `calculateDivergence(N) == N->isDivergent() && "Divergence bit inconsistency detected"' failed.

Fixing MorphNodeTo to propagate the divergence bit is tricky because of
the way it is used to select machine instructions, so use getNode and
ReplaceAllUsesOfValueWith instead.

Differential Revision: https://reviews.llvm.org/D134810
2022-09-28 19:41:51 +01:00
Craig Topper 12357e88af [RISCV][SelectionDAGBuilder] Fix crash when copying a v1f32 vector between basic blocks.
On a rv64 without f32 or vector support, this will be passed across
the basic block as an i64. We need use i32 as an intermediate type
with bitcast and anyext/trunc.

Fixes PR58025

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134758
2022-09-28 10:13:35 -07:00
Matt Devereau 0a4771a7e8 [AArch64][SVE] Expand gather index to 32 bits instead of 64 bits
For gathers which load in 8 and 16 bit data then use that data
as an index, the index can be extended to 32 bits instead of
64 bits

Differential Revision: https://reviews.llvm.org/D130692
2022-09-28 14:42:12 +00:00
jacquesguan 465ac0b96e [LegalizeTypes] Use getVectorElementCount to avoid crash of scalable vector.
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134718
2022-09-28 17:31:29 +08:00
eopXD 9677d70eb2 [VP][RISCV] Add vp.floor, vp.round, vp.roundeven and their RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134759
2022-09-27 19:45:58 -07:00
eopXD 163cb33854 [VP][RISCV] Add vp.ceil and RISC-V support
Previous commit 8b00b24f85 missed to add `int_ceil` anchor for the
llvm.ceil.* section under LangRef.rst

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 12:04:09 -07:00
eopXD 384b8b3da7 Revert "[VP][RISCV] Add vp.ceil and RISC-V support"
This reverts commit 8b00b24f85.
2022-09-27 11:12:57 -07:00
eopXD 8b00b24f85 [VP][RISCV] Add vp.ceil and RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 11:08:27 -07:00
Craig Topper a6383bb51c [VP][RISCV] Add vp.fmuladd.
Expanded in SelectionDAGBuilder similar to llvm.fmuladd.

Reviewed By: frasercrmck, simoll

Differential Revision: https://reviews.llvm.org/D134474
2022-09-27 10:02:37 -07:00
Amaury Séchet d1baed7c9c [DAG] select Cond, -1, C --> or (sext Cond), C if Cond is MVT::i1
This seems to be beneficial overall, except for midpoint-int.ll .

The X86 backend seems to generate zeroing that are not necesary.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D131260
2022-09-27 12:54:52 +00:00
Yeting Kuo 04e1301f3d [VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support.
Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum
and llvm.minnum.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134639
2022-09-27 13:36:45 +08:00
Paul Scoropan ce004fb4f2 [PowerPC] XCOFF exception section support on the direct assembler path
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for LIT tests

Reviewed By: shchenz, RKSimon

Differential Revision: https://reviews.llvm.org/D132146
2022-09-26 22:24:20 -04:00
Craig Topper afdd600a49 [LegalizeTypes][RISCV] Support f16 in ExpandIntRes_LLROUND_LLRINT.
Promote f16 to f32 and use the f32 libcall.

I deleted rv64zfh-half-intrinsics-strict.ll because it only existed due to this issue breaking rv32.

Differential Revision: https://reviews.llvm.org/D134579
2022-09-26 11:09:33 -07:00
Amaury Séchet b30bbd181b Small formating nit in DAGCombiner. NFC 2022-09-26 13:36:11 +00:00
Yeting Kuo 43c5fbdd3a [VP][RISCV] Add vp.sqrt intrinsic and RISC-V support.
The patch modeled vp.fabs patch D132793.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133690
2022-09-26 10:47:40 +08:00
Philip Reames b9c4733079 [DAG] Move one-use add of splat to base of scatter/gather
This extends the uniform base transform used with scatter/gather to support one-use vector adds-of-splats with a non-zero base. This has the effect of essentially reassociating an add from vector to scalar domain.

The motivation is to improve the lowering of scatter/gather operations fed by complex geps.

Differential Revision: https://reviews.llvm.org/D134472
2022-09-22 18:45:12 -07:00
Philip Reames 60c91fd364 [RISCV] Disallow scale for scatter/gather
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.

This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)

The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.

Differential Revision: https://reviews.llvm.org/D134382
2022-09-22 15:31:26 -07:00
Craig Topper 9d236d4dab [SelectionDAGBuilder] Simplify how VTs is created for constrained intrinsics. NFC
All constrained intrinsics return a single value. We can directly
convert it to an EVT instead of going through ComputeValueTypes.
2022-09-22 14:21:22 -07:00
Philip Reames 46525fee81 [DAGCombine] Check both forms of a commutative transform
The transform to fold an add into the base of a scatter/gather was only checking to see if the LHS was a splat.  Included test change indicates that splats are not canonicalized to LHS, and that we need to check both sides.
2022-09-22 12:21:47 -07:00
Philip Reames 143f3bf8f4 [SDAG] Split handling of VPLoad/VPGather and VPStore/VPScatter [nfc]
The merged routines are not-idiomatic, and the code sharing that results is prettty minimal.  The confusion factor is not justified.
2022-09-21 09:06:02 -07:00
Thomas Symalla c98a46fee6 [ISel] Enable generating more fma instructions.
This patch changes a FADD / FMUL => FMA ISel pattern implemented
in D80801 so that it peeks through more than one FMA.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132837
2022-09-21 12:03:11 +02:00
Matt Arsenault bcb931c484 SelectionDAG: Add AssumptionCache analysis dependency
Fixes compile time regression after
bb70b5d406
2022-09-19 19:10:51 -04:00
Matt Arsenault 0d8ffcc532 Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer
This does not try to pass it through from the end users.
2022-09-19 18:57:33 -04:00
Simon Pilgrim 8206044183 [DAG] SimplifyDemandedVectorElts - add MULHS/MULHU handling to existing MUL/AND handling
Allows to determine known zero elements, which particularly helps simplification of DIV/REM by constant patterns
2022-09-19 12:44:43 +01:00
Simon Pilgrim 47cfe71027 [DAG] MatchRotate - reuse existing LHSShiftArg/RHSShiftArg variables. NFC. 2022-09-18 14:35:10 +01:00
Craig Topper 1121eca685 [VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB.
I want to default all VP operations to Expand. These 2 were blocking
because VE doesn't support them and the tests were expecting them
to fail a specific way. Using Expand caused them to fail differently.

Seemed better to emulate them using operations that are supported.

@simoll mentioned on Discord that VE has some expansion downstream. Not
sure if its done like this or in the VE target.

Reviewed By: frasercrmck, efocht

Differential Revision: https://reviews.llvm.org/D133514
2022-09-16 13:19:02 -07:00
Stanislav Mekhanoshin a0c8f5fefa [SDAG] Print divergence in SDNode::dump
If target does not support divergence the field is set to false
and not printed.

Differential Revision: https://reviews.llvm.org/D133984
2022-09-16 11:43:34 -07:00
Juan Manuel MARTINEZ CAAMAÑO e438f2d821 [DAGCombine] Do not fold SRA/SRL of MUL into MULH when MUL's LSB are
used, and MUL_LOHI is available

Folding into a sra(mul) / srl(mul) into a mulh introduces an extra
multiplication to compute the high half of the multiplication,
while it is more profitable to compute the high and lower halfs with a
single mul_lohi.

Differential Revision: https://reviews.llvm.org/D133768
2022-09-16 15:48:36 +00:00
Alexander Timofeev fbdea5a2e9 [AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler.  It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593
2022-09-15 22:03:56 +02:00
Sergei Barannikov c6acb4eb0f [SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduced amount of boilerplate code a bit.  This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
2022-09-15 14:02:12 -04:00
Roland Froese 207228c1d6 [DAGCombiner] More load-store forwarding for big-endian
Get some load-store forwarding cases for big-endian where a larger store covers
a smaller load, and the offset would be 0 and handled on little-endian but on
big-endian the offset is adjusted to be non-zero. The idea is just to shift the
data to make it look like the offset 0 case.

Differential Revision: https://reviews.llvm.org/D130115
2022-09-14 15:36:35 -04:00
Craig Topper efd5acf120 [LegalizeTypes][NVPTX] Remove extra compare from fallback code for ISD::ADD in ExpandIntRes_ADDSUB.
This is the ultimate fallback code if UADDO isn't supported.

If the target uses 0/1 we used one compare, but if the target doesn't
use 0/1 we emitted two compares. Regardless of boolean constants we
should only need to check that the Result is less than one of the
original operands. So we only need one compare.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D133708
2022-09-13 09:07:56 -07:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Pavel Samolysov f045d0c392 [NFC][ScheduleDAG] Use a reference to iterate over NodeSuccs/ChainSuccs 2022-09-12 15:54:48 +03:00
Pavel Samolysov 05946c144d [NFC][ScheduleDAG] Use structure bindings and emplace_back
Some uses of std::make_pair and the std::pair's first/second members
in the ScheduleDAGRRList.cpp file were replaced with using of the
vector's emplace_back along with structure bindings from C++17.
2022-09-12 15:54:48 +03:00
Matt Arsenault c34679b2e8 DAG: Sink some getter code closer to uses 2022-09-12 08:38:35 -04:00
Matt Arsenault bb70b5d406 CodeGen: Set MODereferenceable from isDereferenceableAndAlignedPointer
Previously this was assuming piontsToConstantMemory implies
dereferenceable.
2022-09-12 08:38:35 -04:00
Pavel Samolysov 354a3d9c02 [NFC][ScheduleDAG] Use Register and MCPhysReg instead of unsigned 2022-09-12 15:18:11 +03:00
Craig Topper 545affbf79 [DAGCombiner] Use HandleSDNode to keep node alive across call to getNegatedExpression.
getNegatedExpression can delete nodes. If the first call to
getNegatedExpression produced a node that the second call also
manages to create, it might get deleted. Use a HandleSDNode to
ensure it has a use to prevent it from being deleted.

Fixes PR57658.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133602
2022-09-09 22:02:41 -07:00
Craig Topper aa83bdd198 [DAGCombiner][X86] Fold (sub (subcarry X, 0, Carry), Y) -> (subcarry X, Y, Carry)
Fixes PR57576.

Differential Revision: https://reviews.llvm.org/D133471
2022-09-08 22:56:46 -07:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Marco Elver 0ba8886af5 [FastISel] Propagate PCSections metadata to MachineInstr
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130884
2022-09-07 11:36:01 +02:00
Marco Elver 4c58b00801 [SelectionDAG] Propagate PCSections through SDNodes
Add a new entry to SDNodeExtraInfo to propagate PCSections through
SelectionDAG.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130882
2022-09-07 11:22:50 +02:00
Marco Elver 7d63983c65 [SelectionDAG] Properly copy ExtraInfo on RAUW
During SelectionDAG legalization SDNodes with associated extra info may
be replaced with a new SDNode. Preserve associated extra info on
ReplaceAllUsesWith and remove entries in DeallocateNode.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130881
2022-09-06 16:32:50 +02:00
Marco Elver cc3faf4226 [SelectionDAG] Rename CallSiteDbgInfo to NodeExtraInfo
For information infrequently attached to SDNodes, it is useful to
provide a way to add this information out-of-line. This is already done
for call-site specific information.

Rename CallSiteDbgInfo to NodeExtraInfo in preparation of adding
additional information not necessarily related to call sites only.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130880
2022-09-06 16:32:50 +02:00
Benjamin Kramer c349d7f4ff [SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path
The main difference is that this preserves intermediate rounding steps,
which the other route doesn't. This aligns bfloat16 more with half
floats, which use this path on most targets.

I didn't understand what the difference was between these softening
approaches when I first added bfloat lowerings, would be nice if we only
had one of them.

Based on @pengfei 's D131502

Differential Revision: https://reviews.llvm.org/D133207
2022-09-06 11:54:34 +02:00
David Sherwood ffa6267300 [CodeGen] Support extracting fixed-length vectors from illegal scalable vectors
For some indices we can simply extract the fixed-length subvector from the
low half of the scalable vector, for example when the index is less than the
minimum number of elements in the low half. For all other cases we can
expand the operation through the stack by storing out the vector and
reloading the fixed-length part we need.

Fixes https://github.com/llvm/llvm-project/issues/55412

Tests added here:

  CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll

Differential Revision: https://reviews.llvm.org/D117499
2022-09-05 15:05:14 +01:00
Simon Pilgrim 4e6783f866 [DAG] getFreeze()/getNode() - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57554)
Similar to #57402 - we were calling isGuaranteedNotToBeUndefOrPoison on the freeze operand (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1

Fixes #57554
2022-09-05 11:46:46 +01:00
Craig Topper e529c0a2a0 [TargetLowering] Use ComputeMaxSignificantBits instead of ComputeNumSignBits in expandMUL_LOHI. NFC
The way ComputeNumSignBits was being used was only correct if
OuterBitSize is exactly 2x InnerBitSize. Which is always true,
but not obviously so. Comparing ComputeMaxSignificantBits to
InnerBitSize feels more correct.
2022-09-04 22:35:16 -07:00
Craig Topper 45e2809f71 [TargetLowering] Use getShiftAmountConstant. NFC 2022-09-04 20:22:32 -07:00
Simon Pilgrim 62cdfdab4d [DAG] canCreateUndefOrPoison - add freeze(insert_subvector(x,y,c)) -> insert_subvector(freeze(x),freeze(y),c) support
We already have plenty of assertions in place to ensure that the insertion index is constant and inrange
2022-09-03 13:41:33 +01:00
Simon Pilgrim eaede4b5b7 [DAG] extractShiftForRotate - replace assertion for shift opcode with an early-out
We feed the result from the first extractShiftForRotate call into the second, and that result might no longer be a shift op (usually due to constant folding).

NOTE: We REALLY need to stop creating nodes on the fly inside extractShiftForRotate!

Fixes Issue #57474
2022-08-31 15:50:48 +01:00
Simon Pilgrim 9d22800275 [DAG] visitFreeze - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57402)
We were calling isGuaranteedNotToBeUndefOrPoison on operands (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1

Fixes #57402
2022-08-31 12:20:30 +01:00
wanglian e2bb9774b1 [LegalizeTypes] Support widen result for VECTOR_REVERSE.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132359
2022-08-30 10:01:26 +08:00
Craig Topper 2f811a6c7f [VP][RISCV] Add vp.fabs intrinsic and RISC-V support.
Mostly just modeled after vp.fneg except there is a
"functional instruction" for fneg while fabs is always an
intrinsic.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D132793
2022-08-29 09:32:06 -07:00
Kazu Hirata d1688e9ddf [llvm] Use std::gcd (NFC)
This patch replaces calls to greatestCommonDivisor with std::gcd where
both arguments are known to be of unsigned.  This means that
std::common_type_t of the two argument types should just be the wider
one of the two.
2022-08-27 23:54:29 -07:00
Simon Pilgrim 88c7b16bed [DAG] Strip poison generating flags in freeze(op()) -> op(freeze()) fold
This patch follows the InstCombine approach of stripping poison generating flags (nsw/nuw from add/sub etc.) to allow us to push a freeze() through the op. Unlike InstCombine it doesn't retain any flags, but we have plenty of DAG folds that do the same thing already. We assert that the newly generated op isGuaranteedNotToBeUndefOrPoison.

Similar to the ValueTracking approach, isGuaranteedNotToBeUndefOrPoison has been updated to confirm that if an op can't create undef/poison and its operands are guaranteed not to be undef/poison - then its not undef/poison. This is just for the generic opcodes - target specific opcodes will need to do this manually just in case they have some special cases.

Differential Revision: https://reviews.llvm.org/D132333
2022-08-26 11:47:51 +01:00
Matthias Gehre 6d13b80fcb Revert "[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers"
This reverts https://reviews.llvm.org/D120329.
I abandoned the PR [0] to add __divei4 functions to compiler-rt
in favor of adding a pass to transform div/rem [1].

This removes the backend code that was supposed to emit calls to the __divei4 functions.

[0] https://reviews.llvm.org/D120327
[1] https://reviews.llvm.org/D130076

Differential Revision: https://reviews.llvm.org/D130079
2022-08-26 10:52:56 +01:00
wanglian 2887d7786f [DAGCombiner] Use FoldConstantArithmetic instead of dyn_cast in visitFP_ROUND.
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D132439
2022-08-25 11:29:05 +08:00
Sami Tolvanen cff5bef948 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Relands 67504c9549 with a fix for
32-bit builds.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 22:41:38 +00:00
Sami Tolvanen a79060e275 Revert "KCFI sanitizer"
This reverts commit 67504c9549 as using
PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.
2022-08-24 19:30:13 +00:00
Sami Tolvanen 67504c9549 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 18:52:42 +00:00
Simon Pilgrim 5377abcde2 [DAG] matchRotateHalf - constify SelectionDAG arg. NFC.
Based off Issue #57283 - we need to try harder to ensure we're not creating nodes on-the-fly - so make sure we're just using SelectionDAG for analysis where possible
2022-08-24 10:57:38 +01:00
Simon Pilgrim e624f8a3bb [DAG] MatchRotate - bail if we fail to match a shl/srl pair
extractShiftForRotate may fail to return canonicalized shifts due to constant folding or other simplification that can occur in getNode()

Fixes Issue #57283
2022-08-24 03:05:07 +01:00
Sanjay Patel f8dfbea324 [SDAG] expand more is-power-of-2 patterns that use popcount
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)

Adjust the legality check to avoid the poor codegen on AArch64.
We probably only want to use popcount on this pattern when it
is a single instruction.

fixes #57225

Differential Revision: https://reviews.llvm.org/D132237
2022-08-23 17:53:53 -04:00
Arthur Eubanks d6cc7a5b46 [FastISel] Respect musttail over "disable-tail-calls"
musttail should be honored even in the presence of attributes like "disable-tail-calls". SelectionDAG properly handles this.

Update LangRef to explicitly mention that this is the semantics of musttail.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D132193
2022-08-23 08:55:40 -07:00
Jakub Kuderski 6fa87ec10f [ADT] Deprecate is_splat and replace all uses with all_equal
See the discussion thread for more details:
https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D132335
2022-08-23 11:36:27 -04:00
Kazu Hirata ec5eab7e87 Use range-based for loops (NFC) 2022-08-20 21:18:32 -07:00
Kazu Hirata 258531b7ac Remove redundant initialization of Optional (NFC) 2022-08-20 21:18:28 -07:00
Lorenzo Albano 98117fe208 [VP] Add splitting for VP_STRIDED_STORE and VP_STRIDED_LOAD
Following the comment's thread of D117235, I added checks for the widening + splitting case, which also causes a split with one of the resulting vectors to be empty. Due to the same issues described in that same thread, the `fixed-vectors-strided-store.ll` test is missing the widening + splitting case, while the same case in the `strided-vpload.ll` test requires to manually split the loaded vector.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121784
2022-08-19 18:15:56 -07:00
wanglian fc2b4dfef2 [DAGCombiner] Add use check for VSCALE in visitSUB.
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D132115
2022-08-19 09:46:18 +08:00
Paul Walker 96c8d615d6 [SVE] Extend findMoreOptimalIndexType so BUILD_VECTORs do not force 64bit indices.
Extends findMoreOptimalIndexType to allow ISD::BUILD_VECTOR based
indices to be truncated when such truncation is lossless. This can
enable the use of 32bit gather/scatter indices thus making it less
likely to have to split a gather/scatter in two.

Depends on D125194

Differential Revision: https://reviews.llvm.org/D130533
2022-08-18 18:00:53 +01:00
wanglian 989ebc1783 [DAGCombiner][NFC] Tidy up unnecessary brackets in visitADD.
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D132107
2022-08-18 15:48:22 +08:00
wanglian 230e277dfe [DAGCombiner][NFC] Merge two if statement into one.
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131941
2022-08-18 10:12:35 +08:00
Sanjay Patel 7f72a0f5bb [SDAG] avoid generating libcall to function with same name
This is a potentially better alternative to D131452 that also
should avoid the infinite loop bug from:
issue #56403

This is again a minimal fix to reduce merging pain for the
release. But if this makes sense, then we might want to guard
all of the RTLIB generation (and other libcalls?) with a
similar name check.

Differential Revision: https://reviews.llvm.org/D131521
2022-08-17 16:19:34 -04:00
Nick Desaulniers 6b0e2fa6f0 [SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses
As part of re-architecting callbr to no longer use blockaddresses
(https://reviews.llvm.org/D129288), we don't really need them in MIR.
They make comparing MachineBasicBlocks of indirect targets during
MachineVerifier a PITA.

Suggested by @efriedma from the discussion:
https://reviews.llvm.org/D130290#3669531

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D130316
2022-08-17 09:34:31 -07:00
Eli Friedman cfd2c5ce58 Untangle the mess which is MachineBasicBlock::hasAddressTaken().
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code.  Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.

Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.

So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.

Discovered while trying to sort out related stuff on D102817.

Differential Revision: https://reviews.llvm.org/D124697
2022-08-16 16:15:44 -07:00
wanglian fbc4c26e9a [SelectionDAG][NFC] Fix return type when used isConstantIntBuildVectorOrConstantInt
and isConstantFPBuildVectorOrConstantFP

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131870
2022-08-16 10:07:24 +08:00
David Green dfc95bab07 [DAG] Ensure more Legal BUILD_VECTOR elements types in shuffle->And combine
This is a followup to D131350, which caused another problem for i64
types being split into i32 on i32 targets. This patch tries to make sure
that either Illegal types are OK, or that the element types of a
buildvector are legal and bigger than or equal to the size of the
original elements.

Differential Revision: https://reviews.llvm.org/D131883
2022-08-15 14:41:45 +01:00
Ayke van Laethem de48717fcf
[AVR] Support unaligned store
This patch really just extends D39946 towards stores as well as loads.
While the patch is in SelectionDAGBuilder, it only applies to AVR (the
only target that supports unaligned atomic operations).

Differential Revision: https://reviews.llvm.org/D128483
2022-08-15 14:29:37 +02:00
Simon Pilgrim 3a73133217 [DAG] canCreateUndefOrPoison - add freeze(sign_extend_inreg(x,vt)) -> sign_extend_inreg(freeze(x),vt) support
Guaranteed not to create undef/poison
2022-08-15 12:18:59 +01:00
Peter Waller 6e85db7293 [DAGCombine] Combine signext_inreg of extract-extend
The outer signext_inreg is redundant in the following:

  Fold (signext_inreg (extract_subvector (zext|anyext|sext iN_value to _) _) from iN)
       -> (extract_subvector (signext iN_value to iM))

Tests are precommitted and clone those by analogy from the AND case in
the same file. Add a negative test to check extension width is handled
correctly.

This patch supersedes D130700.

Differential Revision: https://reviews.llvm.org/D131503
2022-08-15 10:58:07 +00:00
Simon Pilgrim 7e294e676e [DAG] canCreateUndefOrPoison - add freeze(assertsext/zext(x,bt)) -> assertsext/zext(freeze(x),vt) support
These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison
2022-08-15 11:13:43 +01:00
Simon Pilgrim e2d13fd096 [DAG] canCreateUndefOrPoison - add freeze(shl(x,y)) -> shl(freeze(x),y) support
These are guaranteed not to create undef/poison if the shift amount is known to be in range
2022-08-14 14:38:10 +01:00
Simon Pilgrim a621d38bcb [DAG] canCreateUndefOrPoison - add freeze(and/or/xor(x,y)) -> and/or/xor(freeze(x),y) support
These are guaranteed not to create undef/poison
2022-08-14 13:14:53 +01:00
Simon Pilgrim 60534b8879 [DAG] canCreateUndefOrPoison - add freeze(add/sub/mul(x,y)) -> add/sub/mul(freeze(x),y,z) support
These are guaranteed not to create undef/poison as long as there are no poison generating flags
2022-08-13 20:58:00 +01:00
Joe Loser b12aa497cd
[DAGCombine] Replace std::monostate equivalent in DAGCombiner.cpp
Remove the `UnitT` type and operators in favor of using `std::monostate`
directly.

Differential Revision: https://reviews.llvm.org/D131778
2022-08-12 21:42:09 -06:00
Simon Pilgrim 4de35f4bbf [DAG] Add TODO to remove creation of INSERT_SUBVECTOR nodes from SimplifyMultipleUseDemandedBits
SimplifyMultipleUseDemandedBits shouldn't be creating general nodes like this - although we allow bitcasts, even general constant folding is avoided.

Removing it causes a number of regressions that need addressing first, but I've added a TODO for now.
2022-08-12 10:45:30 +01:00
Filipp Zhinkin 1626ee6a95 [DAGCombine] Hoist shifts out of a logic operations tree.
Hoist and combine shift operations from logic operations tree:
logic (logic (SH x0, s), y), (logic (SH x1, s), z)  --> logic (SH (logic x0, x1), s), (logic y, z)

The transformation improves code generated for some cases related to the issue https://github.com/llvm/llvm-project/issues/49541.

Correctness:
https://alive2.llvm.org/ce/z/pVqVgY
https://alive2.llvm.org/ce/z/YVvT-q
https://alive2.llvm.org/ce/z/W5zTBq
https://alive2.llvm.org/ce/z/YfJsvJ
https://alive2.llvm.org/ce/z/3YSyDM
https://alive2.llvm.org/ce/z/Bs2kzk
https://alive2.llvm.org/ce/z/EoQpzU
https://alive2.llvm.org/ce/z/Jnc_5H
https://alive2.llvm.org/ce/z/_LP6k_
https://alive2.llvm.org/ce/z/KvZNC9

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D131189
2022-08-12 12:42:16 +03:00
wanglian 061f7ec9fa [LegalizeTypes][NFC] Use getConstantOperandVal instead of cast constant getvalue
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131642
2022-08-12 14:35:10 +08:00
wanglian 1303057888 [LegalizeTypes][NFC] Use dyn_cast instead of isa and cast
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131544
2022-08-12 14:18:49 +08:00
wanglian 3b71f1d5ab [LegalizeTypes][NFC] Use getConstantOperandAPInt instead of cast constant getAPInt
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131653
2022-08-12 10:21:54 +08:00
Peter Waller 898699831b [DAGCombine] Check zext legality in zext-extract-extend combine
Discussed in D131503.

Fix to D130782.
2022-08-11 14:30:42 +00:00
aqjune 02e56e2533 [CodeGen] Generate efficient assembly for freeze(poison) version of `mm*_cast*` intel intrinsics
This patch makes the variants of `mm*_cast*` intel intrinsics that use `shufflevector(freeze(poison), ..)` emit efficient assembly.
(These intrinsics are planned to use `shufflevector(freeze(poison), ..)` after shufflevector's semantics update; relevant thread: D103874)

To do so, this patch

1. Updates `LowerAVXCONCAT_VECTORS` in X86ISelLowering.cpp to recognize `FREEZE(UNDEF)` operand of `CONCAT_VECTOR` in addition to `UNDEF`
2. Updates X86InstrVecCompiler.td to recognize `insert_subvector` of `FREEZE(UNDEF)` vector as its first operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130339
2022-08-11 13:36:21 +09:00
Simon Pilgrim 8623da5f74 [DAG] visitFREEZE - generalize freeze(op()) -> op(freeze()) to any number of operands
canCreateUndefOrPoison currently only handles unary ops, but we intend to change that soon - this more closely matches the pushFreezeToPreventPoisonFromPropagating behaviour where the freeze is pushed up to a single operand value, as long as all others are guaranteed not to be poison/undef.

However, pushFreezeToPreventPoisonFromPropagating would freeze all uses of the value - whilst this variant requires the frozen value to be only used in the op - we can look at generalize multiple uses later if the need arises.
2022-08-10 13:12:46 +01:00
Simon Pilgrim bbc27d0148 [DAG] canCreateUndefOrPoison - add freeze(truncate(x)) -> truncate(freeze(x)) support 2022-08-10 11:27:22 +01:00
David Truby b1b9c39629 [AArch64][SVE] Use SVE for VLS fcopysign for wide vectors
Currently fcopysign for VLS vectors lowers through NEON even when the
vector width is wider than a NEON vector, causing bad codegen as the
vectors are split. This patch causes SVE to be used for these vectors
instead, giving much better codegen on wide VLS vectors.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D128642
2022-08-10 10:17:19 +00:00
Simon Pilgrim df3ea7365e [DAG] Use DAG.getFreeze() to create freeze node. NFC. 2022-08-10 10:26:26 +01:00
Simon Pilgrim ed162d455a [DAG] Avoid hasOneUse() calls if the cheaper !AssumeSingleUse test has already failed. NFC.
Very minor optimization, but every little helps..
2022-08-09 16:42:19 +01:00
Simon Pilgrim d79e7dc939 [DAG] SimplifyDemandedVectorElts - and/mul(x,y) - if a demanded element of y is known zero then we don't need to demand it in x
This fixes most of the remaining regressions from the fixes in rG293899c64b75
2022-08-09 16:24:08 +01:00
Simon Pilgrim 2724143551 [DAG] canCreateUndefOrPoison - add freeze(ctpop(x)) -> ctpop(freeze(x)) and freeze(parity(x)) -> parity(freeze(x)) support
Both are guaranteed not to create undef/poison
2022-08-09 10:10:29 +01:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Simon Pilgrim 6f2bee667a [DAG] canCreateUndefOrPoison - add freeze(bswap(x)) -> bswap(freeze(x)) and freeze(bitreverse(x)) -> bitreverse(freeze(x)) support
Both are guaranteed not to create undef/poison
2022-08-08 17:27:17 +01:00
Simon Pilgrim e4b2c52420 [DAG] canCreateUndefOrPoison - add freeze(sext(x)) -> sext(freeze(x)) and freeze(zext(x)) -> zext(freeze(x)) support
Both are guaranteed not to create undef/poison
2022-08-08 16:43:40 +01:00
Simon Pilgrim 9641a201a5 [DAG] Add initial SelectionDAG::canCreateUndefOrPoison support
This patch adds basic support for a DAG variant of the canCreateUndefOrPoison call and updates DAGCombiner::visitFREEZE to use it, further Opcodes (including target specific Opcodes) can be handled when we have test coverage.

So far, I've left visitFREEZE to just use this for unary nodes (which currently means the existing BITCAST/FREEZE cases) - later patches will add other unary opcodes (with test coverage) and we can also refactor visitFREEZE to support a general number of operands like we do in InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating.

I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place. Similarly we will be able to handle poison generating SDNodeFlags as and when it becomes an issue.

Part of the work for D106675 / PR50468

Differential Revision: https://reviews.llvm.org/D130646
2022-08-08 15:16:06 +01:00
Simon Pilgrim b334709467 Remove superfluous ; outside of a function 2022-08-08 12:14:03 +01:00
Shubham Narlawar ab4fc87a9d [DAG] Emit table lookup from TargetLowering::expandCTTZ()
This patch emits table lookup in expandCTTZ.

Context -
https://reviews.llvm.org/D113291 transforms set of IR instructions to
cttz intrinsic but there are some targets which does not support CTTZ or
CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ().

Differential Revision: https://reviews.llvm.org/D128911
2022-08-08 12:08:05 +01:00
Simon Pilgrim e5e93b6130 [DAG] FoldConstantArithmetic - add initial support for undef elements in bitcasted binop constant folding
FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks).

This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero.

The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead?

Differential Revision: https://reviews.llvm.org/D130839
2022-08-08 11:53:56 +01:00
David Green 061e0189a3 [DAG] Ensure Legal BUILD_VECTOR elements types in shuffle->And combine
D129150 added a combine from shuffles to And that creates a BUILD_VECTOR
of constant elements. We need to ensure that the elements are of a legal
type, to prevent asserts during lowering.

Fixes #56970.

Differential Revision: https://reviews.llvm.org/D131350
2022-08-08 09:47:55 +01:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Filipp Zhinkin c55899f763 [DAGCombiner] Hoist funnel shifts from logic operation
Hoist funnel shift from logic op:
logic_op (FSH x0, x1, s), (FSH y0, y1, s) --> FSH (logic_op x0, y0), (logic_op x1, y1), s

The transformation improves code generated for some cases related to
issue https://github.com/llvm/llvm-project/issues/49541.

Reduced amount of funnel shifts can also improve throughput on x86 CPUs by utilizing more
available ports: https://quick-bench.com/q/gC7AKkJJsDZzRrs_JWDzm9t_iDM

Transformation correctness checks:
https://alive2.llvm.org/ce/z/TKPULH
https://alive2.llvm.org/ce/z/UvTd_9
https://alive2.llvm.org/ce/z/j8qW3_
https://alive2.llvm.org/ce/z/7Wq7gE
https://alive2.llvm.org/ce/z/Xr5w8R
https://alive2.llvm.org/ce/z/D5xe_E
https://alive2.llvm.org/ce/z/2yBZiy

Differential Revision: https://reviews.llvm.org/D130994
2022-08-05 17:02:22 -04:00
Dawid Jurczak 1bd31a6898 [NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T
Extracted from https://reviews.llvm.org/D129781 and address comment:
https://reviews.llvm.org/D129781#3655571

Differential Revision: https://reviews.llvm.org/D130268
2022-08-05 13:35:41 +02:00
Lorenzo Albano 74940d2668 [VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: frasercrmck, craig.topper

Differential Revision: https://reviews.llvm.org/D121114
2022-08-04 16:12:01 +02:00
wanglian b6b0690355 [LegalizeTypes][VP] Add split operand support for VP float and integer casting
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D130685
2022-08-04 15:41:50 +08:00
Felipe de Azevedo Piovezan a5a8a05c78 [SelectionDAG] Handle IntToPtr constants in dbg.value
The function `handleDebugValue` has custom logic to handle certain kinds
constants, namely integers, floats and null pointers. However, it does
not handle constant pointers created from IntToPtr ConstantExpressions.
This patch addresses the issue by replacing the Constant with its
integer operand.

A similar bug was addressed for GlobalISel in D130642.

Reviewed By: aprantl, #debug-info

Differential Revision: https://reviews.llvm.org/D130908
2022-08-03 14:10:05 -04:00
David Truby 9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Fraser Cormack 646e2f4803 [VP] Rename VP int<->float conversion ISD opcodes
These should be named like the non-VP versions for consistency.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130967
2022-08-03 10:04:38 +01:00
Simon Pilgrim b651fdff79 [DAG] matchRotateSub - ensure the (pre-extended) shift amount is wide enough for the amount mask (PR56859)
matchRotateSub is given shift amounts that will already have stripped any/zero-extend nodes from - so make sure those values are wide enough to take a mask.
2022-08-02 11:38:52 +01:00
Marius Brehler ddb6c28638 Avoid comparison of integers of different signs
Otherwiese a warning is emitted when compiling with `-Wsign-compare`.
2022-08-01 11:20:41 +00:00
Simon Pilgrim b43d7aacf8 [DAG] visitINSERT_VECTOR_ELT - extend folding to BUILD_VECTOR if all missing elements from an insertion chain are known zero 2022-08-01 11:32:33 +01:00
David Sherwood 41119a0f52 [DAGCombiner] Extend visitAND to include EXTRACT_SUBVECTOR
Eliminate an AND by redefining an anyext|sext|zext.

     (and (extract_subvector (anyext|sext|zext v) _) iN_mask)
  => (extract_subvector (zeroext_iN v))

Differential Revision: https://reviews.llvm.org/D130782
2022-08-01 10:32:32 +01:00
Chuanqi Xu 9701053517 Introduce @llvm.threadlocal.address intrinsic to access TLS variable
This belongs to a series of patches which try to solve the thread
identification problem in coroutines. See
https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015
for a full background.

The problem consists of two concrete problems: TLS variable and readnone
functions. This patch tries to convert the TLS problem to readnone
problem by converting the access of TLS variable to an intrinsic which
is marked as readnone.

The readnone problem would be addressed in following patches.

Reviewed By: nikic, jyknight, nhaehnle, ychen

Differential Revision: https://reviews.llvm.org/D125291
2022-08-01 10:51:30 +08:00
Simon Pilgrim 9ad082eb5a [DAG] Pull out repeated getOperand() calls for shuffle ops. NFC. 2022-07-30 14:02:54 +01:00
Amaury Séchet 226086230c [DAG] Use recursivelyDeleteUnusedNodes in CommitTargetLoweringOpt.
It simplifies the logic and removes the need for manual bookkeeping.

Differential Revision: https://reviews.llvm.org/D130445
2022-07-29 13:49:03 +00:00
Simon Pilgrim af1b7ebcdf [TargetLowering] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC.
Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.
2022-07-29 14:20:35 +01:00
Simon Pilgrim 641dba9e28 [DAG] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC.
Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.
2022-07-29 11:34:39 +01:00
Simon Pilgrim 9082c13106 [Support] Add KnownBits::concat method
Add a method for the various cases where we need to concatenate 2 KnownBits together (BUILD_PAIR and SHIFT_PARTS in particular) - uses the existing APInt::concat 'HiBits.concat(LoBits)' convention

Differential Revision: https://reviews.llvm.org/D130557
2022-07-29 11:06:39 +01:00
Simon Pilgrim 8c99cef1e7 [DAG] Remove SelectionDAG::GetDemandedBits and use SimplifyMultipleUseDemandedBits directly.
GetDemandedBits is mainly a wrapper around SimplifyMultipleUseDemandedBits now, and is only used by DAGCombiner::visitSTORE so I've moved all remaining functionality there.

visitSTORE was making use of this to 'simplify' constants for a trunc-store. Just removing this code left to a mixture of regressions and gains - it came down to whether a target preferred a sign or zero extended constant for materialization/truncation. I've just moved the code over for now, but a next step would be to move this to targetShrinkDemandedConstant, but some targets that override the method expect a basic binop, and might react badly to a store node.....
2022-07-28 17:03:44 +01:00
Simon Pilgrim be488ba7de [DAG] DAGCombiner::visitTRUNCATE - remove GetDemandedBits call
This should now all be handled by SimplifyDemandedBits.
2022-07-28 15:23:04 +01:00