Commit Graph

12514 Commits

Author SHA1 Message Date
Henry Yu ef0d689e8b [SelectionDAGBuilder] use bitcast instead of AnyExtOrTrunc if copy parts from an int vector to a float vector to fix issue #58615
The getCopyFromPartsVector doesn't work correctly when PartEVT and ValueVT have both different element type and different size.

This patch
1) removes the part of a comment that contains the incorrect assumption that element type are the same
2) use bitcast when copy parts of int vector to a float vector after the subvector extraction

Reviewed By: Peter, efriedma

Differential Revision: https://reviews.llvm.org/D136726
2022-11-03 15:35:13 -07:00
Matt Arsenault cbce11c422 WebAssembly: Move exception handling code together 2022-11-02 16:05:34 -07:00
Matt Arsenault 4fed59ed41 FunctionLoweringInfo: Use TLI member instead of finding it 2022-11-02 16:05:34 -07:00
Yeting Kuo 71e4e35581 [VP][RISCV] Add vp.rint and RISC-V support.
FRINT uses dynamic rounding mode instead of static rounding mode. The patch
rename VFCVT_X_F_VL to VFCVT_RM_X_F_VL for static rounding mode uses and added
new ISDNode VFCVT_X_F_VL directly selected to PseudoVFCVT_X_F_V.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D136662
2022-11-01 14:52:47 +08:00
Simon Pilgrim 55a11b542e [VectorUtils] Add getShuffleDemandedElts helper
We have similar code to translate a demanded elements mask for a shuffle's operands in multiple places - this patch adds a helper function to VectorUtils and updates a number of locations to use it directly.

Differential Revision: https://reviews.llvm.org/D136832
2022-10-30 17:03:55 +00:00
Simon Pilgrim 78739fdb4d [DAG] Enable combineShiftOfShiftedLogic folds after type legalization
This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides.

Fixes #57872

Differential Revision: https://reviews.llvm.org/D136042
2022-10-29 12:30:04 +01:00
Simon Pilgrim 69d117edc2 [DAG] ExpandIntRes_MINMAX - simplify cases with sufficient number of sign bits
When legalizing a smax/smin/umax/umin op, if we know that the upper half is all sign bits, then we can perform the op on the lower half and then sign extend the result to the upper half.

Alive2: https://alive2.llvm.org/ce/z/rk8Rfd

Fixes #58630
2022-10-28 17:10:45 +01:00
Sanjay Patel 1e7c1dd67c [SDAG] avoid crash from mismatched types in scalar-to-vector fold
This bug was introduced with D136713 / 54eeadcf44 .

As an enhancement, we could cast operands to the expected type,
but we need to make sure that is done correctly (zext vs. sext).
It's also possible (but seems unlikely) that an operand can have
a type larger than the result type.

Fixes #58661
2022-10-28 09:14:08 -04:00
Simon Pilgrim d47f056cd2 [DAG] visitXOR - fold XOR(A,B) -> OR(A,B) iff A and B have no common bits
Alive2: https://alive2.llvm.org/ce/z/7wvfns

Part of Issue #58624
2022-10-28 12:11:12 +01:00
Simon Pilgrim 28bfd853ab [DAG] visitFSUBForFMACombine - pass callbacks by reference in isContractableAndReassociableFMUL lambda capture. NFC.
Fixes a coverity remark about large copies by value
2022-10-28 11:48:45 +01:00
Pierre van Houtryve 088a816824 [DAGCombiner] Use `getAnyExtOrTrunc` instead of TRUNCATE in ExtractVectorElt combine
ScalarVT isn't guaranteed to be smaller than the BCSrc.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136849
2022-10-28 06:33:29 +00:00
Craig Topper 00d93def77 [LegalizeVectorOps][X86][RISCV] Expand vector S/USHLSAT instead of unrolling.
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D136478
2022-10-27 09:09:36 -07:00
Sanjay Patel 54eeadcf44 [SDAG] avoid vector extract/insert around binop
scalar-to-vector (scalar binop (extractelt V, Idx), C) --> shuffle (vector binop V, C'), {Idx, -1, -1...}

We generally try to avoid ad-hoc vectorization in SDAG,
but the motivating case from issue #39482 escapes our
normal vectorization folds in IR. It seems like it should
always be a win to transform this pattern in cases where
we have the same vector type for input and output and the
target supports the vector operation. That avoids
transfers from vector to scalar and back.

In the x86 shift examples, we create the scalar-to-vector
node during legalization. I'm not sure if there's a more
general way to create the pattern for testing. (If so, I
could add tests for other targets.)

Differential Revision: https://reviews.llvm.org/D136713
2022-10-26 14:04:46 -04:00
Sanjay Patel 3aec021118 [SDAG] add helper for opcodes that are not speculatable
This is not quite NFC because one of the users should
now avoid the DIVREM opcodes too, but I'm not sure
how to test that.

I used the same name as an analysis function in IR
in case we want to expand this to include other
operations.

Another potential use is proposed in D136713.
2022-10-26 11:20:14 -04:00
Haohai Wen 21f23a37c6 [SelectionDAG] Clamp stack alignment for memset, memmove
memcpy has clamped dst stack alignment to NaturalStackAlignment if
hasStackRealignment is false. We should also clamp stack alignment
for memset and memmove. If we don't clamp, SelectionDAG may first
do tail call optimization which requires no stack realignment. Then
memmove, memset in same function may be lowered to load/store with
larger alignment leading to PEI emit stack realignment code which
is absolutely not correct.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D136456
2022-10-26 16:45:31 +08:00
Craig Topper 8c42b5e89e [SelectionDAG] Add missing semicolon after return.
I'm unsure what the code does without the semicolon. On the surface it
seems like the assert below it would be considered part of the if
and thus the assert would only execute if DestReg is 0. But 0 isn't
considered a virtual register so the assert should fail.

Found by PVS Studio.
Reported https://pvs-studio.com/en/blog/posts/cpp/1003/ (N7)
2022-10-25 10:24:01 -07:00
Sanjay Patel b179351ad4 [SDAG] refactor folds for scalar-to-vector; NFCI
Fix typos, add comments, improve variable names,
rearrange code, add early exits.
2022-10-25 12:53:46 -04:00
Roman Lebedev 377f27be87
[X86] `DAGTypeLegalizer::ModifyToType()`: when widening w/ zeros, insert into undef and `and`-mask the padding away
We can expect that the sequence of inserting-of-extracts-into-undef
will be successfully lowered back into widening of the source vector,
but it seems that at least for X86 mask vectors, we have a really hard time
recovering from inserting-into-zero.

I've looked into alternative fix injection points, and they are much more
involved, by the time of `LowerBUILD_VECTORvXi1()`/`LowerINSERT_VECTOR_ELT()`
the constants might be obscured, so it does not seem like we can easily
deal with this by lowering into bit math later on,
some other pieces are missing.

Instead, it seems like just clearing the padding away via an `AND`-mask
is at least not a worse choice. Why create a problem where there wasn't one.
Though yes, it is possible that there are cases where constants originate
from the source IR, so some other fix may still be needed.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D136046
2022-10-24 20:27:02 +03:00
Craig Topper 1fa8fd4c33 Recommit "[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant."
This reverts commit 65aaecca88.

There was an ordering problem in the calculation of the partial
remainder.

Original commit message:

If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.

Differential Revision: https://reviews.llvm.org/D135541
2022-10-24 10:08:50 -07:00
Craig Topper 65aaecca88 Revert "[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant."
This reverts commit f6a7b47820.

I received a report that this fails on 32-bit X86.
2022-10-24 07:12:54 -07:00
Simon Pilgrim fd5f3abb07 [DAG] Fold (abs (sign_extend_inreg x)) -> (zero_extend (abs (truncate x))) (PR43370)
If the upper half of an abs() is all sign bits, then we can perform the abs() using just the lower half and then zero extend.

I've limited the DAG combine to only sign_extend_inreg (and free truncate/zero_extend) to minimise any later promotion issues, but for legalization a similar fold can use ComputeNumSignBits to be more aggressive.

Alive2: https://alive2.llvm.org/ce/z/y32fS4

Fixes #43370

Differential Revision: https://reviews.llvm.org/D136559
2022-10-24 10:27:08 +01:00
Kazu Hirata a1317be28d [SelectionDAG] Use std::clamp (NFC) 2022-10-24 00:23:51 -07:00
Simon Pilgrim 913f08b74c [DAG] Add freeze(sign/zero_extend_vector_inreg(x)) -> sign/zero_extend_vector_inreg(freeze(x)) folding 2022-10-23 12:19:42 +01:00
Craig Topper f6a7b47820 [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D135541
2022-10-22 23:35:33 -07:00
Craig Topper db25f51e37 Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))"
This reverts commit e8b3ffa532.

The AMDGPU/mad_64_32.ll seems to fail on some of the build bots but
passes locally. I'm really confused.
2022-10-22 22:50:43 -07:00
Craig Topper e8b3ffa532 [DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.

This pattern shows up when type legalizing wide multiplies involving
a sign extended value.

Fixes PR57549.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133399
2022-10-22 21:51:45 -07:00
Craig Topper 00816714f9 [DAGCombiner][RISCV] Make foldBinOpIntoSelect work correctly with opaque constants.
The CanFoldNonConst doesn't work correctly with opaque constants
because getNode won't constant fold constants if one is opaque. Even
if the operation is AND/OR. This can lead to infinite loops.

This patch does the folding manually in the DAGCombine. Alternatively,
we could improve getNode but that seemed likely to have bigger impact
and possibly increase compile time for the additional checks. We wouldn't
want to directly constant fold because we need to preserve the opaque flag.

Fixes PR58511.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D136472
2022-10-22 19:10:33 -07:00
Simon Pilgrim b24a9f0cef [DAG] visitFREEZE - pull out Operands array. NFCI.
Initial tidyup and it will make it easier to adjust additional Operands in a future patch.
2022-10-22 20:14:56 +01:00
Simon Pilgrim 7511303c4f [DAG] canCreateUndefOrPoison - add freeze(fsh(x,y,z)) -> fsh(freeze(x),freeze(y),freeze(z)) support
The funnel-shift amount is always modulo, so won't introduce poison/undef
2022-10-22 18:39:52 +01:00
Simon Pilgrim 89111707ec [DAG] canCreateUndefOrPoison - add freeze(rot(x,y)) -> rot(freeze(x),freeze(y)) support
The rotation amount is always modulo, so won't introduce poison/undef
2022-10-22 17:24:53 +01:00
Paul Walker ab8257ca0e [NFC] Fix a few whitespace inconsistencies. 2022-10-20 14:52:25 +00:00
Simon Pilgrim 9708d88017 Revert rG42230efccf8fe1185be5fa6c23dce0a8183d6ec9 "[DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1"
@foad was right - this isn't actually going to help with D136042 as much as hoped, we need a better AMDGPU-specific solution as other targets are likely to make use of it
2022-10-19 12:07:41 +01:00
Simon Pilgrim 42230efccf [DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1
Helps with some of the AMDGPU regressions identified in D136042 where we were losing signed BFE patterns after sinking shifts behind logic ops.

Differential Revision: https://reviews.llvm.org/D136081
2022-10-19 11:18:49 +01:00
Koakuma d3fcbee10d [SPARC] Make calls to function with big return values work
Implement CanLowerReturn and associated CallingConv changes for SPARC/SPARC64.

In particular, for SPARC64 there's new `RetCC_Sparc64_*` functions that handles the return case of the calling convention.
It uses the same analysis as `CC_Sparc64_*` family of funtions, but fails if the return value doesn't fit into the return registers.

This makes calls to functions with big return values converted to an sret function as expected, instead of crashing LLVM.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D132465
2022-10-18 00:01:55 +00:00
Craig Topper 30305d7948 [TargetLowering][RISCV][Sparc] Don't emit zero check in CTTZTableLookup for CTTZ_ZERO_UNDEF.
The code incorrectly checked for CTLZ_ZERO_UNDEF instead of
CTTZ_ZERO_UNDEF.

While I was there I flipped the condition into an early out.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D136010
2022-10-17 10:15:39 -07:00
Kazu Hirata ef9956f434 [IR] Rename FuncletPadInst::getNumArgOperands to arg_size (NFC)
This patch renames FuncletPadInst::getNumArgOperands to arg_size for
consistency with CallBase, where getNumArgOperands was removed in
favor of arg_size in commit 3e1c787b31

Differential Revision: https://reviews.llvm.org/D136048
2022-10-17 10:15:10 -07:00
Simon Pilgrim 8e77458578 [DAG] visitShiftByConstant - replace constant detection with FoldConstantArithmetic
Instead of checking that an operand is constant/opaque before calling getNode() and then checking that the result is a constant, just use FoldConstantArithmetic which will just early-out if the operands are not constant foldable.
2022-10-17 16:19:10 +01:00
Simon Pilgrim af5942cc09 Remove trailing whitespace. NFC. 2022-10-17 15:20:26 +01:00
Peter Rong c2e7c9cb33 [CodeGen] Using ZExt for extractelement indices.
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.

In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt.
This change includes both documentation, SelectionDAG and IRTranslator.
We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86

This patch fixes issue #57452.

Differential Revision: https://reviews.llvm.org/D132978
2022-10-15 15:45:35 -07:00
Filipp Zhinkin ef774bec63 [AArch64] Support SETCCCARRY lowering
Support SETCCCARRY lowering to SBCS instruction.

Related issue: https://github.com/llvm/llvm-project/issues/44629

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135302
2022-10-14 22:29:31 +03:00
chenglin.bi c1909d7337 [DAGCombiner] Fix crash for the merge stores with different value type
The crash case comes from #58350. It have two stores, one store is type f32 and the other is v1f32.
When we try to merge these two stores on v1f32, the memVT is vector type so the old code will use ISD::EXTRACT_SUBVECTOR for type f32 also then compiler crash.
So this patch insert a build_vector for f32 store to generate v1f32 also when memVT is v1f32.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D135954
2022-10-15 01:16:35 +08:00
Nicola Lancellotti ce1a2ccf94 [NFC] Fix typo in DAGCombiner 2022-10-14 17:47:25 +01:00
Sander de Smalen 02df03c5b7 [AArch64][SME] Add support for arm_locally_streaming functions.
Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start
of the function, and a SMSTOP at the end of the function, such that all
operations use the right value for vscale.

Because the placement of these nodes is critically important (i.e. no
vscale-dependent operations should be done before SMSTART has been issued),
we require glueing the CopyFromReg to the Entry node such that we can
insert the SMSTART as part of that glued chain.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131582
2022-10-14 13:47:53 +00:00
Xiang1 Zhang aad013de41 [InlineAsm][bugfix] Correct function addressing in inline asm
In Linux PIC model, there are 4 cases about value/label addressing:
Case 1: Function call or Label jmp inside the module.
Case 2: Data access (such as global variable, static variable) inside the module.
Case 3: Function call or Label jmp outside the module.
Case 4: Data access (such as global variable) outside the module.

Due to current llvm inline asm architecture designed to not "recognize" the asm
code, there are quite troubles for us to treat mem addressing differently for
same value/adress used in different instuctions.
For example, in pic model, call a func may in plt way or direclty pc-related,
but lea/mov a function adress may use got.

This patch fix/refine the case 1 and case 2 in inline asm.
Due to currently inline asm didn't support jmp the outsider lable, this patch
mainly focus on fix the function call addressing bugs in inline asm.

Reviewed By: Pengfei, RKSimon

Differential Revision: https://reviews.llvm.org/D133914
2022-10-14 09:47:26 +08:00
Mirko Brkusanin 8b8463ef6c [SelectionDAG] Use consistent type sizes for opcode 2022-10-12 17:33:04 +02:00
Craig Topper ac9209751a Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))"
This reverts commit 0148df8157.

Getting a lit test failures on AMDGPU but I can't reproduce it so far.
Reverting to investigate.
2022-10-11 16:30:40 -07:00
Craig Topper 0148df8157 [DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.

This pattern shows up when type legalizing wide multiplies involving
a sign extended value.

Fixes PR57549.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133399
2022-10-11 16:20:55 -07:00
Philip Reames 487695e7c9 [SDAG] Treat DemandedElts argument to isSplatVector as splat for scalable vectors [nfc]
The previous code used a APInt(1, 0) to represent the demanded elts of a scalable vector, and then ignored that argument if type was scalable.  This was inconsistent with the UndefElts parameter which is set to either APInt(1, 0) or APInt(1,1) - that is, implicitly broadcast across all lanes.  Particularly since the undef code relied on the DemandedElts parameter having bitwidth 1 to achieve that result!

This change switches the demanded parameter to APInt(1,1), documents the broadcast semantics, and takes advantage of it to remove one special case for scalable vectors which is no longer required.
2022-10-11 09:49:28 -07:00
Philip Reames ac4f3fff8c [SDAG] Clarify behavior of scalable demanded/undef elts in isSplatValue [nfc]
Update comment, and add an assertion to check property expected by sole (non-test) caller.  Remove tests which appear to have been copied from fixed vector tests, and whose demanded bits don't correspond to the way this interface is otherwise used.
2022-10-11 07:28:34 -07:00
Craig Topper 0121b1a4ac Revert "[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant."
This reverts commit d4facda414.

This has been reported to cause failures. Reverting while I investigate.
2022-10-10 14:53:29 -07:00
Craig Topper d4facda414 [TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant.
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D135541
2022-10-10 11:02:22 -07:00
Craig Topper 9f67047cf0 [VP][RISCV] Add vp.smax/smin/umax/umin intrinsics
Differential Revision: https://reviews.llvm.org/D135418
2022-10-07 17:14:31 -07:00
Amaury Séchet 62ea6c5be7
[DAGCombine] Deduplicate addcarry node using commutativity.
The first two parameters of addcarry are commutative. We may face a situation where both variant are present in the DAG, in which case we benefit from using just one.

Depends on D57302 and D33587

Reviewed By: RKSimon, chfast

Differential Revision: https://reviews.llvm.org/D57317
2022-10-08 00:55:14 +02:00
eopXD dbc681c98e [VP][RISCV] Add vp.roundtozero and its RISC-V support
The scalar instruction of this is `llvm.trunc`. However the naming of
ISD::VP_TRUNC is already taken by `trunc` of the LLVM IR. Naming this as
`vp.ftrunc` would likely cause confusion with `vp.fptrunc`. So adding
`vp.roundtozero` that will look similar to `vp.roundeven`.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D135233
2022-10-07 02:15:23 -07:00
Philip Reames 04bb32e58a [DAG] Extract helper for (neg x) [nfc]
This is a frequently reoccurring pattern, let's factor it out.

Differential Revision: https://reviews.llvm.org/D135301
2022-10-06 13:23:52 -07:00
Pierre van Houtryve 3ec0085c3f [DAG] Update `isKnownNeverNaN` for `FMA/FMAD`
We can still get a NaN even if none of the operands are NaN,
e.g. from +inf/-inf. D50804 didn't catch that.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134854
2022-10-06 06:52:36 +00:00
jeff cebec42089 [DAGCombiner] [AMDGPU] Allow vector loads in MatchLoadCombine
Since SROA chooses promotion based on reaching load / stores of allocas, we may run into scenarios in which we alloca a vector, but promote it to an integer. The result of which is the familiar LoadCombine pattern (i.e. ZEXT, SHL, OR). However, instead of coming directly from distinct loads, the elements to be combined are coming from ExtractVectorElements which stem from a shared load.

This patch identifies such a pattern and combines it into a load.

Change-Id: I0bc06588f11e88a0a975cde1fd71e9143e6c42dd
2022-10-04 12:16:00 -07:00
Sanjay Patel 17dcbd8165 [SDAG] don't hoist div/rem through a select with neutral constant
This bug was introduced with D134966.
2022-10-04 13:15:01 -04:00
Jay Foad af947d9fcb [ISel] Fix crash in new FMA DAG combine
Fix a crash in the FMA combine added by D132837 and amended by D134810.
In cases where the newly created node could be folded, the combiner
would fail this assertion:

llc: DAGCombiner.cpp:268: void (anonymous namespace)::DAGCombiner::AddToWorklist(llvm::SDNode *): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.

Differential Revision: https://reviews.llvm.org/D135150
2022-10-04 15:13:18 +01:00
Philip Reames a200b0fc25 [DAG] Introduce getSplat utility for common dispatch pattern [nfc]
We have a very common pattern of dispatching between BUILD_VECTOR and SPLAT_VECTOR creation repeated in many cases in code.  Common the pattern into a utility function.
2022-10-03 12:49:39 -07:00
Philip Reames 21f97fdc97 [DAG] Use getSplatBuildVector in a couple more places [nfc] 2022-10-03 09:48:49 -07:00
Markus Böck 36af4c8418 [SelectionDAG] Fix use-after-free introduced in D130881
The code introduced in https://reviews.llvm.org/D130881 has a bug as it may cause a use-after-free error that can be caught by ASAN.
The bug essentially boils down to iterator invalidation of `DenseMap`. The expression `SDEI[To] = I->second;` may cause `SDEI` to grow if `To` is inserted for the very first time. When that happens, all existing iterators to the map are invalidated as their backing storage has been freed. Accessing `I->second` is then invalid and attempts to access freed memory (as `I` is an iterator of `SDEI`).

This patch fixes that quite simply by first making a copy of `I->second`, and then moving into the possibly newly inserted KV of the ` DenseMap`.

No test attached as I am not sure it is practible to test.

Differential revision: https://reviews.llvm.org/D135019
2022-10-03 15:09:14 +02:00
Petar Avramovic 1fa2019828 [SelectionDAG] Add check for BUILD_VECTOR in isKnownNeverNaN
Includes handling of constants with vector type in isKnownNeverNaN.
For AMDGPU results in not making fcanonicalize during legalization
for vector inputs to fmaxnum_ieee and fminnum_ieee. Does not affect
end result since there is a combine that eliminates fcanonicalize.

Differential Revision: https://reviews.llvm.org/D88573
2022-10-03 12:47:07 +02:00
David Green 3651635eca [ARM][DAG] BF16 constant handling.
Much like f16 and f32, we shouldn't try to shrink bf16 to smaller fp
constant.  The code may not be optimal, but this allows us to legalize
bf16 constants under Arm without errors.
2022-10-02 11:51:08 +01:00
Yeting Kuo cefb7aab61 [VP][RISCV] Add vp.copysign and RISC-V support.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134935
2022-10-01 10:19:10 +08:00
Simon Pilgrim 61dc5014ac [DAG] Update foldSelectWithIdentityConstant to use llvm::isNeutralConstant
D133866 added the llvm::isNeutralConstant helper to track neutral/passthrough constants

This patch updates foldSelectWithIdentityConstant to use the helper instead of maintaining its own opcode handling

Differential Revision: https://reviews.llvm.org/D134966
2022-09-30 17:46:52 +01:00
Amaury Séchet 031a7ad575 [NFC] Fix erroneous indentation. 2022-09-30 12:30:27 +00:00
Yeting Kuo 1cc02b05b7 [SelectionDAG] Add helper function to check whether a SDValue is neutral element. NFC.
Using this helper makes work about neutral elements more easier. Although I only
find one case now, I think it will have more chance to be used since so many
combine works are related to neutral elements.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133866
2022-09-30 11:29:11 +08:00
Amaury Séchet 923909afbe [DAG] Simplify the select of constant combine code. NFC 2022-09-30 01:03:14 +00:00
Amaury Séchet d7600c7ccb [DAG] select Cond, C, -1 --> or (sext (not Cond)), C when C is MVT::i1
In the spirit of D130765 . Get rid of cbranches and/or cmov. Usually shorter, but sometime not, becaus eit's hard to prededict when dependency breaking xor will be introduced.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134736
2022-09-30 00:36:58 +00:00
Thomas Symalla a41dde2c62 [AMDGPU] Add use check in v_fma combine.
In D132837, an existing v_fma combine was extended to regard nested
fma instructions. Originally, the inner FMA was checked for being used
only once. In its current state, this check is missing, which causes
some regressions.

In this patch, this check was added.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D134856
2022-09-29 12:25:03 +02:00
Jay Foad 2c12a04bba [ISel] Fix DAG divergence after new FMA combine
D132837 introduced a new DAG combine that used MorphNodeTo to morph an
FMUL into an FMA. It turns out that MorphNodeTo does not properly update
the divergence bit for users of the morphed node, causing an assertion
failure on the new test case:

llc: SelectionDAG.cpp:10486: void llvm::SelectionDAG::VerifyDAGDivergence(): Assertion `calculateDivergence(N) == N->isDivergent() && "Divergence bit inconsistency detected"' failed.

Fixing MorphNodeTo to propagate the divergence bit is tricky because of
the way it is used to select machine instructions, so use getNode and
ReplaceAllUsesOfValueWith instead.

Differential Revision: https://reviews.llvm.org/D134810
2022-09-28 19:41:51 +01:00
Craig Topper 12357e88af [RISCV][SelectionDAGBuilder] Fix crash when copying a v1f32 vector between basic blocks.
On a rv64 without f32 or vector support, this will be passed across
the basic block as an i64. We need use i32 as an intermediate type
with bitcast and anyext/trunc.

Fixes PR58025

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134758
2022-09-28 10:13:35 -07:00
Matt Devereau 0a4771a7e8 [AArch64][SVE] Expand gather index to 32 bits instead of 64 bits
For gathers which load in 8 and 16 bit data then use that data
as an index, the index can be extended to 32 bits instead of
64 bits

Differential Revision: https://reviews.llvm.org/D130692
2022-09-28 14:42:12 +00:00
jacquesguan 465ac0b96e [LegalizeTypes] Use getVectorElementCount to avoid crash of scalable vector.
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134718
2022-09-28 17:31:29 +08:00
eopXD 9677d70eb2 [VP][RISCV] Add vp.floor, vp.round, vp.roundeven and their RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134759
2022-09-27 19:45:58 -07:00
eopXD 163cb33854 [VP][RISCV] Add vp.ceil and RISC-V support
Previous commit 8b00b24f85 missed to add `int_ceil` anchor for the
llvm.ceil.* section under LangRef.rst

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 12:04:09 -07:00
eopXD 384b8b3da7 Revert "[VP][RISCV] Add vp.ceil and RISC-V support"
This reverts commit 8b00b24f85.
2022-09-27 11:12:57 -07:00
eopXD 8b00b24f85 [VP][RISCV] Add vp.ceil and RISC-V support
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134586
2022-09-27 11:08:27 -07:00
Craig Topper a6383bb51c [VP][RISCV] Add vp.fmuladd.
Expanded in SelectionDAGBuilder similar to llvm.fmuladd.

Reviewed By: frasercrmck, simoll

Differential Revision: https://reviews.llvm.org/D134474
2022-09-27 10:02:37 -07:00
Amaury Séchet d1baed7c9c [DAG] select Cond, -1, C --> or (sext Cond), C if Cond is MVT::i1
This seems to be beneficial overall, except for midpoint-int.ll .

The X86 backend seems to generate zeroing that are not necesary.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D131260
2022-09-27 12:54:52 +00:00
Yeting Kuo 04e1301f3d [VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support.
Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum
and llvm.minnum.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134639
2022-09-27 13:36:45 +08:00
Paul Scoropan ce004fb4f2 [PowerPC] XCOFF exception section support on the direct assembler path
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for LIT tests

Reviewed By: shchenz, RKSimon

Differential Revision: https://reviews.llvm.org/D132146
2022-09-26 22:24:20 -04:00
Craig Topper afdd600a49 [LegalizeTypes][RISCV] Support f16 in ExpandIntRes_LLROUND_LLRINT.
Promote f16 to f32 and use the f32 libcall.

I deleted rv64zfh-half-intrinsics-strict.ll because it only existed due to this issue breaking rv32.

Differential Revision: https://reviews.llvm.org/D134579
2022-09-26 11:09:33 -07:00
Amaury Séchet b30bbd181b Small formating nit in DAGCombiner. NFC 2022-09-26 13:36:11 +00:00
Yeting Kuo 43c5fbdd3a [VP][RISCV] Add vp.sqrt intrinsic and RISC-V support.
The patch modeled vp.fabs patch D132793.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133690
2022-09-26 10:47:40 +08:00
Philip Reames b9c4733079 [DAG] Move one-use add of splat to base of scatter/gather
This extends the uniform base transform used with scatter/gather to support one-use vector adds-of-splats with a non-zero base. This has the effect of essentially reassociating an add from vector to scalar domain.

The motivation is to improve the lowering of scatter/gather operations fed by complex geps.

Differential Revision: https://reviews.llvm.org/D134472
2022-09-22 18:45:12 -07:00
Philip Reames 60c91fd364 [RISCV] Disallow scale for scatter/gather
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.

This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)

The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.

Differential Revision: https://reviews.llvm.org/D134382
2022-09-22 15:31:26 -07:00
Craig Topper 9d236d4dab [SelectionDAGBuilder] Simplify how VTs is created for constrained intrinsics. NFC
All constrained intrinsics return a single value. We can directly
convert it to an EVT instead of going through ComputeValueTypes.
2022-09-22 14:21:22 -07:00
Philip Reames 46525fee81 [DAGCombine] Check both forms of a commutative transform
The transform to fold an add into the base of a scatter/gather was only checking to see if the LHS was a splat.  Included test change indicates that splats are not canonicalized to LHS, and that we need to check both sides.
2022-09-22 12:21:47 -07:00
Philip Reames 143f3bf8f4 [SDAG] Split handling of VPLoad/VPGather and VPStore/VPScatter [nfc]
The merged routines are not-idiomatic, and the code sharing that results is prettty minimal.  The confusion factor is not justified.
2022-09-21 09:06:02 -07:00
Thomas Symalla c98a46fee6 [ISel] Enable generating more fma instructions.
This patch changes a FADD / FMUL => FMA ISel pattern implemented
in D80801 so that it peeks through more than one FMA.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132837
2022-09-21 12:03:11 +02:00
Matt Arsenault bcb931c484 SelectionDAG: Add AssumptionCache analysis dependency
Fixes compile time regression after
bb70b5d406
2022-09-19 19:10:51 -04:00
Matt Arsenault 0d8ffcc532 Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer
This does not try to pass it through from the end users.
2022-09-19 18:57:33 -04:00
Simon Pilgrim 8206044183 [DAG] SimplifyDemandedVectorElts - add MULHS/MULHU handling to existing MUL/AND handling
Allows to determine known zero elements, which particularly helps simplification of DIV/REM by constant patterns
2022-09-19 12:44:43 +01:00
Simon Pilgrim 47cfe71027 [DAG] MatchRotate - reuse existing LHSShiftArg/RHSShiftArg variables. NFC. 2022-09-18 14:35:10 +01:00
Craig Topper 1121eca685 [VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB.
I want to default all VP operations to Expand. These 2 were blocking
because VE doesn't support them and the tests were expecting them
to fail a specific way. Using Expand caused them to fail differently.

Seemed better to emulate them using operations that are supported.

@simoll mentioned on Discord that VE has some expansion downstream. Not
sure if its done like this or in the VE target.

Reviewed By: frasercrmck, efocht

Differential Revision: https://reviews.llvm.org/D133514
2022-09-16 13:19:02 -07:00
Stanislav Mekhanoshin a0c8f5fefa [SDAG] Print divergence in SDNode::dump
If target does not support divergence the field is set to false
and not printed.

Differential Revision: https://reviews.llvm.org/D133984
2022-09-16 11:43:34 -07:00
Juan Manuel MARTINEZ CAAMAÑO e438f2d821 [DAGCombine] Do not fold SRA/SRL of MUL into MULH when MUL's LSB are
used, and MUL_LOHI is available

Folding into a sra(mul) / srl(mul) into a mulh introduces an extra
multiplication to compute the high half of the multiplication,
while it is more profitable to compute the high and lower halfs with a
single mul_lohi.

Differential Revision: https://reviews.llvm.org/D133768
2022-09-16 15:48:36 +00:00
Alexander Timofeev fbdea5a2e9 [AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler.  It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593
2022-09-15 22:03:56 +02:00