llvm-project

Commit Graph

Author	SHA1	Message	Date
chenglin.bi	8c74205642	[SelectionDAG][DAGCombiner] Reuse exist node by reassociate When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122539	2022-06-24 23:15:06 +08:00
Nabeel Omer	0d41794335	[SLP] Add cost model for `llvm.powi.*` intrinsics (REAPPLIED) Patch was reverted in `4c5f10a` due to buildbot failures, now being reapplied with updated AArch64 and RISCV tests. This patch adds handling for the llvm.powi.* intrinsics in BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization. Closes #53887. Differential Revision: https://reviews.llvm.org/D128172	2022-06-24 10:23:19 +00:00
Lian Wang	1ce30457c1	[LegalizeTypes][NFC] Add an assert to WidenVecRes_EXTRACT_SUBVECTOR and adjust some code Reviewed By: craig.topper, david-arm Differential Revision: https://reviews.llvm.org/D128038	2022-06-24 03:06:16 +00:00
Lian Wang	770fe864fe	[SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128239	2022-06-24 02:32:53 +00:00
Craig Topper	8b10ffabae	[RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f. According to the vector spec, mf8 is not supported for i8 if ELEN is 32. Similarily mf4 is not suported for i16/f16 or mf2 for i32/f32. Since RVVBitsPerBlock is 64 and LMUL is calculated as ((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we need to disable any type with MinNumElements==1. For generic IR, these types will now be widened in type legalization. For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan to work on disabling the intrinsics in the riscv_vector.h header. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D128286	2022-06-23 08:49:18 -07:00
chenglin.bi	9c2bf534f5	Revert "[SelectionDAG][DAGCombiner] Reuse exist node by reassociate" This reverts commit `6c951c5ee6`.	2022-06-23 13:21:51 +08:00
Guillaume Chatelet	57ffff6db0	Revert "[NFC] Remove dead code" This reverts commit `8ba2cbff70`.	2022-06-22 14:55:47 +00:00
Guillaume Chatelet	8ba2cbff70	[NFC] Remove dead code	2022-06-22 13:33:58 +00:00
Simon Pilgrim	2c3a4a9334	[DAG] SelectionDAG::GetDemandedBits - don't recurse back into GetDemandedBits Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.	2022-06-22 13:48:57 +01:00
Simon Pilgrim	1c2b756cd6	[DAG] visitTRUNCATE - move TRUNCATE(ADDE/ADDCARRY) folds to switch statement handling the other binops. NFC.	2022-06-21 22:07:41 +01:00
Simon Pilgrim	8cecb6be56	[DAG] Remove SelectionDAG::GetDemandedBits DemandedElts variant. NFC. We're slowly removing SelectionDAG::GetDemandedBits and replacing it with SimplifyMultipleUseDemandedBits, we no longer have any uses for the vector demanded elt variant.	2022-06-21 21:23:10 +01:00
Nabeel Omer	4c5f10aeeb	Revert rGe6ccb57bb3f6b761f2310e97fd6ca99eff42f73e "[SLP] Add cost model for `llvm.powi.*` intrinsics" This reverts commit `e6ccb57bb3`.	2022-06-21 15:05:55 +00:00
Nabeel Omer	e6ccb57bb3	[SLP] Add cost model for `llvm.powi.*` intrinsics This patch adds handling for the llvm.powi.* intrinsics in BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization. Closes #53887. Differential Revision: https://reviews.llvm.org/D128172	2022-06-21 14:40:34 +00:00
Kazu Hirata	7a47ee51a1	[llvm] Don't use Optional::getValue (NFC)	2022-06-20 22:45:45 -07:00
Kazu Hirata	d66cbc565a	Don't use Optional::hasValue (NFC)	2022-06-20 20:26:05 -07:00
Kazu Hirata	0916d96d12	Don't use Optional::hasValue (NFC)	2022-06-20 20:17:57 -07:00
chenglin.bi	6c951c5ee6	[SelectionDAG][DAGCombiner] Reuse exist node by reassociate When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122539	2022-06-21 09:45:19 +08:00
David Green	c0ecbfa4fd	[AArch64] Known bits for AArch64ISD::DUP An AArch64ISD::DUP is just a splat, where the known bits for each lane are the same as the input. This teaches that to computeKnownBitsForTargetNode. Problems arise for constants though, as a constant BUILD_VECTOR can be lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then turn back into a constant BUILD_VECTOR leading to an infinite cycle. This has been prevented by adding a isTargetCanonicalConstantNode node to prevent the conversion back into a BUILD_VECTOR. Differential Revision: https://reviews.llvm.org/D128144	2022-06-20 19:11:57 +01:00
Kazu Hirata	e0e687a615	[llvm] Don't use Optional::hasValue (NFC)	2022-06-20 10:38:12 -07:00
Guillaume Chatelet	03036061c7	[Alignment] Use 'previous()' method instead of scalar division This is in preparation of integration with D128052. Differential Revision: https://reviews.llvm.org/D128169	2022-06-20 11:01:43 +00:00
Simon Pilgrim	e4a124dda5	[DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m) Similar to the existing (shl (srl x, c1), c2) fold Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125836	2022-06-20 08:37:38 +01:00
Lian Wang	ab25e263a9	[SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D127710	2022-06-20 06:30:26 +00:00
Craig Topper	314dbde12c	[DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore. The VT we want to shrink to may not be legal especially after type legalization. Fixes PR56110. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128135	2022-06-19 15:50:15 -07:00
Simon Pilgrim	ba3f2667b6	[DAG] Add MaskedVectorIsZero helper Equivalent to MaskedValueIsZero, except its checking if all of the demanded vectors elements are known to be zero	2022-06-19 17:56:30 +01:00
Simon Pilgrim	1ebe5cac46	[DAG] SimplifyDemandedBits - add DemandedElts handling to ISD::SIGN_EXTEND_INREG simplification	2022-06-19 15:35:29 +01:00
Simon Pilgrim	db1be696c4	[DAG] SimplifyDemandedBits - add ISD::VSELECT handling	2022-06-19 15:18:25 +01:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
Paul Walker	0e21f1d56a	[SelectionDAG] Extend WidenVecOp_INSERT_SUBVECTOR to cover more cases. WidenVecOp_INSERT_SUBVECTOR only supported cases where widening effectively converts the insert into a copy. However, when the widened subvector is no bigger than the vector being inserted into and we can be sure there's no loss of data, we can simply emit another INSERT_SUBVECTOR. Fixes: #54982 Differential Revision: https://reviews.llvm.org/D127508	2022-06-17 12:39:42 +00:00
Lian Wang	f2bcf33058	[LegalizeTypes][NFC] Merge promote SPLAT_VECTOR and promote SCALAR_TO_VECTOR to one function Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D127825	2022-06-17 02:43:52 +00:00
Lian Wang	16215eb979	[LegalizeTypes][RISCV][NFC] Modify assert in PromoteIntRes_STEP_VECTOR and add some tests for RISCV Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D127939	2022-06-17 02:26:09 +00:00
Craig Topper	e6c7a3a54f	[SelectionDAG] Don't apply MinRCSize constraint in InstrEmitter::AddRegisterOperand for IMPLICIT_DEF sources. MinRCSize is 4 and prevents constrainRegClass from changing the register class if the new class has size less than 4. IMPLICIT_DEF gets a unique vreg for each use and will be removed by the ProcessImplicitDef pass before register allocation. I don't think there is any reason to prevent constraining the virtual register to whatever register class the use needs. The attached test case was previously creating a copy of IMPLICIT_DEF because vrm8nov0 has 3 registers in it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128005	2022-06-16 14:55:14 -07:00
Adrian Tong	55311801f0	Allow bitwidth difference when checking for isOneOrOneSplat. This helps handling a case where the BUILD_VECTOR has i16 element type and i32 constant operands t2: v8i16 = setcc t8, t17, setult:ch t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ... t4: v8i16 = and t2, t3 t5: v8i16 = add t8, t4 This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove t3 and t4 from the DAG. Differential Revision: https://reviews.llvm.org/D127354	2022-06-16 16:04:20 +00:00
Benjamin Kramer	8c4a07c61f	[DAGCombiner] Fold fold (fp_to_bf16 (bf16_to_fp op)) -> op	2022-06-15 19:54:39 +02:00
Benjamin Kramer	ca50cb120b	[SelectionDAG] Constant fold FP_TO_BF16 and BF16_TO_FP.	2022-06-15 18:51:32 +02:00
Paul Robinson	654a835c3f	[PS5] Trap after noreturn calls, with special case for stack-check-fail	2022-06-15 09:02:17 -07:00
Benjamin Kramer	fb34d531af	Promote bf16 to f32 when the target doesn't support it This is modeled after the half-precision fp support. Two new nodes are introduced for casting from and to bf16. Since casting from bf16 is a simple operation I opted to always directly lower it to integer arithmetic. The other way round is more complicated if you want to preserve IEEE semantics, so it's handled by a new __truncsfbf2 compiler-rt builtin. This is of course very bare bones, but sufficient to get a semi-softened fadd on x86. Possible future improvements: - Targets with bf16 conversion instructions can now make fp_to_bf16 legal - The software conversion to bf16 can be replaced by a trivial implementation under fast math. Differential Revision: https://reviews.llvm.org/D126953	2022-06-15 12:56:31 +02:00
Simon Pilgrim	f096d5926d	[DAG] Fix SDLoc mismatch in (shl (srl x, c1), c2) -> and(shift(x,c3)) fold Noticed by @craig.topper on D125836 which uses a tweaked copy of the same code. Differential Revision: https://reviews.llvm.org/D127772	2022-06-15 11:07:59 +01:00
Ping Deng	c06f77ec0d	[SelectionDAG] fold 'Op0 - (X * MulC)' to 'Op0 + (X << log2(-MulC))' Reviewed By: craig.topper, spatel Differential Revision: https://reviews.llvm.org/D127474	2022-06-15 05:50:18 +00:00
Guillaume Chatelet	5a293d21fc	[NFC][Alignment] Use getAlign in SelectionDAGBuilder	2022-06-13 15:13:05 +00:00
Nikita Popov	b9a7dea917	[SelectionDAG] Handle trapping aggregate (PR49839) Call canTrap() on Constant to account for trapping ConstantAggregate.	2022-06-13 15:06:53 +02:00
Simon Pilgrim	7d8fd4f5db	[DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere Another issue unearthed by D127115 We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with). D127115 makes this particularly difficult as we're almost guaranteed to have the lost the sequence before all possible insertions have been folded. This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before its too late. Differential Revision: https://reviews.llvm.org/D127595	2022-06-13 11:48:18 +01:00
Simon Pilgrim	1cf9b24da3	[DAG] Enable ISD::FSHL/R SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This helps with several of the regressions from D125836	2022-06-12 19:25:20 +01:00
Simon Pilgrim	54ae4ca755	[DAG] visitSRL - pull out ShiftVT. NFC.	2022-06-12 14:02:23 +01:00
Simon Pilgrim	cf5c63d187	[DAG] visitVECTOR_SHUFFLE - fold splat(insert_vector_elt()) and splat(scalar_to_vector()) to build_vector splats Addresses a number of regressions identified in D127115	2022-06-11 21:06:42 +01:00
Simon Pilgrim	44a0cd25df	[DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling Check if we're just replacing one v1x?? vector with another	2022-06-11 19:30:00 +01:00
Simon Pilgrim	a71ad6a3c8	[DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector() Allow scalar_to_vector nodes to be used for the start of a build_vector creation	2022-06-11 15:29:22 +01:00
Simon Pilgrim	693f4db1ec	[DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR insertion to remove early-out. NFCI. Remove the early-out cases so we can more easily add additional folds in the future.	2022-06-11 12:01:13 +01:00
Paul Walker	10d55c4634	[SelectionDAG] Remove invalid TypeSize conversion from WidenVecOp_BITCAST. Differential Revision: https://reviews.llvm.org/D127322	2022-06-11 10:41:13 +01:00
Guillaume Chatelet	95083fa3b8	[NFC] Remove deadcode	2022-06-10 15:13:42 +00:00
Simon Pilgrim	91adbc3208	[DAG] SimplifyDemandedVectorElts - adding SimplifyMultipleUseDemandedVectorElts handling to ISD::CONCAT_VECTORS Attempt to look through multiple use operands of ISD::CONCAT_VECTORS nodes Another minor improvement for D127115	2022-06-10 16:06:43 +01:00
Guillaume Chatelet	38637ee477	[clang] Add support for __builtin_memset_inline In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`. The idea is to get support from the compiler to easily write efficient memory function implementations. This patch could be split in two: - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics. - and another one for the Clang part providing the instrinsic as a builtin. Differential Revision: https://reviews.llvm.org/D126903	2022-06-10 13:13:59 +00:00
Simon Pilgrim	7dbfcfa735	[DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one. This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.	2022-06-09 14:56:14 +01:00
Guillaume Chatelet	dc3367970e	[SelectionDAG] Handle bzero/memset libcalls globally instead of per target Differential Revision: https://reviews.llvm.org/D127279	2022-06-09 08:34:55 +00:00
Craig Topper	4bcfc41846	[SelectionDAG] Teach computeKnownBits that a nsw self multiply produce a positive value. This matches what we do in IR. For the RISC-V test case, this allows us to use -8 for the AND mask instead of materializing a constant in a register. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D127335	2022-06-08 14:55:58 -07:00
Kai Nacke	d897a14c2e	[SystemZ] Fix check for zero size when lowering memcmp. During lowering of memcmp/bcmp, the check for a size of 0 is done in 2 different ways. In rare cases this can lead to a crash in SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a constant int value which is not yet evaluated. When the value is turned into a SDValue, then the evaluation is done and results in a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special case of 0 length to be handled, which results in an assertion. The fix is to turn the value into a SDValue, so that both functions use the same check. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D126900	2022-06-08 14:52:13 -04:00
Simon Pilgrim	b84c10d4bc	[DAG] visitVSELECT - don't wait for truncation of sub before attempting to match with getTruncatedUSUBSAT Fixes some X86 PSUBUS regressions encountered in D127115 where the truncate was being replaced with a PACKSS/PACKUS before the fold got called again	2022-06-08 16:16:35 +01:00
Paul Walker	d88354213c	[SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST. Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work with TypeSize directly rather than silently casting to unsigned. To accomplish this I've extended TypeSize with an interface that essentially allows TypeSize division when both operands have the same number of dimensions. There still exists combinations of scalable vector bitcasts that cause compiler crashes. I call these out by adding "is missing" entries to sve-bitcast. Depends on D126957. Fixes: #55114 Differential Revision: https://reviews.llvm.org/D127126	2022-06-08 10:30:07 +01:00
Paul Walker	a1121c31d8	[SVE] Fix incorrect code generation for bitcasts of unpacked vector types. Bitcasting between unpacked scalable vector types of different element counts is not a NOP because the live elements are laid out differently. 01234567 e.g. nxv2i32 = XX??XX?? nxv4f16 = X?X?X?X? Differential Revision: https://reviews.llvm.org/D126957	2022-06-08 10:30:07 +01:00
Simon Pilgrim	a083f3caa1	[DAG] combineShuffleOfSplatVal - fold shuffle(splat,undef) -> splat, iff the splat contains no UNDEF elements As noticed on D127115 - we were missing this fold, instead just having the shuffle(shuffle(x,undef,splatmask),undef) fold. We should be able to merge these into one using SelectionDAG::isSplatValue, but we'll need to match the shuffle's undef handling first. This also exposed an issue in SelectionDAG::isSplatValue which was incorrectly propagating the undef mask across a bitcast (it was trying to just bail with a APInt::isSubsetOf if it found any undefs but that was actually the wrong way around so didn't fire for partial undef cases).	2022-06-07 16:42:24 +01:00
Guillaume Chatelet	0788186182	[Alignment][NFC] Remove usage of MemSDNode::getAlignment I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times. Differential Revision: https://reviews.llvm.org/D126910	2022-06-07 13:52:20 +00:00
Nikita Popov	5a64bc207e	[DAGCombiner] Remove overzealous assertion when folding assert+trunc+assert (PR55846) These assert that there are no "useless" assertzext/assertsext nodes (that assert a wider width than a following trunc), but I don't think there is anything preventing such nodes from reaching this code. I don't think the assertion is relevant for correctness of this transform either -- if such an assert is present, then the other one will always be to a smaller width, and we'll pick that one. The assertion dates back to D37017. Fixes https://github.com/llvm/llvm-project/issues/55846. Differential Revision: https://reviews.llvm.org/D126952	2022-06-07 09:50:26 +02:00
Craig Topper	be398100ea	[SelectionDAG] Further improve computeKnownBits for (smax X, C) where C is non-negative. Move the code that was added for D126896 after the normal recursive calls to computeKnownBits. This allows us to calculate trailing zeros. Previously we would break out of the switch before the recursive calls.	2022-06-06 09:59:23 -07:00
Lian Wang	20cf77f776	[LegalizeTypes][VP] Add widen and split support for vp.fptrunc and vp.fpext Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126439	2022-06-06 02:28:01 +00:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
Benjamin Kramer	e8e4b741dd	[DAGCombiner] Add bf16 to the matrix of types that we don't promote to integer stores Remove a few stray semicolons while there.	2022-06-03 13:28:34 +02:00
Nikita Popov	ad742cf85d	[DAGCombine] Handle promotion of shift with both operands the same When promoting a shift, make sure we only fetch the second operand after promoting the first. Load promotion may replace users of the old load, and we don't want to be left with a dangling reference to the old load instruction. The crashing test case is from https://reviews.llvm.org/D126689#3553212. Differential Revision: https://reviews.llvm.org/D126886	2022-06-03 10:00:44 +02:00
Craig Topper	fa20bf1636	[DAGCombiner][RISCV] Improve computeKnownBits for (smax X, C) where C is non-negative. If C is non-negative, the result of the smax must also be non-negative, so all sign bits of the result are 0. This allows DAGCombiner to remove a zext_inreg in the modified test. This zext_inreg started as a sext that became zext before type legalization then was promoted to a zext_inreg. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126896	2022-06-02 12:34:24 -07:00
jacquesguan	5482ae6328	[LegalizeTypes][VP] Add widen and split support for VP FP integer casting op. This patch adds widen and split support for VP_FPTOSI, VP_FPTOUI, VP_SITOFP and VP_UITOFP. Differential Revision: https://reviews.llvm.org/D126847	2022-06-02 09:05:27 +00:00
jacquesguan	058791d8f2	[LegalizeTypes][VP] Add widen and split support for VP_SIGN_EXTEND and VP_ZERO_EXTEND. Differential Revision: https://reviews.llvm.org/D126442	2022-06-02 02:21:22 +00:00
Ping Deng	ae8ae45e2a	[DAGCombine][NFC] Add braces to 'else' to match braced 'if' Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126624	2022-06-01 07:54:05 +00:00
Ping Deng	88af539c0e	[RISCV] Support VP_REDUCE_MUL mask operation Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126520	2022-05-30 03:05:39 +00:00
Ping Deng	083798e270	[LegalizeTypes][VP] Add integer promotion support for vp.fptosi/vp.fptoui Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125760	2022-05-30 03:05:39 +00:00
Ping Deng	121689a62e	[SelectionDAG][NFC] Simplify integer promotion in setcc/vp.setcc Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126516	2022-05-27 05:50:19 +00:00
Craig Topper	460781feef	[LegalizeTypes] Fix bug in expensive checks verification With a fix for an expensive checks build failure exposed by new RISC-V tests. Something about expanding two rotates in type legalization caused a change in the remapping tables that the expensive checks verifying wasn't expecting. See comment in the code for how it was fixed. Tests came from this commit that exposed the bug [RISCV] Add test cases showing failure to remove mask on rotate amounts. If the masking AND has multiple users we fail to remove it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D126036	2022-05-26 13:13:32 -07:00
Simon Pilgrim	f366acdbf6	[DAG] Generalize (sra (trunc (sra x, c1)), c2) -> (trunc (sra x, c1 + c2)) constant folding Remove local (uniform) constant folding and rely on getNode() to perform it Minor cleanup step toward adding non-uniform shift amount support	2022-05-26 14:05:09 +01:00
Simon Pilgrim	7b617eef80	[DAG] Cleanup "and/or of cmp with single bit diff" fold to use ISD::matchBinaryPredicate Prep work as I'm investigating some cases where TLI::convertSetCCLogicToBitwiseLogic should accept vectors.	2022-05-26 12:34:09 +01:00
Lian Wang	8aa6b05deb	[LegalizeTypes][VP] Add widen and split support for VP_TRUNCATE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125950	2022-05-26 02:03:27 +00:00
Paul Walker	6f215ca680	[SelectionDAG] Add support to widen ISD::STEP_VECTOR operations. Fixes: #55165 Differential Revision: https://reviews.llvm.org/D126168	2022-05-24 22:42:37 +01:00
Serge Pavlov	6fc0bc5b0f	Fix behavior of is_fp_class on empty class set The second argument to is_fp_class specifies the set of floating-point class to test against. It can be zero, in this case the intrinsic is expected to return zero value. Differential Revision: https://reviews.llvm.org/D112025	2022-05-24 21:50:18 +07:00
Simon Pilgrim	11455e4758	[DAG] Unroll vectorized FPOW instructions before widening that will scalarize to libcalls anyway Followup to D125988 - FPOW is similar to FREM and will most likely scalarize to libcalls, so unroll before widening to prevent use making additional libcalls with UNDEF args.	2022-05-24 15:44:53 +01:00
Nabeel Omer	8b5d9cbbfe	[x86][DAG] Unroll vectorized FREMs that will become libcalls Currently, two element vectors produced as the result of a binary op are widened to four element vectors on x86 by DAGTypeLegalizer::WidenVecRes_BinaryCanTrap. If the op still isn't legal after widening it is unrolled into 4 scalar ops in SelectionDAG before being converted into a libcall. This way we end up with 4 libcalls (two of them on known undef elements) instead of the original two libcalls. This patch modifies DAGTypeLegalizer::WidenVectorResult to ensure that if it is known that a binary op will be tunred into a libcall, it is unrolled instead of being widened. This prevents the creation of the extra scalar instructions on known undef elements and (eventually) libacalls with known undef parameters which would otherwise be created when the op gets expanded post widening. Differential Revision: https://reviews.llvm.org/D125988	2022-05-24 13:34:51 +01:00
Fraser Cormack	7f7ef0ed61	[LegalizeTypes][NFC] Fix node name in assertion message This was probably copy/pasted from the MSCATTER widening.	2022-05-24 09:16:18 +01:00
Lian Wang	be84f91f87	[LegalizeTypes][VP] Fix OpNo in WidenVecOp_VP_SCATTER Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126276	2022-05-24 07:14:46 +00:00
Craig Topper	569d8945f3	[DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2. If the VT is i2, then 2 is really -2. Test has not been commited yet, but diff shows the change. Fixes PR55644. Differential Revision: https://reviews.llvm.org/D126213	2022-05-23 11:13:57 -07:00
Craig Topper	c11051a400	[SelectionDAG] Add a freeze to ISD::ABS expansion. I had initially assumed this was the problem with https://github.com/llvm/llvm-project/issues/55271#issuecomment-1133426243 But it turns out that was a simpler issue. This patch is still more correct than what we were doing before so figured I'd submit it anyway. No test case because I'm not sure how to get an undef around until expansion. Looking at the test deltas I wonder if it be valid to combine (sext_inreg (freeze (aextload X))) -> (freeze (sextload X)). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D126175	2022-05-22 14:29:58 -07:00
Craig Topper	768a1ca5ec	[SelectionDAG] Fold abs(undef) to 0 instead of undef. abs should only produce a positive value or the signed minimum value. This means we can't fold abs(undef) to undef as that would allow more values. Fold to 0 instead to match InstSimplify. Fixes test mentioned in comment on pr55271. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126174	2022-05-22 12:47:32 -07:00
Paul Walker	258dac43d6	[SVE] Enable use of 32bit gather/scatter indices for fixed length vectors Differential Revision: https://reviews.llvm.org/D125193	2022-05-22 12:32:30 +01:00
Ping Deng	0e8ac3a797	[LegalizeTypes][VP] Add integer promotion support for vp.sitofp/vp.uitofp Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125960	2022-05-22 02:13:45 +00:00
Craig Topper	003b95acf2	[LegalizeTypes] Remove double map lookup in DAGTypeLegalizer::PerformExpensiveChecks. NFC Remove repeated checks for ResId being 0.	2022-05-21 00:06:59 -07:00
Craig Topper	66875dbcc0	[LegalizeTypes] Use SmallDenseMap::count instead of SmallDenseMap::find. NFC It's more readable and more efficient.	2022-05-21 00:06:55 -07:00
Jay Foad	6bec3e9303	[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op). Now that the standard zext, sext and trunc allow this, there is no reason to use the OrSelf versions. The OrSelf versions additionally have the strange behaviour of allowing extending to a smaller width, or truncating to a larger width, which are also treated as no-ops. A small amount of client code relied on this (ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and needed rewriting. Differential Revision: https://reviews.llvm.org/D125557	2022-05-19 11:23:13 +01:00
Lian Wang	530bab1f93	[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation Re-landed D125206 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125206	2022-05-19 09:53:33 +00:00
Lian Wang	f035068bb3	[LegalizeVectorTypes][VP] Add widen and split support for VP_SETCC Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D125446	2022-05-19 07:42:39 +00:00
Lian Wang	bbc6834e26	[LegalizeTypes][VP] Add integer promotions support for VP_TRUNCATE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125739	2022-05-19 07:36:10 +00:00
Lian Wang	993070d11f	[LegalizeTypes][VP][NFC] Use an if and two returns instead of ?: operator Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125858	2022-05-19 07:18:24 +00:00
Craig Topper	46eef76876	[DAGCombiner] Fix bug in MatchBSwapHWordLow. This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16. If the SRL isn't masked and the high bits aren't demanded, we still need to ensure that bits 23:16 are zero. After the right shift they will be in bits 15:8 which is where the important bits from the SHL end up. It's only a bswap if the OR on bits 15:8 only takes the bits from the SHL. Fixes PR55484. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125641	2022-05-18 09:23:18 -07:00
Yeting Kuo	00999fb6e1	[SelectionDAGBuilder] Pass fast math flags to most of VP SDNodes. The patch does not pass math flags to float VPCmpIntrinsics because LLParser could not identify float VPCmpIntrinsics as FPMathOperators. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125600	2022-05-18 16:15:47 +08:00
Simon Pilgrim	d40b7f0d5a	[DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node. AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside a AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback. Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125607	2022-05-17 13:40:11 +01:00
jacquesguan	26593e7314	[SelectionDAG] Support more VP reduction mask operation. This patch uses VP_REDUCE_AND and VP_REDUCE_OR to replace VP_REDUCE_SMAX,VP_REDUCE_SMIN,VP_REDUCE_UMAX and VP_REDUCE_UMIN for mask vector type. Differential Revision: https://reviews.llvm.org/D125002	2022-05-17 09:14:21 +00:00
Paul Walker	7dd05ba9ed	[SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes. During early gather/scatter enablement two different approaches were taken to represent scaled indices: * A Scale operand whereby byte_offsets = Index * Scale * An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType) Having multiple representations is bad as shown by this patch which fixes instances where the two are out of sync. The dedicated scale operand is more flexible and pervasive so this patch removes the UNSCALED values from IndexType. This means all indices are scaled but the scale can be one, hence unscaled. SDNodes now use the scale operand to answer the "isScaledIndex" question. I toyed with the idea of keeping the UNSCALED enums and helper functions but because they will have no uses and force SDNodes to validate the set of supported values I figured it's best to remove them. We can re-add them if there's a real need. For similar reasons I've kept the IndexType enum when a bool could be used as I think being explicitly looks better. Depends On D123347 Differential Revision: https://reviews.llvm.org/D123381	2022-05-16 20:47:52 +01:00
Craig Topper	1c4880a2d3	[TargetLowering] Expand the last stage of i16 popcnt using shift+add+and instead of mul+shift. If we use multiply it would be with 0x0101 which is 1 more than a power of 2. On some targets we would expand this to shl+add. By avoiding the multiply earlier, we can generate better code. Note, PowerPC doesn't do the shl+add expansion of multiply so one of the tests increased in instruction count. Limiting to scalars because it almost always increased the number of instructions in vector tests. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125638	2022-05-16 09:27:44 -07:00
Craig Topper	e6fc8454be	[DAGCombiner] Fix incorrect indentation. NFC	2022-05-16 09:27:15 -07:00
Bradley Smith	7ff5148d64	[DAGCombine] Support splat_vector nodes in (and (extload)) dagcombine Differential Revision: https://reviews.llvm.org/D125367	2022-05-16 11:25:20 +00:00
Yeting Kuo	26a61ab678	[SelectionDAG] Make getNode which uses single element SDVTList pass SDNodeFlags. With this patch, users no longer need to know that getNode with an SDNodeFlags argument may not pass its flags. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125659	2022-05-16 18:19:46 +08:00
Denis Antrushin	8903dbef8f	[StatepointLowering] Properly handle local and non-local relocates of the same value. FunctionLoweringInfo::StatepointRelocationMaps map is used to pass GC pointer lowering information from statepoint to gc.relocate which may appear in a different block. D124444 introduced different lowering for local and non-local relocates. Local relocates use SDValue and non-local relocates use value exported to VReg. But I overlooked the fact that StatepointRelocationMap is indexed not by GCRelocate instruction, but by derived pointer. This works incorrectly when we have two relocates (one local and another non-local) of the same value, because they need different relocation records. This patch fixes the problem by recording relocation information per relocate instruction, not per derived pointer. This way, each gc.relocate can be lowered differently. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D125538	2022-05-16 17:02:34 +07:00
Nikita Popov	05c3fe075d	[FastISel] Fix load folding for registers with fixups FastISel tries to fold loads into the single using instruction. However, if the register has fixups, then there may be additional uses through an alias of the register. In particular, this fixes the problem reported at https://reviews.llvm.org/D119432#3507087. The load register is (at the time of load folding) only used in a single call instruction. However, selection of the bitcast has added a fixup between the load register and the cross-BB register of the bitcast result. After fixups are applied, there would now be two uses of the load register, so load folding is not legal. Differential Revision: https://reviews.llvm.org/D125459	2022-05-16 10:25:25 +02:00
Craig Topper	b4ad450953	[TargetLowering] expandCTPOP don't create an unused constant mask for i8 ctpop. NFC Use early out for the i8 case. I'm looking at avoiding MUL on targets that use libcalls for MUL. So doing a little pre-refactoring.	2022-05-14 20:35:38 -07:00
Simon Pilgrim	f4eac6e5f6	[DAG] visitOR - merge isa/cast<ShuffleVectorSDNode> into dyn_cast<ShuffleVectorSDNode>. NFC. Also, initialize entire mask to -1 to simplify undefined cases.	2022-05-14 20:49:26 +01:00
Simon Pilgrim	95cdd63b87	[DAG] visitADDLike - use SelectionDAG::FoldConstantArithmetic directly to match constant operands SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.	2022-05-14 18:39:41 +01:00
Simon Pilgrim	8db72d9d04	[DAG] visitMUL - pull out repeated SDLoc() calls. NFC.	2022-05-14 14:28:39 +01:00
Simon Pilgrim	8d4d4988e4	[DAG] Use SelectionDAG::FoldConstantArithmetic directly to match constant operands SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.	2022-05-14 14:19:12 +01:00
Simon Pilgrim	1ecc3d86ae	[DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits Pulled out of D77804 as it's going to be easier to address the regressions individually. This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. The lost RISCV gorc2 fold shouldn't be a problem - instcombine would have already destroyed that pattern - see https://github.com/llvm/llvm-project/issues/50553 Differential Revision: https://reviews.llvm.org/D124839	2022-05-14 09:50:01 +01:00
Simon Pilgrim	3fc33ced10	DAGCombiner.cpp - break if-else chains that always return (style)	2022-05-13 18:31:39 +01:00
Sanjay Patel	e52e1dab2a	[SDAG] freeze operand when expanding urem This is a potential miscompile as discussed in issue #55291. The related IR transform was patched with: `d428f09b2c`	2022-05-13 10:55:14 -04:00
Lian Wang	693758b282	[LegalizeTypes][VP] Add integer promotion support for vp.setcc Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125453	2022-05-13 06:25:13 +00:00
Lian Wang	8050ba6678	[LegalizeTypes][VP] Add integer promotion support for vp.merge Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125452	2022-05-13 03:28:29 +00:00
Nikita Popov	50f846d634	[FastISel] Add some debug output (NFC) Print a debug message when aborting isel (next to the ORE report) and when folding a load.	2022-05-12 12:25:20 +02:00
Lian Wang	9176096c86	[LegalizeVectorTypes] Enable WidenVecRes_SETCC work for scalable vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125359	2022-05-12 02:52:43 +00:00
Xiang1 Zhang	2ea8f203cd	[CodeGen] Fix ConvertNodeToLibcall for STRICT_FPOWI Reviewed By: PengfeiWang Differential Revision: https://reviews.llvm.org/D125159	2022-05-11 08:58:06 +08:00
Lian Wang	f14a1f26ad	Revert "[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation" This patch makes CodeGen/test/AArch64/vecreduce-add-legalization.ll fail. This reverts commit `17a8a1bb71`.	2022-05-10 09:25:25 +00:00
Lian Wang	17a8a1bb71	[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125206	2022-05-10 08:52:48 +00:00
David Green	2cfb243bcd	[DAG] Use isAnyConstantBuildVector. NFC As suggested from `02f8519502`, this uses the isAnyConstantBuildVector method in lieu of separate isBuildVectorOfConstantSDNodes calls. It should otherwise be an NFC.	2022-05-09 14:13:03 +01:00
David Green	02f8519502	[DAG] Prevent infinite loop combining bitcast shuffle This prevents an infinite loop from D123801, where code trying to reduce the total number of bitcasts, but also handling constants, could create the opposite transform. Prevent the transform in these cases to let the bitcast of a constant transform naturally. Fixes #55345	2022-05-09 09:36:22 +01:00
Simon Pilgrim	800d36cf32	[DAG] Only perform the fold (A-B)+(C-D) --> (A+C)-(B+D) when both inner subs have one use Fixes #51381	2022-05-08 13:51:58 +01:00
Craig Topper	b81bf7bb2f	[LegalizeTypes] Make use of SelectionDAG::getShiftAmountConstant. NFC Instead of calling getShiftAmountTy and getConstant separately.	2022-05-07 12:16:53 -07:00
Craig Topper	00bfaba997	[LegalizeTypes] Don't assume fshl/fshr shift amount type matches the other operands. Like other shifts, the type isn't required to match. We shouldn't assume we can call ZExtPromotedInteger. I tested the PromoteIntOp_FunnelShift locally by removing the promotion of the shift amount from PromoteIntRes_FunnelShift. But with the final version of this patch it is never executed on any tests. Differential Revision: https://reviews.llvm.org/D125106	2022-05-07 11:44:07 -07:00
Amaury Séchet	06fad8bc05	[DAGCombine] Add node in the worklist in topological order in CombineTo This is part of an ongoing effort toward making DAGCombine process the nodes in topological order. This is able to discover a couple of new optimizations, but also causes a couple of regressions. I nevertheless chose to submit this patch for review so as to start the discussion with people working on the backend so we can find a good way forward. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124743	2022-05-07 16:24:31 +00:00
Paul Walker	702c4ade22	[ISD::IndexType] Helper functions for common queries. Add helper functions to query the signed and scaled properties of ISD::IndexType along with functions to change them. Remove setIndexType from MaskedGatherSDNode because it only has one usage and typically should only be changed alongside its index operand. Minimise the direct use of the enum values to lay the groundwork for more refactoring. Differential Revision: https://reviews.llvm.org/D123347	2022-05-07 11:23:42 +01:00
David Green	5930691ee1	Revert "[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific" This reverts commit `891c3cf99e` as it turns out that the error was not caused by this commit; the error came from D124526 instead.	2022-05-06 21:03:22 +01:00
David Green	891c3cf99e	[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific Something is going wrong with the BigEndian PowerPC bot. It is hard to tell what is wrong from here, but attempt to fix it by disabling the combineShuffleOfBitcast combine for bigendian.	2022-05-06 18:42:44 +01:00
Craig Topper	76f90a9d71	[SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift. Otherwise we have garbage in the upper bits that can affect the results of the UREM. Fixes PR55296. Differential Revision: https://reviews.llvm.org/D125076	2022-05-06 09:26:30 -07:00
Simon Pilgrim	c0bebc12f0	[DAG] visitREM - merge buildOptimizedSREM into if(). NFCI.	2022-05-06 15:39:17 +01:00
David Green	115c188807	[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')) If the mask is made up of elements that form a mask in the higher type we can convert shuffle(bitcast into the bitcast type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests. The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general. Differential Revision: https://reviews.llvm.org/D123801	2022-05-06 10:50:31 +01:00
Lian Wang	fb0d636f28	[RISCV][SelectionDAG] Support VP_REDUCE_ADD mask operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124986	2022-05-06 01:49:21 +00:00
Craig Topper	5140e0d219	[SelectionDAGISel] Add back a comment to MergeInputChains handling. NFC This comment used to exist, but was lost in a refactor over 10 years ago, but still seems relevant and improves readability.	2022-05-05 12:59:21 -07:00
Craig Topper	084f967370	[SelectionDAG] Constant fold (sext_inreg undef, VT) to 0 instead of undef. The result of sign_extend_inreg needs to have as many sign bits as requested by the VT argument. The easiest way to guarantee this is to fold it to 0. SystemZ test was modified to avoid using undef. Fixes https://github.com/llvm/llvm-project/issues/55178 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124696	2022-05-05 09:45:35 -07:00
Craig Topper	4e2d1a6c18	[DAGCombiner] Fold (sext/zext undef) -> 0 and aext(undef) -> undef. Differential Revision: https://reviews.llvm.org/D124988	2022-05-05 09:34:18 -07:00
Craig Topper	fd13192aa5	[DAGCombiner] Fold (max/min X, X) -> X. Differential Revision: https://reviews.llvm.org/D124951	2022-05-05 09:34:17 -07:00
Nikita Popov	9678936f18	[DAGCombine] Fold (X & ~Y) \| Y with truncated not This extends the (X & ~Y) \| Y to X \| Y fold to also work if ~Y is a truncated not (when taking into account the mask X). This is done by exporting the infrastructure added in D124856 and reusing it here. I've retained the old value of AllowUndefs=false, though probably this can be switched to true with extra test coverage. Differential Revision: https://reviews.llvm.org/D124930	2022-05-05 11:10:11 +02:00
Craig Topper	572dfef1db	[SelectionDAG] Use llvm::any_of to simplify a loop. NFC	2022-05-04 19:09:06 -07:00
Nikita Popov	451bc723ae	[SDAG] Handle truncated not in haveNoCommonBitsSet() Demanded bits analysis may replace a full-width not with an any_extend (not (truncate X)) pattern. This patch looks through this kind of pattern in haveNoCommonBitsSet(). Of course, we can only do this if we only need negated bits in the non-extended part, as the other bits may now be arbitrary. For example, if we have haveNoCommonBitsSet(~X & Y, X) then ~X only needs to actually negate bits set in Y. This is only a partial solution to the problem in that it allows add -> or conversion, but the resulting or doesn't get folded yet. (I guess that will involve exposing getBitwiseNotOperand() as a more general helper and using that in the relevant transform.) Differential Revision: https://reviews.llvm.org/D124856	2022-05-04 15:30:44 +02:00
serge-sans-paille	7030654296	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `fa5a4e1b95` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D124847	2022-05-04 08:32:38 +02:00
Simon Pilgrim	faa35fc873	[DAG] Fix issue with rot(rot(x,c1),c2) -> rot(x,c1+c2) fold with unnormalized rotation amounts Don't assume the rotation amounts have been correctly normalized - do it as part of the constant folding. Also, the normalization should be performed with UREM not SREM.	2022-05-03 17:16:26 +01:00
Nikita Popov	2171a896ed	[SDAG] Handle A and B&~A in haveNoCommonBitsSet() This is the DAG variant of D124763. The code already handles the general pattern, but not this degenerate case. This allows folding A + (B&~A) to A \| (B&~A) which further folds to A \| B. Handling on the SDAG level is needed because in the motivating case the add is actually a getelementptr, which only gets converted into an add on the SDAG level. However, this patch is not quite sufficient to handle the getelementptr case yet, because of an interfering demanded bits simplification. Differential Revision: https://reviews.llvm.org/D124772	2022-05-03 15:47:02 +02:00
Nikita Popov	e0892614b1	[SDAG] Extract commutative helper from haveNoCommonBitsSet() (NFC) To make it easier to add additional patterns, which will generally want to handle commuted top-level operands.	2022-05-03 12:28:35 +02:00
Hsiangkai Wang	eaaa31ff2c	[RISCV][TargetLowering] Special case overflow expansion for (uaddo X, C). Follow-up to D122933. Differential Revision: https://reviews.llvm.org/D124374	2022-05-03 03:51:36 +00:00
Craig Topper	5f057eaa0d	[DAGCombiner] reassociationCanBreakAddressingModePattern should check uses of the outer add. When looking for memory uses, reassociationCanBreakAddressingModePattern should check uses of the outer ADD rather than the inner ADD. We want to know if the two ops we're reassociating are used by a load/store. In practice, the existing check usually works because CodeGenPrepare will make one of the load/stores have an offset of 0 relative to split GEP. That will make the inner add have a memory use. To test this, I've manually split the GEPs so there is no 0 offset store. This issue was recently discussed in the original review D60294. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D124644	2022-05-02 16:38:53 -07:00
Sanjay Patel	747c6a0c73	[SDAG] fix miscompile when casting int->FP->int This is the codegen equivalent of D124692. As shown in https://github.com/llvm/llvm-project/issues/55150 - the existing fold may be wrong when converting to a signed value. This is a quick fix to avoid the miscompile. https://alive2.llvm.org/ce/z/KtaDmd Differential Revision: https://reviews.llvm.org/D124771	2022-05-02 14:57:27 -04:00
Simon Pilgrim	ae8b10e543	[DAG] (style) Break apart if-else chain as they all return	2022-05-01 17:56:59 +01:00
Paul Walker	f10a8f6752	[LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG SIGN_EXTEND_INREG expansion can trigger a TypeSize error because "VT.getSizeInBits() == 1" is used to detect for a boolean without first verifying VT is a scalar.	2022-04-30 19:21:48 +01:00
Craig Topper	6affe87bda	[DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask. We try to match as a disguised rotate by constant of these forms (shl (X \| Y), C1) \| (srl X, C2) --> (rotl X, C1) \| (shl Y, C1) (shl X, C1) \| (srl (X \| Y), C2) --> (rotl X, C1) \| (srl Y, C2) We may have also looked through an AND to find the shift. If we did, we need to apply a mask to the result. I'll add an AArch64 test and pre-commit it and the RISC-V test tomorrow. Fixes PR55201. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124711	2022-04-30 11:02:30 -07:00
Paul Walker	23c509754d	[DAGCombiner] Stop invalid sign conversion in refineIndexType. When looking through extends of gather/scatter indices it's safe to convert a known positive signed index to unsigned, but unsigned indices must remain unsigned. Depends On D123318 Differential Revision: https://reviews.llvm.org/D123326	2022-04-29 14:20:13 +01:00
Nikita Popov	027c728f29	[SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize This is an alternative to D124530. In getUniformBase() only create scales that match the gather/scatter element size. If targets also support other scales, then they can produce those scales in target DAG combines. This is what X86 already does (as long as the resulting scale would be 1, 2, 4 or 8). This essentially restores the pre-opaque-pointer state of things. Fixes https://github.com/llvm/llvm-project/issues/55021. Differential Revision: https://reviews.llvm.org/D124605	2022-04-29 14:57:53 +02:00
Paul Walker	7a0b897e86	[DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling refineUniformBase and selectGatherScatterAddrMode both attempt the transformation: base(0) + index(A+splat(B)) => base(B) + index(A) However, this is only safe when index is not implicitly scaled. Differential Revision: https://reviews.llvm.org/D123222	2022-04-29 12:35:16 +01:00
Serge Pavlov	9fc58f1820	[PowerPC] Support of ppc_fp128 in lowering of llvm.is_fpclass PowerPC supports `ppc_fp128`, which is not an IEEE floating point type. The generic lowering of llvm.is_fpclass could not handle it properly. This change extends the generic lowering code to support `ppc_fp128`. The change was tested on emulator using runtime tests from https://reviews.llvm.org/D112933 and the patch for clang https://reviews.llvm.org/D112932. Differential Revision: https://reviews.llvm.org/D113908	2022-04-29 11:10:47 +07:00
Alexey Bataev	75e1cf4a6a	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-28 10:04:41 -07:00
Bjorn Pettersson	3a39bb96ca	[SelectionDAG] Use correct boolean representation in FoldConstantArithmetic The description of SETCC says /// SetCC operator - This evaluates to a true value iff the condition is /// true. If the result value type is not i1 then the high bits conform /// to getBooleanContents. Without this patch, we sign extended the i1 to the used larger type regardless of getBooleanContents. This resulted in miscompiles, as shown in the attached testcase that ended up returning -1 instead of 1 when using -mattr=+v. Fixes https://github.com/llvm/llvm-project/issues/55168 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124618	2022-04-28 18:42:16 +02:00
Alexey Bataev	9861ca0c23	Revert "[COST]Improve cost model for shuffles in SLP." This reverts commit `29a470e380` to fix a crash reported in https://reviews.llvm.org/D100486#3479989.	2022-04-28 08:11:56 -07:00
Alexey Bataev	29a470e380	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-27 10:56:26 -07:00
Denis Antrushin	4059770af5	[StatepointLowering] Only export STATEPOINT results if used in nonlocal blocks. Currently we always export STATEPOINT results (GC pointers lowered via VRegs) to virtual registers. When processing gc.relocate instructions we have to generate CopyFromRegs node and then export it to VReg again if gc.relocate is used in other basic blocks. This results in generation of an extra COPY MIR instruction if statepoint and its gc.relocate are in the same BB, but gc.relocate result is used in other blocks. This patch changes this behavior to export statepoint results only if used in other basic blocks. For local uses StatepointLoweringState.(get\|set)Location() API is used to communicate appropriate statepoint result from `LowerStatepoint()` to `visitGCRelocate()`. This is NFC and is purely a compile-time optimization. On big methods it can improve codegen compile time by up to 10%. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D124444	2022-04-27 15:53:24 +03:00
Serge Pavlov	170a903144	Intrinsic for checking floating point class This change introduces a new intrinsic, `llvm.is.fpclass`, which checks if the provided floating-point number belongs to any of the specified value classes. The intrinsic implements the checks made by C standard library functions `isnan`, `isinf`, `isfinite`, `isnormal`, `issubnormal`, `issignaling` and corresponding IEEE-754 operations. The primary motivation for this intrinsic is the support of strict FP mode. In this mode using compare instructions or other FP operations is not possible, because if the value is a signaling NaN, floating-point exception `Invalid` is raised, but the aforementioned functions must never raise exceptions. Currently there are two solutions for this problem, both are implemented partially. One of them is using integer operations to implement the check. It was implemented in https://reviews.llvm.org/D95948 for `isnan`. It solves the problem of exceptions, but offers one solution for all targets, although some can do the check in a more efficient way. The other, implemented in https://reviews.llvm.org/D96568, introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects a target specific code into IR to implement `isnan` and some other functions. It is convenient for targets that have a dedicated instruction to determine FP data class. However using target-specific intrinsic complicates analysis and can prevent some optimizations. A special intrinsic for value class checks allows representing data class tests with enough flexibility. During IR transformations it represents the check in a target-independent way and saves it from undesired transformations. In the instruction selector it allows efficient lowering depending on the used target and mode. This implementation is an extended variant of `llvm.isnan` introduced in https://reviews.llvm.org/D104854. It is limited to minimal intrinsic support. Target-specific treatment will be implemented in separate patches. Differential Revision: https://reviews.llvm.org/D112025	2022-04-26 13:09:16 +07:00
Lian Wang	9980148305	[RISCV][SelectionDAG] Support VP_ADD/VP_MUL/VP_SUB mask operations Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124144	2022-04-26 02:30:22 +00:00
David Green	9727c77d58	[NFC] Rename Instrinsic to Intrinsic	2022-04-25 18:13:23 +01:00
Simon Pilgrim	34e7243464	[DAG] Fold freeze(bitcast(x)) -> bitcast(freeze(x)) This is a very specific fold to fix an upstream poor codegen issue. InstCombine has the much more flexible pushFreezeToPreventPoisonFromPropagating but I don't think we're quite there with DAG/TLI handling for canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison value tracking yet. Fixes #54911 Differential Revision: https://reviews.llvm.org/D124185	2022-04-22 16:39:25 +01:00
Alexey Bataev	2cca53c815	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register representation into account. We can build more correct representation of register shuffles, improve number of recognised buildvector sequences. Also, same function can be used to improve the cost model for the shuffles in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 09:37:16 -07:00
Alexey Bataev	5f7ac15912	Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer." This reverts commit `2f49163b33` to fix a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284	2022-04-20 06:35:55 -07:00
Alexey Bataev	2f49163b33	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register representation into account. We can build more correct representation of register shuffles, improve number of recognised buildvector sequences. Also, same function can be used to improve the cost model for the shuffles in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 05:32:56 -07:00
Matt Arsenault	8591328e15	Intrinsics: Mark llvm.eh.sjlj.callsite argument as immarg The assert in SelectionDAG implies that it is	2022-04-19 21:04:33 -04:00
chenglin.bi	222adf338a	[Arch64][SelectionDAG] Add target-specific implementation of srem 1. Expanding X%C to the equivalent X-X/C*C is not always the fastest path if no paired SDIV exists, so first check whether the target has a faster lowering for srem alone. 2. Add AArch64 faster path for SREM only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-19 02:49:42 +08:00
chenglin.bi	acfc025a72	Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem" This reverts commit `9d9eddd3dd`.	2022-04-18 10:35:09 +08:00
chenglin.bi	9d9eddd3dd	[Arch64][SelectionDAG] Add target-specific implementation of srem Expanding X%C to the equivalent X-X/C*C is not always the fastest path if no paired SDIV exists, so first check whether the target has a faster lowering for srem alone. Add AArch64 faster path for SREM only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-16 12:29:11 +08:00
Craig Topper	c6dc229a6d	[DAGCombiner] Move call to hasOneUse after opcode checks. NFC Checking the opcode is cheap, counting the number of uses is not.	2022-04-15 17:02:16 -07:00
Craig Topper	a7b9d75e7a	[DAGCombiner] Move or/xor/and opcode check in ReduceLoadOpStoreWidth before hasOneUse check. hasOneUse is not cheap on nodes with chain results that might have many uses. By checking the opcode first, we can avoid a costly walk of the use list on nodes we aren't interested in. Found by investigating calls to hasNUsesOfValue from the example provided in D123857.	2022-04-15 16:38:27 -07:00
John Brawn	12c1022679	[AArch64] Lowering and legalization of strict FP16 Strict FP16 needs some changes in lowering and legalization to work correctly: * SelectionDAGLegalize::PromoteNode was missing handling for some strict fp opcodes. * Some of the custom lowering of strict fp operations needed to be adjusted to work with FP16. * Custom lowering needed to be added for round-to-int operations. With this, and the previous patches for the rest of the strict fp isel, we can set IsStrictFPEnabled = true. Differential Revision: https://reviews.llvm.org/D115620	2022-04-14 16:51:22 +01:00
Paul Walker	0c44115e51	[SVE] Add support for non-element-type sized scaling when lowering MGATHER/MSCATTER. The lowering code did not use the scale operand of MGATHER/MSCATTER nodes, but instead assumed scaled indices were always scaled based on the element type of the memory type. This patch adds the missing support by rewriting the nodes as unscaled variants. Differential Revision: https://reviews.llvm.org/D123670	2022-04-14 11:54:46 +01:00
Simon Pilgrim	fef221bf1f	[DAG] Enable SimplifyVBinOp folds on add/sub sat intrinsics	2022-04-13 12:53:23 +01:00
Jonas Paulsson	46f83caebc	[InlineAsm] Add support for address operands ("p"). This patch adds support for inline assembly address operands using the "p" constraint on X86 and SystemZ. This was in fact broken on X86 (see example at https://reviews.llvm.org/D110267, Nov 23). These operands should probably be treated the same as memory operands by CodeGenPrepare, which have been commented with "TODO" there. Review: Xiang Zhang and Ulrich Weigand Differential Revision: https://reviews.llvm.org/D122220	2022-04-13 12:50:21 +02:00
Simon Pilgrim	cfb3ee2185	[DAG] Add non-uniform vector support to (shl (srl x, c1), c2) -> (and (shift x, c3)) Another part of D77804 yak shaving Differential Revision: https://reviews.llvm.org/D123523	2022-04-13 11:37:33 +01:00
Simon Pilgrim	bc32a1dd76	[DAG] Add non-uniform vector support to (shl (sr[la] exact X, C1), C2) folds	2022-04-12 12:57:56 +01:00
Craig Topper	35be4a7af3	[SelectionDAG] Remove unnecessary null check after call to getNode. NFC As far as I know getNode will never return a null SDValue. I'm guessing this was modeled after the FoldConstantArithmetic call earlier. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123550	2022-04-11 18:03:44 -07:00
Craig Topper	2ce2562876	[RISCV][SelectionDAG] Add a hook to sign extend i32 ConstantInt operands of phis on RV64. Materializing constants on RISCV is simpler if the constant is sign extended from i32. By default i32 constant operands of phis are zero extended. This patch adds a hook to allow RISCV to override this for i32. We have an existing isSExtCheaperThanZExt, but it operates on EVT which we don't have at these places in the code. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122951	2022-04-11 14:38:39 -07:00
Craig Topper	28cb508195	[TargetLowering][RISCV] Allow truncation when checking if the arguments of a setcc are splats. We're just trying to canonicalize here and won't be using the constant value returned. The attached test changes are because we were previously commuting a seteq X, (splat_vector 0) because we also have (sub 0, X). The 0 is larger than the element type so we don't detect it as a splat without the AllowTruncation flag. By preventing the commute we are able to match it to the vmseq.vx instruction during isel. We only look for constants on the RHS in isel. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D123256	2022-04-11 09:49:36 -07:00
Sanjay Patel	2ed15984b4	[SDAG] try to reduce compare of funnel shift equal 0 fshl (or X, Y), X, C ==/!= 0 --> or (shl Y, C), X ==/!= 0 fshl X, (or X, Y), C ==/!= 0 --> or (srl Y, BW-C), X ==/!= 0 This is similar to an existing setcc-of-rotate fold, but the matching requires more checks for the more general funnel op: https://alive2.llvm.org/ce/z/Ab2jDd We are effectively decomposing the funnel shift into logical shifts, reassociating, and removing a shift. This should get us the final improvements for x86-64 that were originally shown in D111530 ( https://github.com/llvm/llvm-project/issues/49541 ); x86-32 still shows some SHLD/SHRD, so the pattern is not matching there yet. Differential Revision: https://reviews.llvm.org/D122919	2022-04-11 07:44:58 -04:00
Tim Northover	6c85668d28	Tail calls: look through AssertZExt to find register copy. arm64_32 guarantees the high 32 bits of pointer parameters are passed as 0, and this is modelled in the IR by inserting an AssertZExt after the CopyFromReg. The function deciding whether registers that need to be preserved actually are wasn't expecting this so it banned perfectly legitimate tail calls.	2022-04-11 12:24:47 +01:00
Fraser Cormack	18106b99f0	[VP] Explicitly map from VP intrinsic to ISD opcode This patch aims to overcome an issue in these mappings where, when an ISD node was registered with BEGIN_REGISTER_VP_SDNODE but outside the scope of a pair of BEGIN_REGISTER_VP_INTRINSIC/END_REGISTER_VP_INTRINSIC macros, the switch cases fell apart. This in particular happened with VP_SETCC, where we'd end up with something along the lines of: case Intrinsic::vp_fcmp: break; case Intrinsic::vp_icmp: break; ResOpc = ISD::VP_SETCC; case Intrinsic::vp_store: ... To remedy this, we introduce a special-purpose mapping macro which can map any number of VP intrinsic opcodes to an ISD opcode. As a result, we no longer need to special-case the mapping from vp.icmp and vp.fcmp to VP_SETCC, as the new helper macro does it for us. Thanks to @craig.topper for noticing this and to @rogfer01 for the idea. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D123324	2022-04-08 12:30:22 +01:00
Fraser Cormack	8216255c9f	[RISCV][VP] Add basic RVV codegen for vp.fcmp This patch adds the necessary infrastructure to lower vp.fcmp via ISD::VP_SETCC to RVV instructions. Most notably this patch adds cond-code legalization for VP_SETCC, reusing the existing TargetLowering::LegalizeSetCCCondCode by passing in additional SDValue parameters for the Mask and EVL. This method then uses VP operations to legalize the condcode. There is still a general lack of canonicalization on VP_SETCC as opposed to SETCC which results in worse code than is theoretically possible. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123051	2022-04-07 09:16:07 +01:00
Craig Topper	bdb1ab9804	[LegalizeTypes][VP] Use LoVT/HiVT when splitting VP operations in SplitVecRes_UnaryOp. The VP path was using the split source VTs instead of the split destination VTs. This may not be a problem today because the VP nodes going through this have the same source and dest VTs. It will be a problem when we start using this function for legalizing VP cast operations.	2022-04-06 10:51:49 -07:00
Daniil Kovalev	62a983ebc5	Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if" This reverts commit `83a798d4b0`. As discussed in D120714 with @thakis, the patch added unneeded complexity without noticeable benefits.	2022-04-06 20:32:53 +03:00
Craig Topper	8fc19185e3	[LegalizeTypes] Move SplitVecRes_VECTOR_REVERSE/VECTOR_SPLICE near other SplitVecRes methods. NFC This file is divided into sections for different legalization actions. We should keep similar methods together.	2022-04-06 10:29:32 -07:00
Craig Topper	1ad36487e9	[LegalizeDAG] Use SelectionDAG::getBoolConstant to simplify some code. NFC	2022-04-06 10:08:11 -07:00
Craig Topper	5b5f59428c	[DAGCombiner] Replace call getSExtOrTrunc with a truncate. NFC The extend case should never occur. The sign extend would be an arbitrary choice, remove it to avoid confusion.	2022-04-06 09:59:45 -07:00
Paul Walker	7d3af9ef0f	[DAGCombine] insert_subvector undef, (splat X), N2 -> splat X Differential Revision: https://reviews.llvm.org/D120328	2022-04-06 17:15:38 +01:00
Fraser Cormack	6be5e875be	[RISCV][VP] Add basic RVV codegen for vp.icmp This patch adds the minimum required to successfully lower vp.icmp via the new ISD::VP_SETCC node to RVV instructions. Regular ISD::SETCC goes through a lot of canonicalization which targets may rely on which has not hitherto been ported to VP_SETCC. It also supports expansion of individual condition codes and a non-boolean return type. Support for all of that will follow in later patches. In the case of RVV this largely isn't a problem as the vector integer comparison instructions are plentiful enough that it can lower all VP_SETCC nodes on legal integer vectors except for boolean vectors, which regular SETCC folds away immediately into logical operations. Floating-point VP_SETCC operations aren't as well supported in RVV and the backend relies on condition code expansion, so support for those operations will come in later patches. Portions of this code were taken from the VP reference patches. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122743	2022-04-06 16:51:22 +01:00
Paul Walker	1c307b9794	[NFC] Remove redundant IndexType canonicalisation from DAGTypeLegalizer::PromoteIntOp_MSCATTER Promotion does not affect the base element type and so the original index type will remain unchanged. This reflects the behaviour of DAGTypeLegalizer::PromoteIntOp_MGATHER with no tests affected.	2022-04-06 15:30:29 +01:00
zhongyunde	19e5235147	[AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD According to the discussion in D122281, we are missing an ISD::AND combine for MLOAD because it relies on BuildVectorSDNode, which fails for scalable vectors. This patch is intended to handle that, so we can circle back to the type MVT::nxv2i32 Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122703	2022-04-06 20:50:42 +08:00
Roman Lebedev	34ce9fd864	[TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits E.g. in ``` %i0 = zext <2 x i8> to <2 x i16> %i1 = bitcast <2 x i16> to <4 x i8> ``` the `%i0`'s zero bits are known to be `0xFF00` (upper half of every element is known zero), but no elements are known to be zero, and for `%i1`, we don't know anything about zero bits, but the elements under `0b1010` mask are known to be zero (i.e. the odd elements). But, we didn't perform such a propagation. Noticed while investigating more aggressive `vpmaddwd` formation. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D123163	2022-04-06 14:19:31 +03:00
Daniil Kovalev	83a798d4b0	[CodeGen] Place SDNode debug ID declaration under appropriate #if Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to reduce memory usage when it is not needed. Differential Revision: https://reviews.llvm.org/D120714	2022-04-06 14:09:32 +03:00
Jeremy Morse	fb6596f1ec	[DebugInfo][InstrRef] Avoid a crash from mixed variable location modes Variable locations now come in two modes, instruction referencing and DBG_VALUE. At -O0 we pick DBG_VALUE to allow fast construction of variable information. Unfortunately, SelectionDAG edits the optimisation level in the presence of opt-bisect-limit, meaning different passes have different views of what variable location mode we should use. That causes assertions when they're mixed. This patch plumbs through a boolean in SelectionDAG from start to instruction emission, so that we don't rely on the current optimisation level for correctness. Differential Revision: https://reviews.llvm.org/D123033	2022-04-06 11:55:38 +01:00
Simon Pilgrim	3369e474bb	[DAG] Allow XOR(X,MIN_SIGNED_VALUE) to perform AddLike folds As raised on PR52267, XOR(X,MIN_SIGNED_VALUE) can be treated as ADD(X,MIN_SIGNED_VALUE), so let these cases use the 'AddLike' folds, similar to how we perform no-common-bits OR(X,Y) cases. define i8 @src(i8 %x) { %r = xor i8 %x, 128 ret i8 %r } => define i8 @tgt(i8 %x) { %r = add i8 %x, 128 ret i8 %r } Transformation seems to be correct! https://alive2.llvm.org/ce/z/qV46E2 Differential Revision: https://reviews.llvm.org/D122754	2022-04-06 10:37:11 +01:00
Simon Pilgrim	9e97b2a477	[DAG] SimplifySetCC - relax fold (X^C1) == C2 --> X == C1^C2 https://alive2.llvm.org/ce/z/A_auBq Remove limitation that wouldn't perform the fold if all the inverted bits are known zero The thumb2 changes look to be benign, although it does show that the TEQ/TST isel patterns could probably be improved. Fixes movmsk regression in D122754 Differential Revision: https://reviews.llvm.org/D123023	2022-04-06 09:18:08 +01:00
Simon Pilgrim	328754474a	[DAG] SimplifySetCC - clang-format add/xor/sub with constant handling. NFC.	2022-04-04 13:30:17 +01:00
Michael Gottesman	e24f534879	[debug-info] As an NFC commit, refactor EmitFuncArgumentDbgValue so that it can be extended to support llvm.dbg.addr. The reason why I am making this change is that before this commit, EmitFuncArgumentDbgValue relied on a boolean flag IsDbgDeclare both to signal that a DBG_VALUE should be made to be indirect /and/ that the original intrinsic was a dbg.declare. This is no longer always true if we add support for handling dbg.addr since we will have an indirect DBG_VALUE that is a different intrinsic from dbg.declare. With that in mind, in this NFC patch, we prepare for future fixes by introducing a 3 case-enum argument to EmitFuncArgumentDbgValue that allows the caller to explicitly specify how the argument's DBG_VALUE should be emitted. This then allows us to turn the indirect checks into a != FuncArgumentDbgValueKind::Value and prepare us for a future where we add support here for llvm.dbg.addr directly. rdar://83957028 Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D122945	2022-04-01 17:07:28 -07:00
Craig Topper	fa630e7594	[RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1). If we expand (uaddo X, 1) we previously expanded the overflow calculation as (X + 1) <u X. This potentially increases the live range of X and can prevent X+1 from reusing the register that previously held X. Since we're adding 1, overflow only occurs if X was UINT_MAX in which case (X+1) would be 0. So this patch adds a special case to expand the overflow calculation to (X+1) == 0. This seems to help with uaddo intrinsics that get introduced by CodeGenPrepare after LSR. Alternatively, we could block the uaddo transform in CodeGenPrepare for this case. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122933	2022-04-01 13:14:10 -07:00
Simon Pilgrim	76cd11f303	[DAG] Add llvm::isMinSignedConstant helper. NFC Pulled out of D122754	2022-04-01 17:47:34 +01:00
Matt Arsenault	4a8665e23e	SelectionDAG: Avoid some uses of getPointerTy Avoids use of the default address space parameter, and avoids some assumptions about the incoming address space.	2022-03-31 18:49:22 -04:00
Craig Topper	85eae45520	[SelectionDAG] Move extension type for ConstantSDNode from getCopyToRegs to HandlePHINodesInSuccessorBlocks. D122053 set the ExtendType for ConstantSDNodes in getCopyToRegs to ZERO_EXTEND to match assumptions in ComputePHILiveOutRegInfo. PHIs are probably not the only way ConstantSDNodes can get to getCopyToRegs. This patch adds an ExtendType parameter to CopyValueToVirtualRegister and has HandlePHINodesInSuccessorBlocks pass ISD::ZERO_EXTEND for ConstantInts. This way we only affect ConstantSDNodes for PHIs. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122171	2022-03-30 11:32:43 -07:00
Sanjay Patel	436b875e49	[SDAG] avoid libcalls to fmin/fmax for soft-float targets This is an extension of D70965 to avoid creating a mathlib call where it did not exist in the original source. Also see D70852 for discussion about an alternative proposal that was abandoned. In the motivating bug report: https://github.com/llvm/llvm-project/issues/54554 ...we also have a more general issue about handling "no-builtin" options. Differential Revision: https://reviews.llvm.org/D122610	2022-03-30 11:22:03 -04:00
Sanjay Patel	e18cc5277f	[SDAG] try to canonicalize logical shift after bswap When shifting by a byte-multiple: bswap (shl X, C) --> lshr (bswap X), C bswap (lshr X, C) --> shl (bswap X), C This is the backend version of D122010 and an alternative suggested in D120648. There's an extra check to make sure the shift amount is valid that was not in the rough draft. I'm not sure if there is a larger motivating case for RISCV (bug report?), but the ARM diffs show a benefit from having a late version of the transform (because we do not combine the loads in IR). Differential Revision: https://reviews.llvm.org/D122655	2022-03-30 09:29:32 -04:00
Fraser Cormack	43a91a8474	[SelectionDAG] Don't create illegally-typed nodes while constant folding This patch fixes a (seemingly very rare) crash during vector constant folding introduced in D113300. Normally, during legalization, if we create an illegally-typed node during a failed attempt at constant folding it's cleaned up before being visited, due to it having no uses. If, however, an illegally-typed node is created during one round of legalization and isn't cleaned up, it's possible for a second round of legalization to create new illegally-typed nodes which add extra uses to the old illegal nodes. This means that we can end up visiting the old nodes before they're known to be dead, at which point we crash. I'm not happy about this fix. Creating illegal types at all seems like a bad idea, but we all-too-often rely on illegal constants being successfully folded and being fixed up afterwards. However, we can't rely on constant folding actually happening, and we don't have a foolproof way of peering into the future. Perhaps the correct fix is to revisit the node-iteration order during legalization, ensuring we visit all uses of nodes before the nodes themselves. Or alternatively we could try and clean up dead nodes immediately after failing constant folding. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122382	2022-03-30 13:17:55 +01:00
Craig Topper	e68257fcee	[RISCV][SelectionDAG] Enable TargetLowering::hasBitTest for masks that fit in ANDI. Modified DAGCombiner to pass the bittest input and the shift amount to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h. This is an alternative to D122454. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D122458	2022-03-28 12:46:36 -07:00
Simon Pilgrim	e209190c2d	[SDAG] enable binop identity constant folds for multiplies Add mul to the list of ops that we canonicalize with a select to expose an identity merge Differential Revision: https://reviews.llvm.org/D122071	2022-03-25 11:07:04 +00:00
Craig Topper	67eb2f144e	[SelectionDAG] Add AssertAlign to AddNodeIDCustom so that it will CSE properly. The alignment needs to be part of the folding set hash. This is handled by getAssertAlign when nodes are created, but needs to be repeated here. No test case as I found it as part of a very early experimental patch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D122279	2022-03-24 08:59:09 -07:00
Daniil Kovalev	c53cbce45e	[CodeGen] Define ABI breaking class members correctly Non-static class members declared under #ifndef NDEBUG should be declared under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to make headers library-friendly and allow cross-linking, as discussed in D120714. Differential Revision: https://reviews.llvm.org/D121549	2022-03-24 12:42:59 +03:00
Craig Topper	cac9773dcc	[SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo Instead of using operator[], use DenseMap::find to prevent default constructing an entry if it isn't already in the map. Also simplify a condition to check for 0 instead of a virtual register. I'm pretty sure we can only get 0 or a virtual register out of the value map.	2022-03-23 09:52:07 -07:00
serge-sans-paille	02c28970b2	Cleanup include: codegen second round Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122180	2022-03-23 13:54:00 +01:00
Craig Topper	681fd2c11e	Revert "[SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo" This reverts commit `1a9b55b63a`. Causing build bot failures	2022-03-22 23:41:47 -07:00
Craig Topper	1a9b55b63a	[SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo Instead of using operator[], use DenseMap::find to prevent default constructing an entry if it isn't already in the map.	2022-03-22 23:24:53 -07:00
Craig Topper	73f0af106b	[SelectionDAG] Add printing support for the Align value of AssertAlign nodes. Differential Revision: https://reviews.llvm.org/D122262	2022-03-22 14:16:32 -07:00
Craig Topper	37c0aacd71	[SelectionDAG] Make getPreferredExtendForValue take an Instruction * instead of Value *. This is only called for instructions and the caller is already holding an Instruction *. This makes the code more explicit and makes it obvious the code doesn't make decisions about constants.	2022-03-21 12:15:22 -07:00
zhongyunde	828b89bc0b	[AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads Trying to reduce the number of masked loads in favour of more unpklo/hi instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD extensions from legal types are supported. Test cases for both normal and masked loads are added to guard against compile crashes. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D120953	2022-03-21 23:47:33 +08:00
Simon Pilgrim	35a7be6ccb	[SDAG] enable binop identity constant folds for shifts Add shl/srl/sra to the list of ops that we canonicalize with a select to expose an identity merge Differential Revision: https://reviews.llvm.org/D122070	2022-03-21 13:02:50 +00:00
Luo, Yuanke	10bb623192	enable binop identity constant folds for add Differential Revision: https://reviews.llvm.org/D119654	2022-03-20 19:07:16 +08:00
Craig Topper	4eb59f0179	[SelectionDAG][RISCV] Make RegsForValue::getCopyToRegs explicitly zero_extend constants. ComputePHILiveOutRegInfo assumes that constant incoming values to Phis will be zero extended if they aren't a legal type. To guarantee that we should zero_extend rather than any_extend constants. This fixes a bug for RISCV where any_extend of constants can be treated as a sign_extend. Differential Revision: https://reviews.llvm.org/D122053	2022-03-19 18:43:14 -07:00
Craig Topper	306ff74154	[SelectionDAG] Use APInt::zextOrSelf instead of zextOrTrunc in ComputePHILiveOutRegInfo The width never decreases here.	2022-03-18 23:26:19 -07:00
Heejin Ahn	b8038a916d	[WebAssembly] Disable SimplifyDemandedVectorElts after legalization This fixes a reported bug that caused an infinite loop during the SelectionDAG optimization phase in ISel, by creating an overridable hook in `TargetLowering` that allows us to bail out from running `SimplifyDemandedVectorElts`. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D121869	2022-03-16 20:52:43 -07:00
Marco Elver	555df03012	[SelectionDAG][NFC] Clean up SDCallSiteDbgInfo accessors * Consistent naming: addCallSiteInfo vs. getCallSiteInfo; * Use ternary operator to reduce verbosity; * const'ify getters; * Add comments; NFCI. Differential Revision: https://reviews.llvm.org/D121820	2022-03-16 17:46:06 +01:00
Matthias Gehre	09854f2af3	[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers Emit calls to __divei4 and friends for division/remainder of large integers. This fixes https://github.com/llvm/llvm-project/issues/44994. The overall RFC is in https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329 The compiler-rt part is in https://reviews.llvm.org/D120327 Differential Revision: https://reviews.llvm.org/D120329	2022-03-16 09:36:28 +00:00
Craig Topper	1bf4bbc492	[LegalizeTypes][RISCV][WebAssembly] Expand ABS in PromoteIntRes_ABS if it will expand to sra+xor+sub later. If we promote the ABS and then Expand in LegalizeDAG, then both the sra and the xor will have their inputs sign extended. This generates extra code on RISCV which lacks an i8 or i16 sign extend instruction. If we expand during type legalization, then only the sra will get its input sign extended. RISCV is able to combine this with the sra by doing a shift left followed by an sra. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D121664	2022-03-15 08:27:39 -07:00
Craig Topper	ad94dfb9a0	[DAGCombiner][RISCV] Adjust (aext (and (trunc x), cst)) -> (and x, cst) to sext cst based on target preference RISCV strongly prefers i32 values be sign extended to i64. This combine was always zero extending the constant using APInt methods. This adjusts the code so that it calls getNode using ISD::ANY_EXTEND instead. getNode will call TLI.isSExtCheaperThanZExt to decide how to handle the constant. Tests were copied from D121598 where I noticed that we were creating constants that were hard to materialize. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D121650	2022-03-15 08:26:47 -07:00
Sanjay Patel	c2592c374e	[SDAG] simplify bitwise logic with repeated operand We do not have general reassociation here (and probably do not need it), but I noticed these were missing in patches/tests motivated by D111530, so we can at least handle the simplest patterns. The VE test diff looks correct, but we miss that pattern in IR currently: https://alive2.llvm.org/ce/z/u66_PM	2022-03-13 11:12:30 -04:00
serge-sans-paille	ed98c1b376	Cleanup includes: DebugInfo & CodeGen Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121332	2022-03-12 17:26:40 +01:00
Lorenzo Albano	28cfa764c2	[VP] Strided loads/stores This patch introduces two new experimental IR intrinsics and SDAG nodes to represent vector strided loads and stores. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D114884	2022-03-10 18:46:54 +01:00
Stanislav Mekhanoshin	0be6fd44f3	[SDAG] Use MMO flags in MemSDNode folding SDNodes with different target flags may now be folded together rightfully resulting in the assertion in the refineAlignment. Folding nodes with different target flags may result in the wrong load instructions produced at least on the AMDGPU. Fixes: SWDEV-326805 Differential Revision: https://reviews.llvm.org/D121335	2022-03-09 14:25:22 -08:00
Sanjay Patel	341623653d	[SDAG] match rotate pattern with extra 'or' operation This is another fold generalized from D111530. We can find a common source for a rotate operation hidden inside an 'or': https://alive2.llvm.org/ce/z/9pV8hn Deciding when this is profitable vs. a funnel-shift is tricky, but this does not show any regressions: if a target has a rotate but it does not have a funnel-shift, then try to form the rotate here. That is why we don't have x86 test diffs for the scalar tests that are duplicated from AArch64 ( `74a65e3834` ) - shld/shrd are available. That also makes it difficult to show vector diffs - the only case where I found a diff was on x86 AVX512 or XOP with i64 elements. There's an additional check for a legal type to avoid a problem seen with x86-32 where we form a 64-bit rotate but then it gets split inefficiently. We might avoid that by adding more rotate folds, but I didn't check to see what is missing on that path. This gets most of the motivating patterns for AArch64 / ARM that are in D111530. We still need a couple of enhancements to setcc pattern matching with rotate/funnel-shift to get the rest. Differential Revision: https://reviews.llvm.org/D120933	2022-03-09 13:19:00 -05:00
Craig Topper	29511ec7da	[LegalizeTypes][VP] Add widening and splitting support for VP_FMA. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D120854	2022-03-08 09:59:59 -08:00
Craig Topper	c392b9924e	[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D120785	2022-03-08 09:59:34 -08:00
Fraser Cormack	17310f3d19	[SelectionDAG][NFC] Address a few clang-tidy warnings Fix a couple of else-after-return warnings and some unnecessary parentheses.	2022-03-08 16:22:26 +00:00
Craig Topper	8e132c5c1d	[LegalizeTypes][ARM][X86] Change ExpandIntRes_ABS to use sra+xor+sub. Previously we used sra+add+xor if ADDCARRY is supported. This changes to sra+xor+sub if SUBCARRY is available. This is consistent with the recent change to the default expansion in LegalizeDAG. Differential Revision: https://reviews.llvm.org/D121039	2022-03-07 11:28:32 -08:00
David Green	4388f4f776	[DAG] Don't convert undef to 0 when creating buildvector When inserting undef into buildvectors created from shuffles of buildvectors, we convert elements to the largest needed type. This had the effect of converting undef into 0, which isn't needed as the buildvector implicitly truncates and trunc(zext(undef)) == undef. Differential Revision: https://reviews.llvm.org/D121002	2022-03-06 18:35:34 +00:00
Sanjay Patel	f4b53972ce	[SDAG] fold bitwise logic with shifted operands This extends `acb96ffd14` to 'and' and 'xor' opcodes. Copying from that message: LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().	2022-03-05 11:14:45 -05:00
Paul Walker	42b4a6227e	[DAGCombine] Prevent illegal ISD::SPLAT_VECTOR operations post legalisation. When triggered during operation legalisation the affected combine generates a splat_vector that when custom lowered for SVE fixed length code generation, results in the original precombine sequence and thus we enter a legalisation/combine hang. NOTE: The patch contains no tests because I observed this issue only when combined with other work that might never become public. The current way AArch64 lowers ISD::SPLAT_VECTOR meant a specific test was not possible so I'm hoping the DAGCombiner fix can be seen as obvious. The AArch64ISelLowering change is required to maintain existing code quality. Differential Revision: https://reviews.llvm.org/D120735	2022-03-04 11:54:03 +00:00
Maksim Panchenko	7e570308f2	[NFC] Fix typos Reviewed By: yota9, Amir Differential Revision: https://reviews.llvm.org/D120859	2022-03-03 13:26:39 -08:00
Paul Robinson	7b85f0f32f	[PS4] isPS4 and isPS4CPU are not meaningfully different	2022-03-03 11:36:59 -05:00
Sanjay Patel	e9302bf7ef	[SDAG] try harder to remove a rotate from X == 0 https://alive2.llvm.org/ce/z/mJP7XP This can be viewed as expanding the compare into and/or-of-compares: https://alive2.llvm.org/ce/z/bkZYWE followed by reduction of each compare. This could be extended in several ways: 1. There's a (X & Y) == -1 sibling. 2. We can recurse through more than 1 'or'. 3. The fold could be generalized beyond rotates - any operation that only changes the order of bits (bswap, bitreverse). This is a transform noted in D111530.	2022-03-03 09:25:46 -05:00
Sanjay Patel	c33dbc2a2d	[SDAG] refactor foldSetCCWithRotate; NFC There are more potential optimizations to make here, so rearrange to make it easier to append those.	2022-03-02 16:42:05 -05:00
Craig Topper	ab7a7cc1dd	Revert "[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG." This reverts commit `ac93f95861`. Committed by accident.	2022-03-02 10:00:22 -08:00
Craig Topper	324c0a7206	[SelectionDAG][RISCV] Emit a canonical sign bit test from ExpandIntRes_ABS. Instead of emitting 0 > Hi, emit Hi < 0. If Hi needs to be expanded again this will allow the special case for sign bit tests in ExpandIntOp_SETCC to trigger. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120761	2022-03-02 09:47:26 -08:00
Craig Topper	ac93f95861	[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG. Differential Revision: https://reviews.llvm.org/D120785	2022-03-02 09:47:05 -08:00
Simon Pilgrim	5cce97d61e	[DAG] isSplatValue - improve ISD::VECTOR_SHUFFLE splat detection Currently we only check for splat shuffles, this extends it to see if the source operand is a splat across the demanded elts based upon the shuffle mask	2022-03-02 15:32:24 +00:00
Simon Pilgrim	df0a2b4f30	[DAG] SelectionDAG::isSplatValue - add initial BITCAST handling This patch adds support for recognising vector splats by peeking through bitcasts to vectors with smaller element types - if all the offset subelements are splats then the bitcasted vector is a splat as well. We don't have great coverage for isSplatValue so I've made this pretty specific to the use case I'm trying to fix - regressions in some vXi64 vector shift by splat cases that 32-bit x86 doesn't recognise because the shift amount buildvector has been type legalised to v2Xi32. We can add further support (floats, bitcast from larger element types, undef elements) when we have actual test coverage. Differential Revision: https://reviews.llvm.org/D120553	2022-03-02 11:25:51 +00:00
Craig Topper	8787726609	[LegalizeTypes] Remove incomplete StrictFP support from SplitVecRes_UnaryOp. NFC There is no handling of Chain operands in this function so it can't work. There's a separate splitting function for all strict fp nodes.	2022-03-01 15:43:57 -08:00
serge-sans-paille	a494ae43be	Cleanup includes: TransformsUtils Estimation on the impact on preprocessor output: before: 1065307662 after: 1064800684 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120741	2022-03-01 21:00:07 +01:00
Craig Topper	bf8054644d	[DAGCombiner] Don't expand (neg (abs x)) if the abs has an additional user. If the types aren't legal, the expansions may get type legalized in a different way preventing code sharing. If the type is legal, we will share some instructions between the two expansions, but we will need an extra register. Since we don't appear to fold (neg (sub A, B)) if the sub has an additional user, I think it makes sense not to expand NABS. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120513	2022-03-01 07:32:07 -08:00
Sanjay Patel	69684b84c6	[SDAG] fold (rotate X) eq/ne (0/-1) This is the SDAG equivalent of an instcombine transform added with: `fd807601a7` This is another step towards solving #49541 and part of an alternative set of more general transforms than what is proposed in D111530. https://alive2.llvm.org/ce/z/ToxaE8	2022-02-27 11:31:19 -05:00
Sanjay Patel	acb96ffd14	[SDAG] fold bitwise logic with shifted operands LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands(). This is a partial implementation of a transform suggested in D111530 (only handles 'or' bitwise logic as a first step - need to stamp out more tests for other opcodes). Several of the same tests added for D111530 are altered here (but not fully optimized). I'm not sure yet if this would help/hinder that patch, but this should be an improvement for all tests added with `ecf606cb43` since it removes a shift operation in those examples. Differential Revision: https://reviews.llvm.org/D120516	2022-02-27 09:54:12 -05:00
Simon Pilgrim	fadd20f80d	[DAG] Ensure type is legal for bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) fold As reported on D120192	2022-02-27 11:25:22 +00:00
Nikita Popov	87ebd9a36f	[IR] Use CallBase::getParamElementType() (NFC) As this method now exists on CallBase, use it rather than the one on AttributeList.	2022-02-25 10:01:58 +01:00
Simon Pilgrim	370ebc9d9a	[DAG] Attempt to fold bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) If the shl is at least half the bitwidth (i.e. the lower half of the bswap source is zero), then we can reduce the shift and perform the bswap at half the bitwidth and just zero extend. Based off PR51391 + PR53867 Differential Revision: https://reviews.llvm.org/D120192	2022-02-24 19:33:51 +00:00
Sanjay Patel	4a3708cd6b	[SDAG] remove shift that is redundant with part of funnel shift This is the SDAG translation of D120253 : https://alive2.llvm.org/ce/z/qHpmNn The SDAG nodes can have different operand types than the result value. We can see an example of that with AArch64 - the funnel shift amount is an i64 rather than i32. We may need to make that match even more flexible to handle post-legalization nodes, but I have not stepped into that yet. Differential Revision: https://reviews.llvm.org/D120264	2022-02-24 11:25:46 -05:00
Craig Topper	c7d6448d03	[DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable. Internally to DAGCombiner the SDValues were passed by non-const reference despite not being modified. They were then passed by const reference to TLI. This patch passes them by value which is consistent with the vast majority of code. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120420	2022-02-23 12:40:45 -08:00
Paweł Bylica	afdaa86b77	[DAGCombine] Extend combineCarryDiamond() In combineCarryDiamond() use getAsCarry() to find more candidates for being a carry flag. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118362	2022-02-23 21:37:49 +01:00
Sanjay Patel	21d7c3bcc6	[DAG] try to convert multiply to shift via demanded bits This is a fix for a regression discussed in: https://github.com/llvm/llvm-project/issues/53829 We cleared more high multiplier bits with `995d400`, but that can lead to worse codegen because we would fail to recognize the now disguised multiplication by neg-power-of-2 as a shift-left. The problem exists independently of the IR change in the case that the multiply already had cleared high bits. We also convert shl+sub into mul+add in instcombine's negator. This patch fills in the high-bits to see the shift transform opportunity. Alive2 attempt to show correctness: https://alive2.llvm.org/ce/z/GgSKVX The AArch64, RISCV, and MIPS diffs look like clear wins. The x86 code requires an extra move register in the minimal examples, but it's still an improvement to get rid of the multiply on all CPUs that I am aware of (because multiply is never as fast as a shift). There's a potential follow-up noted by the TODO comment. We should already convert that pattern into shl+add in IR, so it's probably not common: https://alive2.llvm.org/ce/z/7QY_Ga Fixes #53829 Differential Revision: https://reviews.llvm.org/D120216	2022-02-23 12:09:32 -05:00
Paweł Bylica	df0c16ce00	[NFC][DAGCombine] Use isOperandOf() in combineCarryDiamond Pre-commit for https://reviews.llvm.org/D118362.	2022-02-21 21:41:31 +01:00
Simon Pilgrim	46f1e8359e	[DAG] visitBSWAP - pull out repeated SDLoc. NFC Cleanup for D120192	2022-02-21 13:08:01 +00:00
Craig Topper	440c4b705a	[SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y). Previously we used sra (X, size(X)-1); xor (add (X, Y), Y). By placing sub at the end, we allow RISCV to combine sign_extend_inreg with it to form subw. Some X86 tests for Z - abs(X) seem to have improved as well. Other targets look to be a wash. I had to modify ARM's abs matching code to match from sub instead of xor. Maybe instead ISD::ABS should be made legal. I'll try that in parallel to this patch. This is an alternative to D119099 which was focused on RISCV only. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119171	2022-02-20 21:11:23 -08:00
Chen Zheng	efe5b8ad90	[ISEL] remove unnecessary getNode(); NFC Reviewed By: RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D120049	2022-02-20 21:08:49 -05:00
Luo, Yuanke	67ef63138b	[SDAG] enable binop identity constant folds for sub This patch extracts the sub folding from D119654 and leaves only the add folding in that patch. Differential Revision: https://reviews.llvm.org/D120116	2022-02-21 09:37:36 +08:00
Craig Topper	24bfa24355	[SelectionDAGBuilder] Simplify visitShift. NFC This code was detecting whether the value returned by getShiftAmountTy can represent all shift amounts. If not, it would use MVT::i32 as a placeholder. getShiftAmountTy was updated last year to return i32 if the type returned by the target couldn't represent all values. This means the MVT::i32 case here is dead and the logic can be simplified. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120164	2022-02-19 12:40:59 -08:00
Craig Topper	8e7247a377	[SelectionDAG] Fix off by one error in range check in DAGTypeLegalizer::ExpandShiftByConstant. The code was considering shifts by an amount larger than the number of bits in the original VT to be out of range. Shifts exactly equal to the original bit width are also out of range. I don't know how to test this. DAGCombiner should usually fold this away. I just noticed while looking for something else in this code. The llvm-cov report shows that we don't have coverage for out of range shifts here. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120170	2022-02-18 18:42:20 -08:00
Craig Topper	04f815c26f	[SelectionDAGBuilder] Remove LegalTypes=false from a call to getShiftAmountConstant. getShiftAmountTy will return MVT::i32 if the shift amount coming from the target's getScalarShiftAmountTy can't represent all possible values. That should eliminate the need to use the pointer type which is what we do when LegalTypes is false. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120165	2022-02-18 15:36:35 -08:00
Sanjay Patel	a2963d871e	[SDAG] fold sub-of-shift to add-of-shift This fold is done in IR: https://alive2.llvm.org/ce/z/jWyFrP There is an x86 test that shows an improvement from the added flexibility of using add (commutative). The other diffs are presumed neutral. Note that this could also be folded to an 'xor', but I'm not sure if that would be universally better (eg, x86 can convert adds more easily into LEA). This helps prevent regressions from a potential fold for issue #53829.	2022-02-18 11:55:50 -05:00
Paul Walker	6457f42bde	[DAGCombiner] Extend ISD::ABDS/U combine to handle more cases. The current ABD combine doesn't quite work for SVE because only a single scalable vector per scalar integer type is legal (e.g. for i32, <vscale x 4 x i32> is the only legal scalable vector type). This patch extends the combine to also trigger for the cases when operand extension must be retained. Differential Revision: https://reviews.llvm.org/D115739	2022-02-17 13:32:20 +00:00
Bjorn Pettersson	1a8bdf95a3	[DAG] Fix in ReplaceAllUsesOfValuesWith When doing SelectionDAG::ReplaceAllUsesOfValuesWith a worklist is prepared containing all users that should be updated. Then we use the RemoveNodeFromCSEMaps/AddModifiedNodeToCSEMaps helpers to handle recursive CSE updates while doing the replacements. This patch aims at solving a problem that could arise if the recursive CSE updates would result in an SDNode present in the worklist being removed as a side-effect of morphing a prior user in the worklist. To exemplify such a scenario, imagine that we have these nodes in the DAG t12: i64 = add t8, t11 t13: i64 = add t12, t8 t14: i64 = add t11, t11 t15: i64 = add t14, t8 t16: i64 = sub t13, t15 and that the t8 uses should be replaced by t11. An initial worklist (listing the users that should be morphed) could be [t12, t13, t15]. When updating t12 we get t12: i64 = add t11, t11 which results in a CSE update that replaces t14 by t12, so we get t15: i64 = add t12, t8 which results in a CSE update that replaces t13 by t12, so we get t16: i64 = sub t12, t15 and then t13 is removed given that it was the last use of t13. So when being done with the updates triggered by rewriting the use of t8 in t12 the t13 node no longer exists. And we used to end up hitting an assertion when continuing with the worklist aiming at replacing the t8 uses in t13. The solution is based on using a DAGUpdateListener, making sure that we prune a user from the worklist if it is removed during the recursive CSE updates. The bug was found using an OOT target. I think the problem is quite old, even if the particular intree target reproducer added in this patch seems to pass when using LLVM 13.0.0. Differential Revision: https://reviews.llvm.org/D119088	2022-02-17 14:29:59 +01:00
Craig Topper	1daa66d3fd	[SelectionDAG] Add SPLAT_VECTOR to SelectionDAG::isConstantFPBuildVectorOrConstantFP. Matches what is done for the int version. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D119793	2022-02-16 09:22:11 -08:00
Simon Pilgrim	30e9cdd1aa	[DAG] computeKnownBits - add ISD::AVGCEILU handling Expand the ISD::AVGCEILU to determine the known bits of the result. First part of PR53622 Differential Revision: https://reviews.llvm.org/D119629	2022-02-16 13:00:15 +00:00
David Green	655d0d86f9	[DAGCombine] Move AVG combine to SimplifyDemandBits This moves the matching of AVGFloor and AVGCeil into a place where demanded bits are available, so that it can detect more cases for more folds. It changes the transform to start from a shift, not from a truncate. We match the pattern shr(add(ext(A), ext(B)), 1), transforming to ext(hadd(A, B)). For signed values, because only the bottom bits are demanded llvm will transform the above to use a lshr too, as opposed to ashr. In order to correctly detect the hadd we need to know the demanded bits to turn it back. Depending on whether the shift is signed (ashr) or logical (lshr), and the extensions are signed or unsigned we can create different nodes. If the shift is signed: Needs >= 2 sign bits. https://alive2.llvm.org/ce/z/h4gQAW generating signed rhadd. Needs >= 2 zero bits. https://alive2.llvm.org/ce/z/B64DUA generating unsigned rhadd. If the shift is unsigned: Needs >= 1 zero bits. https://alive2.llvm.org/ce/z/ByD8sj generating unsigned rhadd. Needs 1 demanded bit zero and >= 2 sign bits https://alive2.llvm.org/ce/z/hvPGxX and https://alive2.llvm.org/ce/z/32P5n1 generating signed rhadd. Differential Revision: https://reviews.llvm.org/D119072	2022-02-15 10:17:02 +00:00
David Green	03380c70ed	[DAGCombine] Basic combines for AVG nodes. This adds very basic combines for AVG nodes, mostly for constant folding and handling degenerate (zero) cases. The code performs mostly the same transforms as visitMULHS, adjusted for AVG nodes. Constant folding extends to a higher bitwidth and drops the lowest bit. For undef nodes, `avg undef, x` is transformed to x. There is also a transform for `avgfloor x, 0` transforming to `shr x, 1`. Differential Revision: https://reviews.llvm.org/D119559	2022-02-14 11:18:35 +00:00
Nikita Popov	ff040eca93	[FastISel] Reuse register for bitcast that does not change MVT The current FastISel code reuses the register for a bitcast that doesn't change the IR type, but uses a reg-to-reg copy if it changes the IR type without changing the MVT. However, we can simply reuse the register in that case as well. In particular, this avoids unnecessary reg-to-reg copies for pointer bitcasts. This was found while inspecting O0 codegen differences between typed and opaque pointers. Differential Revision: https://reviews.llvm.org/D119432	2022-02-14 09:13:17 +01:00
Craig Topper	e72fe654b7	[DAGCombiner] Use getShiftAmountConstant in DAGCombiner::foldSelectOfConstants. This enables fshl to be matched earlier on X86 %6 = lshr i32 %3, 1 %7 = select i1 %4, i32 -2147483648, i32 0 %8 = or i32 %6, %7 X86 uses i8 for shift amounts. SelectionDAGBuilder creates the ISD::SRL with an i8 shift type. DAGCombiner turns the select into an ISD::SHL. Prior to this patch it would use i32 for the shift amount. fshl matching failed because the shift amounts have different types. LegalizeDAG fixes the ISD::SHL shift amount to i8. This allowed fshl matching to succeed. With this patch, the ISD::SHL will be created with an i8 shift amount. This allows the fshl to match immediately. No test case because we still end up with a fshl either way.	2022-02-13 19:09:26 -08:00
Sanjay Patel	96b7e0b5a0	[SDAG] clean up scalarizing load transform I have not found a way to expose a difference for this patch in a test because it only triggers for a one-use load, but this is the code that was adapted into D118376 and caused miscompiles. The new code pattern is the same as what we do in narrowExtractedVectorLoad() (reduces load width for a subvector extract). This removes seemingly unnecessary manual worklist management and fixes the chain updating via "SelectionDAG::makeEquivalentMemoryOrdering()". Differential Revision: https://reviews.llvm.org/D119549	2022-02-12 11:41:19 -05:00
Sanjay Patel	429f10f5f2	[SDAG] reduce code duplication and fix formatting; NFC	2022-02-12 10:22:13 -05:00
Arthur Eubanks	c0281c7607	[OpaquePtr][SPARC] Remove getPointerElementType() call in SparcISelLowering Requires keeping better track of sret types.	2022-02-11 11:31:19 -08:00
David Green	4072e362c0	[ISel] Port AArch64 HADD and RHADD to ISel This ports the aarch64 combines for HADD and RHADD over to DAG combine, so that they can be used in more architectures (notably MVE in a followup patch). They are renamed to AVGFLOOR and AVGCEIL in the process, to avoid confusion with instructions such as X86 hadd. The code was also rewritten slightly to remove the AArch64 idiosyncrasies. The general pattern for a AVGFLOORS is %xe = sext i8 %x to i32 %ye = sext i8 %y to i32 %a = add i32 %xe, %ye %r = lshr i32 %a, 1 %t = trunc i32 %r to i8 An AVGFLOORU is equivalent with zext. Because of the truncate lshr==ashr, as the top bits are not demanded. An AVGCEIL also includes an extra rounding, so includes an extra add of 1. Differential Revision: https://reviews.llvm.org/D106237	2022-02-11 18:28:56 +00:00
Julien Pages	dcb2da13f1	[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode Add a new llvm.fptrunc.round intrinsic to precisely control the rounding mode when converting from f32 to f16. Differential Revision: https://reviews.llvm.org/D110579	2022-02-11 12:08:23 -05:00
Nikita Popov	6241f7dee0	[FastISel] Remove redundant reg class check (NFC) SrcVT and DstVT are the same in this branch, as such their register classes will also be the same.	2022-02-10 14:10:00 +01:00
Reid Kleckner	b5a592a8e2	[DAG] Remove pointless std::function wrapper, NFC	2022-02-09 14:30:43 -08:00
Reid Kleckner	f63c150187	Revert "[DagCombine] Increase depth by number of operands to avoid a pathological compile time." Appears to be causing check-llvm to fail This reverts commit `49ab760090`.	2022-02-09 13:55:40 -08:00
Alina Sbirlea	49ab760090	[DagCombine] Increase depth by number of operands to avoid a pathological compile time. We're hitting a pathological compile-time case, profiled to be in DagCombiner::visitTokenFactor and many inserts into a SmallPtrSet. It looks like one of the paths around findBetterNeighborChains is not capped and leads to this. This patch resolves the issue. Looking for feedback if this solution looks reasonable. Differential Revision: https://reviews.llvm.org/D118877	2022-02-09 13:31:28 -08:00
Sander de Smalen	ec46232517	[DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)` This seems like an obvious fold, which leads to a few improvements. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118920	2022-02-09 14:30:01 +00:00
Sanjay Patel	905abc5b7d	[SDAG] enable binop identity constant folds for fmul/fdiv The test diffs are identical to D119111. This only affects x86 currently because no other target has an override for the TLI hook that controls this transform.	2022-02-08 10:52:28 -05:00
Roman Lebedev	ae9414d562	[ValueTracking] Only check for non-undef/poison if already known to be a self-multiply https://godbolt.org/z/js9fTTG9h ^ we don't care what `isGuaranteedNotToBeUndefOrPoison()` says unless we already knew that the operands were equal.	2022-02-08 18:35:29 +03:00
Sanjay Patel	a68e098024	[SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC This is no-functional-change-intended because only the x86 target enables the TLI hook currently. We can add fmul/fdiv opcodes to the switch similar to the proposal D119111, but we don't need to make other changes like enabling target-specific combines. We can also add integer opcodes (add, or, shl, etc.) to the switch because this function is called from all of the generic binary opcodes. The goal is to incrementally enable the profitable diffs from D90113 while avoiding regressions. Differential Revision: https://reviews.llvm.org/D119150	2022-02-08 09:55:05 -05:00
Simon Pilgrim	fd2bb51f1e	[ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask. This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments. I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these. Differential Revision: https://reviews.llvm.org/D119019	2022-02-08 12:04:13 +00:00
Sanjay Patel	d1ecfaa097	[SDAG] try to fold one-demanded-bit-of-multiply This is a translation of the transform added to InstCombine with: D118539	2022-02-07 17:24:35 -05:00
Sanjay Patel	fc6bee1c11	[SDAG] SimplifyDemandedBits - generalize fold for 2 LSB of X*X This is translated from recent changes to the IR version of this function: D119060 D119139	2022-02-07 15:38:50 -05:00
Simon Pilgrim	74555fd367	[DAG] visitINSERT_VECTOR_ELT - break if-else chain as they both return (style). NFC.	2022-02-07 09:58:47 +00:00
Craig Topper	c35ccd2ac8	[DAGCombiner][RISCV] Allow rotates by non-constant to be matched for i32 on riscv64 with Zbb. rv64izbb has RORW/ROLW instructions that operate on the lower 32-bits of a 64-bit value and sign extend bit 31 of the result. DAGCombiner won't match rotate idioms because the i32 type isn't Legal on riscv64. This patch teaches DAGCombiner to allow it if the type is going to be promoted and the target has Custom type legalization for ISD::ROTL or ISD::ROTR. I've restricted this to scalar types. It doesn't appear any in tree targets other than riscv64 have custom type legalization for rotates. If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR after type legalization, but I'd like to avoid that if possible. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119062	2022-02-06 10:58:12 -08:00
Bjorn Pettersson	cecf11c315	[DAGCombiner] Fold SSHLSAT/USHLSAT to SHL when no saturation will occur When the shift amount is known and a known sign bit analysis of the shiftee indicates that no saturation will occur, then we can replace SSHLSAT/USHLSAT by SHL. Differential Revision: https://reviews.llvm.org/D118765	2022-02-06 18:59:06 +01:00
Benjamin Kramer	a40dc4eaf8	Simplify mask creation with llvm::seq. NFCI.	2022-02-05 23:35:41 +01:00
Sander de Smalen	6452549f30	[DAGCombiner] Fold vecreduce_or/and if operand is insert_subvector. Fold: vecreduce_or(insert_subvec(zeroinitializer, vec)) -> vecreduce_or(vec) vecreduce_and(insert_subvec(allones, vec)) -> vecreduce_and(vec) vecreduce_and/or(insert_subvec(undef, vec)) -> vecreduce_and/or(vec) This is useful for SVE which uses insert/extract subvector to convert fixed-width to/from scalable vectors. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D118919	2022-02-05 14:35:53 +00:00
John Brawn	0d8092dd48	[AArch64] Fix legalization of v1f64 strict_fsetcc and strict_fsetccs These operations are scalarized but the result type v1i1 isn't which needs special handling (the same as is done for the non-strict versions of these operations). Differential Revision: https://reviews.llvm.org/D118258	2022-02-04 12:55:38 +00:00
serge-sans-paille	ffe8720aa0	Reduce dependencies on llvm/BinaryFormat/Dwarf.h This header is very large (3M lines once expanded) and was included in locations where dwarf-specific information was not needed. More specifically, this commit suppresses the dependencies on llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used, this has a decent impact on number of preprocessed lines generated during compilation of LLVM, as showcased below. This is achieved by moving some definitions back to the .cpp file, no performance impact implied[0]. As a consequence of that patch, downstream users may need to manually include some extra files: llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h In some situations, code may be relying on the fact that llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden dependency now needs to be explicit. $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l after: 10978519 before: 11245451 Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup [0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions Differential Revision: https://reviews.llvm.org/D118781	2022-02-04 11:44:03 +01:00
Bjorn Pettersson	3db39e7479	[DAGCombiner] Fix dependency analysis in checkMergeStoreCandidatesForDependencies In the aftermath of D116895 a problem was found in the analysis of dependencies between store merge candidates in checkMergeStoreCandidatesForDependencies, which is needed to avoid introducing cycles in the DAG. In the past it has been enough (or assumed to be enough) to start scanning from non-chain operands when analysing the store merge candidates for dependencies, assuming that the analysis of chain dependencies performed when finding the candidates would cover up for potential dependencies that exist involving the chain operands. It was however discovered that one could end up with scenarios such as described in the aarch64-checkMergeStoreCandidatesForDependencies.ll test case, when the dependency between two stores is given by a mix of chain operand dependencies and non-chain operand dependencies. The fix in this patch makes sure that we also account for chain operand dependencies when doing the more elaborate analysis in checkMergeStoreCandidatesForDependencies, no longer relying on the earlier check involving chain operands being enough. Differential Revision: https://reviews.llvm.org/D118943	2022-02-04 08:53:01 +01:00
Sander de Smalen	01bfe9729a	[ISEL] Canonicalize STEP_VECTOR to LHS if RHS is a splat. This helps recognise patterns where we're trying to match STEP_VECTOR patterns to INDEX instructions that take a GPR for the Start/Step. The reason for canonicalising this operation to the LHS is because it will already be canonicalised to the LHS if the RHS is a constant splat vector. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D118459	2022-02-03 09:31:46 +00:00
Simon Pilgrim	5aa2acc86b	[DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.	2022-02-02 12:04:49 +00:00
Simon Moll	7d926b7177	[VE] LEGALAVL and staged VVP legalization The new LEGALAVL node annotates that the AVL refers to packs of 64bit. We use a two-stage lowering approach with LEGALAVL: First, standard SDNodes are translated into illegal VVP layer nodes. Regardless of source (VP or standard), all VVP nodes have a mask and AVL parameter. The AVL parameter refers to the element position (just as in VP intrinsics). Second, we legalize the AVL usage in VVP layer nodes. If the element size is < 64bit, the EVL parameter has to be adjusted to refer to packs of 64bits. We wrap the legalized AVL in a LEGALAVL node to track this. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D118321	2022-02-02 09:11:41 +01:00
David Green	c89cfbd4dd	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This reverts commit `100763a88f` as it was making incorrect assumptions about implicit zero_extends.	2022-02-01 20:18:40 +00:00
Simon Pilgrim	904395ab8f	[DAG] SimplifyMultipleUseDemandedBits - add default Depth = 0 argument. Simplifies an upcoming change.	2022-02-01 12:34:38 +00:00
Simon Pilgrim	d83a96f59f	[DAG] Make it clear mul(x,x) knownbits bit[1] == 0 check should be for x is undef only As raised on rGffd0e464b4b9, if x is poison, this fold is still ok.	2022-02-01 11:32:14 +00:00
Bjorn Pettersson	3885879046	[DAGCombine] Add simple folds for SSHLSAT/USHLSAT Do "simplifyShift" and "FoldConstantArithmetic" folds for the SSHLSAT and USHLSAT DAG nodes. This includes folds such as: (shlsat undef/poison, x) -> 0 (shlsat x, undef/poison) -> undef (shlsat x, too_large_shamt) -> undef (shlsat 0, x) -> 0 (shlsat x, 0) -> x (shlsat c1, c2) -> c3 Differential Revision: https://reviews.llvm.org/D118603	2022-02-01 10:51:35 +01:00
David Sherwood	daa80339df	[CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors I have updated TargetLowering::isConstTrueVal to also consider SPLAT_VECTOR nodes with constant integer operands. This allows the optimisation to also work for targets that support scalable vectors. Differential Revision: https://reviews.llvm.org/D117210	2022-02-01 09:50:00 +00:00
Philip Reames	57cf29ac1b	[Statepoint] Remove another use of getActualReturnType [NFC] For the cross block gc.result projection case, we only care about the return type if there is a cross block gc.result, and if there is one, we can take the type from the gc.result. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:57:46 -08:00
Philip Reames	6e4f7c0823	[Statepoints] Take result type from gc.result [NFC] When lowering a gc.result, we can assume that the result type of the gc.result matches the type of the underlying call. This is explicitly required in LangRef. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:42:34 -08:00
Philip Reames	093b43f48d	Sink getGCResultLocality to sole use [NFC]	2022-01-31 09:33:57 -08:00
Kerry McLaughlin	002b944dfa	[SVE] Fix TypeSize->uint64_t implicit conversion in visitAlloca() Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca() when we call this function for a scalable alloca instruction, caused by the implicit conversion of TySize to uint64_t. This patch changes TySize to a TypeSize as returned by getTypeAllocSize() and ensures the allocation size is multiplied by vscale for scalable vectors. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D118372	2022-01-31 14:37:23 +00:00
Dávid Bolvanský	ae990a3cbd	[Analysis] Attribute noundef should not prevent tail call optimization Very similar to https://reviews.llvm.org/D101230 Fixes https://github.com/llvm/llvm-project/issues/53501	2022-01-31 15:13:52 +01:00
Simon Pilgrim	7ec8fc2932	[X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements. This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.	2022-01-31 13:58:00 +00:00
Simon Pilgrim	2d1390efbe	[DAG] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero	2022-01-31 12:00:51 +00:00
Simon Pilgrim	48f45f6b25	[X86] Limit mul(x,x) knownbits tests with not undef/poison check We can only assume bit[1] == zero if it's the only demanded bit or the source is not undef/poison	2022-01-31 11:55:10 +00:00
Kazu Hirata	2bea207d26	[CodeGen] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-01-30 12:32:51 -08:00
Cullen Rhodes	5d089d9a83	[DAGCombiner] Fix invalid size request in combineRepeatedFPDivisors If we have a vector FP division with a splatted divisor, use getVectorMinNumElements when scaling the num of uses by splat factor. For AArch64 the combine kicks in for the <vscale x 4 x float> case since it's above the fdiv threshold (3) when scaling num uses by splat factor, but the codegen is worse (splat + vector fdiv + vector fmul) than the <vscale x 2 x double> case (splat + vector fdiv). If the combine could be converted into a scalar FP division by scalarizeBinOpOfSplats it may be cheaper, but it looks like this is predicated on the isExtractVecEltCheap TLI function which is implemented for x86 but not AArch64. Perhaps for now combineRepeatedFPDivisors should only scale num uses by splat if the division can be converted into scalar op. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118343	2022-01-28 17:01:08 +00:00
Ellis Hoag	11d3074267	[InstrProf] Add single byte coverage mode Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions * We use a single byte per function rather than 8 bytes per block The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code. When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%). [0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D116180	2022-01-27 17:38:55 -08:00
Simon Pilgrim	fdd3e2c943	[DAG] SelectionDAG::getNode(N1,N2) - detect N2 constant vector splats as well as scalars We already perform some basic folds (add/sub with zero etc.) on scalar types, this patch adds some basic support for constant splats as well in a few cases (we can add more with future test coverage). In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests. I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants. Differential Revision: https://reviews.llvm.org/D118264	2022-01-27 10:59:08 +00:00
Fraser Cormack	84e85e025e	[SelectionDAG][VP] Provide expansion for VP_MERGE This patch adds support for expanding VP_MERGE through a sequence of vector operations producing a full-length mask setting up the elements past EVL/pivot to be false, combining this with the original mask, and culminating in a full-length vector select. This expansion should work for any data type, though the only use for RVV is for boolean vectors, which themselves rely on an expansion for the VSELECT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118058	2022-01-27 09:00:41 +00:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
Sanjay Patel	63daea8b35	[SDAG] fix bug in ComputeNumSignBits of target constant The loop below the changed line assumes that the element width of the target constant is the same as the element width of the loaded value, but that is not always true. We could try harder to do some kind of min/max calc even if the sizes don't match, but that can be another patch if needed. This fixes #53401 (miscompile) and does not change the motivating cases added when this analysis was introduced: `ad298f86b7`	2022-01-26 10:22:41 -05:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a consequence, move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h, so no build breakage is expected).	2022-01-26 16:17:45 +01:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently the not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence. This relies on a further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32) on those targets which have no V_XNOR. The current change enables the patterns which explicitly select the not (xor_one_use) to the appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
David Green	57356d6bb7	[DAG] Create fptoui.sat from clamped fptoui This is the unsigned variant of D111976, where we convert a clamped fptoui to a fptoui.sat. Because we are unsigned, the condition this time is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D114964	2022-01-26 08:37:44 +00:00
Simon Pilgrim	15e2be291f	[DAG] visitMULHS/MULHU/AND - remove some redundant LHS constant checks Now that we constant fold and canonicalize constants to the RHS, we don't need to check both LHS and RHS for specific constants	2022-01-25 11:54:23 +00:00
Bjorn Pettersson	109cc5adcc	[DAGCombine] Fold SRA of a load into a narrower sign-extending load An sra is basically sign-extending a narrower value. Fold away the shift by doing a sextload of a narrower value, when it is legal to reduce the load width accordingly. Differential Revision: https://reviews.llvm.org/D116930	2022-01-25 12:14:48 +01:00
Fraser Cormack	7cb452bfde	[SelectionDAG][VP] Add widening support for VP_MERGE This patch adds widening support for ISD::VP_MERGE, which widens identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118030	2022-01-25 10:59:40 +00:00
Fraser Cormack	5f5c5603ce	[SelectionDAG][VP] Add splitting support for VP_MERGE This patch adds splitting support for ISD::VP_MERGE, which splits identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118032	2022-01-25 10:33:23 +00:00
Victor Perez	2233befa5d	[LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter Split these nodes in a similar way as their masked versions. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D117760	2022-01-25 10:08:07 +00:00
Paweł Bylica	9d32847b33	[DAGCombine] Remove unused param in combineCarryDiamond(). NFC	2022-01-24 20:57:00 +01:00
Sander de Smalen	699e22a083	[ISEL] Move trivial step_vector folds to FoldConstantArithmetic. Given that step_vector is practically a constant, doing this early helps with DAGCombine folds that happen before type legalization. There is currently no way to test this happens earlier, although existing tests for step_vector folds continue to protect the folds happening at all. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117863	2022-01-24 16:37:21 +00:00
Craig Topper	a43ed49f5b	[DAGCombiner][RISCV] Canonicalize (bswap(bitreverse(x))->bitreverse(bswap(x)). If the bitreverse gets expanded, it will introduce a new bswap. By putting a bswap before the bitreverse, we can ensure it gets cancelled out when this happens. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118012	2022-01-24 08:31:53 -08:00
Craig Topper	b8c7cdcc81	[SelectionDAG][RISCV] Teach getNode to fold bswap(bswap(x))->x. This can show up when bitreverse is expanded to bswap and swap of bits within a byte. If the input is already a bswap, we should cancel them out before we further transform them in a way that makes it harder to see the redundancy. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118007	2022-01-24 08:17:46 -08:00
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit `a97e20a3a8`.	2022-01-24 09:26:52 -05:00
Bjorn Pettersson	46cacdbb21	[DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth In code review for D117104 two slightly weird checks were found in DAGCombiner::reduceLoadWidth. They were typically checking if BitsA was a multiple of BitsB by looking at (BitsA & (BitsB - 1)), but such a comparison actually only makes sense if BitsB is a power of two. The checks were related to the code that attempted to shrink a load based on the fact that the loaded value would be right shifted. Afaict the legality of the value types is checked later (typically in isLegalNarrowLdSt), so the existing checks were both overly conservative as well as being wrong whenever ExtVTBits wasn't a power of two. The latter was a situation triggered by a number of lit tests (so we could not just assert on ExtVTBits being a power of two). When attempting to simply remove the checks I found some problems that seem to have been guarded by the checks (maybe just out of luck). A typical example would be a pattern like this: t1 = load i96* ptr t2 = srl t1, 64 t3 = truncate t2 to i64 When DAGCombine is visiting the truncate, reduceLoadWidth is called attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then the SRL is detected and we set ShAmt to 64. In the past we've bailed out due to i96 not being a multiple of 64. If we simply remove that check then we would end up replacing the load with a new load that would read 64 bits but with a base pointer adjusted by 64 bits. So we would read 32 bits that weren't accessed by the original load. This patch will instead utilize the fact that the logical right shift can be folded away by using a zextload. Thus, the pattern above will now be combined into t3 = load i32* ptr+offset, zext to i64 Another case is shown in the X86/shift-folding.ll test case: t1 = load i32* ptr t2 = srl i32 t1, 8 t3 = truncate t2 to i16 In the past we bailed out due to the shift count (8) not being a multiple of 16. Now the narrowing kicks in and we get t3 = load i16* ptr+offset Differential Revision: https://reviews.llvm.org/D117406	2022-01-24 12:22:04 +01:00
Nikita Popov	e7c9a6cae0	[SDAG] Don't move DBG_VALUE instructions after insertion point during scheduling (PR53243) EmitSchedule() shouldn't be touching instructions after the provided insertion point. The change introduced in D83561 performs a scan to the end of the block, and thus may move unrelated instructions. In particular, this ends up moving instructions that have been produced by FastISel and will later be deleted. Moving them means that more instructions than intended are removed. Fix this by stopping the iteration when the insertion point is reached. Fixes https://github.com/llvm/llvm-project/issues/53243. Differential Revision: https://reviews.llvm.org/D117489	2022-01-24 10:50:49 +01:00
Sander de Smalen	4f8fdf7827	[ISEL] Canonicalise constant splats to RHS. SelectionDAG::getNode() canonicalises constants to the RHS if the operation is commutative, but it doesn't do so for constant splat vectors. Doing this early helps make certain folds on vector types, simplifying the code required for target DAGCombines that are enabled before Type legalization. Somewhat to my surprise, DAGCombine doesn't seem to traverse the DAG in a post-order DFS, so at the time of doing some custom fold where the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the constant splat to the RHS. This patch leads to a few improvements, but also a few minor regressions, which I traced down to D46492. When I tried reverting this change to see if the changes were still necessary, I ran into some segfaults. Not sure if there is some latent bug there. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117794	2022-01-24 09:38:36 +00:00
Simon Pilgrim	accc07e654	[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312) Fixes parity codegen issue: where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with an ext/trunc Differential Revision: https://reviews.llvm.org/D117983	2022-01-23 16:36:25 +00:00
Simon Pilgrim	6605057992	Revert rG7c66aaddb128dc0f342830c1efaeb7a278bfc48c "[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312)" Noticed a typo in the getBooleanContents call just after I pressed commit :(	2022-01-23 16:28:44 +00:00
Simon Pilgrim	7c66aaddb1	[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312) Fixes parity codegen issue: where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with an ext/trunc Differential Revision: https://reviews.llvm.org/D117983	2022-01-23 16:20:42 +00:00
David Green	b27e5459d5	[DAG] Convert truncstore(extend(x)) back to store(x) Pulled out of D106237, this folds truncstore(extend(x)) back to store(x) if the original store was legal. This can come up due to the order we fold nodes. A fold from X86 needs to be adjusted to prevent infinite loops, to have it pick the operand of a trunc more directly. Differential Revision: https://reviews.llvm.org/D117901	2022-01-22 13:20:36 +00:00
Craig Topper	9abc593e98	[TargetLowering][InstCombine] Simplify BSwap demanded bits code a little. NFC Use alignDown instead of &= ~7. Replace ResultBit with NLZ. (BitWidth - NLZ - NTZ == 8) so (BitWidth - NTZ - 8 == NLZ). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117804	2022-01-20 10:45:17 -08:00
Victor Perez	c10c748878	[LegalizeTypes][VP] Add widening support for vp.gather and vp.scatter Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117557	2022-01-20 08:57:57 +00:00
Simon Pilgrim	d6fee6c3b0	[DAG] SelectionDAG::computeKnownBits - add mul(x,x) self-multiply handling (PR48683) Pass the SelfMultiply flag to KnownBits::mul() - added at D108992 https://alive2.llvm.org/ce/z/NN_eaR	2022-01-19 17:39:32 +00:00
David Green	100763a88f	[DAG] Extend SearchForAndLoads with any_extend handling This extends the code in SearchForAndLoads to be able to look through ANY_EXTEND nodes, which can be created from mismatching IR types where the AND node we begin from only demands the low parts of the register. That turns zext and sext into any_extends as only the low bits are demanded. To be able to look through ANY_EXTEND nodes we need to handle mismatching types in a few places, potentially truncating the mask to the size of the final load. Recommitted with a more conservative check for the type of the extend. Differential Revision: https://reviews.llvm.org/D117457	2022-01-18 21:03:08 +00:00
Fraser Cormack	c8e33978fb	[VP] Propagate align parameter attr on VP gather/scatter to ISel This patch fixes a case where the 'align' parameter attribute on the pointer operands to llvm.vp.gather and llvm.vp.scatter was being dropped during the conversion to the SelectionDAG. The default alignment equal to the ABI type alignment of the vector type was kept. It also updates the documentation to reflect the fact that the parameter attribute is now properly supported. The default alignment of these intrinsics was previously documented as being equal to the ABI alignment of the scalar type, when in fact that wasn't the case: the ABI alignment of the vector type was used instead. This has also been fixed in this patch. Reviewed By: simoll, craig.topper Differential Revision: https://reviews.llvm.org/D114423	2022-01-18 17:33:24 +00:00
Sanjay Patel	870591200d	[SDAG] remove duplicate functionality when getting shift type for demanded bits; NFCI This was noted as a potential cleanup in D117508. getShiftAmountTy() has checks for vector, phase, etc. so it should handle anything that the caller was trying to account for.	2022-01-18 12:13:45 -05:00
Victor Perez	b7bf96a258	[LegalizeTypes][VP] Add widening support for vp.reduce.* When widening these intrinsics, we do not have to insert neutral elements at the end of the vector as when widening vector.reduce.* intrinsics, thanks to vector predication semantics. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117467	2022-01-18 10:21:01 +00:00
Hans Wennborg	f4615feaa1	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This caused builds to fail with llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:5638: bool (anonymous namespace)::DAGCombiner::BackwardsPropagateMask(llvm::SDNode *): Assertion `NewLoad && "Shouldn't be masking the load if it can't be narrowed"' failed. See the code review for a link to a reproducer. > This extends the code in SearchForAndLoads to be able to look through > ANY_EXTEND nodes, which can be created from mismatching IR types where > the AND node we begin from only demands the low parts of the register. > That turns zext and sext into any_extends as only the low bits are > demanded. To be able to look through ANY_EXTEND nodes we need to handle > mismatching types in a few places, potentially truncating the mask to > the size of the final load. > > Differential Revision: https://reviews.llvm.org/D117457 This reverts commit `578008789f`.	2022-01-18 10:50:55 +01:00
Victor Perez	fd1dce35bd	[LegalizeTypes][VP] Add splitting support for vp.reduction.* Split vp.reduction.* intrinsics by splitting the vector to reduce in two halves, perform the reduction operation in each one of them and accumulate the results of both operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117469	2022-01-18 09:29:24 +00:00
David Sherwood	f4515ab858	Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants" This reverts commit `197f3c0deb`. Reverting after miscompilation errors discovered with ffmpeg.	2022-01-18 08:40:20 +00:00
Sanjay Patel	ba6485e25f	[SDAG] add demanded bits transform for bswap A possible codegen regression for PowerPC is noted in D117406 because we don't recognize a pattern that demands only 1 byte from a bswap. This fold has existed in IR since close to the beginning of LLVM: https://github.com/llvm/llvm-project/blame/main/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L794 ...so this patch copies that code as much as possible and adapts it for SDAG. The test for PowerPC that would change in D117406 is over-reduced with undefs, so I recreated it for AArch64 and x86 by passing in pointer args and renamed the values to make the logic clearer. Differential Revision: https://reviews.llvm.org/D117508	2022-01-17 18:25:42 -05:00
David Green	578008789f	[DAG] Extend SearchForAndLoads with any_extend handling This extends the code in SearchForAndLoads to be able to look through ANY_EXTEND nodes, which can be created from mismatching IR types where the AND node we begin from only demands the low parts of the register. That turns zext and sext into any_extends as only the low bits are demanded. To be able to look through ANY_EXTEND nodes we need to handle mismatching types in a few places, potentially truncating the mask to the size of the final load. Differential Revision: https://reviews.llvm.org/D117457	2022-01-17 15:25:11 +00:00
David Sherwood	197f3c0deb	[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants When we know the value we're extending is a negative constant then it makes sense to use SIGN_EXTEND because this may improve code quality in some cases, particularly when doing a constant splat of an unpacked vector type. For example, for SVE when splatting the value -1 into all elements of a vector of type <vscale x 2 x i32> the element type will get promoted from i32 -> i64. In this case we want the splat value to sign-extend from (i32 -1) -> (i64 -1), whereas currently it zero-extends from (i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use a single mov immediate instruction. New tests added here: CodeGen/AArch64/sve-vector-splat.ll I believe we see some code quality improvements in these existing tests too: CodeGen/AArch64/reduce-and.ll CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only occur because the test disables codegen prepare and branch folding. Differential Revision: https://reviews.llvm.org/D114357	2022-01-17 11:08:57 +00:00
Bjorn Pettersson	9f237c9e7d	[DAGCombine] Refactor DAGCombiner::ReduceLoadWidth. NFCI Update code comments in DAGCombiner::ReduceLoadWidth and refactor the handling of SRL a bit. The refactoring is done with the intent of adding support for folding away SRA by using SEXTLOAD in a follow-up patch. The function is also renamed as DAGCombiner::reduceLoadWidth. Differential Revision: https://reviews.llvm.org/D117104	2022-01-16 20:24:52 +01:00
Fangrui Song	5456249736	[SelectionDAG] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D117235	2022-01-15 17:13:09 -08:00
Nikita Popov	c63a3175c2	[AttrBuilder] Remove ctor accepting AttributeList and Index Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should extract the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.	2022-01-15 22:39:31 +01:00
Fraser Cormack	877d1b3d07	[SelectionDAG][VP] Add splitting/widening for VP_LOAD and VP_STORE Original patch by @hussainjk. This patch was split off from D109377 to keep vector legalization (widening/splitting) separate from vector element legalization (promoting). While the original patch added a third overload of SelectionDAG::getVPStore, this patch takes the liberty of collapsing those all down to 1, as three overloads seems excessive for a little-used node. The original patch also used ModifyToType in places, but that method still crashes on scalable vector types. Seeing as the other VP legalization methods only work when all operands need identical widening, this patch follows in that vein. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117235	2022-01-15 11:41:29 +00:00
Craig Topper	e0841f6920	[SelectionDAGBuilder] Remove unneeded vector bitcast from visitTargetIntrinsic. This seems to be a leftover from a long time ago when there was an ISD::VBIT_CONVERT and a MVT::Vector. It looks like in those days the vector type was carried in a VTSDNode. As far as I know, these days ComputeValueTypes would have already assigned "Result" the same type we're getting from TLI.getValueType here. Thus the BITCAST is always a NOP. Verified by adding an assert and running check-llvm. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D117335	2022-01-14 12:52:49 -08:00
James Y Knight	a97e20a3a8	Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This commit sometimes causes a crash when compiling a vtable thunk. E.g.: clang '--target=aarch64-grtev4-linux-gnu' -xc++ - -c -o /dev/null <<EOF struct a { virtual int f(); }; struct c { virtual int &g() const; }; struct d : a, c { int &g() const; }; int &d::g() const {} EOF Some follow-up commits have been reverted as well: Revert "IR: Make getRetAlign check callee function attributes" Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC." Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC." This reverts commit `4f414af6a7`. This reverts commit `a5507d2e25`. This reverts commit `3d2d208f6a`. This reverts commit `07ddfa95e3`.	2022-01-14 04:50:07 +00:00
David Sherwood	ba471ba8d2	Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants" This reverts commit `31009f0b5a`. It seems to be causing SVE VLA buildbot failures and has introduced a genuine regression. Reverting for now.	2022-01-13 15:59:43 +00:00
David Sherwood	31009f0b5a	[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants When we know the value we're extending is a negative constant then it makes sense to use SIGN_EXTEND because this may improve code quality in some cases, particularly when doing a constant splat of an unpacked vector type. For example, for SVE when splatting the value -1 into all elements of a vector of type <vscale x 2 x i32> the element type will get promoted from i32 -> i64. In this case we want the splat value to sign-extend from (i32 -1) -> (i64 -1), whereas currently it zero-extends from (i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use a single mov immediate instruction. New tests added here: CodeGen/AArch64/sve-vector-splat.ll I believe we see some code quality improvements in these existing tests too: CodeGen/AArch64/dag-numsignbits.ll CodeGen/AArch64/reduce-and.ll CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only occur because the test disables codegen prepare and branch folding. Differential Revision: https://reviews.llvm.org/D114357	2022-01-13 09:43:07 +00:00
Matt Arsenault	07ddfa95e3	GlobalISel: Add G_ASSERT_ALIGN hint instruction Insert it for call return values only for now, which is the only case the DAG handles also.	2022-01-12 18:20:58 -05:00
Craig Topper	63b17eb9ec	[RISCV] Add strictfp support for compares. This adds support for STRICT_FSETCC(quiet) and STRICT_FSETCCS(signaling). FEQ matches well to STRICT_FSETCC oeq. FLT/FLE matches well to STRICT_FSETCCS olt/ole. Others require commuting operands or multiple instructions. STRICT_FSETCC olt/ole/ogt/oge/ult/ule/ugt/uge uses FLT/FLE, but we need to save/restore FFLAGS around them to avoid spurious exceptions. I've implemented pseudo instructions with a CustomInserter to insert the save/restore CSR instructions. Unfortunately, this doesn't honor exceptions for signaling NANs but I'm not sure if signaling nans are really supported by the constrained intrinsics. STRICT_FSETCC one and ueq expand to a pair of FLT instructions with a save/restore of fflags around each. This could be improved in the future. There may be some opportunities to generate better code for strict comparisons mixed with nonans fast math flags. I've left FIXMEs in the .td files for that. Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com> Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D116694	2022-01-11 20:01:41 -08:00
Nick Desaulniers	4edb9983cb	[SelectionDAG] treat X constrained labels as i for asm Completely rework how we handle X constrained labels for inline asm. X should really be treated as i. Then existing tests can be moved to use i D115410 and clang can just emit i D115311. (D115410 and D115311 are callbr, but this can be done for label inputs, too). Coincidentally, this simplification solves an ICE uncovered by D87279 based on assumptions made during D69868. This is the third approach considered. See also discussions v1 (D114895) and v2 (D115409). Reported-by: kernel test robot <lkp@intel.com> Fixes: https://github.com/ClangBuiltLinux/linux/issues/1512 Reviewed By: void, jyknight Differential Revision: https://reviews.llvm.org/D115688	2022-01-11 10:29:40 -08:00
David Sherwood	51497dc0b2	[IR] Change vector.splice intrinsic to reject out-of-bounds indices I've changed the definition of the experimental.vector.splice intrinsic to reject indices that are known to be, or are possibly, out-of-bounds. In practice, this means changing the definition so that the index is now only valid in the range [-VL, VL-1] where VL is the known minimum vector length. We use the vscale_range attribute to take the minimum vscale value into account so that we can permit more indices when the attribute is present. The splice intrinsic is currently only ever generated by the vectoriser, which will never attempt to splice vectors with out-of-bounds values. Changing the definition also makes things simpler for codegen since we can always assume that the index is valid. This patch was created in response to review comments on D115863 Differential Revision: https://reviews.llvm.org/D115933	2022-01-11 09:37:39 +00:00
Nick Desaulniers	649b11ef8b	git-clang-format HEAD~	2022-01-10 18:34:30 -08:00
Nick Desaulniers	301e911740	[TargetLowering] precommit refactor from D115688 NFC Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>	2022-01-10 18:32:13 -08:00
Nadav Rotem	e2cc091a7d	Fix a missed opportunity to merge stores. This commit fixes a missed opportunity in merging consecutive stores. The code that searches for stores skipped the case of stores that directly connect to the root. The comment above the implementation lists this case but the code did not handle it. I found this pattern when looking into the shared_ptr destructor. GCC generates the right sequence. Here is a small repro: int foo(int* buff) { buff[0] = 0; int x = buff[1]; buff[1] = 0; return x; } Differential Revision: https://reviews.llvm.org/D116895	2022-01-10 13:49:02 -08:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using a std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies an extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Chen Zheng	2c46ca96e2	[PowerPC] fast isel can lower intrinsics call on AIX. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D114778	2022-01-10 02:30:05 +00:00
Craig Topper	a500f7f48f	[SelectionDAG] Add FP_TO_UINT_SAT/FP_TO_SINT_SAT to computeKnownBits/computeNumSignBits. These nodes should saturate to their saturating VT. We can use this information to know the bits past the VT are all zeros or all sign bits. I think we might only have test coverage for the unsigned case. I'll verify and add tests. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D116870	2022-01-09 17:48:05 -08:00
Nikita Popov	0312fe2901	[CodeGen] Support opaque pointers for inline asm This is the last part of D116531. Fetch the type of the indirect inline asm operand from the elementtype attribute, rather than the pointer element type. Fixes https://github.com/llvm/llvm-project/issues/52928.	2022-01-07 10:57:38 +01:00
Nikita Popov	e4d1779990	[IR] Add ConstraintInfo::hasArg() helper (NFC) Checking whether a constraint corresponds to an argument is a recurring pattern.	2022-01-07 10:44:38 +01:00
Victor Perez	38efa68b08	[LegalizeTypes][VP] Add splitting support for vp.select Split vp.select in a similar way as vselect, splitting also the length parameter. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116651	2022-01-07 08:46:01 +00:00
Kazu Hirata	2aed08131d	[llvm] Use true/false instead of 1/0 (NFC) Identified with modernize-use-bool-literals.	2022-01-07 00:39:14 -08:00
Craig Topper	88ecdd30f6	[LegalizeTypes] Remove IsVP argument from type legalization methods. NFC We can either check the opcode or number of operands or use ISD::isVPOpcode inside the methods. In some places I've used number of operands figuring that it is cheaper than isVPOpcode. I've included isVPOpcode in an assert to verify. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D116578	2022-01-05 09:00:48 -08:00
Victor Perez	96e220e688	[LegalizeTypes][VP] Add integer promotion support for vp.select Promote select, vselect and vp.select in a similar way. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116400	2022-01-05 11:01:52 +00:00
Victor Perez	df5226dfb3	[LegalizeTypes][VP] Add widening support for vp.select Widen vp.select the same way as select and vselect. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116407	2022-01-05 09:21:11 +00:00
Craig Topper	a04b532505	[LegalizeIntegerTypes][RISCV] Teach PromoteSetCCOperands to check sign bits of unsigned compares. Unsigned compares work with either zero extended or sign extended inputs just like equality comparisons. I didn't allow this when I refactored the code in D116421 due to lack of tests. But I've since found a simple C test case that demonstrates when this can be useful. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D116617	2022-01-04 12:38:47 -08:00
Simon Moll	4c2aba999e	[VP][ISel] use LEGALPOS for legalization action Use the VPIntrinsics.def's LEGALPOS that is specified with every VP SDNode to determine which return or operand value type shall be used to infer the legalization action. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D116594	2022-01-04 14:50:49 +01:00
Simon Pilgrim	882c083889	[DAG] TargetLowering::SimplifySetCC - use APInt::getMinSignedBits() helper. NFC.	2022-01-04 13:48:36 +00:00
Craig Topper	cbcbbd6ac8	[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits. Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up. Rename KnownBits::countMaxSignedBits->countMaxSignificantBits. Reviewed By: lebedev.ri, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D116522	2022-01-03 11:33:30 -08:00
Victor Perez	5527139302	[RISCV][VP] Add RVV codegen for [nX]vXi1 vp.select Expand [nX]vXi1 vp.select the same way as [nX]vXi1 vselect. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115546	2022-01-02 23:12:32 -08:00
Kazu Hirata	69ccc96162	[llvm] Use the default constructor for SDValue (NFC)	2022-01-01 10:36:59 -08:00
Craig Topper	243b7aaf51	[SelectionDAG] Use KnownBits::countMinSignBits() to simplify the end of ComputeNumSignBits. This matches what is done in ValueTracking.cpp Reviewed By: RKSimon, foad Differential Revision: https://reviews.llvm.org/D116423	2021-12-31 17:29:57 -08:00
Craig Topper	d00e438cfe	[RISCV][LegalizeIntegerTypes] Teach PromoteSetCCOperands not to sext i32 comparisons for RV64 if the promoted values are already zero extended. This is similar to what is done for targets that prefer zero extend where we avoid using a zero extend if the promoted values are sign extended. We'll also check for zero extended operands for ugt, ult, uge, and ule when the target prefers sign extend. This is different than preferring zero extend, where we only check for sign bits on equality comparisons. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D116421	2021-12-31 17:15:20 -08:00
Craig Topper	7d659c6ac7	[LegalizeIntegerTypes] Rename NewLHS/NewRHS arguments to DAGTypeLegalizer::PromoteSetCCOperands. NFC The 'New' only makes sense in the context of these being output arguments, but they are also used as inputs first. Drop the 'New' and just call them LHS/RHS. Factored out of D116421.	2021-12-30 15:31:43 -08:00
Craig Topper	15787ccd45	[RISCV] Add support for STRICT_LRINT/LLRINT/LROUND/LLROUND. Tests for other strict intrinsics. This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND. It also adds test cases for f32 and f64 constrained intrinsics that correspond to the intrinsics in float-intrinsics.ll and double-intrinsics.ll. Support for promoting the integer argument of STRICT_FPOWI was added. I've skipped adding tests for f16 intrinsics, since we don't have libcalls for them and we have inconsistent support for promoting them in LegalizeDAG. This will need to be examined more closely. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D116323	2021-12-30 11:54:32 -08:00
Craig Topper	1c6b740d4b	[TargetLowering] Remove workaround for old behavior of getShiftAmountTy. NFC getShiftAmountTy used to directly return the shift amount type from the target which could be too small for large illegal types. For example, X86 always returns i8. The code here detected this and used i32 instead if it won't fit. This behavior was added to getShiftAmountTy in D112469 so we no longer need this workaround.	2021-12-28 14:08:25 -08:00
Simon Pilgrim	71fc4bbdd2	[X86][SSE] Add ISD::ROTR support Fix issue in TargetLowering::expandROT where we only attempt to flip a rotation if the other direction has better support - this matches TargetLowering::expandFunnelShift This allows us to enable ISD::ROTR lowering on SSE targets, which particularly simplifies/improves codegen for splat amount and AVX2 per-element shifts.	2021-12-23 15:07:30 +00:00
Shivam Gupta	0489e89119	[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience When the source has a series of assignments, users reasonably want to have the debugger step through each one individually. Turn off the combine for adjacent stores so we get this behavior at -O0. Similar to D7181. Reviewed By: spatel, xgupta Differential Revision: https://reviews.llvm.org/D115808	2021-12-23 10:48:28 +05:30
Simon Pilgrim	4639461531	[DAG][X86] Add TargetLowering::isSplatValueForTargetNode override Add callback to enable us to test target nodes if they are splat vectors Added some basic X86ISD::VBROADCAST + X86ISD::VBROADCAST_LOAD handling	2021-12-22 16:57:44 +00:00
Simon Pilgrim	592e89e636	[DAG] Constify SelectionDAG::isSplatValue() This doesn't generate any nodes so should be usable by methods with const SelectionDAG &.	2021-12-21 11:19:23 +00:00


12514 Commits