llvm-project

Commit Graph

Author	SHA1	Message	Date
Paul Walker	7dd05ba9ed	[SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes. During early gather/scatter enablement two different approaches were taken to represent scaled indices: * A Scale operand whereby byte_offsets = Index * Scale * An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType) Having multiple representations is bad as shown by this patch which fixes instances where the two are out of sync. The dedicated scale operand is more flexible and pervasive so this patch removes the UNSCALED values from IndexType. This means all indices are scaled but the scale can be one, hence unscaled. SDNodes now use the scale operand to answer the "isScaledIndex" question. I toyed with the idea of keeping the UNSCALED enums and helper functions but because they will have no uses and force SDNodes to validate the set of supported values I figured it's best to remove them. We can re-add them if there's a real need. For similar reasons I've kept the IndexType enum when a bool could be used as I think being explicitly looks better. Depends On D123347 Differential Revision: https://reviews.llvm.org/D123381	2022-05-16 20:47:52 +01:00
Craig Topper	1c4880a2d3	[TargetLowering] Expand the last stage of i16 popcnt using shift+add+and instead of mul+shift. If we use multiply it would be with 0x0101 which is 1 more than a power of 2. On some targets we would expand this to shl+add. By avoiding the multiply earlier, we can generate better code. Note, PowerPC doesn't do the shl+add expansion of multiply so one of the tests increased in instruction count. Limiting to scalars because it almost always increased the number of instructions in vector tests. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125638	2022-05-16 09:27:44 -07:00
Craig Topper	e6fc8454be	[DAGCombiner] Fix incorrect indentation. NFC	2022-05-16 09:27:15 -07:00
Bradley Smith	7ff5148d64	[DAGCombine] Support splat_vector nodes in (and (extload)) dagcombine Differential Revision: https://reviews.llvm.org/D125367	2022-05-16 11:25:20 +00:00
Yeting Kuo	26a61ab678	[SelectionDAG] Make getNode which uses single element SDVTList pass SDNodeFlags. The patch make users not need to know getNode with SDNodeFlags argument may not pass its flags. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125659	2022-05-16 18:19:46 +08:00
Denis Antrushin	8903dbef8f	[StatepointLowering] Properly handle local and non-local relocates of the same value. FunctionLoweringInfo::StatepointRelocationMaps map is used to pass GC pointer lowering information from statepoint to gc.relocate which may appear ini different block. D124444 introduced different lowering for local and non-local relocates. Local relocates use SDValue and non-local relocates use value exported to VReg. But I overlooked the fact that StatepointRelocationMap is indexed not by GCRelocate instruction, but by derived pointer. This works incorrectly when we have two relocates (one local and another non-local) of the same value, because they need different relocation records. This patch fixes the problem by recording relocation information per relocate instruction, not per derived pointer. This way, each gc.relocate can be lowered differently. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D125538	2022-05-16 17:02:34 +07:00
Nikita Popov	05c3fe075d	[FastISel] Fix load folding for registers with fixups FastISel tries to fold loads into the single using instruction. However, if the register has fixups, then there may be additional uses through an alias of the register. In particular, this fixes the problem reported at https://reviews.llvm.org/D119432#3507087. The load register is (at the time of load folding) only used in a single call instruction. However, selection of the bitcast has added a fixup between the load register and the cross-BB register of the bitcast result. After fixups are applied, there would now be two uses of the load register, so load folding is not legal. Differential Revision: https://reviews.llvm.org/D125459	2022-05-16 10:25:25 +02:00
Craig Topper	b4ad450953	[TargetLowering] expandCTPOP don't create an used constant mask for i8 ctpop. NFC Use early out for the i8 case. I'm looking at avoiding MUL on targets that use libcalls for MUL. So doing a little pre-refactoring.	2022-05-14 20:35:38 -07:00
Simon Pilgrim	f4eac6e5f6	[DAG] visitOR - merge isa/cast<ShuffleVectorSDNode> into dyn_cast<ShuffleVectorSDNode>. NFC. Also, initialize entire mask to -1 to simplify undefined cases.	2022-05-14 20:49:26 +01:00
Simon Pilgrim	95cdd63b87	[DAG] visitADDLike - use SelectionDAG::FoldConstantArithmetic directly to match constant operands SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.	2022-05-14 18:39:41 +01:00
Simon Pilgrim	8db72d9d04	[DAG] visitMUL - pull out repeated SDLoc() calls. NFC.	2022-05-14 14:28:39 +01:00
Simon Pilgrim	8d4d4988e4	[DAG] Use SelectionDAG::FoldConstantArithmetic directly to match constant operands SelectionDAG::FoldConstantArithmetic determines if operands are foldable constants, so we don't need to bother with isConstantOrConstantVector / Opaque tests before calling it directly.	2022-05-14 14:19:12 +01:00
Simon Pilgrim	1ecc3d86ae	[DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits Pulled out of D77804 as its going to be easier to address the regressions individually. This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. The lost RISCV gorc2 fold shouldn't be a problem - instcombine would have already destroyed that pattern - see https://github.com/llvm/llvm-project/issues/50553 Differential Revision: https://reviews.llvm.org/D124839	2022-05-14 09:50:01 +01:00
Simon Pilgrim	3fc33ced10	DAGCombiner.cpp - break if-else chains that always return (style)	2022-05-13 18:31:39 +01:00
Sanjay Patel	e52e1dab2a	[SDAG] freeze operand when expanging urem This is a potential miscompile as discussed in issue #55291. The related IR transform was patched with: `d428f09b2c`	2022-05-13 10:55:14 -04:00
Lian Wang	693758b282	[LegalizeTypes][VP] Add integer promotion support for vp.setcc Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125453	2022-05-13 06:25:13 +00:00
Lian Wang	8050ba6678	[LegalizeTypes][VP] Add integer promotion support for vp.merge Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125452	2022-05-13 03:28:29 +00:00
Nikita Popov	50f846d634	[FastISel] Add some debug output (NFC) Print a debug message when aborting isel (next to the ORE report) and when folding a load.	2022-05-12 12:25:20 +02:00
Lian Wang	9176096c86	[LegalizeVectorTypes] Enable WidenVecRes_SETCC work for scalable vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125359	2022-05-12 02:52:43 +00:00
Xiang1 Zhang	2ea8f203cd	[CodeGen] Fix ConvertNodeToLibcall for STRICT_FPOWI Reviewed By: PengfeiWang Differential Revision: https://reviews.llvm.org/D125159	2022-05-11 08:58:06 +08:00
Lian Wang	f14a1f26ad	Revert "[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation" This patch make CodeGen/test/AArch64/vecreduce-add-legalization.ll fail. This reverts commit `17a8a1bb71`.	2022-05-10 09:25:25 +00:00
Lian Wang	17a8a1bb71	[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125206	2022-05-10 08:52:48 +00:00
David Green	2cfb243bcd	[DAG] Use isAnyConstantBuildVector. NFC As suggested from `02f8519502`, this uses the isAnyConstantBuildVector method in lieu of separate isBuildVectorOfConstantSDNodes calls. It should otherwise be an NFC.	2022-05-09 14:13:03 +01:00
David Green	02f8519502	[DAG] Prevent infinite loop combining bitcast shuffle This prevents an infinite loop from D123801, where code trying to reduce the total number of bitcasts, but also handling constants, could create the opposite transform. Prevent the transform in these case to let the bitcast of a constant transform naturally. Fixes #55345	2022-05-09 09:36:22 +01:00
Simon Pilgrim	800d36cf32	[DAG] Only perform the fold (A-B)+(C-D) --> (A+C)-(B+D) when both inner subs have one use Fixes #51381	2022-05-08 13:51:58 +01:00
Craig Topper	b81bf7bb2f	[LegalizeTypes] Make use of SelectionDAG::getShiftAmountConstant. NFC Instead of calling getShiftAmountTy and getConstant separately.	2022-05-07 12:16:53 -07:00
Craig Topper	00bfaba997	[LegalizeTypes] Don't assume fshl/fshr shift amount type matches the other operands. Like other shifts, the type isn't required to match. We shouldn't assume we can call ZExtPromotedInteger. I tested the PromoteIntOp_FunnelShift locally by removing the promotion of the shift amount from PromoteIntRes_FunnelShift. But with the final version of this patch it is never executed on any tests. Differential Revision: https://reviews.llvm.org/D125106	2022-05-07 11:44:07 -07:00
Amaury Séchet	06fad8bc05	[DAGCombine] Add node in the worklist in topological order in CombineTo This is part of an ongoing effort toward making DAGCombine process the nodes in topological order. This is able to discover a couple of new optimizations, but also causes a couple of regression. I nevertheless chose to submit this patch for review as to start the discussion with people working on the backend so we can find a good way forward. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124743	2022-05-07 16:24:31 +00:00
Paul Walker	702c4ade22	[ISD::IndexType] Helper functions for common queries. Add helper functions to query the signed and scaled properties of ISD::IndexType along with functions to change them. Remove setIndexType from MaskedGatherSDNode because it only has one usage and typically should only be changed alongside its index operand. Minimise the direct use of the enum values to lay the groundwork for more refactoring. Differential Revision: https://reviews.llvm.org/D123347	2022-05-07 11:23:42 +01:00
David Green	5930691ee1	Revert "[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific" This reverts commit `891c3cf99e` as it turns out that the error was not caused by this commit, the error caming from D124526 instead.	2022-05-06 21:03:22 +01:00
David Green	891c3cf99e	[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific Something is going wrong with the BigEndian PowerPC bot. It is hard to tell what is wrong from here, but attempt to fix it by disabling the combineShuffleOfBitcast combine for bigendian.	2022-05-06 18:42:44 +01:00
Craig Topper	76f90a9d71	[SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift. Otherwise we have garbage in the upper bits that can affect the results of the UREM. Fixes PR55296. Differential Revision: https://reviews.llvm.org/D125076	2022-05-06 09:26:30 -07:00
Simon Pilgrim	c0bebc12f0	[DAG] visitREM - merge buildOptimizedSREM into if(). NFCI.	2022-05-06 15:39:17 +01:00
David Green	115c188807	[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')) If the mask is made up of elements that form a mask in the higher type we can convert shuffle(bitcast into the bitcast type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests. The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general. Differential Revision: https://reviews.llvm.org/D123801	2022-05-06 10:50:31 +01:00
Lian Wang	fb0d636f28	[RISCV][SelectionDAG] Support VP_REDUCE_ADD mask operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124986	2022-05-06 01:49:21 +00:00
Craig Topper	5140e0d219	[SelectionDAGISel] Add back a comment to MergeInputChains handling. NFC This comment used to exist, but was lost in a refactor over 10 years ago, but still seems relevant and improves readability.	2022-05-05 12:59:21 -07:00
Craig Topper	084f967370	[SelectionDAG] Constant fold (sext_inreg undef, VT) to 0 instead of undef. The result of sign_extend_inreg needs to have as many sign bits as requested by the VT argument. The easiest way to guarantee this is to fold it to 0. SystemZ test was modified to avoid using undef. Fixes https://github.com/llvm/llvm-project/issues/55178 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124696	2022-05-05 09:45:35 -07:00
Craig Topper	4e2d1a6c18	[DAGCombiner] Fold (sext/zext undef) -> 0 and aext(undef) -> undef. Differential Revision: https://reviews.llvm.org/D124988	2022-05-05 09:34:18 -07:00
Craig Topper	fd13192aa5	[DAGCombiner] Fold (max/min X, X) -> X. Differential Revision: https://reviews.llvm.org/D124951	2022-05-05 09:34:17 -07:00
Nikita Popov	9678936f18	[DAGCombine] Fold (X & ~Y) \| Y with truncated not This extends the (X & ~Y) \| Y to X \| Y fold to also work if ~Y is a truncated not (when taking into account the mask X). This is done by exporting the infrastructure added in D124856 and reusing it here. I've retained the old value of AllowUndefs=false, though probably this can be switched to true with extra test coverage. Differential Revision: https://reviews.llvm.org/D124930	2022-05-05 11:10:11 +02:00
Craig Topper	572dfef1db	[SelectionDAG] Use llvm::any_of to simplify a loop. NFC	2022-05-04 19:09:06 -07:00
Nikita Popov	451bc723ae	[SDAG] Handle truncated not in haveNoCommonBitsSet() Demanded bits analysis may replace a full-width not with a any_extend (not (truncate X)) pattern. This patch looks through this kind of pattern in haveNoCommonBitsSet(). Of course, we can only do this if we only need negated bits in the non-extended part, as the other bits may now be arbitrary. For example, if we have haveNoCommonBitsSet(~X & Y, X) then ~X only needs to actually negate bits set in Y. This is only a partial solution to the problem in that it allows add -> or conversion, but the resulting or doesn't get folded yet. (I guess that will involve exposing getBitwiseNotOperand() as a more general helper and using that in the relevant transform.) Differential Revision: https://reviews.llvm.org/D124856	2022-05-04 15:30:44 +02:00
serge-sans-paille	7030654296	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `fa5a4e1b95` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D124847	2022-05-04 08:32:38 +02:00
Simon Pilgrim	faa35fc873	[DAG] Fix issue with rot(rot(x,c1),c2) -> rot(x,c1+c2) fold with unnormalized rotation amounts Don't assume the rotation amounts have been correctly normalized - do it as part of the constant folding. Also, the normalization should be performed with UREM not SREM.	2022-05-03 17:16:26 +01:00
Nikita Popov	2171a896ed	[SDAG] Handle A and B&~A in haveNoCommonBitsSet() This is the DAG variant of D124763. The code already handles the general pattern, but not this degenerate case. This allows folding A + (B&~A) to A \| (B&~A) which further holds to A \| B. Handling on the SDAG level is needed because in the motivating case the add is actually a getelementptr, which only gets converted into an add on the SDAG level. However, this patch is not quite sufficient to handle the getelementptr case yet, because of an interfering demanded bits simplification. Differential Revision: https://reviews.llvm.org/D124772	2022-05-03 15:47:02 +02:00
Nikita Popov	e0892614b1	[SDAG] Extract commutative helper from haveNoCommonBitsSet() (NFC) To make it easier to add additional patterns, which will generally want to handle commuted top-level operands.	2022-05-03 12:28:35 +02:00
Hsiangkai Wang	eaaa31ff2c	[RISCV][TargetLowering] Special case overflow expansion for (uaddo X, C). Follow-up to D122933. Differential Revision: https://reviews.llvm.org/D124374	2022-05-03 03:51:36 +00:00
Craig Topper	5f057eaa0d	[DAGCombiner] reassociationCanBreakAddressingModePattern should check uses of the outer add. When looking for memory uses, reassociationCanBreakAddressingModePattern should check uses of the outer ADD rather than the inner ADD. We want to know if the two ops we're reassociating are used by a load/store. In practice, the existing check usually works because CodeGenPrepare will make one of the load/stores have an offset of 0 relative to split GEP. That will make the inner add have a memory use. To test this, I've manually split the GEPs so there is no 0 offset store. This issue was recently discussed in the original review D60294. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D124644	2022-05-02 16:38:53 -07:00
Sanjay Patel	747c6a0c73	[SDAG] fix miscompile when casting int->FP->int This is the codegen equivalent of D124692. As shown in https://github.com/llvm/llvm-project/issues/55150 - the existing fold may be wrong when converting to a signed value. This is a quick fix to avoid the miscompile. https://alive2.llvm.org/ce/z/KtaDmd Differential Revision: https://reviews.llvm.org/D124771	2022-05-02 14:57:27 -04:00
Simon Pilgrim	ae8b10e543	[DAG] (style) Break apart if-else chain as they all return	2022-05-01 17:56:59 +01:00
Paul Walker	f10a8f6752	[LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG SIGN_EXTEND_INREG expansion can trigger a TypeSize error because "VT.getSizeInBits() == 1" is used to detect for a boolean without first verifying VT is a scalar.	2022-04-30 19:21:48 +01:00
Craig Topper	6affe87bda	[DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask. We try to match as a disguised rotate by constant of these forms (shl (X \| Y), C1) \| (srl X, C2) --> (rotl X, C1) \| (shl Y, C1) (shl X, C1) \| (srl (X \| Y), C2) --> (rotl X, C1) \| (srl Y, C2) We may have also looked through an AND to find the shift. If we did, we need to apply a mask to the result. I'll add an AArch64 test and pre-commit it and the RISC-V test tomorrow. Fixes PR55201. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124711	2022-04-30 11:02:30 -07:00
Paul Walker	23c509754d	[DAGCombiner] Stop invalid sign conversion in refineIndexType. When looking through extends of gather/scatter indices it's safe to convert a known positive signed index to unsigned, but unsigned indices must remain unsigned. Depends On D123318 Differential Revision: https://reviews.llvm.org/D123326	2022-04-29 14:20:13 +01:00
Nikita Popov	027c728f29	[SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize This is an alternative to D124530. In getUniformBase() only create scales that match the gather/scatter element size. If targets also support other scales, then they can produce those scales in target DAG combines. This is what X86 already does (as long as the resulting scale would be 1, 2, 4 or 8). This essentially restores the pre-opaque-pointer state of things. Fixes https://github.com/llvm/llvm-project/issues/55021. Differential Revision: https://reviews.llvm.org/D124605	2022-04-29 14:57:53 +02:00
Paul Walker	7a0b897e86	[DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling refineUniformBase and selectGatherScatterAddrMode both attempt the transformation: base(0) + index(A+splat(B)) => base(B) + index(A) However, this is only safe when index is not implicitly scaled. Differential Revision: https://reviews.llvm.org/D123222	2022-04-29 12:35:16 +01:00
Serge Pavlov	9fc58f1820	[PowerPC] Support of ppc_fp128 in lowering of llvm.is_fpclass PowerPC supports `ppc_fp128`, which is not an IEEE floating point type. The generic lowering of llvm.is_fpclass could not handle it properly. This change extends the generic lowering code to support `ppc_fp128`. The change was tested on emulator using runtime tests from https://reviews.llvm.org/D112933 and the patch for clang https://reviews.llvm.org/D112932. Differential Revision: https://reviews.llvm.org/D113908	2022-04-29 11:10:47 +07:00
Alexey Bataev	75e1cf4a6a	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-28 10:04:41 -07:00
Bjorn Pettersson	3a39bb96ca	[SelectionDAG] Use correct boolean representation in FoldConstantArithmetic The description of SETCC says /// SetCC operator - This evaluates to a true value iff the condition is /// true. If the result value type is not i1 then the high bits conform /// to getBooleanContents. Without this patch, we sign extended the i1 to the used larger type regardless of getBooleanContents. This resulted in miscompiles, as shown in the attached testcase that ended up returning -1 instead of 1 when using -mattr=+v. Fixes https://github.com/llvm/llvm-project/issues/55168 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124618	2022-04-28 18:42:16 +02:00
Alexey Bataev	9861ca0c23	Revert "[COST]Improve cost model for shuffles in SLP." This reverts commit `29a470e380` to fix a crash reported in https://reviews.llvm.org/D100486#3479989.	2022-04-28 08:11:56 -07:00
Alexey Bataev	29a470e380	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-27 10:56:26 -07:00
Denis Antrushin	4059770af5	[StatepointLowering] Only export STATEPOINT results if used in nonlocal blocks. Cuurently we always export STATEPOINT results (GC pointers lowered via VRegs) to virtual registers. When processing gc.relocate instructions we have to generate CopyFromRegs node and then export it to VReg again if gc.relocate is used in other basic blocks. This results in generation of extra COPY MIR instruction if statepoint and its gc.relocate are in the same BB, but gc.relocate result is used in other blocks. This patch changes this behavior to export statepoint results only if used in other basic blocks. For local uses StatepointLoweringState.(get\|set)Location() API is used to communicate appropriate statepoint result from `LowerStatepoint()` to `visitGCRelocate()` This is NFC and is purely compile time optimization. On big methids it can improve codegen compile time up to 10%. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D124444	2022-04-27 15:53:24 +03:00
Serge Pavlov	170a903144	Intrinsic for checking floating point class This change introduces a new intrinsic, `llvm.is.fpclass`, which checks if the provided floating-point number belongs to any of the the specified value classes. The intrinsic implements the checks made by C standard library functions `isnan`, `isinf`, `isfinite`, `isnormal`, `issubnormal`, `issignaling` and corresponding IEEE-754 operations. The primary motivation for this intrinsic is the support of strict FP mode. In this mode using compare instructions or other FP operations is not possible, because if the value is a signaling NaN, floating-point exception `Invalid` is raised, but the aforementioned functions must never raise exceptions. Currently there are two solutions for this problem, both are implemented partially. One of them is using integer operations to implement the check. It was implemented in https://reviews.llvm.org/D95948 for `isnan`. It solves the problem of exceptions, but offers one solution for all targets, although some can do the check in more efficient way. The other, implemented in https://reviews.llvm.org/D96568, introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects a target specific code into IR to implement `isnan` and some other functions. It is convenient for targets that have dedicated instruction to determine FP data class. However using target-specific intrinsic complicates analysis and can prevent some optimizations. A special intrinsic for value class checks allows representing data class tests with enough flexibility. During IR transformations it represents the check in target-independent way and saves it from undesired transformations. In the instruction selector it allows efficient lowering depending on the used target and mode. This implementation is an extended variant of `llvm.isnan` introduced in https://reviews.llvm.org/D104854. It is limited to minimal intrinsic support. Target-specific treatment will be implemented in separate patches. Differential Revision: https://reviews.llvm.org/D112025	2022-04-26 13:09:16 +07:00
Lian Wang	9980148305	[RISCV][SelectionDAG] Support VP_ADD/VP_MUL/VP_SUB mask operations Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124144	2022-04-26 02:30:22 +00:00
David Green	9727c77d58	[NFC] Rename Instrinsic to Intrinsic	2022-04-25 18:13:23 +01:00
Simon Pilgrim	34e7243464	[DAG] Fold freeze(bitcast(x)) -> bitcast(freeze(x)) This is a very specific fold to fix an upstream poor codegen issue. InstCombine has the much more flexible pushFreezeToPreventPoisonFromPropagating but I don't think we're quite there with DAG/TLI handling for canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison value tracking yet. Fixes #54911 Differential Revision: https://reviews.llvm.org/D124185	2022-04-22 16:39:25 +01:00
Alexey Bataev	2cca53c815	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register represantion into account. We can build more correct representation of register shuffles, improve number of recognised buildvector sequences. Also, same function can be used to improve the cost model for the shuffles. in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 09:37:16 -07:00
Alexey Bataev	5f7ac15912	Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer." This reverts commit `2f49163b33` to fix a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284	2022-04-20 06:35:55 -07:00
Alexey Bataev	2f49163b33	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register represantion into account. We can build more correct representation of register shuffles, improve number of recognised buildvector sequences. Also, same function can be used to improve the cost model for the shuffles. in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 05:32:56 -07:00
Matt Arsenault	8591328e15	Intrinsics: Mark llvm.eh.sjlj.callsite argument as immarg The assert in SelectionDAG implies that it is	2022-04-19 21:04:33 -04:00
chenglin.bi	222adf338a	[Arch64][SelectionDAG] Add target-specific implementation of srem 1. X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first. 2. Add AArch64 faster path for SREM only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-19 02:49:42 +08:00
chenglin.bi	acfc025a72	Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem" This reverts commit `9d9eddd3dd`.	2022-04-18 10:35:09 +08:00
chenglin.bi	9d9eddd3dd	[Arch64][SelectionDAG] Add target-specific implementation of srem X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first. Add AArch64 faster path for SREM only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-16 12:29:11 +08:00
Craig Topper	c6dc229a6d	[DAGCombiner] Move call to hasOneUse after opcode checks. NFC Checking the opcode is cheap, counting the number of uses is not.	2022-04-15 17:02:16 -07:00
Craig Topper	a7b9d75e7a	[DAGCombiner] Move or/xor/and opcode check in ReduceLoadOpStoreWidth before hasOneUse check. hasOneUse is not cheap on nodes with chain results that might have many uses. By checking the opcode first, we can avoid a costly walk of the use list on nodes we aren't interested in. Found by investigating calls to hasNUsesOfValue from the example provided in D123857.	2022-04-15 16:38:27 -07:00
John Brawn	12c1022679	[AArch64] Lowering and legalization of strict FP16 For strict FP16 to work correctly needs some changes in lowering and legalization: * SelectionDAGLegalize::PromoteNode was missing handling for some strict fp opcodes. * Some of the custom lowering of strict fp operations needed to be adjusted to work with FP16. * Custom lowering needed to be added for round-to-int operations. With this, and the previous patches for the rest of the strict fp isel, we can set IsStrictFPEnabled = true. Differential Revision: https://reviews.llvm.org/D115620	2022-04-14 16:51:22 +01:00
Paul Walker	0c44115e51	[SVE] Add support for non-element-type sized scaling when lowering MGATHER/MSCATTER. The lowering code did not use the scale operand of MGATHER/MSCATTER nodes, but instead assumed scaled indices were always scaled based on the element type of the memory type. This patch adds the missing support by rewritting the nodes as unscaled variants. Differential Revision: https://reviews.llvm.org/D123670	2022-04-14 11:54:46 +01:00
Simon Pilgrim	fef221bf1f	[DAG] Enable SimplifyVBinOp folds on add/sub sat intrinsics	2022-04-13 12:53:23 +01:00
Jonas Paulsson	46f83caebc	[InlineAsm] Add support for address operands ("p"). This patch adds support for inline assembly address operands using the "p" constraint on X86 and SystemZ. This was in fact broken on X86 (see example at https://reviews.llvm.org/D110267, Nov 23). These operands should probably be treated the same as memory operands by CodeGenPrepare, which have been commented with "TODO" there. Review: Xiang Zhang and Ulrich Weigand Differential Revision: https://reviews.llvm.org/D122220	2022-04-13 12:50:21 +02:00
Simon Pilgrim	cfb3ee2185	[DAG] Add non-uniform vector support to (shl (srl x, c1), c2) -> (and (shift x, c3)) Another part of D77804 yak shaving Differential Revision: https://reviews.llvm.org/D123523	2022-04-13 11:37:33 +01:00
Simon Pilgrim	bc32a1dd76	[DAG] Add non-uniform vector support to (shl (sr[la] exact X, C1), C2) folds	2022-04-12 12:57:56 +01:00
Craig Topper	35be4a7af3	[SelectionDAG] Remove unecessary null check after call to getNode. NFC As far as I know getNode will never return a null SDValue. I'm guessing this was modeled after the FoldConstantArithmetic call earlier. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123550	2022-04-11 18:03:44 -07:00
Craig Topper	2ce2562876	[RISCV][SelectionDAG] Add a hook to sign extend i32 ConstantInt operands of phis on RV64. Materializing constants on RISCV is simpler if the constant is sign extended from i32. By default i32 constant operands of phis are zero extended. This patch adds a hook to allow RISCV to override this for i32. We have an existing isSExtCheaperThanZExt, but it operates on EVT which we don't have at these places in the code. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122951	2022-04-11 14:38:39 -07:00
Craig Topper	28cb508195	[TargetLowering][RISCV] Allow truncation when checking if the arguments of a setcc are splats. We're just trying to canonicalize here and won't be using the constant value returned. The attached test changes are because we were previously commuting a seteq X, (splat_vector 0) because we also have (sub 0, X). The 0 is larger than the element type so we don't detect it as a splat without the AllowTruncation flag. By preventing the commute we are able to match it to the vmseq.vx instruction during isel. We only look for constants on the RHS in isel. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D123256	2022-04-11 09:49:36 -07:00
Sanjay Patel	2ed15984b4	[SDAG] try to reduce compare of funnel shift equal 0 fshl (or X, Y), X, C ==/!= 0 --> or (shl Y, C), X ==/!= 0 fshl X, (or X, Y), C ==/!= 0 --> or (srl Y, BW-C), X ==/!= 0 This is similar to an existing setcc-of-rotate fold, but the matching requires more checks for the more general funnel op: https://alive2.llvm.org/ce/z/Ab2jDd We are effectively decomposing the funnel shift into logical shifts, reassociating, and removing a shift. This should get us the final improvements for x86-64 that were originally shown in D111530 ( https://github.com/llvm/llvm-project/issues/49541 ); x86-32 still shows some SHLD/SHRD, so the pattern is not matching there yet. Differential Revision: https://reviews.llvm.org/D122919	2022-04-11 07:44:58 -04:00
Tim Northover	6c85668d28	Tail calls: look through AssertZExt to find register copy. arm64_32 guarantees the high 32 bits of pointer parameters are passed as 0, and this is modelled in the IR by inserting an AssertZExt after the CopyFromReg. The function deciding whether registers that need to be preserved actually are wasn't expecting this so it banned perfectly legitimate tail calls.	2022-04-11 12:24:47 +01:00
Fraser Cormack	18106b99f0	[VP] Explicitly map from VP intrinsic to ISD opcode This patch aims to overcome an issue in these mappings where, when an ISD node was registered with BEGIN_REGISTER_VP_SDNODE but outwidth the scope of a pair of BEGIN_REGISTER_VP_INTRINSIC/END_REGISTER_VP_INTRINSIC macros, the switch cases fell apart. This in particular happened with VP_SETCC, where we'd end up with something along the lines of: case Intrinsic::vp_fcmp: break; case Intrinsic::vp_icmp: break; ResOpc = ISD::VP_SETCC; case Intrinsic::vp_store: ... To remedy this, we introduce a special-purpose mapping macro which can map any number of VP intrinsic opcodes to an ISD opcode. As a result, we no longer need to special-case the mapping from vp.icmp and vp.fcmp to VP_SETCC, as the new helper macro does it for us. Thanks to @craig.topper for noticing this and to @rogfer01 for the idea. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D123324	2022-04-08 12:30:22 +01:00
Fraser Cormack	8216255c9f	[RISCV][VP] Add basic RVV codegen for vp.fcmp This patch adds the necessary infrastructure to lower vp.fcmp via ISD::VP_SETCC to RVV instructions. Most notably this patch adds cond-code legalization for VP_SETCC, reusing the existing TargetLowering::LegalizeSetCCCondCode by passing in additional SDValue parameters for the Mask and EVL. This method then uses VP operations to legalize the condcode. There is still a general lack of canonicalization on VP_SETCC as opposed to SETCC which results in worse code than is theoretically possible. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123051	2022-04-07 09:16:07 +01:00
Craig Topper	bdb1ab9804	[LegalizeTypes][VP] Use LoVT/HiVT when splitting VP operations in SplitVecRes_UnaryOp. The VP path was using the split source VTs instead of the split destination VTs. This may not be a problem today because the VP nodes going through this have the same source and dest VTs. It will be a problem when we start using this function for legalizing VP cast operations.	2022-04-06 10:51:49 -07:00
Daniil Kovalev	62a983ebc5	Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if" This reverts commit `83a798d4b0`. As discussed in D120714 with @thakis, the patch added unneeded complexity without noticeable benefits.	2022-04-06 20:32:53 +03:00
Craig Topper	8fc19185e3	[LegalizeTypes] Move SplitVecRes_VECTOR_REVERSE/VECTOR_SPLICE near other SplitVecRes methods. NFC This file is divided into sections for different legalization actions. We should keep similar methods together.	2022-04-06 10:29:32 -07:00
Craig Topper	1ad36487e9	[LegalizeDAG] Use SelectionDAG::getBoolConstant to simplify some code. NFC	2022-04-06 10:08:11 -07:00
Craig Topper	5b5f59428c	[DAGCombiner] Replace call getSExtOrTrunc with a truncate. NFC The extend case should never occur. The sign extend would be an arbitrary choice, remove it to avoid confusion.	2022-04-06 09:59:45 -07:00
Paul Walker	7d3af9ef0f	[DAGCombine] insert_subvector undef, (splat X), N2 -> splat X Differential Revision: https://reviews.llvm.org/D120328	2022-04-06 17:15:38 +01:00
Fraser Cormack	6be5e875be	[RISCV][VP] Add basic RVV codegen for vp.icmp This patch adds the minimum required to successfully lower vp.icmp via the new ISD::VP_SETCC node to RVV instructions. Regular ISD::SETCC goes through a lot of canonicalization which targets may rely on which has not hereto been ported to VP_SETCC. It also supports expansion of individual condition codes and a non-boolean return type. Support for all of that will follow in later patches. In the case of RVV this largely isn't a problem as the vector integer comparison instructions are plentiful enough that it can lower all VP_SETCC nodes on legal integer vectors except for boolean vectors, which regular SETCC folds away immediately into logical operations. Floating-point VP_SETCC operations aren't as well supported in RVV and the backend relies on condition code expansion, so support for those operations will come in later patches. Portions of this code were taken from the VP reference patches. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122743	2022-04-06 16:51:22 +01:00
Paul Walker	1c307b9794	[NFC] Remove redundant IndexType canonicalisation from DAGTypeLegalizer::PromoteIntOp_MSCATTER Promotion does not affect the base element type and so the original index type will remain unchanged. This reflects the behaviour of DAGTypeLegalizer::PromoteIntOp_MGATHER with no tests affected.	2022-04-06 15:30:29 +01:00
zhongyunde	19e5235147	[AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD Accord the discussion in D122281, we missing an ISD::AND combine for MLOAD because it relies on BuildVectorSDNode is fails for scalable vectors. This patch is intend to handle that, so we can circle back the type MVT::nxv2i32 Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122703	2022-04-06 20:50:42 +08:00
Roman Lebedev	34ce9fd864	[TLI] `TargetLowering::SimplifyDemandedVectorElts()`: narrowing bitcast: fill known zero elts from known src bits E.g. in ``` %i0 = zext <2 x i8> to <2 x i16> %i1 = bitcast <2 x i16> to <4 x i8> ``` the `%i0`'s zero bits are known to be `0xFF00` (upper half of every element is known zero), but no elements are known to be zero, and for `%i1`, we don't know anything about zero bits, but the elements under `0b1010` mask are known to be zero (i.e. the odd elements). But, we didn't perform such a propagation. Noticed while investigating more aggressive `vpmaddwd` formation. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D123163	2022-04-06 14:19:31 +03:00
Daniil Kovalev	83a798d4b0	[CodeGen] Place SDNode debug ID declaration under appropriate #if Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to reduce memory usage when it is not needed. Differential Revision: https://reviews.llvm.org/D120714	2022-04-06 14:09:32 +03:00
Jeremy Morse	fb6596f1ec	[DebugInfo][InstrRef] Avoid a crash from mixed variable location modes Variable locations now come in two modes, instruction referencing and DBG_VALUE. At -O0 we pick DBG_VALUE to allow fast construction of variable information. Unfortunately, SelectionDAG edits the optimisation level in the presence of opt-bisect-limit, meaning different passes have different views of what variable location mode we should use. That causes assertions when they're mixed. This patch plumbs through a boolean in SelectionDAG from start to instruction emission, so that we don't rely on the current optimisation level for correctness. Differential Revision: https://reviews.llvm.org/D123033	2022-04-06 11:55:38 +01:00
Simon Pilgrim	3369e474bb	[DAG] Allow XOR(X,MIN_SIGNED_VALUE) to perform AddLike folds As raised on PR52267, XOR(X,MIN_SIGNED_VALUE) can be treated as ADD(X,MIN_SIGNED_VALUE), so let these cases use the 'AddLike' folds, similar to how we perform no-common-bits OR(X,Y) cases. define i8 @src(i8 %x) { %r = xor i8 %x, 128 ret i8 %r } => define i8 @tgt(i8 %x) { %r = add i8 %x, 128 ret i8 %r } Transformation seems to be correct! https://alive2.llvm.org/ce/z/qV46E2 Differential Revision: https://reviews.llvm.org/D122754	2022-04-06 10:37:11 +01:00
Simon Pilgrim	9e97b2a477	[DAG] SimplifySetCC - relax fold (X^C1) == C2 --> X == C1^C2 https://alive2.llvm.org/ce/z/A_auBq Remove limitation that wouldn't perform the fold if all the inverted bits are known zero The thumb2 changes look to be benign, although it does show that the TEQ/TST isel patterns could probably be improved. Fixes movmsk regression in D122754 Differential Revision: https://reviews.llvm.org/D123023	2022-04-06 09:18:08 +01:00
Simon Pilgrim	328754474a	[DAG] SimplifySetCC - clang-format add/xor/sub with constant handling. NFC.	2022-04-04 13:30:17 +01:00
Michael Gottesman	e24f534879	[debug-info] As an NFC commit, refactor EmitFuncArgumentDbgValue so that it can be extended to support llvm.dbg.addr. The reason why I am making this change is that before this commit, EmitFuncArgumentDbgValue relied on a boolean flag IsDbgDeclare both to signal that a DBG_VALUE should be made to be indirect /and/ that the original intrinsic was a dbg.declare. This is no longer always true if we add support for handling dbg.addr since we will have an indirect DBG_VALUE that is a different intrinsic from dbg.declare. With that in mind, in this NFC patch, we prepare for future fixes by introducing a 3 case-enum argument to EmitFuncArgumentDbgValue that allows the caller to explicitly specify how the argument's DBG_VALUE should be emitted. This then allows us to turn the indirect checks into a != FuncArgumentDbgValueKind::Value and prepare us for a future where we add support here for llvm.dbg.addr directly. rdar://83957028 Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D122945	2022-04-01 17:07:28 -07:00
Craig Topper	fa630e7594	[RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1). If we expand (uaddo X, 1) we previously expanded the overflow calculation as (X + 1) <u X. This potentially increases the live range of X and can prevent X+1 from reusing the register that previously held X. Since we're adding 1, overflow only occurs if X was UINT_MAX in which case (X+1) would be 0. So this patch adds a special case to expand the overflow calculation to (X+1) == 0. This seems to help with uaddo intrinsics that get introduced by CodeGenPrepare after LSR. Alternatively, we could block the uaddo transform in CodeGenPrepare for this case. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122933	2022-04-01 13:14:10 -07:00
Simon Pilgrim	76cd11f303	[DAG] Add llvm::isMinSignedConstant helper. NFC Pulled out of D122754	2022-04-01 17:47:34 +01:00
Matt Arsenault	4a8665e23e	SelectionDAG: Avoid some uses of getPointerTy Avoids use of the default address space parameter, and avoids some assumptions about the incoming address space.	2022-03-31 18:49:22 -04:00
Craig Topper	85eae45520	[SelectionDAG] Move extension type for ConstantSDNode from getCopyToRegs to HandlePHINodesInSuccessorBlocks. D122053 set the ExtendType for ConstantSDNodes in getCopyToRegs to ZERO_EXTEND to match assumptions in ComputePHILiveOutRegInfo. PHIs are probably not the only way ConstantSDNodeNodes can get to getCopyToRegs. This patch adds an ExtendType parameter to CopyValueToVirtualRegister and has HandlePHINodesInSuccessorBlocks pass ISD::ZERO_EXTEND for ConstantInts. This way we only affect ConstantSDNodes for PHIs. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122171	2022-03-30 11:32:43 -07:00
Sanjay Patel	436b875e49	[SDAG] avoid libcalls to fmin/fmax for soft-float targets This is an extension of D70965 to avoid creating a mathlib call where it did not exist in the original source. Also see D70852 for discussion about an alternative proposal that was abandoned. In the motivating bug report: https://github.com/llvm/llvm-project/issues/54554 ...we also have a more general issue about handling "no-builtin" options. Differential Revision: https://reviews.llvm.org/D122610	2022-03-30 11:22:03 -04:00
Sanjay Patel	e18cc5277f	[SDAG] try to canonicalize logical shift after bswap When shifting by a byte-multiple: bswap (shl X, C) --> lshr (bswap X), C bswap (lshr X, C) --> shl (bswap X), C This is the backend version of D122010 and an alternative suggested in D120648. There's an extra check to make sure the shift amount is valid that was not in the rough draft. I'm not sure if there is a larger motivating case for RISCV (bug report?), but the ARM diffs show a benefit from having a late version of the transform (because we do not combine the loads in IR). Differential Revision: https://reviews.llvm.org/D122655	2022-03-30 09:29:32 -04:00
Fraser Cormack	43a91a8474	[SelectionDAG] Don't create illegally-typed nodes while constant folding This patch fixes a (seemingly very rare) crash during vector constant folding introduced in D113300. Normally, during legalization, if we create an illegally-typed node during a failed attempt at constant folding it's cleaned up before being visited, due to it having no uses. If, however, an illegally-typed node is created during one round of legalization and isn't cleaned up, it's possible for a second round of legalization to create new illegally-typed nodes which add extra uses to the old illegal nodes. This means that we can end up visiting the old nodes before they're known to be dead, at which point we crash. I'm not happy about this fix. Creating illegal types at all seems like a bad idea, but we all-too-often rely on illegal constants being successfully folded and being fixed up afterwards. However, we can't rely on constant folding actually happening, and we don't have a foolproof way of peering into the future. Perhaps the correct fix is to revisit the node-iteration order during legalization, ensuring we visit all uses of nodes before the nodes themselves. Or alternatively we could try and clean up dead nodes immediately after failing constant folding. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122382	2022-03-30 13:17:55 +01:00
Craig Topper	e68257fcee	[RISCV][SelectionDAG] Enable TargetLowering::hasBitTest for masks that fit in ANDI. Modified DAGCombiner to pass the shift the bittest input and the shift amount to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h This is an alternative to D122454. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D122458	2022-03-28 12:46:36 -07:00
Simon Pilgrim	e209190c2d	[SDAG] enable binop identity constant folds for multiplies Add mul to the list of ops that we canonicalize with a select to expose an identity merge Differential Revision: https://reviews.llvm.org/D122071	2022-03-25 11:07:04 +00:00
Craig Topper	67eb2f144e	[SelectionDAG] Add AssertAlign to AddNodeIDCustom so that it will CSE properly. The alignment needs to be part of the folding set hash. This is handled by getAssertAlign when nodes are created, but needs to repeated here. No test case as I found it as part of a very early experimental patch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D122279	2022-03-24 08:59:09 -07:00
Daniil Kovalev	c53cbce45e	[CodeGen] Define ABI breaking class members correctly Non-static class members declared under #ifndef NDEBUG should be declared under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to make headers library-friendly and allow cross-linking, as discussed in D120714. Differential Revision: https://reviews.llvm.org/D121549	2022-03-24 12:42:59 +03:00
Craig Topper	cac9773dcc	[SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo Instead of using operator[], use DenseMap::find to prevent default constructing an entry if it isn't already in the map. Also simplify a condition to check for 0 instead of a virtual register. I'm pretty sure we can only get 0 or a virtual register out of the value map.	2022-03-23 09:52:07 -07:00
serge-sans-paille	02c28970b2	Cleanup include: codegen second round Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122180	2022-03-23 13:54:00 +01:00
Craig Topper	681fd2c11e	Revert "[SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo" This reverts commit `1a9b55b63a`. Causing build bot failures	2022-03-22 23:41:47 -07:00
Craig Topper	1a9b55b63a	[SelectionDAG] Don't create entries in ValueMap in ComputePHILiveOutRegInfo Instead of using operator[], use DenseMap::find to prevent default constructing an entry if it isn't already in the map.	2022-03-22 23:24:53 -07:00
Craig Topper	73f0af106b	[SelectionDAG] Add printing support for the Align value of AssertAlign nodes. Differential Revision: https://reviews.llvm.org/D122262	2022-03-22 14:16:32 -07:00
Craig Topper	37c0aacd71	[SelectionDAG] Make getPreferredExtendForValue take a Instruction * instead of Value . This is only called for instructions and the caller is already holding an Instruction . This makes the code more explicit and makes it obvious the code doesn't make decisions about constants.	2022-03-21 12:15:22 -07:00
zhongyunde	828b89bc0b	[AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads Trying to reduce the number of masked loads in favour of more unpklo/hi instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported to extensions from legal types. Both of normal and masked loads test cases added to guard compile crash. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D120953	2022-03-21 23:47:33 +08:00
Simon Pilgrim	35a7be6ccb	[SDAG] enable binop identity constant folds for shifts Add shl/srl/sra to the list of ops that we canonicalize with a select to expose an identity merge Differential Revision: https://reviews.llvm.org/D122070	2022-03-21 13:02:50 +00:00
Luo, Yuanke	10bb623192	enable binop identity constant folds for add Differential Revision: https://reviews.llvm.org/D119654	2022-03-20 19:07:16 +08:00
Craig Topper	4eb59f0179	[SelectionDAG][RISCV] Make RegsForValue::getCopyToRegs explicitly zero_extend constants. ComputePHILiveOutRegInfo assumes that constant incoming values to Phis will be zero extended if they aren't a legal type. To guarantee that we should zero_extend rather than any_extend constants. This fixes a bug for RISCV where any_extend of constants can be treated as a sign_extend. Differential Revision: https://reviews.llvm.org/D122053	2022-03-19 18:43:14 -07:00
Craig Topper	306ff74154	[SelectionDAG] Use APInt::zextOrSelf instead of zextOrTrunc in ComputePHILiveOutRegInfo The width never decreases here.	2022-03-18 23:26:19 -07:00
Heejin Ahn	b8038a916d	[WebAssembly] Disable SimplifyDemandedVectorElts after legalization This fixes a reported bug that caused an infinite loop during the SelectionDAG optimization phase in ISel, by creating an overridable hook in `TargetLowering` that allows us to bail out from running `SimplifyDemandedVectorElts`. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D121869	2022-03-16 20:52:43 -07:00
Marco Elver	555df03012	[SelectionDAG][NFC] Clean up SDCallSiteDbgInfo accessors * Consistent naming: addCallSiteInfo vs. getCallSiteInfo; * Use ternary operator to reduce verbosity; * const'ify getters; * Add comments; NFCI. Differential Revision: https://reviews.llvm.org/D121820	2022-03-16 17:46:06 +01:00
Matthias Gehre	09854f2af3	[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers Emit calls to __divei4 and friends for divison/remainder of large integers. This fixes https://github.com/llvm/llvm-project/issues/44994. The overall RFC is in https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329 The compiler-rt part is in https://reviews.llvm.org/D120327 Differential Revision: https://reviews.llvm.org/D120329	2022-03-16 09:36:28 +00:00
Craig Topper	1bf4bbc492	[LegalizeTypes][RISCV][WebAssembly] Expand ABS in PromoteIntRes_ABS if it will expand to sra+xor+sub later. If we promote the ABS and then Expand in LegalizeDAG, then both the sra and the xor will have their inputs sign extended. This generates extra code on RISCV which lacks an i8 or i16 sign extend instructon. If we expand during type legalization, then only the sra will get its input sign extended. RISCV is able to combine this with the sra by doing a shift left followed by an sra. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D121664	2022-03-15 08:27:39 -07:00
Craig Topper	ad94dfb9a0	[DAGCombiner][RISCV] Adjust (aext (and (trunc x), cst)) -> (and x, cst) to sext cst based on target preference RISCV strong prefers i32 values be sign extended to i64. This combine was always zero extending the constant using APInt methods. This adjusts the code so that it calls getNode using ISD::ANY_EXTEND instead. getNode will call TLI.isSExtCheaperThanZExt to decide how to handle the constant. Tests were copied from D121598 where I noticed that we were creating constants that were hard to materialize. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D121650	2022-03-15 08:26:47 -07:00
Sanjay Patel	c2592c374e	[SDAG] simplify bitwise logic with repeated operand We do not have general reassociation here (and probably do not need it), but I noticed these were missing in patches/tests motivated by D111530, so we can at least handle the simplest patterns. The VE test diff looks correct, but we miss that pattern in IR currently: https://alive2.llvm.org/ce/z/u66_PM	2022-03-13 11:12:30 -04:00
serge-sans-paille	ed98c1b376	Cleanup includes: DebugInfo & CodeGen Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121332	2022-03-12 17:26:40 +01:00
Lorenzo Albano	28cfa764c2	[VP] Strided loads/stores This patch introduces two new experimental IR intrinsics and SDAG nodes to represent vector strided loads and stores. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D114884	2022-03-10 18:46:54 +01:00
Stanislav Mekhanoshin	0be6fd44f3	[SDAG] Use MMO flags in MemSDNode folding SDNodes with different target flags may now be folded together rightfully resulting in the assertion in the refineAlignment. Folding nodes with different target flags may result in the wrong load instructions produced at least on the AMDGPU. Fixes: SWDEV-326805 Differential Revision: https://reviews.llvm.org/D121335	2022-03-09 14:25:22 -08:00
Sanjay Patel	341623653d	[SDAG] match rotate pattern with extra 'or' operation This is another fold generalized from D111530. We can find a common source for a rotate operation hidden inside an 'or': https://alive2.llvm.org/ce/z/9pV8hn Deciding when this is profitable vs. a funnel-shift is tricky, but this does not show any regressions: if a target has a rotate but it does not have a funnel-shift, then try to form the rotate here. That is why we don't have x86 test diffs for the scalar tests that are duplicated from AArch64 ( `74a65e3834` ) - shld/shrd are available. That also makes it difficult to show vector diffs - the only case where I found a diff was on x86 AVX512 or XOP with i64 elements. There's an additional check for a legal type to avoid a problem seen with x86-32 where we form a 64-bit rotate but then it gets split inefficiently. We might avoid that by adding more rotate folds, but I didn't check to see what is missing on that path. This gets most of the motivating patterns for AArch64 / ARM that are in D111530. We still need a couple of enhancements to setcc pattern matching with rotate/funnel-shift to get the rest. Differential Revision: https://reviews.llvm.org/D120933	2022-03-09 13:19:00 -05:00
Craig Topper	29511ec7da	[LegalizeTypes][VP] Add widening and splitting support for VP_FMA. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D120854	2022-03-08 09:59:59 -08:00
Craig Topper	c392b9924e	[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D120785	2022-03-08 09:59:34 -08:00
Fraser Cormack	17310f3d19	[SelectionDAG][NFC] Address a few clang-tidy warnings Fix a couple of else-after-return warnings and some unnecessary parentheses.	2022-03-08 16:22:26 +00:00
Craig Topper	8e132c5c1d	[LegalizeTypes][ARM][X86] Change ExpandIntRes_ABS to use sra+xor+sub. Previously we used sra+add+xor if ADDCARRY is supported. This changes to sra+xor+sub is SUBCARRY is available. This is consistent with the recent change to the default expansion in LegalizeDAG. Differential Revision: https://reviews.llvm.org/D121039	2022-03-07 11:28:32 -08:00
David Green	4388f4f776	[DAG] Don't convert undef to 0 when creating buildvector When inserting undef into buildvectors created from shuffles of buildvectors, we convert elements to the largest needed type. This had the effect of converting undef into 0, which isn't needed as the buildvector implicitly truncates and trunc(zext(undef)) == undef. Differential Revision: https://reviews.llvm.org/D121002	2022-03-06 18:35:34 +00:00
Sanjay Patel	f4b53972ce	[SDAG] fold bitwise logic with shifted operands This extends `acb96ffd14` to 'and' and 'xor' opcodes. Copying from that message: LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().	2022-03-05 11:14:45 -05:00
Paul Walker	42b4a6227e	[DAGCombine] Prevent illegal ISD::SPLAT_VECTOR operations post legalisation. When triggered during operation legalisation the affected combine generates a splat_vector that when custom lowered for SVE fixed length code generation, results in the original precombine sequence and thus we enter a legalisation/combine hang. NOTE: The patch contains no tests because I observed this issue only when combined with other work that might never become public. The current way AArch64 lowers ISD::SPLAT_VECTOR meant a specific test was not possible so I'm hoping the DAGCombiner fix can be seen as obvious. The AArch64ISelLowering change is requirted to maintain existing code quality. Differential Revision: https://reviews.llvm.org/D120735	2022-03-04 11:54:03 +00:00
Maksim Panchenko	7e570308f2	[NFC] Fix typos Reviewed By: yota9, Amir Differential Revision: https://reviews.llvm.org/D120859	2022-03-03 13:26:39 -08:00
Paul Robinson	7b85f0f32f	[PS4] isPS4 and isPS4CPU are not meaningfully different	2022-03-03 11:36:59 -05:00
Sanjay Patel	e9302bf7ef	[SDAG] try harder to remove a rotate from X == 0 https://alive2.llvm.org/ce/z/mJP7XP This can be viewed as expanding the compare into and/or-of-compares: https://alive2.llvm.org/ce/z/bkZYWE followed by reduction of each compare. This could be extended in several ways: 1. There's a (X & Y) == -1 sibling. 2. We can recurse through more than 1 'or'. 3. The fold could be generalized beyond rotates - any operation that only changes the order of bits (bswap, bitreverse). This is a transform noted in D111530.	2022-03-03 09:25:46 -05:00
Sanjay Patel	c33dbc2a2d	[SDAG] refactor foldSetCCWithRotate; NFC There are more potential optimizations to make here, so rearrange to make it easier to append those.	2022-03-02 16:42:05 -05:00
Craig Topper	ab7a7cc1dd	Revert "[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG." This reverts commit `ac93f95861`. Committed by accident.	2022-03-02 10:00:22 -08:00
Craig Topper	324c0a7206	[SelectionDAG][RISCV] Emit a canonical sign bit test from ExpandIntRes_ABS. Instead of emitting 0 > Hi, emit Hi < 0. If Hi needs to be expanded again this will allow the special case for sign bit tests in ExpandIntOp_SETCC to trigger. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120761	2022-03-02 09:47:26 -08:00
Craig Topper	ac93f95861	[LegalizeTypes][VP] Add splitting and widening support for VP_FNEG. Differential Revision: https://reviews.llvm.org/D120785	2022-03-02 09:47:05 -08:00
Simon Pilgrim	5cce97d61e	[DAG] isSplatValue - improve ISD::VECTOR_SHUFFLE splat detection Currently we only check for splat shuffles, this extends it to see if the source operand is a splat across the demanded elts based upon the shuffle mask	2022-03-02 15:32:24 +00:00
Simon Pilgrim	df0a2b4f30	[DAG] SelectionDAG::isSplatValue - add initial BITCAST handling This patch adds support for recognising vector splats by peeking through bitcasts to vectors with smaller element types - if all the offset subelements are splats then the bitcasted vector is a splat as well. We don't have great coverage for isSplatValue so I've made this pretty specific to the use case I'm trying to fix - regressions in some vXi64 vector shift by splat cases that 32-bit x86 doesn't recognise because the shift amount buildvector has been type legalised to v2Xi32. We can add further support (floats, bitcast from larger element types, undef elements) when we have actual test coverage. Differential Revision: https://reviews.llvm.org/D120553	2022-03-02 11:25:51 +00:00
Craig Topper	8787726609	[LegalizeTypes] Remove incomplete StrictFP support from SplitVecRes_UnaryOp. NFC There is no handling of Chain operands in this function so it can't work. There's a separate splitting function for all strict fp nodes.	2022-03-01 15:43:57 -08:00
serge-sans-paille	a494ae43be	Cleanup includes: TransformsUtils Estimation on the impact on preprocessor output: before: 1065307662 after: 1064800684 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120741	2022-03-01 21:00:07 +01:00
Craig Topper	bf8054644d	[DAGCombiner] Don't expand (neg (abs x)) if the abs has an additional user. If the types aren't legal, the expansions may get type legalized in a different way preventing code sharing. If the type is legal, we will share some instructions between the two expansions, but we will need an extra register. Since we don't appear to fold (neg (sub A, B)) if the sub has an additional user, I think it makes sense not to expand NABS. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120513	2022-03-01 07:32:07 -08:00
Sanjay Patel	69684b84c6	[SDAG] fold (rotate X) eq/ne (0/-1) This is the SDAG equivalent of an instcombine transform added with: `fd807601a7` This is another step towards solving #49541 and part of an alternative set of more general transforms than what is proposed in D111530. https://alive2.llvm.org/ce/z/ToxaE8	2022-02-27 11:31:19 -05:00
Sanjay Patel	acb96ffd14	[SDAG] fold bitwise logic with shifted operands LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands(). This is a partial implementation of a transform suggested in D111530 (only handles 'or' bitwise logic as a first step - need to stamp out more tests for other opcodes). Several of the same tests added for D111530 are altered here (but not fully optimized). I'm not sure yet if this would help/hinder that patch, but this should be an improvement for all tests added with `ecf606cb43` since it removes a shift operation in those examples. Differential Revision: https://reviews.llvm.org/D120516	2022-02-27 09:54:12 -05:00
Simon Pilgrim	fadd20f80d	[DAG] Ensure type is legal for bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) fold As reported on D120192	2022-02-27 11:25:22 +00:00
Nikita Popov	87ebd9a36f	[IR] Use CallBase::getParamElementType() (NFC) As this method now exists on CallBase, use it rather than the one on AttributeList.	2022-02-25 10:01:58 +01:00
Simon Pilgrim	370ebc9d9a	[DAG] Attempt to fold bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) If the shl is at least half the bitwidth (i.e. the lower half of the bswap source is zero), then we can reduce the shift and perform the bswap at half the bitwidth and just zero extend. Based off PR51391 + PR53867 Differential Revision: https://reviews.llvm.org/D120192	2022-02-24 19:33:51 +00:00
Sanjay Patel	4a3708cd6b	[SDAG] remove shift that is redundant with part of funnel shift This is the SDAG translation of D120253 : https://alive2.llvm.org/ce/z/qHpmNn The SDAG nodes can have different operand types than the result value. We can see an example of that with AArch64 - the funnel shift amount is an i64 rather than i32. We may need to make that match even more flexible to handle post-legalization nodes, but I have not stepped into that yet. Differential Revision: https://reviews.llvm.org/D120264	2022-02-24 11:25:46 -05:00
Craig Topper	c7d6448d03	[DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable. Internally to DAGCombiner the SDValues were passed by non-const reference despite not being modified. They were then passed by const reference to TLI. This patch passes them by value which is consistent with the vast majority of code. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120420	2022-02-23 12:40:45 -08:00
Pawe Bylica	afdaa86b77	[DAGCombine] Extend combineCarryDiamond() In combineCarryDiamond() use getAsCarry() to find more candidates for being a carry flag. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118362	2022-02-23 21:37:49 +01:00
Sanjay Patel	21d7c3bcc6	[DAG] try to convert multiply to shift via demanded bits This is a fix for a regression discussed in: https://github.com/llvm/llvm-project/issues/53829 We cleared more high multiplier bits with `995d400`, but that can lead to worse codegen because we would fail to recognize the now disguised multiplication by neg-power-of-2 as a shift-left. The problem exists independently of the IR change in the case that the multiply already had cleared high bits. We also convert shl+sub into mul+add in instcombine's negator. This patch fills in the high-bits to see the shift transform opportunity. Alive2 attempt to show correctness: https://alive2.llvm.org/ce/z/GgSKVX The AArch64, RISCV, and MIPS diffs look like clear wins. The x86 code requires an extra move register in the minimal examples, but it's still an improvement to get rid of the multiply on all CPUs that I am aware of (because multiply is never as fast as a shift). There's a potential follow-up noted by the TODO comment. We should already convert that pattern into shl+add in IR, so it's probably not common: https://alive2.llvm.org/ce/z/7QY_Ga Fixes #53829 Differential Revision: https://reviews.llvm.org/D120216	2022-02-23 12:09:32 -05:00
Paweł Bylica	df0c16ce00	[NFC][DAGCombine] Use isOperandOf() in combineCarryDiamond Pre-commit for https://reviews.llvm.org/D118362.	2022-02-21 21:41:31 +01:00
Simon Pilgrim	46f1e8359e	[DAG] visitBSWAP - pull out repeated SDLoc. NFC Cleanup for D120192	2022-02-21 13:08:01 +00:00
Craig Topper	440c4b705a	[SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y). Previous we used sra (X, size(X)-1); xor (add (X, Y), Y). By placing sub at the end, we allow RISCV to combine sign_extend_inreg with it to form subw. Some X86 tests for Z - abs(X) seem to have improved as well. Other targets look to be a wash. I had to modify ARM's abs matching code to match from sub instead of xor. Maybe instead ISD::ABS should be made legal. I'll try that in parallel to this patch. This is an alternative to D119099 which was focused on RISCV only. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119171	2022-02-20 21:11:23 -08:00
Chen Zheng	efe5b8ad90	[ISEL] remove unnecessary getNode(); NFC Reviewed By: RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D120049	2022-02-20 21:08:49 -05:00
Luo, Yuanke	67ef63138b	[SDAG] enable binop identity constant folds for sub This patch extract the sub folding from D119654 and leave only add folding in that patch. Differential Revision: https://reviews.llvm.org/D120116	2022-02-21 09:37:36 +08:00
Craig Topper	24bfa24355	[SelectionDAGBuilder] Simplify visitShift. NFC This code was detecting whether the value returned by getShiftAmountTy can represent all shift amounts. If not, it would use MVT::i32 as a placeholder. getShiftAmountTy was updated last year to return i32 if the type returned by the target couldn't represent all values. This means the MVT::i32 case here is dead and can the logic can be simplified. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120164	2022-02-19 12:40:59 -08:00
Craig Topper	8e7247a377	[SelectionDAG] Fix off by one error in range check in DAGTypeLegalizer::ExpandShiftByConstant. The code was considering shifts by an about larger than the number of bits in the original VT to be out of range. Shifts exactly equal to the original bit width are also out of range. I don't know how to test this. DAGCombiner should usually fold this away. I just noticed while looking for something else in this code. The llvm-cov report shows that we don't have coverage for out of range shifts here. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120170	2022-02-18 18:42:20 -08:00
Craig Topper	04f815c26f	[SelectionDAGBuilder] Remove LegalTypes=false from a call to getShiftAmountConstant. getShiftAmountTy will return MVT::i32 if the shift amount coming from the target's getScalarShiftAmountTy can't reprsent all possible values. That should eliminate the need to use the pointer type which is what we do when LegalTypes is false. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120165	2022-02-18 15:36:35 -08:00
Sanjay Patel	a2963d871e	[SDAG] fold sub-of-shift to add-of-shift This fold is done in IR: https://alive2.llvm.org/ce/z/jWyFrP There is an x86 test that shows an improvement from the added flexibility of using add (commutative). The other diffs are presumed neutral. Note that this could also be folded to an 'xor', but I'm not sure if that would be universally better (eg, x86 can convert adds more easily into LEA). This helps prevent regressions from a potential fold for issue #53829.	2022-02-18 11:55:50 -05:00
Paul Walker	6457f42bde	[DAGCombiner] Extend ISD::ABDS/U combine to handle more cases. The current ABD combine doesn't quite work for SVE because only a single scalable vector per scalar integer type is legal (e.g. for i32, <vscale x 4 x i32> is the only legal scalable vector type). This patch extends the combine to also trigger for the cases when operand extension must be retained. Differential Revision: https://reviews.llvm.org/D115739	2022-02-17 13:32:20 +00:00
Bjorn Pettersson	1a8bdf95a3	[DAG] Fix in ReplaceAllUsesOfValuesWith When doing SelectionDAG::ReplaceAllUsesOfValuesWith a worklist is prepared containing all users that should be updated. Then we use the RemoveNodeFromCSEMaps/AddModifiedNodeToCSEMaps helpers to handle recursive CSE updates while doing the replacements. This patch aims at solving a problem that could arise if the recursive CSE updates would result in an SDNode present in the worklist is being removed as a side-effect of morphing a prio user in the worklist. To examplify such a scenario, imagine that we have these nodes in the DAG t12: i64 = add t8, t11 t13: i64 = add t12, t8 t14: i64 = add t11, t11 t15: i64 = add t14, t8 t16: i64 = sub t13, t15 and that the t8 uses should be replaced by t11. An initial worklist (listing the users that should be morphed) could be [t12, t13, t15]. When updating t12 we get t12: i64 = add t11, t11 which results in a CSE update that replaces t14 by t12, so we get t15: i64 = add t12, t8 which results in a CSE update that replaces t13 by t12, so we get t16: i64 = sub t12, t15 and then t13 is removed given that it was the last use of t13. So when being done with the updates triggered by rewriting the use of t8 in t12 the t13 node no longer exist. And we used to end up hitting an assertion when continuing with the worklist aiming at replacing the t8 uses in t13. The solution is based on using a DAGUpdateListener, making sure that we prune a user from the worklist if it is removed during the recursive CSE updates. The bug was found using an OOT target. I think the problem is quite old, even if the particular intree target reproducer added in this patch seem to pass when using LLVM 13.0.0. Differential Revision: https://reviews.llvm.org/D119088	2022-02-17 14:29:59 +01:00
Craig Topper	1daa66d3fd	[SelectionDAG] Add SPLAT_VECTOR to SelectionDAG::isConstantFPBuildVectorOrConstantFP. Matches what is done for the int version. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D119793	2022-02-16 09:22:11 -08:00
Simon Pilgrim	30e9cdd1aa	[DAG] computeKnownBits - add ISD::AVGCEILU handling Expand the ISD::AVGCEILU to determine the known bits of the result. First part of PR53622 Differential Revision: https://reviews.llvm.org/D119629	2022-02-16 13:00:15 +00:00
David Green	655d0d86f9	[DAGCombine] Move AVG combine to SimplifyDemandBits This moves the matching of AVGFloor and AVGCeil into a place where demand bit are available, so that it can detect more cases for more folds. It changes the transform to start from a shift, not from a truncate. We match the pattern shr(add(ext(A), ext(B)), 1), transforming to ext(hadd(A, B)). For signed values, because only the bottom bits are demanded llvm will transform the above to use a lshr too, as opposed to ashr. In order to correctly detect the hadd we need to know the demanded bits to turn it back. Depending on whether the shift is signed (ashr) or logical (lshr), and the extensions are signed or unsigned we can create different nodes. If the shift is signed: Needs >= 2 sign bits. https://alive2.llvm.org/ce/z/h4gQAW generating signed rhadd. Needs >= 2 zero bits. https://alive2.llvm.org/ce/z/B64DUA generating unsigned rhadd. If the shift is unsigned: Needs >= 1 zero bits. https://alive2.llvm.org/ce/z/ByD8sj generating unsigned rhadd. Needs 1 demanded bit zero and >= 2 sign bits https://alive2.llvm.org/ce/z/hvPGxX and https://alive2.llvm.org/ce/z/32P5n1 generating signed rhadd. Differential Revision: https://reviews.llvm.org/D119072	2022-02-15 10:17:02 +00:00
David Green	03380c70ed	[DAGCombine] Basic combines for AVG nodes. This adds very basic combines for AVG nodes, mostly for constant folding and handling degenerate (zero) cases. The code performs mostly the same transforms as visitMULHS, adjusted for AVG nodes. Constant folding extends to a higher bitwidth and drops the lowest bit. For undef nodes, `avg undef, x` is transformed to x. There is also a transform for `avgfloor x, 0` transforming to `shr x, 1`. Differential Revision: https://reviews.llvm.org/D119559	2022-02-14 11:18:35 +00:00
Nikita Popov	ff040eca93	[FastISel] Reuse register for bitcast that does not change MVT The current FastISel code reuses the register for a bitcast that doesn't change the IR type, but uses a reg-to-reg copy if it changes the IR type without changing the MVT. However, we can simply reuse the register in that case as well. In particular, this avoids unnecessary reg-to-reg copies for pointer bitcasts. This was found while inspecting O0 codegen differences between typed and opaque pointers. Differential Revision: https://reviews.llvm.org/D119432	2022-02-14 09:13:17 +01:00
Craig Topper	e72fe654b7	[DAGCombiner] Use getShiftAmountConstant in DAGCombiner::foldSelectOfConstants. This enables fshl to be matched earlier on X86 %6 = lshr i32 %3, 1 %7 = select i1 %4, i32 -2147483648, i32 0 %8 = or i32 %6, %7 X86 uses i8 for shift amounts. SelectionDAGBuilder creates the ISD::SRL with an i8 shift type. DAGCombiner turns the select into an ISD::SHL. Prior to this patch it would use i32 for the shift amount. fshl matching failed because the shift amounts have different types. LegalizeDAG fixes the ISD::SHL shift amount to i8. This allowed fshl matching to succeed. With this patch, the ISD::SHL will be created with an i8 shift amount. This allows the fshl to match immediately. No test case beause we still end up with a fshl either way.	2022-02-13 19:09:26 -08:00
Sanjay Patel	96b7e0b5a0	[SDAG] clean up scalarizing load transform I have not found a way to expose a difference for this patch in a test because it only triggers for a one-use load, but this is the code that was adapted into D118376 and caused miscompiles. The new code pattern is the same as what we do in narrowExtractedVectorLoad() (reduces load width for a subvector extract). This removes seemingly unnecessary manual worklist management and fixes the chain updating via "SelectionDAG::makeEquivalentMemoryOrdering()". Differential Revision: https://reviews.llvm.org/D119549	2022-02-12 11:41:19 -05:00
Sanjay Patel	429f10f5f2	[SDAG] reduce code duplication and fix formatting; NFC	2022-02-12 10:22:13 -05:00
Arthur Eubanks	c0281c7607	[OpaquePtr][SPARC] Remove getPointerElementType() call in SparcISelLowering Requires keeping better track of sret types.	2022-02-11 11:31:19 -08:00
David Green	4072e362c0	[ISel] Port AArch64 HADD and RHADD to ISel This ports the aarch64 combines for HADD and RHADD over to DAG combine, so that they can be used in more architectures (notably MVE in a followup patch). They are renamed to AVGFLOOR and AVGCEIL in the process, to avoid confusion with instructions such as X86 hadd. The code was also rewritten slightly to remove the AArch64 idiosyncrasies. The general pattern for a AVGFLOORS is %xe = sext i8 %x to i32 %ye = sext i8 %y to i32 %a = add i32 %xe, %ye %r = lshr i32 %a, 1 %t = trunc i32 %r to i8 An AVGFLOORU is equivalent with zext. Because of the truncate lshr==ashr, as the top bits are not demanded. An AVGCEIL also includes an extra rounding, so includes an extra add of 1. Differential Revision: https://reviews.llvm.org/D106237	2022-02-11 18:28:56 +00:00
Julien Pages	dcb2da13f1	[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode Add a new llvm.fptrunc.round intrinsic to precisely control the rounding mode when converting from f32 to f16. Differential Revision: https://reviews.llvm.org/D110579	2022-02-11 12:08:23 -05:00
Nikita Popov	6241f7dee0	[FastISel] Remove redundant reg class check (NFC) SrcVT and DstVT are the same in this branch, as such their register classes will also be the same.	2022-02-10 14:10:00 +01:00
Reid Kleckner	b5a592a8e2	[DAG] Remove pointless std::function wrapper, NFC	2022-02-09 14:30:43 -08:00
Reid Kleckner	f63c150187	Revert "[DagCombine] Increase depth by number of operands to avoid a pathological compile time." Appears to be causing check-llvm to fail This reverts commit `49ab760090`.	2022-02-09 13:55:40 -08:00
Alina Sbirlea	49ab760090	[DagCombine] Increase depth by number of operands to avoid a pathological compile time. We're hitting a pathological compile-time case, profiled to be in DagCombiner::visitTokenFactor and many inserts into a SmallPtrSet. It looks like one of the paths around findBetterNeighborChains is not capped and leads to this. This patch resolves the issue. Looking for feedback if this solution looks reasonable. Differential Revision: https://reviews.llvm.org/D118877	2022-02-09 13:31:28 -08:00
Sander de Smalen	ec46232517	[DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)` This seems like an obvious fold, which leads to a few improvements. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118920	2022-02-09 14:30:01 +00:00
Sanjay Patel	905abc5b7d	[SDAG] enable binop identity constant folds for fmul/fdiv The test diffs are identical to D119111. This only affects x86 currently because no other target has an override for the TLI hook that controls this transform.	2022-02-08 10:52:28 -05:00
Roman Lebedev	ae9414d562	[ValueTracking] Only check for non-undef/poison if already known to be a self-multiply https://godbolt.org/z/js9fTTG9h ^ we don't care what `isGuaranteedNotToBeUndefOrPoison()` says unless we already knew that the operands were equal.	2022-02-08 18:35:29 +03:00
Sanjay Patel	a68e098024	[SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC This is no-functional-change-intended because only the x86 target enables the TLI hook currently. We can add fmul/fdiv opcodes to the switch similar to the proposal D119111, but we don't need to make other changes like enabling target-specific combines. We can also add integer opcodes (add, or, shl, etc.) to the switch because this function is called from all of the generic binary opcodes. The goal is to incrementally enable the profitable diffs from D90113 while avoiding regressions. Differential Revision: https://reviews.llvm.org/D119150	2022-02-08 09:55:05 -05:00
Simon Pilgrim	fd2bb51f1e	[ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask. This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments. I've updated a number of cases that were either performing seperate size/position calculations or had created their own local wrapper versions of these. Differential Revision: https://reviews.llvm.org/D119019	2022-02-08 12:04:13 +00:00
Sanjay Patel	d1ecfaa097	[SDAG] try to fold one-demanded-bit-of-multiply This is a translation of the transform added to InstCombine with: D118539	2022-02-07 17:24:35 -05:00
Sanjay Patel	fc6bee1c11	[SDAG] SimplifyDemandedBits - generalize fold for 2 LSB of X*X This is translated from recent changes to the IR version of this function: D119060 D119139	2022-02-07 15:38:50 -05:00
Simon Pilgrim	74555fd367	[DAG] visitINSERT_VECTOR_ELT - break if-else chain as they both return (style). NFC.	2022-02-07 09:58:47 +00:00
Craig Topper	c35ccd2ac8	[DAGCombiner][RISCV] Allow rotates by non-constant to be matched for i32 on riscv64 with Zbb. rv64izbb has a RORW/ROLW instructions that operate on the lower 32-bits of a 64-bit value and sign extend bit 31 of the result. DAGCombiner won't match rotate idioms because the i32 type isn't Legal on riscv64. This patch teaches DAGCombiner to allow it if the type is going to be promoted and the target has Custom type legalization for ISD::ROTL or ISD::ROTR. I've restricted this to scalar types. It doesn't appear any in tree targets other than riscv64 have custom type legalization for rotates. If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR after type legalization, but I'd like to avoid that if possible. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119062	2022-02-06 10:58:12 -08:00
Bjorn Pettersson	cecf11c315	[DAGCombiner] Fold SSHLSAT/USHLSAT to SHL when no saturation will occur When the shift amount is known and a known sign bit analysis of the shiftee indicates that no saturation will occur, then we can replace SSHLSAT/USHLSAT by SHL. Differential Revision: https://reviews.llvm.org/D118765	2022-02-06 18:59:06 +01:00
Benjamin Kramer	a40dc4eaf8	Simplify mask creation with llvm::seq. NFCI.	2022-02-05 23:35:41 +01:00
Sander de Smalen	6452549f30	[DAGCombiner] Fold vecreduce_or/and if operand is insert_subvector. Fold: vecreduce_or(insert_subvec(zeroinitializer, vec)) -> vecreduce_or(vec) vecreduce_and(insert_subvec(allones, vec)) -> vecreduce_and(vec) vecreduce_and/or(insert_subvec(undef, vec)) -> vecreduce_and/or(vec) This is useful for SVE which uses insert/extract subvector to convert fixed-width to/from scalable vectors. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D118919	2022-02-05 14:35:53 +00:00
John Brawn	0d8092dd48	[AArch64] Fix legalization of v1f64 strict_fsetcc and strict_fsetccs These operations are scalarized but the result type v1i1 isn't which needs special handling (the same as is done for the non-strict versions of these operations). Differential Revision: https://reviews.llvm.org/D118258	2022-02-04 12:55:38 +00:00
serge-sans-paille	ffe8720aa0	Reduce dependencies on llvm/BinaryFormat/Dwarf.h This header is very large (3M Lines once expended) and was included in location where dwarf-specific information were not needed. More specifically, this commit suppresses the dependencies on llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used, this has a decent impact on number of preprocessed lines generated during compilation of LLVM, as showcased below. This is achieved by moving some definitions back to the .cpp file, no performance impact implied[0]. As a consequence of that patch, downstream user may need to manually some extra files: llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h In some situations, codes maybe relying on the fact that llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden dependency now needs to be explicit. $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions \| wc -l after: 10978519 before: 11245451 Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup [0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions Differential Revision: https://reviews.llvm.org/D118781	2022-02-04 11:44:03 +01:00
Bjorn Pettersson	3db39e7479	[DAGCombiner] Fix dependency analysis in checkMergeStoreCandidatesForDependencies In the aftermath of D116895 a problem was found in the analysis of dependencies between store merge candidates in checkMergeStoreCandidatesForDependencies, that is needed to avoid the cycles are introduced in the DAG. In the past it has been enough (or assumed to be enough) to start scanning from non-chain operands when analysing the store merge candidates for dependencies, assuming that the analysis of chain dependencies performed when finding the candidates would cover up for potential dependencies that exist involving the chain operands. It was however discovered that one could end up with scenarios such as descibed in the aarch64-checkMergeStoreCandidatesForDependencies.ll test case, when the dependency between two stores is given by a mix of chain operand dependencies and non-chain operand dependencies. The fix in this patch make sure that we also account for chain operand dependencies when doing the more elaborate analysis in checkMergeStoreCandidatesForDependencies, no longer relying on that the earlier check involving chain operands is enough. Differential Revision: https://reviews.llvm.org/D118943	2022-02-04 08:53:01 +01:00
Sander de Smalen	01bfe9729a	[ISEL] Canonicalize STEP_VECTOR to LHS if RHS is a splat. This helps recognise patterns where we're trying to match STEP_VECTOR patterns to INDEX instructions that take a GPR for the Start/Step. The reason for canonicalising this operation to the LHS is because it will already be canonicalised to the LHS if the RHS is a constant splat vector. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D118459	2022-02-03 09:31:46 +00:00
Simon Pilgrim	5aa2acc86b	[DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper None of the external users actual touch these (they're purely used internally down the recursive call) - its trivial to add another wrapper if anything ever does want to track known elements.	2022-02-02 12:04:49 +00:00
Simon Moll	7d926b7177	[VE] LEGALAVL and staged VVP legalization The new LEGALAVL node annotates that the AVL refers to packs of 64bit. We use a two-stage lowering approach with LEGALAVL: First, standard SDNodes are translated into illegal VVP layer nodes. Regardless of source (VP or standard), all VVP nodes have a mask and AVL parameter. The AVL parameter refers to the element position (just as in VP intrinsics). Second, we legalize the AVL usage in VVP layer nodes. If the element size is < 64bit, the EVL parameter has to be adjusted to refer to packs of 64bits. We wrap the legalized AVL in a LEGALAVL node to track this. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D118321	2022-02-02 09:11:41 +01:00
David Green	c89cfbd4dd	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This reverts commit `100763a88f` as it was making incorrect assumptions about implicit zero_extends.	2022-02-01 20:18:40 +00:00
Simon Pilgrim	904395ab8f	[DAG] SimplifyMultipleUseDemandedBits - add default Depth = 0 argument. Simplifies an upcoming change.	2022-02-01 12:34:38 +00:00
Simon Pilgrim	d83a96f59f	[DAG] Make it clear mul(x,x) knownbits bit[1] == 0 check should be for x is undef only As raised on rGffd0e464b4b9, if x is poison, this fold is still ok.	2022-02-01 11:32:14 +00:00
Bjorn Pettersson	3885879046	[DAGCombine] Add simple folds for SSHLSAT/USHLSAT Do "simplifyShift" and "FoldConstantArithmetic" folds for the SSHLSAT and USHLSAT DAG nodes. This includes folds such as: (shlsat undef/poison, x) -> 0 (shlsat x, undef/poison) -> undef (shlsat x, too_large_shamt) -> undef (shlsat 0, x) -> 0 (shlsat x, 0) -> x (shlsat c1, c2) -> c3 Differential Revision: https://reviews.llvm.org/D118603	2022-02-01 10:51:35 +01:00
David Sherwood	daa80339df	[CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors I have updated TargetLowering::isConstTrueVal to also consider SPLAT_VECTOR nodes with constant integer operands. This allows the optimisation to also work for targets that support scalable vectors. Differential Revision: https://reviews.llvm.org/D117210	2022-02-01 09:50:00 +00:00
Philip Reames	57cf29ac1b	[Statepoint] Remove another use of getActualReturnType [NFC] For the cross block gc.result projection case, we only care about the return type if there is a cross block gc.result, and if there is one, we can take the type from the gc.result. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:57:46 -08:00
Philip Reames	6e4f7c0823	[Statepoints] Take result type from gc.result [NFC] When lowering a gc.result, we can assume that the result type of the gc.result matches the type of the underlying call. This is explicitly required in LangRef. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:42:34 -08:00
Philip Reames	093b43f48d	Sink getGCResultLocality to sole use [NFC]	2022-01-31 09:33:57 -08:00
Kerry McLaughlin	002b944dfa	[SVE] Fix TypeSize->uint64_t implicit conversion in visitAlloca() Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca() when we call this function for a scalable alloca instruction, caused by the implicit conversion of TySize to uint64_t. This patch changes TySize to a TypeSize as returned by getTypeAllocSize() and ensures the allocation size is multiplied by vscale for scalable vectors. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D118372	2022-01-31 14:37:23 +00:00
Dávid Bolvanský	ae990a3cbd	[Analysis] Attribute noundef should not prevent tail call optimization Very similar to https://reviews.llvm.org/D101230 Fixes https://github.com/llvm/llvm-project/issues/53501	2022-01-31 15:13:52 +01:00
Simon Pilgrim	7ec8fc2932	[X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements. This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.	2022-01-31 13:58:00 +00:00
Simon Pilgrim	2d1390efbe	[DAG] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero	2022-01-31 12:00:51 +00:00
Simon Pilgrim	48f45f6b25	[X86] Limit mul(x,x) knownbits tests with not undef/poison check We can only assume bit[1] == zero if its the only demanded bit or the source is not undef/poison	2022-01-31 11:55:10 +00:00
Kazu Hirata	2bea207d26	[CodeGen] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-01-30 12:32:51 -08:00
Cullen Rhodes	5d089d9a83	[DAGCombiner] Fix invalid size request in combineRepeatedFPDivisors If we have a vector FP division with a splatted divisor, use getVectorMinNumElements when scaling the num of uses by splat factor. For AArch64 the combine kicks in for the <vscale x 4 x float> case since it's above the fdiv threshold (3) when scaling num uses by splat factor, but the codegen is worse (splat + vector fdiv + vector fmul) than the <vscale x 2 x double> case (splat + vector fdiv). If the combine could be converted into a scalar FP division by scalarizeBinOpOfSplats it may be cheaper, but it looks like this is predicated on the isExtractVecEltCheap TLI function which is implemented for x86 but not AArch64. Perhaps for now combineRepeatedFPDivisors should only scale num uses by splat if the division can be converted into scalar op. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118343	2022-01-28 17:01:08 +00:00
Ellis Hoag	11d3074267	[InstrProf] Add single byte coverage mode Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions * We use a single byte per function rather than 8 bytes per block The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code. When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%). [0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D116180	2022-01-27 17:38:55 -08:00
Simon Pilgrim	fdd3e2c943	[DAG] SelectionDAG::getNode(N1,N2) - detect N2 constant vector splats as well as scalars We already perform some basic folds (add/sub with zero etc.) on scalar types, this patch adds some basic support for constant splats as well in a few cases (we can add more with future test coverage). In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests. I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants. Differential Revision: https://reviews.llvm.org/D118264	2022-01-27 10:59:08 +00:00
Fraser Cormack	84e85e025e	[SelectionDAG][VP] Provide expansion for VP_MERGE This patch adds support for expanding VP_MERGE through a sequence of vector operations producing a full-length mask setting up the elements past EVL/pivot to be false, combining this with the original mask, and culminating in a full-length vector select. This expansion should work for any data type, though the only use for RVV is for boolean vectors, which themselves rely on an expansion for the VSELECT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118058	2022-01-27 09:00:41 +00:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
Sanjay Patel	63daea8b35	[SDAG] fix bug in ComputeNumSignBits of target constant The loop below the changed line assumes that the element width of the target constant is the same as the element width of the loaded value, but that is not always true. We could try harder to do some kind of min/max calc even if the sizes don't match, but that can be another patch if needed. This fixes #53401 (miscompile) and does not change the motivating cases added when this analysis was introduced: `ad298f86b7`	2022-01-26 10:22:41 -05:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a conquence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently not (xor_one_use) pattern is always selected to S_XNOR irrelative od the node divergence. This relies on further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32) on those targets which have no V_XNOR. Current change enables the patterns which explicitly select the not (xor_one_use) to appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
David Green	57356d6bb7	[DAG] Create fptoui.sat from clamped fptoui This is the unsigned variant of D111976, where we convert a clamped fptoui to a fptoui.sat. Because we are unsigned, the condition this time is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D114964	2022-01-26 08:37:44 +00:00
Simon Pilgrim	15e2be291f	[DAG] visitMULHS/MULHU/AND - remove some redundant LHS constant checks Now that we constant fold and canonicalize constants to the RHS, we don't need to check both LHS and RHS for specific constants	2022-01-25 11:54:23 +00:00
Bjorn Pettersson	109cc5adcc	[DAGCombine] Fold SRA of a load into a narrower sign-extending load An sra is basically sign-extending a narrower value. Fold away the shift by doing a sextload of a narrower value, when it is legal to reduce the load width accordingly. Differential Revision: https://reviews.llvm.org/D116930	2022-01-25 12:14:48 +01:00
Fraser Cormack	7cb452bfde	[SelectionDAG][VP] Add widening support for VP_MERGE This patch adds widening support for ISD::VP_MERGE, which widens identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118030	2022-01-25 10:59:40 +00:00
Fraser Cormack	5f5c5603ce	[SelectionDAG][VP] Add splitting support for VP_MERGE This patch adds splitting support for ISD::VP_MERGE, which splits identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118032	2022-01-25 10:33:23 +00:00
Victor Perez	2233befa5d	[LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter Split these nodes in a similar way as their masked versions. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D117760	2022-01-25 10:08:07 +00:00
Paweł Bylica	9d32847b33	[DAGCombine] Remove unused param in combineCarryDiamond(). NFC	2022-01-24 20:57:00 +01:00
Sander de Smalen	699e22a083	[ISEL] Move trivial step_vector folds to FoldConstantArithmetic. Given that step_vector is practically a constant, doing this early helps with DAGCombine folds that happen before type legalization. There is currently no way to test this happens earlier, although existing tests for step_vector folds continue protect the folds happening at all. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117863	2022-01-24 16:37:21 +00:00
Craig Topper	a43ed49f5b	[DAGCombiner][RISCV] Canonicalize (bswap(bitreverse(x))->bitreverse(bswap(x)). If the bitreverse gets expanded, it will introduce a new bswap. By putting a bswap before the bitreverse, we can ensure it gets cancelled out when this happens. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118012	2022-01-24 08:31:53 -08:00
Craig Topper	b8c7cdcc81	[SelectionDAG][RISCV] Teach getNode to fold bswap(bswap(x))->x. This can show up during when bitreverse is expanded to bswap and swap of bits within a byte. If the input is already a bswap, we should cancel them out before we further transform them in a way that makes it harder to see the redundancy. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118007	2022-01-24 08:17:46 -08:00
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit `a97e20a3a8`.	2022-01-24 09:26:52 -05:00
Bjorn Pettersson	46cacdbb21	[DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth In code review for D117104 two slightly weird checks were found in DAGCombiner::reduceLoadWidth. They were typically checking if BitsA was a mulitple of BitsB by looking at (BitsA & (BitsB - 1)), but such a comparison actually only make sense if BitsB is a power of two. The checks were related to the code that attempted to shrink a load based on the fact that the loaded value would be right shifted. Afaict the legality of the value types is checked later (typically in isLegalNarrowLdSt), so the existing checks were both overly conservative as well as being wrong whenever ExtVTBits wasn't a power of two. The latter was a situation triggered by a number of lit tests so we could not just assert on ExtVTBIts being a power of two). When attempting to simply remove the checks I found some problems, that seems to have been guarded by the checks (maybe just out of luck). A typical example would be a pattern like this: t1 = load i96* ptr t2 = srl t1, 64 t3 = truncate t2 to i64 When DAGCombine is visiting the truncate reduceLoadWidth is called attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then the SRL is detected and we set ShAmt to 64. In the past we've bailed out due to i96 not being a multiple of 64. If we simply remove that check then we would end up replacing the load with a new load that would read 64 bits but with a base pointer adjusted by 64 bits. So we would read 32 bits the wasn't accessed by the original load. This patch will instead utilize the fact that the logical left shift can be folded away by using a zextload. Thus, the pattern above will now be combined into t3 = load i32* ptr+offset, zext to i64 Another case is shown in the X86/shift-folding.ll test case: t1 = load i32* ptr t2 = srl i32 t1, 8 t3 = truncate t2 to i16 In the past we bailed out due to the shift count (8) not being a multiple of 16. Now the narrowing kicks in and we get t3 = load i16* ptr+offset Differential Revision: https://reviews.llvm.org/D117406	2022-01-24 12:22:04 +01:00
Nikita Popov	e7c9a6cae0	[SDAG] Don't move DBG_VALUE instructions after insertion point during scheduling (PR53243) EmitSchedule() shouldn't be touching instructions after the provided insertion point. The change introduced in D83561 performs a scan to the end of the block, and thus may move unrelated instructions. In particular, this ends up moving instructions that have been produced by FastISel and will later be deleted. Moving them means that more instructions than intended are removed. Fix this by stopping the iteration when the insertion point is reached. Fixes https://github.com/llvm/llvm-project/issues/53243. Differential Revision: https://reviews.llvm.org/D117489	2022-01-24 10:50:49 +01:00
Sander de Smalen	4f8fdf7827	[ISEL] Canonicalise constant splats to RHS. SelectionDAG::getNode() canonicalises constants to the RHS if the operation is commutative, but it doesn't do so for constant splat vectors. Doing this early helps making certain folds on vector types, simplifying the code required for target DAGCombines that are enabled before Type legalization. Somewhat to my surprise, DAGCombine doesn't seem to traverse the DAG in a post-order DFS, so at the time of doing some custom fold where the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the constant splat to the RHS. This patch leads to a few improvements, but also a few minor regressions, which I traced down to D46492. When I tried reverting this change to see if the changes were still necessary, I ran into some segfaults. Not sure if there is some latent bug there. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117794	2022-01-24 09:38:36 +00:00
Simon Pilgrim	accc07e654	[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312) Fixes parity codegen issue where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with a ext/trunc Differential Revision: https://reviews.llvm.org/D117983	2022-01-23 16:36:25 +00:00
Simon Pilgrim	6605057992	Revert rG7c66aaddb128dc0f342830c1efaeb7a278bfc48c "[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312)" Noticed a typo in the getBooleanContents call just after I pressed commit :(	2022-01-23 16:28:44 +00:00
Simon Pilgrim	7c66aaddb1	[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312) Fixes parity codegen issue where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with a ext/trunc Differential Revision: https://reviews.llvm.org/D117983	2022-01-23 16:20:42 +00:00
David Green	b27e5459d5	[DAG] Convert truncstore(extend(x)) back to store(x) Pulled out of D106237, this folds truncstore(extend(x)) back to store(x) if the original store was legal. This can come up due to the order we fold nodes. A fold from X86 needs to be adjusted to prevent infinite loops, to have it pick the operand of a trunc more directly. Differential Revision: https://reviews.llvm.org/D117901	2022-01-22 13:20:36 +00:00
Craig Topper	9abc593e98	[TargetLowering][InstCombine] Simplify BSwap demanded bits code a little. NFC Use alignDown instead of &= ~7. Replace ResultBit with NLZ. (BitWidth - NLZ - NTZ == 8) so (BitWidth - NTZ - 8 == NLZ). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117804	2022-01-20 10:45:17 -08:00
Victor Perez	c10c748878	[LegalizeTypes][VP] Add widening support for vp.gather and vp.scatter Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117557	2022-01-20 08:57:57 +00:00
Simon Pilgrim	d6fee6c3b0	[DAG] SelectionDAG::computeKnownBits - add mul(x,x) self-multiply handling (PR48683) Pass the SelfMultiply flag to KnownBits::mul() - added at D108992 https://alive2.llvm.org/ce/z/NN_eaR	2022-01-19 17:39:32 +00:00
David Green	100763a88f	[DAG] Extend SearchForAndLoads with any_extend handling This extends the code in SearchForAndLoads to be able to look through ANY_EXTEND nodes, which can be created from mismatching IR types where the AND node we begin from only demands the low parts of the register. That turns zext and sext into any_extends as only the low bits are demanded. To be able to look through ANY_EXTEND nodes we need to handle mismatching types in a few places, potentially truncating the mask to the size of the final load. Recommitted with a more conservative check for the type of the extend. Differential Revision: https://reviews.llvm.org/D117457	2022-01-18 21:03:08 +00:00
Fraser Cormack	c8e33978fb	[VP] Propagate align parameter attr on VP gather/scatter to ISel This patch fixes a case where the 'align' parameter attribute on the pointer operands to llvm.vp.gather and llvm.vp.scatter was being dropped during the conversion to the SelectionDAG. The default alignment equal to the ABI type alignment of the vector type was kept. It also updates the documentation to reflect the fact that the parameter attribute is now properly supported. The default alignment of these intrinsics was previously documented as being equal to the ABI alignment of the scalar type, when in fact that wasn't the case: the ABI alignment of the vector type was used instead. This has also been fixed in this patch. Reviewed By: simoll, craig.topper Differential Revision: https://reviews.llvm.org/D114423	2022-01-18 17:33:24 +00:00
Sanjay Patel	870591200d	[SDAG] remove duplicate functionality when getting shift type for demanded bits; NFCI This was noted as a potential cleanup in D117508. getShiftAmountTy() has checks for vector, phase, etc. so it should handle anything that the caller was trying to account for.	2022-01-18 12:13:45 -05:00
Victor Perez	b7bf96a258	[LegalizeTypes][VP] Add widening support for vp.reduce.* When widening these intrinsics, we do not have to insert neutral elements at the end of the vector as when widening vector.reduce.* intrinsics, thanks to vector predication semantics. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117467	2022-01-18 10:21:01 +00:00
Hans Wennborg	f4615feaa1	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This caused builds to fail with llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:5638: bool (anonymous namespace)::DAGCombiner::BackwardsPropagateMask(llvm::SDNode *): Assertion `NewLoad && "Shouldn't be masking the load if it can't be narrowed"' failed. See the code review for a link to a reproducer. > This extends the code in SearchForAndLoads to be able to look through > ANY_EXTEND nodes, which can be created from mismatching IR types where > the AND node we begin from only demands the low parts of the register. > That turns zext and sext into any_extends as only the low bits are > demanded. To be able to look through ANY_EXTEND nodes we need to handle > mismatching types in a few places, potentially truncating the mask to > the size of the final load. > > Differential Revision: https://reviews.llvm.org/D117457 This reverts commit `578008789f`.	2022-01-18 10:50:55 +01:00
Victor Perez	fd1dce35bd	[LegalizeTypes][VP] Add splitting support for vp.reduction.* Split vp.reduction.* intrinsics by splitting the vector to reduce in two halves, perform the reduction operation in each one of them and accumulate the results of both operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117469	2022-01-18 09:29:24 +00:00
David Sherwood	f4515ab858	Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants" This reverts commit `197f3c0deb`. Reverting after miscompilation errors discovered with ffmpeg.	2022-01-18 08:40:20 +00:00
Sanjay Patel	ba6485e25f	[SDAG] add demanded bits transform for bswap A possible codegen regression for PowerPC is noted in D117406 because we don't recognize a pattern that demands only 1 byte from a bswap. This fold has existed in IR since close to the beginning of LLVM: https://github.com/llvm/llvm-project/blame/main/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L794 ...so this patch copies that code as much as possible and adapts it for SDAG. The test for PowerPC that would change in D117406 is over-reduced with undefs, so I recreated it for AArch64 and x86 by passing in pointer args and renamed the values to make the logic clearer. Differential Revision: https://reviews.llvm.org/D117508	2022-01-17 18:25:42 -05:00
David Green	578008789f	[DAG] Extend SearchForAndLoads with any_extend handling This extends the code in SearchForAndLoads to be able to look through ANY_EXTEND nodes, which can be created from mismatching IR types where the AND node we begin from only demands the low parts of the register. That turns zext and sext into any_extends as only the low bits are demanded. To be able to look through ANY_EXTEND nodes we need to handle mismatching types in a few places, potentially truncating the mask to the size of the final load. Differential Revision: https://reviews.llvm.org/D117457	2022-01-17 15:25:11 +00:00
David Sherwood	197f3c0deb	[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants When we know the value we're extending is a negative constant then it makes sense to use SIGN_EXTEND because this may improve code quality in some cases, particularly when doing a constant splat of an unpacked vector type. For example, for SVE when splatting the value -1 into all elements of a vector of type <vscale x 2 x i32> the element type will get promoted from i32 -> i64. In this case we want the splat value to sign-extend from (i32 -1) -> (i64 -1), whereas currently it zero-extends from (i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use a single mov immediate instruction. New tests added here: CodeGen/AArch64/sve-vector-splat.ll I believe we see some code quality improvements in these existing tests too: CodeGen/AArch64/reduce-and.ll CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only occur because the test disables codegen prepare and branch folding. Differential Revision: https://reviews.llvm.org/D114357	2022-01-17 11:08:57 +00:00
Bjorn Pettersson	9f237c9e7d	[DAGCombine] Refactor DAGCombiner::ReduceLoadWidth. NFCI Update code comments in DAGCombiner::ReduceLoadWidth and refactor the handling of SRL a bit. The refactoring is done with the intent of adding support for folding away SRA by using SEXTLOAD in a follow-up patch. The function is also renamed as DAGCombiner::reduceLoadWidth. Differential Revision: https://reviews.llvm.org/D117104	2022-01-16 20:24:52 +01:00
Fangrui Song	5456249736	[SelectionDAG] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D117235	2022-01-15 17:13:09 -08:00
Nikita Popov	c63a3175c2	[AttrBuilder] Remove ctor accepting AttributeList and Index Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should exact the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.	2022-01-15 22:39:31 +01:00
Fraser Cormack	877d1b3d07	[SelectionDAG][VP] Add splitting/widening for VP_LOAD and VP_STORE Original patch by @hussainjk. This patch was split off from D109377 to keep vector legalization (widening/splitting) separate from vector element legalization (promoting). While the original patch added a third overload of SelectionDAG::getVPStore, this patch takes the liberty of collapsing those all down to 1, as three overloads seems excessive for a little-used node. The original patch also used ModifyToType in places, but that method still crashes on scalable vector types. Seeing as the other VP legalization methods only work when all operands need identical widening, this patch follows in that vein. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117235	2022-01-15 11:41:29 +00:00
Craig Topper	e0841f6920	[SelectionDAGBuilder] Remove unneeded vector bitcast from visitTargetIntrinsic. This seems to be a leftover from a long time ago when there was an ISD::VBIT_CONVERT and a MVT::Vector. It looks like in those days the vector type was carried in a VTSDNode. As far as I know, these days ComputeValueTypes would have already assigned "Result" the same type we're getting from TLI.getValueType here. Thus the BITCAST is always a NOP. Verified by adding an assert and running check-llvm. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D117335	2022-01-14 12:52:49 -08:00
James Y Knight	a97e20a3a8	Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This commit sometimes causes a crash when compiling a vtable thunk. E.g.: clang '--target=aarch64-grtev4-linux-gnu' -xc++ - -c -o /dev/null <<EOF struct a { virtual int f(); }; struct c { virtual int &g() const; }; struct d : a, c { int &g() const; }; int &d::g() const {} EOF Some follow-up commits have been reverted as well: Revert "IR: Make getRetAlign check callee function attributes" Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC." Revert "Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC." This reverts commit `4f414af6a7`. This reverts commit `a5507d2e25`. This reverts commit `3d2d208f6a`. This reverts commit `07ddfa95e3`.	2022-01-14 04:50:07 +00:00
David Sherwood	ba471ba8d2	Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants" This reverts commit `31009f0b5a`. It seems to be causing SVE VLA buildbot failures and has introduced a genuine regression. Reverting for now.	2022-01-13 15:59:43 +00:00
David Sherwood	31009f0b5a	[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants When we know the value we're extending is a negative constant then it makes sense to use SIGN_EXTEND because this may improve code quality in some cases, particularly when doing a constant splat of an unpacked vector type. For example, for SVE when splatting the value -1 into all elements of a vector of type <vscale x 2 x i32> the element type will get promoted from i32 -> i64. In this case we want the splat value to sign-extend from (i32 -1) -> (i64 -1), whereas currently it zero-extends from (i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use a single mov immediate instruction. New tests added here: CodeGen/AArch64/sve-vector-splat.ll I believe we see some code quality improvements in these existing tests too: CodeGen/AArch64/dag-numsignbits.ll CodeGen/AArch64/reduce-and.ll CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only occur because the test disables codegen prepare and branch folding. Differential Revision: https://reviews.llvm.org/D114357	2022-01-13 09:43:07 +00:00
Matt Arsenault	07ddfa95e3	GlobalISel: Add G_ASSERT_ALIGN hint instruction Insert it for call return values only for now, which is the only case the DAG handles also.	2022-01-12 18:20:58 -05:00
Craig Topper	63b17eb9ec	[RISCV] Add strictfp support for compares. This adds support for STRICT_FSETCC(quiet) and STRICT_FSETCCS(signaling). FEQ matches well to STRICT_FSETCC oeq. FLT/FLE matches well to STRICT_FSETCCS olt/ole. Others require commuting operands or multiple instructions. STRICT_FSETCC olt/ole/ogt/oge/ult/ule/ugt/uge uses FLT/FLE, but we need to save/restore FFLAGS around them to avoid spurious exceptions. I've implemented pseudo instructions with a CustomInserter to insert the save/restore CSR instructions. Unfortunately, this doesn't honor exceptions for signaling NANs but I'm not sure if signaling nans are really supported by the constrained intrinsics. STRICT_FSETCC one and ueq expand to a pair of FLT instructions with a save/restore of fflags around each. This could be improved in the future. There may be some opportunities to generate better code for strict comparisons mixed with nonans fast math flags. I've left FIXMEs in the .td files for that. Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com> Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D116694	2022-01-11 20:01:41 -08:00
Nick Desaulniers	4edb9983cb	[SelectionDAG] treat X constrained labels as i for asm Completely rework how we handle X constrained labels for inline asm. X should really be treated as i. Then existing tests can be moved to use i D115410 and clang can just emit i D115311. (D115410 and D115311 are callbr, but this can be done for label inputs, too). Coincidentally, this simplification solves an ICE uncovered by D87279 based on assumptions made during D69868. This is the third approach considered. See also discussions v1 (D114895) and v2 (D115409). Reported-by: kernel test robot <lkp@intel.com> Fixes: https://github.com/ClangBuiltLinux/linux/issues/1512 Reviewed By: void, jyknight Differential Revision: https://reviews.llvm.org/D115688	2022-01-11 10:29:40 -08:00
David Sherwood	51497dc0b2	[IR] Change vector.splice intrinsic to reject out-of-bounds indices I've changed the definition of the experimental.vector.splice instrinsic to reject indices that are known to be or possibly out-of-bounds. In practice, this means changing the definition so that the index is now only valid in the range [-VL, VL-1] where VL is the known minimum vector length. We use the vscale_range attribute to take the minimum vscale value into account so that we can permit more indices when the attribute is present. The splice intrinsic is currently only ever generated by the vectoriser, which will never attempt to splice vectors with out-of-bounds values. Changing the definition also makes things simpler for codegen since we can always assume that the index is valid. This patch was created in response to review comments on D115863 Differential Revision: https://reviews.llvm.org/D115933	2022-01-11 09:37:39 +00:00
Nick Desaulniers	649b11ef8b	git-clang-format HEAD~	2022-01-10 18:34:30 -08:00
Nick Desaulniers	301e911740	[TargetLowering] precommit refactor from D115688 NFC Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>	2022-01-10 18:32:13 -08:00
Nadav Rotem	e2cc091a7d	Fix a missed opportunity to merge stores. This commit fixes a missed opportunity in merging consecutive stores. The code that searches for stores skipped the case of stores that directly connect to the root. The comment above the implementation lists this case but the code did not handle it. I found this pattern when looking into the shared_ptr destructor. GCC generates the right sequence. Here is a small repo: int foo(int* buff) { buff[0] = 0; int x = buff[1]; buff[1] = 0; return x; } Differential Revision: https://reviews.llvm.org/D116895	2022-01-10 13:49:02 -08:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using and std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Chen Zheng	2c46ca96e2	[PowerPC] fast isel can lower intrinsics call on AIX. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D114778	2022-01-10 02:30:05 +00:00
Craig Topper	a500f7f48f	[SelectionDAG] Add FP_TO_UINT_SAT/FP_TO_SINT_SAT to computeKnownBits/computeNumSignBits. These nodes should saturate to their saturating VT. We can use this information to know the bits past the VT are all zeros or all sign bits. I think we might only have test coverage for the unsigned case. I'll verify and add tests. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D116870	2022-01-09 17:48:05 -08:00
Nikita Popov	0312fe2901	[CodeGen] Support opaque pointers for inline asm This is the last part of D116531. Fetch the type of the indirect inline asm operand from the elementtype attribute, rather than the pointer element type. Fixes https://github.com/llvm/llvm-project/issues/52928.	2022-01-07 10:57:38 +01:00
Nikita Popov	e4d1779990	[IR] Add ConstraintInfo::hasArg() helper (NFC) Checking whether a constraint corresponds to an argument is a recurring pattern.	2022-01-07 10:44:38 +01:00
Victor Perez	38efa68b08	[LegalizeTypes][VP] Add splitting support for vp.select Split vp.select in a similar way as vselect, splitting also the length parameter. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116651	2022-01-07 08:46:01 +00:00
Kazu Hirata	2aed08131d	[llvm] Use true/false instead of 1/0 (NFC) Identified with modernize-use-bool-literals.	2022-01-07 00:39:14 -08:00
Craig Topper	88ecdd30f6	[LegalizeTypes] Remove IsVP argument from type legalization methods. NFC We can either check the opcode or number of operands or use ISD::isVPOpcode inside the methods. In some places I've used number of operands figuring that it is cheaper than isVPOpcode. I've included isVPOpcode in an assert to verify. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D116578	2022-01-05 09:00:48 -08:00
Victor Perez	96e220e688	[LegalizeTypes][VP] Add integer promotion support for vp.select Promote select, vselect and vp.select in a similar way. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116400	2022-01-05 11:01:52 +00:00
Victor Perez	df5226dfb3	[LegalizeTypes][VP] Add widening support for vp.select Widen vp.select the same way as select and vselect. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116407	2022-01-05 09:21:11 +00:00
Craig Topper	a04b532505	[LegalizeIntegerTypes][RISCV] Teach PromoteSetCCOperands to check sign bits of unsigned compares. Unsigned compares work with either zero extended or sign extended inputs just like equality comparisons. I didn't allow this when I refactored the code in D116421 due to lack of tests. But I've since found a simple C test case that demonstrates when this can be useful. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D116617	2022-01-04 12:38:47 -08:00
Simon Moll	4c2aba999e	[VP][ISel] use LEGALPOS for legalization action Use the VPIntrinsics.def's LEGALPOS that is specified with every VP SDNode to determine which return or operand value type shall be used to infer the legalization action. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D116594	2022-01-04 14:50:49 +01:00
Simon Pilgrim	882c083889	[DAG] TargetLowering::SimplifySetCC - use APInt::getMinSignedBits() helper. NFC.	2022-01-04 13:48:36 +00:00
Craig Topper	cbcbbd6ac8	[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits. Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up. Rename KnownBits::countMaxSignedBits->countMaxSignificantBits. Reviewed By: lebedev.ri, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D116522	2022-01-03 11:33:30 -08:00
Victor Perez	5527139302	[RISCV][VP] Add RVV codegen for [nX]vXi1 vp.select Expand [nX]vXi1 vp.select the same way as [nX]vXi1 vselect. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115546	2022-01-02 23:12:32 -08:00
Kazu Hirata	69ccc96162	[llvm] Use the default constructor for SDValue (NFC)	2022-01-01 10:36:59 -08:00
Craig Topper	243b7aaf51	[SelectionDAG] Use KnownBits::countMinSignBits() to simplify the end of ComputeNumSignBits. This matches what is done in ValueTracking.cpp Reviewed By: RKSimon, foad Differential Revision: https://reviews.llvm.org/D116423	2021-12-31 17:29:57 -08:00
Craig Topper	d00e438cfe	[RISCV][LegalizeIntegerTypes] Teach PromoteSetCCOperands not to sext i32 comparisons for RV64 if the promoted values are already zero extended. This is similar to what is done for targets that prefer zero extend where we avoid using a zero extend if the promoted values are sign extended. We'll also check for zero extended operands for ugt, ult, uge, and ule when the target prefers sign extend. This is different than preferring zero extend, where we only check for sign bits on equality comparisons. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D116421	2021-12-31 17:15:20 -08:00
Craig Topper	7d659c6ac7	[LegalizeIntegerTypes] Rename NewLHS/NewRHS arguments to DAGTypeLegalizer::PromoteSetCCOperands. NFC The 'New' only makes sense in the context of these being output arguments, but they are also used as inputs first. Drop the 'New' and just call them LHS/RHS. Factored out of D116421.	2021-12-30 15:31:43 -08:00
Craig Topper	15787ccd45	[RISCV] Add support for STRICT_LRINT/LLRINT/LROUND/LLROUND. Tests for other strict intrinsics. This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND. It also adds test cases for f32 and f64 constrained intrinsics that correspond to the intrinsics in float-intrinsics.ll and double-intrinsics.ll. Support for promoting the integer argument of STRICT_FPOWI was added. I've skipped adding tests for f16 intrinsics, since we don't have libcalls for them and we have inconsistent support for promoting them in LegalizeDAG. This will need to be examined more closely. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D116323	2021-12-30 11:54:32 -08:00
Craig Topper	1c6b740d4b	[TargetLowering] Remove workaround for old behavior of getShiftAmountTy. NFC getShiftAmountTy used to directly return the shift amount type from the target which could be too small for large illegal types. For example, X86 always returns i8. The code here detected this and used i32 instead if it won't fit. This behavior was added to getShiftAmountTy in D112469 so we no longer need this workaround.	2021-12-28 14:08:25 -08:00
Simon Pilgrim	71fc4bbdd2	[X86][SSE] Add ISD::ROTR support Fix issue in TargetLowering::expandROT where we only attempt to flip a rotation if the other direction has better support - this matches TargetLowering::expandFunnelShift This allows us to enable ISD::ROTR lowering on SSE targets, which particularly simplifies/improves codegen for splat amount and AVX2 per-element shifts.	2021-12-23 15:07:30 +00:00
Shivam Gupta	0489e89119	[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience When the source has a series of assignments, users reasonably want to have the debugger step through each one individually. Turn off the combine for adjacent stores so we get this behavior at -O0. Similar to D7181. Reviewed By: spatel, xgupta Differential Revision: https://reviews.llvm.org/D115808	2021-12-23 10:48:28 +05:30
Simon Pilgrim	4639461531	[DAG][X86] Add TargetLowering::isSplatValueForTargetNode override Add callback to enable us to test target nodes if they are splat vectors Added some basic X86ISD::VBROADCAST + X86ISD::VBROADCAST_LOAD handling	2021-12-22 16:57:44 +00:00
Simon Pilgrim	592e89e636	[DAG] Constify SelectionDAG::isSplatValue() This doesn't generate any nodes so should be usable by methods with const SelectionDAG &.	2021-12-21 11:19:23 +00:00
Sami Tolvanen	5dc8aaac39	[llvm][IR] Add no_cfi constant With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces function references with CFI jump table references, which is a problem for low-level code that needs the address of the actual function body. For example, in the Linux kernel, the code that sets up interrupt handlers needs to take the address of the interrupt handler function instead of the CFI jump table, as the jump table may not even be mapped into memory when an interrupt is triggered. This change adds the no_cfi constant type, which wraps function references in a value that LowerTypeTestsModule::replaceCfiUses does not replace. Link: https://github.com/ClangBuiltLinux/linux/issues/1353 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D108478	2021-12-20 12:55:32 -08:00
Shivam Gupta	eb66f0662a	Revert "[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience" This reverts commit `731bde1ed3`.	2021-12-20 21:43:40 +05:30
Shivam Gupta	731bde1ed3	[DAGCombiner] Avoid combining adjacent stores at -O0 to improve debug experience When the source has a series of assignments, users reasonably want to have the debugger step through each one individually. Turn off the combine for adjacent stores so we get this behavior at -O0. Similar to D7181. Differential Revision: https://reviews.llvm.org/D115808	2021-12-19 20:58:49 +05:30
Simon Pilgrim	efec3a26b4	[DAG] visitADDSAT/visitSUBSAT - merge scalar/vector canonicalization and constant folding. Match order of most of the other integer opcode combines	2021-12-19 13:19:40 +00:00
Simon Pilgrim	c1340b9e78	[DAG] Improve FMINNUM/FMAXNUM/FMINIMUM/FMAXIMUM constant folding Merge the node combines into a common DAGCombiner::visitFMinMax (like we do for IMINMAX). Move the constant folding into SelectionDAG::foldConstantFPMath. This allows us to fold the vecreduce-propagate-sd-flags.ll test as it reduces constants - so I've refactored it to take variables instead. Differential Revision: https://reviews.llvm.org/D115952	2021-12-19 11:45:51 +00:00
Kazu Hirata	fee57711fe	Use DenseMap::lookup (NFC)	2021-12-17 18:19:25 -08:00
Sanjay Patel	79932211f9	[SDAG] remove FP-to-int cast attribute check in fold to FTRUNC We were using a function attribute to indicate a non-standard FP mode, but now we can use intrinsics for that job as shown in the new tests. Presumably the x86 asm could be improved for that IR with intrinsics, but I have not worked out exactly how to do that. Note that the transform to FTRUNC still requires a hacky check for "nsz" (because FMF are not applied to FP casts). This is a cleanup based on the clang change in D115804 / `8c7f2a4f87` . This is effectively a revert of `5a90285bd9` + D46237 . Differential Revision: https://reviews.llvm.org/D115885	2021-12-17 16:01:37 -05:00
Kazu Hirata	90bd4873d6	[CodeGen] Fix an unused variable warning This patch fixes: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:22617:11: error: unused variable 'Ops' [-Werror,-Wunused-variable]	2021-12-17 09:43:42 -08:00
Simon Pilgrim	35c7b1aeae	[DAG] SimplifyVBinOp - remove FoldConstantArithmetic call. Constant folding (scalar/vector) is now consistently handled before the SimplifyVBinOp calls.	2021-12-17 17:22:23 +00:00
Simon Pilgrim	f602723bfa	[DAG] Constant fold + canonicalize fp binops before SimplifyVBinOp call Replace custom constant scalar/splat folding with FoldConstantArithmetic call and canonicalize commutative constant ops to the RHS before the SimplifyVBinOp call	2021-12-17 17:02:54 +00:00
Simon Pilgrim	9d2994311a	[DAG] Move foldConstantFPMath() inside FoldConstantArithmetic Further merging of integer and fp constant folding paths. This allows us to handle undef vector arguments the same as scalar cases.	2021-12-17 16:06:41 +00:00
Simon Pilgrim	512ab9968d	[DAG] foldConstantFPMath - fold vector splats as well as scalar constants	2021-12-17 15:19:26 +00:00
Simon Pilgrim	52611702ea	Revert rG22dbc7a48bf7a3942a7e5ff57977ef828d240bd3 "[DAG] foldConstantFPMath - fold vector splats as well as scalar constants" A followup patch uncovered an issue with allowing undef elements in the splat - I will reapply this with a fixed implementation.	2021-12-17 15:19:25 +00:00
David Truby	5c9684704d	[DAG][sve] Lowering for VLS masked truncating stores This extends the custom lowering for truncating stores on fixed length vectors in SVE to support masked truncating stores. It also adds a DAG combine for truncates followed by masked stores. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D108115	2021-12-17 15:04:45 +00:00
Simon Pilgrim	22dbc7a48b	[DAG] foldConstantFPMath - fold vector splats as well as scalar constants	2021-12-17 14:24:36 +00:00
Simon Pilgrim	d91b5b0f57	[DAG] foldConstantFPMath - use APFloat& for read-only constant fold arg. NFC. We just need to copy the 1st arg (which we use for the constant fold result) - use a cheaper const reference for the 2nd arg.	2021-12-17 12:34:03 +00:00
Simon Pilgrim	42f00106b7	[DAG] Constant fold + canonicalize integer binops before SimplifyVBinOp call SimplifyVBinOp still has a FoldConstantArithmetic call, which now it isn't vector specific we should be able to remove (once fp binops are tidied up); but we can at least clean up the integer opcodes to perform the basic constant/undef handling in common code first.	2021-12-17 12:02:27 +00:00
Nathan Sidwell	dd073e08ae	Avoid by-value copies of referenced objects These were detected by the new -Wauto-by-value-copy (D114989) warning, these by-value constant copies need only be references. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D114990	2021-12-16 07:22:46 -08:00
Simon Pilgrim	b88f4f271b	[DAG] SelectionDAG::isSplatValue - add *_EXTEND_VECTOR_INREG handling Fixes #52719	2021-12-15 12:26:39 +00:00
Chen Zheng	062d9b7d43	[LegalizeVectorOps] code refactor for LegalizeOp; NFC Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115636	2021-12-14 03:45:53 +00:00
Chen Zheng	8c107bee70	[LegalizeVectorOps] fix a typo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115637	2021-12-14 00:22:58 +00:00
Roman Lebedev	c1a36ba002	[DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask') In most test changes this allows us to drop some broadcasts/shuffles. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104156	2021-12-13 20:03:44 +03:00
Peter Waller	921e89c59a	[SVE] Only combine (fneg (fma)) => FNMLA with nsz -(Za + Zm * Zn) != (-Za + Zm * (-Zn)) when the FMA produces a zero output (e.g. all zero inputs can produce -0 output) Add a PatFrag to check presence of nsz on the fneg, add tests which ensure the combine does not fire in the absense of nsz. See https://reviews.llvm.org/D90901 for a similar discussion on X86. Differential Revision: https://reviews.llvm.org/D109525	2021-12-13 11:33:07 +00:00
Fraser Cormack	b0319ab79b	[PR52475] Ensure a correct chain in copies to/from hidden sret parameter This patch fixes an issue during SelectionDAG construction. When the target is unable to lower the function's return value, a hidden sret parameter is created. It is initialized and copied to a stored variable (DemoteRegister) with CopyToReg and is later fetched with CopyFromReg. The bug is that the chains used for each copy are inconsistent, and thus in rare cases the scheduler may issue them out of order. The fix is to ensure that the CopyFromReg uses the DAG root which is set as the chain corresponding to the initial CopyToReg. Fixes https://llvm.org/PR52475 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D114795	2021-12-13 10:46:32 +00:00
David Sherwood	652faed353	[CodeGen] Improve SelectionDAGBuilder lowering code for get.active.lane.mask intrinsic Previously we were using UADDO to generate a two-result value with the unsigned addition and the overflow mask. We then combined the overflow mask with the trip count comparison to get a result. However, we don't need to do this - we can simply use a UADDSAT saturating add node to add the vector index splat and the stepvector together. Then we can just compare this to a splat of the trip count. This results in overall better code quality for both Thumb2 and AArch64. Differential Revision: https://reviews.llvm.org/D115354	2021-12-10 13:39:38 +00:00
Kazu Hirata	f829630d2e	[llvm] Use llvm::count (NFC)	2021-12-09 20:50:38 -08:00
David Sherwood	3257f63bbd	[NFC][CodeGen] Remove rarely used DL variable from SelectionDAGBuilder There is a pointer to the DataLayout in SelectionDAGBuilder called 'DL' that is hardly ever used. In most cases the code seems to just use `DAG.getDataLayout()` instead. Given that DL is also often used as a shadowed variable for the debug location it seems sensible to just kill off the few remaining uses and be consistent with the rest of the code. Differential Revision: https://reviews.llvm.org/D114451	2021-12-08 17:05:46 +00:00
David Green	5d7efd4758	[SDAG] Refine MMO size when converting masked load/store to normal load/store After D113888 / `32b6c17b29` the MMO size of a masked loads/store is unknown. When we are converting back to a standard load/store because the mask is known all ones, we can refine that to the correct size from the size of the vector being loaded/stored. Differential Revision: https://reviews.llvm.org/D114582	2021-12-08 10:13:25 +00:00
Simon Pilgrim	d298c32407	Remove unused variable. NFC.	2021-12-07 18:37:07 +00:00
Simon Pilgrim	52d2f35323	[DAG] Update expandFunnelShift/expandROT to return the expansion directly. NFCI. Don't return a bool to indicate if the expansion was successful, just return the SDValue result directly, like we do for most other basic expansions.	2021-12-07 18:09:43 +00:00
Kazu Hirata	630c847b1b	[llvm] Use range-based for loops (NFC)	2021-12-07 09:17:03 -08:00
Fraser Cormack	40d51de5cb	[SelectionDAG] Use UnknownSize for VP memory ops In the style of D113888, this patch updates the various VP memory operations (load, store, gather, scatter) to use UnknownSize. This is for the same reason as for masked loads and stores: the number of elements accessed is not generally known at compile time. This is somewhat pessimistic in the sense that we may still find un-canonicalized intrinsics featuring both an all-true mask and an EVL equal to the vector size. Arguably those should be canonicalized before the SelectionDAG, so those have been left for future work. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D115036	2021-12-07 10:51:02 +00:00
Fraser Cormack	3460cc2585	[VP] Propagate align parameter attr on VP load/store to ISel This patch fixes a case where the 'align' parameter attribute on the pointer operands to llvm.vp.load and llvm.vp.store was being dropped during the conversion to the SelectionDAG. The default alignment equal to the ABI type alignment of the vector type was kept. It also updates the documentation to reflect the fact that the parameter attribute is now properly supported. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D114422	2021-12-07 10:16:16 +00:00
Kazu Hirata	c4a8928b51	[CodeGen] Use range-based for loops (NFC)	2021-12-06 08:49:10 -08:00
David Green	57ff805a6d	[DAG] Create fptoui.sat from clamped fptosi As an extension to D111976, this converts clamp fptosi, clamped between 0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with conversions that naturally saturate, such as Arm. X86 disables the transform as some of the test cases increases in size. A fptoui.sat necessitates a fp clamp without native support, so there is little use in converting if the instruction is just going to be expanded. Differential Revision: https://reviews.llvm.org/D112428	2021-12-05 09:25:52 +00:00
Simon Pilgrim	ebf5271918	[DAG] PromoteIntRes_FunnelShift - rename shift Amount variable to Amt to prevent line overflow. NFC.	2021-12-03 17:24:45 +00:00
David Green	255ad73424	[ARM] Make MVE v2i1 predicates legal MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the two halves. This was never treated as a legal type in llvm in the past as there are not many 64bit instructions and no 64bit compares. There are a few instructions that could use it though, notably a VSELECT (as it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for similar reasons, some gathers/scatter and long multiplies and VCTP64 instructions. This patch goes through and makes v2i1 a legal type, handling all the cases that fall out of that. It also makes VSELECT legal for v2i64 as a side benefit. A lot of the codegen changes as a result - usually in way that is a little better or a little worse, but still expensive. Costs can change a little too in the process, again in a way that expensive things remain expensive. A lot of the tests that changed are mainly to ensure correctness - the code can hopefully be improved in the future where it comes up in practice. The intrinsics currently remain using the v4i1 they previously did to emulate a v2i1. This will be changed in a followup patch but this one was already large enough. Differential Revision: https://reviews.llvm.org/D114449	2021-12-03 14:05:41 +00:00
Jay Foad	d133a21b71	[SelectionDAG] Add newline to a debug message	2021-12-03 13:33:32 +00:00
Simon Pilgrim	6803d08c38	[DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case. Differential Revision: https://reviews.llvm.org/D114676	2021-12-02 11:47:53 +00:00
Omer Aviram	617ad14060	[SelectionDAG] Add pattern to haveNoCommonBitsSet Correctly identify the following pattern, which has no common bits: (X & ~M) op (Y & M). Differential Revision: https://reviews.llvm.org/D113970	2021-12-01 12:04:04 -05:00
Simon Pilgrim	19d34f6e95	[X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation. But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation. This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result. Differential Revision: https://reviews.llvm.org/D113371	2021-12-01 16:37:49 +00:00
Bradley Smith	0eb1efb92c	[DAGCombiner] When combining REM ensure optimized div nodes are unique The REM DAG combine uses the visitDivLike functions to try and get an optimized DIV node to provide better codegen, however in some cases this visitDivLike call ends up in the BuildSDIVPow2 target hook, which in turn sometimes will return the same node passed in to indicate not to change it. The REM DAG combine does not anticipate this and creates a cycle in the DAG because of it. Fix this by ensuring any such optimized div node returned is distinct from the node being combined. Differential Revision: https://reviews.llvm.org/D114716	2021-12-01 11:24:26 +00:00
Simon Pilgrim	9981dd142f	[DAG] Apply clang-format to visitMSTORE + visitMLOAD. NFC. Reduce diff in D114582	2021-12-01 11:23:47 +00:00
Qiu Chaofan	15826eb437	[Legalizer] Avoid expansion to BR_CC if illegal Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110616	2021-12-01 12:22:21 +08:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit `52ff3b0093`.	2021-11-30 15:36:56 +01:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
Jeremy Morse	a20987adf4	[DebugInfo][InstrRef] Add indirection from dbg.declare in SelectionDAG Usually dbg.declares get translated into either entries in an MF side-table, or a DBG_VALUE on entry to the function with IsIndirect set (including in instruction referencing mode). Much rarer is a dbg.declare attached to a non-argument value, such as in the test added in this patch where there's a variable-length-array. Such dbg.declares become SDDbgValue nodes with InIndirect=true. As it happens, we weren't correctly emitting DBG_INSTR_REFs with the additional indirection. This patch adds the extra indirection, encoded as adding an additional DW_OP_deref to the expression. Differential Revision: https://reviews.llvm.org/D114440	2021-11-29 22:24:19 +00:00
Kazu Hirata	f240e528ce	[llvm] Use range-based for loops (NFC)	2021-11-29 09:04:44 -08:00
Bradley Smith	6180806632	[AArch64][SVE] Mark fixed-type FP extending/truncating loads/stores as custom This allows the generic DAG combine to fold fp_extend/fp_trunc into loads/stores which we can then lower into a integer extending load/truncating store plus an FP_EXTEND/FP_ROUND. The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked types hence lowering them introduces an unpack/zip. By allowing these nodes to be combined with loads/store we make it much easier to have this unpack/zip combined into the load/store by our custom lowering. Differential Revision: https://reviews.llvm.org/D114580	2021-11-29 11:56:07 +00:00
David Sherwood	a31f4bdfe8	[CodeGen][SVE] Use whilelo instruction when lowering @llvm.get.active.lane.mask In most common cases the @llvm.get.active.lane.mask intrinsic maps directly to the SVE whilelo instruction, which already takes overflow into account. However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower this immediately to a generic sequence of instructions that explicitly take overflow into account. This makes it very difficult to then later transform back into a single whilelo instruction. Therefore, this patch introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if we should lower/expand this to a sequence of generic ISD nodes, or instead just leave it as an intrinsic for the target to lower. You can see the significant improvement in code quality for some of the tests in this file: CodeGen/AArch64/active_lane_mask.ll Differential Revision: https://reviews.llvm.org/D114542	2021-11-29 08:08:17 +00:00
Kazu Hirata	fd7d40640d	[llvm] Use range-based for loops (NFC)	2021-11-28 18:14:49 -08:00
Nikita Popov	bfa91f38a9	[DAG] Restore dropped condition This was dropped in `fcee33bd5a`, presumably accidentally.	2021-11-26 21:18:54 +01:00
Simon Pilgrim	fcee33bd5a	[DAG] Pull out repeated isLittleEndian() calls. NFC.	2021-11-26 18:41:56 +00:00
Simon Pilgrim	2778f9a9f6	[DAG] SimplifyDemandedVectorElts - attempt to handle ADD(x,x) as single use If the ADD node is the only user of the repeated operand, then treat this as single use - allows us to peek through shl(x,1) patterns.	2021-11-26 10:32:10 +00:00
David Sherwood	86137fb722	[CodeGen] Add scalable vector support for lowering of llvm.get.active.lane.mask Currently the generic lowering of llvm.get.active.lane.mask is done in SelectionDAGBuilder::visitIntrinsicCall and currently assumes only fixed-width vectors are used. This patch changes the code to be more generic and support scalable vectors too. I have added tests for SVE here: CodeGen/AArch64/active_lane_mask.ll although the code quality leaves a lot to be desired. The code will be improved significantly in a later patch that makes use of the SVE whilelo instruction. Differential Revision: https://reviews.llvm.org/D114541	2021-11-26 08:17:55 +00:00
Simon Pilgrim	63b1e58f07	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED) If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. Reapplied with fix for AArch64 rev patterns to matching the ARM fix. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-25 11:14:15 +00:00
Zarko Todorovski	95875d246a	[LLVM][NFC]Inclusive language: remove occurances of sanity check/test from llvm Part of work to use more inclusive language in clang/llvm. Rewording some comments and change function and variable names.	2021-11-24 17:29:55 -05:00
Benjamin Kramer	d32787230d	Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl" This reverts commit `3cf4a2c620`. It makes llc hang on the following test case. ``` target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" target triple = "aarch64-unknown-linux-gnu" define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 { entry: br label %while.body117.i while.body117.i: ; preds = %cleanup149.i, %entry %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ] %0 = load i16, i16* undef, align 2 %1 = icmp eq i16 undef, -10240 br i1 %1, label %fail.i, label %cleanup149.i cleanup149.i: ; preds = %while.body117.i %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2 store i16 %or130.i, i16* %out.6269.i, align 2 br label %while.body117.i fail.i: ; preds = %while.body117.i ret void } ; Function Attrs: nofree nosync nounwind readnone speculatable willreturn declare i16 @llvm.bswap.i16(i16) #1 attributes #0 = { "target-features"="+neon,+v8a" } attributes #1 = { nofree nosync nounwind readnone speculatable willreturn } attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" } ```	2021-11-24 14:42:54 +01:00
Simon Pilgrim	3cf4a2c620	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-24 11:28:35 +00:00
David Sherwood	cf40ca026f	[NFC] Tidy up SelectionDAGBuilder::visitIntrinsicCall to use existing sdl debug loc In quite a few places we were calling getCurSDLoc() to get the debug location, but this is already a local variable `sdl`. Differential Revision: https://reviews.llvm.org/D114447	2021-11-24 10:35:49 +00:00
Simon Moll	1e65b93f3a	[VP] Canonicalize macros of VPIntrinsics.def Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>. Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro). A follow-up patch has documentation on how the macros are (intended) to be used. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D114144	2021-11-23 16:51:11 +01:00
David Green	32b6c17b29	[SDAG] Use UnknownSize for masked load/store MMO size A masked load or store will load a potentially unknown number of bytes from a memory location - that is not generally known at compile time. They do not necessarily load/store the entire vector width, and treating them as such can lead to incorrect aliasing information (for example, if the underlying object is smaller than the size of the vector). This makes sure that the MMO is given an unknown size to represent this. which is less accurate that "may load/store from up to 16 bytes", but less incorrect that "will load/store from 16 bytes". Differential Revision: https://reviews.llvm.org/D113888	2021-11-23 09:47:56 +00:00
Simon Pilgrim	812e64ef0c	[DAG] MatchRotate - support rotate-by-constant of illegal types Patch to fix some of the regressions in D77804. By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443. Differential Revision: https://reviews.llvm.org/D113192	2021-11-19 11:12:04 +00:00
Eric Tang	9fe6b9e802	[TargetLowering][RISCV] Fixed a scalable vector issue when lowering [s\|u]mul.overflow intrinsics Fixed the vector type issue that where we used getVectorNumElements() should be replaced by getVectorElementCount() when lowering these intrinsics. This is similar to D94149 Signed-off-by: Eric Tang <tangxingxin1008@gmail.com> Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D109809	2021-11-18 10:16:08 +00:00
Craig Topper	d78fdf111d	[LegalizeTypes] Further limit expansion of CTTZ during type promotion. Don't expand CTTZ if CTPOP or CTLZ is supported on the promoted type. We have special handling for CTTZ expansion to use those ops with a small conversion. The setup for that doesn't generate extra code or large constants so we don't gain anything from expanding early and we make CTTZ_ZERO_UNDEF codegen worse. Follow up from post commit feedback on D112268. We don't seem to have any in tree tests that care about this.	2021-11-17 15:27:29 -08:00
Craig Topper	0274be28d7	[RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent. If we have a large enough floating point type that can exactly represent the integer value, we can convert the value to FP and use the exponent to calculate the leading/trailing zeros. The exponent will contain log2 of the value plus the exponent bias. We can then remove the bias and convert from log2 to leading/trailing zeros. This doesn't work for zero since the exponent of zero is zero so we can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need a value for zero we can use a vmseq and a vmerge to handle it. We need to be careful to make sure the floating point type is legal. If it isn't we'll continue using the integer expansion. We could split the vector and concatenate the results but that needs some additional work and evaluation. Differential Revision: https://reviews.llvm.org/D111904	2021-11-17 10:29:41 -08:00
Simon Pilgrim	5fedbd5b18	[DAG] SimplifyDemandedVectorElts - zero_extend_vector_inreg(and(x,c)) -> and(x,c') If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.	2021-11-17 12:41:48 +00:00
Eric Tang	f7eb061a5f	[SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors This change make WidenVecRes_SELECT work for scalable vectors. This patch is split from [D110319](https://reviews.llvm.org/D110319) Signed-off-by: Eric Tang <tangxingxin1008@gmail.com> Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110388	2021-11-17 08:55:11 +00:00
Fabian Wolff	b484fa8289	[X86] Fix crash with inline asm using wrong register name Fixes PR#48678. `X86TargetLowering::getRegForInlineAsmConstraint()` can adjust the register class to match the type, e.g. change `VR128X` to `VR256X` if the type needs 256 bits. However, the function currently returns the unadjusted register and the adjusted register class, e.g. `xmm15` and `VR256X`, which then causes an assertion failure later because the register class does not contain that register. This patch fixes this behavior. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D113834	2021-11-16 10:38:12 +08:00
Craig Topper	233def40f7	[DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs. It's possible that the mask is already a NOT. At least if InstCombine hasn't canonicalized the input. In that case we will form an ANDN with X instead of with Y. So we don't need to worry about Y being a constant. We might need to check that X isn't a constant instead, but we don't have a test case for that yet. This fixes a size regression found when trying to enable this combine for RISCV in D113937. Differential Revision: https://reviews.llvm.org/D113948	2021-11-15 17:15:51 -08:00
Simon Pilgrim	7bac1985f4	[DAG] SimplifyVBinOp - add SDLoc() argument Pass in SDLoc instead of (repeated) local creations in SimplifyVBinOp and scalarizeBinOpOfSplats	2021-11-15 10:43:56 +00:00
Simon Pilgrim	8658d20724	[DAG] SimplifyVBinOp - pull out repeated getValueType() call. NFC.	2021-11-15 10:43:55 +00:00
Sanjay Patel	254c5246e9	[DAGCombiner] match inverted/swapped patterns for vselect of mask of signbit This was noted as a follow-up to D113212 / D113426: `4fc1fc4005` `7e30404c3b` `11522cfcad` https://alive2.llvm.org/ce/z/e4o96b The canonicalization rules for these IR patterns are complicated, and we were not matching the expected forms in 2 out of the 3 cases. We can make codegen more robust by matching the swapped forms (and that will also work if these patterns are created late).	2021-11-14 09:35:26 -05:00
Craig Topper	82bc6a094e	[X86] Promote f16 STRICT_FROUND to f32 and call libc. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D113817	2021-11-12 21:37:03 -08:00
Kazu Hirata	99d5cbbd7e	[CodeGen] Use SDNode::uses (NFC)	2021-11-12 07:33:29 -08:00
Simon Pilgrim	010b09b0c5	[DAG] reassociateOpsCommutative - test getNode result directly. NFC Matches the clean code style we use directly above	2021-11-11 18:45:50 +00:00
Sanjay Patel	11522cfcad	[DAGCombiner] add fold for vselect based on mask of signbit, part 3 (Cond0 s> -1) ? N1 : 0 --> ~(Cond0 s>> BW-1) & N1 https://alive2.llvm.org/ce/z/mGCBrd This was suggested as a potential enhancement in D113212 (also `7e30404c3b` ). There's an improvement for AArch that could be generalized ( X > -1 --> X >= 0 ). For x86, we have a counter-acting fold for most cases that turns the shift+not back into a setcc, so that needs a work-around to get more cases to use "pandn": D113603 Note that this pattern (and a previous one) are not currently canonical forms in IR: https://alive2.llvm.org/ce/z/e4o96b Adding swapped variants is left as a TODO item here, but is planned as a near-term follow-up patch. Differential Revision: https://reviews.llvm.org/D113426	2021-11-11 10:27:37 -05:00
Simon Pilgrim	82b74363a9	[DAG] reassociateOpsCommutative - peek through bitcasts to find constants Now that FoldConstantArithmetic can fold bitcasted constants, we should peek through bitcasts of binop operands to try and find foldable constants	2021-11-11 12:00:22 +00:00
Simon Pilgrim	098ea29641	[DAG] FoldConstantArithmetic - fold intop(bitcast(buildvector(c1)),bitcast(buildvector(c1))) -> bitcast(intop(buildvector(c1'),buildvector(c2'))) Enable FoldConstantArithmetic to constant fold bitcasted constant build vectors. These have typically been bitcasted for type legalization purposes. By extracting the raw constant bit data, performing the constant fold, and then casting the constant bit data back to the (legalized) type, we can perform constant folding on integer types after legalization. This in particular helps 32-bit targets which need to handle vXi64 build vectors - during legalization the (unsupported) i64 elements are split to create a bitcasted v2Xi32 build vector. Addresses some regressions in D113192. Differential Revision: https://reviews.llvm.org/D113564	2021-11-11 11:35:18 +00:00
Fraser Cormack	b1d8d70b9d	[SelectionDAG] Replace the Chain in LOAD->VP_LOAD widening The introduction of this legalization, D111248, forgot to replace the old chain with the new. This could manifest itself in the old (illegally-typed) value remaining in the DAG, though the simple test cases didn't catch this. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113561	2021-11-10 17:49:12 +00:00
Simon Pilgrim	381d14775e	[DAG] reassociateOpsCommutative - pull out repeated getOperand() calls. NFC.	2021-11-10 15:19:13 +00:00
Simon Pilgrim	ed80761b50	[DAG] Split BuildVectorSDNode::getConstantRawBits into BuildVectorSDNode::recastRawBits helper. NFC. NFC refactor of D113351, pulling out the APInt split/merge code from the BuildVectorSDNode bits extraction into a BuildVectorSDNode::recastRawBits helper. This is to allow us to reuse the code when we're packing constant folded APInt data back together.	2021-11-10 13:06:19 +00:00
Fraser Cormack	332318ffb6	[SelectionDAG] Widen scalable-vector loads/stores via VP_LOAD/VP_STORE This patch fixes a compiler crash when widening scalable-vector loads and stores which end up breaking down to element-wise store operations. It does so by providing a way for targets with support for vector-predicated loads and stores to use those instead. By widening the operation but maintaining the original effective operation length via the EVL, only the intended vector elements are loaded or stored. This method should in theory be possible and even preferred for fixed-length vector types, but all fixed-length types can be broken down into their elements, and regardless I have observed regressions in the generated code when doing so. I believe this is simply due to VP_LOAD/VP_STORE not being up to par with LOAD/STORE in terms of optimization. It does improve performance on smaller self-contained examples, however, so the potential is there. While the only target that benefits from this is RISCV, the legalization is generic and so was placed centrally. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111248	2021-11-10 09:55:03 +00:00
Simon Pilgrim	58c01ef270	[SelectionDAG] Merge FoldConstantVectorArithmetic into FoldConstantArithmetic (PR36544) This patch merges FoldConstantVectorArithmetic back into FoldConstantArithmetic. Like FoldConstantVectorArithmetic we now handle vector ops with any operand count, but we currently still only handle binops for scalar types - this can be improved in future patches - in particular some common unary/trinary ops still have poor constant folding. There's one change in functionality causing test changes - FoldConstantVectorArithmetic bails early if the build/splat vector isn't all constant (with some undefs) elements, but FoldConstantArithmetic doesn't - it instead attempts to fold the scalar nodes and bails if they fail to regenerate a constant/undef result, allowing some additional identity/undef patterns to be handled. Differential Revision: https://reviews.llvm.org/D113300	2021-11-09 11:31:01 +00:00
Simon Pilgrim	f059b04f7b	[DAG] Add SelectionDAG::ComputeMinSignedBits helper As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper. I've included some usage, its not exhaustive, just the more obvious cases where the intention is obvious. Differential Revision: https://reviews.llvm.org/D113396	2021-11-08 14:12:45 +00:00
Simon Pilgrim	f60d3ec0c7	[DAG] Add BuildVectorSDNode::getConstantRawBits helper We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation. This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data. Differential Revision: https://reviews.llvm.org/D113351	2021-11-08 12:07:38 +00:00
Simon Pilgrim	0ff1edeeec	[DAG] SimplifyVBinOp - replace FoldConstantVectorArithmetic with FoldConstantArithmetic Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes), still require more work.	2021-11-07 12:11:46 +00:00
Sanjay Patel	39c4c7d391	[DAGCombiner] remove vselect fold that was accidentally added This diff snuck into the unrelated: `025a2f73a3` It's a suggested follow-up for D113212, but I need to add test coverage first.	2021-11-06 09:34:30 -04:00
Sanjay Patel	025a2f73a3	[InstCombine] add tests for umax with sub; NFC	2021-11-06 08:32:52 -04:00
Kazu Hirata	87e53a0ad8	[llvm] Use make_early_inc_range (NFC)	2021-11-05 19:39:07 -07:00
Sanjay Patel	7e30404c3b	[DAGCombiner] add fold for vselect based on mask of signbit, part 2 This is the 'or' sibling for the fold added with: D113212 https://alive2.llvm.org/ce/z/tgnp7K Note that neither of these transforms is poison-safe, but it does not seem to matter at this level. We have had the scalar version of D113212 for a long time, so this is just making optimizer behavior consistent. We do not have the scalar version of this fold, however, so that is another follow-up.	2021-11-05 15:02:12 -04:00
Simon Pilgrim	9e6506299a	[DAG] FoldConstantVectorArithmetic - remove SDNodeFlags argument Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic. We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code. Differential Revision: https://reviews.llvm.org/D113276	2021-11-05 14:36:17 +00:00
Sanjay Patel	4fc1fc4005	[DAGCombiner] add fold for vselect based on mask of signbit (X s< 0) ? Y : 0 --> (X s>> BW-1) & Y We canonicalize to the icmp+select form in IR, and we already have this fold for scalar select in SDAG, so I think it's an oversight that we don't have the fold for vectors. It seems neutral for AArch64 and saves some instructions on x86. Whether we should also have the sibling folds for the inverse condition or all-ones true value may depend on target-specific factors such as whether there's an "and-not" instruction. Differential Revision: https://reviews.llvm.org/D113212	2021-11-05 10:06:16 -04:00
Simon Pilgrim	f2703c3c33	[DAG] FoldConstantArithmetic - rename NumOps -> NumElts. NFC. NumOps represents the number of elements for vector constant folding, rename this NumElts so in future we can the consistently use NumOps to represent the number of operands of the opcode. Minor cleanup before trying to begin generalizing FoldConstantArithmetic to support opcodes other than binops.	2021-11-05 13:32:34 +00:00
Simon Pilgrim	c1e7911c3b	[DAG] FoldConstantArithmetic - fold bitlogic(bitcast(x),bitcast(y)) -> bitcast(bitlogic(x,y)) To constant fold bitwise logic ops where we've legalized constant build vectors to a different type (e.g. v2i64 -> v4i32), this patch adds a basic ability to peek through the bitcasts and perform the constant fold on the inner operands. The MVE predicate v2i64 regressions will be addressed by future support for basic v2i64 type support. One of the yak shaving fixes for D113192.... Differential Revision: https://reviews.llvm.org/D113202	2021-11-05 12:00:59 +00:00
jacquesguan	a39eadcf16	[DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector. Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b), if c can truncate to i32 without loss. Reviewed By: frasercrmck, craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D108129	2021-11-02 12:04:23 +00:00
Simon Pilgrim	325031786e	[SelectionDAG] Optimize expansion for rotates/funnel shifts If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering. Also use the optimized funnel shift lowering for rotates. Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept (Branched from D108058 as getting this completed should help unlock some other WIP patches). Original Patch: @efriedma (Eli Friedman) Differential Revision: https://reviews.llvm.org/D112443	2021-11-02 11:38:25 +00:00
Simon Pilgrim	37e17f278f	[DAG] MatchRotate - remove (redundant) legal type check. Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.	2021-11-02 11:24:50 +00:00
Craig Topper	ada5458521	[RISCV] Expand scalable vector bswap. Fix crash for bitreverse. Fix LegalizeVectorOps to not try shuffle or unrolling expansions for scalable vectors. Differential Revision: https://reviews.llvm.org/D112236	2021-10-31 10:01:27 -07:00
Roman Lebedev	25043c8276	[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate As noted in https://reviews.llvm.org/D90924#inline-1076197 apparently this is a pretty common pattern, let's not repeat it yet again, but have it in a common place. There may be some more places where it could be used, but these are the most obvious ones.	2021-10-30 17:50:06 +03:00
Fraser Cormack	8314a04ede	[SelectionDAG] Allow FindMemType to fail when widening loads & stores This patch removes an internal failure found in FindMemType and "bubbles it up" to the users of that method: GenWidenVectorLoads and GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns an optional value, returning None if no such type is found. Each of the aforementioned users now pre-calculates the list of types it will use to widen the memory access. If the type breakdown is not possible they will signal a failure, at which point the compiler will crash as it does currently. This patch is preparing the ground for alternative legalization strategies for vector loads and stores, such as using vector-predication versions of loads or stores. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112000	2021-10-29 18:27:31 +01:00
Abinav Puthan Purayil	db8d7b6e2d	[DAGCombine][NFC] s/it's/its in the comment of hasNoInfs().	2021-10-29 07:36:38 +05:30
Kerry McLaughlin	f01fafdcd4	[SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which results in a sign-extended load. If the type returned by getExtensionType() for the load being promoted is something other than `NON_EXTLOAD`, we should instead pass this to getMaskedLoad() as the extension type. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D112320	2021-10-27 14:15:41 +01:00
Caroline Concatto	1137b7207d	[SelectionDAG] Widening the result of INSERT_SUBVECTOR. Widens the result and first input vector because they have the same size. The subvector to be inserted is widened in the operand widen function. Differential Revision: https://reviews.llvm.org/D112187	2021-10-27 13:52:25 +01:00
Craig Topper	d51e3a2139	[LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy. getShiftAmountTyForConstant is a special helper that changes the shift amount to i32 if the type chosen by TargetLowering::getShiftAmountTy can't represent all possible values. This is needed to satisfy an assert in SelectionDAG::getNode. It requires additional consideration to know when this helper should be used. I'm not sure that we are always using it when we should. This patch merges the getShiftAmountTyForConstant handling into TargetLowering::getShiftAmountTy so we don't need to think about it anymore. Technically this may slightly increase compile times since the majority of callers of getShiftAmountTy won't need this. Hopefully, this isn't an issue in practice. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112469	2021-10-25 14:06:53 -07:00
Sanjay Patel	6e46b66e2a	[DAGCombiner] make matching bit-hack form of usubsat more flexible (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 As suggested in D112085, we can substitute 'xor' with 'add' in this pattern, and it is logically equivalent: https://alive2.llvm.org/ce/z/eJtWWC We canonicalize to 'xor' in IR, but SDAG does not do that (and it probably should not - https://llvm.org/PR52267 ), so it is possible to see either pattern in codegen. Note that 'sub' is a another potential pattern, but that is canonicalized to 'add' in DAGCombiner, so we don't need to worry about that variation. Differential Revision: https://reviews.llvm.org/D112377	2021-10-25 09:01:52 -04:00
Craig Topper	93139a3c32	[LegalizeTypes] Only expand CTLZ/CTTZ/CTPOP during type promotion if the new type is legal. We might be promoting a large non-power of 2 type and the new type may need to be split. Once we split it we may have a ctlz/cttz/ctpop instruction for the split type. I'm also concerned that we may create large shifts with shift amounts that are too small.	2021-10-22 11:02:35 -07:00
Simon Pilgrim	a5f56342b0	[DAG] narrowExtractedVectorLoad - EXTRACT_SUBVECTOR indices are always constant EXTRACT_SUBVECTOR indices are always constant, we don't need to check for ConstantSDNode, we should just use getConstantOperandVal which will assert for the constant.	2021-10-22 18:32:14 +01:00
Craig Topper	04c184bba7	[TargetLowering] Simplify the interface of expandABS. NFC Instead of returning a bool to indicate success and a separate SDValue, return the SDValue and have the callers check if it is null. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112331	2021-10-22 10:22:23 -07:00
Craig Topper	0766aef3f3	[LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later. Expanding these requires multiple constants. If we promote during type legalization when they'll end up getting expanded in LegalizeDAG, we'll use larger constants. These constants may be harder to materialize. For example, 64-bit constants on 64-bit RISCV are very expensive. This is similar to what has already been done to BSWAP and BITREVERSE. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112268	2021-10-22 09:10:01 -07:00
Craig Topper	996123e5e8	[TargetLowering] Simplify the interface for expandCTPOP/expandCTLZ/expandCTTZ. There is no need to return a bool and have an SDValue output parameter. Just return the SDValue and let the caller check if it is null. I have another patch to add more callers of these so I thought I'd clean up the interface first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112267	2021-10-21 15:35:28 -07:00
Craig Topper	ff37b1105d	[LegalizeVectorOps][X86] Don't defer BITREVERSE expansion to LegalizeDAG. By expanding early it allows the shifts to be custom lowered in LegalizeVectorOps. Then a DAG combine is able to run on them before LegalizeDAG handles the BUILD_VECTORS for the masks used. v16Xi8 shift lowering on X86 requires a mask to be applied to a v8i16 shift. The BITREVERSE expansion applied an AND mask before SHL ops and after SRL ops. This was done to share the same mask constant for both shifts. It looks like this patch allows DAG combine to remove the AND mask added after v16i8 SHL by X86 lowering. This maintains the mask sharing that BITREVERSE was trying to achieve. Prior to this patch it looks like we kept the mask after the SHL instead which required an extra constant pool or a PANDN to invert it. This is dependent on D112248 because RISCV will end up scalarizing the BSWAP portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in LegalizeVectorOps first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112254	2021-10-21 15:23:23 -07:00
Craig Topper	458ed5fcc3	[TargetLowering][RISCV] Prevent scalarization of fixed vector bswap. It's better to do the ands, shifts, ors in the vector domain than to scalarize it and do those operations on each element. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112248	2021-10-21 14:34:01 -07:00
Sanjay Patel	d2198771e9	[DAGCombiner] fold bit-hack form of usubsat (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 I haven't found a generalization of this identity: https://alive2.llvm.org/ce/z/_sriEQ Note: I was actually looking at the first form of the pattern in that link, but that's part of a long chain of potential missed transforms in codegen and IR....that I hope ends here! The predicates for when this is profitable are a bit tricky. This version of the patch excludes multi-use but includes custom lowering (as opposed to legal only). On x86 for example, we have custom lowering for some vector types, and that uses umax and sub. So to enable that fold, we need add use checks to avoid regressions. Even with legal-only lowering, we could see code with extra reg move instructions for extra uses, so that constraint would have to be eased very carefully to avoid penalties. Differential Revision: https://reviews.llvm.org/D112085	2021-10-21 09:47:19 -04:00
Kerry McLaughlin	0d153df69e	[SVE] Fix selection failure when splitting extended masked loads When splitting a masked load, `GetDependentSplitDestVTs` is used to get the MemVTs of the high and low parts. If the masked load is extended, this may return VTs with different element types which are used to create the high & low masked load instructions. This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with the same element type. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D111996	2021-10-21 13:04:38 +01:00
Arthur Eubanks	6ea7437ca5	[SelectionDAG] Bail out of mergeTruncStores when not optimizing With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores. Fixes PR51827. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D111596	2021-10-20 16:58:22 -07:00
Craig Topper	fe1f0de003	[RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors. Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP isn't legal or custom for a vector type we would scalarize the CTLZ/CTTZ. This is different than CTPOP itself which would use a vector expansion. This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP expansion instead of scalarizing. To do this I had to add additional checks to make sure the operations used by CTPOP expansions are all supported. Some of the operations were already needed for the CTLZ/CTTZ expansion. This is a huge improvement to the RISCV which doesn't have a scalar ctlz or cttz in the base ISA. For WebAssembly, I've added Custom lowering to keep the scalarizing behavior. I've also extended the scalarizing to CTPOP. Differential Revision: https://reviews.llvm.org/D111919	2021-10-20 07:46:41 -07:00
Sander de Smalen	be6c8dc765	[SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors. When inserting a scalable subvector into a scalable vector through the stack, the index to store to needs to be scaled by vscale. Before this patch, that didn't yet happen, so it would generate the wrong offset, thus storing a subvector to the incorrect address and overwriting the wrong lanes. For some insert: nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2) The offset was not scaled by vscale: orr x8, x8, #0x4 st1h { z0.h }, p0, [sp] st1h { z1.d }, p1, [x8] ld1h { z0.h }, p0/z, [sp] And is changed to: mov x8, sp st1h { z0.h }, p0, [sp] st1h { z1.d }, p1, [x8, #1, mul vl] ld1h { z0.h }, p0/z, [sp] Differential Revision: https://reviews.llvm.org/D111633	2021-10-20 13:55:24 +01:00
Simon Pilgrim	71e39e3f18	[ADT] Add APInt::isNegatedPowerOf2() helper Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos..... Differential Revision: https://reviews.llvm.org/D111998	2021-10-19 14:38:21 +01:00
Sanjay Patel	2a3cc4d461	[Analysis] add utility function for unary shuffle mask creation This is NFC-intended for the callers. Posting in case there are other potential users that I missed. I would also use this from VectorCombine in a patch for: https://llvm.org/PR52178 ( D111901 ) Differential Revision: https://reviews.llvm.org/D111891	2021-10-18 09:00:39 -04:00
Fraser Cormack	3d850d03ae	[SelectionDAG] Fix illegal widening of scalable-vector loads The process of widening simple vector loads attempts to use a load of a wider vector type if the original load is sufficiently aligned to avoid memory faults. However this optimization is only legal when performed on fixed-length vector types. For scalable vector types this is invalid (unless vscale happens to be 1). This patch does increase the likelihood of compiler crashes (from `FindMemType` failing to find a suitable type) but this now better matches how widening non-simple loads, insufficiently-aligned loads, and scalable-vector stores are handled. Patches will be introduced later by which loads and stores can be widened on targets with support for masked or predicated operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111885	2021-10-18 10:00:00 +01:00
Mingming Liu	cfd155c41b	[SelectionDAG] Fix typo in option help Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D111867	2021-10-15 11:27:40 -07:00
Dávid Bolvanský	6678db00e6	[X86] Enable promotion of i16 popcnt (PR52056) Solves https://bugs.llvm.org/show_bug.cgi?id=52056 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111507	2021-10-15 15:41:37 +02:00
Kazu Hirata	81e9c90686	[llvm] Use llvm::is_contained (NFC)	2021-10-14 22:44:09 -07:00
Simon Pilgrim	88487662f7	[Codegen] TargetLowering::getCanonicalIndexType - early out scaled MVT::i8 indices. NFCI. Avoids unused assignment scan-build warning.	2021-10-14 13:08:40 +01:00
Kerry McLaughlin	1a2e90199f	[SVE][CodeGen] Add patterns for ADD/SUB + element count This patch adds patterns to match the following with INC/DEC: - @llvm.aarch64.sve.cnt[b\|h\|w\|d] intrinsics + ADD/SUB - vscale + ADD/SUB For some implementations of SVE, INC/DEC VL is not as cheap as ADD/SUB and so this behaviour is guarded by the "use-scalar-inc-vl" feature flag, which for SVE is off by default. There are no known issues with SVE2, so this feature is enabled by default when targeting SVE2. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D111441	2021-10-13 11:36:15 +01:00
Roman Lebedev	684cbae89a	[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places	2021-10-11 23:36:06 +03:00
Bradley Smith	7c68d4b8ff	Revert "[SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR." This reverts commit `3e8d2008f7`. The code removed in this commit is actually required for extracting fixed types from illegal scalable types, hence this commit causes assertion failures in such extracts.	2021-10-08 14:53:26 +00:00
Itay Bookstein	faa0e2ae76	[SelectionDAG] Fix shift libcall ABI mismatch in shift-amount argument The shift libcalls have a shift amount parameter of MVT::i32, but sometimes ExpandIntRes_Shift may be called with a node whose second operand is a type that is larger than that. This leads to an ABI mismatch, and for example causes a spurious zeroing of a register in RV32 for 64-bit shifts. Note that at present regular shift intstructions already have their shift amount operand adapted at SelectionDAGBuilder::visitShift time, and funnelled shifts bypass that. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110508	2021-10-08 09:57:57 +08:00
Wang, Pengfei	c236883b6b	[X86] Optimize fdiv with reciprocal instructions for half type Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110557	2021-10-08 09:41:13 +08:00
Bradley Smith	5be266db7a	[AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit Differential Revision: https://reviews.llvm.org/D111135	2021-10-07 15:28:55 +00:00
Jack Andersen	bd4dad87f4	[MachineInstr] Move MIParser's DBG_VALUE RegState::Debug invariant into MachineInstr::addOperand Based on the reasoning of D53903, register operands of DBG_VALUE are invariably treated as RegState::Debug operands. This change enforces this invariant as part of MachineInstr::addOperand so that all passes emit this flag consistently. RegState::Debug is inconsistently set on DBG_VALUE registers throughout LLVM. This runs the risk of a filtering iterator like MachineRegisterInfo::reg_nodbg_iterator to process these operands erroneously when not parsed from MIR sources. This issue was observed in the development of the llvm-mos fork which adds a backend that relies on physical register operands much more than existing targets. Physical RegUnit 0 has the same numeric encoding as $noreg (indicating an undef for DBG_VALUE). Allowing debug operands into the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision of register numbers with different zero semantics). Eventually, this causes an assert where DBG_VALUE instructions are prohibited from participating in live register ranges. Reviewed By: MatzeB, StephenTozer Differential Revision: https://reviews.llvm.org/D110105	2021-10-07 16:08:52 +01:00
Simon Pilgrim	21661607ca	[llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine) As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.	2021-10-06 12:04:30 +01:00
David Sherwood	37edb7d3e2	[SVE] Fix incorrect DAG combines when extracting fixed-width from scalable vectors We were previously silently generating incorrect code when extracting a fixed-width vector from a scalable vector. This is worse than crashing, since the user will have no indication that this is currently unsupported behaviour. I have fixed the code to only perform DAG combines when safe to do so, i.e. the input and output vectors are both fixed-width or both scalable. Test added here: CodeGen/AArch64/sve-extract-scalable-vector.ll Differential revision: https://reviews.llvm.org/D110624	2021-10-06 09:27:44 +01:00
Simon Pilgrim	2e5daac217	[llvm] Update report_fatal_error calls from raw_string_ostream to use Twine(OS.str()) As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared. We can use the raw_string_ostream::str() method to perform the implicit flush() and return a reference to the std::string container that we can then wrap inside Twine().	2021-10-05 18:42:12 +01:00
Bjorn Pettersson	8ed0e6b2cf	[SelectionDAG] Replace error prone index check in BaseIndexOffset::computeAliasing Deriving NoAlias based on having the same index in two BaseIndexOffset expressions seemed weird (and as shown in the added unittest the correctness of doing so depended on undocumented pre-conditions that the user of BaseIndexOffset::computeAliasing would need to take care of. This patch removes the code that dereived NoAlias based on indices being the same. As a compensation, to avoid regressions/diffs in various lit test, we also add a new check. The new check derives NoAlias in case the two base pointers are based on two different GlobalValue:s (neither of them being a GlobalAlias). Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D110256	2021-10-05 12:15:55 +02:00
Bjorn Pettersson	1896fb2cff	[SelectionDAG] Assume that a GlobalAlias may alias other global values This fixes a bug detected in DAGCombiner when using global alias variables. Here is an example: @foo = global i16 0, align 1 @aliasFoo = alias i16, i16 * @foo define i16 @bar() { ... store i16 7, i16 * @foo, align 1 store i16 8, i16 * @aliasFoo, align 1 ... } BaseIndexOffset::computeAliasing would incorrectly derive NoAlias for the two accesses in the example above, resulting in DAGCombiner miscompiles. This patch fixes the problem by a defensive approach letting BaseIndexOffset::computeAliasing return false, i.e. that the aliasing couldn't be determined, when comparing two global values and at least one is a GlobalAlias. In the future we might improve this with a deeper analysis to look at the aliasee for the GlobalAlias etc. But that is a bit more complicated considering that we could have 'local_unnamed_addr' and situations with several 'alias' variables. Fixes PR51878. Differential Revision: https://reviews.llvm.org/D110064	2021-10-05 12:15:55 +02:00
Amara Emerson	cfef1803dd	[GlobalISel] Port over the SelectionDAG stack protector codegen feature. This is a port of the feature that allows the StackProtector pass to omit checking code for stack canary checks, and rely on SelectionDAG to do it at a later stage. The reasoning behind this seems to be to prevent the IR checking instructions from hindering tail-call optimizations during codegen. Here we allow GlobalISel to also use that scheme. Doing so requires that we do some analysis using some factored-out code to determine where to generate code for the epilogs. Not every case is handled in this patch since we don't have support for all targets that exercise different stack protector schemes. Differential Revision: https://reviews.llvm.org/D98200	2021-10-04 21:33:44 -07:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
Kazu Hirata	d34cd75d89	[Analysis, CodeGen] Migrate from arg_operands to args (NFC) Note that arg_operands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-03 08:22:20 -07:00
Simon Pilgrim	df672f66b6	[DAG] scalarizeExtractedVectorLoad - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit extracted loads to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback. I've also cleaned up the alignment calculation code - if we have a constant extraction index then the alignment can be based on an offset from the original vector load alignment, but for non-constant indices we should assume the worst (single element alignment only). Differential Revision: https://reviews.llvm.org/D110486	2021-10-01 21:07:34 +01:00
Sander de Smalen	b62e6f19d7	[SelectionDAG] Handle promotion + widening in getCopyToPartsVector Some vectors require both widening and promotion for their legalization. This case is not yet handled in getCopyToPartsVector and falls back on scalarizing by default. BBecause scalable vectors can't easily be scalarised, we need to implement this in two separate stages: 1. Widen the vector. 2. Promote the vector. As part of this patch, PromoteIntRes_CONCAT_VECTORS also needed to be made scalable aware. Instead of falling back on scalarizing the vector (fixed-width only), each sub-part of the CONCAT vector is promoted, and the operation is performed on the type with the widest element type, finally truncating the result to the promoted result type. Differential Revision: https://reviews.llvm.org/D110646	2021-10-01 08:19:47 +01:00
Christopher Tetreault	3077bc90de	[NFC] Restore magic and magicu to a globally visible location While these functions are only used in one location in upstream, it has been reused in multiple downstreams. Restore this file to a globally visibile location (outside of APInt.h) to eliminate donwstream breakage and enable potential future reuse. Additionally, this patch renames types and cleans up clang-tidy issues.	2021-09-30 17:43:12 -07:00
Sander de Smalen	6709b193ea	[SelectionDAG] Make WidenVecRes_EXTRACT_SUBVECTOR work for scalable vectors. The legalizer handles this by breaking up an EXTRACT_SUBVECTOR into smaller parts, and combines those together, padding the result with UNDEF vectors, e.g. nxv6i64 extract_subvector(nxv12i64, 6) <-> nxv8i64 concat( nxv2i64 extract_subvector(nxv16i64, 6) nxv2i64 extract_subvector(nxv16i64, 8) nxv2i64 extract_subvector(nxv16i64, 10) nxv2i64 undef) Reviewed By: frasercrmck, david-arm Differential Revision: https://reviews.llvm.org/D110253	2021-09-29 11:33:45 +01:00
Itay Bookstein	7255ce30e4	[SelectionDAG] Fix incorrect condition for shift amount truncation Comment says: // If the operand is larger than the shift count type but the shift // count type has enough bits to represent any shift value ... It clearly talks about the shifted operand, not the shift-amount operand, but the comparison is performed against Log2_32_Ceil(Op2.getValueSizeInBits()) where Op2 is the shift amount operand. This comparison also doesn't make sense in the context of the previous one (ShiftsSize > Op2Size) because Op2Size == Op2.getValueSizeInBits(). Fix to use Op1. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110509	2021-09-28 17:52:30 -07:00
Arthur Eubanks	aa53785f23	Reland [clang] Rework dontcall attributes To avoid using the AST when emitting diagnostics, split the "dontcall" attribute into "dontcall-warn" and "dontcall-error", and also add the frontend attribute value as the LLVM attribute value. This gives us all the information to report diagnostics we need from within the IR (aside from access to the original source). One downside is we directly use LLVM's demangler rather than using the existing Clang diagnostic pretty printing of symbols. Previous revisions didn't properly declare the new dependencies. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D110364	2021-09-28 15:31:30 -07:00
Arthur Eubanks	7833d20f1f	Revert "[clang] Rework dontcall attributes" This reverts commit `2943071e2e`. Breaks bots	2021-09-28 14:49:27 -07:00
Arthur Eubanks	2943071e2e	[clang] Rework dontcall attributes To avoid using the AST when emitting diagnostics, split the "dontcall" attribute into "dontcall-warn" and "dontcall-error", and also add the frontend attribute value as the LLVM attribute value. This gives us all the information to report diagnostics we need from within the IR (aside from access to the original source). One downside is we directly use LLVM's demangler rather than using the existing Clang diagnostic pretty printing of symbols. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D110364	2021-09-28 14:21:10 -07:00
Xiang1 Zhang	ebe9944a34	[ISel] Legalized arithmetic.fence.f128 for 32-bits target Reviewed By: Craig Topper, Wang Pengfei Differential Revision: https://reviews.llvm.org/D110467	2021-09-28 10:27:25 +08:00
Fraser Cormack	e2b46e336b	[DAGCombiner][VP] Fold zero-length or false-masked VP ops This patch adds a generic DAGCombine for vector-predicated (VP) nodes. Those for which we can determine that no vector element is active can be replaced by either undef or, for reductions, the start value. This is tested rather trivially at the IR level, where it's possible that we want to teach instcombine to perform this optimization. However, we can also see the zero-evl case arise during SelectionDAG legalization, when wide VP operations can be split into two and the upper operation emerges as trivially false. It's possible that we could perform this optimization "proactively" (both on legal vectors and before splitting) and reduce the width of an operation and insert it into a larger undef vector: ``` v8i32 vp_add x, y, mask, 4 -> v8i32 insert_subvector (v8i32 undef), (v4i32 vp_add xsub, ysub, mask, 4), i32 0 ``` This is somewhat analogous to similar vector narrow/widening optimizations, but it's unclear at this point whether that's beneficial to do this for VP ops for any/all targets. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D109148	2021-09-27 11:30:09 +01:00
Simon Pilgrim	18c8ed5416	[DAG] ReduceLoadOpStoreWidth - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit store narrowing to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory access by checking allowsMisalignedMemoryAccesses as a fallback.	2021-09-25 18:35:57 +01:00
Simon Pilgrim	6bd5b1b1ce	[DAG] combineShiftToMULH - move getValueType() inside assert. NFCI. Avoids an unnecessary (void).	2021-09-25 11:56:35 +01:00
Simon Pilgrim	2a5936faf0	[CodeGen] ProcessSDDbgValues - use const-ref value in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-23 12:23:46 +01:00
Fraser Cormack	e7c879a69d	[RISCV][VP] Add support for VP_REDUCE_* operations This patch adds codegen support for lowering the vector-predicated reduction intrinsics to RVV instructions. The process is similar to that of the other reduction intrinsics, save for the fact that every VP reduction has a start value. We reuse the existing custom "VL" nodes, adding extra patterns where required to handle non-true masks. To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been given an explicit "merge" operand. This is to faciliate the VP reductions, where we must be careful to ensure that even if no operation is performed (when VL=0) we still produce the start value. The RVV reductions don't update the destination register under these conditions, so we tie the splatted start value to the output register. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107657	2021-09-23 11:11:05 +01:00
Bjorn Pettersson	c3ae8ecb52	[DAGCombiner] Rename isAlias as mayAlias. NFC Differential Revision: https://reviews.llvm.org/D110062	2021-09-23 09:54:42 +02:00
Sander de Smalen	3e8d2008f7	[SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR. This code seems untested and is likely obsolete, because this case should already be handled by the code that legalizes the result type of EXTRACT_SUBVECTOR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110061	2021-09-22 14:23:35 +01:00
Sander de Smalen	d5681f1d68	[SelectionDAG] Add PromoteIntOp_INSERT_SUBVECTOR. This is required to codegen something like: <vscale x 8 x i16> @llvm.experimental.vector.insert(<vscale x 8 x i16> %vec, <vscale x 2 x i16> %subvec, i64 %idx) where the output vector is legal, but the input vector needs promoting. It implements this by performing the whole operation on the promoted type, and then truncating the result. Reviewed By: david-arm, craig.topper Differential Revision: https://reviews.llvm.org/D110059	2021-09-22 13:32:36 +01:00
Sander de Smalen	4ca1fbe361	[SelectionDAG] Make WidenVecRes_Convert work for scalable vectors. Most of the code wasn't yet scalable safe, although most of the code conceptually just works for scalable vectors. This change makes the algorithm work on ElementCount, where appropriate, and leaves the fixed-width only code to use `getFixedNumElements`. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110058	2021-09-22 10:58:38 +01:00
Michael Liao	5fb3ae525f	[SelectionDAG] Re-calculate scoped AA metadata when merging stores. Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D102821	2021-09-21 11:41:17 -04:00
Simon Pilgrim	20b58855e0	[CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Simon Pilgrim	0f83456cf5	[CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI. Reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Kazu Hirata	84b07c9b3a	[llvm] Use pop_back_val (NFC)	2021-09-19 13:44:23 -07:00
Nikita Popov	0fc624f029	[IR] Return AAMDNodes from Instruction::getMetadata() (NFC) getMetadata() currently uses a weird API where it populates a structure passed to it, and optionally merges into it. Instead, we can return the AAMDNodes and provide a separate merge() API. This makes usages more compact. Differential Revision: https://reviews.llvm.org/D109852	2021-09-16 21:06:57 +02:00
Matt Arsenault	54d755a034	DAG: Fix incorrect folding of fmul -1 to fneg The fmul is a canonicalizing operation, and fneg is not so this would break denormals that need flushing and also would not quiet signaling nans. Fold to fsub instead, which is also canonicalizing.	2021-09-14 21:25:02 -04:00
Simon Pilgrim	9db20822f7	[APInt] Add APIntOps::ScaleBitMask helper APInt is used to describe a bit mask in a variety of value tracking and demanded bits/elts functions. When traversing through dst/src operands, we have a number of places where these masks need to widened/narrowed to translate through bitcasts, reductions etc. to a different type. This patch add a APIntOps::ScaleBitMask common helper, adds unit test coverage, and updates a number of cases to use the the helper instead of their own implementation. This came up on D109065 where we currently have to add yet another implementation of the same code. Differential Revision: https://reviews.llvm.org/D109683	2021-09-13 16:27:12 +01:00
vnalamot	0fc3ebb70a	[SelectionDAG][NFC] Fix typo in VerifyDAGDiverence() function name Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109674	2021-09-13 20:48:04 +05:30
David Truby	915e9e76bf	[llvm][sve] Lowering for VLS masked extending loads This extends the custom lowering for extending loads on fixed length vectors in SVE to support masked extending loads. The existing tests for correct behaviour of masked extending loads exhibit bad code generation due to the legalistaion of i1 vectors. They have been left as-is and new tests have been added that do not exhibit this behaviour. Differential Revision: https://reviews.llvm.org/D108200	2021-09-13 11:13:25 +01:00
Sander de Smalen	ec7d8d5069	[SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors (widening). This patch implements legalization of EXTRACT_SUBVECTOR for the case where the result needs promoting, and the input type requires widening. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109509	2021-09-10 13:29:26 +01:00
Sander de Smalen	801a745dd2	[SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors. This patch implements legalization of EXTRACT_SUBVECTOR for the case where the result needs promoting, and the input type is either legal or requires splitting. The idea is that the operation is broken down into simpler steps, by first extracting a smaller subvector until the input vector becomes legal or requires promotion. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D109313	2021-09-10 13:29:26 +01:00
Craig Topper	9af8f1b18e	[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode. Soft deprecrate isNullValue/isAllOnesValue and update in tree callers. This matches the changes to the APInt interface from D109483. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D109535	2021-09-09 13:28:30 -07:00
Craig Topper	517728fe1e	[SelectionDAG] Use DAG.getNOT to further simplify some code. NFC Followup to D109483	2021-09-09 10:53:39 -07:00
Nick Desaulniers	e69d402088	[NFC] rename member of BitTestBlock and JumpTableHeader Follow up to suggestions in D109103 via hans: I think UnreachableDefault (or UnreachableFallthrough) would be a better name now, since it doesn't just omit the range check, it also omits the last bit test. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D109455	2021-09-09 10:43:00 -07:00
Chris Lattner	d51da74889	[CodeGen] Use DAG.getAllOnesConstant where possible to simplify code. NFC.	2021-09-09 10:22:51 -07:00
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Chris Lattner	9e46dd965a	[APInt.h] Reduce the APInt header file interface a bit. NFC This moves one mid-size function out of line, inlines the trivial tcAnd/tcOr/tcXor/tcComplement methods into their only caller, and moves the magic/umagic functions into SelectionDAG since they are implementation details of its algorithm. This also removes the unit tests for magic, but these are already tested in the divide lowering logic for various targets. This also upgrades some C style comments to C++. Differential Revision: https://reviews.llvm.org/D109476	2021-09-08 18:17:07 -07:00
Nick Desaulniers	4331f19d8b	[ISEL][BitTestBlock] omit additional bit test when default destination is unreachable Otherwise we end up with an extra conditional jump, following by an unconditional jump off the end of a function. ie. bb.0: BT32rr .. JCC_1 %bb.4 ... bb.1: BT32rr .. JCC_1 %bb.2 ... JMP_1 %bb.3 bb.2: ... bb.3.unreachable: bb.4: ... Should be equivalent to: bb.0: BT32rr .. JCC_1 %bb.4 ... JMP_1 %bb.2 bb.1: bb.2: ... bb.3.unreachable: bb.4: ... This can occur since at the higher level IR (Instruction) SwitchInsts are required to have BBs for default destinations, even when it can be deduced that such BBs are unreachable. For most programs, this isn't an issue, just wasted instructions since the unreachable has been statically proven. The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to boot though once D106056 is re-applied. D106056 makes it more likely that correlation-propagation (CVP) can deduce that the default case of SwitchInsts are unreachable. The x86_64 kernel uses a binary post processor called objtool, which emits this warning: vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b I haven't debugged precisely why this causes a failure at boot time, but fixing this very obvious jump off the end of the function fixes the warning and boot problem. Link: https://bugs.llvm.org/show_bug.cgi?id=50080 Fixes: https://github.com/ClangBuiltLinux/linux/issues/679 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1440 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D109103	2021-09-08 11:03:47 -07:00
David Green	d8d24c64fe	[DAG] Fix GT -> GE condition when creating SetCC `79845ed6df` folded some setcc(ashr) conditions to setcc, but got the condition for NE incorrect, using GT where it should be using GE.	2021-09-08 12:41:51 +01:00
Fraser Cormack	2c5568a6a9	[LegalizeTypes][VP] Add promotion support for binary VP ops This patch extends the preliminary support for vector-predicated (VP) operation legalization to include promotion of illegal integer vector types. Integer promotion of binary VP operations is relatively simple and piggy-backs on the non-VP logic, but passing the two extra mask and VP operands through to the promoted operation. Tests have been added to the RISC-V target to cover the basic scenarios for integer promotion for both fixed- and scalable-vector types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108288	2021-09-08 10:22:57 +01:00
Fraser Cormack	a823bdf3ab	[RISCV][VP] Custom lower VP_STORE and VP_LOAD This patch adds support for the vector-predicated `VP_STORE` and `VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and `MLOAD`: to regular load/store instructions via intrinsics. One necessary change was made to `SelectionDAGLegalize` so that `VP_STORE` nodes' operation actions are taken from the stored "value" operands, in the same vein as `STORE` or `MSTORE`. Reviewed By: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D108999	2021-09-07 10:53:25 +01:00
Fraser Cormack	f4dee8cb82	[RISCV][VP] Custom lower VP_SCATTER and VP_GATHER This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by lowering them to RVV's `vsox`/`vlux` instructions, respectively. This process is almost identical to the existing `MSCATTER`/`MGATHER` support. One extra change was made to `SelectionDAGLegalize` so that `VP_SCATTER`'s operation action is derived from its stored "value" operand rather than its return type (which is always the chain). Reviewed By: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D108987	2021-09-07 10:43:07 +01:00
Sanjay Patel	e1e4bf174b	[DAGCombine] Prevent the transform of combine for multi-use operand The test is based on a miscompile example in: https://llvm.org/PR51321 Differential Revision: https://reviews.llvm.org/D107692	2021-09-06 15:30:32 -04:00
Jonas Paulsson	118997d8e9	[SelectionDAGBuilder] Bugfix in visitInlineAsm() In case of a virtual register tied to a phys-def, the register class needs to be computed. Make sure that this works generally also with fast regalloc by using TLI.getRegClassFor() whenever possible, and make only the case of 'Untyped' use getMinimalPhysRegClass(). Fixes https://bugs.llvm.org/show_bug.cgi?id=51699. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109291	2021-09-06 17:46:31 +02:00
David Green	1b83aaaefa	[DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold This appears to produce better code, even if the condition may need to be replicated.	2021-09-05 16:18:31 +01:00
David Green	8523fb96a6	[DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Given a select_cc producing a constant and a invertion of the constant for a comparison more than zero, we can produce an xor with ashr instead, which produces smaller code. The ashr either sets all bits or clear all bits depending on if the value is negative. This is then xor'd with the constant to optionally negate the value. https://alive2.llvm.org/ce/z/DTFaBZ This includes a OneUseCheck on the Cmp, which seems to make thinks a little worse and will be removed in a followup. Differential Revision: https://reviews.llvm.org/D109149	2021-09-05 16:04:01 +01:00
David Green	79845ed6df	[DAG] Fold setcc eq with ashr to compare to zero. Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 -> set_cc setlt X, 0 to prevent some regressions later on when folding select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Differential Revision: https://reviews.llvm.org/D109214	2021-09-05 14:06:47 +01:00
Heejin Ahn	28780e59f6	[WebAssembly] Add Wasm SjLj support This add support for SjLj using Wasm exception handling instructions: https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md This does not yet support the mixed use of EH and SjLj within a function. It will be added in a follow-up CL. This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s, except for the below: - `test_longjmp_standalone`: Uses Node - `test_dlfcn_longjmp`: Uses NodeRAWFS - `test_longjmp_throw`: Mixes EH and SjLj - `test_exceptions_longjmp1`: Mixes EH and SjLj - `test_exceptions_longjmp2`: Mixes EH and SjLj - `test_exceptions_longjmp3`: Mixes EH and SjLj Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D108960	2021-09-02 10:51:02 -07:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore i take it upon myself to revert the tree back to last known good state. This reverts commit `4c4093e6e3`. This reverts commit `0a2b1ba33a`. This reverts commit `d9873711cb`. This reverts commit `791006fb8c`. This reverts commit `c22b64ef66`. This reverts commit `72ebcd3198`. This reverts commit `5fa6039a5f`. This reverts commit `9efda541bf`. This reverts commit `94d3ff09cf`.	2021-09-02 13:53:56 +03:00
Fraser Cormack	ef78f2106c	[LegalizeTypes][VP] Add splitting support for binary VP ops This patch extends D107904's introduction of vector-predicated (VP) operation legalization to include vector splitting. When the result of a binary VP operation needs splitting, all of its operands are split in kind. The two operands and the mask are split as usual, and the vector-length parameter EVL is "split" such that the low and high halves each execute the correct number of elements. Tests have been added to the RISC-V target to show splitting several scenarios for fixed- and scalable-vector types. Without support for `umax` (e.g. in the `B` extension) the generated code starts to branch. Ideally a cost model would prevent their insertion in the first place. Through these tests many opportunities for better codegen can be seen: combining known-undef VP operations and for constant-folding operations on `ISD::VSCALE`, to name but a few. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107957	2021-09-02 10:15:53 +01:00
Abinav Puthan Purayil	0baace5379	[DAGCombine] Add node level checks for fp-contract and fp-ninf in visitFMULForFMADistributiveCombine(). Differential Revision: https://reviews.llvm.org/D107551	2021-09-02 11:33:14 +05:30
Roman Lebedev	f5753125f0	[Codegen][TLI][X86] SimplifyMultipleUseDemandedBits(): 0'th vec subreg widening is free, try to perform it earlier I believe, the profitability reasoning here is correct "sub"reg is already located within the 0'th subreg of wider reg, so if we have suvector insertion at index 0 into undef, then it's always free do to. After this, D109065 finally avoids the regression in D108382. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D109074	2021-09-02 00:54:05 +03:00
Fraser Cormack	85fd44d7fe	[SelectionDAG][NFC] Fix typo in assertion message s/Uexpected/Unexpected.	2021-09-01 08:55:06 +01:00
Hussain Kadhem	524ded7d01	[VP] implementation of sdag support for VP memory intrinsics Followup to D99355: SDAG support for vector-predicated load/store/gather/scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D105871	2021-08-31 17:01:50 +02:00
Craig Topper	201f6446da	[LegalizeTypes][X86] Improve ExpandIntRes_FP_TO_SINT/ExpandIntRes_FP_TO_UINT when input is SoftPromoteHalf. Instead of splitting off the fp16 to float conversion and generating a libcall, we should split the operation into fp16 to float and float to integer operations. This will allow the float to integer conversion to go through any custom handling the target has. If the target doesn't have custom handling then we should come back to ExpandIntRes_FP_TO_SINT/ ExpandIntRes_FP_TO_UINT automatically to create the libcall. This avoids generating libcalls on 32-bit X86. These library functions may not exist in 32-bit libgcc. At least for LLVM, we never generate them when hardware floating point instructions are available. Differential Revision: https://reviews.llvm.org/D108933	2021-08-30 13:12:59 -07:00
Bjorn Pettersson	789f01283d	[SelectionDAG] Fix miscompile bugs related to smul.fix.sat with scale zero When expanding a SMULFIXSAT ISD node (usually originating from a smul.fix.sat intrinsic) we've applied some optimizations for the special case when the scale is zero. The idea has been that it would be cheaper to use an SMULO instruction (if legal) to perform the multiplication and at the same time detect any overflow. And in case of overflow we could use some SELECT:s to replace the result with the saturated min/max value. The only tricky part is to know if we overflowed on the min or max value, i.e. if the product is positive or negative. Unfortunately the implementation has been incorrect as it has looked at the product returned by the SMULO to determine the sign of the product. In case of overflow that product is truncated and won't give us the correct sign bit. This patch is adding an extra XOR of the multiplication operands, which is used to determine the sign of the non truncated product. This patch fixes PR51677. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D108938	2021-08-30 22:08:26 +02:00
Craig Topper	705d005781	[DAGCombiner][RISCV] Don't use vector types in DAGCombiner::tryStoreMergeOfLoads if we need a rotate. The check for whether a rotate is possible occurs before the memory legality checks for the integer type. So it's possible we decide we can use a rotate, but then fail the legality checks. If that happens we should not fall back to a vector type. This triggers an assertion in the rotate handling when it finds a vector type instead of an integer type. In theory we could use a shufflevector in place of the rotate, but right now I'd just like to fix the crash. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D108839	2021-08-30 08:47:15 -07:00
Carl Ritson	5d9de3ea18	[DAGCombine] Allow FMA combine with both FMA and FMAD Without this change only the preferred fusion opcode is tested when attempting to combine FMA operations. If both FMA and FMAD are available then FMA ops formed prior to legalization will not be merged post legalization as FMAD becomes the preferred fusion opcode. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108619	2021-08-27 19:49:35 +09:00
Craig Topper	8bb24289f3	[SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants. We can halve the number of mask constants by masking before shl and after srl. This can reduce the number of mov immediate or constant materializations. Or reduce the number of constant pool loads for X86 vectors. I think we might be able to do something similar for bswap. I'll look at it next. Differential Revision: https://reviews.llvm.org/D108738	2021-08-26 09:33:24 -07:00
Sanjay Patel	e728d1a3e8	[DAGCombiner] create binop nodes with all of expected values This is another bug exposed by https://llvm.org/PR51612 (and the one that triggered the initial assertion) in the report. That example was suppressed with: `985b48f183` ...but these would still crash because we created nodes like UADDO without the expected 2 output values.	2021-08-25 16:14:22 -04:00
Sanjay Patel	985b48f183	[DAGCombiner] check uses more strictly on select-of-binop fold There are 2 bugs here: 1. We were not checking uses of operand 2 (the false value of the select). 2. We were not checking for multiple uses of nodes that produce >1 result. Correcting those is enough to avoid the crash in the reduced test based on: https://llvm.org/PR51612 The additional use check on operand 0 (the condition value of the select) should not strictly be necessary because we are only replacing one use with another (whether it makes performance sense to do the transform with that pattern is not clear). But as noted in the TODO, changing that uncovers another bug. Note: there's at least one more bug here - we aren't propagating EVTs correctly, but I plan to fix that in another patch.	2021-08-25 14:14:41 -04:00

... 8 9 10 11 12 ...

12514 Commits