llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	bc9ab9a5cd	[DAG] CombineToPreIndexedLoadStore - use const APInt& for getAPIntValue(). NFCI. Cleanup some code to use auto* properly from cast, and use const APInt& for getAPIntValue() to avoid an unnecessary copy.	2021-01-21 11:04:09 +00:00
Hans Wennborg	a51226057f	Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE" It caused "Vector shift amounts must be in the same as their first arg" asserts in Chromium builds. See the code review for repro instructions. > Add DemandedElts support inside the TRUNCATE analysis. > > Differential Revision: https://reviews.llvm.org/D56387 This reverts commit `cad4275d69`.	2021-01-20 20:06:55 +01:00
Simon Pilgrim	cad4275d69	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE Add DemandedElts support inside the TRUNCATE analysis. Differential Revision: https://reviews.llvm.org/D56387	2021-01-20 15:39:58 +00:00
Kazu Hirata	b023cdeacc	[llvm] Use llvm::all_of (NFC)	2021-01-19 20:19:17 -08:00
Kazu Hirata	8857202489	[llvm] Use llvm::find (NFC)	2021-01-19 20:19:14 -08:00
Simon Pilgrim	207f32948b	[DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops Use the KnownBits icmp comparisons to determine when a ISD::UMIN/UMAX op is unnecessary should either op be known to be ULT/ULE or UGT/UGE than the other. Differential Revision: https://reviews.llvm.org/D94532	2021-01-18 10:29:23 +00:00
Simon Pilgrim	46aa3c6c33	[DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - improve shuffle(shuffle(x,y),shuffle(x,y)) merging MergeInnerShuffle currently attempts to merge shuffle(shuffle(x,y),z) patterns into a single shuffle, using 1 or 2 of the x,y,z ops. However if we already match 2 ops we might be able to handle the third op if it's also a shuffle that references one of the previous ops, allowing us to handle some cases like: shuffle(shuffle(x,y),shuffle(x,y)) shuffle(shuffle(shuffle(x,z),y),z) shuffle(shuffle(x,shuffle(x,y)),z) etc. This isn't an exhaustive match and is dependent on the order the candidate ops are encountered - if one of the matched ops was a shuffle that was peek-able we don't go back and try to split that; I haven't found much need for that amount of analysis yet. This is a preliminary patch that will allow us to later improve x86 HADD/HSUB matching - but needs to be reviewed separately as it's in generic code and affects existing Thumb2 tests. Differential Revision: https://reviews.llvm.org/D94671	2021-01-15 15:08:31 +00:00
Simon Pilgrim	7c30c05ff7	[DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - reset shuffle ops and reorder early-out and second op matching. NFCI. I'm hoping to reuse MergeInnerShuffle in some other folds - so ensure the candidate ops/mask are reset at the start of each run. Also, move the second op matching before bailing to make it simpler to try to match other things afterward.	2021-01-14 11:55:20 +00:00
Simon Pilgrim	af8d27a7a8	[DAG] visitVECTOR_SHUFFLE - pull out shuffle merging code into lambda helper. NFCI. Make it easier to reuse in a future patch.	2021-01-14 11:05:19 +00:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Simon Pilgrim	993c488ed2	[DAG] visitVECTOR_SHUFFLE - use all_of to check for all-undef shuffle mask. NFCI.	2021-01-13 17:19:41 +00:00
Juneyoung Lee	25eb7b08ba	[DAGCombiner] Fold BRCOND(FREEZE(COND)) to BRCOND(COND) This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 . When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted. It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag. The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however. This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`. To make this optimization sound, `BRCOND(UNDEF)` simply should nondeterministically jump to the branch or not, rather than raising UB. It wasn't clear what happens when the condition was undef according to the comments in ISDOpcodes.h, however. I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction). Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB. Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D92015	2021-01-13 09:36:52 +09:00
Craig Topper	df74c001fa	[DAGCombiner] Replace static helper function isConstantFPBuildVectorOrConstantFP with the identical version in SelectionDAG. NFC	2021-01-11 23:41:40 -08:00
Joe Ellis	007358239d	[DAGCombiner] Use getVectorElementCount inside visitINSERT_SUBVECTOR This avoids TypeSize-/ElementCount-related warnings. Differential Revision: https://reviews.llvm.org/D92747	2021-01-11 14:15:11 +00:00
QingShan Zhang	7539c75bb4	[DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN We are checking the unsafe-fp-math for sqrt but not for fpow, which is inconsistent. As the direction is to remove this global option, we need to remove the unsafe-fp-math check for sqrt and update the test with afn fast-math flags. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D93891	2021-01-11 02:25:53 +00:00
Craig Topper	4ef91f5871	[DAGCombiner] Don't speculatively create an all ones constant in visitREM that might not be used. This looks to have been done to save some duplicated code under two different if statements, but it ends up being harmful to D94073. This speculative constant can be called on a scalable vector type with i64 element size when i64 scalars aren't legal. The code tries and fails to find a vector type with i32 elements that it can use. So only create the node when we know it will be used.	2021-01-05 12:45:57 -08:00
Cameron McInally	92be640bd7	[FPEnv][AMDGPU] Disable FSUB(-0,X)->FNEG(X) DAGCombine when subnormals are flushed This patch disables the FSUB(-0,X)->FNEG(X) DAG combine when we're flushing subnormals. It requires updating the existing AMDGPU tests to use the fneg IR instruction, in place of the old fsub(-0,X) canonical form, since AMDGPU is the only backend currently checking the DenormalMode flags. Note that this will require follow-up optimizations to make sure the FSUB(-0,X) form is handled appropriately Differential Revision: https://reviews.llvm.org/D93243	2021-01-04 14:44:10 -06:00
Layton Kifer	d29f93bda5	[DAGCombiner] Don't create sexts of deleted xors when they were in-visit replaced Fixes a bug introduced by D91589. When folding `(sext (not i1 x)) -> (add (zext i1 x), -1)`, we try to replace the not first when possible. If we replace the not in-visit, then the now invalidated node will be returned, and subsequently we will return an invalid sext. In cases where the not is replaced in-visit we can simply return SDValue, as the not in the current sext should have already been replaced. Thanks @jgorbe, for finding the below reproducer. The following reduced test case crashes clang when built with `clang -O1 -frounding-math`: ``` template <class> class a { int b() { return c == 0.0 ? 0 : -1; } int c; }; template class a<long>; ``` A debug build of clang produces this "assertion failed" error: ``` clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm:: SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed. ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93274	2020-12-23 16:16:26 -08:00
Layton Kifer	385e9a2a04	[DAGCombiner] Improve shift by select of constant Clean up a TODO, to support folding a shift of a constant by a select of constants, on targets with different shift operand sizes. Reviewed By: RKSimon, lebedev.ri Differential Revision: https://reviews.llvm.org/D90349	2020-12-18 02:21:42 +00:00
Kerry McLaughlin	05edfc5475	[SVE][CodeGen] Add DAG combines for s/zext_masked_gather This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true: - fold (and (masked_gather x)) -> (zext_masked_gather x) - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x) LowerMGATHER has also been updated to fetch the LoadExtType associated with the gather and also use this value to determine the correct masked gather opcode to use. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92230	2020-12-09 11:53:19 +00:00
Kerry McLaughlin	4519ff4b6f	[SVE][CodeGen] Add the ExtensionType flag to MGATHER Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode. Also updated SelectionDAGDumper::print_details so that details of the gather load (is signed, is scaled & extension type) are printed. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91084	2020-12-09 11:19:08 +00:00
Huihui Zhang	8e6fc1f97e	[AArch64][SVE] Add lowering for llvm.maxnum|minnum for scalable type. The LLVM intrinsic llvm.maxnum|minnum is an overloaded intrinsic and can be used on any floating-point or vector of floating-point type. This patch extends the current infrastructure to support scalable vector types. This patch also fixes a warning about incorrect use of EVT::getVectorNumElements() for scalable types when DAGCombiner tries to split a scalable vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92607	2020-12-08 09:35:53 -08:00
Kai Luo	44bd8ea167	[DAGCombine][PowerPC] Simplify nabs by using legal `smin` operation Convert `0 - abs(x)` to `smin (x, -x)` if `smin` is a legal operation. Verification: https://alive2.llvm.org/ce/z/vpquFR Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92637	2020-12-08 03:24:07 +00:00
Simon Pilgrim	b6e847c396	[DAG] Cleanup by folding some single use VT.getScalarSizeInBits() calls into its comparison. NFCI.	2020-12-07 18:23:54 +00:00
Kerry McLaughlin	111f559bbd	[SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER The refineIndexType & refineUniformBase functions added by D90942 can also be used to improve CodeGen of masked gathers. These changes were split out from D91092 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92319	2020-12-07 13:20:19 +00:00
Bing1 Yu	eee30a6dce	[CodeGen] Modify the refineIndexType(...)'s code to fix a bug in D90942. In the previous code, when refineIndexType(...) was called and Index was undef, Index.getOperand(0) would trigger an assertion failure. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D92548	2020-12-07 08:49:07 +08:00
Layton Kifer	ac522f8700	[DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1) Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets. Differential Revision: https://reviews.llvm.org/D91589	2020-12-06 11:52:10 -05:00
Simon Pilgrim	6f4ee6f870	[DAGCombiner] Use const APInt& for getConstantOperandAPInt results. NFCI. Avoid unnecessary instantiation. Noticed while removing unnecessary autos	2020-12-04 09:44:58 +00:00
dfukalov	2ce38b3f03	[NFC] Reduce include files dependency. 1. Removed #include "...AliasAnalysis.h" in other headers and modules. 2. Cleaned up includes in AliasAnalysis.h. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92489	2020-12-03 18:25:05 +03:00
Joe Ellis	78c0ea54a2	[DAGCombine] Fix TypeSize warning in DAGCombine::visitLIFETIME_END Bail out early if we encounter a scalable store. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D92392	2020-12-03 12:12:41 +00:00
Layton Kifer	d7fec38f05	[DAGCombiner][NFC] Replace duplicate implementation flipBoolean with DAG.getLogicalNOT Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D92246	2020-12-01 22:23:04 +03:00
Benjamin Kramer	107e92dff8	[DAG] Remove unused variable. NFC.	2020-12-01 16:29:02 +01:00
Simon Pilgrim	1b209ff9e3	[DAG] Move vselect(icmp_ult, 0, sub(x,y)) -> usubsat(x,y) to DAGCombine (PR40111) Move the X86 VSELECT->USUBSAT fold to DAGCombiner - there's nothing target specific about these folds.	2020-12-01 14:25:29 +00:00
Simon Pilgrim	6dbd0d36a1	[DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111) Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds. The SSE42 test diffs are relatively benign - it's avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away. Differential Revision: https://reviews.llvm.org/D91876	2020-12-01 11:56:26 +00:00
QingShan Zhang	4d83aba422	[DAGCombine] Adding a hook to improve the precision of fsqrt if the input is denormal For now, we hardcode the result as 0.0 if the input is denormal or 0, which hurts the precision. As the added fsqrt belongs to the cold path of the cmp+branch, it won't impact the performance for normal inputs on PowerPC, but it improves the precision if the input is denormal. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D80974	2020-11-27 02:10:55 +00:00
QingShan Zhang	9c588f53fc	[DAGCombine] Add hook to allow target specific test for sqrt input PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root. LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to target that has test instructions. Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang Differential Revision: https://reviews.llvm.org/D80706	2020-11-25 05:37:15 +00:00
Kai Luo	5931be60b5	[DAGCombine][PowerPC] Convert negated abs to trivial arithmetic ops This patch converts `0 - abs(x)` to `Y = sra (X, size(X)-1); sub (Y, xor (X, Y))` for better codegen. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D91120	2020-11-24 09:43:35 +00:00
Kerry McLaughlin	306c8ab208	[SVE][CodeGen] Improve codegen of scalable masked scatters If the scatter store is able to perform the sign/zero extend of its index, this is folded into the instruction with refineIndexType(). Additionally, refineUniformBase() will return the base pointer and index from an add + splat_vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90942	2020-11-13 11:19:36 +00:00
Francesco Petrogalli	fc2fe6817e	[llvm][AArch64] Simplify (and (sign_extend..) #bitmask). Fold VT = (and (sign_extend NarrowVT to VT) #bitmask) into VT = (zero_extend NarrowVT) With this combine, the test replaces a sign extended load + an unsigned extension with a zero extended load to render one of the operands of the last multiplication. BEFORE: f_i16_i32: .fnstart; ldrsh r0, [r0]; ldrsh r1, [r1]; smulbb r0, r1, r0; uxth r1, r1; mul r0, r0, r1; bx lr. AFTER: f_i16_i32: .fnstart; ldrh r1, [r1]; ldrsh r0, [r0]; smulbb r0, r0, r1; mul r0, r0, r1; bx lr. Reviewed By: resistor Differential Revision: https://reviews.llvm.org/D90605	2020-11-09 12:53:36 +00:00
Fraser Cormack	f99580c1e5	[DAGCombine] Fix bug in load scalarization Summary: For vector element types which are not byte-sized, we would generate incorrect scalar offsets and produce incorrect codegen. This optimization could potentially be supported in the future, e.g. by loading in bytes, then shifting and masking out the remaining bits of the vector element. However, without an upstream target to test against it's best to avoid the bad codegen in the simplest possible way. Related to this bug: https://bugs.llvm.org/show_bug.cgi?id=27600 Reviewed by: foad Differential Revision: https://reviews.llvm.org/D78568	2020-11-04 19:02:40 +00:00
Kerry McLaughlin	f2412d372d	[SVE][CodeGen] Lower scalable integer vector reductions This patch uses the existing LowerFixedLengthReductionToSVE function to also lower scalable vector reductions. A separate function has been added to lower VECREDUCE_AND & VECREDUCE_OR operations with predicate types using ptest. Lowering scalable floating-point reductions will be addressed in a follow up patch, for now these will hit the assertion added to expandVecReduce() in TargetLowering. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D89382	2020-11-04 11:38:49 +00:00
Simon Pilgrim	f53d7f55f1	[DAG] Move canFoldInAddressingMode before foldBinOpIntoSelect. NFC. Reduces the diff in D90113.	2020-10-28 12:16:05 +00:00
Peter Waller	5b742a0c10	[SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination The modified code in visitSTORE was missing a scalable vector check, and still using the now deprecated implicit cast of TypeSize to uint64_t through the overloaded operator. This patch fixes these issues. This brings the logic in line with the comment on the context line immediately above the added precondition. Add a test in sve-redundant-store.ll that the warning is not triggered. Differential Revision: https://reviews.llvm.org/D89701	2020-10-26 16:37:48 +00:00
Peter Waller	6536d6040f	Revert "[SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination" This reverts commit `4604441386`. Reverting because it was not the intended version of the patch, which follows this patch.	2020-10-26 16:37:00 +00:00
Peter Waller	4604441386	[SVE][CodeGen][DAGCombiner] Fix TypeSize warning in redundant store elimination The modified code in visitSTORE was missing a scalable vector check, and still using the now deprecated implicit cast of TypeSize to uint64_t through the overloaded operator. This patch fixes these issues. This brings the logic in line with the comment on the context line immediately above the added precondition. Add a test in Redundantstores.ll that the warning is not triggered.	2020-10-26 16:23:42 +00:00
Qiu Chaofan	1b2fe71ecf	[DAGCombiner] Tighten reassociation of visitFMA From LangRef, FMF contract should not enable reassociating to form arbitrary contractions. So it should not help rearrange nodes like (fma (fmul x, c1), c2, y) into (fma x, c1*c2, y). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D89527	2020-10-20 10:13:01 +08:00
Amy Kwan	6a946fd06f	[DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead. MULH is often expanded on targets. This patch removes the isMulhCheaperThanMulShift hook and uses isOperationLegalOrCustom instead. Differential Revision: https://reviews.llvm.org/D80485	2020-10-19 12:23:04 -05:00
David Sherwood	3945b69e81	[SVE][CodeGen] Replace more TypeSize comparison operators with their scalar equivalents In certain places in llvm/lib/CodeGen we were relying upon the TypeSize comparison operators when in fact the code was only ever expecting either scalar values or fixed width vectors. This patch changes a few functions that were always expecting to work on scalar or fixed width types: 1. DAGCombiner::mergeTruncStores - deals with scalar integers only. 2. DAGCombiner::ReduceLoadWidth - not valid for vectors. 3. DAGCombiner::createBuildVecShuffle - should only be used for fixed width vectors. 4. SelectionDAGLegalize::ExpandFCOPYSIGN and SelectionDAGLegalize::getSignAsIntValue - only work on scalars. Differential Revision: https://reviews.llvm.org/D88562	2020-10-19 08:38:50 +01:00
David Sherwood	35a531fb45	[SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents In certain places in llvm/lib/CodeGen we were relying upon the TypeSize comparison operators when in fact the code was only ever expecting either scalar values or fixed width vectors. I've changed some of these places to use the equivalent scalar operator. Differential Revision: https://reviews.llvm.org/D88482	2020-10-19 08:30:31 +01:00
David Sherwood	f693f915a0	[SVE][CodeGen] Replace uses of TypeSize comparison operators In certain places in the code we can never end up in a situation where we're mixing fixed width and scalable vector types. For example, we can't have truncations and extends that change the lane count. Also, in other places such as GenWidenVectorStores and GenWidenVectorLoads we know from the behaviour of FindMemType that we can never choose a vector type with a different scalable property. In various places I have used EVT::bitsXY functions instead of TypeSize::isKnownXY, where it probably makes sense to keep an assert that scalable properties match. Differential Revision: https://reviews.llvm.org/D88654	2020-10-19 08:08:41 +01:00
Craig Topper	1687a8d83b	[X86][SelectionDAG] Add SADDO_CARRY and SSUBO_CARRY to support multipart signed add/sub overflow legalization. This passes existing X86 test but I'm not sure if it handles all type legalization cases it needs to. Alternative to D89200 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D89222	2020-10-12 23:18:29 -07:00
David Sherwood	c5ba0d33cc	[SVE] Make ElementCount and TypeSize use a new PolySize class I have introduced a new template PolySize class, where the template parameter determines the type of quantity, i.e. for an element count this is just an unsigned value. The ElementCount class is now just a simple derivation of PolySize<unsigned>, whereas TypeSize is more complicated because it still needs to contain the uint64_t cast operator, since there are still many places in the code that rely upon this implicit cast. As such the class also still needs some of its own operators. I've tried to minimise the amount of code in the base PolySize class, which led to a couple of changes: 1. In some places we were relying on '==' operator comparisons between ElementCounts and the scalar value 1. I didn't put this operator in the new PolySize class, and thought it was actually clearer to use the isScalar() function instead. 2. I removed the isByteSized function and replaced it with calls to isKnownMultipleOf(8). I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so that it's more consistent with coefficientDivideBy. Differential Revision: https://reviews.llvm.org/D88409	2020-10-12 08:23:38 +01:00
Esme-Yi	e9fd8823ba	[DAGCombiner] Add decomposition patterns for Mul-by-Imm. Summary: This patch is derived from D87384. In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns: ``` mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M)) mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M)) ``` The conversion will be triggered if the multiplier is a big constant that the target can't handle with a single multiplication instruction. This is controlled by the hook `decomposeMulByConstant`. Moreover, the conversion benefits from an ILP improvement since the instructions are independent. A sequence like the following also benefits since a shift instruction is saved. ``` res1 = a * 0x8800; res2 = a * 0x8080; ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D88201	2020-10-09 08:51:40 +00:00
David Sherwood	4ed47d50ea	[SVE][CodeGen] Fix DAGCombiner::ForwardStoreValueToDirectLoad for scalable vectors In DAGCombiner::ForwardStoreValueToDirectLoad I have fixed up some implicit casts from TypeSize -> uint64_t and replaced calls to getVectorNumElements() with getVectorElementCount(). There are some simple cases of forwarding that we can definitely support for scalable vectors, i.e. when the store and load are both scalable vectors and have the same size. I have added tests for the new code paths here: CodeGen/AArch64/sve-forward-st-to-ld.ll Differential Revision: https://reviews.llvm.org/D87098	2020-10-06 08:04:03 +01:00
Craig Topper	1127662c6d	[SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS. getNode handling for ISD::SETCC calls FoldSETCC which can canonicalize FP constants to the RHS. When this happens we should create the node with the FMF that was requested. By using FlagInserter we can ensure any calls to getNode/getSetcc during canonicalization will also get the flags. Differential Revision: https://reviews.llvm.org/D88063	2020-10-05 14:55:23 -07:00
Sanjay Patel	2ccbf3dbd5	[SDAG] fold x * 0.0 at node creation time In the motivating case from https://llvm.org/PR47517 we create a node that does not get constant folded before getNegatedExpression is attempted from some other node, and we crash. By moving the fold into SelectionDAG::simplifyFPBinop(), we get the constant fold sooner and avoid the problem.	2020-10-04 11:31:57 -04:00
David Sherwood	bafdd11326	[SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy After some recent upstream discussion we decided that it was best to avoid having the / operator for both ElementCount and TypeSize, since this could give the impression that these classes can be used in the same way as basic integer types. However, division for scalable types is a bit odd because we are only dividing the minimum quantity by a value, as opposed to something like: (MinSize * Vscale) / SomeValue This is why when performing division it's important the caller first establishes whether the operation makes sense, perhaps by calling isKnownMultipleOf() prior to division. The caller must now explicitly call divideCoefficientBy() on the class to perform the operation. Differential Revision: https://reviews.llvm.org/D87700	2020-09-28 08:03:00 +01:00
Simon Pilgrim	a61272a900	[DAG] Fold vector mul(x,0)/mul(x,1) to a clearing mask If we're multiplying all elements of a vector by '0' or '1' then we can more efficiently perform this as a clearing mask (that is likely to further simplify to a shuffle blend). This was noticed when reviewing D87502 but seems to help idiv/irem by constant cases even more as '0'/'1' values are often used for 'passthrough' cases. Differential Revision: https://reviews.llvm.org/D88225	2020-09-26 14:31:57 +01:00
Qiu Chaofan	c0f8e4c06c	[SelectionDAG] Add guard to automatically insert flags This is like FastMathFlagGuard in IR. Since we use SDAG instance to get values, it's with SelectionDAG. By creating a FlagInserter in current scope, all values created by getNode will get the flags if no Flags argument provided. In this patch, I applied it to floating point operations folding part in DAG combiner, and removed Flags passing to getNode to show its effect. Other places in DAG combiner and other helper methods similar to getNode also need this. They can be done in follow-up patches. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D87361	2020-09-26 13:57:52 +08:00
David Sherwood	e077367a28	[SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits An existing function Type::getScalarSizeInBits returns a uint64_t instead of a TypeSize class because the caller is requesting a scalar size, which cannot be scalable. This patch makes other similar functions requesting a scalar size consistent with that, thereby eliminating more than 1000 implicit TypeSize -> uint64_t casts. Differential revision: https://reviews.llvm.org/D87889	2020-09-23 09:20:08 +01:00
Craig Topper	e30371d99d	[DAGCombiner] Teach visitMSTORE to replace an all ones mask with an unmasked store. Similar to what done in D87788 for MLOAD. Again I've skipped indexed, truncating, and compressing stores.	2020-09-16 16:42:22 -07:00
Craig Topper	89ee4c0314	[DAGCombiner] Teach visitMLOAD to replace an all ones mask with an unmasked load If we have an all ones mask, we can just use a regular unmasked load. InstCombine already gets this in IR. But the all ones mask can appear after type legalization. Only avx512 test cases are affected because X86 backend already looks for element 0 and the last element being 1. It replaces this with an unmasked load and blend. The all ones mask is a special case of that where the blend will be removed. That transform is only enabled on avx2 targets. I believe that's because a non-zero passthru on avx2 already requires a separate blend so it's more profitable to handle mixed constant masks. This patch adds a dedicated all ones handling to the target independent DAG combiner. I've skipped extending, expanding, and index loads for now. X86 doesn't use index so I don't know much about it. Extending made me nervous because I wasn't sure I could trust the memory VT had the right element count due to some weirdness in vector splitting. For expanding I wasn't sure if we needed different undef handling. Differential Revision: https://reviews.llvm.org/D87788	2020-09-16 13:21:16 -07:00
Simon Pilgrim	3f682611ab	[DAG] Remove getOperand() call. NFCI.	2020-09-16 11:18:58 +01:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Nikita Popov	8e69c3cde8	[DAGCombiner] Fold fmin/fmax with INF / FLT_MAX Similar to D87415, this folds the various float min/max opcodes with a constant INF or -INF operand, or FLT_MAX / -FLT_MAX operand if the ninf flag is set. Some of the folds are only possible under nnan. The fminnum(X, INF) with nnan and fmaxnum(X, -INF) with nnan cases are needed to improve the VECREDUCE_FMIN/FMAX lowerings on X86, the rest is here for the sake of completeness. Differential Revision: https://reviews.llvm.org/D87571	2020-09-14 19:59:33 +02:00
Craig Topper	56b33391d3	[SelectionDAG] Move ISD::PARITY formation from DAGCombine to SimplifyDemandedBits. Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1) but the AND might be separated from the ctpop. For example if the parity result is multiplied by 2, we'll pull the AND through the shift. So to handle more cases, move to SimplifyDemandedBits where we can handle more cases that result in only the LSB of the CTPOP being used.	2020-09-13 21:04:13 -07:00
Qiu Chaofan	a4c5351986	[DAGCombiner] Propagate FMF flags in FMA folding DAG combiner folds (fma a 1.0 b) into (fadd a b) but the flag isn't propagated into new fadd. This patch fixes that. Some code in visitFMA is redundant and such support for vector constants is missing. Need follow-up patch to clean. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D87037	2020-09-14 00:19:06 +08:00
Craig Topper	ad3d6f993d	[SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors. Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop isn't natively supported by the target, this leads to poor codegen due to the expansion of ctpop being more complex than what is needed for parity. This adds a DAG combine to convert the pattern to ISD::PARITY before operation legalization. Type legalization is updated to handle Expanding and Promoting this operation. If after type legalization, CTPOP is supported for this type, LegalizeDAG will turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a series of shifts and xors followed by an AND with 1. I've avoided vectors in this patch to avoid more legalization complexity for this patch. X86 previously had a custom DAG combiner for this. This is now moved to Custom lowering for the new opcode. There is a minor regression in vector-reduce-xor-bool.ll, but a follow up patch can easily fix that. Fixes PR47433 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87209	2020-09-12 11:42:18 -07:00
Nikita Popov	0a5dc7effb	[DAGCombiner] Fold fmin/fmax of NaN fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the behavior of existing InstSimplify folds. This is expected to improve the reduction lowerings in D87391, which use NaN as a neutral element. Differential Revision: https://reviews.llvm.org/D87415	2020-09-09 23:53:32 +02:00
Ulrich Weigand	1a25133bcd	[DAGCombine] Skip re-visiting EntryToken to avoid compile time explosion During the main DAGCombine loop, whenever a node gets replaced, the new node and all its users are pushed onto the worklist. Omit this if the new node is the EntryToken (e.g. if a store managed to get optimized out), because re-visiting the EntryToken and its users will not uncover any additional opportunities, but there may be a large number of such users, potentially causing compile time explosion. This compile time explosion showed up in particular when building the SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any platform without SIMD vector support. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86963	2020-09-09 19:13:46 +02:00
Craig Topper	b1e68f885b	[SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact. This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130. This required adding a SDNodeFlags to SelectionDAG::getSetCC. Now we manage to constant fold some stuff undefs during the initial getNode that we don't do in later DAG combines. Differential Revision: https://reviews.llvm.org/D87200	2020-09-08 15:27:21 -07:00
Sanjay Patel	7a06b166b1	[DAGCombiner] allow more store merging for non-i8 truncated ops This is a follow-up suggested in D86420 - if we have a pair of stores in inverted order for the target endian, we can rotate the source bits into place. The "be_i64_to_i16_order" test shows a limitation of the current function (which might be avoided if we integrate this function with the other cases in mergeConsecutiveStores). In the earlier "be_i64_to_i16" test, we skip the first 2 stores because we do not match the full set as consecutive or rotate-able, but then we reach the last 2 stores and see that they are an inverted pair of 16-bit stores. The "be_i64_to_i16_order" test alters the program order of the stores, so we miss matching the sub-pattern. Differential Revision: https://reviews.llvm.org/D87112	2020-09-07 14:12:36 -04:00
Paul Walker	f72121254d	[SVE] Don't reorder subvector/binop sequences when the resulting binop is not legal. When lowering fixed length vector operations for SVE the subvector operations are used extensively to marshall data between scalable and fixed-length vectors. This means that sequences like: extract_subvec(binop(insert_subvec(a), insert_subvec(b))) are very common. DAGCombine only checks if the resulting binop is legal or can be custom lowered when undoing such sequences. When it's custom lowering that is introducing them the result is an infinite legalise->combine->legalise loop. This patch extends the isOperationLegalOr... functions to include a "LegalOnly" parameter to restrict the check to legal operations only. Although isOperationLegal could be used it's common for the affected code paths to be visited pre and post legalisation, so the extra parameter keeps the code tidy. Differential Revision: https://reviews.llvm.org/D86450	2020-09-02 11:01:33 +01:00
Cameron McInally	cfe2b81710	[SVE] Update INSERT_SUBVECTOR DAGCombine to use getVectorElementCount(). A small piece of the project to replace getVectorNumElements() with getVectorElementCount(). Differential Revision: https://reviews.llvm.org/D86894	2020-09-01 16:51:44 -05:00
Sam Tebbs	15e880a04f	[DAGCombiner] Fold an AND of a masked load into a zext_masked_load This patch folds an AND of a masked load and build vector into a zero extended masked load. Differential Revision: https://reviews.llvm.org/D86789	2020-09-01 17:02:07 +01:00
Qiu Chaofan	5475154865	[NFC] [DAGCombiner] Refactor bitcast folding within fabs/fneg fabs and fneg share a common transformation: (fneg (bitconvert x)) -> (bitconvert (xor x sign)) (fabs (bitconvert x)) -> (bitconvert (and x ~sign)) This patch separates the code into a single method. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86862	2020-09-01 00:48:12 +08:00
Qiu Chaofan	eb2a405c18	[NFC] [DAGCombiner] Remove unnecessary negation in visitFNEG In visitFNEG of DAGCombiner, the folding of (fneg (fsub c, x)) is redundant since getNegatedExpression already handles it.	2020-09-01 00:35:01 +08:00
Sanjay Patel	1c9a09f42e	[DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x), better I tried to fix this in: rG716e35a0cf53 ...but that patch depends on the order that we encounter the magic "x/sqrt(x)" expression in the combiner's worklist. This patch should improve that by waiting until we walk the user list to decide if there's a use to skip. The AArch64 test reveals another (existing) ordering problem though - we may try to create an estimate for plain sqrt(x) before we see that it is part of a 1/sqrt(x) expression.	2020-08-31 09:35:59 -04:00
Sanjay Patel	716e35a0cf	[DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x) In general, we probably want to try the multi-use reciprocal transform before sqrt transforms, but x/sqrt(x) is a special-case because that will always reduce to plain sqrt(x) or an estimate. The AArch64 tests show that the transform is limited by TLI hook to patterns where there are 3 or more uses of the divisor. So this change can result in an extra division compared to what we had, but that's the intended behavior based on the current setting of that hook.	2020-08-30 10:55:45 -04:00
Kai Luo	b904324788	[DAGCombiner] Enhance (zext(setcc)) Currently, `v:t = zext(setcc x,y,cc)` will be transformed to `select x, y, 1:t, 0:t, cc`. It misses some opportunities if x's type size is less than `t`'s size. This patch enhances the above transformation. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86687	2020-08-29 03:37:41 +00:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Drew Wock	0ec098e22b	[FPEnv] Allow fneg + strict_fadd -> strict_fsub in DAGCombiner This is the first of a set of DAGCombiner changes enabling strictfp optimizations. I want to test the waters with this to make sure changes like these are acceptable for the strictfp case; this particular change should preserve exception ordering and result precision perfectly, and many other possible changes appear to be able to as well. Copied from regular fadd combines but modified to preserve ordering via the chain, this change allows strict_fadd x, (fneg y) to become strict_fsub x, y and strict_fadd (fneg x), y to become strict_fsub y, x. Differential Revision: https://reviews.llvm.org/D85548	2020-08-27 08:17:01 -04:00
Sanjay Patel	54a5dd485c	[DAGCombiner] allow store merging non-i8 truncated ops We have a gap in our store merging capabilities for shift+truncate patterns as discussed in: https://llvm.org/PR46662 I generalized the code/comments for this function in earlier commits, so we only need to ease the type restriction and adjust the address/endian checking to make this work. AArch64 lets us switch endian to make sure that patterns are matched either way. Differential Revision: https://reviews.llvm.org/D86420	2020-08-26 15:23:08 -04:00
Venkataramanan Kumar	62e91bf563	[DAGCombine]: Fold X/Sqrt(X) to Sqrt(X) With FMF ("nsz" and "reassoc") fold X/Sqrt(X) to Sqrt(X). This is done after targets have the chance to produce a reciprocal sqrt estimate sequence because that expansion is probably more efficient than an expansion of a non-reciprocal sqrt. That is also why we deferred doing this transform in IR (D85709). Differential Revision: https://reviews.llvm.org/D86403	2020-08-24 18:16:13 -04:00
Sanjay Patel	1d0fa79824	[DAGCombiner] restrict store merge of truncs to early combining The pattern matching does not account for truncating stores, so it is unlikely to work at later stages. So we are likely wasting compile-time with no hope of improvement by running this later.	2020-08-23 10:44:23 -04:00
Sanjay Patel	79cb289a95	[DAGCombiner] add early exit for store merging of truncs This should be NFC in terms of output because the endian check further down would bail out too, but we are wasting time by waiting until that point to give up. If we generalize that function to deal with more than i8 types, we should not have to deal with the degenerate case.	2020-08-22 16:25:16 -04:00
Sanjay Patel	2fc7c85201	[DAGCombiner] clean up merge of truncated stores; NFC This code handles the special-case of i8 stores, but it could be generalized to deal with other types.	2020-08-22 09:23:32 -04:00
Sanjay Patel	f925fd3304	[DAGCombiner] give magic number a name in getStoreMergeCandidates; NFC	2020-08-17 15:37:55 -04:00
Sanjay Patel	046b4a550a	[DAGCombiner] reduce code duplication in getStoreMergeCandidates; NFC	2020-08-17 15:37:55 -04:00
Sanjay Patel	20c85fd1ab	[DAGCombiner] simplify bool return in getStoreMergeCandidates; NFC	2020-08-17 15:37:55 -04:00
Sanjay Patel	52cd8f1ecb	[DAGCombiner] clean up getStoreMergeCandidates(); NFC 1. Move bailouts and local var declarations. 2. Convert if-chain to switch on StoreSource with unreachable default.	2020-08-17 15:37:54 -04:00
Sanjay Patel	27708db3e3	[DAGCombiner] convert StoreSource if-chain to switch; NFC The "isa" checks were less constrained because they allow target constants, but the later matching code would bail out on those anyway, so this should be slightly more efficient.	2020-08-17 15:37:54 -04:00
Kerry McLaughlin	30af595f05	[SVE][CodeGen] Legalisation of EXTRACT_VECTOR_ELT for scalable vectors This patch changes SplitVecOp_EXTRACT_VECTOR_ELT to work correctly for scalable vectors and also fixes a bug in DAGCombiner where the scalable property is dropped in visitTRUNCATE when attempting to fold an extract + a truncate. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85754	2020-08-13 12:32:59 +01:00
David Sherwood	3ec3fcb97a	[CodeGen] In narrowExtractedVectorLoad bail out for scalable vectors In narrowExtractedVectorLoad there is an optimisation that tries to combine extract_subvector with a narrowing vector load. At the moment this produces warnings due to the incorrect calls to getVectorNumElements() for scalable vector types. I've got this working for scalable vectors too when the extract subvector index is a multiple of the minimum number of elements. I have added a new variant of the function: MachineFunction::getMachineMemOperand that copies an existing MachineMemOperand, but replaces the pointer info with a null version since we cannot currently represent scaled offsets. I've added a new test for this particular case in: CodeGen/AArch64/sve-extract-subvector.ll Differential Revision: https://reviews.llvm.org/D83950	2020-08-13 10:46:18 +01:00
Kerry McLaughlin	85c7e89f3b	[CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize Changes the Offset arguments to both functions from int64_t to TypeSize & updates all uses of the functions to create the offset using TypeSize::Fixed() Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85220	2020-08-11 12:17:10 +01:00
Sanjay Patel	f22ac1d15b	[DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division, part 2 Follow-up to D82716 / rGea71ba11ab11 We do not have the fabs removal fold in IR yet for the case where the sqrt operand is repeated, so that's another potential improvement.	2020-08-08 10:38:06 -04:00
Simon Pilgrim	4aaf301fb8	[DAG] Fold vector (aext (load x)) -> (zext (truncate (zextload x))) We currently don't do anything to fold any_extend vector loads as no target has such an instruction. Instead I've added support for folding to a zextload; SimplifyDemandedBits does a good job of adjusting the zext(truncate()) stages as required later on. We still need the custom scalar extload handling instead of using the tryToFoldExtOfLoad helper as it has different legality tests - we can probably tweak that to reduce most of the code duplication. Fixes the regression I mentioned in rG99a971cadff7 Differential Revision: https://reviews.llvm.org/D85129	2020-08-05 11:22:23 +01:00
Sam Tebbs	276ed5f7e4	[DAGCombiner] Fold sext_inreg of a masked load into a sign extended masked load This patch adds a DAG combine fold for a sext(masked_load) into a sign extended masked load. Differential Revision: https://reviews.llvm.org/D84332	2020-07-30 10:34:02 +01:00
David Sherwood	2078771759	[SVE][CodeGen] Add simple integer add tests for SVE tuple types I have added tests to: CodeGen/AArch64/sve-intrinsics-int-arith.ll for doing simple integer add operations on tuple types. Since these tests introduced new warnings due to incorrect use of getVectorNumElements() I have also fixed up these warnings in the same patch. These fixes are: 1. In narrowExtractedVectorBinOp I have changed the code to bail out early for scalable vector types, since we've not yet hit a case that proves the optimisations are profitable for scalable vectors. 2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced calls to getVectorNumElements with getVectorMinNumElements in cases that work with scalable vectors. For the other cases I have added asserts that the vector is not scalable because we should not be using shuffle vectors and build vectors in such cases. Differential revision: https://reviews.llvm.org/D84016	2020-07-29 13:32:10 +01:00
Changpeng Fang	9162b70e51	DAGCombiner: Don't simplify the token factor if the node's number of operands already exceeds TokenFactorInlineLimit Summary: In parallelizeChainedStores, a TokenFactor was created with the size greater than 3000. We found that DAGCombiner::visitTokenFactor will consume a huge amount of time on such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose to give up simplification with the consideration of compile time. Reviewers: @spatel, @arsenm Differential Revision: https://reviews.llvm.org/D84204	2020-07-25 21:20:59 -07:00


3019 Commits