llvm-project

Commit Graph

Author	SHA1	Message	Date
Ayke van Laethem	de48717fcf	[AVR] Support unaligned store This patch really just extends D39946 towards stores as well as loads. While the patch is in SelectionDAGBuilder, it only applies to AVR (the only target that supports unaligned atomic operations). Differential Revision: https://reviews.llvm.org/D128483	2022-08-15 14:29:37 +02:00
Simon Pilgrim	3a73133217	[DAG] canCreateUndefOrPoison - add freeze(sign_extend_inreg(x,vt)) -> sign_extend_inreg(freeze(x),vt) support Guaranteed not to create undef/poison	2022-08-15 12:18:59 +01:00
Peter Waller	6e85db7293	[DAGCombine] Combine signext_inreg of extract-extend The outer signext_inreg is redundant in the following: Fold (signext_inreg (extract_subvector (zext\|anyext\|sext iN_value to _) _) from iN) -> (extract_subvector (signext iN_value to iM)) Tests are precommitted and clone those by analogy from the AND case in the same file. Add a negative test to check extension width is handled correctly. This patch supersedes D130700. Differential Revision: https://reviews.llvm.org/D131503	2022-08-15 10:58:07 +00:00
Simon Pilgrim	7e294e676e	[DAG] canCreateUndefOrPoison - add freeze(assertsext/zext(x,bt)) -> assertsext/zext(freeze(x),vt) support These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison	2022-08-15 11:13:43 +01:00
Simon Pilgrim	e2d13fd096	[DAG] canCreateUndefOrPoison - add freeze(shl(x,y)) -> shl(freeze(x),y) support These are guaranteed not to create undef/poison if the shift amount is known to be in range	2022-08-14 14:38:10 +01:00
Simon Pilgrim	a621d38bcb	[DAG] canCreateUndefOrPoison - add freeze(and/or/xor(x,y)) -> and/or/xor(freeze(x),y) support These are guaranteed not to create undef/poison	2022-08-14 13:14:53 +01:00
Simon Pilgrim	60534b8879	[DAG] canCreateUndefOrPoison - add freeze(add/sub/mul(x,y)) -> add/sub/mul(freeze(x),y,z) support These are guaranteed not to create undef/poison as long as there are no poison generating flags	2022-08-13 20:58:00 +01:00
Joe Loser	b12aa497cd	[DAGCombine] Replace std::monostate equivalent in DAGCombiner.cpp Remove the `UnitT` type and operators in favor of using `std::monostate` directly. Differential Revision: https://reviews.llvm.org/D131778	2022-08-12 21:42:09 -06:00
Simon Pilgrim	4de35f4bbf	[DAG] Add TODO to remove creation of INSERT_SUBVECTOR nodes from SimplifyMultipleUseDemandedBits SimplifyMultipleUseDemandedBits shouldn't be creating general nodes like this - although we allow bitcasts, even general constant folding is avoided. Removing it causes a number of regressions that need addressing first, but I've added a TODO for now.	2022-08-12 10:45:30 +01:00
Filipp Zhinkin	1626ee6a95	[DAGCombine] Hoist shifts out of a logic operations tree. Hoist and combine shift operations from logic operations tree: logic (logic (SH x0, s), y), (logic (SH x1, s), z) --> logic (SH (logic x0, x1), s), (logic y, z) The transformation improves code generated for some cases related to the issue https://github.com/llvm/llvm-project/issues/49541. Correctness: https://alive2.llvm.org/ce/z/pVqVgY https://alive2.llvm.org/ce/z/YVvT-q https://alive2.llvm.org/ce/z/W5zTBq https://alive2.llvm.org/ce/z/YfJsvJ https://alive2.llvm.org/ce/z/3YSyDM https://alive2.llvm.org/ce/z/Bs2kzk https://alive2.llvm.org/ce/z/EoQpzU https://alive2.llvm.org/ce/z/Jnc_5H https://alive2.llvm.org/ce/z/_LP6k_ https://alive2.llvm.org/ce/z/KvZNC9 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D131189	2022-08-12 12:42:16 +03:00
wanglian	061f7ec9fa	[LegalizeTypes][NFC] Use getConstantOperandVal instead of cast constant getvalue Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131642	2022-08-12 14:35:10 +08:00
wanglian	1303057888	[LegalizeTypes][NFC] Use dyn_cast instead of isa and cast Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131544	2022-08-12 14:18:49 +08:00
wanglian	3b71f1d5ab	[LegalizeTypes][NFC] Use getConstantOperandAPInt instead of cast constant getAPInt Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131653	2022-08-12 10:21:54 +08:00
Peter Waller	898699831b	[DAGCombine] Check zext legality in zext-extract-extend combine Discussed in D131503. Fix to D130782.	2022-08-11 14:30:42 +00:00
aqjune	02e56e2533	[CodeGen] Generate efficient assembly for freeze(poison) version of `mm_cast` intel intrinsics This patch makes the variants of `mm_cast` intel intrinsics that use `shufflevector(freeze(poison), ..)` emit efficient assembly. (These intrinsics are planned to use `shufflevector(freeze(poison), ..)` after shufflevector's semantics update; relevant thread: D103874) To do so, this patch 1. Updates `LowerAVXCONCAT_VECTORS` in X86ISelLowering.cpp to recognize `FREEZE(UNDEF)` operand of `CONCAT_VECTOR` in addition to `UNDEF` 2. Updates X86InstrVecCompiler.td to recognize `insert_subvector` of `FREEZE(UNDEF)` vector as its first operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130339	2022-08-11 13:36:21 +09:00
Simon Pilgrim	8623da5f74	[DAG] visitFREEZE - generalize freeze(op()) -> op(freeze()) to any number of operands canCreateUndefOrPoison currently only handles unary ops, but we intend to change that soon - this more closely matches the pushFreezeToPreventPoisonFromPropagating behaviour where the freeze is pushed up to a single operand value, as long as all others are guaranteed not to be poison/undef. However, pushFreezeToPreventPoisonFromPropagating would freeze all uses of the value - whilst this variant requires the frozen value to be only used in the op - we can look at generalize multiple uses later if the need arises.	2022-08-10 13:12:46 +01:00
Simon Pilgrim	bbc27d0148	[DAG] canCreateUndefOrPoison - add freeze(truncate(x)) -> truncate(freeze(x)) support	2022-08-10 11:27:22 +01:00
David Truby	b1b9c39629	[AArch64][SVE] Use SVE for VLS fcopysign for wide vectors Currently fcopysign for VLS vectors lowers through NEON even when the vector width is wider than a NEON vector, causing bad codegen as the vectors are split. This patch causes SVE to be used for these vectors instead, giving much better codegen on wide VLS vectors. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D128642	2022-08-10 10:17:19 +00:00
Simon Pilgrim	df3ea7365e	[DAG] Use DAG.getFreeze() to create freeze node. NFC.	2022-08-10 10:26:26 +01:00
Simon Pilgrim	ed162d455a	[DAG] Avoid hasOneUse() calls if the cheaper !AssumeSingleUse test has already failed. NFC. Very minor optimization, but every little helps..	2022-08-09 16:42:19 +01:00
Simon Pilgrim	d79e7dc939	[DAG] SimplifyDemandedVectorElts - and/mul(x,y) - if a demanded element of y is known zero then we don't need to demand it in x This fixes most of the remaining regressions from the fixes in rG293899c64b75	2022-08-09 16:24:08 +01:00
Simon Pilgrim	2724143551	[DAG] canCreateUndefOrPoison - add freeze(ctpop(x)) -> ctpop(freeze(x)) and freeze(parity(x)) -> parity(freeze(x)) support Both are guaranteed not to create undef/poison	2022-08-09 10:10:29 +01:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Simon Pilgrim	6f2bee667a	[DAG] canCreateUndefOrPoison - add freeze(bswap(x)) -> bswap(freeze(x)) and freeze(bitreverse(x)) -> bitreverse(freeze(x)) support Both are guaranteed not to create undef/poison	2022-08-08 17:27:17 +01:00
Simon Pilgrim	e4b2c52420	[DAG] canCreateUndefOrPoison - add freeze(sext(x)) -> sext(freeze(x)) and freeze(zext(x)) -> zext(freeze(x)) support Both are guaranteed not to create undef/poison	2022-08-08 16:43:40 +01:00
Simon Pilgrim	9641a201a5	[DAG] Add initial SelectionDAG::canCreateUndefOrPoison support This patch adds basic support for a DAG variant of the canCreateUndefOrPoison call and updates DAGCombiner::visitFREEZE to use it, further Opcodes (including target specific Opcodes) can be handled when we have test coverage. So far, I've left visitFREEZE to just use this for unary nodes (which currently means the existing BITCAST/FREEZE cases) - later patches will add other unary opcodes (with test coverage) and we can also refactor visitFREEZE to support a general number of operands like we do in InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating. I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place. Similarly we will be able to handle poison generating SDNodeFlags as and when it becomes an issue. Part of the work for D106675 / PR50468 Differential Revision: https://reviews.llvm.org/D130646	2022-08-08 15:16:06 +01:00
Simon Pilgrim	b334709467	Remove superfluous ; outside of a function	2022-08-08 12:14:03 +01:00
Shubham Narlawar	ab4fc87a9d	[DAG] Emit table lookup from TargetLowering::expandCTTZ() This patch emits table lookup in expandCTTZ. Context - https://reviews.llvm.org/D113291 transforms set of IR instructions to cttz intrinsic but there are some targets which does not support CTTZ or CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ(). Differential Revision: https://reviews.llvm.org/D128911	2022-08-08 12:08:05 +01:00
Simon Pilgrim	e5e93b6130	[DAG] FoldConstantArithmetic - add initial support for undef elements in bitcasted binop constant folding FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks). This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero. The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead? Differential Revision: https://reviews.llvm.org/D130839	2022-08-08 11:53:56 +01:00
David Green	061e0189a3	[DAG] Ensure Legal BUILD_VECTOR elements types in shuffle->And combine D129150 added a combine from shuffles to And that creates a BUILD_VECTOR of constant elements. We need to ensure that the elements are of a legal type, to prevent asserts during lowering. Fixes #56970. Differential Revision: https://reviews.llvm.org/D131350	2022-08-08 09:47:55 +01:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Filipp Zhinkin	c55899f763	[DAGCombiner] Hoist funnel shifts from logic operation Hoist funnel shift from logic op: logic_op (FSH x0, x1, s), (FSH y0, y1, s) --> FSH (logic_op x0, y0), (logic_op x1, y1), s The transformation improves code generated for some cases related to issue https://github.com/llvm/llvm-project/issues/49541. Reduced amount of funnel shifts can also improve throughput on x86 CPUs by utilizing more available ports: https://quick-bench.com/q/gC7AKkJJsDZzRrs_JWDzm9t_iDM Transformation correctness checks: https://alive2.llvm.org/ce/z/TKPULH https://alive2.llvm.org/ce/z/UvTd_9 https://alive2.llvm.org/ce/z/j8qW3_ https://alive2.llvm.org/ce/z/7Wq7gE https://alive2.llvm.org/ce/z/Xr5w8R https://alive2.llvm.org/ce/z/D5xe_E https://alive2.llvm.org/ce/z/2yBZiy Differential Revision: https://reviews.llvm.org/D130994	2022-08-05 17:02:22 -04:00
Dawid Jurczak	1bd31a6898	[NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T Extracted from https://reviews.llvm.org/D129781 and address comment: https://reviews.llvm.org/D129781#3655571 Differential Revision: https://reviews.llvm.org/D130268	2022-08-05 13:35:41 +02:00
Lorenzo Albano	74940d2668	[VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D121114	2022-08-04 16:12:01 +02:00
wanglian	b6b0690355	[LegalizeTypes][VP] Add split operand support for VP float and integer casting Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D130685	2022-08-04 15:41:50 +08:00
Felipe de Azevedo Piovezan	a5a8a05c78	[SelectionDAG] Handle IntToPtr constants in dbg.value The function `handleDebugValue` has custom logic to handle certain kinds constants, namely integers, floats and null pointers. However, it does not handle constant pointers created from IntToPtr ConstantExpressions. This patch addresses the issue by replacing the Constant with its integer operand. A similar bug was addressed for GlobalISel in D130642. Reviewed By: aprantl, #debug-info Differential Revision: https://reviews.llvm.org/D130908	2022-08-03 14:10:05 -04:00
David Truby	9a976f3661	[llvm] Always use TargetConstant for FP_ROUND ISD Nodes This patch ensures consistency in the construction of FP_ROUND nodes such that they always use ISD::TargetConstant instead of ISD::Constant. This additionally fixes a bug in the AArch64 SVE backend where patterns were matching against TargetConstant nodes and sometimes failing when passed a Constant node. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130370	2022-08-03 14:02:11 +01:00
Fraser Cormack	646e2f4803	[VP] Rename VP int<->float conversion ISD opcodes These should be named like the non-VP versions for consistency. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130967	2022-08-03 10:04:38 +01:00
Simon Pilgrim	b651fdff79	[DAG] matchRotateSub - ensure the (pre-extended) shift amount is wide enough for the amount mask (PR56859) matchRotateSub is given shift amounts that will already have stripped any/zero-extend nodes from - so make sure those values are wide enough to take a mask.	2022-08-02 11:38:52 +01:00
Marius Brehler	ddb6c28638	Avoid comparison of integers of different signs Otherwiese a warning is emitted when compiling with `-Wsign-compare`.	2022-08-01 11:20:41 +00:00
Simon Pilgrim	b43d7aacf8	[DAG] visitINSERT_VECTOR_ELT - extend folding to BUILD_VECTOR if all missing elements from an insertion chain are known zero	2022-08-01 11:32:33 +01:00
David Sherwood	41119a0f52	[DAGCombiner] Extend visitAND to include EXTRACT_SUBVECTOR Eliminate an AND by redefining an anyext\|sext\|zext. (and (extract_subvector (anyext\|sext\|zext v) _) iN_mask) => (extract_subvector (zeroext_iN v)) Differential Revision: https://reviews.llvm.org/D130782	2022-08-01 10:32:32 +01:00
Chuanqi Xu	9701053517	Introduce @llvm.threadlocal.address intrinsic to access TLS variable This belongs to a series of patches which try to solve the thread identification problem in coroutines. See https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015 for a full background. The problem consists of two concrete problems: TLS variable and readnone functions. This patch tries to convert the TLS problem to readnone problem by converting the access of TLS variable to an intrinsic which is marked as readnone. The readnone problem would be addressed in following patches. Reviewed By: nikic, jyknight, nhaehnle, ychen Differential Revision: https://reviews.llvm.org/D125291	2022-08-01 10:51:30 +08:00
Simon Pilgrim	9ad082eb5a	[DAG] Pull out repeated getOperand() calls for shuffle ops. NFC.	2022-07-30 14:02:54 +01:00
Amaury Séchet	226086230c	[DAG] Use recursivelyDeleteUnusedNodes in CommitTargetLoweringOpt. It simplifies the logic and removes the need for manual bookkeeping. Differential Revision: https://reviews.llvm.org/D130445	2022-07-29 13:49:03 +00:00
Simon Pilgrim	af1b7ebcdf	[TargetLowering] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC. Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.	2022-07-29 14:20:35 +01:00
Simon Pilgrim	641dba9e28	[DAG] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC. Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.	2022-07-29 11:34:39 +01:00
Simon Pilgrim	9082c13106	[Support] Add KnownBits::concat method Add a method for the various cases where we need to concatenate 2 KnownBits together (BUILD_PAIR and SHIFT_PARTS in particular) - uses the existing APInt::concat 'HiBits.concat(LoBits)' convention Differential Revision: https://reviews.llvm.org/D130557	2022-07-29 11:06:39 +01:00
Simon Pilgrim	8c99cef1e7	[DAG] Remove SelectionDAG::GetDemandedBits and use SimplifyMultipleUseDemandedBits directly. GetDemandedBits is mainly a wrapper around SimplifyMultipleUseDemandedBits now, and is only used by DAGCombiner::visitSTORE so I've moved all remaining functionality there. visitSTORE was making use of this to 'simplify' constants for a trunc-store. Just removing this code left to a mixture of regressions and gains - it came down to whether a target preferred a sign or zero extended constant for materialization/truncation. I've just moved the code over for now, but a next step would be to move this to targetShrinkDemandedConstant, but some targets that override the method expect a basic binop, and might react badly to a store node.....	2022-07-28 17:03:44 +01:00
Simon Pilgrim	be488ba7de	[DAG] DAGCombiner::visitTRUNCATE - remove GetDemandedBits call This should now all be handled by SimplifyDemandedBits.	2022-07-28 15:23:04 +01:00
Simon Pilgrim	ea7f14dad0	[DAG] SelectionDAG::GetDemandedBits - don't simplify opaque constants I'm actually trying to get rid of GetDemandedBits - but while dismantling it I noticed that we were altering opaque constants. Fixing that causes a FP_TO_INT_SAT regression that should be addressed separately - I'll raise a bug.	2022-07-28 14:46:59 +01:00
Simon Pilgrim	69d5a038b9	[DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits. There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP. Differential Revision: https://reviews.llvm.org/D77804	2022-07-28 14:10:44 +01:00
Amaury Séchet	474a8ee03d	[DAG] Use recursivelyDeleteUnusedNodes in PromoteLoad It simplifies the code overall and removes the need for manual bookkeeping. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130447	2022-07-28 12:54:52 +00:00
Amaury Séchet	7920805b27	[DAG] Use recursivelyDeleteUnusedNodes in ReplaceLoadWithPromotedLoad It simplifies the code overall and removes the need for manual bookkeeping. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130444	2022-07-28 12:32:37 +00:00
Simon Pilgrim	c0b3f7a50f	[DAG] SimplifyDemandedBits - ensure we clear known One bits that AssertZext asserts are really known Zero Matches ComputeKnownBits behaviour Thanks to @uabelho for the fuzz regression report on D129765	2022-07-27 13:57:47 +01:00
Simon Pilgrim	529bd4f352	[DAG] SimplifyDemandedBits - don't early-out for multiple use values SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which is missing a number of optimizations as there are plenty of cases where we can still simplify when initially demanding all elements/bits. @lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current increase codegen is not a major concern. Differential Revision: https://reviews.llvm.org/D129765	2022-07-27 10:54:06 +01:00
Simon Pilgrim	1ea7b9c6ee	[DAG] matchRotateSub - set demanded bits to the shift amount type size, not the shift result size. This should fix a report on D130251 of an assert due to a bitwidth mismatch in APInt::isSubSetOf	2022-07-26 17:58:51 +01:00
Paul Walker	e5c892dd85	[SVE][SelectionDAG] Use INDEX to generate matching instances of BUILD_VECTOR. This patch starts small, only detecting sequences of the form <a, a+n, a+2n, a+3n, ...> where a and n are ConstantSDNodes. Differential Revision: https://reviews.llvm.org/D125194	2022-07-26 15:28:37 +00:00
wangpc	1a7078d106	[DAGCombine] Mask doesn't have to be (EltSize - 1) exactly when combining rotation I think what we need is the least Log2(EltSize) significant bits are known to be ones. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130251	2022-07-26 21:14:45 +08:00
Sven van Haastregt	c8d91b07bb	Reassoc FMF should not optimize FMA(a, 0, b) to (b) Optimizing (a * 0 + b) to (b) requires assuming that a is finite and not NaN. DAGCombiner will do this optimization when the reassoc fast math flag is set, which is not correct. Change DAGCombiner to only consider UnsafeMath for this optimization. Differential Revision: https://reviews.llvm.org/D130232 Co-authored-by: Andrea Faulds <andrea.faulds@arm.com>	2022-07-26 09:39:12 +01:00
Kazu Hirata	3f3930a451	Remove redundaunt virtual specifiers (NFC) Identified with tidy-modernize-use-override.	2022-07-25 23:00:59 -07:00
jacquesguan	cb370cf413	[DAGCombiner] Teach scalarizeExtractedBinop to support scalable splat. This patch supports the scalable splat part for scalarizeExtractedBinop. Differential Revision: https://reviews.llvm.org/D129725	2022-07-26 09:31:45 +08:00
Simon Pilgrim	562ee7cc5f	[DAG] visitSMUL_LOHI/visitUMUL_LOHI - ensure we canonicalize constants to the RHS	2022-07-24 16:09:56 +01:00
Simon Pilgrim	428c0f2adc	[DAG] getNode - assert that SMUL_LOHI/UMUL_LOHI nodes have the correct ops + types	2022-07-24 15:30:57 +01:00
Simon Pilgrim	0708771cce	[DAG] MaskedVectorIsZero - don't bother with (-1).isSubsetOf mask check. NFC. Just use KnownBits::isZero() to ensure all the bits are known zero.	2022-07-24 13:12:21 +01:00
Simon Pilgrim	e82d49bfed	[DAG] SimplifyMultipleUseDemandedBits - early-out for any scalable vector types Noticed while working to remove SelectionDAG::GetDemandedBits - we were relying on the callers to have already bailed for scalable vectors	2022-07-24 12:59:43 +01:00
Simon Pilgrim	a3e38b4a20	[DAG] SimplifyDemandedVectorElts - if every and/mul element-pair has a zero/undef then just constant fold to zero	2022-07-24 12:00:31 +01:00
Simon Pilgrim	ac8be21365	[DAG] isSplatValue - don't attempt to merge any BITCAST sub elements if they contain UNDEFs We still haven't found a solution that correctly handles 'don't care' sub elements properly - given how close it is to the next release branch, I'm making this fail safe change and we can revisit this later if we can't find alternatives. NOTE: This isn't a reversion of D128570 - it's the removal of undef handling across bitcasts entirely Fixes #56520	2022-07-23 18:38:48 +01:00
Simon Pilgrim	5f89d2bae9	[DAG] Move OR(AND(X,C1),AND(OR(X,Y),C2)) -> OR(AND(X,OR(C1,C2)),AND(Y,C2)) fold to SimplifyDemandedBits This will fix the SystemZ v3i31 memcpy regression in D77804 (with the help of D129765 as well....). It should also allow us to /bend/ the oneuse limitation for cases where we can use demanded bits to safely peek though multiple uses of the AND ops.	2022-07-23 13:17:24 +01:00
Simon Pilgrim	6aff1b7b3c	[DAG] SimplifyDemandedBits - pull out repeated getValueType() calls. NFC.	2022-07-23 12:01:54 +01:00
Simon Pilgrim	2421a5af72	[DAG] ExpandIntRes_ADDSUB - create UADDO/USUBO instead of ADDCARRY/SUBCARRY if overflow is known to be zero As noticed on D127115, when splitting ADD/SUB nodes we often end up with cases where overflow from the lower bits is impossible - in such cases we're better off breaking the carry chain dependency as soon as possible. This path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen diff without a topological worklist.	2022-07-23 11:13:44 +01:00
Simon Pilgrim	8937252465	[DAG] computeKnownBits - add basic shift-by-parts handling Concat KnownBits from ISD::SHL_PARTS / ISD::SRA_PARTS / ISD::SRL_PARTS lo/hi operands and perform the KnownBits calculation by the shift amount on the extended type, before splitting the KnownBits based on the requested lo/hi result.	2022-07-23 09:46:30 +01:00
Craig Topper	be208b40c1	[DAGCombiner] Simplify code around call to reduceLoadWidth in visitAND. NFC We were looking for loads or any_extend+load. reduceLoadWidth hasn't known how to look through such an any_extend to find the load since D40667 almost 5 years ago. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130333	2022-07-22 08:36:56 -07:00
Cullen Rhodes	bf268a05cd	[AArch64] Emit vector FP cmp when LE is used with fast-math Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130093	2022-07-22 07:53:55 +00:00
jacquesguan	e60eb7053d	recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat." With fix for AArch64 and Hexgon test cases.	2022-07-21 17:34:34 +08:00
David Green	23d6186be0	[SelectionDAG] Fix fptoi.sat scalable vector lowering Vector fptosi_sat and fptoui_sat were being expanded by unrolling the vector operation. This doesn't work for scalable vector, so this patch adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable. Scalable tests are added for AArch64 and RISCV. Some of the AArch64 fptoi_sat operations should be legal, but that will be handled in another patch. Differential Revision: https://reviews.llvm.org/D130028	2022-07-21 08:00:22 +01:00
Simon Pilgrim	029e83b401	[DAG] getNode - don't bother creating ADDO(X,0) or SUBO(X,0) nodes. Similar to what we already do in getNode for basic ADD/SUB nodes, return the X operand directly, but here we know that there will be no/zero overflow as well. As noted on D127115 - this path is being exercised by llvm/test/CodeGen/ARM/dsp-mlal.ll, although I haven't been able to get any codegen without a topological worklist.	2022-07-20 12:04:33 +01:00
Simon Pilgrim	766cd95481	[DAG] getNode - assert that ADDO/SUBO nodes have the correct ops + types	2022-07-20 11:23:58 +01:00
Simon Pilgrim	9fc347aa4e	[DAG] PromoteIntRes_BUILD_VECTOR - extend constant boolean vectors according to target BooleanContents PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target. This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notable XOR -> NOT conversions) to occur. Differential Revision: https://reviews.llvm.org/D129641	2022-07-20 10:49:31 +01:00
Lorenzo Albano	07d69d9fc9	[VP] Legalize the stride operand for EXPERIMENTAL_VP_STRIDED SDNodes Add promotion and expansion of integer operands for experimental_vp_strided SelectionDAG nodes; the expansion is actually just a truncation of the stride operand. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D123112	2022-07-20 10:22:43 +02:00
David Truby	4c82f56d8f	[llvm][SVE] Remove redundant and when comparing against extending load When determining if an `and` should be merged into an extending load the constant argument to the `and` is currently not checked if the argument requires truncation. This prevents the combine happening when the vector width is half the normal available vector width for SVE VLA vectors. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D129281	2022-07-19 17:08:32 +01:00
Simon Pilgrim	71c502cbca	[DAG] Call SimplifyDemandedBits from ISD::MUL nodes Noticed while triaging D129765.	2022-07-19 14:11:04 +01:00
Benjamin Kramer	8aff88fd3a	[LegalizeDAG] Propagate alignment in ExpandExtractFromVectorThroughStack Unlike the name suggests this can reuse any store as a base for a memory-based vector extract. If that store is underaligned the loads created to extract will have an invalid alignment. Since most CPUs are forgiving wrt alignment this is almost never an issue, on x86 this is only reproducible by extracting a 128 bit vector out of a wider vector. I tried making a test case in the context of https://reviews.llvm.org/D127982 but it's really really fragile, as the output pretty much looks like a missed optimization.	2022-07-19 13:13:55 +02:00
Simon Pilgrim	0f6b0461b0	[DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits. This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115 Alive2: https://alive2.llvm.org/ce/z/fl7T7K Differential Revision: https://reviews.llvm.org/D129933	2022-07-19 10:59:07 +01:00
Max Kazantsev	69b284aaf6	Revert "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat." This reverts commit `58dfaaaace`. Massive AARCH test failures in buildbot.	2022-07-19 13:41:52 +07:00
jacquesguan	58dfaaaace	[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat. This revision supports to scalarize a binary operation of two scalable splat vectors. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122791	2022-07-19 11:20:51 +08:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Itay Bookstein	2570f226d1	[SDAG] Remove single-result restriction on commutative CSE The DAG Combiner unnecessarily restricts commutative CSE to nodes with a single result value. This commit removes that restriction. Signed-off-by: Itay Bookstein <ibookstein@gmail.com> Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D129666	2022-07-18 19:19:13 +03:00
Nikita Popov	56b4b6e81b	[SDAG] Fix release build This variable was only declared in debug builds, but is needed in release builds as well.	2022-07-18 14:10:31 +02:00
Max Kazantsev	d693fd29f1	[Verifier] Make Verifier recognize undef tokens as correct IR Undef tokens may appear in unreached code as result of RAUW of some optimization, and it should not be considered as bad IR. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D128904 Reviewed By: mkazantsev	2022-07-18 16:26:06 +07:00
Craig Topper	7fa1c32634	[CodeGen] Remove unnecessary APInt copy. NFC	2022-07-17 23:41:53 -07:00
Craig Topper	a55ff6aadd	[Support][CodeGen] Fix spelling Divison->Division. NFC	2022-07-17 23:16:29 -07:00
Craig Topper	795602af0c	[CodeGen] Don't compare bool with integer 0. NFC The IsAdd field is a bool.	2022-07-17 23:16:14 -07:00
Simon Pilgrim	53b90dd372	[DAG] Fold (or (and X, C1), (and (or X, Y), C2)) -> (or (and X, C1\|C2), (and Y, C2)) Pulled out of D77804 Alive2: https://alive2.llvm.org/ce/z/g61VRe	2022-07-17 18:51:41 +01:00
Simon Pilgrim	26ce33706f	[DAG] computeKnownBits - move UDIV handling to same place as UREM/SREM. NFC.	2022-07-17 11:59:42 +01:00
Simon Pilgrim	5ec47c6dc5	[DAG] Add MERGE_VALUE computeKnownBits/ComputeNumSignBits handling. Just forward the value tracking to the operand specified by the ResNo	2022-07-17 11:58:08 +01:00
Kazu Hirata	9e6d1f4b5d	[CodeGen] Qualify auto variables in for loops (NFC)	2022-07-17 01:33:28 -07:00
Sanjay Patel	7ca3e23f25	[SDAG] narrow truncated sign_extend_inreg trunc (sign_ext_inreg X, iM) to iN --> sign_ext_inreg (trunc X to iN), iM There are improvements on existing tests from this, and there are a pair of large regressions in D127115 for Thumb2 caused by not folding this pattern. Differential Revision: https://reviews.llvm.org/D129890	2022-07-16 16:29:15 -04:00
Simon Pilgrim	a44bdf9bc1	[DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR creation from INSERT_VECTOR_ELT chain. D127595 added the ability to recurse up a (one-use) INSERT_VECTOR_ELT chain to create a BUILD_VECTOR before other combines manage to break the chain, something that is particularly bad in D127115. The patch generalises this so it doesn't have to build the chain starting from the last element insertion, instead it can now start from any insertion and will recurse up the chain until it finds all elements or finds a UNDEF/BUILD_VECTOR/SCALAR_TO_VECTOR which represents that start of the chain. Fixes several regressions in D127115	2022-07-16 16:37:31 +01:00
Simon Pilgrim	52b6168c16	[DAG] visitINSERT_VECTOR_ELT - remove duplicate VT.getVectorNumElements() call. NFC.	2022-07-16 16:20:49 +01:00
Simon Pilgrim	2bb6b03d71	Fix signed/unsigned mismatch	2022-07-16 11:48:41 +01:00
Simon Pilgrim	a5d0122f75	[DAG] Canonicalize non-inlane shuffle -> AND if all non-inlane referenced elements are known zero As mentioned on D127115, this patch that attempts to recognise shuffle masks that could be simplified to a AND mask - we already have a similar transform that will fold AND -> 'clear mask' shuffle, but this patch handles cases where the referenced elements are not from the same lane indices but are known to be zero. Differential Revision: https://reviews.llvm.org/D129150	2022-07-16 11:38:24 +01:00
Simon Pilgrim	1cb7416ee3	[DAG] combineShiftAnd1ToBitTest - match "and (srl (not X), C)), 1 --> (and X, 1<<C) == 0" patterns combineShiftAnd1ToBitTest already matches "and (not (srl X, C)), 1 --> (and X, 1<<C) == 0" patterns, but we can end up with situations where the not is before the shift. Part of some yak shaving for D127115 to generalise the "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold.	2022-07-16 11:00:07 +01:00
Kazu Hirata	1a5d007659	Use has_value/value instead of hasValue/getValue (NFC)	2022-07-15 21:48:17 -07:00
Simon Pilgrim	3c8bf29696	[DAG] Move "xor (X logical_shift ShiftC), XorC --> (not X) logical_shift ShiftC" fold into SimplifyDemandedBits SimplifyDemandedBits is called slightly later which allows the not(sext(x)) -> sext(not(x)) fold to occur via foldLogicOfShifts As mentioned on D127115, we should be able to further generalise this based off the demanded bits.	2022-07-15 13:10:15 +01:00
Edd Barrett	2e62a26fd7	[stackmaps] Legalise patchpoint arguments. This is similar to D125680, but for llvm.experimental.patchpoint (instead of llvm.experimental.stackmap). Differential review: https://reviews.llvm.org/D129268	2022-07-15 12:01:59 +01:00
Nikita Popov	2a721374ae	[IR] Don't use blockaddresses as callbr arguments Following some recent discussions, this changes the representation of callbrs in IR. The current blockaddress arguments are replaced with `!` label constraints that refer directly to callbr indirect destinations: ; Before: %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo)) to label %asm.fallthrough [label %foo] ; After: %res = callbr i8* asm "", "=r,r,!i"(i8* %x) to label %asm.fallthrough [label %foo] The benefit of this is that we can easily update the successors of a callbr, without having to worry about also updating blockaddress references. This should allow us to remove some limitations: * Allow unrolling/peeling/rotation of callbr, or any other clone-based optimizations (https://github.com/llvm/llvm-project/issues/41834) * Allow duplicate successors (https://github.com/llvm/llvm-project/issues/45248) This is just the IR representation change though, I will follow up with patches to remove limtations in various transformation passes that are no longer needed. Differential Revision: https://reviews.llvm.org/D129288	2022-07-15 10:18:17 +02:00
Craig Topper	dcfc1fd26f	[SelectionDAG][RISCV][AMDGPU][ARM] Improve SimplifyDemandedBits for SHL with variable shift amount. If we have a variable shift amount and the demanded mask has leading zeros, we can propagate those leading zeros to not demand those bits from operand 0. This can allow zero_extend/sign_extend to become any_extend. This pattern can occur due to C integer promotion rules. This transform is already done by InstCombineSimplifyDemanded.cpp where sign_extend can be turned into zero_extend for example. Reviewed By: spatel, foad Differential Revision: https://reviews.llvm.org/D121833	2022-07-14 16:10:14 -07:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
Philip Reames	dde2a7fb6d	[RISCV] Exploit fact that vscale is always power of two to replace urem sequence When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale. vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.) We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, the must be a power of two numbers of blocks. (For everything other than VLEN<=32, but that's already broken.) It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic. Differential Revision: https://reviews.llvm.org/D129609	2022-07-13 10:54:47 -07:00
Simon Pilgrim	d172842b51	[DAG] SimplifyDemandedVectorElts - adjust demanded elements for selection mask for known zero results If an element is known zero from both selections then it shouldn't matter what the selection mask element is.	2022-07-13 17:36:05 +01:00
Philip Reames	fd67992f9c	[DAGCombine] fold (urem x, (lshr pow2, y)) -> (and x, (add (lshr pow2, y), -1)) We have the same fold in InstCombine - though implemented via OrZero flag on isKnownToBePowerOfTwo. The reasoning here is that either a) the result of the lshr is a power-of-two, or b) we have a div-by-zero triggering UB which we can ignore. Differential Revision: https://reviews.llvm.org/D129606	2022-07-13 08:34:38 -07:00
Craig Topper	8eaf00e04d	[TargetLowering][RISCV] Make expandCTLZ work for non-power of 2 types. To convert CTLZ to popcount we do x = x \| (x >> 1); x = x \| (x >> 2); ... x = x \| (x >>16); x = x \| (x >>32); // for 64-bit input return popcount(~x); This smears the most significant set bit across all of the bits below it then inverts the remaining 0s and does a population count. To support non-power of 2 types, the last shift amount must be more than half of the size of the type. For i15, the last shift was previously a shift by 4, with this patch we add another shift of 8. Fixes PR56457. Differential Revision: https://reviews.llvm.org/D129431	2022-07-12 11:36:37 -07:00
Simon Pilgrim	ded62411f7	[DAG] SimplifyDemandedBits - AND/OR/XOR - attempt basic knownbits simplifications before calling SimplifyMultipleUseDemandedBits Noticed while investigating the SystemZ regressions in D77804, prefer handling the knownbits analysis/simplification in the bitop nodes directly before falling back to SimplifyMultipleUseDemandedBits	2022-07-12 14:09:00 +01:00
Nikita Popov	c64aba5d93	[SDAG] Don't duplicate ParseConstraints() implementation SDAGBuilder (NFCI) visitInlineAsm() in SDAGBuilder was duplicating a lot of the code in ParseConstraints(), in particular all the logic to determine the operand value and constraint VT. Rely on the data computed by ParseConstraints() instead, and update its ConstraintVT implementation to match getCallOperandValEVT() more precisely.	2022-07-12 10:42:02 +02:00
Craig Topper	b05160dbdf	[SelectionDAG] Simplify how we drop poison flags in SimplifyDemandedBits. As far as I can tell what was happening in the original code is that the getNode call receives the same operands as the original node with different SDNodeFlags. The logic inside getNode detects that the node already exists and intersects the flags into the existing node and returns it. This results in Op and NewOp for the TLO.CombineTo call always being the same node. We may have already called CombineTo as part of the recursive handling. A second call to CombineTo as we unwind the recursion overwrites the previous CombineTo. I think this means any time we updated the poison flags that was the only change that ends up getting made and we relied on DAGCombiner to revisit and call SimplifyDemandedBits again. The second time the poison flags wouldn't need to be dropped and we would keep the CombineTo call from further down the recursion. We can instead call setFlags to drop the poison flags and remove the call to TLO.CombineTo. This way we keep the CombineTo from deeper in the recursion which should be more efficient. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D129511	2022-07-11 13:42:33 -07:00
Sanjay Patel	d0eec5f7e7	[SDAG] enhance sub->xor fold to ignore signbit As suggested in the post-commit feedback for D128123, we can ease the mask constraint to ignore the MSB (and make the code easier to read by adjusting the check). https://alive2.llvm.org/ce/z/bbvqWv	2022-07-11 12:37:50 -04:00
Kazu Hirata	1fd6611fc8	[SelectionDAG] Restore calls to has_value (NFC) This patch restores calls to has_value to make it clear that we are checking the presence of an optional value, not the underlying value. This patch partially reverts `d08f34b592`. Differential Revision: https://reviews.llvm.org/D129454	2022-07-10 14:37:23 -07:00
Nicolai Hähnle	ede600377c	ManagedStatic: remove many straightforward uses in llvm (Reapply after revert in `e9ce1a5880` due to Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other than error categories, to be checked in more detail and reapplied separately.) Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it. Differential Revision: https://reviews.llvm.org/D129120	2022-07-10 10:29:15 +02:00
Nicolai Hähnle	e9ce1a5880	Revert "ManagedStatic: remove many straightforward uses in llvm" This reverts commit `e6f1f06245`. Reverting due to a failure on the fuchsia-x86_64-linux buildbot.	2022-07-10 09:54:30 +02:00
Nicolai Hähnle	e6f1f06245	ManagedStatic: remove many straightforward uses in llvm Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it. Differential Revision: https://reviews.llvm.org/D129120	2022-07-10 09:15:08 +02:00
Craig Topper	40866b74bd	[DAGCombiner][X86] Fold sra (sub AddC, (shl X, N1C)), N1C --> sext (sub AddC1',(trunc X to (width - N1C))) We already handled this case for add with a constant RHS. A similar pattern can occur for sub with a constant left hand side. Test cases use add and a mul representing (neg (shl X, C)) because that's what I saw in the wild. The mul will be decomposed and then the new transform can kick in. Tests have not been committed, but this patch shows the changes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128769	2022-07-09 11:53:44 -07:00
Simon Pilgrim	b53046122f	[DAG] SimplifyDemandedBits - fold AND(INSERT_SUBVECTOR(C,X,I),M) -> INSERT_SUBVECTOR(AND(C,M),X,I) If all the demanded bits of the AND mask covering the inserted subvector 'X' are known to be one, then the mask isn't affecting the subvector at all. In which case, if the base vector 'C' is undef/constant, then move the AND mask up to just (constant) fold it directly. Addresses some of the regressions from D129150, particularly the cases where we're attempting to zero the upper elements of a widened vector. Differential Revision: https://reviews.llvm.org/D129290	2022-07-08 16:08:31 +01:00
Sanjay Patel	8b75671314	[SDAG] try to replace subtract-from-constant with xor This is almost the same as the abandoned D48529, but it allows splat vector constants too. This replaces the x86-specific code that was added with the alternate patch D48557 with the original generic combine. This transform is a less restricted form of an existing InstCombine and the proposed SDAG equivalent for that in D128080: https://alive2.llvm.org/ce/z/OUm6N_ Differential Revision: https://reviews.llvm.org/D128123	2022-07-08 08:14:24 -04:00
OCHyams	6b62ca9043	[NFC][SelectionDAG] Fix debug prints in salvageUnresolvedDbgValue The prints are printing pointer values - fix by dereferencing the pointers.	2022-07-08 12:09:30 +01:00
Sergei Barannikov	2247fdc84d	[SelectionDAG] computeKnownBits / ComputeNumSignBits for the remaining overflow-aware nodes Some overflow-aware nodes were missing from the switches in computeKnownBits and ComputeNumSignBits.	2022-07-08 09:19:19 +01:00
Bradley Smith	60d6be5dd3	[LegalizeTypes] Replace vecreduce_xor/or/and with vecreduce_add/umax/umin if not legal This is done during type legalization since the target representation of these nodes may not be valid until after type legalization, and after type legalization the fact that these are dealing with i1 types may be lost. Differential Revision: https://reviews.llvm.org/D128996	2022-07-07 09:33:54 +00:00
Sander de Smalen	15c3ba8a44	[AArc64] Legalisation of compares and truncates of nxv1i1 types. Truncates and compares require some changes to generic legalisation functions to use ElementCount instead of getNumElements. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D129082	2022-07-07 07:39:27 +00:00
Edd Barrett	ed8ef65f3d	[stackmaps] Start legalizing live variable operands Prior to this change, live variable operands passed to `llvm.experimental.stackmap` would be emitted directly to target nodes, meaning that they don't get legalised. The upshot of this is that LLVM may crash when encountering illegally typed target nodes. e.g. https://github.com/llvm/llvm-project/issues/21657 This change introduces a platform independent stackmap DAG node whose operands are legalised as per usual, thus avoiding aforementioned crashes. Note that some kinds of argument are still not handled properly, namely vectors, structs, and large integers, like i128s. These will need to be addressed in follow-up changes. Note also that this does not change the behaviour of `llvm.experimental.patchpoint`. A follow up change will do the same for this intrinsic. Differential review: https://reviews.llvm.org/D125680	2022-07-06 14:01:54 +01:00
Shilei Tian	1023ddaf77	[LLVM] Add the support for fmax and fmin in atomicrmw instruction This patch adds the support for `fmax` and `fmin` operations in `atomicrmw` instruction. For now (at least in this patch), the instruction will be expanded to CAS loop. There are already a couple of targets supporting the feature. I'll create another patch(es) to enable them accordingly. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127041	2022-07-06 10:57:53 -04:00
Nikita Popov	bb84e5eeff	[SelectionDAGISel] Drop unused variable (NFC)	2022-07-06 10:46:13 +02:00
Nikita Popov	8ee913d83b	[IR] Remove Constant::canTrap() (NFC) As integer div/rem constant expressions are no longer supported, constants can no longer trap and are always safe to speculate. Remove the Constant::canTrap() method and its usages.	2022-07-06 10:36:47 +02:00
Simon Pilgrim	7068c843d2	[DAG] visitREM - use isAllOnesOrAllOnesSplat instead of isConstOrConstSplat We were only using the N1C scalar/splat value once, so for clarity use isAllOnesOrAllOnesSplat instead if we actually need it.	2022-07-05 16:44:31 +01:00
Simon Pilgrim	e7a0fa4df0	[DAG] foldAddSubOfSignBit - don't bother creating the new shift node unless constant folding succeeds Noticed by inspection - the new shift is only ever used if the constant fold occurs	2022-07-05 16:44:31 +01:00
Simon Pilgrim	cce64e7a9c	[DAG] visitTRUNCATE - move GetDemandedBits AFTER SimplifyDemandedBits. Another cleanup step before removing GetDemandedBits entirely.	2022-07-04 11:25:40 +01:00
Nikita Popov	7283f48a05	[IR] Remove support for insertvalue constant expression This removes the insertvalue constant expression, as part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179. This is very similar to the extractvalue removal from D125795. insertvalue is also not supported in bitcode, so no auto-ugprade is necessary. ConstantExpr::getInsertValue() can be replaced with IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(), depending on whether a constant result is required (with the latter being fallible). The ConstantExpr::hasIndices() and ConstantExpr::getIndices() methods also go away here, because there are no longer any constant expressions with indices. Differential Revision: https://reviews.llvm.org/D128719	2022-07-04 09:27:22 +02:00
Sander de Smalen	690db16422	[AArch64] Make nxv1i1 types a legal type for SVE. One motivation to add support for these types are the LD1Q/ST1Q instructions in SME, for which we have defined a number of load/store intrinsics which at the moment still take a `<vscale x 16 x i1>` predicate regardless of their element type. This patch adds basic support for the nxv1i1 type such that it can be passed/returned from functions, as well as some basic support to support some existing tests that result in a nxv1i1 type. It also adds support for splats. Other operations (e.g. insert/extract subvector, logical ops, etc) will be supported in follow-up patches. Reviewed By: paulwalker-arm, efriedma Differential Revision: https://reviews.llvm.org/D128665	2022-07-01 15:11:13 +00:00
Xiang1 Zhang	72a23cef7e	[ISel] Match all bits when merge undefs for DAG combine Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128570	2022-07-01 09:09:43 +08:00
Xiang1 Zhang	64f44a90ef	Revert "[ISel] Match all bits when merge undef(s) for DAG combine" This reverts commit `5fe5aa284e`.	2022-07-01 08:59:04 +08:00
Xiang1 Zhang	5fe5aa284e	[ISel] Match all bits when merge undef(s) for DAG combine	2022-07-01 08:58:00 +08:00
jeff	09424f802c	[AMDGPU] Check for CopyToReg PhysReg clobbers in pre-RA-sched Differential Revision: https://reviews.llvm.org/D128681	2022-06-30 09:18:04 -07:00
Nikita Popov	16033ffdd9	[ConstExpr] Remove more leftovers of extractvalue expression (NFC) Remove some leftover bits of extractvalue handling after the removal in D125795.	2022-06-29 10:45:19 +02:00
Tim Northover	4aafebce52	SelectionDAG: allow FP extensions when folding extract/insert. Before, we were trying to sign extend half -> float, and asserted in getNode.	2022-06-28 12:08:35 +01:00
Guillaume Chatelet	3c126d5fe4	[Alignment] Replace commonAlignment with std::min `commonAlignment` is a shortcut to pick the smallest of two `Align` objects. As-is it doesn't bring much value compared to `std::min`. Differential Revision: https://reviews.llvm.org/D128345	2022-06-28 07:15:02 +00:00
Bradley Smith	a83aa33d1b	[IR] Move vector.insert/vector.extract out of experimental namespace These intrinsics are now fundemental for SVE code generation and have been present for a year and a half, hence move them out of the experimental namespace. Differential Revision: https://reviews.llvm.org/D127976	2022-06-27 10:48:45 +00:00
Kazu Hirata	94460f5136	Don't use Optional::hasValue (NFC) This patch replaces x.hasValue() with x where x is contextually convertible to bool.	2022-06-26 19:54:41 -07:00
Kazu Hirata	d08f34b592	[llvm] Don't use Optional::hasValue (NFC) This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.	2022-06-26 18:31:51 -07:00
Kazu Hirata	a7938c74f1	[llvm] Don't use Optional::hasValue (NFC) This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.	2022-06-25 21:42:52 -07:00
Kazu Hirata	3b7c3a654c	Revert "Don't use Optional::hasValue (NFC)" This reverts commit `aa8feeefd3`.	2022-06-25 11:56:50 -07:00
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
chenglin.bi	8c74205642	[SelectionDAG][DAGCombiner] Reuse exist node by reassociate When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122539	2022-06-24 23:15:06 +08:00
Nabeel Omer	0d41794335	[SLP] Add cost model for `llvm.powi.` intrinsics (REAPPLIED) Patch was reverted in `4c5f10a` due to buildbot failures, now being reapplied with updated AArch64 and RISCV tests. This patch adds handling for the llvm.powi. intrinsics in BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization. Closes #53887. Differential Revision: https://reviews.llvm.org/D128172	2022-06-24 10:23:19 +00:00
Lian Wang	1ce30457c1	[LegalizeTypes][NFC] Add an assert to WidenVecRes_EXTRACT_SUBVECTOR and adjust some code Reviewed By: craig.topper, david-arm Differential Revision: https://reviews.llvm.org/D128038	2022-06-24 03:06:16 +00:00
Lian Wang	770fe864fe	[SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128239	2022-06-24 02:32:53 +00:00
Craig Topper	8b10ffabae	[RISCV] Disable <vscale x 1 x > types with Zve32x or Zve32f. According to the vector spec, mf8 is not supported for i8 if ELEN is 32. Similarily mf4 is not suported for i16/f16 or mf2 for i32/f32. Since RVVBitsPerBlock is 64 and LMUL is calculated as ((MinNumElements ElementSize) / RVVBitsPerBlock) this means we need to disable any type with MinNumElements==1. For generic IR, these types will now be widened in type legalization. For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan to work on disabling the intrinsics in the riscv_vector.h header. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D128286	2022-06-23 08:49:18 -07:00
chenglin.bi	9c2bf534f5	Revert "[SelectionDAG][DAGCombiner] Reuse exist node by reassociate" This reverts commit `6c951c5ee6`.	2022-06-23 13:21:51 +08:00
Guillaume Chatelet	57ffff6db0	Revert "[NFC] Remove dead code" This reverts commit `8ba2cbff70`.	2022-06-22 14:55:47 +00:00
Guillaume Chatelet	8ba2cbff70	[NFC] Remove dead code	2022-06-22 13:33:58 +00:00
Simon Pilgrim	2c3a4a9334	[DAG] SelectionDAG::GetDemandedBits - don't recurse back into GetDemandedBits Another minor cleanup as we work toward removing GetDemandedBits entirely - call SimplifyMultipleUseDemandedBits directly.	2022-06-22 13:48:57 +01:00
Simon Pilgrim	1c2b756cd6	[DAG] visitTRUNCATE - move TRUNCATE(ADDE/ADDCARRY) folds to switch statement handling the other binops. NFC.	2022-06-21 22:07:41 +01:00
Simon Pilgrim	8cecb6be56	[DAG] Remove SelectionDAG::GetDemandedBits DemandedElts variant. NFC. We're slowly removing SelectionDAG::GetDemandedBits and replacing it with SimplifyMultipleUseDemandedBits, we no longer have any uses for the vector demanded elt variant.	2022-06-21 21:23:10 +01:00
Nabeel Omer	4c5f10aeeb	Revert rGe6ccb57bb3f6b761f2310e97fd6ca99eff42f73e "[SLP] Add cost model for `llvm.powi.*` intrinsics" This reverts commit `e6ccb57bb3`.	2022-06-21 15:05:55 +00:00
Nabeel Omer	e6ccb57bb3	[SLP] Add cost model for `llvm.powi.` intrinsics This patch adds handling for the llvm.powi. intrinsics in BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization. Closes #53887. Differential Revision: https://reviews.llvm.org/D128172	2022-06-21 14:40:34 +00:00
Kazu Hirata	7a47ee51a1	[llvm] Don't use Optional::getValue (NFC)	2022-06-20 22:45:45 -07:00
Kazu Hirata	d66cbc565a	Don't use Optional::hasValue (NFC)	2022-06-20 20:26:05 -07:00
Kazu Hirata	0916d96d12	Don't use Optional::hasValue (NFC)	2022-06-20 20:17:57 -07:00
chenglin.bi	6c951c5ee6	[SelectionDAG][DAGCombiner] Reuse exist node by reassociate When already have (op N0, N2), reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the exist (op N0, N2) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122539	2022-06-21 09:45:19 +08:00
David Green	c0ecbfa4fd	[AArch64] Known bits for AArch64ISD::DUP An AArch64ISD::DUP is just a splat, where the known bits for each lane are the same as the input. This teaches that to computeKnownBitsForTargetNode. Problems arise for constants though, as a constant BUILD_VECTOR can be lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then turn back into a constant BUILD_VECTOR leading to an infinite cycle. This has been prevented by adding a isTargetCanonicalConstantNode node to prevent the conversion back into a BUILD_VECTOR. Differential Revision: https://reviews.llvm.org/D128144	2022-06-20 19:11:57 +01:00
Kazu Hirata	e0e687a615	[llvm] Don't use Optional::hasValue (NFC)	2022-06-20 10:38:12 -07:00
Guillaume Chatelet	03036061c7	[Alignment] Use 'previous()' method instead of scalar division This is in preparation of integration with D128052. Differential Revision: https://reviews.llvm.org/D128169	2022-06-20 11:01:43 +00:00
Simon Pilgrim	e4a124dda5	[DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m) Similar to the existing (shl (srl x, c1), c2) fold Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125836	2022-06-20 08:37:38 +01:00
Lian Wang	ab25e263a9	[SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D127710	2022-06-20 06:30:26 +00:00
Craig Topper	314dbde12c	[DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore. The VT we want to shrink to may not be legal especially after type legalization. Fixes PR56110. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128135	2022-06-19 15:50:15 -07:00
Simon Pilgrim	ba3f2667b6	[DAG] Add MaskedVectorIsZero helper Equivalent to MaskedValueIsZero, except its checking if all of the demanded vectors elements are known to be zero	2022-06-19 17:56:30 +01:00
Simon Pilgrim	1ebe5cac46	[DAG] SimplifyDemandedBits - add DemandedElts handling to ISD::SIGN_EXTEND_INREG simplification	2022-06-19 15:35:29 +01:00
Simon Pilgrim	db1be696c4	[DAG] SimplifyDemandedBits - add ISD::VSELECT handling	2022-06-19 15:18:25 +01:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
Paul Walker	0e21f1d56a	[SelectionDAG] Extend WidenVecOp_INSERT_SUBVECTOR to cover more cases. WidenVecOp_INSERT_SUBVECTOR only supported cases where widening effectively converts the insert into a copy. However, when the widened subvector is no bigger than the vector being inserted into and we can be sure there's no loss of data, we can simply emit another INSERT_SUBVECTOR. Fixes: #54982 Differential Revision: https://reviews.llvm.org/D127508	2022-06-17 12:39:42 +00:00
Lian Wang	f2bcf33058	[LegalizeTypes][NFC] Merge promote SPLAT_VECTOR and promote SCALAR_TO_VECTOR to one function Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D127825	2022-06-17 02:43:52 +00:00
Lian Wang	16215eb979	[LegalizeTypes][RISCV][NFC] Modify assert in PromoteIntRes_STEP_VECTOR and add some tests for RISCV Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D127939	2022-06-17 02:26:09 +00:00
Craig Topper	e6c7a3a54f	[SelectionDAG] Don't apply MinRCSize constraint in InstrEmitter::AddRegisterOperand for IMPLICIT_DEF sources. MinRCSize is 4 and prevents constrainRegClass from changing the register class if the new class has size less than 4. IMPLICIT_DEF gets a unique vreg for each use and will be removed by the ProcessImplicitDef pass before register allocation. I don't think there is any reason to prevent constraining the virtual register to whatever register class the use needs. The attached test case was previously creating a copy of IMPLICIT_DEF because vrm8nov0 has 3 registers in it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128005	2022-06-16 14:55:14 -07:00
Adrian Tong	55311801f0	Allow bitwidth difference when checking for isOneOrOneSplat. This helps handling a case where the BUILD_VECTOR has i16 element type and i32 constant operands t2: v8i16 = setcc t8, t17, setult:ch t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ... t4: v8i16 = and t2, t3 t5: v8i16 = add t8, t4 This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove t3 and t4 from the DAG. Differential Revision: https://reviews.llvm.org/D127354	2022-06-16 16:04:20 +00:00
Benjamin Kramer	8c4a07c61f	[DAGCombiner] Fold fold (fp_to_bf16 (bf16_to_fp op)) -> op	2022-06-15 19:54:39 +02:00
Benjamin Kramer	ca50cb120b	[SelectionDAG] Constant fold FP_TO_BF16 and BF16_TO_FP.	2022-06-15 18:51:32 +02:00
Paul Robinson	654a835c3f	[PS5] Trap after noreturn calls, with special case for stack-check-fail	2022-06-15 09:02:17 -07:00
Benjamin Kramer	fb34d531af	Promote bf16 to f32 when the target doesn't support it This is modeled after the half-precision fp support. Two new nodes are introduced for casting from and to bf16. Since casting from bf16 is a simple operation I opted to always directly lower it to integer arithmetic. The other way round is more complicated if you want to preserve IEEE semantics, so it's handled by a new __truncsfbf2 compiler-rt builtin. This is of course very bare bones, but sufficient to get a semi-softened fadd on x86. Possible future improvements: - Targets with bf16 conversion instructions can now make fp_to_bf16 legal - The software conversion to bf16 can be replaced by a trivial implementation under fast math. Differential Revision: https://reviews.llvm.org/D126953	2022-06-15 12:56:31 +02:00
Simon Pilgrim	f096d5926d	[DAG] Fix SDLoc mismatch in (shl (srl x, c1), c2) -> and(shift(x,c3)) fold Noticed by @craig.topper on D125836 which uses a tweaked copy of the same code. Differential Revision: https://reviews.llvm.org/D127772	2022-06-15 11:07:59 +01:00
Ping Deng	c06f77ec0d	[SelectionDAG] fold 'Op0 - (X * MulC)' to 'Op0 + (X << log2(-MulC))' Reviewed By: craig.topper, spatel Differential Revision: https://reviews.llvm.org/D127474	2022-06-15 05:50:18 +00:00
Guillaume Chatelet	5a293d21fc	[NFC][Alignment] Use getAlign in SelectionDAGBuilder	2022-06-13 15:13:05 +00:00
Nikita Popov	b9a7dea917	[SelectionDAG] Handle trapping aggregate (PR49839) Call canTrap() on Constant to account for trapping ConstantAggregate.	2022-06-13 15:06:53 +02:00
Simon Pilgrim	7d8fd4f5db	[DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere Another issue unearthed by D127115 We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with). D127115 makes this particularly difficult as we're almost guaranteed to have the lost the sequence before all possible insertions have been folded. This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before its too late. Differential Revision: https://reviews.llvm.org/D127595	2022-06-13 11:48:18 +01:00
Simon Pilgrim	1cf9b24da3	[DAG] Enable ISD::FSHL/R SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This helps with several of the regressions from D125836	2022-06-12 19:25:20 +01:00
Simon Pilgrim	54ae4ca755	[DAG] visitSRL - pull out ShiftVT. NFC.	2022-06-12 14:02:23 +01:00
Simon Pilgrim	cf5c63d187	[DAG] visitVECTOR_SHUFFLE - fold splat(insert_vector_elt()) and splat(scalar_to_vector()) to build_vector splats Addresses a number of regressions identified in D127115	2022-06-11 21:06:42 +01:00
Simon Pilgrim	44a0cd25df	[DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling Check if we're just replacing one v1x?? vector with another	2022-06-11 19:30:00 +01:00
Simon Pilgrim	a71ad6a3c8	[DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector() Allow scalar_to_vector nodes to be used for the start of a build_vector creation	2022-06-11 15:29:22 +01:00
Simon Pilgrim	693f4db1ec	[DAG] visitINSERT_VECTOR_ELT - refactor BUILD_VECTOR insertion to remove early-out. NFCI. Remove the early-out cases so we can more easily add additional folds in the future.	2022-06-11 12:01:13 +01:00
Paul Walker	10d55c4634	[SelectionDAG] Remove invalid TypeSize conversion from WidenVecOp_BITCAST. Differential Revision: https://reviews.llvm.org/D127322	2022-06-11 10:41:13 +01:00
Guillaume Chatelet	95083fa3b8	[NFC] Remove deadcode	2022-06-10 15:13:42 +00:00
Simon Pilgrim	91adbc3208	[DAG] SimplifyDemandedVectorElts - adding SimplifyMultipleUseDemandedVectorElts handling to ISD::CONCAT_VECTORS Attempt to look through multiple use operands of ISD::CONCAT_VECTORS nodes Another minor improvement for D127115	2022-06-10 16:06:43 +01:00
Guillaume Chatelet	38637ee477	[clang] Add support for __builtin_memset_inline In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`. The idea is to get support from the compiler to easily write efficient memory function implementations. This patch could be split in two: - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics. - and another one for the Clang part providing the instrinsic as a builtin. Differential Revision: https://reviews.llvm.org/D126903	2022-06-10 13:13:59 +00:00
Simon Pilgrim	7dbfcfa735	[DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one. This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.	2022-06-09 14:56:14 +01:00
Guillaume Chatelet	dc3367970e	[SelectionDAG] Handle bzero/memset libcalls globally instead of per target Differential Revision: https://reviews.llvm.org/D127279	2022-06-09 08:34:55 +00:00
Craig Topper	4bcfc41846	[SelectionDAG] Teach computeKnownBits that a nsw self multiply produce a positive value. This matches what we do in IR. For the RISC-V test case, this allows us to use -8 for the AND mask instead of materializing a constant in a register. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D127335	2022-06-08 14:55:58 -07:00
Kai Nacke	d897a14c2e	[SystemZ] Fix check for zero size when lowering memcmp. During lowering of memcmp/bcmp, the check for a size of 0 is done in 2 different ways. In rare cases this can lead to a crash in SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a constant int value which is not yet evaluated. When the value is turned into a SDValue, then the evaluation is done and results in a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special case of 0 length to be handled, which results in an assertion. The fix is to turn the value into a SDValue, so that both functions use the same check. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D126900	2022-06-08 14:52:13 -04:00
Simon Pilgrim	b84c10d4bc	[DAG] visitVSELECT - don't wait for truncation of sub before attempting to match with getTruncatedUSUBSAT Fixes some X86 PSUBUS regressions encountered in D127115 where the truncate was being replaced with a PACKSS/PACKUS before the fold got called again	2022-06-08 16:16:35 +01:00
Paul Walker	d88354213c	[SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST. Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work with TypeSize directly rather than silently casting to unsigned. To accomplish this I've extended TypeSize with an interface that essentially allows TypeSize division when both operands have the same number of dimensions. There still exists combinations of scalable vector bitcasts that cause compiler crashes. I call these out by adding "is missing" entries to sve-bitcast. Depends on D126957. Fixes: #55114 Differential Revision: https://reviews.llvm.org/D127126	2022-06-08 10:30:07 +01:00
Paul Walker	a1121c31d8	[SVE] Fix incorrect code generation for bitcasts of unpacked vector types. Bitcasting between unpacked scalable vector types of different element counts is not a NOP because the live elements are laid out differently. 01234567 e.g. nxv2i32 = XX??XX?? nxv4f16 = X?X?X?X? Differential Revision: https://reviews.llvm.org/D126957	2022-06-08 10:30:07 +01:00
Simon Pilgrim	a083f3caa1	[DAG] combineShuffleOfSplatVal - fold shuffle(splat,undef) -> splat, iff the splat contains no UNDEF elements As noticed on D127115 - we were missing this fold, instead just having the shuffle(shuffle(x,undef,splatmask),undef) fold. We should be able to merge these into one using SelectionDAG::isSplatValue, but we'll need to match the shuffle's undef handling first. This also exposed an issue in SelectionDAG::isSplatValue which was incorrectly propagating the undef mask across a bitcast (it was trying to just bail with a APInt::isSubsetOf if it found any undefs but that was actually the wrong way around so didn't fire for partial undef cases).	2022-06-07 16:42:24 +01:00
Guillaume Chatelet	0788186182	[Alignment][NFC] Remove usage of MemSDNode::getAlignment I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times. Differential Revision: https://reviews.llvm.org/D126910	2022-06-07 13:52:20 +00:00
Nikita Popov	5a64bc207e	[DAGCombiner] Remove overzealous assertion when folding assert+trunc+assert (PR55846) These assert that there are no "useless" assertzext/assertsext nodes (that assert a wider width than a following trunc), but I don't think there is anything preventing such nodes from reaching this code. I don't think the assertion is relevant for correctness of this transform either -- if such an assert is present, then the other one will always be to a smaller width, and we'll pick that one. The assertion dates back to D37017. Fixes https://github.com/llvm/llvm-project/issues/55846. Differential Revision: https://reviews.llvm.org/D126952	2022-06-07 09:50:26 +02:00
Craig Topper	be398100ea	[SelectionDAG] Further improve computeKnownBits for (smax X, C) where C is non-negative. Move the code that was added for D126896 after the normal recursive calls to computeKnownBits. This allows us to calculate trailing zeros. Previously we would break out of the switch before the recursive calls.	2022-06-06 09:59:23 -07:00
Lian Wang	20cf77f776	[LegalizeTypes][VP] Add widen and split support for vp.fptrunc and vp.fpext Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126439	2022-06-06 02:28:01 +00:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
Benjamin Kramer	e8e4b741dd	[DAGCombiner] Add bf16 to the matrix of types that we don't promote to integer stores Remove a few stray semicolons while there.	2022-06-03 13:28:34 +02:00
Nikita Popov	ad742cf85d	[DAGCombine] Handle promotion of shift with both operands the same When promoting a shift, make sure we only fetch the second operand after promoting the first. Load promotion may replace users of the old load, and we don't want to be left with a dangling reference to the old load instruction. The crashing test case is from https://reviews.llvm.org/D126689#3553212. Differential Revision: https://reviews.llvm.org/D126886	2022-06-03 10:00:44 +02:00
Craig Topper	fa20bf1636	[DAGCombiner][RISCV] Improve computeKnownBits for (smax X, C) where C is non-negative. If C is non-negative, the result of the smax must also be non-negative, so all sign bits of the result are 0. This allows DAGCombiner to remove a zext_inreg in the modified test. This zext_inreg started as a sext that became zext before type legalization then was promoted to a zext_inreg. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126896	2022-06-02 12:34:24 -07:00
jacquesguan	5482ae6328	[LegalizeTypes][VP] Add widen and split support for VP FP integer casting op. This patch adds widen and split support for VP_FPTOSI, VP_FPTOUI, VP_SITOFP and VP_UITOFP. Differential Revision: https://reviews.llvm.org/D126847	2022-06-02 09:05:27 +00:00
jacquesguan	058791d8f2	[LegalizeTypes][VP] Add widen and split support for VP_SIGN_EXTEND and VP_ZERO_EXTEND. Differential Revision: https://reviews.llvm.org/D126442	2022-06-02 02:21:22 +00:00
Ping Deng	ae8ae45e2a	[DAGCombine][NFC] Add braces to 'else' to match braced 'if' Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126624	2022-06-01 07:54:05 +00:00
Ping Deng	88af539c0e	[RISCV] Support VP_REDUCE_MUL mask operation Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126520	2022-05-30 03:05:39 +00:00
Ping Deng	083798e270	[LegalizeTypes][VP] Add integer promotion support for vp.fptosi/vp.fptoui Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125760	2022-05-30 03:05:39 +00:00
Ping Deng	121689a62e	[SelectionDAG][NFC] Simplify integer promotion in setcc/vp.setcc Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126516	2022-05-27 05:50:19 +00:00
Craig Topper	460781feef	[LegalizeTypes] Fix bug in expensive checks verification With a fix for an expensive checks build failure exposed by new RISC-V tests. Something about expanding two rotates in type legalization caused a change in the remapping tables that the expensive checks verifying wasn't expecting. See comment in the code for how it was fixed. Tests came from this commit that exposed the bug [RISCV] Add test cases showing failure to remove mask on rotate amounts. If the masking AND has multiple users we fail to remove it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D126036	2022-05-26 13:13:32 -07:00
Simon Pilgrim	f366acdbf6	[DAG] Generalize (sra (trunc (sra x, c1)), c2) -> (trunc (sra x, c1 + c2)) constant folding Remove local (uniform) constant folding and rely on getNode() to perform it Minor cleanup step toward adding non-uniform shift amount support	2022-05-26 14:05:09 +01:00
Simon Pilgrim	7b617eef80	[DAG] Cleanup "and/or of cmp with single bit diff" fold to use ISD::matchBinaryPredicate Prep work as I'm investigating some cases where TLI::convertSetCCLogicToBitwiseLogic should accept vectors.	2022-05-26 12:34:09 +01:00
Lian Wang	8aa6b05deb	[LegalizeTypes][VP] Add widen and split support for VP_TRUNCATE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125950	2022-05-26 02:03:27 +00:00
Paul Walker	6f215ca680	[SelectionDAG] Add support to widen ISD::STEP_VECTOR operations. Fixes: #55165 Differential Revision: https://reviews.llvm.org/D126168	2022-05-24 22:42:37 +01:00
Serge Pavlov	6fc0bc5b0f	Fix behavior of is_fp_class on empty class set The second argument to is_fp_class specifies the set of floating-point class to test against. It can be zero, in this case the intrinsic is expected to return zero value. Differential Revision: https://reviews.llvm.org/D112025	2022-05-24 21:50:18 +07:00
Simon Pilgrim	11455e4758	[DAG] Unroll vectorized FPOW instructions before widening that will scalarize to libcalls anyway Followup to D125988 - FPOW is similar to FREM and will most likely scalarize to libcalls, so unroll before widening to prevent use making additional libcalls with UNDEF args.	2022-05-24 15:44:53 +01:00
Nabeel Omer	8b5d9cbbfe	[x86][DAG] Unroll vectorized FREMs that will become libcalls Currently, two element vectors produced as the result of a binary op are widened to four element vectors on x86 by DAGTypeLegalizer::WidenVecRes_BinaryCanTrap. If the op still isn't legal after widening it is unrolled into 4 scalar ops in SelectionDAG before being converted into a libcall. This way we end up with 4 libcalls (two of them on known undef elements) instead of the original two libcalls. This patch modifies DAGTypeLegalizer::WidenVectorResult to ensure that if it is known that a binary op will be tunred into a libcall, it is unrolled instead of being widened. This prevents the creation of the extra scalar instructions on known undef elements and (eventually) libacalls with known undef parameters which would otherwise be created when the op gets expanded post widening. Differential Revision: https://reviews.llvm.org/D125988	2022-05-24 13:34:51 +01:00
Fraser Cormack	7f7ef0ed61	[LegalizeTypes][NFC] Fix node name in assertion message This was probably copy/pasted from the MSCATTER widening.	2022-05-24 09:16:18 +01:00
Lian Wang	be84f91f87	[LegalizeTypes][VP] Fix OpNo in WidenVecOp_VP_SCATTER Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126276	2022-05-24 07:14:46 +00:00
Craig Topper	569d8945f3	[DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2. If the VT is i2, then 2 is really -2. Test has not been commited yet, but diff shows the change. Fixes PR55644. Differential Revision: https://reviews.llvm.org/D126213	2022-05-23 11:13:57 -07:00
Craig Topper	c11051a400	[SelectionDAG] Add a freeze to ISD::ABS expansion. I had initially assumed this was the problem with https://github.com/llvm/llvm-project/issues/55271#issuecomment-1133426243 But it turns out that was a simpler issue. This patch is still more correct than what we were doing before so figured I'd submit it anyway. No test case because I'm not sure how to get an undef around until expansion. Looking at the test deltas I wonder if it be valid to combine (sext_inreg (freeze (aextload X))) -> (freeze (sextload X)). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D126175	2022-05-22 14:29:58 -07:00
Craig Topper	768a1ca5ec	[SelectionDAG] Fold abs(undef) to 0 instead of undef. abs should only produce a positive value or the signed minimum value. This means we can't fold abs(undef) to undef as that would allow more values. Fold to 0 instead to match InstSimplify. Fixes test mentioned in comment on pr55271. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D126174	2022-05-22 12:47:32 -07:00
Paul Walker	258dac43d6	[SVE] Enable use of 32bit gather/scatter indices for fixed length vectors Differential Revision: https://reviews.llvm.org/D125193	2022-05-22 12:32:30 +01:00
Ping Deng	0e8ac3a797	[LegalizeTypes][VP] Add integer promotion support for vp.sitofp/vp.uitofp Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125960	2022-05-22 02:13:45 +00:00
Craig Topper	003b95acf2	[LegalizeTypes] Remove double map lookup in DAGTypeLegalizer::PerformExpensiveChecks. NFC Remove repeated checks for ResId being 0.	2022-05-21 00:06:59 -07:00
Craig Topper	66875dbcc0	[LegalizeTypes] Use SmallDenseMap::count instead of SmallDenseMap::find. NFC It's more readable and more efficient.	2022-05-21 00:06:55 -07:00
Jay Foad	6bec3e9303	[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op). Now that the standard zext, sext and trunc allow this, there is no reason to use the OrSelf versions. The OrSelf versions additionally have the strange behaviour of allowing extending to a smaller width, or truncating to a larger width, which are also treated as no-ops. A small amount of client code relied on this (ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and needed rewriting. Differential Revision: https://reviews.llvm.org/D125557	2022-05-19 11:23:13 +01:00
Lian Wang	530bab1f93	[RISCV][SelectionDAG] Support VECREDUCE_ADD mask operation Re-landed D125206 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125206	2022-05-19 09:53:33 +00:00
Lian Wang	f035068bb3	[LegalizeVectorTypes][VP] Add widen and split support for VP_SETCC Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D125446	2022-05-19 07:42:39 +00:00
Lian Wang	bbc6834e26	[LegalizeTypes][VP] Add integer promotions support for VP_TRUNCATE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125739	2022-05-19 07:36:10 +00:00
Lian Wang	993070d11f	[LegalizeTypes][VP][NFC] Use an if and two returns instead of ?: operator Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125858	2022-05-19 07:18:24 +00:00
Craig Topper	46eef76876	[DAGCombiner] Fix bug in MatchBSwapHWordLow. This function tries to match (a >> 8) \| (a << 8) as (bswap a) >> 16. If the SRL isn't masked and the high bits aren't demanded, we still need to ensure that bits 23:16 are zero. After the right shift they will be in bits 15:8 which is where the important bits from the SHL end up. It's only a bswap if the OR on bits 15:8 only takes the bits from the SHL. Fixes PR55484. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125641	2022-05-18 09:23:18 -07:00
Yeting Kuo	00999fb6e1	[SelectionDAGBuilder] Pass fast math flags to most of VP SDNodes. The patch does not pass math flags to float VPCmpIntrinsics because LLParser could not identify float VPCmpIntrinsics as FPMathOperators. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125600	2022-05-18 16:15:47 +08:00
Simon Pilgrim	d40b7f0d5a	[DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node. AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside a AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback. Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125607	2022-05-17 13:40:11 +01:00
jacquesguan	26593e7314	[SelectionDAG] Support more VP reduction mask operation. This patch uses VP_REDUCE_AND and VP_REDUCE_OR to replace VP_REDUCE_SMAX,VP_REDUCE_SMIN,VP_REDUCE_UMAX and VP_REDUCE_UMIN for mask vector type. Differential Revision: https://reviews.llvm.org/D125002	2022-05-17 09:14:21 +00:00

... 3 4 5 6 7 ...

12514 Commits