llvm-project

Commit Graph

Author	SHA1	Message	Date
Sander de Smalen	ec46232517	[DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)` This seems like an obvious fold, which leads to a few improvements. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118920	2022-02-09 14:30:01 +00:00
Sanjay Patel	905abc5b7d	[SDAG] enable binop identity constant folds for fmul/fdiv The test diffs are identical to D119111. This only affects x86 currently because no other target has an override for the TLI hook that controls this transform.	2022-02-08 10:52:28 -05:00
Roman Lebedev	ae9414d562	[ValueTracking] Only check for non-undef/poison if already known to be a self-multiply https://godbolt.org/z/js9fTTG9h ^ we don't care what `isGuaranteedNotToBeUndefOrPoison()` says unless we already knew that the operands were equal.	2022-02-08 18:35:29 +03:00
Sanjay Patel	a68e098024	[SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC This is no-functional-change-intended because only the x86 target enables the TLI hook currently. We can add fmul/fdiv opcodes to the switch similar to the proposal D119111, but we don't need to make other changes like enabling target-specific combines. We can also add integer opcodes (add, or, shl, etc.) to the switch because this function is called from all of the generic binary opcodes. The goal is to incrementally enable the profitable diffs from D90113 while avoiding regressions. Differential Revision: https://reviews.llvm.org/D119150	2022-02-08 09:55:05 -05:00
Simon Pilgrim	fd2bb51f1e	[ADT] Add APInt/MathExtras isShiftedMask variant returning mask offset/length In many cases, calls to isShiftedMask are immediately followed with checks to determine the size and position of the bitmask. This patch adds variants of APInt::isShiftedMask, isShiftedMask_32 and isShiftedMask_64 that return these values as additional arguments. I've updated a number of cases that were either performing separate size/position calculations or had created their own local wrapper versions of these. Differential Revision: https://reviews.llvm.org/D119019	2022-02-08 12:04:13 +00:00
Sanjay Patel	d1ecfaa097	[SDAG] try to fold one-demanded-bit-of-multiply This is a translation of the transform added to InstCombine with: D118539	2022-02-07 17:24:35 -05:00
Sanjay Patel	fc6bee1c11	[SDAG] SimplifyDemandedBits - generalize fold for 2 LSB of X*X This is translated from recent changes to the IR version of this function: D119060 D119139	2022-02-07 15:38:50 -05:00
Simon Pilgrim	74555fd367	[DAG] visitINSERT_VECTOR_ELT - break if-else chain as they both return (style). NFC.	2022-02-07 09:58:47 +00:00
Craig Topper	c35ccd2ac8	[DAGCombiner][RISCV] Allow rotates by non-constant to be matched for i32 on riscv64 with Zbb. rv64izbb has RORW/ROLW instructions that operate on the lower 32-bits of a 64-bit value and sign extend bit 31 of the result. DAGCombiner won't match rotate idioms because the i32 type isn't Legal on riscv64. This patch teaches DAGCombiner to allow it if the type is going to be promoted and the target has Custom type legalization for ISD::ROTL or ISD::ROTR. I've restricted this to scalar types. It doesn't appear any in-tree targets other than riscv64 have custom type legalization for rotates. If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR after type legalization, but I'd like to avoid that if possible. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119062	2022-02-06 10:58:12 -08:00
Bjorn Pettersson	cecf11c315	[DAGCombiner] Fold SSHLSAT/USHLSAT to SHL when no saturation will occur When the shift amount is known and a known sign bit analysis of the shiftee indicates that no saturation will occur, then we can replace SSHLSAT/USHLSAT by SHL. Differential Revision: https://reviews.llvm.org/D118765	2022-02-06 18:59:06 +01:00
Benjamin Kramer	a40dc4eaf8	Simplify mask creation with llvm::seq. NFCI.	2022-02-05 23:35:41 +01:00
Sander de Smalen	6452549f30	[DAGCombiner] Fold vecreduce_or/and if operand is insert_subvector. Fold: vecreduce_or(insert_subvec(zeroinitializer, vec)) -> vecreduce_or(vec) vecreduce_and(insert_subvec(allones, vec)) -> vecreduce_and(vec) vecreduce_and/or(insert_subvec(undef, vec)) -> vecreduce_and/or(vec) This is useful for SVE which uses insert/extract subvector to convert fixed-width to/from scalable vectors. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D118919	2022-02-05 14:35:53 +00:00
John Brawn	0d8092dd48	[AArch64] Fix legalization of v1f64 strict_fsetcc and strict_fsetccs These operations are scalarized but the result type v1i1 isn't which needs special handling (the same as is done for the non-strict versions of these operations). Differential Revision: https://reviews.llvm.org/D118258	2022-02-04 12:55:38 +00:00
serge-sans-paille	ffe8720aa0	Reduce dependencies on llvm/BinaryFormat/Dwarf.h This header is very large (3M Lines once expanded) and was included in locations where DWARF-specific information was not needed. More specifically, this commit suppresses the dependencies on llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used, this has a decent impact on number of preprocessed lines generated during compilation of LLVM, as showcased below. This is achieved by moving some definitions back to the .cpp file, no performance impact implied[0]. As a consequence of that patch, downstream users may need to manually include some extra files: llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h In some situations, code may be relying on the fact that llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden dependency now needs to be explicit. $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions \| wc -l after: 10978519 before: 11245451 Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup [0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions Differential Revision: https://reviews.llvm.org/D118781	2022-02-04 11:44:03 +01:00
Bjorn Pettersson	3db39e7479	[DAGCombiner] Fix dependency analysis in checkMergeStoreCandidatesForDependencies In the aftermath of D116895 a problem was found in the analysis of dependencies between store merge candidates in checkMergeStoreCandidatesForDependencies, which is needed to avoid introducing cycles in the DAG. In the past it has been enough (or assumed to be enough) to start scanning from non-chain operands when analysing the store merge candidates for dependencies, assuming that the analysis of chain dependencies performed when finding the candidates would cover up for potential dependencies that exist involving the chain operands. It was however discovered that one could end up with scenarios such as described in the aarch64-checkMergeStoreCandidatesForDependencies.ll test case, when the dependency between two stores is given by a mix of chain operand dependencies and non-chain operand dependencies. The fix in this patch makes sure that we also account for chain operand dependencies when doing the more elaborate analysis in checkMergeStoreCandidatesForDependencies, no longer relying on the earlier check involving chain operands being enough. Differential Revision: https://reviews.llvm.org/D118943	2022-02-04 08:53:01 +01:00
Sander de Smalen	01bfe9729a	[ISEL] Canonicalize STEP_VECTOR to LHS if RHS is a splat. This helps recognise patterns where we're trying to match STEP_VECTOR patterns to INDEX instructions that take a GPR for the Start/Step. The reason for canonicalising this operation to the LHS is because it will already be canonicalised to the LHS if the RHS is a constant splat vector. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D118459	2022-02-03 09:31:46 +00:00
Simon Pilgrim	5aa2acc86b	[DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.	2022-02-02 12:04:49 +00:00
Simon Moll	7d926b7177	[VE] LEGALAVL and staged VVP legalization The new LEGALAVL node annotates that the AVL refers to packs of 64bit. We use a two-stage lowering approach with LEGALAVL: First, standard SDNodes are translated into illegal VVP layer nodes. Regardless of source (VP or standard), all VVP nodes have a mask and AVL parameter. The AVL parameter refers to the element position (just as in VP intrinsics). Second, we legalize the AVL usage in VVP layer nodes. If the element size is < 64bit, the EVL parameter has to be adjusted to refer to packs of 64bits. We wrap the legalized AVL in a LEGALAVL node to track this. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D118321	2022-02-02 09:11:41 +01:00
David Green	c89cfbd4dd	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This reverts commit `100763a88f` as it was making incorrect assumptions about implicit zero_extends.	2022-02-01 20:18:40 +00:00
Simon Pilgrim	904395ab8f	[DAG] SimplifyMultipleUseDemandedBits - add default Depth = 0 argument. Simplifies an upcoming change.	2022-02-01 12:34:38 +00:00
Simon Pilgrim	d83a96f59f	[DAG] Make it clear mul(x,x) knownbits bit[1] == 0 check should be for x is undef only As raised on rGffd0e464b4b9, if x is poison, this fold is still ok.	2022-02-01 11:32:14 +00:00
Bjorn Pettersson	3885879046	[DAGCombine] Add simple folds for SSHLSAT/USHLSAT Do "simplifyShift" and "FoldConstantArithmetic" folds for the SSHLSAT and USHLSAT DAG nodes. This includes folds such as: (shlsat undef/poison, x) -> 0 (shlsat x, undef/poison) -> undef (shlsat x, too_large_shamt) -> undef (shlsat 0, x) -> 0 (shlsat x, 0) -> x (shlsat c1, c2) -> c3 Differential Revision: https://reviews.llvm.org/D118603	2022-02-01 10:51:35 +01:00
David Sherwood	daa80339df	[CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors I have updated TargetLowering::isConstTrueVal to also consider SPLAT_VECTOR nodes with constant integer operands. This allows the optimisation to also work for targets that support scalable vectors. Differential Revision: https://reviews.llvm.org/D117210	2022-02-01 09:50:00 +00:00
Philip Reames	57cf29ac1b	[Statepoint] Remove another use of getActualReturnType [NFC] For the cross block gc.result projection case, we only care about the return type if there is a cross block gc.result, and if there is one, we can take the type from the gc.result. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:57:46 -08:00
Philip Reames	6e4f7c0823	[Statepoints] Take result type from gc.result [NFC] When lowering a gc.result, we can assume that the result type of the gc.result matches the type of the underlying call. This is explicitly required in LangRef. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:42:34 -08:00
Philip Reames	093b43f48d	Sink getGCResultLocality to sole use [NFC]	2022-01-31 09:33:57 -08:00
Kerry McLaughlin	002b944dfa	[SVE] Fix TypeSize->uint64_t implicit conversion in visitAlloca() Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca() when we call this function for a scalable alloca instruction, caused by the implicit conversion of TySize to uint64_t. This patch changes TySize to a TypeSize as returned by getTypeAllocSize() and ensures the allocation size is multiplied by vscale for scalable vectors. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D118372	2022-01-31 14:37:23 +00:00
Dávid Bolvanský	ae990a3cbd	[Analysis] Attribute noundef should not prevent tail call optimization Very similar to https://reviews.llvm.org/D101230 Fixes https://github.com/llvm/llvm-project/issues/53501	2022-01-31 15:13:52 +01:00
Simon Pilgrim	7ec8fc2932	[X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements. This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.	2022-01-31 13:58:00 +00:00
Simon Pilgrim	2d1390efbe	[DAG] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero	2022-01-31 12:00:51 +00:00
Simon Pilgrim	48f45f6b25	[X86] Limit mul(x,x) knownbits tests with not undef/poison check We can only assume bit[1] == zero if it's the only demanded bit or the source is not undef/poison	2022-01-31 11:55:10 +00:00
Kazu Hirata	2bea207d26	[CodeGen] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-01-30 12:32:51 -08:00
Cullen Rhodes	5d089d9a83	[DAGCombiner] Fix invalid size request in combineRepeatedFPDivisors If we have a vector FP division with a splatted divisor, use getVectorMinNumElements when scaling the num of uses by splat factor. For AArch64 the combine kicks in for the <vscale x 4 x float> case since it's above the fdiv threshold (3) when scaling num uses by splat factor, but the codegen is worse (splat + vector fdiv + vector fmul) than the <vscale x 2 x double> case (splat + vector fdiv). If the combine could be converted into a scalar FP division by scalarizeBinOpOfSplats it may be cheaper, but it looks like this is predicated on the isExtractVecEltCheap TLI function which is implemented for x86 but not AArch64. Perhaps for now combineRepeatedFPDivisors should only scale num uses by splat if the division can be converted into scalar op. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118343	2022-01-28 17:01:08 +00:00
Ellis Hoag	11d3074267	[InstrProf] Add single byte coverage mode Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions * We use a single byte per function rather than 8 bytes per block The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code. When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%). [0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D116180	2022-01-27 17:38:55 -08:00
Simon Pilgrim	fdd3e2c943	[DAG] SelectionDAG::getNode(N1,N2) - detect N2 constant vector splats as well as scalars We already perform some basic folds (add/sub with zero etc.) on scalar types, this patch adds some basic support for constant splats as well in a few cases (we can add more with future test coverage). In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests. I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants. Differential Revision: https://reviews.llvm.org/D118264	2022-01-27 10:59:08 +00:00
Fraser Cormack	84e85e025e	[SelectionDAG][VP] Provide expansion for VP_MERGE This patch adds support for expanding VP_MERGE through a sequence of vector operations producing a full-length mask setting up the elements past EVL/pivot to be false, combining this with the original mask, and culminating in a full-length vector select. This expansion should work for any data type, though the only use for RVV is for boolean vectors, which themselves rely on an expansion for the VSELECT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118058	2022-01-27 09:00:41 +00:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
Sanjay Patel	63daea8b35	[SDAG] fix bug in ComputeNumSignBits of target constant The loop below the changed line assumes that the element width of the target constant is the same as the element width of the loaded value, but that is not always true. We could try harder to do some kind of min/max calc even if the sizes don't match, but that can be another patch if needed. This fixes #53401 (miscompile) and does not change the motivating cases added when this analysis was introduced: `ad298f86b7`	2022-01-26 10:22:41 -05:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a consequence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently the not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence. This relies on a further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32) on those targets which have no V_XNOR. The current change enables the patterns which explicitly select the not (xor_one_use) to the appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
David Green	57356d6bb7	[DAG] Create fptoui.sat from clamped fptoui This is the unsigned variant of D111976, where we convert a clamped fptoui to a fptoui.sat. Because we are unsigned, the condition this time is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D114964	2022-01-26 08:37:44 +00:00
Simon Pilgrim	15e2be291f	[DAG] visitMULHS/MULHU/AND - remove some redundant LHS constant checks Now that we constant fold and canonicalize constants to the RHS, we don't need to check both LHS and RHS for specific constants	2022-01-25 11:54:23 +00:00
Bjorn Pettersson	109cc5adcc	[DAGCombine] Fold SRA of a load into a narrower sign-extending load An sra is basically sign-extending a narrower value. Fold away the shift by doing a sextload of a narrower value, when it is legal to reduce the load width accordingly. Differential Revision: https://reviews.llvm.org/D116930	2022-01-25 12:14:48 +01:00
Fraser Cormack	7cb452bfde	[SelectionDAG][VP] Add widening support for VP_MERGE This patch adds widening support for ISD::VP_MERGE, which widens identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118030	2022-01-25 10:59:40 +00:00
Fraser Cormack	5f5c5603ce	[SelectionDAG][VP] Add splitting support for VP_MERGE This patch adds splitting support for ISD::VP_MERGE, which splits identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118032	2022-01-25 10:33:23 +00:00
Victor Perez	2233befa5d	[LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter Split these nodes in a similar way as their masked versions. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D117760	2022-01-25 10:08:07 +00:00
Paweł Bylica	9d32847b33	[DAGCombine] Remove unused param in combineCarryDiamond(). NFC	2022-01-24 20:57:00 +01:00
Sander de Smalen	699e22a083	[ISEL] Move trivial step_vector folds to FoldConstantArithmetic. Given that step_vector is practically a constant, doing this early helps with DAGCombine folds that happen before type legalization. There is currently no way to test this happens earlier, although existing tests for step_vector folds continue to protect the folds happening at all. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117863	2022-01-24 16:37:21 +00:00
Craig Topper	a43ed49f5b	[DAGCombiner][RISCV] Canonicalize (bswap(bitreverse(x))->bitreverse(bswap(x)). If the bitreverse gets expanded, it will introduce a new bswap. By putting a bswap before the bitreverse, we can ensure it gets cancelled out when this happens. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118012	2022-01-24 08:31:53 -08:00
Craig Topper	b8c7cdcc81	[SelectionDAG][RISCV] Teach getNode to fold bswap(bswap(x))->x. This can show up when bitreverse is expanded to bswap and swap of bits within a byte. If the input is already a bswap, we should cancel them out before we further transform them in a way that makes it harder to see the redundancy. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118007	2022-01-24 08:17:46 -08:00
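
Note on fd2bb51f1e: the message describes isShiftedMask variants that also report the position and size of the mask. The following is a minimal standalone sketch of that idea, not the code from the patch. The name `shiftedMaskRange` and its parameter names are hypothetical, and it uses plain loops rather than any LLVM helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the LLVM API): returns true if Value is a single
// contiguous run of one bits, and if so reports where the run starts
// (Offset) and how many bits it covers (Length). The outputs are left
// untouched when the function returns false.
bool shiftedMaskRange(std::uint64_t Value, unsigned &Offset, unsigned &Length) {
  if (Value == 0)
    return false; // An empty mask is not treated as a shifted mask here.

  // Skip the trailing zero bits to find the start of the run.
  unsigned Off = 0;
  while ((Value & 1) == 0) {
    Value >>= 1;
    ++Off;
  }

  // Count the consecutive one bits that form the run.
  unsigned Len = 0;
  while ((Value & 1) != 0) {
    Value >>= 1;
    ++Len;
  }

  // Any remaining set bit means there was more than one run.
  if (Value != 0)
    return false;

  assert(Off + Len <= 64 && "run must fit in 64 bits");
  Offset = Off;
  Length = Len;
  return true;
}
```

For example, under these assumptions `shiftedMaskRange(0x0FF0, Off, Len)` returns true with `Off == 4` and `Len == 8`, while `0x0F0F` is rejected because it contains two separate runs.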

11875 Commits