llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	8c83bd3bd4	[CostModel][X86] Adjust vXi32 multiply costs if it can be performed using PMADDWD Update the costs to match the codegen from combineMulToPMADDWD - not only can we use PMADDWD is its zero-extended, but also if its a constant or sign-extended from a vXi16 (which can be replaced with a zero-extension).	2021-09-25 16:28:48 +01:00
Simon Pilgrim	5a14edd8ed	[InstCombine] Ensure shifts are in range for (X << C1) / C2 -> X fold. We can get here before out of range shift amounts have been handled - limit to BW-2 for sdiv and BW-1 for udiv Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38078	2021-09-25 12:57:43 +01:00
Simon Pilgrim	993f3c61b3	[TTI] getUserCost - Ensure a vector insert/extract index is in unsigned 32-bit range Otherwise fallback to the generic 'unknown index' path Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29050	2021-09-25 10:50:54 +01:00
Nikita Popov	5969e5743a	[IR] Handle large element size when calculating GEP indices This is a fix for the issue reported at https://reviews.llvm.org/D110043#3019942: The ElementSize is a uint64_t and as such may be larger than the index space, or be negative in the index space. This is UB, but shouldn't cause assertion failures. We address this by detecting whether the size is too large and use a zero index in that case (which is always conservatively correct). Differential Revision: https://reviews.llvm.org/D110437	2021-09-24 22:20:20 +02:00
Sanjay Patel	a47c8e40c7	[InstCombine] fold lshr(trunc(lshr X, C1)) C2 Only the multi-use cases are changing here because there's another fold that catches the simpler patterns. But that other fold is the source of infinite loops when we try to add D110170, so removing that is planned as a follow-up. Attempt to show the general proof in Alive2: https://alive2.llvm.org/ce/z/Ns1uS2 Note that the overshift fold-to-zero tests are not currently handled by instsimplify. If they were, we could assert that the shift amount sum is less than the source bitwidth.	2021-09-24 15:44:07 -04:00
Nikita Popov	7774166499	[DSE] Add additional capture tests (NFC) These test other escape sources and the case of multiple underlying objects.	2021-09-24 21:13:29 +02:00
Simon Pilgrim	bdee805b32	[ConstantFold] ConstantFoldGetElementPtr - use APInt::isNegative() instead of getSExtValue() to support big ints Fixes fuzz test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39197	2021-09-24 18:18:53 +01:00
Simon Pilgrim	36eb6c0134	[SCCP] Regenerate bigint test checks	2021-09-24 18:18:53 +01:00
Hans Wennborg	1e9afab875	Re-apply "[JumpThreading] Ignore free instructions" It seems the crashes we saw wasn't caused by this (see comments on the review). > This is basically D108837 but for jump threading. Free instructions > should be ignored for the threading decision. JumpThreading already > skips some free instructions (like pointer bitcasts), but does not > skip various free intrinsics -- in fact, it currently gives them a > fairly large cost of 2. > > Differential Revision: https://reviews.llvm.org/D110290 This reverts commit `4604695d7c`.	2021-09-24 18:52:30 +02:00
Florian Hahn	6f28fb7081	Recommit "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts the revert commit `df56fc6ebb`. This version of the patch adjusts the location where the EarliestEscapes cache is cleared when an instruction gets removed. The earliest escaping instruction does not have to be a memory instruction. It could be a ptrtoint instruction like in the added test @earliest_escape_ptrtoint, which subsequently gets removed. We need to invalidate the EarliestEscape entry referring to the ptrtoint when deleting it. This fixes the crash mentioned in https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6	2021-09-24 17:13:27 +01:00
Sanjay Patel	638a4147fc	[InstCombine] add tests for lshr-trunc-lshr; NFC	2021-09-24 11:38:19 -04:00
Sanjay Patel	3c5500907b	Revert "[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)" This reverts commit `bb9333c350`. This exposes another existing bug that causes an infinite loop as shown in D110170 ...so reverting while I look at another fix.	2021-09-24 10:47:35 -04:00
Hans Wennborg	4604695d7c	Revert "[JumpThreading] Ignore free instructions" It caused compiler crashes, see comment on the code review for repro. > This is basically D108837 but for jump threading. Free instructions > should be ignored for the threading decision. JumpThreading already > skips some free instructions (like pointer bitcasts), but does not > skip various free intrinsics -- in fact, it currently gives them a > fairly large cost of 2. > > Differential Revision: https://reviews.llvm.org/D110290 This reverts commit `1e3c6fc7cb`.	2021-09-24 16:14:53 +02:00
Nico Weber	df56fc6ebb	Revert "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts commit `5ce89279c0`. Makes clang crash, see comments on https://reviews.llvm.org/D109844	2021-09-24 09:57:59 -04:00
Nikita Popov	1e3c6fc7cb	[JumpThreading] Ignore free instructions This is basically D108837 but for jump threading. Free instructions should be ignored for the threading decision. JumpThreading already skips some free instructions (like pointer bitcasts), but does not skip various free intrinsics -- in fact, it currently gives them a fairly large cost of 2. Differential Revision: https://reviews.llvm.org/D110290	2021-09-23 18:28:36 +02:00
Simon Pilgrim	c931d35216	[CostModel][X86] Increase i64 mul cost from 1 to 2 Only the most recent cpus support really 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization. Noticed while investigating PR51436.	2021-09-23 14:48:21 +01:00
Sanjay Patel	bb9333c350	[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try) The 1st try at this was reverted because it caused an infinite loop in instcombine. That should be fixed after: `1cd6b44f26` (masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C Narrowing the shift should be better for analysis and can lead to follow-on transforms as shown. Attempt at a general proof in Alive2: https://alive2.llvm.org/ce/z/tRnnSF Here are a couple of the specific tests: https://alive2.llvm.org/ce/z/bCnTp- https://alive2.llvm.org/ce/z/TfaHnb Differential Revision: https://reviews.llvm.org/D110170	2021-09-23 09:41:37 -04:00
Florian Hahn	5ce89279c0	[DSE] Track earliest escape, use for loads in isReadClobber. At the moment, DSE only considers whether a pointer may be captured at all in a function. This leads to cases where we fail to remove stores to local objects because we do not check if they escape before potential read-clobbers or after. Doing context-sensitive escape queries in isReadClobber has been removed a while ago in `d1a1cce5b1` to save compile-time. See PR50220 for more context. This patch introduces a new capture tracker, which keeps track of the 'earliest' capture. An instruction A is considered earlier than instruction B, if A dominates B. If 2 escapes do not dominate each other, the terminator of the common dominator is chosen. If not all uses cannot be analyzed, the earliest escape is set to the first instruction in the function entry block. If the query instruction dominates the earliest escape and is not in a cycle, then pointer does not escape before the query instruction. This patch uses this information when checking if a load of a loaded underlying object may alias a write to a stack object. If the stack object does not escape before the load, they do not alias. I will share a follow-up patch to also use the information for call instructions to fix PR50220. In terms of compile-time, the impact is low in general, NewPM-O3: +0.05% NewPM-ReleaseThinLTO: +0.05% NewPM-ReleaseLTO-g: +0.03 with the largest change being tramp3d-v4 (+0.30%) http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Compared to always computing the capture information on demand, we get the following benefits from the caching: NewPM-O3: -0.03% NewPM-ReleaseThinLTO: -0.08% NewPM-ReleaseLTO-g: -0.04% The biggest speedup is tramp3d-v4 (-0.21%). http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Overall there is a small, but noticeable benefit from caching. I am not entirely sure if the speedups warrant the extra complexity of caching. The way the caching works also means that we might miss a few cases, as it is less precise. Also, there may be a better way to cache things. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109844	2021-09-23 12:45:05 +01:00
Alex Richardson	05663dc146	[InstSimplify] Don't lose inbounds when simplifying a GEP I noticed this while working on a (ptrtoint (gep null, x)) -> x fold. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D110168	2021-09-23 09:25:06 +01:00
Johannes Doerfert	c6457dcae8	[OpenMP][FIX] Be more deliberate about invalidating the AAKernelInfo state This patch fixes a problem when the AAKernelInfo state was invalidated, e.g., due to `optnone` for a kernel, but not all parts indicated the invalidation properly. We further eliminate most full state invalidations as they should never be necessary. Differential Revision: https://reviews.llvm.org/D109468	2021-09-23 00:04:30 -05:00
Johannes Doerfert	57822c3f4f	[OpenMP][NFC] Repair test that contained nested kernels The benchmark contained (partially) nested kernels, something we do not generate nor support.	2021-09-23 00:04:29 -05:00
Johannes Doerfert	92280ae3d8	[OpenMP][NFC] Rerun the test check update script on all OpenMP-Opt tests	2021-09-23 00:04:29 -05:00
Johannes Doerfert	5e835ecb6d	[OpenMP][NFC] Precommit test that exposes a bug in our optnone handling	2021-09-23 00:04:29 -05:00
Usman Nadeem	3b12282b0e	[AArch64][SVE][InstCombine] Eliminate redundant chains of tuple get/set Differential Revision: https://reviews.llvm.org/D109667 Change-Id: I06a3c28e3658ecda109a3a1b73265828274ab2ea	2021-09-22 20:59:46 -07:00
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Sanjay Patel	1cd6b44f26	[InstCombine] add one-use check to shift-shift transform We don't want to create extra instructions, and this could infinite loop with the proposed transform in D110170.	2021-09-22 16:31:12 -04:00
Sanjay Patel	55aa4e92f7	[InstCombine] add test for shift-shift with extra use; NFC	2021-09-22 16:31:12 -04:00
Nikita Popov	d8e1203f91	[JumpThreading] Add test with free instructions (NFC) Which demonstrates that "free" instructions can prevent jump threading.	2021-09-22 22:29:39 +02:00
Sanjay Patel	a85d7a56c7	[ValueTracking] fix isOnlyUsedInZeroEqualityComparison with no users This is another problem exposed by: https://bugs.llvm.org/PR50836	2021-09-22 15:01:53 -04:00
Sanjay Patel	c240169ff2	[Analysis] improve function matching for strlen libcall The return type of strlen is size_t, not just any integer. This is a partial fix for an example based on: https://llvm.org/PR50836 There's another bug here because we can still crash processing a real strlen or something that looks like it.	2021-09-22 13:50:12 -04:00
Arthur Eubanks	e7249e4acf	[SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest When determining whether to fold branches to a common destination by merging two blocks, SimplifyCFG will count the number of instructions to be moved into the first basic block. However, there's no reason to count free instructions like bitcasts and other similar instructions. This resolves missed branch foldings with -fstrict-vtable-pointers in llvm-test-suite's lambda benchmark. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108837	2021-09-22 09:52:37 -07:00
Hongtao Yu	d9b511d8e8	[CSSPGO] Set PseudoProbeInserter as a default pass. Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D110209	2021-09-22 09:09:48 -07:00
Alexey Bataev	173dd896db	[SLP][NFC]Add a test to show an issue with incorrectly extracted pointers.	2021-09-22 09:02:13 -07:00
hyeongyu kim	98e96663f6	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (3/3) This patch is for fixing potential shufflevector-related bugs like D93818. As D93818, this patch change shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110230	2021-09-23 00:48:24 +09:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: - 0 - SPMD mode - 1 - indicates generic mode - 2 - SPMD mode execution with generic mode semantics We are going to add support for SIMD execution mode. It will be come with another execution mode, such as SIMD-generic mode. As a result, this value-based indicator is not flexible. This patch changes to bitset based solution to encode execution mode. Each position is: [0] - generic mode [1] - SPMD mode [2] - SIMD mode (will be added later) In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future after we add the support for SIMD mode, `0b1xx` will be in SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
hyeongyu kim	ec8311444a	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (2/3) This patch is for fixing potential shufflevector-related bugs like D93818. As D93818, this patch change shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110227	2021-09-23 00:14:50 +09:00
hyeongyu kim	e5aaf03326	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (1/3) This patch is for fixing potential shufflevector-related bugs like D93818. As D93818, this patch change shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110226	2021-09-22 23:18:51 +09:00
Alexey Bataev	b6d10beb50	[SLP][NFC]Rename function in the test for better matching of the transformation.	2021-09-22 05:51:18 -07:00
Florian Hahn	a7c6471a85	[Passes] Run vector-combine early with -fenable-matrix. IR with matrix intrinsics is likely to also contain large vector operations, which can benefit from early simplifications. This is the last step in a series of changes to improve code-gen for code using matrix subscript operators with the C/C++ matrix extension in CLang, like using matrix_t = double __attribute__((matrix_type(15, 15))); void foo(unsigned i, matrix_t &A, matrix_t &B) { for (unsigned j = 0; j < 4; ++j) for (unsigned k = 0; k < i; k++) B[k][j] -= A[k][j] * B[i][j]; } https://clang.godbolt.org/z/6dKxK1Ed7 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D102496	2021-09-22 12:48:32 +01:00
Sanjay Patel	c6013f71a4	Revert "[InstCombine] fold cast of right-shift if high bits are not demanded" This reverts commit `2f6b07316f`. This caused several bots to hit an infinite loop at stage 2, so it needs to be reverted while figuring out how to fix that.	2021-09-22 07:45:21 -04:00
Yi Kong	d0746f2e9b	Don't fold (select C, (gep Ptr, Idx), Ptr) if C is vector but Idx is scalar The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar. SELECT VecC, ScalarIdx, 0 We could splat Idx to a vector but it defeats the purpose of optimisation. Don't apply the folding rule in this case. This fixes a regression from commit `d561b6fbdb`.	2021-09-22 18:11:33 +08:00
Simon Pilgrim	41492d77ba	[LoopVectorize][X86] Add operands to make it more obvious what line the CHECK concerns As we're checking the cost debug analysis these should match the original IR line - so we shouldn't have any variable naming issues. I'm investigating v4i32 mul -> PMADDDW costs handling (for PR47437) and these CHECK lines were proving tricky to keep track of	2021-09-22 10:08:32 +01:00
Florian Hahn	300870a95c	[VectorCombine] Switch to using a worklist. This patch updates VectorCombine to use a worklist to allow iterative simplifications where a combine enables other combines. Suggested in D100302. The main use case at the moment is foldSingleElementStore and scalarizeLoadExtract working together to improve scalarization. Note that we now also do not run SimplifyInstructionsInBlock on the whole function if there have been changes. This means we fail to remove/simplify instructions not related to any of the vector combines. IMO this is fine, as simplifying the whole function seems more like a workaround for not tracking the changed instructions. Compile-time impact looks neutral: NewPM-O3: +0.02% NewPM-ReleaseThinLTO: -0.00% NewPM-ReleaseLTO-g: -0.02% http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D110171	2021-09-22 09:54:58 +01:00
Sanjay Patel	2f6b07316f	[InstCombine] fold cast of right-shift if high bits are not demanded (masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C Narrowing the shift should be better for analysis and can lead to follow-on transforms as shown. Attempt at a general proof in Alive2: https://alive2.llvm.org/ce/z/tRnnSF Here are a couple of the specific tests: https://alive2.llvm.org/ce/z/bCnTp- https://alive2.llvm.org/ce/z/TfaHnb Differential Revision: https://reviews.llvm.org/D110170	2021-09-21 16:09:08 -04:00
Antonio Frighetto	43d6991c2a	[IR] Look through bitcast in hasFnAttribute() A logic incompleteness may lead MemorySSA to be too conservative in its results. Specifically, when dealing with a call of kind `call i32 bitcast (i1 (i1)* @test to i32 (i32)*)(i32 %1)`, where the function `test` is declared with readonly attribute, the bitcast is not looked through, obscuring function attributes. Hence, some methods of CallBase (e.g., doesNotReadMemory) could provide suboptimal results. Differential Revision: https://reviews.llvm.org/D109888	2021-09-21 21:57:02 +02:00
Nikita Popov	f2fa6ad047	[MergeICmps] Don't reorder unmerged comparisons MergeICmps will currently sort (by offset) all comparisons in a chain, including those that do not get merged. This is problematic in two ways: * We may end up moving the original first block into the middle of the chain, in which case the "extra work" instructions will also be in the middle of the chain, resulting in invalid IR (reported in https://reviews.llvm.org/D108782#3005583). * Reordering branches is generally not legal, because it may introduce branch on poison, which is UB (PR51845). The merging done by MergeICmps is legal as long as we assume that memcmp() works on frozen memory, but the reordering of unmerged comparisons is definitely incorrect (without inserting freeze instructions), so we should avoid it. There are easier ways to fix the first issue, but I figured it was worthwhile to do this properly to also fix the second one. What we now do is to restore the original relative order of (potentially merged) comparisons. I took the liberty of dropping the MERGEICMPS_DOT_ON functionality, because it would be more awkward to implement now (as the before and after representation is different) and it doesn't seem terribly useful nowadays. Differential Revision: https://reviews.llvm.org/D110024	2021-09-21 21:22:12 +02:00
Owen Anderson	b5fbbdd202	Teach InstCombine to eliminate malloc-realloc-free triplets. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D109988	2021-09-21 18:07:49 +00:00
Danila Malyutin	78b51c7a2c	[LSR] Make sure that Factor fits into Base type Fixes pr42770 Differential Revision: https://reviews.llvm.org/D108772	2021-09-21 20:50:50 +03:00
Ayal Zaks	ab6a69dfea	[LV] Fix crash for reverse interleaved loads with gap under fold-tail. This patch fixes the crash found by PR51614: whenever doing tail folding, interleave groups must be considered under mask. Another fix D108900 follows for targets that support masked loads and stores: when deciding to vectorize with masked interleave groups, check if the access is reverse - which is currently not supported; rather than (only) asserting when computing cost and generating code. Differential Revision: https://reviews.llvm.org/D108891	2021-09-21 20:13:32 +03:00
Dávid Bolvanský	c0fdfc9af2	[InstCombine] powi(x, y) * powi(x, z) -> powi(x, y + z) We already have pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing same transformation for powi (power is integer). Requires reassoc. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D109954	2021-09-21 18:20:46 +02:00
Sanjay Patel	08ef71ca92	[InstCombine] move/add tests for trunc-of-lshr; NFC Planning to reframe a proposed transform in terms of demanded bits as suggested in D110170. The new tests end with an 'or'.	2021-09-21 12:11:25 -04:00
Florian Hahn	5131037ea9	[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange. isValidAssumeForContext can provide better results with access to the dominator tree in some cases. This patch adjusts computeConstantRange to allow passing through a dominator tree. The use VectorCombine is updated to pass through the DT to enable additional scalarization. Note that similar APIs like computeKnownBits already accept optional dominator tree arguments. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110175	2021-09-21 16:54:47 +01:00
Anna Thomas	69921f6f45	[InstCombine] Improve TryToSinkInstruction with multiple uses This patch allows sinking an instruction which can have multiple uses in a single user. We were previously over-restrictive by looking for exactly one use, rather than one user. Also added an API for retrieving a unique undroppable user. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109700	2021-09-21 10:04:04 -04:00
Sanjay Patel	af1c5312d7	[InstCombine] add tests for mask-shift with trunc; NFC	2021-09-21 09:41:41 -04:00
Florian Hahn	ea27dd7497	[VectorCombine] Add tests which require DT to use info from assumes.	2021-09-21 13:07:06 +01:00
Simon Pilgrim	fc8f1e4419	[InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824) If getAggregateElement() returns null for any element, early out as otherwise we will assert when creating a new constant vector Fixes PR51824 + ; OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057	2021-09-21 13:01:09 +01:00
David Stenberg	7b4cc09b14	[LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist This fixes PR51730, a heap-use-after-free bug in replaceConditionalBranchesOnConstant(). With the attached reproducer we were left with a function looking something like this after replaceAndRecursivelySimplify(): [...] cont2.i: br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i handler.type_mismatch3.i: %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ] unreachable cont4.i: unreachable [...] with both the branch instruction and PHI node being in the worklist. As a result of replacing the branch instruction with an unconditional branch, the PHI node in %handler.type_mismatch3.i would be removed. This then resulted in a heap-use-after-free bug due to accessing that removed PHI node in the next worklist iteration. This is solved by using a value handle worklist. I am a unsure if this is the most idiomatic solution. Another solution could have been to produce a worklist just containing the interesting branch instructions, but I thought that it perhaps was a bit cleaner to keep all worklist filtering in the loop that does the rewrites. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D109221	2021-09-21 11:33:07 +02:00
Max Kazantsev	2c7d5fbc9e	[SCEV] Generalize implication when signedness of FoundPred doesn't matter The implication logic for two values that are both negative or non-negative says that it doesn't matter whether their predicate is signed and unsigned, but only flips unsigned into signed for further inference. This patch adds support for flipping a signed predicate into unsigned as well. Differential Revision: https://reviews.llvm.org/D109959 Reviewed By: nikic	2021-09-21 11:17:56 +07:00
Max Kazantsev	073b254cff	[SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block When following a case of a switch instruction is guaranteed to lead to UB, we can safely break these edges and redirect those cases into a newly created unreachable block. As result, CFG will become simpler and we can remove some of Phi inputs to make further analyzes easier. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109428 Reviewed By: lebedev.ri	2021-09-21 10:45:19 +07:00
Usman Nadeem	f417d9d821	[InstCombine] Eliminate vector reverse if all inputs/outputs to an instruction are reverses Differential Revision: https://reviews.llvm.org/D109808 Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e	2021-09-20 18:32:24 -07:00
Nikita Popov	dd0226561e	[IR] Add helper to convert offset to GEP indices We implement logic to convert a byte offset into a sequence of GEP indices for that offset in a number of places. This patch adds a DataLayout::getGEPIndicesForOffset() method, which implements the core logic. I've updated SROA, ConstantFolding and InstCombine to use it, and there's a few more places where it looks relevant. Differential Revision: https://reviews.llvm.org/D110043	2021-09-20 20:18:16 +02:00
Florian Hahn	963d3a22b3	[DSE] Add additional tests to cover review comments. Adds additional tests following comments from D109844. Also removes unusued in.ptr arguments and places in the call tests that used loads instead of a getval call.	2021-09-20 17:06:04 +01:00
Alexey Bataev	bc69dd62c0	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reordarable nodes (loads, stores, extractelements,extractvalues) and then fully rebuilding the graph in the best order. This was not effecient, since it required an extra memory and time for building/rebuilding tree, double the use of the scheduling budget, which could lead to missing vectorization due to exausted scheduling resources. Patch provide 2-way approach for graph reodering problem. At first, all reordering is done in-place, it doe not required tree deleting/rebuilding, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to bottom) rotates the whole graph, similarly to the previous implementation. Compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle smaller subgraph when buildiong operands for the graph nodes with lasrger vectorization factor, we can rotate just subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reorder in the best fashion, they are reordered and their user too. It allows to remove double shuffles to the same ordering of the operands in many cases and just reorder the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows to remove extra shuffle because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands. Also, patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-09-20 08:42:19 -07:00
David Sherwood	f988f68064	[Analysis] Add support for vscale in computeKnownBitsFromOperator In ValueTracking.cpp we use a function called computeKnownBitsFromOperator to determine the known bits of a value. For the vscale intrinsic if the function contains the vscale_range attribute we can use the maximum and minimum values of vscale to determine some known zero and one bits. This should help to improve code quality by allowing certain optimisations to take place. Tests added here: Transforms/InstCombine/icmp-vscale.ll Differential Revision: https://reviews.llvm.org/D109883	2021-09-20 15:01:59 +01:00
Max Kazantsev	e9d34c5429	[NFC] Add assert and test showing that revert of D109596 wasn't justified All transforms of IndVars have prerequisite requirement of LCSSA and LoopSimplify form and rely on it. Added test that shows that this actually stands.	2021-09-20 12:01:12 +07:00
Max Kazantsev	471217cff8	Revert "Revert "[IndVars] Replace PHIs if loop exits on 1st iteration"" This reverts commit `6fec6552f5`. The patch was reverted on incorrect claim that this patch may break LCSSA form when the loop is not in a simplify form. All IndVars' transform insure that the loop is in simplify and LCSSA form, so if it wasn't broken before this transform, it will also not be broken after it.	2021-09-20 12:01:10 +07:00
Max Kazantsev	def15c5fb6	[SCEV] Support negative values in signed/unsigned predicate reasoning There is a piece of logic that uses the fact that signed and unsigned versions of the same predicate are equivalent when both values are non-negative. It's also true when both of them are negative. Differential Revision: https://reviews.llvm.org/D109957 Reviewed By: nikic	2021-09-20 11:26:33 +07:00
Chris Jackson	5ba8020326	[DebugInfo][LSR] Emit shorter expressions from scev-based salvaging The scev-based salvaging for LSR can sometimes produce unnecessarily verbose expressions. This patch adds logic to detect when the value to be recovered and the induction variable differ by only a constant offset. Then, the expression to derive the current iteration count can be omitted from the dbg.value in favour of the offset. Reviewed by: aprantl Differential Revision: https://reviews.llvm.org/D109044	2021-09-19 21:41:44 +01:00
Sanjay Patel	9555d1edb0	[InstCombine] add/adjust tests for min/max intrinsics; NFC If we transform these, we have to propagate no-wrap/undef carefully.	2021-09-19 10:10:37 -04:00
Nikita Popov	abe21da670	[Tests] Fix noalias metadata in one more test Missed this one in `80110aafa0`. This is another test mixing up alias scopes and alias scope lists.	2021-09-18 21:17:05 +02:00
Nikita Popov	80110aafa0	[Tests] Fix incorrect noalias metadata Mostly this fixes cases where !noalias or !alias.scope were passed a scope rather than a scope list. In some cases I opted to drop the metadata entirely instead, because it is not really relevant to the test.	2021-09-18 20:51:00 +02:00
Usman Nadeem	d841c72e09	Precommit tests for D109807 "[InstCombine] Narrow type of logical operation chains when possible" Change-Id: Iae9bf18619e4926301a866c7e2bd38ced524804e	2021-09-18 11:28:49 -07:00
Joseph Huber	27905eeb89	[Attributor] Change AAExecutionDomain to check intrinsic edges The AAExecutionDomain instance checks if a BB is executed by the main thread only. Currently, this only checks the `__kmpc_kernel_init` call for generic regions to indicate the path taken by the main thread. In the new runtime, we want to be able to detect basic blocks even in SPMD mode. For this we enable it to check thread-ID intrinsics being compared to zero as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109849	2021-09-17 19:51:38 -04:00
Joseph Huber	fec2927e07	[OpenMP] Add NoSync attributes to alloc / free shared RTL calls This patch adds the `nosync` attribute to the `__kmpc_alloc_shared` and `__kmpc_free_shared` runtime library calls. This allows code analysis to know that these functins dont contain any barriers. This will help optimizations reason about the CFG of blocks containing these calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109995	2021-09-17 19:50:13 -04:00
Usman Nadeem	757384abff	[AArch64][SVE][InstCombine] Fold redundant zip1/2(uzp1/2) operations zip1(uzp1(A, B), uzp2(A, B)) --> A zip2(uzp1(A, B), uzp2(A, B)) --> B Differential Revision: https://reviews.llvm.org/D109666 Change-Id: I4a6578db2fcef9ff71ad0e77b9fe08354e6dbfcd	2021-09-17 15:24:46 -07:00
Sanjay Patel	6da3503602	[InstCombine] add tests for min/max intrinsics with offset operand; NFC	2021-09-17 16:36:46 -04:00
Dávid Bolvanský	d01e0c8c66	[NFC] Precommit tests for D109954	2021-09-17 21:59:35 +02:00
Hongtao Yu	c5fafc1e73	[CSSPGO] Tweakes to lower pseudo probe runtime overhead A couple tweaks to 1. allow more thinlto importing by excluding probe intrinsics from IR size in module summary 2. Allow general default attributes (nofree nosync nounwind) for pseudo probe intrinsic. Without those attributes, pseudo probes will be basically treated as unknown calls which will in turn block their containing functions from annotated with those attributes. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D109976	2021-09-17 12:28:09 -07:00
Alexey Bataev	2b0b1d5319	[SLP][NFC]Add a test for reorder of alt shuffle operands.	2021-09-17 10:42:45 -07:00
Sanjay Patel	41ff7612b3	[InstCombine] allow splat vectors for narrowing masked fold Mostly cosmetic diffs, but the use of m_APInt matches splat constants.	2021-09-17 11:24:16 -04:00
Sanjay Patel	3a587ed20f	[InstCombine] add vector tests for 'and' folds; NFC	2021-09-17 11:24:16 -04:00
Max Kazantsev	690f76958a	[Test] Add simple test where IndVars fails to remove checks on negative values	2021-09-17 15:40:32 +07:00
Florian Hahn	bdafe3124c	[DSE] Add test cases with stores to objects before they escape. Test cases where stores to local objects can be removed because the object does not escape before calls that may read/write to memory. Includes test from PR50220.	2021-09-17 09:10:53 +01:00
Max Kazantsev	74fa174f33	[Test] One more missing opportunity on IndVars check removal	2021-09-17 14:52:15 +07:00
Sjoerd Meijer	97cc678cc4	[FuncSpec] Specialising on addresses of const global values. This introduces an option to allow specialising on the address of global values. This option is off by default because it is likely not that profitable to do so and needs more investigation. Before, we were specialising on addresses and thus this changes the default behaviour. Differential Revision: https://reviews.llvm.org/D109775	2021-09-17 08:07:05 +01:00
Christudasan Devadasan	167ff5280d	[GlobalOpt] Do not shrink global to bool for an unfavorable AS Do not call `TryToShrinkGlobalToBoolean` for address spaces that don't allow initializers. It inserts an initializer value while shrinking to bool. Used the target hook introduced with D109337 to skip this call for the restricted address spaces. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109823	2021-09-16 23:13:30 -04:00
Daniil Suchkov	fe950cba8f	Update LoopPredication test to fix buildbot failure. This patch updates tests added in `5f2b7879f1`.	2021-09-16 23:37:59 +00:00
Daniil Suchkov	0e36288318	[LoopPredication] Report changes correctly when attempting loop exit predication To make the IR easier to analyze, this pass makes some minor transformations. After that, even if it doesn't decide to optimize anything, it can't report that it changed nothing and preserved all the analyses. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D109855	2021-09-16 22:49:55 +00:00
Daniil Suchkov	5f2b7879f1	NFC. Add tests exposing missing analysis invalidation in LoopPredication.	2021-09-16 22:49:55 +00:00
Jon Roelofs	4b19e7dfae	[LoopIdiomRecognize][Remarks] Track loop-strided store to/from blocks Differential revision: https://reviews.llvm.org/D109929	2021-09-16 15:46:26 -07:00
Arthur Eubanks	d49cb5b303	[SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest This makes some tests in vector-reductions-logical.ll more stable when applying D108837. The cost of branching is higher when vector ops are involved due to potential SLP transformations. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108935	2021-09-16 10:50:36 -07:00
Dávid Bolvanský	a4a426c9e0	[InstCombine] Added llvm.powi optimizations If power is even: powi(-x, p) -> powi(x, p) powi(fabs(x), p) -> powi(x, p) powi(copysign(x, y), p) -> powi(x, p)	2021-09-16 19:42:21 +02:00
Dávid Bolvanský	c0afb00924	[NFC] Added tests for llvm.powi optimizations	2021-09-16 19:42:21 +02:00
Sjoerd Meijer	2a1ac2e318	[FuncSpec] Add force flag to test case to trigger the transform. NFC.	2021-09-16 17:48:13 +01:00
Florian Hahn	2f97ff8e7b	[SLP] Add additional memory versioning tests.	2021-09-16 13:31:14 +01:00
Anton Afanasyev	6a5f49a1ac	[AggressiveInstCombine] Add `{insert/extract}element` to `TruncInstCombine` DAG Alive2 for `{insert/extract}element`: https://alive2.llvm.org/ce/z/hwy_E- Actually, no one file of test suite is touched by this change, which means that is rare pattern not generated by frontend. But it's worth being in place. Differential Revision: https://reviews.llvm.org/D109236	2021-09-16 11:24:31 +03:00
Anton Afanasyev	8371a4c9d5	[Test][AggressiveInstCombine] Add test for truncation of vector instructions Precommit test for D109236	2021-09-16 11:24:30 +03:00
Sjoerd Meijer	a4e437e3c9	[FuncSpec] Add a test for specialising on a non-constant global argument. NFC.	2021-09-16 09:17:39 +01:00
Sam Parker	c98a8a09b5	[HardwareLoops] Loop guard intrinsic to recognise zext If a loop count was initially represented by a 32b unsigned int in C then the hardware-loop pass can recognise the loop guard and insert the llvm.test.set.loop.iterations intrinsic. If this was instead a unsigned short/char then clang inserts a zext instruction to expand the loop count to an i32. This patch adds the necessary pattern matching to enable the use of lvm.test.set.loop.iterations in those cases. Patch by: sherwin-dc Differential Revision: https://reviews.llvm.org/D109631	2021-09-16 08:33:16 +01:00
Owen Anderson	68079ef0eb	Teach SimplifyCFG to fold switches into lookup tables in more cases. In particular, it couldn't handle cases where lookup table constant expressions involved bitcasts. This does not seem to come up frequently in C++, but comes up reasonably often in Rust via `#[derive(Debug)]`. Originally reported by pcwalton. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109565	2021-09-15 22:07:08 +00:00

1 2 3 4 5 ...

19812 Commits