llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	c14fd171fe	LoopUnrollAndJamPass.cpp - remove unused <vector> include. NFCI.	2021-06-13 14:06:32 +01:00
Adam Nemet	e0efebb8eb	[Matrix] In transpose opts, handle a^t * a^t Without the fix the testcase crashes because we remove the same instruction twice. Differential Revision: https://reviews.llvm.org/D104127	2021-06-11 09:29:43 -07:00
Sjoerd Meijer	9907746f5d	Move Function Specialization to its correct location. NFC. As a follow up of rGc4a0969b9c14, and as part of D104102, move it to the IPO transformations directory.	2021-06-11 15:00:10 +01:00
Sjoerd Meijer	c4a0969b9c	Function Specialization Pass This adds a function specialization pass to LLVM. Constant parameters like function pointers and constant globals are propagated to the callee by specializing the function. This is a first version with a number of limitations: - The pass is off by default, so needs to be enabled on the command line, - It does not handle specialization of recursive functions, - It does not yet handle constants and constant ranges, - Only 1 argument per function is specialised, - The cost-model could be further looked into, and perhaps related, - We are not yet caching analysis results. This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan. More recently this was also discussed on the list, see: https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html. The motivation for this work is that function specialisation often comes up as a reason for performance differences of generated code between LLVM and GCC, which has this enabled by default from optimisation level -O3 and up. And while this certainly helps a few cpu benchmark cases, this also triggers in real world codes and is thus a generally useful transformation to have in LLVM. Function specialisation has great potential to increase compile-times and code-size. The summary from some investigations with this patch is: - Compile-time increases for short compile jobs is high relatively, but the increase in absolute numbers still low. - For longer compile-jobs, the extra compile time is around 1%, and very much in line with GCC. - It is difficult to blame one thing for compile-time increases: it looks like everywhere a little bit more time is spent processing more functions and instructions. - But the function specialisation pass itself is not very expensive; it doesn't show up very high in the profile of the optimisation passes. The goal of this work is to reach parity with GCC which means that eventually we would like to get this enabled by default. But first we would like to address some of the limitations before that. Differential Revision: https://reviews.llvm.org/D93838	2021-06-11 09:11:29 +01:00
Andy Kaylor	41555eaf65	Preserve more MD_mem_parallel_loop_access and MD_access_group in SROA SROA sometimes preserves MD_mem_parallel_loop_access and MD_access_group metadata on loads/stores, and sometimes fails to do so. This change adds copying of the MD after other CreateAlignedLoad/CreateAlignedStores. Also fix a case where the metadata was being copied from a load, rather than the store. Added a LIT test to catch one case. Patch by Mark Mendell Differential Revision: https://reviews.llvm.org/D103254	2021-06-10 15:47:03 -07:00
Philip Reames	7629b2a09c	[LI] Add a cover function for checking if a loop is mustprogress [nfc] Essentially, the cover function simply combines the loop level check and the function level scope into one call. This simplifies several callers and is (subjectively) less error prone.	2021-06-10 13:37:32 -07:00
LemonBoy	d3faef6eef	[SROA] Avoid splitting loads/stores with irregular type Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D99435	2021-06-09 16:36:58 +02:00
Nico Weber	205cde63c7	Revert "[SROA] Avoid splitting loads/stores with irregular type" This reverts commit `905f4eb537`. Breaks check-llvm on most (all?) bots, see https://reviews.llvm.org/D99435	2021-06-09 06:32:58 -04:00
LemonBoy	905f4eb537	[SROA] Avoid splitting loads/stores with irregular type Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D99435	2021-06-09 11:48:20 +02:00
Jingu Kang	8eee02020b	[LoopBoundSplit] Ignore phi node which is not scevable There was a bug in LoopBoundSplit. The pass should ignore phi node which is not scevable. Differential Revision: https://reviews.llvm.org/D103913	2021-06-09 09:44:36 +01:00
David Green	297088d1ad	Revert "[DSE] Remove stores in the same loop iteration" Apparently non-dead stores are being removed, as noted in D100464. This reverts commit `222aeb4d51`.	2021-06-08 21:23:08 +01:00
maekawatoshiki	09e92c607c	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Also, a crash problem on legacy pass manager is fixed. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-06-08 20:30:02 +09:00
Simon Pilgrim	596004a947	MemCpyOptimizer.cpp - hasUndefContentsMSSA - Pass DataLayout by reference. NFCI.	2021-06-08 10:41:02 +01:00
Philip Reames	c880d5e583	[RS4GC] Treat inttoptr as base pointer This is a modified version of a patch by tolziplohu with a style change, and most importantly, a revised commit message. inttoptr for a non-integral address space is currently ill defined in the LangRef. Figuring out exactly what the dynamic semantics of such a cast would be is hard, and not yet settled. Despite that, we still need to go ahead and implement something in RS4GC for a couple of reasons. First, as a simple consistency argument. We're apparently added support for constexpr inttoptrs a while back, and even have tests which exercised them. Having a lack of constant folding trigger a crash during lowering is non-ideal. Second, and more fundementally, the optimizer is allowed to insert undefined constructs in unreachable code. At the same time, we can't assume that dynamically dead code is always pruned before lowering. As a result, we must assume that inttoptrs can occur (even if completely ill defined) along dead paths. We need the lowering to not crash. The stackmaps produced can be garbage (as the assumption is the code is dynamically dead), but the lowering itself can't crash. Differential Revision: https://reviews.llvm.org/D103492	2021-06-07 10:27:23 -07:00
Jingu Kang	a2a0ac42ab	[SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV This pass transforms loops that contain a conditional branch with induction variable. For example, it transforms left code to right code: newbound = min(n, c) while (iv < n) { while(iv < newbound) { A A if (iv < c) B B C C } } if (iv != n) { while (iv < n) { A C } } Differential Revision: https://reviews.llvm.org/D102234	2021-06-07 10:55:25 +01:00
maekawatoshiki	0a9d079931	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit `2165360003`. To fix the crash problem in legacy pass manager	2021-06-07 01:26:47 +09:00
Nikita Popov	db45746821	[LoopUnroll] Separate peeling from unrolling Loop peeling is currently performed as part of UnrollLoop(). Outside test scenarios, it is always performed with an unroll count of 1. This means that unrolling doesn't actually do anything apart from performing post-unroll simplification. When testing, it's currently possible to specify both an explicit peel count and an explicit unroll count. This doesn't perform any sensible operation and may result in miscompiles, see https://bugs.llvm.org/show_bug.cgi?id=45939. This patch moves peeling from UnrollLoop() into tryToUnrollLoop(), so that peeling does not also perform a susequent unroll. We only run the post-unroll simplifications. Specifying both an explicit peel count and unroll count is forbidden. In the future, we may want to support both (non-PGO) peeling a loop and unrolling it, but this needs to be done by first performing the peel and then recalculating unrolling heuristics on a now possibly analyzable loop. Differential Revision: https://reviews.llvm.org/D103362	2021-06-05 10:32:00 +02:00
Adam Nemet	ffde966cd9	[Matrix] Fix transpose-multiply folding if transpose has multiple uses Don't add it to FusedInsts in this case. Differential Revision: https://reviews.llvm.org/D103627	2021-06-04 10:55:03 -07:00
Philip Reames	5c0d1b2f90	[LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed. The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile. Differential Revision: https://reviews.llvm.org/D103620	2021-06-03 14:09:16 -07:00
Philip Reames	44d70d298a	[LoopUnroll] Eliminate PreserveOnlyFirst parameter [nfc] This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly. Differential Revision: https://reviews.llvm.org/D103584	2021-06-03 10:33:14 -07:00
Hamza Mahfooz	83235b07e3	[Matrix] Preserve existing fast-math flags during lowering This patch makes it so, floating-point instructions created in LowerMatrixIntrinsics retain fast-math flags from instructions that are higher up the chain. Fixes https://bugs.llvm.org/show_bug.cgi?id=49738 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D103233	2021-06-03 15:29:31 +01:00
Stephen Tozer	4316b0e59c	[LoopStrengthReduce] Ensure that debug intrinsics do not affect LSR's output During Loop Strength Reduce, if the terminating condition for the loop is not immediately adjacent to the terminating branch and it has more than one use, a clone of the condition will be created just before the terminating branch and will be used as the branch condition. Currently, whether the instructions are "immediately adjacent" is determined by checking whether the next instruction after the condition is the terminating branch; this is incorrect however, as the presence of a debug intrinsic between the two will result in a change to the output. This is fixed by using getNextNonDebugInstruction() instead. Differential Revision: https://reviews.llvm.org/D103033	2021-06-02 15:56:23 +01:00
Jingu Kang	f3a27511c9	[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch This re-enables commit `107d19eb01` with bug fixes. Differential Revision: https://reviews.llvm.org/D99354	2021-06-02 10:58:22 +01:00
Daniil Fukalov	0b34acdab7	[NFC] Fix 'Load' name masking. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103456	2021-06-02 11:09:53 +03:00
Florian Hahn	1b84acb23a	[LoopDeletion] Consider infinite loops alive, unless mustprogress. The current loop or any of its sub-loops may be infinite. Unless the function or the loops are marked as mustprogress, this in itself makes the loop not dead. This patch moves the logic to check whether the current loop is finite or mustprogress to `isLoopDead` and also extends it to check the sub-loops. This should fix PR50511. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D103382	2021-06-01 13:07:36 +01:00
Congzhe Cao	bfefde22b6	[LoopInterhcange] Handle movement of reduction phis appropriately This patch fixes pr43326 and pr48212. Currently when we move reduction phis to the right place, loop interchange assumes the first phi in loop headers is an induction phi, skips the first phi and assumes the rest of phis are candidate reduction phis to move. However, it may not always be the case. This patch loops over all phis in loop headers and considers a phi node as a candidate reduction phi to move only when it is indeed a reduction phi across outer and inner loop. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102743	2021-05-31 16:27:38 -04:00
David Green	222aeb4d51	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differntial Revision: https://reviews.llvm.org/D100464	2021-05-31 10:22:37 +01:00
Mindong Chen	71acce68da	[NFCI] Move DEBUG_TYPE definition below #includes When you try to define a new DEBUG_TYPE in a header file, DEBUG_TYPE definition defined around the #includes in files include it could result in redefinition warnings even compile errors. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D102594	2021-05-30 17:31:01 +08:00
Stefan Pintilie	0159652058	Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2)" This reverts commit `be1a23203b`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	24bd657202	Revert "[NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop" This reverts commit `b0b2bf3b5d`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	fd55331203	Revert "[NFC] Formatting fix" This reverts commit `59d938e649`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	807fc7cdc9	Revert "[NFC] Reuse existing variables instead of re-requesting successors" This reverts commit `c467585682`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	dd226803c2	Revert "[NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC" This reverts commit `7d418dadf6`.	2021-05-28 12:21:21 -05:00
eopXD	fa488ea864	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 15:43:12 +00:00
eopXD	e96d6f4821	Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass" This reverts commit `7952ddb21f`. Differential Revision: https://reviews.llvm.org/D103302	2021-05-28 07:58:06 +00:00
eopXD	7e06cf8f1b	Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass" This reverts commit `ffc4d3e068`.	2021-05-28 07:48:04 +00:00
eopXD	ffc4d3e068	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 07:25:53 +00:00
eopXD	7952ddb21f	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 07:11:26 +00:00
Max Kazantsev	6a2af607ad	Revert "[NFCI] Lazily evaluate SCEVs of PHIs" This reverts commit `51d334a845`. Reported failures, need to analyze.	2021-05-28 11:05:30 +07:00
maekawatoshiki	2165360003	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-28 01:17:23 +09:00
Max Kazantsev	7d418dadf6	[NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC	2021-05-27 15:29:37 +07:00
Max Kazantsev	c467585682	[NFC] Reuse existing variables instead of re-requesting successors	2021-05-27 15:29:37 +07:00
Max Kazantsev	51d334a845	[NFCI] Lazily evaluate SCEVs of PHIs Eager evaluation has cost of compile time. Only query them if they are required for proving predicates.	2021-05-27 13:35:31 +07:00
Max Kazantsev	59d938e649	[NFC] Formatting fix	2021-05-27 12:50:54 +07:00
Max Kazantsev	b0b2bf3b5d	[NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop	2021-05-27 12:44:22 +07:00
Yevgeny Rouban	4d26f41f76	[RS4GC] Introduce intrinsics to get base ptr and offset There can be a need for some optimizations to get (base, offset) for any GC pointer. The base can be calculated by generating needed instructions as it is done by the RewriteStatepointsForGC::findBasePointer() function. The offset can be calculated in the same way. Though to not expose the base calculation and to make the offset calculation as simple as ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal outside RS4GC, this patch introduces 2 intrinsics: @llvm.experimental.gc.get.pointer.base(%derived_ptr) @llvm.experimental.gc.get.pointer.offset(%derived_ptr) These intrinsics are inlined by RS4GC along with generation of statepoint sequences. With these new intrinsics the GC parseable lowering for atomic memcpy intrinsics (`6ec2c5e402`) could be implemented as a separate pass. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D100445	2021-05-27 09:14:14 +07:00
Philip Reames	9cc2181ec3	[unroll] Use value domain for symbolic execution based cost model The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost. By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928. Differential Revision: https://reviews.llvm.org/D102934	2021-05-26 08:41:25 -07:00
Max Kazantsev	be1a23203b	Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2) The patch was reverted due to compile time impact of contextual SCEV queries. It also appeared that it introduced a miscompile on irreducible CFG. Changes made: 1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate; 2. Irreducible CFG in live code is now detected and excluded from processing. Differential Revision: https://reviews.llvm.org/D102615	2021-05-26 19:47:14 +07:00
Max Kazantsev	0de553dce0	Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"" This reverts commit `43d2e51c2e`. Commited wrong version.	2021-05-26 19:29:07 +07:00
Max Kazantsev	43d2e51c2e	Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" The patch was reverted due to compile time impact of contextual SCEV queries. It also appeared that it introduced a miscompile on irreducible CFG. Changes made: 1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate; 2. Irreducible CFG in live code is now detected and excluded from processing. Differential Revision: https://reviews.llvm.org/D102615	2021-05-26 19:23:21 +07:00
Matt Morehouse	832c99f727	Revert "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" This reverts commit `2531fd70d1` due to performance regression on the PPC buildbot.	2021-05-25 13:58:42 -07:00
Benjamin Kramer	d2d4f16806	[Matrix] Use LLVM_DEBUG for a debug flag dump() doesn't exist in release builds. ld.lld: error: undefined symbol: llvm::Value::dump() const >>> referenced by LowerMatrixIntrinsics.cpp >>> LowerMatrixIntrinsics.o:((anonymous namespace)::LowerMatrixIntrinsics::Visit())	2021-05-25 21:10:19 +02:00
Nikita Popov	9c91614959	[CVP] Guard against poison in common phi value transform (PR50399) The common phi value transform replaces constants with values that have the same value as the constant on a given edge. However, LVI generally only provides information that is correct up to poison, so this can end up replacing a well-defined value with poison. D69442 addressed an instance of this problem by clearing poison flags on the generating instruction, which was sufficient at the time. rGa917fb89dc28 made LVI's edge value analysis slightly more powerful, and clearing poison flags is no longer sufficient. This patch changes the transform to instead explicitly guard against a poison value instead. This should be satisfied for most cases due to a prior branch on poison. Fixes https://bugs.llvm.org/show_bug.cgi?id=50399. Differential Revision: https://reviews.llvm.org/D102966	2021-05-25 20:47:17 +02:00
Adam Nemet	dfd1bbd00a	[Matrix] Factor and distribute transposes across multiplies Now that we can fold some transposes into multiplies (CM: A * B^t and RM: A^t * B), we want to move them around to create the optimal expressions: * fold away double transposes while still using them to assert the shape * sink transposes hoping they cancel out * lift transposes when both operands are transposed This also modifies the matrix remarks to include the number of exposed transposes (i.e. transposes that we couldn't fold into a multiply). The adjustment to the test remarks-inlining is a bit subtle: I am changing the double transpose to a single transpose so that we don't remove it completely. More importantly this changes some of the total instruction count, most notable stores because we can no longer use a vector store. Differential Revision: https://reviews.llvm.org/D102733	2021-05-25 11:12:20 -07:00
Roman Lebedev	149e018d12	[LoopIdiom] 'arithmetic right-shift until zero': don't turn potentially infinite loops into finite ones Nowadays LLVM does not assume that all loops are finite, so if we want to produce a finite loop from a potentially-infinite one, we must ensure that the original loop is known to be a finite one. For this transform, it only matters for arithmetic right-shifts. For them, either the function or the loop must be known to be `mustprogress`, or the original value being shifted must be known to be non-negative (because iff the sign bit was set, it will never become zero, but will become `-1` in the "end"). It would be really good for alive2 to actually complain about this, but it currently does not: https://github.com/AliveToolkit/alive2/issues/726	2021-05-25 21:02:28 +03:00
Roman Lebedev	8f4db14d1c	[LoopIdiom] Support 'left-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countBits(unsigned val) { int cnt = 0; for( ; (val << cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val << (cnt + off); cnt++) ; return cnt; } ``` alive2 is happy with all the tests there. Note that, again, much like with the right-shift cases, we don't require the `val != 0` guard. This is the last pattern that was supported by `detectShiftUntilZeroIdiom()`, which now becomes obsolete.	2021-05-25 15:26:35 +03:00
Roman Lebedev	f1c5f78d38	[LoopIdiom] Support 'arithmetic right-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(signed val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countActiveBits(signed val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` This directly matches the existing 'logical right-shift until zero' idiom. alive2 is happy with all the tests there. Note that, again, much like with the original unsigned case, we don't require the `val != 0` guard. The old `detectShiftUntilZeroIdiom()` already supports this pattern, the idea here is that the `val` must be positive (have at least one leading zero), because otherwise the loop is non-terminating, but since it is not `while(1)`, that would have been UB.	2021-05-25 14:30:49 +03:00
Alexey Lapshin	10c2e26159	[TRE] Reland: allow TRE for non-capturing calls. The D82085 "allow TRE for non-capturing calls" caused failure during bootstrap. This patch does the same as D82085 plus fixes bootstrap error. The problem with D82085 is that it does not create copies for byval operands, while replacing function call with a branch. Consider following example: ``` int zoo ( S p1 ); int foo ( int count, S p1 ) { if ( count > 10 ) return zoo(p1); // temporarily variable created for passing byvalue parameter // p1 could be used when zoo(p1) is called(after TRE is done). // lifetime.start p1.byvalue.temp return foo(count+1, p1); // lifetime.end p1.byvalue.temp } ``` After recursive call to foo is replaced with a jump into start of the function, its parameters could be passed to zoo function. i.e. temporarily variable created for byvalue parameter "p1" could be passed to zoo. Finally zoo receives broken operand: ``` int foo ( int count, S p1 ) { :tailrecurse p1_tr = phi p1, p1.byvalue.temp if ( count > 10 ) return zoo(p1_tr); // temporarily variable created for passing byvalue parameter // p1 could be used when zoo(p1) is called(after TRE is done). lifetime.start p1.byvalue.temp memcpy (p1.byvalue.temp, p1_tr) count = count + 1 lifetime.end p1.byvalue.temp br tailrecurse } ``` To prevent using p1.byvalue.temp after its scope finished by lifetime.end marker this patch copies value from p1.byvalue.temp into another temporarily variable and then copies this variable into the input parameter for next iteration. This patch passes bootstrap build and bootstrap build with AddressSanitizer. Differential Revision: https://reviews.llvm.org/D85614	2021-05-25 11:35:48 +03:00
Max Kazantsev	2531fd70d1	[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration This patch handles one particular case of one-iteration loops for which SCEV cannot straightforwardly prove BECount = 1. The idea of the optimization is to symbolically execute conditional branches on the 1st iteration, moving in topoligical order, and only visiting blocks that may be reached on the first iteration. If we find out that we never reach header via the latch, then the backedge can be broken. Differential Revision: https://reviews.llvm.org/D102615 Reviewed By: reames	2021-05-25 12:43:31 +07:00
maekawatoshiki	e77d24f70a	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit `d65c32fb41`.	2021-05-25 11:39:49 +09:00
Jon Roelofs	095e91c973	[Remarks] Add analysis remarks for memset/memcpy/memmove lengths Re-landing now that the crasher this patch previously uncovered has been fixed in: https://reviews.llvm.org/D102935 Differential revision: https://reviews.llvm.org/D102452	2021-05-24 10:10:44 -07:00
Roman Lebedev	32bee42719	[NFCI][LoopIdiom] 'left-shift until bittest': assert that BaseX is loop-invariant Given that BaseX is an incoming value when coming from the preheader, it should be loop-invariant, but let's just document this assumption.	2021-05-24 12:15:06 +03:00
Roman Lebedev	aa3dac95ed	[LoopIdiom] 'logical right shift until zero': the value must be loop-invariant As per the reproducer provided by Mikael Holmén in post-commit review.	2021-05-24 12:15:06 +03:00
maekawatoshiki	d65c32fb41	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-23 22:32:01 +09:00
Florian Hahn	a6de8d95db	[Matrix] Bail out early if there are no matrix intrinsics. If there are no matrix intrinsics in a function, we can directly bail out, as there's nothing left to do. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D102931	2021-05-22 11:37:25 +01:00
Florian Hahn	a0ce6439ca	[Matrix] Remove unused matrix-propagate-shape option. The option was used during the initial bringup, but it does not add any value at this point. Remove it. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D102930	2021-05-21 19:01:54 +01:00
maekawatoshiki	fd53cb4148	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit `cea7a3fe3d`. To investigate sanitizer-x86_64-linux-fast failure.	2021-05-22 01:40:43 +09:00
maekawatoshiki	cea7a3fe3d	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-21 23:57:39 +09:00
Djordje Todorovic	cd49b3ae1a	[DebugInfo] Salvage dbg.value() during ADCE This has been found by using the [0]. [0] https://llvm.org/docs/HowToUpdateDebugInfo.html#\ test-original-debug-info-preservation-in-optimizations Differential Revision: https://reviews.llvm.org/D100844	2021-05-21 05:25:59 -07:00
Jon Roelofs	0af3105b64	Revert "[Remarks] Add analysis remarks for memset/memcpy/memmove lengths" This reverts commit `4bf69fb52b`. This broke spec2k6/403.gcc under -global-isel. Details to follow once I've reduced the problem.	2021-05-20 12:19:16 -07:00
Kevin P. Neal	f21f1eea05	[FPEnv] EarlyCSE support for constrained intrinsics, default FP environment edition EarlyCSE cannot distinguish between floating point instructions and constrained floating point intrinsics that are marked as running in the default FP environment. Said intrinsics are supposed to behave exactly the same as the regular FP instructions. Teach EarlyCSE to handle them in that case. Differential Revision: https://reviews.llvm.org/D99962	2021-05-20 14:40:51 -04:00
Jon Roelofs	4bf69fb52b	[Remarks] Add analysis remarks for memset/memcpy/memmove lengths Differential revision: https://reviews.llvm.org/D102452	2021-05-19 15:09:18 -07:00
Roman Lebedev	729e18cbf4	[NFCI] SimplifyCFGPass: mergeEmptyReturnBlocks(): use DeleteDeadBlocks() In this case, it does the same thing as the original pattern does. SimplifyCFG has a few lurking miscompilations about deleting blocks that have their address taken, and consistently using DeleteDeadBlocks() instead of a hand-rolled pattern will allow to weed those cases out easierly.	2021-05-19 11:32:24 +03:00
Arthur Eubanks	6b9524a05b	[NewPM] Don't mark AA analyses as preserved Currently all AA analyses marked as preserved are stateless, not taking into account their dependent analyses. So there's no need to mark them as preserved, they won't be invalidated unless their analyses are. SCEVAAResults was the one exception to this, it was treated like a typical analysis result. Make it like the others and don't invalidate unless SCEV is invalidated. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D102032	2021-05-18 13:49:03 -07:00
Nikita Popov	e81334a754	[LICM] Remove MaybePromotable set (PR50367) The MaybePromotable set keeps track of loads/stores for which promotion was not attempted yet. Normally, any load/stores that are promoted in the current iteration will be removed from this set, because they naturally MustAlias with the promoted value. However, if the source program has UB with metadata claiming that a store is NoAlias, while it is actually MustAlias, and multiple different pointers are promoted in the same iteration, it can happen that a store is removed that is still in the MaybePromotable set, causing a use-after-free. While this could be fixed by explicitly invalidating values in MaybePromotable in the LoopPromoter, I'm going with the more radical option of dropping the set entirely here and check all load/stores on each promotion iteration. As promotion, and especially repeated promotion, are quite rare, this doesn't seem to have any impact on compile-time. Fixes https://bugs.llvm.org/show_bug.cgi?id=50367.	2021-05-18 20:26:01 +02:00
Adam Nemet	ab1f6ffa56	[GVN] Improve analysis for missed optimization remark This change tries to handle multiple dominating users of the pointer operand by choosing the most immediately dominating one, if possible. While making this change I also found that the previous implementation had a missing break statement, making all loads with an odd number of dominating users emit an OtherAccess value, so that has also been fixed. Patch by Henrik G Olsson! Differential Revision: https://reviews.llvm.org/D79097	2021-05-17 21:51:15 -07:00
Adam Nemet	fcffd087c6	[Matrix] Fold the transpose into the matmul operand used to fetch scalars For column-major this is: A * B^t whereas for row-major: A^t * B Differential Revision: https://reviews.llvm.org/D101762	2021-05-17 17:40:46 -07:00
Roman Lebedev	0633d5ce7b	[LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition. I think i've added exhaustive test coverage, and i have verified that alive2 is happy with all the tests, so in principle i'm fine with landing this without review, but just in case.. This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(unsigned val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one, since that is what i need: ``` int countActiveBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` I've followed in footstep of 'left-shift until bittest' idiom (D91038), in the sense that iff the `ctlz` intrinsic is cheap, we'll transform, regardless of all other factors. This can have a shocking effect on certain benchmarks: ``` raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm 2021-05-09T01:06:05+03:00 Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench Run on (32 X 3600.24 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 5.26, 6.29, 3.49 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ p1319978.orf/threads:32/process_time/real_time_mean 145 ms 145 ms 128 0.145319 0.999981 10.1568M 69.8949M 69.8936M 6.88159 6.88146 0.145322 p1319978.orf/threads:32/process_time/real_time_median 145 ms 145 ms 128 0.145317 0.999986 10.1568M 69.8941M 69.8931M 6.88151 6.88141 0.145319 p1319978.orf/threads:32/process_time/real_time_stddev 0.766 ms 0.766 ms 128 766.586u 15.1302u 0 354.167k 354.098k 0.0348699 0.0348631 766.469u RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0 2021-05-09T01:06:24+03:00 Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Run on (32 X 3599.95 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 4.05, 5.95, 3.43 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ p1319978.orf/threads:32/process_time/real_time_mean 99.8 ms 99.8 ms 128 0.0997758 0.999972 10.1568M 101.797M 101.794M 10.0225 10.0222 0.0997786 p1319978.orf/threads:32/process_time/real_time_median 99.7 ms 99.7 ms 128 0.0997165 0.999985 10.1568M 101.857M 101.854M 10.0284 10.0281 0.0997195 p1319978.orf/threads:32/process_time/real_time_stddev 0.224 ms 0.224 ms 128 224.166u 34.345u 0 226.81k 227.231k 0.0223309 0.0223723 224.586u Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------- p1319978.orf/threads:32/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128 p1319978.orf/threads:32/process_time/real_time_mean -0.3134 -0.3134 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_median -0.3138 -0.3138 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_stddev -0.7073 -0.7078 1 0 1 0 ``` Reviewed By: craig.topper, zhuhan0 Differential Revision: https://reviews.llvm.org/D102116	2021-05-17 20:33:33 +03:00
Nikita Popov	fb9ed1979a	[IR] Add BasicBlock::isEntryBlock() (NFC) This is a recurring and somewhat awkward pattern. Add a helper method for it.	2021-05-15 12:41:58 +02:00
Vitaly Buka	6ce7b2f026	Fix "is not used" warning	2021-05-14 20:58:58 -07:00
Philip Reames	fcd12fed41	Extract a helper routine to simplify D91481 [NFC]	2021-05-14 18:40:23 -07:00
Nick Desaulniers	8c72749bd9	[LowerConstantIntrinsics] reuse isManifestLogic from ConstantFolding GlobalVariables are Constants, yet should not unconditionally be considered true for __builtin_constant_p. Via the LangRef https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic: This intrinsic generates no code. If its argument is known to be a manifest compile-time constant value, then the intrinsic will be converted to a constant true value. Otherwise, it will be converted to a constant false value. In particular, note that if the argument is a constant expression which refers to a global (the address of which _is_ a constant, but not manifest during the compile), then the intrinsic evaluates to false. Move isManifestConstant from ConstantFolding to be a method of Constant so that we can reuse the same logic in LowerConstantIntrinsics. pr/41459 Reviewed By: rsmith, george.burgess.iv Differential Revision: https://reviews.llvm.org/D102367	2021-05-14 15:35:21 -07:00
Philip Reames	3f1c218318	[rs4gc] Strip memory related attributes consistently I noticed that rs4gc is not stripping a number of memory aliasing related attributes. We do strip some from call sites, but don't strip the same ones from declarations or parameters. Why do we need to strip these? Two answers: Safepoints conceptually read and write to the entire garbage collected heap in the physical model. We need this to preserve ordering of all loads and stores with respect to possible relocation. We can infer other attributes from these. For instance, readnone can imply both nofree and nosync. Both of which don't hold after physical rewriting. Note: This exposed a latent issue which was fixed a couple weeks back in `01801d5274`. Differential Revision: https://reviews.llvm.org/D99802	2021-05-14 07:54:56 -07:00
dfukalov	fdae3fc8b3	[GVN] Clobber partially aliased loads. Use offsets stored in `AliasResult` implemented in D98718. Updated with fix of issue reported in https://reviews.llvm.org/D95543#2745161 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95543	2021-05-14 11:17:14 +03:00
David Green	f7cb654763	[DSE] Move isOverwrite into DSEState. NFC This moves the isOverwrite function into the DSEState so that it can share the analyses and members from the state. A few extra loop tests were also added to test stores in and around multi block loops for D100464.	2021-05-14 09:16:51 +01:00
Jingu Kang	107d19eb01	Revert "[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch" This reverts commit `88b259c014`. It needs to fix below bugs. https://bugs.llvm.org/show_bug.cgi?id=50279 https://bugs.llvm.org/show_bug.cgi?id=50302	2021-05-13 08:40:49 +01:00
Stelios Ioannou	1124ad2f5d	[LoopFlatten] Simplify loops so that the pass can operate on unsimplified loops. The loop flattening pass requires loops to be in simplified form. If the loops are not in simplified form, the pass cannot operate. This patch simplifies all loops before flattening. As a result, all loops will be simplified regardless of whether anything ends up being flattened. This change was inspired by observing a certain loop that was not flatten because the loops were not in simplified form. This loop is added as a test to verify that it is now flattened. Differential Revision: https://reviews.llvm.org/D102249 Change-Id: I45bcabe70fb99b0d89f0effafc82eb9e0585ec30	2021-05-12 19:22:01 +01:00
Congzhe Cao	3f8be15f29	[LoopInterchange] Handle lcssa PHIs with multiple predecessors This is a bugfix in the transformation phase. If the original outer loop header branches to both the inner loop (header) and the outer loop latch, and if there is an lcssa PHI node outside the loop nest, then after interchange the new outer latch will have an lcssa PHI node inserted which has two predecessors, i.e., the original outer header and the original outer latch. Currently the transformation assumes it has only one predecessor (the original outer latch) and crashes, since the inserted lcssa PHI node does not take both predecessors as incoming BBs. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D100792	2021-05-11 21:30:54 -04:00
Jordan Rupprecht	fec2945998	Revert "[GVN] Clobber partially aliased loads." This reverts commit `6c57044231`. It causes assertion errors due to widening atomic loads, and potentially causes miscompile elsewhere too. Repro, also posted to D95543: ``` $ cat repro.ll ; ModuleID = 'repro.ll' source_filename = "repro.ll" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" %struct.widget = type { i32 } %struct.baz = type { i32, %struct.snork } %struct.snork = type { %struct.spam } %struct.spam = type { i32, i32 } @global = external local_unnamed_addr global %struct.widget, align 4 @global.1 = external local_unnamed_addr global i8, align 1 @global.2 = external local_unnamed_addr global i32, align 4 define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 { bb: %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1 %tmp1 = bitcast %struct.snork* %tmp to i64* %tmp2 = load i64, i64* %tmp1, align 4 %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1 %tmp4 = icmp ugt i64 %tmp2, 4294967295 br label %bb5 bb5: ; preds = %bb14, %bb %tmp6 = load i32, i32* %tmp3, align 4 %tmp7 = icmp ne i32 %tmp6, 0 %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false %tmp9 = zext i1 %tmp8 to i8 store i8 %tmp9, i8* @global.1, align 1 %tmp10 = load i32, i32* @global.2, align 4 switch i32 %tmp10, label %bb11 [ i32 1, label %bb12 i32 2, label %bb12 ] bb11: ; preds = %bb5 br label %bb14 bb12: ; preds = %bb5, %bb5 %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4 br label %bb14 bb14: ; preds = %bb12, %bb11 br label %bb5 } $ opt -O2 repro.ll -disable-output opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst , unsigned int, llvm::Type , llvm::Instruction , const llvm::DataLayout &): Assertion `SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. Stack dump: 0. Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output ... ```	2021-05-11 16:08:53 -07:00
Congzhe Cao	40e3aa39bd	[LoopInterchange] Fix legality for triangular loops This is a bug fix in legality check. When we encounter triangular loops such as the following form: for (int i = 0; i < m; i++) for (int j = 0; j < i; j++), or for (int i = 0; i < m; i++) for (int j = 0; j*i < n; j++), we should not perform interchange since the number of executions of the loop body will be different before and after interchange, resulting in incorrect results. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D101305	2021-05-11 18:36:53 -04:00
Congzhe Cao	d3f89d4d16	Revert "[LoopInterchange] Fix legality for triangular loops" This reverts commit `29342291d2`. The test case requires an assert build. Will add REQUIRES and re-commit.	2021-05-11 18:10:58 -04:00
Congzhe Cao	29342291d2	[LoopInterchange] Fix legality for triangular loops This is a bug fix in legality check. When we encounter triangular loops such as the following form: for (int i = 0; i < m; i++) for (int j = 0; j < i; j++), or for (int i = 0; i < m; i++) for (int j = 0; j*i < n; j++), we should not perform interchange since the number of executions of the loop body will be different before and after interchange, resulting in incorrect results. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D101305	2021-05-11 11:00:46 -04:00
Roman Lebedev	1acd9a1a29	Revert "[LICM] Hoist loads with invariant.group metadata" This appears to miscompile google benchmark's GetCacheSizesFromKVFS() when compiling with -fstrict-vtable-pointers. Runnable reproducer: https://godbolt.org/z/f9ovKqTzb The "f.fail()" crashes with BUS error, it is compiled into testb, and the adress it is testing is non-sensical. This reverts commit `4c89bcadf6`.	2021-05-08 15:44:49 +03:00
Arthur Eubanks	34a8a437bf	[NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose Printing pass manager invocations is fairly verbose and not super useful. This allows us to remove DebugLogging from pass managers and PassBuilder since all logging (aside from analysis managers) goes through instrumentation now. This has the downside of never being able to print the top level pass manager via instrumentation, but that seems like a minor downside. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D101797	2021-05-07 21:51:47 -07:00
Han Zhu	da1cdffbb1	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-05-04 17:05:04 -07:00
Philip Reames	11326cbcdb	[IndVarSimplify][NFC] Removed mayThrow from if-condition in predicateLoopExits of IndVarSimplify Instruction has mayHaveSideEffects method that returns true if mayThrow return true because this is called internally in the first method. As such, the call being removed is redundant. Patch By: vdsered (Daniil Seredkin) Differential Revision: https://reviews.llvm.org/D101685	2021-05-03 18:25:07 -07:00
Nikita Popov	ffa5a402a9	[IndVars] Remove redundant loop invariance check (NFC) This is checked again directly below this condition.	2021-05-01 17:22:00 +02:00
Jingu Kang	88b259c014	[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch Differential Revision: https://reviews.llvm.org/D99354	2021-04-30 15:55:56 +01:00
Evgeniy Brevnov	7861cb600c	[NARY] Don't optimize min/max if there are side uses (part2) Previous attempt to fix infinite recursion in min/max reassociation was not fully successful (D100170). Newly discovered failing case is due to not properly handled when there is a single use. It should be processed separately from 2 uses case. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D101359	2021-04-30 19:02:02 +07:00
Tres Popp	efce19c3b0	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `75d6b8bb40`. The reasoning is mentioned in https://reviews.llvm.org/D97667	2021-04-28 13:16:34 +02:00
Dávid Bolvanský	e81819377e	[DSE] Eliminate zero memset after calloc Solves PR11896 As noted, this can be improved futher (calloc -> malloc) in some cases. But for know, this is the first step. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101391	2021-04-28 03:30:52 +02:00
Han Zhu	75d6b8bb40	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-04-27 17:37:51 -07:00
Han Zhu	cd13f19031	[loop-idiom][NFC] Extract processLoopStoreOfLoopLoad into a helper function Differential Revision: https://reviews.llvm.org/D100979	2021-04-27 13:42:30 -07:00
Sanjay Patel	e808289fe6	[IndVars] avoid crash in LFTR when assuming an add recurrence The test is a crasher reduced from: https://llvm.org/PR49993 linearFunctionTestReplace() assumes that we have an add recurrence, so check for that as a condition of matching a loop counter. Differential Revision: https://reviews.llvm.org/D101291	2021-04-27 08:26:02 -04:00
Vitaly Buka	f2a585e6d3	[NFC] Fix "not used" warning	2021-04-26 22:09:23 -07:00
William S. Moses	7aa3cad46a	[NVPTX] Enable lowering of atomics on local memory LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store. Differential Revision: https://reviews.llvm.org/D98650	2021-04-26 20:12:12 -04:00
Hongtao Yu	30bb5be389	[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2. As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudu probe instrumention. The optimizations unblocked are: - In-block load propagation. - In-block dead store elimination - Memory copy optimization that turns stores to consecutive memories into a memset. These optimizations are local to a block, so they shouldn't affect the profile quality. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D100075	2021-04-26 16:52:33 -07:00
Fangrui Song	18839be9c5	[ADT] Remove StatisticBase and make NoopStatistic empty In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic` but has 3 mostly unused pointers. GlobalOpt considers that the pointers can potentially retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage 2 clang. This patch makes `NoopStatistic` empty and thus reclaims the wasted space. The clang size is even smaller than applying D69428 (slightly smaller in both .bss and .text). ``` # This means the D69428 optimization on clang is mostly nullified by this patch. HEAD+D69428: size(.bss) = 0x0725a8 HEAD+D101211: size(.bss) = 0x072238 # bloaty - HEAD+D69428 vs HEAD+D101211 # With D101211, we also save a lot of string table space (.rodata). FILE SIZE VM SIZE -------------- -------------- -0.0% -32 -0.0% -24 .eh_frame -0.0% -336 [ = ] 0 .symtab -0.0% -360 [ = ] 0 .strtab [ = ] 0 -0.2% -880 .bss -0.0% -2.11Ki -0.0% -2.11Ki .rodata -0.0% -2.89Ki -0.0% -2.89Ki .text -0.0% -5.71Ki -0.0% -5.88Ki TOTAL ``` Note: LoopFuse is a disabled pass. For now this patch adds `#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds. If these `OptimizationRemarkMissed` are useful in LLVM_ENABLE_STATS==0 builds, we can replace `llvm::Statistic` with `llvm::TrackingStatistic`, or use a different abstraction to keep track of the strings. Similarly, skip the code in `mlir/lib/Pass/PassStatistics.cpp` which calls `getName`/`getDesc`/`getValue`. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D101211	2021-04-26 16:47:32 -07:00
William S. Moses	8ede96493c	Revert "[NVPTX] Enable lowering of atomics on local memory" This reverts commit `fede99d386`.	2021-04-26 19:33:01 -04:00
William S. Moses	fede99d386	[NVPTX] Enable lowering of atomics on local memory LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store. Differential Revision: https://reviews.llvm.org/D98650	2021-04-26 19:27:27 -04:00
Lei Zhang	254e289d45	Revert "[ADT] Remove StatisticBase and make NoopStatistic empty" This reverts commit `b540311781` because it breaks MLIR build: https://buildkite.com/mlir/mlir-core/builds/13299#ad0f8901-dfa4-43cf-81b8-7940e2c6c15b	2021-04-26 18:31:04 -04:00
Fangrui Song	b540311781	[ADT] Remove StatisticBase and make NoopStatistic empty In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic` but has 3 unused pointers. GlobalOpt considers that the pointers can potentially retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage 2 clang. This patch makes `NoopStatistic` empty and thus reclaims the wasted space. The clang size is even smaller than applying D69428 (slightly smaller in both .bss and .text). ``` # This means the D69428 optimization on clang is mostly nullified by this patch. HEAD+D69428: size(.bss) = 0x0725a8 HEAD+D101211: size(.bss) = 0x072238 # bloaty - HEAD+D69428 vs HEAD+D101211 # With D101211, we also save a lot of string table space (.rodata). FILE SIZE VM SIZE -------------- -------------- -0.0% -32 -0.0% -24 .eh_frame -0.0% -336 [ = ] 0 .symtab -0.0% -360 [ = ] 0 .strtab [ = ] 0 -0.2% -880 .bss -0.0% -2.11Ki -0.0% -2.11Ki .rodata -0.0% -2.89Ki -0.0% -2.89Ki .text -0.0% -5.71Ki -0.0% -5.88Ki TOTAL ``` Note: LoopFuse is a disabled pass. This patch adds `#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds. If these `OptimizationRemarkMissed` are useful and not noisy, we can replace `llvm::Statistic` with `llvm::TrackingStatistic` in the future. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D101211	2021-04-26 13:39:35 -07:00
Yuanbo Li	cc7803ee3f	[LSR][DebugInfo] Don't unnecessarily drop DebugLocs When transforming a loop terminating condition into a "max" comparison, the DebugLoc from the old condition should be set on the newly created comparison. They are the same operation, just optimized. Fixes PR48067. Differential Revision: https://reviews.llvm.org/D98218	2021-04-26 13:14:42 +01:00
Florian Hahn	fa2f162e76	[NewGVN] Properly transfer PredDep in move constructor.	2021-04-25 11:22:59 +01:00
Florian Hahn	1d8ef761be	[NewGVN] Use ExprResult to add extra predicate users. This patch updates performSymbolicPredicateInfoEvaluation to manage registering additional dependencies using ExprResult. Similar to D99987, this fixes an issues where we failed to track the correct dependency for a phi-of-ops value, which is marked as temporary. Fixes PR49873. Reviewed By: asbirlea, ruiling Differential Revision: https://reviews.llvm.org/D100560	2021-04-25 11:13:32 +01:00
Florian Hahn	1cc5946cc8	[NewGVN] Use performSymbolicEvaluation instead of createExpression. performSymbolicEvaluation is used to obtain the symbolic expression when visiting instructions and this is used to determine their congruence class. performSymbolicEvaluation only creates expressions for certain instructions (via createExpression). For unsupported instructions, 'unknown' expression are created. The use of createExpression in processOutgoingEdges means we may simplify the condition in processOutgoingEdges to a constant in the initial round of processing, but we use Unknown(I) for the congruence class. If an operand of I changes the expression Unknown(I) stays the same, so there is no update of the congruence class of I. Hence it won't get re-visited. So if an operand of I changes in a way that causes createExpression to return different result, this update is missed. This patch updates the code to use performSymbolicEvaluation, to be symmetric with the congruence class updating code. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D99990	2021-04-24 18:49:07 +01:00
dfukalov	6c57044231	[GVN] Clobber partially aliased loads. Use offsets stored in `AliasResult` implemented in D98718. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95543	2021-04-24 14:14:20 +03:00
Sander de Smalen	f9a50f04ba	[TTI] NFC: Change getIntImmCost[Inst\|Intrin] to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100565	2021-04-23 16:06:36 +01:00
Sander de Smalen	43ace8b5ce	[TTI] NFC: Change getScalingFactorCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100564	2021-04-23 16:06:36 +01:00
Timm Bäder	e60d6e91e1	[llvm][NFC] Fix assert indentation This triggers GCC's misleading-indentation checker.	2021-04-23 14:44:05 +02:00
Florian Hahn	2b15262f89	Recommit "[NewGVN] Track simplification dependencies for phi-of-ops." This recommits `4f5da356ff`, including explicit implementations of move a constructor and deleted copy constructors/assignment operators, to fix failures with some compilers. This reverts the revert `74854d00e8`.	2021-04-23 11:27:43 +01:00
Florian Hahn	74854d00e8	Revert "[NewGVN] Track simplification dependencies for phi-of-ops." This reverts commit `4f5da356ff`. This causes some buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/139/builds/3019	2021-04-23 09:56:17 +01:00
Florian Hahn	4f5da356ff	[NewGVN] Track simplification dependencies for phi-of-ops. If we are using a simplified value, we need to add an extra dependency this value , because changes to the class of the simplified value may require us to invalidate any decision based on that value. This is done by adding such values as additional users, however the current code does not excludes temporary instructions. At the moment, this means that we miss those dependencies for phi-of-ops, because they are temporary instructions at this point. We instead need to add the extra dependencies to the root instruction of the phi-of-ops. This patch pushes the responsibility of adding extra users to the callers of createExpression & performSymbolicEvaluation. At those points, it is clearer which real instruction to pick. Alternatively we could either pass the 'real' instruction as additional argument or use another map, but I think the approach in the patch makes things a bit easier to follow. Fixes PR35074. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D99987	2021-04-23 09:48:38 +01:00
KAWASHIMA Takahiro	d9a9c992d1	[LoopReroll] Fix rerolling loop with extra instructions Fixes PR47627 This fix suppresses rerolling a loop which has an unrerollable instruction. Sample IR for the explanation below: ``` define void @foo([2 x i32]* nocapture %a) { entry: br label %loop loop: ; base instruction %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ] ; unrerollable instructions %stptrx = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %indvar, i64 0 store i32 999, i32* %stptrx, align 4 ; extra simple arithmetic operations, used by root instructions %plus20 = add nuw nsw i64 %indvar, 20 %plus10 = add nuw nsw i64 %indvar, 10 ; root instruction 0 %ldptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 0 %value0 = load i32, i32* %ldptr0, align 4 %stptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 0 store i32 %value0, i32* %stptr0, align 4 ; root instruction 1 %ldptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 1 %value1 = load i32, i32* %ldptr1, align 4 %stptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 1 store i32 %value1, i32* %stptr1, align 4 ; loop-increment and latch %indvar.next = add nuw nsw i64 %indvar, 1 %exitcond = icmp eq i64 %indvar.next, 5 br i1 %exitcond, label %exit, label %loop exit: ret void } ``` In the loop rerolling pass, `%indvar` and `%indvar.next` are appended to the `LoopIncs` vector in the `LoopReroll::DAGRootTracker::findRoots` function. Before this fix, two instructions with `unrerollable instructions` comment above are marked as `IL_All` at the end of the `LoopReroll::DAGRootTracker::collectUsedInstructions` function, as well as instructions with `extra simple arithmetic operations` comment and `loop-increment and latch` comment. It is incorrect because `IL_All` means that the instruction should be executed in all iterations of the rerolled loop but the `store` instruction should not. This fix rejects instructions which may have side effects and don't belong to def-use chains of any root instructions and reductions. See https://bugs.llvm.org/show_bug.cgi?id=47627 for more information.	2021-04-23 15:14:46 +09:00
Max Kazantsev	8fe62b7af1	[GVN] Introduce loop load PRE This patch allows PRE of the following type of loads: ``` preheader: br label %loop loop: br i1 ..., label %merge, label %clobber clobber: call foo() // Clobbers %p br label %merge merge: ... br i1 ..., label %loop, label %exit ``` Into ``` preheader: %x0 = load %p br label %loop loop: %x.pre = phi(x0, x2) br i1 ..., label %merge, label %clobber clobber: call foo() // Clobbers %p %x1 = load %p br label %merge merge: x2 = phi(x.pre, x1) ... br i1 ..., label %loop, label %exit ``` So instead of loading from %p on every iteration, we load only when the actual clobber happens. The typical pattern which it is trying to address is: hot loop, with all code inlined and provably having no side effects, and some side-effecting calls on cold path. The worst overhead from it is, if we always take clobber block, we make 1 more load overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken at least once, the transform is neutral or profitable. There are several improvements prospect open up: - We can sometimes be smarter in loop-exiting blocks via split of critical edges; - If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that we don't know if their sum is colder than the header. Differential Revision: https://reviews.llvm.org/D99926 Reviewed By: reames	2021-04-22 12:50:38 +07:00
Olle Fredriksson	f5446b769a	[MemCpyOpt] Allow variable lengths in memcpy optimizer This makes the memcpy-memcpy and memcpy-memset optimizations work for variable sizes as long as they are equal, relaxing the old restriction that they are constant integers. If they're not equal, the old requirement that they are constant integers with certain size restrictions is used. The implementation works by pushing the length tests further down in the code, which reveals some places where it's enough that the lengths are equal (but not necessarily constant). Differential Revision: https://reviews.llvm.org/D100870	2021-04-21 23:23:38 +02:00
Philip Reames	01801d5274	[rs4gc] Fix a latent bug around attribute stripping for intrinsics This change fixes a latent bug which was exposed by a change currently in review (https://reviews.llvm.org/D99802#2685032). The story on this is a bit involved. Without this change, what ended up happening with the pending review was that we'd strip attributes off intrinsics, and then selectiondag would fail to lower the intrinsic. Why? Because the lowering of the intrinsic relies on the presence of the readonly attribute. We don't have a matcher to select the case where there's a glue node needed. Now, on the surface, this still seems like a codegen bug. However, here it gets fun. I was unable to reproduce this with a standalone test at all, and was pretty much struck until skatkov provided the critical detail. This reproduces only when RS4GC and codegen are run in the same process and context. Why? Because it turns out we can't roundtrip the stripped attribute through serialized IR! We'll happily print out the missing attribute, but when we parse it back, the auto-upgrade logic has a side effect of blindly overwriting attributes on intrinsics with those specified in Intrinsics.td. This makes it impossible to exercise SelectionDAG from a standalone test case. At this point, I decided to treat this an RS4GC bug as a) we don't need to strip in this case, and b) I could write a test which shows the correct behavior to ensure this doesn't break again in the future. As an aside, I'd originally set out to handle libfuncs too - since in theory they might have the same issues - but backed away quickly when I realized how the semantics of builtin, nobuiltin, and no-builtin-x all interacted. I'm utterly convinced that no part of the optimizer handles that correctly, and decided not to open that can of worms here.	2021-04-19 13:14:07 -07:00
Nikita Popov	d440f9a326	[LICM] Make capture check more precise During store promotion, we check whether the pointer was captured to exclude potential reads from other threads. However, we're only interested in captures before or inside the loop. Check this using PointerMayBeCapturedBefore against the loop header. Differential Revision: https://reviews.llvm.org/D100706	2021-04-19 20:34:23 +02:00
Evgeniy Brevnov	35e95c6817	[CVP] processCallSite returns wrong status Recently processMinMaxIntrinsic has been added and we started to observe a number of analysis get invalidated after CVP. The problem is CVP conservatively returns 'true' even if there were no modifications to IR. I found one more place besides processMinMaxIntrinsic which has the same problem. I think processMinMaxIntrinsic and similar should better have boolean return status to prevent similar issue reappear in future. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D100538	2021-04-19 12:13:22 +07:00
Serge Guelton	d6de1e1a71	Normalize interaction with boolean attributes Such attributes can either be unset, or set to "true" or "false" (as string). throughout the codebase, this led to inelegant checks ranging from if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") to if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") Introduce a getValueAsBool that normalize the check, with the following behavior: no attributes or attribute set to "false" => return false attribute set to "true" => return true Differential Revision: https://reviews.llvm.org/D99299	2021-04-17 08:17:33 +02:00
Marcythm	f8cf3b9931	[LICM][NFC] Fix typo fixed some typos which may lead to misunderstandings in LICM.cpp Reviewed By: nikic, asbirlea Differential Revision: https://reviews.llvm.org/D100470	2021-04-16 09:42:00 +08:00
Arthur Eubanks	9c776c2fa2	[NFC][NewPM] Remove some AnalysisManager invalidate methods These were misleading, they're more of a "clear" than an "invalidate". We shouldn't be individually clearing analysis results. Either we clear all analyses when some IR becomes invalid, or we properly go through invalidation. There was only one use of this, which can be simulated with AM.invalidate(F, PA). Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D100519	2021-04-15 16:51:26 -07:00
Stelios Ioannou	bf147c4653	[LSR] Fix for pre-indexed generated constant offset This patch changed the isLegalUse check to ensure that LSRInstance::GenerateConstantOffsetsImpl generates an offset that results in a legal addressing mode and formula. The check is changed to look similar to the assert check used for illegal formulas. Differential Revision: https://reviews.llvm.org/D100383 Change-Id: Iffb9e32d59df96b8f072c00f6c339108159a009a	2021-04-15 16:44:42 +01:00
Florian Hahn	5a3ff24b12	[NewGVN] Add phi-of-ops operands if no real PHI is created. If the PHI-of-ops simplifies to an existing value, no real PHI is created, which means the dependencies between the PHI-of-ops and its operands is not materialized in IR. At the moment, we fail to create a real PHI node for the PHI-of-ops, because the PHI-of-ops root instruction is not re-visited if one of the PHI-of-ops operands changes. We need to add the operands as additional users in this case. Even with this patch, there are still some dependencies missing. I will continue tackling the outstanding reporeted crashes in this area. Fixes PR36501, PR42422, PR42557. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D66924	2021-04-15 08:25:10 +01:00
Sjoerd Meijer	39d29817f3	[SCCP] Follow up of rGbbab9f986c6d. NFC. This addresses the linter messages, mainly the inconsistent capitalisation of member functions.	2021-04-14 17:14:46 +01:00
Sjoerd Meijer	bbab9f986c	[SCCP] Create SCCP Solver This refactors SCCP and creates a SCCPSolver interface and class so that it can be used by other passes and transformations. We will use this in D93838, which adds a function specialisation pass. This is based on an early version by Vinay Madhusudan. Differential Revision: https://reviews.llvm.org/D93762	2021-04-14 14:58:03 +01:00
Evgeniy Brevnov	e50aa1af2d	[NARY][NFC] Use hasNUsesOrMore instead of getNumUses since it's more efficient.	2021-04-13 09:29:49 +07:00
Nick Desaulniers	237d4ee835	[JumpThreading] merge debug info when merging select+br Jump threading can replace select then unconditional branch with conditional branch, but when doing so loses debug info. This destructive transform is eventually leading to a failed Verifier run during full LTO builds of the Linux kernel with CFI and KCOV enabled, as reported in PR39531. ModuleSanitizerCoveragePass will insert calls to __sanitizer_cov_trace_pc, and sometimes split critical edges, using whatever debug info may or may not exist for the branch for the added libcall. Since we can inline calls to __sanitizer_cov_trace_pc due to LTO, this can lead to the error observed in PR39531 when the debug info isn't propagated to the libcall, because of prior destructive transforms that failed to retain debug info. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D100137	2021-04-12 17:51:21 -07:00
Evgeniy Brevnov	36b932d6a3	[NARY] Don't optimize min/max if there are side uses Say we have %1=min(%a,%b) %2=min(%b,%c) %3=min(%2,%a) The optimization will try to reassociate the later one so that we can rewrite it to %3=min(%1, %c) and remove %2. But if %2 has another uses outside of %3 then we can't remove %2 and end up with: %1=min(%a,%b) %2=min(%b,%c) %3=min(%1, %c) This doesn't harm by itself except it is not profitable and changes IR for no good reason. What is bad it triggers next iteration which finds out that optimization is applicable to %2 and %3 and generates: %1=min(%a,%b) %2=min(%b,%c) %3=min(%1,%c) %4=min(%2,%a) and so on... The solution is to prevent optimization in the first place if intermediate result (%2) has side uses and known to be not removed. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D100170	2021-04-12 12:43:54 +07:00
Roman Lebedev	13fca9d816	[NFCI][SimplifyCFG] mergeEmptyReturnBlocks(): improve Dominator Tree updating Same as with previous patches.	2021-04-11 23:56:23 +03:00
Roman Lebedev	005881e96e	[LoopIdiom] left-shift-until-bittest: set all allowed no-wrap flags on add/sub I've checked each one of these with alive2, and this is both correct and precise.	2021-04-11 18:08:07 +03:00
Roman Lebedev	9829f5e6b1	[CVP] @llvm.[us]{min,max}() intrinsics handling If we can tell that either one of the arguments is taken, bypass the intrinsic. Notably, we are indeed fine with non-strict predicate: * UL: https://alive2.llvm.org/ce/z/69qVW9 https://alive2.llvm.org/ce/z/kNFTKf https://alive2.llvm.org/ce/z/AvaPw2 https://alive2.llvm.org/ce/z/oxo53i * UG: https://alive2.llvm.org/ce/z/wxHeGH https://alive2.llvm.org/ce/z/Lf76qx * SL: https://alive2.llvm.org/ce/z/hkeTGS https://alive2.llvm.org/ce/z/eR_b-W * SG: https://alive2.llvm.org/ce/z/wEqRm7 https://alive2.llvm.org/ce/z/FpAsVr Much like with all other comparison handling in CVP, while we could sort-of handle two Value's, at least for plain ICmpInst it does not appear to be worthwhile. This only fires 78 times on test-suite + dt + rs, but we don't canonicalize to these yet. (only SCEV produces them)	2021-04-11 00:33:47 +03:00
Roman Lebedev	f041757e9c	[NFC][JumpThreading] Increment 'NumFolds' statistic all places terminator becomes uncond	2021-04-10 21:24:29 +03:00
Roman Lebedev	a407738def	[NFC][CVP] Add statistic for function pointer argument non-null-ness deduction	2021-04-10 21:23:20 +03:00
Roman Lebedev	fe7b3ad8d5	[CVP] LVI: Use in-block values when checking value signedness domain This has a huge positive impact on all the folds that use these helpers, as it can be seen on vanilla test-suite + rawspeed + darktable: correlated-value-propagation.NumSRems +75.68% (+ 28) correlated-value-propagation.NumAShrs +63.87% (+198) correlated-value-propagation.NumSDivs +49.42% (+127) correlated-value-propagation.NumSExt + 8.85% (+593) correlated-value-propagation.NumUDivURemsNarrowed + 8.65% (+34) ... while having pretty minimal compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=e8c7f43e2c2c6f3581ec1c6489ec21ad9f98958a&to=4cd197711e58ee1b2faeee0c35eea54540185569&stat=instructions	2021-04-10 21:10:59 +03:00
Roman Lebedev	257eda0794	[NFC][LVI] getPredicateAt(): drop default value for UseBlockValue The default is likely wrong. Out of all the callees, only a single one needs to pass-in false (JumpThread), everything else either already passes true, or should pass true. Until the default is flipped, at least make it harder to unintentionally add new callees with UseBlockValue=false.	2021-04-10 20:46:01 +03:00
Roman Lebedev	c329a47d9e	[CVP] @llvm.abs() handling Iff we know the sigdness domain of the argument, we can either skip @llvm.abs, or do negation directly. Notably, INT_MIN can belong to either domain: * X u<= INT_MIN --> X is always fine https://alive2.llvm.org/ce/z/QB8j-C https://alive2.llvm.org/ce/z/7sFKpS * X s<= 0 --> -X is always fine https://alive2.llvm.org/ce/z/QbGSyq https://alive2.llvm.org/ce/z/APsN84 If all else fails, try to inferr NSW flag: https://alive2.llvm.org/ce/z/qCJfYm	2021-04-10 16:47:31 +03:00
dfukalov	c1a88e007b	[AA][NFC] Convert AliasResult to class containing offset for PartialAlias case. Add an ability to store `Offset` between partially aliased location. Use this storage within returned `ResultAlias` instead of caching it in `AAQueryInfo`. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98718	2021-04-09 13:26:09 +03:00
dfukalov	d066079728	[NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset. Main reason is preparation to transform AliasResult to class that contains offset for PartialAlias case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98027	2021-04-09 12:54:22 +03:00
Max Kazantsev	baf17e2cc9	[NFC] Move statictic increment out of helper	2021-04-09 16:32:35 +07:00
Max Kazantsev	275f3a2540	[GVN][NFC] Factor out load elimination logic via PRE for reuse	2021-04-09 16:12:25 +07:00
Arthur Eubanks	4c89bcadf6	[LICM] Hoist loads with invariant.group metadata Previously loading the vtable used in calling a virtual method in a loop was not hoisted out of the loop. This fixes that. canSinkOrHoistInst() itself doesn't check that the load operands are loop invariant, callers also check that separately. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99784	2021-04-08 21:57:37 -07:00
Serguei Katkov	d2e15a83a6	[RS4GC] Cleanup meetBDVState. NFC. meetBDVState looks pretty difficult to read and follow. This is purely NFC but doing several things: 1) Combine meet and meetBDVState 2) Move the function to be a member of BDVState 3) Make BDVState be a mutable object 4) Convert switch to sequence of ifs 5) Adds comments. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D99064	2021-04-09 10:20:25 +07:00
Arthur Eubanks	c5d1ccbcdf	[GVN] Properly invalidate ICF cache when we simplify a value This fixes a "Cached first special instruction is wrong!" assert. The assert fires because replacing a value with another can cause an instruction to no longer be "special" to ICF. In this case, devirtualization happened, turning an indirect call to a call to a willreturn function which is no longer special. Reviewed By: nikic, rnk Differential Revision: https://reviews.llvm.org/D99977	2021-04-08 14:01:57 -07:00
Nikita Popov	59a2f67011	[LoopRotate] Don't split loop pass manager After D99249 we use three different loop pass managers for LICM, LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI and LazyBPI are not preserved by LoopRotate (note that D74640 is no longer needed). Avoid this by marking them as preserved. My understanding of D86156 is that it is okay to simply preserve them (which LoopUnswitch already does for the same reason) and rely on callbacks to deal with deleted blocks. Differential Revision: https://reviews.llvm.org/D99843	2021-04-08 22:05:18 +02:00
Congzhe Cao	ce2db9005d	[LoopInterchange] Fix transformation bugs in loop interchange After loop interchange, the (old) outer loop header should not jump to the `LoopExit`. Note that the old outer loop becomes the new inner loop after interchange. If we branched to `LoopExit` then after interchange we would jump directly from the (new) inner loop header to `LoopExit` without executing the rest of outer loop. This patch modifies adjustLoopBranches() such that the old outer loop header (which becomes the new inner loop header) jumps to the old inner loop latch which becomes the new outer loop latch after interchange. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D98475	2021-04-08 14:58:13 -04:00
Stephen Tozer	140757bfaa	[DebugInfo] Prevent invalid debug info being produced during LoopStrengthReduce During LoopStrengthReduce, some of the SSA values that are used by debug values may be lost and/or salvaged. After LSR we attempt to recover any undef debug values, including any that were salvaged but then lost their values afterwards, by replacing the lost values with any live equal values (plus a possible constant offset) that have been gathered prior to running LSR. When we do this we restore the debug value's original DIExpression, to undo any salvaging (as we have gone back to using the original debug value). This process can currently produce invalid debug info if the number of operands has changed by salvaging during LSR. Replacing old values during the applyEqualValues step does not change the number of location operands, which means that when we restore the old DIExpression we may have a mismatch between the number of operands used by the debug value and the number of operands referenced by the DIExpression. This patch fixes this by restoring the full original location metadata at the start of the applyEqualValues step, so that there is no mismatch in operand count between the debug value and its DIExpression. Differential Revision: https://reviews.llvm.org/D98644	2021-04-08 13:04:48 +01:00
Congzhe Cao	593cb46550	Revert "[LoopInterchange] Fix transformation bugs in loop interchange" This reverts commit 6ec68bd815d00c1eec2a6b9766452554f0e6cb61.	2021-04-07 21:17:30 -04:00
CongzheUalberta	f5645ea65f	[LoopInterchange] Fix transformation bugs in loop interchange After loop interchange, the (old) outer loop header should not jump to `LoopExit`. Note that the old outer loop becomes the new inner loop after interchange. If we branched to `LoopExit` then after interchange we would jump directly from the (new) inner loop header to `LoopExit` without executing the rest of (new) outer loop. This patch modifies adjustLoopBranches() such that the old outer loop header (which becomes the new inner loop header) jumps to the old inner loop latch which becomes the new outer loop latch after interchange. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D98475	2021-04-07 20:55:44 -04:00
Craig Topper	5fc0e98d9a	[LoopIdiomRecognize] Minor cleanups to the FFS idiom matching. NFC -Make sure of the CreateShl/LShr/AShr methods that take a uint64_t instead of creating a ConstantInt for 1 ourselves. -Use Builder.getInt1 or ConstantInt::getBool instead of a conditional. -Pull out repeated calls to getType.	2021-04-07 10:03:14 -07:00
Philip Reames	908215b346	Use AssumeInst in a few more places [nfc] Follow up to `a6d2a8d6f5`. These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.	2021-04-06 13:18:53 -07:00
Philip Reames	9ef6aa020b	Plumb AssumeInst through operand bundle apis [nfc] Follow up to `a6d2a8d6f5`. This covers all the public interfaces of the bundle related code. I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.	2021-04-06 12:53:53 -07:00
Philip Reames	a6d2a8d6f5	Add a subclass of IntrinsicInst for llvm.assume [nfc] Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.	2021-04-06 11:16:22 -07:00
Arthur Eubanks	4e83e59eb8	[GVN] Add missing ICF update performScalarPREInsertion() inserts instructions into blocks that we need to tell ImplicitControlFlowTracking about, otherwise the ICF cache may be invalid. Fixes PR49193. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99909	2021-04-06 10:13:42 -07:00
Philip Reames	21d4839948	Move GCRelocateInst and GCResultInst to IntrinsicInst.h [nfc] These two are part of the IntrinsicInst class hierarchy and it helps to cut down on some redundant includes.	2021-04-06 08:33:15 -07:00
Philip Reames	52ecd94cfb	Remove last remnants of PR49607 migration [NFC] The key change (`4f5e92c`) to switch gc.result and gc.relocate to being readnone landed nearly two weeks ago, and we haven't seen any fallout. Time to remove the code added to make reverting easy.	2021-04-06 07:56:55 -07:00
Simon Pilgrim	b8aba76a4e	LoopFlatten - CanWidenIV - Fix uninitialized variable warnings and use for-range loop. NFCI. Fix static analysis uninitialized variable warnings, and use for-range loop iteration across WideIVs array.	2021-04-06 12:24:20 +01:00
Arthur Eubanks	ea0e2ca1ac	[SROA] Allow SROA on pointers with invariant group intrinsic uses When we are able to SROA an alloca, we know all uses of it, meaning we don't have to preserve the invariant group intrinsics and metadata. It's possible that we could lose information regarding redundant loads/stores, but that's unlikely to have any real impact since right now the only user is Clang and vtables. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99760	2021-04-05 19:53:40 -07:00
Ta-Wei Tu	6a82ace5f2	[LoopFusion] Bails out if only the second candidate is guarded (PR48060) If only the second candidate loop is guarded while the first one is not, fusioning two loops might not be valid but this check is currently missing. Fixes https://bugs.llvm.org/show_bug.cgi?id=48060 Reviewed By: sidbav Differential Revision: https://reviews.llvm.org/D99716	2021-04-06 01:08:56 +08:00
Dimitry Andric	6abb92f210	[SCCP] Avoid modifying AdditionalUsers while iterating over it When run under valgrind, or with a malloc that poisons freed memory, this can lead to segfaults or other problems. To avoid modifying the AdditionalUsers DenseMap while still iterating, save the instructions to be notified in a separate SmallPtrSet, and use this to later call OperandChangedState on each instruction. Fixes PR49582. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98602	2021-04-02 19:05:59 +02:00
Philip Reames	2c4548e18e	[rs4gc] Use loops instead of straightline code for attribute stripping [nfc] Mostly because I'm about to add more attributes and the straightline copies get much uglier. What's currently there isn't too bad.	2021-04-02 09:25:15 -07:00
Philip Reames	a505801e2b	[rs4gc] Strip nofree and nosync attributes when lowering from abstract model The safepoints being inserted exists to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering. I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.	2021-04-02 09:12:24 -07:00
Evgeniy Brevnov	2388aae401	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev, lebedev.ri Differential Revision: https://reviews.llvm.org/D88287	2021-04-02 15:30:13 +07:00
Juneyoung Lee	c664769330	[AssumeBundles] offset should be added to correctly calculate align This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492). Consider this code: ``` call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)] %arrayidx = getelementptr inbounds i32, i32* %a, i64 -1 ; aligment of %arrayidx? ``` The llvm.assume guarantees that `%a - 28` is 32-bytes aligned, meaning that `%a` is 32k + 28 for some k. Therefore `a - 4` cannot be 32-bytes aligned but the existing code was calculating the pointer as 32-bytes aligned. The reason why this happened is as follows. `DiffSCEV` stores `%arrayidx - %a` which is -4. `OffSCEV` stores the offset value of “align”, which is 28. `DiffSCEV` + `OffSCEV` = 24 should be used for `a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = 32 was being used instead. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D98759	2021-04-02 12:32:05 +09:00
Philip Reames	91790c6785	[indvars[ Fix pr49802 by checking for SCEVCouldNotCompute The code is assuming that having an exact exit count for the loop implies that exit counts for every exit are known. This used to be true, but when we added handling for dead exits we broke this invariant. The new invariant is that an exact loop count implies that any exits non trivially dead have exit counts. We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute. I chose the second as it was simpler. (Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...) p.s. Sorry for the lack of test case. Getting things into a state to actually hit this is difficult and fragile. The original repro involves loop-deletion leaving SCEV in a slightly inprecise state which lets us bypass other transforms in IndVarSimplify on the way to this one. All of my attempts to separate it into a standalone test failed.	2021-04-01 17:53:44 -07:00
Yevgeny Rouban	1ed53d44d8	[LoopFlatten] Do not report CFG analyses as up-to-date Removes CFGAnalyses from the preserved analyses set returned by LoopFlattenPass::run(). Reviewed By: Dave Green, Ta-Wei Tu Differential Revision: https://reviews.llvm.org/D99700	2021-04-01 15:52:36 +07:00
Max Kazantsev	a1d83776bf	[NFC] Undo some erroneous renamings Some vars renamed by mistake during auto-replacements. Undoing them.	2021-04-01 13:10:10 +07:00
Max Kazantsev	630818a850	[NFC] Disambiguate LI in GVN Name GVN uses name 'LI' for two different unrelated things: LoadInst and LoopInfo. This patch relates the variables with former meaning into 'Load' to disambiguate the code.	2021-04-01 12:40:35 +07:00
KAWASHIMA Takahiro	5fac7c6046	[GVN] Propagate llvm.access.group metadata of loads Before this change, the `llvm.access.group` metadata was dropped when moving a load instruction in GVN. This prevents vectorizing a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`. This change propagates the metadata as well as other metadata if it is safe (the move-destination basic block and source basic block belong to the same loop). Differential Revision: https://reviews.llvm.org/D93503	2021-04-01 10:00:48 +09:00
Fangrui Song	3e5ee194c0	[SimpleLoopUnswitch] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after `431a40e1e2`	2021-03-30 19:27:10 -07:00
Juneyoung Lee	431a40e1e2	[LoopUnswitch] Assert that branch condition is either and/or but not both as suggested at https://reviews.llvm.org/rG5bb38e84d3d0#986321	2021-03-31 10:35:22 +09:00
Juneyoung Lee	6b4b1dc6ec	[LoopUnswitch] Simplify branch condition if it is select with constant operands This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 . `select _, true, false` matches both m_LogicalAnd and m_LogicalOr, making later transformations confused. Simplify the branch condition to not have the form.	2021-03-30 20:09:42 +09:00
Krasimir Georgiev	8e7df996e3	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `92ddd3c1b6`. Causes multistage clang crashes, e.g.: https://lab.llvm.org/buildbot/#/builders/36/builds/6678	2021-03-30 11:47:12 +02:00
Han Zhu	92ddd3c1b6	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-03-29 23:36:26 -07:00
Han Zhu	2bd4049ceb	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `deb5095833`. Bad commit message.	2021-03-29 23:35:35 -07:00
Han Zhu	deb5095833	[loop-idiom] Hoist loop memcpys to loop preheader Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Blame Revision: Differential Revision: https://phabricator.intern.facebook.com/D26380397	2021-03-29 23:14:42 -07:00
Jingu Kang	e4abb64100	[LoopUnswitch] Use reference variables instead of pointer one Differential Revision: https://reviews.llvm.org/D99496	2021-03-29 13:08:46 +01:00
Jingu Kang	cfe87d4edd	[NFC][LoopUnswitch] Move hasPartialIVCondition to LoopUtils Differential revision: https://reviews.llvm.org/D99490	2021-03-29 10:29:45 +01:00
Juneyoung Lee	fc3f0c9cc0	[IRCE] Use m_LogicalAnd This is a minor fix to use m_LogicalAnd. This allows IRCE to recognize select form of and conditions as well.	2021-03-27 15:23:18 +09:00
Florian Hahn	4858e081d7	[ConstraintElimination] Only strip casts preserving the representation. Things like addrspacecast may not be no-ops, so we should not look through them.	2021-03-26 20:07:41 +00:00
Sameer Sahasrabuddhe	b92c8c22b9	[NewPM] Disable non-trivial loop-unswitch on targets with divergence Unswitching a loop on a non-trivial divergent branch is expensive since it serializes the execution of both version of the loop. But identifying a divergent branch needs divergence analysis, which is a function level analysis. The legacy pass manager handles this dependency by isolating such a loop transform and rerunning the required function analyses. This functionality is currently missing in the new pass manager, and there is no safe way for the SimpleLoopUnswitch pass to depend on DivergenceAnalysis. So we conservatively assume that all non-trivial branches are divergent if the target has divergence. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D98958	2021-03-25 11:27:10 +00:00
Philip Reames	9a82f42d12	Plumb TLI through isSafeToExecuteUnconditionally [NFC] Split from D95815 to reduce patch size. Isn't (yet) used for anything, only the client side is wired up.	2021-03-24 17:52:04 -07:00
Congzhe Cao	829c1b6443	[LoopInterchange] fix tightlyNested() in LoopInterchange legality This is yet another attempt to fix tightlyNested(). Add checks in tightlyNested() for the inner loop exit block, such that 1) if there is control-flow divergence in between the inner loop exit block and the outer loop latch, or 2) if the inner loop exit block contains unsafe instructions, tightlyNested() returns false. The reasoning behind is that after interchange, the original inner loop exit block, which was part of the outer loop, would be put into the new inner loop, and will be executed different number of times before and after interchange. Thus it should be dealt with appropriately. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D98263	2021-03-24 15:49:25 -04:00
Nikita Popov	8a168d2d70	[LICM] Fix NumSunk statistic (NFC) LICM can sink instructions that have uses inside the loop, as long as these uses are considered "free". However, if there were only free uses inside the loop, and no uses outside the loop at all, the instruction would still count towards the NumSunk statistic. This resulted in a wild inflation of the NumSunk metric. After this patch it drops down from 1141787 to 5852 on test-suite O3.	2021-03-24 18:28:19 +01:00
Thomas Preud'homme	3b52c04e82	Make FindAvailableLoadedValue TBAA aware FindAvailableLoadedValue() relies on FindAvailablePtrLoadStore() to run the alias analysis when searching for an equivalent value. However, FindAvailablePtrLoadStore() calls the alias analysis framework with a memory location for the load constructed from an address and a size, which thus lacks TBAA metadata info. This commit modifies FindAvailablePtrLoadStore() to accept an optional memory location as parameter to allow FindAvailableLoadedValue() to create it based on the load instruction, which would then have TBAA metadata info attached. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99206	2021-03-24 17:20:26 +00:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Ta-Wei Tu	4d9d736875	[NFC] Improve debug message and test description in `4c1f74a`	2021-03-24 18:21:13 +08:00
Ta-Wei Tu	4c1f74a76c	[LoopFlatten] Fix invalid assertion (PR49571) The `InductionPHI` is not necessarily the increment instruction, as demonstrated in pr49571.ll. This patch removes the assertion and instead bails out from the `LoopFlatten` pass if that happens. This fixes https://bugs.llvm.org/show_bug.cgi?id=49571 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D99252	2021-03-24 18:08:27 +08:00
Ta-Wei Tu	8fde25b3c3	[NFC] Remove redundant `struct` prefix Reviewed By: SjoerdMeijer, fhahn Differential Revision: https://reviews.llvm.org/D99251	2021-03-24 17:58:33 +08:00
Florian Hahn	e43e8e9138	[AnnotationRemarks] Use subprogram location for summary remarks. The summary remarks are generated on a per-function basis. Using the first instruction's location is sub-optimal for 2 reasons: 1. Sometimes the first instruction is missing !dbg 2. The location of the first instruction may be mis-leading. Instead, just use the location of the function directly.	2021-03-23 12:05:41 +00:00

... 2 3 4 5 6 ...

11157 Commits