llvm-project

Commit Graph

Author	SHA1	Message	Date
Nicholas Guy	2b6e0c90f9	[AArch64] Enable runtime unrolling for in-order sched models Differential Revision: https://reviews.llvm.org/D97947	2021-04-27 13:22:10 +01:00
Nikita Popov	c456ab78ae	[LoopUnroll] Regenerate test checks (NFC)	2021-04-17 20:59:20 +02:00
Nikita Popov	fe9a5a806e	[LoopUnroll] Make some tests more robust (NFC) Replace branch on undef by branch on unknown condition.	2021-04-17 20:59:20 +02:00
Florian Hahn	acd9cc7495	[AArch64] Use type-legalization cost for code size memop cost. At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory operations on large vectors have a cost of one, even if they will get expanded to a large number of memory operations in the backend. This patch updates getMemoryOpCost to return the cost for the type legalization for both CodeSize and SizeAndLatency. This should more accurately reflect the number of memory operations required. I am not sure how latency should properly be included in SizeAndLatency from the description, but returning the size cost should be clearly more accurate. This does not cause any binary changes when building MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because large vector memops are not really formed by code emitted from Clang. But using the C/C++ matrix extension can easily result in code with very large vector operations directly from Clang, e.g. https://clang.godbolt.org/z/6xzxcTGvb Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D100291	2021-04-15 10:11:05 +01:00
Florian Hahn	816cf41462	[LoopUnroll] Add AArch64 test case with large vector ops. Add test case to illustrate over-eager unrolling on AArch64, due to the cost-model not estimating the size of vector loads/stores accurately.	2021-04-11 21:39:52 +01:00
dfukalov	8f4b7e94a2	[AMDGPU][CostModel] Refine cost model for control-flow instructions. Added cost estimation for switch instruction, updated costs of branches, fixed phi cost. Had to increase `-amdgpu-unroll-threshold-if` default value since conditional branch cost (size) was corrected to higher value. Test renamed to "control-flow.ll". Removed redundant code in `X86TTIImpl::getCFInstrCost()` and `PPCTTIImpl::getCFInstrCost()`. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96805	2021-04-10 09:20:24 +03:00
David Green	da98177cda	[ARM] Allow v6m runtime loop unrolling This removes the restriction that only Thumb2 targets enable runtime loop unrolling, allowing it for Thumb1 only cores as well. The existing T2 heuristics are used (for the time being) to control when and how unrolling is performed. Differential Revision: https://reviews.llvm.org/D99588	2021-04-01 21:21:40 +01:00
David Green	14b2ec934e	[ARM] Enable UpperBound unrolling for all loops This UpperBound unrolling was already enabled so long as a series of conditions in ARMTTIImpl::getUnrollingPreferences pass. This just always enables it as it can help fully unroll loops that would not otherwise pass those tests. Differential Revision: https://reviews.llvm.org/D99174	2021-03-24 16:39:21 +00:00
David Green	003fab9e8d	[ARM] Additional Upper bound unrolling test. NFC	2021-03-23 12:00:40 +00:00
Whitney Tsang	0d8f102809	[NFC][LoopUnroll] Add `-unroll-runtime-other-exit-predictable=false` in `runtime-multiexit-heuristic.ll` Added -unroll-runtime-other-exit-predictable=false in runtime-multiexit-heuristic.ll to make it more robust. runtime-multiexit-heuristic.ll intention is to test -unroll-runtime-multi-exit=false, so the default value of -unroll-runtime-other-exit-predictable should not impact the result. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98098	2021-03-07 23:51:09 +00:00
Whitney Tsang	40391cef61	[LoopUnrollRuntime] Add option to assume the non latch exit block to be predictable. (Add LIT) Reviewed By: Meinersbur, bmahjour Differential Revision: https://reviews.llvm.org/D97747	2021-03-07 23:48:00 +00:00
Roman Lebedev	b46c085d2b	[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions These intrinsics, not the icmp+select, are the canonical form nowadays, so we might as well directly emit them. This should not cause any regressions, but if it does, then they would need to be fixed regardless. Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`, but that is a pessimization, not a correctness issue. Additionally, the non-intrinsic form has issues with undef, see https://reviews.llvm.org/D88287#2587863	2021-03-06 21:52:46 +03:00
Dávid Bolvanský	cd54c57919	Reland "[Libcalls, Attrs] Annotate libcalls with noundef" Fixed Clang tests.	2021-02-20 06:18:48 +01:00
Dávid Bolvanský	94d034fb86	Revert "[Libcalls, Attrs] Annotate libcalls with noundef" This reverts commit `33b0c63775`. Bots are failing. Some Clang tests need to be updated too.	2021-02-20 04:18:42 +01:00
Dávid Bolvanský	33b0c63775	[Libcalls, Attrs] Annotate libcalls with noundef I think we can use here same logic as for nonnull. strlen(X) - X must be noundef => valid pointer. for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones) Reviewed By: jdoerfert, aqjune Differential Revision: https://reviews.llvm.org/D95122	2021-02-20 04:10:07 +01:00
Sanjay Patel	378941f611	[ValueTracking] add scan limit for assumes In the motivating example from https://llvm.org/PR49171 and reduced test here, we would unroll and clone assumes so much that compile-time effectively became infinite while analyzing all of those assumes.	2021-02-15 15:24:20 -05:00
Sam Parker	9d81ccc02f	[WebAssembly] Enable loop unrolling Enable partial and runtime unrolling with a threshold of 30, which was derived from a large number of kernels running on node and wasmtime for amd64 and aarch64. Unrolling is enabled by default at -O2 and -O3 and is disabled at -Oz and -Os. Compiling with -Os is recommended if the wasm binary size is the most important factor. Differential Revision: https://reviews.llvm.org/D95125	2021-02-10 08:25:46 +00:00
Gil Rapaport	d475030dc2	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Jeroen Dobbelaere	80cdd30eb9	[LoopPeel] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed. The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95544	2021-02-01 10:01:17 +01:00
Jeroen Dobbelaere	774629641b	[LoopUnroll] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patched (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed. Notes: - it also includes changes and tests from D90104 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D92887	2021-01-24 13:48:20 +01:00
Roman Lebedev	1742203844	[SimplifyCFG] FoldBranchToCommonDest(): re-lift restrictions on liveout uses of bonus instructions I have previously tried doing that in `b33fbbaa34` / `d38205144f`, but eventually it was pointed out that the approach taken there was just broken wrt how the uses of bonus instructions are updated to account for the fact that they should now use either bonus instruction or the cloned bonus instruction. In particular, all that manual handling of PHI nodes in successors was just wrong. But, the fix is actually much much simpler than my initial approach: just tell SSAUpdate about both instances of bonus instruction, and let it deal with all the PHI handling. Alive2 confirms that the reproducers from the original bugs (@pr48450*) are now handled correctly. This effectively reverts commit `59560e8589`, effectively relanding `b33fbbaa34`.	2021-01-23 01:29:05 +03:00
Joseph Tremoulet	40cd262c43	Loop peeling: check that latch is conditional branch Loop peeling assumes that the loop's latch is a conditional branch. Add a check to canPeel that explicitly checks for this, and testcases that otherwise fail an assertion when trying to peel a loop whose back-edge is a switch case or the non-unwind edge of an invoke. Reviewed By: skatkov, fhahn Differential Revision: https://reviews.llvm.org/D94995	2021-01-20 11:01:16 -05:00
Arthur Eubanks	f748e92295	[NewPM] Run non-trivial loop unswitching under -O2/3/s/z Fixes https://bugs.llvm.org/show_bug.cgi?id=48715. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D94448	2021-01-12 11:04:40 -08:00
Serguei Katkov	7f69860243	[LoopUnroll] Fix a crash Loop peeling as a last step triggers loop simplification and this can change the loop structure. As a result all cached values such as the latch branch become invalid. The patch restructures the code to take into account the possible changes caused by peeling. Reviewers: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour Reviewed By: Meinersbur, fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D93686	2021-01-11 10:19:26 +07:00
Hiroshi Yamauchi	cf5415c727	[PGO][PGSO] Let unroll hints take precedence over PGSO. Differential Revision: https://reviews.llvm.org/D94199	2021-01-07 10:10:31 -08:00
Juneyoung Lee	ae6e89327b	Precommit tests that have poison as shufflevector's placeholder This commit copies existing tests at llvm/Transforms containing 'shufflevector X, undef' and replaces them with 'shufflevector X, poison'. The new copied tests have -inseltpoison.ll suffix at its file name (as `db7a2f347f` did) See https://reviews.llvm.org/D93793 Test files listed using grep -R -E "^[^;]shufflevector <.> ., <.> undef" \| cut -d":" -f1 \| uniq Test files copied & updated using file_org=llvm/test/Transforms/$1 if [[ "$file_org" = -inseltpoison.ll ]]; then file=$file_org else file=${file_org%.ll}-inseltpoison.ll if [ ! -f $file ]; then cp $file_org $file fi fi sed -i -E 's/^([^;])shufflevector <(.)> (.), <(.)> undef/\1shufflevector <\2> \3, <\4> poison/g' $file head -1 $file \| grep "Assertions have been autogenerated by utils/update_test_checks.py" -q if [ "$?" == 1 ]; then echo "$file : should be manually updated" # The test is manually updated exit 1 fi python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file	2020-12-29 17:09:31 +09:00
Juneyoung Lee	db7a2f347f	Precommit transform tests that have poison as insertelement's placeholder This commit copies existing tests at llvm/Transforms and replaces 'insertelement undef' in those files with 'insertelement poison'. (see https://reviews.llvm.org/D93586) Tests listed using this script: grep -R -E '^[^;]insertelement <.> undef,' . \| cut -d":" -f1 \| uniq \| wc -l Tests updated: file_org=llvm/test/Transforms/$1 file=${file_org%.ll}-inseltpoison.ll cp $file_org $file sed -i -E 's/^([^;])insertelement <(.)> undef/\1insertelement <\2> poison/g' $file head -1 $file \| grep "Assertions have been autogenerated by utils/update_test_checks.py" -q if [ "$?" == 1 ]; then echo "$file : should be manually updated" # I manually updated the script exit 1 fi python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file	2020-12-24 11:46:17 +09:00
Roman Lebedev	5cce4aff18	[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Roman Lebedev	aa2009fe78	[NFCI][SimplifyCFG] Mark all the SimplifyCFG tests that already don't invalidate DomTree as such First step after `e113317958`, in these tests, DomTree is valid afterwards, so mark them as such, so that they don't regress. In further steps, SimplifyCFG transforms shall be taught to preserve DomTree, in as small steps as possible.	2020-12-17 01:03:49 +03:00
Roman Lebedev	59560e8589	[SimplifyCFG] FoldBranchToCommonDest(): temporarily put back restrictions on liveout uses of bonus instructions (PR48450) Even though `d38205144f` was mostly a correct fix for the external non-PHI users, it's not a generally correct fix, because the 'placeholder' values in those trivial PHI's we create shouldn't be always 'undef', but the PHI itself for the backedges, else we end up with wrong value, as the `@pr48450_2` test shows. But we can't just do that, because we can't check that the PHI can be its own incoming value when coming from certain predecessor, because we don't have a dominator tree. So until we can address this correctness problem properly, ensure that we don't perform the transformation if there are such problematic external uses. Making dominator tree available there is going to be involved, since `-simplifycfg` pass currently does not preserve/update domtree...	2020-12-14 20:14:31 +03:00
Arthur Eubanks	a820261bf3	[test] Fix store_cost.ll under NPM The NPM processes loops in forward program order, whereas the legacy PM processes them in reverse program order. No reason to test both PMs here, so just stick to the NPM.	2020-12-07 21:19:05 -08:00
Roman Lebedev	b33fbbaa34	Reland [SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions This was originally committed in `2245fb8aaa`, but was immediately reverted in `f3abd54958` because of a PHI handling issue. Original commit message: 1. It doesn't make sense to enforce that the bonus instruction is only used once in its basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, I'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even though it seemed like this would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else after all, so something else also has a similar 'artificial' restriction.	2020-11-27 12:47:15 +03:00
Roman Lebedev	f3abd54958	Revert "[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions" Many bots are unhappy, at the very least missed a few codegen tests, and possibly this has a logic hole inducing a miscompile (will be really awesome to have ready reproducer..) Need to investigate. This reverts commit `2245fb8aaa`.	2020-11-26 23:13:43 +03:00
Roman Lebedev	2245fb8aaa	[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions 1. It doesn't make sense to enforce that the bonus instruction is only used once in its basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, I'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even though it seemed like this would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else after all, so something else also has a similar 'artificial' restriction.	2020-11-26 22:51:22 +03:00
Sanjay Patel	99cf39bfed	[LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC See discussion in D90554. This is a partial un-revert of `32dd5870ee`. I'm adding back the baseline tests first, so we don't have to back-track as much in case there are still problems.	2020-11-20 08:15:46 -05:00
Eric Christopher	32dd5870ee	Temporarily Revert "[CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation" as it's causing crashes in the optimizer. A reduced testcase has been posted as a follow-up. This reverts commit `f7eac51b9b`. Temporarily Revert "[CostModel] make default size cost for libcalls small (again)" as it depends upon the primary revert. This reverts commit `8ec7ea3ddc`. Temporarily Revert "[CostModel] add tests for math library calls; NFC" as it depends upon the primary revert. This reverts commit `df09f82599`. Temporarily Revert "[LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC" as it depends upon the primary revert. This reverts commit `618d555e8d`.	2020-11-19 22:10:23 -08:00
Sanjay Patel	8ec7ea3ddc	[CostModel] make default size cost for libcalls small (again) This was changed recently with D90554 / `f7eac51b9b` ...because we had a regression testing blindspot for intrinsics that are expected to be lowered to libcalls. In general, we want the size cost for a scalar call to be cheap even if the other costs are expensive - we expect it to just be a branch with some optional stack manipulation. It is likely that we will want to carve out some exceptions/overrides to this rule as follow-up patches for calls that have some general and/or target-specific difference to the expected lowering. This was noticed as a regression in unrolling, so we have a test for that now along with a couple of direct cost model tests. If the assumed scalarization costs for the oversized vector calls are not realistic, that would be another follow-up refinement of the cost models.	2020-11-14 08:15:35 -05:00
Sanjay Patel	618d555e8d	[LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC See discussion in D90554.	2020-11-13 17:15:23 -05:00
David Green	c7e275388e	[ARM] Don't aggressively unroll vector remainder loops We already do not unroll loops with vector instructions under MVE, but that does not include the remainder loops that the vectorizer produces. These remainder loops will be rarely executed and are not worth unrolling, as the trip count is likely to be low if they get executed at all. Luckily they get llvm.loop.isvectorized to make recognizing them simpler. We have wanted to do this for a while but hit issues with low overhead loops being reverted due to difficult register allocation. With recent changes that seems to be less of an issue now. Differential Revision: https://reviews.llvm.org/D90055	2020-11-10 17:01:31 +00:00
David Green	44c1a56869	[ARM] Add extra MVE tests for various patches. NFC	2020-11-01 16:24:23 +00:00
Arthur Eubanks	5c31b8b94f	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit `10f2a0d662`. More uint64_t overflows.	2020-10-31 00:25:32 -07:00
Arthur Eubanks	10f2a0d662	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-30 10:03:46 -07:00
Nico Weber	2a4e704c92	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit `e5766f25c6`. Makes clang assert when building Chromium, see https://crbug.com/1142813 for a repro.	2020-10-27 09:26:21 -04:00
Arthur Eubanks	e5766f25c6	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-26 20:24:04 -07:00
Tim Corringham	3c1273d737	[AMDGPU] Add amdgpu specific loop threshold metadata Add new loop metadata amdgpu.loop.unroll.threshold to allow the initial AMDGPU specific unroll threshold value to be specified on a loop by loop basis. The intention is to allow more nuanced hints, e.g. specifying a low threshold value to indicate that a loop may be unrolled if cheap enough rather than using the all or nothing llvm.loop.unroll.disable metadata. Differential Revision: https://reviews.llvm.org/D84779	2020-10-22 17:21:32 +01:00
Arthur Eubanks	f2f0474c93	[test] Fix FullUnroll.ll I believe the intention of this test added in https://reviews.llvm.org/D71687 was to test LoopFullUnrollPass with clang's -fno-unroll-loops, not its interaction with optnone. Loop unrolling passes don't run under optnone/-O0. Also added back unintentionally removed -disable-loop-unrolling from https://reviews.llvm.org/D85578. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D86485	2020-09-17 15:56:13 -07:00
Roman Lebedev	95848ea101	[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks As FIXME said, they really should be checking for a single user, not use, so let's do that. It is not that unusual to have the same value as incoming value in a PHI node, not unlike how a PHI may have the same incoming basic block more than once. There isn't a nice way to do that, Value::users() isn't uniquified, and Value only tracks its uses, not Users, so the check is potentially costly since it does indeed potentially involve traversing the entire use list of a value.	2020-08-26 20:20:41 +03:00
dfukalov	33e2f69a24	[AMDGPU][LoopUnroll] Increase BB size to analyze for complete unroll. The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no information about a loop body BB cost. In some cases, e.g. for a simple loop ``` for(int i=0; i<32; ++i){ D = Arr2[i*8 + C1]; Arr1[i*64 + C2] += C3 * D; Arr1[i*64 + C2 + 2048] += C4 * D; } ``` the current default parameter value is not enough to run deeper cost analysis, so the loop is not completely unrolled. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D86248	2020-08-20 10:41:47 +03:00
Sam Parker	dad04e62f1	[NFC] run update test script On Transforms/LoopUnroll/runtime-small-upperbound.ll	2020-08-17 13:54:28 +01:00
Arthur Eubanks	72effd8d5b	[test][LoopUnroll] Cleanup FullUnroll.ll This is in preparation for enabling proper handling of optnone under the NPM. Most optimizations won't run on an optnone function. Previously the test would rely on lots of optimizations to optimize the IR into a simple infinite loop. This is an optnone function, so clearly that shouldn't be the case. This IR was found by printing the module before the LoopFullUnrollerPass ran. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D85578	2020-08-14 16:05:04 -07:00
Sam Parker	ea8448e361	[LoopUnroll] Adjust CostKind query When TTI was updated to use an explicit cost, TCK_CodeSize was used although the default implicit cost would have been the hand-wavey cost of size and latency. So, revert back to this behaviour. This is not expected to have (much) impact on targets since most (all?) of them return the same value for SizeAndLatency and CodeSize. When optimising for size, the logic has been changed to query CodeSize costs instead of SizeAndLatency. This patch also adds a testing option in the unroller so that OptSize thresholds can be specified. Differential Revision: https://reviews.llvm.org/D85723	2020-08-12 12:56:09 +01:00
Arthur Eubanks	b36c39260e	[NewPM] Don't print 'Invalidating all non-preserved analyses' If an analysis is actually invalidated, there's already a log statement for that: 'Invalidating analysis: FooAnalysis'. Otherwise the statement is not very useful. Reviewed By: asbirlea, ychen Differential Revision: https://reviews.llvm.org/D84981	2020-07-30 19:40:29 -07:00
Yuanfang Chen	555cf42f38	[NewPM][PassInstrument] Add PrintPass callback to StandardInstrumentations Problem: Right now, our "Running pass" is not accurate when passes are wrapped in an adaptor because the adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for an adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order. Solution: Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever an adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder) This patch moves debug logging for passes into a PassInstrument callback. We could be sure that all running passes are logged and in the correct order. This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want. The test fixes look messy. They include these changes: - Remove PassInstrumentationAnalysis - Remove PassAdaptor - If a PassAdaptor is for a real pass, the pass is added - Pass reorder (to the correct order), related to PassAdaptor - Add missing passes (due to Debuglogging not passed down) Reviewed By: asbirlea, aeubanks Differential Revision: https://reviews.llvm.org/D84774	2020-07-30 10:07:57 -07:00
Jinsong Ji	d28f86723f	Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `bf544fa1c3`. Fixed the typo in PPCInstrInfo.cpp.	2020-07-28 14:00:11 +00:00
Jinsong Ji	bf544fa1c3	Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `adffce7153`. This is breaking test-suite, reverting while investigating.	2020-07-27 21:07:00 +00:00
Jinsong Ji	adffce7153	[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html no one is making use of QPX/A2Q/BGQ/BGP CNK anymore. This patch removes support for QPX/A2Q in llvm, BGQ/BGP in clang, and CNK in openmp/polly. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D83915	2020-07-27 19:24:39 +00:00
Hongtao Yu	f3731d34fa	[LoopUnroll] Update branch weight for remainder loop Unrolling a loop with compile-time unknown trip count results in a remainder loop. The remainder loop executes the remaining iterations of the original loop when the original trip count is not a multiple of the unroll factor. For better profile counts maintenance throughout the optimization pipeline, I'm assigning an artificial weight to the latch branch of the remainder loop. A remainder loop runs at most (unroll factor - 1) times. Therefore I'm assigning the maximum possible trip count as the back edge weight. This should be more accurate than the default non-profile weight, which assumes the back edge runs much more frequently than the exit edge. Differential Revision: https://reviews.llvm.org/D83187	2020-07-15 12:33:29 -07:00
Arthur Eubanks	481709e831	[NewPM][opt] Share -disable-loop-unrolling between pass managers There's no reason to introduce a new option for the NPM. The various PGO options are shared in this manner. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D83368	2020-07-08 08:50:56 -07:00
Roman Lebedev	c3b8bd1eea	[InstCombine] Always try to invert non-canonical predicate of an icmp Summary: The actual transform i was going after was: https://rise4fun.com/Alive/Tp9H ``` Name: zz Pre: isPowerOf2(C0) && isPowerOf2(C1) && C1 == C0 %t0 = and i8 %x, C0 %r = icmp eq i8 %t0, C1 => %t = icmp eq i8 %t0, 0 %r = xor i1 %t, -1 Name: zz Pre: isPowerOf2(C0) %t0 = and i8 %x, C0 %r = icmp ne i8 %t0, 0 => %t = icmp eq i8 %t0, 0 %r = xor i1 %t, -1 ``` but as it can be seen from the current tests, we already canonicalize most of it, and we are only missing handling multi-use non-canonical icmp predicates. If we have both `!=0` and `==0`, even though we can CSE them, we end up being stuck with them. We should canonicalize to the `==0`. I believe this is one of the cleanup steps i'll need after `-scalarizer` if i end up proceeding with my WIP alloca promotion helper pass. Reviewers: spatel, jdoerfert, nikic Reviewed By: nikic Subscribers: zzheng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83139	2020-07-04 18:12:04 +03:00
Roman Lebedev	17a15c32af	[NFCI][LoopUnroll] s/%tmp/%i/ in one test to silence update script warning	2020-07-04 00:39:36 +03:00
Sam Parker	0724153bbe	[CostModel] Fix cast crash Don't presume instruction operands while matching reductions. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46430 Differential Revision: https://reviews.llvm.org/D82453	2020-07-03 07:53:45 +01:00
Arthur Eubanks	a95796a380	[NewPM][LoopUnroll] Rename unroll* to loop-unroll* The legacy pass is called "loop-unroll", but in the new PM it's called "unroll". Also applied to unroll-and-jam and unroll-full. Fixes various check-llvm tests when NPM is turned on. Reviewed By: Whitney, dmgreen Differential Revision: https://reviews.llvm.org/D82590	2020-06-26 09:28:32 -07:00
Serguei Katkov	eae0d2e9b2	Revert "[Peeling] Extend the scope of peeling a bit" This reverts commit `29b2c1ca72`. The patch causes a DT verifier failure like: DominatorTree is different than a freshly computed one! Not sure the patch itself is wrong, but reverting to investigate the failure.	2020-06-22 17:48:29 +07:00
Serguei Katkov	29b2c1ca72	[Peeling] Extend the scope of peeling a bit Currently we allow peeling of a loop if there is an exiting latch block and all other exits are blocks ending with deopt. What we actually need is that each such exit unconditionally ends up in a deopt; it is not required that the exit block itself ends with deopt. Reviewers: reames, ashlykov, fhahn, apilipenko, fedor.sergeev Reviewed By: apilipenko Subscribers: hiraditya, zzheng, dantrushin, llvm-commits Differential Revision: https://reviews.llvm.org/D81140	2020-06-22 12:17:44 +07:00
Whitney Tsang	5225cd43e8	[LoopUnroll] Allow loops with multiple exiting blocks where loop latch is not necessarily one of them. Summary: Currently LoopUnrollPass already allows loops with multiple exiting blocks, but only when the loop latch is one of the exiting blocks. When the loop latch is not an exiting block, then only a single exiting block is supported. When possible, the single loop latch or the single exiting block terminator is optimized to an unconditional branch in the unrolled loop. This patch allows loops with multiple exiting blocks even if the loop latch is not one of them. However, the optimization of exiting block terminator to unconditional branch is not done when there exists more than one exiting block. Reviewer: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour Reviewed By: efriedma Subscribers: hiraditya, zzheng, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D81053	2020-06-14 18:44:18 +00:00
dfukalov	c94d32a6b3	[AMDGPU] Increase max iterations count to analyze complete unroll Summary: In some cases inner loops may not get boosts so try to analyze them deeper. Reviewers: rampitec, mzolotukhin Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81204	2020-06-06 16:32:45 +03:00
Whitney Tsang	0fee91a187	[LoopUnroll] Add a test case for rG7873376bb36b. rG7873376bb36b fixes a build failure for allyesconfig. The problem happened when the single exiting block doesn't dominate the loop latch, then the immediate dominator of the exit block should not be the exiting block after unrolling. As the exiting block of different unrolled iteration can branch to the exit block, and the ith exiting block doesn't dominate (i+1)th exiting block, the immediate dominator of the exit block should not the nearest common dominator of the exiting block and the loop latch of the same iteration. Differential Revision: https://reviews.llvm.org/D80477	2020-05-30 20:34:27 +00:00
Eric Christopher	c554c5e159	Fix full unrolling with new pass manager. Last we looked at this and couldn't come up with a reason to change it, but with a pragma for full loop unrolling we bypass every other loop unroll and then fail to fully unroll a loop when the pragma is set. Move the OnlyWhenForced out of the check and into the initialization of the full unroll pass in the new pass manager. This doesn't show up with the old pass manager. Add a new option to opt so that we can turn off loop unrolling manually since this is a difference between clang and opt. Tested with check-clang and check-llvm.	2020-05-29 20:08:21 -07:00
Whitney Tsang	1bc73b02d6	[LoopUnroll] Support loops with exiting block that is neither header nor latch. Summary: Remove the limitation in LoopUnrollPass that exiting block must be either header or latch. Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto, fhahn, efriedma Reviewed By: etiotto, fhahn, efriedma Subscribers: efriedma, lkail, xbolva00, hiraditya, zzheng, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D80477	2020-05-29 01:18:38 +00:00
Whitney Tsang	47ffc81830	Revert "[LoopUnroll] Support loops with exiting block that is neither header nor" This reverts commit `2810582265`. Revert until http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/7334 is resolved.	2020-05-28 19:10:27 +00:00
Whitney Tsang	2810582265	[LoopUnroll] Support loops with exiting block that is neither header nor latch. Summary: Remove the limitation in LoopUnrollPass that exiting block must be either header or latch. Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto, fhahn, efriedma Reviewed By: etiotto, fhahn, efriedma Subscribers: efriedma, lkail, xbolva00, hiraditya, zzheng, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D80477	2020-05-28 18:27:09 +00:00
Florian Hahn	b54a663312	[LoopUnroll] Extend test case with additional loop with larger TC.	2020-05-17 13:55:11 +01:00
Florian Hahn	9e2a99e5b7	[LoopUnroll] Precommit test for PR459393.	2020-05-17 13:29:36 +01:00
Eli Friedman	11aa3707e3	StoreInst should store Align, not MaybeAlign This is D77454, except for stores. All the infrastructure work was done for loads, so the remaining changes necessary are relatively small. Differential Revision: https://reviews.llvm.org/D79968	2020-05-15 12:26:58 -07:00
Eli Friedman	4532a50899	Infer alignment of unmarked loads in IR/bitcode parsing. For IR generated by a compiler, this is really simple: you just take the datalayout from the beginning of the file, and apply it to all the IR later in the file. For optimization testcases that don't care about the datalayout, this is also really simple: we just use the default datalayout. The complexity here comes from the fact that some LLVM tools allow overriding the datalayout: some tools have an explicit flag for this, some tools will infer a datalayout based on the code generation target. Supporting this properly required plumbing through a bunch of new machinery: we want to allow overriding the datalayout after the datalayout is parsed from the file, but before we use any information from it. Therefore, IR/bitcode parsing now has a callback to allow tools to compute the datalayout at the appropriate time. Not sure if I covered all the LLVM tools that want to use the callback. (clang? lli? Misc IR manipulation tools like llvm-link?). But this is at least enough for all the LLVM regression tests, and IR without a datalayout is not something frontends should generate. This change had some sort of weird effects for certain CodeGen regression tests: if the datalayout is overridden with a datalayout with a different program or stack address space, we now parse IR based on the overridden datalayout, instead of the one written in the file (or the default one, if none is specified). This broke a few AVR tests, and one AMDGPU test. Outside the CodeGen tests I mentioned, the test changes are all just fixing CHECK lines and moving around datalayout lines in weird places. Differential Revision: https://reviews.llvm.org/D78403	2020-05-14 13:03:50 -07:00
Jonathan Roelofs	7c5d2bec76	[llvm] Fix missing FileCheck directive colons https://reviews.llvm.org/D77352	2020-04-06 09:59:08 -06:00
Sam Parker	fc2a5ef9c8	[NFC][PowerPC] Update test Run the update script on one of the loop unroll tests.	2020-03-18 16:21:37 +00:00
Max Kazantsev	3dc6e53c97	[LoopPeel] Turn incorrect assert into a check Summary: This patch replaces an incorrect assert with a check. Previously it asserted that if SCEV cannot prove `isKnownPredicate(A != B)`, then it should be able to prove `isKnownPredicate(A == B)`. Both of these facts may be unprovable. It is shown in the provided test: Could not prove: `{-294,+,-2}<%bb1> != 0` Asserting: `{-294,+,-2}<%bb1> == 0` Obviously, this SCEV is not equal to zero, but 0 is in its range so we also cannot prove that it is not zero. Instead of assert, we should be checking the required conditions explicitly. Reviewers: lebedev.ri, fhahn, sanjoy, fedor.sergeev Reviewed By: lebedev.ri Subscribers: hiraditya, zzheng, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76050	2020-03-12 17:23:07 +07:00
Roman Lebedev	1badf7c33a	[InstCombine] Forego of one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y is a constant Summary: This is potentially more friendly for further optimizations, analyses, e.g.: https://godbolt.org/z/G24anE This resolves phase-ordering bug that was introduced in D75145 for https://godbolt.org/z/2gBwF2 https://godbolt.org/z/XvgSua Reviewers: spatel, nikic, dmgreen, xbolva00 Reviewed By: nikic, xbolva00 Subscribers: hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75757	2020-03-06 21:39:07 +03:00
Arkady Shlykov	3dcaf296ae	[Loop Peeling] Add possibility to enable peeling on loop nests. Summary: Current peeling implementation bails out in case of loop nests. The patch introduces a field in TargetTransformInfo structure that certain targets can use to relax the constraints if it's profitable (disabled by default). Also additional option is added to enable peeling manually for experimenting and testing purposes. Reviewers: fhahn, lebedev.ri, xbolva00 Reviewed By: xbolva00 Subscribers: RKSimon, xbolva00, hiraditya, zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D70304	2020-03-02 08:37:11 -08:00
Justin Bogner	b81a337be7	[LoopUnroll] Avoid UB when converting from WeakVH to `Value *` Calling `operator*` on a WeakVH with a null value yields a null reference, which is UB. Avoid this by implicitly converting the WeakVH to a `Value *` rather than dereferencing and then taking the address for the type conversion. Differential Revision: https://reviews.llvm.org/D73280	2020-01-23 10:36:39 -08:00
Evgeniy Brevnov	10357e1c89	[LoopUtils] Better accuracy for getLoopEstimatedTripCount. Summary: Current implementation of getLoopEstimatedTripCount returns 1 iteration less than it should. The reason is that in bottom tested loop first iteration is executed before first back branch is taken. For example for loop with !{!"branch_weights", i32 1 // taken, i32 1 // exit} metadata getLoopEstimatedTripCount gives 1 while actual number of iterations is 2. Reviewers: Ayal, fhahn Reviewed By: Ayal Subscribers: mgorny, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71990	2020-01-20 16:58:07 +07:00
Arkady Shlykov	c87982b467	Revert "[Loop Peeling] Add possibility to enable peeling on loop nests." This reverts commit `3f3017e` because there's a failure on peel-loop-nests.ll with LLVM_ENABLE_EXPENSIVE_CHECKS on. Differential Revision: https://reviews.llvm.org/D70304	2020-01-16 10:33:38 -08:00
Mircea Trofin	7acfda633f	[llvm] Make new pass manager's OptimizationLevel a class Summary: The old pass manager separated speed optimization and size optimization levels into two unsigned values. Coalescing both in an enum in the new pass manager may lead to unintentional casts and comparisons. In particular, taking a look at how the loop unroll passes were constructed previously, the Os/Oz are now (==new pass manager) treated just like O3, likely unintentionally. This change disallows raw comparisons between optimization levels, to avoid such unintended effects. As an effect, the O{s|z} behavior changes for loop unrolling and loop unroll and jam, matching O2 rather than O3. The change also parameterizes the threshold values used for loop unrolling, primarily to aid testing. Reviewers: tejohnson, davidxl Reviewed By: tejohnson Subscribers: zzheng, ychen, mehdi_amini, hiraditya, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72547	2020-01-16 09:00:56 -08:00
Arkady Shlykov	3f3017e162	[Loop Peeling] Add possibility to enable peeling on loop nests. Summary: Current peeling implementation bails out in case of loop nests. The patch introduces a field in TargetTransformInfo structure that certain targets can use to relax the constraints if it's profitable (disabled by default). Also additional option is added to enable peeling manually for experimenting and testing purposes. Reviewers: fhahn, lebedev.ri, xbolva00 Reviewed By: xbolva00 Subscribers: xbolva00, hiraditya, zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D70304	2020-01-15 08:25:21 -08:00
Arkady Shlykov	019c8d9d15	[NFC] Adjust test cases numbering, test commit. Summary: Test case test14 is missing, adjust the numbering to have a consecutive range. Also a test commit to verify commit access.	2020-01-15 03:44:57 -08:00
Sjoerd Meijer	356685a1d8	Follow up of `67bf9a6154`, minor fix in test case, removed duplicate option	2020-01-10 09:41:41 +00:00
Sjoerd Meijer	67bf9a6154	[SCEV] Recognise hardware-loop intrinsic loop.decrement.reg Teach SCEV about the @loop.decrement.reg intrinsic, which has exactly the same semantics as a sub expression. This allows us to query hardware-loops, which contain this @loop.decrement.reg intrinsic, so that we can calculate iteration counts, exit values, etc. of hardware-loops. This "int_loop_decrement_reg" intrinsic is defined as "IntrNoDuplicate". Thus, while hardware-loops and tripcounts now become analysable by SCEV, this prevents the usual loop transformations from applying transformations on hardware-loops, which is what we want at this point, for which I have added test cases for loop unrolling and IndVarSimplify and LFTR. Differential Revision: https://reviews.llvm.org/D71563	2020-01-10 09:35:00 +00:00
Sam Parker	15c7fa4d11	[ARM][MVE] Don't unroll intrinsic loops. We don't unroll vector loops for MVE targets, but we miss the case when loops only contain intrinsic calls. So just move the logic a bit to catch this case. Differential Revision: https://reviews.llvm.org/D72440	2020-01-09 11:57:34 +00:00
Fangrui Song	502a77f125	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351	2019-12-24 15:57:33 -08:00
Roman Lebedev	0f22e783a0	[InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() (PR44100) rL341831 moved one-use check higher up, restricting a few folds that produced a single instruction from two instructions to the case where the inner instruction would go away. Original commit message: > InstCombine: move hasOneUse check to the top of foldICmpAddConstant > > There were two combines not covered by the check before now, > neither of which actually differed from normal in the benefit analysis. > > The most recent seems to be because it was just added at the top of the > function (naturally). The older is from way back in 2008 (r46687) > when we just didn't put those checks in so routinely, and has been > diligently maintained since. From the commit message alone, there doesn't seem to be a deeper motivation, or a deeper problem that it was trying to solve, other than 'fixing the wrong one-use check'. As i have briefly discussed in IRC with Tim, the original motivation can no longer be recovered, too much time has passed. However i believe that the original fold was doing the right thing, we should be performing such a transformation even if the inner `add` will not go away - that will still unchain the comparison from `add`, it will no longer need to wait for `add` to compute. Doing so doesn't seem to break any particular idioms, at least as far as i can see. References https://bugs.llvm.org/show_bug.cgi?id=44100	2019-12-02 18:06:15 +03:00
dfukalov	6fd11b14f6	[AMDGPU] Tune inlining parameters for AMDGPU target (part 2) Summary: Most of IR instructions got better code size estimations after commit `47a5c36b`. So default parameters values should be updated to improve inlining and unrolling for the target. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70391	2019-11-19 16:33:16 +03:00
Philip Reames	8748be7750	[LoopPred] Enable new transformation by default The basic idea of the transform is to convert variant loop exit conditions into invariant exit conditions by changing the iteration on which the exit is taken when we know that the trip count is unobservable. See the original patch which introduced the code for a more complete explanation. The individual parts of this have been reviewed, the result has been fuzzed, and then further analyzed by hand, but despite all of that, I will not be surprised to see breakage here. If you see problems, please don't hesitate to revert - though please do provide a test case. The most likely class of issues are latent SCEV bugs and without a reduced test case, I'll be essentially stuck on reducing them. (Note: A bunch of tests were opted out of the new transform to preserve coverage. That landed in a previous commit to simplify revert cycles if they turn out to be needed.)	2019-11-06 15:41:57 -08:00
Roman Lebedev	4fe94d0331	[LoopUnroll] countToEliminateCompares(): fix handling of [in]equality predicates (PR43840) Summary: I believe this bisects to https://reviews.llvm.org/D44983 (`[LoopUnroll] Only peel if a predicate becomes known in the loop body.`) While that revision did contain tests that showed arguably-subpar peeling for [in]equality predicates that [not] happen in the middle of the loop, it also disabled peeling for the first loop iteration, because latch would be canonicalized to [in]equality comparison.. That was intentional as per https://reviews.llvm.org/D44983#1059583. I'm not 100% sure that i'm using correct checks here, but this fix appears to be going in the right direction.. Let me know if i'm missing some checks here.. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43840 \| PR43840 ]]. Reviewers: fhahn, mkazantsev, efriedma Reviewed By: fhahn Subscribers: xbolva00, hiraditya, zzheng, llvm-commits, fhahn Tags: #llvm Differential Revision: https://reviews.llvm.org/D69617	2019-11-06 15:08:59 +03:00
Roman Lebedev	432a12c803	[NFC][LoopUnroll] Update test coverage for peeling w/ inequality predicates	2019-11-06 15:08:59 +03:00
dfukalov	47a5c36b37	[AMDGPU] Improve code size cost model (part 2) Summary: Added estimations for ShuffleVector, some cast and arithmetic instructions Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69629	2019-11-06 13:55:48 +03:00
Roman Lebedev	12c4a71ca9	[LoopUnroll] peel-loop-conditions.ll: add some 'is even/odd' peeling tests	2019-11-05 13:02:57 +03:00
Roman Lebedev	0405b48646	[NFC][LoopUnroll] Tests for peeling of first iteration (PR43840)	2019-10-30 18:08:54 +03:00
Florian Hahn	596e4ab97a	[LCSSA] Forget values we create LCSSA phis for Summary: Currently we only forget the loop we added LCSSA phis for. But SCEV expressions in other loops could also depend on the instruction we added a PHI for and currently we do not invalidate those expressions. This can happen when we use ScalarEvolution before converting a function to LCSSA form. The SCEV expressions will refer to the non-LCSSA value. If this SCEV expression is then used with the expander, we do not preserve LCSSA form. This patch properly forgets the values we created PHIs for. Those need to be recomputed again. This patch fixes PR43458. Currently SCEV::verify does not catch this mismatch and any test would need to run multiple passes to trigger the error (e.g. -loop-reduce -loop-unroll). I will also look into catching this kind of mismatch in the verifier. Also, we currently forget the whole loop in LCSSA and I'll check if we can be more surgical. Reviewers: efriedma, sanjoy.google, reames Reviewed By: efriedma Subscribers: zzheng, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68194	2019-10-29 12:05:09 +00:00
Zhaoshi Zheng	1128fa0924	[Unroll] Do NOT unroll a loop with small runtime upperbound For a runtime loop if we can compute its trip count upperbound: Don't unroll if: 1. loop is not guaranteed to run either zero or upperbound iterations; and 2. trip count upperbound is less than UnrollMaxUpperBound Unless user or TTI asked to do so. If unrolling, limit unroll factor to loop's trip count upperbound. Differential Revision: https://reviews.llvm.org/D62989 Change-Id: I6083c46a9d98b2e22cd855e60523fdc5a4929c73 llvm-svn: 373017	2019-09-26 21:40:27 +00:00
Serguei Katkov	a44768858c	[Unroll] Add an option to control complete unrolling Add an ability to specify the max full unroll count for LoopUnrollPass pass in pass options. Reviewers: fhahn, fedor.sergeev Reviewed By: fedor.sergeev Subscribers: hiraditya, zzheng, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D67701 llvm-svn: 372305	2019-09-19 06:57:29 +00:00
Florian Hahn	1bd58870e5	[LoopUnroll] Use LoopSize+1 as threshold, to allow unrolling loops matching LoopSize. We use `< UP.Threshold` later on, so we should use LoopSize + 1, to allow unrolling if the result won't exceed the loop size. Fixes PR43305. Reviewers: efriedma, dmgreen, paquette Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D67594 llvm-svn: 372084	2019-09-17 09:02:48 +00:00
Bjorn Pettersson	d804bd17de	[LoopUnroll] Handle certain PHIs in full unrolling properly Summary: When reconstructing the CFG of the loop after unrolling, LoopUnroll could in some cases remove the phi operands of loop-carried values instead of preserving them, resulting in undef phi values after loop unrolling. When doing this reconstruction, avoid removing incoming phi values for phis in the successor blocks if the successor is the block we are jumping to anyway. Patch-by: ebevhan Reviewers: fhahn, efriedma Reviewed By: fhahn Subscribers: bjope, lebedev.ri, zzheng, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66334 llvm-svn: 369886	2019-08-26 09:29:53 +00:00
Serguei Katkov	036e636aa7	[Loop Peeling] Fix silly bug in metadata update. We must update loop metadata before we move to the parent loop, if it is present. llvm-svn: 369637	2019-08-22 10:06:46 +00:00
Philip Reames	6cca3ad43e	[RLEV] Rewrite loop exit values for multiple exit loops w/o overall loop exit count We already supported rewriting loop exit values for multiple exit loops, but if any of the loop exits were not computable, we gave up on all loop exit values. This patch generalizes the existing code to handle individual computable loop exits where possible. As discussed in the review, this is a starting point for figuring out a better API. The code is a bit ugly, but getting it in lets us test as we go. Differential Revision: https://reviews.llvm.org/D65544 llvm-svn: 368898	2019-08-14 18:27:57 +00:00
David Green	11c4602fce	[MVE] Don't try to unroll vectorised MVE loops Due to the nature of the beat system in the MVE architecture, along with tail predication and low-overhead loops, unrolling has less benefit compared to normal loops. You can not, for example, hide the latency of a load with other instructions as you can for scalar code. Preventing unrolling also makes the code easier to read and reason about. So if a loop contains vector code, don't enable the runtime unrolling. At least for the time being. Differential Revision: https://reviews.llvm.org/D65803 llvm-svn: 368530	2019-08-11 08:53:18 +00:00
Serguei Katkov	de67affd00	[Loop Peeling] Introduce an option for profile based peeling disabling. This patch adds an ability to disable profile based peeling causing the peeling of all iterations and as a result prohibits further unroll/peeling attempts on that loop. The motivation to get an ability to separate peeling usage in pipeline where in the first part we peel only separate iterations if needed and later in pipeline we apply the full peeling which will prohibit further peeling. Reviewers: reames, fhahn Reviewed By: reames Subscribers: hiraditya, zzheng, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D64983 llvm-svn: 367668	2019-08-02 09:32:52 +00:00
Serguei Katkov	bbdcc82111	[Loop Peeling] Do not close further unroll/peel if profile based peeling was not used. Current peeling cost model can decide to peel off not all iterations but only some of them to eliminate conditions on phi. At the same time if any peeling happens the door for further unroll/peel optimizations on that loop closes because the part of the code thinks that if peeling happened it is profile based peeling and all iterations are peeled off. To resolve this inconsistency the patch provides the flag which states whether the full peeling basing on profile is enabled or not and peeling cost model is able to modify this field like it does not PeelCount. In a separate patch I will introduce an option to allow/disallow peeling basing on profile. To avoid infinite loop peeling the patch tracks the total number of peeled iteration through llvm.loop.peeled.count loop metadata. Reviewers: reames, fhahn Reviewed By: reames Subscribers: hiraditya, zzheng, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D64972 llvm-svn: 367647	2019-08-02 04:29:23 +00:00
Serguei Katkov	cde00c02e1	[Loop Peeling] Fix idom detection algorithm. We'd like to determine the idom of the exit block after peeling one iteration. Let Exit be the exit block. Let ExitingSet be the set of predecessors of the Exit block. They are exiting blocks. Let Latch' and ExitingSet' be the copies after peeling. We'd like to find idom'(Exit), the idom of Exit after peeling. It is evident that idom'(Exit) will be the nearest common dominator of ExitingSet and ExitingSet'. idom(Exit) is the nearest common dominator of ExitingSet. idom(Exit)' is the nearest common dominator of ExitingSet'. Taking into account that we have a single Latch, Latch' will dominate Header and idom(Exit). So idom'(Exit) is the nearest common dominator of idom(Exit)' and Latch'. All these basic blocks are in the same loop, so what we find is (nearest common dominator of idom(Exit) and Latch)'. Reviewers: reames, fhahn Reviewed By: reames Subscribers: hiraditya, zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D65292 llvm-svn: 367044	2019-07-25 19:31:50 +00:00
Serguei Katkov	c6c31da867	[Loop Peeling] Fix the handling of branch weights of peeled off branches. Current algorithm to update branch weights of latch block and its copies is based on the assumption that number of peeling iterations is approximately equal to trip count. However it is not correct. According to profitability check in one case we can decide to peel in case it helps to reduce the number of phi nodes. In this case the number of peeled iterations can be less than the estimated trip count. This patch introduces another way to set the branch weights of peeled-off branches. Let F be the weight of the edge from latch to header. Let E be the weight of the edge from latch to exit. F/(F+E) is the probability to go to loop and E/(F+E) is the probability to go to exit. Then, Estimated TripCount = F / E. For I-th (counting from 0) peeled off iteration we set the weights for the peeled latch as (TC - I, 1). It gives us a reasonable distribution. The probability to go to exit 1/(TC-I) increases. At the same time the estimated trip count of remaining loop reduces by I. As a result after peeling off N iterations the weights will be (F - N * E, E) and trip count of loop becomes F / E - N or TC - N. The idea is taken from the review of the patch D63918 proposed by Philip. Reviewers: reames, mkuper, iajbar, fhahn Reviewed By: reames Subscribers: hiraditya, zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D64235 llvm-svn: 366665	2019-07-22 05:15:34 +00:00
Nick Desaulniers	c4f245b40a	[LoopUnroll+LoopUnswitch] do not transform loops containing callbr Summary: There is currently a correctness issue when unrolling loops containing callbr's where their indirect targets are being updated correctly to the newly created labels, but their operands are not. This manifests in unrolled loops where the second and subsequent copies of callbr instructions have blockaddresses of the label from the first instance of the unrolled loop, which would result in nonsensical runtime control flow. For now, conservatively do not unroll the loop. In the future, I think we can pursue unrolling such loops provided we transform the cloned callbr's operands correctly. Such a transform and its legalities are being discussed in: https://reviews.llvm.org/D64101 Link: https://bugs.llvm.org/show_bug.cgi?id=42489 Link: https://groups.google.com/forum/#!topic/clang-built-linux/z-hRWP9KqPI Reviewers: fhahn, hfinkel, efriedma Reviewed By: fhahn, hfinkel, efriedma Subscribers: efriedma, hiraditya, zzheng, dmgreen, llvm-commits, pirama, kees, nathanchance, E5ten, craig.topper, chandlerc, glider, void, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D64368 llvm-svn: 366130	2019-07-15 21:16:29 +00:00
David Zarzycki	12400b9783	[Testing] Add missing "REQUIRES: asserts" This broke after r366048 / https://reviews.llvm.org/D63923 llvm-svn: 366065	2019-07-15 14:12:35 +00:00
Serguei Katkov	d021ad9fbe	[Loop Peeling] Fix the bug with IDom setting for exit loops It is possible that loop exit has two predecessors in a loop body. In this case after the peeling the iDom of the exit should be a clone of iDom of original exit but not a clone of a block coming to this exit. Reviewers: reames, fhahn Reviewed By: reames Subscribers: hiraditya, zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D64618 llvm-svn: 366050	2019-07-15 09:13:11 +00:00
Serguei Katkov	3ed93b4673	[Loop Peeling] Enable peeling for loops with multiple exits This CL enables peeling of the loop with multiple exits where one exit should be from latch and others are basic blocks with call to deopt. The peeling is enabled under the flag which is false by default. Reviewers: reames, mkuper, iajbar, fhahn Reviewed By: reames Subscribers: xbolva00, hiraditya, zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D63923 llvm-svn: 366048	2019-07-15 08:26:45 +00:00
Florian Hahn	4c11b5268c	[LoopUnroll] Add support for loops with exiting headers and uncond latches. This patch generalizes the UnrollLoop utility to support loops that exit from the header instead of the latch. Usually, LoopRotate would take care of most of those cases, but in some cases (e.g. -Oz), LoopRotate does not kick in. Codesize impact looks relatively neutral on ARM64 with -Oz + LTO. Program master patch diff External/S.../CFP2006/447.dealII/447.dealII 629060.00 627676.00 -0.2% External/SPEC/CINT2000/176.gcc/176.gcc 1245916.00 1244932.00 -0.1% MultiSourc...Prolangs-C/simulator/simulator 86100.00 86156.00 0.1% MultiSourc...arks/Rodinia/backprop/backprop 66212.00 66252.00 0.1% MultiSourc...chmarks/Prolangs-C++/life/life 67276.00 67312.00 0.1% MultiSourc...s/Prolangs-C/compiler/compiler 69824.00 69788.00 -0.1% MultiSourc...Prolangs-C/assembler/assembler 86672.00 86696.00 0.0% Reviewers: efriedma, vsk, paquette Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D61962 llvm-svn: 364398	2019-06-26 09:16:57 +00:00
Matt Arsenault	5a89ba7343	InstCombine: Preserve nuw when reassociating nuw ops [1/3] Alive says this is OK. llvm-svn: 364233	2019-06-24 21:36:59 +00:00
Orlando Cazalet-Hyams	1251cac62a	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 > llvm-svn: 363046 llvm-svn: 363786	2019-06-19 10:50:47 +00:00
Adrian Prantl	1db8d4a866	Fix broken debug info in an !llvm.loop attachment in this testcase. llvm-svn: 363730	2019-06-18 20:07:53 +00:00
Fangrui Song	ac14f7b10c	[lit] Delete empty lines at the end of lit.local.cfg NFC llvm-svn: 363538	2019-06-17 09:51:07 +00:00
Orlando Cazalet-Hyams	a947156396	Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion" This reverts commit `1a0f7a2077`. See phabricator thread for D60831. llvm-svn: 363132	2019-06-12 08:34:51 +00:00
Orlando Cazalet-Hyams	1a0f7a2077	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 llvm-svn: 363046	2019-06-11 10:37:20 +00:00
David Green	d847aa573b	[ARM] Enable Unroll UpperBound This option allows loops with small max trip counts to be fully unrolled. This can help with code like the remainder loops from manually unrolled loops like those that appear in the cmsis dsp library. We would apparently previously runtime unroll them with the default unroll count (4). Differential Revision: https://reviews.llvm.org/D63064 llvm-svn: 362928	2019-06-10 10:22:14 +00:00
Matt Arsenault	8dbeb9256c	TTI: Improve default costs for addrspacecast For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436	2019-06-03 18:41:34 +00:00
Simon Tatham	760df47b77	[ARM] Replace fp-only-sp and d16 with fp64 and d32. Those two subtarget features were awkward because their semantics are reversed: each one indicates the _lack_ of support for something in the architecture, rather than the presence. As a consequence, you don't get the behavior you want if you combine two sets of feature bits. Each SubtargetFeature for an FP architecture version now comes in four versions, one for each combination of those options. So you can still say (for example) '+vfp2' in a feature string and it will mean what it's always meant, but there's a new string '+vfp2d16sp' meaning the version without those extra options. A lot of this change is just mechanically replacing positive checks for the old features with negative checks for the new ones. But one more interesting change is that I've rearranged getFPUFeatures() so that the main FPU feature is appended to the output list before rather than after the features derived from the Restriction field, so that -fp64 and -d32 can override defaults added by the main feature. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60691 llvm-svn: 361845	2019-05-28 16:13:20 +00:00
Philip Reames	bd8d309111	[IndVars] Extend reasoning about loop invariant exits to non-header blocks Noticed while glancing through the code for other reasons. The extension is trivial enough, decided to just do it. llvm-svn: 360694	2019-05-14 17:20:10 +00:00
Kostya Serebryany	b9c5768302	revert r360162 as it breaks most of the buildbots llvm-svn: 360190	2019-05-07 20:57:11 +00:00
Orlando Cazalet-Hyams	78a6062c24	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel Reviewed By: hfinkel Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 llvm-svn: 360162	2019-05-07 15:37:38 +00:00
Florian Hahn	893aea58ea	[LoopUnroll] Allow unrolling if the unrolled size does not exceed loop size. Summary: In the following cases, unrolling can be beneficial, even when optimizing for code size: 1) very low trip counts 2) potential to constant fold most instructions after fully unrolling. We can unroll in those cases, by setting the unrolling threshold to the loop size. This might highlight some cost modeling issues and fixing them will have a positive impact in general. Reviewers: vsk, efriedma, dmgreen, paquette Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D60265 llvm-svn: 358586	2019-04-17 15:57:43 +00:00
Eric Christopher	cee313d288	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552	2019-04-17 04:52:47 +00:00
Eric Christopher	a863435128	Temporarily Revert "Add basic loop fusion pass." As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546	2019-04-17 02:12:23 +00:00
Hiroshi Yamauchi	09e539fcae	[PGO] Profile guided code size optimization. Summary: Enable some of the existing size optimizations for cold code under PGO. A ~5% code size saving in big internal app under PGO. The way it gets BFI/PSI is discussed in the RFC thread http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html Note it doesn't currently touch loop passes. Reviewers: davidxl, eraman Reviewed By: eraman Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59514 llvm-svn: 358422	2019-04-15 16:49:00 +00:00
Florian Hahn	6ab83b7db6	[LoopUnrollPeel] Add case where we should forget the peeled loop from SCEV. The test case requires the peeled loop to be forgotten after peeling, even though it does not have a parent. When called via the unroller, SE->forgetTopmostLoop is also called, so the test case would also pass without any SCEV invalidation, but peelLoop is exposed as utility function. Also, in the test case, simplifyLoop will make changes, removing the loop from SCEV, but it is better to not rely on this behavior. Reviewers: sanjoy, mkazantsev Reviewed By: mkazantsev Tags: #llvm Differential Revision: https://reviews.llvm.org/D58192 llvm-svn: 354031	2019-02-14 13:59:39 +00:00
Anna Thomas	2dfa412efe	[UnrollRuntime] Fix domTree failures in multiexit unrolling Summary: This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case. We initially had a restrictive check that the IDom is only updated when it is the header of the loop. However, we also need to update the IDom to the correct one when the IDom is any block within the original loop. See added test cases (which fail dom tree verification without the patch). Reviewers: reames, mzolotukhin, mkazantsev, hfinkel Reviewed by: brzycki, kuhar Subscribers: zzheng, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D56284 llvm-svn: 350640	2019-01-08 17:16:25 +00:00
Anna Thomas	bae11e7999	[UnrollRuntime] NFC: Updated exiting tests and added more tests Added more tests for multiple exiting blocks to the LatchExit. Today these cases are not supported. Patch to follow soon. llvm-svn: 350135	2018-12-28 19:21:50 +00:00
Anna Thomas	98743fa77a	[UnrollRuntime] NFC: Add comment and verify LCSSA Added -verify-loop-lcssa to test cases. Updated comments in ConnectProlog. llvm-svn: 350131	2018-12-28 18:52:16 +00:00
Michael Kruse	3284775b70	[LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops. When using clang with `-fno-unroll-loops` (implicitly added with `-O1`), the LoopUnrollPass is not added to the (legacy) pass pipeline. This also means that it will not process any loop metadata such as llvm.loop.unroll.enable (which is generated by #pragma unroll), and WarnMissedTransformationsPass emits a warning that a forced transformation has not been applied (see https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html). Such explicit transformations should take precedence over disabling heuristics. This patch unconditionally adds LoopUnrollPass to the optimizing pipeline (that is, it is still not added with `-O0`), but passes a flag indicating whether automatic unrolling is dis-/enabled. This is the same approach as LoopVectorize uses. The new pass manager's pipeline builder has no option to disable unrolling, hence the problem does not apply. Differential Revision: https://reviews.llvm.org/D55716 llvm-svn: 349509	2018-12-18 17:16:05 +00:00
Michael Kruse	7244852557	[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. When multiple loop transformations are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, #pragma clang loop unroll_and_jam(enable) #pragma clang loop distribute(enable) is the same as #pragma clang loop distribute(enable) #pragma clang loop unroll_and_jam(enable) and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled before the UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also, the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used. This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance, !0 = !{!0, !1, !2} !1 = !{!"llvm.loop.unroll_and_jam.enable"} !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3} !3 = !{!"llvm.loop.distribute.enable"} defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop. Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account. For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patch introduces a WarnMissedTransformations pass, to warn about orphaned transformations. Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated. To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied. With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling). Reviewed By: hfinkel, dmgreen Differential Revision: https://reviews.llvm.org/D49281 Differential Revision: https://reviews.llvm.org/D55288 llvm-svn: 348944	2018-12-12 17:32:52 +00:00
Fedor Sergeev	412ed34744	[LoopUnroll] allow customization for new-pass-manager version of LoopUnroll Unlike its legacy counterpart new pass manager's LoopUnrollPass does not provide any means to select which flavors of unroll to run (runtime, peeling, partial), relying on global defaults. In some cases having ability to run a restricted LoopUnroll that does more than LoopFullUnroll is needed. Introduced LoopUnrollOptions to select optional unroll behaviors. Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing. Reviewers: chandlerc, tejohnson Differential Revision: https://reviews.llvm.org/D53440 llvm-svn: 345723	2018-10-31 14:33:14 +00:00
Fedor Sergeev	f923e86205	[LoopUnroll] NFC. Factor out runtime-loop.ll common test behavior. Adding COMMON prefix to get common part handled there. Needed to simplify test changes for D53440. llvm-svn: 345538	2018-10-29 20:38:23 +00:00
Sam Parker	a16667e79b	[ARM] Use Cortex-A57 sched model for Cortex-A72 This mirrors what we already do for AArch64 as the cores are similar. As discussed in the review, enabling the machine scheduler causes more variations in performance changes so it is not enabled for now. This patch improves LNT scores by a geomean of 1.57% at -O3. Differential Revision: https://reviews.llvm.org/D53562 llvm-svn: 345272	2018-10-25 15:08:29 +00:00
Vyacheslav Zakharin	e06831a3b2	Remove LoopID metadata from the branch instruction that follows the peeled iterations. Differential Revision: https://reviews.llvm.org/D52176 llvm-svn: 343054	2018-09-26 01:03:21 +00:00
David Green	9108c2b921	[LoopUnroll] Add check to Latch's terminator in UnrollRuntimeLoopRemainder In this patch, I'm adding an extra check to the Latch's terminator in llvm::UnrollRuntimeLoopRemainder, similar to how it is already done in the llvm::UnrollLoop. The compiler would crash if this function is called with a malformed loop. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D51486 llvm-svn: 342958	2018-09-25 10:08:47 +00:00
Matt Arsenault	72d27f5525	AMDGPU: Fix tests using old number for constant address space llvm-svn: 341770	2018-09-10 02:54:25 +00:00
Craig Topper	b7b353be60	[X86] Make Feature64Bit useful We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've removed the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or the user passing -mattr=+64bit. I've changed the assert to a report_fatal_error so it's not lost in Release builds. The test updates are to fix things that tripped the new error. Differential Revision: https://reviews.llvm.org/D51231 llvm-svn: 341022	2018-08-30 06:01:05 +00:00
Matt Arsenault	2c1a570aab	LoopUnroll: Allow analyzing intrinsic call costs I'm not sure why the code here is skipping calls since TTI does try to do something for general calls, but it at least should allow intrinsics. Skip intrinsics that should not be omitted as calls, which is by far the most common case on AMDGPU. llvm-svn: 335645	2018-06-26 18:51:17 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need to collect debug information for labels, i.e., label name, the function the label belongs to, line number in the file, and the address where the label is located. In order to keep this information in LLVM IR and to allow the backend to generate debug information correctly, we create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep as much debug information as possible even when the code is optimized. So, we create a new kind of intrinsic for label metadata to prevent the metadata from being eliminated along with the basic block. The intrinsic will remain as long as it is not optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. The backend can get the label metadata through the intrinsic's parameter. We also create a DIBuilder API for labels to be used by the frontend. The frontend can use createLabel() to allocate DILabel objects, and use insertLabel() to insert the llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Florian Hahn	ac27758895	[LoopUnroll] Only peel if a predicate becomes known in the loop body. If a predicate does not become known after peeling, peeling is unlikely to be beneficial. Reviewers: mcrosier, efriedma, mkazantsev, junbuml Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D44983 llvm-svn: 330250	2018-04-18 12:29:24 +00:00
Chad Rosier	45735b8e40	[LoopUnroll] Make LoopPeeling respect the AllowPeeling preference. The SimpleLoopUnrollPass isn't supposed to perform loop peeling. Differential Revision: https://reviews.llvm.org/D45334 llvm-svn: 329395	2018-04-06 13:57:21 +00:00
Ikhlas Ajbar	b7322e8ac7	peel loops with runtime small trip counts For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329042	2018-04-03 03:39:43 +00:00
Krzysztof Parzyszek	fce30c2ba3	Revert "peel loops with runtime small trip counts" This reverts commit r328854, it breaks some Hexagon tests. llvm-svn: 328875	2018-03-30 16:55:44 +00:00

501 Commits