llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	a34428f07d	[LV] Use variables instead of hard-coded metadata IDs in tests.	2022-08-16 12:21:49 +01:00
Zain Jaffal	94d21a94d9	[AArch64] Add tests to check for loop vectorization of non temporal loads Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D131899	2022-08-16 09:40:51 +01:00
Philip Reames	33e7a0a33b	[RISCV][LV] Add test coverage for upcoming dependence distance handling change	2022-08-15 15:20:36 -07:00
Florian Hahn	4f04be5649	[LV] Add tests for vectorizing select of minimum idx idiom. Test cases for selecting the index with the minimum value.	2022-08-14 17:44:11 +01:00
Martin Sebor	0dcfe7aa35	[InstCombine] Tighten up known library function signature tests (PR #56463 ) Replace a switch statement used to validate arguments to known library functions with a more consistent table-driven approach and tighten it up.	2022-08-10 14:15:46 -06:00
Dinar Temirbulatov	cab6cd6834	[AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding. After D121595 was commited, I noticed regressions assosicated with small trip count numbersvectorisation by tail folding with scalable vectors. As a solution for those issues I propose to introduce the minimal trip count threshold value. Differential Revision: https://reviews.llvm.org/D130755	2022-08-09 22:10:17 +01:00
jacquesguan	45bae1be90	[RISCV][test] Add inloop reduction vectorize test. NFC	2022-08-04 15:06:44 +08:00
Philip Reames	0b47615fcf	[LV] Recognize store of invariant value to invariant address as uniform This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result. For context, the basic structure of the existing code and how the change fits in: * First, we select a widening strategy. (The result is irrelevant for this patch.) * Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.) * If it is, we overrule the widening strategy, and unconditionally scalarize. * VPReplicationRecipe - which is what actually does the scalarization - knows how to handle unform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.) An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions. Differential Revision: https://reviews.llvm.org/D130364	2022-08-02 08:09:49 -07:00
David Sherwood	4ef9cb6c17	[AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses If we have interleave groups in the loop we want to vectorise then we should fall back on normal vectorisation with a scalar epilogue. In such cases when tail-folding is enabled we'll almost certainly go on to create vplans with very high costs for all vector VFs and fall back on VF=1 anyway. This is likely to be worse than if we'd just used an unpredicated vector loop in the first place. Once the vectoriser has proper support for analysing all the costs for each combination of VF and vectorisation style, then we should be able to remove this. Added an extra test here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D128342	2022-08-02 09:52:33 +01:00
Florian Hahn	ff5ae948a7	[LV] Add variation of test cases with order of phis flipped. Additional tests with integer and pointer inductions for D119661.	2022-08-01 11:38:16 +01:00
Florian Hahn	6e1ba62d0d	[LV] Add additional tests with multiple chained recurrences. Adds more extra tests for D119661. Also update the test to use opaque pointers.	2022-08-01 10:01:19 +01:00
Philip Reames	82c1b136db	[LV] Don't predicate uniform mem op stores unneccessarily We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible. Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the later. It would not be legal to make this same change for merely "uniform" operations. Differential Revision: https://reviews.llvm.org/D130637	2022-07-28 08:55:52 -07:00
Philip Reames	15c645f7ee	[RISCV] Enable (scalable) vectorization by default This change enables vectorization (using scalable vectorization only, fixed vectors are not yet enabled) for RISCV when vector instructions are available for the target configuration. At this point, the resulting configuration should be both stable (e.g. no crashes), and profitable (i.e. few cases where scalar loops beat vector ones), but is not going to be particularly well tuned (i.e. we emit the best possible vector loop). The goal of this change is to align testing across organizations and ensure the default configuration matches what downstreams are using as closely as possible. This exposes a large amount of code which hasn't otherwise been on by default, and thus may not have been fully exercised. Given that, having issues fall out is not unexpected. If you find issues, please make sure to include as much information as you can when reverting this change. Differential Revision: https://reviews.llvm.org/D129013	2022-07-27 12:36:04 -07:00
Philip Reames	e8ceadd0ce	[LV][RISCV] Add a test case for a quality problem mixing vector index and data types The problem here is target independent, but particularly painful on RISCV. If we chose to vectorize such that vscale x 2 x i32 is our widest type and fits in a register, a naive expansion of i64 comparisons results in comparisons and index types at <scalabe x 2 x i64>. This requires both an LMUL of 2, and a VSETVLI toggle in the loop. Note that we could have used <vscale x 2 x i32> for the compairons legally given the range of the trip count.	2022-07-27 11:42:28 -07:00
Florian Hahn	16e0620d6d	[VPlan] Mark VPPredInstPHIRecipe as not having side-effects. Now that all uses of VPPredInstPHIRecipes are properly modeled, they can be treated as not having side-effects, enabling removal.	2022-07-27 19:29:26 +01:00
Philip Reames	43b5e12159	[LV] Refresh an autogened test to pickup naming changes	2022-07-27 10:54:15 -07:00
Philip Reames	ebee4fbb34	[RISCV][LV] Add basic tests for default configuration All of our other tests are functionality tests constrained to some specific configuration. This one is intended to float with the default configuration so that changes in that default are visible in reviews. Note that our current default does not enable vectorization at all; thus the current output is unvectorized.	2022-07-27 09:16:44 -07:00
Florian Hahn	a8fdc247e9	[LV] Add missing uses to test to make them more robust. The changes ensure the VPPredInstPHIRecipes are actually used and cannot be remove by VP-DCE.	2022-07-27 16:06:52 +01:00
Sanjay Patel	bfb9b8e075	[Passes] add a tail-call-elim pass near the end of the opt pipeline We call tail-call-elim near the beginning of the pipeline, but that is too early to annotate calls that get added later. In the motivating case from issue #47852, the missing 'tail' on memset leads to sub-optimal codegen. I experimented with removing the early instance of tail-call-elim instead of just adding another pass, but that appears to be slightly worse for compile-time: +0.15% vs. +0.08% time. "tailcall" shows adding the pass; "tailcall2" shows moving the pass to later, then adding the original early pass back (so 1596886802 is functionally equivalent to 180b0439dc ): https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&remote=rotateright Note that there was an effort to split the tail call functionality into 2 passes - that could help reduce compile-time if we find that this change costs more in compile-time than expected based on the preliminary testing: D60031 Differential Revision: https://reviews.llvm.org/D130374	2022-07-25 15:25:47 -04:00
Nuno Lopes	a30e77b6f6	fix tests for commit `9df0b254d2`	2022-07-23 22:32:30 +01:00
Nuno Lopes	9df0b254d2	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-07-23 21:50:11 +01:00
Philip Reames	bd75350180	[LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst This code confuses LV's "Uniform" and LVL/LAI's "Uniform". Despite the common name, these are different. * LVs notion means that only the first lane of each unrolled part is required. That is, lanes within a single unroll factor are considered uniform. This allows e.g. widenable memory ops to be considered uses of uniform computations. * LVL and LAI's notion refers to all lanes across all unrollings. IsUniformMem is in turn defined in terms of LAI's notion. Thus a UniformMemOpmeans is a memory operation with a loop invariant address. This means the same address is accessed in every iteration. The tweaked piece of code was trying to match a uniform mem op (i.e. fully loop invariant address), but instead checked for LV's notion of uniformity. In theory, this meant with UF > 1, we could speculate a load which wasn't safe to execute. This ends up being mostly silent in current code as it is nearly impossible to create the case where this difference is visible. The closest I've come in the test case from 54cb87, but even then, the incorrect result is only visible in the vplan debug output; before this change we sink the unsafely speculated load back into the user's predicate blocks before emitting IR. Both before and after IR are correct so the differences aren't "interesting". The other test changes are uninteresting. They're cases where LV's uniform analysis is slightly weaker than SCEV isLoopInvariant.	2022-07-21 15:44:34 -07:00
Philip Reames	54cb87964d	[LV] Add a load focused version of the r45679 test This a reproducer for bug in predicated instruction handling. The final result code is correct, but the reasoning by which we get there isn't.	2022-07-21 15:33:42 -07:00
Philip Reames	83993d666b	[LV][SVE] Autogen a test for ease of update	2022-07-21 13:12:53 -07:00
Philip Reames	27945f9282	[RISCV][LV] Split coverage of uniform load with outside use Turns out this has a large effect of tail folding, so split out a single test to cover that case and remove it from the others.	2022-07-21 12:07:26 -07:00
Philip Reames	bb5dc2918f	{RISCV][LV] Add tail folding coverage of uniform load store cases	2022-07-21 11:15:36 -07:00
Philip Reames	56a25ed208	{RISCV][LV] Add a test for uniform store of a loop varying value	2022-07-21 11:15:36 -07:00
Philip Reames	0ae46693f0	{RISCV][LV] Split out and expand tests for uniform loads and stores	2022-07-21 10:42:18 -07:00
David Sherwood	f15b6b2907	[AArch64] Add target hook for preferPredicateOverEpilogue This patch adds the AArch64 hook for preferPredicateOverEpilogue, which currently returns true if SVE is enabled and one of the following conditions (non-exhaustive) is met: 1. The "sve-tail-folding" option is set to "all", or 2. The "sve-tail-folding" option is set to "all+noreductions" and the loop does not contain reductions, 3. The "sve-tail-folding" option is set to "all+norecurrences" and the loop has no first-order recurrences. Currently the default option is "disabled", but this will be changed in a later patch. I've added new tests to show the options behave as expected here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D129560	2022-07-21 17:20:06 +01:00
David Sherwood	ceb6c23b70	[NFC][LoopVectorize] Explicitly disable tail-folding on some SVE tests This patch is in preparation for enabling vectorisation with tail-folding by default for SVE targets. Once we do that many existing tests will break that depend upon having normal unpredicated vector loops. For all such tests I have added the flag: -prefer-predicate-over-epilogue=scalar-epilogue Differential Revision: https://reviews.llvm.org/D129137	2022-07-21 15:23:00 +01:00
Philip Reames	f934b9b073	[LV] Refresh a couple of autogen tests for naming change These appear to just be changes in temporary identifiers; bit suprising we have so many.	2022-07-20 14:47:52 -07:00
Philip Reames	1a73ef75fa	[LV] Autogen a test for ease of update	2022-07-20 08:19:38 -07:00
Philip Reames	be25f52fec	[LV] Autogen several tests for ease of update in upcoming change	2022-07-20 07:17:51 -07:00
Philip Reames	523a526a02	[LV] Fix miscompile due to srem/sdiv speculation safety condition An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block. Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose. Differential Revision: https://reviews.llvm.org/D130106	2022-07-20 05:35:23 -07:00
David Sherwood	79660d339e	[LoopVectorize][AArch64] Add TTI hook preferPredicatedReductionSelect By default if SVE is enabled we want the select instruction used for reductions to be inside the loop, rather than outside. This makes it possible for the backend to fold the select into the operation to produce a single predicated add, fadd, etc. Differential Revision: https://reviews.llvm.org/D129763	2022-07-20 09:33:29 +01:00
Philip Reames	f1243fa193	[LV] Autogen a partially autogened test for ease of update	2022-07-19 14:18:53 -07:00
Philip Reames	8353403f08	[LV] Add test for generic predicated sdiv	2022-07-19 12:33:36 -07:00
Philip Reames	2247fe856a	[LV] Add test coverage for a bug in srem handling	2022-07-19 11:29:17 -07:00
Philip Reames	b7d3ba4bdb	[LV] Add test coverage for scalable div/rem patterns	2022-07-19 11:02:14 -07:00
David Sherwood	34f81cfa3d	[LoopVectorize][NFC] Split reductions out from sve-tail-folding into new file In sve-tail-folding-reductions.ll I've also added an extra RUN line to test normal reductions, i.e. not in-loop. This patch is a pre-commit in preparation for a follow-on patch that changes how reduction selects are generated in the vector loop. Differential Revision: https://reviews.llvm.org/D129761	2022-07-18 13:56:39 +01:00
David Sherwood	1e77b0c871	[AArch64][NFC] Simplify loop vectoriser tail-folding tests I've simplified all of the SVE vectoriser tail-folding tests to only care about testing the flag: -prefer-predicate-over-epiloge=predicate-else-scalar-epilogue In practice we always want to fall back on unpredicated vector loops if tail-folding is not possible. Differential Revision: https://reviews.llvm.org/D129843	2022-07-18 13:37:29 +01:00
Graham Hunter	db8fcb2c25	[LAA] Add recursive IR walker for forked pointers This builds on the previous forked pointers patch, which only accepted a single select as the pointer to check. A recursive function to walk through IR has been added, which searches for either a loop-invariant or addrec SCEV. This will only handle a single fork at present, so selects of selects or a GEP with a select for both the base and offset will be rejected. There is also a recursion limit with a cli option to change it. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D108699	2022-07-18 12:06:17 +01:00
Florian Hahn	105032f549	[LV] Use PHI recipe instead of PredRecipe for subsequent uses. At the moment, the VPPRedInstPHIRecipe is not used in subsequent uses of the predicate recipe. This incorrectly models the def-use chains, as all later uses should use the phi recipe. Fix that by delaying recording of the recipe. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D129436	2022-07-18 09:35:34 +01:00
Florian Hahn	6813b41d57	[LV] Avoid creating new run-time VF expression for each runtime checks. At the moment, the cost of runtime checks for scalable vectors is overestimated due to creating separate vscale * VF expressions for each check. Instead re-use the first expression.	2022-07-16 17:24:07 +01:00
Florian Hahn	aa00fb02c9	[LV] Use umax(VF * UF, MinProfTC) for scalable vectors. For scalable vectors, it is not sufficient to only check MinProfitableTripCount if it is >= VF.getKnownMinValue() * UF, because this property may not holder for larger values of vscale. In those cases, compute umax(VF * UF, MinProfTC) instead. This should fix https://lab.llvm.org/buildbot/#/builders/197/builds/2262	2022-07-15 10:23:14 -07:00
Florian Hahn	4c85a01758	[LV] Add scalable vector test showing incorrect min-trip count check. The test shows a case where the minimum trip count check incorrectly only checks the minimum profitable trip count computed due to runtime checks. This is incorrect for scalable VFs, because the VF * UF may exceed the minimum profitable trip count for vscale > 1. This is the likely reason for https://lab.llvm.org/buildbot/#/builders/197/builds/2262 failing.	2022-07-15 10:02:55 -07:00
Mel Chen	bd404fbcc8	[LV][NFC] Fix the condition for printing debug messages Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D128523	2022-07-15 01:47:33 -07:00
Mel Chen	ae15e6a952	[LV] Pre-commit test case for D128523, NFC	2022-07-15 01:22:06 -07:00
David Sherwood	307ace7f20	[LoopVectorize] Ensure the VPReductionRecipe is placed after all it's inputs When vectorising ordered reductions we call a function LoopVectorizationPlanner::adjustRecipesForReductions to replace the existing VPWidenRecipe for the fadd instruction with a new VPReductionRecipe. We attempt to insert the new recipe in the same place, but this is wrong because createBlockInMask may have generated new recipes that VPReductionRecipe now depends upon. I have changed the insertion code to append the recipe to the VPBasicBlock instead. Added a new RUN with tail-folding enabled to the existing test: Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll Differential Revision: https://reviews.llvm.org/D129550	2022-07-13 09:29:25 +01:00
Nikita Popov	a5ee62a141	[IndVars] Call replaceLoopPHINodesWithPreheaderValues() for already constant exits Currently we only call replaceLoopPHINodesWithPreheaderValues() if optimizeLoopExits() replaces the exit with an unconditional exit. However, it is very common that this already happens as part of eliminateIVComparison(), in which case we're leaving behind the dead header phi. Tweak the early bailout for already-constant exits to also call replaceLoopPHINodesWithPreheaderValues(). Differential Revision: https://reviews.llvm.org/D129214	2022-07-13 09:43:21 +02:00

1 2 3 4 5 ...

1790 Commits