llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	f5abf0bd48	[ARM] Tail predication with constant loop bounds The TripCount for a predicated vector loop body will be ceil(ElementCount/Width). This alters the conversion of an active.lane.mask to a VCTP intrinsic to match. Differential Revision: https://reviews.llvm.org/D94608	2021-01-15 18:17:31 +00:00
David Green	0e49a40d75	[ARM] Cleanup for the MVETailPrediction pass This strips out a lot of the code that should no longer be needed from the MVETailPredictionPass, leaving the important part - find active lane mask instructions and convert them to VCTP operations. Differential Revision: https://reviews.llvm.org/D91866	2020-11-26 15:10:44 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new intrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on its own might cause more trouble than it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Tres Popp	509fba75df	[llvm] Fix unused variable in non-debug configurations	2020-09-28 17:04:08 +02:00
Sjoerd Meijer	1696dd27fb	[ARM][MVE] Enable tail-predication by default We have been running tests/benchmarks downstream with tail-predication enabled for some time now and this behaves as expected: we are not aware of any correctness issues, and this performs better across the board than with tail-predication disabled. Time to flip the switch! Differential Revision: https://reviews.llvm.org/D88093	2020-09-28 14:01:23 +01:00
Sjoerd Meijer	f39f92c1f6	[ARM][MVE] tail-predication: overflow checks for elementcount, cont'd This is a reimplementation of the overflow checks for the elementcount, i.e. the 2nd argument of intrinsic get.active.lane.mask. The element count is lowered in each iteration of the tail-predicated loop, and we must prove that this expression doesn't overflow. Many thanks to Eli Friedman and Sam Parker for all their help with this work. Differential Revision: https://reviews.llvm.org/D88086	2020-09-28 09:20:51 +01:00
Sjoerd Meijer	2fc690ac90	[ARM] LowoverheadLoops: add an option to disable tail-predication This might be useful for testing. We already have an option -tail-predication but that controls the MVETailPredication pass. This -arm-loloops-disable-tail-pred is just for disabling it in the LowoverheadLoops pass. Differential Revision: https://reviews.llvm.org/D88212	2020-09-24 13:30:48 +01:00
Sjoerd Meijer	b5c3efeb7b	[ARM][MVE] Tail-predication: predicate new elementcount checks on force-enabled Additional sanity checks were added to get.active.lane.mask's second argument, the loop tripcount/elementcount, in rG635b87511ec3. Like the other (overflow) checks, skip this if tail-predication is forced. Differential Revision: https://reviews.llvm.org/D87769	2020-09-16 17:05:14 +01:00
Sjoerd Meijer	635b87511e	[ARM][MVE] Tail-predication: use unsigned SCEV ranges for tripcount Loop tripcount expressions have a positive range, so use unsigned SCEV ranges for them. Differential Revision: https://reviews.llvm.org/D87608	2020-09-15 13:23:02 +01:00
Sjoerd Meijer	b4b1b84106	[MVE] fix typo in llvm debug message. NFC.	2020-09-15 10:13:54 +01:00
Sjoerd Meijer	676febc044	[ARM][MVE] Tail-predication: check get.active.lane.mask's TC value This adds additional checks for the original scalar loop tripcount value, i.e. the second argument of get.active.lane.mask, and performs several sanity checks to see if it is of the form that we expect, as we already do for the IV, which is the first argument of get.active.lane.mask. Differential Revision: https://reviews.llvm.org/D86074	2020-09-14 11:32:15 +01:00
David Green	4ca60915bc	[ARM] Correct predicate operand for offset gather/scatter These arm_mve_vldr_gather_offset_predicated and arm_mve_vstr_scatter_offset_predicated have some extra parameters meaning the predicate is at a later operand. If a loop contains _only_ those masked instructions, we would miss transforming the active lane mask. Differential Revision: https://reviews.llvm.org/D86791	2020-08-28 17:48:15 +01:00
Sjoerd Meijer	c352e7fbda	[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1. Differential Revision: https://reviews.llvm.org/D86303	2020-08-25 14:38:03 +01:00
Anna Welker	9eb9ba076a	[ARM][MVE] Fix for tail predication for loops containing MVE gather/scatters Fix to include non-predicated version of write-back gather in special case treatment for deducing the instruction type. (This is fixing https://reviews.llvm.org/D85138 for corner cases) Differential Revision: https://reviews.llvm.org/D85889	2020-08-13 12:24:19 +01:00
Anna Welker	4fe5615eab	[ARM][MVE] Enable tail predication for loops containing MVE gather/scatters Widen the scope of memory operations that are allowed to be tail predicated to include gathers and scatters, such that loops that are auto-vectorized with the option -enable-arm-maskedgatscat (and actually end up containing an MVE gather or scatter) can be tail predicated. Differential Revision: https://reviews.llvm.org/D85138	2020-08-12 15:32:37 +01:00
Sjoerd Meijer	6716e7868e	[ARM][MVE] tail-predication: overflow checks for backedge taken count. This picks up the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrinsic that enables tail-predication. For a 2d auto-correlation kernel and its inner loop j: M = Size - i; for (j = 0; j < M; j++) Sum += Input[j] * Input[j+i]; For this inner loop, the SCEV backedge taken count (BTC) expression is: {(-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body> and LoopUtil cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC cannot be max" could not be determined. So overflow behaviour had to be assumed in the loop tripcount expression that uses the BTC. As a result tail-predication had to be forced (with an option) for this case. This change solves that by using ScalarEvolution's helper getConstantMaxBackedgeTakenCount which is able to determine the range of BTC, thus can determine it is safe, so that we no longer need to force tail-predication as reflected in the changed test cases. Differential Revision: https://reviews.llvm.org/D85737	2020-08-12 09:32:26 +01:00
David Green	8590e5abad	[ARM] Allow vecreduce_add in tail predicated loops This allows vecreduce_add in loops so that we can tailpredicate them. Differential Revision: https://reviews.llvm.org/D85454	2020-08-09 10:57:17 +01:00
Sjoerd Meijer	595270ae39	[ARM][MVE] Refactor option -disable-mve-tail-predication This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather than several different ones. This is also a prep step for D82953, in which we want to reject reductions unless that is requested with this option. Differential Revision: https://reviews.llvm.org/D83133	2020-07-13 13:40:33 +01:00
Samuel Tebbs	3324e3a6ee	[ARM] Allow the fabs intrinsic to be tail predicated This patch stops the fabs intrinsic from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82570	2020-06-30 17:27:28 +01:00
Samuel Tebbs	66fa313999	[ARM] Allow the usub_sat and ssub_sat intrinsics to be tail predicated This patch stops the usub_sat and ssub_sat intrinsics from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82571	2020-06-30 17:16:58 +01:00
Sjoerd Meijer	af45907653	[ARM][MVE] Tail-predication: clean-up of unused code After the rewrite of this pass (D79175) I missed one thing: the inserted VCTP intrinsic can be cloned to exit blocks if there are instructions present in it that perform the same operation, but this wasn't triggering anymore. However, it turns out that for handling reductions, see D75533, it's actually easier not to have the VCTP in exit blocks, so this removes that code. This was possible because it turned out that some other code that depended on this, rematerialization of the trip count enabling more dead code removal later, wasn't doing much anymore due to more aggressive dead code removal that was added to the low-overhead loops pass. Differential Revision: https://reviews.llvm.org/D82773	2020-06-30 17:09:36 +01:00
Samuel Tebbs	d9cb811cbf	[ARM] Allow rounding intrinsics to be tail predicated This patch stops the trunc, rint, round, floor and ceil intrinsics from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82553	2020-06-30 16:52:25 +01:00
Sjoerd Meijer	1319d9bb84	[ARM] Don't revert get.active.lane.mask in ARM Tail-Predication pass Don't revert intrinsic get.active.lane.mask here, this is moved to isel legalization in D82292. Differential Revision: https://reviews.llvm.org/D82105	2020-06-26 07:42:39 +01:00
Sam Tebbs	187f627a50	[ARM] Allow tail predication on sadd_sat and uadd_sat intrinsics This patch stops the sadd_sat and uadd_sat intrinsics from blocking tail predication. Differential revision: https://reviews.llvm.org/D82377	2020-06-25 11:54:29 +01:00
Simon Pilgrim	c18b753686	LoopUtils.h - reduce AliasAnalysis.h include to forward declarations. NFC. Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.	2020-06-24 17:58:38 +01:00
Sjoerd Meijer	4aa893b8f2	[ARM][MVE] tail-predication: renamed internal option. Renamed -force-tail-predication to -force-mve-tail-predication because that's more descriptive and consistent.	2020-06-19 15:07:06 +01:00
Sjoerd Meijer	d1522513d4	[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask To set up a tail-predicated loop, we need to calculate the number of elements processed by the loop. We can now use intrinsic @llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in D79100. This intrinsic generates a predicate for the masked loads/stores, and consumes the Backedge Taken Count (BTC) as its second argument. We can now use that to reconstruct the loop tripcount, instead of the IR pattern match approach we were using before. Many thanks to Eli Friedman and Sam Parker for all their help with this work. This also adds overflow checks for the different, new expressions that we create: the loop tripcount, and the sub expression that calculates the remaining elements to be processed. For the latter, SCEV is not able to calculate precise enough bounds, so we work around that at the moment, but this is not entirely correct yet, it's conservative. The overflow checks can be overruled with a force flag, which is thus potentially unsafe (but not really because the vectoriser is the only place where this intrinsic is emitted at the moment). It's also good to mention that the tail-predication pass is not yet enabled by default. We will follow up to see if we can implement these overflow checks better, either by a change in SCEV or we may want to revise the definition of llvm.get.active.lane.mask. Differential Revision: https://reviews.llvm.org/D79175	2020-06-17 15:17:42 +01:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Florian Hahn	bcbd26bfe6	[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. This patch was originally committed as `b8a3c34eee`, but broke the modules build, as LoopAccessAnalysis was using the Expander. The code-gen part of LAA was moved to lib/Transforms recently, so this patch can be landed again. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537	2020-05-20 10:53:40 +01:00
Christopher Tetreault	245679b62e	[SVE] Remove usages of VectorType::getNumElements() from ARM Reviewers: efriedma, fpetrogalli, kmclaughlin, grosbach, dmgreen Reviewed By: dmgreen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, dmgreen, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79816	2020-05-15 12:55:27 -07:00
David Green	61b8af0375	[ARM] Allow fma in tail predicated loops There are some intrinsics like this that currently block tail predication, but should be fine. This allows fma through, as the one that I ran into. There may be others that need the same treatment but I've only done this one here. Differential Revision: https://reviews.llvm.org/D78385	2020-04-27 15:32:47 +01:00
Sjoerd Meijer	0736d1ccf3	[ARM][MVE] Tail-predication: some more comments and debug messages. NFC. Finding the loop tripcount is the first crucial step in preparing a loop for tail-predication, and this adds a debug message if a tripcount cannot be found. And while I was at it, I added some more comments here and there. Differential Revision: https://reviews.llvm.org/D78485	2020-04-22 10:34:23 +01:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Roman Lebedev	0789f28048	[NFC][SCEV] Piping to pass TTI into SCEVExpander::isHighCostExpansionHelper() Summary: Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()` This is a fully NFC patch to make things reviewable. Reviewers: reames, mkazantsev, wmi, sanjoy Reviewed By: mkazantsev Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73704	2020-02-25 23:05:56 +03:00
Sjoerd Meijer	6b0ed508fa	[ARM][MVE] Tail-Predication: recognise (again) active lanes IR pattern A small IR change in calculating the active lanes resulted in no longer recognising tail-predication. Now recognise both an 'add' and 'or' in the expression that calculates the active lanes. Differential Revision: https://reviews.llvm.org/D74394	2020-02-11 15:18:18 +00:00
Sjoerd Meijer	b567ff2fa0	[ARM][MVE] Tail-predication: support constant trip count We had support for runtime trip count values, but not constants, and this adds support for that. And added a minor optimisation while I was at it: don't invoke Cleanup when there's nothing to clean up. Differential Revision: https://reviews.llvm.org/D73198	2020-01-27 11:05:26 +00:00
Sam Parker	c04b9ba595	[ARM][MVE] Clear MaskedInsts vector In MVETailPredication, clear the vector before running on a new loop. Differential Revision: https://reviews.llvm.org/D73048	2020-01-22 04:27:36 -05:00
Sjoerd Meijer	8cba99e2aa	[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks This patch uses helper function rewriteLoopExitValues that is refactored in D72602 to rematerialise the iteration count in exit blocks, so that we can clean-up loop update expressions inside the hardware-loops later in ARMLowOverheadLoops, which is necessary to get actual performance gains for tail-predicated loops. Differential Revision: https://reviews.llvm.org/D72714	2020-01-20 10:26:36 +00:00
Sjoerd Meijer	0efc9e5a8c	[ARM][MVE] More MVETailPredication debug messages. NFC. I've added a few more debug messages to MVETailPredication because I wanted to trace better which instructions are added/removed. And while I was at it, I factored out one function which I thought was clearer, and have added some comments to describe better the flow between MVETailPredication and ARMLowOverheadLoops. Differential Revision: https://reviews.llvm.org/D71549	2020-01-06 09:56:02 +00:00
Florian Hahn	b8a3c34eee	Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)." This reverts commit `51ef53f3bd`, as it breaks some bots.	2020-01-04 18:44:38 +00:00
Florian Hahn	51ef53f3bd	[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537	2020-01-04 18:29:35 +00:00
David Green	b4abe7afbf	[ARM] Sink splat to ICmp This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the register forms of instructions to be selected more often. It does not add FCmp yet as the results look a little odd, trying to keep the register in a float reg and having to move it back to a GPR. Differential Revision: https://reviews.llvm.org/D70997	2019-12-30 12:58:14 +00:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
Simon Tatham	48cce077ef	[ARM,MVE] Rename and clean up VCTP IR intrinsics. Summary: D65884 added a set of Arm IR intrinsics for the MVE VCTP instruction, to use in tail predication. But the 64-bit one doesn't work properly: its predicate type is `<2 x i1>` / `v2i1`, which isn't a legal MVE type (due to not having a full set of instructions that manipulate it usefully). The test of `vctp64` in `basic-tail-pred.ll` goes through `opt` fine, as the test expects, but if you then feed it to `llc` it causes a type legality failure at isel time. The usual workaround we've been using in the rest of the MVE intrinsics family is to bodge `v2i1` into `v4i1`. So I've adjusted the `vctp64` IR intrinsic to do that, and completely removed the code (and test) that uses that intrinsic for 64-bit tail predication. That will allow me to add isel rules (upcoming in D70485) that actually generate the VCTP64 instruction. Also renamed all four of these IR intrinsics so that they have `mve` in the name, since its absence was confusing. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: MarkMurrayARM Subscribers: samparker, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70592	2019-12-02 16:20:30 +00:00
Sam Parker	d43913ae38	[ARM][MVE] Enable narrow vectors for tail pred Remove the restriction, from the mve tail predication pass, that all the masked vector instructions need to be 128 bits. This allows us to support extending loads and truncating stores. Differential Revision: https://reviews.llvm.org/D69946	2019-11-19 08:51:12 +00:00
Sjoerd Meijer	d90804d26b	[ARM][MVE] canTailPredicateLoop This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a first version and it operates on single block loops only. With this change, the vectoriser will now determine if tail-folding scalar remainder loops is possible/desired, which is the first step to generate MVE tail-predicated vector loops. This is disabled by default for now. I.e., this depends on option -disable-mve-tail-predication, which is off by default. I will follow up on this soon with a patch for the vectoriser to respect loop hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled, we don't want to tail-fold and we shouldn't query this TTI hook, which is done in D70125. Differential Revision: https://reviews.llvm.org/D69845	2019-11-13 13:24:33 +00:00
Sam Parker	aac03ae06a	[ARM][MVE] Change VCTP operand The VCTP instruction will calculate the predicate mask based upon the number of elements that need to be processed. I had inserted the sub before the vctp intrinsic and supplied it as the operand, but this is incorrect as the phi should directly feed the vctp. The sub is calculating the value for the next iteration. Differential Revision: https://reviews.llvm.org/D67921 llvm-svn: 373188	2019-09-30 08:03:23 +00:00
Sam Parker	9feb429a33	[ARM][MVE] Remove old tail predicates Remove any predicate that we replace with a vctp intrinsic, and try to remove their operands too. Also look into the exit block to see if there are any duplicates of the predicates that we've replaced and clone the vctp to be used there instead. Differential Revision: https://reviews.llvm.org/D67709 llvm-svn: 372567	2019-09-23 09:48:25 +00:00
Sam Parker	312409e464	[ARM] MVE Tail Predication The MVE and LOB extensions of Armv8.1m can be combined to enable 'tail predication' which removes the need for a scalar remainder loop after vectorization. Lane predication is performed implicitly via a system register. The effects of predication are described in Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points being: - For vector operations that perform reduction across the vector and produce a scalar result, whether the value is accumulated or not. - For non-load instructions, the predicate flags determine if the destination register byte is updated with the new value or if the previous value is preserved. - For vector store instructions, whether the store occurs or not. - For vector load instructions, whether the value is loaded or whether zeros are written to that element of the destination register. This patch implements a pass that takes a hardware loop, containing masked vector instructions, and converts it into something that resembles an MVE tail predicated loop. Currently, if we had code generation, we'd generate a loop in which the VCTP would generate the predicate and VPST would then set up the value of VPR.P0. The loads and stores would be placed in VPT blocks so this is not tail predication, but normal VPT predication with the predicate based upon an element counting induction variable. Further work needs to be done to finally produce a true tail predicated loop. Because only the loads and stores are predicated, in both the LLVM IR and MIR level, we will restrict support to only lane-wise operations (no horizontal reductions). We will perform a final check on MIR during loop finalisation too. Another restriction, specific to MVE, is that all the vector instructions need to operate on the same number of elements. This is because predication is performed at the byte level and this is set on entry to the loop, or by the VCTP instead. Differential Revision: https://reviews.llvm.org/D65884 llvm-svn: 371179	2019-09-06 08:24:41 +00:00

50 Commits
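
Several of the messages above rely on the same arithmetic: a predicated vector loop runs ceil(ElementCount/Width) times, the lane predicate comes from the element counter carried around the loop, and the subtraction only produces the counter for the next iteration. The following Python sketch models that counting scheme. It is an illustration under stated assumptions, not code from LLVM: the function names (trip_count, active_lane_mask, vctp, tail_predicated_sum) are hypothetical, and the lane-mask rule (lane i is active while base + i is below the element count) follows the post-D86147 form of get.active.lane.mask, where the second argument is the element count.

```python
from typing import List

# Illustrative model only: these helpers are hypothetical, not LLVM APIs.


def trip_count(element_count: int, width: int) -> int:
    """Iterations of a predicated vector loop: ceil(element_count / width)."""
    return (element_count + width - 1) // width


def active_lane_mask(base: int, element_count: int, width: int) -> List[bool]:
    """Lane i is active while base + i is still below the element count."""
    return [base + i < element_count for i in range(width)]


def vctp(remaining: int, width: int) -> List[bool]:
    """VCTP-style predicate: the first min(remaining, width) lanes are active."""
    return [i < remaining for i in range(width)]


def tail_predicated_sum(data: List[int], width: int = 4) -> int:
    """Sum data in vector-sized steps with no scalar remainder loop."""
    n = len(data)
    remaining = n  # element counter carried around the loop (the "phi")
    total = 0
    for iteration in range(trip_count(n, width)):
        base = iteration * width
        # The predicate is built from the current counter value.
        mask = vctp(remaining, width)
        # Both formulations of the predicate agree in this model.
        assert mask == active_lane_mask(base, n, width)
        for lane, is_active in enumerate(mask):
            if is_active:  # inactive lanes never touch memory
                total += data[base + lane]
        # The subtraction yields the counter for the next iteration.
        remaining -= width
    return total
```

With 10 elements and a width of 4, the model runs three iterations with 4, 4 and 2 active lanes, so the final partial vector is handled by the predicate rather than by a separate scalar loop.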