llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	2833760c57	[Target] Qualify auto in range-based for loops (NFC)	2022-08-28 17:35:09 -07:00
David Green	ab0c5cea0b	[ARM] Use v2i1 for MVE and CDE intrinsics This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal type, to use a <2 x i1> as opposed to emulating the predicate with a <4 x i1>. The v4i1 workarounds have been removed leaving the natural v2i1 types, notably in vctp64 which now generates a v2i1 type. AutoUpgrade code has been added to upgrade old IR, which needs to convert the old v4i1 to a v2i1 be converting it back and forth to an integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be optimized away in the final assembly. Differential Revision: https://reviews.llvm.org/D114455	2021-12-03 15:27:58 +00:00
Philip Reames	29fa37ec9f	[SCEV] If max BTC is zero, then so is the exact BTC [2 of 2] This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remain cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch. The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because the happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll Differential Revision: https://reviews.llvm.org/D109015	2021-09-01 11:51:48 -07:00
David Green	258e2e9a0b	[ARM] Ensure loop invariant active.lane.mask operands CGP can move instructions like a ptrtoint into a loop, but the MVETailPredication when converting them will currently assume invariant trip counts. This tries to ensure the operands are loop invariant, and bails if not. Differential Revision: https://reviews.llvm.org/D100550	2021-04-26 10:04:33 +01:00
David Green	fad70c3068	[ARM] Improve WLS lowering Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little as the instructions are terminators that produce a value - something that needs to be treated carefully. Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together: %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div) %wls0 = extractvalue { i32, i1 } %wls, 0 %wls1 = extractvalue { i32, i1 } %wls, 1 br i1 %wls1, label %loop.ph, label %loop.exit ... loop: %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ] .. %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1) %cmp = icmp ne i32 %iv.next, 0 br i1 %cmp, label %loop, label %loop.exit The llvm.test.start.loop.iterations need to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc). These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction. %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3 t2B %bb.1 ... bb.2.loop: %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2 ... %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2 t2B %bb.3 The t2WhileLoopStartLR can then be treated similar to the other low overhead loop pseudos, eventually being lowered to a WLS providing the branches are within range. Differential Revision: https://reviews.llvm.org/D97729	2021-03-11 17:56:19 +00:00
David Green	f5abf0bd48	[ARM] Tail predication with constant loop bounds The TripCount for a predicated vector loop body will be ceil(ElementCount/Width). This alters the conversion of an active.lane.mask to a VCPT intrinsics to match. Differential Revision: https://reviews.llvm.org/D94608	2021-01-15 18:17:31 +00:00
David Green	0e49a40d75	[ARM] Cleanup for the MVETailPrediction pass This strips out a lot of the code that should no longer be needed from the MVETailPredictionPass, leaving the important part - find active lane mask instructions and convert them to VCTP operations. Differential Revision: https://reviews.llvm.org/D91866	2020-11-26 15:10:44 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new instrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on it's own might cause more trouble that it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Tres Popp	509fba75df	[llvm] Fix unused variable in non-debug configurations	2020-09-28 17:04:08 +02:00
Sjoerd Meijer	1696dd27fb	[ARM][MVE] Enable tail-predication by default We have been running tests/benchmarks downstream with tail-predication enabled for some time now and this behaves as expected: we are not aware of any correctness issues, and this performs better across the board than with tail-predication disabled. Time to flip the switch! Differential Revision: https://reviews.llvm.org/D88093	2020-09-28 14:01:23 +01:00
Sjoerd Meijer	f39f92c1f6	[ARM][MVE] tail-predication: overflow checks for elementcount, cont'd This is a reimplementation of the overflow checks for the elementcount, i.e. the 2nd argument of intrinsic get.active.lane.mask. The element count is lowered in each iteration of the tail-predicated loop, and we must prove that this expression doesn't overflow. Many thanks to Eli Friedman and Sam Parker for all their help with this work. Differential Revision: https://reviews.llvm.org/D88086	2020-09-28 09:20:51 +01:00
Sjoerd Meijer	2fc690ac90	[ARM] LowoverheadLoops: add an option to disable tail-predication This might be useful for testing. We already have an option -tail-predication but that controls the MVETailPredication pass. This -arm-loloops-disable-tail-pred is just for disabling it in the LowoverheadLoops pass. Differential Revision: https://reviews.llvm.org/D88212	2020-09-24 13:30:48 +01:00
Sjoerd Meijer	b5c3efeb7b	[ARM][MVE] Tail-predication: predicate new elementcount checks on force-enabled Additional sanity checks were added to get.active.lane.mask's second argument, the loop tripcount/elementcount, in rG635b87511ec3. Like the other (overflow) checks, skip this if tail-predication is forced. Differential Revision: https://reviews.llvm.org/D87769	2020-09-16 17:05:14 +01:00
Sjoerd Meijer	635b87511e	[ARM][MVE] Tail-predication: use unsigned SCEV ranges for tripcount Loop tripcount expressions have a positive range, so use unsigned SCEV ranges for them. Differential Revision: https://reviews.llvm.org/D87608	2020-09-15 13:23:02 +01:00
Sjoerd Meijer	b4b1b84106	[MVE] fix typo in llvm debug message. NFC.	2020-09-15 10:13:54 +01:00
Sjoerd Meijer	676febc044	[ARM][MVE] Tail-predication: check get.active.lane.mask's TC value This adds additional checks for the original scalar loop tripcount value, i.e. get.active.lane.mask second argument, and perform several sanity checks to see if it is of the form that we expect similarly like we already do for the IV which is the first argument of get.active.lane. Differential Revision: https://reviews.llvm.org/D86074	2020-09-14 11:32:15 +01:00
David Green	4ca60915bc	[ARM] Correct predicate operand for offset gather/scatter These arm_mve_vldr_gather_offset_predicated and arm_mve_vstr_scatter_offset_predicated have some extra parameters meaning the predicate is at a later operand. If a loop contains _only_ those masked instructions, we would miss transforming the active lane mask. Differential Revision: https://reviews.llvm.org/D86791	2020-08-28 17:48:15 +01:00
Sjoerd Meijer	c352e7fbda	[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1. Differential Revision: https://reviews.llvm.org/D86303	2020-08-25 14:38:03 +01:00
Anna Welker	9eb9ba076a	[ARM][MVE] Fix for tail predication for loops containing MVE gather/scatters Fix to include non-predicated version of write-back gather in special case treatment for deducting the instruction type. (This is fixing https://reviews.llvm.org/D85138 for corner cases) Differential Revision: https://reviews.llvm.org/D85889	2020-08-13 12:24:19 +01:00
Anna Welker	4fe5615eab	[ARM][MVE] Enable tail predication for loops containing MVE gather/scatters Widen the scope of memory operations that are allowed to be tail predicated to include gathers and scatters, such that loops that are auto-vectorized with the option -enable-arm-maskedgatscat (and actually end up containing an MVE gather or scatter) can be tail predicated. Differential Revision: https://reviews.llvm.org/D85138	2020-08-12 15:32:37 +01:00
Sjoerd Meijer	6716e7868e	[ARM][MVE] tail-predication: overflow checks for backedge taken count. This pick ups the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrinisc that enables tail-predication. For a 2d auto-correlation kernel and its inner loop j: M = Size - i; for (j = 0; j < M; j++) Sum += Input[j] * Input[j+i]; For this inner loop, the SCEV backedge taken count (BTC) expression is: (-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body> and LoopUtil cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC cannot be max" could not be determined. So overflow behaviour had to be assumed in the loop tripcount expression that uses the BTC. As a result tail-predication had to be forced (with an option) for this case. This change solves that by using ScalarEvolution's helper getConstantMaxBackedgeTakenCount which is able to determine the range of BTC, thus can determine it is safe, so that we no longer need to force tail-predication as reflected in the changed test cases. Differential Revision: https://reviews.llvm.org/D85737	2020-08-12 09:32:26 +01:00
David Green	8590e5abad	[ARM] Allow vecreduce_add in tail predicated loops This allows vecreduce_add in loops so that we can tailpredicate them. Differential Revision: https://reviews.llvm.org/D85454	2020-08-09 10:57:17 +01:00
Sjoerd Meijer	595270ae39	[ARM][MVE] Refactor option -disable-mve-tail-predication This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather than several different ones. This is also a prep step for D82953, in which we want to reject reductions unless that is requested with this option. Differential Revision: https://reviews.llvm.org/D83133	2020-07-13 13:40:33 +01:00
Samuel Tebbs	3324e3a6ee	[ARM] Allow the fabs intrinsic to be tail predicated This patch stops the fabs intrinsic from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82570	2020-06-30 17:27:28 +01:00
Samuel Tebbs	66fa313999	[ARM] Allow the usub_sat and ssub_sat intrinsics to be tail predicated This patch stops the usub_sat and ssub_sat intrinsics from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82571	2020-06-30 17:16:58 +01:00
Sjoerd Meijer	af45907653	[ARM][MVE] Tail-predication: clean-up of unused code After the rewrite of this pass (D79175) I missed one thing: the inserted VCTP intrinsic can be cloned to exit blocks if there are instructions present in it that perform the same operation, but this wasn't triggering anymore. However, it turns out that for handling reductions, see D75533, it's actually easier not not to have the VCTP in exit blocks, so this removes that code. This was possible because it turned out that some other code that depended on this, rematerialization of the trip count enabling more dead code removal later, wasn't doing much anymore due to more aggressive dead code removal that was added to the low-overhead loops pass. Differential Revision: https://reviews.llvm.org/D82773	2020-06-30 17:09:36 +01:00
Samuel Tebbs	d9cb811cbf	[ARM] Allow rounding intrinsics to be tail predicated This patch stops the trunc, rint, round, floor and ceil intrinsics from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82553	2020-06-30 16:52:25 +01:00
Sjoerd Meijer	1319d9bb84	[ARM] Don't revert get.active.lane.mask in ARM Tail-Predication pass Don't revert intrinsic get.active.lane.mask here, this is moved to isel legalization in D82292. Differential Revision: https://reviews.llvm.org/D82105	2020-06-26 07:42:39 +01:00
Sam Tebbs	187f627a50	[ARM] Allow tail predication on sadd_sat and uadd_sat intrinsics This patch stops the sadd_sat and uadd_sat intrinsics from blocking tail predication. Differential revision: https://reviews.llvm.org/D82377	2020-06-25 11:54:29 +01:00
Simon Pilgrim	c18b753686	LoopUtils.h - reduce AliasAnalysis.h include to forward declarations. NFC. Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.	2020-06-24 17:58:38 +01:00
Sjoerd Meijer	4aa893b8f2	[ARM][MVE] tail-predication: renamed internal option. Renamed -force-tail-predication to -force-mve-tail-predication because that's more descriptive and consistent.	2020-06-19 15:07:06 +01:00
Sjoerd Meijer	d1522513d4	[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask To set up a tail-predicated loop, we need to to calculate the number of elements processed by the loop. We can now use intrinsic @llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in D79100. This intrinsic generates a predicate for the masked loads/stores, and consumes the Backedge Taken Count (BTC) as its second argument. We can now use that to reconstruct the loop tripcount, instead of the IR pattern match approach we were using before. Many thanks to Eli Friedman and Sam Parker for all their help with this work. This also adds overflow checks for the different, new expressions that we create: the loop tripcount, and the sub expression that calculates the remaining elements to be processed. For the latter, SCEV is not able to calculate precise enough bounds, so we work around that at the moment, but is not entirely correct yet, it's conservative. The overflow checks can be overruled with a force flag, which is thus potentially unsafe (but not really because the vectoriser is the only place where this intrinsic is emitted at the moment). It's also good to mention that the tail-predication pass is not yet enabled by default. We will follow up to see if we can implement these overflow checks better, either by a change in SCEV or we may want revise the definition of llvm.get.active.lane.mask. Differential Revision: https://reviews.llvm.org/D79175	2020-06-17 15:17:42 +01:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Florian Hahn	bcbd26bfe6	[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. This patch was originally committed as `b8a3c34eee`, but broke the modules build, as LoopAccessAnalysis was using the Expander. The code-gen part of LAA was moved to lib/Transforms recently, so this patch can be landed again. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537	2020-05-20 10:53:40 +01:00
Christopher Tetreault	245679b62e	[SVE] Remove usages of VectorType::getNumElements() from ARM Reviewers: efriedma, fpetrogalli, kmclaughlin, grosbach, dmgreen Reviewed By: dmgreen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, dmgreen, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79816	2020-05-15 12:55:27 -07:00
David Green	61b8af0375	[ARM] Allow fma in tail predicated loops There are some intrinsics like this that currently block tail predication, but should be fine. This allows fma through, as the one that I ran into. There may be others that need the same treatment but I've only done this one here. Differential Revision: https://reviews.llvm.org/D78385	2020-04-27 15:32:47 +01:00
Sjoerd Meijer	0736d1ccf3	[ARM][MVE] Tail-predication: some more comments and debug messages. NFC. Finding the loop tripcount is the first crucial step in preparing a loop for tail-predication, and this adds a debug message if a tripcount cannot be found. And while I was at it, I added some more comments here and there. Differential Revision: https://reviews.llvm.org/D78485	2020-04-22 10:34:23 +01:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Roman Lebedev	0789f28048	[NFC][SCEV] Piping to pass TTI into SCEVExpander::isHighCostExpansionHelper() Summary: Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()` This is a fully NFC patch to make things reviewable. Reviewers: reames, mkazantsev, wmi, sanjoy Reviewed By: mkazantsev Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73704	2020-02-25 23:05:56 +03:00
Sjoerd Meijer	6b0ed508fa	[ARM][MVE] Tail-Predication: recognise (again) active lanes IR pattern A small IR change in calculating the active lanes resulted in no longer recognising tail-predication. Now recognise both an 'add' and 'or' in the expression that calculates the active lanes. Differential Revision: https://reviews.llvm.org/D74394	2020-02-11 15:18:18 +00:00
Sjoerd Meijer	b567ff2fa0	[ARM][MVE] Tail-predication: support constant trip count We had support for runtime trip count values, but not constants, and this adds supports for that. And added a minor optimisation while I was add it: don't invoke Cleanup when there's nothing to clean up. Differential Revision: https://reviews.llvm.org/D73198	2020-01-27 11:05:26 +00:00
Sam Parker	c04b9ba595	[ARM][MVE] Clear MaskedInsts vector In MVETailPredication, clear the vector before running on a new loop. Differential Revision: https://reviews.llvm.org/D73048	2020-01-22 04:27:36 -05:00
Sjoerd Meijer	8cba99e2aa	[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks This patch uses helper function rewriteLoopExitValues that is refactored in D72602 to rematerialise the iteration count in exit blocks, so that we can clean-up loop update expressions inside the hardware-loops later in ARMLowOverheadLoops, which is necessary to get actual performance gains for tail-predicated loops. Differential Revision: https://reviews.llvm.org/D72714	2020-01-20 10:26:36 +00:00
Sjoerd Meijer	0efc9e5a8c	[ARM][MVE] More MVETailPredication debug messages. NFC. I've added a few more debug messages to MVETailPredication because I wanted to trace better which instructions are added/removed. And while I was at it, I factored out one function which I thought was clearer, and have added some comments to describe better the flow between MVETailPredication and ARMLowOverheadLoops. Differential Revision: https://reviews.llvm.org/D71549	2020-01-06 09:56:02 +00:00
Florian Hahn	b8a3c34eee	Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)." This reverts commit `51ef53f3bd`, as it breaks some bots.	2020-01-04 18:44:38 +00:00
Florian Hahn	51ef53f3bd	[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537	2020-01-04 18:29:35 +00:00
David Green	b4abe7afbf	[ARM] Sink splat to ICmp This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the register forms of instructions to be selected more often. It does not add FCmp yet as the results look a little odd, trying to keep the register in an float reg and having to move it back to a GPR. Differential Revision: https://reviews.llvm.org/D70997	2019-12-30 12:58:14 +00:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
Simon Tatham	48cce077ef	[ARM,MVE] Rename and clean up VCTP IR intrinsics. Summary: D65884 added a set of Arm IR intrinsics for the MVE VCTP instruction, to use in tail predication. But the 64-bit one doesn't work properly: its predicate type is `<2 x i1>` / `v2i1`, which isn't a legal MVE type (due to not having a full set of instructions that manipulate it usefully). The test of `vctp64` in `basic-tail-pred.ll` goes through `opt` fine, as the test expects, but if you then feed it to `llc` it causes a type legality failure at isel time. The usual workaround we've been using in the rest of the MVE intrinsics family is to bodge `v2i1` into `v4i1`. So I've adjusted the `vctp64` IR intrinsic to do that, and completely removed the code (and test) that uses that intrinsic for 64-bit tail predication. That will allow me to add isel rules (upcoming in D70485) that actually generate the VCTP64 instruction. Also renamed all four of these IR intrinsics so that they have `mve` in the name, since its absence was confusing. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: MarkMurrayARM Subscribers: samparker, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70592	2019-12-02 16:20:30 +00:00

1 2

55 Commits