Commit Graph

16 Commits

Author SHA1 Message Date
David Green 7786ac8377 [ARM] Remove dead mov's in preheader of tail predicated loops
With t2DoLoopDec we can be left with some extra MOV's in the preheaders
of tail predicated loops. This removes them, in the same way we remove
other dead variables.

Differential Revision: https://reviews.llvm.org/D91857
2021-02-11 10:48:20 +00:00
David Green 3f571be1c0 [ARM] Make t2DoLoopStartTP a terminator
Although this was something that I was hoping we would not have to do,
this patch makes t2DoLoopStartTP a terminator in order to keep it at the
end of it's block, so not allowing extra MVE instruction between it and
the end. With t2DoLoopStartTP's also starting tail predication regions,
it also marks them as having side effects. The t2DoLoopStart is still
not a terminator, giving it the extra scheduling freedom that can be
helpful, but now that we have a TP version they can be treated
differently.

Differential Revision: https://reviews.llvm.org/D91887
2020-12-11 09:23:57 +00:00
David Green f08c37da7b [ARM] Disable WLSTP loops
This checks to see if the loop will likely become a tail predicated loop
and disables wls loop generation if so, as the likelihood for reverting
is currently too high. These should be fairly rare situations anyway due
to the way iterations and element counts are used during lowering. Just
not trying can alter how SCEV's are materialized however, leading to
different codegen.

It also adds a option to disable all while low overhead loops, for
debugging.

Differential Revision: https://reviews.llvm.org/D91663
2020-11-20 13:30:44 +00:00
David Green 08d1c2d470 [ARM] Introduce t2DoLoopStartTP
This introduces a new pseudo instruction, almost identical to a
t2DoLoopStart but taking 2 parameters - the original loop iteration
count needed for a low overhead loop, plus the VCTP element count needed
for a DLSTP instruction setting up a tail predicated loop. The idea is
that the instruction holds both values and the backend
ARMLowOverheadLoops pass can pick between the two, depending on whether
it creates a tail predicated loop or falls back to a low overhead loop.

To do that there needs to be something that converts a t2DoLoopStart to
a t2DoLoopStartTP, for which this patch repurposes the
MVEVPTOptimisationsPass as a "tail predication and vpt optimisation"
pass. The extra operand for the t2DoLoopStartTP is chosen based on the
operands of VCTP's in the loop, and the instruction is moved as late in
the block as possible to attempt to increase the likelihood of making
tail predicated loops.

Differential Revision: https://reviews.llvm.org/D90591
2020-11-10 18:08:12 +00:00
David Green b2ac9681a7 [ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.

This is a fairly simple change in itself, but leads to a number of other
required alterations.

 - The hardware loop pass, if UsePhi is set, now generates loops of the
   form:
       %start = llvm.start.loop.iterations(%N)
     loop:
       %p = phi [%start], [%dec]
       %dec = llvm.loop.decrement.reg(%p, 1)
       %c = icmp ne %dec, 0
       br %c, loop, exit
 - For this a new llvm.start.loop.iterations intrinsic was added, identical
   to llvm.set.loop.iterations but produces a value as seen above, gluing
   the loop together more through def-use chains.
 - This new instrinsic conceptually produces the same output as input,
   which is taught to SCEV so that the checks in MVETailPredication are not
   affected.
 - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
   been left mostly as before. We should now more reliably be able to tell
   that the t2DoLoopStart is correct without having to prove it, but
   t2WhileLoopStart and tail-predicated loops will remain the same.
 - And all the tests have been updated. There are a lot of them!

This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.

Differential Revision: https://reviews.llvm.org/D89881
2020-11-10 15:57:58 +00:00
David Green 6dcbc323fd Revert "[ARM][LowOverheadLoops] Adjust Start insertion."
This reverts commit 38f625d0d1.

This commit contains some holes in its logic and has been causing
issues since it was commited. The idea sounds OK but some cases were not
handled correctly. Instead of trying to fix that up later it is probably
simpler to revert it and work to reimplement it in a more reliable way.
2020-10-20 08:55:21 +01:00
Sam Parker 38f625d0d1 [ARM][LowOverheadLoops] Adjust Start insertion.
Try to move the insertion point to become the terminator of the
block, usually the preheader.

Differential Revision: https://reviews.llvm.org/D88638
2020-10-01 10:49:19 +01:00
Sam Parker 6ec5f32497 [ARM][LowOverheadLoops] Iteration count liveness
Before deciding to insert a [W|D]LSTP, check that defining LR with
the element count won't affect any other instructions that should be
taking the iteration count.

Differential Revision: https://reviews.llvm.org/D88549
2020-10-01 10:11:10 +01:00
Sam Parker 7b90516d47 [ARM][LowOverheadLoops] Start insertion point
If possible, try not to move the start position earlier than it
already is.

Differential Revision: https://reviews.llvm.org/D88542
2020-10-01 10:05:25 +01:00
Sam Parker 700f93e92b [RDA] Switch isSafeToMove iterators
So forwards is forwards and backwards is reverse. Also add a check
so that we know the instructions are in the expected order.

Differential Revision: https://reviews.llvm.org/D88419
2020-09-30 08:10:48 +01:00
David Green e4b9867cb6 [ARM] Expand cannotInsertWDLSTPBetween to the last instruction
9d9a11c7be added this check for predicatable instructions between the
D/WLSTP and the loop's start, but it was missing the last instruction in
the block. Change it to use some iterators instead.

Differential Revision: https://reviews.llvm.org/D88354
2020-09-28 09:14:40 +01:00
David Green 34b27b9441 [ARM] Sink splats to MVE intrinsics
The predicated MVE intrinsics are generated as, for example,
llvm.arm.mve.add.predicated(x, splat(y). p). We need to sink the splat
value back into the loop, like we do for other instructions, so we can
re-select qr variants.

Differential Revision: https://reviews.llvm.org/D87693
2020-09-17 16:00:51 +01:00
Sam Tebbs 7aabb6ad77 [ARM][LowOverheadLoops] Remove modifications to the correct element
count register

After my patch at D86087, code that now uses the mov operand rather than
the vctp operand will no longer remove modifications to the vctp operand
as they should. This patch fixes that by explicitly removing
modifications to the vctp operand rather than the register used as the
element count.
2020-09-08 10:30:05 +01:00
Sam Parker a3e41d4581 [ARM] Make MachineVerifier more strict about terminators
Fix the ARM backend's analyzeBranch so it doesn't ignore predicated
return instructions, and make the MachineVerifier rule more strict.

Differential Revision: https://reviews.llvm.org/D40061
2020-08-27 07:10:20 +01:00
David Green 41495dd57a [ARM] Change target triple to arm-none-none-eabi. NFC 2020-08-19 11:58:50 +01:00
Sam Tebbs 31f02ac60a [ARM] Use mov operand if the mov cannot be moved while tail predicating
There are some cases where the instruction that sets up the iteration
count for a tail predicated loop cannot be moved before the dlstp,
stopping tail predication entirely. This patch checks if the mov operand
can be used and if so, uses that instead.

Differential Revision: https://reviews.llvm.org/D86087
2020-08-18 17:10:29 +01:00