Commit Graph

13 Commits

Author SHA1 Message Date
David Green 48230355e9 [ARM] Remove DLS lr, lr
A DLS lr, lr instruction only moves lr to itself. It need not be emitted
on it's own to save a instruction in the loop preheader.

Differential Revision: https://reviews.llvm.org/D78916
2021-02-02 11:09:31 +00:00
David Green 73a6cd4b6b [ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR
This hints the operand of a t2DoLoopStart towards using LR, which can
help make it more likely to become t2DLS lr, lr. This makes it easier to
move if needed (as the input is the same as the output), or potentially
remove entirely.

The hint is added after others (from COPY's etc) which still take
precedence. It needed to find a place to add the hint, which currently
uses the post isel custom inserter.

Differential Revision: https://reviews.llvm.org/D89883
2020-11-10 16:28:57 +00:00
David Green b2ac9681a7 [ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.

This is a fairly simple change in itself, but leads to a number of other
required alterations.

 - The hardware loop pass, if UsePhi is set, now generates loops of the
   form:
       %start = llvm.start.loop.iterations(%N)
     loop:
       %p = phi [%start], [%dec]
       %dec = llvm.loop.decrement.reg(%p, 1)
       %c = icmp ne %dec, 0
       br %c, loop, exit
 - For this a new llvm.start.loop.iterations intrinsic was added, identical
   to llvm.set.loop.iterations but produces a value as seen above, gluing
   the loop together more through def-use chains.
 - This new instrinsic conceptually produces the same output as input,
   which is taught to SCEV so that the checks in MVETailPredication are not
   affected.
 - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
   been left mostly as before. We should now more reliably be able to tell
   that the t2DoLoopStart is correct without having to prove it, but
   t2WhileLoopStart and tail-predicated loops will remain the same.
 - And all the tests have been updated. There are a lot of them!

This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.

Differential Revision: https://reviews.llvm.org/D89881
2020-11-10 15:57:58 +00:00
David Green 6dcbc323fd Revert "[ARM][LowOverheadLoops] Adjust Start insertion."
This reverts commit 38f625d0d1.

This commit contains some holes in its logic and has been causing
issues since it was commited. The idea sounds OK but some cases were not
handled correctly. Instead of trying to fix that up later it is probably
simpler to revert it and work to reimplement it in a more reliable way.
2020-10-20 08:55:21 +01:00
Sam Parker 38f625d0d1 [ARM][LowOverheadLoops] Adjust Start insertion.
Try to move the insertion point to become the terminator of the
block, usually the preheader.

Differential Revision: https://reviews.llvm.org/D88638
2020-10-01 10:49:19 +01:00
Sam Parker a3e41d4581 [ARM] Make MachineVerifier more strict about terminators
Fix the ARM backend's analyzeBranch so it doesn't ignore predicated
return instructions, and make the MachineVerifier rule more strict.

Differential Revision: https://reviews.llvm.org/D40061
2020-08-27 07:10:20 +01:00
David Green fa15255d8a [ARM] Convert floating point splats to integer
Under MVE a vdup will always take a gpr register, not a floating point
value. During DAG combine we convert the types to a bitcast to an
integer in an attempt to fold the bitcast into other instructions. This
is OK, but only works inside the same basic block. To do the same trick
across a basic block boundary we need to convert the type in
codegenprepare, before the splat is sunk into the loop.

This adds a convertSplatType function to codegenprepare to do that,
putting bitcasts around the splat to force the type to an integer. There
is then some adjustment to the code in shouldSinkOperands to handle the
extra bitcasts.

Differential Revision: https://reviews.llvm.org/D78728
2020-05-13 15:24:16 +01:00
David Green eecba95067 [ARM] Replace arm vendor with none. NFC 2020-04-22 18:19:35 +01:00
Pierre-vh 2effe8f5e7 [Target][ARM] Improvements to the VPT Block Insertion Pass
This allows the MVE VPT Block insertion pass to remove VPNOTs in
order to create more complex VPT blocks such as TE, TEET, TETE, etc.

Differential Revision: https://reviews.llvm.org/D75993
2020-04-01 12:34:20 +01:00
David Green 37b9cc8f29 [ARM] Sink splats to vector float instructions
Some MVE floating point instructions have gpr register variants that take
the scalar gpr value and splat them to all lanes. In order to accept
them in loops, the shuffle_vector and insert need to be sunk down into
the loop, next to the instruction so that ISel can see the whole
pattern.

This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for
mul are slightly more constrained as there are no fms variants taking
register arguments.

Differential Revision: https://reviews.llvm.org/D76023
2020-03-26 09:02:18 +00:00
David Green b3499f572d [ARM] Change VDUP type to i32 for MVE
The MVE VDUP instruction take a GPR and splats into every lane of a
vector register. Unlike NEON we do not have a VDUPLANE equivalent
instruction, doing the same splat from a fp register. Previously a VDUP
to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which
would mean the instruction pattern needs to add a COPY_TO_REGCLASS to
the GPR.

Instead this now converts that earlier during an ISel DAG combine,
converting (VDUP x) to (VDUP (bitcast x)). This can allow instruction
selection to tell that the input needs to be an i32, which in one of the
testcases allows it to use ldr (or specifically ldm) over (vldr;vmov).

Whilst being simple enough for floats, as the types sizes are the same,
these is no BITCAST equivalent for getting a half into a i32. This uses
a VMOVrh ARMISD node, which doesn't know the same tricks yet.

Differential Revision: https://reviews.llvm.org/D76292
2020-03-20 09:48:45 +00:00
David Green b4abe7afbf [ARM] Sink splat to ICmp
This adds ICmp to the list of instructions that we sink a splat to in a
loop, allowing the register forms of instructions to be selected more
often. It does not add FCmp yet as the results look a little odd, trying
to keep the register in an float reg and having to move it back to a GPR.

Differential Revision: https://reviews.llvm.org/D70997
2019-12-30 12:58:14 +00:00
David Green a5a141544d [ARM] MVE sink ICmp test. NFC 2019-12-30 12:58:13 +00:00