Converting the *LoopStart pseudo instructions into DLS/WLS results in
LR being defined. These instructions were inserted on the assumption
that LR would already contain the loop counter because a mov is
introduced during ISel as the the consumers in the loop can only use
LR. That assumption proved wrong!
So perform a safety check, finding an appropriate place to insert the
DLS/WLS instructions or revert if this isn't possible.
Differential Revision: https://reviews.llvm.org/D67539
llvm-svn: 372111
The low-overhead branch extension provides a loop-end 'LE' instruction
that performs no decrement nor compare, it just jumps backwards. This
patch modifies the constant islands pass to try to insert LE
instructions in place of a Thumb2 conditional branch, instead of
shrinking it. This only happens if a cmp can be converted to a cbn/z
and used to exit the loop.
Differential Revision: https://reviews.llvm.org/D67404
llvm-svn: 372085
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference,
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67433
llvm-svn: 371608
The MVE and LOB extensions of Armv8.1m can be combined to enable
'tail predication' which removes the need for a scalar remainder
loop after vectorization. Lane predication is performed implicitly
via a system register. The effects of predication is described in
Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points
being:
- For vector operations that perform reduction across the vector and
produce a scalar result, whether the value is accumulated or not.
- For non-load instructions, the predicate flags determine if the
destination register byte is updated with the new value or if the
previous value is preserved.
- For vector store instructions, whether the store occurs or not.
- For vector load instructions, whether the value that is loaded or
whether zeros are written to that element of the destination
register.
This patch implements a pass that takes a hardware loop, containing
masked vector instructions, and converts it something that resembles
an MVE tail predicated loop. Currently, if we had code generation,
we'd generate a loop in which the VCTP would generate the predicate
and VPST would then setup the value of VPR.PO. The loads and stores
would be placed in VPT blocks so this is not tail predication, but
normal VPT predication with the predicate based upon a element
counting induction variable. Further work needs to be done to finally
produce a true tail predicated loop.
Because only the loads and stores are predicated, in both the LLVM IR
and MIR level, we will restrict support to only lane-wise operations
(no horizontal reductions). We will perform a final check on MIR
during loop finalisation too.
Another restriction, specific to MVE, is that all the vector
instructions need operate on the same number of elements. This is
because predication is performed at the byte level and this is set
on entry to the loop, or by the VCTP instead.
Differential Revision: https://reviews.llvm.org/D65884
llvm-svn: 371179
Two issues:
1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't
really have any visible effect at the moment, but it might matter in the
future.
2. The t2CMPri generated for t2WhileLoopStart might need to use a
register that isn't LR.
My team found this because we have a patch to track register liveness
late in the pass pipeline. I'll look into upstreaming it to help catch
issues like this earlier.
Differential Revision: https://reviews.llvm.org/D66243
llvm-svn: 369069
Currently we check whether LR is stored/loaded to/from inbetween the
loop decrement and loop end pseudo instructions. There's two problems
here:
- It relies on all load/store instructions being labelled as such in
tablegen.
- Actually any use of loop decrement is troublesome because the value
doesn't exist!
So we need to check for any read/write of LR that occurs between the
two instructions and revert if we find anything.
Differential Revision: https://reviews.llvm.org/D65792
llvm-svn: 368130
Revert the hardware loop upon finding a LoopEnd that doesn't target
the loop header, instead of asserting a failure.
Differential Revision: https://reviews.llvm.org/D65268
llvm-svn: 367296
Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair,
so add an implicit def to these pseudo instructions in case that WLS
and LE aren't generated.
Differential Revision: https://reviews.llvm.org/D65275
llvm-svn: 367089
While lowering test.set.loop.iterations, it wasn't checked how the
brcond was using the result and so the wls could branch to the loop
preheader instead of not entering it. The same was true for
loop.decrement.reg.
So brcond and br_cc and now lowered manually when using the hwloop
intrinsics. During this we now check whether the result has been
negated and whether we're using SETEQ or SETNE and 0 or 1. We can
then figure out which basic block the WLS and LE should be targeting.
Differential Revision: https://reviews.llvm.org/D64616
llvm-svn: 366809
ARMLowOverheadLoops would assert a failure if it did not find all the
pseudo instructions that comprise the hardware loop. Instead of doing
this, iterate through all the instructions of the function and revert
any remaining pseudo instructions that haven't been converted.
Differential Revision: https://reviews.llvm.org/D65080
llvm-svn: 366691
This patch addresses a couple of problems:
1) The maximum supported offset of LE is -4094.
2) The offset of WLS also needs to be checked, this uses a
maximum positive offset of 4094.
The use of BasicBlockUtils has been changed because the block offsets
weren't being initialised, but the isBBInRange checks both positive
and negative offsets.
ARMISelLowering has been tweaked because the test case presented
another pattern that we weren't supporting.
llvm-svn: 365749
Backend changes to enable WLS/LE low-overhead loops for armv8.1-m:
1) Use TTI to communicate to the HardwareLoop pass that we should try
to generate intrinsics that guard the loop entry, as well as setting
the loop trip count.
2) Lower the BRCOND that uses said intrinsic to an Arm specific node:
ARMWLS.
3) ISelDAGToDAG the node to a new pseudo instruction:
t2WhileLoopStart.
4) Add support in ArmLowOverheadLoops to handle the new pseudo
instruction.
Differential Revision: https://reviews.llvm.org/D63816
llvm-svn: 364733