Commit Graph

231 Commits

Author SHA1 Message Date
Sam Parker ff9ac33e1e [ARM][MVE] Validate tail predication values
Iterate through the loop and check that the observable values
produced are the same whether tail predication happens or not.

We want to find out if the tail-predicated version of this loop will
produce the same values as the loop in its original form. For this to
be true, the newly inserted implicit predication must not change the
the (observable) results.

We're doing this because many instructions in the loop will not be
predicated and so the conversion from VPT predication to tail
predication can result in different values being produced, because of
falsely predicated lanes not being updated in the converted form.

A masked load, whether through VPT or tail predication, will write
zeros to any of the falsely predicated bytes. So, from the loads, we
know that the false lanes are zeroed and here we're trying to track
that those false lanes remain zero, or where they change, the
differences are masked away by their user(s).

All MVE loads and stores have to be predicated, so we know that any
load operands, or stored results are equivalent already. Other
explicitly predicated instructions will perform the same operation in
the original loop and the tail-predicated form too. Because of this,
we can insert loads, stores and other predicated instructions into
our KnownFalseZeros set and build from there.

Differential Revision: https://reviews.llvm.org/D75452
2020-03-10 09:59:01 +00:00
Sam Parker 5618e9be37 [RDA][ARM] collectKilledOperands across multiple blocks
Use MIOperand in collectLocalKilledOperands to make the search
global, as we already have to search for global uses too. This
allows us to delete more dead code when tail predicating.

Differential Revision: https://reviews.llvm.org/D75167
2020-03-03 15:23:05 +00:00
Sumanth Gundapaneni 9897daa6bf Update LSR's logic that identifies a post-increment SCEV value.
One of the checks has been removed as it seem invalid.
The LoopStep size is always almost a 32-bit.

Differential Revision: https://reviews.llvm.org/D75079
2020-03-02 16:34:18 -06:00
David Green e2a2f3f7fc [ARM] MVE VMLAS
This addes extra patterns for the VMLAS MVE instruction, which performs
Qda = Qda * Qn + Rm, a similar pattern to the existing VMLA. The sinking
of splat(Rm) into the loop is already performed, meaning we just need
extra Pat's in tablegen.

Differential Revision: https://reviews.llvm.org/D75115
2020-02-28 14:27:21 +00:00
Sam Parker 46bfc2bc01 [NFC][ARM] Add tests 2020-02-28 11:24:02 +00:00
Sam Parker bf61421a02 [RDA] Track implicit-defs
Ensure that we're recording implicit defs, as well as visiting implicit
uses and implicit defs when we're walking through operands.

Differential Revision: https://reviews.llvm.org/D75185
2020-02-28 11:14:42 +00:00
Sam Parker 965ba4291a Revert "[ARM] Add CPSR as an implicit use of t2IT"
This reverts commit e58229fded.

Differential Revision: https://reviews.llvm.org/D75186
2020-02-27 15:43:44 +00:00
Sam Parker e58229fded [ARM] Add CPSR as an implicit use of t2IT
This use is already attached to the BUNDLE instruction but is lost
after finalisation.

Differential Revision: https://reviews.llvm.org/D75186
2020-02-27 10:10:40 +00:00
Sam Parker 72f044ecdf [NFC][ARM] Add test case 2020-02-27 09:38:34 +00:00
Sjoerd Meijer 7efabe5c7d [MIR][ARM] MachineOperand comments
This adds infrastructure to print and parse MIR MachineOperand comments.
The motivation for the ARM backend is to print condition code names instead of
magic constants that are difficult to read (for human beings). For example,
instead of this:

  dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14, $noreg
  t2Bcc %bb.4, 0, killed $cpsr

we now print this:

  dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14 /* CC::always */, $noreg
  t2Bcc %bb.4, 0 /* CC:eq */, killed $cpsr

This shows that MachineOperand comments are enclosed between /* and */. In this
example, the EOR instruction is not conditionally executed (i.e. it is "always
executed"), which is encoded by the 14 immediate machine operand. Thus, now
this machine operand has /* CC::always */ as a comment. The 0 on the next
conditional branch instruction represents the equal condition code, thus now
this operand has /* CC:eq */ as a comment.

As it is a comment, the MI lexer/parser completely ignores it. The benefit is
that this keeps the change in the lexer extremely minimal and no target
specific parsing needs to be done. The changes on the MIPrinter side are also
minimal, as there is only one target hooks that is used to create the machine
operand comments.

Differential Revision: https://reviews.llvm.org/D74306
2020-02-24 14:19:21 +00:00
Sam Parker a67eb221e2 [RDA][ARM][LowOverheadLoops] Iteration count IT blocks
Change the way that we remove the redundant iteration count code in
the presence of IT blocks. collectLocalKilledOperands has been
introduced to scan an instructions operands, collecting the killed
instructions and then visiting them too. This is used to delete the
code in the preheader which calculates the iteration count. We also
track any IT blocks within the preheader and, if we remove all the
instructions from the IT block, we also remove the IT instruction.
isSafeToRemove is used to remove any redundant uses of the iteration
count within the loop body.

Differential Revision: https://reviews.llvm.org/D74975
2020-02-24 13:51:03 +00:00
Sam Parker 03756a4197 [ARM][MVE] Combine more extending masked loads
For MVE, don't look at the users of the extending loads so that more
as desirable for folding.

Differential Revision: https://reviews.llvm.org/D74958
2020-02-24 07:50:15 +00:00
Sam Parker de3e65e60c [ARM][LowOverheadLoops] Check loop liveouts
Check that no Q-regs are live out of the loop, unless the instruction
within the loop is predicated on the vctp.

Differential Revision: https://reviews.llvm.org/D72713
2020-02-19 12:59:01 +00:00
Sjoerd Meijer 6b0ed508fa [ARM][MVE] Tail-Predication: recognise (again) active lanes IR pattern
A small IR change in calculating the active lanes resulted in no longer
recognising tail-predication. Now recognise both an 'add' and 'or' in
the expression that calculates the active lanes.

Differential Revision: https://reviews.llvm.org/D74394
2020-02-11 15:18:18 +00:00
Jinsong Ji 01edae1271 [AsmPrinter] Print FP constant in hexadecimal form instead
Printing floating point number in decimal is inconvenient for humans.
Verbose asm output will print out floating point values in comments, it
helps.

But in lots of cases, users still need additional work to covert the
decimal back to hex or binary to check the bit patterns,
especially when there are small precision difference.

Hexadecimal form is one of the supported form in LLVM IR, and easier for
debugging.

This patch try to print all FP constant in hex form instead.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D73566
2020-02-07 16:00:55 +00:00
Sam Parker 0a8cae10fe [ReachingDefs] Make isSafeToMove more strict.
Test that we're not moving the instruction through instructions with
side-effects.

Differential Revision: https://reviews.llvm.org/D74058
2020-02-06 14:06:08 +00:00
Sjoerd Meijer 01022af5d5 [ARM][MVE] LowOverheadLoops: DCE on the iteration count setup expression
Once we have created a tail-predicated hardware-loop, and thus know the number
of elements that are processed, we want to clean-up the iteration count
expression of that loop. In D73682, we bailed the analysis on conditionally
executed instructions. This adds support for IT-blocks, so that we can handle
these cases again. The restriction is that we only support IT blocks containing
1 statement, but that seems to cover most cases and forms of the iteration
count expression.

Differential Revision: https://reviews.llvm.org/D73947
2020-02-05 15:15:46 +00:00
Sam Parker 564275289d [ARM][LowOverheadLoops] Fix loop count chain
Checking that the use-def chain that performs the loop count
isSafeToRemove is not sufficient because it means that we can
remove register copies that we need to restore lr to its correct
value. This change now prevents the transform from kicking in for the
'remove-elem-moves' test which needs to addressed later on.

Differential Revision: https://reviews.llvm.org/D74037
2020-02-05 13:21:51 +00:00
Sam Parker 4c7f819204 [ARM][LowOverheadLoops] Ensure memory predication
While validating each MVE instruction, check that all instructions
that touch memory are somehow predicated upon the VCTP.

Differential Revision: https://reviews.llvm.org/D73616
2020-02-05 13:19:08 +00:00
Sam Parker 06e12893ff [ARM][LowOverheadLoops] Skip debug values
While iterating through the loop, don't inspect any dbg values.

Differential Revision: https://reviews.llvm.org/D73688
2020-01-30 11:51:58 +00:00
Sam Parker 6726d67bfd [ARM][LowOverheadLoops] Check scalar predicates
When trying to remove the loop iteration count, check that the
instruction will always execute.

Differential Revision: https://reviews.llvm.org/D73682
2020-01-30 09:13:04 +00:00
Sam Parker dc0d84f09e [NFC][ARM] Add test 2020-01-29 06:59:21 -05:00
Sjoerd Meijer b567ff2fa0 [ARM][MVE] Tail-predication: support constant trip count
We had support for runtime trip count values, but not constants, and this adds
supports for that.

And added a minor optimisation while I was add it: don't invoke Cleanup when
there's nothing to clean up.

Differential Revision: https://reviews.llvm.org/D73198
2020-01-27 11:05:26 +00:00
Sam Parker 6c2df5d14f [ARM][LowOverheadLoops] Dont ignore VCTP
When expanding the LoopStart, we try to remove the iteration count
calculation. However, if part of the calculation was also used to
calculate the number of elements we could end up deleting
instructions that were required to feed DLSTP/WLSTP.

Differential Revision: https://reviews.llvm.org/D73275
2020-01-27 10:59:12 +00:00
Sam Parker 0ae13766ff [NFC][ARM] Add test 2020-01-24 11:00:18 +00:00
Sam Parker 05532575e8 [RDA] Skip debug values
Skip debug instructions when iterating through a block to find uses.

Differential Revision: https://reviews.llvm.org/D73273
2020-01-23 17:04:54 +00:00
Sam Parker 0c943c6117 [NFC][ARM] Add test 2020-01-23 16:21:52 +00:00
David Green 58991ba773 [ARM] Mark MVE loads/store as not having side effects
The hasSideEffect parameter is usually automatically inferred from
instruction patterns. For some of our MVE instructions, we do not have
patterns though, such as for the pre/post inc loads and stores. This
instead specifies the flag manually on the base MVE_VLDRSTR_base
tablegen class, making sure we get this correct.

This can help with scheduling multiple loads more optimally. Here I've
added a unittest as a more direct form of testing.

Differential Revision: https://reviews.llvm.org/D73117
2020-01-22 17:56:55 +00:00
Sam Parker c04b9ba595 [ARM][MVE] Clear MaskedInsts vector
In MVETailPredication, clear the vector before running on a new loop.

Differential Revision: https://reviews.llvm.org/D73048
2020-01-22 04:27:36 -05:00
Sjoerd Meijer 8cba99e2aa [ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks
This patch uses helper function rewriteLoopExitValues that is refactored in
D72602 to rematerialise the iteration count in exit blocks, so that we can
clean-up loop update expressions inside the hardware-loops later in
ARMLowOverheadLoops, which is necessary to get actual performance gains for
tail-predicated loops.

Differential Revision: https://reviews.llvm.org/D72714
2020-01-20 10:26:36 +00:00
David Green 5e51f75542 [ARM] Favour post inc for MVE loops
We were previously not necessarily favouring postinc for the MVE loads
and stores, leading to extra code prior to the loop to set up the
preinc. MVE in general can benefit from postinc (as we don't have
unrolled loops), and certain instructions like the VLD2's only post-inc
versions are available.

Differential Revision: https://reviews.llvm.org/D70790
2020-01-20 06:57:07 +00:00
Sam Parker 42350cd893 [ARM][MVE] Tail Predicate IsSafeToRemove
Introduce a method to walk through use-def chains to decide whether
it's possible to remove a given instruction and its users. These
instructions are then stored in a set until the end of the transform
when they're erased. This is now used to perform checks on the
iteration count (LoopDec chain), element count (VCTP chain) and the
possibly redundant iteration count.

As well as being able to remove chains of instructions, we know also
check that the sub feeding the vctp is producing the expected value.

Differential Revision: https://reviews.llvm.org/D71837
2020-01-17 13:19:14 +00:00
Sam Parker 760b175109 [ARM][LowOverheadLoops] Update liveness info
Recommitting e93e0d413f after reverting due to test failures, which
will hopefully now be fixed. Original commit message:

After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).

Differential Revision: https://reviews.llvm.org/D72131
2020-01-16 15:44:25 +00:00
Sam Parker e27632c302 [ARM][LowOverheadLoops] Allow all MVE instrs.
We have a whitelist of instructions that we allow when tail
predicating, since these are trivial ones that we've deemed need no
special handling. Now change ARMLowOverheadLoops to allow the
non-trivial instructions if they're contained within a valid VPT
block. Since a valid block is one that is predicated upon the VCTP so
we know that these non-trivial instructions will still behave as
expected once the implicit predication is used instead.

This also fixes a previous test failure.

Differential Revision: https://reviews.llvm.org/D72509
2020-01-14 12:03:58 +00:00
Sam Parker bad6032bc1 [ARM][LowOverheadLoops] Change predicate inspection
Use the already provided helper function to get the operand type so
that we can detect whether the vpr is being used as a predicate or
not. Also use existing helpers to get the predicate indices when we
converting the vpt blocks. This enables us to support both types of
vpr predicate operand.

Differential Revision: https://reviews.llvm.org/D72504
2020-01-14 11:47:34 +00:00
Sam Parker e73b20c57d [ARM][MVE] Disallow VPSEL for tail predication
Due to the current way that we collect predicated instructions, we
can't easily handle vpsel in tail predicated loops. There are a
couple of issues:
1) It will use the VPR as a predicate operand, but doesn't have to be
   instead a VPT block, which means we can assert while building up
   the VPT block because we don't find another VPST to being a new
   one.
2) VPSEL still requires a VPR operand even after tail predicating,
   which means we can't remove it unless there is another
   instruction, such as vcmp, that can provide the VPR def.

The first issue should be a relatively simple fix in the logic of the
LowOverheadLoops pass, whereas the second will require us to
represent the 'implicit' tail predication with an explicit value.

Differential Revision: https://reviews.llvm.org/D72629
2020-01-14 11:41:17 +00:00
Momchil Velikov 173b711e83 [ARM][MVE] MVE-I should not be disabled by -mfpu=none
Architecturally, it's allowed to have MVE-I without an FPU, thus
-mfpu=none should not disable MVE-I, or moves to/from FP-registers.

This patch removes `+/-fpregs` from features unconditionally added to
target feature list, depending on FPU and moves the logic to Clang
driver, where the negative form (`-fpregs`) is conditionally added to
the target features list for the cases of `-mfloat-abi=soft`, or
`-mfpu=none` without either `+mve` or `+mve.fp`. Only the negative
form is added by the driver, the positive one is derived from other
features in the backend.

Differential Revision: https://reviews.llvm.org/D71843
2020-01-09 14:03:25 +00:00
Sam Parker 1cba261239 Revert "[ARM][LowOverheadLoops] Update liveness info"
This reverts commit e93e0d413f.

There's some ordering problems on some on the buildbots which needs
investigating.
2020-01-09 09:22:06 +00:00
Sam Parker e93e0d413f [ARM][LowOverheadLoops] Update liveness info
After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).

Differential Revision: https://reviews.llvm.org/D72131
2020-01-09 08:33:47 +00:00
Sam Parker 96d2d96b03 [NFC][ARM] Update tests
Run the update_mir_test on some of the low-overhead loop tests.
2020-01-08 06:08:30 -05:00
Roman Lebedev 3d492d7503
[DAGCombine][X86][Thumb2/LowOverheadLoops] `A - (A & C)` -> `A & (~C)` fold (PR44448)
While we do manage to fold integer-typed IR in middle-end,
we can't do that for the main motivational case of pointers.

There is @llvm.ptrmask() intrinsic which may or may not be helpful,
but i'm not sure it is fully considered canonical yet,
not everything is fully aware of it likely.

Name: PR44448  ptr - (ptr & C) -> ptr & (~C)
%bias = and i32 %ptr, C
%r = sub i32 %ptr, %bias
  =>
%r = and i32 %ptr, ~C

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 17:55:45 +03:00
Sam Parker 69cfbb460e [ARM][NFC] Update MIR test 2020-01-03 14:51:15 +00:00
David Green f323ab919a [ARM] Add +mve feature to mve tests. NFC 2020-01-01 17:25:20 +00:00
David Green b4abe7afbf [ARM] Sink splat to ICmp
This adds ICmp to the list of instructions that we sink a splat to in a
loop, allowing the register forms of instructions to be selected more
often. It does not add FCmp yet as the results look a little odd, trying
to keep the register in an float reg and having to move it back to a GPR.

Differential Revision: https://reviews.llvm.org/D70997
2019-12-30 12:58:14 +00:00
Sam Parker acbc9aed72 [ARM][MVE] Fixes for tail predication.
1) Fix an issue with the incorrect value being used for the number of
   elements being passed to [d|w]lstp. We were trying to check that
   the value was available at LoopStart, but this doesn't consider
   that the last instruction in the block could also define the
   register. Two helpers have been added to RDA for this.
2) Insert some code to now try to move the element count def or the
   insertion point so that we can perform more tail predication.
3) Related to (1), the same off-by-one could prevent us from
   generating a low-overhead loop when a mov lr could have been
   the last instruction in the block.
4) Fix up some instruction attributes so that not all the
   low-overhead loop instructions are labelled as branches and
   terminators - as this is not true for dls/dlstp.

Differential Revision: https://reviews.llvm.org/D71609
2019-12-20 09:34:18 +00:00
Sam Parker 4042518335 [ARM][MVE] Tail predicate in the presence of vcmp
Record the discovered VPT blocks while checking for validity and, for
now, only handle blocks that begin with VPST and not VPT. We're now
allowing more than one instruction to define vpr, but each block must
somehow be predicated using the vctp. This leaves us with several
scenarios which need fixing up:
1) A VPT block with is only predicated by the vctp and has no
   internal vpr defs.
2) A VPT block which is only predicated by the vctp but has an
   internal vpr def.
3) A VPT block which is predicated upon the vctp as well as another
   vpr def.
4) A VPT block which is not predicated upon a vctp, but contains it
   and all instructions within the block are predicated upon in.

The changes needed are, for:
1) The easy one, just remove the vpst and unpredicate the
   instructions in the block.
2) Remove the vpst and unpredicate the instructions up to the
   internal vpr def. Need insert a new vpst to predicate the
   remaining instructions.
3) No nothing.
4) The vctp will be inside a vpt and the instruction will be removed,
   so adjust the size of the mask on the vpst.

Differential Revision: https://reviews.llvm.org/D71107
2019-12-20 08:42:11 +00:00
Sjoerd Meijer d97cf1f889 [ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.

To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.

This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:

    ..
    dlstp.32  lr, r2
  .LBB0_1:
    mov r3, r2
    subs  r2, #4
    vldrh.u32 q2, [r1], #8
    vmov  q1, q0
    vmla.u32  q0, q2, r0
    letp  lr, .LBB0_1
  @ %bb.2:
    vctp.32 r3
    ..

which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.

Differential Revision: https://reviews.llvm.org/D71007
2019-12-11 10:20:19 +00:00
David Green b1aba0378e [ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968
2019-12-09 11:37:34 +00:00
Guozhi Wei 72942459d0 [MBP] Avoid tail duplication if it can't bring benefit
Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases that duplication doesn't increase fallthrough, or it brings too much size overhead.

To overcome these two issues in function canTailDuplicateUnplacedPreds I add two checks:

  make sure there is at least one duplication in current work set.
  the number of duplication should not exceed the number of successors.

The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain.

Differential Revision: https://reviews.llvm.org/D64376
2019-12-06 09:53:53 -08:00
Simon Tatham 48cce077ef [ARM,MVE] Rename and clean up VCTP IR intrinsics.
Summary:
D65884 added a set of Arm IR intrinsics for the MVE VCTP instruction,
to use in tail predication. But the 64-bit one doesn't work properly:
its predicate type is `<2 x i1>` / `v2i1`, which isn't a legal MVE
type (due to not having a full set of instructions that manipulate it
usefully). The test of `vctp64` in `basic-tail-pred.ll` goes through
`opt` fine, as the test expects, but if you then feed it to `llc` it
causes a type legality failure at isel time.

The usual workaround we've been using in the rest of the MVE
intrinsics family is to bodge `v2i1` into `v4i1`. So I've adjusted the
`vctp64` IR intrinsic to do that, and completely removed the code (and
test) that uses that intrinsic for 64-bit tail predication. That will
allow me to add isel rules (upcoming in D70485) that actually generate
the VCTP64 instruction.

Also renamed all four of these IR intrinsics so that they have `mve`
in the name, since its absence was confusing.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: samparker, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70592
2019-12-02 16:20:30 +00:00
Mark Murray e8a8dbe9c4 [ARM][MVE][Intrinsics] Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests.
Summary: Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests.

Reviewers: simon_tatham, ostannard, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70546
2019-11-27 16:52:05 +00:00
David Green b5315ae8ff [Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.

To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.

This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
   Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
   legality of masked operations as well as normal ones. This array is
   fairly small, so doubling the size still won't make it very large.
   Offset masked loads can then be controlled with
   setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
   CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
   the same way.
- The ARM backend is then adjusted to make use of these indexed masked
   loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.

Differential Revision: https://reviews.llvm.org/D70176
2019-11-26 16:21:01 +00:00
Sam Parker 28166816b0 [ARM][ReachingDefs] Remove dead code in loloops.
Add some more helper functions to ReachingDefs to query the uses of
a given MachineInstr and also to query whether two MachineInstrs use
the same def of a register.

For Arm, while tail-predicating, these helpers are used in the
low-overhead loops to remove the dead code that calculates the number
of loop iterations.

Differential Revision: https://reviews.llvm.org/D70240
2019-11-26 10:27:46 +00:00
Sam Parker cced971fd3 [ARM][ReachingDefs] RDA in LoLoops
Add several new methods to ReachingDefAnalysis:
- getReachingMIDef, instead of returning an integer, return the
  MachineInstr that produces the def.
- getInstFromId, return a MachineInstr for which the given integer
  corresponds to.
- hasSameReachingDef, return whether two MachineInstr use the same
  def of a register.
- isRegUsedAfter, return whether a register is used after a given
  MachineInstr.

These methods have been used in ARMLowOverhead to replace searching
for uses/defs.

Differential Revision: https://reviews.llvm.org/D70009
2019-11-26 10:13:46 +00:00
Sam Parker 4a59eedd2d [ARM][ConstantIslands] Correct block size update
When inserting a non-decrementing LE, the basic block was being
resized to take into consideration that a tCMP and tBcc had been
combined into one T1 instruction. This is not true in the LE case
where we generate a CBN?Z and an LE.

Differential Revision: https://reviews.llvm.org/D70536
2019-11-26 09:55:58 +00:00
Sam Parker d43913ae38 [ARM][MVE] Enable narrow vectors for tail pred
Remove the restriction, from the mve tail predication pass, that the
all masked vectors instructions need to be 128-bits. This allows us
to supported extending loads and truncating stores.

Differential Revision: https://reviews.llvm.org/D69946
2019-11-19 08:51:12 +00:00
Sam Parker 8978c12b39 [ARM][MVE] Tail predication conversion
This patch modifies ARMLowOverheadLoops to convert a predicated
vector low-overhead loop into a tail-predicatd one. This is currently
a very basic conversion, with the following restrictions:
- Operates only on single block loops.
- The loop can only contain a single vctp instruction.
- No other instructions can write to the vpr.
- We only allow a subset of the mve instructions in the loop.

TODO: Pass the number of elements, not the number of iterations to
dlstp/wlstp.

Differential Revision: https://reviews.llvm.org/D69945
2019-11-19 08:22:18 +00:00
David Green 91b0cad813 [ARM] Use isFMAFasterThanFMulAndFAdd for MVE
The Arm backend will usually return false for isFMAFasterThanFMulAndFAdd,
where both the fused VFMA.f32 and a non-fused VMLA.f32 are usually
available for scalar code. For MVE we don't have the non-fused version
though. It makes more sense for isFMAFasterThanFMulAndFAdd to return
true, allowing us to simplify some of the existing ISel patterns.

The tests here are that non of the existing tests failed, and so we are
still selecting VFMA and VFMS. The one test that changed shows we can
now select from fast math flags, as opposed to just relying on the
isFMADLegalForFAddFSub option.

Differential Revision: https://reviews.llvm.org/D69115
2019-11-04 15:05:41 +00:00
Sam Parker 3ff961cabd [ARM][MVE] Change VPST to use, not def, VPR
Unlike VPT, VPST just uses the current value of VPR.P0.

Differential Revision: https://reviews.llvm.org/D69037

llvm-svn: 375087
2019-10-17 08:46:31 +00:00
Sam Parker 39af8a3a3b [DAGCombine][ARM] Enable extending masked loads
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.

Differential Revision: https://reviews.llvm.org/D68337

llvm-svn: 375085
2019-10-17 07:55:55 +00:00
Sam Parker ef7990a88a [NFC][ARM][MVE] More tests
Add some tail predication tests with fast math.

llvm-svn: 373331
2019-10-01 13:02:14 +00:00
Sam Parker e3b4f0ec25 [NFC][ARM][MVE] More tests
Add some loop tests that cover different float operations and types.

llvm-svn: 373192
2019-09-30 08:49:42 +00:00
Sam Parker aac03ae06a [ARM][MVE] Change VCTP operand
The VCTP instruction will calculate the predicate masked based upon
the number of elements that need to be processed. I had inserted the
sub before the vctp intrinsic and supplied it as the operand, but
this is incorrect as the phi should directly feed the vctp. The sub
is calculating the value for the next iteration.

Differential Revision: https://reviews.llvm.org/D67921

llvm-svn: 373188
2019-09-30 08:03:23 +00:00
Sam Parker 110607b284 [NFC][ARM] Add some tail-predication tests
Use different data types for some simple loops.

llvm-svn: 373064
2019-09-27 10:33:53 +00:00
Sam Parker 9feb429a33 [ARM][MVE] Remove old tail predicates
Remove any predicate that we replace with a vctp intrinsic, and try
to remove their operands too. Also look into the exit block to see if
there's any duplicates of the predicates that we've replaced and
clone the vctp to be used there instead.

Differential Revision: https://reviews.llvm.org/D67709

llvm-svn: 372567
2019-09-23 09:48:25 +00:00
Sam Parker 4ba6d0ded2 [ARM][LowOverheadLoops] Use subs during revert.
Check whether there are any uses or defs between the LoopDec and
LoopEnd. If there's not, then we can use a subs to set the cpsr and
skip generating a cmp.

Differential Revision: https://reviews.llvm.org/D67801

llvm-svn: 372560
2019-09-23 08:57:50 +00:00
Sam Parker 566127e376 [ARM][LowOverheadLoops] Use tBcc when reverting
Check the branch target ranges and use a tBcc instead of t2Bcc when
we can.

Differential Revision: https://reviews.llvm.org/D67796

llvm-svn: 372557
2019-09-23 08:35:31 +00:00
Sam Parker 56aa691c41 [ARM] Fix for buildbots
I had missed that massive.mir also needed updating.

llvm-svn: 372303
2019-09-19 06:50:19 +00:00
Sam Parker f1d069e54d [ARM] Fix for buildbots
Add --verifymachineinstrs and update the remaining low overhead loop
tests.

llvm-svn: 372121
2019-09-17 13:46:26 +00:00
Sam Parker 36c922278e [ARM][LowOverheadLoops] Add LR def safety check
Converting the *LoopStart pseudo instructions into DLS/WLS results in
LR being defined. These instructions were inserted on the assumption
that LR would already contain the loop counter because a mov is
introduced during ISel as the the consumers in the loop can only use
LR. That assumption proved wrong!

So perform a safety check, finding an appropriate place to insert the
DLS/WLS instructions or revert if this isn't possible.

Differential Revision: https://reviews.llvm.org/D67539

llvm-svn: 372111
2019-09-17 12:19:32 +00:00
Sam Parker 95b28a4c72 [ARM] LE support in ConstantIslands
The low-overhead branch extension provides a loop-end 'LE' instruction
that performs no decrement nor compare, it just jumps backwards. This
patch modifies the constant islands pass to try to insert LE
instructions in place of a Thumb2 conditional branch, instead of
shrinking it. This only happens if a cmp can be converted to a cbn/z
and used to exit the loop.

Differential Revision: https://reviews.llvm.org/D67404

llvm-svn: 372085
2019-09-17 09:08:05 +00:00
Guillaume Chatelet 48904e9452 [Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference,

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67433

llvm-svn: 371608
2019-09-11 11:16:48 +00:00
Sam Parker 312409e464 [ARM] MVE Tail Predication
The MVE and LOB extensions of Armv8.1m can be combined to enable
'tail predication' which removes the need for a scalar remainder
loop after vectorization. Lane predication is performed implicitly
via a system register. The effects of predication is described in
Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points
being:
- For vector operations that perform reduction across the vector and
  produce a scalar result, whether the value is accumulated or not.
- For non-load instructions, the predicate flags determine if the
  destination register byte is updated with the new value or if the
  previous value is preserved.
- For vector store instructions, whether the store occurs or not.
- For vector load instructions, whether the value that is loaded or
  whether zeros are written to that element of the destination
  register.

This patch implements a pass that takes a hardware loop, containing
masked vector instructions, and converts it something that resembles
an MVE tail predicated loop. Currently, if we had code generation,
we'd generate a loop in which the VCTP would generate the predicate
and VPST would then setup the value of VPR.PO. The loads and stores
would be placed in VPT blocks so this is not tail predication, but
normal VPT predication with the predicate based upon a element
counting induction variable. Further work needs to be done to finally
produce a true tail predicated loop.

Because only the loads and stores are predicated, in both the LLVM IR
and MIR level, we will restrict support to only lane-wise operations
(no horizontal reductions). We will perform a final check on MIR
during loop finalisation too.

Another restriction, specific to MVE, is that all the vector
instructions need operate on the same number of elements. This is
because predication is performed at the byte level and this is set
on entry to the loop, or by the VCTP instead.

Differential Revision: https://reviews.llvm.org/D65884

llvm-svn: 371179
2019-09-06 08:24:41 +00:00
Eli Friedman 9b9a308452 [ARM][LowOverheadLoops] Fix generated code for "revert".
Two issues:

1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't
really have any visible effect at the moment, but it might matter in the
future.
2. The t2CMPri generated for t2WhileLoopStart might need to use a
register that isn't LR.

My team found this because we have a patch to track register liveness
late in the pass pipeline. I'll look into upstreaming it to help catch
issues like this earlier.

Differential Revision: https://reviews.llvm.org/D66243

llvm-svn: 369069
2019-08-15 23:35:53 +00:00
Sam Parker 173de03740 [ARM][LowOverheadLoops] Revert after read/write
Currently we check whether LR is stored/loaded to/from inbetween the
loop decrement and loop end pseudo instructions. There's two problems
here:
- It relies on all load/store instructions being labelled as such in
  tablegen.
- Actually any use of loop decrement is troublesome because the value
  doesn't exist!
    
So we need to check for any read/write of LR that occurs between the
two instructions and revert if we find anything.

Differential Revision: https://reviews.llvm.org/D65792

llvm-svn: 368130
2019-08-07 07:39:19 +00:00
Sam Parker ed2ea3e46b [ARM][LowOverheadLoops] Revert non-header LE target
Revert the hardware loop upon finding a LoopEnd that doesn't target
the loop header, instead of asserting a failure.

Differential Revision: https://reviews.llvm.org/D65268

llvm-svn: 367296
2019-07-30 08:08:44 +00:00
Sam Parker c760b5da11 [ARM][LowOverheadLoops] Add CPSR defs
Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair,
so add an implicit def to these pseudo instructions in case that WLS
and LE aren't generated.

Differential Revision: https://reviews.llvm.org/D65275

llvm-svn: 367089
2019-07-26 08:15:01 +00:00
Sam Parker 57e87dd81b [ARM][LowOverheadLoops] Fix branch target codegen
While lowering test.set.loop.iterations, it wasn't checked how the
brcond was using the result and so the wls could branch to the loop
preheader instead of not entering it. The same was true for
loop.decrement.reg.
    
So brcond and br_cc and now lowered manually when using the hwloop
intrinsics. During this we now check whether the result has been
negated and whether we're using SETEQ or SETNE and 0 or 1. We can
then figure out which basic block the WLS and LE should be targeting.

Differential Revision: https://reviews.llvm.org/D64616

llvm-svn: 366809
2019-07-23 14:08:46 +00:00
Sam Parker 4379a40088 [ARM][LowOverheadLoops] Revert remaining pseudos
ARMLowOverheadLoops would assert a failure if it did not find all the
pseudo instructions that comprise the hardware loop. Instead of doing
this, iterate through all the instructions of the function and revert
any remaining pseudo instructions that haven't been converted.

Differential Revision: https://reviews.llvm.org/D65080

llvm-svn: 366691
2019-07-22 14:16:40 +00:00
Sam Parker 08b4a8da07 [ARM][LowOverheadLoops] Correct offset checking
This patch addresses a couple of problems:
1) The maximum supported offset of LE is -4094.
2) The offset of WLS also needs to be checked, this uses a
   maximum positive offset of 4094.
    
The use of BasicBlockUtils has been changed because the block offsets
weren't being initialised, but the isBBInRange checks both positive
and negative offsets.
    
ARMISelLowering has been tweaked because the test case presented
another pattern that we weren't supporting.

llvm-svn: 365749
2019-07-11 09:56:15 +00:00
Sam Parker 98722691b0 [ARM] WLS/LE Code Generation
Backend changes to enable WLS/LE low-overhead loops for armv8.1-m:
1) Use TTI to communicate to the HardwareLoop pass that we should try
   to generate intrinsics that guard the loop entry, as well as setting
   the loop trip count.
2) Lower the BRCOND that uses said intrinsic to an Arm specific node:
   ARMWLS.
3) ISelDAGToDAG the node to a new pseudo instruction:
   t2WhileLoopStart.
4) Add support in ArmLowOverheadLoops to handle the new pseudo
   instruction.

Differential Revision: https://reviews.llvm.org/D63816

llvm-svn: 364733
2019-07-01 08:21:28 +00:00