Commit Graph

6 Commits

Author SHA1 Message Date
Sjoerd Meijer c352e7fbda [ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
This adapts tail-predication to the new semantics of get.active.lane.mask as
defined in D86147. This means that:
- we can remove the BTC + 1 overflow checks because now the loop tripcount is
  passed in to the intrinsic,
- we can immediately use that value to setup a counter for the number of
  elements processed by the loop and don't need to materialize BTC + 1.

Differential Revision: https://reviews.llvm.org/D86303
2020-08-25 14:38:03 +01:00
Sjoerd Meijer 595270ae39 [ARM][MVE] Refactor option -disable-mve-tail-predication
This refactors option -disable-mve-tail-predication to take different arguments
so that we have 1 option to control tail-predication rather than several
different ones.

This is also a prep step for D82953, in which we want to reject reductions
unless that is requested with this option.

Differential Revision: https://reviews.llvm.org/D83133
2020-07-13 13:40:33 +01:00
Sjoerd Meijer 1319d9bb84 [ARM] Don't revert get.active.lane.mask in ARM Tail-Predication pass
Don't revert intrinsic get.active.lane.mask here, this is moved to isel
legalization in D82292.

Differential Revision: https://reviews.llvm.org/D82105
2020-06-26 07:42:39 +01:00
Sjoerd Meijer d1522513d4 [ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask
To set up a tail-predicated loop, we need to to calculate the number of
elements processed by the loop. We can now use intrinsic
@llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in
D79100. This intrinsic generates a predicate for the masked loads/stores, and
consumes the Backedge Taken Count (BTC) as its second argument. We can now use
that to reconstruct the loop tripcount, instead of the IR pattern match
approach we were using before.

Many thanks to Eli Friedman and Sam Parker for all their help with this work.

This also adds overflow checks for the different, new expressions that we
create: the loop tripcount, and the sub expression that calculates the
remaining elements to be processed. For the latter, SCEV is not able to
calculate precise enough bounds, so we work around that at the moment, but is
not entirely correct yet, it's conservative. The overflow checks can be
overruled with a force flag, which is thus potentially unsafe (but not really
because the vectoriser is the only place where this intrinsic is emitted at the
moment). It's also good to mention that the tail-predication pass is not yet
enabled by default.  We will follow up to see if we can implement these
overflow checks better, either by a change in SCEV or we may want revise the
definition of llvm.get.active.lane.mask.

Differential Revision: https://reviews.llvm.org/D79175
2020-06-17 15:17:42 +01:00
Sjoerd Meijer b0614509a0 [HardwareLoops] llvm.loop.decrement.reg definition
This is split off from D80316, slightly tightening the definition of overloaded
hardwareloop intrinsic llvm.loop.decrement.reg specifying that both operands
its result have the same type.
2020-05-21 10:48:16 +01:00
Sjoerd Meijer b567ff2fa0 [ARM][MVE] Tail-predication: support constant trip count
We had support for runtime trip count values, but not constants, and this adds
supports for that.

And added a minor optimisation while I was add it: don't invoke Cleanup when
there's nothing to clean up.

Differential Revision: https://reviews.llvm.org/D73198
2020-01-27 11:05:26 +00:00