Commit Graph

7 Commits

Author SHA1 Message Date
Sjoerd Meijer c352e7fbda [ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
This adapts tail-predication to the new semantics of get.active.lane.mask as
defined in D86147. This means that:
- we can remove the BTC + 1 overflow checks because the loop trip count is
  now passed to the intrinsic,
- we can immediately use that value to set up a counter for the number of
  elements processed by the loop and don't need to materialize BTC + 1.
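
A minimal scalar sketch of the lane-mask semantics this relies on, assuming
the post-D86147 definition that lane i is active when base + i is
unsigned-less-than the trip count, evaluated without wrapping. Plain C,
illustrative only; the helper name and the variable lane count are not LLVM
code.

/* Model of the per-lane predicate of get.active.lane.mask(base, n):
 * mask[i] = (base + i) < n, with the compare done wide enough not to wrap.
 * Illustrative only. */
#include <stdbool.h>
#include <stdint.h>

static void active_lane_mask(uint32_t base, uint32_t trip_count,
                             bool mask[], unsigned num_lanes) {
  for (unsigned i = 0; i < num_lanes; ++i)
    mask[i] = (uint64_t)base + i < (uint64_t)trip_count;
}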

Differential Revision: https://reviews.llvm.org/D86303
2020-08-25 14:38:03 +01:00
David Green 186a7f81e8 [ARM] Add VADDV and VMLAV patterns for v16i16
This adds patterns for v16i16 vecreduce, reusing the existing code to go
via an i32 VADDV/VMLAV and truncate the result.
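
A hedged illustration of why widening is safe for the integer add
reduction: addition is congruent modulo 2^16, so accumulating in 32 bits
(the shape VADDV produces) and truncating gives the same i16 result. Plain
C sketch; the function name is assumed, not part of the lowering.

/* v16i16 vecreduce.add modelled via a 32-bit accumulator, then truncated.
 * Illustrative only. */
#include <stdint.h>

static uint16_t reduce_add_v16i16(const uint16_t v[16]) {
  uint32_t acc = 0;             /* i32 accumulator, as VADDV returns i32 */
  for (int i = 0; i < 16; ++i)
    acc += v[i];
  return (uint16_t)acc;         /* truncate back to i16; exact modulo 2^16 */
}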

Differential Revision: https://reviews.llvm.org/D85452
2020-08-09 11:09:49 +01:00
David Green 8590e5abad [ARM] Allow vecreduce_add in tail predicated loops
This allows vecreduce_add in loops so that we can tail-predicate them.

Differential Revision: https://reviews.llvm.org/D85454
2020-08-09 10:57:17 +01:00
David Green 25e38c3f3c [ARM] Extra reduction plus tailpredication tests. NFC 2020-08-07 17:16:56 +01:00
David Green 146d35b6ee [ARM] CSEL generation
This adds a peephole optimisation to turn a t2MOVccr that could not be
folded into any other instruction into a CSEL on 8.1-M. The t2MOVccr
would usually be expanded into a conditional mov that becomes an IT;
MOV pair. We can instead generate a CSEL instruction, which can
potentially be smaller and allows more register allocation freedom,
which can help reduce codesize. Performance is more variable and may
depend on the microarchitecture details, but initial results look good.
If we need to control this per-CPU, we can add a subtarget feature as we
need it.
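
A hedged C-level picture of the kind of select involved; the function is
hypothetical and the transform itself runs on machine IR, not source code.
On 8.1-M the conditional move below can become a single CSEL
(Rd = cond ? Rn : Rm) rather than an IT; MOV pair.

/* Illustrative only: a value select that can lower to an IT; MOV pair or,
 * on Armv8.1-M, a single CSEL. */
#include <stdint.h>

static int32_t pick(int32_t cond, int32_t x, int32_t y) {
  return cond != 0 ? x : y;     /* conditional move; CSEL candidate */
}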

Original patch by David Penry.

Differential Revision: https://reviews.llvm.org/D83566
2020-07-16 11:10:53 +01:00
David Green deb72ce298 [ARM] Better reductions
MVE has native reductions for integer add and min/max. The others need
to be expanded into a series of extracts and scalar operations to reduce
the vector to a single scalar. The default codegen for that expands
the reduction into a series of in-order operations.

This modifies that to something more suitable for MVE. The basic idea is
to use vector operations until there are 4 remaining items then switch
to pairwise operations. For example a v8f16 fadd reduction would become:
Y = VREV X
Z = ADD(X, Y)
z0 = Z[0] + Z[1]
z1 = Z[2] + Z[3]
return z0 + z1
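
A hedged scalar sketch of that shape, using float in place of f16 and a
made-up function name; it is only valid when the fadd reduction is allowed
to reassociate, which the reordering shown above already implies.

/* Add the vector to its own reverse, then finish with pairwise adds on the
 * low four lanes. Illustrative model only. */
static float reduce_fadd_v8(const float x[8]) {
  float z[4];
  for (int i = 0; i < 4; ++i)
    z[i] = x[i] + x[7 - i];     /* Z = ADD(X, VREV X), low lanes */
  float z0 = z[0] + z[1];       /* pairwise step */
  float z1 = z[2] + z[3];
  return z0 + z1;
}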

The awkwardness (there is always some) comes from something like a
v4f16, which is first legalized by adding identity values to the extra
lanes of the reduction; those inserts cannot then be optimized away
through the vrev; fadd combo and so they remain. I've made sure these
cases are custom lowered so that we can produce the pairwise additions
before the extra values are added.

Differential Revision: https://reviews.llvm.org/D81397
2020-06-29 16:04:13 +01:00
David Green c755157de9 [ARM] Add some MVE vecreduce tests. NFC 2020-06-09 12:07:19 +01:00