llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	54c91c0c74	[ARM] Implement isLoad/StoreFromStackSlot for MVE stack stores accesses This implements the isLoadFromStackSlot and isStoreToStackSlot for MVE MVE_VSTRWU32 and MVE_VLDRWU32 functions. They behave the same as many other loads/stores, expecting a FI in Op1 and zero offset in Op2. At the same time this alters VLDR_P0_off and VSTR_P0_off to use the same code too, as they too should be returning VPR in Op0, take a FI in Op1 and zero offset in Op2. Differential Revision: https://reviews.llvm.org/D106797	2021-07-27 09:11:58 +01:00
David Green	ca78151001	[ARM] Introduce MVEEXT ISel lowering Similar to D91921 (and D104515) this introduces two nodes, MVESEXT and MVEZEXT, that larger-than-legal sext and zext are lowered to. These either get optimized away or end up becoming a series of stack loads/stores, in order to perform the extending whilst keeping the order of the lanes correct. They are generated from v8i16->v8i32, v16i8->v16i16 and v16i8->v16i32 extends, potentially with an intermediate extend for the larger v16i8->v16i32 extend. A number of combines have been added for obvious cases that come up in tests, notably MVEEXT of shuffles. More may be needed in the future, but this seems to cover most of the cases that come up in the tests. Differential Revision: https://reviews.llvm.org/D105090	2021-07-13 07:21:20 +01:00
Tomas Matheson	773771ba38	[CodeGen][regalloc] Don't align stack slots if the stack can't be realigned Register allocation may spill virtual registers to the stack, which can increase alignment requirements of the stack frame. If the function did not require stack realignment before register allocation, the registers required to do so may not be reserved/available. This results in a stack frame that requires realignment but cannot be realigned. Instead, only increase the alignment of the stack if we are still able to realign. The register SpillAlignment will be ignored if we can't realign, and the backend will be responsible for emitting the correct unaligned loads and stores. This seems to be the assumed behaviour already, e.g. ARMBaseInstrInfo::storeRegToStackSlot and X86InstrInfo::storeRegToStackSlot are both `canRealignStack` aware. Differential Revision: https://reviews.llvm.org/D103602	2021-06-11 16:49:12 +01:00
David Green	211ce51f27	[ARM] Clean up some tests, removing dead instructions. NFC	2021-05-22 13:38:00 +01:00
David Green	48cef1fa8e	[ARM] Create VMOVRRD from adjacent vector extracts This adds a combine for extract(x, n); extract(x, n+1) -> VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the same time in a single instruction, and thanks to the other VMOVRRD folds we have added recently can help reduce the number of executed instructions. Floating point types are very similar, but will include a bitcast to an integer type. This also adds a shouldRewriteCopySrc, to prevent copy propagation from DPR to SPR, which can break as not all DPR regs can be extracted from directly. Otherwise the machine verifier is unhappy. Differential Revision: https://reviews.llvm.org/D100244	2021-04-20 15:15:43 +01:00
David Green	e7dc083a41	[ARM] Don't handle low overhead branches in AnalyzeBranch It turns out that the BranchFolder and IfCvt do not like unanalyzable branches that fall-through. This means that removing the unconditional branches from the end of tail predicated instruction can run into asserts and verifier issues. This effectively reverts `372eb2bbb6`, but adds handling to t2DoLoopEndDec which are not branches, so can be safely skipped.	2021-01-18 17:16:07 +00:00
David Green	372eb2bbb6	[ARM] Add low overhead loops terminators to AnalyzeBranch This treats low overhead loop branches the same as jump tables and indirect branches in analyzeBranch - they cannot be analyzed but the direct branches on the end of the block may be removed. This helps remove the unnecessary branches earlier, which can help produce better codegen (and change block layout in a number of cases). Differential Revision: https://reviews.llvm.org/D94392	2021-01-16 18:30:21 +00:00
David Green	e1c1adf9dc	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: `vmov q0[2], q0[0], r2, r0`, `vmov q0[3], q0[1], r3, r1`. They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector insert elements. Because the insert_elements are known to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: `3 2 1 0 -> vmovqrr 31; vmovqrr 20`, `3 2 1 -> vmovqrr 31; vmov 2`, `3 1 -> vmovqrr 31`, `2 1 0 -> vmovqrr 20; vmov 1`, `2 0 -> vmovqrr 20`. With the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. This is a recommit of `6cc3d80a84` after fixing the backward instruction definitions.	2020-12-18 16:13:08 +00:00
David Green	6e913e4451	Revert "[ARM] Match dual lane vmovs from insert_vector_elt" This one needed more testing.	2020-12-18 13:33:40 +00:00
David Green	6cc3d80a84	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: `vmov q0[2], q0[0], r2, r0`, `vmov q0[3], q0[1], r3, r1`. They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector insert elements. Because the insert_elements are known to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: `3 2 1 0 -> vmovqrr 31; vmovqrr 20`, `3 2 1 -> vmovqrr 31; vmov 2`, `3 1 -> vmovqrr 31`, `2 1 0 -> vmovqrr 20; vmov 1`, `2 0 -> vmovqrr 20`. With the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. Differential Revision: https://reviews.llvm.org/D92553	2020-12-15 15:58:52 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from `t2DoLoopStart rGPR` to `GPRlr = t2DoLoopStart rGPR`. This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: `%start = llvm.start.loop.iterations(%N)` `loop:` `%p = phi [%start], [%dec]` `%dec = llvm.loop.decrement.reg(%p, 1)` `%c = icmp ne %dec, 0` `br %c, loop, exit` - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new intrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on its own might cause more trouble than it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
Anna Welker	064981f0ce	[ARM][MVE] Enable MVE gathers and scatters by default Enable MVE gather/scatters by default, which requires some minor adaptations in some tests. Differential revision: https://reviews.llvm.org/D86776	2020-08-28 19:05:29 +01:00
Sam Parker	a3e41d4581	[ARM] Make MachineVerifier more strict about terminators Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict. Differential Revision: https://reviews.llvm.org/D40061	2020-08-27 07:10:20 +01:00
David Green	dc2b282a8a	[ARM] Change test target to arm-none-none-eabi. NFC	2020-05-08 14:16:31 +01:00
Anna Welker	1e413a8c36	[ARM][MVE] Add support for incrementing gathers Enables the MVEGatherScatterLowering pass to build pre-incrementing gathers. Incrementing writeback gathers are built when it is possible to replace the loop increment instruction. Differential Revision: https://reviews.llvm.org/D76786	2020-05-07 12:33:50 +01:00

15 Commits
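
The lane-index table in the "Match dual lane vmovs from insert_vector_elt" commits (6cc3d80a84, recommitted as e1c1adf9dc) can be read as a simple pairing rule: lanes 3 and 1 share one dual-lane move, lanes 2 and 0 share the other, and any lane left without its partner falls back to a single-lane vmov. The Python sketch below is only a model of that rule, not the tablegen patterns from the patch. The function name `select_lane_moves` is hypothetical, and the strings `vmovqrr 31`, `vmovqrr 20` and `vmov N` are the shorthand used in the commit message rather than assembly syntax.

```python
def select_lane_moves(lanes):
    """Model which moves cover a set of i32 lane inserts into a 4-lane vector.

    Hypothetical helper: mirrors the table in the commit message,
    not the actual instruction selection code.
    """
    remaining = set(lanes)
    if not remaining <= {0, 1, 2, 3}:
        raise ValueError("lanes must be in the range 0..3")

    moves = []
    # Dual-lane moves only exist for the lane pairs (3, 1) and (2, 0).
    for hi, lo in ((3, 1), (2, 0)):
        if hi in remaining and lo in remaining:
            moves.append(f"vmovqrr {hi}{lo}")
            remaining -= {hi, lo}

    # Any lane without its partner uses a single-lane vmov, highest lane first.
    moves.extend(f"vmov {lane}" for lane in sorted(remaining, reverse=True))
    return moves


if __name__ == "__main__":
    # The five lane sets listed in the commit message.
    for lanes in ([3, 2, 1, 0], [3, 2, 1], [3, 1], [2, 1, 0], [2, 0]):
        print(lanes, "->", "; ".join(select_lane_moves(lanes)))
```

Lane sets outside the five listed cases, such as lanes 3 and 0, produce only single-lane moves in this model, which matches the message's note that other combinations are covered by the existing individual vmov pattern.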