llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	e1c1adf9dc	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: `vmov q0[2], q0[0], r2, r0` or `vmov q0[3], q0[1], r3, r1`. They only accept these lane indices though (and only insert into an i32), moving either lanes 1 and 3 or lanes 0 and 2. This patch adds some tablegen patterns for them, selecting them from insert_vector_elt nodes. Because the insert_elements are known to be canonicalized to ascending order, there are several patterns that we need to select. These lane indices are: lanes 3 2 1 0 -> vmovqrr 31 + vmovqrr 20; lanes 3 2 1 -> vmovqrr 31 + vmov 2; lanes 3 1 -> vmovqrr 31; lanes 2 1 0 -> vmovqrr 20 + vmov 1; lanes 2 0 -> vmovqrr 20. The first is the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to rearrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. This is a recommit of `6cc3d80a84` after fixing the backward instruction definitions.	2020-12-18 16:13:08 +00:00
David Green	6e913e4451	Revert "[ARM] Match dual lane vmovs from insert_vector_elt" This one needed more testing.	2020-12-18 13:33:40 +00:00
David Green	6cc3d80a84	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: `vmov q0[2], q0[0], r2, r0` or `vmov q0[3], q0[1], r3, r1`. They only accept these lane indices though (and only insert into an i32), moving either lanes 1 and 3 or lanes 0 and 2. This patch adds some tablegen patterns for them, selecting them from insert_vector_elt nodes. Because the insert_elements are known to be canonicalized to ascending order, there are several patterns that we need to select. These lane indices are: lanes 3 2 1 0 -> vmovqrr 31 + vmovqrr 20; lanes 3 2 1 -> vmovqrr 31 + vmov 2; lanes 3 1 -> vmovqrr 31; lanes 2 1 0 -> vmovqrr 20 + vmov 1; lanes 2 0 -> vmovqrr 20. The first is the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to rearrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. Differential Revision: https://reviews.llvm.org/D92553	2020-12-15 15:58:52 +00:00
David Green	0447f3508f	[ARM][RegAlloc] Add t2LoopEndDec We currently have problems with the way that low overhead loops are specified, with LR being spilled between the t2LoopDec and the t2LoopEnd, forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before register allocation to make sure this does not fail. Unfortunately this instruction is a terminator that produces a value (and also branches; it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (which would need a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else, like calls, that would break its ability to use LR. For that, this adds an isUnspillableTerminator to opt in to the new behaviour. There is a chance that this could cause problems, and so I have added an escape option in case. But I have not seen any problems in the testing that I've tried, and not reverting low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions. This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-RA to create them. Differential Revision: https://reviews.llvm.org/D91358	2020-12-10 12:14:23 +00:00
David Green	73a6cd4b6b	[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it easier to move if needed (as the input is the same as the output), or potentially remove entirely. The hint is added after others (from COPYs, etc.), which still take precedence. The hint needed a place to be added; it currently uses the post-isel custom inserter. Differential Revision: https://reviews.llvm.org/D89883	2020-11-10 16:28:57 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from `t2DoLoopStart rGPR` to `GPRlr = t2DoLoopStart rGPR`. This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: `%start = llvm.start.loop.iterations(%N)`, followed by a `loop:` block containing `%p = phi [%start], [%dec]`, `%dec = llvm.loop.decrement.reg(%p, 1)`, `%c = icmp ne %dec, 0` and `br %c, loop, exit`. - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations except that it produces a value, as seen above, gluing the loop together more through def-use chains. - This new intrinsic conceptually produces the same output as its input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on its own might cause more trouble than it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
David Green	6dcbc323fd	Revert "[ARM][LowOverheadLoops] Adjust Start insertion." This reverts commit `38f625d0d1`. This commit contains some holes in its logic and has been causing issues since it was committed. The idea sounds OK but some cases were not handled correctly. Instead of trying to fix that up later it is probably simpler to revert it and work to reimplement it in a more reliable way.	2020-10-20 08:55:21 +01:00
Sam Parker	38f625d0d1	[ARM][LowOverheadLoops] Adjust Start insertion. Try to move the insertion point to become the terminator of the block, usually the preheader. Differential Revision: https://reviews.llvm.org/D88638	2020-10-01 10:49:19 +01:00
Meera Nakrani	675431b987	[ARM] Added more patterns to generate SSAT/USAT with shift Added patterns to generate an SSAT or USAT with shift for SSAT/USAT instructions that are matched from IR patterns. Differential Revision: https://reviews.llvm.org/D88145	2020-09-28 14:50:19 +00:00
Meera Nakrani	a3d0dce260	[ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop Changes TTI function getIntImmCostInst to take an additional Instruction parameter, which enables us to check whether it is part of a min(max())/max(min()) pattern that will match SSAT. We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated. Required minor changes in some non-ARM backends to allow for the optional parameter to be included. Differential Revision: https://reviews.llvm.org/D87457	2020-09-22 11:54:10 +00:00
Meera Nakrani	dd519bf0b0	[ARM] Selects SSAT/USAT from correct LLVM IR LLVM canonicalizes conditional selects to a different pattern than the one the old code expected. This updates the function to match the new expected patterns and select SSAT or USAT when successful. Tests have also been updated to use the new patterns. Differential Revision: https://reviews.llvm.org/D87379	2020-09-14 10:58:21 +00:00
Sam Parker	a3e41d4581	[ARM] Make MachineVerifier more strict about terminators Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict. Differential Revision: https://reviews.llvm.org/D40061	2020-08-27 07:10:20 +01:00
Dávid Bolvanský	0f14b2e6cb	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `50c743fa71`. Patch will be split to smaller ones.	2020-08-17 20:44:33 +02:00
Dávid Bolvanský	50c743fa71	[BPI] Improve static heuristics for integer comparisons As with pointers, for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 19:54:27 +02:00
Dávid Bolvanský	f9264995a6	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `44587e2f7e`. Sanitizer tests need to be updated.	2020-08-13 14:37:40 +02:00
Dávid Bolvanský	44587e2f7e	[BPI] Improve static heuristics for integer comparisons As with pointers, for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 14:23:58 +02:00
Dávid Bolvanský	a0485421d2	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `385c9d673f`.	2020-08-13 12:59:15 +02:00
Dávid Bolvanský	385c9d673f	[BPI] Improve static heuristics for integer comparisons As with pointers, for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 12:45:40 +02:00
David Green	146d35b6ee	[ARM] CSEL generation This adds a peephole optimisation to turn a t2MOVccr that could not be folded into any other instruction into a CSEL on 8.1-m. The t2MOVccr would usually be expanded into a conditional mov, which becomes an IT; MOV pair. We can instead generate a CSEL instruction, which can be smaller and gives the register allocator more freedom, helping to reduce codesize. Performance is more variable and may depend on the microarchitecture details, but initial results look good. If we need to control this per-CPU, we can add a subtarget feature as we need it. Original patch by David Penry. Differential Revision: https://reviews.llvm.org/D83566	2020-07-16 11:10:53 +01:00
David Green	e73bb45c2b	[ARM] VQMOVN demand bits analysis Similar to VMOVN, a VQMOVN will only demand the top/bottom lanes of its first input. However, unlike VMOVN, it will need access to the entire second argument, as that value is saturated, not just moved in place. Differential Revision: https://reviews.llvm.org/D80515	2020-06-05 18:41:02 +01:00
David Green	2123bb843e	[ARM] Patterns for VQSHRN Given a VQMOVN(VSHR), we can fold that into a VQSHRN simply enough using a few tablegen patterns. Differential Revision: https://reviews.llvm.org/D77720	2020-05-16 17:46:43 +01:00
David Green	72f1fb2edf	[ARM] Combines for VMOVN This adds two combines for VMOVN: one to fold VMOVN[tb](c, VQMOVNb(a, b)) => VQMOVN[tb](c, b), and the other to perform demand bits analysis on the lanes of a VMOVN. We know that only the bottom lanes of the second operand and the top or bottom lanes of the Qd operand are needed in the result, depending on whether the VMOVN is bottom or top. Differential Revision: https://reviews.llvm.org/D77718	2020-05-16 15:13:16 +01:00
David Green	2e1fbf85b6	[ARM] MVE saturating truncates This adds some custom lowering for VQMOVN, an instruction that can be used to perform saturating truncates from a pair of min(max(X, -0x8000), 0x7fff), provided those constants are correct. This leaves a VQMOVNBs which saturates the value and inserts that into the bottom lanes of an existing vector. We then need to do something with the other lanes, extending the value using a vmovlb. Ideally, as will often be the case, only the bottom lane of what remains will be demanded, allowing the vmovlb to be removed. This should mean the instruction is either equal or a win most of the time, and allows some extra follow-up folding to happen. Differential Revision: https://reviews.llvm.org/D77590	2020-05-16 15:10:20 +01:00
David Green	eecba95067	[ARM] Replace arm vendor with none. NFC	2020-04-22 18:19:35 +01:00
David Green	892af45c86	[ARM] Distribute MVE post-increments This adds some extra processing into the Pre-RA ARM load/store optimizer to detect and merge MVE loads/stores and adds of the same base. We don't always turn these into a post-inc during ISel, and because the DAG is a graph we don't always know an order to use for the nodes, so we do not know which nodes to make post-inc and which should use the new post-inc. After ISel, we have an order that we can use to post-inc the following instructions. So this looks for a load/store with a starting offset of 0, and an add/sub from the same base, plus a number of other loads/stores. We then do some checks and convert the zero offset load/store into a postinc variant. Any loads/stores after it have the offset subtracted from their immediates. For example, LDR #4; LDR #0; LDR #8; LDR #12; ADD #16 becomes LDR #4; LDR_POSTINC #16; LDR #-8; LDR #-4. It only handles MVE loads/stores at the moment. Normal loads/stores will be added in a followup patch; they just have some extra details to ensure that we keep generating LDRD/LDM successfully. Differential Revision: https://reviews.llvm.org/D77813	2020-04-22 14:16:51 +01:00
David Green	9fa38c985f	[ARM] MVE vqmovn tests. NFC.	2020-04-06 11:13:02 +01:00
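The dual-lane vmov commits (`e1c1adf9dc`, `6cc3d80a84`) list which moves each set of inserted lanes selects. Those rows can be reproduced by a small greedy choice. The sketch below is a hypothetical C++ model, not the TableGen patterns themselves. `selectLaneMoves` takes a bitmask of the i32 lanes being inserted. When both lanes of the pair 3/1 or 2/0 are present, it emits a `vmovqrr` move for that pair. Any remaining lane falls back to a single-lane `vmov`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: bit N of laneMask is set when i32 lane N is inserted.
// Returns the moves the patterns would pick, using the commit's notation.
std::vector<std::string> selectLaneMoves(unsigned laneMask) {
  assert(laneMask < 16 && "an MVE q register has only four i32 lanes");
  std::vector<std::string> seq;
  auto has = [&laneMask](unsigned lane) { return ((laneMask >> lane) & 1u) != 0; };

  // Dual-lane moves only exist for the lane pairs {3,1} and {2,0}.
  if (has(3) && has(1)) {
    seq.push_back("vmovqrr 31");
    laneMask &= ~((1u << 3) | (1u << 1));
  }
  if (has(2) && has(0)) {
    seq.push_back("vmovqrr 20");
    laneMask &= ~((1u << 2) | (1u << 0));
  }
  // Any lane left over uses the existing single-lane vmov pattern.
  for (int lane = 3; lane >= 0; --lane)
    if (has(static_cast<unsigned>(lane)))
      seq.push_back("vmov " + std::to_string(lane));
  return seq;
}
```

Pairing first and falling back to single-lane moves reproduces each row of the commit's table. For example, lanes 3 2 1 (mask `0xE`) give `vmovqrr 31` followed by `vmov 2`.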

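The offset rewrite described in `892af45c86` can be sketched for accesses on a single base register. This is a simplified, hypothetical model. `MemOp` and `distributePostInc` are illustrative names, and the legality checks the real pass performs are omitted. The first zero-offset LDR becomes a post-increment by the ADD's immediate, and later LDRs have that increment subtracted from their offsets. The ADD is dropped because, in this model, the post-increment's write-back supplies the incremented base.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of one access (or the base update) on a single base register.
struct MemOp {
  std::string opcode;  // "LDR", "LDR_POSTINC" or "ADD"
  int imm;             // immediate offset, or the increment for "ADD"
};

// Fold the base ADD into the first zero-offset LDR as a post-increment and
// rebase every later LDR. Returns std::nullopt if the shape does not match.
std::optional<std::vector<MemOp>> distributePostInc(const std::vector<MemOp>& ops) {
  std::optional<std::size_t> zeroIdx, addIdx;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (!zeroIdx && ops[i].opcode == "LDR" && ops[i].imm == 0) zeroIdx = i;
    if (!addIdx && ops[i].opcode == "ADD") addIdx = i;
  }
  if (!zeroIdx || !addIdx) return std::nullopt;

  const int inc = ops[*addIdx].imm;
  std::vector<MemOp> out;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (i == *addIdx) continue;            // the post-inc write-back replaces the ADD
    MemOp op = ops[i];
    if (i == *zeroIdx)
      op = {"LDR_POSTINC", inc};           // LDR #0 -> LDR_POSTINC #inc
    else if (i > *zeroIdx && op.opcode == "LDR")
      op.imm -= inc;                       // later accesses see the incremented base
    out.push_back(op);
  }
  assert(out.size() + 1 == ops.size() && "only the ADD should be dropped");
  return out;
}
```

Fed the commit's example (LDR #4, LDR #0, LDR #8, LDR #12, ADD #16), this model yields LDR #4, LDR_POSTINC #16, LDR #-8, LDR #-4. The LDR before the zero-offset access keeps its offset.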
26 Commits
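The SSAT/USAT commits (`675431b987`, `a3d0dce260`, `dd519bf0b0`) and the VQMOVN lowering in `2e1fbf85b6` all start from a clamp min(max(x, lo), hi) whose constants must match a saturating instruction. A minimal sketch of just that constant check follows, assuming a plain C++ helper. `matchSaturate` and `SatMatch` are hypothetical names, not LLVM APIs, and the IR pattern matching itself is not modelled. SSAT #k saturates to [-2^(k-1), 2^(k-1) - 1] and USAT #k to [0, 2^k - 1]. When lo <= hi, max(min(x, hi), lo) is the same clamp, so one check covers both orderings.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Result of matching a clamp against SSAT/USAT (hypothetical helper type).
struct SatMatch {
  bool isSigned;  // true: SSAT, false: USAT
  unsigned bits;  // the saturate bit-position immediate
};

// 2^k as a 64-bit value; k stays below 63 so the shift is well defined.
static int64_t pow2(unsigned k) {
  assert(k < 63 && "shift would overflow int64_t");
  return int64_t(1) << k;
}

// Given the constants of a clamp min(max(x, lo), hi) on an i32 value,
// report whether they are exactly the range of an SSAT or USAT.
std::optional<SatMatch> matchSaturate(int64_t lo, int64_t hi) {
  // SSAT #k (k = 1..32) clamps to [-2^(k-1), 2^(k-1) - 1].
  for (unsigned k = 1; k <= 32; ++k)
    if (lo == -pow2(k - 1) && hi == pow2(k - 1) - 1)
      return SatMatch{true, k};
  // USAT #k (k = 0..31) clamps to [0, 2^k - 1].
  for (unsigned k = 0; k <= 31; ++k)
    if (lo == 0 && hi == pow2(k) - 1)
      return SatMatch{false, k};
  return std::nullopt;
}
```

For instance, the -0x8000/0x7fff bounds quoted in the VQMOVN commit match a 16-bit signed saturate, while a clamp to [-100, 100] matches neither instruction.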