llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	2753722b0f	[ARM] Mark MVE_VMOV_to_lane_32 as isInsertSubregLike This allows the peephole optimizer to know that a MVE_VMOV_to_lane_32 is the same as an insert subreg, allowing it to optimize some redundant lane moves. Differential Revision: https://reviews.llvm.org/D95433	2021-02-02 16:35:47 +00:00
Yvan Roux	244ad228f3	[ARM][MachineOutliner] Add stack fixup feature This patch handles cases where we have to save/restore the link register into the stack and and load/store instruction which use the stack are part of the outlined region. It checks that there will be no overflow introduced by the new offset and fixup these instructions accordingly. Differential Revision: https://reviews.llvm.org/D92934	2021-01-19 10:59:09 +01:00
David Green	e7dc083a41	[ARM] Don't handle low overhead branches in AnalyzeBranch It turns our that the BranchFolder and IfCvt does not like unanalyzable branches that fall-through. This means that removing the unconditional branches from the end of tail predicated instruction can run into asserts and verifier issues. This effectively reverts `372eb2bbb6`, but adds handling to t2DoLoopEndDec which are not branches, so can be safely skipped.	2021-01-18 17:16:07 +00:00
David Green	372eb2bbb6	[ARM] Add low overhead loops terminators to AnalyzeBranch This treats low overhead loop branches the same as jump tables and indirect branches in analyzeBranch - they cannot be analyzed but the direct branches on the end of the block may be removed. This helps remove the unnecessary branches earlier, which can help produce better codegen (and change block layout in a number of cases). Differential Revision: https://reviews.llvm.org/D94392	2021-01-16 18:30:21 +00:00
Kazu Hirata	eb198f4c3c	[llvm] Use llvm::any_of (NFC)	2021-01-04 11:42:47 -08:00
David Penry	a9f14cdc62	[ARM] Add bank conflict hazarding Adds ARMBankConflictHazardRecognizer. This hazard recognizer looks for a few situations where the same base pointer is used and then checks whether the offsets lead to a bank conflict. Two parameters are also added to permit overriding of the target assumptions: arm-data-bank-mask=<int> - Mask of bits which are to be checked for conflicts. If all these bits are equal in the offsets, there is a conflict. arm-assume-itcm-bankconflict=<bool> - Assume that there will be bank conflicts on any loads to a constant pool. This hazard recognizer is enabled for Cortex-M7, where the Technical Reference Manual states that there are two DTCM banks banked using bit 2 and one ITCM bank. Differential Revision: https://reviews.llvm.org/D93054	2020-12-23 14:00:59 +00:00
Kazu Hirata	966f1431de	[Target] Use llvm::erase_if (NFC)	2020-12-20 17:43:22 -08:00
Kristof Beyls	df8ed39283	[ARM] harden-sls-blr: avoid r12 and lr in indirect calls. As a linker is allowed to clobber r12 on function calls, the code transformation that hardens indirect calls is not correct in case a linker does so. Similarly, the transformation is not correct when register lr is used. This patch makes sure that r12 or lr are not used for indirect calls when harden-sls-blr is enabled. Differential Revision: https://reviews.llvm.org/D92469	2020-12-19 12:39:59 +00:00
Kristof Beyls	a4c1f5160e	[ARM] Harden indirect calls against SLS To make sure that no barrier gets placed on the architectural execution path, each indirect call calling the function in register rN, it gets transformed to a direct call to __llvm_slsblr_thunk_mode_rN. mode is either arm or thumb, depending on the mode of where the indirect call happens. The llvm_slsblr_thunk_mode_rN thunk contains: bx rN <speculation barrier> Therefore, the indirect call gets split into 2; one direct call and one indirect jump. This transformation results in not inserting a speculation barrier on the architectural execution path. The mitigation is off by default and can be enabled by the harden-sls-blr subtarget feature. As a linker is allowed to clobber r12 on function calls, the above code transformation is not correct in case a linker does so. Similarly, the transformation is not correct when register lr is used. Avoiding r12/lr being used is done in a follow-on patch to make reviewing this code easier. Differential Revision: https://reviews.llvm.org/D92468	2020-12-19 12:33:42 +00:00
Kristof Beyls	320fd3314e	[ARM] Implement harden-sls-retbr for Thumb mode The only non-trivial consideration in this patch is that the formation of TBB/TBH instructions, which is done in the constant island pass, does not understand the speculation barriers inserted by the SLSHardening pass. As such, when harden-sls-retbr is enabled for a function, the formation of TBB/TBH instructions in the constant island pass is disabled. Differential Revision: https://reviews.llvm.org/D92396	2020-12-19 12:32:47 +00:00
Kristof Beyls	195f44278c	[ARM] Implement harden-sls-retbr for ARM mode Some processors may speculatively execute the instructions immediately following indirect control flow, such as returns, indirect jumps and indirect function calls. To avoid a potential miss-speculatively executed gadget after these instructions leaking secrets through side channels, this pass places a speculation barrier immediately after every indirect control flow where control flow doesn't return to the next instruction, such as returns and indirect jumps, but not indirect function calls. Hardening of indirect function calls will be done in a later, independent patch. This patch is implementing the same functionality as the AArch64 counter part implemented in https://reviews.llvm.org/D81400. For AArch64, returns and indirect jumps only occur on RET and BR instructions and hence the function attribute to control the hardening is called "harden-sls-retbr" there. On AArch32, there is a much wider variety of instructions that can trigger an indirect unconditional control flow change. I've decided to stick with the name "harden-sls-retbr" as introduced for the corresponding AArch64 mitigation. This patch implements this for ARM mode. A future patch will extend this to also support Thumb mode. The inserted barriers are never on the correct, architectural execution path, and therefore performance overhead of this is expected to be low. To ensure these barriers are never on an architecturally executed path, when the harden-sls-retbr function attribute is present, indirect control flow is never conditionalized/predicated. On targets that implement that Armv8.0-SB Speculation Barrier extension, a single SB instruction is emitted that acts as a speculation barrier. On other targets, a DSB SYS followed by a ISB is emitted to act as a speculation barrier. These speculation barriers are implemented as pseudo instructions to avoid later passes to analyze them and potentially remove them. The mitigation is off by default and can be enabled by the harden-sls-retbr subtarget feature. Differential Revision: https://reviews.llvm.org/D92395	2020-12-19 11:42:39 +00:00
David Green	e1c1adf9dc	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: vmov q0[2], q0[0], r2, r0 vmov q0[3], q0[1], r3, r1 They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector inserts elements. Because the insert_elements are know to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: 3 2 1 0 -> vmovqrr 31; vmovqrr 20 3 2 1 -> vmovqrr 31; vmov 2 3 1 -> vmovqrr 31 2 1 0 -> vmovqrr 20; vmov 1 2 0 -> vmovqrr 20 With the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. This is a recommit of `6cc3d80a84` after fixing the backward instruction definitions.	2020-12-18 16:13:08 +00:00
David Green	6e913e4451	Revert "[ARM] Match dual lane vmovs from insert_vector_elt" This one needed more testing.	2020-12-18 13:33:40 +00:00
Yvan Roux	923ca0b411	[ARM][MachineOutliner] Fix costs model. Fix candidates calls costs models allocation and prepare stack fixups handling. Differential Revision: https://reviews.llvm.org/D92933	2020-12-17 16:08:23 +01:00
David Green	6cc3d80a84	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: vmov q0[2], q0[0], r2, r0 vmov q0[3], q0[1], r3, r1 They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector inserts elements. Because the insert_elements are know to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: 3 2 1 0 -> vmovqrr 31; vmovqrr 20 3 2 1 -> vmovqrr 31; vmov 2 3 1 -> vmovqrr 31 2 1 0 -> vmovqrr 20; vmov 1 2 0 -> vmovqrr 20 With the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. Differential Revision: https://reviews.llvm.org/D92553	2020-12-15 15:58:52 +00:00
David Green	0447f3508f	[ARM][RegAlloc] Add t2LoopEndDec We currently have problems with the way that low overhead loops are specified, with LR being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before registry allocation to make sure this does not fail. Unfortunately this instruction is a terminator that produces a value (and also branches - it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (needing to put a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else like calls that would break it's ability to use LR. For that, this adds a isUnspillableTerminator to opt in the new behaviour. There is a chance that this could cause problems, and so I have added an escape option incase. But I have not seen any problems in the testing that I've tried, and not reverting Low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions. This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-ra to create them. Differential Revision: https://reviews.llvm.org/D91358	2020-12-10 12:14:23 +00:00
David Green	d9bf6245bf	[ARM] Revert low overhead loops with calls before registry allocation. This adds code to revert low overhead loops with calls in them before register allocation. Ideally we would not create low overhead loops with calls in them to begin with, but that can be difficult to always get correct. If we want to try and glue together t2LoopDec and t2LoopEnd into a single instruction, we need to ensure that no instructions use LR in the loop. (Technically the final code can be better too, as it doesn't need to use the same registers but that has not been optimized for here, as reverting loops with calls is expected to be very rare). It also adds a MVETailPredUtils.h header to share the revert code between different passes, and provides a place to expand upon, with RevertLoopWithCall becoming a place to perform other low overhead loop alterations like removing copies or combining LoopDec and End into a single instruction. Differential Revision: https://reviews.llvm.org/D91273	2020-12-07 15:44:40 +00:00
Pirama Arumuga Nainar	8262e94a6d	[ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt. Previously we used setRegClass to rgpr, which may expand the register domain if the result was already in a constrained class (tcgpr in the above PR). Differential Revision: https://reviews.llvm.org/D91192	2020-11-10 13:38:11 -08:00
David Green	08d1c2d470	[ARM] Introduce t2DoLoopStartTP This introduces a new pseudo instruction, almost identical to a t2DoLoopStart but taking 2 parameters - the original loop iteration count needed for a low overhead loop, plus the VCTP element count needed for a DLSTP instruction setting up a tail predicated loop. The idea is that the instruction holds both values and the backend ARMLowOverheadLoops pass can pick between the two, depending on whether it creates a tail predicated loop or falls back to a low overhead loop. To do that there needs to be something that converts a t2DoLoopStart to a t2DoLoopStartTP, for which this patch repurposes the MVEVPTOptimisationsPass as a "tail predication and vpt optimisation" pass. The extra operand for the t2DoLoopStartTP is chosen based on the operands of VCTP's in the loop, and the instruction is moved as late in the block as possible to attempt to increase the likelihood of making tail predicated loops. Differential Revision: https://reviews.llvm.org/D90591	2020-11-10 18:08:12 +00:00
Momchil Velikov	937ab6a785	[ARM][MachineOutliner] Emit more CFI instructions This patch make the outliner emit CFI instructions in a few more places: * after LR is restored, but before the return in an outlined function * around save/restore of LR to/from a register at calls to outlined functions * around save/restore of LR to/from the stack at calls to outlined functions The latter two only when the function does NOT spill LR. If the function spills LR, then outliner generated saves/restores around calls are not considered interesting for unwinding the frame. Differential Revision: https://reviews.llvm.org/D89483	2020-11-09 15:26:18 +00:00
Momchil Velikov	7360d6d921	[ARM][MachineOutliner] Do not overestimate LR liveness in return block The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers callee-saved registers to be alive at the point after the return instruction in a block. In the ARM backend, the `LR` register is classified as callee-saved, which is not really correct (from an ARM eABI or just common sense point of view). These two conditions cause the `MachineOutliner` to overestimate the liveness of `LR`, which results in unnecessary saves/restores of `LR` around calls to outlined sequences. It also causes the `MachineVerifer` to crash in some cases, because the save instruction reads a dead `LR`, for example when the following program: int h(int, int); int f(int a, int b, int c, int d) { a = h(a + 1, b - 1); b = b + c; return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d); } int g(int a, int b, int c, int d) { a = h(a - 1, b + 1); b = b + c; return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d); } is compiled with `-target arm-eabi -march=armv7-m -Oz`. This patch computes the liveness of `LR` in return blocks only, while taking into account the few ARM instructions, which read `LR`, but nevertheless the register is not mentioned (explicitly or implicitly) in the instruction operands. Differential Revision: https://reviews.llvm.org/D89189	2020-11-02 16:47:22 +00:00
Nicholas Guy	eb9fe24eaf	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. Differential Revision: https://reviews.llvm.org/D88496	2020-10-29 15:17:31 +00:00
Evgeny Leviant	e74f66125e	[ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90150	2020-10-26 20:22:41 +03:00
Evgeny Leviant	99b2756517	[ARM][SchedModels] Get rid of IsLdrAm2ScaledPred Differential revision: https://reviews.llvm.org/D90024	2020-10-26 12:01:39 +03:00
Evgeny Leviant	a4fc18e641	[ARM][SchedModels] Convert IsLdstsoMinusRegPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90029	2020-10-26 11:54:08 +03:00
Evgeny Leviant	d613e39d52	[ARM][SchedModels] Convert IsLdrAm3NegRegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90045	2020-10-26 11:43:02 +03:00
David Green	61bc18de0b	[Schedule] Add a MultiHazardRecognizer This adds a MultiHazardRecognizer and starts to make use of it in the ARM backend. The idea of the class is to allow multiple independent hazard recognizers to be added to a single base MultiHazardRecognizer, allowing them to all work in parallel without requiring them to be chained into subclasses. They can then be added or not based on cpu or subtarget features, which will become useful in the ARM backend once more hazard recognizers are being used for various things. This also renames ARMHazardRecognizer to ARMHazardRecognizerFPMLx in the process, to more clearly explain what that recognizer is designed for. Differential Revision: https://reviews.llvm.org/D72939	2020-10-26 08:06:17 +00:00
Evgeny Leviant	7a78073be7	[ARM][SchedModels] Let ldm* instruction scheduling use MCSchedPredicate Differential revision: https://reviews.llvm.org/D89957	2020-10-23 10:33:20 +03:00
Evgeny Leviant	ed6a91f456	[ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89939	2020-10-22 18:03:01 +03:00
Evgeny Leviant	bf9edcb6fd	[ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89876	2020-10-21 20:49:10 +03:00
Nicholas Guy	9a2d2bedb7	Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate Some instructions may be removable through processes such as IfConversion, however DefinesPredicate can not be made aware of when this should be considered. This parameter allows DefinesPredicate to distinguish these removable instructions on a per-call basis, allowing for more fine-grained control from processes like ifConversion. Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose Differential Revision: https://reviews.llvm.org/D88494	2020-10-21 11:52:47 +01:00
Yvan Roux	070b96962f	[ARM][MachineOutliner] Add calls handling. Handles calls inside outlined regions, by saving and restoring the link register. Differential Revision: https://reviews.llvm.org/D87136	2020-09-16 09:54:26 +02:00
Sam Parker	3ebc755227	[ARM] Try to rematerialize VCTP instructions We really want to try and avoid spilling P0, which can be difficult since there's only one register, so try to rematerialize any VCTP instructions. Differential Revision: https://reviews.llvm.org/D87280	2020-09-09 07:41:22 +01:00
Sam Parker	03141aa04a	[ARM] Enable outliner at -Oz for M-class Enable default outlining when the function has the minsize attribute and we're targeting an m-class core. Differential Revision: https://reviews.llvm.org/D82951	2020-08-27 08:02:56 +01:00
Sam Parker	a3e41d4581	[ARM] Make MachineVerifier more strict about terminators Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict. Differential Revision: https://reviews.llvm.org/D40061	2020-08-27 07:10:20 +01:00
Yvan Roux	0459f29e8b	[ARM][MachineOutliner] Add default mode. Use the stack to save and restore the link register when there is no available register to do it. Differential Revision: https://reviews.llvm.org/D76069	2020-08-20 09:25:33 +02:00
David Green	0c390c22a5	Revert "[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz" This reverts commit `18279a54b5` as it is causing some chromium android test problems.	2020-08-13 22:40:36 +01:00
Sam Parker	4f9f4b21e0	[ARM] Unrestrict Armv8-a IT when at minsize IT blocks with more than one instruction were performance deprecated in Armv8 but that doesn't mean we should follow that advise when optimising for size. Differential Revision: https://reviews.llvm.org/D85638	2020-08-10 14:59:53 +01:00
Nicholas Guy	18279a54b5	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. This was due to the CPSR register being marked as dead, while this case was not accounted for. Differential Revision: https://reviews.llvm.org/D83667	2020-08-03 13:20:32 +01:00
Sjoerd Meijer	85342c27a3	[ARM] Optimize immediate selection Optimize some specific immediates selection by materializing them with sub/mvn instructions as opposed to loading them from the constant pool. Patch by Ben Shi, powerman1st@163.com. Differential Revision: https://reviews.llvm.org/D83745	2020-07-29 13:29:17 +01:00
Tim Northover	39108f4c7a	ARM: make Thumb1 instructions non-flag-setting in IT block. Many Thumb1 instructions are defined to set CPSR if executed outside an IT block, but leave it alone from inside one. In MachineIR this is represented by whether an optional register is CPSR or NoReg (0), and affects how the instructions are printed. This sets the instruction to the appropriate form during if-conversion.	2020-07-28 13:31:17 +01:00
James Y Knight	4b0aa5724f	Change the INLINEASM_BR MachineInstr to be a non-terminating instruction. Before this instruction supported output values, it fit fairly naturally as a terminator. However, being a terminator while also supporting outputs causes some trouble, as the physreg->vreg COPY operations cannot be in the same block. Modeling it as a non-terminator allows it to be handled the same way as invoke is handled already. Most of the changes here were created by auditing all the existing users of MachineBasicBlock::isEHPad() and MachineBasicBlock::hasEHPadSuccessor(), and adding calls to isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate. Reviewed By: nickdesaulniers, void Differential Revision: https://reviews.llvm.org/D79794	2020-07-01 12:51:50 -04:00
Eric Christopher	cf23852587	[Target] As part of using inclusive language within the llvm project, migrate away from the use of blacklist and whitelist. This change affects an internal llvm command line option.	2020-06-20 00:06:39 -07:00
Yvan Roux	669066de65	[ARM][MachineOutliner] Add LR RegSave mode. Outline chunks of code which need to save and restore the link register when a spare register can be used to it. Differential Revision: https://reviews.llvm.org/D80127	2020-06-15 15:22:08 +02:00
Yvan Roux	6b8628a1f0	[ARM][MachineOutliner] Add NoLRSave mode. Outline chunks of code which don't need a save/restore mechanism of the link register. Differential Revision: https://reviews.llvm.org/D80125	2020-06-11 08:45:46 +02:00
Yvan Roux	6b9e102243	[ARM][MachineOutliner] Remove unneeded dynamic allocation.	2020-06-04 13:12:26 +02:00
Yvan Roux	3648dde3dd	[ARM][MachineOutliner] Fix memory leak #2 . Use smart pointer instead of new/delete.	2020-05-15 17:33:56 +02:00
Yvan Roux	96c4460a0b	[ARM][MachineOutliner] Fix memory leak. Fix sanitizer bots after `0e4827aa4e`	2020-05-15 16:27:14 +02:00
Yvan Roux	0e4827aa4e	[ARM][MachineOutliner] Add Machine Outliner support for ARM. Enables Machine Outlining for ARM and Thumb2 modes. This is the first patch of the series which adds all the basic logic for the support, and only handles tail-calls and thunks. The outliner can be turned on by using clang -moutline option or -mllvm -enable-machine-outliner one (like AArch64). Differential Revision: https://reviews.llvm.org/D76066	2020-05-15 08:44:23 +02:00
Craig Topper	8c72b0271b	[CodeGen] Use Align in MachineConstantPool.	2020-05-12 10:06:40 -07:00

1 2 3 4 5 ...

681 Commits