llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	39f8384964	[ARM] Correct features on pacbti instructions. Given a patch like D129506, using instructions not valid for the current feature set becomes an error. This updates the Arm hint-space instructions for pac/bti to require thumbv7m as opposed to 8.1-m.main, to make them valid when compiling for thumbv7m with -mbranch-protection. Differential Revision: https://reviews.llvm.org/D129692	2022-07-27 09:15:14 +01:00
Martin Storsjö	d8e67c1ccc	[ARM] Add SEH opcodes in frame lowering Skip inserting regular CFI instructions if using WinCFI. This is based a fair amount on the corresponding ARM64 implementation, but instead of trying to insert the SEH opcodes one by one where we generate other prolog/epilog instructions, we try to walk over the whole prolog/epilog range and insert them. This is done because in many cases, the exact number of instructions inserted is abstracted away deeper. For some cases, we manually insert specific SEH opcodes directly where instructions are generated, where the automatic mapping of instructions to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes). Skip Thumb2SizeReduction for SEH prologs/epilogs, and force tail calls to wide instructions (just like on MachO), to make sure that the unwind info actually matches the width of the final instructions, without heuristics about what later passes will do. Mark SEH instructions as scheduling boundaries, to make sure that they aren't reordered away from the instruction they describe by PostRAScheduler. Mark the SEH instructions with the NoMerge flag, to avoid doing tail merging of functions that have multiple epilogs that all end with the same sequence of "b <other>; .seh_nop_w, .seh_endepilogue". Differential Revision: https://reviews.llvm.org/D125648	2022-06-02 12:28:46 +03:00
Archibald Elliott	f496330f97	[ARM] Fix Decode of tsb csync There is a crash in the ARM backend when attempting to decode a "tsb csync" instruction using `llvm-objdump --triple=armv8.4a -d`. The crash was in `ARMMCInstrAnalysis::evaluateBranch` where the number of operands in the decoded instruction (0) did not match the number of operands in the instruction description (1). This is because `tsb csync` looks like it has an operand during assembly, but there is only one valid operand (csync), so there is no encoding space in the instruction for the operand, so the decoder never has a field to decode that represents `csync`. The fix is to add a custom decode method, which ensures that this instruction does have the right number of operands after decoding. This method merely adds the only available operand value, `ARM_TSB::CSYNC`. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D121479	2022-03-17 17:29:31 +00:00
Mark Murray	3d7662142d	[ARM] Undeprecate complex IT blocks AArch32/Armv8A introduced the performance deprecation of certain patterns of IT instructions. After some debate internal to ARM, this is now being reverted; i.e. no IT instruction patterns are performance deprecated anymore, as the performance degradation is not significant enough. This reverts the following: "ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to instructions other than a single subsequent 16-bit instruction from a restricted set are deprecated, as are explicit references to the PC within that single 16-bit instruction. This permits the non-deprecated forms of IT and subsequent instructions to be treated as a single 32-bit conditional instruction." The deprecation no longer applies, but the behaviour may be controlled by the -arm-restrict-it and -arm-no-restrict-it command-line options, with the latter being the default. No warnings about complex IT blocks will be generated. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118044	2022-02-07 15:47:53 +00:00
tyb0807	762f0b5463	[ARM] Make getInstSizeInBytes() use instruction size from InstrInfo.td Currently, ARMBaseInstrInfo::getInstSizeInBytes() uses hard-coded instruction size for some pseudo-instructions, while this information should ideally be found in ARMInstrInfo.td, ARMInstrThumb(2).td files (which can be accessed via MCInstrDesc). Hence, the .td files should be updated and no hard-coded instruction sizes should be used by getInstSizeInBytes() anymore. Differential Revision: https://reviews.llvm.org/D118009	2022-02-01 10:39:14 +00:00
Ties Stuij	0fbb17458a	[ARM] Implement setjmp BTI placement for PACBTI-M This patch intends to guard indirect branches performed by longjmp by inserting BTI instructions after calls to setjmp. Calls with 'returns-twice' are lowered to a new pseudo-instruction named t2CALL_BTI that is later expanded to a bundle of {tBL,t2BTI}. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Alexandros Lamprineas - Ties Stuij Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D112427	2021-12-06 11:07:10 +00:00
David Green	b8f1ccb0ac	[ARM] Introduce i8neg and i8pos addressing modes Some instructions with i8 immediate ranges can only hold negative values (like t2LDRHi8), only hold positive values (like t2STRT) or hold +/- depending on the U bit (like the pre/post inc instructions. e.g t2LDRH_POST). This patch splits the AddrModeT2_i8 into AddrModeT2_i8, AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear. This allows us to get the offset ranges of t2LDRHi8 correct in the load/store optimizer, fixing issues where we could end up creating instructions with positive offsets (which may then be encoded as ldrht). Differential Revision: https://reviews.llvm.org/D114638	2021-12-02 17:10:26 +00:00
Ties Stuij	5cff77c23f	[clang][ARM] PACBTI-M assembly support Introduce assembly support for Armv8.1-M PACBTI extension. This is an optional extension in v8.1-M. There are 10 new system registers and 5 new instructions, all predicated on the feature. The attribute for llvm-mc is called "pacbti". For armclang, an architecture extension also called "pacbti" was created. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Victor Campos - Ties Stuij Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D112420	2021-11-30 09:28:18 +00:00
Nick Desaulniers	89453ed6f2	[ARM] create new pseudo t2LDRLIT_ga_pcrel for stack guards We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the stack guard for PIC code that references the GOT, since arm-pseudo may expand this to the narrow tLDRpci rather than the wider t2LDRpci. Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci. Fixes: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361 Reviewed By: ardb Differential Revision: https://reviews.llvm.org/D114762	2021-11-30 08:46:05 +01:00
David Green	5c64d8ef8c	[ARM] CSINC/CSINV patterns from CMOV We sometimes end up generating CMOV with constant operands that can be simplified to CSINC or CSINV under Arm-8.1m. This adds some simple patterns for them. Differential Revision: https://reviews.llvm.org/D114349	2021-11-27 20:21:41 +00:00
Simon Pilgrim	63b1e58f07	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED) If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. Reapplied with fix for AArch64 rev patterns to matching the ARM fix. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-25 11:14:15 +00:00
Benjamin Kramer	d32787230d	Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl" This reverts commit `3cf4a2c620`. It makes llc hang on the following test case. ``` target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" target triple = "aarch64-unknown-linux-gnu" define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 { entry: br label %while.body117.i while.body117.i: ; preds = %cleanup149.i, %entry %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ] %0 = load i16, i16* undef, align 2 %1 = icmp eq i16 undef, -10240 br i1 %1, label %fail.i, label %cleanup149.i cleanup149.i: ; preds = %while.body117.i %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2 store i16 %or130.i, i16* %out.6269.i, align 2 br label %while.body117.i fail.i: ; preds = %while.body117.i ret void } ; Function Attrs: nofree nosync nounwind readnone speculatable willreturn declare i16 @llvm.bswap.i16(i16) #1 attributes #0 = { "target-features"="+neon,+v8a" } attributes #1 = { nofree nosync nounwind readnone speculatable willreturn } attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" } ```	2021-11-24 14:42:54 +01:00
Simon Pilgrim	3cf4a2c620	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-24 11:28:35 +00:00
Ard Biesheuvel	d7e089f2d6	[ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register. Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead. This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp. Reviewed By: nickdesaulniers, peter.smith Differential Revision: https://reviews.llvm.org/D112600	2021-10-27 16:42:11 -07:00
Igor Kudrin	ddbe812bcc	[ARM][llvm-objdump] Annotate PC-relative memory operands This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for Arm so that the disassembler can print the target address of memory operands that use PC+immediate addressing. Differential Revision: https://reviews.llvm.org/D105979	2021-08-05 14:11:11 +07:00
Daniel Egger	98c2e4115d	[ARM] Add lowering of uadd_sat to uq{add\|sub}8 and uq{add\|sub}16 This follows the lead of https://reviews.llvm.org/D68974 to add lowering of unsigned saturated addition/subtraction. Differential Revision: https://reviews.llvm.org/D105413	2021-07-11 15:58:11 +01:00
David Green	bee2f618d5	[ARM] Introduce t2WhileLoopStartTP This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in D90591. It keeps a reference to both the tripcount register and the element count register, so that the ARMLowOverheadLoops pass in the backend can pick the correct one without having to search for it from the operand of a VCTP. Differential Revision: https://reviews.llvm.org/D103236	2021-06-13 13:55:34 +01:00
Ryan Prichard	65d0264ba2	[MC][ARM] Reject Thumb "ror rX, #0" The ROR instruction can only handle immediates between 1 and 31. The would-be encoding for ROR #0 is actually the RRX instruction. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D102455	2021-05-19 15:05:39 -07:00
David Green	1011d4ed60	[ARM] Constrain CMPZ shift combine to a single use We currently prefer t2CMPrs over t2CMPri when the node contains a shift. This can introduce more nodes if the shift has multiple uses though, as value from the shift will be needed anyway, and in the case of a t2CMPri compared with zero will more readily be removed entirely. Differential Revision: https://reviews.llvm.org/D101688	2021-05-13 18:31:01 +01:00
Nick Desaulniers	52338af569	[MC][ARM] add .w suffixes for RSB/RSBS T1 See also: F5.1.167 RSB, RSBS (register) T1 shift or rotate by value variant of the Arm ARM. Link: https://github.com/ClangBuiltLinux/linux/issues/1309 Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D99542	2021-04-01 10:45:37 -07:00
Nick Desaulniers	1addc231cd	[MC][ARM] add .w suffixes for ORN/ORNS T1 See also: F5.1.128 ORN, ORNS (register) T1 shift or rotate by value variant of the Arm ARM. Link: https://github.com/ClangBuiltLinux/linux/issues/1309 Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D99538	2021-04-01 10:27:09 -07:00
David Green	bd516d24c1	[ARM] Move t2DoLoopStart reg alloc hint This adjusts the place that the t2DoLoopStart reg allocation hint is inserted, adding it in the ARMTPAndVPTOptimizaionPass in a similar place as other tail predicated loop optimizations. This removes the need for doing so in a custom inserter, and should make the hint more accurate, only adding it where we expect to create a DLS (not DLSTP or WLS).	2021-03-11 17:56:19 +00:00
David Green	fad70c3068	[ARM] Improve WLS lowering Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little as the instructions are terminators that produce a value - something that needs to be treated carefully. Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together: %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div) %wls0 = extractvalue { i32, i1 } %wls, 0 %wls1 = extractvalue { i32, i1 } %wls, 1 br i1 %wls1, label %loop.ph, label %loop.exit ... loop: %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ] .. %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1) %cmp = icmp ne i32 %iv.next, 0 br i1 %cmp, label %loop, label %loop.exit The llvm.test.start.loop.iterations need to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc). These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction. 
%1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3 t2B %bb.1 ... bb.2.loop: %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2 ... %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2 t2B %bb.3 The t2WhileLoopStartLR can then be treated similar to the other low overhead loop pseudos, eventually being lowered to a WLS providing the branches are within range. Differential Revision: https://reviews.llvm.org/D97729	2021-03-11 17:56:19 +00:00
David Green	438c98515c	[ARM] Use 0, not ZR during ISel for CSINC/INV/NEG Instead of converting the 0 into a ZR reg during lowering, do that with tablegen by matching the zero immediate. This when combined with other optimizations is more likely to use ZR and helps keep the DAG more easily optimizable. It should not otherwise affect code generation.	2021-03-02 19:01:14 +00:00
Nick Desaulniers	404843a94d	[MC][ARM] add .w suffixes for BL (T1) and DBG F1.2 Standard assembler syntax fields describes .w and .n suffixes for wide and narrow encodings. arch/arm/probes/kprobes/test-thumb.c tests installing kprobes for certain instructions using inline asm. There's a few instructions we fail to assemble due to missing .w t2InstAliases. Adds .w suffixes for: * bl (F5.1.25 BL, BLX (immediate) T1) * dbg (F5.1.42 DBG T1) Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D97236	2021-02-24 09:58:08 -08:00
Nick Desaulniers	1e204ac789	[THUMB2] add .w suffixes for ldr/str (immediate) T4 The Linux kernel when built with CONFIG_THUMB2_KERNEL makes use of these instructions with immediate operands and wide encodings. These are the T4 variants of the following sections from the Arm ARM. F5.1.72 LDR (immediate) F5.1.229 STR (immediate) I wasn't able to represent these simple aliases using t2InstAlias due to the Constraints on the non-suffixed existing instructions, which results in some manual parsing logic needing to be added. F1.2 Standard assembler syntax fields describes the use of the .w (wide) vs .n (narrow) encoding suffix. Link: https://bugs.llvm.org/show_bug.cgi?id=49118 Link: https://github.com/ClangBuiltLinux/linux/issues/1296 Reported-by: Stefan Agner <stefan@agner.ch> Reported-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96632	2021-02-23 09:25:40 -08:00
Nick Desaulniers	68945a8686	[Thumb2] support `movs pc, lr` alias for `subs pc, lr, #0`/`eret` This is used by the Linux kernel built with CONFIG_THUMB2_KERNEL. Because different operands are not permitted to `movs`, the diagnostics now provide multiple suggestions along the lines of using a non-pc destination operand or lr source operand. Forked from D95586. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96304	2021-02-10 11:00:42 -08:00
Zhuojia Shen	8cef45517e	[ARM] Fix STRT/STRHT/STRBT input/output operands. STRT, STRHT, and STRBT are store instructions and their source register $Rt should be treated as an input operand instead of an output operand. This should fix things (e.g., liveness tracking in LivePhysRegs) if these instructions were used in CodeGen. Differential Revision: https://reviews.llvm.org/D95074	2021-01-26 14:00:58 -08:00
Kristof Beyls	320fd3314e	[ARM] Implement harden-sls-retbr for Thumb mode The only non-trivial consideration in this patch is that the formation of TBB/TBH instructions, which is done in the constant island pass, does not understand the speculation barriers inserted by the SLSHardening pass. As such, when harden-sls-retbr is enabled for a function, the formation of TBB/TBH instructions in the constant island pass is disabled. Differential Revision: https://reviews.llvm.org/D92396	2020-12-19 12:32:47 +00:00
David Green	3f571be1c0	[ARM] Make t2DoLoopStartTP a terminator Although this was something that I was hoping we would not have to do, this patch makes t2DoLoopStartTP a terminator in order to keep it at the end of its block, so not allowing extra MVE instructions between it and the end. With t2DoLoopStartTP's also starting tail predication regions, it also marks them as having side effects. The t2DoLoopStart is still not a terminator, giving it the extra scheduling freedom that can be helpful, but now that we have a TP version they can be treated differently. Differential Revision: https://reviews.llvm.org/D91887	2020-12-11 09:23:57 +00:00
David Green	0447f3508f	[ARM][RegAlloc] Add t2LoopEndDec We currently have problems with the way that low overhead loops are specified, with LR being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before register allocation to make sure this does not fail. Unfortunately this instruction is a terminator that produces a value (and also branches - it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (needing to put a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else like calls that would break its ability to use LR. For that, this adds a isUnspillableTerminator to opt in the new behaviour. There is a chance that this could cause problems, and so I have added an escape option in case. But I have not seen any problems in the testing that I've tried, and not reverting Low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions. This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-ra to create them. Differential Revision: https://reviews.llvm.org/D91358	2020-12-10 12:14:23 +00:00
David Green	08d1c2d470	[ARM] Introduce t2DoLoopStartTP This introduces a new pseudo instruction, almost identical to a t2DoLoopStart but taking 2 parameters - the original loop iteration count needed for a low overhead loop, plus the VCTP element count needed for a DLSTP instruction setting up a tail predicated loop. The idea is that the instruction holds both values and the backend ARMLowOverheadLoops pass can pick between the two, depending on whether it creates a tail predicated loop or falls back to a low overhead loop. To do that there needs to be something that converts a t2DoLoopStart to a t2DoLoopStartTP, for which this patch repurposes the MVEVPTOptimisationsPass as a "tail predication and vpt optimisation" pass. The extra operand for the t2DoLoopStartTP is chosen based on the operands of VCTP's in the loop, and the instruction is moved as late in the block as possible to attempt to increase the likelihood of making tail predicated loops. Differential Revision: https://reviews.llvm.org/D90591	2020-11-10 18:08:12 +00:00
David Green	73a6cd4b6b	[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it easier to move if needed (as the input is the same as the output), or potentially remove entirely. The hint is added after others (from COPY's etc) which still take precedence. It needed to find a place to add the hint, which currently uses the post isel custom inserter. Differential Revision: https://reviews.llvm.org/D89883	2020-11-10 16:28:57 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new intrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on its own might cause more trouble than it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
Meera Nakrani	48c9e8244b	[ARM] Removed hasSideEffects from signed/unsigned saturates Removed hasSideEffects from SSAT and USAT so that they are no longer marked as unpredictable. Differential Revision: https://reviews.llvm.org/D88545	2020-10-01 14:55:01 +00:00
Meera Nakrani	675431b987	[ARM] Added more patterns to generate SSAT/USAT with shift Added patterns to generate an SSAT or USAT with shift for SSAT/USAT instructions that are matched from IR patterns. Differential Revision: https://reviews.llvm.org/D88145	2020-09-28 14:50:19 +00:00
Meera Nakrani	20283ff491	[ARM] Generated SSAT and USAT instructions with shift Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests. Differential Review: https://reviews.llvm.org/D85120	2020-08-04 09:38:17 +00:00
David Green	146d35b6ee	[ARM] CSEL generation This adds a peephole optimisation to turn a t2MOVccr that could not be folded into any other instruction into a CSEL on 8.1-m. The t2MOVccr would usually be expanded into a conditional mov, that becomes an IT; MOV pair. We can instead generate a CSEL instruction, which can potentially be smaller and allows better register allocation freedom, which can help reduce codesize. Performance is more variable and may depend on the microarchitecture details, but initial results look good. If we need to control this per-cpu, we can add a subtarget feature as we need it. Original patch by David Penry. Differential Revision: https://reviews.llvm.org/D83566	2020-07-16 11:10:53 +01:00
David Green	d604cc6e9a	[ARM] Mark more integer instructions as not having side effects. LDRD and STRD along with UBFX and SBFX are selected from DAGToDAG transforms, so do not have tblgen patterns. They don't get marked as having side effects so cannot be scheduled as efficiently as you would like. This specifically marks them as not having side effects. Differential Revision: https://reviews.llvm.org/D82358	2020-06-23 22:45:51 +01:00
Victor Campos	c010d4d195	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. The code generation explicitly deviates from using the register-offset variant of LDRD/STRD. In this variant, the register allocated to the register-offset cannot be reused in any of the remaining operands. Such restriction seems to be non-trivial to implement in LLVM, thus it is left as a to-do. Differential Revision: https://reviews.llvm.org/D70072	2020-05-28 10:52:43 +01:00
Victor Campos	872ee78f65	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `8a12553223`. A bug has been found when generating code for Thumb2. In some very specific cases, the prologue/epilogue emitter generates erroneous stack offsets for the new LDRD instructions that access the stack. This bug does not seem to be caused by the reverted patch though. Likely the latter has made an undiscovered issue emerge in the prologue/epilogue emission pass. Nevertheless, this reversion is necessary since it is blocking users of the ARM backend.	2020-05-22 11:01:57 +01:00
Kazuaki Ishizaki	0312b9f550	[llvm] NFC: Fix trivial typo in rst and td files Differential Revision: https://reviews.llvm.org/D77469	2020-04-23 14:26:32 +09:00
David Green	2c5f43f9dd	[ARM] Fix qdadd operand order qdadd is defined as sat(Rm + sat(2*Rn)). We had the Rm and Rn switched the wrong way around. Differential Revision: https://reviews.llvm.org/D77049	2020-03-31 10:11:36 +01:00
Victor Campos	8a12553223	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. The code generation explicitly deviates from using the register-offset variant of LDRD/STRD. In this variant, the register allocated to the register-offset cannot be reused in any of the remaining operands. Such restriction seems to be non-trivial to implement in LLVM, thus it is left as a to-do. Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers Reviewed By: efriedma, nickdesaulniers Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2020-03-11 10:19:27 +00:00
Stefan Agner	2f95d5f103	[ARM][Thumb2] support .w assembler qualifier for dmb/dsb/isb Support the explicit wide assembler qualifier for the dmb/dsb/isb synchronization barrier instructions. Differential revision: https://reviews.llvm.org/D75143	2020-02-28 11:08:24 +00:00
Stefan Agner	b4207e705b	[ARM][Thumb2] Support .w assembler qualifier for pld/pldw/pli Accept explicit wide assembler qualifier for the pld/pldw/pli. Differential revision: https://reviews.llvm.org/D75144	2020-02-28 11:08:24 +00:00
Sam Parker	965ba4291a	Revert "[ARM] Add CPSR as an implicit use of t2IT" This reverts commit `e58229fded`. Differential Revision: https://reviews.llvm.org/D75186	2020-02-27 15:43:44 +00:00
Sam Parker	e58229fded	[ARM] Add CPSR as an implicit use of t2IT This use is already attached to the BUNDLE instruction but is lost after finalisation. Differential Revision: https://reviews.llvm.org/D75186	2020-02-27 10:10:40 +00:00
Victor Campos	af2a384581	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `60e0120c91`.	2020-02-08 13:18:45 +00:00
Momchil Velikov	a328536c6d	[ARM] Correct syntax of the CLRM insn The predicate should be adjacent to the opcode. Differential Revision: https://reviews.llvm.org/D74040	2020-02-05 13:54:34 +00:00
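
The demanded-bits conditions quoted in the SimplifyDemandedBits entries (3cf4a2c620, reapplied as 63b1e58f07) can be checked by brute force on a small bit width. The following is a minimal Python sketch and not LLVM code: the 8-bit width and every helper name are assumptions chosen for illustration. It covers the two direct cases (rol -> shl by amt, ror -> lshr by amt); the "revamt" cases follow because a rotate left by amt equals a rotate right by the reverse amount.

```python
# Illustrative only: an 8-bit model of the rotate-to-shift conditions.
# BITS, FULL and all function names are hypothetical, not LLVM APIs.
BITS = 8
FULL = (1 << BITS) - 1


def rotl(x, amt):
    """Rotate an unsigned BITS-wide value left by amt."""
    amt %= BITS
    return ((x << amt) | (x >> (BITS - amt))) & FULL


def rotr(x, amt):
    """Rotate an unsigned BITS-wide value right by amt."""
    amt %= BITS
    return ((x >> amt) | (x << (BITS - amt))) & FULL


def trailing_zeros(mask):
    """Number of zero bits below the lowest set bit (BITS if mask is 0)."""
    if mask == 0:
        return BITS
    return (mask & -mask).bit_length() - 1


def leading_zeros(mask):
    """Number of zero bits above the highest set bit (BITS if mask is 0)."""
    return BITS - mask.bit_length()


def rotl_matches_shl(amt, demanded):
    """True if rotl and shl agree on every demanded bit for all inputs."""
    return all(
        (rotl(x, amt) & demanded) == ((x << amt) & FULL & demanded)
        for x in range(FULL + 1)
    )


def rotr_matches_lshr(amt, demanded):
    """True if rotr and a logical shift right agree on every demanded bit."""
    return all(
        (rotr(x, amt) & demanded) == ((x >> amt) & demanded)
        for x in range(FULL + 1)
    )


def check_all():
    """Exhaustively confirm both conditions for every amount and mask."""
    for amt in range(BITS):
        for demanded in range(FULL + 1):
            if trailing_zeros(demanded) >= amt:
                assert rotl_matches_shl(amt, demanded)
            if leading_zeros(demanded) >= amt:
                assert rotr_matches_lshr(amt, demanded)
    return True
```

Calling `check_all()` walks every shift amount and demanded mask at this width. When the condition is violated, the two operations can differ: `rotl_matches_shl(3, 0b00000001)` is false, because the rotate wraps a high bit into bit 0 while the shift fills it with zero.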

882 Commits