llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	3a6365a439	[ARM] Add FeatureHasNoBranchPredictor for Thumb1 cores Mark v6m/v8m-baseline cores as having no branch predictors. This should not alter very much on its own, but is more correct as the cores do not have branch predictors and can help in the future.	2021-03-30 21:45:26 +01:00
Tomas Matheson	a9968c0a33	[NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions Currently needsStackRealignment returns false if canRealignStack returns false. This means that the behavior of needsStackRealignment does not correspond to its name and description; a function might need stack realignment, but if it is not possible then this function returns false. Furthermore, needsStackRealignment is not virtual and therefore some backends have made use of canRealignStack to indicate whether a function needs stack realignment. This patch attempts to clarify the situation by separating them and introducing new names: - shouldRealignStack - true if there is any reason the stack should be realigned - canRealignStack - true if we are still able to realign the stack (e.g. we can still reserve/have reserved a frame pointer) - hasStackRealignment = shouldRealignStack && canRealignStack (not target customisable) Targets can now override shouldRealignStack to indicate that stack realignment is required. This change will make it easier in a future change to handle the case where we need to realign the stack but can't do so (for example when the register allocator creates an aligned spill after the frame pointer has been eliminated). Differential Revision: https://reviews.llvm.org/D98716 Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87	2021-03-30 17:31:39 +01:00
David Green	d4b3380dfe	[ARM] Handle Splats in MVE lane interleaving As another addition to MVE lane interleaving, this handles Splat shuffle vectors, as the shuffle of a splat is a splat. Differential Revision: https://reviews.llvm.org/D97291	2021-03-30 11:19:16 +01:00
David Green	3a68c6d26c	[ARM] Extend MVE lane interleaving to handle other non-instruction leaves This extends the recent MVE lane interleaving pass to handle other non-instruction leaves, for which a new shuffle is added. This helps especially for constants and potentially for arguments. Differential Revision: https://reviews.llvm.org/D97289	2021-03-29 09:05:45 +01:00
David Green	6c88ffeda3	[ARM] Fix the Changed value in the MVE lane interleaving pass.	2021-03-28 23:47:53 +01:00
David Green	7b6f760fcd	[ARM] MVE vector lane interleaving MVE does not have a single sext/zext or trunc instruction that takes the bottom half of a vector and extends to a full width, like NEON has with MOVL. Instead it is expected that this happens through top/bottom instructions. So the MVE equivalent VMOVLT/B instructions take either the even or odd elements of the input and extend them to the larger type, producing a vector with half the number of elements each of double the bitwidth. As there is no simple instruction for a normal extend, we often have to expand sext/zext/trunc into a series of lane moves (or stack loads/stores, which we do not do yet). This pass takes vector code that starts at truncs, looks for interconnected blobs of operations that end with sext/zext and transforms them by adding shuffles so that the lanes are interleaved and the MVE VMOVL/VMOVN instructions can be used. This is done pre-ISel so that it can work across basic blocks. This initial version of the pass just handles a limited set of instructions, not handling constants or splats or FP, which can all come as extensions to this base. Differential Revision: https://reviews.llvm.org/D95804	2021-03-28 19:34:58 +01:00
David Green	d97189600e	[ARM] Revert WhileLoopStartLR to DoLoopStart If a WhileLoopStartLR is reverted due to calls in the preheader, we may still be able to instead create a DoLoopStart, preserving the low overhead loop. This adds code for that, only reverting the WhileLoopStartLR to a Br/Cmp, leaving the rest of the low overhead loop in place. Differential Revision: https://reviews.llvm.org/D98413	2021-03-25 16:44:15 +00:00
David Green	14b2ec934e	[ARM] Enable UpperBound unrolling for all loops This UpperBound unrolling was already enabled so long as a series of conditions in ARMTTIImpl::getUnrollingPreferences pass. This just always enables it as it can help fully unroll loops that would not otherwise pass those tests. Differential Revision: https://reviews.llvm.org/D99174	2021-03-24 16:39:21 +00:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Victor Campos	f22b4c7122	[ARM] Handle debug instrs in ARM Low Overhead Loop pass In function ConvertVPTBlocks(), it is assumed that every instruction within a vector-predicated block is predicated. This is false for debug instructions, used by LLVM. Because of this, an assertion failure is reached when an input contains debug instructions inside VPT blocks. In non-assert builds, an out of bounds memory access took place. The present patch properly covers the case of debug instructions. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99075	2021-03-23 11:49:06 +00:00
David Green	6d9d2049c8	[ARM] VINS f16 pattern This adds an extra pattern for inserting an f16 into an odd vector lane via a VINS. If the dual-insert-lane pattern does not happen to apply, this can help with some simple cases. Differential Revision: https://reviews.llvm.org/D95471	2021-03-21 12:00:06 +00:00
David Green	a2e0312cda	[ARM] Tone down the MVE scalarization overhead The scalarization overhead was set deliberately high for MVE, whilst the codegen was new. It helps protect us against the negative ramifications of mixing scalar and vector instructions. This decreases that, especially for floating point where the cost of extracting/inserting lane elements can be low. For integer the cost is still fairly high due to the cross-register-bank copy, but is no longer n^2 in the length of the vector. In general, this will decrease the cost of scalarizing floats and long integer vectors. i64 vectors increase in cost, having a high cost both before and after this patch. For floats this allows us to start doing things like vectorizing fdiv instructions, even if they are scalarized. Differential Revision: https://reviews.llvm.org/D98245	2021-03-19 18:30:11 +00:00
David Green	35e0567d58	[ARM] Add VREV MVE shuffle costs This uses the shuffle mask cost from D98206 to give a better cost of MVE VREV instructions. This helps especially in VectorCombine where the cost of shuffles is used to reorder bitcasts, which this helps keep the phase ordering test for fp16 reductions producing optimal code. The isVREVMask has been moved to a header file to allow it to be used across target transform and isel lowering. Differential Revision: https://reviews.llvm.org/D98210	2021-03-17 21:21:43 +00:00
David Green	e2935dcfc4	[TTI] Add a Mask to getShuffleCost This adds a Mask ArrayRef to getShuffleCost, so that if an exact mask can be provided a more accurate cost can be provided by the backend. For example VREV costs could be returned by the ARM backend. This should be an NFC until then, laying the groundwork for that to be added. Differential Revision: https://reviews.llvm.org/D98206	2021-03-17 17:46:26 +00:00
David Green	402f2cae7d	[ARM] Use lrdsb for more thumb1 loads. Given a sextload i16, we can usually generate "ldrsh [rn, rm]". If we don't naturally have a rn, rm addressing mode, we can either generate "ldrh [rn, #0]; sxth" or "mov rm, #0; ldrsh [rn, rm]". We currently generate the first, always creating a sxth. They are both the same number of instructions, but if we generate the second then the mov #0 will likely be CSE'd or pulled out of a loop, etc. This adjusts the ISel patterns to do that, creating a mov instead of a sxth. Differential Revision: https://reviews.llvm.org/D98693	2021-03-17 15:29:02 +00:00
Fangrui Song	5d44c92bf8	Change void getNoop(MCInst &NopInst) to MCInst getNop() Prefer (self-documenting) return values to output parameters (which are liable to be used). While here, rename Noop to Nop which is more widely used and improves consistency with hasEmitNops/setEmitNops/emitNop/etc.	2021-03-15 12:05:34 -07:00
Matt Arsenault	6b76d82853	GlobalISel: Fix marking byval arguments as immutable byval arguments need to be assumed writable. Only implicitly stack passed arguments which aren't addressable in the IR can be assumed immutable. Mips is still broken since for some reason it's doing its own thing with the ValueHandlers (and x86 doesn't actually handle byval arguments now, although some of the code is there).	2021-03-12 09:01:53 -05:00
David Green	bd516d24c1	[ARM] Move t2DoLoopStart reg alloc hint This adjusts the place that the t2DoLoopStart reg allocation hint is inserted, adding it in the ARMTPAndVPTOptimizaionPass in a similar place as other tail predicated loop optimizations. This removes the need for doing so in a custom inserter, and should make the hint more accurate, only adding it where we expect to create a DLS (not DLSTP or WLS).	2021-03-11 17:56:19 +00:00
David Green	fad70c3068	[ARM] Improve WLS lowering Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little as the instructions are terminators that produce a value - something that needs to be treated carefully. Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together: %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div) %wls0 = extractvalue { i32, i1 } %wls, 0 %wls1 = extractvalue { i32, i1 } %wls, 1 br i1 %wls1, label %loop.ph, label %loop.exit ... loop: %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ] .. %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1) %cmp = icmp ne i32 %iv.next, 0 br i1 %cmp, label %loop, label %loop.exit The llvm.test.start.loop.iterations needs to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc). These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction. %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3 t2B %bb.1 ... bb.2.loop: %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2 ... %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2 t2B %bb.3 The t2WhileLoopStartLR can then be treated similar to the other low overhead loop pseudos, eventually being lowered to a WLS providing the branches are within range. Differential Revision: https://reviews.llvm.org/D97729	2021-03-11 17:56:19 +00:00
Oliver Stannard	8d632ca436	[ARM] Add comment explaining stack frame layout Add a comment explaining how we lay out stack frames for ARM targets, based on the existing one for AArch64. Also expand the comment to explain reserved call frames for both architectures. Differential revision: https://reviews.llvm.org/D98258	2021-03-09 15:20:32 +00:00
Fangrui Song	59ff9315fd	[MC][ARM] Support .reloc , BFD_RELOC_{NONE,8,16,32}, BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way indicating a dependency between two sections.	2021-03-05 21:39:16 -08:00
Oliver Stannard	aac056c528	[objdump][ARM] Use correct offset when printing ARM/Thumb branch targets llvm-objdump only uses one MCInstrAnalysis object, so if ARM and Thumb code is mixed in one object, or if an object is disassembled without explicitly setting the triple to match the ISA used, then branch and call targets will be printed incorrectly. This could be fixed by creating two MCInstrAnalysis objects in llvm-objdump, like we currently do for SubtargetInfo. However, I don't think there's any reason we need two separate sub-classes of MCInstrAnalysis, so instead these can be merged into one, and the ISA determined by checking the opcode of the instruction. Differential revision: https://reviews.llvm.org/D97766	2021-03-04 11:15:57 +00:00
David Green	a968e7b82e	[ARM] KnownBits for CSINC/CSNEG/CSINV This adds some simple known bits handling for the three CSINC/NEG/INV instructions. From the operands known bits we can compute the common bits of the first operand and incremented/negated/inverted second operand. The first, especially CSINC ZR, ZR, comes up a fair amount in the tests. The others are more rare so a unit test for them is added. Differential Revision: https://reviews.llvm.org/D97788	2021-03-04 08:40:20 +00:00
David Green	ab280cbaa3	[ARM] Ensure undef is propagated to CBZ/CBNZ flags In some rare circumstances we can be using an undef register for a compare. When folded into a CBZ/CBNZ the undef flags are lost, leading to machine verifier problems. This propagates the existing flags to the new instruction.	2021-03-03 08:02:58 +00:00
Amara Emerson	8a316045ed	[AArch64][GlobalISel] Enable use of the optsize predicate in the selector. To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass. Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize. Differential Revision: https://reviews.llvm.org/D97732	2021-03-02 12:55:51 -08:00
David Green	438c98515c	[ARM] Use 0, not ZR during ISel for CSINC/INV/NEG Instead of converting the 0 into a ZR reg during lowering, do that with tablegen by matching the zero immediate. This when combined with other optimizations is more likely to use ZR and helps keep the DAG more easily optimizable. It should not otherwise affect code generation.	2021-03-02 19:01:14 +00:00
David Green	d6ba8ecb60	[ARM] Add handling of t2LDRSB/t2LDRSH in Constant Island Pass These constant pool loads should be treated similarly to t2LDRB/t2LDRH, acting on the same offset ranges. Add handling and a simple test.	2021-03-02 08:46:07 +00:00
Jian Cai	c35105055e	[ARM] support symbolic expressions as branch target in b.w Currently ARM backend validates the range of branch targets before the layout of fragments is finalized. This causes build failure if symbolic expressions are used, with the exception of a single symbolic value. For example, "b.w ." works but "b.w . + 2" currently fails to assemble. This fixes the issue by delaying this check (in ARMAsmParser::validateInstruction) of b.w instructions until the symbol expressions are resolved (in ARMAsmBackend::adjustFixupValue). Link: https://github.com/ClangBuiltLinux/linux/issues/1286 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D97568	2021-03-01 17:41:35 -08:00
David Green	e880f8b88a	[ARM] Rename pass to MVETPAndVPTOptimisationsPass This pass has for a while performed Tail predication as well as VPT block optimizations. Rename the pass to make that clear.	2021-03-01 21:57:19 +00:00
Matt Arsenault	6c260d3bc0	GlobalISel: Move splitToValueTypes to generic code I copied the nearly identical function from AArch64 into AMDGPU, so fix this duplication. Mips and X86 have their own more exotic versions which should be removed. However replacing those is better left for a separate patch since it requires other changes to avoid regressions.	2021-03-01 08:58:18 -05:00
David Green	91ebc4e864	[ARM] VMOVN undef folding If we insert undef using a VMOVN, we can just use the original value in three out of the four possible combinations. Using VMOVT into an undef vector will still require the lanes to be moved, but otherwise the non-undef value can be used.	2021-02-28 14:44:45 +00:00
David Green	0fe64812d8	[ARM] VECTOR_REG_CAST undef -> undef Propagate undef through VECTOR_REG_CAST nodes, allowing extra simplification in some patterns.	2021-02-28 11:13:49 +00:00
Stefan Agner	a921aaf789	[MC][ARM] make Thumb function also if type attribute is set Make sure to set the bottom bit of the symbol even when the type attribute of a label is set after the label. GNU as sets the thumb state according to the thumb state of the label. If a .type directive is placed after the label, set the symbol's thumb state according to the thumb state of the .type directive. This matches GNU as in most cases. From: Stefan Agner <stefan@agner.ch> This fixes: https://bugs.llvm.org/show_bug.cgi?id=44860 https://github.com/ClangBuiltLinux/linux/issues/866 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D74927	2021-02-24 14:08:56 -08:00
Nick Desaulniers	404843a94d	[MC][ARM] add .w suffixes for BL (T1) and DBG F1.2 Standard assembler syntax fields describes .w and .n suffixes for wide and narrow encodings. arch/arm/probes/kprobes/test-thumb.c tests installing kprobes for certain instructions using inline asm. There's a few instructions we fail to assemble due to missing .w t2InstAliases. Adds .w suffixes for: * bl (F5.1.25 BL, BLX (immediate) T1) * dbg (F5.1.42 DBG T1) Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D97236	2021-02-24 09:58:08 -08:00
David Green	03892a27d6	[ARM] Expand the range of allowed post-incs in load/store optimizer Currently the load/store optimizer will only fold in increments of the same size as the load/store. This patch expands that to any legal immediate for the post-inc instruction. This is a recommit of `3b34b06fc5` with correctness fixes and extra tests. Differential Revision: https://reviews.llvm.org/D95885	2021-02-24 08:46:15 +00:00
Nick Desaulniers	1e204ac789	[THUMB2] add .w suffixes for ldr/str (immediate) T4 The Linux kernel when built with CONFIG_THUMB2_KERNEL makes use of these instructions with immediate operands and wide encodings. These are the T4 variants of the following sections from the Arm ARM. F5.1.72 LDR (immediate) F5.1.229 STR (immediate) I wasn't able to represent these simple aliases using t2InstAlias due to the Constraints on the non-suffixed existing instructions, which results in some manual parsing logic needing to be added. F1.2 Standard assembler syntax fields describes the use of the .w (wide) vs .n (narrow) encoding suffix. Link: https://bugs.llvm.org/show_bug.cgi?id=49118 Link: https://github.com/ClangBuiltLinux/linux/issues/1296 Reported-by: Stefan Agner <stefan@agner.ch> Reported-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96632	2021-02-23 09:25:40 -08:00
Sjoerd Meijer	e1c3bf6afe	[ARM] do not consider sp as deprecated for ldm/stm Early versions of the ARMv7 reference manuals considered the sp register as a deprecated register for ldm/stm family of instructions. However, later versions such as ARM DDI 0406C.d added a note to the Appendix: D9.3 Use of the SP as a general-purpose register Most ARM instructions, unlike Thumb instructions, provide exactly the same access to the SP as to R0-R12. This means that it is possible to use the SP as a general-purpose register. Earlier issues of this manual deprecated the use of SP in an ARM instruction, in any way that is deprecated, not permitted, or not possible in the corresponding Thumb instruction. However, user feedback indicates a number of cases where these instructions are useful. Therefore, ARM no longer deprecates these instruction uses. Also Armv8 manuals no longer consider SP as deprecated register for ldm/stm A32 instructions. Furthermore, GNU as also does not print a deprecated warning when using SP with those instructions. Drop deprecation warning for pop/ldm/push/stm instructions. Patch by: Stefan Agner. Differential Revision: https://reviews.llvm.org/D82692	2021-02-23 13:26:18 +00:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
David Green	188f15d973	[ARM] Remove dead lowering code. NFC Remove the unnecessary code from `21a4faab60`, left over from a different way of lowering.	2021-02-22 10:07:53 +00:00
David Green	21a4faab60	[ARM] Move double vector insert patterns using vins to DAG combine This removes the existing patterns for inserting two lanes into an f16/i16 vector register using VINS, instead using a DAG combine to pattern match the same code sequences. The tablegen patterns were already on the large side (foreach LANE = [0, 2, 4, 6]) and were not handling all the cases they could. Moving that to a DAG combine, whilst not less code, allows us to better control and expand the selection of VINSs. Additionally this allows us to remove the AddedComplexity on VCVTT. The extra trick that this has learned in the process is to move two adjacent lanes using a single f32 vmov, allowing some extra inefficiencies to be removed. Differential Revision: https://reviews.llvm.org/D96876	2021-02-22 09:29:47 +00:00
David Green	a1c34a9d6a	[ARM] Correct vector predicate type in MVE getCmpSelInstrCost	2021-02-19 14:43:51 +00:00
David Green	7a5c26e99a	Revert "[ARM] Expand the range of allowed post-incs in load/store optimizer" This reverts commit `3b34b06fc5` as runtime errors were reported.	2021-02-19 13:15:10 +00:00
Leonard Chan	c77659e549	[llvm][IR] Do not place constants with static relocations in a mergeable section This patch provides two major changes: 1. Add getRelocationInfo to check if a constant will have static, dynamic, or no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.) 2. Only allow a constant with no relocations (static or dynamic) to be placed in a mergeable section. This will allow unused symbols that contain static relocations and happen to fit in mergeable constant sections (.rodata.cstN) to instead be placed in unique-named sections if -fdata-sections is used and subsequently garbage collected by --gc-sections. See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html. Differential Revision: https://reviews.llvm.org/D95960	2021-02-18 15:39:00 -08:00
David Green	3b34b06fc5	[ARM] Expand the range of allowed post-incs in load/store optimizer Currently the load/store optimizer will only fold in increments of the same size as the load/store. This patch expands that to any legal immediate for the post-inc instruction. Differential Revision: https://reviews.llvm.org/D95885	2021-02-18 14:59:02 +00:00
David Green	33ba220611	[ARM] Ensure types provided to getIntrinsicCost are valid It appears that pointer types were causing issues for the min/max cost code in getIntrinsicInstrCost. This makes sure that when matching icmp/select to a min/max, we only do that for normal int or float types.	2021-02-18 14:00:23 +00:00
David Green	1a6744e3dc	[ARM] Add larger than legal ICmp costs A v8i32 compare will produce a v8i1 predicate, but during codegen the v8i32 will be split into two v4i32, potentially requiring two v4i1 predicates to be merged into a single v8i1. Because this merging of two v4i1's into a v8i1 is very expensive, we need to make the cost of the compare equally high. This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost. Because we don't know whether the user of the predicate can be split, and the cost model is mostly per-instruction, we may be pessimistic but that should only be for larger and legal types. This also adds min/max detection to the costmodel where it can be detected, to keep those in line with the cost of simple min/max instructions. Otherwise for the most part, costs that were already expensive have become more expensive. Differential Revision: https://reviews.llvm.org/D96692	2021-02-18 11:42:17 +00:00
David Green	6d835c5fcd	[ARM] Add MVE abs costs Similar to min/max, this increases the accuracy of abs intrinsics costs under MVE.	2021-02-17 14:21:09 +00:00
Petr Hosek	16af973933	[MC][ELF] Support for zero flag section groups This change introduces support for zero flag ELF section groups to LLVM. LLVM already supports COMDAT sections, which in ELF are a special type of ELF section groups. These are generally useful to enable linker GC where you want a group of sections to always travel together, that is to be either retained or discarded as a whole, but without the COMDAT semantics. Other ELF assemblers already support zero flag ELF section groups and this change helps us reach feature parity. Differential Revision: https://reviews.llvm.org/D95851	2021-02-16 14:23:40 -08:00
David Green	1e007cf43c	[ARM] Use rGPR for writeback vldrs From what I can tell, a writeback is unpredictable with LR for both loads and stores. This changes the operand from a gprnopc to a rGPR in both cases (which I believe is essentially a NFC due to the tied-def already being a rGPR.) Differential Revision: https://reviews.llvm.org/D96723	2021-02-16 16:44:47 +00:00
David Green	0a98efb049	[ARM] Add some basic Min/Max costs This adds basic MVE costs for SMIN/SMAX/UMIN/UMAX, as well as MINNUM and MAXNUM representing fmin and fmax. It tightens up the costs, not using a ICmp+Select cost. Differential Revision: https://reviews.llvm.org/D96603	2021-02-15 15:06:19 +00:00
David Green	a838a4f69f	[ARM] Extend search for increment in load/store optimizer Currently the findIncDecAfter will only look at the next instruction for post-inc candidates in the load/store optimizer. This extends that to a search through the current BB, until an instruction that modifies or uses the increment reg is found. This allows more post-inc load/stores and ldm/stm's to be created, especially in cases where a schedule might move instructions further apart. We make sure not to look any further for an SP, as that might invalidate stack slots that are still in use. Differential Revision: https://reviews.llvm.org/D95881	2021-02-15 13:17:21 +00:00
Sjoerd Meijer	357237e93e	Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `effc3b0799`, with the build problem fixed.	2021-02-15 11:33:00 +00:00
Sjoerd Meijer	effc3b0799	Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `cd6de0e8de`.	2021-02-15 11:01:23 +00:00
Sjoerd Meijer	cd6de0e8de	[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600	2021-02-15 10:44:15 +00:00
Arlo Siemsen	080866470d	Add ehcont section support In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling. This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker. This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag. The change includes a test that when the module flag is enabled the section is correctly generated. The set of exception continuation information includes returns from exceptional control flow (catchret in llvm). In order to collect catchret we: 1) Include an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation, 2) Introduce a new machine function pass to insert and collect symbols at the start of each block, and 3) Combine these targets with the other EHCont targets that were already being collected. Change originally authored by Daniel Frampton <dframpto@microsoft.com> For more details, see MSVC documentation for `/guard:ehcont` https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D94835	2021-02-15 14:27:12 +08:00
Serge Pavlov	816053bc71	[FPEnv][ARM] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96501	2021-02-13 11:16:29 +07:00
David Green	875f0cbcc6	[ARM] Optimize fp store of extract to integer store if already available. Given a floating point store from an extracted vector, with an integer VGETLANE that already exists, storing the existing VGETLANEu directly can be better for performance. As the value is known to already be in an integer register, this can help reduce fp register pressure, removes the need for the fp extract and allows use of more integer post-inc stores not available with vstr. This can be a bit narrow in scope, but helps with certain biquad kernels that store shuffled vector elements. Differential Revision: https://reviews.llvm.org/D96159	2021-02-12 18:34:58 +00:00
David Green	541828e35d	[ARM] Single source VMOVNT Our current lowering of VMOVNT goes via a shuffle vector of the form <0, N, 2, N+2, 4, N+4, ..>. That can of course also be a single input shuffle of the form <0, 0, 2, 2, 4, 4, ..>, where we use a VMOVNT to insert a vector into the top lanes of itself. This adds lowering of that case, re-using the existing isVMOVNMask. Differential Revision: https://reviews.llvm.org/D96065	2021-02-12 14:28:57 +00:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
David Green	b1ef919aad	[ARM] Add CostKind to getMVEVectorCostFactor. This adds the CostKind to getMVEVectorCostFactor, so that it can automatically account for CodeSize costs, where it returns a cost of 1 not the MVEFactor used for Throughput/Latency. This helps simplify the caller code and allows us to get the codesize cost more correct in more cases.	2021-02-11 15:33:59 +00:00
Simon Tatham	69f1a7ad82	[ARM] Copy-paste error in ARMv87a architecture definition. In the tablegen architecture definition, the Name field for the ARMv87a record read "ARMv86a". All the other records contain their own names. Corrected it to "ARMv87a", and added the necessary value in ARMArchEnum for that to refer to. Reviewed By: pratlucas Differential Revision: https://reviews.llvm.org/D96493	2021-02-11 13:35:56 +00:00
David Green	e771614bae	[ARM] Change getScalarizationOverhead overload used in gather costs. NFC This changes which of the getScalarizationOverhead overloads is used in the gather/scatter cost to use the base variant directly, not relying on the version using heuristics on the number of args with no args provided. It should still produce the same costs for scalarized gathers/scatters.	2021-02-11 11:58:55 +00:00
David Green	7786ac8377	[ARM] Remove dead mov's in preheader of tail predicated loops With t2DoLoopDec we can be left with some extra MOV's in the preheaders of tail predicated loops. This removes them, in the same way we remove other dead variables. Differential Revision: https://reviews.llvm.org/D91857	2021-02-11 10:48:20 +00:00
David Green	1db7b9ceaa	[ARM] Make a BE predicate bitcast consistent with the rest of llvm We were storing predicate registers, such as a <8 x i1>, in the opposite order to how the rest of llvm expects. This actually turns out to be correct for the one place that usually uses it - the ScalarizeMaskedMemIntrin pass, but only because the pass was incorrect itself. This fixes the order so that bits are stored in the opposite order and bitcasts work as expected. This allows the Scalarization pass to be fixed, as in https://reviews.llvm.org/D94765. Differential Revision: https://reviews.llvm.org/D94867	2021-02-11 08:59:52 +00:00
Nick Desaulniers	68945a8686	[Thumb2] support `movs pc, lr` alias for `subs pc, lr, #0`/`eret` This is used by the Linux kernel built with CONFIG_THUMB2_KERNEL. Because different operands are not permitted to `movs`, the diagnostics now provide multiple suggestions along the lines of using a non-pc destination operand or lr source operand. Forked from D95586. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96304	2021-02-10 11:00:42 -08:00
Matt Arsenault	b72a23650f	GlobalISel: Fix using wrong calling convention for callees This was taking the calling convention from the parent function, instead of the callee. Avoids regressions in a future patch when the caller and callee have different type breakdowns. For some reason AArch64's lowerFormalArguments seems to intentionally ignore the parent isVarArg.	2021-02-09 13:48:56 -05:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This exposes a failure in the test-suite build on PowerPC; revert to unblock the buildbot first. Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
David Green	0c7e044a7f	[ARM] One-off identity shuffle A One-Off Identity mask is a shuffle that is mostly an identity mask from a single source but contains a single element out-of-place, either from a different vector or from another position in the same vector. As opposed to lowering this via an ARMISD::BUILD_VECTOR we can generate an extract/insert pair directly. Under ARM with individually accessible lane elements this often becomes a simple lane move. This also alters the LowerVECTOR_SHUFFLEUsingMovs code to use v4f32 (not v4i32), a more natural type for lane moves. Differential Revision: https://reviews.llvm.org/D95551	2021-02-08 21:24:32 +00:00
David Green	11e415dc90	[ARM] Make v2f64 scalar_to_vector legal Because we mark all operations as expand for v2f64, scalar_to_vector would end up lowering through a stack store/reload. But it is pretty simple to implement, only inserting a D reg into an undef vector. This helps clear up some inefficient codegen from soft calling conventions. Differential Revision: https://reviews.llvm.org/D96153	2021-02-08 11:34:55 +00:00
David Green	1b435eb8f3	[ARM] i16 insert-of-extract to VINS pattern This adds another tablegen fold that converts an i16 odd-lane-insert of an even-lane-extract into a VINS. We extract the existing f32 value from the destination register and VINS the new value into it. The rest of the backend then is able to optimize the INSERT_SUBREG / COPY_TO_REGCLASS / EXTRACT_SUBREG. Differential Revision: https://reviews.llvm.org/D95456	2021-02-08 08:41:07 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Fangrui Song	e5269da979	[ARM][WebAssembly] Fix incorrect MCOperand::createDFPImm after D96091	2021-02-04 20:39:52 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Dan Gohman	698c6b0a09	[WebAssembly] Support single-floating-point immediate value As mentioned in TODO comment, casting double to float causes NaNs to change bits. To avoid the change, this patch adds support for single-floating-point immediate value on MachineCode. Patch by Yuta Saito. Differential Revision: https://reviews.llvm.org/D77384	2021-02-04 18:05:06 -08:00
Ayke van Laethem	aecdf15cc7	[ARM] Do not emit ldrexd/strexd on Cortex-M chips The ldrexd/strexd instructions are not supported on M-class chips, see for example https://developer.arm.com/documentation/dui0489/e/arm-and-thumb-instructions/memory-access-instructions/ldrex-and-strex which says: > All these 32-bit Thumb instructions are available in ARMv6T2 and > above, except that LDREXD and STREXD are not available in the ARMv7-M > architecture. Looking at the ARMv8-M architecture, it appears that these instructions aren't supported either. The Architecture Reference Manual lists ldrex/strex but not ldrexd/strexd: https://developer.arm.com/documentation/ddi0553/bn/ Godbolt example on LLVM 11.0.0, which incorrectly emits ldrexd/strexd instructions: https://llvm.godbolt.org/z/5qqPnE Differential Revision: https://reviews.llvm.org/D95891	2021-02-04 21:55:34 +01:00
David Green	649a3d00df	[ARM] Handle f16 in GeneratePerfectShuffle This new f16 shuffle under Neon would hit an assert in GeneratePerfectShuffle as it would try to treat a f16 vector as an i8. Add f16 handling, treating them like an i16. Differential Revision: https://reviews.llvm.org/D95446	2021-02-04 11:14:52 +00:00
David Green	3e780616c4	[ARM] Correct some tablegen operand types. NFC	2021-02-02 16:55:31 +00:00
David Green	2753722b0f	[ARM] Mark MVE_VMOV_to_lane_32 as isInsertSubregLike This allows the peephole optimizer to know that a MVE_VMOV_to_lane_32 is the same as an insert subreg, allowing it to optimize some redundant lane moves. Differential Revision: https://reviews.llvm.org/D95433	2021-02-02 16:35:47 +00:00
David Green	3a5adf8483	[ARM] Add MVE insert-of-extract pattern A v4i32 insert of an extract can become a simple lane move, as opposed to round-tripping via a GPR. This adds a pattern that turns a v4i32 insert-extract pair into an EXTRACT_SUBREG/INSERT_SUBREG, with the required COPY_TO_REGCLASS. These get better optimized into a simple lane move by the rest of the backend. Differential Revision: https://reviews.llvm.org/D95428	2021-02-02 15:15:04 +00:00
David Green	c722575633	[ARM] Select VINS from vector inserts This patch adds tablegen patterns for pairs of i16/f16 insert/extracts. If we are inserting into two adjacent vector lanes (0 and 1 for example), we can use either a vmov;vins or vmovx;vins to insert the pair together, avoiding a round-trip from GPR registers. This is quite a large pattern with a number of EXTRACT_SUBREG/INSERT_SUBREG/ COPY_TO_REGCLASS nodes, but hopefully as most of those become copies all that will be cleaned up by further optimizations. The VINS pattern was also adjusted to allow it to represent that it is inserting into the top half of an existing register. Differential Revision: https://reviews.llvm.org/D95381	2021-02-02 13:50:02 +00:00
David Green	48230355e9	[ARM] Remove DLS lr, lr A DLS lr, lr instruction only moves lr to itself. It need not be emitted on its own, saving an instruction in the loop preheader. Differential Revision: https://reviews.llvm.org/D78916	2021-02-02 11:09:31 +00:00
David Green	5b2626ea87	[ARM] Flatten identity shuffles through vqdmulh nodes Given a shuffle(vqdmulh(shuffle, shuffle)), we can flatten the shuffles out if they become an identity mask. This can come up during lane interleaving, when we do that better. Differential Revision: https://reviews.llvm.org/D94034	2021-02-01 19:14:20 +00:00
David Green	5805521207	[ARM] Simplify VMOVRRD from extracts of buildvectors Under the softfp calling convention, we are often left with VMOVRRD(extract(bitcast(build_vector(a, b, c, d)))) for the return value of the function. These can be simplified to a,b or c,d directly, depending on the value of the extract. Big endian is a little different because the bitcast switches the lanes around, meaning we end up with b,a or d,c. Differential Revision: https://reviews.llvm.org/D94989	2021-02-01 16:09:25 +00:00
David Green	ad12e6ee95	[ARM] Turn sext_inreg(VGetLaneu) into VGetLanes This adds a DAG combine for converting sext_inreg of VGetLaneu into VGetLanes, providing the types match correctly. Differential Revision: https://reviews.llvm.org/D95073	2021-02-01 11:10:35 +00:00
David Green	6ab792b68d	[ARM] Simplify extract of VMOVDRR Under SoftFP calling conventions, we can be left with extract(bitcast(BUILD_VECTOR(VMOVDRR(a, b), ..))) patterns that can simplify to a or b, depending on the extract lane. Differential Revision: https://reviews.llvm.org/D94990	2021-02-01 10:24:57 +00:00
Kazu Hirata	7925aa091d	[llvm] Populate SmallVector at construction time (NFC)	2021-01-28 22:21:14 -08:00
Christudasan Devadasan	892e4567e1	Support a list of CostPerUse values This patch allows targets to define multiple cost values for each register so that the cost model can be more flexible and better used during the register allocation as per the target requirements. For AMDGPU the VGPR allocation will be more efficient if the register cost can be associated dynamically based on the calling convention. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D86836	2021-01-29 10:14:52 +05:30
David Green	40f46cb0e4	[ARM] Add alignment checks for MVE VLDn The MVE VLD2/4 and VST2/4 instructions require the pointer to be aligned to at least the size of the element type. This adds a check for that into the ARM lowerInterleavedStore and lowerInterleavedLoad functions, not creating the intrinsics if they are invalid for the alignment of the load/store. Unfortunately this is one of those bug fixes that does affect some useful codegen, as we were able to sometimes do some nice lowering of q15 types. But they can cause problems with low aligned pointers. Differential Revision: https://reviews.llvm.org/D95319	2021-01-28 13:10:08 +00:00
Kazu Hirata	f890fd5f91	[llvm] Use llvm::is_sorted (NFC)	2021-01-27 23:25:39 -08:00
David Green	9e2768a3d9	[ARM] Add neon FP16 scalar_to_vector patterns. This adds some simple fp16 scalar_to_vector patterns, preventing a selection failure if this came up. Differential Revision: https://reviews.llvm.org/D95427	2021-01-27 09:59:15 +00:00
Zhuojia Shen	8cef45517e	[ARM] Fix STRT/STRHT/STRBT input/output operands. STRT, STRHT, and STRBT are store instructions and their source register $Rt should be treated as an input operand instead of an output operand. This should fix things (e.g., liveness tracking in LivePhysRegs) if these instructions were used in CodeGen. Differential Revision: https://reviews.llvm.org/D95074	2021-01-26 14:00:58 -08:00
Adhemerval Zanella	dad55c2218	[ARM] [ELF] Fix ARMMaterializeGV for Indirect calls Recent shouldAssumeDSOLocal changes (introduced by `961f31d8ad`) do not take into consideration the relocation model anymore. The ARM fast-isel pass uses the function return to set whether a global symbol is loaded indirectly or not, and without the expected information llvm now generates an extra load for the following code: ``` $ cat test.ll @__asan_option_detect_stack_use_after_return = external global i32 define dso_local i32 @main(i32 %argc, i8** %argv) #0 { entry: %0 = load i32, i32* @__asan_option_detect_stack_use_after_return, align 4 %1 = icmp ne i32 %0, 0 br i1 %1, label %2, label %3 2: ret i32 0 3: ret i32 1 } attributes #0 = { noinline optnone } $ llc test.ll -o - [...] main: .fnstart [...] movw r0, :lower16:__asan_option_detect_stack_use_after_return movt r0, :upper16:__asan_option_detect_stack_use_after_return ldr r0, [r0] ldr r0, [r0] cmp r0, #0 [...] ``` And without 'optnone' it produces: ``` [...] main: .fnstart [...] movw r0, :lower16:__asan_option_detect_stack_use_after_return movt r0, :upper16:__asan_option_detect_stack_use_after_return ldr r0, [r0] clz r0, r0 lsr r0, r0, #5 bx lr [...] ``` This triggered a lot of invalid memory accesses in sanitizers for arm-linux-gnueabihf. I checked this patch with both a stage1 build with gcc and a stage2 bootstrap and it fixes all the Linux sanitizer issues. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D95379	2021-01-26 15:57:55 -03:00
Kazu Hirata	16baad8f4e	[llvm] Use pop_back_val (NFC)	2021-01-24 12:18:57 -08:00
Kazu Hirata	054444177b	[Target] Use llvm::append_range (NFC)	2021-01-24 12:18:56 -08:00
Kazu Hirata	e4847a7fcf	Revert "[Target] Use llvm::append_range (NFC)" This reverts commit `cc7a238286`. The X86WinEHState.cpp hunk seems to break certain builds.	2021-01-23 11:25:27 -08:00
Kazu Hirata	cc7a238286	[Target] Use llvm::append_range (NFC)	2021-01-23 10:56:31 -08:00
Stanislav Mekhanoshin	607bec0bb9	Change materializeFrameBaseRegister() to return register The only caller of this function is in the LocalStackSlotAllocation and it creates a base register of the class returned by the target's getPointerRegClass(). AMDGPU wants to use a different reg class here so let materializeFrameBaseRegister just create and return whatever it wants. Differential Revision: https://reviews.llvm.org/D95268	2021-01-22 15:51:06 -08:00
David Green	af03324984	[ARM] Disable sign extended SSAT pattern recognition. I may have given bad advice, and skipping sext_inreg when matching SSAT patterns is not valid on its own. It at least needs to sext_inreg the input again, but as far as I can tell is still only valid based on demanded bits. For the moment disable that part of the combine, hopefully reimplementing it in the future more correctly.	2021-01-22 14:07:48 +00:00
David Green	9ae73cdbc1	[ARM] Adjust isSaturatingConditional to return a new SDValue. NFC This replaces the isSaturatingConditional function with LowerSaturatingConditional that directly returns a new SSAT or USAT SDValue, instead of returning true and the components of it.	2021-01-22 11:11:36 +00:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to work OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempts to improve the cost modelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. 
This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reductions larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loops under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
David Green	dfac521da1	[ARM] Fix vector saddsat costs. It turns out the vectorizer calls the getIntrinsicInstrCost functions with a scalar return type and vector VF. This updates the costmodel to handle that, still producing the correct vector costs. A vectorizer test is added to show it vectorizing at the correct factor again.	2021-01-21 15:30:39 +00:00
David Green	6a563eef13	[ARM] Expand vXi1 VSELECT's We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as expanded to turn them into a series of logical operations. Differential Revision: https://reviews.llvm.org/D94946	2021-01-19 17:56:50 +00:00
David Green	f373b30923	[ARM] Add MVE add.sat costs This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs, based on when the instruction is legal. With smaller than legal types that are promoted we generate shr(qadd(shl, shl)), so the cost is 4 appropriately. Differential Revision: https://reviews.llvm.org/D94958	2021-01-19 15:38:46 +00:00
Yvan Roux	244ad228f3	[ARM][MachineOutliner] Add stack fixup feature This patch handles cases where we have to save/restore the link register onto the stack and load/store instructions which use the stack are part of the outlined region. It checks that there will be no overflow introduced by the new offset and fixes up these instructions accordingly. Differential Revision: https://reviews.llvm.org/D92934	2021-01-19 10:59:09 +01:00
David Green	e7dc083a41	[ARM] Don't handle low overhead branches in AnalyzeBranch It turns out that the BranchFolder and IfCvt do not like unanalyzable branches that fall-through. This means that removing the unconditional branches from the end of tail predicated instruction can run into asserts and verifier issues. This effectively reverts `372eb2bbb6`, but adds handling to t2DoLoopEndDec which are not branches, so can be safely skipped.	2021-01-18 17:16:07 +00:00
David Green	1454724215	[ARM] Align blocks that are not fallthrough targets If the previous block in a function does not fall through, the nops added to align the next block will never be executed. This means we can freely (except for codesize) align more branches. This happens in constantislandspass (as it cannot happen later) and only happens at aggressive optimization levels as it does increase codesize. Differential Revision: https://reviews.llvm.org/D94394	2021-01-16 22:19:35 +00:00
David Green	372eb2bbb6	[ARM] Add low overhead loops terminators to AnalyzeBranch This treats low overhead loop branches the same as jump tables and indirect branches in analyzeBranch - they cannot be analyzed but the direct branches on the end of the block may be removed. This helps remove the unnecessary branches earlier, which can help produce better codegen (and change block layout in a number of cases). Differential Revision: https://reviews.llvm.org/D94392	2021-01-16 18:30:21 +00:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
David Green	f5abf0bd48	[ARM] Tail predication with constant loop bounds The TripCount for a predicated vector loop body will be ceil(ElementCount/Width). This alters the conversion of an active.lane.mask to a VCTP intrinsic to match. Differential Revision: https://reviews.llvm.org/D94608	2021-01-15 18:17:31 +00:00
Sam Tebbs	1a497ae9b8	[ARM][Block placement] Check the predecessor exists before processing it Not all machine loops will have a predecessor, so the pass needs to check it before continuing. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D94780	2021-01-15 15:45:13 +00:00
Sam Tebbs	5e4480b6c0	[ARM] Don't run the block placement pass at O0 The block placement pass shouldn't run unless optimisations are enabled. Differential Revision: https://reviews.llvm.org/D94691	2021-01-15 13:59:29 +00:00
Oliver Stannard	3676ef1053	[ARM][GISel] Treat calls as variadic even if only fixed arguments provided For the ARM hard-float calling convention, calls to variadic functions need to be treated differently, even if only the fixed arguments are provided. This fixes GCC-C-execute-pr68390 in the test-suite, which is failing on the ARM GlobalISel bot.	2021-01-15 09:37:01 +00:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Sam Tebbs	60fda8ebb6	[ARM] Add a pass that re-arranges blocks when there is a backwards WLS branch Blocks can be laid out such that a t2WhileLoopStart branches backwards. This is forbidden by the architecture and so it fails to be converted into a low-overhead loop. This new pass checks for these cases and moves the target block, fixing any fall-through that would then be broken. Differential Revision: https://reviews.llvm.org/D92385	2021-01-13 17:23:00 +00:00
David Green	c29ca8551a	[ARM] Update isVMOVNOriginalMask to handle single input shuffle vectors The isVMOVNOriginalMask was previously only checking for two input shuffles that could be better expanded as vmovn nodes. This expands that to single input shuffles that will later be legalized to multiple vectors. Differential Revision: https://reviews.llvm.org/D94189	2021-01-13 08:51:28 +00:00
Stephan Herhut	2e17d9c0ee	[ARM] Add uses for locals introduced for debug messages. NFC. This adds uses for locals introduced for new debug messages for the load store optimizer. Those locals are only used on debug statements and otherwise create unused variable warnings. Differential Revision: https://reviews.llvm.org/D94398	2021-01-11 14:27:28 +01:00
David Green	8165a03420	[ARM] Add debug messages for the load store optimizer. NFC	2021-01-11 09:24:28 +00:00
David Green	dcefcd51e0	[ARM] Update trunc costs We did not have specific costs for larger than legal truncates that were not otherwise cheap (where they were next to stores, for example). As MVE does not have a dedicated instruction for them (and we do not use loads/stores yet), they should be expensive as they get expanded to a series of lane moves. Differential Revision: https://reviews.llvm.org/D94260	2021-01-11 08:59:28 +00:00
David Green	024af42c60	[ARM] Custom lower i1 vector truncates The ISel patterns we have for truncating to i1's under MVE do not seem to be correct. Instead custom lower to icmp(ne, and(x, 1), 0). Differential Revision: https://reviews.llvm.org/D94226	2021-01-08 18:21:00 +00:00
Kazu Hirata	b934160aaa	[Target] Use llvm::find_if (NFC)	2021-01-07 20:29:36 -08:00
David Green	63dce70b79	[ARM] Handle any extend whilst lowering addw/addl/subw/subl Same as `a9b6440edd`, use zanyext to treat any_extends as zero extends during lowering to create addw/addl/subw/subl nodes. Differential Revision: https://reviews.llvm.org/D93835	2021-01-06 11:26:39 +00:00
David Green	ddb82fc76c	[ARM] Handle any extend whilst lowering mull Similar to `78d8a821e2` but for ARM, this handles any_extend whilst creating MULL nodes, treating them as zextends. Differential Revision: https://reviews.llvm.org/D93834	2021-01-06 10:51:12 +00:00
Christudasan Devadasan	d68458bd56	[GlobalISel] Base implementation for sret demotion. If the return values can't be lowered to registers SelectionDAG performs the sret demotion. This patch contains the basic implementation for the same in the GlobalISel pipeline. Furthermore, targets should bring relevant changes during lowerFormalArguments, lowerReturn and lowerCall to make use of this feature. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92953	2021-01-06 10:30:50 +05:30
QingShan Zhang	2962f1149c	[NFC] Add the getSizeInBytes() interface for MachineConstantPoolValue The current implementation assumes that each MachineConstantPoolValue takes up sizeof(MachineConstantPoolValue::Ty) bytes. For PowerPC, we want to lump all the constants with the same type as one MachineConstantPoolValue to save the cost of calculating the TOC entry for each const. So, we need to extend MachineConstantPoolValue in a way that breaks this assumption. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89108	2021-01-05 03:22:45 +00:00
Kazu Hirata	eb198f4c3c	[llvm] Use llvm::any_of (NFC)	2021-01-04 11:42:47 -08:00
David Green	901cc9b6f3	[ARM] Extend lowering for i64 reductions The lowering of a <4 x i16> or <4 x i8> vecreduce.add into an i64 would previously be expanded, due to the i64 not being legal. This patch adjusts our reduction matchers, making it produce a VADDLV(sext A to v4i32) instead. Differential Revision: https://reviews.llvm.org/D93622	2021-01-04 12:44:43 +00:00
Kazu Hirata	0e219b6443	[Target] Construct SmallVector with iterator ranges (NFC)	2021-01-03 09:57:45 -08:00
Kazu Hirata	331c28f60d	[ARM] Declare Op within an if statement (NFC)	2020-12-30 17:45:36 -08:00
Mark Murray	5abfeccf10	[ARM][AArch64] Add Cortex-A78C Support for Clang and LLVM This patch upstreams support for the Armv8-a Cortex-A78C processor for AArch64 and ARM. In detail: Adding cortex-a78c as cpu option for aarch64 and arm targets in clang Adding Cortex-A78C CPU name and ProcessorModel in llvm Details of the CPU can be found here: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78c	2020-12-29 10:18:59 +00:00
David Penry	a9f14cdc62	[ARM] Add bank conflict hazarding Adds ARMBankConflictHazardRecognizer. This hazard recognizer looks for a few situations where the same base pointer is used and then checks whether the offsets lead to a bank conflict. Two parameters are also added to permit overriding of the target assumptions: arm-data-bank-mask=<int> - Mask of bits which are to be checked for conflicts. If all these bits are equal in the offsets, there is a conflict. arm-assume-itcm-bankconflict=<bool> - Assume that there will be bank conflicts on any loads to a constant pool. This hazard recognizer is enabled for Cortex-M7, where the Technical Reference Manual states that there are two DTCM banks banked using bit 2 and one ITCM bank. Differential Revision: https://reviews.llvm.org/D93054	2020-12-23 14:00:59 +00:00
Fangrui Song	d9a0c40bce	[MC] Split MCContext::createTempSymbol, default AlwaysAddSuffix to true, and add comments CanBeUnnamed is rarely false. Splitting to a createNamedTempSymbol makes the intention clearer and matches the direction of reverted r240130 (to drop the unneeded parameters). No behavior change.	2020-12-21 14:04:13 -08:00
Fangrui Song	d33abc337c	Migrate MCContext::createTempSymbol call sites to AlwaysAddSuffix=true Most call sites set AlwaysAddSuffix to true. The two use cases do not really need false and can be more consistent with other temporary symbol usage.	2020-12-21 14:04:13 -08:00
Fangrui Song	7b9890e17e	[MC][ELF] Remove unneeded MCSymbol::setExternal calls ELF code uses symbol bindings and does not call isExternal().	2020-12-20 21:26:36 -08:00
Kazu Hirata	966f1431de	[Target] Use llvm::erase_if (NFC)	2020-12-20 17:43:22 -08:00
Kristof Beyls	df8ed39283	[ARM] harden-sls-blr: avoid r12 and lr in indirect calls. As a linker is allowed to clobber r12 on function calls, the code transformation that hardens indirect calls is not correct in case a linker does so. Similarly, the transformation is not correct when register lr is used. This patch makes sure that r12 or lr are not used for indirect calls when harden-sls-blr is enabled. Differential Revision: https://reviews.llvm.org/D92469	2020-12-19 12:39:59 +00:00
Kristof Beyls	a4c1f5160e	[ARM] Harden indirect calls against SLS To make sure that no barrier gets placed on the architectural execution path, each indirect call to the function in register rN is transformed into a direct call to __llvm_slsblr_thunk_mode_rN. mode is either arm or thumb, depending on the mode of where the indirect call happens. The __llvm_slsblr_thunk_mode_rN thunk contains: bx rN <speculation barrier> Therefore, the indirect call gets split into 2; one direct call and one indirect jump. This transformation results in not inserting a speculation barrier on the architectural execution path. The mitigation is off by default and can be enabled by the harden-sls-blr subtarget feature. As a linker is allowed to clobber r12 on function calls, the above code transformation is not correct in case a linker does so. Similarly, the transformation is not correct when register lr is used. Avoiding r12/lr being used is done in a follow-on patch to make reviewing this code easier. Differential Revision: https://reviews.llvm.org/D92468	2020-12-19 12:33:42 +00:00
Kristof Beyls	320fd3314e	[ARM] Implement harden-sls-retbr for Thumb mode The only non-trivial consideration in this patch is that the formation of TBB/TBH instructions, which is done in the constant island pass, does not understand the speculation barriers inserted by the SLSHardening pass. As such, when harden-sls-retbr is enabled for a function, the formation of TBB/TBH instructions in the constant island pass is disabled. Differential Revision: https://reviews.llvm.org/D92396	2020-12-19 12:32:47 +00:00
Kristof Beyls	195f44278c	[ARM] Implement harden-sls-retbr for ARM mode Some processors may speculatively execute the instructions immediately following indirect control flow, such as returns, indirect jumps and indirect function calls. To avoid a potential mis-speculatively executed gadget after these instructions leaking secrets through side channels, this pass places a speculation barrier immediately after every indirect control flow where control flow doesn't return to the next instruction, such as returns and indirect jumps, but not indirect function calls. Hardening of indirect function calls will be done in a later, independent patch. This patch is implementing the same functionality as the AArch64 counterpart implemented in https://reviews.llvm.org/D81400. For AArch64, returns and indirect jumps only occur on RET and BR instructions and hence the function attribute to control the hardening is called "harden-sls-retbr" there. On AArch32, there is a much wider variety of instructions that can trigger an indirect unconditional control flow change. I've decided to stick with the name "harden-sls-retbr" as introduced for the corresponding AArch64 mitigation. This patch implements this for ARM mode. A future patch will extend this to also support Thumb mode. The inserted barriers are never on the correct, architectural execution path, and therefore performance overhead of this is expected to be low. To ensure these barriers are never on an architecturally executed path, when the harden-sls-retbr function attribute is present, indirect control flow is never conditionalized/predicated. On targets that implement the Armv8.0-SB Speculation Barrier extension, a single SB instruction is emitted that acts as a speculation barrier. On other targets, a DSB SYS followed by an ISB is emitted to act as a speculation barrier. These speculation barriers are implemented as pseudo instructions to prevent later passes from analyzing them and potentially removing them. The mitigation is off by default and can be enabled by the harden-sls-retbr subtarget feature. Differential Revision: https://reviews.llvm.org/D92395	2020-12-19 11:42:39 +00:00
David Green	e1c1adf9dc	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: `vmov q0[2], q0[0], r2, r0` or `vmov q0[3], q0[1], r3, r1`. They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector insert elements. Because the insert_elements are known to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: (3 2 1 0 -> vmovqrr 31; vmovqrr 20), (3 2 1 -> vmovqrr 31; vmov 2), (3 1 -> vmovqrr 31), (2 1 0 -> vmovqrr 20; vmov 1), (2 0 -> vmovqrr 20), with the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. This is a recommit of `6cc3d80a84` after fixing the backward instruction definitions.	2020-12-18 16:13:08 +00:00
David Green	6e913e4451	Revert "[ARM] Match dual lane vmovs from insert_vector_elt" This one needed more testing.	2020-12-18 13:33:40 +00:00
Yvan Roux	923ca0b411	[ARM][MachineOutliner] Fix costs model. Fix candidates calls costs models allocation and prepare stack fixups handling. Differential Revision: https://reviews.llvm.org/D92933	2020-12-17 16:08:23 +01:00
Lucas Prates	c5046ebdf6	[ARM] Adding v8.7-A command-line support for the ARM target This extends the command-line support for the 'armv8.7-a' architecture name to the ARM target. Based on a patch written by Momchil Velikov. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D93231	2020-12-17 13:48:54 +00:00
Lucas Prates	42b92b31b8	[ARM][AArch64] Adding basic support for the v8.7-A architecture This introduces support for the v8.7-A architecture through a new subtarget feature called "v8.7a". It adds two new "WFET" and "WFIT" instructions, the nXS limited-TLB-maintenance qualifier for DSB and TLBI instructions, a new CPU id register, ID_AA64ISAR2_EL1, and the new HCRX_EL2 system register. Based on patches written by Simon Tatham and Victor Campos. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D91772	2020-12-17 13:45:08 +00:00
David Green	6cc3d80a84	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: `vmov q0[2], q0[0], r2, r0` or `vmov q0[3], q0[1], r3, r1`. They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector insert elements. Because the insert_elements are known to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: (3 2 1 0 -> vmovqrr 31; vmovqrr 20), (3 2 1 -> vmovqrr 31; vmov 2), (3 1 -> vmovqrr 31), (2 1 0 -> vmovqrr 20; vmov 1), (2 0 -> vmovqrr 20), with the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. Differential Revision: https://reviews.llvm.org/D92553	2020-12-15 15:58:52 +00:00
David Green	1de3e7fd62	[ARM] Improve handling of empty VPT blocks in tail predicated loops A VPT block that just contains either VPST;VCTP or VPT;VCTP will become invalid once the VCTP is removed. This fixes the first by removing the now empty block and bails out for the second, as we have no simple way of converting a VPT to a VCMP. Differential Revision: https://reviews.llvm.org/D92369	2020-12-14 11:17:01 +00:00
David Green	a4823377fd	[ARM] Add basic masked load/store costs This adds some basic MVE masked load/store costs, notably changing the cost of legal loads/stores to the MVECostFactor and the cost of scalarized instructions to 8*NumElts. Differential Revision: https://reviews.llvm.org/D86538	2020-12-12 15:26:32 +00:00
David Green	3f571be1c0	[ARM] Make t2DoLoopStartTP a terminator Although this was something that I was hoping we would not have to do, this patch makes t2DoLoopStartTP a terminator in order to keep it at the end of its block, so not allowing extra MVE instructions between it and the end. With t2DoLoopStartTP's also starting tail predication regions, it also marks them as having side effects. The t2DoLoopStart is still not a terminator, giving it the extra scheduling freedom that can be helpful, but now that we have a TP version they can be treated differently. Differential Revision: https://reviews.llvm.org/D91887	2020-12-11 09:23:57 +00:00
Haojian Wu	2fc4afda0f	Fix a -Wunused-variable warning in release build.	2020-12-10 14:52:45 +01:00
David Green	0447f3508f	[ARM][RegAlloc] Add t2LoopEndDec We currently have problems with the way that low overhead loops are specified, with LR being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before register allocation to make sure this does not fail. Unfortunately this instruction is a terminator that produces a value (and also branches - it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (needing to put a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else like calls that would break its ability to use LR. For that, this adds an isUnspillableTerminator to opt in to the new behaviour. There is a chance that this could cause problems, and so I have added an escape option in case. But I have not seen any problems in the testing that I've tried, and not reverting low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions. This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-ra to create them. Differential Revision: https://reviews.llvm.org/D91358	2020-12-10 12:14:23 +00:00
David Green	b0ce615b2d	[ARM] Remove copies from low overhead phi inductions. The phi created in a low overhead loop gets created with a default register class it seems. There are then copies inserted between the low overhead loop pseudo instructions (which produce/consume GPRlr instructions) and the phi holding the induction. This patch removes those as a step towards attempting to make t2LoopDec and t2LoopEnd a single instruction, and appears useful in its own right as shown in the tests. Differential Revision: https://reviews.llvm.org/D91267	2020-12-10 10:30:31 +00:00
David Green	384383e15c	[ARM] Common inverse constant predicates to VPNOT This scans through blocks looking for constants used as predicates in MVE instructions. When two constants are found which are the inverse of one another, the second can be replaced by a VPNOT of the first, potentially allowing that not to be folded away into an else predicate of a vpt block. Differential Revision: https://reviews.llvm.org/D92470	2020-12-09 07:56:44 +00:00
David Green	03e675fd12	[ARM] Turn pred_cast(xor(x, -1)) into xor(pred_cast(x), -1) This folds a not (an xor -1) through a predicate_cast, so that it can be turned into a VPNOT and potentially be folded away as an else predicate inside a VPT block. Differential Revision: https://reviews.llvm.org/D92235	2020-12-08 15:22:46 +00:00
David Green	91fb9eac0b	[ARM] Remove dead instructions before creating VPT block bundles We remove VPNOT instructions in VPT blocks as we create them, turning them into else predicates. We don't remove the dead instructions until after the block has been created though. Because the VPNOT will have killed the vpr register it used, this makes finalizeBundle add internal flags to the vpr uses of any instructions after the VPNOT. These incorrect flags can then confuse what is alive and what is not, leading to machine verifier problems. This patch removes them earlier instead, before the bundle is finalized so that kill flags remain valid. Differential Revision: https://reviews.llvm.org/D92227	2020-12-08 14:05:07 +00:00
David Green	d9bf6245bf	[ARM] Revert low overhead loops with calls before register allocation. This adds code to revert low overhead loops with calls in them before register allocation. Ideally we would not create low overhead loops with calls in them to begin with, but that can be difficult to always get correct. If we want to try and glue together t2LoopDec and t2LoopEnd into a single instruction, we need to ensure that no instructions use LR in the loop. (Technically the final code can be better too, as it doesn't need to use the same registers but that has not been optimized for here, as reverting loops with calls is expected to be very rare). It also adds a MVETailPredUtils.h header to share the revert code between different passes, and provides a place to expand upon, with RevertLoopWithCall becoming a place to perform other low overhead loop alterations like removing copies or combining LoopDec and End into a single instruction. Differential Revision: https://reviews.llvm.org/D91273	2020-12-07 15:44:40 +00:00
Evgeny Leviant	993eaf2d69	Recommit [TableGen][SchedModels] Fix read/write variant substitution Original commit rG112b3cb6ba49 introduced non-determinism in subtarget generator due to iteration over DenseMap. New patch fixes this changing ProcModelMapTy from DenseMap to std::map.	2020-12-04 21:50:34 +03:00
Fangrui Song	86fa896363	Revert D90844 "[TableGen][SchedModels] Fix read/write variant substitution" This reverts commit `112b3cb6ba`. D90844 made lib/Target/AArch64/AArch64GenSubtargetInfo.inc non-deterministic.	2020-12-03 14:24:29 -08:00
dfukalov	2ce38b3f03	[NFC] Reduce include files dependency. 1. Removed #include "...AliasAnalysis.h" in other headers and modules. 2. Cleaned up includes in AliasAnalysis.h. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92489	2020-12-03 18:25:05 +03:00
Mircea Trofin	bab72dd5d5	[NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister. Typing the API appropriately. Differential Revision: https://reviews.llvm.org/D92341	2020-12-02 15:46:38 -08:00
David Green	eedf0ed63e	[ARM] Mark select and selectcc of MVE vector operations as expand. We already expand select and select_cc in codegenprepare, but they can still be generated under some situations. Explicitly mark them as expand to ensure they are not produced, leading to a failure to select the nodes. Differential Revision: https://reviews.llvm.org/D92373	2020-12-01 15:05:55 +00:00
David Green	7923d71b4a	[ARM] PREDICATE_CAST demanded bits The PREDICATE_CAST node is used to model moves between MVE predicate registers and gpr's, and eventually becomes a VMSR p0, rn. When moving to a predicate only the bottom 16 bits of the source register are demanded. This adds a simple fold for that, allowing it to potentially remove instructions like uxth. Differential Revision: https://reviews.llvm.org/D92213	2020-12-01 10:32:24 +00:00
Evgeny Leviant	112b3cb6ba	[TableGen][SchedModels] Fix read/write variant substitution Patch fixes multiple issues related to expansion of variant sched reads and writes. Differential revision: https://reviews.llvm.org/D90844	2020-11-30 11:55:55 +03:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
David Green	0e49a40d75	[ARM] Cleanup for the MVETailPrediction pass This strips out a lot of the code that should no longer be needed from the MVETailPredictionPass, leaving the important part - find active lane mask instructions and convert them to VCTP operations. Differential Revision: https://reviews.llvm.org/D91866	2020-11-26 15:10:44 +00:00
Mark Murray	2b6691894a	[ARM][AArch64] Adding Neoverse N2 CPU support Add support for the Neoverse N2 CPU to the ARM and AArch64 backends. Differential Revision: https://reviews.llvm.org/D91695	2020-11-25 11:42:54 +00:00
Evgeny Leviant	a6a6d11c7b	[MC][ARM] Fix number of operands of tMOVSr Differential revision: https://reviews.llvm.org/D92029	2020-11-24 18:13:10 +03:00
Evgeny Leviant	50bd686695	Add support for branch forms of ALU instructions to Cortex-A57 model Patch fixes scheduling of ALU instructions which modify pc register. Patch also fixes computation of mutually exclusive predicates for sequences of variants to be properly expanded Differential revision: https://reviews.llvm.org/D91266	2020-11-24 11:43:51 +03:00
Craig Topper	4252f7773a	[SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets. X86 was already specially marking fma as commutable which allowed tablegen to autogenerate commuted patterns. This moves it to the target independent definition and fixes up the targets to remove now unneeded patterns. Unfortunately, the tests change because the commuted versions of the patterns are generating operands in a different order than the explicit patterns. Differential Revision: https://reviews.llvm.org/D91842	2020-11-23 10:09:20 -08:00
David Green	c8c3a411c5	[ARM] Ensure MVE_TwoOpPattern is used inside Predicate's	2020-11-22 21:38:00 +00:00
David Green	f08c37da7b	[ARM] Disable WLSTP loops This checks to see if the loop will likely become a tail predicated loop and disables wls loop generation if so, as the likelihood for reverting is currently too high. These should be fairly rare situations anyway due to the way iterations and element counts are used during lowering. Just not trying can alter how SCEV's are materialized however, leading to different codegen. It also adds an option to disable all while low overhead loops, for debugging. Differential Revision: https://reviews.llvm.org/D91663	2020-11-20 13:30:44 +00:00
Sam Tebbs	8ecb015ed5	[ARM][LowOverheadLoops] Convert intermediate vpr use assertion to condition This converts the intermediate VPR use assertion to a condition in the if-statement to protect against assertion failures in case behaviour is changed. This is a follow-up to https://reviews.llvm.org/D90935 and implements the post-approval comments. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91790	2020-11-19 17:15:45 +00:00
David Green	006b3bdedd	[ARM] Deliberately prevent inline asm in low overhead loops. NFC This was already something that was handled by one of the "else" branches in maybeLoweredToCall, so this patch is an NFC but makes it explicit and adds a test. We may in the future want to support this under certain situations but for the moment just don't try and create low overhead loops with inline asm in them. Differential Revision: https://reviews.llvm.org/D91257	2020-11-19 13:28:21 +00:00
Duncan P. N. Exon Smith	5abf76fbe3	ADT: Add assertions to SmallVector::insert, etc., for reference invalidation `2c196bbc6b` asserted that `SmallVector::push_back` doesn't invalidate the parameter when it needs to grow. Do the same for `resize`, `append`, `assign`, `insert`, and `emplace_back`. Differential Revision: https://reviews.llvm.org/D91744	2020-11-18 17:36:28 -08:00
Mikhail Goncharov	f45c052c9e	Fix unused variables in release build Differential Revision: https://reviews.llvm.org/D91705	2020-11-18 15:18:31 +01:00
Sam Tebbs	da2e4728c7	[ARM][LowOverheadLoops] Merge VCMP and VPST across VPT blocks This patch adds support for combining a VPST with a dangling VCMP from a previous VPT block. Differential Revision: https://reviews.llvm.org/D90935	2020-11-18 12:54:16 +00:00
Florian Hahn	b2f4c5fddc	[AsmWriter] Factor out mnemonic generation to accessible getMnemonic. This patch factors out the part of printInstruction that gets the mnemonic string for a given MCInst. This is intended to be used subsequently for the instruction-mix remarks to display the final mnemonic (D90040). Unfortunately making `getMnemonic` available to the AsmPrinter seems to require making it virtual. Not sure if there's a way around that with the current layering of the AsmPrinters. Reviewed By: Paul-C-Anagnostopoulos Differential Revision: https://reviews.llvm.org/D90039	2020-11-17 09:47:38 +00:00
David Penry	48b43c9d4f	[ARM] Cortex-M7 schedule This patch adds the SchedMachineModel for Cortex-M7. It also adds test cases for the scheduling information. Details of the pipeline and descriptions are in comments in file ARMScheduleM7.td included in this patch. Differential Revision: https://reviews.llvm.org/D91355	2020-11-16 10:16:07 +00:00
serge-sans-paille	9218ff50f9	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_componentlibrary` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These function store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
David Green	11dee2eae2	[ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP Of course there was something missing, in this case a check that the def of the count register we are adding to a t2DoLoopStartTP would dominate the insertion point. In the future, when we remove some of these COPY's in between, the t2DoLoopStartTP will always become the last instruction in the block, preventing this from happening. In the meantime we need to check they are created in a sensible order. Differential Revision: https://reviews.llvm.org/D91287	2020-11-12 13:47:46 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Sam Parker	898a81dfc5	[NFC][ARM] Replace lambda with any_of	2020-11-11 10:02:55 +00:00
Pirama Arumuga Nainar	8262e94a6d	[ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt. Previously we used setRegClass to rgpr, which may expand the register domain if the result was already in a constrained class (tcgpr in the above PR). Differential Revision: https://reviews.llvm.org/D91192	2020-11-10 13:38:11 -08:00
Benjamin Kramer	92c61a045f	[ARM] Silence unused variable warning in Release builds. NFC.	2020-11-10 20:35:28 +01:00
David Green	08d1c2d470	[ARM] Introduce t2DoLoopStartTP This introduces a new pseudo instruction, almost identical to a t2DoLoopStart but taking 2 parameters - the original loop iteration count needed for a low overhead loop, plus the VCTP element count needed for a DLSTP instruction setting up a tail predicated loop. The idea is that the instruction holds both values and the backend ARMLowOverheadLoops pass can pick between the two, depending on whether it creates a tail predicated loop or falls back to a low overhead loop. To do that there needs to be something that converts a t2DoLoopStart to a t2DoLoopStartTP, for which this patch repurposes the MVEVPTOptimisationsPass as a "tail predication and vpt optimisation" pass. The extra operand for the t2DoLoopStartTP is chosen based on the operands of VCTP's in the loop, and the instruction is moved as late in the block as possible to attempt to increase the likelihood of making tail predicated loops. Differential Revision: https://reviews.llvm.org/D90591	2020-11-10 18:08:12 +00:00
David Green	dbe1bf63aa	[ARM] Cleanup for ARMLowOverheadLoops. NFC	2020-11-10 17:28:07 +00:00
David Green	c7e275388e	[ARM] Don't aggressively unroll vector remainder loops We already do not unroll loops with vector instructions under MVE, but that does not include the remainder loops that the vectorizer produces. These remainder loops will be rarely executed and are not worth unrolling, as the trip count is likely to be low if they get executed at all. Luckily they get llvm.loop.isvectorized to make recognizing them simpler. We have wanted to do this for a while but hit issues with low overhead loops being reverted due to difficult register allocation. With recent changes that seems to be less of an issue now. Differential Revision: https://reviews.llvm.org/D90055	2020-11-10 17:01:31 +00:00
David Green	73a6cd4b6b	[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it easier to move if needed (as the input is the same as the output), or potentially remove entirely. The hint is added after others (from COPY's etc) which still take precedence. It needed to find a place to add the hint, which currently uses the post isel custom inserter. Differential Revision: https://reviews.llvm.org/D89883	2020-11-10 16:28:57 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from `t2DoLoopStart rGPR` to `GPRlr = t2DoLoopStart rGPR`. This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new intrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on its own might cause more trouble than it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
David Green	c8cd7e2bbf	[ARM] Remove MI variable aliasing. NFC This was accidentally using the same name for two different variables in the same line. Whilst it seems to work for some compilers, others have trouble and it is probably not a fantastic idea.	2020-11-09 18:18:43 +00:00
Momchil Velikov	937ab6a785	[ARM][MachineOutliner] Emit more CFI instructions This patch makes the outliner emit CFI instructions in a few more places: * after LR is restored, but before the return in an outlined function * around save/restore of LR to/from a register at calls to outlined functions * around save/restore of LR to/from the stack at calls to outlined functions The latter two only when the function does NOT spill LR. If the function spills LR, then outliner generated saves/restores around calls are not considered interesting for unwinding the frame. Differential Revision: https://reviews.llvm.org/D89483	2020-11-09 15:26:18 +00:00
Sam Tebbs	40a3f7e48d	[ARM][LowOverheadLoops] Merge a VCMP and the new VPST into a VPT There were cases where a VCMP and a VPST were merged even if the VCMP didn't have the same defs of its operands as the VPST. This is fixed by adding RDA checks for the defs. This however gave rise to cases where the new VPST created would precede the un-merged VCMP and so would fail a predicate mask assertion since the VCMP wasn't predicated. This was solved by converting the VCMP to a VPT instead of inserting the new VPST. Differential Revision: https://reviews.llvm.org/D90461	2020-11-09 15:03:48 +00:00
David Green	a0a9e1c798	[ARM] Remove kill flags between VCMP and insertion point When we fold a VCMP into a VPST instruction any kill flags between the old VCMP position and the new insertion point need to be removed, in order to keep the verifier happy. Differential Revision: https://reviews.llvm.org/D90964	2020-11-09 13:17:53 +00:00
Lucas Prates	c2c2cc1360	[ARM][AArch64] Adding Neoverse V1 CPU support Add support for the Neoverse V1 CPU to the ARM and AArch64 backends. This is based on patches from Mark Murray and Victor Campos. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D90765	2020-11-09 13:15:40 +00:00
Sanjay Patel	264a6df353	[ARM] remove cost-kind predicate for cmp/sel costs This is the cmp/sel sibling to D90692. Again, the reasoning is: the throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uops). We need to check for a valid (non-null) condition type parameter because SimplifyCFG may pass nullptr for that (and so we will crash multiple regression tests without that check). I'm not sure if passing nullptr makes sense, but other code in the cost model does appear to check if that param is set or not. Differential Revision: https://reviews.llvm.org/D90781	2020-11-05 14:52:25 -05:00
Sander de Smalen	d57bba7cf8	[SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference. To accommodate frame layouts that have both fixed and scalable objects on the stack, describing a stack location or offset using a pointer + uint64_t is not sufficient. For this reason, we've introduced the StackOffset class, which models both the fixed- and scalable sized offsets. The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset, so that this can be used in other interfaces, such as to eliminate frame indices in PEI or to emit Debug locations for variables on the stack. This patch is purely mechanical and doesn't change the behaviour of how the result of this function is used for fixed-sized offsets. The patch adds various checks to assert that the offset has no scalable component, as frame offsets with a scalable component are not yet supported in various places. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D90018	2020-11-05 11:02:18 +00:00
Cameron McInally	c126eb7529	[SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247. Differential Revision: https://reviews.llvm.org/D90644	2020-11-04 14:20:31 -06:00
David Green	eb611930b6	[ARM] Remove unused variable. NFC	2020-11-04 09:00:03 +00:00
Sanjay Patel	c40126e740	[ARM] remove cost-kind predicate for most math op costs This is based on the same idea that I am using for the basic model implementation and what I have partly already done for x86: throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uop). Differential Revision: https://reviews.llvm.org/D90692	2020-11-03 17:23:46 -05:00
David Green	bd32386410	[ARM] Remove unused variable. NFC	2020-11-03 12:58:10 +00:00
David Green	e474499402	[ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops If an instruction will be lowered to a call there is no advantage of using a low overhead loop as the LR register will need to be spilled and reloaded around the call, and the low overhead will end up being reverted. This teaches our hardware loop lowering that these memory intrinsics will be calls under certain situations. Differential Revision: https://reviews.llvm.org/D90439	2020-11-03 11:53:09 +00:00
Momchil Velikov	7360d6d921	[ARM][MachineOutliner] Do not overestimate LR liveness in return block The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers callee-saved registers to be alive at the point after the return instruction in a block. In the ARM backend, the `LR` register is classified as callee-saved, which is not really correct (from an ARM eABI or just common sense point of view). These two conditions cause the `MachineOutliner` to overestimate the liveness of `LR`, which results in unnecessary saves/restores of `LR` around calls to outlined sequences. It also causes the `MachineVerifier` to crash in some cases, because the save instruction reads a dead `LR`, for example when the following program: int h(int, int); int f(int a, int b, int c, int d) { a = h(a + 1, b - 1); b = b + c; return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d); } int g(int a, int b, int c, int d) { a = h(a - 1, b + 1); b = b + c; return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d); } is compiled with `-target arm-eabi -march=armv7-m -Oz`. This patch computes the liveness of `LR` in return blocks only, while taking into account the few ARM instructions, which read `LR`, but nevertheless the register is not mentioned (explicitly or implicitly) in the instruction operands. Differential Revision: https://reviews.llvm.org/D89189	2020-11-02 16:47:22 +00:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit `408c4408fa`. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Evgeny Leviant	cc96a82291	[TableGen][SchedModels] Fix read/write variant substitution Patch fixes case when sched class has write and read variants belonging to different processor models. Differential revision: https://reviews.llvm.org/D89777	2020-11-02 17:39:04 +03:00
David Green	30ad742644	[ARM] Fix crash for gather of pointer costs. If the elt size is unknown due to it being a pointer, a comparison against 0 will cause an assert. Make sure the elt size is large enough before comparing and for the moment just return the scalar cost.	2020-10-31 13:10:14 +00:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit `73f01e3df5`. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Cameron McInally	dda1e74b58	[Legalize] Add legalizations for VECREDUCE_SEQ_FADD Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass. Differential Revision: https://reviews.llvm.org/D90247	2020-10-30 16:02:55 -05:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
David Green	d14db8c8dc	[ARM] Match MVE vqdmulh This adds ISel matching for a form of VQDMULH. There are several ir patterns that we could match to that instruction, this one is for: min(ashr(mul(sext(a), sext(b)), 7), 127) Which is what llvm will optimize to once it has removed the max that usually makes up the min/max saturate pattern, as in this case the compare will always be false. The additional complication to match i32 patterns (which extend into an i64) is that the min will be a vselect/setcc, as vmin is not supported for i64 vectors. Tablegen patterns have also been updated to attempt to reuse the MVE_TwoOpPattern patterns. Differential Revision: https://reviews.llvm.org/D90096	2020-10-30 13:34:27 +00:00
Nicholas Guy	eb9fe24eaf	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. Differential Revision: https://reviews.llvm.org/D88496	2020-10-29 15:17:31 +00:00
Evgeny Leviant	a28388f95b	[ARM][SchedModels] Move IsLDMBaseRegInListPred to ARMSchedule.td. NFC This predicate is not specific to cortex-a57 and can be used in other processor models as well.	2020-10-26 22:31:41 +03:00
Evgeny Leviant	e74f66125e	[ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90150	2020-10-26 20:22:41 +03:00
Evgeny Leviant	a877bda397	Fix issue in cortex-a57 sched model Differential revision: https://reviews.llvm.org/D90152	2020-10-26 20:16:40 +03:00
Evgeny Leviant	a95ce5f65f	[ARM][SchedModels] Rename and generalize predicate. NFC	2020-10-26 12:14:55 +03:00
Evgeny Leviant	99b2756517	[ARM][SchedModels] Get rid of IsLdrAm2ScaledPred Differential revision: https://reviews.llvm.org/D90024	2020-10-26 12:01:39 +03:00
Evgeny Leviant	a4fc18e641	[ARM][SchedModels] Convert IsLdstsoMinusRegPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90029	2020-10-26 11:54:08 +03:00
Evgeny Leviant	d613e39d52	[ARM][SchedModels] Convert IsLdrAm3NegRegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90045	2020-10-26 11:43:02 +03:00
David Green	61bc18de0b	[Schedule] Add a MultiHazardRecognizer This adds a MultiHazardRecognizer and starts to make use of it in the ARM backend. The idea of the class is to allow multiple independent hazard recognizers to be added to a single base MultiHazardRecognizer, allowing them to all work in parallel without requiring them to be chained into subclasses. They can then be added or not based on cpu or subtarget features, which will become useful in the ARM backend once more hazard recognizers are being used for various things. This also renames ARMHazardRecognizer to ARMHazardRecognizerFPMLx in the process, to more clearly explain what that recognizer is designed for. Differential Revision: https://reviews.llvm.org/D72939	2020-10-26 08:06:17 +00:00
David Green	92205bf122	[ARM] Remove some dead code. NFC	2020-10-24 17:22:49 +01:00
Paul C. Anagnostopoulos	876af264c1	[TableGen] Change !getop and !setop to !getdagop and !setdagop. Differential Revision: https://reviews.llvm.org/D89814	2020-10-23 10:36:05 -04:00
Evgeny Leviant	cb86522c94	[ARM][SchedModels] Convert IsR1P0AndLaterPred to MCSchedPredicate. NFC Differential revision: https://reviews.llvm.org/D90017	2020-10-23 14:27:49 +03:00
Evgeny Leviant	7a78073be7	[ARM][SchedModels] Let ldm* instruction scheduling use MCSchedPredicate Differential revision: https://reviews.llvm.org/D89957	2020-10-23 10:33:20 +03:00
Mircea Trofin	e24537d48f	[NFC][MC] Use MCRegister for ReachingDefAnalysis APIs Also updated the users of the APIs; and a drive-by small change to RDFRegister.cpp Differential Revision: https://reviews.llvm.org/D89912	2020-10-22 08:47:35 -07:00
Evgeny Leviant	ed6a91f456	[ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89939	2020-10-22 18:03:01 +03:00
Evgeny Leviant	bf9edcb6fd	[ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89876	2020-10-21 20:49:10 +03:00
Paul C. Anagnostopoulos	dfd6b69e01	[ARM] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans Differential Revision: https://reviews.llvm.org/D89822	2020-10-21 09:52:45 -04:00
Nicholas Guy	9a2d2bedb7	Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate Some instructions may be removable through processes such as IfConversion, however DefinesPredicate cannot be made aware of when this should be considered. This parameter allows DefinesPredicate to distinguish these removable instructions on a per-call basis, allowing for more fine-grained control from processes like ifConversion. Renames DefinesPredicate to ClobbersPredicate, to better reflect its purpose. Differential Revision: https://reviews.llvm.org/D88494	2020-10-21 11:52:47 +01:00
Evgeny Leviant	991e86156c	[ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89460	2020-10-20 11:14:21 +03:00
David Green	6dcbc323fd	Revert "[ARM][LowOverheadLoops] Adjust Start insertion." This reverts commit `38f625d0d1`. This commit contains some holes in its logic and has been causing issues since it was committed. The idea sounds OK but some cases were not handled correctly. Instead of trying to fix that up later it is probably simpler to revert it and work to reimplement it in a more reliable way.	2020-10-20 08:55:21 +01:00
Luqman Aden	51892a42da	[COFF][ARM] Fix CodeView for Windows on 32bit ARM targets. Create the LLVM / CodeView register mappings for the 32-bit ARM Windows targets. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D89622	2020-10-19 22:16:16 -07:00
Evgeny Leviant	8a7ca143f8	[ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89553	2020-10-19 11:37:54 +03:00
David Green	b93d74ac9c	[ARM] Basic getArithmeticReductionCost reduction costs This adds some basic costs for MVE reductions - currently just costing the simple legal add vectors as a single MVE instruction. More complex costing can be added in the future when the framework more readily allows it. Differential Revision: https://reviews.llvm.org/D88980	2020-10-17 10:29:00 +01:00
David Green	d79ee3a807	[ARM] Add a very basic active_lane_mask cost This adds a very basic cost for active_lane_mask under MVE - making the assumption that they will be free and then apologizing for that in a comment. In reality they may either be free (by being nicely folded into a tail predicated loop), cost the same as a VCTP or be expanded into vdup's, adds and cmp's. It is difficult to detect the difference from a single getIntrinsicInstrCost call, so makes the assumption that the vectorizer is adding them, and only added them where it makes sense. We may need to change this in the future to better model predicate costs in the vectorizer, especially at -Os or non-tail predicated loops. The vectorizer currently does not query the cost of these instructions but that will change in the future and a zero cost there probably makes the most sense at the moment. Differential Revision: https://reviews.llvm.org/D88989	2020-10-17 10:09:42 +01:00
David Sherwood	47f2dc7e5f	[SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets In most of lib/Target we know that we are not dealing with scalable types so it's perfectly fine to replace TypeSize comparison operators with their fixed width equivalents, making use of getFixedSize() and so on. Differential Revision: https://reviews.llvm.org/D89101	2020-10-15 09:01:21 +01:00
Evgeny Leviant	2ad82b0ed1	[ARM.td] Make instruction definitions visible to sched models Differential revision: https://reviews.llvm.org/D89308	2020-10-14 09:58:45 +03:00
David Green	cb27006a94	[ARM] Attempt to make Tail predication / RDA more resilient to empty blocks There are a number of places in RDA where we assume the block will not be empty. This isn't necessarily true for tail predicated loops where we have removed instructions. This attempts to make the pass more resilient to empty blocks, not casting pointers to machine instructions where they would be invalid. The test contains a case that was previously failing, but has recently been hidden on trunk. It contains an empty block to begin with to show a similar error. Differential Revision: https://reviews.llvm.org/D88926	2020-10-10 14:50:25 +01:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Sam Tebbs	68e002e181	[ARM] Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV. Differential Revision: https://reviews.llvm.org/D87836	2020-10-06 14:44:58 +01:00
Amara Emerson	c9f5cdd453	Revert "[ARM]Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV" This reverts commit `2573cf3c3d`. These seem to break some lit tests.	2020-10-05 10:52:43 -07:00
Sam Tebbs	2573cf3c3d	[ARM]Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV. Differential Revision: https://reviews.llvm.org/D87836	2020-10-05 15:51:28 +01:00
David Green	7feafa0286	[ARM] Fix pointer offset when splitting stores from VMOVDRR We were not accounting for the pointer offset when splitting a store from a VMOVDRR node, which could lead to incorrect aliasing info. In this case it is the fneg via integer arithmetic that gives us a store->load pair that we started getting wrong. Differential Revision: https://reviews.llvm.org/D88653	2020-10-03 16:47:50 +01:00
Meera Nakrani	f7c0e2b8f2	[ARM] Prevent constants from iCmp instruction from being hoisted if part of a min(max()) pattern Marks constants of an ICmp instruction as free if its only user is a select instruction that is part of a min(max()) pattern. Ensures that in loops, in particular when loop unrolling is turned on, SSAT will still be correctly generated. Differential Revision: https://reviews.llvm.org/D88662	2020-10-02 09:28:35 +00:00
Meera Nakrani	48c9e8244b	[ARM] Removed hasSideEffects from signed/unsigned saturates Removed hasSideEffects from SSAT and USAT so that they are no longer marked as unpredictable. Differential Revision: https://reviews.llvm.org/D88545	2020-10-01 14:55:01 +00:00
Sam Parker	7e02bc81c6	[NFC][ARM] LowOverheadLoop DEBUG statements	2020-10-01 13:38:16 +01:00
Sam Parker	38f625d0d1	[ARM][LowOverheadLoops] Adjust Start insertion. Try to move the insertion point to become the terminator of the block, usually the preheader. Differential Revision: https://reviews.llvm.org/D88638	2020-10-01 10:49:19 +01:00
Sam Parker	6ec5f32497	[ARM][LowOverheadLoops] Iteration count liveness Before deciding to insert a [W\|D]LSTP, check that defining LR with the element count won't affect any other instructions that should be taking the iteration count. Differential Revision: https://reviews.llvm.org/D88549	2020-10-01 10:11:10 +01:00
Sam Parker	7b90516d47	[ARM][LowOverheadLoops] Start insertion point If possible, try not to move the start position earlier than it already is. Differential Revision: https://reviews.llvm.org/D88542	2020-10-01 10:05:25 +01:00
Sam Parker	dfa2c14b8f	[ARM][LowOverheadLoops] Use iterator for InsertPt. Use a MachineBasicBlock::iterator instead of a MachineInstr* for the position of our LoopStart instruction. NFCish, as it changes debug info.	2020-10-01 08:32:35 +01:00
Rahman Lavaee	8955950c12	Exception support for basic block sections This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the core basic block section patch D73674 (ExceptionSection)). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables. The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a nop operation as the first non-CFI instruction in the exception section. Background on Exception Handling in C++ ABI https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function. The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. 
Each call site record in this table is structured as follows: \| CallSite \| --> Position of the call site (relative to the function entry) \| CallSite length \| --> Length of the call site. \| Landing Pad \| --> Position of the landing pad (relative to the landing pad fragment’s begin label) \| Action record offset \| --> Position of the first action record The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table). The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted. The action record offset is an index into the action table which includes information about which exception types are caught. C++ Exceptions with Basic Block Sections Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section. This patch introduces a new CallSiteRange structure which specifies the range of call-sites which correspond to every section: `struct CallSiteRange { // Symbol marking the beginning of the procedure fragment. 
MCSymbol *FragmentBeginLabel = nullptr; // Symbol marking the end of the procedure fragment. MCSymbol *FragmentEndLabel = nullptr; // LSDA symbol for this call-site range. MCSymbol *ExceptionLabel = nullptr; // Index of the first call-site entry in the call-site table which // belongs to this range. size_t CallSiteBeginIdx = 0; // Index just after the last call-site entry in the call-site table which // belongs to this range. size_t CallSiteEndIdx = 0; // Whether this is the call-site range containing all the landing pads. bool IsLPRange = false; };` With N basic-block-sections, the call-site table is partitioned into N call-site ranges. Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, the CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed. Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the `IsLPRange` field in CallSiteRange). \| @LPStart \| ---> Landing pad fragment ( LSDA1 points here) \| CallSite Table Length \| ---> Used to find the action table. 
\| CallSites \| \| … \| \| … \| \| @LPStart \| ---> Landing pad fragment ( LSDA2 points here) \| CallSite Table Length \| \| CallSites \| \| … \| \| … \| … … \| Action Table \| \| Types Table \| Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D73739	2020-09-30 11:05:55 -07:00
Sam Parker	779a8a028f	[ARM][LowOverheadLoops] TryRemove helper. Make a helper function that wraps around RDA::isSafeToRemove and utilises the existing DCE IT block checks.	2020-09-30 09:37:24 +01:00
Sam Parker	195c22f273	[ARM] Change VPT state assertion Just because we haven't encountered an instruction setting the VPR, it doesn't mean we can't create a VPT block - the VPR may be a live-in. Differential Revision: https://reviews.llvm.org/D88224	2020-09-30 08:01:10 +01:00


11532 Commits