llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	0a98efb049	[ARM] Add some basic Min/Max costs This adds basic MVE costs for SMIN/SMAX/UMIN/UMAX, as well as MINNUM and MAXNUM representing fmin and fmax. It tightens up the costs, not using a ICmp+Select cost. Differential Revision: https://reviews.llvm.org/D96603	2021-02-15 15:06:19 +00:00
David Green	a838a4f69f	[ARM] Extend search for increment in load/store optimizer Currently the findIncDecAfter will only look at the next instruction for post-inc candidates in the load/store optimizer. This extends that to a search through the current BB, until an instruction that modifies or uses the increment reg is found. This allows more post-inc load/stores and ldm/stm's to be created, especially in cases where a schedule might move instructions further apart. We make sure not to look any further for an SP, as that might invalidate stack slots that are still in use. Differential Revision: https://reviews.llvm.org/D95881	2021-02-15 13:17:21 +00:00
Sjoerd Meijer	357237e93e	Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `effc3b0799`, with the build problem fixed.	2021-02-15 11:33:00 +00:00
Sjoerd Meijer	effc3b0799	Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `cd6de0e8de`.	2021-02-15 11:01:23 +00:00
Sjoerd Meijer	cd6de0e8de	[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600	2021-02-15 10:44:15 +00:00
Arlo Siemsen	080866470d	Add ehcont section support In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling. This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker. This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag. The change includes a test that when the module flag is enabled the section is correctly generated. The set of exception continuation information includes returns from exceptional control flow (catchret in llvm). In order to collect catchret we: 1) Includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation, 2) Introduces a new machine function pass to insert and collect symbols at the start of each block, and 3) Combines these targets with the other EHCont targets that were already being collected. Change originally authored by Daniel Frampton <dframpto@microsoft.com> For more details, see MSVC documentation for `/guard:ehcont` https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D94835	2021-02-15 14:27:12 +08:00
Serge Pavlov	816053bc71	[FPEnv][ARM] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96501	2021-02-13 11:16:29 +07:00
David Green	875f0cbcc6	[ARM] Optimize fp store of extract to integer store if already available. Given a floating point store from an extracted vector, with an integer VGETLANE that already exists, storing the existing VGETLANEu directly can be better for performance. As the value is known to already be in an integer registers, this can help reduce fp register pressure, removed the need for the fp extract and allows use of more integer post-inc stores not available with vstr. This can be a bit narrow in scope, but helps with certain biquad kernels that store shuffled vector elements. Differential Revision: https://reviews.llvm.org/D96159	2021-02-12 18:34:58 +00:00
David Green	541828e35d	[ARM] Single source VMOVNT Our current lowering of VMOVNT goes via a shuffle vector of the form <0, N, 2, N+2, 4, N+4, ..>. That can of course also be a single input shuffle of the form <0, 0, 2, 2, 4, 4, ..>, where we use a VMOVNT to insert a vector into the top lanes of itself. This adds lowering of that case, re-using the existing isVMOVNMask. Differential Revision: https://reviews.llvm.org/D96065	2021-02-12 14:28:57 +00:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
David Green	b1ef919aad	[ARM] Add CostKind to getMVEVectorCostFactor. This adds the CostKind to getMVEVectorCostFactor, so that it can automatically account for CodeSize costs, where it returns a cost of 1 not the MVEFactor used for Throughput/Latency. This helps simplify the caller code and allows us to get the codesize cost more correct in more cases.	2021-02-11 15:33:59 +00:00
Simon Tatham	69f1a7ad82	[ARM] Copy-paste error in ARMv87a architecture definition. In the tablegen architecture definition, the Name field for the ARMv87a record read "ARMv86a". All the other records contain their own names. Corrected it to "ARMv87a", and added the necessary value in ARMArchEnum for that to refer to. Reviewed By: pratlucas Differential Revision: https://reviews.llvm.org/D96493	2021-02-11 13:35:56 +00:00
David Green	e771614bae	[ARM] Change getScalarizationOverhead overload used in gather costs. NFC This changes which of the getScalarizationOverhead overloads is used in the gather/scatter cost to use the base variant directly, not relying on the version using heuristics on the number of args with no args provided. It should still produce the same costs for scalarized gathers/scatters.	2021-02-11 11:58:55 +00:00
David Green	7786ac8377	[ARM] Remove dead mov's in preheader of tail predicated loops With t2DoLoopDec we can be left with some extra MOV's in the preheaders of tail predicated loops. This removes them, in the same way we remove other dead variables. Differential Revision: https://reviews.llvm.org/D91857	2021-02-11 10:48:20 +00:00
David Green	1db7b9ceaa	[ARM] Make a BE predicate bitcast consistent with the rest of llvm We were storing predicate registers, such as a <8 x i1>, in the opposite order to how the rest of llvm expects. This actually turns out to be correct for the one place that usually uses it - the ScalarizeMaskedMemIntrin pass, but only because the pass was incorrect itself. This fixes the order so that bits are stored in the opposite order and bitcasts work as expected. This allows the Scalarization pass to be fixed, as in https://reviews.llvm.org/D94765. Differential Revision: https://reviews.llvm.org/D94867	2021-02-11 08:59:52 +00:00
Nick Desaulniers	68945a8686	[Thumb2] support `movs pc, lr` alias for `subs pc, lr, #0`/`eret` This is used by the Linux kernel built with CONFIG_THUMB2_KERNEL. Because different operands are not permitted to `movs`, the diagnostics now provide multiple suggestions along the lines of using a non-pc destination operand or lr source operand. Forked from D95586. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96304	2021-02-10 11:00:42 -08:00
Matt Arsenault	b72a23650f	GlobalISel: Fix using wrong calling convention for callees This was taking the calling convention from the parent function, instead of the callee. Avoids regressions in a future patch when the caller and callee have different type breakdowns. For some reason AArch64's lowerFormalArguments seems to intentionally ignore the parent isVarArg.	2021-02-09 13:48:56 -05:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This expose a failure in test-suite build on PowerPC, revert to unblock buildbot first, Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
David Green	0c7e044a7f	[ARM] One-off identity shuffle A One-Off Identity mask is a shuffle that is mostly an identity mask from as single source but contains a single element out-of-place, either from a different vector or from another position in the same vector. As opposed to lowering this via a ARMISD::BUILD_VECTOR we can generate an extract/insert pair directly. Under ARM with individually accessible lane elements this often becomes a simple lane move. This also alters the LowerVECTOR_SHUFFLEUsingMovs code to use v4f32 (not v4i32), a more natural type for lane moves. Differential Revision: https://reviews.llvm.org/D95551	2021-02-08 21:24:32 +00:00
David Green	11e415dc90	[ARM] Make v2f64 scalar_to_vector legal Because we mark all operations as expand for v2f64, scalar_to_vector would end up lowering through a stack store/reload. But it is pretty simple to implement, only inserting a D reg into an undef vector. This helps clear up some inefficient codegen from soft calling conventions. Differential Revision: https://reviews.llvm.org/D96153	2021-02-08 11:34:55 +00:00
David Green	1b435eb8f3	[ARM] i16 insert-of-extract to VINS pattern This adds another tablegen fold that converts an i16 odd-lane-insert of an even-lane-extract into a VINS. We extract the existing f32 value from the destination register and VINS the new value into it. The rest of the backend then is able to optimize the INSERT_SUBREG / COPY_TO_REGCLASS / EXTRACT_SUBREG. Differential Revision: https://reviews.llvm.org/D95456	2021-02-08 08:41:07 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF used called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Fangrui Song	e5269da979	[ARM][WebAssembly] Fix incorrect MCOperand::createDFPImm after D96091	2021-02-04 20:39:52 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Dan Gohman	698c6b0a09	[WebAssembly] Support single-floating-point immediate value As mentioned in TODO comment, casting double to float causes NaNs to change bits. To avoid the change, this patch adds support for single-floating-point immediate value on MachineCode. Patch by Yuta Saito. Differential Revision: https://reviews.llvm.org/D77384	2021-02-04 18:05:06 -08:00
Ayke van Laethem	aecdf15cc7	[ARM] Do not emit ldrexd/strexd on Cortex-M chips The ldrexd/strexd instructions are not supported on M-class chips, see for example https://developer.arm.com/documentation/dui0489/e/arm-and-thumb-instructions/memory-access-instructions/ldrex-and-strex which says: > All these 32-bit Thumb instructions are available in ARMv6T2 and > above, except that LDREXD and STREXD are not available in the ARMv7-M > architecture. Looking at the ARMv8-M architecture, it appears that these instructions aren't supported either. The Architecture Reference Manual lists ldrex/strex but not ldrexd/strexd: https://developer.arm.com/documentation/ddi0553/bn/ Godbolt example on LLVM 11.0.0, which incorrectly emits ldrexd/strexd instructions: https://llvm.godbolt.org/z/5qqPnE Differential Revision: https://reviews.llvm.org/D95891	2021-02-04 21:55:34 +01:00
David Green	649a3d00df	[ARM] Handle f16 in GeneratePerfectShuffle This new f16 shuffle under Neon would hit an assert in GeneratePerfectShuffle as it would try to treat a f16 vector as an i8. Add f16 handling, treating them like an i16. Differential Revision: https://reviews.llvm.org/D95446	2021-02-04 11:14:52 +00:00
David Green	3e780616c4	[ARM] Correct some tablegen operand types. NFC	2021-02-02 16:55:31 +00:00
David Green	2753722b0f	[ARM] Mark MVE_VMOV_to_lane_32 as isInsertSubregLike This allows the peephole optimizer to know that a MVE_VMOV_to_lane_32 is the same as an insert subreg, allowing it to optimize some redundant lane moves. Differential Revision: https://reviews.llvm.org/D95433	2021-02-02 16:35:47 +00:00
David Green	3a5adf8483	[ARM] Add MVE insert-of-extract pattern A v4i32 insert of an extract can become a simple lane move, as opposed to round-tripping via a GPR. This adds a patterns that turns an v4i32 insert-extract pair into a EXTRACT_SUBREG/INSERT_SUBREG, with the required COPY_TO_REGCLASS. These get better optimized into a simple lane move by the rest of the backend. Differential Revision: https://reviews.llvm.org/D95428	2021-02-02 15:15:04 +00:00
David Green	c722575633	[ARM] Select VINS from vector inserts This patch adds tablegen patterns for pairs of i16/f16 insert/extracts. If we are inserting into two adjacent vector lanes (0 and 1 for example), we can use either a vmov;vins or vmovx;vins to insert the pair together, avoiding a round-trip from GRP registers. This is quite a large patterns with a number of EXTRACT_SUBREG/INSERT_SUBREG/ COPY_TO_REGCLASS nodes, but hopefully as most of those become copies all that will be cleaned up by further optimizations. The VINS pattern was also adjusted to allow it to represent that it is inserting into the top half of an existing register. Differential Revision: https://reviews.llvm.org/D95381	2021-02-02 13:50:02 +00:00
David Green	48230355e9	[ARM] Remove DLS lr, lr A DLS lr, lr instruction only moves lr to itself. It need not be emitted on it's own to save a instruction in the loop preheader. Differential Revision: https://reviews.llvm.org/D78916	2021-02-02 11:09:31 +00:00
David Green	5b2626ea87	[ARM] Flatten identity shuffles through vqdmulh nodes Given a shuffle(vqdmulh(shuffle, shuffle), we can flatter the shuffles out if they become an identity mask. This can come up during lane interleaving, when we do that better. Differential Revision: https://reviews.llvm.org/D94034	2021-02-01 19:14:20 +00:00
David Green	5805521207	[ARM] Simplify VMOVRRD from extracts of buildvectors Under the softfp calling convention, we are often left with VMOVRRD(extract(bitcast(build_vector(a, b, c, d)))) for the return value of the function. These can be simplified to a,b or c,d directly, depending on the value of the extract. Big endian is a little different because the bitcast switches the lanes around, meaning we end up with b,a or d,c. Differential Revision: https://reviews.llvm.org/D94989	2021-02-01 16:09:25 +00:00
David Green	ad12e6ee95	[ARM] Turn sext_inreg(VGetLaneu) into VGetLaneu This adds a DAG combine for converting sext_inreg of VGetLaneu into VGetLanes, providing the types match correctly. Differential Revision: https://reviews.llvm.org/D95073	2021-02-01 11:10:35 +00:00
David Green	6ab792b68d	[ARM] Simplify extract of VMOVDRR Under SoftFP calling conventions, we can be left with extract(bitcast(BUILD_VECTOR(VMOVDRR(a, b), ..))) patterns that can simplify to a or b, depending on the extract lane. Differential Revision: https://reviews.llvm.org/D94990	2021-02-01 10:24:57 +00:00
Kazu Hirata	7925aa091d	[llvm] Populate SmallVector at construction time (NFC)	2021-01-28 22:21:14 -08:00
Christudasan Devadasan	892e4567e1	Support a list of CostPerUse values This patch allows targets to define multiple cost values for each register so that the cost model can be more flexible and better used during the register allocation as per the target requirements. For AMDGPU the VGPR allocation will be more efficient if the register cost can be associated dynamically based on the calling convention. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D86836	2021-01-29 10:14:52 +05:30
David Green	40f46cb0e4	[ARM] Add alignment checks for MVE VLDn The MVE VLD2/4 and VST2/4 instructions require the pointer to be aligned to at least the size of the element type. This adds a check for that into the ARM lowerInterleavedStore and lowerInterleavedLoad functions, not creating the intrinsics if they are invalid for the alignment of the load/store. Unfortunately this is one of those bug fixes that does effect some useful codegen, as we were able to sometimes do some nice lowering of q15 types. But they can cause problem with low aligned pointers. Differential Revision: https://reviews.llvm.org/D95319	2021-01-28 13:10:08 +00:00
Kazu Hirata	f890fd5f91	[llvm] Use llvm::is_sorted (NFC)	2021-01-27 23:25:39 -08:00
David Green	9e2768a3d9	[ARM] Add neon FP16 scalar_to_vector patterns. This adds some simple fp16 scalar_to_vector patterns, preventing a selection failure if this came up. Differential Revision: https://reviews.llvm.org/D95427	2021-01-27 09:59:15 +00:00
Zhuojia Shen	8cef45517e	[ARM] Fix STRT/STRHT/STRBT input/output operands. STRT, STRHT, and STRBT are store instructions and their source register $Rt should be treated as an input operand instead of an output operand. This should fix things (e.g., liveness tracking in LivePhysRegs) if these instructions were used in CodeGen. Differential Revision: https://reviews.llvm.org/D95074	2021-01-26 14:00:58 -08:00
Adhemerval Zanella	dad55c2218	[ARM] [ELF] Fix ARMMaterializeGV for Indirect calls Recent shouldAssumeDSOLocal changes (introduced by `961f31d8ad`) do not take in consideration the relocation model anymore. The ARM fast-isel pass uses the function return to set whether a global symbol is loaded indirectly or not, and without the expected information llvm now generates an extra load for following code: ``` $ cat test.ll @__asan_option_detect_stack_use_after_return = external global i32 define dso_local i32 @main(i32 %argc, i8** %argv) #0 { entry: %0 = load i32, i32* @__asan_option_detect_stack_use_after_return, align 4 %1 = icmp ne i32 %0, 0 br i1 %1, label %2, label %3 2: ret i32 0 3: ret i32 1 } attributes #0 = { noinline optnone } $ lcc test.ll -o - [...] main: .fnstart [...] movw r0, :lower16:__asan_option_detect_stack_use_after_return movt r0, :upper16:__asan_option_detect_stack_use_after_return ldr r0, [r0] ldr r0, [r0] cmp r0, #0 [...] ``` And without 'optnone' it produces: ``` [...] main: .fnstart [...] movw r0, :lower16:__asan_option_detect_stack_use_after_return movt r0, :upper16:__asan_option_detect_stack_use_after_return ldr r0, [r0] clz r0, r0 lsr r0, r0, #5 bx lr [...] ``` This triggered a lot of invalid memory access in sanitizers for arm-linux-gnueabihf. I checked this patch both a stage1 built with gcc and a stage2 bootstrap and it fixes all the Linux sanitizers issues. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D95379	2021-01-26 15:57:55 -03:00
Kazu Hirata	16baad8f4e	[llvm] Use pop_back_val (NFC)	2021-01-24 12:18:57 -08:00
Kazu Hirata	054444177b	[Target] Use llvm::append_range (NFC)	2021-01-24 12:18:56 -08:00
Kazu Hirata	e4847a7fcf	Revert "[Target] Use llvm::append_range (NFC)" This reverts commit `cc7a238286`. The X86WinEHState.cpp hunk seems to break certain builds.	2021-01-23 11:25:27 -08:00
Kazu Hirata	cc7a238286	[Target] Use llvm::append_range (NFC)	2021-01-23 10:56:31 -08:00
Stanislav Mekhanoshin	607bec0bb9	Change materializeFrameBaseRegister() to return register The only caller of this function is in the LocalStackSlotAllocation and it creates base register of class returned by the target's getPointerRegClass(). AMDGPU wants to use a different reg class here so let materializeFrameBaseRegister to just create and return whatever it wants. Differential Revision: https://reviews.llvm.org/D95268	2021-01-22 15:51:06 -08:00
David Green	af03324984	[ARM] Disable sign extended SSAT pattern recognition. I may have given bad advice, and skipping sext_inreg when matching SSAT patterns is not valid on it's own. It at least needs to sext_inreg the input again, but as far as I can tell is still only valid based on demanded bits. For the moment disable that part of the combine, hopefully reimplementing it in the future more correctly.	2021-01-22 14:07:48 +00:00
David Green	9ae73cdbc1	[ARM] Adjust isSaturatingConditional to return a new SDValue. NFC This replaces the isSaturatingConditional function with LowerSaturatingConditional that directly returns a new SSAT or USAT SDValue, instead of returning true and the components of it.	2021-01-22 11:11:36 +00:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to works OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempt to improve the costmodelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loop under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
David Green	dfac521da1	[ARM] Fix vector saddsat costs. It turns out the vectorizer calls the getIntrinsicInstrCost functions with a scalar return type and vector VF. This updates the costmodel to handle that, still producing the correct vector costs. A vectorizer test is added to show it vectorizing at the correct factor again.	2021-01-21 15:30:39 +00:00
David Green	6a563eef13	[ARM] Expand vXi1 VSELECT's We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as expanded to turn them into a series of logical operations. Differential Revision: https://reviews.llvm.org/D94946	2021-01-19 17:56:50 +00:00
David Green	f373b30923	[ARM] Add MVE add.sat costs This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs, based on when the instruction is legal. With smaller than legal types that are promoted we generate shr(qadd(shl, shl)), so the cost is 4 appropriately. Differential Revision: https://reviews.llvm.org/D94958	2021-01-19 15:38:46 +00:00
Yvan Roux	244ad228f3	[ARM][MachineOutliner] Add stack fixup feature This patch handles cases where we have to save/restore the link register into the stack and and load/store instruction which use the stack are part of the outlined region. It checks that there will be no overflow introduced by the new offset and fixup these instructions accordingly. Differential Revision: https://reviews.llvm.org/D92934	2021-01-19 10:59:09 +01:00
David Green	e7dc083a41	[ARM] Don't handle low overhead branches in AnalyzeBranch It turns our that the BranchFolder and IfCvt does not like unanalyzable branches that fall-through. This means that removing the unconditional branches from the end of tail predicated instruction can run into asserts and verifier issues. This effectively reverts `372eb2bbb6`, but adds handling to t2DoLoopEndDec which are not branches, so can be safely skipped.	2021-01-18 17:16:07 +00:00
David Green	1454724215	[ARM] Align blocks that are not fallthough targets If the previous block in a function does not fallthough, adding nop's to align it will never be executed. This means we can freely (except for codesize) align more branches. This happens in constantislandspass (as it cannot happen later) and only happens at aggressive optimization levels as it does increase codesize. Differential Revision: https://reviews.llvm.org/D94394	2021-01-16 22:19:35 +00:00
David Green	372eb2bbb6	[ARM] Add low overhead loops terminators to AnalyzeBranch This treats low overhead loop branches the same as jump tables and indirect branches in analyzeBranch - they cannot be analyzed but the direct branches on the end of the block may be removed. This helps remove the unnecessary branches earlier, which can help produce better codegen (and change block layout in a number of cases). Differential Revision: https://reviews.llvm.org/D94392	2021-01-16 18:30:21 +00:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
David Green	f5abf0bd48	[ARM] Tail predication with constant loop bounds The TripCount for a predicated vector loop body will be ceil(ElementCount/Width). This alters the conversion of an active.lane.mask to a VCPT intrinsics to match. Differential Revision: https://reviews.llvm.org/D94608	2021-01-15 18:17:31 +00:00
Sam Tebbs	1a497ae9b8	[ARM][Block placement] Check the predecessor exists before processing it Not all machine loops will have a predecessor. so the pass needs to check it before continuing. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D94780	2021-01-15 15:45:13 +00:00
Sam Tebbs	5e4480b6c0	[ARM] Don't run the block placement pass at O0 The block placement pass shouldn't run unless optimisations are enabled. Differential Revision: https://reviews.llvm.org/D94691	2021-01-15 13:59:29 +00:00
Oliver Stannard	3676ef1053	[ARM][GISel] Treat calls as variadic even if only fixed arguments provided For the ARM hard-float calling convention, calls to variadic functions need to be treated diffrently, even if only the fixed arguments are provided. This fixes GCC-C-execute-pr68390 in the test-suite, which is failing on the ARM GlobaISel bot.	2021-01-15 09:37:01 +00:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Sam Tebbs	60fda8ebb6	[ARM] Add a pass that re-arranges blocks when there is a backwards WLS branch Blocks can be laid out such that a t2WhileLoopStart branches backwards. This is forbidden by the architecture and so it fails to be converted into a low-overhead loop. This new pass checks for these cases and moves the target block, fixing any fall-through that would then be broken. Differential Revision: https://reviews.llvm.org/D92385	2021-01-13 17:23:00 +00:00
David Green	c29ca8551a	[ARM] Update isVMOVNOriginalMask to handle single input shuffle vectors The isVMOVNOriginalMask was previously only checking for two input shuffles that could be better expanded as vmovn nodes. This expands that to single input shuffles that will later be legalized to multiple vectors. Differential Revision: https://reviews.llvm.org/D94189	2021-01-13 08:51:28 +00:00
Stephan Herhut	2e17d9c0ee	[ARM] Add uses for locals introduced for debug messages. NFC. This adds uses for locals introduced for new debug messages for the load store optimizer. Those locals are only used on debug statements and otherwise create unused variable warnings. Differential Revision: https://reviews.llvm.org/D94398	2021-01-11 14:27:28 +01:00
David Green	8165a03420	[ARM] Add debug messages for the load store optimizer. NFC	2021-01-11 09:24:28 +00:00
David Green	dcefcd51e0	[ARM] Update trunc costs We did not have specific costs for larger than legal truncates that were not otherwise cheap (where they were next to stores, for example). As MVE does not have a dedicated instruction for them (and we do not use loads/stores yet), they should be expensive as they get expanded to a series of lane moves. Differential Revision: https://reviews.llvm.org/D94260	2021-01-11 08:59:28 +00:00
David Green	024af42c60	[ARM] Custom lower i1 vector truncates The ISel patterns we have for truncating to i1's under MVE do not seem to be correct. Instead custom lower to icmp(ne, and(x, 1), 0). Differential Revision: https://reviews.llvm.org/D94226	2021-01-08 18:21:00 +00:00
Kazu Hirata	b934160aaa	[Target] Use llvm::find_if (NFC)	2021-01-07 20:29:36 -08:00
David Green	63dce70b79	[ARM] Handle any extend whilst lowering addw/addl/subw/subl Same as `a9b6440edd`, use zanyext to treat any_extends as zero extends during lowering to create addw/addl/subw/subl nodes. Differential Revision: https://reviews.llvm.org/D93835	2021-01-06 11:26:39 +00:00
David Green	ddb82fc76c	[ARM] Handle any extend whilst lowering mull Similar to `78d8a821e2` but for ARM, this handles any_extend whilst creating MULL nodes, treating them as zextends. Differential Revision: https://reviews.llvm.org/D93834	2021-01-06 10:51:12 +00:00
Christudasan Devadasan	d68458bd56	[GlobalISel] Base implementation for sret demotion. If the return values can't be lowered to registers SelectionDAG performs the sret demotion. This patch contains the basic implementation for the same in the GlobalISel pipeline. Furthermore, targets should bring relevant changes during lowerFormalArguments, lowerReturn and lowerCall to make use of this feature. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92953	2021-01-06 10:30:50 +05:30
QingShan Zhang	2962f1149c	[NFC] Add the getSizeInBytes() interface for MachineConstantPoolValue Current implementation assumes that, each MachineConstantPoolValue takes up sizeof(MachineConstantPoolValue::Ty) bytes. For PowerPC, we want to lump all the constants with the same type as one MachineConstantPoolValue to save the cost that calculate the TOC entry for each const. So, we need to extend the MachineConstantPoolValue that break this assumption. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89108	2021-01-05 03:22:45 +00:00
Kazu Hirata	eb198f4c3c	[llvm] Use llvm::any_of (NFC)	2021-01-04 11:42:47 -08:00
David Green	901cc9b6f3	[ARM] Extend lowering for i64 reductions The lowering of a <4 x i16> or <4 x i8> vecreduce.add into an i64 would previously be expanded, due to the i64 not being legal. This patch adjusts our reduction matchers, making it produce a VADDLV(sext A to v4i32) instead. Differential Revision: https://reviews.llvm.org/D93622	2021-01-04 12:44:43 +00:00
Kazu Hirata	0e219b6443	[Target] Construct SmallVector with iterator ranges (NFC)	2021-01-03 09:57:45 -08:00
Kazu Hirata	331c28f60d	[ARM] Declare Op within an if statement (NFC)	2020-12-30 17:45:36 -08:00
Mark Murray	5abfeccf10	[ARM][AArch64] Add Cortex-A78C Support for Clang and LLVM This patch upstreams support for the Armv8-a Cortex-A78C processor for AArch64 and ARM. In detail: Adding cortex-a78c as cpu option for aarch64 and arm targets in clang Adding Cortex-A78C CPU name and ProcessorModel in llvm Details of the CPU can be found here: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78c	2020-12-29 10:18:59 +00:00
David Penry	a9f14cdc62	[ARM] Add bank conflict hazarding Adds ARMBankConflictHazardRecognizer. This hazard recognizer looks for a few situations where the same base pointer is used and then checks whether the offsets lead to a bank conflict. Two parameters are also added to permit overriding of the target assumptions: arm-data-bank-mask=<int> - Mask of bits which are to be checked for conflicts. If all these bits are equal in the offsets, there is a conflict. arm-assume-itcm-bankconflict=<bool> - Assume that there will be bank conflicts on any loads to a constant pool. This hazard recognizer is enabled for Cortex-M7, where the Technical Reference Manual states that there are two DTCM banks banked using bit 2 and one ITCM bank. Differential Revision: https://reviews.llvm.org/D93054	2020-12-23 14:00:59 +00:00
Fangrui Song	d9a0c40bce	[MC] Split MCContext::createTempSymbol, default AlwaysAddSuffix to true, and add comments CanBeUnnamed is rarely false. Splitting to a createNamedTempSymbol makes the intention clearer and matches the direction of reverted r240130 (to drop the unneeded parameters). No behavior change.	2020-12-21 14:04:13 -08:00
Fangrui Song	d33abc337c	Migrate MCContext::createTempSymbol call sites to AlwaysAddSuffix=true Most call sites set AlwaysAddSuffix to true. The two use cases do not really need false and can be more consistent with other temporary symbol usage.	2020-12-21 14:04:13 -08:00
Fangrui Song	7b9890e17e	[MC][ELF] Remove unneeded MCSymbol::setExternal calls ELF code uses symbol bindings and does not call isExternal().	2020-12-20 21:26:36 -08:00
Kazu Hirata	966f1431de	[Target] Use llvm::erase_if (NFC)	2020-12-20 17:43:22 -08:00
Kristof Beyls	df8ed39283	[ARM] harden-sls-blr: avoid r12 and lr in indirect calls. As a linker is allowed to clobber r12 on function calls, the code transformation that hardens indirect calls is not correct in case a linker does so. Similarly, the transformation is not correct when register lr is used. This patch makes sure that r12 or lr are not used for indirect calls when harden-sls-blr is enabled. Differential Revision: https://reviews.llvm.org/D92469	2020-12-19 12:39:59 +00:00
Kristof Beyls	a4c1f5160e	[ARM] Harden indirect calls against SLS To make sure that no barrier gets placed on the architectural execution path, each indirect call calling the function in register rN, it gets transformed to a direct call to __llvm_slsblr_thunk_mode_rN. mode is either arm or thumb, depending on the mode of where the indirect call happens. The llvm_slsblr_thunk_mode_rN thunk contains: bx rN <speculation barrier> Therefore, the indirect call gets split into 2; one direct call and one indirect jump. This transformation results in not inserting a speculation barrier on the architectural execution path. The mitigation is off by default and can be enabled by the harden-sls-blr subtarget feature. As a linker is allowed to clobber r12 on function calls, the above code transformation is not correct in case a linker does so. Similarly, the transformation is not correct when register lr is used. Avoiding r12/lr being used is done in a follow-on patch to make reviewing this code easier. Differential Revision: https://reviews.llvm.org/D92468	2020-12-19 12:33:42 +00:00
Kristof Beyls	320fd3314e	[ARM] Implement harden-sls-retbr for Thumb mode The only non-trivial consideration in this patch is that the formation of TBB/TBH instructions, which is done in the constant island pass, does not understand the speculation barriers inserted by the SLSHardening pass. As such, when harden-sls-retbr is enabled for a function, the formation of TBB/TBH instructions in the constant island pass is disabled. Differential Revision: https://reviews.llvm.org/D92396	2020-12-19 12:32:47 +00:00
Kristof Beyls	195f44278c	[ARM] Implement harden-sls-retbr for ARM mode Some processors may speculatively execute the instructions immediately following indirect control flow, such as returns, indirect jumps and indirect function calls. To avoid a potential miss-speculatively executed gadget after these instructions leaking secrets through side channels, this pass places a speculation barrier immediately after every indirect control flow where control flow doesn't return to the next instruction, such as returns and indirect jumps, but not indirect function calls. Hardening of indirect function calls will be done in a later, independent patch. This patch is implementing the same functionality as the AArch64 counter part implemented in https://reviews.llvm.org/D81400. For AArch64, returns and indirect jumps only occur on RET and BR instructions and hence the function attribute to control the hardening is called "harden-sls-retbr" there. On AArch32, there is a much wider variety of instructions that can trigger an indirect unconditional control flow change. I've decided to stick with the name "harden-sls-retbr" as introduced for the corresponding AArch64 mitigation. This patch implements this for ARM mode. A future patch will extend this to also support Thumb mode. The inserted barriers are never on the correct, architectural execution path, and therefore performance overhead of this is expected to be low. To ensure these barriers are never on an architecturally executed path, when the harden-sls-retbr function attribute is present, indirect control flow is never conditionalized/predicated. On targets that implement that Armv8.0-SB Speculation Barrier extension, a single SB instruction is emitted that acts as a speculation barrier. On other targets, a DSB SYS followed by a ISB is emitted to act as a speculation barrier. These speculation barriers are implemented as pseudo instructions to avoid later passes to analyze them and potentially remove them. The mitigation is off by default and can be enabled by the harden-sls-retbr subtarget feature. Differential Revision: https://reviews.llvm.org/D92395	2020-12-19 11:42:39 +00:00
David Green	e1c1adf9dc	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: vmov q0[2], q0[0], r2, r0 vmov q0[3], q0[1], r3, r1 They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector inserts elements. Because the insert_elements are know to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: 3 2 1 0 -> vmovqrr 31; vmovqrr 20 3 2 1 -> vmovqrr 31; vmov 2 3 1 -> vmovqrr 31 2 1 0 -> vmovqrr 20; vmov 1 2 0 -> vmovqrr 20 With the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. This is a recommit of `6cc3d80a84` after fixing the backward instruction definitions.	2020-12-18 16:13:08 +00:00
David Green	6e913e4451	Revert "[ARM] Match dual lane vmovs from insert_vector_elt" This one needed more testing.	2020-12-18 13:33:40 +00:00
Yvan Roux	923ca0b411	[ARM][MachineOutliner] Fix costs model. Fix candidates calls costs models allocation and prepare stack fixups handling. Differential Revision: https://reviews.llvm.org/D92933	2020-12-17 16:08:23 +01:00
Lucas Prates	c5046ebdf6	[ARM] Adding v8.7-A command-line support for the ARM target This extends the command-line support for the 'armv8.7-a' architecture name to the ARM target. Based on a patch written by Momchil Velikov. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D93231	2020-12-17 13:48:54 +00:00
Lucas Prates	42b92b31b8	[ARM][AArch64] Adding basic support for the v8.7-A architecture This introduces support for the v8.7-A architecture through a new subtarget feature called "v8.7a". It adds two new "WFET" and "WFIT" instructions, the nXS limited-TLB-maintenance qualifier for DSB and TLBI instructions, a new CPU id register, ID_AA64ISAR2_EL1, and the new HCRX_EL2 system register. Based on patches written by Simon Tatham and Victor Campos. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D91772	2020-12-17 13:45:08 +00:00
David Green	6cc3d80a84	[ARM] Match dual lane vmovs from insert_vector_elt MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of: vmov q0[2], q0[0], r2, r0 vmov q0[3], q0[1], r3, r1 They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2. This patch adds some tablegen patterns for them, selecting from vector inserts elements. Because the insert_elements are know to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are: 3 2 1 0 -> vmovqrr 31; vmovqrr 20 3 2 1 -> vmovqrr 31; vmov 2 3 1 -> vmovqrr 31 2 1 0 -> vmovqrr 20; vmov 1 2 0 -> vmovqrr 20 With the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is nothing else that will attempt to match an insert_vector_elt node. Differential Revision: https://reviews.llvm.org/D92553	2020-12-15 15:58:52 +00:00
David Green	1de3e7fd62	[ARM] Improve handling of empty VPT blocks in tail predicated loops A vpt block that just contains either VPST;VCTP or VPT;VCTP, once the VCTP is removed will become invalid. This fixed the first by removing the now empty block and bails out for the second, as we have no simple way of converting a VPT to a VCMP. Differential Revision: https://reviews.llvm.org/D92369	2020-12-14 11:17:01 +00:00
David Green	a4823377fd	[ARM] Add basic masked load/store costs This adds some basic MVE masked load/store costs, notably changing the cost of legal loads/stores to the MVECostFactor and the cost of scalarized instructions to 8*NumElts. Differential Revision: https://reviews.llvm.org/D86538	2020-12-12 15:26:32 +00:00
David Green	3f571be1c0	[ARM] Make t2DoLoopStartTP a terminator Although this was something that I was hoping we would not have to do, this patch makes t2DoLoopStartTP a terminator in order to keep it at the end of it's block, so not allowing extra MVE instruction between it and the end. With t2DoLoopStartTP's also starting tail predication regions, it also marks them as having side effects. The t2DoLoopStart is still not a terminator, giving it the extra scheduling freedom that can be helpful, but now that we have a TP version they can be treated differently. Differential Revision: https://reviews.llvm.org/D91887	2020-12-11 09:23:57 +00:00
Haojian Wu	2fc4afda0f	Fix a -Wunused-variable warning in release build.	2020-12-10 14:52:45 +01:00
David Green	0447f3508f	[ARM][RegAlloc] Add t2LoopEndDec We currently have problems with the way that low overhead loops are specified, with LR being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before registry allocation to make sure this does not fail. Unfortunately this instruction is a terminator that produces a value (and also branches - it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (needing to put a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else like calls that would break it's ability to use LR. For that, this adds a isUnspillableTerminator to opt in the new behaviour. There is a chance that this could cause problems, and so I have added an escape option incase. But I have not seen any problems in the testing that I've tried, and not reverting Low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions. This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-ra to create them. Differential Revision: https://reviews.llvm.org/D91358	2020-12-10 12:14:23 +00:00
David Green	b0ce615b2d	[ARM] Remove copies from low overhead phi inductions. The phi created in a low overhead loop gets created with a default register class it seems. There are then copied inserted between the low overhead loop pseudo instructions (which produce/consume GPRlr instructions) and the phi holding the induction. This patch removes those as a step towards attempting to make t2LoopDec and t2LoopEnd a single instruction, and appears useful in it's own right as shown in the tests. Differential Revision: https://reviews.llvm.org/D91267	2020-12-10 10:30:31 +00:00
David Green	384383e15c	[ARM] Common inverse constant predicates to VPNOT This scans through blocks looking for constants used as predicates in MVE instructions. When two constants are found which are the inverse of one another, the second can be replaced by a VPNOT of the first, potentially allowing that not to be folded away into an else predicate of a vpt block. Differential Revision: https://reviews.llvm.org/D92470	2020-12-09 07:56:44 +00:00
David Green	03e675fd12	[ARM] Turn pred_cast(xor(x, -1)) into xor(pred_cast(x), -1) This folds a not (an xor -1) though a predicate_cast, so that it can be turned into a VPNOT and potentially be folded away as an else predicate inside a VPT block. Differential Revision: https://reviews.llvm.org/D92235	2020-12-08 15:22:46 +00:00
David Green	91fb9eac0b	[ARM] Remove dead instructions before creating VPT block bundles We remove VPNOT instructions in VPT blocks as we create them, turning them into else predicates. We don't remove the dead instructions until after the block has been created though. Because the VPNOT will have killed the vpr register it used, this makes finalizeBundle add internal flags to the vpr uses of any instructions after the VPNOT. These incorrect flags can then confuse what is alive and what is not, leading to machine verifier problems. This patch removes them earlier instead, before the bundle is finalized so that kill flags remain valid. Differential Revision: https://reviews.llvm.org/D92227	2020-12-08 14:05:07 +00:00
David Green	d9bf6245bf	[ARM] Revert low overhead loops with calls before registry allocation. This adds code to revert low overhead loops with calls in them before register allocation. Ideally we would not create low overhead loops with calls in them to begin with, but that can be difficult to always get correct. If we want to try and glue together t2LoopDec and t2LoopEnd into a single instruction, we need to ensure that no instructions use LR in the loop. (Technically the final code can be better too, as it doesn't need to use the same registers but that has not been optimized for here, as reverting loops with calls is expected to be very rare). It also adds a MVETailPredUtils.h header to share the revert code between different passes, and provides a place to expand upon, with RevertLoopWithCall becoming a place to perform other low overhead loop alterations like removing copies or combining LoopDec and End into a single instruction. Differential Revision: https://reviews.llvm.org/D91273	2020-12-07 15:44:40 +00:00
Evgeny Leviant	993eaf2d69	Recommit [TableGen][SchedModels] Fix read/write variant substitution Original commit rG112b3cb6ba49 introduced non-determinism in subtarget generator due to iteration over DenseMap. New patch fixes this changing ProcModelMapTy from DenseMap to std::map.	2020-12-04 21:50:34 +03:00
Fangrui Song	86fa896363	Revert D90844 "[TableGen][SchedModels] Fix read/write variant substitution" This reverts commit `112b3cb6ba`. D90844 made lib/Target/AArch64/AArch64GenSubtargetInfo.inc non-deterministic.	2020-12-03 14:24:29 -08:00
dfukalov	2ce38b3f03	[NFC] Reduce include files dependency. 1. Removed #include "...AliasAnalysis.h" in other headers and modules. 2. Cleaned up includes in AliasAnalysis.h. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92489	2020-12-03 18:25:05 +03:00
Mircea Trofin	bab72dd5d5	[NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister. Typing the API appropriately. Differential Revision: https://reviews.llvm.org/D92341	2020-12-02 15:46:38 -08:00
David Green	eedf0ed63e	[ARM] Mark select and selectcc of MVE vector operations as expand. We already expand select and select_cc in codegenprepare, but they can still be generated under some situations. Explicitly mark them as expand to ensure they are not produced, leading to a failure to select the nodes. Differential Revision: https://reviews.llvm.org/D92373	2020-12-01 15:05:55 +00:00
David Green	7923d71b4a	[ARM] PREDICATE_CAST demanded bits The PREDICATE_CAST node is used to model moves between MVE predicate registers and gpr's, and eventually become a VMSR p0, rn. When moving to a predicate only the bottom 16 bits of the sources register are demanded. This adds a simple fold for that, allowing it to potentially remove instructions like uxth. Differential Revision: https://reviews.llvm.org/D92213	2020-12-01 10:32:24 +00:00
Evgeny Leviant	112b3cb6ba	[TableGen][SchedModels] Fix read/write variant substitution Patch fixes multiple issues related to expansion of variant sched reads and writes. Differential revision: https://reviews.llvm.org/D90844	2020-11-30 11:55:55 +03:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
David Green	0e49a40d75	[ARM] Cleanup for the MVETailPrediction pass This strips out a lot of the code that should no longer be needed from the MVETailPredictionPass, leaving the important part - find active lane mask instructions and convert them to VCTP operations. Differential Revision: https://reviews.llvm.org/D91866	2020-11-26 15:10:44 +00:00
Mark Murray	2b6691894a	[ARM][AArch64] Adding Neoverse N2 CPU support Add support for the Neoverse N2 CPU to the ARM and AArch64 backends. Differential Revision: https://reviews.llvm.org/D91695	2020-11-25 11:42:54 +00:00
Evgeny Leviant	a6a6d11c7b	[MC][ARM] Fix number of operands of tMOVSr Differential revision: https://reviews.llvm.org/D92029	2020-11-24 18:13:10 +03:00
Evgeny Leviant	50bd686695	Add support for branch forms of ALU instructions to Cortex-A57 model Patch fixes scheduling of ALU instructions which modify pc register. Patch also fixes computation of mutually exclusive predicates for sequences of variants to be properly expanded Differential revision: https://reviews.llvm.org/D91266	2020-11-24 11:43:51 +03:00
Craig Topper	4252f7773a	[SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets. X86 was already specially marking fma as commutable which allowed tablegen to autogenerate commuted patterns. This moves it to the target independent definition and fix up the targets to remove now unneeded patterns. Unfortunately, the tests change because the commuted version of the patterns are generating operands in a different than the explicit patterns. Differential Revision: https://reviews.llvm.org/D91842	2020-11-23 10:09:20 -08:00
David Green	c8c3a411c5	[ARM] Ensure MVE_TwoOpPattern is used inside Predicate's	2020-11-22 21:38:00 +00:00
David Green	f08c37da7b	[ARM] Disable WLSTP loops This checks to see if the loop will likely become a tail predicated loop and disables wls loop generation if so, as the likelihood for reverting is currently too high. These should be fairly rare situations anyway due to the way iterations and element counts are used during lowering. Just not trying can alter how SCEV's are materialized however, leading to different codegen. It also adds a option to disable all while low overhead loops, for debugging. Differential Revision: https://reviews.llvm.org/D91663	2020-11-20 13:30:44 +00:00
Sam Tebbs	8ecb015ed5	[ARM][LowOverheadLoops] Convert intermediate vpr use assertion to condition This converts the intermediate VPR use assertion to a condition in the if-statement to protect against assertion failures in case behaviuour is changed. This is a follow-up to https://reviews.llvm.org/D90935 and implements the post-approval comments. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91790	2020-11-19 17:15:45 +00:00
David Green	006b3bdedd	[ARM] Deliberately prevent inline asm in low overhead loops. NFC This was already something that was handled by one of the "else" branches in maybeLoweredToCall, so this patch is an NFC but makes it explicit and adds a test. We may in the future want to support this under certain situations but for the moment just don't try and create low overhead loops with inline asm in them. Differential Revision: https://reviews.llvm.org/D91257	2020-11-19 13:28:21 +00:00
Duncan P. N. Exon Smith	5abf76fbe3	ADT: Add assertions to SmallVector::insert, etc., for reference invalidation `2c196bbc6b` asserted that `SmallVector::push_back` doesn't invalidate the parameter when it needs to grow. Do the same for `resize`, `append`, `assign`, `insert`, and `emplace_back`. Differential Revision: https://reviews.llvm.org/D91744	2020-11-18 17:36:28 -08:00
Mikhail Goncharov	f45c052c9e	Fix unused variables in release build Differential Revision: https://reviews.llvm.org/D91705	2020-11-18 15:18:31 +01:00
Sam Tebbs	da2e4728c7	[ARM][LowOverheadLoops] Merge VCMP and VPST across VPT blocks This patch adds support for combining a VPST with a dangling VCMP from a previous VPT block. Differential Revision: https://reviews.llvm.org/D90935	2020-11-18 12:54:16 +00:00
Florian Hahn	b2f4c5fddc	[AsmWriter] Factor out mnemonic generation to accessible getMnemonic. This patch factors out the part of printInstruction that gets the mnemonic string for a given MCInst. This is intended to be used subsequently for the instruction-mix remarks to display the final mnemonic (D90040). Unfortunately making `getMnemonic` available to the AsmPrinter seems to require making it virtual. Not sure if there's a way around that with the current layering of the AsmPrinters. Reviewed By: Paul-C-Anagnostopoulos Differential Revision: https://reviews.llvm.org/D90039	2020-11-17 09:47:38 +00:00
David Penry	48b43c9d4f	[ARM] Cortex-M7 schedule This patch adds the SchedMachineModel for Cortex-M7. It also adds test cases for the scheduling information. Details of the pipeline and descriptions are in comments in file ARMScheduleM7.td included in this patch. Differential Revision: https://reviews.llvm.org/D91355	2020-11-16 10:16:07 +00:00
serge-sans-paille	9218ff50f9	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_componentlibrary` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These function store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
David Green	11dee2eae2	[ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP Of course there was something missing, in this case a check that the def of the count register we are adding to a t2DoLoopStartTP would dominate the insertion point. In the future, when we remove some of these COPY's in between, the t2DoLoopStartTP will always become the last instruction in the block, preventing this from happening. In the meantime we need to check they are created in a sensible order. Differential Revision: https://reviews.llvm.org/D91287	2020-11-12 13:47:46 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Sam Parker	898a81dfc5	[NFC][ARM] Replace lambda with any_of	2020-11-11 10:02:55 +00:00
Pirama Arumuga Nainar	8262e94a6d	[ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt. Previously we used setRegClass to rgpr, which may expand the register domain if the result was already in a constrained class (tcgpr in the above PR). Differential Revision: https://reviews.llvm.org/D91192	2020-11-10 13:38:11 -08:00
Benjamin Kramer	92c61a045f	[ARM] Silence unused variable warning in Release builds. NFC.	2020-11-10 20:35:28 +01:00
David Green	08d1c2d470	[ARM] Introduce t2DoLoopStartTP This introduces a new pseudo instruction, almost identical to a t2DoLoopStart but taking 2 parameters - the original loop iteration count needed for a low overhead loop, plus the VCTP element count needed for a DLSTP instruction setting up a tail predicated loop. The idea is that the instruction holds both values and the backend ARMLowOverheadLoops pass can pick between the two, depending on whether it creates a tail predicated loop or falls back to a low overhead loop. To do that there needs to be something that converts a t2DoLoopStart to a t2DoLoopStartTP, for which this patch repurposes the MVEVPTOptimisationsPass as a "tail predication and vpt optimisation" pass. The extra operand for the t2DoLoopStartTP is chosen based on the operands of VCTP's in the loop, and the instruction is moved as late in the block as possible to attempt to increase the likelihood of making tail predicated loops. Differential Revision: https://reviews.llvm.org/D90591	2020-11-10 18:08:12 +00:00
David Green	dbe1bf63aa	[ARM] Cleanup for ARMLowOverheadLoops. NFC	2020-11-10 17:28:07 +00:00
David Green	c7e275388e	[ARM] Don't aggressively unroll vector remainder loops We already do not unroll loops with vector instructions under MVE, but that does not include the remainder loops that the vectorizer produces. These remainder loops will be rarely executed and are not worth unrolling, as the trip count is likely to be low if they get executed at all. Luckily they get llvm.loop.isvectorized to make recognizing them simpler. We have wanted to do this for a while but hit issues with low overhead loops being reverted due to difficult registry allocation. With recent changes that seems to be less of an issue now. Differential Revision: https://reviews.llvm.org/D90055	2020-11-10 17:01:31 +00:00
David Green	73a6cd4b6b	[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it easier to move if needed (as the input is the same as the output), or potentially remove entirely. The hint is added after others (from COPY's etc) which still take precedence. It needed to find a place to add the hint, which currently uses the post isel custom inserter. Differential Revision: https://reviews.llvm.org/D89883	2020-11-10 16:28:57 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new instrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on it's own might cause more trouble that it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
David Green	c8cd7e2bbf	[ARM] Remove MI variable aliasing. NFC This was accidentally using the same name for two different variables in the same line. Whilst it seems to work for some compilers, others have trouble and it is probably not a fantastic idea.	2020-11-09 18:18:43 +00:00
Momchil Velikov	937ab6a785	[ARM][MachineOutliner] Emit more CFI instructions This patch make the outliner emit CFI instructions in a few more places: * after LR is restored, but before the return in an outlined function * around save/restore of LR to/from a register at calls to outlined functions * around save/restore of LR to/from the stack at calls to outlined functions The latter two only when the function does NOT spill LR. If the function spills LR, then outliner generated saves/restores around calls are not considered interesting for unwinding the frame. Differential Revision: https://reviews.llvm.org/D89483	2020-11-09 15:26:18 +00:00
Sam Tebbs	40a3f7e48d	[ARM][LowOverheadLoops] Merge a VCMP and the new VPST into a VPT There were cases where a VCMP and a VPST were merged even if the VCMP didn't have the same defs of its operands as the VPST. This is fixed by adding RDA checks for the defs. This however gave rise to cases where the new VPST created would precede the un-merged VCMP and so would fail a predicate mask assertion since the VCMP wasn't predicated. This was solved by converting the VCMP to a VPT instead of inserting the new VPST. Differential Revision: https://reviews.llvm.org/D90461	2020-11-09 15:03:48 +00:00
David Green	a0a9e1c798	[ARM] Remove kill flags between VCMP and insertion point When we fold a VCMP into a VPST instruction any kill flags between the old VCMP position and the new insertion point need to be removed, in order to keep the verifier happy. Differential Revision: https://reviews.llvm.org/D90964	2020-11-09 13:17:53 +00:00
Lucas Prates	c2c2cc1360	[ARM][AArch64] Adding Neoverse V1 CPU support Add support for the Neoverse V1 CPU to the ARM and AArch64 backends. This is based on patches from Mark Murray and Victor Campos. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D90765	2020-11-09 13:15:40 +00:00
Sanjay Patel	264a6df353	[ARM] remove cost-kind predicate for cmp/sel costs This is the cmp/sel sibling to D90692. Again, the reasoning is: the throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uops). We need to check for a valid (non-null) condition type parameter because SimplifyCFG may pass nullptr for that (and so we will crash multiple regression tests without that check). I'm not sure if passing nullptr makes sense, but other code in the cost model does appear to check if that param is set or not. Differential Revision: https://reviews.llvm.org/D90781	2020-11-05 14:52:25 -05:00
Sander de Smalen	d57bba7cf8	[SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference. To accommodate frame layouts that have both fixed and scalable objects on the stack, describing a stack location or offset using a pointer + uint64_t is not sufficient. For this reason, we've introduced the StackOffset class, which models both the fixed- and scalable sized offsets. The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset, so that this can be used in other interfaces, such as to eliminate frame indices in PEI or to emit Debug locations for variables on the stack. This patch is purely mechanical and doesn't change the behaviour of how the result of this function is used for fixed-sized offsets. The patch adds various checks to assert that the offset has no scalable component, as frame offsets with a scalable component are not yet supported in various places. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D90018	2020-11-05 11:02:18 +00:00
Cameron McInally	c126eb7529	[SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247. Differential Revision: https://reviews.llvm.org/D90644	2020-11-04 14:20:31 -06:00
David Green	eb611930b6	[ARM] Remove unused variable. NFC	2020-11-04 09:00:03 +00:00
Sanjay Patel	c40126e740	[ARM] remove cost-kind predicate for most math op costs This is based on the same idea that I am using for the basic model implementation and what I have partly already done for x86: throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uop)). Differential Revision: https://reviews.llvm.org/D90692	2020-11-03 17:23:46 -05:00
David Green	bd32386410	[ARM] Remove unused variable. NFC	2020-11-03 12:58:10 +00:00
David Green	e474499402	[ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops If an instruction will be lowered to a call there is no advantage of using a low overhead loop as the LR register will need to be spilled and reloaded around the call, and the low overhead will end up being reverted. This teaches our hardware loop lowering that these memory intrinsics will be calls under certain situations. Differential Revision: https://reviews.llvm.org/D90439	2020-11-03 11:53:09 +00:00
Momchil Velikov	7360d6d921	[ARM][MachineOutliner] Do not overestimate LR liveness in return block The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers callee-saved registers to be alive at the point after the return instruction in a block. In the ARM backend, the `LR` register is classified as callee-saved, which is not really correct (from an ARM eABI or just common sense point of view). These two conditions cause the `MachineOutliner` to overestimate the liveness of `LR`, which results in unnecessary saves/restores of `LR` around calls to outlined sequences. It also causes the `MachineVerifer` to crash in some cases, because the save instruction reads a dead `LR`, for example when the following program: int h(int, int); int f(int a, int b, int c, int d) { a = h(a + 1, b - 1); b = b + c; return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d); } int g(int a, int b, int c, int d) { a = h(a - 1, b + 1); b = b + c; return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d); } is compiled with `-target arm-eabi -march=armv7-m -Oz`. This patch computes the liveness of `LR` in return blocks only, while taking into account the few ARM instructions, which read `LR`, but nevertheless the register is not mentioned (explicitly or implicitly) in the instruction operands. Differential Revision: https://reviews.llvm.org/D89189	2020-11-02 16:47:22 +00:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit `408c4408fa`. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Evgeny Leviant	cc96a82291	[TableGen][SchedModels] Fix read/write variant substitution Patch fixes case when sched class has write and read variants belonging to different processor models. Differential revision: https://reviews.llvm.org/D89777	2020-11-02 17:39:04 +03:00
David Green	30ad742644	[ARM] Fix crash for gather of pointer costs. If the elt size is unknown due to it being a pointer, a comparison against 0 will cause an assert. Make sure the elt size is large enough before comparing and for the moment just return the scalar cost.	2020-10-31 13:10:14 +00:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit `73f01e3df5`. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Cameron McInally	dda1e74b58	[Legalize] Add legalizations for VECREDUCE_SEQ_FADD Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass. Differential Revision: https://reviews.llvm.org/D90247	2020-10-30 16:02:55 -05:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
David Green	d14db8c8dc	[ARM] Match MVE vqdmulh This adds ISel matching for a form of VQDMULH. There are several ir patterns that we could match to that instruction, this one is for: min(ashr(mul(sext(a), sext(b)), 7), 127) Which is what llvm will optimize to once it has removed the max that usually makes up the min/max saturate pattern, as in this case the compare will always be false. The additional complication to match i32 patterns (which extend into an i64) is that the min will be a vselect/setcc, as vmin is not supported for i64 vectors. Tablegen patterns have also been updated to attempt to reuse the MVE_TwoOpPattern patterns. Differential Revision: https://reviews.llvm.org/D90096	2020-10-30 13:34:27 +00:00
Nicholas Guy	eb9fe24eaf	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. Differential Revision: https://reviews.llvm.org/D88496	2020-10-29 15:17:31 +00:00
Evgeny Leviant	a28388f95b	[ARM][SchedModels] Move IsLDMBaseRegInListPred to ARMSchedule.td. NFC This predicate is not specific to cortex-a57 and can be used in other processor models as well.	2020-10-26 22:31:41 +03:00
Evgeny Leviant	e74f66125e	[ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90150	2020-10-26 20:22:41 +03:00
Evgeny Leviant	a877bda397	Fix issue in cortex-a57 sched model Differential revision: https://reviews.llvm.org/D90152	2020-10-26 20:16:40 +03:00
Evgeny Leviant	a95ce5f65f	[ARM][SchedModels] Rename and generalize predicate. NFC	2020-10-26 12:14:55 +03:00
Evgeny Leviant	99b2756517	[ARM][SchedModels] Get rid of IsLdrAm2ScaledPred Differential revision: https://reviews.llvm.org/D90024	2020-10-26 12:01:39 +03:00
Evgeny Leviant	a4fc18e641	[ARM][SchedModels] Convert IsLdstsoMinusRegPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90029	2020-10-26 11:54:08 +03:00
Evgeny Leviant	d613e39d52	[ARM][SchedModels] Convert IsLdrAm3NegRegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90045	2020-10-26 11:43:02 +03:00
David Green	61bc18de0b	[Schedule] Add a MultiHazardRecognizer This adds a MultiHazardRecognizer and starts to make use of it in the ARM backend. The idea of the class is to allow multiple independent hazard recognizers to be added to a single base MultiHazardRecognizer, allowing them to all work in parallel without requiring them to be chained into subclasses. They can then be added or not based on cpu or subtarget features, which will become useful in the ARM backend once more hazard recognizers are being used for various things. This also renames ARMHazardRecognizer to ARMHazardRecognizerFPMLx in the process, to more clearly explain what that recognizer is designed for. Differential Revision: https://reviews.llvm.org/D72939	2020-10-26 08:06:17 +00:00
David Green	92205bf122	[ARM] Remove some dead code. NFC	2020-10-24 17:22:49 +01:00
Paul C. Anagnostopoulos	876af264c1	[TableGen] Change !getop and !setop to !getdagop and !setdagop. Differential Revision: https://reviews.llvm.org/D89814	2020-10-23 10:36:05 -04:00
Evgeny Leviant	cb86522c94	[ARM][SchedModels] Convert IsR1P0AndLaterPred to MCSchedPredicate. NFC Differential revision: https://reviews.llvm.org/D90017	2020-10-23 14:27:49 +03:00
Evgeny Leviant	7a78073be7	[ARM][SchedModels] Let ldm* instruction scheduling use MCSchedPredicate Differential revision: https://reviews.llvm.org/D89957	2020-10-23 10:33:20 +03:00
Mircea Trofin	e24537d48f	[NFC][MC] Use MCRegister for ReachingDefAnalysis APIs Also updated the users of the APIs; and a drive-by small change to RDFRegister.cpp Differential Revision: https://reviews.llvm.org/D89912	2020-10-22 08:47:35 -07:00
Evgeny Leviant	ed6a91f456	[ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89939	2020-10-22 18:03:01 +03:00
Evgeny Leviant	bf9edcb6fd	[ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89876	2020-10-21 20:49:10 +03:00
Paul C. Anagnostopoulos	dfd6b69e01	[ARM] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans Differential Revision: https://reviews.llvm.org/D89822	2020-10-21 09:52:45 -04:00
Nicholas Guy	9a2d2bedb7	Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate Some instructions may be removable through processes such as IfConversion, however DefinesPredicate can not be made aware of when this should be considered. This parameter allows DefinesPredicate to distinguish these removable instructions on a per-call basis, allowing for more fine-grained control from processes like ifConversion. Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose Differential Revision: https://reviews.llvm.org/D88494	2020-10-21 11:52:47 +01:00
Evgeny Leviant	991e86156c	[ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89460	2020-10-20 11:14:21 +03:00
David Green	6dcbc323fd	Revert "[ARM][LowOverheadLoops] Adjust Start insertion." This reverts commit `38f625d0d1`. This commit contains some holes in its logic and has been causing issues since it was commited. The idea sounds OK but some cases were not handled correctly. Instead of trying to fix that up later it is probably simpler to revert it and work to reimplement it in a more reliable way.	2020-10-20 08:55:21 +01:00
Luqman Aden	51892a42da	[COFF][ARM] Fix CodeView for Windows on 32bit ARM targets. Create the LLVM / CodeView register mappings for the 32-bit ARM Window targets. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D89622	2020-10-19 22:16:16 -07:00
Evgeny Leviant	8a7ca143f8	[ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89553	2020-10-19 11:37:54 +03:00
David Green	b93d74ac9c	[ARM] Basic getArithmeticReductionCost reduction costs This adds some basic costs for MVE reductions - currently just costing the simple legal add vectors as a single MVE instruction. More complex costing can be added in the future when the framework more readily allows it. Differential Revision: https://reviews.llvm.org/D88980	2020-10-17 10:29:00 +01:00
David Green	d79ee3a807	[ARM] Add a very basic active_lane_mask cost This adds a very basic cost for active_lane_mask under MVE - making the assumption that they will be free and then apologizing for that in a comment. In reality they may either be free (by being nicely folded into a tail predicated loop), cost the same as a VCTP or be expanded into vdup's, adds and cmp's. It is difficult to detect the difference from a single getIntrinsicInstrCost call, so makes the assumption that the vectorizer is adding them, and only added them where it makes sense. We may need to change this in the future to better model predicate costs in the vectorizer, especially at -Os or non-tail predicated loops. The vectorizer currently does not query the cost of these instructions but that will change in the future and a zero cost there probably makes the most sense at the moment. Differential Revision: https://reviews.llvm.org/D88989	2020-10-17 10:09:42 +01:00
David Sherwood	47f2dc7e5f	[SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets In most of lib/Target we know that we are not dealing with scalable types so it's perfectly fine to replace TypeSize comparison operators with their fixed width equivalents, making use of getFixedSize() and so on. Differential Revision: https://reviews.llvm.org/D89101	2020-10-15 09:01:21 +01:00
Evgeny Leviant	2ad82b0ed1	[ARM.td] Make instruction definitions visible to sched models Differential revision: https://reviews.llvm.org/D89308	2020-10-14 09:58:45 +03:00
David Green	cb27006a94	[ARM] Attempt to make Tail predication / RDA more resilient to empty blocks There are a number of places in RDA where we assume the block will not be empty. This isn't necessarily true for tail predicated loops where we have removed instructions. This attempt to make the pass more resilient to empty blocks, not casting pointers to machine instructions where they would be invalid. The test contains a case that was previously failing, but recently been hidden on trunk. It contains an empty block to begin with to show a similar error. Differential Revision: https://reviews.llvm.org/D88926	2020-10-10 14:50:25 +01:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Sam Tebbs	68e002e181	[ARM] Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV. Differential Revision: https://reviews.llvm.org/D87836	2020-10-06 14:44:58 +01:00
Amara Emerson	c9f5cdd453	Revert "[ARM]Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV" This reverts commit `2573cf3c3d`. These seem to break some lit tests.	2020-10-05 10:52:43 -07:00
Sam Tebbs	2573cf3c3d	[ARM]Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV. Differential Revision: https://reviews.llvm.org/D87836	2020-10-05 15:51:28 +01:00
David Green	7feafa0286	[ARM] Fix pointer offset when splitting stores from VMOVDRR We were not accounting for the pointer offset when splitting a store from a VMOVDRR node, which could lead to incorrect aliasing info. In this case it is the fneg via integer arithmetic that gives us a store->load pair that we started getting wrong. Differential Revision: https://reviews.llvm.org/D88653	2020-10-03 16:47:50 +01:00
Meera Nakrani	f7c0e2b8f2	[ARM] Prevent constants from iCmp instruction from being hoisted if part of a min(max()) pattern Marks constants of an ICmp instruction as free if it's only user is a select instruction that is part of a min(max()) pattern. Ensures that in loops, in particular when loop unrolling is turned on, SSAT will still be correctly generated. Differential Revision: https://reviews.llvm.org/D88662	2020-10-02 09:28:35 +00:00
Meera Nakrani	48c9e8244b	[ARM] Removed hasSideEffects from signed/unsigned saturates Removed hasSideEffects from SSAT and USAT so that they are no longer marked as unpredictable. Differential Revision: https://reviews.llvm.org/D88545	2020-10-01 14:55:01 +00:00
Sam Parker	7e02bc81c6	[NFC][ARM] LowOverheadLoop DEBUG statements	2020-10-01 13:38:16 +01:00
Sam Parker	38f625d0d1	[ARM][LowOverheadLoops] Adjust Start insertion. Try to move the insertion point to become the terminator of the block, usually the preheader. Differential Revision: https://reviews.llvm.org/D88638	2020-10-01 10:49:19 +01:00
Sam Parker	6ec5f32497	[ARM][LowOverheadLoops] Iteration count liveness Before deciding to insert a [W\|D]LSTP, check that defining LR with the element count won't affect any other instructions that should be taking the iteration count. Differential Revision: https://reviews.llvm.org/D88549	2020-10-01 10:11:10 +01:00
Sam Parker	7b90516d47	[ARM][LowOverheadLoops] Start insertion point If possible, try not to move the start position earlier than it already is. Differential Revision: https://reviews.llvm.org/D88542	2020-10-01 10:05:25 +01:00
Sam Parker	dfa2c14b8f	[ARM][LowOverheadLoops] Use iterator for InsertPt. Use a MachineBasicBlock::iterator instead of a MachineInstr* for the position of our LoopStart instruction. NFCish, as it change debug info.	2020-10-01 08:32:35 +01:00
Rahman Lavaee	8955950c12	Exception support for basic block sections This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables. The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a nop operation as the first non-CFI instruction in the exception section. Background on Exception Handling in C++ ABI https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function. The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows: \| CallSite \| --> Position of the call site (relative to the function entry) \| CallSite length \| --> Length of the call site. \| Landing Pad \| --> Position of the landing pad (relative to the landing pad fragment’s begin label) \| Action record offset \| --> Position of the first action record The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table). The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted. The action record offset is an index into the action table which includes information about which exception types are caught. C++ Exceptions with Basic Block Sections Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section. This patch introduces a new CallSiteRange structure which specifies the range of call-sites which correspond to every section: `struct CallSiteRange { // Symbol marking the beginning of the precedure fragment. MCSymbol FragmentBeginLabel = nullptr; // Symbol marking the end of the procedure fragment. MCSymbol FragmentEndLabel = nullptr; // LSDA symbol for this call-site range. MCSymbol *ExceptionLabel = nullptr; // Index of the first call-site entry in the call-site table which // belongs to this range. size_t CallSiteBeginIdx = 0; // Index just after the last call-site entry in the call-site table which // belongs to this range. size_t CallSiteEndIdx = 0; // Whether this is the call-site range containing all the landing pads. bool IsLPRange = false; };` With N basic-block-sections, the call-site table is partitioned into N call-site ranges. Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, The CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed. Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the `IsLPRange` field in CallSiteRange). \| @LPStart \| ---> Landing pad fragment ( LSDA1 points here) \| CallSite Table Length \| ---> Used to find the action table. \| CallSites \| \| … \| \| … \| \| @LPStart \| ---> Landing pad fragment ( LSDA2 points here) \| CallSite Table Length \| \| CallSites \| \| … \| \| … \| … … \| Action Table \| \| Types Table \| Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D73739	2020-09-30 11:05:55 -07:00
Sam Parker	779a8a028f	[ARM][LowOverheadLoops] TryRemove helper. Make a helper function that wraps around RDA::isSafeToRemove and utilises the existing DCE IT block checks.	2020-09-30 09:37:24 +01:00
Sam Parker	195c22f273	[ARM] Change VPT state assertion Just because we haven't encountered an instruction setting the VPR, it doesn't mean we can't create a VPT block - the VPR maybe a live-in. Differential Revision: https://reviews.llvm.org/D88224	2020-09-30 08:01:10 +01:00
Sam Parker	4c19b89b25	[NFC][ARM] Comments and lambdas Add some comments in LowOverheadLoops and make some lambda variables explicit arguments instead of capturing.	2020-09-29 08:41:53 +01:00
Sam Parker	e82a0084d3	[ARM][LowOverheadLoops] Cleanup and re-arrange Rename and reorganise how we decide where to put the LoopStart instruction.	2020-09-28 16:06:30 +01:00
Tres Popp	509fba75df	[llvm] Fix unused variable in non-debug configurations	2020-09-28 17:04:08 +02:00
Meera Nakrani	675431b987	[ARM] Added more patterns to generate SSAT/USAT with shift Added patterns to generate an SSAT or USAT with shift for SSAT/USAT instructions that are matched from IR patterns. Differential Revision: https://reviews.llvm.org/D88145	2020-09-28 14:50:19 +00:00
Sam Parker	3d1d089155	[NFC][ARM] Factor out some logic for LoLoops. Create a DCE function that accepts an instruction.	2020-09-28 14:51:52 +01:00
Sjoerd Meijer	1696dd27fb	[ARM][MVE] Enable tail-predication by default We have been running tests/benchmarks downstream with tail-predication enabled for some time now and this behaves as expected: we are not aware of any correctness issues, and this performs better across the board than with tail-predication disabled. Time to flip the switch! Differential Revision: https://reviews.llvm.org/D88093	2020-09-28 14:01:23 +01:00
Sjoerd Meijer	f39f92c1f6	[ARM][MVE] tail-predication: overflow checks for elementcount, cont'd This is a reimplementation of the overflow checks for the elementcount, i.e. the 2nd argument of intrinsic get.active.lane.mask. The element count is lowered in each iteration of the tail-predicated loop, and we must prove that this expression doesn't overflow. Many thanks to Eli Friedman and Sam Parker for all their help with this work. Differential Revision: https://reviews.llvm.org/D88086	2020-09-28 09:20:51 +01:00
David Green	e4b9867cb6	[ARM] Expand cannotInsertWDLSTPBetween to the last instruction `9d9a11c7be` added this check for predicatable instructions between the D/WLSTP and the loop's start, but it was missing the last instruction in the block. Change it to use some iterators instead. Differential Revision: https://reviews.llvm.org/D88354	2020-09-28 09:14:40 +01:00
Sam Parker	a399d1880b	[ARM] Find VPT implicitly predicated by VCTP On failing to find a VCTP in the list of instructions that explicitly predicate the entry of a VPT block, inspect whether the block is controlled via VPT which is implicitly predicated due to it's predicated operand(s). Differential Revision: https://reviews.llvm.org/D87819	2020-09-25 08:50:53 +01:00
Sam Parker	00ee52ae04	[NFC][ARM] Remove dead loop. Remove a loop that just calculated a couple of values that were now longer needed.	2020-09-24 15:37:26 +01:00
Sjoerd Meijer	2fc690ac90	[ARM] LowoverheadLoops: add an option to disable tail-predication This might be useful for testing. We already have an option -tail-predication but that controls the MVETailPredication pass. This -arm-loloops-disable-tail-pred is just for disabling it in the LowoverheadLoops pass. Differential Revision: https://reviews.llvm.org/D88212	2020-09-24 13:30:48 +01:00
Sam Parker	9d9a11c7be	[ARM] Check for LSTP side-effects. If the LSTP instruction is inserted with an element count low enough to immediately predicate some lanes as false, this can have some unintended effects on any proceeding MVE instructions in the preheader. Differential Revision: https://reviews.llvm.org/D88209	2020-09-24 13:28:35 +01:00
Eli Friedman	3f739f736b	[SelectionDAG][GISel] Make LegalizeDAG lower FNEG using integer ops. Previously, if a floating-point type was legal, but FNEG wasn't legal, we would use FSUB. Instead, we should use integer ops, to preserve the semantics. (Alternatively, there's a compiler-rt call we could use, but there isn't much reason to use that.) It turns out we actually are still using this obscure codepath in a few cases: on some targets, we have "legal" floating-point types that don't actually support any floating-point operations. In particular, ARM and AArch64 are using this path. The implementation for SelectionDAG is pretty simple because we can reuse the infrastructure from FCOPYSIGN. See also `9a3dc3e`, the corresponding change to type legalization. Also includes a "bonus" change to STRICT_FSUB legalization, so we can lower a STRICT_FSUB to a float libcall. Includes the changes to both LegalizeDAG and GlobalISel so we don't have inconsistent results in the future. Fixes https://bugs.llvm.org/show_bug.cgi?id=46792 . Differential Revision: https://reviews.llvm.org/D84287	2020-09-23 14:10:33 -07:00
David Sherwood	e077367a28	[SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits An existing function Type::getScalarSizeInBits returns a uint64_t instead of a TypeSize class because the caller is requesting a scalar size, which cannot be scalable. This patch makes other similar functions requesting a scalar size consistent with that, thereby eliminating more than 1000 implicit TypeSize -> uint64_t casts. Differential revision: https://reviews.llvm.org/D87889	2020-09-23 09:20:08 +01:00
Stefanos Baziotis	a7873e5abc	Small fixes for "[LoopInfo] empty() -> isInnermost(), add isOutermost()"	2020-09-22 23:59:34 +03:00
Stefanos Baziotis	89c1e35f3c	[LoopInfo] empty() -> isInnermost(), add isOutermost() Differential Revision: https://reviews.llvm.org/D82895	2020-09-22 23:28:51 +03:00
Sam Parker	94c799fecf	[ARM] Trying to fix asan buildbot	2020-09-22 13:43:23 +01:00
Meera Nakrani	a3d0dce260	[ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop Changes TTI function getIntImmCostInst to take an additional Instruction parameter, which enables us to be able to check it is part of a min(max())/max(min()) pattern that will match SSAT. We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated. Required minor changes in some non-ARM backends to allow for the optional parameter to be included. Differential Revision: https://reviews.llvm.org/D87457	2020-09-22 11:54:10 +00:00
Sam Parker	b4fa884a73	[ARM] Improve VPT predicate tracking The VPTBlock has been modified to track the 'global' state of the VPR, as well as the state for each block. Each object now just holds a list of instructions that makeup the block, while static structures hold the predicate information. This enables global access for querying how both a VPT block and individual instructions are predicated. These changes now allow us, again, to handle more complicated cases where multiple instructions build a predicate and/or where the same predicate in used in multiple blocks. It doesn't, however, get us back to before the tracking was 'fixed' as some extra logic will be required to properly handle VPT instructions. Currently a VPT could be effectively predicated because of it's inputs, but the existing logic will not detect that and so will refuse to perform the transformation. This can be seen in remat-vctp.ll test where we still don't perform the transform. Differential Revision: https://reviews.llvm.org/D87681	2020-09-22 10:40:27 +01:00
Sam Parker	a0c1dcc318	[ARM] Remove MVEDomain from VLDR/STR of P0 Remove the domain from the instructions and create a shouldInspect helper for LowOverheadLoops which queries it or a vpr operand. Differential Revision: https://reviews.llvm.org/D87900	2020-09-22 09:05:50 +01:00
Sam Parker	e461921d6c	[ARM] VPT validForTailPredication Mark all VPT instructions as valid. Differential Revision: https://reviews.llvm.org/D87759	2020-09-22 08:58:37 +01:00
Momchil Velikov	742250bf62	[ARM][CMSE] Issue an error if passing arguments through memory across security boundary It was never supported and that part was accidentally omitted when upstreaming D76518. Differential Revision: https://reviews.llvm.org/D86478 Change-Id: If6ba9506eb0431c87a1d42a38aa60e47ce263039	2020-09-21 17:26:10 +01:00
David Green	f4c5cadbcb	[ARM] Select f32 constants with vmov.f16 This adds lowering for f32 values using the vmov.f16, which zeroes the top bits whilst setting the lower bits to a pattern. This range of values does not often come up, except where a f16 constant value has been converted to a f32. Differential Revision: https://reviews.llvm.org/D87790	2020-09-21 11:10:47 +01:00
David Green	29bd8ea110	[ARM] Constant fold VMOVrh This adds simple constant folding for VMOVrh, to constant fold fp16 constants to integer values. It can help especially with soft calling conventions, but some of the results are not optimal as we end up loading using a vldr. This will be improved in a follow up patch. Differential Revision: https://reviews.llvm.org/D87789	2020-09-20 21:32:51 +01:00
Gabriel Hjort Åkerlund	c10200536f	[TableGen][GlobalISel] Fix handling of zero_reg When generating matching tables for GlobalISel, TableGen would output "::zero_reg" whenever encountering the zero_reg, which in turn would result in compilation error. This patch fixes that by instead outputting NoRegister (== 0), which is the same result that TableGen produces when generating matching tables for ISelDAG. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86215	2020-09-18 11:01:11 +02:00
David Green	7f7993e0da	[ARM] Expand distributing increments to also handle existing pre/post inc instructions. This extends the distributing postinc code in load/store optimizer to also handle the case where there is an existing pre/post inc instruction, where subsequent instructions can be modified to use the adjusted offset from the increment. This can save us having to keep the old register live past the increment instruction. Differential Revision: https://reviews.llvm.org/D83377	2020-09-17 16:58:35 +01:00
David Green	34b27b9441	[ARM] Sink splats to MVE intrinsics The predicated MVE intrinsics are generated as, for example, llvm.arm.mve.add.predicated(x, splat(y). p). We need to sink the splat value back into the loop, like we do for other instructions, so we can re-select qr variants. Differential Revision: https://reviews.llvm.org/D87693	2020-09-17 16:00:51 +01:00
Simon Pilgrim	f026812110	InstCombiner.h - remove unnecessary KnownBits.h include. NFCI. Move the include down to cpp files with an implicit dependency.	2020-09-17 14:28:42 +01:00
Sjoerd Meijer	b5c3efeb7b	[ARM][MVE] Tail-predication: predicate new elementcount checks on force-enabled Additional sanity checks were added to get.active.lane.mask's second argument, the loop tripcount/elementcount, in rG635b87511ec3. Like the other (overflow) checks, skip this if tail-predication is forced. Differential Revision: https://reviews.llvm.org/D87769	2020-09-16 17:05:14 +01:00
Sam Parker	3ce9ec0cfa	[ARM] Reorder some logic Re-order some checks in ValidateMVEInst.	2020-09-16 13:39:22 +01:00
Sam Parker	a63b2a4614	[ARM] Fix tail predication predicate tracking Clear the CurrentPredicate when we find an instruction which would completely overwrite the VPR. This fix essentially means we're back to not really being able to handle VPT instructions when tail predicating. Differential Revision: https://reviews.llvm.org/D87610	2020-09-16 11:59:29 +01:00
Sam Parker	86172ce378	[ARM] Add more validForTailPredication Modify the unit test to inspect all MVE instructions and mark the load/store/move of vpr/p0 as valid, as well as the remaining scalar shifts. Differential Revision: https://reviews.llvm.org/D87753	2020-09-16 11:51:50 +01:00
Sam Tebbs	ef0b9f3307	[ARM][LowOverheadLoops] Combine a VCMP and VPST into a VPT This patch combines a VCMP followed by a VPST into a VPT, which has the same semantics as the combination of the former two.	2020-09-16 09:27:10 +01:00
Yvan Roux	070b96962f	[ARM][MachineOutliner] Add calls handling. Handles calls inside outlined regions, by saving and restoring the link register. Differential Revision: https://reviews.llvm.org/D87136	2020-09-16 09:54:26 +02:00
Sjoerd Meijer	635b87511e	[ARM][MVE] Tail-predication: use unsigned SCEV ranges for tripcount Loop tripcount expressions have a positive range, so use unsigned SCEV ranges for them. Differential Revision: https://reviews.llvm.org/D87608	2020-09-15 13:23:02 +01:00
Meera Nakrani	1119bf95be	[ARM] Corrected condition in isSaturatingConditional Fixed a small error in an if condition to prevent usat/ssat being generated if (upper constant + 1) is not a power of 2.	2020-09-15 10:14:30 +00:00
Sjoerd Meijer	b4b1b84106	[MVE] fix typo in llvm debug message. NFC.	2020-09-15 10:13:54 +01:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Nikita Popov	53f36f06af	[Legalize][ARM][X86] Add float legalization for VECREDUCE This adds SoftenFloatRes, PromoteFloatRes and SoftPromoteHalfRes legalizations for VECREDUCE, to fill the remaining hole in the SDAG legalization. These legalizations simply expand the reduction and let it be recursively legalized. For the PromoteFloatRes case at least it is possible to do better than that, but it's pretty tricky (because we need to consider the interaction of three different vector legalizations and the type promotion) and probably not really worthwhile. I haven't added ExpandFloatRes support, as I am not familiar with ppc_fp128. Differential Revision: https://reviews.llvm.org/D87569	2020-09-14 20:42:09 +02:00
Simon Pilgrim	98eaacd73d	Assert we've found both vector types. NFCI. Fixes clang static analyzer warning about potential null dereferences.	2020-09-14 13:24:17 +01:00
Meera Nakrani	dd519bf0b0	[ARM] Selects SSAT/USAT from correct LLVM IR LLVM will canonicalize conditional selectors to a different pattern than the old code that was used. This is updating the function to match the new expected patterns and select SSAT or USAT when successful. Tests have also been updated to use the new patterns. Differential Review: https://reviews.llvm.org/D87379	2020-09-14 10:58:21 +00:00
Sjoerd Meijer	676febc044	[ARM][MVE] Tail-predication: check get.active.lane.mask's TC value This adds additional checks for the original scalar loop tripcount value, i.e. get.active.lane.mask second argument, and perform several sanity checks to see if it is of the form that we expect similarly like we already do for the IV which is the first argument of get.active.lane. Differential Revision: https://reviews.llvm.org/D86074	2020-09-14 11:32:15 +01:00
Simon Wallis	4946802c5f	[ARM] Fix so immediates and pc relative checks Treating an SoImm offset as a multiple of 4 between -1020 and 1020 mis-handles the second of a pair of 16-bit constants where the offset is a multiple of 2 but not a multiple of 4, leading to an LLVM ERROR: out of range pc-relative fixup value For 32-bit and larger (64-bit) constants, continue to treat an SoImm offset as a multiple of 4 between -1020 and 1020. For smaller (16-bit) constants, treat an SoImm offset as a multiple of 1 between -255 and 255. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86949	2020-09-14 08:52:59 +01:00
David Green	74760bb00f	[LV][ARM] Add preferInloopReduction target hook. This allows the backend to tell the vectorizer to produce inloop reductions through a TTI hook. For the moment on ARM under MVE this means allowing integer add reductions of the correct size. In the future this can include integer min/max too, under -Os. Differential Revision: https://reviews.llvm.org/D75512	2020-09-12 17:47:04 +01:00
David Green	6cfd38d03d	[ARM] Fixup single source mla reductions. This fixes a complication on top of D87276. If we are sign extending around a mul with the two operands that are the same, instcombine will helpfully convert one of the sext to a zext. Reverse that so that we again generate a reduction. Differnetial Revision: https://reviews.llvm.org/D87287	2020-09-12 14:31:26 +01:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
David Green	c437446d90	[ARM] Recognize "double extend" reduction patterns We can sometimes get code that does: xe = zext i16 x to i32 ye = zext i16 y to i32 m = mul i32 xe, ye me = zext i32 m to i64 r = vecreduce.add(me) This "double extend" can trip up the reduction identification, but should give identical results. This extends the pattern matching to handle them. Differential Revision: https://reviews.llvm.org/D87276	2020-09-12 13:51:42 +01:00
Sam Tebbs	b81c57d646	[ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane values The effects of unpredicated vector instruction with unknown lanes cannot be predicted and therefore cannot be tail predicated. This does not apply to predicated vector instructions and so this patch allows tail predication on them. Differential Revision: https://reviews.llvm.org/D87376	2020-09-10 10:34:32 +01:00
Sam Parker	1919b65052	[ARM] Tail predicate VQDMULH and VQRDMULH Mark the family of instructions as valid for tail predication. Differential Revision: https://reviews.llvm.org/D87348	2020-09-10 08:20:07 +01:00
Amara Emerson	e5784ef8f6	[GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator. We weren't using this before, so none of the MachineFunction CFG edges had the branch probability information added. As a result, block placement later in the pipeline was flying blind. This is enabled only with optimizations enabled like SelectionDAG. Differential Revision: https://reviews.llvm.org/D86824	2020-09-09 14:31:12 -07:00
Sam Parker	3ebc755227	[ARM] Try to rematerialize VCTP instructions We really want to try and avoid spilling P0, which can be difficult since there's only one register, so try to rematerialize any VCTP instructions. Differential Revision: https://reviews.llvm.org/D87280	2020-09-09 07:41:22 +01:00
Sam Tebbs	7aabb6ad77	[ARM][LowOverheadLoops] Remove modifications to the correct element count register After my patch at D86087, code that now uses the mov operand rather than the vctp operand will no longer remove modifications to the vctp operand as they should. This patch fixes that by explicitly removing modifications to the vctp operand rather than the register used as the element count.	2020-09-08 10:30:05 +01:00
Roman Lebedev	bb7d3af113	Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline This was reverted in `503deec218` because it caused gigantic increase (3x) in branch mispredictions in certain benchmarks on certain CPU's, see https://reviews.llvm.org/D84108#2227365. It has since been investigated and here are the results: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html > It's an amazingly severe regression, but it's also all due to branch > mispredicts (about 3x without this). The code layout looks ok so there's > probably something else to deal with. I'm not sure there's anything we can > reasonably do so we'll just have to take the hit for now and wait for > another code reorganization to make the branch predictor a bit more happy :) > > Thanks for giving us some time to investigate and feel free to recommit > whenever you'd like. > > -eric So let's just reland this. Original commit message: I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108 This reverts commit `503deec218`.	2020-09-08 00:24:03 +03:00
Sam Parker	0af4147804	[ARM][CostModel] CodeSize costs for i1 arith ops When optimising for size, make the cost of i1 logical operations relatively expensive so that optimisations don't try to combine predicates. Differential Revision: https://reviews.llvm.org/D86525	2020-09-07 09:27:18 +01:00
David Green	294c0cc3eb	[ARM] Fold predicate_cast(load) into vldr p0 This adds a simple tablegen pattern for folding predicate_cast(load) into vldr p0, providing the alignment and offset are correct. Differential Revision: https://reviews.llvm.org/D86702	2020-09-04 11:29:59 +01:00
David Green	ffd0b31c7c	Revert "[ARM] Register pressure with -mthumb forces register reload before each call" Expensive checks are failing, complaining about additional MMO operands added to the branch.	2020-09-01 07:39:54 +01:00
Prathamesh Kulkarni	85b4d286d7	[ARM] Register pressure with -mthumb forces register reload before each call This patch implements the foldMemoryOperand hook in Thumb1InstrInfo, allowing tBLXr and a spilled function address to be combined back into a tBL. This can help with codesize at Oz, especailly in the tinycrypt library. Differential Revision: https://reviews.llvm.org/D79785	2020-08-31 20:00:30 +01:00
Craig Topper	aab90384a3	[Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None) There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None. So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline. This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations. Differential Revision: https://reviews.llvm.org/D86744	2020-08-28 13:23:45 -07:00
Sjoerd Meijer	5f1cad4d29	[ARM] Skip combining base updates for vld1x NEON intrinsics Skip this for now, to avoid a backend crash in: UNREACHABLE executed at llvm/lib/Target/ARM/ARMISelLowering.cpp:13412 This should fix PR45824. Differential Revision: https://reviews.llvm.org/D86784	2020-08-28 20:29:15 +01:00
Anna Welker	064981f0ce	[ARM][MVE] Enable MVE gathers and scatters by default Enable MVE gather/scatters by default, which requires some minor adaptations in some tests. Differential revision: https://reviews.llvm.org/D86776	2020-08-28 19:05:29 +01:00
David Green	4ca60915bc	[ARM] Correct predicate operand for offset gather/scatter These arm_mve_vldr_gather_offset_predicated and arm_mve_vstr_scatter_offset_predicated have some extra parameters meaning the predicate is at a later operand. If a loop contains _only_ those masked instructions, we would miss transforming the active lane mask. Differential Revision: https://reviews.llvm.org/D86791	2020-08-28 17:48:15 +01:00
Sam Parker	b30adfb529	[ARM][LowOverheadLoops] Liveouts and reductions Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructios within the loop body. This has required some reorganisation and fixes around live-out and predication checks, as well as looking for cases where an input/output is initialised to zero. Differential Revision: https://reviews.llvm.org/D86613	2020-08-28 13:56:16 +01:00
Mikhail Maltsev	ae1396c7d4	[ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics This patch adjusts the following ARM/AArch64 LLVM IR intrinsics: - neon_bfmmla - neon_bfmlalb - neon_bfmlalt so that they take and return bf16 and float types. Previously these intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from implementation lacking bf16 IR type). The neon_vbfdot[q] intrinsics are adjusted similarly. This change required some additional selection patterns for vbfdot itself and also for vector shuffles (in a previous patch) because of SelectionDAG transformations kicking in and mangling the original code. This patch makes the generated IR cleaner (less useless bitcasts are produced), but it does not affect the final assembly. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86146	2020-08-27 18:43:16 +01:00
Sam Parker	03141aa04a	[ARM] Enable outliner at -Oz for M-class Enable default outlining when the function has the minsize attribute and we're targeting an m-class core. Differential Revision: https://reviews.llvm.org/D82951	2020-08-27 08:02:56 +01:00
Sam Parker	a3e41d4581	[ARM] Make MachineVerifier more strict about terminators Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict. Differential Revision: https://reviews.llvm.org/D40061	2020-08-27 07:10:20 +01:00
David Green	677c1590c0	[ARM] Increase MVE gather/scatter cost by MVECostFactor. MVE Gather scatter codegeneration is looking a lot better than it used to, but still has some issues. The instructions we currently model as 1 cycle per element, which is a bit low for some cases. Increasing the cost by the MVECostFactor brings them in-line with our other instruction costs. This will have the effect of only generating then when the extra benefit is more likely to overcome some of the issues. Notably in running out of registers and vectorizing loops that could otherwise be SLP vectorized. In the short-term whilst we look at other ways of dealing with those more directly, we can increase the costs of gathers to make them more likely to be beneficial when created. Differential Revision: https://reviews.llvm.org/D86444	2020-08-26 13:03:46 +01:00
Sjoerd Meijer	c352e7fbda	[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1. Differential Revision: https://reviews.llvm.org/D86303	2020-08-25 14:38:03 +01:00
Anna Welker	8048068c3e	[ARM][MVE] Allow tail predication for strides !=1 with gather/scatters If gather/scatters are enabled, ARMTargetTransformInfo now allows tail predication for loops with a much wider range of strides, up to anything that is loop invariant. Differential Revision: https://reviews.llvm.org/D85410	2020-08-24 13:54:47 +01:00
Roman Lebedev	503deec218	Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline" As disscussed in post-commit review starting with https://reviews.llvm.org/D84108#2227365 while this appears to be mostly a win overall, especially code-size-wise, this appears to shake //certain// code pattens in a way that is extremely unfavorable for performance (+30% runtime regression) on certain CPU's (i personally can't reproduce). So until the behaviour is better understood, and a path forward is mapped, let's back this out for now. This reverts commit `1d51dc38d8`.	2020-08-22 00:33:22 +03:00
Sam Parker	acf0bb41e4	[ARM][CostModel] Select instruction costs. Modify the ARM getCmpSelInstrCost implementation for the code size costs of selects. Now consider the legalization cost and increase the cost of i1 because those values wouldn't live in a general purpose register. We also make selects +1 more expensive to account for the IT instruction. Differential Revision: https://reviews.llvm.org/D82091	2020-08-21 08:49:56 +01:00
David Green	2b69efded0	[ARM][LV] Add a preferPredicatedReductionSelect target hook As part of D84741, this adds a target hook for the preferPredicatedReductionSelect option and makes use of it under MVE, allowing us to tail predicate most reduction loops. Differential Revision: https://reviews.llvm.org/D85980	2020-08-21 08:48:12 +01:00
Yvan Roux	0459f29e8b	[ARM][MachineOutliner] Add default mode. Use the stack to save and restore the link register when there is no available register to do it. Differential Revision: https://reviews.llvm.org/D76069	2020-08-20 09:25:33 +02:00
Meera Nakrani	545de56f87	[ARM] Enabled VMLAV and Add instructions to use VMLAVA Used InstCombine to enable VMLAV and Add instructions to generate VMLAVA instead with tests.	2020-08-19 08:36:49 +00:00
Fangrui Song	c466c5fa7e	[ARM] Fix build after D86087	2020-08-18 09:20:32 -07:00
David Green	3471520b1f	[ARM] Allow tail predication of VLDn VLD2/4 instructions cannot be predicated, so we cannot tail predicate them from autovec. From intrinsics though, they should be valid as they will just end up loading extra values into off vector lanes, not effecting the on lanes. The same is true for loads in general where so long as we are not using the other vector lanes, an unpredicated load can be converted to a predicated one. This marks VLD2 and VLD4 instructions as validForTailPredication and allows any unpredicated load in tail predication loop, which seems to be valid given the other checks we have. Differential Revision: https://reviews.llvm.org/D86022	2020-08-18 17:15:45 +01:00
Sam Tebbs	31f02ac60a	[ARM] Use mov operand if the mov cannot be moved while tail predicating There are some cases where the instruction that sets up the iteration count for a tail predicated loop cannot be moved before the dlstp, stopping tail predication entirely. This patch checks if the mov operand can be used and if so, uses that instead. Differential Revision: https://reviews.llvm.org/D86087	2020-08-18 17:10:29 +01:00
Craig Topper	c7a0b2684f	[X86][MC][Target] Initial backend support a tune CPU to support -mtune This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line. This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned. One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU. I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning. Differential Revision: https://reviews.llvm.org/D85165	2020-08-14 15:31:50 -07:00
Sam Parker	eb82d58f83	[NFC][ARM] Port MaybeCall into ARMTTImpl method Renamed to maybeLoweredToCall.	2020-08-14 10:23:20 +01:00
David Green	0c390c22a5	Revert "[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz" This reverts commit `18279a54b5` as it is causing some chromium android test problems.	2020-08-13 22:40:36 +01:00
David Green	2632c625ed	[ARM] Mark VMINNMA/VMAXNMA as commutative These operations take Qda and Rn register operands, which are commutative so long as the instruction is not predicated. Differential Revision: https://reviews.llvm.org/D85813	2020-08-13 18:01:11 +01:00
Anna Welker	9eb9ba076a	[ARM][MVE] Fix for tail predication for loops containing MVE gather/scatters Fix to include non-predicated version of write-back gather in special case treatment for deducting the instruction type. (This is fixing https://reviews.llvm.org/D85138 for corner cases) Differential Revision: https://reviews.llvm.org/D85889	2020-08-13 12:24:19 +01:00
David Green	1bb3488685	[ARM] Predicated VFMA patterns Similar to the Two op + select patterns that were added recently, this adds some patterns for select + fma to turn them into predicated operations. Differential Revision: https://reviews.llvm.org/D85824	2020-08-12 18:35:01 +01:00
Anna Welker	4fe5615eab	[ARM][MVE] Enable tail predication for loops containing MVE gather/scatters Widen the scope of memory operations that are allowed to be tail predicated to include gathers and scatters, such that loops that are auto-vectorized with the option -enable-arm-maskedgatscat (and actually end up containing an MVE gather or scatter) can be tail predicated. Differential Revision: https://reviews.llvm.org/D85138	2020-08-12 15:32:37 +01:00
Sam Parker	ea8448e361	[LoopUnroll] Adjust CostKind query When TTI was updated to use an explicit cost, TCK_CodeSize was used although the default implicit cost would have been the hand-wavey cost of size and latency. So, revert back to this behaviour. This is not expected to have (much) impact on targets since most (all?) of them return the same value for SizeAndLatency and CodeSize. When optimising for size, the logic has been changed to query CodeSize costs instead of SizeAndLatency. This patch also adds a testing option in the unroller so that OptSize thresholds can be specified. Differential Revision: https://reviews.llvm.org/D85723	2020-08-12 12:56:09 +01:00
Sjoerd Meijer	6716e7868e	[ARM][MVE] tail-predication: overflow checks for backedge taken count. This pick ups the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrinisc that enables tail-predication. For a 2d auto-correlation kernel and its inner loop j: M = Size - i; for (j = 0; j < M; j++) Sum += Input[j] * Input[j+i]; For this inner loop, the SCEV backedge taken count (BTC) expression is: (-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body> and LoopUtil cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC cannot be max" could not be determined. So overflow behaviour had to be assumed in the loop tripcount expression that uses the BTC. As a result tail-predication had to be forced (with an option) for this case. This change solves that by using ScalarEvolution's helper getConstantMaxBackedgeTakenCount which is able to determine the range of BTC, thus can determine it is safe, so that we no longer need to force tail-predication as reflected in the changed test cases. Differential Revision: https://reviews.llvm.org/D85737	2020-08-12 09:32:26 +01:00
Kerry McLaughlin	85c7e89f3b	[CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize Changes the Offset arguments to both functions from int64_t to TypeSize & updates all uses of the functions to create the offset using TypeSize::Fixed() Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85220	2020-08-11 12:17:10 +01:00
Sam Parker	4f9f4b21e0	[ARM] Unrestrict Armv8-a IT when at minsize IT blocks with more than one instruction were performance deprecated in Armv8 but that doesn't mean we should follow that advise when optimising for size. Differential Revision: https://reviews.llvm.org/D85638	2020-08-10 14:59:53 +01:00
David Green	186a7f81e8	[ARM] Add VADDV and VMLAV patterns for v16i16 This adds patterns for v16i16's vecreduce, using all the existing code to go via an i32 VADDV/VMLAV and truncating the result. Differential Revision: https://reviews.llvm.org/D85452	2020-08-09 11:09:49 +01:00
David Green	8590e5abad	[ARM] Allow vecreduce_add in tail predicated loops This allows vecreduce_add in loops so that we can tailpredicate them. Differential Revision: https://reviews.llvm.org/D85454	2020-08-09 10:57:17 +01:00
David Green	296faa91ed	[ARM] Some formatting and predicate VRHADD patterns. NFC This formats some of the MVE patterns, and adds a missing Predicates = [HasMVEInt] to some VRHADD patterns I noticed as going through. Although I don't believe NEON would ever use the patterns (as it would use ADDL and VSHRN instead) they should ideally be predicated on having MVE instructions.	2020-08-09 10:07:52 +01:00
Martin Storsjö	5eedc01a82	[ARM, AArch64] Fix a comment typo. NFC.	2020-08-06 09:23:45 +03:00
Sam Parker	f2675ab45f	[ARM][CostModel] Implement getCFInstrCost As with other targets, set the throughput cost of control-flow instructions to free so that we don't miss out of vectorization opportunities. Differential Revision: https://reviews.llvm.org/D85283	2020-08-05 12:44:51 +01:00
Meera Nakrani	20283ff491	[ARM] Generated SSAT and USAT instructions with shift Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests. Differential Review: https://reviews.llvm.org/D85120	2020-08-04 09:38:17 +00:00
Christopher Tetreault	b5059b7140	[SVE] Remove bad call to VectorType::getNumElements() from ARM Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85152	2020-08-03 15:41:14 -07:00
Jordan Rupprecht	af3ec731d5	[NFC][ARM] Silence unused variable in release builds	2020-08-03 15:21:44 -07:00
David Green	22916481c1	[ARM] Convert VPSEL to VMOV in tail predicated loops VPSEL has slightly different semantics under tail predication (it can end up selecting from Qn, Qm and Qd). We do not model that at the moment so they block tail predicated loops from being formed. This just converts them into a predicated VMOV instead (via a VORR), allowing tail predication to happen whilst still modelling the original behaviour of the input. Differential Revision: https://reviews.llvm.org/D85110	2020-08-03 22:03:14 +01:00
Nicholas Guy	18279a54b5	[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present. This was due to the CPSR register being marked as dead, while this case was not accounted for. Differential Revision: https://reviews.llvm.org/D83667	2020-08-03 13:20:32 +01:00
David Green	fd69df62ed	[ARM] Distribute post-inc for Thumb2 sign/zero extending loads/stores This adds sign/zero extending scalar loads/stores to the MVE instructions added in D77813, allowing us to create up more post-inc instructions. These are comparatively simple, compared to LDR/STR (which may be better turned into an LDRD/LDM), but still require some additions over MVE instructions. Because there are i12 and i8 variants of the offset loads/stores dealing with different signs, we may need to convert an i12 address to a i8 negative instruction. t2LDRBi12 can also be shrunk to a tLDRi under the right conditions, so we need to be careful with codesize too. Differential Revision: https://reviews.llvm.org/D78625	2020-08-01 14:01:18 +01:00
Benjamin Kramer	c6f08b14d4	Hide some internal symbols. NFC.	2020-07-31 17:28:02 +02:00
Matt Arsenault	57bd64ff84	Support addrspacecast initializers with isNoopAddrSpaceCast Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs with the DataLayout.	2020-07-31 10:42:43 -04:00

... 4 5 6 7 8 ...

11532 Commits