llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	11b34e78c1	[ARM] Define CPSR on MEMCPY pseudos These pseudos are converted post-isel into t2WhileLoopStart and t2LoopEnd/LoopDec instructions, which themselves are defined to clobber CPSR. Doing the same with the MEMCPY nodes will make sure they are scheduled correctly to not end up with incorrect uses.	2021-05-14 15:06:59 +01:00
Tomas Matheson	34c098b780	[ARM] Prevent spilling between ldrex/strex pairs Based on the same for AArch64: `4751cadcca` At -O0, the fast register allocator may insert spills between the ldrex and strex instructions inserted by AtomicExpandPass when expanding atomicrmw instructions in LL/SC loops. To avoid this, expand to cmpxchg loops and therefore expand the cmpxchg pseudos after register allocation. Required a tweak to ARMExpandPseudo::ExpandCMP_SWAP to use the 4-byte encoding of UXT, since the pseudo instruction can be allocated a high register (R8-R15) which the 2-byte encoding doesn't support. However, the 4-byte encodings are not present for ARM v8-M Baseline. To enable this, two new pseudos are added for Thumb which are only valid for v8mbase, tCMP_SWAP_8 and tCMP_SWAP_16. The previously committed attempt in D101164 had to be reverted due to runtime failures in the test suites. Rather than spending time fixing that implementation (adding another implementation of atomic operations and more divergence between backends) I have chosen to follow the approach taken in D101163. Differential Revision: https://reviews.llvm.org/D101898 Depends on D101912	2021-05-12 09:43:21 +01:00
David Green	76786037c6	[ARM] Fix postinc of vst1xN These nodes are not handled correctly by CombineBaseUpdate. For the moment, similar to `5f1cad4d29` mark them as unsupported.	2021-05-09 21:57:55 +01:00
Malhar Jajoo	dfe3ffaa4a	[ARM] Transforming memset to Tail predicated Loop This patch converts llvm.memset intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE). The llvm.memset is converted to a TP loop for both constant and non-constant input sizes (of llvm.memset). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100435	2021-05-07 13:35:53 +01:00
Malhar Jajoo	9ff38e2d9d	[ARM] Transforming memcpy to Tail predicated Loop This patch converts llvm.memcpy intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE). From an implementation point of view, the patch - adds an ARM specific SDAG Node (to which the llvm.memcpy intrinsic is lowered during the first phase of ISel) - adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter, on matching the above node. - Adds a custom inserter function that expands the pseudo instruction into MIR suitable to be converted (by later passes) into a WLSTP loop. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99723	2021-05-06 23:21:28 +01:00
Malhar Jajoo	fc690777fc	Revert "[ARM] Transforming memcpy to Tail predicated Loop" Reverting commit since it causes failure (10462). This reverts commit `b856f4a232`.	2021-05-06 12:39:08 +01:00
Malhar Jajoo	b856f4a232	[ARM] Transforming memcpy to Tail predicated Loop This patch converts llvm.memcpy intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE). From an implementation point of view, the patch - adds an ARM specific SDAG Node (to which the llvm.memcpy intrinsic is lowered during the first phase of ISel) - adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter, on matching the above node. - Adds a custom inserter function that expands the pseudo instruction into MIR suitable to be converted (by later passes) into a WLSTP loop. Note: A cli option is used to control the conversion of memcpy to TP loop and this option is currently disabled by default. It may be enabled in the future after further downstream testing. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99723	2021-05-06 09:34:09 +01:00
Tomas Matheson	9d86095ff8	Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0" This reverts commit `753185031d`.	2021-05-03 21:48:20 +01:00
Tomas Matheson	753185031d	[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0 atomicrmw instructions are expanded by AtomicExpandPass before register allocation into cmpxchg loops. Register allocation can insert spills between the exclusive loads and stores, which invalidates the exclusive monitor and can lead to infinite loops. To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them after register allocation. Floating point legalisation: f16 ATOMIC_LOAD_FADD(f16, f16) is legalised to f32 ATOMIC_LOAD_FADD(i16, f32) and then eventually f32 ATOMIC_LOAD_FADD_16(*i16, f32) Differential Revision: https://reviews.llvm.org/D101164 Originally submitted as `3338290c18`. Reverted in `c7df6b1223`.	2021-05-03 20:25:15 +01:00
David Green	d1bbe61d1c	[ARM] Memory operands for MVE gathers/scatters Similarly to D101096, this makes sure that MMO operands get propagated through from MVE gathers/scatters to the Machine Instructions. This allows extra scheduling freedom, not forcing the instructions to act as scheduling barriers. We create MMO's with an unknown size, specifying that they can load from anywhere in memory, similar to the masked_gather or X86 intrinsics. Differential Revision: https://reviews.llvm.org/D101219	2021-05-03 11:24:59 +01:00
Tomas Matheson	c7df6b1223	Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0" This reverts commit `3338290c18`. Broke expensive checks on debian.	2021-04-30 16:53:14 +01:00
Tomas Matheson	3338290c18	[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0 atomicrmw instructions are expanded by AtomicExpandPass before register allocation into cmpxchg loops. Register allocation can insert spills between the exclusive loads and stores, which invalidates the exclusive monitor and can lead to infinite loops. To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them after register allocation. Floating point legalisation: f16 ATOMIC_LOAD_FADD(f16, f16) is legalised to f32 ATOMIC_LOAD_FADD(i16, f32) and then eventually f32 ATOMIC_LOAD_FADD_16(*i16, f32) Differential Revision: https://reviews.llvm.org/D101164	2021-04-30 16:40:33 +01:00
David Green	e11420ca23	[ARM] Ensure CSINC has one use in CSINV combine Otherwise the CMP glue may be used in multiple nodes, needing to be emitted multiple times. Currently this either increases instruction count or fails as it attempts to insert the same node multiple times.	2021-04-29 10:59:14 +01:00
David Green	8de7d8b2c2	[ARM] Recognize VIDUP from BUILDVECTORs of additions This adds a pattern to recognize VIDUP from BUILD_VECTOR of incrementing adds. This can come up from either geps or adds, and came up recently in D100550. We are just looking for a BUILD_VECTOR where each lane is an add of the first lane with N*i, where i is the lane and N is one of 1, 2, 4, or 8, supported by the VIDUP instruction. Differential Revision: https://reviews.llvm.org/D101263	2021-04-27 19:33:24 +01:00
David Green	94c7bd7eb2	[ARM] Expand VMOVRRD simplification pattern This expands the VMOVRRD(extract(..(build_vector(a, b, c, d)))) pattern, to also handle insert_vectors. Providing we can find the correct insert, this helps further simplify patterns by removing the redundant VMOVRRD. Differential Revision: https://reviews.llvm.org/D100245	2021-04-26 12:27:38 +01:00
David Green	7255d1f54f	[ARM] Format ARMISD node definitions. NFC This clang-formats the list of ARMISD nodes. Usually this is something I would avoid, but these cause problems with formatting every time new nodes are added. The list in getTargetNodeName also makes use of MAKE_CASE macros, as other backends do.	2021-04-24 14:50:32 +01:00
Sander de Smalen	43ace8b5ce	[TTI] NFC: Change getScalingFactorCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100564	2021-04-23 16:06:36 +01:00
David Green	21a8b9d9e9	[ARM] Limit PerformExtractEltToVMOVRRD to when f64 is legal. The generic SoftFloatVectorExtract.ll test was failing when run on arm machines, as it tries to create a f64 under soft float. Limit the transform to when f64 is legal. Also add a missing override, as reported in D100244.	2021-04-20 16:24:36 +01:00
David Green	48cef1fa8e	[ARM] Create VMOVRRD from adjacent vector extracts This adds a combine for extract(x, n); extract(x, n+1) -> VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the same time in a single instruction, and thanks to the other VMOVRRD folds we have added recently can help reduce the amount of executed instructions. Floating point types are very similar, but will include a bitcast to an integer type. This also adds a shouldRewriteCopySrc, to prevent copy propagation from DPR to SPR, which can break as not all DPR regs can be extracted from directly. Otherwise the machine verifier is unhappy. Differential Revision: https://reviews.llvm.org/D100244	2021-04-20 15:15:43 +01:00
David Green	00a6045473	[ARM] Combine sub 0, csinc X, Y, CC -> csinv -X, Y, CC Combine sub 0, csinc X, Y, CC to csinv -X, Y, CC providing that the negation of X is cheap, currently just handling constants. This comes up during the splat of an i1 to a predicate, where we now generate csetm, as opposed to cset; rsb. Differential Revision: https://reviews.llvm.org/D99940	2021-04-16 11:52:31 +01:00
Simon Pilgrim	ddbb58736a	[KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI. As promised in D98866	2021-04-06 10:11:41 +01:00
David Green	35e0567d58	[ARM] Add VREV MVE shuffle costs This uses the shuffle mask cost from D98206 to give a better cost of MVE VREV instructions. This helps especially in VectorCombine, where the cost of shuffles is used to reorder bitcasts, and keeps the phase ordering test for fp16 reductions producing optimal code. The isVREVMask has been moved to a header file to allow it to be used across target transform and isel lowering. Differential Revision: https://reviews.llvm.org/D98210	2021-03-17 21:21:43 +00:00
David Green	bd516d24c1	[ARM] Move t2DoLoopStart reg alloc hint This adjusts the place that the t2DoLoopStart reg allocation hint is inserted, adding it in the ARMTPAndVPTOptimizaionPass in a similar place as other tail predicated loop optimizations. This removes the need for doing so in a custom inserter, and should make the hint more accurate, only adding it where we expect to create a DLS (not DLSTP or WLS).	2021-03-11 17:56:19 +00:00
David Green	fad70c3068	[ARM] Improve WLS lowering Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little as the instructions are terminators that produce a value - something that needs to be treated carefully. Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together: %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div) %wls0 = extractvalue { i32, i1 } %wls, 0 %wls1 = extractvalue { i32, i1 } %wls, 1 br i1 %wls1, label %loop.ph, label %loop.exit ... loop: %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ] .. %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1) %cmp = icmp ne i32 %iv.next, 0 br i1 %cmp, label %loop, label %loop.exit The llvm.test.start.loop.iterations needs to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc). These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction. %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3 t2B %bb.1 ... bb.2.loop: %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2 ... %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2 t2B %bb.3 The t2WhileLoopStartLR can then be treated similar to the other low overhead loop pseudos, eventually being lowered to a WLS providing the branches are within range. Differential Revision: https://reviews.llvm.org/D97729	2021-03-11 17:56:19 +00:00
David Green	a968e7b82e	[ARM] KnownBits for CSINC/CSNEG/CSINV This adds some simple known bits handling for the three CSINC/NEG/INV instructions. From the operands known bits we can compute the common bits of the first operand and incremented/negated/inverted second operand. The first, especially CSINC ZR, ZR, comes up a fair amount in the tests. The others are more rare so a unit test for them is added. Differential Revision: https://reviews.llvm.org/D97788	2021-03-04 08:40:20 +00:00
David Green	438c98515c	[ARM] Use 0, not ZR during ISel for CSINC/INV/NEG Instead of converting the 0 into a ZR reg during lowering, do that with tablegen by matching the zero immediate. This, when combined with other optimizations, is more likely to use ZR and helps keep the DAG more easily optimizable. It should not otherwise affect code generation.	2021-03-02 19:01:14 +00:00
David Green	91ebc4e864	[ARM] VMOVN undef folding If we insert undef using a VMOVN, we can just use the original value in three out of the four possible combinations. Using VMOVT into an undef vector will still require the lanes to be moved, but otherwise the non-undef value can be used.	2021-02-28 14:44:45 +00:00
David Green	0fe64812d8	[ARM] VECTOR_REG_CAST undef -> undef Propagate undef through VECTOR_REG_CAST nodes, allowing extra simplification in some patterns.	2021-02-28 11:13:49 +00:00
Leonard Chan	c77659e549	[llvm][IR] Do not place constants with static relocations in a mergeable section This patch provides two major changes: 1. Add getRelocationInfo to check if a constant will have static, dynamic, or no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.) 2. Only allow a constant with no relocations (static or dynamic) to be placed in a mergeable section. This will allow unused symbols that contain static relocations and happen to fit in mergeable constant sections (.rodata.cstN) to instead be placed in unique-named sections if -fdata-sections is used and subsequently garbage collected by --gc-sections. See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html. Differential Revision: https://reviews.llvm.org/D95960	2021-02-18 15:39:00 -08:00
Serge Pavlov	816053bc71	[FPEnv][ARM] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96501	2021-02-13 11:16:29 +07:00
David Green	875f0cbcc6	[ARM] Optimize fp store of extract to integer store if already available. Given a floating point store from an extracted vector, with an integer VGETLANE that already exists, storing the existing VGETLANEu directly can be better for performance. As the value is known to already be in an integer register, this can help reduce fp register pressure, removes the need for the fp extract and allows use of more integer post-inc stores not available with vstr. This can be a bit narrow in scope, but helps with certain biquad kernels that store shuffled vector elements. Differential Revision: https://reviews.llvm.org/D96159	2021-02-12 18:34:58 +00:00
David Green	541828e35d	[ARM] Single source VMOVNT Our current lowering of VMOVNT goes via a shuffle vector of the form <0, N, 2, N+2, 4, N+4, ..>. That can of course also be a single input shuffle of the form <0, 0, 2, 2, 4, 4, ..>, where we use a VMOVNT to insert a vector into the top lanes of itself. This adds lowering of that case, re-using the existing isVMOVNMask. Differential Revision: https://reviews.llvm.org/D96065	2021-02-12 14:28:57 +00:00
David Green	1db7b9ceaa	[ARM] Make a BE predicate bitcast consistent with the rest of llvm We were storing predicate registers, such as a <8 x i1>, in the opposite order to how the rest of llvm expects. This actually turns out to be correct for the one place that usually uses it - the ScalarizeMaskedMemIntrin pass, but only because the pass was incorrect itself. This fixes the order so that bits are stored in the opposite order and bitcasts work as expected. This allows the Scalarization pass to be fixed, as in https://reviews.llvm.org/D94765. Differential Revision: https://reviews.llvm.org/D94867	2021-02-11 08:59:52 +00:00
David Green	0c7e044a7f	[ARM] One-off identity shuffle A One-Off Identity mask is a shuffle that is mostly an identity mask from a single source but contains a single element out-of-place, either from a different vector or from another position in the same vector. As opposed to lowering this via an ARMISD::BUILD_VECTOR we can generate an extract/insert pair directly. Under ARM with individually accessible lane elements this often becomes a simple lane move. This also alters the LowerVECTOR_SHUFFLEUsingMovs code to use v4f32 (not v4i32), a more natural type for lane moves. Differential Revision: https://reviews.llvm.org/D95551	2021-02-08 21:24:32 +00:00
David Green	11e415dc90	[ARM] Make v2f64 scalar_to_vector legal Because we mark all operations as expand for v2f64, scalar_to_vector would end up lowering through a stack store/reload. But it is pretty simple to implement, only inserting a D reg into an undef vector. This helps clear up some inefficient codegen from soft calling conventions. Differential Revision: https://reviews.llvm.org/D96153	2021-02-08 11:34:55 +00:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Ayke van Laethem	aecdf15cc7	[ARM] Do not emit ldrexd/strexd on Cortex-M chips The ldrexd/strexd instructions are not supported on M-class chips, see for example https://developer.arm.com/documentation/dui0489/e/arm-and-thumb-instructions/memory-access-instructions/ldrex-and-strex which says: > All these 32-bit Thumb instructions are available in ARMv6T2 and > above, except that LDREXD and STREXD are not available in the ARMv7-M > architecture. Looking at the ARMv8-M architecture, it appears that these instructions aren't supported either. The Architecture Reference Manual lists ldrex/strex but not ldrexd/strexd: https://developer.arm.com/documentation/ddi0553/bn/ Godbolt example on LLVM 11.0.0, which incorrectly emits ldrexd/strexd instructions: https://llvm.godbolt.org/z/5qqPnE Differential Revision: https://reviews.llvm.org/D95891	2021-02-04 21:55:34 +01:00
David Green	649a3d00df	[ARM] Handle f16 in GeneratePerfectShuffle This new f16 shuffle under Neon would hit an assert in GeneratePerfectShuffle as it would try to treat a f16 vector as an i8. Add f16 handling, treating them like an i16. Differential Revision: https://reviews.llvm.org/D95446	2021-02-04 11:14:52 +00:00
David Green	5b2626ea87	[ARM] Flatten identity shuffles through vqdmulh nodes Given a shuffle(vqdmulh(shuffle, shuffle)), we can flatten the shuffles out if they become an identity mask. This can come up during lane interleaving, when we do that better. Differential Revision: https://reviews.llvm.org/D94034	2021-02-01 19:14:20 +00:00
David Green	5805521207	[ARM] Simplify VMOVRRD from extracts of buildvectors Under the softfp calling convention, we are often left with VMOVRRD(extract(bitcast(build_vector(a, b, c, d)))) for the return value of the function. These can be simplified to a,b or c,d directly, depending on the value of the extract. Big endian is a little different because the bitcast switches the lanes around, meaning we end up with b,a or d,c. Differential Revision: https://reviews.llvm.org/D94989	2021-02-01 16:09:25 +00:00
David Green	ad12e6ee95	[ARM] Turn sext_inreg(VGetLaneu) into VGetLanes This adds a DAG combine for converting sext_inreg of VGetLaneu into VGetLanes, providing the types match correctly. Differential Revision: https://reviews.llvm.org/D95073	2021-02-01 11:10:35 +00:00
David Green	6ab792b68d	[ARM] Simplify extract of VMOVDRR Under SoftFP calling conventions, we can be left with extract(bitcast(BUILD_VECTOR(VMOVDRR(a, b), ..))) patterns that can simplify to a or b, depending on the extract lane. Differential Revision: https://reviews.llvm.org/D94990	2021-02-01 10:24:57 +00:00
Kazu Hirata	7925aa091d	[llvm] Populate SmallVector at construction time (NFC)	2021-01-28 22:21:14 -08:00
David Green	40f46cb0e4	[ARM] Add alignment checks for MVE VLDn The MVE VLD2/4 and VST2/4 instructions require the pointer to be aligned to at least the size of the element type. This adds a check for that into the ARM lowerInterleavedStore and lowerInterleavedLoad functions, not creating the intrinsics if they are invalid for the alignment of the load/store. Unfortunately this is one of those bug fixes that does affect some useful codegen, as we were able to sometimes do some nice lowering of q15 types. But they can cause problems with low aligned pointers. Differential Revision: https://reviews.llvm.org/D95319	2021-01-28 13:10:08 +00:00
Kazu Hirata	054444177b	[Target] Use llvm::append_range (NFC)	2021-01-24 12:18:56 -08:00
Kazu Hirata	e4847a7fcf	Revert "[Target] Use llvm::append_range (NFC)" This reverts commit `cc7a238286`. The X86WinEHState.cpp hunk seems to break certain builds.	2021-01-23 11:25:27 -08:00
Kazu Hirata	cc7a238286	[Target] Use llvm::append_range (NFC)	2021-01-23 10:56:31 -08:00
David Green	af03324984	[ARM] Disable sign extended SSAT pattern recognition. I may have given bad advice, and skipping sext_inreg when matching SSAT patterns is not valid on its own. It at least needs to sext_inreg the input again, but as far as I can tell is still only valid based on demanded bits. For the moment disable that part of the combine, hopefully reimplementing it in the future more correctly.	2021-01-22 14:07:48 +00:00
David Green	9ae73cdbc1	[ARM] Adjust isSaturatingConditional to return a new SDValue. NFC This replaces the isSaturatingConditional function with LowerSaturatingConditional that directly returns a new SSAT or USAT SDValue, instead of returning true and the components of it.	2021-01-22 11:11:36 +00:00
David Green	6a563eef13	[ARM] Expand vXi1 VSELECT's We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as expanded to turn them into a series of logical operations. Differential Revision: https://reviews.llvm.org/D94946	2021-01-19 17:56:50 +00:00
David Green	c29ca8551a	[ARM] Update isVMOVNOriginalMask to handle single input shuffle vectors The isVMOVNOriginalMask was previously only checking for two input shuffles that could be better expanded as vmovn nodes. This expands that to single input shuffles that will later be legalized to multiple vectors. Differential Revision: https://reviews.llvm.org/D94189	2021-01-13 08:51:28 +00:00
David Green	024af42c60	[ARM] Custom lower i1 vector truncates The ISel patterns we have for truncating to i1's under MVE do not seem to be correct. Instead custom lower to icmp(ne, and(x, 1), 0). Differential Revision: https://reviews.llvm.org/D94226	2021-01-08 18:21:00 +00:00
David Green	ddb82fc76c	[ARM] Handle any extend whilst lowering mull Similar to `78d8a821e2` but for ARM, this handles any_extend whilst creating MULL nodes, treating them as zextends. Differential Revision: https://reviews.llvm.org/D93834	2021-01-06 10:51:12 +00:00
David Green	901cc9b6f3	[ARM] Extend lowering for i64 reductions The lowering of a <4 x i16> or <4 x i8> vecreduce.add into an i64 would previously be expanded, due to the i64 not being legal. This patch adjusts our reduction matchers, making it produce a VADDLV(sext A to v4i32) instead. Differential Revision: https://reviews.llvm.org/D93622	2021-01-04 12:44:43 +00:00
Kazu Hirata	0e219b6443	[Target] Construct SmallVector with iterator ranges (NFC)	2021-01-03 09:57:45 -08:00
Kristof Beyls	df8ed39283	[ARM] harden-sls-blr: avoid r12 and lr in indirect calls. As a linker is allowed to clobber r12 on function calls, the code transformation that hardens indirect calls is not correct in case a linker does so. Similarly, the transformation is not correct when register lr is used. This patch makes sure that r12 or lr are not used for indirect calls when harden-sls-blr is enabled. Differential Revision: https://reviews.llvm.org/D92469	2020-12-19 12:39:59 +00:00
David Green	03e675fd12	[ARM] Turn pred_cast(xor(x, -1)) into xor(pred_cast(x), -1) This folds a not (an xor -1) through a predicate_cast, so that it can be turned into a VPNOT and potentially be folded away as an else predicate inside a VPT block. Differential Revision: https://reviews.llvm.org/D92235	2020-12-08 15:22:46 +00:00
David Green	eedf0ed63e	[ARM] Mark select and selectcc of MVE vector operations as expand. We already expand select and select_cc in codegenprepare, but they can still be generated under some situations. Explicitly mark them as expand to ensure they are not produced, leading to a failure to select the nodes. Differential Revision: https://reviews.llvm.org/D92373	2020-12-01 15:05:55 +00:00
David Green	7923d71b4a	[ARM] PREDICATE_CAST demanded bits The PREDICATE_CAST node is used to model moves between MVE predicate registers and gpr's, and eventually becomes a VMSR p0, rn. When moving to a predicate only the bottom 16 bits of the source register are demanded. This adds a simple fold for that, allowing it to potentially remove instructions like uxth. Differential Revision: https://reviews.llvm.org/D92213	2020-12-01 10:32:24 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
David Green	73a6cd4b6b	[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it easier to move if needed (as the input is the same as the output), or potentially remove entirely. The hint is added after others (from COPY's etc) which still take precedence. It needed to find a place to add the hint, which currently uses the post isel custom inserter. Differential Revision: https://reviews.llvm.org/D89883	2020-11-10 16:28:57 +00:00
David Green	d14db8c8dc	[ARM] Match MVE vqdmulh This adds ISel matching for a form of VQDMULH. There are several ir patterns that we could match to that instruction, this one is for: min(ashr(mul(sext(a), sext(b)), 7), 127) Which is what llvm will optimize to once it has removed the max that usually makes up the min/max saturate pattern, as in this case the compare will always be false. The additional complication to match i32 patterns (which extend into an i64) is that the min will be a vselect/setcc, as vmin is not supported for i64 vectors. Tablegen patterns have also been updated to attempt to reuse the MVE_TwoOpPattern patterns. Differential Revision: https://reviews.llvm.org/D90096	2020-10-30 13:34:27 +00:00
David Sherwood	47f2dc7e5f	[SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets In most of lib/Target we know that we are not dealing with scalable types so it's perfectly fine to replace TypeSize comparison operators with their fixed width equivalents, making use of getFixedSize() and so on. Differential Revision: https://reviews.llvm.org/D89101	2020-10-15 09:01:21 +01:00
Sam Tebbs	68e002e181	[ARM] Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV. Differential Revision: https://reviews.llvm.org/D87836	2020-10-06 14:44:58 +01:00
Amara Emerson	c9f5cdd453	Revert "[ARM]Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV" This reverts commit `2573cf3c3d`. These seem to break some lit tests.	2020-10-05 10:52:43 -07:00
Sam Tebbs	2573cf3c3d	[ARM]Fold select_cc(vecreduce_[u\|s][min\|max], x) into VMINV or VMAXV This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV. Differential Revision: https://reviews.llvm.org/D87836	2020-10-05 15:51:28 +01:00
David Green	7feafa0286	[ARM] Fix pointer offset when splitting stores from VMOVDRR We were not accounting for the pointer offset when splitting a store from a VMOVDRR node, which could lead to incorrect aliasing info. In this case it is the fneg via integer arithmetic that gives us a store->load pair that we started getting wrong. Differential Revision: https://reviews.llvm.org/D88653	2020-10-03 16:47:50 +01:00
David Sherwood	e077367a28	[SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits An existing function Type::getScalarSizeInBits returns a uint64_t instead of a TypeSize class because the caller is requesting a scalar size, which cannot be scalable. This patch makes other similar functions requesting a scalar size consistent with that, thereby eliminating more than 1000 implicit TypeSize -> uint64_t casts. Differential revision: https://reviews.llvm.org/D87889	2020-09-23 09:20:08 +01:00
Momchil Velikov	742250bf62	[ARM][CMSE] Issue an error if passing arguments through memory across security boundary It was never supported and that part was accidentally omitted when upstreaming D76518. Differential Revision: https://reviews.llvm.org/D86478 Change-Id: If6ba9506eb0431c87a1d42a38aa60e47ce263039	2020-09-21 17:26:10 +01:00
David Green	f4c5cadbcb	[ARM] Select f32 constants with vmov.f16 This adds lowering for f32 values using the vmov.f16, which zeroes the top bits whilst setting the lower bits to a pattern. This range of values does not often come up, except where a f16 constant value has been converted to a f32. Differential Revision: https://reviews.llvm.org/D87790	2020-09-21 11:10:47 +01:00
David Green	29bd8ea110	[ARM] Constant fold VMOVrh This adds simple constant folding for VMOVrh, to constant fold fp16 constants to integer values. It can help especially with soft calling conventions, but some of the results are not optimal as we end up loading using a vldr. This will be improved in a follow up patch. Differential Revision: https://reviews.llvm.org/D87789	2020-09-20 21:32:51 +01:00
David Green	34b27b9441	[ARM] Sink splats to MVE intrinsics The predicated MVE intrinsics are generated as, for example, llvm.arm.mve.add.predicated(x, splat(y), p). We need to sink the splat value back into the loop, like we do for other instructions, so we can re-select qr variants. Differential Revision: https://reviews.llvm.org/D87693	2020-09-17 16:00:51 +01:00
Meera Nakrani	1119bf95be	[ARM] Corrected condition in isSaturatingConditional Fixed a small error in an if condition to prevent usat/ssat being generated if (upper constant + 1) is not a power of 2.	2020-09-15 10:14:30 +00:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Meera Nakrani	dd519bf0b0	[ARM] Selects SSAT/USAT from correct LLVM IR LLVM will canonicalize conditional selectors to a different pattern than the old code that was used. This is updating the function to match the new expected patterns and select SSAT or USAT when successful. Tests have also been updated to use the new patterns. Differential Review: https://reviews.llvm.org/D87379	2020-09-14 10:58:21 +00:00
David Green	6cfd38d03d	[ARM] Fixup single source mla reductions. This fixes a complication on top of D87276. If we are sign extending around a mul with the two operands that are the same, instcombine will helpfully convert one of the sext to a zext. Reverse that so that we again generate a reduction. Differential Revision: https://reviews.llvm.org/D87287	2020-09-12 14:31:26 +01:00
David Green	c437446d90	[ARM] Recognize "double extend" reduction patterns We can sometimes get code that does: xe = zext i16 x to i32 ye = zext i16 y to i32 m = mul i32 xe, ye me = zext i32 m to i64 r = vecreduce.add(me) This "double extend" can trip up the reduction identification, but should give identical results. This extends the pattern matching to handle them. Differential Revision: https://reviews.llvm.org/D87276	2020-09-12 13:51:42 +01:00
Sjoerd Meijer	5f1cad4d29	[ARM] Skip combining base updates for vld1x NEON intrinsics Skip this for now, to avoid a backend crash in: UNREACHABLE executed at llvm/lib/Target/ARM/ARMISelLowering.cpp:13412 This should fix PR45824. Differential Revision: https://reviews.llvm.org/D86784	2020-08-28 20:29:15 +01:00
Anna Welker	8048068c3e	[ARM][MVE] Allow tail predication for strides !=1 with gather/scatters If gather/scatters are enabled, ARMTargetTransformInfo now allows tail predication for loops with a much wider range of strides, up to anything that is loop invariant. Differential Revision: https://reviews.llvm.org/D85410	2020-08-24 13:54:47 +01:00
Kerry McLaughlin	85c7e89f3b	[CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize Changes the Offset arguments to both functions from int64_t to TypeSize & updates all uses of the functions to create the offset using TypeSize::Fixed() Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85220	2020-08-11 12:17:10 +01:00
David Green	186a7f81e8	[ARM] Add VADDV and VMLAV patterns for v16i16 This adds patterns for v16i16's vecreduce, using all the existing code to go via an i32 VADDV/VMLAV and truncating the result. Differential Revision: https://reviews.llvm.org/D85452	2020-08-09 11:09:49 +01:00
David Green	b37e92201c	[ARM] Add predicated mla reduction patterns Similar to `8fa824d7a3` but this time for MLA patterns, this selects predicated vmlav/vmlava/vmlalv/vmlalva instructions from vecreduce.add(select(p, mul(x, y), 0)) nodes. Differential Revision: https://reviews.llvm.org/D84102	2020-07-23 21:47:59 +01:00
David Green	8fa824d7a3	[ARM] Add predicated add reduction patterns Given a vecreduce.add(select(p, x, 0)), we can convert that to a predicated vaddv, as the else value for the select is the identity value, a zero. That is what this patch does for the vaddv, vaddva, vaddlv and vaddlva instructions, copying the existing patterns to also handle predication through a select. Differential Revision: https://reviews.llvm.org/D84101	2020-07-22 17:30:02 +01:00
Pavel Iliin	b9a6fb6428	[ARM] VBIT/VBIF support added. Vector bitwise selects are matched by the pseudo VBSP instruction and expanded to VBSL/VBIT/VBIF after register allocation, depending on the operand registers, to minimize extra copies.	2020-07-16 11:25:53 +01:00
Guillaume Chatelet	87e2751cf0	[Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D83082	2020-07-03 08:06:43 +00:00
Guillaume Chatelet	d3085c2501	[Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82956	2020-07-01 14:31:56 +00:00
Guillaume Chatelet	28de229bc6	[Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82894	2020-07-01 07:28:11 +00:00
Guillaume Chatelet	4f5133a4dc	[Alignment][NFC] Migrate AArch64, ARM, Hexagon, MSP and NVPTX backends to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82749	2020-06-30 07:56:17 +00:00
David Green	deb72ce298	[ARM] Better reductions MVE has native reductions for integer add and min/max. The others need to be expanded to a series of extract's and scalar operators to reduce the vector into a single scalar. The default codegen for that expands the reduction into a series of in-order operations. This modifies that to something more suitable for MVE. The basic idea is to use vector operations until there are 4 remaining items then switch to pairwise operations. For example a v8f16 fadd reduction would become: Y = VREV X Z = ADD(X, Y) z0 = Z[0] + Z[1] z1 = Z[2] + Z[3] return z0 + z1 The awkwardness (there is always some) comes in from something like a v4f16, which is first legalized by adding identity values to the extra lanes of the reduction, and which can then not be optimized away through the vrev; fadd combo, the inserts remain. I've made sure they custom lower so that we can produce the pairwise additions before the extra values are added. Differential Revision: https://reviews.llvm.org/D81397	2020-06-29 16:04:13 +01:00
Guillaume Chatelet	368a5e3a66	[Alignment][NFC] migrate DataLayout::getPreferredAlignment This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82752	2020-06-29 11:24:36 +00:00
Simon Pilgrim	973685fc78	[TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.	2020-06-29 11:46:58 +01:00
David Green	d79b57b8bb	[ARM] Split FPExt loads This extends PerformSplittingToWideningLoad to also handle FP_Ext, as well as sign and zero extends. It uses an integer extending load followed by a VCVTL on the bottom lanes to efficiently perform an fpext on a smaller than legal type. The existing code had to be rewritten a little to not just split the node in two and let legalization handle it from there, but to actually split into legal chunks. Differential Revision: https://reviews.llvm.org/D81340	2020-06-25 21:55:13 +01:00
David Green	8532b2ee89	[ARM] MVE VCVT lowering for f16->f32 extends This adds code to lower f16 to f32 fp_exts using MVE VCVT instructions, similar to a recent patch for fp_trunc. Again it goes through the lowering of a BUILD_VECTOR, but is slightly simpler, only having to deal with interleaved indices. It adds a VCVTL node to lower to, similar to VCVTN. Differential Revision: https://reviews.llvm.org/D81339	2020-06-25 20:54:26 +01:00
David Green	0bfb4c2506	[ARM] Add FP_ROUND handling to splitting MVE stores This splits MVE vector stores of a fp_trunc in the same way that we do for standard trunc's. It extends PerformSplittingToNarrowingStores to handle fp_round, splitting the store into pieces and adding a VCVTNb to perform the actual fp_round. The actual store is then converted to an integer store so that it can truncate bottom lanes of the result. Differential Revision: https://reviews.llvm.org/D81141	2020-06-25 19:37:15 +01:00
David Green	b044a82270	[ARM] Fixup for signed comparison warning. NFC	2020-06-25 16:29:44 +01:00
David Green	3cb2190b0b	[ARM] MVE VCVT lowering for f32->f16 truncs This adds code to lower f32 to f16 fp_trunc's using a pair of MVE VCVT instructions. Due to v4f16 not being legal, fp_rounds are often split up fairly early. So this reconstructs the vcvt's from a buildvector of fp_rounds from two vector inputs. Something like: BUILDVECTOR(FP_ROUND(EXTRACT_ELT(X, 0)), FP_ROUND(EXTRACT_ELT(Y, 0)), FP_ROUND(EXTRACT_ELT(X, 1)), FP_ROUND(EXTRACT_ELT(Y, 1)), ...) It adds a VCVTN node to handle this, which like VMOVN or VQMOVN lowers into the top/bottom lanes of an MVE instruction. Differential Revision: https://reviews.llvm.org/D81139	2020-06-25 15:59:36 +01:00
Simon Tatham	b769eb02b5	[ARM][BFloat] Legalize bf16 type even without fullfp16. Summary: This change permits scalar bfloats to be loaded, stored, moved and used as function call arguments and return values, whenever the bf16 feature is supported by the subtarget. Previously that was only supported in the presence of the fullfp16 feature, because the code generation strategy depended on instructions from that extension. This change adds alternative code generation strategies so that those operations can be done even without fullfp16. The strategy for loads and stores is to replace VLDRH/VSTRH with integer LDRH/STRH plus a move between register classes. I've written isel patterns for those, conditional on //not// having the fullfp16 feature (so that in the fullfp16 case, the existing patterns will still be used). For function arguments and returns, instead of writing isel patterns to match `VMOVhr` and `VMOVrh`, I've avoided generating those SDNodes in the first place, by factoring out the code that constructs them into helper functions `MoveToHPR` and `MoveFromHPR` which have a fallback for non-fullfp16 subtargets. The current output code is not especially pretty: in the new test file you can see unnecessary store/load pairs implementing no-op bitcasts, and lots of pointless moves back and forth between FP registers and GPRs. But it at least works, which is an improvement on the previous situation. Reviewers: dmgreen, SjoerdMeijer, stuij, chill, miyuki, labrinea Reviewed By: dmgreen, labrinea Subscribers: labrinea, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82372	2020-06-24 09:36:26 +01:00
Eli Friedman	e9d4e34ab8	[AArch64][SVE] Add legalization support for i32/i64 vector srem/urem Implement them on top of sdiv/udiv, similar to what we do for integer types. Potential future work: implementing i8/i16 srem/urem, optimizations for constant divisors, optimizing the mul+sub to mls. Differential Revision: https://reviews.llvm.org/D81511	2020-06-23 16:27:52 -07:00
Alexandros Lamprineas	ecdf48f15b	[ARM] Basic bfloat support This patch adds basic support for BFloat in the Arm backend. For now the code generation relies on fullfp16 being present. Briefly: * adds the bfloat scalar and vector types in the necessary register classes, * adjusts the calling convention to cope with bfloat argument passing and return, * adds codegen patterns for moves, loads and stores. It's tested mostly by the intrinsic patches that depend on it (load/store, convert/copy). The following people contributed to this patch: * Alexandros Lamprineas * Ties Stuij Differential Revision: https://reviews.llvm.org/D81373	2020-06-18 17:26:24 +01:00
Lucas Prates	92ad6d57c2	[ARM] Moving CMSE handling of half arguments and return to the backend Summary: As half-precision floating point arguments and returns were previously coerced to either float or int32 by clang's codegen, the CMSE handling of those was also performed on clang's side by zeroing the unused MSBs of the coerced values. This patch moves this handling to the backend's calling convention lowering, making sure the high bits of the registers used by half-precision arguments and returns are zeroed. Reviewers: chill, rjmccall, ostannard Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D81428	2020-06-18 13:16:29 +01:00
Lucas Prates	a255931c40	[ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend Summary: Half-precision floating point arguments and returns are currently promoted to either float or int32 in clang's CodeGen and there's no existing support for the lowering of `half` arguments and returns from IR in AArch32's backend. Such frontend coercions, implemented as coercion through memory in clang, can cause a series of issues in argument lowering, such as arguments being stored in the wrong bits on big-endian architectures and overflow detection being missed in the return of certain functions. This patch introduces the handling of half-precision arguments and returns in the backend using the actual "half" type on the IR. Using the "half" type the backend is able to properly enforce the AAPCS' directions for those arguments, making sure they are stored in the proper bits of the registers and performing the necessary floating point conversions. Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer Reviewed By: ostannard Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75169	2020-06-18 13:15:13 +01:00
David Green	158e734af1	[ARM] Adjust AND/OR combines to not call isConstantSplat on i1 vectors. NFC. This rearranges PerformANDCombine and PerformORCombine to try and make sure we don't call isConstantSplat on any i1 vectors. As pointed out in D81860 it may not be very well defined in those cases.	2020-06-18 08:25:44 +01:00
David Green	f269bb7da0	[ARM] Fix crash trying to generate i1 immediates These code patterns attempt to call isVMOVModifiedImm on a splat of i1 values, leading to an unreachable being hit. I've guarded the call on a more specific set of sizes, as i1 vectors are legal under MVE. Differential Revision: https://reviews.llvm.org/D81860	2020-06-16 12:27:24 +01:00
Eli Friedman	7e58d0ded0	Revert "[arm][darwin] Don't generate libcalls for wide shifts on Darwin" This reverts commit `2ba016cd5c`. This is causing a failure on the clang-cmake-armv7-full bot, and there are outstanding review comments.	2020-06-08 16:37:29 -07:00
Guillaume Chatelet	94b0c32a0b	[Alignment][NFC] Migrate HandleByVal to Align Summary: Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::HandleByVal` without marking it `override`. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81365	2020-06-08 10:50:27 +00:00
Simon Wallis	7432fb2c78	[ARM][XO] Execute-only miscompiles double literals for big-endian Summary: With -mbig-endian -mexecute-only and targeting an fpu, an incorrect sequence of movw/movt was generated to construct a double literal. The test suite was hardwired to check these wrong values. The fault was caused by the explicit word swap in LowerConstantFP(). With -mbig-endian -mexecute-only -mfpu=none, a correct sequence of movw/movt is generated to construct a double literal. The test suite did not test this no fpu case. The test suite expected values have been corrected. The test file is updated to add testing of fpu=none case Reviewers: christof, llvm-commits, dmgreen Reviewed By: dmgreen Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss Tags: #llvm Differential Revision: https://reviews.llvm.org/D81259 Change-Id: Ia3737df243218c89c82f02b7f9f4032ecd5a3917	2020-06-08 08:13:08 +01:00
Alex Lorenz	2ba016cd5c	[arm][darwin] Don't generate libcalls for wide shifts on Darwin Similar to `ceb801612a`. Darwin doesn't always use compiler-rt, and so we can't assume that these functions are available on arm.	2020-06-05 15:41:23 -07:00
Fangrui Song	d2bd075e8d	Fix -Wunused-variable after D80515	2020-06-05 11:46:50 -07:00
David Green	e73bb45c2b	[ARM] VQMOVN demand bits analysis Similar to VMOVN, a VQMOVN will only demand the top/bottom lanes of its first input. However, unlike VMOVN, it will need access to the entire second argument, as that value is saturated, not just moved in place. Differential Revision: https://reviews.llvm.org/D80515	2020-06-05 18:41:02 +01:00
Zequan Wu	80e107ccd0	Add NoMerge MIFlag to avoid MIR branch folding Let the codegen recognize the nomerge attribute and disable branch folding when the attribute is given. Differential Revision: https://reviews.llvm.org/D79537	2020-05-29 12:31:06 -07:00
Victor Campos	c010d4d195	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. The code generation explicitly deviates from using the register-offset variant of LDRD/STRD. In this variant, the register allocated to the register-offset cannot be reused in any of the remaining operands. Such restriction seems to be non-trivial to implement in LLVM, thus it is left as a to-do. Differential Revision: https://reviews.llvm.org/D70072	2020-05-28 10:52:43 +01:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Victor Campos	872ee78f65	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `8a12553223`. A bug has been found when generating code for Thumb2. In some very specific cases, the prologue/epilogue emitter generates erroneous stack offsets for the new LDRD instructions that access the stack. This bug does not seem to be caused by the reverted patch though. Likely the latter has made an undiscovered issue emerge in the prologue/epilogue emission pass. Nevertheless, this reversion is necessary since it is blocking users of the ARM backend.	2020-05-22 11:01:57 +01:00
David Green	72f1fb2edf	[ARM] Combines for VMOVN This adds two combines for VMOVN, one to fold VMOVN[tb](c, VQMOVNb(a, b)) => VQMOVN[tb](c, b) The other to perform demand bits analysis on the lanes of a VMOVN. We know that only the bottom lanes of the second operand and the top or bottom lanes of the Qd operand are needed in the result, depending on if the VMOVN is bottom or top. Differential Revision: https://reviews.llvm.org/D77718	2020-05-16 15:13:16 +01:00
David Green	2e1fbf85b6	[ARM] MVE saturating truncates This adds some custom lowering for VQMOVN, an instruction that can be used to perform saturating truncates from a pair of min(max(X, -0x8000), 0x7fff), providing those constants are correct. This leaves a VQMOVNBs which saturates the value and inserts that into the bottom lanes of an existing vector. We then need to do something with the other lanes, extending the value using a vmovlb. Ideally, as will often be the case, only the bottom lane of what remains will be demanded, allowing the vmovlb to be removed. Which should mean the instruction is either equal or a win most of the time, and allows some extra follow-up folding to happen. Differential Revision: https://reviews.llvm.org/D77590	2020-05-16 15:10:20 +01:00
Christopher Tetreault	245679b62e	[SVE] Remove usages of VectorType::getNumElements() from ARM Reviewers: efriedma, fpetrogalli, kmclaughlin, grosbach, dmgreen Reviewed By: dmgreen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, dmgreen, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79816	2020-05-15 12:55:27 -07:00
Momchil Velikov	bc2e572f51	Re-commit: [ARM] CMSE code generation This patch implements the final bits of CMSE code generation: * emit special linker symbols * restrict parameter passing to not use memory * emit BXNS and BLXNS instructions for returns from non-secure entry functions, and non-secure function calls, respectively * emit code to save/restore secure floating-point state around calls to non-secure functions * emit code to save/restore non-secure floating-point state upon entry to non-secure entry function, and return to non-secure state * emit code to clobber registers not used for arguments and returns when switching to non-secure state Patch by Momchil Velikov, Bradley Smith, Javed Absar, David Green, possibly others. Differential Revision: https://reviews.llvm.org/D76518	2020-05-14 16:46:16 +01:00
David Green	fa15255d8a	[ARM] Convert floating point splats to integer Under MVE a vdup will always take a gpr register, not a floating point value. During DAG combine we convert the types to a bitcast to an integer in an attempt to fold the bitcast into other instructions. This is OK, but only works inside the same basic block. To do the same trick across a basic block boundary we need to convert the type in codegenprepare, before the splat is sunk into the loop. This adds a convertSplatType function to codegenprepare to do that, putting bitcasts around the splat to force the type to an integer. There is then some adjustment to the code in shouldSinkOperands to handle the extra bitcasts. Differential Revision: https://reviews.llvm.org/D78728	2020-05-13 15:24:16 +01:00
David Green	87c56594dd	[ARM] Sink splats to fma intrinsics Similar to fmul/fadd, we can sink a splat into a loop containing a fma in order to use more register instruction variants. For that there are also adjustments to the sinking code to handle more than 2 arguments. Differential Revision: https://reviews.llvm.org/D78386	2020-05-13 14:58:30 +01:00
Craig Topper	8c72b0271b	[CodeGen] Use Align in MachineConstantPool.	2020-05-12 10:06:40 -07:00
David Green	6eee2d9b5b	[ARM] Convert VDUPLANE to VDUP under MVE Unlike Neon, MVE does not have a way of duplicating from a vector lane, so a VDUPLANE currently selects to a VDUP(move_from_lane(..)). This forces that to be done earlier as a dag combine to allow other folds to happen. It converts to a VDUP(EXTRACT). On FP16 this is then folded to a VGETLANEu to prevent it from creating a vmovx;vmovhr pair, using a single move_from_reg instead. Differential Revision: https://reviews.llvm.org/D79606	2020-05-09 18:58:13 +01:00
Craig Topper	d1119980e5	[SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode. This patch stores the alignment for ConstantPoolSDNode as an Align and updates the getConstantPool interface to take a MaybeAlign. Removing getAlignment() will be done as a follow up. Differential Revision: https://reviews.llvm.org/D79436	2020-05-08 16:04:11 -07:00
David Green	f5f83cf4df	[ARM] VMOVhr load -> vldr Much like the similar combine added recently for VMOVrh load, this adds a fold for VMOVhr load turning it into a vldr.f16 as opposed to a vldrh and vmov.f16. Differential Revision: https://reviews.llvm.org/D78714	2020-05-06 15:45:56 +01:00
David Green	d05f8a38c5	[ARM] VMOVrh of VMOVhr A VMOVhr of a VMOVrh can be simply folded to the original HPR value. Differential Revision: https://reviews.llvm.org/D78710	2020-05-06 15:10:01 +01:00
David Green	a349949f8a	[ARM] Extract from a VDUP If we get into the situation where we are extracting from a VDUP, the extracted value is just the origin, so long as the types match or we can bitcast between the two. Differential Revision: https://reviews.llvm.org/D78708	2020-05-06 14:51:25 +01:00
David Green	ed7db68c35	[ARM] Convert a bitcast VDUP to a VDUP The idea, under MVE, is to introduce more bitcasts around VDUP's in an attempt to get the type correct across basic block boundaries. In order to do that without other regressions we need a few fixups, of which this is the first. If the code is a bitcast of a VDUP, we can convert that straight into a VDUP of the new type, so long as they have the same size. Differential Revision: https://reviews.llvm.org/D78706	2020-05-06 14:14:21 +01:00
Momchil Velikov	fb18dffaeb	Revert "[ARM] CMSE code generation" This reverts commit `7cbbf89d23`. The regression tests fail with the expensive checks.	2020-05-05 19:05:40 +01:00
Momchil Velikov	7cbbf89d23	[ARM] CMSE code generation This patch implements the final bits of CMSE code generation: * emit special linker symbols * restrict parameter passing to not use memory * emit BXNS and BLXNS instructions for returns from non-secure entry functions, and non-secure function calls, respectively * emit code to save/restore secure floating-point state around calls to non-secure functions * emit code to save/restore non-secure floating-point state upon entry to non-secure entry function, and return to non-secure state * emit code to clobber registers not used for arguments and returns when switching to non-secure state Patch by Momchil Velikov, Bradley Smith, Javed Absar, David Green, possibly others. Differential Revision: https://reviews.llvm.org/D76518	2020-05-05 18:23:28 +01:00
David Green	f85acb1915	[ARM] Correct the type on a predicate cast A PREDICATE_CAST(PREDICATE_CAST(X)) can be converted to a PREDICATE_CAST(X) as the operation can convert between any forms of predicates (v4i1/v8i1/v16i1/i32). Unfortunately I got the type wrong on one of the rarer converts, which would lead to invalid nodes during isel. This fixes it up to use the correct type. Differential Revision: https://reviews.llvm.org/D79402	2020-05-05 13:15:10 +01:00
Pierre-vh	d5eb7ffa33	[Target][ARM] Fold or(A, B) more aggressively for I1 vectors This patch makes the folding of or(A, B) into not(and(not(A), not(B))) more aggressive for I1 vectors. This only affects Thumb2 MVE and improves codegen, because it removes a lot of msr/mrs instructions on VPR.P0. This patch also adds a xor(vcmp) -> !vcmp fold for MVE. Differential Revision: https://reviews.llvm.org/D77202	2020-05-05 10:03:02 +01:00
Pierre-vh	ffdda495f7	[Target][ARM] Add PerformVSELECTCombine for MVE Integer Ops This patch adds an implementation of PerformVSELECTCombine in the ARM DAG Combiner that transforms vselect(not(cond), lhs, rhs) into vselect(cond, rhs, lhs). Normally, this should be done by the target-independent DAG Combiner, but it doesn't handle the kind of constants that we generate, so we have to reimplement it here. Differential Revision: https://reviews.llvm.org/D77712	2020-05-05 10:03:02 +01:00
Eli Friedman	1eb160fe8d	[ARM] Fix tail call validity checking for varargs calls. If a varargs function is calling a non-varargs function, or vice versa, make sure we use the correct "varargs" bit for each. Fixes https://bugs.llvm.org/show_bug.cgi?id=45234 Differential Revision: https://reviews.llvm.org/D79199	2020-05-04 12:34:14 -07:00
David Green	1084b32339	[ARM] Always replace FP16 bitcasts with VMOVhr or VMOVrh This changes the logic with lowering fp16 bitcasts to always produce either a VMOVhr or a VMOVrh, instead of only trying to do it with certain surrounding nodes. To perform the same optimisations demand bits and known bits information has been added for them. Differential Revision: https://reviews.llvm.org/D78587	2020-04-28 16:12:53 +01:00
Craig Topper	a58b62b4a2	[IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand(). This method has been commented as deprecated for a while. Remove it and replace all uses with the equivalent getCalledOperand(). I also made a few cleanups in here. For example, to remove uses of getElementType on a pointer when we could just use getFunctionType from the call. Differential Revision: https://reviews.llvm.org/D78882	2020-04-27 22:17:03 -07:00
David Green	8807139026	[ARM] Only produce qadd8b under hasV6Ops When compiling for an arm5te cpu from clang, the +dsp attribute is set. This meant we could try and generate qadd8 instructions where we would end up having no pattern. I've changed the condition here to be hasV6Ops && hasDSP, which is what other parts of ARMISelLowering seem to use for similar instructions. Fixed PR45677. Differential Revision: https://reviews.llvm.org/D78877	2020-04-27 10:13:29 +01:00
Benjamin Kramer	166467e822	[VectorUtils] Create shufflevector masks as int vectors instead of Constants No functionality change intended.	2020-04-17 15:28:00 +02:00
Christopher Tetreault	0badd8f613	[SVE] Remove calls to getBitWidth from ARM Reviewers: efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77904	2020-04-14 10:56:38 -07:00
Craig Topper	113f37a1f9	[CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase Differential Revision: https://reviews.llvm.org/D77995	2020-04-13 13:50:15 -07:00
Christopher Tetreault	e1e131ea5e	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: grosbach, efriedma, sdesmalen Reviewed By: efriedma Subscribers: hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77271	2020-04-09 12:52:44 -07:00
Matt Arsenault	84aa58cbe2	CodeGen: Use Register in TargetLowering	2020-04-08 12:10:58 -04:00
Oliver Stannard	a294d9eb21	Revert "[IPRA][ARM] Spill extra registers at -Oz" Reverting because this is causing failures on bots with expensive checks enabled. This reverts commit `73cea83a6f`.	2020-04-06 10:34:59 +01:00
John Brawn	4ad9ca0f9e	[ARM] Fix incorrect handling of big-endian vmov.i64 Currently when the target is big-endian vmov.i64 reverses the order of the two words of the vector. This is correct only when the underlying element type is 32-bit, as actually what it should be doing is considering it a vector of the underlying type and reversing the elements of that. Differential Revision: https://reviews.llvm.org/D76515	2020-04-03 17:36:50 +01:00
John Brawn	cd58fb6325	[ARM] Avoid pointless vrev of element-wise vmov If we have an element-wise vmov immediate instruction followed by a vrev with width greater than or equal to the vmov element width, then that vrev won't do anything. Add a DAG combine to convert bitcasts that would become such vrevs into vector_reg_casts instead. Differential Revision: https://reviews.llvm.org/D76514	2020-04-03 17:36:50 +01:00
David Green	fbd53ffc3a	[ARM] MVE VMULL patterns This adds MVE vmull patterns, which are conceptually the same as mul(vmovl, vmovl), and so the tablegen patterns follow the same structure. For i8 and i16 this is simple enough, but in the i32 version the multiply (in 64bits) is illegal, meaning we need to catch the pattern earlier in a dag fold. Because bitcasts are involved in the zext versions and the patterns are a little different in little and big endian, I have only added little endian support in this patch. Differential Revision: https://reviews.llvm.org/D76740	2020-04-02 10:57:40 +01:00
Guillaume Chatelet	189d2e215f	[Alignment][NFC] Use more Align versions of various functions Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, arsenm, sdardis, jvesely, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77291	2020-04-02 09:00:53 +00:00
Guillaume Chatelet	3a78f44daf	[Alignment][NFC] Convert SelectionDAG::InferPtrAlignment to MaybeAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77212	2020-04-01 13:22:11 +00:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Guillaume Chatelet	b9810988b2	[Alignment][NFC] Transitioning more getMachineMemOperand call sites Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77127	2020-03-31 11:04:10 +00:00
Guillaume Chatelet	c9d5c19597	[Alignment][NFC] Transitioning more getMachineMemOperand call sites Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77121	2020-03-31 08:36:18 +00:00
Guillaume Chatelet	b91535f6c7	[Alignment][NFC] Return Align for SelectionDAGNodes::getOriginalAlignment/getAlignment Summary: Also deprecate getOriginalAlignment; getAlignment will take much more time as it is pervasive through the codebase (including TableGened files). This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76933	2020-03-30 07:26:48 +00:00
David Green	c9eaed5149	[ARM] MVE VMOV.i64 In the original batch of MVE VMOVimm code generation VMOV.i64 was left out due to the way it was done downstream. It turns out that it's fairly simple though. This adds the codegen for it, similar to NEON. Bigendian is technically incorrect in this version, which John is fixing in a Neon patch.	2020-03-30 07:44:23 +01:00
David Green	37b9cc8f29	[ARM] Sink splats to vector float instructions Some MVE floating point instructions have gpr register variants that take the scalar gpr value and splat them to all lanes. In order to accept them in loops, the shuffle_vector and insert need to be sunk down into the loop, next to the instruction so that ISel can see the whole pattern. This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for mul are slightly more constrained as there are no fms variants taking register arguments. Differential Revision: https://reviews.llvm.org/D76023	2020-03-26 09:02:18 +00:00
David Green	f8c79b94af	[ARM] Fold VMOVrh VLDR to LDRH This adds a simple fold to combine VMOVrh load to an integer load. Similar to what is already performed for BITCAST, but needs to account for the types being of different sizes, creating a zero-extending load. Differential Revision: https://reviews.llvm.org/D76485	2020-03-24 15:51:03 +00:00
David Green	1232cfa385	[ARM] Don't split trunc stores that can be better handled as VMOVN We deliberately split stores of the form store(truncate(larger-than-legal-type)) into two stores, allowing each store to perform part of the truncate for free. There are times however where it makes more sense to use VMOVN to de-interlace the results back into a single vector, and store that in one go. This adds a check for that situation, not splitting the store if it looks like a VMOVN can be more useful. Differential Revision: https://reviews.llvm.org/D76511	2020-03-24 08:48:52 +00:00
Simon Tatham	1adfa4c991	[ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family. Summary: I've implemented them as target-specific IR intrinsics rather than using `@llvm.experimental.vector.reduce.add`, on the grounds that the 'experimental' intrinsic doesn't currently have much code generation benefit, and my replacements encapsulate the sign- or zero-extension so that you don't expose the illegal MVE vector type (`<4 x i64>`) in IR. The machine instructions come in two versions: with and without an input accumulator. My new IR intrinsics, like the 'experimental' one, don't take an accumulator parameter: we represent that by just adding on the input value using an ordinary i32 or i64 add. So if you write the `vaddvaq` C-language intrinsic with an input accumulator of zero, it can be optimised to VADDV, and conversely, if you write something like `x += vaddvq(y)` then that can be combined into VADDVA. Most of this is achieved in isel lowering, by converting these IR intrinsics into the existing `ARMISD::VADDV` family of custom SDNode types. For the difficult case (64-bit accumulators), isel lowering already implements the optimization of folding an addition into a VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is already done, except that I had to introduce a parallel set of ARMISD nodes for the //predicated// forms of VADDLV. For the simpler VADDV, we handle the predicated form by just leaving the IR intrinsic alone and matching it in an ordinary dag pattern. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76491	2020-03-20 15:42:33 +00:00
Simon Tatham	45a9945b9e	[ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family. Summary: I've implemented these as target-specific IR intrinsics, because they're not //quite// enough like @llvm.experimental.vector.reduce.min (which doesn't take the extra scalar parameter). Also this keeps the predicated and unpredicated versions looking similar, and the floating-point minnm/maxnm versions fold into the same schema. We had a couple of min/max reductions already implemented, from the initial pathfinding exercise in D67158. Those were done by having separate IR intrinsic names for the signed and unsigned integer versions; as part of this commit, I've changed them to use a flag parameter indicating signedness, which is how we ended up deciding that the rest of the MVE intrinsics family ought to work. So now hopefully the whole lot is consistent. In the new llc test, the output code from the `v8f16` test functions looks quite unpleasant, but most of it is PCS lowering (you can't pass a `half` directly in or out of a function). In other circumstances, where you do something else with your `half` in the same function, it doesn't look nearly as nasty. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76490	2020-03-20 15:42:33 +00:00
David Green	b3499f572d	[ARM] Change VDUP type to i32 for MVE The MVE VDUP instruction takes a GPR and splats into every lane of a vector register. Unlike NEON we do not have a VDUPLANE equivalent instruction, doing the same splat from a fp register. Previously a VDUP to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which would mean the instruction pattern needs to add a COPY_TO_REGCLASS to the GPR. Instead this now converts that earlier during an ISel DAG combine, converting (VDUP x) to (VDUP (bitcast x)). This can allow instruction selection to tell that the input needs to be an i32, which in one of the testcases allows it to use ldr (or specifically ldm) over (vldr;vmov). Whilst being simple enough for floats, as the type sizes are the same, there is no BITCAST equivalent for getting a half into an i32. This uses a VMOVrh ARMISD node, which doesn't know the same tricks yet. Differential Revision: https://reviews.llvm.org/D76292	2020-03-20 09:48:45 +00:00
Eli Friedman	e24e95fe90	Remove CompositeType class. The existence of the class is more confusing than helpful, I think; the commonality is mostly just "GEP is legal", which can be queried using APIs on GetElementPtrInst. Differential Revision: https://reviews.llvm.org/D75660	2020-03-18 13:53:17 -07:00
Oliver Stannard	73cea83a6f	[IPRA][ARM] Spill extra registers at -Oz When optimising for code size at the expense of performance, it is often worth saving and restoring some of r0-r3, if IPRA will be able to take advantage of them. This doesn't cost any extra code size if we already have a PUSH/POP pair, and increases the number of available registers across any calls to the function. We already have an optimisation which tries to fold the subtract/add of the SP into the PUSH/POP by using extra registers, which somewhat conflicts with this. I've made the new optimisation less aggressive in cases where the existing one is likely to trigger, which gives better results than either of these optimisations by themselves. Differential revision: https://reviews.llvm.org/D69936	2020-03-18 13:51:16 +00:00
Simon Tatham	928776de92	[ARM,MVE] Add intrinsics for the VQDMLAH family. Summary: These are complicated integer multiply+add instructions with extra saturation, taking the high half of a double-width product, and optional rounding. There's no sensible way to represent that in standard IR, so I've converted the clang builtins directly to target-specific intrinsics. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76123	2020-03-18 10:55:04 +00:00
Simon Tatham	28c5d97bee	[ARM,MVE] Add intrinsics and isel for MVE integer VMLA. Summary: These instructions compute multiply+add in integers, with one of the operands being a splat of a scalar. (VMLA and VMLAS differ in whether the splat operand is a multiplier or the addend.) I've represented these in IR using existing standard IR operations for the unpredicated forms. The predicated forms are done with target-specific intrinsics, as usual. When operating on n-bit vector lanes, only the bottom n bits of the i32 scalar operand are used. So we have to tell that to isel lowering, to allow it to remove a pointless sign- or zero-extension instruction on that input register. That's done in `PerformIntrinsicCombine`, but first I had to enable `PerformIntrinsicCombine` for MVE targets (previously all the intrinsics it handled were for NEON), and make it a method of `ARMTargetLowering` so that it can get at `SimplifyDemandedBits`. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76122	2020-03-18 10:55:04 +00:00
David Green	2c6c169dbd	[ARM] Optimise ASRL/LSRL to smaller shifts using demand bits. The ASRL/LSRL long shifts are generated from 64bit shifts. Once we have them, it might turn out that enough of the 64bit result was not required that we can use a smaller shift to perform the same result. As the smaller shift can in general be folded in more ways, such as into add instructions in one of the test cases here, we can use the demand bit analysis to prefer the smaller shifts where we can. Differential Revision: https://reviews.llvm.org/D75371	2020-03-13 10:09:03 +00:00
David Green	f67d93dc23	[ARM] Constant long shift combines This changes the way that asrl and lsrl intrinsics are lowered, going via the ISEL ASRL and LSLL nodes instead of straight to machine nodes. On top of that, it adds some constant folds for long shifts, in case it turns out that the shift amount was either constant or 0. Differential Revision: https://reviews.llvm.org/D75553	2020-03-13 08:54:59 +00:00
Victor Campos	8a12553223	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. The code generation explicitly deviates from using the register-offset variant of LDRD/STRD. In this variant, the register allocated to the register-offset cannot be reused in any of the remaining operands. Such restriction seems to be non-trivial to implement in LLVM, thus it is left as a to-do. Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers Reviewed By: efriedma, nickdesaulniers Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2020-03-11 10:19:27 +00:00
James Greenhalgh	f0de8d0940	[Arm] Do not lower vmax/vmin to Neon instructions On some Arm cores there is a performance penalty when forwarding from an S register to a D register. Calculating VMAX in a D register creates false forwarding hazards, so don't do that unless we're on a core which specifically asks for it. Patch by James Greenhalgh Differential Revision: https://reviews.llvm.org/D75248	2020-03-10 10:48:48 +00:00
Djordje Todorovic	c15c68abdc	[CallSiteInfo] Enable the call site info only for -g + optimizations Emit call site info only in the case of '-g' + 'O>0' level. Differential Revision: https://reviews.llvm.org/D75175	2020-03-09 12:12:44 +01:00
David Green	13f2a5883f	[ARM] Fixup FP16 bitcasts Under fp16 we optimise the bitcast between a VMOVhr and a CopyToReg via custom lowering. This rewrites that to be a DAG combine instead, which helps produce better code in the cases where the bitcast is actually legal. Differential Revision: https://reviews.llvm.org/D72753	2020-02-27 12:19:31 +00:00
Craig Topper	735d27dc40	[SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output the ISD::FLT_ROUNDS_ This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order. This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've updated all in-tree targets to connect their chain through their lowering code. Differential Revision: https://reviews.llvm.org/D75132	2020-02-25 16:58:23 -08:00
Hans Wennborg	decd021fac	Don't generate libcalls for wide shift on Windows ARM (PR42711) The previous patch (`cff90f07cb`) didn't cover ARM.	2020-02-25 11:54:07 +01:00
Sam Parker	03756a4197	[ARM][MVE] Combine more extending masked loads For MVE, don't look at the users of the extending loads so that more are desirable for folding. Differential Revision: https://reviews.llvm.org/D74958	2020-02-24 07:50:15 +00:00
David Green	83012cb217	[ARM] Correct Formatting. NFC Also removed an unnecessary TODO that I don't believe is relevant for the instruction in question.	2020-02-21 16:08:56 +00:00
Djordje Todorovic	2f215cf36a	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGfaff707db82d. A failure found on an ARM 2-stage buildbot. The investigation is needed.	2020-02-20 14:41:39 +01:00
David Green	33aa5dfe9c	[ARM] VMLAVA reduction patterns Similar to VADDV and VADDLV that have been added recently, this adds lowering and patterns for VMLAV, VMLAVA, VMLALV and VMLALVA. They perform the same roles as the add's, just folding a mul into the same instruction (and so taking two inputs). As such, they need to be lowered in the same way as the types are often not legal. Differential Revision: https://reviews.llvm.org/D74390	2020-02-19 12:39:58 +00:00
David Green	fceb3e3b4a	[ARM] MVE VADDLV lowering Following on from the extra VADDV lowering, this extends things to handle VADDLV which allows summing values into a pair of i32 registers, together treated as a i64. This needs to be done in DAGCombine too as the types are otherwise illegal, which is a fairly simple addition on top of the existing code. There is also a VADDLVA instruction handled here, that adds the incoming values from the two general purpose registers. As opposed to the non-long version where we could just add patterns for add(x, VADDV), the long version needs to handle this early before the i64 has been split into too many pieces. Differential Revision: https://reviews.llvm.org/D74224	2020-02-19 11:07:20 +00:00
Djordje Todorovic	faff707db8	Reland "[DebugInfo] Enable the debug entry values feature by default" Differential Revision: https://reviews.llvm.org/D73534	2020-02-19 11:12:26 +01:00
David Green	51c6e9445c	[ARM] Extra MVE VADDV reduction patterns We already make use of the VADDV vector reduction instruction for cases where the input and the output start out at the same type. The MVE instruction however will sum into an i32, so if we are summing a v16i8 into an i32, we can still use the same instructions. In terms of IR, this looks like a sext of a legal type (v16i8) into a very illegal type (v16i32) and a vecreduce.add of that into the result. This means we have to catch the pattern early in a DAG combine, producing a target VADDVs/u node, where the signedness is now important. This is the first part, handling VADDV and VADDVA. There are also VADDVL/VADDVLA instructions, which are interesting because they sum into a 64bit value. And VMLAV and VMLALV, which are interesting because they also do a multiply of two values. It may look a little odd in places as a result. On its own this will probably not do very much, as the vectorizer will not produce this IR yet. Differential Revision: https://reviews.llvm.org/D74218	2020-02-19 09:45:35 +00:00
Djordje Todorovic	2bf44d11cb	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGa82d3e8a6e67.	2020-02-18 16:38:11 +01:00
Djordje Todorovic	a82d3e8a6e	Reland "[DebugInfo] Enable the debug entry values feature by default" This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-18 14:41:08 +01:00
John Brawn	594a89f727	[FPEnv][ARM] Don't call mutateStrictFPToFP when lowering mutateStrictFPToFP can delete the node and replace it with another with the same value which can later cause problems, and returning the result of mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the returned value has the same number of results as the original. Instead handle things by doing the mutation manually. Differential Revision: https://reviews.llvm.org/D74726	2020-02-17 18:19:25 +00:00
Benjamin Kramer	5fc5c7db38	Strength reduce vectors into arrays. NFCI.	2020-02-17 15:37:35 +01:00
John Brawn	0ec5797296	[ARM] Fix infinite loop when lowering STRICT_FP_EXTEND If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the current implementation will cause an infinite loop for STRICT_FP_EXTEND due to emitting a merge_values of the original node which after replacement becomes a merge_values of itself. Fix this by not doing anything for f32 to f64 extend when we have FP64, though for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that doesn't happen automatically for opcodes with custom lowering. Differential Revision: https://reviews.llvm.org/D74559	2020-02-13 16:12:50 +00:00
David Green	9d4c597541	[ARM] Fix ReconstructShuffle for bigendian Simon pointed out that this function is doing a bitcast, which can be incorrect for big endian. That makes the lowering of VMOVN in MVE wrong, but the function is shared between Neon and MVE so both can be incorrect. This attempts to fix things by using the newly added VECTOR_REG_CAST instead of the BITCAST. As it may now be used on Neon, I've added the relevant patterns for it there too. I've also added a quick dag combine for it to remove them where possible. Differential Revision: https://reviews.llvm.org/D74485	2020-02-13 09:56:46 +00:00
Jay Foad	32aac25637	[KnownBits] Introduce anyext instead of passing a flag into zext Summary: This was a very odd API, where you had to pass a flag into a zext function to say whether the extended bits really were zero or not. All callers passed in a literal true or false. I think it's much clearer to make the function name reflect the operation being performed on the value we're tracking (rather than on the KnownBits Zero and One fields), so zext means the value is being zero extended and new function anyext means the value is being extended with unknown bits. NFC. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74482	2020-02-12 19:06:53 +00:00
Djordje Todorovic	97ed706a96	Revert "[DebugInfo] Enable the debug entry values feature by default" This reverts commit rG9f6ff07f8a39. Found a test failure on clang-with-thin-lto-ubuntu buildbot.	2020-02-12 11:59:04 +01:00
Djordje Todorovic	9f6ff07f8a	[DebugInfo] Enable the debug entry values feature by default This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-12 10:25:14 +01:00
Craig Topper	eeb63944e4	[LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results. Remove code from LegalizeTypes that allowed this to work. We were already using BUILD_PAIR for this in some places so this standardizes on a single way to do this.	2020-02-08 09:52:31 -08:00
Victor Campos	af2a384581	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `60e0120c91`.	2020-02-08 13:18:45 +00:00
Guillaume Chatelet	f85d3408e6	[NFC] Introduce an API for MemOp Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73964	2020-02-07 11:32:27 +01:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This helps introducing additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Guillaume Chatelet	333f2ad8b8	[Alignment][NFC] Use Align for getMemcpy/Memmove/Memset Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73885	2020-02-03 17:13:19 +01:00
John Brawn	b37d59353f	[FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS These can be lowered to code sequences using CMPFP and CMPFPE which then get selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain operand isn't handled correctly, but resolving that looks like it would involve changes around FPSCR-handling instructions and how the FPSCR is modelled. The fp-intrinsics test was already testing some of this but as the entire test was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the cases where we aren't generating the right instruction sequences as FIXME. Differential Revision: https://reviews.llvm.org/D73194	2020-02-03 12:59:12 +00:00
Simon Tatham	961530fdc9	[ARM,MVE] Fix vreinterpretq in big-endian mode. Summary: In big-endian MVE, the simple vector load/store instructions (i.e. both contiguous and non-widening) don't all store the bytes of a register to memory in the same order: it matters whether you did a VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register formats of different vector types relate to each other in a different way from the in-memory formats. So, if you want to 'bitcast' or 'reinterpret' one vector type as another, you have to carefully specify which you mean: did you want to reinterpret the //register// format of one type as that of the other, or the //memory// format? The ACLE `vreinterpretq` intrinsics are specified to reinterpret the register format. But I had implemented them as LLVM IR bitcast, which is specified for all types as a reinterpretation of the memory format. So a `vreinterpretq` intrinsic, applied to values already in registers, would code-generate incorrectly if compiled big-endian: instead of emitting no code, it would emit a `vrev`. To fix this, I've introduced a new IR intrinsic to perform a register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's implemented by a trivial isel pattern that expects the input in an MQPR register, and just returns it unchanged. In the clang codegen, I only emit this new intrinsic where it's actually needed: I prefer a bitcast wherever it will have the right effect, because LLVM understands bitcasts better. So we still generate bitcasts in little-endian mode, and even in big-endian when you're casting between two vector types with the same lane size. For testing, I've moved all the codegen tests of vreinterpretq out into their own file, so that they can have a different set of RUN lines to check both big- and little-endian. 
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73786	2020-02-03 11:20:06 +00:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Benjamin Kramer	adcd026838	Make llvm::StringRef to std::string conversions explicit. This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.	2020-01-28 23:25:25 +01:00
Simon Tatham	4321c6af28	[ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics. Summary: Immediate vmvnq is code-generated as a simple vector constant in IR, and left to the backend to recognize that it can be created with an MVE VMVN instruction. The predicated version is represented as a select between the input and the same constant, and I've added a Tablegen isel rule to turn that into a predicated VMVN. (That should be better than the previous VMVN + VPSEL: it's the same number of instructions but now it can fold into an adjacent VPT block.) The unpredicated forms of VBIC and VORR are done by enabling the same isel lowering as for NEON, recognizing appropriate immediates and rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I then instruction-select into the right MVE instructions (now that I've also reworked those instructions to use the same MC operand encoding). In order to do that, I had to promote the Tablegen SDNode instance `NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and similarly for `NEONvbicImm`. The predicated forms of VBIC and VORR are represented as a vector select between the original input vector and the output of the unpredicated operation. The main convenience of this is that it still lets me use the existing isel lowering for VBICIMM/VORRIMM, and not have to write another copy of the operand encoding translation code. This intrinsic family is the first to use the `imm_simd` system I put into the MveEmitter tablegen backend. So, naturally, it showed up a bug or two (emitting bogus range checks and the like). Fixed those, and added a full set of tests for the permissible immediates in the existing Sema test. Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching because lowering started turning its input into a VBICIMM. Now it recognizes the VBICIMM instead. 
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72934	2020-01-23 11:53:52 +00:00
David Green	ff2e67a4f7	[ARM] MVE VLDn postinc This adds Post inc variants of the VLD2/4 and VST2/4 instructions in MVE. It uses the same mechanism/nodes as Neon, transforming the intrinsic+add pair into an ARMISD::VLD2_UPD, which gets selected to a post-inc instruction. The code to do that is mostly taken from the existing Neon code, but simplified as fewer variants are needed. It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4 intrinsics, which allow the nodes to have MMO's, calculated as the full length to the memory being loaded/stored. Differential Revision: https://reviews.llvm.org/D71194	2020-01-20 06:57:07 +00:00
Craig Topper	bb2553175a	[TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from RuntimeLibcalls.def and all associated usages Summary: This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with an ogt/olt/oge/ole check. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare. This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code. Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn Reviewed By: efriedma Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72536	2020-01-10 19:30:08 -08:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
Victor Campos	60e0120c91	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers Reviewed By: efriedma, nickdesaulniers Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2020-01-07 13:16:18 +00:00
James Henderson	d68904f957	[NFC] Fix trivial typos in comments Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72143 Patch by Kazuaki Ishizaki.	2020-01-06 10:50:26 +00:00


2148 Commits