llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	2cbf9fd402	[DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes. This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector. This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector. Fixes PR50053 Differential Revision: https://reviews.llvm.org/D107068	2021-08-05 15:40:48 +01:00
Jessica Paquette	ca2e053652	[AArch64][GlobalISel] Legalize wide vector G_PHIs Clamp the max number of elements when legalizing G_PHI. This allows us to legalize some common fallbacks like 4 x s64. Here's an example: https://godbolt.org/z/6YocsEYTd Had to add -global-isel-abort=0 to legalize-phi.mir to account for the G_EXTRACT_VECTOR_ELT from the 32 x s8 G_PHI. Differential Revision: https://reviews.llvm.org/D107508	2021-08-04 16:48:59 -07:00
Jessica Paquette	d9279843b1	[AArch64][GlobalISel] Widen G_PHI before clamping it during legalization This allows us to handle weird types like s88; we first widen to s128, then clamp back down to s64. https://godbolt.org/z/9xqbP46Mz Also this makes it possible for GISel to legalize the case in pr48188.ll. It now does the same thing as SDAG, although regalloc chooses different registers. Differential Revision: https://reviews.llvm.org/D107417	2021-08-04 10:25:14 -07:00
Jessica Paquette	7d97de60b3	[AArch64][GlobalISel] Widen G_FPTO*I before clamping Going through our legalization rules and doing some cleanup. Widening and then clamping is usually easier than clamping and then widening. This allows us to legalize some weird types like s88. Differential Revision: https://reviews.llvm.org/D107413	2021-08-04 10:19:26 -07:00
Bradley Smith	d9cc5d84e4	[AArch64][SVE] Combine bitcasts of predicate types with vector inserts/extracts of loads/stores An insert subvector that is inserting the result of a vector predicate sized load into undef at index 0, whose result is casted to a predicate type, can be combined into a direct predicate load. Likewise the same applies to extract subvector but in reverse. The purpose of this optimization is to clean up cases that will be introduced in a later patch where casts to/from predicate types from i8 types will use insert subvector, rather than going through memory early. This optimization is done in SVEIntrinsicOpts rather than InstCombine to re-introduce scalable loads as late as possible, to give other optimizations the best chance possible to do a good job. Differential Revision: https://reviews.llvm.org/D106549	2021-08-04 15:51:14 +00:00
Simon Wallis	9269752671	[AArch64] Fix assert AArch64TargetLowering::ReplaceNodeResults Don't know how to custom expand this UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788 The fix is to provide missing expansions for: case ISD::STRICT_FP_TO_UINT: case ISD::STRICT_FP_TO_SINT: A test case is provided. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107452	2021-08-04 16:18:19 +01:00
Arthur Eubanks	ad25344620	[MC][CodeGen] Emit constant pools earlier Previously we would emit constant pool entries for ldr inline asm at the very end of AsmPrinter::doFinalization(). However, if we're emitting dwarf aranges, that would end all sections with aranges. Then if we have constant pool entries to be emitted in those same sections, we'd hit an assert that the section has already been ended. We want to emit constant pool entries before emitting dwarf aranges. This patch splits out arm32/64's constant pool entry emission into its own MCTargetStreamer virtual method. Fixes PR51208 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107314	2021-08-03 20:55:31 -07:00
Jessica Paquette	5643736378	[AArch64][GlobalISel] Widen G_SELECT before clamping it This allows us to handle the s88 G_SELECTS: https://godbolt.org/z/5s18M4erY Weird types like this can result in weird merges. Widening to s128 first and then clamping down avoids that situation. Differential Revision: https://reviews.llvm.org/D107415	2021-08-03 18:31:17 -07:00
David Green	bd07c2e266	[AArch64] Prefer fmov over orr v.16b when copying f32/f64 This changes the lowering of f32 and f64 COPY from a 128bit vector ORR to a fmov of the appropriate type. At least on some CPU's with 64bit NEON data paths this is expected to be faster, and shouldn't be slower on any CPU that treats fmov as a register rename. Differential Revision: https://reviews.llvm.org/D106365	2021-08-03 17:25:40 +01:00
Jason Molenda	0d8cd4e2d5	[AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted Add a comment when there is a shifted value, add x9, x0, #291, lsl #12 ; =1191936 but not when the immediate value is unshifted, subs x9, x0, #256 ; =256 when the comment adds nothing additional to the reader. Differential Revision: https://reviews.llvm.org/D107196	2021-08-03 02:28:46 -07:00
Cullen Rhodes	a02bbeeae7	[AArch64][AsmParser] NFC: Use helpers in matrix tile list parsing	2021-08-03 08:13:01 +00:00
Roman Lebedev	6f6e9a867f	[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop I'm not sure this is the best way to approach this, but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D107271	2021-08-03 00:57:26 +03:00
Jessica Paquette	bd13c8e610	[AArch64][GlobalISel] Emit extloads for ZExt/SExt values in assignValueToAddress When a value is expected to be extended, we should emit an extended load rather than a normal G_LOAD. Add checklines to arm64-abi.ll which show that we now emit the correct loads. For ease of comparison: https://godbolt.org/z/8WvY6EfdE Differential Revision: https://reviews.llvm.org/D107313	2021-08-02 14:48:44 -07:00
Irina Dobrescu	b01417d3c5	[AArch64] Optimise min/max lowering in ISel Differential Revision: https://reviews.llvm.org/D106561	2021-08-02 13:40:21 +01:00
Cullen Rhodes	7ed0120d84	[AArch64][AsmParser] NFC: Parser.Lex() -> Lex() Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D107146	2021-08-02 09:48:41 +00:00
Eli Friedman	bdd55b2f18	Fix the default alignment of i1 vectors. Currently, the default alignment is much larger than the actual size of the vector in memory. Fix this to use a sane default. For SVE, temporarily remove lowering of load/store operations for predicates with less than 16 elements. The layout the backend was assuming for SVE predicates with less than 16 elements doesn't agree with the frontend. More work probably needs to be done here. This change is, strictly speaking, not backwards-compatible at the bitcode level. But probably nobody is actually depending on that; i1 vectors in memory are rare, and the code that does use them probably ends up forcing the alignment to something sane anyway. If we think this is a concern, I can restrict this to scalable vectors for now (where it's actually causing issues for me at the moment). Differential Revision: https://reviews.llvm.org/D88994	2021-07-31 14:09:59 -07:00
Matt Arsenault	ebc17a0d68	GlobalISel: Scalarize unaligned vector stores This has the same problems and limitations as the load path.	2021-07-31 10:37:15 -04:00
Alexandros Lamprineas	7d940432c4	[AArch64] Legalize MVT::i64x8 in DAG isel lowering This patch legalizes the Machine Value Type introduced in D94096 for loads and stores. A new target hook named getAsmOperandValueType() is added which maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization. Differential Revision: https://reviews.llvm.org/D94097	2021-07-31 09:51:28 +01:00
Cullen Rhodes	3a349d2269	[AArch64][SME] Introduce feature for streaming mode The Scalable Matrix Extension (SME) introduces a new execution mode called Streaming SVE mode. In streaming mode a substantial subset of the SVE and SVE2 instruction set is available, along with new outer product, load, store, extract and insert instructions that operate on the new architectural register state for the matrix. To support streaming mode this patch introduces a new subtarget feature +streaming-sve. If enabled, the subset of SVE(2) instructions are available. The existing behaviour for SVE(2) remains unchanged, the subset of instructions that are legal in streaming mode are enabled if either +sve[2] or +streaming-sve is specified. Instructions that are illegal in streaming mode remain predicated on +sve[2]. The SME target feature has been updated to imply +streaming-sve rather than +sve. The following changes are made to the SVE(2) tests: * For instructions that are legal in streaming mode: - added RUN line to verify +streaming-sve enables the instruction. - updated diagnostic to 'instruction requires: streaming-sve or sve'. * For instructions that are illegal in streaming-mode: - added RUN line to verify +streaming-sve does not enable the instruction. SVE(2) instructions that are legal in streaming mode have: if !HaveSVE[2]() && !HaveSME() then UNDEFINED; at the top of the pseudocode in the XML. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D106272	2021-07-30 07:30:45 +00:00
Adrian Prantl	c5d84d2eb3	GlobalISel/AArch64: don't optimize away redundant branches at -O0 This patch prevents GlobalISel from optimizing out redundant branch instructions when compiling without optimizations. The motivating example is code like the following common pattern in Swift, where users expect to be able to set a breakpoint on the early exit: public func f(b: Bool) { guard b else { return // I would like to set a breakpoint here. } ... } The patch modifies two places in GlobalISEL: The first one is in IRTranslator.cpp where the removal of redundant branches is made conditional on the optimization level. The second one is in AArch64InstructionSelector.cpp where an -O0 only optimization is being removed. Disabling these optimizations increases code size at -O0 by ~8%. However, doing so improves debuggability, and debug builds are the primary reason why developers compile without optimizations. We thus concluded that this is the right trade-off. rdar://79515454 This tenatively reapplies the patch without modifications, the LLDB test that has blocked this from landing previously has since been modified to hopefully no longer be sensitive to this change. Differential Revision: https://reviews.llvm.org/D105238	2021-07-29 16:04:22 -07:00
Bradley Smith	191831e380	[AArch64][SVE] Fix incorrect mask type when lowering fixed type SVE gather/scatter An incorrect mask type when lowering an SVE gather/scatter was causing a codegen fault which manifested as the incorrect predicate size being used for an SVE gather/scatter, (e.g.. p0.b rather than p0.d). Fixes PR51182. Differential Revision: https://reviews.llvm.org/D106943	2021-07-29 11:22:17 +00:00
Cullen Rhodes	08d92dbbff	[AArch64][AsmParser] NFC: Parser.getTok() -> getTok() Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106949	2021-07-29 10:18:54 +00:00
Amara Emerson	da61ab8475	[AArch64][GlobalISel] More widenToNextPow2 changes, this time for arithmetic/bitwise ops.	2021-07-29 03:02:29 -07:00
Jessica Paquette	5a333dc5da	[AArch64][GlobalISel] Improve legalization for odd-type G_LOAD Swap the order of widening so that we widen to the next power-of-2 first when legalizing G_LOAD. Also, provide a minimum type for the power of 2 to disallow s2 + s1. Clamping ought to disallow s2 and s1, but I think it's better to be explicit about the expected minimum size. We probably need a similar change for G_STORE, but it seems to be a bit more finnicky. So, let's just handle G_LOAD for now. Differential Revision: https://reviews.llvm.org/D107013	2021-07-28 17:19:14 -07:00
Jessica Paquette	c0a41c3d3b	[AArch64][GlobalISel] Improve legalization for odd-sized G_ICMP/G_CONSTANT We were handing types like s88 like 1) clamp to the range 2) widen to the next power of 2 This isn't desirable because it causes an odd breakdown for types like s88. If we widen to the next power of 2 (s128) first, then we get a clean breakdown when we clamp back to s64. Differential Revision: https://reviews.llvm.org/D106998	2021-07-28 15:31:33 -07:00
Amara Emerson	a11d9a1f48	[AArch64][GlobalISel] Fix constraining LDXPX intrinsic selection. Causes a fallback because of lack of regclasses on vregs, unless its without asserts, where we end up crashing later in codegen.	2021-07-27 12:13:56 -07:00
Cullen Rhodes	2e27c4e1f1	[AArch64][SME] Add zero instruction This patch adds the zero instruction for zeroing a list of 64-bit element ZA tiles. The instruction takes a list of up to eight tiles ZA0.D-ZA7.D, which must be in order, e.g. zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6.d,za7.d} zero {za1.d,za3.d,za5.d,za7.d} The assembler also accepts 32-bit, 16-bit and 8-bit element tiles which are mapped to corresponding 64-bit element tiles in accordance with the architecturally defined mapping between different element size tiles, e.g. * Zeroing ZA0.B, or the entire array name ZA, is equivalent to zeroing all eight 64-bit element tiles ZA0.D to ZA7.D. * Zeroing ZA0.S is equivalent to zeroing ZA0.D and ZA4.D. The preferred disassembly of this instruction uses the shortest list of tile names that represent the encoded immediate mask, e.g. * An immediate which encodes 64-bit element tiles ZA0.D, ZA1.D, ZA4.D and ZA5.D is disassembled as {ZA0.S, ZA1.S}. * An immediate which encodes 64-bit element tiles ZA0.D, ZA2.D, ZA4.D and ZA6.D is disassembled as {ZA0.H}. * An all-ones immediate is disassembled as {ZA}. * An all-zeros immediate is disassembled as an empty list {}. This patch adds the MatrixTileList asm operand and related parsing to support this. Depends on D105570. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D105575	2021-07-27 08:35:45 +00:00
Craig Topper	2ea9db0c49	[AArch64] Fix -Wparentheses warning with gcc 5.4. NFC	2021-07-26 21:08:56 -07:00
Jon Roelofs	f2e8e46d78	Revert "[AArch64][GlobalISel] Legalize ctpop s128" This reverts commit `97e95fea53`. It broke test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll. Not sure why I didn't see that.	2021-07-26 17:06:43 -07:00
Jon Roelofs	97e95fea53	[AArch64][GlobalISel] Legalize ctpop s128 Differential revision: https://reviews.llvm.org/D106494	2021-07-26 16:33:50 -07:00
Amara Emerson	172051a1f4	[AArch64][GlobalISel] Add identity combines to post-legal combiner. We see some shifts of zero emitted during legalization. Differential Revision: https://reviews.llvm.org/D106816	2021-07-26 15:17:11 -07:00
Amara Emerson	c658b472f3	[GlobalISel] Add a constant folding combine. Use it AArch64 post-legal combiner. These don't always get folded because when the instructions are created the constants are obscured by artifacts. Differential Revision: https://reviews.llvm.org/D106776	2021-07-26 14:53:33 -07:00
Amara Emerson	6af8d36054	[AArch4][GlobalISel] Post-legalize combine s64 = G_MERGE s32, 0 -> G_ZEXT. These are generated as a byproduce of legalization. Differential Revision: https://reviews.llvm.org/D106768	2021-07-26 10:58:04 -07:00
Amara Emerson	0d41d21929	[AArch64][GlobalISel] Enable some select combines after legalization. The legalizer generates selects for some operations, which can have constant condition values, resulting in lots of dead code if it's not folded away. Differential Revision: https://reviews.llvm.org/D106762	2021-07-26 10:40:32 -07:00
Amara Emerson	dec34104bf	[GlobalISel] Add combine for merge(unmerge) and use AArch64 postlegal-combiner. Differential Revision: https://reviews.llvm.org/D106761	2021-07-26 10:37:31 -07:00
Paul Walker	3b77e2737c	[SVE] Use reg+reg addressing mode for immediate offsets. For reg+imm SVE addressing mode imm is implictly scaled by VL, making them impractical for truely immediate offsets. However, if the offset can be unscaled based on the storage element type we can use the reg+reg SVE addressing mode and thus either reduce the number of generate add instructions or replace them with a mov instruction that can be hoisted from the hot code path. Differential Revision: https://reviews.llvm.org/D106744	2021-07-26 16:24:16 +01:00
Bradley Smith	81eafb8a37	[AArch64][SVE] Break false dependencies for inactive lanes of unary operations Differential Revision: https://reviews.llvm.org/D105889	2021-07-26 15:01:21 +00:00
Tim Northover	a487a49acc	AArch64: support i128 (& larger) returns in GlobalISel	2021-07-26 14:16:35 +01:00
Caroline Concatto	bf28111ebd	[AArch65][SVE] Remove vector_splice from AddedComplexity pattern The pattern for vector_splice with Index equal or bigger than zero was misplaced in the AddedComplexity = 1 pattern in the AArch64 tablegen file. This patch fixes it by removing vector_splice pattern from inside AddedComplexity = 1.	2021-07-26 13:35:51 +01:00
Caroline Concatto	0bfc26e3a4	[SVE][AArch64] Improve code generation for vector_splice for Imm > 0 This patch implements vector_splice in tablegen for all cases when the Immediate is positive and lower than the known minimum value of a scalable vector. Vector_splice can be implemented using SVE instruction EXT. For instance : @llvm.experimental.vector.splice(Vector_1, Vector_2, Imm) @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> EXT Vector_1, Vector_2, Imm // Vector_1 = B, C, D + Vector_2 = E Depends on D105633 Differential Revision: https://reviews.llvm.org/D106273	2021-07-26 11:45:46 +01:00
Caroline Concatto	73e4e9cd00	[AArch64][SVE] Improve code generation for vector_splice for Imm == -1 This patch implements vector_splice in tablegen for: a) when the immediate is equal to -1 (Imm==1) and uses: INSR + LASTB For instance : @llvm.experimental.vector.splice(Vector_1, Vector_2, -1) @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <D, E, F, G> LAST RegLast, Vector_1 // RegLast = D INSR Res, (Vector_1 >> 1), RegLast // Res = D + E, F, G Differential Revision: https://reviews.llvm.org/D105633	2021-07-26 11:25:01 +01:00
Cullen Rhodes	e6ff9179ce	[AArch64][AsmParser] NFC: Parser.getTok().getLoc() -> getLoc() Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D106635	2021-07-26 09:36:34 +00:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
Kyungwoo Lee	6530ea4095	[AArch64] Fix Local Deallocation for Homogeneous Prolog/Epilog The stack adjustment for local deallocation was incorrectly ported. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D106760	2021-07-25 10:51:11 -07:00
Amara Emerson	acbc0c5f0e	[AArch64][GlobalISel] Widen non-pow-2 types for shifts before clamping. For types like s96, we don't want to clamp to s64, we want to first widen to s128 and then narrow it. Otherwise we end up with impossible to legalize types.	2021-07-24 15:50:43 -07:00
Benjamin Kramer	dd70cd089a	[llvm][sve] Silence unused variable warning in Release builds. NFC	2021-07-23 16:16:35 +02:00
David Truby	1528a4d400	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-23 14:04:55 +01:00
David Green	38986c6782	[AArch64] Add worst case shuffle costs This adds some missing single source shuffle costs for AArch64, of i16 and i8 vectors. v4i16 are the same as v4i32 with a worse case cost of 3 coming from the perfect shuffle tables. The larger vector sizes expand into a constant pool, plus a load (and adrp) and a tbl. I arbitrarily chose 8 for the cost to be expensive but not too expensive. Differential Revision: https://reviews.llvm.org/D106241	2021-07-23 09:01:58 +01:00
Cullen Rhodes	fde7550094	[AArch64][AsmParser] NFC: when creating a token IsSuffix=false should be default Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106568	2021-07-23 06:36:06 +00:00
Vitaly Buka	44ba8c691c	[NFC][asan] Always pass Dominator Trees into forAllReachableExits	2021-07-22 18:01:38 -07:00
David Green	c9cebda772	[AArch64] Adjust the cost of integer sum reductions This changes the cost to (LT.first-1) * cost(add) + 2, where the cost of an add is assumed to be 1. This brings it inline with the other reductions. Differential Revision: https://reviews.llvm.org/D106240	2021-07-22 18:19:54 +01:00
Cullen Rhodes	00e87e1c5b	[AArch64][SME] Improve diagnostic for vector select register Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D106540	2021-07-22 13:46:40 +00:00
Jessica Paquette	c75a2bbe08	[AArch64][GlobalISel] Change \| -> \|\| in an if I wrote the wrong type of OR by mistake.	2021-07-21 14:57:31 -07:00
Jessica Paquette	d0af732bd0	[AArch64][GlobalISel] Widen s2 and s4 G_IMPLICIT_DEF + G_FREEZE These had ``` .clampScalar(0, s1, 64) .widenScalarToNextPow2(0, 8) ``` If you have s2 or s4, then `widenScalarToNextPow2` does nothing. This changes the `widenScalarToNextPow2` rule to use s8 as the minimum type instead, allowing us to correctly widen s2 and s4. This does not impact s1, since it's marked as legal already. Differential Revision: https://reviews.llvm.org/D106413	2021-07-21 12:59:20 -07:00
Eli Friedman	0ca46a1757	[SelectionDAG] Fix the representation of ISD::STEP_VECTOR. The existing rule about the operand type is strange. Instead, just say the operand is a TargetConstant with the right width. (Legalization ignores TargetConstants, so it doesn't matter if that width is legal.) Highlights: 1. I had to substantially rewrite the AArch64 isel patterns to expect a TargetConstant. Nothing too exotic, but maybe a little hairy. Maybe worth considering a target-specific node with some dagcombines instead of this complicated nest of isel patterns. 2. Our behavior on RV32 for vectors of i64 has changed slightly. In particular, we correctly preserve the width of the arithmetic through legalization. This changes the DAG a bit. Maybe room for improvement here. 3. I explicitly defined the behavior around overflow. This is necessary to make the DAGCombine transforms legal, and I don't think it causes any practical issues. Differential Revision: https://reviews.llvm.org/D105673	2021-07-21 10:58:40 -07:00
Cullen Rhodes	008c755d76	[AArch64][SME] Support .arch and .arch_extension assembler directives Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D105566	2021-07-21 08:40:27 +00:00
Tim Northover	291e0daa6e	AArch64: support 8 & 16-bit atomic operations in GlobalISel We have SelectionDAG patterns for 8 & 16-bit atomic operations, but they assume the value types will have been legalized to 32-bits. So this adds the ability to widen them to both AArch64 & generic GISel infrastructure.	2021-07-21 09:35:14 +01:00
Cullen Rhodes	2d80bbd939	[AArch64][SME] Add mova instructions This patch adds the mova instruction to insert/extract an SVE vector register to/from a ZA tile vector. The preferred MOV aliases are also implemented. Depends on D105572. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: david-arm, CarolineConcatto Differential Revision: https://reviews.llvm.org/D105574	2021-07-21 08:20:01 +00:00
Cullen Rhodes	6c32cfe85c	[AArch64][SME] Add ldr and str instructions The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D105573	2021-07-21 08:17:13 +00:00
Jon Roelofs	75187aa352	[AArch64][GlobalISel] Legalize ctpop for v2s64, v2s32, v4s32, v4s16, v8s16 https://llvm.godbolt.org/z/nTTK6M5qe Differential revision: https://reviews.llvm.org/D106388	2021-07-20 15:37:56 -07:00
Jessica Paquette	8f54ebd51d	[AArch64][GlobalISel] Select llvm.aarch64.neon.st2 intrinsics Add manual selection code similar to the code in AArch64ISelDAGToDAG, and add `createTuple` helpers similar to the code there as well. This accounted for around 111 fallbacks while building clang for AArch64 with GlobalISel. This also should make it easy to add selection code for other store intrinsics. As a minor cleanup, this uses `createQTuple` in the other place where we use REG_SEQUENCE. Differential Revision: https://reviews.llvm.org/D106332	2021-07-20 13:23:46 -07:00
Eli Friedman	664a1fd9f0	[AArch64] Use the CMP_SWAP_128 variants added in `843c6140`. Accidentally forgot to flip the opcode... and I didn't notice because it was working fine for the GlobalISel.	2021-07-20 13:23:27 -07:00
Fangrui Song	0c0549fbb3	[AArch64] Delete unused Opcode after D106039	2021-07-20 12:51:44 -07:00
Eli Friedman	843c614058	[AArch64] Fix i128 cmpxchg using ldxp/stxp. Basically two parts to this fix: 1. Stop using AtomicExpand to expand cmpxchg i128 2. Fix AArch64ExpandPseudoInsts to use a correct expansion. From ARM architecture reference: To atomically load two 64-bit quantities, perform a Load-Exclusive pair/Store-Exclusive pair sequence of reading and writing the same value for which the Store-Exclusive pair succeeds, and use the read values from the Load-Exclusive pair. Fixes https://bugs.llvm.org/show_bug.cgi?id=51102 Differential Revision: https://reviews.llvm.org/D106039	2021-07-20 12:38:12 -07:00
Bradley Smith	191f9fa5d2	[AArch64][SVE] Move instcombine like transforms out of SVEIntrinsicOpts Instead move them to the instcombine that happens in AArch64TargetTransformInfo. Differential Revision: https://reviews.llvm.org/D106144	2021-07-20 14:17:30 +00:00
Sander de Smalen	eb1a5120b8	[AArch64][SVE][InstCombine] last{a,b} of a splat vector Replace last{a,b}(splat(X)) with X, irrespective of the predicate. Patch by/Committing on behalf of: Usman Nadeem (mnadeem) Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105520	2021-07-20 09:44:43 +01:00
Cullen Rhodes	15af3aaa2e	[AArch64][SME] Add system registers and related instructions This patch adds the new system registers introduced in SME: - ID_AA64SMFR0_EL1 (ro) SME feature identifier. - SMCR_ELx (r/w) streaming mode control register for configuring effective SVE Streaming SVE Vector length when the PE is in Streaming SVE mode. - SVCR (r/w) streaming vector control register, visible at all exception levels. Provides access to PSTATE.SM and PSTATE.ZA using MSR and MRS instructions. - SMPRI_EL1 (r/w) streaming mode execution priority register. - SMPRIMAP_EL2 (r/w) streaming mode priority mapping register. - SMIDR_EL1 (ro) streaming mode identification register. - TPIDR2_EL0 (r/w) for use by SME software to manage per-thread SME context. - MPAMSM_EL1 (r/w) MPAM (v8.4) streaming mode register, for labelling memory accesses performed in streaming mode. Also added in this patch are the SME mode change instructions. Three MSR immediate instructions are implemented to set or clear PSTATE.SM, PSTATE.ZA, or both respectively: - MSR SVCRSM, #<imm1> - MSR SVCRZA, #<imm1> - MSR SVCRSMZA, #<imm1> The following smstart/smstop aliases are also implemented for convenience: smstart -> MSR SVCRSMZA, #1 smstart sm -> MSR SVCRSM, #1 smstart za -> MSR SVCRZA, #1 smstop -> MSR SVCRSMZA, #0 smstop sm -> MSR SVCRSM, #0 smstop za -> MSR SVCRZA, #0 The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D105576	2021-07-20 08:06:26 +00:00
Amara Emerson	56a6686e0c	[AArch64][GlobalISel] Don't form truncstores in postlegalizer-lowering for s128. We don't support truncating s128 stores, so don't form them.	2021-07-20 00:04:34 -07:00
Matt Arsenault	30fa074c0a	AArch64/GlobalISel: Preserve memory types	2021-07-19 20:21:05 -04:00
Amy Huang	fd972bb9fd	Revert "[llvm][sve] Lowering for VLS truncating stores" because it causes a seg fault (see https://reviews.llvm.org/D104471). This reverts commit `c305557acd`.	2021-07-19 11:03:33 -07:00
David Green	5561ad8b36	[ARM] Remove PromotedBitwiseVT for NEON types This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32, treating them the same as the AArch64 and MVE backends where we just add the relevant patterns for each legal type. This prevents a lot of bitcasts from being added to the DAG, which have the potential to make optimizations more difficult. It does mean adding extra patterns, and some codegen can change due to the types now being legal, not promoted. Differential Revision: https://reviews.llvm.org/D105588	2021-07-19 16:36:33 +01:00
Matt Arsenault	e574fd9d52	AArch64/GlobalISel: Cleanup unnecessary size checks in call lowering The CCValAssign types should now be accurate, so these are no longer necessary.	2021-07-19 11:01:30 -04:00
Florian Mayer	d23f26f0af	[NFC] [MTE] helper for stack tagging lifetimes. Reviewed By: eugenis, vitalybuka Differential Revision: https://reviews.llvm.org/D106135	2021-07-19 11:09:16 +01:00
Cullen Rhodes	f91eaa7007	[AArch64][SME] Add SVE2 instructions added in SME This patch adds support for the following instructions: SCLAMP, UCLAMP, REV, DUP (predicate) The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D105577	2021-07-19 08:03:05 +00:00
Sander de Smalen	0ed0573527	[AArch64][SVE] Optimize bitcasts between unpacked half/i16 vectors. The case for nxv2f32/nxv2i32 was already covered by D104573. This patch builds on top of that by making the mechanism work for nxv2[b]f16/nxv2i16, nxv4[b]f16/nxv4i16 as well. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D106138	2021-07-19 08:29:28 +01:00
Jon Roelofs	5cd63e9ec2	[AArch64][GlobalISel] Legalize bswap <2 x i16> Differential revision: https://reviews.llvm.org/D105935	2021-07-17 15:31:15 -07:00
Eli Friedman	e41e865b15	[AArch64] Prepare for changes to STEP_VECTOR. Rewrite patterns to assume that the operand of STEP_VECTOR is a constant. The old patterns will stop working when the operand is changed from a Constant to a TargetConstant. (See D105673.) Add test coverage for certain patterns that weren't exercised by existing regression tests. Differential Revision: https://reviews.llvm.org/D105847	2021-07-17 14:13:41 -07:00
Nikita Popov	6d3e7c783b	[OpaquePtr] Remove uses of CreateConstGEP1_32() without element type Remove uses of to-be-deprecated API. I've fallen back to calling getPointerElementType() in some cases where the correct type wasn't immediately obvious to me.	2021-07-17 18:32:36 +02:00
Nicholas Guy	9769535efd	[AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling Specifying the latencies of specific LDP variants appears to improve performance almost universally. Differential Revision: https://reviews.llvm.org/D105882	2021-07-16 12:00:57 +01:00
Cullen Rhodes	99eb96f031	[AArch64][SME] Add load and store instructions This patch adds support for following contiguous load and store instructions: * LD1B, LD1H, LD1W, LD1D, LD1Q * ST1B, ST1H, ST1W, ST1D, ST1Q A new register class and operand is added for the 32-bit vector select register W12-W15. The differences in the following tests which have been re-generated are caused by the introduction of this register class: * llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll * llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir * llvm/test/CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir * llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir D88663 attempts to resolve the issue with the store pair test differences in the AArch64 load/store optimizer. The GlobalISel differences are caused by changes in the enum values of register classes, tests have been updated with the new values. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D105572	2021-07-16 10:11:10 +00:00
Matt Arsenault	e91da668d0	GlobalISel: Track argument pointeriness with arg flags Since we're still building on top of the MVT based infrastructure, we need to track the pointer type/address space on the side so we can end up with the correct pointer LLTs when interpreting CCValAssigns.	2021-07-15 19:11:40 -04:00
Jessica Paquette	46c8e7122b	[AArch64][GlobalISel] Clamp <n x p0> vecs when legalizing G_EXTRACT_VECTOR_ELT This case was missing from G_EXTRACT_VECTOR_ELT. It's the same as for s64. https://godbolt.org/z/Tnq4acY8z Differential Revision: https://reviews.llvm.org/D105952	2021-07-15 14:05:28 -07:00
Simon Pilgrim	91e151476c	[TTI] Consistently make getMinVectorRegisterBitWidth() methods const. NFCI. The underlying getMinVectorRegisterBitWidth() methods are const, but it was missed in a couple of TargetTransformInfo wrappers. Noticed while working on D103925	2021-07-15 13:27:55 +01:00
Irina Dobrescu	831ee6b0c3	[AArch64][GlobalISel] Optimise lowering for some vector types for min/max Differential Revision: https://reviews.llvm.org/D105696	2021-07-15 11:34:32 +01:00
Cullen Rhodes	dfa76933c2	[AArch64][SME] Add outer product instructions This patch adds support for the following outer product instructions: * BFMOPA, BFMOPS, FMOPA, FMOPS, SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA, UMOPS, USMOPA, USMOPS. Depends on D105570. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D105571	2021-07-15 09:51:06 +00:00
Jon Roelofs	0e49c54a8c	[AArch64] Fix selection of G_UNMERGE <2 x s16> Differential revision: https://reviews.llvm.org/D106007	2021-07-14 13:40:56 -07:00
Eli Friedman	1e30bf8621	[SelectionDAG] Add an overload of getStepVector that assumes step 1. This is mostly a minor convenience, but the pattern seems frequent enough to be worthwhile (and we'll probably add more uses in the future). Differential Revision: https://reviews.llvm.org/D105850	2021-07-14 11:37:01 -07:00
Sander de Smalen	eac1670739	[CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid. At the moment, <vscale x 1 x eltty> are not yet fully handled by the code-generator, so to avoid vectorizing loops with that VF, we mark the cost for these types as invalid. The reason for not adding a new "TTI::getMinimumScalableVF" is because the type is supposed to be a type that can be legalized. It partially is, although the support for these types need some more work. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D103882	2021-07-14 16:44:22 +01:00
Stephen Tozer	810e4c3c66	[DebugInfo] Correctly update dbg.values with duplicated location ops This patch fixes code that incorrectly handled dbg.values with duplicate location operands, i.e. !DIArgList(i32 %a, i32 %a). The errors in question were caused by either applying an update to dbg.value multiple times when the update is only valid once, or by updating the DIExpression for only the first instance of a value that appears multiple times. Differential Revision: https://reviews.llvm.org/D105831	2021-07-14 11:17:24 +01:00
Cullen Rhodes	c08dabb0f4	[AArch64][SME] Add matrix register definitions and parsing support SME introduces the ZA array, a new piece of architectural register state consisting of a matrix of [SVLb x SVLb] bytes, where SVL is the implementation defined Streaming SVE vector length and SVLb is the number of 8-bit elements in a vector of SVL bits. SME instructions consist of three types of matrix operands: * Tiles: a ZA tile is a square, two-dimensional sub-array of elements within the ZA array. These tiles make up the larger accumulator array and the granularity varies based on the element size, i.e. - ZAQ0..ZAQ15 (smallest tile granule) - ZAD0..ZAD7 - ZAS0..ZAS3 - ZAH0..ZAH1 or ZAB0 (largest tile granule, single tile) * Tile vectors: similar to regular tiles, but have an extra 'h' or 'v' to tell how the vector at [reg+offset] is layed out in the tile, horizontally or vertically. E.g. za1h.h or za15v.q, which corresponds to vectors in registers ZAH1 and ZAQ15, respectively. * Accumulator matrix: this is the entire accumulator array ZA. This patch adds the register classes and related operands and parsing for SME instructions operating on the accumulator array. The ADDHA and ADDVA instructions which operate on tiles are also added in this patch to make some use of the code added, later patches will make use of the other operands introduced here. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Co-authored by: Sander de Smalen (@sdesmalen) Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D105570	2021-07-14 08:25:49 +00:00
Jon Roelofs	87c6bf92a9	[AArch64] rm unused subreg's	2021-07-13 18:06:31 -07:00
Jon Roelofs	6377388c32	[AArch64] Fix AArch64::dsub's size	2021-07-13 18:06:31 -07:00
Jessica Paquette	5bd7cc4f42	[AArch64][GlobalISel] Mark v2s64 -> v2p0 G_INTTOPTR as legal Allow ``` %x:_<2 x p0> = G_INTTOPTR %y:_<2 x s64> ``` This shows up when building clang for AArch64 with GlobalISel. Also show that we can select it. This should match SDAG's behaviour: https://godbolt.org/z/33oqYoaYv Differential Revision: https://reviews.llvm.org/D105944	2021-07-13 17:28:14 -07:00
Jon Roelofs	eba638dbbb	[AArch64][GlobalISel] Legalize load <2 x i16> Differential revision: https://reviews.llvm.org/D105913	2021-07-13 11:12:05 -07:00
Jon Roelofs	43c7ca8e49	[AArch64][GlobalISel] Legalize store <2 x i16> Differential revision: https://reviews.llvm.org/D105912	2021-07-13 11:12:05 -07:00
Matt Arsenault	77a608d9de	GlobalISel: Remove getIntrinsicID utility function This is redundant with a method directly on MachineInstr	2021-07-13 11:04:10 -04:00
Tim Northover	7802f62b3f	AArch64: use 4-byte slots for arm64_32 pointers in a tail call	2021-07-13 11:08:59 +01:00
Jon Roelofs	6611fbc62a	[AArch64] Dump a little more info about unimplemented reg-to-reg copies. NFC	2021-07-12 15:37:11 -07:00
Eli Friedman	6c04b7dd4f	[AArch64] Optimize overflow checks for [s\|u]mul.with.overflow.i32. Saves one instruction for signed, uses a cheaper instruction for unsigned. Differential Revision: https://reviews.llvm.org/D105770	2021-07-12 15:30:42 -07:00
Benjamin Kramer	0da3573a9e	[AArch64] Silence unused variable warning. NFC. AArch64ISelLowering.cpp:15167:8: warning: unused variable 'OpCode' [-Wunused-variable] auto OpCode = N->getOpcode(); ^	2021-07-12 16:01:11 +02:00

1 2 3 4 5 ...

5409 Commits