llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	fa1a68285e	[AArch64] Improve fptosi.sat vector lowering Similar to D111236, this improves the lowering of vector fptosi.sat and fptoui.sat, using legal converts and further saturating from there with min/max. f64 are excluded for the moment due to producing worse code in places compared to the unrolling. Differential Revision: https://reviews.llvm.org/D111787	2021-10-15 11:37:53 +01:00
David Green	a4f42a33be	[AArch64] Improve fptosi.sat lowering Improve the lowering of scalar fptosi.sat and fptoui.sat for saturating widths smaller than legal types by using the fact that the legal type will saturate under aarch64, and saturating the result further using min/max. Differential Revision: https://reviews.llvm.org/D111236	2021-10-15 11:12:23 +01:00
guopeilin	b092dc0bb9	[AArch64ISelLowering] Avoid duplane in some cases when sve enabled Reviewed By: david-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D110524	2021-10-15 15:33:24 +08:00
Bradley Smith	2eb42e3d2a	[AArch64][SVE] Add fixed type lowering for EXTRACT_SUBVECTOR Depends on D111135 Differential Revision: https://reviews.llvm.org/D111165	2021-10-12 14:56:15 +00:00
Bradley Smith	03065ecd85	[AArch64][SVE] Ensure LowerEXTRACT_SUBVECTOR is not called for illegal types The lowering for EXTRACT_SUBVECTOR should not be called during type legalization, only as part of lowering, hence return SDValue() when called on illegal types. This also adds missing tests for extracting fixed types from illegal scalable types. Differential Revision: https://reviews.llvm.org/D111412	2021-10-11 11:20:50 +00:00
Andrew Savonichev	7ae8f392a1	[AArch64] Emit AssertZExt for i1 arguments AAPCS requires i1 argument to be zero-extended to 8-bits by the caller. Emit a new AArch64ISD::ASSERT_ZEXT_BOOL hint (or AssertZExt for GlobalISel) to enable some optimization opportunities. In particular, when the argument is forwarded to the callee, we can avoid zero-extension and use it as-is. Differential Revision: https://reviews.llvm.org/D107160	2021-10-11 11:55:11 +03:00
Bradley Smith	5be266db7a	[AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit Differential Revision: https://reviews.llvm.org/D111135	2021-10-07 15:28:55 +00:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
Kazu Hirata	c1e32b3fc0	[Target] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-02 12:06:29 -07:00
Tim Northover	3a00e58c2f	AArch64: use indivisible cmpxchg for 128-bit atomic loads at O0 Like normal atomicrmw operations, at -O0 the simple register-allocator can insert spills into the LL/SC loop if it's expanded and visible when regalloc runs. This can cause the operation to never succeed by repeatedly clearing the monitor. Instead expand to a cmpxchg, which has a pseudo-instruction for -O0.	2021-09-22 14:20:43 +01:00
Tim Northover	13aa102e07	AArch64: use ldp/stp for 128-bit atomic load/store in v8.4 onwards v8.4 says that normal loads/stores of 128 bits are single-copy atomic if they're properly aligned (which all LLVM atomics are) so we no longer need to do a full RMW operation to guarantee we got a clean read.	2021-09-20 09:50:11 +01:00
Cullen Rhodes	17f1ccc759	[AArch64][SVE] NFC: Remove unnecessary if	2021-09-16 11:26:46 +00:00
David Truby	915e9e76bf	[llvm][sve] Lowering for VLS masked extending loads This extends the custom lowering for extending loads on fixed length vectors in SVE to support masked extending loads. The existing tests for correct behaviour of masked extending loads exhibit bad code generation due to the legalisation of i1 vectors. They have been left as-is and new tests have been added that do not exhibit this behaviour. Differential Revision: https://reviews.llvm.org/D108200	2021-09-13 11:13:25 +01:00
Huihui Zhang	da4a2fd832	[AArch64ISelLowering] Fix null pointer access in performSVEAndCombine. When combining 'and' of an unsigned unpack and shuffle instruction, bail early if shuffle is not constructed from a constant integer. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D109556	2021-09-10 10:36:43 -07:00
Craig Topper	9af8f1b18e	[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode. Soft deprecate isNullValue/isAllOnesValue and update in tree callers. This matches the changes to the APInt interface from D109483. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D109535	2021-09-09 13:28:30 -07:00
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Bradley Smith	8089f9ed5a	[AArch64][SVE] Add missing patterns for unpredicated subr intrinsics Differential Revision: https://reviews.llvm.org/D109369	2021-09-09 10:28:37 +00:00
Ben Shi	f0460fa4eb	[AArch64] Improve target hook function to decide folding (mul (add x, c1), c2) Prevent the folding if it leads to worse code. Reviewed By: dmgreen, kda Differential Revision: https://reviews.llvm.org/D108871	2021-09-08 01:51:26 +00:00
Sander de Smalen	448d47f743	[AArch64][SVE] Implement all-inactive predicate with PFALSE. Instead of using a WHILE XZR, XZR instruction, just emit a PFALSE. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D109311	2021-09-07 14:29:02 +01:00
David Truby	b297531ece	[AArch64][sve] Prevent incorrect function call on fixed width vector The isEssentiallyExtractHighSubvector function currently calls getVectorNumElements on a type that in specific cases might be scalable. Since this function only has correct behaviour at the moment on scalable types anyway, the function can just return false when given a fixed type. Differential Revision: https://reviews.llvm.org/D109163	2021-09-06 14:25:03 +01:00
Kevin Athey	c7f50a445e	Revert "[AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)" This reverts commit `095bea23d0`. Broke buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/11411	2021-09-03 18:08:58 -07:00
Ben Shi	095bea23d0	[AArch64] Implement target hook function to decide folding (mul (add x, c1), c2) Prevent the folding if it leads to worse code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D108871	2021-09-04 07:24:23 +08:00
Cullen Rhodes	1dcd900d1d	[AArch64][ISel] NFC: DAG.getMachineFunction() -> MF Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D109135	2021-09-03 07:59:17 +00:00
Bradley Smith	14e1a4a6ee	[AArch64][SVE] Workaround incorrect types when lowering fixed length gather/scatter When lowering a fixed length gather/scatter the index type is assumed to be the same as the memory type, this is incorrect in cases where the extension of the index has been folded into the addressing mode. For now add a temporary workaround to fix the codegen faults caused by this by preventing the removal of this extension. At a later date the lowering for SVE gather/scatters will be redesigned to improve the way addressing modes are handled. As a short term side effect of this change, the addressing modes generated for fixed length gather/scatters will not be optimal. Differential Revision: https://reviews.llvm.org/D109145	2021-09-02 15:07:24 +00:00
Jun Ma	15b2a8e7fa	[AArch64][SVE] Optimize ptrue predicate pattern with known sve register width. For vectors that are exactly equal to getMaxSVEVectorSizeInBits, just use AArch64SVEPredPattern::all, which can enable the use of unpredicated ptrue when available. TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D108706	2021-08-27 20:03:48 +08:00
Jun Ma	8c47103491	[AArch64][SVE] Add API for conversion between SVE predicate pattern and element number. NFC This patch solely moves the conversion between SVE predicate pattern and element number into two small functions. It's a pre-commit patch for optimizing ptrue with known sve register width. Differential Revision: https://reviews.llvm.org/D108705	2021-08-27 20:03:48 +08:00
Jun Ma	3f919dfe0d	[AArch64][SVE] Use getPTrue uniformly. NFC.	2021-08-27 20:03:48 +08:00
Jacob Bramley	05f3219b38	[AArch64] Lower fptoi.sat intrinsics for NEON. Following on from D102353, extend the fptoi.sat intrinsics to use NEON fcvt* instructions. Differential Revision: https://reviews.llvm.org/D108460	2021-08-26 15:37:00 +01:00
Tim Northover	edab411ee6	AArch64: copy all parts of the mem operand across when combining a store In particular we were dropping volatility, which can lead to unwanted transformations.	2021-08-19 18:26:39 +01:00
Owen Anderson	06a4c85890	Use v16i8 rather than v2i64 as the VT for memset expansion on AArch64. This allows the instruction selector to realize that it can directly broadcast the low byte of the memset value, rather than replicating it to a 64-bit GPR before broadcasting. This fixes PR50985. Differential Revision: https://reviews.llvm.org/D108354	2021-08-19 16:54:07 +00:00
Tim Northover	8eb054a87d	AArch64: compare correct type for multi-valued SDNode. If Orig produces more than one value (rare) with different types (rarer) then we need to make sure we check against the one that Orig actually represents, not just the first type. Unfortunately because of the combination of things that need to happen I wasn't able to produce a test.	2021-08-18 09:35:31 +01:00
Arthur Eubanks	46cf82532c	[NFC] Replace Function handling of attributes with less confusing calls To avoid magic constants and confusing indexes.	2021-08-17 21:05:40 -07:00
Simon Pilgrim	895ed64009	[AArch64] LowerCONCAT_VECTORS - merge getNumOperands() calls. NFCI. Improves on the unused variable fix from rG4357562067003e25ab343a2d67a60bd89cd66dbf	2021-08-17 11:23:03 +01:00
Jordan Rupprecht	4357562067	[NFC][AArch64] Fix unused var in release build	2021-08-16 10:04:32 -07:00
Simon Pilgrim	d6fe8d37c6	[DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b) Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal. This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands. Differential Revision: https://reviews.llvm.org/D107597	2021-08-16 16:06:54 +01:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
David Truby	9c47d6b48d	[llvm][sve] Lowering for VLS extending loads This patch enables extending loads for fixed length SVE code generation. There is a slight regression here in the mulh tests; since these tests load the parameter and then extend it these are treated as extending loads which are merged, preventing the mulh instruction from being generated. As this affects scalable SVE codegen as well this should be addressed in a separate patch. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D107057	2021-08-12 09:43:39 +00:00
David Green	013030a0b2	[AArch64] Correct sinking of shuffles to adds/subs This was checking extends as shuffles, whereas we should be checking the operands. This helps sink the shuffles, creating more addl/subl instructions. Differential Revision: https://reviews.llvm.org/D107623	2021-08-10 13:25:42 +01:00
Eli Friedman	ac20e56911	[AArch64] Implement FCOPYSIGN for SVE. I was originally going to try to implement this in target-independent code, but it's actually sort of tricky to generate the correct sequence for vectors like nxv2f32. So just stick this in target-specific code, at least for now. Differential Revision: https://reviews.llvm.org/D107608	2021-08-09 12:06:48 -07:00
Bradley Smith	73ecb9987b	[AArch64][SVE] Fix assertion failure when lowering fixed length gather/scatter The patterns for fixed length gather/scatter with 32-bit offsets and 64-bit memory type are slightly different than the rest of the patterns, as such the lowering needs to be slightly different to ensure the correct types are used. Differential Revision: https://reviews.llvm.org/D107576	2021-08-09 14:05:22 +00:00
Simon Pilgrim	2cbf9fd402	[DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes. This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector. This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector. Fixes PR50053 Differential Revision: https://reviews.llvm.org/D107068	2021-08-05 15:40:48 +01:00
Simon Wallis	9269752671	[AArch64] Fix assert AArch64TargetLowering::ReplaceNodeResults Don't know how to custom expand this UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788 The fix is to provide missing expansions for: case ISD::STRICT_FP_TO_UINT: case ISD::STRICT_FP_TO_SINT: A test case is provided. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107452	2021-08-04 16:18:19 +01:00
Irina Dobrescu	b01417d3c5	[AArch64] Optimise min/max lowering in ISel Differential Revision: https://reviews.llvm.org/D106561	2021-08-02 13:40:21 +01:00
Alexandros Lamprineas	7d940432c4	[AArch64] Legalize MVT::i64x8 in DAG isel lowering This patch legalizes the Machine Value Type introduced in D94096 for loads and stores. A new target hook named getAsmOperandValueType() is added which maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization. Differential Revision: https://reviews.llvm.org/D94097	2021-07-31 09:51:28 +01:00
Bradley Smith	191831e380	[AArch64][SVE] Fix incorrect mask type when lowering fixed type SVE gather/scatter An incorrect mask type when lowering an SVE gather/scatter was causing a codegen fault which manifested as the incorrect predicate size being used for an SVE gather/scatter (e.g. p0.b rather than p0.d). Fixes PR51182. Differential Revision: https://reviews.llvm.org/D106943	2021-07-29 11:22:17 +00:00
Craig Topper	2ea9db0c49	[AArch64] Fix -Wparentheses warning with gcc 5.4. NFC	2021-07-26 21:08:56 -07:00
Caroline Concatto	0bfc26e3a4	[SVE][AArch64] Improve code generation for vector_splice for Imm > 0 This patch implements vector_splice in tablegen for all cases when the Immediate is positive and lower than the known minimum value of a scalable vector. Vector_splice can be implemented using SVE instruction EXT. For instance : @llvm.experimental.vector.splice(Vector_1, Vector_2, Imm) @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> EXT Vector_1, Vector_2, Imm // Vector_1 = B, C, D + Vector_2 = E Depends on D105633 Differential Revision: https://reviews.llvm.org/D106273	2021-07-26 11:45:46 +01:00
Caroline Concatto	73e4e9cd00	[AArch64][SVE] Improve code generation for vector_splice for Imm == -1 This patch implements vector_splice in tablegen for: a) when the immediate is equal to -1 (Imm==-1) and uses: INSR + LASTB For instance : @llvm.experimental.vector.splice(Vector_1, Vector_2, -1) @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -1) ==> <D, E, F, G> LASTB RegLast, Vector_1 // RegLast = D INSR Res, (Vector_1 >> 1), RegLast // Res = D + E, F, G Differential Revision: https://reviews.llvm.org/D105633	2021-07-26 11:25:01 +01:00
Benjamin Kramer	dd70cd089a	[llvm][sve] Silence unused variable warning in Release builds. NFC	2021-07-23 16:16:35 +02:00
David Truby	1528a4d400	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-23 14:04:55 +01:00
Eli Friedman	664a1fd9f0	[AArch64] Use the CMP_SWAP_128 variants added in `843c6140`. Accidentally forgot to flip the opcode... and I didn't notice because it was working fine for the GlobalISel.	2021-07-20 13:23:27 -07:00
Fangrui Song	0c0549fbb3	[AArch64] Delete unused Opcode after D106039	2021-07-20 12:51:44 -07:00
Eli Friedman	843c614058	[AArch64] Fix i128 cmpxchg using ldxp/stxp. Basically two parts to this fix: 1. Stop using AtomicExpand to expand cmpxchg i128 2. Fix AArch64ExpandPseudoInsts to use a correct expansion. From ARM architecture reference: To atomically load two 64-bit quantities, perform a Load-Exclusive pair/Store-Exclusive pair sequence of reading and writing the same value for which the Store-Exclusive pair succeeds, and use the read values from the Load-Exclusive pair. Fixes https://bugs.llvm.org/show_bug.cgi?id=51102 Differential Revision: https://reviews.llvm.org/D106039	2021-07-20 12:38:12 -07:00
Amy Huang	fd972bb9fd	Revert "[llvm][sve] Lowering for VLS truncating stores" because it causes a seg fault (see https://reviews.llvm.org/D104471). This reverts commit `c305557acd`.	2021-07-19 11:03:33 -07:00
David Green	5561ad8b36	[ARM] Remove PromotedBitwiseVT for NEON types This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32, treating them the same as the AArch64 and MVE backends where we just add the relevant patterns for each legal type. This prevents a lot of bitcasts from being added to the DAG, which have the potential to make optimizations more difficult. It does mean adding extra patterns, and some codegen can change due to the types now being legal, not promoted. Differential Revision: https://reviews.llvm.org/D105588	2021-07-19 16:36:33 +01:00
Sander de Smalen	0ed0573527	[AArch64][SVE] Optimize bitcasts between unpacked half/i16 vectors. The case for nxv2f32/nxv2i32 was already covered by D104573. This patch builds on top of that by making the mechanism work for nxv2[b]f16/nxv2i16, nxv4[b]f16/nxv4i16 as well. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D106138	2021-07-19 08:29:28 +01:00
Eli Friedman	1e30bf8621	[SelectionDAG] Add an overload of getStepVector that assumes step 1. This is mostly a minor convenience, but the pattern seems frequent enough to be worthwhile (and we'll probably add more uses in the future). Differential Revision: https://reviews.llvm.org/D105850	2021-07-14 11:37:01 -07:00
Tim Northover	7802f62b3f	AArch64: use 4-byte slots for arm64_32 pointers in a tail call	2021-07-13 11:08:59 +01:00
Eli Friedman	6c04b7dd4f	[AArch64] Optimize overflow checks for [s|u]mul.with.overflow.i32. Saves one instruction for signed, uses a cheaper instruction for unsigned. Differential Revision: https://reviews.llvm.org/D105770	2021-07-12 15:30:42 -07:00
Benjamin Kramer	0da3573a9e	[AArch64] Silence unused variable warning. NFC. AArch64ISelLowering.cpp:15167:8: warning: unused variable 'OpCode' [-Wunused-variable] auto OpCode = N->getOpcode(); ^	2021-07-12 16:01:11 +02:00
Michael Liao	8253fa2298	Fix warning '-Wparentheses'. NFC.	2021-07-12 09:25:30 -04:00
David Truby	c305557acd	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-12 11:14:17 +01:00
Bradley Smith	026bb84bcd	[AArch64][SVE] Add ISel patterns for floating point compare with zero instructions Additionally, lower the floating point compare SVE intrinsics to SETCC_MERGE_ZERO ISD nodes to avoid duplicating ISel patterns. Differential Revision: https://reviews.llvm.org/D105486	2021-07-08 10:46:12 +00:00
Bradley Smith	5ab9000fbb	[AArch64][SVE] Fix selection failures for scalable MLOAD nodes with passthru Differential Revision: https://reviews.llvm.org/D105348	2021-07-06 14:17:23 +00:00
Caroline Concatto	a2c5c56055	[AArch64][CostModel] Add cost model for experimental.vector.splice This patch adds a new ShuffleKind SK_Splice and then handle the cost in getShuffleCost, as in experimental.vector.reverse. Differential Revision: https://reviews.llvm.org/D104630	2021-07-05 14:30:24 +01:00
Bradley Smith	cc273983f7	[AArch64][SVE] Improve fixed length codegen for common vector shuffle case Improve codegen when lowering the common vector shuffle case from the vectorizer (op1[last]:op2[0:last-1]). This patch only handles this common case as it is difficult to handle this more generally when using fixed length vectors, due to being unable to use the SVE ext instruction. Differential Revision: https://reviews.llvm.org/D105289	2021-07-05 12:09:27 +01:00
Paul Walker	287d39dd5a	[NFC] Fix a few whitespace issues and typos.	2021-07-04 11:49:58 +01:00
Krzysztof Parzyszek	df88c26f0d	[OpaquePtr] Add type parameter to emitLoadLinked Differential Revision: https://reviews.llvm.org/D105353	2021-07-02 13:07:40 -05:00
Florian Hahn	1a248233a5	[AArch64] Use custom lowering for fp16 vector copysign. The custom copysign lowering already supports fp16. Use it. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105277	2021-07-02 11:15:30 +01:00
Eli Friedman	0176ac9503	[AArch64] Optimize SVE bitcasts of unpacked types. Target-independent code only knows how to spill to the stack; instead, use AArch64ISD::REINTERPRET_CAST. Differential Revision: https://reviews.llvm.org/D104573	2021-07-01 15:35:48 -07:00
Bradley Smith	2668727929	[SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR Inserting into a smaller-than-legal scalable vector would result in an internal compiler error. For example, inserting a <vscale x 4 x i8> into a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a crash. This crash was happening because there was no code to promote (legalise) the result of an INSERT_SUBVECTOR node. This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises the ISD node. This is currently done by going through memory. This is necessary because of the requirement that the SubVec parameter of the INSERT_SUBVECTOR node must be smaller than the Vec parameter, which means that INSERT_SUBVECTOR cannot always have a legal result/operand types. Co-Authored-by: Joe Ellis <joe.ellis@arm.com> Differential Revision: https://reviews.llvm.org/D102766	2021-07-01 17:05:53 +01:00
Bradley Smith	01b846674d	[AArch64][SVE] Add support for fixed length MSCATTER/MGATHER Since gather lowering can now lower to nodes that may need expansion via the vector legalizer, do MGATHER lowering via vector legalizer. Additionally, as part of adding passthru support for fixed typed gathers, fix passthru support for scalable types. Depends on D104910 Differential Revision: https://reviews.llvm.org/D104217	2021-07-01 12:13:59 +01:00
Fangrui Song	17858da022	[AArch64] Remove unneeded ExternalSymbolSDNode code for machine constraint "S". NFC ExternalSymbolSDNode is implicitly generated libcalls but with an address taking operation we cannot reference an ExternalSymbolSDNode.	2021-06-30 17:52:56 -07:00
Sjoerd Meijer	b062fff87a	Recommit "[AArch64] Custom lower <4 x i8> loads" This recommits D104782 including a fix for adding a wrong operand to the new load node. Differential Revision: https://reviews.llvm.org/D105110	2021-06-30 09:18:06 +01:00
Dylan Fleming	c3d3defd11	[SVE] Added CodeGen support for inserting an element into a predicate vector Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D104722	2021-06-29 14:55:40 +01:00
Sjoerd Meijer	3a7cea2858	Revert "[AArch64] Custom lower <4 x i8> loads" This reverts commit `51e434fc25` because of a build bot failure in test-suite::GCC-C-execute-pr60960.test that I need to investigate.	2021-06-28 17:44:46 +01:00
Bradley Smith	c089e29aa4	[AArch64][SVE] DAG combine SETCC_MERGE_ZERO of a SETCC_MERGE_ZERO This helps remove extra comparisons when generating masks for fixed length masked operations. Differential Revision: https://reviews.llvm.org/D104910	2021-06-28 15:06:06 +01:00
David Green	2887f14639	[ISel] Port AArch64 SABD and UABD to DAGCombine This ports the AArch64 SABD and UABD over to DAG Combine, where they can be used by more backends (notably MVE in a follow-up patch). The matching code has changed very little, just to handle legal operations and types differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing an abds/abdu which is zexted to the original type. Differential Revision: https://reviews.llvm.org/D91937	2021-06-26 19:34:16 +01:00
Sjoerd Meijer	51e434fc25	[AArch64] Custom lower <4 x i8> loads This custom lowers <4 x i8> vector loads using a 32-bit load, followed by 2 SSHLL instructions to extend it to e.g. a <4 x i32> vector. Before, it was really inefficient and expensive to construct a <4 x i32> for this as 4 byte loads and 4 moves were used. With this improvement SLP vectorisation might for example become profitable, see D103629. Differential Revision: https://reviews.llvm.org/D104782	2021-06-25 09:53:51 +01:00
Martin Storsjö	42f74e8249	[llvm] Rename StringRef _lower() method calls to _insensitive() This is a mechanical change. This actually also renames the similarly named methods in the SmallString class, however these methods don't seem to be used outside of the llvm subproject, so this doesn't break building of the rest of the monorepo.	2021-06-25 00:22:01 +03:00
Sander de Smalen	d5e14ba88c	[GlobalISel] NFC: Change LLT::vector to take ElementCount. This also adds new interfaces for the fixed- and scalable case: * LLT::fixed_vector * LLT::scalable_vector The strategy for migrating to the new interfaces was as follows: * If the new LLT is a (modified) clone of another LLT, taking the same number of elements, then use LLT::vector(OtherTy.getElementCount()) or if the number of elements is halved/doubled, it uses .divideCoefficientBy(2) or operator*. That is because there is no reason to specifically restrict the types to 'fixed_vector'. * If the algorithm works on the number of elements (as unsigned), then just use fixed_vector. This will need to be fixed up in the future when modifying the algorithm to also work for scalable vectors, and will then need additional tests to confirm the behaviour works the same for scalable vectors. * If the test used the `/*Scalable=*/true` flag of LLT::vector, then this is replaced by LLT::scalable_vector. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104451	2021-06-24 11:26:12 +01:00
David Spickett	e4ecd83fe9	[llvm][AArch64] Handle arrays of struct properly (from IR) This only applies to FastIsel. GlobalIsel seems to sidestep the issue. This fixes https://bugs.llvm.org/show_bug.cgi?id=46996 One of the things we do in llvm is decide if a type needs consecutive registers. Previously, we just checked if it was an array or not. (plus an SVE specific check that is not changing here) This causes some confusion when you have arbitrary IR like: ``` %T1 = type { double, i1 }; define [ 1 x %T1 ] @foo() { entry: ret [ 1 x %T1 ] zeroinitializer } ``` We see it is an array so we call CC_AArch64_Custom_Block which bails out when it sees the i1, a type we don't want to put into a block. This leaves the location of the double in some kind of intermediate state and leads to odd codegen. Which then crashes the backend because it doesn't know how to implement what it's been asked for. You get this: ``` renamable $d0 = FMOVD0 $w0 = COPY killed renamable $d0 ``` Rather than this: ``` $d0 = FMOVD0 $w0 = COPY $wzr ``` The backend knows how to copy 64 bit to 64 bit registers, but not 64 to 32. It can certainly be taught how but the real issue seems to be us even trying to assign a register block in the first place. This change makes the logic of AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters a bit more in depth. If we find an array, also check that all the nested aggregates in that array have a single member type. Then CC_AArch64_Custom_Block's assumption of a type that looks like [ N x type ] will be valid and we get the expected codegen. New tests have been added to exercise these situations. Note that some of the output is not ABI compliant. The aim of this change is to simply handle these situations and not to make our processing of arbitrary IR ABI compliant. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D104123	2021-06-16 13:56:01 +00:00
Huihui Zhang	1c096bf09f	[SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE. Currently, Loop strength reduce is not handling loops with scalable stride very well. Take a loop vectorized with scalable vector type <vscale x 8 x i16> for instance, (refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added). Memory accesses are incremented by "16*vscale", while the induction variable is incremented by "8*vscale". The scaling factor "2" needs to be extracted to build candidate formula i.e., "reg(%in) + 2*reg({0,+,(8 * %vscale)})". So that addrec register reg({0,+,(8*vscale)}) can be reused among Address and ICmpZero LSRUses to enable optimal solution selection. This patch allows LSR getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y", and pull out "C1 /s C2" as scaling factor whenever possible. Without this change, LSR is missing candidate formula with proper scaled factor to leverage target scaled-index addressing mode. Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable vector. But allow simple valid scale to pass through. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103939	2021-06-14 16:42:34 -07:00
Jingu Kang	08ce52ef5e	[AArch64] Improve SAD pattern Given a vecreduce_add node, detect the below pattern and convert it to the node sequence with UABDL, [S|U]ABD and UADDLP. i32 vecreduce_add( v16i32 abs( v16i32 sub( v16i32 [sign|zero]_extend(v16i8 a), v16i32 [sign|zero]_extend(v16i8 b)))) =================> i32 vecreduce_add( v4i32 UADDLP( v8i16 add( v8i16 zext( v8i8 [S|U]ABD low8:v16i8 a, low8:v16i8 b v8i16 zext( v8i8 [S|U]ABD high8:v16i8 a, high8:v16i8 b Differential Revision: https://reviews.llvm.org/D104042	2021-06-14 15:48:51 +01:00
Nikita Popov	1ffa6499ea	[TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC) Don't require a specific kind of IRBuilder for TargetLowering hooks. This allows us to drop the IRBuilder.h include from TargetLowering.h. Differential Revision: https://reviews.llvm.org/D103759	2021-06-06 16:29:50 +02:00
David Green	12f53e5392	[AArch64] Remove AArch64ISD::NEG This NEG node is just a vector negation, easily represented as a SUB zero. Removing it from the one place it is generated is essentially an NFC, but can allow some extra folding. The updated tests are now loading different constant literals, which have already been negated. Differential Revision: https://reviews.llvm.org/D103703	2021-06-05 19:54:42 +01:00
Bradley Smith	a85f5874e2	[AArch64] Remove SETCC of CSEL when the latter's condition can be inverted setcc (csel 0, 1, cond, X), 1, ne ==> csel 0, 1, !cond, X Where X is a condition code setting instruction. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D103256	2021-06-04 15:53:21 +01:00
Irina Dobrescu	e971099a9b	[AArch64] Optimise bitreverse lowering in ISel Differential Revision: https://reviews.llvm.org/D103105	2021-06-02 12:51:12 +01:00
Jessica Paquette	e7f501b5e7	[GlobalISel][AArch64] Combine and (lshr x, cst), mask -> ubfx x, cst, width Also add a target hook which allows us to get around custom legalization on AArch64. Differential Revision: https://reviews.llvm.org/D99283	2021-06-01 10:56:17 -07:00
Eli Friedman	0b3b0a727a	[AArch64][RISCV] Make sure isel correctly honors failure orderings. If a cmpxchg specifies acquire or seq_cst on failure, make sure we generate code consistent with that ordering even if the success ordering is not acquire/seq_cst. At one point, it was ambiguous whether this sort of construct was valid, but the C++ standard and LLVM now accept arbitrary combinations of success/failure orderings. This doesn't address the corresponding issue in AtomicExpand. (This was reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .) Fixes https://bugs.llvm.org/show_bug.cgi?id=50512. Differential Revision: https://reviews.llvm.org/D103284	2021-05-28 12:47:40 -07:00
Tim Northover	9ff2eb1ea5	SwiftTailCC: teach verifier musttail rules applicable to this CC. SwiftTailCC has a different set of requirements than the C calling convention for a tail call. The exact argument sequence doesn't have to match, but fewer ABI-affecting attributes are allowed. Also make sure the musttail diagnostic triggers if a musttail call isn't actually a tail call.	2021-05-28 11:12:00 +01:00
Bradley Smith	f3c577ed38	[AArch64][SVE] Add fixed length codegen for FP_TO_{S,U}INT/{S,U}INT_TO_FP Depends on D102607 Differential Revision: https://reviews.llvm.org/D102777	2021-05-25 12:54:55 +01:00
Bradley Smith	e40513252a	[AArch64][SVE] Add fixed length codegen for FP_ROUND/FP_EXTEND Depends on D102498 Differential Revision: https://reviews.llvm.org/D102607	2021-05-24 13:02:30 +01:00
Bradley Smith	4bc14be259	[AArch64][SVE] Improve codegen for fixed length vector concat Differential Revision: https://reviews.llvm.org/D102498	2021-05-24 12:56:02 +01:00
David Truby	bf3b6cf920	[llvm][sve] Lowering for VLS MLOAD/MSTORE This adds custom lowering for the MLOAD and MSTORE ISD nodes when passed fixed length vectors in SVE. This is done by converting the vectors to VLA vectors and using the VLA code generation. Fixed length extending loads and truncating stores currently produce correct code, but do not use the built in extend/truncate in the load and store instructions. This will be fixed in a future patch. Differential Revision: https://reviews.llvm.org/D101834	2021-05-20 10:50:59 +00:00
Andrew Savonichev	a647100b43	[AArch64] Combine vector shift instructions in SelectionDAG bswap.v2i16 + sitofp in LLVM IR generate a sequence of: - REV32 + USHR for bswap.v2i16 - SHL + SSHR + SCVTF for sext to v2i32 and scvt The shift instructions are excessive as noted in PR24820, and they can be optimized to just SSHR. Differential Revision: https://reviews.llvm.org/D102333	2021-05-20 10:50:13 +03:00
Eli Friedman	3dd49ec194	[AArch64][SVE] Implement extractelement of i1 vectors. The implementation just extends the vector to a larger element type, and extracts from that. Not fancy, but generates reasonable code. There was discussion in the review of doing the promotion in target-independent code, but I'm sticking with this to avoid making LegalizeDAG infrastructure more complicated. Differential Revision: https://reviews.llvm.org/D87651	2021-05-17 14:51:11 -07:00
Irina Dobrescu	50511df32e	[AArch64] Lower bitreverse in ISel Adding lowering support for bitreverse. Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D102397	2021-05-17 13:35:27 +01:00
Tim Northover	82a0e808bb	IR/AArch64/X86: add "swifttailcc" calling convention. Swift's new concurrency features are going to require guaranteed tail calls so that they don't consume excessive amounts of stack space. This would normally mean "tailcc", but there are also Swift-specific ABI desires that don't naturally go along with "tailcc" so this adds another calling convention that's the combination of "swiftcc" and "tailcc". Support is added for AArch64 and X86 for now.	2021-05-17 10:48:34 +01:00
Jacob Bramley	900c898994	[AArch64] Lower fptoi.sat intrinsics. AArch64's fcvt instructions implement the saturating behaviour that the fpto*i.sat intrinsics require, in cases where the destination width matches the saturation width. Lowering them removes a lot of unnecessary generated code. Only scalar lowerings are supported for now. Differential Revision: https://reviews.llvm.org/D102353	2021-05-17 10:19:19 +01:00
Bradley Smith	90ffcb1245	[AArch64][SVE] Add unpredicated vector BIC ISD node Addition of this node allows us to better utilize the different forms of the SVE BIC instructions, including using the alias to an AND (immediate). Differential Revision: https://reviews.llvm.org/D101831	2021-05-14 16:12:13 +01:00
Tim Northover	ea0eec69f1	IR+AArch64: add a "swiftasync" argument attribute. This extends any frame record created in the function to include that parameter, passed in X22. The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001 in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect of this is that tools walking the stack should expect to see one of three values there: * 0b0000 => a normal, non-extended record with just [FP, LR] * 0b0001 => the extended record [X22, FP, LR] * 0b1111 => kernel space, and a non-extended record. All other values are currently reserved. If compiling for arm64e this context pointer is address-discriminated with the discriminator 0xc31a and the DB (process-specific) key. There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing front-ends access to this slot (and forcing its creation initialized to nullptr if necessary).	2021-05-14 11:43:58 +01:00
Peter Waller	6e6f9a636b	[AArch64][SVE] Improve sve.convert.to.svbool lowering The sve.convert.to.svbool lowering has the effect of widening a logical <M x i1> vector representing lanes into a physical <16 x i1> vector representing bits in a predicate register. In general, if converting to svbool, the contents of lanes in the physical register might not be known. For sve.convert.to.svbool the new lanes are specified to be zeroed, requiring 'and' instructions to mask off the new lanes. For lanes coming from a ptrue or a comparison, however, they are known to be zero. CodeGen Before: ptrue p0.s, vl16 ptrue p1.s ptrue p2.b and p0.b, p2/z, p0.b, p1.b ret After: ptrue p0.s, vl16 ret Differential Revision: https://reviews.llvm.org/D101544	2021-05-12 10:57:25 +01:00
Bradley Smith	635164b95a	[AArch64][SVE] Improve SVE codegen for fixed length BITCAST Expanding a fixed length operation involves wrapping the operation in an insert/extract subvector pair, as such, when this is done to bitcast we end up with an extract_subvector of a bitcast. DAGCombine tries to convert this into a bitcast of an extract_subvector which restores the initial fixed length bitcast, causing an infinite loop of legalization. As part of this patch, we must make sure the above DAGCombine does not trigger after legalization if the created bitcast would not be legal. Differential Revision: https://reviews.llvm.org/D101990	2021-05-10 14:43:53 +01:00
Bradley Smith	65c89cd1a6	[AArch64][SVE] Better utilisation of unpredicated forms of remaining intrinsics When using predicated intrinsics, if the predicate used is all lanes active, use an unpredicated form of the instruction, additionally this allows for better use of immediate forms. This only includes instructions where the unpredicated/predicated forms matched in such a way that instruction selection would not introduce extra ptrue instructions. This allows us to convert the intrinsics directly to architecture independent ISD nodes. Depends on D101062 Differential Revision: https://reviews.llvm.org/D101828	2021-05-10 13:06:02 +01:00
Bradley Smith	f8f953c2a6	[AArch64][SVE] Better utilisation of unpredicated forms of arithmetic intrinsics When using predicated arithmetic intrinsics, if the predicate used is all lanes active, use an unpredicated form of the instruction, additionally this allows for better use of immediate forms. This also includes a new complex isel pattern which allows matching an all active predicate when the types are different but the predicate is a superset of the type being used. For example, to allow a b8 ptrue for a b32 predicate operand. This only includes instructions where the unpredicated/predicated forms are mismatched between variants, meaning that the removal of the predicate is done during instruction selection in order to prevent spurious re-introductions of ptrue instructions. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D101062	2021-05-10 13:05:37 +01:00
Sander de Smalen	407a33889d	[AArch64][SVE] Fix isel failure for FP-extending loads DAGCombiner tries to combine a (fpext (load)) to (fround (extload)) but SVE has no FP-extending loads. By marking these as expand, the combine no longer happens. This also fixes a similar issue for fptrunc, where the source type is not a legal type. Reviewed By: bsmith, kmclaughlin Differential Revision: https://reviews.llvm.org/D102053	2021-05-10 11:27:38 +01:00
Jun Ma	b3aeb13892	[AArch64][SVE] Remove index_vector node. Since index_vector is lowered into step_vector in D100816, we can just remove index_vector, use step_vector for codegen directly. Differential Revision: https://reviews.llvm.org/D101593	2021-05-10 11:08:58 +08:00
Simon Pilgrim	280aa3415e	[DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts. Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication. I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to). NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly. Differential Revision: https://reviews.llvm.org/D101987	2021-05-07 13:12:30 +01:00
Bradley Smith	9f37980d45	[AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to zero lanes 1-N Specifically, this allows us to rely on the lane-zeroing behaviour of SVE reduce instructions. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D101369	2021-05-04 15:05:05 +01:00
David Green	966435daf9	[AArch64] Fold CSEL x, x, cc -> x This can come up in rare situations, where a csel is created with identical operands. These can be folded simply to the original value, allowing the csel to be removed and further simplification to happen. This patch also removes FCSEL as it is unused, not being produced anywhere or lowered to anything. Differential Revision: https://reviews.llvm.org/D101687	2021-05-03 17:34:05 +01:00
LemonBoy	4751cadcca	[AArch64] Prevent spilling between ldxr/stxr pairs Apply the same logic used to check if CMPXCHG nodes should be expanded at -O0: the register allocator may end up spilling some register in between the atomic load/store pairs, breaking the atomicity and possibly stalling the execution. Fixes PR48017 Reviewed By: efriedman Differential Revision: https://reviews.llvm.org/D101163	2021-05-01 17:17:05 +02:00
Eli Friedman	6e6ae6c727	[AArch64] Fix lowering for fshl/fshr with SVE types. These operations don't exist natively, so just let the target-independent code expand to plain shifts. The generated sequences could probably be optimized a bit more, but they seem good enough for now. Differential Revision: https://reviews.llvm.org/D101574	2021-04-30 10:51:25 -07:00
Jun Ma	b310dd1501	[AArch64][SVE] Lower index_vector to step_vector As discussed in D100107, this patch first converts index_vector to step_vector, and converts step_vector back to index_vector after LegalizeDAG. Differential Revision: https://reviews.llvm.org/D100816	2021-04-30 19:04:39 +08:00
Joe Ellis	1eb81f8309	[AArch64] Add missing UINT_TO_FP promotions for v16i8 Differential Revision: https://reviews.llvm.org/D101042	2021-04-28 08:49:15 +00:00
Sander de Smalen	43ace8b5ce	[TTI] NFC: Change getScalingFactorCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100564	2021-04-23 16:06:36 +01:00
Joe Ellis	c19c0ad681	[AArch64][SVE] Fix bug in lowering of fixed-length integer vector divides The function AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE previously assumed the operands were full vectors, but this is not always true. This function would produce bogus results if the division operands are not full vectors, resulting in miscompiles when dividing 8-bit or 16-bit vectors. The fix is to perform an extend + div + truncate for non-full vectors, instead of the usual unpacking and unzipping logic. This is an additive change which reduces the non-full integer vector divisions to a pattern recognised by the existing lowering logic. For future reference, an example of code that would miscompile before this patch is below: int8_t foo(unsigned N, int8_t *a, int8_t *b, int8_t *c) { int8_t result = 0; for (int i = 0; i < N; ++i) { result += (a[i] / b[i]) / c[i]; } return result; } Differential Revision: https://reviews.llvm.org/D100370	2021-04-23 14:55:10 +00:00
David Green	c0bf5929ee	[AArch64] Improve vector reverse lowering This improves the lowering of v8i16 and v16i8 vector reverse shuffles. Instead of going via a generic tbl it uses a rev64; ext pair, as already happens for v4i32. Differential Revision: https://reviews.llvm.org/D100882	2021-04-22 21:01:25 +01:00
Joe Ellis	528ee161c9	[AArch64] Block tryCombineToBSL combines for vectors wider than NEON There are no patterns for the AArch64ISD::BSP ISD node for anything other than NEON vectors at the moment. As a result, if we hit these combines for vectors wider than a NEON vector (such as what we might get with fixed length SVE) we will fail to lower. This patch simply prevents us from attempting the combines if the input vector type is too wide. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D100961	2021-04-22 15:09:13 +00:00
Martin Storsjö	8000e1f578	[AArch64] Fix calling windows varargs with floats in fixed args from non-windows functions When inspecting the calling convention, for calling windows functions from a non-windows function, inspect the calling convention of the called function, not the caller. Also remove an unnecessary parameter to AArch64CallLowering OutgoingArgHandler. Differential Revision: https://reviews.llvm.org/D100890	2021-04-22 12:02:49 +03:00
Caroline Concatto	ca9b7e2e2f	[AArch64][SVE] Fix crash with icmp+select This patch changes the lowering of SELECT_CC from Legal to Expand for scalable vectors and adds support for scalable vectors in performSelectCombine. When selecting the nodes to lower in visitSELECT it checks if it is possible to use SELECT_CC in cases where SETCC is followed by SELECT. visitSELECT checks if SELECT_CC is legal or custom to replace SELECT by SELECT_CC. SELECT_CC used to be legal for scalable vectors, so the node changes to SELECT_CC. This used to crash the compiler as there is no support for SELECT_CC with scalable vectors. So now the compiler lowers to VSELECT instead of SELECT_CC. Differential Revision: https://reviews.llvm.org/D100485	2021-04-21 14:16:27 +01:00
Bradley Smith	b8b075d8d7	[AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types, and lower them to the appropriate SVE instructions. Additionally now that the MULH nodes are legal, integer divides can be expanded into a more performant code sequence. Differential Revision: https://reviews.llvm.org/D100487	2021-04-20 15:18:06 +01:00
Bradley Smith	22c017f0f9	[AArch64][NEON] Match (or (and -a b) (and (a+1) b)) => bit select With this patch vbslq_f32(vnegq_s32(a), b, c) lowers to a BIT instruction. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D100304	2021-04-15 13:52:47 +01:00
Martin Storsjö	5144f730a8	[AArch64] Fix windows vararg functions with floats in the fixed args On Windows, float arguments are normally passed in float registers in the calling convention for regular functions. For variable argument functions, floats are passed in integer registers. This has been done correctly for many years. However, the surprising bit was that floats among the fixed arguments are also supposed to be passed in integer registers, contrary to regular functions. (This also seems to be the behaviour on ARM though, both on Windows, but also on e.g. hardfloat linux.) In the calling convention, don't promote shorter floats to f64, but convert them to integers of the same length. (Floats passed as part of the actual variable arguments are promoted to double already on the C/Clang level; the LLVM vararg calling convention doesn't do any extra promotion of f32 to f64 - this matches how it works on X86 too.) Technically, this is an ABI break compared to older LLVM versions, but it fixes compatibility with the official platform ABI. (In practice, floats among the fixed arguments in variable argument functions are a pretty rare construct.) Differential Revision: https://reviews.llvm.org/D100365	2021-04-15 11:02:14 +03:00
David Sherwood	1206313f82	[CodeGen][AArch64] Fix isel crash for truncating FP stores When attempting to truncate a FP vector and store the result out to memory we crashed because we had no pattern for truncating FP stores. In fact, we don't support these types of stores and the correct fix is to stop marking these truncating stores as legal. Tests have been added here: CodeGen/AArch64/sve-fptrunc-store.ll Differential Revision: https://reviews.llvm.org/D100025	2021-04-08 13:21:29 +01:00
Jun Ma	274ac9d40e	[AArch64][SVE] Lowering sve.dot to DOT node Differential Revision: https://reviews.llvm.org/D99699	2021-04-02 20:05:17 +08:00
Bradley Smith	2f45e632c0	[AArch64][SVE] Improve codegen for select nodes with fixed types Additionally, move the existing fixed vselect tests to *-vselect.ll. Differential Revision: https://reviews.llvm.org/D99418	2021-04-01 15:54:37 +01:00
Bradley Smith	0934fa4f5d	[AArch64][SVE] SVE functions should use the SVE calling convention for fast calls When an SVE function calls another SVE function using the C calling convention we use the more efficient SVE VectorCall PCS. However, for the Fast calling convention we're incorrectly falling back to the generic AArch64 PCS. This patch adds the same "can use SVE vector calling convention" detection used by CallingConv::C to CallingConv::Fast. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D99657	2021-04-01 15:52:08 +01:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Florian Hahn	52e015081a	[AArch64] Avoid SCALAR_TO_VECTOR for single FP constant vector. Currently the code only checks for integer constants (ConstantSDNode) and triggers an infinite cycle for single-element floating point vector constants. We need to check for both FP and integer constants. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D99384	2021-03-31 10:17:36 +01:00
Joe Ellis	a7dde4c5f7	[AArch64][SVE] Lower fixed length INSERT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98496	2021-03-30 09:37:11 +00:00
Joe Ellis	c4d39f64d0	[AArch64][SVE] Lower fixed length EXTRACT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98625	2021-03-30 09:35:44 +00:00
Florian Hahn	482283042f	[AArch64] Remove custom zext/sext legalization code. Currently performExtendCombine assumes that the src-element bitwidth * 2 is a valid MVT. But this is not the case for i1 and it causes a crash on the v64i1 test cases added in this patch. It turns out that this code appears to not be needed; the same patterns are handled by other code and we end up with the same results, even without the custom lowering. I also added additional test cases in `a50037aaa6`. Let's just remove the unneeded code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99437	2021-03-29 22:22:05 +01:00
Bradley Smith	9745dce8c3	[SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps This is currently performed in SelectionDAGLegalize, here we make it also happen in LegalizeVectorOps, allowing a target to lower the SETCC condition codes first in LegalizeVectorOps and then lower to a custom node afterwards, without having to duplicate all of the SETCC condition legalization in the target specific lowering. As a result of this, fixed length floating point SETCC nodes can now be properly lowered for SVE. Differential Revision: https://reviews.llvm.org/D98939	2021-03-29 15:32:25 +01:00
Florian Hahn	eb3d9f2eb6	[SelDag] Add isIntOrFPConstant helper function. This patch adds a new isIntOrFPConstant helper function to check if an SDValue is an integer or FP constant. This pattern is used in various places. There also are places that incorrectly just check for integer constants, e.g. D99384, so hopefully this helper will help people avoid that issue. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D99428	2021-03-28 12:48:58 +01:00
Nashe Mncube	ac2a1e9596	[SVE] Suppress vselect warning from incorrect interface call The VSelectCombine handler within AArch64ISelLowering, uses an interface call which only expects fixed vectors. This generates a warning when the call is made on a scalable vector. This warning has been suppressed with this change, by using the ElementCount interface, which supports both fixed and scalable vectors. I have also added a regression test which recreates the warning. Differential Revision: https://reviews.llvm.org/D98249	2021-03-24 14:34:34 +00:00
Benjamin Kramer	39e36fff3d	[AArch64] Fix unused variable warning	2021-03-23 13:42:14 +01:00
David Sherwood	748ae5281d	[IR][SVE] Add new llvm.experimental.stepvector intrinsic This patch adds a new llvm.experimental.stepvector intrinsic, which takes no arguments and returns a linear integer sequence of values of the form <0, 1, ...>. It is primarily intended for scalable vectors, although it will work for fixed width vectors too. It is intended that later patches will make use of this new intrinsic when vectorising induction variables, currently only supported for fixed width. I've added a new CreateStepVector method to the IRBuilder, which will generate a call to this intrinsic for scalable vectors and fall back on creating a ConstantVector for fixed width. For scalable vectors this intrinsic is lowered to a new ISD node called STEP_VECTOR, which takes a single constant integer argument as the step. During lowering this argument is set to a value of 1. The reason for this additional argument at the codegen level is because in future patches we will introduce various generic DAG combines such as mul step_vector(1), 2 -> step_vector(2) add step_vector(1), step_vector(1) -> step_vector(2) shl step_vector(1), 1 -> step_vector(2) etc. that encourage a canonical format for all targets. This hopefully means all other targets supporting scalable vectors can benefit from this too. I've added cost model tests for both fixed width and scalable vectors: llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll as well as codegen lowering tests for fixed width and scalable vectors: llvm/test/CodeGen/AArch64/neon-stepvector.ll llvm/test/CodeGen/AArch64/sve-stepvector.ll See this thread for discussion of the intrinsic: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html	2021-03-23 10:43:35 +00:00
Craig Topper	1066dcb550	[AArch64] Fix LowerMGATHER to return the chain result for floating point gathers. Found by adding asserts to LegalizeDAG to make sure custom legalized results had the right types. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D98968	2021-03-19 11:53:46 -07:00
Peter Waller	0d6482a76a	[llvm][AArch64][SVE] Lower fixed length vector fabs Seemingly straightforward. Differential Revision: https://reviews.llvm.org/D98434	2021-03-18 17:20:08 +00:00
Sjoerd Meijer	90ecb862a0	[AArch64] Rewrite (add, csel) to cinc Don't rewrite an add instruction with 2 SET_CC operands into a csel instruction. The total instruction sequence uses an extra instruction and register. Preventing this allows us to match a `(add, csel)` pattern and rewrite this into a `cinc`. Differential Revision: https://reviews.llvm.org/D98704	2021-03-18 08:49:27 +00:00
Bradley Smith	cf0da91ba5	[AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE Previously NEON used a target specific intrinsic for frintn, given that the FROUNDEVEN ISD node now exists, move over to that instead and add codegen support for that node for both NEON and fixed length SVE. Differential Revision: https://reviews.llvm.org/D98487	2021-03-17 11:41:22 +00:00
Joe Ellis	ff2dd8a212	[AArch64][SVE] Fold vector ZExt/SExt into gather loads where possible This commit folds sxtw'd or uxtw'd offsets into gather loads where possible with a DAGCombine optimization. As an example, the following code: #include <arm_sve.h> svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) { return svld1sw_gather_s64offset_u64( pred, base, svextw_s64_x(pred, offsets) ); } would previously lower to the following assembly: sxtw z0.d, p0/m, z0.d ld1sw { z0.d }, p0/z, [x0, z0.d] ret but now lowers to: ld1sw { z0.d }, p0/z, [x0, z0.d, sxtw] ret Differential Revision: https://reviews.llvm.org/D97858	2021-03-16 15:09:46 +00:00
Stelios Ioannou	ab86edbc88	[AArch64] Implement __rndr, __rndrrs intrinsics This patch implements the __rndr and __rndrrs intrinsics to provide access to the random number instructions introduced in Armv8.5-A. They are only defined for the AArch64 execution state and are available when __ARM_FEATURE_RNG is defined. These intrinsics store the random number in their pointer argument and return a status code indicating whether the generation succeeded. The difference between __rndr and __rndrrs is that the latter intrinsic reseeds the random number generator. The instructions write the NZCV flags indicating the success of the operation that we can then read with a CSET. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics [2] https://bugs.llvm.org/show_bug.cgi?id=47838 Differential Revision: https://reviews.llvm.org/D98264 Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc	2021-03-15 17:51:48 +00:00
Bradley Smith	860ae9d50c	[AArch64][SVE] Add fixed/scalable lowering of FMAXIMUM/FMINIMUM ISD nodes Differential Revision: https://reviews.llvm.org/D98348	2021-03-11 13:37:47 +00:00
David Green	1a808286ef	[AArch64] Extend vecreduce -> udot handling to mla reductions We already have lowering for: vecreduce.add(zext(X)) to vecreduce.add(UDOT(zero, X, one)) This extends that to also handle: vecreduce.add(mul(zext(X), zext(Y))) to vecreduce.add(UDOT(zero, X, Y)) It extends the existing code to optionally handle a mul with equal extends. Differential Revision: https://reviews.llvm.org/D97280	2021-03-10 22:25:12 +00:00
David Green	a02f506876	[AArch64] Extend vecreduce -> udot handling to v8i8 https://reviews.llvm.org/D88577 added v16i8 vecreduce to udot/sdot lowering. This extends that to v8i8 too, generalizing the pattern to handle the extra types. Differential Revision: https://reviews.llvm.org/D97279	2021-03-10 21:03:15 +00:00
Cullen Rhodes	2750f3ed31	[IR] Introduce llvm.experimental.vector.splice intrinsic This patch introduces a new intrinsic @llvm.experimental.vector.splice that constructs a vector of the same type as the two input vectors, based on an immediate where the sign of the immediate distinguishes two variants. A positive immediate specifies an index into the first vector and a negative immediate specifies the number of trailing elements to extract from the first vector. For example: @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> ; index @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count These intrinsics support both fixed and scalable vectors, where the former is lowered to a shufflevector to maintain existing behaviour, although while marked as experimental the recommended way to express this operation for fixed-width vectors is to use shufflevector. For scalable vectors where it is not possible to express a shufflevector mask for this operation, a new ISD node has been implemented. This is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker and Cullen Rhodes. [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D94708	2021-03-09 10:44:22 +00:00
Akira Hatanaka	dca5737945	Move ObjCARCUtil.h back to llvm/Analysis Instead of adding the header to llvm/IR, just duplicate the marker string in the auto upgrader.	2021-03-08 16:35:24 -08:00
LemonBoy	8725b24c6d	[AArch64] Legalize horizontal fmax/fmin reductions on f16 vectors Expand the horizontal reduction during the instruction selection phase, but only if the target doesn't support the full fp16 instruction set. Fixes https://bugs.llvm.org/show_bug.cgi?id=49401 Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D97840	2021-03-05 16:09:37 +01:00
David Blaikie	a2a55def35	Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering. This is included from IR files, and IR doesn't/can't depend on Analysis (because Analysis depends on IR). Also fix the implementation - don't use non-member static in headers, as it leads to ODR violations, inaccurate "unused function" warnings, etc. And fix the header protection macro name (we don't generally include "LIB" in the names, so far as I can tell).	2021-03-04 16:14:53 -08:00
Akira Hatanaka	1900503595	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `ed4718eccb`, which was reverted because it was causing a miscompile. The bug that was causing the miscompile has been fixed in `75805dce5f`. Original commit message: Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-03-04 11:22:30 -08:00
Hans Wennborg	0a5dd06718	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR" This caused miscompiles of Chromium tests for iOS due to clobbering of live registers. See discussion on the code review for details. > Background: > > This fixes a longstanding problem where llvm breaks ARC's autorelease > optimization (see the link below) by separating calls from the marker > instructions or retainRV/claimRV calls. The backend changes are in > https://reviews.llvm.org/D92569. > > https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue > > What this patch does to fix the problem: > > - The front-end adds operand bundle "clang.arc.attachedcall" to calls, > which indicates the call is implicitly followed by a marker > instruction and an implicit retainRV/claimRV call that consumes the > call result. In addition, it emits a call to > @llvm.objc.clang.arc.noop.use, which consumes the call result, to > prevent the middle-end passes from changing the return type of the > called function. This is currently done only when the target is arm64 > and the optimization level is higher than -O0. > > - ARC optimizer temporarily emits retainRV/claimRV calls after the calls > with the operand bundle in the IR and removes the inserted calls after > processing the function. > > - ARC contract pass emits retainRV/claimRV calls after the call with the > operand bundle. It doesn't remove the operand bundle on the call since > the backend needs it to emit the marker instruction. The retainRV and > claimRV calls are emitted late in the pipeline to prevent optimization > passes from transforming the IR in a way that makes it harder for the > ARC middle-end passes to figure out the def-use relationship between > the call and the retainRV/claimRV calls (which is the cause of > PR31925). 
> > - The function inliner removes an autoreleaseRV call in the callee if > nothing in the callee prevents it from being paired up with the > retainRV/claimRV call in the caller. It then inserts a release call if > claimRV is attached to the call since autoreleaseRV+claimRV is > equivalent to a release. If it cannot find an autoreleaseRV call, it > tries to transfer the operand bundle to a function call in the callee. > This is important since the ARC optimizer can remove the autoreleaseRV > returning the callee result, which makes it impossible to pair it up > with the retainRV/claimRV call in the caller. If that fails, it simply > emits a retain call in the IR if retainRV is attached to the call and > does nothing if claimRV is attached to it. > > - SCCP refrains from replacing the return value of a call with a > constant value if the call has the operand bundle. This ensures the > call always has at least one user (the call to > @llvm.objc.clang.arc.noop.use). > > - This patch also fixes a bug in replaceUsesOfNonProtoConstant where > multiple operand bundles of the same kind were being added to a call. > > Future work: > > - Use the operand bundle on x86-64. > > - Fix the auto upgrader to convert call+retainRV/claimRV pairs into > calls with the operand bundles. > > rdar://71443534 > > Differential Revision: https://reviews.llvm.org/D92808 This reverts commit `ed4718eccb`.	2021-03-03 15:51:40 +01:00
David Green	7abf7dd5ef	[AArch64] Add combine for add(udot(0, x, y), z) -> udot(z, x, y). Given a zero input for a udot, an add can be folded in to take the place of the input, using the addition that the instruction naturally performs. Differential Revision: https://reviews.llvm.org/D97188	2021-03-01 12:53:34 +00:00
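Why the fold in `7abf7dd5ef` is sound can be checked with a small model. The sketch below is an assumption-laden illustration, not the backend code: `udot` here is a hypothetical Python model in which each 32-bit lane accumulates the dot product of four unsigned 8-bit pairs, with wrap-around at 32 bits.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit wrap-around arithmetic


def udot(acc, x, y):
    """Hypothetical model: each 32-bit lane adds four u8*u8 products to acc."""
    assert len(x) == len(y) == 4 * len(acc)
    out = []
    for lane, a in enumerate(acc):
        xs = x[4 * lane:4 * lane + 4]
        ys = y[4 * lane:4 * lane + 4]
        out.append((a + sum(p * q for p, q in zip(xs, ys))) & MASK32)
    return out


def vec_add(a, b):
    """Lane-wise 32-bit addition."""
    return [(p + q) & MASK32 for p, q in zip(a, b)]


x = [1, 2, 3, 4, 250, 251, 252, 253]
y = [5, 6, 7, 8, 255, 254, 253, 252]
z = [100, 0xFFFFFFF0]

before = vec_add(udot([0, 0], x, y), z)  # add(udot(0, x, y), z)
after = udot(z, x, y)                    # udot(z, x, y)
```

Because the accumulation is plain modular addition, `before` and `after` agree lane by lane, including the lane where the sum wraps.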
Fraser Cormack	6718fda6ad	[CodeGen] Fix issues with subvector intrinsic index types This patch addresses issues arising from the fact that the index type used for subvector insertion/extraction is inconsistent between the intrinsics and SDNodes. The intrinsic forms require i64 whereas the SDNodes use the type returned by SelectionDAG::getVectorIdxTy. Rather than update the intrinsic definitions to use an overloaded index type, this patch fixes the issue by transforming the index to the correct type as required. Any loss of index bits going from i64 to a smaller type is unexpected, and will be caught by an assertion in SelectionDAG::getVectorIdxConstant. The patch also updates the documentation for INSERT_SUBVECTOR and adds an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR. This necessitated changes to AArch64 which was using i64 for EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed its codegen after updating the backend accordingly. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D97459	2021-03-01 10:28:21 +00:00
David Green	f51b3de4e8	[AArch64] Introduce UDOT/SDOT DAG nodes This is used to lower UDOT/SDOT instructions, as opposed to relying on the intrinsic. Subsequent optimizations will be able to optimize them more cleanly based on these nodes.	2021-02-23 20:31:01 +00:00
Serge Pavlov	2c4f60e45b	[FPEnv][AArch64] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96836	2021-02-19 13:16:51 +07:00
Bradley Smith	8bad8a43c3	[AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD Adjust generateFMAsInMachineCombiner to return false if SVE is present in order to combine fmul+fadd into fma. Also add new pseudo instructions so as to select the most appropriate of FMLA/FMAD depending on register allocation. Depends on D96599 Differential Revision: https://reviews.llvm.org/D96424	2021-02-18 16:55:16 +00:00
Bradley Smith	5b094bfeb3	[AArch64] Allow folding FMUL/FADD into FMA for FP16 types isFMAFasterThanFMulAndFAdd should return true for FP16 types when HasFullFP16 is present, since we have the instructions to handle it for both SVE and NEON. (SVE patterns and tests will follow). Differential Revision: https://reviews.llvm.org/D96599	2021-02-18 16:51:22 +00:00
Fraser Cormack	0176fecfbc	[SVE][CodeGen] Expand SVE MULH[SU] and [SU]MUL_LOHI nodes This patch fixes a codegen crash introduced in `fde2466171`, where the DAGCombiner started generating optimized MULH[SU] or [SU]MUL_LOHI nodes unless the target opted out. The AArch64 backend cannot currently select any of these nodes, so ensure that they are not generated in the first place. This issue was raised by @huihuiz in D94501. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D96849	2021-02-18 10:06:24 +00:00
Florian Hahn	211147c5ba	[AArch64] Convert CMP/SELECT sign patterns to OR & ASR. ICMP & SELECT patterns extracting the sign of a value can be simplified to OR & ASR (see https://alive2.llvm.org/ce/z/Xx4iZ0). This does not save any instructions in IR, but it is profitable on AArch64, because we need at least 2 extra instructions to materialize 1 and -1 for the SELECT. The improvements result in ~5% speedups on loops of the form static int sign_of(int x) { if (x < 0) return -1; return 1; } void foo(const int *x, int *res, int cnt) { for (int i=0;i<cnt;i++) res[i] = sign_of(x[i]); } Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D96596	2021-02-16 17:17:34 +00:00
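The equivalence behind `211147c5ba` can be illustrated outside the compiler. The following is a minimal Python sketch, assuming 32-bit signed inputs; the function names are hypothetical, and it relies on Python's `>>` being an arithmetic shift for negative integers.

```python
def sign_of_select(x):
    """The CMP/SELECT form: -1 for negative inputs, 1 otherwise."""
    return -1 if x < 0 else 1


def sign_of_or_asr(x):
    """The OR & ASR form for a value in the signed 32-bit range.

    x >> 31 is -1 for negative x and 0 otherwise; OR-ing in 1 turns
    0 into 1 and leaves -1 unchanged.
    """
    return (x >> 31) | 1


samples = [-(2 ** 31), -12345, -1, 0, 1, 12345, 2 ** 31 - 1]
select_results = [sign_of_select(v) for v in samples]
or_asr_results = [sign_of_or_asr(v) for v in samples]
```

The two result lists match for every sample, including zero and both extremes of the 32-bit range.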
Caroline Concatto	2d728bbff5	[CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse This patch adds a new intrinsic experimental.vector.reverse that takes a single vector and returns a vector of matching type but with the original lane order reversed. For example: ``` vector.reverse(<A,B,C,D>) ==> <D,C,B,A> ``` The new intrinsic supports fixed and scalable vector types. The fixed-width vector relies on shufflevector to maintain existing behaviour. Scalable vector uses the new ISD node - VECTOR_REVERSE. This new intrinsic is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker (@paulwalker-arm). [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Differential Revision: https://reviews.llvm.org/D94883	2021-02-15 13:39:43 +00:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when building trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Akira Hatanaka	4a64d8fe39	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `3fe3946d9a` without the changes made to lib/IR/AutoUpgrade.cpp, which was violating layering. Original commit message: Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 06:09:42 -08:00
Akira Hatanaka	2fbbb18c1d	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `3fe3946d9a`. The commit violates layering by including a header from Analysis in lib/IR/AutoUpgrade.cpp.	2021-02-05 06:00:05 -08:00
Akira Hatanaka	3fe3946d9a	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. 
It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 05:55:18 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Kerry McLaughlin	9b4fcfaa9e	[SVE][CodeGen] Remove performMaskedGatherScatterCombine The AArch64 DAG combine added by D90945 & D91433 extends the index of a scalable masked gather or scatter to i32 if necessary. This patch removes the combine and instead adds shouldExtendGSIndex, which is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether the index should be extended before calling getMaskedGather/Scatter. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D94525	2021-02-01 14:10:00 +00:00
Kazu Hirata	7925aa091d	[llvm] Populate SmallVector at construction time (NFC)	2021-01-28 22:21:14 -08:00
Richard Smith	925ae8c790	Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV" This reverts commit `53176c1680`, which introduced a layering violation. LLVM's IR library can't include headers from Analysis.	2021-01-25 13:53:38 -08:00
Akira Hatanaka	53176c1680	[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim". - This patch teaches dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass can incorrectly determine nothing in the IR uses the function return, which can happen since the front-end no longer explicitly emits retainRV/claimRV calls in the IR, and change its return type to 'void'. Future work: - Use the attribute on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-01-25 11:57:08 -08:00
QingShan Zhang	ffc3e800c6	[NFC] [DAGCombine] Correct the result for sqrt even if the iteration count is zero For now, we correct the result for sqrt if iteration > 0. This doesn't make sense, as the two are not strictly related. Reviewed By: dmgreen, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D94480	2021-01-25 04:02:44 +00:00
Kazu Hirata	cfa241680f	[llvm] Don't include StringSwitch.h where unnecessary (NFC)	2021-01-21 19:59:48 -08:00
Simon Pilgrim	69bc0990a9	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED). Add DemandedElts support inside the TRUNCATE analysis. REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d Differential Revision: https://reviews.llvm.org/D56387	2021-01-21 13:01:34 +00:00
Hans Wennborg	a51226057f	Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE" It caused "Vector shift amounts must be in the same as their first arg" asserts in Chromium builds. See the code review for repro instructions. > Add DemandedElts support inside the TRUNCATE analysis. > > Differential Revision: https://reviews.llvm.org/D56387 This reverts commit `cad4275d69`.	2021-01-20 20:06:55 +01:00
Simon Pilgrim	cad4275d69	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE Add DemandedElts support inside the TRUNCATE analysis. Differential Revision: https://reviews.llvm.org/D56387	2021-01-20 15:39:58 +00:00
Amanieu d'Antras	21bfd068b3	[AArch64] Add support for the GNU ILP32 ABI Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64. The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are: - Wiring up the new target to enable ILP32 codegen and MC. - ILP32 va_list support. - ILP32 TLSDESC relocation support. There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain. This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain. Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D94143	2021-01-20 13:34:47 +00:00
Nicholas Guy	16bf02c3a1	Reland "[AArch64] Attempt to sink mul operands" This relands `dda60035e9`, which was reverted by `dbaa6a1858`	2021-01-18 16:00:22 +00:00
Nicholas Guy	f5fcbe4e3c	[AArch64] Further restricts when a dup(ext) can be rearranged In most cases, the dup(ext) pattern can be rearranged to perform the extension on the vector side, allowing for further vector-specific optimisations to be made. However the initial checks for this conversion were insufficient, allowing invalid encodings to be attempted (causing compilation to fail). Differential Revision: https://reviews.llvm.org/D94778	2021-01-18 16:00:21 +00:00
Stephan Herhut	061d152085	[SVE] Fix unused variable. Introduced by [SVE] Restrict the usage of REINTERPRET_CAST. Differential Revision: https://reviews.llvm.org/D94773	2021-01-15 15:10:33 +01:00
Paul Walker	2b8db40c92	[SVE] Restrict the usage of REINTERPRET_CAST. In order to limit the number of combinations of REINTERPRET_CAST, whilst at the same time prevent overlap with BITCAST, this patch establishes the following rules: 1. The operand and result element types must be the same. 2. The operand and/or result type must be an unpacked type. Differential Revision: https://reviews.llvm.org/D94593	2021-01-15 11:32:13 +00:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Martin Storsjö	dbaa6a1858	Revert "[AArch64] Attempt to sink mul operands" This reverts commit `dda60035e9`. This commit caused failures to compile some sources, erroring out with "error in backend: Cannot select: t85: v2i32 = AArch64ISD::DUP t15", see https://reviews.llvm.org/D91271 for the full reproduction case.	2021-01-14 17:28:18 +02:00
Nicholas Guy	dda60035e9	[AArch64] Attempt to sink mul operands Following on from D91255, this patch is responsible for sinking relevant mul operands to the same block so that umull/smull instructions can be correctly generated by the mul combine implemented in the aforementioned patch. Differential revision: https://reviews.llvm.org/D91271	2021-01-13 15:23:36 +00:00
Hsiangkai Wang	914e2f5a02	[NFC] Use generic name for scalable vector stack ID. Differential Revision: https://reviews.llvm.org/D94471	2021-01-13 10:57:43 +08:00
Kerry McLaughlin	c37f68a888	[SVE][CodeGen] Fix legalisation of floating-point masked gathers Changes in this patch: - When lowering floating-point masked gathers, cast the result of the gather back to the original type with reinterpret_cast before returning. - Added patterns for reinterpret_casts from integer to floating point, and concat_vector patterns for bfloat16. - Tests for various legalisation scenarios with floating point types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D94171	2021-01-11 10:57:46 +00:00
Mark Murray	af7cce2fa4	[AArch64] Add +pauth architecture option, allowing the v8.3a pointer authentication extension. Differential Revision: https://reviews.llvm.org/D94083	2021-01-08 13:21:11 +00:00
Nicholas Guy	ed23229a64	[AArch64] Fix crash caused by invalid vector element type Fixes a crash caused by D91255, when LLVMTy is null when calling changeExtendedVectorElementType. Differential Revision: https://reviews.llvm.org/D94234	2021-01-08 12:02:54 +00:00
David Sherwood	d1bf26fd94	[AArch64][SVE] Add lowering for llvm abs intrinsic Add functionality to permit lowering of the abs and neg intrinsics using the passthru variants. Differential Revision: https://reviews.llvm.org/D94160	2021-01-08 08:55:25 +00:00
Kazu Hirata	b934160aaa	[Target] Use llvm::find_if (NFC)	2021-01-07 20:29:36 -08:00
Nicholas Guy	350247a93c	[AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup)) Performing this rearrangement allows for existing patterns to match cases where the vector may be built after an extend, instead of before. Differential Revision: https://reviews.llvm.org/D91255	2021-01-06 16:02:16 +00:00
David Green	78d8a821e2	[AArch64] Handle any extend whilst lowering mull Demanded bits may turn a sext or zext into an anyext if the top bits are not needed. This currently prevents the lowering to instructions like mull, addl and addw. This patch fixes the mull generation by keeping it simple and treating them like zextends. Differential Revision: https://reviews.llvm.org/D93832	2021-01-06 10:08:43 +00:00
Sander de Smalen	a9f5e4375b	[AArch64] Use faddp to implement fadd reductions. Custom-expand legal VECREDUCE_FADD SDNodes to benefit from pair-wise faddp instructions. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D59259	2021-01-06 09:36:51 +00:00
Paul Walker	eba6deab22	[SVE] Lower vector CTLZ, CTPOP and CTTZ operations. CTLZ and CTPOP are lowered to CLZ and CNT instructions respectively. CTTZ is not a native SVE operation but is instead lowered to: CTTZ(V) => CTLZ(BITREVERSE(V)) In the case of fixed-length support using SVE we also lower CTTZ operating on NEON sized vectors because of its reliance on BITREVERSE which is also lowered to SVE instructions at these lengths. Differential Revision: https://reviews.llvm.org/D93607	2021-01-05 10:42:35 +00:00
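The identity `CTTZ(V) => CTLZ(BITREVERSE(V))` used in `eba6deab22` can be checked exhaustively for a small element width. This is a scalar Python sketch with hypothetical helper names, assuming that both counts return the element width for a zero input; it says nothing about how the SVE instructions themselves are selected.

```python
def bitreverse(v, width):
    """Reverse the low `width` bits of v."""
    out = 0
    for i in range(width):
        if v & (1 << i):
            out |= 1 << (width - 1 - i)
    return out


def ctlz(v, width):
    """Count leading zeros of a `width`-bit value (width for v == 0)."""
    return width - v.bit_length()


def cttz_reference(v, width):
    """Count trailing zeros directly (width for v == 0)."""
    if v == 0:
        return width
    return (v & -v).bit_length() - 1


def cttz_via_ctlz(v, width):
    """The lowering from the commit message: CTLZ(BITREVERSE(V))."""
    return ctlz(bitreverse(v, width), width)


WIDTH = 8
mismatches = [v for v in range(1 << WIDTH)
              if cttz_via_ctlz(v, WIDTH) != cttz_reference(v, WIDTH)]
```

An empty `mismatches` list means the two definitions agree for every 8-bit value; the same argument carries over to wider elements because bit reversal maps trailing zeros onto leading zeros.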
Nikita Popov	fb77d95022	[AArch64] Fix legalization of i128 ctpop without neon If neon is disabled, LowerCTPOP will return SDValue() to indicate that normal legalization should be used. However, ReplaceNodeResults does not check for this and pushes the empty SDValue() onto the result vector, which will subsequently result in a crash. Differential Revision: https://reviews.llvm.org/D93825	2020-12-27 17:24:41 +01:00
Paul Walker	8eec7294fe	[SVE] Lower vector BITREVERSE and BSWAP operations. These operations are lowered to RBIT and REVB instructions respectively. In the case of fixed-length support using SVE we also lower BITREVERSE operating on NEON sized vectors as this results in fewer instructions. Differential Revision: https://reviews.llvm.org/D93606	2020-12-22 16:49:50 +00:00
Kazu Hirata	966f1431de	[Target] Use llvm::erase_if (NFC)	2020-12-20 17:43:22 -08:00
Kerry McLaughlin	52e4084d9c	[SVE][CodeGen] Vector + immediate addressing mode for masked gather/scatter This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate addressing modes for scalable masked gathers & scatters. selectGatherScatterAddrMode checks if the base pointer is null, in which case we can swap the base pointer and the index, e.g. getelementptr nullptr, <vscale x N x T> (splat(%offset)) + %indices) -> getelementptr %offset, <vscale x N x T> %indices Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D93132	2020-12-18 11:56:36 +00:00
Kerry McLaughlin	6d2a78996b	[SVE][CodeGen] Add bfloat16 support to scalable masked gather Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D93307	2020-12-17 11:08:15 +00:00
QingShan Zhang	ebdd20f430	Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128 X86 and AArch64 expand it as libcall inside the target. And PowerPC also want to expand them as libcall for P8. So, propose an implement in the legalizer to common the logic and remove the code for X86/AArch64 to avoid the duplicate code. Reviewed By: Craig Topper Differential Revision: https://reviews.llvm.org/D91331	2020-12-17 07:59:30 +00:00
Paul Walker	b74c4dbb96	[SVE] Move INT_TO_FP i1 promotion into custom lowering. AddPromotedToType is being used to legalise INT_TO_FP operations when the source is a predicate. The point where this introduces vector extends might cause problems in the future so this patch falls back to manual promotion within custom lowering. Differential Revision: https://reviews.llvm.org/D90093	2020-12-15 11:57:07 +00:00
Kerry McLaughlin	c5ced82c8e	[SVE][CodeGen] Lower scalable floating-point vector reductions Changes in this patch: - Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally to also work for scalable types - Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32) - Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D93050	2020-12-14 11:45:42 +00:00
Florian Hahn	46bc40e502	Recommit "[AArch64] Lower calls with rv_marker attribute." This recommits `a87fccb3ff` with a fix to mark the destination operand of the marker instruction as def, to fix a machine verifier failure. This reverts the revert commit `c0f2cea7c0`.	2020-12-13 16:20:39 +00:00
Florian Hahn	c0f2cea7c0	Revert "[AArch64] Lower calls with rv_marker attribute ." This reverts commit `a87fccb3ff`. A test appears to fail with expensive checks. Reverting while I investigate.	2020-12-11 20:12:59 +00:00
Florian Hahn	a87fccb3ff	[AArch64] Lower calls with rv_marker attribute . This patch adds support for lowering function calls with the rv_marker attribute. The goal is to expand such calls to the following sequence of instructions: BL @fn mov x29, x29 This sequence of instructions triggers Objective-C runtime optimizations, hence we want to ensure no instructions get moved in between them. This patch achieves that by adding a new CALL_RVMARKER ISD node, which gets turned into the BLR_RVMARKER pseudo, which eventually gets expanded into the sequence mentioned above. The sequence is then marked as instruction bundle, to avoid anything being moved in between. @ahatanak is working on using this attribute in the front- & middle-end. Together with the front- & middle-end changes, this should address PR31925 for AArch64. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D92569	2020-12-11 19:45:44 +00:00
Kerry McLaughlin	abe7775f5a	[SVE][CodeGen] Extend index of masked gathers This patch changes performMSCATTERCombine to also promote the indices of masked gathers where the element type is i8 or i16, and adds various tests for gathers with illegal types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91433	2020-12-10 13:54:45 +00:00
Kerry McLaughlin	05edfc5475	[SVE][CodeGen] Add DAG combines for s/zext_masked_gather This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true: - fold (and (masked_gather x)) -> (zext_masked_gather x) - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x) LowerMGATHER has also been updated to fetch the LoadExtType associated with the gather and also use this value to determine the correct masked gather opcode to use. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92230	2020-12-09 11:53:19 +00:00
Huihui Zhang	8e6fc1f97e	[AArch64][SVE] Add lowering for llvm.maxnum\|minnum for scalable type. LLVM intrinsic llvm.maxnum\|minnum is overloaded intrinsic, can be used on any floating-point or vector of floating-point type. This patch extends current infrastructure to support scalable vector type. This patch also fix a warning message of incorrect use of EVT::getVectorNumElements() for scalable type, when DAGCombiner trying to split scalable vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92607	2020-12-08 09:35:53 -08:00
David Sherwood	59f17b57d9	[SVE] Fix crashes with inline assembly All the crashes found compiling inline assembly are fixed in this patch by changing AArch64TargetLowering::getRegForInlineAsmConstraint to be more resilient to mismatched value and register types. For example, it makes no sense to request a predicate register for a nxv2i64 type and so on. Tests have been added here: test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll Differential Revision: https://reviews.llvm.org/D92554	2020-12-08 13:48:43 +00:00
Tim Northover	c5978f42ec	UBSAN: emit distinctive traps Sometimes people get minimal crash reports after a UBSAN incident. This change tags each trap with an integer representing the kind of failure encountered, which can aid in tracking down the root cause of the problem.	2020-12-08 10:28:26 +00:00
Kerry McLaughlin	111f559bbd	[SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER The refineIndexType & refineUniformBase functions added by D90942 can also be used to improve CodeGen of masked gathers. These changes were split out from D91092 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92319	2020-12-07 13:20:19 +00:00
Kerry McLaughlin	f6dd32fd35	[SVE][CodeGen] Lower scalable masked gathers Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only) Changes in this patch: - Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate gather load opcode to use. - Improve codegen with refineIndexType/refineUniformBase, added in D90942 - Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91092	2020-12-07 12:20:41 +00:00
Craig Topper	c55d9af8c0	[AArch64] Add custom lowering for ISD::ABS Instead of trying to pattern match the code produced by ISD::ABS expansion, just custom legalize ISD::ABS to the desired sequence. The one test change is because a DAG combine for (neg (abs)) is no longer firing because ISD::ABS is now Custom instead of Expand. Differential Revision: https://reviews.llvm.org/D92154	2020-12-04 10:45:31 -08:00
Kerry McLaughlin	603d40da9d	[SVE][CodeGen] Add a DAG combine to extend mscatter indices This patch adds a target-specific DAG combine for mscatter to promote indices with element types i8 or i16 before legalisation, plus various tests with illegal types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90945	2020-11-25 11:18:22 +00:00
Pavel Iliin	4d7df43ffd	[AArch64] Out-of-line atomics (-moutline-atomics) implementation. This patch implements out of line atomics for LSE deployment mechanism. Details how it works can be found in llvm/docs/Atomics.rst Options -moutline-atomics and -mno-outline-atomics to enable and disable it were added to clang driver. This is clang and llvm part of out-of-line atomics interface, library part is already supported by libgcc. Compiler-rt support is provided in separate patch. Differential Revision: https://reviews.llvm.org/D91157	2020-11-20 13:30:12 +00:00
Adhemerval Zanella	807320119f	[AArch64] Lower fptrunc/fpext from/to FP128t to/from FP16 The compiler-rt part which adds the emitted symbols is handled in a subsequent patch. Differential Revision: https://reviews.llvm.org/D91731	2020-11-19 15:14:50 -03:00
Kerry McLaughlin	306c8ab208	[SVE][CodeGen] Improve codegen of scalable masked scatters If the scatter store is able to perform the sign/zero extend of its index, this is folded into the instruction with refineIndexType(). Additionally, refineUniformBase() will return the base pointer and index from an add + splat_vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90942	2020-11-13 11:19:36 +00:00
David Sherwood	3225fcf11e	[SVE] Deal with SVE tuple call arguments correctly when running out of registers When passing SVE types as arguments to function calls we can run out of hardware SVE registers. This is normally fine, since we switch to an indirect mode where we pass a pointer to a SVE stack object in a GPR. However, if we switch over part-way through processing a SVE tuple then part of it will be in registers and the other part will be on the stack. I've fixed this by ensuring that: 1. When we don't have enough registers to allocate the whole block we mark any remaining SVE registers temporarily as allocated. 2. We temporarily remove the InConsecutiveRegs flags from the last tuple part argument and reinvoke the autogenerated calling convention handler. Doing this prevents the code from entering an infinite recursion and, in combination with 1), ensures we switch over to the Indirect mode. 3. After allocating a GPR register for the pointer to the tuple we then deallocate any SVE registers we marked as allocated in 1). We also set the InConsecutiveRegs flags back how they were before. 4. I've changed the AArch64ISelLowering LowerCALL and LowerFormalArguments functions to detect the start of a tuple, which involves allocating a single stack object and doing the correct numbers of legal loads and stores. Differential Revision: https://reviews.llvm.org/D90219	2020-11-12 08:41:50 +00:00
Caroline Concatto	37f4ccb275	[AArch64]Add memory op cost model for SVE This patch adds/fixes memory op cost model for SVE with fixed-width vector. Differential Revision: https://reviews.llvm.org/D90950	2020-11-11 12:49:19 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Kerry McLaughlin	170947a5de	[SVE][CodeGen] Lower scalable masked scatters Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only) Changes included in this patch: - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use. Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts. - Added the getCanonicalIndexType function to convert redundant addressing modes (e.g. scaling is redundant when accessing bytes) - Tests with 32 & 64-bit scaled & unscaled offsets Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90941	2020-11-11 11:50:22 +00:00
Francesco Petrogalli	9f61931e07	[llvm][AArch64] Allow TB(N)Z to drop signext for sign bit tests. For example if the sign extension is only used in for TBZ, and the value is used elsewhere with a zero extension, this can eliminate a sign extension. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D90606	2020-11-09 18:27:48 +00:00
Christopher Tetreault	900ec97bbe	[UBSan] Cannot negate smallest negative signed integer Silence Undefined Behavior Sanitizer warning: runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D90710	2020-11-04 10:07:52 -08:00
Kerry McLaughlin	f2412d372d	[SVE][CodeGen] Lower scalable integer vector reductions This patch uses the existing LowerFixedLengthReductionToSVE function to also lower scalable vector reductions. A separate function has been added to lower VECREDUCE_AND & VECREDUCE_OR operations with predicate types using ptest. Lowering scalable floating-point reductions will be addressed in a follow up patch, for now these will hit the assertion added to expandVecReduce() in TargetLowering. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D89382	2020-11-04 11:38:49 +00:00
David Sherwood	cea69fa4dc	[SVE] Add fatal error for unnamed SVE variadic arguments We don't currently support passing unnamed variadic SVE arguments so I've added a fatal error if we hit such cases to prevent any silent ABI issues in future. Differential Revision: https://reviews.llvm.org/D90230	2020-10-30 13:35:47 +00:00
Florian Hahn	ba78cae20f	[AArch64] Use DUP for BUILD_VECTOR with few different elements. If most elements of BUILD_VECTOR are the same, with a few different elements, it is better to use DUP for the common elements and INSERT_VECTOR_ELT for the different elements. Currently this transform is guarded quite restrictively to only trigger in clearly beneficial cases. With D90176, the lowering for patterns originating from code like ` float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even better (unnecessary fmov is removed). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D90233	2020-10-28 19:48:20 +00:00
David Green	066737fdbc	[AArch64] Remove AArch64ISD::NOT, use vnot instead vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT node, but allow more folding thanks to all the target independent optimizations. Specifically this allows select(icmp ne, x, y) to become "cmeq; bsl y, x" as opposed to needing to convert the predicate with "cmeq; mvn; bsl x, y" Unfortunately there is a regression in a cmtst test, but the code it selected from was already non-canonical, with instcombine preferring to use an eq predicate instead. Plus the more common case of icmp ne is improved. Differential Revision: https://reviews.llvm.org/D90126	2020-10-28 08:15:37 +00:00
Cameron McInally	a1cc274cb3	[SVE] Lower fixed length VECREDUCE_SEQ_FADD operation Differential Revision: https://reviews.llvm.org/D89162	2020-10-23 16:24:02 -05:00
David Sherwood	d67d8f8790	[SVE][AArch64] Replace TypeSize comparisons with their integer equivalents In many places in the AArch64 backend we are comparing TypeSize objects, but in fact we are only ever expecting fixed width types. I've changed all such comparisons to use their integer equivalents by replacing calls to getSizeInBits() with getFixedSizeInBits(), etc. Differential Revision: https://reviews.llvm.org/D89116	2020-10-19 07:41:33 +01:00
Vinay Madhusudan	159b2d8e62	[AArch64] Combine UADDVs to generate vector add ADD(UADDV a, UADDV b) --> UADDV(ADD a, b) This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88731	2020-10-15 09:56:31 +05:30
Cameron McInally	421f1b7294	[SVE] Lower fixed length VECREDUCE_FADD operation Differential Revision: https://reviews.llvm.org/D89263	2020-10-14 09:41:11 -05:00
David Sherwood	af57a0838e	[SVE] Add fatal error when running out of registers for SVE tuple call arguments When passing SVE types as arguments to function calls we can run out of hardware SVE registers. This is normally fine, since we switch to an indirect mode where we pass a pointer to a SVE stack object in a GPR. However, if we switch over part-way through processing a SVE tuple then part of it will be in registers and the other part will be on the stack. This is wrong and we'd like to avoid any silent ABI compatibility issues in future. For now, I've added a fatal error when this happens until we can get a proper fix. Differential Revision: https://reviews.llvm.org/D89326	2020-10-14 09:31:41 +01:00
Vinay Madhusudan	37dce7475b	[AArch64] Identify SAD pattern (ABS (SUB (EXTEND a), (EXTEND b))) to ZERO_EXTEND((UABD a, b)) (ABS (SUB (SIGN_EXTEND a), (SIGN_EXTEND b))) to ZERO_EXTEND((SABD a, b)) This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88742	2020-10-13 15:50:54 +05:30
Cameron McInally	974ddb54c9	[SVE] Lower fixed length VECREDUCE_XOR operation Differential Revision: https://reviews.llvm.org/D88974	2020-10-12 10:12:15 -05:00
Cameron McInally	333b2ab60b	[SVE] Lower fixed length VECREDUCE_OR operation Differential Revision: https://reviews.llvm.org/D88847	2020-10-07 09:56:25 -05:00
Paul Walker	27f3d51b4e	[SVE] Lower fixed length vector fneg and fsqrt operations. Also updates sve-fp.ll to use fneg directly. Differential Revision: https://reviews.llvm.org/D88683	2020-10-06 10:48:16 +01:00
Paul Walker	8bb702a8ad	[SVE] Lower fixed length vector floating point rounding operations. Adds lowering for: llvm.ceil llvm.floor llvm.nearbyint llvm.rint llvm.round llvm.trunc Differential Revision: https://reviews.llvm.org/D88671	2020-10-06 10:48:16 +01:00
Cameron McInally	9642ded8ba	[SVE] Lower fixed length VECREDUCE_AND operation Differential Revision: https://reviews.llvm.org/D88707	2020-10-05 11:28:38 -05:00
Vinay Madhusudan	f192594956	[AArch64] Generate dot for v16i8 sum reduction to i32 Convert VECREDUCE_ADD( EXTEND(v16i8_type) ) to VECREDUCE_ADD( DOTv16i8(v16i8_type) ) whenever the result type is i32. This gains in one of the SPECCPU 2017 benchmark. This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88577	2020-10-02 17:11:02 +01:00
Muhammad Asif Manzoor	aab6f7db47	[AArch64][SVE] Add lowering for llvm fabs Add the functionality to lower fabs for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D88679	2020-10-01 19:41:25 -04:00
Kerry McLaughlin	fcf70e1e3b	[SVE][CodeGen] Lower scalable fp_extend & fp_round operations This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU ISD nodes, used to lower scalable vector fp_extend/fp_round operations. fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one. This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll, resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND. Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D88321	2020-10-01 12:17:37 +01:00
Kerry McLaughlin	75db7cf78a	[SVE][CodeGen] Legalisation of integer -> floating point conversions Splitting the operand of a scalable [S\|U]INT_TO_FP results in a concat_vectors operation where the operands are unpacked FP scalable vectors (e.g. nxv2f32). This patch adds custom lowering of concat_vectors which checks that the number of operands is 2, and isel patterns to match concat_vectors of scalable FP types with uzp1. Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D88033	2020-10-01 10:43:20 +01:00
Paul Walker	8931c3d682	[NFC] Iterate across an explicit list of scalable MVTs when driving setOperationAction. Iterating across all of integer_scalable_vector_valuetypes seems wasteful when there's only a handful we care about. Also removes some rogue whitespace. Differential Revision: https://reviews.llvm.org/D88552	2020-10-01 10:17:59 +01:00
Cameron McInally	80381c4dc9	[SVE] Lower fixed length VECREDUCE_[FMAX\|FMIN] to Scalable Differential Revision: https://reviews.llvm.org/D88444	2020-09-29 16:22:29 -05:00
Cameron McInally	9b0b09671c	[SVE] Lower fixed length VECREDUCE_[UMAX\|UMIN] to Scalable Essentially the same as the signed variants from D88259. Also includes a clean up of the lowering function. Differential Revision: https://reviews.llvm.org/D88317	2020-09-28 09:29:00 -05:00
David Sherwood	bafdd11326	[SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy After some recent upstream discussion we decided that it was best to avoid having the / operator for both ElementCount and TypeSize, since this could give the impression that these classes can be used in the same way as basic integer types. However, division for scalable types is a bit odd because we are only dividing the minimum quantity by a value, as opposed to something like: (MinSize * Vscale) / SomeValue This is why when performing division it's important the caller first establishes whether the operation makes sense, perhaps by calling isKnownMultipleOf() prior to division. The caller must now explicitly call divideCoefficientBy() on the class to perform the operation. Differential Revision: https://reviews.llvm.org/D87700	2020-09-28 08:03:00 +01:00
Cameron McInally	9a4767411e	[SVE] Revert accidental change from 405e22fbe8719cff6c40eec15c2044f42527f116 Accidentally committed two lines that were not intended. Remove those.	2020-09-25 10:11:10 -05:00
Cameron McInally	e2ccf7f178	[SVE] Lower fixed length VECREDUCE_[SMAX\|SMIN] to Scalable This patch is pretty similar to the VECREDUCE_ADD patch, with some minor tweaks. Results from the AArch64ISD::[SMAX\|SMIN]V_PRED return element sized results. This requires an ANY_EXTEND for results < 32-bits, since Legalization promotes those results. There is no NEON i64 vector support for SMAXV\|SMINV, so use SVE for those. Differential Revision: https://reviews.llvm.org/D88259	2020-09-25 09:58:17 -05:00
Daniel Kiss	2a96f47c5f	[AArch64] __builtin_return_address for PAuth. This change adds the support for __builtin_return_address for ARMv8.3A Pointer Authentication. Location of the authentication code in the pointer depends on the system configuration, therefore a dedicated instruction is used for effectively removing the authentication code without authenticating the pointer. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D75044	2020-09-24 23:23:49 +02:00
Cameron McInally	e8413ac97f	[AArch64] Expand some vector of i64 reductions on NEON With the exception of VECREDUCE_ADD, there are no NEON instructions to support vector of i64 reductions. This patch removes the Custom lowerings for those and adds some test coverage to confirm. Differential Revision: https://reviews.llvm.org/D88161	2020-09-23 16:01:24 -05:00
Muhammad Asif Manzoor	3a76de4275	[AArch64][SVE] Add lowering for llvm frecpx Add the functionality to lower frecpx for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D88032	2020-09-23 15:23:54 -04:00
Cameron McInally	db40a74344	[SVE] Lower fixed length ISD::VECREDUCE_ADD to Scalable Differential Revision: https://reviews.llvm.org/D87796	2020-09-23 09:08:07 -05:00
Kerry McLaughlin	d0149ba9b4	[SVE][CodeGen] Lower legal integer -> floating point conversions This patch adds new ISD nodes, SCVTZ_MERGE_PASSTHRU & UCVTZ_MERGE_PASSTHRU, which are used to lower both legal scalable vector [S\|U]INT_TO_FP operations and the following intrinsics: - llvm.aarch64.sve.scvtf - llvm.aarch64.sve.ucvtf Reviewed By: sdesmalen, efriedma Differential Revision: https://reviews.llvm.org/D87913	2020-09-23 11:53:53 +01:00
David Sherwood	e077367a28	[SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits An existing function Type::getScalarSizeInBits returns a uint64_t instead of a TypeSize class because the caller is requesting a scalar size, which cannot be scalable. This patch makes other similar functions requesting a scalar size consistent with that, thereby eliminating more than 1000 implicit TypeSize -> uint64_t casts. Differential revision: https://reviews.llvm.org/D87889	2020-09-23 09:20:08 +01:00
Paul Walker	f3fa954b5b	[SVE] Change definition of reduction ISD nodes to have an SVE vector result type. The current nodes, AArch64::SMAXV_PRED for example, are defined to return a NEON vector result. This is incorrect because they modify the complete SVE register and are thus changed to represent such. This patch also adds nodes for UADDV_PRED and SADDV_PRED, which unifies the handling of all SVE reductions. NOTE: Floating-point reductions are already implemented correctly, so this patch is essentially making everything consistent with those. Differential Revision: https://reviews.llvm.org/D87843	2020-09-21 13:16:28 +01:00
Tim Northover	2afe4becec	AArch64: make sure jump table entries can reach entire image This turns all jump table entries into deltas within the target function because in the small memory model all code & static data must be in a 4GB block somewhere in memory. When the entries were a delta between the table location and a basic block, the 32-bit signed entries are not enough to guarantee reachability. https://reviews.llvm.org/D87286	2020-09-18 09:50:40 +01:00
Cameron McInally	a35c7f3076	[SVE][WIP] Implement lowering for fixed length VSELECT to Scalable Map fixed length VSELECT to its Scalable equivalent. Differential Revision: https://reviews.llvm.org/D85364	2020-09-17 14:02:57 -05:00
Sanne Wouda	d5fd3d9b90	[AArch64] Match pairwise add/fadd pattern D75689 turns the faddp pattern into a shuffle with vector add. Match this new pattern in target-specific DAG combine, rather than ISel, because legalization (for v2f32) turns it into a bit of a mess. - extended to cover f16, f32, f64 and i64	2020-09-17 16:27:01 +01:00
Kerry McLaughlin	f7185b271f	[SVE][CodeGen] Lower floating point -> integer conversions This patch adds new ISD nodes, FCVTZS_MERGE_PASSTHRU & FCVTZU_MERGE_PASSTHRU, which are used to lower scalable vector FP_TO_SINT/FP_TO_UINT operations and the following intrinsics: - llvm.aarch64.sve.fcvtzu - llvm.aarch64.sve.fcvtzs Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D87232	2020-09-17 14:04:22 +01:00
Muhammad Asif Manzoor	d417488ef5	[AArch64][SVE] Add lowering for llvm fsqrt Add the functionality to lower fsqrt for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D87707	2020-09-15 15:26:17 -04:00
Philip Reames	e6bc7037d3	[AArch64] Statepoint support for AArch64. Differential Revision: https://reviews.llvm.org/D66012 Patch By: loicottet (with major rebase by me)	2020-09-14 16:43:08 -07:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
Kerry McLaughlin	cd89f5c91b	[SVE][CodeGen] Legalisation of truncate for scalable vectors Truncating from an illegal SVE type to a legal type, e.g. `trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>` fails after PromoteIntOp_CONCAT_VECTORS attempts to create a BUILD_VECTOR. This patch changes the promote function to create a sequence of INSERT_SUBVECTORs if the return type is scalable, and replaces these with UNPK+UZP1 for AArch64. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86548	2020-09-10 11:35:33 +01:00
Muhammad Asif Manzoor	1ffcbe35ae	[AArch64][SVE] Add lowering for rounding operations Add the functionality to lower SVE rounding operations for passthru variant. Created a new test case file for all rounding operations. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86793	2020-09-04 11:16:57 -04:00
Benjamin Kramer	8782c72765	Strength-reduce SmallVectors to arrays. NFCI.	2020-08-28 21:14:20 +02:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Ties Stuij	d678e14c55	[AArch64][CodeGen] Restrict bfloat vector operations to what's actually supported Previously in addTypeForNeon, we would set the operations for bfloat vectors like other generic types. But as bfloat is a storage-only type a number of operations shouldn't be set. This patch fixes that. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85101	2020-08-28 11:44:37 +01:00
Mikhail Maltsev	23d5e93f34	[AArch64] Optimize instruction selection for certain vector shuffles This patch adds code to recognize vector shuffles which can be represented as VDUP (splat) of a vector lane with of a different (wider) type than the original vector lane type. For example: shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1> is essentially: shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0> Such patterns are generated by the SelectionDAG machinery in some cases (see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double bitcasts from shuffles" part). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86225	2020-08-27 11:06:49 +01:00
Paul Walker	81337c915f	[SVE] Fallback to default expansion when lowering SIGN_EXTEN_INREG from non-byte based source. Differential Revision: https://reviews.llvm.org/D86394	2020-08-27 10:57:37 +01:00
Ahmed Bougacha	383f7c8858	[AArch64] Use CCAssignFnForReturn helper in more spots. NFC. It was added for GISel, but SDAG could use it too!	2020-08-26 14:39:11 -07:00
Muhammad Asif Manzoor	fd536eeed9	[AArch64][SVE] Add lowering for llvm fceil Add the functionality to lower fceil for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84548	2020-08-26 15:59:44 -04:00
Paul Walker	73ac3c0ede	[SVE] Lower scalable vector ISD::FNEG operations. Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A) transformation when -1 is expressed as an ISD::SPLAT_VECTOR. Differential Revision: https://reviews.llvm.org/D86415	2020-08-25 11:22:28 +01:00
Cameron McInally	36dbb8fc97	[SVE] Lower fixed length UDIV to scalable Pretty much just a copy of the SDIV patches (D86114 and D85982) with string replacement. Differential Revision: https://reviews.llvm.org/D86316	2020-08-21 09:01:25 -05:00
Cameron McInally	ac63959460	[SVE] Lower fixed length vXi8/vXi16 SDIV to scalable There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32. Differential Revision: https://reviews.llvm.org/D86114	2020-08-20 13:47:01 -05:00
Mehdi Amini	a407ec9b6d	Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private."" Was reverted because MLIR/Flang builds were broken, these APIs have been fixed in the meantime.	2020-08-19 17:26:36 +00:00
Mehdi Amini	4fc56d70aa	Revert "[NFC][llvm] Make the contructors of `ElementCount` private." This reverts commit `264afb9e6a`. (and dependent `6b742cc48` and `fc53bd610f`) MLIR/Flang are broken.	2020-08-19 17:21:37 +00:00
Francesco Petrogalli	264afb9e6a	[NFC][llvm] Make the contructors of `ElementCount` private. Differential Revision: https://reviews.llvm.org/D86120	2020-08-19 16:26:44 +00:00
Eli Friedman	bb18532399	[AArch64][SVE] Allow llvm.aarch64.sve.st2/3/4 with vectors of pointers. This isn't necessary for ACLE, but could be useful in other situations. And the change is simple. Differential Revision: https://reviews.llvm.org/D85251	2020-08-18 12:51:16 -07:00
Paul Walker	cb5cc47a65	[SVE] Lower fixed length vector ISD::SPLAT_VECTOR operations. Also strengthens the CHECK lines for scalable vector splat tests. Differential Revision: https://reviews.llvm.org/D86070	2020-08-18 11:19:43 +01:00
Cameron McInally	92593f9e77	[SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors. Differential Revision: https://reviews.llvm.org/D85982	2020-08-14 18:47:22 -05:00
Cameron McInally	21810b0e14	[SVE] Lower fixed length vector integer UMIN/UMAX Differential Revision: https://reviews.llvm.org/D85926	2020-08-13 14:48:36 -05:00
Cameron McInally	e1a87f0a9b	[SVE] Lower fixed length vector integer SMIN/SMAX Differential Revision: https://reviews.llvm.org/D85855	2020-08-13 11:41:20 -05:00
Paul Walker	e63cc8105a	[SVE] Lower fixed length vector integer shifts. Differential Revision: https://reviews.llvm.org/D85724	2020-08-13 12:35:47 +01:00
Paul Walker	130098228d	[SVE] Lower fixed length vector integer ISD::SETCC operations. Differential Revision: https://reviews.llvm.org/D85831	2020-08-13 12:01:56 +01:00
Paul Walker	9e04895258	[SVE] Lower fixed length integer extend operations. Differential Revision: https://reviews.llvm.org/D85640	2020-08-13 11:54:53 +01:00
Francesco Petrogalli	c561f4d2ec	[SVE][VLS] Don't combine logical AND. Testing is performed when targeting 128, 256 and 512-bit wide vectors. For 128-bit vectors, the original behavior of using NEON instructions is preserved. Differential Revision: https://reviews.llvm.org/D85479	2020-08-12 20:00:07 +01:00
Cameron McInally	ce2c991061	[SVE] Lower fixed length FP minnum/maxnum Lower fixed length MINNUM/MAXNUM to scalable vectors. Cherry-picked from D71767 with added tests. Differential Revision: https://reviews.llvm.org/D85744	2020-08-12 12:02:52 -05:00
David Sherwood	88bbd30736	[SVE][CodeGen] Fix issues with EXTRACT_SUBVECTOR when using scalable FP vectors In this patch I have fixed two issues: 1. Our SVE tuple get/set intrinsics were using the wrong constant type for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the function SelectionDAG::getVectorIdxConstant to create the value. Also, I have updated the documentation for EXTRACT_SUBVECTOR describing what type the constant index should be and we now enforce this when creating the node. 2. The AArch64 backend was missing the appropriate patterns for extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types. I have added them as part of this patch. The only way that I could find to test the new patterns was to use the SVE tuple get intrinsics, although I realise it looks a bit unusual. Tests added here: test/CodeGen/AArch64/sve-extract-subvector.ll Differential Revision: https://reviews.llvm.org/D85516	2020-08-12 08:35:46 +01:00
Paul Walker	b6c7b7fa31	[SVE] Add ISD nodes for predicated integer extend inreg operations. These are useful instructions when lowering fixed length vector extends, so I've broken this patch out as kind of NFC like work. Differential Revision: https://reviews.llvm.org/D85546	2020-08-11 11:39:26 +01:00
Paul Walker	d542feb8e4	[SVE] Lower fixed length vector integer subtract operations. Differential Revision: https://reviews.llvm.org/D85665	2020-08-11 11:32:12 +01:00
Paul Walker	0d33a8ef5b	[SVE] Lower scalable vector mul operations. This allows us to remove extra patterns from AArch64SVEInstrInfo.td because we can reuse those required for fixed length vectors. Differential Revision: https://reviews.llvm.org/D85328	2020-08-06 11:15:35 +01:00
Paul Walker	3ed59b775d	[SVE] Implement lowering for fixed length vector multiplication. NOTE: Also uses SVE code generation for NEON size vectors, instead of expanding i64 based vector multiplications. Differential Revision: https://reviews.llvm.org/D85327	2020-08-06 11:01:39 +01:00
Paul Walker	927fc536ca	[SVE] Add lowering for fixed length vector and, or & xor operations. Since there are no ill effects when performing these operations with undefined elements, they are lowered to the already supported unpredicated scalable vector equivalents. Differential Revision: https://reviews.llvm.org/D85117	2020-08-05 11:28:34 +01:00
Sander de Smalen	f2916636f8	[AArch64][SVE] Disable tail calls if callee does not preserve SVE regs. This fixes an issue triggered by the following code, where emitEpilogue got confused when trying to restore the SVE registers after the call, whereas the call to bar() is implemented as a TCReturn: int non_sve(); int sve(svint32_t x) { return non_sve(); } Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84869	2020-08-05 09:38:54 +01:00
Eli Friedman	95efea4b93	[AArch64][SVE] Widen narrow sdiv/udiv operations. The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit integers. If we see an 8-bit or 16-bit divide, widen the operands to 32 bits, and narrow the result. Differential Revision: https://reviews.llvm.org/D85170	2020-08-04 13:22:15 -07:00
Paul Walker	4be13b15d6	[SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants. This is the final bit of work to relax the register allocation requirements when code generating normal LLVM IR, which rarely care about the result of inactive lanes. By using _PRED nodes we can make better use of SVE's reversed instructions. Also removes a redundant parameter from the min/max tests. Differential Revision: https://reviews.llvm.org/D85142	2020-08-04 11:19:17 +01:00
David Sherwood	23ad660b5d	[SVE][CodeGen] At -O0 fallback to DAG ISel when translating alloca with scalable types When building code at -O0 we weren't falling back to DAG ISel correctly when encountering alloca instructions with scalable vector types. This is because the alloca has no operands that are scalable. I've fixed this by adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca instructions with scalable types. Differential Revision: https://reviews.llvm.org/D84746	2020-07-30 08:40:53 +01:00
Eli Friedman	c02aa53ecb	[AArch64][SVE] Add "fast" fcmp operations. `dacf8d3` added support for most fcmp operations, but there are some extra variations I hadn't considered: SelectionDAG supports float comparisons that are neither ordered nor unordered. Add support for the missing operations. Differential Revision: https://reviews.llvm.org/D84460	2020-07-24 13:22:41 -07:00

1445 Commits