llvm-project

Commit Graph

Author	SHA1	Message	Date
Eli Friedman	664a1fd9f0	[AArch64] Use the CMP_SWAP_128 variants added in `843c6140`. Accidentally forgot to flip the opcode... and I didn't notice because it was working fine for the GlobalISel.	2021-07-20 13:23:27 -07:00
Fangrui Song	0c0549fbb3	[AArch64] Delete unused Opcode after D106039	2021-07-20 12:51:44 -07:00
Eli Friedman	843c614058	[AArch64] Fix i128 cmpxchg using ldxp/stxp. Basically two parts to this fix: 1. Stop using AtomicExpand to expand cmpxchg i128 2. Fix AArch64ExpandPseudoInsts to use a correct expansion. From ARM architecture reference: To atomically load two 64-bit quantities, perform a Load-Exclusive pair/Store-Exclusive pair sequence of reading and writing the same value for which the Store-Exclusive pair succeeds, and use the read values from the Load-Exclusive pair. Fixes https://bugs.llvm.org/show_bug.cgi?id=51102 Differential Revision: https://reviews.llvm.org/D106039	2021-07-20 12:38:12 -07:00
Amy Huang	fd972bb9fd	Revert "[llvm][sve] Lowering for VLS truncating stores" because it causes a seg fault (see https://reviews.llvm.org/D104471). This reverts commit `c305557acd`.	2021-07-19 11:03:33 -07:00
David Green	5561ad8b36	[ARM] Remove PromotedBitwiseVT for NEON types This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32, treating them the same as the AArch64 and MVE backends where we just add the relevant patterns for each legal type. This prevents a lot of bitcasts from being added to the DAG, which have the potential to make optimizations more difficult. It does mean adding extra patterns, and some codegen can change due to the types now being legal, not promoted. Differential Revision: https://reviews.llvm.org/D105588	2021-07-19 16:36:33 +01:00
Sander de Smalen	0ed0573527	[AArch64][SVE] Optimize bitcasts between unpacked half/i16 vectors. The case for nxv2f32/nxv2i32 was already covered by D104573. This patch builds on top of that by making the mechanism work for nxv2[b]f16/nxv2i16, nxv4[b]f16/nxv4i16 as well. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D106138	2021-07-19 08:29:28 +01:00
Eli Friedman	1e30bf8621	[SelectionDAG] Add an overload of getStepVector that assumes step 1. This is mostly a minor convenience, but the pattern seems frequent enough to be worthwhile (and we'll probably add more uses in the future). Differential Revision: https://reviews.llvm.org/D105850	2021-07-14 11:37:01 -07:00
Tim Northover	7802f62b3f	AArch64: use 4-byte slots for arm64_32 pointers in a tail call	2021-07-13 11:08:59 +01:00
Eli Friedman	6c04b7dd4f	[AArch64] Optimize overflow checks for [s\|u]mul.with.overflow.i32. Saves one instruction for signed, uses a cheaper instruction for unsigned. Differential Revision: https://reviews.llvm.org/D105770	2021-07-12 15:30:42 -07:00
Benjamin Kramer	0da3573a9e	[AArch64] Silence unused variable warning. NFC. AArch64ISelLowering.cpp:15167:8: warning: unused variable 'OpCode' [-Wunused-variable] auto OpCode = N->getOpcode(); ^	2021-07-12 16:01:11 +02:00
Michael Liao	8253fa2298	Fix warning '-Wparentheses'. NFC.	2021-07-12 09:25:30 -04:00
David Truby	c305557acd	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-12 11:14:17 +01:00
Bradley Smith	026bb84bcd	[AArch64][SVE] Add ISel patterns for floating point compare with zero instructions Additionally, lower the floating point compare SVE intrinsics to SETCC_MERGE_ZERO ISD nodes to avoid duplicating ISel patterns. Differential Revision: https://reviews.llvm.org/D105486	2021-07-08 10:46:12 +00:00
Bradley Smith	5ab9000fbb	[AArch64][SVE] Fix selection failures for scalable MLOAD nodes with passthru Differential Revision: https://reviews.llvm.org/D105348	2021-07-06 14:17:23 +00:00
Caroline Concatto	a2c5c56055	[AArch64][CostModel] Add cost model for experimental.vector.splice This patch adds a new ShuffleKind SK_Splice and then handle the cost in getShuffleCost, as in experimental.vector.reverse. Differential Revision: https://reviews.llvm.org/D104630	2021-07-05 14:30:24 +01:00
Bradley Smith	cc273983f7	[AArch64][SVE] Improve fixed length codegen for common vector shuffle case Improve codegen when lowering the common vector shuffle case from the vectorizer (op1[last]:op2[0:last-1]). This patch only handles this common case as it is difficult to handle this more generally when using fixed length vectors, due to being unable to use the SVE ext instruction. Differential Revision: https://reviews.llvm.org/D105289	2021-07-05 12:09:27 +01:00
Paul Walker	287d39dd5a	[NFC] Fix a few whitespace issues and typos.	2021-07-04 11:49:58 +01:00
Krzysztof Parzyszek	df88c26f0d	[OpaquePtr] Add type parameter to emitLoadLinked Differential Revision: https://reviews.llvm.org/D105353	2021-07-02 13:07:40 -05:00
Florian Hahn	1a248233a5	[AArch64] Use custom lowering for fp16 vector copysign. The custom copysign lowering already supports fp16. Use it. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105277	2021-07-02 11:15:30 +01:00
Eli Friedman	0176ac9503	[AArch64] Optimize SVE bitcasts of unpacked types. Target-independent code only knows how to spill to the stack; instead, use AArch64ISD::REINTERPRET_CAST. Differential Revision: https://reviews.llvm.org/D104573	2021-07-01 15:35:48 -07:00
Bradley Smith	2668727929	[SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR Inserting into a smaller-than-legal scalable vector would result in an internal compiler error. For example, inserting a <vscale x 4 x i8> into a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a crash. This crash was happening because there was no code to promote (legalise) the result of an INSERT_SUBVECTOR node. This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises the ISD node. This is currently done by going through memory. This is necessary because of the requirement that the SubVec parameter of the INSERT_SUBVECTOR node must be smaller than the Vec parameter, which means that INSERT_SUBVECTOR cannot always have a legal result/operand types. Co-Authored-by: Joe Ellis <joe.ellis@arm.com> Differential Revision: https://reviews.llvm.org/D102766	2021-07-01 17:05:53 +01:00
Bradley Smith	01b846674d	[AArch64][SVE] Add support for fixed length MSCATTER/MGATHER Since gather lowering can now lower to nodes that may need expansion via the vector legalizer, do MGATHER lowering via vector legalizer. Additionally, as part of adding passthru support for fixed typed gathers, fix passthru support for scalable types. Depends on D104910 Differential Revision: https://reviews.llvm.org/D104217	2021-07-01 12:13:59 +01:00
Fangrui Song	17858da022	[AArch64] Remove unneeded ExternalSymbolSDNode code for machine constraint "S". NFC ExternalSymbolSDNode is implicitly generated libcalls but with an address taking operation we cannot reference an ExternalSymbolSDNode.	2021-06-30 17:52:56 -07:00
Sjoerd Meijer	b062fff87a	Recommit "[AArch64] Custom lower <4 x i8> loads" This recommits D104782 including a fix for adding a wrong operand to the new load node. Differential Revision: https://reviews.llvm.org/D105110	2021-06-30 09:18:06 +01:00
Dylan Fleming	c3d3defd11	[SVE] Added CodeGen support for inserting an element into a predicate vector Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D104722	2021-06-29 14:55:40 +01:00
Sjoerd Meijer	3a7cea2858	Revert "[AArch64] Custom lower <4 x i8> loads" This reverts commit `51e434fc25` because of a build bot failure in test-suite::GCC-C-execute-pr60960.test that I need to investigate.	2021-06-28 17:44:46 +01:00
Bradley Smith	c089e29aa4	[AArch64][SVE] DAG combine SETCC_MERGE_ZERO of a SETCC_MERGE_ZERO This helps remove extra comparisons when generating masks for fixed length masked operations. Differential Revision: https://reviews.llvm.org/D104910	2021-06-28 15:06:06 +01:00
David Green	2887f14639	[ISel] Port AArch64 SABD and UABD to DAGCombine This ports the AArch64 SABD and USBD over to DAG Combine, where they can be used by more backends (notably MVE in a follow-up patch). The matching code has changed very little, just to handle legal operations and types differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing a ubds/abdu which is zexted to the original type. Differential Revision: https://reviews.llvm.org/D91937	2021-06-26 19:34:16 +01:00
Sjoerd Meijer	51e434fc25	[AArch64] Custom lower <4 x i8> loads This custom lowers <4 x i8> vector loads using a 32-bit load, followed by 2 SSHLL instructions to extend it to e.g. a <4 x i32> vector. Before, it was really inefficient and expensive to construct a <4 x i32> for this as 4 byte loads and 4 moves were used. With this improvement SLP vectorisation might for example become profitable, see D103629. Differential Revision: https://reviews.llvm.org/D104782	2021-06-25 09:53:51 +01:00
Martin Storsjö	42f74e8249	[llvm] Rename StringRef _lower() method calls to _insensitive() This is a mechanical change. This actually also renames the similarly named methods in the SmallString class, however these methods don't seem to be used outside of the llvm subproject, so this doesn't break building of the rest of the monorepo.	2021-06-25 00:22:01 +03:00
Sander de Smalen	d5e14ba88c	[GlobalISel] NFC: Change LLT::vector to take ElementCount. This also adds new interfaces for the fixed- and scalable case: * LLT::fixed_vector * LLT::scalable_vector The strategy for migrating to the new interfaces was as follows: * If the new LLT is a (modified) clone of another LLT, taking the same number of elements, then use LLT::vector(OtherTy.getElementCount()) or if the number of elements is halfed/doubled, it uses .divideCoefficientBy(2) or operator. That is because there is no reason to specifically restrict the types to 'fixed_vector'. If the algorithm works on the number of elements (as unsigned), then just use fixed_vector. This will need to be fixed up in the future when modifying the algorithm to also work for scalable vectors, and will need then need additional tests to confirm the behaviour works the same for scalable vectors. * If the test used the '/Scalable=/true` flag of LLT::vector, then this is replaced by LLT::scalable_vector. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104451	2021-06-24 11:26:12 +01:00
David Spickett	e4ecd83fe9	[llvm][AArch64] Handle arrays of struct properly (from IR) This only applies to FastIsel. GlobalIsel seems to sidestep the issue. This fixes https://bugs.llvm.org/show_bug.cgi?id=46996 One of the things we do in llvm is decide if a type needs consecutive registers. Previously, we just checked if it was an array or not. (plus an SVE specific check that is not changing here) This causes some confusion when you arbitrary IR like: ``` %T1 = type { double, i1 }; define [ 1 x %T1 ] @foo() { entry: ret [ 1 x %T1 ] zeroinitializer } ``` We see it is an array so we call CC_AArch64_Custom_Block which bails out when it sees the i1, a type we don't want to put into a block. This leaves the location of the double in some kind of intermediate state and leads to odd codegen. Which then crashes the backend because it doesn't know how to implement what it's been asked for. You get this: ``` renamable $d0 = FMOVD0 $w0 = COPY killed renamable $d0 ``` Rather than this: ``` $d0 = FMOVD0 $w0 = COPY $wzr ``` The backend knows how to copy 64 bit to 64 bit registers, but not 64 to 32. It can certainly be taught how but the real issue seems to be us even trying to assign a register block in the first place. This change makes the logic of AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters a bit more in depth. If we find an array, also check that all the nested aggregates in that array have a single member type. Then CC_AArch64_Custom_Block's assumption of a type that looks like [ N x type ] will be valid and we get the expected codegen. New tests have been added to exercise these situations. Note that some of the output is not ABI compliant. The aim of this change is to simply handle these situations and not to make our processing of arbitrary IR ABI compliant. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D104123	2021-06-16 13:56:01 +00:00
Huihui Zhang	1c096bf09f	[SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE. Currently, Loop strengh reduce is not handling loops with scalable stride very well. Take loop vectorized with scalable vector type <vscale x 8 x i16> for instance, (refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added). Memory accesses are incremented by "16vscale", while induction variable is incremented by "8vscale". The scaling factor "2" needs to be extracted to build candidate formula i.e., "reg(%in) + 2reg({0,+,(8 %vscale)}". So that addrec register reg({0,+,(8vscale)}) can be reused among Address and ICmpZero LSRUses to enable optimal solution selection. This patch allow LSR getExactSDiv to recognize special cases like "C1XY /s C2X*Y", and pull out "C1 /s C2" as scaling factor whenever possible. Without this change, LSR is missing candidate formula with proper scaled factor to leverage target scaled-index addressing mode. Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable vector. But allow simple valid scale to pass through. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103939	2021-06-14 16:42:34 -07:00
Jingu Kang	08ce52ef5e	[AArch64] Improve SAD pattern Given a vecreduce_add node, detect the below pattern and convert it to the node sequence with UABDL, [S\|U]ADB and UADDLP. i32 vecreduce_add( v16i32 abs( v16i32 sub( v16i32 [sign\|zero]_extend(v16i8 a), v16i32 [sign\|zero]_extend(v16i8 b)))) =================> i32 vecreduce_add( v4i32 UADDLP( v8i16 add( v8i16 zext( v8i8 [S\|U]ABD low8:v16i8 a, low8:v16i8 b v8i16 zext( v8i8 [S\|U]ABD high8:v16i8 a, high8:v16i8 b Differential Revision: https://reviews.llvm.org/D104042	2021-06-14 15:48:51 +01:00
Nikita Popov	1ffa6499ea	[TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC) Don't require a specific kind of IRBuilder for TargetLowering hooks. This allows us to drop the IRBuilder.h include from TargetLowering.h. Differential Revision: https://reviews.llvm.org/D103759	2021-06-06 16:29:50 +02:00
David Green	12f53e5392	[AArch64] Remove AArch64ISD::NEG This NEG node is just a vector negation, easily represented as a SUB zero. Removing it from the one place it is generated is essentially an NFC, but can allow some extra folding. The updated tests are now loading different constant literals, which have already been negated. Differential Revision: https://reviews.llvm.org/D103703	2021-06-05 19:54:42 +01:00
Bradley Smith	a85f5874e2	[AArch64] Remove SETCC of CSEL when the latter's condition can be inverted setcc (csel 0, 1, cond, X), 1, ne ==> csel 0, 1, !cond, X Where X is a condition code setting instruction. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D103256	2021-06-04 15:53:21 +01:00
Irina Dobrescu	e971099a9b	[AArch64] Optimise bitreverse lowering in ISel Differential Revision: https://reviews.llvm.org/D103105	2021-06-02 12:51:12 +01:00
Jessica Paquette	e7f501b5e7	[GlobalISel][AArch64] Combine and (lshr x, cst), mask -> ubfx x, cst, width Also add a target hook which allows us to get around custom legalization on AArch64. Differential Revision: https://reviews.llvm.org/D99283	2021-06-01 10:56:17 -07:00
Eli Friedman	0b3b0a727a	[AArch64][RISCV] Make sure isel correctly honors failure orderings. If a cmpxchg specifies acquire or seq_cst on failure, make sure we generate code consistent with that ordering even if the success ordering is not acquire/seq_cst. At one point, it was ambiguous whether this sort of construct was valid, but the C++ standad and LLVM now accept arbitrary combinations of success/failure orderings. This doesn't address the corresponding issue in AtomicExpand. (This was reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .) Fixes https://bugs.llvm.org/show_bug.cgi?id=50512. Differential Revision: https://reviews.llvm.org/D103284	2021-05-28 12:47:40 -07:00
Tim Northover	9ff2eb1ea5	SwiftTailCC: teach verifier musttail rules applicable to this CC. SwiftTailCC has a different set of requirements than the C calling convention for a tail call. The exact argument sequence doesn't have to match, but fewer ABI-affecting attributes are allowed. Also make sure the musttail diagnostic triggers if a musttail call isn't actually a tail call.	2021-05-28 11:12:00 +01:00
Bradley Smith	f3c577ed38	[AArch64][SVE] Add fixed length codegen for FP_TO_{S,U}INT/{S,U}INT_TO_FP Depends on D102607 Differential Revision: https://reviews.llvm.org/D102777	2021-05-25 12:54:55 +01:00
Bradley Smith	e40513252a	[AArch64][SVE] Add fixed length codegen for FP_ROUND/FP_EXTEND Depends on D102498 Differential Revision: https://reviews.llvm.org/D102607	2021-05-24 13:02:30 +01:00
Bradley Smith	4bc14be259	[AArch64][SVE] Improve codegen for fixed length vector concat Differential Revision: https://reviews.llvm.org/D102498	2021-05-24 12:56:02 +01:00
David Truby	bf3b6cf920	[llvm][sve] Lowering for VLS MLOAD/MSTORE This adds custom lowering for the MLOAD and MSTORE ISD nodes when passed fixed length vectors in SVE. This is done by converting the vectors to VLA vectors and using the VLA code generation. Fixed length extending loads and truncating stores currently produce correct code, but do not use the built in extend/truncate in the load and store instructions. This will be fixed in a future patch. Differential Revision: https://reviews.llvm.org/D101834	2021-05-20 10:50:59 +00:00
Andrew Savonichev	a647100b43	[AArch64] Combine vector shift instructions in SelectionDAG bswap.v2i16 + sitofp in LLVM IR generate a sequence of: - REV32 + USHR for bswap.v2i16 - SHL + SSHR + SCVTF for sext to v2i32 and scvt The shift instructions are excessive as noted in PR24820, and they can be optimized to just SSHR. Differential Revision: https://reviews.llvm.org/D102333	2021-05-20 10:50:13 +03:00
Eli Friedman	3dd49ec194	[AArch64][SVE] Implement extractelement of i1 vectors. The implementation just extends the vector to a larger element type, and extracts from that. Not fancy, but generates reasonable code. There was discussion in the review of doing the promotion in target-independent code, but I'm sticking with this to avoid making LegalizeDAG infrastructure more complicated. Differential Revision: https://reviews.llvm.org/D87651	2021-05-17 14:51:11 -07:00
Irina Dobrescu	50511df32e	[AArch64] Lower bitreverse in ISel Adding lowering support for bitreverse. Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D102397	2021-05-17 13:35:27 +01:00
Tim Northover	82a0e808bb	IR/AArch64/X86: add "swifttailcc" calling convention. Swift's new concurrency features are going to require guaranteed tail calls so that they don't consume excessive amounts of stack space. This would normally mean "tailcc", but there are also Swift-specific ABI desires that don't naturally go along with "tailcc" so this adds another calling convention that's the combination of "swiftcc" and "tailcc". Support is added for AArch64 and X86 for now.	2021-05-17 10:48:34 +01:00
Jacob Bramley	900c898994	[AArch64] Lower fptoi.sat intrinsics. AArch64's fctv instructions implement the saturating behaviour that the fpto*i.sat intrinsics require, in cases where the destination width matches the saturation width. Lowering them removes a lot of unnecessary generated code. Only scalar lowerings are supported for now. Differential Revision: https://reviews.llvm.org/D102353	2021-05-17 10:19:19 +01:00
Bradley Smith	90ffcb1245	[AArch64][SVE] Add unpredicated vector BIC ISD node Addition of this node allows us to better utilize the different forms of the SVE BIC instructions, including using the alias to an AND (immediate). Differential Revision: https://reviews.llvm.org/D101831	2021-05-14 16:12:13 +01:00
Tim Northover	ea0eec69f1	IR+AArch64: add a "swiftasync" argument attribute. This extends any frame record created in the function to include that parameter, passed in X22. The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001 in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect of this is that tools walking the stack should expect to see one of three values there: * 0b0000 => a normal, non-extended record with just [FP, LR] * 0b0001 => the extended record [X22, FP, LR] * 0b1111 => kernel space, and a non-extended record. All other values are currently reserved. If compiling for arm64e this context pointer is address-discriminated with the discriminator 0xc31a and the DB (process-specific) key. There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing front-ends access to this slot (and forcing its creation initialized to nullptr if necessary).	2021-05-14 11:43:58 +01:00
Peter Waller	6e6f9a636b	[AArch64][SVE] Improve sve.convert.to.svbool lowering The sve.convert.to.svbool lowering has the effect of widening a logical <M x i1> vector representing lanes into a physical <16 x i1> vector representing bits in a predicate register. In general, if converting to svbool, the contents of lanes in the physical register might not be known. For sve.convert.to.svbool the new lanes are specified to be zeroed, requiring 'and' instructions to mask off the new lanes. For lanes coming from a ptrue or a comparison, however, they are known to be zero. CodeGen Before: ptrue p0.s, vl16 ptrue p1.s ptrue p2.b and p0.b, p2/z, p0.b, p1.b ret After: ptrue p0.s, vl16 ret Differential Revision: https://reviews.llvm.org/D101544	2021-05-12 10:57:25 +01:00
Bradley Smith	635164b95a	[AArch64][SVE] Improve SVE codegen for fixed length BITCAST Expanding a fixed length operation involves wrapping the operation in an insert/extract subvector pair, as such, when this is done to bitcast we end up with an extract_subvector of a bitcast. DAGCombine tries to convert this into a bitcast of an extract_subvector which restores the initial fixed length bitcast, causing an infinite loop of legalization. As part of this patch, we must make sure the above DAGCombine does not trigger after legalization if the created bitcast would not be legal. Differential Revision: https://reviews.llvm.org/D101990	2021-05-10 14:43:53 +01:00
Bradley Smith	65c89cd1a6	[AArch64][SVE] Better utilisation of unpredicated forms of remaining intrinsics When using predicated intrinsics, if the predicate used is all lanes active, use an unpredicated form of the instruction, additionally this allows for better use of immediate forms. This only includes instructions where the unpredicated/predicated forms matched in such a way that instruction selection would not introduce extra ptrue instructions. This allows us to convert the intrinsics directly to architecture independent ISD nodes. Depends on D101062 Differential Revision: https://reviews.llvm.org/D101828	2021-05-10 13:06:02 +01:00
Bradley Smith	f8f953c2a6	[AArch64][SVE] Better utilisation of unpredicated forms of arithmetic intrinsics When using predicated arithmetic intrinsics, if the predicate used is all lanes active, use an unpredicated form of the instruction, additionally this allows for better use of immediate forms. This also includes a new complex isel pattern which allows matching an all active predicate when the types are different but the predicate is a superset of the type being used. For example, to allow a b8 ptrue for a b32 predicate operand. This only includes instructions where the unpredicated/predicated forms are mismatched between variants, meaning that the removal of the predicate is done during instruction selection in order to prevent spurious re-introductions of ptrue instructions. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D101062	2021-05-10 13:05:37 +01:00
Sander de Smalen	407a33889d	[AArch64][SVE] Fix isel failure for FP-extending loads DAGCombiner tries to combine a (fpext (load)) to (fround (extload)) but SVE has no FP-extending loads. By marking these as expand, the combine no longer happens. This also fixes a similar issue for fptrunc, where the source type is not a legal type. Reviewed By: bsmith, kmclaughlin Differential Revision: https://reviews.llvm.org/D102053	2021-05-10 11:27:38 +01:00
Jun Ma	b3aeb13892	[AArch64][SVE] Remove index_vector node. Since index_vector is lowered into step_vector in D100816, we can just remove index_vector, use step_vector for codegen directly. Differential Revision: https://reviews.llvm.org/D101593	2021-05-10 11:08:58 +08:00
Simon Pilgrim	280aa3415e	[DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts. Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication. I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to). NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly. Differential Revision: https://reviews.llvm.org/D101987	2021-05-07 13:12:30 +01:00
Bradley Smith	9f37980d45	[AArch64][SVE] Fold insert(zero, extract(X, 0), 0) -> X, when X is known to zero lanes 1-N Specifically, this allow us to rely on the lane zero'ing behaviour of SVE reduce instructions. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D101369	2021-05-04 15:05:05 +01:00
David Green	966435daf9	[AArch64] Fold CSEL x, x, cc -> x This can come up in rare situations, where a csel is created with identical operands. These can be folded simply to the original value, allowing the csel to be removed and further simplification to happen. This patch also removes FCSEL as it is unused, not being produced anywhere or lowered to anything. Differential Revision: https://reviews.llvm.org/D101687	2021-05-03 17:34:05 +01:00
LemonBoy	4751cadcca	[AArch64] Prevent spilling between ldxr/stxr pairs Apply the same logic used to check if CMPXCHG nodes should be expanded at -O0: the register allocator may end up spilling some register in between the atomic load/store pairs, breaking the atomicity and possibly stalling the execution. Fixes PR48017 Reviewed By: efriedman Differential Revision: https://reviews.llvm.org/D101163	2021-05-01 17:17:05 +02:00
Eli Friedman	6e6ae6c727	[AArch64] Fix lowering for fshl/fshr with SVE types. These operations don't exist natively, so just let the target-independent code expand to plain shifts. The generated sequences could probably be optimized a bit more, but they seem good enough for now. Differential Revision: https://reviews.llvm.org/D101574	2021-04-30 10:51:25 -07:00
Jun Ma	b310dd1501	[AArch64][SVE] Lower index_vector to step_vector As discussed in D100107, this patch first convert index_vector to step_vector, and convert step_vector back to index_vector after LegalizeDAG. Differential Revision: https://reviews.llvm.org/D100816	2021-04-30 19:04:39 +08:00
Joe Ellis	1eb81f8309	[AArch64] Add missing UINT_TO_FP promotions for v16i8 Differential Revision: https://reviews.llvm.org/D101042	2021-04-28 08:49:15 +00:00
Sander de Smalen	43ace8b5ce	[TTI] NFC: Change getScalingFactorCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100564	2021-04-23 16:06:36 +01:00
Joe Ellis	c19c0ad681	[AArch64][SVE] Fix bug in lowering of fixed-length integer vector divides The function AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE previously assumed the operands were full vectors, but this is not always true. This function would produce bogus if the division operands are not full vectors, resulting in miscompiles when dividing 8-bit or 16-bit vectors. The fix is to perform an extend + div + truncate for non-full vectors, instead of the usual unpacking and unzipping logic. This is an additive change which reduces the non-full integer vector divisions to a pattern recognised by the existing lowering logic. For future reference, an example of code that would miscompile before this patch is below: 1 int8_t foo(unsigned N, int8_t a, int8_t b, int8_t *c) { 2 int8_t result = 0; 3 for (int i = 0; i < N; ++i) { 4 result += (a[i] / b[i]) / c[i]; 5 } 6 return result; 7 } Differential Revision: https://reviews.llvm.org/D100370	2021-04-23 14:55:10 +00:00
David Green	c0bf5929ee	[AArch64] Improve vector reverse lowering This improves the lowering of v8i16 and v16i8 vector reverse shuffles. Instead of going via a generic tbl it uses a rev64; ext pair, as already happens for v4i32. Differential Revision: https://reviews.llvm.org/D100882	2021-04-22 21:01:25 +01:00
Joe Ellis	528ee161c9	[AArch64] Block tryCombineToBSL combines for vectors wider than NEON There are no patterns for the AArch64ISD::BSP ISD node for anything other than NEON vectors at the moment. As a result, if we hit these combines for vectors wider than a NEON vector (such as what we might get with fixed length SVE) we will fail to lower. This patch simply prevents us from attempting the combines if the input vector type is too wide. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D100961	2021-04-22 15:09:13 +00:00
Martin Storsjö	8000e1f578	[AArch64] Fix calling windows varargs with floats in fixed args from non-windows functions When inspecting the calling convention, for calling windows functions from a non-windows function, inspect the calling convention of the called function, not the caller. Also remove an unnecessary parameter to AArch64CallLowering OutgoingArgHandler. Differential Revision: https://reviews.llvm.org/D100890	2021-04-22 12:02:49 +03:00
Caroline Concatto	ca9b7e2e2f	[AArch64][SVE] Fix crash with icmp+select This patch changes the lowering of SELECT_CC from Legal to Expand for scalable vector and adds support for scalable vectors in performSelectCombine. When selecting the nodes to lower in visitSELECT it checks if it is possible to use SELECT_CC in cases where SETCC is followed by SELECT. visistSELECT checks if SELECT_CC is legal or custom to replace SELECT by SELECT_CC. SELECT_CC used to be legal for scalable vector, so the node changes to SELECT_CC. This used to crash the compiler as there is no support for SELECT_CC with scalable vectors. So now the compiler lowers to VSELECT instead of SELECT_CC. Differential Revision: https://reviews.llvm.org/D100485	2021-04-21 14:16:27 +01:00
Bradley Smith	b8b075d8d7	[AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types, and lower them to the appropriate SVE instructions. Additionally now that the MULH nodes are legal, integer divides can be expanded into a more performant code sequence. Differential Revision: https://reviews.llvm.org/D100487	2021-04-20 15:18:06 +01:00
Bradley Smith	22c017f0f9	[AArch64][NEON] Match (or (and -a b) (and (a+1) b)) => bit select With this patch vbslq_f32(vnegq_s32(a), b, c) lowers to a BIT instruction. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D100304	2021-04-15 13:52:47 +01:00
Martin Storsjö	5144f730a8	[AArch64] Fix windows vararg functions with floats in the fixed args On Windows, float arguments are normally passed in float registers in the calling convention for regular functions. For variable argument functions, floats are passed in integer registers. This already was done correctly since many years. However, the surprising bit was that floats among the fixed arguments also are supposed to be passed in integer registers, contrary to regular functions. (This also seems to be the behaviour on ARM though, both on Windows, but also on e.g. hardfloat linux.) In the calling convention, don't promote shorter floats to f64, but convert them to integers of the same length. (Floats passed as part of the actual variable arguments are promoted to double already on the C/Clang level; the LLVM vararg calling convention doesn't do any extra promotion of f32 to f64 - this matches how it works on X86 too.) Technically, this is an ABI break compared to older LLVM versions, but it fixes compatibility with the official platform ABI. (In practice, floats among the fixed arguments in variable argument functions is a pretty rare construct.) Differential Revision: https://reviews.llvm.org/D100365	2021-04-15 11:02:14 +03:00
David Sherwood	1206313f82	[CodeGen][AArch64] Fix isel crash for truncating FP stores When attempting to truncate a FP vector and store the result out to memory we crashed because we had no pattern for truncating FP stores. In fact, we don't support these types of stores and the correct fix is to stop marking these truncating stores as legal. Tests have been added here: CodeGen/AArch64/sve-fptrunc-store.ll Differential Revision: https://reviews.llvm.org/D100025	2021-04-08 13:21:29 +01:00
Jun Ma	274ac9d40e	[AArch64][SVE] Lowering sve.dot to DOT node Differential Revision: https://reviews.llvm.org/D99699	2021-04-02 20:05:17 +08:00
Bradley Smith	2f45e632c0	[AArch64][SVE] Improve codegen for select nodes with fixed types Additionally, move the existing fixed vselect tests to *-vselect.ll. Differential Revision: https://reviews.llvm.org/D99418	2021-04-01 15:54:37 +01:00
Bradley Smith	0934fa4f5d	[AArch64][SVE] SVE functions should use the SVE calling convention for fast calls When an SVE function calls another SVE function using the C calling convention we use the more efficient SVE VectorCall PCS. However, for the Fast calling convention we're incorrectly falling back to the generic AArch64 PCS. This patch adds the same "can use SVE vector calling convention" detection used by CallingConv::C to CallingConv::Fast. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D99657	2021-04-01 15:52:08 +01:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Florian Hahn	52e015081a	[AArch64] Avoid SCALAR_TO_VECTOR for single FP constant vector. Currently the code only checks for integer constants (ConstantSDNode) and triggers an infinite cycle for single-element floating point vector constants. We need to check for both FP and integer constants. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D99384	2021-03-31 10:17:36 +01:00
Joe Ellis	a7dde4c5f7	[AArch64][SVE] Lower fixed length INSERT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98496	2021-03-30 09:37:11 +00:00
Joe Ellis	c4d39f64d0	[AArch64][SVE] Lower fixed length EXTRACT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98625	2021-03-30 09:35:44 +00:00
Florian Hahn	482283042f	[AArch64] Remove custom zext/sext legalization code. Currently performExtendCombine assumes that the src-element bitwidth * 2 is a valid MVT. But this is not the case for i1 and it causes a crash on the v64i1 test cases added in this patch. It turns out that this code appears to not be needed; the same patterns are handled by other code and we end up with the same results, even without the custom lowering. I also added additional test cases in `a50037aaa6`. Let's just remove the unneeded code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99437	2021-03-29 22:22:05 +01:00
Bradley Smith	9745dce8c3	[SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps This is currently performed in SelectionDAGLegalize, here we make it also happen in LegalizeVectorOps, allowing a target to lower the SETCC condition codes first in LegalizeVectorOps and then lower to a custom node afterwards, without having to duplicate all of the SETCC condition legalization in the target specific lowering. As a result of this, fixed length floating point SETCC nodes can now be properly lowered for SVE. Differential Revision: https://reviews.llvm.org/D98939	2021-03-29 15:32:25 +01:00
Florian Hahn	eb3d9f2eb6	[SelDag] Add isIntOrFPConstant helper function. This patch adds a new isIntOrFPConstant helper function to check if a SDValue is a integer of FP constant. This pattern is used in various places. There also are places that incorrectly just check for integer constants, e.g. D99384, so hopefully this helper will help people avoid that issue. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D99428	2021-03-28 12:48:58 +01:00
Nashe Mncube	ac2a1e9596	[SVE] Suppress vselect warning from incorrect interface call The VSelectCombine handler within AArch64ISelLowering, uses an interface call which only expects fixed vectors. This generates a warning when the call is made on a scalable vector. This warning has been suppressed with this change, by using the ElementCount interface, which supports both fixed and scalable vectors. I have also added a regression test which recreates the warning. Differential Revision: https://reviews.llvm.org/D98249	2021-03-24 14:34:34 +00:00
Benjamin Kramer	39e36fff3d	[AArch64] Fix unused variable warning	2021-03-23 13:42:14 +01:00
David Sherwood	748ae5281d	[IR][SVE] Add new llvm.experimental.stepvector intrinsic This patch adds a new llvm.experimental.stepvector intrinsic, which takes no arguments and returns a linear integer sequence of values of the form <0, 1, ...>. It is primarily intended for scalable vectors, although it will work for fixed width vectors too. It is intended that later patches will make use of this new intrinsic when vectorising induction variables, currently only supported for fixed width. I've added a new CreateStepVector method to the IRBuilder, which will generate a call to this intrinsic for scalable vectors and fall back on creating a ConstantVector for fixed width. For scalable vectors this intrinsic is lowered to a new ISD node called STEP_VECTOR, which takes a single constant integer argument as the step. During lowering this argument is set to a value of 1. The reason for this additional argument at the codegen level is because in future patches we will introduce various generic DAG combines such as mul step_vector(1), 2 -> step_vector(2) add step_vector(1), step_vector(1) -> step_vector(2) shl step_vector(1), 1 -> step_vector(2) etc. that encourage a canonical format for all targets. This hopefully means all other targets supporting scalable vectors can benefit from this too. I've added cost model tests for both fixed width and scalable vectors: llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll as well as codegen lowering tests for fixed width and scalable vectors: llvm/test/CodeGen/AArch64/neon-stepvector.ll llvm/test/CodeGen/AArch64/sve-stepvector.ll See this thread for discussion of the intrinsic: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html	2021-03-23 10:43:35 +00:00
Craig Topper	1066dcb550	[AArch64] Fix LowerMGATHER to return the chain result for floating point gathers. Found by adding asserts to LegalizeDAG to make sure custom legalized results had the right types. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D98968	2021-03-19 11:53:46 -07:00
Peter Waller	0d6482a76a	[llvm][AArch64][SVE] Lower fixed length vector fabs Seemingly striaghtforward. Differential Revision: https://reviews.llvm.org/D98434	2021-03-18 17:20:08 +00:00
Sjoerd Meijer	90ecb862a0	[AArch64] Rewrite (add, csel) to cinc Don't rewrite an add instruction with 2 SET_CC operands into a csel instruction. The total instruction sequence uses an extra instruction and register. Preventing this allows us to match a `(add, csel)` pattern and rewrite this into a `cinc`. Differential Revision: https://reviews.llvm.org/D98704	2021-03-18 08:49:27 +00:00
Bradley Smith	cf0da91ba5	[AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE Previously NEON used a target specific intrinsic for frintn, given that the FROUNDEVEN ISD node now exists, move over to that instead and add codegen support for that node for both NEON and fixed length SVE. Differential Revision: https://reviews.llvm.org/D98487	2021-03-17 11:41:22 +00:00
Joe Ellis	ff2dd8a212	[AArch64][SVE] Fold vector ZExt/SExt into gather loads where possible This commit folds sxtw'd or uxtw'd offsets into gather loads where possible with a DAGCombine optimization. As an example, the following code: 1 #include <arm_sve.h> 2 3 svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) { 4 return svld1sw_gather_s64offset_u64( 5 pred, base, svextw_s64_x(pred, offsets) 6 ); 7 } would previously lower to the following assembly: sxtw z0.d, p0/m, z0.d ld1sw { z0.d }, p0/z, [x0, z0.d] ret but now lowers to: ld1sw { z0.d }, p0/z, [x0, z0.d, sxtw] ret Differential Revision: https://reviews.llvm.org/D97858	2021-03-16 15:09:46 +00:00
Stelios Ioannou	ab86edbc88	[AArch64] Implement __rndr, __rndrrs intrinsics This patch implements the __rndr and __rndrrs intrinsics to provide access to the random number instructions introduced in Armv8.5-A. They are only defined for the AArch64 execution state and are available when __ARM_FEATURE_RNG is defined. These intrinsics store the random number in their pointer argument and return a status code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter intrinsic reseeds the random number generator. The instructions write the NZCV flags indicating the success of the operation that we can then read with a CSET. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics [2] https://bugs.llvm.org/show_bug.cgi?id=47838 Differential Revision: https://reviews.llvm.org/D98264 Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc	2021-03-15 17:51:48 +00:00
Bradley Smith	860ae9d50c	[AArch64][SVE] Add fixed/scalable lowering of FMAXIMUM/FMINIMUM ISD nodes Differential Revision: https://reviews.llvm.org/D98348	2021-03-11 13:37:47 +00:00
David Green	1a808286ef	[AArch64] Extend vecreduce -> udot handling to mla reductions We previously have lowering for: vecreduce.add(zext(X)) to vecreduce.add(UDOT(zero, X, one)) This extends that to also handle: vecreduce.add(mul(zext(X), zext(Y)) to vecreduce.add(UDOT(zero, X, Y)) It extends the existing code to optionally handle a mul with equal extends. Differential Revision: https://reviews.llvm.org/D97280	2021-03-10 22:25:12 +00:00
David Green	a02f506876	[AArch64] Extend vecreduce -> udot handling to v8i8 https://reviews.llvm.org/D88577 added v16i8 vecreduce to udot/sdot lowering. This extends that to v8i8 too, generalizing the pattern to handle the extra types. Differential Revision: https://reviews.llvm.org/D97279	2021-03-10 21:03:15 +00:00
Cullen Rhodes	2750f3ed31	[IR] Introduce llvm.experimental.vector.splice intrinsic This patch introduces a new intrinsic @llvm.experimental.vector.splice that constructs a vector of the same type as the two input vectors, based on a immediate where the sign of the immediate distinguishes two variants. A positive immediate specifies an index into the first vector and a negative immediate specifies the number of trailing elements to extract from the first vector. For example: @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> ; index @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count These intrinsics support both fixed and scalable vectors, where the former is lowered to a shufflevector to maintain existing behaviour, although while marked as experimental the recommended way to express this operation for fixed-width vectors is to use shufflevector. For scalable vectors where it is not possible to express a shufflevector mask for this operation, a new ISD node has been implemented. This is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker and Cullen Rhodes. [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D94708	2021-03-09 10:44:22 +00:00
Akira Hatanaka	dca5737945	Move ObjCARCUtil.h back to llvm/Analysis Instead of adding the header to llvm/IR, just duplicate the marker string in the auto upgrader.	2021-03-08 16:35:24 -08:00
LemonBoy	8725b24c6d	[AArch64] Legalize horizontal fmax/fmin reductions on f16 vectors Expand the horizontal reduction during the instruction selection phase, but only if the target doesn't support the full fp16 instruction set. Fixes https://bugs.llvm.org/show_bug.cgi?id=49401 Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D97840	2021-03-05 16:09:37 +01:00
David Blaikie	a2a55def35	Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering. This is included from IR files, and IR doesn't/can't depend on Analysis (because Analysis depends on IR). Also fix the implementation - don't use non-member static in headers, as it leads to ODR violations, inaccurate "unused function" warnings, etc. And fix the header protection macro name (we don't generally include "LIB" in the names, so far as I can tell).	2021-03-04 16:14:53 -08:00
Akira Hatanaka	1900503595	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `ed4718eccb`, which was reverted because it was causing a miscompile. The bug that was causing the miscompile has been fixed in `75805dce5f`. Original commit message: Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-03-04 11:22:30 -08:00
Hans Wennborg	0a5dd06718	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR" This caused miscompiles of Chromium tests for iOS due clobbering of live registers. See discussion on the code review for details. > Background: > > This fixes a longstanding problem where llvm breaks ARC's autorelease > optimization (see the link below) by separating calls from the marker > instructions or retainRV/claimRV calls. The backend changes are in > https://reviews.llvm.org/D92569. > > https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue > > What this patch does to fix the problem: > > - The front-end adds operand bundle "clang.arc.attachedcall" to calls, > which indicates the call is implicitly followed by a marker > instruction and an implicit retainRV/claimRV call that consumes the > call result. In addition, it emits a call to > @llvm.objc.clang.arc.noop.use, which consumes the call result, to > prevent the middle-end passes from changing the return type of the > called function. This is currently done only when the target is arm64 > and the optimization level is higher than -O0. > > - ARC optimizer temporarily emits retainRV/claimRV calls after the calls > with the operand bundle in the IR and removes the inserted calls after > processing the function. > > - ARC contract pass emits retainRV/claimRV calls after the call with the > operand bundle. It doesn't remove the operand bundle on the call since > the backend needs it to emit the marker instruction. The retainRV and > claimRV calls are emitted late in the pipeline to prevent optimization > passes from transforming the IR in a way that makes it harder for the > ARC middle-end passes to figure out the def-use relationship between > the call and the retainRV/claimRV calls (which is the cause of > PR31925). > > - The function inliner removes an autoreleaseRV call in the callee if > nothing in the callee prevents it from being paired up with the > retainRV/claimRV call in the caller. It then inserts a release call if > claimRV is attached to the call since autoreleaseRV+claimRV is > equivalent to a release. If it cannot find an autoreleaseRV call, it > tries to transfer the operand bundle to a function call in the callee. > This is important since the ARC optimizer can remove the autoreleaseRV > returning the callee result, which makes it impossible to pair it up > with the retainRV/claimRV call in the caller. If that fails, it simply > emits a retain call in the IR if retainRV is attached to the call and > does nothing if claimRV is attached to it. > > - SCCP refrains from replacing the return value of a call with a > constant value if the call has the operand bundle. This ensures the > call always has at least one user (the call to > @llvm.objc.clang.arc.noop.use). > > - This patch also fixes a bug in replaceUsesOfNonProtoConstant where > multiple operand bundles of the same kind were being added to a call. > > Future work: > > - Use the operand bundle on x86-64. > > - Fix the auto upgrader to convert call+retainRV/claimRV pairs into > calls with the operand bundles. > > rdar://71443534 > > Differential Revision: https://reviews.llvm.org/D92808 This reverts commit `ed4718eccb`.	2021-03-03 15:51:40 +01:00
David Green	7abf7dd5ef	[AArch64] Add combine for add(udot(0, x, y), z) -> udot(z, x, y). Given a zero input for a udot, an add can be folded in to take the place of the input, using thte addition that the instruction naturally performs. Differential Revision: https://reviews.llvm.org/D97188	2021-03-01 12:53:34 +00:00
Fraser Cormack	6718fda6ad	[CodeGen] Fix issues with subvector intrinsic index types This patch addresses issues arising from the fact that the index type used for subvector insertion/extraction is inconsistent between the intrinsics and SDNodes. The intrinsic forms require i64 whereas the SDNodes use the type returned by SelectionDAG::getVectorIdxTy. Rather than update the intrinsic definitions to use an overloaded index type, this patch fixes the issue by transforming the index to the correct type as required. Any loss of index bits going from i64 to a smaller type is unexpected, and will be caught by an assertion in SelectionDAG::getVectorIdxConstant. The patch also updates the documentation for INSERT_SUBVECTOR and adds an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR. This necessitated changes to AArch64 which was using i64 for EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed its codegen after updating the backend accordingly. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D97459	2021-03-01 10:28:21 +00:00
David Green	f51b3de4e8	[AArch64] Introduce UDOT/SDOT DAG nodes This is used to lower UDOT/SDOT instructions, as opposed to relying on the intrinsic. Subsequent optimizations will be able to optimize them more cleanly based on these nodes.	2021-02-23 20:31:01 +00:00
Serge Pavlov	2c4f60e45b	[FPEnv][AArch64] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96836	2021-02-19 13:16:51 +07:00
Bradley Smith	8bad8a43c3	[AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD Adjust generateFMAsInMachineCombiner to return false if SVE is present in order to combine fmul+fadd into fma. Also add new pseudo instructions so as to select the most appropriate of FMLA/FMAD depending on register allocation. Depends on D96599 Differential Revision: https://reviews.llvm.org/D96424	2021-02-18 16:55:16 +00:00
Bradley Smith	5b094bfeb3	[AArch64] Allow folding FMUL/FADD into FMA for FP16 types isFMAFasterThanFMulAndFAdd should return true for FP16 types when HasFullFP16 is present, since we have the instructions to handle it for both SVE and NEON. (SVE patterns and tests will follow). Differential Revision: https://reviews.llvm.org/D96599	2021-02-18 16:51:22 +00:00
Fraser Cormack	0176fecfbc	[SVE][CodeGen] Expand SVE MULH[SU] and [SU]MUL_LOHI nodes This patch fixes a codegen crash introduced in `fde2466171`, where the DAGCombiner started generating optimized MULH[SU] or [SU]MUL_LOHI nodes unless the target opted out. The AArch64 backend cannot currently select any of these nodes, so ensure that they are not generated in the first place. This issue was raised by @huihuiz in D94501. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D96849	2021-02-18 10:06:24 +00:00
Florian Hahn	211147c5ba	[AArch64] Convert CMP/SELECT sign patterns to OR & ASR. ICMP & SELECT patterns extracting the sign of a value can be simplified to OR & ASR (see https://alive2.llvm.org/ce/z/Xx4iZ0). This does not save any instructions in IR, but it is profitable on AArch64, because we need at least 2 extra instructions to materialize 1 and -1 for the SELECT. The improvements result in ~5% speedups on loops of the form static int sign_of(int x) { if (x < 0) return -1; return 1; } void foo(const int x, int res, int cnt) { for (int i=0;i<cnt;i++) res[i] = sign_of(x[i]); } Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D96596	2021-02-16 17:17:34 +00:00
Caroline Concatto	2d728bbff5	[CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse This patch adds a new intrinsic experimental.vector.reduce that takes a single vector and returns a vector of matching type but with the original lane order reversed. For example: ``` vector.reverse(<A,B,C,D>) ==> <D,C,B,A> ``` The new intrinsic supports fixed and scalable vectors types. The fixed-width vector relies on shufflevector to maintain existing behaviour. Scalable vector uses the new ISD node - VECTOR_REVERSE. This new intrinsic is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker (@paulwalker-arm). [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Differential Revision: https://reviews.llvm.org/D94883	2021-02-15 13:39:43 +00:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when buildling trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Akira Hatanaka	4a64d8fe39	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `3fe3946d9a` without the changes made to lib/IR/AutoUpgrade.cpp, which was violating layering. Original commit message: Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 06:09:42 -08:00
Akira Hatanaka	2fbbb18c1d	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `3fe3946d9a`. The commit violates layering by including a header from Analysis in lib/IR/AutoUpgrade.cpp.	2021-02-05 06:00:05 -08:00
Akira Hatanaka	3fe3946d9a	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 05:55:18 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Kerry McLaughlin	9b4fcfaa9e	[SVE][CodeGen] Remove performMaskedGatherScatterCombine The AArch64 DAG combine added by D90945 & D91433 extends the index of a scalable masked gather or scatter to i32 if necessary. This patch removes the combine and instead adds shouldExtendGSIndex, which is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether the index should be extended before calling getMaskedGather/Scatter. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D94525	2021-02-01 14:10:00 +00:00
Kazu Hirata	7925aa091d	[llvm] Populate SmallVector at construction time (NFC)	2021-01-28 22:21:14 -08:00
Richard Smith	925ae8c790	Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV" This reverts commit `53176c1680`, which introduceed a layering violation. LLVM's IR library can't include headers from Analysis.	2021-01-25 13:53:38 -08:00
Akira Hatanaka	53176c1680	[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim". - This patch teaches dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass can incorrectly determine nothing in the IR uses the function return, which can happen since the front-end no longer explicitly emits retainRV/claimRV calls in the IR, and change its return type to 'void'. Future work: - Use the attribute on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-01-25 11:57:08 -08:00
QingShan Zhang	ffc3e800c6	[NFC] [DAGCombine] Correct the result for sqrt even the iteration is zero For now, we correct the result for sqrt if iteration > 0. This doesn't make sense as they are not strict relative. Reviewed By: dmgreen, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D94480	2021-01-25 04:02:44 +00:00
Kazu Hirata	cfa241680f	[llvm] Don't include StringSwitch.h where unnecessary (NFC)	2021-01-21 19:59:48 -08:00
Simon Pilgrim	69bc0990a9	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED). Add DemandedElts support inside the TRUNCATE analysis. REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d Differential Revision: https://reviews.llvm.org/D56387	2021-01-21 13:01:34 +00:00
Hans Wennborg	a51226057f	Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE" It caused "Vector shift amounts must be in the same as their first arg" asserts in Chromium builds. See the code review for repro instructions. > Add DemandedElts support inside the TRUNCATE analysis. > > Differential Revision: https://reviews.llvm.org/D56387 This reverts commit `cad4275d69`.	2021-01-20 20:06:55 +01:00
Simon Pilgrim	cad4275d69	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE Add DemandedElts support inside the TRUNCATE analysis. Differential Revision: https://reviews.llvm.org/D56387	2021-01-20 15:39:58 +00:00
Amanieu d'Antras	21bfd068b3	[AArch64] Add support for the GNU ILP32 ABI Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64. The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are: - Wiring up the new target to enable ILP32 codegen and MC. - ILP32 va_list support. - ILP32 TLSDESC relocation support. There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain. This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain. Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D94143	2021-01-20 13:34:47 +00:00
Nicholas Guy	16bf02c3a1	Reland "[AArch64] Attempt to sink mul operands"" This relands `dda60035e9`, which was reverted by `dbaa6a1858`	2021-01-18 16:00:22 +00:00
Nicholas Guy	f5fcbe4e3c	[AArch64] Further restricts when a dup(ext) can be rearranged In most cases, the dup(ext) pattern can be rearranged to perform the extension on the vector side, allowing for further vector-specific optimisations to be made. However the initial checks for this conversion were insufficient, allowing invalid encodings to be attempted (causing compilation to fail). Differential Revision: https://reviews.llvm.org/D94778	2021-01-18 16:00:21 +00:00
Stephan Herhut	061d152085	[SVE] Fix unused variable. Introduced by [SVE] Restrict the usage of REINTERPRET_CAST. Differential Revision: https://reviews.llvm.org/D94773	2021-01-15 15:10:33 +01:00
Paul Walker	2b8db40c92	[SVE] Restrict the usage of REINTERPRET_CAST. In order to limit the number of combinations of REINTERPRET_CAST, whilst at the same time prevent overlap with BITCAST, this patch establishes the following rules: 1. The operand and result element types must be the same. 2. The operand and/or result type must be an unpacked type. Differential Revision: https://reviews.llvm.org/D94593	2021-01-15 11:32:13 +00:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Martin Storsjö	dbaa6a1858	Revert "[AArch64] Attempt to sink mul operands" This reverts commit `dda60035e9`. This commit caused failures to compile some sources, erroring out with "error in backend: Cannot select: t85: v2i32 = AArch64ISD::DUP t15", see https://reviews.llvm.org/D91271 for the full reproduction case.	2021-01-14 17:28:18 +02:00
Nicholas Guy	dda60035e9	[AArch64] Attempt to sink mul operands Following on from D91255, this patch is responsible for sinking relevant mul operands to the same block so that umull/smull instructions can be correctly generated by the mul combine implemented in the aforementioned patch. Differential revision: https://reviews.llvm.org/D91271	2021-01-13 15:23:36 +00:00
Hsiangkai Wang	914e2f5a02	[NFC] Use generic name for scalable vector stack ID. Differential Revision: https://reviews.llvm.org/D94471	2021-01-13 10:57:43 +08:00
Kerry McLaughlin	c37f68a888	[SVE][CodeGen] Fix legalisation of floating-point masked gathers Changes in this patch: - When lowering floating-point masked gathers, cast the result of the gather back to the original type with reinterpret_cast before returning. - Added patterns for reinterpret_casts from integer to floating point, and concat_vector patterns for bfloat16. - Tests for various legalisation scenarios with floating point types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D94171	2021-01-11 10:57:46 +00:00
Mark Murray	af7cce2fa4	[AArch64] Add +pauth archictecture option, allowing the v8.3a pointer authentication extension. Differential Revision: https://reviews.llvm.org/D94083	2021-01-08 13:21:11 +00:00
Nicholas Guy	ed23229a64	[AArch64] Fix crash caused by invalid vector element type Fixes a crash caused by D91255, when LLVMTy is null when calling changeExtendedVectorElementType. Differential Revision: https://reviews.llvm.org/D94234	2021-01-08 12:02:54 +00:00
David Sherwood	d1bf26fd94	[AArch64][SVE] Add lowering for llvm abs intrinsic Add functionality to permit lowering of the abs and neg intrinsics using the passthru variants. Differential Revision: https://reviews.llvm.org/D94160	2021-01-08 08:55:25 +00:00
Kazu Hirata	b934160aaa	[Target] Use llvm::find_if (NFC)	2021-01-07 20:29:36 -08:00
Nicholas Guy	350247a93c	[AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup)) Performing this rearrangement allows for existing patterns to match cases where the vector may be built after an extend, instead of before. Differential Revision: https://reviews.llvm.org/D91255	2021-01-06 16:02:16 +00:00
David Green	78d8a821e2	[AArch64] Handle any extend whilst lowering mull Demanded bits may turn a sext or zext into an anyext if the top bits are not needed. This currently prevents the lowering to instructions like mull, addl and addw. This patch fixes the mull generation by keeping it simple and treating them like zextends. Differential Revision: https://reviews.llvm.org/D93832	2021-01-06 10:08:43 +00:00
Sander de Smalen	a9f5e4375b	[AArch64] Use faddp to implement fadd reductions. Custom-expand legal VECREDUCE_FADD SDNodes to benefit from pair-wise faddp instructions. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D59259	2021-01-06 09:36:51 +00:00
Paul Walker	eba6deab22	[SVE] Lower vector CTLZ, CTPOP and CTTZ operations. CTLZ and CTPOP are lowered to CLZ and CNT instructions respectively. CTTZ is not a native SVE operation but is instead lowered to: CTTZ(V) => CTLZ(BITREVERSE(V)) In the case of fixed-length support using SVE we also lower CTTZ operating on NEON sized vectors because of its reliance on BITREVERSE which is also lowered to SVE intructions at these lengths. Differential Revision: https://reviews.llvm.org/D93607	2021-01-05 10:42:35 +00:00
Nikita Popov	fb77d95022	[AArch64] Fix legalization of i128 ctpop without neon If neon is disabled, LowerCTPOP will return SDValue() to indicate that normal legalization should be used. However, ReplaceNodeResults does not check for this and pushes the empty SDValue() onto the result vector, which will subsequently result in a crash. Differential Revision: https://reviews.llvm.org/D93825	2020-12-27 17:24:41 +01:00
Paul Walker	8eec7294fe	[SVE] Lower vector BITREVERSE and BSWAP operations. These operations are lowered to RBIT and REVB instructions respectively. In the case of fixed-length support using SVE we also lower BITREVERSE operating on NEON sized vectors as this results in fewer instructions. Differential Revision: https://reviews.llvm.org/D93606	2020-12-22 16:49:50 +00:00
Kazu Hirata	966f1431de	[Target] Use llvm::erase_if (NFC)	2020-12-20 17:43:22 -08:00
Kerry McLaughlin	52e4084d9c	[SVE][CodeGen] Vector + immediate addressing mode for masked gather/scatter This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate addressing modes for scalable masked gathers & scatters. selectGatherScatterAddrMode checks if the base pointer is null, in which case we can swap the base pointer and the index, e.g. getelementptr nullptr, <vscale x N x T> (splat(%offset)) + %indices) -> getelementptr %offset, <vscale x N x T> %indices Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D93132	2020-12-18 11:56:36 +00:00
Kerry McLaughlin	6d2a78996b	[SVE][CodeGen] Add bfloat16 support to scalable masked gather Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D93307	2020-12-17 11:08:15 +00:00
QingShan Zhang	ebdd20f430	Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128 X86 and AArch64 expand it as libcall inside the target. And PowerPC also want to expand them as libcall for P8. So, propose an implement in the legalizer to common the logic and remove the code for X86/AArch64 to avoid the duplicate code. Reviewed By: Craig Topper Differential Revision: https://reviews.llvm.org/D91331	2020-12-17 07:59:30 +00:00
Paul Walker	b74c4dbb96	[SVE] Move INT_TO_FP i1 promotion into custom lowering. AddPromotedToType is being used to legalise INT_TO_FP operations when the source is a predicate. The point where this introduces vector extends might cause problems in the future so this patch falls back to manual promotion within custom lowering. Differential Revision: https://reviews.llvm.org/D90093	2020-12-15 11:57:07 +00:00
Kerry McLaughlin	c5ced82c8e	[SVE][CodeGen] Lower scalable floating-point vector reductions Changes in this patch: - Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally to also work for scalable types - Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32) - Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D93050	2020-12-14 11:45:42 +00:00
Florian Hahn	46bc40e502	Recommit "[AArch64] Lower calls with rv_marker attribute." This recommits `a87fccb3ff` with a fix to mark the destination operand of the marker instruction as def, to fix a machine verifier failure. This reverts the revert commit `c0f2cea7c0`.	2020-12-13 16:20:39 +00:00
Florian Hahn	c0f2cea7c0	Revert "[AArch64] Lower calls with rv_marker attribute ." This reverts commit `a87fccb3ff`. A test appears to fail with expensive checks. Reverting while I investigate.	2020-12-11 20:12:59 +00:00
Florian Hahn	a87fccb3ff	[AArch64] Lower calls with rv_marker attribute . This patch adds support for lowering function calls with the rv_marker attribute. The goal is to expand such calls to the following sequence of instructions: BL @fn mov x29, x29 This sequence of instructions triggers Objective-C runtime optimizations, hence we want to ensure no instructions get moved in between them. This patch achieves that by adding a new CALL_RVMARKER ISD node, which gets turned into the BLR_RVMARKER pseudo, which eventually gets expanded into the sequence mentioned above. The sequence is then marked as instruction bundle, to avoid anything being moved in between. @ahatanak is working on using this attribute in the front- & middle-end. Together with the front- & middle-end changes, this should address PR31925 for AArch64. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D92569	2020-12-11 19:45:44 +00:00
Kerry McLaughlin	abe7775f5a	[SVE][CodeGen] Extend index of masked gathers This patch changes performMSCATTERCombine to also promote the indices of masked gathers where the element type is i8 or i16, and adds various tests for gathers with illegal types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91433	2020-12-10 13:54:45 +00:00
Kerry McLaughlin	05edfc5475	[SVE][CodeGen] Add DAG combines for s/zext_masked_gather This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true: - fold (and (masked_gather x)) -> (zext_masked_gather x) - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x) LowerMGATHER has also been updated to fetch the LoadExtType associated with the gather and also use this value to determine the correct masked gather opcode to use. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92230	2020-12-09 11:53:19 +00:00
Huihui Zhang	8e6fc1f97e	[AArch64][SVE] Add lowering for llvm.maxnum\|minnum for scalable type. LLVM intrinsic llvm.maxnum\|minnum is overloaded intrinsic, can be used on any floating-point or vector of floating-point type. This patch extends current infrastructure to support scalable vector type. This patch also fix a warning message of incorrect use of EVT::getVectorNumElements() for scalable type, when DAGCombiner trying to split scalable vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92607	2020-12-08 09:35:53 -08:00
David Sherwood	59f17b57d9	[SVE] Fix crashes with inline assembly All the crashes found compiling inline assembly are fixed in this patch by changing AArch64TargetLowering::getRegForInlineAsmConstraint to be more resilient to mismatched value and register types. For example, it makes no sense to request a predicate register for a nxv2i64 type and so on. Tests have been added here: test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll Differential Revision: https://reviews.llvm.org/D92554	2020-12-08 13:48:43 +00:00
Tim Northover	c5978f42ec	UBSAN: emit distinctive traps Sometimes people get minimal crash reports after a UBSAN incident. This change tags each trap with an integer representing the kind of failure encountered, which can aid in tracking down the root cause of the problem.	2020-12-08 10:28:26 +00:00
Kerry McLaughlin	111f559bbd	[SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER The refineIndexType & refineUniformBase functions added by D90942 can also be used to improve CodeGen of masked gathers. These changes were split out from D91092 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92319	2020-12-07 13:20:19 +00:00
Kerry McLaughlin	f6dd32fd35	[SVE][CodeGen] Lower scalable masked gathers Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only) Changes in this patch: - Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate gather load opcode to use. - Improve codegen with refineIndexType/refineUniformBase, added in D90942 - Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91092	2020-12-07 12:20:41 +00:00
Craig Topper	c55d9af8c0	[AArch64] Add custom lowering for ISD::ABS Instead of trying to pattern match the code produced by ISD::ABS expansion, just custom legalize ISD::ABS to the desired sequence. The one test change is because a DAG combine for (neg (abs)) is no longer firing because ISD::ABS is now Custom instead of Expand. Differential Revision: https://reviews.llvm.org/D92154	2020-12-04 10:45:31 -08:00
Kerry McLaughlin	603d40da9d	[SVE][CodeGen] Add a DAG combine to extend mscatter indices This patch adds a target-specific DAG combine for mscatter to promote indices with element types i8 or i16 before legalisation, plus various tests with illegal types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90945	2020-11-25 11:18:22 +00:00
Pavel Iliin	4d7df43ffd	[AArch64] Out-of-line atomics (-moutline-atomics) implementation. This patch implements out of line atomics for LSE deployment mechanism. Details how it works can be found in llvm/docs/Atomics.rst Options -moutline-atomics and -mno-outline-atomics to enable and disable it were added to clang driver. This is clang and llvm part of out-of-line atomics interface, library part is already supported by libgcc. Compiler-rt support is provided in separate patch. Differential Revision: https://reviews.llvm.org/D91157	2020-11-20 13:30:12 +00:00
Adhemerval Zanella	807320119f	[AArch64] Lower fptrunc/fpext from/to FP128t to/from FP16 The compiler-rt part which adds the emitted symbols is handled in a subsequent patch. Differential Revision: https://reviews.llvm.org/D91731	2020-11-19 15:14:50 -03:00
Kerry McLaughlin	306c8ab208	[SVE][CodeGen] Improve codegen of scalable masked scatters If the scatter store is able to perform the sign/zero extend of its index, this is folded into the instruction with refineIndexType(). Additionally, refineUniformBase() will return the base pointer and index from an add + splat_vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90942	2020-11-13 11:19:36 +00:00
David Sherwood	3225fcf11e	[SVE] Deal with SVE tuple call arguments correctly when running out of registers When passing SVE types as arguments to function calls we can run out of hardware SVE registers. This is normally fine, since we switch to an indirect mode where we pass a pointer to a SVE stack object in a GPR. However, if we switch over part-way through processing a SVE tuple then part of it will be in registers and the other part will be on the stack. I've fixed this by ensuring that: 1. When we don't have enough registers to allocate the whole block we mark any remaining SVE registers temporarily as allocated. 2. We temporarily remove the InConsecutiveRegs flags from the last tuple part argument and reinvoke the autogenerated calling convention handler. Doing this prevents the code from entering an infinite recursion and, in combination with 1), ensures we switch over to the Indirect mode. 3. After allocating a GPR register for the pointer to the tuple we then deallocate any SVE registers we marked as allocated in 1). We also set the InConsecutiveRegs flags back how they were before. 4. I've changed the AArch64ISelLowering LowerCALL and LowerFormalArguments functions to detect the start of a tuple, which involves allocating a single stack object and doing the correct numbers of legal loads and stores. Differential Revision: https://reviews.llvm.org/D90219	2020-11-12 08:41:50 +00:00
Caroline Concatto	37f4ccb275	[AArch64]Add memory op cost model for SVE This patch adds/fixes memory op cost model for SVE with fixed-width vector. Differential Revision: https://reviews.llvm.org/D90950	2020-11-11 12:49:19 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Kerry McLaughlin	170947a5de	[SVE][CodeGen] Lower scalable masked scatters Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only) Changes included in this patch: - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use. Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts. - Added the getCanonicalIndexType function to convert redundant addressing modes (e.g. scaling is redundant when accessing bytes) - Tests with 32 & 64-bit scaled & unscaled offsets Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90941	2020-11-11 11:50:22 +00:00
Francesco Petrogalli	9f61931e07	[llvm][AArch64] Allow TB(N)Z to drop signext for sign bit tests. For example if the sign extension is only used in for TBZ, and the value is used elsewhere with a zero extension, this can eliminate a sign extension. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D90606	2020-11-09 18:27:48 +00:00
Christopher Tetreault	900ec97bbe	[UBSan] Cannot negate smallest negative signed integer Silence warning Undefined Behavior Sanitzer warning: runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D90710	2020-11-04 10:07:52 -08:00
Kerry McLaughlin	f2412d372d	[SVE][CodeGen] Lower scalable integer vector reductions This patch uses the existing LowerFixedLengthReductionToSVE function to also lower scalable vector reductions. A separate function has been added to lower VECREDUCE_AND & VECREDUCE_OR operations with predicate types using ptest. Lowering scalable floating-point reductions will be addressed in a follow up patch, for now these will hit the assertion added to expandVecReduce() in TargetLowering. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D89382	2020-11-04 11:38:49 +00:00
David Sherwood	cea69fa4dc	[SVE] Add fatal error for unnamed SVE variadic arguments We don't currently support passing unnamed variadic SVE arguments so I've added a fatal error if we hit such cases to prevent any silent ABI issues in future. Differential Revision: https://reviews.llvm.org/D90230	2020-10-30 13:35:47 +00:00
Florian Hahn	ba78cae20f	[AArch64] Use DUP for BUILD_VECTOR with few different elements. If most elements of BUILD_VECTOR are the same, with a few different elements, it is better to use DUP for the common elements and INSERT_VECTOR_ELT for the different elements. Currently this transform is guarded quite restrictively to only trigger in clearly beneficial cases. With D90176, the lowering for patterns originating from code like ` float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even better (unnecessary fmov is removed). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D90233	2020-10-28 19:48:20 +00:00
David Green	066737fdbc	[AArch64] Remove AArch64ISD::NOT, use vnot instead vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT node, but allow more folding thanks to all the target independent optimizations. Specifically this allows select(icmp ne, x, y) to become "cmeq; bsl y, x" as opposed to needing to convert the predicate with "cmeq; mvn; bsl x, y" Unfortunately there is a regression in a cmtst test, but the code it selected from was already non-canonical, with instcombine preferring to use an eq predicate instead. Plus the more common case of icmp ne is improved. Differential Revision: https://reviews.llvm.org/D90126	2020-10-28 08:15:37 +00:00
Cameron McInally	a1cc274cb3	[SVE] Lower fixed length VECREDUCE_SEQ_FADD operation Differential Revision: https://reviews.llvm.org/D89162	2020-10-23 16:24:02 -05:00
David Sherwood	d67d8f8790	[SVE][AArch64] Replace TypeSize comparisons with their integer equivalents In many places in the AArch64 backend we are comparing TypeSize objects, but in fact we are only ever expecting fixed width types. I've changed all such comparisons to use their integer equivalents by replacing calls to getSizeInBits() with getFixedSizeInBits(), etc. Differential Revision: https://reviews.llvm.org/D89116	2020-10-19 07:41:33 +01:00
Vinay Madhusudan	159b2d8e62	[AArch64] Combine UADDVs to generate vector add ADD(UADDV a, UADDV b) --> UADDV(ADD a, b) This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88731	2020-10-15 09:56:31 +05:30
Cameron McInally	421f1b7294	[SVE] Lower fixed length VECREDUCE_FADD operation Differential Revision: https://reviews.llvm.org/D89263	2020-10-14 09:41:11 -05:00
David Sherwood	af57a0838e	[SVE] Add fatal error when running out of registers for SVE tuple call arguments When passing SVE types as arguments to function calls we can run out of hardware SVE registers. This is normally fine, since we switch to an indirect mode where we pass a pointer to a SVE stack object in a GPR. However, if we switch over part-way through processing a SVE tuple then part of it will be in registers and the other part will be on the stack. This is wrong and we'd like to avoid any silent ABI compatibility issues in future. For now, I've added a fatal error when this happens until we can get a proper fix. Differential Revision: https://reviews.llvm.org/D89326	2020-10-14 09:31:41 +01:00
Vinay Madhusudan	37dce7475b	[AArch64] Identify SAD pattern (ABS (SUB (EXTEND a), (EXTEND b))) to ZERO_EXTEND((UABD a, b)) (ABS (SUB (SIGN_EXTEND a), (SIGN_EXTEND b))) to ZERO_EXTEND((SABD a, b)) This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88742	2020-10-13 15:50:54 +05:30
Cameron McInally	974ddb54c9	[SVE] Lower fixed length VECREDUCE_XOR operation Differential Revision: https://reviews.llvm.org/D88974	2020-10-12 10:12:15 -05:00
Cameron McInally	333b2ab60b	[SVE] Lower fixed length VECREDUCE_OR operation Differential Revision: https://reviews.llvm.org/D88847	2020-10-07 09:56:25 -05:00
Paul Walker	27f3d51b4e	[SVE] Lower fixed length vector fneg and fsqrt operations. Also updates sve-fp.ll to use fneg directly. Differential Revision: https://reviews.llvm.org/D88683	2020-10-06 10:48:16 +01:00
Paul Walker	8bb702a8ad	[SVE] Lower fixed length vector floating point rounding operations. Adds lowering for: llvm.ceil llvm.floor llvm.nearbyint llvm.rint llvm.round llvm.trunc Differential Revision: https://reviews.llvm.org/D88671	2020-10-06 10:48:16 +01:00
Cameron McInally	9642ded8ba	[SVE] Lower fixed length VECREDUCE_AND operation Differential Revision: https://reviews.llvm.org/D88707	2020-10-05 11:28:38 -05:00
Vinay Madhusudan	f192594956	[AArch64] Generate dot for v16i8 sum reduction to i32 Convert VECREDUCE_ADD( EXTEND(v16i8_type) ) to VECREDUCE_ADD( DOTv16i8(v16i8_type) ) whenever the result type is i32. This gains in one of the SPECCPU 2017 benchmark. This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88577	2020-10-02 17:11:02 +01:00
Muhammad Asif Manzoor	aab6f7db47	[AArch64][SVE] Add lowering for llvm fabs Add the functionality to lower fabs for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D88679	2020-10-01 19:41:25 -04:00
Kerry McLaughlin	fcf70e1e3b	[SVE][CodeGen] Lower scalable fp_extend & fp_round operations This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU ISD nodes, used to lower scalable vector fp_extend/fp_round operations. fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one. This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll, resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND. Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D88321	2020-10-01 12:17:37 +01:00
Kerry McLaughlin	75db7cf78a	[SVE][CodeGen] Legalisation of integer -> floating point conversions Splitting the operand of a scalable [S\|U]INT_TO_FP results in a concat_vectors operation where the operands are unpacked FP scalable vectors (e.g. nxv2f32). This patch adds custom lowering of concat_vectors which checks that the number of operands is 2, and isel patterns to match concat_vectors of scalable FP types with uzp1. Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D88033	2020-10-01 10:43:20 +01:00
Paul Walker	8931c3d682	[NFC] Iterate across an explicit list of scalable MVTs when driving setOperationAction. Iterating across all of integer_scalable_vector_valuetypes seems wasteful when there's only a handful we care about. Also removes some rouge whitespace. Differential Revision: https://reviews.llvm.org/D88552	2020-10-01 10:17:59 +01:00
Cameron McInally	80381c4dc9	[SVE] Lower fixed length VECREDUCE_[FMAX\|FMIN] to Scalable Differential Revision: https://reviews.llvm.org/D88444	2020-09-29 16:22:29 -05:00
Cameron McInally	9b0b09671c	[SVE] Lower fixed length VECREDUCE_[UMAX\|UMIN] to Scalable Essentially the same as the signed variants from D88259. Also includes a clean up of the lowering function. Differential Revision: https://reviews.llvm.org/D88317	2020-09-28 09:29:00 -05:00
David Sherwood	bafdd11326	[SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy After some recent upstream discussion we decided that it was best to avoid having the / operator for both ElementCount and TypeSize, since this could give the impression that these classes can be used in the same way as basic integer integer types. However, division for scalable types is a bit odd because we are only dividing the minimum quantity by a value, as opposed to something like: (MinSize * Vscale) / SomeValue This is why when performing division it's important the caller first establishes whether the operation makes sense, perhaps by calling isKnownMultipleOf() prior to division. The caller must now explictly call divideCoefficientBy() on the class to perform the operation. Differential Revision: https://reviews.llvm.org/D87700	2020-09-28 08:03:00 +01:00
Cameron McInally	9a4767411e	[SVE] Revert accidental change from 405e22fbe8719cff6c40eec15c2044f42527f116 Accidentally commited two lines that were not intended. Remove those.	2020-09-25 10:11:10 -05:00
Cameron McInally	e2ccf7f178	[SVE] Lower fixed length VECREDUCE_[SMAX\|SMIN] to Scalable This patch is pretty similar to the VECREDUCE_ADD patch, with some minor tweaks. Results from the AArch64ISD::[SMAX\|SMIN]V_PRED return element sized results. This requires an ANY_EXTEND for results < 32-bits, since Legalization promotes those results. There is no NEON i64 vector support for SMAXV\|SMINV, so use SVE for those. Differential Revision: https://reviews.llvm.org/D88259	2020-09-25 09:58:17 -05:00
Daniel Kiss	2a96f47c5f	[AArch64] __builtin_return_address for PAuth. This change adds the support for __builtin_return_address for ARMv8.3A Pointer Authentication. Location of the authentication code in the pointer depends on the system configuration, therefore a dedicated instruction is used for effectively removing the authentication code without authenticating the pointer. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D75044	2020-09-24 23:23:49 +02:00
Cameron McInally	e8413ac97f	[AArch64] Expand some vector of i64 reductions on NEON With the exception of VECREDUCE_ADD, there are no NEON instructions to support vector of i64 reductions. This patch removes the Custom lowerings for those and adds some test coverage to confirm. Differential Revision: https://reviews.llvm.org/D88161	2020-09-23 16:01:24 -05:00
Muhammad Asif Manzoor	3a76de4275	[AArch64][SVE] Add lowering for llvm frecpx Add the functionality to lower frecpx for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D88032	2020-09-23 15:23:54 -04:00
Cameron McInally	db40a74344	[SVE] Lower fixed length ISD::VECREDUCE_ADD to Scalable Differential Revision: https://reviews.llvm.org/D87796	2020-09-23 09:08:07 -05:00
Kerry McLaughlin	d0149ba9b4	[SVE][CodeGen] Lower legal integer -> floating point conversions This patch adds new ISD nodes, SCVTZ_MERGE_PASSTHRU & UCVTZ_MERGE_PASSTHRU, which are used to lower both legal scalable vector [S\|U]INT_TO_FP operations and the following intrinsics: - llvm.aarch64.sve.scvtf - llvm.aarch64.sve.ucvtf Reviewed By: sdesmalen, efriedma Differential Revision: https://reviews.llvm.org/D87913	2020-09-23 11:53:53 +01:00
David Sherwood	e077367a28	[SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits An existing function Type::getScalarSizeInBits returns a uint64_t instead of a TypeSize class because the caller is requesting a scalar size, which cannot be scalable. This patch makes other similar functions requesting a scalar size consistent with that, thereby eliminating more than 1000 implicit TypeSize -> uint64_t casts. Differential revision: https://reviews.llvm.org/D87889	2020-09-23 09:20:08 +01:00
Paul Walker	f3fa954b5b	[SVE] Change definition of reduction ISD nodes to have an SVE vector result type. The current nodes, AArch64::SMAXV_PRED for example, are defined to return a NEON vector result. This is incorrect because they modify the complete SVE register and are thus changed to represent such. This patch also adds nodes for UADDV_PRED and SADDV_PRED, which unifies the handling of all SVE reductions. NOTE: Floating-point reductions are already implemented correctly, so this patch is essentially making everything consistent with those. Differential Revision: https://reviews.llvm.org/D87843	2020-09-21 13:16:28 +01:00
Tim Northover	2afe4becec	AArch64: make sure jump table entries can reach entire image This turns all jump table entries into deltas within the target function because in the small memory model all code & static data must be in a 4GB block somewhere in memory. When the entries were a delta between the table location and a basic block, the 32-bit signed entries are not enough to guarantee reachability. https://reviews.llvm.org/D87286	2020-09-18 09:50:40 +01:00
Cameron McInally	a35c7f3076	[SVE][WIP] Implement lowering for fixed length VSELECT to Scalable Map fixed length VSELECT to its Scalable equivalent. Differential Revision: https://reviews.llvm.org/D85364	2020-09-17 14:02:57 -05:00
Sanne Wouda	d5fd3d9b90	[AArch64] Match pairwise add/fadd pattern D75689 turns the faddp pattern into a shuffle with vector add. Match this new pattern in target-specific DAG combine, rather than ISel, because legalization (for v2f32) turns it into a bit of a mess. - extended to cover f16, f32, f64 and i64	2020-09-17 16:27:01 +01:00
Kerry McLaughlin	f7185b271f	[SVE][CodeGen] Lower floating point -> integer conversions This patch adds new ISD nodes, FCVTZS_MERGE_PASSTHRU & FCVTZU_MERGE_PASSTHRU, which are used to lower scalable vector FP_TO_SINT/FP_TO_UINT operations and the following intrinsics: - llvm.aarch64.sve.fcvtzu - llvm.aarch64.sve.fcvtzs Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D87232	2020-09-17 14:04:22 +01:00
Muhammad Asif Manzoor	d417488ef5	[AArch64][SVE] Add lowering for llvm fsqrt Add the functionality to lower fsqrt for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D87707	2020-09-15 15:26:17 -04:00
Philip Reames	e6bc7037d3	[AArch64] Statepoint support for AArch64. Differential Revision: https://reviews.llvm.org/D66012 Patch By: loicottet (with major rebase by me)	2020-09-14 16:43:08 -07:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
Kerry McLaughlin	cd89f5c91b	[SVE][CodeGen] Legalisation of truncate for scalable vectors Truncating from an illegal SVE type to a legal type, e.g. `trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>` fails after PromoteIntOp_CONCAT_VECTORS attempts to create a BUILD_VECTOR. This patch changes the promote function to create a sequence of INSERT_SUBVECTORs if the return type is scalable, and replaces these with UNPK+UZP1 for AArch64. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86548	2020-09-10 11:35:33 +01:00
Muhammad Asif Manzoor	1ffcbe35ae	[AArch64][SVE] Add lowering for rounding operations Add the functionality to lower SVE rounding operations for passthru variant. Created a new test case file for all rounding operations. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86793	2020-09-04 11:16:57 -04:00
Benjamin Kramer	8782c72765	Strength-reduce SmallVectors to arrays. NFCI.	2020-08-28 21:14:20 +02:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Ties Stuij	d678e14c55	[AArch64][CodeGen] Restrict bfloat vector operations to what's actually supported Previously in addTypeForNeon, we would set the operations for bfloat vectors like other generic types. But as bfloat is a storage-only type a number of operations shouldn't be set. This patch fixes that. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85101	2020-08-28 11:44:37 +01:00
Mikhail Maltsev	23d5e93f34	[AArch64] Optimize instruction selection for certain vector shuffles This patch adds code to recognize vector shuffles which can be represented as VDUP (splat) of a vector lane with of a different (wider) type than the original vector lane type. For example: shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1> is essentially: shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0> Such patterns are generated by the SelectionDAG machinery in some cases (see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double bitcasts from shuffles" part). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86225	2020-08-27 11:06:49 +01:00
Paul Walker	81337c915f	[SVE] Fallback to default expansion when lowering SIGN_EXTEN_INREG from non-byte based source. Differential Revision: https://reviews.llvm.org/D86394	2020-08-27 10:57:37 +01:00
Ahmed Bougacha	383f7c8858	[AArch64] Use CCAssignFnForReturn helper in more spots. NFC. It was added for GISel, but SDAG could use it too!	2020-08-26 14:39:11 -07:00
Muhammad Asif Manzoor	fd536eeed9	[AArch64][SVE] Add lowering for llvm fceil Add the functionality to lower fceil for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84548	2020-08-26 15:59:44 -04:00
Paul Walker	73ac3c0ede	[SVE] Lower scalable vector ISD::FNEG operations. Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A) transformation when -1 is expressed as an ISD::SPLAT_VECTOR. Differential Revision: https://reviews.llvm.org/D86415	2020-08-25 11:22:28 +01:00
Cameron McInally	36dbb8fc97	[SVE] Lower fixed length UDIV to scalable Pretty much just a copy of the SDIV patches (D86114 and D85982) with string replacement. Differential Revision: https://reviews.llvm.org/D86316	2020-08-21 09:01:25 -05:00
Cameron McInally	ac63959460	[SVE] Lower fixed length vXi8/vXi16 SDIV to scalable There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32. Differential Revision: https://reviews.llvm.org/D86114	2020-08-20 13:47:01 -05:00
Mehdi Amini	a407ec9b6d	Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private."" Was reverted because MLIR/Flang builds were broken, these APIs have been fixed in the meantime.	2020-08-19 17:26:36 +00:00
Mehdi Amini	4fc56d70aa	Revert "[NFC][llvm] Make the contructors of `ElementCount` private." This reverts commit `264afb9e6a`. (and dependent `6b742cc48` and `fc53bd610f`) MLIR/Flang are broken.	2020-08-19 17:21:37 +00:00
Francesco Petrogalli	264afb9e6a	[NFC][llvm] Make the contructors of `ElementCount` private. Differential Revision: https://reviews.llvm.org/D86120	2020-08-19 16:26:44 +00:00
Eli Friedman	bb18532399	[AArch64][SVE] Allow llvm.aarch64.sve.st2/3/4 with vectors of pointers. This isn't necessaary for ACLE, but could be useful in other situations. And the change is simple. Differential Revision: https://reviews.llvm.org/D85251	2020-08-18 12:51:16 -07:00
Paul Walker	cb5cc47a65	[SVE] Lower fixed length vector ISD::SPLAT_VECTOR operations. Also strengthens the CHECK lines for scalable vector splat tests. Differential Revision: https://reviews.llvm.org/D86070	2020-08-18 11:19:43 +01:00
Cameron McInally	92593f9e77	[SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors. Differential Revision: https://reviews.llvm.org/D85982	2020-08-14 18:47:22 -05:00
Cameron McInally	21810b0e14	[SVE] Lower fixed length vector integer UMIN/UMAX Differential Revision: https://reviews.llvm.org/D85926	2020-08-13 14:48:36 -05:00
Cameron McInally	e1a87f0a9b	[SVE] Lower fixed length vector integer SMIN/SMAX Differential Revision: https://reviews.llvm.org/D85855	2020-08-13 11:41:20 -05:00
Paul Walker	e63cc8105a	[SVE] Lower fixed length vector integer shifts. Differential Revision: https://reviews.llvm.org/D85724	2020-08-13 12:35:47 +01:00
Paul Walker	130098228d	[SVE] Lower fixed length vector integer ISD::SETCC operations. Differential Revision: https://reviews.llvm.org/D85831	2020-08-13 12:01:56 +01:00
Paul Walker	9e04895258	[SVE] Lower fixed length integer extend operations. Differential Revision: https://reviews.llvm.org/D85640	2020-08-13 11:54:53 +01:00
Francesco Petrogalli	c561f4d2ec	[SVE][VLS] Don't combine logical AND. Testing is performed when targeting 128, 256 and 512-bit wide vectors. For 128-bit vectors, the original behavior of using NEON instructions is preserved. Differential Revision: https://reviews.llvm.org/D85479	2020-08-12 20:00:07 +01:00
Cameron McInally	ce2c991061	[SVE] Lower fixed length FP minnum/maxnum Lower fixed length MINNUM/MAXNUM to scalable vectors. Cherry-picked from D71767 with added tests. Differential Revision: https://reviews.llvm.org/D85744	2020-08-12 12:02:52 -05:00
David Sherwood	88bbd30736	[SVE][CodeGen] Fix issues with EXTRACT_SUBVECTOR when using scalable FP vectors In this patch I have fixed two issues: 1. Our SVE tuple get/set intrinsics were using the wrong constant type for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the function SelectionDAG::getVectorIdxConstant to create the value. Also, I have updated the documentation for EXTRACT_SUBVECTOR describing what type the constant index should be and we now enforce this when creating the node. 2. The AArch64 backend was missing the appropriate patterns for extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types. I have added them as part of this patch. The only way that I could find to test the new patterns was to use the SVE tuple get intrinsics, although I realise it looks a bit unusual. Tests added here: test/CodeGen/AArch64/sve-extract-subvector.ll Differential Revision: https://reviews.llvm.org/D85516	2020-08-12 08:35:46 +01:00
Paul Walker	b6c7b7fa31	[SVE] Add ISD nodes for predicated integer extend inreg operations. These are useful instructions when lowering fixed length vector extends, so I've broken this patch out as kind of NFC like work. Differential Revision: https://reviews.llvm.org/D85546	2020-08-11 11:39:26 +01:00
Paul Walker	d542feb8e4	[SVE] Lower fixed length vector integer subtract operations. Differential Revision: https://reviews.llvm.org/D85665	2020-08-11 11:32:12 +01:00
Paul Walker	0d33a8ef5b	[SVE] Lower scalable vector mul operations. This allows us to remove extra patterns from AArch64SVEInstrInfo.td because we can reuse those required for fixed length vectors. Differential Revision: https://reviews.llvm.org/D85328	2020-08-06 11:15:35 +01:00
Paul Walker	3ed59b775d	[SVE] Implement lowering for fixed length vector multiplication. NOTE: Also uses SVE code generation for NEON size vectors, instead of expanding i64 based vector multiplications. Differential Revision: https://reviews.llvm.org/D85327	2020-08-06 11:01:39 +01:00
Paul Walker	927fc536ca	[SVE] Add lowering for fixed length vector and, or & xor operations. Since there are no ill effects when performing these operations with undefined elements, they are lowered to the already supported unpredicated scalable vector equivalents. Differential Revision: https://reviews.llvm.org/D85117	2020-08-05 11:28:34 +01:00
Sander de Smalen	f2916636f8	[AArch64][SVE] Disable tail calls if callee does not preserve SVE regs. This fixes an issue triggered by the following code, where emitEpilogue got confused when trying to restore the SVE registers after the call, whereas the call to bar() is implemented as a TCReturn: int non_sve(); int sve(svint32_t x) { return non_sve(); } Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84869	2020-08-05 09:38:54 +01:00
Eli Friedman	95efea4b93	[AArch64][SVE] Widen narrow sdiv/udiv operations. The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit integers. If we see an 8-bit or 16-bit divide, widen the operands to 32 bits, and narrow the result. Differential Revision: https://reviews.llvm.org/D85170	2020-08-04 13:22:15 -07:00
Paul Walker	4be13b15d6	[SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants. This is the final bit of work to relax the register allocation requirements when code generating normal LLVM IR, which rarely care about the result of inactive lanes. By using _PRED nodes we can make better use of SVE's reversed instructions. Also removes a redundant parameter from the min/max tests. Differential Revision: https://reviews.llvm.org/D85142	2020-08-04 11:19:17 +01:00
David Sherwood	23ad660b5d	[SVE][CodeGen] At -O0 fallback to DAG ISel when translating alloca with scalable types When building code at -O0 We weren't falling back to DAG ISel correctly when encountering alloca instructions with scalable vector types. This is because the alloca has no operands that are scalable. I've fixed this by adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca instructions with scalable types. Differential Revision: https://reviews.llvm.org/D84746	2020-07-30 08:40:53 +01:00
Eli Friedman	c02aa53ecb	[AArch64][SVE] Add "fast" fcmp operations. `dacf8d3` added support for most fcmp operations, but there are some extra variations I hadn't considered: SelectionDAG supports float comparisons that are neither ordered nor unordered. Add support for the missing operations. Differential Revision: https://reviews.llvm.org/D84460	2020-07-24 13:22:41 -07:00
Francesco Petrogalli	809600d664	[llvm][sve] Reg + Imm addressing mode for ld1ro. Reviewers: kmclaughlin, efriedma, sdesmalen Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83357	2020-07-24 17:48:47 +00:00
Petre-Ionut Tudor	1af9fc8213	[ARM] Generate [SU]HADD from ((a + b) >> 1) Summary: Teach LLVM to recognize the above pattern, where the operands are either signed or unsigned types. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83777	2020-07-21 13:22:07 +01:00
Eli Friedman	b8f765a1e1	[AArch64][SVE] Add support for trunc to <vscale x N x i1>. This isn't a natively supported operation, so convert it to a mask+compare. In addition to the operation itself, fix up some surrounding stuff to make the testcase work: we need concat_vectors on i1 vectors, we need legalization of i1 vector truncates, and we need to fix up all the relevant uses of getVectorNumElements(). Differential Revision: https://reviews.llvm.org/D83811	2020-07-20 13:11:02 -07:00
Paul Walker	6384ec4099	[SVE] Add lowering for fixed length vector fdiv, fma, fmul and fsub operations. Differential Revision: https://reviews.llvm.org/D84034	2020-07-20 11:57:34 +00:00
Tim Northover	88464a55b4	AArch64: emit @llvm.debugtrap as `brk #0xf000` on all platforms It's useful for a debugger to be able to distinguish an @llvm.debugtrap from a (noreturn) @llvm.trap, so this extends the existing Windows behaviour to other platforms.	2020-07-20 10:31:26 +01:00
Paul Walker	509351d768	[SVE] Add lowering for scalable vector fadd, fdiv, fmul and fsub operations. Lower the operations to predicated variants. This is prep work required for fixed length code generation but also fixes a bug whereby these operations fail selection when "unpacked" vector types (e.g. MVT::nxv2f32) are used. This patch also adds the missing "unpacked" patterns for FMA. Differential Revision: https://reviews.llvm.org/D83765	2020-07-16 11:31:35 +00:00
Paul Walker	319a97b5e2	[SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal. Differential Revision: https://reviews.llvm.org/D83568	2020-07-13 11:16:30 +00:00
Paul Walker	f78e6a3095	[SVE] Code generation for fixed length vector truncates. Lower fixed length vector truncates to a sequence of SVE UZP1 instructions. Differential Revision: https://reviews.llvm.org/D83395	2020-07-10 10:37:19 +00:00
Eli Friedman	56ae2cebcd	[AArch64][SVE] Add lowering for llvm.fma. This is currently bare-bones; we aren't taking advantage of any of the FMA variant instructions. But it's enough to at least generate code. Differential Revision: https://reviews.llvm.org/D83444	2020-07-09 16:12:41 -07:00
Paul Walker	614fb09645	[SVE] Disable some BUILD_VECTOR related code generator features. Fixed length vector code generation for SVE does not yet custom lower BUILD_VECTOR and instead relies on expansion. At the same time custom lowering for VECTOR_SHUFFLE is also not available so this patch updates isShuffleMaskLegal to reject vector types that require SVE. Related to this it also prevents the merging of stores after legalisation because this only works when BUILD_VECTOR is either legal or can be elminated. When this is not the case the code generator enters an infinite legalisation loop. Differential Revision: https://reviews.llvm.org/D83408	2020-07-09 10:47:04 +00:00
Paul Walker	fb75451775	[SVE] Custom ISel for fixed length extract/insert_subvector. We use extact_subvector and insert_subvector to "cast" between fixed length and scalable vectors. This patch adds custom c++ based ISel for the following cases: fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0 scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0 Which result in either EXTRACT_SUBREG/INSERT_SUBREG for NEON sized vectors or COPY_TO_REGCLASS otherwise. Differential Revision: https://reviews.llvm.org/D82871	2020-07-08 09:49:28 +00:00
Petre-Ionut Tudor	af80a4353e	[ARM] Generate [SU]RHADD from (b - (~a)) >> 1 Summary: Teach LLVM to recognize the above pattern, which is usually a transformation of (a + b + 1) >> 1, where the operands are either signed or unsigned types. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82669	2020-07-03 16:00:06 +01:00
Guillaume Chatelet	87e2751cf0	[Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D83082	2020-07-03 08:06:43 +00:00
Sander de Smalen	143e324e75	[CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics, that shouldn't really have been there (I suspect this was a remnant from when we expected the wider vector always to have come from a vector CONCAT). When I tried to create a more minimal reproducer, I found a bug in DAGCombiner where it drops the scalable flag when trying to fold: extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index') This patch fixes both issues. Reviewers: david-arm, efriedma, spatel Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D82910	2020-07-02 10:16:43 +01:00
Guillaume Chatelet	ef36f5143d	[Alignment] TargetLowering::hasPairedLoad must use Align for RequiredAlignment As per documentation of `hasPairLoad`: "`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load." In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned. There is only one implementor of this interface. Differential Revision: https://reviews.llvm.org/D82958	2020-07-01 14:32:30 +00:00
Paul Walker	a1aed80a35	[SVE] Relax merge requirement for IR based divides. We currently lower SDIV to SDIV_MERGE_OP1. This forces the value for inactive lanes in a way that can hamper register allocation, however, the lowering has no requirement for inactive lanes. Instead this patch replaces SDIV_MERGE_OP1 with SDIV_PRED thus freeing the register allocator. Once done the only user of SDIV_MERGE_OP1 is intrinsic lowering so I've removed the node and perform ISel on the intrinsic directly. This also allows us to implement MOVPRFX based zeroing in the same manner as SUB. This patch also renames UDIV_MERGE_OP1 and [F]ADD_MERGE_OP1 for the same reason but in the ADD cases the ISel code is already as required. Differential Revision: https://reviews.llvm.org/D82783	2020-07-01 08:18:42 +00:00
Guillaume Chatelet	28de229bc6	[Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82894	2020-07-01 07:28:11 +00:00
Christopher Tetreault	ab35ba5742	[SVE] Remove calls to VectorType::getNumElements from AArch64 Reviewers: efriedma, paquette, david-arm, kmclaughlin Reviewed By: david-arm Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82214	2020-06-30 11:17:50 -07:00
Guillaume Chatelet	4f5133a4dc	[Alignment][NFC] Migrate AArch64, ARM, Hexagon, MSP and NVPTX backends to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82749	2020-06-30 07:56:17 +00:00
Sander de Smalen	39f6a36a24	[AArch64][SVE] NFCI: Choose consistent naming for predicated SDAG nodes This patch proposes a naming convention for operations that take a general predicate (and are thus predicated) that specifies what happens to the false lanes. Currently the _PRED suffix is used, which doesn't really say much other than that it takes a predicate. In some instances this means it has merging predication and in other cases it means zeroing-predication. This patch also changes the order of operands to AArch64ISD::DUP_MERGE_PASSTHRU, to pass the predicate as the first operand, which is in line with all other predicates nodes. It takes the passthru value as an explicit passthru value, which is always passed as the last operand. Reviewers: paulwalker-arm, cameron.mcinally, eli.friedman, dancgr, efriedma Reviewed By: paulwalker-arm Tags: #llvm Differential Revision: https://reviews.llvm.org/D81850	2020-06-29 13:37:30 +01:00
Kerry McLaughlin	bb6603f013	[AArch64][SVE] Bail out of performPostLD1Combine for scalable types Summary: performPostLD1Combine will introduce either a LD1LANEpost or LD1DUPpost node, which will cause selection failure if the return type is a scalable vector. Reviewers: sdesmalen, c-rhodes, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82670	2020-06-29 11:59:53 +01:00
Simon Pilgrim	973685fc78	[TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.	2020-06-29 11:46:58 +01:00
Paul Walker	3a98d5d7e7	[SVE] Code generation for fixed length vector adds. Summary: Teach LowerToPredicatedOp to lower fixed length vector operations. Add AArch64ISD nodes and isel patterns for predicated integer and floating point adds. Together this enables SVE code generation for fixed length vector adds. Reviewers: rengolin, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82483	2020-06-26 19:54:41 +00:00
Kerry McLaughlin	6b313f198c	[AArch64][SVE] Remove asserts from AArch64ISelLowering for bfloat16 types Remove the asserts in performLDNT1Combine & performST[NT]1Combine to ensure we get a failure where the type is a bfloat16 and hasBF16() is false, regardless of whether asserts are enabled.	2020-06-26 14:51:27 +01:00
Cullen Rhodes	4319c48fc7	[AArch64][SVE] Only support sizeless bfloat types if supported by subtarget Reviewers: sdesmalen, efriedma, kmclaughlin, fpetrogalli Reviewed By: sdesmalen, fpetrogalli Differential Revision: https://reviews.llvm.org/D82494	2020-06-26 12:37:47 +00:00
Kerry McLaughlin	edcfef8fee	[AArch64][SVE] Add bfloat16 support to store intrinsics Summary: Bfloat16 support added for the following intrinsics: - ST1 - STNT1 Reviewers: sdesmalen, c-rhodes, fpetrogalli, efriedma, stuij, david-arm Reviewed By: fpetrogalli Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D82448	2020-06-26 11:05:56 +01:00
Kerry McLaughlin	0ccfe1b267	[AArch64][SVE] Predicate bfloat16 load patterns with HasBF16 Reviewers: sdesmalen, c-rhodes, efriedma, fpetrogalli Reviewed By: fpetrogalli Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82464	2020-06-26 10:38:24 +01:00
Eli Friedman	e9d4e34ab8	[AArch64][SVE] Add legalization support for i32/i64 vector srem/urem Implement them on top of sdiv/udiv, similar to what we do for integer types. Potential future work: implementing i8/i16 srem/urem, optimizations for constant divisors, optimizing the mul+sub to mls. Differential Revision: https://reviews.llvm.org/D81511	2020-06-23 16:27:52 -07:00
Paul Walker	499c63288f	[SVE] Code generation for fixed length vector loads & stores. Summary: This patch adds base support for code generating fixed length vector operations targeting a known SVE vector length. To achieve this we lower fixed length vector operations to equivalent scalable vector operations, whereby SVE predication is used to limit the elements processed to those present within the fixed length vector. Specifically this patch implements load and store operations, which get lowered to their masked counterparts thusly: V = load(Addr) => V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr)) store(V, (Addr)) => masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr)) Reviewers: rengolin, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80385	2020-06-23 09:39:03 +00:00
David Sherwood	584d0d5c17	[SVE] Fall back on DAG ISel at -O0 when encountering scalable types At the moment we use Global ISel by default at -O0, however it is currently not capable of dealing with scalable vectors for two reasons: 1. The register banks know nothing about SVE registers. 2. The LLT (Low Level Type) class knows nothing about scalable vectors. For now, the easiest way to avoid users hitting issues when using the SVE ACLE is to fall back on normal DAG ISel when encountering instructions that operate on scalable vector types. I've added a couple of RUN lines to existing SVE tests to ensure we can compile at -O0. I've also added some new tests to CodeGen/AArch64/GlobalISel/arm64-fallback.ll that demonstrate we correctly fallback to DAG ISel at -O0 when lowering formal arguments or translating instructions that involve scalable vector types. Differential Revision: https://reviews.llvm.org/D81557	2020-06-19 10:57:00 +01:00
David Sherwood	0dc28af219	[CodeGen,AArch64] Fix up warnings in performExtendCombine Try to avoid calling getVectorNumElements() or relying upon the TypeSize conversion to uin64_t. Differential Revision: https://reviews.llvm.org/D81573	2020-06-19 10:34:51 +01:00
Paul Walker	4612f39120	[SVE] Add flag to specify SVE register size, using this to calculate legal vector types. Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE data registers (in bits) to be specified. This allows the code generator to make assumptions it normally couldn't. As a starting point this information is used to mark fixed length vector types that can fit within the specified size as legal. Reviewers: rengolin, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80384	2020-06-18 12:11:16 +00:00
Nikita Popov	7cac7e0cfc	[IR] Prefer hasFnAttribute() where possible (NFC) When checking for an enum function attribute, use hasFnAttribute() rather than hasAttribute() at FunctionIndex, because it is significantly faster (and more concise to boot).	2020-06-15 09:30:35 +02:00
Shawn Landden	9ec57cce62	[AArch64] custom lowering for i128 popcount halves the number of CNT instructions generated	2020-06-10 09:44:16 +04:00
Henry Kao	4dcc0d1958	[CodeGen][SVE] Avoid scalarizing zero splat stores on scalable vectors. Summary: Implemented in replaceZeroVectorStore(). Fixes several warnings in AArch64 SVE unit tests. Reviewers: sdesmalen, kmclaughlin, dancgr, efriedma, each, andwar, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80824	2020-06-09 12:52:39 -04:00
Guillaume Chatelet	3b6196c9b3	[Alignment][NFC] TargetLowering::allowsMisalignedMemoryAccesses Summary: Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMisalignedMemoryAccesses` without marking it override. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81374	2020-06-09 10:17:42 +00:00
Cullen Rhodes	b82be5db71	[AArch64][SVE] Implement structured load intrinsics Summary: This patch adds initial support for the following instrinsics: * llvm.aarch64.sve.ld2 * llvm.aarch64.sve.ld3 * llvm.aarch64.sve.ld4 For loading two, three and four vectors worth of data. Basic codegen is implemented with reg+reg and reg+imm addressing modes being addressed in a later patch. The types returned by these intrinsics have a number of elements that is a multiple of the elements in a 128-bit vector for a given type and N, where N is the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements the types are: LD2 : <vscale x 8 x i32> LD3 : <vscale x 12 x i32> LD4 : <vscale x 16 x i32> This is implemented with target-specific intrinsics for each variant that take the same operands as the IR intrinsic but return N values, where the type of each value is a full vector, i.e. <vscale x 4 x i32> in the above example. These values are then concatenated using the standard concat_vector intrinsic to maintain type legality with the IR. These intrinsics are intended for use in the Arm C Language Extension (ACLE). Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D75751	2020-06-09 08:51:58 +00:00
David Sherwood	30dfbf03a2	[CodeGen,AArch64] Fix up warnings in splitStores The code for trying to split up stores is designed for NEON vectors, where we support arbitrary alignments. It's an optimisation designed to improve performance by using smaller, aligned stores. However, we currently only support 16 byte alignments for SVE vectors anyway so we may as well bail out early. This change fixes up remaining warnings in a couple of tests: CodeGen/AArch64/sve-callbyref-notailcall.ll CodeGen/AArch64/sve-calling-convention-byref.ll Differential Revision: https://reviews.llvm.org/D80720	2020-06-09 07:39:38 +01:00
David Sherwood	41fb119e8c	[CodeGen] Fix nullptr crash in tryConvertSVEWideCompare When the input to a wide compare instruction is a DUP or SPLAT_VECTOR node we should deal with cases where the DUP/SPLAT_VECTOR input operand is not an immediate value. I've fixed the code to return SDValue() in such cases and added a couple of tests - one each to represent the signed and unsigned cases. Differential Revision: https://reviews.llvm.org/D81167	2020-06-08 15:20:18 +01:00
Cullen Rhodes	3ebbe35363	[AArch64][SVE] Implement vector tuple intrinsics Summary: This patch adds the following intrinsics for creating two-tuple, three-tuple and four-tuple scalable vectors: * llvm.aarch64.sve.tuple.create2 * llvm.aarch64.sve.tuple.create3 * llvm.aarch64.sve.tuple.create4 As well as: * llvm.aarch64.sve.tuple.get * llvm.aarch64.sve.tuple.set For extracting and inserting scalable vectors from vector tuples. These intrinsics are intended to be used by the ACLE functions svcreate<n>, svget and svset. This patch also includes calling convention support for passing and returning tuples of scalable vectors to/from functions. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D75674	2020-06-08 11:09:55 +00:00
Kadir Cetinkaya	6bad8b07e6	[llvm][AArch64] Fix unused variable	2020-06-05 15:56:19 +02:00
Kerry McLaughlin	89fc0166f5	[CodeGen][SVE] Legalisation of extends with scalable types Summary: This patch adds legalisation of extensions where the operand of the extend is a legal scalable type but the result is not. EXTRACT_SUBVECTOR is used to split the result, before being replaced by target-specific [S\|U]UNPK[HI\|LO] operations. For example: ``` zext <vscale x 16 x i8> %a to <vscale x 16 x i16> ``` should emit: ``` uunpklo z2.h, z0.b uunpkhi z1.h, z0.b ``` Reviewers: sdesmalen, efriedma, david-arm Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79587	2020-06-05 12:08:42 +01:00
Eric Christopher	8c9badf61d	Replace integer usage with enumeration.	2020-06-03 20:00:28 -07:00
Francesco Petrogalli	febeaf94a8	[llvm][SVE] IR intrinsic for LD1RO. Reviewers: sdesmalen, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80738	2020-06-03 13:57:16 +00:00
Amara Emerson	f573d489b6	[AArch64][GlobalISel] Split G_GLOBAL_VALUE into ADRP + G_ADD_LOW and optimize. The concept of G_GLOBAL_VALUE is nice and simple, but always using it as the representation for global var addressing until selection time creates some problems in optimizing accesses in certain code/relocation models. The problem comes from trying to optimize adrp -> add -> load/store sequences in the most common "small" code model. These accesses can be optimized into an adrp -> load with the add offset being folded into the load's immediate field. If we try to keep all global var references as a single generic instruction then by the time we get to the complex operand trying to match these, we end up generating an adrp at the point of use. The real issue here is that we don't have any form of CSE during selection, so the code size will bloat from many redundant adrp's. This patch custom legalizes small code mode non-GOT G_GLOBALs into target ADRP and a new "target specific generic opcode" G_ADD_LOW. We also teach the localizer to localize these instructions via the custom hook that was added recently. Finally, the complex pattern for indexed loads/stores is extended to try to fold these G_ADD_LOW instructions into the load immediate. On -O0 CTMark, we see a 0.8% geomean code size improvement. We should also see some minor performance improvements too. Differential Revision: https://reviews.llvm.org/D78465	2020-06-01 16:00:56 -07:00
Martin Storsjö	cf97e0ec42	[AArch64] Treat x18 as callee-saved in functions with windows calling convention on non-windows OSes Treat it as callee-saved, and always back it up. When windows code calls entry points in unix code, marked with the windows calling convention, that unix code can call other functions that isn't compiled with -ffixed-x18 which may clobber x18 freely. By backing it up and restoring it on return, we preserve the register across the function call, fulfilling this part of the windows calling convention on another OS. This isn't enough for making sure that x18 is preseved when non-windows code does a callback to windows code, but is a clear improvement over the current status quo. Additionally, wine is nowadays building many modules as PE DLLs, which avoids the callback issue altogether for those DLLs. Differential Revision: https://reviews.llvm.org/D61892	2020-05-30 09:22:09 +03:00
Christopher Tetreault	56eb7556e7	[SVE] Eliminate calls to default-false VectorType::get() from AArch64 Reviewers: efriedma, c-rhodes, david-arm, mcrosier, t.p.northover Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80327	2020-05-29 15:39:30 -07:00
Zequan Wu	80e107ccd0	Add NoMerge MIFlag to avoid MIR branch folding Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given Differential Revision: https://reviews.llvm.org/D79537	2020-05-29 12:31:06 -07:00
David Sherwood	205085d4cc	[CodeGen] Fix warnings in LowerToPredicatedOp When creating a new vector type based on another vector type we should pass in the element count instead of the number of elements and scalable flag separately. I encountered this warning whilst compiling this test: CodeGen/AArch64/sve-intrinsics-int-compares.ll Differential revision: https://reviews.llvm.org/D80621	2020-05-29 15:19:03 +01:00
Ties Stuij	78bd0c0e5e	[AArch64][BFloat] add BFloat instruction support for AArch64 Summary: Add support for lowering various BFloat related SelDAG nodes: - load/store (ldrh/strh) - concat - dup/duplane - bitconvert/bitcast - insert_subvector/insert_subreg This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a The bfloat type, and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile Reviewers: ab, t.p.northover, john.brawn, fpetrogalli, sdesmalen, LukeGeeson Reviewed By: fpetrogalli Subscribers: LukeGeeson, pbarrio, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79712	2020-05-27 15:36:54 +01:00
Ties Stuij	42eba9b40b	[AArch64][BFloat] basic AArch64 bfloat support Summary: This patch adds the bfloat type to the AArch64 backend: - adds it as part of the FPR16 register class - adds bfloat calling conventions - as f16 is now not the only FPR16 type anymore, we need to constrain a number of instruction patterns using FPR16Op to help out the TableGen type inferrer This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a The bfloat type, and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile Reviewers: t.p.northover, c-rhodes, fpetrogalli, sdesmalen, ostannard, LukeGeeson, ab Reviewed By: fpetrogalli Subscribers: pbarrio, LukeGeeson, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79709	2020-05-27 15:26:40 +01:00
Craig Topper	80cc43b420	[AArch64] Set i32 ISD::MULHU/S to Expand instead of Legal. Looks like there are no isel patterns for these. A DAG combine turns it into i64 multiply and a shift which hides this. Extracted from D80485	2020-05-26 00:41:09 -07:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Alexey Lapshin	bf242c067e	[AARCH64][NEON] Allow to sink operands of aarch64_neon_pmull64. Summary: This patch fixes a problem when pmull2 instruction is not generated for vmull_high_p64 intrinsic. ISel has a pattern for int_aarch64_neon_pmull64 intrinsic to generate PMULL2 instruction. That pattern assumes that extraction operations are located in the same basic block. We need to sink them if they are not. Handle operands of int_aarch64_neon_pmull64 into AArch64TargetLowering::shouldSinkOperands. Reviewed by: efriedma Differential Revision: https://reviews.llvm.org/D80320	2020-05-22 01:35:24 +03:00
Eli Friedman	a1ce88b4e3	[AArch64][SVE] Implement AArch64ISD::SETCC_PRED This unifies SETCC operations along the lines of other operations. Differential Revision: https://reviews.llvm.org/D79975	2020-05-15 11:53:21 -07:00
Benjamin Kramer	f242950fdf	Fold single-use variables into assert This avoids unused variable warnings in Release builds.	2020-05-12 15:26:59 +02:00
Sander de Smalen	077d2d6802	[CodeGen][SVE] Add patterns for whole vector predicate select Added patterns to implement `select i1 %p, <vty> %a, <vty> %b` Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D79356	2020-05-12 11:47:39 +01:00
Petre-Ionut Tudor	9682d0d5dc	[ARM] Refactor lower to S[LR]I optimization Summary: The optimization has been refactored to fix certain bugs and limitations. The condition for lowering to S[LR]I has been changed to reflect the manual pseudocode description of SLI and SRI operation. The optimization can now handle more cases of operand type and order. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79233	2020-05-12 11:00:13 +01:00
Craig Topper	d1119980e5	[SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode. This patch stores the alignment for ConstantPoolSDNode as an Align and updates the getConstantPool interface to take a MaybeAlign. Removing getAlignment() will be done as a follow up. Differential Revision: https://reviews.llvm.org/D79436	2020-05-08 16:04:11 -07:00
Kerry McLaughlin	3bcd3dd473	[CodeGen][SVE] Lowering of shift operations with scalable types Summary: Adds AArch64ISD nodes for: - SHL_PRED (logical shift left) - SHR_PRED (logical shift right) - SRA_PRED (arithmetic shift right) Existing patterns for unpredicated left shift by immediate have also been moved into the appropriate multiclasses in SVEInstrFormats.td. Reviewers: sdesmalen, efriedma, ctetreau, huihuiz, rengolin Reviewed By: efriedma Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79478	2020-05-07 11:43:49 +01:00
Eli Friedman	2c8546107a	[AArch64][SVE] Implement lowering for SIGN_EXTEND etc. of SVE predicates. Now using patterns, since there's a single-instruction lowering. (We could convert to VSELECT and pattern-match that, but there doesn't seem to be much point.) I think this might be the first instruction to use nested multiclasses this way? It seems like a good way to reduce duplication between different integer widths. Let me know if it seems like an improvement. Also, while I'm here, fix the return type of SETCC so we don't try to merge a sign-extend with a SETCC. Differential Revision: https://reviews.llvm.org/D79193	2020-05-06 17:56:32 -07:00
Kerry McLaughlin	19f5da9c1d	[SVE][Codegen] Lower legal min & max operations Summary: This patch adds AArch64ISD nodes for [S\|U]MIN_PRED and [S\|U]MAX_PRED, and lowers both SVE intrinsics and IR operations for min and max to these nodes. There are two forms of these instructions for SVE: a predicated form and an immediate (unpredicated) form. The patterns which existed for the latter have been updated to match a predicated node with an immediate and map this to the immediate instruction. Reviewers: sdesmalen, efriedma, dancgr, rengolin Reviewed By: efriedma Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79087	2020-05-04 11:19:19 +01:00
Cullen Rhodes	672b62ea21	[AArch64][SVE] Custom lowering of floating-point reductions Summary: This patch implements custom floating-point reduction ISD nodes that have vector results, which are used to lower the following intrinsics: * llvm.aarch64.sve.fadda * llvm.aarch64.sve.faddv * llvm.aarch64.sve.fmaxv * llvm.aarch64.sve.fmaxnmv * llvm.aarch64.sve.fminv * llvm.aarch64.sve.fminnmv SVE reduction instructions keep their result within a vector register, with all other bits set to zero. Changes in this patch were implemented by Paul Walker and Sander de Smalen. Reviewers: sdesmalen, efriedma, rengolin Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D78723	2020-04-30 10:18:40 +00:00
Kerry McLaughlin	53dd72a87a	[SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics Summary: This patch maps IR operations for sdiv & udiv to the @llvm.aarch64.sve.[s\|u]div intrinsics. A ptrue must be created during lowering as the div instructions have only a predicated form. Patch contains changes by Andrzej Warzynski. Reviewers: sdesmalen, c-rhodes, efriedma, cameron.mcinally, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, andwar, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78569	2020-04-24 11:38:20 +01:00
Christopher Tetreault	84584b0d29	[SVE] Remove calls to isScalable from AARCH64 Reviewers: efriedma, sdesmalen, t.p.northover, mcrosier Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77758	2020-04-23 13:09:17 -07:00
Kerry McLaughlin	17f6e18acf	[AArch64][SVE] Add SVE intrinsic for LD1RQ Summary: Adds the following intrinsic for contiguous load & replicate: - @llvm.aarch64.sve.ld1rq The LD1RQ intrinsic only needs the SImmS16XForm added by this patch. The others (SImmS2XForm, SImmS3XForm & SImmS4XForm) were added for consistency. Reviewers: andwar, sdesmalen, efriedma, cameron.mcinally, dancgr, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76929	2020-04-22 11:29:27 +01:00
Petre-Ionut Tudor	52474992b1	Revert "[ARM] Fix conditions for lowering to S[LR]I" This reverts commit `cabfcf840a`. The patch introduced another bug in the optimization.	2020-04-20 16:11:04 +01:00
Kerry McLaughlin	33ffce5414	[AArch64][SVE] Remove LD1/ST1 dependency on llvm.masked.load/store Summary: The SVE masked load and store intrinsics introduced in D76688 rely on common llvm.masked.load/store nodes. This patch creates new ISD nodes for LD1(S) & ST1 to remove this dependency. Additionally, this adds support for sign & zero extending loads and truncating stores. Reviewers: sdesmalen, efriedma, cameron.mcinally, c-rhodes, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, andwar, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78204	2020-04-20 11:08:11 +01:00
Francesco Petrogalli	897fdec586	[llvm][CodeGen] Addressing modes for SVE stN. This reverts commit `17b1869b72`. It is an attempt to fix the failure reported at The patch differs from the original one reviwed at https://reviews.llvm.org/D77435 only for the use of the std::make_tuple in building the return value of `findAddrModeSVELoadStore`: - return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; + return std::make_tuple(IsRegReg ? Opc_rr : Opc_ri, NewBase, the original patch submitted at `fc4e954ed5` was failing the following build: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420/ with error: /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10: error: chosen constructor is explicit in copy-initialization return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements) ^ 1 error generated.	2020-04-17 20:35:35 +01:00
Francesco Petrogalli	17b1869b72	Revert "[llvm][CodeGen] Addressing modes for SVE stN." This reverts commit `fc4e954ed5`. The commit reported the following failure: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420 FAILED: lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o /usr/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64 -I/usr/include/libxml2 -Iinclude -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/include -mthumb -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -fvisibility=hidden -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MMD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -c /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10: error: chosen constructor is explicit in copy-initialization return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements)	2020-04-17 20:03:11 +01:00
Francesco Petrogalli	fc4e954ed5	[llvm][CodeGen] Addressing modes for SVE stN. Reviewers: efriedma, sdesmalen, c-rhodes, ctetreau Reviewed By: c-rhodes Subscribers: tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77435	2020-04-17 19:31:44 +01:00
Francesco Petrogalli	48879c02bf	[llvm][CodeGen] Fix issue for SVE gather prefetch. Summary: This change is fixing an issue where the dagcombine incorrectly used an addressing mode with scaled offsets (indices), instead of unscaled offsets. Those addressing modes do not exist for `prfh` , `prfw` and `prfd`, hence we can reuse `prfb` because that has unscaled offsets, and because the pseudo-code in the XML spec suggests that the element size is not used for the amount of data that is prefetched by the instruction. FWIW, GCC also emits a `prfb` for these cases. Reviewers: sdesmalen, andwar, rengolin Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78069	2020-04-17 19:23:28 +01:00
Benjamin Kramer	d1ef44982f	[AArch64] Fold one-use variables into assert Avoids unused variable warnings in Release builds.	2020-04-17 19:43:06 +02:00
Petre-Ionut Tudor	cabfcf840a	[ARM] Fix conditions for lowering to S[LR]I Summary: Fixed wrong conditions for generating (S[LR]I X, Y, C2) from (or (and X, BvecC1), (lsl Y, C2)) and added ISel nodes to lower to S[LR]I. The optimisation is also enabled by default now. Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77387	2020-04-17 17:19:24 +01:00
Benjamin Kramer	166467e822	[VectorUtils] Create shufflevector masks as int vectors instead of Constants No functionality change intended.	2020-04-17 15:28:00 +02:00
Francesco Petrogalli	89680f25e8	[llvm][CodeGen] Rename SVE gather prefetch intrinsics. [NFC] Summary: The renaming is necessary to make the naming scheme uniform with other gather/scatter load/stores SVE intrinsics. The naming of variables and functions have been adapted to make it explicit whether we are dealing with a scalar offset (which is unscaled) or an index (which is scaled according to the data type of the lanes of the vector). Reviewers: andwar, sdesmalen, rengolin Reviewed By: andwar Subscribers: tschuett, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77839	2020-04-15 21:49:16 +01:00
Christopher Tetreault	05a079895c	[SVE] Remove calls to getBitWidth from AArch64 Reviewers: efriedma Reviewed By: efriedma Subscribers: danielkiss, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77905	2020-04-14 10:26:37 -07:00
Craig Topper	113f37a1f9	[CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase Differential Revision: https://reviews.llvm.org/D77995	2020-04-13 13:50:15 -07:00
Christopher Tetreault	9f87d951fc	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: mcrosier, efriedma, sdesmalen Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77269	2020-04-09 16:43:29 -07:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Eli Friedman	dacf8d3562	[AArch64][SVE] Add support for fcmp. This also requires support for boolean "not", so I added boolean logic while I was there. Differential Revision: https://reviews.llvm.org/D76901	2020-03-31 12:04:39 -07:00
Kerry McLaughlin	05606329e2	[AArch64][SVE] Add SVE intrinsics for masked loads & stores Summary: Implements the following intrinsics for contiguous loads & stores: - @llvm.aarch64.sve.ld1 - @llvm.aarch64.sve.st1 Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin Reviewed By: cameron.mcinally Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76688	2020-03-25 11:48:40 +00:00
Amara Emerson	472d282046	[AArch64][GlobalISel] Don't localize TLS G_GLOBAL_VALUEs on Darwin. On Darwin these need to be selected into a function call for the TLS address lookup. As a result, they can't be moved below a physreg write, which happens in call sequences. In the long term, we should have some mechanism in the localizer to prevent localizing into target-specific atomic instruction sequences. rdar://60056248 Differential Revision: https://reviews.llvm.org/D76652	2020-03-24 13:35:50 -07:00
Andrzej Warzynski	0ea4fb5bb7	[AArch64][SVE] Rename intrinsics for gather prefetch [NFC] Summary: In order to keep the names consistent with other SVE gather loads, the intrinsics for gather prefetch are renamed as follows: * @llvm.aarch64.sve.gather.prfb -> @llvm.aarch64.sve.prfb.gather Reviewed by: fpetrogalli Differential Revision: https://reviews.llvm.org/D76421	2020-03-19 12:53:36 +00:00
Sander de Smalen	4788ca450f	[AArch64][SVE] Change pointer type of nontemporal load/store intrinsics Summary: This fixes a discrepancy between the non-temporal loads/store intrinsics and other SVE load intrinsics (such as nf/ff), so that Clang can use the same code to generate these intrinsics. Reviewers: andwar, kmclaughlin, rengolin, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76237	2020-03-18 12:44:51 +00:00
Vitaly Buka	f20dcc31e3	Fix unused function warning	2020-03-16 19:45:36 -07:00
Benjamin Kramer	05ff3323e0	[AArch64] Remove unused variable	2020-03-16 21:59:55 +01:00
Francesco Petrogalli	0f2b68d9c7	Implement IR intrinsics for gather prefetch. Summary: Intrinsics and relative codegen has been implemented for the following SVE instructions: 1. PRF<T> <prfop>, <Pg>, [<Xn\|SP>, <Zm>.S, <mod>] -> 32-bit scaled offset 2. PRF<T> <prfop>, <Pg>, [<Xn\|SP>, <Zm>.D, <mod>] -> 32-bit unpacked scaled offset 3. PRF<T> <prfop>, <Pg>, [<Xn\|SP>, <Zm>.D] -> 64-bit scaled offset 4. PRF<T> <prfop>, <Pg>, [<Zn>.S{, #<imm>}] -> 32-bit element 5. PRF<T> <prfop>, <Pg>, [<Zn>.D{, #<imm>}] -> 64-bit element The instructions are associated the following intrinsics, respectively: 1. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx4vi32( i8* %base, <vscale x 4 x i32> %offset, <vscale x 4 x i1> %Pg, i32 %prfop) 2. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx2vi32( i8* %base, <vscale x 2 x i32> %offset, <vscale x 2 x i1> %Pg, i32 %prfop) 3. void @llvm.aarch64.sve.gather.prf<T>.scaled.nx2vi64( i8* %base, <vscale x 2 x i64> %offset, <vscale x 2 x i1> %Pg, i32 %prfop) 4. void @llvm.aarch64.sve.gather.prf<T>.nx4vi32( <vscale x 4 x i32> %bases, i64 %imm, <vscale x 4 x i1> %Pg, i32 %prfop) 5. void @llvm.aarch64.sve.gather.prf<T>.nx2vi64( <vscale x 2 x i64> %bases, i64 %imm, <vscale x 2 x i1> %Pg, i32 %prfop) The intrinsics are the IR counterpart of the following SVE ACLE functions: * void svprf<T>(svbool_t pg, const void base, svprfop op) void svprf<T>_vnum(svbool_t pg, const void base, int64_t vnum, svprfop op) void svprf<T>_gather[_u32base](svbool_t pg, svuint32_t bases, svprfop op) * void svprf<T>_gather[_u64base](svbool_t pg, svuint64_t bases, svprfop op) * void svprf<T>_gather_[s32]offset(svbool_t pg, const void base, svint32_t offsets, svprfop op) void svprf<T>_gather_[u32]offset(svbool_t pg, const void base, svint32_t offsets, svprfop op) void svprf<T>_gather_[s64]offset(svbool_t pg, const void base, svint64_t offsets, svprfop op) void svprf<T>_gather_[u64]offset(svbool_t pg, const void base, svint64_t offsets, svprfop op) void svprf<T>_gather[_u32base]_offset(svbool_t pg, svuint32_t bases, int64_t offset, svprfop op) * void svprf<T>_gather[_u64base]_offset(svbool_t pg, svuint64_t bases,int64_t offset, svprfop op) Reviewers: andwar, sdesmalen, efriedma, rengolin Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75580	2020-03-16 18:52:35 +00:00
Andrzej Warzynski	a0c15ed460	[AArch64][SVE] Add the @llvm.aarch64.sve.dup.x intrinsic Summary: This intrinsic implements the unpredicated duplication of scalar values and is mapped to (through ISD::SPLAT_VECTOR): * DUP <Zd>.<T>, #<imm> * DUP <Zd>.<T>, <R><n\|SP> Reviewed by: sdesmalen Differential Revision: https://reviews.llvm.org/D75900	2020-03-13 12:40:22 +00:00
Andrzej Warzynski	46b9f14d71	[AArch64][SVE] Add intrinsics for non-temporal scatters/gathers Summary: This patch adds the following intrinsics for non-temporal gather loads and scatter stores: * aarch64_sve_ldnt1_gather_index * aarch64_sve_stnt1_scatter_index These intrinsics implement the "scalar + vector of indices" addressing mode. As opposed to regular and first-faulting gathers/scatters, there's no instruction that would take indices and then scale them. Instead, the indices for non-temporal gathers/scatters are scaled before the intrinsics are lowered to `ldnt1` instructions. The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as placeholders so that we can easily identify the cases implemented in this patch in performGatherLoadCombine and performScatterStoreCombined. Once encountered, they are replaced with: * GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1 * SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1 The patterns for lowering ISD::SHL for scalable vectors (required by this patch) were missing, so these are added too. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D75601	2020-03-12 13:55:56 +00:00
Andrzej Warzynski	a9f1583228	[AArch64][SVE] Add the @llvm.aarch64.sve.sel intrinsic Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75928	2020-03-11 17:05:21 +00:00
Djordje Todorovic	c15c68abdc	[CallSiteInfo] Enable the call site info only for -g + optimizations Emit call site info only in the case of '-g' + 'O>0' level. Differential Revision: https://reviews.llvm.org/D75175	2020-03-09 12:12:44 +01:00
Andrzej Warzynski	9249f60602	[AArch64][SVE] Add intrinsics for non-temporal gather-loads/scatter-stores Summary: This patch adds the following LLVM IR intrinsics for SVE: 1. non-temporal gather loads * @llvm.aarch64.sve.ldnt1.gather * @llvm.aarch64.sve.ldnt1.gather.uxtw * @llvm.aarch64.sve.ldnt1.gather.scalar.offset 2. non-temporal scatter stores * @llvm.aarch64.sve.stnt1.scatter * @llvm.aarch64.sve.ldnt1.gather.uxtw * @llvm.aarch64.sve.ldnt1.gather.scalar.offset These intrinsic are mapped to the corresponding SVE instructions (example for half-words, zero-extending): * ldnt1h { z0.s }, p0/z, [z0.s, x0] * stnt1h { z0.s }, p0/z, [z0.s, x0] Note that for non-temporal gathers/scatters, the SVE spec defines only one instruction type: "vector + scalar". For this reason, we swap the arguments when processing intrinsics that implement the "scalar + vector" addressing mode: * @llvm.aarch64.sve.ldnt1.gather * @llvm.aarch64.sve.ldnt1.gather.uxtw * @llvm.aarch64.sve.stnt1.scatter * @llvm.aarch64.sve.ldnt1.gather.uxtw In other words, all intrinsics for gather-loads and scatter-stores implemented in this patch are mapped to the same load and store instruction, respectively. The sve2_mem_gldnt_vs multiclass (and it's counterpart for scatter stores) from SVEInstrFormats.td was split into: * sve2_mem_gldnt_vec_vs_32_ptrs (32bit wide base addresses) * sve2_mem_gldnt_vec_vs_62_ptrs (64bit wide base addresses) This is consistent with what we did for @llvm.aarch64.sve.ld1.scalar.offset and highlights the actual split in the spec and the implementation. Reviewed by: sdesmalen Differential Revision: https://reviews.llvm.org/D74858	2020-03-02 10:38:28 +00:00
Andrzej Warzynski	fa9439fac8	[AArch64][SVE] Add intrinsics for first-faulting gather loads Summary: The following intrinsics are added: * @llvm.aarch64.sve.ldff1.gather * @llvm.aarch64.sve.ldff1.gather.index * @llvm.aarch64.sve.ldff1.gather_sxtw * @llvm.aarch64.sve.ldff1.gather.uxtw * @llvm.aarch64.sve.ldff1.gather_sxtw.index * @llvm.aarch64.sve.ldff1.gather.uxtw.index * @llvm.aarch64.sve.ldff1.gather.scalar.offset Although this patch is quite substantial, the vast majority of the implementation is just a 'copy & paste' of the implementation of regular gather loads, including tests. There's only a handful of new definitions: * AArch64ISD nodes defined in AArch64ISelLowering.h (e.g. GLDFF1) * Seleciton DAG Types in AArch64SVEInstrInfo.td (e.g. AArch64ldff1_gather) * intrinsics in IntrinsicsAArch64.td (e.g. aarch64_sve_ldff1_gather) * Pseudo instructions in SVEInstrFormats.td to workaround the issue of use-before-def for the FFR register. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D75128	2020-02-27 12:56:33 +00:00
Sjoerd Meijer	13db7490fa	[AArch64] Peephole optimization: merge AND and TST instructions In some cases Clang does not perform merging of instructions AND and TST (aka ANDS xzr). Example: tst x2, x1 and x3, x2, x1 to: ands x3, x2, x1 This patch add such merging during instruction selection: when AND is replaced with ANDS instruction in LowerSELECT_CC, all users of AND also should be changed for using this ANDS instruction Short discussion on mailing list: http://llvm.1065342.n5.nabble.com/llvm-dev-ARM-Peephole-optimization-instructions-tst-add-tp133109.html Patch by Pavel Kosov. Differential Revision: https://reviews.llvm.org/D71701	2020-02-27 09:23:47 +00:00
Craig Topper	735d27dc40	[SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output the ISD::FLT_ROUNDS_ This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order. This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code. Differential Revision: https://reviews.llvm.org/D75132	2020-02-25 16:58:23 -08:00
Andrzej Warzynski	cff90c938b	[AArch64][SVE] Update names and comments for gathers/scatters (NFC) Summary: This patch renames functions and TableGen classes for SVE gathers and scatters. The original names implied that the corresponding methods/classes are only suited for regular gathers/scatters (i.e. LD1 and ST1), which is not the case. Indeed, we will be re-using them for non-temporal and first-faulting gathers/scatters in the forthcoming patches. The new names also highlight the split into Vector-Scalar (VS) and Scalar-Vector (SV) cases. List of changes: * `performLD1GatherCombine` and `performST1ScatterCombine` are renamed as `performGatherLoadCombine` and `performScatterStoreCombine`, respectively. * Selection DAG types for scatters and gathers from AArch64SVEInstrInfo.td are renamed. For example, `SDT_AArch64_GLD1` is renamed as `SDT_AArch64_GATHER_SV`. SV stands for Scalar-Vector, as opposed to Vector-Scalar (VS). * The intrinsic classes from IntrinsicsAArch64.td are renamed. For example, `AdvSIMD_GatherLoad_64bitOffset_Intrinsic` is renamed as `AdvSIMD_GatherLoad_SV_64b_Offsets_Intrinsic`. * Updated comments in `performGatherLoadCombine` and `performScatterStoreCombine`. Reviewers: sdesmalen, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75035	2020-02-25 11:09:01 +00:00
Cullen Rhodes	72848f26b4	[AArch64][SVE] Add predicate reinterpret intrinsics Summary: Implements the following intrinsics: * llvm.aarch64.sve.convert.to.svbool * llvm.aarch64.sve.convert.from.svbool For converting the ACLE svbool_t type (<n x 16 x i1>) to and from the other predicate types: <n x 8 x i1>, <n x 4 x i1> and <n x 2 x i1>. Reviewers: sdesmalen, kmclaughlin, efriedma, dancgr, rengolin Reviewed By: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74471	2020-02-25 10:24:06 +00:00
Kerry McLaughlin	f87f23c81c	[AArch64][SVE] Add the SVE dupq_lane intrinsic Summary: Implements the @llvm.aarch64.sve.dupq.lane intrinsic. As specified in the ACLE, the behaviour of: svdupq_lane_u64(data, index) ...is identical to: svtbl(data, svadd_x(svptrue_b64(), svand_x(svptrue_b64(), svindex_u64(0, 1), 1), index * 2)) If the index is in the range [0,3], the operation is equivalent to a single DUP (.q) instruction. Reviewers: sdesmalen, c-rhodes, cameron.mcinally, efriedma, dancgr, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74734	2020-02-24 13:59:47 +00:00
Craig Topper	9bbf271fc9	[AArch64] Move isOverflowIntrOpRes help function to the ISD namespace in SelectionDAG.h. NFC Enables sharing with an upcoming X86 change.	2020-02-20 08:50:17 -08:00

... 5 6 7 8 9 ...

1445 Commits