llvm-project

Author	SHA1	Message	Date
David Sherwood	88bbd30736	[SVE][CodeGen] Fix issues with EXTRACT_SUBVECTOR when using scalable FP vectors In this patch I have fixed two issues: 1. Our SVE tuple get/set intrinsics were using the wrong constant type for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the function SelectionDAG::getVectorIdxConstant to create the value. Also, I have updated the documentation for EXTRACT_SUBVECTOR describing what type the constant index should be and we now enforce this when creating the node. 2. The AArch64 backend was missing the appropriate patterns for extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types. I have added them as part of this patch. The only way that I could find to test the new patterns was to use the SVE tuple get intrinsics, although I realise it looks a bit unusual. Tests added here: test/CodeGen/AArch64/sve-extract-subvector.ll Differential Revision: https://reviews.llvm.org/D85516	2020-08-12 08:35:46 +01:00
Paul Walker	b6c7b7fa31	[SVE] Add ISD nodes for predicated integer extend inreg operations. These are useful instructions when lowering fixed length vector extends, so I've broken this patch out as kind of NFC like work. Differential Revision: https://reviews.llvm.org/D85546	2020-08-11 11:39:26 +01:00
Paul Walker	d542feb8e4	[SVE] Lower fixed length vector integer subtract operations. Differential Revision: https://reviews.llvm.org/D85665	2020-08-11 11:32:12 +01:00
Paul Walker	0d33a8ef5b	[SVE] Lower scalable vector mul operations. This allows us to remove extra patterns from AArch64SVEInstrInfo.td because we can reuse those required for fixed length vectors. Differential Revision: https://reviews.llvm.org/D85328	2020-08-06 11:15:35 +01:00
Paul Walker	3ed59b775d	[SVE] Implement lowering for fixed length vector multiplication. NOTE: Also uses SVE code generation for NEON size vectors, instead of expanding i64 based vector multiplications. Differential Revision: https://reviews.llvm.org/D85327	2020-08-06 11:01:39 +01:00
Paul Walker	927fc536ca	[SVE] Add lowering for fixed length vector and, or & xor operations. Since there are no ill effects when performing these operations with undefined elements, they are lowered to the already supported unpredicated scalable vector equivalents. Differential Revision: https://reviews.llvm.org/D85117	2020-08-05 11:28:34 +01:00
Sander de Smalen	f2916636f8	[AArch64][SVE] Disable tail calls if callee does not preserve SVE regs. This fixes an issue triggered by the following code, where emitEpilogue got confused when trying to restore the SVE registers after the call, whereas the call to non_sve() is implemented as a TCReturn: `int non_sve(); int sve(svint32_t x) { return non_sve(); }` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84869	2020-08-05 09:38:54 +01:00
Eli Friedman	95efea4b93	[AArch64][SVE] Widen narrow sdiv/udiv operations. The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit integers. If we see an 8-bit or 16-bit divide, widen the operands to 32 bits, and narrow the result. Differential Revision: https://reviews.llvm.org/D85170	2020-08-04 13:22:15 -07:00
Paul Walker	4be13b15d6	[SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants. This is the final bit of work to relax the register allocation requirements when code generating normal LLVM IR, which rarely cares about the result of inactive lanes. By using _PRED nodes we can make better use of SVE's reversed instructions. Also removes a redundant parameter from the min/max tests. Differential Revision: https://reviews.llvm.org/D85142	2020-08-04 11:19:17 +01:00
David Sherwood	23ad660b5d	[SVE][CodeGen] At -O0 fallback to DAG ISel when translating alloca with scalable types When building code at -O0 we weren't falling back to DAG ISel correctly when encountering alloca instructions with scalable vector types. This is because the alloca has no operands that are scalable. I've fixed this by adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca instructions with scalable types. Differential Revision: https://reviews.llvm.org/D84746	2020-07-30 08:40:53 +01:00
Eli Friedman	c02aa53ecb	[AArch64][SVE] Add "fast" fcmp operations. `dacf8d3` added support for most fcmp operations, but there are some extra variations I hadn't considered: SelectionDAG supports float comparisons that are neither ordered nor unordered. Add support for the missing operations. Differential Revision: https://reviews.llvm.org/D84460	2020-07-24 13:22:41 -07:00
Francesco Petrogalli	809600d664	[llvm][sve] Reg + Imm addressing mode for ld1ro. Reviewers: kmclaughlin, efriedma, sdesmalen Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83357	2020-07-24 17:48:47 +00:00
Petre-Ionut Tudor	1af9fc8213	[ARM] Generate [SU]HADD from ((a + b) >> 1) Summary: Teach LLVM to recognize the above pattern, where the operands are either signed or unsigned types. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83777	2020-07-21 13:22:07 +01:00
Eli Friedman	b8f765a1e1	[AArch64][SVE] Add support for trunc to <vscale x N x i1>. This isn't a natively supported operation, so convert it to a mask+compare. In addition to the operation itself, fix up some surrounding stuff to make the testcase work: we need concat_vectors on i1 vectors, we need legalization of i1 vector truncates, and we need to fix up all the relevant uses of getVectorNumElements(). Differential Revision: https://reviews.llvm.org/D83811	2020-07-20 13:11:02 -07:00
Paul Walker	6384ec4099	[SVE] Add lowering for fixed length vector fdiv, fma, fmul and fsub operations. Differential Revision: https://reviews.llvm.org/D84034	2020-07-20 11:57:34 +00:00
Tim Northover	88464a55b4	AArch64: emit @llvm.debugtrap as `brk #0xf000` on all platforms It's useful for a debugger to be able to distinguish an @llvm.debugtrap from a (noreturn) @llvm.trap, so this extends the existing Windows behaviour to other platforms.	2020-07-20 10:31:26 +01:00
Paul Walker	509351d768	[SVE] Add lowering for scalable vector fadd, fdiv, fmul and fsub operations. Lower the operations to predicated variants. This is prep work required for fixed length code generation but also fixes a bug whereby these operations fail selection when "unpacked" vector types (e.g. MVT::nxv2f32) are used. This patch also adds the missing "unpacked" patterns for FMA. Differential Revision: https://reviews.llvm.org/D83765	2020-07-16 11:31:35 +00:00
Paul Walker	319a97b5e2	[SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal. Differential Revision: https://reviews.llvm.org/D83568	2020-07-13 11:16:30 +00:00
Paul Walker	f78e6a3095	[SVE] Code generation for fixed length vector truncates. Lower fixed length vector truncates to a sequence of SVE UZP1 instructions. Differential Revision: https://reviews.llvm.org/D83395	2020-07-10 10:37:19 +00:00
Eli Friedman	56ae2cebcd	[AArch64][SVE] Add lowering for llvm.fma. This is currently bare-bones; we aren't taking advantage of any of the FMA variant instructions. But it's enough to at least generate code. Differential Revision: https://reviews.llvm.org/D83444	2020-07-09 16:12:41 -07:00
Paul Walker	614fb09645	[SVE] Disable some BUILD_VECTOR related code generator features. Fixed length vector code generation for SVE does not yet custom lower BUILD_VECTOR and instead relies on expansion. At the same time custom lowering for VECTOR_SHUFFLE is also not available so this patch updates isShuffleMaskLegal to reject vector types that require SVE. Related to this it also prevents the merging of stores after legalisation because this only works when BUILD_VECTOR is either legal or can be eliminated. When this is not the case the code generator enters an infinite legalisation loop. Differential Revision: https://reviews.llvm.org/D83408	2020-07-09 10:47:04 +00:00
Paul Walker	fb75451775	[SVE] Custom ISel for fixed length extract/insert_subvector. We use extract_subvector and insert_subvector to "cast" between fixed length and scalable vectors. This patch adds custom C++ based ISel for the following cases: `fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0` and `scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0`, which result in either EXTRACT_SUBREG/INSERT_SUBREG for NEON sized vectors or COPY_TO_REGCLASS otherwise. Differential Revision: https://reviews.llvm.org/D82871	2020-07-08 09:49:28 +00:00
Petre-Ionut Tudor	af80a4353e	[ARM] Generate [SU]RHADD from (b - (~a)) >> 1 Summary: Teach LLVM to recognize the above pattern, which is usually a transformation of (a + b + 1) >> 1, where the operands are either signed or unsigned types. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82669	2020-07-03 16:00:06 +01:00
Guillaume Chatelet	87e2751cf0	[Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D83082	2020-07-03 08:06:43 +00:00
Sander de Smalen	143e324e75	[CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics, that shouldn't really have been there (I suspect this was a remnant from when we expected the wider vector always to have come from a vector CONCAT). When I tried to create a more minimal reproducer, I found a bug in DAGCombiner where it drops the scalable flag when trying to fold: extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index') This patch fixes both issues. Reviewers: david-arm, efriedma, spatel Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D82910	2020-07-02 10:16:43 +01:00
Guillaume Chatelet	ef36f5143d	[Alignment] TargetLowering::hasPairedLoad must use Align for RequiredAlignment As per documentation of `hasPairedLoad`: "`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load." In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned. There is only one implementor of this interface. Differential Revision: https://reviews.llvm.org/D82958	2020-07-01 14:32:30 +00:00
Paul Walker	a1aed80a35	[SVE] Relax merge requirement for IR based divides. We currently lower SDIV to SDIV_MERGE_OP1. This forces the value for inactive lanes in a way that can hamper register allocation; however, the lowering has no requirement for inactive lanes. Instead this patch replaces SDIV_MERGE_OP1 with SDIV_PRED thus freeing the register allocator. Once done the only user of SDIV_MERGE_OP1 is intrinsic lowering so I've removed the node and perform ISel on the intrinsic directly. This also allows us to implement MOVPRFX based zeroing in the same manner as SUB. This patch also renames UDIV_MERGE_OP1 and [F]ADD_MERGE_OP1 for the same reason but in the ADD cases the ISel code is already as required. Differential Revision: https://reviews.llvm.org/D82783	2020-07-01 08:18:42 +00:00
Guillaume Chatelet	28de229bc6	[Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82894	2020-07-01 07:28:11 +00:00
Christopher Tetreault	ab35ba5742	[SVE] Remove calls to VectorType::getNumElements from AArch64 Reviewers: efriedma, paquette, david-arm, kmclaughlin Reviewed By: david-arm Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82214	2020-06-30 11:17:50 -07:00
Guillaume Chatelet	4f5133a4dc	[Alignment][NFC] Migrate AArch64, ARM, Hexagon, MSP and NVPTX backends to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82749	2020-06-30 07:56:17 +00:00
Sander de Smalen	39f6a36a24	[AArch64][SVE] NFCI: Choose consistent naming for predicated SDAG nodes This patch proposes a naming convention for operations that take a general predicate (and are thus predicated) that specifies what happens to the false lanes. Currently the _PRED suffix is used, which doesn't really say much other than that it takes a predicate. In some instances this means it has merging predication and in other cases it means zeroing-predication. This patch also changes the order of operands to AArch64ISD::DUP_MERGE_PASSTHRU, to pass the predicate as the first operand, which is in line with all other predicated nodes. It takes the passthru value as an explicit passthru value, which is always passed as the last operand. Reviewers: paulwalker-arm, cameron.mcinally, eli.friedman, dancgr, efriedma Reviewed By: paulwalker-arm Tags: #llvm Differential Revision: https://reviews.llvm.org/D81850	2020-06-29 13:37:30 +01:00
Kerry McLaughlin	bb6603f013	[AArch64][SVE] Bail out of performPostLD1Combine for scalable types Summary: performPostLD1Combine will introduce either a LD1LANEpost or LD1DUPpost node, which will cause selection failure if the return type is a scalable vector. Reviewers: sdesmalen, c-rhodes, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82670	2020-06-29 11:59:53 +01:00
Simon Pilgrim	973685fc78	[TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.	2020-06-29 11:46:58 +01:00
Paul Walker	3a98d5d7e7	[SVE] Code generation for fixed length vector adds. Summary: Teach LowerToPredicatedOp to lower fixed length vector operations. Add AArch64ISD nodes and isel patterns for predicated integer and floating point adds. Together this enables SVE code generation for fixed length vector adds. Reviewers: rengolin, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82483	2020-06-26 19:54:41 +00:00
Kerry McLaughlin	6b313f198c	[AArch64][SVE] Remove asserts from AArch64ISelLowering for bfloat16 types Remove the asserts in performLDNT1Combine & performST[NT]1Combine to ensure we get a failure where the type is a bfloat16 and hasBF16() is false, regardless of whether asserts are enabled.	2020-06-26 14:51:27 +01:00
Cullen Rhodes	4319c48fc7	[AArch64][SVE] Only support sizeless bfloat types if supported by subtarget Reviewers: sdesmalen, efriedma, kmclaughlin, fpetrogalli Reviewed By: sdesmalen, fpetrogalli Differential Revision: https://reviews.llvm.org/D82494	2020-06-26 12:37:47 +00:00
Kerry McLaughlin	edcfef8fee	[AArch64][SVE] Add bfloat16 support to store intrinsics Summary: Bfloat16 support added for the following intrinsics: - ST1 - STNT1 Reviewers: sdesmalen, c-rhodes, fpetrogalli, efriedma, stuij, david-arm Reviewed By: fpetrogalli Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D82448	2020-06-26 11:05:56 +01:00
Kerry McLaughlin	0ccfe1b267	[AArch64][SVE] Predicate bfloat16 load patterns with HasBF16 Reviewers: sdesmalen, c-rhodes, efriedma, fpetrogalli Reviewed By: fpetrogalli Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82464	2020-06-26 10:38:24 +01:00
Eli Friedman	e9d4e34ab8	[AArch64][SVE] Add legalization support for i32/i64 vector srem/urem Implement them on top of sdiv/udiv, similar to what we do for integer types. Potential future work: implementing i8/i16 srem/urem, optimizations for constant divisors, optimizing the mul+sub to mls. Differential Revision: https://reviews.llvm.org/D81511	2020-06-23 16:27:52 -07:00
Paul Walker	499c63288f	[SVE] Code generation for fixed length vector loads & stores. Summary: This patch adds base support for code generating fixed length vector operations targeting a known SVE vector length. To achieve this we lower fixed length vector operations to equivalent scalable vector operations, whereby SVE predication is used to limit the elements processed to those present within the fixed length vector. Specifically this patch implements load and store operations, which get lowered to their masked counterparts thusly: `V = load(Addr)` => `V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))` and `store(V, Addr)` => `masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr)` Reviewers: rengolin, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80385	2020-06-23 09:39:03 +00:00
David Sherwood	584d0d5c17	[SVE] Fall back on DAG ISel at -O0 when encountering scalable types At the moment we use Global ISel by default at -O0; however, it is currently not capable of dealing with scalable vectors for two reasons: 1. The register banks know nothing about SVE registers. 2. The LLT (Low Level Type) class knows nothing about scalable vectors. For now, the easiest way to avoid users hitting issues when using the SVE ACLE is to fall back on normal DAG ISel when encountering instructions that operate on scalable vector types. I've added a couple of RUN lines to existing SVE tests to ensure we can compile at -O0. I've also added some new tests to CodeGen/AArch64/GlobalISel/arm64-fallback.ll that demonstrate we correctly fall back to DAG ISel at -O0 when lowering formal arguments or translating instructions that involve scalable vector types. Differential Revision: https://reviews.llvm.org/D81557	2020-06-19 10:57:00 +01:00
David Sherwood	0dc28af219	[CodeGen,AArch64] Fix up warnings in performExtendCombine Try to avoid calling getVectorNumElements() or relying upon the TypeSize conversion to uint64_t. Differential Revision: https://reviews.llvm.org/D81573	2020-06-19 10:34:51 +01:00
Paul Walker	4612f39120	[SVE] Add flag to specify SVE register size, using this to calculate legal vector types. Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE data registers (in bits) to be specified. This allows the code generator to make assumptions it normally couldn't. As a starting point this information is used to mark fixed length vector types that can fit within the specified size as legal. Reviewers: rengolin, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80384	2020-06-18 12:11:16 +00:00
Nikita Popov	7cac7e0cfc	[IR] Prefer hasFnAttribute() where possible (NFC) When checking for an enum function attribute, use hasFnAttribute() rather than hasAttribute() at FunctionIndex, because it is significantly faster (and more concise to boot).	2020-06-15 09:30:35 +02:00
Shawn Landden	9ec57cce62	[AArch64] custom lowering for i128 popcount halves the number of CNT instructions generated	2020-06-10 09:44:16 +04:00
Henry Kao	4dcc0d1958	[CodeGen][SVE] Avoid scalarizing zero splat stores on scalable vectors. Summary: Implemented in replaceZeroVectorStore(). Fixes several warnings in AArch64 SVE unit tests. Reviewers: sdesmalen, kmclaughlin, dancgr, efriedma, each, andwar, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80824	2020-06-09 12:52:39 -04:00
Guillaume Chatelet	3b6196c9b3	[Alignment][NFC] TargetLowering::allowsMisalignedMemoryAccesses Summary: Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMisalignedMemoryAccesses` without marking it override. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81374	2020-06-09 10:17:42 +00:00
Cullen Rhodes	b82be5db71	[AArch64][SVE] Implement structured load intrinsics Summary: This patch adds initial support for the following intrinsics: * llvm.aarch64.sve.ld2 * llvm.aarch64.sve.ld3 * llvm.aarch64.sve.ld4 For loading two, three and four vectors' worth of data. Basic codegen is implemented with reg+reg and reg+imm addressing modes being addressed in a later patch. The types returned by these intrinsics have a number of elements that is a multiple of the elements in a 128-bit vector for a given type and N, where N is the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements the types are: LD2 : <vscale x 8 x i32> LD3 : <vscale x 12 x i32> LD4 : <vscale x 16 x i32> This is implemented with target-specific intrinsics for each variant that take the same operands as the IR intrinsic but return N values, where the type of each value is a full vector, i.e. <vscale x 4 x i32> in the above example. These values are then concatenated using the standard concat_vector intrinsic to maintain type legality with the IR. These intrinsics are intended for use in the Arm C Language Extension (ACLE). Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D75751	2020-06-09 08:51:58 +00:00
David Sherwood	30dfbf03a2	[CodeGen,AArch64] Fix up warnings in splitStores The code for trying to split up stores is designed for NEON vectors, where we support arbitrary alignments. It's an optimisation designed to improve performance by using smaller, aligned stores. However, we currently only support 16 byte alignments for SVE vectors anyway so we may as well bail out early. This change fixes up remaining warnings in a couple of tests: CodeGen/AArch64/sve-callbyref-notailcall.ll CodeGen/AArch64/sve-calling-convention-byref.ll Differential Revision: https://reviews.llvm.org/D80720	2020-06-09 07:39:38 +01:00
David Sherwood	41fb119e8c	[CodeGen] Fix nullptr crash in tryConvertSVEWideCompare When the input to a wide compare instruction is a DUP or SPLAT_VECTOR node we should deal with cases where the DUP/SPLAT_VECTOR input operand is not an immediate value. I've fixed the code to return SDValue() in such cases and added a couple of tests - one each to represent the signed and unsigned cases. Differential Revision: https://reviews.llvm.org/D81167	2020-06-08 15:20:18 +01:00
Cullen Rhodes	3ebbe35363	[AArch64][SVE] Implement vector tuple intrinsics Summary: This patch adds the following intrinsics for creating two-tuple, three-tuple and four-tuple scalable vectors: * llvm.aarch64.sve.tuple.create2 * llvm.aarch64.sve.tuple.create3 * llvm.aarch64.sve.tuple.create4 As well as: * llvm.aarch64.sve.tuple.get * llvm.aarch64.sve.tuple.set For extracting and inserting scalable vectors from vector tuples. These intrinsics are intended to be used by the ACLE functions svcreate<n>, svget and svset. This patch also includes calling convention support for passing and returning tuples of scalable vectors to/from functions. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D75674	2020-06-08 11:09:55 +00:00
Kadir Cetinkaya	6bad8b07e6	[llvm][AArch64] Fix unused variable	2020-06-05 15:56:19 +02:00
Kerry McLaughlin	89fc0166f5	[CodeGen][SVE] Legalisation of extends with scalable types Summary: This patch adds legalisation of extensions where the operand of the extend is a legal scalable type but the result is not. EXTRACT_SUBVECTOR is used to split the result, before being replaced by target-specific [S|U]UNPK[HI|LO] operations. For example, `zext <vscale x 16 x i8> %a to <vscale x 16 x i16>` should emit `uunpklo z2.h, z0.b` followed by `uunpkhi z1.h, z0.b`. Reviewers: sdesmalen, efriedma, david-arm Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79587	2020-06-05 12:08:42 +01:00
Eric Christopher	8c9badf61d	Replace integer usage with enumeration.	2020-06-03 20:00:28 -07:00
Francesco Petrogalli	febeaf94a8	[llvm][SVE] IR intrinsic for LD1RO. Reviewers: sdesmalen, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80738	2020-06-03 13:57:16 +00:00
Amara Emerson	f573d489b6	[AArch64][GlobalISel] Split G_GLOBAL_VALUE into ADRP + G_ADD_LOW and optimize. The concept of G_GLOBAL_VALUE is nice and simple, but always using it as the representation for global var addressing until selection time creates some problems in optimizing accesses in certain code/relocation models. The problem comes from trying to optimize adrp -> add -> load/store sequences in the most common "small" code model. These accesses can be optimized into an adrp -> load with the add offset being folded into the load's immediate field. If we try to keep all global var references as a single generic instruction then by the time we get to the complex operand trying to match these, we end up generating an adrp at the point of use. The real issue here is that we don't have any form of CSE during selection, so the code size will bloat from many redundant adrp's. This patch custom legalizes small code model non-GOT G_GLOBALs into target ADRP and a new "target specific generic opcode" G_ADD_LOW. We also teach the localizer to localize these instructions via the custom hook that was added recently. Finally, the complex pattern for indexed loads/stores is extended to try to fold these G_ADD_LOW instructions into the load immediate. On -O0 CTMark, we see a 0.8% geomean code size improvement. We should also see some minor performance improvements too. Differential Revision: https://reviews.llvm.org/D78465	2020-06-01 16:00:56 -07:00
Martin Storsjö	cf97e0ec42	[AArch64] Treat x18 as callee-saved in functions with windows calling convention on non-windows OSes Treat it as callee-saved, and always back it up. When windows code calls entry points in unix code, marked with the windows calling convention, that unix code can call other functions that aren't compiled with -ffixed-x18 and may clobber x18 freely. By backing it up and restoring it on return, we preserve the register across the function call, fulfilling this part of the windows calling convention on another OS. This isn't enough for making sure that x18 is preserved when non-windows code does a callback to windows code, but is a clear improvement over the current status quo. Additionally, wine is nowadays building many modules as PE DLLs, which avoids the callback issue altogether for those DLLs. Differential Revision: https://reviews.llvm.org/D61892	2020-05-30 09:22:09 +03:00
Christopher Tetreault	56eb7556e7	[SVE] Eliminate calls to default-false VectorType::get() from AArch64 Reviewers: efriedma, c-rhodes, david-arm, mcrosier, t.p.northover Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80327	2020-05-29 15:39:30 -07:00
Zequan Wu	80e107ccd0	Add NoMerge MIFlag to avoid MIR branch folding Let the codegen recognize the nomerge attribute and disable branch folding when the attribute is given Differential Revision: https://reviews.llvm.org/D79537	2020-05-29 12:31:06 -07:00
David Sherwood	205085d4cc	[CodeGen] Fix warnings in LowerToPredicatedOp When creating a new vector type based on another vector type we should pass in the element count instead of the number of elements and scalable flag separately. I encountered this warning whilst compiling this test: CodeGen/AArch64/sve-intrinsics-int-compares.ll Differential revision: https://reviews.llvm.org/D80621	2020-05-29 15:19:03 +01:00
Ties Stuij	78bd0c0e5e	[AArch64][BFloat] add BFloat instruction support for AArch64 Summary: Add support for lowering various BFloat related SelDAG nodes: - load/store (ldrh/strh) - concat - dup/duplane - bitconvert/bitcast - insert_subvector/insert_subreg This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a The bfloat type, and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile Reviewers: ab, t.p.northover, john.brawn, fpetrogalli, sdesmalen, LukeGeeson Reviewed By: fpetrogalli Subscribers: LukeGeeson, pbarrio, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79712	2020-05-27 15:36:54 +01:00
Ties Stuij	42eba9b40b	[AArch64][BFloat] basic AArch64 bfloat support Summary: This patch adds the bfloat type to the AArch64 backend: - adds it as part of the FPR16 register class - adds bfloat calling conventions - as f16 is now not the only FPR16 type anymore, we need to constrain a number of instruction patterns using FPR16Op to help out the TableGen type inferrer This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a The bfloat type, and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile Reviewers: t.p.northover, c-rhodes, fpetrogalli, sdesmalen, ostannard, LukeGeeson, ab Reviewed By: fpetrogalli Subscribers: pbarrio, LukeGeeson, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79709	2020-05-27 15:26:40 +01:00
Craig Topper	80cc43b420	[AArch64] Set i32 ISD::MULHU/S to Expand instead of Legal. Looks like there are no isel patterns for these. A DAG combine turns it into i64 multiply and a shift which hides this. Extracted from D80485	2020-05-26 00:41:09 -07:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Alexey Lapshin	bf242c067e	[AARCH64][NEON] Allow to sink operands of aarch64_neon_pmull64. Summary: This patch fixes a problem where the pmull2 instruction is not generated for the vmull_high_p64 intrinsic. ISel has a pattern for the int_aarch64_neon_pmull64 intrinsic to generate a PMULL2 instruction. That pattern assumes that extraction operations are located in the same basic block. We need to sink them if they are not. Handle operands of int_aarch64_neon_pmull64 in AArch64TargetLowering::shouldSinkOperands. Reviewed by: efriedma Differential Revision: https://reviews.llvm.org/D80320	2020-05-22 01:35:24 +03:00
Eli Friedman	a1ce88b4e3	[AArch64][SVE] Implement AArch64ISD::SETCC_PRED This unifies SETCC operations along the lines of other operations. Differential Revision: https://reviews.llvm.org/D79975	2020-05-15 11:53:21 -07:00
Benjamin Kramer	f242950fdf	Fold single-use variables into assert This avoids unused variable warnings in Release builds.	2020-05-12 15:26:59 +02:00
Sander de Smalen	077d2d6802	[CodeGen][SVE] Add patterns for whole vector predicate select Added patterns to implement `select i1 %p, <vty> %a, <vty> %b` Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D79356	2020-05-12 11:47:39 +01:00
Petre-Ionut Tudor	9682d0d5dc	[ARM] Refactor lower to S[LR]I optimization Summary: The optimization has been refactored to fix certain bugs and limitations. The condition for lowering to S[LR]I has been changed to reflect the manual pseudocode description of SLI and SRI operation. The optimization can now handle more cases of operand type and order. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79233	2020-05-12 11:00:13 +01:00
Craig Topper	d1119980e5	[SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode. This patch stores the alignment for ConstantPoolSDNode as an Align and updates the getConstantPool interface to take a MaybeAlign. Removing getAlignment() will be done as a follow up. Differential Revision: https://reviews.llvm.org/D79436	2020-05-08 16:04:11 -07:00
Kerry McLaughlin	3bcd3dd473	[CodeGen][SVE] Lowering of shift operations with scalable types Summary: Adds AArch64ISD nodes for: - SHL_PRED (logical shift left) - SHR_PRED (logical shift right) - SRA_PRED (arithmetic shift right) Existing patterns for unpredicated left shift by immediate have also been moved into the appropriate multiclasses in SVEInstrFormats.td. Reviewers: sdesmalen, efriedma, ctetreau, huihuiz, rengolin Reviewed By: efriedma Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79478	2020-05-07 11:43:49 +01:00
Eli Friedman	2c8546107a	[AArch64][SVE] Implement lowering for SIGN_EXTEND etc. of SVE predicates. Now using patterns, since there's a single-instruction lowering. (We could convert to VSELECT and pattern-match that, but there doesn't seem to be much point.) I think this might be the first instruction to use nested multiclasses this way? It seems like a good way to reduce duplication between different integer widths. Let me know if it seems like an improvement. Also, while I'm here, fix the return type of SETCC so we don't try to merge a sign-extend with a SETCC. Differential Revision: https://reviews.llvm.org/D79193	2020-05-06 17:56:32 -07:00
Kerry McLaughlin	19f5da9c1d	[SVE][Codegen] Lower legal min & max operations Summary: This patch adds AArch64ISD nodes for [S|U]MIN_PRED and [S|U]MAX_PRED, and lowers both SVE intrinsics and IR operations for min and max to these nodes. There are two forms of these instructions for SVE: a predicated form and an immediate (unpredicated) form. The patterns which existed for the latter have been updated to match a predicated node with an immediate and map this to the immediate instruction. Reviewers: sdesmalen, efriedma, dancgr, rengolin Reviewed By: efriedma Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79087	2020-05-04 11:19:19 +01:00
Cullen Rhodes	672b62ea21	[AArch64][SVE] Custom lowering of floating-point reductions Summary: This patch implements custom floating-point reduction ISD nodes that have vector results, which are used to lower the following intrinsics: * llvm.aarch64.sve.fadda * llvm.aarch64.sve.faddv * llvm.aarch64.sve.fmaxv * llvm.aarch64.sve.fmaxnmv * llvm.aarch64.sve.fminv * llvm.aarch64.sve.fminnmv SVE reduction instructions keep their result within a vector register, with all other bits set to zero. Changes in this patch were implemented by Paul Walker and Sander de Smalen. Reviewers: sdesmalen, efriedma, rengolin Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D78723	2020-04-30 10:18:40 +00:00
Kerry McLaughlin	53dd72a87a	[SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics Summary: This patch maps IR operations for sdiv & udiv to the @llvm.aarch64.sve.[s|u]div intrinsics. A ptrue must be created during lowering as the div instructions have only a predicated form. Patch contains changes by Andrzej Warzynski. Reviewers: sdesmalen, c-rhodes, efriedma, cameron.mcinally, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, andwar, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78569	2020-04-24 11:38:20 +01:00
Christopher Tetreault	84584b0d29	[SVE] Remove calls to isScalable from AARCH64 Reviewers: efriedma, sdesmalen, t.p.northover, mcrosier Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77758	2020-04-23 13:09:17 -07:00
Kerry McLaughlin	17f6e18acf	[AArch64][SVE] Add SVE intrinsic for LD1RQ Summary: Adds the following intrinsic for contiguous load & replicate: - @llvm.aarch64.sve.ld1rq The LD1RQ intrinsic only needs the SImmS16XForm added by this patch. The others (SImmS2XForm, SImmS3XForm & SImmS4XForm) were added for consistency. Reviewers: andwar, sdesmalen, efriedma, cameron.mcinally, dancgr, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76929	2020-04-22 11:29:27 +01:00
Petre-Ionut Tudor	52474992b1	Revert "[ARM] Fix conditions for lowering to S[LR]I" This reverts commit `cabfcf840a`. The patch introduced another bug in the optimization.	2020-04-20 16:11:04 +01:00
Kerry McLaughlin	33ffce5414	[AArch64][SVE] Remove LD1/ST1 dependency on llvm.masked.load/store Summary: The SVE masked load and store intrinsics introduced in D76688 rely on common llvm.masked.load/store nodes. This patch creates new ISD nodes for LD1(S) & ST1 to remove this dependency. Additionally, this adds support for sign & zero extending loads and truncating stores. Reviewers: sdesmalen, efriedma, cameron.mcinally, c-rhodes, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, andwar, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78204	2020-04-20 11:08:11 +01:00
Francesco Petrogalli	897fdec586	[llvm][CodeGen] Addressing modes for SVE stN. This reverts commit `17b1869b72`. It is an attempt to fix the build failure described below. The patch differs from the original one reviewed at https://reviews.llvm.org/D77435 only in the use of std::make_tuple to build the return value of `findAddrModeSVELoadStore`: - return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; + return std::make_tuple(IsRegReg ? Opc_rr : Opc_ri, NewBase, The original patch submitted at `fc4e954ed5` was failing the following build: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420/ with error: /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10: error: chosen constructor is explicit in copy-initialization return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements) ^ 1 error generated.	2020-04-17 20:35:35 +01:00
Francesco Petrogalli	17b1869b72	Revert "[llvm][CodeGen] Addressing modes for SVE stN." This reverts commit `fc4e954ed5`. The commit reported the following failure: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420 FAILED: lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o /usr/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64 -I/usr/include/libxml2 -Iinclude -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/include -mthumb -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -fvisibility=hidden -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MMD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -c /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10: error: chosen constructor is explicit in copy-initialization return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements)	2020-04-17 20:03:11 +01:00
Francesco Petrogalli	fc4e954ed5	[llvm][CodeGen] Addressing modes for SVE stN. Reviewers: efriedma, sdesmalen, c-rhodes, ctetreau Reviewed By: c-rhodes Subscribers: tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77435	2020-04-17 19:31:44 +01:00
Francesco Petrogalli	48879c02bf	[llvm][CodeGen] Fix issue for SVE gather prefetch. Summary: This change fixes an issue where the dagcombine incorrectly used an addressing mode with scaled offsets (indices) instead of unscaled offsets. Those addressing modes do not exist for `prfh`, `prfw` and `prfd`, hence we can reuse `prfb` because that has unscaled offsets, and because the pseudo-code in the XML spec suggests that the element size is not used for the amount of data that is prefetched by the instruction. FWIW, GCC also emits a `prfb` for these cases. Reviewers: sdesmalen, andwar, rengolin Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78069	2020-04-17 19:23:28 +01:00
Benjamin Kramer	d1ef44982f	[AArch64] Fold one-use variables into assert Avoids unused variable warnings in Release builds.	2020-04-17 19:43:06 +02:00
Petre-Ionut Tudor	cabfcf840a	[ARM] Fix conditions for lowering to S[LR]I Summary: Fixed wrong conditions for generating (S[LR]I X, Y, C2) from (or (and X, BvecC1), (lsl Y, C2)) and added ISel nodes to lower to S[LR]I. The optimisation is also enabled by default now. Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77387	2020-04-17 17:19:24 +01:00
Benjamin Kramer	166467e822	[VectorUtils] Create shufflevector masks as int vectors instead of Constants No functionality change intended.	2020-04-17 15:28:00 +02:00
Francesco Petrogalli	89680f25e8	[llvm][CodeGen] Rename SVE gather prefetch intrinsics. [NFC] Summary: The renaming is necessary to make the naming scheme uniform with other gather/scatter load/stores SVE intrinsics. The naming of variables and functions has been adapted to make it explicit whether we are dealing with a scalar offset (which is unscaled) or an index (which is scaled according to the data type of the lanes of the vector). Reviewers: andwar, sdesmalen, rengolin Reviewed By: andwar Subscribers: tschuett, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77839	2020-04-15 21:49:16 +01:00
Christopher Tetreault	05a079895c	[SVE] Remove calls to getBitWidth from AArch64 Reviewers: efriedma Reviewed By: efriedma Subscribers: danielkiss, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77905	2020-04-14 10:26:37 -07:00
Craig Topper	113f37a1f9	[CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase Differential Revision: https://reviews.llvm.org/D77995	2020-04-13 13:50:15 -07:00
Christopher Tetreault	9f87d951fc	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: mcrosier, efriedma, sdesmalen Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77269	2020-04-09 16:43:29 -07:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Eli Friedman	dacf8d3562	[AArch64][SVE] Add support for fcmp. This also requires support for boolean "not", so I added boolean logic while I was there. Differential Revision: https://reviews.llvm.org/D76901	2020-03-31 12:04:39 -07:00
Kerry McLaughlin	05606329e2	[AArch64][SVE] Add SVE intrinsics for masked loads & stores Summary: Implements the following intrinsics for contiguous loads & stores: - @llvm.aarch64.sve.ld1 - @llvm.aarch64.sve.st1 Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin Reviewed By: cameron.mcinally Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76688	2020-03-25 11:48:40 +00:00
Amara Emerson	472d282046	[AArch64][GlobalISel] Don't localize TLS G_GLOBAL_VALUEs on Darwin. On Darwin these need to be selected into a function call for the TLS address lookup. As a result, they can't be moved below a physreg write, which happens in call sequences. In the long term, we should have some mechanism in the localizer to prevent localizing into target-specific atomic instruction sequences. rdar://60056248 Differential Revision: https://reviews.llvm.org/D76652	2020-03-24 13:35:50 -07:00
Andrzej Warzynski	0ea4fb5bb7	[AArch64][SVE] Rename intrinsics for gather prefetch [NFC] Summary: In order to keep the names consistent with other SVE gather loads, the intrinsics for gather prefetch are renamed as follows: * @llvm.aarch64.sve.gather.prfb -> @llvm.aarch64.sve.prfb.gather Reviewed by: fpetrogalli Differential Revision: https://reviews.llvm.org/D76421	2020-03-19 12:53:36 +00:00
Sander de Smalen	4788ca450f	[AArch64][SVE] Change pointer type of nontemporal load/store intrinsics Summary: This fixes a discrepancy between the non-temporal loads/store intrinsics and other SVE load intrinsics (such as nf/ff), so that Clang can use the same code to generate these intrinsics. Reviewers: andwar, kmclaughlin, rengolin, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76237	2020-03-18 12:44:51 +00:00
Vitaly Buka	f20dcc31e3	Fix unused function warning	2020-03-16 19:45:36 -07:00
Benjamin Kramer	05ff3323e0	[AArch64] Remove unused variable	2020-03-16 21:59:55 +01:00
Francesco Petrogalli	0f2b68d9c7	Implement IR intrinsics for gather prefetch. Summary: Intrinsics and the related codegen have been implemented for the following SVE instructions: 1. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>] -> 32-bit scaled offset 2. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>] -> 32-bit unpacked scaled offset 3. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D] -> 64-bit scaled offset 4. PRF<T> <prfop>, <Pg>, [<Zn>.S{, #<imm>}] -> 32-bit element 5. PRF<T> <prfop>, <Pg>, [<Zn>.D{, #<imm>}] -> 64-bit element The instructions are associated with the following intrinsics, respectively: 1. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx4vi32( i8* %base, <vscale x 4 x i32> %offset, <vscale x 4 x i1> %Pg, i32 %prfop) 2. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx2vi32( i8* %base, <vscale x 2 x i32> %offset, <vscale x 2 x i1> %Pg, i32 %prfop) 3. void @llvm.aarch64.sve.gather.prf<T>.scaled.nx2vi64( i8* %base, <vscale x 2 x i64> %offset, <vscale x 2 x i1> %Pg, i32 %prfop) 4. void @llvm.aarch64.sve.gather.prf<T>.nx4vi32( <vscale x 4 x i32> %bases, i64 %imm, <vscale x 4 x i1> %Pg, i32 %prfop) 5. void @llvm.aarch64.sve.gather.prf<T>.nx2vi64( <vscale x 2 x i64> %bases, i64 %imm, <vscale x 2 x i1> %Pg, i32 %prfop) The intrinsics are the IR counterpart of the following SVE ACLE functions: * void svprf<T>(svbool_t pg, const void *base, svprfop op) * void svprf<T>_vnum(svbool_t pg, const void *base, int64_t vnum, svprfop op) * void svprf<T>_gather[_u32base](svbool_t pg, svuint32_t bases, svprfop op) * void svprf<T>_gather[_u64base](svbool_t pg, svuint64_t bases, svprfop op) * void svprf<T>_gather_[s32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op) * void svprf<T>_gather_[u32]offset(svbool_t pg, const void *base, svuint32_t offsets, svprfop op) * void svprf<T>_gather_[s64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op) * void svprf<T>_gather_[u64]offset(svbool_t pg, const void *base, svuint64_t offsets, svprfop op) * void svprf<T>_gather[_u32base]_offset(svbool_t pg, svuint32_t bases, int64_t offset, svprfop op) * void svprf<T>_gather[_u64base]_offset(svbool_t pg, svuint64_t bases, int64_t offset, svprfop op) Reviewers: andwar, sdesmalen, efriedma, rengolin Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75580	2020-03-16 18:52:35 +00:00
Andrzej Warzynski	a0c15ed460	[AArch64][SVE] Add the @llvm.aarch64.sve.dup.x intrinsic Summary: This intrinsic implements the unpredicated duplication of scalar values and is mapped to (through ISD::SPLAT_VECTOR): * DUP <Zd>.<T>, #<imm> * DUP <Zd>.<T>, <R><n|SP> Reviewed by: sdesmalen Differential Revision: https://reviews.llvm.org/D75900	2020-03-13 12:40:22 +00:00
Andrzej Warzynski	46b9f14d71	[AArch64][SVE] Add intrinsics for non-temporal scatters/gathers Summary: This patch adds the following intrinsics for non-temporal gather loads and scatter stores: * aarch64_sve_ldnt1_gather_index * aarch64_sve_stnt1_scatter_index These intrinsics implement the "scalar + vector of indices" addressing mode. As opposed to regular and first-faulting gathers/scatters, there's no instruction that would take indices and then scale them. Instead, the indices for non-temporal gathers/scatters are scaled before the intrinsics are lowered to `ldnt1` instructions. The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as placeholders so that we can easily identify the cases implemented in this patch in performGatherLoadCombine and performScatterStoreCombine. Once encountered, they are replaced with: * GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1 * SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1 The patterns for lowering ISD::SHL for scalable vectors (required by this patch) were missing, so these are added too. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D75601	2020-03-12 13:55:56 +00:00
Andrzej Warzynski	a9f1583228	[AArch64][SVE] Add the @llvm.aarch64.sve.sel intrinsic Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75928	2020-03-11 17:05:21 +00:00
Djordje Todorovic	c15c68abdc	[CallSiteInfo] Enable the call site info only for -g + optimizations Emit call site info only in the case of '-g' + 'O>0' level. Differential Revision: https://reviews.llvm.org/D75175	2020-03-09 12:12:44 +01:00
Andrzej Warzynski	9249f60602	[AArch64][SVE] Add intrinsics for non-temporal gather-loads/scatter-stores Summary: This patch adds the following LLVM IR intrinsics for SVE: 1. non-temporal gather loads * @llvm.aarch64.sve.ldnt1.gather * @llvm.aarch64.sve.ldnt1.gather.uxtw * @llvm.aarch64.sve.ldnt1.gather.scalar.offset 2. non-temporal scatter stores * @llvm.aarch64.sve.stnt1.scatter * @llvm.aarch64.sve.stnt1.scatter.uxtw * @llvm.aarch64.sve.stnt1.scatter.scalar.offset These intrinsics are mapped to the corresponding SVE instructions (example for half-words, zero-extending): * ldnt1h { z0.s }, p0/z, [z0.s, x0] * stnt1h { z0.s }, p0, [z0.s, x0] Note that for non-temporal gathers/scatters, the SVE spec defines only one instruction type: "vector + scalar". For this reason, we swap the arguments when processing intrinsics that implement the "scalar + vector" addressing mode: * @llvm.aarch64.sve.ldnt1.gather * @llvm.aarch64.sve.ldnt1.gather.uxtw * @llvm.aarch64.sve.stnt1.scatter * @llvm.aarch64.sve.stnt1.scatter.uxtw In other words, all intrinsics for gather-loads and scatter-stores implemented in this patch are mapped to the same load and store instruction, respectively. The sve2_mem_gldnt_vs multiclass (and its counterpart for scatter stores) from SVEInstrFormats.td was split into: * sve2_mem_gldnt_vec_vs_32_ptrs (32-bit wide base addresses) * sve2_mem_gldnt_vec_vs_64_ptrs (64-bit wide base addresses) This is consistent with what we did for @llvm.aarch64.sve.ld1.scalar.offset and highlights the actual split in the spec and the implementation. Reviewed by: sdesmalen Differential Revision: https://reviews.llvm.org/D74858	2020-03-02 10:38:28 +00:00
Andrzej Warzynski	fa9439fac8	[AArch64][SVE] Add intrinsics for first-faulting gather loads Summary: The following intrinsics are added: * @llvm.aarch64.sve.ldff1.gather * @llvm.aarch64.sve.ldff1.gather.index * @llvm.aarch64.sve.ldff1.gather.sxtw * @llvm.aarch64.sve.ldff1.gather.uxtw * @llvm.aarch64.sve.ldff1.gather.sxtw.index * @llvm.aarch64.sve.ldff1.gather.uxtw.index * @llvm.aarch64.sve.ldff1.gather.scalar.offset Although this patch is quite substantial, the vast majority of the implementation is just a 'copy & paste' of the implementation of regular gather loads, including tests. There's only a handful of new definitions: * AArch64ISD nodes defined in AArch64ISelLowering.h (e.g. GLDFF1) * Selection DAG Types in AArch64SVEInstrInfo.td (e.g. AArch64ldff1_gather) * intrinsics in IntrinsicsAArch64.td (e.g. aarch64_sve_ldff1_gather) * Pseudo instructions in SVEInstrFormats.td to work around the issue of use-before-def for the FFR register. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D75128	2020-02-27 12:56:33 +00:00
Sjoerd Meijer	13db7490fa	[AArch64] Peephole optimization: merge AND and TST instructions In some cases Clang does not merge the AND and TST (aka ANDS xzr) instructions. For example, `tst x2, x1` followed by `and x3, x2, x1` can be merged into `ands x3, x2, x1`. This patch adds such merging during instruction selection: when AND is replaced with an ANDS instruction in LowerSELECT_CC, all users of the AND should also be changed to use this ANDS instruction. Short discussion on the mailing list: http://llvm.1065342.n5.nabble.com/llvm-dev-ARM-Peephole-optimization-instructions-tst-add-tp133109.html Patch by Pavel Kosov. Differential Revision: https://reviews.llvm.org/D71701	2020-02-27 09:23:47 +00:00
Craig Topper	735d27dc40	[SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output to ISD::FLT_ROUNDS_ This node reads the rounding control, which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order. This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've updated all in-tree targets to connect their chain through their lowering code. Differential Revision: https://reviews.llvm.org/D75132	2020-02-25 16:58:23 -08:00
Andrzej Warzynski	cff90c938b	[AArch64][SVE] Update names and comments for gathers/scatters (NFC) Summary: This patch renames functions and TableGen classes for SVE gathers and scatters. The original names implied that the corresponding methods/classes are only suited for regular gathers/scatters (i.e. LD1 and ST1), which is not the case. Indeed, we will be re-using them for non-temporal and first-faulting gathers/scatters in the forthcoming patches. The new names also highlight the split into Vector-Scalar (VS) and Scalar-Vector (SV) cases. List of changes: * `performLD1GatherCombine` and `performST1ScatterCombine` are renamed as `performGatherLoadCombine` and `performScatterStoreCombine`, respectively. * Selection DAG types for scatters and gathers from AArch64SVEInstrInfo.td are renamed. For example, `SDT_AArch64_GLD1` is renamed as `SDT_AArch64_GATHER_SV`. SV stands for Scalar-Vector, as opposed to Vector-Scalar (VS). * The intrinsic classes from IntrinsicsAArch64.td are renamed. For example, `AdvSIMD_GatherLoad_64bitOffset_Intrinsic` is renamed as `AdvSIMD_GatherLoad_SV_64b_Offsets_Intrinsic`. * Updated comments in `performGatherLoadCombine` and `performScatterStoreCombine`. Reviewers: sdesmalen, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75035	2020-02-25 11:09:01 +00:00
Cullen Rhodes	72848f26b4	[AArch64][SVE] Add predicate reinterpret intrinsics Summary: Implements the following intrinsics: * llvm.aarch64.sve.convert.to.svbool * llvm.aarch64.sve.convert.from.svbool For converting the ACLE svbool_t type (<n x 16 x i1>) to and from the other predicate types: <n x 8 x i1>, <n x 4 x i1> and <n x 2 x i1>. Reviewers: sdesmalen, kmclaughlin, efriedma, dancgr, rengolin Reviewed By: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74471	2020-02-25 10:24:06 +00:00
Kerry McLaughlin	f87f23c81c	[AArch64][SVE] Add the SVE dupq_lane intrinsic Summary: Implements the @llvm.aarch64.sve.dupq.lane intrinsic. As specified in the ACLE, the behaviour of: svdupq_lane_u64(data, index) ...is identical to: svtbl(data, svadd_x(svptrue_b64(), svand_x(svptrue_b64(), svindex_u64(0, 1), 1), index * 2)) If the index is in the range [0,3], the operation is equivalent to a single DUP (.q) instruction. Reviewers: sdesmalen, c-rhodes, cameron.mcinally, efriedma, dancgr, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74734	2020-02-24 13:59:47 +00:00
Craig Topper	9bbf271fc9	[AArch64] Move isOverflowIntrOpRes helper function to the ISD namespace in SelectionDAG.h. NFC Enables sharing with an upcoming X86 change.	2020-02-20 08:50:17 -08:00
Djordje Todorovic	2f215cf36a	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGfaff707db82d. A failure was found on an ARM 2-stage buildbot. Further investigation is needed.	2020-02-20 14:41:39 +01:00
Cameron McInally	3931734990	[AArch64][SVE] Add initial backend support for FP splat_vector Differential Revision: https://reviews.llvm.org/D74632	2020-02-19 10:19:11 -06:00
Djordje Todorovic	faff707db8	Reland "[DebugInfo] Enable the debug entry values feature by default" Differential Revision: https://reviews.llvm.org/D73534	2020-02-19 11:12:26 +01:00
Djordje Todorovic	2bf44d11cb	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGa82d3e8a6e67.	2020-02-18 16:38:11 +01:00
Djordje Todorovic	a82d3e8a6e	Reland "[DebugInfo] Enable the debug entry values feature by default" This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-18 14:41:08 +01:00
Sander de Smalen	a7a96c726e	[AArch64] Implement passing SVE vectors by ref for AAPCS. Summary: This patch implements the part of the calling convention where SVE Vectors are passed by reference. This means the caller must allocate stack space for these objects and pass the address to the callee. Reviewers: efriedma, rovka, cameron.mcinally, c-rhodes, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71216	2020-02-17 15:20:28 +00:00
Kerry McLaughlin	633db60f3e	[AArch64][SVE] Add SVE index intrinsic Summary: Implements the @llvm.aarch64.sve.index intrinsic, which takes a scalar base and step value. This patch also adds the printSImm function to AArch64InstPrinter to ensure that immediates of type i8 & i16 are printed correctly. Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin Reviewed By: cameron.mcinally Subscribers: tatyana-krasnukha, tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74550	2020-02-17 10:30:11 +00:00
Pavel Iliin	b6a9fe2099	[AArch64] Add BIT/BIF support. This patch adds generation of the SIMD bitwise insert instructions BIT/BIF. In the absence of GCC-like functionality for optimal constraint satisfaction during register allocation, the bitwise insert and select patterns are matched by a pseudo bitwise select instruction, BSP, whose def is not tied. After register allocation it is expanded into BSL, BIT or BIF, with the def tied, depending on the operand registers. This allows redundant moves to be eliminated. Reviewers: t.p.northover, samparker, dmgreen Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D74147	2020-02-14 14:19:39 +00:00
Kerry McLaughlin	671cbc1fbb	[AArch64][SVE] Add mul/mla/mls lane & dup intrinsics Summary: Implements the following intrinsics: - @llvm.aarch64.sve.dup - @llvm.aarch64.sve.mul.lane - @llvm.aarch64.sve.mla.lane - @llvm.aarch64.sve.mls.lane Reviewers: c-rhodes, sdesmalen, dancgr, efriedma, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74222	2020-02-13 10:32:59 +00:00
Djordje Todorovic	97ed706a96	Revert "[DebugInfo] Enable the debug entry values feature by default" This reverts commit rG9f6ff07f8a39. Found a test failure on clang-with-thin-lto-ubuntu buildbot.	2020-02-12 11:59:04 +01:00
Djordje Todorovic	9f6ff07f8a	[DebugInfo] Enable the debug entry values feature by default This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-12 10:25:14 +01:00
Craig Topper	eeb63944e4	[LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results. Remove code from LegalizeTypes that allowed this to work. We were already using BUILD_PAIR for this in some places so this standardizes on a single way to do this.	2020-02-08 09:52:31 -08:00
Guillaume Chatelet	f85d3408e6	[NFC] Introduce an API for MemOp Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73964	2020-02-07 11:32:27 +01:00
Reid Kleckner	2d89e0a098	[SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr The CATCHPAD node mostly existed to be selected into the EH_RESTORE instruction, which sets the frame back up when 32-bit Windows exceptions return to the parent function. However, creating this MachineInstr early increases the risk that other passes will come along and insert instructions that use the stack before ESP and EBP are restored. That happened in PR44697. Instead of representing these in the instruction stream early, delay it until PEI. Mark the blocks where this needs to happen as EHPads, but not funclet entry blocks. Passes after PEI have to be careful not to hoist instructions that can use stack across frame setup instructions, so this should be relatively reliable. Fixes PR44697 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D73752	2020-02-04 15:13:12 -08:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This makes it easier to introduce additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Guillaume Chatelet	333f2ad8b8	[Alignment][NFC] Use Align for getMemcpy/Memmove/Memset Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73885	2020-02-03 17:13:19 +01:00
John Brawn	68cf574857	[FPEnv][AArch64] Add lowering of f128 STRICT_FSETCC These get lowered to function calls, like the non-strict versions. Differential Revision: https://reviews.llvm.org/D73784	2020-02-03 14:39:16 +00:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
John Brawn	0bb9a27c98	[FPEnv][AArch64] Add lowering and instruction selection for strict conversions Strict fp-to-int and int-to-fp conversions can be handled in the same way that the non-strict versions are (by using the appropriate instruction or converting to a function call when we have no instruction). Differential Revision: https://reviews.llvm.org/D73625	2020-01-30 13:50:06 +00:00
John Brawn	258d8dd76a	[FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND This gets selected to the appropriate fcvt instruction. Handling from there on isn't fully correct yet, as we need to model fcvt reading and writing to fpsr and fpcr. Differential Revision: https://reviews.llvm.org/D73201	2020-01-30 12:51:25 +00:00
John Brawn	2224407ef5	Add lowering of STRICT_FSETCC and STRICT_FSETCCS These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the corresponding FCMP and FCMPE instructions, though the handling from there on isn't fully correct as we don't model reads and writes to FPCR and FPSR. Differential Revision: https://reviews.llvm.org/D73368	2020-01-30 10:40:55 +00:00
Kerry McLaughlin	aa0f37e14a	[AArch64][SVE] Add first-faulting load intrinsic Summary: Implements the llvm.aarch64.sve.ldff1 intrinsic and DAG combine rules for first-faulting loads with sign & zero extends Reviewers: sdesmalen, efriedma, andwar, dancgr, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73025	2020-01-23 11:57:16 +00:00
Sander de Smalen	4cf16efe49	[AArch64][SVE] Add patterns for unpredicated load/store to frame-indices. This patch also fixes up a number of cases in DAGCombine and SelectionDAGBuilder where the size of a scalable vector is used in a fixed-width context (thus triggering an assertion failure). Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D71215	2020-01-22 14:32:27 +00:00
Kerry McLaughlin	cdcc4f2a44	[AArch64][SVE] Add intrinsic for non-faulting loads Summary: This patch adds the llvm.aarch64.sve.ldnf1 intrinsic, plus DAG combine rules for non-faulting loads and sign/zero extends Reviewers: sdesmalen, efriedma, andwar, dancgr, mgudim, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71698	2020-01-22 11:15:20 +00:00
Sander de Smalen	67d4c9924c	Add support for (expressing) vscale. In LLVM IR, vscale can be represented with an intrinsic. For some targets, this is equivalent to the constexpr: getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1 This can be used to propagate the value in CodeGenPrepare. In ISel we add a node that can be legalized to one or more instructions to materialize the runtime vector length. This patch also adds SVE CodeGen support for VSCALE, which maps this node to RDVL instructions (for scaled multiples of 16 bytes) or CNT[HSD] instructions (scaled multiples of 2, 4, or 8 bytes, respectively). Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner Reviewed by: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D68203	2020-01-22 10:09:27 +00:00
Florian Hahn	535ed62c5f	[AArch64] Add custom store lowering for 256 bit non-temporal stores. Currently we fail to lower non-temporal stores for 256+ bit vectors to STNPQ, because type legalization splits them into 128 bit stores, and because there are no single non-temporal stores, creating STNPQ in the Load/Store optimizer would be quite tricky. This patch adds custom lowering for 256 bit non-temporal vector stores to improve the generated code. Reviewers: dmgreen, samparker, t.p.northover, ab Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D72919	2020-01-21 14:53:40 -08:00
Andrzej Warzynski	7e717b3990	[AArch64][SVE] Extend int_aarch64_sve_ld1_gather_imm The ACLE distinguishes between the following addressing modes for gather loads: * "scalar base, vector offset", and * "vector base, scalar offset". For the "vector base, scalar offset" case, the `int_aarch64_sve_ld1_gather_imm` intrinsic was added in `79f2422d`. Currently, that intrinsic assumes that the scalar offset is passed as an immediate. As a result, it does not cater for cases where scalar offset is stored in a register. In this patch `int_aarch64_sve_ld1_gather_imm` is extended so that all cases are covered: * `int_aarch64_sve_ld1_gather_imm` is renamed as `int_aarch64_sve_ld1_gather_scalar_offset` * new DAG combine rules are added for GLD1_IMM for scenarios where the offset is a non-immediate scalar or an out-of-range immediate * sve-intrinsics-gather-loads-vector-base.ll is renamed as sve-intrinsics-gather-loads-vector-base-imm-offset.ll * sve-intrinsics-gather-loads-vector-base-scalar-offset.ll is added to test file for non-immediate offsets Similar changes are made for scatter store intrinsics. Reviewed By: sdesmalen, efriedma Differential Revision: https://reviews.llvm.org/D71773	2020-01-20 12:19:18 +00:00
Matt Arsenault	0d0fce42b0	GlobalISel: Preserve load/store metadata in IRTranslator This was dropping the invariant metadata on dead argument loads, so they weren't deleted. Atomics still need to be fixed the same way. Also, apparently store was never preserving dereferenceable, which should also be fixed.	2020-01-16 13:49:43 -05:00
Benjamin Kramer	06cfcdcca7	[AArch64][SVE] Fold variable into assert to silence unused variable warnings in Release builds	2020-01-15 12:50:27 +01:00
Cullen Rhodes	93a4dede3a	[AArch64][SVE] Add ptest intrinsics Summary: Implements the following intrinsics: * @llvm.aarch64.sve.ptest.any * @llvm.aarch64.sve.ptest.first * @llvm.aarch64.sve.ptest.last Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72398	2020-01-15 11:15:01 +00:00
Danilo Carvalho Grael	2d7e757a83	[AArch64][SVE] Add patterns for some arith SVE instructions. Summary: Add patterns for the following instructions: - smax, smin, umax, umin Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin Subscribers: amehsan Differential Revision: https://reviews.llvm.org/D71779	2020-01-13 11:39:42 -05:00
KAWASHIMA Takahiro	10c11e4e2d	This option allows selecting the TLS size in the local exec TLS model, which is the default TLS model for non-PIC objects. This allows large/many thread-local variables, or compact/fast code, in an executable. The specification is the same as GCC's. For example, the code model option precedes the TLS size option. TLS access models other than local-exec are not changed, which means the large code model is supported only in the local exec TLS model. Patch By KAWASHIMA Takahiro (kawashima-fj <t-kawashima@fujitsu.com>) Reviewers: dmgreen, mstorsjo, t.p.northover, peter.smith, ostannard Reviewed By: peter.smith Committed by: peter.smith Differential Revision: https://reviews.llvm.org/D71688	2020-01-13 10:16:53 +00:00
Jessica Paquette	ceb801612a	[AArch64] Don't generate libcalls for wide shifts on Darwin Similar to `cff90f07cb`. Darwin doesn't always use compiler-rt, and so we can't assume that these functions are available (at least on arm64).	2020-01-10 15:58:51 -08:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
QingShan Zhang	2133d3c558	[DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal' Until now, we didn't set the default operation action for SIGN_EXTEND_INREG for vector types, so it defaulted to 0, which means legal. However, most targets don't have native instructions to support this opcode. It should be set to expand by default, as we already do for ANY_EXTEND_VECTOR_INREG. Differential Revision: https://reviews.llvm.org/D70000	2020-01-03 03:26:41 +00:00
Jay Foad	13a7a4ccbf	Remove unneeded extra variable realArgIdx. NFC.	2020-01-02 14:27:32 +00:00
Andrzej Warzynski	404da13e1e	[AArch64][SVE] Gather loads: pass 32 bit unpacked offsets as nxv2i32 Summary: Currently 32 bit unpacked offsets are passed as nxv2i64. However, as pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead would improve consistency with: * how other arguments are treated * how scatter stores are implemented This patch makes sure that 32 bit unpacked offsets are passed as nxv2i32 instead of nxv2i64. Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71724	2020-01-02 13:01:28 +00:00
Craig Topper	4ae3120ed8	[LegalizeVectorOps][AArch64] Stop asking for v4f16 fp_round and fp_extend to be promoted. These operations are needed as building blocks for promoting so they can't be promoted themselves. This appeared to work because the fp_extend query type for operation actions is the result type, not the input type so it never triggered in the legalizer. For fp_round, the vector op legalizer just ended up creating a nop fp_extend that was elided by getNode, followed by a nop fp_round that was also elided by getNode. This was followed by a final fp_round from v4f32 back to v4f16 which was CSEd to the original node. Then legalize vector ops just believed that node legalized to itself. LegalizeDAG took another crack at promoting it, but didn't have a handler so just skipped it with a debug message saying it wasn't promoted. This patch just removes the operation actions to avoid this nonsense. Found while trying to refactor LegalizeVectorOps to handle multiple result nodes better.	2019-12-31 15:04:12 -08:00
Sanjay Patel	8cefc37be5	[DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine. Earlier patches that dealt with regressions initially exposed by this patch: rG5e5e99c041e4 rG0b38af89e2c0 Patch by: @RKSimon (Simon Pilgrim) Differential Revision: https://reviews.llvm.org/D63815	2019-12-23 10:11:45 -05:00
Martin Storsjö	5a751e747d	[AArch64] [Windows] Use COFF stubs for calls to extern_weak functions As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range). Improve the classifyGlobalFunctionReference method to set MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in AArch64TargetLowering::LowerCall to use the return value from classifyGlobalFunctionReference for these cases. Add code in both AArch64FastISel and GlobalISel/IRTranslator to bail out for function calls to extern weak functions on windows, to let SelectionDAG handle them. This matches what was done for X86 in `6bf108d77a`. Differential Revision: https://reviews.llvm.org/D71721	2019-12-23 12:13:49 +02:00
Sanjay Patel	0b38af89e2	[AArch64] match splat of bitcasted extract subvector to DUPLANE This is another potential regression exposed by D63815. Here we peek through a bitcast to find an extract subvector and scale the splat offset based on that: splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC' Differential Revision: https://reviews.llvm.org/D71672	2019-12-22 08:37:03 -05:00
Cullen Rhodes	974f00a436	[AArch64][SVE] Fold constant multiply of element count Summary: E.g. %0 = tail call i64 @llvm.aarch64.sve.cntw(i32 31) %mul = mul i64 %0, <const> Should emit: cntw x0, all, mul #<const> For <const> in the range 1-16. Patch by Kerry McLaughlin Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71014	2019-12-20 11:58:00 +00:00
Cullen Rhodes	3f9005eb89	Recommit "[AArch64][SVE] Add permutation and selection intrinsics" Recommit `23c28c4043` (reverted in `dcb48f50bd`) with a fix for an assert "Request for a fixed size on a scalable object" being triggered in `LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the TypeSize object.	2019-12-20 10:45:17 +00:00
Cullen Rhodes	dcb48f50bd	Revert "[AArch64][SVE] Add permutation and selection intrinsics" This reverts commit `23c28c4043`. It caused build failures in the following expensive checks builders: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1295 http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/700 Reverting for now whilst I figure out what the issue is.	2019-12-19 14:26:14 +00:00
Cullen Rhodes	23c28c4043	[AArch64][SVE] Add permutation and selection intrinsics Summary: Adds the following intrinsics: * @llvm.aarch64.sve.clasta * @llvm.aarch64.sve.clasta_n * @llvm.aarch64.sve.clastb * @llvm.aarch64.sve.clastb_n * @llvm.aarch64.sve.compact * @llvm.aarch64.sve.ext * @llvm.aarch64.sve.lasta * @llvm.aarch64.sve.lastb * @llvm.aarch64.sve.rev * @llvm.aarch64.sve.splice * @llvm.aarch64.sve.tbl * @llvm.aarch64.sve.trn1 * @llvm.aarch64.sve.trn2 * @llvm.aarch64.sve.uzp1 * @llvm.aarch64.sve.uzp2 * @llvm.aarch64.sve.zip1 * @llvm.aarch64.sve.zip2 Reviewers: sdesmalen, efriedma, dancgr, mgudim, huntergr, rengolin Reviewed By: sdesmalen, efriedma Subscribers: kmclaughlin, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71401	2019-12-19 13:18:40 +00:00
Cullen Rhodes	49199465a3	[AArch64][SVE] Implement ptrue intrinsic Reviewers: sdesmalen, eli.friedman, dancgr, mgudim, cameron.mcinally, huntergr, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71457	2019-12-19 11:02:05 +00:00
Victor Campos	364b8f5fbe	[AArch64] Improve codegen of volatile load/store of i128 Summary: Instead of generating two i64 instructions for each load or store of a volatile i128 value (two LDRs or STRs), now emit a single LDP or STP. Reviewers: labrinea, t.p.northover, efriedma Reviewed By: efriedma Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69559	2019-12-18 10:03:12 +00:00
Andrzej Warzynski	7e20c3a71d	[Aarch64][SVE] Add intrinsics for scatter stores Summary: This patch adds the following SVE intrinsics for scatter stores: * 64-bit offsets: * @llvm.aarch64.sve.st1.scatter (unscaled) * @llvm.aarch64.sve.st1.scatter.index (scaled) * 32-bit unscaled offsets: * @llvm.aarch64.sve.st1.scatter.uxtw (zero-extended offset) * @llvm.aarch64.sve.st1.scatter.sxtw (sign-extended offset) * 32-bit scaled offsets: * @llvm.aarch64.sve.st1.scatter.uxtw.index (zero-extended offset) * @llvm.aarch64.sve.st1.scatter.sxtw.index (sign-extended offset) * vector base + immediate: * @llvm.aarch64.sve.st1.scatter.imm Reviewers: rengolin, efriedma, sdesmalen Reviewed By: efriedma, sdesmalen Subscribers: kmclaughlin, eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71074	2019-12-16 11:52:53 +00:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void *)0x12033091e < (void *)0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Kerry McLaughlin	4194ca8e5a	Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores" Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch to use reg + imm non-temporal loads to fix previous test failures. Original commit message: Adds the following intrinsics: - llvm.aarch64.sve.ldnt1 - llvm.aarch64.sve.stnt1 This patch creates masked loads and stores with the MONonTemporal flag set when used with the intrinsics above.	2019-12-13 10:08:20 +00:00
Cullen Rhodes	bbd16b6876	[AArch64][SVE] Remove nxv1f32 and nxv1f64 as legal types Summary: Also cleans up ZPR register class definition. Reviewers: sdesmalen, cameron.mcinally, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71351	2019-12-12 09:49:22 +00:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
Kerry McLaughlin	c0a3ab3655	Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores" This reverts commit `3f5bf35f86` as it was causing build failures in llvm-clang-x86_64-expensive-checks: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392 http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045	2019-12-11 13:58:39 +00:00
Andrzej Warzynski	65651f197a	[AArch64][SVE] Add DAG combine rules for gather loads and sext/zext Summary: These changes allow us to support sign-extending gather loads with the existing intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*). Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential revision: https://reviews.llvm.org/D70812	2019-12-11 12:56:18 +00:00
Kerry McLaughlin	3f5bf35f86	[AArch64][SVE] Implement intrinsics for non-temporal loads & stores Summary: Adds the following intrinsics: - llvm.aarch64.sve.ldnt1 - llvm.aarch64.sve.stnt1 This patch creates masked loads and stores with the MONonTemporal flag set when used with the intrinsics above. Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71000	2019-12-11 11:13:51 +00:00
Cullen Rhodes	1b9a608c84	[AArch64][SVE] Add wide compare immediate patterns Summary: Recognize wide compares where the wide operand is a splat of a scalar value in the appropriate range and convert to the immediate variant of the instruction. Patch by Graham Hunter Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71009	2019-12-10 10:41:22 +00:00
Eli Friedman	f1ddef34f1	[AArch64][SVE] Implement SPLAT_VECTOR for i1 vectors. The generated sequence with whilelo is unintuitive, but it's the best I could come up with given the limited number of SVE instructions that interact with scalar registers. The other sequence I was considering was something like dup+cmpne, but an extra scalar instruction seems better than an extra vector instruction. Differential Revision: https://reviews.llvm.org/D71160	2019-12-09 15:09:33 -08:00
Danilo Carvalho Grael	b29916cec3	[AArch64][SVE] Integer reduction instructions pattern/intrinsics. Added pattern matching/intrinsics for the following SVE instructions: -- saddv, uaddv -- smaxv, sminv, umaxv, uminv -- orv, eorv, andv	2019-12-05 09:59:19 -05:00
Sander de Smalen	79f2422d6a	[Aarch64][SVE] Add intrinsics for gather loads (vector + imm) This patch adds intrinsics for SVE gather loads from memory addresses generated by a vector base plus immediate index: * @llvm.aarch64.sve.ld1.gather.imm This intrinsic maps 1-1 to the corresponding SVE instruction (example for half-words): * ld1h { z0.d }, p0/z, [z0.d, #16] Committed on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma Reviewed By: sdesmalen Tags: #llvm Differential Revision: https://reviews.llvm.org/D70806	2019-12-03 15:19:16 +00:00
Sander de Smalen	8bf31e28d7	[Aarch64][SVE] Add intrinsics for gather loads with 32-bits offsets This patch adds intrinsics for SVE gather loads for which the offsets are 32 bits wide and are: * unscaled * @llvm.aarch64.sve.ld1.gather.sxtw * @llvm.aarch64.sve.ld1.gather.uxtw * scaled (offsets become indices) * @llvm.aarch64.sve.ld1.gather.sxtw.index * @llvm.aarch64.sve.ld1.gather.uxtw.index The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits. These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words): * unscaled * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw] * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw] * scaled * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1] * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1] Committed on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma Reviewed By: sdesmalen Tags: #llvm Differential Revision: https://reviews.llvm.org/D70782	2019-12-03 14:48:29 +00:00
Sander de Smalen	6e51ceba53	[AArch64][SVE] Add intrinsics for gather loads with 64-bit offsets This patch adds the following intrinsics for gather loads with 64-bit offsets: * @llvm.aarch64.sve.ld1.gather (unscaled offset) * @llvm.aarch64.sve.ld1.gather.index (scaled offset) These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words): * ld1h { z0.d }, p0/z, [x0, z0.d] * ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1] Committing on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D70542	2019-12-03 12:55:03 +00:00
Kerry McLaughlin	7483eb656f	[AArch64][SVE] Implement shift intrinsics Summary: Adds the following intrinsics: - asr & asrd - insr - lsl & lsr This patch also adds a new AArch64ISD node (INSR) to represent the int_aarch64_sve_insr intrinsic. Reviewers: huntergr, sdesmalen, dancgr, mgudim, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70437	2019-12-03 11:47:12 +00:00
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Graham Hunter	3f08ad611a	[SVE][CodeGen] Scalable vector MVT size queries * Implements scalable size queries for MVTs, split out from D53137. * Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types. * Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems. * CodeGenDAGPatterns will treat scalable and non-scalable vector types as different. Reviewers: greened, cameron.mcinally, sdesmalen, rovka Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D66871	2019-11-18 12:30:59 +00:00
Sander de Smalen	84a0c8e3ae	[AArch64][SVE] Spilling/filling of SVE callee-saves. Implement the spills/fills of callee-saved SVE registers using STR and LDR instructions. Also adds the `aarch64_sve_vector_pcs` attribute to specify the callee-saved registers to be used for functions that return SVE vectors or take SVE vectors as arguments. The callee-saved registers are vector registers z8-z23 and predicate registers p4-p15. The overall frame-layout with SVE will be as follows: +-------------+ | stack args | +-------------+ | Callee Saves| | X29, X30 | |-------------| <- FP | SVE Callee | < ////////////// | saved regs | < ////////////// | z23 | < ////////////// | : | < // SCALABLE // | z8 | < ////////////// | p15 | < /// STACK //// | : | < ////////////// | p4 | < //// AREA //// +-------------+ < ////////////// | : | < ////////////// | SVE locals | < ////////////// | : | < ////////////// +-------------+ |/////////////| alignment gap. | : | | Stack objs | | : | +-------------+ <- SP after call and frame-setup Reviewers: cameron.mcinally, efriedma, greened, thegameg, ostannard, rengolin Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D68996	2019-11-11 09:03:19 +00:00
Eli Friedman	5df3a87224	[AArch64][X86] Don't assume __powidf2 is available on Windows. We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it. (I think this started showing up recently because we added an optimization that converts pow to powi.) Differential Revision: https://reviews.llvm.org/D69013	2019-11-08 12:43:21 -08:00
David Green	2179867ddc	[AArch64] Select saturating Neon instructions This adds some extra patterns to select AArch64 Neon SQADD, UQADD, SQSUB and UQSUB from the existing target independent sadd_sat, uadd_sat, ssub_sat and usub_sat nodes. It does not attempt to replace the existing int_aarch64_neon_uqadd intrinsic nodes as they are apparently used for both scalar and vector, and need to be legal on scalar types for some of the patterns to work. The int_aarch64_neon_uqadd on scalar would move the two integers into floating point registers, perform a Neon uqadd and move the value back. I don't believe it is a good idea for uadd_sat to do the same, as the scalar alternative is simpler (an adds with a csinv). For signed it may be smaller, but I'm not sure about it being better. So this just adds some extra patterns for the existing vector instructions, matching on the _sat nodes. Differential Revision: https://reviews.llvm.org/D69374	2019-10-31 17:28:36 +00:00
Ehsan Amiri	ed7bcb2cb1	[AArch64][SVE] Add patterns for some integer vector instructions Add pattern matching for SVE vector instructions: -- add, sub, and, or, xor instructions -- sqadd, uqadd, sqsub, uqsub target-independent intrinsics -- bic intrinsics -- predicated add, sub, subr intrinsics Patch Review: https://reviews.llvm.org/D69128 Patch authored by: dancgr (Danilo Carvalho Grael)	2019-10-30 21:52:19 -04:00
Andrew Paverd	d157a9bc8b	Add Windows Control Flow Guard checks (/guard:cf). Summary: A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on indirect function calls, using either the check mechanism (X86, ARM, AArch64) or the dispatch mechanism (X86-64). The check mechanism requires a new calling convention for the supported targets. The dispatch mechanism adds the target as an operand bundle, which is processed by SelectionDAG. Another pass (CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as required by /guard:cf. This feature is enabled using the `cfguard` CC1 option. Reviewers: thakis, rnk, theraven, pcc Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65761	2019-10-28 15:19:39 +00:00
Kerry McLaughlin	da720a38b9	[AArch64][SVE] Implement masked load intrinsics Summary: Adds support for codegen of masked loads, with non-extending, zero-extending and sign-extending variants. Reviewers: huntergr, rovka, greened, dmgreen Reviewed By: dmgreen Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68877	2019-10-28 10:06:14 +00:00
Graham Hunter	84da2596f9	[AArch64][SVE] Add SPLAT_VECTOR ISD Node Adds a new ISD node to replicate a scalar value across all elements of a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot be used. Fixes up default type legalization for scalable vectors after the new MVT type ranges were introduced. At present I only use this node for scalable vectors. A DAGCombine has been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all elements are the same, but only if the default operation action of Expand has been overridden by the target. I've only added result promotion legalization for scalable vector i8/i16/i32/i64 types in AArch64 for now. Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy Reviewed By: jmolloy Differential Revision: https://reviews.llvm.org/D47775 llvm-svn: 375222	2019-10-18 11:48:35 +00:00
Kerry McLaughlin	0c7cc383e5	[AArch64][SVE] Implement unpack intrinsics Summary: Implements the following intrinsics: - int_aarch64_sve_sunpkhi - int_aarch64_sve_sunpklo - int_aarch64_sve_uunpkhi - int_aarch64_sve_uunpklo This patch also adds AArch64ISD nodes for UNPK instead of implementing the intrinsics directly, as they are required for a future patch which implements the sign/zero extension of legal vectors. This patch includes tests for the Subdivide2Argument type added by D67549 Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka Reviewed By: greened Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D67550 llvm-svn: 375210	2019-10-18 09:40:16 +00:00
Graham Hunter	b302561b76	[SVE][IR] Scalable Vector size queries and IR instruction support * Adds a TypeSize struct to represent the known minimum size of a type along with a flag to indicate that the runtime size is an integer multiple of that size * Converts existing size query functions from Type.h and DataLayout.h to return a TypeSize result * Adds convenience methods (including a transparent conversion operator to uint64_t) so that most existing code 'just works' as if the return values were still scalars. * Uses the new size queries along with ElementCount to ensure that all supported instructions used with scalable vectors can be constructed in IR. Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen Reviewed By: rovka, sdesmalen Differential Revision: https://reviews.llvm.org/D53137 llvm-svn: 374042	2019-10-08 12:53:54 +00:00
Nikola Prica	02682498b8	[ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers Support for tracking registers that forward function parameters into the following function frame. For now we only support cases when a parameter is forwarded through a single register. Reviewers: aprantl, vsk, t.p.northover Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D66953 llvm-svn: 374033	2019-10-08 09:43:05 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Hans Wennborg	3740ae3b8a	Revert r372893 "[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets" This caused severe compile-time regressions, see PR43455. > Modern processors predict the targets of an indirect branch regardless of > the size of any jump table used to glean its target address. Moreover, > branch predictors typically use resources limited by the number of actual > targets that occur at run time. > > This patch changes the semantics of the option `-max-jump-table-size` to limit > the number of different targets instead of the number of entries in a jump > table. Thus, it is now renamed to `-max-jump-table-targets`. > > Before, when `-max-jump-table-size` was specified, it could happen that > cluster jump tables could have targets used repeatedly, but each one was > counted and typically resulted in tables with the same number of entries. > With this patch, when specifying `-max-jump-table-targets`, tables may have > different lengths, since the number of unique targets is counted towards the > limit, but the number of unique targets in tables is the same, but for the > last one containing the balance of targets. > > Differential revision: https://reviews.llvm.org/D60295 llvm-svn: 373060	2019-09-27 09:54:26 +00:00
Evandro Menezes	3bd8ba156b	[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets Modern processors predict the targets of an indirect branch regardless of the size of any jump table used to glean its target address. Moreover, branch predictors typically use resources limited by the number of actual targets that occur at run time. This patch changes the semantics of the option `-max-jump-table-size` to limit the number of different targets instead of the number of entries in a jump table. Thus, it is now renamed to `-max-jump-table-targets`. Before, when `-max-jump-table-size` was specified, it could happen that cluster jump tables could have targets used repeatedly, but each one was counted and typically resulted in tables with the same number of entries. With this patch, when specifying `-max-jump-table-targets`, tables may have different lengths, since the number of unique targets is counted towards the limit, but the number of unique targets in tables is the same, but for the last one containing the balance of targets. Differential revision: https://reviews.llvm.org/D60295 llvm-svn: 372893	2019-09-25 16:10:20 +00:00
Florian Hahn	364a23427b	[AArch64] Convert neon_ushl and neon_sshl with positive constants to VSHL. I think we should be able to use shl instead of sshl and ushl for positive constant shift values, unless I am missing something. We already have the machinery in place to ensure we only replace nodes, if the shift value is positive and <= the element width. This is a generalization of an earlier patch rL372565. Reviewers: t.p.northover, samparker, dmgreen, anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D67955 llvm-svn: 372824	2019-09-25 08:22:05 +00:00
Florian Hahn	3e2fdbee80	[AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine. Try to generate ushll/sshll for aarch64_neon_ushl/aarch64_neon_sshl, if their first operand is extended and the second operand is a constant. Also adds a few tests marked with FIXME, where we can further improve codegen. Reviewers: t.p.northover, samparker, dmgreen, anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D62308 llvm-svn: 372565	2019-09-23 09:38:53 +00:00
Benjamin Kramer	df4b9a3f4f	Hide implementation details in namespaces. llvm-svn: 372113	2019-09-17 12:56:29 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
Kerry McLaughlin	e55b3bf40e	[SVE][Inline-Asm] Add constraints for SVE predicate registers Summary: Adds the following inline asm constraints for SVE: - Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive - Upa: SVE predicate register with full range, P0 to P15 Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin Reviewed By: rovka Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66524 llvm-svn: 371967	2019-09-16 09:45:27 +00:00
Tim Northover	f1c2892912	AArch64: support arm64_32, an ILP32 slice for watchOS. This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM. FastISel is mostly disabled for now since it would generate incorrect code for ILP32. llvm-svn: 371722	2019-09-12 10:22:23 +00:00
Guillaume Chatelet	ad1cea0dda	[Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67267 llvm-svn: 371212	2019-09-06 15:03:49 +00:00
Guillaume Chatelet	9fcf066d0c	[Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67278 llvm-svn: 371210	2019-09-06 14:51:15 +00:00
Guillaume Chatelet	4fc3ad9e13	[Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67229 llvm-svn: 371200	2019-09-06 12:48:34 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that take or return alignment as log2; this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were dealing with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments; internally these values are interpreted as log2(align). This patch updates the documentation. - `MachineFunction` exposes `align-all-functions` also interpreted as power of two alignment; internally this value is interpreted as log2(align). This patch updates the documentation. Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
Kerry McLaughlin	7b5c6b8d86	[SVE][Inline-Asm] Fix -Wimplicit-fallthrough in AArch64ISelLowering.cpp Summary: Adds break to 'x' case in getRegForInlineAsmConstraint added by D66302, fixing the unintentional fallthrough. Reviewers: sdesmalen, rovka, cameron.mcinally, greened, gribozavr, ruiu Reviewed By: sdesmalen Subscribers: bjope, javed.absar, tschuett, kristof.beyls, rkruppe, psnobl, llvm-commits, cfe-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67095 llvm-svn: 370769	2019-09-03 15:45:42 +00:00
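Commit 2179867ddc selects Neon SQADD/UQADD/SQSUB/UQSUB for the target-independent sadd_sat, uadd_sat, ssub_sat and usub_sat nodes. It also argues that a scalar uadd_sat is better served by an adds followed by a csinv. The C++ below is a minimal sketch of the per-lane results those nodes define. It assumes 8-bit lanes for illustration, and the helper names are hypothetical, not LLVM functions. The unsigned add follows the adds/csinv idea: a wrapping add, then all-ones when the add carried out.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers (not LLVM APIs) modelling the ISD saturating nodes
// on a single 8-bit lane.

// uadd_sat: wrapping add, then force all-ones if the add carried out,
// mirroring the "adds + csinv" scalar sequence mentioned in the commit.
uint8_t uadd_sat8(uint8_t a, uint8_t b) {
    uint8_t sum = static_cast<uint8_t>(a + b);  // wraps modulo 256
    return sum < a ? std::numeric_limits<uint8_t>::max() : sum;  // carry => 0xFF
}

// usub_sat: clamp at zero instead of wrapping below it.
uint8_t usub_sat8(uint8_t a, uint8_t b) {
    return a > b ? static_cast<uint8_t>(a - b) : 0;
}

// Clamp an exact (wider) result into the signed 8-bit range [-128, 127].
int8_t clamp_s8(int v) {
    if (v > std::numeric_limits<int8_t>::max()) return std::numeric_limits<int8_t>::max();
    if (v < std::numeric_limits<int8_t>::min()) return std::numeric_limits<int8_t>::min();
    return static_cast<int8_t>(v);
}

// sadd_sat / ssub_sat: compute exactly in int, then saturate.
int8_t sadd_sat8(int8_t a, int8_t b) { return clamp_s8(int(a) + int(b)); }
int8_t ssub_sat8(int8_t a, int8_t b) { return clamp_s8(int(a) - int(b)); }
```

For example, uadd_sat8(200, 100) yields 255 rather than the wrapped value 44, and sadd_sat8(-100, -100) yields -128.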

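Commit 3bd8ba156b changed the jump-table limit from counting table entries to counting distinct targets. It was reverted in 3740ae3b8a because of compile-time regressions (PR43455). The sketch below is a hypothetical model, not LLVM's jump-table code. It assumes each entry is simply the label of its destination block and that a table fits when its metric is at or below the limit, and shows how the two metrics differ for the same table.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: one destination label per case value in the table.
using JumpTable = std::vector<std::string>;

// Metric bounded by -max-jump-table-size: every entry counts, repeats included.
std::size_t countEntries(const JumpTable &table) { return table.size(); }

// Metric proposed for -max-jump-table-targets: each distinct target counts once.
std::size_t countUniqueTargets(const JumpTable &table) {
    return std::set<std::string>(table.begin(), table.end()).size();
}

// Assumed acceptance rule for this sketch: metric <= limit.
bool fitsEntryLimit(const JumpTable &t, std::size_t limit) { return countEntries(t) <= limit; }
bool fitsTargetLimit(const JumpTable &t, std::size_t limit) { return countUniqueTargets(t) <= limit; }
```

For the table {"bb1", "bb2", "bb1", "bb3", "bb1"} there are 5 entries but only 3 distinct targets. A limit of 4 therefore rejects it under the entry metric and accepts it under the target metric.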
1056 Commits
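Commit aff45e4b23 makes explicit which functions carry an alignment as log2 rather than as a power-of-two byte count. It also notes that flags such as align-all-functions are read internally as log2(align). The following is a minimal sketch of that distinction using hypothetical helpers, not LLVM's llvm::Align API.

```cpp
#include <cstdint>

// Hypothetical helpers (not LLVM's llvm::Align API).

// An alignment stored as log2 denotes 2^log2Align bytes.
uint64_t alignFromLog2(unsigned log2Align) { return uint64_t{1} << log2Align; }

// A valid byte alignment is a non-zero power of two.
bool isPowerOf2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Inverse of alignFromLog2; assumes isPowerOf2(align).
unsigned log2FromAlign(uint64_t align) {
    unsigned shift = 0;
    while ((uint64_t{1} << shift) < align) ++shift;
    return shift;
}

// Round an offset up to the next multiple of a power-of-two byte alignment.
uint64_t roundUpToAlign(uint64_t offset, uint64_t align) {
    return (offset + align - 1) & ~(align - 1);
}
```

Under the log2 interpretation, a flag value of 4 requests 16-byte alignment (alignFromLog2(4) == 16), not 4-byte alignment. That is the mismatch between documentation and behaviour that the commit records.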