llvm-project

Commit Graph

Author	SHA1	Message	Date
QingShan Zhang	ebdd20f430	Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128 X86 and AArch64 expand it as libcall inside the target. And PowerPC also want to expand them as libcall for P8. So, propose an implement in the legalizer to common the logic and remove the code for X86/AArch64 to avoid the duplicate code. Reviewed By: Craig Topper Differential Revision: https://reviews.llvm.org/D91331	2020-12-17 07:59:30 +00:00
Paul Walker	b74c4dbb96	[SVE] Move INT_TO_FP i1 promotion into custom lowering. AddPromotedToType is being used to legalise INT_TO_FP operations when the source is a predicate. The point where this introduces vector extends might cause problems in the future so this patch falls back to manual promotion within custom lowering. Differential Revision: https://reviews.llvm.org/D90093	2020-12-15 11:57:07 +00:00
Kerry McLaughlin	c5ced82c8e	[SVE][CodeGen] Lower scalable floating-point vector reductions Changes in this patch: - Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally to also work for scalable types - Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32) - Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D93050	2020-12-14 11:45:42 +00:00
Florian Hahn	46bc40e502	Recommit "[AArch64] Lower calls with rv_marker attribute." This recommits `a87fccb3ff` with a fix to mark the destination operand of the marker instruction as def, to fix a machine verifier failure. This reverts the revert commit `c0f2cea7c0`.	2020-12-13 16:20:39 +00:00
Florian Hahn	c0f2cea7c0	Revert "[AArch64] Lower calls with rv_marker attribute ." This reverts commit `a87fccb3ff`. A test appears to fail with expensive checks. Reverting while I investigate.	2020-12-11 20:12:59 +00:00
Florian Hahn	a87fccb3ff	[AArch64] Lower calls with rv_marker attribute . This patch adds support for lowering function calls with the rv_marker attribute. The goal is to expand such calls to the following sequence of instructions: BL @fn mov x29, x29 This sequence of instructions triggers Objective-C runtime optimizations, hence we want to ensure no instructions get moved in between them. This patch achieves that by adding a new CALL_RVMARKER ISD node, which gets turned into the BLR_RVMARKER pseudo, which eventually gets expanded into the sequence mentioned above. The sequence is then marked as instruction bundle, to avoid anything being moved in between. @ahatanak is working on using this attribute in the front- & middle-end. Together with the front- & middle-end changes, this should address PR31925 for AArch64. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D92569	2020-12-11 19:45:44 +00:00
Kerry McLaughlin	abe7775f5a	[SVE][CodeGen] Extend index of masked gathers This patch changes performMSCATTERCombine to also promote the indices of masked gathers where the element type is i8 or i16, and adds various tests for gathers with illegal types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91433	2020-12-10 13:54:45 +00:00
Kerry McLaughlin	05edfc5475	[SVE][CodeGen] Add DAG combines for s/zext_masked_gather This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true: - fold (and (masked_gather x)) -> (zext_masked_gather x) - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x) LowerMGATHER has also been updated to fetch the LoadExtType associated with the gather and also use this value to determine the correct masked gather opcode to use. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92230	2020-12-09 11:53:19 +00:00
Huihui Zhang	8e6fc1f97e	[AArch64][SVE] Add lowering for llvm.maxnum\|minnum for scalable type. LLVM intrinsic llvm.maxnum\|minnum is overloaded intrinsic, can be used on any floating-point or vector of floating-point type. This patch extends current infrastructure to support scalable vector type. This patch also fix a warning message of incorrect use of EVT::getVectorNumElements() for scalable type, when DAGCombiner trying to split scalable vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92607	2020-12-08 09:35:53 -08:00
David Sherwood	59f17b57d9	[SVE] Fix crashes with inline assembly All the crashes found compiling inline assembly are fixed in this patch by changing AArch64TargetLowering::getRegForInlineAsmConstraint to be more resilient to mismatched value and register types. For example, it makes no sense to request a predicate register for a nxv2i64 type and so on. Tests have been added here: test/CodeGen/AArch64/inline-asm-constraints-bad-sve.ll Differential Revision: https://reviews.llvm.org/D92554	2020-12-08 13:48:43 +00:00
Tim Northover	c5978f42ec	UBSAN: emit distinctive traps Sometimes people get minimal crash reports after a UBSAN incident. This change tags each trap with an integer representing the kind of failure encountered, which can aid in tracking down the root cause of the problem.	2020-12-08 10:28:26 +00:00
Kerry McLaughlin	111f559bbd	[SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER The refineIndexType & refineUniformBase functions added by D90942 can also be used to improve CodeGen of masked gathers. These changes were split out from D91092 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92319	2020-12-07 13:20:19 +00:00
Kerry McLaughlin	f6dd32fd35	[SVE][CodeGen] Lower scalable masked gathers Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only) Changes in this patch: - Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate gather load opcode to use. - Improve codegen with refineIndexType/refineUniformBase, added in D90942 - Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91092	2020-12-07 12:20:41 +00:00
Craig Topper	c55d9af8c0	[AArch64] Add custom lowering for ISD::ABS Instead of trying to pattern match the code produced by ISD::ABS expansion, just custom legalize ISD::ABS to the desired sequence. The one test change is because a DAG combine for (neg (abs)) is no longer firing because ISD::ABS is now Custom instead of Expand. Differential Revision: https://reviews.llvm.org/D92154	2020-12-04 10:45:31 -08:00
Kerry McLaughlin	603d40da9d	[SVE][CodeGen] Add a DAG combine to extend mscatter indices This patch adds a target-specific DAG combine for mscatter to promote indices with element types i8 or i16 before legalisation, plus various tests with illegal types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90945	2020-11-25 11:18:22 +00:00
Pavel Iliin	4d7df43ffd	[AArch64] Out-of-line atomics (-moutline-atomics) implementation. This patch implements out of line atomics for LSE deployment mechanism. Details how it works can be found in llvm/docs/Atomics.rst Options -moutline-atomics and -mno-outline-atomics to enable and disable it were added to clang driver. This is clang and llvm part of out-of-line atomics interface, library part is already supported by libgcc. Compiler-rt support is provided in separate patch. Differential Revision: https://reviews.llvm.org/D91157	2020-11-20 13:30:12 +00:00
Adhemerval Zanella	807320119f	[AArch64] Lower fptrunc/fpext from/to FP128t to/from FP16 The compiler-rt part which adds the emitted symbols is handled in a subsequent patch. Differential Revision: https://reviews.llvm.org/D91731	2020-11-19 15:14:50 -03:00
Kerry McLaughlin	306c8ab208	[SVE][CodeGen] Improve codegen of scalable masked scatters If the scatter store is able to perform the sign/zero extend of its index, this is folded into the instruction with refineIndexType(). Additionally, refineUniformBase() will return the base pointer and index from an add + splat_vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90942	2020-11-13 11:19:36 +00:00
David Sherwood	3225fcf11e	[SVE] Deal with SVE tuple call arguments correctly when running out of registers When passing SVE types as arguments to function calls we can run out of hardware SVE registers. This is normally fine, since we switch to an indirect mode where we pass a pointer to a SVE stack object in a GPR. However, if we switch over part-way through processing a SVE tuple then part of it will be in registers and the other part will be on the stack. I've fixed this by ensuring that: 1. When we don't have enough registers to allocate the whole block we mark any remaining SVE registers temporarily as allocated. 2. We temporarily remove the InConsecutiveRegs flags from the last tuple part argument and reinvoke the autogenerated calling convention handler. Doing this prevents the code from entering an infinite recursion and, in combination with 1), ensures we switch over to the Indirect mode. 3. After allocating a GPR register for the pointer to the tuple we then deallocate any SVE registers we marked as allocated in 1). We also set the InConsecutiveRegs flags back how they were before. 4. I've changed the AArch64ISelLowering LowerCALL and LowerFormalArguments functions to detect the start of a tuple, which involves allocating a single stack object and doing the correct numbers of legal loads and stores. Differential Revision: https://reviews.llvm.org/D90219	2020-11-12 08:41:50 +00:00
Caroline Concatto	37f4ccb275	[AArch64]Add memory op cost model for SVE This patch adds/fixes memory op cost model for SVE with fixed-width vector. Differential Revision: https://reviews.llvm.org/D90950	2020-11-11 12:49:19 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Kerry McLaughlin	170947a5de	[SVE][CodeGen] Lower scalable masked scatters Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only) Changes included in this patch: - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use. Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts. - Added the getCanonicalIndexType function to convert redundant addressing modes (e.g. scaling is redundant when accessing bytes) - Tests with 32 & 64-bit scaled & unscaled offsets Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90941	2020-11-11 11:50:22 +00:00
Francesco Petrogalli	9f61931e07	[llvm][AArch64] Allow TB(N)Z to drop signext for sign bit tests. For example if the sign extension is only used in for TBZ, and the value is used elsewhere with a zero extension, this can eliminate a sign extension. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D90606	2020-11-09 18:27:48 +00:00
Christopher Tetreault	900ec97bbe	[UBSan] Cannot negate smallest negative signed integer Silence warning Undefined Behavior Sanitzer warning: runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D90710	2020-11-04 10:07:52 -08:00
Kerry McLaughlin	f2412d372d	[SVE][CodeGen] Lower scalable integer vector reductions This patch uses the existing LowerFixedLengthReductionToSVE function to also lower scalable vector reductions. A separate function has been added to lower VECREDUCE_AND & VECREDUCE_OR operations with predicate types using ptest. Lowering scalable floating-point reductions will be addressed in a follow up patch, for now these will hit the assertion added to expandVecReduce() in TargetLowering. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D89382	2020-11-04 11:38:49 +00:00
David Sherwood	cea69fa4dc	[SVE] Add fatal error for unnamed SVE variadic arguments We don't currently support passing unnamed variadic SVE arguments so I've added a fatal error if we hit such cases to prevent any silent ABI issues in future. Differential Revision: https://reviews.llvm.org/D90230	2020-10-30 13:35:47 +00:00
Florian Hahn	ba78cae20f	[AArch64] Use DUP for BUILD_VECTOR with few different elements. If most elements of BUILD_VECTOR are the same, with a few different elements, it is better to use DUP for the common elements and INSERT_VECTOR_ELT for the different elements. Currently this transform is guarded quite restrictively to only trigger in clearly beneficial cases. With D90176, the lowering for patterns originating from code like ` float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even better (unnecessary fmov is removed). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D90233	2020-10-28 19:48:20 +00:00
David Green	066737fdbc	[AArch64] Remove AArch64ISD::NOT, use vnot instead vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT node, but allow more folding thanks to all the target independent optimizations. Specifically this allows select(icmp ne, x, y) to become "cmeq; bsl y, x" as opposed to needing to convert the predicate with "cmeq; mvn; bsl x, y" Unfortunately there is a regression in a cmtst test, but the code it selected from was already non-canonical, with instcombine preferring to use an eq predicate instead. Plus the more common case of icmp ne is improved. Differential Revision: https://reviews.llvm.org/D90126	2020-10-28 08:15:37 +00:00
Cameron McInally	a1cc274cb3	[SVE] Lower fixed length VECREDUCE_SEQ_FADD operation Differential Revision: https://reviews.llvm.org/D89162	2020-10-23 16:24:02 -05:00
David Sherwood	d67d8f8790	[SVE][AArch64] Replace TypeSize comparisons with their integer equivalents In many places in the AArch64 backend we are comparing TypeSize objects, but in fact we are only ever expecting fixed width types. I've changed all such comparisons to use their integer equivalents by replacing calls to getSizeInBits() with getFixedSizeInBits(), etc. Differential Revision: https://reviews.llvm.org/D89116	2020-10-19 07:41:33 +01:00
Vinay Madhusudan	159b2d8e62	[AArch64] Combine UADDVs to generate vector add ADD(UADDV a, UADDV b) --> UADDV(ADD a, b) This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88731	2020-10-15 09:56:31 +05:30
Cameron McInally	421f1b7294	[SVE] Lower fixed length VECREDUCE_FADD operation Differential Revision: https://reviews.llvm.org/D89263	2020-10-14 09:41:11 -05:00
David Sherwood	af57a0838e	[SVE] Add fatal error when running out of registers for SVE tuple call arguments When passing SVE types as arguments to function calls we can run out of hardware SVE registers. This is normally fine, since we switch to an indirect mode where we pass a pointer to a SVE stack object in a GPR. However, if we switch over part-way through processing a SVE tuple then part of it will be in registers and the other part will be on the stack. This is wrong and we'd like to avoid any silent ABI compatibility issues in future. For now, I've added a fatal error when this happens until we can get a proper fix. Differential Revision: https://reviews.llvm.org/D89326	2020-10-14 09:31:41 +01:00
Vinay Madhusudan	37dce7475b	[AArch64] Identify SAD pattern (ABS (SUB (EXTEND a), (EXTEND b))) to ZERO_EXTEND((UABD a, b)) (ABS (SUB (SIGN_EXTEND a), (SIGN_EXTEND b))) to ZERO_EXTEND((SABD a, b)) This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88742	2020-10-13 15:50:54 +05:30
Cameron McInally	974ddb54c9	[SVE] Lower fixed length VECREDUCE_XOR operation Differential Revision: https://reviews.llvm.org/D88974	2020-10-12 10:12:15 -05:00
Cameron McInally	333b2ab60b	[SVE] Lower fixed length VECREDUCE_OR operation Differential Revision: https://reviews.llvm.org/D88847	2020-10-07 09:56:25 -05:00
Paul Walker	27f3d51b4e	[SVE] Lower fixed length vector fneg and fsqrt operations. Also updates sve-fp.ll to use fneg directly. Differential Revision: https://reviews.llvm.org/D88683	2020-10-06 10:48:16 +01:00
Paul Walker	8bb702a8ad	[SVE] Lower fixed length vector floating point rounding operations. Adds lowering for: llvm.ceil llvm.floor llvm.nearbyint llvm.rint llvm.round llvm.trunc Differential Revision: https://reviews.llvm.org/D88671	2020-10-06 10:48:16 +01:00
Cameron McInally	9642ded8ba	[SVE] Lower fixed length VECREDUCE_AND operation Differential Revision: https://reviews.llvm.org/D88707	2020-10-05 11:28:38 -05:00
Vinay Madhusudan	f192594956	[AArch64] Generate dot for v16i8 sum reduction to i32 Convert VECREDUCE_ADD( EXTEND(v16i8_type) ) to VECREDUCE_ADD( DOTv16i8(v16i8_type) ) whenever the result type is i32. This gains in one of the SPECCPU 2017 benchmark. This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88577	2020-10-02 17:11:02 +01:00
Muhammad Asif Manzoor	aab6f7db47	[AArch64][SVE] Add lowering for llvm fabs Add the functionality to lower fabs for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D88679	2020-10-01 19:41:25 -04:00
Kerry McLaughlin	fcf70e1e3b	[SVE][CodeGen] Lower scalable fp_extend & fp_round operations This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU ISD nodes, used to lower scalable vector fp_extend/fp_round operations. fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one. This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll, resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND. Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D88321	2020-10-01 12:17:37 +01:00
Kerry McLaughlin	75db7cf78a	[SVE][CodeGen] Legalisation of integer -> floating point conversions Splitting the operand of a scalable [S\|U]INT_TO_FP results in a concat_vectors operation where the operands are unpacked FP scalable vectors (e.g. nxv2f32). This patch adds custom lowering of concat_vectors which checks that the number of operands is 2, and isel patterns to match concat_vectors of scalable FP types with uzp1. Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D88033	2020-10-01 10:43:20 +01:00
Paul Walker	8931c3d682	[NFC] Iterate across an explicit list of scalable MVTs when driving setOperationAction. Iterating across all of integer_scalable_vector_valuetypes seems wasteful when there's only a handful we care about. Also removes some rouge whitespace. Differential Revision: https://reviews.llvm.org/D88552	2020-10-01 10:17:59 +01:00
Cameron McInally	80381c4dc9	[SVE] Lower fixed length VECREDUCE_[FMAX\|FMIN] to Scalable Differential Revision: https://reviews.llvm.org/D88444	2020-09-29 16:22:29 -05:00
Cameron McInally	9b0b09671c	[SVE] Lower fixed length VECREDUCE_[UMAX\|UMIN] to Scalable Essentially the same as the signed variants from D88259. Also includes a clean up of the lowering function. Differential Revision: https://reviews.llvm.org/D88317	2020-09-28 09:29:00 -05:00
David Sherwood	bafdd11326	[SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy After some recent upstream discussion we decided that it was best to avoid having the / operator for both ElementCount and TypeSize, since this could give the impression that these classes can be used in the same way as basic integer integer types. However, division for scalable types is a bit odd because we are only dividing the minimum quantity by a value, as opposed to something like: (MinSize * Vscale) / SomeValue This is why when performing division it's important the caller first establishes whether the operation makes sense, perhaps by calling isKnownMultipleOf() prior to division. The caller must now explictly call divideCoefficientBy() on the class to perform the operation. Differential Revision: https://reviews.llvm.org/D87700	2020-09-28 08:03:00 +01:00
Cameron McInally	9a4767411e	[SVE] Revert accidental change from 405e22fbe8719cff6c40eec15c2044f42527f116 Accidentally commited two lines that were not intended. Remove those.	2020-09-25 10:11:10 -05:00
Cameron McInally	e2ccf7f178	[SVE] Lower fixed length VECREDUCE_[SMAX\|SMIN] to Scalable This patch is pretty similar to the VECREDUCE_ADD patch, with some minor tweaks. Results from the AArch64ISD::[SMAX\|SMIN]V_PRED return element sized results. This requires an ANY_EXTEND for results < 32-bits, since Legalization promotes those results. There is no NEON i64 vector support for SMAXV\|SMINV, so use SVE for those. Differential Revision: https://reviews.llvm.org/D88259	2020-09-25 09:58:17 -05:00
Daniel Kiss	2a96f47c5f	[AArch64] __builtin_return_address for PAuth. This change adds the support for __builtin_return_address for ARMv8.3A Pointer Authentication. Location of the authentication code in the pointer depends on the system configuration, therefore a dedicated instruction is used for effectively removing the authentication code without authenticating the pointer. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D75044	2020-09-24 23:23:49 +02:00
Cameron McInally	e8413ac97f	[AArch64] Expand some vector of i64 reductions on NEON With the exception of VECREDUCE_ADD, there are no NEON instructions to support vector of i64 reductions. This patch removes the Custom lowerings for those and adds some test coverage to confirm. Differential Revision: https://reviews.llvm.org/D88161	2020-09-23 16:01:24 -05:00
Muhammad Asif Manzoor	3a76de4275	[AArch64][SVE] Add lowering for llvm frecpx Add the functionality to lower frecpx for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D88032	2020-09-23 15:23:54 -04:00
Cameron McInally	db40a74344	[SVE] Lower fixed length ISD::VECREDUCE_ADD to Scalable Differential Revision: https://reviews.llvm.org/D87796	2020-09-23 09:08:07 -05:00
Kerry McLaughlin	d0149ba9b4	[SVE][CodeGen] Lower legal integer -> floating point conversions This patch adds new ISD nodes, SCVTZ_MERGE_PASSTHRU & UCVTZ_MERGE_PASSTHRU, which are used to lower both legal scalable vector [S\|U]INT_TO_FP operations and the following intrinsics: - llvm.aarch64.sve.scvtf - llvm.aarch64.sve.ucvtf Reviewed By: sdesmalen, efriedma Differential Revision: https://reviews.llvm.org/D87913	2020-09-23 11:53:53 +01:00
David Sherwood	e077367a28	[SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits An existing function Type::getScalarSizeInBits returns a uint64_t instead of a TypeSize class because the caller is requesting a scalar size, which cannot be scalable. This patch makes other similar functions requesting a scalar size consistent with that, thereby eliminating more than 1000 implicit TypeSize -> uint64_t casts. Differential revision: https://reviews.llvm.org/D87889	2020-09-23 09:20:08 +01:00
Paul Walker	f3fa954b5b	[SVE] Change definition of reduction ISD nodes to have an SVE vector result type. The current nodes, AArch64::SMAXV_PRED for example, are defined to return a NEON vector result. This is incorrect because they modify the complete SVE register and are thus changed to represent such. This patch also adds nodes for UADDV_PRED and SADDV_PRED, which unifies the handling of all SVE reductions. NOTE: Floating-point reductions are already implemented correctly, so this patch is essentially making everything consistent with those. Differential Revision: https://reviews.llvm.org/D87843	2020-09-21 13:16:28 +01:00
Tim Northover	2afe4becec	AArch64: make sure jump table entries can reach entire image This turns all jump table entries into deltas within the target function because in the small memory model all code & static data must be in a 4GB block somewhere in memory. When the entries were a delta between the table location and a basic block, the 32-bit signed entries are not enough to guarantee reachability. https://reviews.llvm.org/D87286	2020-09-18 09:50:40 +01:00
Cameron McInally	a35c7f3076	[SVE][WIP] Implement lowering for fixed length VSELECT to Scalable Map fixed length VSELECT to its Scalable equivalent. Differential Revision: https://reviews.llvm.org/D85364	2020-09-17 14:02:57 -05:00
Sanne Wouda	d5fd3d9b90	[AArch64] Match pairwise add/fadd pattern D75689 turns the faddp pattern into a shuffle with vector add. Match this new pattern in target-specific DAG combine, rather than ISel, because legalization (for v2f32) turns it into a bit of a mess. - extended to cover f16, f32, f64 and i64	2020-09-17 16:27:01 +01:00
Kerry McLaughlin	f7185b271f	[SVE][CodeGen] Lower floating point -> integer conversions This patch adds new ISD nodes, FCVTZS_MERGE_PASSTHRU & FCVTZU_MERGE_PASSTHRU, which are used to lower scalable vector FP_TO_SINT/FP_TO_UINT operations and the following intrinsics: - llvm.aarch64.sve.fcvtzu - llvm.aarch64.sve.fcvtzs Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D87232	2020-09-17 14:04:22 +01:00
Muhammad Asif Manzoor	d417488ef5	[AArch64][SVE] Add lowering for llvm fsqrt Add the functionality to lower fsqrt for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D87707	2020-09-15 15:26:17 -04:00
Philip Reames	e6bc7037d3	[AArch64] Statepoint support for AArch64. Differential Revision: https://reviews.llvm.org/D66012 Patch By: loicottet (with major rebase by me)	2020-09-14 16:43:08 -07:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
Kerry McLaughlin	cd89f5c91b	[SVE][CodeGen] Legalisation of truncate for scalable vectors Truncating from an illegal SVE type to a legal type, e.g. `trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>` fails after PromoteIntOp_CONCAT_VECTORS attempts to create a BUILD_VECTOR. This patch changes the promote function to create a sequence of INSERT_SUBVECTORs if the return type is scalable, and replaces these with UNPK+UZP1 for AArch64. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86548	2020-09-10 11:35:33 +01:00
Muhammad Asif Manzoor	1ffcbe35ae	[AArch64][SVE] Add lowering for rounding operations Add the functionality to lower SVE rounding operations for passthru variant. Created a new test case file for all rounding operations. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86793	2020-09-04 11:16:57 -04:00
Benjamin Kramer	8782c72765	Strength-reduce SmallVectors to arrays. NFCI.	2020-08-28 21:14:20 +02:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Ties Stuij	d678e14c55	[AArch64][CodeGen] Restrict bfloat vector operations to what's actually supported Previously in addTypeForNeon, we would set the operations for bfloat vectors like other generic types. But as bfloat is a storage-only type a number of operations shouldn't be set. This patch fixes that. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85101	2020-08-28 11:44:37 +01:00
Mikhail Maltsev	23d5e93f34	[AArch64] Optimize instruction selection for certain vector shuffles This patch adds code to recognize vector shuffles which can be represented as VDUP (splat) of a vector lane with of a different (wider) type than the original vector lane type. For example: shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1> is essentially: shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0> Such patterns are generated by the SelectionDAG machinery in some cases (see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double bitcasts from shuffles" part). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86225	2020-08-27 11:06:49 +01:00
Paul Walker	81337c915f	[SVE] Fallback to default expansion when lowering SIGN_EXTEN_INREG from non-byte based source. Differential Revision: https://reviews.llvm.org/D86394	2020-08-27 10:57:37 +01:00
Ahmed Bougacha	383f7c8858	[AArch64] Use CCAssignFnForReturn helper in more spots. NFC. It was added for GISel, but SDAG could use it too!	2020-08-26 14:39:11 -07:00
Muhammad Asif Manzoor	fd536eeed9	[AArch64][SVE] Add lowering for llvm fceil Add the functionality to lower fceil for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84548	2020-08-26 15:59:44 -04:00
Paul Walker	73ac3c0ede	[SVE] Lower scalable vector ISD::FNEG operations. Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A) transformation when -1 is expressed as an ISD::SPLAT_VECTOR. Differential Revision: https://reviews.llvm.org/D86415	2020-08-25 11:22:28 +01:00
Cameron McInally	36dbb8fc97	[SVE] Lower fixed length UDIV to scalable Pretty much just a copy of the SDIV patches (D86114 and D85982) with string replacement. Differential Revision: https://reviews.llvm.org/D86316	2020-08-21 09:01:25 -05:00
Cameron McInally	ac63959460	[SVE] Lower fixed length vXi8/vXi16 SDIV to scalable There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32. Differential Revision: https://reviews.llvm.org/D86114	2020-08-20 13:47:01 -05:00
Mehdi Amini	a407ec9b6d	Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private."" Was reverted because MLIR/Flang builds were broken, these APIs have been fixed in the meantime.	2020-08-19 17:26:36 +00:00
Mehdi Amini	4fc56d70aa	Revert "[NFC][llvm] Make the contructors of `ElementCount` private." This reverts commit `264afb9e6a`. (and dependent `6b742cc48` and `fc53bd610f`) MLIR/Flang are broken.	2020-08-19 17:21:37 +00:00
Francesco Petrogalli	264afb9e6a	[NFC][llvm] Make the contructors of `ElementCount` private. Differential Revision: https://reviews.llvm.org/D86120	2020-08-19 16:26:44 +00:00
Eli Friedman	bb18532399	[AArch64][SVE] Allow llvm.aarch64.sve.st2/3/4 with vectors of pointers. This isn't necessaary for ACLE, but could be useful in other situations. And the change is simple. Differential Revision: https://reviews.llvm.org/D85251	2020-08-18 12:51:16 -07:00
Paul Walker	cb5cc47a65	[SVE] Lower fixed length vector ISD::SPLAT_VECTOR operations. Also strengthens the CHECK lines for scalable vector splat tests. Differential Revision: https://reviews.llvm.org/D86070	2020-08-18 11:19:43 +01:00
Cameron McInally	92593f9e77	[SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors. Differential Revision: https://reviews.llvm.org/D85982	2020-08-14 18:47:22 -05:00
Cameron McInally	21810b0e14	[SVE] Lower fixed length vector integer UMIN/UMAX Differential Revision: https://reviews.llvm.org/D85926	2020-08-13 14:48:36 -05:00
Cameron McInally	e1a87f0a9b	[SVE] Lower fixed length vector integer SMIN/SMAX Differential Revision: https://reviews.llvm.org/D85855	2020-08-13 11:41:20 -05:00
Paul Walker	e63cc8105a	[SVE] Lower fixed length vector integer shifts. Differential Revision: https://reviews.llvm.org/D85724	2020-08-13 12:35:47 +01:00
Paul Walker	130098228d	[SVE] Lower fixed length vector integer ISD::SETCC operations. Differential Revision: https://reviews.llvm.org/D85831	2020-08-13 12:01:56 +01:00
Paul Walker	9e04895258	[SVE] Lower fixed length integer extend operations. Differential Revision: https://reviews.llvm.org/D85640	2020-08-13 11:54:53 +01:00
Francesco Petrogalli	c561f4d2ec	[SVE][VLS] Don't combine logical AND. Testing is performed when targeting 128, 256 and 512-bit wide vectors. For 128-bit vectors, the original behavior of using NEON instructions is preserved. Differential Revision: https://reviews.llvm.org/D85479	2020-08-12 20:00:07 +01:00
Cameron McInally	ce2c991061	[SVE] Lower fixed length FP minnum/maxnum Lower fixed length MINNUM/MAXNUM to scalable vectors. Cherry-picked from D71767 with added tests. Differential Revision: https://reviews.llvm.org/D85744	2020-08-12 12:02:52 -05:00
David Sherwood	88bbd30736	[SVE][CodeGen] Fix issues with EXTRACT_SUBVECTOR when using scalable FP vectors In this patch I have fixed two issues: 1. Our SVE tuple get/set intrinsics were using the wrong constant type for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the function SelectionDAG::getVectorIdxConstant to create the value. Also, I have updated the documentation for EXTRACT_SUBVECTOR describing what type the constant index should be and we now enforce this when creating the node. 2. The AArch64 backend was missing the appropriate patterns for extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types. I have added them as part of this patch. The only way that I could find to test the new patterns was to use the SVE tuple get intrinsics, although I realise it looks a bit unusual. Tests added here: test/CodeGen/AArch64/sve-extract-subvector.ll Differential Revision: https://reviews.llvm.org/D85516	2020-08-12 08:35:46 +01:00
Paul Walker	b6c7b7fa31	[SVE] Add ISD nodes for predicated integer extend inreg operations. These are useful instructions when lowering fixed length vector extends, so I've broken this patch out as kind of NFC like work. Differential Revision: https://reviews.llvm.org/D85546	2020-08-11 11:39:26 +01:00
Paul Walker	d542feb8e4	[SVE] Lower fixed length vector integer subtract operations. Differential Revision: https://reviews.llvm.org/D85665	2020-08-11 11:32:12 +01:00
Paul Walker	0d33a8ef5b	[SVE] Lower scalable vector mul operations. This allows us to remove extra patterns from AArch64SVEInstrInfo.td because we can reuse those required for fixed length vectors. Differential Revision: https://reviews.llvm.org/D85328	2020-08-06 11:15:35 +01:00
Paul Walker	3ed59b775d	[SVE] Implement lowering for fixed length vector multiplication. NOTE: Also uses SVE code generation for NEON size vectors, instead of expanding i64 based vector multiplications. Differential Revision: https://reviews.llvm.org/D85327	2020-08-06 11:01:39 +01:00
Paul Walker	927fc536ca	[SVE] Add lowering for fixed length vector and, or & xor operations. Since there are no ill effects when performing these operations with undefined elements, they are lowered to the already supported unpredicated scalable vector equivalents. Differential Revision: https://reviews.llvm.org/D85117	2020-08-05 11:28:34 +01:00
Sander de Smalen	f2916636f8	[AArch64][SVE] Disable tail calls if callee does not preserve SVE regs. This fixes an issue triggered by the following code, where emitEpilogue got confused when trying to restore the SVE registers after the call, whereas the call to bar() is implemented as a TCReturn: int non_sve(); int sve(svint32_t x) { return non_sve(); } Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84869	2020-08-05 09:38:54 +01:00
Eli Friedman	95efea4b93	[AArch64][SVE] Widen narrow sdiv/udiv operations. The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit integers. If we see an 8-bit or 16-bit divide, widen the operands to 32 bits, and narrow the result. Differential Revision: https://reviews.llvm.org/D85170	2020-08-04 13:22:15 -07:00
Paul Walker	4be13b15d6	[SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants. This is the final bit of work to relax the register allocation requirements when code generating normal LLVM IR, which rarely care about the result of inactive lanes. By using _PRED nodes we can make better use of SVE's reversed instructions. Also removes a redundant parameter from the min/max tests. Differential Revision: https://reviews.llvm.org/D85142	2020-08-04 11:19:17 +01:00
David Sherwood	23ad660b5d	[SVE][CodeGen] At -O0 fallback to DAG ISel when translating alloca with scalable types When building code at -O0 We weren't falling back to DAG ISel correctly when encountering alloca instructions with scalable vector types. This is because the alloca has no operands that are scalable. I've fixed this by adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca instructions with scalable types. Differential Revision: https://reviews.llvm.org/D84746	2020-07-30 08:40:53 +01:00
Eli Friedman	c02aa53ecb	[AArch64][SVE] Add "fast" fcmp operations. `dacf8d3` added support for most fcmp operations, but there are some extra variations I hadn't considered: SelectionDAG supports float comparisons that are neither ordered nor unordered. Add support for the missing operations. Differential Revision: https://reviews.llvm.org/D84460	2020-07-24 13:22:41 -07:00
Francesco Petrogalli	809600d664	[llvm][sve] Reg + Imm addressing mode for ld1ro. Reviewers: kmclaughlin, efriedma, sdesmalen Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83357	2020-07-24 17:48:47 +00:00
Petre-Ionut Tudor	1af9fc8213	[ARM] Generate [SU]HADD from ((a + b) >> 1) Summary: Teach LLVM to recognize the above pattern, where the operands are either signed or unsigned types. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83777	2020-07-21 13:22:07 +01:00
Eli Friedman	b8f765a1e1	[AArch64][SVE] Add support for trunc to <vscale x N x i1>. This isn't a natively supported operation, so convert it to a mask+compare. In addition to the operation itself, fix up some surrounding stuff to make the testcase work: we need concat_vectors on i1 vectors, we need legalization of i1 vector truncates, and we need to fix up all the relevant uses of getVectorNumElements(). Differential Revision: https://reviews.llvm.org/D83811	2020-07-20 13:11:02 -07:00
Paul Walker	6384ec4099	[SVE] Add lowering for fixed length vector fdiv, fma, fmul and fsub operations. Differential Revision: https://reviews.llvm.org/D84034	2020-07-20 11:57:34 +00:00
Tim Northover	88464a55b4	AArch64: emit @llvm.debugtrap as `brk #0xf000` on all platforms It's useful for a debugger to be able to distinguish an @llvm.debugtrap from a (noreturn) @llvm.trap, so this extends the existing Windows behaviour to other platforms.	2020-07-20 10:31:26 +01:00
Paul Walker	509351d768	[SVE] Add lowering for scalable vector fadd, fdiv, fmul and fsub operations. Lower the operations to predicated variants. This is prep work required for fixed length code generation but also fixes a bug whereby these operations fail selection when "unpacked" vector types (e.g. MVT::nxv2f32) are used. This patch also adds the missing "unpacked" patterns for FMA. Differential Revision: https://reviews.llvm.org/D83765	2020-07-16 11:31:35 +00:00
Paul Walker	319a97b5e2	[SVE] Ensure fixed length vector fptrunc operations bigger than NEON are not considered legal. Differential Revision: https://reviews.llvm.org/D83568	2020-07-13 11:16:30 +00:00
Paul Walker	f78e6a3095	[SVE] Code generation for fixed length vector truncates. Lower fixed length vector truncates to a sequence of SVE UZP1 instructions. Differential Revision: https://reviews.llvm.org/D83395	2020-07-10 10:37:19 +00:00
Eli Friedman	56ae2cebcd	[AArch64][SVE] Add lowering for llvm.fma. This is currently bare-bones; we aren't taking advantage of any of the FMA variant instructions. But it's enough to at least generate code. Differential Revision: https://reviews.llvm.org/D83444	2020-07-09 16:12:41 -07:00
Paul Walker	614fb09645	[SVE] Disable some BUILD_VECTOR related code generator features. Fixed length vector code generation for SVE does not yet custom lower BUILD_VECTOR and instead relies on expansion. At the same time custom lowering for VECTOR_SHUFFLE is also not available so this patch updates isShuffleMaskLegal to reject vector types that require SVE. Related to this it also prevents the merging of stores after legalisation because this only works when BUILD_VECTOR is either legal or can be elminated. When this is not the case the code generator enters an infinite legalisation loop. Differential Revision: https://reviews.llvm.org/D83408	2020-07-09 10:47:04 +00:00
Paul Walker	fb75451775	[SVE] Custom ISel for fixed length extract/insert_subvector. We use extact_subvector and insert_subvector to "cast" between fixed length and scalable vectors. This patch adds custom c++ based ISel for the following cases: fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0 scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0 Which result in either EXTRACT_SUBREG/INSERT_SUBREG for NEON sized vectors or COPY_TO_REGCLASS otherwise. Differential Revision: https://reviews.llvm.org/D82871	2020-07-08 09:49:28 +00:00
Petre-Ionut Tudor	af80a4353e	[ARM] Generate [SU]RHADD from (b - (~a)) >> 1 Summary: Teach LLVM to recognize the above pattern, which is usually a transformation of (a + b + 1) >> 1, where the operands are either signed or unsigned types. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82669	2020-07-03 16:00:06 +01:00
Guillaume Chatelet	87e2751cf0	[Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D83082	2020-07-03 08:06:43 +00:00
Sander de Smalen	143e324e75	[CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics, that shouldn't really have been there (I suspect this was a remnant from when we expected the wider vector always to have come from a vector CONCAT). When I tried to create a more minimal reproducer, I found a bug in DAGCombiner where it drops the scalable flag when trying to fold: extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index') This patch fixes both issues. Reviewers: david-arm, efriedma, spatel Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D82910	2020-07-02 10:16:43 +01:00
Guillaume Chatelet	ef36f5143d	[Alignment] TargetLowering::hasPairedLoad must use Align for RequiredAlignment As per documentation of `hasPairLoad`: "`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load." In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned. There is only one implementor of this interface. Differential Revision: https://reviews.llvm.org/D82958	2020-07-01 14:32:30 +00:00
Paul Walker	a1aed80a35	[SVE] Relax merge requirement for IR based divides. We currently lower SDIV to SDIV_MERGE_OP1. This forces the value for inactive lanes in a way that can hamper register allocation, however, the lowering has no requirement for inactive lanes. Instead this patch replaces SDIV_MERGE_OP1 with SDIV_PRED thus freeing the register allocator. Once done the only user of SDIV_MERGE_OP1 is intrinsic lowering so I've removed the node and perform ISel on the intrinsic directly. This also allows us to implement MOVPRFX based zeroing in the same manner as SUB. This patch also renames UDIV_MERGE_OP1 and [F]ADD_MERGE_OP1 for the same reason but in the ADD cases the ISel code is already as required. Differential Revision: https://reviews.llvm.org/D82783	2020-07-01 08:18:42 +00:00
Guillaume Chatelet	28de229bc6	[Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82894	2020-07-01 07:28:11 +00:00
Christopher Tetreault	ab35ba5742	[SVE] Remove calls to VectorType::getNumElements from AArch64 Reviewers: efriedma, paquette, david-arm, kmclaughlin Reviewed By: david-arm Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82214	2020-06-30 11:17:50 -07:00
Guillaume Chatelet	4f5133a4dc	[Alignment][NFC] Migrate AArch64, ARM, Hexagon, MSP and NVPTX backends to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82749	2020-06-30 07:56:17 +00:00
Sander de Smalen	39f6a36a24	[AArch64][SVE] NFCI: Choose consistent naming for predicated SDAG nodes This patch proposes a naming convention for operations that take a general predicate (and are thus predicated) that specifies what happens to the false lanes. Currently the _PRED suffix is used, which doesn't really say much other than that it takes a predicate. In some instances this means it has merging predication and in other cases it means zeroing-predication. This patch also changes the order of operands to AArch64ISD::DUP_MERGE_PASSTHRU, to pass the predicate as the first operand, which is in line with all other predicates nodes. It takes the passthru value as an explicit passthru value, which is always passed as the last operand. Reviewers: paulwalker-arm, cameron.mcinally, eli.friedman, dancgr, efriedma Reviewed By: paulwalker-arm Tags: #llvm Differential Revision: https://reviews.llvm.org/D81850	2020-06-29 13:37:30 +01:00
Kerry McLaughlin	bb6603f013	[AArch64][SVE] Bail out of performPostLD1Combine for scalable types Summary: performPostLD1Combine will introduce either a LD1LANEpost or LD1DUPpost node, which will cause selection failure if the return type is a scalable vector. Reviewers: sdesmalen, c-rhodes, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82670	2020-06-29 11:59:53 +01:00
Simon Pilgrim	973685fc78	[TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.	2020-06-29 11:46:58 +01:00
Paul Walker	3a98d5d7e7	[SVE] Code generation for fixed length vector adds. Summary: Teach LowerToPredicatedOp to lower fixed length vector operations. Add AArch64ISD nodes and isel patterns for predicated integer and floating point adds. Together this enables SVE code generation for fixed length vector adds. Reviewers: rengolin, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82483	2020-06-26 19:54:41 +00:00
Kerry McLaughlin	6b313f198c	[AArch64][SVE] Remove asserts from AArch64ISelLowering for bfloat16 types Remove the asserts in performLDNT1Combine & performST[NT]1Combine to ensure we get a failure where the type is a bfloat16 and hasBF16() is false, regardless of whether asserts are enabled.	2020-06-26 14:51:27 +01:00
Cullen Rhodes	4319c48fc7	[AArch64][SVE] Only support sizeless bfloat types if supported by subtarget Reviewers: sdesmalen, efriedma, kmclaughlin, fpetrogalli Reviewed By: sdesmalen, fpetrogalli Differential Revision: https://reviews.llvm.org/D82494	2020-06-26 12:37:47 +00:00
Kerry McLaughlin	edcfef8fee	[AArch64][SVE] Add bfloat16 support to store intrinsics Summary: Bfloat16 support added for the following intrinsics: - ST1 - STNT1 Reviewers: sdesmalen, c-rhodes, fpetrogalli, efriedma, stuij, david-arm Reviewed By: fpetrogalli Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D82448	2020-06-26 11:05:56 +01:00
Kerry McLaughlin	0ccfe1b267	[AArch64][SVE] Predicate bfloat16 load patterns with HasBF16 Reviewers: sdesmalen, c-rhodes, efriedma, fpetrogalli Reviewed By: fpetrogalli Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82464	2020-06-26 10:38:24 +01:00
Eli Friedman	e9d4e34ab8	[AArch64][SVE] Add legalization support for i32/i64 vector srem/urem Implement them on top of sdiv/udiv, similar to what we do for integer types. Potential future work: implementing i8/i16 srem/urem, optimizations for constant divisors, optimizing the mul+sub to mls. Differential Revision: https://reviews.llvm.org/D81511	2020-06-23 16:27:52 -07:00
Paul Walker	499c63288f	[SVE] Code generation for fixed length vector loads & stores. Summary: This patch adds base support for code generating fixed length vector operations targeting a known SVE vector length. To achieve this we lower fixed length vector operations to equivalent scalable vector operations, whereby SVE predication is used to limit the elements processed to those present within the fixed length vector. Specifically this patch implements load and store operations, which get lowered to their masked counterparts thusly: V = load(Addr) => V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr)) store(V, (Addr)) => masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr)) Reviewers: rengolin, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80385	2020-06-23 09:39:03 +00:00
David Sherwood	584d0d5c17	[SVE] Fall back on DAG ISel at -O0 when encountering scalable types At the moment we use Global ISel by default at -O0, however it is currently not capable of dealing with scalable vectors for two reasons: 1. The register banks know nothing about SVE registers. 2. The LLT (Low Level Type) class knows nothing about scalable vectors. For now, the easiest way to avoid users hitting issues when using the SVE ACLE is to fall back on normal DAG ISel when encountering instructions that operate on scalable vector types. I've added a couple of RUN lines to existing SVE tests to ensure we can compile at -O0. I've also added some new tests to CodeGen/AArch64/GlobalISel/arm64-fallback.ll that demonstrate we correctly fallback to DAG ISel at -O0 when lowering formal arguments or translating instructions that involve scalable vector types. Differential Revision: https://reviews.llvm.org/D81557	2020-06-19 10:57:00 +01:00
David Sherwood	0dc28af219	[CodeGen,AArch64] Fix up warnings in performExtendCombine Try to avoid calling getVectorNumElements() or relying upon the TypeSize conversion to uin64_t. Differential Revision: https://reviews.llvm.org/D81573	2020-06-19 10:34:51 +01:00
Paul Walker	4612f39120	[SVE] Add flag to specify SVE register size, using this to calculate legal vector types. Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE data registers (in bits) to be specified. This allows the code generator to make assumptions it normally couldn't. As a starting point this information is used to mark fixed length vector types that can fit within the specified size as legal. Reviewers: rengolin, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80384	2020-06-18 12:11:16 +00:00
Nikita Popov	7cac7e0cfc	[IR] Prefer hasFnAttribute() where possible (NFC) When checking for an enum function attribute, use hasFnAttribute() rather than hasAttribute() at FunctionIndex, because it is significantly faster (and more concise to boot).	2020-06-15 09:30:35 +02:00
Shawn Landden	9ec57cce62	[AArch64] custom lowering for i128 popcount halves the number of CNT instructions generated	2020-06-10 09:44:16 +04:00
Henry Kao	4dcc0d1958	[CodeGen][SVE] Avoid scalarizing zero splat stores on scalable vectors. Summary: Implemented in replaceZeroVectorStore(). Fixes several warnings in AArch64 SVE unit tests. Reviewers: sdesmalen, kmclaughlin, dancgr, efriedma, each, andwar, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80824	2020-06-09 12:52:39 -04:00
Guillaume Chatelet	3b6196c9b3	[Alignment][NFC] TargetLowering::allowsMisalignedMemoryAccesses Summary: Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMisalignedMemoryAccesses` without marking it override. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81374	2020-06-09 10:17:42 +00:00
Cullen Rhodes	b82be5db71	[AArch64][SVE] Implement structured load intrinsics Summary: This patch adds initial support for the following instrinsics: * llvm.aarch64.sve.ld2 * llvm.aarch64.sve.ld3 * llvm.aarch64.sve.ld4 For loading two, three and four vectors worth of data. Basic codegen is implemented with reg+reg and reg+imm addressing modes being addressed in a later patch. The types returned by these intrinsics have a number of elements that is a multiple of the elements in a 128-bit vector for a given type and N, where N is the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements the types are: LD2 : <vscale x 8 x i32> LD3 : <vscale x 12 x i32> LD4 : <vscale x 16 x i32> This is implemented with target-specific intrinsics for each variant that take the same operands as the IR intrinsic but return N values, where the type of each value is a full vector, i.e. <vscale x 4 x i32> in the above example. These values are then concatenated using the standard concat_vector intrinsic to maintain type legality with the IR. These intrinsics are intended for use in the Arm C Language Extension (ACLE). Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D75751	2020-06-09 08:51:58 +00:00
David Sherwood	30dfbf03a2	[CodeGen,AArch64] Fix up warnings in splitStores The code for trying to split up stores is designed for NEON vectors, where we support arbitrary alignments. It's an optimisation designed to improve performance by using smaller, aligned stores. However, we currently only support 16 byte alignments for SVE vectors anyway so we may as well bail out early. This change fixes up remaining warnings in a couple of tests: CodeGen/AArch64/sve-callbyref-notailcall.ll CodeGen/AArch64/sve-calling-convention-byref.ll Differential Revision: https://reviews.llvm.org/D80720	2020-06-09 07:39:38 +01:00
David Sherwood	41fb119e8c	[CodeGen] Fix nullptr crash in tryConvertSVEWideCompare When the input to a wide compare instruction is a DUP or SPLAT_VECTOR node we should deal with cases where the DUP/SPLAT_VECTOR input operand is not an immediate value. I've fixed the code to return SDValue() in such cases and added a couple of tests - one each to represent the signed and unsigned cases. Differential Revision: https://reviews.llvm.org/D81167	2020-06-08 15:20:18 +01:00
Cullen Rhodes	3ebbe35363	[AArch64][SVE] Implement vector tuple intrinsics Summary: This patch adds the following intrinsics for creating two-tuple, three-tuple and four-tuple scalable vectors: * llvm.aarch64.sve.tuple.create2 * llvm.aarch64.sve.tuple.create3 * llvm.aarch64.sve.tuple.create4 As well as: * llvm.aarch64.sve.tuple.get * llvm.aarch64.sve.tuple.set For extracting and inserting scalable vectors from vector tuples. These intrinsics are intended to be used by the ACLE functions svcreate<n>, svget and svset. This patch also includes calling convention support for passing and returning tuples of scalable vectors to/from functions. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D75674	2020-06-08 11:09:55 +00:00
Kadir Cetinkaya	6bad8b07e6	[llvm][AArch64] Fix unused variable	2020-06-05 15:56:19 +02:00
Kerry McLaughlin	89fc0166f5	[CodeGen][SVE] Legalisation of extends with scalable types Summary: This patch adds legalisation of extensions where the operand of the extend is a legal scalable type but the result is not. EXTRACT_SUBVECTOR is used to split the result, before being replaced by target-specific [S\|U]UNPK[HI\|LO] operations. For example: ``` zext <vscale x 16 x i8> %a to <vscale x 16 x i16> ``` should emit: ``` uunpklo z2.h, z0.b uunpkhi z1.h, z0.b ``` Reviewers: sdesmalen, efriedma, david-arm Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79587	2020-06-05 12:08:42 +01:00
Eric Christopher	8c9badf61d	Replace integer usage with enumeration.	2020-06-03 20:00:28 -07:00
Francesco Petrogalli	febeaf94a8	[llvm][SVE] IR intrinsic for LD1RO. Reviewers: sdesmalen, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80738	2020-06-03 13:57:16 +00:00
Amara Emerson	f573d489b6	[AArch64][GlobalISel] Split G_GLOBAL_VALUE into ADRP + G_ADD_LOW and optimize. The concept of G_GLOBAL_VALUE is nice and simple, but always using it as the representation for global var addressing until selection time creates some problems in optimizing accesses in certain code/relocation models. The problem comes from trying to optimize adrp -> add -> load/store sequences in the most common "small" code model. These accesses can be optimized into an adrp -> load with the add offset being folded into the load's immediate field. If we try to keep all global var references as a single generic instruction then by the time we get to the complex operand trying to match these, we end up generating an adrp at the point of use. The real issue here is that we don't have any form of CSE during selection, so the code size will bloat from many redundant adrp's. This patch custom legalizes small code mode non-GOT G_GLOBALs into target ADRP and a new "target specific generic opcode" G_ADD_LOW. We also teach the localizer to localize these instructions via the custom hook that was added recently. Finally, the complex pattern for indexed loads/stores is extended to try to fold these G_ADD_LOW instructions into the load immediate. On -O0 CTMark, we see a 0.8% geomean code size improvement. We should also see some minor performance improvements too. Differential Revision: https://reviews.llvm.org/D78465	2020-06-01 16:00:56 -07:00
Martin Storsjö	cf97e0ec42	[AArch64] Treat x18 as callee-saved in functions with windows calling convention on non-windows OSes Treat it as callee-saved, and always back it up. When windows code calls entry points in unix code, marked with the windows calling convention, that unix code can call other functions that isn't compiled with -ffixed-x18 which may clobber x18 freely. By backing it up and restoring it on return, we preserve the register across the function call, fulfilling this part of the windows calling convention on another OS. This isn't enough for making sure that x18 is preseved when non-windows code does a callback to windows code, but is a clear improvement over the current status quo. Additionally, wine is nowadays building many modules as PE DLLs, which avoids the callback issue altogether for those DLLs. Differential Revision: https://reviews.llvm.org/D61892	2020-05-30 09:22:09 +03:00
Christopher Tetreault	56eb7556e7	[SVE] Eliminate calls to default-false VectorType::get() from AArch64 Reviewers: efriedma, c-rhodes, david-arm, mcrosier, t.p.northover Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80327	2020-05-29 15:39:30 -07:00
Zequan Wu	80e107ccd0	Add NoMerge MIFlag to avoid MIR branch folding Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given Differential Revision: https://reviews.llvm.org/D79537	2020-05-29 12:31:06 -07:00
David Sherwood	205085d4cc	[CodeGen] Fix warnings in LowerToPredicatedOp When creating a new vector type based on another vector type we should pass in the element count instead of the number of elements and scalable flag separately. I encountered this warning whilst compiling this test: CodeGen/AArch64/sve-intrinsics-int-compares.ll Differential revision: https://reviews.llvm.org/D80621	2020-05-29 15:19:03 +01:00
Ties Stuij	78bd0c0e5e	[AArch64][BFloat] add BFloat instruction support for AArch64 Summary: Add support for lowering various BFloat related SelDAG nodes: - load/store (ldrh/strh) - concat - dup/duplane - bitconvert/bitcast - insert_subvector/insert_subreg This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a The bfloat type, and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile Reviewers: ab, t.p.northover, john.brawn, fpetrogalli, sdesmalen, LukeGeeson Reviewed By: fpetrogalli Subscribers: LukeGeeson, pbarrio, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79712	2020-05-27 15:36:54 +01:00
Ties Stuij	42eba9b40b	[AArch64][BFloat] basic AArch64 bfloat support Summary: This patch adds the bfloat type to the AArch64 backend: - adds it as part of the FPR16 register class - adds bfloat calling conventions - as f16 is now not the only FPR16 type anymore, we need to constrain a number of instruction patterns using FPR16Op to help out the TableGen type inferrer This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a The bfloat type, and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile Reviewers: t.p.northover, c-rhodes, fpetrogalli, sdesmalen, ostannard, LukeGeeson, ab Reviewed By: fpetrogalli Subscribers: pbarrio, LukeGeeson, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79709	2020-05-27 15:26:40 +01:00
Craig Topper	80cc43b420	[AArch64] Set i32 ISD::MULHU/S to Expand instead of Legal. Looks like there are no isel patterns for these. A DAG combine turns it into i64 multiply and a shift which hides this. Extracted from D80485	2020-05-26 00:41:09 -07:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Alexey Lapshin	bf242c067e	[AARCH64][NEON] Allow to sink operands of aarch64_neon_pmull64. Summary: This patch fixes a problem when pmull2 instruction is not generated for vmull_high_p64 intrinsic. ISel has a pattern for int_aarch64_neon_pmull64 intrinsic to generate PMULL2 instruction. That pattern assumes that extraction operations are located in the same basic block. We need to sink them if they are not. Handle operands of int_aarch64_neon_pmull64 into AArch64TargetLowering::shouldSinkOperands. Reviewed by: efriedma Differential Revision: https://reviews.llvm.org/D80320	2020-05-22 01:35:24 +03:00
Eli Friedman	a1ce88b4e3	[AArch64][SVE] Implement AArch64ISD::SETCC_PRED This unifies SETCC operations along the lines of other operations. Differential Revision: https://reviews.llvm.org/D79975	2020-05-15 11:53:21 -07:00
Benjamin Kramer	f242950fdf	Fold single-use variables into assert This avoids unused variable warnings in Release builds.	2020-05-12 15:26:59 +02:00
Sander de Smalen	077d2d6802	[CodeGen][SVE] Add patterns for whole vector predicate select Added patterns to implement `select i1 %p, <vty> %a, <vty> %b` Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D79356	2020-05-12 11:47:39 +01:00
Petre-Ionut Tudor	9682d0d5dc	[ARM] Refactor lower to S[LR]I optimization Summary: The optimization has been refactored to fix certain bugs and limitations. The condition for lowering to S[LR]I has been changed to reflect the manual pseudocode description of SLI and SRI operation. The optimization can now handle more cases of operand type and order. Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79233	2020-05-12 11:00:13 +01:00
Craig Topper	d1119980e5	[SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode. This patch stores the alignment for ConstantPoolSDNode as an Align and updates the getConstantPool interface to take a MaybeAlign. Removing getAlignment() will be done as a follow up. Differential Revision: https://reviews.llvm.org/D79436	2020-05-08 16:04:11 -07:00
Kerry McLaughlin	3bcd3dd473	[CodeGen][SVE] Lowering of shift operations with scalable types Summary: Adds AArch64ISD nodes for: - SHL_PRED (logical shift left) - SHR_PRED (logical shift right) - SRA_PRED (arithmetic shift right) Existing patterns for unpredicated left shift by immediate have also been moved into the appropriate multiclasses in SVEInstrFormats.td. Reviewers: sdesmalen, efriedma, ctetreau, huihuiz, rengolin Reviewed By: efriedma Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79478	2020-05-07 11:43:49 +01:00
Eli Friedman	2c8546107a	[AArch64][SVE] Implement lowering for SIGN_EXTEND etc. of SVE predicates. Now using patterns, since there's a single-instruction lowering. (We could convert to VSELECT and pattern-match that, but there doesn't seem to be much point.) I think this might be the first instruction to use nested multiclasses this way? It seems like a good way to reduce duplication between different integer widths. Let me know if it seems like an improvement. Also, while I'm here, fix the return type of SETCC so we don't try to merge a sign-extend with a SETCC. Differential Revision: https://reviews.llvm.org/D79193	2020-05-06 17:56:32 -07:00
Kerry McLaughlin	19f5da9c1d	[SVE][Codegen] Lower legal min & max operations Summary: This patch adds AArch64ISD nodes for [S\|U]MIN_PRED and [S\|U]MAX_PRED, and lowers both SVE intrinsics and IR operations for min and max to these nodes. There are two forms of these instructions for SVE: a predicated form and an immediate (unpredicated) form. The patterns which existed for the latter have been updated to match a predicated node with an immediate and map this to the immediate instruction. Reviewers: sdesmalen, efriedma, dancgr, rengolin Reviewed By: efriedma Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79087	2020-05-04 11:19:19 +01:00
Cullen Rhodes	672b62ea21	[AArch64][SVE] Custom lowering of floating-point reductions Summary: This patch implements custom floating-point reduction ISD nodes that have vector results, which are used to lower the following intrinsics: * llvm.aarch64.sve.fadda * llvm.aarch64.sve.faddv * llvm.aarch64.sve.fmaxv * llvm.aarch64.sve.fmaxnmv * llvm.aarch64.sve.fminv * llvm.aarch64.sve.fminnmv SVE reduction instructions keep their result within a vector register, with all other bits set to zero. Changes in this patch were implemented by Paul Walker and Sander de Smalen. Reviewers: sdesmalen, efriedma, rengolin Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D78723	2020-04-30 10:18:40 +00:00
Kerry McLaughlin	53dd72a87a	[SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics Summary: This patch maps IR operations for sdiv & udiv to the @llvm.aarch64.sve.[s\|u]div intrinsics. A ptrue must be created during lowering as the div instructions have only a predicated form. Patch contains changes by Andrzej Warzynski. Reviewers: sdesmalen, c-rhodes, efriedma, cameron.mcinally, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, andwar, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78569	2020-04-24 11:38:20 +01:00
Christopher Tetreault	84584b0d29	[SVE] Remove calls to isScalable from AARCH64 Reviewers: efriedma, sdesmalen, t.p.northover, mcrosier Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77758	2020-04-23 13:09:17 -07:00
Kerry McLaughlin	17f6e18acf	[AArch64][SVE] Add SVE intrinsic for LD1RQ Summary: Adds the following intrinsic for contiguous load & replicate: - @llvm.aarch64.sve.ld1rq The LD1RQ intrinsic only needs the SImmS16XForm added by this patch. The others (SImmS2XForm, SImmS3XForm & SImmS4XForm) were added for consistency. Reviewers: andwar, sdesmalen, efriedma, cameron.mcinally, dancgr, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76929	2020-04-22 11:29:27 +01:00
Petre-Ionut Tudor	52474992b1	Revert "[ARM] Fix conditions for lowering to S[LR]I" This reverts commit `cabfcf840a`. The patch introduced another bug in the optimization.	2020-04-20 16:11:04 +01:00
Kerry McLaughlin	33ffce5414	[AArch64][SVE] Remove LD1/ST1 dependency on llvm.masked.load/store Summary: The SVE masked load and store intrinsics introduced in D76688 rely on common llvm.masked.load/store nodes. This patch creates new ISD nodes for LD1(S) & ST1 to remove this dependency. Additionally, this adds support for sign & zero extending loads and truncating stores. Reviewers: sdesmalen, efriedma, cameron.mcinally, c-rhodes, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, andwar, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78204	2020-04-20 11:08:11 +01:00
Francesco Petrogalli	897fdec586	[llvm][CodeGen] Addressing modes for SVE stN. This reverts commit `17b1869b72`. It is an attempt to fix the failure reported at The patch differs from the original one reviwed at https://reviews.llvm.org/D77435 only for the use of the std::make_tuple in building the return value of `findAddrModeSVELoadStore`: - return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; + return std::make_tuple(IsRegReg ? Opc_rr : Opc_ri, NewBase, the original patch submitted at `fc4e954ed5` was failing the following build: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420/ with error: /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10: error: chosen constructor is explicit in copy-initialization return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements) ^ 1 error generated.	2020-04-17 20:35:35 +01:00
Francesco Petrogalli	17b1869b72	Revert "[llvm][CodeGen] Addressing modes for SVE stN." This reverts commit `fc4e954ed5`. The commit reported the following failure: http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420 FAILED: lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o /usr/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64 -I/usr/include/libxml2 -Iinclude -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/include -mthumb -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -fvisibility=hidden -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MMD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -c /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10: error: chosen constructor is explicit in copy-initialization return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset}; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements)	2020-04-17 20:03:11 +01:00
Francesco Petrogalli	fc4e954ed5	[llvm][CodeGen] Addressing modes for SVE stN. Reviewers: efriedma, sdesmalen, c-rhodes, ctetreau Reviewed By: c-rhodes Subscribers: tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77435	2020-04-17 19:31:44 +01:00
Francesco Petrogalli	48879c02bf	[llvm][CodeGen] Fix issue for SVE gather prefetch. Summary: This change is fixing an issue where the dagcombine incorrectly used an addressing mode with scaled offsets (indices), instead of unscaled offsets. Those addressing modes do not exist for `prfh` , `prfw` and `prfd`, hence we can reuse `prfb` because that has unscaled offsets, and because the pseudo-code in the XML spec suggests that the element size is not used for the amount of data that is prefetched by the instruction. FWIW, GCC also emits a `prfb` for these cases. Reviewers: sdesmalen, andwar, rengolin Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78069	2020-04-17 19:23:28 +01:00
Benjamin Kramer	d1ef44982f	[AArch64] Fold one-use variables into assert Avoids unused variable warnings in Release builds.	2020-04-17 19:43:06 +02:00
Petre-Ionut Tudor	cabfcf840a	[ARM] Fix conditions for lowering to S[LR]I Summary: Fixed wrong conditions for generating (S[LR]I X, Y, C2) from (or (and X, BvecC1), (lsl Y, C2)) and added ISel nodes to lower to S[LR]I. The optimisation is also enabled by default now. Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77387	2020-04-17 17:19:24 +01:00
Benjamin Kramer	166467e822	[VectorUtils] Create shufflevector masks as int vectors instead of Constants No functionality change intended.	2020-04-17 15:28:00 +02:00
Francesco Petrogalli	89680f25e8	[llvm][CodeGen] Rename SVE gather prefetch intrinsics. [NFC] Summary: The renaming is necessary to make the naming scheme uniform with other gather/scatter load/stores SVE intrinsics. The naming of variables and functions have been adapted to make it explicit whether we are dealing with a scalar offset (which is unscaled) or an index (which is scaled according to the data type of the lanes of the vector). Reviewers: andwar, sdesmalen, rengolin Reviewed By: andwar Subscribers: tschuett, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77839	2020-04-15 21:49:16 +01:00
Christopher Tetreault	05a079895c	[SVE] Remove calls to getBitWidth from AArch64 Reviewers: efriedma Reviewed By: efriedma Subscribers: danielkiss, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77905	2020-04-14 10:26:37 -07:00
Craig Topper	113f37a1f9	[CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase Differential Revision: https://reviews.llvm.org/D77995	2020-04-13 13:50:15 -07:00
Christopher Tetreault	9f87d951fc	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: mcrosier, efriedma, sdesmalen Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77269	2020-04-09 16:43:29 -07:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Eli Friedman	dacf8d3562	[AArch64][SVE] Add support for fcmp. This also requires support for boolean "not", so I added boolean logic while I was there. Differential Revision: https://reviews.llvm.org/D76901	2020-03-31 12:04:39 -07:00
Kerry McLaughlin	05606329e2	[AArch64][SVE] Add SVE intrinsics for masked loads & stores Summary: Implements the following intrinsics for contiguous loads & stores: - @llvm.aarch64.sve.ld1 - @llvm.aarch64.sve.st1 Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin Reviewed By: cameron.mcinally Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76688	2020-03-25 11:48:40 +00:00
Amara Emerson	472d282046	[AArch64][GlobalISel] Don't localize TLS G_GLOBAL_VALUEs on Darwin. On Darwin these need to be selected into a function call for the TLS address lookup. As a result, they can't be moved below a physreg write, which happens in call sequences. In the long term, we should have some mechanism in the localizer to prevent localizing into target-specific atomic instruction sequences. rdar://60056248 Differential Revision: https://reviews.llvm.org/D76652	2020-03-24 13:35:50 -07:00
Andrzej Warzynski	0ea4fb5bb7	[AArch64][SVE] Rename intrinsics for gather prefetch [NFC] Summary: In order to keep the names consistent with other SVE gather loads, the intrinsics for gather prefetch are renamed as follows: * @llvm.aarch64.sve.gather.prfb -> @llvm.aarch64.sve.prfb.gather Reviewed by: fpetrogalli Differential Revision: https://reviews.llvm.org/D76421	2020-03-19 12:53:36 +00:00
Sander de Smalen	4788ca450f	[AArch64][SVE] Change pointer type of nontemporal load/store intrinsics Summary: This fixes a discrepancy between the non-temporal loads/store intrinsics and other SVE load intrinsics (such as nf/ff), so that Clang can use the same code to generate these intrinsics. Reviewers: andwar, kmclaughlin, rengolin, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76237	2020-03-18 12:44:51 +00:00
Vitaly Buka	f20dcc31e3	Fix unused function warning	2020-03-16 19:45:36 -07:00
Benjamin Kramer	05ff3323e0	[AArch64] Remove unused variable	2020-03-16 21:59:55 +01:00
Francesco Petrogalli	0f2b68d9c7	Implement IR intrinsics for gather prefetch. Summary: Intrinsics and relative codegen has been implemented for the following SVE instructions: 1. PRF<T> <prfop>, <Pg>, [<Xn\|SP>, <Zm>.S, <mod>] -> 32-bit scaled offset 2. PRF<T> <prfop>, <Pg>, [<Xn\|SP>, <Zm>.D, <mod>] -> 32-bit unpacked scaled offset 3. PRF<T> <prfop>, <Pg>, [<Xn\|SP>, <Zm>.D] -> 64-bit scaled offset 4. PRF<T> <prfop>, <Pg>, [<Zn>.S{, #<imm>}] -> 32-bit element 5. PRF<T> <prfop>, <Pg>, [<Zn>.D{, #<imm>}] -> 64-bit element The instructions are associated the following intrinsics, respectively: 1. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx4vi32( i8* %base, <vscale x 4 x i32> %offset, <vscale x 4 x i1> %Pg, i32 %prfop) 2. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx2vi32( i8* %base, <vscale x 2 x i32> %offset, <vscale x 2 x i1> %Pg, i32 %prfop) 3. void @llvm.aarch64.sve.gather.prf<T>.scaled.nx2vi64( i8* %base, <vscale x 2 x i64> %offset, <vscale x 2 x i1> %Pg, i32 %prfop) 4. void @llvm.aarch64.sve.gather.prf<T>.nx4vi32( <vscale x 4 x i32> %bases, i64 %imm, <vscale x 4 x i1> %Pg, i32 %prfop) 5. void @llvm.aarch64.sve.gather.prf<T>.nx2vi64( <vscale x 2 x i64> %bases, i64 %imm, <vscale x 2 x i1> %Pg, i32 %prfop) The intrinsics are the IR counterpart of the following SVE ACLE functions: * void svprf<T>(svbool_t pg, const void base, svprfop op) void svprf<T>_vnum(svbool_t pg, const void base, int64_t vnum, svprfop op) void svprf<T>_gather[_u32base](svbool_t pg, svuint32_t bases, svprfop op) * void svprf<T>_gather[_u64base](svbool_t pg, svuint64_t bases, svprfop op) * void svprf<T>_gather_[s32]offset(svbool_t pg, const void base, svint32_t offsets, svprfop op) void svprf<T>_gather_[u32]offset(svbool_t pg, const void base, svint32_t offsets, svprfop op) void svprf<T>_gather_[s64]offset(svbool_t pg, const void base, svint64_t offsets, svprfop op) void svprf<T>_gather_[u64]offset(svbool_t pg, const void base, svint64_t offsets, svprfop op) void svprf<T>_gather[_u32base]_offset(svbool_t pg, svuint32_t bases, int64_t offset, svprfop op) * void svprf<T>_gather[_u64base]_offset(svbool_t pg, svuint64_t bases,int64_t offset, svprfop op) Reviewers: andwar, sdesmalen, efriedma, rengolin Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75580	2020-03-16 18:52:35 +00:00
Andrzej Warzynski	a0c15ed460	[AArch64][SVE] Add the @llvm.aarch64.sve.dup.x intrinsic Summary: This intrinsic implements the unpredicated duplication of scalar values and is mapped to (through ISD::SPLAT_VECTOR): * DUP <Zd>.<T>, #<imm> * DUP <Zd>.<T>, <R><n\|SP> Reviewed by: sdesmalen Differential Revision: https://reviews.llvm.org/D75900	2020-03-13 12:40:22 +00:00
Andrzej Warzynski	46b9f14d71	[AArch64][SVE] Add intrinsics for non-temporal scatters/gathers Summary: This patch adds the following intrinsics for non-temporal gather loads and scatter stores: * aarch64_sve_ldnt1_gather_index * aarch64_sve_stnt1_scatter_index These intrinsics implement the "scalar + vector of indices" addressing mode. As opposed to regular and first-faulting gathers/scatters, there's no instruction that would take indices and then scale them. Instead, the indices for non-temporal gathers/scatters are scaled before the intrinsics are lowered to `ldnt1` instructions. The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as placeholders so that we can easily identify the cases implemented in this patch in performGatherLoadCombine and performScatterStoreCombined. Once encountered, they are replaced with: * GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1 * SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1 The patterns for lowering ISD::SHL for scalable vectors (required by this patch) were missing, so these are added too. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D75601	2020-03-12 13:55:56 +00:00
Andrzej Warzynski	a9f1583228	[AArch64][SVE] Add the @llvm.aarch64.sve.sel intrinsic Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75928	2020-03-11 17:05:21 +00:00
Djordje Todorovic	c15c68abdc	[CallSiteInfo] Enable the call site info only for -g + optimizations Emit call site info only in the case of '-g' + 'O>0' level. Differential Revision: https://reviews.llvm.org/D75175	2020-03-09 12:12:44 +01:00
Andrzej Warzynski	9249f60602	[AArch64][SVE] Add intrinsics for non-temporal gather-loads/scatter-stores Summary: This patch adds the following LLVM IR intrinsics for SVE: 1. non-temporal gather loads * @llvm.aarch64.sve.ldnt1.gather * @llvm.aarch64.sve.ldnt1.gather.uxtw * @llvm.aarch64.sve.ldnt1.gather.scalar.offset 2. non-temporal scatter stores * @llvm.aarch64.sve.stnt1.scatter * @llvm.aarch64.sve.ldnt1.gather.uxtw * @llvm.aarch64.sve.ldnt1.gather.scalar.offset These intrinsic are mapped to the corresponding SVE instructions (example for half-words, zero-extending): * ldnt1h { z0.s }, p0/z, [z0.s, x0] * stnt1h { z0.s }, p0/z, [z0.s, x0] Note that for non-temporal gathers/scatters, the SVE spec defines only one instruction type: "vector + scalar". For this reason, we swap the arguments when processing intrinsics that implement the "scalar + vector" addressing mode: * @llvm.aarch64.sve.ldnt1.gather * @llvm.aarch64.sve.ldnt1.gather.uxtw * @llvm.aarch64.sve.stnt1.scatter * @llvm.aarch64.sve.ldnt1.gather.uxtw In other words, all intrinsics for gather-loads and scatter-stores implemented in this patch are mapped to the same load and store instruction, respectively. The sve2_mem_gldnt_vs multiclass (and it's counterpart for scatter stores) from SVEInstrFormats.td was split into: * sve2_mem_gldnt_vec_vs_32_ptrs (32bit wide base addresses) * sve2_mem_gldnt_vec_vs_62_ptrs (64bit wide base addresses) This is consistent with what we did for @llvm.aarch64.sve.ld1.scalar.offset and highlights the actual split in the spec and the implementation. Reviewed by: sdesmalen Differential Revision: https://reviews.llvm.org/D74858	2020-03-02 10:38:28 +00:00
Andrzej Warzynski	fa9439fac8	[AArch64][SVE] Add intrinsics for first-faulting gather loads Summary: The following intrinsics are added: * @llvm.aarch64.sve.ldff1.gather * @llvm.aarch64.sve.ldff1.gather.index * @llvm.aarch64.sve.ldff1.gather_sxtw * @llvm.aarch64.sve.ldff1.gather.uxtw * @llvm.aarch64.sve.ldff1.gather_sxtw.index * @llvm.aarch64.sve.ldff1.gather.uxtw.index * @llvm.aarch64.sve.ldff1.gather.scalar.offset Although this patch is quite substantial, the vast majority of the implementation is just a 'copy & paste' of the implementation of regular gather loads, including tests. There's only a handful of new definitions: * AArch64ISD nodes defined in AArch64ISelLowering.h (e.g. GLDFF1) * Seleciton DAG Types in AArch64SVEInstrInfo.td (e.g. AArch64ldff1_gather) * intrinsics in IntrinsicsAArch64.td (e.g. aarch64_sve_ldff1_gather) * Pseudo instructions in SVEInstrFormats.td to workaround the issue of use-before-def for the FFR register. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D75128	2020-02-27 12:56:33 +00:00
Sjoerd Meijer	13db7490fa	[AArch64] Peephole optimization: merge AND and TST instructions In some cases Clang does not perform merging of instructions AND and TST (aka ANDS xzr). Example: tst x2, x1 and x3, x2, x1 to: ands x3, x2, x1 This patch add such merging during instruction selection: when AND is replaced with ANDS instruction in LowerSELECT_CC, all users of AND also should be changed for using this ANDS instruction Short discussion on mailing list: http://llvm.1065342.n5.nabble.com/llvm-dev-ARM-Peephole-optimization-instructions-tst-add-tp133109.html Patch by Pavel Kosov. Differential Revision: https://reviews.llvm.org/D71701	2020-02-27 09:23:47 +00:00
Craig Topper	735d27dc40	[SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output the ISD::FLT_ROUNDS_ This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order. This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code. Differential Revision: https://reviews.llvm.org/D75132	2020-02-25 16:58:23 -08:00
Andrzej Warzynski	cff90c938b	[AArch64][SVE] Update names and comments for gathers/scatters (NFC) Summary: This patch renames functions and TableGen classes for SVE gathers and scatters. The original names implied that the corresponding methods/classes are only suited for regular gathers/scatters (i.e. LD1 and ST1), which is not the case. Indeed, we will be re-using them for non-temporal and first-faulting gathers/scatters in the forthcoming patches. The new names also highlight the split into Vector-Scalar (VS) and Scalar-Vector (SV) cases. List of changes: * `performLD1GatherCombine` and `performST1ScatterCombine` are renamed as `performGatherLoadCombine` and `performScatterStoreCombine`, respectively. * Selection DAG types for scatters and gathers from AArch64SVEInstrInfo.td are renamed. For example, `SDT_AArch64_GLD1` is renamed as `SDT_AArch64_GATHER_SV`. SV stands for Scalar-Vector, as opposed to Vector-Scalar (VS). * The intrinsic classes from IntrinsicsAArch64.td are renamed. For example, `AdvSIMD_GatherLoad_64bitOffset_Intrinsic` is renamed as `AdvSIMD_GatherLoad_SV_64b_Offsets_Intrinsic`. * Updated comments in `performGatherLoadCombine` and `performScatterStoreCombine`. Reviewers: sdesmalen, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75035	2020-02-25 11:09:01 +00:00
Cullen Rhodes	72848f26b4	[AArch64][SVE] Add predicate reinterpret intrinsics Summary: Implements the following intrinsics: * llvm.aarch64.sve.convert.to.svbool * llvm.aarch64.sve.convert.from.svbool For converting the ACLE svbool_t type (<n x 16 x i1>) to and from the other predicate types: <n x 8 x i1>, <n x 4 x i1> and <n x 2 x i1>. Reviewers: sdesmalen, kmclaughlin, efriedma, dancgr, rengolin Reviewed By: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74471	2020-02-25 10:24:06 +00:00
Kerry McLaughlin	f87f23c81c	[AArch64][SVE] Add the SVE dupq_lane intrinsic Summary: Implements the @llvm.aarch64.sve.dupq.lane intrinsic. As specified in the ACLE, the behaviour of: svdupq_lane_u64(data, index) ...is identical to: svtbl(data, svadd_x(svptrue_b64(), svand_x(svptrue_b64(), svindex_u64(0, 1), 1), index * 2)) If the index is in the range [0,3], the operation is equivalent to a single DUP (.q) instruction. Reviewers: sdesmalen, c-rhodes, cameron.mcinally, efriedma, dancgr, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74734	2020-02-24 13:59:47 +00:00
Craig Topper	9bbf271fc9	[AArch64] Move isOverflowIntrOpRes help function to the ISD namespace in SelectionDAG.h. NFC Enables sharing with an upcoming X86 change.	2020-02-20 08:50:17 -08:00
Djordje Todorovic	2f215cf36a	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGfaff707db82d. A failure found on an ARM 2-stage buildbot. The investigation is needed.	2020-02-20 14:41:39 +01:00
Cameron McInally	3931734990	[AArch64][SVE] Add initial backend support for FP splat_vector Differential Revision: https://reviews.llvm.org/D74632	2020-02-19 10:19:11 -06:00
Djordje Todorovic	faff707db8	Reland "[DebugInfo] Enable the debug entry values feature by default" Differential Revision: https://reviews.llvm.org/D73534	2020-02-19 11:12:26 +01:00
Djordje Todorovic	2bf44d11cb	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGa82d3e8a6e67.	2020-02-18 16:38:11 +01:00
Djordje Todorovic	a82d3e8a6e	Reland "[DebugInfo] Enable the debug entry values feature by default" This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-18 14:41:08 +01:00
Sander de Smalen	a7a96c726e	[AArch64] Implement passing SVE vectors by ref for AAPCS. Summary: This patch implements the part of the calling convention where SVE Vectors are passed by reference. This means the caller must allocate stack space for these objects and pass the address to the callee. Reviewers: efriedma, rovka, cameron.mcinally, c-rhodes, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71216	2020-02-17 15:20:28 +00:00
Kerry McLaughlin	633db60f3e	[AArch64][SVE] Add SVE index intrinsic Summary: Implements the @llvm.aarch64.sve.index intrinsic, which takes a scalar base and step value. This patch also adds the printSImm function to AArch64InstPrinter to ensure that immediates of type i8 & i16 are printed correctly. Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin Reviewed By: cameron.mcinally Subscribers: tatyana-krasnukha, tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74550	2020-02-17 10:30:11 +00:00
Pavel Iliin	b6a9fe2099	[AArch64] Add BIT/BIF support. This patch added generation of SIMD bitwise insert BIT/BIF instructions. In the absence of GCC-like functionality for optimal constraints satisfaction during register allocation the bitwise insert and select patterns are matched by pseudo bitwise select BSP instruction with not tied def. It is expanded later after register allocation with def tied to BSL/BIT/BIF depending on operands registers. This allows to get rid of redundant moves. Reviewers: t.p.northover, samparker, dmgreen Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D74147	2020-02-14 14:19:39 +00:00
Kerry McLaughlin	671cbc1fbb	[AArch64][SVE] Add mul/mla/mls lane & dup intrinsics Summary: Implements the following intrinsics: - @llvm.aarch64.sve.dup - @llvm.aarch64.sve.mul.lane - @llvm.aarch64.sve.mla.lane - @llvm.aarch64.sve.mls.lane Reviewers: c-rhodes, sdesmalen, dancgr, efriedma, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74222	2020-02-13 10:32:59 +00:00
Djordje Todorovic	97ed706a96	Revert "[DebugInfo] Enable the debug entry values feature by default" This reverts commit rG9f6ff07f8a39. Found a test failure on clang-with-thin-lto-ubuntu buildbot.	2020-02-12 11:59:04 +01:00
Djordje Todorovic	9f6ff07f8a	[DebugInfo] Enable the debug entry values feature by default This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-12 10:25:14 +01:00
Craig Topper	eeb63944e4	[LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results. Remove code from LegalizeTypes that allowed this to work. We were already using BUILD_PAIR for this in some places so this standardizes on a single way to do this.	2020-02-08 09:52:31 -08:00
Guillaume Chatelet	f85d3408e6	[NFC] Introduce an API for MemOp Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73964	2020-02-07 11:32:27 +01:00
Reid Kleckner	2d89e0a098	[SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr The CATCHPAD node mostly existed to be selected into the EH_RESTORE instruction, which sets the frame back up when 32-bit Windows exceptions return to the parent function. However, creating this MachineInstr early increases the risk that other passes will come along and insert instructions that use the stack before ESP and EBP are restored. That happened in PR44697. Instead of representing these in the instruction stream early, delay it until PEI. Mark the blocks where this needs to happen as EHPads, but not funclet entry blocks. Passes after PEI have to be careful not to hoist instructions that can use stack across frame setup instructions, so this should be relatively reliable. Fixes PR44697 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D73752	2020-02-04 15:13:12 -08:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This helps introducing additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Guillaume Chatelet	333f2ad8b8	[Alignment][NFC] Use Align for getMemcpy/Memmove/Memset Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73885	2020-02-03 17:13:19 +01:00
John Brawn	68cf574857	[FPEnv][AArch64] Add lowering of f128 STRICT_FSETCC These get lowered to function calls, like the non-strict versions. Differential Revision: https://reviews.llvm.org/D73784	2020-02-03 14:39:16 +00:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
John Brawn	0bb9a27c98	[FPEnv][AArch64] Add lowering and instruction selection for strict conversions Strict fp-to-int and int-to-fp conversions can be handled in the same way that the non-strict versions are (by using the appropriate instruction or converting to a function call when we have no instruction). Differential Revision: https://reviews.llvm.org/D73625	2020-01-30 13:50:06 +00:00
John Brawn	258d8dd76a	[FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND This gets selected to the appropriate fcvt instruction. Handling from there on isn't fully correct yet, as we need to model fcvt reading and writing to fpsr and fpcr. Differential Revision: https://reviews.llvm.org/D73201	2020-01-30 12:51:25 +00:00
John Brawn	2224407ef5	Add lowering of STRICT_FSETCC and STRICT_FSETCCS These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the corresponding FCMP and FCMPE instructions, though the handling from there on isn't fully correct as we don't model reads and writes to FPCR and FPSR. Differential Revision: https://reviews.llvm.org/D73368	2020-01-30 10:40:55 +00:00
Kerry McLaughlin	aa0f37e14a	[AArch64][SVE] Add first-faulting load intrinsic Summary: Implements the llvm.aarch64.sve.ldff1 intrinsic and DAG combine rules for first-faulting loads with sign & zero extends Reviewers: sdesmalen, efriedma, andwar, dancgr, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73025	2020-01-23 11:57:16 +00:00
Sander de Smalen	4cf16efe49	[AArch64][SVE] Add patterns for unpredicated load/store to frame-indices. This patch also fixes up a number of cases in DAGCombine and SelectionDAGBuilder where the size of a scalable vector is used in a fixed-width context (thus triggering an assertion failure). Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D71215	2020-01-22 14:32:27 +00:00
Kerry McLaughlin	cdcc4f2a44	[AArch64][SVE] Add intrinsic for non-faulting loads Summary: This patch adds the llvm.aarch64.sve.ldnf1 intrinsic, plus DAG combine rules for non-faulting loads and sign/zero extends Reviewers: sdesmalen, efriedma, andwar, dancgr, mgudim, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71698	2020-01-22 11:15:20 +00:00
Sander de Smalen	67d4c9924c	Add support for (expressing) vscale. In LLVM IR, vscale can be represented with an intrinsic. For some targets, this is equivalent to the constexpr: getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1 This can be used to propagate the value in CodeGenPrepare. In ISel we add a node that can be legalized to one or more instructions to materialize the runtime vector length. This patch also adds SVE CodeGen support for VSCALE, which maps this node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD] instructions (scaled multiples of 2, 4, or 8 bytes, respectively). Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner Reviewed by: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D68203	2020-01-22 10:09:27 +00:00
Florian Hahn	535ed62c5f	[AArch64] Add custom store lowering for 256 bit non-temporal stores. Currently we fail to lower non-termporal stores for 256+ bit vectors to STNPQ, because type legalization will split them up to 128 bit stores and because there are no single non-temporal stores, creating STPNQ in the Load/Store optimizer would be quite tricky. This patch adds custom lowering for 256 bit non-temporal vector stores to improve the generated code. Reviewers: dmgreen, samparker, t.p.northover, ab Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D72919	2020-01-21 14:53:40 -08:00
Andrzej Warzynski	7e717b3990	[AArch64][SVE] Extend int_aarch64_sve_ld1_gather_imm The ACLE distinguishes between the following addressing modes for gather loads: * "scalar base, vector offset", and * "vector base, scalar offset". For the "vector base, scalar offset" case, the `int_aarch64_sve_ld1_gather_imm` intrinsic was added in `79f2422d`. Currently, that intrinsic assumes that the scalar offset is passed as an immediate. As a result, it does not cater for cases where scalar offset is stored in a register. In this patch `int_aarch64_sve_ld1_gather_imm` is extended so that all cases are covered: * `int_aarch64_sve_ld1_gather_imm` is renamed as `int_aarch64_sve_ld1_gather_scalar_offset` * new DAG combine rules are added for GLD1_IMM for scenarios where the offset is a non-immediate scalar or an out-of-range immediate * sve-intrinsics-gather-loads-vector-base.ll is renamed as sve-intrinsics-gather-loads-vector-base-imm-offset.ll * sve-intrinsics-gather-loads-vector-base-scalar-offset.ll is added to test file for non-immediate offsets Similar changes are made for scatter store intrinsics. Reviewed By: sdesmalen, efriedma Differential Revision: https://reviews.llvm.org/D71773	2020-01-20 12:19:18 +00:00
Matt Arsenault	0d0fce42b0	GlobalISel: Preserve load/store metadata in IRTranslator This was dropping the invariant metadata on dead argument loads, so they weren't deleted. Atomics still need to be fixed the same way. Also, apparently store was never preserving dereferencable which should also be fixed.	2020-01-16 13:49:43 -05:00
Benjamin Kramer	06cfcdcca7	[AArch64][SVE] Fold variable into assert to silence unused variable warnings in Release builds	2020-01-15 12:50:27 +01:00
Cullen Rhodes	93a4dede3a	[AArch64][SVE] Add ptest intrinsics Summary: Implements the following intrinsics: * @llvm.aarch64.sve.ptest.any * @llvm.aarch64.sve.ptest.first * @llvm.aarch64.sve.ptest.last Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72398	2020-01-15 11:15:01 +00:00
Danilo Carvalho Grael	2d7e757a83	[AArch64][SVE] Add patterns for some arith SVE instructions. Summary: Add patterns for the following instructions: - smax, smin, umax, umin Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin Subscribers: amehsan Differential Revision: https://reviews.llvm.org/D71779	2020-01-13 11:39:42 -05:00
KAWASHIMA Takahiro	10c11e4e2d	This option allows selecting the TLS size in the local exec TLS model, which is the default TLS model for non-PIC objects. This allows large/ many thread local variables or a compact/fast code in an executable. Specification is same as that of GCC. For example, the code model option precedes the TLS size option. TLS access models other than local-exec are not changed. It means supoort of the large code model is only in the local exec TLS model. Patch By KAWASHIMA Takahiro (kawashima-fj <t-kawashima@fujitsu.com>) Reviewers: dmgreen, mstorsjo, t.p.northover, peter.smith, ostannard Reviewd By: peter.smith Committed by: peter.smith Differential Revision: https://reviews.llvm.org/D71688	2020-01-13 10:16:53 +00:00
Jessica Paquette	ceb801612a	[AArch64] Don't generate libcalls for wide shifts on Darwin Similar to `cff90f07cb`. Darwin doesn't always use compiler-rt, and so we can't assume that these functions are available (at least on arm64).	2020-01-10 15:58:51 -08:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
QingShan Zhang	2133d3c558	[DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal' For now, we didn't set the default operation action for SIGN_EXTEND_INREG for vector type, which is 0 by default, that is legal. However, most target didn't have native instructions to support this opcode. It should be set as expand by default, as what we did for ANY_EXTEND_VECTOR_INREG. Differential Revision: https://reviews.llvm.org/D70000	2020-01-03 03:26:41 +00:00
Jay Foad	13a7a4ccbf	Remove unneeded extra variable realArgIdx. NFC.	2020-01-02 14:27:32 +00:00
Andrzej Warzynski	404da13e1e	[AArch64][SVE] Gather loads: pass 32 bit unpacked offsets as nxv2i32 Summary: Currently 32 bit unpacked offsets are passed as nxv2i64. However, as pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead would improve consistency with: * how other arguments are treated * how scatter stores are implemented This patch makes sure that 32 bit unpacked offsets are passes as nxv2i32 instead of nxv2i64. Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71724	2020-01-02 13:01:28 +00:00
Craig Topper	4ae3120ed8	[LegalizeVectorOps][AArch64] Stop asking for v4f16 fp_round and fp_extend to be promoted. These operations are needed as building blocks for promoting so they can't be promoted themselves. This appeared to work because the fp_extend query type for operation actions is the result type, not the input type so it never triggered in the legalizer. For fp_round, the vector op legalizer just ended up creating a nop fp_extend that was elided by getNode, followed by a nop fp_round that was also elided by getNode. This was followed by a final fp_round from v4f32 back to vf416 which was CSEd to the original node. Then legalize vector ops just believed that node legalized to itself. LegalizeDAG took another crack at promoting it, but didn't have a handler so just skipped it with a debug message saying it wasn't promoted. This patch just removes the operation actions to avoid this non-sense. Found while trying to refactor LegalizeVectorOps to handle multiple result nodes better.	2019-12-31 15:04:12 -08:00
Sanjay Patel	8cefc37be5	[DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine. Earlier patches that dealt with regressions initially exposed by this patch: rG5e5e99c041e4 rG0b38af89e2c0 Patch by: @RKSimon (Simon Pilgrim) Differential Revision: https://reviews.llvm.org/D63815	2019-12-23 10:11:45 -05:00
Martin Storsjö	5a751e747d	[AArch64] [Windows] Use COFF stubs for calls to extern_weak functions As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range). Improve the classifyGlobalFunctionReference method to set MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in AArch64TargetLowering::LowerCall to use the return value from classifyGlobalFunctionReference for these cases. Add code in both AArch64FastISel and GlobalISel/IRTranslator to bail out for function calls to extern weak functions on windows, to let SelectionDAG handle them. This matches what was done for X86 in `6bf108d77a`. Differential Revision: https://reviews.llvm.org/D71721	2019-12-23 12:13:49 +02:00
Sanjay Patel	0b38af89e2	[AArch64] match splat of bitcasted extract subvector to DUPLANE This is another potential regression exposed by D63815. Here we peek through a bitcast to find an extract subvector and scale the splat offset based on that: splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC' Differential Revision: https://reviews.llvm.org/D71672	2019-12-22 08:37:03 -05:00
Cullen Rhodes	974f00a436	[AArch64][SVE] Fold constant multiply of element count Summary: E.g. %0 = tail call i64 @llvm.aarch64.sve.cntw(i32 31) %mul = mul i64 %0, <const> Should emit: cntw x0, all, mul #<const> For <const> in the range 1-16. Patch by Kerry McLaughlin Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71014	2019-12-20 11:58:00 +00:00
Cullen Rhodes	3f9005eb89	Recommit "[AArch64][SVE] Add permutation and selection intrinsics" Recommit `23c28c4043` (reverted in `dcb48f50bd`) with a fix for an assert "Request for a fixed size on a scalable object" being triggered in `LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the TypeSize object.	2019-12-20 10:45:17 +00:00
Cullen Rhodes	dcb48f50bd	Revert "[AArch64][SVE] Add permutation and selection intrinsics" This reverts commit `23c28c4043`. It caused build failures in the following expensive checks builders: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1295 http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/700 Reverting for now whilst I figure what the issue is.	2019-12-19 14:26:14 +00:00
Cullen Rhodes	23c28c4043	[AArch64][SVE] Add permutation and selection intrinsics Summary: Adds the following intrinsics: * @llvm.aarch64.sve.clasta * @llvm.aarch64.sve.clasta_n * @llvm.aarch64.sve.clastb * @llvm.aarch64.sve.clastb_n * @llvm.aarch64.sve.compact * @llvm.aarch64.sve.ext * @llvm.aarch64.sve.lasta * @llvm.aarch64.sve.lastb * @llvm.aarch64.sve.rev * @llvm.aarch64.sve.splice * @llvm.aarch64.sve.tbl * @llvm.aarch64.sve.trn1 * @llvm.aarch64.sve.trn2 * @llvm.aarch64.sve.uzp1 * @llvm.aarch64.sve.uzp2 * @llvm.aarch64.sve.zip1 * @llvm.aarch64.sve.zip2 Reviewers: sdesmalen, efriedma, dancgr, mgudim, huntergr, rengolin Reviewed By: sdesmalen, efriedma Subscribers: kmclaughlin, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71401	2019-12-19 13:18:40 +00:00
Cullen Rhodes	49199465a3	[AArch64][SVE] Implement ptrue intrinsic Reviewers: sdesmalen, eli.friedman, dancgr, mgudim, cameron.mcinally, huntergr, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71457	2019-12-19 11:02:05 +00:00
Victor Campos	364b8f5fbe	[AArch64] Improve codegen of volatile load/store of i128 Summary: Instead of generating two i64 instructions for each load or store of a volatile i128 value (two LDRs or STRs), now emit a single LDP or STP. Reviewers: labrinea, t.p.northover, efriedma Reviewed By: efriedma Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69559	2019-12-18 10:03:12 +00:00
Andrzej Warzynski	7e20c3a71d	[Aarch64][SVE] Add intrinsics for scatter stores Summary: This patch adds the following SVE intrinsics for scatter stores: * 64-bit offsets: * @llvm.aarch64.sve.st1.scatter (unscaled) * @llvm.aarch64.sve.st1.scatter.index (scaled) * 32-bit unscaled offsets: * @llvm.aarch64.sve.st1.scatter.uxtw (zero-extended offset) * @llvm.aarch64.sve.st1.scatter.sxtw (sign-extended-offset) * 32-bit scaled offsets: * @llvm.aarch64.sve.st1.scatter.uxtw.index (zero-extended offset) * @llvm.aarch64.sve.st1.scatter.sxtw.index (sign-extended offset) * vector base + immediate: * @llvm.aarch64.sve.st1.scatter.imm Reviewers: rengolin, efriedma, sdesmalen Reviewed By: efriedma, sdesmalen Subscribers: kmclaughlin, eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71074	2019-12-16 11:52:53 +00:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void )0x12033091e < (void )0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Kerry McLaughlin	4194ca8e5a	Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores" Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch to use reg + imm non-temporal loads to fix previous test failures. Original commit message: Adds the following intrinsics: - llvm.aarch64.sve.ldnt1 - llvm.aarch64.sve.stnt1 This patch creates masked loads and stores with the MONonTemporal flag set when used with the intrinsics above.	2019-12-13 10:08:20 +00:00
Cullen Rhodes	bbd16b6876	[AArch64][SVE] Remove nxv1f32 and nxv1f64 as legal types Summary: Also cleans up ZPR register class definition. Reviewers: sdesmalen, cameron.mcinally, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71351	2019-12-12 09:49:22 +00:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
Kerry McLaughlin	c0a3ab3655	Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores" This reverts commit `3f5bf35f86` as it was causing build failures in llvm-clang-x86_64-expensive-checks: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392 http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045	2019-12-11 13:58:39 +00:00
Andrzej Warzynski	65651f197a	[AArch64][SVE] Add DAG combine rules for gather loads and sext/zext Summary: These changes allow us to support sign-extending gather loads with the exisiting intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*). Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential revision: https://reviews.llvm.org/D70812	2019-12-11 12:56:18 +00:00
Kerry McLaughlin	3f5bf35f86	[AArch64][SVE] Implement intrinsics for non-temporal loads & stores Summary: Adds the following intrinsics: - llvm.aarch64.sve.ldnt1 - llvm.aarch64.sve.stnt1 This patch creates masked loads and stores with the MONonTemporal flag set when used with the intrinsics above. Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71000	2019-12-11 11:13:51 +00:00
Cullen Rhodes	1b9a608c84	[AArch64][SVE] Add wide compare immediate patterns Summary: Recognize wide compares where the wide operand is a splat of a scalar value in the appropriate range and convert to the immediate variant of the instruction. Patch by Graham Hunter Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71009	2019-12-10 10:41:22 +00:00
Eli Friedman	f1ddef34f1	[AArch64][SVE] Implement SPLAT_VECTOR for i1 vectors. The generated sequence with whilelo is unintuitive, but it's the best I could come up with given the limited number of SVE instructions that interact with scalar registers. The other sequence I was considering was something like dup+cmpne, but an extra scalar instruction seems better than an extra vector instruction. Differential Revision: https://reviews.llvm.org/D71160	2019-12-09 15:09:33 -08:00
Danilo Carvalho Grael	b29916cec3	[AArch64][SVE] Integer reduction instructions pattern/intrinsics. Added pattern matching/intrinsics for the following SVE instructions: -- saddv, uaddv -- smaxv, sminv, umaxv, uminv -- orv, eorv, andv	2019-12-05 09:59:19 -05:00
Sander de Smalen	79f2422d6a	[Aarch64][SVE] Add intrinsics for gather loads (vector + imm) This patch adds intrinsics for SVE gather loads from memory addresses generated by a vector base plus immediate index: * @llvm.aarch64.sve.ld1.gather.imm This intrinsics maps 1-1 to the corresponding SVE instruction (example for half-words): * ld1h { z0.d }, p0/z, [z0.d, #16] Committed on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma Reviewed By: sdesmalen Tags: #llvm Differential Revision: https://reviews.llvm.org/D70806	2019-12-03 15:19:16 +00:00
Sander de Smalen	8bf31e28d7	[Aarch64][SVE] Add intrinsics for gather loads with 32-bits offsets This patch adds intrinsics for SVE gather loads for which the offsets are 32-bits wide and are: * unscaled * @llvm.aarch64.sve.ld1.gather.sxtw * @llvm.aarch64.sve.ld1.gather.uxtw * scaled (offsets become indices) * @llvm.arch64.sve.ld1.gather.sxtw.index * @llvm.arch64.sve.ld1.gather.uxtw.index The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits. These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words): * unscaled * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw] * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw] * scaled * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1] * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1] Committed on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma Reviewed By: sdesmalen Tags: #llvm Differential Revision: https://reviews.llvm.org/D70782	2019-12-03 14:48:29 +00:00
Sander de Smalen	6e51ceba53	[AArch64][SVE] Add intrinsics for gather loads with 64-bit offsets This patch adds the following intrinsics for gather loads with 64-bit offsets: * @llvm.aarch64.sve.ld1.gather (unscaled offset) * @llvm.aarch64.sve.ld1.gather.index (scaled offset) These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words): * ld1h { z0.d }, p0/z, [x0, z0.d] * ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1] Committing on behalf of Andrzej Warzynski (andwar) Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D70542	2019-12-03 12:55:03 +00:00
Kerry McLaughlin	7483eb656f	[AArch64][SVE] Implement shift intrinsics Summary: Adds the following intrinsics: - asr & asrd - insr - lsl & lsr This patch also adds a new AArch64ISD node (INSR) to represent the int_aarch64_sve_insr intrinsic. Reviewers: huntergr, sdesmalen, dancgr, mgudim, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70437	2019-12-03 11:47:12 +00:00
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Graham Hunter	3f08ad611a	[SVE][CodeGen] Scalable vector MVT size queries * Implements scalable size queries for MVTs, split out from D53137. * Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types. * Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems. * CodeGenDAGPatterns will treat scalable and non-scalable vector types as different. Reviewers: greened, cameron.mcinally, sdesmalen, rovka Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D66871	2019-11-18 12:30:59 +00:00
Sander de Smalen	84a0c8e3ae	[AArch64][SVE] Spilling/filling of SVE callee-saves. Implement the spills/fills of callee-saved SVE registers using STR and LDR instructions. Also adds the `aarch64_sve_vector_pcs` attribute to specify the callee-saved registers to be used for functions that return SVE vectors or take SVE vectors as arguments. The callee-saved registers are vector registers z8-z23 and predicate registers p4-p15. The overal frame-layout with SVE will be as follows: +-------------+ \| stack args \| +-------------+ \| Callee Saves\| \| X29, X30 \| \|-------------\| <- FP \| SVE Callee \| < ////////////// \| saved regs \| < ////////////// \| z23 \| < ////////////// \| : \| < // SCALABLE // \| z8 \| < ////////////// \| p15 \| < /// STACK //// \| : \| < ////////////// \| p4 \| < //// AREA //// +-------------+ < ////////////// \| : \| < ////////////// \| SVE locals \| < ////////////// \| : \| < ////////////// +-------------+ \|/////////////\| alignment gap. \| : \| \| Stack objs \| \| : \| +-------------+ <- SP after call and frame-setup Reviewers: cameron.mcinally, efriedma, greened, thegameg, ostannard, rengolin Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D68996	2019-11-11 09:03:19 +00:00
Eli Friedman	5df3a87224	[AArch64][X86] Don't assume __powidf2 is available on Windows. We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it. (I think this started showing up recently because we added an optimization that converts pow to powi.) Differential Revision: https://reviews.llvm.org/D69013	2019-11-08 12:43:21 -08:00
David Green	2179867ddc	[AArch64] Select saturating Neon instructions This adds some extra patterns to select AArch64 Neon SQADD, UQADD, SQSUB and UQSUB from the existing target independent sadd_sat, uadd_sat, ssub_sat and usub_sat nodes. It does not attempt to replace the existing int_aarch64_neon_uqadd intrinsic nodes as they are apparently used for both scalar and vector, and need to be legal on scalar types for some of the patterns to work. The int_aarch64_neon_uqadd on scalar would move the two integers into floating point registers, perform a Neon uqadd and move the value back. I don't believe this is good idea for uadd_sat to do the same as the scalar alternative is simpler (an adds with a csinv). For signed it may be smaller, but I'm not sure about it being better. So this just adds some extra patterns for the existing vector instructions, matching on the _sat nodes. Differential Revision: https://reviews.llvm.org/D69374	2019-10-31 17:28:36 +00:00
Ehsan Amiri	ed7bcb2cb1	[AArch64][SVE] Add patterns for some integer vector instructions Add pattern matching for SVE vector instructions: -- add, sub, and, or, xor instructions -- sqadd, uqadd, sqsub, uqsub target-independent intrinsics -- bic intrinsics -- predicated add, sub, subr intrinsics Patch Review: https://reviews.llvm.org/D69128 Patch authored by: dancgr (Danilo Carvalho Grael)	2019-10-30 21:52:19 -04:00
Andrew Paverd	d157a9bc8b	Add Windows Control Flow Guard checks (/guard:cf). Summary: A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on indirect function calls, using either the check mechanism (X86, ARM, AArch64) or or the dispatch mechanism (X86-64). The check mechanism requires a new calling convention for the supported targets. The dispatch mechanism adds the target as an operand bundle, which is processed by SelectionDAG. Another pass (CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as required by /guard:cf. This feature is enabled using the `cfguard` CC1 option. Reviewers: thakis, rnk, theraven, pcc Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65761	2019-10-28 15:19:39 +00:00
Kerry McLaughlin	da720a38b9	[AArch64][SVE] Implement masked load intrinsics Summary: Adds support for codegen of masked loads, with non-extending, zero-extending and sign-extending variants. Reviewers: huntergr, rovka, greened, dmgreen Reviewed By: dmgreen Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68877	2019-10-28 10:06:14 +00:00
Graham Hunter	84da2596f9	[AArch64][SVE] Add SPLAT_VECTOR ISD Node Adds a new ISD node to replicate a scalar value across all elements of a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot be used. Fixes up default type legalization for scalable vectors after the new MVT type ranges were introduced. At present I only use this node for scalable vectors. A DAGCombine has been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all elements are the same, but only if the default operation action of Expand has been overridden by the target. I've only added result promotion legalization for scalable vector i8/i16/i32/i64 types in AArch64 for now. Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy Reviewed By: jmolloy Differential Revision: https://reviews.llvm.org/D47775 llvm-svn: 375222	2019-10-18 11:48:35 +00:00
Kerry McLaughlin	0c7cc383e5	[AArch64][SVE] Implement unpack intrinsics Summary: Implements the following intrinsics: - int_aarch64_sve_sunpkhi - int_aarch64_sve_sunpklo - int_aarch64_sve_uunpkhi - int_aarch64_sve_uunpklo This patch also adds AArch64ISD nodes for UNPK instead of implementing the intrinsics directly, as they are required for a future patch which implements the sign/zero extension of legal vectors. This patch includes tests for the Subdivide2Argument type added by D67549 Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka Reviewed By: greened Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D67550 llvm-svn: 375210	2019-10-18 09:40:16 +00:00
Graham Hunter	b302561b76	[SVE][IR] Scalable Vector size queries and IR instruction support * Adds a TypeSize struct to represent the known minimum size of a type along with a flag to indicate that the runtime size is a integer multiple of that size * Converts existing size query functions from Type.h and DataLayout.h to return a TypeSize result * Adds convenience methods (including a transparent conversion operator to uint64_t) so that most existing code 'just works' as if the return values were still scalars. * Uses the new size queries along with ElementCount to ensure that all supported instructions used with scalable vectors can be constructed in IR. Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen Reviewed By: rovka, sdesmalen Differential Revision: https://reviews.llvm.org/D53137 llvm-svn: 374042	2019-10-08 12:53:54 +00:00
Nikola Prica	02682498b8	[ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers Support for tracking registers that forward function parameters into the following function frame. For now we only support cases when parameter is forwarded through single register. Reviewers: aprantl, vsk, t.p.northover Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D66953 llvm-svn: 374033	2019-10-08 09:43:05 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Hans Wennborg	3740ae3b8a	Revert r372893 "[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets" This caused severe compile-time regressions, see PR43455. > Modern processors predict the targets of an indirect branch regardless of > the size of any jump table used to glean its target address. Moreover, > branch predictors typically use resources limited by the number of actual > targets that occur at run time. > > This patch changes the semantics of the option `-max-jump-table-size` to limit > the number of different targets instead of the number of entries in a jump > table. Thus, it is now renamed to `-max-jump-table-targets`. > > Before, when `-max-jump-table-size` was specified, it could happen that > cluster jump tables could have targets used repeatedly, but each one was > counted and typically resulted in tables with the same number of entries. > With this patch, when specifying `-max-jump-table-targets`, tables may have > different lengths, since the number of unique targets is counted towards the > limit, but the number of unique targets in tables is the same, but for the > last one containing the balance of targets. > > Differential revision: https://reviews.llvm.org/D60295 llvm-svn: 373060	2019-09-27 09:54:26 +00:00
Evandro Menezes	3bd8ba156b	[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets Modern processors predict the targets of an indirect branch regardless of the size of any jump table used to glean its target address. Moreover, branch predictors typically use resources limited by the number of actual targets that occur at run time. This patch changes the semantics of the option `-max-jump-table-size` to limit the number of different targets instead of the number of entries in a jump table. Thus, it is now renamed to `-max-jump-table-targets`. Before, when `-max-jump-table-size` was specified, it could happen that cluster jump tables could have targets used repeatedly, but each one was counted and typically resulted in tables with the same number of entries. With this patch, when specifying `-max-jump-table-targets`, tables may have different lengths, since the number of unique targets is counted towards the limit, but the number of unique targets in tables is the same, but for the last one containing the balance of targets. Differential revision: https://reviews.llvm.org/D60295 llvm-svn: 372893	2019-09-25 16:10:20 +00:00
Florian Hahn	364a23427b	[AArch64] Convert neon_ushl and neon_sshl with positive constants to VSHL. I think we should be able to use shl instead of sshl and ushl for positive constant shift values, unless I am missing something. We already have the machinery in place to ensure we only replace nodes, if the shift value is positive and <= the element width. This is a generalization of an earlier patch rL372565. Reviewers: t.p.northover, samparker, dmgreen, anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D67955 llvm-svn: 372824	2019-09-25 08:22:05 +00:00
Florian Hahn	3e2fdbee80	[AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine. Try to generate ushll/sshll for aarch64_neon_ushl/aarch64_neon_sshl, if their first operand is extended and the second operand is a constant Also adds a few tests marked with FIXME, where we can further increase codegen. Reviewers: t.p.northover, samparker, dmgreen, anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D62308 llvm-svn: 372565	2019-09-23 09:38:53 +00:00
Benjamin Kramer	df4b9a3f4f	Hide implementation details in namespaces. llvm-svn: 372113	2019-09-17 12:56:29 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
Kerry McLaughlin	e55b3bf40e	[SVE][Inline-Asm] Add constraints for SVE predicate registers Summary: Adds the following inline asm constraints for SVE: - Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive - Upa: SVE predicate register with full range, P0 to P15 Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin Reviewed By: rovka Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66524 llvm-svn: 371967	2019-09-16 09:45:27 +00:00
Tim Northover	f1c2892912	AArch64: support arm64_32, an ILP32 slice for watchOS. This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM. FastISel is mostly disabled for now since it would generate incorrect code for ILP32. llvm-svn: 371722	2019-09-12 10:22:23 +00:00
Guillaume Chatelet	ad1cea0dda	[Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67267 llvm-svn: 371212	2019-09-06 15:03:49 +00:00
Guillaume Chatelet	9fcf066d0c	[Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67278 llvm-svn: 371210	2019-09-06 14:51:15 +00:00
Guillaume Chatelet	4fc3ad9e13	[Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67229 llvm-svn: 371200	2019-09-06 12:48:34 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation, - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation, Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
Kerry McLaughlin	7b5c6b8d86	[SVE][Inline-Asm] Fix -Wimplicit-fallthrough in AArch64ISelLowering.cpp Summary: Adds break to 'x' case in getRegForInlineAsmConstraint added by D66302, fixing the unintentional fallthrough. Reviewers: sdesmalen, rovka, cameron.mcinally, greened, gribozavr, ruiu Reviewed By: sdesmalen Subscribers: bjope, javed.absar, tschuett, kristof.beyls, rkruppe, psnobl, llvm-commits, cfe-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67095 llvm-svn: 370769	2019-09-03 15:45:42 +00:00
Kerry McLaughlin	da4ef9b4c8	[SVE][Inline-Asm] Support for SVE asm operands Summary: Adds the following inline asm constraints for SVE: - w: SVE vector register with full range, Z0 to Z31 - x: Restricted to registers Z0 to Z15 inclusive. - y: Restricted to registers Z0 to Z7 inclusive. This change also adds the "z" modifier to interpret a register as an SVE register. Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness. Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened Reviewed By: sdesmalen Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66302 llvm-svn: 370673	2019-09-02 16:12:31 +00:00
Shiva Chen	b39876d8cd	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixed the issue that RV64 didn't clear the upper bits when return complex floating value with lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult \| (ImageResult << 32)) The patch introduces shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 llvm-svn: 370275	2019-08-28 23:40:37 +00:00
Hans Wennborg	cff90f07cb	[SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711) Neither libgcc or compiler-rt are usually used on Windows, so these functions can't be called. Differential revision: https://reviews.llvm.org/D66880 llvm-svn: 370204	2019-08-28 13:55:10 +00:00
Tim Northover	a7f226f9db	AArch64: avoid creating cycle in DAG for post-increment NEON ops. Inserting a value into Visited has the effect of terminating a search for predecessors if that node is seen. This is legitimate for the base address, and acts as a slight performance optimization, but the vector-building node can be paert of a legitimate cycle so we shouldn't stop searching there. PR43056. llvm-svn: 370036	2019-08-27 10:21:11 +00:00
Simon Pilgrim	c88408cf85	Use VT::getHalfNumVectorElementsVT helpers in a few places. NFCI. llvm-svn: 369751	2019-08-23 12:37:09 +00:00
Shiva Chen	72a41e7b0d	[TargetLowering] Remove optional arguments passing to makeLibCall The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contain argument flags which will pass to makeLibCall function. The patch should not has any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622	2019-08-22 04:59:43 +00:00
Eli Friedman	b5eb3e1e82	[AArch64] Remove incorrect usage of MONonTemporal. This has no effect at the moment, but might matter if we try to implement non-temporal loads in the future. llvm-svn: 368770	2019-08-13 23:12:14 +00:00
Daniel Sanders	5ae66e56cf	[aarch64] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Manual fixups in: AArch64InstrInfo.cpp - genFusedMultiply() now takes a Register* instead of unsigned* AArch64LoadStoreOptimizer.cpp - Ternary operator was ambiguous between Register/MCRegister. Settled on Register Depends on D65919 Reviewers: aemerson Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision for full review was: https://reviews.llvm.org/D65962 llvm-svn: 368628	2019-08-12 22:40:53 +00:00
Evandro Menezes	a005c1ac4f	[AArch64] Expand bcmp() for small block lengths Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should the target support the it. Unless `TTI::MemCmpExpansionOptions()` is overridden by the target. In a proprietary benchmark we see a performance drop of about 12% on PNG compression before this patch, though it passes all tests. This patch mirrors X86 for AArch64 and initializes `TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when appropriate. No tuning of the parameters was performed, but, at this point, it's enough to recover the performance drop above. This problem also exists on ARM. Once a consensus is reached for AArch64, we can work to fix ARM as well. Authors: - Evandro Menezes (@evandro) <e.menezes@samsung.com> - Brian Rzycki (@brzycki) <b.rzycki@samsung.com> Differential revision: https://reviews.llvm.org/D64805 llvm-svn: 367898	2019-08-05 18:09:14 +00:00
Cullen Rhodes	2a48176373	[AArch64] Implement initial SVE calling convention support Summary: This patch adds initial support for the SVE calling convention such that SVE types can be passed as arguments and return values to/from a subroutine. The SVE AAPCS states [1]: z0-z7 are used to pass scalable vector arguments to a subroutine, and to return scalable vector results from a function. If a subroutine takes arguments in scalable vector or predicate registers, or if it is a function that returns results in such registers, it must ensure that the entire contents of z8-z23 are preserved across the call. In other cases it need only preserve the low 64 bits of z8-z15, as described in §5.1.2. p0-p3 are used to pass scalable predicate arguments to a subroutine and to return scalable predicate results from a function. If a subroutine takes arguments in scalable vector or predicate registers, or if it is a function that returns results in these registers, it must ensure that p4-p15 are preserved across the call. In other cases it need not preserve any scalable predicate register contents. SVE predicate and data registers are passed indirectly (i.e. spilled to the stack and pass the address) if they exceed the registers used for argument passing defined by the PCS referenced above. Until SVE stack support is merged we can't spill SVE registers to the stack, so currently an llvm_unreachable is used where we will eventually handle this. [1] https://static.docs.arm.com/100986/0000/100986_0000.pdf Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D65448 llvm-svn: 367859	2019-08-05 13:44:10 +00:00
Florian Hahn	e3ea97b049	[AArch64] Skip isZIPMask check for masks with an odd number of elements. We process 2 elements at a time and expect the number of elements to be even. Similar to D60690. Reviewers: dmgreen, samparker, t.p.northover Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D65400 llvm-svn: 367831	2019-08-05 11:12:23 +00:00
Guillaume Chatelet	c97a3d15d2	[LLVM][Alignment] Introduce Alignment Type Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Reviewed By: jfb Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65514 llvm-svn: 367828	2019-08-05 11:02:05 +00:00
Bill Wendling	41a2847a9a	Emit diagnostic if an inline asm constraint requires an immediate Summary: An inline asm call can result in an immediate after inlining. Therefore emit a diagnostic here if constraint requires an immediate but one isn't supplied. Reviewers: joerg, mgorny, efriedma, rsmith Reviewed By: joerg Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60942 llvm-svn: 367750	2019-08-03 05:52:47 +00:00
Peter Collingbourne	33773d5cfc	SelectionDAG, MI, AArch64: Widen target flags fields/arguments from unsigned char to unsigned. This makes the field wider than MachineOperand::SubReg_TargetFlags so that we don't end up silently truncating any higher bits. We should still catch any bits truncated from the MachineOperand field as a consequence of the assertion in MachineOperand::setTargetFlags(). Differential Revision: https://reviews.llvm.org/D65465 llvm-svn: 367474	2019-07-31 20:14:09 +00:00
Roman Lebedev	017e272c3a	[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955	2019-07-24 22:57:22 +00:00
Amara Emerson	13af1ed8e3	[GlobalISel] Support for inlining memcpy, memset and memmove calls. This introduces a new family of combiner helper routines that re-use the target specific cost model from SelectionDAG, and generate inline implementations of the memcpy family of intrinsics. The combines are only enabled at optimization levels higher than -O0, and give very substantial performance improvements. Differential Revision: https://reviews.llvm.org/D65167 llvm-svn: 366951	2019-07-24 22:17:31 +00:00
Evgeniy Stepanov	d752f5e953	Basic codegen for MTE stack tagging. Implement IR intrinsics for stack tagging. Generated code is very unoptimized for now. Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are used to implement a tagged stack frame pointer in a virtual register. Differential Revision: https://reviews.llvm.org/D64172 llvm-svn: 366360	2019-07-17 19:24:02 +00:00
Roman Lebedev	c4b83a6054	[Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457) Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 \| PR42457 ]]. In middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars, but only X86 prefer that same pattern for vectors. Here i'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010	2019-07-03 09:41:35 +00:00
Tom Tan	7ecb5145ba	[COFF, ARM64] Fix encoding of debugtrap for Windows On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break point instruction on ARM64 which is recognized by Windows debugger and exception handling code, so llvm.debugtrap should map to it instead of redirecting to llvm.trap (brk #1) as the default implementation. Differential Revision: https://reviews.llvm.org/D63635 llvm-svn: 364115	2019-06-21 23:38:05 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Sam Parker	a0bd6f8a1a	[AArch64] Check for simple type in FPToUInt DAGCombiner was hitting a SimpleType assertion when trying to combine a v3f32 before type legalization. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916 Differential Revision: https://reviews.llvm.org/D62734 llvm-svn: 362365	2019-06-03 08:49:17 +00:00
Adhemerval Zanella	34d8daae53	[AArch64] Handle ISD::LRINT and ISD::LLRINT This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus fcvtzs. It currently only handles the scalar version. Reviewed By: SjoerdMeijer, mstorsjo Differential Revision: https://reviews.llvm.org/D62018 llvm-svn: 361877	2019-05-28 21:04:29 +00:00
Reid Kleckner	b7a78c7dff	[AArch64] Preserve X8 for thunks ending in variadic musttail calls Summary: On Windows, X8 may be used to pass in the address of an aggregate that is returned indirectly. Therefore, it should be forwarded to variadic musttail calls and preserved in thunks. Fixes PR41997 Reviewers: mgrang, efriedma Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62344 llvm-svn: 361585	2019-05-24 01:27:20 +00:00
Florian Hahn	4a8835c655	[AArch64] Skip mask checks for masks with an odd number of elements. Some checks in isShuffleMaskLegal expect an even number of elements, e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access invalid elements and crash. This patch adds checks to the impacted functions. Fixes PR41951 Reviewers: t.p.northover, dmgreen, samparker Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D60690 llvm-svn: 361235	2019-05-21 10:05:26 +00:00
Adhemerval Zanella	2d28db6b9f	[AArch64] Handle ISD::LROUND and ISD::LLROUND This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas instruction. It currently only handles the scalar version. llvm-svn: 360894	2019-05-16 13:30:18 +00:00
Mandeep Singh Grang	5dc8aeb26d	[COFF, ARM64] Fix ABI implementation of struct returns Summary: Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values Related clang patch: D60349 Reviewers: rnk, efriedma, TomTan, ssijaric Reviewed By: rnk, efriedma Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60348 llvm-svn: 359934	2019-05-03 21:12:36 +00:00
Sjoerd Meijer	180f1ae57c	[TargetLowering] Change getOptimalMemOpType to take a function attribute list The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537	2019-04-30 08:38:12 +00:00
Craig Topper	35fe07916a	[AArch64] Teach getTestBitOperand to look through ANY_EXTENDS This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression. Differential Revision: https://reviews.llvm.org/D60482 llvm-svn: 358108	2019-04-10 17:27:29 +00:00
Evandro Menezes	85bd3978ae	[IR] Refactor attribute methods in Function class (NFC) Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731	2019-04-04 22:40:06 +00:00
Evandro Menezes	0f797b8732	[CodeGen] Refactor the option for the maximum jump table size Refactor the option `max-jump-table-size` to default to the maximum representable number. Essentially, NFC. llvm-svn: 357280	2019-03-29 17:28:11 +00:00
Adhemerval Zanella	a3cefa5d64	[AArch64] Optimize floating point materialization This patch follows some ideas from r352866 to optimize the floating point materialization even further. It changes isFPImmLegal to considere up to 2 mov instruction or up to 5 in case subtarget has fused literals. The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but the mov+fmov sequence is always better because of the reduced d-cache pressure. The timings are still the same if you consider movw+movk+fmov vs. adrp+ldr will be fused (although one instruction longer). Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D58460 llvm-svn: 356390	2019-03-18 18:45:57 +00:00
Adhemerval Zanella	664c1ef528	[TargetLowering] Add code size information on isFPImmLegal. NFC This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389	2019-03-18 18:40:07 +00:00
Nikita Popov	1a26144ff5	[AArch64] Turn BIC immediate creation into a DAG combine Switch BIC immediate creation for vector ANDs from custom lowering to a DAG combine, which gives generic DAG combines a change to apply first. In particular this avoids (and x, -1) being turned into a (bic x, 0) instead of being eliminated entirely. Differential Revision: https://reviews.llvm.org/D59187 llvm-svn: 356299	2019-03-15 21:04:34 +00:00
Nikita Popov	aa7cfa75f9	[SDAG][AArch64] Legalize VECREDUCE Fixes https://bugs.llvm.org/show_bug.cgi?id=36796. Implement basic legalizations (PromoteIntRes, PromoteIntOp, ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes. There are more legalizations missing (esp float legalizations), but there's no way to test them right now, so I'm not adding them. This also includes a few more changes to make this work somewhat reasonably: * Add support for expanding VECREDUCE in SDAG. Usually experimental.vector.reduce is expanded prior to codegen, but if the target does have native vector reduce, it may of course still be necessary to expand due to legalization issues. This uses a shuffle reduction if possible, followed by a naive scalar reduction. * Allow the result type of integer VECREDUCE to be larger than the vector element type. For example we need to be able to reduce a v8i8 into an (nominally) i32 result type on AArch64. * Use the vector operand type rather than the scalar result type to determine the action, so we can control exactly which vector types are supported. Also change the legalize vector op code to handle operations that only have vector operands, but no vector results, as is the case for VECREDUCE. * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE), explicitly specify for which vector types the reductions are supported. This does not handle anything related to VECREDUCE_STRICT_*. Differential Revision: https://reviews.llvm.org/D58015 llvm-svn: 355860	2019-03-11 20:22:13 +00:00
Abderrazek Zaafrani	5ced596198	[AArch64] Improve FP16 instruction selection for vector round and vector conver from half instructions https://reviews.llvm.org/D58855 llvm-svn: 355545	2019-03-06 20:30:06 +00:00
Abderrazek Zaafrani	abfd10807c	[AArch64] Improve FP16 vector convert from short instructions. https://reviews.llvm.org/D58563 llvm-svn: 355134	2019-02-28 20:21:46 +00:00
Abderrazek Zaafrani	2fc498a652	[AArch64] Generate FP16 vector compare instructions. https://reviews.llvm.org/D58561 llvm-svn: 355050	2019-02-28 00:31:38 +00:00
Petr Hosek	fcbec02ea6	[AArch64] Support reserving arbitrary general purpose registers This is a follow up to D48580 and D48581 which allows reserving arbitrary general purpose registers with the exception of registers with special purpose (X8, X16-X18, X29, X30) and registers used by LLVM (X0, X19). This change also generalizes some of the existing logic to rely entirely on values generated from tablegen. Differential Revision: https://reviews.llvm.org/D56305 llvm-svn: 353957	2019-02-13 17:28:47 +00:00
Nikita Popov	a3be17ea1c	[AArch64] Expand v8i8 cttz (PR39729) Fix for https://bugs.llvm.org/show_bug.cgi?id=39729. Rather than adding just a case for v8i8 I'm setting cttz to expand for all vector types. Differential Revision: https://reviews.llvm.org/D58008 llvm-svn: 353872	2019-02-12 18:55:53 +00:00
Eli Friedman	29c0609301	[AArch64] Fix condition for "high-vector" DUP optimizations. AArch64 NEON has a bunch of instructions with a "2" suffix that extract the top half of the source vectors, instead of the bottom half. We have some DAGCombines to try to take advantage of that. However, they assumed that any EXTRACT_VECTOR was extracting the high half of the vector in question. This issue has apparently existed since the AArch64 backend was merged. Fixes https://bugs.llvm.org/show_bug.cgi?id=40632 . Differential Revision: https://reviews.llvm.org/D57862 llvm-svn: 353486	2019-02-08 00:23:35 +00:00
Florian Hahn	3b251963c3	[CGP] Add support for sinking operands to their users, if they are free. This patch improves code generation for some AArch64 ACLE intrinsics. It adds support to CGP to duplicate and sink operands to their user, if they can be folded into a target instruction, like zexts and sub into usubl. It adds a TargetLowering hook shouldSinkOperands, which looks at the operands of instructions to see if sinking is profitable. I decided to add a new target hook, as for the sinking to be profitable, at least on AArch64, we have to look at multiple operands of an instruction, instead of looking at the users of a zext for example. The sinking is done in CGP, because it works around an instruction selection limitation. If instruction selection is not limited to a single basic block, this patch should not be needed any longer. Alternatively this could be done in the LoopSink pass, which tries to undo LICM for instructions in blocks that are not executed frequently. Note that we do not force the operands to sink to have a single user, because we duplicate them before sinking. Therefore this is only desirable if they really can be done for free. Additionally we could consider the impact on live ranges later on. This should fix https://bugs.llvm.org/show_bug.cgi?id=40025. As for performance, we have internal code that uses intrinsics and can be speed up by 10% by this change. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D57377 llvm-svn: 353152	2019-02-05 10:27:40 +00:00
Mandeep Singh Grang	70d484d94e	[COFF, ARM64] Fix localaddress to handle stack realignment and variable size objects Summary: This fixes using the correct stack registers for SEH when stack realignment is needed or when variable size objects are present. Reviewers: rnk, efriedma, ssijaric, TomTan Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D57183 llvm-svn: 352923	2019-02-01 21:41:33 +00:00
James Y Knight	7716075a17	[opaque pointer types] Pass value type to GetElementPtr creation. This cleans up all GetElementPtr creation in LLVM to explicitly pass a value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57173 llvm-svn: 352913	2019-02-01 20:44:47 +00:00
James Y Knight	7976eb5838	[opaque pointer types] Pass function types to CallInst creation. This cleans up all CallInst creation in LLVM to explicitly pass a function type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57170 llvm-svn: 352909	2019-02-01 20:43:25 +00:00
Adhemerval Zanella	b3ccc5550d	[AArch64] Optimize floating point materialization This patch changes isFPImmLegal to return if the value can be enconded as the immediate operand of a logical instruction besides checking if for immediate field for fmov. This optimizes some floating point materization, inclusive values used on isinf lowering. Reviewed By: rengolin, efriedma, evandro Differential Revision: https://reviews.llvm.org/D57044 llvm-svn: 352866	2019-02-01 12:26:06 +00:00
James Y Knight	13680223b9	[opaque pointer types] Add a FunctionCallee wrapper type, and use it. Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc doesn't choke on it, hopefully. Original Message: The FunctionCallee type is effectively a {FunctionType,Value} pair, and is a useful convenience to enable code to continue passing the result of getOrInsertFunction() through to EmitCall, even once pointer types lose their pointee-type. Then: - update the CallInst/InvokeInst instruction creation functions to take a Callee, - modify getOrInsertFunction to return FunctionCallee, and - update all callers appropriately. One area of particular note is the change to the sanitizer code. Previously, they had been casting the result of `getOrInsertFunction` to a `Function*` via `checkSanitizerInterfaceFunction`, and storing that. That would report an error if someone had already inserted a function declaraction with a mismatching signature. However, in general, LLVM allows for such mismatches, as `getOrInsertFunction` will automatically insert a bitcast if needed. As part of this cleanup, cause the sanitizer code to do the same. (It will call its functions using the expected signature, however they may have been declared.) Finally, in a small number of locations, callers of `getOrInsertFunction` actually were expecting/requiring that a brand new function was being created. In such cases, I've switched them to Function::Create instead. Differential Revision: https://reviews.llvm.org/D57315 llvm-svn: 352827	2019-02-01 02:28:03 +00:00
James Y Knight	fadf25068e	Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it." This reverts commit `f47d6b38c7` (r352791). Seems to run into compilation failures with GCC (but not clang, where I tested it). Reverting while I investigate. llvm-svn: 352800	2019-01-31 21:51:58 +00:00
James Y Knight	f47d6b38c7	[opaque pointer types] Add a FunctionCallee wrapper type, and use it. The FunctionCallee type is effectively a {FunctionType,Value} pair, and is a useful convenience to enable code to continue passing the result of getOrInsertFunction() through to EmitCall, even once pointer types lose their pointee-type. Then: - update the CallInst/InvokeInst instruction creation functions to take a Callee, - modify getOrInsertFunction to return FunctionCallee, and - update all callers appropriately. One area of particular note is the change to the sanitizer code. Previously, they had been casting the result of `getOrInsertFunction` to a `Function*` via `checkSanitizerInterfaceFunction`, and storing that. That would report an error if someone had already inserted a function declaraction with a mismatching signature. However, in general, LLVM allows for such mismatches, as `getOrInsertFunction` will automatically insert a bitcast if needed. As part of this cleanup, cause the sanitizer code to do the same. (It will call its functions using the expected signature, however they may have been declared.) Finally, in a small number of locations, callers of `getOrInsertFunction` actually were expecting/requiring that a brand new function was being created. In such cases, I've switched them to Function::Create instead. Differential Revision: https://reviews.llvm.org/D57315 llvm-svn: 352791	2019-01-31 20:35:56 +00:00
Reid Kleckner	96c581d7d0	[AArch64] Include AArch64GenCallingConv.inc once Summary: Avoids duplicating generated static helpers for calling convention analysis. This also means you can modify AArch64CallingConv.td without recompiling the AArch64ISelLowering.cpp monolith, so it provides faster incremental rebuilds. Saves 12K in llc.exe, but adds a new object file, which is large. Reviewers: efriedma, t.p.northover Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56948 llvm-svn: 352430	2019-01-28 21:28:40 +00:00
Matt Arsenault	39508331ef	Reapply "IR: Add fp operations to atomicrmw" This reapplies commits r351778 and r351782 with RISCV test fixes. llvm-svn: 351850	2019-01-22 18:18:02 +00:00
Chandler Carruth	285fe716c5	Revert r351778: IR: Add fp operations to atomicrmw This broke the RISCV build, and even with that fixed, one of the RISCV tests behaves surprisingly differently with asserts than without, leaving there no clear test pattern to use. Generally it seems bad for hte IR to differ substantially due to asserts (as in, an alloca is used with asserts that isn't needed without!) and nothing I did simply would fix it so I'm reverting back to green. This also required reverting the RISCV build fix in r351782. llvm-svn: 351796	2019-01-22 10:29:58 +00:00
Matt Arsenault	bfdba5e4fc	IR: Add fp operations to atomicrmw Add just fadd/fsub for now. llvm-svn: 351778	2019-01-22 03:32:36 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Matt Arsenault	0cb08e448a	Allow FP types for atomicrmw xchg llvm-svn: 351427	2019-01-17 10:49:01 +00:00
Mandeep Singh Grang	33c49c0c82	[COFF, ARM64] Implement support for SEH extensions __try/__except/__finally Summary: This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler. We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape. Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric Reviewed By: rnk, efriedma Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53540 llvm-svn: 351370	2019-01-16 19:52:59 +00:00
Eli Friedman	33aecc8182	[AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 . Otherwise, with D56544, the intrinsic will be expanded to an integer csel, which is probably not what the user expected. This matches the general convention of using "v1" types to represent scalar integer operations in vector registers. While I'm here, also add some error checking so we don't generate illegal ABS nodes. Differential Revision: https://reviews.llvm.org/D56616 llvm-svn: 351141	2019-01-15 00:15:24 +00:00
Bryan Chan	7ce5775e62	[AArch64] Fix operation actions for FP16 vector intrinsics Summary: This patch changes the legalization action for some half-precision floating- point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops are not supported in hardware for half-precision vectors, but promotion is not always possible (for v8f16 operands). Changing the action to Expand fixes an assertion failure in the legalizer when the frontend produces such ops. In addition, a quick microbenchmark shows that, in the v4f16 case, expanding introduces fewer spills and is therefore slightly faster than promoting. Reviewers: t.p.northover, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56296 llvm-svn: 350825	2019-01-10 15:02:37 +00:00
Tim Northover	964eea7ad2	AArch64: avoid splitting vector truncating stores. We have code to split vector splats (of zero and non-zero) for performance reasons, but it ignores the fact that a store might be truncating. Actually, truncating stores are formed for vNi8 and vNi16 types. Since the truncation is from a legal type, the size of the store is always <= 64-bits and so they don't actually benefit from being split up anyway, so this patch just disables that transformation. llvm-svn: 350620	2019-01-08 13:30:27 +00:00
Simon Pilgrim	148957f336	[AArch64] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349908	2018-12-21 15:05:10 +00:00
Kristof Beyls	e66bc1f756	Introduce control flow speculation tracking pass for AArch64 The pass implements tracking of control flow miss-speculation into a "taint" register. That taint register can then be used to mask off registers with sensitive data when executing under miss-speculation, a.k.a. "transient execution". This pass is aimed at mitigating against SpectreV1-style vulnarabilities. At the moment, it implements the tracking of miss-speculation of control flow into a taint register, but doesn't implement a mechanism yet to then use that taint register to mask off vulnerable data in registers (something for a follow-on improvement). Possible strategies to mask out vulnerable data that can be implemented on top of this are: - speculative load hardening to automatically mask of data loaded in registers. - using intrinsics to mask of data in registers as indicated by the programmer (see https://lwn.net/Articles/759423/). For AArch64, the following implementation choices are made. Some of these are different than the implementation choices made in the similar pass implemented in X86SpeculativeLoadHardening.cpp, as the instruction set characteristics result in different trade-offs. - The speculation hardening is done after register allocation. With a relative abundance of registers, one register is reserved (X16) to be the taint register. X16 is expected to not clash with other register reservation mechanisms with very high probability because: . The AArch64 ABI doesn't guarantee X16 to be retained across any call. . The only way to request X16 to be used as a programmer is through inline assembly. In the rare case a function explicitly demands to use X16/W16, this pass falls back to hardening against speculation by inserting a DSB SYS/ISB barrier pair which will prevent control flow speculation. - It is easy to insert mask operations at this late stage as we have mask operations available that don't set flags. - The taint variable contains all-ones when no miss-speculation is detected, and contains all-zeros when miss-speculation is detected. Therefore, when masking, an AND instruction (which only changes the register to be masked, no other side effects) can easily be inserted anywhere that's needed. - The tracking of miss-speculation is done by using a data-flow conditional select instruction (CSEL) to evaluate the flags that were also used to make conditional branch direction decisions. Speculation of the CSEL instruction can be limited with a CSDB instruction - so the combination of CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL aren't speculated. When conditional branch direction gets miss-speculated, the semantics of the inserted CSEL instruction is such that the taint register will contain all zero bits. One key requirement for this to work is that the conditional branch is followed by an execution of the CSEL instruction, where the CSEL instruction needs to use the same flags status as the conditional branch. This means that the conditional branches must not be implemented as one of the AArch64 conditional branches that do not use the flags as input (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction selectors to not produce these instructions when speculation hardening is enabled. This pass will assert if it does encounter such an instruction. - On function call boundaries, the miss-speculation state is transferred from the taint register X16 to be encoded in the SP register as value 0. Future extensions/improvements could be: - Implement this functionality using full speculation barriers, akin to the x86-slh-lfence option. This may be more useful for the intrinsics-based approach than for the SLH approach to masking. Note that this pass already inserts the full speculation barriers if the function for some niche reason makes use of X16/W16. - no indirect branch misprediction gets protected/instrumented; but this could be done for some indirect branches, such as switch jump tables. Differential Revision: https://reviews.llvm.org/D54896 llvm-svn: 349456	2018-12-18 08:50:02 +00:00
Arnaud A. de Grandmaison	dfe861087d	[AArch64] Catch some more CMN opportunities. Fixes https://bugs.llvm.org/show_bug.cgi?id=33486 llvm-svn: 349022	2018-12-13 10:31:32 +00:00
Matthias Braun	d041212c07	AArch64: Fix invalid CCMP emission The code emitting AND-subtrees used to check whether any of the operands was an OR in order to figure out if the result needs to be negated. However the OR could be hidden in further subtrees and not immediately visible. Change the code so that canEmitConjunction() determines whether the result of the generated subtree needs to be negated. Cleanup emission logic to use this. I also changed the code a bit to make all negation decisions early before we actually emit the subtrees. This fixes http://llvm.org/PR39550 Differential Revision: https://reviews.llvm.org/D54137 llvm-svn: 348444	2018-12-06 01:40:23 +00:00
Craig Topper	129d529ab3	[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision: https://reviews.llvm.org/D54276 llvm-svn: 347902	2018-11-29 19:36:17 +00:00
David Blaikie	1fecbec5fa	AArch64ISelLowering: Remove a return-of-assignment to allow NRVO Patch by Arthur O'Dwyer! llvm-svn: 347609	2018-11-26 22:57:18 +00:00
John Brawn	d6e0ebea10	[AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop. Fix this by making LowerBUILD_VECTOR not do this transformation for those vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element constant vector. Doing that means we get a DUP, which we then need to recognise in ISel as a copy. llvm-svn: 347456	2018-11-22 11:45:23 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Eli Friedman	ad1151cf6a	[ARM64] [Windows] Handle funclets This patch adds support for funclets in frame lowering and ISel lowering. Together with D50288 and D50166, it enables C++ exception handling. Patch by Sanjin Sijaric, with some fixes by me. Differential Revision: https://reviews.llvm.org/D51524 llvm-svn: 346568	2018-11-09 23:33:30 +00:00
Mandeep Singh Grang	397765bc51	[COFF, ARM64] Add support for MSVC buffer security check Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan Reviewed By: rnk Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D54248 llvm-svn: 346469	2018-11-09 02:48:36 +00:00
Matthias Braun	96d12513a1	AArch64: Cleanup CCMP code; NFC Cleanup CCMP pattern matching code in preparation for review/bugfix: - Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()` (it won't accept arbitrary disjunctions and is really about whether we can transform the subtree into a conjunction that we can emit). - Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()` llvm-svn: 346203	2018-11-06 03:15:22 +00:00
Craig Topper	0b5f8169b0	[TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. llvm-svn: 346180	2018-11-05 23:26:13 +00:00
Mandeep Singh Grang	547a0d765a	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Patch by: Yin Ma (yinma@codeaurora.org) Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53996 llvm-svn: 345909	2018-11-01 23:22:25 +00:00
Mandeep Singh Grang	df19e57a1c	[COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic Reviewers: rnk, mstorsjo, efriedma, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53962 llvm-svn: 345892	2018-11-01 21:23:47 +00:00
Reid Kleckner	eb56894a4b	[AArch64] Fix unintended fallthrough and strengthen cast This was added in r330630. GCC's -Wimplicit-fallthrough seems to not fire when the previous case contains a switch itself. This fallthrough was bening because the helper function implementing the case used dyn_cast to re-check the type of the node in question. After fixing the fallthrough, we can strengthen the cast. llvm-svn: 345864	2018-11-01 18:02:27 +00:00
Mandeep Singh Grang	b0cdf56dd7	Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64" This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2. llvm-svn: 345863	2018-11-01 17:53:57 +00:00
Mandeep Singh Grang	88ad9ac720	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma Reviewed By: efriedma Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53673 llvm-svn: 345791	2018-10-31 23:16:20 +00:00
Mandeep Singh Grang	71e0cc2a0b	[COFF, ARM64] Make sure to forward arguments from vararg to musttail vararg Summary: Thunk functions in Windows are varag functions that call a musttail function to pass the arguments after the fixup is done. We need to make sure that we forward the arguments from the caller vararg to the callee vararg function. This is the same mechanism that is used for Windows on X86. Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53843 llvm-svn: 345641	2018-10-30 20:46:10 +00:00
David Greene	3e89fa8e08	[AArch64] Create proper memoperand for multi-vector stores Re-apply r345315 with testcase fixes. Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345631	2018-10-30 19:17:51 +00:00
Vlad Tsyrklevich	21beeb29ea	Revert "[AArch64] Create proper memoperand for multi-vector stores" This reverts commit r345315, it was causing test failures on sanitizer-x86_64-linux-fast. llvm-svn: 345356	2018-10-26 02:00:14 +00:00
David Greene	53e869da7d	[AArch64] Create proper memoperand for multi-vector stores Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345315	2018-10-25 21:10:39 +00:00
Thomas Lively	30f1d69115	[NFC] Rename minnan and maxnan to minimum and maximum Summary: Changes all uses of minnan/maxnan to minimum/maximum globally. These names emphasize that the semantic difference between these operations is more than just NaN-propagation. Reviewers: arsenm, aheejin, dschuff, javed.absar Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53112 llvm-svn: 345218	2018-10-24 22:49:55 +00:00
Tim Northover	1c353419ab	AArch64: add a pass to compress jump-table entries when possible. llvm-svn: 345188	2018-10-24 20:19:09 +00:00
Simon Pilgrim	095a7fe635	[AARCH64] Improve vector popcnt lowering with ADDLP AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors. This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258. Differential Revision: https://reviews.llvm.org/D53259 llvm-svn: 344554	2018-10-15 21:15:58 +00:00
Arnaud A. de Grandmaison	162435e7b5	[AArch64] Swap comparison operands if that enables some folding. Summary: AArch64 can fold some shift+extend operations on the RHS operand of comparisons, so swap the operands if that makes sense. This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751 Reviewers: efriedma, t.p.northover, javed.absar Subscribers: mcrosier, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53067 llvm-svn: 344439	2018-10-13 07:43:56 +00:00
Nirav Dave	e40e2bbd37	[AArch64] Share search bookkeeping in combines. NFCI. Share predecessor search bookkeeping in both perform PostLD1Combine and performNEONPostLDSTCombine. This should be approximately a 4x and 2x performance improvement. llvm-svn: 342986	2018-09-25 15:30:22 +00:00
Tri Vo	6c47c62588	[AArch64] Support adding X[8-15,18] registers as CSRs. Summary: Specifying X[8-15,18] registers as callee-saved is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. As part of this patch we: - use custom CSR list/mask when user specifies custom CSRs - update Machine Register Info's list of CSRs with additional custom CSRs in LowerCall and LowerFormalArguments. Reviewers: srhines, nickdesaulniers, efriedma, javed.absar Reviewed By: nickdesaulniers Subscribers: kristof.beyls, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52216 llvm-svn: 342824	2018-09-22 22:17:50 +00:00
Alex Bradbury	79518b02cd	[AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they will see no functional change. Previously this hook returned bool, but it now returns AtomicExpansionKind. This hook allows targets to select how a given cmpxchg is to be expanded. D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic. See my associated RFC for more info on the motivation for this change <http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>. Differential Revision: https://reviews.llvm.org/D48130 llvm-svn: 342550	2018-09-19 14:51:42 +00:00
Sander de Smalen	4dbc512676	[AArch64] Add parsing of aarch64_vector_pcs attribute. This patch adds parsing support for the 'aarch64_vector_pcs' calling convention attribute to calls and function declarations. More information describing the vector ABI and procedure call standard can be found here: https://developer.arm.com/products/software-development-tools/\ hpc/arm-compiler-for-hpc/vector-function-abi Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D51477 llvm-svn: 342030	2018-09-12 08:54:06 +00:00
Nick Desaulniers	287a3be379	[AArch64] Support reserving x1-7 registers. Summary: Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7. Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma Reviewed By: nickdesaulniers, efriedma Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D48580 llvm-svn: 341706	2018-09-07 20:58:57 +00:00
JF Bastien	2920061105	ARM64: improve non-zero memset isel by ~2x Summary: I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated where the generated code was bad. This patch fixes the majority of the issues by requesting that a 2xi64 vector be used for memset of 32 bytes and above. The patch leaves the former request for f128 unchanged, despite f128 materialization being suboptimal: doing otherwise runs into other asserts in isel and makes this patch too broad. This patch hides the issue that was present in bzero_40_stack and bzero_72_stack because the code now generates in a better order which doesn't have the store offset issue. I'm not aware of that issue appearing elsewhere at the moment. <rdar://problem/44157755> Reviewers: t.p.northover, MatzeB, javed.absar Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51706 llvm-svn: 341558	2018-09-06 16:03:32 +00:00
JF Bastien	da33900b95	NFC: improve ARM64 isFPImmLegal debug print Forking this change from D51706. This just made it easier to understand llc output with -debug. llvm-svn: 341504	2018-09-05 23:38:11 +00:00
Martin Storsjo	fed420d6b6	[MinGW] [AArch64] Add stubs for potential automatic dllimported variables The runtime pseudo relocations can't handle the AArch64 format PC relative addressing in adrp+add/ldr pairs. By using stubs, the potentially dllimported addresses can be touched up by the runtime pseudo relocation framework. Differential Revision: https://reviews.llvm.org/D51452 llvm-svn: 341401	2018-09-04 20:56:21 +00:00
Martin Storsjo	5c984fb16d	[AArch64] Simplify code in LowerGlobalAddress. NFCI. When initial support for dllimport was added for aarch64 in SVN r316555, ClassifyGlobalReference didn't set the MO_DLLIMPORT flag - that was only completed in SVN r323810. Reuse the return value from ClassifyGlobalReference for this purpose as well. llvm-svn: 341310	2018-09-03 11:59:23 +00:00
Eli Friedman	071203bbf2	[AArch64] Reject inline asm with FP registers when FP is disabled. Otherwise, we would crash trying to deal with an illegal input. Differential Revision: https://reviews.llvm.org/D51202 llvm-svn: 340637	2018-08-24 19:12:13 +00:00
David Green	9dd1d451d9	[AArch64] Add Tiny Code Model for AArch64 This adds the plumbing for the Tiny code model for the AArch64 backend. This, instead of loading addresses through the normal ADRP;ADD pair used in the Small model, uses a single ADR. The 21 bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other. This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible. Differential Revision: https://reviews.llvm.org/D49673 llvm-svn: 340397	2018-08-22 11:31:39 +00:00
Chandler Carruth	66654b72c9	[SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be surprised at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740	2018-08-14 23:30:32 +00:00
Eli Friedman	0d12e90bf5	[ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly. Intentionally excluding nodes from the DAGCombine worklist is likely to lead to weird optimizations and infinite loops, so it's generally a bad idea. To avoid the infinite loops, fix DAGCombine to use the isDesirableToCommuteWithShift target hook before performing the transforms in question, and implement the target hook in the ARM backend disable the transforms in question. Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a reduced testcase for that bug. But we should have sufficient test coverage for PerformSHLSimplify given that we're not playing weird tricks with the worklist. I can try to bugpoint it if necessary, though.) Differential Revision: https://reviews.llvm.org/D50667 llvm-svn: 339734	2018-08-14 22:10:25 +00:00
Bryan Chan	e023706471	[AArch64] Fix assertion failure on widened f16 BUILD_VECTOR Summary: Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the same type. This fixes an assertion failure in VerifySDNode. Reviewers: SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50202 llvm-svn: 339013	2018-08-06 14:14:41 +00:00
Craig Topper	2f60ef2c78	[DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc. The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation. llvm-svn: 338329	2018-07-30 23:22:00 +00:00
Craig Topper	a568a27dfa	[DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2. llvm-svn: 338303	2018-07-30 21:04:34 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int src, int width, unsigned char dst) { for (int i = 0; i < width; i++) dst++ = src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Tim Northover	70666e7765	[AArch64] Implement FLT_ROUNDS macro. Very similar to ARM implementation, just maps to an MRS. Should fix PR25191. Patch by Michael Brase. llvm-svn: 335118	2018-06-20 12:09:01 +00:00
Petr Hosek	7250908016	[AArch64] Support reserving x20 register Register x20 is a callee-saved register which may be used for other purposes in certain contexts, for example to hold special variables within the kernel. This change adds support for reserving this register both to frontend and backend to make this register usable for these purposes. Differential Revision: https://reviews.llvm.org/D46552 llvm-svn: 334531	2018-06-12 20:00:50 +00:00
Evandro Menezes	f8425340e4	[AArch64] Fix PR32384: bump up the number of stores per memset and memcpy As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change makes the inlining of `memset()` and `memcpy()` more aggressive when compiling for speed. The tuning remains the same when optimizing for size. Patch by: Sebastian Pop <s.pop@samsung.com> Evandro Menezes <e.menezes@samsung.com> Differential revision: https://reviews.llvm.org/D45098 llvm-svn: 333429	2018-05-29 15:58:50 +00:00
Sirish Pande	cabe50a308	[AArch64] Gangup loads and stores for pairing. Keep loads and stores together (target defines how many loads and stores to gang up), such that it will help in pairing and vectorization. Differential Revision https://reviews.llvm.org/D46477 llvm-svn: 332482	2018-05-16 15:36:52 +00:00
Peter Smith	c811758da6	[AArch64] Support "S" inline assembler constraint This patch re-introduces the "S" inline assembler constraint. This matches an absolute symbolic address or a label reference. The primary use case is asm("adrp %0, %1\n\t" "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var)); I say re-introduces as it seems like "S" was implemented in the original AArch64 backend, but it looks like it wasn't carried forward to the merged backend. The original implementation had A and L modifiers that could be used to print ":lo12:" to the string. It looks like gcc doesn't use these and :lo12: is expected to be written in the inline assembly string so I've not implemented A and L. Clang already supports the S modifier. Fixes PR37180 Differential Revision: https://reviews.llvm.org/D46745 llvm-svn: 332444	2018-05-16 09:33:25 +00:00
Nicola Zaghen	d34e60ca85	Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' \| xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master \| ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240	2018-05-14 12:53:11 +00:00
Geoff Berry	60460268c0	[AArch64] Fix performPostLD1Combine to check for constant lane index. Summary: performPostLD1Combine in AArch64ISelLowering looks for vector insert_vector_elt of a loaded value which it can optimize into a single LD1LANE instruction. The code checking for the pattern was not checking if the lane index was a constant which could cause two problems: - an assert when lowering the LD1LANE ISD node since it assumes an constant operand - an assert in isel if the lane index value depends on the post-incremented base register Both of these issues are avoided by simply checking that the lane index is a constant. Fixes bug 35822. Reviewers: t.p.northover, javed.absar Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46591 llvm-svn: 332103	2018-05-11 16:25:06 +00:00
Haicheng Wu	0aae2bc260	[CGP] Split large data structres to sink more GEPs Accessing the members of a large data structures needs a lot of GEPs which usually have large offsets due to the size of the underlying data structure. If the offsets are too large to fit into the r+i addressing mode, these GEPs cannot be sunk to their users' blocks and many extra registers are needed then to carry the values of these GEPs. This patch tries to split a large data struct starting from %base like the following. Before: BB0: %base = BB1: %gep0 = gep %base, off0 %gep1 = gep %base, off1 %gep2 = gep %base, off2 BB2: %load1 = load %gep0 %load2 = load %gep1 %load3 = load %gep2 After: BB0: %base = %new_base = gep %base, off0 BB1: %new_gep0 = %new_base %new_gep1 = gep %new_base, off1 - off0 %new_gep2 = gep %new_base, off2 - off0 BB2: %load1 = load i32, i32* %new_gep0 %load2 = load i32, i32* %new_gep1 %load3 = load i32, i32* %new_gep2 In the above example, the struct is split into two parts. The first part still starts from %base and the second part starts from %new_base. After the splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to BB2 and folded into their users. The algorithm to split data structure is simple and very similar to the work of merging SExts. First, it collects GEPs that have large offsets when iterating the blocks. Second, it splits the underlying data structures and updates the collected GEPs to use smaller offsets. Differential Revision: https://reviews.llvm.org/D42759 llvm-svn: 332015	2018-05-10 18:27:36 +00:00
Craig Topper	781aa181ab	Fix a bunch of places where operator-> was used directly on the return from dyn_cast. Inspired by r331508, I did a grep and found these. Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa. llvm-svn: 331577	2018-05-05 01:57:00 +00:00
Michael Berg	7acc81b744	Fast Math Flag mapping into SDNode Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage. Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar Reviewed By: spatel Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng Differential Revision: https://reviews.llvm.org/D45710 llvm-svn: 331547	2018-05-04 18:48:20 +00:00
Adhemerval Zanella	a57ef17ab6	[AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32 This patch adds a custom lowering for ISD::MULH{S,U} used on divide by constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV). New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL can be correctly lowered to smull2 and umull2. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46009 llvm-svn: 331522	2018-05-04 14:33:55 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Peter Collingbourne	5ab4a4793e	Reland r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses.", with a fix for the bot failure. This reland includes a check to prevent the DAG combiner from folding an offset that is smaller than the existing one. This can cause oscillations between two possible DAGs, which was the cause of the hang and later assertion failure observed on the lnt-ctmark-aarch64-O3-flto bot. http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/ Original commit message: > This is a code size win in code that takes offseted addresses > frequently, such as C++ constructors that typically need to compute > an offseted address of a vtable. This reduces the size of Chromium > for Android's .text section by 108KB. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 330630	2018-04-23 19:09:34 +00:00
Peter Collingbourne	ab40e44ba1	Revert r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses." Caused a hang and eventually an assertion failure in LTO builds of 7zip-benchmark on aarch64 iOS targets. http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/ llvm-svn: 330063	2018-04-13 20:21:00 +00:00
Peter Collingbourne	00db326b0d	AArch64: Introduce a DAG combine for folding offsets into addresses. This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. This reduces the size of Chromium for Android's .text section by 108KB. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329956	2018-04-12 21:23:55 +00:00
Amara Emerson	e27d5016ef	[AArch64] Fix isel failure when BUILD_PAIR nodes are left over. rdar://39175175 llvm-svn: 329743	2018-04-10 19:01:58 +00:00
Peter Collingbourne	a7d936f0c0	Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF." Caused a build failure in check-tsan. llvm-svn: 329718	2018-04-10 16:19:30 +00:00
Peter Collingbourne	5cff2409ae	AArch64: Allow offsets to be folded into addresses with ELF. This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. It reduces the size of Chromium for Android's .text section by 46KB, or 56KB with ThinLTO (which exposes more opportunities to use a direct access rather than a GOT access). Because the addend range is limited in COFF and Mach-O, this is enabled for ELF only. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329611	2018-04-09 19:59:57 +00:00
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
David Blaikie	36a0f226b1	Fix layering by moving ValueTypes.h from CodeGen to IR ValueTypes.h is implemented in IR already. llvm-svn: 328397	2018-03-23 23:58:31 +00:00
David Blaikie	13e77db2df	Fix layering of MachineValueType.h by moving it from CodeGen to Support This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395	2018-03-23 23:58:25 +00:00
John Brawn	e3b44f9de6	[AArch64] Don't reduce the width of loads if it prevents combining a shift Loads and stores can only shift the offset register by the size of the value being loaded, but currently the DAGCombiner will reduce the width of the load if it's followed by a trunc making it impossible to later combine the shift. Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and make it prevent the width reduction if this is what would happen, though do allow it if reducing the load width will let us eliminate a later sign or zero extend. Differential Revision: https://reviews.llvm.org/D44794 llvm-svn: 328321	2018-03-23 14:47:07 +00:00
Martin Storsjo	9a55c1b0dc	[ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes This extends the use of this attribute on ARM and AArch64 from SVN r325900 (where it was only checked for fixed stack allocations on ARM/AArch64, but for all stack allocations on X86). This also adds a testcase for the existing use of disabling the fixed stack probe with the attribute on ARM and AArch64. Differential Revision: https://reviews.llvm.org/D44291 llvm-svn: 327897	2018-03-19 20:06:50 +00:00
Martin Storsjo	36d6419cc5	[AArch64] Skip an unnecessary getCopyToReg in DYNAMIC_STACKALLOC Differential Revision: https://reviews.llvm.org/D44586 llvm-svn: 327779	2018-03-17 20:08:48 +00:00
Martin Storsjo	bde677289a	[AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC Support for this relocation is missing in both LLD and GNU binutils at the moment. This reverts the ELF parts of SVN r327316. llvm-svn: 327503	2018-03-14 13:09:10 +00:00
Martin Storsjo	7bc64bd889	[AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str Differential Revision: https://reviews.llvm.org/D44355 llvm-svn: 327316	2018-03-12 18:47:43 +00:00
Martin Storsjo	cc24096d4d	[AArch64] Implement native TLS for Windows Differential Revision: https://reviews.llvm.org/D43971 llvm-svn: 327220	2018-03-10 19:05:21 +00:00
Sebastian Pop	41073e8046	[AArch64] define isExtractSubvectorCheap Following the ARM-neon backend, define isExtractSubvectorCheap to return true when extracting low and high part of a neon register. The patch disables a test in llvm/test/CodeGen/AArch64/arm64-ext.ll This testcase is fragile in the sense that it requires a BUILD_VECTOR to "survive" all DAG transforms until ISelLowering. The testcase is supposed to check that AArch64TargetLowering::ReconstructShuffle() works, and for that we need a BUILD_VECTOR in ISelLowering. As we now transform the BUILD_VECTOR earlier into an VEXT + vector_shuffle, we don't have the BUILD_VECTOR pattern when we get to ISelLowering. As there is no way to disable the combiner to only exercise the code in ISelLowering, the patch disables the testcase. Differential revision: https://reviews.llvm.org/D43973 llvm-svn: 326811	2018-03-06 16:54:55 +00:00
Sebastian Pop	ac0bfb5938	fix PR36582 The error occurs when reading i16 elements (as in the testcase) from a v8i8 with a pattern of <0,2,4,6>. As all the data in the vector is accessed, the operation is not a VUZP. The patch stops the pattern recognition of VUZP when EXTRACT_VECTOR_ELT has a different element type than BUILD_VECTOR. llvm-svn: 326722	2018-03-05 17:35:49 +00:00
Evandro Menezes	cd855f70c5	[AArch64] Improve code generation of constant vectors Use the whole gammut of constant immediates available to set up a vector. Instead of using, for example, `mov w0, #0xffff; dup v0.4s, w0`, which transfers between register files, use the more efficient `movi v0.4s, #-1` instead. Not limited to just a few values, but any immediate value that can be encoded by all the variants of `FMOV`, `MOVI`, `MVNI`, thus eliminating the need to there be patterns to optimize special cases. Differential revision: https://reviews.llvm.org/D42133 llvm-svn: 326718	2018-03-05 17:02:47 +00:00
Evandro Menezes	2bbb4a7c93	[AArch64] Clean up code (NFC) Clean up a couple of functions in `AArch64TargetLowering` by removing redundant statements. llvm-svn: 326486	2018-03-01 21:17:36 +00:00
Sebastian Pop	c33af715d7	[AArch64] generate vuzp instead of mov when a BUILD_VECTOR is created out of a sequence of EXTRACT_VECTOR_ELT with a specific pattern sequence, either <0, 2, 4, ...> or <1, 3, 5, ...>, replace the BUILD_VECTOR with either vuzp1 or vuzp2. With this patch LLVM generates the following code for the first function fun1 in the testcase: adrp x8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] tbl v0.16b, { v0.16b }, v0.16b ext v1.16b, v0.16b, v0.16b, #8 uzp1 v0.8b, v0.8b, v1.8b str d0, [x8] ret Without this patch LLVM currently generates this code: adrp x8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] tbl v0.16b, { v0.16b }, v0.16b mov v1.16b, v0.16b mov v1.b[1], v0.b[2] mov v1.b[2], v0.b[4] mov v1.b[3], v0.b[6] mov v1.b[4], v0.b[8] mov v1.b[5], v0.b[10] mov v1.b[6], v0.b[12] mov v1.b[7], v0.b[14] str d1, [x8] ret llvm-svn: 326443	2018-03-01 15:47:39 +00:00
Chih-Hung Hsieh	9f9e4681ac	[TLS] use emulated TLS if the target supports only this mode Emulated TLS is enabled by llc flag -emulated-tls, which is passed by clang driver. When llc is called explicitly or from other drivers like LTO, missing -emulated-tls flag would generate wrong TLS code for targets that supports only this mode. Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether emulated TLS code should be generated. Unit tests are modified to run with and without the -emulated-tls flag. Differential Revision: https://reviews.llvm.org/D42999 llvm-svn: 326341	2018-02-28 17:48:55 +00:00
Evandro Menezes	72f3983633	[AArch64] Refactor instructions using SIMD immediates Get rid of icky goto loops and make the code easier to maintain. Otherwise, NFC. Restore r324903 and fix PR36369. Differentail revision: https://reviews.llvm.org/D43364 llvm-svn: 325621	2018-02-20 20:31:45 +00:00
Martin Storsjo	a63a5b993e	[AArch64] Implement dynamic stack probing for windows This makes sure that alloca() function calls properly probe the stack as needed. Differential Revision: https://reviews.llvm.org/D42356 llvm-svn: 325433	2018-02-17 14:26:32 +00:00
Evandro Menezes	10ae20d80c	[AArch64] Fix BITCAST lowering crash The data type is assumed to be a vector, but sometimes it is not, leading to an assertion. Add simple test-case to verify this. Differential revision: https://reviews.llvm.org/D42599 llvm-svn: 325378	2018-02-16 20:00:57 +00:00
Hans Wennborg	f381e94ac8	Revert r324903 "[AArch64] Refactor identification of SIMD immediates" It caused "Cannot select: t33: f64 = AArch64ISD::FMOV Constant:i32<0>" in Chromium builds. See PR36369. > Get rid of icky goto loops and make the code easier to maintain (NFC). > > Differential revision: https://reviews.llvm.org/D42723 llvm-svn: 325034	2018-02-13 18:14:38 +00:00
Oliver Stannard	02f08c9d1f	[AArch64] Improve v8.1-A code-gen for atomic load-and Armv8.1-A added an atomic load-clear instruction (which performs bitwise and with the complement of it's operand), but not a load-and instruction. Our current code-generation for atomic load-and always inserts an MVN instruction to invert its argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-and operation into an xor with -1 and a load-clear, allowing the normal DAG optimisations to work on it. To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't see any easy way to do this with an AArch64-specific ISD node, because the code-generation for atomic operations assumes the SDNodes are of type AtomicSDNode. I've left the old tablegen patterns in because they are still needed for global isel. Differential revision: https://reviews.llvm.org/D42478 llvm-svn: 324908	2018-02-12 17:03:11 +00:00
Evandro Menezes	7dc0f1ec45	[AArch64] Refactor identification of SIMD immediates Get rid of icky goto loops and make the code easier to maintain (NFC). Differential revision: https://reviews.llvm.org/D42723 llvm-svn: 324903	2018-02-12 16:41:41 +00:00
Oliver Stannard	4269917304	[AArch64] Improve v8.1-A code-gen for atomic load-subtract Armv8.1-A added an atomic load-add instruction, but not a load-subtract instruction. Our current code-generation for atomic load-subtract always inserts a NEG instruction to negate it's argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-subtract operation into a subtract and a load-add, allowing the normal DAG optimisations to work on it. I've left the old tablegen patterns in because they are still needed for global isel. Some of the tests in this patch are copied from D35375 by Chad Rosier (which was abandoned). Differential revision: https://reviews.llvm.org/D42477 llvm-svn: 324892	2018-02-12 14:22:03 +00:00
Sjoerd Meijer	5ea465ded7	[AArch64] Don't materialize 0 with "fmov h0, .." when FullFP16 is not supported We were generating "fmov h0, wzr" instructions when FullFP16 is not enabled. I've not added any tests, because the problem was visible in: test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll, which I had to change: I don't think Cyclone has FullFP16 enabled by default, so it shouldn't be using this v8.2a instruction. I've also removed these rdar tags, please shout if there are any objections. Differential Revision: https://reviews.llvm.org/D43020 llvm-svn: 324581	2018-02-08 08:39:05 +00:00
Sanjay Patel	d7bed12192	[AArch64] remove bogus comment; NFC I added this comment with D42323, but as discussed in D42806, the architecture does the right thing for denorms. We don't even need the select on 0.0 here? llvm-svn: 323996	2018-02-01 19:59:33 +00:00
Sanjay Patel	657e5d8d41	[DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994) As shown in the example in PR34994: https://bugs.llvm.org/show_bug.cgi?id=34994 ...we can return a very wrong answer (inf instead of 0.0) for square root when using a reciprocal square root estimate instruction. Here, I've conditionalized the filtering out of denorms based on the function having "denormal-fp-math"="ieee" in its attributes. The other options for this attribute are 'preserve-sign' and 'positive-zero'. So we don't generate this extra code by default with just '-ffast-math' (because then there's no denormal attribute string at all), but it works if you specify '-ffast-math -fdenormal-fp-math=ieee' from clang. As noted in the review, there may be other problems in clang that affect the results depending on platform (Linux x86 at least), but this should allow creating the desired codegen. Differential Revision: https://reviews.llvm.org/D42323 llvm-svn: 323981	2018-02-01 16:57:18 +00:00
Jun Bum Lim	fc7d56d949	Revert "AArch64: Omit callframe setup/destroy when not necessary" This reverts commit r322917 due to multiple performance regressions in spec2006 and spec2017. XFAILed llvm/test/CodeGen/AArch64/big-callframe.ll which initially motivated this change. llvm-svn: 323683	2018-01-29 19:56:42 +00:00
Oliver Stannard	a9d2e004d2	[AArch64] Generate the CASP instruction for 128-bit cmpxchg The Large System Extension added an atomic compare-and-swap instruction that operates on a pair of 64-bit registers, which we can use to implement a 128-bit cmpxchg. Because i128 is not a legal type for AArch64 we have to do all of the instruction selection in C++, and the instruction requires even/odd register pairs, so we have to wrap it in REG_SEQUENCE and EXTRACT_SUBREG nodes. This is very similar to what we do for 64-bit cmpxchg in the ARM backend. Differential revision: https://reviews.llvm.org/D42104 llvm-svn: 323634	2018-01-29 09:18:37 +00:00
Joel Jones	0715092c65	[AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others. This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 micro-arch's (-mllvm -aarch64-enable-aggressive-fma). Test case demonstrating the effects on T99 is included. Patch by: steleman (Stefan Teleman) Differential Revision: https://reviews.llvm.org/D40696 llvm-svn: 323474	2018-01-25 21:55:39 +00:00
Pablo Barrio	9b3d4c01a0	[AArch64] Avoid unnecessary vector byte-swapping in big-endian Summary: Loads/stores of some NEON vector types are promoted to other vector types with different lane sizes but same vector size. This is not a problem in little-endian but, when in big-endian, it requires additional byte reversals required to preserve the lane ordering while keeping the right endianness of the data inside each lane. For example: %1 = load <4 x half>, <4 x half>* %p results in the following assembly: ld1 { v0.2s }, [x1] rev32 v0.4h, v0.4h This patch changes the promotion of these loads/stores so that the actual vector load/store (LD1/ST1) takes care of the endianness correctly and there is no need for further byte reversals. The previous code now results in the following assembly: ld1 { v0.4h }, [x1] Reviewers: olista01, SjoerdMeijer, efriedma Reviewed By: efriedma Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D42235 llvm-svn: 323325	2018-01-24 14:13:47 +00:00
Carey Williams	da15b5b116	[AArch64] optimise v4f16 fcmps to utilise vector instructions Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported. Generating FCTVL(s) rather than a longer series of FCVTs. Differential Revision: https://reviews.llvm.org/D41772 llvm-svn: 323118	2018-01-22 14:16:11 +00:00
Carey Williams	22c49c6470	Test commit llvm-svn: 322958	2018-01-19 16:55:23 +00:00
Matthias Braun	5c290dc206	AArch64: Fix emergency spillslot being out of reach for large callframes Re-commit of r322200: The testcase shouldn't hit machineverifiers anymore with r322917 in place. Large callframes (calls with several hundreds or thousands or parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes. This commit does several things: - Compute max callframe size at the end of instruction selection. - Add mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file. - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes. - Always place the emergency spillslot close to FP if we have a frame pointer. - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). Which made no sense to me so I took this case out: Even though the emergency spillslot is technically not referenced by FP in this case we still want it allocated early. Differential Revision: https://reviews.llvm.org/D40876 llvm-svn: 322919	2018-01-19 03:16:36 +00:00
Matthias Braun	dc4b3e87f4	AArch64: Omit callframe setup/destroy when not necessary Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to setup and the callframe size is 0. - Fixes an invalid callframe nesting for byval arguments, which would look like this before this patch (as in `big-byval.ll`): ... ADJCALLSTACKDOWN 32768, 0, ... # Setup for extfunc ... ADJCALLSTACKDOWN 0, 0, ... # setup for memcpy ... BL &memcpy ... ADJCALLSTACKUP 0, 0, ... # destroy for memcpy ... BL &extfunc ADJCALLSTACKUP 32768, 0, ... # destroy for extfunc - Saves us two instructions in the common case of zero-sized stackframes. - Remove an unnecessary scheduling barrier (hence the small unittest changes). Differential Revision: https://reviews.llvm.org/D42006 llvm-svn: 322917	2018-01-19 02:45:38 +00:00
Matthias Braun	e3a8db7ba1	Revert "AArch64: Fix emergency spillslot being out of reach for large callframes" Revert for now as the testcase is hitting a pre-existing verifier error that manifest as a failure when expensive checks are enabled (or -verify-machineinstrs) is used. This reverts commit r322200. llvm-svn: 322231	2018-01-10 22:36:28 +00:00
Matthias Braun	b42ffa1283	AArch64: Fix emergency spillslot being out of reach for large callframes Large callframes (calls with several hundreds or thousands or parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes. This commit does several things: - Compute max callframe size at the end of instruction selection. - Add mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file. - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes. - Always place the emergency spillslot close to FP if we have a frame pointer. - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). Which made no sense to me so I took this case out: Even though the emergency spillslot is technically not referenced by FP in this case we still want it allocated early. Differential Revision: https://reviews.llvm.org/D40876 llvm-svn: 322200	2018-01-10 18:16:24 +00:00
Craig Topper	a4f9997675	[SelectionDAG][X86][AArch64] Require targets to specify the promotion type when using setOperationAction Promote for INT_TO_FP and FP_TO_INT Currently the promotion for these ignores the normal getTypeToPromoteTo and instead just tries to double the element width. This is because the default behavior of getTypeToPromote to just adds 1 to the SimpleVT, which has the affect of increasing the element count while keeping the scalar size the same. If multiple steps are required to get to a legal operation type, int_to_fp will be promoted multiple times. And fp_to_int will keep trying wider types in a loop until it finds one that works. getTypeToPromoteTo does have the ability to query a promotion map to get the type and not do the increasing behavior. It seems better to just let the target specify the promotion type in the map explicitly instead of letting the legalizer iterate via widening. FWIW, it's worth I think for any other vector operations that need to be promoted, we have to specify the type explicitly because the default behavior of getTypeToPromote isn't useful for vectors. The other types of promotion already require either the element count is constant or the total vector width is constant, but neither happens by incrementing the SimpleVT enum. Differential Revision: https://reviews.llvm.org/D40664 llvm-svn: 321629	2018-01-01 19:21:35 +00:00
Matthias Braun	a4852d2c19	X86/AArch64/ARM: Factor out common sincos_stret logic; NFCI Note: - X86ISelLowering: setLibcallName(SINCOS) was superfluous as InitLibcalls() already does it. - ARMISelLowering: Setting libcallnames for sincos/sincosf seemed superfluous as in the darwin case it wouldn't be used while for all other cases InitLibcalls already does it. llvm-svn: 321036	2017-12-18 23:19:42 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Matt Arsenault	7d7adf4f2e	TLI: Allow using PSV for intrinsic mem operands llvm-svn: 320756	2017-12-14 22:34:10 +00:00
Matt Arsenault	1117133687	DAG: Expose all MMO flags in getTgtMemIntrinsic Rather than adding more bits to express every MMO flag you could want, just directly use the MMO flags. Also fixes using a bunch of bool arguments to getMemIntrinsicNode. On AMDGPU, buffer and image intrinsics should always have MODereferencable set, but currently there is no way to do that directly during the initial intrinsic lowering. llvm-svn: 320746	2017-12-14 21:39:51 +00:00
Joel Galenson	3e40883e4c	[AArch64] Do not abort if overflow check does not use EQ or NE. As suggested by Eli Friedman, instead of aborting if an overflow check uses something other than SETEQ or SETNE, simply do not apply the optimization. Differential Revision: https://reviews.llvm.org/D39147 llvm-svn: 319837	2017-12-05 21:33:12 +00:00
Martin Storsjo	eca862de07	[AArch64] Allow using emulated tls on platforms other than ELF This matches how it is done on X86. This allows using emulated tls on windows; in MinGW environments, native tls isn't supported at the moment. Set the right Data*bitsDirective for windows to match the existing tests for other platforms. Make parts of the existing tests a regex, to allow matching .section .rdata for windows, to avoid having to duplicate the rest of the tests for windows. Differential Revision: https://reviews.llvm.org/D40770 llvm-svn: 319644	2017-12-04 09:09:04 +00:00
Nirav Dave	db77e57ea8	[DAG] Do MergeConsecutiveStores again before Instruction Selection Summary: Now that store-merge is only generates type-safe stores, do a second pass just before instruction selection to allow lowered intrinsics to be merged as well. Reviewers: jyknight, hfinkel, RKSimon, efriedma, rnk, jmolloy Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33675 llvm-svn: 319036	2017-11-27 15:28:15 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Martin Storsjo	4629f52312	[ARM, AArch64] Fix an assert message, Darwin isn't the only target supporting TLS. NFC. llvm-svn: 318184	2017-11-14 19:57:59 +00:00
David Blaikie	3f833edc7c	Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation. llvm-svn: 317647	2017-11-08 01:01:31 +00:00
Evandro Menezes	9dcf099944	[AArch64] Fix the number of iterations for the Newton series The number of iterations was incorrectly determined for DP FP vector types and the tests were insufficient to flag this issue. Differential revision: https://reviews.llvm.org/D39507 llvm-svn: 317349	2017-11-03 18:56:36 +00:00
Martin Storsjo	373c8efa1e	[AArch64] Add support for dllimport of values and functions Previously, the dllimport attribute did the right thing in terms of treating it as a pointer to a value, but this makes sure the names get mangled properly, and calls to such functions load the function from the __imp_ pointer. This is based on SVN r212431 and r212430 where the same was implemented for ARM. Differential Revision: https://reviews.llvm.org/D38530 llvm-svn: 316555	2017-10-25 07:25:18 +00:00
Amara Emerson	24ca39ce71	[AArch64] Improve codegen for inverted overflow checking intrinsics E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the inverted condition code for the CSEL. rdar://28495949 Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D38160 llvm-svn: 315205	2017-10-09 15:15:09 +00:00
Geoff Berry	bb23df92b5	[AArch64] Fix bug in store of vector 0 DAGCombine. Summary: Avoid using XZR/WZR directly as operands to split stores of zero vectors. Doing so can lead to the XZR/WZR being used by an instruction that doesn't allow it (e.g. add). Fixes bug 34674. Reviewers: t.p.northover, efriedma, MatzeB Subscribers: aemerson, rengolin, javed.absar, mcrosier, eraman, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D38146 llvm-svn: 313916	2017-09-21 21:10:06 +00:00
Sjoerd Meijer	0c5ba21cbf	[AArch64] allow v8f16 types when FullFP16 is supported This adds support for allowing v8f16 vector types, thus avoiding conversions from/to single precision for these types. This is a follow up patch of commits r311154 and r312104, which added support for scalars and v4f16 types, respectively. Differential Revision: https://reviews.llvm.org/D37802 llvm-svn: 313351	2017-09-15 09:24:48 +00:00
Petr Hosek	c35fe2b70b	[Fuchsia] Magenta -> Zircon Fuchsia's lowest API layer has been renamed from Magenta to Zircon. In LLVM proper, this is only mentioned in comments. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D37763 llvm-svn: 313105	2017-09-13 01:18:06 +00:00
Sjoerd Meijer	bafde8f3e3	[AArch64] ISel: Add some debug messages to LowerBUILDVECTOR. NFC. Differential Revision: https://reviews.llvm.org/D37676 llvm-svn: 313017	2017-09-12 10:24:12 +00:00
Sjoerd Meijer	be5b60f735	[AArch64] allow v4f16 types when FullFP16 is supported Support for scalars was committed in r311154, this adds support for allowing v4f16 vector types (thus avoiding conversions from/to single precision for these types). Differential Revision: https://reviews.llvm.org/D37145 llvm-svn: 312104	2017-08-30 08:38:13 +00:00
Sjoerd Meijer	b0eb5fb317	[AArch64] Add FMOVH0: materialize 0 using zero register for f16 values Instead of loading 0 from a constant pool, it's of course much better to materialize it using an fmov and the zero register. Thanks to Ahmed Bougacha for the suggestion. Differential Revision: https://reviews.llvm.org/D37102 llvm-svn: 311662	2017-08-24 14:47:06 +00:00
Sjoerd Meijer	afc2cd3c9e	[AArch64] Custom lowering of copysign f16 This is a follow up patch of r311154 and introduces custom lowering of copysign f16 to avoid promotions to single precision types when the subtarget supports fullfp16. Differential Revision: https://reviews.llvm.org/D36893 llvm-svn: 311646	2017-08-24 09:21:10 +00:00
Sjoerd Meijer	046a969360	[AArch64] fix for fcos and frem f16 promotion Fix for copy-paste mistake in r311154; setOperationAction for fcos and frem f16 operands appeared twice (and it should be set to 'promote'). Differential Revision: https://reviews.llvm.org/D37071 llvm-svn: 311635	2017-08-24 07:43:52 +00:00
Krasimir Georgiev	3d55cef48b	[AArch64] Silence unused variable warning in opt mode after r311533 llvm-svn: 311535	2017-08-23 08:40:22 +00:00
Sjoerd Meijer	24c98189ed	[AArch64] ISel legalization debug messages. NFCI. Debugging AArch64 instruction legalization and custom lowering is really an unpleasant experience because it shows nodes that appear out of thin air. In commit r311444, some debug messages have been added to SelectionDAG, the target independent part, and this patch adds some AArch64 specific messages. Differential Revision: https://reviews.llvm.org/D36964 llvm-svn: 311533	2017-08-23 08:18:37 +00:00
Sjoerd Meijer	b9de2b4871	[AArch64] Cleanup of HasFullFP16 argument. NFC. This is a clean up of commit r311154; it's not necessary to pass HasFullFP16 as an argument, instead just query the DAG. Differential Revision: https://reviews.llvm.org/D36978 llvm-svn: 311438	2017-08-22 09:21:08 +00:00
Alex Bradbury	080f6976c0	Use report_fatal_error for unsupported calling conventions The calling convention can be specified by the user in IR. Failing to support a particular calling convention isn't a programming error, and so relying on llvm_unreachable to catch and report an unsupported calling convention is not appropriate. Differential Revision: https://reviews.llvm.org/D36830 llvm-svn: 311435	2017-08-22 09:11:41 +00:00
Sjoerd Meijer	ec9581e5e0	[AArch64] Do not promote f16 when subtarget HasFullFP16 Armv8.2-A adds FP16 support, i.e. f16 is not only a storage-only type, but it also supports performing data processing on 16-bit floating-point quantities. All the necessary (tablegen) groundwork of adding the ARMv8.2-A FP16 (scalar) instructions was done in D15014. To take advantage of this, this patch avoids promotion of f16 to f32 types when the subtarget supports FullFP16, which enables instruction selection of these FP16 instructions. Differential Revision: https://reviews.llvm.org/D36396 llvm-svn: 311154	2017-08-18 10:51:14 +00:00
Craig Topper	4e22ee6745	[ConstantInt] Use ConstantInt::getValue instead of Constant::getUniqueInteger in a few places where we obviously have a ConstantInt. NFC getUniqueInteger will ultimately call ConstantInt::getValue, but calling ConstantInt::getValue should be inlined. llvm-svn: 310069	2017-08-04 16:59:29 +00:00
Haicheng Wu	50692a203c	[AArch64] Fix a typo in isExtFreeImpl() next => not Differential Revision: https://reviews.llvm.org/D36104 llvm-svn: 309748	2017-08-01 21:26:45 +00:00
Ahmed Bougacha	87807c5a86	[AArch64] Fix legality info passed to demanded bits for TBI opt. The (seldom-used) TBI-aware optimization had a typo lying dormant since it was first introduced, in r252573: when asking for demanded bits, it told TLI that it was running after legalize, where the opposite was true. This is an important piece of information, that the demanded bits analysis uses to make assumptions about the node. r301019 added such an assumption, which was broken by the TBI combine. Instead, pass the correct flags to TLO. llvm-svn: 309323	2017-07-27 21:27:25 +00:00
Peter Collingbourne	081ffe2ff2	Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI. This was a use-after-free waiting to happen. llvm-svn: 309159	2017-07-26 19:15:29 +00:00
Zvi Rackover	1b73682243	TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI. Changing mask argument type from const SmallVectorImpl<int>& to ArrayRef<int>. This came up in D35700 where a mask is received as an ArrayRef<int> and we want to pass it to TargetLowering::isShuffleMaskLegal(). Also saves a few lines of code. llvm-svn: 309085	2017-07-26 08:06:58 +00:00
Martin Storsjo	8cb3667541	[AArch64] Reserve a 16 byte aligned amount of fixed stack for win64 varargs Create a dummy 8 byte fixed object for the unused slot below the first stored vararg. Alternative ideas tested but skipped: One could try to align the whole fixed object to 16, but I haven't found how to add an offset to the stack frame used in LowerWin64_VASTART. If only the size of the fixed stack object size is padded but not the offset, via MFI.CreateFixedObject(alignTo(GPRSaveSize, 16), -(int)GPRSaveSize, false), PrologEpilogInserter crashes due to "Attempted to reset backwards range!". This fixes misconceptions about where registers are spilled, since AArch64FrameLowering.cpp assumes the offset from fixed objects is aligned to 16 bytes (and the Win64 case there already manually aligns the offset to 16 bytes). This fixes cases where local stack allocations could overwrite callee saved registers on the stack. Differential Revision: https://reviews.llvm.org/D35720 llvm-svn: 308950	2017-07-25 05:20:01 +00:00
Jonas Paulsson	024e319489	[SystemZ, LoopStrengthReduce] This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729	2017-07-21 11:59:37 +00:00
Martin Storsjo	2f24e93481	[AArch64] Extend CallingConv::X86_64_Win64 to AArch64 as well Rename the enum value from X86_64_Win64 to plain Win64. The symbol exposed in the textual IR is changed from 'x86_64_win64cc' to 'win64cc', but the numeric value is kept, keeping support for old bitcode. Differential Revision: https://reviews.llvm.org/D34474 llvm-svn: 308208	2017-07-17 20:05:19 +00:00
Geoff Berry	b1e8714af9	[AArch64][Falkor] Avoid HW prefetcher tag collisions (step 1) Summary: This patch is the first step in reducing HW prefetcher instruction tag collisions in inner loops for Falkor. It adds a pass that annotates IR loads with metadata to indicate that they are known to be strided loads, and adds a target lowering hook that translates this metadata to a target-specific MachineMemOperand flag. A follow on change will use this MachineMemOperand flag to re-write instructions to reduce tag collisions. Reviewers: mcrosier, t.p.northover Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34963 llvm-svn: 308059	2017-07-14 21:44:12 +00:00
Martin Storsjo	68266faa31	[AArch64] Implement support for windows style vararg functions Pass parameters properly in calls to such functions (pass all floats in integer registers), and handle va_start properly (allocate stack immediately below the arguments on the stack, to save the register arguments into a single continuous array). Differential Revision: https://reviews.llvm.org/D35006 llvm-svn: 307928	2017-07-13 17:03:12 +00:00
Matthew Simpson	06e6a6bdff	[AArch64] Add preliminary support for ARMv8.1 SUB/AND atomics This patch is a follow-up to r305893 and adds preliminary support for the fetch_sub and fetch_and operations. llvm-svn: 307913	2017-07-13 15:01:23 +00:00
Joel Jones	7466ccfc59	Doxygen formatting. NFCI llvm-svn: 307597	2017-07-10 22:11:50 +00:00
Dehao Chen	38f1bc7834	Fix the bug when handling shufflevector for aarch64. Summary: This Fixes https://bugs.llvm.org/show_bug.cgi?id=33600 Reviewers: mssimpso, davidxl, Carrot Reviewed By: mssimpso Subscribers: aemerson, rengolin, sanjoy, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D34641 llvm-svn: 306334	2017-06-26 21:33:51 +00:00
Christof Douma	c1c28051d2	[AARCH64][LSE] Preliminary support for ARMv8.1 LSE Atomics. Implemented support to AArch64 codegen for ARMv8.1 Large System Extensions atomic instructions. Where supported, these instructions can provide atomic operations with higher performance. Currently supported operations include: fetch_add, fetch_or, fetch_xor, fetch_smin, fetch_min/max (signed and unsigned), swap, and compare_exchange. This implementation implies sequential-consistency ordering, more relaxed ordering is under development. Subtarget->hasLSE is currently supported for Cavium ThunderX2T99. Patch by Ananth Jasty. Differential Revision: https://reviews.llvm.org/D33586 Change-Id: I82f6d3d64255622791ceb0715b7ab9f4dc4d4b2c llvm-svn: 305893	2017-06-21 10:58:31 +00:00
Nirav Dave	85e92223b4	[AArch64] Add indexed check to splitStores. NFC. Add explicit check for unhandled cases in preparation for delaying splitStores to post-legalization. llvm-svn: 305471	2017-06-15 14:47:44 +00:00
Chandler Carruth	6bda14b313	Sort the remaining #include lines in include/... and lib/.... I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787	2017-06-06 11:49:48 +00:00
Craig Topper	f6d4dc5b4a	[SelectionDAG] Set ISD::FPOWI to Expand by default Summary: Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie". This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default. Reviewers: spatel, RKSimon, efriedma Reviewed By: RKSimon Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits Differential Revision: https://reviews.llvm.org/D33530 llvm-svn: 304215	2017-05-30 15:27:55 +00:00
Nirav Dave	6ff50bf242	Fix signedness of constant. NFC. llvm-svn: 303980	2017-05-26 12:53:10 +00:00
Nirav Dave	bb20b5d5c3	[AArch64] Prevent nested ADDs from address calc in splitStoreSplat. NFC In preparation for late-stage store merging. llvm-svn: 303800	2017-05-24 19:55:49 +00:00
Akira Hatanaka	e8ae3346a3	[AArch64] Fix PRR33100. This commit fixes a bug introduced in r301019 where optimizeLogicalImm would replace a logical node's immediate operand that was CSE'd and was also an operand of another node. This commit fixes the bug by replacing the logical node instead of its immediate operand. rdar://problem/32295276 llvm-svn: 303607	2017-05-23 06:08:37 +00:00
Amara Emerson	c9916d7e97	Re-commit r302678, fixing PR33053. The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211	2017-05-16 21:29:22 +00:00
Hans Wennborg	bd6e9e77a7	Revert r302678 "[AArch64] Enable use of reduction intrinsics." This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115	2017-05-15 20:59:32 +00:00
Amara Emerson	816542ceb3	[AArch64] Enable use of reduction intrinsics. The new experimental reduction intrinsics can now be used, so I'm enabling this for AArch64. We will need this for SVE anyway, so it makes sense to do this for NEON reductions as well. The existing code to match shufflevector patterns are replaced with a direct lowering of the reductions to AArch64-specific nodes. Tests updated with the new, simpler, representation. Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 302678	2017-05-10 15:15:38 +00:00
Serge Guelton	e38003f839	Suppress all uses of LLVM_END_WITH_NULL. NFC. Use variadic templates instead of relying on <cstdarg> + sentinel. This enforces better type checking and makes code more readable. Differential Revision: https://reviews.llvm.org/D32541 llvm-svn: 302571	2017-05-09 19:31:13 +00:00
Serge Pavlov	d526b13e61	Add extra operand to CALLSEQ_START to keep frame part set up previously Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527	2017-05-09 13:35:13 +00:00
Simon Pilgrim	7a28a3ac78	[AARCH64][NEON] Add support for ISD::ABS lowering Update int_aarch64_neon_abs intrinsic to use the ISD::ABS opcode directly Differential Revision: https://reviews.llvm.org/D32940 llvm-svn: 302415	2017-05-08 10:25:18 +00:00
Amara Emerson	d28f0cd448	Generalize the specialized flag-carrying SDNodes by moving flags into SDNode. This removes BinaryWithFlagsSDNode, and flags are now all passed by value. Differential Revision: https://reviews.llvm.org/D32527 llvm-svn: 301803	2017-05-01 15:17:51 +00:00
Craig Topper	d0af7e8ab8	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620	2017-04-28 05:31:46 +00:00
Akira Hatanaka	22e839f4b2	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300932 and r300930, which was causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress. This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 301019	2017-04-21 18:53:12 +00:00
Joel Jones	a7c4a52188	[AArch64] Refactor instruction selection lowering for addresses. NFCI Factor out the common code used for generating addresses into common templated functions that call overloaded versions of a new function, getTargetNode. Tested with make check-llvm with targets AArch64. Differential Revision: https://reviews.llvm.org/D32169 llvm-svn: 301005	2017-04-21 17:31:03 +00:00
Akira Hatanaka	78ccba6a20	Revert r300932 and r300930. It seems that r300930 was creating an infinite loop in dag-combine when compling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c llvm-svn: 300940	2017-04-21 01:31:50 +00:00

... 8 9 10 11 12 ...

1445 Commits