Commit Graph

63852 Commits

Author SHA1 Message Date
Cullen Rhodes 7ed0120d84 [AArch64][AsmParser] NFC: Parser.Lex() -> Lex()
Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D107146
2021-08-02 09:48:41 +00:00
Carl Ritson a441de6d94 [AMDGPU][GlobalISel] Add missing default mapping for BVH intrinsics
Application of default mapping to BVH intrinsics was missing.
Copy parts of SelectionDAG test to GlobalISel test as these would
have indicated this error.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D107211
2021-08-02 12:43:38 +09:00
Hsiangkai Wang 8b33839f01 [RISCV] Rename vector inline constraint from 'v' to 'vr' and 'vm' in IR.
Differential Revision: https://reviews.llvm.org/D107139
2021-08-01 05:58:17 +08:00
Eli Friedman bdd55b2f18 Fix the default alignment of i1 vectors.
Currently, the default alignment is much larger than the actual size of
the vector in memory.  Fix this to use a sane default.

For SVE, temporarily remove lowering of load/store operations for
predicates with less than 16 elements. The layout the backend was
assuming for SVE predicates with less than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.

This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway.  If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).

Differential Revision: https://reviews.llvm.org/D88994
2021-07-31 14:09:59 -07:00
Craig Topper 593059b328 [RISCV] Rename RISCVISD::FCVT_W_RV64 to FCVT_W_RTZ_RV64. NFC
fcvt.w(u) supports multiple rounding modes, but the ISD node
doesn't encode that. So name it to match the rounding mode it uses.
2021-07-31 11:14:59 -07:00
David Green 15a1d7e839 [ARM] Switch order of creating VADDV and VMLAV.
It can be beneficial to try the larger VMLAV patterns before VADDV, in
case both may match the same code.
2021-07-31 16:28:52 +01:00
Matt Arsenault ebc17a0d68 GlobalISel: Scalarize unaligned vector stores
This has the same problems and limitations as the load path.
2021-07-31 10:37:15 -04:00
Alexandros Lamprineas 7d940432c4 [AArch64] Legalize MVT::i64x8 in DAG isel lowering
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.

Differential Revision: https://reviews.llvm.org/D94097
2021-07-31 09:51:28 +01:00
David Green 69cdadddec [ARM] Distribute reductions based on ascending load offset
This distributes reductions based on the relative offset of loads, if
one is found from their operands. Given chains of reductions this will
then sort them in ascending load order, which in turn can help simple
prefetches latch on to increasing strides more easily.

Differential Revision: https://reviews.llvm.org/D106569
2021-07-30 19:50:07 +01:00
Matt Arsenault faccf427df AMDGPU/GlobalISel: Remove special case lowering for non-pow-2 stores
We end up with extra copies from buildAnyExtOrTrunc if these are
lowered after the register types are legalized.
2021-07-30 12:37:29 -04:00
David Green 532d05b714 [ARM] Attempt to distribute reductions
This adds a combine for adds of reductions, distributing them so that
they occur sequentially to enable better use of accumulating VADDVA
instructions. It combines:
  add(X, add(vecreduce(Y), vecreduce(Z))) ->
    add(add(X, vecreduce(Y)), vecreduce(Z))
and
  add(add(A, reduce(B)), add(C, reduce(D))) ->
    add(add(add(A, C), reduce(B)), reduce(D))

These together distribute the add's so that more reductions can be
selected to VADDVA.

Differential Revision: https://reviews.llvm.org/D106532
2021-07-30 14:48:31 +01:00
David Green 4b56306762 [ARM] Turn vecreduce_add(add(x, y)) into vecreduce(x) + vecreduce(y)
Under MVE we can use VADDV/VADDVA's to perform integer add reductions,
so it can be beneficial to use more reductions than summing subvectors
and reducing once. Especially for VMLAV/VMLAVA the mul can be
incorporated into the reduction, producing fewer instructions.

Some of the test cases currently get larger due to extra integer adds,
but will be improved in a followup patch.

Differential Revision: https://reviews.llvm.org/D106531
2021-07-30 10:10:41 +01:00
Cullen Rhodes 3a349d2269 [AArch64][SME] Introduce feature for streaming mode
The Scalable Matrix Extension (SME) introduces a new execution mode
called Streaming SVE mode. In streaming mode a substantial subset of the
SVE and SVE2 instruction set is available, along with new outer product,
load, store, extract and insert instructions that operate on the new
architectural register state for the matrix.

To support streaming mode this patch introduces a new subtarget feature
+streaming-sve. If enabled, the subset of SVE(2) instructions is
available. The existing behaviour for SVE(2) remains unchanged; the
subset of instructions that are legal in streaming mode is enabled if
either +sve[2] or +streaming-sve is specified. Instructions that are
illegal in streaming mode remain predicated on +sve[2].

The SME target feature has been updated to imply +streaming-sve rather
than +sve.

The following changes are made to the SVE(2) tests:
  * For instructions that are legal in streaming mode:
    - added RUN line to verify +streaming-sve enables the instruction.
    - updated diagnostic to 'instruction requires: streaming-sve or sve'.
  * For instructions that are illegal in streaming-mode:
    - added RUN line to verify +streaming-sve does not enable the
      instruction.

SVE(2) instructions that are legal in streaming mode have:

  if !HaveSVE[2]() && !HaveSME() then UNDEFINED;

at the top of the pseudocode in the XML.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D106272
2021-07-30 07:30:45 +00:00
Tarindu Jayatilaka 7a797b2902 Take OptimizationLevel class out of Pass Builder
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D107025
2021-07-29 21:57:23 -07:00
Stefan Pintilie 754520a2bf [PowerPC] Fix issue where hint was providing the incorrect register class.
Register hints when copying to a UACC register do not always produce VSRp
registers. This patch makes sure that we do not produce hints in cases
where the subregister of the UACC is not a VSRp.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D107101
2021-07-29 21:10:45 -05:00
Mark Schimmel e622c99f30 [ARC] Add norm/normh instructions with disassembly tests
Add disassembler support for the NORM and NORMH instructions. These instructions
only exist when the ARC processor is configured with the "norm" extension.

Differential Revision: https://reviews.llvm.org/D107118
2021-07-29 17:54:52 -07:00
Ben Shi bb6fddb63c Optimize mul in the zba extension with SH*ADD
This patch does the following optimization of mul with a constant.

(mul x, 11) -> (SH1ADD (SH2ADD x, x), x)
(mul x, 19) -> (SH1ADD (SH3ADD x, x), x)
(mul x, 13) -> (SH2ADD (SH1ADD x, x), x)
(mul x, 21) -> (SH2ADD (SH2ADD x, x), x)
(mul x, 37) -> (SH2ADD (SH3ADD x, x), x)
(mul x, 25) -> (SH3ADD (SH1ADD x, x), x)
(mul x, 41) -> (SH3ADD (SH2ADD x, x), x)
(mul x, 73) -> (SH3ADD (SH3ADD x, x), x)
(mul x, 27) -> (SH1ADD (SH3ADD x, x), (SH3ADD x, x))
(mul x, 45) -> (SH2ADD (SH3ADD x, x), (SH3ADD x, x))
(mul x, 81) -> (SH3ADD (SH3ADD x, x), (SH3ADD x, x))
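
For reference, SHnADD rd, rs1, rs2 computes (rs1 << n) + rs2, so the
decomposition for 11 in the list above works out as:

  SH2ADD x, x             = (x << 2) + x  = 5x
  SH1ADD (SH2ADD x, x), x = (5x << 1) + x = 11x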

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107065
2021-07-30 08:36:28 +08:00
Thomas Johnson cc238a6e03 [ARC] Add additional mov immediate instruction formats with a fix for u6 decoding
Differential Revision: https://reviews.llvm.org/D107088
2021-07-29 16:41:55 -07:00
Adrian Prantl c5d84d2eb3 GlobalISel/AArch64: don't optimize away redundant branches at -O0
This patch prevents GlobalISel from optimizing out redundant branch
instructions when compiling without optimizations.

The motivating example is code like the following common pattern in
Swift, where users expect to be able to set a breakpoint on the early
exit:

public func f(b: Bool) {
  guard b else {
    return // I would like to set a breakpoint here.
  }
  ...
}

The patch modifies two places in GlobalISel: The first one is in
IRTranslator.cpp where the removal of redundant branches is made
conditional on the optimization level. The second one is in
AArch64InstructionSelector.cpp where an -O0 *only* optimization is
being removed.

Disabling these optimizations increases code size at -O0 by
~8%. However, doing so improves debuggability, and debug builds are
the primary reason why developers compile without optimizations. We
thus concluded that this is the right trade-off.

rdar://79515454

This tentatively reapplies the patch without modifications; the LLDB
test that previously blocked it from landing has since been modified
to hopefully no longer be sensitive to this change.

Differential Revision: https://reviews.llvm.org/D105238
2021-07-29 16:04:22 -07:00
David Green d4a2daa919 [ARM] Define a couple more ssub indexes. NFC
Same as 91bd3ad128, this doesn't really
change anything but gives the registers better names than the ones
tablegen would define, and fills in the missing gaps.
2021-07-29 23:00:35 +01:00
Bradley Smith 191831e380 [AArch64][SVE] Fix incorrect mask type when lowering fixed type SVE gather/scatter
An incorrect mask type when lowering an SVE gather/scatter was causing
a codegen fault which manifested as the incorrect predicate size being
used for an SVE gather/scatter (e.g. p0.b rather than p0.d).

Fixes PR51182.

Differential Revision: https://reviews.llvm.org/D106943
2021-07-29 11:22:17 +00:00
Cullen Rhodes 08d92dbbff [AArch64][AsmParser] NFC: Parser.getTok() -> getTok()
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106949
2021-07-29 10:18:54 +00:00
Amara Emerson da61ab8475 [AArch64][GlobalISel] More widenToNextPow2 changes, this time for arithmetic/bitwise ops. 2021-07-29 03:02:29 -07:00
Mirko Brkusanin 971f4173f8 [AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary
While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.

This fixes a Vulkan CTS test that would hang otherwise.

Differential Revision: https://reviews.llvm.org/D105709
2021-07-29 11:20:49 +02:00
Fraser Cormack 02dd4b59bc [RISCV] Optimize floating-point "dominant value" BUILD_VECTORs
This patch aims to improve the performance of BUILD_VECTORs which are
identified as containing a dominant element. Given that most
floating-point constants themselves require a load from the constant
pool, it was possible for the optimization to actually increase the
number of individual loads on small vectors. The exception is the zero
constant -- +0.0 -- which can be materialized efficiently.

While this optimization could do with a proper cost model to weigh the
benefits of a single vector load vs. the manipulation of individual
elements -- even for integer vectors which often require several
instructions to materialize -- without a concrete RVV implementation to
work with, any heuristic is likely to be both more obtuse and inaccurate.

Until then, this patch fixes at least one known obvious deficiency.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106963
2021-07-29 09:22:34 +01:00
Ben Shi 264b8e2a20 [RISCV] Optimize mul in the zba extension with SH*ADD
This patch makes the following optimization when the
immediate multiplier is not a simm12.

(mul x, (power_of_2 + 2)) => (SH1ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 4)) => (SH2ADD x, (SLLI x, bits))
(mul x, (power_of_2 + 8)) => (SH3ADD x, (SLLI x, bits))
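
For illustration, with an immediate of 4098 (too large for a simm12):

(mul x, 4098) = (mul x, 4096 + 2) => (SH1ADD x, (SLLI x, 12))
              ; (x << 1) + (x << 12) = 2x + 4096x = 4098x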

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106648
2021-07-29 09:46:41 +08:00
Jessica Paquette 5a333dc5da [AArch64][GlobalISel] Improve legalization for odd-type G_LOAD
Swap the order of widening so that we widen to the next power-of-2 first when
legalizing G_LOAD.

Also, provide a minimum type for the power of 2 to disallow s2 + s1. Clamping
ought to disallow s2 and s1, but I think it's better to be explicit about the
expected minimum size.

We probably need a similar change for G_STORE, but it seems to be a bit more
finicky. So, let's just handle G_LOAD for now.

Differential Revision: https://reviews.llvm.org/D107013
2021-07-28 17:19:14 -07:00
Jessica Paquette c0a41c3d3b [AArch64][GlobalISel] Improve legalization for odd-sized G_ICMP/G_CONSTANT
We were handling types like s88 like this:

1) clamp to the range
2) widen to the next power of 2

This isn't desirable because it causes an odd breakdown for types like s88.
If we widen to the next power of 2 (s128) first, then we get a clean breakdown
when we clamp back to s64.
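
Schematically, the difference between the two orders (illustrative):

  old: s88 --clamp--> s64 (+ awkward s24 remainder)
  new: s88 --widen--> s128 --clamp--> two clean s64 pieces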

Differential Revision: https://reviews.llvm.org/D106998
2021-07-28 15:31:33 -07:00
Patrick Holland dbed061bf1 [MCA] Moving the target specific CustomBehaviour impl. from /tools/llvm-mca/ to /lib/Target/.
Differential Revision: https://reviews.llvm.org/D106775
2021-07-28 11:23:18 -07:00
Fangrui Song 6da3d8b19c [llvm] Replace LLVM_ATTRIBUTE_NORETURN with C++11 [[noreturn]]
[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.

Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.
2021-07-28 09:31:14 -07:00
Craig Topper 3106f85945 [RISCV] Fix grammar in a comment. NFC 2021-07-28 09:09:26 -07:00
Craig Topper 54588bcc05 [RISCV] Restrict performANY_EXTENDCombine to prevent an infinite loop.
The sign_extend we insert here can get turned into a zero_extend if
the sign bit is known zero. This can enable a setcc combine that
shrinks compares with zero_extend. This reduces the use count of
the zero_extend allowing other combines to turn it back into an
any_extend.

This restricts the combine to only cases where the result is used
by a CopyToReg. This works for my original motivating case. I
hope the CopyToReg use will prevent any converted extends from
turning back into an any_extend.
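
A minimal sketch of the kind of guard described above (names and exact
placement are assumptions, not the literal patch):

  // Hedged sketch: bail out unless every user of the node is a CopyToReg.
  if (!llvm::all_of(N->uses(), [](SDNode *User) {
        return User->getOpcode() == ISD::CopyToReg;
      }))
    return SDValue();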

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106754
2021-07-28 09:05:45 -07:00
Sanjay Patel 4c41caa287 [x86] improve CMOV codegen by pushing add into operands, part 3
In this episode, we are trying to avoid an x86 micro-arch quirk where complex
(3 operand) LEA potentially costs significantly more than simple LEA. So we
simultaneously push and pull the math around the CMOV to balance the operations.

I looked at the debug spew during instruction selection and decided against
trying a later DAGToDAG transform -- it seems very difficult to match if the
trailing memops are already selected and managing the creation of extra
instructions at that level is always tricky.

Differential Revision: https://reviews.llvm.org/D106918
2021-07-28 09:10:33 -04:00
Simon Pilgrim 124d586382 [X86][AVX] Move VPERM2F128 defs above VINSERTF128 defs. NFC.
This will be necessary for a future patch to lower VINSERTF128 custom folds to VPERM2F128.
2021-07-28 14:02:17 +01:00
David Green 41cedb1c9a [LV][ARM] Tighten up MLA reduction costing
This makes a couple of changes to the costing of MLA reduction patterns,
to more accurately cost various patterns that can come up from
vectorization.

 - The Arm implementation of getExtendedAddReductionCost is altered to
   only provide costs for legal or smaller types. Larger than legal types
   need to be split, which currently does not work very well, especially
   for predicated reductions where the predicate may be legal but needs to
   be split. Currently we limit it to legal or smaller input types.
 - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext)))
   is a pattern that can come up, and can be treated the same as
   reduce(mul(ext, ext)) providing the extension types match.
 - And it has been adjusted to not count the ext in reduce(mul(ext, ext))
   as part of a reduce(mul) pattern.

Together these changes help to more accurately cost the mla reductions
in cases such as where the extend types don't match or the extend
opcodes are different, picking better vector factors that don't result
in expanded reductions.

Differential Revision: https://reviews.llvm.org/D106166
2021-07-28 12:50:58 +01:00
RamNalamothu 1a8c57179a [AMDGPU] We would need FP if there is a call and caller save VGPR spills
Since https://reviews.llvm.org/D98319, determineCalleeSavesSGPR() needs
to consider caller save VGPR spills as well while anticipating if we
require FP.

Fixes: SWDEV-295978

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106758
2021-07-28 11:12:55 +05:30
Xiang1 Zhang 3223d41017 [X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D106780
2021-07-28 08:16:59 +08:00
Xiang1 Zhang 2ca3937131 Revert "[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT"
This reverts commit 6ff73efea9.
2021-07-28 08:12:29 +08:00
Xiang1 Zhang 6ff73efea9 [X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT 2021-07-28 08:08:30 +08:00
Krzysztof Parzyszek 64d5b6e373 [Hexagon] Fix resetting dead registers in DBG_VALUE_LISTs
This fixes https://llvm.org/PR51229.
2021-07-27 18:36:28 -05:00
Nemanja Ivanovic 778932c673 [PowerPC] Turn deprecated altivec prefetch instrs to nops on AIX
The dst/dstt/dstst/dststt instructions are nop's on all PowerPC
cores that AIX supports. The AIX assembler also does not accept
these mnemonics. Turn them into nop's on AIX (similar to dstall).
2021-07-27 15:50:02 -05:00
Sanjay Patel 156ba620b3 [x86] update stale code comment; NFC
The transform was generalized with:
1ce05ad619
2021-07-27 16:45:52 -04:00
Matt Arsenault d7d2e4545e AMDGPU/GlobalISel: Fix selecting G_SEXTLOAD/G_ZEXTLOAD pre-gfx9
The patterns for the m0 glue patterns were failing to import.
2021-07-27 15:56:42 -04:00
Amara Emerson a11d9a1f48 [AArch64][GlobalISel] Fix constraining LDXPX intrinsic selection.
Causes a fallback because of lack of regclasses on vregs, unless it's a
build without asserts, where we end up crashing later in codegen.
2021-07-27 12:13:56 -07:00
Craig Topper 3852b8c70f [RISCV] Select vector shl by 1 to a vector add.
A vector add may be faster than a vector shift.
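
Schematically:

  (shl x, (splat 1)) -> (add x, x)
  ; e.g. vsll.vi v8, v8, 1 becomes vadd.vv v8, v8, v8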

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106689
2021-07-27 10:57:28 -07:00
Matt Arsenault b32d3d9e81 AMDGPU: Treat IMPLICIT_DEF like a constant lanemask source
This is partially a workaround. SILowerI1Copies does not understand
unstructured loops. This would result in inserting instructions to
merge a mask register in the same block where it was defined in an
unstructured loop.
2021-07-27 11:44:38 -04:00
Thomas Lively 33786576fd [WebAssembly] Codegen for extmul SIMD instructions
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.

Differential Revision: https://reviews.llvm.org/D106724
2021-07-27 08:41:30 -07:00
Anirudh Prasad a8cfa4b9bd [SystemZ][z/OS] Initial code to generate assembly files on z/OS
- This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target.
- Only the .text and the .bss sections are added for now.
- The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections.
- This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target.
- Further improvements and additions will be made in future patches.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D106380
2021-07-27 11:29:15 -04:00
Tres Popp d225de60c9 Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI."
This reverts commit 1cfecf4fc4.

This commit broke LLVM code generated through XLA by removing a
conditional on Ld->getExtensionType() == ISD::NON_EXTLOAD.

This is not a perfect revert. The new function is left as other uses of
it exist now.
2021-07-27 16:55:50 +02:00
Tres Popp 70fa9479b2 Revert "Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI.""
This reverts commit d7bbb1230a.

There were follow up uses of a deleted method and I didn't run the
tests. Undo the revert, so I can do it properly.
2021-07-27 16:48:31 +02:00
Tres Popp d7bbb1230a Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI."
This reverts commit 1cfecf4fc4.

This commit broke LLVM code generated through XLA by removing a
conditional on Ld->getExtensionType() == ISD::NON_EXTLOAD.
2021-07-27 16:22:25 +02:00
Fraser Cormack 172487fe4c [RISCV] Add support for vector saturating add/sub operations
This patch adds support for lowering the saturating vector add/sub
intrinsics to RVV instructions, for both fixed-length and
scalable-vector forms alike.

Note that some of the DAG combines are still not triggering for the
scalable-vector tests. These require a bit more work in the DAGCombiner
itself.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106651
2021-07-27 10:04:14 +01:00
Cullen Rhodes 2e27c4e1f1 [AArch64][SME] Add zero instruction
This patch adds the zero instruction for zeroing a list of 64-bit
element ZA tiles. The instruction takes a list of up to eight tiles
ZA0.D-ZA7.D, which must be in order, e.g.

  zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6.d,za7.d}
  zero {za1.d,za3.d,za5.d,za7.d}

The assembler also accepts 32-bit, 16-bit and 8-bit element tiles which
are mapped to corresponding 64-bit element tiles in accordance with the
architecturally defined mapping between different element size tiles,
e.g.

  * Zeroing ZA0.B, or the entire array name ZA, is equivalent to zeroing
    all eight 64-bit element tiles ZA0.D to ZA7.D.
  * Zeroing ZA0.S is equivalent to zeroing ZA0.D and ZA4.D.

The preferred disassembly of this instruction uses the shortest list of
tile names that represent the encoded immediate mask, e.g.

  * An immediate which encodes 64-bit element tiles ZA0.D, ZA1.D, ZA4.D and
    ZA5.D is disassembled as {ZA0.S, ZA1.S}.
  * An immediate which encodes 64-bit element tiles ZA0.D, ZA2.D, ZA4.D and
    ZA6.D is disassembled as {ZA0.H}.
  * An all-ones immediate is disassembled as {ZA}.
  * An all-zeros immediate is disassembled as an empty list {}.

This patch adds the MatrixTileList asm operand and related parsing to support
this.

Depends on D105570.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105575
2021-07-27 08:35:45 +00:00
David Green 54c91c0c74 [ARM] Implement isLoad/StoreFromStackSlot for MVE stack accesses
This implements isLoadFromStackSlot and isStoreToStackSlot for the MVE
MVE_VSTRWU32 and MVE_VLDRWU32 instructions. They behave the same as many
other loads/stores, expecting a FI in Op1 and zero offset in Op2. At the
same time this alters VLDR_P0_off and VSTR_P0_off to use the same code
too, as they too should be returning VPR in Op0, taking a FI in Op1 and
a zero offset in Op2.

Differential Revision: https://reviews.llvm.org/D106797
2021-07-27 09:11:58 +01:00
Craig Topper 2ea9db0c49 [AArch64] Fix -Wparentheses warning with gcc 5.4. NFC 2021-07-26 21:08:56 -07:00
Carl Ritson fbaa35e169 [AMDGPU] Add SelectionDAG support for insert_subvector on v4f64
Enable custom insert_subvector for larger vector types.
This is necessary now that SelectionDAG can attempt v3f64 insert
to v4f64, etc.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D105385
2021-07-27 10:11:34 +09:00
Nemanja Ivanovic 9654cfd5bb [PowerPC] Fix materialization of SP float values on Power10
All floating point values in registers are in double precision
representation. In order to materialize the correct single precision
value, we need to convert the APFloat that represents the value
to double precision first.

Reviewed By: amyk, NeHuang

Differential Revision: https://reviews.llvm.org/D106812
2021-07-26 19:43:10 -05:00
Jon Roelofs f2e8e46d78 Revert "[AArch64][GlobalISel] Legalize ctpop s128"
This reverts commit 97e95fea53.

It broke test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll. Not sure why I didn't see that.
2021-07-26 17:06:43 -07:00
Jon Roelofs 97e95fea53 [AArch64][GlobalISel] Legalize ctpop s128
Differential revision: https://reviews.llvm.org/D106494
2021-07-26 16:33:50 -07:00
Masoud Ataei 45951ad323 [PowerPC] Add pwr7 and pwr10 support to IBM MASSV pass on AIX
Previously MASSV only supported P8 and P9 on AIX and Linux. This patch
adds MASSV support for P7 and P10, on AIX only.

Differential: https://reviews.llvm.org/D106678
2021-07-26 23:21:38 +00:00
Amara Emerson 172051a1f4 [AArch64][GlobalISel] Add identity combines to post-legal combiner.
We see some shifts of zero emitted during legalization.

Differential Revision: https://reviews.llvm.org/D106816
2021-07-26 15:17:11 -07:00
Amara Emerson c658b472f3 [GlobalISel] Add a constant folding combine.
Use it in the AArch64 post-legal combiner. These don't always get folded because when
the instructions are created the constants are obscured by artifacts.

Differential Revision: https://reviews.llvm.org/D106776
2021-07-26 14:53:33 -07:00
Heejin Ahn c285a11efd [WebAssembly] Make Emscripten EH work with Emscripten SjLj
When Emscripten EH mixes with Emscripten SjLj, we are not currently
handling some cases correctly. There are three cases:
1. The current function calls `setjmp` and there is an `invoke` to a
   function that can either throw or longjmp. In this case, we have to
   check both for exception and longjmp. We are currently handling this
   case correctly:
   0c0eb76782/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L1058-L1090)
   When inserting routines for functions that can longjmp, which we do
   only for setjmp-calling functions, we check if the function was
   previously an `invoke` and handle it correctly.

2. The current function does NOT call `setjmp` and there is an `invoke`
   to a function that can either throw or longjmp. Because there is no
   `setjmp` call, we haven't been doing any check for functions that can
   longjmp. But in that case, for `invoke`, we only check for an
   exception and if it is not an exception we reset `__THREW__` to 0,
   which can silently swallow the longjmp:
   0c0eb76782/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L70-L80)
   This CL fixes this.

3. The current function calls `setjmp` and there is no `invoke`. Because
   it is not an `invoke`, we haven't been doing any check for functions
   that can throw, and only insert longjmp-checking routines for
   functions that can longjmp. But in that case, if a longjmpable
   function throws, we only check for a longjmp so if it is not a
   longjmp we reset `__THREW__` to 0, which can silently swallow the
   exception:
   0c0eb76782/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L156-L169)
   This CL fixes this.

To do that, this moves around some code, so we register necessary
functions for both EH and SjLj and precompute some data (the set of
functions that contains `setjmp`) before doing actual EH or SjLj
transformation.
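
As a rough sketch, the post-call check in the fixed cases has to consider
both outcomes rather than resetting `__THREW__` unconditionally (variable
names follow the pass; the control flow here is schematic, not the literal
generated code):

  %t = __THREW__; __THREW__ = 0;
  if (%t == 1)                            ; the callee threw an exception
    goto %handle_exception
  else if (%t != 0 && __threwValue != 0)  ; the callee longjmp'd
    goto %handle_longjmp
  ; otherwise: normal return, fall through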

This CL makes 2nd and 3rd tests in
https://github.com/emscripten-core/emscripten/pull/14732 work.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D106525
2021-07-26 13:48:31 -07:00
Lei Huang 64a15817a0 [PowerPC]Add addex instruction definition and MC tests
Add td definitions and asm/disasm tests for the addex instruction introduced in
ISA 3.0.

Reviewed By: nemanjai, amyk, NeHuang

Differential Revision: https://reviews.llvm.org/D106666
2021-07-26 14:55:38 -05:00
Lei Huang 2d788959ed [PowerPC] Add implicit-def RM to instructions mtfsb[01]
This is a followup patch for D105930 to add implicit-def of RM for
mtfsb[01] instructions as per review comments.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D106603
2021-07-26 14:07:08 -05:00
Michael Liao b0402a35fc [amdgpu] Add 64-bit PC support when expanding unconditional branches.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106445
2021-07-26 14:50:30 -04:00
Amara Emerson 6af8d36054 [AArch64][GlobalISel] Post-legalize combine s64 = G_MERGE s32, 0 -> G_ZEXT.
These are generated as a byproduct of legalization.

Differential Revision: https://reviews.llvm.org/D106768
2021-07-26 10:58:04 -07:00
Amara Emerson 0d41d21929 [AArch64][GlobalISel] Enable some select combines after legalization.
The legalizer generates selects for some operations, which can have constant
condition values, resulting in lots of dead code if it's not folded away.

Differential Revision: https://reviews.llvm.org/D106762
2021-07-26 10:40:32 -07:00
Amara Emerson dec34104bf [GlobalISel] Add combine for merge(unmerge) and use AArch64 postlegal-combiner.
Differential Revision: https://reviews.llvm.org/D106761
2021-07-26 10:37:31 -07:00
Heejin Ahn 6b9aba43a2 [WebAssembly] Improve pseudocode in LowerEmscriptenEHSjLj
Both `__THREW__` and `__threwValue` are global variables, and we have
been distinguishing the global variable `__THREW__` and the loaded value
`%__THREW__.val` in comments but not doing it for `__threwValue`. Made
the pseudocode comments consistent for both variables.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D106524
2021-07-26 10:13:28 -07:00
Paul Walker 3b77e2737c [SVE] Use reg+reg addressing mode for immediate offsets.
For the reg+imm SVE addressing mode, imm is implicitly scaled by VL,
making it impractical for truly immediate offsets.  However, if
the offset can be unscaled based on the storage element type we
can use the reg+reg SVE addressing mode and thus either reduce the
number of generated add instructions or replace them with a mov
instruction that can be hoisted from the hot code path.
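
For illustration (schematic, assumed instruction selection):

  ; reg+imm form: #imm is implicitly scaled by VL
  st1w { z0.s }, p0, [x0, #1, mul vl]
  ; reg+reg form: a fixed element offset lives in a register, and the
  ; mov can be hoisted out of the hot code path
  mov  x8, #8
  st1w { z0.s }, p0, [x0, x8, lsl #2]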

Differential Revision: https://reviews.llvm.org/D106744
2021-07-26 16:24:16 +01:00
Bradley Smith 81eafb8a37 [AArch64][SVE] Break false dependencies for inactive lanes of unary operations
Differential Revision: https://reviews.llvm.org/D105889
2021-07-26 15:01:21 +00:00
Ulrich Weigand 8cd8120a7b [SystemZ] Add support for new cpu architecture - arch14
This patch adds support for the next-generation arch14
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Detection of arch14 as host processor.
- Assembler/disassembler support for new instructions.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining  __VEC__ == 10304.

Note: No currently available Z system supports the arch14
architecture.  Once new systems become available, the
official system name will be added as supported -march name.
2021-07-26 16:57:28 +02:00
Jay Foad 59f6865231 [AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset
Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, if it was known to be
constant. If the combined offset was not known to be constant, or if
there was an index, it would set the offset in the MMO to 0. This is
unsafe because it makes it look like the access does not alias with
another access with a fixed non-zero offset.

Fix these cases by setting the pointer in the MMO to null, to reflect
the fact that we do not have any known IR value pointer + constant
offset for the access.

D106284 did this for SelectionDAG. This is the corresponding fix for
GlobalISel.

Differential Revision: https://reviews.llvm.org/D106451
2021-07-26 14:27:30 +01:00
Jay Foad 9ac10658ae [AMDGPU] Fix MMO for raw/struct buffer access with non-constant offset
Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, if it was known to be
constant. If the combined offset was not known to be constant, or if
there was an index, it would set the offset in the MMO to 0. This is
unsafe because it makes it look like the access does not alias with
another access with a fixed non-zero offset.

Fix these cases by setting the pointer in the MMO to null, to reflect
the fact that we do not have any known IR value pointer + constant
offset for the access.

Differential Revision: https://reviews.llvm.org/D106284
2021-07-26 14:27:30 +01:00
David Green 010f8e3057 [ARM] Ensure correct regclass in distributing postinc
The register class required for some MVE loads/stores is more
constrained than the register we use when creating postinc. Make sure we
constrain the register class to keep the code correct.
2021-07-26 14:26:38 +01:00
Tim Northover a487a49acc AArch64: support i128 (& larger) returns in GlobalISel 2021-07-26 14:16:35 +01:00
Caroline Concatto bf28111ebd [AArch64][SVE] Remove vector_splice from AddedComplexity pattern
The pattern for vector_splice with an index equal to or greater than
zero was misplaced in the AddedComplexity = 1 pattern in the AArch64
tablegen file. This patch fixes it by removing the vector_splice pattern
from inside AddedComplexity = 1.
2021-07-26 13:35:51 +01:00
Caroline Concatto 0bfc26e3a4 [SVE][AArch64] Improve code generation for vector_splice for Imm > 0
This patch implements vector_splice in tablegen for all cases when the
immediate is positive and lower than the known minimum number of
elements of a scalable vector.
Vector_splice can be implemented using SVE instruction EXT.
For instance :
    @llvm.experimental.vector.splice(Vector_1, Vector_2, Imm)
    @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>
        EXT  Vector_1, Vector_2, Imm              // Vector_1 = B, C, D + Vector_2 = E

Depends on D105633

Differential Revision: https://reviews.llvm.org/D106273
2021-07-26 11:45:46 +01:00
Caroline Concatto 73e4e9cd00 [AArch64][SVE] Improve code generation for vector_splice for Imm == -1
This patch implements vector_splice in tablegen for:
  a) when the immediate is equal to -1 (Imm == -1) and uses:
       INSR  +  LASTB
For instance:
@llvm.experimental.vector.splice(Vector_1, Vector_2, -1)
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -1) ==> <D, E, F, G>
    LASTB  RegLast, Vector_1                 // RegLast = D
    INSR   Res, (Vector_2 >> 1), RegLast     // Res = D + E, F, G

Differential Revision: https://reviews.llvm.org/D105633
2021-07-26 11:25:01 +01:00
Simon Pilgrim c8472db0a8 [X86][AVX] Prefer vinsertf128 to vperm2f128 on AVX1 targets
Splatting the lower xmm with vinsertf128 is at least as quick as vperm2f128, and a lot faster on some AMD targets.

First step towards PR50053
2021-07-26 11:11:56 +01:00
Cullen Rhodes e6ff9179ce [AArch64][AsmParser] NFC: Parser.getTok().getLoc() -> getLoc()
Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D106635
2021-07-26 09:36:34 +00:00
David Sherwood 0aff1798b5 [Analysis] Add simple cost model for strict (in-order) reductions
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.
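
For a concrete (illustrative) <4 x float> example of the two strategies:

  tree-wise: r = (a + c) + (b + d)             ; log2(4) = 2 levels of fadd
  ordered:   r = (((start + a) + b) + c) + d   ; 4 sequential scalar fadds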

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
2021-07-26 10:26:06 +01:00
Simon Pilgrim 1cfecf4fc4 [X86][AVX] Add getBROADCAST_LOAD helper function. NFCI.
Begin replacing individual getMemIntrinsicNode calls and setup (for X86ISD::VBROADCAST_LOAD + X86ISD::SUBV_BROADCAST_LOAD opcodes) with this getBROADCAST_LOAD helper.
2021-07-25 20:37:58 +01:00
Kyungwoo Lee 6530ea4095 [AArch64] Fix Local Deallocation for Homogeneous Prolog/Epilog
The stack adjustment for local deallocation was incorrectly ported.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D106760
2021-07-25 10:51:11 -07:00
Simon Pilgrim b95f66ad78 [X86][SSE] LowerRotate - perform modulo on the amount splat source directly.
If the rotation amount is a known splat, perform the modulo on the splat source, and then perform the splat. That way the amount-extension performed later by LowerScalarVariableShift can fold the splats away without any multiple-use issues.

Fixes one of the concerns raised on D104156
2021-07-25 17:30:32 +01:00
Sanjay Patel 1ce05ad619 [x86] improve CMOV codegen by pushing add into operands, part 2
This is a minimum extension of D106607 to allow folding for
2 non-zero constants that can be materialized as immediates.

In the reduced test examples, we save 1 instruction by rolling
the constants into LEA/ADD. In the motivating test from the bullet
benchmark, we absorb both of the constant moves into add ops via
LEA magic, so we reduce by 2 instructions.

Differential Revision: https://reviews.llvm.org/D106684
2021-07-25 10:05:41 -04:00
Simon Pilgrim 15b883f457 [X86][AVX] Adjust AllowBWIVPERMV3 tolerance to account for VariableCrossLaneShuffleDepth
As noticed on D105390 - we were hardwiring the depth limit for combining to VPERMI2W/VPERMI2B instructions. Not only had we made the limit too low, we hadn't accounted for slow/fast shuffles via the VariableCrossLaneShuffleDepth control.
2021-07-25 14:05:11 +01:00
Amara Emerson acbc0c5f0e [AArch64][GlobalISel] Widen non-pow-2 types for shifts before clamping.
For types like s96, we don't want to clamp to s64, we want to first widen to
s128 and then narrow it. Otherwise we end up with impossible to legalize types.
2021-07-24 15:50:43 -07:00
Craig Topper c63dbd8501 [RISCV] Custom lower (i32 (fptoui/fptosi X)).
I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32)
isel pattern can cause an fcvt.wu and fcvt.lu to be emitted if
the assertzexti32 has an additional user. If we add a one use check
it would just cause a fcvt.lu followed by a sext.w when we only need
a fcvt.wu to satisfy both users.

To mitigate this I've added custom isel and new ISD opcodes for
fcvt.wu. This allows us to know it started life as a conversion
to i32 without needing to match multiple nodes. ComputeNumSignBits
has been taught that these new nodes produce 33 sign bits. To
prevent regressions when we need to zero extend the result of an
(i32 (fptoui X)), I've added a DAG combine to convert it to an
(i64 (fptoui X)) before type legalization. In most cases this would
happen in InstCombine, but a zero_extend can be created for function
returns or arguments.
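
Schematically, that combine is:

  (i64 (zero_extend (i32 (fptoui X)))) -> (i64 (fptoui X))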

To keep everything consistent I've added new nodes for fptosi as well.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106346
2021-07-24 10:50:43 -07:00
Ayke van Laethem 4d7f5c0a85
[AVR] Only support sp, r0 and r1 in llvm.read_register
Most other registers are allocatable and therefore cannot be used.

This issue was flagged by the machine verifier, because reading other
registers is considered reading from an undefined register.

Differential Revision: https://reviews.llvm.org/D96969
2021-07-24 14:03:27 +02:00
Ayke van Laethem 41f905b211
[AVR] Fix rotate instructions
This patch fixes some issues with the RORB pseudo instruction.

  - A minor issue in which the instructions were said to use the SREG,
    which is not true.
  - An issue with the BLD instruction, which did not have an output operand.
  - A major issue in which invalid instructions were generated. The fix
    also reduces RORB from 4 to 3 instructions, so it's also a small
    optimization.

These issues were flagged by the machine verifier.

Differential Revision: https://reviews.llvm.org/D96957
2021-07-24 14:03:26 +02:00
Ayke van Laethem 6aa9e746eb
[AVR] Expand large shifts early in IR
This patch makes sure shift instructions such as this one:

    %result = shl i32 %n, %amount

are expanded into a loop just before the IR to SelectionDAG conversion,
so that calls to non-existent library functions such as __ashlsi3 are
avoided. The generated code is currently pretty bad but there's a lot of
room for improvement: the shift itself can be done in just four
instructions.
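
Schematically, the expansion is a one-bit-per-iteration loop (C-style
sketch; the actual emitted IR differs):

  // while any shift amount remains, shift by a single bit
  while (amount != 0) {
    n <<= 1;
    amount -= 1;
  }
  result = n;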

Differential Revision: https://reviews.llvm.org/D96677
2021-07-24 14:03:26 +02:00
Ayke van Laethem 431a941465
[AVR] Improve 8/16 bit atomic operations
There were some serious issues with atomic operations. This patch should
fix the biggest issues.

For details on the issue take a look at this Compiler Explorer sample:
https://godbolt.org/z/n3ndhn

Code:

    void atomicadd(_Atomic char *val) {
        *val += 5;
    }

Output:

    atomicadd:
        movw    r26, r24
        ldi     r24, 5     ; 'operand' register
        in      r0, 63
        cli
        ld      r24, X     ; load value
        add     r24, r26   ; value += X
        st      X, r24     ; store value back
        out     63, r0
        ret                ; return the wrong value (in r24)

There are various problems with this.

 - The value to add (5) is stored in r24. However, the value to add to
   is loaded in the same register: r24.
 - The `add` instruction adds half of the pointer to the loaded value,
   instead of (attempting to) add the operand with value 5.
 - The output value of the cmpxchg instruction (which is not used in
   this code sample) is the new value with 5 added, not the old value.
   The LangRef specifies that it has to be the old value, before the
   operation.

This patch fixes the first two and leaves the third problem to be fixed
at a later date. I believe atomics were mostly broken before this patch;
with this patch they should become usable as long as you ignore the
output of the atomic operation. In particular it fixes the following
things:

 - It sets the earlyclobber flag for the input ('$operand' operand) so
   that the register allocator puts it in a different register than the
   output value.
 - It fixes a number of issues with the pseudo op expansion pass, for
   example now it adds the $operand field instead of the pointer. This
   fixes most machine instruction verifier issues (other flagged issues
   are unrelated to atomics).

Differential Revision: https://reviews.llvm.org/D97127
2021-07-24 14:03:26 +02:00
Ayke van Laethem 8544ce80f8
[AVR] Set R31R30 as clobbered after ADJCALLSTACKDOWN
In most cases, using R31R30 is fine because the call (which always
precedes ADJCALLSTACKDOWN) will clobber R31R30 anyway. However, in some
rare cases the register allocator might insert an instruction between
the call and the ADJCALLSTACKDOWN instruction and expect the register
pair to be live afterwards. I think this happens as a result of
rematerialization. Therefore, to fix this, the instruction needs to have
Defs set to R31R30.

Setting the Defs field does have the effect of making the instruction
look dead, which it certainly is not. This is fixed by setting
hasSideEffects to true.

Differential Revision: https://reviews.llvm.org/D97745
2021-07-24 14:03:26 +02:00
Ayke van Laethem feda08b70a
[AVR] Do not chain stores in call frame setup
Previously, AVRTargetLowering::LowerCall attempted to keep stack stores
in order with chains. Perhaps this worked in the past, but it does not
work now: it appears that the SelectionDAG legalization phase removes
these chains. Therefore, I've removed these chains entirely to match
X86 (which, similar to AVR, also prefers to use push instructions over
stack-relative stores to set up a call frame). With this change, all the
stack stores are in a somewhat reasonable order.

Differential Revision: https://reviews.llvm.org/D97853
2021-07-24 14:03:26 +02:00
Alexander Belyaev edb05d555e [llvm] Inline getAssociatedFunction() in LLVM_DEBUG.
Function* F is used only inside LLVM_DEBUG, so it causes an unused
variable warning.
2021-07-24 11:49:21 +02:00
Amara Emerson 5ec0f051c8 [GlobalISel] Add GUnmerge, GMerge, GConcatVectors, GBuildVector abstractions. NFC.
Use these to slightly simplify some code in the artifact combiner.
2021-07-23 22:32:26 -07:00
Kuter Dinel 96709823ec [AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Reviewed By: jdoerfert, arsenm

Differential Revision: https://reviews.llvm.org/D104997
2021-07-24 06:07:15 +03:00
Thomas Lively 85157c0079 [WebAssembly] Codegen for pmin and pmax
Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.

Differential Revision: https://reviews.llvm.org/D106612
2021-07-23 14:49:21 -07:00
Thomas Lively 39c0e4afce [WebAssembly][NFC] Simplify SIMD bitconvert pattern
Differential Revision: https://reviews.llvm.org/D106680
2021-07-23 14:43:48 -07:00
Craig Topper 5edccc4581 [RISCV] Avoid using x0,x0 vsetvli for vmv.x.s and vfmv.f.s unless we know the sew/lmul ratio is constant.
Since we're changing VTYPE, we may change VLMAX which could
invalidate the previous VL. If we can't tell if it is safe we
should use an AVL of 1 instead of keeping the old VL.

This is a quick fix. We may want to thread VL to the pseudo
instruction instead of making up a value. That will require ISD
opcode changes and changes to the C intrinsic interface.

This fixes the issue raised in D106286.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106403
2021-07-23 09:12:05 -07:00
Craig Topper cc6d302c91 [X86] Fix a bug in TEST with immediate creation
This code tries to form a TEST from CMP+AND with an optional
truncate in between. If we looked through the truncate, we may
have extra bits in the AND mask that shouldn't participate in
the checks. Normally SimplifyDemandedBits takes care of this, but
the AND may have another user. So manually mask out any extra bits.
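
A minimal sketch of that masking step (names are assumptions, not the
literal patch):

  // Hedged sketch: keep only the mask bits that survive the truncation so
  // the extra high bits of the shared AND mask don't leak into the TEST.
  uint64_t Mask = C->getZExtValue();
  Mask &= maskTrailingOnes<uint64_t>(VT.getSizeInBits());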

Fixes PR51175.

Differential Revision: https://reviews.llvm.org/D106634
2021-07-23 09:03:53 -07:00
Benjamin Kramer dd70cd089a [llvm][sve] Silence unused variable warning in Release builds. NFC 2021-07-23 16:16:35 +02:00
Sanjay Patel f060aa1cf3 [x86] improve CMOV codegen by pushing add into operands
This is not the transform direction we want in general,
but by the time we have a CMOV, we've already tried
everything else that could be better.
The transform increases the uses of the other add operand,
but that is safe according to Alive2:
https://alive2.llvm.org/ce/z/Yn6p-A

We could probably extend this to other binops (not just add).
This is the motivating pattern discussed in:
https://llvm.org/PR51069

The test with i8 shows a missed fold because there's a trunc
sitting in front of the add. That can be handled with a small
follow-up.

Differential Revision: https://reviews.llvm.org/D106607
2021-07-23 09:39:32 -04:00
David Truby 1528a4d400 [llvm][sve] Lowering for VLS truncating stores
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.
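
The combine, schematically:

  (truncstore<vt> (extend x), addr) -> (store x, addr)  ; when x already has type vt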

Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.

Differential Revision: https://reviews.llvm.org/D104471
2021-07-23 14:04:55 +01:00
Simon Pilgrim 71d0fd3564 [X86][AVX] lowerV2X128Shuffle - attempt to recognise broadcastf128 subvector load
As noticed on PR50053, we were failing to recognise when a shuffle of a load was really a subvector broadcast load.
2021-07-23 13:10:38 +01:00
David Green 38986c6782 [AArch64] Add worst case shuffle costs
This adds some missing single source shuffle costs for AArch64, of i16
and i8 vectors. v4i16 are the same as v4i32 with a worst case cost of 3
coming from the perfect shuffle tables. The larger vector sizes expand
into a constant pool, plus a load (and adrp) and a tbl. I arbitrarily
chose 8 for the cost to be expensive but not too expensive.

Differential Revision: https://reviews.llvm.org/D106241
2021-07-23 09:01:58 +01:00
Sebastian Neubauer 2f15319968 [AMDGPU] Fix running ResourceUsageAnalysis
Clear the map when running the analysis multiple times.
The assertion that should ensure that every function is only
analyzed once triggered sometimes (once every ~70 compiles of some
graphics pipelines) when two functions of subsequent runs were allocated
at the same address.

Differential Revision: https://reviews.llvm.org/D106452
2021-07-23 09:25:15 +02:00
Carl Ritson 7d4baf25aa [AMDGPU] Add maximum NSA size limit ISA feature
Add maximum NSA size limit as an ISA feature.
Use this to reduce NSA usage on GFX10.1 to avoid stability issues
with 4- and 5-dword NSA instructions.
Maintain use of longer NSA instructions on GFX10.3.

Note: this also contains some minor fixes for GlobalISel which
did not work correctly with non-NSA form instructions on GFX10.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103348
2021-07-23 16:16:06 +09:00
Cullen Rhodes fde7550094 [AArch64][AsmParser] NFC: when creating a token IsSuffix=false should be default
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106568
2021-07-23 06:36:06 +00:00
Hsiangkai Wang 4b2dd318dd [RISCV] Add FrameSetup/FrameDestroy flag to prologue/epilog instructions.
Differential Revision: https://reviews.llvm.org/D105086
2021-07-23 11:35:19 +08:00
Vitaly Buka 44ba8c691c [NFC][asan] Always pass Dominator Trees into forAllReachableExits 2021-07-22 18:01:38 -07:00
Thomas Johnson 51d8e67e88 [ARC] Add tablegen definition for the Find Leading Set (FLS) instruction
Differential Revision: https://reviews.llvm.org/D106602
2021-07-22 17:42:25 -07:00
Paulo Matos 46667a1003 [WebAssembly] Implementation of global.get/set for reftypes in LLVM IR
Reland of 31859f896.

This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D104797
2021-07-22 22:07:24 +02:00
Simon Pilgrim 4185c5502c [CostModel][X86] Adjust shift SSE4 legalized costs based on llvm-mca reports.
Update shl/lshr/ashr costs based on the worst case costs from the script in D103695 - many of the 128-bit shifts (usually where integer multiplies aren't used) have similar behaviour to AVX1 so we can merge them.
2021-07-22 20:07:32 +01:00
Simon Pilgrim d073b19dbf [X86] Fix SLM FP<->INT throughputs.
Noticed while trying to clean up the shift costs model for SSE4 targets using the script in D103695 - SLM double-pumps all the 128-bit vector conversion ops and only uses the FP0 pipe - numbers taken from Intel AOM + Agner.
2021-07-22 19:39:04 +01:00
Thomas Johnson 1cda1e6186 [ARC] Add disassembly for the conditioned RSUB immediate instruction
Differential Revision: https://reviews.llvm.org/D106497
2021-07-22 11:34:39 -07:00
David Green c9cebda772 [AArch64] Adjust the cost of integer sum reductions
This changes the cost to (LT.first-1) * cost(add) + 2, where the cost of
an add is assumed to be 1. This brings it in line with the other
reductions.
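
For example (illustrative; assuming cost(add) = 1 and a v16i32 reduction
that type-legalizes into four v4i32 parts, so LT.first = 4):

  cost = (LT.first - 1) * cost(add) + 2
       = (4 - 1) * 1 + 2
       = 5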

Differential Revision: https://reviews.llvm.org/D106240
2021-07-22 18:19:54 +01:00
Simon Pilgrim e1bdb57958 [CostModel][X86] Adjust shift SSE legalized costs based on llvm-mca reports.
Update shl/lshr/ashr costs based on the worst case costs from the script in D103695.
2021-07-22 18:12:49 +01:00
Victor Huang 26ea4a4432 [PowerPC] Add PowerPC "__stbcx" builtin and intrinsic for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtin and intrinsic for "__stbcx".

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D106484
2021-07-22 10:48:46 -05:00
Cullen Rhodes 00e87e1c5b [AArch64][SME] Improve diagnostic for vector select register
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D106540
2021-07-22 13:46:40 +00:00
Fraser Cormack b115c038d2 [RISCV] Fix a crash when lowering split float arguments
Lowering certain float vectors without legal vector types could cause a
crash due to a bad interaction between passing floats via GPRs and
argument splitting. Split vector floats appear just like scalar floats.
Under certain situations we choose to pass these float arguments via
GPRs and use an XLenVT location and set the 'BCvt' info to track how
they must be converted back to floating-point values. However, later
logic for handling split arguments may take over, in which case we lose
the previous information and set the 'Indirect' info, thus incorrectly
lowering to integer types.

I don't believe that we would have come across the notion of split
floating-point arguments before. This patch addresses the issue by
updating the lowering so that split arguments are only passed indirectly
when they are scalar integer types.

This has some change to how we lower some larger illegal float vectors,
as can be seen in 'fastcc-float.ll' where the vector is now passed
partly in registers and partly on the stack.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D102852
2021-07-22 09:55:26 +01:00
Fraser Cormack 7b3a69bc16 [RISCV] Lower more BUILD_VECTOR sequences to RVV's VID
This relands a6ca88e908 which was originally
reverted due to overflow bugs in e3fa2b1eab.

This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.
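
For illustration, a monotonic sequence such as <2, 4, 6, 8> (step 2,
addend 2) could lower along these lines (schematic RVV, not necessarily
the exact output):

  vid.v    v8         ; v8 = <0, 1, 2, 3>
  vsll.vi  v8, v8, 1  ; v8 = <0, 2, 4, 6>  (non-unit step)
  vadd.vi  v8, v8, 2  ; v8 = <2, 4, 6, 8>  (non-zero addend)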

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
2021-07-22 09:36:12 +01:00
Ben Shi 9e5c5afc7e [RISCV] Optimize multiplication in the zba extension with SH*ADD
This patch make the following optimization.

(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits)
(mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits)
(mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
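
For illustration, with an immediate of 40 = 5 * 8:

  SH2ADD t, x, x   ; t = (x << 2) + x = 5x
  SLLI   r, t, 3   ; r = 5x << 3 = 40x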

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105796
2021-07-22 10:28:41 +08:00
Carl Ritson 6efb3220b4 [AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256; this saves VGPRs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103800
2021-07-22 10:42:15 +09:00
Carl Ritson 9dcd75f86f [AMDGPU] Allow frontends to disable null export for pixel shaders
Disable null export (for kills) when a frontend defines a pixel
shader as not exporting using amdgpu-color-export and
amdgpu-depth-export function attributes.
This allows the generation of export free pixel shaders.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D105683
2021-07-22 10:20:46 +09:00
Thomas Lively 8af333cf1a [WebAssembly] Replace @llvm.wasm.popcnt with @llvm.ctpop.v16i8
Use the standard target-independent intrinsic to take advantage of standard
optimizations.

Differential Revision: https://reviews.llvm.org/D106506
2021-07-21 16:45:54 -07:00
Jessica Paquette c75a2bbe08 [AArch64][GlobalISel] Change | -> || in an if
I wrote the wrong type of OR by mistake.
2021-07-21 14:57:31 -07:00
Stanislav Mekhanoshin fe197ef9f1 [AMDGPU] Mark relevant rematerializable VOP3 instructions
Differential Revision: https://reviews.llvm.org/D106110
2021-07-21 14:44:13 -07:00
Stanislav Mekhanoshin 9625ca5b60 [AMDGPU] Mark relevant rematerializable VOP2 instructions
Differential Revision: https://reviews.llvm.org/D106023
2021-07-21 14:24:59 -07:00
David Green ba42f6a4b5 [ARM] Pass SelectionDAG to methods that don't require DCI. NFC
In these methods DCI is never used, only the DAG from it. Pass the DAG
directly, cleaning up the code a little.
2021-07-21 22:11:09 +01:00
Stanislav Mekhanoshin 4eb24817ec [AMDGPU] Mark all relevant VOP1 instructions rematerializable
Differential Revision: https://reviews.llvm.org/D105919
2021-07-21 14:05:32 -07:00
Stanislav Mekhanoshin d01b34ed31 [AMDGPU] Move perfhint analysis
This is an SCC pass; moving it to the end of the SCC PM saves one
Function PM. This needs the analysis to take into account
memory access width since it is now placed after the
load/store optimizer (D105651).

Differential Revision: https://reviews.llvm.org/D105652
2021-07-21 13:06:49 -07:00
Jessica Paquette d0af732bd0 [AArch64][GlobalISel] Widen s2 and s4 G_IMPLICIT_DEF + G_FREEZE
These had

```
.clampScalar(0, s1, 64)
.widenScalarToNextPow2(0, 8)
```

If you have s2 or s4, then `widenScalarToNextPow2` does nothing.

This changes the `widenScalarToNextPow2` rule to use s8 as the minimum type
instead, allowing us to correctly widen s2 and s4.

This does not impact s1, since it's marked as legal already.
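
A hedged sketch of what the adjusted ruleset could look like (illustrative only, not the literal patch; the opcode list and surrounding rules are assumptions):

```
getActionDefinitionsBuilder({G_IMPLICIT_DEF, G_FREEZE})
    .legalFor({s1, s8, s16, s32, s64})
    // widenScalarToNextPow2 only fires on non-power-of-2 sizes, so s2/s4
    // slip through it; clamping to a minimum of s8 widens them as intended
    // (s1 is unaffected because it is matched as legal first).
    .clampScalar(0, s8, s64)
    .widenScalarToNextPow2(0, /*MinSize=*/8);
```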

Differential Revision: https://reviews.llvm.org/D106413
2021-07-21 12:59:20 -07:00
Stanislav Mekhanoshin a397c1c82f [AMDGPU] Tune perfhint analysis to account for access width
A function with fewer memory instructions but wider accesses
is the same as a function with more but narrower accesses
in terms of memory boundedness. In fact the pass would give
different answers before and after vectorization without
this change.
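
A minimal sketch of the idea, assuming a bytes-moved score (`MemOp` and `memoryScore` are invented names, not the pass's actual code):

```
#include <cassert>
#include <cstddef>
#include <vector>

struct MemOp { size_t Bytes; };

// Weight each access by its width rather than counting instructions, so a
// function is scored by the bytes it moves.
size_t memoryScore(const std::vector<MemOp> &Ops) {
  size_t Bytes = 0;
  for (const MemOp &Op : Ops)
    Bytes += Op.Bytes;
  return Bytes;
}

int main() {
  // Four 4-byte loads vs. one 16-byte load: same memory boundedness, so the
  // score is now the same before and after vectorization.
  std::vector<MemOp> Scalar(4, MemOp{4}), Vector(1, MemOp{16});
  assert(memoryScore(Scalar) == memoryScore(Vector));
}
```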

Differential Revision: https://reviews.llvm.org/D105651
2021-07-21 12:46:10 -07:00
Craig Topper a467c08570 [RISCV] Cleanup comment around vector tail policy handling. NFC
vmv.x.s and reductions don't ignore tail policy anymore.
2021-07-21 12:45:08 -07:00
Eli Friedman 0ca46a1757 [SelectionDAG] Fix the representation of ISD::STEP_VECTOR.
The existing rule about the operand type is strange.  Instead, just say
the operand is a TargetConstant with the right width.  (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)

Highlights:

1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant.  Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization.  This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.

Differential Revision: https://reviews.llvm.org/D105673
2021-07-21 10:58:40 -07:00
Thomas Lively 1a57ee1276 [WebAssembly] Codegen for v128.load{32,64}_zero
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal instruction selection patterns. The wasm_simd128.h
intrinsics header was already using portable code for the corresponding
intrinsics, so now it produces the correct instructions.

Differential Revision: https://reviews.llvm.org/D106400
2021-07-21 09:02:12 -07:00
Eric Astor 69551486fd [ms] [llvm-ml] Restrict implicit RIP-relative addressing to named-variable references
ML64.EXE applies implicit RIP-relative addressing only to memory references that include a named-variable reference.

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D105372
2021-07-21 11:49:58 -04:00
Quinn Pham e002d251dd [PowerPC] Floating Point Builtins for XL Compat.
This patch is in a series of patches to provide
builtins for compatibility with the XL compiler.
This patch adds builtins related to floating point
operations.

Reviewed By: #powerpc, nemanjai, amyk, NeHuang

Differential Revision: https://reviews.llvm.org/D103986
2021-07-21 08:33:39 -05:00
Sebastian Neubauer b642d01fa8 [AMDGPU] Improve killed check for vgpr optimization
The killed flag is not always set. E.g. when a variable is used in a
loop, it is never marked as killed, although it is unused in following
basic blocks. Also, we try to deprecate kill flags and not use them.

Check if the register is live in the endif block. If not, consider it
killed in the then and else blocks.

The vgpr-liverange tests have two new tests with loops
(pre-committed, so the diff is visible).
I also needed to change the subtarget to gfx10.1; otherwise calls
do not work.

Differential Revision: https://reviews.llvm.org/D106291
2021-07-21 15:24:59 +02:00
Jay Foad 3ed29f960c [AMDGPU] NFC refactoring in isel for buffer access intrinsics
Rename getBufferOffsetForMMO to updateBufferMMO and pass in the MMO to
be updated, in preparation for the bug fix in D106284.

Call updateBufferMMO consistently for all buffer intrinsics, even the
ones that use setBufferOffsets to decompose a combined offset
expression.

Add a getIdxEn helper function.

Differential Revision: https://reviews.llvm.org/D106354
2021-07-21 11:12:49 +01:00
Cullen Rhodes 008c755d76 [AArch64][SME] Support .arch and .arch_extension assembler directives
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105566
2021-07-21 08:40:27 +00:00
Tim Northover 19d2e42be2 ARM: don't return by popping PC if we have to adjust the stack afterwards.
In mandatory tail calling conventions we might have to deallocate stack
space used by our arguments before return. This happens after popping
CSRs, so the pop cannot be turned into the return itself in this case.

The else branch here was already a nop, so removing it as a tidy-up.
2021-07-21 09:35:14 +01:00
Tim Northover 291e0daa6e AArch64: support 8 & 16-bit atomic operations in GlobalISel
We have SelectionDAG patterns for 8 & 16-bit atomic operations, but they
assume the value types will have been legalized to 32-bits. So this adds
the ability to widen them to both AArch64 & generic GISel
infrastructure.
2021-07-21 09:35:14 +01:00
Cullen Rhodes 2d80bbd939 [AArch64][SME] Add mova instructions
This patch adds the mova instruction to insert/extract an SVE vector
register to/from a ZA tile vector.

The preferred MOV aliases are also implemented.

Depends on D105572.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm, CarolineConcatto

Differential Revision: https://reviews.llvm.org/D105574
2021-07-21 08:20:01 +00:00
Cullen Rhodes 6c32cfe85c [AArch64][SME] Add ldr and str instructions
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D105573
2021-07-21 08:17:13 +00:00
Tianqing Wang bec4a8157d [X86] Update MachineLoopInfo in CMOV conversion.
If a CMOV is in a loop and is converted to branches, CMOV conversion wouldn't
add newly created basic blocks to loop info. Since the candidates are collected
based on loops, instructions in these basic blocks will be ignored.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D104623
2021-07-21 10:53:46 +08:00
Albion Fung 2fd1520247 [PowerPC] Implemented mtmsr, mfspr, mtspr Builtins
Implemented builtins for mtmsr, mfspr, mtspr on PowerPC;
the patch is intended for XL Compatibility.

Differential revision: https://reviews.llvm.org/D106130
2021-07-20 17:51:00 -05:00
Jon Roelofs 75187aa352 [AArch64][GlobalISel] Legalize ctpop for v2s64, v2s32, v4s32, v4s16, v8s16
https://llvm.godbolt.org/z/nTTK6M5qe

Differential revision: https://reviews.llvm.org/D106388
2021-07-20 15:37:56 -07:00
Albion Fung 3434ac9e39 [PowerPC] Store, load, move from and to registers related builtins
This patch implements store, load, move from and to registers related
builtins, as well as the builtin for stfiw. The patch aims to provide
feature parity with xlC on AIX.

Differential revision: https://reviews.llvm.org/D105946
2021-07-20 15:46:14 -05:00
Jessica Paquette 8f54ebd51d [AArch64][GlobalISel] Select llvm.aarch64.neon.st2 intrinsics
Add manual selection code similar to the code in AArch64ISelDAGToDAG, and add
`createTuple` helpers similar to the code there as well.

This accounted for around 111 fallbacks while building clang for AArch64 with
GlobalISel.

This also should make it easy to add selection code for other store
intrinsics.

As a minor cleanup, this uses `createQTuple` in the other place where we use
REG_SEQUENCE.

Differential Revision: https://reviews.llvm.org/D106332
2021-07-20 13:23:46 -07:00
Eli Friedman 664a1fd9f0 [AArch64] Use the CMP_SWAP_128 variants added in 843c6140.
Accidentally forgot to flip the opcode... and I didn't notice because it
was working fine for GlobalISel.
2021-07-20 13:23:27 -07:00
Fangrui Song 0c0549fbb3 [AArch64] Delete unused Opcode after D106039 2021-07-20 12:51:44 -07:00
Eli Friedman 843c614058 [AArch64] Fix i128 cmpxchg using ldxp/stxp.
Basically two parts to this fix:

1. Stop using AtomicExpand to expand cmpxchg i128
2. Fix AArch64ExpandPseudoInsts to use a correct expansion.

From ARM architecture reference:

To atomically load two 64-bit quantities, perform a Load-Exclusive
pair/Store-Exclusive pair sequence of reading and writing the same value
for which the Store-Exclusive pair succeeds, and use the read values
from the Load-Exclusive pair.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51102

Differential Revision: https://reviews.llvm.org/D106039
2021-07-20 12:38:12 -07:00
Victor Huang 1a762f93f8 [PowerPC] Add PowerPC cmpb builtin and emit target indepedent code for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch add the builtin and emit target independent
code for __cmpb.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D105194
2021-07-20 13:06:22 -05:00
Craig Topper 81efb82570 [RISCV] Teach RISCVMatInt about cases where it can use LUI+SLLI to replace LUI+ADDI+SLLI for large constants.
If we need to shift left anyway we might be able to take advantage
of LUI implicitly shifting its immediate left by 12 to cover part
of the shift. This allows us to use more bits of the LUI immediate
to avoid an ADDI.
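
A standalone illustration of the trick with a made-up constant (hedged: the real selection logic lives in RISCVMatInt; this just checks the arithmetic):

```
#include <cassert>
#include <cstdint>

int main() {
  // Goal: materialize C = 0x12345 << 24 (an invented constant).
  //   naive:  LUI 0x12 ; ADDI 0x345 ; SLLI 24   (3 instructions)
  //   better: LUI 0x12345 ; SLLI 12             (2 instructions)
  // LUI places its 20-bit immediate at bits [31:12], i.e. an implicit << 12,
  // which covers 12 bits of the required shift.
  int64_t naive = ((int64_t(0x12) << 12) + 0x345) << 24;
  int64_t better = (int64_t(0x12345) << 12) << 12;
  assert(naive == better && better == (int64_t(0x12345) << 24));
}
```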

isDesirableToCommuteWithShift now considers compressed instruction
opportunities when deciding if commuting should be allowed.

I believe this is the same or similar to one of the optimizations
from D79492.

Reviewed By: luismarques, arcbbb

Differential Revision: https://reviews.llvm.org/D105417
2021-07-20 09:22:06 -07:00
Craig Topper 98d4adc2d1 [RISCV] Add custom isel to select (and (srl X, C1), C2) and (and (shl X, C1), C2)
Replace some existing isel patterns that are covered by the new
code. SLLIUWPat has been removed in favor of folding its root case
into the new code. The other uses in isel patterns for shXadd.uw
have been switched to using hardcoded AND masks.

This is based on the original version of D49585 from ARM. The final
version of that was made a DAG combine, but I've chosen to keep it
as custom isel. I'm not convinced DAG combine is as good with
shift pairs as it is with and+shift. I saw some issues optimizing
the shifts created by vscale lowering if an and isn't created for
from a shift pair.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106230
2021-07-20 08:53:55 -07:00
Stefan Pintilie 1a6dc92be7 [PowerPC] Inefficient register allocation of ACC registers results in many copies.
ACC registers are a combination of four consecutive vector registers.
If the vector registers are assigned first this often forces a number
of copies to appear just before the ACC register is created. If the ACC
register is assigned first then fewer copies are generated when the vector
registers are assigned.

This patch tries to force the register allocator to assign the ACC registers first
and then the UACC registers and then the vector pair registers. It does this
by changing the priority of the register classes.

This patch also adds hints to help the register allocator assign UACC registers from
known ACC registers and vector pair registers from known UACC registers.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D105854
2021-07-20 10:53:40 -05:00
Craig Topper 84877a098a [RISCV] Use unordered indexed loads for MGATHER.
I don't think the semantics of the llvm masked gather intrinsic care
about the order the elements are loaded. For example, type legalization
by splitting will chain them in parallel. This is different from
scatter, which we do chain in order.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106025
2021-07-20 08:46:02 -07:00
Bradley Smith 191f9fa5d2 [AArch64][SVE] Move instcombine like transforms out of SVEIntrinsicOpts
Instead move them to the instcombine that happens in AArch64TargetTransformInfo.

Differential Revision: https://reviews.llvm.org/D106144
2021-07-20 14:17:30 +00:00
Simon Pilgrim c188f0b876 [X86] X86InstCombineIntrinsic.cpp - silence clang-tidy warnings about incorrect uses of auto. NFCI.
We were using auto instead of auto* in a number of places which failed the llvm-qualified-auto check.

Additionally we were using auto in some places where the type wasn't immediately obvious - the style guide rule of thumb is only to use auto from casts etc. where the type is already explicitly stated.
2021-07-20 13:37:45 +01:00
Sebastian Neubauer 2b08f6af62 [AMDGPU] Improve register computation for indirect calls
First, collect the register usage in each function, then apply the
maximum register usage of all functions to functions with indirect
calls.

This is more accurate than guessing the maximum register usage without
looking at the actual usage.

As before, assume that indirect calls will hit a function in the
current module.

Differential Revision: https://reviews.llvm.org/D105839
2021-07-20 13:48:50 +02:00
Stanislav Mekhanoshin 9dc2636623 [AMDGPU] Disable LDS lowering for GFX shaders
Apparently these need external LDS symbols to remain.

Fixes: SC1-3279

Differential Revision: https://reviews.llvm.org/D106288
2021-07-20 02:55:25 -07:00
Sander de Smalen eb1a5120b8 [AArch64][SVE][InstCombine] last{a,b} of a splat vector
Replace last{a,b}(splat(X)) with X, irrespective of the predicate.

Patch by/Committing on behalf of: Usman Nadeem (mnadeem)

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105520
2021-07-20 09:44:43 +01:00
Cullen Rhodes 15af3aaa2e [AArch64][SME] Add system registers and related instructions
This patch adds the new system registers introduced in SME:

  - ID_AA64SMFR0_EL1 (ro) SME feature identifier.
  - SMCR_ELx (r/w) streaming mode control register for configuring
    the effective Streaming SVE vector length when the PE is in
    Streaming SVE mode.
  - SVCR (r/w) streaming vector control register, visible at all
    exception levels. Provides access to PSTATE.SM and PSTATE.ZA
    using MSR and MRS instructions.
  - SMPRI_EL1 (r/w) streaming mode execution priority register.
  - SMPRIMAP_EL2 (r/w) streaming mode priority mapping register.
  - SMIDR_EL1 (ro) streaming mode identification register.
  - TPIDR2_EL0 (r/w) for use by SME software to manage per-thread
    SME context.
  - MPAMSM_EL1 (r/w) MPAM (v8.4) streaming mode register, for
    labelling memory accesses performed in streaming mode.

Also added in this patch are the SME mode change instructions.
Three MSR immediate instructions are implemented to set or clear
PSTATE.SM, PSTATE.ZA, or both respectively:

  - MSR SVCRSM, #<imm1>
  - MSR SVCRZA, #<imm1>
  - MSR SVCRSMZA, #<imm1>

The following smstart/smstop aliases are also implemented for
convenience:

  smstart    -> MSR SVCRSMZA, #1
  smstart sm -> MSR SVCRSM,   #1
  smstart za -> MSR SVCRZA,   #1

  smstop     -> MSR SVCRSMZA, #0
  smstop sm  -> MSR SVCRSM,   #0
  smstop za  -> MSR SVCRZA,   #0

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105576
2021-07-20 08:06:26 +00:00
Amara Emerson 56a6686e0c [AArch64][GlobalISel] Don't form truncstores in postlegalizer-lowering for s128.
We don't support truncating s128 stores, so don't form them.
2021-07-20 00:04:34 -07:00
Kai Luo e2ee27b20b [PowerPC] Fall back to base's implementation of shouldExpandAtomicRMWInIR and shouldExpandAtomicCmpXchgInIR
If we can't decide `shouldExpandAtomicRMWInIR` or `shouldExpandAtomicCmpXchgInIR` in PPC's implementation after https://reviews.llvm.org/rGb9c3941cd61de1e1b9e4f3311ddfa92394475f4b, resort to base's implementation.

This fixes internal build of OpenMP which uses atomic operations on float.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106234
2021-07-20 06:14:24 +00:00
Matt Arsenault 30fa074c0a AArch64/GlobalISel: Preserve memory types 2021-07-19 20:21:05 -04:00
Derek Schuff ad1f5457d2 [WebAssembly] Generate R_WASM_FUNCTION_OFFSET relocs in debuginfo sections
Debug info sections need R_WASM_FUNCTION_OFFSET_I32 relocs (with FK_Data_4 fixup
kinds) to refer to functions (instead of R_WASM_TABLE_INDEX as is used in data
sections). Usually this is done in a convoluted way, with unnamed temp data
symbols which target the start of the function, in which case
WasmObjectWriter::recordRelocation converts it to use the section symbol
instead. However in some cases the function can actually be undefined; in this
case the dwarf generator uses the function symbol (a named undefined function
symbol) instead. In that case the section-symbol transform doesn't work and we
need to generate the correct reloc type a different way. In this change
WebAssemblyWasmObjectWriter::getRelocType takes the fixup section type into
account to choose the correct reloc type.

Fixes PR50408
Differential Revision: https://reviews.llvm.org/D103557
2021-07-19 14:02:33 -07:00
Jonas Paulsson 6c0e6895d0 [SystemZ] Handle NoRegister in SystemZTargetLowering::emitMemMemWrapper().
Bugfix: The compiler should be able to generate a memset to nullptr.

Review: Ulrich Weigand
2021-07-19 20:04:44 +02:00
Amy Huang fd972bb9fd Revert "[llvm][sve] Lowering for VLS truncating stores" because it
causes a seg fault (see https://reviews.llvm.org/D104471).

This reverts commit c305557acd.
2021-07-19 11:03:33 -07:00
Wouter van Oortmerssen 670944fb20 [WebAssembly] Support R_WASM_MEMORY_ADDR_TLS_SLEB64 for wasm64
Also fixed TLS tests that swapped addr & value in the store op
Differential Revision: https://reviews.llvm.org/D106096
2021-07-19 10:22:43 -07:00
Craig Topper 50302feb1d [SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend.
RISCV would prefer a sign extended constant since that works better
with our constant materialization. We have an existing TLI hook we
use to control sign extension of setcc operands in type legalization.
That hook happens to do the right check we need here, but might be
straying from its original purpose. With only RISCV defining this
hook in tree, I wasn't sure if it was worth adding another hook
with identical behavior.

This is an alternative to D105785 where I tried to handle this in
the RISCV backend by not creating ANY_EXTENDs in some places.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105918
2021-07-19 09:25:28 -07:00
Simon Pilgrim 142e60f40b [X86] Fix case of IsAfterLegalize argument. NFC.
Pulled out of D106280
2021-07-19 17:15:28 +01:00
David Green 5561ad8b36 [ARM] Remove PromotedBitwiseVT for NEON types
This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32,
treating them the same as the AArch64 and MVE backends where we just add
the relevant patterns for each legal type. This prevents a lot of
bitcasts from being added to the DAG, which have the potential to make
optimizations more difficult. It does mean adding extra patterns, and
some codegen can change due to the types now being legal, not promoted.

Differential Revision: https://reviews.llvm.org/D105588
2021-07-19 16:36:33 +01:00
Matt Arsenault e574fd9d52 AArch64/GlobalISel: Cleanup unnecessary size checks in call lowering
The CCValAssign types should now be accurate, so these are no longer
necessary.
2021-07-19 11:01:30 -04:00
Jeremy Morse f46321207f [InstrRef][X86] Drop debug instruction numbers from x87 instructions
Avoid a crash when using instruction referencing if x87 floating point
instructions are used. These instructions are significantly mutated when
they're rewritten from referring to registers, to referring to
floating-point-stack positions. As a result, their operands are re-ordered,
and (InstrRef) LiveDebugValues asserts when it sees a DBG_INSTR_REF
referring to a non-reg non-def register operand.

To fix this, drop the instruction numbers, and thus variable locations.
This patch adds a helper utility to do that.

Dropping the variable locations is sub-optimal, but applying DBG_VALUEs to
the $fp0 and similar registers is dropped on emission too. It seems we've
never done well at describing variables that live in x87 registers, at all.

Differential Revision: https://reviews.llvm.org/D105657
2021-07-19 15:08:27 +01:00
Jay Foad 96d8f2a1e0 [AMDGPU] Fix typo in comments idexen -> idxen 2021-07-19 13:39:30 +01:00
Kazushi (Jam) Marukawa 4ee28b4fec [VE] Set getExtendForAtomicOps to ISD::ANY_EXTEND
The implementation of subword atomics does not actually
guarantee the result is zero-extended, which caused
failures after https://reviews.llvm.org/D101342 landed.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D106225
2021-07-19 19:58:44 +09:00
Kazushi (Jam) Marukawa b28e5b7910 [VE] Disable relative lookup table converter pass for VE
VE's linker, /opt/nec/ve/bin/nld, doesn't implement relative lookup tables.
Relative lookup tables were introduced by https://reviews.llvm.org/D94355,
but we need to disable them at the moment.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D106224
2021-07-19 19:25:33 +09:00
Florian Mayer d23f26f0af [NFC] [MTE] helper for stack tagging lifetimes.
Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D106135
2021-07-19 11:09:16 +01:00
Cullen Rhodes f91eaa7007 [AArch64][SME] Add SVE2 instructions added in SME
This patch adds support for the following instructions:

    SCLAMP, UCLAMP, REV, DUP (predicate)

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D105577
2021-07-19 08:03:05 +00:00
David Green eb1e95dbdf [ARM] Extend more reductions during lowering
This relaxes the VMLAV and VADDV reduction recognition code to handle
smaller than legal types, extending them as needed. That was already
handled for some reductions; this extends it to more types in a more
generic way. If a smaller than legal value is found it is extended to
the legal type as needed.

Differential Revision: https://reviews.llvm.org/D106051
2021-07-19 08:58:03 +01:00
Sander de Smalen 0ed0573527 [AArch64][SVE] Optimize bitcasts between unpacked half/i16 vectors.
The case for nxv2f32/nxv2i32 was already covered by D104573.
This patch builds on top of that by making the mechanism work for
nxv2[b]f16/nxv2i16, nxv4[b]f16/nxv4i16 as well.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D106138
2021-07-19 08:29:28 +01:00
Eli Friedman 6601be4419 [X86] Remove incorrect use of known bits in shuffle simplification.
This reverts commit 2a419a0b99.

The result of a shufflevector must not propagate poison from any element
other than the one noted in the shuffle mask.

The regressions outside of fptoui-may-overflow.ll can probably be
recovered some other way; for example, using isGuaranteedNotToBePoison.

See discussion on https://reviews.llvm.org/D106053 for more background.

Differential Revision: https://reviews.llvm.org/D106222
2021-07-18 18:13:11 -07:00
Simon Pilgrim 51a12d2ff0 [X86][SSE] matchShuffleWithPACK - avoid poison pollution from bitcasting multiple elements together.
D106053 exposed that we've not been taking into account that by bitcasting smaller elements together and then performing a ComputeKnownBits on the result we'd be allowing a poison element to influence other neighbouring elements being used in the pack. Instead we now peek through any existing bitcast to ensure that the source type already matches the width source of the pack node we're trying to match.

This has also been a chance to stop matchShuffleWithPACK creating unused nodes on the fly which could affect oneuse tests during shuffle lowering/combining.

The only regression we're seeing is due to being unable to peek through a bitcast as it's on the other side of an extract_subvector - which should go away once we finally allow shuffle combining across different vector widths (by making matchShuffleWithPACK use const SelectionDAG& we've gotten closer to this - see PR45974).
2021-07-18 14:25:28 +01:00
Jon Roelofs 5cd63e9ec2 [AArch64][GlobalISel] Legalize bswap <2 x i16>
Differential revision: https://reviews.llvm.org/D105935
2021-07-17 15:31:15 -07:00
David Green 5acddf5b09 [ARM] Lower non-extended small gathers via truncated gathers.
Corollary to 1113e06821, this allows us to
match gathers that don't produce full vector width results. They use an
extended gather which is truncated back to the original type.
2021-07-17 22:38:31 +01:00
Eli Friedman e41e865b15 [AArch64] Prepare for changes to STEP_VECTOR.
Rewrite patterns to assume that the operand of STEP_VECTOR is a
constant. The old patterns will stop working when the operand is changed
from a Constant to a TargetConstant. (See D105673.)

Add test coverage for certain patterns that weren't exercised by
existing regression tests.

Differential Revision: https://reviews.llvm.org/D105847
2021-07-17 14:13:41 -07:00
Nikita Popov 2c68ecccc9 [OpaquePtr] Remove uses of CreateGEP() without element type
Remove uses of to-be-deprecated API. In cases where the correct
element type was not immediately obvious to me, fall back to
explicit getPointerElementType().
2021-07-17 22:56:27 +02:00
Craig Topper d0f8047d37 [RISCV] Teach computeKnownBitsForTargetNode that VLENB will never be more than 65536/8. 2021-07-17 11:24:20 -07:00
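
A quick standalone check of the bound used above (hedged: 65536 is the VLEN cap referenced elsewhere in this log):

```
#include <cassert>
#include <cstdint>

int main() {
  // VLEN <= 65536 bits, so VLENB = VLEN / 8 <= 8192 = 1 << 13; every bit
  // above bit 13 of VLENB is therefore known zero.
  const uint64_t MaxVLENB = 65536 / 8;
  assert(MaxVLENB == (uint64_t(1) << 13));
}
```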
Nikita Popov 6d3e7c783b [OpaquePtr] Remove uses of CreateConstGEP1_32() without element type
Remove uses of to-be-deprecated API. I've fallen back to calling
getPointerElementType() in some cases where the correct type wasn't
immediately obvious to me.
2021-07-17 18:32:36 +02:00
Nikita Popov 357756ecf6 [OpaquePtr] Remove uses of CreateConstGEP1_64() without element type
Remove uses of to-be-deprecated API.
2021-07-17 16:43:20 +02:00
Nikita Popov be5af50e7d [BPF] Use elementtype attribute for preserve.array/struct.index intrinsics
Use the elementtype attribute introduced in D105407 for the
llvm.preserve.array/struct.index intrinsics. It carries the
element type of the GEP these intrinsics effectively encode.

This patch:

 * Adds a verifier check that the attribute is required.
 * Adds it in the IRBuilder methods for these intrinsics.
 * Autoupgrades old bitcode without the attribute.
 * Updates the lowering code to use the attribute rather than
   the pointer element type.
 * Updates lots of tests to specify the attribute.
 * Adds -force-opaque-pointers to the intrinsic-array.ll test
   to demonstrate they work now.

https://reviews.llvm.org/D106184
2021-07-17 11:09:18 +02:00
Craig Topper 173332d175 [RISCV] Manually emit the best shift for VSCALE lowering to improve codegen.
We assume VLENB is a multiple of 8 and previously relied on shift
pairs being optimized to an AND+SHL/SHR and computeKnownBits
removing the AND. This doesn't happen if (vlenb >> 3) gets CSEd
to have multiple uses. This patch manually emits the best shift
to work around this.
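
A standalone check of the shift folding described above (hedged: the multiplier 16 is an arbitrary example):

```
#include <cassert>
#include <cstdint>

int main() {
  // vscale == vlenb / 8 == vlenb >> 3. Multiplying by e.g. 16 as
  // (vlenb >> 3) << 4 takes two shifts; because vlenb is a multiple of 8,
  // the pair folds to a single shift: vlenb << 1.
  for (uint64_t vlenb = 8; vlenb <= 8192; vlenb *= 2)
    assert(((vlenb >> 3) << 4) == (vlenb << 1));
}
```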
2021-07-17 00:52:07 -07:00
jacquesguan f4ec30d808 [RISCV] Make VLEN no greater than 65536
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106134
2021-07-17 12:47:46 +08:00
Carl Ritson c7f2f81f5e [AMDGPU] Tidy SReg/SGPR definitions using template class
Use a multiclass to consistently define SReg/SGPR/TTMP register classes.
Add missing TTMP registers for 96b, 160b, 192b, 224b.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D105800
2021-07-17 11:26:46 +09:00
Matt Arsenault 71de6e9b4a Mips/GlobalISel: Remove leftover dead code 2021-07-16 20:20:55 -04:00
David Green ad8e75caa2 [ARM] Fix for matching reductions that are both sext and zext.
Fix a silly mistake that was not making sure that _both_ operands were
the correct extend code.
2021-07-16 23:11:42 +01:00
Nemanja Ivanovic 35a18a981f [PowerPC] Implement intrinsics for mtfsf[i]
This provides intrinsics for emitting instructions that set the FPSCR (`mtfsf/mtfsfi`).

The patch also conservatively marks the rounding mode as an implicit def for both since they both may set the rounding mode depending on the operands.

Reviewed By: #powerpc, qiucf

Differential Revision: https://reviews.llvm.org/D105957
2021-07-16 16:26:11 -05:00
Jon Roelofs 15267595fd [RISCV] Compose vector subregs hierarchically
This fixes the test I broke in: https://reviews.llvm.org/D105953#2883579

Differential revision: https://reviews.llvm.org/D106168
2021-07-16 12:32:13 -07:00
Simon Pilgrim d2458bcdc6 [X86][SSE] combineX86ShufflesRecursively - bail if constant folding fails due to oneuse limits.
Fixes an issue reported on D105827 where a single shuffle of a constant (with multiple uses) was caught in an infinite loop: one shuffle (UNPCKL) used an undef arg, but then that got recombined to SHUFPS as the constant value had its own undef that confused matching.
2021-07-16 19:21:46 +01:00
Lei Huang c8937b6cb9 [PowerPC] Implement XL compact math builtins
Implement a subset of builtins required for compatibility with the AIX XL compiler.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D105930
2021-07-16 13:21:13 -05:00
Craig Topper 8f0343cc9c [RISCV] Use tail agnostic policy for fixed vector vwmacc(u).
This adds new pseudoinstructions with ForceTailAgnostic set. This
matches what we did for non-widening VMACC. We should move to a
tail policy operand on the pseudos when we expand the intrinsic
interface to include the tail policy.
2021-07-16 10:41:09 -07:00
Craig Topper d634ec8d29 [RISCV] Refactor where in the multiclass hierarchy we add commutable VFMADD/VFMACC instructions. NFC
I'm preparing to add tail agnostic versions of VWMACC and VFWMACC
so this will make them more consistent.
2021-07-16 10:41:09 -07:00
Guozhi Wei 5609c8b607 [X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB
This patch transforms the sequence
    lea (reg1, reg2), reg3
    sub reg3, reg4
to two sub instructions
    sub reg1, reg4
    sub reg2, reg4
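
A standalone check that the rewrite is value-preserving, reading the operands in AT&T order (src, dst):

```
#include <cassert>
#include <cstdint>

int main() {
  // "sub reg3, reg4" computes reg4 -= reg3, so the original sequence
  // computes reg4 - (reg1 + reg2); the rewritten pair computes
  // (reg4 - reg1) - reg2, which is the same value (mod 2^64).
  uint64_t r1 = 0x1000, r2 = 0x234, r4 = 0x99999999;
  assert(r4 - (r1 + r2) == (r4 - r1) - r2);
}
```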

A similar optimization can also be applied to the LEA/ADD sequence.

The modifications to TwoAddressInstructionPass ensure the operands of the ADD
instruction are in the expected order (the dest register of the LEA should be
the src register of the ADD).

Differential Revision: https://reviews.llvm.org/D104684
2021-07-16 10:16:03 -07:00
Craig Topper 4dbb788068 [RISCV] Teach constant materialization that it can use zext.w at the end with Zba to reduce number of instructions.
If the upper 32 bits are zero and bit 31 is set, we might be able to
use zext.w to fill in the zeros after using an lui and/or addi.
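
A standalone sketch of the arithmetic with a made-up constant (hedged: zext.w is the Zba alias for add.uw rd, rs, x0):

```
#include <cassert>
#include <cstdint>

int main() {
  // Invented target constant with upper 32 bits zero and bit 31 set.
  const uint64_t C = 0x80001000ull;
  // LUI 0x80001 materializes the sign-extended value 0xFFFFFFFF80001000...
  int64_t lui = int64_t(int32_t(0x80001u << 12));
  // ...and zext.w clears the upper 32 bits, recovering C without an ADDI.
  uint64_t zext_w = uint64_t(lui) & 0xFFFFFFFFull;
  assert(zext_w == C);
}
```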

Most of this patch is plumbing the subtarget features into the constant
materialization.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105509
2021-07-16 09:35:56 -07:00
Craig Topper 0ce13f92b7 [RISCV] Add curly braces around a case body that declares variables. NFC
This is at the end of the switch so doesn't cause any issues now,
but if a new case is added it will break.
2021-07-16 09:35:56 -07:00
Matt Arsenault 9ad1a49956 Mips/GlobalISel: Use LLT form of getMachineMemOperand
NFC here since it's just using a scalar anyway.
2021-07-16 11:41:32 -04:00
Masoud Ataei ee2068b30e [PowerPC] Updated the error message of MASSV pass to mention vectorization
needs to be enabled on P8 and later targets.

Differential Revision: https://reviews.llvm.org/D106091
2021-07-16 14:45:09 +00:00
Amy Kwan ba627a32e1 [PowerPC] Update Refactored Load/Store Implementation, XForm VSX Patterns, and Tests
This patch includes the following updates to the load/store refactoring effort introduced in D93370:
 - Update various VSX patterns that used to "force" an XForm, to instead just XForm.
   This allows the patterns to compute the most optimal addressing
   mode (and to produce a DForm instruction when possible)
- Update pattern and test case for the LXVD2X/STXVD2X intrinsics
- Update LIT test cases that used to use the XForm instruction to use the DForm instruction

Differential Revision: https://reviews.llvm.org/D95115
2021-07-16 09:28:48 -05:00
Fraser Cormack e3fa2b1eab Revert "[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID"
This reverts commit a6ca88e908.

More caution is required to avoid overflow/underflow. Thanks to the
sanitizers for catching this.
2021-07-16 15:00:20 +01:00
Matt Arsenault 3ceb92295e AMDGPU/GlobalISel: Preserve more memory types 2021-07-16 08:57:26 -04:00
Matt Arsenault 21a0ef8d19 AMDGPU/GlobalISel: Redo kernel argument load handling
This avoids relying on G_EXTRACT on unusual types, and also properly
decomposes structs into multiple registers. This also preserves the
LLTs in the memory operands.
2021-07-16 08:56:54 -04:00
Dmitry Preobrazhensky 09c9f4dc7d [AMDGPU][MC] Added missing isCall/isBranch flags
Added isCall for S_CALL_B64; added isBranch for S_SUBVECTOR_LOOP_*.

Differential Revision: https://reviews.llvm.org/D106072
2021-07-16 14:59:10 +03:00
Nicholas Guy 9769535efd [AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling
Specifying the latencies of specific LDP variants appears to improve
performance almost universally.

Differential Revision: https://reviews.llvm.org/D105882
2021-07-16 12:00:57 +01:00
Cullen Rhodes 99eb96f031 [AArch64][SME] Add load and store instructions
This patch adds support for following contiguous load and store
instructions:

  * LD1B, LD1H, LD1W, LD1D, LD1Q
  * ST1B, ST1H, ST1W, ST1D, ST1Q

A new register class and operand is added for the 32-bit vector select
register W12-W15. The differences in the following tests, which have been
re-generated, are caused by the introduction of this register class:

  * llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
  * llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir
  * llvm/test/CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir
  * llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir

D88663 attempts to resolve the issue with the store pair test
differences in the AArch64 load/store optimizer.

The GlobalISel differences are caused by changes in the enum values of
register classes, tests have been updated with the new values.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D105572
2021-07-16 10:11:10 +00:00
Fraser Cormack a6ca88e908 [RISCV] Lower more BUILD_VECTOR sequences to RVV's VID
This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
2021-07-16 10:35:13 +01:00
Mehdi Amini 76374573ce Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer
We can build it with -Werror=global-constructors now. This helps
in situations where libSupport is embedded as a shared library,
potentially with a dlopen/dlclose scenario, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kinds of behavior.

Reviewed By: lattner, jpienaar

Differential Revision: https://reviews.llvm.org/D105959
2021-07-16 07:38:16 +00:00
Mehdi Amini 8d051d8546 Revert "Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer"
This reverts commit af9321739b.
Still some specific config broken in some way that requires more
investigation.
2021-07-16 07:35:13 +00:00
Mehdi Amini af9321739b Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer
We can build it with -Werror=global-constructors now. This helps
in situations where libSupport is embedded as a shared library,
potentially with a dlopen/dlclose scenario, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kinds of behavior.

Reviewed By: lattner, jpienaar

Differential Revision: https://reviews.llvm.org/D105959
2021-07-16 06:54:26 +00:00
Mehdi Amini 16b5e9d6a2 Revert "Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer"
This reverts commit 42f588f39c.
Broke some buildbots
2021-07-16 03:46:53 +00:00
Mehdi Amini 42f588f39c Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer
We can build it with -Werror=global-constructors now. This helps
in situations where libSupport is embedded as a shared library,
potentially with a dlopen/dlclose scenario, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kinds of behavior.

Reviewed By: lattner, jpienaar

Differential Revision: https://reviews.llvm.org/D105959
2021-07-16 03:33:20 +00:00
Matt Arsenault e91da668d0 GlobalISel: Track argument pointeriness with arg flags
Since we're still building on top of the MVT based infrastructure, we
need to track the pointer type/address space on the side so we can end
up with the correct pointer LLTs when interpreting CCValAssigns.
2021-07-15 19:11:40 -04:00
Victor Huang 4eb107ccba [PowerPC] Add PowerPC population count, reversed load and store related builtins and instrinsics for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and intrinsics for population
count, reversed load and store related operations.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D106021
2021-07-15 17:23:56 -05:00
Harald van Dijk a8ad917054 [X86] Fix handling of maskmovdqu in X32
The maskmovdqu instruction is an odd one: it has a 32-bit and a 64-bit
variant, the former using EDI, the latter RDI, but the use of the
register is implicit. In 64-bit mode, a 0x67 prefix can be used to get
the version using EDI, but there is no way to express this in
assembly in a single instruction; the only way is with an explicit
addr32.

This change adds support for the instruction. When generating assembly
text, that explicit addr32 will be added. When not generating assembly
text, it will be kept as a single instruction and will be emitted with
that 0x67 prefix. When parsing assembly text, it will be re-parsed as
ADDR32 followed by MASKMOVDQU64, which still results in the correct
bytes when converted to machine code.

The same applies to vmaskmovdqu as well.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103427
2021-07-15 22:56:08 +01:00
Jessica Paquette 46c8e7122b [AArch64][GlobalISel] Clamp <n x p0> vecs when legalizing G_EXTRACT_VECTOR_ELT
This case was missing from G_EXTRACT_VECTOR_ELT. It's the same as for s64.

https://godbolt.org/z/Tnq4acY8z

Differential Revision: https://reviews.llvm.org/D105952
2021-07-15 14:05:28 -07:00
Artem Belevich d774b4aa5e [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.
That should allow clang to compile mma.h from CUDA-11.3.

Differential Revision: https://reviews.llvm.org/D105384
2021-07-15 12:02:09 -07:00
Sushma Unnibhavi aaccc985a8 [M68k][GlobalISel] LegalizerInfo implementation
Added rules for G_ADD, G_SUB, G_MUL, G_UDIV to be legal.

Differential Revision: https://reviews.llvm.org/D105536
2021-07-15 13:00:43 -06:00
Sam Tebbs ff0ef6a518 [ARM][LowOverheadLoops] Make some stack spills valid for tail predication
This patch makes vector spills valid for tail predication when all loads
from the same stack slot are within the loop.

Differential Revision: https://reviews.llvm.org/D105443
2021-07-15 19:23:52 +01:00
Quinn Pham de3956605a [PowerPC] Fix popcntb XL Compat Builtin for 32bit
This patch implements the `__popcntb` XL compatibility builtin for 32bit in the frontend and backend. This patch also updates tests for `__popcntb` and other XL Compat sync related builtins.

Reviewed By: #powerpc, nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D105360
2021-07-15 13:19:47 -05:00
Stanislav Mekhanoshin c46d99e4ba [AMDGPU] Refine -O0 and -O1 passes.
Differential Revision: https://reviews.llvm.org/D105579
2021-07-15 09:51:54 -07:00
David Green dad506bd4e [ARM] Expand types handled in VQDMULH recognition
We have a DAG combine for recognizing the sequence of nodes that make up
an MVE VQDMULH, but it currently only handles specific legal types.
This patch expands that to other power-2 vector types. For smaller than
legal types this means any_extending the type and casting it to a legal
type, using a VQDMULH where we only use some of the lanes. The result is
sign extended back to the original type, to properly set the invalid
lanes. Larger than legal types are split into chunks with extracts and
concat back together.

Differential Revision: https://reviews.llvm.org/D105814
2021-07-15 14:47:53 +01:00
Simon Pilgrim 91e151476c [TTI] Consistently make getMinVectorRegisterBitWidth() methods const. NFCI.
The underlying getMinVectorRegisterBitWidth() methods are const, but it was missed in a couple of TargetTransformInfo wrappers.

Noticed while working on D103925
2021-07-15 13:27:55 +01:00
Irina Dobrescu 831ee6b0c3 [AArch64][GlobalISel] Optimise lowering for some vector types for min/max
Differential Revision: https://reviews.llvm.org/D105696
2021-07-15 11:34:32 +01:00
Sebastian Neubauer afd895709d [AMDGPU] Use isMetaInstruction for instruction size
Meta instructions have a size of 0. Use isMetaInstruction instead of
listing them explicitly.

Differential Revision: https://reviews.llvm.org/D106043
2021-07-15 12:23:11 +02:00
Cullen Rhodes dfa76933c2 [AArch64][SME] Add outer product instructions
This patch adds support for the following outer product instructions:

  * BFMOPA, BFMOPS, FMOPA, FMOPS, SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA,
    UMOPS, USMOPA, USMOPS.

Depends on D105570.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105571
2021-07-15 09:51:06 +00:00
Bogdan Graur 442123cada Fixes memory sanitizer 'use-of-uninitialized-value' diagnostic.
Differential Revision: https://reviews.llvm.org/D106047
2021-07-15 11:17:04 +02:00
Kai Luo b9c3941cd6 [PowerPC] Generate inlined quadword lock free atomic operations via AtomicExpand
This patch uses AtomicExpandPass to implement quadword lock free atomic operations. It adopts the method introduced in https://reviews.llvm.org/D47882, which expands atomic operations post-RA to avoid spilling that might prevent LL/SC progress.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D103614
2021-07-15 01:12:09 +00:00
Thomas Lively 4a4229f70f [WebAssembly] Codegen for v128.storeX_lane instructions
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.

Differential Revision: https://reviews.llvm.org/D106019
2021-07-14 16:15:25 -07:00
Jon Roelofs 0e49c54a8c [AArch64] Fix selection of G_UNMERGE <2 x s16>
Differential revision: https://reviews.llvm.org/D106007
2021-07-14 13:40:56 -07:00
Stanislav Mekhanoshin 76b7d3432e [AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization
Any def of EXEC prevents rematerialization of any VOP instruction
because of the physreg use. Create a callback to check if the
physreg use can be ignored to allow rematerialization.

Differential Revision: https://reviews.llvm.org/D105836
2021-07-14 13:03:58 -07:00
David Green 31b8f40006 [ARM] Move add(VMLALVA(A, X, Y), B) to VMLALVA(add(A, B), X, Y)
For i64 reductions we currently try and convert add(VMLALV(X, Y), B) to
VMLALVA(B, X, Y), incorporating the addition into the VMLALVA. If we
have an add of an existing VMLALVA, this patch pushes the add up above
the VMLALVA so that it may potentially be simplified further, for
example being folded into another VMLALV.
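
A standalone sketch of why the hoist is sound (hedged: `vmlalva` here is an invented scalar model of the MVE instruction, which accumulates Acc + sum(X[i] * Y[i])):

```
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar model of VMLALVA(Acc, X, Y): Acc + sum(X[i] * Y[i]).
int64_t vmlalva(int64_t Acc, const std::vector<int32_t> &X,
                const std::vector<int32_t> &Y) {
  for (size_t I = 0; I < X.size(); ++I)
    Acc += int64_t(X[I]) * Y[I];
  return Acc;
}

int main() {
  std::vector<int32_t> X{1, -2, 3, 4}, Y{5, 6, -7, 8};
  int64_t A = 100, B = 23;
  // add(VMLALVA(A, X, Y), B) == VMLALVA(add(A, B), X, Y), so the add can be
  // pushed above the VMLALVA.
  assert(vmlalva(A, X, Y) + B == vmlalva(A + B, X, Y));
}
```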

Differential Revision: https://reviews.llvm.org/D105686
2021-07-14 20:06:49 +01:00
Eli Friedman 1e30bf8621 [SelectionDAG] Add an overload of getStepVector that assumes step 1.
This is mostly a minor convenience, but the pattern seems frequent
enough to be worthwhile (and we'll probably add more uses in the
future).

Differential Revision: https://reviews.llvm.org/D105850
2021-07-14 11:37:01 -07:00
Thomas Lively 970e090010 [WebAssembly] Codegen for v128.loadX_lane instructions
Replace the experimental clang builtin and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.

Differential Revision: https://reviews.llvm.org/D105950
2021-07-14 11:31:53 -07:00
David Green 338314f9c2 [ARM] Lower v16i8 -> i64 VMLA reductions.
MVE does not have a VMLALV instruction that can perform v16i8 -> i64
reductions, like it does for v8i16->i64 and v4i32->i64 reductions. That
means that the pattern to create them will be split up by type
legalization, creating a lot of instructions.

This extends the patterns for matching i64 reductions a little to handle
the v16i8->i64 case. We need to turn them into a pair of v8i16->i64
VMLALVs that each perform half of the reduction and are summed together
(so the latter is a VMLALVA). The order of the lanes does not matter for
the reduction so we generate a MVEEXT for the extension, that will
either be folded into a extending load or can be optimized to a
VREV/VMOVL. Some of the resulting codegen isn't optimal, but will be
improved in a later patch.

Differential Revision: https://reviews.llvm.org/D105680
2021-07-14 18:11:32 +01:00
Matt Arsenault 47269da5d8 GlobalISel: Handle lowering non-power-of-2 extloads 2021-07-14 11:54:11 -04:00
Sander de Smalen eac1670739 [CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid.
At the moment, <vscale x 1 x eltty> are not yet fully handled by the
code-generator, so to avoid vectorizing loops with that VF, we mark the
cost for these types as invalid.
The reason for not adding a new "TTI::getMinimumScalableVF" is because
the type is supposed to be a type that can be legalized. It partially is,
although the support for these types needs some more work.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D103882
2021-07-14 16:44:22 +01:00