llvm-project

Commit Graph

Author	SHA1	Message	Date
Fraser Cormack	a5693445ca	[RISCV] Support OR/XOR/AND reductions on vector masks This patch adds RVV codegen support for OR/XOR/AND reductions for both scalable- and fixed-length vector types. There are a few possible codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be used to some extent -- but the vpopc.m instruction was chosen since it produces the scalar result in one instruction, after which scalar instructions can finish off the computation. The reductions are lowered identically for both scalable- and fixed-length vectors, although some alternate strategies may be more optimal on fixed-length vectors since it's cheaper to get the length of those types. Other reduction types were not deemed to be relevant for mask vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100030	2021-04-08 09:46:38 +01:00
Hsiangkai Wang	ba72bdef32	[RISCV] Add scalable offset under very large stack size. If the stack size is larger than 12 bits, we have to use a scratch register to store the stack size. Before we introduce the scalable stack offset, we could simplify %0 = ADDI %stack.0, 0 => %scratch = ... # sequence of instructions to move the offset into %%scratch %0 = ADD %fp, %scratch However, if the offset contains scalable part, we need to consider it. %0 = ADDI %stack.0, 0 => %scratch = ... # sequence of instructions to move the offset into %%scratch %scratch = ADD %fp, %scratch %scalable_offset = ... # sequence of instructions for vscaled-offset. %0 = ADD/SUB %scratch, %scalable_offset Differential Revision: https://reviews.llvm.org/D100035	2021-04-08 14:46:05 +08:00
Serge Pavlov	65b1103798	[RISCV] DAG nodes and pseudo instructions for CSR access New custom DAG nodes were added to represent operations on CSRs. These nodes are lowered to the corresponding pseudo instructions. Using the pseudo instructions allows specifying different scheduling information for operations on different system registers. It also makes it possible to specify dependencies of instructions on specific system registers. Differential Revision: https://reviews.llvm.org/D98936	2021-04-08 10:36:36 +07:00
hsmahesha	ac64995ceb	[AMDGPU] Only use ds_read/write_b128 for alignment >= 16 PS: Submitting on behalf of Jay. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100008	2021-04-08 08:12:05 +05:30
Chen Zheng	74e77295e7	[PowerPC] fixup killed flags for ri + addi to ri transformation Fixup killed flags if DefMI and MI are not in the same basic blocks. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D100023	2021-04-07 22:04:08 -04:00
Stanislav Mekhanoshin	37878de503	Disable use of SCC bit from asm Differential Revision: https://reviews.llvm.org/D100069	2021-04-07 15:32:17 -07:00
Tony Tye	4658cd4c18	[AMDGPU] Update gfx90a memory model support Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100070	2021-04-07 22:17:58 +00:00
Stanislav Mekhanoshin	d5d412f2ae	[AMDGPU] Split GCNRegBankReassign Allow pass to work separately with SGPR, VGPR registers or both. This is NFC now but will be needed to split RA for separate SGPR and VGPR passes. Differential Revision: https://reviews.llvm.org/D100063	2021-04-07 14:45:13 -07:00
Craig Topper	56ea2e2fdd	[RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition. If the constants have a difference of 1 we can convert one to the other by adding or subtracting the condition. We have a DAG combine for this, but it only runs before type legalization. If the select is introduced later during type legalization or op legalization we will miss it. We don't need a specific condition, but some conditions are harder to materialize than others on RISCV. I know that SETLT will be a single instruction and it is what is used by the motivating pattern from signed saturating add/sub. Differential Revision: https://reviews.llvm.org/D99021	2021-04-07 13:47:17 -07:00
Craig Topper	9895285191	[RISCV] Replace 'return ReplaceNode' with 'ReplaceNode; return;' NFC ReplaceNode is a void function as is the function that we were doing this in. While this is valid code, it was a bit confusing.	2021-04-07 12:18:41 -07:00
Jonas Hahnfeld	6415f424bc	[AArch64] Materialize FP constant in code for large code model When using the large code model with FastISel (for example via clang -O0 which adds the optnone attribute), FP constants could still be materialized using adrp + ldr. Unconditionally enable the existing path for MachO to materialize the constant in code. For testing, restore literal_pools_float.ll to exercise the constant pool and add two optnone-functions that return a float and a double, respectively. Consolidate fpimm.ll and add a new fast-isel-fpimm.ll to check the code paths taken with FastISel. Differential Revision: https://reviews.llvm.org/D99607	2021-04-07 21:02:05 +02:00
Craig Topper	f087d7544a	[RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32. This can't use our normal strategy of splatting the scalar and using a .vv operation instead of .vx. Instead this patch bitcasts the vector to the equivalent SEW=32 vector and inserts the scalar parts using two vslide1up/down. We do that unmasked and apply the mask separately at the end with a vmerge. For vslide1up there may be some other options here like getting i64 into element 0 and using vslideup.vi with this vector as vd and the original source as vs1. Masking would still need to be done afterwards. That idea doesn't work for vslide1down. We need to slidedown and then insert a single scalar at vl-1 which we could do with a vslideup, but that assumes vl > 0 which I don't think we can assume. The i32 double slide1down implemented here is the best I could come up with and I just made vslide1up consistent. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99910	2021-04-07 10:44:53 -07:00
Sebastian Neubauer	2dc6be5209	[AMDGPU] Update SGPRSpillVGPRCSR name. NFC The struct is now used for both callee- and caller-save registers. The frame index is not set for entrypoints, as we do not need to save the registers then. Update the struct name to reflect that. Differential Revision: https://reviews.llvm.org/D99722	2021-04-07 16:30:40 +02:00
Simon Pilgrim	302e748065	[X86] Improve optimizeCompareInstr for signed comparisons after AND/OR/XOR instructions Extend D94856 to handle 'and', 'or' and 'xor' instructions as well We still fail on many i8/i16 cases as the test and the logic-op are performed on different widths	2021-04-07 14:28:42 +01:00
Jay Foad	bf6cab6f07	[AMDGPU] SIFoldOperands: don't dump extra '\n' after MachineInstr. NFC.	2021-04-07 14:13:00 +01:00
Simon Pilgrim	583258723f	[X86] Improve optimizeCompareInstr for signed comparisons after BZHI instructions Extend D94856 to handle 'bzhi' instructions as well	2021-04-07 12:07:26 +01:00
Qiu Chaofan	033c9c2552	[PowerPC] Fix use check of swap-reduction This will fix swap-reduction in DAGISel for cases where COPY_TO_REGCLASS has multiple uses.	2021-04-07 15:55:52 +08:00
Craig Topper	01a23dccb1	[RISCV] Add an assertion to the ReplaceNodeResults handling of bitcasts to make sure the VT is always a scalar integer.	2021-04-06 16:48:40 -07:00
Nicolás Alvarez	a1aada75f5	[docs] Fix doxygen comments wrongly attached to the llvm namespace Looking at the Doxygen-generated documentation for the llvm namespace currently shows all sorts of random comments from different parts of the codebase. These are mostly caused by: - File doc comments that aren't marked with \file, so they're attached to the next declaration, which is usually "namespace llvm {". - Class doc comments placed before the namespace rather than before the class. - Code comments before the namespace that (in my opinion) shouldn't be extracted by doxygen at all. This commit fixes these comments. The generated doxygen documentation now has proper docs for several classes and files, and the docs for the llvm and llvm::detail namespaces are now empty. Reviewed By: thakis, mizvekov Differential Revision: https://reviews.llvm.org/D96736	2021-04-07 01:20:18 +02:00
Craig Topper	2641c1f15e	[RISCV] Don't custom type legalize fixed vector to scalar integer bitcasts if the fixed vector type isn't legal. We encountered a hang in our internal code base. I'm having trouble creating a test case because the test that hit it was testing some code that is not upstream.	2021-04-06 15:00:33 -07:00
Artem Belevich	d0615a93bb	[NVPTX] Handle bitcast and ASC(101) when trying to avoid argument copy. This allows us to skip the copy in a few more cases. Differential Revision: https://reviews.llvm.org/D99979	2021-04-06 13:06:00 -07:00
Amy Kwan	bd6033eca7	[PowerPC] Materialize 34-bit constants with pli directly Previously, 34-bit constants were materialized in selectI64Imm(), and we relied on td pattern matching to instead produce a pli. This becomes problematic as there is no guarantee that the 34-bit constant will reach the td pattern selection for pli. It is also possible for other transformations (such as complex bit permutations) to also produce and utilize the 34-bit constant materialized through selectI64Imm(). This patch instead produces pli on Power10 directly whenever the constant fits within 34-bits. Differential Revision: https://reviews.llvm.org/D99906	2021-04-06 13:38:11 -05:00
Craig Topper	3ae03f67fe	[RISCV] Add helper function to share some of the code for isel of vector load/store intrinsics. Many of the operands are handled the same or in the same order for all these intrinsics. Factor out the code for selecting and pushing them into the Operands vector. Differential Revision: https://reviews.llvm.org/D99923	2021-04-06 09:54:24 -07:00
Jay Foad	8f798566a3	[AMDGPU] SIFoldOperands: use isUseMIInFoldList. NFC.	2021-04-06 17:53:48 +01:00
Simon Pilgrim	53283cc2f1	[X86][SSE] canonicalizeShuffleWithBinOps - add MOVSD/MOVSS handling.	2021-04-06 16:42:18 +01:00
Konstantin Zhuravlyov	844012940e	AMDGPU: Add isBranch=1 to SOPP branch instructions Differential Revision: https://reviews.llvm.org/D99955	2021-04-06 10:59:30 -04:00
Jay Foad	efc7bf27f5	[AMDGPU] SIFoldOperands: use MachineRegisterInfo::hasOneNonDBGUser NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	005dcd196e	[AMDGPU] SIFoldOperands: use range-based loops and make_early_inc_range NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	ce9cca6c3a	[AMDGPU] SIFoldOperands: rename tryFoldInst to tryFoldCndMask This follows the pattern of the other tryFold* functions. NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	cf4f5292f6	[AMDGPU] SIFoldOperands: use getVRegDef instead of getUniqueVRegDef We are in SSA so getVRegDef is equivalent but simpler. NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	e9608a84d8	[AMDGPU][SDag] Add IMG init also for image_gather4 instructions This fixes an oversight in D99747 which moved the IMG init code from SIAddIMGInit to AdjustInstrPostInstrSelection, but did not set the hasPostISelHook flag on gather4 instructions. Differential Revision: https://reviews.llvm.org/D99953	2021-04-06 14:47:20 +01:00
Simon Pilgrim	1dcb5b5e89	[X86] Improve optimizeCompareInstr for signed comparisons after ANDN instructions Extend D94856 to handle 'andn' instructions as well	2021-04-06 14:16:16 +01:00
Dmitry Preobrazhensky	3eadcb86ab	[AMDGPU][MC][GFX9] Corrected SMEM decoding Corrected SMEM decoding when IMM=0 and OFFSET>127 Fixed bug 49819 (https://bugs.llvm.org/show_bug.cgi?id=49819) Differential Revision: https://reviews.llvm.org/D99804	2021-04-06 14:10:46 +03:00
Simon Pilgrim	201877d572	[CostModel][X86] Improve accuracy of vXi8 multiply reduction costs After rG47321c311bdbe0145b9bf45d822185c37b19fa50 we promote vXi8 reductions to vXi16 to create a much faster PMULLW mul reduction, followed by a (free) truncation. This avoids the high cost of repeated vXi8 multiplications (which extend+multiply+truncate to/from vXi16 types....). Fixes the missing vXi8 mul reduction vectorization in PR42674 (Comment #20) 'mul16' test case.	2021-04-06 11:53:22 +01:00
Simon Pilgrim	ddbb58736a	[KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI. As promised in D98866	2021-04-06 10:11:41 +01:00
Sjoerd Meijer	d5f1131c81	[AArch64] Default to zero-cycle-zeroing FP registers It is generally beneficial to prefer "movi d0, #0" over "fmov s0, wzr" as this is most efficient across all cores; it is recognised as a zeroing idiom. For newer cores, fmov instructions can also be eliminated early and there is no difference with movi, but some implementations lack this, so that is not true for other/older cores. Thus this standardises on using movi, as this should always give the same or better performance than the fmov with wzr. Differential Revision: https://reviews.llvm.org/D99586	2021-04-06 09:47:50 +01:00
Sjoerd Meijer	ef05b08c61	[AArch64] Use 64-bit movi for zeroing halfs/floats This was using the .2d variant which zeros 128 bits, but using the .2s variant that zeros 64 bits is faster on some cores. This is a prep step for D99586 to always use movi for zeroing floats. Differential Revision: https://reviews.llvm.org/D99710	2021-04-06 08:42:13 +01:00
Craig Topper	cb1028a0b9	[RISCV] When custom iseling masked stores, copy the mask into V0 instead of virtual register. I missed a few intrinsics in `3dd4aa7d09` when I did this for masked loads and masked segment loads/stores. Found while trying to share more code between these custom isel functions.	2021-04-05 21:28:32 -07:00
Craig Topper	780a47285a	[RISCV] Add SDTCisInt to the SDTRVVSlide1 since it is only used for vslide1up.vx/vslide1down.vx. The scalar type is already marked as XLenVT. The floating point version would need a different rule.	2021-04-05 13:03:39 -07:00
Craig Topper	af2837675a	[RISCV] Split RISCVISD::VMV_S_XF_VL into separate integer and FP. It's a bit silly, but it allows us to write stricter type constraints for isel. There are still some extra type checks in the generated table due to some type inference limitations around HWMode.	2021-04-05 12:57:35 -07:00
Craig Topper	7edda698c0	[RISCV] Move VSLIDE1UP_VX pattern out of a loop that includes FP types. FP would need VFSLIDE1UP_VF which uses an FP register.	2021-04-05 12:05:54 -07:00
Ricky Taylor	4db18d62af	[M68k] Add support for Motorola literal syntax to AsmParser These look like $00A0cf for hex and %001010101 for binary. They are used in Motorola assembly syntax. Differential Revision: https://reviews.llvm.org/D98519	2021-04-05 20:02:29 +01:00
Fraser Cormack	af3a839c70	[RISCV] Add support for bitcasts between scalars and fixed-length vectors This patch supports bitcasts from scalar types to fixed-length vectors and vice versa. It custom-lowers and custom-legalizes them to EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using a single-element vectors to hold the scalar where appropriate. Previously, some of these would fail to select, others would be expanded through stack loads and stores. Effort was made to ensure the codegen avoids the stack for both legal and illegal scalar types. Some of the codegen could be improved, but on first glance it looks like a general optimization of EXTRACT_VECTOR_ELT when extracting an i64 element on RV32. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99667	2021-04-05 17:21:55 +01:00
John Paul Adrian Glaubitz	62a94b725c	[M68k] Mark public functions with the LLVM_EXTERNAL_VISIBILITY macro In `0dbcb36394`, most target symbols were made hidden by default with the public ones marked with LLVM_EXTERNAL_VISIBILITY. When the M68k target was added, this particular change was forgotten so that external tools cannot make use of the public M68k target functions in libLLVM.so. Thus, add the missing LLVM_EXTERNAL_VISIBILITY macro to all public target functions in the M68k backend. Differential Revision: https://reviews.llvm.org/D99869	2021-04-05 09:24:30 -07:00
Fraser Cormack	3f0df4d7b0	[RISCV] Expand scalable-vector truncstores and extloads Caught in internal testing, these operations are assumed legal by default, even for scalable vector types. Expand them back into separate truncations and stores, or loads and extensions. Also add explicit fixed-length vector tests for these operations, even though they should have been correct already. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99654	2021-04-05 17:03:45 +01:00
Simon Pilgrim	36d4f6d7f8	[X86] Fold xor(zext(xor(x,c1)),c2) -> xor(zext(x),xor(zext(c1),c2)) Fixes PR47603 (second case) by extending rG89afec348dbd3e5078f176e978971ee2d3b5dec8	2021-04-05 11:40:37 +01:00
Craig Topper	4708a05da0	[RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled. The W version of orc.b does not exist in Zbp so we need to use gorci encoding. If we have Zbp, we can use gorciw which can avoid a sext.w in some cases.	2021-04-04 17:14:28 -07:00
Craig Topper	98d5db3e3a	[RISCV] Lower orc.b intrinsic to RISCVISD::GORCI. This will allow us to share any future known bits, demanded bits, or sign bits improvements.	2021-04-04 12:31:41 -07:00
Craig Topper	a2ea003fcb	[RISCV] Don't convert fshr/fshl to target specific FSL/FSR node if shift amount is a constant. As long as it's a constant we can directly pattern match it without any problems. It's only when it isn't a constant that we need to add an AND. In theory this should allow more target independent optimizations to remain active.	2021-04-03 23:13:30 -07:00
Roman Lebedev	7727cc242d	[NFC][X86] Split VPMOV* AVX2 instructions into their own sched class At least on all three Zen's, all such instructions cleanly map into this new class with no overrides needed.	2021-04-03 22:39:07 +03:00
Nikita Popov	665065821e	[FastISel] Remove kill tracking This is a followup to D98145: As far as I know, tracking of kill flags in FastISel is just a compile-time optimization. However, I'm not actually seeing any compile-time regression when removing the tracking. This probably used to be more important in the past, before FastRA was switched to allocate instructions in reverse order, which means that it discovers kills as a matter of course. As such, the kill tracking doesn't really seem to serve a purpose anymore, and just adds additional complexity and potential for errors. This patch removes it entirely. The primary changes are dropping the hasTrivialKill() method and removing the kill arguments from the emitFast methods. The rest is mechanical fixup. Differential Revision: https://reviews.llvm.org/D98294	2021-04-03 15:50:13 +02:00
Simon Pilgrim	89afec348d	[X86] Fold xor(truncate(xor(x,c1)),c2) -> xor(truncate(x),xor(truncate(c1),c2)) Fixes PR47603 This should probably be transferable to DAGCombine - the main limitation with the existing trunc(logicop) DAG fold is we don't know if legalization has tried to promote truncated logicops already. We might be able to peek through extensions as well.	2021-04-03 12:43:05 +01:00
Simon Pilgrim	7c17f1ea84	[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper (REAPPLIED) Use the getTargetShuffleInputs helper for all shuffle decoding Reapplied (after reversion in rGfa0aff6d6960) with fix+test for subvector splitting - we weren't accounting for peeking through bitcasts changing the vector element count of the shuffle sources.	2021-04-03 11:59:19 +01:00
Levy Hsu	f78d932cf2	[RISCV] Add IR intrinsics for Zbc extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: clmul clmulh clmulr Differential Revision: https://reviews.llvm.org/D99711	2021-04-02 12:09:13 -07:00
Levy Hsu	944adbf285	Recommit "[RISCV] Add IR intrinsic for Zbb extension" Forgot to amend the Author. Original commit message: Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b Differential Revision: https://reviews.llvm.org/D99320	2021-04-02 11:50:19 -07:00
Craig Topper	1f0b309f24	Revert "[RISCV] Add IR intrinsic for Zbb extension" This reverts commit `1808194590`. I forgot to change the author.	2021-04-02 11:47:02 -07:00
Craig Topper	1808194590	[RISCV] Add IR intrinsic for Zbb extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b	2021-04-02 11:23:57 -07:00
Levy Hsu	b001d574d7	[RISCV] Add IR intrinsic for Zbr extension Implementation for RISC-V Zbr extension intrinsic. Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: crc32b crc32h crc32w crc32cb crc32ch crc32cw RV64 Only: crc32d crc32cd Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99009	2021-04-02 10:58:45 -07:00
Craig Topper	d7ffa82a8e	[RISCV] Improve 64-bit integer constant materialization for more cases. For positive constants we try shifting left to remove leading zeros and fill the bottom bits with 1s. We then materialize that constant and shift it right. This patch adds a new strategy to try filling the bottom bits with zeros instead. This catches some additional cases.	2021-04-02 10:18:08 -07:00
Brendon Cahoon	09a88278cb	[GlobalISel] Allow different types for G_SBFX and G_UBFX operands Change the definition of G_SBFX and G_UBFX so that the lsb and width can have different types than the src and dst operands. Differential Revision: https://reviews.llvm.org/D99739	2021-04-02 11:11:06 -04:00
Nico Weber	fa0aff6d69	Revert "[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper" This reverts commit `500969f1d0`. Makes clang assert compiling avx2 code, see https://bugs.chromium.org/p/chromium/issues/detail?id=1195353#c4 for a standalone repro.	2021-04-02 09:55:55 -04:00
Jun Ma	274ac9d40e	[AArch64][SVE] Lowering sve.dot to DOT node Differential Revision: https://reviews.llvm.org/D99699	2021-04-02 20:05:17 +08:00
Jun Ma	ab3c5fb282	[NFC][SVE] Use SVE_4_Op_Imm_Pat for sve_intx_dot_by_indexed_elem	2021-04-02 20:05:17 +08:00
Simon Pilgrim	500969f1d0	[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper Use the getTargetShuffleInputs helper for all shuffle decoding	2021-04-02 11:50:18 +01:00
Fraser Cormack	3b48d849d4	[RISCV] Optimize more redundant VSETVLIs D99717 introduced some test cases which showed that the output of one vsetvli into another would not be picked up by the RISCVCleanupVSETVLI pass. This patch teaches the optimization about such a pattern. The pattern is quite common when using the RVV vsetvli intrinsic to pass the VL onto other intrinsics. The second test case introduced by D99717 is left unoptimized by this patch. It is a rarer case and will require us to rewire any uses of the redundant vset[i]vli's output to the previous one's. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99730	2021-04-02 10:04:07 +01:00
Yang Fan	bc6001ce1e	[X86] Fix -Wunused-function warning (NFC) GCC warning: ``` /llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:9212:13: warning: ‘bool isHorizOp(unsigned int)’ defined but not used [-Wunused-function] 9212 \| static bool isHorizOp(unsigned Opcode) { \| ^~~~~~~~~ ```	2021-04-02 09:38:12 +08:00
Craig Topper	766d27dc85	[RISCV] Add isel patterns to handle vrsub intrinsic with 2 vector operands. This occurs when we type legalize an i64 scalar input on RV32. We need to manually splat, which requires a vector input. Rather than special case this in lowering just pattern match it.	2021-04-01 14:10:21 -07:00
David Green	da98177cda	[ARM] Allow v6m runtime loop unrolling This removes the restriction that only Thumb2 targets enable runtime loop unrolling, allowing it for Thumb1 only cores as well. The existing T2 heuristics are used (for the time being) to control when and how unrolling is performed. Differential Revision: https://reviews.llvm.org/D99588	2021-04-01 21:21:40 +01:00
Craig Topper	dbbc95e3e5	[RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat. The default legalization strategy is PromoteFloat which keeps half in single precision format through multiple floating point operations. Conversion to/from float is done at loads, stores, bitcasts, and other places that care about the exact size being 16 bits. This patch switches to the alternative method softPromoteHalf. This aims to keep the type in 16-bit format between every operation. So we promote to float and immediately round for any arithmetic operation. This should be closer to the IR semantics since we are rounding after each operation and not accumulating extra precision across multiple operations. X86 is the only other target that enables this today. See https://reviews.llvm.org/D73749 I had to update getRegisterTypeForCallingConv to force f16 to use f32 when the F extension is enabled. This way we can still pass it in the lower bits of an FPR for ilp32f and lp64f ABIs. The softPromoteHalf would otherwise always give i16 as the argument type. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D99148	2021-04-01 12:41:57 -07:00
Martin Storsjö	4391d764e1	[ARM] Remove an unused parameter in ARMWinCOFFObjectWriter. NFC. This writer only ever operates on 32 bit arm code. Differential Revision: https://reviews.llvm.org/D99575	2021-04-01 21:25:41 +03:00
Nick Desaulniers	52338af569	[MC][ARM] add .w suffixes for RSB/RSBS T1 See also: F5.1.167 RSB, RSBS (register) T1 shift or rotate by value variant of the Arm ARM. Link: https://github.com/ClangBuiltLinux/linux/issues/1309 Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D99542	2021-04-01 10:45:37 -07:00
Craig Topper	d157e3f387	[RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32. We need to splat the scalar separately and use .vv, but there is no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with swapped operands. We also need to get VT to use for the splat from an operand rather than the result since the result VT is nxvXi1. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99704	2021-04-01 10:38:05 -07:00
Nick Desaulniers	1addc231cd	[MC][ARM] add .w suffixes for ORN/ORNS T1 See also: F5.1.128 ORN, ORNS (register) T1 shift or rotate by value variant of the Arm ARM. Link: https://github.com/ClangBuiltLinux/linux/issues/1309 Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D99538	2021-04-01 10:27:09 -07:00
Craig Topper	b7c2e577cc	[RISCV] Add custom type legalization to form MULHSU when possible. There's no target independent ISD opcode for MULHSU, so custom legalize 2*XLen multiplies ourselves. We have to be a little careful to prefer MULHU or MULHSU. I thought about doing this in isel by pattern matching the (add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided against this because the add might become part of a chain of adds. I don't trust DAG combine not to reassociate with other adds making it difficult to find both pieces again. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D99479	2021-04-01 10:15:55 -07:00
Jay Foad	fdc4f19e2f	[AMDGPU] Remove SIAddIMGInit pass which is now unused Differential Revision: https://reviews.llvm.org/D99748	2021-04-01 18:13:17 +01:00
Jay Foad	3d07a6d891	[AMDGPU][GlobalISel] Add IMG init in selectImageIntrinsic Doing this during instruction selection avoids the cost of running SIAddIMGInit which is yet another pass over the MIR. Differential Revision: https://reviews.llvm.org/D99670	2021-04-01 18:13:17 +01:00
Jay Foad	4af6251cea	[AMDGPU][SDag] Add IMG init in AdjustInstrPostInstrSelection Doing this in a post-isel hook avoids the cost of running SIAddIMGInit which is yet another pass over the MIR. Differential Revision: https://reviews.llvm.org/D99747	2021-04-01 18:13:17 +01:00
Craig Topper	d61b40ed27	[RISCV] Improve 64-bit integer materialization for some cases. This adds a new integer materialization strategy mainly targeted at 64-bit constants like 0xffffffff where there are 32 or more trailing ones with leading zeros. We can materialize these by using an addi -1 and srli to restore the leading zeros. This matches what gcc does. I haven't limited to just these cases though. The implementation here takes the constant, shifts out all the leading zeros and shifts ones into the LSBs, creates the new sequence, adds an srli, and checks if this is shorter than our original strategy. I've separated the recursive portion into a standalone function so I could append the new strategy outside of the recursion. Since external users are no longer using the recursive function, I've cleaned up the external interface to return the sequence instead of taking a vector by reference. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D98821	2021-04-01 09:12:52 -07:00
Jay Foad	b1fbfd9e4c	[AMDGPU] Small cleanup to constructRetValue and its caller. NFC.	2021-04-01 16:36:16 +01:00
Mircea Trofin	ce61def529	[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs() needs to be called before iterating the interfering live ranges. The rest of the patch helps enforce that: instead of clearing the query's InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix is invalidated (existing triggering mechanism). Without the change in RegAllocGreedy.cpp, the compiler ICEs. This patch should make it more easily discoverable by developers that collectInterferringVregs needs to be called before iterating. I will follow up with a subsequent patch to improve the usability and maintainability of Query. Differential Revision: https://reviews.llvm.org/D98232	2021-04-01 08:33:28 -07:00
Bradley Smith	2f45e632c0	[AArch64][SVE] Improve codegen for select nodes with fixed types Additionally, move the existing fixed vselect tests to *-vselect.ll. Differential Revision: https://reviews.llvm.org/D99418	2021-04-01 15:54:37 +01:00
Bradley Smith	0934fa4f5d	[AArch64][SVE] SVE functions should use the SVE calling convention for fast calls When an SVE function calls another SVE function using the C calling convention we use the more efficient SVE VectorCall PCS. However, for the Fast calling convention we're incorrectly falling back to the generic AArch64 PCS. This patch adds the same "can use SVE vector calling convention" detection used by CallingConv::C to CallingConv::Fast. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D99657	2021-04-01 15:52:08 +01:00
Brendon Cahoon	65c8bfb509	[AMDGPU] Enable output modifiers for double precision instructions Update SIFoldOperands pass to recognize v_add_f64 and v_mul_f64 instructions for folding output modifiers. Differential Revision: https://reviews.llvm.org/D99505	2021-04-01 10:08:17 -04:00
Dmitry Preobrazhensky	cd953434f2	[AMDGPU][MC][GFX10][GFX90A] Corrected _e32/_e64 suffixes Fixed bugs https://bugs.llvm.org//show_bug.cgi?id=49643, https://bugs.llvm.org//show_bug.cgi?id=49644, https://bugs.llvm.org//show_bug.cgi?id=49645. Differential Revision: https://reviews.llvm.org/D99413	2021-04-01 14:21:00 +03:00
Simon Pilgrim	abbe80fa52	[X86][SSE] Fold HOP(HOP(X,X),HOP(Y,Y)) -> HOP(PERMUTE(HOP(X,Y)),PERMUTE(HOP(X,Y))) For slow-hop targets, attempt to merge HADD/SUB pairs used in chains.	2021-04-01 11:54:10 +01:00
Simon Pilgrim	301319840e	[X86][SSE] Enable (F)HADD/SUB handling to SimplifyMultipleUseDemandedVectorElts Attempt to bypass unused horiz-op operands. This is very similar to the PACKSS/PACKUS handling - we should try to merge these.	2021-04-01 11:54:09 +01:00
Simon Pilgrim	f7aeaced65	[X86][SSE] Add isHorizOp helper function. NFCI.	2021-04-01 11:54:09 +01:00
Dmitry Preobrazhensky	0f5ebbcc7f	[AMDGPU][MC] Added flag to identify VOP instructions which have a single variant By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086. This fix simplifies handling of single VOP instructions by adding a dedicated flag. Differential Revision: https://reviews.llvm.org/D99408	2021-04-01 13:53:12 +03:00
Sam Parker	92e7771483	[WebAssembly] Invert branch condition on xor input A frequent pattern for floating point conditional branches uses an xor to invert the input for the branch. Instead, we can fold away the xor by swapping the branch target. Differential Revision: https://reviews.llvm.org/D99171	2021-04-01 09:23:28 +01:00
Craig Topper	c88ee1a094	[RISCV] Add UnsupportedSchedZfh multiclass to reduce duplicate lines from RISCVSchedRocket.td and RISCVSchedSiFive7.td. NFC	2021-03-31 15:06:14 -07:00
YangKeao	1c268a8ff4	[X86] add dwarf annotation for inline stack probe While probing the stack, the stack register is moved without dwarf information, which could cause a panic when unwinding the backtrace. This commit only adds annotation for the inline stack probe case. Dwarf information for the loop case should be done in another patch and needs further discussion. Reviewed By: nagisa Differential Revision: https://reviews.llvm.org/D99579	2021-04-01 00:32:50 +03:00
Thomas Lively	45783d0e8a	[WebAssembly] Implement i64x2 comparisons Removes the prototype builtin and intrinsic for i64x2.eq and implements that instruction as well as the other i64x2 comparison instructions in the final SIMD spec. Unsigned comparisons were not included in the final spec, so they still need to be scalarized via a custom lowering. Differential Revision: https://reviews.llvm.org/D99623	2021-03-31 10:46:17 -07:00
Craig Topper	437958d9fd	[X86] Improve SMULO/UMULO codegen for vXi8 vectors. The default expansion creates a MUL and either a MULHS/MULHU. Each of those separately expand to sequences that use one or more PMULLW instructions as well as additional instructions to extend the types to vXi16. The MULHS/MULHU expansion computes the whole 16-bit product, but only keeps the high part. We can improve the lowering of SMULO/UMULO for some cases by using the MULHS/MULHU expansion, but keep both the high and low parts. And we can use those parts to calculate the overflow. For AVX512 we might have vXi1 overflow outputs. We can improve those by using vpcmpeqw to produce a k register if AVX512BW is enabled. This is a little better than truncating the high result to use vpcmpeqb. If we don't have avx512bw we can extend up to v16i32 to use vpcmpeqd to produce a k register. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97624	2021-03-31 10:13:50 -07:00
Shimin Cui	00c0c8c87d	[PowerPC] [MLICM] Enable hoisting of caller preserved registers on AIX On ppc64 Linux, MachineLICM will hoist caller preserved registers, including TOC loads of the global variable address, out of loops. This patch enables the same on AIX for both ppc64 and ppc32. Differential Revision: https://reviews.llvm.org/D99076	2021-03-31 12:46:25 -04:00
Craig Topper	50b8634a99	[X86] Improve optimizeCompareInstr for signed comparisons after BMI/TBM instructions We previously couldn't optimize out a TEST if the branch/setcc/cmov used the overflow flag. This patch allows the TEST to be removed if the flag producing instruction is known to clear the OF flag. That's what the TEST instruction would have done, so that should be equivalent. Need to add test cases. I'll try to get back to this if I have bandwidth. Fixes PR48768. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D94856	2021-03-31 09:45:29 -07:00
Craig Topper	2a8b7cab6a	[RISCV] Add RISCVISD opcodes for CLZW and CTZW. Our CLZW isel pattern is quite easily broken by surrounding code preventing it from matching sometimes. This usually results in failing to remove the and X, 0xffffffff inserted by type legalization. The add with -32 that type legalization also inserts will often get combined into other add/sub nodes. That doesn't usually result in extra code when we don't use clzw. CTTZ seems to be less fragile, but I wanted to keep it consistent with CTLZ. Reviewed By: asb, HsiangKai Differential Revision: https://reviews.llvm.org/D99317	2021-03-31 09:40:07 -07:00
Craig Topper	04f10ab367	[RISCV] Add isel patterns to select vsub_vx intrinsic to vadd.vi if it uses a small enough immediate Also modify the simm5_plus1 check because Imm-1 is UB if Imm happens to be INT64_MIN. I don't think the compiler would optimize based on that in this usage, but it could fail UBSan or -ftrapv. Reviewed By: HsiangKai, frasercrmck Differential Revision: https://reviews.llvm.org/D99637	2021-03-31 09:26:41 -07:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Sander de Smalen	2f6f249a49	NFC: Change getIntrinsicInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D97468 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D97469	2021-03-31 14:04:41 +01:00
Sander de Smalen	2f56e1c6b1	NFC: Change getTypeBasedIntrinsicCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D97466 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D97468	2021-03-31 14:04:41 +01:00
Roman Lebedev	ce548aa236	[X86] AMD Zen 3 has macro fusion This is an improvement over Zen 2, where only branch fusion is supported, as per Agner, 21.4 Instruction fusion. AMD SOG 17h has no mention of fusion. AMD SOG 19h, 2.9.3 Branch Fusion The following flag writing instructions support branch fusion with their reg/reg, reg/imm and reg/mem forms * CMP * TEST * SUB * ADD * INC (no fusion with branches dependent on CF) * DEC (no fusion with branches dependent on CF) * OR * AND * XOR Agner, 22.4 Instruction fusion <...> This applies to CMP, TEST, ADD, SUB, AND, OR, XOR, INC, DEC and all conditional jumps, except if the arithmetic or logic instruction has a rip-relative address or both an address displacement and an immediate operand.	2021-03-31 14:31:50 +03:00
Fraser Cormack	10fc6e4358	[RISCV] Add support for the stepvector intrinsic This adds almost everything required for supporting the new stepvector intrinsic on RVV. It is lowered to the existing VID_VL SDNode. The only exception is a limitation that RV32 cannot yet lower the intrinsic on i64 vectors. This is because the step operand is (currently) required to be at least as large as the vector element type. I will look into patching that out and loosening the requirement to only an integer pointer type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99594	2021-03-31 11:41:17 +01:00
Jay Foad	5d0e9ddfa5	[AMDGPU][GlobalISel] Add support for global atomicrmw fadd This includes gfx908 which only has a no-return version of the global_atomic_add_f32 instruction, using the same hack that was previously implemented for selecting from the llvm.amdgcn.global.atomic.fadd intrinsic. Differential Revision: https://reviews.llvm.org/D97767	2021-03-31 11:13:00 +01:00
Florian Hahn	52e015081a	[AArch64] Avoid SCALAR_TO_VECTOR for single FP constant vector. Currently the code only checks for integer constants (ConstantSDNode) and triggers an infinite cycle for single-element floating point vector constants. We need to check for both FP and integer constants. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D99384	2021-03-31 10:17:36 +01:00
Sander de Smalen	3ccbd4f3c7	NFC: Change getUserCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D97382 Reviewed By: ctetreau, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97466	2021-03-31 10:13:09 +01:00
Craig Topper	5db19cc010	[RISCV] simm12_plus1 should not inherit from Operand. NFC We only use this in Pat patterns, so it just needs to be an ImmLeaf. If we did need it as an instruction operand, the ParserMatchClass, EncoderMethod, and DecoderMethod were probably wrong.	2021-03-30 19:02:11 -07:00
Craig Topper	05998701b9	[RISCV] Remove some unused ImmLeafs. NFC These got left behind when we switched RV32 to use selectImm to match RV64.	2021-03-30 18:54:11 -07:00
David Green	3a6365a439	[ARM] Add FeatureHasNoBranchPredictor for Thumb1 cores Mark v6m/v8m-baseline cores as having no branch predictors. This should not alter very much on its own, but is more correct as the cores do not have branch predictors and can help in the future.	2021-03-30 21:45:26 +01:00
Fangrui Song	73adc05ced	[GlobalISel] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D99463	2021-03-30 12:52:56 -07:00
Amara Emerson	a35c2c7942	[GlobalISel] Implement fewerElements legalization for vector reductions. This patch adds 3 methods, one for power-of-2 vectors which use tree reductions using vector ops, before a final reduction op. For non-pow-2 types it generates multiple narrow reductions and combines the values with scalar ops. Differential Revision: https://reviews.llvm.org/D97163	2021-03-30 11:19:21 -07:00
Amara Emerson	1bc90847ee	[AArch64][GlobalISel] Define some legalization rules for G_ROTR and G_ROTL. For imported pattern purposes, we have a custom rule that promotes the rotate amount to 64b as well. Differential Revision: https://reviews.llvm.org/D99463	2021-03-30 11:11:19 -07:00
Jessica Paquette	700431128e	[GlobalISel][AArch64] Combine G_SEXT_INREG + right shift -> G_SBFX Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG. This is only done post-legalization for now. Once the legalizer knows how to decompose these back into shifts, this requirement can probably be removed. Differential Revision: https://reviews.llvm.org/D99230	2021-03-30 10:14:30 -07:00
Craig Topper	a33fcafaf0	[RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not. Without Zfh the half type isn't legal, but it could still be used as an argument/return in IR. Clang will not generate this today. Previously we promoted the half value to float for arguments and returns if the F extension is enabled but Zfh isn't. Then depending on which ABI is enabled we would pass it in either an FPR or a GPR in float format. If the F extension isn't enabled, it would get passed in the lower 16 bits of a GPR in half format. With this patch the value will always be in half format and will be in the lower bits of a GPR or FPR. This should be consistent with where the bits are located when Zfh is enabled. I've based this implementation off of how this is done on ARM. I've manually nan-boxed the value to 32 bits using integer ops. It looks like flw, fsw, fmv.s, fmv.w.x, fmv.x.w won't canonicalize nans so should leave the value alone. I think those are the instructions that could get used on this value. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D98670	2021-03-30 09:47:54 -07:00
Tomas Matheson	a9968c0a33	[NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions Currently needsStackRealignment returns false if canRealignStack returns false. This means that the behavior of needsStackRealignment does not correspond to its name and description; a function might need stack realignment, but if it is not possible then this function returns false. Furthermore, needsStackRealignment is not virtual and therefore some backends have made use of canRealignStack to indicate whether a function needs stack realignment. This patch attempts to clarify the situation by separating them and introducing new names: - shouldRealignStack - true if there is any reason the stack should be realigned - canRealignStack - true if we are still able to realign the stack (e.g. we can still reserve/have reserved a frame pointer) - hasStackRealignment = shouldRealignStack && canRealignStack (not target customisable) Targets can now override shouldRealignStack to indicate that stack realignment is required. This change will make it easier in a future change to handle the case where we need to realign the stack but can't do so (for example when the register allocator creates an aligned spill after the frame pointer has been eliminated). Differential Revision: https://reviews.llvm.org/D98716 Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87	2021-03-30 17:31:39 +01:00
Craig Topper	f069000b43	[RISCV] Remove floating point condition code legalization from lowerFixedLengthVectorSetccToRVV. After D98939, this is done by LegalizeVectorOps making this code dead. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99519	2021-03-30 09:11:56 -07:00
Sebastian Neubauer	1c3b74f0ab	[AMDGPU] Remove outdated TODOs. NFC spillSGPRToVGPR is already respected in these places since D95768. Differential Revision: https://reviews.llvm.org/D99570	2021-03-30 15:18:49 +02:00
Sanjay Patel	e694e19a79	[x86] enhance matching of pmaddwd This was crashing with the example from: https://llvm.org/PR49716 ...and that was avoided with `a283d72583` , but as we can see from the SSE vs. AVX test code diff, we can try harder to match the pattern. This matcher code was adapted from another pmadd pattern match in D49636, but it needs different ops to deal with size mismatches. Differential Revision: https://reviews.llvm.org/D99531	2021-03-30 07:28:33 -04:00
David Green	d4b3380dfe	[ARM] Handle Splats in MVE lane interleaving As another addition to MVE lane interleaving, this handles Splat shuffle vectors, as the shuffle of a splat is a splat. Differential Revision: https://reviews.llvm.org/D97291	2021-03-30 11:19:16 +01:00
Joe Ellis	a7dde4c5f7	[AArch64][SVE] Lower fixed length INSERT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98496	2021-03-30 09:37:11 +00:00
Joe Ellis	c4d39f64d0	[AArch64][SVE] Lower fixed length EXTRACT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98625	2021-03-30 09:35:44 +00:00
Bing1 Yu	0c63b862c4	Revert "[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation" This reverts commit `275df61f04`.	2021-03-30 16:33:07 +08:00
Bing1 Yu	275df61f04	[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D99244	2021-03-30 16:21:10 +08:00
Jun Ma	65462a08bf	[NFC][SVE] Remove redundant pattern	2021-03-30 10:35:08 +08:00
Jun Ma	1af373c673	[AArch64][SVE] Codegen dup_lane for dup(vector_extract) Differential Revision: https://reviews.llvm.org/D99324	2021-03-30 10:35:08 +08:00
Jun Ma	b0db2dbc29	[AArch64][SVEIntrinsicOpts] Optimize tbl+dup into dup+extractelement Differential Revision: https://reviews.llvm.org/D99412	2021-03-30 10:35:08 +08:00
Evandro Menezes	fd94cfeeb5	[RISCV] Move scheduling resources for B into a separate file (NFC) Differential Revision: https://reviews.llvm.org/D99557	2021-03-29 20:37:22 -05:00
Thomas Lively	a1b8b0739a	[WebAssembly] Fix i8x16.popcnt opcode When I updated the SIMD opcodes in `f5764a8654`, I accidentally missed updating i8x16.popcnt. This patch fixes the omission. Differential Revision: https://reviews.llvm.org/D99536	2021-03-29 17:23:15 -07:00
Florian Hahn	482283042f	[AArch64] Remove custom zext/sext legalization code. Currently performExtendCombine assumes that the src-element bitwidth * 2 is a valid MVT. But this is not the case for i1 and it causes a crash on the v64i1 test cases added in this patch. It turns out that this code appears to not be needed; the same patterns are handled by other code and we end up with the same results, even without the custom lowering. I also added additional test cases in `a50037aaa6`. Let's just remove the unneeded code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99437	2021-03-29 22:22:05 +01:00
Nikita Popov	7669455df4	[X86][FastISel] Fix with.overflow eflags clobber (PR49587) If the successor block has a phi node, then additional moves may be inserted into predecessors, which may clobber eflags. Don't try to fold the with.overflow result into the branch in that case. This is done by explicitly checking for any phis in successor blocks, not sure if there's some more principled way to address this. Other fused compare and branch patterns avoid the issue by emitting the comparison when handling the branch, so that no instructions may be inserted in between. In this case, the with.overflow call is emitted separately (and I don't think this is avoidable, as it will generally have at least two users). Fixes https://bugs.llvm.org/show_bug.cgi?id=49587. Differential Revision: https://reviews.llvm.org/D98600	2021-03-29 23:08:47 +02:00
Stanislav Mekhanoshin	619b88849e	[AMDGPU] Fix "Sequence" spelling. NFC.	2021-03-29 12:11:36 -07:00
Joe Nash	45fd7c02af	Revert "[AMDGPU] Mark additional VOP3 as commutable" This reverts commit `d35d8da7d6`.	2021-03-29 14:48:11 -04:00
Joe Nash	d35d8da7d6	[AMDGPU] Mark additional VOP3 as commutable Note, only src0 and src1 will be commuted if the isCommutable flag is set. This patch does not change that, it just makes it possible to commute src0 and src1 of more instructions. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D99376 Change-Id: I61e20490962d95ea429beb355c55f55c024dafdc	2021-03-29 14:22:20 -04:00
Craig Topper	3dd4aa7d09	[RISCV] When custom iseling masked loads/stores, copy the mask into V0 instead of virtual register. This matches what we do in our isel patterns. In our internal testing we've found this is needed to make the fast register allocator happy at -O0. Otherwise it may assign V0 to an earlier operand and find itself with no registers left when it reaches the mask operand. By using V0 explicitly, the fast register allocator will see it when it checks for phys register usages before it starts allocating vregs. I'll try to update this with a test case. Unfortunately, this does appear to prevent some instruction reordering by the pre-RA scheduler which leads to the increased spills seen in some tests. I suspect that problem could already occur for other instructions that already used V0 directly. There's a lot of repeated code here that could do with some wrapper functions. Not sure if that should be at the level of the new code that deals with V0. That would require multiple output parameters to pass the glue, chain and register back. Maybe it should be at a higher level over the entire set of push_backs. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D99367	2021-03-29 10:20:43 -07:00
Craig Topper	54bacaf311	[X86] Always use rip-relative addressing on 64-bit when rematerializing all zeros/ones registers using a folded load. Previously we only used RIP relative when PIC was enabled. But we know we're in small/kernel code model here so we should be able to always use RIP-relative which will give a smaller encoding. Here's a godbolt link that demonstrates the current codegen https://godbolt.org/z/j3158o Note in the non-PIC version the load from .LCPI0_0 doesn't use RIP-relative addressing, but if you change the constant in the source from 0.0 to 1.0 it will become RIP-relative. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97208	2021-03-29 10:06:17 -07:00
Roger Ferrer Ibanez	ef76a333fa	[RISCV] Fix offset computation for RVV In D97111 we changed the RVV frame layout when using sp or bp to address the stack slots so we could address the emergency stack slot. The idea is to put the RVV objects as far as possible (in offset terms) from the frame reference register (sp / fp / bp). When using fp this happens naturally because the RVV objects are already the top of the stack and due to the constraints of RVV (VLENB being a power of two >= 128) the stack remains aligned. The rest of this summary does not apply to this case. When using sp / bp we need to skip the non-RVV stack slots. The size of the non-RVV objects is computed by subtracting the callee saved register size (whose computation is added in D97111 itself) from the total size of the stack (which does not account for RVV stack slots). However, when doing so we round to 16 bytes when computing that size and we end up emitting a smaller offset that may belong to a scalar stack slot (see D98801). So this change removes that rounding. Also, because we want the RVV objects to be between the non-RVV stack slots and the callee-saved register slots, we need to make sure the RVV objects are properly aligned to 8 bytes. Adding a padding of 8 would render the stack unaligned. So when allocating space for RVV (only when we don't use fp) we need to have extra padding that preserves the stack alignment. This way we can round to 8 bytes the offset that skips the non-RVV objects and we do not misalign the whole stack in the process. In some circumstances this means that the RVV objects may have padding before (=lower offsets from sp/bp) and after (before the CSR stack slots). Differential Revision: https://reviews.llvm.org/D98802	2021-03-29 17:03:49 +00:00
Bradley Smith	9745dce8c3	[SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps This is currently performed in SelectionDAGLegalize, here we make it also happen in LegalizeVectorOps, allowing a target to lower the SETCC condition codes first in LegalizeVectorOps and then lower to a custom node afterwards, without having to duplicate all of the SETCC condition legalization in the target specific lowering. As a result of this, fixed length floating point SETCC nodes can now be properly lowered for SVE. Differential Revision: https://reviews.llvm.org/D98939	2021-03-29 15:32:25 +01:00
Simon Pilgrim	805148eaf2	[X86][SSE] combineHorizOpWithShuffle - consistently use getTargetShuffleInputs to decode shuffles Minor cleanup before I start trying to merge the unary/binary shuffle combining paths.	2021-03-29 11:31:19 +01:00
Nashe Mncube	19601a4c6c	[SVE][Analysis]Instruction costs for ops on scalable-vec The following operations have no associated cost for them when applied to scalable vectors, and as a consequence can trigger a crash when a call is made to AArch64TTIImpl::getCastInstrCost(): - fptrunc - trunc - fpext - fpto(u,s)i This patch adds costs for these operations and relevant regression tests. Differential Revision: https://reviews.llvm.org/D98934	2021-03-29 11:15:50 +01:00
David Green	3a68c6d26c	[ARM] Extend MVE lane interleaving to handle other non-instruction leaves This extends the recent MVE lane interleaving pass to handle other non-instruction leaves, for which a new shuffle is added. This helps especially for constants and potentially for arguments. Differential Revision: https://reviews.llvm.org/D97289	2021-03-29 09:05:45 +01:00
David Green	6c88ffeda3	[ARM] Fix the Changed value in the MVE lane interleaving pass.	2021-03-28 23:47:53 +01:00
Craig Topper	69bdf35dc7	[X86] Optimize vXi8 MULHS on targets where we can't sign_extend to the next register size. For these cases we need to extract the upper or lower elements, multiply them using 16-bit multiplies and repack them. Previously we used punpcklbw/punpckhbw+psraw or pmovsxbw+pshufd to extract and sign extend so we could use pmullw to compute the 16-bit product and then shift down the high bits. We can avoid the need to sign extend if we unpack the bytes into the high byte of each word and fill the lower byte with 0 using pxor. This puts the sign bit of each byte into the sign bit of each word. Since the LHS and RHS have 8 trailing zeros, the full 32-bit product of those 16-bit values will have 16 trailing zeros. This means the 16-bit product of the original bytes is in the upper 16 bits which we can calculate using pmulhw. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98587	2021-03-28 11:41:29 -07:00
David Green	7b6f760fcd	[ARM] MVE vector lane interleaving MVE does not have a single sext/zext or trunc instruction that takes the bottom half of a vector and extends to a full width, like NEON has with MOVL. Instead it is expected that this happens through top/bottom instructions. So the MVE equivalent VMOVLT/B instructions take either the even or odd elements of the input and extend them to the larger type, producing a vector with half the number of elements each of double the bitwidth. As there is no simple instruction for a normal extend, we often have to expand sext/zext/trunc into a series of lane moves (or stack loads/stores, which we do not do yet). This pass takes vector code that starts at truncs, looks for interconnected blobs of operations that end with sext/zext and transforms them by adding shuffles so that the lanes are interleaved and the MVE VMOVL/VMOVN instructions can be used. This is done pre-ISel so that it can work across basic blocks. This initial version of the pass just handles a limited set of instructions, not handling constants or splats or FP, which can all come as extensions to this base. Differential Revision: https://reviews.llvm.org/D95804	2021-03-28 19:34:58 +01:00
Florian Hahn	eb3d9f2eb6	[SelDag] Add isIntOrFPConstant helper function. This patch adds a new isIntOrFPConstant helper function to check if a SDValue is an integer or FP constant. This pattern is used in various places. There also are places that incorrectly just check for integer constants, e.g. D99384, so hopefully this helper will help people avoid that issue. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D99428	2021-03-28 12:48:58 +01:00
Hsiangkai Wang	bc82e9bf25	[RISCV] Add vfabs.v pseudo instruction. Differential Revision: https://reviews.llvm.org/D99454	2021-03-28 10:24:05 +08:00
Craig Topper	5692fc38e0	[RISCV] Add a pattern for (sext_inreg (mul (and X, 0xffffffff), (and Y, 0xffffffff)), i32) to suppress MULW formation We have a special pattern for (mul (and X, 0xffffffff), (and Y, 0xffffffff)), to optimize the ANDs to shift. But if a sext_inreg comes first, we'll form a MULW and limit the effectiveness of the special match. So this patch adds a larger pattern to suppress the MULW formation by emitting a sext.w and then the same output we use for the (mul (and X, 0xffffffff), (and Y, 0xffffffff)). This should all get CSEd. This is the issue I was trying to fix with D99029, but that affected many more tests.	2021-03-27 15:37:18 -07:00
Simon Pilgrim	2a0d5da917	[X86][SSE] foldShuffleOfHorizOp - remove broadcast handling. Remove VBROADCAST/MOVDDUP/splat-shuffle handling from foldShuffleOfHorizOp. This can all be handled by canonicalizeShuffleMaskWithHorizOp as long as we check that the HADD/SUB are only used once (to prevent infinite loops on slow-horizop targets which will try to reuse the nodes again followed by a post-hop shuffle).	2021-03-27 15:09:23 +00:00
Simon Pilgrim	41146bfe82	[X86][SSE] combineX86ShuffleChain - attempt to recognise 'hidden' identity shuffles See if the combined shuffle mask is equivalent to an identity shuffle, typically this is due to repeated LHS/RHS ops in horiz-ops, but isTargetShuffleEquivalent might see other patterns as well. This is another small step towards getting rid of foldShuffleOfHorizOp and relying on canonicalizeShuffleMaskWithHorizOp and generic shuffle combining.	2021-03-27 11:09:30 +00:00
Sanjay Patel	a283d72583	[x86] prevent crashing while matching pmaddwd This could crash in 2 ways: either one or both of the input vectors could be a different size than the math ops. https://llvm.org/PR49716	2021-03-27 05:27:14 -04:00
Craig Topper	4d5ee71b52	[RISCV] Merge FMulAdd and FMulSub scheduler classes to a single FMA scheduler class. NFC It's unlikely that FMADD and FMSUB would have different scheduling information so merge them. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99140	2021-03-26 16:37:20 -07:00
Craig Topper	c41f2f6492	[RISCV] Add scheduler classes for the Zba and Zbb extensions. I've used IALU for the simplest operations from Zbb: min, minu, max, maxu, sext.b, sext.h, zext.h, andn, orn, xnor I've put add.uw in IALU32 and slli.uw in ShiftImm32. Remaining instructions have received new classes. All 3 shadd are grouped together. shadd.uw are grouped together. Rotate left and right are together. Everything else got their own class containing one instruction. I think what I have here is the minimum granularity we need. I could be convinced that we need more classes. Reviewed By: evandro Differential Revision: https://reviews.llvm.org/D99040	2021-03-26 14:15:29 -07:00
Simon Pilgrim	c769ba9514	[X86][AVX] combineHorizOpWithShuffle - improve SHUFFLE(HOP(LOSUBVECTOR(X),HISUBVECTOR(X))) folding Peek through bitcasts to find subvector splits and use getTargetShuffleInputs to decode target shuffles as well as ShuffleVectorSDNode	2021-03-26 17:23:54 +00:00
Jay Foad	9d08f276d7	[AMDGPU] Use reductions instead of scans in the atomic optimizer If the result of an atomic operation is not used then it can be more efficient to build a reduction across all lanes instead of a scan. Do this for GFX10, where the permlanex16 instruction makes it viable. For wave64 this saves a couple of dpp operations. For wave32 it saves one readlane (which are generally bad for performance) and one dpp operation. Differential Revision: https://reviews.llvm.org/D98953	2021-03-26 15:38:14 +00:00
Zakk Chen	9049cf77e3	[RISCV] Add constraint for RVV indexed loads. Add the constraint when the destination EEW does not equal the source EEW, for correctness. The RVV spec has three register overlap rules and I implement the first stricter constraint because the others are difficult to enforce. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D98920	2021-03-26 07:23:24 -07:00
Jay Foad	d92b4956d6	[AMDGPU] Inline FSHRPattern into its only use. NFC.	2021-03-26 09:32:02 +00:00
Craig Topper	8f62a80328	[RISCV] Optimize (and (shl GPR:, uimm5:), 0xffffffff) to use 2 shifts instead of 3. The and would normally become SLLI+SRLI, giving us 2 SLLI+SRLI. We can detect this and combine the 2 SLLIs into 1.	2021-03-25 23:31:01 -07:00
Craig Topper	5a18c576c4	[RISCV] Don't call CheckAndMask from selectZExti32. Now that targetShrinkDemandedConstant preserves 0xffffffff masks we shouldn't need to call computeKnownBits here.	2021-03-25 22:07:41 -07:00
Jessica Paquette	23f657c165	[AArch64][GlobalISel] Emit bzero on Darwin Darwin platforms for both AArch64 and X86 can provide optimized `bzero()` routines. In this case, it may be preferable to use `bzero` in place of a memset of 0. This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can be generated by platforms which may want to use bzero. To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The conditions for this are largely a port of the bzero case in `AArch64SelectionDAGInfo::EmitTargetCodeForMemset`. The only difference in comparison to the SelectionDAG code is that, when compiling for minsize, this will fire for all memsets of 0. The original code notes that it's not beneficial to do this for small memsets; however, using bzero here will save a mov from wzr. For minsize, I think that it's preferable to prioritise omitting the mov. This also fixes a bug in the libcall legalization code which would delete instructions which could not be legalized. It also adds a check to make sure that we actually get a libcall name. Code size improvements (Darwin): - CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign) - CTMark -Oz: -0.2% geomean (-0.5% on bullet) Differential Revision: https://reviews.llvm.org/D99358	2021-03-25 17:14:25 -07:00
Yonghong Song	886f9ff531	BPF: add extern func to data sections if specified This permits an extern function (BTF_KIND_FUNC) to be added to BTF_KIND_DATASEC if a section name is specified. For example, -bash-4.4$ cat t.c void foo(int) __attribute__((section(".kernel.funcs"))); int test(void) { foo(5); return 0; } The extern function foo (BTF_KIND_FUNC) will be put into BTF_KIND_DATASEC with name ".kernel.funcs". This will help to differentiate two kinds of external functions: functions in the kernel and functions defined in other bpf programs. Differential Revision: https://reviews.llvm.org/D93563	2021-03-25 16:03:29 -07:00
Craig Topper	5797feaa55	[RISCV] Reorder checks in RISCVTTIImpl::getGatherScatterOpCost to avoid calling getMinRVVVectorSizeInBits() when V extension is not enabled. getMinRVVVectorSizeInBits() asserts if the V extension isn't enabled. So check that gather/scatter is legal first since it already contains a check for V extension being enabled. It also already checks getMinRVVVectorSizeInBits for fixed length vectors so we don't need a check in getGatherScatterOpCost.	2021-03-25 14:20:47 -07:00
Krzysztof Parzyszek	a5b7d38c57	[Hexagon] Limit virtual register reuse range in FI elimination	2021-03-25 13:59:36 -05:00
David Green	d97189600e	[ARM] Revert WhileLoopStartLR to DoLoopStart If a WhileLoopStartLR is reverted due to calls in the preheader, we may still be able to instead create a DoLoopStart, preserving the low overhead loop. This adds code for that, only reverting the WhileLoopStartLR to a Br/Cmp, leaving the rest of the low overhead loop in place. Differential Revision: https://reviews.llvm.org/D98413	2021-03-25 16:44:15 +00:00
Craig Topper	c40cea6f08	[RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffffffff). We look for this pattern frequently in isel patterns so it's a good idea to try to preserve it. This also lets us remove our special isel handling for srliw and use a direct pattern match of (srl (and X, 0xffffffff), C) since no bits will be removed from the and mask. Differential Revision: https://reviews.llvm.org/D99042	2021-03-25 09:03:25 -07:00
Fraser Cormack	99211352c1	[RISCV] Optimize select-like vector shuffles This patch adds a small optimization for vector shuffle lowering, detecting shuffles which can be re-expressed as vector selects. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99270	2021-03-25 11:39:57 +00:00
Fraser Cormack	321a71a772	[RISCV] Optimize BUILD_VECTOR sequences that reveal hidden splats This patch adds further optimization techniques to RVV BUILD_VECTOR lowering. It teaches the compiler to find splats of larger vector element types "hidden" in smaller ones. For example, a v4i8 build_vector (0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally more optimal than the dominant-element BUILD_VECTORs and so takes priority. This optimization is currently limited to all-constant-or-undef BUILD_VECTORs as those were found to be the most common. There's no reason this couldn't be extended to other BUILD_VECTORs, but the additional bit-manipulation instructions may require more sophisticated heuristics. There are some cases where the materialization of the larger constant takes more scalar instructions than it does to build the vector with vector instructions. We could add heuristics to try and catch this. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99195	2021-03-25 10:35:31 +00:00
Simon Pilgrim	36e3c6c841	[X86][AVX] Truncate vectors with PACKSS/PACKUS on AVX2 targets Until AVX512 we don't have any vector truncation instructions, and always lower using shuffles instead. combineVectorTruncation performs this earlier than lowering as it makes it easier to use any sign/zero-extended bits in the truncated bits with PACKSS/PACKUS to perform the shuffle. We currently don't attempt to use combineVectorTruncation on AVX2 targets as in the past 256-bit PACKSS/PACKUS tended to cause 128-bit lane shuffle regressions - but these should now be all resolved with combineHorizOpWithShuffle and in all cases we now reduce the amount of cross-lane shuffling and variable shuffle mask usage. Differential Revision: https://reviews.llvm.org/D96609	2021-03-25 10:34:34 +00:00
Simon Pilgrim	9fde88c3e2	[X86][AVX] splitIntVSETCC - handle separate (canonicalized) SETCC operands LowerVSETCC calls splitIntVSETCC after canonicalizing certain patterns, in particular (X & CPow2 != 0) -> (X & CPow2 == CPow2). Unfortunately if we're splitting for AVX1/non-AVX512BW cases, we lose these canonicalizations as we call the split with the original SetCC node, and when the split nodes are later lowered in LowerVSETCC the patterns are lost behind extract_subvector etc. But if we pass the canonicalized operands for splitting we retain the optimizations. Differential Revision: https://reviews.llvm.org/D99256	2021-03-25 10:18:44 +00:00
Serge Pavlov	ddb0bcbdff	Add missing cases in RISCVMCExpr::getVariantKindName Differential Revision: https://reviews.llvm.org/D98929	2021-03-25 12:57:05 +07:00
Craig Topper	0f99c6c56e	[RISCV] Remove duplicate DebugLoc variables from cases in ReplaceNodeResults. NFC We already created a DebugLoc at the top of the function. We can just use that one.	2021-03-24 20:23:03 -07:00
Jessica Paquette	56e6eb7975	[AArch64][GlobalISel] Make G_UBFX/G_SBFX legalization check for constants The original rule just checked the type, but this is actually only legal if it has a constant. Differential Revision: https://reviews.llvm.org/D99298	2021-03-24 13:58:27 -07:00
Albion Fung	e29bb074c6	[PowerPC] Exploit xxsplti32dx (constant materialization) for scalars This patch exploits the xxsplti32dx instruction available on Power10 in place of constant pool loads where xxspltidp would not be able to, usually because the immediate cannot fit into 32 bits. Differential Revision: https://reviews.llvm.org/D95458	2021-03-24 15:59:59 -04:00
Roland McGrath	3cb2346982	[AArch64] Support .arch_extension pan This makes the behavior consistent with the GNU assembler. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D99209	2021-03-24 11:29:22 -07:00
Jessica Paquette	a141c7d06b	[AArch64][GlobalISel] Select G_SBFX and G_UBFX Add selection support for G_SBFX and G_UBFX and add a test. These must always have a constant LSB and width. Differential Revision: https://reviews.llvm.org/D99224	2021-03-24 11:15:57 -07:00
Craig Topper	512bae81cc	[RISCV] Add basic cost modelling for fixed vector gather/scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99142	2021-03-24 11:14:14 -07:00
Jessica Paquette	1818dc394f	[AArch64][GlobalISel] Mark G_SBFX/G_UBFX as legal for s32 and s64 This isn't perfect, since we should also verify that these only use constants. Differential Revision: https://reviews.llvm.org/D99219	2021-03-24 11:08:41 -07:00
Craig Topper	f24f09d256	[RISCV] Add TTI support for cpop with Zbb This will tell loop idiom recognize that it can make popcount loops countable using the ctpop intrinsic. I didn't bother checking for illegal types. Type legalization knows how to split a ctpop into multiple ctpops added together. Assuming we only receive reasonable integer bit widths, a few cpop instructions added together is probably better than the loop. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99203	2021-03-24 10:58:42 -07:00
David Green	14b2ec934e	[ARM] Enable UpperBound unrolling for all loops This UpperBound unrolling was already enabled so long as a series of conditions in ARMTTIImpl::getUnrollingPreferences pass. This just always enables it as it can help fully unroll loops that would not otherwise pass those tests. Differential Revision: https://reviews.llvm.org/D99174	2021-03-24 16:39:21 +00:00
Konstantin Zhuravlyov	f4ace63737	AMDGPU: Add target id and code object v4 support - Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id) - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object) - Add kernarg_size to kernel descriptor - Change trap handler ABI to no longer move queue pointer into s[0:1] - Cleanup ELF definitions - Add V2, V3, V4 suffixes to make a clear distinction for code object version - Consolidate note names Differential Revision: https://reviews.llvm.org/D95638	2021-03-24 11:54:05 -04:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Nashe Mncube	ac2a1e9596	[SVE] Suppress vselect warning from incorrect interface call The VSelectCombine handler within AArch64ISelLowering uses an interface call which only expects fixed vectors. This generates a warning when the call is made on a scalable vector. This warning has been suppressed with this change, by using the ElementCount interface, which supports both fixed and scalable vectors. I have also added a regression test which recreates the warning. Differential Revision: https://reviews.llvm.org/D98249	2021-03-24 14:34:34 +00:00
Anirudh Prasad	301d9261b7	[AsmParser][SystemZ][z/OS] Re-introduce HLASM comment syntax - https://reviews.llvm.org/rGb605cfb336989705f391d255b7628062d3dfe9c3 was reverted due to sanitizer bugs in the introduced unit-test (specifically in the Address sanitizer https://lab.llvm.org/buildbot/#/builders/5/builds/5697) - This patch attempts to rectify that, as well as re-factor parts of the test - The issue was that previously, within the `setupCallToAsmParser` function in the unit-test, `SrcMgr` was declared as a local variable. `SrcMgr` owns a unique pointer. Since the variable goes out of scope at the end of the function, the unique pointer is released. - This patch moves the declaration of the `SrcMgr` variable to a class field, since the scope will remain until the class's destructor is invoked (which in this case is at the end of the unit test) - Furthermore, this patch also moves the `MCContext Ctx` declaration from a local variable instance inside a function to a unique pointer class field. This ensures the instantiation of the MCContext remains until the tear down of the test. Reviewed By: abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D99004	2021-03-24 10:17:00 -04:00
Simon Pilgrim	7920527796	[X86][AVX] combineBitcastvxi1 - improve handling of vectors truncated to vXi1 If we're truncating to vXi1 from a wider type, then prefer the original wider vector as it simplifies folding the separate truncations + extensions. On AVX1 this is only worth it for v8i1 cases, not v4i1 where we're always better off truncating down to v4i32 for movmsk. Helps with some regressions encountered in D96609	2021-03-24 14:05:59 +00:00
Stefan Pintilie	91f4c11133	[PowerPC] Add mprivileged option Add an option to tell the compiler that it can use privileged instructions. This patch only adds the option. Backend implementation will be added in a future patch. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D99193	2021-03-24 08:33:22 -05:00
Simon Pilgrim	e9015bd595	[X86][AVX] lowerShuffleAsBroadcast - MOVDDUP(SCALAR_TO_VECTOR(X)) -> BROADCAST(X) Prefer broadcast from scalar on AVX targets as this makes it easier for later folds to strip away bitcasts etc. This helps a lot with the AVX1 poor codegen from PR49658. There's a trivial regression in bitcast-int-to-vector-bool-*ext.ll tests due to SimplifyDemandedBits not being able to see a multi-use case, but there are bigger existing codegen issues to be addressed first in those tests (unnecessary NOTs).	2021-03-24 11:31:56 +00:00
Simon Pilgrim	c1ef642ad8	[X86] Remove unused 'OneUse' option from IsNOT helper. NFCI.	2021-03-24 11:14:38 +00:00
alex-t	dccf83acf9	[AMDGPU] SIOptimizeExecMaskingPreRA should check constant bus constraint when folding EXEC copy Folding EXEC copy into its single use may lead to constant bus constraint violation as it adds one more SGPR operand. This change makes it validate the user instruction with the new SGPR operand and only fold it if it is legal. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D98888	2021-03-24 14:14:13 +03:00
Andrew Savonichev	292da93d59	[MCA] Disable RCU for InOrderIssueStage This is a follow-up for: D98604 [MCA] Ensure that writes occur in-order When instructions are aligned by the order of writes, they retire in-order naturally. There is no need for an RCU, so it is disabled. Differential Revision: https://reviews.llvm.org/D98628	2021-03-24 13:54:04 +03:00
Stefan Pintilie	0e4f5f3ea6	[PowerPC] Change option to mrop-protect In order to have the same option on PowerPC LLVM and PowerPC GCC, the option will be changed from -mrop-protection to -mrop-protect. The feature will be off by default and turned on when the option is used. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D99185	2021-03-24 05:51:35 -05:00
Andy Wingo	c9801db2eb	[WebAssembly][MC] Record limit constraints for table sizes This commit adds a full WasmTableType to MCSymbolWasm, differing from the current situation (just an ElemType) in that it additionally records a WasmLimits. We add support for specifying the limits in .S files also, via the following syntax variations: .tabletype SYM, ELEMTYPE .tabletype SYM, ELEMTYPE, MINSIZE .tabletype SYM, ELEMTYPE, MINSIZE, MAXSIZE Depends on D99186. Differential Revision: https://reviews.llvm.org/D99191	2021-03-24 09:44:22 +01:00
Jim Lin	503f1d845f	[RISCV] Add HasStdExtD predicate to copysign from double and to double patterns The copysign from double and to double patterns lack the HasStdExtD predicate. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99234	2021-03-24 14:29:23 +08:00
Craig Topper	6204ac4536	[X86] Bail out of X86FastISel::X86SelectCmp for vectors. None of the code in this function was written to handle vectors. Most of the cases already fail for vectors for one reason or another. The exception is an optimization that detects identical operands. This can be triggered by vectors, but the code always creates a 0 or 1 constant in a scalar register which is incorrect for vectors. Fixes PR49706.	2021-03-23 20:16:04 -07:00
Amara Emerson	7bddf00581	[AArch64][GlobalISel] Lower G_FSHL and G_FSHR. Codegen isn't as good as we need it, but that'll be done later.	2021-03-23 16:09:19 -07:00
Amara Emerson	75b6a47bd0	[AArch64][GlobalISel] Lower G_CTLZ_ZERO_UNDEF. This adds some missing legalizer tests, which uncovered a v2s64 selection test that wasn't working since there's no legalization or instruction for that.	2021-03-23 12:49:10 -07:00
Jay Foad	fd142e6c18	[AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitBranchInst. NFC. A BranchInst is always the terminator of its containing BasicBlock.	2021-03-23 16:54:43 +00:00
Joe Nash	538bda0b80	[AMDGPU] Refactor DPPCombine NFC. Extract IsShrinkable into a helper function, and make Subtarget a member variable. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D99099 Change-Id: If4bc97a88a9ae4eb1df47e717345d46a6ed515bf	2021-03-23 11:53:53 -04:00
Craig Topper	839a46d88f	[RISCV] Use selectImm for RV32. NFC Previously we used selectImm for RV64 and isel patterns for RV32. This should be NFC, but will allow RV32 and RV64 to share improvements in the future. For example, it might be useful to use BSETI from Zbs to make single bit constants. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98877	2021-03-23 08:57:15 -07:00
Jay Foad	fc7e3e7dd9	[AMDGPU] Set SchedRW on real instructions Copy SchedRW from pseudos to real instructions so that llvm-mca has access to it. This is NFC for normal compiler codegen, which schedules pseudos not real instructions. Add an llvm-mca test for some high latency double-precision instructions as a smoke test. Differential Revision: https://reviews.llvm.org/D99187	2021-03-23 15:38:11 +00:00
Fraser Cormack	feff66a082	[RISCV] Further optimize BUILD_VECTORs with repeated elements This patch builds upon the initial BUILD_VECTOR work introduced in D98700. It further optimizes the lowering of BUILD_VECTOR by using VSELECT operations to effectively insert repeated elements into the vector with relatively few instructions. This allows us to optimize more BUILD_VECTORs without significantly increasing the size of the generated code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98969	2021-03-23 14:14:48 +00:00
Matt Arsenault	b24436ac96	GlobalISel: Lower funnel shifts	2021-03-23 09:11:17 -04:00
Benjamin Kramer	39e36fff3d	[AArch64] Fix unused variable warning	2021-03-23 13:42:14 +01:00
Victor Campos	f22b4c7122	[ARM] Handle debug instrs in ARM Low Overhead Loop pass In function ConvertVPTBlocks(), it is assumed that every instruction within a vector-predicated block is predicated. This is false for debug instructions, used by LLVM. Because of this, an assertion failure is reached when an input contains debug instructions inside VPT blocks. In non-assert builds, an out of bounds memory access took place. The present patch properly covers the case of debug instructions. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99075	2021-03-23 11:49:06 +00:00
David Sherwood	748ae5281d	[IR][SVE] Add new llvm.experimental.stepvector intrinsic This patch adds a new llvm.experimental.stepvector intrinsic, which takes no arguments and returns a linear integer sequence of values of the form <0, 1, ...>. It is primarily intended for scalable vectors, although it will work for fixed width vectors too. It is intended that later patches will make use of this new intrinsic when vectorising induction variables, currently only supported for fixed width. I've added a new CreateStepVector method to the IRBuilder, which will generate a call to this intrinsic for scalable vectors and fall back on creating a ConstantVector for fixed width. For scalable vectors this intrinsic is lowered to a new ISD node called STEP_VECTOR, which takes a single constant integer argument as the step. During lowering this argument is set to a value of 1. The reason for this additional argument at the codegen level is because in future patches we will introduce various generic DAG combines such as mul step_vector(1), 2 -> step_vector(2) add step_vector(1), step_vector(1) -> step_vector(2) shl step_vector(1), 1 -> step_vector(2) etc. that encourage a canonical format for all targets. This hopefully means all other targets supporting scalable vectors can benefit from this too. I've added cost model tests for both fixed width and scalable vectors: llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll as well as codegen lowering tests for fixed width and scalable vectors: llvm/test/CodeGen/AArch64/neon-stepvector.ll llvm/test/CodeGen/AArch64/sve-stepvector.ll See this thread for discussion of the intrinsic: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html	2021-03-23 10:43:35 +00:00
Fraser Cormack	5bfbd9d938	[RISCV] Optimize all-constant mask BUILD_VECTORs This patch adds an optimization for mask-vector BUILD_VECTOR nodes whose elements are all constants or undef. It lowers such operations by building up the vector via a series of integer operations, in which multiple mask elements are inserted into a vector at a time via i8/i16/i32/i64 element types. The final result is then bitcast from that integer vector. We restrict this optimization in certain circumstances when optimizing for size. If we are required to use more than one integer insert operation, then it will likely increase code size compared with using a load from a constant pool. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98860	2021-03-23 10:11:19 +00:00
Simon Pilgrim	080cb83e52	[X86][AVX] Narrow VPBROADCASTQ->VPBROADCASTD if we don't need the upper bits. Helps fix cases where we've splatted smaller types to a wider vector element type without needing the upper bits. Avoid this on AVX512 targets as that can affect broadcast folding.	2021-03-23 09:41:02 +00:00
Pushpinder Singh	d0e5422eb8	[GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO Reviewed By: foad Differential Revision: https://reviews.llvm.org/D93963	2021-03-23 05:45:43 +00:00
Craig Topper	d7b0c19823	[RISCV] Add scheduler classes to Zfh instructions. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99053	2021-03-22 20:30:09 -07:00
Craig Topper	8db4804da7	[RISCV] Remove unused SchedWrites WriteFConv32/WriteFConv64/WriteFMov32/WriteFMov64. It doesn't look like any instructions have ever been assigned to these classes. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99050	2021-03-22 20:29:18 -07:00
Carl Ritson	64db6b8d37	[AMDGPU] Only unbundle memory accesses in SIMemoryLegalizer This restores previous behaviour and is a step toward removing unbundling entirely. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D99061	2021-03-23 11:30:36 +09:00
serge-sans-paille	e617cf9576	[NFC] Restore original SmallString size for X86TargetMachine::getSubtargetImpl lookup Better safe than sorry here, quoting Craig Topper: > Clang passes a pretty lengthy feature string.	2021-03-22 19:19:46 +01:00
Craig Topper	294efcd6f7	[RISCV] Add support for fixed vector masked gather/scatter. I've split the gather/scatter custom handler to avoid complicating it with even more differences between gather/scatter. Tests are the scalable vector tests with the vscale removed and dropped the tests that used vector.insert. We're probably not as thorough on the splitting cases since we use 128 for VLEN here but scalable vectors use a known min size of 64. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98991	2021-03-22 10:17:30 -07:00
Matt Arsenault	1dd23c6d53	AMDGPU: Allow tail calls for amdgpu_gfx functions	2021-03-22 10:55:19 -04:00
Stefan Pintilie	b8f3c6d011	[PowerPC][NFC] Do not enter prefix selection if it cannot do better. Do not try to materialize a constant using prefix instructions if the selection using non prefix instructions was able to do it using a single non prefix instruction. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D98791	2021-03-22 09:17:52 -05:00
Sjoerd Meijer	7515e81e8c	[AArch64] Add some float -> int -> float conversion patterns This adds some conversion match patterns for which we want to keep the int values in FP registers using the corresponding NEON instructions (not the FP instructions) to avoid more costly int <-> fp register transfers. Differential Revision: https://reviews.llvm.org/D98956	2021-03-22 11:06:08 +00:00
serge-sans-paille	b2f7ce91a6	[NFC] Simpler and faster key computation for getSubtargetImpl memoization There's no use in computing a large key that's only used for a memoization optimization.	2021-03-22 10:02:51 +01:00
Qiu Chaofan	52f33f7953	[PowerPC] Enable redundant TOC save removal on AIX Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D97039	2021-03-22 14:29:22 +08:00
Bing1 Yu	113f077f80	[X86] Pass to transform tdpbf16ps intrinsics to scalar operation. In previous patch https://reviews.llvm.org/D93594, we only scalarize tilezero, tileload, tilestore and tiledpbssd. In this patch we scalarize tdpbf16ps intrinsic. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D96110	2021-03-22 13:00:40 +08:00
Matt Arsenault	a0f5aad6d7	AMDGPU: Fix allowing immediates for tail call pseudo. The pseudo was using SSrc_b64, so it allowed folding immediates into the destination operand for a tail call to null. However, this is not a valid operand for the s_setpc_b64 this will be lowered to. Avoids printing the operand as an invalid immediate. Avoids a regression when tail calls are enabled in GlobalISel (somehow tail calls to null get deleted in the DAG).	2021-03-21 13:14:04 -04:00
Matt Arsenault	6314a72730	AMDGPU/GlobalISel: Enable CSE in pre-legalizer combiner	2021-03-21 10:07:37 -04:00
Simon Pilgrim	3179588947	[X86][AVX] ComputeNumSignBitsForTargetNode - add X86ISD::VBROADCAST handling for scalar sources The target shuffle code handles vector sources, but X86ISD::VBROADCAST can also accept a scalar source for splatting. Added as an extension to PR49658	2021-03-21 12:22:51 +00:00
David Green	6d9d2049c8	[ARM] VINS f16 pattern This adds an extra pattern for inserting an f16 into an odd vector lane via a VINS. If the dual-insert-lane pattern does not happen to apply, this can help with some simple cases. Differential Revision: https://reviews.llvm.org/D95471	2021-03-21 12:00:06 +00:00
luxufan	02ffbac844	[RISCV] remove redundant instruction when eliminating frame index The mv a0, a0 instruction is generated when the stack object offset is larger than int<12>. To handle this situation, the eliminateFrameIndex function creates a virtual register, which the register scavenger must then scavenge. If the machine instruction that contains the stack object has opcode ADDI (the addi was generated by frameindexNode), and its destination register is the same as the register produced by the register scavenger, then mv a0, a0 is generated. To eliminate this instruction, eliminateFrameIndex does not create the virtual register when the instruction opcode is ADDI. Differential Revision: https://reviews.llvm.org/D92479	2021-03-21 18:54:00 +08:00
Simon Pilgrim	297b9bc3fa	[X86][AVX] computeKnownBitsForTargetNode - add X86ISD::VBROADCAST handling for scalar sources The target shuffle code handles vector sources, but X86ISD::VBROADCAST can also accept a scalar source for splatting. Suggested by @craig.topper on PR49658	2021-03-21 10:40:57 +00:00
Simon Pilgrim	54a05f2ec8	[X86] computeKnownBitsForTargetNode - add X86ISD::PMULUDQ handling Reuse the existing KnownBits multiplication code to handle what is effectively an ISD::UMUL_LOHI variant	2021-03-21 09:57:20 +00:00
Jessica Clarke	b2bb003774	[RISCV] Update comment in RISCVInstrInfoM.td Missed in `07ed62b7d5`.	2021-03-20 22:35:40 +00:00
Craig Topper	07ed62b7d5	[RISCV] Disable (mul (and X, 0xffffffff), (and Y, 0xffffffff)) optimization when Zba is enabled. This optimization is trying to save SRLI instructions needed to implement the ANDs. If we have zext.w we won't save anything. Because we don't check that the multiply is the only user of the AND we might even increase instruction count.	2021-03-20 15:31:45 -07:00
Craig Topper	b0d8823a8a	[RISCV] Add isel pattern to optimize (mul (and X, 0xffffffff), (and Y, 0xffffffff)) on RV64 This pattern computes the full 64 bit product of a 32x32 unsigned multiply. This requires two pairs of SLLI+SRLI to zero the upper 32 bits of the inputs. We can do better than this by using two SLLI to move the lower bits to the upper bits then use MULHU to compute the product. This is the high half of a full 64x64 product. Since we put 32 0s in the lower bits of the inputs we know the 128-bit product will have zeros in the lower 64 bits. So the upper 64 bits, which MULHU computes, will contain the original 64 bit product we were after. The same trick would work for (mul (sext_inreg X, i32), (sext_inreg Y, i32)) using MULHS, but sext_inreg is sext.w which is already one instruction so we wouldn't save anything. Differential Revision: https://reviews.llvm.org/D99026	2021-03-20 14:55:46 -07:00
Fangrui Song	879760c245	[VE] Fix types of multiclass template arguments in TableGen files These were not properly checked before `[TableGen] Improve handling of template arguments`.	2021-03-20 10:36:51 -07:00
Wang, Pengfei	2327513b85	[X86] Fix a bug when calculating the ldtilecfg insertion points. The BB we initialized the ldtilecfg is special. We don't need to check if its predecessor BBs need to insert ldtilecfg for calls. We reused the flag HasCallBeforeAMX, so that the predecessors won't be added to CfgNeedInsert. This case happens only when the entry BB is in a loop. We need to hoist the first tile config point out of the loop in future. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D98845	2021-03-20 17:48:59 +08:00
Carl Ritson	6c9cac5da1	[AMDGPU] Add MDT update missing from D98915	2021-03-20 13:38:58 +09:00
Nemanja Ivanovic	ea48bf8649	[PowerPC][NFC] Do not produce i64 constants in 32-bit mode There are some instances where we produce constants of type MVT::i64 unconditionally in the target DAG combines. This is not actually valid in 32-bit mode.	2021-03-19 22:54:47 -05:00
Craig Topper	d5c1d305b3	[RISCV] Rename WriteShift/ReadShift scheduler classes to WriteShiftImm/ReadShiftImm. Move variable shifts from WriteIALU/ReadIALU to new WriteShiftReg/ReadShiftReg. Previously only immediate shifts were in WriteShift. Register shifts were grouped with IALU. Seems likely that immediate shifts would be as fast or faster than register shifts. And that immediate shifts wouldn't be any faster than IALU. So if any deserved to be in their own group it should be register shifts not immediate shifts. Rather than try to flip them let's just add more granularity and give each kind their own class. I've used new names for both to make them unambiguous and to force any downstream implementations to put correct information in their scheduler models. Reviewed By: evandro Differential Revision: https://reviews.llvm.org/D98911	2021-03-19 20:39:49 -07:00
Carl Ritson	fe5f4c397f	[AMDGPU] Rename SIInsertSkips Pass Pass no longer handles skips. Pass now removes unnecessary unconditional branches and lowers early termination branches. Hence rename to SILateBranchLowering. Move code to handle returns to epilog from SIPreEmitPeephole into SILateBranchLowering. This means SIPreEmitPeephole only contains optional optimisations, and all required transforms are in SILateBranchLowering. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D98915	2021-03-20 11:48:04 +09:00
Carl Ritson	5df2af8b0e	[AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole SIRemoveShortExecBranches is an optimisation so fits well in the context of SIPreEmitPeephole. Test changes relate to early termination from kills which have now been lowered prior to considering branches for removal. As these use s_cbranch the execz skips are now retained instead. Currently either behaviour is valid as kill with EXEC=0 is a nop; however, if early termination is used differently in future then the new behaviour is the correct one. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D98917	2021-03-20 11:26:42 +09:00
Carl Ritson	b76c09023d	[AMDGPU] Allow index optimisation in SIPreEmitPeephole for bundles Add code so duplication index register changes can be removed from inside bundles. Reviewed By: rampitec, foad Differential Revision: https://reviews.llvm.org/D98940	2021-03-20 10:26:23 +09:00
Anshil Gandhi	697f90ebfa	[NFC] [PowerPC] Determine Endianness in PPCTargetMachine The TargetMachine uses the triple to determine endianness. Just use that logic rather than replicating it in PPCSubtarget. Differential revision: https://reviews.llvm.org/D98674	2021-03-19 20:22:16 -05:00
Craig Topper	1066dcb550	[AArch64] Fix LowerMGATHER to return the chain result for floating point gathers. Found by adding asserts to LegalizeDAG to make sure custom legalized results had the right types. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D98968	2021-03-19 11:53:46 -07:00
David Green	a2e0312cda	[ARM] Tone down the MVE scalarization overhead The scalarization overhead was set deliberately high for MVE, whilst the codegen was new. It helps protect us against the negative ramifications of mixing scalar and vector instructions. This decreases that, especially for floating point where the cost of extracting/inserting lane elements can be low. For integer the cost is still fairly high due to the cross-register-bank copy, but is no longer n^2 in the length of the vector. In general, this will decrease the cost of scalarizing floats and long integer vectors. i64 increases in cost, having a high cost before and after this patch. For floats this allows us to start doing things like vectorizing fdiv instructions, even if they are scalarized. Differential Revision: https://reviews.llvm.org/D98245	2021-03-19 18:30:11 +00:00
Alexey Bataev	14ae0cf0f5	[Cost]Canonicalize the cost for logical or/and reductions. The generic cost of logical or/and reductions should be cost of bitcast <ReduxWidth x i1> to iReduxWidth + cmp eq\|ne iReduxWidth. Differential Revision: https://reviews.llvm.org/D97961	2021-03-19 11:01:58 -07:00
Craig Topper	5d315691c4	[RISCV] Add missing bitcasts to the results of lowerINSERT_SUBVECTOR and lowerEXTRACT_SUBVECTOR when handling mask vectors. Found by adding asserts to LegalizeDAG to catch incorrect result types being returned. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98964	2021-03-19 10:54:33 -07:00
Craig Topper	95998b898c	[Hexagon] Return an i64 for result 0 from LowerREADCYCLECOUNTER instead of an i32. As far as I can tell, the node coming in has an i64 result so the return should have the same type. The HexagonISD node used for this has a type profile that says the result is i64. Found while trying to add asserts to LegalizeDAG to catch result type mismatches. Reviewed By: kparzysz Differential Revision: https://reviews.llvm.org/D98962	2021-03-19 10:54:33 -07:00
Craig Topper	85f3f6b3cc	[RISCV] Lower scalable vector masked loads to intrinsics to match fixed vectors and reduce isel patterns. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98840	2021-03-19 10:39:35 -07:00
Fraser Cormack	d399b82e2a	[RISCV] Maintain fixed-length info when optimizing BUILD_VECTORs I'm not sure how I failed to notice this before, but when optimizing dominant-element BUILD_VECTORs we would lower via the scalable container type, which lost us the information about the fixed length of the vector types. By lowering via the fixed-length type we can preserve that information and eliminate redundant vsetvli instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98938	2021-03-19 17:21:06 +00:00
Fraser Cormack	550292ecb1	[RISCV] Fix missing scalable->fixed-length vector conversion Returning the scalable-vector container type would present problems when the fixed-length INSERT_VECTOR_ELT was used by later operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98776	2021-03-19 16:49:47 +00:00
Stanislav Mekhanoshin	57effe2205	[AMDGPU] Remove dead glc1 handling in asm parser. NFC.	2021-03-19 08:37:47 -07:00
Ricky Taylor	028d6250ea	[M68k] Replace unknown operand with explicit type Replace the unknown operand used for immediate operands for DIV/MUL with a fixed 16-bit immediate. This is required since the assembly parser generator requires that all operands are typed. Differential Revision: https://reviews.llvm.org/D98819	2021-03-19 13:44:46 +00:00
Nemanja Ivanovic	a8697c57fa	[PowerPC] Fix the check for 16-bit signed field in peephole When a D-Form instruction is fed by an add-immediate, we attempt to merge the two immediates to form a single displacement so we can remove the add-immediate. However, we don't check whether the new displacement fits into a 16-bit signed immediate field early enough. Namely, we do a sign-extend from 16 bits first which will discard high bits and then we check whether the result is a 16-bit signed immediate. It of course will always be. Move the check prior to the sign extend to ensure we are checking the correct value. Fixes https://bugs.llvm.org/show_bug.cgi?id=49640	2021-03-19 07:15:53 -05:00
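The ordering problem described in the entry above can be sketched in isolation. This is a minimal illustration, not the actual peephole code: the helpers `fitsInSigned16`, `signExtend16`, `canMergeBuggy`, and `canMergeFixed` are hypothetical names introduced here.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not the PowerPC backend code.

// True if V is representable in a 16-bit signed immediate field.
constexpr bool fitsInSigned16(std::int64_t V) {
  return V >= INT16_MIN && V <= INT16_MAX;
}

// Keep the low 16 bits of V and sign-extend them back to 64 bits.
constexpr std::int64_t signExtend16(std::int64_t V) {
  return static_cast<std::int16_t>(static_cast<std::uint16_t>(V));
}

// Buggy order: sign-extending first discards the high bits, so the
// range check that follows always succeeds.
constexpr bool canMergeBuggy(std::int64_t Disp, std::int64_t AddImm) {
  return fitsInSigned16(signExtend16(Disp + AddImm));
}

// Fixed order: check the full merged displacement before truncating.
constexpr bool canMergeFixed(std::int64_t Disp, std::int64_t AddImm) {
  return fitsInSigned16(Disp + AddImm);
}
```

For example, merging 0x7000 and 0x7000 gives 0xE000, which does not fit in a signed 16-bit field; the buggy order accepts it because the truncated value is -8192, while the fixed order rejects it.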
Ricky Taylor	cd442157cf	[M68k] Convert register Aliases to AltNames This makes it simpler to determine when two registers are actually the same vs just partially aliasing. The only real caveat is that it becomes impossible to know which name was used for the register previously. (i.e. parsing assembly and then disassembling it can result in the register name changing.) Differential Revision: https://reviews.llvm.org/D98536	2021-03-19 11:44:53 +00:00
Ricky Taylor	51884c6bef	[M68k] Introduce DReg bead This is required in order to determine during disassembly whether a Reg bead without associated DA bead is referring to a data register. Differential Revision: https://reviews.llvm.org/D98534	2021-03-19 11:44:53 +00:00
Jay Foad	5a5a531214	[AMDGPU] Remove some redundant code. NFC. This is redundant because we have already checked that we can't handle divergent 64-bit atomic operands.	2021-03-19 11:36:15 +00:00
Jay Foad	5dd5ddcb41	[AMDGPU] Skip building some IR if it won't be used. NFC.	2021-03-19 11:36:14 +00:00
Jay Foad	c96dfe0d8b	[AMDGPU] Sink Intrinsic::getDeclaration calls to where they are used. NFC.	2021-03-19 11:36:14 +00:00

62287 Commits