llvm-project

Commit Graph

Author	SHA1	Message	Date
Chenbing.Zheng	6d6c44a3f3	[RISCV] Add support for matching vwmulsu from fixed vectors According to riscv-v-spec-1.0, widening signed(vs2)-unsigned integer multiply vwmulsu.vv vd, vs2, vs1, vm # vector-vector vwmulsu.vx vd, vs2, rs1, vm # vector-scalar It is worth noting that signed op is only for vs2. For vwmulsu.vv, we can swap two ops, and don't care which is sign extension, but for vwmulsu.vx signExt can not be a vector extended from scalar (rs1). I specifically added two functions ending with _swap in the test case. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118215	2022-01-28 02:33:30 +00:00
Jonas Paulsson	9ca9fee6e8	[SystemZ] Don't shrink 64-bit FP constants. Return false from ShouldShrinkFPConstant(), so that these constants are stored in their full size on the constant pool, even if they could have been shrunk and used with an extending load. This is better since LD is faster than LDE, and it also enables reg/mem opcodes. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D117927	2022-01-27 16:14:53 -06:00
Jonas Paulsson	f541a5048a	[SystemZ] Implement orderFrameObjects(). By reordering the objects on the stack frame after looking at the users, a better utilization of displacement operands will result. This means less needed Load Address instructions for the accessing of these objects. This is important for very large functions where otherwise small changes could cause a lot more/less accesses go out of range. Note: this is not yet enabled for SystemZXPLINKFrameLowering, but should be. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D115690	2022-01-27 16:09:19 -06:00
Craig Topper	70e1cc6792	[RISCV] Prefer vmslt.vx v0, v8, zero over vmsle.vi v0, v8, -1. At least when starting from a vmslt.vx intrinsic or ISD::SETLT. We don't handle the case where the user used vmsle.vx intrinsic with -1.	2022-01-27 11:48:27 -08:00
David Green	82973edfb7	[ARM][AArch64] Introduce qrdmlah and qrdmlsh intrinsics Since it's introduction, the qrdmlah has been represented as a qrdmulh and a sadd_sat. This doesn't produce the same result for all input values though. This patch fixes that by introducing a qrdmlah (and qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah instructions. The old test cases will now produce a qrdmulh and sqadd, as expected. Fixes #53120 and #50905 and #51761. Differential Revision: https://reviews.llvm.org/D117592	2022-01-27 19:19:46 +00:00
Simon Pilgrim	9103b73fe0	[X86] Fold MOVMSK(CONCAT(X,Y)) -> MOVMSK(AND/OR(X,Y)) for all_of/any_of patterns Makes it easier for later folds and avoids unnecessary 256-bit ops (especially on AVX1-only targets where we miss a lot of integer instructions)	2022-01-27 18:28:09 +00:00
Jay Foad	4b133cee80	[AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner Rejecting AGPR DS_WRITE instructions before adding them to any mergeable list seems cleaner than adding them to the list and rejecting them later. Differential Revision: https://reviews.llvm.org/D118368	2022-01-27 18:20:46 +00:00
Jay Foad	94a4594c54	[AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions Using separate lists for AGPR and non-AGPR instructions seems like a cleaner solution than putting them all in the same list and then later refusing to merge instructions of different AGPR-ness. Differential Revision: https://reviews.llvm.org/D118367	2022-01-27 18:20:46 +00:00
Jay Foad	8a52fef1e0	[AMDGPU] SILoadStoreOptimizer: tweak API of CombineInfo::setMI. NFC. Change CombineInfo::setMI to take a reference to the SILoadStoreOptimizer instance, for easy access to common fields like TII and STM. Differential Revision: https://reviews.llvm.org/D118366	2022-01-27 18:20:46 +00:00
Matt Arsenault	33b45ee44b	AMDGPU: Handle addrspacecast of constant 32-bit to flat I accidentally made this work on the GlobalISel path, and there's no real reason not to handle this.	2022-01-27 11:01:44 -05:00
Sander de Smalen	af1c8f0d14	[AArch64][SVE] Folds VSELECT if the predicate is all active. This adds the following changes: * Fold: vselect(<all active predicate>, x, y) => x * Extend isAllActivePredicate to take vscale_range into account, e.g. isAllActivePredicate(vl16) for nxv16i1 and vscale == 1 => true. isAllActivePredicate(vl32) for nxv16i1 and vscale == 2 => true. Differential Revision: https://reviews.llvm.org/D118147	2022-01-27 15:58:56 +00:00
Matt Arsenault	aa88b65392	AMDGPU/GlobalISel: Fix assert on invalid cond code for llvm.amdgcn.icmp	2022-01-27 10:34:06 -05:00
Matt Arsenault	f482e86980	AMDGPU/GlobalISel: Fix flat_scratch_init handling for shaders I don't think this is actually defined for mesa, but this is what we were doing on the DAG path.	2022-01-27 10:20:52 -05:00
Simon Pilgrim	ccda0f2226	[X86][SSE] Add combineBitOpWithShift for BITOP(SHIFT(X,Z),SHIFT(Y,Z)) -> SHIFT(BITOP(X,Y),Z) vector folds InstCombine performs this more generally with SimplifyUsingDistributiveLaws, but we don't need anything that complex here - this is mainly to fix up cases where logic ops get created late on during lowering, often in conjunction with sext/zext ops for type legalization. https://alive2.llvm.org/ce/z/gGpY5v	2022-01-27 14:54:41 +00:00
Jay Foad	185cb8e82c	[AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access Swizzled accesses are not merged, but there is no particular reason not to merge two instructions if any of the intervening instructions happens to be a swizzled access. This moves the check for swizzled accesses out of checkAndPrepareMerge into collectMergeableInsts where I think it makes more sense. Differential Revision: https://reviews.llvm.org/D118267	2022-01-27 14:40:58 +00:00
Simon Pilgrim	389ae775e4	[X86] Fold TESTZ(OR(LO(X),HI(X)),OR(LO(Y),HI(Y))) -> TESTZ(X,Y) Helps fix a number of poor codegen cases for allof(cmp()) with 256-bit vectors on AVX1	2022-01-27 13:20:36 +00:00
Sander de Smalen	dafd1f29da	[AArch64][SVE] Avoid using ptrue for unpredicated predicate AND. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118146	2022-01-27 13:00:23 +00:00
Sander de Smalen	417a75c6d0	[AArch64][SVE] Avoid using ptrue for ptest in VECREDUCE_OR. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118145	2022-01-27 11:44:49 +00:00
Sander de Smalen	c9da81d997	[AArch64][SVE] Implement missing lowering for extract_subvector for predicates. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D118057	2022-01-27 11:01:11 +00:00
Sander de Smalen	d58757e522	[AArch64][SVE] Implement PFALSE with explicit AArch64ISD node. The ISel patterns for PFALSE helps recognise the instructions as being free of side-effects, which helps MachineCSE remove redundant PFALSE instructions. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118054	2022-01-27 10:30:13 +00:00
Jay Foad	95857a7058	[AMDGPU] SILoadStoreOptimizer: Remove redundant check for volatile SILoadStoreOptimizer::collectMergeableInsts already ends the current block if it sees a volatile (or ordered) memory access, so there is no need to check for them again when scanning the instructions between two pairing candidates in a block. Differential Revision: https://reviews.llvm.org/D118266	2022-01-27 10:14:53 +00:00
Nikita Popov	fc72f3a168	[BTFDebug] Avoid pointer element type access Use the global value type instead.	2022-01-27 10:30:21 +01:00
Fraser Cormack	84e85e025e	[SelectionDAG][VP] Provide expansion for VP_MERGE This patch adds support for expanding VP_MERGE through a sequence of vector operations producing a full-length mask setting up the elements past EVL/pivot to be false, combining this with the original mask, and culminating in a full-length vector select. This expansion should work for any data type, though the only use for RVV is for boolean vectors, which themselves rely on an expansion for the VSELECT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118058	2022-01-27 09:00:41 +00:00
Wu Xinlong	6a4d3f37b5	[RISCV] fix dead code fix dead code mentioned on https://reviews.llvm.org/D98136 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118323	2022-01-27 16:00:01 +08:00
Zi Xuan Wu	4ad517e6b0	[CSKY] Add floating operation support including float and double CSKY arch has multiple FPU instruction versions such as FPU, FPUv2 and FPUv3 to implement floating operations. For now, we just only support FPUv2 and FPUv3. It includes the encoding, asm parsing of instructions and codegen of DAG nodes.	2022-01-27 15:54:04 +08:00
Wu Xinlong	615d71d9a3	[RISCV][CodeGen] Implement IR Intrinsic support for K extension This revision implements IR Intrinsic support for RISCV Scalar Crypto extension according to the specification of version [[ https://github.com/riscv/riscv-crypto/releases/tag/v1.0.0-scalar \| 1.0]] Co-author：@ksyx & @VincentWu & @lihongliang & @achieveartificialintelligence Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102310	2022-01-27 15:53:35 +08:00
Ting Wang	6f25cb8685	[PowerPC] Add the Power10 XS[MAX\|MIN]CQP instruction Add the Power 10 instruction XS[MAX\|MIN]CQP. Reviewed By: shchenz, amyk Differential Revision: https://reviews.llvm.org/D118036	2022-01-26 23:00:43 -05:00
Craig Topper	b3bec6e453	[RISCV] Use vnsrl.wx with x0 instead of vnsrl.vi for truncate. This matches what the spec uses for the vncvt.x.x.w assembly pseudoinstruction. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D118295	2022-01-26 18:38:13 -08:00
Stanislav Mekhanoshin	409c4436f9	[AMDGPU] Validate dst and src2 non-overlapping restriction in asm Differential Revision: https://reviews.llvm.org/D118089	2022-01-26 15:14:06 -08:00
Stanislav Mekhanoshin	dbf278b984	[AMDGPU] Prevent aliasing of SrcC and Dst in MAI Form the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are overlapping should be avoid as new value could be written to accumulator input before it gets read. Note that this inevitably increases register pressure to the point where some programs will become uncompilable. This patch separates MAC and FMA versions of MFMA instructions using either tied dst and src2 or earlyclobber dst. Fixes: SWDEV-318900 Differential Revision: https://reviews.llvm.org/D117844	2022-01-26 14:48:20 -08:00
Craig Topper	f487a76430	[RISCV] Add hasStdExtZbp() to hasAndNotCompare.	2022-01-26 13:54:05 -08:00
Matt Arsenault	045be6ff36	AMDGPU/GlobalISel: Fold wave address into mubuf addressing modes	2022-01-26 15:25:26 -05:00
Matt Arsenault	09fc311af7	AMDGPU/GlobalISel: Mostly fix BFI patterns Most importantly, fixes constant bus errors in the 64-bit cases. It's surprising to me these were even passing the selection test using SReg_* sources. Also fixes pattern matching in the 32-bit cases, with simple operands. These patterns aren't working in a few cases, like with mixed SGPR inputs. The patterns aren't looking through the SGPR->VGPR copies like they need to. The vector cases also have some unmerges of build_vector which are obscuring the inputs.	2022-01-26 15:06:50 -05:00
Craig Topper	b3d94b199c	[RISCV] Remove references to 'B' extension from AssemblerPredicate and SubtargetFeature strings. For Zba/Zbb/Zbc/Zbs I've removed the 'B' completely and used the extension names as presented at the start of Chapter 1 of the 1.0.0 Bitmanipulation spec. For the unratified extensions, I've replaced 'B' with 'Zb' and otherwise left them unchanged. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D117822	2022-01-26 11:08:29 -08:00
Matt Arsenault	e6564f39c7	AMDGPU: Emit user sgpr count directives in text asm We were emitting these in the object file but not printing them.	2022-01-26 13:51:12 -05:00
Konstantina	aa418b9133	[AMDGPU][SIWholeQuadMode] Use the right VCC register to activate the correct lanes. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D118096	2022-01-26 08:54:39 -08:00
Stanislav Mekhanoshin	4e077c0a0b	[AMDGPU] Remove feature register-banking Since RegBankReassign pass was removed this feature is not use for anything. Differential Revision: https://reviews.llvm.org/D118195	2022-01-26 08:39:17 -08:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a conquence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
Simon Pilgrim	99ae5c13f6	[X86] Add 'getSplitVectorSrc' helper to determine if subvectors all come from the same source Helps determine if the subvector ops come from the same larger vector and match the lower/upper extractions	2022-01-26 15:17:21 +00:00
Nemanja Ivanovic	0c56bc92e4	[PowerPC] Fix eq/ne comparison of v2i64 pre-Power8 In commit `1674d9b6b2`, I fixed the bug where we didn't consider both words of the result of the comparison. However, the logic needs to be different for eq and ne. Namely for eq, we need both words of the doubleword to equal so it is an AND. OTOH for ne, we need either word to be unequal so it is an OR.	2022-01-26 08:59:08 -06:00
Nikita Popov	a5e324e3e2	[AMDGPUHSAMetadataStreamer] Do not assume ABI alignment for pointers AMDGPUHSAMetadataStreamer currently assumes that pointer arguments without align attribute have ABI alignment of the pointee type. This is incompatible with opaque pointers, but also plain incorrect: Pointer arguments without explicit alignment have alignment 1. It is the responsibility of the frontent to add correct align annotations. Differential Revision: https://reviews.llvm.org/D118229	2022-01-26 15:45:14 +01:00
Alban Bridonneau	2feddb37b4	Implement correct cost for SVE bitcasts We have some bitcasts which we know will be simplified, so their cost is zero. Reviewed By: david-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D118019	2022-01-26 14:25:44 +00:00
Simon Moll	5ceb0bc7ea	[VE] Packed 32/64bit broadcast isel and tests Packed-mode broadcast of f32/i32 requires the subregister to be replicated to the full I64 register prior. Add repl_i32 and repl_f32 to faciliate this. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D117878	2022-01-26 14:16:06 +01:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently not (xor_one_use) pattern is always selected to S_XNOR irrelative od the node divergence. This relies on further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32) on those targets which have no V_XNOR. Current change enables the patterns which explicitly select the not (xor_one_use) to appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
Paul Walker	66bd7ebdf7	[SVE] Use DUPM to handling more splat immediate cases. NOTE: Only considers i64 based vectors at this time because smaller element types require extra isel operand parsing. Differential Revision: https://reviews.llvm.org/D118040	2022-01-26 12:04:44 +00:00
Maciej Gabka	c5263cd518	Restrict performPostLD1Combine to 64 and 128 bit vectors When wider vectors are used, for example fixed width SVE, there is no patterns to select AArch64ISD::LD1LANEpost nodes, so we should do an early exit. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D117674	2022-01-26 09:57:44 +00:00
jacquesguan	267711e38b	[RISCV] Fix support of vlen = 64. In the Zve* extensions, the vlen could be 64. This patch change the vlen constraint of low bound to 64. Differential Revision: https://reviews.llvm.org/D118217	2022-01-26 16:31:21 +08:00
Jim Lin	da1cac7d19	[NFC] Remove duplicate include	2022-01-26 15:10:16 +08:00
Qiu Chaofan	ad0345aed1	[PowerPC] Emit gnu_attribute according to float-abi metadata According to GNU as documentation, PowerPC supports some .gnu_attribute tags to represent the vector and float ABI type in the object file. Some linkers like GNU ld respects the attribute and will prevent objects with conflicting ABIs being linked. This patch emits gnu_attribute value in assembly printer according to the float-abi metadata. More attributes for soft-fp, hard single/double and even vector ABI need to be supported in the future. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D117193	2022-01-26 13:28:50 +08:00
Micah Weston	f65651cc8a	[AArch64] Fixes ADD/SUB opt bug and abstracts shared behavior in MIPeepholeOpt for ADD, SUB, and AND. This fixes a bug where (SUBREG_TO_REG 0 (MOVi32imm <negative-number>) sub_32) would generate invalid code since the top 32-bits were not zeroed when inspecting the immediate value. A new test was added for this case. Change to abstract shared behavior in MIPeepholeOpt. Both visitAND and visitADDSUB attempt to split an RR instruction with an immediate operand into two RI instructions with the immediate split. The differing behavior lies in how the immediate is split into two pieces and how the new instructions are built. The rest of the behavior (adding new VRegs, checking for the MOVImm, constraining reg classes, removing old intructions) are shared between the operations. The new helper function splitTwoPartImm implements the shared behavior and delegates differing behavior to two function objects passed by the caller. One function object splits the immediate into two values and returns the opcode to use if it is a valid split. The other function object builds the new instructions. I felt this abstraction would help since I believe it will help reduce the code repetition when adding new instructions of the pattern, such as SUBS for this conditional optimization. Tested it locally by running check all with compiler-rt, mlir, clang-tools-extra, flang, llvm, and clang enabled. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118000	2022-01-26 04:22:27 +00:00
Zakk Chen	510710d037	[RISCV][NFC] Add getVLOperand for RVV intrinsics. Use the VLOperand information to get the VL. Differential Revision: https://reviews.llvm.org/D118156	2022-01-25 17:37:58 -08:00
Zakk Chen	9273378b85	[RISCV] Add the passthru operand for RVV nomask load intrinsics. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. If the passthru operand is undef, we use tail agnostic, otherwise use tail undisturbed. Co-Authored-by: Hsiangkai Wang <Hsiangkai@gmail.com> Reviewers: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D117647	2022-01-25 17:31:36 -08:00
eopXD	b089e4072a	[RISCV] Don't allow i64 vector div by constant to use mulh with Zve64x EEW=64 of mulh and its vairants requires V extension. Authored by: Craig Topper <craig.topper@sifive.com> @craig.topper Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117947	2022-01-25 09:55:05 -08:00
Sebastian Neubauer	4ed7c6eec9	[AMDGPU] Only match correct type for a16 Addresses are floats when a sampler is present and unsigned integers when no sampler is present. Therefore, only zext instructions, not sext instructions should match. Also match integer constants that can be truncated. Differential Revision: https://reviews.llvm.org/D118043	2022-01-25 14:59:16 +01:00
Danila Malyutin	153b1e3cba	[AArch64] Add patterns for relaxed atomic ld/st into fp registers Adds patterns to match integer loads/stores bitcasted to fp values Fixes https://github.com/llvm/llvm-project/issues/52927 Differential Revision: https://reviews.llvm.org/D117573	2022-01-25 15:33:37 +03:00
Paul Walker	d95cf1f6cf	[SVE] Enable ISD::ABDS/U ISel for scalable vectors. NOTE: This patch also includes tests that highlight those cases where the existing DAG combine doesn't yet work well for SVE. Differential Revision: https://reviews.llvm.org/D117873	2022-01-25 12:14:53 +00:00
Simon Pilgrim	157f9b68a3	[X86] combineVectorSignBitsTruncation - fix indentation. NFC.	2022-01-25 11:54:22 +00:00
Simon Tatham	f302e0b5dd	[AArch64] Exclude optional features from HasV8_0rOps. The following SubtargetFeatures are removed from the definition of HasV8_0rOps, on the grounds that they are optional in Armv8.4-A, and therefore (by the definition of Armv8.0-R) also optional in v8.0-R: * performance monitoring: FeaturePerfMon * cryptography: FeatureSM4 and FeatureSHA3 * half-precision FP: FeatureFullFP16, FeatureFP16FML * speculation control: FeatureSSBS, FeaturePredRes, FeatureSB, FeatureSpecRestrict This isn't the full set of features that are listed as optional in the spec. FeatureCCIDX and FeatureTRACEV8_4 are also optional. But LLVM includes those in HasV8_3aOps and HasV8_4aOps respectively (I think on the grounds that the system registers they enable are useful to be able to access after a runtime check), and so for consistency, I've left those in HasV8_0rOps too. After this commit, HasV8_0rOps is a strict subset of HasV8_4aOps (but missing features that are not in Armv8.0-R at all). The definition of Cortex-R82 is correspondingly updated to add most of the features that I've removed from base Armv8.0-R (with the exception of the cryptography ones), since that particular implementation of v8.0-R does have them. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118045	2022-01-25 10:54:59 +00:00
David Sherwood	13252160c3	[NFC] Move useSVEForFixedLengthVectors into AArch64Subtarget.h Given how small the function is and how often it gets used it makes more sense to live in the header file. Differential Revision: https://reviews.llvm.org/D117883	2022-01-25 09:49:04 +00:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Martin Storsjö	70cb8daed4	[X86] [CodeView] Add codeview mappings for registers ST0-ST7 These can end up needed after https://reviews.llvm.org/D116821. Suggested by Alexandre Ganea. Differential Revision: https://reviews.llvm.org/D118072	2022-01-25 10:09:06 +02:00
Craig Topper	fd0a4bc76b	[RISCV] Add missing space to 'clang-format on' directive. NFC Without a space after the comment characters it seems to be ignored.	2022-01-24 17:00:37 -08:00
Simon Pilgrim	902184e6cc	[X86] combinePredicateReduction - generalize allof(cmpeq(x,0)) handling to allof(cmpeq(x,y)) There's no further reasons to limit this to cmpeq-with-zero, the outstanding regressions with lowering to PTEST have now been addressed Improves codegen for Issue #53379	2022-01-25 00:24:06 +00:00
Simon Pilgrim	11bb4a1111	[X86] combinePredicateReduction - split vXi16 allof(cmpeq()) to vXi8 allof(cmpeq()) vXi16 patterns allof(cmp()) reduction patterns will have to be pack the comparison results to vXi8 to use PMOVMSKB. If we're reducing cmpeq(), then we can compare the vXi8 halves directly - similar to what we already do for vXi64 -> vXi32 for cases without PCMPEQQ.	2022-01-24 22:43:29 +00:00
Simon Pilgrim	8d298355ca	[X86] combineSetCCMOVMSK - detect and(pcmpeq(),pcmpeq()) ptest pattern. Handle cases where we've split an allof(cmpeq()) pattern to a legal vector type	2022-01-24 21:42:03 +00:00
Quinn Pham	6a028296fe	[PowerPC] Emit warning when SP is clobbered by asm This patch emits a warning when the stack pointer register (`R1`) is found in the clobber list of an inline asm statement. Clobbering the stack pointer is not supported. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D112073	2022-01-24 15:12:23 -06:00
Stanislav Mekhanoshin	bb1fe36977	[AMDGPU] Make v8i16/v8f16 legal Differential Revision: https://reviews.llvm.org/D117721	2022-01-24 11:51:08 -08:00
Stanislav Mekhanoshin	c27f8fb968	[AMDGPU] Remove cndmask from readsExecAsData Differential Revision: https://reviews.llvm.org/D117909	2022-01-24 11:24:47 -08:00
Sebastian Neubauer	80532ebb50	[AMDGPU][InstCombine] Remove zero image offset Remove the offset parameter if it is zero. Differential Revision: https://reviews.llvm.org/D117876	2022-01-24 18:06:33 +01:00
Simon Pilgrim	6997f4d07f	[X86] combineSetCCMOVMSK - fold allof(cmpeq(x,y)) -> ptest(sub(x,y)) (PR53379) As suggested on PR53379, for all-of icmp-eq patterns, we can use ptest(sub(x,y)) on SSE41+ targets This is a generalization of the existing allof(cmpeq(x,0)) -> ptest(x) pattern We can probably extend this further, in particularly to handle 256-bit cases on pre-AVX2 targets, but this part of the generalization is pretty trivial Fixes Issue #53379	2022-01-24 16:44:37 +00:00
Craig Topper	cd2a9ff397	[RISCV] Select int_riscv_vsll with shift of 1 to vadd.vv. Add might be faster than shift. We can't do this earlier without using a Freeze instruction. This is the intrinsic version of D106689. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118013	2022-01-24 08:04:53 -08:00
Matt Arsenault	18aabae8e2	AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills These have negative / out of bounds frame index values and would assert when trying to set the BitVector. Fixed stack objects can't be colored away so ignore them.	2022-01-24 09:45:41 -05:00
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit `a97e20a3a8`.	2022-01-24 09:26:52 -05:00
Fraser Cormack	d42678b453	[RISCV] Add side-effect-free vsetvli intrinsics This patch introduces new intrinsics that enable the use of vsetvli in contexts where only the returned vector length is of interest. The pre-existing intrinsics are marked with side-effects, which prevents even trivial optimizations on/across them. These intrinsics are intended to be used in situations where the vector length is fed in turn to RVV intrinsics or to vector-predication intrinsics during loop vectorization, for example. Those codegen paths ensure that instructions are generated with their own implicit vsetvli, so the vector length and vtype can be relied upon to be correct. No corresponding C builtins are planned at this stage, though that is a possibility for the future if the need arises. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117910	2022-01-24 13:52:08 +00:00
serge-sans-paille	5f290c090a	Move STLFunctionalExtras out of STLExtras Only using that change in StringRef already decreases the number of preoprocessed lines from 7837621 to 7776151 for LLVMSupport Perhaps more interestingly, it shows that many files were relying on the inclusion of StringRef.h to have the declaration from STLExtras.h. This patch tries hard to patch relevant part of llvm-project impacted by this hidden dependency removal. Potential impact: - "llvm/ADT/StringRef.h" no longer includes <memory>, "llvm/ADT/Optional.h" nor "llvm/ADT/STLExtras.h" Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831	2022-01-24 14:13:21 +01:00
Sebastian Neubauer	f1e36474b9	[AMDGPU][NFC] Fix debug prints Print the instructions instead of pointers.	2022-01-24 13:55:00 +01:00
SForeKeeper	70f83f3084	[RISCV] add support for zbkx subextension in MC layer. This patch adds support for zbkx extension from K extension(v1.0.0) in MC layer. Instructions with same functionality and same encoding is defined in the bitmanip extension. It defines {Xperm8, Xperm4} as instruction aliases for xperm.* in Zbp extension. When Zbkx is enabled while Zbp is not, xperm.h will not be available. When Zbkx and Zbp are both enabled, the instructions will be decoded in Zbp format. [[ https://reviews.llvm.org/D94999 \| D94999 ]] this is the patch that introduces xperm.* instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117889	2022-01-24 20:38:46 +08:00
Fraser Cormack	af773a1818	[RISCV][VP] Lower VP_MERGE to RVV instructions This patch adds lowering of the llvm.vp.merge.* intrinsic (ISD::VP_MERGE) to RVV vmerge/vfmerge instructions. It introduces a special pseudo form of vmerge which allows a tied merge operand, allowing us to specify the tail elements as being equal to the "on false" operand, using a tied-def constraint and a "tail undisturbed" policy. While this strategy allows us to often lower the intrinsic to just one instruction, it may be less efficient in fixed-vector types as the number of tail elements may extend far beyond the length of the fixed vector. Another strategy could be to use a vmerge/vfmerge instruction with an AVL equal to the length of the vector type, and manipulate the condition operand such that mask elements greater than the operation's EVL are false. I've also observed inefficient codegen in which our 'VF' patterns don't match raw floating-point SPLAT_VECTORs, which occur in scalable-vector code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117561	2022-01-24 11:05:05 +00:00
Fraser Cormack	e7926e8d97	[RISCV] Match VF variants for masked VFRDIV/VFRSUB This patch follows up on D117697 to help the simple binary operations behave similarly in the presence of masks. It also enables CGP sinking support for vp.fdiv and vp.fsub intrinsics, now that VFRDIV and VFRSUB are consistently matched with a LHS splat for masked and unmasked variants. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117783	2022-01-24 10:59:43 +00:00
Simon Pilgrim	577a6dc9a1	[X86] getVectorMaskingNode - fix indentation. NFC. clang-format	2022-01-24 11:08:41 +00:00
Nikita Popov	0d1308a7b7	[AArch64][GlobalISel] Support returned argument with multiple registers The call lowering code assumed that a returned argument could only consist of one register. Pass an ArrayRef<Register> instead of Register to make sure that all parts get assigned. Fixes https://github.com/llvm/llvm-project/issues/53315. Differential Revision: https://reviews.llvm.org/D117866	2022-01-24 10:55:28 +01:00
Chenbing.Zheng	9aaa74aeef	[RISCV] Add patterns of SET[U]LT_VI for STECC forms This patch optmizes "li a0, 5 vmsgt[u].vx v10, v8, a0" -> "vmsgt[u].vi v10, v8, 5" Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118014	2022-01-24 08:50:49 +00:00
Jim Lin	f533011252	[Hexagon] Use llvm::Register instead of unsigned in HexagonConstExtenders.cpp. NFC. Reviewed By: kparzysz Differential Revision: https://reviews.llvm.org/D117851	2022-01-24 16:06:25 +08:00
jacquesguan	ba16e3c31f	[RISCV] Decouple Zve* extensions and the V extension. According to the spec, there are some difference between V and Zve64d. For example, the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64, but V extension does support these instructions. So we should decouple Zve extensions and the V extension. Differential Revision: https://reviews.llvm.org/D117854	2022-01-24 14:55:21 +08:00
Kazu Hirata	bf039a8620	[Target] Use range-based for loops (NFC)	2022-01-23 22:53:15 -08:00
Wu Xinlong	e29d8fb169	[RISCV] Initially support the K-extension instructions on the LLVM MC layer This commit is currently implementing supports for scalar cryptography extension for LLVM according to version v1.0.0 of [K Ext specification](https://github.com/riscv/riscv-crypto/releases)(scala crypto has been ratified already). Currently, we are implementing the MC (Machine Code) layer of his extension and the majority of work is done under `llvm/lib/Target/RISCV` directory. There are also some test files in `llvm/test/MC/RISCV` directory. Remove the subfeature of Zbk* which conflict with b extensions to reduce the size of the patch. (Zbk* will be resubmit after this patch has been merged) Co-author：@ksyx & @VincentWu & @lihongliang & @achieveartificialintelligence Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98136	2022-01-24 14:45:35 +08:00
Jim Lin	3f24cdec25	[RISCV][NFC] Remove tailing whitespaces in RISCVInstrInfoVSDPatterns.td and RISCVInstrInfoVVLPatterns.td	2022-01-24 10:49:43 +08:00
Simon Pilgrim	4762c077e7	[X86] LowerFunnelShift - always lower vXi8 fshl by constant amounts as unpack(y,x) << zext(z) This can always be lowered as PMULLW+PSRLWI+PACKUSWB	2022-01-23 21:35:05 +00:00
Simon Pilgrim	32dc14f876	[X86] LowerFunnelShift - use supportedVectorShiftWithBaseAmnt to check for supported scalar shifts Allows us to reuse the ISD shift opcode instead of a mixture of ISD/X86ISD variants	2022-01-23 21:13:58 +00:00
Craig Topper	413684313d	[RISCV] Adjust the header comment in RISCVInstrInfoZb.td to better integrate Zbk* extensions. The Zbk* extensions have some overlap with Zb so have been placed in this file. Reviewed By: VincentWu Differential Revision: https://reviews.llvm.org/D117958	2022-01-23 11:42:52 -08:00
Ayke van Laethem	116ab78694	[AVR] Make use of the constant value 0 in R1 The register R1 is defined to have the constant value 0 in the avr-gcc calling convention (which we follow). Unfortunately, we don't really make use of it. This patch replaces `LDI 0` instructions with a copy from R1. This reduces code size: my AVR build of compiler-rt goes from 50660 to 50240 bytes of code size, which is a 0.8% reduction. Presumably it will also improve execution speed, although I didn't measure this. Differential Revision: https://reviews.llvm.org/D117425	2022-01-23 17:08:01 +01:00
Ayke van Laethem	153359180a	[AVR] Remove regalloc workaround for LDDWRdPtrQ Background: https://github.com/avr-rust/rust-legacy-fork/issues/126 In short, this workaround was introduced to fix a "ran out of registers during regalloc" issue. The root cause has since been fixed in https://reviews.llvm.org/D54218 so this workaround can be removed. There is one test that changes a little bit, removing a single instruction. I also compiled compiler-rt before and after this patch but didn't see a difference. So presumably the impact is very low. Still, it's nice to be able to remove such a workaround. Differential Revision: https://reviews.llvm.org/D117831	2022-01-23 17:08:00 +01:00
eopXD	3cf15af2da	[RISCV] Remove experimental prefix from rvv-related extensions. Extensions affected: +v, +zve, +zvl Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117860	2022-01-22 20:18:40 -08:00
Phoebe Wang	37d1d02200	[X86][MS] Change the alignment of f80 to 16 bytes on Windows 32bits to match with ICC MSVC currently doesn't support 80 bits long double. ICC supports it when the option `/Qlong-double` is specified. Changing the alignment of f80 to 16 bytes so that we can be compatible with ICC's option. Reviewed By: rnk, craig.topper Differential Revision: https://reviews.llvm.org/D115942	2022-01-23 09:58:46 +08:00
Craig Topper	d44b6be6ea	[RISCV] Don't Custom legalize f16/f32/f64 bitcasts if those types aren't Legal.	2022-01-22 11:55:18 -08:00
Qiu Chaofan	00d68c3824	[PowerPC] Support parsing GNU attributes in MC This patch is the first step to enable support of GNU attribute in LLVM PowerPC, enabling it for PowerPC targets, otherwise llvm-mc raises error when seeing the attribute section. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D115854	2022-01-22 23:29:34 +08:00
Qiu Chaofan	8dedf9b58b	[PowerPC] Change CTR clobber estimation for 128-bit floating types Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D117459	2022-01-22 23:20:14 +08:00
David Green	b27e5459d5	[DAG] Convert truncstore(extend(x)) back to store(x) Pulled out of D106237, this folds truncstore(extend(x)) back to store(x) if the original store was legal. This can come up due to the order we fold nodes. A fold from X86 needs to be adjusted to prevent infinite loops, to have it pick the operand of a trunc more directly. Differential Revision: https://reviews.llvm.org/D117901	2022-01-22 13:20:36 +00:00
Micah Weston	93deac2e2b	[AArch64] Optimize add/sub with immediate through MIPeepholeOpt Fixes the build issue with D111034, whose goal was to optimize add/sub with long immediates. Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. The change which fixed the build issue in D111034 was the use of new virtual registers so that SSA form is maintained until deleting MI. Differential Revision: https://reviews.llvm.org/D117429	2022-01-22 12:39:22 +00:00
Alex Fan	e796eaf2af	[RISCV][RFC] add MC support for zbkc subextension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117874	2022-01-22 10:23:01 +08:00
Craig Topper	0379459fc5	[RISCV] Strengthen a SDTypeProfile. Fix formatting.	2022-01-21 13:01:53 -08:00
Craig Topper	48132bb1e4	[RISCV] Simplify interface to combineMUL_VLToVWMUL. NFC Instead of passing the both the SDNode* and 2 of the operands in two different orders, just pass the SDNode * and a bool to indicate which operand order to test. While there rename to combineMUL_VLToVWMUL_VL.	2022-01-21 11:43:06 -08:00
Craig Topper	11754a4dbb	[RISCV] Use RVBUnary in more places to simplify some tablegen declarations. NFCI	2022-01-21 10:55:35 -08:00
Kai Nacke	d5ae039ed7	[SystemZ] Properly register machine passes. Registering the passes enables use of -stop-before=/-stop-after options. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D117823	2022-01-21 09:10:37 -05:00
serge-sans-paille	75e164f61d	[llvm] Cleanup header dependencies in ADT and Support The cleanup was manual, but assisted by "include-what-you-use". It consists in 1. Removing unused forward declaration. No impact expected. 2. Removing unused headers in .cpp files. No impact expected. 3. Removing unused headers in .h files. This removes implicit dependencies and is generally considered a good thing, but this may break downstream builds. I've updated llvm, clang, lld, lldb and mlir deps, and included a list of the modification in the second part of the commit. 4. Replacing header inclusion by forward declaration. This has the same impact as 3. Notable changes: - llvm/Support/TargetParser.h no longer includes llvm/Support/AArch64TargetParser.h nor llvm/Support/ARMTargetParser.h - llvm/Support/TypeSize.h no longer includes llvm/Support/WithColor.h - llvm/Support/YAMLTraits.h no longer includes llvm/Support/Regex.h - llvm/ADT/SmallVector.h no longer includes llvm/Support/MemAlloc.h nor llvm/Support/ErrorHandling.h You may need to add some of these headers in your compilation units, if needs be. As an hint to the impact of the cleanup, running clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Support/*.cpp -std=c++14 -fno-rtti -fno-exceptions \| wc -l before: 8000919 lines after: 7917500 lines Reduced dependencies also helps incremental rebuilds and is more ccache friendly, something not shown by the above metric :-) Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831	2022-01-21 13:54:49 +01:00
Fraser Cormack	4d268dc94a	[RISCV] Enable CGP to sink splat operands of VP intrinsics This patch brings better splat-matching to our VP support, by sinking splat operands of VP intrinsics back into the same block as the VP operation. The list of VP intrinsics we are interested in matches that of the regular instructions. Some optimization is still lacking. For instance, our VL nodes aren't recognized as commutative, so splats must be on the RHS. Because of this, we limit our sinking of splats to just the RHS operand for now. Improvement in this regard can come in another patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117703	2022-01-21 11:30:37 +00:00
Sebastian Neubauer	ae2f9c8be8	[AMDGPU] Remove lz and nomip combine from codegen These combines have been moved into the IR combiner in D116042. Differential Revision: https://reviews.llvm.org/D116116	2022-01-21 12:09:08 +01:00
Sebastian Neubauer	603d18033c	[AMDGPU][InstCombine] Remove zero LOD bias If the bias is zero, we can remove it from the image instruction. Also copy other image optimizations (l->lz, mip->nomip) to IR combines. Differential Revision: https://reviews.llvm.org/D116042	2022-01-21 12:09:07 +01:00
Sebastian Neubauer	0530fdbbbb	[AMDGPU] Fix LOD bias in A16 combine As the codegen fix in D111754, the LOD bias needs to be converted to 16 bits. Fix this in the combine. Differential Revision: https://reviews.llvm.org/D116038	2022-01-21 12:09:06 +01:00
Simon Moll	7950010e49	[VE][NFC] Factor out helper functions Factor out some helper functions to cleanup VEISelLowering. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D117683	2022-01-21 09:15:59 +01:00
wangpc	8def89b5dc	[RISCV] Set CostPerUse to 1 iff RVC is enabled After D86836, we can define multiple cost values for different cost models. So here we set CostPerUse to 1 iff RVC is enabled to avoid potential impact on RA. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117741	2022-01-21 14:44:26 +08:00
Zi Xuan Wu	82bb8a588d	[CSKY] Add codegen support of GlobalTLSAddress lowering There are static and dynamic TLS address lowering in DAG stage according to different TLS model. It needs PseudoTLSLA32 pseudo to get address of TLS-related entry which resides in constant pool.	2022-01-21 14:39:55 +08:00
Craig Topper	7b3d307288	[RISCV] Add isel patterns for grevi, shfli, and unshfli to brev8/zip/unzip instructions. Zbkb supports some encodings of the general grevi, shfli, and unshfli instructions legal, so we added separate instructions for those encodings to improve the diagnostics for assembler and disassembler. To be consistent we should always use these separate instructions whenever those specific encodings of grevi/shfli/unshfli occur. So this patch adds specific isel patterns to override the generic isel patterns for these cases. Similar was done for rev8 and zext.h for Zbb previously.	2022-01-20 20:43:52 -08:00
Wu Xinlong	7ee1c162cc	[RISCV][RFC] add inst support of zbkb This commit add instructions supports of `zbkb` which defined in scalar cryptography extension version v1.0.0 (has been ratified already). Most of the zbkb directives reuse parts of the zbp and zbb directives, so this patch just modified some of the inst aliases and predicates. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117640	2022-01-21 11:49:36 +08:00
Joao Moreira	82af95029e	[X86] Enable ibt-seal optimization when LTO is used in Kernel Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instruction on function's prologues. Because this is a security feature, it is desirable that only actual indirect-branch-targeted functions are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end being reachable through PLT entries, that will use an indirect branch for such. Because this cannot be determined during compilation-time, the compiler currently emits ENDBRs to every non-local-linkage function. Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs for NOPs directly in the binary. The discussion of this feature can be seen in [1]. This diff brings the enablement of the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement in when the code-model is set to "kernel". In this scenario, the compiler will only emit ENDBRs to address taken functions, ignoring non-address taken functions that are don't have local linkage. A comparison between an LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540. The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged. [1] - https://lkml.org/lkml/2021/11/22/591 Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D116070	2022-01-21 10:55:34 +08:00
Hsiangkai Wang	ad06e65dc4	[RISCV] Fix the bug in the register allocator caused by reserved BP. Originally, hasRVVFrameObject() will scan all the stack objects to check whether if there is any scalable vector object on the stack or not. However, it causes errors in the register allocator. In issue 53016, it returns false before RA because there is no RVV stack objects. After RA, it returns true because there are spilling slots for RVV values during RA. The compiler will not reserve BP during register allocation and generate BP access in the PEI pass due to the inconsistent behavior of the function. The function is changed to use hasStdExtV() as the return value. It is not precise, but it can make the register allocation correct. Refer to https://github.com/llvm/llvm-project/issues/53016. Differential Revision: https://reviews.llvm.org/D117663	2022-01-21 01:23:01 +00:00
Craig Topper	cfae2c65db	[RISCV] Factor Zve32 support into RISCVSubtarget::getMaxELENForFixedLengthVectors. This is needed to properly limit fractional LMULs for Zve32. Add new RUN Zve32 RUN lines to the existing tests for the -riscv-v-fixed-length-vector-elen-max command line option.	2022-01-20 16:31:12 -08:00
Craig Topper	5e88f527da	[RISCV] Remove RISCVSubtarget::hasStdExtV() and hasStdExtZve(). NFC All code should use one of the cleaner named hasVInstructions functions. Fix the two uses that weren't and delete the methods so no new uses can be created.	2022-01-20 15:05:09 -08:00
Craig Topper	fa8bb22466	[RISCV] Optimize vector_shuffles that are interleaving the lowest elements of two vectors. RISCV only has a unary shuffle that requires places indices in a register. For interleaving two vectors this means we need at least two vrgathers and a vmerge to do a shuffle of two vectors. This patch teaches shuffle lowering to use a widening addu followed by a widening vmaccu to implement the interleave. First we extract the low half of both V1 and V2. Then we implement (zext(V1) + zext(V2)) + (zext(V2) * zext(2^eltbits - 1)) which simplifies to (zext(V1) + zext(V2) * 2^eltbits). This further simplifies to (zext(V1) + zext(V2) << eltbits). Then we bitcast the result back to the original type splitting the wide elements in half. We can only do this if we have a type with wider elements available. Because we're using extends we also have to be careful with fractional lmuls. Floating point types are supported by bitcasting to/from integer. The tests test a varied combination of LMULs split across VLEN>=128 and VLEN>=512 tests. There a few tests with shuffle indices commuted as well as tests for undef indices. There's one test for a vXi64/vXf64 vector which we can't optimize, but verifies we don't crash. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D117743	2022-01-20 14:44:47 -08:00
Stanislav Mekhanoshin	41ebd19681	[AMDGPU] Do not ignore exec use where exec is read as data Compares, v_cndmask_b32, and v_readfirstlane_b32 use EXEC in a way which modifies the result. This implicit EXEC use shall not be ignored for the purposes of instruction moves. Differential Revision: https://reviews.llvm.org/D117814	2022-01-20 14:05:22 -08:00
Craig Topper	dd7b69a61f	[RISCV] Remove HadStdExtV and HasStdZve* Predicates from tablegen. No instructions should be using these. Everything should use HasVInstructions* Predicates. Remove them so that they can't be used by accident.	2022-01-20 12:54:20 -08:00
Craig Topper	7a275dc354	[RISCV] Remove Zvlsseg extension. This string no longer appears in the Vector Extension specification. The segment load/store instructions are just part of the vector instruction set. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D117724	2022-01-20 12:40:07 -08:00
Craig Topper	94e69fbb4f	[RISCV] Add DAG combine to fold (fp_to_int_sat (ffloor X)) -> (select X == nan, 0, (fcvt X, rdn)) Similar for ceil, trunc, round, and roundeven. This allows us to use static rounding modes to avoid a libcall. This is similar to D116771, but for the saturating conversions. This optimization is done for AArch64 as isel patterns. RISCV doesn't have instructions for ceil/floor/trunc/round/roundeven so the operations don't stick around until isel to enable a pattern match. Thus I've implemented a DAG combine. I'm only handling saturating to i64 or i32. This could be extended to other sizes in the future. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D116864	2022-01-20 11:35:37 -08:00
Jonas Paulsson	792853cb78	[SystemZ] Remove the ManipulatesSP flag from backend (NFC). This flag was set in the presence of stacksave/stackrestore in order to force a frame pointer. This should however not be needed per the comment in MachineFrameInfo.h stating that a a variable sized object "...is the sole condition which prevents frame pointer elimination", and experiments have also shown that there seems to be no effect whatsoever on code generation with ManipulatesSP. Review: Ulrich Weigand	2022-01-20 13:00:51 -06:00
Matt Arsenault	064cea9c9a	AMDGPU/GlobalISel: Try to use s_and_b64 in ptrmask selection Avoids a test diff with SDAG.	2022-01-20 12:56:53 -05:00
Matt Arsenault	2e49e0cfde	AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics Emit an error if the return value is used on subtargets that do not support them. Previously we were falling back to the DAG on selection failure, where it would emit this error and then fail again.	2022-01-20 12:46:45 -05:00
Matt Arsenault	be7e938e27	AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd This code is not structured to handle the legacy buffer intrinsics and was miscompiling them.	2022-01-20 12:12:06 -05:00
Matt Arsenault	8ff3c9e0be	AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics The struct/raw forms for the buffer atomics now work as expected. However, we're incorrectly handling the legacy form (which we probably shouldn't handle at all). We also are not diagnosing the use of the return value on gfx908. These will be addressed separately.	2022-01-20 12:12:06 -05:00
Matt Arsenault	89c447e4e6	AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal This was inheriting the mesa behavior, and as far as I know nobody is using opencl kernels with amdpal. The isMesaKernel check was irrelevant because this property needs to be held for all functions.	2022-01-20 12:12:05 -05:00
Random06457	ee198df2e1	[mips] Improve vr4300 mulmul bugfix pass When compiling with dwarf info, the mfix4300 flag introduced in https://reviews.llvm.org/D116238 can miss some occurrences of the vr4300 mulmul bug if a debug instruction happens to be between two `muls` instructions. This change skips debug instructions in order to fix the mulmul bug detection. Fixes https://github.com/llvm/llvm-project/issues/53094 Differential Revision: https://reviews.llvm.org/D117615	2022-01-20 20:10:04 +03:00
Simon Pilgrim	866311e71c	[X86] lowerToAddSubOrFMAddSub - lower 512-bit ADDSUB patterns to blend(fsub,fadd) AVX512 doesn't provide a ADDSUB instruction, but if we've built this from a build vector of scalar fsub/fadd elements we can still lower to blend(fsub,fadd)	2022-01-20 15:16:05 +00:00
Simon Tatham	a4ac40e92f	[AArch64] Remove PRBAR0_ELn and PRLAR0_ELn sysregs. The Armv8-R.64 architecture defines numbered MPU region registers with indices 1-15, not 0-15. So there's no such register as PRBAR0_EL2 or PRLAR0_EL1 (for example). The encodings that they would occupy are used for the unnumbered PRBAR_ELn and PRLAR_ELn registers. Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D117755	2022-01-20 13:37:58 +00:00
Abinav Puthan Purayil	d8b690409d	[AMDGPU] Set MemoryVT for truncstores in tblgen. GlobalISelEmitter was skipping these patterns when its predicates were checked. This patch should allow us to select d16_hi stores in GlobalISel. Differential Revision: https://reviews.llvm.org/D117762	2022-01-20 19:05:12 +05:30
Simon Pilgrim	304cfc706a	[X86] combineConcatVectorOps - remove superfluous Subtarget.hasAVX() check This function only ever gets called by AVX targets, and we already assert for this at the top of the function	2022-01-20 12:56:09 +00:00
Simon Pilgrim	c4f5fd76da	[X86] combineConcatVectorOps - add handling for X86ISD::VSHL/VSRL/VSRA These can be handled the same as the vector shift by immediate variants that are already handled.	2022-01-20 12:56:08 +00:00
Fraser Cormack	ca36cc56ac	[RISCV] Match RVV VF variants also through masked operations This brings floating-point RVV vector/scalar support more in line with the integer vector patterns, which can already match '.vx' instructions with masked operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117697	2022-01-20 12:08:02 +00:00
Peter Waller	d4a6bf4d1a	Revert "[AArch64][SVE][VLS] Move extends into arguments of comparisons" This reverts commit `db04d3e30b`, which causes a buildbot failure.	2022-01-20 12:01:23 +00:00
Fraser Cormack	5a12024b95	[RISCV] Optimize lowering of floating-point -0.0 This idea has come up in several reviews -- D115978 and D105902 -- so I can't take any credit for the idea. Instead of using a constant pool to lower -0.0, we can emit a sequence of two instructions: fmv.[hwd].x freg, zero fsgnjn.[hsd] freg, freg, freg This is only done when the floating-point type is legal. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117687	2022-01-20 11:46:28 +00:00
David Spickett	6732c43897	[llvm][AArch64] Accept armv8.8 "hbc" and "mops" in .arch_extension directive Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D117693	2022-01-20 09:16:08 +00:00
Chenbing.Zheng	0be3da1fab	[RISCV] Add intrinsic for Zbt extension RV32: fsl, fsr, fsri RV64: fsl, fsr, fsri, fslw, fsrw, fsriw Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117468	2022-01-20 08:27:05 +00:00
eopXD	8eae99dfe5	[RISCV] Add the zve extension according to the v1.0 spec `zve` is the new standard vector extension to specify varying degrees of vector support for embedding processors. The `zve` extension is related to the `zvl` extension and other updates that are added in v1.0. According to https://github.com/riscv-non-isa/riscv-c-api-doc/pull/21, Clang defines macro `__riscv_v_max_elen`, `__riscv_v_max_elen_fp` for `zve` and it can be used by applications that uses the vector extension. Authored by: Zakk Chen <zakk.chen@sifive.com> @khchen Co-Authored by: Eop Chen <eop.chen@sifive.com> @eopXD Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D112408	2022-01-19 23:48:28 -08:00
Jim Lin	216ac31dd7	[M68k][NFC] Rename Bt(BT) to Btst(BTST) It seems that implementation of Bt refered from x86. In M68k, Bt(BT) should be renamed to Btst(BTST). Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D117534	2022-01-20 12:45:02 +08:00
Heejin Ahn	eb675e972d	[WebAssembly] Support Wasm EH + Wasm SjLj D108960 added support for SjLj using Wasm EH instructions, which we call Wasm SjLj going forward. (We call the old SjLj Emscripten SjLj) But it did not support using Wasm EH and Wasm SjLj together. So far users of Wasm EH had to use Wasm EH with Emscripten SjLj, which had a certain limitation and it suffered from bigger code size increases as well. This enables using Wasm EH and Wasm SjLj together. 1. This redirects `catchswitch` and `cleanupret` that unwind to caller to `catch.dispatch.longjmp` BB, which is a `catchswitch` BB that handles longjmps. 2. D108960 converted all longjmpable `call`s to `invokes` that unwind to `catch.dispatch.longjmp`. This CL checks if the `call` is embedded within another `catchpad`, and if so, makes it unwind to its nearest parent's unwind destination, rather than `catch.dispatch.longjmp`. This is necessary to preserve the scoping structure. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D117610	2022-01-19 20:13:54 -08:00
Luo, Yuanke	5dea7a865e	Combine to vpdpbusd when operand is constant and small enough. Differential Revision: https://reviews.llvm.org/D116363	2022-01-20 11:10:49 +08:00
Ben Shi	94173dc24c	[AVR] Generate ELPM for loading byte/word from extended program memory Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D116493	2022-01-20 02:53:10 +00:00
Ben Shi	c1dd607463	[AVR][MC] Generate section '.progmemX.data' for extended flash banks Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D115987	2022-01-20 02:53:10 +00:00
Adrian Tong	b6a7ae2c5d	Optimize shift and accumulate pattern in AArch64. AArch64 supports unsigned shift right and accumulate. In case we see a unsigned shift right followed by an OR. We could turn them into a USRA instruction, given the operands of the OR has no common bits. Differential Revision: https://reviews.llvm.org/D114405	2022-01-20 01:57:40 +00:00
Mohammed Nurul Hoque	21c79be5d7	[RISCV] Add patterns to MIR sign-extension removal pass. This patch adds a few instruction patterns that generate sign-extended values or propagate them, adding to the pass introduced in https://reviews.llvm.org/D116397 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117465	2022-01-19 17:33:58 -08:00
Luís Marques	a767ae2c5c	[RISCV] Fix incomplete asm statement parsing For instructions without operands, the final `AsmToken::EndOfStatement` wasn't being consumed. In the context of inline assembly, the resulting empty statements would cause extraneous empty lines to be emitted. Fix the issue by consuming the `EndOfStatement` token. Differential Revision: https://reviews.llvm.org/D117565	2022-01-19 21:56:21 +00:00

1 2 3 4 5 ...

65876 Commits