llvm-project

Commit Graph

Author	SHA1	Message	Date
Philip Reames	32cfafddb1	[RISCV] Verify VL operand on instructions if present These should only be immediate values or GPR registers. Differential Revision: https://reviews.llvm.org/D133953	2022-09-15 13:06:52 -07:00
Sergei Barannikov	c6acb4eb0f	[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s All in-tree targets pass pointer-sized ConstantSDNodes to the method. This overload reduces the amount of boilerplate code a bit. This also makes getCALLSEQ_END consistent with getCALLSEQ_START, which already takes uint64_ts.	2022-09-15 14:02:12 -04:00
Craig Topper	5888c157a7	[RISCV] Simplify some code in RISCVInstrInfo::verifyInstruction. NFCI This code was written as if it lived in the MC layer instead of the CodeGen layer. We get the MCInstrDesc directly from MachineInstr. And we can use RISCVSubtarget::is64Bit instead of going to the Triple. Differential Revision: https://reviews.llvm.org/D133905	2022-09-14 17:07:21 -07:00
Philip Reames	e395915ac0	[RISCV] Verify SEW/VecPolicy immediate values Copy the asserts from the printing code, and turn them into actual verifier rules. Doing this revealed an existing bug - see `0a14551`. Differential Revision: https://reviews.llvm.org/D133869	2022-09-14 14:45:16 -07:00
Philip Reames	0a145516a2	[RISCV] Fix a silent miscompile in copyPhysReg Found this when adding verifier rules. The case which arises is that we have a DefMBBI which has a VecPolicy operand. The code was not expecting this, and the unconditional copy of the last two operands resulted in the SEW and VecPolicy fields being added to the VMV_V_V as AVL and SEW respectively. Oddly, this appears to be silent in practice. There's no test change despite verifier changes proving that we definitely hit this in existing tests. Differential Revision: https://reviews.llvm.org/D133868	2022-09-14 14:45:01 -07:00
jacquesguan	ecf327f154	[RISCV] Add cost model for vector insert/extract element. This patch adds a cost model for vector insert/extract element instructions. In RVV, we can use a vector scalar move instruction to insert or extract the first element, and use vslide to move it. But for a mask vector, or an i64 vector on an i32 target, we need special instructions to do so. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133007	2022-09-14 11:10:18 +08:00
Yeting Kuo	1b56b2b267	[RISCV] Transform VMERGE_VVM_<LMUL>_TU with all ones mask to VADD_VI_<LMUL>_TU. The transformation is beneficial because vmerge.vvm always needs a mask operand but vadd.vi may not. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D133255	2022-09-14 10:01:37 +08:00
Han-Kuan Chen	dd53a0bb30	[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type. Differential Revision: https://reviews.llvm.org/D133688	2022-09-13 18:50:20 -07:00
Fangrui Song	ab1c259613	[RISCV] Assemble `call foo` to R_RISCV_CALL_PLT R_RISCV_CALL/R_RISCV_CALL_PLT distinction isn't necessary. R_RISCV_CALL has been deprecated as a resolution to https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/98 . ld.lld and mold treat the two relocation types the same. GNU ld has a custom handling for undefined weak functions which is unnecessary: calling an unresolved undefined weak function is UB and GNU ld can handle the case without a relocation error (such a function call is usually guarded by a zero value check and should be allowed). This patch assembles `call foo` to use R_RISCV_CALL_PLT instead of the deprecated R_RISCV_CALL. Note: the code generator still differentiates `call foo` and (maybe preemptible) `call foo@plt`, but the difference is purely aesthetic. Note: D105429 does not support R_RISCV_CALL_PLT correctly. Changed the test to force R_RISCV_CALL for now. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D132530	2022-09-13 18:47:55 -07:00
Philip Reames	09d73fe8cd	[RISCV] Add MIR comments for VecPolicy operands Analogous to what we already do for SEW operands, aimed at making the resulting MIR readable by a human.	2022-09-13 15:36:33 -07:00
Philip Reames	cc45687e1c	[RISCV] Simplify operand index calculation in createMIROperandComment [nfc]	2022-09-13 15:06:40 -07:00
Craig Topper	8d7e73effe	[RISCV] Teach lowerVECTOR_SHUFFLE to recognize some shuffles as vnsrl. Unary shuffles such as <0,2,4,6,8,10,12,14> or <1,3,5,7,9,11,13,15> where half the elements are returned, can be lowered using vnsrl. SelectionDAGBuilder lowers such shuffles as a build_vector of extract_elements since the mask has fewer elements than the source. To fix this, I've enabled the extractSubvectorIsCheapHook to allow DAGCombine to rebuild the shuffle using 2 extract_subvectors preceding the shuffle. I've gone very conservative on extractSubvectorIsCheapHook to minimize test impact and match what we have test coverage for. This can be improved in the future. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133736	2022-09-13 11:07:11 -07:00
Alex Bradbury	c44c1e9d3e	[RISCV] Implement isMaskAndCmp0FoldingBeneficial hook This hook is currently only used by CodeGenPrepare, which will sink and duplicate an 'and' into a block that has an 'icmp 0' user of it if the hook returns true. This hook is less useful for RISC-V than for targets like AArch64 that have a TBZ (test bit and branch if zero instruction), but may still be profitable if Zbs is available and a BEXTI can be selected. Conservatively, we return false even if Zbs is enabled for any masks that fit in the ANDI immediate because it's possible the only use is a branch on the result, and ANDI+BNEZ => BEXTI+BNEZ isn't a profitable transformation. Differential Revision: https://reviews.llvm.org/D131492	2022-09-13 18:54:00 +01:00
Alex Bradbury	547160848c	[RISCV] Return true in hasBitTest when Zbs is enabled and update BEXTI pattern for resulting canonicalisation As the Zbs extension includes bext[i] for bit extract, we can unconditionally return true from this hook. This hook causes the DAG combiner to perform the following canonicalisations: `and (not (srl X, C)), 1 --> (and X, 1<<C) == 0`; `and (srl (not X), C), 1 --> (and X, 1<<C) == 0`. As simply changing the hook causes a codegen regression, this patch also modifies a BEXTI pattern to match this canonicalised form. As BSETINVMask is now used for BEXT as well as BSET and BINV, it has been renamed to the more generic SingleBitSetMask. There is one codegen change in bittest.ll for bittest_31_i64 (NOT+BEXTI rather than NOT+SRLIW). This is neutral in terms of code quality. Differential Revision: https://reviews.llvm.org/D131482	2022-09-13 16:51:47 +01:00
Craig Topper	5224bae613	[RISCV] Fix a bug in i32 FP_TO_UINT_SAT lowering on RV64. We use the saturating behavior of fcvt.wu.h/s/d but forgot to take into account that fcvt.wu will sign extend the saturated result. According to computeKnownBits a promoted FP_TO_UINT_SAT is expected to zero extend the saturated value. In many cases the upper bits aren't demanded so this wouldn't be an issue. But if computeKnownBits caused an AND to be removed it would be a bug. This patch inserts an AND to zero the upper bits. Unfortunately, this pessimizes code if we aren't able to tell if the upper bits are demanded. To fix that we could custom type promote the FP_TO_UINT_SAT with SEXT_INREG after it, but I'll leave that for future work. I haven't found a failure from this, I was revisiting the code to add vector support and spotted it. Differential Revision: https://reviews.llvm.org/D133746	2022-09-13 08:41:32 -07:00
Haojian Wu	7ed68182d7	Fix a -Wswitch warning.	2022-09-13 08:57:43 +02:00
jacquesguan	b98b4fae75	[RISCV] Add cost model for compare and select instructions. This patch adds a cost model for vector compare and select instructions. For vector FP compare instructions, it only adds the comparisons supported natively. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132296	2022-09-13 14:44:46 +08:00
Yeting Kuo	5fcb5d7759	[RISCV] Add assertion of hasVecPolicyOp to catch masked intrinsic without policy operand. The original code may produce an incorrect result if there is a masked instruction without a policy operand, since it makes us set its policy to TUMU. The patch adds an assertion to catch such an instruction. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133302	2022-09-13 10:09:49 +08:00
Craig Topper	d49280e0a4	[RISCV] Rename WriteFALU* and ReadFALU* to WriteFAdd/ReadFAdd. ALU seems a little vague. FAdd felt more precise even though it also includes FSUB instructions. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D133632	2022-09-12 09:37:28 -07:00
Craig Topper	4186a49d79	[RISCV] Custom type legalize i32 loads by sign extending. The default is to use extload which can become a zextload or sextload if it is followed by an 'and' or sext_inreg. Sometimes type legalization will introduce an 'and' from promoting something like 'srl X, C' and a sext_inreg from a setcc. The 'and' could be freely folded with the promoted 'srl' by using srliw, but the sext_inreg can't be folded into a compare. DAG combiner will see both of these choices and may decide to fold the 'and' instead of the 'sext_inreg'. This forces the sext_inreg to become a sext.w. By picking sextload in the type legalizer we take this choice away. Looking at spec2006 compiled with Zba and Zbb this appeared to be a net reduction in lines of code in the objdump disassembly output. This is similar to what we do with i32 add/sub/mul/shl in type legalization where we always emit a sext_inreg. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D130397	2022-09-12 09:13:07 -07:00
Alex Bradbury	51ae462447	[RISCV] Add the GlobalMerge pass (disabled by default) Split out from D129178, this just adds the GlobalMerge tests (other than global-merge-minsize.ll which is testing a specific configuration of the pass when it's enabled) and exposes `-riscv-enable-global-merge` and doesn't enable it by default. Note that the comment "// FIXME: Unify control over GlobalMerge." is copied from the Arm and AArch64 backends, which expose the same flag. Presumably the author is imagining some later refactoring that provides a target-independent flag. Reviewed By: craig.topper, reames, hiraditya Differential Revision: https://reviews.llvm.org/D130481	2022-09-08 18:40:38 -07:00
Craig Topper	5f3a8b585b	[RISCV] Add RecurKind::FMulAdd to isLegalToVectorizeReduction for scalable vectors. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133511	2022-09-08 12:34:59 -07:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
liqinweng	9b4e75ee76	[RISCV][COST] Add cost model for mask vector select instruction when its condition is a scalar type Reviewed By: jacquesguan Differential Revision: https://reviews.llvm.org/D132992	2022-09-08 18:55:49 +08:00
Philip Reames	a4a29438f4	[RISCV][MC] Add minimal support for Ztso extension This is a minimalist implementation which simply adds the extension (in the experimental namespace since it's not ratified), and wires up the setting of the required ELF header flag. Future changes will include codegen changes to exploit the stronger memory model. This is intended to implement v0.1 of the proposed specification which can be found in Chapter 25 of https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-spec.pdf. Differential Revision: https://reviews.llvm.org/D133239	2022-09-07 09:30:57 -07:00
Craig Topper	5d30565d80	[RISCV] Improve vector fround lowering by changing FRM. This is a follow up to D133238 which did this for ceil/floor. Reviewed By: arcbbb, frasercrmck Differential Revision: https://reviews.llvm.org/D133335	2022-09-06 09:33:13 -07:00
Craig Topper	f0332d12ae	[RISCV] Improve vector fceil/ffloor lowering by changing FRM. This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the VFCVT. Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from RVV intrinsics as I could. A followup patch will use this approach for FROUND too. Still need to fix the cost model. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D133238	2022-09-05 19:03:44 -07:00
Craig Topper	11881a8f3f	[RISCV] Rename some V extension multiclasses for consistency. NFC Using "SDNode" in the name is the convention for the VLMax patterns in RISCVInstrInfoVSDPatterns.td. This file uses "VL".	2022-09-01 22:17:08 -07:00
Alex Bradbury	6e1897ce95	[RISCV][NFC] Fix typo in comment in RISCVInstrInfoZicbo.td Zicbop->Zicbom typo.	2022-09-01 13:49:55 +01:00
liqinweng	c45810f810	[RISCV] ISD::SETUGT && Imm == -1 has been processed before lowering The case ISD::SETUGT && Imm == -1 has already been processed before lowering, so use an assert instead. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D132373	2022-09-01 15:38:16 +08:00
Craig Topper	6e0ae7e940	[RISCV] Slightly simplify code in combineVWADD_W_VL_VWSUB_W_VL and combineMUL_VLToVWMUL_VL. NFC Use computeMaxSignificantBits instead of ComputeNumSignBits. Create APInt as part of call to MaskedValueIsZero instead of creating a named temporary.	2022-08-31 15:02:03 -07:00
Michael Maitland	30a4264f5f	[RISCV][CodeGen] add assertion to RISCVTargetStreamer getTargetStreamer() X86 and ARM AsmParsers have this same assertion. This assertion provides better reporting when the RISCVTargetStreamer is null and helps to prevent null pointer access. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D132863	2022-08-31 11:15:47 -07:00
jacquesguan	45c1ce321d	[RISCV] Add cost model for select and integer compare instructions. This patch adds cost model for vector select and integer compare instructions.	2022-08-31 11:32:58 +08:00
Craig Topper	7973346d16	[RISCV] Use uint64_t countTrailingZeros/Ones instead of APInt. NFC We know the type is 32 or 64 bits, we can use getZExtValue and bypass the slow path check in APInt.	2022-08-30 12:39:36 -07:00
Craig Topper	893f5e95e2	[RISCV] Improve isel of AND with shiftedMask containing 32 leading zeros and some trailing zeros. We can use srliw to shift out the trailing bits and slli to shift back in zeros. The sign extend of srliw will 0 the upper 32 bits since we will be shifting a 0 into bit 31.	2022-08-30 12:22:46 -07:00
liqinweng	72c9f811d8	[RISCV][COST] Refactor for costs of integer saturating add/sub Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132822	2022-08-30 11:39:55 +08:00
Craig Topper	e25eb61d03	[RISCV] Enable (srl (and X, C2), C) to form SRLIW in more cases. Don't require the AND has one use and don't depend on targetShrinkDemandedConstant turning C2 into 0xffffffff. Instead, check that the constant is 0xffffffff after replacing any bits that will be shifted out with 1s. Another way to fix this might be to prevent SimplifyDemandedBits from destroying the ANDI after type legalization using targetShrinkDemandedBits. That would prevent the CSE that created this mess. targetShrinkDemandedBits is currently only enabled after legalize ops. Quick experiment shows we can't just change when it runs, we would need to try a different heuristic for post type legalization.	2022-08-29 15:52:08 -07:00
Craig Topper	0fbe71e91f	[RISCV] Use hasAllWUsers to recover ANDI. SimplifyDemandedBits can 0 the upper bits and targetShrinkDemandedConstant isn't always able to recover it. At least part of that may be because targetShrinkDemandedConstant only runs in the last DAGCombine. Might be worth seeing what happens if we move it post type legalization.	2022-08-29 14:11:09 -07:00
Craig Topper	1c334b306e	[RISCV] Add more invertible setccs to tryDemorganOfBooleanCondition. This builds on D132771 to invert (setlt 0, X) to (setlt X, 1) and vice versa. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132798	2022-08-29 12:23:03 -07:00
Craig Topper	9d12bb77f9	[RISCV] Apply DeMorgan to (beqz (and/or (seteq), (xor Z, 1))) to remove the xor. We can rewrite to (bnez (or/and (setne), Z)) if Z is 0/1. Alternatively, we could canonicalize to (xor (or/and (setne), Z), 1) even if there is no branch. The xor would not always get removed, but it might enable other DeMorgan combines. I decided to be conservative for this first patch and require the xor to be removed. I have a couple other invertible setccs I will add in a follow up patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132771	2022-08-29 12:16:34 -07:00
Craig Topper	2f811a6c7f	[VP][RISCV] Add vp.fabs intrinsic and RISC-V support. Mostly just modeled after vp.fneg except there is a "functional instruction" for fneg while fabs is always an intrinsic. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D132793	2022-08-29 09:32:06 -07:00
Craig Topper	e0a9da2562	[RISCV] Add Uses=[FRM] and mayRaiseFPException to VF(N/W)CVT instructions. Reviewed By: arcbbb, kito-cheng Differential Revision: https://reviews.llvm.org/D132792	2022-08-29 09:26:33 -07:00
Craig Topper	6732896bbf	[RISCV] Use analyzeBranch in RISCVRedundantCopyElimination. The existing code was incorrect if we had more than one conditional branch instruction in a basic block. Though I don't think that will occur, using analyzeBranch detects that as an unsupported case. Overall this results in simpler code in RISCVRedundantCopyElimination. Reviewed By: reames, kito-cheng Differential Revision: https://reviews.llvm.org/D132347	2022-08-29 09:05:53 -07:00
Yeting Kuo	abf0416328	[RISCV] Merge vmerge.vvm and unmasked intrinsic with VLMAX vector length. The motivation of this patch is to lower the IR pattern (vp.merge mask, (add x, y), false, vl) to (PseudoVADD_VV_<LMUL>_MASK false, x, y, mask, vl). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131841	2022-08-29 11:44:51 +08:00
liqinweng	6fdd62c98d	[RISCV] Remove unused code Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D132281	2022-08-29 10:16:44 +08:00
liqinweng	a42e21deb8	[RISCV] Refactor for costs of integer min/max Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132724	2022-08-29 10:13:50 +08:00
Kazu Hirata	2833760c57	[Target] Qualify auto in range-based for loops (NFC)	2022-08-28 17:35:09 -07:00
Benjamin Kramer	b69086e6c7	[RISC-V][HWASAN] Fold variable into assert	2022-08-29 00:32:37 +02:00
Alexey Baturo	e3485345d3	[RISC-V][HWASAN] Add support for lowering HWASAN intrinsic for RISC-V Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D131343	2022-08-28 21:22:13 +03:00
Alexey Baturo	0636aec330	[RISC-V][HWASAN] Add intrinsics required for HWASAN support for RISC-V Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D131340	2022-08-28 18:05:43 +03:00
Philip Reames	b45a262679	[RISCV] Enable fixed length vectors and loop vectorization with same This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size. For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware. The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization. SLP has been disabled for now, even when fixed vectors are enabled. See `a310637` and associated review. There are a few additional code quality issues which need to be worked through before turning SLP on would be reasonable. Differential Revision: https://reviews.llvm.org/D131508	2022-08-26 14:45:23 -07:00
Philip Reames	a310637132	[RISCV] Disable SLP vectorization by default due to unresolved profitability issues This change implements a TTI query with the goal of disabling slp vectorization on RISCV. The current default configuration disables SLP already, but it's currently tied to the ability to lower fixed length vectors. Over in D131508, I want to enable fixed length vectors for purposes of LoopVectorizer, but preliminary analysis has revealed a couple of SLP specific issues we need to resolve before enabling it by default. This change exists to allow us to enable LV without SLP. Differential Revision: https://reviews.llvm.org/D132680	2022-08-26 14:11:22 -07:00
Yunze Zhu	3846e3970f	[RISCV] Generate correct ELF abi flag when empty .ll file has target-abi attribute In patch D121183, the target ABI is read from the .ll file's target-abi attribute and set in the RISCVAsmPrinter::emitFunctionEntryLabel function. In https://github.com/llvm/llvm-project/issues/57242, an ABI mismatch error may be caused by RISCVAsmPrinter::emitFunctionEntryLabel never being called to set the correct target-abi when the .ll file is empty or a module has no function. This patch moves the target-abi setting to RISCVAsmPrinter::emitStartOfAsmFile, making sure every .ll file and every module in LTO reads target-abi from the module flag and sets it, with or without functions. Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com> Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com> Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D132204	2022-08-26 14:39:39 +08:00
LiaoChunyu	6b098bf35a	[RISCV] Add support for simm10_lsb0000nonzero operand. When running llvm-exegesis on a RISC-V machine, I ran into trouble: C_ADDI16SP can't be measured, because its immediate has type simm10_lsb0000nonzero. This patch adds support for processing this immediate operand type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D132650	2022-08-26 14:37:37 +08:00
Craig Topper	e4177201eb	[RISCV][M68k] Replace fixed size BitVector with std::bitset. Saves a heap allocation and avoids an explicit call to the BitVector constructor. Reviewed By: reames, myhsu Differential Revision: https://reviews.llvm.org/D132674	2022-08-25 12:45:08 -07:00
Craig Topper	41a3b5739b	[RISCV] Teach combineDeMorganOfBoolean to handle (and (xor X, 1), (not Y)). SimplifyDemandedBits tries to aggressively turn xor immediates into -1 to match a 'not' instruction. In this case, because X is a boolean, the upper bits of (xor X, 1) are known to be 0. Because this is an AND instruction, that means those bits aren't demanded from the other operand, and thus SimplifyDemandedBits can turn (xor Y, 1) to (not Y). We need to detect that this has happened to enable the DeMorgan optimization. To do this we allow one of the xors to use -1 when the outer operation is And. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132671	2022-08-25 10:55:45 -07:00
Philip Reames	53f738ce7e	[RISCV] Add empirical costs for integer min/max and saturating add/sub All of these are lowered to a single instruction for all legal vector types.	2022-08-25 09:27:17 -07:00
Craig Topper	ec91d761ac	[RISCV] Apply DeMorgan's law to (and/or (xor X, 1), (xor Y, 1)) if X and Y are 0/1. This optimizes xors that appear due to legalizing setge/setle which require an xor with 1. This reduces the number of xors and may allow the xor to fold with a beqz or bnez. Differential Revision: https://reviews.llvm.org/D132614	2022-08-25 08:49:30 -07:00
Philip Reames	03798f268b	[RISCV] Backout cttz/ctlz instruction costs Craig points out correctly in post-commit review that these depend on the availability of floating point extensions.	2022-08-24 15:40:48 -07:00
Philip Reames	d4d6e71ea2	[RISCV] Add empirical costs for bswap/bitreverse/ctpop/ctlz/cttz If anyone is looking for a source of ideas on vector codegen improvements, the lowerings for several of these seem to include pretty obvious fixits.	2022-08-24 15:09:21 -07:00
Philip Reames	42af1a776a	[RISCV] Add empirically measured vector sqrt intrinsic costs	2022-08-24 14:27:57 -07:00
Philip Reames	4d3134866f	[RISCV] Add vector fabs intrinsic costs We have a fabs vector instruction, and are using it for current lowering.	2022-08-24 14:09:51 -07:00
Saleem Abdulrasool	8f45b5a7a9	RISCV: permit unaligned nop-slide padding emission We may be requested to emit an unaligned nop sequence (e.g. 7-bytes or 3-bytes). These should be 0-filled even though that is not a valid instruction. This matches the behaviour on other architectures like ARM, X86, and MIPS. When a custom section is emitted, it may be classified as text even though it may be a data section or we may be emitting data into a text segment (e.g. a literal pool). In such cases, we should be resilient to the emission request. This was originally identified by the Linux kernel build and reported on D131270 by Nathan Chancellor. Differential Revision: https://reviews.llvm.org/D132482 Reviewed By: luismarques Tested By: Nathan Chancellor	2022-08-24 20:26:48 +00:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Kito Cheng	8e8a62006e	[RISCV][NFC] Minor cleanup in RISCVInstrInfo::getOutliningType The only use of TM is checking the result of TargetMachine::getFunctionSections; check that directly instead of introducing a local variable.	2022-08-24 23:42:34 +08:00
Alex Richardson	38107171ed	[RegisterInfoEmitter] Generate isConstantPhysReg(). NFCI This commit moves the information on whether a register is constant into the Tablegen files to allow generating the implementation of isConstantPhysReg(). I've marked isConstantPhysReg() as final in this generated file to ensure that changes are made to tablegen instead of overriding this function, but if that turns out to be too restrictive, we can remove the qualifier. This should be pretty much NFC, but I did notice that e.g. the AMDGPU generated file also includes the LO16/HI16 registers now. The new isConstant flag will also be used by D131958 to ensure that constant registers are marked as call-preserved. Differential Revision: https://reviews.llvm.org/D131962	2022-08-24 14:16:20 +00:00
Kito Cheng	96c85f80f0	[RISCV] Don't outline pcrel-lo operand. This issue was found by building llvm-testsuite with `-Oz`: the linker complains `dangerous relocation: %pcrel_lo missing matching %pcrel_hi`. The cause turned out to be that we outlined the pcrel-lo but left the pcrel-hi in place. That is not a problem in general, but here the two end up in different sections, and a pcrel-hi/pcrel-lo pair (e.g. AUIPC+ADDI) MUST be present in the same section due to the implementation. Outlined functions are put into the .text section, but the source functions are put into .text.<function-name> if function-section is enabled or the function has the `comdat` attribute. There are a few solutions for this issue: 1. Always disallow instructions with pcrel-lo flags. 2. Only disallow instructions with pcrel-lo flags when function-section is enabled or the function has the `comdat` attribute. 3. Check whether the corresponding instruction with pcrel-high is also included in the outlining candidate sequence, and allow outlining only when it is. The first is the most conservative and might lose some optimization opportunities; the second keeps those opportunities; the last is hard to implement and has no benefit, since pcrel-high instructions use different labels even when accessing the same symbol. Using a custom section name might also cause this problem, but that is already filtered by RISCVInstrInfo::isFunctionSafeToOutlineFrom. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D132528	2022-08-24 21:47:46 +08:00
MarkGoncharovAl	8c1f18bd3e	[RISCV] Add support for immediate operands. llvm-exegesis uses operand type information provided in tablegen files to initialize immediate arguments of the instruction. Some of them simply don't have such information, so we should give the relevant immediate operands their specific types. Also create verification methods for them. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131771	2022-08-24 17:48:39 +08:00
Alex	07a700f814	[RISCV] Add zihintntl compressed instructions Add zihintntl compressed instructions and some files related to zihintntl. This patch is based on D121670. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D121779	2022-08-24 14:29:02 +08:00
ZHU Zijia	9c85382ade	[RISCV] Handle register spill in branch relaxation In the branch relaxation pass, `j`'s with offset over 1MiB will be relaxed to `jump` pseudo-instructions. This patch allocates a stack slot for functions with a size greater than 1MiB. If the register scavenger cannot find a scratch register for `jump`, spill a register to the slot before the jump and restore it after the jump. `.mbb: foo; j .dest_bb; bar; bar; bar; .dest_bb: baz` The above code will be relaxed to the following code. `.mbb: foo; sd s11, 0(sp); jump .restore_bb, s11; bar; bar; bar; j .dest_bb; .restore_bb: ld s11, 0(sp); .dest_bb: baz` Depends on D129999. Reviewed By: StephenFan Differential Revision: https://reviews.llvm.org/D130560	2022-08-24 13:27:56 +08:00
Philip Reames	c9608d57b8	[TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC] This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.	2022-08-23 07:55:42 -07:00
Philip Reames	478cf94378	[X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc] This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties. This change adds some accessor methods and uses them to simplify backend code. The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.	2022-08-22 13:37:59 -07:00
Shao-Ce SUN	7167a4207e	[RISCV] Add zihintntl instructions Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D121670	2022-08-22 12:06:30 +08:00
Craig Topper	abce7acebd	[RISCV] Remove impossible TODO in RISCVRedundantCopyElimination. NFC If there are multiple conditional branches we shouldn't do any optimization.	2022-08-21 13:18:02 -07:00
Craig Topper	1a042dd6ed	[RISCV] Optimize x <s -1 ? x : -1. Improve x >u 1 ? x : 1. Similar to D132211, we can optimize x <s -1 ? x : -1 -> x <s 0 ? x : -1 Also improve the unsigned case from D132211 to use x != 0 which will give a bnez instruction which might be compressible. Differential Revision: https://reviews.llvm.org/D132252	2022-08-21 11:48:28 -07:00
Craig Topper	a6c3ccd476	[RISCV] Be more strict about LUI+ADDI macrofusion pre-RA. Don't macrofuse if the LUI has more than 1 user. That will likely require the LUI to have a different destination register post-RA. LUI+ADDI can only be fused if they write the same register.	2022-08-21 10:58:15 -07:00
LiaoChunyu	1fb87ace4d	[RISCV] Optimize x > 1 ? x : 1 -> x > 0 ? x : 1 if x == 1, x > 1 ? x : 1 return x, which is also 1. x > 0 ? x : 1 return 1. Reduce the number of load 1 instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D132211	2022-08-21 20:26:39 +08:00
Simon Pilgrim	5263155d5b	[CostModel] Add CostKind argument to getShuffleCost Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future. Differential Revision: https://reviews.llvm.org/D132287	2022-08-21 10:54:51 +01:00
Craig Topper	6227b7ae31	[RISCV] Move xori creation for scalar setccs to lowering. This patch enables expansion or custom lowering for some integer condition codes so that any xori that is needed is created before the last DAG combine to enable optimization. I've seen cases where we end up with (or (xori (setcc), 1), (xori (setcc), 1)) which we would ideally convert to (xori (and (setcc), (setcc)), 1). This patch doesn't accomplish that yet, but it should allow us to add DAG combines as follow ups. Example https://godbolt.org/z/Y4qnvsq1b Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131729	2022-08-19 13:51:53 -07:00
Philip Reames	59960e8db9	[RISCV] Factor out getVectorImmCost cost after 0e7ed3 [nfc]	2022-08-19 12:53:54 -07:00
Philip Reames	e7fda46300	[RISCV] Correct costs for vector ceil/floor/trunc/round Add vector costs for ceil/floor/trunc/round. As can be seen in the tests, the prior default costs were a significant underestimate of the actual code generated. These costs are computed by simply generating code with the current backend, and then counting the number of instructions. I discount one vsetvli, and ignore the return. Differential Revision: https://reviews.llvm.org/D131967	2022-08-19 10:37:39 -07:00
Craig Topper	961838cc13	[RISCV] Add passthru operand to RISCVISD::SETCC_VL. Use it to fix a bug in the fceil/ffloor lowerings. We were setting the passthru to IMPLICIT_DEF before and using a mask agnostic policy. This means where the incoming bits in the mask were 0 they could be anything in the outgoing mask. We want those bits in the outgoing mask to be 0. This means we need to pass the input mask as the passthru. This generates worse code because we are unable to allocate the v0 register to the output due to an earlyclobber constraint. We probably need a special TIED pseudoinstruction and probably custom isel since you can't use V0 twice in the input pattern. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132058	2022-08-19 08:53:44 -07:00
Craig Topper	c9a41fe60a	[RISCV] Prefer vnsrl.wi v8, v8, 0 over vnsrl.wx v8, v8, x0. I have a couple data points that some microarchitectures prefer the immediate 0 over x0. Does anyone know of microarchitectures where the opposite is true? Unfortunately, this is different than the vncvt.x.x.w alias from the spec. Perhaps the alias was poorly chosen if x0 isn't as optimal as immediate 0 on all microarchitectures. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132041	2022-08-19 08:40:17 -07:00
Alexey Bataev	0e7ed32c71	[SLP]Cost for a constant buildvector. In many cases constant buildvector results in a vector load from a constant/data pool. Need to consider this cost too. Differential Revision: https://reviews.llvm.org/D126885	2022-08-19 08:02:42 -07:00
Alexey Bataev	d53e245951	[COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC. Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to better estimate cost with immediate values. Part of D126885.	2022-08-19 07:33:00 -07:00
Craig Topper	ba1f4cab44	[RISCV] Copy SDNodeFlags in lowerToScalableOp. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D132177	2022-08-18 20:42:59 -07:00
Craig Topper	5349aa2354	[RISCV] Copy SDNodeFlags in doPeepholeMaskedRVV and doPeepholeMergeVVMFold Especially the NoFPExcept flag for FP. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D132173	2022-08-18 20:42:46 -07:00
Craig Topper	37c47b2cac	[RISCV] Change how mtune aliases are implemented. The previous implementation translated from names like sifive-7-series to sifive-7-rv32 or sifive-7-rv64. This also required sifive-7-rv32 and sifive-7-rv64 to be valid CPU names. As those are not real CPUs it doesn't make sense to accept them in -mcpu. This patch does away with the translation and adds sifive-7-series directly to RISCV.td. Removing sifive-7-rv32 and sifive-7-rv64. sifive-7-series is only allowed in -mtune. I've also added "rocket" to RISCV.td but have not removed rocket-rv32 or rocket-rv64. To prevent -mcpu=sifive-7-series or -mcpu=rocket being used with llc, I've added a Feature32Bit to all rv32 CPUs. And made it an error to have an rv32 triple without Feature32Bit. sifive-7-series and rocket do not have Feature32Bit or Feature64Bit set so the user would need to provide -mattr=+32bit or -mattr=+64bit along with the -mcpu to avoid the error. SiFive no longer names their newer products with 3, 5, or 7 series. Instead we have p200 series, x200 series, p500 series, and p600 series. Following the previous behavior would require a sifive-p500-rv32 and sifive-p500-rv64 in order to support -mtune=sifive-p500-series. There is currently no p500 product, but it could start getting confusing if there was in the future. I'm open to hearing alternatives for how to achieve my main goal of removing sifive-7-rv32/rv64 as a CPU name. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131708	2022-08-18 16:22:25 -07:00
Philip Reames	4d87591028	[RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL On known hardware, reductions, gather, and scatter operations have execution latencies which correlate with the vector length (VL) of the operation. Most other operations (e.g. simple arithmetic) don't correlate in this way, and instead have essentially fixed cost as VL varies. When I'd implemented initial scalable cost model support for reductions, gather, and scatter operations, I had used an upper bound on the statically unknown VL. The argument at the time was that this prevented falsely low costs, and biased the vectorizer away from generating bad (on some hardware) code. Unfortunately, practical experience shows we were a bit too effective at that goal, and the high costs de facto prevent vectorization using these constructs at all. This patch reverses course, and ties the returned cost not to the maximum possible VL, but the VL which would correspond to VScaleForTuning. This parameter is the same one the vectorizer uses when normalizing loop costs, so the term effectively cancels out. The result is that the vectorizer now sees these constructs as comparable in cost to their fixed length variants. This does introduce the possibility of the cost for these operations being a significant underestimate on platforms where actual VLEN is far from that implied by VScaleForTuning. On such platforms, we might make poor heuristic choices. Probably not in LV itself (due to the cancellation mentioned above), but possibly during e.g. lowering. I'm not currently aware of any concrete examples of this, but this patch does open a concern which did not previously exist. Previously, we had the problem of overestimating costs causing the same problem on machines much closer to default values for vscale for tuning. With this patch, we still have that problem potentially if vscale for tuning is set high (manually), and then the code is run on a narrow VLEN machine. Differential Revision: https://reviews.llvm.org/D131519	2022-08-18 13:10:03 -07:00
Simon Pilgrim	fdec50182d	[CostModel] Replace getUserCost with getInstructionCost * Replace getUserCost with getInstructionCost, covering all cost kinds. * Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks. Original Patch by @samparker (Sam Parker) Differential Revision: https://reviews.llvm.org/D79483	2022-08-18 11:55:23 +01:00
WuXinlong	515ece1a90	[RISCV] Add MC support of RISCV Zca Extension This patch adds support for part of Zc extension which will be frozen soon. This extension is designed to continue reducing the binary size of RISC-V programs. In this patch: `Zca` is a subset of C extension instructions that are compatible with the Zc extension. The spec of Zc ext is [[ https://github.com/riscv/riscv-code-size-reduction/releases \| Here ]] Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130141	2022-08-18 12:13:35 +08:00
Daniil Fukalov	7ed3d81333	[NFCI] Move cost estimation from TargetLowering to TargetTransformInfo. TargetLowering had two last InstructionCost related `getTypeLegalizationCost()` and `getScalingFactorCost()` members, but all other costs are processed in TTI. E.g. it is inconvenient to use other TTI members in these two functions when they are overridden in a target. Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout parameter - it was always passed from TTI. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D117723	2022-08-18 00:38:55 +03:00
Craig Topper	550fab53e1	[RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1). Extracted from D131729 where we handled C==0. It's now generalized to more constants. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132000	2022-08-17 09:50:08 -07:00
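The fold in the commit above relies on the setcc result being 0 or 1, so that `C - (b ^ 1)` equals `b + (C - 1)` under wrapping arithmetic. A minimal sketch that checks this on 64-bit wrapped values follows; the function names are hypothetical, the 0/1 setcc result is stated as an assumption, and this is not the combine code from the patch.

```python
MASK = (1 << 64) - 1  # assumption: model XLEN = 64 with wrapping arithmetic


def sub_of_inverted(c: int, b: int) -> int:
    """(sub C, (xor b, 1)), where b stands for a setcc result of 0 or 1."""
    return (c - (b ^ 1)) & MASK


def add_of_setcc(c: int, b: int) -> int:
    """(add b, C-1), the folded form with the constant adjusted by one."""
    return (b + ((c - 1) & MASK)) & MASK


def fold_is_valid(constants) -> bool:
    """Check both forms agree for each constant and both setcc results."""
    return all(sub_of_inverted(c, b) == add_of_setcc(c, b)
               for c in constants
               for b in (0, 1))
```

For example, `fold_is_valid([0, 1, 2, MASK, 1 << 63])` exercises the C == 0 case mentioned in the message along with constants at the wraparound boundaries.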
Craig Topper	ab4cd154c6	[RISCV] Refactor performSUBCombine to prepare for D132000. This refactors the code into a separate function with early returns. D132000 adds an additional operation to the if/else that selects NewLHS, but can otherwise share the rest of the code. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132002	2022-08-17 09:50:08 -07:00
Alex Bradbury	ce38128194	[RISCV] Avoid redundant branch-to-branch when expanding cmpxchg If the success value of a cmpxchg is used in a branch, the expanded cmpxchg sequence ends up with a redundant branch-to-branch (as the backend atomics expansion happens as late as possible, passes to optimise such cases have already run). This patch identifies this case and avoids it when expanding the cmpxchg. Note that a similar optimisation is possible for a BEQ on the cmpxchg success value. As it's hard to imagine a case where real-world code may do that, this patch doesn't handle that case. Differential Revision: https://reviews.llvm.org/D130192	2022-08-17 13:49:15 +01:00
Craig Topper	d27c147aaa	[RISCV] Allow lowerSELECT to fold integer setcc with FP select. We'd pick it up in DAG combine later even if we didn't handle it here. No test changes because we get it in DAG combine anyway.	2022-08-16 21:28:54 -07:00
Craig Topper	ba1fb54821	[RISCV] Reuse existing VT variable instead of calling getValueType() repeatedly. NFC	2022-08-16 19:56:55 -07:00
Monk Chiang	0af4651c0f	[RISCV] Add scheduling class for vector pseudo segment instructions. Add scheduling resource for vector segment load/store instructions in D128886. I missed adding the scheduling resource for pseudo segment instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130222	2022-08-16 17:54:47 -07:00
Craig Topper	53ce22e429	Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine." This time using N1 instead of N0 since N1 points to the original setcc. This now affects scheduling as I expected. Original commit message: We change seteq<->setne but it doesn't change the semantics of the setcc. We should keep original debug location. This is consistent with visitXor in the generic DAGCombiner.	2022-08-16 15:51:07 -07:00
Craig Topper	2dfa4b6475	Revert "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine." This reverts commit `1380b21ceb`. I mixed up N0 and N1 and didn't do what I intended.	2022-08-16 15:47:01 -07:00

2411 Commits