The transformation is beneficial because vmerge.vvm always needs a mask
operand but vadd.vi may not.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D133255
Unary shuffles such as <0,2,4,6,8,10,12,14> or <1,3,5,7,9,11,13,15>
where half the elements are returned, can be lowered using vnsrl.
SelectionDAGBuilder lowers such shuffles as a build_vector of
extract_elements since the mask has fewer elements than the source.
To fix this, I've enabled the extractSubvectorIsCheap hook to allow
DAGCombine to rebuild the shuffle using 2 extract_subvectors preceding
the shuffle.
I've been very conservative with the extractSubvectorIsCheap hook to minimize
test impact and match what we have test coverage for. This can be
improved in the future.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133736
As the Zbs extension includes bext[i] for bit extract, we can
unconditionally return true from this hook. This hook causes the DAG
combiner to perform the following canonicalisation:
and (not (srl X, C)), 1 --> (and X, 1<<C) == 0
and (srl (not X), C), 1 --> (and X, 1<<C) == 0
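For illustration, a minimal C++ equivalent of the two forms (the function and
variable names here are hypothetical, not from the patch):
  // Testing a bit after shifting it down becomes an in-place mask test,
  // which Zbs can select with bexti.
  bool bit_is_clear(unsigned x, unsigned c) {
    return ((~(x >> c)) & 1) != 0;        // and (not (srl X, C)), 1
    // canonicalised: return (x & (1u << c)) == 0;
  }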
As simply changing the hook causes a codegen regression, this patch also
modifies a BEXTI pattern to match this canonicalised form.
As BSETINVMask is now used for BEXT as well as BSET and BINV, it has
been renamed to the more generic SingleBitSetMask.
There is one codegen change in bittest.ll for bittest_31_i64 (NOT+BEXTI
rather than NOT+SRLIW). This is neutral in terms of code quality.
Differential Revision: https://reviews.llvm.org/D131482
We use the saturating behavior of fcvt.wu.h/s/d but forgot to
take into account that fcvt.wu will sign extend the saturated
result. According to computeKnownBits a promoted FP_TO_UINT_SAT
is expected to zero extend the saturated value.
In many cases the upper bits aren't demanded so this wouldn't
be an issue. But if computeKnownBits caused an AND to be removed
it would be a bug.
This patch inserts an AND to zero the upper bits.
Unfortunately, this pessimizes code if we aren't able to tell if
the upper bits are demanded. To fix that we could custom type
promote the FP_TO_UINT_SAT with SEXT_INREG after it, but I'll
leave that for future work.
I haven't found a failure from this; I was revisiting the code to
add vector support and spotted it.
Differential Revision: https://reviews.llvm.org/D133746
For remainder:
If (1 << (BitWidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (BitWidth / 2) urem. If (BitWidth / 2) is a legal integer
type, this urem will be expanded by DAGCombiner using multiply by a magic
constant. We do have to take into account that adding the high and low halves
together can produce a carry, making it a ((BitWidth / 2) + 1)-bit number,
so we also need to add back in the carry from the first addition.
For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).
This is based on the section "Remainder by Summing Digits" in
Hacker's Delight.
The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.
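As a concrete sketch (mine, not code from the patch) of both tricks in C++,
take BitWidth = 64 and Divisor = 3, for which (1 << 32) % 3 == 1 and the
multiplicative inverse of 3 modulo 2^64 is 0xAAAAAAAAAAAAAAAB:
  #include <cstdint>
  // Remainder: x = hi*2^32 + lo and 2^32 % 3 == 1, so x % 3 == (hi + lo) % 3.
  // The hi+lo sum may carry into bit 32, so the carry is added back in before
  // the 32-bit urem (which DAGCombiner expands via a magic-constant multiply).
  uint32_t urem3(uint64_t x) {
    uint64_t sum = (uint32_t)x + (x >> 32);
    uint32_t folded = (uint32_t)sum + (uint32_t)(sum >> 32); // add back the carry
    return folded % 3;
  }
  // Division: subtract the remainder, then multiply by the multiplicative
  // inverse of 3 modulo 2^64 to get the exact quotient.
  uint64_t udiv3(uint64_t x) {
    return (x - urem3(x)) * 0xAAAAAAAAAAAAAAABULL;
  }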
gcc already does this same trick. There are additional tricks gcc
does for urem as well as srem, udiv, and sdiv that I plan to add in
future patches.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130862
The default is to use extload which can become a zextload or
sextload if it is followed by an 'and' or sext_inreg.
Sometimes type legalization will introduce an 'and' from promoting
something like 'srl X, C' and a sext_inreg from a setcc. The
'and' could be freely folded with the promoted 'srl' by using srliw,
but the sext_inreg can't be folded into a compare. DAG combiner
will see both of these choices and may decide to fold the 'and'
instead of the 'sext_inreg'. This forces the sext_inreg to become
a sext.w.
By picking sextload in the type legalizer we take this choice away.
Looking at spec2006 compiled with Zba and Zbb, this appeared to be
a net reduction in lines of code in the objdump disassembly output.
This is similar to what we do with i32 add/sub/mul/shl in
type legalization where we always emit a sext_inreg.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D130397
Split out from D129178, this just adds the GlobalMerge tests (other than global-merge-minsize.ll which is testing a specific configuration of the pass when it's enabled) and exposes `-riscv-enable-global-merge` and //doesn't enable it by default//.
Note that the comment "// FIXME: Unify control over GlobalMerge." is copied from the Arm and AArch64 backends, which expose the same flag. Presumably the author is imagining some later refactoring that provides a target-independent flag.
Reviewed By: craig.topper, reames, hiraditya
Differential Revision: https://reviews.llvm.org/D130481
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D132835
This is a minimalist implementation which simply adds the extension (in the experimental namespace since it's not ratified), and wires up the setting of the required ELF header flag. Future changes will include codegen changes to exploit the stronger memory model.
This is intended to implement v0.1 of the proposed specification which can be found in Chapter 25 of https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-spec.pdf.
Differential Revision: https://reviews.llvm.org/D133239
This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the
VFCVT.
Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from
RVV intrinsics as I could.
A followup patch will use this approach for FROUND too.
Still need to fix the cost model.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D133238
We can use srliw to shift out the trailing bits and slli to shift
zeros back in. The sign extension of srliw will zero the upper 32 bits
since we will be shifting a 0 into bit 31.
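As a standalone illustration (the specific mask here is my example, not
necessarily a test case from the patch):
  #include <cstdint>
  // Keeping bits [31:4]: srliw shifts out the trailing bits and, since a 0 is
  // shifted into bit 31, its sign extension also zeroes the upper 32 bits;
  // slli then shifts zeros back into the low bits.
  uint64_t keep_bits_31_to_4(uint64_t x) {
    return x & 0xFFFFFFF0u;   // selectable as: srliw a0, a0, 4 ; slli a0, a0, 4
  }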
Don't require that the AND has one use and don't depend on
targetShrinkDemandedConstant turning C2 into 0xffffffff. Instead,
check that the constant is 0xffffffff after replacing any bits
that will be shifted out with 1s.
Another way to fix this might be to prevent SimplifyDemandedBits
from destroying the ANDI after type legalization using
targetShrinkDemandedConstant. That would prevent the CSE that created
this mess. targetShrinkDemandedConstant is currently only enabled after
LegalizeOps. A quick experiment shows we can't just change when it
runs; we would need to try a different heuristic for post-type
legalization.
SimplifyDemandedBits can zero the upper bits and targetShrinkDemandedConstant
isn't always able to recover them.
At least part of that may be because targetShrinkDemandedConstant
only runs in the last DAGCombine. It might be worth seeing what happens
if we move it to post-type legalization.
This builds on D132771 to invert (setlt 0, X) to (setlt X, 1) and
vice versa.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132798
We can rewrite to (bnez (or/and (setne), Z)) if Z is 0/1.
Alternatively, we could canonicalize to (xor (or/and (setne), Z), 1)
even if there is no branch. The xor would not always get removed,
but it might enable other DeMorgan combines. I decided to be
conservative for this first patch and require the xor to be removed.
I have a couple other invertible setccs I will add in a follow up
patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132771
Mostly just modeled after vp.fneg except there is a
"functional instruction" for fneg while fabs is always an
intrinsic.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D132793
The motivation of this patch is to lower the IR pattern
(vp.merge mask, (add x, y), false, vl) to
(PseudoVADD_VV_<LMUL>_MASK false, x, y, mask, vl).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D131841
This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size.
For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware.
The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization.
SLP has been disabled for now, even when fixed vectors are enabled. See a310637 and associated review. There are a few additional code quality issues which need to be worked through before turning SLP on would be reasonable.
Differential Revision: https://reviews.llvm.org/D131508
In patch D121183, the target ABI is read from the .ll file's target-abi
attribute and set in the RISCVAsmPrinter::emitFunctionEntryLabel
function. In https://github.com/llvm/llvm-project/issues/57242,
an ABI mismatch error can be caused by failing to call
RISCVAsmPrinter::emitFunctionEntryLabel to set the target ABI to the
correct one when the .ll file is empty or a module has no functions.
This patch moves the target-ABI setup to
RISCVAsmPrinter::emitStartOfAsmFile, making sure every .ll file and
every module in LTO reads the target ABI from the module flag and sets it,
with or without functions.
Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com>
Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com>
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D132204
SimplifyDemandedBits tries to aggressively turn xor immediates into -1
to match a 'not' instruction. In this case, because X is a boolean, the
upper bits of (xor X, 1) are known to be 0. Because this is an AND
instruction, that means those bits aren't demanded from the other
operand, and thus SimplifyDemandedBits can turn (xor Y, 1) to (not Y).
We need to detect that this has happened to enable the DeMorgan
optimization. To do this we allow one of the xors to use -1 when
the outer operation is And.
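A minimal check (my own, assuming x is 0/1-valued) of the identity this
relies on:
  #include <cassert>
  // With x known to be 0 or 1, only bit 0 of (x ^ 1) can be set, so only bit 0
  // of the other AND operand is demanded, and (y ^ 1) agrees with ~y in bit 0.
  void check(unsigned x, unsigned y) {
    assert(x <= 1);
    assert(((x ^ 1u) & (y ^ 1u)) == ((x ^ 1u) & ~y));
  }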
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132671
This optimizes xors that appear due to legalizing setge/setle which
require an xor with 1. This reduces the number of xors and may
allow the xor to fold with a beqz or bnez.
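For reference, a source-level example (mine) of where such an xor comes from:
  // RISC-V has no "set if >=" instruction, so a >= b is legalized as the
  // inverse of slt: (xor (setlt a, b), 1), i.e. slt followed by xori 1.
  // That xori is what this patch tries to fold away or into a beqz/bnez.
  bool is_ge(long a, long b) { return a >= b; }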
Differential Revision: https://reviews.llvm.org/D132614
Update `llvm/test/CodeGen/RISCV/branch-relaxation.ll` with
`update_llc_test_checks.py`, according to
https://reviews.llvm.org/D130560#3746417:
>>! In D130560#3746417, @luismarques wrote:
>>>! In D130560#3746379, @luismarques wrote:
>> The tests don't seem to have been properly updated with
>> `update_llc_test_checks.py`.
>> `llvm/test/CodeGen/RISCV/branch-relaxation.ll` contains RV64 RUN
>> lines but the corresponding CHECK lines are missing in
>> some functions.
>
> Looking more closely at this, I guess you tried to only include the
> `CHECK-RV64` and `CHECK-RV32` checks when relevant. That's a good
> instinct but I guess it goes a bit against how we normally use
> `update_llc_test_checks.py`. My understanding of the trade-off of
> using that tool is that the test updates are much easier, even if
> sometimes the CHECKs aren't as tight as something more tailormade.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D132625
This issue was found by building llvm-test-suite with `-Oz`: the linker
complains `dangerous relocation: %pcrel_lo missing matching %pcrel_hi`.
It turns out we outlined the pcrel-lo instruction but left the pcrel-hi
behind. That isn't a problem in general, but here the two end up in
different sections, and a pcrel-hi/pcrel-lo pair (e.g. AUIPC+ADDI) *MUST*
be placed in the same section due to the implementation.
Outlined functions are placed in .text, but the source functions are
placed in .text.<function-name> if function sections are enabled or the
function has a `comdat` attribute.
There are a few possible solutions for this issue:
1. Always disallow instructions with pcrel-lo flags.
2. Only disallow instructions with pcrel-lo flags when function sections are
enabled or the function has a `comdat` attribute.
3. Check whether the corresponding pcrel-hi instruction is also included in
the outlining candidate sequence, and only allow the pcrel-lo when it is.
The first option is the most conservative and might lose some optimization
opportunities; the second keeps those opportunities; and the last is hard
to implement and has no benefit, since the pcrel-hi uses a different label
even when accessing the same symbol.
Using a custom section name can also cause this problem, but that is already
filtered out by RISCVInstrInfo::isFunctionSafeToOutlineFrom.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D132528
In the branch relaxation pass, `j` instructions with an offset over 1MiB
will be relaxed to `jump` pseudo-instructions.
This patch allocates a stack slot for functions with a size greater than
1MiB. If the register scavenger cannot find a scratch register for
`jump`, spill a register to the slot before the jump and restore it
after the jump.
.mbb:
   foo
   j    .dest_bb
   bar
   bar
   bar
.dest_bb:
   baz
The above code will be relaxed to the following code.
.mbb:
   foo
   sd   s11, 0(sp)
   jump .restore_bb, s11
   bar
   bar
   bar
   j    .dest_bb
.restore_bb:
   ld   s11, 0(sp)
.dest_bb:
   baz
Depends on D129999.
Reviewed By: StephenFan
Differential Revision: https://reviews.llvm.org/D130560
Similar to D132211, we can optimize x <s -1 ? x : -1 -> x <s 0 ? x : -1;
the two only differ at x == -1, where both select -1.
Also improve the unsigned case from D132211 to use x != 0, which
will give a bnez instruction that might be compressible.
Differential Revision: https://reviews.llvm.org/D132252
If x == 1,
x > 1 ? x : 1 returns x, which is also 1, and
x > 0 ? x : 1 returns 1.
This reduces the number of instructions that load the immediate 1.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132211
Following the comment thread of D117235, I added checks for the widening + splitting case, which also causes a split where one of the resulting vectors is empty. Due to the same issues described in that thread, the `fixed-vectors-strided-store.ll` test is missing the widening + splitting case, while the same case in the `strided-vpload.ll` test requires manually splitting the loaded vector.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D121784
This patch enables expansion or custom lowering for some integer
condition codes so that any xori that is needed is created before
the last DAG combine to enable optimization.
I've seen cases where we end up with
(or (xori (setcc), 1), (xori (setcc), 1)) which we would ideally
convert to (xori (and (setcc), (setcc)), 1). This patch doesn't
accomplish that yet, but it should allow us to add DAG
combines as follow ups. Example https://godbolt.org/z/Y4qnvsq1b
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D131729
Use it to fix a bug in the fceil/ffloor lowerings. We were
setting the passthru to IMPLICIT_DEF before and using a mask
agnostic policy. This means where the incoming bits in
the mask were 0 they could be anything in the outgoing mask. We
want those bits in the outgoing mask to be 0. This means we need to
pass the input mask as the passthru.
This generates worse code because we are unable to allocate the
v0 register to the output due to an earlyclobber constraint. We
probably need a special TIED pseudoinstruction and probably custom
isel since you can't use V0 twice in the input pattern.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132058
I have a couple data points that some microarchitectures prefer
the immediate 0 over x0. Does anyone know of microarchitectures
where the opposite is true?
Unfortunately, this is different than the vncvt.x.x.w alias
from the spec. Perhaps the alias was poorly chosen if x0 isn't
as optimal as immediate 0 on all microarchitectures.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132041
The current llvm.prefetch intrinsic docs state "The rw, locality and
cache type arguments must be constant integers."
This change:
- Makes arg 3 (cache type) an ImmArg
- Improves the verifier error messages to reference the incorrect
argument.
- Fixes two tests which contradict the docs.
This is needed as the lowering to GlobalISel is different for ImmArgs
compared to other constants. The non-ImmArgs create a G_CONSTANT MIR
instruction, while for ImmArgs the constant is put directly on the
intrinsic's MIR instruction as an immediate.
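For context, a typical source form that lowers to llvm.prefetch (this snippet
is illustrative, not taken from the patch; the cache-type operand is typically
filled in by the frontend as the intrinsic's fourth argument):
  // __builtin_prefetch requires rw and locality to be constant integers,
  // mirroring the intrinsic's ImmArg requirements.
  void prefetch_for_read(const char *p) {
    __builtin_prefetch(p, /*rw=*/0, /*locality=*/3);
  }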
Differential Revision: https://reviews.llvm.org/D132042
Extracted from D131729 where we handled C==0. It's now generalized
to more constants.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132000
If the success value of a cmpxchg is used in a branch, the expanded
cmpxchg sequence ends up with a redundant branch-to-branch (as the
backend atomics expansion happens as late as possible, passes to
optimise such cases have already run). This patch identifies this case
and avoids it when expanding the cmpxchg.
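A typical source pattern that produces a branch on the cmpxchg success value
(this example is mine, not from the patch):
  #include <cstdint>
  // The boolean result of the compare-exchange feeds a conditional branch;
  // after late atomics expansion this could become a branch-to-branch.
  int32_t update_or_current(int32_t *p, int32_t expected, int32_t desired) {
    if (__atomic_compare_exchange_n(p, &expected, desired, /*weak=*/false,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
      return desired;
    return expected;  // on failure, holds the value observed in *p
  }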
Note that a similar optimisation is possible for a BEQ on the cmpxchg
success value. As it's hard to imagine a case where real-world code would
do that, this patch doesn't handle it.
Differential Revision: https://reviews.llvm.org/D130192
(sub C, (xori X, 1)) can be folded to (add X, C-1) if X is 0 or 1:
since (xori X, 1) is 1-X for such X, C - (1 - X) = X + (C - 1).
This avoids the xori and in some cases removes an instruction
needed to materialize the constant.
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.
Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep the original debug location. This is
consistent with visitXor in the generic DAGCombiner.
While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is
still preferable: (addi X, -1) can be compressed, while sub with x0 on
the LHS is never compressible.
In these test cases we do the transform, but the immediate is too
large to form an ADDI so it didn't save any instructions.
If the constant is opaque or has additional users we shouldn't do
the transform if it doesn't form an ADDI.
This introduces an xori in some cases. I don't believe that was the
intention of the original patch. It was an accident because
no-NaNs FP equality compares also use SETEQ/SETNE.
Also pass the correct type to getSetCCInverse.
In these tests we had (sub C, (seteq X, Y)), which we converted to
(add (setne X, Y), C-1). We don't have an FNE compare instruction,
so this created an XORI to invert an FEQ instruction.
This might be a good idea since it can save a constant materialization,
but does not appear to be the intention of the original patch.
We have a good selection of W instructions, so promoting a truncated
value back to i64 is often free.
This appears to be a net code size reduction on SPECINT2006.
This has been split from D130397 as one of the patches needed to
complete that.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D131819
These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison
(setcc x, y, eq/neq) is lowered to seqz or snez, which sets rd to 0/1.
addi is used to process the immediate, which can save instructions for
loading the immediate.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D131471
Including patterns to select addiw if only the lower 32 bits are used.
I'm not excited about adding this many patterns. I'm looking at whether
we can create the xori during lowering and move the ineg patterns to
DAGCombiner.
The patch uses a peephole method to fold merge.vvm and unmasked intrinsics
into masked intrinsics. Using a peephole instead of tablegen patterns avoids
large amounts of auto-generated code.
Note: The patch ignores segment loads since I don't know how to test them.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D130442
Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is partly determined by checking TargetLowering::isInTailCallPosition
and comparing the return types of the libcall and the caller.
isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly
(which always returns false if not implemented by the target).
This patch provides a minimal implementation of
TargetLowering::isUsedByReturnOnly - enough to support tail calling
libcalls on hard float ABIs. Soft-float ABIs are left for a follow on
patch. libcall-tail-calls.ll also shows missed opportunities to tail
call integer libcalls, but this is due to issues outside of
the isUsedByReturnOnly hook.
Differential Revision: https://reviews.llvm.org/D131087
This is currently exercising scalarization code path; with vectors enabled, we hit a different code path. Explicitly exercise both so that both configurations have testing.
This adds a +forced-atomics target feature with the same semantics
as +atomics-32 on ARM (D130480). For RISCV targets without the +a
extension, this forces LLVM to assume that lock-free atomics
(up to 32/64 bits for riscv32/64 respectively) are available.
This means that atomic load/store are lowered to a simple load/store
(and fence as necessary), as these are guaranteed to be atomic
(as long as they're aligned). Atomic RMW/CAS are lowered to __sync
(rather than __atomic) libcalls. Responsibility for providing the
__sync libcalls lies with the user (for privileged single-core code
they can be implemented by disabling interrupts). Code built with
+forced-atomics and code built with -forced-atomics are not ABI compatible
if atomic variables cross the ABI boundary.
For context, the difference between __sync and __atomic is that the
former are required to be lock-free, while the latter requires a
shared global lock provided by a shared object library. See
https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed
discussion on the topic.
This target feature will be used by Rust's riscv32i target family
to support the use of atomic load/store without atomic RMW/CAS.
Differential Revision: https://reviews.llvm.org/D130621
This patch emits table lookup in expandCTTZ.
Context -
https://reviews.llvm.org/D113291 transforms a set of IR instructions into
the cttz intrinsic, but some targets do not support CTTZ or
CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ().
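The lookup is the classic de Bruijn-multiplication scheme; a 32-bit sketch in
C++ (the exact table and constant the patch emits may differ):
  #include <cstdint>
  // Count trailing zeros without a CTTZ/CTLZ instruction: isolate the lowest
  // set bit, multiply by a de Bruijn constant, and use the top 5 bits of the
  // product as a table index. Assumes x != 0.
  static const uint8_t CTTZTable[32] = {
      0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
      31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};
  unsigned cttz32(uint32_t x) {
    return CTTZTable[((x & -x) * 0x077CB531u) >> 27];
  }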
Differential Revision: https://reviews.llvm.org/D128911
FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks).
This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero.
The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead?
Differential Revision: https://reviews.llvm.org/D130839
If we're adding a constant that can't use addi we try a few tricks,
one of which is using li+sh3add. We should not do this if lui+add
would work, for example when adding 8192. Using sh3add prevents folding
a sext.w to form addw, thus increasing instruction count.
This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem:
  vsetivli a1, a0, e32, m1, ta, mu    # both DefInfo and PrevInfo
  vsetivli a2, a1, e32, m4, ta, mu
With the unsound result being:
  vsetivli a1, a0, e32, m1, ta, mu
  vsetivli a2, a0, e32, m4, ta, mu
Consider the case where this is running on a machine with VLEN=512. For this case, the VLMAXs are 16 and 64 respectively.
Consider a0 = 33. The correct result is: a1 = 16 and a2 = 16.
After the unsound optimization: a1 = 16 and a2 = 33.
This particular example used VLMAXs which differed by more than one power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that one is more complicated to construct, as many examples turn out to be accidentally sound.
This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own.
Differential Revision: https://reviews.llvm.org/D131264
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
it's possible that the add is used by multiple sras. We should
allow the combine if all the SRAs will eventually be updated.
After transforming all of the sras, the shls will share a single
(sext_inreg (add X, C1), i32).
This pattern occurs if an sra with 32 is used as index in multiple
GEPs with different scales. The shl from the GEPs will be combined
with the sra before we get a chance to match the sra pattern.
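A hypothetical C++ source (names and the constant are mine) for this pattern:
  #include <cstdint>
  // The sign-extended 32-bit index (sra (add (shl X, 32), C1), 32) is used by
  // GEPs with element sizes 4 and 8; each GEP's shl (by 2 and by 3) combines
  // with the sra first, leaving multiple sras fed by the same add.
  int64_t sum_at(int32_t *a, int64_t *b, int64_t x) {
    int64_t idx = (int32_t)(x + 1);
    return a[idx] + b[idx];
  }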
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C),
ignore the use count on the (shl X, 32).
The sext_inreg after the transform is free. So we're only making
2 new instructions, the add and the shl. So we only need to be
concerned with replacing the original sra+add. The original shl
can have other uses. This helps if there are multiple different
constants being added to the same shl.
D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into
(seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it
will be turned back into an 'and' by SimplifyDemandedBits which
can cause an infinite loop.
To prevent this, check if bit 31 is 0 with computeKnownBits before
doing the transformation.
Fixes PR56905.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D131113
Although there's good coverage of the libcalls within llvm/test/CodeGen,
it's useful to have tests for all ABI and hard/soft-float combinations
in order to properly test the logic that enables libcall tail calls
(which will be implemented in a follow-up patch).
An unnecessary sext.w is generated when masking the result of the
riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the
intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed.
Although this isn't a particularly important optimisation, removing the
sext.w simplifies implementation of an additional cmpxchg-related
optimisation in D130192.
Although I can't produce a test with different codegen for the other
atomics intrinsics, these are added as well for completeness.
Differential Revision: https://reviews.llvm.org/D130191
It's possible we have:
  lui a0, %hi(sym)
  addi a0, %lo(sym)
  addi a0, <offset1>
  lw a0, <offset2>(a0)
We want to arrive at
  lui a0, %hi(sym+offset1+offset2)
  lw a0, %lo(sym+offset1+offset2)
We currently fail to do this because we only consider loads/stores
if we didn't find any arithmetic.
This patch splits arithmetic folding and load/store folding into
two separate phases. The load/store folding can no longer assume
the offset in hi/lo is 0 so we must combine the offsets. I've applied
the same simm32 limit that we applied in the arithmetic folding.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D130931
At least based on the lit tests, the coalescer sometimes fails to
propagate the copy from X0 into the branch instruction. This patch
does it manually during isel. The majority of the changes are from
the select patterns.
Some of the changes are just register allocation changes. Only
the Select change affects whether a b*z instruction is generated
in the tests. I changed the branch pattern for consistency.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D130809
Builds upon D123264, adding support for merging the low part of the LLA
address into the load/store instruction offsets.
Differential Revision: https://reviews.llvm.org/D123265
Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up
patches to fold the addi of PseudoLLA instructions into the immediate
operand of load/store instructions.
Differential Revision: https://reviews.llvm.org/D123264
For constants in the range [-2047, 2048] we use addi. If the constant
is -2048 we can use xori. If we don't match this explicitly, we'll
emit an LI for the -2048 followed by an XOR.