abs(i32 X, i1 1) always produces a positive result. The 'i1 1'
means an INT_MIN input produces poison. If the result is sign extended,
InstCombine will convert it to zext. This does not produce ideal
code for RISCV.
This patch reverses the zext back to sext which can be folded
into a subw or negw. Ideally we'd do this in SelectionDAG, but
we lose the INT_MIN poison flag when llvm.abs becomes ISD::ABS.
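For illustration, a minimal IR sketch of the pattern (function and
value names are invented):

  define i64 @src(i32 %x) {
    %a = call i32 @llvm.abs.i32(i32 %x, i1 true)
    ; InstCombine turns this sext into a zext because %a is known
    ; non-negative (INT_MIN would be poison); this patch converts the
    ; zext back to a sext before SelectionDAG so it can fold into
    ; subw/negw.
    %e = sext i32 %a to i64
    ret i64 %e
  }
  declare i32 @llvm.abs.i32(i32, i1)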
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D130412
This patch adds shouldScalarizeBinop to the RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation.
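Roughly, at the IR level (the combine itself operates on
extract_vector_elt DAG nodes; names here are illustrative):

  define i32 @before(<4 x i32> %a, <4 x i32> %b) {
    %v = add <4 x i32> %a, %b
    %e = extractelement <4 x i32> %v, i32 1
    ret i32 %e
  }
  ; what shouldScalarizeBinop lets the DAG combiner form instead:
  define i32 @after(<4 x i32> %a, <4 x i32> %b) {
    %a1 = extractelement <4 x i32> %a, i32 1
    %b1 = extractelement <4 x i32> %b, i32 1
    %e = add i32 %a1, %b1
    ret i32 %e
  }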
Differential Revision: https://reviews.llvm.org/D129545
We can always fold zext.b since it is just andi. The others require
Zba/Zbb.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D130302
(srl (and X, 1<<C), C) is the form we receive for testing bit C.
An earlier combine removed the setcc so it wasn't there to match
when we created the SELECT_CC. This doesn't happen for BR_CC because
generic DAG combine rebuilds the setcc if it is used by BRCOND.
We can shift X left by XLen-1-C to put the bit to be tested in the
MSB, and use a signed compare with 0 to test the MSB.
The only difference between the combines was in the calls to getNode
that include the true/false values for SELECT_CC or the chain
and branch target for BR_CC.
Wrap the rest of the code into a helper that reads LHS, RHS, and
CC and outputs new values and a bool indicating whether a new node
needs to be created.
If C > 10, this will require a constant to be materialized for the
And. To avoid this, we can shift X left by XLen-1-C bits to put the
tested bit in the MSB, then we can do a signed compare with 0 to
determine if the MSB is 0 or 1. Thanks to @reames for the suggestion.
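As a rough IR-level illustration for bit 20 of an i64 on RV64
(XLen-1-C = 63-20 = 43); the real transform happens on the DAG nodes:

  define i64 @select_on_bit20(i64 %x, i64 %t, i64 %f) {
    ; original form: (and X, 1<<20) != 0, where 1<<20 = 1048576 does
    ; not fit in a simm12 and would have to be materialized.
    ; equivalent form: move bit 20 into the MSB and test the sign.
    %shl = shl i64 %x, 43
    %c   = icmp slt i64 %shl, 0
    %r   = select i1 %c, i64 %t, i64 %f
    ret i64 %r
  }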
I've implemented this inside of translateSetCCForBranch which is
called when setcc+brcond or setcc+select is converted to br_cc or
select_cc during lowering. It doesn't make sense to do this for
general setcc since we lack a sgez instruction.
I've tested bits 10, 11, 31, 32, and 63, and a couple of bits between
11 and 31 and between 32 and 63 for both i32 and i64 where applicable.
Select
has some deficiencies where we receive (and (srl X, C), 1) instead.
This doesn't happen for br_cc due to the call to rebuildSetCC in the
generic DAGCombiner for brcond. I'll explore improving select in a
future patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D130203
This patch implements the recently ratified Zmmul extension, a
subextension of M (Integer Multiplication and Division) consisting of
only its multiplication instructions.
Differential Revision: https://reviews.llvm.org/D103313
Reviewed By: craig.topper, jrtc27, asb
(and X, 0xffffffff) requires 2 shifts in the base ISA. Since we
know the result is being used by a compare, we can use a sext_inreg
instead of an AND if we also modify C1 to have 33 sign bits instead
of 32 leading zeros. This can also improve the generated code for
materializing C1.
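Sketched in IR terms (the combine runs on the DAG; the constants are
only illustrative):

  define i1 @before(i64 %x) {
    ; zeroing the upper 32 bits takes slli+srli without Zba
    %m = and i64 %x, 4294967295          ; 0xffffffff
    %c = icmp eq i64 %m, 305419896       ; C1 = 0x12345678
    ret i1 %c
  }
  define i1 @after(i64 %x) {
    ; sign-extend the low 32 bits instead (sext_inreg) and sign-extend
    ; C1 from bit 31 to match; 0x12345678 already has bit 31 clear so
    ; it is unchanged, while a C1 with bit 31 set would become its
    ; negative sign-extended form.
    %t = trunc i64 %x to i32
    %s = sext i32 %t to i64
    %c = icmp eq i64 %s, 305419896
    ret i1 %c
  }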
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D129980
setge/le/uge/ule selected by themselves require an xori with 1.
If we're negating the setcc, we can fold the xori with the neg
to create an addi with -1.
This works because xori X, 1 is equivalent to 1 - X if X is either
0 or 1. So we're doing -(1 - X), which is X - 1, i.e. X + (-1).
This improves the generated code for selecting between 0 and -1
based on some conditions.
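A small example of the kind of case this targets (the fold itself
happens during instruction selection):

  define i32 @sel(i32 %a, i32 %b) {
    ; setge may be selected as (xori (slt a, b), 1); negating it lets
    ; the xori and the neg fold into a single addi with -1:
    ;   -(1 - slt(a, b)) = slt(a, b) - 1
    %c = icmp sge i32 %a, %b
    %r = select i1 %c, i32 -1, i32 0
    ret i32 %r
  }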
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D129957
We can use lw to load 4 bytes from the stack and sign extend them
instead of loading all 8 bytes.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D129948
This patch replaces some foreach loops with ArrayRef and replaces some repeated literal arrays with a variable.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125656
This patch extends D124824. It uses SHXADD+SLLI to emit multiplications by 3, 5, or 9 times a power of 2.
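For example, multiplying by 24 = 3 << 3 (a hedged sketch; the exact
lowering depends on the enabled extensions):

  define i64 @mul24(i64 %x) {
    ; with Zba this can lower to: sh1add a0, a0, a0   (x*3)
    ;                             slli   a0, a0, 3    (*8)
    %m = mul i64 %x, 24
    ret i64 %m
  }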
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D129179
If X is known positive by a dominating condition, we can fill in
ones into the upper bits of C1 if that would turn it into an simm12,
allowing the use of ANDI.
This pattern often occurs in unrolled loops where the induction
variable has been widened.
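A hypothetical illustration, assuming the dominating condition lets us
prove that bits 31 and above of X are zero (e.g. X is a sign-extended,
known-non-negative i32 induction variable):

  define i64 @mask(i64 %x) {
    ; 0x7ffffff0 does not fit in a simm12 and has to be materialized
    ; into a register first; if bits 31..63 of %x are known zero,
    ; filling those bits of the constant with ones gives
    ; 0xfffffffffffffff0 = -16, which does fit:
    ;   and %x, 0x7ffffff0  ==  andi %x, -16
    %m = and i64 %x, 2147483632
    ret i64 %m
  }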
To get the best benefit from this, I had to move the pass above
ConstantHoisting which is in addIRPasses. Otherwise the AND constant
is often hoisted away from the AND.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D129888
We're creating a single instruction to replace another instruction.
We can insert it using the InsertBefore operand of the constructor and
then copy the debug location.
The initial optimization is to convert (i64 (zext (i32 X))) to
(i64 (sext (i32 X))) if the dominating condition for the basic block
guarantees the sign bit of X is zero.
This frequently occurs in loop preheaders where a signed induction
variable that can never be negative has been widened. There will be
a dominating check that the 32-bit trip count isn't negative or zero.
The check here is not restricted to that specific case though.
An i32->i64 sext is cheaper than a zext on RV64 without the Zba
extension. Later optimizations can often remove the sext from the
preheader basic block because the dominating block also needs a sext to
evaluate the greater than 0 check.
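A sketch of the shape of IR this targets (block and value names are
invented):

  define i64 @widen(i32 signext %n) {
  entry:
    ; dominating check that the 32-bit trip count is strictly positive
    %guard = icmp sgt i32 %n, 0
    br i1 %guard, label %preheader, label %exit
  preheader:
    ; %n is known non-negative here, so this zext can be rewritten as
    ; a sext, which is cheaper on RV64 without Zba
    %wide = zext i32 %n to i64
    ret i64 %wide
  exit:
    ret i64 0
  }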
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D129732
We previously enabled subregister liveness by default when compiling
with RVV. This has been shown to cause miscompilations where RVV
register operand constraints are not met. A test was added for this in
D129639 which explains the issue in more detail.
Until this issue is fixed in some way, we should not be enabling
subregister liveness unless the user asks for it.
Reviewed By: craig.topper, rogfer01, kito-cheng
Differential Revision: https://reviews.llvm.org/D129646
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly, which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Recommitted with some fixes for the leftover MCII variables in release
builds.
Differential Revision: https://reviews.llvm.org/D129506
The former pattern will select as slliw+sraiw while the latter
will select as slli+srai. This can enable the slli+srai to be
compressed.
Differential Revision: https://reviews.llvm.org/D129688
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.
vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)
We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, there must be a power-of-two number of blocks. (This holds for everything other than VLEN<=32, but that configuration is already broken.)
It is worth noting that the AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.
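Sketched in IR, one payoff of knowing vscale is a power of two (names
and the shift amount are illustrative):

  define i64 @vec_trip_count(i64 %n) {
    %vscale = call i64 @llvm.vscale.i64()
    %vf = shl i64 %vscale, 2                 ; VF = vscale x 4
    ; if vscale (and hence %vf) is a power of two, this urem can be
    ; folded to: and %n, (%vf - 1)
    %rem = urem i64 %n, %vf
    %sub = sub i64 %n, %rem                  ; vector trip count
    ret i64 %sub
  }
  declare i64 @llvm.vscale.i64()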
Differential Revision: https://reviews.llvm.org/D129609
This reverts commit e2fb8c0f4b as it does
not build for Release builds, and some buildbots are giving more warnings
than I saw locally. Reverting to fix those issues.
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly, which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Differential Revision: https://reviews.llvm.org/D129506
This patch was split off from D126465, where an early exit is necessary
because the code checks the VLEN, which asserts that V instructions are
present.
Since this makes logical sense on its own, I think it's worth landing
regardless of D126465.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D129617
Only one caller didn't already have an MVT and that was easy to
fix. Since the return type is MVT and it uses MVT::getVectorVT,
taking an MVT as input makes the most sense.
This custom isel was used to split the lo12 bits of the imm so that
they could be folded into load/store addresses via a post-isel
peephole.
This patch instead splits the immediate during isel and folds the
lo12 bits, removing the need for the post-isel peephole to do anything.
After this we'll be able to remove the post-isel peephole.
Reviewed By: asb, luismarques
Differential Revision: https://reviews.llvm.org/D129450
This restores the old behavior before D129402 when
enableUnalignedScalarMem is false. This fixes a regression spotted
by @asb.
To fix this correctly, we need to consider the alignment of the load
we'd be replacing, but that's not possible in the current interface.
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.
This patch adds support for using the active lane mask for control
flow by:
1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extracting the first bit of this mask and using it for the
conditional branch.
I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.
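A rough sketch of the resulting vector loop structure, assuming a
scalable VF of vscale x 4 and omitting the VPlan recipe details:

  define void @sketch(i64 %tc, i64 %vf, <vscale x 4 x i1> %entry.mask) {
  vector.ph:
    br label %vector.body
  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %mask = phi <vscale x 4 x i1> [ %entry.mask, %vector.ph ], [ %mask.next, %vector.body ]
    ; ... loads/stores predicated on %mask would go here ...
    %index.next = add i64 %index, %vf
    ; (1) mask for the *next* iteration: lane 0 is set iff any
    ;     iterations remain
    %mask.next = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %index.next, i64 %tc)
    ; (2) branch on the first lane instead of comparing the induction
    ;     variable with the trip count
    %cond = extractelement <vscale x 4 x i1> %mask.next, i64 0
    br i1 %cond, label %vector.body, label %exit
  exit:
    ret void
  }
  declare <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64, i64)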
Differential Revision: https://reviews.llvm.org/D125301
Including the following opcodes:
Select_FPR16_Using_CC_GPR
Select_FPR32_Using_CC_GPR
Select_FPR64_Using_CC_GPR
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127871
Somehow some tests failed in our downstream because the VFMV+FSD
pattern was matched first. The FSD and VSE patterns have the same
complexity, but FSD comes before VSE in the generated
matcher table.
This problem only occurs in our downstream (so sorry that I can't
provide a test here) and increasing the value of `AddedComplexity`
can fix it.
Reviewed By: StephenFan, craig.topper
Differential Revision: https://reviews.llvm.org/D129360
I think it only makes sense to return true here if we aren't going
to turn around and create a constant pool for the immediate.
I left out the check for useConstantPoolForLargeInts() thinking
that even if you don't want the compiler to create a constant pool,
you might still want to avoid materializing an integer that is
already available in a global variable.
The test file was copied from AArch64/ARM and has not been committed
yet. I will post a separate review for that.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D129402
We have custom isel that tries to select the Lo12 bits using a
separate ADDI that can later be folded into the load/store address
by the post-isel peephole.
This patch disables this if the load/store already had a non-zero
offset. A non-zero offset implies that CodeGenPrepare split several
large offsets used by different loads and stores into a common large
offset and multiple small offsets that could be folded. Folding more
of the lo12 bits changes this common offset by increasing the small
offsets. While this can save an instruction to materialize the common
offset, it can also prevent the small offsets from fitting in a
compressed load/store instruction.
Removing this also simplifies the last piece needed to fold the custom
isel for add into SelectAddrRegImm and remove the post-isel peephole.
The motivation here is to a) bring us closer into alignment with AArch64, under the assumption that the AArch64 codepath is better tested, and b) simplify pattern matching in an upcoming change.
The immediate impact is a significant IR reduction but a fairly minimal change in the generated assembly. Due to a difference in expansion behavior we get a saturating add versus the non-saturating one from the old code, but that's about it. This difference comes down to different handling of overflow, which doesn't seem to be possible here anyway, so the assembly codegen is arguably a minor regression. I don't expect that to matter in practice.
Differential Revision: https://reviews.llvm.org/D129221