llvm-project

Commit Graph

Author	SHA1	Message	Date
Philip Reames	b45a262679	[RISCV] Enable fixed length vectors and loop vectorization with same This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size. For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware. The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization. SLP has been disabled for now, even when fixed vectors are enabled. See `a310637` and associated review. There are a few addiitional code quality issues which need worked through before turning SLP on would be reasonable. Differential Revision: https://reviews.llvm.org/D131508	2022-08-26 14:45:23 -07:00
Philip Reames	a310637132	[RISCV] Disable SLP vectorization by default due to unresolved profitability issues This change implements a TTI query with the goal of disabling slp vectorization on RISCV. The current default configuration disables SLP already, but its current tied to the ability to lower fixed length vectors. Over in D131508, I want to enable fixed length vectors for purposes of LoopVectorizer, but preliminary analysis has revealed a couple of SLP specific issues we need to resolve before enabling it by default. This change exists to allow us to enable LV without SLP. Differential Revision: https://reviews.llvm.org/D132680	2022-08-26 14:11:22 -07:00
Yunze Zhu	3846e3970f	[RISCV] Generate correct ELF abi flag when empty .ll file has target-abi attribute In patch D121183, target abi is get from .ll file's target-abi attribute and set in RISCVAsmPrinter::emitFunctionEntryLabel function. In https://github.com/llvm/llvm-project/issues/57242, an api mismatch error may be caused by failing to call function RISCVAsmPrinter::emitFunctionEntryLabel to set target-abi to correct one when the .ll is empty or a module has no function. This patch move setting target-abi part to function RISCVAsmPrinter::emitStartOfAsmFile, make sure all .ll file and module in LTO read target-abi from module flag and set, with or without function. Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com> Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com> Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D132204	2022-08-26 14:39:39 +08:00
LiaoChunyu	6b098bf35a	[RISCV] : Add support for simm10_lsb0000nonzero operand. Running on RISCV machine llvm-exegesis I faced with trouble: can't measure C_ADDI16SP, beacuse immediate has type simm10_lsb0000nonzero. Patch adds support for processing this immediate operand type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D132650	2022-08-26 14:37:37 +08:00
Craig Topper	e4177201eb	[RISCV][M68k] Replace fixed size BitVector with std::bitset. Saves a heap allocation and avoids an explicit call to the BitVector constructor. Reviewed By: reames, myhsu Differential Revision: https://reviews.llvm.org/D132674	2022-08-25 12:45:08 -07:00
Craig Topper	41a3b5739b	[RISCV] Teach combineDeMorganOfBoolean to handle (and (xor X, 1), (not Y)). SimplifyDemandedBits tries to agressively turn xor immediates into -1 to match a 'not' instruction. In this case, because X is a boolean, the upper bits of (xor X, 1) are known to be 0. Because this is an AND instruction, that means those bits aren't demanded from the other operand, and thus SimplifyDemandedBits can turn (xor Y, 1) to (not Y). We need to detect that this has happened to enable the DeMorgan optimization. To do this we allow one of the xors to use -1 when the outer operation is And. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132671	2022-08-25 10:55:45 -07:00
Philip Reames	53f738ce7e	[RISCV] Add empirical costs for integer min/max and saturing add/sub All of these are lowered to a single instruction for all legal vector types.	2022-08-25 09:27:17 -07:00
Craig Topper	ec91d761ac	[RISCV] Apply DeMorgan's law to (and/or (xor X, 1), (xor Y, 1)) if X and Y are 0/1. This optimizes xors that appear due to legalizing setge/setle which require an xor with 1. This reduces the number of xors and may allow the xor to fold with a beqz or bnez. Differential Revision: https://reviews.llvm.org/D132614	2022-08-25 08:49:30 -07:00
Philip Reames	03798f268b	{RISCV] Backout cttz/ctlz instruction costs Craig points out correctly in post-commit review that these depend on the availability of floating point extensions.	2022-08-24 15:40:48 -07:00
Philip Reames	d4d6e71ea2	[RISCV] Add empirical costs for bswap/bitreverse/ctpop/ctlz/cttz If anyone is looking for a source of ideas on vector codegen improvements, the lowerings for several of these seem to include pretty obvious fixits.	2022-08-24 15:09:21 -07:00
Philip Reames	42af1a776a	[RISCV] Add empirically measured vector sqrt intrinsic costs	2022-08-24 14:27:57 -07:00
Philip Reames	4d3134866f	[RISCV] Add vector fabs intrinsic costs We have a fabs vector instruction, and are using it for current lowering.	2022-08-24 14:09:51 -07:00
Saleem Abdulrasool	8f45b5a7a9	RISCV: permit unaligned nop-slide padding emission We may be requested to emit an unaligned nop sequence (e.g. 7-bytes or 3-bytes). These should be 0-filled even though that is not a valid instruction. This matches the behaviour on other architectures like ARM, X86, and MIPS. When a custom section is emitted, it may be classified as text even though it may be a data section or we may be emitting data into a text segment (e.g. a literal pool). In such cases, we should be resilient to the emission request. This was originally identified by the Linux kernel build and reported on D131270 by Nathan Chancellor. Differential Revision: https://reviews.llvm.org/D132482 Reviewed By: luismarques Tested By: Nathan Chancellor	2022-08-24 20:26:48 +00:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Kito Cheng	8e8a62006e	[RISCV][NFC] Minor cleanup in RISCVInstrInfo::getOutliningType The only use of TM is checking result of TargetMachine::getFunctionSections, check that directly instead of introdce a local variable.	2022-08-24 23:42:34 +08:00
Alex Richardson	38107171ed	[RegisterInfoEmitter] Generate isConstantPhysReg(). NFCI This commit moves the information on whether a register is constant into the Tablegen files to allow generating the implementaiton of isConstantPhysReg(). I've marked isConstantPhysReg() as final in this generated file to ensure that changes are made to tablegen instead of overriding this function, but if that turns out to be too restrictive, we can remove the qualifier. This should be pretty much NFC, but I did notice that e.g. the AMDGPU generated file also includes the LO16/HI16 registers now. The new isConstant flag will also be used by D131958 to ensure that constant registers are marked as call-preserved. Differential Revision: https://reviews.llvm.org/D131962	2022-08-24 14:16:20 +00:00
Kito Cheng	96c85f80f0	[RISCV] Don't outline pcrel-lo operand. This issue is found by build llvm-testsuite with `-Oz`, linker will complain `dangerous relocation: %pcrel_lo missing matching %pcrel_hi` and that turn out cause by we outlined pcrel-lo, but leave pcrel-hi there, that's not problem in general, but the problem is they put into different section, they pcrel-hi and pcrel-lo pair (e.g. AUIPC+ADDI) MUST put be present in same section due to the implementation. Outlined function will put into .text name, but the source functions will put in .text.<function-name> if function-section is enabled or the function has `comdat` attribute. There are few solutions for this issue: 1. Always disallow instructions with pcrel-lo flags. 2. Only disallow instructions with pcrel-lo flags that when function-section is enabled or this function has `comdat` attribute. 3. Check the corresponding instruction with pcrel-high also included in the outlining candidate sequence or not, and allow that only when pcrel-high is included in the outlining candidate. First one is most conservative, that might lose some optimization opportunities, and second one could save those opportunities, and last one is hard to implement, and don't have any benefits since pcrel-high are using different label even accessing same symbol. Use custom section name might also cause this problem, but that already filtered by RISCVInstrInfo::isFunctionSafeToOutlineFrom. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D132528	2022-08-24 21:47:46 +08:00
MarkGoncharovAl	8c1f18bd3e	[RISCV] : Add support for immediate operands. llvm-exegesis uses operand type information provided in tablegen files to initialize immediate arguments of the instruction. Some of them simply don't have such information. Thus we should set into relevant immediate operands their specific type. Also create verification methods for them. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131771	2022-08-24 17:48:39 +08:00
Alex	07a700f814	[RISCV] Add zihintntl compressed instructions Add zihintntl compressed instructions and some files related to zihintntl. This patch is base on {D121670}. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D121779	2022-08-24 14:29:02 +08:00
ZHU Zijia	9c85382ade	[RISCV] Handle register spill in branch relaxation In branch relaxation pass, `j`'s with offset over 1MiB will be relaxed to `jump` pseudo-instructions. This patch allocates a stack slot for functions with a size greater than 1MiB. If the register scavenger cannot find a scratch register for `jump`, spill a register to the slot before the jump and restore it after the jump. .mbb: foo j .dest_bb bar bar bar .dest_bb: baz The above code will be relaxed to the following code. .mbb: foo sd s11, 0(sp) jump .restore_bb, s11 bar bar bar j .dest_bb .restore_bb: ld s11, 0(sp) .dest_bb: baz Depends on D129999. Reviewed By: StephenFan Differential Revision: https://reviews.llvm.org/D130560	2022-08-24 13:27:56 +08:00
Philip Reames	c9608d57b8	[TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC] This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changes getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.	2022-08-23 07:55:42 -07:00
Philip Reames	478cf94378	[X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc] This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties. This change adds some accessor methods and uses them to simplify backend code. The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.	2022-08-22 13:37:59 -07:00
Shao-Ce SUN	7167a4207e	[RISCV] Add zihintntl instructions Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D121670	2022-08-22 12:06:30 +08:00
Craig Topper	abce7acebd	[RISCV] Remove impossible TODO in RISCVRedundantCopyElimination. NFC If there are multiple conditional branches we shouldn't do any optimization.	2022-08-21 13:18:02 -07:00
Craig Topper	1a042dd6ed	[RISCV] Optimize x <s -1 ? x : -1. Improve x >u 1 ? x : 1. Similar to D132211, we can optimize x <s -1 ? x : -1 -> x <s 0 ? x : -1 Also improve the unsigned case from D132211 to use x != 0 which will give a bnez instruction which might be compressible. Differential Revision: https://reviews.llvm.org/D132252	2022-08-21 11:48:28 -07:00
Craig Topper	a6c3ccd476	[RISCV] Be more strict about LUI+ADDI macrofusion pre-RA. Don't macrofuse if the LUI has more than 1 user. That will likely require the LUI to have a different destination register post-RA. LUI+ADDI can only be fused if they write the same register.	2022-08-21 10:58:15 -07:00
LiaoChunyu	1fb87ace4d	[RISCV] Optimize x > 1 ? x : 1 -> x > 0 ? x : 1 if x == 1, x > 1 ? x : 1 return x, which is also 1. x > 0 ? x : 1 return 1. Reduce the number of load 1 instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D132211	2022-08-21 20:26:39 +08:00
Simon Pilgrim	5263155d5b	[CostModel] Add CostKind argument to getShuffleCost Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future. Differential Revision: https://reviews.llvm.org/D132287	2022-08-21 10:54:51 +01:00
Craig Topper	6227b7ae31	[RISCV] Move xori creation for scalar setccs to lowering. This patch enables expansion or custom lowering for some integer condition codes so that any xori that is needed is created before the last DAG combine to enable optimization. I've seen cases where we end up with (or (xori (setcc), 1), (xori (setcc), 1)) which we would ideally convert to (xori (and (setcc), (setcc)), 1). This patch doesn't accomplish that yet, but it should allow us to add DAG combines as follow ups. Example https://godbolt.org/z/Y4qnvsq1b Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131729	2022-08-19 13:51:53 -07:00
Philip Reames	59960e8db9	[RISCV] Factor out getVectorImmCost cost after 0e7ed3 [nfc]	2022-08-19 12:53:54 -07:00
Philip Reames	e7fda46300	[RISCV] Correct costs for vector ceil/floor/trunc/round Add vector costs for ceil/floor/trunc/round. As can be seen in the tests, the prior default costs were a significant under estimate of the actual code generated. These costs are computed by simply generating code with the current backend, and then counting the number of instructions. I discount one vsetvli, and ignore the return. Differential Revision: https://reviews.llvm.org/D131967	2022-08-19 10:37:39 -07:00
Craig Topper	961838cc13	[RISCV] Add passthru operand to RISCVISD::SETCC_VL. Use it to the fix a bug in the fceil/ffloor lowerings. We were setting the passthru to IMPLICIT_DEF before and using a mask agnostic policy. This means where the incoming bits in the mask were 0 they could be anything in the outgoing mask. We want those bits in the outgoing mask to be 0. This means we need to pass the input mask as the passthru. This generates worse code because we are unable to allocate the v0 register to the output due to an earlyclobber constraint. We probably need a special TIED pseudoinstruction and probably custom isel since you can't use V0 twice in the input pattern. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132058	2022-08-19 08:53:44 -07:00
Craig Topper	c9a41fe60a	[RISCV] Prefer vnsrl.wi v8, v8, 0 over vnsrl.wx v8, v8, x0. I have a couple data points that some microarchitectures prefer the immediate 0 over x0. Does anyone know of microarchitectures where the opposite is true? Unfortunately, this is different than the vncvt.x.x.w alias from the spec. Perhaps the alias was poorly chosen if x0 isn't as optimal as immediate 0 on all microarchitectures. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132041	2022-08-19 08:40:17 -07:00
Alexey Bataev	0e7ed32c71	[SLP]Cost for a constant buildvector. In many cases constant buildvector results in a vector load from a constant/data pool. Need to consider this cost too. Differential Revision: https://reviews.llvm.org/D126885	2022-08-19 08:02:42 -07:00
Alexey Bataev	d53e245951	[COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC. Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to better estimate cost with immediate values. Part of D126885.	2022-08-19 07:33:00 -07:00
Craig Topper	ba1f4cab44	[RISCV] Copy SDNodeFlags in lowerToScalableOp. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D132177	2022-08-18 20:42:59 -07:00
Craig Topper	5349aa2354	[RISCV] Copy SDNodeFlags in doPeepholeMaskedRVV and doPeepholeMergeVVMFold Especially the NoFPExcept flag for FP. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D132173	2022-08-18 20:42:46 -07:00
Craig Topper	37c47b2cac	[RISCV] Change how mtune aliases are implemented. The previous implementation translated from names like sifive-7-series to sifive-7-rv32 or sifive-7-rv64. This also required sifive-7-rv32 and sifive-7-rv64 to be valid CPU names. As those are not real CPUs it doesn't make sense to accept them in -mcpu. This patch does away with the translation and adds sifive-7-series directly to RISCV.td. Removing sifive-7-rv32 and sifive-7-rv64. sifive-7-series is only allowed in -mtune. I've also added "rocket" to RISCV.td but have not removed rocket-rv32 or rocket-rv64. To prevent -mcpu=sifive-7-series or -mcpu=rocket being used with llc, I've added a Feature32Bit to all rv32 CPUs. And made it an error to have an rv32 triple without Feature32Bit. sifive-7-series and rocket do not have Feature32Bit or Feature64Bit set so the user would need to provide -mattr=+32bit or -mattr=+64bit along with the -mcpu to avoid the error. SiFive no longer names their newer products with 3, 5, or 7 series. Instead we have p200 series, x200 series, p500 series, and p600 series. Following the previous behavior would require a sifive-p500-rv32 and sifive-p500-rv64 in order to support -mtune=sifive-p500-series. There is currently no p500 product, but it could start getting confusing if there was in the future. I'm open to hearing alternatives for how to achieve my main goal of removing sifive-7-rv32/rv64 as a CPU name. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131708	2022-08-18 16:22:25 -07:00
Philip Reames	4d87591028	[RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL On known hardware, reductions, gather, and scatter operations have execution latencies which correlated with the vector length (VL) of the operation. Most other operations (e.g. simply arithmetic) don't correlated in this way, and instead essentially fixed cost as VL varies. When I'd implemented initial scalable cost model support for reductions, gather, and scatter operations, I had used an upper bound on the statically unknown VL. The argument at the time was that this prevented falsely low costs, and biased the vectorizer away from generating bad (on some hardware) code. Unfortunately, practical experience shows we were a bit too effective at that goal, and the high costs defacto prevents vectorization using these constructs at all. This patch reverses course, and ties the returned cost not to the maximum possible VL, but the VL which would correspond to VScaleForTuning. This parameter is the same one the vectorizer uses when normalizing loop costs, so the term effectively cancels out. The result is that the vectorizer now sees these constructs as comparable in cost to their fixed length variants. This does introduce the possibility of the cost for these operations being a significant under estimate on platforms where actual VLEN is far from that implied by VScaleForTuning. On such platforms, we might make poor heuristic choices. Probably not in LV itself (due to the cancellation mentioned above), but possibly during e.g. lowering. I'm not currently aware of any concrete examples of this, but this patch does open a concern which did not previously exist. Previously, we had the problem of overestimating costs causing the same problem on machines much closer to default values for vscale for tuning. With this patch, we still have that problem potentially if vscale for tuning is set high (manually), and then the code is run on a narrow VLEN machine. Differential Revision: https://reviews.llvm.org/D131519	2022-08-18 13:10:03 -07:00
Simon Pilgrim	fdec50182d	[CostModel] Replace getUserCost with getInstructionCost * Replace getUserCost with getInstructionCost, covering all cost kinds. * Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks. Original Patch by @samparker (Sam Parker) Differential Revision: https://reviews.llvm.org/D79483	2022-08-18 11:55:23 +01:00
WuXinlong	515ece1a90	[RISCV] Add MC support of RISCV Zca Extension This patch adds support for part of Zc extension which will be frozen soon. This extension is designed to continue reducing the binary size of RISC-V programs. In this patch: `Zca` is a subset of C extension instructions that are compatible with the Zc extension. The spec of Zc ext is [[ https://github.com/riscv/riscv-code-size-reduction/releases \| Here ]] Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130141	2022-08-18 12:13:35 +08:00
Daniil Fukalov	7ed3d81333	[NFCI] Move cost estimation from TargetLowering to TargetTransformInfo. TragetLowering had two last InstructionCost related `getTypeLegalizationCost()` and `getScalingFactorCost()` members, but all other costs are processed in TTI. E.g. it is not comfortable to use other TTI members in these two functions overrided in a target. Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout parameter - it was always passed from TTI. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D117723	2022-08-18 00:38:55 +03:00
Craig Topper	550fab53e1	[RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1). Extracted from D131729 where we handled C==0. It's now generalized to more constants. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132000	2022-08-17 09:50:08 -07:00
Craig Topper	ab4cd154c6	[RISCV] Refactor performSUBCombine to prepare for D132000. This refactors the code into a separate function with early returns. D132000 adds an additional operation to the if/else that selects NewLHS, but can otherwise share the rest of the code. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132002	2022-08-17 09:50:08 -07:00
Alex Bradbury	ce38128194	[RISCV] Avoid redundant branch-to-branch when expanding cmpxchg If the success value of a cmpxchg is used in a branch, the expanded cmpxchg sequence ends up with a redundant branch-to-branch (as the backend atomics expansion happens as late as possible, passes to optimise such cases have already run). This patch identifies this case and avoid it when expanding the cmpxchg. Note that a similar optimisation is possible for a BEQ on the cmpxchg success value. As it's hard to imagine a case where real-world code may do that, this patch doens't handle that case. Differential Revision: https://reviews.llvm.org/D130192	2022-08-17 13:49:15 +01:00
Craig Topper	d27c147aaa	[RISCV] Allow lowerSELECT to fold integer setcc with FP select. We'd pick it up in DAG combine later even if we didn't handle it here. No test changes because we get it in DAG combine anyway.	2022-08-16 21:28:54 -07:00
Craig Topper	ba1fb54821	[RISCV] Reuse existing VT variable instead of calling getValueType() repeatedly. NFC	2022-08-16 19:56:55 -07:00
Monk Chiang	0af4651c0f	[RISCV] Add scheduling class for vector pseudo segment instructions. Add scheduling resource for vector segment load/store instructions in D128886. I miss to add scheduling resource for pseudo segment instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130222	2022-08-16 17:54:47 -07:00
Craig Topper	53ce22e429	Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine." This time using N1 instead of N0 since N1 points to the original setcc. This now affects scheduling as I expected. Original commit message: We change seteq<->setne but it doesn't change the semantics of the setcc. We should keep original debug location. This is consistent with visitXor in the generic DAGCombiner.	2022-08-16 15:51:07 -07:00
Craig Topper	2dfa4b6475	Revert "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine." This reverts commit `1380b21ceb`. I mixed up N0 and N1 and didn't do what I intended.	2022-08-16 15:47:01 -07:00
Craig Topper	1380b21ceb	[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine. We change seteq<->setne but it doesn't change the semantics of the setcc. We should keep original debug location. This is consistent with visitXor in the generic DAGCombiner.	2022-08-16 15:40:09 -07:00
Craig Topper	b5a18de651	[RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is still preferrable. (addi X, -1) can be compressed, sub with x0 on the LHS is never compressible.	2022-08-16 14:49:52 -07:00
Craig Topper	de6fd16971	[RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12. We still need to materialize the constant in a register and we may not be removing all uses of the original constant so it may increase code size.	2022-08-16 14:11:31 -07:00
Craig Topper	4184edc691	[RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc. This introduce an xori in some cases. I don't believe it was the intention of the original patch. This was an accident because nonan FP equality compares also use SETEQ/SETNE. Also pass the correct type to getSetCCInverse.	2022-08-16 13:00:36 -07:00
Craig Topper	c7e58836e8	[RISCV] Minor cleanups to performSUBCombine. NFC -Rename variable NnzC -> N0C. -Use SelectionDAG::getSetCC to reduce code. -Use SDValue::getOperand instead of operator-> and SDNode::getOperand. Initial steps to add another similar combine to this code.	2022-08-16 12:59:16 -07:00
Craig Topper	7a73ab5818	[RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64. We have a good selection of W instructions, so promoting a truncated value back to i64 is often free. This appears to be a net code size reduction on SPECINT2006. This has been split from D130397 as one of the patches needed to complete that. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131819	2022-08-15 08:32:51 -07:00
Kazu Hirata	f5a68feab3	Use llvm::none_of (NFC)	2022-08-14 16:25:39 -07:00
LiaoChunyu	99ef0ddea3	[RISCV] Fold (sub constant, (setcc x, y, eq/neq)) -> (add constant - 1, (setcc x, y, neq/eq)) (setcc x, y, eq/neq) are seqz, snez that set rd = 0/1. addi is used to process immediate, which can save instructions for load immediate. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131471	2022-08-13 20:37:57 +08:00
Craig Topper	37db283362	[RISCV] isImpliedByDomCondition returns an Optional<bool> not a bool. We were incorrectly checking that it returned an implicaton result, not that the implication result itself was true.	2022-08-12 22:21:05 -07:00
jacquesguan	0fe5f03eeb	[RISCV][NFC] Use nested namespace definations. Since we use C++17 now, we could use nested namespace definations to simplify code. Differential Revision: https://reviews.llvm.org/D131751	2022-08-13 09:56:59 +08:00
Craig Topper	e493944f5f	[RISCV] Use SLTIU X, -1 for (setne X, -1). Since -1 is the maximum unsigned value, all values less than it are not equal to it.	2022-08-11 15:36:04 -07:00
Craig Topper	2c79801a0e	[RISCV] Add more ineg+setcc isel patterns to avoid creating neg+xori+slti(u). Including patterns to select addiw if only the lower 32 bits are used. I'm not excited about adding this many patterns. I'm looking at whether we can create the xori during lowering and move the ineg patterns to DAGCombiner.	2022-08-11 14:24:09 -07:00
Yeting Kuo	875694089d	[RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics. The patch uses peephole method to fold merge.vvm and unmasked intrinsics to masked intrinsics. Using peephole intead of tablegen patterns is to avoid large auto gnerated code. Note: The patch ignores segment loads since I don't know how to test them. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130442	2022-08-11 17:58:11 +08:00
Jonas Hahnfeld	940733d6a0	[RISCV] Re-enable JIT support Commit `8922adf646` recently made JITTargetMachineBuilder honor the hasJIT property of the target. LLVM supports just-in-time compilation on RISC-V, so set the flag. Differential Revision: https://reviews.llvm.org/D131617	2022-08-11 11:41:02 +02:00
jacquesguan	21bf59c92a	[RISCV] Add cost model for mask vector extend and truncate instruction. As extending from or truncating to mask vector do not use the same instructions as the normal cast, this path changed it to 2 which is the number of instructions we used. Differential Revision: https://reviews.llvm.org/D131552	2022-08-11 10:55:43 +08:00
Alex Bradbury	47b1f8362a	[RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls Prior to this patch, libcalls inserted by the SelectionDAG legalizer could never be tailcalled. The eligibility of libcalls for tail calling is is partly determined by checking TargetLowering::isInTailCallPosition and comparing the return type of the libcall and the calleer. isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly (which always returns false if not implemented by the target). This patch provides a minimal implementation of TargetLowering::isUsedByReturnOnly - enough to support tail calling libcalls on hard float ABIs. Soft-float ABIs are left for a follow on patch. libcall-tail-calls.ll also shows missed opportunities to tail call integer libcalls, but this is due to issues outside of the isUsedByReturnOnly hook. Differential Revision: https://reviews.llvm.org/D131087	2022-08-10 10:50:29 +01:00
jacquesguan	b6b1c0d1c4	[RISCV] Add cost model for fp-mask cast op. The cost of convert from or to mask vector is different from other cases. We could not use PowDiff to calculate it. This patch set it to 3 as we use 3 instruction to make it. Differential Revision: https://reviews.llvm.org/D131149	2022-08-10 17:14:37 +08:00
Nikita Popov	f5ed0cb217	[RISCV] Add target feature to force-enable atomics This adds a +forced-atomics target feature with the same semantics as +atomics-32 on ARM (D130480). For RISCV targets without the +a extension, this forces LLVM to assume that lock-free atomics (up to 32/64 bits for riscv32/64 respectively) are available. This means that atomic load/store are lowered to a simple load/store (and fence as necessary), as these are guaranteed to be atomic (as long as they're aligned). Atomic RMW/CAS are lowered to __sync (rather than __atomic) libcalls. Responsibility for providing the __sync libcalls lies with the user (for privileged single-core code they can be implemented by disabling interrupts). Code using +forced-atomics and -forced-atomics are not ABI compatible if atomic variables cross the ABI boundary. For context, the difference between __sync and __atomic is that the former are required to be lock-free, while the latter requires a shared global lock provided by a shared object library. See https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed discussion on the topic. This target feature will be used by Rust's riscv32i target family to support the use of atomic load/store without atomic RMW/CAS. Differential Revision: https://reviews.llvm.org/D130621	2022-08-09 16:04:46 +02:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Craig Topper	e2bfbed2bb	[RISCV] Add ReadFStoreData as a SchedRead. The floating point stores use a different register class, it probably makes sense to have a different SchedRead. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D131379	2022-08-08 09:33:19 -07:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Craig Topper	75c64c7c4e	[RISCV] Don't use li+sh3add for constants that can use lui+add. If we're adding a constant that can't use addi we try a few tricks, one of which is using li+sh3add. We should not do this if lui+add would work. For example adding 8192. Using sh3add prevents folding a sext.w to form addw, thus increasing instruction count.	2022-08-05 12:47:03 -07:00
Philip Reames	9a9848f4b9	[RISCVInsertVSETVLI] Remove an unsound optimization This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem: vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo vsetivli a2, a1, e32, m4, ta, mu With the unsound result being: vsetivli a1, a0, e32, m1, ta, mu vsetivli a2, a0, e32, m4, ta, mu Consider the case where this is running on a machine with VLEN=512,. For this case, the VLMAXs are 16 and 64 respectively. Consider for a0 = 33. The correct result is: a1 = 16, and a2 = 16 After the unsound optimization: a1 = 16 and a2 = 33 This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that ones more complicated to construct as many examples turn out accidentally sound. This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own. Differential Revision: https://reviews.llvm.org/D131264	2022-08-05 12:13:08 -07:00
Craig Topper	12a1ca9c42	[RISCV] Relax another one use restriction in performSRACombine. When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C) it's possible that the add is used by multiple sras. We should allow the combine if all the SRAs will eventually be updated. After transforming all of the sras, the shls will share a single (sext_inreg (add X, C1), i32). This pattern occurs if an sra with 32 is used as index in multiple GEPs with different scales. The shl from the GEPs will be combined with the sra before we get a chance to match the sra pattern.	2022-08-04 14:32:31 -07:00
Craig Topper	a2de12c987	[RISCV] Relax a one use restriction performSRACombine When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C) ignore the use count on the (shl X, 32). The sext_inreg after the transform is free. So we're only making 2 new instructions, the add and the shl. So we only need to be concerned with replacing the original sra+add. The original shl can have other uses. This helps if there are multiple different constants being added to the same shl.	2022-08-04 11:25:08 -07:00
jacquesguan	b61cfc91ea	[RISCV] Add cost modelling for vector widenning reduction. In RVV, we use vwredsum.vs and vwredsumu.vs for vecreduce.add(ext(Ty A)) if the result type's width is twice of the input vector's SEW-width. In this situation, the cost of extended add reduction should be same as single-width add reduction. So as the vector float widenning reduction. Differential Revision: https://reviews.llvm.org/D129994	2022-08-04 15:31:31 +08:00
Craig Topper	53d560b22f	[RISCV] Prevent infinite loop after D129980. D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into (seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it will be turned back into an 'and' by SimplifyDemandedBits which can cause an infinite loop. To prevent this, check if bit 31 is 0 with computeKnownBits before doing the transformation. Fixes PR56905. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131113	2022-08-03 15:19:07 -07:00
David Truby	9a976f3661	[llvm] Always use TargetConstant for FP_ROUND ISD Nodes This patch ensures consistency in the construction of FP_ROUND nodes such that they always use ISD::TargetConstant instead of ISD::Constant. This additionally fixes a bug in the AArch64 SVE backend where patterns were matching against TargetConstant nodes and sometimes failing when passed a Constant node. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130370	2022-08-03 14:02:11 +01:00
Alex Bradbury	28f12a09ae	[RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics An unnecessary sext.w is generated when masking the result of the riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed. Although this isn't a particularly important optimisation, removing the sext.w simplifies implementation of an additional cmpxchg-related optimisation in D130192. Although I can't produce a test with different codegen for the other atomics intrinsics, these are added as well for completeness. Differential Revision: https://reviews.llvm.org/D130191	2022-08-03 13:41:58 +01:00
Fraser Cormack	646e2f4803	[VP] Rename VP int<->float conversion ISD opcodes These should be named like the non-VP versions for consistency. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130967	2022-08-03 10:04:38 +01:00
Craig Topper	f19497f7b0	[RISCV] Use InstVisitor in RISCVCodeGenPrepare. NFC Makes it easy to add new instructions to look at without dispatching manually.	2022-08-02 21:19:30 -07:00
Craig Topper	a5605f1f68	[RISCV] Fix operand number in debug message in RISCVMergeBaseOffset. This used to print from the ADDI where the operand number was correct. It recently changed to print from the LUI or AUIPC which needs to use operand 1 instead of 2. This shows up as a crash with -debug.	2022-08-02 15:27:23 -07:00
Craig Topper	ae6877836e	[RISCV] Add scheduler classes to PseudoVMV*R_V. I think these pseudos will exist when the post-RA scheduler runs so they should have sched classes. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D130945	2022-08-02 09:38:32 -07:00
Craig Topper	2e5c516a3d	[RISCV] Add scheduler class to PseudoReadVLENB. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D130938	2022-08-02 09:38:32 -07:00
Alex Bradbury	5ad59c9e59	[RISCV][NFCI] Set TransientStackAlignment and rely on it rather than RVV-specific logic on RVV-less functions * TargetFrameLowering has a TransientStackAlignment field that "returns the number of bytes to which the stack pointer must be aligned at all times, even between calls. * As explained in the [RISC-V calling convention](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc), the stack pointer must remain fully aligned throughout execution for compliant code. This is important for embedded targets that might avoid realigning the stack pointer for interrupt service routines. Systems running full OSes may always realign the stack anyway. * TransientStackAlignment is used in estimateStackSize in MachineFrameInfo and in PEI::calculateFrameObjectOffsets. * estimateStackSize is only used in the RISC-V backend for scavenging slots. It may be possible to craft a function where the difference is observable, but it wouldn't be a meaningful test. * calculateFrameObjectOffsets makes use of TransientStackAlignment, but then sets the stack alignment to the max of that alignment and MaxAlign, which is unconditionally set to 16 in RISCVFrameLowering::processFunctionBeforeFrameFinalized * I've changed this logic to only set MaxAlign if there are RVV frame objects. There should be no functional change here for either RVV targets (MaxAlign is set as before) or non-RVV targets (TransientStackAlign is now 16 anyway). Differential Revision: https://reviews.llvm.org/D130068	2022-08-02 09:46:06 +01:00
wanglian	e208bab55f	[RISCV][NFC] Use defined variable instead some code. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D130687	2022-08-02 16:26:33 +08:00
Craig Topper	da5b1bf5bb	[RISCV] Teach RISCVMergeBaseOffset to merge %lo/%pcrel_lo into load/store after folding arithmetic. It's possible we have: lui a0, %hi(sym) addi a0, %lo(sym) addi a0, <offset1> lw a0, <offset2>(a0) We want to arrive at lui a0, %hi(sym+offset1+offset2) lw a0, %lo(sym+offset1+offset2) We currently fail to do this because we only consider loads/stores if we didn't find any arithmetic. This patch splits arithmetic folding and load/store folding into two separate phases. The load/store folding can no longer assume the offset in hi/lo is 0 so we must combine the offsets. I've applied the same simm32 limit that we applied in the arithmetic folding. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D130931	2022-08-01 15:33:21 -07:00
Craig Topper	e07a8155f5	[RISCV] Move Pre-RA pseudo expansion from addMachineSSAOptimization to addPreRegAlloc. addMachineSSAOptimization is skipped for -O0, but this pass is required for -O0.	2022-08-01 13:44:43 -07:00
Craig Topper	450edb0b37	[RISCV] Explicitly select second operand of branch condition to X0. At least based on the lit tests, the coalescer sometimes fails to propagate the copy from X0 into the branch instruction. This patch does it manually during isel. The majority of the changes are from the select patterns. Some of the changes are just register allocation changes. Only the Select change affects the whether a b*z instruction is generated in the tests. I changed the branch pattern for consistency. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D130809	2022-08-01 11:16:48 -07:00
Craig Topper	ad8db972b0	[RISCV] Eagerly delete instructions in MergeBaseOffset. The only iterator we're holding points to HiLUI and we never delete that so I think it is safe to delete everything else immediately. I want to split detectAndFoldOffset into two phases. First, combine LUI+ADDI with any ADD/ADDI/SHXADD that comes after it. This may open opportunities to fold the ADDI from the LUI+ADDI into a load/store address. So the load/store folding should run as a second phase even if the ADD/ADDI/SHXADD made changes. In order to do this we need to eagerly delete instructions in the first phase so that we don't have dead users of the LUI+ADDI when we start the second phase. Patches to split the phases will come later. Reviewed By: asb, luismarques Differential Revision: https://reviews.llvm.org/D130119	2022-08-01 09:32:46 -07:00
Lorenzo Albano	71b7c03fd6	[RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121113	2022-08-01 09:23:45 -07:00
Luís Marques	0bc177b6f5	[RISCV] Extend the Merge Base Offset pass to handle AUIPC+ADDI Builds upon D123264, adding support for merging the low part of the LLA address into the load/store instruction offsets. Differential Revision: https://reviews.llvm.org/D123265	2022-08-01 11:30:02 +02:00
Luís Marques	260a641068	[RISCV] Pre-RA expand pseudos pass Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up patches to fold the addi of PseudoLLA instructions into the immediate operand of load/store instructions. Differential Revision: https://reviews.llvm.org/D123264	2022-07-31 23:19:00 +02:00
Craig Topper	d21b315360	[RISCV] Remove vmerges from vector ceil, floor, trunc lowering. Use masked operations to suppress spurious exception bits being set in fflags. Unfortunately, doing this adds extra copies.	2022-07-30 10:58:41 -07:00
Craig Topper	a23f07fb1d	[RISCV] Add merge operands to more RISCVISD::*_VL opcodes. This adds a merge operand to all of the binary _VL nodes. Including integer and widening. They all share multiclasses in tablegen so doing them all at once was easiest. I plan to use FADD_VL in an upcoming patch. The rest are just for consistency to keep tablegen working. This does reduce the isel table size by about 25k so that's nice. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130816	2022-07-30 10:26:38 -07:00
Craig Topper	9bf305fe2b	[RISCV] Swap the merge and mask operand order for VRGATHER*_VL and FCOPYSIGN_VL nodes. Based on review feedback from D130816.	2022-07-30 09:57:05 -07:00
Craig Topper	e637feee80	[RISCV] Add isel pattern for (setne/eq GPR, -2048) For constants in the range [-2047, 2048] we use addi. If the constant is -2048 we can use xori. If we don't match this explicitly, we'll emit an LI for the -2048 followed by an XOR.	2022-07-29 14:07:38 -07:00
Craig Topper	2750873dfe	[RISCV] Update lowerFROUND to use masked instructions. This avoids a vmerge at the end and avoids spurious fflags updates. This isn't used for constrained intrinsic so we technically don't have to worry about fflags, but it doesn't cost much to support it. To support I've extend our FCOPYSIGN_VL node to support a passthru operand. Similar to what was done for VRGATHER*_VL nodes. I plan to do a similar update for trunc, floor, and ceil. Reviewed By: reames, frasercrmck Differential Revision: https://reviews.llvm.org/D130659	2022-07-28 10:05:19 -07:00
Craig Topper	89173dee71	[RISCV] Remove duplicate code. NFC The same operations are part of `FloatingPointVecReduceOps` a little bit earlier.	2022-07-28 10:05:19 -07:00
Craig Topper	a304d70ee9	[RISCV] Reorder (and/or/xor (shl X, C1), C2) if we can form ANDI/ORI/XORI. InstCombine and DAGCombine prefer to keep shl before binops. This patch teaches isel to convert to (shl (and/or/xor X, C1 >> C2), C2) if (C1 >> C2) is a simm12. The idea was taken from X86's isel code. There's a special case implemented for a sext_inreg between the shift and the binop. Differential Revision: https://reviews.llvm.org/D130610	2022-07-27 17:35:26 -07:00
Craig Topper	1d1d8d6025	[RISCV] Reorder code in lowerFROUND to make the diff in D130659 cleaner. NFC	2022-07-27 17:13:04 -07:00
Craig Topper	98647330bf	[RISCV] Add merge operand to RISCVISD::FCOPYSIGN_VL. Similar to what was done for VRGATHER*_VL recently. This will be used in D130659.	2022-07-27 15:25:34 -07:00
Philip Reames	15c645f7ee	[RISCV] Enable (scalable) vectorization by default This change enables vectorization (using scalable vectorization only, fixed vectors are not yet enabled) for RISCV when vector instructions are available for the target configuration. At this point, the resulting configuration should be both stable (e.g. no crashes), and profitable (i.e. few cases where scalar loops beat vector ones), but is not going to be particularly well tuned (i.e. we emit the best possible vector loop). The goal of this change is to align testing across organizations and ensure the default configuration matches what downstreams are using as closely as possible. This exposes a large amount of code which hasn't otherwise been on by default, and thus may not have been fully exercised. Given that, having issues fall out is not unexpected. If you find issues, please make sure to include as much information as you can when reverting this change. Differential Revision: https://reviews.llvm.org/D129013	2022-07-27 12:36:04 -07:00
Craig Topper	32622d6de4	[RISCV] Add isel pattern for (mul (and X, 0xffffffff), 3<<C) with Zba. We can use slli.uw by C followed by sh1add. Similar can be done for multiples of 5 and 9. We need to make sure that C is less than 32 to stay in bounds of the 5-bit immediate for slli.uw. We have existing patterns for (mul X, 3<<C) that use sh1add followed by slli. That order doesn't allow the and to be folded. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130146	2022-07-27 09:41:59 -07:00
Craig Topper	9b27d13204	[RISCV] Disable constant hoisting for multiply by negated power of 2. A mul by a negated power of 2 is a slli followed by neg. This doesn't require any constant materialization and may be lower latency than mul. The neg may also be foldable into other arithmetic. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130047	2022-07-27 09:37:59 -07:00
LiaoChunyu	bf4f9a468a	[RISCV]Enable isIntDivCheap when attribute is minsize Don't expand divisions by constants when attribute is minsize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130543	2022-07-27 18:22:51 +08:00
Craig Topper	3a2d7d8ad5	[RISCV] Add Predicate to c.lw/c.sw/c.lwsp/c.swsp InstAliases with no offset. These are aliases that allow the immediate offset to be ommitted. We had predicates for the RV64, RV32+F, and D versions, but not the base versions. I've also re-ordered them to share Predicate lines to improve readability.	2022-07-26 11:06:00 -07:00
Kazu Hirata	3f3930a451	Remove redundaunt virtual specifiers (NFC) Identified with tidy-modernize-use-override.	2022-07-25 23:00:59 -07:00
Craig Topper	45944e7cf4	[RISCV] Refactor translateSetCCForBranch to prepare for D130508. NFC. D130508 handles more constants than just 1 or -1. We need to extract the constant instead of relying isOneConstant or isAllOnesConstant.	2022-07-25 15:54:54 -07:00
Craig Topper	1db6d6dcd8	[RISCV] Teach RISCVCodeGenPrepare to optimize (zext (abs(i32 X, i1 1))). (abs(i32 X, i1 1) always produces a positive result. The 'i1 1' means INT_MIN input produces poison. If the result is sign extended, InstCombine will convert it to zext. This does not produce ideal code for RISCV. This patch reverses the zext back to sext which can be folded into a subw or negw. Ideally we'd do this in SelectionDAG, but we lose the INT_MIN poison flag when llvm.abs becomes ISD::ABS. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130412	2022-07-25 09:36:41 -07:00
jacquesguan	d8800ead62	[RISCV] Scalarize binop followed by extractelement. This patch adds shouldScalarizeBinop to RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation. Differential Revision: https://reviews.llvm.org/D129545	2022-07-25 17:23:31 +08:00
Kazu Hirata	b5188591a0	[llvm] Remove redundaunt virtual specifiers (NFC) Identified with modernize-use-override.	2022-07-24 21:50:35 -07:00
Craig Topper	9adc00a9d0	[RISCV] Add a continue to reduce nesting. NFC	2022-07-23 17:36:12 -07:00
Kazu Hirata	1cc7f5bede	Use static_assert instead of assert (NFC) Identified with misc-static-assert.	2022-07-23 09:22:27 -07:00
Craig Topper	ab2348a6fa	[RISCV] Add sext.b/h and zext.b/h/w to RISCVInstrInfo::foldMemoryOperandImpl. We can always fold zext.b since it is just andi. The others require Zba/Zbb. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130302	2022-07-21 14:54:58 -07:00
Craig Topper	add17fc8e4	[RISCV] Combine (select_cc (srl (and X, 1<<C), C), 0, eq/ne, true, fale) (srl (and X, 1<<C), C) is the form we receive for testing bit C. An earlier combine removed the setcc so it wasn't there to match when we created the SELECT_CC. This doesn't happen for BR_CC because generic DAG combine rebuilds the setcc if it is used by BRCOND. We can shift X left by XLen-1-C to put the bit to be tested in the MSB, and use a signed compare with 0 to test the MSB.	2022-07-20 22:32:11 -07:00
Craig Topper	7dda6c71b1	[RISCV] Refactor the common combines for SELECT_CC and BR_CC into a helper function. The only difference between the combines were the calls to getNode that include the true/false values for SELECT_CC or the chain and branch target for BR_CC. Wrap the rest of the code into a helper that reads LHS, RHS, and CC and outputs new values and a bool if a new node needs to be created.	2022-07-20 21:18:07 -07:00
Craig Topper	8983db15a3	[RISCV] Optimize (brcond (seteq (and X, 1 << C), 0)) If C > 10, this will require a constant to be materialized for the And. To avoid this, we can shift X left by XLen-1-C bits to put the tested bit in the MSB, then we can do a signed compare with 0 to determine if the MSB is 0 or 1. Thanks to @reames for the suggestion. I've implemented this inside of translateSetCCForBranch which is called when setcc+brcond or setcc+select is converted to br_cc or select_cc during lowering. It doesn't make sense to do this for general setcc since we lack a sgez instruction. I've tested bit 10, 11, 31, 32, 63 and a couple bits betwen 11 and 31 and between 32 and 63 for both i32 and i64 where applicable. Select has some deficiencies where we receive (and (srl X, C), 1) instead. This doesn't happen for br_cc due to the call to rebuildSetCC in the generic DAGCombiner for brcond. I'll explore improving select in a future patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130203	2022-07-20 18:40:49 -07:00
Craig Topper	31b8939ded	[RISCV] Recognize bexti from (srl (and X, 1<<C), C). This is the form we get for (zext (setne (and X 1<<C))). We only had bexti patterns for the alternative form (and (srl X, C), 1).	2022-07-20 15:03:52 -07:00
ksyx	3198364e6e	[RISCV][Clang] Add support for Zmmul extension This patch implements recently ratified extension Zmmul, a subextension of M (Integer Multiplication and Division) consisting only multiplication part of it. Differential Revision: https://reviews.llvm.org/D103313 Reviewed By: craig.topper, jrtc27, asb	2022-07-18 20:26:08 -04:00
Craig Topper	0b02752899	[RISCV] Optimize (seteq (i64 (and X, 0xffffffff)), C1) (and X, 0xffffffff) requires 2 shifts in the base ISA. Since we know the result is being used by a compare, we can use a sext_inreg instead of an AND if we also modify C1 to have 33 sign bits instead of 32 leading zeros. This can also improve the generated code for materializing C1. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D129980	2022-07-18 10:54:45 -07:00
Craig Topper	7c0b9b379b	[RISCV] Add isel patterns for ineg+setge/le/uge/ule. setge/le/uge/ule selected by themselves require an xori with 1. If we're negating the setcc, we can fold the xori with the neg to create an addi with -1. This works because xori X, 1 is equivalent to 1 - X if X is either 0 or 1. So we're doing -(1 - X) which is X-1 or X+-1. This improves the code for selecting between 0 and -1 based on a condition for some conditions. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D129957	2022-07-18 09:55:01 -07:00
Craig Topper	d7f2a63371	[RISCV] Fold stack reload into sext.w by using lw instead of ld. We can use lw to load 4 bytes from the stack and sign extend them instead of loading all 8 bytes. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D129948	2022-07-18 09:09:17 -07:00
Simon Pilgrim	259c36e7c1	[DAG] Add asserts to isDesirableToCommuteWithShift overrides to ensure its being called from a shift. NFC.	2022-07-18 13:11:24 +01:00
jacquesguan	2b11174079	[RISCV][NFC] Use more Arrayref in TargetLowering functions. This patch replaces some foreach with Arrayref, and abstract some same literal array with a variable. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125656	2022-07-18 03:33:45 +00:00
jacquesguan	bd228a1772	[RISCV] Extend use of SHXADD instructions in RVV spill/reload code. This patch extends D124824. It uses SHXADD+SLLI to emit 3, 5, or 9 multiplied by a power 2. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D129179	2022-07-18 10:53:19 +08:00
Fangrui Song	d955497112	[RISCV] Simplify lowerGlobalAddress. NFC	2022-07-17 15:42:45 -07:00
Craig Topper	decf385c27	[RISCV] Teach targetShrinkDemandedConstant to handle OR and XOR. We were only handling AND before, but SimplifyDemandedBits can also call it for OR and XOR.	2022-07-17 12:36:33 -07:00
Craig Topper	8cc483099a	[RISCV] Teach RISCVCodeGenPrepare to optimize (i64 (and (zext/sext (i32 X), C1))) If X is known positive by a dominating condition, we can fill in ones into the upper bits of C1 if that would allow it to become an simm12 allowing the use of ANDI. This pattern often occurs in unrolled loops where the induction variable has been widened. To get the best benefit from this, I had to move the pass above ConstantHoisting which is in addIRPasses. Otherwise the AND constant is often hoisted away from the AND. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D129888	2022-07-17 11:00:56 -07:00
Craig Topper	73f766ca9a	[RISCV] Remove unnecessary use of IRBuilder from RISCVCodeGenPrepare. We're creating single instruction to replace another instruction. We can insert using the InsertBefore operand of the constructor. Then copy the debug location.	2022-07-17 10:59:54 -07:00
Craig Topper	ee6267c443	[RISCV] Remove Gather/Scatter Opt from the O0 pipeline.	2022-07-17 10:58:33 -07:00
Lian Wang	dca821d80a	[RISCV] Add cost model for vector.reverse mask operation Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128784	2022-07-15 06:58:57 +00:00
Craig Topper	79016f6eef	[RISCV] Refine the heuristics for our custom (mul (and X, C2), C1) isel. Prefer to use SLLI instead of zext.w/zext.h in more cases. SLLI might be better for compression.	2022-07-14 18:24:10 -07:00
Craig Topper	bc0d656558	[RISCV] Fix mistake in RISCVTTIImpl::getIntImmCostInst. zext.w requires Zba not Zbb. The test was also wrong, but had the correct comment.	2022-07-14 16:42:35 -07:00
Craig Topper	9913ea490a	[RISCV] Make TuneSiFive7 depend on TuneNoDefaultUnroll instead of listing it for every SiFive7 CPU	2022-07-14 15:57:30 -07:00
Craig Topper	1a8468ba61	[RISCV] Add a RISCV specific CodeGenPrepare pass. Initial optimization is to convert (i64 (zext (i32 X))) to (i64 (sext (i32 X))) if the dominating condition for the basic block guaranteed the sign bit of X is zero. This frequently occurs in loop preheaders where a signed induction variable that can never be negative has been widened. There will be a dominating check that the 32-bit trip count isn't negative or zero. The check here is not restricted to that specific case though. A i32->i64 sext is cheaper than zext on RV64 without the Zba extension. Later optimizations can often remove the sext from the preheader basic block because the dominating block also needs a sext to evaluate the greater than 0 check. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D129732	2022-07-14 10:20:59 -07:00
Fraser Cormack	d1a5669f5e	[RISCV] Disable subregister liveness by default We previously enabled subregister liveness by default when compiling with RVV. This has been shown to cause miscompilations where RVV register operand constraints are not met. A test was added for this in D129639 which explains the issue in more detail. Until this issue is fixed in some way, we should not be enabling subregister liveness unless the user asks for it. Reviewed By: craig.topper, rogfer01, kito-cheng Differential Revision: https://reviews.llvm.org/D129646	2022-07-14 17:04:10 +01:00
David Green	3e0bf1c7a9	[CodeGen] Move instruction predicate verification to emitInstruction D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. The allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Recommitted with some fixes for the leftover MCII variables in release builds. Differential Revision: https://reviews.llvm.org/D129506	2022-07-14 09:33:28 +01:00
Craig Topper	257755530a	[RISCV] Fold (sra (sext_inreg (shl X, C1), i32), C2) -> (sra (shl X, C1+32), C2+32). The former pattern will select as slliw+sraiw while the latter will select as slli+srai. This can enable the slli+srai to be compressed. Differential Revision: https://reviews.llvm.org/D129688	2022-07-13 14:34:17 -07:00
Philip Reames	dde2a7fb6d	[RISCV] Exploit fact that vscale is always power of two to replace urem sequence When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale. vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.) We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, the must be a power of two numbers of blocks. (For everything other than VLEN<=32, but that's already broken.) It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic. Differential Revision: https://reviews.llvm.org/D129609	2022-07-13 10:54:47 -07:00
David Green	95252133e1	Revert "Move instruction predicate verification to emitInstruction" This reverts commit `e2fb8c0f4b` as it does not build for Release builds, and some buildbots are giving more warning than I saw locally. Reverting to fix those issues.	2022-07-13 13:28:11 +01:00
David Green	e2fb8c0f4b	Move instruction predicate verification to emitInstruction D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. The allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Differential Revision: https://reviews.llvm.org/D129506	2022-07-13 12:53:32 +01:00
Fraser Cormack	b336cf856e	[RISCV] Add early-exit to RVV stack computation. NFCI. This patch was split off from D126465, where an early-exit is necessary as it checks the VLEN and that asserts that V instructions are present. Since this makes logical sense on its own, I think it's worth landing regardless of D126465. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D129617	2022-07-13 08:50:08 +01:00
Monk Chiang	2b045324b2	[RISCV] Add scheduling resources for vector segment instructions. Add scheduling resources for vector segment instructions Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128886	2022-07-12 22:51:58 -07:00
Craig Topper	c5be6a8308	[RISCV] Use X0 in place of VLMaxSentinel in lowering. I thought I had already fixed all of these, but I guess I missed one.	2022-07-11 23:29:04 -07:00
Craig Topper	c3c17b1695	[RISCV] Use MVT for the argument to getMaskTypeFor. NFC Only one caller didn't already have an MVT and that was easy to fix. Since the return type is MVT and it uses MVT::getVectorVT, taking an MVT as input makes the most sense.	2022-07-11 15:14:44 -07:00
Craig Topper	759e5e0096	[RISCV] Remove doPeepholeLoadStoreADDI. All of the cases should be handled by SelectAddrRegImm now. Reviewed By: asb, luismarques Differential Revision: https://reviews.llvm.org/D129451	2022-07-11 10:44:33 -07:00
Craig Topper	907d923a20	[RISCV] Move the custom isel for (add X, imm) into SelectAddrRegImm. This custom isel was used to split the lo12 bits of the imm so that they could be folded into load/store addresses via a post-isel peephole. This patch instead splits the immediate during isel and folds the lo12 removing the need for the post-isel peephole to do anything. After this we'll be able to remove the post-isel peephole. Reviewed By: asb, luismarques Differential Revision: https://reviews.llvm.org/D129450	2022-07-11 10:44:33 -07:00
Craig Topper	1a2bd44b77	[RISCV] Make shouldConvertConstantLoadToIntImm return true unless enableUnalignedScalarMem is true. This restores the old behavior before D129402 when enableUnalignedScalarMem is false. This fixes a regression spotted by @asb. To fix this correctly, we need to consider alignment of the load we'd be replacing, but that's not possible in the current interface.	2022-07-11 09:40:08 -07:00
David Sherwood	03fee6712a	[LoopVectorize] Add option to use active lane mask for loop control flow Currently, for vectorised loops that use the get.active.lane.mask intrinsic we only use the mask for predicated vector operations, such as masked loads and stores, etc. The loop itself is still controlled by comparing the canonical induction variable with the trip count. However, for some targets this is inefficient when it's cheap to use the mask itself to control the loop. This patch adds support for using the active lane mask for control flow by: 1. Generating the active lane mask for the next iteration of the vector loop, rather than the current one. If there are still any remaining iterations then at least the first bit of the mask will be set. 2. Extract the first bit of this mask and use this bit for the conditional branch. I did this by creating a new VPActiveLaneMaskPHIRecipe that sets up the initial PHI values in the vector loop pre-header. I've also made use of the new BranchOnCond VPInstruction for the final instruction in the loop region. Differential Revision: https://reviews.llvm.org/D125301	2022-07-11 13:46:55 +01:00

1 2 3 4 5 ...

2411 Commits