Commit Graph

2411 Commits

Author SHA1 Message Date
Philip Reames b45a262679 [RISCV] Enable fixed length vectors and loop vectorization with same
This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to fit within the underlying variable hardware vector length.

For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware.

The LV impact is mostly related to vectorizer robustness. In cases where we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization.

SLP has been disabled for now, even when fixed vectors are enabled.  See a310637 and associated review.  There are a few additional code quality issues which need to be worked through before turning SLP on would be reasonable.

Differential Revision: https://reviews.llvm.org/D131508
2022-08-26 14:45:23 -07:00
Philip Reames a310637132 [RISCV] Disable SLP vectorization by default due to unresolved profitability issues
This change implements a TTI query with the goal of disabling SLP vectorization on RISCV. The current default configuration disables SLP already, but it's currently tied to the ability to lower fixed length vectors. Over in D131508, I want to enable fixed length vectors for the purposes of LoopVectorizer, but preliminary analysis has revealed a couple of SLP-specific issues we need to resolve before enabling it by default. This change exists to allow us to enable LV without SLP.

Differential Revision: https://reviews.llvm.org/D132680
2022-08-26 14:11:22 -07:00
Yunze Zhu 3846e3970f [RISCV] Generate correct ELF abi flag when empty .ll file has target-abi attribute
In patch D121183, the target ABI is read from the .ll file's target-abi
attribute and set in the RISCVAsmPrinter::emitFunctionEntryLabel
function. In https://github.com/llvm/llvm-project/issues/57242,
an ABI mismatch error can occur because
RISCVAsmPrinter::emitFunctionEntryLabel is never called to set the
target ABI when the .ll file is empty or the module has no functions.

This patch moves the target-ABI setup to
RISCVAsmPrinter::emitStartOfAsmFile, making sure every .ll file and
every module in LTO reads the target ABI from the module flag and sets
it, with or without functions.

Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com>
Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com>

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D132204
2022-08-26 14:39:39 +08:00
LiaoChunyu 6b098bf35a [RISCV] Add support for the simm10_lsb0000nonzero operand.
While running llvm-exegesis on a RISC-V machine I ran into trouble: C_ADDI16SP can't be measured because its immediate has type simm10_lsb0000nonzero.

Patch adds support for processing this immediate operand type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132650
2022-08-26 14:37:37 +08:00
Craig Topper e4177201eb [RISCV][M68k] Replace fixed size BitVector with std::bitset.
Saves a heap allocation and avoids an explicit call to the BitVector constructor.

Reviewed By: reames, myhsu

Differential Revision: https://reviews.llvm.org/D132674
2022-08-25 12:45:08 -07:00
Craig Topper 41a3b5739b [RISCV] Teach combineDeMorganOfBoolean to handle (and (xor X, 1), (not Y)).
SimplifyDemandedBits tries to aggressively turn xor immediates into -1
to match a 'not' instruction. In this case, because X is a boolean, the
upper bits of (xor X, 1) are known to be 0. Because this is an AND
instruction, that means those bits aren't demanded from the other
operand, and thus SimplifyDemandedBits can turn (xor Y, 1) to (not Y).

We need to detect that this has happened to enable the DeMorgan
optimization. To do this we allow one of the xors to use -1 when
the outer operation is And.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132671
2022-08-25 10:55:45 -07:00
Philip Reames 53f738ce7e [RISCV] Add empirical costs for integer min/max and saturating add/sub
All of these are lowered to a single instruction for all legal vector types.
2022-08-25 09:27:17 -07:00
Craig Topper ec91d761ac [RISCV] Apply DeMorgan's law to (and/or (xor X, 1), (xor Y, 1)) if X and Y are 0/1.
This optimizes xors that appear due to legalizing setge/setle which
require an xor with 1. This reduces the number of xors and may
allow the xor to fold with a beqz or bnez.
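
A rough before/after sketch (hand-written illustration, not output from the patch; assumes X and Y are 0/1 values in a0 and a1):

xori a0, a0, 1
xori a1, a1, 1
and  a0, a0, a1

becomes

or   a0, a0, a1
xori a0, a0, 1    # this xor can now fold with a following beqz/bnez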

Differential Revision: https://reviews.llvm.org/D132614
2022-08-25 08:49:30 -07:00
Philip Reames 03798f268b [RISCV] Back out cttz/ctlz instruction costs
Craig points out correctly in post-commit review that these depend on the availability of floating point extensions.
2022-08-24 15:40:48 -07:00
Philip Reames d4d6e71ea2 [RISCV] Add empirical costs for bswap/bitreverse/ctpop/ctlz/cttz
If anyone is looking for a source of ideas on vector codegen improvements, the lowerings for several of these seem to include pretty obvious fixits.
2022-08-24 15:09:21 -07:00
Philip Reames 42af1a776a [RISCV] Add empirically measured vector sqrt intrinsic costs 2022-08-24 14:27:57 -07:00
Philip Reames 4d3134866f [RISCV] Add vector fabs intrinsic costs
We have a fabs vector instruction, and are using it for current lowering.
2022-08-24 14:09:51 -07:00
Saleem Abdulrasool 8f45b5a7a9 RISCV: permit unaligned nop-slide padding emission
We may be requested to emit an unaligned nop sequence (e.g. 7-bytes or
3-bytes).  These should be 0-filled even though that is not a valid
instruction.  This matches the behaviour on other architectures like
ARM, X86, and MIPS.  When a custom section is emitted, it may be
classified as text even though it may be a data section or we may be
emitting data into a text segment (e.g. a literal pool).  In such cases,
we should be resilient to the emission request.

This was originally identified by the Linux kernel build and reported on
D131270 by Nathan Chancellor.

Differential Revision: https://reviews.llvm.org/D132482
Reviewed By: luismarques
Tested By: Nathan Chancellor
2022-08-24 20:26:48 +00:00
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Kito Cheng 8e8a62006e [RISCV][NFC] Minor cleanup in RISCVInstrInfo::getOutliningType
The only use of TM is checking the result of TargetMachine::getFunctionSections;
check that directly instead of introducing a local variable.
2022-08-24 23:42:34 +08:00
Alex Richardson 38107171ed [RegisterInfoEmitter] Generate isConstantPhysReg(). NFCI
This commit moves the information on whether a register is constant into
the Tablegen files to allow generating the implementation of
isConstantPhysReg(). I've marked isConstantPhysReg() as final in this
generated file to ensure that changes are made to tablegen instead of
overriding this function, but if that turns out to be too restrictive,
we can remove the qualifier.

This should be pretty much NFC, but I did notice that e.g. the AMDGPU
generated file also includes the LO16/HI16 registers now.

The new isConstant flag will also be used by D131958 to ensure that
constant registers are marked as call-preserved.

Differential Revision: https://reviews.llvm.org/D131962
2022-08-24 14:16:20 +00:00
Kito Cheng 96c85f80f0 [RISCV] Don't outline pcrel-lo operand.
This issue was found by building the llvm-testsuite with `-Oz`: the linker
complains `dangerous relocation: %pcrel_lo missing matching %pcrel_hi`. That
turned out to be caused by outlining the pcrel-lo instruction while leaving
the pcrel-hi behind. That's not a problem in general, but here the pair ends
up in different sections, and a pcrel-hi/pcrel-lo pair (e.g. AUIPC+ADDI)
*MUST* be placed in the same section due to the implementation.

Outlined functions are placed in .text, but the source functions are
placed in .text.<function-name> if function-sections is enabled or the
function has the `comdat` attribute.

There are a few possible solutions for this issue:
1. Always disallow instructions with pcrel-lo flags.
2. Only disallow instructions with pcrel-lo flags when function-sections is
   enabled or the function has the `comdat` attribute.
3. Check whether the instruction carrying the corresponding pcrel-hi is also
   included in the outlining candidate sequence, and allow outlining only
   when it is.

The first is the most conservative and might lose some optimization
opportunities; the second would preserve those opportunities; the last is
hard to implement and has no benefit, since pcrel-hi uses a different
label even when accessing the same symbol.

Custom section names can also cause this problem, but those cases are
already filtered out by RISCVInstrInfo::isFunctionSafeToOutlineFrom.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D132528
2022-08-24 21:47:46 +08:00
MarkGoncharovAl 8c1f18bd3e [RISCV] Add support for immediate operands.
llvm-exegesis uses the operand type information provided in tablegen files to initialize
the immediate arguments of an instruction. Some of them simply don't have such
information, so we should give the relevant immediate operands their specific types,
and create verification methods for them.

Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D131771
2022-08-24 17:48:39 +08:00
Alex 07a700f814 [RISCV] Add zihintntl compressed instructions
Add zihintntl compressed instructions and some files related to zihintntl.
This patch is based on D121670.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D121779
2022-08-24 14:29:02 +08:00
ZHU Zijia 9c85382ade [RISCV] Handle register spill in branch relaxation
In the branch relaxation pass, `j`'s with offsets over 1MiB will be relaxed
to `jump` pseudo-instructions.

This patch allocates a stack slot for functions with a size greater than
1MiB. If the register scavenger cannot find a scratch register for
`jump`, spill a register to the slot before the jump and restore it
after the jump.

.mbb:
        foo
        j       .dest_bb
        bar
        bar
        bar
.dest_bb:
        baz

The above code will be relaxed to the following code.

.mbb:
        foo
        sd      s11, 0(sp)
        jump    .restore_bb, s11
        bar
        bar
        bar
        j       .dest_bb
.restore_bb:
        ld      s11, 0(sp)
.dest_bb:
        baz

Depends on D129999.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D130560
2022-08-24 13:27:56 +08:00
Philip Reames c9608d57b8 [TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC]
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet.  The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
2022-08-23 07:55:42 -07:00
Philip Reames 478cf94378 [X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc]
This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties.  This change adds some accessor methods and uses them to simplify backend code.  The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.
2022-08-22 13:37:59 -07:00
Shao-Ce SUN 7167a4207e [RISCV] Add zihintntl instructions
Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D121670
2022-08-22 12:06:30 +08:00
Craig Topper abce7acebd [RISCV] Remove impossible TODO in RISCVRedundantCopyElimination. NFC
If there are multiple conditional branches we shouldn't do any
optimization.
2022-08-21 13:18:02 -07:00
Craig Topper 1a042dd6ed [RISCV] Optimize x <s -1 ? x : -1. Improve x >u 1 ? x : 1.
Similar to D132211, we can optimize x <s -1 ? x : -1 -> x <s 0 ? x : -1

Also improve the unsigned case from D132211 to use x != 0 which
will give a bnez instruction which might be compressible.

Differential Revision: https://reviews.llvm.org/D132252
2022-08-21 11:48:28 -07:00
Craig Topper a6c3ccd476 [RISCV] Be more strict about LUI+ADDI macrofusion pre-RA.
Don't macrofuse if the LUI has more than 1 user. That will likely
require the LUI to have a different destination register post-RA.
LUI+ADDI can only be fused if they write the same register.
2022-08-21 10:58:15 -07:00
LiaoChunyu 1fb87ace4d [RISCV] Optimize x > 1 ? x : 1 -> x > 0 ? x : 1
if x == 1,
  x > 1 ? x : 1  returns 1.
  x > 0 ? x : 1  returns x, which is also 1.

This reduces the number of instructions needed to load the constant 1.
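
One possible final sequence (hand-written sketch, illustrative registers), since unsigned x > 0 is just x != 0:

bnez a0, 1f       # compressible c.bnez; no 'li a1, 1' needed for the compare
li   a0, 1
1: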

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132211
2022-08-21 20:26:39 +08:00
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Craig Topper 6227b7ae31 [RISCV] Move xori creation for scalar setccs to lowering.
This patch enables expansion or custom lowering for some integer
condition codes so that any xori that is needed is created before
the last DAG combine to enable optimization.

I've seen cases where we end up with
(or (xori (setcc), 1), (xori (setcc), 1)) which we would ideally
convert to (xori (and (setcc), (setcc)), 1). This patch doesn't
accomplish that yet, but it should allow us to add DAG
combines as follow ups. Example https://godbolt.org/z/Y4qnvsq1b

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131729
2022-08-19 13:51:53 -07:00
Philip Reames 59960e8db9 [RISCV] Factor out getVectorImmCost cost after 0e7ed3 [nfc] 2022-08-19 12:53:54 -07:00
Philip Reames e7fda46300 [RISCV] Correct costs for vector ceil/floor/trunc/round
Add vector costs for ceil/floor/trunc/round. As can be seen in the tests, the prior default costs were a significant underestimate of the actual code generated.

These costs are computed by simply generating code with the current backend, and then counting the number of instructions. I discount one vsetvli, and ignore the return.

Differential Revision: https://reviews.llvm.org/D131967
2022-08-19 10:37:39 -07:00
Craig Topper 961838cc13 [RISCV] Add passthru operand to RISCVISD::SETCC_VL.
Use it to fix a bug in the fceil/ffloor lowerings. We were
setting the passthru to IMPLICIT_DEF before and using a mask
agnostic policy. This means where the incoming bits in
the mask were 0 they could be anything in the outgoing mask. We
want those bits in the outgoing mask to be 0. This means we need to
pass the input mask as the passthru.

This generates worse code because we are unable to allocate the
v0 register to the output due to an earlyclobber constraint. We
probably need a special TIED pseudoinstruction and probably custom
isel since you can't use V0 twice in the input pattern.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132058
2022-08-19 08:53:44 -07:00
Craig Topper c9a41fe60a [RISCV] Prefer vnsrl.wi v8, v8, 0 over vnsrl.wx v8, v8, x0.
I have a couple data points that some microarchitectures prefer
the immediate 0 over x0. Does anyone know of microarchitectures
where the opposite is true?

Unfortunately, this is different than the vncvt.x.x.w alias
from the spec. Perhaps the alias was poorly chosen if x0 isn't
as optimal as immediate 0 on all microarchitectures.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132041
2022-08-19 08:40:17 -07:00
Alexey Bataev 0e7ed32c71 [SLP]Cost for a constant buildvector.
In many cases a constant buildvector results in a vector load from a
constant/data pool. We need to consider this cost too.

Differential Revision: https://reviews.llvm.org/D126885
2022-08-19 08:02:42 -07:00
Alexey Bataev d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added an OperandValueKind OpdInfo parameter to the getMemoryOpCost functions to
better estimate the cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Craig Topper ba1f4cab44 [RISCV] Copy SDNodeFlags in lowerToScalableOp.
Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D132177
2022-08-18 20:42:59 -07:00
Craig Topper 5349aa2354 [RISCV] Copy SDNodeFlags in doPeepholeMaskedRVV and doPeepholeMergeVVMFold
Especially the NoFPExcept flag for FP.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D132173
2022-08-18 20:42:46 -07:00
Craig Topper 37c47b2cac [RISCV] Change how mtune aliases are implemented.
The previous implementation translated from names like sifive-7-series
to sifive-7-rv32 or sifive-7-rv64. This also required sifive-7-rv32
and sifive-7-rv64 to be valid CPU names. As those are not real
CPUs it doesn't make sense to accept them in -mcpu.

This patch does away with the translation and adds sifive-7-series
directly to RISCV.td, removing sifive-7-rv32 and sifive-7-rv64.
sifive-7-series is only allowed in -mtune.

I've also added "rocket" to RISCV.td but have not removed rocket-rv32
or rocket-rv64.

To prevent -mcpu=sifive-7-series or -mcpu=rocket being used with llc,
I've added a Feature32Bit to all rv32 CPUs. And made it an error to
have an rv32 triple without Feature32Bit. sifive-7-series and rocket
do not have Feature32Bit or Feature64Bit set so the user would need
to provide -mattr=+32bit or -mattr=+64bit along with the -mcpu to
avoid the error.

SiFive no longer names their newer products with 3, 5, or 7 series.
Instead we have p200 series, x200 series, p500 series, and p600 series.
Following the previous behavior would require a sifive-p500-rv32 and
sifive-p500-rv64 in order to support -mtune=sifive-p500-series. There
is currently no p500 product, but it could start getting confusing if
there was in the future.

I'm open to hearing alternatives for how to achieve my main goal
of removing sifive-7-rv32/rv64 as a CPU name.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131708
2022-08-18 16:22:25 -07:00
Philip Reames 4d87591028 [RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL
On known hardware, reduction, gather, and scatter operations have execution latencies which correlate with the vector length (VL) of the operation. Most other operations (e.g. simple arithmetic) don't correlate in this way, and instead have an essentially fixed cost as VL varies.

When I'd implemented initial scalable cost model support for reductions, gather, and scatter operations, I had used an upper bound on the statically unknown VL. The argument at the time was that this prevented falsely low costs, and biased the vectorizer away from generating bad (on some hardware) code. Unfortunately, practical experience shows we were a bit too effective at that goal, and the high costs de facto prevent vectorization using these constructs at all.

This patch reverses course, and ties the returned cost not to the maximum possible VL, but the VL which would correspond to VScaleForTuning. This parameter is the same one the vectorizer uses when normalizing loop costs, so the term effectively cancels out. The result is that the vectorizer now sees these constructs as comparable in cost to their fixed length variants.

This does introduce the possibility of the cost for these operations being a significant underestimate on platforms where the actual VLEN is far from that implied by VScaleForTuning. On such platforms, we might make poor heuristic choices. Probably not in LV itself (due to the cancellation mentioned above), but possibly during e.g. lowering. I'm not currently aware of any concrete examples of this, but this patch does open a concern which did not previously exist.

Previously, we had the problem of overestimating costs causing the same problem on machines much closer to default values for vscale for tuning. With this patch, we still have that problem potentially if vscale for tuning is set high (manually), and then the code is run on a narrow VLEN machine.

Differential Revision: https://reviews.llvm.org/D131519
2022-08-18 13:10:03 -07:00
Simon Pilgrim fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
WuXinlong 515ece1a90 [RISCV] Add MC support of RISCV Zca Extension
This patch adds support for part of the Zc extension, which will be frozen soon.

This extension is designed to continue reducing the binary size of RISC-V programs.
In this patch:
`Zca` is a subset of C extension instructions that are compatible with the Zc extension.

The spec of Zc ext is [[ https://github.com/riscv/riscv-code-size-reduction/releases | Here ]]

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130141
2022-08-18 12:13:35 +08:00
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TargetLowering had the last two InstructionCost-related members, `getTypeLegalizationCost()`
and `getScalingFactorCost()`, but all other costs are processed in TTI.

For example, it is not convenient to use other TTI members in these two functions
when they are overridden in a target.

Minor refactoring: `getTypeLegalizationCost()` no longer needs a DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
Craig Topper 550fab53e1 [RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1).
Extracted from D131729 where we handled C==0. It's now generalized
to more constants.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132000
2022-08-17 09:50:08 -07:00
Craig Topper ab4cd154c6 [RISCV] Refactor performSUBCombine to prepare for D132000.
This refactors the code into a separate function with early returns.
D132000 adds an additional operation to the if/else that selects
NewLHS, but can otherwise share the rest of the code.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132002
2022-08-17 09:50:08 -07:00
Alex Bradbury ce38128194 [RISCV] Avoid redundant branch-to-branch when expanding cmpxchg
If the success value of a cmpxchg is used in a branch, the expanded
cmpxchg sequence ends up with a redundant branch-to-branch (as the
backend atomics expansion happens as late as possible, passes to
optimise such cases have already run). This patch identifies this case
and avoids it when expanding the cmpxchg.

Note that a similar optimisation is possible for a BEQ on the cmpxchg
success value. As it's hard to imagine a case where real-world code may
do that, this patch doesn't handle that case.

Differential Revision: https://reviews.llvm.org/D130192
2022-08-17 13:49:15 +01:00
Craig Topper d27c147aaa [RISCV] Allow lowerSELECT to fold integer setcc with FP select.
We'd pick it up in DAG combine later even if we didn't handle it here.
No test changes because we get it in DAG combine anyway.
2022-08-16 21:28:54 -07:00
Craig Topper ba1fb54821 [RISCV] Reuse existing VT variable instead of calling getValueType() repeatedly. NFC 2022-08-16 19:56:55 -07:00
Monk Chiang 0af4651c0f [RISCV] Add scheduling class for vector pseudo segment instructions.
D128886 added scheduling resources for the vector segment load/store instructions,
but I missed adding scheduling resources for the pseudo segment instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130222
2022-08-16 17:54:47 -07:00
Craig Topper 53ce22e429 Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.

Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:51:07 -07:00
Craig Topper 2dfa4b6475 Revert "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This reverts commit 1380b21ceb.

I mixed up N0 and N1 and didn't do what I intended.
2022-08-16 15:47:01 -07:00
Craig Topper 1380b21ceb [RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine.
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:40:09 -07:00
Craig Topper b5a18de651 [RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)).
While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is
still preferable. (addi X, -1) can be compressed; sub with x0 on
the LHS is never compressible.
2022-08-16 14:49:52 -07:00
Craig Topper de6fd16971 [RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12.
We still need to materialize the constant in a register and we
may not be removing all uses of the original constant so it may
increase code size.
2022-08-16 14:11:31 -07:00
Craig Topper 4184edc691 [RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc.
This introduced an xori in some cases. I don't believe it was the
intention of the original patch. This was an accident because
no-NaNs FP equality compares also use SETEQ/SETNE.

Also pass the correct type to getSetCCInverse.
2022-08-16 13:00:36 -07:00
Craig Topper c7e58836e8 [RISCV] Minor cleanups to performSUBCombine. NFC
-Rename variable NnzC -> N0C.
-Use SelectionDAG::getSetCC to reduce code.
-Use SDValue::getOperand instead of operator-> and SDNode::getOperand.

Initial steps to add another similar combine to this code.
2022-08-16 12:59:16 -07:00
Craig Topper 7a73ab5818 [RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64.
We have a good selection of W instructions, so promoting a truncated
value back to i64 is often free.

This appears to be a net code size reduction on SPECINT2006.

This has been split from D130397 as one of the patches needed to
complete that.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131819
2022-08-15 08:32:51 -07:00
Kazu Hirata f5a68feab3 Use llvm::none_of (NFC) 2022-08-14 16:25:39 -07:00
LiaoChunyu 99ef0ddea3 [RISCV] Fold (sub constant, (setcc x, y, eq/neq)) -> (add constant - 1, (setcc x, y, neq/eq))
(setcc x, y, eq/neq) lowers to seqz or snez, which set rd = 0/1.

addi is used to absorb the immediate, which saves the instructions needed to materialize it.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131471
2022-08-13 20:37:57 +08:00
Craig Topper 37db283362 [RISCV] isImpliedByDomCondition returns an Optional<bool> not a bool.
We were incorrectly checking that it returned an implication result,
not that the implication result itself was true.
2022-08-12 22:21:05 -07:00
jacquesguan 0fe5f03eeb [RISCV][NFC] Use nested namespace definitions.
Since we use C++17 now, we can use nested namespace definitions to simplify code.

Differential Revision: https://reviews.llvm.org/D131751
2022-08-13 09:56:59 +08:00
Craig Topper e493944f5f [RISCV] Use SLTIU X, -1 for (setne X, -1).
Since -1 is the maximum unsigned value, all values less than it
are not equal to it.
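
For illustration (hand-written sketch, not from the patch):

sltiu a0, a0, -1    # a0 = (a0 <u 0xfff...fff) = (a0 != -1)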
2022-08-11 15:36:04 -07:00
Craig Topper 2c79801a0e [RISCV] Add more ineg+setcc isel patterns to avoid creating neg+xori+slti(u).
Including patterns to select addiw if only the lower 32 bits are used.

I'm not excited about adding this many patterns. I'm looking at whether
we can create the xori during lowering and move the ineg patterns to
DAGCombiner.
2022-08-11 14:24:09 -07:00
Yeting Kuo 875694089d [RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics.
The patch uses a peephole method to fold merge.vvm and unmasked intrinsics into
masked intrinsics. A peephole is used instead of tablegen patterns to avoid
large amounts of auto-generated code.

Note: The patch ignores segment loads since I don't know how to test them.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130442
2022-08-11 17:58:11 +08:00
Jonas Hahnfeld 940733d6a0 [RISCV] Re-enable JIT support
Commit 8922adf646 recently made JITTargetMachineBuilder honor the
hasJIT property of the target. LLVM supports just-in-time compilation
on RISC-V, so set the flag.

Differential Revision: https://reviews.llvm.org/D131617
2022-08-11 11:41:02 +02:00
jacquesguan 21bf59c92a [RISCV] Add cost model for mask vector extend and truncate instruction.
As extending from or truncating to a mask vector does not use the same instructions as a normal cast, this patch changes the cost to 2, which is the number of instructions we use.

Differential Revision: https://reviews.llvm.org/D131552
2022-08-11 10:55:43 +08:00
Alex Bradbury 47b1f8362a [RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls
Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is partly determined by checking TargetLowering::isInTailCallPosition
and comparing the return type of the libcall and the caller.
isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly
(which always returns false if not implemented by the target).

This patch provides a minimal implementation of
TargetLowering::isUsedByReturnOnly - enough to support tail calling
libcalls on hard float ABIs. Soft-float ABIs are left for a follow on
patch. libcall-tail-calls.ll also shows missed opportunities to tail
call integer libcalls, but this is due to issues outside of
the isUsedByReturnOnly hook.

Differential Revision: https://reviews.llvm.org/D131087
2022-08-10 10:50:29 +01:00
jacquesguan b6b1c0d1c4 [RISCV] Add cost model for fp-mask cast op.
The cost of converting from or to a mask vector is different from other cases. We cannot use PowDiff to calculate it. This patch sets it to 3, as we use 3 instructions to do the conversion.

Differential Revision: https://reviews.llvm.org/D131149
2022-08-10 17:14:37 +08:00
Nikita Popov f5ed0cb217 [RISCV] Add target feature to force-enable atomics
This adds a +forced-atomics target feature with the same semantics
as +atomics-32 on ARM (D130480). For RISCV targets without the +a
extension, this forces LLVM to assume that lock-free atomics
(up to 32/64 bits for riscv32/64 respectively) are available.

This means that atomic load/store are lowered to a simple load/store
(and fence as necessary), as these are guaranteed to be atomic
(as long as they're aligned). Atomic RMW/CAS are lowered to __sync
(rather than __atomic) libcalls. Responsibility for providing the
__sync libcalls lies with the user (for privileged single-core code
they can be implemented by disabling interrupts). Code using
+forced-atomics and -forced-atomics is not ABI compatible if atomic
variables cross the ABI boundary.
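
As a hand-written sketch (the exact fences depend on the memory ordering), a seq_cst atomic load under +forced-atomics lowers to something like:

fence rw, rw
lw    a0, 0(a1)
fence r, rw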

For context, the difference between __sync and __atomic is that the
former are required to be lock-free, while the latter requires a
shared global lock provided by a shared object library. See
https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed
discussion on the topic.

This target feature will be used by Rust's riscv32i target family
to support the use of atomic load/store without atomic RMW/CAS.

Differential Revision: https://reviews.llvm.org/D130621
2022-08-09 16:04:46 +02:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Craig Topper e2bfbed2bb [RISCV] Add ReadFStoreData as a SchedRead.
The floating point stores use a different register class, it
probably makes sense to have a different SchedRead.

Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D131379
2022-08-08 09:33:19 -07:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Craig Topper 75c64c7c4e [RISCV] Don't use li+sh3add for constants that can use lui+add.
If we're adding a constant that can't use addi we try a few tricks,
one of which is using li+sh3add. We should not do this if lui+add
would work, for example when adding 8192. Using sh3add prevents folding
a sext.w to form addw, thus increasing instruction count.
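
A hand-written sketch for adding 8192 (illustrative registers):

lui  a1, 2          # a1 = 8192
addw a0, a0, a1     # a following sext.w folds to form addw

is preferred over

li     a1, 1024
sh3add a0, a1, a0   # a0 = a1*8 + a0; blocks forming addw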
2022-08-05 12:47:03 -07:00
Philip Reames 9a9848f4b9 [RISCVInsertVSETVLI] Remove an unsound optimization
This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem:

vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo
vsetivli a2, a1, e32, m4, ta, mu

With the unsound result being:

vsetivli a1, a0, e32, m1, ta, mu
vsetivli a2, a0, e32, m4, ta, mu

Consider the case where this is running on a machine with VLEN=512. For this case, the VLMAXs are 16 and 64 respectively.

Consider a0 = 33. The correct result is: a1 = 16 and a2 = 16.

After the unsound optimization: a1 = 16 and a2 = 33

This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that one's more complicated to construct, as many examples turn out accidentally sound.

This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own.

Differential Revision: https://reviews.llvm.org/D131264
2022-08-05 12:13:08 -07:00
Craig Topper 12a1ca9c42 [RISCV] Relax another one use restriction in performSRACombine.
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
it's possible that the add is used by multiple sras. We should
allow the combine if all the SRAs will eventually be updated.

After transforming all of the sras, the shls will share a single
(sext_inreg (add X, C1), i32).

This pattern occurs if an sra with 32 is used as index in multiple
GEPs with different scales. The shl from the GEPs will be combined
with the sra before we get a chance to match the sra pattern.
2022-08-04 14:32:31 -07:00
Craig Topper a2de12c987 [RISCV] Relax a one use restriction performSRACombine
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C),
ignore the use count on the (shl X, 32).

The sext_inreg after the transform is free. So we're only making
2 new instructions, the add and the shl. So we only need to be
concerned with replacing the original sra+add. The original shl
can have other uses. This helps if there are multiple different
constants being added to the same shl.
2022-08-04 11:25:08 -07:00
jacquesguan b61cfc91ea [RISCV] Add cost modelling for vector widening reduction.
In RVV, we use vwredsum.vs and vwredsumu.vs for vecreduce.add(ext(Ty A)) if the result type's width is twice the input vector's SEW width. In this situation, the cost of an extended add reduction should be the same as a single-width add reduction. The same goes for vector float widening reductions.

Differential Revision: https://reviews.llvm.org/D129994
2022-08-04 15:31:31 +08:00
Craig Topper 53d560b22f [RISCV] Prevent infinite loop after D129980.
D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into
(seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it
will be turned back into an 'and' by SimplifyDemandedBits which
can cause an infinite loop.

To prevent this, check if bit 31 is 0 with computeKnownBits before
doing the transformation.

Fixes PR56905.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131113
2022-08-03 15:19:07 -07:00
David Truby 9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Alex Bradbury 28f12a09ae [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics
An unnecessary sext.w is generated when masking the result of the
riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the
intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed.

Although this isn't a particularly important optimisation, removing the
sext.w simplifies implementation of an additional cmpxchg-related
optimisation in D130192.

Although I can't produce a test with different codegen for the other
atomics intrinsics, these are added as well for completeness.

Differential Revision: https://reviews.llvm.org/D130191
2022-08-03 13:41:58 +01:00
Fraser Cormack 646e2f4803 [VP] Rename VP int<->float conversion ISD opcodes
These should be named like the non-VP versions for consistency.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130967
2022-08-03 10:04:38 +01:00
Craig Topper f19497f7b0 [RISCV] Use InstVisitor in RISCVCodeGenPrepare. NFC
Makes it easy to add new instructions to look at without dispatching
manually.
2022-08-02 21:19:30 -07:00
Craig Topper a5605f1f68 [RISCV] Fix operand number in debug message in RISCVMergeBaseOffset.
This used to print from the ADDI where the operand number was
correct. It recently changed to print from the LUI or AUIPC which
needs to use operand 1 instead of 2.

This shows up as a crash with -debug.
2022-08-02 15:27:23 -07:00
Craig Topper ae6877836e [RISCV] Add scheduler classes to PseudoVMV*R_V.
I think these pseudos will exist when the post-RA scheduler runs
so they should have sched classes.

Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D130945
2022-08-02 09:38:32 -07:00
Craig Topper 2e5c516a3d [RISCV] Add scheduler class to PseudoReadVLENB.
Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D130938
2022-08-02 09:38:32 -07:00
Alex Bradbury 5ad59c9e59 [RISCV][NFCI] Set TransientStackAlignment and rely on it rather than RVV-specific logic on RVV-less functions
* TargetFrameLowering has a TransientStackAlignment field that "returns
  the number of bytes to which the stack pointer must be aligned at all
  times, even between calls."
  * As explained in the [RISC-V calling
    convention](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc),
    the stack pointer must remain fully aligned throughout execution for
    compliant code. This is important for embedded targets that might avoid
    realigning the stack pointer for interrupt service routines. Systems
    running full OSes may always realign the stack anyway.
* TransientStackAlignment is used in estimateStackSize in
  MachineFrameInfo and in PEI::calculateFrameObjectOffsets.
  * estimateStackSize is only used in the RISC-V backend for scavenging
    slots. It may be possible to craft a function where the difference
    is observable, but it wouldn't be a meaningful test.
  * calculateFrameObjectOffsets makes use of TransientStackAlignment,
    but then sets the stack alignment to the max of that alignment and
    MaxAlign, which is unconditionally set to 16 in
    RISCVFrameLowering::processFunctionBeforeFrameFinalized
  * I've changed this logic to only set MaxAlign if there are RVV frame
    objects. There should be no functional change here for either RVV
    targets (MaxAlign is set as before) or non-RVV targets
    (TransientStackAlign is now 16 anyway).

Differential Revision: https://reviews.llvm.org/D130068
2022-08-02 09:46:06 +01:00
wanglian e208bab55f [RISCV][NFC] Use a defined variable instead of repeating code.
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130687
2022-08-02 16:26:33 +08:00
Craig Topper da5b1bf5bb [RISCV] Teach RISCVMergeBaseOffset to merge %lo/%pcrel_lo into load/store after folding arithmetic.
It's possible we have:
lui  a0, %hi(sym)
addi a0, a0, %lo(sym)
addi a0, a0, <offset1>
lw   a0, <offset2>(a0)

We want to arrive at

lui a0, %hi(sym+offset1+offset2)
lw  a0, %lo(sym+offset1+offset2)(a0)

We currently fail to do this because we only consider loads/stores
if we didn't find any arithmetic.

This patch splits arithmetic folding and load/store folding into
two separate phases. The load/store folding can no longer assume
the offset in hi/lo is 0 so we must combine the offsets. I've applied
the same simm32 limit that we applied in the arithmetic folding.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D130931
2022-08-01 15:33:21 -07:00
Craig Topper e07a8155f5 [RISCV] Move Pre-RA pseudo expansion from addMachineSSAOptimization to addPreRegAlloc.
addMachineSSAOptimization is skipped for -O0, but this pass is
required for -O0.
2022-08-01 13:44:43 -07:00
Craig Topper 450edb0b37 [RISCV] Explicitly select second operand of branch condition to X0.
At least based on the lit tests, the coalescer sometimes fails to
propagate the copy from X0 into the branch instruction. This patch
does it manually during isel. The majority of the changes are from
the select patterns.

Some of the changes are just register allocation changes. Only
the Select change affects whether a b*z instruction is generated
in the tests. I changed the branch pattern for consistency.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130809
2022-08-01 11:16:48 -07:00
Craig Topper ad8db972b0 [RISCV] Eagerly delete instructions in MergeBaseOffset.
The only iterator we're holding points to HiLUI and we never
delete that so I think it is safe to delete everything else
immediately.

I want to split detectAndFoldOffset into two phases. First, combine
LUI+ADDI with any ADD/ADDI/SHXADD that comes after it. This may
open opportunities to fold the ADDI from the LUI+ADDI into a
load/store address. So the load/store folding should run as a
second phase even if the ADD/ADDI/SHXADD made changes.

In order to do this we need to eagerly delete instructions in the
first phase so that we don't have dead users of the LUI+ADDI
when we start the second phase.

Patches to split the phases will come later.

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D130119
2022-08-01 09:32:46 -07:00
Lorenzo Albano 71b7c03fd6 [RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121113
2022-08-01 09:23:45 -07:00
Luís Marques 0bc177b6f5 [RISCV] Extend the Merge Base Offset pass to handle AUIPC+ADDI
Builds upon D123264, adding support for merging the low part of the LLA
address into the load/store instruction offsets.

Differential Revision: https://reviews.llvm.org/D123265
2022-08-01 11:30:02 +02:00
Luís Marques 260a641068 [RISCV] Pre-RA expand pseudos pass
Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up
patches to fold the addi of PseudoLLA instructions into the immediate
operand of load/store instructions.

Differential Revision: https://reviews.llvm.org/D123264
2022-07-31 23:19:00 +02:00
Craig Topper d21b315360 [RISCV] Remove vmerges from vector ceil, floor, trunc lowering.
Use masked operations to suppress spurious exception bits being
set in fflags. Unfortunately, doing this adds extra copies.
2022-07-30 10:58:41 -07:00
Craig Topper a23f07fb1d [RISCV] Add merge operands to more RISCVISD::*_VL opcodes.
This adds a merge operand to all of the binary _VL nodes, including
integer and widening ones. They all share multiclasses in tablegen
so doing them all at once was easiest.

I plan to use FADD_VL in an upcoming patch. The rest are just for
consistency to keep tablegen working.

This does reduce the isel table size by about 25k so that's nice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130816
2022-07-30 10:26:38 -07:00
Craig Topper 9bf305fe2b [RISCV] Swap the merge and mask operand order for VRGATHER*_VL and FCOPYSIGN_VL nodes.
Based on review feedback from D130816.
2022-07-30 09:57:05 -07:00
Craig Topper e637feee80 [RISCV] Add isel pattern for (setne/eq GPR, -2048)
For constants in the range [-2047, 2048] we use addi. If the constant
is -2048 we can use xori. If we don't match this explicitly, we'll
emit an LI for the -2048 followed by an XOR.
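
For example (hand-written sketch):

xori a0, a0, -2048  # result is zero iff a0 == -2048
seqz a0, a0

instead of

li   a1, -2048
xor  a0, a0, a1
seqz a0, a0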
2022-07-29 14:07:38 -07:00
Craig Topper 2750873dfe [RISCV] Update lowerFROUND to use masked instructions.
This avoids a vmerge at the end and avoids spurious fflags updates.
This isn't used for constrained intrinsics so we technically don't have
to worry about fflags, but it doesn't cost much to support it.

To support this, I've extended our FCOPYSIGN_VL node to take a passthru
operand, similar to what was done for the VRGATHER*_VL nodes.

I plan to do a similar update for trunc, floor, and ceil.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D130659
2022-07-28 10:05:19 -07:00
Craig Topper 89173dee71 [RISCV] Remove duplicate code. NFC
The same operations are part of `FloatingPointVecReduceOps` a little
bit earlier.
2022-07-28 10:05:19 -07:00
Craig Topper a304d70ee9 [RISCV] Reorder (and/or/xor (shl X, C1), C2) if we can form ANDI/ORI/XORI.
InstCombine and DAGCombine prefer to keep shl before binops.

This patch teaches isel to convert to (shl (and/or/xor X, C1 >> C2), C2)
if (C1 >> C2) is a simm12. The idea was taken from X86's isel code.
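
A hand-written illustration with hypothetical constants, for (and (shl X, 4), 0x3000):

slli a0, a0, 4
lui  a1, 3          # materialize 0x3000, which is not a simm12
and  a0, a0, a1

becomes

andi a0, a0, 768    # 0x3000 >> 4 = 0x300, a simm12
slli a0, a0, 4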

There's a special case implemented for a sext_inreg between the
shift and the binop.

Differential Revision: https://reviews.llvm.org/D130610
2022-07-27 17:35:26 -07:00
Craig Topper 1d1d8d6025 [RISCV] Reorder code in lowerFROUND to make the diff in D130659 cleaner. NFC 2022-07-27 17:13:04 -07:00
Craig Topper 98647330bf [RISCV] Add merge operand to RISCVISD::FCOPYSIGN_VL.
Similar to what was done for VRGATHER*_VL recently.

This will be used in D130659.
2022-07-27 15:25:34 -07:00
Philip Reames 15c645f7ee [RISCV] Enable (scalable) vectorization by default
This change enables vectorization (using scalable vectorization only, fixed vectors are not yet enabled) for RISCV when vector instructions are available for the target configuration.

At this point, the resulting configuration should be both stable (e.g. no crashes), and profitable (i.e. few cases where scalar loops beat vector ones), but is not going to be particularly well tuned (i.e. we don't always emit the best possible vector loop). The goal of this change is to align testing across organizations and ensure the default configuration matches what downstreams are using as closely as possible.

This exposes a large amount of code which hasn't otherwise been on by default, and thus may not have been fully exercised.  Given that, having issues fall out is not unexpected.  If you find issues, please make sure to include as much information as you can when reverting this change.

Differential Revision: https://reviews.llvm.org/D129013
2022-07-27 12:36:04 -07:00
Craig Topper 32622d6de4 [RISCV] Add isel pattern for (mul (and X, 0xffffffff), 3<<C) with Zba.
We can use slli.uw by C followed by sh1add. The same can be done
for multiples of 5 and 9. We need to make sure that C is less than
32 to stay in bounds of the 5-bit immediate for slli.uw.

We have existing patterns for (mul X, 3<<C) that use sh1add
followed by slli. That order doesn't allow the and to be folded.
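
A hand-written sketch for multiplying (and X, 0xffffffff) by 6 (i.e. 3<<1):

slli.uw a0, a0, 1    # zext32(X) << 1; the and folds into slli.uw
sh1add  a0, a0, a0   # a0*2 + a0 = 3 * (zext32(X) << 1) = 6 * zext32(X)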

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130146
2022-07-27 09:41:59 -07:00
Craig Topper 9b27d13204 [RISCV] Disable constant hoisting for multiply by negated power of 2.
A mul by a negated power of 2 is a slli followed by neg. This doesn't
require any constant materialization and may be lower latency than mul.
The neg may also be foldable into other arithmetic.
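
For example, multiplying by -8 (hand-written sketch):

slli a0, a0, 3    # x * 8
neg  a0, a0       # x * -8; no constant materialized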

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130047
2022-07-27 09:37:59 -07:00
LiaoChunyu bf4f9a468a [RISCV] Enable isIntDivCheap when the attribute is minsize
Don't expand divisions by constants when the attribute is minsize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130543
2022-07-27 18:22:51 +08:00
Craig Topper 3a2d7d8ad5 [RISCV] Add Predicate to c.lw/c.sw/c.lwsp/c.swsp InstAliases with no offset.
These are aliases that allow the immediate offset to be omitted.
We had predicates for the RV64, RV32+F, and D versions, but
not the base versions.

I've also re-ordered them to share Predicate lines to improve
readability.
2022-07-26 11:06:00 -07:00
Kazu Hirata 3f3930a451 Remove redundant virtual specifiers (NFC)
Identified with tidy-modernize-use-override.
2022-07-25 23:00:59 -07:00
Craig Topper 45944e7cf4 [RISCV] Refactor translateSetCCForBranch to prepare for D130508. NFC.
D130508 handles more constants than just 1 or -1. We need to extract
the constant instead of relying on isOneConstant or isAllOnesConstant.
2022-07-25 15:54:54 -07:00
Craig Topper 1db6d6dcd8 [RISCV] Teach RISCVCodeGenPrepare to optimize (zext (abs(i32 X, i1 1))).
abs(i32 X, i1 1) always produces a positive result. The 'i1 1'
means an INT_MIN input produces poison. If the result is sign extended,
InstCombine will convert it to zext. This does not produce ideal
code for RISCV.

This patch reverses the zext back to sext which can be folded
into a subw or negw. Ideally we'd do this in SelectionDAG, but
we lose the INT_MIN poison flag when llvm.abs becomes ISD::ABS.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130412
2022-07-25 09:36:41 -07:00
jacquesguan d8800ead62 [RISCV] Scalarize binop followed by extractelement.
This patch adds shouldScalarizeBinop to the RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation.

Differential Revision: https://reviews.llvm.org/D129545
2022-07-25 17:23:31 +08:00
Kazu Hirata b5188591a0 [llvm] Remove redundant virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Craig Topper 9adc00a9d0 [RISCV] Add a continue to reduce nesting. NFC 2022-07-23 17:36:12 -07:00
Kazu Hirata 1cc7f5bede Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2022-07-23 09:22:27 -07:00
Craig Topper ab2348a6fa [RISCV] Add sext.b/h and zext.b/h/w to RISCVInstrInfo::foldMemoryOperandImpl.
We can always fold zext.b since it is just andi. The others require
Zba/Zbb.
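
A hand-written illustration of the fold (hypothetical stack offset):

ld     a0, 16(sp)   # reload of the spilled value
sext.b a0, a0       # Zbb sign-extension of the low byte

folds into

lb     a0, 16(sp)   # sign-extending byte load from the low byte of the slot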

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130302
2022-07-21 14:54:58 -07:00
Craig Topper add17fc8e4 [RISCV] Combine (select_cc (srl (and X, 1<<C), C), 0, eq/ne, true, false)
(srl (and X, 1<<C), C) is the form we receive for testing bit C.
An earlier combine removed the setcc so it wasn't there to match
when we created the SELECT_CC. This doesn't happen for BR_CC because
generic DAG combine rebuilds the setcc if it is used by BRCOND.

We can shift X left by XLen-1-C to put the bit to be tested in the
MSB, and use a signed compare with 0 to test the MSB.
2022-07-20 22:32:11 -07:00
Craig Topper 7dda6c71b1 [RISCV] Refactor the common combines for SELECT_CC and BR_CC into a helper function.
The only difference between the combines were the calls to getNode
that include the true/false values for SELECT_CC or the chain
and branch target for BR_CC.

Wrap the rest of the code into a helper that reads LHS, RHS, and
CC and outputs new values and a bool if a new node needs to be
created.
2022-07-20 21:18:07 -07:00
Craig Topper 8983db15a3 [RISCV] Optimize (brcond (seteq (and X, 1 << C), 0))
If C > 10, this will require a constant to be materialized for the
And. To avoid this, we can shift X left by XLen-1-C bits to put the
tested bit in the MSB, then we can do a signed compare with 0 to
determine if the MSB is 0 or 1. Thanks to @reames for the suggestion.
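
For example, branching when bit 40 of a0 is clear (hand-written sketch; 1<<40 cannot be encoded in an andi):

slli a0, a0, 23     # 23 = XLEN-1-C = 63-40; bit 40 becomes the MSB
bgez a0, .target    # MSB clear means the tested bit was 0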

I've implemented this inside of translateSetCCForBranch which is
called when setcc+brcond or setcc+select is converted to br_cc or
select_cc during lowering. It doesn't make sense to do this for
general setcc since we lack a sgez instruction.

I've tested bits 10, 11, 31, 32, and 63, and a couple of bits between 11 and 31
and between 32 and 63, for both i32 and i64 where applicable. Select
has some deficiencies where we receive (and (srl X, C), 1) instead.
This doesn't happen for br_cc due to the call to rebuildSetCC in the
generic DAGCombiner for brcond. I'll explore improving select in a
future patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130203
2022-07-20 18:40:49 -07:00
Craig Topper 31b8939ded [RISCV] Recognize bexti from (srl (and X, 1<<C), C).
This is the form we get for (zext (setne (and X, 1<<C), 0)). We only
had bexti patterns for the alternative form (and (srl X, C), 1).
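
For illustration (hand-written, C=5):

andi a0, a0, 32     # (and X, 1<<5)
srli a0, a0, 5      # (srl ..., 5)

now selects with Zbs to

bexti a0, a0, 5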
2022-07-20 15:03:52 -07:00
ksyx 3198364e6e [RISCV][Clang] Add support for Zmmul extension
This patch implements the recently ratified extension Zmmul, a subextension
of M (Integer Multiplication and Division) consisting of only the
multiplication part of it.

Differential Revision: https://reviews.llvm.org/D103313
Reviewed By: craig.topper, jrtc27, asb
2022-07-18 20:26:08 -04:00
Craig Topper 0b02752899 [RISCV] Optimize (seteq (i64 (and X, 0xffffffff)), C1)
(and X, 0xffffffff) requires 2 shifts in the base ISA. Since we
know the result is being used by a compare, we can use a sext_inreg
instead of an AND if we also modify C1 to have 33 sign bits instead
of 32 leading zeros. This can also improve the generated code for
materializing C1.
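
Roughly (hand-written sketch, not output from the patch):

slli a0, a0, 32     # (and X, 0xffffffff) in the base ISA:
srli a0, a0, 32     # two shifts, then compare against C1

becomes

sext.w a0, a0       # compare against C1 with bit 31 copied into bits 63:32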

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D129980
2022-07-18 10:54:45 -07:00
Craig Topper 7c0b9b379b [RISCV] Add isel patterns for ineg+setge/le/uge/ule.
setge/le/uge/ule selected by themselves require an xori with 1.
If we're negating the setcc, we can fold the xori with the neg
to create an addi with -1.

This works because xori X, 1 is equivalent to 1 - X if X is either
0 or 1. So we're doing -(1 - X), which is X - 1, i.e. X + -1.
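
A hand-written sketch for -(setge a0, a1):

slt  a0, a0, a1
xori a0, a0, 1      # setge = (slt ^ 1)
neg  a0, a0

becomes

slt  a0, a0, a1
addi a0, a0, -1     # -(1 - slt) = slt - 1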

This improves the code for selecting between 0 and -1 based on a
condition for some conditions.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129957
2022-07-18 09:55:01 -07:00
Craig Topper d7f2a63371 [RISCV] Fold stack reload into sext.w by using lw instead of ld.
We can use lw to load 4 bytes from the stack and sign extend them
instead of loading all 8 bytes.
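
For illustration (hypothetical stack offset; little-endian layout):

ld     a0, 8(sp)
sext.w a0, a0

becomes

lw     a0, 8(sp)    # loads and sign-extends just the low 4 bytes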

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129948
2022-07-18 09:09:17 -07:00
Simon Pilgrim 259c36e7c1 [DAG] Add asserts to isDesirableToCommuteWithShift overrides to ensure its being called from a shift. NFC. 2022-07-18 13:11:24 +01:00
jacquesguan 2b11174079 [RISCV][NFC] Use more ArrayRef in TargetLowering functions.
This patch replaces some foreach loops with ArrayRef, and hoists some repeated literal arrays into variables.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125656
2022-07-18 03:33:45 +00:00
jacquesguan bd228a1772 [RISCV] Extend use of SHXADD instructions in RVV spill/reload code.
This patch extends D124824. It uses SHXADD+SLLI to emit 3, 5, or 9 multiplied by a power of 2.
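
A hand-written sketch for scaling vlenb by 24 (= 3 * 2^3):

csrr   a0, vlenb
sh1add a0, a0, a0   # vlenb * 3
slli   a0, a0, 3    # vlenb * 24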

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D129179
2022-07-18 10:53:19 +08:00
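
A hedged sketch of the decomposition (illustrative factors, not the
in-tree code): SHXADD computes (X << s) + X for s in {1,2,3}, i.e.
multiplies by 3, 5, or 9, and a trailing SLLI supplies the power-of-two
factor, avoiding a constant materialization plus MUL:

    #include <cassert>
    #include <cstdint>

    uint64_t mulViaShxaddSlli(uint64_t X, unsigned ShAmt, unsigned SlliAmt) {
      uint64_t T = (X << ShAmt) + X; // sh1add/sh2add/sh3add: *3, *5, *9
      return T << SlliAmt;           // slli: * 2^SlliAmt
    }

    int main() {
      assert(mulViaShxaddSlli(7, 1, 4) == 7 * 48); // 48 = 3 * 16
      assert(mulViaShxaddSlli(7, 2, 3) == 7 * 40); // 40 = 5 * 8
      assert(mulViaShxaddSlli(7, 3, 3) == 7 * 72); // 72 = 9 * 8
    }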
Fangrui Song d955497112 [RISCV] Simplify lowerGlobalAddress. NFC 2022-07-17 15:42:45 -07:00
Craig Topper decf385c27 [RISCV] Teach targetShrinkDemandedConstant to handle OR and XOR.
We were only handling AND before, but SimplifyDemandedBits can
also call it for OR and XOR.
2022-07-17 12:36:33 -07:00
Craig Topper 8cc483099a [RISCV] Teach RISCVCodeGenPrepare to optimize (i64 (and (zext/sext (i32 X), C1)))
If X is known positive by a dominating condition, we can fill in
ones into the upper bits of C1 if that would allow it to become a
simm12, allowing the use of ANDI.

This pattern often occurs in unrolled loops where the induction
variable has been widened.

To get the best benefit from this, I had to move the pass above
ConstantHoisting which is in addIRPasses. Otherwise the AND constant
is often hoisted away from the AND.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129888
2022-07-17 11:00:56 -07:00
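
A hedged sketch of the underlying identity (illustrative constants):
once X is known non-negative, the upper bits of its extension are zero,
so mask bits above bit 30 can be flipped to ones until the constant
fits simm12:

    #include <cassert>
    #include <cstdint>

    int main() {
      int32_t X = 0x12345678;                // positive by a dominating check
      uint64_t wide = (uint64_t)(uint32_t)X; // zext (== sext here) to i64
      uint64_t origMask   = 0x00000000FFFFFF00ULL; // must be materialized
      uint64_t simm12Mask = 0xFFFFFFFFFFFFFF00ULL; // -256: andi wide, -256
      assert((wide & origMask) == (wide & simm12Mask));
    }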
Craig Topper 73f766ca9a [RISCV] Remove unnecessary use of IRBuilder from RISCVCodeGenPrepare.
We're creating a single instruction to replace another instruction.
We can insert it using the InsertBefore operand of the constructor,
then copy the debug location.
2022-07-17 10:59:54 -07:00
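
A hedged sketch of the idiom (assumes LLVM headers; names are
illustrative, not the actual pass code):

    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // Create the replacement directly at the right position via the
    // InsertBefore constructor operand, then carry over the debug
    // location by hand; no IRBuilder needed for a single instruction.
    static Value *replaceZExtWithSExt(ZExtInst *ZExt) {
      auto *SExt = new SExtInst(ZExt->getOperand(0), ZExt->getType(),
                                "", /*InsertBefore=*/ZExt);
      SExt->setDebugLoc(ZExt->getDebugLoc());
      return SExt;
    }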
Craig Topper ee6267c443 [RISCV] Remove Gather/Scatter Opt from the O0 pipeline. 2022-07-17 10:58:33 -07:00
Lian Wang dca821d80a [RISCV] Add cost model for vector.reverse mask operation
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128784
2022-07-15 06:58:57 +00:00
Craig Topper 79016f6eef [RISCV] Refine the heuristics for our custom (mul (and X, C2), C1) isel.
Prefer to use SLLI instead of zext.w/zext.h in more cases. SLLI
might be better for compression.
2022-07-14 18:24:10 -07:00
Craig Topper bc0d656558 [RISCV] Fix mistake in RISCVTTIImpl::getIntImmCostInst.
zext.w requires Zba, not Zbb. The test was also wrong, but had the
correct comment.
2022-07-14 16:42:35 -07:00
Craig Topper 9913ea490a [RISCV] Make TuneSiFive7 depend on TuneNoDefaultUnroll instead of listing it for every SiFive7 CPU 2022-07-14 15:57:30 -07:00
Craig Topper 1a8468ba61 [RISCV] Add a RISCV specific CodeGenPrepare pass.
The initial optimization is to convert (i64 (zext (i32 X))) to
(i64 (sext (i32 X))) if a dominating condition for the basic block
guarantees that the sign bit of X is zero.

This frequently occurs in loop preheaders where a signed induction
variable that can never be negative has been widened. There will be
a dominating check that the 32-bit trip count isn't negative or zero.
The check here is not restricted to that specific case though.

An i32->i64 sext is cheaper than zext on RV64 without the Zba
extension. Later optimizations can often remove the sext from the
preheader basic block because the dominating block also needs a sext to
evaluate the greater than 0 check.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129732
2022-07-14 10:20:59 -07:00
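
The fact the pass relies on, as a hedged standalone check (illustrative
values):

    #include <cassert>
    #include <cstdint>

    int main() {
      for (int32_t X : {0, 1, 0x7FFFFFFF}) {
        assert(X >= 0);                 // what the dominating check proves
        uint64_t viaZext = (uint32_t)X; // i64 (zext (i32 X))
        uint64_t viaSext = (int64_t)X;  // i64 (sext (i32 X)): one sext.w
        assert(viaZext == viaSext);     // equal whenever the sign bit is 0
      }
    }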
Fraser Cormack d1a5669f5e [RISCV] Disable subregister liveness by default
We previously enabled subregister liveness by default when compiling
with RVV. This has been shown to cause miscompilations where RVV
register operand constraints are not met. A test was added for this in
D129639 which explains the issue in more detail.

Until this issue is fixed in some way, we should not be enabling
subregister liveness unless the user asks for it.

Reviewed By: craig.topper, rogfer01, kito-cheng

Differential Revision: https://reviews.llvm.org/D129646
2022-07-14 17:04:10 +01:00
David Green 3e0bf1c7a9 [CodeGen] Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly, which most of the llvm test suite uses.

This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.

The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.

Recommitted with some fixes for the leftover MCII variables in release
builds.

Differential Revision: https://reviews.llvm.org/D129506
2022-07-14 09:33:28 +01:00
Craig Topper 257755530a [RISCV] Fold (sra (sext_inreg (shl X, C1), i32), C2) -> (sra (shl X, C1+32), C2+32).
The former pattern will select as slliw+sraiw while the latter
will select as slli+srai. This can enable the slli+srai to be
compressed.

Differential Revision: https://reviews.llvm.org/D129688
2022-07-13 14:34:17 -07:00
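
A hedged sketch verifying the fold over illustrative shift amounts
(relies on arithmetic right shift of signed values, as on every
mainstream compiler):

    #include <cassert>
    #include <cstdint>

    int main() {
      uint64_t X = 0xDEADBEEFCAFEF00DULL;
      for (unsigned C1 = 0; C1 < 32; ++C1)
        for (unsigned C2 = 0; C2 < 32; ++C2) {
          // (sra (sext_inreg (shl X, C1), i32), C2): slliw + sraiw
          int64_t former = ((int64_t)(int32_t)(X << C1)) >> C2;
          // (sra (shl X, C1+32), C2+32): slli + srai (compressible)
          int64_t latter = ((int64_t)(X << (C1 + 32))) >> (C2 + 32);
          assert(former == latter);
        }
    }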
Philip Reames dde2a7fb6d [RISCV] Exploit fact that vscale is always power of two to replace urem sequence
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.

vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)

We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, there must be a power-of-two number of blocks. (This holds for everything other than VLEN<=32, but that configuration is already broken.)

It is worth noting that the AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers, and thus AArch64 can't claim that vscale is a power of two by this logic.

Differential Revision: https://reviews.llvm.org/D129609
2022-07-13 10:54:47 -07:00
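
A hedged sketch of the payoff (illustrative element count): once the
vectorizer knows vscale is a power of two, the trip-count urem collapses
to a single AND:

    #include <cassert>
    #include <cstdint>

    int main() {
      uint64_t n = 1234567;                     // scalar trip count
      for (uint64_t vscale : {1ULL, 2ULL, 4ULL, 8ULL}) { // power of two
        uint64_t count = vscale << 2;           // e.g. <vscale x 4 x i32>
        assert(n % count == (n & (count - 1))); // urem -> and
      }
    }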
David Green 95252133e1 Revert "Move instruction predicate verification to emitInstruction"
This reverts commit e2fb8c0f4b as it does
not build for Release builds, and some buildbots are giving more warnings
than I saw locally. Reverting to fix those issues.
2022-07-13 13:28:11 +01:00
David Green e2fb8c0f4b Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly, which most of the llvm test suite uses.

This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.

The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.

Differential Revision: https://reviews.llvm.org/D129506
2022-07-13 12:53:32 +01:00
Fraser Cormack b336cf856e [RISCV] Add early-exit to RVV stack computation. NFCI.
This patch was split off from D126465, where an early exit is necessary
because the code checks the VLEN, and querying the VLEN asserts that V
instructions are present.

Since this makes logical sense on its own, I think it's worth landing
regardless of D126465.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D129617
2022-07-13 08:50:08 +01:00
Monk Chiang 2b045324b2 [RISCV] Add scheduling resources for vector segment instructions.
Add scheduling resources for vector segment instructions

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128886
2022-07-12 22:51:58 -07:00
Craig Topper c5be6a8308 [RISCV] Use X0 in place of VLMaxSentinel in lowering.
I thought I had already fixed all of these, but I guess I missed one.
2022-07-11 23:29:04 -07:00
Craig Topper c3c17b1695 [RISCV] Use MVT for the argument to getMaskTypeFor. NFC
Only one caller didn't already have an MVT and that was easy to
fix. Since the return type is MVT and it uses MVT::getVectorVT,
taking an MVT as input makes the most sense.
2022-07-11 15:14:44 -07:00
Craig Topper 759e5e0096 [RISCV] Remove doPeepholeLoadStoreADDI.
All of the cases should be handled by SelectAddrRegImm now.

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D129451
2022-07-11 10:44:33 -07:00
Craig Topper 907d923a20 [RISCV] Move the custom isel for (add X, imm) into SelectAddrRegImm.
This custom isel was used to split the lo12 bits of the imm so that
they could be folded into load/store addresses via a post-isel
peephole.

This patch instead splits the immediate during isel and folds the
lo12, removing the need for the post-isel peephole to do anything.

After this we'll be able to remove the post-isel peephole.

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D129450
2022-07-11 10:44:33 -07:00
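
A hedged sketch of the split (illustrative immediates, not the actual
SelectAddrRegImm code): peel off the sign-extended low 12 bits so they
can fold into the load/store's simm12 offset field, leaving a remainder
that is cheaper to materialize:

    #include <cassert>
    #include <cstdint>

    int main() {
      for (int64_t Imm : {0x12345LL, 0x12B45LL}) {
        int64_t Lo12 = Imm & 0xFFF;
        if (Lo12 >= 2048)
          Lo12 -= 4096;              // sign-extend to the simm12 range
        int64_t Hi = Imm - Lo12;     // materialized separately (e.g. LUI)
        assert(Hi + Lo12 == Imm && Lo12 >= -2048 && Lo12 < 2048);
        assert((Hi & 0xFFF) == 0);   // low bits folded into the address
      }
    }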
Craig Topper 1a2bd44b77 [RISCV] Make shouldConvertConstantLoadToIntImm return true unless enableUnalignedScalarMem is true.
This restores the old behavior before D129402 when
enableUnalignedScalarMem is false. This fixes a regression spotted
by @asb.

To fix this correctly, we need to consider the alignment of the load
we'd be replacing, but that's not possible in the current interface.
2022-07-11 09:40:08 -07:00
David Sherwood 03fee6712a [LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extracting the first bit of this mask and using it for the
conditional branch.

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
2022-07-11 13:46:55 +01:00
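
A hedged scalar model of the new control flow (illustrative numbers,
not the vectorizer's code): compute the mask for the next iteration and
branch on its first bit, rather than comparing the induction variable
with the trip count:

    #include <cassert>
    #include <cstdint>

    int main() {
      const uint64_t TripCount = 1003, VF = 8;
      uint64_t processed = 0, base = 0;
      bool mask0 = base < TripCount;     // entry mask, lane 0
      while (mask0) {
        for (uint64_t lane = 0; lane < VF; ++lane)
          if (base + lane < TripCount)   // predicated vector body
            ++processed;
        base += VF;
        mask0 = base < TripCount;        // first bit of the next mask
      }
      assert(processed == TripCount);
    }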