This patch adds cost model for vector compare and select instructions. For vector FP compare instruction, it only add the comparisions supported natively.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132296
The original code may have incorrect result if there is a masked instruction
without policy operand to make us set its policy to TUMU. The patch adds an
assertion to catch the instruction.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133302
ALU seems a little vague. FAdd felt more precise even though it
also include FSUB instructions.
Reviewed By: monkchiang
Differential Revision: https://reviews.llvm.org/D133632
The default is to use extload which can become a zextload or
sextload if it is followed by an 'and' or sext_inreg.
Sometimes type legalization will introduce an 'and' from promoting
something like 'srl X, C' and a sext_inreg from from a setcc. The
'and' could be freely folded with the promoted 'srl' by using srliw,
but the sext_inreg can't be folded into a compare. DAG combiner
will see both of these choices and may decide to fold the 'and'
instead of the 'sext_inreg'. This forces the sext_inreg to become
a sext.w.
By picking sextload in the type legalizer we take this choice away.
Looking at spec2006 compiled with Zba and Zbb this appeared to be
net reduction in lines of code in the objdump disassembly output.
This is similar to what we do with i32 add/sub/mul/shl in
type legalization where we always emit a sext_inreg.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D130397
Split out from D129178, this just adds the GlobalMerge tests (other than global-merge-minsize.ll which is testing a specific configuration of the pass when it's enabled) and exposes `-riscv-enable-global-merge` and //doesn't enable it by default//.
Note that the comment "// FIXME: Unify control over GlobalMerge." is copied from the Arm and AArch64 backends, which expose the same flag. Presumably the author is imagining some later refactoring that provides a target-independent flag.
Reviewed By: craig.topper, reames, hiraditya
Differential Revision: https://reviews.llvm.org/D130481
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead.
Differential Revision: https://reviews.llvm.org/D133429
This is a minimalist implementation which simply adds the extension (in the experimental namespace since its not ratified), and wires up the setting of the required ELF header flag. Future changes will include codegen changes to exploit the stronger memory model.
This is intended to implement v0.1 of the proposed specification which can be found in Chapter 25 of https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-spec.pdf.
Differential Revision: https://reviews.llvm.org/D133239
This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the
VFCVT.
Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from
RVV intrinsics as I could.
A followup patch will use this approach for FROUND too.
Still need to fix the cost model.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D133238
When ISD::SETUGT && Imm == -1, has processed before lowering. Use assert replace it
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132373
X86 and ARM AsmParsers have this same assertion. This assertion provides better reporting when the RISCVTargetStreamer is null and helps to prevent null pointer access.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D132863
We can use srliw to shift out the trailing bits and slli to shift
back in zeros. The sign extend of srliw will 0 the upper 32 bits
since we will be shifting a 0 into bit 31.
Don't require the AND has one use and don't depend on
targetShrinkDemandedConstant turning C2 into 0xffffffff. Instead,
check that the constant is 0xffffffff after replacing any bits
that will be shifted out with 1s.
Another way to fix this might be to prevent SimplifyDemandedBits
from destroying the ANDI after type legalization using
targetShrinkDemandedBits. That would prevent the CSE that created
this mess. targetShrinkDemandedBits is currently only enable after
legalize ops. Quick experiment shows we can't just change when it
runs, we would need to try a different heuristic for post type
legalization.
SimplifyDemandedBits can 0 the upper bits and targetShrinkDemandedConstant
isn't alway able to recover it.
At least part of that may be because targetShrinkDemandedConstant
only runs in the last DAGCombine. Might be worth seeing what happens
if we move it post type legalization.
This builds on D132771 to invert (setlt 0, X) to (setlt X, 1) and
vice versa.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132798
We can rewrite to (bnez (or/and (setne), Z) is Z is 0/1.
Alternatively, we could canonicalize to (xor (or/and (setne), Z), 1)
even if there is no branch. The xor would not always get removed,
but it might enable other DeMorgan combines. I decided to be
conservative for this first patch and require the xor to be removed.
I have a couple other invertible setccs I will add in a follow up
patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132771
Mostly just modeled after vp.fneg except there is a
"functional instruction" for fneg while fabs is always an
intrinsic.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D132793
The existing code was incorrect if we had more than one conditional
branch instruction in a basic block. Though I don't think that will
occur, using analyzeBranch detects that as an unsupported case.
Overall this results in simpler code in RISCVRedundantCopyElimination.
Reviewed By: reames, kito-cheng
Differential Revision: https://reviews.llvm.org/D132347
The motivation of this patch is to lower the IR pattern
(vp.merge mask, (add x, y), false, vl) to
(PseudoVADD_VV_<LMUL>_MASK false, x, y, mask, vl).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D131841
This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size.
For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware.
The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization.
SLP has been disabled for now, even when fixed vectors are enabled. See a310637 and associated review. There are a few addiitional code quality issues which need worked through before turning SLP on would be reasonable.
Differential Revision: https://reviews.llvm.org/D131508
This change implements a TTI query with the goal of disabling slp vectorization on RISCV. The current default configuration disables SLP already, but its current tied to the ability to lower fixed length vectors. Over in D131508, I want to enable fixed length vectors for purposes of LoopVectorizer, but preliminary analysis has revealed a couple of SLP specific issues we need to resolve before enabling it by default. This change exists to allow us to enable LV without SLP.
Differential Revision: https://reviews.llvm.org/D132680
In patch D121183, target abi is get from .ll file's target-abi
attribute and set in RISCVAsmPrinter::emitFunctionEntryLabel
function. In https://github.com/llvm/llvm-project/issues/57242,
an api mismatch error may be caused by failing to call function
RISCVAsmPrinter::emitFunctionEntryLabel to set target-abi to
correct one when the .ll is empty or a module has no function.
This patch move setting target-abi part to function
RISCVAsmPrinter::emitStartOfAsmFile, make sure all .ll file and
module in LTO read target-abi from module flag and set, with or
without function.
Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com>
Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com>
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D132204
Running on RISCV machine llvm-exegesis I faced with trouble: can't measure C_ADDI16SP, beacuse immediate has type simm10_lsb0000nonzero.
Patch adds support for processing this immediate operand type.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132650
Saves a heap allocation and avoids an explicit call to the BitVector constructor.
Reviewed By: reames, myhsu
Differential Revision: https://reviews.llvm.org/D132674
SimplifyDemandedBits tries to agressively turn xor immediates into -1
to match a 'not' instruction. In this case, because X is a boolean, the
upper bits of (xor X, 1) are known to be 0. Because this is an AND
instruction, that means those bits aren't demanded from the other
operand, and thus SimplifyDemandedBits can turn (xor Y, 1) to (not Y).
We need to detect that this has happened to enable the DeMorgan
optimization. To do this we allow one of the xors to use -1 when
the outer operation is And.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D132671
This optimizes xors that appear due to legalizing setge/setle which
require an xor with 1. This reduces the number of xors and may
allow the xor to fold with a beqz or bnez.
Differential Revision: https://reviews.llvm.org/D132614
We may be requested to emit an unaligned nop sequence (e.g. 7-bytes or
3-bytes). These should be 0-filled even though that is not a valid
instruction. This matches the behaviour on other architectures like
ARM, X86, and MIPS. When a custom section is emitted, it may be
classified as text even though it may be a data section or we may be
emitting data into a text segment (e.g. a literal pool). In such cases,
we should be resilient to the emission request.
This was originally identified by the Linux kernel build and reported on
D131270 by Nathan Chancellor.
Differential Revision: https://reviews.llvm.org/D132482
Reviewed By: luismarques
Tested By: Nathan Chancellor
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.
For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.
Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.
Differential Revision: https://reviews.llvm.org/D132520