Commit Graph

2411 Commits

Author SHA1 Message Date
Philip Reames 32cfafddb1 [RISCV] Verify VL operand on instructions if present
These should only be immediate values or GPR registers.

Differential Revision: https://reviews.llvm.org/D133953
2022-09-15 13:06:52 -07:00
Sergei Barannikov c6acb4eb0f [SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduced amount of boilerplate code a bit.  This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
2022-09-15 14:02:12 -04:00
Craig Topper 5888c157a7 [RISCV] Simplify some code in RISCVInstrInfo::verifyInstruction. NFCI
This code was written as if it lived in the MC layer instead of
the CodeGen layer. We get the MCInstrDesc directly from MachineInstr.
And we can use RISCVSubtarget::is64Bit instead of going to the
Triple.

Differential Revision: https://reviews.llvm.org/D133905
2022-09-14 17:07:21 -07:00
Philip Reames e395915ac0 [RISCV] Verify SEW/VecPolicy immediate values
Copy the asserts from the printing code, and turn them into actual verifier rules. Doing this revealed an existing bug - see 0a14551.

Differential Revision: https://reviews.llvm.org/D133869
2022-09-14 14:45:16 -07:00
Philip Reames 0a145516a2 [RISCV] Fix a silent miscompile in copyPhysReg
Found this when adding verifier rules. The case which arises is that we have a DefMBBI which has a VecPolicy operand. The code was not expecting this, and the unconditional copy of the last two operands resulted in the SEW and VecPolicy fields being added to the VMV_V_V as AVL and SEW respectively.

Oddly, this appears to be a silent in practice. There's no test change despite verifier changes proving that we definitely hit this in existing tests.

Differential Revision: https://reviews.llvm.org/D133868
2022-09-14 14:45:01 -07:00
jacquesguan ecf327f154 [RISCV] Add cost model for vector insert/extract element.
This patch adds cost model for vector insert/extract element instructions. In RVV, we could use vector scalar move instruction to insert or extract the first element, and use vslide to move it. But for mask vector or i64 vector in i32 target, we need special instructions to make it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133007
2022-09-14 11:10:18 +08:00
Yeting Kuo 1b56b2b267 [RISCV] Transform VMERGE_VVM_<LMUL>_TU with all ones mask to VADD_VI_<LMUL>_TU.
The transformation is benefit because vmerge.vvm always needs mask operand but
vadd.vi may not.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D133255
2022-09-14 10:01:37 +08:00
Han-Kuan Chen dd53a0bb30 [RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type.
Differential Revision: https://reviews.llvm.org/D133688
2022-09-13 18:50:20 -07:00
Fangrui Song ab1c259613 [RISCV] Assemble `call foo` to R_RISCV_CALL_PLT
R_RISCV_CALL/R_RISCV_CALL_PLT distinction isn't necessary. R_RISCV_CALL has been
deprecated as a resolution to
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/98 .

ld.lld and mold treat the two relocation types the same. GNU ld has a custom
handling for undefined weak functions which is unnecessary: calling an
unresolved undefined weak function is UB and GNU ld can handle the case without
a relocation error (such a function call is usually guarded by a zero value
check and should be allowed).

This patch assembles `call foo` to use R_RISCV_CALL_PLT instead of the
deprecated R_RISCV_CALL.

Note: the code generator still differentiates `call foo` and (maybe preemptible)
`call foo@plt`, but the difference is purely aesthetic.

Note: D105429 does not support R_RISCV_CALL_PLT correctly. Changed the test to
force R_RISCV_CALL for now.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D132530
2022-09-13 18:47:55 -07:00
Philip Reames 09d73fe8cd [RISCV] Add MIR comments for VecPolicy operands
Analogous to what we already do for SEW operands, aimed at making the resulting MIR readable by a human.
2022-09-13 15:36:33 -07:00
Philip Reames cc45687e1c [RISCV] Simpify operand index calculation in createMIROperandComment [nfc] 2022-09-13 15:06:40 -07:00
Craig Topper 8d7e73effe [RISCV] Teach lowerVECTOR_SHUFFLE to recognize some shuffles as vnsrl.
Unary shuffles such as <0,2,4,6,8,10,12,14> or <1,3,5,7,9,11,13,15>
where half the elements are returned, can be lowered using vnsrl.

SelectionDAGBuilder lowers such shuffles as a build_vector of
extract_elements since the mask has less elements than the source.
To fix this, I've enable the extractSubvectorIsCheapHook to allow
DAGCombine to rebuild the shuffle using 2 extract_subvectors preceding
the shufffle.

I've gone very conservative on extractSubvectorIsCheapHook to minimize
test impact and match what we have test coverage for. This can be
improved in the future.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133736
2022-09-13 11:07:11 -07:00
Alex Bradbury c44c1e9d3e [RISCV] Implement isMaskAndCmp0FoldingBeneficial hook
This hook is currently only used by CodeGenPrepare, which will sink *and
duplicate* an 'and' into a block that has an 'icmp 0' user of it if the
hook returns true.

This hook is less useful for RISC-V than for targets like AArch64 that
have a TBZ (test bit and branch if zero instruction), but may still be
profitable if Zbs is available and a BEXTI can be selected.

Conservatively, we return false even if Zbs is enabled for any masks
that fit in the ANDI immediate because it's possible the only use is a
branch on the result, and ANDI+BNEZ => BEXTI+BNEZ isn't a profitable
transformation.

Differential Revision: https://reviews.llvm.org/D131492
2022-09-13 18:54:00 +01:00
Alex Bradbury 547160848c [RISCV] Return true in hasBitTest when Zbs is enabled and update BEXTI pattern for resulting canonicalisation
As the Zbs extension includes bext[i] for bit extract, we can
unconditionally return true from this hook. This hook causes the DAG
combiner to perform the following canonicalisation:

  and (not (srl X, C)), 1 --> (and X, 1<<C) == 0
  and (srl (not X), C)), 1 --> (and X, 1<<C) == 0

As simply changing the hook causes a codegen regression, this patch also
modifies a BEXTI pattern to match this canonicalised form.

As BSETINVMask is now used for BEXT as well as BSET and BINV, it has
been renamed to the more generic SingleBitSetMask.

There is one codegen change in bittest.ll for bittest_31_i64 (NOT+BEXTI
rather than NOT+SRLIW). This is neutral in terms of code quality.

Differential Revision: https://reviews.llvm.org/D131482
2022-09-13 16:51:47 +01:00
Craig Topper 5224bae613 [RISCV] Fix a bug in i32 FP_TO_UINT_SAT lowering on RV64.
We use the saturating behavior of fcvt.wu.h/s/d but forgot to
take into account that fcvt.wu will sign extend the saturated
result. According to computeKnownBits a promoted FP_TO_UINT_SAT
is expected to zero extend the saturated value.

In many case the upper bits aren't be demanded so this wouldn't
be an issue. But if we computeKnownBits caused an AND to be removed
it would be a bug.

This patch inserts an AND during to zero the upper bits.

Unfortunately, this pessimizes code if we aren't able to tell if
the upper bits are demanded. To fix that we could custom type
promote the FP_TO_UINT_SAT with SEXT_INREG after it, but I'll
leave that for future work.

I haven't found a failure from this, I was revisiting the code to
add vector support and spotted it.

Differential Revision: https://reviews.llvm.org/D133746
2022-09-13 08:41:32 -07:00
Haojian Wu 7ed68182d7 Fix a -Wswitch warning. 2022-09-13 08:57:43 +02:00
jacquesguan b98b4fae75 [RISCV] Add cost model for compare and select instructions.
This patch adds cost model for vector compare and select instructions. For vector FP compare instruction, it only add the comparisions supported natively.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132296
2022-09-13 14:44:46 +08:00
Yeting Kuo 5fcb5d7759 [RISCV] Add assertion of hasVecPolicyOp to catch masked intrinsic without policy operand.
The original code may have incorrect result if there is a masked instruction
without policy operand to make us set its policy to TUMU. The patch adds an
assertion to catch the instruction.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133302
2022-09-13 10:09:49 +08:00
Craig Topper d49280e0a4 [RISCV] Rename WriteFALU* and ReadFALU* to WriteFAdd*/ReadFAdd*.
ALU seems a little vague. FAdd felt more precise even though it
also include FSUB instructions.

Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D133632
2022-09-12 09:37:28 -07:00
Craig Topper 4186a49d79 [RISCV] Custom type legalize i32 loads by sign extending.
The default is to use extload which can become a zextload or
sextload if it is followed by an 'and' or sext_inreg.

Sometimes type legalization will introduce an 'and' from promoting
something like 'srl X, C' and a sext_inreg from from a setcc. The
'and' could be freely folded with the promoted 'srl' by using srliw,
but the sext_inreg can't be folded into a compare. DAG combiner
will see both of these choices and may decide to fold the 'and'
instead of the 'sext_inreg'. This forces the sext_inreg to become
a sext.w.

By picking sextload in the type legalizer we take this choice away.
Looking at spec2006 compiled with Zba and Zbb this appeared to be
net reduction in lines of code in the objdump disassembly output.

This is similar to what we do with i32 add/sub/mul/shl in
type legalization where we always emit a sext_inreg.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130397
2022-09-12 09:13:07 -07:00
Alex Bradbury 51ae462447 [RISCV] Add the GlobalMerge pass (disabled by default)
Split out from D129178, this just adds the GlobalMerge tests (other than global-merge-minsize.ll which is testing a specific configuration of the pass when it's enabled) and exposes `-riscv-enable-global-merge` and //doesn't enable it by default//.

Note that the comment "// FIXME: Unify control over GlobalMerge." is copied from the Arm and AArch64 backends, which expose the same flag. Presumably the author is imagining some later refactoring that provides a target-independent flag.

Reviewed By: craig.topper, reames, hiraditya

Differential Revision: https://reviews.llvm.org/D130481
2022-09-08 18:40:38 -07:00
Craig Topper 5f3a8b585b [RISCV] Add RecurKind::FMulAdd to isLegalToVectorizeReduction for scalable vectors.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133511
2022-09-08 12:34:59 -07:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
liqinweng 9b4e75ee76 [RISCV][COST] Add cost model for mask vector select instruction when its condition is a scalar type
Reviewed By: jacquesguan

Differential Revision: https://reviews.llvm.org/D132992
2022-09-08 18:55:49 +08:00
Philip Reames a4a29438f4 [RISCV][MC] Add minimal support for Ztso extension
This is a minimalist implementation which simply adds the extension (in the experimental namespace since its not ratified), and wires up the setting of the required ELF header flag. Future changes will include codegen changes to exploit the stronger memory model.

This is intended to implement v0.1 of the proposed specification which can be found in Chapter 25 of https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220723-10eea63/riscv-spec.pdf.

Differential Revision: https://reviews.llvm.org/D133239
2022-09-07 09:30:57 -07:00
Craig Topper 5d30565d80 [RISCV] Improve vector fround lowering by changing FRM.
This is a follow up to D133238 which did this for ceil/floor.

Reviewed By: arcbbb, frasercrmck

Differential Revision: https://reviews.llvm.org/D133335
2022-09-06 09:33:13 -07:00
Craig Topper f0332d12ae [RISCV] Improve vector fceil/ffloor lowering by changing FRM.
This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the
VFCVT.

Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from
RVV intrinsics as I could.

A followup patch will use this approach for FROUND too.

Still need to fix the cost model.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D133238
2022-09-05 19:03:44 -07:00
Craig Topper 11881a8f3f [RISCV] Rename some V extension multiclasses for consistency. NFC
Use "SDNode" in the name is the convention for the VLMax patterns
in RISCVInstrInfoVSDPatterns.td. This files use "VL".
2022-09-01 22:17:08 -07:00
Alex Bradbury 6e1897ce95 [RISCV][NFC] Fix typo in comment in RISCVInstrInfoZicbo.td
Zicbop->Zicbom typo.
2022-09-01 13:49:55 +01:00
liqinweng c45810f810 [RISCV] When ISD::SETUGT && Imm == -1, has processed before lowering
When ISD::SETUGT && Imm == -1, has processed before lowering. Use assert replace it

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132373
2022-09-01 15:38:16 +08:00
Craig Topper 6e0ae7e940 [RISCV] Slightly simplify coode in combineVWADD_W_VL_VWSUB_W_VL and combineMUL_VLToVWMUL_VL. NFC
Use computeMaxSignificantBits instead of ComputeNumSignBits. Create
APInt as part of call to MaskedValueIsZero instead of creating
a named temporary.
2022-08-31 15:02:03 -07:00
Michael Maitland 30a4264f5f [RISCV][CodeGen] add assertion to RISCVTargetStreamer getTargetStreamer()
X86 and ARM AsmParsers have this same assertion. This assertion provides better reporting when the RISCVTargetStreamer is null and helps to prevent null pointer access.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D132863
2022-08-31 11:15:47 -07:00
jacquesguan 45c1ce321d [RISCV] Add cost model for select and integer compare instructions.
This patch adds cost model for vector select and integer compare instructions.
2022-08-31 11:32:58 +08:00
Craig Topper 7973346d16 [RISCV] Use uint64_t countTrailingZeros/Ones instead of APInt. NFC
We know the type is 32 or 64 bits, we can use getZExtValue and
bypass the slow path check in APInt.
2022-08-30 12:39:36 -07:00
Craig Topper 893f5e95e2 [RISCV] Improve isel of AND with shiftedMask containing 32 leading zeros and some trailing zeros.
We can use srliw to shift out the trailing bits and slli to shift
back in zeros. The sign extend of srliw will 0 the upper 32 bits
since we will be shifting a 0 into bit 31.
2022-08-30 12:22:46 -07:00
liqinweng 72c9f811d8 [RISCV][COST] Refactor for costs of integer saturing add/sub
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132822
2022-08-30 11:39:55 +08:00
Craig Topper e25eb61d03 [RISCV] Enable (srl (and X, C2), C) to form SRLIW in more cases.
Don't require the AND has one use and don't depend on
targetShrinkDemandedConstant turning C2 into 0xffffffff. Instead,
check that the constant is 0xffffffff after replacing any bits
that will be shifted out with 1s.

Another way to fix this might be to prevent SimplifyDemandedBits
from destroying the ANDI after type legalization using
targetShrinkDemandedBits. That would prevent the CSE that created
this mess. targetShrinkDemandedBits is currently only enable after
legalize ops. Quick experiment shows we can't just change when it
runs, we would need to try a different heuristic for post type
legalization.
2022-08-29 15:52:08 -07:00
Craig Topper 0fbe71e91f [RISCV] Use hasAllWUsers to recover ANDI.
SimplifyDemandedBits can 0 the upper bits and targetShrinkDemandedConstant
isn't alway able to recover it.

At least part of that may be because targetShrinkDemandedConstant
only runs in the last DAGCombine. Might be worth seeing what happens
if we move it post type legalization.
2022-08-29 14:11:09 -07:00
Craig Topper 1c334b306e [RISCV] Add more invertible setccs to tryDemorganOfBooleanCondition.
This builds on D132771 to invert (setlt 0, X) to (setlt X, 1) and
vice versa.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132798
2022-08-29 12:23:03 -07:00
Craig Topper 9d12bb77f9 [RISCV] Apply DeMorgan to (beqz (and/or (seteq), (xor Z, 1))) to remove the xor.
We can rewrite to (bnez (or/and (setne), Z) is Z is 0/1.

Alternatively, we could canonicalize to (xor (or/and (setne), Z), 1)
even if there is no branch. The xor would not always get removed,
but it might enable other DeMorgan combines. I decided to be
conservative for this first patch and require the xor to be removed.

I have a couple other invertible setccs I will add in a follow up
patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132771
2022-08-29 12:16:34 -07:00
Craig Topper 2f811a6c7f [VP][RISCV] Add vp.fabs intrinsic and RISC-V support.
Mostly just modeled after vp.fneg except there is a
"functional instruction" for fneg while fabs is always an
intrinsic.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D132793
2022-08-29 09:32:06 -07:00
Craig Topper e0a9da2562 [RISCV] Add Uses=[FRM] and mayRaiseFPException to VF(N/W)CVT instructions.
Reviewed By: arcbbb, kito-cheng

Differential Revision: https://reviews.llvm.org/D132792
2022-08-29 09:26:33 -07:00
Craig Topper 6732896bbf [RISCV] Use analyzeBranch in RISCVRedundantCopyElimination.
The existing code was incorrect if we had more than one conditional
branch instruction in a basic block. Though I don't think that will
occur, using analyzeBranch detects that as an unsupported case.

Overall this results in simpler code in RISCVRedundantCopyElimination.

Reviewed By: reames, kito-cheng

Differential Revision: https://reviews.llvm.org/D132347
2022-08-29 09:05:53 -07:00
Yeting Kuo abf0416328 [RISCV] Merge vmerge.vvm and unmasked intrinsic with VLMAX vector length.
The motivation of this patch is to lower the IR pattern
(vp.merge mask, (add x, y), false, vl) to
(PseudoVADD_VV_<LMUL>_MASK false, x, y, mask, vl).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131841
2022-08-29 11:44:51 +08:00
liqinweng 6fdd62c98d [RISCV] Remove unused code
Reviewed By: benshi001

Differential Revision: https://reviews.llvm.org/D132281
2022-08-29 10:16:44 +08:00
liqinweng a42e21deb8 [RISCV] Refactor for costs of integer min/max
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132724
2022-08-29 10:13:50 +08:00
Kazu Hirata 2833760c57 [Target] Qualify auto in range-based for loops (NFC) 2022-08-28 17:35:09 -07:00
Benjamin Kramer b69086e6c7 [RISC-V][HWASAN] Fold variable into assert 2022-08-29 00:32:37 +02:00
Alexey Baturo e3485345d3 [RISC-V][HWASAN] Add support for lowering HWASAN intrinsic for RISC-V
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D131343
2022-08-28 21:22:13 +03:00
Alexey Baturo 0636aec330 [RISC-V][HWASAN] Add intrinsics required for HWASAN support for RISC-V
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D131340
2022-08-28 18:05:43 +03:00
Philip Reames b45a262679 [RISCV] Enable fixed length vectors and loop vectorization with same
This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size.

For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware.

The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization.

SLP has been disabled for now, even when fixed vectors are enabled.  See a310637 and associated review.  There are a few addiitional code quality issues which need worked through before turning SLP on would be reasonable.

Differential Revision: https://reviews.llvm.org/D131508
2022-08-26 14:45:23 -07:00
Philip Reames a310637132 [RISCV] Disable SLP vectorization by default due to unresolved profitability issues
This change implements a TTI query with the goal of disabling slp vectorization on RISCV. The current default configuration disables SLP already, but its current tied to the ability to lower fixed length vectors. Over in D131508, I want to enable fixed length vectors for purposes of LoopVectorizer, but preliminary analysis has revealed a couple of SLP specific issues we need to resolve before enabling it by default. This change exists to allow us to enable LV without SLP.

Differential Revision: https://reviews.llvm.org/D132680
2022-08-26 14:11:22 -07:00
Yunze Zhu 3846e3970f [RISCV] Generate correct ELF abi flag when empty .ll file has target-abi attribute
In patch D121183, target abi is get from .ll file's target-abi
attribute and set in RISCVAsmPrinter::emitFunctionEntryLabel
function. In https://github.com/llvm/llvm-project/issues/57242,
an api mismatch error may be caused by failing to call function
RISCVAsmPrinter::emitFunctionEntryLabel to set target-abi to
correct one when the .ll is empty or a module has no function.

This patch move setting target-abi part to function
RISCVAsmPrinter::emitStartOfAsmFile, make sure all .ll file and
module in LTO read target-abi from module flag and set, with or
without function.

Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com>
Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com>

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D132204
2022-08-26 14:39:39 +08:00
LiaoChunyu 6b098bf35a [RISCV] : Add support for simm10_lsb0000nonzero operand.
Running on RISCV machine llvm-exegesis I faced with trouble: can't measure C_ADDI16SP, beacuse immediate has type simm10_lsb0000nonzero.

Patch adds support for processing this immediate operand type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132650
2022-08-26 14:37:37 +08:00
Craig Topper e4177201eb [RISCV][M68k] Replace fixed size BitVector with std::bitset.
Saves a heap allocation and avoids an explicit call to the BitVector constructor.

Reviewed By: reames, myhsu

Differential Revision: https://reviews.llvm.org/D132674
2022-08-25 12:45:08 -07:00
Craig Topper 41a3b5739b [RISCV] Teach combineDeMorganOfBoolean to handle (and (xor X, 1), (not Y)).
SimplifyDemandedBits tries to agressively turn xor immediates into -1
to match a 'not' instruction. In this case, because X is a boolean, the
upper bits of (xor X, 1) are known to be 0. Because this is an AND
instruction, that means those bits aren't demanded from the other
operand, and thus SimplifyDemandedBits can turn (xor Y, 1) to (not Y).

We need to detect that this has happened to enable the DeMorgan
optimization. To do this we allow one of the xors to use -1 when
the outer operation is And.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132671
2022-08-25 10:55:45 -07:00
Philip Reames 53f738ce7e [RISCV] Add empirical costs for integer min/max and saturing add/sub
All of these are lowered to a single instruction for all legal vector types.
2022-08-25 09:27:17 -07:00
Craig Topper ec91d761ac [RISCV] Apply DeMorgan's law to (and/or (xor X, 1), (xor Y, 1)) if X and Y are 0/1.
This optimizes xors that appear due to legalizing setge/setle which
require an xor with 1. This reduces the number of xors and may
allow the xor to fold with a beqz or bnez.

Differential Revision: https://reviews.llvm.org/D132614
2022-08-25 08:49:30 -07:00
Philip Reames 03798f268b {RISCV] Backout cttz/ctlz instruction costs
Craig points out correctly in post-commit review that these depend on the availability of floating point extensions.
2022-08-24 15:40:48 -07:00
Philip Reames d4d6e71ea2 [RISCV] Add empirical costs for bswap/bitreverse/ctpop/ctlz/cttz
If anyone is looking for a source of ideas on vector codegen improvements, the lowerings for several of these seem to include pretty obvious fixits.
2022-08-24 15:09:21 -07:00
Philip Reames 42af1a776a [RISCV] Add empirically measured vector sqrt intrinsic costs 2022-08-24 14:27:57 -07:00
Philip Reames 4d3134866f [RISCV] Add vector fabs intrinsic costs
We have a fabs vector instruction, and are using it for current lowering.
2022-08-24 14:09:51 -07:00
Saleem Abdulrasool 8f45b5a7a9 RISCV: permit unaligned nop-slide padding emission
We may be requested to emit an unaligned nop sequence (e.g. 7-bytes or
3-bytes).  These should be 0-filled even though that is not a valid
instruction.  This matches the behaviour on other architectures like
ARM, X86, and MIPS.  When a custom section is emitted, it may be
classified as text even though it may be a data section or we may be
emitting data into a text segment (e.g. a literal pool).  In such cases,
we should be resilient to the emission request.

This was originally identified by the Linux kernel build and reported on
D131270 by Nathan Chancellor.

Differential Revision: https://reviews.llvm.org/D132482
Reviewed By: luismarques
Tested By: Nathan Chancellor
2022-08-24 20:26:48 +00:00
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Kito Cheng 8e8a62006e [RISCV][NFC] Minor cleanup in RISCVInstrInfo::getOutliningType
The only use of TM is checking result of TargetMachine::getFunctionSections,
check that directly instead of introdce a local variable.
2022-08-24 23:42:34 +08:00
Alex Richardson 38107171ed [RegisterInfoEmitter] Generate isConstantPhysReg(). NFCI
This commit moves the information on whether a register is constant into
the Tablegen files to allow generating the implementaiton of
isConstantPhysReg(). I've marked isConstantPhysReg() as final in this
generated file to ensure that changes are made to tablegen instead of
overriding this function, but if that turns out to be too restrictive,
we can remove the qualifier.

This should be pretty much NFC, but I did notice that e.g. the AMDGPU
generated file also includes the LO16/HI16 registers now.

The new isConstant flag will also be used by D131958 to ensure that
constant registers are marked as call-preserved.

Differential Revision: https://reviews.llvm.org/D131962
2022-08-24 14:16:20 +00:00
Kito Cheng 96c85f80f0 [RISCV] Don't outline pcrel-lo operand.
This issue is found by build llvm-testsuite with `-Oz`, linker will complain
`dangerous relocation: %pcrel_lo missing matching %pcrel_hi` and that
turn out cause by we outlined pcrel-lo, but leave pcrel-hi there, that's
not problem in general, but the problem is they put into different section, they
pcrel-hi and pcrel-lo pair (e.g. AUIPC+ADDI) *MUST* put be present in same
section due to the implementation.

Outlined function will put into .text name, but the source functions
will put in .text.<function-name> if function-section is enabled or the
function has `comdat` attribute.

There are few solutions for this issue:
1. Always disallow instructions with pcrel-lo flags.
2. Only disallow instructions with pcrel-lo flags that when function-section is
   enabled or this function has `comdat` attribute.
3. Check the corresponding instruction with pcrel-high also included in the
   outlining candidate sequence or not, and allow that only when pcrel-high is
   included in the outlining candidate.

First one is most conservative, that might lose some optimization
opportunities, and second one could save those opportunities, and last
one is hard to implement, and don't have any benefits since pcrel-high
are using different label even accessing same symbol.

Use custom section name might also cause this problem, but that already
filtered by RISCVInstrInfo::isFunctionSafeToOutlineFrom.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D132528
2022-08-24 21:47:46 +08:00
MarkGoncharovAl 8c1f18bd3e [RISCV] : Add support for immediate operands.
llvm-exegesis uses operand type information provided in tablegen files to initialize
immediate arguments of the instruction. Some of them simply don't have such information.
Thus we should set into relevant immediate operands their specific type.
Also create verification methods for them.

Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D131771
2022-08-24 17:48:39 +08:00
Alex 07a700f814 [RISCV] Add zihintntl compressed instructions
Add zihintntl compressed instructions and some files related to zihintntl.
This patch is base on {D121670}.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D121779
2022-08-24 14:29:02 +08:00
ZHU Zijia 9c85382ade [RISCV] Handle register spill in branch relaxation
In branch relaxation pass, `j`'s with offset over 1MiB will be relaxed
to `jump` pseudo-instructions.

This patch allocates a stack slot for functions with a size greater than
1MiB. If the register scavenger cannot find a scratch register for
`jump`, spill a register to the slot before the jump and restore it
after the jump.

.mbb:
        foo
        j       .dest_bb
        bar
        bar
        bar
.dest_bb:
        baz

The above code will be relaxed to the following code.

.mbb:
        foo
        sd      s11, 0(sp)
        jump    .restore_bb, s11
        bar
        bar
        bar
        j       .dest_bb
.restore_bb:
        ld      s11, 0(sp)
.dest_bb:
        baz

Depends on D129999.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D130560
2022-08-24 13:27:56 +08:00
Philip Reames c9608d57b8 [TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC]
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet.  The main point of this change is simple consistency with the recently changes getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
2022-08-23 07:55:42 -07:00
Philip Reames 478cf94378 [X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc]
This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties.  This change adds some accessor methods and uses them to simplify backend code.  The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.
2022-08-22 13:37:59 -07:00
Shao-Ce SUN 7167a4207e [RISCV] Add zihintntl instructions
Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D121670
2022-08-22 12:06:30 +08:00
Craig Topper abce7acebd [RISCV] Remove impossible TODO in RISCVRedundantCopyElimination. NFC
If there are multiple conditional branches we shouldn't do any
optimization.
2022-08-21 13:18:02 -07:00
Craig Topper 1a042dd6ed [RISCV] Optimize x <s -1 ? x : -1. Improve x >u 1 ? x : 1.
Similar to D132211, we can optimize x <s -1 ? x : -1 -> x <s 0 ? x : -1

Also improve the unsigned case from D132211 to use x != 0 which
will give a bnez instruction which might be compressible.

Differential Revision: https://reviews.llvm.org/D132252
2022-08-21 11:48:28 -07:00
Craig Topper a6c3ccd476 [RISCV] Be more strict about LUI+ADDI macrofusion pre-RA.
Don't macrofuse if the LUI has more than 1 user. That will likely
require the LUI to have a different destination register post-RA.
LUI+ADDI can only be fused if they write the same register.
2022-08-21 10:58:15 -07:00
LiaoChunyu 1fb87ace4d [RISCV] Optimize x > 1 ? x : 1 -> x > 0 ? x : 1
if x == 1,
  x > 1 ? x : 1  return x, which is also 1.
  x > 0 ? x : 1  return 1.

Reduce the number of load 1 instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132211
2022-08-21 20:26:39 +08:00
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Craig Topper 6227b7ae31 [RISCV] Move xori creation for scalar setccs to lowering.
This patch enables expansion or custom lowering for some integer
condition codes so that any xori that is needed is created before
the last DAG combine to enable optimization.

I've seen cases where we end up with
(or (xori (setcc), 1), (xori (setcc), 1)) which we would ideally
convert to (xori (and (setcc), (setcc)), 1). This patch doesn't
accomplish that yet, but it should allow us to add DAG
combines as follow ups. Example https://godbolt.org/z/Y4qnvsq1b

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131729
2022-08-19 13:51:53 -07:00
Philip Reames 59960e8db9 [RISCV] Factor out getVectorImmCost cost after 0e7ed3 [nfc] 2022-08-19 12:53:54 -07:00
Philip Reames e7fda46300 [RISCV] Correct costs for vector ceil/floor/trunc/round
Add vector costs for ceil/floor/trunc/round. As can be seen in the tests, the prior default costs were a significant under estimate of the actual code generated.

These costs are computed by simply generating code with the current backend, and then counting the number of instructions. I discount one vsetvli, and ignore the return.

Differential Revision: https://reviews.llvm.org/D131967
2022-08-19 10:37:39 -07:00
Craig Topper 961838cc13 [RISCV] Add passthru operand to RISCVISD::SETCC_VL.
Use it to the fix a bug in the fceil/ffloor lowerings. We were
setting the passthru to IMPLICIT_DEF before and using a mask
agnostic policy. This means where the incoming bits in
the mask were 0 they could be anything in the outgoing mask. We
want those bits in the outgoing mask to be 0. This means we need to
pass the input mask as the passthru.

This generates worse code because we are unable to allocate the
v0 register to the output due to an earlyclobber constraint. We
probably need a special TIED pseudoinstruction and probably custom
isel since you can't use V0 twice in the input pattern.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132058
2022-08-19 08:53:44 -07:00
Craig Topper c9a41fe60a [RISCV] Prefer vnsrl.wi v8, v8, 0 over vnsrl.wx v8, v8, x0.
I have a couple data points that some microarchitectures prefer
the immediate 0 over x0. Does anyone know of microarchitectures
where the opposite is true?

Unfortunately, this is different than the vncvt.x.x.w alias
from the spec. Perhaps the alias was poorly chosen if x0 isn't
as optimal as immediate 0 on all microarchitectures.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132041
2022-08-19 08:40:17 -07:00
Alexey Bataev 0e7ed32c71 [SLP]Cost for a constant buildvector.
In many cases constant buildvector results in a vector load from a
constant/data pool. Need to consider this cost too.

Differential Revision: https://reviews.llvm.org/D126885
2022-08-19 08:02:42 -07:00
Alexey Bataev d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Craig Topper ba1f4cab44 [RISCV] Copy SDNodeFlags in lowerToScalableOp.
Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D132177
2022-08-18 20:42:59 -07:00
Craig Topper 5349aa2354 [RISCV] Copy SDNodeFlags in doPeepholeMaskedRVV and doPeepholeMergeVVMFold
Especially the NoFPExcept flag for FP.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D132173
2022-08-18 20:42:46 -07:00
Craig Topper 37c47b2cac [RISCV] Change how mtune aliases are implemented.
The previous implementation translated from names like sifive-7-series
to sifive-7-rv32 or sifive-7-rv64. This also required sifive-7-rv32
and sifive-7-rv64 to be valid CPU names. As those are not real
CPUs it doesn't make sense to accept them in -mcpu.

This patch does away with the translation and adds sifive-7-series
directly to RISCV.td. Removing sifive-7-rv32 and sifive-7-rv64.
sifive-7-series is only allowed in -mtune.

I've also added "rocket" to RISCV.td but have not removed rocket-rv32
or rocket-rv64.

To prevent -mcpu=sifive-7-series or -mcpu=rocket being used with llc,
I've added a Feature32Bit to all rv32 CPUs. And made it an error to
have an rv32 triple without Feature32Bit. sifive-7-series and rocket
do not have Feature32Bit or Feature64Bit set so the user would need
to provide -mattr=+32bit or -mattr=+64bit along with the -mcpu to
avoid the error.

SiFive no longer names their newer products with 3, 5, or 7 series.
Instead we have p200 series, x200 series, p500 series, and p600 series.
Following the previous behavior would require a sifive-p500-rv32 and
sifive-p500-rv64 in order to support -mtune=sifive-p500-series. There
is currently no p500 product, but it could start getting confusing if
there was in the future.

I'm open to hearing alternatives for how to achieve my main goal
of removing sifive-7-rv32/rv64 as a CPU name.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131708
2022-08-18 16:22:25 -07:00
Philip Reames 4d87591028 [RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL
On known hardware, reductions, gather, and scatter operations have execution latencies which correlated with the vector length (VL) of the operation. Most other operations (e.g. simply arithmetic) don't correlated in this way, and instead essentially fixed cost as VL varies.

When I'd implemented initial scalable cost model support for reductions, gather, and scatter operations, I had used an upper bound on the statically unknown VL. The argument at the time was that this prevented falsely low costs, and biased the vectorizer away from generating bad (on some hardware) code. Unfortunately, practical experience shows we were a bit too effective at that goal, and the high costs defacto prevents vectorization using these constructs at all.

This patch reverses course, and ties the returned cost not to the maximum possible VL, but the VL which would correspond to VScaleForTuning. This parameter is the same one the vectorizer uses when normalizing loop costs, so the term effectively cancels out. The result is that the vectorizer now sees these constructs as comparable in cost to their fixed length variants.

This does introduce the possibility of the cost for these operations being a significant under estimate on platforms where actual VLEN is far from that implied by VScaleForTuning. On such platforms, we might make poor heuristic choices. Probably not in LV itself (due to the cancellation mentioned above), but possibly during e.g. lowering. I'm not currently aware of any concrete examples of this, but this patch does open a concern which did not previously exist.

Previously, we had the problem of overestimating costs causing the same problem on machines much closer to default values for vscale for tuning. With this patch, we still have that problem potentially if vscale for tuning is set high (manually), and then the code is run on a narrow VLEN machine.

Differential Revision: https://reviews.llvm.org/D131519
2022-08-18 13:10:03 -07:00
Simon Pilgrim fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
WuXinlong 515ece1a90 [RISCV] Add MC support of RISCV Zca Extension
This patch adds support for part of Zc extension which will be frozen soon.

This extension is designed to continue reducing the binary size of RISC-V programs.
In this patch:
`Zca` is a subset of C extension instructions that are compatible with the Zc extension.

The spec of Zc ext is [[ https://github.com/riscv/riscv-code-size-reduction/releases | Here ]]

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130141
2022-08-18 12:13:35 +08:00
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.

E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
Craig Topper 550fab53e1 [RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1).
Extracted from D131729 where we handled C==0. It's now generalized
to more constants.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132000
2022-08-17 09:50:08 -07:00
Craig Topper ab4cd154c6 [RISCV] Refactor performSUBCombine to prepare for D132000.
This refactors the code into a separate function with early returns.
D132000 adds an additional operation to the if/else that selects
NewLHS, but can otherwise share the rest of the code.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132002
2022-08-17 09:50:08 -07:00
Alex Bradbury ce38128194 [RISCV] Avoid redundant branch-to-branch when expanding cmpxchg
If the success value of a cmpxchg is used in a branch, the expanded
cmpxchg sequence ends up with a redundant branch-to-branch (as the
backend atomics expansion happens as late as possible, passes to
optimise such cases have already run). This patch identifies this case
and avoid it when expanding the cmpxchg.

Note that a similar optimisation is possible for a BEQ on the cmpxchg
success value. As it's hard to imagine a case where real-world code may
do that, this patch doens't handle that case.

Differential Revision: https://reviews.llvm.org/D130192
2022-08-17 13:49:15 +01:00
Craig Topper d27c147aaa [RISCV] Allow lowerSELECT to fold integer setcc with FP select.
We'd pick it up in DAG combine later even if we didn't handle it here.
No test changes because we get it in DAG combine anyway.
2022-08-16 21:28:54 -07:00
Craig Topper ba1fb54821 [RISCV] Reuse existing VT variable instead of calling getValueType() repeatedly. NFC 2022-08-16 19:56:55 -07:00
Monk Chiang 0af4651c0f [RISCV] Add scheduling class for vector pseudo segment instructions.
Add scheduling resource for vector segment load/store instructions in D128886.
I miss to add scheduling resource for pseudo segment instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130222
2022-08-16 17:54:47 -07:00
Craig Topper 53ce22e429 Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.

Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:51:07 -07:00
Craig Topper 2dfa4b6475 Revert "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This reverts commit 1380b21ceb.

I mixed up N0 and N1 and didn't do what I intended.
2022-08-16 15:47:01 -07:00