Commit Graph

281 Commits

Author SHA1 Message Date
ziliangzl 60f388930d [VENTUS][fix]Assign initial value for VastartStoreFrameIndex
VastartStoreFrameIndex havn't initial value, caused issue THU-DSP-LAB/llvm-project#117
2024-05-15 13:39:07 +08:00
zhoujing 797c85d829 [patch] Add a fix patch from terapines_dev branch 2024-03-08 18:23:47 +08:00
zhoujing 87fe5f3ce8 [VENTUS][fix] Put local variables declared in kernel function into shared memory 2024-03-05 16:32:59 +08:00
qinfan 8dfd3561c4 [VENTUS][RISCV][feat] Legalized vector parameters
Summary: LegaLegalized vector parameters, but not been added FileCheck now.

Test Plan: Legalized vector parameters

Differential Revision: http://www.tpt.com/D740
2023-11-09 13:09:54 +08:00
zhoujingya 0a45eabde0 Revert "[VENTUS][RISCV][feat] Legalized vector parameters"
This reverts commit 7bd98c0ff8.
2023-10-08 17:31:53 +08:00
qinfan 7bd98c0ff8 [VENTUS][RISCV][feat] Legalized vector parameters
Summary: LegaLegalized vector parameters, but not been added FileCheck now.

Test Plan: Legalized vector parameters

Differential Revision: http://www.tpt.com/D740
2023-10-08 11:28:04 +08:00
zhoujingya 0e5eef6abb [VENTUS][fix] fix add instruction
Summary:
fix addi instruction, there will be a hardware error when immediate is
negative number.

Test Plan: fix add instruction

Reviewers: zhoujing

Subscribers: zhoujing

Differential Revision: http://www.tpt.com/D722

Signed-off-by: qinfan <qinfan.wang@terapines.com>
2023-09-08 10:32:17 +08:00
zhoujingya b7b8fa50ba [VENTUS][fix] Fix load instruction selection pattern for vastart
In standard riscv vararg support, the varstart frame index will be stored in stack,
but because if the design of ventus, some code generation  will be like this
        vlw.v   v0, -44(v8)
        vadd.vi v1, v0, 4
        vsw.v   v1, -44(v8)
        \vlw12.v v0, 0(v0)
the last vlw12 instruction is actually illeagl, it should be vlw
2023-09-05 14:41:10 +08:00
zhoujing 6636793f64 Merge libclc-vector-support 2023-06-16 09:41:08 +08:00
zhoujing 1fab7b80f3 Legalize operation for SETCC 2022-12-29 17:13:49 +08:00
Aries b9da010dd5 [NFC] Refactor messy switch...case 2022-12-22 14:50:13 +08:00
Aries beb878e97c Add OpenCL addressing space mapping to RISCVAS.
Add kernel argument lowering.
Clean up a few unrelated RVV code.
2022-12-20 17:08:08 +08:00
Aries dee3135130 Drafting divergent related code, not working yet. 2022-12-19 18:11:34 +08:00
Aries 894931f522 More clean up and fix build error. 2022-12-19 10:10:28 +08:00
Aries 521e83631d Roughly cleaned RVV instruction selection. 2022-12-19 09:40:05 +08:00
Aries 35633e31e3 In the middle of removing RVV code. 2022-12-16 18:04:43 +08:00
Aries f1eff7fcfe Very very early step to remove RVV features from code base. 2022-12-16 17:33:54 +08:00
Fangrui Song b0df70403d [Target] llvm::Optional => std::optional
The updated functions are mostly internal with a few exceptions (virtual functions in
TargetInstrInfo.h, TargetRegisterInfo.h).
To minimize changes to LLVMCodeGen, GlobalISel files are skipped.

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 22:43:14 +00:00
Krzysztof Parzyszek 864aaa21b4 TargetLowering: convert Optional to std::optional 2022-12-01 16:19:10 -08:00
Philip Reames 7d82c99403 [RISCV][TTI] Account for constant materialization cost when costing arithmetic operations
At the IR level, we generally assume that constants are free to materialize. However, for RISCV due to some quirks of the ISA, materializing arbitrary constants can be rather expensive. We frequently fallback to constant pool loads.

We've been slowly moving in the direction of modeling the cost of the remat as part of the instruction cost. This has the effect of disincentivizing vectorization - mostly SLP - when we'd have to materialize an expensive constant.

We need better modeling of which constants are expensive and not, but the moment let's be consistent with how we model arithmetic and memory instructions. The difference between the two is that arithmetic can sometimes fold a splat operation which stores can not.

Differential Revision: https://reviews.llvm.org/D138941
2022-11-30 07:20:51 -08:00
Philip Reames b25672ba82 [RISCV] Separate out helper for checking if vector splat supported for operand [nfc] 2022-11-29 11:05:46 -08:00
Stanislav Mekhanoshin bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Yeting Kuo ed9638c44b [VP][RISCV] Add vp.nearbyint and RISC-V support.
nearbyint has the property to execute without exception.
For not modifying fflags, the patch added new machine opcode
PseudoVFROUND_NOEXCEPT_V that expands vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137685
2022-11-16 14:05:35 +08:00
Craig Topper dde8423f21 [RISCV] Expand i32 abs to negw+max at isel.
This adds a RISCVISD::ABSW to remember that we started with an i32
abs. Previously we used a DAG combine of (sext_inreg (abs)) to
delay emitting a freeze from type legalization in order to make
ComputeNumSignBits optimizations work on other promoted nodes.

This new approach always uses negw+max even if the result doesn't
need to be sign extended. This helps the RISCVSExtWRemoval pass
if the sext.w is in another basic block.
2022-11-14 19:44:05 -08:00
Craig Topper 6254495c6b [RISCV] Move RVVBitsPerBlock to TargetParser.h so we can use it in clang. NFC
Differential Revision: https://reviews.llvm.org/D137266
2022-11-02 13:09:14 -07:00
Yeting Kuo 71e4e35581 [VP][RISCV] Add vp.rint and RISC-V support.
FRINT uses dynamic rounding mode instead of static rounding mode. The patch
rename VFCVT_X_F_VL to VFCVT_RM_X_F_VL for static rounding mode uses and added
new ISDNode VFCVT_X_F_VL directly selected to PseudoVFCVT_X_F_V.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D136662
2022-11-01 14:52:47 +08:00
Craig Topper e94dc58dff [RISCV] Inline scalar ceil/floor/trunc/rint/round/roundeven.
This avoids the call overhead as well as the the save/restore of
fflags and the snan handling in the libm function.

The save/restore of fflags and snan handling are needed to be
correct for -ftrapping-math. I think we can ignore them in the
default environment.

The inline sequence will generate an invalid exception for nan
and an inexact exception if fractional bits are discarded.

I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.

We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.

Note the comparison constant is slightly different than glibc uses.
They use 1<<53 for double, I'm using 1<<52. I believe either are valid.
Numbers >= 1<<52 can't have any fractional bits. It's ok to do the
float->int->float conversion on numbers between 1<<53 and 1<<52 since
they will all fit in 64. We only have a problem if the double can't fit
in i64

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136508
2022-10-26 14:36:49 -07:00
Philip Reames 60c91fd364 [RISCV] Disallow scale for scatter/gather
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.

This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)

The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.

Differential Revision: https://reviews.llvm.org/D134382
2022-09-22 15:31:26 -07:00
Craig Topper 52708be182 [RISCV] Remove support for the unratified Zbe, Zbf, and Zbm extensions.
These extensions do not appear to be on their way to ratification.
2022-09-22 13:04:41 -07:00
Craig Topper 8b8e18e11f [RISCV] Replace RISCVISD::GREV/GORC/SHFL/UNSHFL with BREV8/ORC_B/ZIP/UNZIP.
With Zbp removed, we no longer need the generalized forms.

The computeKnownBitsForTargetNode code brev8/orc.b is still based
on the general form with the shift amount forced to 7.
2022-09-21 21:57:59 -07:00
Craig Topper 182aa0cbe0 [RISCV] Remove support for the unratified Zbp extension.
This extension does not appear to be on its way to ratification.

Still need some follow up to simplify the RISCVISD nodes.
2022-09-21 21:22:42 -07:00
Craig Topper 1d8a7adca6 [RISCV] Rename RISCVISD::SINT_TO_FP_VL/UINT_TO_FP_VL. NFC
Name them after the instructions VFCVT_RTZ_X(U)_F_VL to make it
clear that the ISD nodes don't have the poison semantics of
ISD::SINT_TO_FP/UINT_TO_FP.

I play to reuse this node for a FP_TO_SINT_SAT/FP_TO_UINT_SAT
patch and need the instruction semantics.
2022-09-21 15:33:04 -07:00
Craig Topper 70a64fe7b1 [RISCV] Remove support for the unratified Zbt extension.
This extension does not appear to be on its way to ratification.

Out of the unratified bitmanip extensions, this one had the
largest impact on the compiler.

Posting this patch to start a discussion about whether we should
remove these extensions. We'll talk more at the RISC-V sync meeting this
Thursday.

Reviewed By: asb, reames

Differential Revision: https://reviews.llvm.org/D133834
2022-09-20 20:26:48 -07:00
Craig Topper 8d7e73effe [RISCV] Teach lowerVECTOR_SHUFFLE to recognize some shuffles as vnsrl.
Unary shuffles such as <0,2,4,6,8,10,12,14> or <1,3,5,7,9,11,13,15>
where half the elements are returned, can be lowered using vnsrl.

SelectionDAGBuilder lowers such shuffles as a build_vector of
extract_elements since the mask has less elements than the source.
To fix this, I've enable the extractSubvectorIsCheapHook to allow
DAGCombine to rebuild the shuffle using 2 extract_subvectors preceding
the shufffle.

I've gone very conservative on extractSubvectorIsCheapHook to minimize
test impact and match what we have test coverage for. This can be
improved in the future.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133736
2022-09-13 11:07:11 -07:00
Alex Bradbury c44c1e9d3e [RISCV] Implement isMaskAndCmp0FoldingBeneficial hook
This hook is currently only used by CodeGenPrepare, which will sink *and
duplicate* an 'and' into a block that has an 'icmp 0' user of it if the
hook returns true.

This hook is less useful for RISC-V than for targets like AArch64 that
have a TBZ (test bit and branch if zero instruction), but may still be
profitable if Zbs is available and a BEXTI can be selected.

Conservatively, we return false even if Zbs is enabled for any masks
that fit in the ANDI immediate because it's possible the only use is a
branch on the result, and ANDI+BNEZ => BEXTI+BNEZ isn't a profitable
transformation.

Differential Revision: https://reviews.llvm.org/D131492
2022-09-13 18:54:00 +01:00
Craig Topper f0332d12ae [RISCV] Improve vector fceil/ffloor lowering by changing FRM.
This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the
VFCVT.

Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from
RVV intrinsics as I could.

A followup patch will use this approach for FROUND too.

Still need to fix the cost model.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D133238
2022-09-05 19:03:44 -07:00
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Alex Bradbury 47b1f8362a [RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls
Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is is partly determined by checking TargetLowering::isInTailCallPosition
and comparing the return type of the libcall and the calleer.
isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly
(which always returns false if not implemented by the target).

This patch provides a minimal implementation of
TargetLowering::isUsedByReturnOnly - enough to support tail calling
libcalls on hard float ABIs. Soft-float ABIs are left for a follow on
patch. libcall-tail-calls.ll also shows missed opportunities to tail
call integer libcalls, but this is due to issues outside of
the isUsedByReturnOnly hook.

Differential Revision: https://reviews.llvm.org/D131087
2022-08-10 10:50:29 +01:00
Lorenzo Albano 71b7c03fd6 [RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121113
2022-08-01 09:23:45 -07:00
Craig Topper d21b315360 [RISCV] Remove vmerges from vector ceil, floor, trunc lowering.
Use masked operations to suppress spurious exception bits being
set in fflags. Unfortunately, doing this adds extra copies.
2022-07-30 10:58:41 -07:00
Craig Topper a23f07fb1d [RISCV] Add merge operands to more RISCVISD::*_VL opcodes.
This adds a merge operand to all of the binary _VL nodes. Including
integer and widening. They all share multiclasses in tablegen
so doing them all at once was easiest.

I plan to use FADD_VL in an upcoming patch. The rest are just for
consistency to keep tablegen working.

This does reduce the isel table size by about 25k so that's nice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130816
2022-07-30 10:26:38 -07:00
LiaoChunyu bf4f9a468a [RISCV]Enable isIntDivCheap when attribute is minsize
Don't expand divisions by constants when attribute is minsize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130543
2022-07-27 18:22:51 +08:00
jacquesguan d8800ead62 [RISCV] Scalarize binop followed by extractelement.
This patch adds shouldScalarizeBinop to RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation.

Differential Revision: https://reviews.llvm.org/D129545
2022-07-25 17:23:31 +08:00
Philip Reames dde2a7fb6d [RISCV] Exploit fact that vscale is always power of two to replace urem sequence
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.

vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)

We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, the must be a power of two numbers of blocks. (For everything other than VLEN<=32, but that's already broken.)

It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.

Differential Revision: https://reviews.llvm.org/D129609
2022-07-13 10:54:47 -07:00
Craig Topper 35ec8a423d [RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools.
I think it only makes sense to return true here if we aren't going
to turn around and create a constant pool for the immmediate.

I left out the check for useConstantPoolForLargeInts() thinking
that even if you don't want the commpiler to create a constant pool
you might still want to avoid materializing an integer that is
already available in a global variable.

Test file was copied from AArch64/ARM and has not been commited yet.
Will post separate review for that.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D129402
2022-07-10 14:10:17 -07:00
Craig Topper c579ab53bd [RISCV] Move vfma_vl+fneg_vl matching to DAG combine.
This patch adds 3 new _VL RISCVISD opcodes to represent VFMA_VL with
different portions negated. It also adds a DAG combine to peek
through FNEG_VL to create these new opcodes.

This is modeled after similar code from X86.

This makes the isel patterns more regular and reduces the size of
the isel table by ~37K.

The test changes look like regressions, but they point to a bug that
was already there. We aren't able to commute a masked FMA instruction
to improve register allocation because we always use a mask undisturbed
policy. Prior to this patch we matched two multiply operands in a
different order and hid this issue for these test cases, but a different
test still could have encountered it.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D128310
2022-06-24 00:00:37 -07:00
Craig Topper f912d21e67 [RISCV] Add RISCVISD opcodes for the rest of get*Addr.
This adds RISCVISD opccodes for LA, LA_TLS_IE, and LA_TLS_GD to
remove creation of MachineSDNodes form get*Addr. This makes the
code consistent with the previous patches that added RISCVISD::HI,
ADD_LO, LLA, and TPREL_ADD.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128325
2022-06-22 09:21:07 -07:00
Craig Topper 0efbf5bfbb [RISCV] Move the passthru operand for RISCVISD::VRGATHER*_VL nodes. NFC
Put it before the VL instead of as the first operand. I want to add
passthru to more operands, but the commutable ones like VADD_VL
require the commutable operands to be operand 0 and 1. So we can't
have the passthru as operand 0 for those.
2022-06-21 14:01:02 -07:00
Craig Topper e01353f816 [RISCV] Add RISCVISD opcode for PseudoAddTPRel.
Use it along with RISCVISD::HI and ADD_LO to avoid emitting
MachineSDNodes during lowering.
2022-06-20 20:56:52 -07:00
Craig Topper 16d3a82de5 [RISCV] Add merge operand to RISCVISD::VRGATHER*_VL nodes.
Use it in place of VSELECT_VL+VRGATHER*_VL.

This simplifies the isel patterns.

Overall, I think trying to match select+op to create masked instructions
in isel doesn't scale. We either need to do it in DAG combine, pre-isel
peepole, or post-isel peephole. I don't yet know which is the right
answer, but for this case it seemed best to be able to request the
masked form directly from lowering.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D128023
2022-06-20 18:58:24 -07:00