2266 Commits

Author SHA1 Message Date
Craig Topper 5afdceb82b [RISCV] Add RISCVISD opcode for PseudoLLA.
Rather than emitting a MachineSDNode from lowering. Let isel match it.

This is consistent with the RISCVISD::HI and ADD_LO nodes that were
also added. Having them both the same will make D127679 consistent.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127714
2022-06-16 15:11:03 -07:00
Craig Topper 4191de262f [RISCV] Don't emit LUI/ADDI MachineSDNodes from getAddr
Instead add RISCVISD opcodes that will be selected to LUI/ADDI
during isel.

I'm looking into maybe moving doPeepholeLoadStoreADDI into isel.
Having the ADDI as a RISCVISD node will make it visible to isel.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127713
2022-06-16 14:56:07 -07:00
Philip Reames 2fa2cee6a8 [RISCV] Start merging demanded reasoning - starting with load/stores [nfc]
This change merges the logic for reasoning about demanded portions of the VTYPE register between the main dataflow algorithm and the backwards mutation post pass. In the process, we get to delete a bunch of now redundant code.

This should be entirely NFC. I included a slight hack (see TODO) to avoid changing behavior in the post pass while being able to use the generalized logic in the prepass. I will fix the TODO in a separate change once this lands.

Differential Revision: https://reviews.llvm.org/D127983
2022-06-16 14:34:53 -07:00
Philip Reames d764aa7fc6 [RISCV] Add cost model for scalable scatter and gather
The costing we use for fixed length vector gather and scatter is to simply count up the memory ops, and multiply by a fixed memory op cost. For scalable vectors, we don't actually know how many lanes are active. Instead, we have to end up making a worst case assumption on how many lanes could be active. In the generic +V case, this results in very high costs, but we can do better when we know an upper bound on the VLEN.

There are some obvious ways to improve this - e.g. using information about VL and mask bits from the instruction to reduce the upper bound - but this seems like a reasonable starting point.

The resulting costs do bias us pretty strongly away from generating scatter/gather for generic +V.  Without this, we'd be returning an invalid cost and thus definitely not vectorizing, so no major change in practical behavior expected.
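
As a rough illustration of the worst-case costing described above, here is a minimal C++ sketch. The helper names (`worstCaseLanes`, `gatherScatterCost`), the per-op cost parameter, and the modelling of a scalable type as "minimum element count times vscale" over 64-bit blocks are assumptions made for illustration; this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one "vscale unit" of a scalable RVV type covers 64 bits of VLEN.
constexpr uint64_t kBitsPerBlock = 64;

// Worst-case number of active lanes for a scalable vector type with MinElts
// elements per vscale unit, given an upper bound on VLEN in bits.
uint64_t worstCaseLanes(uint64_t MinElts, uint64_t MaxVLenBits) {
  assert(MaxVLenBits >= kBitsPerBlock && MaxVLenBits % kBitsPerBlock == 0 &&
         "VLEN bound must be a positive multiple of the block size");
  uint64_t MaxVScale = MaxVLenBits / kBitsPerBlock;
  return MinElts * MaxVScale;
}

// Same scheme as the fixed-length case: count the memory ops and multiply by
// a fixed per-op cost, except the lane count is a worst-case bound.
uint64_t gatherScatterCost(uint64_t MinElts, uint64_t MaxVLenBits,
                           uint64_t MemOpCost) {
  return worstCaseLanes(MinElts, MaxVLenBits) * MemOpCost;
}
```

For `<vscale x 4 x i32>` with a unit memory-op cost, a known VLEN bound of 128 gives a cost of 8, whereas the architectural maximum VLEN of 65536 (the generic +V case) gives 4096, which is why the generic costs end up so high.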

Differential Revision: https://reviews.llvm.org/D127541
2022-06-16 14:22:31 -07:00
Philip Reames 89a11ebd8e [RISCV] Avoid reducing etype just to initialize lane 0 of an undef vector
If we're writing to an undef vector (i.e. implicit_def), we can change the value of bits outside the requested write without consequence. This allows us to avoid a VSETVLI just for narrowing the value written.

Differential Revision: https://reviews.llvm.org/D127880
2022-06-16 11:14:21 -07:00
Craig Topper 6716195cd7 [RISCV] Merge TIED_TU and TIED instructions for VWADD_W/VWSUB_W by using policy operand.
This removes one of the uses of ForceTailUndisturbed.
2022-06-16 10:06:11 -07:00
Philip Reames 6ed81ec164 [RISCV] Reorder function definitions to reduce upcoming diff [nfc] 2022-06-16 09:25:27 -07:00
Craig Topper 912a5172f8 [RISCV] Use TAIL_AGNOSTIC in riscv_fma_vl patterns.
We may eventually need tail undisturbed patterns, but we will need
a policy operand on the ISD node to communicate it.
2022-06-16 09:09:36 -07:00
Philip Reames 27c61d033f [RISCV] Split DemandedField logic in advance of reuse in dataflow [nfc]
This change just moves some code around, and extracts out a helper function expected to be useful when reusing the demanded field logic in the forward dataflow.
2022-06-16 08:49:41 -07:00
Philip Reames 37fa5850f1 [RISCV] Move getSEWLMULRatio out of VSETVLIInfo [nfc] 2022-06-16 08:40:20 -07:00
Craig Topper b34e3f40e7 [RISCV] Use TAIL_UNDISTURBED_MASK_UNDISTURBED for riscv_slidedown_vl unless the merge op is undef.
If the merge operand isn't undef we need to be using tail undisturbed.

Turns out all of our uses of riscv_slidedown_vl use undef so this
doesn't affect any tests.
2022-06-16 08:35:27 -07:00
Philip Reames 4a3e46115a [RISCV] Extend demanded field transform in InsertVSETVLI to VTYPE subfields
The motivating case, and the only one actually enabled by this patch, is a load or store followed by another op with the same SEW/LMUL ratio.

As an example, consider:

define void @test1(ptr %in, ptr %out) {
entry:
  %0 = load <8 x i16>, ptr %in, align 2
  %1 = sext <8 x i16> %0 to <8 x i32>
  store <8 x i32> %1, ptr %out, align 4
  ret void
}

Without this patch, we get:

	vsetivli	zero, 8, e16, mf4, ta, mu
	vle16.v	v8, (a0)
	vsetvli	zero, zero, e32, mf2, ta, mu
	vsext.vf2	v9, v8
	vse32.v	v9, (a1)
	ret

Whereas with the patch we get:

	vsetivli	zero, 8, e32, mf2, ta, mu
	vle16.v	v8, (a0)
	vsext.vf2	v9, v8
	vse32.v	v9, (a1)
	ret

We have rewritten the first vsetvli and thus removed the second one.
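
The "same SEW/LMUL ratio" condition in this example can be checked by hand. The sketch below is illustrative only: `LMul` and `sewLMulRatio` are hypothetical names, not the helpers used in RISCVInsertVSETVLI.cpp.

```cpp
#include <cassert>

// LMUL expressed as a fraction Num/Den, e.g. mf4 -> {1, 4}, m2 -> {2, 1}.
struct LMul {
  unsigned Num;
  unsigned Den;
};

// SEW/LMUL ratio. For a fixed VLEN, VLMAX = VLEN / ratio, so two vtypes with
// the same ratio yield the same VL for the same AVL.
unsigned sewLMulRatio(unsigned SEW, LMul L) {
  assert(L.Num != 0 && (SEW * L.Den) % L.Num == 0 && "ratio must be integral");
  return SEW * L.Den / L.Num;
}
```

Both `e16, mf4` and `e32, mf2` give a ratio of 64, so VL is unchanged between the two vtypes. Since `vle16.v` encodes its element width in the opcode, the load should behave the same under either setting, which is what allows the first vsetivli to be rewritten.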

As is strongly hinted by the code structure and TODOs, I am planning on commoning this with all (or nearly all?) of the cases from isCompatible used in the forward data flow. This will be done in a series of follow-up changes - some NFC reworks, and some reviewed optimization extensions.

Differential Revision: https://reviews.llvm.org/D127780
2022-06-16 08:01:27 -07:00
Guillaume Chatelet 412c788ab0 [NFC][Alignment] Use Align in MCAlignFragment 2022-06-15 12:31:00 +00:00
Kito Cheng 687e56614f [RISCV] Fixing undefined physical register issue when subreg liveness tracking enabled.
RISC-V expands a register tuple spill into a series of register spills after
the register allocation phase, during pseudo instruction expansion. However, part
of the register tuple might still be undefined at the spill, and the machine
verifier will complain that the spill instruction is using an undefined physical
register.

The optimal solution would be to do liveness analysis and not emit spills
and reloads for those undefined parts, but accurate liveness info at that point
is not so easy to get.

So the suboptimal solution is to still spill and reload those undefined parts,
but add an implicit use of the super register to the spill instructions. The
machine verifier then only reports use of an undefined physical register if
the whole super register is undefined. This behavior is also
documented in MachineVerifier::checkLiveness[1].

Example demonstrating what happened:

```
  v10m2 = xxx
  # v12m2 not defined yet
  PseudoVSPILL2_M2 v10m2_v12m2
  ...
```

After expansion:
```
  v10m2 = xxx
  # v12m2 not defined yet
  # Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r
  VS2R_V v10m2
  VS2R_V v12m2 # Use undef reg!
```

What this patch did:
```
  v10m2 = xxx
  # v12m2 not defined yet
  # Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r
  VS2R_V v10m2 implicit v10m2_v12m2
  # Use undef reg (v12m2), but v10m2_v12m2 isn't totally undef, so
  # that's OK.
  VS2R_V v12m2 implicit v10m2_v12m2
```

[1] https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/MachineVerifier.cpp#L2016-L2019

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127642
2022-06-15 16:23:39 +08:00
Yeting Kuo 9096a52566 [RISCV] Teach vsetvli insertion to not insert redundant vsetvli right after VLEFF/VLSEGFF.
The VSETVLIInfo right after a VLEFF/VLSEGFF is currently unknown, since these
instructions modify VL. An unknown VSETVLIInfo forces a VSET(I)VLI to be inserted
before the next vector operation. However, the vector operation following a
VLEFF/VLSEGFF may not need a VSET(I)VLI if it uses the same VTYPE and the
resulting vl of the VLEFF/VLSEGFF.

Take the C code below as an example:

  vint8m4_t vec_src1 = vle8ff_v_i8m4(str1, &new_vl, vl);
  vbool2_t mask1 = vmseq_vx_i8m4_b2(vec_src1, 0, new_vl);

vsetvli insertion adds a redundant vsetvli for it.

Assembly result:
  vsetvli a2,a2,e8,m4,ta,mu
  vle8ff.v v28,(a0)
  csrr a3,vl ; redundant
  vsetvli zero,a3,e8,m4,ta,mu ; redundant
  vmseq.vi v25,v28,0

After D126794, VLEFF/VLSEGFF has a def holding the value of VL. The patch
considers there to be a ghost vsetvli right after each VLEFF/VLSEGFF. The ghost
VSET(I)VLI uses the vl output of the VLEFF/VLSEGFF as its AVL and the same VTYPE
as the VLEFF/VLSEGFF. The ghost vsetvli must be redundant, and we can use it to
get the VSETVLIInfo right after the VLEFF/VLSEGFF.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127576
2022-06-15 13:58:40 +08:00
wangpc 8910349e43 [RISCV][NFC] Set default value for BaseInstr in RISCVVPseudo
Since almost all pseudos have the same form of BaseInstr, we
can just set it as the default value to save some lines.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127632
2022-06-15 10:59:45 +08:00
Craig Topper 5ae3f65cfa [RISCV] Replace uses of VLOpFrag in VLMax patterns with srcvalue.
These are on inner nodes and we're dropping the captured $vl anyway.
2022-06-14 19:19:35 -07:00
Philip Reames facb96584e [RISCV] Minor code/comment improvement in prepass of InsertVSETVLI [nfc] 2022-06-14 16:18:11 -07:00
Saleem Abdulrasool 1582bcd003 RISCV: handle 64-bit PCREL data relocations
We would previously fail to handle 64-bit PC-relative relocations on
RISCV.  This was exposed by trying to build with
`-fprofile-instr-generate`.

The original changes restricted the relocation handling to the text
segment as the paired relocations are undesirable in at least the debug
and .eh_frame sections.  We now make this explicit to handle the general
case for the data relocations as well.

It would be preferable to use `R_RISCV_n_PCREL` when available to avoid
an extra relocation.

Differential Revision: https://reviews.llvm.org/D127549
Reviewed By: luismarques, MaskRay

Fixes: #55971
2022-06-14 21:39:16 +00:00
Philip Reames c67c4133ac [RISCV] Split out transfer function explicitly in VSETVLI insertion dataflow [nfc]
In an effort to make this code easier to read and extend, this splits out helper functions for the transfer function of the data flow. Due to the other results computed during the phases, we can't completely abstract away everything, but we can abstract the actual state transitions.

The motivation here is the following upcoming changes:
* The fault first load patch - already approved, this will be rebased over - adds another case into the transferAfter path.
* An upcoming patch to fold the local prepass back into the main algorithm greatly complicates the transferBefore logic.

Differential Revision: https://reviews.llvm.org/D127761
2022-06-14 14:07:15 -07:00
Philip Reames 44a0a558dc [RISCV] Split out subfields in InsertVSETVLI's demanded fields analysis [nfc]
At the moment, this just gets the infrastructure in place.  Following changes will start using this in non-trivial ways.
2022-06-14 11:35:24 -07:00
Philip Reames 52b166c0de [RISCV] Split out getEEWForLoadStore [nfc]
Mostly about allowing reuse in an upcoming patch, but also makes the code slightly easier to follow.
2022-06-14 10:10:43 -07:00
Philip Reames 7659dc6cdd [RISCV] simplify emitVSETVLIs handling of vsetvli xN, phi(), vtype case [NFC]
This is possibly somewhat subjective, but having an explicitly named flag to track the property required and code structure that more closely matches phase 1/2 of the dataflow seems much easier to read.

Differential Revision: https://reviews.llvm.org/D126893
2022-06-14 08:00:24 -07:00
Craig Topper 17457be1c3 [RISCV] Fix use of texternalsym in output pattern where input was tglobaladdr. NFC
I don't think the name used in the output pattern is used to control
anything about the isel table emission, but it should match the input.
2022-06-13 15:42:42 -07:00
Craig Topper e4062522d3 [RISCV] Disable matchSplatAsGather for i1 vectors to prevent creating illegal nodes.
We were incorrectly creating a VRGATHER node with i1 vector type. We
could support this by promoting the mask to i8 and truncating it, but
for now I want to prevent the crash.

Fixes PR56007.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127681
2022-06-13 13:41:39 -07:00
Craig Topper bb1a52aa8b Recommit "[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates."
With fix for sanitizer build bot failure.
2022-06-13 11:35:44 -07:00
Mitch Phillips 9d99870590 Revert "[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates."
This reverts commit 8bbcb98848.

Broke the UBSan bot. More details in https://reviews.llvm.org/D127376.
2022-06-13 10:16:28 -07:00
Philip Reames aaeb958ced [RISCV] Mutate instruction after computing transfer rule in InsertVSETVLI [nfc]
If we defer the mutation of the instruction, we can add the assert discussed in D126921.  Once we do that, the API becomes subject to revision - but let's do that in a separate change.
2022-06-13 09:08:25 -07:00
Craig Topper cef03e3dcd [RISCV] Move creation of constant pools from isel to lowering.
This simplifies the isel code by removing the manual load creation.
It also improves our ability to use 0 strided loads for vector splats.

There is an assumption here that Mask and ShiftedMask constants are
cheap enough that they don't become constant pool loads so that our
isel optimizations involving And still work. I believe those constants
are 3 instructions in the worst case.

The rv64zbp-intrinsic.ll change is a regression caused by the expansion of
intrinsics to RISCVISD also occurring during lowering. So the optimizations
were only happening during the last DAGCombine, which can't see through the
load. I believe we can fix this test by implementing
TargetLowering::getTargetConstantFromLoad for RISC-V or by adding the intrinsic
to computeKnownBitsForTargetNode to enable earlier DAG combine. Since Zbp is not
a ratified extension, I don't view these as blocking this patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127520
2022-06-13 09:07:57 -07:00
Craig Topper 052536b923 [RISCV] Use isShiftedInt to improve readability. NFC 2022-06-12 21:04:45 -07:00
Hubert Tong 775a22e32a [NFC] Remove unused variable `MF`
https://reviews.llvm.org/D127583 removed the only use of this variable
and broke builds with warnings-as-errors.
2022-06-12 16:31:55 -04:00
Craig Topper d63b66840f [RISCV] Move some methods out of RISCVInstrInfo and into RISCV namespace.
These methods don't access any state from RISCVInstrInfo. Make them
free functions in the RISCV namespace.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D127583
2022-06-12 10:47:21 -07:00
Fangrui Song adf4142f76 [MC] De-capitalize SwitchSection. NFC
Add SwitchSection to return switchSection. The API will be removed soon.
2022-06-10 22:50:55 -07:00
Philip Reames 536095a27c [RISCV] Refine costs for i1 reductions
Our actual lowering for i1 reductions uses ctpop combined with possibly a vector negate and possibly a logic op afterwards. I believe ctpop to be low cost on all reasonable hardware.

The default costing implementation here was returning quite inconsistent costs. and/or were returning very high costs (because we seem to think moving into scalar registers is very expensive?) and others were returning lower but still too high (because of the assumed tree reduce strategy). While we should probably improve the generic costing strategy for i1 vectors, let's start by fixing the immediate problem.
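
The lowering strategy described above (a population count, optionally preceded by a negate and followed by a logic op) can be sketched on a scalar bit mask. This is a simplified model, not the backend code: the mask is held in a 64-bit integer, and `countActive`, `reduceOr`, `reduceAnd` and `reduceXor` are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Number of set bits among the low NumLanes bits of Mask (models vcpop.m).
unsigned countActive(uint64_t Mask, unsigned NumLanes) {
  assert(NumLanes >= 1 && NumLanes <= 64 && "lane count out of range");
  uint64_t LaneBits =
      NumLanes == 64 ? ~uint64_t{0} : (uint64_t{1} << NumLanes) - 1;
  return static_cast<unsigned>(std::bitset<64>(Mask & LaneBits).count());
}

// OR reduction: at least one lane is set.
bool reduceOr(uint64_t Mask, unsigned NumLanes) {
  return countActive(Mask, NumLanes) != 0;
}

// AND reduction: negate the mask first, then check that no lane is set.
bool reduceAnd(uint64_t Mask, unsigned NumLanes) {
  return countActive(~Mask, NumLanes) == 0;
}

// XOR reduction: parity of the count, i.e. a logic op after the popcount.
bool reduceXor(uint64_t Mask, unsigned NumLanes) {
  return (countActive(Mask, NumLanes) & 1) != 0;
}
```

Each reduction costs one popcount plus at most two cheap operations, which is the intuition behind assigning these reductions a low cost.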

Differential Revision: https://reviews.llvm.org/D127511
2022-06-10 13:21:52 -07:00
Philip Reames f7bb691d61 [RISCV] Implement isElementTypeLegalForScalableVector TTI hook
This brings us into alignment with AArch64, and in the process fixes a compiler crash bug in uniform store handling in the vectorizer.

Before the recent invalid cost bailout work, this would have also avoided crashes on invalid costs in some cases. I honestly think the vectorizer should gracefully bail out on uniform stores it can't use a scatter for, but it doesn't, so let's take the path of least resistance here. It's also possible that there are other vectorizer bugs AArch64 isn't seeing because of this hook; we don't want to be finding them either.

Differential Revision: https://reviews.llvm.org/D127514
2022-06-10 13:20:58 -07:00
Craig Topper 08ea27bf13 [RISCV] Don't require loop simplify form in RISCVGatherScatterLowering.
We need a preheader and a single latch, but we don't need a dedicated
exit.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127513
2022-06-10 13:00:20 -07:00
Shao-Ce SUN 117e10304b [RISCV] move `isFaultFirstLoad` into `RISCVInstrInfo`
Fix build errors in D126794

```
ld.lld: error: undefined symbol: llvm::MachineInstr::getNumExplicitDefs() const
>>> referenced by RISCVBaseInfo.cpp
>>>               RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a

ld.lld: error: undefined symbol: llvm::MachineInstr::findRegisterDefOperandIdx(llvm::Register, bool, bool, llvm::TargetRegisterInfo const*) const
>>> referenced by RISCVBaseInfo.cpp
>>>               RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
```

Reviewed By: fakepaper56, craig.topper

Differential Revision: https://reviews.llvm.org/D127477
2022-06-11 00:27:53 +08:00
Shao-Ce SUN 93116374e7 Revert "[RISCV] move `isFaultFirstLoad` into `RISCVInstrInfo`"
This reverts commit e018e493c1.

There are some problems with this commit,
related revision: https://reviews.llvm.org/D127477
2022-06-11 00:03:04 +08:00
Craig Topper e91051184c [RISCV] Mark FSIN and other math functions as Expand for scalable vectors.
This prevents them from being assumed legal by the cost model.

This matches what is done for AArch64 SVE.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123799
2022-06-10 08:40:07 -07:00
Shao-Ce SUN e018e493c1 [RISCV] move `isFaultFirstLoad` into `RISCVInstrInfo`
Fix build errors in D126794

```
ld.lld: error: undefined symbol: llvm::MachineInstr::getNumExplicitDefs() const
>>> referenced by RISCVBaseInfo.cpp
>>>               RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a

ld.lld: error: undefined symbol: llvm::MachineInstr::findRegisterDefOperandIdx(llvm::Register, bool, bool, llvm::TargetRegisterInfo const*) const
>>> referenced by RISCVBaseInfo.cpp
>>>               RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
```

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D127477
2022-06-10 21:03:47 +08:00
Yeting Kuo f68cad9087 [RISCV] Lower VLEFF/VLSEGFF SDNodes to MachineInstrs with VL outputs.
The patch is a replacement of D125199. The concern with giving PseudoReadVL a
vtype is that the same VLEFF/VLSEGFF vtype has to be computed in two different
places, DAGToDAG and InsertVSETVLI. A VLEFF/VLSEGFF MI with a VL output can still
provide the vtype of the VLEFF/VLSEGFF to the users of its VL.

The patch names the new pseudos after the original VLEFF/VLSEGFF names with a
"_VL" suffix and expands them in the RISCVInsertVSETVLI pass.

This patch also reverts commit 4537aae0d5,
"[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.".

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126794
2022-06-10 13:57:10 +08:00
Philip Reames 28be4b7454 [RISCV] Simplify InstrInfo access in doPeepholeMaskedRVV [nfc] 2022-06-09 17:02:40 -07:00
Craig Topper 8bbcb98848 [RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates.
For an addition with simm14 and simm15 immediates with 2 or 3 trailing zero bits,
we can use a shXadd instruction and an addi to do the addition.

This patch teaches RISCVMergeBaseOffset to see through this pattern.
I don't think the sh1add case occurs because we use two addis for that,
but I implemented it for completeness.
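
To make the immediate decomposition concrete, here is a small sketch of how an out-of-range immediate with trailing zero bits splits into an ADDI operand and a shift amount. `ShXAddSplit` and `splitForShXAdd` are hypothetical names for illustration; the pass itself matches the already-selected instructions rather than recomputing this.

```cpp
#include <cassert>
#include <cstdint>

bool isSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Imm decomposed as Small << Shift, with Small a valid ADDI immediate.
struct ShXAddSplit {
  int64_t Small;  // simm12 materialized with: addi t, zero, Small
  unsigned Shift; // 2 selects sh2add, 3 selects sh3add
};

// Returns true and fills Out when X + Imm can be computed as
//   addi   t, zero, Small
//   shNadd rd, t, X          // rd = (t << N) + X
// Returns false when Imm fits a single ADDI or has no such decomposition.
bool splitForShXAdd(int64_t Imm, ShXAddSplit &Out) {
  if (isSImm12(Imm))
    return false;
  for (unsigned Shift = 2; Shift <= 3; ++Shift) {
    int64_t Scale = int64_t{1} << Shift;
    if (Imm % Scale == 0 && isSImm12(Imm / Scale)) {
      Out = {Imm / Scale, Shift};
      assert(Out.Small * Scale == Imm && "decomposition must be exact");
      return true;
    }
  }
  return false;
}
```

For example, 8188 splits into 2047 with a shift of 2, and 16376 splits into 2047 with a shift of 3, while 8189 has no trailing zero bits and is rejected.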

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127376
2022-06-09 16:07:35 -07:00
Kito Cheng 4b11f90903 [RISCV] Fix missing stack pointer recover
To make sure the stack pointer is correct throughout the EH region,
we also need to restore the stack pointer from the frame pointer if we
don't reserve stack space within the prologue/epilogue for outgoing variables.
Normally, checking whether a variable-sized object is present
is enough, but we also don't reserve that space in the prologue/epilogue when
there are vector objects on the stack.

Example to show what happened:
```
try {
  sp adjust for outgoing args. // 1. Sp changed.
  func_call  // 2. Exception raised
  sp restore // Oh, not restored
} catch {
  // 3. And now we are here.
}

// 4. Prepare to return!, restore return address from stack, but...sp is wrong.
// 5. Screw up!
```

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D126861
2022-06-09 23:38:50 +08:00
Philip Reames 0e29a80fdc [RISCV] Add cost model for reverse shuffle
The majority of the cost appears to be forming the indices vector.

Differential Revision: https://reviews.llvm.org/D127141
2022-06-09 07:21:40 -07:00
Craig Topper c739088af5 [RISCV] Fix 80 column violations in RISCVInsertVSETVLI.cpp. NFC
I think these were likely introduced in the recent work done to
this pass.
2022-06-08 18:58:48 -07:00
Craig Topper 209c07d486 [RISCV] Add debug message that should have been in D126843.
For consistency with the other messages in this file.
2022-06-08 16:46:22 -07:00
Craig Topper e4ba24c17d [RISCV] Support (addi (addi globaladdr, C1), C2) in RISCVMergeBaseOffset.
Adds with immediates in the range [-4096, -2049] or [2048, 4094] are
converted to two ADDIs. Teach RISCVMergeBaseOffset to recognize this
pattern as well.
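
A minimal sketch of the two-ADDI split follows. Each ADDI immediate is a signed 12-bit value, so two of them can sum to at most 2047 + 2047 = 4094 and at least -2048 + -2048 = -4096. The function name and the choice of which half saturates are assumptions for illustration, not necessarily what isel emits.

```cpp
#include <cassert>
#include <cstdint>

bool isSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Splits Imm into two simm12 values First + Second == Imm.
// Returns false if one ADDI suffices or if two ADDIs cannot cover Imm.
bool splitIntoTwoAddis(int64_t Imm, int64_t &First, int64_t &Second) {
  if (isSImm12(Imm))
    return false; // a single ADDI is enough
  if (Imm < -4096 || Imm > 4094)
    return false; // out of reach for two signed 12-bit immediates
  First = Imm < 0 ? -2048 : 2047; // saturate the first ADDI
  Second = Imm - First;           // remainder goes in the second ADDI
  assert(isSImm12(Second) && "remainder must fit a signed 12-bit immediate");
  return true;
}
```

With this split, 2048 becomes 2047 + 1 and -4096 becomes -2048 + -2048, while 4095 is rejected because it would need a second immediate of 2048.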

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D126843
2022-06-08 08:20:37 -07:00
Craig Topper 33f4da2455 [RISCV] Support LUI+ADDIW in RISCVMergeBaseOffsetOpt::matchLargeOffset.
LUI+ADDIW always produces a simm32. This allows us to always
fold it into a global offset.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D126729
2022-06-08 08:19:21 -07:00
Philip Reames 1ea99328b4 [RISCV] Untangle instruction properties from VSETVLIInfo [NFC]
The abstract state used in the data flow should not know anything about the instructions which produced the abstract states. Instead, when comparing two states, we can simply use information about the machine instr at that time.

In the old design, basically any use of the instruction flags on the current (as opposed to a "Require" - aka upcoming state) would be a bug. We don't seem to actually have any such bugs, but we can make this much more obvious with code structure.

Differential Revision: https://reviews.llvm.org/D126921
2022-06-08 08:09:59 -07:00
Shao-Ce SUN 862f30a428 [RISCV] Add ISD::EH_DWARF_CFA
Based on D24038.
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D126181
2022-06-08 22:03:30 +08:00
Craig Topper 0c66deb498 [RISCV] Scalarize gather/scatter on RV64 with Zve32* extension.
i64 indices aren't supported on Zve32*. Scalarize gathers to prevent
generating illegal instructions.

Since InstCombine will aggressively canonicalize GEP indices to
pointer size, we're pretty much always going to have an i64 index.

Trying to predict when SelectionDAG will find a smaller index from
the TTI hook used by the ScalarizeMaskedMemIntrinPass seems fragile.
To optimize this we probably need an IR pass to rewrite it earlier.

Test RUN lines have also been added to make sure the strided load/store
optimization still works.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127179
2022-06-07 08:07:50 -07:00
Matt Arsenault cc5a1b3dd9 llvm-reduce: Add cloning of target MachineFunctionInfo
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.

Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
2022-06-07 10:14:48 -04:00
Philip Reames 3fa5876216 [RISCV] Reorganize getShuffleCost to make it more clear what's going on [nfc] 2022-06-06 10:11:58 -07:00
yanming bc93d51d36 [NFC][RISCV][format] Blank line between functions, remove unnecessary semicolon. 2022-06-06 15:38:14 +08:00
yanming 8d9d8f866a [RISCV] Define risc-v's own register class to model FP Register.
The default RegisterClass is not enough to model RISC-V registers.
We define RISC-V's own register class to model FP registers.
This helps to better estimate register pressure in the loop vectorizer.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D126854
2022-06-06 14:43:52 +08:00
Fangrui Song 77e300ffdf [MC] Change EndOfStatement "unexpected tokens in .xxx directive " to "expected newline" 2022-06-05 15:11:01 -07:00
LiaoChunyu f14d18c7a9 [RISCV] Add more patterns for FNMADD
D54205 handles fnmadd: -rs1 * rs2 - rs3
This patch adds fnmadd: -(rs1 * rs2 + rs3) (with the nsz flag on the FMA)
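
The two forms differ only in the sign of a zero result, which is why the second pattern needs nsz (no signed zeros). The sketch below models both with `std::fma` on doubles; the function names are hypothetical and this is a host-side illustration, not a test from the patch.

```cpp
#include <cassert>
#include <cmath>

// What the fnmadd instruction computes: -(rs1 * rs2) - rs3, rounded once.
double fnmaddExact(double Rs1, double Rs2, double Rs3) {
  return std::fma(-Rs1, Rs2, -Rs3);
}

// The newly matched source pattern: a negated FMA, -(rs1 * rs2 + rs3).
double negatedFma(double Rs1, double Rs2, double Rs3) {
  return -std::fma(Rs1, Rs2, Rs3);
}

// True when X and Y are both zero but carry opposite signs.
bool differOnlyBySignOfZero(double X, double Y) {
  return X == 0.0 && Y == 0.0 && std::signbit(X) != std::signbit(Y);
}
```

With rs1 = 1, rs2 = 1, rs3 = -1 under the default round-to-nearest mode, the instruction form yields +0.0 while the negated FMA yields -0.0. The values compare equal, so the rewrite is only valid when the sign of zero may be ignored.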

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126852
2022-06-04 12:31:45 +08:00
Craig Topper cc3bd43533 [RISCV] Support LUI+ADDIW in doPeepholeLoadStoreADDI.
This fixes an inconsistency between RV32 and RV64. Still considering
trying to do this peephole during isel, but wanted to fix the
inconsistency first.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126986
2022-06-03 18:06:56 -07:00
Craig Topper 170c550ca8 [RISCV] Use SelectionDAG::isBaseWithConstantOffset in scalar load/store address matching.
Test changes are because isBaseWithConstantOffset uses computeKnownBits
and that is able to see that an earlier AND instruction guaranteed
alignment so that we can treat an OR as an ADD.
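
A small sketch of why a known-aligned base lets an OR be treated as an ADD: if every set bit of the offset is known to be zero in the base, no carries can occur. `orIsAdd`, `alignDown8` and the explicit known-zero mask are hypothetical stand-ins for what computeKnownBits provides; this is not the SelectionDAG implementation.

```cpp
#include <cassert>
#include <cstdint>

// KnownZero has a 1 for every bit of the base that is known to be 0.
// The OR behaves like an ADD when the offset only touches such bits.
bool orIsAdd(uint64_t KnownZero, uint64_t Offset) {
  return (Offset & ~KnownZero) == 0;
}

// An earlier AND that guarantees 8-byte alignment (low 3 bits are zero).
uint64_t alignDown8(uint64_t P) { return P & ~uint64_t{7}; }

// Computes Base + Offset using OR, checking the precondition first.
uint64_t addViaOr(uint64_t Base, uint64_t KnownZero, uint64_t Offset) {
  assert((Base & KnownZero) == 0 && "KnownZero must describe Base");
  assert(orIsAdd(KnownZero, Offset) && "offset overlaps possibly-set bits");
  return Base | Offset;
}
```

After `alignDown8`, the low three bits are known zero, so an offset of 4 can be folded as a load/store offset, whereas an offset of 8 cannot be proven carry-free from that fact alone.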

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126970
2022-06-03 10:55:28 -07:00
Craig Topper 4402852002 [RISCV] Reduce scalar load/store isel patterns to a single ComplexPattern. NFCI
Previously we had 3 different isel patterns for every scalar load/store
instruction.

This reduces them to a single ComplexPattern that returns the Base
and Offset, or an offset of 0 if no offset was identified.

I've done a similar thing for the 2 isel patterns that match add/or
with FrameIndex and immediate. Using the offset of 0, I was also
able to remove the custom handler for FrameIndex. Happy to split that
to another patch.

We might be able to enhance in the future to remove the post-isel
peephole or the special handling for ADD with constant added by D126576.

A nice side effect is that this removes nearly 3000 bytes from the isel
table.

Differential Revision: https://reviews.llvm.org/D126932
2022-06-03 09:00:17 -07:00
Craig Topper 1d67adbfbf [RISCV] Give CSImm12MulBy4 PatLeaf priority over CSImm12MulBy8. NFC
The immediate range check for CSImm12MulBy8 included some values
covered by CSImm12MulBy4. I assume CSImm12MulBy4 had priority due
to pattern order in the td file, but this makes the priority
explicit in the predicate.
2022-06-02 20:51:14 -07:00
Craig Topper dbead2388b [RISCV] Add custom isel for (add X, imm) used by load/stores.
If the imm is out of range for an ADDI, we will materialize it in
a register using multiple instructions. If the ADD is used by a
load/store, doPeepholeLoadStoreADDI can try to pull an ADDI from
the constant materialization into the load/store offset. This only
works if the ADD has a single use, otherwise the peephole would have
to rebuild multiple nodes.

This patch instead tries to solve the problem when the add is selected.
We check that the add is only used by loads/stores and if it is
we will select it to (ADDI (ADD X, Imm-Lo12), Lo12). This will enable
the simple case in doPeepholeLoadStoreADDI that can bypass an ADDI
used as a pointer. As a result we can remove the more complicated
peephole from doPeepholeLoadStoreADDI.
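
The split into `Imm-Lo12` and `Lo12` can be sketched as follows. `SplitImm` and `splitOffset` are hypothetical names; the point is only that the low part is the sign-extended low 12 bits, so the high part absorbs a carry when bit 11 is set.

```cpp
#include <cassert>
#include <cstdint>

// Imm split as Hi + Lo12, where Lo12 fits a load/store or ADDI immediate.
struct SplitImm {
  int64_t Hi;   // Imm - Lo12, materialized in a register and added to X
  int64_t Lo12; // sign-extended low 12 bits of Imm
};

SplitImm splitOffset(int64_t Imm) {
  int64_t Lo12 = Imm & 0xFFF;
  if (Lo12 >= 0x800) // bit 11 set: the 12-bit field is negative
    Lo12 -= 0x1000;
  SplitImm Result{Imm - Lo12, Lo12};
  assert(Result.Hi + Result.Lo12 == Imm && "split must be exact");
  return Result;
}
```

For 0x12345 this gives Hi = 0x12000 and Lo12 = 0x345. For 0x12FFF it gives Hi = 0x13000 and Lo12 = -1. In both cases the low 12 bits of Hi are zero, and Lo12 is what the simple peephole can then move into the load/store offset.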

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126576
2022-06-02 13:45:32 -07:00
Philip Reames 76ac916d63 [RISCV] Inline one copy of needVSETVLI into the other [NFC]
Calling the non-MI version directly was unsound (as fixed in dcdb0bf2), so remove that version to decrease likelihood of future mistakes.
2022-06-02 13:06:18 -07:00
Philip Reames dcdb0bf25b [RISCV] Fix an inconsistency with compatible load/store handling
Once we've computed the incoming predecessor state, we should use the same compatibility check with knowledge of MI as we did in phase 2 in order to be consistent across all phases.

Differential Revision: https://reviews.llvm.org/D126574
2022-06-02 08:03:51 -07:00
Craig Topper 909a78b3a4 [RISCV] Use MachineRegisterInfo::use_instr_begin instead of use_begin+getParent. NFCI 2022-06-01 15:37:48 -07:00
Craig Topper aeb27f133a [RISCV] Fix i64<->f64 and i32<->f32 bitcasts with VLS vectors enabled.
We enable a custom handler to optimize conversions between scalars
and fixed vectors. Unfortunately, the custom handler picks up scalar
to scalar conversions as well. If the scalar types are both legal,
we wouldn't match any of the fixed vector cases and would return SDValue()
causing the LegalizeDAG to expand the bitcast through memory.

This patch fixes this by checking if it's a scalar to scalar conversion
and returns `Op` if both types are legal.

Differential Revision: https://reviews.llvm.org/D126739
2022-06-01 08:13:49 -07:00
Craig Topper 1b2de79ff4 [RISCV] Use two ADDIs to do some stack pointer adjustments.
If the adjustment doesn't fit in 12 bits, try to break it into
two 12 bit values before falling back to movImm+add/sub.

This is based on a similar idea from isel.

Reviewed By: luismarques, reames

Differential Revision: https://reviews.llvm.org/D126392
2022-05-31 10:25:28 -07:00
Craig Topper 80c4cf6369 [RISCV] Fix a few corner case bugs in RISCVMergeBaseOffsetOpt::matchLargeOffset
The immediate for LUI is stored as a 20-bit unsigned value. We need
to sign extend it after shifting by 12 to match the instruction
behavior.
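
A short sketch of the sign extension described above, written with plain integer arithmetic. `luiResult` is a hypothetical helper that models the value LUI writes on RV64 from its stored 20-bit operand; it is not the code in matchLargeOffset.

```cpp
#include <cassert>
#include <cstdint>

// Value produced by LUI on RV64 for a 20-bit unsigned operand: the operand
// is shifted left by 12 and the 32-bit result is sign-extended to 64 bits.
int64_t luiResult(uint32_t Imm20) {
  assert(Imm20 <= 0xFFFFF && "LUI immediate is 20 bits");
  int64_t Value = static_cast<int64_t>(Imm20) << 12; // 0 .. 0xFFFFF000
  if (Value & (int64_t{1} << 31)) // bit 31 is the sign bit of the result
    Value -= int64_t{1} << 32;
  return Value;
}
```

Without the sign extension, an operand of 0xFFFFF would be treated as the offset 0xFFFFF000 rather than -4096, which is the kind of corner case this fix addresses.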

If we find an LUI+ADDI on RV64, it means the constant isn't a
simm32. If it was, we would have emitted LUI+ADDIW from constant
materialization. Make sure the constant is a simm32 before folding.
This appears to match gcc.

A future patch will add support for LUI+ADDIW on RV64.
2022-05-31 09:50:54 -07:00
Fraser Cormack 5a2e640eb7 [RISCV][NFC] Adjust some comments in RISCVInsertVSETVLI
Capitalize the first letter of comments like the others, and a few other
tweaks.
2022-05-31 10:13:15 +01:00
eopXD 2cadf84fc8 [RISCV] Pass OptLevel to `RISCVDAGToDAGISel` correctly
Originally, `OptLevel` wasn't passed into the `MachineFunctionPass`.
This caused the default parameter of `SelectionDAGISel`, which is
`CodeGenOpt::Default`, to be passed in. OptLevelChanger captures the
optimization level from that parameter rather than from the value
within `TargetMachine`. As a result, the optimization level is
unintentionally overwritten if a value other than `CodeGenOpt::Default`
is passed.

This patch fixes this by passing the optimization level rather
than using the default value.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126641
2022-05-30 17:22:50 -07:00
Craig Topper 6a6cf2e28d [RISCV] isel (add (and X, 0x1FFFFFFFE), Y) as (SH1ADD_UW (SRLI X, 1), Y)
This pattern is what we get after DAG combine for C code like this.

short *ptr1, *ptr2, *ptr3;
unsigned diff = ptr1 - ptr2;
return ptr3[diff];
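
The equivalence behind this pattern can be checked with a scalar model. The sketch assumes the zero-extending Zba form `sh1add.uw` (rd = rs2 + (zext(rs1[31:0]) << 1)), since the mask 0x1FFFFFFFE also clears the bits above bit 32. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Model of sh1add.uw: zero-extend the low 32 bits of Rs1, shift left by one,
// then add Rs2.
uint64_t sh1addUW(uint64_t Rs1, uint64_t Rs2) {
  return Rs2 + ((Rs1 & 0xFFFFFFFFULL) << 1);
}

// The DAG before selection: (add (and X, 0x1FFFFFFFE), Y).
uint64_t maskedAdd(uint64_t X, uint64_t Y) {
  return (X & 0x1FFFFFFFEULL) + Y;
}

// The selected form: SRLI drops bit 0, the .uw shift-add masks and rescales.
uint64_t selectedForm(uint64_t X, uint64_t Y) {
  return sh1addUW(X >> 1, Y);
}
```

Shifting right by one discards bit 0, and the zero-extension keeps exactly bits 1 through 32 of X, so both forms agree for every X and Y.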

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126588
2022-05-29 18:24:07 -07:00
Philip Reames 85b4470035 [RISCV] Allow PRE of vsetvli involving non-1 LMUL
This is a follow up to address a review comment from D124869. When deciding whether to PRE a vsetvli, we can allow non-LMUL1 vsetvlis.

Differential Revision: https://reviews.llvm.org/D126563
2022-05-27 15:49:41 -07:00
Craig Topper b09e54541a [RISCV] Use template version of SignExtend64 for constant extends. NFC
We were inconsistent about which one we used.
2022-05-27 13:11:15 -07:00
Craig Topper d0f65eaa85 [RISCV] Remove unused variables. NFC 2022-05-27 12:13:45 -07:00
Craig Topper aaad507546 [RISCV] Return false from isOffsetFoldingLegal instead of reversing the fold in lowering.
When lowering GlobalAddressNodes, we were removing a non-zero offset and
creating a separate ADD.

It already comes out of SelectionDAGBuilder with a separate ADD. The
ADD was being removed by DAGCombiner.

This patch disables the DAG combine so we don't have to reverse it.
Test changes all look to be instruction order changes. Probably due
to different DAG node ordering.

Differential Revision: https://reviews.llvm.org/D126558
2022-05-27 11:05:18 -07:00
Fraser Cormack 3e450d9cbb [RISCV][NFC] Unify compatibility checks under one function
Split off from D125021.

We were duplicating logic across different phases. Since we want to
ensure a consistency of logic across phases for correctness, this patch
combines our multiple compatibility checks into one function to better
convey this.

Several methods were made const too.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126472
2022-05-27 11:21:54 +01:00
Fangrui Song cef377d75d [RISCV] Simplify code after D125905 2022-05-26 18:13:38 -07:00
Philip Reames 8a3b6ba756 [RISCV] Add a subtarget feature to enable unaligned scalar loads and stores
A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature.

There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to have actually been standardized, and b) isn't quite what we want here anyway due to the perf comment.

Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them.

Differential Revision: https://reviews.llvm.org/D126085
2022-05-26 15:25:47 -07:00
Craig Topper e9ac99b609 [RISCV] Simplify creation of IndexVT in lowerMaskedGather/lowerMaskedScatter. NFC
The scalar element width is not a factor in how ContainerVT is
determined. We don't need to check the relative size of VT and
IndexVT.
2022-05-26 13:13:32 -07:00
Philip Reames d58cc0839e [RISCV] reorganize getFrameIndexReference to reduce code duplication [nfc]
This change reorganizes the majority of frame index resolution into a two-step process.

    Step 1 - Select which base register we're going to use.
    Step 2 - Compute the offset from that base register.

The key point is that this allows us to share the step 2 logic for the SP case. This reduces the code duplication, and (I think) makes the code much easier to follow.

I also went ahead and added assertions into phase 2 to catch errors where we select an illegal base pointer. In general, we can't index from a base register to a stack location if that requires crossing a variable and unknown region. In practice, we have two such cases: dynamic stack realign and var sized objects. Note that crossing the scalable region is fine since while variable, it's a known variability which can be expressed in the offset.

Differential Revision: https://reviews.llvm.org/D126403
2022-05-26 09:44:58 -07:00
Philip Reames afe49934a6 [RISCV] Allow compatible VTYPE in AVL Reg Forward cases
During insertion of VSETVLI, we have two related bits of code which decide whether we can reuse a previous vsetvli result. As was pointed out in the original review, these cases can allow any prior state for which we know that VL is the same for any value of AVL.

This was originally separated out of a desire for separate tests and review. As it turns out, finding a test case for this has been quite challenging. Most of the cases I tried, we manage to already get through other chains of logic. We do have one correct test change, but that only exercises one of the two changes.

Differential Revision: https://reviews.llvm.org/D126400
2022-05-26 08:50:35 -07:00
Fraser Cormack 2c9983f530 [RISCV][NFC] Add braces to 'else' to match braced 'if' 2022-05-26 10:00:33 +01:00
Kito Cheng e45087fd53 [RISCV] Fix state persistence bugs (PR55548)
We didn't implement RISCVELFStreamer::reset, which caused some very strange
section output for the attribute section...just reference D15950 to see how
ARM implements that.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D125905
2022-05-26 16:09:00 +08:00
jacquesguan b271488e8b [RISCV] Replace ISD::FP_EXTEND and ISD::FP_ROUND with RVV VL op.
This patch tries to resolve the inconsistency between the direct and intermediate cast caused by D123975.
This patch replaces ISD::FP_EXTEND and ISD::FP_ROUND with RVV VL op in the lowering of FP scalable vector direct cast to unify with the intermediate cast.
And it also changes the FP widening pattern with the VL op.

Differential Revision: https://reviews.llvm.org/D125364
2022-05-26 02:17:31 +00:00
Philip Reames 1f06398e96 Reapply "[RISCV] Enable strict assertions in InsertVSETVLI data flow"
be2cb8 fixes the case which triggered the revert.  Reapply, and let's see if anything else falls out.

Original commit message:

These asserts are believed to hold after several recent miscompiles have been fixed.  If you see an assertion failure on this change, please toggle the default back and make sure you file a bug with a reproducer.  We may have as yet uncaught miscompiles lurking in this code.

Differential Revision: https://reviews.llvm.org/D125271
2022-05-25 11:18:55 -07:00
Philip Reames be2cb824d0 [riscv] Remove mutation of prior vsetvli from insertion dataflow
This moves mutation entirely out of the main algorithm.

The immediate trigger is that we hit another case of the same issue I thought we'd fixed in 72925d9. It turns out we hadn't considered the cross block case.

As a brief summary, the issue being fixed is that if we mutate a previous vsetvli in phase 3, there's a possibility that some later use of that vsetvli changes "compatibility". In the cross_block_mutate test, this later vsetvli occurs in another block (and is thus visit order dependent too!). This causes us to fail strict asserts. (To be explicit, the current on by default workaround should compensate. It's only when we turn that off that we have problems.)

Now, I want to explicitly call out an alternate workaround. We could leave the mutation in phase 3, and simply restrict it to the case where the previous vsetvli's GPR result is unused. That covers the case we've actually seen. (I'll note that codegen regressions with a simple form of this were significant. We might have to check specifically for the use outside block case to keep them reasonable, which complicates the workaround slightly.)

Personally, I'm at the point where I want the mutation pulled out just for robustness sake. I'm worried there's yet one more form of this bug we haven't thought about.

The other motivation for this change is that it does give us a couple of minor codegen wins. None appear to be hugely significant, but improvements never hurt right?

Differential Revision: https://reviews.llvm.org/D125270
2022-05-25 10:51:14 -07:00
Craig Topper 172149e98c [RISCV] Preserve fast math flags in lowerVPOp.
Update test to check MIR after finalize-isel instead of debug output.

This is of course not the only place we should preserve FMF, but
it's the most obvious one.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D126306
2022-05-25 09:16:07 -07:00
Philip Reames 2a3b6f2cba [RISCV] Hoist VSETVLI vlmax, vtype out of scalable loops
This is a straightforward extension of the PRE transform introduced in D124869 to handle the VLMAX case.

The test changes here look quite positive. This surprised me until I realized that all the tests are using @llvm.vscale to figure out the VLMAX, not the llvm.riscv.vsetvlmax intrinsic. If they'd used the latter, these would have been full redundancy cases and fully handled by the data flow. I'm not really sure if use of vscale here is representative or not. If it is, we should probably look at using VSETVLI to lower vscale rather than a raw read of vlenb and some math.

Differential Revision: https://reviews.llvm.org/D126338
2022-05-25 08:00:27 -07:00
Philip Reames dd336b6891 [RISCV] Restructure comment and add clarifying assert to getFrameIndexReference [NFC]
Differential Revision: https://reviews.llvm.org/D126088
2022-05-25 07:59:27 -07:00
Lewis Revill 29a5a7c6d4 [RISCV] Add pre-emit pass to make more instructions compressible
When optimizing for size, this pass searches for instructions that are
prevented from being compressed by one of the following:

1. The use of a single uncompressed register.
2. A base register + offset where the offset is too large to be
   compressed and the base register may or may not already be compressed.

In the first case, if there is a compressed register available, then the
uncompressed register is copied to the compressed register and its uses
replaced. This is only done if there are enough uses that code size
would be improved.

In the second case, if a compressed register is available, then the
original base register is copied and adjusted such that:

new_base_register = base_register + adjustment
base_register + large_offset = new_base_register + small_offset

and the uses of the base register are replaced with the new base
register. Again this is only done if there are enough uses for code size
to be improved.

This pass was authored by Lewis Revill, with large offset optimization
added by Craig Blackmore.

Differential Revision: https://reviews.llvm.org/D92105
2022-05-25 09:25:02 +01:00
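The base-register adjustment in the second case above can be sketched as a simple offset split. This is only an illustration under one assumption: the access is a compressed word load/store (`c.lw`/`c.sw`), whose offset is a zero-extended multiple of 4 in the range [0, 124]. The names `SplitOffset`, `fitsCompressedWord` and `splitLargeOffset` are hypothetical, and the actual pass may pick the adjustment differently (for instance to cover several uses at once).

```cpp
#include <cassert>
#include <cstdint>

struct SplitOffset {
  int64_t Adjustment;  // added once to the base register
  int64_t SmallOffset; // remaining offset, small enough to be compressed
};

// Assumed limit for a compressed word access: offsets are multiples of 4
// in [0, 124], i.e. bits [6:2] of the offset.
constexpr int64_t kMaxCompressedWordOffset = 124;

inline bool fitsCompressedWord(int64_t Off) {
  return Off >= 0 && Off <= kMaxCompressedWordOffset && Off % 4 == 0;
}

// Split LargeOffset so that
//   base_register + LargeOffset == (base_register + Adjustment) + SmallOffset
// with SmallOffset in the compressible range.
inline SplitOffset splitLargeOffset(int64_t LargeOffset) {
  assert(LargeOffset >= 0 && LargeOffset % 4 == 0 &&
         "sketch only handles non-negative, word-aligned offsets");
  const int64_t Small = LargeOffset & kMaxCompressedWordOffset; // bits [6:2]
  return {LargeOffset - Small, Small};
}
```

With an offset of 300, this sketch would give an adjustment of 256 and a small offset of 44, so every use of `base_register + 300` can be rewritten against the new base register.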
Craig Topper 66db5312bd [RISCV] Fix vnsrl/vnsra isel patterns that are dropping VL.
We were incorrectly using VLMax instead of the passed VL.

Reviewed By: khchen, reames

Differential Revision: https://reviews.llvm.org/D126319
2022-05-24 21:38:59 -07:00
Fraser Cormack fd93736657 [RISCV] Replace untested code with assert
We found untested code where negative frame indices were ostensibly
handled despite it being in a block guarded by !MFI.isFixedObjectIndex.

While the implementation of MachineFrameInfo::isFixedObjectIndex
suggests this is possible (i.e., if a frame index was more negative - less than the
number of fixed objects), I couldn't find any test in tree -- for any
target -- where a negative frame index wasn't also a fixed object
offset. I couldn't find a way of creating such an object with the
public MachineFrameInfo creation APIs. Even
MachineFrameInfo::getObjectIndexBegin starts counting at the negative
number of fixed objects, so such frame indices wouldn't be covered by
loops using the provided begin/end methods.

Given all this, an assert that any object encountered in the block is
non-negative seems reasonable.

Reviewed By: StephenFan, kito-cheng

Differential Revision: https://reviews.llvm.org/D126278
2022-05-25 05:03:53 +01:00
Philip Reames 948d931323 [RISCV] Ensure the forwarded AVL register is alive
When the AVL value does not fit in 5 bits, the register in which this value is stored may be dead when we want to forward it. This patch ensures the kill flags on the register are cleared before forwarding.

Patch by: loralb
Differential Revision: https://reviews.llvm.org/D125971
2022-05-24 15:07:42 -07:00
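The "fits in 5 bits" condition in the commit above refers to the AVL immediate of `vsetivli`, which is a 5-bit unsigned field (0 to 31). Larger AVL values have to live in a register, which is where the liveness concern comes from. A minimal sketch of that classification, with hypothetical names:

```cpp
#include <cstdint>

// How the AVL operand would be carried by the vsetvli family.
enum class AVLOperand { Immediate, Register };

// vsetivli encodes AVL as a 5-bit unsigned immediate, so only 0..31 fit.
inline bool fitsUImm5(uint64_t AVL) { return AVL < 32; }

// Values that do not fit must be materialized in a GPR instead.
inline AVLOperand classifyAVL(uint64_t AVL) {
  return fitsUImm5(AVL) ? AVLOperand::Immediate : AVLOperand::Register;
}
```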
Craig Topper d2ee2c9c8d [RISCV] Add an operand kind to the opcode/imm returned from RISCVMatInt.
Instead of matching opcodes to know the format to emit, use an
enum value that we can get from the RISCVMatInt::Inst class.

Change the consumers to use fully covered switches so that we get
a compiler warning if a new kind is added. With the opcode checks
it was easier to forget to update one of the 3 consumers.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126317
2022-05-24 14:56:29 -07:00
Philip Reames fb948572e0 [riscv] Use getFirstInstrTerminator [nfc] 2022-05-24 14:56:01 -07:00
Philip Reames a95ecb20bc [RISCV] Hoist VSETVLI out of idiomatic fixed length vector loops
This patch teaches the VSETVLI insertion pass to perform a very limited form of partial redundancy elimination. The motivating example comes from the fixed length vectorization of a simple loop such as:

for (unsigned i = 0; i < a_len; i++)
    a[i] += b;

Without this change, the core vector loop and preheader is as follows:

.LBB0_3:                                # %vector.ph
	andi	a1, a6, -8
	addi	a4, a0, 16
	mv	a5, a1
.LBB0_4:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	addi	a3, a4, -16
	vsetivli	zero, 4, e32, m1, ta, mu
	vle32.v	v8, (a3)
	vle32.v	v9, (a4)
	vadd.vx	v8, v8, a2
	vadd.vx	v9, v9, a2
	vse32.v	v8, (a3)
	vse32.v	v9, (a4)
	addi	a5, a5, -8
	addi	a4, a4, 32
	bnez	a5, .LBB0_4

The key thing to note here is that the execution of the vsetivli only needs to happen once. Since there's no tail folding happening here, the values of the vector configuration registers are invariant through the loop.

After this patch, we hoist the configuration into the preheader and perform it once.

.LBB0_3:                                # %vector.ph
	andi	a1, a6, -8
	vsetivli	zero, 4, e32, m1, ta, mu
	addi	a4, a0, 16
	mv	a5, a1
.LBB0_4:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	addi	a3, a4, -16
	vle32.v	v8, (a3)
	vle32.v	v9, (a4)
	vadd.vx	v8, v8, a2
	vadd.vx	v9, v9, a2
	vse32.v	v8, (a3)
	vse32.v	v9, (a4)
	addi	a5, a5, -8
	addi	a4, a4, 32
	bnez	a5, .LBB0_4

Differential Revision: https://reviews.llvm.org/D124869
2022-05-24 14:56:01 -07:00
Craig Topper 415b9f595d Recommit "[RISCV] Use selectShiftMaskXLen ComplexPattern for isel of rotates."
This reverts commit dfe513ae1b.

Tests have been changed to avoid the type legalization bug being
fixed in D126036.

Original commit message:
This will remove masks on the shift amount. We usually get this with
SimplifyDemandedBits in DAGCombine, but that's restricted to cases
where the AND has a single use. selectShiftMaskXLen does not have
that restriction.
2022-05-24 09:41:04 -07:00
Fraser Cormack 08c9fb8447 [RISCV] Ensure the entire stack is aligned to the RVV stack alignment
This patch fixes another bug in the RVV frame lowering. While some frame
objects with non-default stack IDs (such as scalable-vector alloca
instructions) are considered in the target-independent max alignment
calculations, others (for example, during calling-convention lowering)
are not. This means we'd occasionally align the base of the stack to
only 16 bytes, with no way to ensure that the RVV section contained
within that is aligned to anything higher.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D125973
2022-05-24 06:58:51 +01:00
Fraser Cormack cb8681a2b3 [RISCV] Fix RVV stack frame alignment bugs
This patch addresses several alignment issues in the stack frame when
RVV objects are taken into account.

One bug is that the RVV stack was never guaranteed to keep the alignment
of the stack *as a whole*. We must maintain a 16-byte aligned stack at
all times, especially when calling other functions. With the standard V
extension, this is conveniently happening since VLEN is at least 128 and
always 16-byte aligned. However, we support Zvl64b which does not
guarantee this. To fix this, the RVV stack size is rounded up to be
aligned to 16 bytes. In practice this generally makes us allocate a
stack at least 2*VLEN in size, and a multiple of 2.

    |------------------------------| -- <-- FP
    | 8-byte callee-save           | |      |
    |------------------------------| |      |
    | one VLENB-sized RVV object   | |      |
    |------------------------------| |      |
    | 8-byte local variable        | |      |
    |------------------------------| -- <-- SP (must be aligned to 16)

In the example above, with Zvl64b we are decrementing SP by 24 bytes
which does not leave SP correctly aligned. We therefore introduce an
extra VLENB-sized amount used for alignment. This ensures that the
total stack size is a multiple of 16 bytes (32 for Zvl64b, 48 for
Zvl128b, 80 for Zvl256b, etc):

    |------------------------------| -- <-- FP
    | 8-byte callee-save           | |      |
    |------------------------------| |      |
    | one VLENB-sized padding obj  | |      |
    | one VLENB-sized RVV object   | |      |
    |------------------------------| |      |
    | 8-byte local variable        | |      |
    |------------------------------| -- <-- SP

A new RVV invariant has been introduced in this patch, which is that the
base of the RVV stack itself is now always aligned to 16 bytes, not 8 as
before. This keeps us more in line with the scalar stack and should be
easier to reason about. The calculation of the RVV padding has thus
changed to be the amount required to align the scalar local variable
section to the RVV section's alignment. This amount is further rounded
up when setting up the initial stack to keep everything aligned:

    |------------------------------| -- <-- FP
    | 8-byte callee-save           |
    |------------------------------|
    |                              |
    | RVV objects                  |
    | (aligned to at least 16)     |
    |                              |
    |------------------------------|
    | RVV padding of 8 bytes       |
    |------------------------------|
    | 8-byte local variable        |
    |------------------------------| -- <-- SP

In the example above, it's clear that we need 8 bytes of padding to keep
the RVV section aligned to 16 when using SP. But to keep SP *itself*
aligned to 16 we can't decrement the initial stack pointer by 24 - we
have to round up to 32.

With the RVV section correctly aligned, the second bug fixed by
this patch is that RVV objects themselves are now correctly aligned. We
were previously only guaranteeing an alignment of 8 bytes, even if they
required a higher alignment. This is relatively simple and in practice
we see more rounding up of VLEN amounts to account for alignment in
between objects:

    |------------------------------|
    | RVV object (aligned to 16)   |
    |------------------------------|
    | no padding necessary         |
    |------------------------------|
    | 2*VLENB RVV object (align 16)|
    |------------------------------|
    | VLENB alignment padding      |
    |------------------------------|
    | RVV object (align 32)        |
    |------------------------------|
    | 3*VLENB alignment padding    |
    |------------------------------|
    | VLENB RVV object (align 32)  |
    |------------------------------| -- <-- base of RVV section

Note that a lot of the regressions in codegen owing to the new alignment
rules are correct but actually only strictly necessary for Zvl64b (and
Zvl32b but that's not really supported). I plan a follow-up patch to
take the known VLEN into account when padding for alignment.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D125787
2022-05-24 06:53:51 +01:00
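The sizes quoted in the commit message above follow from simple round-up arithmetic. The sketch below models the first pair of diagrams (an 8-byte callee-save slot, one VLENB-sized RVV object, an 8-byte local) and assumes the RVV section is rounded up to an even number of VLENB units so that it stays 16-byte aligned even when VLENB is 8. Function names are hypothetical and this is not the frame-lowering code itself.

```cpp
#include <cassert>
#include <cstdint>

// Round Size up to the next multiple of Align (Align must be a power of two).
inline uint64_t alignTo(uint64_t Size, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Size + Align - 1) & ~(Align - 1);
}

// Size of the RVV section in bytes. The section is measured in VLENB units;
// rounding the unit count up to a multiple of 2 keeps the byte size a
// multiple of 16 for the smallest supported VLENB (8 bytes, Zvl64b).
inline uint64_t rvvSectionSize(uint64_t NumVLenBUnits, uint64_t VLenB) {
  return alignTo(NumVLenBUnits, 2) * VLenB;
}

// Total frame size for the example layout: callee-save, RVV section, local.
inline uint64_t exampleFrameSize(uint64_t VLenB) {
  const uint64_t CalleeSave = 8;
  const uint64_t Local = 8;
  return CalleeSave + rvvSectionSize(1, VLenB) + Local;
}
```

With VLENB of 16 and 32 bytes this gives 48 and 80 bytes, matching the Zvl128b and Zvl256b figures above. Similarly, `alignTo(24, 16)` gives 32, which is the rounding applied to the initial SP decrement in the later example.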
Peter Waller ade47bdc31 [LV] Improve register pressure estimate at high VFs
Previously, `getRegUsageForType` was implemented using
`getTypeLegalizationCost`.  `getRegUsageForType` is used by the loop
vectorizer to estimate the register pressure caused by using a vector
type.  However, `getTypeLegalizationCost` currently only appears to
understand splitting and not scalarization, so significantly
underestimates the register requirements.

Instead, use `getNumRegisters`, which understands when scalarization
can occur (via computeRegisterProperties).

This was discovered while investigating D118979 (Set maximum VF with
shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the
loop vectorizer previously ends up costing a v128i1 as 2 v64i*
registers where it actually occupies 128 i32 registers.

I'm sending this patch early for comment, I'm still doing some sanity checking
with LNT.  I note that getRegisterClassForType appears to return VectorRC even
though the type in question (large vNi1 types) end up occupying scalar
registers. That might be worth fixing too.

Differential Revision: https://reviews.llvm.org/D125918
2022-05-23 07:57:45 +00:00
Paul Walker 258dac43d6 [SVE] Enable use of 32bit gather/scatter indices for fixed length vectors
Differential Revision: https://reviews.llvm.org/D125193
2022-05-22 12:32:30 +01:00
Fraser Cormack d60ae47f9d [RISCV] Fix logic for determining RVV stack padding
We must add padding when using SP or BP to access stack objects.
Checking whether we're missing FP is not sufficient as stack realignment
uses SP too. The test in D125962 explains the specific issue in more
detail.

Split from D125787.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D125964
2022-05-20 13:18:52 +01:00
jacquesguan 8fc4fcecb8 [RISCV] Add VL patterns for vector widening floating-point fused multiply-add instructions.
This patch adds VL patterns for vector widening floating-point fused multiply-add instructions to support fixed length vector type.

Differential Revision: https://reviews.llvm.org/D124505
2022-05-20 06:56:48 +00:00
Craig Topper dfe513ae1b Revert "[RISCV] Use selectShiftMaskXLen ComplexPattern for isel of rotates."
This reverts commit 86f7d7074a.

The test cases added for this exposed a pre-existing bug that is failing
the expensive checks bot. Reverting so I can revert that patch.
2022-05-19 14:39:38 -07:00
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
Zi Xuan Wu (Zeson) 861489af1b [NFC][RISCV] Enable TuneNoDefaultUnroll feature to control targets which use default unroll preference
In RISCVTargetTransformInfo, enumerating the processor family is not a good way to decide the unroll preference,
because it requires enumerating many subtarget families and is hard to update when a new subtarget is added.
Instead, create a feature to distinguish whether targets want to use the default unroll preference or not.

Keep TuneSiFive7 because it's a flag indicating the subtarget family, which may be used in other places.

Differential Revision: https://reviews.llvm.org/D125741
2022-05-19 12:21:49 +08:00
Craig Topper 86f7d7074a [RISCV] Use selectShiftMaskXLen ComplexPattern for isel of rotates.
This will remove masks on the shift amount. We usually get this with
SimplifyDemandedBits in DAGCombine, but that's restricted to cases
where the AND has a single use. selectShiftMaskXLen does not have
that restriction.
2022-05-18 10:23:29 -07:00
Philip Reames d4545e6fa0 Revert "[RISCV] Enable strict assertions in InsertVSETVLI data flow"
This reverts commit 79a66ec97b.

The stronger asserts served their purpose; I stumbled across another bug.  Will reapply once this one is also fixed.

The bug appears to be a variant of a previous one:
* We mutate an instruction in one block.
* That mutation changes the phase3 results of another block.

This is very similar to a previous issue, except cross block instead of within a single block.
2022-05-17 15:53:13 -07:00
Philip Reames 118c5d1c97 [RISCV] Minor reorganization of VSETVLIInfo::operator== for readability [NFC] 2022-05-17 12:05:17 -07:00
Philip Reames 11a7e77c95 [RISCV] Canonicalize AVL=setvli to AVL=Imm or AVL=VLMAX
This patch adds a transform to the local prepass in InsertVSETVLI which canonicalizes an AVL of a register from another vsetvli into immediate or VLMAX when VTYPE is the same. In this patch, I chose to be conservative and avoid arbitrary vreg forwarding due to profitability concerns about possibly overlapping live ranges.

This has the effect of eliminating vsetvli instructions in loops which are walking either VLMAX or a constant number of lanes per iteration.

Differential Revision: https://reviews.llvm.org/D125812
2022-05-17 11:46:22 -07:00
Philip Reames 79a66ec97b [RISCV] Enable strict assertions in InsertVSETVLI data flow
These asserts are believed to hold after several recent miscompiles have been fixed.  If you see an assertion failure on this change, please toggle the default back and make sure you file a bug with a reproducer.  We may have as yet uncaught miscompiles lurking in this code.

Differential Revision: https://reviews.llvm.org/D125271
2022-05-17 11:12:31 -07:00
Fraser Cormack 8430b82741 [RISCV] Drop notion of "strict" vsetvli compatibility
With recent fixes to the dataflow in place, we now never pass
Strict=true to isCompatible, so remove the parameter completely.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D125748
2022-05-17 15:24:23 +01:00
Fraser Cormack f00f894d5d [RISCV][NFC] Reword split SP adjustment comments 2022-05-17 10:03:21 +01:00
Fraser Cormack 05ad4d4f38 [RISCV][NFC] Fix comment typos in split SP adjustment 2022-05-17 09:56:54 +01:00
Philip Reames 1474880353 [RISCV] Use classic dataflow for VSETVLI insertion
Our current implementation of the InsertVSETVLI dataflow allows phase 3 to arrive at a different block end state than the data flow in phase 1/2 computed. This arises because a block which contains instructions (e.g. load or stores) which don't consume all the incoming bits of the VL/VTYPE can be compatible with multiple incoming states. The algorithm effectively changes the SEW on such instructions, and propagates the prior state forward. As phase 3 uses the block input state for this propagation, but phase 1/2 doesn't, this can result in different block end states.

If we don't correct for it, this discrepancy can result in miscompiles. This was the source of multiple recent bugs. However, by now we have fixes for all known correctness issues.

The basic strategy we use is to insert a compensation vsetvli to bring the block state leaving the block back into consistency with the one computed. This is correct, but results in extra vsetvlis being placed at the end of blocks.

This change adjusts the phase 1/2 algorithm to propagate the incoming block state through the block, allowing the compatibility rules to modify the end state. The algorithm may need to run slightly more iterations, but the end result is consistent with what phase 3 does.

The benefit of doing this is two fold.

First, we reverse some of the code quality regressions introduced in the functional fixes.

Second, we simplify the invariants, and allow the strict assertions to be enabled. Several humans, myself included, have found it quite surprising that invariant didn't hold already, and arguably that confusion is the cause of several of our recent miscompiles in this code.

The downside to this patch is that the dataflow may require additional iterations to stabilize. In the worst case, we go from O(Edges) to O(E + UniquePaths) as the incoming state (and thus the outgoing one) can now change once for each path from the entry block.

Differential Revision: https://reviews.llvm.org/D125232
2022-05-16 17:06:27 -07:00
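The idea of propagating the incoming block state through the block can be illustrated with a generic forward dataflow solver. This is a toy model and not the InsertVSETVLI implementation: the three-level lattice, the `Block` layout (where `Sets == -1` marks a block that is transparent to the incoming state) and the `solve` function are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Abstract state: no information yet, one known configuration, or a
// conflict because predecessors disagree.
struct State {
  enum Kind { Uninit, Known, Conflict };
  Kind K;
  int Config;
  bool operator==(const State &O) const {
    return K == O.K && (K != Known || Config == O.Config);
  }
};

// Merge two states arriving at the same block entry.
inline State meet(const State &A, const State &B) {
  if (A.K == State::Uninit) return B;
  if (B.K == State::Uninit) return A;
  if (A.K == State::Known && B.K == State::Known && A.Config == B.Config)
    return A;
  return State{State::Conflict, 0};
}

struct Block {
  std::vector<std::size_t> Succs; // successor block indices
  int Sets; // configuration this block establishes, or -1 if transparent
};

// Iterate to a fixed point and return the exit state of every block.
// Block 0 is the entry block.
inline std::vector<State> solve(const std::vector<Block> &CFG, State Entry) {
  std::vector<State> In(CFG.size()), Out(CFG.size());
  if (CFG.empty()) return Out;
  In[0] = Entry;
  std::deque<std::size_t> Work(1, 0);
  std::vector<bool> Queued(CFG.size(), false);
  Queued[0] = true;
  while (!Work.empty()) {
    const std::size_t B = Work.front();
    Work.pop_front();
    Queued[B] = false;
    // Transfer function: a transparent block forwards its incoming state.
    Out[B] = CFG[B].Sets >= 0 ? State{State::Known, CFG[B].Sets} : In[B];
    for (std::size_t S : CFG[B].Succs) {
      const State Merged = meet(In[S], Out[B]);
      if (!(Merged == In[S])) {
        In[S] = Merged; // entry state changed, so the successor is revisited
        if (!Queued[S]) {
          Work.push_back(S);
          Queued[S] = true;
        }
      }
    }
  }
  return Out;
}
```

Because a transparent block's exit state depends on its entry state, a change at one block's entry can ripple through several successors, which is the extra iteration cost the commit message mentions.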
Philip Reames 3d17c91709 [RISCV] Fix missing vsetvli in transparent block case
We've got a lurking problem with our data flow implementation where different phases disagree, resulting in possible miscompiles. D119518 introduced a workaround, but failed to consider blocks which only contain load/stores compatible with their incoming state.

When I went to rebase and simplify D125232, it turned out that not all of the correctness issues had been fixed yet after all. This is the correctness fix accidentally embedded in the original more complicated version.

Note that the test changes here are mostly regressions. It's worth noting that the simplified version of D125232 exactly reverses all the non-functional diffs in the test caused here. D125232 should be the immediate following commit.

Differential Revision: https://reviews.llvm.org/D125703
2022-05-16 17:06:27 -07:00
Philip Reames 7dbf2e7b57 Teach PeepholeOpt to eliminate redundant copy from constant physreg (e.g VLENB on RISCV)
The existing redundant copy elimination required a virtual register source, but the same logic works for any physreg where we don't have to worry about clobbers.  On RISCV, this helps eliminate redundant CSR reads from VLENB.

Differential Revision: https://reviews.llvm.org/D125564
2022-05-16 16:38:30 -07:00
Paul Walker 7dd05ba9ed [SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes.
During early gather/scatter enablement two different approaches
were taken to represent scaled indices:

* A Scale operand whereby byte_offsets = Index * Scale
* An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType)

Having multiple representations is bad as shown by this patch which
fixes instances where the two are out of sync. The dedicated scale
operand is more flexible and pervasive so this patch removes the
UNSCALED values from IndexType. This means all indices are scaled
but the scale can be one, hence unscaled. SDNodes now use the scale
operand to answer the "isScaledIndex" question.

I toyed with the idea of keeping the UNSCALED enums and helper
functions but because they will have no uses and force SDNodes to
validate the set of supported values I figured it's best to remove
them. We can re-add them if there's a real need. For similar
reasons I've kept the IndexType enum when a bool could be used as I
think being explicit looks better.

Depends On D123347

Differential Revision: https://reviews.llvm.org/D123381
2022-05-16 20:47:52 +01:00
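The surviving representation from the commit above, `byte_offsets = Index * Scale`, can be written out directly. In this sketch an "unscaled" index is simply one whose scale is 1. The function names are hypothetical stand-ins and are not the SDNode accessors.

```cpp
#include <cstdint>
#include <vector>

// Compute the byte offset of each lane: byte_offset = Index * Scale.
inline std::vector<int64_t> byteOffsets(const std::vector<int64_t> &Indices,
                                        uint64_t Scale) {
  std::vector<int64_t> Out;
  Out.reserve(Indices.size());
  for (int64_t Index : Indices)
    Out.push_back(Index * static_cast<int64_t>(Scale));
  return Out;
}

// With a single Scale operand, "is the index scaled?" reduces to checking
// whether the scale differs from one.
inline bool isScaledIndex(uint64_t Scale) { return Scale != 1; }
```

For 4-byte elements, indices {0, 1, 2} with a scale of 4 address byte offsets {0, 4, 8}. With a scale of 1 the indices are already byte offsets.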
Philip Reames e2df48bb23 [RISCV] Add further trace output to InsertVSETVLI 2022-05-16 09:15:32 -07:00
Liqin.Weng d95513ae3a [RISCV] remove useless code
When checking legality for vectorizing a reduction, the hasVInstructions() check is unneeded. RISCV can only do loop vectorization with hasVInstructions().

Reviewed By: kito-cheng, craig.topper

Differential Revision: https://reviews.llvm.org/D125460
2022-05-16 12:54:03 +00:00
jacquesguan a8426ada49 [RISCV][NFC] Replace for-each with array argument call.
This patch replaces some for-each loops with the new arrayref argument API. Since an array was already used in the definition, I think this change won't cause any ambiguity.

Differential Revision: https://reviews.llvm.org/D125455
2022-05-16 02:12:48 +00:00
Zakk Chen 1878f240c9 [RISCV] Fix incorrect use of tail agnostic vslideup.
We need to use tail undisturbed for vslideup to implement
vector insert operation correctly.

Ideally, we could use tail agnostic when inserting a subvector
or element at the end of the vector. This will be in a follow-up
patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125545
2022-05-15 18:32:21 -07:00
Sheng c644488a8b Rename `MCFixedLenDisassembler.h` as `MCDecoderOps.h`
The name `MCFixedLenDisassembler.h` is out of date after D120958.

Rename it as `MCDecoderOps.h` to reflect the change.

Reviewed By: myhsu

Differential Revision: https://reviews.llvm.org/D124987
2022-05-15 08:44:58 +08:00
Craig Topper 5a19fbad83 [RISCV] Remove unneeded check for ISD::VSCALE operand being a constant. NFC
ISD::VSCALE only allows constant operands.
2022-05-14 13:45:03 -07:00
Roger Ferrer Ibanez 189ca6958e [RISCV] Use the new chain when converting a fixed RVV load
When building the final merged node, we were using the original chain
rather than the output chain of the new operation. After some collapsing
of the chain this could cause the loads to be incorrectly scheduled with
respect to later stores.

This was uncovered by SingleSource/Regression/C/gcc-c-torture/execute/pr36038.c
of the llvm testsuite.

https://reviews.llvm.org/D125560
2022-05-13 22:21:08 +00:00
Craig Topper a2918976cd Revert "[RISCV] Enable subregister liveness tracking for RVV."
This reverts most of ed242b54c9

I'm seeing failures in our intrinsic testing on qemu that seem
related to this. Reverting while I investigate.

I've left the command line option in place for directed testing.
It defaults to off.
2022-05-13 10:59:58 -07:00
Philip Reames 853fa8ee22 [RISCV] Address post-commit feedback from af5e09b 2022-05-13 09:51:23 -07:00
Philip Reames af5e09b7d9 [RISCV] Add llvm.read.register support for vlenb
This patch adds minimal support for lowering a read.register intrinsic with vlenb as the argument. Note that vlenb is an implementation constant, so it is never allocatable.

This was split off a patch to eventually replace PseudoReadVLENB with a COPY MI because doing so revealed a couple of optimization opportunities which really seemed to warrant individual patches and tests. To write those patches, I need a way to write the tests involving vlenb, and read.register seemed like the right testing hook.

Differential Revision: https://reviews.llvm.org/D125552
2022-05-13 09:12:02 -07:00
Zakk Chen 7dfc56c107 [RISCV] Add the passthru operand for RVV unmasked segment load IR intrinsics.
The goal is to support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125323
2022-05-13 02:16:40 -07:00
Philip Reames 52b5f1f7d4 [RISCV] Extend dataflow workaround from D119518 to fallthrough blocks
We've got a lurking problem with our data flow implementation where different phases disagree, resulting in possible miscompiles. D119518 introduced a workaround, but failed to consider blocks without terminators (e.g. fallthroughs).

I have a deeper rework of the algorithm in flight over in D125232, but this patch is specifically a minimal fix for an active miscompile. That change can be reworked over this once landed.

Differential Revision: https://reviews.llvm.org/D125408
2022-05-12 10:45:59 -07:00
Craig Topper 40e9654511 [RISCV] Use tail agnostic policy when selecting riscv_fma_vl to instructions
riscv_fma_vl doesn't have a tail, so use the tail_agnostic policy.

We were already doing this for some patterns. I think the patterns
with fneg and mask were added later and I copied the tail policy
from the unmasked patterns.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D125424
2022-05-12 09:09:24 -07:00
Craig Topper ed242b54c9 [RISCV] Enable subregister liveness tracking for RVV.
RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.

I've added a command line option that can be used to turn it off if it causes compile
time or functional issues. I used the command line option to keep the old behavior
for one interesting test case that was testing register allocation.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D125108
2022-05-11 12:49:03 -07:00
Craig Topper 5c7ec998a9 [RISCV] Fold addiw from (add X, (addiw (lui C1, C2))) into load/store address
This is a followup to D124231.

We can fold the ADDIW in this pattern if we can prove that LUI+ADDI
would have produced the same result as LUI+ADDIW.

This pattern occurs because constant materialization prefers LUI+ADDIW
for all simm32 immediates. Only immediates in the range
0x7ffff800-0x7fffffff require an ADDIW. Other simm32 immediates
work with LUI+ADDI.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D124693
2022-05-11 12:47:13 -07:00
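The range claim in 5c7ec998a9 can be checked with a small model of the three instructions on RV64. This is only a sketch: the helper names (`sext`, `split_hi_lo`, `materialize`, `requires_addiw`) are made up for illustration and are not the names used in the LLVM sources.

```python
def sext(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def split_hi_lo(imm):
    """Split a simm32 into the 20-bit LUI field and the signed 12-bit low part."""
    lo12 = sext(imm, 12)
    hi20 = ((imm - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


def lui(hi20):
    # On RV64, LUI sign-extends the 32-bit result to 64 bits.
    return sext(hi20 << 12, 32)


def addi(rs1, imm12):
    # Full 64-bit add.
    return sext(rs1 + imm12, 64)


def addiw(rs1, imm12):
    # 32-bit add, result sign-extended to 64 bits.
    return sext(rs1 + imm12, 32)


def materialize(imm, use_addiw):
    hi20, lo12 = split_hi_lo(imm)
    upper = lui(hi20)
    return addiw(upper, lo12) if use_addiw else addi(upper, lo12)


def requires_addiw(imm):
    """True when LUI+ADDI does not reproduce the simm32 value."""
    return materialize(imm, use_addiw=False) != imm
```

For values in 0x7ffff800-0x7fffffff the LUI field becomes 0x80000, so LUI produces -2^31 and only the 32-bit wraparound of ADDIW gets back to the positive constant.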
Craig Topper f499ec6b3d [RISCV] Add caching to the gather/scatter to strided load/store conversion.
If we have multiple gather/scatter instructions using the same
strided address we would scalarize it multiple times. I guess
a later pass cleans this up, but I don't know if that's guaranteed.

This patch adds a cache to remember the scalarization we already
created for a previous gather/scatter.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D125326
2022-05-11 11:47:27 -07:00
Craig Topper 09f48c6b80 [RISCV] Move implementation of getVLOpNum and getSEWOpNum from RISCVInsertVSETVLI to RISCVBaseInfo.h. NFC
We should consolidate the operand counting and ordering into
RISCVBaseInfo.h and stop spreading it around.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D125344
2022-05-11 11:14:58 -07:00
Craig Topper 0ebb02b90a [RISCV] Override TargetLowering::shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd.
This hook determines if SimplifySetcc transforms (X & (C l>>/<< Y))
==/!= 0 into ((X <</l>> Y) & C) ==/!= 0. Where C is a constant and
X might be a constant.

The default implementation favors doing the transform if X is not
a constant. Otherwise the code is left alone. There is a provision
that if the target supports a bit test instruction then the transform
will favor ((1 << Y) & X) ==/!= 0. RISCV does not say it has a variable
bit test operation.

RISCV with Zbs does have a BEXT instruction that performs (X >> Y) & 1.
Without Zbs, (X >> Y) & 1 still looks preferable to ((1 << Y) & X) since
we can use ANDI instead of putting a 1 in a register for SLL.

This patch overrides this hook to favor bit extract patterns and
otherwise falls back to the "do the transform if X is not a constant"
heuristic.

I've added tests where both C and X are constants with both the shl form
and lshr form. I've also added a test for a switch statement that lowers
to a bit test. That was my original motivation for looking at this.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D124639
2022-05-11 11:13:17 -07:00
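The equivalences discussed in 0ebb02b90a can be checked exhaustively on a narrow bit width. The sketch below assumes a 6-bit register purely to keep the search small, and all function names are illustrative, not LLVM APIs.

```python
WIDTH = 6                      # assumption: small width so the check is exhaustive
MASK = (1 << WIDTH) - 1


def shl(v, y):
    return (v << y) & MASK


def lshr(v, y):
    return (v & MASK) >> y


def original_lshr(x, c, y):    # (X & (C l>> Y)) != 0
    return (x & lshr(c, y)) != 0


def hoisted_shl(x, c, y):      # ((X << Y) & C) != 0
    return (shl(x, y) & c) != 0


def original_shl(x, c, y):     # (X & (C << Y)) != 0
    return (x & shl(c, y)) != 0


def hoisted_lshr(x, c, y):     # ((X l>> Y) & C) != 0
    return (lshr(x, y) & c) != 0


def bit_test_form(x, y):       # ((1 << Y) & X) != 0
    return (shl(1, y) & x) != 0


def bit_extract_form(x, y):    # ((X >> Y) & 1) != 0, the BEXT-style pattern
    return (lshr(x, y) & 1) != 0


def transforms_agree():
    """Check every X, C and in-range shift amount Y for the given width."""
    for x in range(MASK + 1):
        for y in range(WIDTH):
            if bit_test_form(x, y) != bit_extract_form(x, y):
                return False
            for c in range(MASK + 1):
                if original_lshr(x, c, y) != hoisted_shl(x, c, y):
                    return False
                if original_shl(x, c, y) != hoisted_lshr(x, c, y):
                    return False
    return True
```

Since both forms compute the same predicate, the hook only has to decide which one is cheaper to select on the target.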
Craig Topper 0781742785 [RISCV] Add a DAG combine to pre-promote (i32 (and (srl X, Y), 1)) with Zbs on RV64.
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.

I don't think there is any precedent for type promotion checking
users to decide how to promote. Instead, I've added this DAG combine to
do it before type legalization.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D124109
2022-05-11 10:49:16 -07:00
Philip Reames 72925d98bf [riscv] Canonicalize vsetvli (vsetvli avl, vtype1) vtype2 transitions
This patch is an alternative to a piece of D125270. If we have one vsetvli which is using as AVL the output of another, and the prior AVL can be proven to produce the same VL value as that defining one, we can use the AVL from the prior instruction. This has the effect of removing a state transition on AVL, and will let us use the cheaper 'vsetvli x0, x0, vtype1' form or possibly even skip emitting it entirely.

This builds on the same infrastructure as D125337, and does the analogous extension to working on abstract states instead of only prior explicit vsetvli instructions. This is where the (relatively minor) code improvements come from.

More importantly, this fixes the last case where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries.

Doing this transform inside the dataflow can cause the compatibility of a later store to change with regards to the current state. test15 in the diff illustrates this case well. What we have is a vsetvli which is mutated by one following vector op, but whose GPR result is used by another. The compatibility logic walks back to the def in this case, and checks to see if it matches the immediate prior state. In phase 1 and 2, it doesn't, and in phase 3 (after mutation) it does because we remove a transition which caused it to differ.

Differential Revision: https://reviews.llvm.org/D125392
2022-05-11 10:45:29 -07:00
Philip Reames cc0283a635 [riscv] Prefer to use previous VL for scalar move instructions
This patch is an alternative to a piece of D125270. Its direct motivation is to fix a wrong code bug (described below), but somewhat unexpectedly, it also results in a significant code quality improvement for idiomatic fixed length vector patterns.

The existing transform is simply wrong in its current location. We are correct about the fact that the scalar move itself can use the previous vsetvli, but we lose track of the fact that later instructions might depend on the state change represented. That is, the actual value of VL in the register is different than the abstract state thinks it is. Not simply due to precision of modeling, but e.g. the VL register could contain 3 when the abstract state says it is 1. This is annoyingly hard to demonstrate in practice due to differences in policy flags on the intrinsics, but this is at least a latent wrong code bug.

The code quality benefit comes from the fact we don't need to tie this to explicit vsetvli instructions at all. We can propagate the abstract state, and reduce a) the number of transitions, or b) the cost of those transitions. It turns out we have a bunch of cases - in tests at least - where fixed length AVLs are known non-zero, and we can leave VL unchanged while changing VTYPE.

Differential Revision: https://reviews.llvm.org/D125337
2022-05-11 07:37:50 -07:00
Fraser Cormack 27c7e922fe [RISCV][NFC] Rename variable to appease code style 2022-05-11 12:41:25 +01:00
Fraser Cormack 874b802a6d [RISCV][NFC] Move variable down closer to its first use 2022-05-11 12:33:01 +01:00
Fraser Cormack c1d48b35d8 [SelectionDAG][VP] Rename VP sext/zext/trunc ISD opcodes
Rather than VP_SEXT/VP_ZEXT/VP_TRUNC, having
VP_SIGN_EXTEND/VP_ZERO_EXTEND/VP_TRUNCATE better matches their non-VP
counterparts.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125298
2022-05-11 10:25:51 +01:00
Yeting Kuo 4537aae0d5 [RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.
This patch makes PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.
It's useful to get the vtypes of locations of PseudoReadVL without finding the
corresponding VLEFF/VLSEGFF.
It could simplify optimizations in RISCVInsertVSETVLI like D123581.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125199
2022-05-11 14:07:58 +08:00
jacquesguan 2509dcd58a [RISCV] Add rvv codegen support for vp.fpext.
This patch adds rvv codegen support for vp.fpext. The lowering of fp_round, vp.fptrunc, fp_extend and vp.fpext share most code so use a common lowering function to handle these four.
And this patch changes the intermediate cast from ISD::FP_EXTEND/ISD::FP_ROUND to the RVV VL version op RISCVISD::FP_EXTEND_VL and RISCVISD::FP_ROUND_VL for scalable vectors.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D123975
2022-05-11 03:28:25 +00:00
Philip Reames 7731935ffc [riscv] Consolidate logic for SEW/VL operand offset calculations [nfc] 2022-05-10 15:06:26 -07:00
Philip Reames 413052310a [riscv] Minor style cleanup so that code more obviously matches comments [nfc] 2022-05-10 14:20:26 -07:00
Fraser Cormack 0b2e7a7c72 [RISCV][NFC] Remove else after continue 2022-05-10 11:15:50 +01:00
Fraser Cormack 3b9a231d25 [RISCV] Remove two unmasked RVV patterns
These can be selected to unmasked from masked instructions by the
post-process DAG step.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125239
2022-05-09 16:54:24 +01:00
Philip Reames 70ad96ca5e [riscv, InsertVSETVLI] Rename InstrInfo to Require to more clearly indicate purpose [nfc] 2022-05-09 06:40:33 -07:00
Philip Reames 7ed16e7c51 [riscv] Fix state tracking bug on vsetvli (phi of vsetvli) peephole
This fixes the first of several cases where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries.

In this particular case, we recognize that for the first vsetvli in a block, that if the AVL is a phi of GPR results from previous vsetvlis and the VTYPE field matches, we can avoid emitting a vsetvli as the register contents don't change. Unfortunately, the abstract state does change and that update was lost.

As noted in the test change, this can actually improve results by preserving information until later state transitions in the block. However, this minor codegen improvement is not the motivation for the patch. The motivation is to avoid a case where we break a key internal correctness invariant.

Differential Revision: https://reviews.llvm.org/D125133
2022-05-09 06:21:45 -07:00
Philip Reames c7c3f58544 [riscv] Use early return to reduce nesting for InsertVSETVLI [nfc] 2022-05-06 13:10:05 -07:00
Philip Reames 99a41005fe [riscv] Add early return to InsertVSETVLI fixed point step [nfc]
If the incoming state hasn't changed, and the step function is fixed by assumption, then the output state can't have changed.

In the current algorithm, this is a very minor win and mostly allows adding tracing output without being horribly verbose.
2022-05-06 13:08:11 -07:00
Philip Reames dee9b01d83 [riscv] Add some minimal tracing output to InsertVSETVLI
Only available with -debug.  Main purpose is simplifying an upcoming change, and providing tools for debugging problems.
2022-05-06 13:08:11 -07:00
Philip Reames f486119ce9 [riscv] Add strict asserts for VSETVLI insertion algorithm to help catch bugs
This assertion should hold for any reasonable data flow algorithm, but is known not to in several cases today. I'd like to go ahead and land this off-by-default, so that we can collaborate on fixes and have a common definition of success.

Differential: https://reviews.llvm.org/D125035
2022-05-06 10:28:22 -07:00
wangpc 4ff5e8184c [RISCV] Enable MachineOutliner by default under -Oz for RISCV
Enable default outlining when the function has the minsize attribute.

`addr-label.ll` crashed after enabling this, so a barrier is added before
instruction selection as a workaround.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D122213
2022-05-06 17:37:45 +08:00
Philip Reames 042a7a5f0d [riscv] Use X0 for destination of VSETVLI instruction if result unused
If the GPR destination register of a VSETVLI instruction is unused, we can replace it with X0. This discards the result, and thus reduces register pressure.

Since after the core insertion/lowering algorithm has run, many user written VSETVLIs will have their GPR result unused (as VTYPE/VLEN is now explicitly read instead), this kicks in for most tests which involve a vsetvli intrinsic for fixed length vectorization. (vscale vectorization generally uses the GPR result to know how far to e.g. advance pointers in a loop and these uses are not removed.)  When inserting VSETVLIs to lower pseudos, we prefer the X0 form anyways.

Differential Revision: https://reviews.llvm.org/D124961
2022-05-05 07:39:45 -07:00
Lian Wang 8bb10436ab [RISCV][NFC] Use true_mask to replace riscv_vmset_vl in defined patterns.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124660
2022-05-05 03:05:52 +00:00
Craig Topper 60cb489685 [RISCV] Use movImm when multiplying by simm12 in getVLENFactoredAmount.
No reason to special case simm12, movImm handles all immediates.

This also fixes a bug that we weren't passing the frame-setup/destroy
flag to movImm when we were calling it.
2022-05-04 17:23:22 -07:00
Philip Reames 18ed2ee80c [RISCV] Add a version of insertVSETVLI which uses an iterator [NFC]
This is to simplify the final version of D124869.
2022-05-04 14:48:31 -07:00
Craig Topper 411bb42eed [RISCV] Add a special case to treat riscv-v-vector-bits-min=-1 as meaning use Zvl*b value.
riscv-v-vector-bits-min is primarily used to opt-in to the
autovectorizer. The vector width can be determined from Zvl*b.

This patch adds support treating -1 as meaning use Zvl*b so we can
still opt-in to autovectorization without needing to repeat a
vector width already given by Zvl*b or -mcpu.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D124960
2022-05-04 14:26:45 -07:00
Craig Topper 1d6430b9e2 [RISCV] Update isLegalAddressingMode for RVV.
RVV instructions only support base register addressing.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D124820
2022-05-03 19:49:11 -07:00
Craig Topper 9cce9a126c [RISCV] Make use of SHXADD instructions in RVV spill/reload code.
We can use SH1ADD, SH2ADD, SH3ADD to multiply by 3, 5, and 9 respectively.

We could extend this to 3, 5, or 9 multiplied by a power of 2 by also
emitting a SLLI.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D124824
2022-05-03 19:35:21 -07:00
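The multiplications in 9cce9a126c follow from the Zba definition of shNadd, which computes (rs1 << N) + rs2. The sketch below models this on 64-bit values; `shxadd`, `mul_by_shxadd` and `mul_with_slli` are illustrative names, and the SLLI variant only models the possible extension the commit message mentions, not code that was committed.

```python
MASK64 = (1 << 64) - 1


def shxadd(shift, rs1, rs2):
    """Model of SH1ADD/SH2ADD/SH3ADD: (rs1 << shift) + rs2, wrapped to 64 bits."""
    return ((rs1 << shift) + rs2) & MASK64


def mul_by_shxadd(x, factor):
    """Multiply by 3, 5 or 9 with a single shNadd x, x."""
    shift = {3: 1, 5: 2, 9: 3}[factor]
    return shxadd(shift, x, x)


def mul_with_slli(x, factor):
    """Multiply by (3, 5 or 9) * 2**k using one shNadd followed by one SLLI."""
    k = (factor & -factor).bit_length() - 1   # number of trailing zero bits
    odd = factor >> k
    if odd not in (3, 5, 9):
        raise ValueError("factor is not 3, 5 or 9 times a power of two")
    return (mul_by_shxadd(x, odd) << k) & MASK64
```

For example, `mul_by_shxadd(x, 5)` is `(x << 2) + x`, so scaling a VLENB-derived amount by 5 takes one instruction instead of materializing the constant and multiplying.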
Craig Topper 0971819740 [RISCV] Don't lookup TII in RISCVInstrInfo::getVLENFactoredAmount. NFCI
We're already inside of our implementation of TII.
2022-05-03 19:35:21 -07:00
Weverything 5afd20806d [riscv] Mark function as used to avoid unused warning. 2022-05-03 18:51:23 -07:00
Philip Reames 2982d0032b Fix a buildbot warning [nfc] 2022-05-03 14:40:27 -07:00
Philip Reames be50b8c185 [riscv] Add debug printing support for VSETVLIInfo class [nfc] 2022-05-03 14:00:17 -07:00
Hsiangkai Wang eaaa31ff2c [RISCV][TargetLowering] Special case overflow expansion for (uaddo X, C).
Follow-up to D122933.

Differential Revision: https://reviews.llvm.org/D124374
2022-05-03 03:51:36 +00:00
Craig Topper 72a66358f6 [RISCV] Add isCommutable to FADD/FMUL/FMIN/FMAX/FEQ.
Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D123972
2022-05-02 20:21:16 -07:00
Zakk Chen 5807e59a0a [RISCV] Fix incorrect codegen for masked vmsge{u}.vx with mask agnostic.
The result was totally wrong.
We could use mask undisturbed result to emulate the mask agnostic result.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124684
2022-05-02 17:57:29 -07:00
Fangrui Song 2019c9b1c8 [RISCV] Lower case the first letter of LowerRISCVMachineOperandToMCOperand. NFC 2022-05-01 14:13:55 -07:00
luxufan e098281c27 [RISCV] Don't getDebugLoc for the end node of MBB iterator
Because of shrink wrapping, the block in which the epilog is inserted may not
have any instructions (only debug instructions). The insertion position may
then point to MBB.end(), which doesn't have a DebugLoc. This patch fixes this
problem.

The test program was copied from the issue: https://github.com/llvm/llvm-project/issues/53662

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D123679
2022-04-30 16:00:20 +08:00
Yeting Kuo c069e37019 [RISCV] Add DAGCombine to fold base operation and reduction.
Transform (<bop> x, (reduce.<bop> vec, splat(neutral_element))) to
(reduce.<bop> vec, splat (x)).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122563
2022-04-30 14:07:05 +08:00
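The fold in c069e37019 relies on the reduction taking a start value, so the scalar operand can replace the neutral element. A minimal sketch with Python integers follows; `reduce_with_start`, `before_combine` and `after_combine` are hypothetical names, and only associative, commutative operations are modelled.

```python
from functools import reduce
import operator


def reduce_with_start(bop, vec, start):
    """Model of reduce.<bop> with an explicit start value."""
    return reduce(bop, vec, start)


def before_combine(bop, neutral, x, vec):
    # (<bop> x, (reduce.<bop> vec, splat(neutral_element)))
    return bop(x, reduce_with_start(bop, vec, neutral))


def after_combine(bop, x, vec):
    # (reduce.<bop> vec, splat(x))
    return reduce_with_start(bop, vec, x)


# Neutral elements for a few integer operations (-1 acts as all ones for AND).
NEUTRAL = {
    operator.add: 0,
    operator.and_: -1,
    operator.or_: 0,
    operator.xor: 0,
}
```

Both forms give the same value because the operation is associative and commutative, and the combined form saves the separate scalar operation.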
Craig Topper f91690f7db [RISCV] Don't merge addi into load/store address if addi has a FrameIndex operand.
This fixes a crash from D124231.

We can't fold
  (load (add base, (addi src, off1)), off2)
     -> (load (add base, src), off1+off2)
if the src is a FrameIndex. FrameIndex cannot be the operand of an
add.

There was an immediate==0 check that I think was trying to catch
the common case of FrameIndex addis where the immediate is 0, but
they can also appear in non-zero form. Instead explicitly check
for a FrameIndex operand.
2022-04-29 18:22:20 -07:00
Craig Topper 5aa1a7b307 [RISCV] Remove 'frameindex' from list for ComplexPattern. NFC
Putting a node in this list allows the node to be used as the root
of an isel pattern that would then call the ComplexPattern. The
usual case is to use the ComplexPattern as the operand of another
operator.

AddrFI is never used as a root operation. frameindex is handled
directly with custom code in RISCVISelDAGToDAG::Select. So adding
frameindex to the list here serves no purpose.
2022-04-29 17:41:07 -07:00
Philip Reames 3ea191ed03 [RISCV] Factor repeating code into getMaskTypeFor(VT) [nfc] 2022-04-29 10:00:57 -07:00
Philip Reames f927be0df8 [RISCV] Extract getAllOnesMask helper [nfc] 2022-04-29 09:30:18 -07:00
Craig Topper 5c38373125 [RISCV] Improve constant materialization for cases that can use LUI+ADDI instead of LUI+ADDIW.
It's possible that we have a constant that isn't simm32 so we can't
use LUI+ADDIW, but we can use LUI+ADDI. Because ADDI uses a sign
extended constant, it's possible that after subtracting it out, we
end up with a simm32 that maps to LUI.

This patch detects this case after removing Lo12 and before shifting
the value for SLLI.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124222
2022-04-29 08:58:32 -07:00
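As an illustration of the case 5c38373125 describes, the sketch below checks whether a constant that is not a simm32 still splits into an LUI-representable upper part plus a sign-extended 12-bit ADDI immediate. The helper names are invented for this example and the check is a simplified reading of the commit message, not the actual generateInstSeq logic.

```python
def sext(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def is_simm32(value):
    return -(1 << 31) <= value < (1 << 31)


def can_use_lui_addi(imm):
    """True when imm is not a simm32 but imm - Lo12 is one (and so maps to LUI)."""
    lo12 = sext(imm, 12)
    return not is_simm32(imm) and is_simm32(imm - lo12)


def lui_addi(imm):
    """Evaluate LUI followed by a 64-bit ADDI for such a constant."""
    lo12 = sext(imm, 12)
    hi20 = ((imm - lo12) >> 12) & 0xFFFFF
    upper = sext(hi20 << 12, 32)        # LUI result, sign-extended on RV64
    return sext(upper + lo12, 64)       # ADDI does not wrap at 32 bits
```

One value that qualifies is -2^31 - 1: Lo12 is -1, the remainder is -2^31, and a 64-bit ADDI of -1 on top of the LUI result gives the constant, whereas ADDIW would wrap it to 0x7fffffff.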
LiaoChunyu 03a3654203 [RISCV] Add cost model for SK_Broadcast
Add cost model for broadcast shuffle in RISCVTTIImpl::getShuffleCost
with scalable vector. The cost model might not be the best.

For scalable vector, BasicTTIImpl::getShuffleCost returns an invalid cost,
so this patch relies on the existing cost model in BasicTTIImpl.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124101
2022-04-29 13:28:02 +08:00
Hsiangkai Wang c62b014db9 [RISCV] Merge addi into load/store when there is an ADD between them
This patch adds peephole optimizations for the following patterns:

(load (add base, (addi src, off1)), off2)
   -> (load (add base, src), off1+off2)
(store val, (add base, (addi src, off1)), off2)
   -> (store val, (add base, src), off1+off2)

Differential Revision: https://reviews.llvm.org/D124231
2022-04-29 04:33:05 +00:00
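The peephole in c62b014db9 is a reassociation of the address computation. The sketch below compares the two forms on 64-bit addresses. The function names are illustrative, and the `fits_simm12` guard is an assumption based on the 12-bit signed immediate of RISC-V loads and stores, since the commit message does not spell out its legality checks.

```python
MASK64 = (1 << 64) - 1


def address_before(base, src, off1, off2):
    # (load (add base, (addi src, off1)), off2)
    addi = (src + off1) & MASK64
    add = (base + addi) & MASK64
    return (add + off2) & MASK64


def address_after(base, src, off1, off2):
    # (load (add base, src), off1+off2)
    add = (base + src) & MASK64
    return (add + off1 + off2) & MASK64


def fits_simm12(offset):
    """Load/store immediates are 12-bit signed values."""
    return -2048 <= offset <= 2047


def can_merge(off1, off2):
    # Assumption: the combined offset must still be encodable in the load/store.
    return fits_simm12(off1 + off2)
```

The two address computations always agree modulo 2^64, so the only remaining question is whether the merged offset is encodable. The follow-up fix f91690f7db adds the separate restriction that src must not be a FrameIndex.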
Craig Topper ec11fbb1d6 [RISCV] Use default promotion for (i32 (shl 1, X)) on RV64 when Zbs is enabled.
This improves opportunities to use bset/bclr/binv. Unfortunately,
there are no W versions of these instructions so this isn't always
a clear win. If we use SLLW we get free sign extend and shift masking,
but need to put a 1 in a register and can't remove an or/xor. If
we use bset/bclr/binv we remove the immediate materialization and
logic op, but might need a mask on the shift amount and sext.w.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124096
2022-04-28 09:58:30 -07:00
Craig Topper 8631a5e712 [RISCV] Fix alias printing for vmnot.m
By clearing the HasDummyMask flag from mask register binary operations
and mask load/store.

HasDummyMask was causing an extra operand to get appended when
converting from MachineInstr to MCInst. This extra operand doesn't
appear in the assembly string so was mostly ignored, but it prevented
the alias instruction printing from working correctly.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D124424
2022-04-28 08:33:52 -07:00
Lian Wang dc0ae8ce18 [RISCV] Support VP_SETCC mask operations
Support VP_SETCC mask operations by turning them into logical operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124438
2022-04-28 08:52:29 +00:00
Craig Topper c2614b31d9 [RISCV] Add isCommutable to scalar FMA instructions.
The default implementation of findCommutedOpIndices picks the
first two source operands. That's exactly what we want for the
scalar FMA instructions.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D124463
2022-04-27 11:07:18 -07:00
Jim Lin 9de7b93bc0 [RISCV][NFC] Update and add missing closed curly bracket comment in RISCVInstrInfoZb.td 2022-04-27 15:08:51 +08:00
ShihPo Hung 6b55f133fb [RISCV][RVV] Select unmasked TU RVV pseudos in a DAG post-process
Following D118810 that reduced the size of ISel table,
this patch optimizes all-ones-masked RVV pseudos with TU policy and
swaps them out for their unmasked TU pseudos.

Since the UNDEF merge operand is not preserved, we turn it into a TA
pseudo regardless of the policy operand.

Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D121881
2022-04-26 20:14:54 -07:00
Vasileios Porpodas fa8a9fea47 Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20.

Code review: https://reviews.llvm.org/D124202
2022-04-26 14:02:40 -07:00
Shao-Ce SUN c59473aacc [NFC][RISCV][CodeGen] Use ArrayRef in TargetLowering functions
Based on D123467.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123653
2022-04-26 23:53:00 +08:00
Craig Topper 40f1af4760 [RISCV] Add isCommutable to ADD/ADDW/MUL/AND/OR/XOR/MIN/MAX/CLMUL
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123970
2022-04-25 10:53:41 -07:00
Zakk Chen ffe03ff75c [RISCV] Fix incorrect policy implementation for unmasked vslidedown and vslideup.
vslideup works by leaving elements 0<i<OFFSET undisturbed,
so it needs the destination operand as input for correctness
regardless of policy. Add an operand to indicate policy.

We also add a policy operand for unmasked vslidedown to keep the interface consistent with vslideup,
because vslidedown only leaves elements 0<i<vstart undisturbed and the user has no way to control vstart.

Reviewed By: rogfer01, craig.topper

Differential Revision: https://reviews.llvm.org/D124186
2022-04-25 09:18:41 -07:00
wangpc 7a21a0525a [RISCV] Add sched to pseudo function call instructions
This fixes llvm-mca's error 'found an unsupported instruction
in the input assembly sequence.', which was caused by the lack of
scheduling info.

Pseudo function call instructions will be expanded to `auipc`
and `jalr`, so their scheduling info is the combination of
the two.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123578
2022-04-24 14:58:18 +08:00
Mohammed Nurul Hoque 5dd99f71aa [RISCV] transform MI to W variant to remove sext.w
Backwards search
The sext.w removal pass (before the new patch) checks if the input to sext.w is already in sign-extended form, so it can eliminate it. It does that by checking every definition/source that reaches the sext.w is an instruction that produces a sign-extended value, either by definition (e.g. ADDW), or it propagates sign-extension (e.g. OR) so we check its sources recursively.

Forward search
Sometimes, one of the sources is an instruction that doesn't always produce a sign-extended value, but it has a W-version that does (e.g. ADD / ADDW). If we transform the ADD to ADDW, the sext.w can be removed (assuming other def paths are satisfied), but this transformation is sound only if every use of this ADD/W only requires the lower 32 bits, either directly (like sll %x, 32) or because they propagate the dependency (the lower word of the output only depends on the lower word of the input), so we check its uses recursively.

When searching backwards, if an instruction that can be replaced with W-variant is encountered, this pass runs the forward search to verify it can be replaced, then adds it to a list of fixable instructions. After verifying all paths, it replaces the instruction and removes the sext.w.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D119928
2022-04-22 10:59:26 -07:00
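The soundness argument in 5dd99f71aa rests on two facts about ADD and ADDW on RV64, which the sketch below models directly. The helper names are illustrative only, and the sketch covers just the ADD/ADDW pair, not the backward and forward searches themselves.

```python
def sext(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add(a, b):
    return sext(a + b, 64)       # full 64-bit add


def addw(a, b):
    return sext(a + b, 32)       # 32-bit add, sign-extended to 64 bits


def sext_w(a):
    return sext(a, 32)           # sext.w is addiw rd, rs, 0


def low32(value):
    return value & 0xFFFFFFFF


def is_sign_extended(value):
    """True when sext.w would leave the value unchanged."""
    return sext_w(value) == value
```

ADD and ADDW always agree on the low 32 bits, so users that only read the low word cannot tell them apart, and the ADDW result is already sign-extended, so a following sext.w is redundant.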
Fraser Cormack 98db7ea262 [RISCV][NFC] Adjust some formatting in VL patterns 2022-04-22 17:19:27 +01:00
Fraser Cormack 2b0fedc2dd [RISCV] Print human-readable VTYPE/SEW/LMUL in MIR
This patch adds custom MIR operand comments to VTYPE immediate operands
in VSETVLI instructions and SEW/LMUL operands in vector codegen pseudo
instructions. The result is intended to be more human-readable and
hopefully maintainable when working with MIR, particularly when
writing or reading test cases.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124187
2022-04-22 17:13:18 +01:00
wangpc 5c3ea07848 [RISCV] Do not outline CFI instructions when they are needed in EH
We saw a failure caused by unwinding with incomplete CFIs, so we
can't outline CFI instructions when they are needed in EH.

This is a recommit of 0d40688, which was reverted in ce83883 as
related precommit test 360d44e caused some errors.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D122634
2022-04-22 12:28:19 +08:00
Ping Deng 7493d9ffb6 [RISCV][NFC] Use defvar to simplify pattern definitions.
Reviewed By: jacquesguan, frasercrmck

Differential Revision: https://reviews.llvm.org/D123839
2022-04-22 02:45:14 +00:00
Craig Topper 9534811aa8 [RISCV] Teach generateInstSeqImpl to generate BSETI for single bit cases.
If the immediate has one bit set, but isn't a simm32 we can try
the BSETI instruction from Zbs.
2022-04-21 12:08:34 -07:00
Craig Topper 98b866892d [RISCV] Add special case to constant materialization to remove trailing zeros first.
If there are fewer than 12 trailing zeros, we'll try to use an ADDI
at the end of the sequence. If we strip trailing zeros and end the
sequence with a SLLI we might find a shorter sequence.

Differential Revision: https://reviews.llvm.org/D124148
2022-04-21 09:43:32 -07:00
wangpc ce83883691 Revert "[RISCV] Do not outline CFI instructions when they are needed in EH"
This reverts commit 0d40688925.
2022-04-21 16:23:10 +08:00
wangpc 0d40688925 [RISCV] Do not outline CFI instructions when they are needed in EH
We saw a failure caused by unwinding with incomplete CFIs, so we
can't outline CFI instructions when they are needed in EH.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D122634
2022-04-21 16:13:22 +08:00
Fraser Cormack 3e678cb772 [RISCV] Don't emit fractional VIDs with negative steps
We can't shift-right negative numbers to divide them, so avoid emitting
such sequences. Use negative numerators as a proxy for this situation, since
the indices are always non-negative.

An alternative strategy could be to add a compiler flag to emit division
instructions, which would at least allow us to test the VID sequence
matching itself.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123796
2022-04-21 07:00:34 +01:00
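A small numeric sketch of the problem fixed in 3e678cb772: an arithmetic shift right rounds toward negative infinity, so it differs from a truncating division once the numerator is negative. It is an assumption of this sketch that the intended element value is the truncating quotient (index * numerator) / denominator, and the function names are hypothetical.

```python
def trunc_div(a, b):
    """Integer division that rounds toward zero, as in C++."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def vid_by_division(count, numerator, denominator):
    """Expected fractional-step sequence: (index * numerator) / denominator."""
    return [trunc_div(i * numerator, denominator) for i in range(count)]


def vid_by_shift(count, numerator, log2_denominator):
    """Same sequence computed with an arithmetic shift right instead of a divide."""
    return [(i * numerator) >> log2_denominator for i in range(count)]
```

With a step of 1/2 both functions give 0, 0, 1, 1, but with a step of -1/2 the division gives 0, 0, -1, -1 while the shift gives 0, -1, -1, -2. Because the indices are never negative, a negative numerator is enough to detect the problematic case.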
Craig Topper 186d5c8af5 [RISCV] Make getInstSeqCost handle other Zb* instructions.
We haven't been updating this as Zb* instructions have been used
for immediate materialization. They will hit the default case and
trigger an llvm_unreachable. Instead of trying to list them all,
assume instructions that aren't explicitly listed aren't compressible.

Spotted while looking at integer materialization for other reasons.
I haven't seen a crash from this yet.
2022-04-20 22:08:04 -07:00
Craig Topper 6db0afb44e [RISCV] Fold (xor (sllw 1, x), -1) -> (rolw ~1, x).
There's an existing generic combine that does this for legal types.
This patch adds a RISCV specific combine for W instructions.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D123983
2022-04-19 15:03:43 -07:00
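The identity behind 6db0afb44e can be verified for all 32 shift amounts with a small model of the W instructions. This is a sketch with illustrative helper names: `sllw` and `rolw` model the RV64 semantics, operating on the low 32 bits, using the low 5 bits of the shift amount, and sign-extending the result.

```python
def sext32(value):
    """Sign-extend the low 32 bits of value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >> 31 else value


def sllw(a, shamt):
    return sext32(a << (shamt & 31))


def rolw(a, shamt):
    a &= 0xFFFFFFFF
    shamt &= 31
    return sext32((a << shamt) | (a >> (32 - shamt)))


def xor_form(x):
    # (xor (sllw 1, x), -1)
    return sllw(1, x) ^ -1


def rol_form(x):
    # (rolw ~1, x)
    return rolw(~1, x)
```

Rotating the all-ones-except-bit-0 pattern left by x moves the single zero bit to position x, which is exactly the complement of 1 << x.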
Fraser Cormack c5cac48549 [RISCV] Fix lowering of BUILD_VECTORs as VID sequences
This patch fixes a bug when lowering BUILD_VECTOR via VID sequences.
After adding support for fractional steps in D106533, elements with zero
steps may be skipped if no step has yet been computed. This allowed
certain sequences to slip through the cracks, being identified as VID
sequences when in fact they are not.

The fix for this is to perform a second loop over the BUILD_VECTOR to
validate the entire sequence once the step has been computed. This isn't
the most efficient, but on balance the code is more readable and
maintainable than doing back-validation during the first loop.

Fixes the tests introduced in D123785.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123786
2022-04-19 07:43:38 +01:00
jacquesguan 25445b94db [RISCV] Add rvv codegen support for vp.fptrunc.
This patch adds rvv codegen support for vp.fptrunc. The lowering of fp_round and vp.fptrunc share most code so use a common lowering function to handle those two, similar to vp.trunc.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123841
2022-04-19 01:56:18 +00:00
Lian Wang 545d353b3c [RISCV][NFC] Refactor VL patterns for vnsrl and vnsra
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D123274
2022-04-15 07:42:59 +00:00
jacquesguan 1aa4f0bb6c [RISCV][VP] Add RVV codegen for vp.trunc.
Differential Revision: https://reviews.llvm.org/D123579
2022-04-15 02:29:53 +00:00
Lian Wang 3100893f63 [RISCV] Remove sext_inreg+riscv_grev/riscv_gorc isel patterns
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123565
2022-04-14 08:16:32 +00:00
Lian Wang 38706dd940 [RISCV][NFC] Refactor patterns for Multiply Add instructions
Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D123355
2022-04-14 08:00:00 +00:00
wangpc d0828c5af9 [RISCV][NFC] Use addExpr() instead of createExpr()
It seems to be neater.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D123675
2022-04-14 10:48:25 +08:00
Liqin Weng 8265679018 [RISCV][NFC] Refactor the type promotion of fsl/fsr/becompress/bdecompress/bfp
Reviewed By: asb, jrtc27, craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D123181
2022-04-13 08:52:04 +00:00
Craig Topper 057c063c9b [RISCV] Add an encodeLMUL function to RISCVVType. NFC
This moves the encoding handling out of the assembly parser.

Reviewed By: khchen, frasercrmck

Differential Revision: https://reviews.llvm.org/D123553
2022-04-12 13:39:47 -07:00
Craig Topper 2ce2562876 [RISCV][SelectionDAG] Add a hook to sign extend i32 ConstantInt operands of phis on RV64.
Materializing constants on RISCV is simpler if the constant is sign
extended from i32. By default i32 constant operands of phis are
zero extended.
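
A small Python sketch of why the extension choice matters. The helper names are hypothetical, and the remark about instruction counts is a rough rule of thumb assumed here (a value that fits a signed 12-bit immediate needs a single ADDI, a value that fits in signed 32 bits needs at most LUI+ADDIW), not something taken from the patch.

```python
MASK32 = 0xFFFFFFFF


def zext32(v):
    """Zero extend a 32-bit constant to a 64-bit value."""
    return v & MASK32


def sext32(v):
    """Sign extend a 32-bit constant to a 64-bit signed value."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def fits_simm(value, bits):
    """True if value is representable as a signed immediate of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


# The i32 constant -1 as seen by an RV64 register:
as_sext = sext32(-1)   # -1, fits a 12-bit immediate
as_zext = zext32(-1)   # 4294967295, outside the signed 32-bit range
```

With the sign-extended form, every i32 constant stays inside the signed 32-bit range; the zero-extended form of a negative i32 does not.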

This patch adds a hook to allow RISCV to override this for i32. We
have an existing isSExtCheaperThanZExt, but it operates on EVT which
we don't have at these places in the code.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122951
2022-04-11 14:38:39 -07:00
Craig Topper 76192182d0 [RISCV] Remove riscv-v-fixed-length-vector-elen-max command line option.
This was added before Zve extensions were defined. I think users
should use Zve32x or Zve32f now. We will lose support for limiting
ELEN to 16 or 8, but I hope no one was using that.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D123418
2022-04-11 10:14:48 -07:00
Craig Topper c266e50430 [RISCV] Remove ExtZvl enum from RISCVSubtarget. NFC
Having an enum with names that contain the string representation
of their value doesn't add any value. We can just use the numbers.

Reviewed By: kito-cheng, frasercrmck

Differential Revision: https://reviews.llvm.org/D123417
2022-04-11 10:01:17 -07:00
LiaoChunyu 505fce5a9e [RISCV] Add basic code modeling for llvm.experimental.stepvector intrinsic
The llvm.experimental.stepvector intrinsic on scalable vectors crashes
due to an invalid cost when the code is run through the loop unroll pass.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D122782
2022-04-11 10:19:23 +08:00
Craig Topper 4e561a581f [RISCV] Remove unnecessary cast to i8* when converting gather/scatter to strided load/store.
Not sure why I thought this necessary at the time.
2022-04-09 20:05:03 -07:00
Craig Topper 70046438d0 [RISCV] Only try LUI+SH*ADD+ADDI for int materialization if LUI+ADDI+SH*ADD failed.
There's an assert in LUI+SH*ADD+ADDI materialization that makes sure the
lower 12 bits aren't zero since that case should have been handled as
LUI+ADDI+SH*ADD. But nothing prevented the LUI+SH*ADD+ADDI checks from
running after the earlier code handled it.

The sequence would be the same length or longer so it wouldn't replace
the earlier sequence, but the assert happened before that was checked.

The vector holding the sequence also wasn't reset before the second
check so that guaranteed the sequence would never be found to be
shorter.

This patch fixes this by only trying the second expansion when the
earlier fails.

Fixes PR54812.

Reviewed By: benshi001

Differential Revision: https://reviews.llvm.org/D123406
2022-04-09 08:52:15 -07:00
Fraser Cormack 34e1b4774a [RISCV] Select unmasked FP setcc insts via ISel post-process
Similar to D123217 but for the floating-point patterns. No change in
generated output, while reducing the generated table size.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D123291
2022-04-08 17:13:43 +01:00
Craig Topper 1903b99154 [RISCV] Always select (and (srl X, C), Mask) as (srli (slli X, C2), C3).
SLLI is always compressible to C.SLLI as long as the source and dest
registers are the same.

ANDI and SRLI are only compressible if the register is x8-x15. By
using SLLI we have a better chance of generating shorter code.
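
The rewrite can be illustrated with a Python sketch. It assumes XLEN is 64 and that Mask is a run of `width` low one bits; under those assumptions the shift amounts work out to C3 = XLEN - width and C2 = C3 - C. The function names and this derivation are illustrative and not copied from the patch.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def and_srl(x, c, width):
    """(and (srl X, C), Mask) where Mask has `width` low bits set."""
    return ((x & MASK) >> c) & ((1 << width) - 1)


def slli_srli(x, c, width):
    """(srli (slli X, C2), C3) with C3 = XLEN - width and C2 = C3 - C."""
    c3 = XLEN - width
    c2 = c3 - c
    if c2 < 0:
        raise ValueError("field extends past the top of the register")
    shifted = (x << c2) & MASK   # slli drops the bits above the field
    return shifted >> c3         # srli drops the bits below the field
```

The left shift pushes everything above the wanted field out of the register, and the logical right shift then discards everything below it, so no AND is needed.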

I had to exclude one exclusion for the BEXTI case so that its
pattern match could still fire.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D123336
2022-04-08 09:04:04 -07:00
Kito Cheng 9c5aedfbf5 [RISCV] Fixing stack offset for RVV object with vararg in stack.
LLVM generated the wrong stack offset for RVV objects when the stack
also held variable arguments, because the vararg part was not counted
when calculating the RVV stack objects.

Also update the stack layout diagram to include the vararg part.

Stack layout ref:
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.cc#L3941

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D123180
2022-04-08 12:01:16 +08:00
Kito Cheng 690085c9b7 [RISCV] Store/restore RISCVMachineFunctionInfo into MIR YAML file
RISCVMachineFunctionInfo has some fields, such as VarArgsFrameIndex and
VarArgsSaveSize, that are calculated at the ISel lowering stage. That
information is not contained in MIR files, so test cases that rely on those
fields cannot be reproduced correctly from MIR dump files.

This patch adds MIR read/write support for those fields.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D123178
2022-04-08 11:55:48 +08:00
jacquesguan a55c19c44b [RISCV][NFC] Use defvar to simplify pattern definitions.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D123292
2022-04-08 02:51:30 +00:00
Craig Topper d98bea87ef [RISCV] Add more .vx patterns for VLMax integer setccs.
This patch synchronizes the structure of the templates with those
in RISCVInstrInfoVVLPatterns.td so that we get patterns with .vx
on the left hand side.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D123255
2022-04-07 09:17:43 -07:00
Craig Topper 82662b753d [RISCV] Add swapped patterns to VPatIntegerSetCCVL_VIPlus1.
This matches VPatIntegerSetCCVL_VI_Swappable. But as noted in the
FIXME this may only be needed due to lack of canonicalization on
VP_SETCC.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D123239
2022-04-07 09:17:08 -07:00
Luís Marques d09d297c5d [RISCV] Fix crash for section alignment with .option norvc
The existing code wasn't getting the subtarget info from the fragment,
so the current status of RVC would be ignored. This would cause a crash
for the new test case when the target then reported it couldn't write
the requested number of code alignment bytes.

Differential Revision: https://reviews.llvm.org/D122236
2022-04-07 12:02:27 +01:00
Fraser Cormack 8ebc9b1560 [RISCV] Select unmasked integer setcc insts via ISel post-process
This patch has no effect on the generated code, whilst mitigating the
increase in ISel table size caused by the recent addition of masked
patterns.

I aim to do the same for floating-point patterns once D123051 lands,
giving us a reason to use masked floating-point patterns.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D123217
2022-04-07 09:30:19 +01:00
Fraser Cormack 8216255c9f [RISCV][VP] Add basic RVV codegen for vp.fcmp
This patch adds the necessary infrastructure to lower vp.fcmp via
ISD::VP_SETCC to RVV instructions.

Most notably this patch adds cond-code legalization for VP_SETCC,
reusing the existing TargetLowering::LegalizeSetCCCondCode by passing in
additional SDValue parameters for the Mask and EVL. This method then
uses VP operations to legalize the condcode.

There is still a general lack of canonicalization on VP_SETCC as opposed
to SETCC which results in worse code than is theoretically possible.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123051
2022-04-07 09:16:07 +01:00
Liqin Weng f891123556 [RISCV] Add CMOV isel pattern for (select (setgt X, Imm), Y, Z)
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122644
2022-04-07 05:55:53 +00:00
Lian Wang 1b547799c5 [RISCV] Supplement patterns for vnsrl.wx/vnsra.wx when splat shift is sext or zext
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122786
2022-04-07 02:21:41 +00:00
Craig Topper e13a44b460 [RISCV] Add lowering for vp.sext and vp.zext.
Including mask vector inputs.

Reviewed By: frasercrmck, rogfer01

Differential Revision: https://reviews.llvm.org/D123150
2022-04-06 09:59:49 -07:00
Fraser Cormack 6be5e875be [RISCV][VP] Add basic RVV codegen for vp.icmp
This patch adds the minimum required to successfully lower vp.icmp via
the new ISD::VP_SETCC node to RVV instructions.

Regular ISD::SETCC goes through a lot of canonicalization which targets
may rely on and which has not yet been ported to VP_SETCC. It also
supports expansion of individual condition codes and a non-boolean
return type. Support for all of that will follow in later patches.

In the case of RVV this largely isn't a problem as the vector integer
comparison instructions are plentiful enough that it can lower all
VP_SETCC nodes on legal integer vectors except for boolean vectors,
which regular SETCC folds away immediately into logical operations.

Floating-point VP_SETCC operations aren't as well supported in RVV and
the backend relies on condition code expansion, so support for those
operations will come in later patches.

Portions of this code were taken from the VP reference patches.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122743
2022-04-06 16:51:22 +01:00
Craig Topper 3c831c9b28 [RISCV] Add support for vp.fptosi where the result is a mask type.
We can do this conversion by converting to the same-sized integer type and then comparing the result with 0. The conversion is undefined if the converted FP value doesn't fit in an i1.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D122678
2022-04-05 09:48:04 -07:00
Craig Topper d970e96c53 [RISCV] Add lowering for vp.fptoui and vp.uitofp.
This is a straightforward extension of D122512 to unsigned integers.
2022-04-01 18:28:46 -07:00
Craig Topper fa630e7594 [RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1).
If we expand (uaddo X, 1) we previously expanded the overflow calculation
as (X + 1) <u X. This potentially increases the live range of X and
can prevent X+1 from reusing the register that previously held X.

Since we're adding 1, overflow only occurs if X was UINT_MAX in which
case (X+1) would be 0. So this patch adds a special case to expand
the overflow calculation to (X+1) == 0.
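
The equivalence of the two overflow checks can be sketched in Python, assuming a 32-bit unsigned type. The function names are illustrative only.

```python
BITS = 32
UINT_MAX = (1 << BITS) - 1


def uaddo_generic(x):
    """Generic expansion: overflow if the wrapped sum is below X."""
    total = (x + 1) & UINT_MAX
    return total, total < x


def uaddo_plus_one(x):
    """Special case for adding 1: overflow only if the sum wrapped to 0."""
    total = (x + 1) & UINT_MAX
    return total, total == 0
```

The second form no longer reads X after the addition, which is what allows X+1 to reuse the register that held X.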

This seems to help with uaddo intrinsics that get introduced by
CodeGenPrepare after LSR. Alternatively, we could block the uaddo
transform in CodeGenPrepare for this case.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122933
2022-04-01 13:14:10 -07:00
Lian Wang 62dd3674bc [RISCV] Supplement SDNode patterns for vfwmul/vfwadd/vfwsub
Reviewed By: jacquesguan

Differential Revision: https://reviews.llvm.org/D122720
2022-04-01 03:09:50 +00:00
Fraser Cormack ee51aefba0 [RISCV][NFC] Minor formatting fix 2022-03-31 16:15:22 +01:00
Fraser Cormack a276d1f44b [RISCV][NFC] Fix formatting on one line 2022-03-31 13:17:37 +01:00
ShihPo Hung 2f1261abe4 [RISCV][RVV] Add Uses = [FRM] and mayRaiseFPException = true to RVV instructions
This patch adds Uses = [FRM] and mayRaiseFPException = true to the following
instructions:

VFADD, VFSUB, VFRSUB, VFMUL, VFDIV, VFRDIV
VFWADD, VFWSUB, VFWMUL
VFMADD, VFMACC, VFMSAC, VFMSUB
VFNMADD, VFNMACC, VFNMSAC, VFNMSUB
VFWMACC, VFWMSAC
VFWNMACC, VFWNMSAC
VFSQRT, VFREC7
VFREDOSUM, VFREDUSUM
VFWREDOSUM, VFWREDUSUM

and only adds mayRaiseFPException = true to the following instructions:

VFRSQRT7
VFMIN, VFMAX, VFREDMIN, VFREDMAX
VMFEQ, VMFNE, VMFLT, VMFLE, VMFGT, VMFGE

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D121087
2022-03-31 01:33:17 -07:00
Fraser Cormack 893d63fbdc [RISCV][NFC] Fix comment to refer to correct file 2022-03-31 08:59:10 +01:00
Lian Wang b3851e9931 [RISCV] Add VL patterns for vfwmul/vfwadd/vfwsub
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D122369
2022-03-31 07:08:58 +00:00
Craig Topper 4477500533 [RISCV] ISel (and (shift X, C1), C2)) to shift pair in more cases
Previously, these isel optimizations were disabled if the AND could
be selected as an ANDI instruction. This patch disables the optimizations
only if the immediate is valid for C.ANDI. If we can't use C.ANDI,
we might be able to compress the shift instructions instead.

I'm not checking the C extension since we have relatively poor test
coverage of the C extension. Without C extension the code size
should be equal. My only concern would be if the shift+andi had
better latency/throughput on a particular CPU.

I did have to add a peephole to match SRLIW if the input is zexti32
to prevent a regression in rv64zbp.ll.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D122701
2022-03-30 11:46:42 -07:00
Craig Topper 7417eb29ce [RISCV] Use getSplatBuildVector instead of getSplatVector for fixed vectors.
The splat_vector will be legalized to build_vector eventually
anyway. This patch makes it take fewer steps.

Unfortunately, this results in some codegen changes. It looks
like it comes down to how the nodes were ordered in the topological
sort for isel. Because the build_vector is created earlier we end up
with a different ordering of nodes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D122185
2022-03-30 11:36:34 -07:00
Liqin Weng 4cb85da811 [RISCV] Add CMIX isel pattern for (xor (and (xor rs1, rs3), rs2), rs3)
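
As a sanity check on the pattern in the title, here is a Python sketch comparing it with a bitwise mix. It assumes the draft Zbt meaning of CMIX, namely that the result takes bits from rs1 where rs2 is set and from rs3 elsewhere; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def cmix(rs1, rs2, rs3):
    """Assumed CMIX semantics: rs2 selects between bits of rs1 and rs3."""
    return ((rs1 & rs2) | (rs3 & ~rs2)) & MASK64


def xor_and_xor(rs1, rs2, rs3):
    """The matched DAG pattern: (xor (and (xor rs1, rs3), rs2), rs3)."""
    return (((rs1 ^ rs3) & rs2) ^ rs3) & MASK64
```

Where a bit of rs2 is 1, the two XORs with rs3 cancel and leave rs1; where it is 0, the AND clears the term and only rs3 remains.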
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122702
2022-03-30 16:51:09 +08:00
Fraser Cormack 75047577d6 [RISCV] Trim RVV isel pats matchable via DAG post-process
In D122512, several masked patterns were added to support lowering of
vector-predicated float-to-int and int-to-float conversions. With the
introduction of these patterns, all of the old "unmasked" patterns are
matchable via the DAG post-process introduced in D118810, once the relevant
opcode entries are set up in the helper table.

Locally this reduces the generated isel table by 4%.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D122637
2022-03-30 08:56:38 +01:00
Liqin Weng 7f81765898 [RISCV][NFC] Add immediate tests for the icmp instruction
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122651
2022-03-30 02:51:26 +00:00
Zakk Chen b578330754 [RISCV] Use maskedoff to decide mask policy for masked compare and vmsbf/vmsif/vmsof.
Masked compares and vmsbf/vmsif/vmsof are always tail agnostic, so we can
check the maskedoff value to decide the mask policy rather than have an
additional policy operand.

Reviewed By: craig.topper, arcbbb

Differential Revision: https://reviews.llvm.org/D122456
2022-03-29 18:05:33 -07:00
Zakk Chen 10b2760da0 Revert "[RISCV] Add policy operand for masked compare and vmsbf/vmsif/vmsof IR"
This reverts commit 10fd2822b7.

I have a better implementation for those operations without the
additional policy operand.
Masked compares and vmsbf/vmsif/vmsof are always tail agnostic, so we can
assume an undef maskedoff is mask agnostic.

Differential Revision: https://reviews.llvm.org/D122455
2022-03-29 18:05:33 -07:00
Liqin Weng d660c0d793 [RISCV] Optimize LI+SLT to SLTI+XORI for immediates in specific range
This transform saves one GPR.
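
The message does not spell out the rewrite. One plausible reading, assumed here, is that a signed X > Imm comparison is computed as the inverse of X < Imm + 1 whenever Imm + 1 fits the 12-bit SLTI immediate, which removes the temporary register that LI would need. The Python sketch below only checks that arithmetic; all names are illustrative.

```python
def fits_simm12(value):
    """True if value fits the signed 12-bit immediate of SLTI."""
    return -2048 <= value <= 2047


def slt(a, b):
    """Set-less-than on signed values, returning 0 or 1."""
    return 1 if a < b else 0


def setgt_li_slt(x, imm):
    """LI tmp, imm; SLT rd, tmp, x  (needs the extra register tmp)."""
    tmp = imm
    return slt(tmp, x)


def setgt_slti_xori(x, imm):
    """SLTI rd, x, imm+1; XORI rd, rd, 1  (no temporary register)."""
    if not fits_simm12(imm + 1):
        raise ValueError("imm + 1 does not fit a 12-bit immediate")
    return slt(x, imm + 1) ^ 1
```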

Reviewed By: craig.topper, benshi001

Differential Revision: https://reviews.llvm.org/D122051
2022-03-29 14:46:49 +08:00
Craig Topper 45e85feba6 [RISCV] Pull APInt/computeKnownBits specifics out of computeGREVOrGORC. NFC
This function now takes a uint64_t instead of an APInt. The caller
is responsible for masking the shift amount, extracting and inserting
into the KnownBits APInts, and inverting to compute zeros.

This is less code and a cleaner division of responsibilities.
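
For orientation, a plain-integer GREV/GORC helper of the kind described might look like the Python sketch below. It follows the generalized reverse and generalized OR-combine definitions from the draft bitmanip specification as understood here; the name `grev_or_gorc` and its signature are hypothetical and do not reproduce the LLVM function. As in the commit, masking the shift amount is left to the caller.

```python
MASK64 = (1 << 64) - 1

# (stage shift, mask selecting the low half of each pair at that stage)
STAGES = [
    (1, 0x5555555555555555),
    (2, 0x3333333333333333),
    (4, 0x0F0F0F0F0F0F0F0F),
    (8, 0x00FF00FF00FF00FF),
    (16, 0x0000FFFF0000FFFF),
    (32, 0x00000000FFFFFFFF),
]


def grev_or_gorc(x, shamt, is_gorc):
    """Apply GREV (swap) or GORC (OR-combine) stages selected by shamt."""
    x &= MASK64
    for shift, lo_mask in STAGES:
        if shamt & shift:
            # Exchange the low and high halves of each pair.
            swapped = ((x & lo_mask) << shift) | ((x >> shift) & lo_mask)
            # GORC keeps the original bits as well; GREV replaces them.
            x = (x | swapped) if is_gorc else swapped
    return x
```

For example, a shift amount of 56 enables the 8, 16, and 32 stages, which for GREV amounts to a full byte swap of the 64-bit value.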
2022-03-28 20:53:54 -07:00