Commit Graph

2266 Commits

Author SHA1 Message Date
Craig Topper c3c17b1695 [RISCV] Use MVT for the argument to getMaskTypeFor. NFC
Only one caller didn't already have an MVT and that was easy to
fix. Since the return type is MVT and it uses MVT::getVectorVT,
taking an MVT as input makes the most sense.
2022-07-11 15:14:44 -07:00
Craig Topper 759e5e0096 [RISCV] Remove doPeepholeLoadStoreADDI.
All of the cases should be handled by SelectAddrRegImm now.

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D129451
2022-07-11 10:44:33 -07:00
Craig Topper 907d923a20 [RISCV] Move the custom isel for (add X, imm) into SelectAddrRegImm.
This custom isel was used to split the lo12 bits of the imm so that
they could be folded into load/store addresses via a post-isel
peephole.

This patch instead splits the immediate during isel and folds the
lo12 removing the need for the post-isel peephole to do anything.

After this we'll be able to remove the post-isel peephole.
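
To make the split concrete, here is a minimal sketch of the usual
Hi/Lo12 decomposition (illustrative C, not the actual
RISCVISelDAGToDAG code; the names are made up):

```c
#include <stdint.h>

/* Split imm so that lo is the sign-extended low 12 bits (foldable into
   a load/store offset) and hi is the part materialized by LUI/ADDI. */
static void split_imm(int64_t imm, int64_t *hi, int64_t *lo) {
  *lo = imm & 0xFFF;
  if (*lo >= 2048)
    *lo -= 4096;      /* sign-extend bit 11 */
  *hi = imm - *lo;    /* low 12 bits of hi are zero */
}
/* e.g. imm = 2052: lo = -2044, hi = 4096, so the load/store can use
   offset -2044 from a base that materializes 4096. */
```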

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D129450
2022-07-11 10:44:33 -07:00
Craig Topper 1a2bd44b77 [RISCV] Make shouldConvertConstantLoadToIntImm return true unless enableUnalignedScalarMem is true.
This restores the old behavior before D129402 when
enableUnalignedScalarMem is false. This fixes a regression spotted
by @asb.

To fix this correctly, we need to consider alignment of the load
we'd be replacing, but that's not possible in the current interface.
2022-07-11 09:40:08 -07:00
David Sherwood 03fee6712a [LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extracting the first bit of this mask and using it for the
conditional branch (per-lane semantics are sketched below).
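
For reference, a rough sketch of the per-lane semantics of
get.active.lane.mask (each lane is an unsigned compare of base + j
against the trip count, with the add done wide enough not to wrap):

```c
#include <stdint.h>

/* Lane j of get.active.lane.mask(base, n). */
static int lane_active(uint64_t base, uint64_t n, uint64_t j) {
  return base + j < n;  /* assumes base + j does not overflow */
}
/* Computing the mask for the *next* iteration means lane 0 tests
   next_base < n, i.e. exactly "are there iterations remaining?". */
```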

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
2022-07-11 13:46:55 +01:00
LiaoChunyu 3f68f0f816 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcodes:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-07-11 14:10:27 +08:00
Pengcheng Wang 8977989449 [RISCV] Increase complexity of RVV element extraction patterns
Some tests failed in our downstream because the matcher tried the
VFMV+FSD pattern first. Both the FSD and VSE patterns have the same
complexity, but FSD is matched before VSE in the generated
matcher table.

This problem only occurs in our downstream (so sorry that I can't
provide a test here) and increasing the value of `AddedComplexity`
can fix it.

Reviewed By: StephenFan, craig.topper

Differential Revision: https://reviews.llvm.org/D129360
2022-07-11 10:53:15 +08:00
Craig Topper 35ec8a423d [RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools.
I think it only makes sense to return true here if we aren't going
to turn around and create a constant pool for the immediate.

I left out the check for useConstantPoolForLargeInts() thinking
that even if you don't want the compiler to create a constant pool
you might still want to avoid materializing an integer that is
already available in a global variable.

Test file was copied from AArch64/ARM and has not been committed yet.
Will post separate review for that.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D129402
2022-07-10 14:10:17 -07:00
Craig Topper 5f7641a3be [RISCV] Modify the custom isel for (add X, imm) used by load/stores.
We have custom isel that tries to select the Lo12 bits using a
separate ADDI that can later folded into the load/store address
by the post-isel peephole.

This patch disables this if the load/store already had a non-zero
offset. A non-zero offset implies that CodeGenPrepare split several
large offsets used by different loads and stores into a common large
offset and multiple small offsets that could be folded. Folding more
of the lo12 bits changes this common offset by increasing the small
offsets. While this can save an instruction to materialize the common
offset, it can also prevent the small offsets from fitting in a
compressed load/store instruction.

Removing this also simplifies the last piece needed to fold the custom
isel for add into SelectAddrRegImm and remove the post-isel peephole.
2022-07-09 22:47:27 -07:00
Craig Topper 9c6a2200e2 [RISCV] Support folding constant addresses in SelectAddrRegImm.
We already handled this by folding an ADDI in the post-isel peephole.
My goal is to remove that peephole so this adds the functionality
to isel.
2022-07-09 13:12:02 -07:00
Philip Reames b12930e133 [RISCV] Switch to using get.active.lane.mask when tail folding
The motivation here is to a) bring us closer to alignment with AArch64 under the assumption that codepath is better tested, and b) simplify pattern matching in an upcoming change.

The immediate impact is a significant IR reduction but a fairly minimal change in the generated assembly. Due to a difference in expansion behavior we get a saturating add vs a non-saturating one for the old code, but that's about it. This difference comes down to different handling of overflow, which doesn't seem to be possible here anyway, so the assembly codegen is arguably a minor regression. I don't expect that to matter in practice.

Differential Revision: https://reviews.llvm.org/D129221
2022-07-08 10:24:59 -07:00
Craig Topper 92f1794d41 [RISCV] Mark fminnum_vl and fmaxnum_vl as commutable. 2022-07-08 10:19:09 -07:00
Philip Reames 264018d764 [RISCV] Mark vsadd(u)_vl as commutable
This allows fixed length vectors involving splats on the LHS to commute into the _vx form of the instruction. Oddly, the generic canonicalization rules appear to catch the scalable vector cases. I haven't fully dug in to understand why, but I suspect it's because of a difference in how we represent splats (splat_vector vs build_vector).

Differential Revision: https://reviews.llvm.org/D129302
2022-07-08 10:18:21 -07:00
Craig Topper a246eb6814 [RISCV] Mark (s/u)min_vl and (s/u)max_vl as commutable. 2022-07-08 09:59:42 -07:00
Kito Cheng 5c45ae4108 [RISCV] Fix wrong register rename for store value during make-compressible optimization
The current implementation renames both registers in a store
instruction when we store the base address into memory with the same
base register. That is fine when the offset is 0, but it is a wrong
transform when the offset isn't 0. A small example:

sd      a0, 808(a0)

We should not transform into:

addi    a2, a0, 768
sd      a2, 40(a2)

That should just rename base address like this:

addi    a2, a0, 768
sd      a0, 40(a2)

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128876
2022-07-08 18:07:17 +08:00
Lian Wang 9cfb28d672 [RISCV] Change VECTOR_SPLICE mask operation from expand to promote
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128717
2022-07-08 06:20:22 +00:00
Diego Caballero bf1758c3dc Revert "[RISCV] Optimize 2x SELECT for floating-point types"
This reverts commit 1178992c72.
2022-07-07 22:54:00 +00:00
Craig Topper 088bb8a328 [RISCV] Add more SHXADD patterns where the input is (and (shl/shr X, C2), C1)
It might be possible to rewrite (and (shl/shr X, C2), C1) in a way
that gives us a left shift that can be folded into a SHXADD.
2022-07-05 16:21:47 -07:00
Craig Topper a1cd3f49b6 [RISCV] Use a switch statement in PreprocessISelDAG. NFC
This should make it easier to add more peepholes in the future.
2022-07-05 12:25:04 -07:00
Craig Topper c15bcad2f9 [RISCV] Update PreprocessISelDAG to use RemoveDeadNodes.
Instead of deleting nodes as we go, delete all dead nodes if a
change is made. This allows adding peepholes that might make
multiple nodes dead.
2022-07-05 12:25:03 -07:00
Craig Topper f27672924e [RISCV] Replace an explicit check with an assert.
Shift amounts should never be 0 or more than bitwidth - 1.
2022-07-04 23:21:54 -07:00
Craig Topper 66790b70ea [RISCV] Rename some variables for clarity. NFC 2022-07-04 23:21:54 -07:00
jacquesguan 063500afc0 [RISCV][NFC] Merge the isolated declaration into foreach.
Reviewed By: benshi001

Differential Revision: https://reviews.llvm.org/D129063
2022-07-05 10:17:45 +08:00
luxufan c06d0b4d02 [RISCV] Add ADDI instr for computing FrameIndex address
RVV doesn't have an immediate field for memory addressing. Currently
we build MachineInstructions in PEI to compute the stack offset for
RVV load/store instructions. These instructions are added too late to
be optimized by CSE, LICM, and similar passes.

This patch prevents FrameIndex SDNodes from being matched by RVV
load/store instruction selection patterns, so FrameIndex SDNodes are
instead selected as `ADDI GPR, targetframeindex`.

There are 2 advantages to this change:
1. Stack objects address computing can be optimized by machine function
passes.
2. Since the ADDI instruction's destination register can be used as a
temp register, we can save an emergency spill slot.

Differential Revision: https://reviews.llvm.org/D128187
2022-07-04 22:13:35 +08:00
Craig Topper d36e09cfe5 [RISCV] Add more SHXADD patterns.
This handles the code we get for:

int foo(unsigned x, int *y) {
    return y[x >> 3];
}

The srl and shl implied by the array index will be combined to
form (srl (and X, C2), C1). We need to reverse this to get back
the shl to fold into SHXADD.
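
The rewrite relies on a simple shift identity; a quick check (a
sketch that ignores the zero-extension of the 32-bit index, which
just folds into C2):

```c
#include <stdint.h>
#include <assert.h>

int main(void) {
  /* (shl (srl x, 3), 2) from the array index is combined by DAG into
     (srl (and x, ~7), 1); we must recover the shl form for SH2ADD. */
  uint64_t x = 0x12345678abcdef01ull;
  assert(((x >> 3) << 2) == ((x & ~7ull) >> 1));
  return 0;
}
```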
2022-07-03 21:57:05 -07:00
Craig Topper 8eb4dcb737 [RISCV] Move some SHXADD matching cases into a ComplexPattern. NFC
Some more complex cases require checking the relationship of
operands on different nodes of the match. They also require
additional instructions to be created. Using a ComplexPattern
gives us that flexibility.

I'll be adding another pattern in a future patch.
2022-07-03 21:57:05 -07:00
Craig Topper 13d58ff9f3 [RISCV] Replace call to APInt::countTrailingZeros with uint64_t version. NFC
We know the number of bits is 64 or 32 so we can use the uint64_t
version directly. This saves the APInt from needing to check for
the small vs. large size.
2022-07-03 09:00:01 -07:00
luxufan 0f45eaf0da [RISCV] Add a scavenge spill slot when use ADDI to compute scalable stack offset
Computing a scalable offset needs up to two scratch registers. We add
scavenge spill slots according to the result of `RISCV::isRVVSpill`
and `RVVStackSize`. Since ADDI is not included in `RISCV::isRVVSpill`,
PEI doesn't add scavenge spill slots for scratch registers when using
ADDI to get scalable stack offsets.

The ADDI instruction has a destination register which can be used as
a scratch register. So one scavenge spill slot is sufficient for
computing scalable stack offsets.

Differential Revision: https://reviews.llvm.org/D128188
2022-07-03 20:18:13 +08:00
Craig Topper 7e4ab9d5b8 [RISCV] Add more SHXADD isel patterns.
This handles the code we get for

int foo(int* x, unsigned y) {
  return x[y >> 1];
}

The shift right and the shl will get DAG combined into
(shl (and X, 0xfffffffe), 1). We have custom isel to match the
shl+and, but with Zba the (add (shl X, 1), Y) part will get
matched and leave the and to be iseled by itself. This commit
adds a larger pattern that includes the and.
2022-07-02 23:11:22 -07:00
Craig Topper 5d787689b1 [RISCV] Match RISCVISD::ADD_LO in SelectAddrRegImm.
This allows us to fold global and constant pool addresses into
load/store during isel instead of in the post-isel peephole. I
did not copy the alignment check for ConstantPoolSDNode because it
wasn't tested.

This is a step towards being able to remove the post-isel
peephole.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128738
2022-07-02 09:51:06 -07:00
Craig Topper b2e9684fe4 [RISCV] isel (shl (and X, C2), C) -> (slli (srliw X, C3), C3+C).
where C2 has 32 leading zeros and C3 trailing zeros.

When the shl is used by an add and C is 1, 2, or 3, we end up matching
(add (shl X, C), Y) first. This leaves an and with a constant that
is harder to materialize.
2022-07-02 01:04:44 -07:00
Craig Topper 9ac548e118 [RISCV] isel (add (and X, 0xFFFFFFFE), Y) as (SH1ADD (SRLIW X, 1), Y).
Similar for SH2ADD and SH3ADD.

This is what we get from

int foo(int* x, unsigned y) {
  return x[y >> 1];
}

This allows us to avoid materializing 0xFFFFFFFE into a register.
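
A quick sanity check of the equivalence (a sketch; srliw shifts the
low 32 bits and sign-extends the 32-bit result, but bit 31 of the
result is zero here, so the sign-extension is a no-op):

```c
#include <stdint.h>
#include <assert.h>

int main(void) {
  uint64_t x = 0xdeadbeefcafef00dull, y = 1000;
  uint64_t srliw = (uint32_t)x >> 1;  /* SRLIW x, 1 */
  /* (add (and X, 0xFFFFFFFE), Y) == (SH1ADD (SRLIW X, 1), Y) */
  assert((x & 0xFFFFFFFEull) + y == (srliw << 1) + y);
  return 0;
}
```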
2022-07-01 23:52:29 -07:00
Yeting Kuo 5744b9cb79 [RISCV] Restore "Enable shrink wrap by default"
This reverts commit 7af3d4ab3d.

RISC-V reverted the shrink wrap patch for bug 53662. Since the bug is fixed
by D123679, this commit re-enables it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128965
2022-07-02 11:13:13 +08:00
Yeting Kuo 8590a35ef9 [RISCV][NFC] Simplify condition of IsTU.
Just simplify code.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D128972
2022-07-02 09:22:38 +08:00
Craig Topper 188582b7e0 [RISCV] Considering existing offset in the alignment when folding ADDIs into load/store.
getPointerAlignment and ConstantPoolSDNode::getAlign only consider
the alignment of the object. If we already have a non-zero offset
into the object, that may have reduced the alignment.

Since the base pointer will become an LUI with the old offset, we
need to be sure the new offset fits in the alignment of the address
that will be used to create the LUI immediate.

I'm not sure it is possible to have a non-zero offset in the
GlobalAddressSDNode or ConstantPoolSDNode at this point today so this
may only be a theoretical bug.

Differential Revision: https://reviews.llvm.org/D129006
2022-07-01 11:18:40 -07:00
Fangrui Song 6e8ec13d3f [MC][RISCV] Suppress R_RISCV_{ADD,SUB}32 in .apple_names .apple_types after D127549
This fixes test/DebugInfo/Generic/accel-table-hash-collisions.ll and
cross-cu-inlining.ll when the default triple is riscv. llvm-dwarfdump
--apple-names does not resolve R_RISCV_{ADD,SUB}32 in .apple_names .apple_types
and having ADD/SUB will cause decoding failure `Atom[0]: Error extracting the
value`.
2022-07-01 11:15:04 -07:00
Craig Topper 058d521ea4 [RISCV] Avoid repeated code in SelectAddrRegImm. NFC 2022-06-30 17:22:04 -07:00
Craig Topper 5ca39a55a7 [RISCV] Remove an unnecessary copy of X0 in selectShiftMask.
We know which instruction we're emitting so it's ok to directly
encode X0 into the instruction. We only need to create a copy when
a constant 0 is selected without context of what instruction uses it.
2022-06-30 15:11:58 -07:00
Craig Topper 354e04554a [RISCV] Make custom isel for (add X, imm) used by load/stores more selective.
Only handle immediates that would produce an ADDI or ADDIW of Lo12
as the final instruction in their materialization.

As the test changes show, this removes immediates that materialize
with lui+addiw, which is not the same as lui+addi.
2022-06-30 14:20:11 -07:00
Craig Topper ae5f5eb2f1 [RISCV] Replace some uses of XLenVT in RISCVDAGToDAGISel::Select with the original Node VT. NFCI
These should contain the same thing, but we aren't consistent about
which we use.

Since we call ReplaceNode, it seems more correct to use the initial VT.
2022-06-30 13:00:44 -07:00
Craig Topper 2b7b609821 [RISCV] Use getVTList to simplify creation of vleff MachineSDNode. NFC
We don't need to pass the 3 VTs separately, we already have a list
available to us.
2022-06-30 11:34:02 -07:00
Craig Topper 89e7e59621 [RISCV] Use the VT passed into selectImm instead of XLenVT. NFCI
I think the VT passed in will always be XLenVT.
2022-06-30 11:15:28 -07:00
Craig Topper 51d672946e [RISCV] Fold (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C)
Similar for a subtract with a constant left hand side.

(sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine
for (sext (add (trunc X to i32), C1) to i64).

For RISCV, we should lower this as addiw which means turning it into
(sext_inreg (add X, C1)).
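
For C == 0 the equivalence can be sanity-checked directly (a sketch;
C's right shift of a negative value is implementation-defined but
arithmetic on mainstream compilers):

```c
#include <stdint.h>
#include <assert.h>

int main(void) {
  int64_t x = 123456789, c1 = -42;
  /* (sra (add (shl X, 32), C1<<32), 32) */
  int64_t folded =
      (int64_t)(((uint64_t)x << 32) + ((uint64_t)c1 << 32)) >> 32;
  /* (sext_inreg (add X, C1), i32), i.e. what addiw computes */
  int64_t addiw = (int32_t)((uint32_t)x + (uint32_t)c1);
  assert(folded == addiw);
  return 0;
}
```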

There is an existing DAG combine to convert back to (sext (add (trunc X
to i32), C1) to i64), but it requires isTruncateFree to return true
and for i32 to be a legal type as it uses sign_extend and truncate
nodes. So that doesn't work for RISCV.

If the outer sra happens to be used by a shl by constant, it will be
folded and the shift amount of the sra will be changed before we
can do our own DAG combine. This requires us to match the more
general pattern and restore the shl.

I had wanted to do this as a separate (add (shl X, 32), C1<<32) ->
(shl (add X, C1), 32) combine, but that hit an infinite loop for some
values of C1.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128869
2022-06-30 09:01:24 -07:00
Craig Topper 9ace5af049 [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C).
The sext_inreg can often be folded into an earlier instruction by
using a W instruction. The sext_inreg also works better with our ABI.

This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128843
2022-06-30 09:01:24 -07:00
Philip Reames dd48d3ad0e Revert "[RISCV] Avoid changing etype for splat of 0 or -1"
This reverts commit 755c84c62c.  A bug was reported on the original review thread (https://reviews.llvm.org/D128006), and on inspection this patch is simply wrong.  It needs to be checking for VLInBytes, not MaxVL.  These happen to be the same when using AVL=VLMAX (which is quite common), but this does not fold when AVL != VLMAX.
2022-06-29 10:27:02 -07:00
Craig Topper 7cbfb4eb7a [RISCV] Select (srl (and X, C2), C) as (slli (srliw X, C3), C3-C).
If C2 has 32 leading zeros and C3 trailing zeros.
2022-06-29 09:15:09 -07:00
Craig Topper 5dcc525492 [RISCV] Fold (add X, [-4096, -2049]) or (add X, [2048,4096]) into load/store address during isel.
Previously we iseled this to a pair of ADDIs and relied on a post-isel
peephole to fold one of the ADDIs into the load/store. Now we split
the immediate into two parts the same way isel does and fold one of
the pieces. If the add also has a non-memory use, it will be selected
separately, and the larger piece will CSE with the ADDI we created for
the memory use.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128741
2022-06-28 16:59:39 -07:00
Philip Reames da60558d8a [RISCV] Rename getMin/MaxVLen to getArchMin/MaxVLen and make protected [nfc] 2022-06-28 15:54:40 -07:00
Philip Reames 860c62f53c [RISCV] Refine known bits for READ_VLENB
This implements known bits for READ_VLENB using any information known about minimum and maximum VLEN. There's an additional assumption that VLEN is a power of two.

The motivation here is mostly to remove the last use of getMinVLen, but while I was here, I decided to also fix the bug for VLEN < 128 and handle max from command line generically too.
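
A sketch of the resulting known bits, assuming only that VLEN is a
power of two bounded by a known [min, max] (names illustrative, not
the actual implementation):

```c
#include <stdint.h>

/* VLENB = VLEN / 8 is a single power of two in [min_vlen/8, max_vlen/8],
   so every bit outside that log2 range is known zero. */
static uint64_t known_zero_vlenb(uint64_t min_vlen, uint64_t max_vlen) {
  uint64_t possible = 0;
  for (uint64_t b = min_vlen / 8; b <= max_vlen / 8; b <<= 1)
    possible |= b;
  return ~possible;
}
/* e.g. min_vlen = 128, max_vlen = 65536: only bits 4..13 may be set. */
```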

Differential Revision: https://reviews.llvm.org/D128758
2022-06-28 15:42:14 -07:00
Craig Topper 02c8453e64 [RISCV] Teach RISCVMergeBaseOffset to handle read-modify-write of a global.
The pass was previously limited to LUI+ADDI being used by a single
instruction.

This patch allows the pass to optimize multiple memory operations
that use the same offset. Each of them will receive a separate %lo
relocation. My main motivation is to handle a read-modify-write
where we have a load and store to the same address, but I didn't
restrict it to that case.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128599
2022-06-28 11:46:24 -07:00
Alex Bradbury 7bcfcabbd1 [RISCV] Implement support for the Zicbop extension
Implements the ratified RISC-V Base Cache Management Operation ISA
Extension: Zicbop, as described in
https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf.

This is implemented in a separate patch from Zicbom and Zicboz due to it
requiring a new ASM operand type to be defined.

Differential Revision: https://reviews.llvm.org/D117433
2022-06-28 12:43:26 +01:00
Alex Bradbury 4f40ca53ce [RISCV] Implement support for the Zicbom and Zicboz extensions
Implements the ratified RISC-V Base Cache Management Operation ISA
Extensions: Zicbom and Zicboz, as described in
https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf.

Zicbop is implemented in a separate patch due to it requiring a new ASM
operand type to be defined.

As discussed in the relevant issue in the upstream spec
https://github.com/riscv/riscv-CMOs/issues/47, the cbo.* instructions
use the format (rs1) or 0(rs1) for their operand, similar to the AMOs.

Differential Revision: https://reviews.llvm.org/D117432
2022-06-28 12:43:25 +01:00
Lian Wang 96ab083622 [RISCV] Support VECTOR_REVERSE mask operation.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128627
2022-06-28 07:48:51 +00:00
LiaoChunyu 1178992c72 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcodes:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-06-28 12:02:05 +08:00
Craig Topper ea1b861278 [RISCV] Fix misleading formatting and remove a dead getNode call. NFC 2022-06-27 18:49:57 -07:00
Craig Topper 87077c7eb5 [RISCV] Remove repeated calls to getSExtValue. NFC 2022-06-27 13:42:58 -07:00
Philip Reames 0533b6e2f6 [RISCV] Remove a use of getMinVLen in favor of getRealMinVLen
The latter is possibly greater than the former, and thus the assert was overly strong when a wider VLEN was set at the command line.
2022-06-27 12:52:24 -07:00
Philip Reames aadc9d26a3 [RISCV] Cost model for scalable reductions
This extends the existing cost model for reductions for scalable vectors.

The existing cost model assumes that reductions are roughly logarithmic in cost for unordered variants and linear for ordered ones. This change keeps that same basic model, and extends it out to the maximum number of elements a scalable vector could possibly have.

This results in costs which aren't terribly high for unordered reductions, but are for ordered ones. This seems about right; we want to strongly bias away from using scalable ordered reductions if the cost might be linear in VL.
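
The shape of the model, sketched with assumed example numbers
(spec-capped VLEN of 65536, e32, LMUL=1):

```c
#include <stdio.h>

int main(void) {
  unsigned max_elems = 65536 / 32;  /* worst-case lanes: 2048 */
  unsigned unordered = 0;           /* tree reduce: ~log2(lanes) steps */
  for (unsigned n = max_elems; n > 1; n /= 2)
    unordered++;
  unsigned ordered = max_elems;     /* ordered reduce: linear in VL */
  printf("unordered ~%u steps, ordered ~%u steps\n", unordered, ordered);
  return 0;
}
```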

Differential Revision: https://reviews.llvm.org/D127447
2022-06-27 12:44:38 -07:00
Craig Topper eb9d21d65c [RISCV] Remove extra semicolon. NFC 2022-06-26 18:19:43 -07:00
Craig Topper 5e944e9eb7 [RISCV] Refactor SelectAddrRegImm to not depend on SelectBaseAddr.
SelectBaseAddr was a minor convenience to use since it already
existed for vector load/store. D128187 is going to remove the other
uses of SelectBaseAddr so it has less reason to exist.

This patch removes the dependency on SelectBaseAddr and adds a new
SelectAddrFrameIndex to share some code with SelectFrameAddrRegImm.
2022-06-26 11:11:41 -07:00
Philip Reames 9803b0d1e7 [RISCV] Implement getVScaleForTuning and thus prefer scalable vectorization when enabled
LoopVectorizer uses getVScaleForTuning for deciding how to discount the cost of a potential vector factor by the amount of work performed. Without the callback implemented, the vectorizer was defaulting to an estimated vscale of 1. This results in fixed vectorization looking falsely profitable (since it used the command line VLEN).

The test change is pretty limited since a) we don't have much coverage of the vectorizer with scalable vectors at all, and b) what little coverage we have mostly uses i64 element types. There's a separate issue with <vscale x 1 x i64> which prevents us from getting to this stage of costing, and thus only the one test explicitly written to avoid that is visible in the diff. However, this is actually a very wide impact change, as it changes the practical vectorization result to scalable whenever both fixed and scalable vectorization are enabled.

As an aside, I think the vectorizer is a little too strongly biased towards scalable when both are legal, but we can explore that separately. For now, let's just get the cost model working the way it was intended.

Differential Revision: https://reviews.llvm.org/D128547
2022-06-25 11:25:23 -07:00
Philip Reames 767ba58f80 [RISCV] Make getMinRVVVectorSizeInBits and getMaxRVVVectorSizeInBits protected [nfc]
These are now only used in the implementation of getRealMinVLen, getRealMaxVLen, and useRVVForFixedLengthVectors; make them protected to discourage new users.
2022-06-25 11:11:31 -07:00
Shao-Ce SUN 529f05cdbb [RISCV][MC] Fold UIMM related code
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128495
2022-06-25 10:50:50 +08:00
Philip Reames 4710e78974 [RISCV] Implement RISCVTTIImpl::getMaxVScale correctly
The comments in the existing code appear to pre-date the standardization of the +v extension. In particular, the specification *does* provide a bound on the maximum value VLEN can take. From what I can tell, the LMUL comment was simply a misunderstanding of what this API returns.

This API returns the maximum value that vscale can take at runtime. This is used in the vectorizer to bound the largest scalable VF (e.g. LMUL in RISCV terms) which can be used without violating memory dependence.
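
The arithmetic, as a sketch (the spec caps VLEN at 65536 and
RVVBitsPerBlock is 64):

```c
/* vscale == VLEN / RVVBitsPerBlock, so its runtime maximum is: */
enum { MAX_VSCALE = 65536 / 64 };      /* 1024 */
/* with a tighter command-line bound, e.g. VLEN <= 512: */
enum { MAX_VSCALE_512 = 512 / 64 };    /* 8 */
```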

Differential Revision: https://reviews.llvm.org/D128538
2022-06-24 16:51:53 -07:00
Philip Reames a0443dd47c [RISCV] Simplify 16 bit index handling in lowerVECTOR_REVERSE [nfc]
getRealMaxVLen returns an upper bound on the value of VLEN.  We can use this upper bound (which, unless explicitly set at the command line, is going to result in an e8 MaxVLMax of much greater than 256) instead of explicitly handling the unknown case separately from the bounded-by-a-number-greater-than-256 case.

Note as well that this code already implicitly depends on a capped value for VLEN.  If infinite VLEN were possible, then 16 bit indices wouldn't be enough.
2022-06-24 13:08:39 -07:00
Philip Reames f1e1c3ce77 [RISCV] Replace two calls to getMinRVVVectorSizeInBits in fixed length lowering [nfc]
Both of these are only reached if useRVVForFixedLengthVectors is true.  Given that, we know that getRealMinVLen() == getMinRVVVectorSizeInBits().
2022-06-24 13:00:57 -07:00
Philip Reames f1b1bcdbd4 [RISCV] Replace two calls to getMinRVVVectorSizeInBits with getRealMinVLen [nfc]
This doesn't change behavior, it just makes it slightly more obvious what's
going on.  Note that getRealMinVLen is always >= getMinRVVVectorSizeInBits.

The first case is a bit tricky, as you have to know that
getMinRVVVectorSizeInBits returns 0 when not set, and thus is equivalent
to the else value clause.  The new code structure makes it more obvious we
return 0 unless using RVV for fixed length vectors.
2022-06-24 12:07:33 -07:00
Craig Topper 78a31bb969 [RISCV] Change how we isel (add X, [-4096, -2049]) or (add X, [2048,4095]).
We currently split the immediate almost equally between two addis.
If the immediate is odd, it won't be split exactly equally.

This patch instead gives one addi an immediate of 2047 or -2048 and the
other gets the remainder. If the original immediate is near -2049 or 2048,
this might allow the use of c.addi for the addi that receives the
smaller immediate.
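
A sketch of the new split (illustrative, not the actual
implementation):

```c
#include <stdint.h>

/* Give one addi 2047 (or -2048) and the other the remainder. */
static void split_addi(int64_t imm, int64_t *first, int64_t *second) {
  *first = imm > 0 ? 2047 : -2048;
  *second = imm - *first;
}
/* e.g. imm = 2052: 2047 + 5, and the addi of 5 fits c.addi's simm6;
   the old near-equal split 1026 + 1026 did not. */
```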

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D128500
2022-06-24 08:31:52 -07:00
Craig Topper c579ab53bd [RISCV] Move vfma_vl+fneg_vl matching to DAG combine.
This patch adds 3 new _VL RISCVISD opcodes to represent VFMA_VL with
different portions negated. It also adds a DAG combine to peek
through FNEG_VL to create these new opcodes.

This is modeled after similar code from X86.

This makes the isel patterns more regular and reduces the size of
the isel table by ~37K.

The test changes look like regressions, but they point to a bug that
was already there. We aren't able to commute a masked FMA instruction
to improve register allocation because we always use a mask undisturbed
policy. Prior to this patch we matched two multiply operands in a
different order and hid this issue for these test cases, but a different
test still could have encountered it.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D128310
2022-06-24 00:00:37 -07:00
Philip Reames 7bfad7b9d8 [RISCV] Replace two calls to getMinRVVVectorSizeInBits with useRVVForFixedLengthVectors [nfc] 2022-06-23 15:59:33 -07:00
Philip Reames 1cc9792281 [RISCV] Fix a crash in InsertVSETVLI where we hadn't properly guarded for a SEWLMULRatioOnly abstract state
A forward abstract state can be in the special SEWLMULRatioOnly state which means we're not allowed to inspect its fields.  The scalar to vector move case was missing a guard, and we'd crash on an assert.  Test cases included.
2022-06-23 10:25:16 -07:00
Craig Topper 8b10ffabae [RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f.
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarly, mf4 is not supported for i16/f16, nor mf2 for i32/f32.

Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.

For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.
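
The LMUL arithmetic for the smallest type, as a sketch:

```c
/* LMUL as a fraction: (MinNumElements * ElementSize) / RVVBitsPerBlock. */
static void lmul(unsigned min_elts, unsigned elt_bits,
                 unsigned *num, unsigned *den) {
  *num = min_elts * elt_bits;  /* <vscale x 1 x i8>: 1 * 8 = 8 */
  *den = 64;                   /* 8/64 = 1/8 -> mf8, illegal when ELEN==32 */
}
```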

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D128286
2022-06-23 08:49:18 -07:00
Craig Topper 4045b62d4c [RISCV] Add macrofusion infrastructure and one example usage.
This adds the macrofusion plumbing and support for fusing LUI+ADDI(W).

This is similar to D73643, but handles a different case. Other cases
can be added in the future.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128393
2022-06-23 08:38:39 -07:00
Craig Topper 352346fa9e [RISCV] Refactor code to remove some small wrapper methods and merge two functions together. NFC 2022-06-22 23:04:58 -07:00
Craig Topper f912d21e67 [RISCV] Add RISCVISD opcodes for the rest of get*Addr.
This adds RISCVISD opcodes for LA, LA_TLS_IE, and LA_TLS_GD to
remove creation of MachineSDNodes from get*Addr. This makes the
code consistent with the previous patches that added RISCVISD::HI,
ADD_LO, LLA, and TPREL_ADD.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128325
2022-06-22 09:21:07 -07:00
Craig Topper 0efbf5bfbb [RISCV] Move the passthru operand for RISCVISD::VRGATHER*_VL nodes. NFC
Put it before the VL instead of as the first operand. I want to add
passthru to more operands, but the commutable ones like VADD_VL
require the commutable operands to be operand 0 and 1. So we can't
have the passthru as operand 0 for those.
2022-06-21 14:01:02 -07:00
Craig Topper 0af19ef9ff [RISCV] Remove true_mask patterns for VRGATHEREI16.
This was done after adding it to the table so the post-isel peephole
can handle it.
2022-06-21 11:59:37 -07:00
Craig Topper e50b141a13 [RISCV] Remove true_mask patterns for VRGATHER.
These can be handled by the post-isel peephole.
2022-06-21 11:59:37 -07:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Craig Topper e01353f816 [RISCV] Add RISCVISD opcode for PseudoAddTPRel.
Use it along with RISCVISD::HI and ADD_LO to avoid emitting
MachineSDNodes during lowering.
2022-06-20 20:56:52 -07:00
Craig Topper 59cde2133d Recommit "[RISCV] Enable subregister liveness tracking for RVV."
The failure that caused the previous revert has been fixed
by https://reviews.llvm.org/D126048

Original commit message:

RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.

I've added a command line option that can be used to turn it off if it
causes compile time or functional issues. I used that option to keep the
old behavior for one interesting test case that was testing register
allocation.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D128016
2022-06-20 20:46:06 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Craig Topper 16d3a82de5 [RISCV] Add merge operand to RISCVISD::VRGATHER*_VL nodes.
Use it in place of VSELECT_VL+VRGATHER*_VL.

This simplifies the isel patterns.

Overall, I think trying to match select+op to create masked instructions
in isel doesn't scale. We either need to do it in DAG combine, pre-isel
peephole, or post-isel peephole. I don't yet know which is the right
answer, but for this case it seemed best to be able to request the
masked form directly from lowering.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D128023
2022-06-20 18:58:24 -07:00
Philip Reames 0aebd1d875 [RISCV] Fix crash when costing scalable gather/scatter of pointer
This was a bug introduced in d764aa. A pointer type is not a primitive type, and thus we were ending up dividing by zero when computing VLMax.

Differential Revision: https://reviews.llvm.org/D128219
2022-06-20 12:50:42 -07:00
Kazu Hirata ad7ce1e769 Don't use Optional::hasValue (NFC) 2022-06-20 11:49:10 -07:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Philip Reames 14847098f9 [RISCV] Delete unexercised VL=0 vsetvli compatibility logic
The code being removed is technically correct; if we end up with two VL=0 instructions next to each other, we can avoid a state transition if the second is a scalar move.  However, since both ops are also nops, we should simply delete them instead.  As such, this compatibility rule simply complicates the code for no purpose.
2022-06-20 10:15:31 -07:00
Philip Reames dc562d570d [RISCV] Fold prepass back into InsertVSETVLI data flow [nfc-ish]
When working through correctness issues in this pass, I moved a number of transforms which were phrased as mutating prior vsetvli instructions out of the main data flow because mutating prior instructions can invalidate the running dataflow results in subtle ways. We ended up creating both a prepass and a post-pass.

After consideration, I believe the prepass to be redundant, and this change removes it by folding it back into the data flow via a key conceptual change. Instead of phrasing the mutations on instructions, we can phrase them on abstract states. This avoids the dataflow inconsistency problem mentioned above by simply propagating the potential change forward, and thus reflecting its results in the dataflow.  Critically, we do so without modifying existing VSETVLI instructions; some of the data flow steps include non-local IR analysis.

Compile-time wise, this removes a linear pass, but has the potential to increase the number of iterations for the data flow to converge. That's not an algorithmic complexity change; the needVSETVLI mechanism has the same effect. In practice, I don't see this triggering more iterations, so I think it's likely to be a net win overall. (I didn't do any careful analysis here; just an impression from glancing at a couple tests.)

This has the potential to produce better results, so this isn't strictly speaking NFC.

Differential Revision: https://reviews.llvm.org/D127870
2022-06-20 07:56:33 -07:00
Philip Reames 820e84e050 [RISCV] Assert initial load/store SEW is the EEW
In D127983, I had flipped from using the computed EEW to using the SEW value pulled from the VSETVLI when checking compatibility. This wasn't intentional, though thankfully it appears to be a non-functional difference. The new code does make an unchecked assumption that the initial SEW operand on the load/store is the EEW. This patch clarifies the assumption, and adds an assert to make sure this remains true.

Differential Revision: https://reviews.llvm.org/D128085
2022-06-20 07:45:21 -07:00
Craig Topper 8780630ded [RISCV] Merge two similar asserts from different if/else blocks. NFC 2022-06-19 19:48:50 -07:00
Craig Topper 545a71c0d6 [RISCV] Pre-promote v1i1/v2i1/v4i1->i1/i2/i4 bitcasts before type legalization
Type legalization will convert the bitcast into a vector store and
scalar load.

Instead this patch widens the vector to v8i1 with undef, and bitcasts
it to i8. v8i1->i8 has custom handling for type legalization already to
bitcast to a v1i8 vector and use an extract_element.

The code here was lifted from X86's avx512 support.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128099
2022-06-18 11:06:45 -07:00
Kazu Hirata 621f58e716 [Target, CodeGen] Use isImm(), isReg(), etc (NFC) 2022-06-18 07:41:04 -07:00
Craig Topper cbf6737cc4 [RISCV] Use RVVBitsPerBlock instead of hardcoding multiples of 64. NFC 2022-06-17 14:10:39 -07:00
Philip Reames fb8ecca06f [RISCV] Remove redundant code checking for exact VTYPE match [nfc]
Should be fully covered by the generic demanded field based logic just below, and this ensures better coverage of that logic.
2022-06-17 12:20:20 -07:00
Philip Reames 4d245f1bc2 [RISCV] Move store policy and mask reg ops into demanded handling in InsertVSETVLI
Doing so lets the post-mutation pass leverage the demanded info to rewrite vsetvlis before a store/mask-op to eliminate later vsetvlis.

Sorry for the lack of store test change; all of my attempts to write something reasonable have been handled through existing logic.
2022-06-17 12:09:50 -07:00
Philip Reames b595cddea7 [riscv] Extract isMaskRegOp helper [nfc] 2022-06-17 10:40:54 -07:00
Philip Reames e1f1407beb [RISCV] Delete dead elideCopy code in InsertVSETVLI [nfc]
This code should be dead. A simple whole register copy of an IMPLICIT_DEF is simply an IMPLICIT_DEF of its own. (This would not be true for freeze, but is for copy.)  If we find a case which gets here with a vector operand copy of an IMPLICIT_DEF, we most likely have an earlier missed optimization anyway.  (The most recent case of this was e6c7a3a, found by Craig during review of this patch.)  There might be others, and if so, we'll revisit them individually as regressions are reported.

Differential Revision: https://reviews.llvm.org/D127996
2022-06-17 09:58:11 -07:00
Philip Reames 755c84c62c [RISCV] Avoid changing etype for splat of 0 or -1
A splat of the value 0 or -1, as a sign-extended 12-bit immediate, is always the same bit pattern regardless of the etype used to perform the operation. As a result, we can sometimes avoid introducing a vsetvli just for the purposes of a splat.

Looking at the diffs, we don't get a huge amount of immediate value out of this. We mostly push the vsetvli one instruction down, usually in front of a vmerge. We also don't get the corresponding fixed length vector cases because VL typically is changed despite the actual bits written being the same. Both of these are areas I plan to explore in future patches.

Interestingly, this makes a great example of why we need the forward and backward implementations to be consistent. Before we merged the demanded field handling, if we implemented only the forward direction, we lost the ability to mutate a prior vsetvli and eliminate a later one entirely. This resulted in practical regressions instead of improvements. It's always nice when practice matches theory. :)
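
The underlying bit-pattern fact, sketched for -1 (0 is analogous):

```c
#include <stdint.h>
#include <assert.h>

int main(void) {
  /* A sign-extended splat of -1 is all-ones bytes at every SEW, so the
     register contents don't depend on etype. */
  assert((uint16_t)(int16_t)-1 == 0xFFFFu);
  assert((uint64_t)(int64_t)-1 == 0xFFFFFFFFFFFFFFFFull);
  /* Contrast: a splat of 3 gives different bytes at e8 vs e16. */
  return 0;
}
```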

Differential Revision: https://reviews.llvm.org/D128006
2022-06-17 08:10:14 -07:00
Philip Reames ea690e7019 [RISCV] Rename VTy param of RISCVTTIImpl::getArithmeticReductionCost [NFC]
Having it be consistent with getMinMaxReductionCost for ease of copy-paste outweighs the minor clarity of calling it VTy instead of Ty.
2022-06-16 15:26:09 -07:00
Craig Topper 9d7b01dc95 [RISCV] Implement RISCVTargetLowering::getTargetConstantFromLoad.
This allows computeKnownBits to see the constant being loaded.

This recovers the rv64zbp test case changes from D127520.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127679
2022-06-16 15:11:18 -07:00
Craig Topper 5afdceb82b [RISCV] Add RISCVISD opcode for PseudoLLA.
Rather than emitting a MachineSDNode from lowering, let isel match it.

This is consistent with the RISCVISD::HI and ADD_LO nodes that were
also added. Having them both the same will make D127679 consistent.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127714
2022-06-16 15:11:03 -07:00
Craig Topper 4191de262f [RISCV] Don't emit LUI/ADDI MachineSDNodes from getAddr
Instead add RISCVISD opcodes that will be selected to LUI/ADDI
during isel.

I'm looking into maybe moving doPeepholeLoadStoreADDI into isel.
Having the ADDI as a RISCVISD node will make it visible to isel.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127713
2022-06-16 14:56:07 -07:00
Philip Reames 2fa2cee6a8 [RISCV] Start merging demanded reasoning - starting with load/stores [nfc]
This change merges the logic for reasoning about demanded portions of the VTYPE register between the main dataflow algorithm and the backwards mutation post pass. In the process, we get to delete a bunch of now redundant code.

This should be entirely NFC. I included a slight hack (see TODO) to avoid changing behavior in the post pass while being able to use the generalized logic in the prepass. I will fix the TODO in a separate change once this lands.

Differential Revision: https://reviews.llvm.org/D127983
2022-06-16 14:34:53 -07:00
Philip Reames d764aa7fc6 [RISCV] Add cost model for scalable scatter and gather
The costing we use for fixed length vector gather and scatter is to simply count up the memory ops, and multiply by a fixed memory op cost. For scalable vectors, we don't actually know how many lanes are active. Instead, we have to end up making a worst case assumption on how many lanes could be active. In the generic +V case, this results in very high costs, but we can do better when we know an upper bound on the VLEN.

There's some obvious ways to improve this - e.g. using information about VL and mask bits from the instruction to reduce the upper bound - but this seems like a reasonable starting point.

The resulting costs do bias us pretty strongly away from generating scatter/gather for generic +V.  Without this, we'd be returning an invalid cost and thus definitely not vectorizing, so no major change in practical behavior expected.

Differential Revision: https://reviews.llvm.org/D127541
2022-06-16 14:22:31 -07:00
Philip Reames 89a11ebd8e [RISCV] Avoid reducing etype just to initialize lane 0 of an undef vector
If we're writing to an undef vector (i.e. implicit_def), we can change the value of bits outside the requested write without consequence. This allows us to avoid a VSETVLI just for narrowing the value written.

Differential Revision: https://reviews.llvm.org/D127880
2022-06-16 11:14:21 -07:00
Craig Topper 6716195cd7 [RISCV] Merge TIED_TU and TIED instructions for VWADD_W/VWSUB_W by using policy operand.
This removes one of the uses of ForceTailUndisturbed.
2022-06-16 10:06:11 -07:00
Philip Reames 6ed81ec164 [RISCV] Reorder function definitions to reduce upcoming diff [nfc] 2022-06-16 09:25:27 -07:00
Craig Topper 912a5172f8 [RISCV] Use TAIL_AGNOSTIC in riscv_fma_vl patterns.
We may eventually need tail undisturbed patterns, but we will need
a policy operand on the ISD node to communicate it.
2022-06-16 09:09:36 -07:00
Philip Reames 27c61d033f [RISCV] Split DemandedField logic in advance of reuse in dataflow [nfc]
This change just moves some code around, and extracts out a helper function expected to be useful when reusing the demanded field logic in the forward dataflow.
2022-06-16 08:49:41 -07:00
Philip Reames 37fa5850f1 [RISCV] Move getSEWLMULRatio out of VSETVLIInfo [nfc] 2022-06-16 08:40:20 -07:00
Craig Topper b34e3f40e7 [RISCV] Use TAIL_UNDISTURBED_MASK_UNDISTURBED for riscv_slidedown_vl unless the merge op is undef.
If the merge operand isn't undef we need to be using tail undisturbed.

Turns out all of our uses of riscv_slidedown_vl use undef so this
doesn't affect any tests.
2022-06-16 08:35:27 -07:00
Philip Reames 4a3e46115a [RISCV] Extend demanded field transform in InsertVSETVLI to VTYPE subfields
The motivating case, and the only one actually enabled by this patch, is a load or store followed by another op with the same SEW/LMUL ratio.

As an example, consider:

define void @test1(ptr %in, ptr %out) {
entry:
  %0 = load <8 x i16>, ptr %in, align 2
  %1 = sext <8 x i16> %0 to <8 x i32>
  store <8 x i32> %1, ptr %out, align 4
  ret void
}

Without this patch, we get:

	vsetivli	zero, 8, e16, mf4, ta, mu
	vle16.v	v8, (a0)
	vsetvli	zero, zero, e32, mf2, ta, mu
	vsext.vf2	v9, v8
	vse32.v	v9, (a1)
	ret

Whereas with the patch we get:

	vsetivli	zero, 8, e32, mf2, ta, mu
	vle16.v	v8, (a0)
	vsext.vf2	v9, v8
	vse32.v	v9, (a1)
	ret

We have rewritten the first vsetvli and thus removed the second one.

As is strongly hinted by the code structure and todos, I am planning on commoning this with all (or most all?) of the cases from isCompatible used in the forward data flow. This will be done in a series of following changes - some NFC reworks, and some reviewed optimization extensions.

Differential Revision: https://reviews.llvm.org/D127780
2022-06-16 08:01:27 -07:00
Guillaume Chatelet 412c788ab0 [NFC][Alignment] Use Align in MCAlignFragment 2022-06-15 12:31:00 +00:00
Kito Cheng 687e56614f [RISCV] Fixing undefined physical register issue when subreg liveness tracking enabled.
RISC-V expands register tuple spills into a series of individual
register spills after the register allocation phase via pseudo
instruction expansion. However, part of the register tuple might still
be undefined during spilling, and the machine verifier will complain
that the spill instruction is using an undefined physical register.

The optimal solution would be to do liveness analysis and not emit
spills and reloads for the undefined parts, but accurate liveness info
at that point is not so easy to get.

So the suboptimal solution is to still spill and reload those undefined
parts, but add an implicit use of the super register to the spill
instruction. The machine verifier will then only report a use of an
undefined physical register when the whole super register is undefined,
and this behavior is also documented in MachineVerifier::checkLiveness [1].

An example to show what happens:

```
  v10m2 = xxx
  # v12m2 not define yet
  PseudoVSPILL2_M2 v10m2_v12m2
  ...
```

After expansion:
```
  v10m2 = xxx
  # v12m2 not define yet
  # Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r
  VS2R_V v10m2
  VS2R_V v12m2 # Use undef reg!
```

What this patch did:
```
  v10m2 = xxx
  # v12m2 not define yet
  # Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r
  VS2R_V v10m2 implicit v10m2_v12m2
  # Use undef reg (v12m2), but v10m2_v12m2 isn't totally undef, so
  # that's OK.
  VS2R_V v12m2 implicit v10m2_v12m2
```

[1] https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/MachineVerifier.cpp#L2016-L2019

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127642
2022-06-15 16:23:39 +08:00
Yeting Kuo 9096a52566 [RISCV] Teach vsetvli insertion to not insert redundant vsetvli right after VLEFF/VLSEGFF.
The VSETVLIInfo right after a VLEFF/VLSEGFF is currently unknown since
these instructions modify VL. An unknown VSETVLIInfo forces the next
vector operation to have a VSET(I)VLI inserted before it. In fact, the
next vector operation after a VLEFF/VLSEGFF may not need an inserted
VSET(I)VLI if it uses the same VTYPE and the resulting vl of the
VLEFF/VLSEGFF.

Take the below C code as an example:

  vint8m4_t vec_src1 = vle8ff_v_i8m4(str1, &new_vl, vl);
  vbool2_t mask1 = vmseq_vx_i8m4_b2(vec_src1, 0, new_vl);
vsetvli insertion adds a redundant vsetvli for it.

Assembly result:
  vsetvli a2,a2,e8,m4,ta,mu
  vle8ff.v v28,(a0)
  csrr a3,vl ; redundant
  vsetvli zero,a3,e8,m4,ta,mu ; redundant
  vmseq.vi v25,v28,0

After D126794, VLEFF/VLSEGFF have a def carrying the value of VL. The
patch considers there to be a ghost vsetvli right after each
VLEFF/VLSEGFF. The ghost VSET(I)VLI uses the vl output of the
VLEFF/VLSEGFF as its AVL and the same VTYPE as the VLEFF/VLSEGFF. The
ghost vsetvli must be redundant, and we can use it to get the
VSETVLIInfo right after the VLEFF/VLSEGFF.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127576
2022-06-15 13:58:40 +08:00
wangpc 8910349e43 [RISCV][NFC] Set default value for BaseInstr in RISCVVPseudo
Since almost all pseudos have the same form of BaseInstr, we
can just set it as the default value to reduce some lines.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127632
2022-06-15 10:59:45 +08:00
Craig Topper 5ae3f65cfa [RISCV] Replace uses of VLOpFrag in VLMax patterns with srcvalue.
These are on inner nodes and we're dropping the captured $vl anyway.
2022-06-14 19:19:35 -07:00
Philip Reames facb96584e [RISCV] Minor code/comment improvement in prepass of InsertVSETVLI [nfc] 2022-06-14 16:18:11 -07:00
Saleem Abdulrasool 1582bcd003 RISCV: handle 64-bit PCREL data relocations
We would previously fail to handle 64-bit PC-relative relocations on
RISCV.  This was exposed by trying to build with
`-fprofile-instr-generate`.

The original changes restricted the relocation handling to the text
segment as the paired relocations are undesirable in at least the debug
and .eh_frame sections.  We now make this explicit to handle the general
case for the data relocations as well.

It would be preferable to use `R_RISCV_n_PCREL` when available to avoid
an extra relocation.

Differential Revision: https://reviews.llvm.org/D127549
Reviewed By: luismarques, MaskRay

Fixes: #55971
2022-06-14 21:39:16 +00:00
Philip Reames c67c4133ac [RISCV] Split out transfer function explicitly in VSETVLI insertion dataflow [nfc]
In an effort to make this code easier to read and extend, this splits out helper functions for the transfer function of the data flow. Due to the other results computed during the phases, we can't completely abstract away everything, but we can abstract the actual state transitions.

The motivation here is the following upcoming changes:
* The fault first load patch - already approved, this will be rebased over - adds another case into the transferAfter path.
* An upcoming patch to fold the local prepass back into the main algorithm greatly complicates the transferBefore logic.

Differential Revision: https://reviews.llvm.org/D127761
2022-06-14 14:07:15 -07:00
Philip Reames 44a0a558dc [RISCV] Split out subfields in InsertVSETVLI's demanded fields analysis [nfc]
At the moment, this just gets the infrastructure in place.  Following changes will start using this in non-trivial ways.
2022-06-14 11:35:24 -07:00
Philip Reames 52b166c0de [RISCV] Split out getEEWForLoadStore [nfc]
Mostly about allowing reuse in an upcoming patch, but also makes the code slightly easier to follow.
2022-06-14 10:10:43 -07:00
Philip Reames 7659dc6cdd [RISCV] simplify emitVSETVLIs handling of vsetvli xN, phi(), vtype case [NFC]
This is possibly somewhat subjective, but having an explicitly named flag to track the property required and code structure that more closely matches phase 1/2 of the dataflow seems much easier to read.

Differential Revision: https://reviews.llvm.org/D126893
2022-06-14 08:00:24 -07:00
Craig Topper 17457be1c3 [RISCV] Fix use of texternalsym in output pattern where input was tglobaladdr. NFC
I don't think the name used in the output pattern is used to control
anything about the isel table emission, but it should match the input.
2022-06-13 15:42:42 -07:00
Craig Topper e4062522d3 [RISCV] Disable matchSplatAsGather for i1 vectors to prevent creating illegal nodes.
We were incorrectly creating a VRGATHER node with i1 vector type. We
could support this by promoting the mask to i8 and truncating it, but
for now I want to prevent the crash.

Fixes PR56007.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127681
2022-06-13 13:41:39 -07:00
Craig Topper bb1a52aa8b Recommit "[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates."
With fix for sanitizer build bot failure.
2022-06-13 11:35:44 -07:00
Mitch Phillips 9d99870590 Revert "[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates."
This reverts commit 8bbcb98848.

Broke the UBSan bot. More details in https://reviews.llvm.org/D127376.
2022-06-13 10:16:28 -07:00
Philip Reames aaeb958ced [RISCV] Mutate instruction after computing transfer rule in InsertVSETVLI [nfc]
If we defer the mutation of the instruction, we can add the assert discussed in D126921.  Once we do that, the API becomes subject to revision - but let's do that in a separate change.
2022-06-13 09:08:25 -07:00
Craig Topper cef03e3dcd [RISCV] Move creation of constant pools from isel to lowering.
This simplifies the isel code by removing the manual load creation.
It also improves our ability to use 0 strided loads for vector splats.

There is an assumption here that Mask and ShiftedMask constants are
cheap enough that they don't become constant pool loads so that our
isel optimizations involving And still work. I believe those constants
are 3 instructions in the worst case.

The rv64zbp-intrinsic.ll change is a regression caused by intrinsics
being expanded to RISCVISD nodes also occurring during lowering. So the optimizations
were only happening during the last DAGCombine, which can't see through the
load. I believe we can fix this test by implementing
TargetLowering::getTargetConstantFromLoad for RISC-V or by adding the intrinsic
to computeKnownBitsForTargetNode to enable earlier DAG combine. Since Zbp is not
a ratified extension, I don't view these as blocking this patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127520
2022-06-13 09:07:57 -07:00
Craig Topper 052536b923 [RISCV] Use isShiftedInt to improve readability. NFC 2022-06-12 21:04:45 -07:00
Hubert Tong 775a22e32a [NFC] Remove unused variable `MF`
https://reviews.llvm.org/D127583 removed the only use of this variable
and broke builds with warnings-as-errors.
2022-06-12 16:31:55 -04:00
Craig Topper d63b66840f [RISCV] Move some methods out of RISCVInstrInfo and into RISCV namespace.
These methods don't access any state from RISCVInstrInfo. Make them
free functions in the RISCV namespace.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D127583
2022-06-12 10:47:21 -07:00
Fangrui Song adf4142f76 [MC] De-capitalize SwitchSection. NFC
Add SwitchSection to return switchSection. The API will be removed soon.
2022-06-10 22:50:55 -07:00
Philip Reames 536095a27c [RISCV] Refine costs for i1 reductions
Our actual lowering for i1 reductions uses ctpop combined with possibly a vector negate and possibly a logic op afterwards. I believe ctpop to be low cost on all reasonable hardware.

The default costing implementation here was returning quite inconsistent costs. The and/or reductions were returning very high costs (because we seem to think moving into scalar registers is very expensive?) and others were returning lower but still too high costs (because of the assumed tree reduce strategy). While we should probably improve the generic costing strategy for i1 vectors, let's start by fixing the immediate problem.

Differential Revision: https://reviews.llvm.org/D127511
2022-06-10 13:21:52 -07:00
Philip Reames f7bb691d61 [RISCV] Implement isElementTypeLegalForScalableVector TTI hook
This brings us into alignment with AArch64, and in the process fixes a compiler crash bug in uniform store handling in the vectorizer.

Before the recent invalid cost bailout work, this would have also avoided crashes on invalid costs in some cases. I honestly think the vectorizer should gracefully bail out on uniform stores it can't use a scatter for, but it doesn't, so let's take the path of least resistance here. It's also possible that there are other vectorizer bugs AArch64 isn't seeing because of this hook; we don't want to be finding them either.

Differential Revision: https://reviews.llvm.org/D127514
2022-06-10 13:20:58 -07:00
Craig Topper 08ea27bf13 [RISCV] Don't require loop simplify form in RISCVGatherScatterLowering.
We need a preheader and a single latch, but we don't need a dedicated
exit.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127513
2022-06-10 13:00:20 -07:00
Shao-Ce SUN 117e10304b [RISCV] move `isFaultFirstLoad` into `RISCVInstrInfo`
Fix build errors in D126794

```
ld.lld: error: undefined symbol: llvm::MachineInstr::getNumExplicitDefs() const
>>> referenced by RISCVBaseInfo.cpp
>>>               RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a

ld.lld: error: undefined symbol: llvm::MachineInstr::findRegisterDefOperandIdx(llvm::Register, bool, bool, llvm::TargetRegisterInfo const*) const
>>> referenced by RISCVBaseInfo.cpp
>>>               RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
```

Reviewed By: fakepaper56, craig.topper

Differential Revision: https://reviews.llvm.org/D127477
2022-06-11 00:27:53 +08:00
Shao-Ce SUN 93116374e7 Revert "[RISCV] move `isFaultFirstLoad` into `RISCVInstrInfo`"
This reverts commit e018e493c1.

There are some problems with this commit,
related revision: https://reviews.llvm.org/D127477
2022-06-11 00:03:04 +08:00
Craig Topper e91051184c [RISCV] Mark FSIN and other math functions as Expand for scalable vectors.
This prevents them from being assumed legal by the cost model.

This matches what is done for AArch64 SVE.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123799
2022-06-10 08:40:07 -07:00
Shao-Ce SUN e018e493c1 [RISCV] move `isFaultFirstLoad` into `RISCVInstrInfo`
Fix build errors in D126794

```
ld.lld: error: undefined symbol: llvm::MachineInstr::getNumExplicitDefs() const
>>> referenced by RISCVBaseInfo.cpp
>>>               RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a

ld.lld: error: undefined symbol: llvm::MachineInstr::findRegisterDefOperandIdx(llvm::Register, bool, bool, llvm::TargetRegisterInfo const*) const
>>> referenced by RISCVBaseInfo.cpp
>>>               RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
```

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D127477
2022-06-10 21:03:47 +08:00
Yeting Kuo f68cad9087 [RISCV] Lower VLEFF/VLSEGFF SDNodes to MachineInstrs with VL outputs.
This patch is a replacement for D125199. A PseudoReadVL with vtype raises
the worry of computing the same vtypes of VLEFF/VLSEGFF in two different
places, DAGToDAG and InsertVSETVLI. A VLEFF/VLSEGFF MI with a VL output
can still provide the vtype of the VLEFF/VLSEGFF to the users of its VL.

The patch names the new pseudos with the original VLEFF/VLSEGFF names
suffixed with "_VL" and expands them in the RISCVInsertVSETVLI pass.

This patch also reverts commit 4537aae0d5,
"[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.".

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126794
2022-06-10 13:57:10 +08:00
Philip Reames 28be4b7454 [RISCV] Simplify InstrInfo access in doPeepholeMaskedRVV [nfc] 2022-06-09 17:02:40 -07:00
Craig Topper 8bbcb98848 [RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates.
For an addition of a simm14 or simm15 immediate with 2 or 3 trailing zero
bits, we can use a shXadd instruction and an addi to do the addition.

This patch teaches RISCVMergeBaseOffset to see through this pattern.
I don't think the sh1add case occurs because we use two addis for that,
but I implemented it for completeness.
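
A sketch of the shape being matched, with an assumed example immediate:

```c
#include <stdint.h>

int main(void) {
  /* e.g. global + 6000: 6000 == 1500 << 2 and 1500 fits simm12, so
     "addi t0, zero, 1500 ; sh2add a0, t0, base" performs the add. */
  int64_t imm = 6000;
  int64_t addi_imm = imm >> 2;  /* 1500, the ADDI immediate */
  (void)addi_imm;
  return 0;
}
```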

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127376
2022-06-09 16:07:35 -07:00
Kito Cheng 4b11f90903 [RISCV] Fix missing stack pointer recovery
In order to make sure the stack pointer is correct through the EH region,
we also need to restore the stack pointer from the frame pointer if we
don't preserve stack space within the prologue/epilogue for outgoing
arguments. Normally, checking whether a variable sized object is present
is enough, but we also don't preserve that space at prologue/epilogue
when there are vector objects on the stack.

An example to show what happens:
```
try {
  sp adjust for outgoing args. // 1. Sp changed.
  func_call  // 2. Exception raised
  sp restore // Oh, not restored
} catch {
  // 3. And now we are here.
}

// 4. Prepare to return!, restore return address from stack, but...sp is wrong.
// 5. Screw up!
```

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D126861
2022-06-09 23:38:50 +08:00
Philip Reames 0e29a80fdc [RISCV] Add cost model for reverse shuffle
The majority of the cost appears to be forming the indices vector.

Differential Revision: https://reviews.llvm.org/D127141
2022-06-09 07:21:40 -07:00
Craig Topper c739088af5 [RISCV] Fix 80 column violations in RISCVInsertVSETVLI.cpp. NFC
I think these were likely introduced in the recent work done to
this pass.
2022-06-08 18:58:48 -07:00
Craig Topper 209c07d486 [RISCV] Add debug message that should have been in D126843.
For consistency with the other messages in this file.
2022-06-08 16:46:22 -07:00
Craig Topper e4ba24c17d [RISCV] Support (addi (addi globaladdr, C1), C2) in RISCVMergeBaseOffset.
Adds with immediates in the range [-4096, -2049] or [2048, 4095] get
converted to two ADDIs. Teach RISCVMergeBaseOffset to recognize this
pattern as well.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D126843
2022-06-08 08:20:37 -07:00
Craig Topper 33f4da2455 [RISCV] Support LUI+ADDIW in RISCVMergeBaseOffsetOpt::matchLargeOffset.
LUI+ADDIW always produces a simm32. This allows us to always
fold it into a global offset.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D126729
2022-06-08 08:19:21 -07:00
Philip Reames 1ea99328b4 [RISCV] Untangle instruction properties from VSETVLIInfo [NFC]
The abstract state used in the data flow should not know anything about the instructions which produced the abstract states. Instead, when comparing two states, we can simply use information about the machine instr at that time.

In the old design, basically any use of the instruction flags on the current state (as opposed to a "Require", aka the upcoming state) would be a bug. We don't seem to actually have any such bugs, but we can make this much more obvious with code structure.

Differential Revision: https://reviews.llvm.org/D126921
2022-06-08 08:09:59 -07:00