Commit Graph

1891 Commits

Author SHA1 Message Date
LiaoChunyu 3f68f0f816 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcodes:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR
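
For illustration, a hedged C sketch (hypothetical, not from the patch) of
source where two FP selects share one condition and could be expanded
through a single branch:

```
// Both selects use the same condition c, so the two
// Select_FPR64_Using_CC_GPR pseudos may be expanded with one branch.
double two_selects(int c, double a, double b, double x, double y) {
  return (c ? a : b) + (c ? x : y);
}
```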

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-07-11 14:10:27 +08:00
Craig Topper 35ec8a423d [RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools.
I think it only makes sense to return true here if we aren't going
to turn around and create a constant pool for the immediate.

I left out the check for useConstantPoolForLargeInts() thinking
that even if you don't want the compiler to create a constant pool
you might still want to avoid materializing an integer that is
already available in a global variable.

Test file was copied from AArch64/ARM and has not been committed yet.
Will post separate review for that.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D129402
2022-07-10 14:10:17 -07:00
Craig Topper 60450f91c8 [RISCV] Add test cases for inline memcpy expansion
Test file was taken directly from AArch64/ARM. I've added RUN
lines for aligned and unaligned since many of the test cases
are strings that aren't aligned and have an odd size.

Some of these test cases are modified by D129402.

Differential Revision: https://reviews.llvm.org/D129403
2022-07-10 14:09:02 -07:00
Craig Topper 5f7641a3be [RISCV] Modify the custom isel for (add X, imm) used by load/stores.
We have custom isel that tries to select the Lo12 bits using a
separate ADDI that can later be folded into the load/store address
by the post-isel peephole.

This patch disables this if the load/store already had a non-zero
offset. A non-zero offset implies that CodeGenPrepare split several
large offsets used by different loads and stores into a common large
offset and multiple small offsets that could be folded. Folding more
of the lo12 bits changes this common offset by increasing the small
offsets. While this can save an instruction to materialize the common
offset, it can also prevent the small offsets from fitting in a
compressed load/store instruction.

Removing this also simplifies the last piece needed to fold the custom
isel for add into SelectAddrRegImm and remove the post-isel peephole.
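
A hedged C illustration (values hypothetical) of the offset splitting this
interacts with:

```
// CodeGenPrepare can rewrite these accesses as one shared large offset
// (p + 3200000 bytes) plus small per-access offsets (0, 8, 16) that can
// fold into the loads -- and potentially into compressed load encodings.
long sum3(long *p) {
  return p[400000] + p[400001] + p[400002];
}
```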
2022-07-09 22:47:27 -07:00
Craig Topper 92f1794d41 [RISCV] Mark fminnum_vl and fmaxnum_vl as commutable. 2022-07-08 10:19:09 -07:00
Craig Topper 069ba96660 [RISCV] Add commuted fixed vector vfmax.vf and vfmin.vf tests. NFC
The ISD opcodes aren't marked commutable so we don't match these
properly.
2022-07-08 10:19:09 -07:00
Philip Reames 264018d764 [RISCV] Mark vsadd(u)_vl as commutable
This allows fixed length vectors involving splats on the LHS to commute into the _vx form of the instruction. Oddly, the generic canonicalization rules appear to catch the scalable vector cases. I haven't fully dug in to understand why, but I suspect it's because of a difference in how we represent splats (splat_vector vs build_vector).

Differential Revision: https://reviews.llvm.org/D129302
2022-07-08 10:18:21 -07:00
Craig Topper a246eb6814 [RISCV] Mark (s/u)min_vl and (s/u)max_vl as commutable. 2022-07-08 09:59:42 -07:00
Craig Topper cd783bf997 [RISCV] Add fixed vector vmin(u).vx and vmax(u).vx tests. NFC 2022-07-08 09:59:41 -07:00
Sanjay Patel 8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_
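
A hedged worked example of the no-borrow case:

```
// x is known to lie within the set bits of 31, so 31 - x never borrows
// and equals x ^ 31 (e.g. x = 5: 31 - 5 = 26 = 0b11111 ^ 0b00101).
unsigned sub_to_xor(unsigned x) {
  x &= 31;
  return 31 - x;
}
```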

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
Kito Cheng 5c45ae4108 [RISCV] Fix wrong register rename for store value during make-compressible optimization
The current implementation renames both registers in a store instruction
if we store the base address into memory with the same base register. That's
OK if the offset is 0, but it is a wrong transform if the offset isn't 0.
Here's a small example:

sd      a0, 808(a0)

We should not transform into:

addi    a2, a0, 768
sd      a2, 40(a2)

We should just rename the base address like this:

addi    a2, a0, 768
sd      a0, 40(a2)

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128876
2022-07-08 18:07:17 +08:00
Kito Cheng 7b9a3b9d6d [RISCV] Precommit testcase to show wrong result of make-compressible optimization
Use the following example to demonstrate what currently happens:

  li      a1, 1
  sd      a1, 800(a0)
  sd      a0, 808(a0) # Store base address into base + offset
  li      a1, 2
  sd      a1, 816(a0)

Currently this is optimized into:

  li      a1, 1
  addi    a2, a0, 768
  sd      a1, 32(a2)
  sd      a2, 40(a2) # Wrong replacement for the source register.
  li      a1, 2
  sd      a1, 48(a2)

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128875
2022-07-08 17:01:22 +08:00
Lian Wang 9cfb28d672 [RISCV] Change VECTOR_SPLICE mask operation from expand to promote
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128717
2022-07-08 06:20:22 +00:00
Lian Wang 99da3115d1 [RISCV] Recommit test for D128717
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128778
2022-07-08 02:21:33 +00:00
Lian Wang ab9e8a3a6f Revert "[RISCV] Precommit test for D128717"
This reverts commit b3b37f3ecf.
2022-07-08 01:56:29 +00:00
Lian Wang b3b37f3ecf [RISCV] Precommit test for D128717
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128778
2022-07-08 01:47:01 +00:00
Diego Caballero bf1758c3dc Revert "[RISCV] Optimize 2x SELECT for floating-point types"
This reverts commit 1178992c72.
2022-07-07 22:54:00 +00:00
Philip Reames 439783da01 [RISCV] Adjust fixed vector coverage for get.active.lane.mask
Make sure we include at least one case where the vsadd/vmsltu lowering
requires only LMUL1.  We should be able to generate all of the fixed
vector variants from scalar to vector idioms, but this is probably not
very important right now given that the fixed length variants we'd actually
use when vectorizing with LMUL=1 are reasonable.
2022-07-07 13:28:29 -07:00
Philip Reames fa3783c907 [RISCV] Test coverage for missing commute of vsadd(u)
For some reason, this appears to only happen with fixed length vectors.  Scalable ones commute just fine in all the cases I've seen.
2022-07-07 09:11:44 -07:00
Philip Reames 6f4773f064 [RISCV] Add codegen coverage for get.active.lane.mask 2022-07-06 14:50:41 -07:00
Craig Topper 088bb8a328 [RISCV] Add more SHXADD patterns where the input is (and (shl/shr X, C2), C1)
It might be possible to rewrite (and (shl/shr X, C2), C1) in a way
that gives us a left shift that can be folded into a SHXADD.
2022-07-05 16:21:47 -07:00
Craig Topper ac3e26bcff [RISCV] Add more SHXADD tests. NFC 2022-07-05 13:41:58 -07:00
luxufan c06d0b4d02 [RISCV] Add ADDI instr for computing FrameIndex address
RVV doesn't have an immediate field for memory addressing. Currently
we build MachineInstructions in PEI to compute stack offsets for
RVV load/store instructions. These instructions are added too late to
be optimized by CSE, LICM, and similar passes.

This patch makes FrameIndex SDNodes unmatchable by RVV load/store
instruction selection patterns, so that FrameIndex SDNodes are
selected as `ADDI GPR, targetframeindex`.

There are two advantages to such a change:
1. Stack object address computation can be optimized by machine function
passes.
2. Since the ADDI instruction's destination register can be used as a
temp register, we can save an emergency spill slot.

Differential Revision: https://reviews.llvm.org/D128187
2022-07-04 22:13:35 +08:00
Craig Topper d36e09cfe5 [RISCV] Add more SHXADD patterns.
This handles the code we get for the following:

int foo(unsigned x, int *y) {
    return y[x >> 3];
}

The srl and shl implied by the array index will be combined to
form (srl (and X, C2), C1). We need to reverse this to get back
the shl so it can be folded into SHXADD.
2022-07-03 21:57:05 -07:00
luxufan 0f45eaf0da [RISCV] Add a scavenge spill slot when use ADDI to compute scalable stack offset
Computing a scalable offset needs up to two scratch registers. We add
scavenge spill slots according to the result of `RISCV::isRVVSpill`
and `RVVStackSize`. Since ADDI is not included in `RISCV::isRVVSpill`,
PEI doesn't add scavenge spill slots for scratch registers when using
ADDI to get scalable stack offsets.

The ADDI instruction has a destination register which can be used as
a scratch register. So one scavenge spill slot is sufficient for
computing scalable stack offsets.

Differential Revision: https://reviews.llvm.org/D128188
2022-07-03 20:18:13 +08:00
Craig Topper 7e4ab9d5b8 [RISCV] Add more SHXADD isel patterns.
This handles the code we get for

int foo(int* x, unsigned y) {
  return x[y >> 1];
}

The shift right and the shl will get DAG combined into
(shl (and X, 0xfffffffe), 1). We have custom isel to match the
shl+and, but with Zba the (add (shl X, 1), Y) part will get
matched and leave the and to be iseled by itself. This commit
adds a larger pattern that includes the and.
2022-07-02 23:11:22 -07:00
Craig Topper b2e9684fe4 [RISCV] isel (shl (and X, C2), C) -> (slli (srliw X, C3), C3+C).
where C2 has 32 leading zeros and C3 trailing zeros.

When the shl is used by an add and C is 1, 2, or 3, we end up matching
(add (shl X, C), Y) first. This leaves an and with a constant that
is harder to materialize.
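
A hedged worked instance (constants chosen for illustration):

```
// C2 = 0xFFFFFFFE has 32 leading zeros and C3 = 1 trailing zero; C = 2.
// (x & 0xFFFFFFFE) << 2 can be selected as:
//   srliw t, x, 1      // t = (x & 0xFFFFFFFF) >> 1
//   slli  t, t, 3      // shift amount C3 + C = 3
unsigned long shl_of_and(unsigned long x) {
  return (x & 0xFFFFFFFE) << 2;
}
```
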
2022-07-02 01:04:44 -07:00
Craig Topper 9ac548e118 [RISCV] isel (add (and X, 0xFFFFFFFE), Y) as (SH1ADD (SRLIW X, 1), Y).
Similar for SH2ADD and SH3ADD.

This is what we get from

int foo(int* x, unsigned y) {
  return x[y >> 1];
}

This allows us to avoid materializing 0xFFFFFFFE into a register.
2022-07-01 23:52:29 -07:00
Yeting Kuo 5744b9cb79 [RISCV] Restore "Enable shrink wrap by default"
This reverts commit 7af3d4ab3d.

RISC-V reverted the shrink wrap patch for bug 53662. Since the bug is fixed
by D123679, this commit re-enables it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128965
2022-07-02 11:13:13 +08:00
Craig Topper 354e04554a [RISCV] Make custom isel for (add X, imm) used by load/stores more selective.
Only handle immediates that would produce an ADDI or ADDIW of Lo12
as the final instruction in their materialization.

As the test changes show, this removes immediates that materialize
with lui+addiw, which is not the same as lui+addi.
2022-06-30 14:20:11 -07:00
Craig Topper 51d672946e [RISCV] Fold (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C)
Similar for a subtract with a constant left hand side.

(sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine
for (sext (add (trunc X to i32), C1) to i64).

For RISCV, we should lower this as addiw which means turning it into
(sext_inreg (add X, C1)).

There is an existing DAG combine to convert back to (sext (add (trunc X
to i32), C1) to i64), but it requires isTruncateFree to return true
and for i32 to be a legal type, as it uses sign_extend and truncate
nodes. So that doesn't work for RISCV.

If the outer sra happens to be used by a shl by constant, it will be
folded and the shift amount of the sra will be changed before we
can do our own DAG combine. This requires us to match the more
general pattern and restore the shl.

I had wanted to do this as a separate (add (shl X, 32), C1<<32) ->
(shl (add X, C1), 32) combine, but that hit an infinite loop for some
values of C1.
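
A hedged example of C source that reaches this canonical form:

```
// IR shape: (sext (add (trunc x to i32), 1234) to i64), which InstCombine
// canonicalizes to (sra (add (shl x, 32), 1234<<32), 32); we want addiw.
long to_addiw(long x) {
  return (int)x + 1234;
}
```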

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128869
2022-06-30 09:01:24 -07:00
Craig Topper 9ace5af049 [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C).
The sext_inreg can often be folded into an earlier instruction by
using a W instruction. The sext_inreg also works better with our ABI.

This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco
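
A hedged worked instance (C = 3; arithmetic right shift assumed for `>>`):

```
// (x << 32) >> 29 on i64 is (shl (sext_inreg x, i32), 3),
// i.e. sext.w followed by slli by 3.
long shl_of_sext(long x) {
  return (x << 32) >> 29;
}
```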

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128843
2022-06-30 09:01:24 -07:00
Craig Topper 781e3d7ad8 [RISCV] Pre-commit tests for D128869. NFC 2022-06-30 09:01:24 -07:00
Fraser Cormack d5213c83ff [RISCV] Add a test covering a (reverted) codegen issue
This test checks one of the problematic cases outlined in D128006, leading
to the patch's reversal. I thought it best to add a test just in case
this sort of optimization is attempted again in the future in some
fashion.
2022-06-30 09:27:52 +01:00
Craig Topper 75095e6281 [RISCV] Pre-commit tests for D128843. NFC 2022-06-29 12:12:42 -07:00
Philip Reames dd48d3ad0e Revert "[RISCV] Avoid changing etype for splat of 0 or -1"
This reverts commit 755c84c62c.  A bug was reported on the original review thread (https://reviews.llvm.org/D128006), and on inspection this patch is simply wrong.  It needs to be checking for VLInBytes, not MaxVL.  These happen to be the same when using AVL=VLMAX (which is quite common), but this does not fold when AVL != VLMAX.
2022-06-29 10:27:02 -07:00
Craig Topper 7cbfb4eb7a [RISCV] Select (srl (and X, C2), C) as (slli (srliw X, C3), C3-C).
If C2 has 32 leading zeros and C3 trailing zeros.
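
A hedged worked instance (constants chosen for illustration):

```
// C2 = 0xFFFFFFF0 has 32 leading zeros and C3 = 4 trailing zeros; C = 2.
// (x & 0xFFFFFFF0) >> 2 can be selected as:
//   srliw t, x, 4      // t = (x & 0xFFFFFFFF) >> 4
//   slli  t, t, 2      // shift amount C3 - C = 2
unsigned long srl_of_and(unsigned long x) {
  return (x & 0xFFFFFFF0) >> 2;
}
```
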
2022-06-29 09:15:09 -07:00
Philip Reames 860c62f53c [RISCV] Refine known bits for READ_VLENB
This implements known bits for READ_VLENB using any information known about minimum and maximum VLEN. There's an additional assumption that VLEN is a power of two.

The motivation here is mostly to remove the last use of getMinVLen, but while I was here, I decided to also fix the bug for VLEN < 128 and handle max from command line generically too.

Differential Revision: https://reviews.llvm.org/D128758
2022-06-28 15:42:14 -07:00
Craig Topper 02c8453e64 [RISCV] Teach RISCVMergeBaseOffset to handle read-modify-write of a global.
The pass was previously limited to LUI+ADDI being used by a single
instruction.

This patch allows the pass to optimize multiple memory operations
that use the same offset. Each of them will receive a separate %lo
relocation. My main motivation is to handle a read-modify-write
where we have a load and store to the same address, but I didn't
restrict it to that case.
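
A minimal hedged sketch of the read-modify-write shape this targets:

```
// The load and the store of g can now each carry their own %lo
// relocation against a single LUI of %hi(g).
extern int g;
void rmw(void) { g += 1; }
```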

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128599
2022-06-28 11:46:24 -07:00
Philip Reames c755bf658f [RISCV] Add test coverage for high known bits for vscale 2022-06-28 10:04:05 -07:00
Alex Bradbury 7bcfcabbd1 [RISCV] Implement support for the Zicbop extension
Implements the ratified RISC-V Base Cache Management Operation ISA
Extension: Zicbop, as described in
https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf.

This is implemented in a separate patch from Zicbom and Zicboz because it
requires a new ASM operand type to be defined.

Differential Revision: https://reviews.llvm.org/D117433
2022-06-28 12:43:26 +01:00
Alex Bradbury 4f40ca53ce [RISCV] Implement support for the Zicbom and Zicboz extensions
Implements the ratified RISC-V Base Cache Management Operation ISA
Extensions: Zicbom and Zicboz, as described in
https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf.

Zicbop is implemented in a separate patch due to it requiring a new ASM
operand type to be defined.

As discussed in the relevant issue in the upstream spec
https://github.com/riscv/riscv-CMOs/issues/47, the cbo.* instructions
use the format (rs1) or 0(rs1) for their operand, similar to the AMOs.

Differential Revision: https://reviews.llvm.org/D117432
2022-06-28 12:43:25 +01:00
Lian Wang 96ab083622 [RISCV] Support VECTOR_REVERSE mask operation.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128627
2022-06-28 07:48:51 +00:00
LiaoChunyu 1178992c72 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcodes:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-06-28 12:02:05 +08:00
Craig Topper 9afa5b8da2 [RISCV] Add tests for (load (add X, [2048,4094])). NFC
Offsets in the range [-4095,-2049] or [2048, 4094] are split into
two ADDIs. One of the ADDIs will be folded into the load/store
immediate through a post-isel peephole.
2022-06-27 13:42:57 -07:00
Bradley Smith a83aa33d1b [IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundamental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976
2022-06-27 10:48:45 +00:00
Craig Topper 3d37e785c7 [RISCV] Merge more rv32/rv64 vector intrinsic tests that contain the same content. 2022-06-25 13:21:44 -07:00
Craig Topper 78a31bb969 [RISCV] Change how we isel (add X, [-4096, -2049]) or (add X, [2048,4095]).
We currently split the immediate almost equally between two addis.
If the immediate is odd, it can't be split exactly evenly.

This patch instead gives one addi an immediate of 2047 or -2048 and the
other gets the remainder. If the original immediate is near -2049 or 2048,
this might allow the use of c.addi for the addi that receives the
smaller immediate.
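
A hedged worked example of the new split:

```
// imm = 2050: the old near-equal split gives 1025 + 1025; the new split
// gives 2047 + 3, and the "+3" addi may use the compressed c.addi.
long add_2050(long x) {
  return x + 2050;
}
```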

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D128500
2022-06-24 08:31:52 -07:00
Craig Topper c579ab53bd [RISCV] Move vfma_vl+fneg_vl matching to DAG combine.
This patch adds 3 new _VL RISCVISD opcodes to represent VFMA_VL with
different portions negated. It also adds a DAG combine to peek
through FNEG_VL to create these new opcodes.

This is modeled after similar code from X86.

This makes the isel patterns more regular and reduces the size of
the isel table by ~37K.

The test changes look like regressions, but they point to a bug that
was already there. We aren't able to commute a masked FMA instruction
to improve register allocation because we always use a mask undisturbed
policy. Prior to this patch we matched two multiply operands in a
different order and hid this issue for these test cases, but a different
test still could have encountered it.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D128310
2022-06-24 00:00:37 -07:00
Lian Wang 770fe864fe [SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128239
2022-06-24 02:32:53 +00:00
Philip Reames 1cc9792281 [RISCV] Fix a crash in InsertVSETVLI where we hadn't properly guarded for a SEWLMULRatioOnly abstract state
A forward abstract state can be in the special SEWLMULRatioOnly state which means we're not allowed to inspect its fields.  The scalar to vector move case was missing a guard, and we'd crash on an assert.  Test cases included.
2022-06-23 10:25:16 -07:00
Craig Topper 8b10ffabae [RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f.
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarly, mf4 is not supported for i16/f16, or mf2 for i32/f32.

Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.

For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D128286
2022-06-23 08:49:18 -07:00
Craig Topper 4045b62d4c [RISCV] Add macrofusion infrastructure and one example usage.
This adds the macrofusion plumbing and support fusing LUI+ADDI(W).

This is similar to D73643, but handles a different case. Other cases
can be added in the future.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128393
2022-06-23 08:38:39 -07:00
Craig Topper 59cde2133d Recommit "[RISCV] Enable subregister liveness tracking for RVV."
The failure that caused the previous revert has been fixed
by https://reviews.llvm.org/D126048

Original commit message:

RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.

I've added a command line option that can be used to turn it off if it causes
compile time or functional issues. I used the option to keep the old behavior
for one interesting test case that was testing register allocation.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D128016
2022-06-20 20:46:06 -07:00
Lian Wang ab25e263a9 [SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D127710
2022-06-20 06:30:26 +00:00
Craig Topper 314dbde12c [DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore.
The VT we want to shrink to may not be legal especially after type
legalization.

Fixes PR56110.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128135
2022-06-19 15:50:15 -07:00
Craig Topper 545a71c0d6 [RISCV] Pre-promote v1i1/v2i1/v4i1->i1/i2/i4 bitcasts before type legalization
Type legalization will convert the bitcast into a vector store and
scalar load.

Instead this patch widens the vector to v8i1 with undef, and bitcasts
it to i8. v8i1->i8 has custom handling for type legalization already to
bitcast to a v1i8 vector and use an extract_element.

The code here was lifted from X86's avx512 support.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128099
2022-06-18 11:06:45 -07:00
Han-Kuan Chen e29133629b [MachineCopyPropagation][RISCV] Fix D125335 accidentally change control flow.
D125335 makes regsOverlap skip following control flow, which is not intended
in the original code.

Differential Revision: https://reviews.llvm.org/D128039
2022-06-17 21:40:08 -07:00
Han-Kuan Chen dbfb00a930 [MachineCopyPropagation][RISCV] Add test case showing failure for MachineCopyPropagation. NFC
This is a pre-commit test cases for D128039.

Differential Revision: https://reviews.llvm.org/D128040
2022-06-17 21:39:53 -07:00
Philip Reames 4d245f1bc2 [RISCV] Move store policy and mask reg ops into demanded handling in InsertVSETVLI
Doing so lets the post-mutation pass leverage the demanded info to rewrite vsetvlis before a store/mask-op to eliminate later vsetvlis.

Sorry for the lack of store test change; all of my attempts to write something reasonable have been handled through existing logic.
2022-06-17 12:09:50 -07:00
Philip Reames 755c84c62c [RISCV] Avoid changing etype for splat of 0 or -1
A splat of the value 0 or -1 as a sign-extended 12-bit immediate is always the same bit pattern regardless of the etype used to perform the operation. As a result, we can sometimes avoid introducing a vsetvli just for the purposes of a splat.

Looking at the diffs, we don't get a huge amount of immediate value out of this. We mostly push the vsetvli one instruction down, usually in front of a vmerge. We also don't get the corresponding fixed length vector cases because VL typically is changed despite the actual bits written being the same. Both of these are areas I plan to explore in future patches.

Interestingly, this makes a great example of why we need the forward and backward implementation to be consistent. Before we merged the demanded field handling, if we implement only the forward direction, we lost the ability to mutate a prior vsetvli and eliminate a later one entirely. This resulted in practical regressions instead of improvements. It's always nice when practice matches theory. :)

Differential Revision: https://reviews.llvm.org/D128006
2022-06-17 08:10:14 -07:00
Lian Wang 16215eb979 [LegalizeTypes][RISCV][NFC] Modify assert in PromoteIntRes_STEP_VECTOR and add some tests for RISCV
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D127939
2022-06-17 02:26:09 +00:00
Lian Wang 5e16a781ba [RISCV][NFC][test] Correct a wrong test in vreductions-fp-vp.ll
Reviewed By: victor-eds, frasercrmck

Differential Revision: https://reviews.llvm.org/D127946
2022-06-17 02:22:59 +00:00
Craig Topper 9d7b01dc95 [RISCV] Implement RISCVTargetLowering::getTargetConstantFromLoad.
This allows computeKnownBits to see the constant being loaded.

This recovers the rv64zbp test case changes from D127520.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127679
2022-06-16 15:11:18 -07:00
Craig Topper e6c7a3a54f [SelectionDAG] Don't apply MinRCSize constraint in InstrEmitter::AddRegisterOperand for IMPLICIT_DEF sources.
MinRCSize is 4 and prevents constrainRegClass from changing the
register class if the new class has size less than 4.

IMPLICIT_DEF gets a unique vreg for each use and will be removed
by the ProcessImplicitDef pass before register allocation. I don't
think there is any reason to prevent constraining the virtual register
to whatever register class the use needs.

The attached test case was previously creating a copy of IMPLICIT_DEF
because vrm8nov0 has 3 registers in it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128005
2022-06-16 14:55:14 -07:00
Philip Reames 89a11ebd8e [RISCV] Avoid reducing etype just to initialize lane 0 of an undef vector
If we're writing to an undef vector (i.e. implicit_def), we can change the value of bits outside the requested write without consequence. This allows us to avoid a VSETVLI just for narrowing the value written.

Differential Revision: https://reviews.llvm.org/D127880
2022-06-16 11:14:21 -07:00
Craig Topper 912a5172f8 [RISCV] Use TAIL_AGNOSTIC in riscv_fma_vl patterns.
We may eventually need tail undisturbed patterns, but we will need
a policy operand on the ISD node to communicate it.
2022-06-16 09:09:36 -07:00
Philip Reames 4a3e46115a [RISCV] Extend demanded field transform in InsertVSETVLI to VTYPE subfields
The motivating case, and the only one actually enabled by this patch, is a load or store followed by another op with the same SEW/LMUL ratio.

As an example, consider:

define void @test1(ptr %in, ptr %out) {
entry:
  %0 = load <8 x i16>, ptr %in, align 2
  %1 = sext <8 x i16> %0 to <8 x i32>
  store <8 x i32> %1, ptr %out, align 4
  ret void
}

Without this patch, we get:

	vsetivli	zero, 8, e16, mf4, ta, mu
	vle16.v	v8, (a0)
	vsetvli	zero, zero, e32, mf2, ta, mu
	vsext.vf2	v9, v8
	vse32.v	v9, (a1)
	ret

Whereas with the patch we get:

	vsetivli	zero, 8, e32, mf2, ta, mu
	vle16.v	v8, (a0)
	vsext.vf2	v9, v8
	vse32.v	v9, (a1)
	ret

We have rewritten the first vsetvli and thus removed the second one.

As is strongly hinted by the code structure and todos, I am planning on commoning this with all (or most all?) of the cases from isCompatible used in the forward data flow. This will be done in a series of following changes - some NFC reworks, and some reviewed optimization extensions.

Differential Revision: https://reviews.llvm.org/D127780
2022-06-16 08:01:27 -07:00
Kito Cheng e9f7263b38 Reland "[SplitKit] Handle early clobber + tied to def correctly"
This reverts commit 7207373e1e.

We found another RISC-V bug when landing D126048, and it has been fixed
by D127642 now.

Differential Revision: https://reviews.llvm.org/D126048
2022-06-16 17:13:09 +08:00
Kito Cheng 8e16c4db57 Reland "[RISCV] Testcase to show wrong register allocation result of subreg liveness"
This reverts commit 6a6f632b93.

This commit failed when EXPENSIVE_CHECKS was enabled; fixed by D126048.

Differential Revision: https://reviews.llvm.org/D126047
2022-06-16 17:13:09 +08:00
Kito Cheng 687e56614f [RISCV] Fixing undefined physical register issue when subreg liveness tracking enabled.
RISC-V expands register tuple spilling into a series of register spills
after the register allocation phase via pseudo instruction expansion;
however, part of the register tuple might still be undefined during
spilling, so the machine verifier will complain that the spill
instruction is using an undefined physical register.

The optimal solution would be to do liveness analysis and not emit spills
and reloads for the undefined parts, but accurate liveness info at that
point is not easy to get.

So the suboptimal solution is to still spill and reload the undefined
parts, but add an implicit-use of the super register to the spill
instruction; then the machine verifier will only report use of an
undefined physical register when the whole super register is undefined.
This behavior is also documented in MachineVerifier::checkLiveness[1].

Example to demonstrate what happens:

```
  v10m2 = xxx
  # v12m2 not defined yet
  PseudoVSPILL2_M2 v10m2_v12m2
  ...
```

After expansion:
```
  v10m2 = xxx
  # v12m2 not defined yet
  # Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r
  VS2R_V v10m2
  VS2R_V v12m2 # Use undef reg!
```

What this patch did:
```
  v10m2 = xxx
  # v12m2 not define yet
  # Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r
  VS2R_V v10m2 implicit v10m2_v12m2
  # Use undef reg (v12m2), but v10m2_v12m2 isn't totally undef, so
  # that's OK.
  VS2R_V v12m2 implicit v10m2_v12m2
```

[1] https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/MachineVerifier.cpp#L2016-L2019

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127642
2022-06-15 16:23:39 +08:00
Yeting Kuo 9096a52566 [RISCV] Teach vsetvli insertion to not insert redundant vsetvli right after VLEFF/VLSEGFF.
VSETVLIInfos right after VLEFF/VLSEGFF are currently unknown since they modify
VL. Unknown VSETVLIInfos force a VSET(I)VLI to be inserted before the next
vector operation. Actually, the next vector operation after VLEFF/VLSEGFF may
not need an inserted VSET(I)VLI if it uses the same VTYPE and the resulting vl
of the VLEFF/VLSEGFF.

Take the below C code as an example:

  vint8m4_t vec_src1 = vle8ff_v_i8m4(str1, &new_vl, vl);
  vbool2_t mask1 = vmseq_vx_i8m4_b2(vec_src1, 0, new_vl);

vsetvli insertion adds a redundant vsetvli for that.

Assembly result:
  vsetvli a2,a2,e8,m4,ta,mu
  vle8ff.v v28,(a0)
  csrr a3,vl ; redundant
  vsetvli zero,a3,e8,m4,ta,mu ; redundant
  vmseq.vi v25,v28,0

After D126794, VLEFF/VLSEGFF has a def with the value of VL. The patch
considers there to be a ghost vsetvli right after VLEFF/VLSEGFF. The ghost
VSET(I)VLIs use the vl output of the VLEFF/VLSEGFF as their AVL and the same
VTYPE as the VLEFF/VLSEGFF. The ghost vsetvli must be redundant, and we can
use it to get the VSETVLIInfo right after VLEFF/VLSEGFF.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127576
2022-06-15 13:58:40 +08:00
Ping Deng c06f77ec0d [SelectionDAG] fold 'Op0 - (X * MulC)' to 'Op0 + (X << log2(-MulC))'
Reviewed By: craig.topper, spatel

Differential Revision: https://reviews.llvm.org/D127474
2022-06-15 05:50:18 +00:00
Ping Deng a0af049614 [RISCV][NFC] Add more tests for instruction selection of 'mul'
precommit tests for D127474

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127475
2022-06-15 03:48:17 +00:00
Craig Topper e4062522d3 [RISCV] Disable matchSplatAsGather for i1 vectors to prevent creating illegal nodes.
We were incorrectly creating a VRGATHER node with i1 vector type. We
could support this by promoting the mask to i8 and truncating it, but
for now I want to prevent the crash.

Fixes PR56007.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127681
2022-06-13 13:41:39 -07:00
Philip Reames 6bab045d08 [RISCV] Add basic fshr/fshl cost and codegen coverage 2022-06-13 11:49:53 -07:00
Craig Topper bb1a52aa8b Recommit "[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates."
With fix for sanitizer build bot failure.
2022-06-13 11:35:44 -07:00
Mitch Phillips 9d99870590 Revert "[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates."
This reverts commit 8bbcb98848.

Broke the UBSan bot. More details in https://reviews.llvm.org/D127376.
2022-06-13 10:16:28 -07:00
Craig Topper cef03e3dcd [RISCV] Move creation of constant pools from isel to lowering.
This simplifies the isel code by removing the manual load creation.
It also improves our ability to use 0 strided loads for vector splats.

There is an assumption here that Mask and ShiftedMask constants are
cheap enough that they don't become constant pool loads so that our
isel optimizations involving And still work. I believe those constants
are 3 instructions in the worst case.

The rv64zbp-intrinsic.ll changes are a regression caused by intrinsics
being expanded to RISCVISD nodes also occurring during lowering. So the optimizations
were only happening during the last DAGCombine, which can't see through the
load. I believe we can fix this test by implementing
TargetLowering::getTargetConstantFromLoad for RISC-V or by adding the intrinsic
to computeKnownBitsForTargetNode to enable earlier DAG combine. Since Zbp is not
a ratified extension, I don't view these as blocking this patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127520
2022-06-13 09:07:57 -07:00
Simon Pilgrim 7d8fd4f5db [DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere
Another issue unearthed by D127115

We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with).

D127115 makes this particularly difficult as we're almost guaranteed to have lost the sequence before all possible insertions have been folded.

This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before it's too late.

Differential Revision: https://reviews.llvm.org/D127595
2022-06-13 11:48:18 +01:00
Craig Topper 08ea27bf13 [RISCV] Don't require loop simplify form in RISCVGatherScatterLowering.
We need a preheader and a single latch, but we don't need a dedicated
exit.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127513
2022-06-10 13:00:20 -07:00
Craig Topper a639e1fceb [RISCV] Add test case showing failure to convert gather/scatter to strided load/store. NFC
Our optimization pass checks for loop simplify form, before doing
the transform. The loops here aren't in loop simplify form because
the exit block has two predecessors.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127451
2022-06-10 13:00:20 -07:00
Nikita Popov c10921fa1a [CGP] Also freeze ctlz/cttz operand when despeculating
D125887 changed the ctlz/cttz despeculation transform to insert
a freeze for the introduced branch on zero. While this does fix
the "branch on poison" issue, we may still get in trouble if we
pick a different value for the branch and for the ctz argument
(i.e. non-zero for the branch, but zero for the ctz). To avoid
this, we should use the same frozen value in both positions.
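
A hedged C-level sketch of the despeculated shape (names hypothetical):

```
// The same "frozen" value feeds both the zero test and the cttz, so the
// branch and the count can never disagree about whether the input is zero.
unsigned cttz_despec(unsigned x) {
  unsigned fx = x;  // stands in for freeze(x)
  return fx == 0 ? 32 : (unsigned)__builtin_ctz(fx);
}
```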

This does cause a regression in RISCV codegen by introducing an
additional sext. The DAG looks like this:

    t0: ch = EntryToken
        t2: i64,ch = CopyFromReg t0, Register:i64 %3
      t4: i64 = AssertSext t2, ValueType:ch:i32
    t23: i64 = freeze t4
          t9: ch = CopyToReg t0, Register:i64 %0, t23
          t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32>
        t18: ch = TokenFactor t9, t16
            t25: i64 = sign_extend_inreg t23, ValueType:ch:i32
          t24: i64 = setcc t25, Constant:i64<0>, seteq:ch
        t28: i64 = and t24, Constant:i64<1>
      t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68>
    t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80>

I don't see a really obvious way to improve this, as we can't push
the freeze past the AssertSext (which may produce poison).

Differential Revision: https://reviews.llvm.org/D126638
2022-06-10 09:46:10 +02:00
Yeting Kuo f68cad9087 [RISCV] Lower VLEFF/VLSEGFF SDNodes to MachineInstrs with VL outputs.
The patch is a replacement for D125199. PseudoReadVL with vtype has the
problem of computing the same vtype of VLEFF/VLSEGFF in two different places,
DAGToDAG and InsertVSETVLI. A VLEFF/VLSEGFF MI with a VL output can still
provide the vtype of the VLEFF/VLSEGFF to the users of its VL.

The patch names the new pseudos as the original VLEFF/VLSEGFF names suffixed
with "_VL" and expands them in the RISCVInsertVSETVLI pass.

This patch also reverts commit 4537aae0d5,
"[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.".

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126794
2022-06-10 13:57:10 +08:00
Craig Topper 8bbcb98848 [RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates.
For an addition with simm14 and simm15 immediates with 2 or 3 trailing zero
bits, we can use a shXadd instruction and an addi to do the addition.

This patch teaches RISCVMergeBaseOffset to see through this pattern.
I don't think the sh1add case occurs because we use two addis for that,
but I implemented it for completeness.
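
A hedged worked example (offset chosen for illustration):

```
// 8188 = 2047 << 2 is a simm14 with two trailing zero bits, so the
// addition can be done as "li t0, 2047; sh2add a0, t0, a0", which the
// pass can now fold into the %hi/%lo of the global.
extern char g[16384];
char load_off(void) { return g[8188]; }
```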

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127376
2022-06-09 16:07:35 -07:00
Kito Cheng cfa463fdc6 [RISCV][NFC] Update testcase for D126861 2022-06-10 00:18:02 +08:00
Kito Cheng 4b11f90903 [RISCV] Fix missing stack pointer recover
In order to make sure the stack point is right through the EH region,
we also need to restore stack pointer from the frame pointer if we
don't preserve stack space within prologue/epilogue for outgoing variables,
normally it's just checking the variable sized object is present or not
is enough, but we also don't preserve that at prologue/epilogue when
have vector objects in stack.

Example to show what happened:
```
try {
  sp adjust for outgoing args. // 1. Sp changed.
  func_call  // 2. Exception raised
  sp restore // Oh, not restored
} catch {
  // 3. And now we are here.
}

// 4. Prepare to return!, restore return address from stack, but...sp is wrong.
// 5. Screw up!
```

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D126861
2022-06-09 23:38:50 +08:00
Kito Cheng 8b3426569e [RISCV] Pre-commit testcase for PR55442
The testcase shows the stack pointer isn't recovered when we get an
exception from `_Z3fooiiiiiiiiiiPi`, and then we screw up by restoring
the return address from the wrong stack pointer.

NOTE:
Trigger conditions:
1. Frame pointer is required.
2. The stack has outgoing arguments
3. Vector extension is enabled.

Another runnable testcase:

$ clang++ -target riscv64-unknown-linux-gnu -march=rv64gcv test.cpp
```
void __attribute__((noinline)) foo(int, int, int, int, int, int, int, int, int, int, int *){
 throw int(0);
}

int main(int argc, char **argv) {
  int exception_value = 1;
  try {
      foo(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
  } catch (int i) {
    exception_value = i;
  }
  return exception_value;
}
```

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D126860
2022-06-09 23:35:38 +08:00
Lian Wang 91e31fd205 [RISCV][VP] Add fp test of widen and split for vp.setcc
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D127079
2022-06-09 08:14:12 +00:00
Lian Wang 362a02dabe [RISCV][test] Add widen STEP_VECTOR tests.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127371
2022-06-09 07:47:04 +00:00
Craig Topper 4bcfc41846 [SelectionDAG] Teach computeKnownBits that a nsw self multiply produce a positive value.
This matches what we do in IR. For the RISC-V test case, this allows
us to use -8 for the AND mask instead of materializing a constant in a register.
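
A hedged example of the RISC-V case mentioned:

```
// Signed overflow is UB in C, so x * x carries nsw; its sign bit is then
// known zero, letting the AND use the simm12 immediate -8 instead of
// materializing 0x7FFFFFFFFFFFFFF8 in a register.
long square_mask(long x) {
  return (x * x) & 0x7FFFFFFFFFFFFFF8;
}
```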

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127335
2022-06-08 14:55:58 -07:00
Craig Topper e4ba24c17d [RISCV] Support (addi (addi globaladdr, C1), C2) in RISCVMergeBaseOffset.
Adds with immediates in the range [-4096, -2049] or [2048, 4095] get
converted to two ADDIs. Teach RISCVMergeBaseOffset to recognize this
pattern as well.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D126843
2022-06-08 08:20:37 -07:00
Craig Topper 33f4da2455 [RISCV] Support LUI+ADDIW in RISCVMergeBaseOffsetOpt::matchLargeOffset.
LUI+ADDIW always produces a simm32. This allows us to always
fold it into a global offset.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D126729
2022-06-08 08:19:21 -07:00
Shao-Ce SUN 862f30a428 [RISCV] Add ISD::EH_DWARF_CFA
Based on D24038.
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin.
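
A minimal example of the builtin this lowers:

```
// Returns the canonical frame address (CFA) of the current frame.
void *current_cfa(void) {
  return __builtin_dwarf_cfa();
}
```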

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D126181
2022-06-08 22:03:30 +08:00
Kito Cheng 6a6f632b93 Revert "[RISCV] Testcase to show wrong register allocation result of subreg liveness"
Reverted due to failure with LLVM_ENABLE_EXPENSIVE_CHECKS.

This reverts commit cbe22c7943.
2022-06-08 21:19:27 +08:00
Kito Cheng 7207373e1e Revert "[SplitKit] Handle early clobber + tied to def correctly"
Reverted due to failure with LLVM_ENABLE_EXPENSIVE_CHECKS.

This reverts commit e14d04909d.
2022-06-08 13:05:35 +08:00
Kito Cheng e14d04909d [SplitKit] Handle early clobber + tied to def correctly
The splitter will try to extend a live range into the `r` slot for a use
operand. That works in most situations, but it does not work correctly when
the operand is tied to a def and the def operand is early-clobber.

Here's an example to demonstrate what's wrong:
  0  %0 = ...
 16  early-clobber %0 = Op %0 (tied-def 0), ...
 32  ... = Op %0

Before extend:
 %0 = [0r, 0d) [16e, 32d)

We want to extend 0d to 16e, not 16r, in this case; but if we use 16r
here we will extend nothing, because that range is already contained
in [16e, 32d).

This patch adds a check to detect such cases and adjusts the extend point.

Detailed explanation for testcase: https://reviews.llvm.org/D126047

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D126048
2022-06-08 11:33:05 +08:00
Kito Cheng cbe22c7943 [RISCV] Testcase to show wrong register allocation result of subreg liveness
This testcase shows the live range isn't constructed correctly when subreg
liveness is enabled.

In the testcase `early-clobber-tied-def-subreg-liveness.ll`, the first operand
of `vsext.vf2 v8, v16, v0.t` is both a def and a use, and the use comes from
the memory location of `.L__const._Z3foov.var_49`; it's loaded and spilled
onto the stack, and then... v8 is overwritten by other instructions.

```
lui a0, %hi(.L__const._Z3foov.var_49)
addi a0, a0, %lo(.L__const._Z3foov.var_49)
...
vle16.v v8, (a0) # Load value from var_49
...
addi a0, sp, 16
...
vs2r.v v8, (a0) # Spill
...
vl2r.v v8, (a1) # Reload
...
lui a0, %hi(.L__const._Z3foov.var_40)
addi a0, a0, %lo(.L__const._Z3foov.var_40)
vle16.v v8, (a0)     # Load value...into v8???
vmsbc.vx v0, v8, a0  # And use that.
...
vsext.vf2 v8, v16, v0.t # But v8 is here...which is expect value from the reload
```

The `early-clobber-tied-def-subreg-liveness.mir` testcase has more detailed
information on that: `%25.sub_vrm2_0` is defined at 64, used at 464,
and defined again at 464, and we have used an inline asm to clobber all
vector registers to trigger the splitter.

```
0B      bb.0.entry:
16B       %0:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_49
32B       %1:gpr = ADDI %0:gpr, target-flags(riscv-lo) @__const._Z3foov.var_49
48B       dead $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
64B       undef %25.sub_vrm2_0:vrn4m2nov0 = PseudoVLE16_V_M2 %1:gpr, 2, 4, implicit $vl, implicit $vtype
80B       %3:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_48
96B       %4:gpr = ADDI %3:gpr, target-flags(riscv-lo) @__const._Z3foov.var_48
112B      %5:vr = PseudoVLE8_V_M1 %4:gpr, 2, 3, implicit $vl, implicit $vtype
128B      %6:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_46
144B      %7:gpr = ADDI %6:gpr, target-flags(riscv-lo) @__const._Z3foov.var_46
160B      %25.sub_vrm2_1:vrn4m2nov0 = PseudoVLE16_V_M2 %7:gpr, 2, 4, implicit $vl, implicit $vtype
176B      %9:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_45
192B      %10:gpr = ADDI %9:gpr, target-flags(riscv-lo) @__const._Z3foov.var_45
208B      %25.sub_vrm2_2:vrn4m2nov0 = PseudoVLE16_V_M2 %10:gpr, 2, 4, implicit $vl, implicit $vtype
224B      INLINEASM &"" [sideeffect] [attdialect], $0:[clobber], ...
240B      %12:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_44
256B      %13:gpr = ADDI %12:gpr, target-flags(riscv-lo) @__const._Z3foov.var_44
272B      dead $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
288B      %25.sub_vrm2_3:vrn4m2nov0 = PseudoVLE16_V_M2 %13:gpr, 2, 4, implicit $vl, implicit $vtype
304B      $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
320B      %16:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_40
336B      %17:gpr = ADDI %16:gpr, target-flags(riscv-lo) @__const._Z3foov.var_40
352B      %18:vrm2 = PseudoVLE16_V_M2 %17:gpr, 2, 4, implicit $vl, implicit $vtype
368B      $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
384B      %20:gpr = LUI 1048572
400B      %21:gpr = ADDIW %20:gpr, 928
416B      early-clobber %22:vr = PseudoVMSBC_VX_M2 %18:vrm2, %21:gpr, 2, 4, implicit $vl, implicit $vtype
432B      $x0 = PseudoVSETIVLI 2, 9, implicit-def $vl, implicit-def $vtype
448B      $v0 = COPY %22:vr
464B      early-clobber %25.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %25.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, killed $v0, 2, 4, 0, implicit $vl, implicit $vtype
480B      %26:gpr = LUI target-flags(riscv-hi) @var_47
496B      %27:gpr = ADDI %26:gpr, target-flags(riscv-lo) @var_47
512B      PseudoVSSEG4E16_V_M2 %25:vrn4m2nov0, %27:gpr, 2, 4, implicit $vl, implicit $vtype
528B      PseudoRET
```

When the splitter tries to split %25:

```
selectOrSplit VRN4M2NoV0:%25 [64r,160r:4)[160r,208r:0)[208r,288r:1)[288r,464e:2)[464e,512r:3) 0@160r 1@208r 2@288r 3@464e 4@64r  L0000000000000030 [160r,512r:0) 0@160r  L00000000000000C0 [208r,512r:0) 0@208r  L0000000000000300 [288r,512r:0) 0@288r  L000000000000000C [64r,464e:1)[464e,512r:0) 0@464e 1@64r  weight:1.179245e-02 w=1.179245e-02
```

```
Best local split range: 64r-208r, 6.999861e-03, 3 instrs
    enterIntvBefore 64r: not live
    leaveIntvAfter 208r: valno 1
    useIntv [64B;216r): [64B;216r):1
  blit [64r,160r:4): [64r;160r)=1(%29)(recalc)
  blit [160r,208r:0): [160r;208r)=1(%29)(recalc)
  blit [208r,288r:1): [208r;216r)=1(%29)(recalc) [216r;288r)=0(%28)(recalc)
  blit [288r,464e:2): [288r;464e)=0(%28)(recalc)
  blit [464e,512r:3): [464e;512r)=0(%28)(recalc)
  rewr %bb.0    464e:0  early-clobber %28.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %25.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, $v0, 2, 4, 0, implicit $vl, implicit $vtype
  rewr %bb.0    288r:0  %28.sub_vrm2_3:vrn4m2nov0 = PseudoVLE16_V_M2 %13:gpr, 2, 4, implicit $vl, implicit $vtype
  rewr %bb.0    208r:1  %29.sub_vrm2_2:vrn4m2nov0 = PseudoVLE16_V_M2 %10:gpr, 2, 4, implicit $vl, implicit $vtype
  rewr %bb.0    160r:1  %29.sub_vrm2_1:vrn4m2nov0 = PseudoVLE16_V_M2 %7:gpr, 2, 4, implicit $vl, implicit $vtype
  rewr %bb.0    64r:1   undef %29.sub_vrm2_0:vrn4m2nov0 = PseudoVLE16_V_M2 %1:gpr, 2, 4, implicit $vl, implicit $vtype
  rewr %bb.0    464B:0  early-clobber %28.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %28.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, $v0, 2, 4, 0, implicit $vl, implicit $vtype
  rewr %bb.0    512B:0  PseudoVSSEG4E16_V_M2 %28:vrn4m2nov0, %27:gpr, 2, 4, implicit $vl, implicit $vtype
  rewr %bb.0    216B:1  undef %28.sub_vrm1_0_sub_vrm1_1_sub_vrm1_2_sub_vrm1_3_sub_vrm1_4_sub_vrm1_5:vrn4m2nov0 = COPY %29.sub_vrm1_0_sub_vrm1_1_sub_vrm1_2_sub_vrm1_3_sub_vrm1_4_sub_vrm1_5:vrn4m2nov0
queuing new interval: %28 [216r,288r:0)[288r,464e:1)[464e,512r:2) 0@216r 1@288r 2@464e  L000000000000000C [216r,216d:0)[464e,512r:1) 0@216r 1@464e  L0000000000000300 [288r,512r:0) 0@288r  L00000000000000C0 [216r,512r:0) 0@216r  L0000000000000030 [216r,512r:0) 0@216r  weight:8.706897e-03
Enqueuing %28
queuing new interval: %29 [64r,160r:0)[160r,208r:1)[208r,216r:2) 0@64r 1@160r 2@208r  L000000000000000C [64r,216r:0) 0@64r  L00000000000000C0 [208r,216r:0) 0@208r  L0000000000000030 [160r,216r:0) 0@160r  weight:1.097826e-02
Enqueuing %29
```

The live range of the first subreg of %25 becomes [216r,216d:0)[464e,512r:1);
however, the first segment should live until 464e rather than just
[216r,216d:0).

The register allocator then produces a wrong allocation according to this
live range info.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126047
2022-06-08 11:27:24 +08:00
Craig Topper 0c66deb498 [RISCV] Scalarize gather/scatter on RV64 with Zve32* extension.
i64 indices aren't supported on Zve32*. Scalarize gathers to prevent
generating illegal instructions.

Since InstCombine will aggressively canonicalize GEP indices to
pointer size, we're pretty much always going to have an i64 index.

Trying to predict when SelectionDAG will find a smaller index from
the TTI hook used by the ScalarizeMaskedMemIntrinPass seems fragile.
To optimize this we probably need an IR pass to rewrite it earlier.

Test RUN lines have also been added to make sure the strided load/store
optimization still works.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127179
2022-06-07 08:07:50 -07:00
Matt Arsenault 56303223ac llvm-reduce: Don't assert on functions which don't track liveness
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
2022-06-07 10:00:25 -04:00
Craig Topper be398100ea [SelectionDAG] Further improve computeKnownBits for (smax X, C) where C is non-negative.
Move the code that was added for D126896 after the normal recursive calls
to computeKnownBits. This allows us to calculate trailing zeros.
Previously we would break out of the switch before the recursive calls.
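
A hedged worked example of the trailing-zeros case this enables:

```
// Both smax operands are multiples of 8, so the result is known to have
// at least three trailing zero bits (and remains non-negative).
long smax_tz(long x) {
  long a = x & ~7L;
  return a > 8 ? a : 8;  // smax(a, 8)
}
```
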
2022-06-06 09:59:23 -07:00
Shao-Ce SUN 84bacb18c6 [RISCV] Use check-prefixes to reduce check lines
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D125083
2022-06-06 15:59:15 +08:00
Lian Wang 20cf77f776 [LegalizeTypes][VP] Add widen and split support for vp.fptrunc and vp.fpext
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D126439
2022-06-06 02:28:01 +00:00
LiaoChunyu f14d18c7a9 [RISCV] Add more patterns for FNMADD
D54205 handles fnmadd: -rs1 * rs2 - rs3
This patch adds fnmadd: -(rs1 * rs2 + rs3) (requires the nsz flag on the FMA)
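
A hedged example (nsz required, e.g. via -ffast-math):

```
// fnmadd.d computes -(rs1 * rs2) - rs3, which matches -(a * b + c)
// when signed zeros can be ignored.
double neg_fma(double a, double b, double c) {
  return -(a * b + c);
}
```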

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126852
2022-06-04 12:31:45 +08:00
Craig Topper cc3bd43533 [RISCV] Support LUI+ADDIW in doPeepholeLoadStoreADDI.
This fixes an inconsistency between RV32 and RV64. Still considering
trying to do this peephole during isel, but wanted to fix the
inconsistency first.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126986
2022-06-03 18:06:56 -07:00
Craig Topper 8da5d5dbdc [RISCV] Pre-commit test cases for D126986. NFC 2022-06-03 13:31:45 -07:00
Craig Topper 170c550ca8 [RISCV] Use SelectionDAG::isBaseWithConstantOffset in scalar load/store address matching.
Test changes are because isBaseWithConstantOffset uses computeKnownBits
and that is able to see that an earlier AND instruction guaranteed
alignment so that we can treat an OR as an ADD.
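
A hedged sketch of the alignment-driven OR-as-ADD case:

```
// The mask clears the low 4 bits of p, so the "+ 4" can appear as an OR
// in the DAG; computeKnownBits proves the OR acts as an ADD, and the 4
// can fold into the load's immediate offset.
int load_at_or(unsigned long p) {
  p &= ~15UL;
  return *(int *)(p + 4);
}
```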

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126970
2022-06-03 10:55:28 -07:00
Craig Topper dbead2388b [RISCV] Add custom isel for (add X, imm) used by load/stores.
If the imm is out of range for an ADDI, we will materialize it in
a register using multiple instructions. If the ADD is used by a
load/store, doPeepholeLoadStoreADDI can try to pull an ADDI from
the constant materialization into the load/store offset. This only
works if the ADD has a single use, otherwise the peephole would have
to rebuild multiple nodes.

This patch instead tries to solve the problem when the add is selected.
We check that the add is only used by loads/stores and if it is
we will select it to (ADDI (ADD X, Imm-Lo12), Lo12). This will enable
the simple case in doPeepholeLoadStoreADDI that can bypass an ADDI
used as a pointer. As a result we can remove the more complicated
peephole from doPeepholeLoadStoreADDI.
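
A hedged worked example of the new selection:

```
// The add of 4800 bytes is used only by loads, so it can be selected as
// (ADDI (ADD p, 4096), 704); the trailing ADDI then folds into the two
// load offsets (704 and 712).
long two_loads(long *p) {
  long *q = p + 600;  // p + 4800 bytes
  return q[0] + q[1];
}
```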

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126576
2022-06-02 13:45:32 -07:00
Craig Topper fa20bf1636 [DAGCombiner][RISCV] Improve computeKnownBits for (smax X, C) where C is non-negative.
If C is non-negative, the result of the smax must also be
non-negative, so all sign bits of the result are 0.

This allows DAGCombiner to remove a zext_inreg in the modified test.
This zext_inreg started as a sext that became zext before type
legalization then was promoted to a zext_inreg.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126896
2022-06-02 12:34:24 -07:00
Craig Topper 01ba470826 [RISCV] Add test case showing unnecessary extend after i32 smax on rv64. NFC
One of the operands of the smax is a positive value so computeKnownBits
determines the result of the smax must always be positive. This allows
DAG combiner to convert the sign extend to zero extend before type
legalization.

After type legalization the smax is promoted to i64 by sign extending
its inputs and the zero extend becomes an AND instruction. We are unable
to remove the AND at this point and it becomes a pair of shifts or a
zext.w.

The result of smax has as many sign bits as the minimum of its inputs.
Had we kept the sign extend instead of turning it into a zero extend
it would be removed by DAG combiner after type legalization.
2022-06-02 09:58:11 -07:00
Philip Reames dcdb0bf25b [RISCV] Fix an inconsistency with compatible load/store handling
Once we've computed the incoming predecessor state, we should use the same compatibility check with knowledge of MI as we did in phase 2 in order to be consistent across all phases.

Differential Revision: https://reviews.llvm.org/D126574
2022-06-02 08:03:51 -07:00
jacquesguan 5482ae6328 [LegalizeTypes][VP] Add widen and split support for VP FP integer casting op.
This patch adds widen and split support for VP_FPTOSI, VP_FPTOUI, VP_SITOFP and VP_UITOFP.

Differential Revision: https://reviews.llvm.org/D126847
2022-06-02 09:05:27 +00:00
jacquesguan 058791d8f2 [LegalizeTypes][VP] Add widen and split support for VP_SIGN_EXTEND and VP_ZERO_EXTEND.
Differential Revision: https://reviews.llvm.org/D126442
2022-06-02 02:21:22 +00:00
Hendrik Greving a92ed167f2 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-02 00:49:11 +00:00
Philip Reames f15add7d93 [RISCV] Split fixed-vector-strided-load-store.ll so it can be autogened
I've gotten tired of updating register allocation changes by hand, let's just autogen this even if we have to duplicate it.
2022-06-01 16:12:35 -07:00
Hendrik Greving e9d05cc7d8 Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4."
This reverts commit 430ac5c302.

Due to failures in Clang tests.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 13:27:49 -07:00
Hendrik Greving 430ac5c302 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 12:48:01 -07:00
Craig Topper aeb27f133a [RISCV] Fix i64<->f64 and i32<->f32 bitcasts with VLS vectors enabled.
We enable a custom handler to optimize conversions between scalars
and fixed vectors. Unfortunately, the custom handler picks up scalar
to scalar conversions as well. If the scalar types are both legal,
we wouldn't match any of the fixed vector cases and would return SDValue()
causing the LegalizeDAG to expand the bitcast through memory.

This patch fixes this by checking if it's a scalar to scalar conversion
and returns `Op` if both types are legal.

Differential Revision: https://reviews.llvm.org/D126739
2022-06-01 08:13:49 -07:00
wangpc 57203af167 [RISCV] Set target-abi explicitly to reduce codegen results
As mentioned in D125947, we can reduce codegen results by
adding an explicit hard single-float ABI.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D126640
2022-06-01 13:49:23 +08:00
Philip Reames 33b1be5916 [riscv] add test coverage for fractional lmul w/fixed length vectorization 2022-05-31 10:25:37 -07:00
Craig Topper 1b2de79ff4 [RISCV] Use two ADDIs to do some stack pointer adjustments.
If the adjustment doesn't fit in 12 bits, try to break it into
two 12-bit values before falling back to movImm+add/sub.

This is based on a similar idea from isel.

Reviewed By: luismarques, reames

Differential Revision: https://reviews.llvm.org/D126392
2022-05-31 10:25:28 -07:00
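A minimal standalone sketch of the splitting arithmetic described above (my reconstruction under the simm12 range, not the actual pass code): two ADDIs with signed 12-bit immediates can cover adjustments in roughly [-4096, 4094].

#include <cstdint>
#include <optional>
#include <utility>

static bool isSimm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Returns the two ADDI immediates, or nullopt when a single ADDI
// suffices or we must fall back to movImm plus an add/sub.
std::optional<std::pair<int64_t, int64_t>> splitAdjustment(int64_t Amount) {
  if (isSimm12(Amount))
    return std::nullopt;                      // a single ADDI suffices
  int64_t First = Amount > 0 ? 2047 : -2048;  // saturate the first ADDI
  int64_t Second = Amount - First;
  if (!isSimm12(Second))
    return std::nullopt;                      // too large even for two ADDIs
  return std::make_pair(First, Second);
}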
Craig Topper 80c4cf6369 [RISCV] Fix a few corner case bugs in RISCVMergeBaseOffsetOpt::matchLargeOffset
The immediate for LUI is stored as a 20-bit unsigned value. We need
to sign extend it after shifting left by 12 to match the instruction's
behavior.

If we find an LUI+ADDI on RV64, it means the constant isn't a
simm32. If it were, we would have emitted LUI+ADDIW during constant
materialization. Make sure the constant is a simm32 before folding.
This appears to match gcc.

A future patch will add support for LUI+ADDIW on RV64.
2022-05-31 09:50:54 -07:00
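A standalone sketch of the sign-extension point above (an added illustration): the LUI field holds an unsigned 20-bit value, but the instruction produces imm << 12 sign-extended from bit 31.

#include <cstdint>

// Value an LUI with 20-bit field Imm20 actually writes on RV64.
int64_t luiResult(uint32_t Imm20) {
  uint32_t Shifted = Imm20 << 12;  // occupies bits 31:12
  return static_cast<int64_t>(static_cast<int32_t>(Shifted));
}

// The simm32 guard the message describes, expressed directly.
bool isSimm32(int64_t V) { return V >= INT32_MIN && V <= INT32_MAX; }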
Craig Topper 3b5456d5f0 [RISCV] Pre-commit tests for D126635. NFC 2022-05-31 09:49:46 -07:00
eopXD 2cadf84fc8 [RISCV] Pass OptLevel to `RISCVDAGToDAGISel` correctly
Originally, `OptLevel` wasn't passed into the `MachineFunctionPass`.
This lets the default parameter of `SelectionDAGISel`, which is
`CodeGenOpt::Default`, be passed in. `OptLevelChanger` captures the
optimization level from the parameter rather than the value within
`TargetMachine`. This lets the optimization level be unintentionally
overwritten if a value other than `CodeGenOpt::Default` is passed.

This patch fixes this by passing the optimization level rather
than using the default value.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126641
2022-05-30 17:22:50 -07:00
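A hedged sketch of the shape of such a fix (class and member names assumed, LLVM headers and context omitted): forward the level explicitly instead of relying on the default argument.

class RISCVDAGToDAGISel : public SelectionDAGISel {
public:
  RISCVDAGToDAGISel(RISCVTargetMachine &TM, CodeGenOpt::Level OptLevel)
      : SelectionDAGISel(TM, OptLevel) {}  // was: SelectionDAGISel(TM),
                                           // which used CodeGenOpt::Default
};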
eopXD 51002bdb5e [RISCV] Precommit test case to show bug in RISCVISelDagToDag
The optimization level should not be restored to O2.
This is a pre-commit test case to show the fix in D126641.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126677
2022-05-30 15:59:20 -07:00
Ping Deng 88af539c0e [RISCV] Support VP_REDUCE_MUL mask operation
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126520
2022-05-30 03:05:39 +00:00
Ping Deng 083798e270 [LegalizeTypes][VP] Add integer promotion support for vp.fptosi/vp.fptoui
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125760
2022-05-30 03:05:39 +00:00
Craig Topper 6a6cf2e28d [RISCV] isel (add (and X, 0x1FFFFFFFE), Y) as (SH1ADD (SRLI X, 1), Y)
This pattern is what we get after DAG combine for C code like this.

short *ptr1, *ptr2, *ptr3;

/* f is a hypothetical wrapper added so the fragment compiles. */
short f(void) {
  unsigned diff = ptr1 - ptr2;
  return ptr3[diff];
}

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126588
2022-05-29 18:24:07 -07:00
Craig Topper e642d0ea21 [RISCV] Add test cases showing missed opportunity to use shXadd.uw. NFC
The tests here show the codegen for something like this C code.

short *ptr1, *ptr2, *ptr3;

/* g is a hypothetical wrapper added so the fragment compiles. */
short g(void) {
  unsigned diff = ptr1 - ptr2;
  return ptr3[diff];
}

The pointer difference is truncated to 32 bits before being used
again as an index. In SelectionDAG this appears as an AND between
a SRL and a SHL. DAGCombiner will remove the shifts, leaving only
an AND. The mask then has 1, 2, or 3 trailing zeros and 31, 30, or 29
leading zeros. We end up falling back to constant materialization
to create this mask.

We could instead use srli followed by slli.uw. Or since
we have an add, we can use srli followed by shXadd.uw.

Differential Revision: https://reviews.llvm.org/D126589
2022-05-29 18:22:55 -07:00
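A standalone model of the srli + shXadd.uw alternative mentioned in the message above (an added check, treating registers as uint64_t and using the Zba zero-extending shift semantics): for the 0x1FFFFFFFE mask, the shifted form computes exactly the AND-based form.

#include <cassert>
#include <cstdint>

// slli.uw rd, rs1, shamt: zero-extend the low 32 bits, then shift left.
uint64_t slli_uw(uint64_t rs1, unsigned shamt) {
  return (rs1 & 0xFFFFFFFFull) << shamt;
}
// sh1add.uw rd, rs1, rs2: zero-extend low 32 bits, shift left 1, add.
uint64_t sh1add_uw(uint64_t rs1, uint64_t rs2) {
  return ((rs1 & 0xFFFFFFFFull) << 1) + rs2;
}

int main() {
  uint64_t x = 0x123456789ABCDEF0ull, y = 0x1000;
  uint64_t masked = (x & 0x1FFFFFFFEull) + y;  // AND form from DAGCombine
  uint64_t shifted = sh1add_uw(x >> 1, y);     // srli + sh1add.uw form
  assert(masked == shifted);
  assert((x & 0x1FFFFFFFEull) == slli_uw(x >> 1, 1));
  return 0;
}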
Philip Reames 85b4470035 [RISCV] Allow PRE of vsetvli involving non-1 LMUL
This is a follow up to address a review comment from D124869. When deciding whether to PRE a vsetvli, we can allow non-LMUL1 vsetvlis.

Differential Revision: https://reviews.llvm.org/D126563
2022-05-27 15:49:41 -07:00
Craig Topper 542a83c362 [RISCV] Correct load/store alignments in sink-splat-operands.ll. NFC
These should be aligned to the natural alignment of the element.
Probably a copy/paste mistake from the i32 tests.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126567
2022-05-27 14:39:31 -07:00
Philip Reames d4905a7b20 [RISCV] Add a vsetvli PRE test involving non-1 LMUL 2022-05-27 13:16:05 -07:00
Craig Topper aaad507546 [RISCV] Return false from isOffsetFoldingLegal instead of reversing the fold in lowering.
When lowering GlobalAddressNodes, we were removing a non-zero offset and
creating a separate ADD.

It already comes out of SelectionDAGBuilder with a separate ADD. The
ADD was being removed by DAGCombiner.

This patch disables the DAG combine so we don't have to reverse it.
Test changes all look to be instruction order changes, probably due
to different DAG node ordering.

Differential Revision: https://reviews.llvm.org/D126558
2022-05-27 11:05:18 -07:00
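A minimal sketch of the described change (the override body is my reading of the commit message; the hook itself is the standard TargetLowering one):

bool RISCVTargetLowering::isOffsetFoldingLegal(
    const GlobalAddressSDNode *GA) const {
  // Folding a non-zero offset into the global address would only have
  // to be reversed during lowering, so report the fold as not legal.
  return false;
}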
Rahman Lavaee 3aa249329f Revert "[Propeller] Promote functions with propeller profiles to .text.hot."
This reverts commit 4d8d2580c5.
2022-05-26 18:45:40 -07:00
Rahman Lavaee 4d8d2580c5 [Propeller] Promote functions with propeller profiles to .text.hot.
Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on the PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, with `-Wl,-keep-text-section-prefix=true`, Propeller cannot enforce a global section ordering, as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).

This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.

The new implementation refactors the parsing of the basic-block-sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier, in `CodeGenPrepare`, in order to set the function's text prefix. `BasicBlockSectionsProfileReader` will be used by both the `BasicBlockSections` pass and `CodeGenPrepare`.

Differential Revision: https://reviews.llvm.org/D122930
2022-05-26 16:23:21 -07:00
Philip Reames 8a3b6ba756 [RISCV] Add a subtarget feature to enable unaligned scalar loads and stores
A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature.

There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to have actually been standardized, and b) it isn't quite what we want here anyway due to the perf comment.

Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them.

Differential Revision: https://reviews.llvm.org/D126085
2022-05-26 15:25:47 -07:00
Craig Topper 460781feef [LegalizeTypes] Fix bug in expensive checks verification
With a fix for an expensive checks build failure exposed by new RISC-V tests.
Something about expanding two rotates in type legalization caused a change
in the remapping tables that the expensive-checks verification wasn't expecting.
See comment in the code for how it was fixed.

Tests came from the commit that exposed the bug:
[RISCV] Add test cases showing failure to remove mask on rotate amounts.

If the masking AND has multiple users we fail to remove it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D126036
2022-05-26 13:13:32 -07:00
Philip Reames afe49934a6 [RISCV] Allow compatible VTYPE in AVL Reg Forward cases
During insertion of VSETVLI, we have two related bits of code which decide whether we can reuse a previous vsetvli result. As was pointed out in the original review, these cases can allow any prior state for which we know that VL is the same for any value of AVL.

This was originally separated out of a desire for separate tests and review. As it turns out, finding a test case for this has been quite challenging. Most of the cases I tried, we manage to already get through other chains of logic. We do have one correct test change, but that only exercises one of the two changes.

Differential Revision: https://reviews.llvm.org/D126400
2022-05-26 08:50:35 -07:00
Chen Zheng d79275238f [MachineSink] replace MachineLoop with MachineCycle
reapply 62a9b36fcf and fix the module build
failure:
1: remove MachineCycleInfoWrapperPass from MachinePassRegistry.def.
   MachineCycleInfoWrapperPass is an analysis pass and should not be there.
2: move the definition of MachineCycleInfoPrinterPass to the cpp file.

Otherwise, there is a module conflict for MachineCycleInfoWrapperPass
between MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf.

MachineCycle can handle irreducible loops. Natural loop
analysis (MachineLoop) cannot return the correct loop depth if
the loop is irreducible, and MachineSink is sensitive
to the loop depth; see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loops better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-26 06:45:23 -04:00
Lian Wang 8aa6b05deb [LegalizeTypes][VP] Add widen and split support for VP_TRUNCATE
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125950
2022-05-26 02:03:27 +00:00
Haocong.Lu 085acede57 [RISCV][NFC] Remove solved TODO for combining constant shifts
Reviewed By: benshi001, asb

Differential Revision: https://reviews.llvm.org/D126185
2022-05-26 09:55:19 +08:00
Philip Reames be2cb824d0 [riscv] Remove mutation of prior vsetvli from insertion dataflow
This moves mutation entirely out of the main algorithm.

The immediate trigger is that we hit another case of the same issue I thought we'd fixed in 72925d9. It turns out we hadn't considered the cross block case.

As a brief summary, the issue being fixed is that if we mutate a previous vsetvli in phase 3, there's a possibility that some later use of that vsetvli changes "compatibility". In the cross_block_mutate test, this later vsetvli occurs in another block (and is thus visit order dependent too!). This causes us to fail strict asserts. (To be explicit, the current on-by-default workaround should compensate. It's only when we turn that off that we have problems.)

Now, I want to explicitly call out an alternate workaround. We could leave the mutation in phase 3, and simply restrict it to the case where the previous vsetvli's GPR result is unused. That covers the case we've actually seen. (I'll note that codegen regressions with a simple form of this were significant. We might have to check specifically for the use-outside-block case to keep them reasonable, which complicates the workaround slightly.)

Personally, I'm at the point where I want the mutation pulled out just for robustness sake. I'm worried there's yet one more form of this bug we haven't thought about.

The other motivation for this change is that it does give us a couple of minor codegen wins. None appear to be hugely significant, but improvements never hurt, right?

Differential Revision: https://reviews.llvm.org/D125270
2022-05-25 10:51:14 -07:00
Craig Topper 172149e98c [RISCV] Preserve fast math flags in lowerVPOp.
Update test to check MIR after finalize-isel instead of debug output.

This is of course not the only place we should preserve FMF, but
it's the most obvious one.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D126306
2022-05-25 09:16:07 -07:00
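A hedged one-line sketch of the idea (names illustrative, not copied from the patch): when the RISC-V VP node is rebuilt during lowering, forward the original node's flags.

// Sketch only; inside a lowering routine that rebuilds the node:
SDValue Result = DAG.getNode(RISCVISDOpc, DL, VT, Ops, Op->getFlags());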
Philip Reames 2a3b6f2cba [RISCV] Hoist VSETVLI vlmax, vtype out of scalable loops
This is a straightforward extension of the PRE transform introduced in D124869 to handle the VLMAX case.

The test changes here look quite positive. This surprised me until I realized that all the tests are using @llvm.vscale to figure out the VLMAX, not the llvm.riscv.vsetvlmax intrinsic. If they'd used the latter, these would have been full redundancy cases and fully handled by the data flow. I'm not really sure if the use of vscale here is representative or not. If it is, we should probably look at using VSETVLI to lower vscale rather than a raw read of vlenb and some math.

Differential Revision: https://reviews.llvm.org/D126338
2022-05-25 08:00:27 -07:00
Philip Reames a4a438f05a [riscv] Add coverage for fixed length vector loops using LMUL 2022-05-25 07:42:21 -07:00
Lewis Revill 29a5a7c6d4 [RISCV] Add pre-emit pass to make more instructions compressible
When optimizing for size, this pass searches for instructions that are
prevented from being compressed by one of the following:

1. The use of a single uncompressed register.
2. A base register + offset where the offset is too large to be
   compressed and the base register may or may not already be compressed.

In the first case, if there is a compressed register available, then the
uncompressed register is copied to the compressed register and its uses
replaced. This is only done if there are enough uses that code size
would be improved.

In the second case, if a compressed register is available, then the
original base register is copied and adjusted such that:

new_base_register = base_register + adjustment
base_register + large_offset = new_base_register + small_offset

and the uses of the base register are replaced with the new base
register. Again this is only done if there are enough uses for code size
to be improved.

This pass was authored by Lewis Revill, with large offset optimization
added by Craig Blackmore.

Differential Revision: https://reviews.llvm.org/D92105
2022-05-25 09:25:02 +01:00
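A toy numeric instance of the rebasing identity above (an added illustration; the encodable ranges depend on the specific compressed instruction, and the 0..124 word-offset range of c.lw is assumed here):

#include <cassert>
#include <cstdint>

int main() {
  int64_t base = 0x1000;
  int64_t large_offset = 300;  // too large for c.lw's assumed 0..124 range
  int64_t adjustment = 256;    // chosen so the remainder fits
  int64_t new_base = base + adjustment;
  int64_t small_offset = large_offset - adjustment;  // 44: encodable
  assert(base + large_offset == new_base + small_offset);
  return 0;
}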
Craig Topper 66db5312bd [RISCV] Fix vnsrl/vnsra isel patterns that are dropping VL.
We were incorrectly using VLMax instead of the passed VL.

Reviewed By: khchen, reames

Differential Revision: https://reviews.llvm.org/D126319
2022-05-24 21:38:59 -07:00
Chen Zheng 80c4910f3d Revert "[MachineSink] replace MachineLoop with MachineCycle"
This reverts commit 62a9b36fcf.
Caused a build failure on the lldb incremental buildbot:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes
2022-05-24 22:43:37 -04:00
Philip Reames 948d931323 [RISCV] Ensure the forwarded AVL register is alive
When the AVL value does not fit in 5 bits, the register in which this value is stored may be dead when we want to forward it. This patch ensures the kill flags on the register are cleared before forwarding.

Patch by: loralb
Differential Revision: https://reviews.llvm.org/D125971
2022-05-24 15:07:42 -07:00
Philip Reames a95ecb20bc [RISCV] Hoist VSETVLI out of idiomatic fixed length vector loops
This patch teaches the VSETVLI insertion pass to perform a very limited form of partial redundancy elimination. The motivating example comes from the fixed length vectorization of a simple loop such as:

for (unsigned i = 0; i < a_len; i++)
    a[i] += b;

Without this change, the core vector loop and preheader is as follows:

.LBB0_3:                                # %vector.ph
	andi	a1, a6, -8
	addi	a4, a0, 16
	mv	a5, a1
.LBB0_4:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	addi	a3, a4, -16
	vsetivli	zero, 4, e32, m1, ta, mu
	vle32.v	v8, (a3)
	vle32.v	v9, (a4)
	vadd.vx	v8, v8, a2
	vadd.vx	v9, v9, a2
	vse32.v	v8, (a3)
	vse32.v	v9, (a4)
	addi	a5, a5, -8
	addi	a4, a4, 32
	bnez	a5, .LBB0_4

The key thing to note here is that the vsetivli only needs to execute once. Since there's no tail folding happening here, the values of the vector configuration registers are invariant through the loop.

After this patch, we hoist the configuration into the preheader and perform it once.

.LBB0_3:                                # %vector.ph
	andi	a1, a6, -8
	vsetivli	zero, 4, e32, m1, ta, mu
	addi	a4, a0, 16
	mv	a5, a1
.LBB0_4:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	addi	a3, a4, -16
	vle32.v	v8, (a3)
	vle32.v	v9, (a4)
	vadd.vx	v8, v8, a2
	vadd.vx	v9, v9, a2
	vse32.v	v8, (a3)
	vse32.v	v9, (a4)
	addi	a5, a5, -8
	addi	a4, a4, 32
	bnez	a5, .LBB0_4

Differential Revision: https://reviews.llvm.org/D124869
2022-05-24 14:56:01 -07:00