Commit Graph

1997 Commits

Author SHA1 Message Date
Craig Topper c9a41fe60a [RISCV] Prefer vnsrl.wi v8, v8, 0 over vnsrl.wx v8, v8, x0.
I have a couple data points that some microarchitectures prefer
the immediate 0 over x0. Does anyone know of microarchitectures
where the opposite is true?

Unfortunately, this is different than the vncvt.x.x.w alias
from the spec. Perhaps the alias was poorly chosen if x0 isn't
as optimal as immediate 0 on all microarchitectures.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132041
2022-08-19 08:40:17 -07:00
Archibald Elliott 3a729069e4 [IR] Update llvm.prefetch to match docs
The current llvm.prefetch intrinsic docs state "The rw, locality and
cache type arguments must be constant integers."

This change:
- Makes arg 3 (cache type) an ImmArg
- Improves the verifier error messages to reference the incorrect
  argument.
- Fixes two tests which contradict the docs.

This is needed as the lowering to GlobalISel is different for ImmArgs
compared to other constants. The non-ImmArgs create a G_CONSTANT MIR
instruction, whereas for ImmArgs the constant is put directly on the
intrinsic's MIR instruction as an immediate.
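
A small C-level sketch (hypothetical function; the argument values are illustrative): Clang's __builtin_prefetch lowers to llvm.prefetch with constant rw/locality arguments and a data cache type, so all three trailing intrinsic arguments end up as immediates.

  /* rw=0 (read), locality=3 (keep in all cache levels); cache type defaults to data */
  void warm(const char *p) {
      __builtin_prefetch(p, /*rw=*/0, /*locality=*/3);
  }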

Differential Revision: https://reviews.llvm.org/D132042
2022-08-19 09:11:17 +01:00
Craig Topper ba1f4cab44 [RISCV] Copy SDNodeFlags in lowerToScalableOp.
Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D132177
2022-08-18 20:42:59 -07:00
Craig Topper 5349aa2354 [RISCV] Copy SDNodeFlags in doPeepholeMaskedRVV and doPeepholeMergeVVMFold
Especially the NoFPExcept flag for FP.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D132173
2022-08-18 20:42:46 -07:00
Craig Topper 550fab53e1 [RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1).
Extracted from D131729 where we handled C==0. It's now generalized
to more constants.
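
A minimal C sketch of the identity (hypothetical constant C = 7): a setcc value b is always 0 or 1, so b ^ 1 equals 1 - b and C - (b ^ 1) equals b + (C - 1).

  int fold(int x, int y) {
      int b = (x == y);    /* the setcc: 0 or 1 */
      return 7 - (b ^ 1);  /* can be selected as b + 6 */
  }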

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132000
2022-08-17 09:50:08 -07:00
Alex Bradbury ce38128194 [RISCV] Avoid redundant branch-to-branch when expanding cmpxchg
If the success value of a cmpxchg is used in a branch, the expanded
cmpxchg sequence ends up with a redundant branch-to-branch (as the
backend atomics expansion happens as late as possible, passes to
optimise such cases have already run). This patch identifies this case
and avoids it when expanding the cmpxchg.
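
A C sketch of the affected shape (hypothetical names, not taken from the patch): the success result feeds a conditional branch, so the expanded cmpxchg loop can branch straight to the failure block instead of materializing a flag and branching on it again.

  extern void on_failure(void);
  void try_update(int *p, int expected, int desired) {
      if (!__atomic_compare_exchange_n(p, &expected, desired, /*weak=*/0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
          on_failure();
  }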

Note that a similar optimisation is possible for a BEQ on the cmpxchg
success value. As it's hard to imagine a case where real-world code may
do that, this patch doesn't handle that case.

Differential Revision: https://reviews.llvm.org/D130192
2022-08-17 13:49:15 +01:00
Craig Topper 39707c1a9a [RISCV] Add test coverage for (select (icmp X, Y), float, float). NFC
We fold integer setcc into SELECT_CC during DAG combine even if
the SELECT_CC has FP result type, but we had no test coverage.
2022-08-16 21:28:26 -07:00
Craig Topper d8cdd78b6c [RISCV] Add test cases to show missed opportunity to fold (sub C, (xor (setcc), 1)). NFC
(sub C, (xori X, 1)) can be folded to (add X, C-1) if X is 0 or 1.

This would avoid the xori and in some cases remove an instruction
needed to materialize the constant.
2022-08-16 16:40:53 -07:00
Craig Topper 53ce22e429 Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.

Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:51:07 -07:00
Craig Topper b5a18de651 [RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)).
While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is
still preferable. (addi X, -1) can be compressed; sub with x0 on
the LHS is never compressible.
2022-08-16 14:49:52 -07:00
Craig Topper de6fd16971 [RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12.
We still need to materialize the constant in a register and we
may not be removing all uses of the original constant so it may
increase code size.
2022-08-16 14:11:31 -07:00
Craig Topper 1180ed41ee [RISCV] Add more test cases for (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). NFC
In these test cases we do the transform, but the immediate is too
large to form an ADDI so it didn't save any instructions.

If the constant is opaque or has additional users we shouldn't do
the transform if it doesn't form an ADDI.
2022-08-16 14:08:42 -07:00
Craig Topper 4854fa217f [RISCV] Move test from setcc-logic.ll to select-const.ll. NFC
Also add setne version of the test.

Add some common prefixes to reduce number of identical CHECK lines.
2022-08-16 14:08:42 -07:00
Craig Topper 4184edc691 [RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc.
This fold introduced an xori in some cases, which I don't believe was the
intention of the original patch. It happened by accident because
non-NaN FP equality compares also use SETEQ/SETNE.

Also pass the correct type to getSetCCInverse.
2022-08-16 13:00:36 -07:00
Craig Topper 87e7837293 [RISCV] Add test cases to show where we inverted a fp setcc and introduced an extra xori.
In these tests we had (sub C, (seteq X, Y)) which we converted to
(add (setne X, Y), C-1). We don't have an FNE compare instruction
so this created an XORI to invert an FEQ instruction.

This might be a good idea since it can save a constant materialization,
but does not appear to be the intention of the original patch.
2022-08-16 12:59:16 -07:00
Craig Topper 7a73ab5818 [RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64.
We have a good selection of W instructions, so promoting a truncated
value back to i64 is often free.

This appears to be a net code size reduction on SPECINT2006.

This has been split from D130397 as one of the patches needed to
complete that.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131819
2022-08-15 08:32:51 -07:00
Simon Pilgrim 3a73133217 [DAG] canCreateUndefOrPoison - add freeze(sign_extend_inreg(x,vt)) -> sign_extend_inreg(freeze(x),vt) support
Guaranteed not to create undef/poison
2022-08-15 12:18:59 +01:00
Simon Pilgrim 7e294e676e [DAG] canCreateUndefOrPoison - add freeze(assertsext/zext(x,vt)) -> assertsext/zext(freeze(x),vt) support
These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison
2022-08-15 11:13:43 +01:00
Craig Topper b8c5420d74 [X86][RISCV] Pre-commit tests for D130862. NFC
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131442
2022-08-14 16:31:15 -07:00
LiaoChunyu 99ef0ddea3 [RISCV] Fold (sub constant, (setcc x, y, eq/neq)) -> (add constant - 1, (setcc x, y, neq/eq))
(setcc x, y, eq/neq) lowers to seqz or snez, which sets rd to 0 or 1.

addi is then used to fold in the immediate, which can save the instructions needed to materialize it separately.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131471
2022-08-13 20:37:57 +08:00
Craig Topper e493944f5f [RISCV] Use SLTIU X, -1 for (setne X, -1).
Since -1 is the maximum unsigned value, all values less than it
are not equal to it.
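
A C-level sketch of the identity (hypothetical function name): on RV64, x != -1 is exactly the unsigned comparison x < 0xffffffffffffffff, which is a single sltiu with immediate -1.

  int ne_all_ones(unsigned long x) {
      return x != (unsigned long)-1;   /* same as: x < (unsigned long)-1 */
  }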
2022-08-11 15:36:04 -07:00
Craig Topper 2c79801a0e [RISCV] Add more ineg+setcc isel patterns to avoid creating neg+xori+slti(u).
Including patterns to select addiw if only the lower 32 bits are used.

I'm not excited about adding this many patterns. I'm looking at whether
we can create the xori during lowering and move the ineg patterns to
DAGCombiner.
2022-08-11 14:24:09 -07:00
Yeting Kuo 875694089d [RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics.
The patch uses a peephole method to fold merge.vvm and unmasked intrinsics into
masked intrinsics. Using a peephole instead of tablegen patterns avoids large
amounts of auto-generated code.

Note: The patch ignores segment loads since I don't know how to test them.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130442
2022-08-11 17:58:11 +08:00
Yeting Kuo 7050f2102e [RISCV][test] Precommitted test for optimization for vmerge and unmasked intrinsics.
Precommitted test cases for D130442.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130753
2022-08-11 13:23:08 +08:00
Alex Bradbury 47b1f8362a [RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls
Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is partly determined by checking TargetLowering::isInTailCallPosition
and comparing the return type of the libcall and the caller.
isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly
(which always returns false if not implemented by the target).

This patch provides a minimal implementation of
TargetLowering::isUsedByReturnOnly - enough to support tail calling
libcalls on hard float ABIs. Soft-float ABIs are left for a follow on
patch. libcall-tail-calls.ll also shows missed opportunities to tail
call integer libcalls, but this is due to issues outside of
the isUsedByReturnOnly hook.

Differential Revision: https://reviews.llvm.org/D131087
2022-08-10 10:50:29 +01:00
Philip Reames 46fe24e2fb [RISCV] Split check lines for fpclamptosat_vec test
This is currently exercising the scalarization code path; with vectors enabled, we hit a different code path.  Explicitly exercise both so that both configurations have testing.
2022-08-09 15:03:29 -07:00
Philip Reames 8800b1103a [RISCV] Pin a test to scalar lowering to preserve test intent [nfc]
In an upcoming change to enable fixed length vector lowering via vector registers, the codepath exercised would change.  Pin this to the old lowering.
2022-08-09 10:12:18 -07:00
Nikita Popov f5ed0cb217 [RISCV] Add target feature to force-enable atomics
This adds a +forced-atomics target feature with the same semantics
as +atomics-32 on ARM (D130480). For RISCV targets without the +a
extension, this forces LLVM to assume that lock-free atomics
(up to 32/64 bits for riscv32/64 respectively) are available.

This means that atomic load/store are lowered to a simple load/store
(and fence as necessary), as these are guaranteed to be atomic
(as long as they're aligned). Atomic RMW/CAS are lowered to __sync
(rather than __atomic) libcalls. Responsibility for providing the
__sync libcalls lies with the user (for privileged single-core code
they can be implemented by disabling interrupts). Code using
+forced-atomics and -forced-atomics is not ABI compatible if atomic
variables cross the ABI boundary.
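
A C sketch of the resulting split (hedged; the function names and the exact libcall are illustrative): plain atomic loads and stores stay inline, while RMW operations become __sync_* libcalls.

  #include <stdatomic.h>
  int load_it(_Atomic int *p) { return atomic_load(p); }          /* inline load plus fences            */
  int bump(_Atomic int *p)    { return atomic_fetch_add(p, 1); }  /* e.g. a __sync_fetch_and_add_4 call */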

For context, the difference between __sync and __atomic is that the
former are required to be lock-free, while the latter requires a
shared global lock provided by a shared object library. See
https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed
discussion on the topic.

This target feature will be used by Rust's riscv32i target family
to support the use of atomic load/store without atomic RMW/CAS.

Differential Revision: https://reviews.llvm.org/D130621
2022-08-09 16:04:46 +02:00
Shubham Narlawar ab4fc87a9d [DAG] Emit table lookup from TargetLowering::expandCTTZ()
This patch emits table lookup in expandCTTZ.

Context -
https://reviews.llvm.org/D113291 transforms a set of IR instructions into the
cttz intrinsic, but there are some targets which do not support CTTZ or
CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ().
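
One classic shape for such a table lookup, shown as a hedged C sketch (the well-known de Bruijn multiply; not necessarily the exact sequence this patch emits):

  static const int tab[32] = {
       0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
      31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9};
  int cttz32(unsigned x) {                        /* assumes x != 0 */
      return tab[((x & -x) * 0x077CB531u) >> 27]; /* isolate lowest set bit, hash, look up */
  }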

Differential Revision: https://reviews.llvm.org/D128911
2022-08-08 12:08:05 +01:00
Simon Pilgrim e5e93b6130 [DAG] FoldConstantArithmetic - add initial support for undef elements in bitcasted binop constant folding
FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks).

This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero.

The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead?

Differential Revision: https://reviews.llvm.org/D130839
2022-08-08 11:53:56 +01:00
Craig Topper 75c64c7c4e [RISCV] Don't use li+sh3add for constants that can use lui+add.
If we're adding a constant that can't use addi we try a few tricks,
one of which is using li+sh3add. We should not do this if lui+add
would work. For example adding 8192. Using sh3add prevents folding
a sext.w to form addw, thus increasing instruction count.
2022-08-05 12:47:03 -07:00
Philip Reames 9a9848f4b9 [RISCVInsertVSETVLI] Remove an unsound optimization
This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem:

vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo
vsetivli a2, a1, e32, m4, ta, mu

With the unsound result being:

vsetivli a1, a0, e32, m1, ta, mu
vsetivli a2, a0, e32, m4, ta, mu

Consider the case where this is running on a machine with VLEN=512. For this case, the VLMAXs are 16 and 64 respectively.

Consider a0 = 33. The correct result is a1 = 16 and a2 = 16.

After the unsound optimization: a1 = 16 and a2 = 33

This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that one is more complicated to construct, as many examples turn out to be accidentally sound.

This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own.

Differential Revision: https://reviews.llvm.org/D131264
2022-08-05 12:13:08 -07:00
Craig Topper 12a1ca9c42 [RISCV] Relax another one use restriction in performSRACombine.
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
it's possible that the add is used by multiple sras. We should
allow the combine if all the SRAs will eventually be updated.

After transforming all of the sras, the shls will share a single
(sext_inreg (add X, C1), i32).

This pattern occurs if an sra with 32 is used as index in multiple
GEPs with different scales. The shl from the GEPs will be combined
with the sra before we get a chance to match the sra pattern.
2022-08-04 14:32:31 -07:00
Craig Topper a2de12c987 [RISCV] Relax a one use restriction performSRACombine
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
ignore the use count on the (shl X, 32).

The sext_inreg after the transform is free. So we're only making
2 new instructions, the add and the shl. So we only need to be
concerned with replacing the original sra+add. The original shl
can have other uses. This helps if there are multiple different
constants being added to the same shl.
2022-08-04 11:25:08 -07:00
Lorenzo Albano 74940d2668 [VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: frasercrmck, craig.topper

Differential Revision: https://reviews.llvm.org/D121114
2022-08-04 16:12:01 +02:00
wanglian b6b0690355 [LegalizeTypes][VP] Add split operand support for VP float and integer casting
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D130685
2022-08-04 15:41:50 +08:00
Craig Topper 53d560b22f [RISCV] Prevent infinite loop after D129980.
D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into
(seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it
will be turned back into an 'and' by SimplifyDemandedBits which
can cause an infinite loop.

To prevent this, check if bit 31 is 0 with computeKnownBits before
doing the transformation.

Fixes PR56905.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131113
2022-08-03 15:19:07 -07:00
Alex Bradbury 9e966dd298 [RISCV][test] Add test for ability to tailcall libcalls
Although there's good coverage of the libcalls within llvm/test/CodeGen,
it's useful to have tests for all ABI and hard/soft-float combinations
in order to properly test the logic that enables libcall tail calls
(which will be implemented in a follow-up patch).
2022-08-03 19:27:59 +01:00
Alex Bradbury 28f12a09ae [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics
An unnecessary sext.w is generated when masking the result of the
riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the
intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed.

Although this isn't a particularly important optimisation, removing the
sext.w simplifies implementation of an additional cmpxchg-related
optimisation in D130192.

Although I can't produce a test with different codegen for the other
atomics intrinsics, these are added as well for completeness.

Differential Revision: https://reviews.llvm.org/D130191
2022-08-03 13:41:58 +01:00
Craig Topper c2d0685286 [RISCV] Simplify test case from D130931. NFC 2022-08-01 16:50:56 -07:00
Craig Topper da5b1bf5bb [RISCV] Teach RISCVMergeBaseOffset to merge %lo/%pcrel_lo into load/store after folding arithmetic.
It's possible we have:
lui  a0, %hi(sym)
addi a0, %lo(sym)
addi a0, <offset1>
lw a0, <offset2>(a0)

We want to arrive at
lui a0, %hi(sym+offset1+offset2)
lw a0, %lo(sym+offset1+offset2)

We currently fail to do this because we only consider loads/stores
if we didn't find any arithmetic.

This patch splits arithmetic folding and load/store folding into
two separate phases. The load/store folding can no longer assume
the offset in hi/lo is 0 so we must combine the offsets. I've applied
the same simm32 limit that we applied in the arithmetic folding.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D130931
2022-08-01 15:33:21 -07:00
Craig Topper e07a8155f5 [RISCV] Move Pre-RA pseudo expansion from addMachineSSAOptimization to addPreRegAlloc.
addMachineSSAOptimization is skipped for -O0, but this pass is
required for -O0.
2022-08-01 13:44:43 -07:00
Craig Topper f9b05e6dad [RISCV] Pre-commit tests for D130931. NFC 2022-08-01 12:57:09 -07:00
Craig Topper 450edb0b37 [RISCV] Explicitly select second operand of branch condition to X0.
At least based on the lit tests, the coalescer sometimes fails to
propagate the copy from X0 into the branch instruction. This patch
does it manually during isel. The majority of the changes are from
the select patterns.

Some of the changes are just register allocation changes. Only
the Select change affects whether a b*z instruction is generated
in the tests. I changed the branch pattern for consistency.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130809
2022-08-01 11:16:48 -07:00
Lorenzo Albano 71b7c03fd6 [RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121113
2022-08-01 09:23:45 -07:00
Luís Marques 0bc177b6f5 [RISCV] Extend the Merge Base Offset pass to handle AUIPC+ADDI
Builds upon D123264, adding support for merging the low part of the LLA
address into the load/store instruction offsets.

Differential Revision: https://reviews.llvm.org/D123265
2022-08-01 11:30:02 +02:00
Luís Marques 260a641068 [RISCV] Pre-RA expand pseudos pass
Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up
patches to fold the addi of PseudoLLA instructions into the immediate
operand of load/store instructions.

Differential Revision: https://reviews.llvm.org/D123264
2022-07-31 23:19:00 +02:00
Craig Topper d21b315360 [RISCV] Remove vmerges from vector ceil, floor, trunc lowering.
Use masked operations to suppress spurious exception bits being
set in fflags. Unfortunately, doing this adds extra copies.
2022-07-30 10:58:41 -07:00
Luís Marques 383bc7210e [RISCV] Precommit test for D123265
Differential Revision: https://reviews.llvm.org/D128562
2022-07-30 00:13:42 +02:00
Craig Topper e637feee80 [RISCV] Add isel pattern for (setne/eq GPR, -2048)
For constants in the range [-2047, 2048] we use addi. If the constant
is -2048 we can use xori. If we don't match this explicitly, we'll
emit an LI for the -2048 followed by an XOR.
2022-07-29 14:07:38 -07:00
Craig Topper 2750873dfe [RISCV] Update lowerFROUND to use masked instructions.
This avoids a vmerge at the end and avoids spurious fflags updates.
This isn't used for constrained intrinsics so we technically don't have
to worry about fflags, but it doesn't cost much to support it.

To support this, I've extended our FCOPYSIGN_VL node to take a passthru
operand, similar to what was done for VRGATHER*_VL nodes.

I plan to do a similar update for trunc, floor, and ceil.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D130659
2022-07-28 10:05:19 -07:00
Simon Pilgrim 69d5a038b9 [DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits.

There are a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP.

Differential Revision: https://reviews.llvm.org/D77804
2022-07-28 14:10:44 +01:00
Craig Topper a304d70ee9 [RISCV] Reorder (and/or/xor (shl X, C1), C2) if we can form ANDI/ORI/XORI.
InstCombine and DAGCombine prefer to keep shl before binops.

This patch teaches isel to convert to (shl (and/or/xor X, C1 >> C2), C2)
if (C1 >> C2) is a simm12. The idea was taken from X86's isel code.

There's a special case implemented for a sext_inreg between the
shift and the binop.
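
A C sketch with hypothetical constants (shift amount 4, mask 0xff0): the mask 0xff0 is not a simm12, but 0xff0 >> 4 = 0xff is, so after the reorder it folds into an andi.

  long mask_after_shift(long x) {
      return (x << 4) & 0xff0;   /* same value as ((x & 0xff) << 4) */
  }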

Differential Revision: https://reviews.llvm.org/D130610
2022-07-27 17:35:26 -07:00
Craig Topper 8d87f71e54 [RISCV] Pre-commit tests for D130610. NFC 2022-07-27 17:35:17 -07:00
Craig Topper 32622d6de4 [RISCV] Add isel pattern for (mul (and X, 0xffffffff), 3<<C) with Zba.
We can use slli.uw by C followed by sh1add. Similar can be done
for multiples of 5 and 9. We need to make sure that C is less than
32 to stay in bounds of the 5-bit immediate for slli.uw.

We have existing patterns for (mul X, 3<<C) that use sh1add
followed by slli. That order doesn't allow the and to be folded.
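
A hedged C sketch (hypothetical function; here C = 3, so the multiplier is 3 << 3 = 24): with Zba the zero extension and the shift by C can be done with slli.uw, followed by sh1add for the multiply by 3.

  unsigned long times24(unsigned x) {
      return (unsigned long)x * 24;
  }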

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130146
2022-07-27 09:41:59 -07:00
LiaoChunyu bf4f9a468a [RISCV] Enable isIntDivCheap when attribute is minsize
Don't expand divisions by constants when attribute is minsize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130543
2022-07-27 18:22:51 +08:00
Simon Pilgrim 529bd4f352 [DAG] SimplifyDemandedBits - don't early-out for multiple use values
SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which is missing a number of optimizations as there are plenty of cases where we can still simplify when initially demanding all elements/bits.

@lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current codegen increase is not a major concern.

Differential Revision: https://reviews.llvm.org/D129765
2022-07-27 10:54:06 +01:00
Craig Topper 3928e89c31 [RISCV] Pre-commit tests for D130146. NFC 2022-07-26 14:22:08 -07:00
Philip Reames 75b15a7e63 [RISCV] Add codegen coverage for ceil/floor/trunc/round/roundeven within FPR
Currently, all of these go to libcalls.  A change to improve lowering is upcoming.
2022-07-26 08:48:46 -07:00
wangpc 1a7078d106 [DAGCombine] Mask doesn't have to be (EltSize - 1) exactly when combining rotation
I think what we need is that the least significant Log2(EltSize) bits are known to be ones.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130251
2022-07-26 21:14:45 +08:00
wangpc 10a7c5b798 [RISCV] Precommit test for D130251
The added tests don't modify the least significant Log2(EltSize) bits.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130252
2022-07-26 21:09:11 +08:00
jacquesguan cb370cf413 [DAGCombiner] Teach scalarizeExtractedBinop to support scalable splat.
This patch supports the scalable splat part for scalarizeExtractedBinop.

Differential Revision: https://reviews.llvm.org/D129725
2022-07-26 09:31:45 +08:00
Craig Topper f04ae43752 [RISCV] Add more test cases for select with (setge X, C) condition.
InstCombine and SelectionDAG will tend to canonicalize these conditions
to (setgt X, C-1). C-1 might be more costly to materialize than C would
have been.
2022-07-25 11:55:21 -07:00
Craig Topper 1db6d6dcd8 [RISCV] Teach RISCVCodeGenPrepare to optimize (zext (abs(i32 X, i1 1))).
abs(i32 X, i1 1) always produces a positive result. The 'i1 1'
means INT_MIN input produces poison. If the result is sign extended,
InstCombine will convert it to zext. This does not produce ideal
code for RISCV.

This patch reverses the zext back to sext which can be folded
into a subw or negw. Ideally we'd do this in SelectionDAG, but
we lose the INT_MIN poison flag when llvm.abs becomes ISD::ABS.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130412
2022-07-25 09:36:41 -07:00
jacquesguan d8800ead62 [RISCV] Scalarize binop followed by extractelement.
This patch adds shouldScalarizeBinop to RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation.

Differential Revision: https://reviews.llvm.org/D129545
2022-07-25 17:23:31 +08:00
Craig Topper ab2348a6fa [RISCV] Add sext.b/h and zext.b/h/w to RISCVInstrInfo::foldMemoryOperandImpl.
We can always fold zext.b since it is just andi. The others require
Zba/Zbb.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130302
2022-07-21 14:54:58 -07:00
jacquesguan e60eb7053d recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
With fixes for the AArch64 and Hexagon test cases.
2022-07-21 17:34:34 +08:00
David Green 23d6186be0 [SelectionDAG] Fix fptoi.sat scalable vector lowering
Vector fptosi_sat and fptoui_sat were being expanded by unrolling the
vector operation. This doesn't work for scalable vectors, so this patch
adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable.

Scalable tests are added for AArch64 and RISCV. Some of the AArch64
fptoi_sat operations should be legal, but that will be handled in
another patch.

Differential Revision: https://reviews.llvm.org/D130028
2022-07-21 08:00:22 +01:00
Craig Topper add17fc8e4 [RISCV] Combine (select_cc (srl (and X, 1<<C), C), 0, eq/ne, true, false)
(srl (and X, 1<<C), C) is the form we receive for testing bit C.
An earlier combine removed the setcc so it wasn't there to match
when we created the SELECT_CC. This doesn't happen for BR_CC because
generic DAG combine rebuilds the setcc if it is used by BRCOND.

We can shift X left by XLen-1-C to put the bit to be tested in the
MSB, and use a signed compare with 0 to test the MSB.
2022-07-20 22:32:11 -07:00
Craig Topper 8983db15a3 [RISCV] Optimize (brcond (seteq (and X, 1 << C), 0))
If C > 10, this will require a constant to be materialized for the
And. To avoid this, we can shift X left by XLen-1-C bits to put the
tested bit in the MSB, then we can do a signed compare with 0 to
determine if the MSB is 0 or 1. Thanks to @reames for the suggestion.
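
A C sketch with a hypothetical bit index (41) and an illustrative callee: shifting the tested bit into the MSB turns the branch into a signed compare with zero, avoiding a materialized 1 << 41 mask.

  extern void do_something(void);
  void branch_on_bit(unsigned long x) {
      if (x & (1UL << 41))   /* same test as: (long)(x << (63 - 41)) < 0 */
          do_something();
  }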

I've implemented this inside of translateSetCCForBranch which is
called when setcc+brcond or setcc+select is converted to br_cc or
select_cc during lowering. It doesn't make sense to do this for
general setcc since we lack a sgez instruction.

I've tested bits 10, 11, 31, 32, and 63, and a couple of bits between 11 and 31
and between 32 and 63 for both i32 and i64 where applicable. Select
has some deficiencies where we receive (and (srl X, C), 1) instead.
This doesn't happen for br_cc due to the call to rebuildSetCC in the
generic DAGCombiner for brcond. I'll explore improving select in a
future patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130203
2022-07-20 18:40:49 -07:00
Craig Topper 31b8939ded [RISCV] Recognize bexti from (srl (and X, 1<<C), C).
This is the form we get for (zext (setne (and X, 1<<C))). We only
had bexti patterns for the alternative form (and (srl X, C), 1).
2022-07-20 15:03:52 -07:00
Craig Topper 6746b2349c [RISCV] Add test cases for failure to use bexti for (setne (and X, 1<<C))
This will get converted to (srl (and X, 1<<C), C) which we need
to isel to bexti.
2022-07-20 15:03:52 -07:00
Alex Bradbury c30c461dde [RISCV][test] Add tests for atomic compare exchange + branch on result
Due to the late expansion of the compare exchange sequences, there's
scope for improving codegen by folding the branches into the cmpxchg
loop (avoiding a branch-to-branch).
2022-07-20 17:49:33 +01:00
Alex Bradbury b1578bf377 [RISCV][test] Add tests showing signext behaviour of cmpxchg 2022-07-20 17:10:16 +01:00
Simon Pilgrim 0f6b0461b0 [DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits
The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits.

This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115

Alive2: https://alive2.llvm.org/ce/z/fl7T7K

Differential Revision: https://reviews.llvm.org/D129933
2022-07-19 10:59:07 +01:00
Max Kazantsev 69b284aaf6 Revert "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
This reverts commit 58dfaaaace.

Massive AArch64 test failures on the buildbots.
2022-07-19 13:41:52 +07:00
jacquesguan 58dfaaaace [DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat.
This revision supports scalarizing a binary operation of two scalable splat vectors.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122791
2022-07-19 11:20:51 +08:00
jacquesguan 3fcaea176c [RISCV][test] Precommit test for D122791.
Differential Revision: https://reviews.llvm.org/D123362
2022-07-19 10:56:02 +08:00
ksyx 3198364e6e [RISCV][Clang] Add support for Zmmul extension
This patch implements the recently ratified extension Zmmul, a subextension
of M (Integer Multiplication and Division) consisting of only the
multiplication part of it.

Differential Revision: https://reviews.llvm.org/D103313
Reviewed By: craig.topper, jrtc27, asb
2022-07-18 20:26:08 -04:00
Craig Topper 0b02752899 [RISCV] Optimize (seteq (i64 (and X, 0xffffffff)), C1)
(and X, 0xffffffff) requires 2 shifts in the base ISA. Since we
know the result is being used by a compare, we can use a sext_inreg
instead of an AND if we also modify C1 to have 33 sign bits instead
of 32 leading zeros. This can also improve the generated code for
materializing C1.
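
A C sketch with a hypothetical constant 0x1234: comparing the low 32 bits zero-extended is equivalent to comparing them sign-extended, so the two-shift AND can become a single sext.w with an adjusted constant.

  int low32_eq(long x) {
      return (x & 0xffffffff) == 0x1234;   /* same as: (long)(int)x == 0x1234 */
  }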

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D129980
2022-07-18 10:54:45 -07:00
Craig Topper 464b3a9d8a [RISCV] Pre-commit tests for D129980. NFC
Differential Revision: https://reviews.llvm.org/D129981
2022-07-18 10:54:45 -07:00
Craig Topper 7c0b9b379b [RISCV] Add isel patterns for ineg+setge/le/uge/ule.
setge/le/uge/ule selected by themselves require an xori with 1.
If we're negating the setcc, we can fold the xori with the neg
to create an addi with -1.

This works because xori X, 1 is equivalent to 1 - X if X is either
0 or 1. So we're doing -(1 - X), which is X - 1, i.e. X + (-1).

This improves the code for selecting between 0 and -1 based on a
condition for some conditions.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129957
2022-07-18 09:55:01 -07:00
Craig Topper d7f2a63371 [RISCV] Fold stack reload into sext.w by using lw instead of ld.
We can use lw to load 4 bytes from the stack and sign extend them
instead of loading all 8 bytes.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129948
2022-07-18 09:09:17 -07:00
jacquesguan bd228a1772 [RISCV] Extend use of SHXADD instructions in RVV spill/reload code.
This patch extends D124824. It uses SHXADD+SLLI to emit 3, 5, or 9 multiplied by a power of 2.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D129179
2022-07-18 10:53:19 +08:00
jacquesguan 557a471ab3 [RISCV][test] Precommit test for D129179.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D129463
2022-07-18 10:42:08 +08:00
Craig Topper decf385c27 [RISCV] Teach targetShrinkDemandedConstant to handle OR and XOR.
We were only handling AND before, but SimplifyDemandedBits can
also call it for OR and XOR.
2022-07-17 12:36:33 -07:00
Craig Topper 8cc483099a [RISCV] Teach RISCVCodeGenPrepare to optimize (i64 (and (zext/sext (i32 X), C1)))
If X is known positive by a dominating condition, we can fill in
ones into the upper bits of C1 if that would allow it to become a
simm12, allowing the use of ANDI.
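
A C sketch of the idea (hypothetical mask 0xffffff00; the early return stands in for the dominating condition): once x is known non-negative, zext and sext of x agree, so the 64-bit constant can be widened to 0xffffffffffffff00 and selected as andi x, -256.

  long masked(int x) {
      if (x < 0)
          return 0;
      return (long)((unsigned long)(unsigned)x & 0xffffff00);
  }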

This pattern often occurs in unrolled loops where the induction
variable has been widened.

To get the best benefit from this, I had to move the pass above
ConstantHoisting which is in addIRPasses. Otherwise the AND constant
is often hoisted away from the AND.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129888
2022-07-17 11:00:56 -07:00
Craig Topper ee6267c443 [RISCV] Remove Gather/Scatter Opt from the O0 pipeline. 2022-07-17 10:58:33 -07:00
Philip Reames 6ab686eb86 [LSR] Allow already invariant operand for ICmpZero matching [try 2]
Changes since initial commit:

* Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in d767b392.
* isLoopInvariant returns true for instructions outside the loop, but not necessarily *above* the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in 2aed3cdb.

Original commit message follows...

The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already *are* loop invariant.

As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former?

Differential Revision: https://reviews.llvm.org/D129793
2022-07-15 13:29:43 -07:00
Philip Reames 6fe766beba Revert "[LSR] Allow already invariant operand for ICmpZero matching"
This reverts commit 9153515a7b.  A buildbot crash was reported in the commit thread; reverting while investigating.
2022-07-15 10:47:57 -07:00
Philip Reames 9153515a7b [LSR] Allow already invariant operand for ICmpZero matching
The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already *are* loop invariant.

As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former?

Differential Revision: https://reviews.llvm.org/D129793
2022-07-15 09:51:00 -07:00
Craig Topper 79016f6eef [RISCV] Refine the heuristics for our custom (mul (and X, C2), C1) isel.
Prefer to use SLLI instead of zext.w/zext.h in more cases. SLLI
might be better for compression.
2022-07-14 18:24:10 -07:00
Craig Topper dcfc1fd26f [SelectionDAG][RISCV][AMDGPU][ARM] Improve SimplifyDemandedBits for SHL with variable shift amount.
If we have a variable shift amount and the demanded mask has leading
zeros, we can propagate those leading zeros to not demand those bits
from operand 0. This can allow zero_extend/sign_extend to become
any_extend. This pattern can occur due to C integer promotion rules.

This transform is already done by InstCombineSimplifyDemanded.cpp where
sign_extend can be turned into zero_extend for example.

Reviewed By: spatel, foad

Differential Revision: https://reviews.llvm.org/D121833
2022-07-14 16:10:14 -07:00
Craig Topper 450f0bd17b [RISCV] Add additional tests for D121833. NFC 2022-07-14 16:10:14 -07:00
Craig Topper 1a8468ba61 [RISCV] Add a RISCV specific CodeGenPrepare pass.
Initial optimization is to convert (i64 (zext (i32 X))) to
(i64 (sext (i32 X))) if the dominating condition for the basic block
guaranteed the sign bit of X is zero.

This frequently occurs in loop preheaders where a signed induction
variable that can never be negative has been widened. There will be
a dominating check that the 32-bit trip count isn't negative or zero.
The check here is not restricted to that specific case though.
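
A C sketch of the common shape (hypothetical loop; the n > 0 test plays the role of the dominating check): the widened trip count uses a zero extension of n, which this pass can turn into the cheaper sign extension on RV64.

  long sum(const int *a, int n) {
      long s = 0;
      if (n > 0)
          for (unsigned long i = 0; i < (unsigned)n; ++i)   /* (unsigned)n is the i32->i64 zext */
              s += a[i];
      return s;
  }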

A i32->i64 sext is cheaper than zext on RV64 without the Zba
extension. Later optimizations can often remove the sext from the
preheader basic block because the dominating block also needs a sext to
evaluate the greater than 0 check.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D129732
2022-07-14 10:20:59 -07:00
Fraser Cormack d1a5669f5e [RISCV] Disable subregister liveness by default
We previously enabled subregister liveness by default when compiling
with RVV. This has been shown to cause miscompilations where RVV
register operand constraints are not met. A test was added for this in
D129639 which explains the issue in more detail.

Until this issue is fixed in some way, we should not be enabling
subregister liveness unless the user asks for it.

Reviewed By: craig.topper, rogfer01, kito-cheng

Differential Revision: https://reviews.llvm.org/D129646
2022-07-14 17:04:10 +01:00
Fraser Cormack 3b334978d5 [RISCV] Add a test showing a miscompilation with subreg liveness
This patch adds a test which shows that we may incorrectly register
allocate for RVV instructions which have no-overlap constraints on
source/dest registers of different LMUL groups.

The particular case shows that a vrgatherei16 instruction writes to a
LMUL=1 register group v11 and reads from an EMUL=2 register group
v10/v11. This breaks the overlap constraints of the vrgatherei16
instruction.

The test also shows that disabling subregister liveness fixes the test.

We use `early-clobber` on the `VR` dest and the `VRM2` source to enforce
the constraint but with subregister liveness this constraint is not met.

It's unclear to me at this point whether this is per-design of
early-clobber in conjunction with subregisters (meaning we should find
another way of expressing this constraint) or whether it's a bug in the
register allocator somewhere.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D129639
2022-07-14 10:34:38 +01:00
Craig Topper 257755530a [RISCV] Fold (sra (sext_inreg (shl X, C1), i32), C2) -> (sra (shl X, C1+32), C2+32).
The former pattern will select as slliw+sraiw while the latter
will select as slli+srai. This can enable the slli+srai to be
compressed.

Differential Revision: https://reviews.llvm.org/D129688
2022-07-13 14:34:17 -07:00
Craig Topper e32864b605 [RISCV] Add test case show missed opportunity to turn slliw+sraiw into slli+srai.
slliw and sraiw have no compressed encodings. slli and srai
do have compressed encodings.

Pre-commit for D129688
2022-07-13 12:55:12 -07:00
Alex Bradbury 2ce0a5c8c3 [RISCV][test][NFC] Regenerate RISC-V tests with update_llc_test_checks.py -u
If a change alters more than a couple of tests it's really handy to be
able to regenerate any that were created by update_llc_test_checks.py
with something like `update_llc_test_checks.py -u
llvm/test/CodeGen/RISCV`. I noticed this causes some extraneous changes
(perhaps due to hand editing). This commit addresses that by updating
any files that are modified by update_llc_test_checks.py -u.
2022-07-13 19:37:34 +01:00
Philip Reames dde2a7fb6d [RISCV] Exploit fact that vscale is always power of two to replace urem sequence
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.

vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)

We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, there must be a power-of-two number of blocks. (For everything other than VLEN<=32, but that's already broken.)
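
A C-level sketch of the consequence (vf is a hypothetical stand-in for the runtime vectorization factor derived from vscale): because vf is a power of two, the urem in the trip-count computation can become a mask.

  unsigned long scalar_tail(unsigned long n, unsigned long vf) {
      return n % vf;   /* with vf a power of two: n & (vf - 1) */
  }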

It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.

Differential Revision: https://reviews.llvm.org/D129609
2022-07-13 10:54:47 -07:00
jacquesguan 9049c46b9d [RISCV][test] Add test of binop followed by extractelement.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D129544
2022-07-13 14:33:56 +08:00
Philip Reames 1ce3f94570 [RISCV] Test coverage for improved lowering assuming vscale is pow-of-two 2022-07-12 15:40:30 -07:00
Craig Topper 8eaf00e04d [TargetLowering][RISCV] Make expandCTLZ work for non-power of 2 types.
To convert CTLZ to popcount we do

x = x | (x >> 1);
x = x | (x >> 2);
...
x = x | (x >>16);
x = x | (x >>32); // for 64-bit input
return popcount(~x);

This smears the most significant set bit across all of the bits
below it then inverts the remaining 0s and does a population count.

To support non-power of 2 types, the last shift amount must be
more than half of the size of the type. For i15, the last shift
was previously a shift by 4, with this patch we add another shift
of 8.

Fixes PR56457.

Differential Revision: https://reviews.llvm.org/D129431
2022-07-12 11:36:37 -07:00
Craig Topper 866be0aa8a [RISCV] Pre-commit test for PR56457. NFC 2022-07-12 11:36:37 -07:00
Craig Topper ca13555e0c [RISCV] Pre-commit tests for D121833. NFC 2022-07-11 12:14:29 -07:00
LiaoChunyu 3f68f0f816 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcode:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-07-11 14:10:27 +08:00
Craig Topper 35ec8a423d [RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools.
I think it only makes sense to return true here if we aren't going
to turn around and create a constant pool for the immediate.

I left out the check for useConstantPoolForLargeInts() thinking
that even if you don't want the compiler to create a constant pool
you might still want to avoid materializing an integer that is
already available in a global variable.

Test file was copied from AArch64/ARM and has not been committed yet.
Will post a separate review for that.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D129402
2022-07-10 14:10:17 -07:00
Craig Topper 60450f91c8 [RISCV] Add test cases for inline memcpy expansion
Test file was taken directly from AArch64/ARM. I've added RUN
lines for aligned and unaligned since many of the test cases
are strings that aren't aligned and have an odd size.

Some of these test cases are modified by D129402.

Differential Revision: https://reviews.llvm.org/D129403
2022-07-10 14:09:02 -07:00
Craig Topper 5f7641a3be [RISCV] Modify the custom isel for (add X, imm) used by load/stores.
We have custom isel that tries to select the Lo12 bits using a
separate ADDI that can later be folded into the load/store address
by the post-isel peephole.

This patch disables this if the load/store already had a non-zero
offset. A non-zero offset implies that CodeGenPrepare split several
large offsets used by different loads and stores into a common large
offset and multiple small offsets that could be folded. Folding more
of the lo12 bits changes this common offset by increasing the small
offsets. While this can save an instruction to materialize the common
offset, it can also prevent the small offsets from fitting in a
compressed load/store instruction.

Removing this also simplifies the last piece needed to fold the custom
isel for add into SelectAddrRegImm and remove the post-isel peephole.
2022-07-09 22:47:27 -07:00
Craig Topper 92f1794d41 [RISCV] Mark fminnum_vl and fmaxnum_vl as commutable. 2022-07-08 10:19:09 -07:00
Craig Topper 069ba96660 [RISCV] Add commuted fixed vector vfmax.vf and vfmin.vf tests. NFC
The ISD opcodes aren't marked commutable so we don't match these
properly.
2022-07-08 10:19:09 -07:00
Philip Reames 264018d764 [RISCV] Mark vsadd(u)_vl as commutable
This allows fixed length vectors involving splats on the LHS to commute into the _vx form of the instruction. Oddly, the generic canonicalization rules appear to catch the scalable vector cases. I haven't fully dug in to understand why, but I suspect it's because of a difference in how we represent splats (splat_vector vs build_vector).

Differential Revision: https://reviews.llvm.org/D129302
2022-07-08 10:18:21 -07:00
Craig Topper a246eb6814 [RISCV] Mark (s/u)min_vl and (s/u)max_vl as commutable. 2022-07-08 09:59:42 -07:00
Craig Topper cd783bf997 [RISCV] Add fixed vector vmin(u).vx and vmax(u).vx tests. NFC 2022-07-08 09:59:41 -07:00
Sanjay Patel 8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
Kito Cheng 5c45ae4108 [RISCV] Fix wrong register rename for store value during make-compressible optimization
The current implementation will rename both registers in a store instruction if
we store the base address into memory with the same base register. That's OK if
the offset is 0, but it is the wrong transform if the offset isn't 0. Here is
a small example:

sd      a0, 808(a0)

We should not transform into:

addi    a2, a0, 768
sd      a2, 40(a2)

We should instead just rename the base address, like this:

addi    a2, a0, 768
sd      a0, 40(a2)

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128876
2022-07-08 18:07:17 +08:00
Kito Cheng 7b9a3b9d6d [RISCV] Precommit testcase to show wrong result of make-compressible optimization
Use the following example to demonstrate what happens currently:

  li      a1, 1
  sd      a1, 800(a0)
  sd      a0, 808(a0) # Store base address into base + offset
  li      a1, 2
  sd      a1, 816(a0)

Currently this will be optimized into:

  li      a1, 1
  addi    a2, a0, 768
  sd      a1, 32(a2)
  sd      a2, 40(a2) # Wrong replacement for the source register.
  li      a1, 2
  sd      a1, 48(a2)

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128875
2022-07-08 17:01:22 +08:00
Lian Wang 9cfb28d672 [RISCV] Change VECTOR_SPLICE mask operation from expand to promote
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128717
2022-07-08 06:20:22 +00:00
Lian Wang 99da3115d1 [RISCV] Recommit test for D128717
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128778
2022-07-08 02:21:33 +00:00
Lian Wang ab9e8a3a6f Revert "[RISCV] Precommit test for D128717"
This reverts commit b3b37f3ecf.
2022-07-08 01:56:29 +00:00
Lian Wang b3b37f3ecf [RISCV] Precommit test for D128717
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128778
2022-07-08 01:47:01 +00:00
Diego Caballero bf1758c3dc Revert "[RISCV] Optimize 2x SELECT for floating-point types"
This reverts commit 1178992c72.
2022-07-07 22:54:00 +00:00
Philip Reames 439783da01 [RISCV] Adjust fixed vector coverage for get.active.lane.mask
Make sure we include at least one case where the vsadd/vmsltu lowering
requires only LMUL1.  We should be able to generate all of the fixed
vector variants from scalar to vector idioms, but this is probably not
very important right now given the fixed length variants we'd actually
use when vectorizing with LMUL=1 are reasonable.
2022-07-07 13:28:29 -07:00
Philip Reames fa3783c907 [RISCV] Test coverage for missing commute of vsadd(u)
For some reason, this appears to only happen with fixed length vectors.  Scalable ones commute just fine in all the cases I've seen.
2022-07-07 09:11:44 -07:00
Philip Reames 6f4773f064 [RISCV] Add codegen coverage for get.active.lane.mask 2022-07-06 14:50:41 -07:00
Craig Topper 088bb8a328 [RISCV] Add more SHXADD patterns where the input is (and (shl/shr X, C2), C1)
It might be possible to rewrite (and (shl/shr X, C2), C1) in a way
that gives us a left shift that can be folded into a SHXADD.
2022-07-05 16:21:47 -07:00
Craig Topper ac3e26bcff [RISCV] Add more SHXADD tests. NFC 2022-07-05 13:41:58 -07:00
luxufan c06d0b4d02 [RISCV] Add ADDI instr for computing FrameIndex address
RVV doesn't have an immediate field for memory addressing. Currently
we build MachineInstructions in PEI to compute stack offsets for
RVV load/store instructions. These instructions are added too late to
be optimized by CSE, LICM, and other passes.

This patch makes FrameIndex SDNodes can't be matched in RVV Load Store
instruction selection patterns. So that the FrameIndex SDNodes would be
selected as `ADDI GPR, targetframeindex`.

There are 2 advantages for such change:
1. Stack objects address computing can be optimized by machine function
passes.
2. Since the ADDI instruction's destination register can be used as a
temp register, we can save an emergency spill slot.

Differential Revision: https://reviews.llvm.org/D128187
2022-07-04 22:13:35 +08:00
Craig Topper d36e09cfe5 [RISCV] Add more SHXADD patterns.
This handles the code we get for this.

int foo(unsigned x, int *y) {
    return y[x >> 3];
}

The srl and shl implied by the array index will be combined to
form (srl (and X, C2), C1). We need to reverse this to get back
the shl to fold into SHXADD.
2022-07-03 21:57:05 -07:00
luxufan 0f45eaf0da [RISCV] Add a scavenge spill slot when use ADDI to compute scalable stack offset
Computing a scalable offset needs up to two scratch registers. We add
scavenge spill slots according to the result of `RISCV::isRVVSpill`
and `RVVStackSize`. Since ADDI is not included in `RISCV::isRVVSpill`,
PEI doesn't add scavenge spill slots for scratch registers when using
ADDI to get scalable stack offsets.

The ADDI instruction has a destination register which can be used as
a scratch register. So one scavenge spill slot is sufficient for
computing scalable stack offsets.

Differential Revision: https://reviews.llvm.org/D128188
2022-07-03 20:18:13 +08:00
Craig Topper 7e4ab9d5b8 [RISCV] Add more SHXADD isel patterns.
This handles the code we get for

int foo(int* x, unsigned y) {
  return x[y >> 1];
}

The shift right and the shl will get DAG combined into
(shl (and X, 0xfffffffe), 1). We have custom isel to match the
shl+and, but with Zba the (add (shl X, 1), Y) part will get
matched and leave the and to be iseled by itself. This commit
adds a larger pattern that includes the and.
2022-07-02 23:11:22 -07:00
Craig Topper b2e9684fe4 [RISCV] isel (shl (and X, C2), C) -> (slli (srliw X, C3), C3+C).
where C2 has 32 leading zeros and C3 trailing zeros.

When the shl is used by an add and C is 1, 2, or 3, we end up matching
(add (shl X, C), Y) first. This leaves an and with a constant that
is harder to materialize.
2022-07-02 01:04:44 -07:00
Craig Topper 9ac548e118 [RISCV] isel (add (and X, 0xFFFFFFFE), Y) as (SH1ADD (SRLIW X, 1), Y).
Similar for SH2ADD and SH3ADD.

This is what we get from

int foo(int* x, unsigned y) {
  return x[y >> 1];
}

This allows us to avoid materializing 0xFFFFFFFE into a register.
2022-07-01 23:52:29 -07:00
Yeting Kuo 5744b9cb79 [RISCV] Restore "Enable shrink wrap by default"
This reverts commit 7af3d4ab3d.

RISC-V reverted the shrink wrap patch for bug 53662. Since the bug is fixed
by D123679, this commit re-enables it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128965
2022-07-02 11:13:13 +08:00
Craig Topper 354e04554a [RISCV] Make custom isel for (add X, imm) used by load/stores more selective.
Only handle immediates that would produce an ADDI or ADDIW of Lo12
as the final instruction in their materialization.

As the test changes show, this removes immediates that materialize
with lui+addiw where that is not the same as lui+addi.
2022-06-30 14:20:11 -07:00
Craig Topper 51d672946e [RISCV] Fold (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
Similar for a subtract with a constant left hand side.

(sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine
for (sext (add (trunc X to i32), C1) to i64).

For RISCV, we should lower this as addiw which means turning it into
(sext_inreg (add X, C1)).
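
A C sketch with a hypothetical constant 5: this is the source-level pattern behind the canonical DAG form, and lowering it back to (sext_inreg (add X, 5)) lets it become a single addiw.

  long add_then_sext(long x) {
      return (long)((int)x + 5);   /* canonical form: ((x << 32) + (5L << 32)) >> 32, arithmetic shift */
  }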

There is an existing DAG combine to convert back to (sext (add (trunc X
to i32), 32) to i32), but it requires isTruncateFree to return true
and for i32 to be a legal type as it used sign_extend and truncate
nodes. So that doesn't work for RISCV.

If the outer sra happens be used by a shl by constant, it will be
folded and the shift amount of the sra will be changed before we
can do our own DAG combine. This requires us to match the more
general pattern and restore the shl.

I had wanted to do this as a separate (add (shl X, 32), C1<<32) ->
(shl (add X, C1), 32) combine, but that hit an infinite loop for some
values of C1.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128869
2022-06-30 09:01:24 -07:00
Craig Topper 9ace5af049 [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C).
The sext_inreg can often be folded into an earlier instruction by
using a W instruction. The sext_inreg also works better with our ABI.

This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128843
2022-06-30 09:01:24 -07:00
Craig Topper 781e3d7ad8 [RISCV] Pre-commit tests for D128869. NFC 2022-06-30 09:01:24 -07:00
Fraser Cormack d5213c83ff [RISCV] Add a test covering a (reverted) codegen issue
This test checks one of the problematic cases outlined in D128006, leading
to the patch's reversal. I thought it best to add a test just in case
this sort of optimization is attempted again in the future in some
fashion.
2022-06-30 09:27:52 +01:00
Craig Topper 75095e6281 [RISCV] Pre-commit tests for D128843. NFC 2022-06-29 12:12:42 -07:00
Philip Reames dd48d3ad0e Revert "[RISCV] Avoid changing etype for splat of 0 or -1"
This reverts commit 755c84c62c.  A bug was reported on the original review thread (https://reviews.llvm.org/D128006), and on inspection this patch is simply wrong.  It needs to be checking for VLInBytes, not MaxVL.  These happen to be the same when using AVL=VLMAX (which is quite common), but this does not hold when AVL != VLMAX.
2022-06-29 10:27:02 -07:00
Craig Topper 7cbfb4eb7a [RISCV] Select (srl (and X, C2) as (slli (srliw X, C3), C3-C).
If C2 has 32 leading zeros and C3 trailing zeros.
2022-06-29 09:15:09 -07:00
Philip Reames 860c62f53c [RISCV] Refine known bits for READ_VLENB
This implements known bits for READ_VLENB using any information known about minimum and maximum VLEN. There's an additional assumption that VLEN is a power of two.

The motivation here is mostly to remove the last use of getMinVLen, but while I was here, I decided to also fix the bug for VLEN < 128 and handle max from command line generically too.

Differential Revision: https://reviews.llvm.org/D128758
2022-06-28 15:42:14 -07:00
Craig Topper 02c8453e64 [RISCV] Teach RISCVMergeBaseOffset to handle read-modify-write of a global.
The pass was previously limited to LUI+ADDI being used by a single
instruction.

This patch allows the pass to optimize multiple memory operations
that use the same offset. Each of them will receive a separate %lo
relocation. My main motivation is to handle a read-modify-write
where we have a load and store to the same address, but I didn't
restrict it to that case.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128599
2022-06-28 11:46:24 -07:00
Philip Reames c755bf658f [RISCV] Add test coverage for high known bits for vscale 2022-06-28 10:04:05 -07:00
Alex Bradbury 7bcfcabbd1 [RISCV] Implement support for the Zicbop extension
Implements the ratified RISC-V Base Cache Management Operation ISA
Extension: Zicbop, as described in
https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf.

This is implemented in a separate patch to Zicbom and Zicboz due to it
requiring a new ASM operand type to be defined.

Differential Revision: https://reviews.llvm.org/D117433
2022-06-28 12:43:26 +01:00
Alex Bradbury 4f40ca53ce [RISCV] Implement support for the Zicbom and Zicboz extensions
Implements the ratified RISC-V Base Cache Management Operation ISA
Extensions: Zicbom and Zicboz, as described in
https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf.

Zicbop is implemented in a separate patch due to it requiring a new ASM
operand type to be defined.

As discussed in the relevant issue in the upstream spec
https://github.com/riscv/riscv-CMOs/issues/47, the cbo.* instructions
use the format (rs1) or 0(rs1) for their operand, similar to the AMOs.

Differential Revision: https://reviews.llvm.org/D117432
2022-06-28 12:43:25 +01:00
Lian Wang 96ab083622 [RISCV] Support VECTOR_REVERSE mask operation.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128627
2022-06-28 07:48:51 +00:00
LiaoChunyu 1178992c72 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcode:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-06-28 12:02:05 +08:00