Because there are two stacks in Ventus, we need to separate the TP
stack and the SP stack. This commit just adds very initial support for
TP stack size calculation.
This matches what we get for something like:
%0 = shl i32 %x, C
%1 = zext i32 %0 to i64
%2 = getelementptr i32, ptr %y, i64 %1
The shift before the zext and the shift implied by the GEP get
combined with an AND after them. We need to split it back into
2 shifts so we can fold one into shXadd.uw.
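A hedged end-to-end sketch for C = 1 on rv64 with Zba (the combined
constant and the register names are illustrative):

define ptr @gep_shl(i32 %x, ptr %y) {
  %0 = shl i32 %x, 1
  %1 = zext i32 %0 to i64
  ; combined DAG form: (and (shl X, 3), 0x3fffffffc)
  ; after splitting:   slli a0, a0, 1
  ;                    sh2add.uw a0, a0, a1
  %2 = getelementptr i32, ptr %y, i64 %1
  ret ptr %2
}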
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D137886
For vector strided instructions, as the RVV spec says:
> When rs2=x0, then an implementation is allowed, but not required, to
> perform fewer memory operations than the number of active elements, and
> may perform different numbers of memory operations across different
> dynamic executions of the same static instruction.
So the compiler shouldn't assume that fewer memory operations will be
performed when rs2=x0.
We add a target feature to specify whether the microarchitecture
supports optimized zero-stride vector loads, and we perform the vector
splat optimization only when this feature is supported.
This feature is enabled by default since most designs implement this
optimization.
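A hedged sketch of what the gated splat optimization looks like,
assuming the feature is spelled optimized-zero-stride-load and with
illustrative register assignments:

define <vscale x 2 x i32> @splat_load(ptr %p) {
  %s = load i32, ptr %p
  %head = insertelement <vscale x 2 x i32> poison, i32 %s, i32 0
  ; with the feature:    vlse32.v v8, (a0), zero
  ; without the feature: lw a1, 0(a0) ; vmv.v.x v8, a1
  %splat = shufflevector <vscale x 2 x i32> %head, <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
  ret <vscale x 2 x i32> %splat
}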
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D137699
We have already converted SPLAT_VECTOR to VMV_V_X_VL or VFMV_V_F_VL
in RISCVDAGToDAGISel::PreprocessISelDAG().
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D136814
VMV_V_X_VL nodes should always have a passthru, a splat, and a VL.
We were sometimes missing the VL.
This went unnoticed because these cases were all selected into the
following node to form a .vx or .vi instruction. The ComplexPattern
that does this doesn't check the VL operand. I've added an assert
to the ComplexPattern to catch cases where the operand is missing.
@qcolombet spotted some of these in D134703.
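A hedged illustration of how the malformed node stayed hidden: the
splat feeds an add, so the VMV_V_X_VL is folded into a .vx instruction
during selection (register names are illustrative):

define <vscale x 1 x i64> @splat_add(<vscale x 1 x i64> %v, i64 %x) {
  %head = insertelement <vscale x 1 x i64> poison, i64 %x, i32 0
  %splat = shufflevector <vscale x 1 x i64> %head, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
  ; expected codegen: vadd.vx v8, v8, a0
  %r = add <vscale x 1 x i64> %v, %splat
  ret <vscale x 1 x i64> %r
}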
If True has a Chain result, the other operands of the vmerge may
depend on it through that Chain. We need to ensure True isn't a
predecessor of those operands.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D134980
With Zbp removed, we no longer need the generalized forms.
The computeKnownBitsForTargetNode code for brev8/orc.b is still based
on the general form with the shift amount forced to 7.
Since there is no guaranteed correspondence between SDNode and MI
operands, we need getters similar to RISCVII::get*OpNum for SDNodes.
More uses of getVecPolicyOpIdx will be added in D130895.
Reviewed By: craig.topper, arcbbb
Differential Revision: https://reviews.llvm.org/D134179
The transformation is beneficial because vmerge.vvm always needs a
mask operand but vadd.vi may not.
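A hedged before/after sketch (register assignments illustrative): the
folded result is a single masked instruction, while the unfolded form
pays for the vmerge.vvm and its mandatory v0 operand.

declare <vscale x 2 x i32> @llvm.vp.merge.nxv2i32(<vscale x 2 x i1>, <vscale x 2 x i32>, <vscale x 2 x i32>, i32)

define <vscale x 2 x i32> @vpmerge_addi(<vscale x 2 x i32> %false, <vscale x 2 x i32> %x, <vscale x 2 x i1> %m, i32 zeroext %vl) {
  %head = insertelement <vscale x 2 x i32> poison, i32 1, i32 0
  %one = shufflevector <vscale x 2 x i32> %head, <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
  %a = add <vscale x 2 x i32> %x, %one
  ; folded:   vadd.vi v8, v9, 1, v0.t
  ; unfolded: vadd.vi v9, v9, 1 ; vmerge.vvm v8, v8, v9, v0
  %r = call <vscale x 2 x i32> @llvm.vp.merge.nxv2i32(<vscale x 2 x i1> %m, <vscale x 2 x i32> %a, <vscale x 2 x i32> %false, i32 %vl)
  ret <vscale x 2 x i32> %r
}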
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D133255
The original code could produce an incorrect result if it encountered
a masked instruction without a policy operand, since it would still
try to set that instruction's policy to TUMU. The patch adds an
assertion to catch such instructions.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133302
Don't require that the AND have one use, and don't depend on
targetShrinkDemandedConstant turning C2 into 0xffffffff. Instead,
check that the constant is 0xffffffff after replacing any bits
that will be shifted out with 1s. For example, with a shift amount
of 4, C2 = 0xfffffff0 qualifies because (0xfffffff0 | 0xf) ==
0xffffffff.
Another way to fix this might be to prevent SimplifyDemandedBits
from destroying the ANDI after type legalization using
targetShrinkDemandedConstant. That would prevent the CSE that created
this mess. targetShrinkDemandedConstant is currently only enabled
after LegalizeOps. A quick experiment shows we can't just change when
it runs; we would need to try a different heuristic for after type
legalization.
SimplifyDemandedBits can zero the upper bits, and
targetShrinkDemandedConstant isn't always able to recover them.
At least part of that may be because targetShrinkDemandedConstant
only runs in the last DAGCombine. It might be worth seeing what
happens if we move it to after type legalization.
The motivation of this patch is to lower the IR pattern
(vp.merge mask, (add x, y), false, vl) to
(PseudoVADD_VV_<LMUL>_MASK false, x, y, mask, vl).
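A minimal IR realization of that pattern (element type, names, and the
register assignments in the codegen comment are illustrative):

declare <vscale x 2 x i32> @llvm.vp.merge.nxv2i32(<vscale x 2 x i1>, <vscale x 2 x i32>, <vscale x 2 x i32>, i32)

define <vscale x 2 x i32> @vpmerge_add(<vscale x 2 x i32> %false, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, <vscale x 2 x i1> %m, i32 zeroext %vl) {
  %a = add <vscale x 2 x i32> %x, %y
  ; expected codegen: vadd.vv v8, v9, v10, v0.t
  %r = call <vscale x 2 x i32> @llvm.vp.merge.nxv2i32(<vscale x 2 x i1> %m, <vscale x 2 x i32> %a, <vscale x 2 x i32> %false, i32 %vl)
  ret <vscale x 2 x i32> %r
}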
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D131841
The patch uses a peephole approach to fold merge.vvm and unmasked
intrinsics into masked intrinsics. We use a peephole instead of
tablegen patterns to avoid large amounts of auto-generated code.
Note: The patch ignores segment loads since I don't know how to test them.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D130442
InstCombine and DAGCombine prefer to keep shl before binops, i.e.
(and/or/xor (shl X, C2), C1). This patch teaches isel to convert that
to (shl (and/or/xor X, C1 >> C2), C2) if (C1 >> C2) is a simm12. The
idea was taken from X86's isel code.
There's a special case implemented for a sext_inreg between the
shift and the binop.
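A worked example with illustrative constants: 4080 (0xff0) is not a
simm12, but 4080 >> 4 = 255 is, so isel can use andi+slli instead of
materializing 4080 in a register.

define i64 @shl_and(i64 %x) {
  %s = shl i64 %x, 4
  ; expected codegen: andi a0, a0, 255
  ;                   slli a0, a0, 4
  %a = and i64 %s, 4080
  ret i64 %a
}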
Differential Revision: https://reviews.llvm.org/D130610
This custom isel was used to split the lo12 bits of the imm so that
they could be folded into load/store addresses via a post-isel
peephole.
This patch instead splits the immediate during isel and folds in the
lo12 bits directly, removing the need for the post-isel peephole to do
anything.
After this we'll be able to remove the post-isel peephole.
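A hedged example of the kind of address this applies to (the constant
is illustrative): for 0x12345678 the hi20 part is materialized with
lui and the lo12 part (0x678 = 1656) is folded into the load's offset.

define i32 @load_fixed_addr() {
  ; expected codegen: lui a0, 74565
  ;                   lw a0, 1656(a0)
  %v = load i32, ptr inttoptr (i64 305419896 to ptr)
  ret i32 %v
}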
Reviewed By: asb, luismarques
Differential Revision: https://reviews.llvm.org/D129450
We have custom isel that tries to select the Lo12 bits using a
separate ADDI that can later be folded into the load/store address
by the post-isel peephole.
This patch disables this if the load/store already had a non-zero
offset. A non-zero offset implies that CodeGenPrepare split several
large offsets used by different loads and stores into a common large
offset and multiple small offsets that could be folded. Folding more
of the lo12 bits changes this common offset by increasing the small
offsets. While this can save an instruction to materialize the common
offset, it can also prevent the small offsets from fitting in a
compressed load/store instruction.
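A hedged illustration of the offset-splitting shape (offsets and
registers illustrative): both loads share one materialized base, and
the residual offsets 16 and 24 are small enough for compressed loads.

define i32 @two_loads(ptr %p) {
  %p1 = getelementptr i8, ptr %p, i64 65552
  %p2 = getelementptr i8, ptr %p, i64 65560
  ; expected: lui a1, 16 ; add a1, a0, a1
  ;           lw a2, 16(a1) ; lw a0, 24(a1)
  %v1 = load i32, ptr %p1
  %v2 = load i32, ptr %p2
  %s = add i32 %v1, %v2
  ret i32 %s
}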
Removing this also simplifies the last piece needed to fold the custom
isel for add into SelectAddrRegImm and remove the post-isel peephole.