Now that the WebAssembly SIMD specification is finalized and engines are
generally up-to-date, there is no need for a separate target feature for gating
SIMD instructions that engines have not implemented. With this change,
v128.const is now enabled by default with the simd128 target feature.
Differential Revision: https://reviews.llvm.org/D98457
[amdgpu] Update med3 combine to skip i64
Fixes an assumption that a type which is not i32 will be i16. This asserts
when trying to sign/zero extend an i64 to i32.
Test case was cut down from an openmp application. Variations on it are hit by
other combines before reaching the problematic one, e.g. replacing the
immediate values with other function arguments changes the codegen path and
misses this combine.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D98872
This patch adds support for masked scatter intrinsics on scalable vector
types. It is mostly an extension of the earlier masked gather support
introduced in D96263, since the addressing mode legalization is the
same.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96486
This patch supports the masked gather intrinsics in RVV.
The RVV indexed load/store instructions only support the "unsigned unscaled"
addressing mode; indices are implicitly zero-extended or truncated to XLEN and
are treated as byte offsets. This ISA supports the intrinsics directly, but not
the majority of various forms of the MGATHER SDNode that LLVM combines to. Any
signed or scaled indexing is extended to the XLEN value type and scaled
accordingly. This is done during DAG combining as widening the index types to
XLEN may produce illegal vectors that require splitting, e.g.
nxv16i8->nxv16i64.
Support for scalable-vector CONCAT_VECTORS was added to avoid spilling via the
stack when lowering split legalized index operands.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96263
Without this patch, bitcasts of fixed-length mask vectors would go
through the stack.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98779
D93594 depend on the dominate tree and loop information. It increased
the compile time when build with -O0. However this is just to amend the
dominate tree and loop information, so that it is unnecessary to
re-analyze them again. Given the dominate tree of loop information are
absent in this pass, we can avoid amending them.
Differential Revision: https://reviews.llvm.org/D98773
Don't rewrite an add instruction with 2 SET_CC operands into a csel
instruction. The total instruction sequence uses an extra instruction and
register. Preventing this allows us to match a `(add, csel)` pattern and
rewrite this into a `cinc`.
Differential Revision: https://reviews.llvm.org/D98704
This patch changes the operand order of masked vmslt[u]
from (mask, rs1, scalar, maskedoff, vl)
to (maskedoff, rs1, scalar, mask, vl).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98839
The offset in HVX loads/stores is only 4 bits long, so often an
extra register is needed to hold the address. Minimize the number
of such registers by "standardizing" the base addresses and reusing
preexisting base registers when replacing frame indices.
Avoid revisiting nodes with the same set of defined lanes by
using a unified visited set which integrates lanes into the key.
This retains the intent of the original code by still revisiting
a subgraph if a different set of lanes is defined and hence
marking might progress differently.
Note: default size of the visited set has been confirmed to
cover >99% of invocations in large array of test shaders.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D98772
The previous technique relied on early-exiting the legalizer predicate
initialization, leaving an empty rule table. That causes a fallback
for most instructions, but some have legacy rules defined like G_ZEXT
which can try continue, but then crash.
We should fall back earlier, in the translator, to avoid this issue.
Differential Revision: https://reviews.llvm.org/D98730
This uses the shuffle mask cost from D98206 to give a better cost of MVE
VREV instructions. This helps especially in VectorCombine where the cost
of shuffles is used to reorder bitcasts, which this helps keep the phase
ordering test for fp16 reductions producing optimal code. The isVREVMask
has been moved to a header file to allow it to be used across target
transform and isel lowering.
Differential Revision: https://reviews.llvm.org/D98210
At the moment `getMCInstrBeads` is forward-declared in a few places,
bring this together into a single header file.
This was done as part of the disassembler work, since the disassembler
would otherwise add one more forward declaration.
Differential Revision: https://reviews.llvm.org/D98533
This is required because empty strings are not allowed when generating
the assembly parser tables.
Differential Revision: https://reviews.llvm.org/D98532
This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask
can be provided a more accurate cost can be provided by the backend.
For example VREV costs could be returned by the ARM backend. This should
be an NFC until then, laying the groundwork for that to be added.
Differential Revision: https://reviews.llvm.org/D98206
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.
This reverts commit 01ac6d1587.
Given a sextload i16, we can usually generate "ldrsh [rn. rm]". If we
don't naturally have a rn, rm addressing mode, we can either generate
"ldrh [rn, #0]; sxth" or "mov rm, #0; ldrsh [rn. rm]".
We currently generate the first, always creating a sxth. They are both
the same number of instructions, but if we generate the second then the
mov #0 will likely be CSE'd or pulled out of a loop, etc.
This adjusts the ISel patterns to do that, creating a mov instead of a
sxth.
Differential Revision: https://reviews.llvm.org/D98693
This caused non-deterministic compiler output; see comment on the
code review.
> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232
This reverts commit df69c69427.
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.
Differential Revision: https://reviews.llvm.org/D98487
This patch adds an optimization path for BUILD_VECTOR nodes where the
majority of the elements are identical. These can be splatted, with the
remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can
be tweaked as required - it is currently conservative. Undef elements
are disregarded when judging the dominance of a particular element. This
allows them to be covered by the splat value.
In addition, vectors of 2 elements are always optimized to a splat (for
the upper element) and an insert at element zero.
This optimization is disabled when optimizing for size.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98700
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.
Differential Revision: https://reviews.llvm.org/D98717
- Previously, https://reviews.llvm.org/D97703 was [[ https://reviews.llvm.org/D98543 | reverted ]] as it broke when building the unit tests when shared libs on.
- This patch reverts the "revert" and makes two minor changes
- The first is it also links in the MCParser lib when building the unittest. This should resolve the issue when building with with shared libs on and off
- The second renames the name of the unit test from `SystemZAsmLexer` to `SystemZAsmLexerTests` since the convention for unittest binaries is to suffix the name of the unit test with "Tests"
Reviewed By: Kai
Differential Revision: https://reviews.llvm.org/D98666
This change adds an operand class for each addressing mode, which can then
be used as part of the assembler to match instructions.
Differential Revision: https://reviews.llvm.org/D98535
This commit folds sxtw'd or uxtw'd offsets into gather loads where
possible with a DAGCombine optimization.
As an example, the following code:
1 #include <arm_sve.h>
2
3 svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) {
4 return svld1sw_gather_s64offset_u64(
5 pred, base, svextw_s64_x(pred, offsets)
6 );
7 }
would previously lower to the following assembly:
sxtw z0.d, p0/m, z0.d
ld1sw { z0.d }, p0/z, [x0, z0.d]
ret
but now lowers to:
ld1sw { z0.d }, p0/z, [x0, z0.d, sxtw]
ret
Differential Revision: https://reviews.llvm.org/D97858
The InstrEmitter can sometimes insert a copy after an IMPLICIT_DEF
before connecting it to the vector instruction. This occurs when
constrainRegClass reduces to a class with less than 4 registers.
I believe LMUL8 on masked instructions triggers this since the
result can only use the v8, v16, or v24 register group as the mask
is using v0.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98567
This commit implements an IR-level optimization to eliminate idempotent
SVE mul/fmul intrinsic calls. Currently, the following patterns are
captured:
fmul pg (dup_x 1.0) V => V
mul pg (dup_x 1) V => V
fmul pg V (dup_x 1.0) => V
mul pg V (dup_x 1) => V
fmul pg V (dup v pg 1.0) => V
mul pg V (dup v pg 1) => V
The result of this commit is that code such as:
1 #include <arm_sve.h>
2
3 svfloat64_t foo(svfloat64_t a) {
4 svbool_t t = svptrue_b64();
5 svfloat64_t b = svdup_f64(1.0);
6 return svmul_m(t, a, b);
7 }
will lower to a nop.
This commit does not capture all possibilities; only the simple cases
described above. There is still room for further optimisation.
Differential Revision: https://reviews.llvm.org/D98033
The default promotion uses zero extends that become shifts. We
cam use sign extend instead which is better for RISCV.
I've used two different implementations based on whether we
have minu/maxu instructions.
Differential Revision: https://reviews.llvm.org/D98683
This pass runs in any situations but we skip it when it is not O0 and the
function doesn't have optnone attribute. With -O0, the def of shape to amx
intrinsics is near the amx intrinsics code. We are not able to find a
point which post-dominate all the shape and dominate all amx intrinsics.
To decouple the dependency of the shape, we transform amx intrinsics
to scalar operation, so that compiling doesn't fail. In long term, we
should improve fast register allocation to allocate amx register.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D93594
Avoid making a temporary copy of byval argument if all accesses are loads and
therefore the pointer to the parameter can not escape.
This avoids excessive global memory accesses when each kernel makes its own
copy.
Differential revision: https://reviews.llvm.org/D98469
RA can insert something like a sub1_sub2 COPY of a wide VGPR
tuple which results in the unaligned acces with v_pk_mov_b32
after the copy is expanded. This is regression after D97316.
Differential Revision: https://reviews.llvm.org/D98549
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.
Additional advantage that parser will accept these flags in any order unlike
now.
Differential Revision: https://reviews.llvm.org/D96469
Prefer (self-documenting) return values to output parameters (which are
liable to be used).
While here, rename Noop to Nop which is more widely used and improves
consistency with hasEmitNops/setEmitNops/emitNop/etc.
This allows me to introduce similar combines for branches as
we have recently added for SELECT_CC. Some of them are less
useful for standalone setccs and only help branch instructions.
By having a BR_CC node its easier to only affect branches.
I'm using CondCodeSDNode to make isel patterns easier to
write so we can refer to the codes by name. SELECT_CC uses a
constant instead.
I've translated the condition code just like SELECT_CC so
we need less patterns for the swapped conditions. This
includes special cases for X < 1 and X > -1 that get translated
to blez and bgez by using a 0 constant.
computeKnownBitsForTargetNode support for SELECT_CC is added
to allow MaskedValueIsZero to work for cases where the true
and false values of the SELECT_CC are setccs and the
result of the SELECT_CC is used by a BR_CC. This was needed
to avoid regressions in some of the overflow tests.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D98159