The default legalization of unsupported vector types is to promote the integers
in each lane, which leads to extra sign or zero extending and masking when
moving data into and out of vectors. Switch our preferred type legalization from
the default to vector widening, which keeps the data in the low lanes of the
vector rather than in the low bits of each lane. The unused high lanes can be
ignored.
Half-wide vectors are now loaded from memory into the low 64 bits of the v128
rather than spread out among the lanes. As a result, v128.load64_splat is a much
more common operation, so add new patterns to support it.
Differential Revision: https://reviews.llvm.org/D107502
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.
Differential Revision: https://reviews.llvm.org/D106724
Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.
Differential Revision: https://reviews.llvm.org/D106612
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal instruction selection patterns. The wasm_simd128.h
intrinsics header was already using portable code for the corresponding
intrinsics, so now it produces the correct instructions.
Differential Revision: https://reviews.llvm.org/D106400
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.
Differential Revision: https://reviews.llvm.org/D106019
Replace the experimental clang builtin and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.
Differential Revision: https://reviews.llvm.org/D105950
Replace the clang builtin function and LLVM intrinsic for
f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing
combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern.
Differential Revision: https://reviews.llvm.org/D105755
Replace the clang builtin function and LLVM intrinsic previously used to select
the f64x2.promote_low_f32x4 instruction with custom combines from standard
SelectionDAG nodes. Implement the new combines to share code with the similar
combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232.
Differential Revision: https://reviews.llvm.org/D105675
WebAssembly's shift instructions implicitly masks the shift count, so optimize
out redundant explicit masks of the shift count. For vector shifts, this
currently only works if the mask is applied before splatting the shift count,
but this should be addressed in a future commit. Resolves PR49655.
Differential Revision: https://reviews.llvm.org/D105600
Previously we used an i32 constant to store the saturation width, but i32 isn't
legal on RISCV64. This wasn't a big deal to fix, but it is extra work for the
type legalizer.
This patch uses a VTSDNode to store the type similar to SEXT_INREG. This makes
it opaque to the type legalizer.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101262
We previously used splats instead of v128.const to materialize vector constants
because V8 did not support v128.const. Now that V8 supports v128.const, we can
use v128.const instead. Although this increases code size, it should also
increase performance (or at least require fewer engine-side optimizations), so
it is an appropriate change to make.
Differential Revision: https://reviews.llvm.org/D100716
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.
Differential Revision: https://reviews.llvm.org/D100596
Removes the builtins and intrinsics used to opt in to using these instructions
and replaces them with normal ISel patterns now that they are no longer
prototypes.
Differential Revision: https://reviews.llvm.org/D100402
Add a custom DAG combine and ISD opcode for detecting patterns like
(uint_to_fp (extract_subvector ...))
before the extract_subvector is expanded to ensure that they will ultimately
lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions
are no longer prototypes and can now be produced via standard IR, this commit
also removes the target intrinsics and builtins that had been used to prototype
the instructions.
Differential Revision: https://reviews.llvm.org/D100425
Now that these instructions are no longer prototypes, we do not need to be
careful about keeping them opt-in and can use the standard LLVM infrastructure
for them. This commit removes the bespoke intrinsics we were using to represent
these operations in favor of the corresponding target-independent intrinsics.
The clang builtins are preserved because there is no standard way to easily
represent these operations in C/C++.
For consistency with the scalar codegen in the Wasm backend, the intrinsic used
to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though
@llvm.roundeven better captures the semantics of the underlying Wasm
instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is
left to a potential future patch.
Differential Revision: https://reviews.llvm.org/D100411
In the final SIMD spec, there is only a single v128.any_true instruction, rather
than one for each lane interpretation because the semantics do not depend on the
lane interpretation.
Differential Revision: https://reviews.llvm.org/D100241
Removes the prototype builtin and intrinsic for i64x2.eq and implements that
instruction as well as the other i64x2 comparison instructions in the final SIMD
spec. Unsigned comparisons were not included in the final spec, so they still
need to be scalarized via a custom lowering.
Differential Revision: https://reviews.llvm.org/D99623
When I updated the SIMD opcodes in f5764a8654, I accidentally missed updating
i8x16.popcnt. This patch fixes the omission.
Differential Revision: https://reviews.llvm.org/D99536
Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.
Depends on D98466.
Differential Revision: https://reviews.llvm.org/D98676
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.
Depends on D98457.
Differential Revision: https://reviews.llvm.org/D98466
Now that the WebAssembly SIMD specification is finalized and engines are
generally up-to-date, there is no need for a separate target feature for gating
SIMD instructions that engines have not implemented. With this change,
v128.const is now enabled by default with the simd128 target feature.
Differential Revision: https://reviews.llvm.org/D98457
Fixes TableGen parser errors reported by D95874 due to incompatible types being used on multiclass templates.
Differential Revision: https://reviews.llvm.org/D96205
As proposed in https://github.com/WebAssembly/simd/pull/380. This commit makes
the new instructions available only via clang builtins and LLVM intrinsics to
make their use opt-in while they are still being evaluated for inclusion in the
SIMD proposal.
Depends on D93771.
Differential Revision: https://reviews.llvm.org/D93775
This commit is a follow-on to c2c2e9119e73, using the `Vec` records introduced
in that commit in the rest of the SIMD instruction definitions. Also removes
unnecessary types in output patterns.
Differential Revision: https://reviews.llvm.org/D93771
Introduce `Vec` records, each bundling all information related to a single SIMD
lane interpretation. This lets TableGen definitions take a single Vec parameter
from which they can extract information rather than taking multiple redundant
parameters. This commit refactors all of the SIMD load and store instruction
definitions to use the new `Vec`s. Subsequent commits will similarly refactor
additional instruction definitions.
Differential Revision: https://reviews.llvm.org/D93660
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.
Differential Revision: https://reviews.llvm.org/D90253
Prototype the newly proposed load_lane instructions, as specified in
https://github.com/WebAssembly/simd/pull/350. Since these instructions are not
available to origin trial users on Chrome stable, make them opt-in by only
selecting them from intrinsics rather than normal ISel patterns. Since we only
need rough prototypes to measure performance right now, this commit does not
implement all the load and store patterns that would be necessary to make full
use of the offset immediate. However, the full suite of offset tests is included
to make it easy to track improvements in the future.
Since these are the first instructions to have a memarg immediate as well as an
additional immediate, the disassembler needed some additional hacks to be able
to parse them correctly. Making that code more principled is left as future
work.
Differential Revision: https://reviews.llvm.org/D89366
Specified in https://github.com/WebAssembly/simd/pull/237, these
instructions load the first vector lane from memory and zero the other
lanes. Since these instructions are not officially part of the SIMD
proposal, they are only available on an opt-in basis via LLVM
intrinsics and clang builtin functions. If these instructions are
merged to the proposal, this implementation will change so that the
instructions will be generated from normal IR. At that point the
intrinsics and builtin functions would be removed.
This PR also changes the opcodes for the experimental f32x4.qfm{a,s}
instructions because their opcodes conflicted with those of the
v128.load{32,64}_zero instructions. The new opcodes were chosen to
match those used in V8.
Differential Revision: https://reviews.llvm.org/D84820
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
Differential Revision: https://reviews.llvm.org/D84556
Rather than expanding truncating stores so that vectors are stored one
lane at a time, lower them to a sequence of instructions using
narrowing operations instead, when possible. Since the narrowing
operations have saturating semantics, but truncating stores require
truncation, mask the stored value to manually truncate it before
narrowing. Also, since narrowing is a binary operation, pass in the
original vector as the unused second argument.
Differential Revision: https://reviews.llvm.org/D84377
Although the SIMD spec proposal does not specifically include a
select instruction, the select instruction in MVP WebAssembly is
polymorphic over the selected types, so it is able to work on v128
values when they are enabled. This patch introduces a new variant of
the select instruction for each legal vector type. Additional ISel
patterns are adapted from the SELECT_I32 and SELECT_I64 patterns.
Depends on D83736.
Differential Revision: https://reviews.llvm.org/D83737
We were previously expanding vselect and matching on the expansion to
generate bitselects, but in some cases the expansion would be further
combined and a bitselect would not get generated. This patch improves
codegen in those cases by legalizing vselect and lowering it to
v128.bitselect. The old pattern that matches the expansion is still
useful for lowering IR that already uses the expansion rather than a
select operation.
Differential Revision: https://reviews.llvm.org/D83734
In BUILD_VECTOR lowering, we used to generally prefer using splats
over v128.const instructions because v128.const has a very large
encoding. However, in d5b7a4e2e8 we switched to preferring consts
because they are expected to be more efficient in engines. This patch
updates the ISel patterns to match this current preference.
Differential Revision: https://reviews.llvm.org/D83581
Since WebAssembly's vector shift instructions take a scalar shift
amount rather than a vector shift amount, we have to check in ISel
that the vector shift amount is a splat. Previously, we were checking
explicitly for splat BUILD_VECTOR nodes, but this change uses the
standard utilities for detecting splat values that can handle more
complex splat patterns. Since the C++ ISel lowering is now more
general than the ISel patterns, this change also simplifies shift
lowering by using the C++ lowering for all SIMD shifts rather than
mixing C++ and normal pattern-based lowering.
This change improves ISel for shifts to the point that the
simd-shift-unroll.ll regression test no longer tests the code path it
was originally meant to test. The bug corresponding to that regression
test is no longer reproducible with its original reported reproducer,
so rather than try to fix the regression test, this change just
removes it.
Differential Revision: https://reviews.llvm.org/D83278
Context: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md
This is just a first step, adding the new instruction variants while keeping the existing 32-bit functionality working.
Some of the basic load/store tests have new wasm64 versions that show that the basics of the target are working.
Further features need implementation, but these will be added in followups to keep things reviewable.
Differential Revision: https://reviews.llvm.org/D80769
Summary:
As specified in https://github.com/WebAssembly/simd/pull/232. These
instructions are implemented as LLVM intrinsics for now rather than
normal ISel patterns to make these instructions opt-in. Once the
instructions are merged to the spec proposal, the intrinsics will be
replaced with proper ISel patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81222