This is a re-revert with a corrected test.
This patch adds a test for the PowerPC fma compiler builtins, some variations
of which negate inputs and outputs. The code to generate IR for these
builtins was untested before this patch.
Originally, the code used the outdated method of subtracting floating point
values from -0.0 as floating point negation. This patch remedies that.
Patch by: Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D76949
Summary:
During CodeGen for AArch64 Neon intrinsics, Clang was incorrectly
assuming all the pointers from which loads were being generated for vld1
intrinsics were aligned according to the intrinsics result type, causing
alignment faults on the code generated by the backend.
This patch updates vld1 intrinsics' CodeGen to properly capture the
correct load alignment based on the type of the pointer provided as
input for the intrinsic.
Reviewers: t.p.northover, ostannard, pcc, efriedma
Reviewed By: ostannard, efriedma
Subscribers: echristo, plotfi, nickdesaulniers, efriedma, kristof.beyls, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79721
The os_log helper functions are linkonce_odr and supposed to be
uniqued across TUs, so attachine a DW_AT_decl_line on it is highly
misleading. By setting the function decl to implicit, CGDebugInfo
properly marks the functions as artificial and uses a default file /
line 0 location for the function.
rdar://problem/63450824
Differential Revision: https://reviews.llvm.org/D80463
Summary:
During CodeGen for AArch64 Neon intrinsics, Clang was incorrectly
assuming all the pointers from which loads were being generated for vld1
intrinsics were aligned according to the intrinsics result type, causing
alignment faults on the code generated by the backend.
This patch updates vld1 intrinsics' CodeGen to properly capture the
correct load alignment based on the type of the pointer provided as
input for the intrinsic.
Reviewers: t.p.northover, ostannard, pcc
Reviewed By: ostannard
Subscribers: kristof.beyls, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79721
If we're going to assume references are dereferenceable, we should also
assume they're aligned: otherwise, we can't actually dereference them.
See also D80072.
Differential Revision: https://reviews.llvm.org/D80166
Such a builtin function is mostly useful to preserve btf type id
for non-global data. For example,
extern void foo(..., void *data, int size);
int test(...) {
struct t { int a; int b; int c; } d;
d.a = ...; d.b = ...; d.c = ...;
foo(..., &d, sizeof(d));
}
The function "foo" in the above only see raw data and does not
know what type of the data is. In certain cases, e.g., logging,
the additional type information will help pretty print.
This patch implemented a BPF specific builtin
u32 btf_type_id = __builtin_btf_type_id(param, flag)
which will return a btf type id for the "param".
flag == 0 will indicate a BTF local relocation,
which means btf type_id only adjusted when bpf program BTF changes.
flag == 1 will indicate a BTF remote relocation,
which means btf type_id is adjusted against linux kernel or
future other entities.
Differential Revision: https://reviews.llvm.org/D74668
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79742
Summary:
Although using `__builtin_shufflevector` and the `shufflevector`
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D66983
These builtins are expanded in CGBuiltin to use intrinsics
for (signed/unsigned) shift left long top/bottom.
Reviewers: efriedma, SjoerdMeijer
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D79579
passed to __builtin_os_log_format to extend its lifetime to the end of
its enclosing block
Extend only lifetimes of pointers returned by function calls or message
sends instead. In the long term, we should lifetime-extend pointers in
more complex expressions and non-ARC objects (e.g., C++ temporaries)
too.
rdar://problem/61846261
The reinterpret builtins are generated separately because they
need the cross product of all types, 121 functions in total,
which is inconvenient to specify in the arm_sve.td file.
Reviewers: SjoerdMeijer, efriedma, ctetreau, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78756
* svdupq builtins that duplicate scalars to every quadword of a vector
are defined using builtins for svld1rq (load and replicate quadword).
* svdupq builtins that duplicate boolean values to fill a predicate vector
are defined using `svcmpne`.
Reviewers: SjoerdMeijer, efriedma, ctetreau
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78750
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79224
Summary:
Removes usage of VectorType::getNumElements identified by test located
at CodeGen/aarch64-sve-intrinsics/acle_sve_abs.c. Since the type is an
SVE predicate vector, it makes sense to specialize the code for scalable
vectors only.
Reviewers: rengolin, efriedma
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78958
The svlen builtins return the number of elements in a vector
and are implemented using `llvm.vscale`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78755
Some ACLE builtins leave out the argument to specify the predicate
pattern, which is expected to be expanded to an SV_ALL pattern.
This patch adds the flag IsInsertOp1SVALL to insert SV_ALL as the
second operand.
Reviewers: efriedma, SjoerdMeijer
Reviewed By: SjoerdMeijer
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78401
Expose llvm fence instruction as clang builtin for AMDGPU target
__builtin_amdgcn_fence(unsigned int memoryOrdering, const char *syncScope)
The first argument of this builtin is one of the memory-ordering specifiers
__ATOMIC_ACQUIRE, __ATOMIC_RELEASE, __ATOMIC_ACQ_REL, or __ATOMIC_SEQ_CST
following C++11 memory model semantics. This is mapped to corresponding
LLVM atomic memory ordering for the fence instruction using LLVM atomic C
ABI. The second argument is an AMDGPU-specific synchronization scope
defined as string.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D75917
Some ACLE builtins leave out the argument to specify the predicate
pattern, which is expected to be expanded to an SV_ALL pattern.
This patch adds the flag IsAppendSVALL to append SV_ALL as the final
operand.
Reviewers: SjoerdMeijer, efriedma, rovka, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77597
This intrinsic isn't overloaded so we should query with types.
Doing so causes the backend to miss the intrinsic and not codegen it.
This eventually leads to a linker error.
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch32
- Intrinsics Support for AArch32 Neon Intrinsics for Matrix
Multiplication
Note: these extensions are optional in the 8.6a architecture and so have
to be enabled by default
No additional IR types or C Types are needed for this extension.
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, miyuki
Reviewed By: miyuki
Subscribers: miyuki, ostannard, kristof.beyls, hiraditya, danielkiss,
cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77872
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch64 only (no SVE or Neon)
- Intrinsics Support for AArch64 Armv8.6a Matrix Multiplication Instructions (No bfloat16 matrix multiplication)
No IR types or C Types are needed for this extension.
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: ostannard, t.p.northover, rengolin, kmclaughlin
Reviewed By: kmclaughlin
Subscribers: kmclaughlin, kristof.beyls, hiraditya, danielkiss,
cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77871
The IsReverseCompare flag tells CGBuiltin to swap the operands,
so that a LT/LE intrinsics can be expressed in terms of GE/GT
intrinsics.
This patch also adds builtins for the wide-variants of the compares.
Reviewers: SjoerdMeijer, efriedma, ctetreau
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78747
This patch also adds the enum `sv_prfop` for the prefetch operation specifier
and checks to ensure the passed enum values are valid.
Reviewers: SjoerdMeijer, efriedma, ctetreau
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78674
This patch changes the codegen of the builtins for contiguous loads
to map onto the SVE specific IR intrinsics llvm.aarch64.sve.ld1/st1.
Reviewers: SjoerdMeijer, efriedma, kmclaughlin, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78673
This patch changes the FP conversion intrinsics to take a predicate
that matches the number of lanes for the vector with the widest element
type as opposed to using <vscale x 16 x i1>.
For example:
```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f16(<vscale x 4 x float>, <vscale x 4 x i1>, <vscale x 8 x half>)```
now uses <vscale x 4 x i1> instead of <vscale x 16 x i1>
And similar for:
```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f64(<vscale x 4 x float>, <vscale x 2 x i1>, <vscale x 2 x double>)```
where the predicate now matches the wider type, so <vscale x 2 x i1>.
Reviewers: efriedma, SjoerdMeijer, paulwalker-arm, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78402
This adds the flag IsOverloadCvt which tells CGBulitin to use
the result type and the type of the last operand as the
overloaded types for the LLVM IR intrinsic.
This also adds the flag IsFPConvert, which is needed to avoid
converting the predicate of the operation from svbool_t to
a predicate with fewer lanes, as the LLVM IR intrinsics use
the <vscale x 16 x i1> as the predicate.
Reviewers: SjoerdMeijer, efriedma
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78239
This also adds the IsOverloadWhileRW flag which tells CGBuiltin to use
the result predicate type and the first pointer type as the
overloaded types for the LLVM IR intrinsic.
Reviewers: SjoerdMeijer, efriedma
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78238
This also adds the IsOverloadWhile flag which tells CGBuiltin to use
both the default type (predicate) and the type of the second operand
(scalar) as the overloaded types for the LLMV IR intrinsic.
Reviewers: SjoerdMeijer, efriedma, rovka
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77595
Add the IsOverloadNone flag to tell CGBuiltin that it does not have
an overloaded type. This is used for e.g. svpfalse which does
not take any arguments and always returns a svbool_t.
This patch also adds builtins for svcntb_pat, svcnth_pat, svcntw_pat
and svcntd_pat, as those don't require custom codegen.
Reviewers: SjoerdMeijer, efriedma, rovka
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77596
The ACLE has builtins that take a scalar value that is to be expanded
into a vector by the operation. While the ISA may have an instruction
that takes an immediate or a scalar to represent this, the LLVM IR
intrinsic may not, so Clang will have to splat the scalar value.
This patch also adds the _n forms for svabd, svadd, svdiv, svdivr,
svmax, svmin, svmul, svmulh, svub and svsubr.
Reviewers: SjoerdMeijer, efriedma, rovka
Reviewed By: SjoerdMeijer
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77594
This implements zeroing of false lanes for binary operations,
where instead of merging into the first operand vector (_m)
a `select` is placed on the first input vector. This approach
easily translates to the use of the `zeroing movprfx` instruction.
This patch also adds builtins for svabd, svadd, svdiv, svdivr,
svmax, svmin, svmul, svmulh, svub and svsubr.
Reviewers: SjoerdMeijer, efriedma, rovka
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77593
Builtins that have the merge type MergeAnyExp or MergeZeroExp,
merge into a 'undef' or 'zero' vector respectively, which enables the
_x and _z behaviour for unary operations.
This patch also adds builtins for svabs and svneg.
Reviewers: SjoerdMeijer, efriedma, rovka
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77591
This patch adds a number of intrinsics that take immediates with
varying ranges based on the element size one of the operands.
svext: immediate ranging 0 to (2048/sizeinbits(elt) - 1)
svasrd: immediate ranging 1..sizeinbits(elt)
svqshlu: immediate ranging 1..sizeinbits(elt)/2
ftmad: immediate ranging 0..(sizeinbits(elt) - 1)
Reviewers: efriedma, SjoerdMeijer, rovka, rengolin
Reviewed By: SjoerdMeijer
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76679
Summary:
Remove asserting vector getters from Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: dexonsmith, sdesmalen, efriedma
Reviewed By: efriedma
Subscribers: cfe-commits, hiraditya, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D77278
Summary:
This patch adds a mechanism to easily add range checks for a builtin's
immediate operands. This patch is tested with the qdech intrinsic, which takes
both an enum for the predicate pattern, as well as an immediate for the
multiplier.
Reviewers: efriedma, SjoerdMeijer, rovka
Reviewed By: efriedma, SjoerdMeijer
Subscribers: mgorny, tschuett, mgrang, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76678
This adds builtins for all contiguous loads/stores, including
non-temporal, first-faulting and non-faulting.
Reviewers: efriedma, SjoerdMeijer
Reviewed By: SjoerdMeijer
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76238
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.
Differential revision: https://reviews.llvm.org/D78016
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sdesmalen, efriedma, krememek
Reviewed By: sdesmalen, efriedma
Subscribers: dexonsmith, Charusso, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77257
When constrained floating point is enabled the AArch64-specific builtins don't use constrained intrinsics in some cases. Fix that.
Neon is part of this patch, so ARM is affected as well.
Differential Revision: https://reviews.llvm.org/D77074
This patch adds a test for the PowerPC fma compiler builtins, some variations
of which negate inputs and outputs. The code to generate IR for these
builtins was untested before this patch.
Originally, the code used the outdated method of subtracting floating point
values from -0.0 as floating point negation. This patch remedies that.
Patch by: Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D76949
The main purpose of introducing these builtins is to add a range
metadata [1, 1025) on the work group size loaded from dispatch
ptr, which cannot be done by source code.
Differential Revision: https://reviews.llvm.org/D76772
Summary:
Since the conditional operator cannot be used with vector conditions
in C, we need a builtin to be able to express this operation in C
source.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76538
Summary:
The range checks performed for the vqrdmulh_lane and vqrdmulh_lane Neon
intrinsics were incorrectly using their return type as the base type for
the range check performed on their 'lane' argument.
This patch updates those intrisics to use the type of the proper reference
argument to perform the range checks.
Reviewers: jmolloy, t.p.northover, rsmith, olista01, dnsampaio
Reviewed By: dnsampaio
Subscribers: dnsampaio, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74766
Summary:
Range checks were not properly performed in the lane arguments of Neon
intrinsics implemented based on splat operations. Calls to those
intrinsics where translated to `__builtin__shufflevector` calls directly
by the pre-processor through the arm_neon.h macros, missing the chance
for the proper range checks.
This patch enables the range check by introducing an auxiliary splat
instruction in arm_neon.td, delaying the translation to shufflevector
calls to CGBuiltin.cpp in clang after the checks were performed.
Reviewers: jmolloy, t.p.northover, rsmith, olista01, ostannard
Reviewed By: ostannard
Subscribers: ostannard, dnsampaio, danielkiss, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74619
Summary:
Some of the `*_laneq` intrinsics defined in arm_neon.td were missing the
setting of the `isLaneQ` attribute. This patch sets the attribute on the
related definitions, as they will be required to properly perform range
checks on their lane arguments.
Reviewers: jmolloy, t.p.northover, rsmith, olista01, dnsampaio
Reviewed By: dnsampaio
Subscribers: dnsampaio, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74616
Reworked the patch to avoid sharing a header (SVETypeFlags.h) between
include/clang/Basic and utils/TableGen/SveEmitter.cpp. Now the patch
generates the enum/flags which is included in TargetBuiltins.h.
Also renamed one of the SveEmitter options to be in line with MVE.
Summary:
This is a first patch in a series for the SveEmitter to generate the arm_sve.h
header file and builtins.
I've tried my best to strip down this patch as best as I could, but there
are still a few changes that are not necessarily exercised by the load intrinsics
in this patch, mostly around the SVEType class which has some common logic to
represent types from a type and prototype string. I thought it didn't make
much sense to remove that from this patch and split it up.
This is a first patch in a series for the SveEmitter to generate the arm_sve.h
header file and builtins.
I've tried my best to strip down this patch as best as I could, but there
are still a few changes that are not necessarily exercised by the load intrinsics
in this patch, mostly around the SVEType class which has some common logic to
represent types from a type and prototype string. I thought it didn't make
much sense to remove that from this patch and split it up.
Reviewers: efriedma, rovka, SjoerdMeijer, rsandifo-arm, rengolin
Reviewed By: SjoerdMeijer
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75470
This patch adds 'q' to mean 'scalable vector' in the builtin
type string, and for SVE will return the matching builtin
type as defined in the C/C++ language extensions for SVE.
This patch also adds some scaffolding to generate the arm_sve.h
header file, and some builtin definitions (+CodeGen) to be able
to implement some simple masked load intrinsics that use the
ACLE types, such as:
svint8_t test_svld1_s8(svbool_t pg, const int8_t *base) {
return svld1_s8(pg, base);
}
Reviewers: efriedma, rjmccall, rovka, rsandifo-arm, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75298
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Summary:
This patch generalizes the existing code to support CDE intrinsics
which will share some properties with existing MVE intrinsics
(some of the intrinsics will be polymorphic and accept/return values
of MVE vector types).
Specifically the patch:
* Adds new tablegen backends -gen-arm-cde-builtin-def,
-gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema,
-gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on
existing MVE backends.
* Renames the '__clang_arm_mve_alias' attribute into
'__clang_arm_builtin_alias' (it will be used with CDE intrinsics as
well as MVE intrinsics)
* Implements semantic checks for the coprocessor argument of the CDE
intrinsics as well as the existing coprocessor intrinsics.
* Adds one CDE intrinsic __arm_cx1 to test the above changes
Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen
Reviewed By: simon_tatham
Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75850
__builtin_os_log_format
This is needed to keep all the objects, including temporaries returned
by function calls, written to the buffer alive until os_log_pack_send is
called.
rdar://problem/60105410
Summary:
Although SIMD integer min/max operations can be expressed using the ?:
operator in C++, that operator is disallowed for vectors in C. As a
workaround, this change introduces new WebAssembly-specific builtin
functions that lower to the desired vector icmp/select sequences.
Reviewers: aheejin, dschuff, kripken
Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75770
This removes everything but int_x86_avx512_mask_vcvtph2ps_512 which provides the SAE variant, but even this can use the fpext generic if the rounding control is the default.
Differential Revision: https://reviews.llvm.org/D75162
Previously we emitted an fmadd and a fmadd+fneg and combined them with a shufflevector. But this doesn't follow the correct exception behavior for unselected elements so the backend can't merge them into the fmaddsub/fmsubadd instructions.
This patch restores the the fmaddsub intrinsics so we don't have two arithmetic operations. We lose out on optimization opportunity in the non-strict FP case, but I don't think this is a big loss. If someone gives us a test case we can look into adding instcombine/dagcombine improvements. I'd rather not have the frontend do completely different things for strict and non-strict.
This still has problems because target specific intrinsics don't support strict semantics yet. We also still have all of the problems with masking. But we at least generate the right instruction in constrained mode now.
Differential Revision: https://reviews.llvm.org/D74268
This commit removes the artificial types <512 x i1> and <1024 x i1>
from HVX intrinsics, and makes v512i1 and v1024i1 no longer legal on
Hexagon.
It may cause existing bitcode files to become invalid.
* Converting between vector predicates and vector registers must be
done explicitly via vandvrt/vandqrt instructions (their intrinsics),
i.e. (for 64-byte mode):
%Q = call <64 x i1> @llvm.hexagon.V6.vandvrt(<16 x i32> %V, i32 -1)
%V = call <16 x i32> @llvm.hexagon.V6.vandqrt(<64 x i1> %Q, i32 -1)
The conversion intrinsics are:
declare <64 x i1> @llvm.hexagon.V6.vandvrt(<16 x i32>, i32)
declare <128 x i1> @llvm.hexagon.V6.vandvrt.128B(<32 x i32>, i32)
declare <16 x i32> @llvm.hexagon.V6.vandqrt(<64 x i1>, i32)
declare <32 x i32> @llvm.hexagon.V6.vandqrt.128B(<128 x i1>, i32)
They are all pure.
* Vector predicate values cannot be loaded/stored directly. This directly
reflects the architecture restriction. Loading and storing or vector
predicates must be done indirectly via vector registers and explicit
conversions via vandvrt/vandqrt instructions.
Summary:
These are in some sense the inverse of vmovl[bt]q: they take a vector
of n wide elements and truncate each to half its width. So they only
write half a vector's worth of output data, and therefore they also
take an 'inactive' parameter to provide the other half of the data in
the output vector. So vmovnb overwrites the even lanes of 'inactive'
with the narrowed values from the main input, and vmovnt overwrites
the odd lanes.
LLVM had existing codegen which generates these MVE instructions in
response to IR that takes two vectors of wide elements, or two vectors
of narrow ones. But in this case, we have one vector of each. So my
clang codegen strategy is to narrow the input vector of wide elements
by simply reinterpreting it as the output type, and then we have two
narrow vectors and can represent the operation as a vector shuffle
that interleaves lanes from both of them.
Even so, not all the cases I needed ended up being selected as a
single MVE instruction, so I've added a couple more patterns that spot
combinations of the 'MVEvmovn' and 'ARMvrev32' SDNodes which can be
generated as a VMOVN instruction with operands swapped.
This commit adds the unpredicated forms only.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74337
Summary:
These intrinsics take a vector of 2n elements, and return a vector of
n wider elements obtained by sign- or zero-extending every other
element of the input vector. They're represented in IR as a
shufflevector that extracts the odd or even elements of the input,
followed by a sext or zext.
Existing LLVM codegen already matches this pattern and generates the
VMOVLB instruction (which widens the even-index input lanes). But no
existing isel rule was generating VMOVLT, so I've added some. However,
the new rules currently only work in little-endian MVE, because the
pattern they expect from isel lowering includes a bitconvert which
doesn't have the right semantics in big-endian.
The output of one existing codegen test is improved by those new
rules.
This commit adds the unpredicated forms only.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74336
Summary:
These intrinsics just reorder the lanes of a vector, so the natural IR
representation is as a shufflevector operation. Existing LLVM codegen
already recognizes those particular shufflevectors and generates the
MVE VREV instruction.
This commit adds the unpredicated forms only.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74334
Summary:
This commit adds the unpredicated intrinsics for the unary operations
vabsq (absolute value), vnegq (arithmetic negation), vmvnq (bitwise
complement), vqabsq and vqnegq (saturating versions of abs and neg for
signed integers, in the sense that they give INT_MAX if an input lane
is INT_MIN).
This is done entirely in clang: all of these operations have existing
isel patterns and existing tests for them on the LLVM side, so I've
just made clang emit the same IR that those patterns already match.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74331
With REQUIRES: x86-register-target added to the tests.
Also remove some unneeded FIXMEs
But add a FIXME for bad IR generation for FMADDSUB/FMSUBADD with
constrained FP.
Original patch by Kevin P. Neal
This reverts commit 208470dd5d.
Tests fail:
error: unable to create target: 'No available targets are compatible with triple "x86_64-apple-darwin"'
This happens on clang-hexagon-elf, clang-cmake-armv7-quick, and
clang-cmake-armv7-quick bots.
If anyone has any suggestions on why then I'm all ears.
Differential Revision: https://reviews.llvm.org/D73570
Revert "[FPEnv][X86] Speculative fix for failures introduced by eda495426."
This reverts commit 80e17e5fcc.
The speculative fix didn't solve the test failures on Hexagon, ARMv6, and
MSVC AArch64.
When constrained floating point is enabled the X86-specific builtins don't
use constrained intrinsics in some cases. Fix that.
Differential Revision: https://reviews.llvm.org/D73570
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.
So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the //register// format of one type as that of the other,
or the //memory// format?
The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.
To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.
In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.
For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73786
The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.
Using them will assert. To workaround this I'm emitting the old X86 specific intrinsics that were never removed from the backend when we switched to using fcmp in IR. We have no way to mark them as being strict, but that's true of all target specific intrinsics so doesn't seem like we need to solve that here.
I've also added support for selecting between signaling and quiet.
Still need to support SAE which will require using a target specific
intrinsic. Also need to fix masking to not use an AND instruction
after the compare.
Differential Revision: https://reviews.llvm.org/D72906
Summary:
Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h),
are represented in LLVM IR as a (by vector) sqdmulh and a vector of (repeated)
indices, like so:
%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.
This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure and
trade-off materialising several constants for a single vector load from the
constant pool, like so:
int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]
In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.
We could teach the compiler to recover the lane variants, but this would likely
require its own pass. (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)
This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optmization and into instruction
selection:
- sqdmulh_lane
- sqdmulh_laneq
- sqrdmulh_lane
- sqrdmulh_laneq.
These 'lane' variants need an additional register class. The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.
Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.
This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).
Reviewers: SjoerdMeijer, dmgreen, t.p.northover, rovka, rengolin, efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71469
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
When constrained floating point is enabled the SystemZ-specific builtins
don't use constrained intrinsics in some cases. Fix that.
Differential Revision: https://reviews.llvm.org/D72722
Summary:
This change implements the expansion in two parts:
- Add a utility function emitAMDGPUPrintfCall() in LLVM.
- Invoke the above function from Clang CodeGen, when processing a HIP
program for the AMDGPU target.
The printf expansion has undefined behaviour if the format string is
not a compile-time constant. As a sufficient condition, the HIP
ToolChain now emits -Werror=format-nonliteral.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D71365