We're now getting close to having the necessary analysis/combines etc. for the new generic llvm.abs.* intrinsics.
This patch updates the SSE/AVX ABS vector intrinsics to emit the generic equivalents instead of the icmp+sub+select code pattern.
Differential Revision: https://reviews.llvm.org/D87101
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from
implementation lacking bf16 IR type).
The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.
This patch makes the generated IR cleaner (less useless bitcasts are
produced), but it does not affect the final assembly.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D86146
When we use mask compare intrinsics under strict FP option, the masked
elements shouldn't raise any exception. So, we cann't replace the
intrinsic with a full compare + "and" operation.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D85385
This patch simplified IR generation for __builtin_btf_type_id().
For __builtin_btf_type_id(obj, flag), previously IR builtin
looks like
if (obj is a lvalue)
llvm.bpf.btf.type.id(obj.ptr, 1, flag) !type
else
llvm.bpf.btf.type.id(obj, 0, flag) !type
The purpose of the 2nd argument is to differentiate
__builtin_btf_type_id(obj, flag) where obj is a lvalue
vs.
__builtin_btf_type_id(obj.ptr, flag)
Note that obj or obj.ptr is never used by the backend
and the `obj` argument is only used to derive the type.
This code sequence is subject to potential llvm CSE when
- obj is the same .e.g., nullptr
- flag is the same
- metadata type is different, e.g., typedef of struct "s"
and strust "s".
In the above, we don't want CSE since their metadata is different.
This patch change IR builtin to
llvm.bpf.btf.type.id(seq_num, flag) !type
and seq_num is always increasing. This will prevent potential
llvm CSE.
Also report an error if the type name is empty for
remote relocation since remote relocation needs non-empty
type name to do relocation against vmlinux.
Differential Revision: https://reviews.llvm.org/D85174
This patch added the following additional compile-once
run-everywhere (CO-RE) relocations:
- existence/size of typedef, struct/union or enum type
- enum value and enum value existence
These additional relocations will make CO-RE bpf programs more
adaptive for potential kernel internal data structure changes.
For existence/size relocations, the following two code patterns
are supported:
1. uint32_t __builtin_preserve_type_info(*(<type> *)0, flag);
2. <type> var;
uint32_t __builtin_preserve_field_info(var, flag);
flag = 0 for existence relocation and flag = 1 for size relocation.
For enum value existence and enum value relocations, the following code
pattern is supported:
uint64_t __builtin_preserve_enum_value(*(<enum_type> *)<enum_value>,
flag);
flag = 0 means existence relocation and flag = 1 for enum value.
relocation. In the above <enum_type> can be an enum type or
a typedef to enum type. The <enum_value> needs to be an enumerator
value from the same enum type. The return type is uint64_t to
permit potential 64bit enumerator values.
Differential Revision: https://reviews.llvm.org/D83242
Specified in https://github.com/WebAssembly/simd/pull/237, these
instructions load the first vector lane from memory and zero the other
lanes. Since these instructions are not officially part of the SIMD
proposal, they are only available on an opt-in basis via LLVM
intrinsics and clang builtin functions. If these instructions are
merged to the proposal, this implementation will change so that the
instructions will be generated from normal IR. At that point the
intrinsics and builtin functions would be removed.
This PR also changes the opcodes for the experimental f32x4.qfm{a,s}
instructions because their opcodes conflicted with those of the
v128.load{32,64}_zero instructions. The new opcodes were chosen to
match those used in V8.
Differential Revision: https://reviews.llvm.org/D84820
fptosi/fptoui have similar, but not identical, semantics. In
particular, the behavior on overflow is different.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46844 for 64-bit. (The
corresponding patch for 32-bit is more involved because the equivalent
intrinsics don't exist, as far as I can tell.)
Differential Revision: https://reviews.llvm.org/D84703
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
Differential Revision: https://reviews.llvm.org/D84556
Reapply 49e5f603d4
which had been reverted in c94332919b.
Originally reverted because I hadn't updated it in quite a while when I
got around to committing it, so there were a bunch of missing changes to
new code since I'd written the patch.
Reviewers: aaron.ballman
Differential Revision: https://reviews.llvm.org/D76646
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
There is a version that just tests (also called
isIntegerConstantExpression) & whereas this version is specifically used
when the value is of interest (a few call sites were actually refactored
to calling the test-only version) so let's make the API look more like
it.
Reviewers: aaron.ballman
Differential Revision: https://reviews.llvm.org/D76646
Remove _COMPAT. Drop the ARCHNAME. Remove the non-COMPAT versions
that are no longer needed.
We now only use these macros in places where we need compatibility
with libgcc/compiler-rt. So we don't need to call out _COMPAT
specifically.
When using __sync_nand_and_fetch with __int128, a problem is found that
the wrong value for the 'invert' value gets emitted to the xor in case
where the int size is greater than 64 bits.
This is because uses of llvm::ConstantInt::get which zero extends the
greater than 64 bits, so instead -1 that we require, it end up
getting 18446744073709551615
This patch replaces the call to llvm::ConstantInt::get with the call
to llvm::Constant::getAllOnesValue which works for all integer types.
Reviewers: jfp, erichkeane, rjmccall, hfinkel
Differential Revision: https://reviews.llvm.org/D82832
This covers both the existing memory functions as well as the new bulk memory proposal.
Added new test files since changes where also required in the inputs.
Also removes unused init/drop intrinsics rather than trying to make them work for 64-bit.
Differential Revision: https://reviews.llvm.org/D82821
This replaces the switch statement implementation in the clang's
X86.cpp with a lookup table in X86TargetParser.cpp.
I've used constexpr and copy of the FeatureBitset from
SubtargetFeature.h to store the features in a lookup table.
After the lookup the bitset is translated into strings for use
by the rest of the frontend code.
I had to modify the implementation of the FeatureBitset to avoid
bugs in gcc 5.5 constexpr handling. It seems to not like the
same array entry to be used on the left side and right hand side
of an assignment or &= or |=. I've also used uint32_t instead of
uint64_t and sized based on the X86::CPU_FEATURE_MAX.
I've initialized the features for different CPUs outside of the
table so that we can express inheritance in an adhoc way. This
was one of the big limitations of the switch and we had resorted
to labels and gotos.
Differential Revision: https://reviews.llvm.org/D82731
This change enables PowerPC compiler builtins to generate constrained
floating point operations when clang is indicated to do so.
A couple of possibly unexpected backend divergences between constrained
floating point and regular behavior are highlighted under the test tag
FIXME-CHECK. This may be something for those on the PPC backend to look
at.
Patch by: Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D82020
This was orignally done so we could separate the compatibility
values and the llvm internal only features into a separate entries
in the feature array. This was needed when we explicitly had to
convert the feature into the proper 32-bit chunk at every reference
and we didn't want things moving around.
Now everything is in an array and we have helper funtions or macros
to convert encoding to index. So we renumbering is no longer an
issue.
Currently, in order to extract an element from a bf16 vector, we cast
the vector to an i16 vector, perform the extraction, and cast the result to
bfloat. This behavior was copied from the old fp16 implementation.
The goal of this patch is to achieve optimal code generation for lane
copying intrinsics in a subsequent patch (LLVM fails to fold certain
combinations of bitcast, insertelement, extractelement and
shufflevector instructions leading to the generation of suboptimal code).
Differential Revision: https://reviews.llvm.org/D82206
Add a new builtin-function __builtin_expect_with_probability and
intrinsic llvm.expect.with.probability.
The interface is __builtin_expect_with_probability(long expr, long
expected, double probability).
It is mainly the same as __builtin_expect besides one more argument
indicating the probability of expression equal to expected value. The
probability should be a constant floating-point expression and be in
range [0.0, 1.0] inclusive.
It is similar to builtin-expect-with-probability function in GCC
built-in functions.
Differential Revision: https://reviews.llvm.org/D79830
The struct store intrinsics in LLVM IR take the individual parts
as arguments, so this patch uses the intrinsics used for `svget`
to break the tuples into individual parts.
Reviewers: c-rhodes, efriedma, ctetreau, david-arm
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81466
Summary:
The new SVE builtin type __SVBFloat16_t` is used to represent scalable
vectors of bfloat elements.
Reviewers: sdesmalen, efriedma, stuij, ctetreau, shafik, rengolin
Subscribers: tschuett, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81304
This patch add __builtin_matrix_column_major_store to Clang,
as described in clang/docs/MatrixTypes.rst. In the initial version,
the stride is not optional yet.
Reviewers: rjmccall, jfb, rsmith, Bigcheese
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D72782
For example:
svint32_t svget4(svint32x4_t tuple, uint64_t imm_index)
returns the subvector at `index`, which must be in range `0..3`.
svint32x3_t svset3(svint32x3_t tuple, uint64_t index, svint32_t vec)
returns a tuple vector with `vec` inserted into `tuple` at `index`,
which must be in range `0..2`.
Reviewers: c-rhodes, efriedma
Reviewed By: c-rhodes
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81464
This patch add __builtin_matrix_column_major_load to Clang,
as described in clang/docs/MatrixTypes.rst. In the initial version,
the stride is not optional yet.
Reviewers: rjmccall, rsmith, jfb, Bigcheese
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D72781
This patch upstreams support for BFloat Matrix Multiplication Intrinsics
and Code Generation from __bf16 to AArch64. This includes IR intrinsics. Unittests are
provided as needed. AArch32 Intrinsics + CodeGen will come after this
patch.
This patch is part of a series implementing the Bfloat16 extension of
the
Armv8.6-a architecture, as detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm
Architecture
Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
The following people contributed to this patch:
Luke Geeson
- Momchil Velikov
- Mikhail Maltsev
- Luke Cheeseman
Reviewers: SjoerdMeijer, t.p.northover, sdesmalen, labrinea, miyuki,
stuij
Reviewed By: miyuki, stuij
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki, chill, pbarrio, stuij
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80752
Change-Id: I174f0fd0f600d04e3799b06a7da88973c6c0703f
_ExtInt types
- Fix computed size for _ExtInt types passed to checked arithmetic
builtins.
- Emit diagnostic when signed _ExtInt larger than 128-bits is passed
to __builtin_mul_overflow.
- Change Sema checks for builtins to accept placeholder types.
Differential Revision: https://reviews.llvm.org/D81420
This patch adds new SVE types to Clang that describe tuples of SVE
vectors. For example `svint32x2_t` which maps to the twice-as-wide
vector `<vscale x 8 x i32>`. Similarly, `svint32x3_t` will map to
`<vscale x 12 x i32>`.
It also adds builtins to return an `undef` vector for a given
SVE type.
Reviewers: c-rhodes, david-arm, ctetreau, efriedma, rengolin
Reviewed By: c-rhodes
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81459
Summary:
As specified in https://github.com/WebAssembly/simd/pull/232. These
instructions are implemented as LLVM intrinsics for now rather than
normal ISel patterns to make these instructions opt-in. Once the
instructions are merged to the spec proposal, the intrinsics will be
replaced with proper ISel patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81222
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
This patch add __builtin_matrix_transpose to Clang, as described in
clang/docs/MatrixTypes.rst.
Reviewers: rjmccall, jfb, rsmith, Bigcheese
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D72778
Summary:
Add -ftrivial-auto-var-init-stop-after= to limit the number of times
stack variables are initialized when -ftrivial-auto-var-init= is used to
initialize stack variables to zero or a pattern. This flag can be used
to bisect uninitialized uses of a stack variable exposed by automatic
variable initialization, such as http://crrev.com/c/2020401.
Reviewers: jfb, vitalybuka, kcc, glider, rsmith, rjmccall, pcc, eugenis, vlad.tsyrklevich
Reviewed By: jfb
Subscribers: phosek, hubert.reinterpretcast, srhines, MaskRay, george.burgess.iv, dexonsmith, inglorion, gbiv, llozano, manojgupta, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77168
Summary:
This patch upstreams support for a new storage only bfloat16 C type.
This type is used to implement primitive support for bfloat16 data, in
line with the Bfloat16 extension of the Armv8.6-a architecture, as
detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
In detail this patch:
- introduces an opaque, storage-only C-type __bf16, which introduces a new bfloat IR type.
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
The following people contributed to this patch:
- Luke Cheeseman
- Momchil Velikov
- Alexandros Lamprineas
- Luke Geeson
- Simon Tatham
- Ties Stuij
Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, fpetrogalli
Reviewed By: SjoerdMeijer
Subscribers: labrinea, majnemer, asmith, dexonsmith, kristof.beyls, arphaman, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76077