Summary:
As defined by Arm C Language Extensions (ACLE) these macro defines
should be set to specific values depending on -mbranch-protection.
Reviewers: chill
Reviewed By: chill
Subscribers: danielkiss, cfe-commits, kristof.beyls
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77134
Currently, bpf does not specify 128bit alignment in its
layout spec. So for a structure like
struct ipv6_key_t {
unsigned pid;
unsigned __int128 saddr;
unsigned short lport;
};
clang will generate IR type
%struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }
Additional padding is to ensure later IR->MIR can generate correct
stack layout with target layout spec.
But it is common practice for a tracing program to be
first compiled with target flag (e.g., x86_64 or aarch64) through
clang to generate IR and then go through llc to generate bpf
byte code. Tracing program often refers to kernel internal
data structures which needs to be compiled with non-bpf target.
But such a compilation model may cause a problem on aarch64.
The bcc issue https://github.com/iovisor/bcc/issues/2827
reported such a problem.
For the above structure, since aarch64 has "i128:128" in its
layout string, the generated IR will have
%struct.ipv6_key_t = type { i32, i128, i16 }
Since bpf does not have "i128:128" in its spec string,
the selectionDAG assumes alignment 8 for i128 and
computes the stack storage size for the above is 32 bytes,
which leads incorrect code later.
The x86_64 does not have this issue as it does not have
"i128:128" in its layout spec as it does permits i128 to
be alignmented at 8 bytes at stack. Its IR type looks like
%struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }
The fix here is add i128 support in layout spec, the same as
aarch64. The only downside is we may have less optimal stack
allocation in certain cases since we require 16byte alignment
for i128 instead of 8. But this is probably fine as i128 is
not used widely and in most cases users should already
have proper alignment.
Differential Revision: https://reviews.llvm.org/D76587
Summary:
This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
in addition to the GCC patch for the 8..6-a CLI:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html
In detail this patch
- march options for armv8.6-a
- BFloat16 assembly
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
Based on work by:
- labrinea
- MarkMurrayARM
- Luke Cheeseman
- Javed Asbar
- Mikhail Maltsev
- Luke Geeson
Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson
Reviewed By: SjoerdMeijer
Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D76062
Summary:
This patch adds a virtual method `getCPUCacheLineSize()` to `TargetInfo`. Currently, I've only implemented the method in `X86TargetInfo`. It's extremely important that each CPU's cache line size correct (e.g., we can't just define it as `64` across the board) so, it has been a little slow getting to this point.
I'll work on the ARM CPUs next, but that will probably come later in a different patch.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74918
Reworked the patch to avoid sharing a header (SVETypeFlags.h) between
include/clang/Basic and utils/TableGen/SveEmitter.cpp. Now the patch
generates the enum/flags which is included in TargetBuiltins.h.
Also renamed one of the SveEmitter options to be in line with MVE.
Summary:
This is a first patch in a series for the SveEmitter to generate the arm_sve.h
header file and builtins.
I've tried my best to strip down this patch as best as I could, but there
are still a few changes that are not necessarily exercised by the load intrinsics
in this patch, mostly around the SVEType class which has some common logic to
represent types from a type and prototype string. I thought it didn't make
much sense to remove that from this patch and split it up.
Summary:
Adds the constraints described below to ensure that we
can tie variables of SVE ACLE types to operands in inline-asm:
- y: SVE registers Z0-Z7
- Upl: One of the low eight SVE predicate registers (P0-P7)
- Upa: Full range of SVE predicate registers (P0-P15)
Reviewers: sdesmalen, huntergr, rovka, cameron.mcinally, efriedma, rengolin
Reviewed By: efriedma
Subscribers: miyuki, tschuett, rkruppe, psnobl, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75690
This is a first patch in a series for the SveEmitter to generate the arm_sve.h
header file and builtins.
I've tried my best to strip down this patch as best as I could, but there
are still a few changes that are not necessarily exercised by the load intrinsics
in this patch, mostly around the SVEType class which has some common logic to
represent types from a type and prototype string. I thought it didn't make
much sense to remove that from this patch and split it up.
Reviewers: efriedma, rovka, SjoerdMeijer, rsandifo-arm, rengolin
Reviewed By: SjoerdMeijer
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75470
Summary:
This patch implements feature test macros for the CDE extension
according to the upcoming ACLE specification.
The following 2 macros are being added:
- __ARM_FEATURE_CDE - defined as '1' when any coprocessor is
configured as a CDE coprocessor
- __ARM_FEATURE_CDE_COPROC - defined as an 8-bit mask, each bit of the
mask corresponds to a coprocessor and is set when the corresponding
coprocessor is configured as CDE (and cleared otherwise).
The patch also exposes the value of __ARM_FEATURE_CDE_COPROC in the
target-independent method TargetInfo::getARMCDECorpocMask, the method
will be used in follow-up patches implementing semantic checks of CDE
intrinsics (we want to diagnose the cases when CDE intrinsics are used
with coprocessors that are not configured as CDE).
Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM
Reviewed By: simon_tatham
Subscribers: kristof.beyls, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75843
This PR enables "XL" C++ ABI in frontend AST to IR codegen. And it is driven by
static init work. The current kind in Clang by default is Generic Itanium, which
has different behavior on static init with IBM xlclang compiler on AIX.
Differential Revision: https://reviews.llvm.org/D74015
Make Clang on aarch64 targets predefine `__AARCH64_CMODEL_SMALL__`
or `__AARCH64_CMODEL_TINY__`, etc. These are the names that GCC
uses for its predefines.
Reviewed By: tamur, MaskRay
Differential Revision: https://reviews.llvm.org/D75002
This commit removes the artificial types <512 x i1> and <1024 x i1>
from HVX intrinsics, and makes v512i1 and v1024i1 no longer legal on
Hexagon.
It may cause existing bitcode files to become invalid.
* Converting between vector predicates and vector registers must be
done explicitly via vandvrt/vandqrt instructions (their intrinsics),
i.e. (for 64-byte mode):
%Q = call <64 x i1> @llvm.hexagon.V6.vandvrt(<16 x i32> %V, i32 -1)
%V = call <16 x i32> @llvm.hexagon.V6.vandqrt(<64 x i1> %Q, i32 -1)
The conversion intrinsics are:
declare <64 x i1> @llvm.hexagon.V6.vandvrt(<16 x i32>, i32)
declare <128 x i1> @llvm.hexagon.V6.vandvrt.128B(<32 x i32>, i32)
declare <16 x i32> @llvm.hexagon.V6.vandqrt(<64 x i1>, i32)
declare <32 x i32> @llvm.hexagon.V6.vandqrt.128B(<128 x i1>, i32)
They are all pure.
* Vector predicate values cannot be loaded/stored directly. This directly
reflects the architecture restriction. Loading and storing or vector
predicates must be done indirectly via vector registers and explicit
conversions via vandvrt/vandqrt instructions.
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.
Differential Revision: https://reviews.llvm.org/D68720
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.
Differential Revision: https://reviews.llvm.org/D68720
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with better option
handling and more portable testing
Differential Revision: https://reviews.llvm.org/D68720
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with correct option
flags set.
Differential Revision: https://reviews.llvm.org/D68720
This reverts commit 39f50da2a3.
The -fstack-clash-protection is being passed to the linker too, which
is not intended.
Reverting and fixing that in a later commit.
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
Differential Revision: https://reviews.llvm.org/D68720
Summary:
For now, this ABI simply expands all possible aggregate arguments and
returns all possible aggregates directly. This ABI will change rapidly
as we prototype and benchmark a new ABI that takes advantage of
multivalue return and possibly other changes from the MVP ABI.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72972
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
Summary:
This patch changes the underlying type of the ARM::ArchExtKind
enumeration to uint64_t and adjusts the related code.
The goal of the patch is to prepare the code base for a new
architecture extension.
Reviewers: simon_tatham, eli.friedman, ostannard, dmgreen
Reviewed By: dmgreen
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits, pbarrio
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73906
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
This CL adds clang declarations of built-in functions for AMDGPU MFMA intrinsics and instructions.
OpenCL tests for new built-ins are included.
Differential Revision: https://reviews.llvm.org/D72723
Since 2009 (in r63846) we've been `#define`-ing OBJC_NEW_PROPERTIES all
the time on Darwin, but this macro only makes sense for `-x objective-c`
and `-x objective-c++`. Restrict it to those cases (for which there is
already separate logic).
https://reviews.llvm.org/D72970
rdar://problem/10050342
Summary:
This adds the reference types target feature. This does not enable any
more functionality in LLVM/clang for now, but this is necessary to embed
the info in the target features section, which is used by Binaryen and
Emscripten. It turned out that after D69832 `-fwasm-exceptions` crashed
because we didn't have the reference types target feature.
Reviewers: tlively
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73320
This adds basic support for the Swift calling convention with WebAssembly
targets.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D71823
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.
- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute
AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).
Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.
-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.
Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.
This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.
AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.