When writing a unit test on replacing standard epilogue sequences with `BR __mspabi_func_epilog_<N>`, by manually asm-clobbering `rN` - `r10` for N = 4..10, everything worked well except for seeming inability to clobber r4.
The problem was that MSP430 code generator of LLVM used an obsolete name FP for that register. Things were worse because when `llc` read an unknown register name, it silently ignored it.
That is, I cannot use `fp` register name from the C code because Clang does not accept it (exactly like GCC). But the accepted name `r4` is not recognised by `llc` (it can be used in listings passed to `llvm-mc` and even `fp` is replace to `r4` by `llvm-mc`). So I can specify any of `fp` or `r4` for the string literal of `asm(...)` but nothing in the clobber list.
This patch replaces `MSP430::FP` with `MSP430::R4` in the backend code (even [MSP430 EABI](http://www.ti.com/lit/an/slaa534/slaa534.pdf) doesn't mention FP as a register name). The R0 - R3 registers, on the other hand, are left as is in the backend code (after all, they have some special meaning on the ISA level). It is just ensured clang is renaming them as expected by the downstream tools. There is probably not much sense in **marking them clobbered** but rename them //just in case// for use at potentially different contexts.
Differential Revision: https://reviews.llvm.org/D82184
Summary:
`svwhilerw_bf16` and `svwhilewr_bf16` intrinsics use the scalar
`bfloat16_t`
type which is predicated on `__ARM_FEATURE_BF16_SCALAR_ARITHMETIC`. This
patch changes the feature guard from `__ARM_FEATURE_SVE_BF16` to the
scalar bfloat feature macro.
The verify tests for `+bf16` are also removed in this patch. The purpose
of these checks was to match the SVE2 ACLE tests that look for an
implicit declaration warning if the feature isn't set. They worked when
the intrinsics were guarded on `__ARM_FEATURE_SVE_BF16` as the
`bfloat16_t`
was guarded on a different macro, but with both the type and intrinsic
guarded on the same macro an earlier error is triggered in the ACLE
regarding the type and we don't get a warning as we do for SVE2.
Reviewers: sdesmalen, fpetrogalli, kmclaughlin, rengolin, efriedma
Reviewed By: sdesmalen, fpetrogalli
Differential Revision: https://reviews.llvm.org/D82578
This change enables PowerPC compiler builtins to generate constrained
floating point operations when clang is indicated to do so.
A couple of possibly unexpected backend divergences between constrained
floating point and regular behavior are highlighted under the test tag
FIXME-CHECK. This may be something for those on the PPC backend to look
at.
Patch by: Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D82020
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
This patch enables the following macros when their corresponding
target attributes are set:
__ARM_FEATURE_SVE (+sve)
__ARM_FEATURE_SVE2 (+sve2)
__ARM_FEATURE_SVE2_AES (+sve2-aes)
__ARM_FEATURE_SVE2_BITPERM (+sve2-bitperm)
__ARM_FEATURE_SVE2_SHA3 (+sve2-sha3)
__ARM_FEATURE_SVE2_SM4 (+sve2-sm4)
This implies that the base SVE and SVE2 ACLE (00bet2) are now feature
complete, meaning that all intrinsics are implemented in LLVM and Clang.
Disclaimer:
To implement the ACLE we have had to fix up many parts of LLVM to make it
support scalable vectors. We have also used many target-specific intrinsics
to reduce reliance on parts of LLVM where we know scalable vectors may
not yet be handled properly (e.g. some transformation might drop the
'scalable' flag on a vector type). While we've done a best effort with
the limited testing that is available to us, we're still working to improve the
stability of the implementation. Additionally, Clang may print warnings
that code may have miscompiled. We find this often to be a false alarm
where the wrong interfaces have been used in LLVM and where resulting
code is not actually incorrect. However, this warrants a bug report
and investigation. If you find any bugs or issues, please raise them on
bugs.llvm.org and let us know!
Reviewers: rengolin, efriedma, david-arm, SjoerdMeijer
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D81725
This patch implements builtins for the following prototypes:
unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long)
unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long)
vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long)
vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long)
Differential Revision: https://reviews.llvm.org/D80941
EmitTargetMetadata passed to emitTargetMD a null pointer as returned
from GetGlobalValue, for an unused inline function which has been
removed from the module at that point.
A FIXME in CodeGenModule.cpp commented that the calling code in
EmitTargetMetadata should be moved into the one target that needs it
(XCore). A review comment agreed. So the calling loop has been moved
into the XCore subclass. The check for null is done in that loop.
Differential Revision: https://reviews.llvm.org/D77068
Fix test case added by D79830
Rewrite the test case, which did similar thing as builtin-expect.c
does(test generated llvm intrinsic instead of test branch weights).
Currently pass by "-disable-llvm-passes" option.
Differential Revision: https://reviews.llvm.org/D82403
This clang test unfortunately depends on the actions of the optimizer,
which some of the buildbots hit.
This patch makes it so it cannot ignore the return value of 'f', so it
won't do away with the implementation.
This patch contains:
- Support in LLVM CodeGen for bfloat16 types for ld2/3/4 and st2/3/4.
- New bfloat16 ACLE builtins for svld(2|3|4)[_vnum] and svst(2|3|4)[_vnum]
Reviewers: stuij, efriedma, c-rhodes, fpetrogalli
Reviewed By: fpetrogalli
Tags: #clang, #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D82187
Summary:
svbfloat16_t should only be defined if the __ARM_FEATURE_SVE_BF16
feature macro is enabled, similar to the scalar bfloat16_t type. Also,
arm_bf16.h should be included in arm_sve.h when
__ARM_FEATURE_BF16_SCALAR_ARITHMETIC is defined.
Patch also contains a fix for ld1ro intrinsic which should be guarded on
__ARM_FEATURE_SVE_BF16 rather than __ARM_FEATURE_BF16_SCALAR_ARITHMETIC,
and a fix for bfmmla test which was missing
__ARM_FEATURE_BF16_SCALAR_ARITHMETIC and -target-feature +bf16 in the
RUN line.
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D82178
This patch implements builtins for the following prototypes for the VSX Permute
Control Vector Generate with Mask Instructions:
vector unsigned char vec_genpcvm (vector unsigned char, const int);
vector unsigned short vec_genpcvm (vector unsigned short, const int);
vector unsigned int vec_genpcvm (vector unsigned int, const int);
vector unsigned long long vec_genpcvm (vector unsigned long long, const int);
Differential Revision: https://reviews.llvm.org/D81774
Currently, in order to extract an element from a bf16 vector, we cast
the vector to an i16 vector, perform the extraction, and cast the result to
bfloat. This behavior was copied from the old fp16 implementation.
The goal of this patch is to achieve optimal code generation for lane
copying intrinsics in a subsequent patch (LLVM fails to fold certain
combinations of bitcast, insertelement, extractelement and
shufflevector instructions leading to the generation of suboptimal code).
Differential Revision: https://reviews.llvm.org/D82206
Add a new builtin-function __builtin_expect_with_probability and
intrinsic llvm.expect.with.probability.
The interface is __builtin_expect_with_probability(long expr, long
expected, double probability).
It is mainly the same as __builtin_expect besides one more argument
indicating the probability of expression equal to expected value. The
probability should be a constant floating-point expression and be in
range [0.0, 1.0] inclusive.
It is similar to builtin-expect-with-probability function in GCC
built-in functions.
Differential Revision: https://reviews.llvm.org/D79830
When writing a unit test on replacing standard epilogue sequences with `BR __mspabi_func_epilog_<N>`, by manually asm-clobbering `rN` - `r10` for N = 4..10, everything worked well except for seeming inability to clobber r4.
The problem was that MSP430 code generator of LLVM used an obsolete name FP for that register. Things were worse because when `llc` read an unknown register name, it silently ignored it.
Differential Revision: https://reviews.llvm.org/D82184
Cooperlake can be detect by compiler-rt now, but not libgcc yet.
Tigerlake can't be detected by either. Both names are accepted by
gcc. Hopefully the detection code will be in place soon.
This patch implements builtins for the following prototypes:
```
vector signed char vec_clrl (vector signed char a, unsigned int n);
vector unsigned char vec_clrl (vector unsigned char a, unsigned int n);
vector signed char vec_clrr (vector signed char a, unsigned int n);
vector signed char vec_clrr (vector unsigned char a, unsigned int n);
```
Differential Revision: https://reviews.llvm.org/D81707
1. Provides no piroirity supoort && disables three priority related
attributes: init_priority, ctor attr, dtor attr;
2. '-qunique' in XL compiler equivalent behavior of emitting sinit
and sterm functions name using getUniqueModuleId() util function
in LLVM (currently no support for InternalLinkage and WeakODRLinkage
symbols);
3. Add testcases to emit IR sample with __sinit80000000, __dtor, and
__sterm80000000;
4. Temporarily side-steps the need to implement the functionality of
llvm.global_ctors and llvm.global_dtors arrays. The uses of that
functionality in this patch (with respect to the name of the functions
involved) are not representative of how the functionality will be used
once implemented.
Differential Revision: https://reviews.llvm.org/D74166
The struct store intrinsics in LLVM IR take the individual parts
as arguments, so this patch uses the intrinsics used for `svget`
to break the tuples into individual parts.
Reviewers: c-rhodes, efriedma, ctetreau, david-arm
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81466
This patch implements builtins for the following prototypes:
vector unsigned long long vec_pdep(vector unsigned long long, vector unsigned long long);
vector unsigned long long vec_pext(vector unsigned long long, vector unsigned long long __b);
unsigned long long __builtin_pdepd (unsigned long long, unsigned long long);
unsigned long long __builtin_pextd (unsigned long long, unsigned long long);
Revision Depends on D80758
Differential Revision: https://reviews.llvm.org/D80935
Summary:
As part of moving the argument lowering handling for bfloat arguments and
returns to the backend, this patch removes the code that was responsible for
handling the coercion of those arguments in Clang's Codegen.
Subscribers: kristof.beyls, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81837
Summary:
The new SVE builtin type __SVBFloat16_t` is used to represent scalable
vectors of bfloat elements.
Reviewers: sdesmalen, efriedma, stuij, ctetreau, shafik, rengolin
Subscribers: tschuett, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81304
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).
Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
There are now quite a few SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.
Differential Revision: https://reviews.llvm.org/D80712
Summary:
On the process of moving the argument lowering handling for
half-precision floating point arguments and returns to the backend, this
patch removes the code that was responsible for handling the coercion of
those arguments in Clang's Codegen.
Reviewers: rjmccall, chill, ostannard, dnsampaio
Reviewed By: ostannard
Subscribers: stuij, kristof.beyls, dmgreen, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81451
This patch add __builtin_matrix_column_major_store to Clang,
as described in clang/docs/MatrixTypes.rst. In the initial version,
the stride is not optional yet.
Reviewers: rjmccall, jfb, rsmith, Bigcheese
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D72782
For example:
svint32_t svget4(svint32x4_t tuple, uint64_t imm_index)
returns the subvector at `index`, which must be in range `0..3`.
svint32x3_t svset3(svint32x3_t tuple, uint64_t index, svint32_t vec)
returns a tuple vector with `vec` inserted into `tuple` at `index`,
which must be in range `0..2`.
Reviewers: c-rhodes, efriedma
Reviewed By: c-rhodes
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81464
This patch add __builtin_matrix_column_major_load to Clang,
as described in clang/docs/MatrixTypes.rst. In the initial version,
the stride is not optional yet.
Reviewers: rjmccall, rsmith, jfb, Bigcheese
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D72781
There are now quite a few SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.
Differential Revision: https://reviews.llvm.org/D80712
This patch upstreams support for BFloat Matrix Multiplication Intrinsics
and Code Generation from __bf16 to AArch64. This includes IR intrinsics. Unittests are
provided as needed. AArch32 Intrinsics + CodeGen will come after this
patch.
This patch is part of a series implementing the Bfloat16 extension of
the
Armv8.6-a architecture, as detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm
Architecture
Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
The following people contributed to this patch:
Luke Geeson
- Momchil Velikov
- Mikhail Maltsev
- Luke Cheeseman
Reviewers: SjoerdMeijer, t.p.northover, sdesmalen, labrinea, miyuki,
stuij
Reviewed By: miyuki, stuij
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki, chill, pbarrio, stuij
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80752
Change-Id: I174f0fd0f600d04e3799b06a7da88973c6c0703f
_ExtInt types
- Fix computed size for _ExtInt types passed to checked arithmetic
builtins.
- Emit diagnostic when signed _ExtInt larger than 128-bits is passed
to __builtin_mul_overflow.
- Change Sema checks for builtins to accept placeholder types.
Differential Revision: https://reviews.llvm.org/D81420
This patch adds new SVE types to Clang that describe tuples of SVE
vectors. For example `svint32x2_t` which maps to the twice-as-wide
vector `<vscale x 8 x i32>`. Similarly, `svint32x3_t` will map to
`<vscale x 12 x i32>`.
It also adds builtins to return an `undef` vector for a given
SVE type.
Reviewers: c-rhodes, david-arm, ctetreau, efriedma, rengolin
Reviewed By: c-rhodes
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81459
Functions can have local pragmas that override the global settings.
We set the flags eagerly based on global settings, but if we emit
an expression under the influence of a pragma, we clear the
appropriate flags from the function.
In order to avoid doing a ton of redundant work whenever we emit
an FP expression, configure the IRBuilder to default to global
settings, and only reconfigure it when we see an FP expression
that's not using the global settings.
Patch by Michele Scandale!
https://reviews.llvm.org/D80462
The expected strings would previously not catch bugs when redzones were
added when they were not actually expected. Fix by adding "global "
before the type.
Some platforms do not support aliases. Split the test, and pass explicit
triple to avoid test failure.
Reviewed By: thakis
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81591
[ v1 was reverted by c6ec352a6b due to
modpost failing; v2 fixes this. More info:
https://github.com/ClangBuiltLinux/linux/issues/1045#issuecomment-640381783 ]
This makes -fsanitize=kernel-address emit the correct globals
constructors for the kernel. We had to do the following:
* Disable generation of constructors that rely on linker features such
as dead-global elimination.
* Only instrument globals *not* in explicit sections. The kernel uses
sections for special globals, which we should not touch.
* Do not instrument globals that are prefixed with "__" nor that are
aliased by a symbol that is prefixed with "__". For example, modpost
relies on specially named aliases to find globals and checks their
contents. Unfortunately modpost relies on size stored as ELF debug info
and any padding of globals currently causes the debug info to cause size
reported to be *with* redzone which throws modpost off.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203493
Tested:
* With 'clang/test/CodeGen/asan-globals.cpp'.
* With test_kasan.ko, we can see:
BUG: KASAN: global-out-of-bounds in kasan_global_oob+0xb3/0xba [test_kasan]
* allyesconfig, allmodconfig (x86_64)
Reviewed By: glider
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81390
These ACLE tests were missing in previous patches:
- D79357: [SveEmitter] Add builtins for svdup and svindex
- D78747: [SveEmitter] Add builtins for compares and ReverseCompare flag.
- D76238: [SveEmitter] Implement builtins for contiguous loads/stores
Summary:
As specified in https://github.com/WebAssembly/simd/pull/232. These
instructions are implemented as LLVM intrinsics for now rather than
normal ISel patterns to make these instructions opt-in. Once the
instructions are merged to the spec proposal, the intrinsics will be
replaced with proper ISel patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81222
Check that getDebugInfo() is not null, as in the first revision, before
calling getDebugInfo()->addHeapAllocSiteMetadata().
Else would cause a crash with a new expression in a default arg.
---
Clang marks calls to operator new as heap allocation sites, but the
operator declared at global scope returns a void pointer. There is no
explicit cast in the code, so the compiler has to write down the
allocated type itself.
Also generalize a cast to use CallBase, so that we mark heap alloc sites
when exceptions are enabled.
Differential Revision: https://reviews.llvm.org/D80966
This patch add __builtin_matrix_transpose to Clang, as described in
clang/docs/MatrixTypes.rst.
Reviewers: rjmccall, jfb, rsmith, Bigcheese
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D72778
Summary:
This fixes pr33372.cpp under the new pass manager.
ASan adds padding to globals. For example, it will change a {i32, i32, i32} to a {{i32, i32, i32}, [52 x i8]}. However, when loading from the {i32, i32, i32}, InstCombine may (after various optimizations) end up loading 16 bytes instead of 12, likely because it thinks the [52 x i8] padding is ok to load from. But ASan checks that padding should not be loaded from.
Ultimately this is an issue of *San passes wanting to be run after all optimizations. This change moves the module passes right next to the corresponding function passes.
Also remove comment that's no longer relevant, this is the last ASan/MSan/TSan failure under the NPM (hopefully...).
As mentioned in https://reviews.llvm.org/rG1285e8bcac2c54ddd924ffb813b2b187467ac2a6, NPM doesn't support LTO + sanitizers, so modified some tests that test for that.
Reviewers: leonardchan, vitalybuka
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81323
With a change to use `CGM.getCodeGenOpts().getDebugInfo() != codegenoptions::NoDebugInfo`
instead of `getDebugInfo()`,
to fix `Profile-<arch> :: instrprof-gcov-multithread_fork.test`
See CodeGenModule::CodeGenModule, `EmitGcovArcs || EmitGcovNotes` can
set `clang::CodeGen::CodeGenModule::DebugInfo`.
---
Clang marks calls to operator new as heap allocation sites, but the
operator declared at global scope returns a void pointer. There is no
explicit cast in the code, so the compiler has to write down the
allocated type itself.
Also generalize a cast to use CallBase, so that we mark heap alloc sites
when exceptions are enabled.
Differential Revision: https://reviews.llvm.org/D80966
This patch implements the * binary operator for values of
MatrixType. It adds support for matrix * matrix, scalar * matrix and
matrix * scalar.
For the matrix, matrix case, the number of columns of the first operand
must match the number of rows of the second. For the scalar,matrix variants,
the element type of the matrix must match the scalar type.
Reviewers: rjmccall, anemet, Bigcheese, rsmith, martong
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D76794
global-ctor.ll no longer checks what it intended to check
(@_GLOBAL__sub_I_global-ctor.ll needs a !dbg to work).
Rewrite it.
gcov 3.4 and gcov 4.2 use the same format, thus we can lower the version
requirement to 3.4
Summary: Use a portable section name, as for the test's purpose any name will do.
Reviewers: nickdesaulniers, thakis
Reviewed By: thakis
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81306
Clang marks calls to operator new as heap allocation sites, but the
operator declared at global scope returns a void pointer. There is no
explicit cast in the code, so the compiler has to write down the
allocated type itself.
Also generalize a cast to use CallBase, so that we mark heap alloc sites
when exceptions are enabled.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D80966
Summary:
This makes -fsanitize=kernel-address emit the correct globals
constructors for the kernel. We had to do the following:
- Disable generation of constructors that rely on linker features such
as dead-global elimination.
- Only emit constructors for globals *not* in explicit sections. The
kernel uses sections for special globals, which we should not touch.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203493
Tested:
1. With 'clang/test/CodeGen/asan-globals.cpp'.
2. With test_kasan.ko, we can see:
BUG: KASAN: global-out-of-bounds in kasan_global_oob+0xb3/0xba [test_kasan]
Reviewers: glider, andreyknvl
Reviewed By: glider
Subscribers: cfe-commits, nickdesaulniers, hiraditya, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D80805
Summary:
The poly64 types are guarded with ifdefs for AArch64 only. This is wrong. This
was also incorrectly documented in the ACLE spec, but this has been rectified in
the latest release. See paragraph 13.1.2 "Vector data types":
https://developer.arm.com/docs/101028/latest
This patch was written by Alexandros Lamprineas.
Reviewers: ostannard, sdesmalen, fpetrogalli, labrinea, t.p.northover, LukeGeeson
Reviewed By: ostannard
Subscribers: pbarrio, LukeGeeson, kristof.beyls, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79711
Summary:
This patch upstreams support for a new storage only bfloat16 C type.
This type is used to implement primitive support for bfloat16 data, in
line with the Bfloat16 extension of the Armv8.6-a architecture, as
detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
In detail this patch:
- introduces an opaque, storage-only C-type __bf16, which introduces a new bfloat IR type.
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
The following people contributed to this patch:
- Luke Cheeseman
- Momchil Velikov
- Alexandros Lamprineas
- Luke Geeson
- Simon Tatham
- Ties Stuij
Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, fpetrogalli
Reviewed By: SjoerdMeijer
Subscribers: labrinea, majnemer, asmith, dexonsmith, kristof.beyls, arphaman, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76077
This is a re-revert with a corrected test.
This patch adds a test for the PowerPC fma compiler builtins, some variations
of which negate inputs and outputs. The code to generate IR for these
builtins was untested before this patch.
Originally, the code used the outdated method of subtracting floating point
values from -0.0 as floating point negation. This patch remedies that.
Patch by: Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D76949
Summary:
During CodeGen for AArch64 Neon intrinsics, Clang was incorrectly
assuming all the pointers from which loads were being generated for vld1
intrinsics were aligned according to the intrinsics result type, causing
alignment faults on the code generated by the backend.
This patch updates vld1 intrinsics' CodeGen to properly capture the
correct load alignment based on the type of the pointer provided as
input for the intrinsic.
Reviewers: t.p.northover, ostannard, pcc, efriedma
Reviewed By: ostannard, efriedma
Subscribers: echristo, plotfi, nickdesaulniers, efriedma, kristof.beyls, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79721
When sampleFDO is enabled, people may expect they can use
-fno-profile-sample-use to opt-out using sample profile for a certain file.
That could be either for debugging purpose or for performance tuning purpose.
However, when thinlto is enabled, if a function in file A compiled with
-fno-profile-sample-use is imported to another file B compiled with
-fprofile-sample-use, the inlined copy of the function in file B may still
get its profile annotated.
The inconsistency may even introduce profile unused warning because if the
target is not compiled with explicit debug information flag, the function
in file A won't have its debug information enabled (debug information will
be enabled implicitly only when -fprofile-sample-use is used). After it is
imported into file B which is compiled with -fprofile-sample-use, profile
annotation for the outline copy of the function will fail because the
function has no debug information, and that will trigger profile unused
warning.
We add a new attribute use-sample-profile to control whether a function
will use its sample profile no matter for its outline or inline copies.
That will make the behavior of -fno-profile-sample-use consistent.
Differential Revision: https://reviews.llvm.org/D79959
This patch adds clang options:
-fbasic-block-sections={all,<filename>,labels,none} and
-funique-basic-block-section-names.
LLVM Support for basic block sections is already enabled.
+ -fbasic-block-sections={all, <file>, labels, none} : Enables/Disables basic
block sections for all or a subset of basic blocks. "labels" only enables
basic block symbols.
+ -funique-basic-block-section-names: Enables unique section names for
basic block sections, disabled by default.
Differential Revision: https://reviews.llvm.org/D68049
Canonicalize on storing FP options in LangOptions instead of
redundantly in CodeGenOptions. Incorporate -ffast-math directly
into the values of those LangOptions rather than considering it
separately when building FPOptions. Build IR attributes from
those options rather than a mix of sources.
We should really simplify the driver/cc1 interaction here and have
the driver pass down options that cc1 directly honors. That can
happen in a follow-up, though.
Patch by Michele Scandale!
https://reviews.llvm.org/D80315
This patch implements matrix index expressions
(matrix[RowIdx][ColumnIdx]).
It does so by introducing a new MatrixSubscriptExpr(Base, RowIdx, ColumnIdx).
MatrixSubscriptExprs are built in 2 steps in ActOnMatrixSubscriptExpr. First,
if the base of a subscript is of matrix type, we create a incomplete
MatrixSubscriptExpr(base, idx, nullptr). Second, if the base is an incomplete
MatrixSubscriptExpr, we create a complete
MatrixSubscriptExpr(base->getBase(), base->getRowIdx(), idx)
Similar to vector elements, it is not possible to take the address of
a MatrixSubscriptExpr.
For CodeGen, a new MatrixElt type is added to LValue, which is very
similar to VectorElt. The only difference is that we may need to cast
the type of the base from an array to a vector type when accessing it.
Reviewers: rjmccall, anemet, Bigcheese, rsmith, martong
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D76791
This file was originally added without instnamer at:
rL283716 / fe2b9b4fbf
But that was reverted and the test file reappeared with instnamer at:
rL285688 / 62f516f590
I'm not seeing any difference locally from checking nameless values,
so trying to remove a layering violation and see if that can
survive the build bots.
After the D70350, the retainedTypes: isn't being used for the purpose
of call site debug info for extern calls, so it is safe to delete it
from IR representation.
We are also adding a test to ensure the subprogram isn't stored within
the retainedTypes: from corresponding DICompileUnit.
Differential Revision: https://reviews.llvm.org/D80369
This patch implements the + and - binary operators for values of
MatrixType. It adds support for matrix +/- matrix, scalar +/- matrix and
matrix +/- scalar.
For the matrix, matrix case, the types must initially be structurally
equivalent. For the scalar,matrix variants, the element type of the
matrix must match the scalar type.
Reviewers: rjmccall, anemet, Bigcheese, rsmith, martong
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D76793
Summary:
This was attempted once before in https://reviews.llvm.org/D79698, but
was reverted due to the coverage pass running in the wrong part of the
pipeline. This commit puts it in the same place as the other sanitizers.
This changes PassBuilder.OptimizerLastEPCallbacks to work on a
ModulePassManager instead of a FunctionPassManager. That is because
SanitizerCoverage cannot (easily) be split into a module pass and a
function pass like some of the other sanitizers since in its current
implementation it conditionally inserts module constructors based on
whether or not it successfully modified functions.
This fixes compiler-rt/test/msan/coverage-levels.cpp under the new pass
manager (last check-msan test).
Currently sanitizers + LTO don't work together under the new pass
manager, so I removed tests that checked that this combination works for
sancov.
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80692
When I added __float128 a while ago, I neglected to add support for the complex
variant of the type. This patch just adds that.
Differential revision: https://reviews.llvm.org/D80533
Summary:
Versions of LLVM built on {arm,thumb}v7 appear to have differently
configured pass managers, which causes restrictions on which sanitizers
we may use.
As such, expect failure of the recently added "sanitize-coverage.c" test
on these architectures until we can investigate armv7's restrictions.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46117
Reviewers: vitalybuka, glider
Reviewed By: glider
Subscribers: glider, kristof.beyls, danielkiss, cfe-commits, vvereschaka
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80668
The os_log helper functions are linkonce_odr and supposed to be
uniqued across TUs, so attachine a DW_AT_decl_line on it is highly
misleading. By setting the function decl to implicit, CGDebugInfo
properly marks the functions as artificial and uses a default file /
line 0 location for the function.
rdar://problem/63450824
Differential Revision: https://reviews.llvm.org/D80463
Summary:
During CodeGen for AArch64 Neon intrinsics, Clang was incorrectly
assuming all the pointers from which loads were being generated for vld1
intrinsics were aligned according to the intrinsics result type, causing
alignment faults on the code generated by the backend.
This patch updates vld1 intrinsics' CodeGen to properly capture the
correct load alignment based on the type of the pointer provided as
input for the intrinsic.
Reviewers: t.p.northover, ostannard, pcc
Reviewed By: ostannard
Subscribers: kristof.beyls, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79721
-fno-semantic-interposition is currently the CC1 default. (The opposite
disables some interprocedural optimizations.) However, it does not infer
dso_local: on most targets accesses to ExternalLinkage functions/variables
defined in the current module still need PLT/GOT.
This patch makes explicit -fno-semantic-interposition infer dso_local,
so that PLT/GOT can be eliminated if targets implement local aliases
for AsmPrinter::getSymbolPreferLocal (currently only x86).
Currently we check whether the module flag "SemanticInterposition" is 0.
If yes, infer dso_local. In the future, we can infer dso_local unless
"SemanticInterposition" is 1: frontends other than clang will also
benefit from the optimization if they don't bother setting the flag.
(There will be risks if they do want ELF interposition: they need to set
"SemanticInterposition" to 1.)
There are 65 that take a scalar shift amount. Intel documentation shows 60 of them taking unsigned int. There are 5 versions of srli_epi16 that use int, the 512-bit maskz and 128/256 mask/maskz.
Fixes PR45931
Differential Revision: https://reviews.llvm.org/D80251
We currently emit incorrect codegen for this constraint because we set it as a
constraint that allows registers. This will cause the value to be copied to the
stack and that address to be passed as the address. This is not what we want.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=42762
Differential revision: https://reviews.llvm.org/D77542
NoDebug attr does not totally eliminate debug info about a function when
inlining is enabled. This is inconsistent with when inlining is disabled.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D79967
If we're going to assume references are dereferenceable, we should also
assume they're aligned: otherwise, we can't actually dereference them.
See also D80072.
Differential Revision: https://reviews.llvm.org/D80166
Summary:
Created AIXABIInfo and AIXTargetCodeGenInfo for AIX ABI.
Reviewed By: Xiangling_L, ZarkoCA
Differential Revision: https://reviews.llvm.org/D79035
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET
Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith
Reviewed By: LuoYuanke
Subscribers: cfe-commits, mgorny
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79617
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET
Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith
Reviewed By: LuoYuanke
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D79617
rL82131 changed -O from -O1 to -O2, because -O1 was not different from
-O2 at that time.
GCC treats -O as -O1 and there is now work to make -O1 meaningful.
We can change -O back to -O1 again.
Reviewed By: echristo, dexonsmith, arphaman
Differential Revision: https://reviews.llvm.org/D79916
Summary:
This patch fixed the error of counting the remaining FPRs. Complex floating-point
values should be passed by two FPRs for the hard-float ABI. If no two FPRs are
available, it should be passed via a 64-bit GPR (fp+fp). `ArgFPRsLeft` is only
decreased one while the type is complex floating-point. It causes two floating-point
values in the complex are passed separately by two GPRs.
Reviewers: asb, luismarques, lenary
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, s.egerton, pzheng, sameer.abuasal, apazos, evandro, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79770
Summary:
The `arm_cmse.h` header includes standard headers, but some tests that
include this header explicitly specify a target. The standard headers
found via the standard include paths need not be compatible with the
explicitly-specified target from the tests. In order to avoid test
failures caused by such incompatibility, this patch uses `%clang_cc1`,
which doesn't pick up the host system headers.
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D79693
The "null-pointer-is-valid" attribute needs to be checked by many
pointer-related combines. To make the check more efficient, convert
it from a string into an enum attribute.
In the future, this attribute may be replaced with data layout
properties.
Differential Revision: https://reviews.llvm.org/D78862
Such a builtin function is mostly useful to preserve btf type id
for non-global data. For example,
extern void foo(..., void *data, int size);
int test(...) {
struct t { int a; int b; int c; } d;
d.a = ...; d.b = ...; d.c = ...;
foo(..., &d, sizeof(d));
}
The function "foo" in the above only see raw data and does not
know what type of the data is. In certain cases, e.g., logging,
the additional type information will help pretty print.
This patch implemented a BPF specific builtin
u32 btf_type_id = __builtin_btf_type_id(param, flag)
which will return a btf type id for the "param".
flag == 0 will indicate a BTF local relocation,
which means btf type_id only adjusted when bpf program BTF changes.
flag == 1 will indicate a BTF remote relocation,
which means btf type_id is adjusted against linux kernel or
future other entities.
Differential Revision: https://reviews.llvm.org/D74668
Summary: #pragma float_control(pop) was failing to restore the expected
floating point settings because the settings were not correctly preserved
at #pragma float_control(push).
Summary:
Use explicit target triple to match more accurately the output for libcall
or native atomic.
Similar to D74847, without explicit target triple, this test will fail for ARM.
This patch update test pr45476.cpp to check for both native atomic and libcall.
Reviewers: efriedma, ekatz, rjmccall, rsmith, luismarques
Reviewed By: efriedma
Subscribers: kristof.beyls, jfb, cfe-commits, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D79914
Summary:
Analyses that are statefull should not be retrieved through a proxy from
an outer IR unit, as these analyses are only invalidated at the end of
the inner IR unit manager.
This patch disallows getting the outer manager and provides an API to
get a cached analysis through the proxy. If the analysis is not
stateless, the call to getCachedResult will assert.
Reviewers: chandlerc
Subscribers: mehdi_amini, eraman, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72893
Sometimes you want to disable a FileCheck directive without removing
it entirely, or you want to write comments that mention a directive by
name. The `COM:` directive makes it easy to do this. For example,
you might have:
```
; X32: pinsrd_1:
; X32: pinsrd $1, 4(%esp), %xmm0
; COM: FIXME: X64 isn't working correctly yet for this part of codegen, but
; COM: X64 will have something similar to X32:
; COM:
; COM: X64: pinsrd_1:
; COM: X64: pinsrd $1, %edi, %xmm0
```
Without this patch, you need to use some combination of rewording and
directive syntax mangling to prevent FileCheck from recognizing the
commented occurrences of `X32:` and `X64:` above as directives.
Moreover, FileCheck diagnostics have been proposed that might complain
about the occurrences of `X64` that don't have the trailing `:`
because they look like directive typos:
<http://lists.llvm.org/pipermail/llvm-dev/2020-April/140610.html>
I think dodging all these problems can prove tedious for test authors,
and directive syntax mangling already makes the purpose of existing
test code unclear. `COM:` can avoid all these problems.
This patch also updates the small set of existing tests that define
`COM` as a check prefix:
- clang/test/CodeGen/default-address-space.c
- clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
- clang/test/Driver/hip-device-libs.hip
- llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll
I think lit should support `COM:` as well. Perhaps `clang -verify`
should too.
Reviewed By: jhenderson, thopre
Differential Revision: https://reviews.llvm.org/D79276
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79742
Summary: Erroneous error diagnostic observed in VS2017 <numeric> header
Also correction to propagate usesFPIntrin from template func to instantiation.
Reviewers: rjmccall, erichkeane (no feedback received)
Differential Revision: https://reviews.llvm.org/D79631
Sometimes you want to disable a FileCheck directive without removing
it entirely, or you want to write comments that mention a directive by
name. The `COM:` directive makes it easy to do this. For example,
you might have:
```
; X32: pinsrd_1:
; X32: pinsrd $1, 4(%esp), %xmm0
; COM: FIXME: X64 isn't working correctly yet for this part of codegen, but
; COM: X64 will have something similar to X32:
; COM:
; COM: X64: pinsrd_1:
; COM: X64: pinsrd $1, %edi, %xmm0
```
Without this patch, you need to use some combination of rewording and
directive syntax mangling to prevent FileCheck from recognizing the
commented occurrences of `X32:` and `X64:` above as directives.
Moreover, FileCheck diagnostics have been proposed that might complain
about the occurrences of `X64` that don't have the trailing `:`
because they look like directive typos:
<http://lists.llvm.org/pipermail/llvm-dev/2020-April/140610.html>
I think dodging all these problems can prove tedious for test authors,
and directive syntax mangling already makes the purpose of existing
test code unclear. `COM:` can avoid all these problems.
This patch also updates the small set of existing tests that define
`COM` as a check prefix:
- clang/test/CodeGen/default-address-space.c
- clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
- clang/test/Driver/hip-device-libs.hip
- llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll
I think lit should support `COM:` as well. Perhaps `clang -verify`
should too.
Reviewed By: jhenderson, thopre
Differential Revision: https://reviews.llvm.org/D79276
This patch adds a matrix type to Clang as described in the draft
specification in clang/docs/MatrixSupport.rst. It introduces a new option
-fenable-matrix, which can be used to enable the matrix support.
The patch adds new MatrixType and DependentSizedMatrixType types along
with the plumbing required. Loads of and stores to pointers to matrix
values are lowered to memory operations on 1-D IR arrays. After loading,
the loaded values are cast to a vector. This ensures matrix values use
the alignment of the element type, instead of LLVM's large vector
alignment.
The operators and builtins described in the draft spec will will be added in
follow-up patches.
Reviewers: martong, rsmith, Bigcheese, anemet, dexonsmith, rjmccall, aaron.ballman
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D72281
Summary:
Although using `__builtin_shufflevector` and the `shufflevector`
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D66983
These builtins are expanded in CGBuiltin to use intrinsics
for (signed/unsigned) shift left long top/bottom.
Reviewers: efriedma, SjoerdMeijer
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D79579
Defaulting to -Xclang -coverage-version='407*' makes .gcno/.gcda
compatible with gcov [4.7,8)
In addition, delete clang::CodeGenOptionsBase::CoverageExtraChecksum and GCOVOptions::UseCfgChecksum.
We can infer the information from the version.
With this change, .gcda files produced by `clang --coverage a.o` linked executable can be read by gcov 4.7~7.
We don't need other -Xclang -coverage* options.
There may be a mismatching version warning, though.
(Note, GCC r173147 "split checksum into cfg checksum and line checksum"
made gcov 4.7 incompatible with previous versions.)
rL144865 incorrectly wrote function names for GCOV_TAG_FUNCTION
(this might be part of the reasons the header says
"We emit files in a corrupt version of GCOV's "gcda" file format").
rL176173 and rL177475 realized the problem and introduced -coverage-no-function-names-in-data
to work around the issue. (However, the description is wrong.
libgcov never writes function names, even before GCC 4.2).
In reality, the linker command line has to look like:
clang --coverage -Xclang -coverage-version='407*' -Xclang -coverage-cfg-checksum -Xclang -coverage-no-function-names-in-data
Failing to pass -coverage-no-function-names-in-data can make gcov 4.7~7
either produce wrong results (for one gcov-4.9 program, I see "No executable lines")
or segfault (gcov-7).
(gcov-8 uses an incompatible format.)
This patch deletes -coverage-no-function-names-in-data and the related
function names support from libclang_rt.profile
This test would break with the proposed change to IR canonicalization
in D79171.
The test tried to do the right thing by only using -mem2reg with opt,
but it was using -O3 before that step, so the opt part was meaningless.
This test would break with the proposed change to IR canonicalization
in D79171. The raw unoptimized IR from clang is massive, so I've
replaced -instcombine with -mem2reg to make it more manageable,
but still be unlikely to break with unrelated changed to optimization.
Summary:
Since the underlying wait and notify instructions are only available
when the atomics feature is enabled, it only makes sense to expose
their builtin functions when atomics are enabled.
Reviewers: aheejin, sunfish
Subscribers: dschuff, sbc100, jgravelle-google, jfb, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79534
This is a standalone patch and this would help Propeller do a better job of code
layout as it can accurately attribute the profiles to the right internal linkage
function.
This also helps SampledFDO/AutoFDO correctly associate sampled profiles to the
right internal function. Currently, if there is more than one internal symbol
foo, their profiles are aggregated by SampledFDO.
This patch adds a new clang option, -funique-internal-funcnames, to generate
unique names for functions with internal linkage. This patch appends the md5
hash of the module name to the function symbol as a best effort to generate a
unique name for symbols with internal linkage.
Differential Revision: https://reviews.llvm.org/D73307
This patch adds various builtins under their corresponding feature macros:
Defined under __ARM_FEATURE_SVE2_AES:
- svaesd
- svaese
- svaesimc
- svaesmc
- svpmullb_pair
- svpmullt_pair
Defined under __ARM_FEATURE_SVE2_SHA3:
- svrax1
Defined under __ARM_FEATURE_SVE2_SM4:
- svsm4e
- svsm4ekey
Defined under __ARM_FEATURE_SVE2_BITPERM:
- svbdep
- svbext
- svbgrp
Neither gcc or icc support this. Split out from D79472. I want
to remove more, but it looks like icc does support some things
gcc doesn't and I need to double check our internal test suites.
This is the result of an audit of all of the ABIs in clang to implement
and enable the type for those targets.
Additionally, this finds an issue with integer-promotion passing for a
few platforms when using _ExtInt of < int, so this also corrects that
resulting in signext/zeroext being on a params of those types in some
platforms.
Differential Revisions: https://reviews.llvm.org/D79118
gcc supports selecting ymm0/zmm0 for the Yz constraint when used with 256 or 512 bit vector types.
Fixes PR45806
Differential Revision: https://reviews.llvm.org/D79448
The reinterpret builtins are generated separately because they
need the cross product of all types, 121 functions in total,
which is inconvenient to specify in the arm_sve.td file.
Reviewers: SjoerdMeijer, efriedma, ctetreau, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78756
* svdupq builtins that duplicate scalars to every quadword of a vector
are defined using builtins for svld1rq (load and replicate quadword).
* svdupq builtins that duplicate boolean values to fill a predicate vector
are defined using `svcmpne`.
Reviewers: SjoerdMeijer, efriedma, ctetreau
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78750
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79224
test cases
Add support for #pragma float_control
Reviewers: rjmccall, erichkeane, sepavloff
Differential Revision: https://reviews.llvm.org/D72841
This reverts commit 85dc033cac, and makes
corrections to the test cases that failed on buildbots.
Summary:
Per Windows SEH Spec, except _leave, all other early exits of a _try (goto/return/continue/break) are considered abnormal exits. In those cases, the first parameter passes to its _finally funclet should be TRUE to indicate an abnormal-termination.
One way to implement abnormal exits in _try is to invoke Windows runtime _local_unwind() (MSVC approach) that will invoke _dtor funclet where abnormal-termination flag is always TRUE when calling _finally. Obviously this approach is less optimal and is complicated to implement in Clang.
Clang today has a NormalCleanupDestSlot mechanism to dispatch multiple exits at the end of _try. Since _leave (or try-end fall-through) is always Indexed with 0 in that NormalCleanupDestSlot, this fix takes the advantage of that mechanism and just passes NormalCleanupDest ID as 1st Arg to _finally.
Reviewers: rnk, eli.friedman, JosephTremoulet, asmith, efriedma
Reviewed By: efriedma
Subscribers: efriedma, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77936
I'm currently auditing all of the calling convention implications of
_ExtInt for all platforms, so splitting them up into their own test will
make this a much easier task to organize.
After speaking with Craig Topper about some recent defects, he pointed
out that _ExtInts should be passed indirectly if larger than the largest
int register, and like ints when smaller than that. This patch
implements that.
Note that this changed the way vaargs worked quite a bit, but they still
work.
Differential Revision: https://reviews.llvm.org/D78785