This patch adds 2 new options to control when Clang adds `mustprogress`:
1. -ffinite-loops: assume all loops are finite; mustprogress is added
to all loops, regardless of the selected language standard.
2. -fno-finite-loops: assume no loop is finite; mustprogress is not
added to any loop or function. We could add mustprogress to
functions without loops, but we would have to detect that in Clang,
which is probably not worth it.
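As a rough sketch of the kind of loop these options affect (the function name and the choice of the Collatz iteration are illustrative and not part of the patch): the loop below has no side effects, and a compiler cannot prove that it terminates for every input.

```cpp
// Counts Collatz steps until n reaches 1. Termination for every starting
// value is an open problem, and collatz_steps(0) in fact never terminates.
constexpr unsigned collatz_steps(unsigned long long n) {
    unsigned steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return steps;
}
```

With -ffinite-loops the optimizer may assume such a loop terminates; with -fno-finite-loops it has to be kept as written.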
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D96419
The ability to specify alignment was recently added, and it's an
important property which we should ensure is set as expected by
Clang. (Especially before making further changes to Clang's code in
this area.) But, because it's on the end of the lines, the existing
tests all ignore it.
Therefore, update all the tests to also verify the expected alignment
for atomicrmw and cmpxchg. While I was in there, I also updated uses
of 'load atomic' and 'store atomic', and added the memory ordering,
where that was missing.
This change removes the XFAIL from the original test and duplicates the test into sanitize-coverage-old-pm.c
which uses the old pass manager and has the corresponding XFAIL.
This should fix the XPASS from this and similar runs:
http://lab.llvm.org:8011/#/builders/60/builds/1875
After D93264, using both -fdebug-info-for-profiling and
-fpseudo-probe-for-profiling will cause the compiler to crash.
Diagnose these conflicting options in the driver.
Also, the existing CodeGen test was using the driver when it should be
running cc1.
Differential Revision: https://reviews.llvm.org/D96354
The *reduce_add/mul_ps/pd intrinsics assume that the elements in
the vector are reassociable. So we always need to set the reassoc
flag when we call the _mm_reduce_* intrinsics.
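A small sketch of why the flag matters, using plain scalar code rather than the intrinsics themselves (the helper names are made up for illustration): floating-point addition is not associative, so a reduction only has a well-defined result if the caller accepts any evaluation order.

```cpp
// Left-to-right sum: ((a + b) + c) + d.
constexpr float sum_sequential(float a, float b, float c, float d) {
    return ((a + b) + c) + d;
}

// Pairwise (tree) sum: (a + b) + (c + d), one possible order for a
// vector reduction.
constexpr float sum_pairwise(float a, float b, float c, float d) {
    return (a + b) + (c + d);
}
```

For the inputs 1e30f, 1.0f, -1e30f, 1.0f the two orders give 1.0f and 0.0f respectively, because 1.0f is absorbed whenever it is added to a value of magnitude 1e30f.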
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D96231
When a function or a file is excluded using -fprofile-list= option,
don't emit coverage mapping as doing so confuses users since those
functions would always have zero count. This also reduces the binary
size considerably in cases where only a few functions or files are
being instrumented.
Differential Revision: https://reviews.llvm.org/D96000
__builtin_isnan currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
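For illustration only, an integer-based NaN test for 32-bit floats could look like the sketch below. The helper names are hypothetical, and this is not necessarily the exact sequence Clang emits; it only shows that the check can be done without any floating-point comparison.

```cpp
#include <cstdint>
#include <cstring>

// True when the IEEE-754 binary32 bit pattern encodes a NaN:
// exponent bits all ones and a non-zero mantissa, sign ignored.
constexpr bool is_nan_bits(std::uint32_t bits) {
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

// Runtime wrapper: copies the float's bytes into an integer, so no
// floating-point compare is executed on the (possibly signaling) value.
inline bool is_nan_no_fp_compare(float x) {
    std::uint32_t bits;
    static_assert(sizeof bits == sizeof x, "float is expected to be 32 bits");
    std::memcpy(&bits, &x, sizeof bits);
    return is_nan_bits(bits);
}
```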
Reviewed By: kpn
Differential Revision: https://reviews.llvm.org/D95948
Currently clang is not correctly retrieving from the AST the metadata for
constrained FP builtins. This patch fixes that for the X86 specific builtins.
Differential Revision: https://reviews.llvm.org/D94614
Sample re-annotation is required at LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication done by passes such as loop unrolling, jump threading, and indirect call promotion, where samples corresponding to a source location are aggregated multiple times because of the duplicates. In this change we introduce the concept of a distribution factor for pseudo probes so that samples can be distributed among duplicated probes, scaled by a factor. We hope that optimizations duplicating code maintain well the branch frequency information (BFI) on which probe distribution factors are calculated. Distribution factors are updated at the end of the preLTO pipeline to reflect an estimated portion of the real execution count.
This change also introduces a pseudo probe verifier that can be run after each IR pass to detect duplicated pseudo probes.
A saturated distribution factor stands for 1.0. A pseudo probe carries a factor with a value ranging from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated with each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead.
Changes are also needed in the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later inlined at LTO time, should have the callee's probes prorated based on the prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, indirect call promotion results in multiple callsites. The original samples should be distributed across them. This is fixed by adjusting the callsites' distribution factors.
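A minimal sketch of what prorating by an integral distribution factor means. The function name is hypothetical, and the saturated value is passed in by the caller because the concrete encoding is not spelled out here; the sketch also assumes the intermediate product fits in 64 bits.

```cpp
#include <cstdint>

// Scales a sample count by factor/full, where `full` is the saturated
// factor value that stands for 1.0 (an assumption of this sketch).
constexpr std::uint64_t prorate(std::uint64_t samples, std::uint64_t factor,
                                std::uint64_t full) {
    return samples * factor / full;
}
```

For example, if a probe were duplicated four times with equal weight and the saturated value were 100, each copy would receive prorate(1000, 25, 100) == 250 of 1000 samples.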
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D93264
On z/OS, the following error message is not matched correctly in lit tests.
```
EDC5129I No such file or directory.
```
This patch uses a lit config substitution to check for platform specific error messages.
Reviewed By: muiez, jhenderson
Differential Revision: https://reviews.llvm.org/D95246
The Clang enable_if extension is mangled as an <extended-qualifier>,
which is supposed to contain <template-args>. However, we were
unconditionally emitting X/E around its arguments, neglecting the fact
that <expr-primary> should be emitted directly without the surrounding
X/E.
Differential Revision: https://reviews.llvm.org/D95488
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
For Clang synthesized `__va_list_tag` (`CreateX86_64ABIBuiltinVaListDecl`),
its DW_AT_decl_file/DW_AT_decl_line are arbitrarily set from `CurLoc`.
In a stage 2 `-DCMAKE_BUILD_TYPE=Debug` clang build, I observe that
in driver.cpp, DW_AT_decl_file/DW_AT_decl_line may be set to an `#include` line
(the transitively included file uses va_arg (`__builtin_va_arg`)).
This seems arbitrary. Drop that.
Reviewed By: #debug-info, dblaikie
Differential Revision: https://reviews.llvm.org/D94735
The previous implementation required that `-maltivec` be specified when using either `-mabi=vec-extabi` or `-mabi=vec-default`, this patch removes that requirement.
Reviewed By: cebowleratibm
Differential Revision: https://reviews.llvm.org/D94986
Insert a llvm.experimental.noalias.scope.decl intrinsic that identifies where a noalias argument was inlined.
This patch includes some refactorings from D90104.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93040
This reverts commit 275f30df8a.
As noted on the code review (https://reviews.llvm.org/D92892), this
change causes us to reject valid code in a few cases. Reverting so we
have more time to figure out what the right fix{es are, is} here.
On z/OS, the following error message is not matched correctly in lit tests. This patch updates the CHECK expression to match successfully.
```
EDC5129I No such file or directory.
```
Reviewed By: muiez
Differential Revision: https://reviews.llvm.org/D94239
Expanding from D94808 - we ensure the same InlineAdvisor is used by both
InlinerPass instances. The notion of mandatory inlining is moved into
the core InlineAdvisor: advisors anyway have to handle that case, so
this change also factors out that a bit better.
Differential Revision: https://reviews.llvm.org/D94825
Under -mabi=ieeelongdouble on PowerPC, IEEE-quad floating point semantics
are used for long double. This patch mutates calls to the related builtins
into their f128 versions on PowerPC. In theory, this should also be applied to
other targets once their backends support IEEE 128-bit style libcalls.
GCC already has these mutations except nansl, which is not available on
PowerPC along with other variants (nans, nansf).
Reviewed By: RKSimon, nemanjai
Differential Revision: https://reviews.llvm.org/D92080
The intent presumably is to avoid generating 'opaque' in the IR, but the
header contains the filename. Thus, having the workspace in a directory
with opaque in it causes this test to fail.
This just adds a 'CHECK' line on target-triple, which is the last line
of the IR-header.
This introduces the ARMv8.7-A LS64 extension's intrinsics for 64-byte
atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`,
and `__arm_st64bv0`. These are selected into the LS64 instructions
LD64B, ST64B, ST64BV and ST64BV0, respectively.
Based on patches written by Simon Tatham.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D93232
Move the nomerge attribute from the function declaration/definition to callsites to
allow virtual function calls to carry the attribute.
Differential Revision: https://reviews.llvm.org/D94537
This patch removes the -f[no-]trapping-math flags from the -cc1 command line. These flags are ignored in the command line parser and their semantics is fully handled by -ffp-exception-mode.
This patch does not remove -f[no-]trapping-math from the driver command line. The driver flags are being used and do affect compilation.
Reviewed By: dexonsmith, SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D93395
Please see D93747 for more context; it tries to make the linkage names of internal
linkage functions the uniqueified names. This causes a problem with gdb
because breaking using the demangled function name will not work if the new
uniqueified name cannot be demangled. The problem is the generated suffix which
is a mix of integers and letters which do not demangle. The demangler accepts
either all numbers or all letters. This patch simply converts the hash to decimal.
There is no loss of uniqueness by doing this as the precision is maintained.
The symbol names get longer by a few characters though.
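To make the constraint concrete, here is a small sketch of the rule stated above (suffix must be all digits or all letters). The helper names and the sample hash value are illustrative only, not taken from the patch.

```cpp
#include <string_view>

// True when every character of a non-empty string is a decimal digit.
constexpr bool all_digits(std::string_view s) {
    if (s.empty()) return false;
    for (char c : s)
        if (c < '0' || c > '9') return false;
    return true;
}

// True when every character of a non-empty string is an ASCII letter.
constexpr bool all_letters(std::string_view s) {
    if (s.empty()) return false;
    for (char c : s)
        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))) return false;
    return true;
}

// Mirrors the rule described above: a suffix demangles only if it is
// made entirely of digits or entirely of letters.
constexpr bool suffix_can_demangle(std::string_view suffix) {
    return all_digits(suffix) || all_letters(suffix);
}
```

A hex suffix such as 9f86d081 mixes digits and letters and is rejected, while the same value written in decimal, 2676412545, passes and is two characters longer.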
Differential Revision: https://reviews.llvm.org/D94154
VLST return values are coerced to VLATs in the function epilog for
consistency with the VLAT ABI. Previously, this coercion was done
through memory. It is preferable to use the
llvm.experimental.vector.insert intrinsic to avoid going through memory
here.
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D94290
ELF -fno-pic sets dso_local on a function declaration to allow direct accesses
when taking its address (similar to a data symbol). The emitted code follows the
traditional GCC/Clang -fno-pic behavior: an absolute relocation is produced.
If the function is not defined in the executable, a canonical PLT entry will be
needed at link time. This is similar to a copy relocation and is incompatible
with (-Bsymbolic or --dynamic-list linked shared objects / protected symbols in
a shared object).
This patch gives -fno-pic code a way to avoid such a canonical PLT entry.
The FIXME was about a generalization for -fpie -mpie-copy-relocations (now -fpie
-fdirect-access-external-data). While we could set dso_local to avoid GOT when
taking the address of a function declaration (there is an ignorable difference
about R_386_PC32 vs R_386_PLT32 on i386), it likely does not provide any benefit
and can just cause trouble, so we don't make the generalization.
D92633 added -f[no-]direct-access-external-data to supersede -m[no-]pie-copy-relocations.
(The option works for -fpie but is a no-op for -fno-pic and -fpic.)
This patch makes -fno-pic -fno-direct-access-external-data drop dso_local from
global variable declarations. This usually causes the backend to emit a GOT
indirection for external data access. With a GOT relocation, the subsequent
-no-pie link will not have copy relocation even if the data symbol turns out to
be defined by a shared object.
Differential Revision: https://reviews.llvm.org/D92714
GCC r218397 "x86-64: Optimize access to globals in PIE with copy reloc" made
-fpie code emit R_X86_64_PC32 to reference external data symbols by default.
Clang adopted -mpie-copy-relocations D19996 as a flexible alternative.
The name -mpie-copy-relocations can be improved [1] and does not capture the
idea that this option can apply to -fno-pic and -fpic [2], so this patch
introduces -f[no-]direct-access-external-data and makes -mpie-copy-relocations
their aliases for compatibility.
[1]
For
```
extern int var;
int get() { return var; }
```
if var is defined in another translation unit in the link unit, there is no copy
relocation.
[2]
-fno-pic -fno-direct-access-external-data is useful to avoid copy relocations.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65888
If a shared object is linked with -Bsymbolic or --dynamic-list and exports a
data symbol, normally the data symbol cannot be accessed by -fno-pic code
(because by default an absolute relocation is produced which will lead to a copy
relocation). -fno-direct-access-external-data can prevent copy relocations.
-fpic -fdirect-access-external-data can avoid GOT indirection. This is like the
undefined counterpart of -fno-semantic-interposition. However, the user should
define var in another translation unit and link with -Bsymbolic or
--dynamic-list, otherwise the linker will error in a -shared link. Generally
the user has better tools for their goal but I want to mention that this
combination is valid.
On COFF, the behavior is like always -fdirect-access-external-data.
`__declspec(dllimport)` is needed to enable indirect access.
There is currently no plan to affect non-ELF behaviors or -fpic behaviors.
-fno-pic -fno-direct-access-external-data will be implemented in the subsequent patch.
GCC feature request https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D92633
As @aprantl suggested, use the canonicalized DIFile when we
don't know the loc info and filename for compiler generated
functions, for example static initialization functions.
Reviewed By: dblaikie, aprantl
Differential Revision: https://reviews.llvm.org/D87147
`wasm_rethrow_in_catch` intrinsic and builtin are used in order to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
foo();
} catch (int n) {
...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. These intrinsic/builtin were named `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94038
Motivating example:
```
struct { int v[10]; } t[10];
__builtin_object_size(
&t[0].v[11], // access past end of subobject
1 // request remaining bytes of closest surrounding
// subobject
);
```
In GCC, this returns 0. https://godbolt.org/z/7TeGs7
In current clang, however, this returns 356, the number of bytes
remaining in the whole variable, as if the `type` was 0 instead of 1.
https://godbolt.org/z/6Kffox
This patch checks for the specific case where we're requesting a
subobject's size (type 1) but the subobject is invalid.
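The two numbers in the motivating example can be reproduced with plain `sizeof` arithmetic. This is only a sketch of the expected values, assuming a 4-byte `int`; the constant names are made up and it does not call `__builtin_object_size` itself.

```cpp
#include <cstddef>

struct T { int v[10]; };

// Byte offset of &t[0].v[11] from the start of t (and of t[0].v).
constexpr std::size_t kOffset = 11 * sizeof(int);

// Type 0: bytes remaining in the whole variable `T t[10]`.
constexpr std::size_t kWholeRemaining = sizeof(T[10]) - kOffset;

// Type 1: bytes remaining in the closest surrounding subobject `v`.
// A pointer at or past its end has nothing left.
constexpr std::size_t kSubobjectRemaining =
    kOffset >= sizeof(T::v) ? 0 : sizeof(T::v) - kOffset;
```

With a 4-byte `int`, kWholeRemaining is 400 - 44 = 356 and kSubobjectRemaining is 0, matching the old clang result and the GCC result respectively.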
Differential Revision: https://reviews.llvm.org/D92892
When support for @llvm.experimental.noalias.scope.decl was introduced, this test started failing because it checks
(for no good reason) for a function attribute id of '#8', which now becomes '#9'.
Reviewed By: pratlucas
Differential Revision: https://reviews.llvm.org/D94233
This patch adds the LANE variants for VCMLA on AArch64 as defined in
"Arm Neon Intrinsics Reference for ACLE Q3 2020" [1]
This patch also updates `dup_typed` to accept constant type strings directly.
Based on a patch by Tim Northover.
[1] https://developer.arm.com/documentation/ihi0073/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D93014
VLST arguments are coerced to VLATs at the function boundary for
consistency with the VLAT ABI. They are then bitcast back to VLSTs in
the function prolog. Previously, this conversion was done through memory.
With the introduction of the llvm.vector.{insert,extract} intrinsic, we
can avoid going through memory here.
Depends on D92761
Differential Revision: https://reviews.llvm.org/D92762
Add powerpcle support to clang.
For FreeBSD, assume a freestanding environment for now, as we only need it in the first place to build loader, which runs in the OpenFirmware environment instead of the FreeBSD environment.
For Linux, recognize glibc and musl environments to match current usage in Void Linux PPC.
Adjust driver to match current binutils behavior regarding machine naming.
Adjust and expand tests.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93919
The idea is that the CC1 default for ELF should set dso_local on default
visibility external linkage definitions in the default -mrelocation-model pic
mode (-fpic/-fPIC) to match COFF/Mach-O and make output IR similar.
The refactoring is made available by 2820a2ca3a.
Currently only x86 supports local aliases. We move the decision to the driver.
There are three CC1 states:
* -fsemantic-interposition: make some linkages interposable and make default visibility external linkage definitions dso_preemptable.
* (default): selected if the target supports .Lfoo$local: make default visibility external linkage definitions dso_local
* -fhalf-no-semantic-interposition: if neither option is set or the target does not support .Lfoo$local: like -fno-semantic-interposition but local aliases are not used. So references can be interposed if not optimized out.
Add -fhalf-no-semantic-interposition to a few tests using the half-based semantic interposition behavior.
For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar
to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences.
To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.
To make this flip smooth and enable future (dso_local as definition default),
this patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `.
For a definition (of most linkage types), dso_local is set for ELF -fno-pic/-fpie
and COFF, but not for Mach-O. This nuance causes unneeded binary format differences.
This patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `
if there is an explicit linkage.
* Clang will set dso_local for Mach-O, which is currently implied by TargetMachine.cpp. This will make COFF/Mach-O and executable ELF similar.
* Eventually I hope we can make dso_local the textual LLVM IR default (write explicit "dso_preemptable" when applicable) and -fpic ELF will be similar to everything else. This patch helps move toward that goal.
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.
Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923
This simplifies TargetMachine::shouldAssumeDSOLocal and gives the frontend the
decision to use dso_local. LLVM synthesized functions/globals may lose
inferred dso_local, but such optimizations are probably not very useful.
Note: the hasComdat() condition in canBenefitFromLocalAlias (D77429) may be dead now.
(llvm/CodeGen/X86/semantic-interposition-comdat.ll)
(Investigate whether we need test coverage when Fuchsia C++ ABI is clearer)
The x86_amx type is used for AMX intrinsics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by a load/store instruction. So AMX intrinsics only operate on the type x86_amx.
This helps to separate AMX intrinsics from LLVM IR instructions (+-*/).
Thanks to Craig for the idea. This patch depends on https://reviews.llvm.org/D87981.
Differential Revision: https://reviews.llvm.org/D91927
This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93793
As proposed in https://github.com/WebAssembly/simd/pull/380. This commit makes
the new instructions available only via clang builtins and LLVM intrinsics to
make their use opt-in while they are still being evaluated for inclusion in the
SIMD proposal.
Depends on D93771.
Differential Revision: https://reviews.llvm.org/D93775
Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement.
However, this is problematic because undef isn’t undefined enough.
Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison.
This makes a few straightforward optimizations incorrect, such as:
```
; https://bugs.llvm.org/show_bug.cgi?id=44185
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
%xv = insertelement <4 x float> %q, float %x, i32 2
%r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef }
ret <4 x float> %r ; %r[3] is undef
}
=>
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
%r = insertelement <4 x float> %y, float %x, i32 1
ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source
```
I’d like to suggest
1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too)
2. Updating shufflevector’s semantics to return poison element if mask is undef
Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay.
m_Undef() matches PoisonValue as well, so existing optimizations will still fire.
The only concern is hidden miscompilations that only surface once a poison constant is given.
A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :(
Instead, I’ll simply locally maintain the tests and run Alive2.
If there is any bug found, I’ll report it.
Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93586
Every basic block section symbol created by -fbasic-block-sections will contain
".__part." to know that this symbol corresponds to a basic block fragment of
the function.
This patch solves two problems:
a) Like D89617, we want function symbols with suffixes to be properly qualified
so that external tools like profile aggregators know exactly what this
symbol corresponds to.
b) The current basic block naming just adds a ".N" to the symbol name where N is
some integer. This collides with how clang creates __cxx_global_var_init.N.
clang creates these symbol names to call constructor functions and basic
block symbol naming should not use the same style.
Fixed all the test cases and added an extra test for __cxx_global_var_init
breakage.
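As a sketch of how an external tool might tell the two kinds of names apart (the helper name and the sample symbol `foo.__part.1` are illustrative assumptions, not taken from the patch):

```cpp
#include <string_view>

// True when the symbol name carries the ".__part." marker that identifies
// a basic block fragment of a function.
constexpr bool is_basic_block_section_symbol(std::string_view name) {
    return name.find(".__part.") != std::string_view::npos;
}
```

A name in the old ".N" style, such as `__cxx_global_var_init.1`, does not contain the marker and is therefore not mistaken for a basic block fragment.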
Differential Revision: https://reviews.llvm.org/D93082
Currently there is an issue where the legacy pass manager uses a different OptBisect counter than the new pass manager.
This fix makes the npm OptBisectInstrumentation use the global OptBisect.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D92897
This patch enables canonicalization of SPF_ABS and SPF_NABS
to the abs intrinsic.
This is a recommit, the original try was
05d4c4ebc2,
but it was reverted due to an apparent miscompile,
which since then has just been fixed by the previous commit.
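For reference, the select-based source patterns in question look roughly like the sketch below (function names are illustrative): an absolute value written as a compare plus select, and its negated form.

```cpp
// Absolute value expressed as a compare and a select between x and -x.
constexpr int abs_via_select(int x) { return x < 0 ? -x : x; }

// Negated absolute value: the same select with the arms swapped.
constexpr int nabs_via_select(int x) { return x < 0 ? x : -x; }
```

Both functions overflow for the most negative `int` in the abs case, which is the usual caveat for any integer absolute value.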
Differential Revision: https://reviews.llvm.org/D87188
Similar to D69312, and documented in D69839, the IRBuilder needs to add
the strictfp attribute to invoke instructions when constrained floating
point is enabled.
This is try 2, with the test corrected.
Differential Revision: https://reviews.llvm.org/D93134
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.
This patch adds support of hot function attribute to LLVM IR. This
attribute will be used in setting function section prefix/suffix.
Currently the .hot and .unlikely suffixes are only added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).
This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
isFunctionColdInCallGraph is true, this function will be marked as
cold.
The changes are:
(1) user annotated function attribute will be used in setting function
section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrites user annotated cold attribute.
The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.
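The precedence described above can be summarized in a few lines. This is a sketch of the stated rules only; the enum and function names are hypothetical and do not correspond to the actual LLVM code.

```cpp
enum class SectionHint { None, Hot, Cold };

// user_hot / user_cold: the function carries a hot / cold annotation.
// profile_hot / profile_cold: stand-ins for isFunctionHotInCallGraph and
// isFunctionColdInCallGraph returning true.
constexpr SectionHint section_hint(bool user_hot, bool user_cold,
                                   bool profile_hot, bool profile_cold) {
    if (user_hot || profile_hot) return SectionHint::Hot;    // rule (1)
    if (user_cold || profile_cold) return SectionHint::Cold; // rule (2)
    return SectionHint::None;
}
```

Because the hot check comes first, a hot annotation wins over a cold profile, and a hot profile wins over a cold annotation.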
Differential Revision: https://reviews.llvm.org/D92493
Add a special case for handling __builtin_mul_overflow with unsigned
inputs and a signed output to avoid emitting the __muloti4 library
call on x86_64. __muloti4 is not implemented in libgcc, so avoiding
this call fixes compilation of some programs that call
__builtin_mul_overflow with these arguments.
For example, this fixes the build of cpio with clang, which includes code from
gnulib that calls __builtin_mul_overflow with these argument types.
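A call of the affected shape might look like the sketch below, assuming 64-bit operands (the wrapper names are made up, and the builtin requires GCC or Clang). A portable check for the same condition is included for comparison.

```cpp
#include <climits>

// Two unsigned operands and a signed result. Returns true when the
// product fits in *out.
inline bool mul_fits_signed(unsigned long long a, unsigned long long b,
                            long long* out) {
    return !__builtin_mul_overflow(a, b, out);
}

// Portable reference: a * b fits in long long iff b == 0 or
// a <= LLONG_MAX / b.
constexpr bool product_fits_signed(unsigned long long a, unsigned long long b) {
    return b == 0 || a <= static_cast<unsigned long long>(LLONG_MAX) / b;
}
```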
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D84405
On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.
Differential Revision: https://reviews.llvm.org/D91974
If two variables are declared with __attribute__((section(name))) and
the implicit section types (e.g. read only vs writeable) conflict, an
error is raised. Extend this mechanism so that an error is raised if the
section type implied by a function's __attribute__((section)) conflicts
with that of another variable.
Commit 9e52c43090 removed the directive
defining LINE_1600 but left a string substitution to that variable in a
CHECK-NOT directive. This will make that CHECK-NOT directive always fail
to match, no matter the string.
This commit follows the pattern done in
9e52c43090 of simplifying the CHECK-NOT to
only look for the function name and the opening parenthesis, thereby not
requiring the LINE_1600 variable.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D93350
This change makes use of the llvm.vector.extract intrinsic to avoid
going through memory when performing bitcasts between vector-length
agnostic types and vector-length specific types.
Depends on D91362
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D92761
The `assume` attribute is a way to provide additional, arbitrary
information to the optimizer. For now, assumptions are restricted to
strings which will be accumulated for a function and emitted as comma
separated string function attribute. The key of the LLVM-IR function
attribute is `llvm.assume`. Similar to `llvm.assume` and
`__builtin_assume`, the `assume` attribute provides a user defined
assumption to the compiler.
A follow up patch will introduce an LLVM-core API to query the
assumptions attached to a function. We also expect to add more options,
e.g., expression arguments, to the `assume` attribute later on.
The `omp [begin] assumes` pragma will leverage this attribute and
expose the functionality in the absence of OpenMP.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D91979
Similar to D69312, and documented in D69839, the IRBuilder needs to add
the strictfp attribute to invoke instructions when constrained floating
point is enabled.
Differential Revision: https://reviews.llvm.org/D93134
Prior to this patch, Clang supported the following C/C++ intrinsics:
vceqz_p16
vceqzq_p16
vmlaq_n_f64
vmlsq_n_f64
... exposed through arm_neon.h. However, these intrinsics are not part
of the ACLE, allowing developers to write code that is not compatible
with other toolchains.
This patch removes these intrinsics.
There is a bug report capturing this issue here:
https://bugs.llvm.org/show_bug.cgi?id=47471
Reviewed By: bsmith
Differential Revision: https://reviews.llvm.org/D93206
This patch enables marshalling of the exception model options while enforcing their mutual exclusivity. The clang driver interface remains the same, this only affects the cc1 command line.
Depends on D93215.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D93216
(The clang build fails for me locally, so this is based on build bot output and a guess as to root cause.)
f5fe849 made the execution of LAA conditional, so I'm guessing that's the root cause.
Followup to D87604, having confirmed on PR47506 that we can use the llvm codegen expansion for fadd/fmul as well.
Differential Revision: https://reviews.llvm.org/D92940
Background: Calls to the library arithmetic functions for div are emitted by the
compiler, and it sets the wrong “C” calling convention for calls to these functions,
whereas the library functions are declared with the `spir_func` calling convention.
The InstCombine optimization replaces such calls with an “unreachable” instruction.
It looks like clang lacks a SPIRABIInfo class which should specify the default
calling conventions for “system” function calls. SPIR supports only the
SPIR_FUNC and SPIR_KERNEL calling conventions.
Reviewers: Erich Keane, Anastasia
Differential Revision: https://reviews.llvm.org/D92721
This patch adds vcmla and the rotated variants as defined in
"Arm Neon Intrinsics Reference for ACLE Q3 2020" [1]
The *_lane_* are still missing, but they can be added separately.
This patch only adds the builtin mapping for AArch64.
[1] https://developer.arm.com/documentation/ihi0073/latest
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D92930
This patch implements the AMX programming model that was discussed on llvm-dev
(http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html).
Thanks to Hal for the good suggestion on the RA. The fast RA is not in the patch yet.
This patch implements 7 components.
1. The C interface for the end user.
2. The AMX intrinsics in LLVM IR.
3. Transform load/store <256 x i32> to AMX intrinsics or split the
type into two <128 x i32>.
4. The lowering from AMX intrinsics to AMX pseudo instructions.
5. Insert pseudo ldtilecfg and build the def-use between ldtilecfg and AMX
instructions.
6. The register allocation for tile registers.
7. Morph AMX pseudo instructions to AMX real instructions.
Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0
Differential Revision: https://reviews.llvm.org/D87981
This test shows we're in some cases not getting strictfp information from
the AST. Correct that.
Differential Revision: https://reviews.llvm.org/D92596
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
This patch adds tests that showcase a behavior that is currently buggy.
Fix in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D91269
Instruction darn was introduced in ISA 3.0. It means 'Deliver A Random
Number'. The immediate number L means:
- L=0, the number is 32-bit (higher 32-bits are all-zero)
- L=1, the number is 'conditioned' (processed by hardware to reduce bias)
- L=2, the number is not conditioned, directly from noise source
GCC implements them in three separate intrinsics: __builtin_darn,
__builtin_darn_32 and __builtin_darn_raw. This patch implements the
same intrinsics. And this change also addresses Bugzilla PR39800.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92465
Emitting an error for use of a 128-bit integer inside device code was
already implemented in https://reviews.llvm.org/D74387. However,
the error is not emitted for SPIR64, because for SPIR64 hasInt128Type
returns true.
hasInt128Type is also used to control generation of certain 128-bit
predefined macros, initialization of predefined 128-bit integer types, and
building of 128-bit ArithmeticTypes. Except for predefined macros, only the
device target is considered; since the error is only emitted when a 128-bit
integer is used inside device code, the host target (auxtarget) also
needs to be considered.
The change addresses:
1. (SPIR.h) Correct hasInt128Type() for SPIR targets.
2. Sema.cpp and SemaOverload.cpp: Add an additional check to consider the host
target (auxtarget) when calling hasInt128Type, so that __int128_t
and __int128() are allowed and no error is emitted when they are used outside
device code.
3. SemaType.cpp: Add a check for SYCLIsDevice to delay the error message.
The error will be emitted if the 128-bit integer is used in device
code.
Reviewed By: Johannes Doerfert and Aaron Ballman
Differential Revision: https://reviews.llvm.org/D92439
Commit 6b1341eb fixed alignment for 128-bit FP types on PowerPC.
However, the quadword alignment adjustment shouldn't be applied to IBM
extended double (ppc_fp128 in IR) values.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92278
Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before the full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: fewer call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.
Note that this patch does not yet enable much in terms of post-always
inline function optimization.
Differential Revision: https://reviews.llvm.org/D91567
This change introduces a new clang switch `-fpseudo-probe-for-profiling` to enable AutoFDO with pseudo instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
One implication from pseudo-probe instrumentation is that the profile is now sensitive to CFG changes. We perform the pseudo instrumentation very early in the pre-LTO pipeline, before any CFG transformation. This ensures that the CFG instrumented and annotated is stable and optimization-resilient.
The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86502
Currently clang is not correctly retrieving from the AST the metadata for
constrained FP builtins. This patch fixes that for the non-target specific
builtins.
Differential Revision: https://reviews.llvm.org/D92122
This patch enables vector type arguments on AIX. All non-aggregate Altivec vector types are 16 bytes in size and are 16-byte aligned.
Reviewed By: Xiangling_L
Differential Revision: https://reviews.llvm.org/D92117
This code got quite twisted because we consider some MSVC builtins to be
target agnostic, and some to be target specific. Target specific
intrinsics have a pattern of doing up-front argument evaluation, while
general intrinsics do not evaluate their arguments up front. As we tried
to share codepaths between the target-specific and target-agnostic
handling, we ended up doing double evaluation.
Instead, have each target handle MSVC intrinsics consistently before up
front argument evaluation. This requires passing less data around and is
more consistent with target independent intrinsic handling.
See D50979 for past examples of this bug. I noticed this while looking
into adding some more intrinsics.
Differential Revision: https://reviews.llvm.org/D92061
The macro is emitted when targeting SVE code generation with the additional command line option `-msve-vector-bits=<N>`.
The behavior implied by the macro is described in sections "3.7.3.3. Behavior specific to SVE vectors" of the SVE ACLE (Version 00bet6) that can be found at https://developer.arm.com/documentation/100987/latest
Reviewed By: rengolin, rsandifo-arm
Differential Revision: https://reviews.llvm.org/D90956
Added support for the options mabi=vec-extabi and mabi=vec-default which are analogous to qvecnvol and qnovecnvol when using XL on AIX.
The extended Altivec ABI on AIX is enabled using mabi=vec-extabi in clang and vec-extabi in llc.
Reviewed By: Xiangling_L, DiggerLin
Differential Revision: https://reviews.llvm.org/D89684
Previously this option could be used to skip devirtualizations of the
given functions in regular LTO and in the ThinLTO indexing step. This
change allows them to be skipped in the backend as well, which is useful
when debugging WPD in a distributed ThinLTO backend.
Differential Revision: https://reviews.llvm.org/D91812
After commit 2482648a79, a GNU grep option
is just passed unconditionally to `grep` in general. This patch fixes
the test for platforms where `grep` is not GNU grep.
These tests invoke opt and llc even though they are in the frontend.
We now do a better job of generating commuted patterns for fma so
these tests now form fmls instead of fmla+fneg.
The dependency mechanism for C has been implemented, and we have rolled it out
to all internal users without seeing crashes, so we consider it stable
enough.
Differential Revision: https://reviews.llvm.org/D89046
This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already present, to ensure bitcode backwards compatibility.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D84345
For MASM syntax, the prefixes are not enclosed in braces.
The assembly code should look like:
"evex vcvtps2pd xmm0, xmm1"
Differential Revision: https://reviews.llvm.org/D90441
According to ELF v2 ABI, both IEEE 128-bit and IBM extended floating
point variables should be quad-word (16 bytes) aligned. Previously, only
vector types are considered aligned as quad-word on PowerPC.
This patch will fix incorrectness of IEEE 128-bit float argument in
va_arg cases.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D91596
This patch creates a SystemZ folder in clang/test/CodeGen to contain systemz-related lit tests.
Reviewed By: muiez
Differential Revision: https://reviews.llvm.org/D91628
Matrix types in memory are represented as arrays, but accessed through
vector pointers, with the alignment specified on the access operation.
For inline assembly, update pointer arguments to use vector pointers.
Otherwise there will be a mis-match if the matrix is also an
input-argument which is represented as vector.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D91631
This patch adds a new test case which uses a matrix value as memory
inline assembly argument. Currently the pointer element type does not
match the vector type.
arguments.
* Adds 'nonnull' and 'dereferenceable(N)' to 'this' pointer arguments
* Gates 'nonnull' on -f(no-)delete-null-pointer-checks
* Introduces this-nonnull.cpp and microsoft-abi-this-nullable.cpp tests to
explicitly test the behavior of this change
* Refactors hundreds of over-constrained clang tests to permit these
attributes, where needed
* Updates Clang12 patch notes mentioning this change
Reviewed-by: rsmith, jdoerfert
Differential Revision: https://reviews.llvm.org/D17993
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.annotations, which is generated using
__attribute__((annotate("_name"))) on functions in Clang.
This has been discussed on llvm-dev as part of
RFC: Combining Annotation Metadata and Remarks
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html
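For illustration, a minimal sketch of source that carries such an annotation when compiled with Clang. The function name `hot_path` and the annotation string `_remarks_demo` are hypothetical, and the `__has_attribute` guard is an assumption added only so the snippet also builds with compilers that lack the attribute.

```c
/* Expand to nothing when the compiler has no `annotate` attribute. */
#if defined(__has_attribute)
#  if __has_attribute(annotate)
#    define ANNOTATE(name) __attribute__((annotate(name)))
#  endif
#endif
#ifndef ANNOTATE
#  define ANNOTATE(name)
#endif

/* Hypothetical function tagged with a hypothetical annotation name. */
ANNOTATE("_remarks_demo")
int hot_path(int x) {
    return x * 2 + 1;
}
```

The annotation does not change what the function computes; it only attaches a name that later passes can pick up.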
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D91195
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bugreport was up for a significant amount of time,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.
I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.
This reverts commit 7bdad08429,
and some of its follow-ups, that don't stand on their own.
Make it required. Since it's a module pass, optnone won't test it, so
extend the clang test to also use opt-bisect now that it's supported.
14/16 check-dfsan tests failed with NPM enabled, now all pass.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D91385
We have option -mabi=ieeelongdouble to set current long double to
IEEEquad semantics. Like what GCC does, we need to define
__LONG_DOUBLE_IEEE128__ macro in this case, and __LONG_DOUBLE_IBM128__
if using PPCDoubleDouble.
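As a rough sketch of how user code might consume these macros, the hypothetical helper below selects a description at compile time. Only the two macro names come from this change; the function name and the returned strings are illustrative.

```c
/* Report which long double format the compiler advertises via the
   predefined macros. On targets that define neither macro, the last
   branch is taken. */
const char *long_double_format(void) {
#if defined(__LONG_DOUBLE_IEEE128__)
    return "IEEE 128-bit (IEEEquad)";
#elif defined(__LONG_DOUBLE_IBM128__)
    return "IBM extended double (PPCDoubleDouble)";
#else
    return "other (neither macro is defined)";
#endif
}
```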
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D90208
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Since callbacks may end up not adding passes, we need to check if the
pass managers are empty before adding them, so PassManager now has an
isEmpty() function. For example, polly adds callbacks but doesn't always
add passes in those callbacks, so this is necessary to keep
-debug-pass-manager tests' output from changing depending on if polly is
enabled or not.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
Reviewed By: asbirlea, Meinersbur
Differential Revision: https://reviews.llvm.org/D89158
Support a vector register constraint in inline asm of clang.
Add a regression test also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91251
This patch adds three intrinsics compatible to x86's SSE 4.1 on PowerPC
target, with tests:
- _mm_insert_epi8
- _mm_insert_epi32
- _mm_insert_epi64
The intrinsics implementation is contributed by Paul Clarke.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D89242
D86841 had an error where `for` statements with no conditional were
required to make progress. This is not true, so this patch removes that
line and adds regression tests.
Differential Revision: https://reviews.llvm.org/D91075
This reverts commit b1878b4641. This does
fix the test but it means that ac73b73c16 is not implemented
correctly. Reverting for now, and will be reverting the commit that
causes this to fail.
The strictfp metadata was added to the casting AST nodes in D85960, but
we aren't using that metadata yet. This patch adds that support.
In order to avoid lots of ad-hoc passing around of the strictfp bits I
updated the IRBuilder when moving from a function that has the Expr* to a
function that lacks it. I believe we should switch to this pattern to keep
the strictfp support from being overly invasive.
For the purpose of testing that we're picking up the right metadata, I
also made my tests use a pragma to make the AST's strictfp metadata not
match the global strictfp metadata. This exposes issues that we need to
deal with in subsequent patches, and I believe this is the right method
for most all of our clang strictfp tests.
Differential Revision: https://reviews.llvm.org/D88913
This test was added in 7f38812d5b
and all the other tests make use of the COMMONIR check. So I think
this was left in by mistake for this particular test.
Reviewed By: kpn
Differential Revision: https://reviews.llvm.org/D90921
For C++, the keyword __unaligned (a Microsoft extension) had no effect on pointers.
The reason for the difference between C and C++ with the keyword __unaligned:
For C, the method getAsCXXRecordDecl() returns nullptr, which guarantees that hasUnaligned() is called.
For C++, it is not guaranteed that hasUnaligned() is called and evaluated.
Here are some links:
The Bug: https://bugs.llvm.org/show_bug.cgi?id=47499
Thread on the cfe-dev mailing list: http://lists.llvm.org/pipermail/cfe-dev/2020-September/066783.html
Diff, that introduced the check hasUnaligned() in getNaturalTypeAlignment(): https://reviews.llvm.org/D30166
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D90630
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
In order to prevent TargetMachines from adding unnecessary optimization
passes at -O0, TargetMachine::registerPassBuilderCallbacks() will be
changed to take an OptimizationLevel, but that will be done separately.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89158
Since C++11, the C++ standard has a forward progress guarantee
[intro.progress], so all such functions must have the `mustprogress`
requirement. In addition, from C11 and onwards, loops without a non-zero
constant conditional or no conditional are also required to make
progress (C11 6.8.5p6). This patch implements these attribute deductions
so they can be used by the optimization passes.
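A small sketch of the two loop shapes that C11 6.8.5p6 distinguishes, as we read it. The function names are hypothetical, and both loops terminate for the inputs shown, so the example is safe to run.

```c
/* The controlling expression is not a constant expression and the body does
   no I/O, volatile access, or atomics, so under C11 6.8.5p6 the compiler may
   assume this loop terminates. Callers must pass n >= 1. */
unsigned collatz_steps(unsigned long n) {
    unsigned steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return steps;
}

/* The controlling expression is the non-zero constant 1, so the C11
   termination assumption does not cover this loop; it exits only through
   the explicit `break`. */
unsigned first_power_of_two_at_least(unsigned n) {
    unsigned p = 1;
    while (1) {
        if (p >= n)
            break;
        p *= 2;
    }
    return p;
}
```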
Differential Revision: https://reviews.llvm.org/D86841
Since glibc now supports math library functions conforming to IEEE 128-bit
floating point types on some platforms (like ppc64le), we can fix clang's
math builtins missing this type.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D90593
Add MMA builtin decoding. These builtins use the new PowerPC-specific types __vector_pair and __vector_quad.
So to avoid pervasive changes, we use custom type descriptors and custom decoding for these builtins.
We also use custom code generation to expand builtin calls with pointers to simpler intrinsic calls with non-pointer types.
Differential Revision: https://reviews.llvm.org/D81748
This test was removed in 5963e028e7 because it failed on cores where
support of constrained intrinsics was limited. Now this test is enabled
only on x86.
For PS4 development we support dllimport/export annotations in
source code. This patch enables the dllimport/export attributes
on PS4 by adding a new function to query the triple for whether
dllimport/export are used and using that function to decide
whether these attributes are supported. This replaces the current
method of checking if the target is Windows.
This means we can drop the use of "TargetArch" in the .td file
(which is an improvement as dllimport/export support isn't really
a function of the architecture).
I have included a simple codegen test to show that the attributes
are accepted and have an effect on codegen for PS4. I have also
enabled the DLLExportStaticLocal and DLLImportStaticLocal
attributes, which we support downstream. However, I am unable to
write a test for these attributes until other patches for PS4
dllimport/export handling land upstream. Whilst writing this
patch I noticed that, as these attributes are internal, they do
not need to be target specific (when these attributes are added
internally in Clang the target specific checks have already been
run); however, I think leaving them target specific is fine
because it isn't harmful and they "really are" target specific
even if that has no functional impact.
Differential Revision: https://reviews.llvm.org/D90442
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.
Additionally, always pass a default filename (in $cwd if a directory
name is not specified via the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.
Change the existing memprof tests to specify log_path=stderr when that
was being relied on.
Depends on D89086.
Differential Revision: https://reviews.llvm.org/D89087
Pragma 'clang fp' is extended to support a new option, 'exceptions'. It
allows specifying floating point exception behavior more flexibly.
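A minimal usage sketch, assuming the option takes the values `ignore`, `maytrap`, and `strict` and that the pragma is placed at the start of a compound statement. The function names are hypothetical; compilers other than Clang are expected to ignore the unknown pragma.

```c
/* Floating point operations in this block are treated as possibly trapping. */
double scale_maytrap(double a, double b) {
#pragma clang fp exceptions(maytrap)
    return a * 2.0 + b;
}

/* Floating point exceptions are ignored in this block. */
double scale_ignore(double a, double b) {
#pragma clang fp exceptions(ignore)
    return a * 2.0 + b;
}
```

Both functions compute the same value; the pragma only changes how much freedom the compiler has around floating point exceptions.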
Differential Revision: https://reviews.llvm.org/D89849
This patch mainly made the following changes:
1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpwssd/vpdpwssds instructions only use vex-encoding when the user explicitly adds the {vex} prefix.
Differential Revision: https://reviews.llvm.org/D89105
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
We don't currently support passing unnamed variadic SVE arguments
so I've added a fatal error if we hit such cases to prevent any
silent ABI issues in future.
Differential Revision: https://reviews.llvm.org/D90230
This patch is mainly doing two things:
1. Adding support for parentheses, making the combination of target features
more diverse;
2. Making the priority of ',' higher than that of '|' by default, which requires
some changes to the PTX builtin functions.
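The precedence rule can be sketched with a small recursive-descent evaluator. This is not Clang's implementation; it assumes ',' means "and" and '|' means "or", and every name below (`Parser`, `features_satisfied`, the feature strings) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *pos;             /* cursor into the requirement string */
    const char *const *enabled;  /* names of the enabled features */
    size_t count;                /* number of enabled features */
} Parser;

static bool parse_or(Parser *p);

/* True if the name [name, name+len) is in the enabled set. */
static bool is_enabled(const Parser *p, const char *name, size_t len) {
    for (size_t i = 0; i < p->count; ++i)
        if (strlen(p->enabled[i]) == len &&
            strncmp(p->enabled[i], name, len) == 0)
            return true;
    return false;
}

/* atom := NAME | '(' or-expression ')' */
static bool parse_atom(Parser *p) {
    if (*p->pos == '(') {
        ++p->pos;
        bool v = parse_or(p);
        if (*p->pos == ')')
            ++p->pos;
        return v;
    }
    const char *start = p->pos;
    while (*p->pos && !strchr(",|()", *p->pos))
        ++p->pos;
    return is_enabled(p, start, (size_t)(p->pos - start));
}

/* and-expression := atom (',' atom)*   (binds tighter than '|') */
static bool parse_and(Parser *p) {
    bool v = parse_atom(p);
    while (*p->pos == ',') {
        ++p->pos;
        bool rhs = parse_atom(p);
        v = v && rhs;
    }
    return v;
}

/* or-expression := and-expression ('|' and-expression)* */
static bool parse_or(Parser *p) {
    bool v = parse_and(p);
    while (*p->pos == '|') {
        ++p->pos;
        bool rhs = parse_and(p);
        v = v || rhs;
    }
    return v;
}

bool features_satisfied(const char *req, const char *const *enabled,
                        size_t count) {
    Parser p = { req, enabled, count };
    return parse_or(&p);
}
```

With this grammar, `a|b,c` reads as `a|(b,c)`, and parentheses are needed to express `(a|b),c`.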
Differential Revision: https://reviews.llvm.org/D89184
When passing -lto-embed-bitcode=post-merge-pre-opt, we were getting
empty .llvmcmd sections. It turns out that is because the
CodeGenOptions::CmdArgs field was only populated when clang saw
-fembed-bitcode={all|marker}.
This patch always populates the CodeGenOptions::CmdArgs. The overhead
of carrying through in memory in all cases is likely negligible in
the grand scheme of things, and it keeps the using code simple.
Differential Revision: https://reviews.llvm.org/D90366
llvm::EmbedBitcodeInModule needs (what used to be called) EmbedMarker
set, in order to emit .llvmcmd. EmbedMarker is really about embedding the
command line, so renamed the parameter accordingly, too.
This was not caught at test because the check-prefix was incorrect, but
FileCheck does not report that when multiple prefixes are provided. A
separate patch will address that.
Differential Revision: https://reviews.llvm.org/D90278
Define the __vector_pair and __vector_quad types that are used to manipulate
the new accumulator registers introduced by MMA on PowerPC. Because these two
types are specific to PowerPC, they are defined in a separate new file so it
will be easier to add other PowerPC specific types if we need to in the future.
Differential Revision: https://reviews.llvm.org/D81508
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.
Differential Revision: https://reviews.llvm.org/D90253
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
Prepend the module name hash with a fixed string ".__uniq." which helps tools
that consume sampled profiles and attribute it to functions to understand
that this symbol belongs to a unique internal linkage type symbol.
Symbols with suffixes can result from various optimizations in the compiler.
Function Multiversioning, function splitting, parameter constant propagation,
unique internal linkage names.
External tools like sampled profile aggregators combine profiles from multiple
runs of a binary. They use various heuristics with symbols that have suffixes
to try and attribute the profile to the right function instance. For instance
multi-versioned symbols like foo.avx, foo.sse4.2, etc even though different
should be attributed to the same source function if a single function is
versioned, using attribute target_clones (supported in GCC but yet to land in
LLVM). Similarly, functions that are split (split part having a .cold suffix)
could have profiles for both the original and split symbols but would be
aggregated and attributed to the original function that was split.
Unique internal linkage functions however have different source instances and
the aggregator must not put them together but attribute it to the appropriate
function instance. To be sure that we are dealing with a symbol of a unique
internal linkage function, we would like to prepend the hash with a known
string ".__uniq." which these tools can check to understand the suffix type.
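A minimal sketch of the check such a tool could perform. The helper names and the sample symbol names are hypothetical; only the ".__uniq." marker comes from this change.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the symbol carries the unique-internal-linkage marker. */
bool has_unique_linkage_suffix(const char *symbol) {
    return strstr(symbol, ".__uniq.") != NULL;
}

/* Length of the symbol up to (not including) the first '.', i.e. the part
   an aggregator might treat as the base function name. */
size_t base_name_length(const char *symbol) {
    return strcspn(symbol, ".");
}
```

An aggregator could merge `foo.cold` into `foo` but keep symbols for which `has_unique_linkage_suffix` returns true separate.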
Differential Revision: https://reviews.llvm.org/D89617
This allows using annotations in many more contexts than is currently possible,
especially when annotating with templates or constexpr.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D88645
For now, we lose the encoding information when using inline assembly.
The encoding for the inline assembly stays at the default even if we add
the vex/evex prefix.
Differential Revision: https://reviews.llvm.org/D90009
It's currently ambiguous in IR whether the source language explicitly
did not want a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.
It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) to
want to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.
While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
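To make the scenario concrete, here is a hedged sketch of a caller that opts out of stack protection and a callee that would normally receive a canary under -fstack-protector-strong. The function names are hypothetical, and the `__has_attribute` guard is an assumption added so the snippet builds with compilers that lack the attribute.

```c
#include <stddef.h>
#include <string.h>

/* Expand to nothing when the compiler lacks no_stack_protector. */
#if defined(__has_attribute)
#  if __has_attribute(no_stack_protector)
#    define NO_SSP __attribute__((no_stack_protector))
#  endif
#endif
#ifndef NO_SSP
#  define NO_SSP
#endif

/* Callee with a local buffer: a typical candidate for a stack canary. */
static int checksum(const char *s) {
    char buf[32];
    size_t n = strlen(s);
    if (n >= sizeof buf)
        n = sizeof buf - 1;
    memcpy(buf, s, n);
    buf[n] = '\0';
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += (unsigned char)buf[i];
    return sum;
}

/* Caller that must not get a canary. If checksum() were inlined here, its
   protector would come along, which is the case this change blocks. */
NO_SSP int early_boot_entry(const char *s) {
    return checksum(s);
}
```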
Fixes pr/47479.
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D87956
On AIX, to support vector types, which should always be 16 bytes aligned,
we set alloca to return 16 bytes aligned memory space.
Differential Revision: https://reviews.llvm.org/D89910
assembly operands."
Earlyclobbers are now excepted from this change (original commit: c78da03).
Review: Ulrich Weigand, Nick Desaulniers
Differential Revision: https://reviews.llvm.org/D87279
With -fbasicblock-sections=, let the front-end handle the case where the file
doesn't exist. The driver only checks if the option syntax is right.
Differential Revision: https://reviews.llvm.org/D89500
D70365 allows us to make attributes default. This is a follow up to
actually make nosync, nofree and willreturn default. The approach we
chose, for now, is to opt-in to default attributes to avoid introducing
problems to target specific intrinsics. Intrinsics with default
attributes can be created using `DefaultAttrsIntrinsic` class.
for which it matters.
This is a step towards separating checking for a constant initializer
(in which std::is_constant_evaluated returns true) and any other
evaluation of a variable initializer (in which it returns false).
Recently commit D78699 (commit 26cfb6e562), fixed clang's behavior with respect
to passing a union type through a register to correctly follow the ABI. However,
this is an ABI breaking change with earlier versions of the clang compiler, so we
should add an -fclang-abi-compat option to address this. Additionally, the PS4 ABI
requires the older behavior, so that is added as well.
This change adds a Ver11 value to the ClangABI enum that when it is set (or the
target is the PS4 triple), we skip the ABI fix introduced in D78699.
Differential Revision: https://reviews.llvm.org/D89747
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.
> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
This reverts commit 273c299d5d.
Old GCC used to aggressively fold VLAs to constant-bound arrays at block
scope in GNU mode. That's non-conforming, and more modern versions of
GCC only do this at file scope. Update Clang to do the same.
Also promote the warning for this from off-by-default to on-by-default
in all cases; more recent versions of GCC likewise warn on this by
default.
This is still slightly more permissive than GCC, as pointed out in
PR44406, as we still fold VLAs to constant arrays in structs, but that
seems justifiable given that we don't support VLA-in-struct (and don't
intend to ever support it), but GCC does.
Differential Revision: https://reviews.llvm.org/D89523
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.
This patch enables MemorySSA DSE again.
This reverts commit 915310bf14.
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).
To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.
Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
rL131311 added `asm()` support for builtin functions, but `asm()` for builtins with
specialized emitting (e.g. memcpy, various math functions) still do not work.
This patch makes these functions work for `asm()` and `#pragma redefine_extname`.
glibc uses `asm()` to redirect internal libc function calls to hidden aliases.
Limitation: such a function is a builtin in clang, but will not be recognized as
a libcall in optimization passes because Clang does not annotate the renamed
function as a libcall. In GCC -O1 or above, `abs` can be optimized out but we can't.
Additionally, we cannot redirect `__builtin_sin` to `real_sin` in the following example:
double sin(double x) asm("real_sin");
double f(double d) { return __builtin_sin(d); }
---
According to @rsmith, the following three statements cannot be simultaneously true:
(1) The frontend function foo has known, builtin semantics X.
(2) The symbol foo has known, builtin semantics X.
(3) It's not correct to lower a call to the frontend function foo to the symbol foo.
People do want (1) (if it is profitable to expand a memcpy, do it).
This also means that people do not want to add -fno-builtin-memcpy.
People do want (3): that is why they use asm("__GI_memcpy") in the first place.
So unfortunately we make a compromise by not refuting (2) (see the limitation above).
For most libcalls, there is a small loss because compilers don't synthesize them.
For the few glibc cares about, it uses `asm("memcpy = __GI_memcpy");` to make
the assembly level redirection.
(Changing function names (e.g. `__memcpy`) is a hit to ergonomics which is not acceptable).
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D88712
Prototype the newly proposed load_lane instructions, as specified in
https://github.com/WebAssembly/simd/pull/350. Since these instructions are not
available to origin trial users on Chrome stable, make them opt-in by only
selecting them from intrinsics rather than normal ISel patterns. Since we only
need rough prototypes to measure performance right now, this commit does not
implement all the load and store patterns that would be necessary to make full
use of the offset immediate. However, the full suite of offset tests is included
to make it easy to track improvements in the future.
Since these are the first instructions to have a memarg immediate as well as an
additional immediate, the disassembler needed some additional hacks to be able
to parse them correctly. Making that code more principled is left as future
work.
Differential Revision: https://reviews.llvm.org/D89366
Change EmitAsmStmt() to
- Not tie physregs with the "+r" constraint, but instead add the hard
register as an input constraint. This makes "+r" and "=r":"r" look the same
in the output.
Background: Macro intensive user code may contain inline assembly
statements with multiple operands constrained to the same physreg. Such a
case (with the operand constraints "+r" : "r") currently triggers the
TwoAddressInstructionPass assertion against any extra use of a tied
register. Furthermore, TwoAddress will insert a COPY to that physreg even
though isel has already done so (for the non-tied use), which may lead to a
second redundant instruction currently. A simple fix for this is to not
emit tied physreg uses in the first place for the "+r" constraint, which is
what this patch does.
- Give an error on multiple outputs to the same physical register.
This should be reported and this is also what GCC does.
Review: Ulrich Weigand, Aaron Ballman, Jennifer Yu, Craig Topper
Differential Revision: https://reviews.llvm.org/D87279
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the width of their
declared type. For example:
```
struct S1 {
short a : 1;
};
```
should be accessed using loads and stores of the width
of its declared type (sizeof(short)), whereas the compiler currently
loads only the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicts with the C and C++ object models by creating
data races on memory locations that are not part of the bit-field,
e.g.
```
struct S2 {
short a;
int b : 16;
};
```
Accessing `S2.b` would also access `S2.a`.
The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```
I've updated the patch D16586 to follow such behavior by verifying that we
only change volatile bit-field access when:
- it won't overlap with any other non-bit-field member
- we only access memory inside the bounds of the record
- it won't overlap with any zero-length bit-field.
The number of memory accesses should also be preserved; that will
be implemented by D67399.
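As a minimal sketch of the first two conditions above (this is not the code
in the patch; `member_range` and `can_use_declared_width` are hypothetical
names, offsets are in bytes, and the zero-length bit-field condition is not
modeled), the check can be thought of as an interval test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a non-bit-field member: byte offset and size. */
struct member_range {
    size_t offset;
    size_t size;
};

/* Do the half-open byte ranges [a, a+an) and [b, b+bn) overlap? */
bool ranges_overlap(size_t a, size_t an, size_t b, size_t bn) {
    return a < b + bn && b < a + an;
}

/* Returns true when a volatile bit-field may be accessed through a container
 * of its declared type located at container_offset: the container must stay
 * inside the record and must not touch any non-bit-field member. */
bool can_use_declared_width(size_t record_size,
                            size_t container_offset,
                            size_t container_size,
                            const struct member_range *members,
                            size_t member_count) {
    assert(members != NULL || member_count == 0);
    if (container_offset + container_size > record_size)
        return false;
    for (size_t i = 0; i < member_count; ++i) {
        if (ranges_overlap(container_offset, container_size,
                           members[i].offset, members[i].size))
            return false;
    }
    return true;
}
```

Assuming a 2-byte short and a 4-byte int, S1 (no other members, 2-byte
container) passes the check, while S2 fails because the int container of
`b` would cover the bytes of `a`.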
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D72932
Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences.
The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions.
Differential Revision: https://reviews.llvm.org/D87604
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.
As we've been establishing (see D88788/D88789), if there is a int<->ptr cast,
it basically must stay as-is, we can't do much with it.
I've looked, and the main source of such new casts being introduced,
as far as i can tell, is this transform, which, ironically,
tries to reduce the count of casts.
On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in
-33.58% less `IntToPtr`s (19014 -> 12629)
and +76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of +20.69% in total.
However just on RawSpeed, where i know there are basically
no `IntToPtr`s in the original source code,
this results in -99.27% less `IntToPtr`s (2724 -> 20)
and +82.92% more `PtrToInt`s (4513 -> 8255),
which is again an increase of +14.34% in total.
To me this does seem like the step in the right direction,
we end up with strictly less `IntToPtr`, but strictly more `PtrToInt`,
which seems like a reasonable trade-off.
See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.
(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)
Reviewed By: nlopes, nikic
Differential Revision: https://reviews.llvm.org/D88979
There doesn't seem to be a direct test of this, and I'm planning to make
future changes which will affect it.
I'm not particularly familiar with the blocks extension, so suggestions
for better tests are welcome.
Differential Revision: https://reviews.llvm.org/D88754
For example:
union M256 {
double d;
__m256 m;
};
extern void foo1(union M256 A);
union M256 m1;
void test() {
foo1(m1);
}
clang will pass m1 through the stack, which does not follow the ABI.
Differential Revision: https://reviews.llvm.org/D78699
Move it as an EP callback (-O[123]) or in addSanitizersAtO0.
This makes it not run in ThinLTO pre-link (like the other sanitizers),
so don't check LTO runs in hwasan-new-pm.c. Changing its position also
seems to change the generated IR. I think we just need to make sure the
pass runs.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D88936
SUMMARY:
In the IBM compiler xlclang, there is an option -fnovisibility which suppresses visibility. For more details see: https://www.ibm.com/support/knowledgecenter/SSGH3R_16.1.0/com.ibm.xlcpp161.aix.doc/compiler_ref/opt_visibility.html.
We need to add the option -mignore-xcoff-visibility for compatibility with the IBM AIX OS (as the option is enabled by default on AIX). With this option, LLVM does not emit any visibility attribute to the ASM or XCOFF object file.
The option only works on AIX; on any other OS, using the option reports an unsupported option error.
On AIX:
1.1 The option -mignore-xcoff-visibility is enabled by default if neither -fvisibility=* nor -mignore-xcoff-visibility is given explicitly in the clang command.
1.2 If -fvisibility=* is given explicitly but -mignore-xcoff-visibility is not, clang will generate visibility attributes.
1.3 If both -fvisibility=* and -mignore-xcoff-visibility are given explicitly, -mignore-xcoff-visibility wins and the visibility attribute is not emitted.
The option -mignore-xcoff-visibility has no effect on the visibility attribute when compiling with the -emit-llvm option to generate LLVM IR.
Reviewer: daltenty,Jason Liu
Differential Revision: https://reviews.llvm.org/D87451
Set the default alignment control variables for z/OS target and add test case for alignment rules on z/OS.
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D88845
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)
This canonicalization seems dubious.
Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.
I think it's pretty obvious that it is an undesirable outcome,
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-op, and are no longer eager to look past them.
Which e.g. means that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` are the same thing.
As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without the https://bugs.llvm.org/show_bug.cgi?id=47592 at least)
And we can't recover the situation post-inlining in instcombine.
So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means, this fold isn't helpful in exposing the passes
that are otherwise unaware of these patterns it produces.
Thusly, i propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so i'm not sure
how this plays out in the larger picture.
On vanilla llvm test-suite + RawSpeed, this results in
an increase of asm instructions and final object size by ~+0.05%,
decreases the final count of bitcasts by -4.79% (-28990),
of ptrtoint casts by -15.41% (-3423),
and of inttoptr casts by -25.59% (-6919, *sic*).
Overall, there are -0.04% less IR blocks and -0.39% less instructions.
See https://bugs.llvm.org/show_bug.cgi?id=47592
Differential Revision: https://reviews.llvm.org/D88789
This is one of the reasons for extra invalidations in D84959. In
practice, I don't think we have use cases needing this. This simplifies
the pipeline a bit and prunes corner cases when considering
invalidations.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D85676
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.
The cast from void* to _m128i* caused the IR generation to assume
the pointer was aligned.
Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
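For illustration only (this is portable C, not the IR emitted by the patch;
`store_u128_unaligned` is a hypothetical helper), a store through a void*
that makes no alignment assumption can be sketched with byte-wise address
arithmetic followed by memcpy:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store 16 bytes at base + byte_offset without assuming any alignment.
 * The unsigned char* arithmetic plays the role of an i8* GEP, and memcpy
 * plays the role of a store with align 1. */
void store_u128_unaligned(void *base, size_t byte_offset,
                          const uint64_t value[2]) {
    unsigned char *p = (unsigned char *)base + byte_offset;
    memcpy(p, value, 16);
}
```

Casting `p` to a vector pointer type and dereferencing it instead would let
the compiler assume the natural alignment of that type, which is the
problem described above.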
Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.
Reviewed By: steven.zhang, amyk
Differential Revision: https://reviews.llvm.org/D88278
This is an alternate fix (see D87835) for a bug where a NaN constant
gets wrongly transformed into Infinity via truncation.
In this patch, we uniformly convert any SNaN to QNaN while raising
'invalid op'.
But we don't have a way to directly specify a 32-bit SNaN value in LLVM IR,
so those are always encoded/decoded by calling convert from/to 64-bit hex.
See D88664 for a clang fix needed to allow this change.
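A minimal sketch of the failure mode and of quieting, working on raw bit
patterns (this is not the APFloat code; the helper names are hypothetical,
and it assumes the common IEEE-754 convention where the most significant
fraction bit set means "quiet", i.e. not the legacy MIPS encoding):

```c
#include <stdbool.h>
#include <stdint.h>

/* IEEE-754 binary32 layout: 1 sign bit, 8 exponent bits, 23 fraction bits. */
#define F32_EXP_MASK  0x7F800000u
#define F32_FRAC_MASK 0x007FFFFFu
#define F32_QUIET_BIT 0x00400000u

bool f32_is_nan(uint32_t bits) {
    return (bits & F32_EXP_MASK) == F32_EXP_MASK &&
           (bits & F32_FRAC_MASK) != 0;
}

bool f32_is_signaling(uint32_t bits) {
    return f32_is_nan(bits) && (bits & F32_QUIET_BIT) == 0;
}

/* Turn an SNaN into a QNaN; every other value is returned unchanged. */
uint32_t f32_quiet(uint32_t bits) {
    return f32_is_nan(bits) ? (bits | F32_QUIET_BIT) : bits;
}

/* Narrow a binary64 NaN pattern to binary32 by dropping the low 29 fraction
 * bits. If the payload lived only in those bits, the result is Infinity. */
uint32_t narrow_nan_naive(uint64_t bits) {
    uint32_t sign = (uint32_t)(bits >> 63) << 31;
    uint32_t frac = (uint32_t)((bits & 0x000FFFFFFFFFFFFFull) >> 29);
    return sign | F32_EXP_MASK | frac;
}

/* Same narrowing, but the quiet bit is always set, so the result is a NaN. */
uint32_t narrow_nan_quieting(uint64_t bits) {
    return narrow_nan_naive(bits) | F32_QUIET_BIT;
}
```

With the binary64 SNaN pattern 0x7FF0000000000001, the naive narrowing
yields 0x7F800000 (+Infinity), while the quieting variant yields 0x7FC00000.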
Differential Revision: https://reviews.llvm.org/D88238
This goes with the APFloat change proposed in
D88238.
This is copied from the MIPS-specific test in
builtin-nan-legacy.c to verify that the normal
behavior is correct on other targets without the
complication of an inverted quiet bit.
On some targets, preferred alignment is larger than ABI alignment in some cases. For example,
on AIX we have special power alignment rules which would cause that. Previously, to support
those cases, we added a “PreferredAlignment” field in the `RecordLayout` to store the AIX
special alignment values in “PreferredAlignment” as the community suggested.
However, that patch alone is not enough. There are places in Clang where `PreferredAlignment`
should have been used instead of the ABI-specified alignment. This patch is aimed at fixing those
spots.
Differential Revision: https://reviews.llvm.org/D86790
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
This happens in glibc's headers. It's important that we recognize these
functions so that we can mark them as returns_twice.
Differential Revision: https://reviews.llvm.org/D88518
GCC 7 introduced -fprofile-update={atomic,prefer-atomic} (prefer-atomic is for
best efforts (some targets do not support atomics)) to increment counters
atomically, which is exactly what we have done with -fprofile-instr-generate
(D50867) and -fprofile-arcs (b5ef137c11).
This patch adds the option to clang to surface the internal options at driver level.
GCC 7 also turned on -fprofile-update=prefer-atomic when -pthread is specified,
but it has performance regression
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307). So we don't follow suit.
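To illustrate the difference in source terms (a sketch only; the function
names are hypothetical and the relaxed memory ordering is an assumption,
not a statement about what the instrumentation emits), a plain counter
update is a separate load, add and store, while an atomic update is a
single read-modify-write:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Non-atomic update: concurrent increments from several threads can be lost. */
void count_plain(uint64_t *counter) {
    *counter += 1;
}

/* Atomic update: each increment is one indivisible read-modify-write. */
void count_atomic(_Atomic uint64_t *counter) {
    atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Read the current value of an atomically updated counter. */
uint64_t read_counter(_Atomic uint64_t *counter) {
    return atomic_load_explicit(counter, memory_order_relaxed);
}
```

The atomic form is what keeps counts exact in multi-threaded programs, at
the cost of a more expensive instruction per increment.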
Differential Revision: https://reviews.llvm.org/D87737
This reverts commit 55c4ff91bd.
Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
Move abstractMemberAccess and PreserveDIType passes as early as
possible, right after clang code generation.
Currently, compiler may transform the above code
p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
if (a) {
p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
bpf_probe_read(buf, buf_size, p2);
}
to
p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
if (a) {
bpf_probe_read(buf, buf_size, p2);
}
and eventually assembly code looks like
reloc_exist = 1;
reloc_member_offset = 10; //calculate member offset from base
p2 = base + reloc_member_offset;
if (reloc_exist) {
bpf_probe_read(buf, buf_size, p2);
}
If, during libbpf relocation resolution, reloc_exist is actually
resolved to 0 (not exist), the reloc_member_offset relocation cannot
be resolved and will be patched with an illegal instruction.
This will cause verifier failure.
This patch attempts to address this issue by doing chaining
analysis and replacing chains with special globals right
after clang code gen. This will remove the cse possibility
described above. The IR typically looks like
%6 = load @llvm.sk_buff:0:50$0:0:0:2:0
%7 = bitcast %struct.sk_buff* %2 to i8*
%8 = getelementptr i8, i8* %7, %6
for a particular address computation relocation.
But this transformation has another consequence, code sinking
may happen like below:
PHI = <possibly different @preserve_*_access_globals>
%7 = bitcast %struct.sk_buff* %2 to i8*
%8 = getelementptr i8, i8* %7, %6
For such cases, we will not be able to generate relocations since
multiple relocations are merged into one.
This patch introduces a passthrough builtin
to prevent such optimization. It looks like inline assembly has more
impact on optimization, e.g., inlining. Using passthrough has
less impact on optimizations.
A new IR pass is introduced at the beginning of target-dependent
IR optimization, which does:
- report fatal error if any reloc global in PHI nodes
- remove all bpf passthrough builtin functions
Changes for existing CORE tests:
- for clang tests, add "-Xclang -disable-llvm-passes" flags to
avoid builtin->reloc_global transformation so the test is still
able to check correctness for clang generated IR.
- for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command
before "llc" command since "opt" is needed to call newly-placed
builtin->reloc_global transformation. Add target triple in the IR
file since "opt" requires it.
- Since target triple is added in IR file, if a test may produce
different results for different endianness, two tests will be
created, one for bpfeb and another for bpfel, e.g., some tests
for relocation of lshift/rshift of bitfields.
- field-reloc-bitfield-1.ll has different relocations compared to
old code. This is because for the structure in the test,
new code returns struct layout alignment 4 while old code
returns 8. Align 8 is more precise and permits double load. With align 4,
the new mechanism uses 4-byte load, so it generates different
relocations.
- test intrinsic-transforms.ll is removed. This is used to test
cse on intrinsics so we do not lose metadata. Now metadata is attached
to global and not instruction, it won't get lost with cse.
Differential Revision: https://reviews.llvm.org/D87153
Instead of explicitly emitting a setc in the inline asm instructions,
we can use flag output. This allows the backend to use the flag
directly if it is needed by a branch. Previously we needed a test
instruction to convert the register back to a flag.
If the flag can't be used directly, the backend will emit a setcc.
Differential Revision: https://reviews.llvm.org/D87888
This patch legalizes the v256i1 and v512i1 types that will be used for MMA.
It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.
This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulators in their unprimed form and
allows the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.
Differential Revision: https://reviews.llvm.org/D84968
- `-cl-fp32-correctly-rounded-divide-sqrt` is an OpenCL-specific option
and `correctly-rounded-divide-sqrt-fp-math` should be added for OpenCL
at most.
Differential revision: https://reviews.llvm.org/D88303
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.
This patch temporarily switches back to the legacy DSE
implementation, while we investigate.
This reverts commit 9d172c8e9c.
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the IR tests to include the type before this can
be mandatory.
PAC/BTI-related codegen in the AArch64 backend is controlled by a set
of LLVM IR function attributes, added to the function by Clang, based
on command-line options and GCC-style function attributes. However,
functions, generated in the LLVM middle end (for example,
asan.module.ctor or __llvm_gcov_write_out) do not get any attributes
and the backend incorrectly does not do any PAC/BTI code generation.
This patch records the default state of PAC/BTI codegen in a set of
LLVM IR module-level attributes, based on command-line options:
* "sign-return-address", with non-zero value means generate code to
sign return addresses (PAC-RET), zero value means disable PAC-RET.
* "sign-return-address-all", with non-zero value means enable PAC-RET
for all functions, zero value means enable PAC-RET only for
functions, which spill LR.
* "sign-return-address-with-bkey", with non-zero value means use B-key
for signing, zero value means use A-key.
This set of attributes is always added for AArch64 targets (as
opposed, for example, to interpreting a missing attribute as having a
value 0) in order to be able to check for conflicts when combining
module attributes during LTO.
Module-level attributes are overridden by function level attributes.
All the decision making about whether or not to generate PAC and/or
BTI code is factored out into AArch64FunctionInfo, there shouldn't be
any places left, other than AArch64FunctionInfo, which directly
examine PAC/BTI attributes, except AArch64AsmPrinter.cpp, which
is/will-be handled by a separate patch.
Differential Revision: https://reviews.llvm.org/D85649
Adding this test so that I can extend it in a follow on patch with
expected IR for AIX when I implement complex handling in
AIXABIInfo.
Reviewed By: daltenty, ZarkoCA
Differential Revision: https://reviews.llvm.org/D88105
Add the ability to selectively instrument a subset of functions by dividing the functions into N logical groups and then selecting a group to cover. By selecting different groups over time you could cover the entire application incrementally with lower overhead than instrumenting the entire application at once.
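One way to picture the grouping (a sketch under assumptions: the hash
below is FNV-1a chosen purely for illustration, and `function_group` and
`should_instrument` are hypothetical names, not the ones used by the patch)
is to map each function name to a group index and instrument only the
functions whose index equals the selected group:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FNV-1a over the function name; a stand-in for whatever stable hash the
 * compiler actually uses. */
uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Map a function name to one of num_groups logical groups. */
uint32_t function_group(const char *name, uint32_t num_groups) {
    assert(num_groups > 0);
    return fnv1a(name) % num_groups;
}

/* Instrument a function only if it falls into the selected group. */
bool should_instrument(const char *name, uint32_t num_groups,
                       uint32_t selected_group) {
    return function_group(name, num_groups) == selected_group;
}
```

Because every name lands in exactly one group, building once per group
index covers every function exactly once across the builds.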
Differential Revision: https://reviews.llvm.org/D87953
This patch implements custom codegen for the vec_replace_elt and
vec_replace_unaligned builtins.
These builtins map to the @llvm.ppc.altivec.vinsw and @llvm.ppc.altivec.vinsd
intrinsics depending on the arguments. The main motivation for doing custom
codegen for these intrinsics is because there are float and double versions of
the builtin. Normally, converting the float to an integer would be done via
fptoui in the IR. This is incorrect as fptoui truncates the value and we must
ensure the value is not truncated. Therefore, we provide custom codegen to utilize
bitcast instead as bitcasts do not truncate.
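The difference between the two conversions can be seen in plain C (an
illustration only, assuming `float` is IEEE-754 binary32; `convert_value`
and `reinterpret_bits` are hypothetical names):

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Value conversion, analogous to fptoui: the fractional part is discarded. */
uint32_t convert_value(float f) {
    return (uint32_t)f;
}

/* Bit reinterpretation, analogous to bitcast: all 32 bits are preserved. */
uint32_t reinterpret_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For 1.5f, the value conversion gives 1, while the reinterpretation gives
0x3FC00000, the bit pattern that has to be inserted into the vector.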
Differential Revision: https://reviews.llvm.org/D83500
I believe the inline asm emitted here should have a memory clobber since it writes to memory.
It was also missing the dirflag clobber that we use by default along with flags and fpsr. To avoid missing defaults in the future, get the default list from the target.
Differential Revision: https://reviews.llvm.org/D88121
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.
Differential Revision: https://reviews.llvm.org/D87910
D87921 was reverted in commit b89059a313
as it was causing an unknown llvm PPC bot failure. Reapplying the patch
after confirming that this is not responsible. Build bot failure:
https://reviews.llvm.org/D87921#2286644 which caused the revert.
The wrong placement of add pass with optimizations led to
-funique-internal-linkage-names being disabled.
Fixed the placement of the MPM.addpass for UniqueInternalLinkageNames to make it
work correctly with -O2 and new pass manager. Updated the tests to explicitly
check O0 and O1.
Differential Revision: https://reviews.llvm.org/D87921
This completes the circle, complementing -lto-embed-bitcode
(specifically, post-merge-pre-opt). Using -thinlto-assume-merged skips
function importing. The index file is still needed for the other data it
contains.
Differential Revision: https://reviews.llvm.org/D87949
This patch implements the vector string isolate (predicate and non-predicate
versions) builtins. The predicate builtins are custom selected within PPCISelDAGToDAG.
Differential Revision: https://reviews.llvm.org/D87671
This patch implements the 128-bit vector divide extended builtins in Clang/LLVM.
These builtins map to the vdivesq and vdiveuq instructions respectively.
Differential Revision: https://reviews.llvm.org/D87729
Set the default wchar_t type on z/OS, and make it unsigned by default.
Reviewed By: hubert.reinterpretcast, fanbo-meng
Differential Revision: https://reviews.llvm.org/D87624
Fixed the placement of the MPM.addpass for UniqueInternalLinkageNames to make
it work correctly with -O2 and new pass manager. Updated the tests to
explicitly check O0 and O2.
Previously, the addPass was placed before BackendUtil.cpp#L1373 which is wrong
as MPM gets assigned at this point and any additions to the pass vector before
this is wrong. This change just moves it after MPM is assigned and places it at
a point where O0 and O0+ can share it.
Differential Revision: https://reviews.llvm.org/D87921
This patch implements the vec_gen[b|h|w|d|q]m function prototypes in altivec.h
in order to utilize the move to VSR with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82725
This switches to using DSE + MemorySSA by default again, after
fixing the issues reported after the first commit.
Notable fixes fc82006331, a0017c2bc2.
This reverts commit 3a59628f3c.
This patch implements the vec_cntm function prototypes in altivec.h in order to
utilize the vector count mask bits instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82726
Currently, x18 is assumed to be used as the pointer to the shadow call stack. Users shall pass
the flags:
"-fsanitize=shadow-call-stack -ffixed-x18"
Runtime support is needed to set up x18.
If SCS is desired, all parts of the program should be built with -ffixed-x18 to
maintain interoperability.
There's no particular reason that we must use x18 as the SCS pointer. Any register
may be used, as long as it does not already have a designated purpose, like RA or
passing call arguments.
Differential Revision: https://reviews.llvm.org/D84414
Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic.
To be conservative, the one-use check on the comparison is retained,
this may be relaxed if all goes well.
It's pretty likely that this will uncover places that are missing
handling for the abs() intrinsic. Please report any seen performance
regressions.
Differential Revision: https://reviews.llvm.org/D87188
Instead of relying on whether a certain identifier is a builtin, introduce BuiltinAttr to specify a declaration as having builtin semantics.
This fixes incompatible redeclarations of builtins, as reverting the identifier as being builtin due to one incompatible redeclaration would have broken rest of the builtin calls.
Mostly-compatible redeclarations of builtins also no longer have builtin semantics. They don't call the builtin nor inherit their attributes.
A long-standing FIXME regarding builtins inside a namespace enclosed in extern "C" not being recognized is also addressed.
Due to the more correct handling, attributes for builtin functions are added in more places, resulting in more useful warnings.
Tests are updated to reflect that.
Intrinsics without an inline definition in intrin.h had `inline` and `static` removed as they had no effect and caused them to no longer be recognized as builtins otherwise.
A pthread_create() related test is XFAIL-ed, as it relied on it being recognized as a builtin based on its name.
The builtin declaration syntax is too restrictive and doesn't allow custom structs, function pointers, etc.
It seems to be the only case and fixing this would require reworking the current builtin syntax, so this seems acceptable.
Fixes PR45410.
Reviewed By: rsmith, yutsumi
Differential Revision: https://reviews.llvm.org/D77491
This patch adds support for implicit casting between GNU vectors and SVE
vectors when `__ARM_FEATURE_SVE_BITS==N`, as defined by the Arm C
Language Extensions (ACLE, version 00bet5, section 3.7.3.3) for SVE [1].
This behavior makes it possible to use GNU vectors with ACLE functions
that operate on VLAT. For example:
typedef int8_t vec __attribute__((vector_size(32)));
vec f(vec x) { return svasrd_x(svptrue_b8(), x, 1); }
Tests are also added for implicit casting between GNU and fixed-length
SVE vectors created by the 'arm_sve_vector_bits' attribute. This
behavior makes it possible to use VLST with existing interfaces that
operate on GNUT. For example:
typedef int8_t vec1 __attribute__((vector_size(32)));
void f(vec1);
#if __ARM_FEATURE_SVE_BITS==256 && __ARM_FEATURE_SVE_VECTOR_OPERATORS
typedef svint8_t vec2 __attribute__((arm_sve_vector_bits(256)));
void g(vec2 x) { f(x); } // OK
#endif
The `__ARM_FEATURE_SVE_VECTOR_OPERATORS` feature macro indicates
interoperability with the GNU vector extension. This is the first patch
providing support for this feature, which once complete will be enabled
by the `-msve-vector-bits` flag, as the `__ARM_FEATURE_SVE_BITS` feature
currently is.
[1] https://developer.arm.com/documentation/100987/latest
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87607
This will embed bitcode after (Thin)LTO merge, but before optimizations.
In the case the thinlto backend is called from clang, the .llvmcmd
section is also produced. Doing so in the case where the caller is the
linker doesn't yet have a motivation, and would require plumbing through
command line args.
Differential Revision: https://reviews.llvm.org/D87636
We're now getting close to having the necessary analysis/combines etc. for the new generic llvm smax/smin/umax/umin intrinsics.
This patch updates the SSE/AVX integer MINMAX intrinsics to emit the generic equivalents instead of the icmp+select code pattern.
Differential Revision: https://reviews.llvm.org/D87603
This patch introduces the new .bb_addr_map section feature which allows us to emit the bits needed for mapping binary profiles to basic blocks into a separate section.
The format of the emitted data is represented as follows. It includes a header for every function:
| Address of the function | -> 8 bytes (pointer size)
| Number of basic blocks in this function (>0) | -> ULEB128
The header is followed by a BB record for every basic block. These records are ordered in the same order as MachineBasicBlocks are placed in the function. Each BB Info is structured as follows:
| Offset of the basic block relative to function begin | -> ULEB128
| Binary size of the basic block | -> ULEB128
| BB metadata | -> ULEB128 [ MBB.isReturn() OR MBB.hasTailCall() << 1 OR MBB.isEHPad() << 2 ]
The new feature will replace the existing "BB labels" functionality with -basic-block-sections=labels.
The .bb_addr_map section scrubs the specially-encoded BB symbols from the binary and makes it friendly to profilers and debuggers.
Furthermore, the new feature reduces the binary size overhead from 70% bloat to only 12%.
For more information and results please refer to the RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html
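The per-block record described above can be sketched as follows (an
illustration of the stated layout, not the AsmPrinter code; the function
names are hypothetical and the caller is assumed to provide at least 10
bytes per encoded value):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value as ULEB128 into out; returns the number of bytes written. */
size_t encode_uleb128(uint64_t value, uint8_t *out) {
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            byte |= 0x80; /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Pack the three flags: bit 0 = return, bit 1 = tail call, bit 2 = EH pad. */
uint64_t bb_metadata(bool is_return, bool has_tail_call, bool is_eh_pad) {
    return (uint64_t)is_return |
           ((uint64_t)has_tail_call << 1) |
           ((uint64_t)is_eh_pad << 2);
}

/* Emit one BB record: offset, size, metadata, each as ULEB128. */
size_t encode_bb_record(uint64_t offset, uint64_t size, uint64_t metadata,
                        uint8_t *out) {
    size_t n = encode_uleb128(offset, out);
    n += encode_uleb128(size, out + n);
    n += encode_uleb128(metadata, out + n);
    return n;
}
```

A block at offset 0x10 with size 300 that ends in a return is encoded in
four bytes: 0x10, 0xAC 0x02, 0x01. Small offsets and sizes fit in one or
two bytes each.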
Reviewed By: MaskRay, snehasish
Differential Revision: https://reviews.llvm.org/D85408
avx512-reduceIntrin.c wasn't bothering with the exhaustive alloca/store/load/bitcast checks and avx512-reduceMinMaxIntrin.c shouldn't need to either.
This makes it a lot easier to maintain as the update script still doesn't work properly on x86 targets.
gcov is an "Edge Profiling with Edge Counters" application according to
Optimally Profiling and Tracing Programs (1994).
The minimum number of counters necessary is |E|-(|V|-1). The unmeasured edges
form a spanning tree. Both GCC --coverage and clang -fprofile-generate leverage
this optimization. This patch implements the optimization for clang --coverage.
The produced .gcda files are much smaller now.
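A minimal sketch of the counting argument (not the GCOVProfiling code; it
builds an arbitrary spanning tree with union-find, treats the CFG as
undirected and connected, and uses hypothetical names and a fixed
MAX_NODES bound):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64

struct edge {
    int src;
    int dst;
};

/* Union-find lookup with path halving. */
static int find_root(int *parent, int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

/* Sets needs_counter[i] to 0 for spanning-tree edges (left unmeasured) and
 * to 1 for every other edge. Returns the number of counters required. */
size_t select_counter_edges(int num_nodes, const struct edge *edges,
                            size_t num_edges, int *needs_counter) {
    int parent[MAX_NODES];
    size_t counters = 0;
    assert(num_nodes > 0 && num_nodes <= MAX_NODES);
    for (int v = 0; v < num_nodes; ++v)
        parent[v] = v;
    for (size_t i = 0; i < num_edges; ++i) {
        int a = find_root(parent, edges[i].src);
        int b = find_root(parent, edges[i].dst);
        if (a != b) {
            parent[a] = b;        /* edge joins two components: tree edge */
            needs_counter[i] = 0;
        } else {
            needs_counter[i] = 1; /* edge closes a cycle: needs a counter */
            ++counters;
        }
    }
    return counters;
}
```

For a diamond-shaped CFG with 4 blocks and 4 edges this selects a single
counter, matching |E|-(|V|-1) = 4-3. The counts of the unmeasured tree
edges can then be recovered from flow conservation at each block.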
i.e. change the work flow from
* .gcno for function A
* .gcno for function B
* .gcno for function C
* .gcda for function A
* .gcda for function B
* .gcda for function C
to
* .gcno for function A
* .gcda for function A
* .gcno for function B
* .gcda for function B
* .gcno for function C
* .gcda for function C
Currently there is duplicate logic in .gcno & .gcda processing: how functions
are filtered, which edges are instrumented, etc. This refactor enables simplification.
Since we always process .gcno, in -fprofile-arcs -fno-test-coverage mode,
__llvm_internal_gcov_emit_function_args.0 will have non-zero checksums.
After the recent discussion on cfe-dev 'Can indirect class parameters be
noalias?' [1], it seems like using noalias is problematic for
current C++, but should be allowed for C-only code.
This patch introduces a new option to let the user indicate that it is
safe to mark indirect class parameters as noalias. Note that this also
applies to external callers, e.g. it might not be safe to use this flag
for C functions that are called by C++ functions.
In targets that allocate indirect arguments in the called function, this
enables more aggressive optimizations with respect to memory operations
and brings a ~1% - 2% codesize reduction for some programs.
[1] : http://lists.llvm.org/pipermail/cfe-dev/2020-July/066353.html
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D85473
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
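For context, one source-level construct that produces an alignment assumption is `__builtin_assume_aligned`, which both GCC and Clang provide. The sketch below only shows the C side; the function name `sum_aligned` is made up for the example, and the IR that Clang emits for it depends on the compiler version and is not reproduced here.

```c
#include <stddef.h>

/* Sums n floats and tells the compiler that data is 16-byte aligned.
 * The behavior is undefined if the caller passes a pointer that is not. */
float sum_aligned(const float *data, size_t n) {
    const float *p = __builtin_assume_aligned(data, 16);
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}
```

The builtin returns its first argument unchanged; its only effect is to hand the optimizer the alignment fact.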
d4ce862f introduced HasStrictFP to disable generating constrained FP
operations for platforms lacking support. Since work for enabling
constrained FP on PowerPC is almost done, we'd like to enable it.
Reviewed By: kpn, steven.zhang
Differential Revision: https://reviews.llvm.org/D87223
The tests have been updated and I plan to move them from the MSSA
directory up.
Some end-to-end tests needed small adjustments. One difference to the
legacy DSE is that legacy DSE also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.
One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not handling instructions that
may throw correctly in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat its return value as
belonging to the same underlying object as the passed pointer.
There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.
This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html
For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores
goes from ~17500 (legacy DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.
Impact on CTMark:
```
Legacy Pass Manager
                        exec instrs    size-text
O3                      + 0.60%        - 0.27%
ReleaseThinLTO          + 1.00%        - 0.42%
ReleaseLTO-g.           + 0.77%        - 0.33%
RelThinLTO (link only)  + 0.87%        - 0.42%
RelLO-g (link only)     + 0.78%        - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
New Pass Manager
                        exec instrs.   size-text
O3                      + 0.95%        - 0.25%
ReleaseThinLTO          + 1.34%        - 0.41%
ReleaseLTO-g.           + 1.71%        - 0.35%
RelThinLTO (link only)  + 0.96%        - 0.41%
RelLO-g (link only)     + 2.21%        - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
Reviewed By: asbirlea, xbolva00, nikic
Differential Revision: https://reviews.llvm.org/D87163
There are still plenty of tests that specify x86 as a triple but most shouldn't be doing anything very target specific - we can move any ones that I have missed on a case by case basis.
In the standard C library, both rint and nearbyint return the result rounded
according to the current rounding mode, but nearbyint never raises the inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.
One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87220
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the width of their
declared type. In such a case:
```
struct S1 {
  short a : 1;
};
```
`a` should be accessed using loads and stores of the container width
(sizeof(short)), whereas the compiler currently loads only
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.
```
struct S2 {
  short a;
  int b : 16;
};
```
Accessing `S2.b` would also access `S2.a`.
The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```
Patch D16586 was updated to follow such behavior by verifying that we
only change volatile bit-field access when:
- it won't overlap with any other non-bit-field member
- we only access memory inside the bounds of the record
- it won't overlap with any zero-length bit-field.
Preserving the number of memory accesses will be
implemented by D67399.
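The first two conditions above can be modelled with plain byte ranges. The sketch below is an assumption-laden illustration and not the logic in the patch: `byte_range` and `can_use_container_access` are hypothetical names, offsets are in bytes, and zero-length bit-fields are left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical half-open byte range [begin, end) inside a record. */
struct byte_range { size_t begin, end; };

static bool ranges_overlap(struct byte_range a, struct byte_range b) {
    return a.begin < b.end && b.begin < a.end;
}

/* Returns true when a volatile bit-field may be accessed through its full
 * declared-type container: the container stays inside the record and does
 * not touch any of the listed non-bit-field members. */
bool can_use_container_access(struct byte_range container, size_t record_size,
                              const struct byte_range *members, size_t count) {
    if (container.end > record_size)
        return false;
    for (size_t i = 0; i < count; ++i)
        if (ranges_overlap(container, members[i]))
            return false;
    return true;
}
```

Assuming the usual layout where `short` is 2 bytes and `int` is 4, S1's container is bytes [0, 2) of a 2-byte record and qualifies, while the `int` container of `S2.b` is bytes [0, 4) and overlaps `S2.a` at [0, 2), so the narrower access has to be kept.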
Differential Revision: https://reviews.llvm.org/D72932
The following people contributed to this patch:
- Diogo Sampaio
- Ties Stuij
Discussed with @craig.topper and @spatel - this is to try and tidy up the codegen folder and move the x86 specific tests (as opposed to general tests that just happen to use x86 triples) into subfolders. It's up to other targets if they follow suit.
It also helps speed up test iterations as using wildcards on lit commands often misses some filenames.
We're now getting close to having the necessary analysis/combines etc. for the new generic llvm.abs.* intrinsics.
This patch updates the SSE/AVX ABS vector intrinsics to emit the generic equivalents instead of the icmp+sub+select code pattern.
Differential Revision: https://reviews.llvm.org/D87101
This patch implements the vec_expandm function prototypes in altivec.h in order
to utilize the vector expand with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82727
These overloads are listed in appendix A of the ELFv2 ABI specification
without a requirement for ISA 3.0. So these need to be available on
all Altivec-capable architectures. The implementation in altivec.h
erroneously had them guarded for Power9 due to the availability of
the VCMPNE[BHW] instructions. However these need to be implemented
in terms of the VCMPEQ[BHW] instructions on older architectures.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47423
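The relationship between the two compares can be shown with a scalar model of a 16-lane byte vector. This is only a sketch of the idea (not-equal is the bitwise complement of equal); the function names are invented here and the actual altivec.h implementation may be spelled differently.

```c
#include <stdint.h>

#define LANES 16

/* Lane-wise equality in the style of a vector compare-equal:
 * all-ones where the lanes match, zero otherwise. */
void cmpeq_u8(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (a[i] == b[i]) ? 0xFF : 0x00;
}

/* Not-equal built from equality followed by a bitwise complement,
 * which is what targets without a compare-not-equal instruction need. */
void cmpne_u8(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    cmpeq_u8(a, b, out);
    for (int i = 0; i < LANES; ++i)
        out[i] = (uint8_t)~out[i];
}
```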
The load builtins in altivec.h do not have const in the signature
for the pointer parameter. This prevents using them for loading
from constant pointers. A notable case for such a use is Eigen.
This patch simply adds the missing const.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47408
The __ARM_FEATURE_SVE_BITS feature macro is specified in the Arm C
Language Extensions (ACLE) for SVE [1] (version 00bet5). From the spec,
where __ARM_FEATURE_SVE_BITS==N:
When N is nonzero, indicates that the implementation is generating
code for an N-bit SVE target and that the arm_sve_vector_bits(N)
attribute is available.
This was defined in D83550 as __ARM_FEATURE_SVE_BITS_EXPERIMENTAL and
enabled under the -msve-vector-bits flag to simplify initial tests.
This patch drops _EXPERIMENTAL now there is support for the feature.
[1] https://developer.arm.com/documentation/100987/latest
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D86720
This patch implements the builtins for Vector Multiply Builtins (vmulxxd family of instructions), and adds the appropriate test cases for these builtins. The builtins utilize the vector multiply instructions introduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83955
As a prerequisite to doing experimental builds of pieces of FreeBSD PowerPC64 as little-endian, allow actually targeting it.
This is needed so basic platform definitions are pulled in. Without it, the compiler will only run freestanding.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D73425
This patch implements the builtins for Vector Load with Zero and Signed Extend Builtins (lxvr_x for b, h, w, d), and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions introduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D82502#inline-797941
This relands D85743 with a fix for test
CodeGen/attr-arm-sve-vector-bits-call.c that disables the new pass
manager with '-fno-experimental-new-pass-manager'. Test was failing due
to IR differences with the new pass manager which broke the Fuchsia
builder [1]. Reverted in 2e7041f.
[1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375
Original summary:
This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.
VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:
* Implicit casting between VLA <-> VLS types.
* Coercion of VLS types in function args/return.
* Mangling of VLS types.
Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.
Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.
The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as:
__SVE_VLS<typename, unsigned>
where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:
    #if __ARM_FEATURE_SVE_BITS==512
    // Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
    typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
    // Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
    typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
    #endif
The latest ACLE specification (00bet5) does not contain details of this
mangling scheme; it will be specified in the next revision. The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.
[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85743
It's not undefined behavior for an unsigned left shift to overflow (i.e. to
shift bits out), but it has been the source of bugs and exploits in certain
codebases in the past. As we do in other parts of UBSan, this patch adds a
dynamic checker which acts beyond UBSan and checks other sources of errors. The
option is enabled as part of -fsanitize=integer.
The flag is named: -fsanitize=unsigned-shift-base
This matches shift-base and shift-exponent flags.
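The condition being diagnosed can be modelled in portable C. The helper below is a sketch under stated assumptions, not the check that the sanitizer emits: `shl_loses_bits_u32` is a made-up name, it handles only 32-bit bases, and it treats out-of-range exponents as a separate problem (the one shift-exponent covers).

```c
#include <stdbool.h>
#include <stdint.h>

/* True when base << exponent would shift at least one set bit out of a
 * 32-bit value. Exponents of 32 or more are not handled here and are
 * reported as false. */
bool shl_loses_bits_u32(uint32_t base, unsigned exponent) {
    if (exponent == 0 || exponent >= 32)
        return false;
    /* The top `exponent` bits are the ones that would be discarded. */
    return (base >> (32 - exponent)) != 0;
}
```

For example, shifting 0x80000000 left by 1 discards the top bit, while shifting 1 left by 31 does not discard anything.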
<rdar://problem/46129047>
Differential Revision: https://reviews.llvm.org/D86000
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a leftover from an
implementation lacking a bf16 IR type).
The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.
This patch makes the generated IR cleaner (fewer useless bitcasts are
produced), but it does not affect the final assembly.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D86146
This patch adds type information for SVE ACLE vector types,
by describing them as vectors, with a lower bound of 0, and
an upper bound described by a DWARF expression using the
AArch64 Vector Granule register (VG), which contains the
runtime multiple of 64-bit granules in an SVE vector.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86101
This patch implements the function prototypes vec_mulh and vec_dive in order to
utilize the vector multiply high (vmulh[s|u][w|d]) and vector divide extended
(vdive[s|u][w|d]) instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82609
Support -march=sapphirerapids for x86.
Compared with Icelake Server, it includes 14 new features. They are
amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote,
enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86503
As suggested by @rsmith on PR47267, by replacing the __builtin_memcpy bitcast pattern with __builtin_bit_cast we can use _castf32_u32, _castu32_f32, _castf64_u64 and _castu64_f64 inside constant expressions (constexpr). Although __builtin_bit_cast was added for C++20, it works in all clang C/C++ modes.
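For reference, the memcpy-style bitcast that this change moves away from looks roughly like the sketch below. The function names are invented for the example and are not the header's definitions; plain `memcpy` is used instead of `__builtin_bit_cast` so the snippet also builds with compilers that lack that builtin in C mode.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterprets the bits of a float as a 32-bit unsigned integer. */
uint32_t cast_f32_u32(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Reinterprets a 32-bit unsigned integer as a float. */
float cast_u32_f32(uint32_t bits) {
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Assuming IEEE 754 single precision, `cast_f32_u32(1.0f)` yields 0x3F800000. The memcpy form gives the same result at run time but cannot be evaluated in a C++ constant expression, which is the limitation the builtin removes.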
Differential Revision: https://reviews.llvm.org/D86398
Currently ConstantExpr::getWithOperands does not handle FNeg and
subsequently treats FNeg as a binary operator, leading to an assertion
failure or segmentation fault if built without assertions.
Originally I reproduced this with llvm-dis on a bitcode file, which I
unfortunately cannot share and also cannot really reduce.
But PR45426 describes the same issue and has a reproducer with Clang, so
I'll go with that.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86274
Commit dbcfbffc adds ppc.readflm and ppc.setflm intrinsics to read or
write FPSCR register. This patch adds them to Clang.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D85874
This is a first step patch to enable constexpr support and testing to a large number of x86 intrinsics.
All I've done here is provide a DEFAULT_FN_ATTRS_CONSTEXPR variant to our existing DEFAULT_FN_ATTRS tag approach that adds constexpr on c++ builds. The clang cuda headers do something similar.
I've started with POPCNT mainly because it's tiny and its intrinsics are wrappers around generic __builtin_* intrinsics which already act as constexpr.
Differential Revision: https://reviews.llvm.org/D86229
This adds parsing and codegen support for tune in target attribute.
I've implemented this so that arch in the target attribute implicitly disables tune from the command line. I'm not sure what gcc does here, but since -march implies -mtune, I assume 'arch' in the target attribute implies tune in the target attribute.
Differential Revision: https://reviews.llvm.org/D86187
This patch implements the vec_extractm function prototypes in altivec.h in
order to utilize the vector extract with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82675
Pin the test to use -enable-npm-optnone.
Before, optnone wasn't implemented under NPM, so the LPM and NPM runs produced different IR. Now with -enable-npm-optnone, that is no longer necessary.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D86008
This fails due to the clang invocation running at -O0, producing an optnone function.
Then even with -O2 in the later invocations, LoopVectorizePass doesn't run on the optnone function.
So split this into an -O0 run and an -O2 run.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86011
When casting an enumeration with a fixed bool underlying type, the cast should use
IntegralToBoolean instead of IntegralCast, as required by Core
Issue 2338.
Fixes PR47055: Incorrect codegen for enum with bool underlying type
Differential Revision: https://reviews.llvm.org/D85612
This was done by turning on -enable-npm-optnone and fixing failures.
That will be enabled in a follow-up change for ease of reverting.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D85457
These functions won't ever unwind. This is useful for MemorySanitizer
as it simplifies handling __atomic_load in particular.
Differential Revision: https://reviews.llvm.org/D85573
This patch implements the builtins for the vector shifts (shl, srl, sra), and
adds the appropriate test cases for these builtins. The builtins utilize the
vector shift instructions introduced within ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83338
This recommits the following patches now that D85684 has landed
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
COFF targets have a max object alignment of 8192, so trying to create
one with a larger alignment results in an unreachable in WinCOFFObjectWriter.
The reproducer I have uses thread-local storage; however, other
alignments are likely affected as well.
This patch sets the MaxVectorAlign for COFF to 8192. Additionally,
though there is no longer a way to reproduce that I could find, it
correctly sets the MaxTLSAlign for COFF to that value as well, so that
if anyone comes up with a situation where this is true, it will cause an
error.
Differential Revision: https://reviews.llvm.org/D85543
When we use mask compare intrinsics under strict FP option, the masked
elements shouldn't raise any exception. So, we can't replace the
intrinsic with a full compare + "and" operation.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D85385
Fixes pr/11710.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Resubmit after breaking Windows and OSX builds.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D80242
This patch adds the missing information to the LF_BUILDINFO record, which allows for rebuilding a .CPP without any external dependency but the .OBJ itself (other than the compiler).
Some external tools that we are using (Recode, Live++) are extracting the information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO stores a full path to the compiler, the PWD (CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variables). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
If the CPU string is empty, the target feature map may end up having
an empty string inserted into it. The symptom of the problem is a warning
message:
'+' is not a recognized feature for this target (ignoring feature)
Also, the target-features attribute in the module will have an empty
string in it.
This patch adds noundef to return value and arguments of standard I/O functions.
With this patch, passing undef or poison to the functions becomes undefined
behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++,
passing undef to them was already UB in source.
With this patch, the functions cannot return undef or poison either.
According to the C17 standard, ungetc/ungetwc/fgetpos/ftell can generate unspecified
value; 3.19.3 says unspecified value is a valid value of the relevant type,
and using unspecified value is unspecified behavior, which is not UB, so it
cannot be undef (using undef is UB when e.g. it is used at branch condition).
— The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10).
— The details of the value stored by the fgetpos function (7.21.9.1).
— The details of the value returned by the ftell function for a text stream (7.21.9.4).
In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85345
This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D84622
The _ExtInt(1) in getTypeForMem was hitting the bool logic for expanding
to an 8 bit value. The result was an assert, or store i1 %0, i8* %2, align 1
since the parameter IS an i1. This patch changes the 'forMem' test to
exclude ext-int from the bool test.
This patch simplified IR generation for __builtin_btf_type_id().
For __builtin_btf_type_id(obj, flag), the IR builtin previously
looked like
    if (obj is an lvalue)
        llvm.bpf.btf.type.id(obj.ptr, 1, flag) !type
    else
        llvm.bpf.btf.type.id(obj, 0, flag) !type
The purpose of the 2nd argument is to differentiate
__builtin_btf_type_id(obj, flag) where obj is a lvalue
vs.
__builtin_btf_type_id(obj.ptr, flag)
Note that obj or obj.ptr is never used by the backend
and the `obj` argument is only used to derive the type.
This code sequence is subject to potential llvm CSE when
- obj is the same, e.g., nullptr
- flag is the same
- metadata type is different, e.g., typedef of struct "s"
and struct "s".
In the above, we don't want CSE since their metadata is different.
This patch changes the IR builtin to
    llvm.bpf.btf.type.id(seq_num, flag) !type
and seq_num is always increasing. This will prevent potential
llvm CSE.
Also report an error if the type name is empty for
remote relocation since remote relocation needs non-empty
type name to do relocation against vmlinux.
Differential Revision: https://reviews.llvm.org/D85174
This allows people to use `int8_t` instead of `char`, -funsigned-char,
and generally decouples SIMD from the specialness of `char`.
And it makes intrinsics like `__builtin_wasm_add_saturate_s_i8x16`
and `__builtin_wasm_add_saturate_u_i8x16` use signed and unsigned
element types, respectively.
Differential Revision: https://reviews.llvm.org/D85074
This patch added the following additional compile-once
run-everywhere (CO-RE) relocations:
- existence/size of typedef, struct/union or enum type
- enum value and enum value existence
These additional relocations will make CO-RE bpf programs more
adaptive for potential kernel internal data structure changes.
For existence/size relocations, the following two code patterns
are supported:
1. uint32_t __builtin_preserve_type_info(*(<type> *)0, flag);
2. <type> var;
uint32_t __builtin_preserve_field_info(var, flag);
flag = 0 for existence relocation and flag = 1 for size relocation.
For enum value existence and enum value relocations, the following code
pattern is supported:
    uint64_t __builtin_preserve_enum_value(*(<enum_type> *)<enum_value>,
                                           flag);
flag = 0 means existence relocation and flag = 1 means enum value
relocation. In the above, <enum_type> can be an enum type or
a typedef to enum type. The <enum_value> needs to be an enumerator
value from the same enum type. The return type is uint64_t to
permit potential 64bit enumerator values.
Differential Revision: https://reviews.llvm.org/D83242
In order to follow NEC Aurora SX VE ABI correctly, change to sign/zero
extend integer arguments and return values smaller than 64 bits in clang.
Also update regression test.
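What sign and zero extension do to a value narrower than 64 bits can be modelled with ordinary C conversions. This is a sketch of the resulting 64-bit register image only; the function names are made up, and the authoritative rules are in the VE ABI document rather than in this example.

```c
#include <stdint.h>

/* Widens a signed 32-bit value to 64 bits by replicating the sign bit. */
uint64_t widen_signed32(int32_t value) {
    return (uint64_t)(int64_t)value;
}

/* Widens an unsigned 32-bit value to 64 bits by filling with zeros. */
uint64_t widen_unsigned32(uint32_t value) {
    return (uint64_t)value;
}
```

The same 32-bit pattern 0xFFFFFFFF therefore becomes 0xFFFFFFFFFFFFFFFF when it is a signed -1 and 0x00000000FFFFFFFF when it is unsigned, which is why the extension kind has to follow the argument's type.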
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D85071
Specified in https://github.com/WebAssembly/simd/pull/237, these
instructions load the first vector lane from memory and zero the other
lanes. Since these instructions are not officially part of the SIMD
proposal, they are only available on an opt-in basis via LLVM
intrinsics and clang builtin functions. If these instructions are
merged to the proposal, this implementation will change so that the
instructions will be generated from normal IR. At that point the
intrinsics and builtin functions would be removed.
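The semantics described above (first lane from memory, remaining lanes zero) can be modelled in scalar C for the 32-bit variant. The struct and function below are hypothetical stand-ins for illustration and are not the clang builtins added by this change.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector viewed as four 32-bit lanes. */
struct v128_i32x4 { uint32_t lane[4]; };

/* Models v128.load32_zero: lane 0 is read from memory, lanes 1..3 are
 * set to zero. memcpy is used so the address may be unaligned. */
struct v128_i32x4 load32_zero(const void *address) {
    struct v128_i32x4 result = {{0, 0, 0, 0}};
    memcpy(&result.lane[0], address, sizeof(uint32_t));
    return result;
}
```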
This PR also changes the opcodes for the experimental f32x4.qfm{a,s}
instructions because their opcodes conflicted with those of the
v128.load{32,64}_zero instructions. The new opcodes were chosen to
match those used in V8.
Differential Revision: https://reviews.llvm.org/D84820
ipconstprop is going to get removed and checking opt with specific
passes makes the tests more fragile.
The tests retain the important checks that !callback metadata is created
correctly.
As mentioned on D70376, LVI can currently cause performance issues
when running under NewPM. The problem is that, unlike the legacy
pass manager, NewPM will not immediately discard the LVI analysis
if the following pass does not need it. This is a problem, because
LVI has a high memory requirement, and mass invalidation of LVI
values is very inefficient. LVI should only be alive during passes
that actively interact with it.
This patch addresses the issue by explicitly abandoning LVI after CVP,
which gets us back to the LegacyPM behavior.
Differential Revision: https://reviews.llvm.org/D84959
Power10 introduces new instructions for vector multiply, divide and modulus.
These instructions can be exploited by the builtin functions: vec_mul, vec_div,
and vec_mod, respectively.
This patch adds the function prototype for vec_mod, as vec_mul and vec_div
have previously been implemented in altivec.h.
This patch also adds the following front end tests:
vec_mul for v2i64
vec_div for v4i32 and v2i64
vec_mod for v4i32 and v2i64
Differential Revision: https://reviews.llvm.org/D82576
fptosi/fptoui have similar, but not identical, semantics. In
particular, the behavior on overflow is different.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46844 for 64-bit. (The
corresponding patch for 32-bit is more involved because the equivalent
intrinsics don't exist, as far as I can tell.)
Differential Revision: https://reviews.llvm.org/D84703
Problem:
Right now, our "Running pass" output is not accurate when passes are wrapped in an adaptor, because the adaptor is never skipped while a pass could be. The other problem is that "Running pass" for an adaptor is printed before any "Running pass" of the passes/analyses it depends on (for example, FunctionToLoopPassAdaptor), so the order of printing is not the actual order.
Solution:
Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder)
This patch moves debug logging for passes into a PassInstrumentation callback, so we can be sure that all running passes are logged, and in the correct order.
This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want.
The test fixes look messy. They include the following changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)
Reviewed By: asbirlea, aeubanks
Differential Revision: https://reviews.llvm.org/D84774
A list of target features is disabled when there is no hardware
floating-point support. This is the case when one of the following
options is passed to clang:
- -mfloat-abi=soft
- -mfpu=none
However, this option list is missing the "+nofp" extension, which can be
specified in -march flags, such as "-march=armv8-a+nofp".
This patch also disables unsupported target features when nofp is passed
to -march.
Differential Revision: https://reviews.llvm.org/D82948
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
Differential Revision: https://reviews.llvm.org/D84556
This patch implements the `vec_xst_trunc` function in altivec.h in order to
utilize the Store VSX Vector Rightmost [byte | half | word | doubleword] Indexed
instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82467
Implement __builtin_eh_return_data_regno for SystemZ.
Match behavior of GCC.
Author: slavek-kucera
Differential Revision: https://reviews.llvm.org/D84341
Previously, the vins*vlx instructions were incorrectly defined with i64 as the
second argument. This patch fixes the issue by correcting the second argument
of the vins*vlx instructions/intrinsics to be i32.
Differential Revision: https://reviews.llvm.org/D84277
I was trying to pick this up a bit when reviewing D48426 (& perhaps D69778) - in any case, looks like D48426 added a module level flag that might not be needed.
The D48426 implementation worked by setting a module level flag; then, when code generating contents from the PCH, a special case in ASTContext::DeclMustBeEmitted would be used to delay emitting the definition of these functions if they came from a Module with this flag.
This strategy is similar to the one initially implemented for modular codegen that was removed in D29901 in favor of the modular decls list and a bit on each decl to specify whether it's homed to a module.
One major difference between PCH object support and modular code generation, other than the specific list of decls that are homed, is the compilation model: MSVC PCH modules are built into the object file for some other source file (when compiling that source file /Yc is specified to say "this compilation is where the PCH is homed"), whereas modular code generation invokes a separate compilation for the PCH alone. So the current modular code generation test to decide if a decl should be emitted ("is the module where this decl is serialized the current main file") has to be extended (as Lubos did in D69778) to also test the command line flag -building-pch-with-obj.
Otherwise the whole thing is basically streamlined down to the modular code generation path.
This even offers one extra material improvement compared to the existing divergent implementation: Homed functions are not emitted into object files that use the pch. Instead at -O0 they are not emitted into the IR at all, and at -O1 they are emitted using available_externally (existing functionality implemented for modular code generation). The pch-codegen test has been updated to reflect this new behavior.
[If possible: I'd love it if we could not have the extra MSVC-style way of accessing dllexport-pch-homing, and just do it the modular codegen way, but I understand that it might be a limitation of existing build systems. @hans / @thakis: Do either of you know if it'd be practical to move to something more similar to .pcm handling, where the pch itself is passed to the compilation, rather than homed as a side effect of compiling some other source file?]
Reviewers: llunak, hans
Differential Revision: https://reviews.llvm.org/D83652
The implementation of the xvtlsbb builtins/intrinsics was not correct, as the
intrinsics previously used i1 as an argument type. This patch changes the i1
argument type used in these intrinsics to be i32 instead, as having the second
argument as an i1 can lead to issues in the backend.
Differential Revision: https://reviews.llvm.org/D84291
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows moving about 3000 lines out of InstCombine to the targets.
Differential Revision: https://reviews.llvm.org/D81728
Pass LowerMatrixIntrinsics wasn't yet running under the new pass
manager; this adds LowerMatrixIntrinsics to the pipeline (in the
same place where it runs in the old PM).
Differential Revision: https://reviews.llvm.org/D84180
Sometimes we also want to avoid merging inline assembly. This patch adds
the nomerge function attribute to inline assembly.
Reviewed By: zequanwu
Differential Revision: https://reviews.llvm.org/D84225
Use 'o' for the mangling specification instead of 'e'. This fixes an
error in the backend caused by a mismatch between the data layouts
generated by the backend and the frontend.
rdar://problem/64168540
GCC r187297 (2012-05) introduced `__gcov_dump` and `__gcov_reset`.
`__gcov_flush = __gcov_dump + __gcov_reset`
The resolution to https://gcc.gnu.org/PR93623 ("No need to dump gcdas when forking", targeting GCC 11.0) removed the useless and undocumented __gcov_flush.
Close PR38064.
Reviewed By: calixte, serge-sans-paille
Differential Revision: https://reviews.llvm.org/D83149
Previously, the vins* intrinsic was incorrectly defined to have its second and
third arguments as i64. This patch fixes the second and third
arguments of the vins* instruction and intrinsic to be i32 instead.
Differential Revision: https://reviews.llvm.org/D83497
This reverts most of the following patches due to reports of miscompiles.
I've left the added test cases with comments updated to be FIXMEs.
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
4c5a93bd landed an adjustment to handle the C++20 no_unique_address
attribute correctly: clang treats empty members in an aggregate type
differently if they have this attribute. This commit adds the necessary
test for the PowerPC target to reflect this change.
In 2b3c505, the pointer arguments for the matrix load and store
intrinsics were changed to always be the element type of the vector
argument.
This patch updates the MatrixBuilder to not add the pointer type to the
overloaded types and adjusts the clang/mlir tests.
This should fix a few build failures on GreenDragon, including
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O0-g/7891/
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter).
These shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
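
For context, a source-level alignment assumption of the kind discussed here can be written with `__builtin_assume_aligned`, a builtin available in both GCC and Clang. The sketch below is only illustrative: the function name `sum16` and the 16-byte alignment are assumptions, and the IR that Clang emits for it is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n floats from a buffer that the caller
   promises is 16-byte aligned. */
float sum16(const float *p, size_t n) {
    /* Debug-time check of the promise; the builtin itself checks nothing. */
    assert(((uintptr_t)p % 16) == 0);
    /* Tell the compiler it may assume 16-byte alignment of q. */
    const float *q = __builtin_assume_aligned(p, 16);
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += q[i];
    return total;
}
```

The builtin returns its first argument, so the only effect is the extra alignment information attached to `q`.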
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
This change fixes an SEH bug (exposed by test58 & test61 in MSVC test xcpt4u.c):
when an Except-filter is located inside a finally, the frame-pointer generated today
via intrinsic @llvm.eh.recoverfp is the frame-pointer of the immediate
parent _finally, not the frame-pointer of the outermost host function.
The fix is to retrieve the Establisher's frame-pointer that was previously saved in
the parent's frame.
The prolog of a filter inside a _finally should be like code below:
%0 = call i8* @llvm.eh.recoverfp(i8* bitcast (@"?fin$0@0@main@@"), i8*%frame_pointer)
%1 = call i8* @llvm.localrecover(i8* bitcast (@"?fin$0@0@main@@"), i8*%0, i32 0)
%2 = bitcast i8* %1 to i8**
%3 = load i8*, i8** %2, align 8
Differential Revision: https://reviews.llvm.org/D77982
Previously, Clang diagnosed this code by default:
void f(int a[static 0]);
saying that "static has no effect on zero-length arrays", which was
accurate.
However, static array extents require that the caller of the function
pass a nonnull pointer to an array of *at least* that number of
elements, but it can pass more (see C17 6.7.6.3p6). Given that we allow
zero-sized arrays as a GNU extension and that it's valid to pass more
elements than specified by the static array extent, we now support
zero-sized static array extents with the usual semantics because it can
be useful in cases like:
void my_bzero(char p[static 0], int n);
my_bzero(&c+1, 0); //ok
my_bzero(t+k,n-k); //ok, pattern from actual code
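
A self-contained version of the `my_bzero` pattern above, as a sketch. The `memset`-based body and the `assert` precondition are assumptions made for illustration; only the signature comes from the message.

```c
#include <assert.h>
#include <string.h>

/* `static 0` promises a non-null pointer to at least zero elements,
   so a one-past-the-end pointer with n == 0 is a valid argument. */
void my_bzero(char p[static 0], int n) {
    assert(n >= 0);
    memset(p, 0, (size_t)n);
}
```

Compilers that predate this change may still warn about the zero-length extent.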
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).
Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
Use the new -fexperimental-strict-floating-point flag in more cases to
fix the arm and aarch64 bots.
Differential Revision: https://reviews.llvm.org/D80952
We currently have strict floating point/constrained floating point enabled
for all targets. Constrained SDAG nodes get converted to the regular ones
before reaching the target layer. In theory this should be fine.
However, the changes are exposed to users through multiple clang options
already in use in the field, and the changes are _completely_ _untested_
on almost all of our targets. Bugs have already been found, like
"https://bugs.llvm.org/show_bug.cgi?id=45274".
This patch disables constrained floating point options in clang everywhere
except X86 and SystemZ. A warning will be printed when this happens.
Use the new -fexperimental-strict-floating-point flag to force allowing
strict floating point on hosts that aren't already marked as supporting
it (X86 and SystemZ).
Differential Revision: https://reviews.llvm.org/D80952
Many platform ABIs have special support for passing aggregates that
either just contain a single member of floating-point type, or else
a homogeneous set of members of the same floating-point type.
When making this determination, any extra "empty" members of the
aggregate type will typically be ignored. However, in C++ (at least
in all prior versions), no data member would actually count as empty,
even if its type is an empty record -- it would still be considered
to take up at least one byte of space, and therefore make those ABI
special cases not apply.
This is now changing in C++20, which introduced the [[no_unique_address]]
attribute. Members of empty record type, if they also carry this
attribute, now do *not* take up any space in the type, and therefore
the ABI special cases for single-element or homogeneous aggregates
should apply.
The C++ Itanium ABI has been updated accordingly, and GCC 10 has
added support for this new case. This patch now adds support to
LLVM. This is cross-platform; it affects all platforms that use
the single-element or homogeneous aggregate ABI special case and
implement this using any of the following common subroutines
in lib/CodeGen/TargetInfo.cpp:
isEmptyField
isEmptyRecord
isSingleElementStruct
isHomogeneousAggregate
This enables _InterlockedAnd64/_InterlockedOr64/_InterlockedXor64/_InterlockedDecrement64/_InterlockedIncrement64/_InterlockedExchange64/_InterlockedExchangeAdd64/_InterlockedExchangeSub64 on 32-bit Windows
The backend already knows how to expand these to a loop using cmpxchg8b on 32-bit targets.
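
The compare-exchange loop mentioned above can be sketched in portable C11 atomics. This is not the backend's expansion, only the same idea at source level; `interlocked_and64_sketch` is a hypothetical name standing in for `_InterlockedAnd64`, which returns the value held before the operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: atomically ANDs `mask` into *target and
   returns the previous value. */
int64_t interlocked_and64_sketch(_Atomic int64_t *target, int64_t mask) {
    assert(target != NULL);
    int64_t old = atomic_load(target);
    /* Retry until no other thread changed *target between the load and
       the exchange. On failure, `old` is refreshed with the current value. */
    while (!atomic_compare_exchange_weak(target, &old, old & mask)) {
    }
    return old;
}
```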
Fixes PR46595
Differential Revision: https://reviews.llvm.org/D83254
The SystemZ ABI specifies that aggregate types with just a single
member of floating-point type shall be passed as if they were just
a scalar of that type. This applies to both struct and class types
(but not unions).
However, the current ABI support code in clang only checks this
case for struct types, which means that for class types, generated
code does not adhere to the platform ABI.
Fixed by accepting both struct and class types in the
SystemZABIInfo::GetSingleElementType routine.
When using __sync_nand_and_fetch with __int128, the wrong value for the
'invert' operand is emitted to the xor when the integer size is greater
than 64 bits.
This is because llvm::ConstantInt::get zero-extends its 64-bit argument
for types wider than 64 bits, so instead of the -1 that we require, we
end up with 18446744073709551615.
This patch replaces the call to llvm::ConstantInt::get with the call
to llvm::Constant::getAllOnesValue which works for all integer types.
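
The arithmetic behind the bug can be reproduced in plain C, assuming a compiler and target that provide `unsigned __int128` (GCC and Clang on 64-bit targets). The helper names below are hypothetical and do not correspond to anything in Clang's code.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Widening a 64-bit all-ones value zero-extends: the upper 64 bits are 0. */
u128 zext_all_ones_64(void) {
    return (u128)UINT64_MAX;
}

/* All 128 bits set, independent of any 64-bit intermediate. */
u128 all_ones_128(void) {
    return ~(u128)0;
}

/* nand expressed as an and followed by an xor with an "invert" mask. */
u128 nand_with_mask(u128 a, u128 b, u128 invert) {
    return (a & b) ^ invert;
}

/* Only the full-width mask yields a correct 128-bit nand. */
void nand_demo(void) {
    u128 a = ~(u128)0, b = ~(u128)0;
    assert(nand_with_mask(a, b, all_ones_128()) == 0);
    assert(nand_with_mask(a, b, zext_all_ones_64()) != 0);
}
```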
Reviewers: jfp, erichkeane, rjmccall, hfinkel
Differential Revision: https://reviews.llvm.org/D82832
There are now more SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.
Differential Revision: https://reviews.llvm.org/D82943
This covers both the existing memory functions as well as the new bulk memory proposal.
Added new test files since changes were also required in the inputs.
Also removes unused init/drop intrinsics rather than trying to make them work for 64-bit.
Differential Revision: https://reviews.llvm.org/D82821
Mark these tests as only failing on PowerPC. Avoids unexpected passes on
other bots.
Fingers crossed.
Differential Revision: https://reviews.llvm.org/D80952
We currently have strict floating point/constrained floating point enabled
for all targets. Constrained SDAG nodes get converted to the regular ones
before reaching the target layer. In theory this should be fine.
However, the changes are exposed to users through multiple clang options
already in use in the field, and the changes are _completely_ _untested_
on almost all of our targets. Bugs have already been found, like
"https://bugs.llvm.org/show_bug.cgi?id=45274".
This patch disables constrained floating point options in clang everywhere
except X86 and SystemZ. A warning will be printed when this happens.
Differential Revision: https://reviews.llvm.org/D80952
Summary:
Change stack alignment from 64 bits to 128 bits to follow ABI correctly.
And add a regression test for datalayout.
Reviewers: simoll, k-ishizaka
Reviewed By: simoll
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #llvm, #ve, #clang
Differential Revision: https://reviews.llvm.org/D83173
Making -g[no-]column-info opt out reduces the length of a typical CC1 command line.
Additionally, in a non-debug compile, we won't see -dwarf-column-info.
Assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that it isn't the
case, and happily crashes otherwise.
Minimal reduced reproducer: run `opt -alignment-from-assumptions` on
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }
; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
ret i32 0
}
; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1
attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }
This is what we'd have with -mllvm -enable-knowledge-retention
This reverts commit c95ffadb24.
bfloat16 variants of svdup_lane were missing, and svcvtnt_bf16_x
was implemented incorrectly (it takes an operand for the inactive
lanes)
Reviewers: fpetrogalli, efriedma
Reviewed By: fpetrogalli
Tags: #clang
Differential Revision: https://reviews.llvm.org/D82908
The x86-64 "avx" feature changes how >128 bit vector types are passed:
instead of being passed in separate 128 bit registers, they can be
passed in 256 bit registers.
"avx512f" does the same thing, except it switches from 256 bit registers
to 512 bit registers.
The result of both of these is an ABI incompatibility between functions
compiled with and without these features.
This patch implements a warning/error pair upon an attempt to call a
function that would run afoul of this. First, if a function is called
that would have its ABI changed, we issue a warning.
Second, if said call is made in a situation where the caller and callee
are known to have different calling conventions (such as the case of
'target'), we instead issue an error.
Differential Revision: https://reviews.llvm.org/D82562
clang-cl passes -x86-asm-syntax=intel to the cc1 invocation so that
assembly listings produced by the /FA flag are printed in Intel dialect.
That flag however should not affect the *parsing* of inline assembly in
the program. (See r322652)
When compiling normally, AsmPrinter::emitInlineAsm is used for
assembling and defaults to AT&T dialect. However, when compiling for
ThinLTO, the code which parses module level inline asm to find symbols
for the symbol table was failing to set the dialect. This patch fixes
that. (See the bug for more details.)
Differential revision: https://reviews.llvm.org/D82862
Summary:
The following feature macros have been added:
__ARM_FEATURE_SVE_BF16
__ARM_FEATURE_SVE_MATMUL_INT8
__ARM_FEATURE_SVE_MATMUL_FP32
__ARM_FEATURE_SVE_MATMUL_FP64
The driver has been updated to enable them according to the value of
the target feature passed at command line.
The SVE ACLE tests using the macros have been modified to work with
the target feature instead of passing the macro at command line.
Reviewers: sdesmalen, efriedma, c-rhodes, kmclaughlin, SjoerdMeijer, rengolin
Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D82623
Fix a warning in getNode() when extracting a subvector from a
concat vector. We can simply replace the call to getVectorNumElements
with getVectorMinNumElements as this follows the defined behaviour
for EXTRACT_SUBVECTOR.
Differential Revision: https://reviews.llvm.org/D82746
Summary:
Patch adds tests for mangling of svbfloat16_t and several other type
related tests.
Reviewers: sdesmalen, kmclaughlin, fpetrogalli, efriedma
Reviewed By: sdesmalen, fpetrogalli
Differential Revision: https://reviews.llvm.org/D82668
This reverts commit defd43a5b3.
with correction to solve msan report
To solve https://bugs.llvm.org/show_bug.cgi?id=46166 where the
floating point settings in PCH files aren't compatible, rewrite
FPFeatures to use a delta in the settings rather than absolute settings.
With this patch, these floating point options can be benign.
Reviewers: rjmccall
Differential Revision: https://reviews.llvm.org/D81869
CPUs with avx always have xsave, but some CPUs without avx also
have xsave. So we shouldn't disable xsave just because avx is
disabled. This would prevent xsave from being enabled with
-march=native on CPUs with xsave and not avx.
But we also don't want -mavx -mno-avx to leave xsave enabled.
So only enable xsave if avx is enabled after processing all features.
I thought about just not turning xsave on with avx at all, but
there might be someone out there depending on it.
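
The rule described above can be sketched as a tiny feature resolver. This is not Clang's code: `feature_state`, `resolve_features`, and the "+name"/"-name" flag spelling are assumptions for illustration, and the sketch does not model an explicit request to disable xsave while avx is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct feature_state {
    bool avx;
    bool xsave;
};

/* Hypothetical resolver: flags look like "+avx", "-avx", "+xsave". */
struct feature_state resolve_features(const char *const *flags, size_t n) {
    struct feature_state s = { false, false };
    for (size_t i = 0; i < n; ++i) {
        const char *f = flags[i];
        assert(f[0] == '+' || f[0] == '-');
        bool on = (f[0] == '+');
        if (strcmp(f + 1, "avx") == 0)
            s.avx = on;      /* disabling avx does not touch xsave */
        else if (strcmp(f + 1, "xsave") == 0)
            s.xsave = on;
    }
    /* Only after every flag has been processed: avx implies xsave. */
    if (s.avx)
        s.xsave = true;
    return s;
}
```

With this ordering, "+avx" followed by "-avx" leaves xsave off, while a CPU reporting "+xsave" without avx keeps xsave on.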
The original patch was reverted in ff5ccf258e because the C tests
were accidentally left out.
This patch is a NFC of https://reviews.llvm.org/D82501, together with
the SVE ACLE tests for the C intrinsics of svreinterpret for brain
float types.
This reverts commit b55d723ed6.
Reapply Modify FPFeatures to use delta not absolute settings
To solve https://bugs.llvm.org/show_bug.cgi?id=46166 where the
floating point settings in PCH files aren't compatible, rewrite
FPFeatures to use a delta in the settings rather than absolute settings.
With this patch, these floating point options can be benign.
Reviewers: rjmccall
Differential Revision: https://reviews.llvm.org/D81869
When writing a unit test on replacing standard epilogue sequences with `BR __mspabi_func_epilog_<N>`, by manually asm-clobbering `rN` - `r10` for N = 4..10, everything worked well except for seeming inability to clobber r4.
The problem was that MSP430 code generator of LLVM used an obsolete name FP for that register. Things were worse because when `llc` read an unknown register name, it silently ignored it.
That is, I cannot use the `fp` register name from C code because Clang does not accept it (exactly like GCC). But the accepted name `r4` is not recognised by `llc` (it can be used in listings passed to `llvm-mc`, and even `fp` is replaced with `r4` by `llvm-mc`). So I can specify either `fp` or `r4` in the string literal of `asm(...)` but nothing in the clobber list.
This patch replaces `MSP430::FP` with `MSP430::R4` in the backend code (even [MSP430 EABI](http://www.ti.com/lit/an/slaa534/slaa534.pdf) doesn't mention FP as a register name). The R0 - R3 registers, on the other hand, are left as is in the backend code (after all, they have some special meaning on the ISA level). It is just ensured clang is renaming them as expected by the downstream tools. There is probably not much sense in **marking them clobbered** but rename them //just in case// for use at potentially different contexts.
Differential Revision: https://reviews.llvm.org/D82184
Summary:
`svwhilerw_bf16` and `svwhilewr_bf16` intrinsics use the scalar
`bfloat16_t` type which is predicated on
`__ARM_FEATURE_BF16_SCALAR_ARITHMETIC`. This patch changes the feature
guard from `__ARM_FEATURE_SVE_BF16` to the scalar bfloat feature macro.
The verify tests for `+bf16` are also removed in this patch. The purpose
of these checks was to match the SVE2 ACLE tests that look for an
implicit declaration warning if the feature isn't set. They worked when
the intrinsics were guarded on `__ARM_FEATURE_SVE_BF16` as the
`bfloat16_t` was guarded on a different macro, but with both the type
and intrinsic
guarded on the same macro an earlier error is triggered in the ACLE
regarding the type and we don't get a warning as we do for SVE2.
Reviewers: sdesmalen, fpetrogalli, kmclaughlin, rengolin, efriedma
Reviewed By: sdesmalen, fpetrogalli
Differential Revision: https://reviews.llvm.org/D82578
This change enables PowerPC compiler builtins to generate constrained
floating point operations when clang is instructed to do so.
A couple of possibly unexpected backend divergences between constrained
floating point and regular behavior are highlighted under the test tag
FIXME-CHECK. This may be something for those on the PPC backend to look
at.
Patch by: Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D82020
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
This patch enables the following macros when their corresponding
target attributes are set:
__ARM_FEATURE_SVE (+sve)
__ARM_FEATURE_SVE2 (+sve2)
__ARM_FEATURE_SVE2_AES (+sve2-aes)
__ARM_FEATURE_SVE2_BITPERM (+sve2-bitperm)
__ARM_FEATURE_SVE2_SHA3 (+sve2-sha3)
__ARM_FEATURE_SVE2_SM4 (+sve2-sm4)
This implies that the base SVE and SVE2 ACLE (00bet2) are now feature
complete, meaning that all intrinsics are implemented in LLVM and Clang.
Disclaimer:
To implement the ACLE we have had to fix up many parts of LLVM to make it
support scalable vectors. We have also used many target-specific intrinsics
to reduce reliance on parts of LLVM where we know scalable vectors may
not yet be handled properly (e.g. some transformation might drop the
'scalable' flag on a vector type). While we've done a best effort with
the limited testing that is available to us, we're still working to improve the
stability of the implementation. Additionally, Clang may print warnings
that code may have miscompiled. We find this often to be a false alarm
where the wrong interfaces have been used in LLVM and where resulting
code is not actually incorrect. However, this warrants a bug report
and investigation. If you find any bugs or issues, please raise them on
bugs.llvm.org and let us know!
Reviewers: rengolin, efriedma, david-arm, SjoerdMeijer
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D81725
This patch implements builtins for the following prototypes:
unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long)
unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long)
vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long)
vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long)
Differential Revision: https://reviews.llvm.org/D80941
EmitTargetMetadata passed to emitTargetMD a null pointer as returned
from GetGlobalValue, for an unused inline function which has been
removed from the module at that point.
A FIXME in CodeGenModule.cpp commented that the calling code in
EmitTargetMetadata should be moved into the one target that needs it
(XCore). A review comment agreed. So the calling loop has been moved
into the XCore subclass. The check for null is done in that loop.
Differential Revision: https://reviews.llvm.org/D77068
Fix test case added by D79830
Rewrite the test case so that it does a similar thing to builtin-expect.c
(testing the generated llvm intrinsic instead of the branch weights).
It currently passes with the "-disable-llvm-passes" option.
Differential Revision: https://reviews.llvm.org/D82403
This clang test unfortunately depends on the actions of the optimizer,
which some of the buildbots hit.
This patch makes it so it cannot ignore the return value of 'f', so it
won't do away with the implementation.
This patch contains:
- Support in LLVM CodeGen for bfloat16 types for ld2/3/4 and st2/3/4.
- New bfloat16 ACLE builtins for svld(2|3|4)[_vnum] and svst(2|3|4)[_vnum]
Reviewers: stuij, efriedma, c-rhodes, fpetrogalli
Reviewed By: fpetrogalli
Tags: #clang, #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D82187
Summary:
svbfloat16_t should only be defined if the __ARM_FEATURE_SVE_BF16
feature macro is enabled, similar to the scalar bfloat16_t type. Also,
arm_bf16.h should be included in arm_sve.h when
__ARM_FEATURE_BF16_SCALAR_ARITHMETIC is defined.
Patch also contains a fix for ld1ro intrinsic which should be guarded on
__ARM_FEATURE_SVE_BF16 rather than __ARM_FEATURE_BF16_SCALAR_ARITHMETIC,
and a fix for bfmmla test which was missing
__ARM_FEATURE_BF16_SCALAR_ARITHMETIC and -target-feature +bf16 in the
RUN line.
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D82178
This patch implements builtins for the following prototypes for the VSX Permute
Control Vector Generate with Mask Instructions:
vector unsigned char vec_genpcvm (vector unsigned char, const int);
vector unsigned short vec_genpcvm (vector unsigned short, const int);
vector unsigned int vec_genpcvm (vector unsigned int, const int);
vector unsigned long long vec_genpcvm (vector unsigned long long, const int);
Differential Revision: https://reviews.llvm.org/D81774
Currently, in order to extract an element from a bf16 vector, we cast
the vector to an i16 vector, perform the extraction, and cast the result to
bfloat. This behavior was copied from the old fp16 implementation.
The goal of this patch is to achieve optimal code generation for lane
copying intrinsics in a subsequent patch (LLVM fails to fold certain
combinations of bitcast, insertelement, extractelement and
shufflevector instructions leading to the generation of suboptimal code).
Differential Revision: https://reviews.llvm.org/D82206
Add a new builtin-function __builtin_expect_with_probability and
intrinsic llvm.expect.with.probability.
The interface is __builtin_expect_with_probability(long expr, long
expected, double probability).
It is mostly the same as __builtin_expect, with one extra argument
giving the probability that the expression equals the expected value.
The probability should be a constant floating-point expression in the
range [0.0, 1.0] inclusive.
It is similar to the __builtin_expect_with_probability built-in
function in GCC.
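
A minimal usage sketch of the interface described above, assuming a compiler that provides the builtin. The function `checked_length` and the 0.9 probability are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical validation step: callers are assumed to pass a
   non-negative length about 90% of the time. */
int checked_length(long len) {
    assert(len <= INT_MAX);
    /* Like __builtin_expect, the builtin evaluates to its first argument;
       here we state that (len < 0) equals 0 with probability 0.9. */
    if (__builtin_expect_with_probability(len < 0, 0, 0.9))
        return -1;   /* less likely error path */
    return (int)len;
}
```

The probability only guides optimization; the returned values are the same as without the builtin.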
Differential Revision: https://reviews.llvm.org/D79830
When writing a unit test on replacing standard epilogue sequences with `BR __mspabi_func_epilog_<N>`, by manually asm-clobbering `rN` - `r10` for N = 4..10, everything worked well except for seeming inability to clobber r4.
The problem was that MSP430 code generator of LLVM used an obsolete name FP for that register. Things were worse because when `llc` read an unknown register name, it silently ignored it.
Differential Revision: https://reviews.llvm.org/D82184
Cooperlake can be detected by compiler-rt now, but not libgcc yet.
Tigerlake can't be detected by either. Both names are accepted by
gcc. Hopefully the detection code will be in place soon.
This patch implements builtins for the following prototypes:
```
vector signed char vec_clrl (vector signed char a, unsigned int n);
vector unsigned char vec_clrl (vector unsigned char a, unsigned int n);
vector signed char vec_clrr (vector signed char a, unsigned int n);
vector unsigned char vec_clrr (vector unsigned char a, unsigned int n);
```
Differential Revision: https://reviews.llvm.org/D81707
1. Provides no priority support and disables three priority-related
attributes: init_priority, ctor attr, dtor attr;
2. Implements behavior equivalent to '-qunique' in the XL compiler by
emitting sinit and sterm function names using the getUniqueModuleId()
util function in LLVM (currently no support for InternalLinkage and
WeakODRLinkage symbols);
3. Add testcases to emit IR sample with __sinit80000000, __dtor, and
__sterm80000000;
4. Temporarily side-steps the need to implement the functionality of
llvm.global_ctors and llvm.global_dtors arrays. The uses of that
functionality in this patch (with respect to the name of the functions
involved) are not representative of how the functionality will be used
once implemented.
Differential Revision: https://reviews.llvm.org/D74166