target/teams/distribute regions.
Previously introduced globalization scheme that uses memory coalescing
scheme may increase memory usage fr the variables that are devlared in
target/teams/distribute contexts. We don't need 32 copies of such
variables, just 1. Patch reduces memory use in this case.
llvm-svn: 344273
Summary:
As per IRC disscussion, it seems we really want to have more fine-grained `-fsanitize=implicit-integer-truncation`:
* A check when both of the types are unsigned.
* Another check for the other cases (either one of the types is signed, or both of the types is signed).
This is clang part.
Compiler-rt part is D50902.
Reviewers: rsmith, vsk, Sanitizers
Reviewed by: rsmith
Differential Revision: https://reviews.llvm.org/D50901
llvm-svn: 344230
This can be used to preserve profiling information across codebase
changes that have widespread impact on mangled names, but across which
most profiling data should still be usable. For example, when switching
from libstdc++ to libc++, or from the old libstdc++ ABI to the new ABI,
or even from a 32-bit to a 64-bit build.
The user can provide a remapping file specifying parts of mangled names
that should be treated as equivalent (eg, std::__1 should be treated as
equivalent to std::__cxx11), and profile data will be treated as
applying to a particular function if its name is equivalent to the name
of a function in the profile data under the provided equivalences. See
the documentation change for a description of how this is configured.
Remapping is supported for both sample-based profiling and instruction
profiling. We do not support remapping indirect branch target
information, but all other profile data should be remapped
appropriately.
Support is only added for the new pass manager. If someone wants to also
add support for this for the old pass manager, doing so should be
straightforward.
llvm-svn: 344199
This is currently a clang extension and a resolution
of the defect report in the C++ Standard.
Differential Revision: https://reviews.llvm.org/D46441
llvm-svn: 344150
When ifunc support was added to Clang (r265917) it did not allow
resolvers to take function arguments. This was based on GCC's
documentation, which states resolvers return a pointer and take no
arguments.
However, GCC actually allows resolvers to take arguments, and glibc (on
non-x86 platforms) and FreeBSD (on x86 and arm64) pass some CPU
identification information as arguments to ifunc resolvers. I believe
GCC's documentation is simply incorrect / out-of-date.
FreeBSD already removed the prohibition in their in-tree Clang copy.
Differential Revision: https://reviews.llvm.org/D52703
llvm-svn: 344100
Added support for memory coalescing for better performance for
globalized variables. From now on all the globalized variables are
represented as arrays of 32 elements and each thread accesses these
elements using `tid & 31` as index.
llvm-svn: 344049
DWARF v5 introduces DW_AT_call_all_calls, a subprogram attribute which
indicates that all calls (both regular and tail) within the subprogram
have call site entries. The information within these call site entries
can be used by a debugger to populate backtraces with synthetic tail
call frames.
Tail calling frames go missing in backtraces because the frame of the
caller is reused by the callee. Call site entries allow a debugger to
reconstruct a sequence of (tail) calls which led from one function to
another. This improves backtrace quality. There are limitations: tail
recursion isn't handled, variables within synthetic frames may not
survive to be inspected, etc. This approach is not novel, see:
https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=jelinek.pdf
This patch adds an IR-level flag (DIFlagAllCallsDescribed) which lowers
to DW_AT_call_all_calls. It adds the minimal amount of DWARF generation
support needed to emit standards-compliant call site entries. For easier
deployment, when the debugger tuning is LLDB, the DWARF requirement is
adjusted to v4.
Testing: Apart from check-{llvm, clang}, I built a stage2 RelWithDebInfo
clang binary. Its dSYM passed verification and grew by 1.4% compared to
the baseline. 151,879 call site entries were added.
rdar://42001377
Differential Revision: https://reviews.llvm.org/D49887
llvm-svn: 343883
getGUID() returns an uint64_t and "%x" only prints 32 bits of it.
Use PRIx64 format string to print all 64 bits.
Differential Revision: https://reviews.llvm.org/D52938
llvm-svn: 343875
Fixed emission of the __kmpc_global_thread_num() so that it is not
messed up with alloca instructions anymore. Plus, fixes emission of the
__kmpc_global_thread_num() functions in the target outlined regions so
that they are not called before runtime is initialized.
llvm-svn: 343856
Summary: Add an optional attribute referring to a tuple of type and value template parameter nodes to the DIGlobalVariable node. This allows us to record the parameters of template variable specializations.
Reviewers: dblaikie, aprantl, probinson, JDevlieghere, clayborg, jingham
Reviewed By: JDevlieghere
Subscribers: cfe-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D52058
llvm-svn: 343707
Worker threads fork off to the compiler generated worker function
directly after entering the kernel function. Hence, there is no
need to check whether the current thread is the master if we are
outside of a parallel region (neither SPMD nor parallel_level > 0).
Differential Revision: https://reviews.llvm.org/D52732
llvm-svn: 343618
Only need to care about the 'distribute simd' case, all other composite
directives are handled elsewhere. This was already reflected in the
outer 'if' condition, so all other inner conditions could never be true.
Differential Revision: https://reviews.llvm.org/D52731
llvm-svn: 343617
This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original
options as aliases. When -fgpu-rdc is off,
clang will assume the device code in each translation unit does not call
external functions except those in the device library, therefore it is possible
to compile the device code in each translation unit to self-contained kernels
and embed them in the host object, so that the host object behaves like
usual host object which can be linked by lld.
The benefits of this feature is: 1. allow users to create static libraries which
can be linked by host linker; 2. amortized device code linking time.
This patch modifies HIP action builder to insert actions for linking device
code and generating HIP fatbin, and pass HIP fatbin to host backend action.
It extracts code for constructing command for generating HIP fatbin as
a function so that it can be reused by early finalization. It also modifies
codegen of HIP host constructor functions to embed the device fatbin
when it is available.
Differential Revision: https://reviews.llvm.org/D52377
llvm-svn: 343611
This reverts r326937 as it broke block argument handling in OpenCL.
See the discussion on https://reviews.llvm.org/D43783 .
The next commit will add a test case that revealed the issue.
llvm-svn: 343582
of a non-trivial C struct, copy the preceding trivial fields that
haven't been copied.
This commit fixes a bug where the instructions used to copy the
preceding trivial fields were emitted inside the loop body.
rdar://problem/44185064
llvm-svn: 343556
from those that aren't.
This patch changes the way __block variables that aren't captured by
escaping blocks are handled:
- Since non-escaping blocks on the stack never get copied to the heap
(see https://reviews.llvm.org/D49303), Sema shouldn't error out when
the type of a non-escaping __block variable doesn't have an accessible
copy constructor.
- IRGen doesn't have to use the specialized byref structure (see
https://clang.llvm.org/docs/Block-ABI-Apple.html#id8) for a
non-escaping __block variable anymore. Instead IRGen can emit the
variable as a normal variable and copy the reference to the block
literal. Byref copy/dispose helpers aren't needed either.
This reapplies r343518 after fixing a use-after-free bug in function
Sema::ActOnBlockStmtExpr where the BlockScopeInfo was dereferenced after
it was popped and deleted.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D51564
llvm-svn: 343542
from those that aren't.
This patch changes the way __block variables that aren't captured by
escaping blocks are handled:
- Since non-escaping blocks on the stack never get copied to the heap
(see https://reviews.llvm.org/D49303), Sema shouldn't error out when
the type of a non-escaping __block variable doesn't have an accessible
copy constructor.
- IRGen doesn't have to use the specialized byref structure (see
https://clang.llvm.org/docs/Block-ABI-Apple.html#id8) for a
non-escaping __block variable anymore. Instead IRGen can emit the
variable as a normal variable and copy the reference to the block
literal. Byref copy/dispose helpers aren't needed either.
This reapplies r341754, which was reverted in r341757 because it broke a
couple of bots. r341754 was calling markEscapingByrefs after the call to
PopFunctionScopeInfo, which caused the popped function scope to be
cleared out when the following code was compiled, for example:
$ cat test.m
struct A {
id data[10];
};
void foo() {
__block A v;
^{ (void)v; };
}
This commit calls markEscapingByrefs before calling PopFunctionScopeInfo
to prevent that from happening.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D51564
llvm-svn: 343518
lightweight runtime.
The datasharing flag must be set to `1` when executing SPMD-mode compatible directive with reduction|lastprivate clauses.
llvm-svn: 343492
There are a few leftovers of rC343147 that are not (\w+)\.begin but in
the form of ([-[:alnum:]>.]+)\.begin or spanning two lines. Change them
to use the container form in this commit. The 12 occurrences have been
inspected manually for safety.
llvm-svn: 343425
Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing.
Reviewers: ABataev, Hahnfeld, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D52629
llvm-svn: 343260
Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side.
Reviewers: ABataev, caomhin, Hahnfeld
Reviewed By: ABataev, Hahnfeld
Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D52434
llvm-svn: 343253
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
After fixing problems in LiveDebugVariables.
After fixing NULL symbol problems in AddressPool when enabling
split-dwarf-file.
Differential Revision: https://reviews.llvm.org/D45045
llvm-svn: 343148
Previously we used a select and the zero_undef=true intrinsic. In -O2 this pattern will get optimized to zero_undef=false. But in -O0 this optimization won't happen. This results in a compare and cmov being wrapped around a tzcnt/lzcnt instruction.
By using the zero_undef=false intrinsic directly without the select, we can improve the -O0 codegen to just an lzcnt/tzcnt instruction.
Differential Revision: https://reviews.llvm.org/D52392
llvm-svn: 343126
Add support for OMP5.0 requires directive and unified_address clause.
Patches to follow will include support for additional clauses.
Differential Revision: https://reviews.llvm.org/D52359
llvm-svn: 343063
Relanding rL342883 with more fragmented tests to test ELF-specific
section emission separately from broad-scope CFString tests. Now this
tests the following separately
1). CoreFoundation builds and linkage for ELF while building it.
2). CFString ELF section emission outside CF in assembly output.
3). Broad scope `cfstring3.c` tests which cover all object formats at
bitcode level and assembly level (including ELF).
This fixes non-bridged CoreFoundation builds on ELF targets
that use -fconstant-cfstrings. The original changes from differential
for a similar patch to PE/COFF (https://reviews.llvm.org/D44491) did not
check for an edge case where the global could be a constant which surfaced
as an issue when building for ELF because of different linkage semantics.
This patch addresses several issues with crashes related to CF builds on ELF
as well as improves data layout by ensuring string literals that back
the actual CFConstStrings end up in .rodata in line with Mach-O.
Change itself tested with CoreFoundation on Linux x86_64 but should be valid
for BSD-like systems as well that use ELF as the native object format.
Differential Revision: https://reviews.llvm.org/D52344
llvm-svn: 343038
[Clang][CodeGen][ObjC]: Fix non-bridged CoreFoundation builds on ELF targets
that use `-fconstant-cfstrings`. The original changes from differential
for a similar patch to PE/COFF (https://reviews.llvm.org/D44491) did not
check for an edge case where the global could be a constant which surfaced
as an issue when building for ELF because of different linkage semantics.
This patch addresses several issues with crashes related to CF builds on ELF
as well as improves data layout by ensuring string literals that back
the actual CFConstStrings end up in .rodata in line with Mach-O.
Change itself tested with CoreFoundation on Linux x86_64 but should be valid
for BSD-like systems as well that use ELF as the native object format.
Differential Revision: https://reviews.llvm.org/D52344
llvm-svn: 342883
Comparison functions used in sorting algorithms need to have strict weak
ordering. Remove the assert and allow comparisons on all lists.
llvm-svn: 342774
Currently the code-model does not get saved in the module IR, so if a
code model is specified when compiling with LTO, it gets lost and is
not propagated properly to LTO. This patch does what is necessary in
the front end to pass the code-model to the module, so that the back
end can store it in the Module .
Differential Revision: https://reviews.llvm.org/D52323
llvm-svn: 342758
Summary:
This code was in CGDecl.cpp and really belongs in LLVM. It happened to have isBytewiseValue which served a very similar purpose but wasn't as powerful as clang's version. Remove the clang version, and augment isBytewiseValue to be as powerful so that clang does the same thing it used to.
LLVM part of this patch: D51751
Subscribers: dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D51752
llvm-svn: 342734
Summary:
Some lines have a hit counter where they should not have one.
Cleanup stuff is located to the last line of the body which is most of the time a '}'.
And Exception stuff is added at the beginning of a function and at the end (represented by '{' and '}').
So in such cases, the DebugLoc used in GCOVProfiling.cpp must be marked as not covered.
This patch is a followup of https://reviews.llvm.org/D49915.
Tests in projects/compiler_rt are fixed by: https://reviews.llvm.org/D49917
Reviewers: marco-c, davidxl
Reviewed By: marco-c
Subscribers: dblaikie, cfe-commits, sylvestre.ledru
Differential Revision: https://reviews.llvm.org/D49916
llvm-svn: 342717
unsigned long long builtin_unpack_vector_int128 (vector int128_t, int);
vector int128_t builtin_pack_vector_int128 (unsigned long long, unsigned long long);
Builtins should behave the same way as in GCC.
Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52074
llvm-svn: 342614
This special case was added in r264841, but the code breaks our
invariants by calling EmitTopLevelDecl without first creating a
HandlingTopLevelDeclRAII scope.
This fixes the PCH crash in https://crbug.com/884427. I was never able
to make a satisfactory reduction, unfortunately. I'm not very worried
about this regressing since this change makes the code simpler while
passing the existing test that shows we do emit dllexported friend
function definitions. Now we just defer their emission until the tag is
fully complete, which is generally good.
llvm-svn: 342516
Summary:
Before this change, we only emit the XRay attributes in LLVM IR when the
-fxray-instrument flag is provided. This may cause issues with thinlto
when the final binary is being built/linked with -fxray-instrument, and
the constitutent LLVM IR gets re-lowered with xray instrumentation.
With this change, we can honour the "never-instrument "attributes
provided in the source code and preserve those in the IR. This way, even
in thinlto builds, we retain the attributes which say whether functions
should never be XRay instrumented.
This change addresses llvm.org/PR38922.
Reviewers: mboerger, eizan
Subscribers: mehdi_amini, dexonsmith, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D52015
llvm-svn: 342200
Previously, both types (plus the future target-clones) of
multiversioning had a separate ResolverOption structure and emission
function. This patch combines the two, at the expense of a slightly
more expensive sorting function.
llvm-svn: 342152
declare reduction.
If the declare reduction construct with the non-dependent type is
defined in the template construct, the compiler might crash on the
template instantition. Reworked the whole instantiation scheme for the
declare reduction constructs to fix this problem correctly.
llvm-svn: 342151
Functions generated by clang and included in the .init_array section (such as
static constructors) do not follow the usual code path for adding
target-specific function attributes, so we have to add the return address
signing attribute here too, as is currently done for the sanitisers.
Differential revision: https://reviews.llvm.org/D51418
llvm-svn: 342126
Previously the alignment on the newly created rtti/typeinfo data was largely
not set, meaning that DataLayout::getPreferredAlignment was free to overalign
it to 16 bytes. This causes unnecessary code bloat.
Differential Revision: https://reviews.llvm.org/D51416
llvm-svn: 342053
Summary:
On targets that do not support FP16 natively LLVM currently legalizes
vectors of FP16 values by scalarizing them and promoting to FP32. This
causes problems for the following code:
void foo(int, ...);
typedef __attribute__((neon_vector_type(4))) __fp16 float16x4_t;
void bar(float16x4_t x) {
foo(42, x);
}
According to the AAPCS (appendix A.2) float16x4_t is a containerized
vector fundamental type, so 'foo' expects that the 4 16-bit FP values
are packed into 2 32-bit registers, but instead bar promotes them to
4 single precision values.
Since we already handle scalar FP16 values in the frontend by
bitcasting them to/from integers, this patch adds similar handling for
vector types and homogeneous FP16 vector aggregates.
One existing test required some adjustments because we now generate
more bitcasts (so the patch changes the test to target a machine with
native FP16 support).
Reviewers: eli.friedman, olista01, SjoerdMeijer, javed.absar, efriedma
Reviewed By: javed.absar, efriedma
Subscribers: efriedma, kristof.beyls, cfe-commits, chrib
Differential Revision: https://reviews.llvm.org/D50507
llvm-svn: 342034
This patch removes the last reason why DIFlagBlockByrefStruct from
Clang by directly implementing the drilling into the member type done
in DwarfDebug::DbgVariable::getType() into the frontend.
rdar://problem/31629055
Differential Revision: https://reviews.llvm.org/D51807
llvm-svn: 341842
from those that aren't.
This patch changes the way __block variables that aren't captured by
escaping blocks are handled:
- Since non-escaping blocks on the stack never get copied to the heap
(see https://reviews.llvm.org/D49303), Sema shouldn't error out when
the type of a non-escaping __block variable doesn't have an accessible
copy constructor.
- IRGen doesn't have to use the specialized byref structure (see
https://clang.llvm.org/docs/Block-ABI-Apple.html#id8) for a
non-escaping __block variable anymore. Instead IRGen can emit the
variable as a normal variable and copy the reference to the block
literal. Byref copy/dispose helpers aren't needed either.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D51564
llvm-svn: 341754
Summary:
The optimized (__atomic_foo_<n>) libcalls assume that the atomic object
is properly aligned, so should never be called on an underaligned
object.
This addresses one of several problems identified in PR38846.
Reviewers: jyknight, t.p.northover
Subscribers: jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D51817
llvm-svn: 341734
This is the clang side of D51803. The llvm intrinsic now returns two results. So we need to emit an explicit store in IR for the out parameter. This is similar to addcarry/subborrow/rdrand/rdseed.
Differential Revision: https://reviews.llvm.org/D51805
llvm-svn: 341699
This is the clang side of D51769. The llvm intrinsics now return two results instead of using an out parameter.
Differential Revision: https://reviews.llvm.org/D51771
llvm-svn: 341678
Boilerplate code for using KMSAN instrumentation in Clang.
We add a new command line flag, -fsanitize=kernel-memory, with a
corresponding SanitizerKind::KernelMemory, which, along with
SanitizerKind::Memory, maps to the memory_sanitizer feature.
KMSAN is only supported on x86_64 Linux.
It's incompatible with other sanitizers, but supports code coverage
instrumentation.
llvm-svn: 341641
This reverts commit r341519, which generates debug info that causes
backend crashes. (with -split-dwarf-file)
Details in https://reviews.llvm.org/D50495
llvm-svn: 341549
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
After fixing problems in LiveDebugVariables.
Differential Revision: https://reviews.llvm.org/D45045
llvm-svn: 341519
Load Hardening.
Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.
Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.
While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.
This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.
Differential Revision: https://reviews.llvm.org/D51157
llvm-svn: 341363
These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc.
Includes these intrinsics:
_ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8
_ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8
_ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8
_ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8
llvm-svn: 341265
This adds:
_cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
_cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
_load_mask8, _load_mask16, _load_mask32, _load_mask64
_store_mask8, _store_mask16, _store_mask32, _store_mask64
These are currently missing from the Intel Intrinsics Guide webpage.
llvm-svn: 341251
This adds the following intrinsics:
_kshiftli_mask8
_kshiftli_mask16
_kshiftli_mask32
_kshiftli_mask64
_kshiftri_mask8
_kshiftri_mask16
_kshiftri_mask32
_kshiftri_mask64
llvm-svn: 341234
Summary:
Added option -gline-directives-only to support emission of the debug directives
only. It behaves very similar to -gline-tables-only, except that it sets
llvm debug info emission kind to
llvm::DICompileUnit::DebugDirectivesOnly.
Reviewers: echristo
Subscribers: aprantl, fedor.sergeev, JDevlieghere, cfe-commits
Differential Revision: https://reviews.llvm.org/D51177
llvm-svn: 341212
'declare target'.
All the functions, referenced in implicit|explicit target regions must
be emitted during code emission for the device.
llvm-svn: 341093
If the target construct can be executed in SPMD mode + it is a loop
based directive with static scheduling, we can use lightweight runtime
support.
llvm-svn: 340953
Since MinGW supports automatically importing external variables from
DLLs even without the DLLImport attribute, we shouldn't mark them
as DSO local unless we actually know them to be local for sure.
Keep marking thread local variables as DSO local.
Differential Revision: https://reviews.llvm.org/D51382
llvm-svn: 340941
Currently ident_t objects are created const when debug info is not
enabled, but the libittnotify libray in the OpenMP runtime writes to
the reserved_2 field (See __kmp_itt_region_forking in
openmp/runtime/src/kmp_itt.inl). Now create ident_t objects non-const.
Differential Revision: https://reviews.llvm.org/D51331
llvm-svn: 340934
This adds the following intrinsics:
_kadd_mask64
_kadd_mask32
_kadd_mask16
_kadd_mask8
These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc.
llvm-svn: 340879
This also adds a second intrinsic name for the 16-bit mask versions.
These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed.
llvm-svn: 340719
If all LLVM passes are disabled, we can't emit a summary because there
could be unnamed globals in the IR.
Differential Revision: https://reviews.llvm.org/D51198
llvm-svn: 340640
constants by default when there is no optimization.
GCC's option -fno-keep-static-consts can be used to not emit
unused static constants.
In Clang, since default behavior does not keep unused static constants,
-fkeep-static-consts can be used to emit these if required. This could be
useful for producing identification strings like SVN identifiers
inside the object file even though the string isn't used by the program.
Differential Revision: https://reviews.llvm.org/D40925
llvm-svn: 340439
of the captured variable when determining whether the capture needs
special handing when the block is copied or disposed.
This fixes bugs in the handling of variables captured by a block that is
nested inside a lambda that captures the variables by reference.
rdar://problem/43540889
Differential Revision: https://reviews.llvm.org/D51025
llvm-svn: 340408
EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that
work again.
This changes behavior: If e.g. ++a was passed as an arg, we incremented a twice
previously. This change fixes that bug.
https://reviews.llvm.org/D50979
llvm-svn: 340348
If using a custom stack alignment, one is expected to make sure
that all callers provide such alignment, or realign the stack in
all entry points (and callbacks).
Despite this, the compiler can assume that the main function will
need realignment in these cases, since the startup routines calling
the main function most probably won't provide the custom alignment.
This matches what GCC does in similar cases; if compiling with
-mincoming-stack-boundary=X -mpreferred-stack-boundary=X, GCC normally
assumes such alignment on entry to a function, but specifically for
the main function still does realignment.
Differential Revision: https://reviews.llvm.org/D51026
llvm-svn: 340334
This commit adds the flag -fno-c++-static-destructors and the attributes
[[clang::no_destroy]] and [[clang::always_destroy]]. no_destroy specifies that a
specific static or thread duration variable shouldn't have it's destructor
registered, and is the default in -fno-c++-static-destructors mode.
always_destroy is the opposite, and is the default in -fc++-static-destructors
mode.
A variable whose destructor is disabled (either because of
-fno-c++-static-destructors or [[clang::no_destroy]]) doesn't count as a use of
the destructor, so we don't do any access checking or mark it referenced. We
also don't emit -Wexit-time-destructors for these variables.
rdar://21734598
Differential revision: https://reviews.llvm.org/D50994
llvm-svn: 340306
This changes the current default behavior (from emitting pubnames by
default, to not emitting them by default) & moves to matching GCC's
behavior* with one significant difference: -gno(-gnu)-pubnames disables
pubnames even in the presence of -gsplit-dwarf (though -gsplit-dwarf
still by default enables -ggnu-pubnames). This allows users to disable
pubnames (& the new DWARF5 accelerated access tables) when they might
not be worth the size overhead.
* GCC's behavior is that -ggnu-pubnames and -gpubnames override each
other, and that -gno-gnu-pubnames and -gno-pubnames act as synonyms and
disable either kind of pubnames if they come last. (eg: -gpubnames
-gno-gnu-pubnames causes no pubnames (neither gnu or standard) to be
emitted)
llvm-svn: 340206
If the function is actually a weak reference, it should not be marked as
deferred definition as this is only a declaration. Patch adds checks for
the definitions if they must be emitted. Otherwise, only declaration is
emitted.
llvm-svn: 340191
by a block.
Added checks for capturing of the variable in the block when trying to
emit correct address for the variable with the reference type. This
extra check allows correctly identify the variables that are not
captured in the block context.
llvm-svn: 340181
This is a partial retry of rL340137 (reverted at rL340138 because of gcc host compiler crashing)
with 1 change:
Remove the changes to make microsoft builtins also use the LLVM intrinsics.
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops) that we want to replicate, we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340141
This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing)
with 2 changes:
1. Move the code into a helper to reduce code duplication (and hopefully work-around the crash).
2. The original commit had a formatting bug in the docs (missing an underscore).
Original commit message:
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340137
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
D49242
With improved codegen in:
rL337966
rL339359
And basic IR optimization added in:
rL338218
rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340135
expression
Clang emits invalid protocol metadata when a @protocol expression is used with a
forward-declared protocol. The protocol metadata is missing protocol conformance
list of the protocol since we don't have access to the definition of it in the
compiled translation unit. The linker then might end up picking the invalid
metadata when linking which will lead to incorrect runtime protocol conformance
checks.
This commit makes sure that Clang fails to compile code that uses a @protocol
expression with a forward-declared protocol. This ensures that Clang does not
emit invalid protocol metadata. I added an extra assert in CodeGen to ensure
that this kind of issue won't happen in other places.
rdar://32787811
Differential Revision: https://reviews.llvm.org/D49462
llvm-svn: 340102
Different shared libraries contain different fat binary, which is stored in a global variable
__hip_gpubin_handle. Since different compilation units share the same fat binary, this
variable has linkonce linkage. However, it should not be merged across different shared
libraries.
This patch set the visibility of the global variable to be hidden, which will make it invisible
in the shared library, therefore preventing it from being merged.
Differential Revision: https://reviews.llvm.org/D50596
llvm-svn: 340056
r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h.
Microsoft's STL plans on using these functions, and they're using intrin0.h
which just has declarations of built-ins to not pull in the huge intrin.h
header in the standard library headers. That requires that these functions are
real built-ins.
https://reviews.llvm.org/D50907
llvm-svn: 340048
Currently, clang generates a new block descriptor global variable for
each new block literal. This commit merges block descriptors that are
identical inside and across translation units using the same approach
taken in r339438.
To enable merging identical block descriptors, the size and signature of
the block and information about the captures are encoded into the name
of the block descriptor variable. Also, the block descriptor variable is
marked as linkonce_odr and unnamed_addr.
rdar://problem/42640703
Differential Revision: https://reviews.llvm.org/D50783
llvm-svn: 340041
- Add a command line options -msign-return-address to enable return address
signing
- Armv8.3a added instructions to sign the return address to help mitigate
against ROP attacks
- This patch adds command line options to generate function attributes that
signal to the back whether return address signing instructions should be
added
Differential revision: https://reviews.llvm.org/D49793
llvm-svn: 340019
Thread sanitizer instrumentation fails to skip all loads and stores to
profile counters. This can happen if profile counter updates are merged:
%.sink = phi i64* ...
%pgocount5 = load i64, i64* %.sink
%27 = add i64 %pgocount5, 1
%28 = bitcast i64* %.sink to i8*
call void @__tsan_write8(i8* %28)
store i64 %27, i64* %.sink
To suppress TSan diagnostics about racy counter updates, make the
counter updates atomic when TSan is enabled. If there's general interest
in this mode it can be surfaced as a clang/swift driver option.
Testing: check-{llvm,clang,profile}
rdar://40477803
Differential Revision: https://reviews.llvm.org/D50867
llvm-svn: 339955
The compiler may produce unexpected error messages/crashes when declare
target variables were used. Patch fixes problems with the declarations
marked as declare target to or link.
llvm-svn: 339805
Summary:
Another piece of my ongoing to work for prefer-vector-width.
min-legal-vector-width will eventually be used by the X86 backend to know whether it needs to make 512 bits type legal when prefer-vector-width=256. If the user used inline assembly that passed in/out a 512-bit register, we need to make sure 512 bits are considered legal. Otherwise we'll get an assert failure when we try to wire up the inline assembly to the rest of the code.
This patch just checks the LLVM IR types to see if they are vectors and then updates the attribute based on their total width. I'm not sure if this is the best way to do this or if there's any subtlety I might have missed. So if anyone has other opinions on how to do this I'm open to suggestions.
Reviewers: chandlerc, rsmith, rnk
Reviewed By: rnk
Subscribers: eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D50678
llvm-svn: 339721
Summary:
This probably fixes PR35277, though there may be other sources of
nondeterminism (this was the only case of iterating over a DenseMap).
It's difficult to provide a test case for this, because it shows up only
on systems with ASLR enabled.
Reviewers: rjmccall
Reviewed By: rjmccall
Subscribers: bmwiedemann, mgrang, cfe-commits
Differential Revision: https://reviews.llvm.org/D50559
llvm-svn: 339668
Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.
Reviewers: craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D46892
llvm-svn: 339651
Clang generates copy and dispose helper functions for each block literal
on the stack. Often these functions are equivalent for different blocks.
This commit makes changes to merge equivalent copy and dispose helper
functions and reduce code size.
To enable merging equivalent copy/dispose functions, the captured object
infomation is encoded into the helper function name. This allows IRGen
to check whether an equivalent helper function has already been emitted
and reuse the function instead of generating a new helper function
whenever a block is defined. In addition, the helper functions are
marked as linkonce_odr to enable merging helper functions that have the
same name across translation units and marked as unnamed_addr to enable
the linker's deduplication pass to merge functions that have different
names but the same content.
rdar://problem/42640608
Differential Revision: https://reviews.llvm.org/D50152
llvm-svn: 339438
Summary:
Introduces funclet-based unwinding for Objective-C and fixes an issue
where global blocks can't have their isa pointers initialised on
Windows.
After discussion with Dustin, this changes the name mangling of
Objective-C types to prevent a C++ catch statement of type struct X*
from catching an Objective-C object of type X*.
Reviewers: rjmccall, DHowett-MSFT
Reviewed By: rjmccall, DHowett-MSFT
Subscribers: mgrang, mstorsjo, smeenai, cfe-commits
Differential Revision: https://reviews.llvm.org/D50144
llvm-svn: 339428
This extension emits the guard cf table without inserting the
instrumentation. Currently that's what clang-cl does with /guard:cf
anyway, but this allows a user to request that explicitly.
Differential Revision: https://reviews.llvm.org/D50513
llvm-svn: 339420
Summary:
Windows does not allow globals to be initialised to point to globals in
another DLL. Exported globals may be referenced only from code. Work
around this by creating an initialiser that runs in early library
initialisation and sets the isa pointer.
Reviewers: rjmccall
Reviewed By: rjmccall
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D50436
llvm-svn: 339317
gcc defines an intrinsic called __builtin_clrsb which counts the number of extra sign bits on a number. This is equivalent to counting the number of leading zeros on a positive number or the number of leading ones on a negative number and subtracting one from the result. Since we can't count leading ones we need to invert negative numbers to count zeros.
This patch will cause the builtin to be expanded inline while gcc uses a call to a function like clrsbdi2 that is implemented in libgcc. But this is similar to what we already do for popcnt. And I don't think compiler-rt supports clrsbdi2.
Differential Revision: https://reviews.llvm.org/D50168
llvm-svn: 339282
r330571 added a new FrontendTimesIsEnabled variable and replaced many usages of llvm::TimePassesIsEnabled. Including the place that set llvm::TimePassesIsEnabled for -ftime-report. The effect of this is that -ftime-report now only contains the timers specifically referenced in CodeGenAction.cpp and none of the timers in the backend.
This commit adds back the assignment, but otherwise leaves everything else unchanged.
llvm-svn: 339281
As suggested by @theraven on PR38210, this patch fixes the gcc -Woverloaded-virtual warnings by renaming the extra CGObjCGNU::GetSelector method to CGObjCGNU::GetTypedSelector
Differential Revision: https://reviews.llvm.org/D50448
llvm-svn: 339264
declare target.
According to OpenMP 5.0, variables captured in lambdas in declare target
regions must be considered as implicitly declare target.
llvm-svn: 339152
Always emit alloca in entry block for enqueue_kernel builtin.
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.
llvm-svn: 339150
These were intended to allow non-fragile and fragile ABI code to be
mixed, as long as the fragile classes were higher up the hierarchy than
the non-fragile ones. Unfortunately:
- No one actually wants to do this.
- Recent versions of Linux's run-time linker break it.
llvm-svn: 339128
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
Differential Revision: https://reviews.llvm.org/D45045
llvm-svn: 338989
After refactoring DbgInfoIntrinsic class hierarchy, we use
DbgVariableIntrinsic as the base class of variable debug info.
In resolveTopLevelMetadata() in CGVTables.cpp, we only care about
dbg.value, so we try to cast the instructions to DbgVariableIntrinsic
before resolving variables.
Differential Revision: https://reviews.llvm.org/D50226
llvm-svn: 338985
When a non-extended temporary object is created in a conditional branch, the
lifetime of that temporary ends outside the conditional (at the end of the
full-expression). If we're inserting lifetime markers, this means we could end
up generating
if (some_cond) {
lifetime.start(&tmp);
Tmp::Tmp(&tmp);
}
// ...
if (some_cond) {
lifetime.end(&tmp);
}
... for a full-expression containing a subexpression of the form `some_cond ?
Tmp().x : 0`. This patch moves the lifetime start for such a temporary out of
the conditional branch so that we don't need to generate an additional basic
block to hold the lifetime end marker.
This is disabled if we want precise lifetime markers (for asan's
stack-use-after-scope checks) or of the temporary has a non-trivial destructor
(in which case we'd generate an extra basic block anyway to hold the destructor
call).
Differential Revision: https://reviews.llvm.org/D50286
llvm-svn: 338945
Encoding offload target triples onto comdat group key for offload initialization
code guarantees that it will be executed once per each unique combination of
offload targets.
Differential Revision: https://reviews.llvm.org/D50218
llvm-svn: 338916
Found by KlockWorks, this variable is properly protected, however
the conditions in the test that initializes it and the one that uses
it could diverge, it seems to me that this is a 'free' init that will
prevent issues if one of the conditions is ever modified without the other.
llvm-svn: 338909
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.
Differential Revision: https://reviews.llvm.org/D50104
llvm-svn: 338899
Summary:
Emit !llvm.mem.parallel_loop_access metadata for memory accesses even if the parallel loop is not the top on the loop stack.
Fixes llvm.org/PR37558.
Reviewers: ABataev, hfinkel, amusman, tyler.nowicki
Reviewed By: hfinkel
Subscribers: Meinersbur, hfinkel, cfe-commits
Differential Revision: https://reviews.llvm.org/D48808
llvm-svn: 338810
The way address space declarations for builtins currently work
is nearly useless. The code assumes the address spaces used for
builtins is a confusingly named "target address space" from user
code using __attribute__((address_space(N))) that matches
the builtin declaration. There's no way to use this to declare
a builtin that returns a language specific address space.
The terminology used is highly cofusing since it has nothing
to do with the the address space selected by the target to use
for a language address space.
This feature is essentially unused as-is. AMDGPU and NVPTX
are the only in-tree targets attempting to use this. The AMDGPU
builtins certainly do not behave as intended (i.e. all of the
builtins returning pointers can never compile because the numbered
address space never matches the expected named address space).
The NVPTX builtins are missing tests for some, and the others
seem to rely on an implicit addrspacecast.
Change the used address space for builtins based on a target
hook to allow using a language address space for a builtin.
This allows the same builtin declaration to be used for multiple
languages with similarly purposed address spaces (e.g. the same
AMDGPU builtin can be used in OpenCL and CUDA even though the
constant address spaces are arbitarily different).
This breaks the possibility of using arbitrary numbered
address spaces alongside the named address spaces for builtins.
If this is an issue we probably need to introduce another builtin
declaration character to distinguish language address spaces from
so-called "target address spaces".
llvm-svn: 338707
This adds support for the unroll_and_jam pragma, to go with the recently
added unroll and jam pass. The name of the pragma is the same as is used
in the Intel compiler, and most of the code works the same as for unroll.
#pragma clang loop unroll_and_jam has been separated into a different
patch. This part adds #pragma unroll_and_jam with an optional count, and
#pragma no_unroll_and_jam to disable the transform.
Differential Revision: https://reviews.llvm.org/D47267
llvm-svn: 338566
offload targets.
Changed the linkage of omp_offloading.img_start.<triple> and omp_offloading.img_end.<triple> symbols from external to external weak to allow dropping of some targets during linking.
llvm-svn: 338413
No need to change the linkage, we can avoid the problem using special variable. That points to the original variable and, thus, prevent some of the optimizations that might break the compilation.
llvm-svn: 338399
OpenCL block literal structs have different fields which are now correctly
identified in the debug info.
Differential Revision: https://reviews.llvm.org/D49930
llvm-svn: 338299
Summary:
C and C++ are interesting languages. They are statically typed, but weakly.
The implicit conversions are allowed. This is nice, allows to write code
while balancing between getting drowned in everything being convertible,
and nothing being convertible. As usual, this comes with a price:
```
unsigned char store = 0;
bool consume(unsigned int val);
void test(unsigned long val) {
if (consume(val)) {
// the 'val' is `unsigned long`, but `consume()` takes `unsigned int`.
// If their bit widths are different on this platform, the implicit
// truncation happens. And if that `unsigned long` had a value bigger
// than UINT_MAX, then you may or may not have a bug.
// Similarly, integer addition happens on `int`s, so `store` will
// be promoted to an `int`, the sum calculated (0+768=768),
// and the result demoted to `unsigned char`, and stored to `store`.
// In this case, the `store` will still be 0. Again, not always intended.
store = store + 768; // before addition, 'store' was promoted to int.
}
// But yes, sometimes this is intentional.
// You can either make the conversion explicit
(void)consume((unsigned int)val);
// or mask the value so no bits will be *implicitly* lost.
(void)consume((~((unsigned int)0)) & val);
}
```
Yes, there is a `-Wconversion`` diagnostic group, but first, it is kinda
noisy, since it warns on everything (unlike sanitizers, warning on an
actual issues), and second, there are cases where it does **not** warn.
So a Sanitizer is needed. I don't have any motivational numbers, but i know
i had this kind of problem 10-20 times, and it was never easy to track down.
The logic to detect whether an truncation has happened is pretty simple
if you think about it - https://godbolt.org/g/NEzXbb - basically, just
extend (using the new, not original!, signedness) the 'truncated' value
back to it's original width, and equality-compare it with the original value.
The most non-trivial thing here is the logic to detect whether this
`ImplicitCastExpr` AST node is **actually** an implicit conversion, //or//
part of an explicit cast. Because the explicit casts are modeled as an outer
`ExplicitCastExpr` with some `ImplicitCastExpr`'s as **direct** children.
https://godbolt.org/g/eE1GkJ
Nowadays, we can just use the new `part_of_explicit_cast` flag, which is set
on all the implicitly-added `ImplicitCastExpr`'s of an `ExplicitCastExpr`.
So if that flag is **not** set, then it is an actual implicit conversion.
As you may have noted, this isn't just named `-fsanitize=implicit-integer-truncation`.
There are potentially some more implicit conversions to be warned about.
Namely, implicit conversions that result in sign change; implicit conversion
between different floating point types, or between fp and an integer,
when again, that conversion is lossy.
One thing i know isn't handled is bitfields.
This is a clang part.
The compiler-rt part is D48959.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=21530 | PR21530 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=37552 | PR37552 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=35409 | PR35409 ]].
Partially fixes [[ https://bugs.llvm.org/show_bug.cgi?id=9821 | PR9821 ]].
Fixes https://github.com/google/sanitizers/issues/940. (other than sign-changing implicit conversions)
Reviewers: rjmccall, rsmith, samsonov, pcc, vsk, eugenis, efriedma, kcc, erichkeane
Reviewed By: rsmith, vsk, erichkeane
Subscribers: erichkeane, klimek, #sanitizers, aaron.ballman, RKSimon, dtzWill, filcab, danielaustin, ygribov, dvyukov, milianw, mclow.lists, cfe-commits, regehr
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D48958
llvm-svn: 338288
The "Procedure Call Procedure Call Standard for the ARM® Architecture"
(https://static.docs.arm.com/ihi0042/f/IHI0042F_aapcs.pdf), specifies that
composite types are passed according to their "natural alignment", i.e. the
alignment before alignment adjustment on the entire composite is applied.
The same applies for AArch64 ABI.
Clang, however, used the adjusted alignment.
GCC already implements the ABI correctly. With this patch Clang becomes
compatible with GCC and passes such arguments in accordance with AAPCS.
Differential Revision: https://reviews.llvm.org/D46013
llvm-svn: 338279
This commit increases the number of sections and overall output size of
.o files by 10% and sometimes a bit more. This alone is challenging for
some users, but it also appears to trigger an as-yet unexplained
behavior in the Gold linker where the memory usage increases
considerably more than 10% (we think).
The increase is also frustrating because in many (if not all) cases we
end up with almost all of the growth coming from the ELF overhead of
-ffunction-sections and such, not from actual extra code being emitted.
Richard Smith and Eric Christopher are both going to investigate this
and try to get to the bottom of what is triggering this and whether the
kinds of increases here are sustainable or what options we might have to
minimize the impact they have. However, this is currently breaking
a pretty large number of our users' builds so reverting it while we sort
out how to make progress here. I've seen a longer and more detailed
update to the commit thread.
llvm-svn: 338209
With this change compiler generates alignment checks for wider range
of types. Previously such checks were generated only for the record types
with non-trivial default constructor. So the types like:
struct alignas(32) S2 { int x; };
typedef __attribute__((ext_vector_type(2), aligned(32))) float float32x2_t;
did not get checks when allocated by 'new' expression.
This change also optimizes the checks generated for the arrays created
in 'new' expressions. Previously the check was generated for each
invocation of type constructor. Now the check is generated only once
for entire array.
Differential Revision: https://reviews.llvm.org/D49589
llvm-svn: 338199
CUDA 8.0 E.3.9.4 says: Within the body of a __device__ or __global__
function, only __shared__ variables or variables without any device
memory qualifiers may be declared with static storage class.
It is unclear how a function-scope non-const static variable
without device memory qualifier is implemented, therefore only static
const variable without device memory qualifier is allowed, which
can be emitted as a global variable in constant address space.
Currently clang only allows function-scope static variable with
__shared__ qualifier.
This patch also allows function-scope static const variable without
device memory qualifier and emits it as a global variable in constant
address space.
Differential Revision: https://reviews.llvm.org/D49931
llvm-svn: 338188
Summary: Microsoft's C++ object model for ARM64 is the same as that for X86_64.
For example, small structs with non-trivial copy constructors or virtual
function tables are passed indirectly. Currently, they are passed in registers
when compiled with clang.
Reviewers: rnk, mstorsjo, TomTan, haripul, javed.absar
Reviewed By: rnk, mstorsjo
Subscribers: kristof.beyls, chrib, llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D49770
llvm-svn: 338076
Summary:
Clang supports the GNU style ``__attribute__((interrupt))`` attribute on RISCV targets.
Permissible values for this parameter are user, supervisor, and machine.
If there is no parameter, then it defaults to machine.
Reference: https://gcc.gnu.org/onlinedocs/gcc/RISC-V-Function-Attributes.html
Based on initial patch by Zhaoshi Zheng.
Reviewers: asb, aaron.ballman
Reviewed By: asb, aaron.ballman
Subscribers: rkruppe, the_o, aaron.ballman, MartinMosbeck, brucehoult, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, cfe-commits
Differential Revision: https://reviews.llvm.org/D48412
llvm-svn: 338045
When an exception is thrown in a block copy helper function, captured
objects that have previously been copied should be destructed or
released. Similarly, captured objects that are yet to be released should
be released when an exception is thrown in a dispose helper function.
rdar://problem/42410255
Differential Revision: https://reviews.llvm.org/D49718
llvm-svn: 338041
The first argument for the parallel outlined functions, called as
serialized parallel regions, should be a pointer to the global thread id
that always is 0.
llvm-svn: 337957
Summary:
Right now automatic variables are either initialized with bzero followed by a few stores, or memcpy'd from a synthesized global. We end up encountering a fair amount of code where memcpy of non-zero byte patterns would be better than memcpy from a global because it touches less memory and generates a smaller binary. The optimizer could reason about this, but it's not really worth it when clang already knows.
This code could definitely be more clever but I'm not sure it's worth it. In particular we could track a histogram of bytes seen and figure out (as we do with bzero) if a memset could be followed by a handful of stores. Similarly, we could tune the heuristics for GlobalSize, but using the same as for bzero seems conservatively OK for now.
<rdar://problem/42563091>
Reviewers: dexonsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D49771
llvm-svn: 337887
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
Differential Revision: https://reviews.llvm.org/D45045
Patch by Hsiangkai Wang.
llvm-svn: 337800
This patch adds support for vrndi_f32() and vrndiq_f32()
intrinsics in AArch32 mode and for vrndns_f32() intrinsic in
AArch64 mode.
Differential Revision: https://reviews.llvm.org/D48829
llvm-svn: 337690
The optimization looks for opportunities to emit bzero, not memset. Rename the functions accordingly (and clang-format the diff) because I want to add a fallback optimization which actually tries to generate memset. bzero is still better and it would confuse the code to merge both.
llvm-svn: 337636
HIP generates one fat binary for all devices after linking. However, for each compilation
unit a ctor function is emitted which register the same fat binary. Measures need to be
taken to make sure the fat binary is only registered once.
Currently each ctor function calls __hipRegisterFatBinary and stores the returned value
to __hip_gpubin_handle. This patch changes the linkage of __hip_gpubin_handle to be linkonce
so that they are shared between LLVM modules. Then this patch adds check of value of
__hip_gpubin_handle to make sure __hipRegisterFatBinary is only called once. The code
is equivalent to
void *_gpubin_handle;
void ctor() {
if (__hip_gpubin_handle == 0) {
__hip_gpubin_handle = __hipRegisterFatBinary(...);
}
// register kernels and variables.
}
The patch also does similar change to dtors so that __hipUnregisterFatBinary
is called once.
Differential Revision: https://reviews.llvm.org/D49083
llvm-svn: 337631
MSVC doesn't, so neither should we.
Fixes PR38004, which is a crash that happens when we try to emit debug
info for a still-dependent partial variable template specialization.
As a follow-up, we should review what we're doing for function and class
member templates. It looks like we don't filter those out, but I can't
seem to get clang to emit any.
llvm-svn: 337616
no-ops.
A non-escaping block on the stack will never be called after its
lifetime ends, so it doesn't have to be copied to the heap. To prevent
a non-escaping block from being copied to the heap, this patch sets
field 'isa' of the block object to NSConcreteGlobalBlock and sets the
BLOCK_IS_GLOBAL bit of field 'flags', which causes the runtime to treat
the block as if it were a global block (calling _Block_copy on the block
just returns the original block and calling _Block_release is a no-op).
Also, a new flag bit 'BLOCK_IS_NOESCAPE' is added, which allows the
runtime or tools to distinguish between true global blocks and
non-escaping blocks.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D49303
llvm-svn: 337580
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.
This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.
This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementions
each in their own translation units.
The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.
Differential Revision: https://reviews.llvm.org/D47474
llvm-svn: 337552
constant, don't convert the rest into a packed struct.
If an array constant has a large non-zero portion and a large zero
portion, we want to emit the first part as an array and the rest as a
zeroinitializer if possible. This fixes a memory usage regression from
r333141 when compiling PHP.
llvm-svn: 337498
device IDs are now 64-bit integers (as opposed to 32-bit)
map flags are 64-bit long (used to be 32-bit)
mappings for partially mapped structs are now calculated at compile time and members of partially mapped structs are flagged using the MEMBER_OF field
Support for is_device_ptr on struct members was dropped - this functionality is not supported by the OpenMP standard and its implementation is technically infeasible (however, use_device_ptr on struct members works as a non-standard extension of the compiler)
llvm-svn: 337468
The previous version of this patch (r332839) was reverted because it was
causing "definition with same mangled name as another definition" errors
in some module builds. This was caused by an unrelated bug in module
importing which it exposed. The importing problem was fixed in r336240,
so this recommits the original patch (r332839).
Differential Revision: https://reviews.llvm.org/D46685
llvm-svn: 337456
The codegen for this builtin was initially implemented to match GCC.
However, due to interest from users GCC changed behaviour to account for the
big endian bias of the instruction and correct it. This patch brings the
handling inline with GCC.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38192
Differential Revision: https://reviews.llvm.org/D49424
llvm-svn: 337449
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in as the function attribute
"null-pointer-is-valid"="true".
This CL only adds the attribute on the function.
It also strips "nonnull" attributes from function arguments but
keeps the related warnings unchanged.
Corresponding LLVM change rL336613 already updated the
optimizations to not treat null pointer dereferencing
as undefined if the attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: jyknight
Subscribers: drinkcat, xbolva00, cfe-commits
Differential Revision: https://reviews.llvm.org/D47894
llvm-svn: 337433
This patch uses CodeSegAttr to represent __declspec(code_seg) rather than
building on the existing support for #pragma code_seg.
The code_seg declspec is applied on functions and classes. This attribute
enables the placement of code into separate named segments, including compiler-
generated codes and template instantiations.
For more information, please see the following:
https://msdn.microsoft.com/en-us/library/dn636922.aspx
This patch fixes the regression for the support for attribute ((section).
746b78de78
Patch by Soumi Manna (Manna)
Differential Revision: https://reviews.llvm.org/D48841
llvm-svn: 337420
which was reverted in r337336.
The problem that required a revert was fixed in r337338.
Also added a missing "REQUIRES: x86-registered-target" to one of
the tests.
Original commit message:
> Teach Clang to emit address-significance tables.
>
> By default, we emit an address-significance table on all ELF
> targets when the integrated assembler is enabled. The emission of an
> address-significance table can be controlled with the -faddrsig and
> -fno-addrsig flags.
>
> Differential Revision: https://reviews.llvm.org/D48155
llvm-svn: 337339
Causing multiple failures on sanitizer bots due to TLS symbol errors,
e.g.
/usr/bin/ld: __msan_origin_tls: TLS definition in /home/buildbots/ppc64be-clang-test/clang-ppc64be/stage1/lib/clang/7.0.0/lib/linux/libclang_rt.msan-powerpc64.a(msan.cc.o) section .tbss.__msan_origin_tls mismatches non-TLS reference in /tmp/lit_tmp_0a71tA/mallinfo-3ca75e.o
llvm-svn: 337336
By default, we emit an address-significance table on all ELF
targets when the integrated assembler is enabled. The emission of an
address-significance table can be controlled with the -faddrsig and
-fno-addrsig flags.
Differential Revision: https://reviews.llvm.org/D48155
llvm-svn: 337333
If the declare target link entries are created but not used, the
compiler will produce an error message. Patch improves handling of such
situations + improves checks for possibly lost declare target variables.
llvm-svn: 337207
Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds.
Subscribers: dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D49209
llvm-svn: 337041
Summary: In the SPMD case, we need to initialize the data sharing and globalization infrastructure. This covers the case when an SPMD region calls a function in a different compilation unit.
Reviewers: ABataev, carlo.bertolli, caomhin
Reviewed By: ABataev
Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D49188
llvm-svn: 337015
Code in `CodeGenModule::SetFunctionAttributes()` could set an empty
attribute `implicit-section-name` on a function that is affected by
`#pragma clang text="section"`. This is incorrect because the attribute
should contain a valid section name. If the function additionally also
used `__attribute__((section("section")))` then this could result in
emitting the function in a section with an empty name.
The patch fixes the issue by removing the problematic code that sets
empty `implicit-section-name` from
`CodeGenModule::SetFunctionAttributes()` because it is sufficient to set
this attribute only from a similar code in `setNonAliasAttributes()`
when the function is emitted.
Differential Revision: https://reviews.llvm.org/D48916
llvm-svn: 336842
The member init list for the sole constructor for CodeGenFunction
has gotten out of hand, so this patch moves the non-parameter-dependent
initializations into the member value inits.
Note: This is what was intended to be committed in r336726
llvm-svn: 336729
The member init list for the sole constructor for CodeGenFunction
has gotten out of hand, so this patch moves the non-parameter-dependent
initializations into the member value inits.
llvm-svn: 336726
Summary:
Make sure that loop metadata only is put on the backedge
when expanding a do-while loop.
Previously we added the loop metadata also on the branch
in the pre-header. That could confuse optimization passes
and result in the loop metadata being associated with the
wrong loop.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38011
Committing on behalf of deepak2427 (Deepak Panickal)
Reviewers: #clang, ABataev, hfinkel, aaron.ballman, bjope
Reviewed By: bjope
Subscribers: bjope, rsmith, shenhan, zzheng, xbolva00, lebedev.ri, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D48721
llvm-svn: 336717
This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.
This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.
llvm-svn: 336622
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.
There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
In generic data-sharing mode we are allowed to not globalize local
variables that escape their declaration context iff they are declared
inside of the parallel region. We can do this because L2 parallel
regions are executed sequentially and, thus, we do not need to put
shared local variables in the global memory.
llvm-svn: 336567
This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases.
Factor the code into a helper function to make this easy.
llvm-svn: 336472
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.
While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
llvm-svn: 336388
This patch removes on optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D48715
llvm-svn: 336355
Update clang to treat fp128 as a valid base type for homogeneous aggregate
passing and returning.
Differential Revision: https://reviews.llvm.org/D48044
llvm-svn: 336308
Summary:
Emmiting new intrinsic that strips invariant.groups to make
devirtulization sound, as described in RFC: Devirtualization v2.
Reviewers: rjmccall, rsmith, amharc, kuhar
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D47299
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336137
This matches the way NVCC does it. Doing module cleanup at global
destructor phase used to work, but is, apparently, too late for
the CUDA runtime in CUDA-9.2, which ends up crashing with double-free.
Differential Revision: https://reviews.llvm.org/D48613
llvm-svn: 335763
As brought up during the discussion of the DWARF5 accelerator tables,
there is currently no way to associate Objective-C methods with the
interface they belong to, other than the .apple_objc accelerator table.
After due consideration we came to the conclusion that it makes more
sense to follow Pavel's suggestion of just emitting this information in
the .debug_info section. One concern was that categories were
emitted in the .apple_names as well, but it turns out that LLDB doesn't
rely on the accelerator tables for this information.
This patch changes the codegen behavior to emit subprograms for
structure types, like we do for C++. This will result in the
DW_TAG_subprogram being nested as a child under its
DW_TAG_structure_type. This behavior is only enabled for DWARF5 and
later, so we can have a unique code path in LLDB with regards to
obtaining the class methods.
This was tested on the LLDB side and doesn't lead to a regression.
There's already code in place to deal with member functions in C++,
which deals with this transparently.
For more background please refer to the discussion on the mailing list:
http://lists.llvm.org/pipermail/llvm-dev/2018-June/123986.html
Differential revision: https://reviews.llvm.org/D48241
llvm-svn: 335757
We track when we see a name-shaped expression followed by a '<' token
and parse the '<' as a comparison. Then:
* if we see a token sequence that cannot possibly be an expression but
can be a template argument (in particular, a type-id) that follows
either a ',' or the '<', diagnose that the '<' was supposed to start
a template argument list, and
* if we see '>()', diagnose that the '<' was supposed to start a
template argument list.
This only changes the diagnostic for error cases, and in practice
appears to catch the most common cases where a missing 'template'
keyword leads to parse errors within a template.
Differential Revision: https://reviews.llvm.org/D48571
llvm-svn: 335687
Patch tries to make better analysis of the variables that should be
globalized. From now, instead of all parallel directives it will check
only distribute parallel .. directives and check only for
firstprivte/lastprivate variables if they must be globalized.
llvm-svn: 335632
Similarly to CFI on virtual and indirect calls, this implementation
tries to use program type information to make the checks as precise
as possible. The basic way that it works is as follows, where `C`
is the name of the class being defined or the target of a call and
the function type is assumed to be `void()`.
For virtual calls:
- Attach type metadata to the addresses of function pointers in vtables
(not the functions themselves) of type `void (B::*)()` for each `B`
that is a recursive dynamic base class of `C`, including `C` itself.
This type metadata has an annotation that the type is for virtual
calls (to distinguish it from the non-virtual case).
- At the call site, check that the computed address of the function
pointer in the vtable has type `void (C::*)()`.
For non-virtual calls:
- Attach type metadata to each non-virtual member function whose address
can be taken with a member function pointer. The type of a function
in class `C` of type `void()` is each of the types `void (B::*)()`
where `B` is a most-base class of `C`. A most-base class of `C`
is defined as a recursive base class of `C`, including `C` itself,
that does not have any bases.
- At the call site, check that the function pointer has one of the types
`void (B::*)()` where `B` is a most-base class of `C`.
Differential Revision: https://reviews.llvm.org/D47567
llvm-svn: 335569
Additional IR is emitted to convert between scalar and vXi1 type to match the expected software inferface for the builtin that clang exposes.
llvm-svn: 335564
The WebAssembly backend in particular benefits from being
able to distinguish between varargs functions (...) and prototype-less
C functions.
Differential Revision: https://reviews.llvm.org/D48443
llvm-svn: 335510
Summary:
In his review of https://reviews.llvm.org/D45860, @GorNishanov suggested
avoiding generating additional exception-handling IR in the case that
the resume function was marked as 'noexcept', and exceptions could not
occur. This implements that suggestion.
Test Plan: `check-clang`
Reviewers: GorNishanov, EricWF
Reviewed By: GorNishanov
Subscribers: cfe-commits, GorNishanov
Differential Revision: https://reviews.llvm.org/D47673
llvm-svn: 335422
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).
See https://reviews.llvm.org/D34156 for details on the original change.
This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.
llvm-svn: 335385
If the shuffle is required for the reduced structures/big data type,
current code may cause compiler crash because of the loading of the
aggregate values. Patch fixes this problem.
llvm-svn: 335377
D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled.
This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.
llvm-svn: 335308
This is breaking a couple of buildbots. We need to run the
NameAnonGlobal pass for regular LTO now as well (since we're producing a
summary). I'll post a separate patch for review to make this happen and
then re-commit.
This reverts commit c0759b7b1f4a81ff9021b952aa38a222d5fa4dfd.
llvm-svn: 335291
parallel region.
If the current construct requires sharing of the local variable in the
inner parallel region, this variable must be globalized to avoid
runtime crash.
llvm-svn: 335285
Summary:
With D33921, we gained the ability to have module summaries in regular
LTO modules without triggering ThinLTO compilation. Module summaries in
regular LTO allow garbage collection (dead stripping) before LTO
compilation and thus open up additional optimization opportunities.
This patch enables summary emission in regular LTO for all targets
except ld64-based ones (which use the legacy LTO API).
Reviewers: pcc, tejohnson, mehdi_amini
Subscribers: inglorion, eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D34156
llvm-svn: 335284
Summary:
This test is a strip down version of a function inside the
amalgamated sqlite source. When converted to IR clang produces
a phi instruction without debug location.
This patch fixes the above issue.
Differential Revision: https://reviews.llvm.org/D47720
llvm-svn: 335255
This diff includes the logic for setting the precision bits for each primary fixed point type in the target info and logic for initializing a fixed point literal.
Fixed point literals are declared using the suffixes
```
hr: short _Fract
uhr: unsigned short _Fract
r: _Fract
ur: unsigned _Fract
lr: long _Fract
ulr: unsigned long _Fract
hk: short _Accum
uhk: unsigned short _Accum
k: _Accum
uk: unsigned _Accum
```
Errors are also thrown for illegal literal values
```
unsigned short _Accum u_short_accum = 256.0uhk; // expected-error{{the integral part of this literal is too large for this unsigned _Accum type}}
```
Differential Revision: https://reviews.llvm.org/D46915
llvm-svn: 335148
This is not only semantically correct but ensures that they will not
be marked as address-significant once D48155 lands.
Differential Revision: https://reviews.llvm.org/D48206
llvm-svn: 334982
Summary: All *_sqrt_round_s[s|d] intrinsics should execute a square root on
zeroth element from B (Ops[1]) and insert in to A (Ops[0]), not the other way around.
Reviewers: itaraban, craig.topper
Reviewed By: craig.topper
Subscribers: craig.topper, cfe-commits
Differential Revision: https://reviews.llvm.org/D48288
llvm-svn: 334964
The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin since the allowed range also got multiplied by 8.
This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic.
Fixes the remaining issue from PR37795.
llvm-svn: 334773
This diff includes changes for the remaining _Fract and _Sat fixed point types.
```
signed short _Fract s_short_fract;
signed _Fract s_fract;
signed long _Fract s_long_fract;
unsigned short _Fract u_short_fract;
unsigned _Fract u_fract;
unsigned long _Fract u_long_fract;
// Aliased fixed point types
short _Accum short_accum;
_Accum accum;
long _Accum long_accum;
short _Fract short_fract;
_Fract fract;
long _Fract long_fract;
// Saturated fixed point types
_Sat signed short _Accum sat_s_short_accum;
_Sat signed _Accum sat_s_accum;
_Sat signed long _Accum sat_s_long_accum;
_Sat unsigned short _Accum sat_u_short_accum;
_Sat unsigned _Accum sat_u_accum;
_Sat unsigned long _Accum sat_u_long_accum;
_Sat signed short _Fract sat_s_short_fract;
_Sat signed _Fract sat_s_fract;
_Sat signed long _Fract sat_s_long_fract;
_Sat unsigned short _Fract sat_u_short_fract;
_Sat unsigned _Fract sat_u_fract;
_Sat unsigned long _Fract sat_u_long_fract;
// Aliased saturated fixed point types
_Sat short _Accum sat_short_accum;
_Sat _Accum sat_accum;
_Sat long _Accum sat_long_accum;
_Sat short _Fract sat_short_fract;
_Sat _Fract sat_fract;
_Sat long _Fract sat_long_fract;
```
This diff only allows for declaration of these fixed point types. Assignment and other operations done on fixed point types according to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf will be added in future patches.
Differential Revision: https://reviews.llvm.org/D46911
llvm-svn: 334718
Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.
Reviewers: mstorsjo, compnerd, javed.absar
Reviewed By: mstorsjo
Subscribers: kristof.beyls, chrib, cfe-commits
Differential Revision: https://reviews.llvm.org/D48132
llvm-svn: 334639
Summary:
In many cases we can't devirtualize
because definition of vtable is not present. Most of the
time it is caused by inline virtual function not beeing
emitted. Forcing emitting of vtable adds a reference of these
inline virtual functions.
Note that GCC was always doing it.
Reviewers: rjmccall, rsmith, amharc, kuhar
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D47108
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 334600
Currently clang set kernel calling convention for CUDA/HIP after
arranging function, which causes incorrect kernel function type since
it depends on calling convention.
This patch moves setting kernel convention before arranging
function.
Differential Revision: https://reviews.llvm.org/D47733
llvm-svn: 334457
This should reduce the binary size penalty of ASan on Windows. After
r334313, ASan will add red zones to globals in comdats, so we will still
find OOB accesses to string literals.
llvm-svn: 334417
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.
Reviewers: RKSimon, delena, spatel, GBuella
Reviewed By: RKSimon
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D47693
llvm-svn: 334366
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47446
llvm-svn: 334362
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work.
These places were found by hiding the begin/end methods in the SmallSet forwarding.
llvm-svn: 334339
I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl.
By using special buitlins we can emit a select without using the 128/256-bit select builtins.
llvm-svn: 334331
I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features.
The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported.
llvm-svn: 334330
CGM.GetAddrOfConstantCString() sets the adress of the created GlobalValue
to unnamed. When emitting the object file LLVM will mark the surrounding
section as SHF_MERGE iff the string is nul-terminated and contains no
other nuls (see IsNullTerminatedString). This results in problems when
saving temporaries because LLVM doesn't set an EntrySize, so reading in
the serialized assembly file fails.
This never happened for the GPU binaries because they usually contain
a nul-character somewhere. Instead this only affected the module ID
when compiling relocatable device code.
However, this points to a potentially larger problem: If we put a
constant string into a named section, we really want the data to end
up in that section in the object file. To avoid LLVM merging sections
this patch unmarks the GlobalVariable's address as unnamed which also
fixes the problem of invalid serialized assembly files when saving
temporaries.
Differential Revision: https://reviews.llvm.org/D47902
llvm-svn: 334281
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
llvm-svn: 334261
The windows-msvc target is meant to be ABI compatible with MSVC,
including the exception handling. Ensure that a windows-msvc triple
always equates to the MSVC personality being used.
This mostly affects the GNUStep and ObjFW Obj-C runtimes. To the best of
my knowledge, those are normally not used with windows-msvc triples. I
believe WinObjC is based on GNUStep (or it at least uses libobjc2), but
that also takes the approach of wrapping Obj-C exceptions in C++
exceptions, so the MSVC personality function is the right one to use
there as well.
Differential Revision: https://reviews.llvm.org/D47862
llvm-svn: 334253
Adds support for these intrinsics, which are ARM and ARM64 only:
_interlockedbittestandreset_acq
_interlockedbittestandreset_rel
_interlockedbittestandreset_nf
_interlockedbittestandset_acq
_interlockedbittestandset_rel
_interlockedbittestandset_nf
Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.
llvm-svn: 334239
We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.
I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.
llvm-svn: 334237
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.
This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.
llvm-svn: 334208
Summary:
When requirement imposed by __target__ attributes on functions
are not satisfied, prefer printing those requirements, which
are explicitly mentioned in the attributes.
This makes such messages more useful, e.g. printing avx512f instead of avx2
in the following scenario:
```
$ cat foo.c
static inline void __attribute__((__always_inline__, __target__("avx512f")))
x(void)
{
}
int main(void)
{
x();
}
$ clang foo.c
foo.c:7:2: error: always_inline function 'x' requires target feature 'avx2', but would be inlined into function 'main' that is compiled without support for 'avx2'
x();
^
1 error generated.
```
bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37338
Reviewers: craig.topper, echristo, dblaikie
Reviewed By: craig.topper, echristo
Differential Revision: https://reviews.llvm.org/D46541
llvm-svn: 334174
Summary:
We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't.
I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.
The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.
I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.
Reviewers: tkrupa, RKSimon, spatel, GBuella
Reviewed By: tkrupa
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47724
llvm-svn: 334159
Factor out the common setjmp call emission code.
Based on a patch by Chris January
Differential Revision: https://reviews.llvm.org/D47784
llvm-svn: 334112
I tested these locally on an x86 machine by disabling the inline asm
codepath and confirming that it does the same bitflips as we do with the
inline asm.
Addresses code review feedback.
llvm-svn: 334059
Previously we were just using extended vector operations in the header file.
This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load.
By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count.
User code still has the option to use extended vector operations themselves if they need non-constant indexing.
llvm-svn: 334057