As brought up during the discussion of the DWARF5 accelerator tables,
there is currently no way to associate Objective-C methods with the
interface they belong to, other than the .apple_objc accelerator table.
After due consideration we came to the conclusion that it makes more
sense to follow Pavel's suggestion of just emitting this information in
the .debug_info section. One concern was that categories were
emitted in the .apple_names as well, but it turns out that LLDB doesn't
rely on the accelerator tables for this information.
This patch changes the codegen behavior to emit subprograms for
structure types, like we do for C++. This will result in the
DW_TAG_subprogram being nested as a child under its
DW_TAG_structure_type. This behavior is only enabled for DWARF5 and
later, so we can have a unique code path in LLDB with regards to
obtaining the class methods.
This was tested on the LLDB side and doesn't lead to a regression.
There's already code in place to deal with member functions in C++,
which deals with this transparently.
For more background please refer to the discussion on the mailing list:
http://lists.llvm.org/pipermail/llvm-dev/2018-June/123986.html
Differential revision: https://reviews.llvm.org/D48241
llvm-svn: 335757
We track when we see a name-shaped expression followed by a '<' token
and parse the '<' as a comparison. Then:
* if we see a token sequence that cannot possibly be an expression but
can be a template argument (in particular, a type-id) that follows
either a ',' or the '<', diagnose that the '<' was supposed to start
a template argument list, and
* if we see '>()', diagnose that the '<' was supposed to start a
template argument list.
This only changes the diagnostic for error cases, and in practice
appears to catch the most common cases where a missing 'template'
keyword leads to parse errors within a template.
Differential Revision: https://reviews.llvm.org/D48571
llvm-svn: 335687
Patch tries to make better analysis of the variables that should be
globalized. From now, instead of all parallel directives it will check
only distribute parallel .. directives and check only for
firstprivte/lastprivate variables if they must be globalized.
llvm-svn: 335632
Similarly to CFI on virtual and indirect calls, this implementation
tries to use program type information to make the checks as precise
as possible. The basic way that it works is as follows, where `C`
is the name of the class being defined or the target of a call and
the function type is assumed to be `void()`.
For virtual calls:
- Attach type metadata to the addresses of function pointers in vtables
(not the functions themselves) of type `void (B::*)()` for each `B`
that is a recursive dynamic base class of `C`, including `C` itself.
This type metadata has an annotation that the type is for virtual
calls (to distinguish it from the non-virtual case).
- At the call site, check that the computed address of the function
pointer in the vtable has type `void (C::*)()`.
For non-virtual calls:
- Attach type metadata to each non-virtual member function whose address
can be taken with a member function pointer. The type of a function
in class `C` of type `void()` is each of the types `void (B::*)()`
where `B` is a most-base class of `C`. A most-base class of `C`
is defined as a recursive base class of `C`, including `C` itself,
that does not have any bases.
- At the call site, check that the function pointer has one of the types
`void (B::*)()` where `B` is a most-base class of `C`.
Differential Revision: https://reviews.llvm.org/D47567
llvm-svn: 335569
Additional IR is emitted to convert between scalar and vXi1 type to match the expected software inferface for the builtin that clang exposes.
llvm-svn: 335564
The WebAssembly backend in particular benefits from being
able to distinguish between varargs functions (...) and prototype-less
C functions.
Differential Revision: https://reviews.llvm.org/D48443
llvm-svn: 335510
Summary:
In his review of https://reviews.llvm.org/D45860, @GorNishanov suggested
avoiding generating additional exception-handling IR in the case that
the resume function was marked as 'noexcept', and exceptions could not
occur. This implements that suggestion.
Test Plan: `check-clang`
Reviewers: GorNishanov, EricWF
Reviewed By: GorNishanov
Subscribers: cfe-commits, GorNishanov
Differential Revision: https://reviews.llvm.org/D47673
llvm-svn: 335422
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).
See https://reviews.llvm.org/D34156 for details on the original change.
This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.
llvm-svn: 335385
If the shuffle is required for the reduced structures/big data type,
current code may cause compiler crash because of the loading of the
aggregate values. Patch fixes this problem.
llvm-svn: 335377
D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled.
This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.
llvm-svn: 335308
This is breaking a couple of buildbots. We need to run the
NameAnonGlobal pass for regular LTO now as well (since we're producing a
summary). I'll post a separate patch for review to make this happen and
then re-commit.
This reverts commit c0759b7b1f4a81ff9021b952aa38a222d5fa4dfd.
llvm-svn: 335291
parallel region.
If the current construct requires sharing of the local variable in the
inner parallel region, this variable must be globalized to avoid
runtime crash.
llvm-svn: 335285
Summary:
With D33921, we gained the ability to have module summaries in regular
LTO modules without triggering ThinLTO compilation. Module summaries in
regular LTO allow garbage collection (dead stripping) before LTO
compilation and thus open up additional optimization opportunities.
This patch enables summary emission in regular LTO for all targets
except ld64-based ones (which use the legacy LTO API).
Reviewers: pcc, tejohnson, mehdi_amini
Subscribers: inglorion, eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D34156
llvm-svn: 335284
Summary:
This test is a strip down version of a function inside the
amalgamated sqlite source. When converted to IR clang produces
a phi instruction without debug location.
This patch fixes the above issue.
Differential Revision: https://reviews.llvm.org/D47720
llvm-svn: 335255
This diff includes the logic for setting the precision bits for each primary fixed point type in the target info and logic for initializing a fixed point literal.
Fixed point literals are declared using the suffixes
```
hr: short _Fract
uhr: unsigned short _Fract
r: _Fract
ur: unsigned _Fract
lr: long _Fract
ulr: unsigned long _Fract
hk: short _Accum
uhk: unsigned short _Accum
k: _Accum
uk: unsigned _Accum
```
Errors are also thrown for illegal literal values
```
unsigned short _Accum u_short_accum = 256.0uhk; // expected-error{{the integral part of this literal is too large for this unsigned _Accum type}}
```
Differential Revision: https://reviews.llvm.org/D46915
llvm-svn: 335148
This is not only semantically correct but ensures that they will not
be marked as address-significant once D48155 lands.
Differential Revision: https://reviews.llvm.org/D48206
llvm-svn: 334982
Summary: All *_sqrt_round_s[s|d] intrinsics should execute a square root on
zeroth element from B (Ops[1]) and insert in to A (Ops[0]), not the other way around.
Reviewers: itaraban, craig.topper
Reviewed By: craig.topper
Subscribers: craig.topper, cfe-commits
Differential Revision: https://reviews.llvm.org/D48288
llvm-svn: 334964
The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin since the allowed range also got multiplied by 8.
This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic.
Fixes the remaining issue from PR37795.
llvm-svn: 334773
This diff includes changes for the remaining _Fract and _Sat fixed point types.
```
signed short _Fract s_short_fract;
signed _Fract s_fract;
signed long _Fract s_long_fract;
unsigned short _Fract u_short_fract;
unsigned _Fract u_fract;
unsigned long _Fract u_long_fract;
// Aliased fixed point types
short _Accum short_accum;
_Accum accum;
long _Accum long_accum;
short _Fract short_fract;
_Fract fract;
long _Fract long_fract;
// Saturated fixed point types
_Sat signed short _Accum sat_s_short_accum;
_Sat signed _Accum sat_s_accum;
_Sat signed long _Accum sat_s_long_accum;
_Sat unsigned short _Accum sat_u_short_accum;
_Sat unsigned _Accum sat_u_accum;
_Sat unsigned long _Accum sat_u_long_accum;
_Sat signed short _Fract sat_s_short_fract;
_Sat signed _Fract sat_s_fract;
_Sat signed long _Fract sat_s_long_fract;
_Sat unsigned short _Fract sat_u_short_fract;
_Sat unsigned _Fract sat_u_fract;
_Sat unsigned long _Fract sat_u_long_fract;
// Aliased saturated fixed point types
_Sat short _Accum sat_short_accum;
_Sat _Accum sat_accum;
_Sat long _Accum sat_long_accum;
_Sat short _Fract sat_short_fract;
_Sat _Fract sat_fract;
_Sat long _Fract sat_long_fract;
```
This diff only allows for declaration of these fixed point types. Assignment and other operations done on fixed point types according to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf will be added in future patches.
Differential Revision: https://reviews.llvm.org/D46911
llvm-svn: 334718
Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.
Reviewers: mstorsjo, compnerd, javed.absar
Reviewed By: mstorsjo
Subscribers: kristof.beyls, chrib, cfe-commits
Differential Revision: https://reviews.llvm.org/D48132
llvm-svn: 334639
Summary:
In many cases we can't devirtualize
because definition of vtable is not present. Most of the
time it is caused by inline virtual function not beeing
emitted. Forcing emitting of vtable adds a reference of these
inline virtual functions.
Note that GCC was always doing it.
Reviewers: rjmccall, rsmith, amharc, kuhar
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D47108
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 334600
Currently clang set kernel calling convention for CUDA/HIP after
arranging function, which causes incorrect kernel function type since
it depends on calling convention.
This patch moves setting kernel convention before arranging
function.
Differential Revision: https://reviews.llvm.org/D47733
llvm-svn: 334457
This should reduce the binary size penalty of ASan on Windows. After
r334313, ASan will add red zones to globals in comdats, so we will still
find OOB accesses to string literals.
llvm-svn: 334417
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.
Reviewers: RKSimon, delena, spatel, GBuella
Reviewed By: RKSimon
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D47693
llvm-svn: 334366
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47446
llvm-svn: 334362
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work.
These places were found by hiding the begin/end methods in the SmallSet forwarding.
llvm-svn: 334339
I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl.
By using special buitlins we can emit a select without using the 128/256-bit select builtins.
llvm-svn: 334331
I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features.
The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported.
llvm-svn: 334330
CGM.GetAddrOfConstantCString() sets the adress of the created GlobalValue
to unnamed. When emitting the object file LLVM will mark the surrounding
section as SHF_MERGE iff the string is nul-terminated and contains no
other nuls (see IsNullTerminatedString). This results in problems when
saving temporaries because LLVM doesn't set an EntrySize, so reading in
the serialized assembly file fails.
This never happened for the GPU binaries because they usually contain
a nul-character somewhere. Instead this only affected the module ID
when compiling relocatable device code.
However, this points to a potentially larger problem: If we put a
constant string into a named section, we really want the data to end
up in that section in the object file. To avoid LLVM merging sections
this patch unmarks the GlobalVariable's address as unnamed which also
fixes the problem of invalid serialized assembly files when saving
temporaries.
Differential Revision: https://reviews.llvm.org/D47902
llvm-svn: 334281
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
llvm-svn: 334261
The windows-msvc target is meant to be ABI compatible with MSVC,
including the exception handling. Ensure that a windows-msvc triple
always equates to the MSVC personality being used.
This mostly affects the GNUStep and ObjFW Obj-C runtimes. To the best of
my knowledge, those are normally not used with windows-msvc triples. I
believe WinObjC is based on GNUStep (or it at least uses libobjc2), but
that also takes the approach of wrapping Obj-C exceptions in C++
exceptions, so the MSVC personality function is the right one to use
there as well.
Differential Revision: https://reviews.llvm.org/D47862
llvm-svn: 334253
Adds support for these intrinsics, which are ARM and ARM64 only:
_interlockedbittestandreset_acq
_interlockedbittestandreset_rel
_interlockedbittestandreset_nf
_interlockedbittestandset_acq
_interlockedbittestandset_rel
_interlockedbittestandset_nf
Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.
llvm-svn: 334239
We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.
I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.
llvm-svn: 334237
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.
This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.
llvm-svn: 334208
Summary:
When requirement imposed by __target__ attributes on functions
are not satisfied, prefer printing those requirements, which
are explicitly mentioned in the attributes.
This makes such messages more useful, e.g. printing avx512f instead of avx2
in the following scenario:
```
$ cat foo.c
static inline void __attribute__((__always_inline__, __target__("avx512f")))
x(void)
{
}
int main(void)
{
x();
}
$ clang foo.c
foo.c:7:2: error: always_inline function 'x' requires target feature 'avx2', but would be inlined into function 'main' that is compiled without support for 'avx2'
x();
^
1 error generated.
```
bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37338
Reviewers: craig.topper, echristo, dblaikie
Reviewed By: craig.topper, echristo
Differential Revision: https://reviews.llvm.org/D46541
llvm-svn: 334174
Summary:
We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't.
I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.
The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.
I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.
Reviewers: tkrupa, RKSimon, spatel, GBuella
Reviewed By: tkrupa
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47724
llvm-svn: 334159
Factor out the common setjmp call emission code.
Based on a patch by Chris January
Differential Revision: https://reviews.llvm.org/D47784
llvm-svn: 334112
I tested these locally on an x86 machine by disabling the inline asm
codepath and confirming that it does the same bitflips as we do with the
inline asm.
Addresses code review feedback.
llvm-svn: 334059
Previously we were just using extended vector operations in the header file.
This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load.
By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count.
User code still has the option to use extended vector operations themselves if they need non-constant indexing.
llvm-svn: 334057
This builtin takes an index as its second operand, but the codegen hardcodes an index of 0 and doesn't use the operand. The only use of the builtin in the header file passes 0 to the operand so this works for that usage. But its more correct to use the real operand.
llvm-svn: 334054
CUDA/HIP does not support RTTI on device side, therefore there
is no point of emitting type info when compiling for device.
Emitting type info for device not only clutters the IR with useless
global variables, but also causes undefined symbol at linking
since vtable for cxxabiv1::class_type_info has external linkage.
Differential Revision: https://reviews.llvm.org/D47694
llvm-svn: 334021
We need to implement _interlockedbittestandset as a builtin for
windows.h, so we might as well do the whole family. It reduces code
duplication anyway.
Fixes PR33188, a long standing bug in our bittest implementation
encountered by Chakra.
llvm-svn: 333978
Adding __attribute__((aligned(32))) to __m256 breaks the implementation
of _mm256_loadu_ps on Windows. On Windows, alignment attributes have
higher precedence than packing attributes.
We also might want to carefully consider the consequences of changing
our vector typedefs, since many users copy them and invent their own
new, non-Intel specific vector type names.
llvm-svn: 333958
Summary:
Because `llvm::Triple` can be derived from `TargetInfo`, it is simpler
to take only `TargetInfo` argument.
Reviewers: sbc100
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47620
llvm-svn: 333938
// Primary fixed point types
signed short _Accum s_short_accum;
signed _Accum s_accum;
signed long _Accum s_long_accum;
unsigned short _Accum u_short_accum;
unsigned _Accum u_accum;
unsigned long _Accum u_long_accum;
// Aliased fixed point types
short _Accum short_accum;
_Accum accum;
long _Accum long_accum;
This diff only allows for declaration of the fixed point types. Assignment and other operations done on fixed point types according to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf will be added in future patches. The saturated versions of these types and the equivalent _Fract types will also be added in future patches.
The tests included are for asserting that we can declare these types.
Fixed the test that was failing by not checking for dso_local on some
targets.
Differential Revision: https://reviews.llvm.org/D46084
llvm-svn: 333923
This seems like a premature optimization. It's unlikely a user would pass something the frontend can tell is all ones to the masked load/store intrinsics.
We do this optimization for emitting select for masking because we have builtin calls in header files that pass an all ones mask in. Though at this point we may not longer have any builtins that emit some IR and a select. We may only have the select builtins so maybe we can remove that optimization too.
llvm-svn: 333847
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47121
llvm-svn: 333829
```
// Primary fixed point types
signed short _Accum s_short_accum;
signed _Accum s_accum;
signed long _Accum s_long_accum;
unsigned short _Accum u_short_accum;
unsigned _Accum u_accum;
unsigned long _Accum u_long_accum;
// Aliased fixed point types
short _Accum short_accum;
_Accum accum;
long _Accum long_accum;
```
This diff only allows for declaration of the fixed point types. Assignment and other operations done on fixed point types according to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf will be added in future patches. The saturated versions of these types and the equivalent `_Fract` types will also be added in future patches.
The tests included are for asserting that we can declare these types.
Differential Revision: https://reviews.llvm.org/D46084
llvm-svn: 333814
This fixes two major problems:
- We were not capping vector alignment as desired on 32-bit ARM.
- We were using different alignments based on the AVX settings on
Intel, so we did not have a consistent ABI.
This is an ABI break, but we think we can get away with it because
vectors tend to be used mostly in inline code (which is why not having
a consistent ABI has not proven disastrous on Intel).
Intel's AVX types are specified as having 32-byte / 64-byte alignment,
so align them explicitly instead of relying on the base ABI rule.
Note that this sort of attribute is stripped from template arguments
in template substitution, so there's a possibility that code templated
over vectors will produce inadequately-aligned objects. The right
long-term solution for this is for alignment attributes to be
interpreted as true qualifiers and thus preserved in the canonical type.
llvm-svn: 333791
Summary:
clang's current wasm EH implementation is a non-MVP feature in progress.
We had a `-mexception-handling` wasm feature but were not using it. This
patch hides the non-MVP wasm EH behind a flag, so it does not affect
other code for now.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits
Differential Revision: https://reviews.llvm.org/D47614
llvm-svn: 333716
A deferred region should end before the start of a label, and should not
extend to the start of the label sub-statement.
Fixes llvm.org/PR35867.
llvm-svn: 333715
The WebAssembly committee has decided on the names `memory.size` and
`memory.grow` for the memory intrinsics, so update the clang builtin
functions to follow those names, keeping both sets of old names in place
for compatibility.
llvm-svn: 333712
Summary:
Because wasm control flow needs to be structured, using WinEH
instructions to support wasm EH brings several benefits. This patch
makes wasm EH uses Windows EH instructions, with some changes:
1. Because wasm uses a single catch block to catch all C++ exceptions,
this merges all catch clauses into a single catchpad, within which we
test the EH selector as in Itanium EH.
2. Generates a call to `__clang_call_terminate` in case a cleanup
throws. Wasm does not have a runtime to handle this.
3. In case there is no catch-all clause, inserts a call to
`__cxa_rethrow` at the end of a catchpad in order to unwind to an
enclosing EH scope.
Reviewers: majnemer, dschuff
Subscribers: jfb, sbc100, jgravelle-google, sunfish, cfe-commits
Differential Revision: https://reviews.llvm.org/D44931
llvm-svn: 333703
Ensure latest MPT decl has a MSInheritanceAttr when instantiating
templates, to avoid null MSInheritanceAttr deref in
CXXRecordDecl::getMSInheritanceModel().
See PR#37399 for repo / details.
Patch by Andrew Rogers!
Differential Revision: https://reviews.llvm.org/D46664
llvm-svn: 333680
Discard the last uncompleted deferred region in a decl, if one exists.
This prevents lines at the end of a function containing only whitespace
or closing braces from being marked as uncovered, if they follow a
region terminator (return/break/etc).
The previous behavior was to heuristically complete deferred regions at
the end of a decl. In practice this ended up being too brittle for too
little gain. Users would complain that there was no way to reach full
code coverage because whitespace at the end of a function would be
marked uncovered.
rdar://40238228
Differential Revision: https://reviews.llvm.org/D46918
llvm-svn: 333609
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47444
llvm-svn: 333555
These intrinsics are used by MSVC's header files on AArch64 Windows as
well as AArch32, so we should support them for both targets. I've
factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate
functions that EmitAArch64BuiltinExpr can call as well.
Reviewers: javed.absar, mstorsjo
Reviewed By: mstorsjo
Subscribers: kristof.beyls, cfe-commits
Differential Revision: https://reviews.llvm.org/D47476
llvm-svn: 333513
This helps especially when the collision is for a template specialization,
where the template arguments are not available from anywhere else in the
diagnostic, and are likely relevant to the problem.
llvm-svn: 333489
initialization functions to 'cxx_fast_tlscc'.
This fixes a bug where instructions calling initialization functions for
thread-local static members of c++ template classes were using calling
convention 'cxx_fast_tlscc' while the called functions weren't annotated
with the calling convention.
rdar://problem/40447463
Differential Revision: https://reviews.llvm.org/D47354
llvm-svn: 333447
The checksum will not reflect the real source, so there's no clear
reason to include them in the debug info. Also this was causing a
crash on the DWARF side.
Differential Revision: https://reviews.llvm.org/D47260
llvm-svn: 333311
If orphaned parallel region is found, the next code must be emitted:
```
if(__kmpc_is_spmd_exec_mode() || __kmpc_parallel_level(loc, gtid))
Serialized execution.
else if (IsMasterThread())
Prepare and signal worker.
else
Outined function call.
```
llvm-svn: 333301
It caused asserts, see PR37560.
> Use zeroinitializer for (trailing zero portion of) large array initializers
> more reliably.
>
> Clang has two different ways it emits array constants (from InitListExprs and
> from APValues), and both had some ability to emit zeroinitializer, but neither
> was able to catch all cases where we could use zeroinitializer reliably. In
> particular, emitting from an APValue would fail to notice if all the explicit
> array elements happened to be zero. In addition, for large arrays where only an
> initial portion has an explicit initializer, we would emit the complete
> initializer (which could be huge) rather than emitting only the non-zero
> portion. With this change, when the element would have a suffix of more than 8
> zero elements, we emit the array constant as a packed struct of its initial
> portion followed by a zeroinitializer constant for the trailing zero portion.
>
> In passing, I found a bug where SemaInit would sometimes walk the entire array
> when checking an initializer that only covers the first few elements; that's
> fixed here to unblock testing of the rest.
>
> Differential Revision: https://reviews.llvm.org/D47166
llvm-svn: 333067
more reliably.
Clang has two different ways it emits array constants (from InitListExprs and
from APValues), and both had some ability to emit zeroinitializer, but neither
was able to catch all cases where we could use zeroinitializer reliably. In
particular, emitting from an APValue would fail to notice if all the explicit
array elements happened to be zero. In addition, for large arrays where only an
initial portion has an explicit initializer, we would emit the complete
initializer (which could be huge) rather than emitting only the non-zero
portion. With this change, when the element would have a suffix of more than 8
zero elements, we emit the array constant as a packed struct of its initial
portion followed by a zeroinitializer constant for the trailing zero portion.
In passing, I found a bug where SemaInit would sometimes walk the entire array
when checking an initializer that only covers the first few elements; that's
fixed here to unblock testing of the rest.
Differential Revision: https://reviews.llvm.org/D47166
llvm-svn: 333044
The clang builtins have the same semantics as the stdlib functions.
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."
That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.
Differential Revision: https://reviews.llvm.org/D47202
llvm-svn: 333038
Introduced CreateMemTempWithoutCast and CreateTemporaryAllocaWithoutCast to emit alloca
without casting to default addr space.
ActiveFlag is a temporary variable emitted for clean up. It is defined as AllocaInst* type and there is
a cast to AlllocaInst in SetActiveFlag. An alloca casted to generic pointer causes assertion in
SetActiveFlag.
Since there is only load/store of ActiveFlag, it is safe to use the original alloca, therefore use
CreateMemTempWithoutCast is called.
Differential Revision: https://reviews.llvm.org/D47099
llvm-svn: 332982
This change will help Visual Studio resolve forward references to C++ lambda
routines used by captured variables.
Differential Revision: https://reviews.llvm.org/D45438
llvm-svn: 332975
Summary:
This includes initial support for the (hopefully final) updated Objective-C ABI, developed here:
https://github.com/davidchisnall/clang-gnustep-abi-2
It also includes some cleanups and refactoring from older GNU ABIs.
The current version is ELF only, other formats to follow.
Reviewers: rjmccall, DHowett-MSFT
Reviewed By: rjmccall
Subscribers: smeenai, cfe-commits
Differential Revision: https://reviews.llvm.org/D46052
llvm-svn: 332950
Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro we could expand an expression twice leading to weird behavior. We could maybe declare our local variable in the macro, but that would need to worry about name collisions.
To avoid that just generate IR directly in CGBuiltin.cpp.
Differential Revision: https://reviews.llvm.org/D47125
llvm-svn: 332891
1. added restrictions to memory scope, order and volatile parameters
2. added custom processing for these builtins - currently is not used code,
needed to switch off GCCBuiltin link to the builtins (ongoing change to llvm
tree)
3. builtins renamed as requested
Differential Revision: https://reviews.llvm.org/D43281
llvm-svn: 332848
If a variable has an initializer, codegen tries to build its value. If
the variable is large in size, building its value requires substantial
resources. It causes strange behavior from user viewpoint: compilation
of huge zero initialized arrays like:
char data_1[2147483648u] = { 0 };
consumes enormous amount of time and memory.
With this change codegen tries to determine if variable initializer is
equivalent to zero initializer. In this case variable value is not
constructed.
This change fixes PR18978.
Differential Revision: https://reviews.llvm.org/D46241
llvm-svn: 332847
The first version of the patch (r332228) was flawed because it was
putting structors into C5/D5 comdats very eagerly. This is correct only
if we can ensure the comdat contains all required versions of the
structor (which wasn't the case). This version uses a more nuanced
approach:
- for local structor symbols we use an alias because we don't have to
worry about comdats or other compilation units.
- linkonce symbols are emitted separately, as we cannot guarantee we
will have all symbols we need to form a comdat (they are emitted
lazily, only when referenced).
- available_externally symbols are also emitted separately, as the code
seemed to be worried about emitting an alias in this case.
- other linkage types are not affected by the optimization level. They
either get put into a comdat (weak) or get aliased (external).
Reviewers: rjmccall, aprantl
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D46685
llvm-svn: 332839
When a lambda capture captures a __block in the same statement, the compiler asserts out because isCapturedBy assumes that an Expr can only be a BlockExpr, StmtExpr, or if it's a Stmt then all the statement's children are expressions. That's wrong, we need to visit all sub-statements even if they're not expressions to see if they also capture.
Fix this issue by pulling out the isCapturedBy logic to use RecursiveASTVisitor.
<rdar://problem/39926584>
llvm-svn: 332801
To support linking device code in different source files, it is necessary to
embed fat binary at host linking stage.
This patch emits an external symbol for fat binary in host codegen, then
embed the fat binary by lld through a linker script.
Differential Revision: https://reviews.llvm.org/D46472
llvm-svn: 332724
MethodVFTableLocations in MigrosoftVTableContext contains canonicalized
decl. But, it's sometimes asked to lookup for non-canonicalized decl,
and that causes assertion failure, and compilation failure.
Fixes PR37481.
Patch by Taiju Tsuiki!
Differential Revision: https://reviews.llvm.org/D46929
llvm-svn: 332639
lifetime.start/end expects pointer argument in alloca address space.
However in C++ a temporary variable is in default address space.
This patch changes API CreateMemTemp and CreateTempAlloca to
get the original alloca instruction and pass it lifetime.start/end.
It only affects targets with non-zero alloca address space.
Differential Revision: https://reviews.llvm.org/D45900
llvm-svn: 332593
functions.
If the combined construct is specified in the declare target function
and the device code is emitted, the compiler crashes because of the
incorrectly chosen captured stmt. We should choose the innermost
captured statement, not the outermost.
llvm-svn: 332477
If the orphaned directive is executed in SPMD mode, we need to emit the
check for the SPMD mode and run the orphaned parallel directive in
sequential mode.
llvm-svn: 332467
In generic data-sharing mode we do not need to globalize
variables/parameters of reference/pointer types. They already are placed
in the global memory.
llvm-svn: 332380
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
Explicitly avoided changing the strings in the clang-format tests.
Differential Revision: https://reviews.llvm.org/D44975
llvm-svn: 332350
Some targets have constant address space (e.g. amdgcn). For them string literal should be
emitted in constant address space then casted to default address space.
Differential Revision: https://reviews.llvm.org/D46643
llvm-svn: 332279
Summary:
Removing the full structor and replacing all usages with the base one
can degrade debug quality as it will leave the debugger unable to locate
the full object structor. This is apparent when evaluating an expression
in the debugger which requires constructing an object of class which has
had this optimization applied to it. When compiling the expression, we
pretend that the class and its methods have been defined in another
compilation unit, so the expression compiler assumes the structor
definition must be available. This didn't use to be the case for
structors with internal linkage. Less aggressive optimizations like
emitting the full structor as an alias remain in place, as they do not
cause the structor symbol to disappear completely.
This improves debug quality on non-darwin platforms (darwin does not
have -mconstructor-aliases on by default, so it is spared these
problems) and enable us to remove some workarounds from LLDB which attempt to
mitigate this issue.
Reviewers: rjmccall, aprantl
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D46685
llvm-svn: 332228
These intrinsics work exactly as all other atomic_fetch_* intrinsics and allow to create *atomicrmw* with ordering.
Updated the clang-extensions document.
Differential Revision: https://reviews.llvm.org/D46386
llvm-svn: 332193
Summary:
The Itanium ABI requires that the type info for pointer-to-incomplete types to have internal linkage, so that it doesn't interfere with the type info once completed. Currently it also marks the type info name as internal as well. However, this causes a bug with the STL implementations, which use the type info name pointer to perform ordering and hashing of type infos.
For example:
```
// header.h
struct T;
extern std::type_info const& Info;
// tu_one.cpp
#include "header.h"
std::type_info const& Info = typeid(T*);
// tu_two.cpp
#include "header.h"
struct T {};
int main() {
auto &TI1 = Info;
auto &TI2 = typeid(T*);
assert(TI1 == TI2); // Fails
assert(TI1.hash_code() == TI2.hash_code()); // Fails
}
```
This patch fixes the STL bug by emitting the type info name as linkonce_odr when the type-info is for a pointer-to-incomplete type.
Note that libc++ could fix this without a compiler change, but the quality of fix would be poor. The library would either have to:
(A) Always perform strcmp/string hashes.
(B) Determine if we have a pointer-to-incomplete type, and only do strcmp then. This would require an ABI break for libc++.
Reviewers: rsmith, rjmccall, majnemer, vsapsai
Reviewed By: rjmccall
Subscribers: smeenai, cfe-commits
Differential Revision: https://reviews.llvm.org/D46665
llvm-svn: 332028
This commit relands r331904.
Adding a SrcMgr::CharacteristicKind parameter to the InclusionDirective
in PPCallbacks, and updating calls to that function. This will be useful
in https://reviews.llvm.org/D43778 to determine which includes are
system
headers.
Differential Revision: https://reviews.llvm.org/D46614
llvm-svn: 332021
Added initial support for L2 parallelism in SPMD mode. Note, though,
that the orphaned parallel directives are not currently supported in
SPMD mode.
llvm-svn: 332016
This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.
Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.
llvm-svn: 331958
Summary:
The Itanium ABI requires that the type info for pointer-to-incomplete types to have internal linkage, so that it doesn't interfere with the type info once completed. Currently it also marks the type info name as internal as well. However, this causes a bug with the STL implementations, which use the type info name pointer to perform ordering and hashing of type infos.
For example:
```
// header.h
struct T;
extern std::type_info const& Info;
// tu_one.cpp
#include "header.h"
std::type_info const& Info = typeid(T*);
// tu_two.cpp
#include "header.h"
struct T {};
int main() {
auto &TI1 = Info;
auto &TI2 = typeid(T*);
assert(TI1 == TI2); // Fails
assert(TI1.hash_code() == TI2.hash_code()); // Fails
}
```
This patch fixes the STL bug by emitting the type info name as linkonce_odr when the type-info is for a pointer-to-incomplete type.
Note that libc++ could fix this without a compiler change, but the quality of fix would be poor. The library would either have to:
(A) Always perform strcmp/string hashes.
(B) Determine if we have a pointer-to-incomplete type, and only do strcmp then. This would require an ABI break for libc++.
Reviewers: rsmith, rjmccall, majnemer, vsapsai
Reviewed By: rjmccall
Subscribers: smeenai, cfe-commits
Differential Revision: https://reviews.llvm.org/D46665
llvm-svn: 331957
Previously we emitted something like
rotl(x, n) {
n &= bitwidth-1;
return n != 0 ? ((x << n) | (x >> (bitwidth - n)) : x;
}
We use a select to avoid the undefined behavior on the (bitwidth - n) shift.
The middle and backend don't really recognize this as a rotate and end up emitting a cmov or control flow because of the select.
A better pattern is (x << (n & mask)) | (x << (-n & mask)) where mask is bitwidth - 1.
Fixes the main complaint in PR37387. There's still some work to be done if the user writes that sequence directly on a short or char where type promotion rules can prevent it from being recognized. The builtin is emitting direct IR with unpromoted types so that isn't a problem for it.
Differential Revision: https://reviews.llvm.org/D46656
llvm-svn: 331943
Summary:
This attribute tells clang to skip this function from stack protector
when -stack-protector option is passed.
GCC option for this is:
__attribute__((__optimize__("no-stack-protector"))) and the
equivalent clang syntax would be: __attribute__((no_stack_protector))
This is used in Linux kernel to selectively disable stack protector
in certain functions.
Reviewers: aaron.ballman, rsmith, rnk, probinson
Reviewed By: aaron.ballman
Subscribers: probinson, srhines, cfe-commits
Differential Revision: https://reviews.llvm.org/D46300
llvm-svn: 331925
Adding a SrcMgr::CharacteristicKind parameter to the InclusionDirective
in PPCallbacks, and updating calls to that function. This will be useful
in https://reviews.llvm.org/D43778 to determine which includes are system
headers.
Differential Revision: https://reviews.llvm.org/D46614
llvm-svn: 331904
It is required to emit unique names for offloading regions ids. Required
to support compilation and linking of several compilation units.
llvm-svn: 331899
If the global variables are marked as declare target and they need
ctors/dtors, these ctors/dtors are emitted and then invoked by the
offloading runtime library. They are not explicitly used in the emitted
code and thus can be optimized out. Patch marks these functions as used,
so the optimizer cannot remove these function during the optimization
phase.
llvm-svn: 331879
It broke the Chromium build (see reply on the review).
> Generate DILabel metadata and call llvm.dbg.label after label
> statement to associate the metadata with the label.
>
> Differential Revision: https://reviews.llvm.org/D45045
>
> Patch by Hsiangkai Wang.
This doesn't revert the change to backend-unsupported-error.ll
that seems to correspond to an llvm-side change.
llvm-svn: 331861
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
Differential Revision: https://reviews.llvm.org/D45045
Patch by Hsiangkai Wang.
llvm-svn: 331843
This is similar to the LLVM change https://reviews.llvm.org/D46290.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46320
llvm-svn: 331834
The linkage of the global entries must be weak to enable support of
redefinition of the same target regions in multiple compilation units.
llvm-svn: 331768
This patch addresses some mostly trivial post-commit review comments received
on r331677.
Additionally, this patch fixes an assertion in `getNarrowingKind` caused by
the use of an uninitialized value from `checkThreeWayNarrowingConversion`.
llvm-svn: 331707
Summary:
This patch tackles long hanging fruit for the builtin operator<=> expressions. It is currently needs some cleanup before landing, but I want to get some initial feedback.
The main changes are:
* Lookup, build, and store the required standard library types and expressions in `ASTContext`. By storing them in ASTContext we don't need to store (and duplicate) the required expressions in the BinaryOperator AST nodes.
* Implement [expr.spaceship] checking, including diagnosing narrowing conversions.
* Implement `ExprConstant` for builtin spaceship operators.
* Implement builitin operator<=> support in `CodeGenAgg`. Initially I emitted the required comparisons using `ScalarExprEmitter::VisitBinaryOperator`, but this caused the operand expressions to be emitted once for every required cmp.
* Implement [builtin.over] with modifications to support the intent of P0946R0. See the note on `BuiltinOperatorOverloadBuilder::addThreeWayArithmeticOverloads` for more information about the workaround.
Reviewers: rsmith, aaron.ballman, majnemer, rnk, compnerd, rjmccall
Reviewed By: rjmccall
Subscribers: rjmccall, rsmith, aaron.ballman, junbuml, mgorny, cfe-commits
Differential Revision: https://reviews.llvm.org/D45476
llvm-svn: 331677
Added initial codegen for level 2, 3 etc. parallelism. Currently, all
the second, the third etc. parallel regions will run sequentially.
llvm-svn: 331642
Summary:
Passes down the necessary code ge options to the LTO Config to enable
-fdiagnostics-show-hotness and -fsave-optimization-record in the ThinLTO
backend for a distributed build.
Also, remove warning about not having PGO when the input is IR.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D46464
llvm-svn: 331592
Summary:
http://wg21.link/P0664r2 section "Evolution/Core Issues 24" describes a
proposed change to Coroutines TS that would have any exceptions thrown
after the initial suspend point of a coroutine be caught by the handler
specified by the promise type's 'unhandled_exception' member function.
This commit provides a sample implementation of the specified behavior.
Test Plan: `check-clang`
Reviewers: GorNishanov, EricWF
Reviewed By: GorNishanov
Subscribers: cfe-commits, lewissbaker, eric_niebler
Differential Revision: https://reviews.llvm.org/D45860
llvm-svn: 331519
FunctionProtoType.
We previously re-evaluated the expression each time we wanted to know whether
the type is noexcept or not. We now evaluate the expression exactly once.
This is not quite "no functional change": it fixes a crasher bug during AST
deserialization where we would try to evaluate the noexcept specification in a
situation where we have not deserialized sufficient portions of the AST to
permit such evaluation.
llvm-svn: 331428
Some symbols are not allowed to be used as names on some targets. Patch
ries to unify the emission of the names of LLVM globals so they could be
used on different targets.
llvm-svn: 331358
devices.
If the function is an instantiation|specialization of the template and
is used in the device code, the definitions of such functions should be
emitted for the device.
llvm-svn: 331261
This is not yet part of any C++ working draft, and so is controlled by the flag
-fchar8_t rather than a -std= flag. (The GCC implementation is controlled by a
flag with the same name.)
This implementation is experimental, and will be removed or revised
substantially to match the proposal as it makes its way through the C++
committee.
llvm-svn: 331244
As suggested in the post-commit thread for rL331056, we should match these
clang options with the established vocabulary of the corresponding sanitizer
option. Also, the use of 'strict' is well-known for these kinds of knobs,
and we can improve the descriptive text in the docs.
So this intends to match the logic of D46135 but only change the words.
Matching LLVM commit to match this spelling of the attribute to follow shortly.
Differential Revision: https://reviews.llvm.org/D46236
llvm-svn: 331209
Emit error messages instead of compiler crashing when the target region
does not exist in the device code + fix crash when the location comes
from macros.
llvm-svn: 331195
When a '>>' token is split into two '>' tokens (in C++11 onwards), or (as an
extension) when we do the same for other tokens starting with a '>', we can't
just use a location pointing to the first '>' as the location of the split
token, because that would result in our miscomputing the length and spelling
for the token. As a consequence, for example, a refactoring replacing 'A<X>'
with something else would sometimes replace one character too many, and
similarly diagnostics highlighting a template-id source range would highlight
one character too many.
Fix this by creating an expansion range covering the first character of the
'>>' token, whose spelling is '>'. For this to work, we generalize the
expansion range of a macro FileID to be either a token range (the common case)
or a character range (used in this new case).
llvm-svn: 331155
As discussed in the post-commit thread for:
rL330437 ( http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180423/545906.html )
We need a way to opt-out of a float-to-int-to-float cast optimization because too much
existing code relies on the platform-specific undefined result of those casts when the
float-to-int overflows.
The LLVM changes associated with adding this function attribute are here:
rL330947
rL330950
rL330951
Also as suggested, I changed the LLVM doc to mention the specific sanitizer flag that
catches this problem:
rL330958
Differential Revision: https://reviews.llvm.org/D46135
llvm-svn: 331041
The ACLE spec which describes these intrinsics hasn't been published yet, but
this is based on the final draft which will be published soon, and these have
already been implemented by GCC.
Differential revision: https://reviews.llvm.org/D46109
llvm-svn: 331039
SPIR-V encodes the read_only and write_only access qualifiers of pipes,
so separate LLVM IR types are required to target SPIR-V. Other backends
may also find this useful.
These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which
replace `opencl.pipe_t`.
This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...)
which took a read_only pipe with separate versions for read_only and
write_only pipes, namely:
* __get_pipe_num_packets_ro(...)
* __get_pipe_num_packets_wo(...)
* __get_pipe_max_packets_ro(...)
* __get_pipe_max_packets_wo(...)
These separate versions exist to avoid needing a bitcast to one of the
two qualified pipe types.
Patch by Stuart Brady.
Differential Revision: https://reviews.llvm.org/D46015
llvm-svn: 331026
function if a function delegates to another function.
Fix a bug introduced in r328731, which caused a struct with ObjC __weak
fields that was passed to a function to be destructed twice, once in the
callee function and once in another function the callee function
delegates to. To prevent this, keep track of the callee-destructed
structs passed to a function and disable their cleanups at the point of
the call to the delegated function.
This reapplies r331016, which was reverted in r331019 because it caused
an assertion to fail in EmitDelegateCallArg on a windows bot. I made
changes to EmitDelegateCallArg so that it doesn't try to deactivate
cleanups for structs that have trivial destructors (cleanups for those
structs are never pushed to the cleanup stack in EmitParmDecl).
rdar://problem/39194693
Differential Revision: https://reviews.llvm.org/D45382
llvm-svn: 331020
function if a function delegates to another function.
Fix a bug introduced in r328731, which caused a struct with ObjC __weak
fields that was passed to a function to be destructed twice, once in the
callee function and once in another function the callee function
delegates to. To prevent this, keep track of the callee-destructed
structs passed to a function and disable their cleanups at the point of
the call to the delegated function.
rdar://problem/39194693
Differential Revision: https://reviews.llvm.org/D45382
llvm-svn: 331016
This patch is a tweak of changyu's patch: https://reviews.llvm.org/D40381. It differs in that the recognition of the 'concept' token is moved into the machinery that recognizes declaration-specifiers - this allows us to leverage the attribute handling machinery more seamlessly.
See the test file to get a sense of the basic parsing that this patch supports.
There is much more work to be done before concepts are usable...
Thanks Changyu!
llvm-svn: 330794
HIP is a language similar to CUDA (https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md ).
The language syntax is very similar, which allows a hip program to be compiled as a CUDA program by Clang. The main difference
is the host API. HIP has a set of vendor neutral host API which can be implemented on different platforms. Currently there is open source
implementation of HIP runtime on amdgpu target (https://github.com/ROCm-Developer-Tools/HIP).
This patch adds support of input kind and language standard hip.
When hip file is compiled, both LangOpts.CUDA and LangOpts.HIP is turned on. This allows compilation of hip program as CUDA
in most cases and only special handling of hip program is needed LangOpts.HIP is checked.
This patch also adds support of kernel launching of HIP program using HIP host API.
When -x hip is not specified, there is no behaviour change for CUDA.
Patch by Greg Rodgers.
Revised and lit test added by Yaxun Liu.
Differential Revision: https://reviews.llvm.org/D44984
llvm-svn: 330790
/usr/local/bin/ld.lld: error: undefined symbol: llvm::createAggressiveInstCombinerPass()
>>> referenced by cc1_main.cpp
>>> tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o:(_GLOBAL__sub_I_cc1_main.cpp)
And so on
The bot coverage is clearly missing.
llvm-svn: 330694
NVPTX target.
When generating the wrapper function for the offloading region, we need
to call the outlined function and cast the arguments correctly to follow
the ABI. Usually, variables captured by value are casted to `uintptr_t`
type. But this should not performed for the variables with pointer type.
llvm-svn: 330620
If an atomic variable is misaligned (and that suspicion is why Clang emits
libcalls at all) the runtime support library will have to use a lock to safely
access it, with potentially very bad performance consequences. There's a very
good chance this is unintentional so it makes sense to issue a warning.
Also give it a named group so people can promote it to an error, or disable it
if they really don't care.
llvm-svn: 330566
Some targets need special LLVM calling convention for CUDA kernel.
This patch does that through a TargetCodeGenInfo hook.
It only affects amdgcn target.
Patch by Greg Rodgers.
Revised and lit tests added by Yaxun Liu.
Differential Revision: https://reviews.llvm.org/D45223
llvm-svn: 330447
Summary:
By default Clang outputs its version (including git commit hash, in
case of trunk builds) into object and assembly files. It might be
useful to have an option to disable this, especially for debugging
purposes.
This patch implements new command line flags -Qn and -Qy (the names
are chosen for compatibility with GCC). -Qn disables output of
the 'llvm.ident' metadata string and the 'producer' debug info. -Qy
(enabled by default) does the opposite.
Reviewers: faisalv, echristo, aprantl
Reviewed By: aprantl
Subscribers: aprantl, cfe-commits, JDevlieghere, rogfer01
Differential Revision: https://reviews.llvm.org/D45255
llvm-svn: 330442
nvcc generates a unique registration function for each object file
that contains relocatable device code. Unique names are achieved
with a module id that is also reflected in the function's name.
Differential Revision: https://reviews.llvm.org/D42922
llvm-svn: 330425