These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.
Differential Revision: https://reviews.llvm.org/D127976
Some RISC-V builtins requires ICE operands. We should call
getIntegerConstantExpr instead of EmitScalarExpr to match other
targets.
This was made a little trickier by the vector intrinsics not having
a valid type string, but there are two that have ICE operands so
I specified them manually.
Before C99 introduced flexible array member, common practice uses size-1 array
to emulate FAM, e.g. https://github.com/python/cpython/issues/94250
As a result, -fsanitize=array-bounds instrumentation skipped such structures
as a workaround (from 539e4a77bb).
D126864 accidentally dropped the workaround. Add it back with tests.
This patch gives basic parsing and semantic support for "masked taskloop"
construct introduced in OpenMP 5.1 (section 2.16.7)
Differential Revision: https://reviews.llvm.org/D128478
Add option -fhip-kernel-arg-name to emit kernel argument
name metadata, which is needed for certain HIP applications.
Reviewed by: Artem Belevich, Fangrui Song, Brian Sumner
Differential Revision: https://reviews.llvm.org/D128022
Some code [0] consider that trailing arrays are flexible, whatever their size.
Support for these legacy code has been introduced in
f8f6324983 but it prevents evaluation of
__builtin_object_size and __builtin_dynamic_object_size in some legit cases.
Introduce -fstrict-flex-arrays=<n> to have stricter conformance when it is
desirable.
n = 0: current behavior, any trailing array member is a flexible array. The default.
n = 1: any trailing array member of undefined, 0 or 1 size is a flexible array member
n = 2: any trailing array member of undefined or 0 size is a flexible array member
n = 3: any trailing array member of undefined size is a flexible array member (strict c99 conformance)
Similar patch for gcc discuss here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
[0] https://docs.freebsd.org/en/books/developers-handbook/sockets/#sockets-essential-functions
Simplify debug info back to just "limited" or "full" by rolling the ctor
type homing fully into the "limited" debug info.
Also fix a bug I found along the way that was causing ctor type homing
to kick in even when something could be vtable homed (where vtable
homing is stronger/more effective than ctor homing) - fixing at the same
time as it keeps the tests (that were testing only "limited non ctor"
homing and now test ctor homing) passing.
This patch is needed because developers expect "GCCBuiltin" items to be the GCC intrinsics equivalent and not the Clang internals.
Reviewed By: #libc_abi, RKSimon, xbolva00
Differential Revision: https://reviews.llvm.org/D127460
Previously `#pragma STDC FENV_ACCESS ON` always set dynamic rounding
mode and strict exception handling. It is not correct in the presence
of other pragmas that also modify rounding mode and exception handling.
For example, the effect of previous pragma FENV_ROUND could be
cancelled, which is not conformant with the C standard. Also
`#pragma STDC FENV_ACCESS OFF` turned off only FEnvAccess flag, leaving
rounding mode and exception handling unchanged, which is incorrect in
general case.
Concrete rounding and exception mode depend on a combination of several
factors like various pragmas and command-line options. During the review
of this patch an idea was proposed that the semantic actions associated
with such pragmas should only set appropriate flags. Actual rounding
mode and exception handling should be calculated taking into account the
state of all relevant options. In such implementation the pragma
FENV_ACCESS should not override properties set by other pragmas but
should set them if such setting is absent.
To implement this approach the following main changes are made:
- Field `FPRoundingMode` is removed from `LangOptions`. Actually there
are no options that set it to arbitrary rounding mode, the choice was
only `dynamic` or `tonearest`. Instead, a new boolean flag
`RoundingMath` is added, with the same meaning as the corresponding
command-line option.
- Type `FPExceptionModeKind` now has possible value `FPE_Default`. It
does not represent any particular exception mode but indicates that
such mode was not set and default value should be used. It allows to
distinguish the case:
{
#pragma STDC FENV_ACCESS ON
...
}
where the pragma must set FPE_Strict, from the case:
{
#pragma clang fp exceptions(ignore)
#pragma STDC FENV_ACCESS ON
...
}
where exception mode should remain `FPE_Ignore`.
- Class `FPOptions` has now methods `getRoundingMode` and
`getExceptionMode`, which calculates the respective properties from
other specified FP properties.
- Class `LangOptions` has now methods `getDefaultRoundingMode` and
`getDefaultExceptionMode`, which calculates default modes from the
specified options and should be used instead of `getRoundingMode` and
`getFPExceptionMode` of the same class.
Differential Revision: https://reviews.llvm.org/D126364
This reverts commits:
d3ddc251acd90eecff5c
It turned out there're some options turned on that leaks the memory
intentionally, which fires the asan builds after the patch being
applied. The issue has been fixed in
7bc00ce5cd, so reland it.
Below is the original commit message:
The intent of this patch is to selectively carry some states over to
the Builder so we won't lose the information of the previous symbols.
This used to be several downstream patches of Cling, it aims to fix
errors in Clang Interpreter when trying to use inline functions.
Before this patch:
clang-repl> inline int foo() { return 42;}
clang-repl> int x = foo();
JIT session error: Symbols not found: [ _Z3foov ]
error: Failed to materialize symbols:
{ (main, { x, $.incr_module_1.__inits.0, __orc_init_func.incr_module_1 }) }
Co-authored-by: Axel Naumann <Axel.Naumann@cern.ch>
Signed-off-by: Jun Zhang <jun@junz.org>
Instead, just pop the cleanups at the end of the asm statement.
This fixes an assertion failure in BuildStmtExpr. It also fixes a bug
where blocks and C compound literals were destructed at the end of the
asm statement instead of at the end of the enclosing scope.
Differential Revision: https://reviews.llvm.org/D125936
XL considers different vector types to be incompatible with each other.
For example assignment between variables of types vector float and vector
long long or even vector signed int and vector unsigned int are diagnosed.
clang, however does not diagnose such cases and does a simple bitcast between
the two types. This could easily result in program errors. This patch is to
fix the implicit casts in altivec.h so that there is no incompatible vector
type errors whit -fno-lax-vector-conversions, this is the prerequisite patch
to switch the default to -fno-lax-vector-conversions later.
Reviewed By: nemanjai, amyk
Differential Revision: https://reviews.llvm.org/D124093
This reverts commits:
d3ddc251acd90eecff5c
This relands below commit with asan fix:
The intent of this patch is to selectively carry some states over to
the Builder so we won't lose the information of the previous symbols.
This used to be several downstream patches of Cling, it aims to fix
errors in Clang Interpreter when trying to use inline functions.
Before this patch:
clang-repl> inline int foo() { return 42;}
clang-repl> int x = foo();
JIT session error: Symbols not found: [ _Z3foov ]
error: Failed to materialize symbols:
{ (main, { x, $.incr_module_1.__inits.0, __orc_init_func.incr_module_1 }) }
Co-authored-by: Axel Naumann <Axel.Naumann@cern.ch>
Signed-off-by: Jun Zhang <jun@junz.org>
Differential Revision: https://reviews.llvm.org/D127730
RE-LAND (reverts a revert):
This reverts commit 8e1f47b596.
This patch adds generation of sanitizer metadata attributes (which were
added in D126100) to the clang frontend.
We still currently generate the llvm.asan.globals that's consumed by
the IR pass, but the plan is to eventually migrate off of that onto
purely debuginfo and these IR attributes.
Reviewed By: vitalybuka, kstoimenov
Differential Revision: https://reviews.llvm.org/D126929
This patch adds generation of sanitizer metadata attributes (which were
added in D126100) to the clang frontend.
We still currently generate the `llvm.asan.globals` that's consumed by
the IR pass, but the plan is to eventually migrate off of that onto
purely debuginfo and these IR attributes.
Reviewed By: vitalybuka, kstoimenov
Differential Revision: https://reviews.llvm.org/D126929
The option mdefault-visibility-export-mapping is created to allow
mapping default visibility to an explicit shared library export
(e.g. dllexport). Exactly how and if this is manifested is target
dependent (since it depends on how they map dllexport in the IR).
Three values are provided for the option:
* none: the default and behavior without the option, no additional export linkage information is created.
* explicit: add the export for entities with explict default visibility from the source, including RTTI
* all: add the export for all entities with default visibility
This option is useful for targets which do not export symbols as part of
their usual default linkage behaviour (e.g. AIX), such targets
traditionally specified such information in external files (e.g. export
lists), but this mapping allows them to use the visibility information
typically used for this purpose on other (e.g. ELF) platforms.
This relands commit: 8c8a2679a2
with fixes for the compile time and assert problems that were reported
by:
* making shouldMapVisibilityToDLLExport inline and provide an early return
in the case where no mapping is in effect (aka non-AIX platforms)
* don't try to export RTTI types which we will give internal linkage to
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D126340
Previously, omitting unnecessary DWARF unwinds was only done in two
cases:
* For Darwin + aarch64, if no DWARF unwind info is needed for all the
functions in a TU, then the `__eh_frame` section would be omitted
entirely. If any one function needed DWARF unwind, then MC would emit
DWARF unwind entries for all the functions in the TU.
* For watchOS, MC would omit DWARF unwind on a per-function basis, as
long as compact unwind was available for that function.
This diff makes it so that we omit DWARF unwind on a per-function basis
for Darwin + aarch64 as well. In addition, we introduce the flag
`--emit-dwarf-unwind=` which can toggle between `always`,
`no-compact-unwind` (only emit DWARF when CU cannot be emitted for a
given function), and the target platform `default`. `no-compact-unwind`
is particularly useful for newer x86_64 platforms: we don't want to omit
DWARF unwind for x86_64 in general due to possible backwards compat
issues, but we should make it possible for people to opt into this
behavior if they are only targeting newer platforms.
**Motivation:** I'm working on adding support for `__eh_frame` to LLD,
but I'm concerned that we would suffer a perf hit. Processing compact
unwind is already expensive, and that's a simpler format than EH frames.
Given that MC currently produces one EH frame entry for every compact
unwind entry, I don't think processing them will be cheap. I tried to do
something clever on LLD's end to drop the unnecessary EH frames at parse
time, but this made the code significantly more complex. So I'm looking
at fixing this at the MC level instead.
**Addendum:** It turns out that there was a latent bug in the X86
backend when `OmitDwarfIfHaveCompactUnwind` is naively enabled, which is
not too surprising given that this combination has not been heretofore
used.
For functions that have unwind info that cannot be encoded with CU, MC
would end up dropping both the compact unwind entry (OK; existing
behavior) as well as the DWARF entries (not OK). This diff fixes things
so that we emit the DWARF entry, as well as a CU entry with encoding
`UNWIND_X86_MODE_DWARF` -- this basically tells the unwinder to look for
the DWARF entry. I'm not 100% sure the `UNWIND_X86_MODE_DWARF` CU entry
is necessary, this was the simplest fix. ld64 seems to be able to handle
both the absence and presence of this CU entry. Ultimately ld64 (and
LLD) will synthesize `UNWIND_X86_MODE_DWARF` if it is absent, so there
is no impact to the final binary size.
Reviewed By: davide, lhames
Differential Revision: https://reviews.llvm.org/D122258
In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`.
The idea is to get support from the compiler to easily write efficient memory function implementations.
This patch could be split in two:
- one for the LLVM part adding the `llvm.memset.inline.*` intrinsics.
- and another one for the Clang part providing the instrinsic as a builtin.
Differential Revision: https://reviews.llvm.org/D126903
By both AAPCS32 and AAPCS64, the test for whether an aggregate
qualifies as homogeneous (either HFA or HVA) is based on the data
layout alone. So any logical member of the structure that does not
affect the data layout also should not affect homogeneity. In
particular, an empty bitfield ('int : 0') should make no difference.
In fact, clang considered it to make a difference in C but not in C++,
and justified that policy as compatible with gcc. But that's
considered a bug in gcc as well (at least for Arm targets), and it's
fixed in gcc 12.1.
This fix mimics gcc's: zero-sized bitfields are now ignored in all
languages for the Arm (32- and 64-bit) ABIs. But I've left the
previous behaviour unchanged in other ABIs, by means of adding an
ABIInfo::isZeroLengthBitfieldPermittedInHomogeneousAggregate query
method which the Arm subclasses override.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D127197
The intent of this patch is to selectively carry some states over to
the Builder so we won't lose the information of the previous symbols.
This used to be several downstream patches of Cling, it aims to fix
errors in Clang Interpreter when trying to use inline functions.
Before this patch:
clang-repl> inline int foo() { return 42;}
clang-repl> int x = foo();
JIT session error: Symbols not found: [ _Z3foov ]
error: Failed to materialize symbols:
{ (main, { x, $.incr_module_1.__inits.0, __orc_init_func.incr_module_1 }) }
Co-authored-by: Axel Naumann <Axel.Naumann@cern.ch>
Signed-off-by: Jun Zhang <jun@junz.org>
Differential Revision: https://reviews.llvm.org/D126781
D83592 added comments to be part of skipped regions, and as part of that, it
also shrinks a skipped range if it spans a line that contains a non-comment
token. This is done by `adjustSkippedRange`.
The `adjustSkippedRange` currently runs on skipped regions that are not
comments, causing a 5min regression while building a big C++ files without any
comments.
Fix the compile time introduced in D83592 by tagging SkippedRange with kind
information and use that to decide what needs additional processing.
Differential Revision: https://reviews.llvm.org/D127338
Originally broken by me in D122608, this is a regression where we
attempt to replace an extern-C thing with 'itself'. The problem is that
we end up deleting it, causing the value to fail when it gets put into
llvm.used.
DebugTypeVisitor
This recommits d1346e2. I've added a line to the test case to enable it
only on assert builds.
Differential Revision: https://reviews.llvm.org/D125839
A variable with `weak` attribute signifies that it can be replaced with
a "strong" symbol link time. Therefore it must not emitted with
"weak_odr" linkage, as that allows the backend to use its value in
optimizations.
The frontend already considers weak const variables as
non-constant (note_constexpr_var_init_weak diagnostic) so this change
makes frontend and backend consistent.
This commit reverses the
f49573d1 weak globals that are const should get weak_odr linkage.
commit from 2009-08-05 which introduced this behavior. Unfortunately
that commit doesn't provide any details on why the change was made.
This was discussed in
https://discourse.llvm.org/t/weak-attribute-semantics-on-const-variables/62311
Differential Revision: https://reviews.llvm.org/D126324
This patch adds the codegen support for `atomic compare capture` in clang.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D120290
This patch removes all `IgnoreImpCasts` in Sema, and only uses it if necessary. If the expression is not of the same type as the pointer value, a cast is inserted.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D126602
This caused assertions, see comment on the code review:
llvm/clang/lib/AST/Decl.cpp:1510:
clang::LinkageInfo clang::LinkageComputer::getLVForDecl(const clang::NamedDecl *, clang::LVComputationKind):
Assertion `D->getCachedLinkage() == LV.getLinkage()' failed.
> The option mdefault-visibility-export-mapping is created to allow
> mapping default visibility to an explicit shared library export
> (e.g. dllexport). Exactly how and if this is manifested is target
> dependent (since it depends on how they map dllexport in the IR).
>
> Three values are provided for the option:
>
> * none: the default and behavior without the option, no additional export linkage information is created.
> * explicit: add the export for entities with explict default visibility from the source, including RTTI
> * all: add the export for all entities with default visibility
>
> This option is useful for targets which do not export symbols as part of
> their usual default linkage behaviour (e.g. AIX), such targets
> traditionally specified such information in external files (e.g. export
> lists), but this mapping allows them to use the visibility information
> typically used for this purpose on other (e.g. ELF) platforms.
>
> Reviewed By: MaskRay
>
> Differential Revision: https://reviews.llvm.org/D126340
This reverts commit 8c8a2679a2.
We use the `OffloadBinary` to create binary images of offloading files
and their corresonding metadata. This patch changes this to inherit from
the base `Binary` class. This allows us to create and insepect these
more generically. This patch includes all the necessary glue to
implement this as a new binary format, along with added the magic bytes
we use to distinguish the offloading binary to the `file_magic`
implementation.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D126812
The option mdefault-visibility-export-mapping is created to allow
mapping default visibility to an explicit shared library export
(e.g. dllexport). Exactly how and if this is manifested is target
dependent (since it depends on how they map dllexport in the IR).
Three values are provided for the option:
* none: the default and behavior without the option, no additional export linkage information is created.
* explicit: add the export for entities with explict default visibility from the source, including RTTI
* all: add the export for all entities with default visibility
This option is useful for targets which do not export symbols as part of
their usual default linkage behaviour (e.g. AIX), such targets
traditionally specified such information in external files (e.g. export
lists), but this mapping allows them to use the visibility information
typically used for this purpose on other (e.g. ELF) platforms.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D126340
Without this patch, arguments to the
`llvm::OpenMPIRBuilder::AtomicOpValue` initializer are reversed.
Reviewed By: ABataev, tianshilei1992
Differential Revision: https://reviews.llvm.org/D126619
This patch adds !nosanitize metadata to FixedMetadataKinds.def, !nosanitize indicates that LLVM should not insert any sanitizer instrumentation.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D126294
C++ generated code with huge amount of switch cases chokes badly while emitting
coverage mapping, in our specific testcase (~72k cases), it won't stop after hours.
After this change, the frontend job now finishes in 4.5s and shrinks down `@__covrec_`
by 288k when compared to disabling simplification altogether.
There's probably no good way to create a testcase for this, but it's easy to
reproduce, just add thousands of cases in the below switch, and build with
`-fprofile-instr-generate -fcoverage-mapping`.
```
enum type : int {
FEATURE_INVALID = 0,
FEATURE_A = 1,
...
};
const char *to_string(type e) {
switch (e) {
case type::FEATURE_INVALID: return "FEATURE_INVALID";
case type::FEATURE_A: return "FEATURE_A";}
...
}
```
Differential Revision: https://reviews.llvm.org/D126345
Refactor the code that handles the align clause of 'omp allocate' so
it can be used with globals as well as local variables.
Differential Revision: https://reviews.llvm.org/D126426
CUDA requires that static variables be visible to the host when
offloading. However, The standard semantics of a stiatc variable dictate
that it should not be visible outside of the current file. In order to
access it from the host we need to perform "externalization" on the
static variable on the device. This requires generating a semi-unique
name that can be affixed to the variable as to not cause linker errors.
This is currently done using the CUID functionality, an MD5 hash value
set up by the clang driver. This allows us to achieve is mostly unique
ID that is unique even between multiple compilations of the same file.
However, this is not always availible. Instead, this patch uses the
unique ID from the file to generate a unique symbol name. This will
create a unique name that is consistent between the host and device side
compilations without requiring the CUID to be entered by the driver. The
one downside to this is that we are no longer stable under multiple
compilations of the same file. However, this is a very niche use-case
and is not supported by Nvidia's CUDA compiler so it likely to be good
enough.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D125904
This creates an entry with address=nullptr and flag=0x80.
When an 'omp_all_memory' entry is specified any other 'out' or
'inout' entries are not needed and are not passed to the runtime.
Differential Revision: https://reviews.llvm.org/D126321
Adds support for the reserved locator 'omp_all_memory' for use
in depend clauses with 'out' or 'inout' dependence-types.
Differential Revision: https://reviews.llvm.org/D125828
https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170
unsigned char __readx18byte(unsigned long)
unsigned short __readx18word(unsigned long)
unsigned long __readx18dword(unsigned long)
unsigned __int64 __readx18qword(unsigned long)
Given the lack of documentation of the intrinsics, we chose to align the offset with just
`CharUnits::One()` when calling `IRBuilderBase::CreateAlignedLoad()`
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D126024
https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170
void __writex18byte(unsigned long, unsigned char)
void __writex18word(unsigned long, unsigned short)
void __writex18dword(unsigned long, unsigned long)
void __writex18qword(unsigned long, unsigned __int64)
Given the lack of documentation of the intrinsics, we chose to align the offset with just
`CharUnits::One()` when calling `IRBuilderBase::CreateAlignedStore()`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D126023
Support for `__attribute__((no_builtin("foo")))` was added in https://reviews.llvm.org/D68028,
but builtins were still being used even when the attribute was placed on a function.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D124701
Allows emitting define amdgpu_kernel void @func() IR from C or C++.
This replaces the current workflow which is to write a stub in opencl that
calls an external C function implemented in C++ combined through llvm-link.
Calling the resulting function still requires a manual implementation of the
ABI from the host side. The primary application is for more rapid debugging
of the amdgpu backend by permuting a C or C++ test file instead of manually
updating an IR file.
Implementation closely follows D54425. Non-amd reviewers from there.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D125970
Fix __has_builtin to return 1 only if the requested target features
of a builtin are enabled by refactoring the code for checking
required target features of a builtin and use it in evaluation
of __has_builtin.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D125829
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.
The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.
Differential Revision: https://reviews.llvm.org/D125557
An upcoming patch will extend llvm-symbolizer to provide the source line
information for global variables. The goal is to move AddressSanitizer
off of internal debug info for symbolization onto the DWARF standard
(and doing a clean-up in the process). Currently, ASan reports the line
information for constant strings if a memory safety bug happens around
them. We want to keep this behaviour, so we need to emit debuginfo for
these variables as well.
Reviewed By: dblaikie, rnk, aprantl
Differential Revision: https://reviews.llvm.org/D123534
An upcoming patch will extend llvm-symbolizer to provide the source line
information for global variables. The goal is to move AddressSanitizer
off of internal debug info for symbolization onto the DWARF standard
(and doing a clean-up in the process). Currently, ASan reports the line
information for constant strings if a memory safety bug happens around
them. We want to keep this behaviour, so we need to emit debuginfo for
these variables as well.
Reviewed By: dblaikie, rnk, aprantl
Differential Revision: https://reviews.llvm.org/D123534
We use globals to configure debugging at compile-time for the device
runtime. Because these are only used by the OpenMP runtime we shouldn't
define them if we aren't using the device runtime. When a user passes in
'-nogpulib' this indicates that we are not using the device runtime, so
we should check for the precense of this flag and not emit these globals
if used.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D125314
In order to do offloading compilation we need to embed files into the
host and create fatbainaries. Clang uses a special binary format to
bundle several files along with their metadata into a single binary
image. This is currently performed using the `-fembed-offload-binary`
option. However this is not very extensibile since it requires changing
the command flag every time we want to add something and makes optional
arguments difficult. This patch introduces a new tool called
`clang-offload-packager` that behaves similarly to CUDA's `fatbinary`.
This tool takes several input files with metadata and embeds it into a
single image that can then be embedded in the host.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D125165
The changes made in D123460 generalized the code generation for OpenMP's
offloading entries. We can use the same scheme to register globals for
CUDA code. This patch adds the code generation to create these
offloading entries when compiling using the new offloading driver mode.
The offloading entries are simple structs that contain the information
necessary to register the global. The struct used is as follows:
```
Type struct __tgt_offload_entry {
void *addr; // Pointer to the offload entry info.
// (function or global)
char *name; // Name of the function or global.
size_t size; // Size of the entry info (0 if it a function).
int32_t flags;
int32_t reserved;
};
```
Currently CUDA handles RDC code generation by deferring the registration
of globals in the current TU to a callback function containing the
modules ID. Later all the module IDs will be used to register all of the
globals at once. Rather than mimic this, offloading entries allow us to
mimic the way OpenMP registers globals. That is, we create a simple
global struct for each device global to be registered. These are placed
at a special section `cuda_offloading_entires`. Because this section is
a valid C-identifier, the linker will profide a `__start` and `__stop`
pointer that we can use to iterate and register all globals at runtime.
the registration requires a flag variable to indicate which registration
function to use. I have assigned the flags somewhat arbitrarily, but
these use the following values.
Kernel: 0
Variable: 0
Managed: 1
Surface: 2
Texture: 3
Depends on D120272
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D123471
This adds support for variable stride with the val, uval, and ref linear
modifiers. Previously only the no modifer type ls<argno> was supported.
val -> Ls<argno>
uval -> Us<argno>
ref -> Rs<argno>
Differential Revision: https://reviews.llvm.org/D125330
Add mangling for linear parameters specified with ref, uval, and val
for 'omp declare simd' vector functions.
Add missing stride for linear this parameters.
Differential Revision: https://reviews.llvm.org/D125269
In case of placement new, if we do not know the alignment of the
operand, we can't assume it has the preferred alignment. It might be
e.g. a pointer to a struct member which follows ABI alignment rules.
This makes UBSAN no longer report "constructor call on misaligned
address" when constructing a double into a struct field of type double
on i686. The psABI specifies an alignment of 4 bytes, but the preferred
alignment used by Clang is 8 bytes.
We now use ABI alignment for allocating new as well, as the preferred
alignment should be used for over-aligning e.g. local variables, which
isn't relevant for ABI code dealing with operator new. AFAICT there
wouldn't be problems either way though.
Fixes#54845.
Differential Revision: https://reviews.llvm.org/D124736
D117829 added the generic "__builtin_reduce_mul" which we can use to replace the x86 specific integer mul reduction builtins - internally these were mapping to the same intrinsic already so there are no test changes required.
Differential Revision: https://reviews.llvm.org/D125222
Similar to the existing bitwise reduction builtins, this lowers to a llvm.vector.reduce.mul intrinsic call.
For other reductions, we've tried to share builtins for float/integer vectors, but the fmul reduction intrinsic also take a starting value argument and can either do unordered or serialized, but not reduction-trees as specified for the builtins. However we address fmul support this shouldn't affect the integer case.
Differential Revision: https://reviews.llvm.org/D117829
Compared to the old implementation:
* In C++, we only recurse into aggregate classes.
* Unnamed bit-fields are not printed.
* Constant evaluation is supported.
* Proper conversion is done when passing arguments through `...`.
* Additional arguments are supported and are injected prior to the
format string; this directly supports use with `fprintf`, for example.
* An arbitrary callable can be passed rather than only a function
pointer. In particular, in C++, a function template or overload set is
acceptable.
* All text generated by Clang is printed via `%s` rather than directly;
this avoids issues where Clang's pretty-printing output might itself
contain a `%` character.
* Fields of types that we don't know how to print are printed with a
`"*%p"` format and passed by address to the print function.
* No return value is produced.
Reviewed By: aaron.ballman, erichkeane, yihanaa
Differential Revision: https://reviews.llvm.org/D124221
CUDA/HIP needs to mangle for aux target. When mangling for aux target,
the mangler should use mangling number for aux target. Previously
in https://reviews.llvm.org/D122734 a state was introduced in
ASTContext to let the mangler get mangling number for aux target
from ASTContext. This patch removes that state from ASTConext
and add an IsAux member to MangleContext to indicate that
the mangle context is for aux target. This reflects the reality that
the mangle context is created for mangling aux target and makes
ASTContext cleaner.
Reviewed by: Artem Belevich, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D124842
If alignment specified with align clause is less than natural alignment for
list item type, the alignment should be set to the natural alignment.
See OMP5.1 specification, page 185, lines 7-10
Differential Revision: https://reviews.llvm.org/D124676
Currently when using `atomic update` with floating-point variables, if
the operation is add or sub, `cmpxchg`, instead of `atomicrmw` is emitted, as
shown in [1]. In fact, about three years ago, llvm-svn: 351850 added the
support for FP operations. This patch adds the support in OpenMP as well.
[1] https://godbolt.org/z/M7b4ba9na
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D124724
This patch adds support for the conditional (ternary) operator on SVE
scalable vector types in C++, matching the behaviour for NEON vector
types. Like the conditional operator for NEON types, this is disabled in
C mode.
Differential Revision: https://reviews.llvm.org/D124091
D124741 added the generic "__builtin_reduce_add" which we can use to replace the x86 specific integer add reduction builtins - internally these were mapping to the same intrinsic already so there are no test changes required.
Differential Revision: https://reviews.llvm.org/D124757
Similar to the existing bitwise reduction builtins, this lowers to a llvm.vector.reduce.add intrinsic call.
For other reductions, we've tried to share builtins for float/integer vectors, but the fadd reduction intrinsics also take a starting value argument and can either do unordered or serialized, but not reduction-trees as specified for the builtins. However we address fadd support this shouldn't affect the integer case.
(Split off from D117829)
Differential Revision: https://reviews.llvm.org/D124741
The DXIL validator version option(/validator-version) decide the validator version when compile hlsl.
The format is major.minor like 1.0.
In normal case, the value of validator version should be got from DXIL validator. Before we got DXIL validator ready for llvm/main, DXIL validator version option is added first to set validator version.
It will affect code generation for DXIL, so it is treated as a code gen option.
A new member std::string DxilValidatorVersion is added to clang::CodeGenOptions.
Then CGHLSLRuntime is added to clang::CodeGenModule.
It is used to translate clang::CodeGenOptions::DxilValidatorVersion into a ModuleFlag under key "dx.valver" at end of clang code generation.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D123884
This patch moves the logic for generating the offloading entries to the
OpenMPIRBuilder. This makes it easier to re-use in other places, such as
for OpenMP support in Flang or using the same method for generating
offloading entires for other languages like Cuda.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D123460
Thanks for @rsmith to point this. I'm sorry for introducing this bug.
See @rsmith 's comment in https://reviews.llvm.org/D122248
Eg:(By @rsmith ) https://godbolt.org/z/o7vcbWaEf
I have added a test case
struct:
```
struct U19A {
int a;
};
struct U19B {
struct U19A a;
};
struct U19B a = {
.a.a = 2022
};
```
Dump result:
```
struct U19B {
struct U19A a = {
int a = 2022
}
}
```
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D122920
MSVC and Itanium mangling use different mangling numbers
for function-scope structs, which causes inconsistent
mangled kernel names in device and host compilations.
This patch uses Itanium mangling number for structs
in for mangling device side names in CUDA/HIP host
compilation on Windows to fix this issue.
A state is added to ASTContext to indicate whether the
current name mangling is for device side names in host
compilation. Device and host mangling number
are encoded/decoded as upper and lower half of 32 bit
unsigned integer to fit into the original mangling number
field for AST. Diagnostic will be emitted if a manglining
number exceeds limit.
Reviewed by: Artem Belevich, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D122734
Fixes: SWDEV-328515
Need to emit final update of the inscan reduction variables. For
worksharing loops, the reduction values are stored in the temp array,
need to copy the last element to the original var at the end of the
construct.
Differential Revision: https://reviews.llvm.org/D121156
Different TU's may have this globl var. appending linkage can
only be used with lld recognized special variables.
Change it to internal linkage.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D124466
The callback is expected to create a branch to the ContinuationBB (sometimes called FiniBB in some lambdas) argument when finishing. This creates problems:
1. The InsertPoint used for CodeGenIP does not need to be the end of a block. If it is not, a naive callback will insert a branch instruction into the middle of the block.
2. The BasicBlock the CodeGenIP is pointing to may or may not have a terminator. There is an conflict where to branch to if the block already has a terminator.
3. Some API functions work only with block having a terminator. Some workarounds have been used to insert a temporary terminator that is removed again.
4. Some callbacks are sensitive to whether the BasicBlock has a terminator or not. This creates a callback ordering problem where different callback may have different behaviour depending on whether a previous callback created a terminator or not. The problem also exists for FinalizeCallbackTy where some callbacks do create branch to another "continue" block, but unlike BodyGenCallbackTy does not receive the target as argument. This is not addressed in this patch.
With this patch, the callback receives an CodeGenIP into a BasicBlock where to insert instructions. If it has to insert control flow, it can split the block at that position as needed but otherwise no separate ContinuationBB is needed. In particular, a callback can be empty without breaking the emitted IR. If the caller needs the control flow to branch to a specific target, it can insert the branch instruction itself and pass an InsertPoint before the terminator to the callback.
Certain frontends such as Clang may expect the current IRBuilder position to be at the end of a basic block. In this case its callbacks must split the block at CodeGenIP before setting the IRBuilder position such that the instructions after CodeGenIP are moved to another basic block and before returning create a new branch instruction to the split block.
Some utility functions such as `splitBB` are supporting correct splitting of BasicBlocks, independent of whether they have a terminator or not, returning/setting the InsertPoint of an IRBuilder to the end of split predecessor block, and optionally omitting creating a branch to the split successor block to be added later.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D118409
Default behavior for .file directory was changed in D105856, but
ptxas (CUDA 11.5 release) refuses to parse it:
$ llc -march=nvptx64 llvm/test/DebugInfo/NVPTX/debug-file-loc.ll
$ ptxas debug-file-loc.s
ptxas debug-file-loc.s, line 42; fatal : Parsing error near
'"foo.h"': syntax error
Added a new field to MCAsmInfo to control default value of
UseDwarfDirectory. This value is used if -dwarf-directory command line
option is not specified.
Differential Revision: https://reviews.llvm.org/D121299
A struct like { float a; int :0; } should per the SystemZ ABI be passed in a
GPR, but to match a bug in GCC it has been passed in an FPR (see 759449c).
GCC has now corrected the C++ ABI for this case, and this patch for clang
follows suit.
Reviewed By: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D122388
In case of OpenMP programs, thread local variables can be present in
any clause pertaining to OpenMP constructs, as we know that compiler
generates artificial functions and in some cases values are passed to
those artificial functions thru parameters. For an example, if thread
local variable is present in copyin clause (testcase attached with the
patch), parameter with same name is generated as parameter to artificial
function. When user inquires the thread Local variable, its debug info
is hidden by the parameter. User never gets the actual TLS variable
when inquires it, instead gets the artificial parameter.
Current patch suppresses the debug info for such artificial parameter to
enable correct debugging of TLS variables.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D123787
This patch is a continuation of https://reviews.llvm.org/D123353.
Not only kernels in anonymous namespace, but also template
kernels with template arguments in anonymous namespace
need to be externalized.
To be more generic, this patch checks the linkage of a kernel
assuming the kernel does not have __global__ attribute. If
the linkage is internal then clang will externalize it.
This patch also fixes the postfix for externalized symbol
since nvptx does not allow '.' in symbol name.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D124189
Fixes: https://github.com/llvm/llvm-project/issues/54560
attribute((__aligned__)) is present but ignored`
In the original code, the 'getDeclAlignIfRequired' function is used.
The 'getDeclAlignIfRequired' function will return the max alignment
of all aligned attributes if the type has aligned attributes. The
function doesn't consider the type at all.
The 'getTypeAlignIfRequired' function uses the type's alignment value,
which also used by the 'alignof' function. I think we should use the
function of 'getTypeAlignIfRequired'.
Reviewed By: dblaikie, jmorse, wolfgangp
Differential Revision: https://reviews.llvm.org/D124006
D70524 added support for auto return types for C++ member functions. I was
implementing support on the LLDB side for looking up the deduced type.
I ran into trouble with some cases with respect to lambdas. I looked into
how gcc was handling these cases and it appears gcc emits the deduced return type for lambdas.
So I am changing out behavior to match that.
Differential Revision: https://reviews.llvm.org/D123319
The legacy passes are deprecated now and would be removed in near
future. This patch tries to remove legacy passes in coroutines.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D123918
This is extended to all `std::` functions that take a reference to a
value and return a reference (or pointer) to that same value: `move`,
`forward`, `move_if_noexcept`, `as_const`, `addressof`, and the
libstdc++-specific function `__addressof`.
We still require these functions to be declared before they can be used,
but don't instantiate their definitions unless their addresses are
taken. Instead, code generation, constant evaluation, and static
analysis are given direct knowledge of their effect.
This change aims to reduce various costs associated with these functions
-- per-instantiation memory costs, compile time and memory costs due to
creating out-of-line copies and inlining them, code size at -O0, and so
on -- so that they are not substantially more expensive than a cast.
Most of these improvements are very small, but I measured a 3% decrease
in -O0 object file size for a simple C++ source file using the standard
library after this change.
We now automatically infer the `const` and `nothrow` attributes on these
now-builtin functions, in particular meaning that we get a warning for
an unused call to one of these functions.
In C++20 onwards, we disallow taking the addresses of these functions,
per the C++20 "addressable function" rule. In earlier language modes, a
compatibility warning is produced but the address can still be taken.
The same infrastructure is extended to the existing MSVC builtin
`__GetExceptionInfo`, which is now only recognized in namespace `std`
like it always should have been.
This is a re-commit of
fc30901096,
a571f82a50,
64c045e25b, and
de6ddaeef3,
and reverts aa643f455a.
This change also includes a workaround for users using libc++ 3.1 and
earlier (!!), as apparently happens on AIX, where std::move sometimes
returns by value.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D123345
Revert "Fixup D123950 to address revert of D123345"
This reverts commit aa643f455a.
This is sort of a followup to D37310; that basically fixed the same
issue, but then the libstdc++ implementation of <atomic> changed. Re-fix
the the issue in essentially the same way: look through the addressof
operation to find the alignment of the underlying object.
Differential Revision: https://reviews.llvm.org/D123950
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D115907
This reverts commit af0285122f.
The test "libomp::loop_dispatch.c" on builder
openmp-gcc-x86_64-linux-debian fails from time-to-time.
See #54969. This patch is unrelated.
The OMPScheduleType enum stores the constants from libomp's internal sched_type in kmp.h and are used by several kmp API functions. The enum values have an internal structure, namely each scheduling algorithm (e.g.) exists in four variants: unordered, orderend, normerge unordered, and nomerge ordered.
This patch (basically a followup to D114940) splits the "ordered" and "nomerge" bits into separate flags, as was already done for the "monotonic" and "nonmonotonic", so we can apply bit flags operations on them. It also now contains all possible combinations according to kmp's sched_type. Deriving of the OMPScheduleType enum from clause parameters has been moved form MLIR's OpenMPToLLVMIRTranslation.cpp to OpenMPIRBuilder to make available for clang as well. Since the primary purpose of the flag is the binary interface to libomp, it has been made more private to LLVMFrontend. The primary interface for generating worksharing-loop using OpenMPIRBuilder code becomes `applyWorkshareLoop` which derives the OMPScheduleType automatically and calls the appropriate emitter function.
While this is mostly a NFC refactor, it still applies the following functional changes:
* The logic from OpenMPToLLVMIRTranslation to derive the OMPScheduleType also applies to clang. Most notably, it now applies the nonmonotonic flag for non-static schedules by default.
* In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was previously not applied if the simd modifier was used. I assume this was a bug, since the effect was due to `loop.schedule_modifier()` returning `mlir::omp::ScheduleModifier::none` instead of `llvm::Optional::None`.
* In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was set even if ordered was specified, in breach to what the comment before citing the OpenMP specification says. I assume this was an oversight.
The ordered flag with parameter was not considered in this patch. Changes will need to be made (e.g. adding/modifying function parameters) when support for it is added. The lengthy names of the enum values can be discussed, for the moment this is avoiding reusing previously existing enum value names such as `StaticChunked` to avoid confusion.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D123403
This is extended to all `std::` functions that take a reference to a
value and return a reference (or pointer) to that same value: `move`,
`forward`, `move_if_noexcept`, `as_const`, `addressof`, and the
libstdc++-specific function `__addressof`.
We still require these functions to be declared before they can be used,
but don't instantiate their definitions unless their addresses are
taken. Instead, code generation, constant evaluation, and static
analysis are given direct knowledge of their effect.
This change aims to reduce various costs associated with these functions
-- per-instantiation memory costs, compile time and memory costs due to
creating out-of-line copies and inlining them, code size at -O0, and so
on -- so that they are not substantially more expensive than a cast.
Most of these improvements are very small, but I measured a 3% decrease
in -O0 object file size for a simple C++ source file using the standard
library after this change.
We now automatically infer the `const` and `nothrow` attributes on these
now-builtin functions, in particular meaning that we get a warning for
an unused call to one of these functions.
In C++20 onwards, we disallow taking the addresses of these functions,
per the C++20 "addressable function" rule. In earlier language modes, a
compatibility warning is produced but the address can still be taken.
The same infrastructure is extended to the existing MSVC builtin
`__GetExceptionInfo`, which is now only recognized in namespace `std`
like it always should have been.
This is a re-commit of
fc30901096,
a571f82a50, and
64c045e25b
which were reverted in
e75d8b7037
due to a crasher bug where CodeGen would emit a builtin glvalue as an
rvalue if it constant-folds.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D123345
The previous patch introduced the offloading binary format so we can
store some metada along with the binary image. This patch introduces
using this inside the linker wrapper and Clang instead of the previous
method that embedded the metadata in the section name.
Differential Revision: https://reviews.llvm.org/D122683
std::addressof, plus the libstdc++-specific std::__addressof.
This brings us to parity with the corresponding GCC behavior.
Remove STDBUILTIN macro that ended up not being used.
We still require these functions to be declared before they can be used,
but don't instantiate their definitions unless their addresses are
taken. Instead, code generation, constant evaluation, and static
analysis are given direct knowledge of their effect.
This change aims to reduce various costs associated with these functions
-- per-instantiation memory costs, compile time and memory costs due to
creating out-of-line copies and inlining them, code size at -O0, and so
on -- so that they are not substantially more expensive than a cast.
Most of these improvements are very small, but I measured a 3% decrease
in -O0 object file size for a simple C++ source file using the standard
library after this change.
We now automatically infer the `const` and `nothrow` attributes on these
now-builtin functions, in particular meaning that we get a warning for
an unused call to one of these functions.
In C++20 onwards, we disallow taking the addresses of these functions,
per the C++20 "addressable function" rule. In earlier language modes, a
compatibility warning is produced but the address can still be taken.
The same infrastructure is extended to the existing MSVC builtin
`__GetExceptionInfo`, which is now only recognized in namespace `std`
like it always should have been.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D123345
In D123649, I got the formula for getFlexibleArrayInitChars slightly
wrong: the flexible array elements can be contained in the tail padding
of the struct. Fix the formula to account for that.
With the fixed formula, we run into another issue: in some cases, we
were emitting extra padding for flexible arrray initializers. Fix
CGExprConstant so it uses a packed struct when necessary, to avoid this
extra padding.
Differential Revision: https://reviews.llvm.org/D123826
This patch removes use of the deprecated `DirectoryEntry::getName()` from clangCodeGen by using `{File,Directory}EntryRef` instead.
Reviewed By: bnbarham
Differential Revision: https://reviews.llvm.org/D123768
Flexible array initialization is a C/C++ extension implemented in many
compilers to allow initializing the flexible array tail of a struct type
that contains a flexible array. In clang, this is currently restricted
to C. But this construct is used in the Microsoft SDK headers, so I'd
like to extend it to C++.
For now, this doesn't handle dynamic initialization; probably not hard
to implement, but it's extra code, and I don't think it's necessary for
the expected uses. And we explicitly fail out of constant evaluation.
I've added some additional code to assert that initializers have the
correct size, with or without flexible array init. This might catch
issues unrelated to flexible array init.
Differential Revision: https://reviews.llvm.org/D123649
Undefined behaviour is just passed on to extract_element when the
index is out of bounds. Subscript on svbool_t is not allowed as
this doesn't really have meaningful semantics.
Differential Revision: https://reviews.llvm.org/D122732
This patch changes type of the `File` parameter in `PPCallbacks::InclusionDirective()` from `const FileEntry *` to `Optional<FileEntryRef>`.
With the API change in place, this patch then removes some uses of the deprecated `FileEntry::getName()` (e.g. in `DependencyGraph.cpp` and `ModuleDependencyCollector.cpp`).
Reviewed By: dexonsmith, bnbarham
Differential Revision: https://reviews.llvm.org/D123574
We were generating wrong code for cxx20-consteval-crash.cpp: instead of
loading a value of a variable, we were using its address as the
initializer.
Found while adding code to verify the size of constant initializers.
Differential Revision: https://reviews.llvm.org/D123648
Currently we emit an error in just about every case of conditionals
with a 'non simple' branch if treated as an LValue. This patch adds
support for the special case where this is an 'ignored' lvalue, which
permits the side effects from happening.
It also splits up the emit for conditional LValue in a way that should
be usable to handle simple assignment expressions in similar situations.
Differential Revision: https://reviews.llvm.org/D123680
For -fgpu-rdc, a host function may call an external kernel
which is defined in an archive of bitcode. Since this external
kernel is only referenced in host function, the device
bitcode does not contain reference to this external
kernel, then the linker will not try to resolve this external
kernel in the archive.
To fix this issue, host-used external kernels and device
variables are tracked. A global array containing pointers
to these external kernels and variables is emitted which
serves as an artificial references to the external kernels
and variables used by host.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D123441
This removes the -flegacy-pass-manager and
-fno-experimental-new-pass-manager options, and the corresponding
support code in BackendUtil. The -fno-legacy-pass-manager and
-fexperimental-new-pass-manager options are retained as no-ops.
Differential Revision: https://reviews.llvm.org/D123609
LTO objects might compiled with different `mbranch-protection` flags which will cause an error in the linker.
Such a setup is allowed in the normal build with this change that is possible.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D123493
This patch changes `EmitPPCBuiltinExpr` in `CGBuiltin.cpp` to remove
the loop at the beginning of the function that emits the arguments and
to delay emitting the arguments until inside the switch statement. These
changes will put `EmitPPCBuiltinExpr` in line with the strategy of the
target independent function `EmitBuiltinExpr`. Also, this patch
ensures that arguments are only emitted once.
Tests that included builtins affected by these changes have been
modified to match expected behaviour.
Reviewed By: #powerpc, nemanjai, amyk
Differential Revision: https://reviews.llvm.org/D121637
In theory, constructors can take arguments when called via .init_array
where at least glibc passes in (argc, argv, envp). This isn't used in
the generated code and if it was, the first argument should be an
integer, not a pointer. For destructors registered via atexit, the
function should never take an argument.
Differential Revision: https://reviews.llvm.org/D123370
Currently, enablement of heap MTE on Android is specified by an ELF note, which
signals to the linker to enable heap MTE. This change allows
-fsanitize=memtag-heap to synthesize these notes, rather than adding them
through the build system. We need to extend this feature to also signal the
linker to do special work for MTE globals (in future) and MTE stack (currently
implemented in the toolchain, but not implemented in the loader).
Current Android uses a non-backwards-compatible ELF note, called
".note.android.memtag". Stack MTE is an ABI break anyway, so we don't mind that
we won't be able to run executables with stack MTE on Android 11/12 devices.
The current expectation is to support the verbiage used by Android, in
that "SYNC" means MTE Synchronous mode, and "ASYNC" effectively means
"fast", using the Kernel auto-upgrade feature that allows
hardware-specific and core-specific configuration as to whether "ASYNC"
would end up being Asynchronous, Asymmetric, or Synchronous on that
particular core, whichever has a reasonable performance delta. Of
course, this is platform and loader-specific.
Differential Revision: https://reviews.llvm.org/D118948
This was skipping specific lifetime + bitcast patterns, but with
opaque pointers the bitcast will not be present, and we did not
perform this fold.
Instead skip over lifetime.end and bitcasts generally, without
trying to correlate them.
When an inline builtin declaration is shadowed by an actual declaration, we must
reference the actual declaration, even if it's not the last, following GCC
behavior.
This fixes#54715
Differential Revision: https://reviews.llvm.org/D123308
Since the NTTP may need to be cast to the type when rebuilding the name,
check that the type can be rebuilt when determining whether a template
name can be simplified.
This patch changes `EmitPPCBuiltinExpr` in `CGBuiltin.cpp` to remove
the loop at the beginning of the function that emits the arguments and
to delay emitting the arguments until inside the switch statement. These
changes will put `EmitPPCBuiltinExpr` in line with the strategy of the
target independent function `EmitBuiltinExpr`. Also, this patch
ensures that arguments are only emitted once.
Tests that included builtins affected by these changes have been
modified to match expected behaviour.
Reviewed By: #powerpc, nemanjai, amyk
Differential Revision: https://reviews.llvm.org/D121637
The code to check if the regular LTO summary should be emitted and to
add the corresponding module flags was duplicated in the
'EmitAssemblyHelper::EmitAssemblyWithLegacyPassManager' and
'EmitAssemblyHelper::RunOptimizationPipeline' methods.
In order to eliminate these code duplications, the
'EmitAssemblyHelper::shouldEmitRegularLTOSummary' method has been
extracted. The method returns a bool value, the value is 'true' if the
module summary should be emitted. The patch keeps the setting of the
module flags inline.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D123026
clang to emit DWARF information for global alias variable as
DW_TAG_imported_declaration. This change also handles nested
(recursive) imported declarations.
Reviewed by: dblaikie, aprantl
Differential Revision: https://reviews.llvm.org/D120989
Add support for builtin_[max|min] which has below prototype:
A builtin_max (A1, A2, A3, ...)
All arguments must have the same type; they must all be float, double, or long double.
Internally use SelectCC to get the result.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D122478
This change merges code for emit of target and target_clones multiversion
resolver functions and, in doing so, corrects handling of target_clones
functions that are declared but not defined. Previously, a use of such
a target_clones function would result in an attempted emit of an ifunc
that referenced an undefined resolver function. Ifunc references to
undefined resolver functions are not allowed and, when the LLVM verifier
is not disabled (via '-disable-llvm-verifier'), resulted in the verifier
issuing a "IFunc resolver must be a definition" error and aborting the
compilation. With this change, ifuncs and resolver function definitions
are always emitted for used target_clones functions regardless of whether
the target_clones function is defined (if the function is defined, then
the ifunc and resolver are emitted regardless of whether the function is
used).
This change has the side effect of causing target_clones variants and
resolver functions to be emitted in a different order than they were
previously. This is harmless and is reflected in the updated tests.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D122958
This change modifies CodeGenModule::emitMultiVersionFunctions() in preparation
for a change that will merge support for emitting target_clones resolvers into
this function. This change mostly serves to isolate indentation changes from
later behavior modifying changes.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D122957
Previously, GetOrCreateMultiVersionResolver() required the caller to provide
a GlobalDecl along with an llvm::type and FunctionDecl. The latter two can be
cheaply obtained from the first, and the llvm::type parameter is not always
used, so requiring the caller to provide them was unnecessary and created the
possibility that callers would pass an inconsistent set. This change simplifies
the interface to only require the GlobalDecl value.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D122956
Since enumerators may not be available in every translation unit they
can't be reliably used to name entities. (this also makes simplified
template name roundtripping infeasible - since the expected name could
only be rebuilt if the enumeration definition could be found (or only if
it couldn't be found, depending on the context of the original name))
Comparison operators on SVE types return a signed integer vector
of the same width as the incoming SVE type. This matches the existing
behaviour for NEON types.
Differential Revision: https://reviews.llvm.org/D122404
This allows both explicitly enabling and explicitly disabling
opaque pointers, in anticipation of the default switching at some
point.
This also slightly changes the rules by allowing calls if either
the opaque pointer mode has not yet been set (explicitly or
implicitly) or if the value remains unchanged.
This adds cc1 options for enabling and disabling opaque pointers
on the clang side. This is not super useful now (because
-mllvm -opaque-pointers and -Xclang -opaque-pointers have the same
visible effect) but will be important once opaque pointers are
enabled by default in clang. In that case, it will only be
possible to disable them using the cc1 -no-opaque-pointers option.
Differential Revision: https://reviews.llvm.org/D123034
Few times in different methods of the EmitAssemblyHelper class the following
code snippet is used to get the TargetTriple and then use it's single method
to check some conditions:
TargetTriple(TheModule->getTargetTriple())
The parsing of a target triple string is not a trivial operation and it takes
time to repeat the parsing many times in different methods of the class and
even numerous times in one method just to call a getter
(llvm::Triple(TheModule->getTargetTriple()).getVendor()), for example.
The patch extracts the TargetTriple member of the EmitAssemblyHelper class to
parse the triple only once in the class' constructor.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D122587
We have some discission in D99152 and llvm-dev and finially come up with
a solution to add amx specific cast intrinsics. We've support the
intrinsics in llvm IR. This patch is to replace bitcast with amx cast
intrinsics in code emitting in FE.
Differential Revision: https://reviews.llvm.org/D122567
We expect that `extern "C"` static functions to be usable in things like
inline assembly, as well as ifuncs:
See the bug report here: https://github.com/llvm/llvm-project/issues/54549
However, we were diagnosing this as 'not defined', because the
ifunc's attempt to look up its resolver would generate a declared IR
function.
Additionally, as background, the way we allow these static extern "C"
functions to work in inline assembly is by making an alias with the C
mangling in MOST situations to the version we emit with
internal-linkage/mangling.
The problem here was multi-fold: First- We generated the alias after the
ifunc was checked, so the function by that name didn't exist yet.
Second, the ifunc's generation caused a symbol to exist under the name
of the alias already (the declared function above), which suppressed the
alias generation.
This patch fixes all of this by moving the checking of ifuncs/CFE aliases
until AFTER we have generated the extern-C alias. Then, it does a
'fixup' around the GlobalIFunc to make sure we correct the reference.
Differential Revision: https://reviews.llvm.org/D122608
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D115907
Beautify dump format, add indent for nested struct and struct members, also fix test cases in dump-struct-builtin.c
for example:
struct:
```
struct A {
int a;
struct B {
int b;
struct C {
struct D {
int d;
union E {
int x;
int y;
} e;
} d;
int c;
} c;
} b;
};
```
Before:
```
struct A {
int a = 0
struct B {
int b = 0
struct C {
struct D {
int d = 0
union E {
int x = 0
int y = 0
}
}
int c = 0
}
}
}
```
After:
```
struct A {
int a = 0
struct B {
int b = 0
struct C {
struct D {
int d = 0
union E {
int x = 0
int y = 0
}
}
int c = 0
}
}
}
```
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D122704
Remove anonymous tag locations, powered by 'PrintingPolicy',
@aaron.ballman once suggested removing this extra information in
https://reviews.llvm.org/D122248
struct:
struct S {
int a;
struct /* Anonymous*/ {
int x;
} b;
int c;
};
Before:
struct S {
int a = 0
struct S::(unnamed at ./builtin_dump_struct.c:20:3) {
int x = 0
}
int c = 0
}
After:
struct S {
int a = 0
struct S::(unnamed) {
int x = 0
}
int c = 0
}
Differntial Revision: https://reviews.llvm.org/D122670
Currently, the regcall calling conversion in Clang doesn't match with
ICC when passing / returning structures. https://godbolt.org/z/axxKMKrW7
This patch tries to fix the problem to match with ICC.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D122104
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D115907
This builtin returns the address of a global instance of the
`std::source_location::__impl` type, which must be defined (with an
appropriate shape) before calling the builtin.
It will be used to implement std::source_location in libc++ in a
future change. The builtin is compatible with GCC's implementation,
and libstdc++'s usage. An intentional divergence is that GCC declares
the builtin's return type to be `const void*` (for
ease-of-implementation reasons), while Clang uses the actual type,
`const std::source_location::__impl*`.
In order to support this new functionality, I've also added a new
'UnnamedGlobalConstantDecl'. This artificial Decl is modeled after
MSGuidDecl, and is used to represent a generic concept of an lvalue
constant with global scope, deduplicated by its value. It's possible
that MSGuidDecl itself, or some of the other similar sorts of things
in Clang might be able to be refactored onto this more-generic
concept, but there's enough special-case weirdness in MSGuidDecl that
I gave up attempting to share code there, at least for now.
Finally, for compatibility with libstdc++'s <source_location> header,
I've added a second exception to the "cannot cast from void* to T* in
constant evaluation" rule. This seems a bit distasteful, but feels
like the best available option.
Reviewers: aaron.ballman, erichkeane
Differential Revision: https://reviews.llvm.org/D120159
This patch adds the necessary AMDGPU calling convention to the ctor /
dtor kernels. These are fundamentally device kenels called by the host
on image load. Without this calling convention information the AMDGPU
plugin is unable to identify them.
Depends on D122504
Fixes#54091
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122515
The default construction of constructor functions by LLVM tends to make
them have internal linkage. When we call a ctor / dtor function in the
target region we are actually creating a kernel that is called at
registration. Because the ctor is a kernel we need to make sure it's
externally visible so we can actually call it. This prevented AMDGPU
from correctly using constructors while NVPTX could use them simply
because it ignored internal visibility.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D122504
This patch adds a helper method to determine if a nonvirtual base has an entry in the LLVM struct. Such a base may not have an entry
if the base does not have any fields/bases itself that would change the size of the struct. This utility method is useful for other frontends (Polygeist) that use Clang as an API to generate code.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122502
Currently the device kernels all have weak linkage to prevent linkage
errors on multiple defintions. However, this prevents some optimizations
from adequately analyzing them because of the nature of weak linkage.
This patch replaces the weak linkage with weak_odr linkage so we can
statically assert that multiple declarations of the same kernel will
have the same definition.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122443
Current clang generates extra set of simd variant function attribute
with extra 'v' encoding.
For example:
_ZGVbN2v__Z5add_1Pf vs _ZGVbN2vv__Z5add_1Pf
The problem is due to declaration of ParamAttrs following:
llvm::SmallVector<ParamAttrTy, 8> ParamAttrs(ParamPositions.size());
where ParamPositions.size() is grown after following assignment:
Pos = ParamPositions[PVD];
So the PVD is not find in ParamPositions.
The problem is ParamPositions need to set for each FD decl. To fix this
Move ParamPositions's init inside while loop for each FD.
Differential Revision: https://reviews.llvm.org/D122338
Fix clang crash and add bitfield support in __builtin_dump_struct.
In clang13.0.x, a struct with three or more members and a bitfield at
the same time will cause a crash. In clang15.x, as long as the struct
has one bitfield, it will cause a crash in clang.
Open issue: https://github.com/llvm/llvm-project/issues/54462
Differential Revision: https://reviews.llvm.org/D122248
This information isn't preserved in the DWARF description of function
types (though probably should be - it's preserved on the function
declarations/definitions themselves through the DW_AT_noreturn attribute
- but we should move or also include that in the subroutine type itself
too - but for now, with it not being there, the DWARF is lossy and
can't be reconstructed)
Adds basic parsing/sema/serialization support for the
#pragma omp target parallel loop directive.
Differential Revision: https://reviews.llvm.org/D122359
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736
Currently we create offloading entries to register device variables with
the host. When we register a variable we will look up the symbol in the
device image and map the device address to the host address. This is a
problem when the symbol is declared with hidden visibility or internal
linkage. This means the symbol is not accessible externally and we
cannot get its address. We should still allow static variables to be
declared on the device, but ew should not create an offloading entry for
them so they exist independently on the host and device.
Fixes#54309
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122352
The way the check is written is not compatible with opaque
pointers -- while we don't need to change the IR pointer type,
we do need to change the element type stored in the Address.
As we're going to reassign the initializer, we actually need the
value types to match, not just the pointer types. This is only
relevant with opaque pointers.
This patch extends the support for C/C++ operators for SVE
types to allow one of the arguments to be a scalar, in which
case a vector splat is performed.
Differential Revision: https://reviews.llvm.org/D121829
This requires some adjustment in caller code, because there was
a confusion regarding the meaning of the PtrTy argument: This
argument is the type of the pointer being loaded, not the addresses
being loaded from.
Reapply after fixing the specified pointer type for one call in
47eb4f7dcd, where the used type is
important for determining alignment.
GCC supports power-of-2 size structures for the arguments. Clang supports fewer than GCC. But Clang always crashes for the unsupported cases.
This patch adds sema checks to do the diagnosts to solve these crashes.
Reviewed By: jyu2
Differential Revision: https://reviews.llvm.org/D107141
Worth noting that the code marked with FIXME is dead and would
produce invalid IR if hit. Someone familiar with this code should
probably look into that.
Before we start addressing the issue with having
a lot of false positives when using debugify in
the original mode, we have made a few patches that
should speed up the execution of the testing
utility Passes.
For example, when testing a large project
(let's say LLVM project itself), we can face
a lot of potential DI issues. Usually, we use
-verify-each-debuginfo-preserve (that is very
similar to -debugify-each) -- it collects
DI metadata before each Pass, and after the Pass
it checks if the Pass preserved the DI metadata.
However, we can speed up this process, since we
don't need to collect DI metadata before each
Pass -- we could use the DI metadata that are
collected after the previous Pass from
the pipeline as an input for the next Pass.
This patch speeds up the utility for ~2x.
Differential Revision: https://reviews.llvm.org/D115622
This requires some adjustment in caller code, because there was
a confusion regarding the meaning of the PtrTy argument: This
argument is the type of the pointer being loaded, not the addresses
being loaded from.
The EmitLoadOfPointer() call already specified the right pointer
type, but it did not match the Address we're loading from, so we
need to insert a bitcast first.
Rather than using a dummy void pointer type, we should specify the
correct private type and perform the bitcast beforehand rather than
afterwards. This way, the Address will have correct alignment
information.
https://reviews.llvm.org/D23944 implemented the #pragma intrinsic from
MSVC. This causes the statement #pragma intrinsic(cpuid) to fail [0]
on Clang because cpuid is currently implemented in intrin.h instead
of a Clang builtin. Reimplementing cpuid (as well as it's releated
function, cpuidex) should resolve this.
[0]: https://crbug.com/1279344
Differential revision: https://reviews.llvm.org/D121653
Rather than specifying a dummy type in EmitLoadOfPointer() and
then casting it to the correct one, we should instead specify the
correct type and cast beforehand. Otherwise the computed alignment
will be incorrect.
Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.
New checks rely on 2 invariants:
1) For frontend instrumentation, MD_prof branch_weights will always be
populated before llvm.expect intrinsics are lowered.
2) for IR and sample profiling, llvm.expect intrinsics will always be
lowered before branch_weights are populated from the IR profiles.
These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.
Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
Differential Revision: https://reviews.llvm.org/D115907
Summary:
Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.
Reviewers: arsenm, sameerds, yaxunl
Differential Revision: https://reviews.llvm.org/D120265
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736
Poison trivial class members one-by-one in the reverse order of their
construction, instead of all-at-once at the very end.
For example, in the following code access to `x` from `~B` will
produce an undefined value.
struct A {
struct B b;
int x;
};
Reviewed By: kda
Differential Revision: https://reviews.llvm.org/D119600
-fsanitize-memory-use-after-dtor detects memory access after a
subobject is destroyed but its memory is not yet deallocated.
This is done by poisoning each object memory near the end of its destructor.
Subobjects (members and base classes) do this in their respective
destructors, and the parent class does the same for its members with
trivial destructors.
Inexplicably, base classes with trivial destructors are not handled at
all. This change fixes this oversight by adding the base class poisoning logic
to the parent class destructor.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D119300
In AMD GPU device code the globals are in AS(1). Before, we crashed if
the global was a structure. Now we simply cast away the AS before we
generate the code to initialize the global.
Differential Revision: https://reviews.llvm.org/D121837
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.
Basically the same as D120527.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D121847
Fix the instruction names to match the WebAssembly spec:
- `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero`
- `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero`
Also rename related things like intrinsics, builtins, and test functions to
match.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D121661
Current ASTContext.getAttributedType() takes attribute kind,
ModifiedType and EquivType as the hash to decide whether an AST node
has been generated or note. But this is not enough for btf_type_tag
as the attribute might have the same ModifiedType and EquivType, but
still have different string associated with attribute.
For example, for a data structure like below,
struct map_value {
int __attribute__((btf_type_tag("tag1"))) __attribute__((btf_type_tag("tag3"))) *a;
int __attribute__((btf_type_tag("tag2"))) __attribute__((btf_type_tag("tag4"))) *b;
};
The current ASTContext.getAttributedType() will produce
an AST similar to below:
struct map_value {
int __attribute__((btf_type_tag("tag1"))) __attribute__((btf_type_tag("tag3"))) *a;
int __attribute__((btf_type_tag("tag1"))) __attribute__((btf_type_tag("tag3"))) *b;
};
and this is incorrect.
It is very difficult to use the current AttributedType as it is hard to
get the tag information. To fix the problem, this patch introduced
BTFTagAttributedType which is similar to AttributedType
in many ways but with an additional BTFTypeTagAttr. The tag itself can
be retrieved with BTFTypeTagAttr.
With the new BTFTagAttributed type, the debuginfo code can be greatly
simplified compared to previous TypeLoc based approach.
Differential Revision: https://reviews.llvm.org/D120296
This is the `ext_vector_type` alternative to D81083.
This patch extends Clang to allow 'bool' as a valid vector element type
(attribute ext_vector_type) in C/C++.
This is intended as the canonical type for SIMD masks and facilitates
clean vector intrinsic declarations. Vectors of i1 are supported on IR
level and below down to many SIMD ISAs, such as AVX512, ARM SVE (fixed
vector length) and the VE target (NEC SX-Aurora TSUBASA).
The RFC on cfe-dev: https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D88905
This fixes a bug that happens when using -fdebug-prefix-map to remap an
absolute path to a relative path. Since the path was absolute before
remapping, it is safe to assume that concatenating the remapped working
directory would be wrong.
This was originally submitted as https://reviews.llvm.org/D113718, but
reverted because when testing with dwarf 5 enabled, the tests were too
strict.
Differential Revision: https://reviews.llvm.org/D121663
On unix systems this logic would not separate the file and directory of
the DIFile unless they shared more components at the start than just the
root path character. The logic to do this was unix specific so it didn't
work on Windows. Now we check if the entire root_path is the same as
what you were going to set as the Dir and use the full filepath in that
case.
Differential Revision: https://reviews.llvm.org/D111579
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121327
Currently we use the `-fembed-offload-object` option to embed a binary
file into the host as a named section. This is currently only used as a
codegen action, meaning we only handle this option correctly when the
input is a bitcode file. This patch adds the same handling to embed an
offloading object after we complete code generation. This allows us to
embed the object correctly if the input file is source or bitcode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120270
Motivation:
```
int test(int x, int y) {
int r = 0;
[[clang::always_inline]] r += foo(x, y); // force compiler to inline this function here
return r;
}
```
In 2018, @kuhar proposed "Introduce per-callsite inline intrinsics" in https://reviews.llvm.org/D51200 to solve this motivation case (and many others).
This patch solves this problem with call site attribute. "noinline" statement attribute already landed in D119061. Also, some LLVM Inliner fixes landed so call site attribute is stronger than function attribute.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D120717
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D120527
Due to various implementation constraints, despite the programmer
choosing a 'processor' cpu_dispatch/cpu_specific needs to use the
'feature' list of a processor to identify it. This results in the
identified processor in source-code not being propogated to the
optimizer, and thus, not able to be tuned for.
This patch changes to use the actual cpu as written for tune-cpu so that
opt can make decisions based on the cpu-as-spelled, which should better
match the behavior expected by the programmer.
Note that the 'valid' list of processors for x86 is in
llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list
contains only Intel processors, but other vendors may wish to add their
own entries as 'alias'es (or with different feature lists!).
If this is not done, there is two potential performance issues with the
patch, but I believe them to be worth it in light of the improvements to
behavior and performance.
1- In the event that the user spelled "ProcessorB", but we only have the
features available to test for "ProcessorA" (where A is B minus
features),
AND there is an optimization opportunity for "B" that negatively affects
"A", the optimizer will likely choose to do so.
2- In the event that the user spelled VendorI's processor, and the
feature
list allows it to run on VendorA's processor of similar features, AND
there
is an optimization opportunity for VendorIs that negatively affects
"A"s,
the optimizer will likely choose to do so. This can be fixed by adding
an
alias to X86TargetParser.def.
Differential Revision: https://reviews.llvm.org/D121410
Replaces use of getCurrentFile with getCurrentFileOrBufferName
in CodeGenAction. This avoids an assertion error or an incorrect
name chosen for the output file when assertions are disabled.
This error previously occurred when the FrontendInputFile was a
MemoryBuffer instead of a file.
Reviewed By: jlebar
Differential Revision: https://reviews.llvm.org/D121259
* Use default ref capture for non-escaping lambdas (this makes
maintenance easier by allowing new uses, removing uses, having
conditional uses (such as in assertions) not require updates to an
explicit capture list)
* Simplify addPrivate API not to take a lambda, since it calls it
unconditionally/immediately anyway - most callers are simply passing
in a named value or short expression anyway and the lambda syntax just
adds noise/overhead
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D121077
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.
But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.
This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D116015
Clang is crashing on the following statement
char var[9];
__asm__ ("" : "=r" (var) : "0" (var));
This is similar to existing test: crbug_999160_regtest
The issue happens when EmitAsmStmt is trying to convert input to match
output type length. However, that is not guaranteed to be successful all the
time and if the statement itself is invalid like having an array type in
the example, we should give a regular error message here instead of
using assert().
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D120596
Currently adding attribute no_sanitize("bounds") isn't disabling
-fsanitize=local-bounds (also enabled in -fsanitize=bounds). The Clang
frontend handles fsanitize=array-bounds which can already be disabled by
no_sanitize("bounds"). However, instrumentation added by the
BoundsChecking pass in the middle-end cannot be disabled by the
attribute.
The fix is very similar to D102772 that added the ability to selectively
disable sanitizer pass on certain functions.
In this patch, if no_sanitize("bounds") is provided, an additional
function attribute (NoSanitizeBounds) is attached to IR to let the
BoundsChecking pass know we want to disable local-bounds checking. In
order to support this feature, the IR is extended (similar to D102772)
to make Clang able to preserve the information and let BoundsChecking
pass know bounds checking is disabled for certain function.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D119816
Add applyStaticChunkedWorkshareLoop method implementing static schedule when chunk-size is specified. Unlike a static schedule without chunk-size (where chunk-size is chosen by the runtime such that each thread receives one chunk), we need two nested loops: one for looping over the iterations of a chunk, and a second for looping over all chunks assigned to the threads.
This patch includes the following related changes:
* Adapt applyWorkshareLoop to triage between the schedule types, now possible since all schedules have been implemented. The default schedule is assumed to be non-chunked static, as without OpenMPIRBuilder.
* Remove the chunk parameter from applyStaticWorkshareLoop, it is ignored by the runtime. Change the value for the value passed to the init function to 0, as without OpenMPIRBuilder.
* Refactor CanonicalLoopInfo::setTripCount and CanonicalLoopInfo::mapIndVar as used by both, applyStaticWorkshareLoop and applyStaticChunkedWorkshareLoop.
* Enable Clang to use the OpenMPIRBuilder in the presence of the schedule clause.
Differential Revision: https://reviews.llvm.org/D114413
Motivation:
```
int foo(int x, int y) { // any compiler will happily inline this function
return x / y;
}
int test(int x, int y) {
int r = 0;
[[clang::noinline]] r += foo(x, y); // for some reason we don't want any inlining here
return r;
}
```
In 2018, @kuhar proposed "Introduce per-callsite inline intrinsics" in https://reviews.llvm.org/D51200 to solve this motivation case (and many others).
This patch solves this problem with call site attribute. The implementation is "smaller" wrt approach which uses new intrinsics and thanks to https://reviews.llvm.org/D79121 (Add nomerge statement attribute to clang), we have got some basic infrastructure to deal with attrs on statements with call expressions.
GCC devs are more inclined to call attribute solution as well, as builtins are problematic for them - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104187. But they have no patch proposal yet so.. We have free hands here.
If this approach makes sense, next future steps would be support for call site attributes for always_inline / flatten.
Reviewed By: aaron.ballman, kuhar
Differential Revision: https://reviews.llvm.org/D119061
The purpose of this change is to fix the following codegen bug:
```
// main.c
__attribute__((cpu_specific(generic)))
int *foo(void) { static int z; return &z;}
int main() { return *foo() = 5; }
// other.c
__attribute__((cpu_dispatch(generic))) int *foo(void);
// run:
clang main.c other.c -o main; ./main
```
This will segfault prior to the change, and return the correct
exit code 5 after the change.
The underlying cause is that when a translation unit contains
a cpu_specific function without the corresponding cpu_dispatch
the generated code binds the reference to foo() against a
GlobalIFunc whose resolver is undefined. This is invalid: the
resolver must be defined in the same translation unit as the
ifunc, but historically the LLVM bitcode verifier did not check
that. The generated code then binds against the resolver rather
than the ifunc, so it ends up calling the resolver rather than
the resolvee. In the example above it treats its return value as
an int *, therefore trying to write to program text.
The root issue at the representation level is that GlobalIFunc,
like GlobalAlias, does not support a "declaration" state. The
object which provides the correct semantics in these cases
is a Function declaration, but unlike Functions, changing a
declaration to a definition in the GlobalIFunc case constitutes
a change of the object type, as opposed to simply emitting code
into a Function.
I think this limitation is unlikely to change, so I implemented
the fix by returning a function declaration rather than an ifunc
when encountering cpu_specific, and upgrading it to an ifunc
when emitting cpu_dispatch.
This uses `takeName` + `replaceAllUsesWith` in similar vein to
other places where the correct IR object type cannot be known
locally/up-front, like in `CodeGenModule::EmitAliasDefinition`.
Previous discussion in: https://reviews.llvm.org/D112349
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D120266
This fixes a bug that happens when using -fdebug-prefix-map to remap
an absolute path to a relative path. Since the path was absolute
before remapping, it is safe to assume that concatenating the remapped
working directory would be wrong.
Differential Revision: https://reviews.llvm.org/D113718
Changed the we handle llvm::Constants in sizes arrays. ConstExprs and
GlobalValues cannot be used as initializers, need to put them at the
runtime, otherwise there wight be the compilation errors.
Differential Revision: https://reviews.llvm.org/D105297
(resubmit https://reviews.llvm.org/D119207 after fixing the test for
some build settings)
This patch converts CUDA pointer kernel arguments with default address
space to CrossWorkGroup address space (__global in OpenCL). This is
because Generic or Function (OpenCL's private) is not supported as
storage class for kernel pointer types.
Differential revision: https://reviews.llvm.org/D120366
Changed the we handle llvm::Constants in sizes arrays. ConstExprs and
GlobalValues cannot be used as initializers, need to put them at the
runtime, otherwise there wight be the compilation errors.
Differential Revision: https://reviews.llvm.org/D105297
Summary:
We use a section to embed offloading code into the host for later
linking. This is normally unique to the translation unit as it is thrown
away during linking. However, if the user performs a relocatable link
the sections will be merged and we won't be able to access the files
stored inside. This patch changes the section variables to have external
linkage and a name defined by the section name, so if two sections are
combined during linking we get an error.
Introduce -fgpu-default-stream={legacy|per-thread} option to
support per-thread default stream for HIP runtime.
When -fgpu-default-stream=per-thread, HIP kernels are
launched through hipLaunchKernel_spt instead of
hipLaunchKernel. Also HIP_API_PER_THREAD_DEFAULT_STREAM=1
is defined by the preprocessor to enable other per-thread stream
API's.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D120298