This patch adds support for the `depend` clause for the `task`
construct.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135695
This is an alternative of D120395 and D120411.
Previously we use `__bfloat16` as a typedef of `unsigned short`. The
name may give user an impression it is a brand new type to represent
BF16. So that they may use it in arithmetic operations and we don't have
a good way to block it.
To solve the problem, we introduced `__bf16` to X86 psABI and landed the
support in Clang by D130964. Now we can solve the problem by switching
intrinsics to the new type.
Reviewed By: LuoYuanke, RKSimon
Differential Revision: https://reviews.llvm.org/D132329
Support SV_DispatchThreadID attribute.
Translate it into dx.thread.id in clang codeGen.
Reviewed By: beanz, aaron.ballman
Differential Revision: https://reviews.llvm.org/D133983
This patch changes the kernels generated by OpenMP to have protected
visibility. This is unlikely to change anything functionally. However,
protected visibility better matches the behaviour of these GPU kernels.
We do not expect any pending shared library load to preempt these
kernels so we can specify a more restrictive visibility.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D136198
Currently generation of align assumptions for OpenMP simd construct is done
outside OMPIRBuilder for C code and it is not supported for Fortran.
According to OpenMP 5.0 standard (2.9.3) only pointers and arrays can be
aligned for C code.
If given aligned variable is pointer, then Clang generates the following set
of the LLVM IR isntructions to support simd align clause:
; memory allocation for pointer address:
%A.addr = alloca ptr, align 8
; some LLVM IR code
; Alignment instructions (alignment is equal to 32):
%0 = load ptr, ptr %A.addr, align 8
call void @llvm.assume(i1 true) [ "align"(ptr %0, i64 32) ]
If given aligned variable is array, then Clang generates the following set
of the LLVM IR isntructions to support simd align clause:
; memory allocation for array:
%B = alloca [10 x i32], align 16
; some LLVM IR code
; Alignment instructions (alignment is equal to 32):
%arraydecay = getelementptr inbounds [10 x i32], ptr %B, i64 0, i64 0
call void @llvm.assume(i1 true) [ "align"(ptr %arraydecay, i64 32) ]
OMPIRBuilder was modified to generate aligned assumptions. It generates only
llvm.assume calls. Frontend is responsible for generation of aligned pointer
and getting the default alignment value if user does not specify it in aligned
clause.
Unit and regression tests were added to check if aligned clause was handled correctly.
Differential Revision: https://reviews.llvm.org/D133578
Reviewed By: jdoerfert
Extra NUL does not impact functionality of the generated code, but it confuses
various NVIDIA tools used to examine embedded GPU binaries.
Differential Revision: https://reviews.llvm.org/D135832
''register(ID, space)'' like register(t3, space1) will be translated into
i32 3, i32 1 as the last 2 operands for resource annotation metadata.
NamedMetadata for CBuffers and SRVs are added as "hlsl.srvs" and "hlsl.cbufs".
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D130951
Calling `getFunctionLinkage(CalleeInfo.getCalleeDecl())` will crash when the declaration does not have a body, e.g., `extern void foo();`. Instead, we can use `isExternallyVisible()` to see if the delcaration has internal linkage.
I believe using `!isExternallyVisible()` is correct because the clang linkage must be `InternalLinkage` or `UniqueExternalLinkage`, both of which are "internal linkage" in llvm.
9c26f51f5e/clang/include/clang/Basic/Linkage.h (L28-L40)
Fixes https://github.com/llvm/llvm-project/issues/54139
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D135926
PPC64 ABI pass aggregates smaller than a register into the least
significant bits of the register. In the case of variadic functions,
they will end up right-aligned in their argument slots in the argument
area on big-endian targets. Apply right-alignment for these aggregates.
Fixes#55900.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D133338
This patch moves the implementation of the OffloadEntriesInfoManager
to the OMPIRbuilder. This class will later be used by flang as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135786
This change pulls some code from the DirectX backend into a new
LLVMFrontendHLSL library to share utility data structures between the
HLSL code generation in Clang and the backend in LLVM.
This is a small refactoring as a first start to get code into the
right structure and get the library built and dependencies correct.
Fixes#58000 (https://github.com/llvm/llvm-project/issues/58000)
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D135110
functions in getter/setter functions of non-trivial C struct properties
This fixes a bug where the getter/setter functions were doing a trivial
copy instead of calling the synthesized functions that copy non-trivial
C struct types.
This fixes https://github.com/llvm/llvm-project/issues/56680.
Differential Revision: https://reviews.llvm.org/D131701
There are currently two options that are used to tell the compiler to perform
unsafe floating-point optimizations:
'-ffast-math' and '-funsafe-math-optimizations'.
'-ffast-math' is enabled by default. It automatically enables the driver option
'-menable-unsafe-fp-math'.
Below is a table illustrating the special operations enabled automatically by
'-ffast-math', '-funsafe-math-optimizations' and '-menable-unsafe-fp-math'
respectively.
Special Operations -ffast-math -funsafe-math-optimizations -menable-unsafe-fp-math
MathErrno 0 1 1
FiniteMathOnly 1 0 0
AllowFPReassoc 1 1 1
NoSignedZero 1 1 1
AllowRecip 1 1 1
ApproxFunc 1 1 1
RoundingMath 0 0 0
UnsafeFPMath 1 0 1
FPContract fast on on
'-ffast-math' enables '-fno-math-errno', '-ffinite-math-only',
'-funsafe-math-optimzations' and sets 'FpContract' to 'fast'. The driver option
'-menable-unsafe-fp-math' enables the same special options than
'-funsafe-math-optimizations'. This is redundant.
We propose to remove the driver option '-menable-unsafe-fp-math' and use
instead, the setting of the special operations to set the function attribute
'unsafe-fp-math'. This attribute will be enabled only if those special
operations are enabled and if 'FPContract' is either 'fast' or set to the
default value.
Differential Revision: https://reviews.llvm.org/D135097
The diagnostics engine is very smart about being passed a NamedDecl to
print as part of a diagnostic; it gets the "right" form of the name,
quotes it properly, etc. However, the result of using an unnamed tag
declaration was to print '' instead of anything useful.
This patch causes us to print the same information we'd have gotten if
we had printed the type of the declaration rather than the name of it,
as that's the most relevant information we can display.
Differential Revision: https://reviews.llvm.org/D134813
cbuffer A {
float a;
float b;
}
will be translated to a global variable.
Something like
struct CB_Ty {
float a;
float b;
};
CB_Ty A;
And all use of a and b will be replaced with A.a and A.b.
Only support none-legacy cbuffer layout now.
CodeGen for Resource binding will be in separate patch.
In the separate patch, resource binding will map the resource information to the global variable.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D130131
This fixes a bug from https://reviews.llvm.org/D131424 that removed the implicit `_cmd` parameter as an argument to `objc_direct` method implementations. In many cases the generated getter/setter will call `objc_getProperty` or `objc_setProperty`, both of which require the selector of the getter/setter; since `_cmd` didn't automatically have backing storage, attempting to load the address asserted.
For direct property generated getters/setters, this now passes an undefined/uninitialized/poison value as the `_cmd` argument to `objc_getProperty`/`objc_setProperty`. Prior to removing the `_cmd` argument from the ABI of direct methods, it was left uninitialized/undefined; although references within hand-implemented methods would load the selector in the method prologue, generated getters/setters never did and just forwarded the undefined value that was passed as the argument.
This change keeps the generated code mostly similar to before, passing an uninitialized/undefined/poison value; for setters, the value argument may be moved to another register.
Added a test that triggers the assert prior to the implementation code.
Differential Revision: https://reviews.llvm.org/D135091
Performing a load before calling __cxa_guard_acquire is supposed to be
an optimization, but it isn't much of one if we're just going to emit a
call to __atomic_load_1 instead. Instead, just skip the load, and
let __cxa_guard_acquire do whatever it wants.
(In practice, on such targets, the C++ library is just built with
threading turned off, so the result isn't actually threadsafe, but
there's not really anything clang can do about that.)
The alternative here is that we try to define some ABI for threadsafe
init that allows the speculative load without full atomics. Almost any
target without full atomics has a load that's s "atomic enough" for this
purpose. But it's not clear how we emit an "atomic enough" load in LLVM
IR, and there isn't any ABI document we can refer to.
Or I guess we could turn off -fthreadsafe-statics by default on
Cortex-M0, but that seems like it would be surprising.
Fixes https://github.com/llvm/llvm-project/issues/58184
Differential Revision: https://reviews.llvm.org/D135628
Similar to D131064, this alters most of the intrinsics in arm_neon.h to
be target based, not preprocessor based. The intrinsics that are changed
are the ones with obvious target features (fp16, fp16fml, cryptos, i8mm
and bf16). The ones that are not yet altered are the ones without target
features like rdma (8.1) and complex (8.3). Those will be switched in a
followup patch that allows targeting architecture versions.
The existing ArchGuard in arm_neon.td is split into ArchGuard that still
adds ifdef defines (for example for intrinsics that require __aarch64__),
and TargetGuards for intrinsics dependant on target features. From there
the TargetGuards are used in two ways:
- For intrinsics emitted as functions, __attribute__((target(TargetGuard)))
is added to the definition of the function. Along with the existing
always_inline intrinsic, this will give a compile time error if the
function is used in a context where the target feature is not available.
- For intrinsics emitted as macros, the __builtins are emitted into
arm_neon.inc using TARGET_BUILTIN as opposed to BUILTIN, which includes
the target feature and gives an error if the builtin is found in a
function without the required features, similar to arm_sve.h.
The second method requires that the intrinsics be separable from the
existing _v intrinsics used in other types. For example
__builtin_neon_splat_lane_bf16 is used as opposed to
__builtin_neon_splat_lane_v. There are some adjustments to the CGBuiltin
to account for intrinsics that can be treated similarly, except for
their target features.
Differential Revision: https://reviews.llvm.org/D132034
There is no 6.9 in C++11, the quote actually lives in
[intro.multithread] for that revision. However, the words moved in
C++17 to [intro.progress] so I added that information as well.
Generally, with PGO enabled the C++20 likelyhood attributes shall be
dropped assuming the profile has a good coverage. However, currently
this is not the case for the following code:
if (always_false()) [[likely]] {
...
}
The patch fixes this and drops the attribute, if the parent context was
executed in the profile. The patch still preserves the attribute, if the
parent context was not executed, e.g. to support the cases when the
profile has insufficient coverage.
Differential Revision: https://reviews.llvm.org/D134456
Currently there is a middle-end or backend issue
https://github.com/llvm/llvm-project/issues/58176
which causes values loaded from bool pointer incorrect when
bool range metadata is emitted. Temporarily
disable bool range metadata until the backend issue
is fixed.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D135269
Fixes: SWDEV-344137
This patch moves the emitOffloadingArraysArgument function and
supporting data structures to OpenMPIRBuilder. This will later be used
in flang as well. The TargetDataInfo class was split up into generic
information and clang-specific data, which remain in clang. Further
migration will be done in in the future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D134662
The `cl-uniform-work-group` attribute asserts that the global work-size
be a multiple of the work-group specified work group size. This should
allow optimizations. It is already present by default in the AMD
compiler and for HIP kernels so it should be safe to allow this for
OpenMP kernels by default.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135374
After generated call for ctor/dtor for entry, global variable for ctor/dtor are useless.
Remove them for non-lib profiles.
Lib profile still need these in case export function used the global variable which require ctor/dtor.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D133993
We use protected visibility for almost everything with offloading. This
is because it provides us with the ability to read things from the host
without the expectation that it will be preempted by a shared library
load, bugs related to this have happened when offloading to the host.
This patch just makes the `exec_mode` global generated for each plugin
have protected visibility.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135285
Details posted here: https://reviews.llvm.org/D119051#3747201
3 cases that were inconsistent with the MSABI without this patch applied:
https://godbolt.org/z/GY48qxh3G - field with protected member
https://godbolt.org/z/Mb1PYhjrP - non-static data member initializer
https://godbolt.org/z/sGvxcEPjo - defaulted copy constructor
I'm not sure what's suitable/sufficient testing for this - I did verify
the three cases above. Though if it helps to add them as explicit tests,
I can do that too.
Also, I was wondering if the other use of isTrivialForAArch64MSVC in
isPermittedToBeHomogenousAggregate could be another source of bugs - I
tried changing the function to unconditionally call
isTrivialFor(AArch64)MSVC without testing AArch64 first, but no tests
fail, so it looks like this is undertested in any case. But I had
trouble figuring out how to exercise this functionality properly to add
test coverage and then compare that to MSVC itself... - I got very
confused/turned around trying to test this, so I've given up enough to
send what I have out for review, but happy to look further into this
with help.
Differential Revision: https://reviews.llvm.org/D133817
Turn it into a single Expr::isFlexibleArrayMemberLike method, as discussed in
https://discourse.llvm.org/t/rfc-harmonize-flexible-array-members-handling
Keep different behavior with respect to macro / template substitution, and
harmonize sharp edges: ObjC interface now behave as C struct wrt. FAM and
-fstrict-flex-arrays.
This does not impact __builtin_object_size interactions with FAM.
Differential Revision: https://reviews.llvm.org/D134791
When -fmodule-file-home-is-cwd and the path to the PCM is relative, we
shouldn't assume that the path to the PCM is relative to the modulemap
that produced it. To respect the option -fmodule-file-home-is-cwd, we
should assume the path is relative to the current working directory.
Reviewed By: rmaz
Differential Revision: https://reviews.llvm.org/D134911
If 'order(concurrent)' clause is specified, then the iterations of SIMD loop
can be executed concurrently.
This patch adds support for LLVM IR codegen via OMPIRBuilder for SIMD loop
with 'order(concurrent)' clause. The functionality added to OMPIRBuilder is
similar to the functionality implemented in 'CodeGenFunction::EmitOMPSimdInit'.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D134046
Signed-off-by: Dominik Adamski <dominik.adamski@amd.com>
This option forces constructors and non-deleting destructors to return
`this` pointer in C++ ABI (except for Microsoft ABI, on which this flag
has no effect).
This is similar to ARM32, Apple ARM64, or Fuchsia C++ ABI, but can be
applied to any target triple.
Differential Revision: https://reviews.llvm.org/D119209
This adds support under AArch64 for the target("..") attributes. The
current parsing is very X86-shaped, this patch attempts to bring it line
with the GCC implementation from
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html#AArch64-Function-Attributes.
The supported formats are:
- "arch=<arch>" strings, that specify the architecture features for a
function as per the -march=arch+feature option.
- "cpu=<cpu>" strings, that specify the target-cpu and any implied
atributes as per the -mcpu=cpu+feature option.
- "tune=<cpu>" strings, that specify the tune-cpu cpu for a function as
per -mtune.
- "+<feature>", "+no<feature>" enables/disables the specific feature, for
compatibility with GCC target attributes.
- "<feature>", "no-<feature>" enabled/disables the specific feature, for
backward compatibility with previous releases.
To do this, the parsing of target attributes has been moved into
TargetInfo to give the target the opportunity to override the existing
parsing. The only non-aarch64 change should be a minor alteration to the
error message, specifying using "CPU" to describe the cpu, not
"architecture", and the DuplicateArch/Tune from ParsedTargetAttr have
been combined into a single option.
Differential Revision: https://reviews.llvm.org/D133848
Disallow this meaningless combination. Doing so simplifies analysis
of LLVM code w.r.t t DLL storage-class, and prevents mistakes with
DLL storage class.
- Change the assembler to reject DLL storage class on symbols with
local linkage.
- Change the bitcode reader to clear the DLL Storage class when the
linkage is local for auto-upgrading
- Update LangRef.
There is an existing restriction on non-default visibility and local
linkage which this is modelled on.
Differential Review: https://reviews.llvm.org/D134784
Although the instruction names begin "frint", the ACLE spec states that
the intrinsic names begin "__rint", without the "f".
Differential Revision: https://reviews.llvm.org/D134824
Currently, clang does not emit debuginfo for the switch stmt
case value if it is an enum value. For example,
$ cat test.c
enum { AA = 1, BB = 2 };
int func1(int a) {
switch(a) {
case AA: return 10;
case BB: return 11;
default: break;
}
return 0;
}
$ llvm-dwarfdump test.o | grep AA
$
Note that gcc does emit debuginfo for the same test case.
This patch added such a support with similar implementation
to CodeGenFunction::EmitDeclRefExprDbgValue(). With this patch,
$ clang -g -c test.c
$ llvm-dwarfdump test.o | grep AA
DW_AT_name ("AA")
$
Differential Revision: https://reviews.llvm.org/D134705
This implements WG14 N2927 and WG14 N2930, which together define the
feature for typeof and typeof_unqual, which get the type of their
argument as either fully qualified or fully unqualified. The argument
to either operator is either a type name or an expression. If given a
type name, the type information is pulled directly from the given name.
If given an expression, the type information is pulled from the
expression. Recursive use of these operators is allowed and has the
expected behavior (the innermost operator is resolved to a type, and
that's used to resolve the next layer of typeof specifier, until a
fully resolved type is determined.
Note, we already supported typeof in GNU mode as a non-conforming
extension and we are *not* exposing typeof_unqual as a non-conforming
extension in that mode, nor are we exposing typeof or typeof_unqual as
a nonconforming extension in other language modes. The GNU variant of
typeof supports a form where the parentheses are elided from the
operator when given an expression (e.g., typeof 0 i = 12;). When in C2x
mode, we do not support this extension.
Differential Revision: https://reviews.llvm.org/D134286
It is data mapping ordering problem.
According omp spec
If one or more map clauses are present, the list item conversions that
are performed for any use_device_ptr or use_device_addr clause occur
after all variables are mapped on entry to the region according to those
map clauses.
The change is to put mapping data for use_device_addr at end of data
mapping array.
Differential Revision: https://reviews.llvm.org/D134556
By draft of C23 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf),
the description for isless macro under 7.12.17.3 says,
The isless macro determines whether its first argument is less than its second
argument. The value of isless(x,y) is always equal to (x)< (y); however, unlike
(x) < (y), isless(x,y) does not raise the invalid floating-point exception when
x and y are unordered and neither is a signaling NaN.
isless should trap when encountering signaling NaN.
Reviewed By: jcranmer-intel, efriedma
Differential Revision: https://reviews.llvm.org/D134407