Translate HLSLNumThreadsAttr into function attribute with name "dx.numthreads" and value format as "x,y,z".
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D131799
The accessibility level of a typedef or using declaration in a
struct or class was being lost when producing debug information.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D134339
One must pick the same name as the one referenced in CodeGenFunction when
generating .inline version of an inline builtin, otherwise they are not
correctly replaced.
Differential Revision: https://reviews.llvm.org/D134362
When `objc_direct` methods were implemented, the implicit `_cmd` parameter was left as an argument to the method implementation function, but was unset by callers; if the method body referenced the `_cmd` variable, a selector load would be emitted inside the body. However, this leaves an unused argument in the ABI, and is unnecessary.
This change removes the empty/unset argument, and if `_cmd` is referenced inside an `objc_direct` method it will emit local storage for the implicit variable. From the ABI perspective, `objc_direct` methods will have the implicit `self` parameter, immediately followed by whatever explicit arguments are defined on the method, rather than having one unset/undefined register in the middle.
Differential Revision: https://reviews.llvm.org/D131424
In order to make this easier to track I've filed issues for each of the
HLSL FIXME comments that I can find. I may have missed some, but I want
this to be the new default mode.
This patch add codegen support for the has_device_addr clause. It use
the same logic of is_device_ptr. But passing &var instead pointer to var
to kernal.
Differential Revision: https://reviews.llvm.org/D134268
This extension does not appear to be on its way to ratification.
Out of the unratified bitmanip extensions, this one had the
largest impact on the compiler.
Posting this patch to start a discussion about whether we should
remove these extensions. We'll talk more at the RISC-V sync meeting this
Thursday.
Reviewed By: asb, reames
Differential Revision: https://reviews.llvm.org/D133834
When passing arguments with `__fastcall` or `__vectorcall` in 32-bit MSVC, the following arguments have chance to be passed by register if the current one failed. `__regcall` from ICC is on the contrary: https://godbolt.org/z/4MPbzhaMG
All the three calling conversions are not supported in GCC.
Fixes: #57737
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D133920
Summary: This patch add codegen support for the has_device_addr clause. It
use the same logic of is_device_ptr.
Differential Revision: https://reviews.llvm.org/D134186
Reuse most of RISCV's implementation with several exceptions:
1. Assign signext/zeroext attribute to args passed in stack.
On RISCV, integer scalars passed in registers have signext/zeroext
when promoted, but are anyext if passed on the stack. This is defined
in early RISCV ABI specification. But after this change [1], integers
should also be signext/zeroext if passed on the stack. So I think
RISCV's ABI lowering should be updated [2].
While in LoongArch ABI spec, we can see that integer scalars narrower
than GRLEN bits are zero/sign-extended no matter passed in registers
or on the stack.
2. Zero-width bit fields are ignored.
This matches GCC's behavior but it hasn't been documented in ABI sepc.
See https://gcc.gnu.org/r12-8294.
3. `char` is signed by default.
There is another difference worth mentioning is that `char` is signed
by default on LoongArch while it is unsigned on RISCV.
This patch also adds `_BitInt` type support to LoongArch and handle it
in LoongArchABIInfo::classifyArgumentType.
[1] cec39a064e
[2] https://github.com/llvm/llvm-project/issues/57261
Differential Revision: https://reviews.llvm.org/D132285
Before this patch, when compiling an IR file (eg the .llvmbc section
from an object file compiled with -Xclang -fembed-bitcode=all) and
profile data was passed in using the -fprofile-instrument-use-path
flag, there would be no error printed (as the previous implementation
relied on the error getting caught again in the constructor of
CodeGenModule which isn't called when -x ir is set). This patch
moves the error checking directly to where the error is caught
originally rather than failing silently in setPGOUseInstrumentor and
waiting to catch it in CodeGenModule to print diagnostic information to
the user.
Regression test added.
Reviewed By: xur, mtrofin
Differential Revision: https://reviews.llvm.org/D132991
We change the template specialization of builtin templates to
behave like aliases.
Though unlike real alias templates, these might still produce a canonical
TemplateSpecializationType when some important argument is dependent.
For example, we can't do anything about make_integer_seq when the
count is dependent, or a type_pack_element when the index is dependent.
We change type deduction to not try to deduce canonical TSTs of
builtin templates.
We also change those buitin templates to produce substitution sugar,
just like a real instantiation would, making the resulting type correctly
represent the template arguments used to specialize the underlying template.
And make_integer_seq will now produce a TST for the specialization
of it's first argument, which we use as the underlying type of
the builtin alias.
When performing member access on the resulting type, it's now
possible to map from a Subst* node to the template argument
as-written used in a regular fashion, without special casing.
And this fixes a bunch of bugs with relation to these builtin
templates factoring into deduction.
Fixes GH42102 and GH51928.
Depends on D133261
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D133262
For this patch, a simple search was performed for patterns where there are
two types (usually an LHS and an RHS) which are structurally the same, and there
is some result type which is resolved as either one of them (typically LHS for
consistency).
We change those cases to resolve as the common sugared type between those two,
utilizing the new infrastructure created for this purpose.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D111509
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert, jhuber6, ABataev
Differential Revision: https://reviews.llvm.org/D102107
Restore GlobalsAA if sanitizers inserted at early optimize callback.
The analysis can be useful for the following FunctionPassManager.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D133537
The old device runtime had a "simplified" version that prevented many of
the runtime features from being initialized. The old device runtime was
deleted in LLVM 14 and is no longer in use. Selectively deactivating
features is now done using specific flags rather than the old technique.
This patch simply removes the extra logic required for handling the old
simple runtime scheme.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133802
Two new dxc mode options -O and -Od are added for dxc mode.
-O is just alias of existing cc1 -O option.
-Od will be lowered into -O0 and -dxc-opt-disable.
-dxc-opt-disable is cc1 option added to for build ShaderFlags.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D128845
HLSL doesn't have a C++ runtime that supports `atexit` registration. To
enable global destructors we instead rely on the `llvm.global_dtor`
mechanism.
This change disables `atexit` generation for HLSL and updates the HLSL
code generation to call global destructors on the exit from entry
functions.
Depends on D132977.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D133518
Set the EmulatedTLS option based on `Triple::hasDefaultEmulatedTLS()`
if the user didn't specify it; set `ExplicitEmulatedTLS` to true
in `llvm::TargetOptions` and set `EmulatedTLS` to Clang's
opinion of what the default or preference is.
This avoids any risk of deviance between the two.
This affects one check of `getCodeGenOpts().EmulatedTLS` in
`shouldAssumeDSOLocal` in CodeGenModule, but as that check only
is done for `TT.isWindowsGNUEnvironment()`, and
`hasDefaultEmulatedTLS()` returns false for such environments
it doesn't make any current testable difference - thus NFC.
Some mingw distributions carry a downstream patch, that enables
emulated TLS by default for mingw targets in `hasDefaultEmulatedTLS()`
- and for such cases, this patch does make a difference and fixes the
detection of emulated TLS, if it is implicitly enabled.
Differential Revision: https://reviews.llvm.org/D132916
Hidden visibility is incompatible with dllexport.
Hidden and protected visibilities are incompatible with dllimport.
(PlayStation uses dllexport protected.)
When an explicit visibility attribute applies on a dllexport/dllimport
declaration, report a Frontend error (Sema does not compute visibility).
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D133266
__declspec(safebuffers) is equivalent to
__attribute__((no_stack_protector)). This information is recorded in
CodeView.
While we are here, add support for strict_gs_check.
HLSL doesn't have a runtime loader model that supports global
construction by a loader or runtime initializer. To allow us to leverage
global constructors with minimal code generation impact we put calls to
the global constructors inside the generated entry function.
Differential Revision: https://reviews.llvm.org/D132977
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead. Leave the few call sites that
use a locally defined `array_lengthof` that are meant to test previous bugs
with NTTPs in clang analyzer and SemaTemplate.
Differential Revision: https://reviews.llvm.org/D133520
This reverts commit 91d8324366.
The combo dllexport protected makes sense and is used by PlayStation.
Will change the patch to allow dllexport protected.
Introduces the frontend flag -fexperimental-sanitize-metadata=, which
enables SanitizerBinaryMetadata instrumentation.
The first intended user of the binary metadata emitted will be a variant
of GWP-TSan [1]. The plan is to open source a stable and production
quality version of GWP-TSan. The development of which, however, requires
upstream compiler support.
[1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
Until the tool has been open sourced, we mark this kind of
instrumentation as "experimental", and reserve the option to change
binary format, remove features, and similar.
Reviewed By: vitalybuka, MaskRay
Differential Revision: https://reviews.llvm.org/D130888
Avoid __builtin_assume_aligned crash when the 1st arg is array type (or
string literal).
Fixes Issue #57169
Differential Revision: https://reviews.llvm.org/D133202
We should only use 'auto' in case we can know the type from the right
hand side of the expression. Also we need keep '*' around if the type is
a pointer actually. Few uses of 'auto' in SemaCoroutine.cpp and
CGCoroutine.cpp violates the rule. This commit tries to fix it.
dllimport/dllexport is incompatible with protected/hidden visibilities.
(Arguably dllexport semantics is compatible with protected but let's reject the
combo for simplicity.)
When an explicit visibility attribute applies on a dllexport/dllimport
declaration, report a Frontend error (Sema does not compute visibility).
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D133266
Similar to 123ce97fac for dllimport: dllexport
expresses a non-hidden visibility intention. We can consider it explicit and
therefore it should override the global visibility setting (see AST/Decl.cpp
"NamedDecl Implementation").
Adding the special case to CodeGenModule::setGlobalVisibility is somewhat weird,
but allows we to add the code in one place instead of many in AST/Decl.cpp.
Differential Revision: https://reviews.llvm.org/D133180
This is a follow up to https://reviews.llvm.org/D126864, addressing some remaining
comments.
It also considers union with a single zero-length array field as FAM for each
value of -fstrict-flex-arrays.
Differential Revision: https://reviews.llvm.org/D132944
All the coroutine builtins were emitted in EmitCoroutineIntrinsic except
__builtin_coro_size. This patch tries to emit all the corotine builtins
uniformally.
Seeing the wrong instruction for this name in IR is confusing.
Most of the tests are not even checking a subsequent use of
the value, so I just deleted the over-specified CHECKs.
This patch has the following changes:
(1) Handling of internal linkage functions (static functions)
Static functions in FDO have a prefix of source file name, while they do not
have one in SampleFDO. Current implementation does not handle this and we are
not updating the profile for static functions. This patch fixes this.
(2) Handling of -funique-internal-linakge-symbols
Again this is for the internal linkage functions. Option
-funique-internal-linakge-symbols can now be applied to both FDO and SampleFDO
compilation. When it is used, it demangles internal linkage function names and
adds a hash value as the postfix.
When both SampleFDO and FDO profiles use this option, or both
not use this option, changes in (1) should handle this.
Here we also handle when the SampleFDO profile using this option while FDO
profile not using this option, or vice versa.
There is one case where this patch won't work: If one of the profiles used
mangled name and the other does not. For example, if the SampleFDO profile
uses clang c-compiler and without -funique-internal-linakge-symbols, while
the FDO profile uses -funique-internal-linakge-symbols. The SampleFDO profile
contains unmangled names while the FDO profile contains mangled names. If
both profiles use c++ compiler, this won't happen. We think this use case
is rare and does not justify the effort to fix.
Differential Revision: https://reviews.llvm.org/D132600
To have finer control of IR uwtable attribute generation. For target code generation,
IR nounwind and uwtable may have some interaction. However, for frontend, there are
no semantic interactions so the this new `nouwtable` is marked "SimpleHandler = 1".
Differential Revision: https://reviews.llvm.org/D132592
This patch replaces clamp idioms with std::clamp where the range is
obviously valid from the source code (that is, low <= high) to avoid
introducing undefined behavior.
If we run into a first usage or definition of a mangled name, and
there's a DeferredDecl that associated with it, we should remember it we
need to emit it later on.
Without this patch, clang-repl hits a JIT symbol not found error:
clang-repl> extern "C" int printf(const char *, ...);
clang-repl> auto l1 = []() { printf("ONE\n"); return 42; };
clang-repl> auto l2 = []() { printf("TWO\n"); return 17; };
clang-repl> auto r1 = l1();
ONE
clang-repl> auto r2 = l2();
TWO
clang-repl> auto r3 = l2();
JIT session error: Symbols not found: [ l2 ]
error: Failed to materialize symbols: { (main,
{ r3, orc_init_func.incr_module_5, $.incr_module_5.inits.0 }) }
Signed-off-by: Jun Zhang <jun@junz.org>
Differential Revision: https://reviews.llvm.org/D130831
We run into a duplicate symbol error when instrumenting the rtti_proxies
generated as part of the relative vtables ABI with hwasan:
```
ld.lld: error: duplicate symbol: typeinfo for icu_71::UObject
(.rtti_proxy)
>>> defined at brkiter.cpp
>>>
arm64-hwasan-shared/obj/third_party/icu/source/common/libicuuc.brkiter.cpp.o:(typeinfo
for icu_71::UObject (.rtti_proxy))
>>> defined at locavailable.cpp
>>>
arm64-hwasan-shared/obj/third_party/icu/source/common/libicuuc.locavailable.cpp.o:(.data.rel.ro..L_ZTIN6icu_717UObjectE.rtti_proxy.hwasan+0xE00000000000000)
```
The issue here is that the hwasan alias carries over the visibility and
linkage of the original proxy, so we have duplicate external symbols
that participate in linking. Similar to D132425 we can just disable
hwasan for the proxies for now.
Differential Revision: https://reviews.llvm.org/D132691
Full context in
https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=107017.
Instrumenting hwasan with globals results in a linker error under the
relative vtables abi:
```
ld.lld: error:
libunwind.cpp:(.rodata..L_ZTVN9libunwind12UnwindCursorINS_17LocalAddressSpaceENS_15Registers_arm64EEE.hwasan+0x8):
relocation R_AARCH64_PLT32 out of range: 6845471433603167792 is not in
[-2147483648, 2147483647]; references
libunwind::AbstractUnwindCursor::~AbstractUnwindCursor()
>>> defined in
libunwind/src/CMakeFiles/unwind_shared.dir/libunwind.cpp.obj
```
This is because the tag is included in the vtable address when
calculating the offset between the vtable and virtual function. A
temporary solution until we can resolve this is to just disable hwasan
instrumentation on relative vtables specifically, which can be done in
the frontend.
Differential Revision: https://reviews.llvm.org/D132425
Put DXIL validation version into separate NamedMetadata to avoid update ModuleFlags.
Currently DXIL validation version is saved in ModuleFlags in clang codeGen.
Then in DirectX backend, the data will be extracted from ModuleFlags and cause rebuild of ModuleFlags.
This patch will build NamedMetadata for DXIL validation version and remove the code to rebuild ModuleFlags.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D130207
Clang crashes when encountering an `if consteval` statement.
This is the minimum fix not to crash.
The fix is consistent with the current behavior of if constexpr,
which does generate coverage data for the discarded branches.
This is of course not correct and a better solution is
needed for both if constexpr and if consteval.
See https://github.com/llvm/llvm-project/issues/54419.
Fixes#57377
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D132723
Semantic parameters aren't passed as actual parameters, instead they are
populated from intrinsics which are generally lowered to reads from
dedicated hardware registers.
This change modifies clang CodeGen to emit the intrinsic calls and
populate the parameter's LValue with the result of the intrinsic call
for SV_GroupIndex.
The result of this is to make the actual passed argument ignored, which
will make it easy to clean up later in an IR pass.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D131203
The linker is supposed to detect when an object with /kernel is linked
with another object which is not compiled with /kernel. The linker
detects this by checking bit 30 in @feat.00.
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.
Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.
KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.
A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:
```
.c:
int f(void);
int (*p)(void) = f;
p();
.s:
.4byte __kcfi_typeid_f
.global f
f:
...
```
Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.
As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.
Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.
Relands 67504c9549 with a fix for
32-bit builds.
Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay
Differential Revision: https://reviews.llvm.org/D119296
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.
Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.
KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.
A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:
```
.c:
int f(void);
int (*p)(void) = f;
p();
.s:
.4byte __kcfi_typeid_f
.global f
f:
...
```
Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.
As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.
Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.
Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay
Differential Revision: https://reviews.llvm.org/D119296
"this" parameter of lambda if undef, notnull and differentiable.
So we need to pass something consistent.
Any alloca will work. It will be eliminated as unused later by optimizer.
Otherwise we generate code which Msan is expected to catch.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D132275
The OpenMP device runtime needs to support the OpenMP standard. However
constructs like nested parallelism are very uncommon in real application
yet lead to complexity in the runtime that is sometimes difficult to
optimize out. As a stop-gap for performance we should supply an argument
that selectively disables this feature. This patch adds the
`-fopenmp-assume-no-nested-parallelism` argument which explicitly
disables the usee of nested parallelism in OpenMP.
Reviewed By: carlo.bertolli
Differential Revision: https://reviews.llvm.org/D132074
MSVC allows interpreting volatile loads and stores, when combined with
/volatile:iso, as having acquire/release semantics. MSVC also exposes a
define, _ISO_VOLATILE, which allows users to enquire if this feature is
enabled or disabled.
Fixes https://github.com/llvm/llvm-project/issues/55804
The lexing order is already bookkept in DelayedCXXInitPosition but we
were not using it based on the wrong assumption that inline variable is
unordered. This patch fixes it by ordering entries in llvm.global_ctors
by orders in DelayedCXXInitPosition.
for llvm.global_ctors entries without a lexing order, ordering them by
the insertion order.
(This *mostly* orders the template instantiation in
https://reviews.llvm.org/D126341 intuitively, minus one tweak for which I'll
submit a separate patch.)
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D127233
The hard float ABIs have a rule that if a flattened struct contains
either a single fp value, or an int+fp, or fp+fp then it may be passed
in a pair of registers (if sufficient GPRs+FPRs are available).
detectFPCCEligibleStruct and the helper it calls,
detectFPCCEligibleStructHelper examine the type of the argument/return
value to determine if it complies with the requirements for this ABI
rule.
As reported in bug #57084, this logic produces incorrect results for C++
structs that inherit from other structs. This is because only the fields
of the struct were examined, but enumerating RD->fields misses any
fields in inherited C++ structs. This patch corrects that issue by
adding appropriate logic to enumerate any included base structs.
Differential Revision: https://reviews.llvm.org/D131677
This patch replaces svget, svset and svcreate aarch64 intrinsics for tuple
types with the generic llvm-ir intrinsics extract/insert vector
Differential Revision: https://reviews.llvm.org/D131547
Currently, record arguments are always passed by reference by allocating
space for record values in the caller. This is less efficient for
small records which may take one or two registers. For example,
for x86_64 and aarch64, for a record size up to 16 bytes, the record
values can be passed by values directly on the registers.
This patch added BPF support of record argument with direct values
for up to 16 byte record size. If record size is 0, that record
will not take any register, which is the same behavior for x86_64
and aarch64. If the record size is greater than 16 bytes, the
record argument will be passed by reference.
Differential Revision: https://reviews.llvm.org/D132144
This patch adds OMPIRBuilder support for the safelen clause for the
simd directive.
Reviewed By: shraiysh, Meinersbur
Differential Revision: https://reviews.llvm.org/D131526
Add the clang option -finline-max-stacksize=<N> to suppress inlining
of functions whose stack size exceeds the given value.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D131986
This patch fixes a crash when trying to emit a constant compound literal.
For C++ Clang evaluates either casts or binary operations at translation time,
but doesn't pass on the InConstantContext information that was inferred when
parsing the statement. Because of this, strict FP evaluation (-ftrapping-math)
which shouldn't be in effect yet, then causes checkFloatingpointResult to return
false, which in tryEmitGlobalCompoundLiteral will trigger an assert that the
compound literal wasn't constant.
The discussion here around 'manifestly constant evaluated contexts' was very
helpful to me when trying to understand what LLVM's position is on what
evaluation context should be in effect, together with the explanatory text in
that patch itself:
https://reviews.llvm.org/D87528
Reviewed By: rjmccall, DavidSpickett
Differential Revision: https://reviews.llvm.org/D131555
Seems this complicated lldb sufficiently for some cases that it hasn't
been worth supporting/fixing there - and it so far hasn't provided any
new use cases/value for debug info consumers, so let's remove it until
someone has a use case for it.
(side note: the original implementation of this still had a bug (I
should've caught it in review) that we still didn't produce
auto-returning function declarations in types where the function wasn't
instantiatied (that requires a fix to remove the `if
getContainedAutoType` condition in
`CGDebugInfo::CollectCXXMemberFunctions` - without that, auto returning
functions were still being handled the same as member function templates
and special member functions - never added to the member list, only
attached to the type via the declaration chain from the definition)
Further discussion about this in D123319
This reverts commit 5ff992bca208a0e37ca6338fc735aec6aa848b72: [DEBUG-INFO] Change how we handle auto return types for lambda operator() to be consistent with gcc
This reverts commit c83602fdf51b2692e3bacb06bf861f20f74e987f: [DWARF5][clang]: Added support for DebugInfo generation for auto return type for C++ member functions.
Differential Revision: https://reviews.llvm.org/D131933
Currently bpf supports calling kernel functions (x86_64, arm64, etc.)
in bpf programs. Tejun discovered a problem where the x86_64 func
return value (a unsigned char type) is stored in 8-bit subregister %al
and the other 56-bits in %rax might be garbage. But based on current
bpf ABI, the bpf program assumes the whole %rax holds the correct value
as the callee is supposed to do necessary sign/zero extension.
This mismatch between bpf and x86_64 caused the incorrect results.
To resolve this problem, this patch forced caller to do needed
sign/zero extension for 8/16-bit return values as well. Note that
32-bit return values already had sign/zero extension even without
this patch.
For example, for the test case attached to this patch:
$ cat t.c
_Bool bar_bool(void);
unsigned char bar_char(void);
short bar_short(void);
int bar_int(void);
int foo_bool(void) {
if (bar_bool() != 1) return 0; else return 1;
}
int foo_char(void) {
if (bar_char() != 10) return 0; else return 1;
}
int foo_short(void) {
if (bar_short() != 10) return 0; else return 1;
}
int foo_int(void) {
if (bar_int() != 10) return 0; else return 1;
}
Without this patch, generated call insns in IR looks like:
%call = call zeroext i1 @bar_bool()
%call = call zeroext i8 @bar_char()
%call = call signext i16 @bar_short()
%call = call i32 @bar_int()
So it is assumed that zero extension has been done for return values of
bar_bool()and bar_char(). Sign extension has been done for the return
value of bar_short(). The return value of bar_int() does not have any
assumption so caller needs to do necessary shifting to get correct
32bit values.
With this patch, generated call insns in IR looks like:
%call = call i1 @bar_bool()
%call = call i8 @bar_char()
%call = call i16 @bar_short()
%call = call i32 @bar_int()
There are no assumptions for return values of the above four function calls,
so necessary shifting is necessary for all of them.
The following is the objdump file difference for function foo_char().
Without this patch:
0000000000000010 <foo_char>:
2: 85 10 00 00 ff ff ff ff call -1
3: bf 01 00 00 00 00 00 00 r1 = r0
4: b7 00 00 00 01 00 00 00 r0 = 1
5: 15 01 01 00 0a 00 00 00 if r1 == 10 goto +1 <LBB1_2>
6: b7 00 00 00 00 00 00 00 r0 = 0
0000000000000038 <LBB1_2>:
7: 95 00 00 00 00 00 00 00 exit
With this patch:
0000000000000018 <foo_char>:
3: 85 10 00 00 ff ff ff ff call -1
4: bf 01 00 00 00 00 00 00 r1 = r0
5: 57 01 00 00 ff 00 00 00 r1 &= 255
6: b7 00 00 00 01 00 00 00 r0 = 1
7: 15 01 01 00 0a 00 00 00 if r1 == 10 goto +1 <LBB1_2>
8: b7 00 00 00 00 00 00 00 r0 = 0
0000000000000048 <LBB1_2>:
9: 95 00 00 00 00 00 00 00 exit
The zero extension of the return 'char' value is done here.
Differential Revision: https://reviews.llvm.org/D131598
When aliasing a static array, the aliasee is going to be a GEP which
points to the value. We should strip pointer casts before forming the
reference. This was occluded by the use of opaque pointers.
This problem has existed since the introduction of the debug info
generation for aliases in b1ea0191a4. The
test case would assert due to the invalid cast with or without
`-no-opaque-pointers` at that revision.
Fixes: #57179
Currently we treat initializers with init_seg(compiler/lib) as similar
to any other init_seg, they simply have a global variable in the proper
section (".CRT$XCC" for compiler/".CRT$XCL" for lib) and are added to
llvm.used. However, this doesn't match with how LLVM sees normal (or
init_seg(user)) initializers via llvm.global_ctors. This
causes issues like incorrect init_seg(compiler) vs init_seg(user)
ordering due to GlobalOpt evaluating constructors, and the
ability to remove init_seg(compiler/lib) initializers at all.
Currently we use 'A' for priorities less than 200. Use 200 for
init_seg(compiler) (".CRT$XCC") and 400 for init_seg(lib) (".CRT$XCL"),
which do not append the priority to the section name. Priorities
between 200 and 400 use ".CRT$XCC${Priority}". This allows for
some wiggle room for people/future extensions that want to add
initializers between compiler and lib.
Fixes#56922
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D131910