Many platform ABIs have special support for passing aggregates that
either just contain a single member of floatint-point type, or else
a homogeneous set of members of the same floating-point type.
When making this determination, any extra "empty" members of the
aggregate type will typically be ignored. However, in C++ (at least
in all prior versions), no data member would actually count as empty,
even if it's type is an empty record -- it would still be considered
to take up at least one byte of space, and therefore make those ABI
special cases not apply.
This is now changing in C++20, which introduced the [[no_unique_address]]
attribute. Members of empty record type, if they also carry this
attribute, now do *not* take up any space in the type, and therefore
the ABI special cases for single-element or homogeneous aggregates
should apply.
The C++ Itanium ABI has been updated accordingly, and GCC 10 has
added support for this new case. This patch now adds support to
LLVM. This is cross-platform; it affects all platforms that use
the single-element or homogeneous aggregate ABI special case and
implement this using any of the following common subroutines
in lib/CodeGen/TargetInfo.cpp:
isEmptyField
isEmptyRecord
isSingleElementStruct
isHomogeneousAggregate
The SystemZ ABI specifies that aggregate types with just a single
member of floating-point type shall be passed as if they were just
a scalar of that type. This applies to both struct and class types
(but not unions).
However, the current ABI support code in clang only checks this
case for struct types, which means that for class types, generated
code does not adhere to the platform ABI.
Fixed by accepting both struct and class types in the
SystemZABIInfo::GetSingleElementType routine.
The x86-64 "avx" feature changes how >128 bit vector types are passed,
instead of being passed in separate 128 bit registers, they can be
passed in 256 bit registers.
"avx512f" does the same thing, except it switches from 256 bit registers
to 512 bit registers.
The result of both of these is an ABI incompatibility between functions
compiled with and without these features.
This patch implements a warning/error pair upon an attempt to call a
function that would run afoul of this. First, if a function is called
that would have its ABI changed, we issue a warning.
Second, if said call is made in a situation where the caller and callee
are known to have different calling conventions (such as the case of
'target'), we instead issue an error.
Differential Revision: https://reviews.llvm.org/D82562
EmitTargetMetadata passed to emitTargetMD a null pointer as returned
from GetGlobalValue, for an unused inline function which has been
removed from the module at that point.
A FIXME in CodeGenModule.cpp commented that the calling code in
EmitTargetMetadata should be moved into the one target that needs it
(XCore). A review comment agreed. So the calling loop has been moved
into the XCore subclass. The check for null is done in that loop.
Differential Revision: https://reviews.llvm.org/D77068
Summary:
As part of moving the argument lowering handling for bfloat arguments and
returns to the backend, this patch removes the code that was responsible for
handling the coercion of those arguments in Clang's Codegen.
Subscribers: kristof.beyls, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81837
Summary:
On the process of moving the argument lowering handling for
half-precision floating point arguments and returns to the backend, this
patch removes the code that was responsible for handling the coercion of
those arguments in Clang's Codegen.
Reviewers: rjmccall, chill, ostannard, dnsampaio
Reviewed By: ostannard
Subscribers: stuij, kristof.beyls, dmgreen, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81451
Summary:
This patch upstreams support for a new storage only bfloat16 C type.
This type is used to implement primitive support for bfloat16 data, in
line with the Bfloat16 extension of the Armv8.6-a architecture, as
detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
In detail this patch:
- introduces an opaque, storage-only C-type __bf16, which introduces a new bfloat IR type.
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
The following people contributed to this patch:
- Luke Cheeseman
- Momchil Velikov
- Alexandros Lamprineas
- Luke Geeson
- Simon Tatham
- Ties Stuij
Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, fpetrogalli
Reviewed By: SjoerdMeijer
Subscribers: labrinea, majnemer, asmith, dexonsmith, kristof.beyls, arphaman, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76077
D35259 introduced a case where complex types of non-long-double would
result in FI.getReturnInfo() to not be initialized properly. This
resulted in a crash under some very specific circumstances when
dereferencing the LLVMContext.
This patch makes sure that these types have the intended getReturnInfo
initialization.
Summary:
Created AIXABIInfo and AIXTargetCodeGenInfo for AIX ABI.
Reviewed By: Xiangling_L, ZarkoCA
Differential Revision: https://reviews.llvm.org/D79035
Summary:
This patch fixed the error of counting the remaining FPRs. Complex floating-point
values should be passed by two FPRs for the hard-float ABI. If no two FPRs are
available, it should be passed via a 64-bit GPR (fp+fp). `ArgFPRsLeft` is only
decreased one while the type is complex floating-point. It causes two floating-point
values in the complex are passed separately by two GPRs.
Reviewers: asb, luismarques, lenary
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, s.egerton, pzheng, sameer.abuasal, apazos, evandro, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79770
This is the result of an audit of all of the ABIs in clang to implement
and enable the type for those targets.
Additionally, this finds an issue with integer-promotion passing for a
few platforms when using _ExtInt of < int, so this also corrects that
resulting in signext/zeroext being on a params of those types in some
platforms.
Differential Revisions: https://reviews.llvm.org/D79118
Use unique_ptr to manage the lifetime of ABIInfo member inside TargetCodeGenInfo.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D79033
After speaking with Craig Topper about some recent defects, he pointed
out that _ExtInts should be passed indirectly if larger than the largest
int register, and like ints when smaller than that. This patch
implements that.
Note that this changed the way vaargs worked quite a bit, but they still
work.
Differential Revision: https://reviews.llvm.org/D78785
Summary:
Change the default ABI to be compatible with GCC. For 32-bit ELF
targets other than Linux, Clang now returns small structs in registers
r3/r4. This affects FreeBSD, NetBSD, OpenBSD. There is no change for
32-bit Linux, where Clang continues to return all structs in memory.
Add clang options -maix-struct-return (to return structs in memory) and
-msvr4-struct-return (to return structs in registers) to be compatible
with gcc. These options are only for PPC32; reject them on PPC64 and
other targets. The options are like -fpcc-struct-return and
-freg-struct-return for X86_32, and use similar code.
To actually return a struct in registers, coerce it to an integer of the
same size. LLVM may optimize the code to remove unnecessary accesses to
memory, and will return i32 in r3 or i64 in r3:r4.
Fixes PR#40736
Patch by George Koehler!
Reviewed By: jhibbits, nemanjai
Differential Revision: https://reviews.llvm.org/D73290
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sdesmalen, efriedma, krememek
Reviewed By: sdesmalen, efriedma
Subscribers: dexonsmith, Charusso, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77257
Summary:
- Use `device_builtin_surface` and `device_builtin_texture` for
surface/texture reference support. So far, both the host and device
use the same reference type, which could be revised later when
interface/implementation is stablized.
Reviewers: yaxunl
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77583
Summary:
- Even though the bindless surface/texture interfaces are promoted,
there are still code using surface/texture references. For example,
[PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the
compilation issue for code using `tex2D` with texture references. For
better compatibility, this patch proposes the support of
surface/texture references.
- Due to the absent documentation and magic headers, it's believed that
`nvcc` does use builtins for texture support. From the limited NVVM
documentation[^nvvm] and NVPTX backend texture/surface related
tests[^test], it's believed that surface/texture references are
supported by replacing their reference types, which are annotated with
`device_builtin_surface_type`/`device_builtin_texture_type`, with the
corresponding handle-like object types, `cudaSurfaceObject_t` or
`cudaTextureObject_t`, in the device-side compilation. On the host
side, that global handle variables are registered and will be
established and updated later when corresponding binding/unbinding
APIs are called[^bind]. Surface/texture references are most like
device global variables but represented in different types on the host
and device sides.
- In this patch, the following changes are proposed to support that
behavior:
+ Refine `device_builtin_surface_type` and
`device_builtin_texture_type` attributes to be applied on `Type`
decl only to check whether a variable is of the surface/texture
reference type.
+ Add hooks in code generation to replace that reference types with
the correponding object types as well as all accesses to them. In
particular, `nvvm.texsurf.handle.internal` should be used to load
object handles from global reference variables[^texsurf] as well as
metadata annotations.
+ Generate host-side registration with proper template argument
parsing.
---
[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 ``Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).
[^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later.
Reviewers: tra, rjmccall, yaxunl, a.sidorin
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76365
This has no effect on how LLVM passes the arguments, but it prevents
rewriteWithInAlloca from thinking that these parameters should be part
of the inalloca pack.
Follow-up to D72114
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D74452
This brings back 2af74e27ed and reverts
eaabaf7e04.
The changes were correct, the code that was broken contained an ODR
violation that assumed that these types are passed equivalently:
struct alignas(uint64_t) Wrapper { uint64_t P };
void f(uint64_t p);
void f(Wrapper p);
MSVC does not pass them the same way, and so clang-cl should not pass
them the same way either.
Summary:
For now, this ABI simply expands all possible aggregate arguments and
returns all possible aggregates directly. This ABI will change rapidly
as we prototype and benchmark a new ABI that takes advantage of
multivalue return and possibly other changes from the MVP ABI.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72972
MSVC 2013 would refuse to pass highly aligned things (typically vectors
and aggregates) by value. Users would receive this error:
t.cpp(11) : error C2719: 'w': formal parameter with __declspec(align('32')) won't be aligned
t.cpp(11) : error C2719: 'q': formal parameter with __declspec(align('32')) won't be aligned
However, in MSVC 2015, this behavior was changed, and highly aligned
things are now passed indirectly. To avoid breaking backwards
incompatibility, objects that do not have a *required* high alignment
(i.e. double) are still passed directly, even though they are not
naturally aligned. This change implements the new behavior of passing
things indirectly.
The new behavior is:
- up to three vector parameters can be passed in [XYZ]MM0-2
- remaining arguments with required alignment greater than 4 bytes are
passed indirectly
Previously, MSVC never passed things truly indirectly, meaning clang
would always apply the byval attribute to indirect arguments. We had to
go to the trouble of adding inalloca so that non-trivially copyable C++
types could be passed in place without copying the object
representation. When inalloca was added, we asserted that all arguments
passed indirectly must use byval. With this change, that assert no
longer holds, and I had to update inalloca to handle that case. The
implicit sret pointer parameter was already handled this way, and this
change generalizes some of that logic to arguments.
There are two cases that this change leaves unfixed:
1. objects that are non-trivially copyable *and* overaligned
2. vectorcall + inalloca + vectors
For case 1, I need to touch C++ ABI code in MicrosoftCXXABI.cpp, so I
want to do it in a follow-up.
For case 2, my fix is one line, but it will require updating IR tests to
use lots of inreg, so I wanted to separate it out.
Related to D71915 and D72110
Fixes most of PR44395
Reviewed By: rjmccall, craig.topper, erichkeane
Differential Revision: https://reviews.llvm.org/D72114
Summary:
Before this change, X86_32ABIInfo::classifyArgument would be called
twice on vector arguments to vectorcall functions. This function has
side effects to track GPR register usage, and this would lead to
incorrect GPR usage in some cases. The specific case I noticed is from
running out of XMM registers with mixed FP and vector arguments and no
aggregates of any kind. Consider this prototype:
void __vectorcall vectorcall_indirect_vec(
double xmm0, double xmm1, double xmm2, double xmm3, double xmm4,
__m128 xmm5,
__m128 ecx,
int edx,
__m128 mem);
classifyArgument has no effects when called on a plain FP type, but when
called on a vector type, it modifies FreeRegs to model GPR consumption.
However, this should not happen during the vector call first pass.
I refactored the code to unify vectorcall HVA logic with regcall HVA
logic. The conventions pass HVAs in registers differently (expanded vs.
not expanded), but if they do not fit in registers, they both pass them
indirectly by address.
Reviewers: erichkeane, craig.topper
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72110
Summary:
Previously, since these aggregates are > 2*XLen, Clang would think they
were being returned indirectly and thus would decrease the number of
available GPRs available by 1. For long argument lists this could lead
to a struct argument incorrectly being passed indirectly.
Reviewers: asb, lenary
Reviewed By: asb, lenary
Subscribers: luismarques, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, sameer.abuasal, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D69590
Summary:
This is documented as the appropriate template modifier for call operands.
Fixes PR44272, and adds a regression test.
Also adds support for operand modifiers in Intel-style inline assembly.
Reviewers: rnk
Reviewed By: rnk
Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71677
This is equivalent to the existing `import_name` and `import_module`
attributes which control the import names in the final wasm binary
produced by lld.
This maps the existing
This attribute currently requires a string rather than using the
symbol name for a couple of reasons:
1. Avoid confusion with static and dynamic linking which is
based on symbol name. Exporting a function from a wasm module using
this directive is orthogonal to both static and dynamic linking.
2. Avoids name mangling.
Differential Revision: https://reviews.llvm.org/D70520
AggValueSlot
This reapplies 8a5b7c3570 after a null
dereference bug in CGOpenMPRuntime::emitUserDefinedMapper.
Original commit message:
This is needed for the pointer authentication work we plan to do in the
near future.
a63a81bd99/clang/docs/PointerAuthentication.rst