Instead of having separate implementations for RV32 and RV64,
use the triple to control the Is64Bit parameter.
Do the same for isValidTuneCPUName, fillValidCPUList, and
fillValidTuneCPUList.
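A minimal sketch of the idea, assuming a reduced CPU table: the `CpuInfo` struct, the `KnownCpus` table, and `is64BitTriple` are hypothetical names for illustration and are not the actual LLVM data structures. It only shows how a single lookup can serve both RV32 and RV64 once Is64Bit is derived from the triple.

```cpp
#include <string_view>

// Hypothetical stand-in for a CPU table entry; not LLVM's real data structure.
struct CpuInfo {
  std::string_view Name;
  bool Is64Bit;
};

// Illustrative table; the entries are assumptions made for this sketch.
inline constexpr CpuInfo KnownCpus[] = {
    {"generic-rv32", false},
    {"generic-rv64", true},
};

// Derive Is64Bit from the architecture component of the triple.
inline bool is64BitTriple(std::string_view Triple) {
  return Triple.substr(0, Triple.find('-')) == "riscv64";
}

// One implementation serves both RV32 and RV64.
inline bool isValidCPUName(std::string_view Triple, std::string_view Name) {
  const bool Is64Bit = is64BitTriple(Triple);
  for (const CpuInfo &Cpu : KnownCpus)
    if (Cpu.Name == Name && Cpu.Is64Bit == Is64Bit)
      return true;
  return false;
}
```

The same pattern would apply to the tune-CPU and list-filling helpers: each takes the triple and computes Is64Bit internally instead of existing in two copies.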
The __ARM_FEATURE_SVE_VECTOR_OPERATORS macro should be changed to
indicate that this feature is now supported on VLA vectors as well as
VLS vectors. There is a complementary PR to the ACLE spec here:
https://github.com/ARM-software/acle/pull/213
Reviewed By: peterwaller-arm
Differential Revision: https://reviews.llvm.org/D131573
We would like to make the ACLE NEON and SVE intrinsics more usable by
gating them on the target rather than on ifdef preprocessor macros. For
this to work, the types they use need to be available. This patch makes
__bf16 always available under AArch64, not just when the bf16
architecture feature is present, which brings it in line with GCC. In
subsequent patches the NEON bfloat16x8_t and SVE svbfloat16_t types
(along with bfloat16_t used in arm_sve.h) will be made unconditional
too.
The operations valid on the types are still very limited. They can be
used as a storage type, but the intrinsics used for conversions are
still behind an ifdef guard in arm_neon.h/arm_bf16.h.
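To illustrate what "storage type" means here, the following portable sketch models a 16-bit storage-only value and performs the conversions in plain software. It deliberately uses `std::uint16_t` instead of __bf16 so that it builds on any target. The `Bf16Storage` type and both helper names are hypothetical, and the truncating conversion is an assumption chosen for simplicity, not the rounding behavior of the guarded intrinsics.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for a storage-only 16-bit type (hypothetical; not __bf16 itself).
struct Bf16Storage {
  std::uint16_t Bits;
};

// Truncating float -> bfloat16: keep the upper 16 bits of the binary32 pattern.
inline Bf16Storage truncateToBf16(float Value) {
  std::uint32_t Raw;
  std::memcpy(&Raw, &Value, sizeof(Raw));
  return Bf16Storage{static_cast<std::uint16_t>(Raw >> 16)};
}

// bfloat16 -> float is exact: the low 16 bits of the result are zero.
inline float widenToFloat(Bf16Storage Value) {
  const std::uint32_t Raw = static_cast<std::uint32_t>(Value.Bits) << 16;
  float Result;
  std::memcpy(&Result, &Raw, sizeof(Result));
  return Result;
}
```

Values held this way can be copied, loaded, and stored freely; arithmetic has to go through a wider type.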
Differential Revision: https://reviews.llvm.org/D130973
The SystemZ ABI says that 128-bit integers should be aligned to only 8 bytes.
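The effect on struct layout can be worked out with a small calculation. This is a sketch under stated assumptions: the layout `struct { char C; __int128 X; }` is a hypothetical example, and the helper names are not taken from LLVM.

```cpp
#include <cstddef>

// Round Offset up to the next multiple of Align (Align must be a power of two).
constexpr std::size_t alignTo(std::size_t Offset, std::size_t Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// Offset of a 16-byte member that follows a single leading `char`
// (hypothetical layout: struct { char C; __int128 X; }).
constexpr std::size_t int128OffsetAfterChar(std::size_t Int128Align) {
  return alignTo(1, Int128Align);
}

// Total struct size: end of the member rounded up to the struct alignment.
constexpr std::size_t structSize(std::size_t Int128Align) {
  return alignTo(int128OffsetAfterChar(Int128Align) + 16, Int128Align);
}
```

With an 8-byte alignment the member lands at offset 8 and the struct occupies 24 bytes; with a 16-byte alignment it would land at offset 16 and the struct would occupy 32 bytes.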
Reviewed By: Ulrich Weigand, Nikita Popov
Differential Revision: https://reviews.llvm.org/D130900
Quadword lock-free atomics are supported on AIX, but they cannot be enabled by default. Users on AIX may be using a libatomic that is lock-based for quadword types; if new code and existing code access the same shared atomic quadword variable, atomicity cannot be guaranteed. This patch therefore adds an option to enable quadword lock-free atomics on AIX, so that a quadword lock-free libatomic can be built (also useful for advanced users who consider atomic performance critical) and the transition stays smooth for users.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D127189
This patch implements the recently ratified Zmmul extension, a subextension
of M (Integer Multiplication and Division) consisting only of its
multiplication part.
Differential Revision: https://reviews.llvm.org/D103313
Reviewed By: craig.topper, jrtc27, asb
Add support for the RDPRU instruction on Zen2 processors.
User-facing features:
- Clang option -m[no-]rdpru to enable/disable the feature
- Support is implicit for znver2/znver3 processors
- Preprocessor symbol __RDPRU__ to indicate support
- Header rdpruintrin.h to define intrinsics
- "rdpru" mnemonic supported for assembler code
Internal features:
- Clang builtin __builtin_ia32_rdpru
- IR intrinsic @llvm.x86.rdpru
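A hedged sketch of how the preprocessor symbol and the builtin might be combined so that the same source still builds when the feature is off. The constant and function names are local to this sketch. The register ids (0 for MPERF, 1 for APERF) follow the RDPRU instruction definition, and the builtin is assumed to take the register id as an `int` and return a 64-bit counter value.

```cpp
#include <cstdint>

// Register ids taken by the RDPRU instruction (0 = MPERF, 1 = APERF).
// The constant names are local to this sketch.
constexpr int kRdpruMperf = 0;
constexpr int kRdpruAperf = 1;

// True when the translation unit was compiled with RDPRU support.
constexpr bool rdpruEnabled() {
#if defined(__RDPRU__)
  return true;
#else
  return false;
#endif
}

// Read the selected counter, or return 0 when RDPRU is not enabled.
inline std::uint64_t readCounterOrZero(int RegId) {
#if defined(__RDPRU__)
  return __builtin_ia32_rdpru(RegId);
#else
  (void)RegId;
  return 0;
#endif
}
```

Building with -mrdpru, or for znver2/znver3, would select the first branch; any other configuration falls back to the constant.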
Differential Revision: https://reviews.llvm.org/D128934
HLSL supports the half type.
When enable-16bit-types is not set, half is treated as float.
When enable-16bit-types is set, half is treated as a real 16-bit float type and maps to the LLVM half type.
Also change CXXABI to Microsoft to match dxc behavior.
The mangled name for half is "$f16@" when half is treated as a native half type and "$halff@" when it is treated as float.
In the AST, half is still half.
The special handling happens in clang CodeGen: when NativeHalfType is false, half is translated into float.
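The two rules above can be summarized as a pair of lookups keyed on NativeHalfType. This is only an illustrative sketch; both function names are hypothetical and do not correspond to clang's mangler or CodeGen entry points.

```cpp
#include <string_view>

// Hypothetical helper mirroring the mangling rule described above.
constexpr std::string_view mangledHalfName(bool NativeHalfType) {
  return NativeHalfType ? "$f16@" : "$halff@";
}

// Name of the LLVM IR type that `half` is lowered to during codegen.
constexpr std::string_view loweredHalfType(bool NativeHalfType) {
  return NativeHalfType ? "half" : "float";
}
```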
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D124790
Adding half float to types that can be represented by __attribute__((mode(xx))).
Original implementation authored by George Steed.
Differential Revision: https://reviews.llvm.org/D126479
For the amdgpu target, the long double type is the same as the double type.
The width and alignment of the long double type were incorrectly
overridden when copying aux target properties, which
caused an assertion failure in codegen when emitting global
variables of long double type.
This patch fixes that by saving and restoring the width
and alignment of the long double type.
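The save-and-restore pattern can be sketched as follows. The `TargetProps` struct and its fields are a hypothetical, reduced model and not clang's TargetInfo; the sketch only shows why the two long double properties survive the copy.

```cpp
// Hypothetical, reduced model of target properties; field names are assumptions.
struct TargetProps {
  unsigned LongDoubleWidth;
  unsigned LongDoubleAlign;
  unsigned IntWidth;
};

// Copy properties from the aux (host) target, but keep the device's own
// long double width and alignment by saving them first and restoring them after.
inline void copyAuxTarget(TargetProps &Device, const TargetProps &Aux) {
  const unsigned SavedWidth = Device.LongDoubleWidth;
  const unsigned SavedAlign = Device.LongDoubleAlign;
  Device = Aux; // overrides everything, including the long double properties
  Device.LongDoubleWidth = SavedWidth;
  Device.LongDoubleAlign = SavedAlign;
}
```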
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D127771
Fixes: SWDEV-335515
Each arch or CPU has default FPU features and versions, such as fpuv2_sf/fpuv3_sf.
The -mfpu option can also be used to specify and override the FPU version and features.
For example, C860 has the fpuv3_sf/fpuv3_df features by default; when
-mfpu=fpv2 is given, fpuv3_sf/fpuv3_df are replaced with fpuv2_sf/fpuv2_df.
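A simplified sketch of the replacement described in the example. It assumes that overriding amounts to swapping a feature-name prefix, which is a simplification for illustration; the function name is hypothetical and the real driver logic may map options to features differently.

```cpp
#include <string>
#include <vector>

// Replace every feature that starts with FromPrefix by the same feature
// with ToPrefix instead. Hypothetical helper, not the driver's actual code.
inline std::vector<std::string>
overrideFpuVersion(std::vector<std::string> Features,
                   const std::string &FromPrefix, const std::string &ToPrefix) {
  for (std::string &Feature : Features)
    if (Feature.compare(0, FromPrefix.size(), FromPrefix) == 0)
      Feature.replace(0, FromPrefix.size(), ToPrefix);
  return Features;
}
```

Applied to the C860 defaults with the prefixes "fpuv3_" and "fpuv2_", this turns fpuv3_sf/fpuv3_df into fpuv2_sf/fpuv2_df.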
Allows emitting `define amdgpu_kernel void @func()` IR from C or C++.
This replaces the current workflow, which is to write a stub in OpenCL that
calls an external C function implemented in C++, combined through llvm-link.
Calling the resulting function still requires a manual implementation of the
ABI from the host side. The primary application is for more rapid debugging
of the amdgpu backend by permuting a C or C++ test file instead of manually
updating an IR file.
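A minimal sketch of what such a test file might look like. It assumes the attribute is spelled `__attribute__((amdgpu_kernel))` and that `__AMDGPU__` is predefined when targeting the amdgpu backend; the `SKETCH_KERNEL` macro and the `scale` function are hypothetical names. The guard keeps the file buildable for a host target as well.

```cpp
// Expands to the kernel calling-convention attribute only when targeting
// AMDGPU, so the same file still builds for a host target. The macro name
// is local to this sketch.
#if defined(__AMDGPU__)
#define SKETCH_KERNEL __attribute__((amdgpu_kernel))
#else
#define SKETCH_KERNEL
#endif

// With the attribute active, this is expected to be emitted with the
// amdgpu_kernel calling convention in the IR.
extern "C" SKETCH_KERNEL void scale(float *Data, float Factor) {
  Data[0] *= Factor;
}
```

As the text notes, launching the resulting kernel from the host still requires implementing the ABI by hand.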
The implementation closely follows D54425. The non-AMD reviewers are taken from there.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D125970
This patch implements the following floating point negative absolute value
builtins that are required for compatibility with the XL compiler:
```
double __fnabs(double);
float __fnabss(float);
```
These builtins will emit:
- fnabs on PWR6 and below, or if VSX is disabled.
- xsnabsdp on PWR7 and above, if VSX is enabled.
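For reference, the value these builtins compute is the negated absolute value of the argument. The portable functions below are a sketch of those semantics only; the names are hypothetical, and the code does not call the builtins or select either instruction.

```cpp
#include <cmath>

// Portable reference for the value computed by __fnabs: -|x|.
inline double referenceFnabs(double X) { return -std::fabs(X); }

// Single-precision counterpart, matching __fnabss.
inline float referenceFnabss(float X) { return -std::fabs(X); }
```

The result always has its sign bit set, including for a zero input.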
Differential Revision: https://reviews.llvm.org/D125506
Emit predefined macros for the GPU family, e.g.
for GPU gfx9xx emit __GFX9__, etc.
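A short sketch of how code might branch on such a family macro. Only __GFX9__ is checked because it is the one example named above; the function name and the returned strings are hypothetical.

```cpp
#include <string_view>

// Returns a label for the GPU family this translation unit is compiled for.
// Only __GFX9__ is checked because it is the one macro the text names.
constexpr std::string_view gpuFamily() {
#if defined(__GFX9__)
  return "gfx9";
#else
  return "other";
#endif
}
```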
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D125909
Currently we define the `__CUDA_ARCH__` macro only in CUDA mode. This
patch allows us to use this macro in OpenMP-offloading mode when
targeting NVPTX.
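A minimal sketch of code that can rely on the macro. The function name is hypothetical; in an OpenMP-offloading build it would sit inside a declare target region, which is omitted here so the snippet also builds as plain C++.

```cpp
// Returns the value of __CUDA_ARCH__ when it is defined for the current
// compilation, or 0 otherwise (for example, on the host side).
constexpr int cudaArchOrZero() {
#if defined(__CUDA_ARCH__)
  return __CUDA_ARCH__;
#else
  return 0;
#endif
}
```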
Reviewed By: tra, tianshilei1992
Differential Revision: https://reviews.llvm.org/D125256
This patch turns on support for CR bit accesses for Power8 and above. CR bits
are turned on by default for Power8 and above because
later architectures make use of builtins and instructions that require CR bit
accesses (such as the use of setbc in the vector string isolate predicate
and bcd builtins on Power10).
This patch also adds the clang portion that allows turning on CR bits in the
front end if the user so desires.
Differential Revision: https://reviews.llvm.org/D124060
Clang emits the wrong code sequence for `int16` (`short`) to `__fp16` conversion.
Fixing the code generation directly would be the right approach, I think,
but there is a FIXME comment in clang/Basic/TargetInfo.h saying that
this hook should be removed in the future, so I think switching to the
generic LLVM IR code generation path instead of the llvm.convert.to.fp16
intrinsics is enough.
```
/// Check whether llvm intrinsics such as llvm.convert.to.fp16 should be used
/// to convert to and from __fp16.
/// FIXME: This function should be removed once all targets stop using the
/// conversion intrinsics.
virtual bool useFP16ConversionIntrinsics() const {
  return true;
}
```
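The switch can be pictured as a target overriding this hook to return false. The sketch below uses reduced stand-in classes; `TargetInfoSketch`, `GenericIRTargetSketch`, and `fp16ConversionPath` are hypothetical names, and only the hook itself comes from the quoted header.

```cpp
#include <string_view>

// Reduced stand-in for clang's TargetInfo, keeping only the hook quoted above.
struct TargetInfoSketch {
  virtual ~TargetInfoSketch() = default;
  virtual bool useFP16ConversionIntrinsics() const { return true; }
};

// Hypothetical target that opts out of the conversion intrinsics.
struct GenericIRTargetSketch : TargetInfoSketch {
  bool useFP16ConversionIntrinsics() const override { return false; }
};

// Describes which code generation path a target would take.
inline std::string_view fp16ConversionPath(const TargetInfoSketch &Target) {
  return Target.useFP16ConversionIntrinsics() ? "llvm.convert.to.fp16"
                                               : "generic IR";
}
```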
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D124509
The recently announced IBM z16 processor implements the architecture
already supported as "arch14" in LLVM. This patch adds support for
"z16" as an alternate architecture name for arch14.
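The aliasing can be sketched as a name normalization step. The function below is a hypothetical illustration, not the actual lookup table used by LLVM.

```cpp
#include <string_view>

// Map an alternate architecture name to its canonical form; any other
// name is returned unchanged. Hypothetical helper for illustration.
constexpr std::string_view canonicalArchName(std::string_view Name) {
  return Name == "z16" ? std::string_view("arch14") : Name;
}
```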