As suggested by @rsmith on PR47267, by replacing the builtin_memcpy bitcast pattern with builtin_bit_cast we can use _castf32_u32, _castu32_f32, _castf64_u64 and _castu64_f64 inside constant expresssions (constexpr). Although __builtin_bit_cast was added for c++20 it works on all clang c/c++ modes.
Differential Revision: https://reviews.llvm.org/D86398
This is a first step patch to enable constexpr support and testing to a large number of x86 intrinsics.
All I've done here is provide a DEFAULT_FN_ATTRS_CONSTEXPR variant to our existing DEFAULT_FN_ATTRS tag approach that adds constexpr on c++ builds. The clang cuda headers do something similar.
I've started with POPCNT mainly as its tiny and are wrappers to generic __builtin_* intrinsics which already act as constexpr.
Differential Revision: https://reviews.llvm.org/D86229
This patch implements the vec_extractm function prototypes in altivec.h in
order to utilize the vector extract with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82675
This patch implements the builtins for the vector shifts (shl, srl, sra), and
adds the appropriate test cases for these builtins. The builtins utilize the
vector shift instructions introduced within ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83338
Automation to detect compiler features, such as CMake's target_compile_features,
would attempt to detect compiler features by explicitly using langugage flags.
This change ensures that the HIP headers would still work with C++98.
Patch by Siu Chi Chan
Differential Revision: https://reviews.llvm.org/D85471
Change-Id: I304e964b18a525b0fde55efd841da74b6c4dc8ed
This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D84622
47f7174ffa changed the types used in the Wasm SIMD builtin functions,
but not all of their uses in wasm_simd128.h were updated. This commit
fixes wasm_simd128.h and adds tests to make sure similar problems do
not pass uncaught in the future.
Differential Revision: https://reviews.llvm.org/D85347
Normally math functions are forwarded to __nv_* counterparts provided by CUDA's
libdevice bitcode. However, __nv_rint*()/__nv_nearbyint*() functions there have
a bug -- they use round() which rounds *up* instead of rounding towards the
nearest integer, so we end up with rint(2.5f) producing 3.0 instead of expected
2.0. The broken bitcode is not actually used by NVCC itself, which has both a
work-around in CUDA headers and, in recent versions, uses correct
implementations in NVCC's built-ins.
This patch implements equivalent workaround and directs rint*/nearbyint* to
__builtin_* variants that produce correct results.
Differential Revision: https://reviews.llvm.org/D85236
This allows people to use `int8_t` instead of `char`, -funsigned-char,
and generally decouples SIMD from the specialness of `char`.
And it makes intrinsics like `__builtin_wasm_add_saturate_s_i8x16`
and `__builtin_wasm_add_saturate_u_i8x16` use signed and unsigned
element types, respectively.
Differential Revision: https://reviews.llvm.org/D85074
Power10 introduces new instructions for vector multiply, divide and modulus.
These instructions can be exploited by the builtin functions: vec_mul, vec_div,
and vec_mod, respectively.
This patch aims adds the function prototype, vec_mod, as vec_mul and vec_div
been previously implemented in altivec.h.
This patch also adds the following front end tests:
vec_mul for v2i64
vec_div for v4i32 and v2i64
vec_mod for v4i32 and v2i64
Differential Revision: https://reviews.llvm.org/D82576
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
Differential Revision: https://reviews.llvm.org/D84556
This patch implements the `vec_xst_trunc` function in altivec.h in order to
utilize the Store VSX Vector Rightmost [byte | half | word | doubleword] Indexed
instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82467
Summary:
Make hip math headers easier to use from C
Motivation is a step towards using the hip math headers to implement math.h
for openmp, which needs to work with C as well as C++. NFC for C++ code.
Reviewers: yaxunl, jdoerfert
Reviewed By: yaxunl
Subscribers: sstefan1, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D84476
The old way worked to some degree for C++-mode but in C mode we actually
tried to introduce variants of macros (e.g., isinf). To make both modes
work reliably we get rid of those extra variants and directly use NVIDIA
intrinsics in the complex implementation. While this has to be revisited
as we add other GPU targets which want to reuse the code, it should be
fine for now.
Reviewed By: tra, JonChesterfield, yaxunl
Differential Revision: https://reviews.llvm.org/D83591
Due to recent changes we cannot use OpenMP in CUDA files anymore
(PR45533) as the math handling of CUDA is different when _OPENMP is
defined. We actually want this different behavior only if we are
offloading with OpenMP to NVIDIA, thus generating NVPTX. With this patch
we do not interfere with the CUDA math handling except if we are in
NVPTX offloading mode, as indicated by the presence of __OPENMP_NVPTX__.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D78155
To avoid linkage errors we have to ensure the linkage allows multiple
definitions of these compiler inserted functions. Since they are on the
cold path of complex computations, we want to avoid `inline`. Instead,
we opt for `weak` and `noinline` for now.
This simply follows the scheme we have for other wrappers. It resolves
the current link problem, e.g., `__muldc3 not found`, when std::complex
operations are used on a device.
This will not allow complex make math function calls to work properly,
e.g., sin, but that is more complex (pan intended) anyway.
Reviewed By: tra, JonChesterfield
Differential Revision: https://reviews.llvm.org/D80897
This enables _InterlockedAnd64/_InterlockedOr64/_InterlockedXor64/_InterlockedDecrement64/_InterlockedIncrement64/_InterlockedExchange64/_InterlockedExchangeAdd64/_InterlockedExchangeSub64 on 32-bit Windows
The backend already knows how to expand these to a loop using cmpxchg8b on 32-bit targets.
Fixes PR46595
Differential Revision: https://reviews.llvm.org/D83254
This patch implements builtins for the following prototypes:
unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long)
unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long)
vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long)
vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long)
Differential Revision: https://reviews.llvm.org/D80941
This patch implements builtins for the following prototypes for the VSX Permute
Control Vector Generate with Mask Instructions:
vector unsigned char vec_genpcvm (vector unsigned char, const int);
vector unsigned short vec_genpcvm (vector unsigned short, const int);
vector unsigned int vec_genpcvm (vector unsigned int, const int);
vector unsigned long long vec_genpcvm (vector unsigned long long, const int);
Differential Revision: https://reviews.llvm.org/D81774
This patch implements builtins for the following prototypes:
```
vector signed char vec_clrl (vector signed char a, unsigned int n);
vector unsigned char vec_clrl (vector unsigned char a, unsigned int n);
vector signed char vec_clrr (vector signed char a, unsigned int n);
vector signed char vec_clrr (vector unsigned char a, unsigned int n);
```
Differential Revision: https://reviews.llvm.org/D81707
This patch implements builtins for the following prototypes:
vector unsigned long long vec_pdep(vector unsigned long long, vector unsigned long long);
vector unsigned long long vec_pext(vector unsigned long long, vector unsigned long long __b);
unsigned long long __builtin_pdepd (unsigned long long, unsigned long long);
unsigned long long __builtin_pextd (unsigned long long, unsigned long long);
Revision Depends on D80758
Differential Revision: https://reviews.llvm.org/D80935
Summary:
This instruction was implemented in 3181273be7, but that commit did
not add an intrinsic for it.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81757
Recent change from `#if !defined(__CUDA__)` to `#if !__CUDA__` caused
regression on ROCm 3.5 since there is `#define __CUDA__`
before inclusion of the header file, which causes `#if !__CUDA__`
to be invalid.
Change `#if !__CUDA__` back to `#if !defined(__CUDA__)` for backward
compatibility.
To support std::complex and some other standard C/C++ functions in HIP device code,
they need to be forced to be __host__ __device__ functions by pragmas. This is done
by some clang standard C++ wrapper headers which are shared between cuda-clang and hip-Clang.
For these standard C++ wapper headers to work properly, specific include path order
has to be enforced:
clang C++ wrapper include path
standard C++ include path
clang include path
Also, these C++ wrapper headers require device version of some standard C/C++ functions
must be declared before including them. This needs to be done by including a default
header which declares or defines these device functions. The default header is always
included before any other headers are included by users.
This patch adds the the default header and include path for HIP.
Differential Revision: https://reviews.llvm.org/D81176
Added extensions and their function declarations into
the standard header.
Patch by Piotr Fusik!
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79781
Summary:
The shuffle intrinsic macros did not parenthesize usages of their
constant parameters, which could lead to incorrect results due to
operator precedence issues. This patch fixes the problem by adding the
missing paretheses.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80968
There are 65 that take a scalar shift amount. Intel documentation shows 60 of them taking unsigned int. There are 5 versions of srli_epi16 that use int, the 512-bit maskz and 128/256 mask/maskz.
Fixes PR45931
Differential Revision: https://reviews.llvm.org/D80251
Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80174
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET
Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith
Reviewed By: LuoYuanke
Subscribers: cfe-commits, mgorny
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79617
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET
Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith
Reviewed By: LuoYuanke
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D79617
Summary:
Move instructions that have recently been implemented in V8 from the
`unimplemented-simd128` target feature to the `simd128` target
feature. The updated instructions match the update at
https://github.com/WebAssembly/simd/pull/223.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79973
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79742
Summary:
Although using `__builtin_shufflevector` and the `shufflevector`
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D66983
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79224
Some intrinsics in vecintrin.h are currently implemented by
performing address arithmetic in __INTPTR_TYPE__ and converting
the result to some pointer type. While this works correctly,
it leads to suboptimal code generation since many optimizers
cannot trace the provenance of the resulting pointers.
Fixed by using "char *" pointer arithmetic instead.
System headers should avoid using the "vector" and "bool" keywords
since those might be redefined by user code. For example, using
<stdbool.h> before <vecintrin.h> will currently lead to compiler
errors.
Fixed by using the reserved "__vector" and "__bool" keywords
instead. NFC otherwise.
If we are in C++ mode and include <math.h> (not <cmath>) first, we still
need to make sure <cmath> is read first. The problem otherwise is that
we haven't seen the declarations of the math.h functions when the system
math.h includes our cmath overlay. However, our cmath overlay, or better
the underlying overlay, e.g. CUDA, uses the math.h functions. Since we
haven't declared them yet we get errors. CUDA avoids this by eagerly
declaring all math functions (in the __device__ space) but we cannot do
this. Instead we break the dependence by forcing cmath to go first.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D77774
For OpenMP target regions to piggy back on the CUDA/AMDGPU/... implementation of math functions,
we include the appropriate definitions inside of an `omp begin/end declare variant match(device={arch(nvptx)})` scope.
This way, the vendor specific math functions will become specialized versions of the system math functions.
When a system math function is called and specialized version is available the selection logic introduced in D75779
instead call the specialized version. In contrast to the code path we used so far, the system header is actually included.
This means functions without specialized versions are available and so are macro definitions.
This should address PR42061, PR42798, and PR42799.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D75788
Warnings in emmintrin.h and xmmintrin.h are reported by
-fsanitize=implicit-integer-sign-change.
Reviewed By: RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D77393
In d1705c1196 (D77238) we accidentally included subsequent changes and
did not only move the code into a new file (which was the intention).
We undo the changes now and re-introduce them with the appropriate test
changes later.
This is not supported to change anything but allow us to reuse the math
functions separately from the device functions, e.g., source them at
different times. This will be used by the OpenMP overlay.
This also adds two `return` keywords that were missing.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D77238
Summary:
The convention for the wasm_simd128.h intrinsics is to have the
integer sign in the lane interpretation rather than as a suffix. This
PR changes the names of the integer min, max, and avgr intrinsics to
match this convention.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77185
Summary:
As the WebAssembly SIMD proposal nears stabilization, there is desire
to use it with toolchains other than Emscripten. Moving the intrinsics
header to clang will make it available to WASI toolchains as well.
Reviewers: aheejin, sunfish
Subscribers: dschuff, mgorny, sbc100, jgravelle-google, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76959
Summary:
- Even though the bindless surface/texture interfaces are promoted,
there are still code using surface/texture references. For example,
[PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the
compilation issue for code using `tex2D` with texture references. For
better compatibility, this patch proposes the support of
surface/texture references.
- Due to the absent documentation and magic headers, it's believed that
`nvcc` does use builtins for texture support. From the limited NVVM
documentation[^nvvm] and NVPTX backend texture/surface related
tests[^test], it's believed that surface/texture references are
supported by replacing their reference types, which are annotated with
`device_builtin_surface_type`/`device_builtin_texture_type`, with the
corresponding handle-like object types, `cudaSurfaceObject_t` or
`cudaTextureObject_t`, in the device-side compilation. On the host
side, that global handle variables are registered and will be
established and updated later when corresponding binding/unbinding
APIs are called[^bind]. Surface/texture references are most like
device global variables but represented in different types on the host
and device sides.
- In this patch, the following changes are proposed to support that
behavior:
+ Refine `device_builtin_surface_type` and
`device_builtin_texture_type` attributes to be applied on `Type`
decl only to check whether a variable is of the surface/texture
reference type.
+ Add hooks in code generation to replace that reference types with
the correponding object types as well as all accesses to them. In
particular, `nvvm.texsurf.handle.internal` should be used to load
object handles from global reference variables[^texsurf] as well as
metadata annotations.
+ Generate host-side registration with proper template argument
parsing.
---
[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 ``Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).
[^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later.
Reviewers: tra, rjmccall, yaxunl, a.sidorin
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76365
This patch adds 'q' to mean 'scalable vector' in the builtin
type string, and for SVE will return the matching builtin
type as defined in the C/C++ language extensions for SVE.
This patch also adds some scaffolding to generate the arm_sve.h
header file, and some builtin definitions (+CodeGen) to be able
to implement some simple masked load intrinsics that use the
ACLE types, such as:
svint8_t test_svld1_s8(svbool_t pg, const int8_t *base) {
return svld1_s8(pg, base);
}
Reviewers: efriedma, rjmccall, rovka, rsandifo-arm, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75298
Summary:
This patch generalizes the existing code to support CDE intrinsics
which will share some properties with existing MVE intrinsics
(some of the intrinsics will be polymorphic and accept/return values
of MVE vector types).
Specifically the patch:
* Adds new tablegen backends -gen-arm-cde-builtin-def,
-gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema,
-gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on
existing MVE backends.
* Renames the '__clang_arm_mve_alias' attribute into
'__clang_arm_builtin_alias' (it will be used with CDE intrinsics as
well as MVE intrinsics)
* Implements semantic checks for the coprocessor argument of the CDE
intrinsics as well as the existing coprocessor intrinsics.
* Adds one CDE intrinsic __arm_cx1 to test the above changes
Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen
Reviewed By: simon_tatham
Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75850
arm_acle.h relied on `_MSC_VER` to determine if a given function was
already defined as a builtin. This was incorrect because
`-fms-extensions` enables these builtins, but is not responsible for
defining `_MSC_VER` on any target. The next closest thing is
`_MSC_EXTENSIONS`, which is only defined on Windows targets, but even
this is suboptimal. What this conditional is actually trying to
determine is if the given functions are defined as builtins, so just
check that directly.
I also attempted to do this for `__nop`, but in that case intrin.h,
which is only includable if `_MSC_VER` is defined, has its own
definition. So in that case `_MSC_VER` is correct.
Differential Revision: https://reviews.llvm.org/D75719
rdar://60102353
These declarations use a mix of unsigned and signed argument and
return types. This is not in accordance with OpenCL v2.0 s6.13.11.
Differential Revision: https://reviews.llvm.org/D74910
New intrinisics are implemented for when we need to port SIMD code from other
arhitectures and only load or store portions of MSA registers.
Following intriniscs are added which only load/store element 0 of a vector:
v4i32 __builtin_msa_ldrq_w (const void *, imm_n2048_2044);
v2i64 __builtin_msa_ldr_d (const void *, imm_n4096_4088);
void __builtin_msa_strq_w (v4i32, void *, imm_n2048_2044);
void __builtin_msa_str_d (v2i64, void *, imm_n4096_4088);
Differential Revision: https://reviews.llvm.org/D73644
Summary:
To use new/delete in NVPTX code we need to define them. Implementation
copied from CUDA wrappers.
Reviewers: hfinkel, jdoerfert
Subscribers: mgorny, guansong, kkwli0, caomhin, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73128
This makes clang somewhat forward-compatible with new CUDA releases
without having to patch it for every minor release without adding
any new function.
If an unknown version is found, clang issues a warning (can be disabled
with -Wno-cuda-unknown-version) and assumes that it has detected
the latest known version. CUDA releases are usually supersets
of older ones feature-wise, so it should be sufficient to keep
released clang versions working with minor CUDA updates without
having to upgrade clang, too.
Differential Revision: https://reviews.llvm.org/D73231
Use floating-point instead of integer zero constants to avoid
creating implicit conversions, which currently cause suboptimal
code to be generated with -ffp-exception-behavior=strict.
NFC otherwise.
Enabling `-Wcast-qual` identified many casts in various system headers
that were dropping the `const` qualifier. Fixing those missing
qualifiers pointed out that a few of the definitions of the builtins
did not properly identify their arguments as `const` pointers. This
commit fixes those builtin definitions, and the system header files
so that they no longer drop the qualifier.
Differential Revision: https://reviews.llvm.org/D71718
The forward declaration had a cdecl calling convention, but the
inline version did not. This leads to a conflict if the default
calling convention is not cdecl. Fix this by just removing the
forward declaration.
Fixes PR41503
We need to use a 64-bit type in 64-bit mode so a 64-bit register
will get used in the generated assembly. I've also changed the
constraints to just use "r" intead of "q". "q" forces to a only
an a/b/c/d register in 32-bit mode, but I see no reason that
would matter here.
Fixes Nico's note in PR19301 over 4 years ago.
Differential Revision: https://reviews.llvm.org/D70101
As we currently have it implemented in altivec.h, the offsets for these two
intrinsics are element offsets. The documentation in the ABI (as well as the
implementation in both XL and GCC) states that these should be byte offsets.
Differential revision: https://reviews.llvm.org/D63636
We currently emit a double precision comparison instruction for this, whereas we
need to emit the single precision version.
Differential revision: https://reviews.llvm.org/D64024
Summary:
Writing support for three ACLE functions:
unsigned int __cls(uint32_t x)
unsigned int __clsl(unsigned long x)
unsigned int __clsll(uint64_t x)
CLS stands for "Count number of leading sign bits".
In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).
Reviewers: compnerd
Reviewed By: compnerd
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69250
This commit sets up the infrastructure for auto-generating <arm_mve.h>
and doing clang-side code generation for the builtins it relies on,
and demonstrates that it works by implementing a representative sample
of the ACLE intrinsics, more or less matching the ones introduced in
LLVM IR by D67158,D68699,D68700.
Like NEON, that header file will provide a set of vector types like
uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON,
the ACLE spec for <arm_mve.h> includes a polymorphism system, so that
you can write plain vaddq() and disambiguate by the vector types you
pass to it.
Unlike the corresponding NEON code, I've arranged to make every user-
facing ACLE intrinsic into a clang builtin, and implement all the code
generation inside clang. So <arm_mve.h> itself contains nothing but
typedefs and function declarations, with the latter all using the new
`__attribute__((__clang_builtin))` system to arrange that the user-
facing function names correspond to the right internal BuiltinIDs.
So the new MveEmitter tablegen system specifies the full sequence of
IRBuilder operations that each user-facing ACLE intrinsic should
translate into. Where possible, the ACLE intrinsics map to standard IR
operations such as vector-typed `add` and `fadd`; where no standard
representation exists, I call down to the sample IR intrinsics
introduced in an earlier commit.
Doing it like this means that you get the polymorphism for free just
by using __attribute__((overloadable)): the clang overload resolution
decides which function declaration is the relevant one, and _then_ its
BuiltinID is looked up, so by the time we're doing code generation,
that's all been resolved by the standard system. It also means that
you get really nice error messages if the user passes the wrong
combination of types: clang will show the declarations from the header
file and explain why each one doesn't match.
(The obvious alternative approach would be to have wrapper functions
in <arm_mve.h> which pass their arguments to the underlying builtins.
But that doesn't work in the case where one of the arguments has to be
a constant integer: the wrapper function can't pass the constantness
through. So you'd have to do that case using a macro instead, and then
use C11 `_Generic` to handle the polymorphism. Then you have to add
horrible workarounds because `_Generic` requires even the untaken
branches to type-check successfully, and //then// if the user gets the
types wrong, the error message is totally unreadable!)
Reviewers: dmgreen, miyuki, ostannard
Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D67161
These intrinsics use llvm.cttz intrinsics so are always available
even without the bmi feature. We already don't check for the bmi
feature on the intrinsics themselves. But we were blocking the
include of the header file with _MSC_VER unless BMI was enabled
on the command line.
Fixes PR30506.
llvm-svn: 374516
Summary:
This is similar to vec_ct* in https://reviews.llvm.org/rL304205.
The argument must be a constant, otherwise instruction selection
will fail. always_inline is not enough for isel to always fold
everything away at -O0.
The fix is to turn the function into macros in altivec.h.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43072
Reviewers: nemanjai, hfinkel, #powerpc, wuzish
Reviewed By: #powerpc, wuzish
Subscribers: wuzish, kbarton, MaskRay, shchenz, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D66699
llvm-svn: 370902
vote.ballot instruction is gone in recent CUDA versions and
vote.sync.ballot can not be used because it needs a thread mask parameter.
Fortunately PTX 6.2 (introduced with CUDA-9.2) provides activemask.b32
instruction for this.
Differential Revision: https://reviews.llvm.org/D66665
llvm-svn: 370792
In C++ mode we should only avoid adding __OPENCL_C_VERSION__,
all other predefined macros about the language mode are still
valid.
This change also fixes the language version check in the
headers accordingly.
Differential Revision: https://reviews.llvm.org/D65941
llvm-svn: 368552
Port existing headers which include x86 intrinsics implementation to
PowerPC platform (using Altivec), along with tests. Also, tests about
including these intrinsic headers are combined.
The headers are mainly developed by Steven Munroe, with contributions
from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D65630
llvm-svn: 368392
Move the platform check out of PPC Linux toolchain code and add platform guards
to the intrinsic headers, since they are supported currently only on 64-bit
PowerPC targets.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D64849
llvm-svn: 367281
Defining CLK_NULL_EVENT with a `(void*)` cast has the (unintended?)
side-effect that the address space will be fixed (as generic in OpenCL
2.0 mode). The consequence is that any target specific address space
for the clk_event_t type will not be applied.
It is not clear why the void pointer cast was needed in the first
place, and it seems we can do without it.
Differential Revision: https://reviews.llvm.org/D63876
llvm-svn: 366546
Remove dependency of malloc in implementation of mm_malloc function in PowerPC
intrinsics and alignment assumption on glibc.
Reviewed By: Hal Finkel
Differential Revision: https://reviews.llvm.org/D64850
llvm-svn: 366406
The jcvt intrinsic defined in ACLE [1] is available when ARM_FEATURE_JCVT is defined.
This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).
make check-all didn't show any new failures.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
Differential Revision: https://reviews.llvm.org/D64495
llvm-svn: 366197
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10303.
Note: No currently available Z system supports the arch13
architecture. Once new systems become available, the
official system name will be added as supported -march name.
llvm-svn: 365933
We accidentally lost the ATOMIC_VAR_INIT and ATOMIC_FLAG_INIT macros
in r363794.
Also put the `memory_order` typedef back inside a `>= CL2.0` guard.
llvm-svn: 364174
Add overloads with generic address space pointer to old atomics.
This is currently only added for C++ compilation mode.
Differential Revision: https://reviews.llvm.org/D62335
llvm-svn: 364071
These intrinsics should always take an immediate for the rounding mode.
The base instruction comes from before EVEX embdedded rounding. The
user should always provide the immediate rather than us assuming
CUR_DIRECTION.
Make the 512-bit versions also explicit aliases instead of copy
pasting the code.
llvm-svn: 363961
Summary:
AIX system headers need stdint.h and inttypes.h to be re-enterable when macro _STD_TYPES_T is defined so that limit macro definitions such as UINT32_MAX can be found. This patch attempts to allow that on AIX.
Reviewers: hubert.reinterpretcast, jasonliu, mclow.lists, EricWF
Reviewed by: hubert.reinterpretcast, mclow.lists
Subscribers: jfb, jsji, christof, cfe-commits, libcxx-commits, llvm-commits
Tags: #LLVM, #clang, #libc++
Differential Revision: https://reviews.llvm.org/D59253
llvm-svn: 363939
Using the -fdeclare-opencl-builtins option will require a way to
predefine types and macros such as `int4`, `CLK_GLOBAL_MEM_FENCE`,
etc. Move these out of opencl-c.h into opencl-c-base.h such that the
latter can be shared by -fdeclare-opencl-builtins and
-finclude-default-header.
This changes the behaviour of -finclude-default-header when
-fdeclare-opencl-builtins is specified: instead of including the full
header, it will include the header with only the base definitions.
Differential revision: https://reviews.llvm.org/D63256
llvm-svn: 363794
Port emmintrin.h which include Intel SSE2 intrinsics implementation to PowerPC platform (using Altivec).
The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
It's a follow-up patch of D62121.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Differential Revision: https://reviews.llvm.org/D62569
llvm-svn: 363122
Summary:
Remove unnecessary definition (otherwise the extension will be defined
where it's not supposed to be defined).
Consider the code:
#pragma OPENCL EXTENSION cl_intel_planar_yuv : begin
// some declarations
#pragma OPENCL EXTENSION cl_intel_planar_yuv : end
is enough for extension to become known for clang.
Patch by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Reviewers: Anastasia, yaxunl
Reviewed By: Anastasia
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58666
llvm-svn: 362398
Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec).
The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D62121
llvm-svn: 362190
Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec).
The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D62121
llvm-svn: 361928
Summary: When including the random header in C++, some of the math functions it relies on are not present in the CUDA headers. We include this variants in this case.
Reviewers: jdoerfert, hfinkel, tra, caomhin
Reviewed By: tra
Subscribers: efriedma, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D62046
llvm-svn: 361066
Previously we were doing this so that the 256 bit selectw builtin could be used in the implementation of the 512->256 bit conversion intrinsic.
After this commit we now use a masked convert builtin that will emit the intrinsic call and the 256-bit select from custom code in CGBuiltin. Then the header only needs to call that one intrinsic.
llvm-svn: 360924
Summary:
This is a fix for the reported bug:
[[ https://bugs.llvm.org/show_bug.cgi?id=41861 | 41861 ]]
abs functions need to be moved under the c++ macro to avoid conflicts with included headers.
Reviewers: tra, jdoerfert, hfinkel, ABataev, caomhin
Reviewed By: jdoerfert
Subscribers: guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61959
llvm-svn: 360809
Summary: In OpenMP device offloading we must ensure that unde C++ 17, the inclusion of cstdlib will works correctly.
Reviewers: ABataev, tra, jdoerfert, hfinkel, caomhin
Reviewed By: jdoerfert
Subscribers: Hahnfeld, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61949
llvm-svn: 360804
Currently `immintrin.h` includes `pconfigintrin.h` and `sgxintrin.h`
which contain inline assembly. It causes failures when building with the
flag `-fno-gnu-inline-asm`.
Fix by excluding functions with inline assembly when this extension is
disabled. So far there was no need to support `_pconfig_u32`,
`_enclu_u32`, `_encls_u32`, `_enclv_u32` on platforms that require
`-fno-gnu-inline-asm`. But if developers start using these functions,
they'll have compile-time undeclared identifier errors which is
preferrable to runtime errors.
rdar://problem/49540880
Reviewers: craig.topper, GBuella, rnk, echristo
Reviewed By: rnk
Subscribers: jkorous, dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D61621
llvm-svn: 360630
Summary: This patches fixes an issue in which the __clang_cuda_cmath.h header is being included even when cmath or math.h headers are not included.
Reviewers: jdoerfert, ABataev, hfinkel, caomhin, tra
Reviewed By: tra
Subscribers: tra, mgorny, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61765
llvm-svn: 360626
This reverts r360271 (git commit a0933bd8ec)
There are concerns on the review that this breaks EFI builds and that
the transitive includes (sal.h) are actually heavy enough that we might
care.
llvm-svn: 360291
compatibility. This allows some applications developed with MSVC to
compile with clang without any extra changes.
Fixes: llvm.org/PR40789
Differential Revision: https://reviews.llvm.org/D61646
llvm-svn: 360271
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.
We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.
Authors:
@gtbercea
@jdoerfert
Reviewers: hfinkel, caomhin, ABataev, tra
Reviewed By: hfinkel, ABataev, tra
Subscribers: JDevlieghere, mgorny, guansong, cfe-commits, jdoerfert
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61399
llvm-svn: 360265
This commit appears to be breaking stage-2 builds on GreenDragon. The
OpenMP wrappers for cmath and math.h are copied into the root of the
resource directory and cause a cyclic dependency in module 'Darwin':
Darwin -> std -> Darwin. This blows up when CMake is testing for modules
support and breaks all stage 2 module builds, including the ThinLTO bot
and all LLDB bots.
CMake Error at cmake/modules/HandleLLVMOptions.cmake:497 (message):
LLVM_ENABLE_MODULES is not supported by this compiler
llvm-svn: 360192
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.
We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.
Authors:
@gtbercea
@jdoerfert
Reviewers: hfinkel, caomhin, ABataev, tra
Reviewed By: hfinkel, ABataev, tra
Subscribers: mgorny, guansong, cfe-commits, jdoerfert
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61399
llvm-svn: 360063
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable intrinsics for VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
For more details about BF16 intrinsic, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
Patch by LiuTianle
Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60552
llvm-svn: 360018
Summary:
This is a follow up to r355253 and a better fix than the first attempt
which was r359257.
We can't install anything from ${CMAKE_CFG_INTDIR}, because this value
is only defined at build time, but we still must make sure to copy the
headers into ${CMAKE_CFG_INTDIR}/lib/clang/$VERSION/include, because the lit
tests look for headers there. So for this fix we revert to the
old behavior of copying the headers to ${CMAKE_CFG_INTDIR}/lib/clang/$VERSION/include
during the build and then installing them from the source tree.
Reviewers: smeenai, vzakhari, phosek
Reviewed By: smeenai, vzakhari
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61220
llvm-svn: 359654
This provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
These intrinsics are available when __ARM_FEATURE_MEMORY_TAGGING is defined.
Each intrinsic is described in detail in the ACLE Q1 2019 documentation:
https://developer.arm.com/docs/101028/latest
Reviewed By: Tim Nortover, David Spickett
Differential Revision: https://reviews.llvm.org/D60485
llvm-svn: 359348
Summary:
This is a follow up to r355253, which inadvertently broke Visual
Studio builds by trying to copy files from CMAKE_CFG_INTDIR.
See https://reviews.llvm.org/D58537#inline-532492
Reviewers: smeenai, vzakhari, phosek
Reviewed By: smeenai
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61054
llvm-svn: 359257
Port mmintrin.h which include x86 MMX intrinsics implementation to PowerPC platform (using Altivec).
To make the include process correct, PowerPC's toolchain class is overrided to insert new headers directory (named ppc_wrappers) into the path. Basic test cases for several intrinsic functions are added.
The header is mainly developed by Steven Munroe, with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D59924
llvm-svn: 358949
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.
This patch restores the intrinsics and then we can start focusing
on the optimizing the intrinsics.
I've mostly reverted the original patch that removed them. Though I modified
the avx512 intrinsics to not have masking built in.
Differential Revision: https://reviews.llvm.org/D60674
llvm-svn: 358427
Summary:
These all had somewhat custom file headers with different text from the
ones I searched for previously, and so I missed them. Thanks to Hal and
Kristina and others who prompted me to fix this, and sorry it took so
long.
Reviewers: hfinkel
Subscribers: mcrosier, javed.absar, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60406
llvm-svn: 357941
[IMPORTANT]
With that last fix, CUDA has just started being compiling by clang on Windows after nearly a year and two clang’s major releases (7 and 8).
As long as the last LLVM release, in which clang was compiling CUDA on Windows successfully, was 6.0.1, this fix and two previous have to be included into upcoming 7.1.0 and 8.0.1 releases.
[How to repro]
clang++.exe -x cuda "c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\0_Simple\simplePrintf\simplePrintf.cu" -I"c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\common\inc" --cuda-gpu-arch=sm_50 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0" -L"c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64" -lcudart.lib -v
[Output]
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:390:11: error: no matching function for call to '__isinfl'
return (__isinfl(a) != 0);
^~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2662:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __isinfl(long double a))
^
In file included from <built-in>:1:
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:438:11: error: no matching function for call to '__isnanl'
return (__isnanl(a) != 0);
^~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2672:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __isnanl(long double a))
^
In file included from <built-in>:1:
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:486:11: error: no matching function for call to '__finitel'
return (__finitel(a) != 0);
^~~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2652:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __finitel(long double a))
^
3 errors generated when compiling for sm_50.
[Solution]
Add missing long double device functions' declarations. Provide only declarations to prevent any use of long double on the device side, because CUDA does not support long double on the device side.
[Testing]
{Windows 10, Ubuntu 16.04.5}/{Visual C++ 2017 15.9.9, gcc+ 5.4.0}/CUDA {8.0, 9.0, 9.1, 9.2, 10.0, 10.1}
Reviewed by: Artem Belevich
Differential Revision: http://reviews.llvm.org/D60220
llvm-svn: 357779
Summary:
These are all implemented by icc as well.
I made bit_scan_forward/reverse forward to the __bsfd/__bsrq since we also have
__bsfq/__bsrq.
Note, when lzcnt is enabled the bsr intrinsics generates lzcnt+xor instead of bsr.
Reviewers: RKSimon, spatel
Subscribers: cfe-commits, llvm-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59682
llvm-svn: 356848
gcc and icc both implement popcntd and popcntq which we did not. gcc doesn't seem to require a feature flag for the _popcnt32/_popcnt64 spelling and will use a libcall if its not supported.
Differential Revision: https://reviews.llvm.org/D59567
llvm-svn: 356689
gcc has these intrinsics in ia32intrin.h as well. And icc implements them
though they aren't documented in the Intel Intrinsics Guide.
Differential Revision: https://reviews.llvm.org/D59533
llvm-svn: 356609
This is another attempt at what Erich Keane tried to do in r355322.
This adds rolb, rolw, rold, rolq and their ror equivalent as always_inline wrappers around __builtin_rotate* which will lower to funnel shift intrinsics in IR.
Additionally, when _MSC_VER is not defined we will define _rotl, _lrotl, _rotr, _lrotr as macros to one of the always_inline intrinsics mentioned above. Making sure that _lrotl/_lrotr use either 32 or 64 bit based on the size of long. These need to be macros because we have builtins with the same name for MS compatibility, but _MSC_VER isn't always defined when those builtins are enabled.
We also define _rotwl and _rotwr as macros aliasing to rolw/rorw just like gcc to complete the set. These don't need to be gated with _MSC_VER because these aren't MS builtins.
I've added tests both for non-MS and -ms-extensions with and without _MSC_VER being defined.
Differential Revision: https://reviews.llvm.org/D59346
llvm-svn: 356423
Partial fix for the clang Bug 38811 "Clang fails to compile with CUDA-9.x on Windows".
[Synopsis]
__sptr is a new Microsoft specific modifier (https://docs.microsoft.com/en-us/cpp/cpp/sptr-uptr?view=vs-2017).
[Solution]
Replace all `__sptr` occurrences with `__s` (and all `__cptr` with `__c` as well) to eliminate the below clang compilation error on Windows.
In file included from C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:162:
C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_device_functions.h:524:33: error: expected expression
return __nv_fast_sincosf(__a, __sptr, __cptr);
^
Reviewed by: Artem Belevich
Differential Revision: http://reviews.llvm.org/D59423
llvm-svn: 356291
Partial fix for the clang Bug https://bugs.llvm.org/show_bug.cgi?id=38811 "Clang fails to compile with CUDA-9.x on Windows".
Adding defined(_WIN64) check along with existing #if defined(__LP64__) eliminates the below clang (64-bit) compilation error on Windows.
C:/GIT/LLVM/trunk/llvm-64-release-vs2017/dist/lib/clang/9.0.0\include\__clang_cuda_device_functions.h(1609,45): error GEF7559A7: no matching function for call to 'roundf'
__DEVICE__ long lroundf(float __a) { return roundf(__a); }
Reviewed by: Artem Belevich
Differential Revision: http://reviews.llvm.org/D59361
llvm-svn: 356255
Summary:
The current install-clang-headers target installs clang's resource
directory headers. This is different from the install-llvm-headers
target, which installs LLVM's API headers. We want to introduce the
corresponding target to clang, and the natural name for that new target
would be install-clang-headers. Rename the existing target to
install-clang-resource-headers to free up the install-clang-headers name
for the new target, following the discussion on cfe-dev [1].
I didn't find any bots on zorg referencing install-clang-headers. I'll
send out another PSA to cfe-dev to accompany this rename.
[1] http://lists.llvm.org/pipermail/cfe-dev/2019-February/061365.html
Reviewers: beanz, phosek, tstellar, rnk, dim, serge-sans-paille
Subscribers: mgorny, javed.absar, jdoerfert, #sanitizers, openmp-commits, lldb-commits, cfe-commits, llvm-commits
Tags: #clang, #sanitizers, #lldb, #openmp, #llvm
Differential Revision: https://reviews.llvm.org/D58791
llvm-svn: 355340
Summary:
Replace cut and pasted code with cmake macros and reduce the number of
install commands. This fixes an issue where the headers were being
installed twice.
This clean up should also make future modifications easier, like
adding a cmake option to install header files into a custom resource
directory.
Reviewers: chandlerc, smeenai, mgorny, beanz, phosek
Reviewed By: smeenai
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58537
llvm-svn: 355253
Summary:
In r353970, I enabled those features in C++11 and above. To be strictly
conforming, those features should only be enabled in C++17 and above.
Reviewers: jfb, eli.friedman
Subscribers: jkorous, dexonsmith, libcxx-commits
Differential Revision: https://reviews.llvm.org/D58289
llvm-svn: 354691
r344555 switched LLVM to guarding install targets with LLVM_ENABLE_IDE
instead of CMAKE_CONFIGURATION_TYPES, which expresses the intent more
directly and can be overridden by a user. Make the corresponding change
in clang. LLVM_ENABLE_IDE is computed by HandleLLVMOptions, so it should
be available for both standalone and integrated builds.
Differential Revision: https://reviews.llvm.org/D58284
llvm-svn: 354525
Summary:
Previously, those #defines were only provided in C or when GNU extensions were
enabled. We need those #defines in C++11 and above, too.
Reviewers: jfb, eli.friedman
Subscribers: jkorous, dexonsmith, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58149
llvm-svn: 353970
The rationale of this change is to fix _Unwind_Word / _Unwind_SWord
definitions for MIPS N32 ABI. This ABI uses 32-bit pointers,
but _Unwind_Word and _Unwind_SWord types are eight bytes long.
# The __attribute__((__mode__(__unwind_word__))) is added to the type
definitions. It makes them equal to the corresponding definitions used
by GCC and allows to override types using `getUnwindWordWidth` function.
# The `getUnwindWordWidth` virtual function override in the `MipsTargetInfo`
class and provides correct type size values.
Differential revision: https://reviews.llvm.org/D58165
llvm-svn: 353965
Add secondary triple to existing SSE test for it. I audited other uses
of __attribute__((__packed__)) in the intrinsic headers, and this seemed
to be the only missing one.
llvm-svn: 353878
_byteswap_* functions are are implemented in below file as normal function
from libucrt.lib and declared in stdlib.h. Define them in intrin.h triggers
lld error "conflicting comdat type" and "duplicate symbols" which was just
added to LLD (https://reviews.llvm.org/D57324).
C:\Program Files (x86)\Windows Kits\10\Source\10.0.17763.0\ucrt\stdlib\byteswap.cpp
Differential Revision: https://reviews.llvm.org/D57915
llvm-svn: 353740
Summary:
With MSVC, #pragma pack is ignored when there is explicit alignment. This differs from gcc. Clang emulates this difference when compiling for Windows.
It appears that MSVC and its headers consider the __m128/__m128i/__m128d/etc. types to be explicitly aligned and ignores #pragma pack for them. Since we don't have explicit alignment on them in our headers, we don't match the MSVC behavior here.
This patch adds explicit alignment to match this behavior. I'm hoping this won't cause any problems when we're not emulating MSVC. But if someone knows of something that would be different we can swith to conditionally adding the alignment based on _MSC_VER.
I had to add explicitly unaligned types as well so we could use them in the loadu/storeu intrinsics which use __attribute__(__packed__). Using the now explicitly aligned types wouldn't produce align 1 accesses when targeting Windows.
Reviewers: rnk, erichkeane, spatel, RKSimon
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D57961
llvm-svn: 353555
Instead of calling CUDA runtime to arrange function arguments,
the new API constructs arguments in a local array and the kernels
are launched with __cudaLaunchKernel().
The old API has been deprecated and is expected to go away
in the next CUDA release.
Differential Revision: https://reviews.llvm.org/D57488
llvm-svn: 352799
Re-enable format string warnings on printf.
The warnings are still incomplete. Apparently it is undefined to use a
vector specifier without a length modifier, which is not currently
warned on. Additionally, type warnings appear to not be working with
the hh modifier, and aren't warning on all of the special restrictions
from c99 printf.
llvm-svn: 352540
This reverts r348083. This was based on a misreading of the spec
for printf specifiers.
Also revert r343653, as without a subsequent patch, a correctly
specified format for a vector will incorrectly warn.
Fixes bug 40491.
llvm-svn: 352539
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
This patch attempts to redo what was tried in r278783, but was reverted.
These intrinsics should be available on non-windows platforms with "xsave" feature check. But on Windows platforms they shouldn't have feature check since that's how MSVC behaves.
To accomplish this I've added a MS builtin with no feature check. And a normal gcc builtin with a feature check. When _MSC_VER is not defined _xgetbv/_xsetbv will be macros pointing to the gcc builtin name.
I've moved the forward declarations from intrin.h to immintrin.h to match the MSDN documentation and used that as the header file for the MS builtin.
I'm not super happy with this implementation, and I'm open to suggestions for better ways to do it.
Reviewers: rnk, RKSimon, spatel
Reviewed By: rnk
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D56686
llvm-svn: 351160
The following two bugs in SystemZ high-level vector intrinsics are
fixes by this patch:
- The float case of vec_insert_and_zero should generate a VLLEZF
pattern, but currently erroneously generates VLLEZLF.
- The float and double versions of vec_orc erroneously generate
and-with-complement instead of or-with-complement.
The patch also fixes a couple of typos in the associated test.
llvm-svn: 349751
intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature.
For MS compatibility we should always have these builtins available regardless of X86 being the target or the CPU support the lzcnt instruction. The backends should be able to gracefully fallback to something support even if its just shifts and bit ops.
Unfortunately, gcc also implements 2 of the 3 function names here on X86 when lzcnt feature is enabled.
This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep the gcc compatibility the two intrinsics that collided have been turned into macros that use the X86 specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error because -ms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available.
Should fix PR40014
Differential Revision: https://reviews.llvm.org/D55677
llvm-svn: 349098
The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required adx feature.
This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if its not a burden.
llvm-svn: 348738
Summary:
LLDB.framework wants a copy these headers. With this change LLDB can easily glob for the list of files:
```
get_target_property(clang_include_dir clang-headers RUNTIME_OUTPUT_DIRECTORY)
file(GLOB_RECURSE clang_vendor_headers RELATIVE ${clang_include_dir} "${clang_include_dir}/*")
```
By default `RUNTIME_OUTPUT_DIRECTORY` is unset for custom targets like `clang-headers`.
Reviewers: aprantl, JDevlieghere, davide, friss, dexonsmith
Reviewed By: JDevlieghere
Subscribers: mgorny, #lldb, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D55128
llvm-svn: 348116
A number of builtins in altivec.h load/store vectors from pointers to scalar
types. Currently they just cast the pointer to a vector pointer, but expressions
like that have the alignment of the target type. Of course, the input pointer
did not have that alignment so this triggers UBSan (and rightly so).
This resolves https://bugs.llvm.org/show_bug.cgi?id=39704
Differential revision: https://reviews.llvm.org/D54787
llvm-svn: 347556
The second parameter of vec_sr function is representing shift bits and it should be modulo the number of bits in the element like what vec_sl does now.
This is actually required by the ABI:
Each element of the result vector is the result of logically right shifting the corresponding
element of ARG1 by the number of bits specified by the value of the corresponding
element of ARG2, modulo the number of bits in the element. The bits that are shifted out
are replaced by zeros.
Differential Revision: https://reviews.llvm.org/D54087
llvm-svn: 346471
This patch breaks Index/opencl-types.cl LIT test:
Script:
--
: 'RUN: at line 1'; stage1/bin/c-index-test -test-print-type llvm/tools/clang/test/Index/opencl-types.cl -cl-std=CL2.0 | stage1/bin/FileCheck llvm/tools/clang/test/Index/opencl-types.cl
--
Command Output (stderr):
--
llvm/tools/clang/test/Index/opencl-types.cl:3:26: warning: unsupported OpenCL extension 'cl_khr_fp16' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:4:26: warning: unsupported OpenCL extension 'cl_khr_fp64' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:8:9: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:11:8: error: declaring variable of type 'half' is not allowed
llvm/tools/clang/test/Index/opencl-types.cl:15:3: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:16:3: error: use of type 'double4' (vector of 4 'double' values) requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:26:26: warning: unsupported OpenCL extension 'cl_khr_gl_msaa_sharing' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:35:44: error: use of type '__read_only image2d_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:36:49: error: use of type '__read_only image2d_array_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:37:49: error: use of type '__read_only image2d_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:38:54: error: use of type '__read_only image2d_array_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm-svn: 346338
Summary:
Some CPUID leafs depend on the value of ECX as well as EAX, but we left
it uninitialized.
Originally reported as https://crbug.com/901547
Reviewers: craig.topper, hans
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D54171
llvm-svn: 346265
This is fifth in a series of patches to move intrinsic definitions out of intrin.h.
Note: This was reviewed and approved in D54065 but somehow that diff was messed
up. Committing this again with the proper diff.
llvm-svn: 346205
Summary: This is fifth in a series of patches to move intrinsic definitions out of intrin.h.
Reviewers: rnk, efriedma, mstorsjo, TomTan
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits
Differential Revision: https://reviews.llvm.org/D54065
llvm-svn: 346191
Summary: This is third in a series of patches to move intrinsic definitions out of intrin.h.
Reviewers: rnk, efriedma, mstorsjo, TomTan
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits
Differential Revision: https://reviews.llvm.org/D54062
llvm-svn: 346189
Summary: Windows SDK needs these intrinsics to be proper builtins. This is second in a series of patches to move intrinsic defintions out of intrin.h.
Reviewers: rnk, mstorsjo, efriedma, TomTan
Reviewed By: rnk, efriedma
Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits
Differential Revision: https://reviews.llvm.org/D54046
llvm-svn: 346044
Summary:
PIPE_RESERVE_ID_VALID_BIT is implementation defined, so lets not keep it in the header.
Previously the topic was discussed here: https://reviews.llvm.org/D32896
Reviewers: Anastasia, yaxunl
Reviewed By: Anastasia
Subscribers: cfe-commits, asavonic, bader
Differential Revision: https://reviews.llvm.org/D52658
llvm-svn: 345051
Just adding a preprocessor #define for the extension.
Patch by Alexey Sotkin and Dmitry Sidorov
Phabricator review: https://reviews.llvm.org/D51402
llvm-svn: 345044
These intrinsics exist in icc. They can be found on the Intel Intrinsics Guide website.
All the backend support is in place to pattern match a load+bswap or a bswap+store pattern to the MOVBE instructions. So we just need to get the frontend to emit the correct IR. The pointer arguments in icc are declared as void so I had to jump through a packed struct to forcing a specific alignment on the load/store. Same trick we use in the unaligned vector load/store intrinsics
Differential Revision: https://reviews.llvm.org/D52586
llvm-svn: 343343
Previously we used a select and the zero_undef=true intrinsic. In -O2 this pattern will get optimized to zero_undef=false. But in -O0 this optimization won't happen. This results in a compare and cmov being wrapped around a tzcnt/lzcnt instruction.
By using the zero_undef=false intrinsic directly without the select, we can improve the -O0 codegen to just an lzcnt/tzcnt instruction.
Differential Revision: https://reviews.llvm.org/D52392
llvm-svn: 343126
These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc.
Includes these intrinsics:
_ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8
_ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8
_ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8
_ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8
llvm-svn: 341265
This adds:
_cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
_cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
_load_mask8, _load_mask16, _load_mask32, _load_mask64
_store_mask8, _store_mask16, _store_mask32, _store_mask64
These are currently missing from the Intel Intrinsics Guide webpage.
llvm-svn: 341251
This adds the following intrinsics:
_kshiftli_mask8
_kshiftli_mask16
_kshiftli_mask32
_kshiftli_mask64
_kshiftri_mask8
_kshiftri_mask16
_kshiftri_mask32
_kshiftri_mask64
llvm-svn: 341234
This adds the following intrinsics:
_kadd_mask64
_kadd_mask32
_kadd_mask16
_kadd_mask8
These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc.
llvm-svn: 340879
This also adds a second intrinsic name for the 16-bit mask versions.
These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed.
llvm-svn: 340719
r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h.
Microsoft's STL plans on using these functions, and they're using intrin0.h
which just has declarations of built-ins to not pull in the huge intrin.h
header in the standard library headers. That requires that these functions are
real built-ins.
https://reviews.llvm.org/D50907
llvm-svn: 340048
Summary:
These macros are defined in the C11 standard and can be defined based on
the __*_HAS_DENORM__ default macros.
Reviewers: bruno, rsmith, doug.gregor
Subscribers: llvm-commits, enh, srhines
Differential Revision: https://reviews.llvm.org/D37302
llvm-svn: 339284