To support std::complex and some other standard C/C++ functions in HIP device code,
they need to be forced to be __host__ __device__ functions by pragmas. This is done
by some clang standard C++ wrapper headers which are shared between cuda-clang and hip-Clang.
For these standard C++ wapper headers to work properly, specific include path order
has to be enforced:
clang C++ wrapper include path
standard C++ include path
clang include path
Also, these C++ wrapper headers require device version of some standard C/C++ functions
must be declared before including them. This needs to be done by including a default
header which declares or defines these device functions. The default header is always
included before any other headers are included by users.
This patch adds the the default header and include path for HIP.
Differential Revision: https://reviews.llvm.org/D81176
Added extensions and their function declarations into
the standard header.
Patch by Piotr Fusik!
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79781
Summary:
The shuffle intrinsic macros did not parenthesize usages of their
constant parameters, which could lead to incorrect results due to
operator precedence issues. This patch fixes the problem by adding the
missing paretheses.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80968
There are 65 that take a scalar shift amount. Intel documentation shows 60 of them taking unsigned int. There are 5 versions of srli_epi16 that use int, the 512-bit maskz and 128/256 mask/maskz.
Fixes PR45931
Differential Revision: https://reviews.llvm.org/D80251
Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80174
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET
Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith
Reviewed By: LuoYuanke
Subscribers: cfe-commits, mgorny
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79617
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET
Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith
Reviewed By: LuoYuanke
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D79617
Summary:
Move instructions that have recently been implemented in V8 from the
`unimplemented-simd128` target feature to the `simd128` target
feature. The updated instructions match the update at
https://github.com/WebAssembly/simd/pull/223.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79973
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79742
Summary:
Although using `__builtin_shufflevector` and the `shufflevector`
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D66983
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79224
Some intrinsics in vecintrin.h are currently implemented by
performing address arithmetic in __INTPTR_TYPE__ and converting
the result to some pointer type. While this works correctly,
it leads to suboptimal code generation since many optimizers
cannot trace the provenance of the resulting pointers.
Fixed by using "char *" pointer arithmetic instead.
System headers should avoid using the "vector" and "bool" keywords
since those might be redefined by user code. For example, using
<stdbool.h> before <vecintrin.h> will currently lead to compiler
errors.
Fixed by using the reserved "__vector" and "__bool" keywords
instead. NFC otherwise.
If we are in C++ mode and include <math.h> (not <cmath>) first, we still
need to make sure <cmath> is read first. The problem otherwise is that
we haven't seen the declarations of the math.h functions when the system
math.h includes our cmath overlay. However, our cmath overlay, or better
the underlying overlay, e.g. CUDA, uses the math.h functions. Since we
haven't declared them yet we get errors. CUDA avoids this by eagerly
declaring all math functions (in the __device__ space) but we cannot do
this. Instead we break the dependence by forcing cmath to go first.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D77774
For OpenMP target regions to piggy back on the CUDA/AMDGPU/... implementation of math functions,
we include the appropriate definitions inside of an `omp begin/end declare variant match(device={arch(nvptx)})` scope.
This way, the vendor specific math functions will become specialized versions of the system math functions.
When a system math function is called and specialized version is available the selection logic introduced in D75779
instead call the specialized version. In contrast to the code path we used so far, the system header is actually included.
This means functions without specialized versions are available and so are macro definitions.
This should address PR42061, PR42798, and PR42799.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D75788
Warnings in emmintrin.h and xmmintrin.h are reported by
-fsanitize=implicit-integer-sign-change.
Reviewed By: RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D77393
In d1705c1196 (D77238) we accidentally included subsequent changes and
did not only move the code into a new file (which was the intention).
We undo the changes now and re-introduce them with the appropriate test
changes later.
This is not supported to change anything but allow us to reuse the math
functions separately from the device functions, e.g., source them at
different times. This will be used by the OpenMP overlay.
This also adds two `return` keywords that were missing.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D77238
Summary:
The convention for the wasm_simd128.h intrinsics is to have the
integer sign in the lane interpretation rather than as a suffix. This
PR changes the names of the integer min, max, and avgr intrinsics to
match this convention.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77185
Summary:
As the WebAssembly SIMD proposal nears stabilization, there is desire
to use it with toolchains other than Emscripten. Moving the intrinsics
header to clang will make it available to WASI toolchains as well.
Reviewers: aheejin, sunfish
Subscribers: dschuff, mgorny, sbc100, jgravelle-google, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76959
Summary:
- Even though the bindless surface/texture interfaces are promoted,
there are still code using surface/texture references. For example,
[PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the
compilation issue for code using `tex2D` with texture references. For
better compatibility, this patch proposes the support of
surface/texture references.
- Due to the absent documentation and magic headers, it's believed that
`nvcc` does use builtins for texture support. From the limited NVVM
documentation[^nvvm] and NVPTX backend texture/surface related
tests[^test], it's believed that surface/texture references are
supported by replacing their reference types, which are annotated with
`device_builtin_surface_type`/`device_builtin_texture_type`, with the
corresponding handle-like object types, `cudaSurfaceObject_t` or
`cudaTextureObject_t`, in the device-side compilation. On the host
side, that global handle variables are registered and will be
established and updated later when corresponding binding/unbinding
APIs are called[^bind]. Surface/texture references are most like
device global variables but represented in different types on the host
and device sides.
- In this patch, the following changes are proposed to support that
behavior:
+ Refine `device_builtin_surface_type` and
`device_builtin_texture_type` attributes to be applied on `Type`
decl only to check whether a variable is of the surface/texture
reference type.
+ Add hooks in code generation to replace that reference types with
the correponding object types as well as all accesses to them. In
particular, `nvvm.texsurf.handle.internal` should be used to load
object handles from global reference variables[^texsurf] as well as
metadata annotations.
+ Generate host-side registration with proper template argument
parsing.
---
[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 ``Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).
[^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later.
Reviewers: tra, rjmccall, yaxunl, a.sidorin
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76365
This patch adds 'q' to mean 'scalable vector' in the builtin
type string, and for SVE will return the matching builtin
type as defined in the C/C++ language extensions for SVE.
This patch also adds some scaffolding to generate the arm_sve.h
header file, and some builtin definitions (+CodeGen) to be able
to implement some simple masked load intrinsics that use the
ACLE types, such as:
svint8_t test_svld1_s8(svbool_t pg, const int8_t *base) {
return svld1_s8(pg, base);
}
Reviewers: efriedma, rjmccall, rovka, rsandifo-arm, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75298
Summary:
This patch generalizes the existing code to support CDE intrinsics
which will share some properties with existing MVE intrinsics
(some of the intrinsics will be polymorphic and accept/return values
of MVE vector types).
Specifically the patch:
* Adds new tablegen backends -gen-arm-cde-builtin-def,
-gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema,
-gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on
existing MVE backends.
* Renames the '__clang_arm_mve_alias' attribute into
'__clang_arm_builtin_alias' (it will be used with CDE intrinsics as
well as MVE intrinsics)
* Implements semantic checks for the coprocessor argument of the CDE
intrinsics as well as the existing coprocessor intrinsics.
* Adds one CDE intrinsic __arm_cx1 to test the above changes
Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen
Reviewed By: simon_tatham
Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75850
arm_acle.h relied on `_MSC_VER` to determine if a given function was
already defined as a builtin. This was incorrect because
`-fms-extensions` enables these builtins, but is not responsible for
defining `_MSC_VER` on any target. The next closest thing is
`_MSC_EXTENSIONS`, which is only defined on Windows targets, but even
this is suboptimal. What this conditional is actually trying to
determine is if the given functions are defined as builtins, so just
check that directly.
I also attempted to do this for `__nop`, but in that case intrin.h,
which is only includable if `_MSC_VER` is defined, has its own
definition. So in that case `_MSC_VER` is correct.
Differential Revision: https://reviews.llvm.org/D75719
rdar://60102353
These declarations use a mix of unsigned and signed argument and
return types. This is not in accordance with OpenCL v2.0 s6.13.11.
Differential Revision: https://reviews.llvm.org/D74910
New intrinisics are implemented for when we need to port SIMD code from other
arhitectures and only load or store portions of MSA registers.
Following intriniscs are added which only load/store element 0 of a vector:
v4i32 __builtin_msa_ldrq_w (const void *, imm_n2048_2044);
v2i64 __builtin_msa_ldr_d (const void *, imm_n4096_4088);
void __builtin_msa_strq_w (v4i32, void *, imm_n2048_2044);
void __builtin_msa_str_d (v2i64, void *, imm_n4096_4088);
Differential Revision: https://reviews.llvm.org/D73644
Summary:
To use new/delete in NVPTX code we need to define them. Implementation
copied from CUDA wrappers.
Reviewers: hfinkel, jdoerfert
Subscribers: mgorny, guansong, kkwli0, caomhin, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73128
This makes clang somewhat forward-compatible with new CUDA releases
without having to patch it for every minor release without adding
any new function.
If an unknown version is found, clang issues a warning (can be disabled
with -Wno-cuda-unknown-version) and assumes that it has detected
the latest known version. CUDA releases are usually supersets
of older ones feature-wise, so it should be sufficient to keep
released clang versions working with minor CUDA updates without
having to upgrade clang, too.
Differential Revision: https://reviews.llvm.org/D73231
Use floating-point instead of integer zero constants to avoid
creating implicit conversions, which currently cause suboptimal
code to be generated with -ffp-exception-behavior=strict.
NFC otherwise.
Enabling `-Wcast-qual` identified many casts in various system headers
that were dropping the `const` qualifier. Fixing those missing
qualifiers pointed out that a few of the definitions of the builtins
did not properly identify their arguments as `const` pointers. This
commit fixes those builtin definitions, and the system header files
so that they no longer drop the qualifier.
Differential Revision: https://reviews.llvm.org/D71718
The forward declaration had a cdecl calling convention, but the
inline version did not. This leads to a conflict if the default
calling convention is not cdecl. Fix this by just removing the
forward declaration.
Fixes PR41503
We need to use a 64-bit type in 64-bit mode so a 64-bit register
will get used in the generated assembly. I've also changed the
constraints to just use "r" intead of "q". "q" forces to a only
an a/b/c/d register in 32-bit mode, but I see no reason that
would matter here.
Fixes Nico's note in PR19301 over 4 years ago.
Differential Revision: https://reviews.llvm.org/D70101
As we currently have it implemented in altivec.h, the offsets for these two
intrinsics are element offsets. The documentation in the ABI (as well as the
implementation in both XL and GCC) states that these should be byte offsets.
Differential revision: https://reviews.llvm.org/D63636
We currently emit a double precision comparison instruction for this, whereas we
need to emit the single precision version.
Differential revision: https://reviews.llvm.org/D64024
Summary:
Writing support for three ACLE functions:
unsigned int __cls(uint32_t x)
unsigned int __clsl(unsigned long x)
unsigned int __clsll(uint64_t x)
CLS stands for "Count number of leading sign bits".
In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).
Reviewers: compnerd
Reviewed By: compnerd
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69250
This commit sets up the infrastructure for auto-generating <arm_mve.h>
and doing clang-side code generation for the builtins it relies on,
and demonstrates that it works by implementing a representative sample
of the ACLE intrinsics, more or less matching the ones introduced in
LLVM IR by D67158,D68699,D68700.
Like NEON, that header file will provide a set of vector types like
uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON,
the ACLE spec for <arm_mve.h> includes a polymorphism system, so that
you can write plain vaddq() and disambiguate by the vector types you
pass to it.
Unlike the corresponding NEON code, I've arranged to make every user-
facing ACLE intrinsic into a clang builtin, and implement all the code
generation inside clang. So <arm_mve.h> itself contains nothing but
typedefs and function declarations, with the latter all using the new
`__attribute__((__clang_builtin))` system to arrange that the user-
facing function names correspond to the right internal BuiltinIDs.
So the new MveEmitter tablegen system specifies the full sequence of
IRBuilder operations that each user-facing ACLE intrinsic should
translate into. Where possible, the ACLE intrinsics map to standard IR
operations such as vector-typed `add` and `fadd`; where no standard
representation exists, I call down to the sample IR intrinsics
introduced in an earlier commit.
Doing it like this means that you get the polymorphism for free just
by using __attribute__((overloadable)): the clang overload resolution
decides which function declaration is the relevant one, and _then_ its
BuiltinID is looked up, so by the time we're doing code generation,
that's all been resolved by the standard system. It also means that
you get really nice error messages if the user passes the wrong
combination of types: clang will show the declarations from the header
file and explain why each one doesn't match.
(The obvious alternative approach would be to have wrapper functions
in <arm_mve.h> which pass their arguments to the underlying builtins.
But that doesn't work in the case where one of the arguments has to be
a constant integer: the wrapper function can't pass the constantness
through. So you'd have to do that case using a macro instead, and then
use C11 `_Generic` to handle the polymorphism. Then you have to add
horrible workarounds because `_Generic` requires even the untaken
branches to type-check successfully, and //then// if the user gets the
types wrong, the error message is totally unreadable!)
Reviewers: dmgreen, miyuki, ostannard
Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D67161
These intrinsics use llvm.cttz intrinsics so are always available
even without the bmi feature. We already don't check for the bmi
feature on the intrinsics themselves. But we were blocking the
include of the header file with _MSC_VER unless BMI was enabled
on the command line.
Fixes PR30506.
llvm-svn: 374516
Summary:
This is similar to vec_ct* in https://reviews.llvm.org/rL304205.
The argument must be a constant, otherwise instruction selection
will fail. always_inline is not enough for isel to always fold
everything away at -O0.
The fix is to turn the function into macros in altivec.h.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43072
Reviewers: nemanjai, hfinkel, #powerpc, wuzish
Reviewed By: #powerpc, wuzish
Subscribers: wuzish, kbarton, MaskRay, shchenz, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D66699
llvm-svn: 370902
vote.ballot instruction is gone in recent CUDA versions and
vote.sync.ballot can not be used because it needs a thread mask parameter.
Fortunately PTX 6.2 (introduced with CUDA-9.2) provides activemask.b32
instruction for this.
Differential Revision: https://reviews.llvm.org/D66665
llvm-svn: 370792
In C++ mode we should only avoid adding __OPENCL_C_VERSION__,
all other predefined macros about the language mode are still
valid.
This change also fixes the language version check in the
headers accordingly.
Differential Revision: https://reviews.llvm.org/D65941
llvm-svn: 368552
Port existing headers which include x86 intrinsics implementation to
PowerPC platform (using Altivec), along with tests. Also, tests about
including these intrinsic headers are combined.
The headers are mainly developed by Steven Munroe, with contributions
from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D65630
llvm-svn: 368392
Move the platform check out of PPC Linux toolchain code and add platform guards
to the intrinsic headers, since they are supported currently only on 64-bit
PowerPC targets.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D64849
llvm-svn: 367281
Defining CLK_NULL_EVENT with a `(void*)` cast has the (unintended?)
side-effect that the address space will be fixed (as generic in OpenCL
2.0 mode). The consequence is that any target specific address space
for the clk_event_t type will not be applied.
It is not clear why the void pointer cast was needed in the first
place, and it seems we can do without it.
Differential Revision: https://reviews.llvm.org/D63876
llvm-svn: 366546
Remove dependency of malloc in implementation of mm_malloc function in PowerPC
intrinsics and alignment assumption on glibc.
Reviewed By: Hal Finkel
Differential Revision: https://reviews.llvm.org/D64850
llvm-svn: 366406
The jcvt intrinsic defined in ACLE [1] is available when ARM_FEATURE_JCVT is defined.
This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).
make check-all didn't show any new failures.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
Differential Revision: https://reviews.llvm.org/D64495
llvm-svn: 366197
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10303.
Note: No currently available Z system supports the arch13
architecture. Once new systems become available, the
official system name will be added as supported -march name.
llvm-svn: 365933
We accidentally lost the ATOMIC_VAR_INIT and ATOMIC_FLAG_INIT macros
in r363794.
Also put the `memory_order` typedef back inside a `>= CL2.0` guard.
llvm-svn: 364174
Add overloads with generic address space pointer to old atomics.
This is currently only added for C++ compilation mode.
Differential Revision: https://reviews.llvm.org/D62335
llvm-svn: 364071
These intrinsics should always take an immediate for the rounding mode.
The base instruction comes from before EVEX embdedded rounding. The
user should always provide the immediate rather than us assuming
CUR_DIRECTION.
Make the 512-bit versions also explicit aliases instead of copy
pasting the code.
llvm-svn: 363961
Summary:
AIX system headers need stdint.h and inttypes.h to be re-enterable when macro _STD_TYPES_T is defined so that limit macro definitions such as UINT32_MAX can be found. This patch attempts to allow that on AIX.
Reviewers: hubert.reinterpretcast, jasonliu, mclow.lists, EricWF
Reviewed by: hubert.reinterpretcast, mclow.lists
Subscribers: jfb, jsji, christof, cfe-commits, libcxx-commits, llvm-commits
Tags: #LLVM, #clang, #libc++
Differential Revision: https://reviews.llvm.org/D59253
llvm-svn: 363939
Using the -fdeclare-opencl-builtins option will require a way to
predefine types and macros such as `int4`, `CLK_GLOBAL_MEM_FENCE`,
etc. Move these out of opencl-c.h into opencl-c-base.h such that the
latter can be shared by -fdeclare-opencl-builtins and
-finclude-default-header.
This changes the behaviour of -finclude-default-header when
-fdeclare-opencl-builtins is specified: instead of including the full
header, it will include the header with only the base definitions.
Differential revision: https://reviews.llvm.org/D63256
llvm-svn: 363794
Port emmintrin.h which include Intel SSE2 intrinsics implementation to PowerPC platform (using Altivec).
The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
It's a follow-up patch of D62121.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Differential Revision: https://reviews.llvm.org/D62569
llvm-svn: 363122
Summary:
Remove unnecessary definition (otherwise the extension will be defined
where it's not supposed to be defined).
Consider the code:
#pragma OPENCL EXTENSION cl_intel_planar_yuv : begin
// some declarations
#pragma OPENCL EXTENSION cl_intel_planar_yuv : end
is enough for extension to become known for clang.
Patch by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Reviewers: Anastasia, yaxunl
Reviewed By: Anastasia
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58666
llvm-svn: 362398
Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec).
The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D62121
llvm-svn: 362190
Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec).
The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D62121
llvm-svn: 361928
Summary: When including the random header in C++, some of the math functions it relies on are not present in the CUDA headers. We include this variants in this case.
Reviewers: jdoerfert, hfinkel, tra, caomhin
Reviewed By: tra
Subscribers: efriedma, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D62046
llvm-svn: 361066
Previously we were doing this so that the 256 bit selectw builtin could be used in the implementation of the 512->256 bit conversion intrinsic.
After this commit we now use a masked convert builtin that will emit the intrinsic call and the 256-bit select from custom code in CGBuiltin. Then the header only needs to call that one intrinsic.
llvm-svn: 360924
Summary:
This is a fix for the reported bug:
[[ https://bugs.llvm.org/show_bug.cgi?id=41861 | 41861 ]]
abs functions need to be moved under the c++ macro to avoid conflicts with included headers.
Reviewers: tra, jdoerfert, hfinkel, ABataev, caomhin
Reviewed By: jdoerfert
Subscribers: guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61959
llvm-svn: 360809
Summary: In OpenMP device offloading we must ensure that unde C++ 17, the inclusion of cstdlib will works correctly.
Reviewers: ABataev, tra, jdoerfert, hfinkel, caomhin
Reviewed By: jdoerfert
Subscribers: Hahnfeld, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61949
llvm-svn: 360804
Currently `immintrin.h` includes `pconfigintrin.h` and `sgxintrin.h`
which contain inline assembly. It causes failures when building with the
flag `-fno-gnu-inline-asm`.
Fix by excluding functions with inline assembly when this extension is
disabled. So far there was no need to support `_pconfig_u32`,
`_enclu_u32`, `_encls_u32`, `_enclv_u32` on platforms that require
`-fno-gnu-inline-asm`. But if developers start using these functions,
they'll have compile-time undeclared identifier errors which is
preferrable to runtime errors.
rdar://problem/49540880
Reviewers: craig.topper, GBuella, rnk, echristo
Reviewed By: rnk
Subscribers: jkorous, dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D61621
llvm-svn: 360630
Summary: This patches fixes an issue in which the __clang_cuda_cmath.h header is being included even when cmath or math.h headers are not included.
Reviewers: jdoerfert, ABataev, hfinkel, caomhin, tra
Reviewed By: tra
Subscribers: tra, mgorny, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61765
llvm-svn: 360626
This reverts r360271 (git commit a0933bd8ec)
There are concerns on the review that this breaks EFI builds and that
the transitive includes (sal.h) are actually heavy enough that we might
care.
llvm-svn: 360291
compatibility. This allows some applications developed with MSVC to
compile with clang without any extra changes.
Fixes: llvm.org/PR40789
Differential Revision: https://reviews.llvm.org/D61646
llvm-svn: 360271
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.
We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.
Authors:
@gtbercea
@jdoerfert
Reviewers: hfinkel, caomhin, ABataev, tra
Reviewed By: hfinkel, ABataev, tra
Subscribers: JDevlieghere, mgorny, guansong, cfe-commits, jdoerfert
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61399
llvm-svn: 360265
This commit appears to be breaking stage-2 builds on GreenDragon. The
OpenMP wrappers for cmath and math.h are copied into the root of the
resource directory and cause a cyclic dependency in module 'Darwin':
Darwin -> std -> Darwin. This blows up when CMake is testing for modules
support and breaks all stage 2 module builds, including the ThinLTO bot
and all LLDB bots.
CMake Error at cmake/modules/HandleLLVMOptions.cmake:497 (message):
LLVM_ENABLE_MODULES is not supported by this compiler
llvm-svn: 360192
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.
We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.
Authors:
@gtbercea
@jdoerfert
Reviewers: hfinkel, caomhin, ABataev, tra
Reviewed By: hfinkel, ABataev, tra
Subscribers: mgorny, guansong, cfe-commits, jdoerfert
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61399
llvm-svn: 360063
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable intrinsics for VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
For more details about BF16 intrinsic, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
Patch by LiuTianle
Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60552
llvm-svn: 360018
Summary:
This is a follow up to r355253 and a better fix than the first attempt
which was r359257.
We can't install anything from ${CMAKE_CFG_INTDIR}, because this value
is only defined at build time, but we still must make sure to copy the
headers into ${CMAKE_CFG_INTDIR}/lib/clang/$VERSION/include, because the lit
tests look for headers there. So for this fix we revert to the
old behavior of copying the headers to ${CMAKE_CFG_INTDIR}/lib/clang/$VERSION/include
during the build and then installing them from the source tree.
Reviewers: smeenai, vzakhari, phosek
Reviewed By: smeenai, vzakhari
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61220
llvm-svn: 359654
This provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
These intrinsics are available when __ARM_FEATURE_MEMORY_TAGGING is defined.
Each intrinsic is described in detail in the ACLE Q1 2019 documentation:
https://developer.arm.com/docs/101028/latest
Reviewed By: Tim Nortover, David Spickett
Differential Revision: https://reviews.llvm.org/D60485
llvm-svn: 359348
Summary:
This is a follow up to r355253, which inadvertently broke Visual
Studio builds by trying to copy files from CMAKE_CFG_INTDIR.
See https://reviews.llvm.org/D58537#inline-532492
Reviewers: smeenai, vzakhari, phosek
Reviewed By: smeenai
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61054
llvm-svn: 359257
Port mmintrin.h which include x86 MMX intrinsics implementation to PowerPC platform (using Altivec).
To make the include process correct, PowerPC's toolchain class is overrided to insert new headers directory (named ppc_wrappers) into the path. Basic test cases for several intrinsic functions are added.
The header is mainly developed by Steven Munroe, with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D59924
llvm-svn: 358949
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.
This patch restores the intrinsics and then we can start focusing
on the optimizing the intrinsics.
I've mostly reverted the original patch that removed them. Though I modified
the avx512 intrinsics to not have masking built in.
Differential Revision: https://reviews.llvm.org/D60674
llvm-svn: 358427
Summary:
These all had somewhat custom file headers with different text from the
ones I searched for previously, and so I missed them. Thanks to Hal and
Kristina and others who prompted me to fix this, and sorry it took so
long.
Reviewers: hfinkel
Subscribers: mcrosier, javed.absar, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60406
llvm-svn: 357941
[IMPORTANT]
With that last fix, CUDA has just started being compiling by clang on Windows after nearly a year and two clang’s major releases (7 and 8).
As long as the last LLVM release, in which clang was compiling CUDA on Windows successfully, was 6.0.1, this fix and two previous have to be included into upcoming 7.1.0 and 8.0.1 releases.
[How to repro]
clang++.exe -x cuda "c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\0_Simple\simplePrintf\simplePrintf.cu" -I"c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\common\inc" --cuda-gpu-arch=sm_50 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0" -L"c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64" -lcudart.lib -v
[Output]
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:390:11: error: no matching function for call to '__isinfl'
return (__isinfl(a) != 0);
^~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2662:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __isinfl(long double a))
^
In file included from <built-in>:1:
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:438:11: error: no matching function for call to '__isnanl'
return (__isnanl(a) != 0);
^~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2672:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __isnanl(long double a))
^
In file included from <built-in>:1:
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:486:11: error: no matching function for call to '__finitel'
return (__finitel(a) != 0);
^~~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2652:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __finitel(long double a))
^
3 errors generated when compiling for sm_50.
[Solution]
Add missing long double device functions' declarations. Provide only declarations to prevent any use of long double on the device side, because CUDA does not support long double on the device side.
[Testing]
{Windows 10, Ubuntu 16.04.5}/{Visual C++ 2017 15.9.9, gcc+ 5.4.0}/CUDA {8.0, 9.0, 9.1, 9.2, 10.0, 10.1}
Reviewed by: Artem Belevich
Differential Revision: http://reviews.llvm.org/D60220
llvm-svn: 357779
Summary:
These are all implemented by icc as well.
I made bit_scan_forward/reverse forward to the __bsfd/__bsrq since we also have
__bsfq/__bsrq.
Note, when lzcnt is enabled the bsr intrinsics generates lzcnt+xor instead of bsr.
Reviewers: RKSimon, spatel
Subscribers: cfe-commits, llvm-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59682
llvm-svn: 356848
gcc and icc both implement popcntd and popcntq which we did not. gcc doesn't seem to require a feature flag for the _popcnt32/_popcnt64 spelling and will use a libcall if its not supported.
Differential Revision: https://reviews.llvm.org/D59567
llvm-svn: 356689
gcc has these intrinsics in ia32intrin.h as well. And icc implements them
though they aren't documented in the Intel Intrinsics Guide.
Differential Revision: https://reviews.llvm.org/D59533
llvm-svn: 356609
This is another attempt at what Erich Keane tried to do in r355322.
This adds rolb, rolw, rold, rolq and their ror equivalent as always_inline wrappers around __builtin_rotate* which will lower to funnel shift intrinsics in IR.
Additionally, when _MSC_VER is not defined we will define _rotl, _lrotl, _rotr, _lrotr as macros to one of the always_inline intrinsics mentioned above. Making sure that _lrotl/_lrotr use either 32 or 64 bit based on the size of long. These need to be macros because we have builtins with the same name for MS compatibility, but _MSC_VER isn't always defined when those builtins are enabled.
We also define _rotwl and _rotwr as macros aliasing to rolw/rorw just like gcc to complete the set. These don't need to be gated with _MSC_VER because these aren't MS builtins.
I've added tests both for non-MS and -ms-extensions with and without _MSC_VER being defined.
Differential Revision: https://reviews.llvm.org/D59346
llvm-svn: 356423
Partial fix for the clang Bug 38811 "Clang fails to compile with CUDA-9.x on Windows".
[Synopsis]
__sptr is a new Microsoft specific modifier (https://docs.microsoft.com/en-us/cpp/cpp/sptr-uptr?view=vs-2017).
[Solution]
Replace all `__sptr` occurrences with `__s` (and all `__cptr` with `__c` as well) to eliminate the below clang compilation error on Windows.
In file included from C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:162:
C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_device_functions.h:524:33: error: expected expression
return __nv_fast_sincosf(__a, __sptr, __cptr);
^
Reviewed by: Artem Belevich
Differential Revision: http://reviews.llvm.org/D59423
llvm-svn: 356291
Partial fix for the clang Bug https://bugs.llvm.org/show_bug.cgi?id=38811 "Clang fails to compile with CUDA-9.x on Windows".
Adding defined(_WIN64) check along with existing #if defined(__LP64__) eliminates the below clang (64-bit) compilation error on Windows.
C:/GIT/LLVM/trunk/llvm-64-release-vs2017/dist/lib/clang/9.0.0\include\__clang_cuda_device_functions.h(1609,45): error GEF7559A7: no matching function for call to 'roundf'
__DEVICE__ long lroundf(float __a) { return roundf(__a); }
Reviewed by: Artem Belevich
Differential Revision: http://reviews.llvm.org/D59361
llvm-svn: 356255
Summary:
The current install-clang-headers target installs clang's resource
directory headers. This is different from the install-llvm-headers
target, which installs LLVM's API headers. We want to introduce the
corresponding target to clang, and the natural name for that new target
would be install-clang-headers. Rename the existing target to
install-clang-resource-headers to free up the install-clang-headers name
for the new target, following the discussion on cfe-dev [1].
I didn't find any bots on zorg referencing install-clang-headers. I'll
send out another PSA to cfe-dev to accompany this rename.
[1] http://lists.llvm.org/pipermail/cfe-dev/2019-February/061365.html
Reviewers: beanz, phosek, tstellar, rnk, dim, serge-sans-paille
Subscribers: mgorny, javed.absar, jdoerfert, #sanitizers, openmp-commits, lldb-commits, cfe-commits, llvm-commits
Tags: #clang, #sanitizers, #lldb, #openmp, #llvm
Differential Revision: https://reviews.llvm.org/D58791
llvm-svn: 355340
Summary:
Replace cut and pasted code with cmake macros and reduce the number of
install commands. This fixes an issue where the headers were being
installed twice.
This clean up should also make future modifications easier, like
adding a cmake option to install header files into a custom resource
directory.
Reviewers: chandlerc, smeenai, mgorny, beanz, phosek
Reviewed By: smeenai
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58537
llvm-svn: 355253
Summary:
In r353970, I enabled those features in C++11 and above. To be strictly
conforming, those features should only be enabled in C++17 and above.
Reviewers: jfb, eli.friedman
Subscribers: jkorous, dexonsmith, libcxx-commits
Differential Revision: https://reviews.llvm.org/D58289
llvm-svn: 354691
r344555 switched LLVM to guarding install targets with LLVM_ENABLE_IDE
instead of CMAKE_CONFIGURATION_TYPES, which expresses the intent more
directly and can be overridden by a user. Make the corresponding change
in clang. LLVM_ENABLE_IDE is computed by HandleLLVMOptions, so it should
be available for both standalone and integrated builds.
Differential Revision: https://reviews.llvm.org/D58284
llvm-svn: 354525
Summary:
Previously, those #defines were only provided in C or when GNU extensions were
enabled. We need those #defines in C++11 and above, too.
Reviewers: jfb, eli.friedman
Subscribers: jkorous, dexonsmith, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58149
llvm-svn: 353970
The rationale of this change is to fix _Unwind_Word / _Unwind_SWord
definitions for MIPS N32 ABI. This ABI uses 32-bit pointers,
but _Unwind_Word and _Unwind_SWord types are eight bytes long.
# The __attribute__((__mode__(__unwind_word__))) is added to the type
definitions. It makes them equal to the corresponding definitions used
by GCC and allows to override types using `getUnwindWordWidth` function.
# The `getUnwindWordWidth` virtual function override in the `MipsTargetInfo`
class and provides correct type size values.
Differential revision: https://reviews.llvm.org/D58165
llvm-svn: 353965
Add secondary triple to existing SSE test for it. I audited other uses
of __attribute__((__packed__)) in the intrinsic headers, and this seemed
to be the only missing one.
llvm-svn: 353878
_byteswap_* functions are are implemented in below file as normal function
from libucrt.lib and declared in stdlib.h. Define them in intrin.h triggers
lld error "conflicting comdat type" and "duplicate symbols" which was just
added to LLD (https://reviews.llvm.org/D57324).
C:\Program Files (x86)\Windows Kits\10\Source\10.0.17763.0\ucrt\stdlib\byteswap.cpp
Differential Revision: https://reviews.llvm.org/D57915
llvm-svn: 353740
Summary:
With MSVC, #pragma pack is ignored when there is explicit alignment. This differs from gcc. Clang emulates this difference when compiling for Windows.
It appears that MSVC and its headers consider the __m128/__m128i/__m128d/etc. types to be explicitly aligned and ignores #pragma pack for them. Since we don't have explicit alignment on them in our headers, we don't match the MSVC behavior here.
This patch adds explicit alignment to match this behavior. I'm hoping this won't cause any problems when we're not emulating MSVC. But if someone knows of something that would be different we can swith to conditionally adding the alignment based on _MSC_VER.
I had to add explicitly unaligned types as well so we could use them in the loadu/storeu intrinsics which use __attribute__(__packed__). Using the now explicitly aligned types wouldn't produce align 1 accesses when targeting Windows.
Reviewers: rnk, erichkeane, spatel, RKSimon
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D57961
llvm-svn: 353555
Instead of calling CUDA runtime to arrange function arguments,
the new API constructs arguments in a local array and the kernels
are launched with __cudaLaunchKernel().
The old API has been deprecated and is expected to go away
in the next CUDA release.
Differential Revision: https://reviews.llvm.org/D57488
llvm-svn: 352799
Re-enable format string warnings on printf.
The warnings are still incomplete. Apparently it is undefined to use a
vector specifier without a length modifier, which is not currently
warned on. Additionally, type warnings appear to not be working with
the hh modifier, and aren't warning on all of the special restrictions
from c99 printf.
llvm-svn: 352540
This reverts r348083. This was based on a misreading of the spec
for printf specifiers.
Also revert r343653, as without a subsequent patch, a correctly
specified format for a vector will incorrectly warn.
Fixes bug 40491.
llvm-svn: 352539
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636