Commit Graph

1683 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu 1fa43e0b34 recommit "[HIP] Add default header and include path"
recommit 11d06b9511 with
fix for lit tests.
2020-06-05 20:41:15 -04:00
Yaxun (Sam) Liu 8a8c6913a9 Revert "[HIP] Add default header and include path"
This reverts commit 11d06b9511.
2020-06-05 15:42:57 -04:00
Yaxun (Sam) Liu 11d06b9511 [HIP] Add default header and include path
To support std::complex and some other standard C/C++ functions in HIP device code,
they need to be forced to be __host__ __device__ functions by pragmas. This is done
by some clang standard C++ wrapper headers which are shared between cuda-clang and hip-Clang.

For these standard C++ wapper headers to work properly, specific include path order
has to be enforced:

  clang C++ wrapper include path
  standard C++ include path
  clang include path

Also, these C++ wrapper headers require device version of some standard C/C++ functions
must be declared before including them. This needs to be done by including a default
header which declares or defines these device functions. The default header is always
included before any other headers are included by users.

This patch adds the the default header and include path for HIP.

Differential Revision: https://reviews.llvm.org/D81176
2020-06-05 12:44:57 -04:00
Ties Stuij a6fcf5ca03 [clang][BFloat] add NEON emitter for bfloat
Summary:
This patch adds the bfloat16_t struct typedefs (e.g. bfloat16x8x2_t) to
arm_neon.h

This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:
- Luke Cheeseman
- Simon Tatham
- Ties Stuij

Reviewers: t.p.northover, fpetrogalli, sdesmalen, az, LukeGeeson

Reviewed By: fpetrogalli

Subscribers: SjoerdMeijer, LukeGeeson, pbarrio, mgorny, kristof.beyls, ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79708
2020-06-05 14:11:51 +01:00
Anastasia Stulova 4a4402f0d7 [OpenCL] Add cl_khr_extended_subgroup extensions.
Added extensions and their function declarations into
the standard header.

Patch by Piotr Fusik!

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79781
2020-06-04 13:29:30 +01:00
Thomas Lively 237be3404b [WebAssembly] Improve macro hygiene in wasm_simd128.h
Summary:
The shuffle intrinsic macros did not parenthesize usages of their
constant parameters, which could lead to incorrect results due to
operator precedence issues. This patch fixes the problem by adding the
missing paretheses.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D80968
2020-06-02 12:55:06 -07:00
Craig Topper 1b02db52b7 [X86] Update some av512 shift intrinsics to use "unsigned int" parameter instead of int to match Intel documentation
There are 65 that take a scalar shift amount. Intel documentation shows 60 of them taking unsigned int. There are 5 versions of srli_epi16 that use int, the 512-bit maskz and 128/256 mask/maskz.

Fixes PR45931

Differential Revision: https://reviews.llvm.org/D80251
2020-05-22 20:12:57 -07:00
Thomas Lively 3181273be7 [WebAssembly] Implement i64x2.mul and remove i8x16.mul
Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80174
2020-05-19 12:50:44 -07:00
Xiang1 Zhang bcc0c894f3 Add cet.h for writing CET-enabled assembly code
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they  are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET

Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith

Reviewed By: LuoYuanke

Subscribers: cfe-commits, mgorny

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79617
2020-05-19 14:03:17 +08:00
Xiang1 Zhang 62a9eca859 Test asm-cet.S fail for window clang
This reverts commit e7e84ff24a.
2020-05-19 13:18:05 +08:00
Xiang1 Zhang e7e84ff24a Add cet.h for writing CET-enabled assembly code
Summary:
Add x86 feature with IBT and/or SHSTK bits to ELF program property if they  are enabled. Otherwise, contents in this header file are unused.
This file is mainly design for assembly source code which want to enable CET

Reviewers: hjl.tools, annita.zhang, LuoYuanke, craig.topper, tstellar, pengfei, rsmith

Reviewed By: LuoYuanke

Subscribers: mgorny

Differential Revision: https://reviews.llvm.org/D79617
2020-05-19 10:37:46 +08:00
Thomas Lively c702d4bf41 [WebAssembly] Update latest implemented SIMD instructions
Summary:
Move instructions that have recently been implemented in V8 from the
`unimplemented-simd128` target feature to the `simd128` target
feature. The updated instructions match the update at
https://github.com/WebAssembly/simd/pull/223.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D79973
2020-05-15 10:53:02 -07:00
Thomas Lively 3d49d1cfa7 [WebAssembly] Implement pseudo-min/max SIMD instructions
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D79742
2020-05-12 09:39:01 -07:00
Thomas Lively 8e3e56f2a3 [WebAssembly] Add wasm-specific vector shuffle builtin and intrinsic
Summary:

Although using `__builtin_shufflevector` and the `shufflevector`
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D66983
2020-05-11 10:01:55 -07:00
Thomas Lively e0f52842c8 [WebAssembly] Renumber SIMD opcodes
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79224
2020-05-01 17:20:49 -07:00
Craig Topper af28e02e74 [clang] Add vendor identity for Hygon Dhyana processor to cpuid.h
The vendor id is used to determine whether the processor
supports hardware CRC32 in the Scudo code. The previous
discussion about the patch is in [1], and more information
about Hygon Dhyana processor is in[2].

[1]: https://reviews.llvm.org/D62368
[2]: https://git.kernel.org/torvalds/c/c9661c1e80b609cd038db7c908e061f0535804ef

Patch by fanjinke (Jinke Fan)

Differential Revision: https://reviews.llvm.org/D78874
2020-04-30 18:17:01 -07:00
Douglas Yung 046130490f Add header guards for header files that should not be included on the PS4 platform.
Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79194
2020-04-30 16:17:34 -07:00
Ulrich Weigand 095ccf4455 [SystemZ] Avoid __INTPTR_TYPE__ conversions in vecintrin.h
Some intrinsics in vecintrin.h are currently implemented by
performing address arithmetic in __INTPTR_TYPE__ and converting
the result to some pointer type.  While this works correctly,
it leads to suboptimal code generation since many optimizers
cannot trace the provenance of the resulting pointers.

Fixed by using "char *" pointer arithmetic instead.
2020-04-28 18:49:49 +02:00
Ulrich Weigand c90e09b13c [SystemZ] Use reserved keywords in vecintrin.h
System headers should avoid using the "vector" and "bool" keywords
since those might be redefined by user code.  For example, using
<stdbool.h> before <vecintrin.h> will currently lead to compiler
errors.

Fixed by using the reserved "__vector" and "__bool" keywords
instead.  NFC otherwise.
2020-04-28 18:49:48 +02:00
Raul Tambre 8e20516540 [CUDA] Define __CUDACC__ before standard library headers
libstdc++ since version 7 when GNU extensions are enabled (e.g. -std=gnu++11)
use it to avoid defining overloads using `__float128`.  This fixes compiling
with GNU extensions failing due to `__float128` being used.

Discovered at https://gitlab.kitware.com/cmake/cmake/-/merge_requests/4442#note_737136.

Differential Revision: https://reviews.llvm.org/D78392
2020-04-17 12:56:13 -07:00
Johannes Doerfert 17d8334223 [OpenMP] Allow <math.h> to go first in C++-mode in target regions
If we are in C++ mode and include <math.h> (not <cmath>) first, we still
need to make sure <cmath> is read first. The problem otherwise is that
we haven't seen the declarations of the math.h functions when the system
math.h includes our cmath overlay. However, our cmath overlay, or better
the underlying overlay, e.g. CUDA, uses the math.h functions. Since we
haven't declared them yet we get errors. CUDA avoids this by eagerly
declaring all math functions (in the __device__ space) but we cannot do
this. Instead we break the dependence by forcing cmath to go first.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D77774
2020-04-09 22:10:31 -05:00
WangTianQing a3dc949000 [X86] Add TSXLDTRK instructions.
Summary: For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

Reviewers: craig.topper, RKSimon, LuoYuanke

Reviewed By: craig.topper

Subscribers: mgorny, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77205
2020-04-09 13:17:29 +08:00
Johannes Doerfert f85ae058f5 [OpenMP] Provide math functions in OpenMP device code via OpenMP variants
For OpenMP target regions to piggy back on the CUDA/AMDGPU/... implementation of math functions,
we include the appropriate definitions inside of an `omp begin/end declare variant match(device={arch(nvptx)})` scope.
This way, the vendor specific math functions will become specialized versions of the system math functions.
When a system math function is called and specialized version is available the selection logic introduced in D75779
instead call the specialized version. In contrast to the code path we used so far, the system header is actually included.
This means functions without specialized versions are available and so are macro definitions.

This should address PR42061, PR42798, and PR42799.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D75788
2020-04-07 23:33:24 -05:00
Pierre Gousseau 08fab9ebec [X86] Fix implicit sign conversion warnings in X86 headers.
Warnings in emmintrin.h and xmmintrin.h are reported by
-fsanitize=implicit-integer-sign-change.

Reviewed By: RKSimon, craig.topper

Differential Revision: https://reviews.llvm.org/D77393
2020-04-07 11:25:08 +01:00
WangTianQing d08fadd662 [X86] Add SERIALIZE instruction.
Summary: For more details about this instruction, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

Reviewers: craig.topper, RKSimon, LuoYuanke

Reviewed By: craig.topper

Subscribers: mgorny, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77193
2020-04-02 16:19:23 +08:00
Johannes Doerfert b0b5f0416b [OpenMP][FIX] Undo changes accidentally already introduced in NFC commit
In d1705c1196 (D77238) we accidentally included subsequent changes and
did not only move the code into a new file (which was the intention).
We undo the changes now and re-introduce them with the appropriate test
changes later.
2020-04-02 01:33:39 -05:00
Johannes Doerfert 410cfc478f [OpenMP][FIX] Add second include after header was split in d1705c1196
The math wrapper handling is going to be replaced shortly and
d1705c1196 was actually a precursor for
that.
2020-04-02 00:20:23 -05:00
Johannes Doerfert d1705c1196 [CUDA][NFC] Split math.h functions out of __clang_cuda_device_functions.h
This is not supported to change anything but allow us to reuse the math
functions separately from the device functions, e.g., source them at
different times. This will be used by the OpenMP overlay.

This also adds two `return` keywords that were missing.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D77238
2020-04-01 23:46:27 -05:00
Thomas Lively 95fac2e46b [WebAssembly] Rename SIMD min/max/avgr intrinsics for consistency
Summary:
The convention for the wasm_simd128.h intrinsics is to have the
integer sign in the lane interpretation rather than as a suffix. This
PR changes the names of the integer min, max, and avgr intrinsics to
match this convention.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77185
2020-04-01 09:38:41 -07:00
Thomas Lively 5074776de4 [WebAssembly] Import wasm_simd128.h from Emscripten
Summary:
As the WebAssembly SIMD proposal nears stabilization, there is desire
to use it with toolchains other than Emscripten. Moving the intrinsics
header to clang will make it available to WASI toolchains as well.

Reviewers: aheejin, sunfish

Subscribers: dschuff, mgorny, sbc100, jgravelle-google, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76959
2020-03-30 17:04:18 -07:00
Michael Liao 5be9b8cbe2 [cuda][hip] Add CUDA builtin surface/texture reference support.
Summary: - Re-commit after fix Sema checks on partial template specialization.

Reviewers: tra, rjmccall, yaxunl, a.sidorin

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76365
2020-03-27 17:18:49 -04:00
Artem Belevich fe8063e1a0 Revert "[cuda][hip] Add CUDA builtin surface/texture reference support."
This reverts commit 6a9ad5f3f4.
The patch breaks CUDA copmilation.

Differential Revision: https://reviews.llvm.org/D76365
2020-03-27 10:01:38 -07:00
Michael Liao 6a9ad5f3f4 [cuda][hip] Add CUDA builtin surface/texture reference support.
Summary:
- Even though the bindless surface/texture interfaces are promoted,
  there are still code using surface/texture references. For example,
  [PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the
  compilation issue for code using `tex2D` with texture references. For
  better compatibility, this patch proposes the support of
  surface/texture references.
- Due to the absent documentation and magic headers, it's believed that
  `nvcc` does use builtins for texture support. From the limited NVVM
  documentation[^nvvm] and NVPTX backend texture/surface related
  tests[^test], it's believed that surface/texture references are
  supported by replacing their reference types, which are annotated with
  `device_builtin_surface_type`/`device_builtin_texture_type`, with the
  corresponding handle-like object types, `cudaSurfaceObject_t` or
  `cudaTextureObject_t`, in the device-side compilation. On the host
  side, that global handle variables are registered and will be
  established and updated later when corresponding binding/unbinding
  APIs are called[^bind]. Surface/texture references are most like
  device global variables but represented in different types on the host
  and device sides.
- In this patch, the following changes are proposed to support that
  behavior:
  + Refine `device_builtin_surface_type` and
    `device_builtin_texture_type` attributes to be applied on `Type`
    decl only to check whether a variable is of the surface/texture
    reference type.
  + Add hooks in code generation to replace that reference types with
    the correponding object types as well as all accesses to them. In
    particular, `nvvm.texsurf.handle.internal` should be used to load
    object handles from global reference variables[^texsurf] as well as
    metadata annotations.
  + Generate host-side registration with proper template argument
    parsing.

---
[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 ``Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).
[^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used.  But, the current backend doesn't have that supported. We may revise that later.

Reviewers: tra, rjmccall, yaxunl, a.sidorin

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76365
2020-03-26 14:44:52 -04:00
Sander de Smalen 5087ace651 [Clang][SVE] Parse builtin type string for scalable vectors
This patch adds 'q' to mean 'scalable vector' in the builtin
type string, and for SVE will return the matching builtin
type as defined in the C/C++ language extensions for SVE.

This patch also adds some scaffolding to generate the arm_sve.h
header file, and some builtin definitions (+CodeGen) to be able
to implement some simple masked load intrinsics that use the
ACLE types, such as:

 svint8_t test_svld1_s8(svbool_t pg, const int8_t *base) {
   return svld1_s8(pg, base);
 }

Reviewers: efriedma, rjmccall, rovka, rsandifo-arm, rengolin

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75298
2020-03-15 14:34:52 +00:00
Shengchen Kan 214d24e1f8 [X86] Support intrinsic _mm_broadcastsi128_si256
Reviewers: LuoYuanke, craig.topper, RKSimon, pengfei

Reviewed By: craig.topper

Subscribers: cfe-commits, llvm-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75897
2020-03-12 10:56:39 +08:00
Shengchen Kan ab69cd0779 [X86] Support intrinsic _mm_cldemote
Reviewers: LuoYuanke, craig.topper, RKSimon, pengfei

Reviewed By: craig.topper

Subscribers: cfe-commits, llvm-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75896
2020-03-12 10:03:41 +08:00
Shengchen Kan 560aa53f8f [X86] Support intrinsics _bextr2*
Reviewers: LuoYuanke, craig.topper, RKSimon, pengfei

Reviewed By: craig.topper

Subscribers: cfe-commits, llvm-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75894
2020-03-12 09:26:51 +08:00
Mikhail Maltsev 47edf5bafb [ARM,CDE] Generalize MVE intrinsics infrastructure to support CDE
Summary:
This patch generalizes the existing code to support CDE intrinsics
which will share some properties with existing MVE intrinsics
(some of the intrinsics will be polymorphic and accept/return values
of MVE vector types).
Specifically the patch:
* Adds new tablegen backends -gen-arm-cde-builtin-def,
  -gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema,
  -gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on
  existing MVE backends.
* Renames the '__clang_arm_mve_alias' attribute into
  '__clang_arm_builtin_alias' (it will be used with CDE intrinsics as
  well as MVE intrinsics)
* Implements semantic checks for the coprocessor argument of the CDE
  intrinsics as well as the existing coprocessor intrinsics.
* Adds one CDE intrinsic __arm_cx1 to test the above changes

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: simon_tatham

Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75850
2020-03-10 14:03:16 +00:00
Michael Spencer 16af23fae8 [clang][Headers] Use __has_builtin instead of _MSC_VER.
arm_acle.h relied on `_MSC_VER` to determine if a given function was
already defined as a builtin. This was incorrect because
`-fms-extensions` enables these builtins, but is not responsible for
defining `_MSC_VER` on any target. The next closest thing is
`_MSC_EXTENSIONS`, which is only defined on Windows targets, but even
this is suboptimal. What this conditional is actually trying to
determine is if the given functions are defined as builtins, so just
check that directly.

I also attempted to do this for `__nop`, but in that case intrin.h,
which is only includable if `_MSC_VER` is defined, has its own
definition. So in that case `_MSC_VER` is correct.

Differential Revision: https://reviews.llvm.org/D75719
rdar://60102353
2020-03-06 13:48:09 -08:00
Sven van Haastregt 8a37b9e617 [OpenCL] Remove spurious atomic_fetch_min/max builtins
These declarations use a mix of unsigned and signed argument and
return types.  This is not in accordance with OpenCL v2.0 s6.13.11.

Differential Revision: https://reviews.llvm.org/D74910
2020-03-02 15:56:48 +00:00
Mirko Brkusanin 5ba931a84a [Mips] Add intrinsics for 4-byte and 8-byte MSA loads/stores.
New intrinisics are implemented for when we need to port SIMD code from other
arhitectures and only load or store portions of MSA registers.

Following intriniscs are added which only load/store element 0 of a vector:
v4i32 __builtin_msa_ldrq_w (const void *, imm_n2048_2044);
v2i64 __builtin_msa_ldr_d (const void *, imm_n4096_4088);
void __builtin_msa_strq_w (v4i32, void *, imm_n2048_2044);
void __builtin_msa_str_d (v2i64, void *, imm_n4096_4088);

Differential Revision: https://reviews.llvm.org/D73644
2020-02-11 11:47:30 +01:00
Alexey Bataev fd3437a4f7 [OPENMP][NVPTX]Add NVPTX specific definitions for new/delete operators.
Summary:
To use new/delete in NVPTX code we need to define them. Implementation
copied from CUDA wrappers.

Reviewers: hfinkel, jdoerfert

Subscribers: mgorny, guansong, kkwli0, caomhin, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D73128
2020-02-05 09:57:53 -05:00
Alexey Sotkin f780e15caf [OpenCL] Fix support for cl_khr_mipmap_image_writes
Text of the extension is available here:
https://github.com/KhronosGroup/OpenCL-Docs/blob/master/ext/cl_khr_mipmap_image.asciidoc

Patch by Ilya Mashkov

Differential Revision: https://reviews.llvm.org/D71460
2020-02-05 14:55:32 +03:00
Artem Belevich 12fefeef20 [CUDA] Assume the latest known CUDA version if we've found an unknown one.
This makes clang somewhat forward-compatible with new CUDA releases
without having to patch it for every minor release without adding
any new function.

If an unknown version is found, clang issues a warning (can be disabled
with -Wno-cuda-unknown-version) and assumes that it has detected
the latest known version. CUDA releases are usually supersets
of older ones feature-wise, so it should be sufficient to keep
released clang versions working with minor CUDA updates without
having to upgrade clang, too.

Differential Revision: https://reviews.llvm.org/D73231
2020-01-28 10:11:42 -08:00
Artem Belevich cc14de88da [CUDA] Fix order of memcpy arguments in __shfl_*(<64-bit type>).
Wrong argument order resulted in broken shfl ops for 64-bit types.
2020-01-23 13:17:52 -08:00
Craig Topper 16b9410caa [X86] Cast to __v4hi instead of __m64 in the implementation of _mm_extract_pi16 and _mm_insert_pi16.
__m64 is a vector of 1 long long. But the builtins these intrinsics
are calling expect a vector of 4 shorts.

Fixes PR44589
2020-01-22 16:00:23 -06:00
Ulrich Weigand cebba7ce39 [SystemZ] Avoid unnecessary conversions in vecintrin.h
Use floating-point instead of integer zero constants to avoid
creating implicit conversions, which currently cause suboptimal
code to be generated with -ffp-exception-behavior=strict.

NFC otherwise.
2020-01-16 18:58:14 +01:00
Richard Smith 388eaa1270 Work around PR43337: don't try to use the vec_sel overloads for vector long long, since clang's <altivec.h> doesn't provide it yet! 2020-01-15 13:14:57 -08:00
Warren Ristow 7fcd9e3f70 [X86] Mark various pointer arguments in builtins as const
Enabling `-Wcast-qual` identified many casts in various system headers
that were dropping the `const` qualifier.  Fixing those missing
qualifiers pointed out that a few of the definitions of the builtins
did not properly identify their arguments as `const` pointers.  This
commit fixes those builtin definitions, and the system header files
so that they no longer drop the qualifier.

Differential Revision: https://reviews.llvm.org/D71718
2019-12-19 11:42:11 -08:00
Momchil Velikov 600d123c6f [ARM][CMSE] Add CMSE header and builtins
This is patch C2 as mentioned in RFC
http://lists.llvm.org/pipermail/cfe-dev/2019-March/061834.html

This adds CMSE builtin functions, and introduces arm_cmse.h header which has
useful macros, functions, and data types for end-users of CMSE.

Patch by Javed Absar.

Diferential Revision: https://reviews.llvm.org/D70817
2019-12-12 15:01:14 +00:00
Craig Topper 890c6ef1fb [X86] Remove forward declaration of _invpcid from intrin.h. Rely on inline version from immintrin.h
The forward declaration had a cdecl calling convention, but the
inline version did not. This leads to a conflict if the default
calling convention is not cdecl. Fix this by just removing the
forward declaration.

Fixes PR41503
2019-11-25 16:27:39 -08:00
Craig Topper 3cec2a17de [X86] Fix the implementation of __readcr3/__writecr3 to work in 64-bit mode
We need to use a 64-bit type in 64-bit mode so a 64-bit register
will get used in the generated assembly. I've also changed the
constraints to just use "r" intead of "q". "q" forces to a only
an a/b/c/d register in 32-bit mode, but I see no reason that
would matter here.

Fixes Nico's note in PR19301 over 4 years ago.

Differential Revision: https://reviews.llvm.org/D70101
2019-11-14 13:21:36 -08:00
Nemanja Ivanovic e0407f5496 [PowerPC][Altivec] Fix offsets for vec_xl and vec_xst
As we currently have it implemented in altivec.h, the offsets for these two
intrinsics are element offsets. The documentation in the ABI (as well as the
implementation in both XL and GCC) states that these should be byte offsets.

Differential revision: https://reviews.llvm.org/D63636
2019-11-07 20:58:11 -06:00
Nemanja Ivanovic 070e4027b0 [PowerPC][Altivec] Emit correct builtin for single precision vec_all_ne
We currently emit a double precision comparison instruction for this, whereas we
need to emit the single precision version.

Differential revision: https://reviews.llvm.org/D64024
2019-11-07 20:40:32 -06:00
Eli Friedman 98286b569d [Headers] Fix compatibility between arm_acle.h and intrin.h
Make sure they don't both define __nop.

Differential Revision: https://reviews.llvm.org/D69012
2019-10-29 14:52:56 -07:00
vhscampos f6e11a36c4 [ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE
Summary:
Writing support for three ACLE functions:
  unsigned int __cls(uint32_t x)
  unsigned int __clsl(unsigned long x)
  unsigned int __clsll(uint64_t x)

CLS stands for "Count number of leading sign bits".

In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).

Reviewers: compnerd

Reviewed By: compnerd

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69250
2019-10-28 11:06:58 +00:00
vhscampos 5d35b7d9e1 [ARM][AArch64] Implement __arm_rsrf, __arm_rsrf64, __arm_wsrf & __arm_wsrf64
Summary:
Adding support for ACLE intrinsics.

Patch by Michael Platings.

Reviewers: chill, t.p.northover, efriedma

Reviewed By: chill

Subscribers: kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D69297
2019-10-28 10:59:18 +00:00
Greg Bedwell d4758d4a8d Fix a spelling mistake in a couple of intrinsic description comments. NFC 2019-10-27 09:42:14 +00:00
Simon Tatham 08074cc965 [clang,ARM] Initial ACLE intrinsics for MVE.
This commit sets up the infrastructure for auto-generating <arm_mve.h>
and doing clang-side code generation for the builtins it relies on,
and demonstrates that it works by implementing a representative sample
of the ACLE intrinsics, more or less matching the ones introduced in
LLVM IR by D67158,D68699,D68700.

Like NEON, that header file will provide a set of vector types like
uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON,
the ACLE spec for <arm_mve.h> includes a polymorphism system, so that
you can write plain vaddq() and disambiguate by the vector types you
pass to it.

Unlike the corresponding NEON code, I've arranged to make every user-
facing ACLE intrinsic into a clang builtin, and implement all the code
generation inside clang. So <arm_mve.h> itself contains nothing but
typedefs and function declarations, with the latter all using the new
`__attribute__((__clang_builtin))` system to arrange that the user-
facing function names correspond to the right internal BuiltinIDs.

So the new MveEmitter tablegen system specifies the full sequence of
IRBuilder operations that each user-facing ACLE intrinsic should
translate into. Where possible, the ACLE intrinsics map to standard IR
operations such as vector-typed `add` and `fadd`; where no standard
representation exists, I call down to the sample IR intrinsics
introduced in an earlier commit.

Doing it like this means that you get the polymorphism for free just
by using __attribute__((overloadable)): the clang overload resolution
decides which function declaration is the relevant one, and _then_ its
BuiltinID is looked up, so by the time we're doing code generation,
that's all been resolved by the standard system. It also means that
you get really nice error messages if the user passes the wrong
combination of types: clang will show the declarations from the header
file and explain why each one doesn't match.

(The obvious alternative approach would be to have wrapper functions
in <arm_mve.h> which pass their arguments to the underlying builtins.
But that doesn't work in the case where one of the arguments has to be
a constant integer: the wrapper function can't pass the constantness
through. So you'd have to do that case using a macro instead, and then
use C11 `_Generic` to handle the polymorphism. Then you have to add
horrible workarounds because `_Generic` requires even the untaken
branches to type-check successfully, and //then// if the user gets the
types wrong, the error message is totally unreadable!)

Reviewers: dmgreen, miyuki, ostannard

Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D67161
2019-10-24 16:33:13 +01:00
Craig Topper 282eff3847 [X86] Always define the tzcnt intrinsics even when _MSC_VER is defined.
These intrinsics use llvm.cttz intrinsics so are always available
even without the bmi feature. We already don't check for the bmi
feature on the intrinsics themselves. But we were blocking the
include of the header file with _MSC_VER unless BMI was enabled
on the command line.

Fixes PR30506.

llvm-svn: 374516
2019-10-11 06:07:53 +00:00
Pengfei Wang 1f3a15c397 [x86] Adding support for some missing intrinsics: _castf32_u32, _castf64_u64, _castu32_f32, _castu64_f64
Summary:
Adding support for some missing intrinsics:
_castf32_u32, _castf64_u64, _castu32_f32, _castu64_f64

Reviewers: craig.topper, LuoYuanke, RKSimon, pengfei

Reviewed By: RKSimon

Subscribers: llvm-commits

Patch by yubing (Bing Yu)

Differential Revision: https://reviews.llvm.org/D67212

llvm-svn: 372802
2019-09-25 02:24:05 +00:00
Richard Smith 5b2ba5afa9 Fix reliance on -flax-vector-conversions in AVX intrinsics headers and
corresponding tests.

llvm-svn: 372063
2019-09-17 03:56:30 +00:00
Richard Smith a50884abad Remove reliance on lax vector conversions from altivec.h in VSX mode.
llvm-svn: 372061
2019-09-17 03:56:26 +00:00
Richard Smith aeb279dd88 Remove reliance on lax vector conversions from altivec.h and its test.
llvm-svn: 371814
2019-09-13 05:19:12 +00:00
Jinsong Ji 5309189d9b [PowerPC][Altivec] Fix constant argument for vec_dss
Summary:
This is similar to vec_ct* in https://reviews.llvm.org/rL304205.

The argument must be a constant, otherwise instruction selection
will fail. always_inline is not enough for isel to always fold
everything away at -O0.

The fix is to turn the function into macros in altivec.h.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43072

Reviewers: nemanjai, hfinkel, #powerpc, wuzish

Reviewed By: #powerpc, wuzish

Subscribers: wuzish, kbarton, MaskRay, shchenz, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D66699

llvm-svn: 370902
2019-09-04 14:01:47 +00:00
Artem Belevich ce94ec661f [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+
vote.ballot instruction is gone in recent CUDA versions and
vote.sync.ballot can not be used because it needs a thread mask parameter.
Fortunately PTX 6.2 (introduced with CUDA-9.2) provides activemask.b32
instruction for this.

Differential Revision: https://reviews.llvm.org/D66665

llvm-svn: 370792
2019-09-03 17:31:58 +00:00
Pengfei Wang dea9cad10e [x86] Fix bugs of some intrinsic functions in CLANG : _mm512_stream_ps, _mm512_stream_pd, _mm512_stream_si512
Reviewers: craig.topper, pengfei, LuoYuanke, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Patch by Bing Yu (yubing)

Differential Revision: https://reviews.llvm.org/D66786

llvm-svn: 370691
2019-09-03 02:06:15 +00:00
Pengfei Wang caac097fbf [x86] Adding support for some missing intrinsics: _mm512_cvtsi512_si32
Summary:
Adding support for some missing intrinsics:
_mm512_cvtsi512_si32

Reviewers: craig.topper, pengfei, LuoYuanke, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits

Patch by Bing Yu (yubing)

Differential Revision: https://reviews.llvm.org/D66785

llvm-svn: 370297
2019-08-29 06:18:34 +00:00
Yaxun Liu af478e240b [OpenCL] Fix declaration of enqueue_marker
Differential Revision: https://reviews.llvm.org/D66512

llvm-svn: 369641
2019-08-22 11:18:59 +00:00
Anastasia Stulova ef58804ebc [OpenCL] Fix lang mode predefined macros for C++ mode.
In C++ mode we should only avoid adding __OPENCL_C_VERSION__,
all other predefined macros about the language mode are still
valid.

This change also fixes the language version check in the
headers accordingly.

Differential Revision: https://reviews.llvm.org/D65941

llvm-svn: 368552
2019-08-12 10:44:07 +00:00
Raphael Isemann c4b5b66a05 [clang] Fixed x86 cpuid NSC signature
Summary:
The signature "Geode by NSC" for NSC vendor is wrong.
In lib/Headers/cpuid.h, signature_NSC_edx and signature_NSC_ecx constants are inverted (cpuid signature order is ebx # edx # ecx).

Reviewers: teemperor, rsmith, craig.topper

Reviewed By: teemperor, craig.topper

Subscribers: craig.topper, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D65978

llvm-svn: 368510
2019-08-10 10:14:01 +00:00
Qiu Chaofan e9efaf3529 [PowerPC] [Clang] Port SSE3, SSSE3 and SSE4 intrinsics to PowerPC
Port existing headers which include x86 intrinsics implementation to
PowerPC platform (using Altivec), along with tests. Also, tests about
including these intrinsic headers are combined.

The headers are mainly developed by Steven Munroe, with contributions
from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.

Reviewed By: Jinsong Ji

Differential Revision: https://reviews.llvm.org/D65630

llvm-svn: 368392
2019-08-09 03:39:55 +00:00
Momchil Velikov a36d31478c [AArch64] Add support for Transactional Memory Extension (TME)
Re-commit r366322 after some fixes

TME is a future architecture technology, documented in

  https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
  https://developer.arm.com/docs/ddi0601/a

More about the future architectures:

  https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture

This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".

It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)

Differential Revision: https://reviews.llvm.org/D64416

Patch by Javed Absar and Momchil Velikov

llvm-svn: 367428
2019-07-31 12:52:17 +00:00
Qiu Chaofan 852d444671 [PowerPC] [Clang] Add platform guards to PPC vector intrinsics headers
Move the platform check out of PPC Linux toolchain code and add platform guards
to the intrinsic headers, since they are supported currently only on 64-bit
PowerPC targets.

Reviewed By: Jinsong Ji

Differential Revision: https://reviews.llvm.org/D64849

llvm-svn: 367281
2019-07-30 02:18:11 +00:00
Paul Robinson d2c0eefd5c [X86] Remove const from some intrinsics that shouldn't have them
llvm-svn: 366699
2019-07-22 16:14:09 +00:00
Sven van Haastregt e9e59ad79f [OpenCL] Define CLK_NULL_EVENT without cast
Defining CLK_NULL_EVENT with a `(void*)` cast has the (unintended?)
side-effect that the address space will be fixed (as generic in OpenCL
2.0 mode).  The consequence is that any target specific address space
for the clk_event_t type will not be applied.

It is not clear why the void pointer cast was needed in the first
place, and it seems we can do without it.

Differential Revision: https://reviews.llvm.org/D63876

llvm-svn: 366546
2019-07-19 09:11:48 +00:00
Qiu Chaofan 03aaef8e72 [PowerPC][Clang] Remove use of malloc in mm_malloc
Remove dependency of malloc in implementation of mm_malloc function in PowerPC
intrinsics and alignment assumption on glibc.

Reviewed By: Hal Finkel

Differential Revision: https://reviews.llvm.org/D64850

llvm-svn: 366406
2019-07-18 06:20:12 +00:00
Momchil Velikov 0e2b74a2b0 Revert [AArch64] Add support for Transactional Memory Extension (TME)
This reverts r366322 (git commit 4b8da3a503)

llvm-svn: 366355
2019-07-17 17:43:32 +00:00
Momchil Velikov 4b8da3a503 [AArch64] Add support for Transactional Memory Extension (TME)
TME is a future architecture technology, documented in

https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
https://developer.arm.com/docs/ddi0601/a

More about the future architectures:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture

This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".

It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)

Patch by Javed Absar and Momchil Velikov

Differential Revision: https://reviews.llvm.org/D64416

llvm-svn: 366322
2019-07-17 13:23:27 +00:00
Kyrylo Tkachov eb72138340 [AArch64] Implement __jcvt intrinsic from Armv8.3-A
The jcvt intrinsic defined in ACLE [1] is available when ARM_FEATURE_JCVT is defined.

This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).

make check-all didn't show any new failures.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics

Differential Revision: https://reviews.llvm.org/D64495

llvm-svn: 366197
2019-07-16 09:27:39 +00:00
Ulrich Weigand b98bf60ef7 [SystemZ] Add support for new cpu architecture - arch13
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining  __VEC__ == 10303.

Note: No currently available Z system supports the arch13
architecture.  Once new systems become available, the
official system name will be added as supported -march name.

llvm-svn: 365933
2019-07-12 18:14:51 +00:00
Craig Topper caf6b71ab2 [X86] Change the IR sequence for _mm_storeh_pi and _mm_storel_pi to perform the store as a <2 x float> instead of i64.
This is similar to what we do for loadl_pi and loadh_pi.

llvm-svn: 365669
2019-07-10 17:11:29 +00:00
Sven van Haastregt b502a44110 [OpenCL] Restore ATOMIC_VAR_INIT
We accidentally lost the ATOMIC_VAR_INIT and ATOMIC_FLAG_INIT macros
in r363794.

Also put the `memory_order` typedef back inside a `>= CL2.0` guard.

llvm-svn: 364174
2019-06-24 10:06:40 +00:00
Sven van Haastregt 853dfab799 [OpenCL] Remove more duplicates from opencl-c.h
Identified the duplicate declarations using

  sort lib/Headers/opencl-c.h | uniq -c | grep '      2'

llvm-svn: 364173
2019-06-24 10:06:34 +00:00
Anastasia Stulova 999f676d75 [OpenCL][PR41963] Add generic addr space to old atomics in C++ mode
Add overloads with generic address space pointer to old atomics.
This is currently only added for C++ compilation mode.

Differential Revision: https://reviews.llvm.org/D62335

llvm-svn: 364071
2019-06-21 16:19:16 +00:00
Sven van Haastregt 772a7a7680 [OpenCL] Remove duplicate read_image declarations
Patch by Pierre Gondois.

llvm-svn: 364020
2019-06-21 10:26:10 +00:00
Craig Topper 6d9fb68c53 [X86] Make _mm_mask_cvtps_ph, _mm_maskz_cvtps_ph, _mm256_mask_cvtps_ph, and _mm256_maskz_cvtps_ph aliases for their corresponding cvt_roundps_ph intrinsic.
These intrinsics should always take an immediate for the rounding mode.
The base instruction comes from before EVEX embdedded rounding. The
user should always provide the immediate rather than us assuming
CUR_DIRECTION.

Make the 512-bit versions also explicit aliases instead of copy
pasting the code.

llvm-svn: 363961
2019-06-20 18:24:29 +00:00
Xing Xue ab4bcd844a AIX system headers need stdint.h and inttypes.h to be re-enterable
Summary:
AIX system headers need stdint.h and inttypes.h to be re-enterable when macro _STD_TYPES_T is defined so that limit macro definitions such as UINT32_MAX can be found. This patch attempts to allow that on AIX.

Reviewers: hubert.reinterpretcast, jasonliu, mclow.lists, EricWF

Reviewed by: hubert.reinterpretcast, mclow.lists

Subscribers: jfb, jsji, christof, cfe-commits, libcxx-commits, llvm-commits

Tags: #LLVM, #clang, #libc++

Differential Revision: https://reviews.llvm.org/D59253

llvm-svn: 363939
2019-06-20 15:36:32 +00:00
Craig Topper 24151619a0 [X86] Correct the __min_vector_width__ attribute on a few intrinsics.
llvm-svn: 363890
2019-06-19 23:27:04 +00:00
Sven van Haastregt af1c230e70 [OpenCL] Split type and macro definitions into opencl-c-base.h
Using the -fdeclare-opencl-builtins option will require a way to
predefine types and macros such as `int4`, `CLK_GLOBAL_MEM_FENCE`,
etc.  Move these out of opencl-c.h into opencl-c-base.h such that the
latter can be shared by -fdeclare-opencl-builtins and
-finclude-default-header.

This changes the behaviour of -finclude-default-header when
-fdeclare-opencl-builtins is specified: instead of including the full
header, it will include the header with only the base definitions.

Differential revision: https://reviews.llvm.org/D63256

llvm-svn: 363794
2019-06-19 12:48:22 +00:00
Zi Xuan Wu cc12f68fff [PowerPC] [Clang] Port SSE2 intrinsics to PowerPC
Port emmintrin.h which include Intel SSE2 intrinsics implementation to PowerPC platform (using Altivec).

The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.

It's a follow-up patch of D62121.

Patched by: Qiu Chaofan <qiucf@cn.ibm.com>

Differential Revision: https://reviews.llvm.org/D62569

llvm-svn: 363122
2019-06-12 05:25:40 +00:00
Pengfei Wang 244062eece [X86] Enable intrinsics that convert float and bf16 data to each other
Scalar version :
_mm_cvtsbh_ss , _mm_cvtness_sbh

Vector version:
_mm512_cvtpbh_ps , _mm256_cvtpbh_ps
_mm512_maskz_cvtpbh_ps , _mm256_maskz_cvtpbh_ps
_mm512_mask_cvtpbh_ps , _mm256_mask_cvtpbh_ps

Patch by Shengchen Kan (skan)

Differential Revision: https://reviews.llvm.org/D62363

llvm-svn: 363018
2019-06-11 01:17:28 +00:00
Pengfei Wang 3a29f7c99c [X86] Add ENQCMD instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Patch by Tianqing Wang (tianqing)

Differential Revision: https://reviews.llvm.org/D62282

llvm-svn: 362685
2019-06-06 08:28:42 +00:00
Andrew Savonichev 9ed325e463 [OpenCL] Undefine cl_intel_planar_yuv extension
Summary:

Remove unnecessary definition (otherwise the extension will be defined
where it's not supposed to be defined).

Consider the code:

  #pragma OPENCL EXTENSION cl_intel_planar_yuv : begin
  // some declarations
  #pragma OPENCL EXTENSION cl_intel_planar_yuv : end

is enough for extension to become known for clang.

Patch by: Dmitry Sidorov <dmitry.sidorov@intel.com>

Reviewers: Anastasia, yaxunl

Reviewed By: Anastasia

Tags: #clang

Differential Revision: https://reviews.llvm.org/D58666

llvm-svn: 362398
2019-06-03 13:02:43 +00:00
Pengfei Wang cc3629d545 [X86] Add VP2INTERSECT instructions
Support intel AVX512 VP2INTERSECT instructions in clang

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D62367

llvm-svn: 362196
2019-05-31 06:09:35 +00:00
Zi Xuan Wu fc3ed1ec50 re-commit r361928: [PowerPC] [Clang] Port SSE intrinsics to PowerPC
Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec).

The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.

Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji

Differential Revision: https://reviews.llvm.org/D62121

llvm-svn: 362190
2019-05-31 04:42:13 +00:00
Zi Xuan Wu 48061cd999 revert rC361928: [PowerPC] [Clang] Port SSE intrinsics to PowerPC
Because test fails in other targets rather than PowerPC

llvm-svn: 361930
2019-05-29 07:09:54 +00:00
Zi Xuan Wu b3bcbb5b66 [PowerPC] [Clang] Port SSE intrinsics to PowerPC
Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec).

The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.

Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji

Differential Revision: https://reviews.llvm.org/D62121

llvm-svn: 361928
2019-05-29 05:17:03 +00:00
Kevin Petit aa7754cc90 [OpenCL] Add support for the cl_arm_integer_dot_product extensions
The specification is available in the Khronos OpenCL registry:

https://www.khronos.org/registry/OpenCL/extensions/arm/cl_arm_integer_dot_product.txt

Signed-off-by: Kevin Petit <kevin.petit@arm.com>
llvm-svn: 361641
2019-05-24 14:53:52 +00:00
Craig Topper 3d7ecc4618 [X86] Remove semicolons at the end of intrinsics implemented as macros so they can be used as arguments to other intrinsics.
Also fix one intrinsic that was using variable names without underscores.

Fixes PR41932

llvm-svn: 361109
2019-05-19 01:01:52 +00:00