Commit Graph

1875 Commits

Author SHA1 Message Date
Simon Tatham 08074cc965 [clang,ARM] Initial ACLE intrinsics for MVE.
This commit sets up the infrastructure for auto-generating <arm_mve.h>
and doing clang-side code generation for the builtins it relies on,
and demonstrates that it works by implementing a representative sample
of the ACLE intrinsics, more or less matching the ones introduced in
LLVM IR by D67158,D68699,D68700.

Like NEON, that header file will provide a set of vector types like
uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON,
the ACLE spec for <arm_mve.h> includes a polymorphism system, so that
you can write plain vaddq() and disambiguate by the vector types you
pass to it.

Unlike the corresponding NEON code, I've arranged to make every user-
facing ACLE intrinsic into a clang builtin, and implement all the code
generation inside clang. So <arm_mve.h> itself contains nothing but
typedefs and function declarations, with the latter all using the new
`__attribute__((__clang_builtin))` system to arrange that the user-
facing function names correspond to the right internal BuiltinIDs.

So the new MveEmitter tablegen system specifies the full sequence of
IRBuilder operations that each user-facing ACLE intrinsic should
translate into. Where possible, the ACLE intrinsics map to standard IR
operations such as vector-typed `add` and `fadd`; where no standard
representation exists, I call down to the sample IR intrinsics
introduced in an earlier commit.

Doing it like this means that you get the polymorphism for free just
by using __attribute__((overloadable)): the clang overload resolution
decides which function declaration is the relevant one, and _then_ its
BuiltinID is looked up, so by the time we're doing code generation,
that's all been resolved by the standard system. It also means that
you get really nice error messages if the user passes the wrong
combination of types: clang will show the declarations from the header
file and explain why each one doesn't match.

(The obvious alternative approach would be to have wrapper functions
in <arm_mve.h> which pass their arguments to the underlying builtins.
But that doesn't work in the case where one of the arguments has to be
a constant integer: the wrapper function can't pass the constantness
through. So you'd have to do that case using a macro instead, and then
use C11 `_Generic` to handle the polymorphism. Then you have to add
horrible workarounds because `_Generic` requires even the untaken
branches to type-check successfully, and //then// if the user gets the
types wrong, the error message is totally unreadable!)

Reviewers: dmgreen, miyuki, ostannard

Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D67161
2019-10-24 16:33:13 +01:00
Craig Topper 282eff3847 [X86] Always define the tzcnt intrinsics even when _MSC_VER is defined.
These intrinsics use llvm.cttz intrinsics so are always available
even without the bmi feature. We already don't check for the bmi
feature on the intrinsics themselves. But we were blocking the
include of the header file with _MSC_VER unless BMI was enabled
on the command line.

Fixes PR30506.

llvm-svn: 374516
2019-10-11 06:07:53 +00:00
Pengfei Wang 1f3a15c397 [x86] Adding support for some missing intrinsics: _castf32_u32, _castf64_u64, _castu32_f32, _castu64_f64
Summary:
Adding support for some missing intrinsics:
_castf32_u32, _castf64_u64, _castu32_f32, _castu64_f64

Reviewers: craig.topper, LuoYuanke, RKSimon, pengfei

Reviewed By: RKSimon

Subscribers: llvm-commits

Patch by yubing (Bing Yu)

Differential Revision: https://reviews.llvm.org/D67212

llvm-svn: 372802
2019-09-25 02:24:05 +00:00
Richard Smith 5b2ba5afa9 Fix reliance on -flax-vector-conversions in AVX intrinsics headers and
corresponding tests.

llvm-svn: 372063
2019-09-17 03:56:30 +00:00
Richard Smith a50884abad Remove reliance on lax vector conversions from altivec.h in VSX mode.
llvm-svn: 372061
2019-09-17 03:56:26 +00:00
Richard Smith aeb279dd88 Remove reliance on lax vector conversions from altivec.h and its test.
llvm-svn: 371814
2019-09-13 05:19:12 +00:00
Jinsong Ji 5309189d9b [PowerPC][Altivec] Fix constant argument for vec_dss
Summary:
This is similar to vec_ct* in https://reviews.llvm.org/rL304205.

The argument must be a constant, otherwise instruction selection
will fail. always_inline is not enough for isel to always fold
everything away at -O0.

The fix is to turn the function into macros in altivec.h.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43072

Reviewers: nemanjai, hfinkel, #powerpc, wuzish

Reviewed By: #powerpc, wuzish

Subscribers: wuzish, kbarton, MaskRay, shchenz, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D66699

llvm-svn: 370902
2019-09-04 14:01:47 +00:00
Artem Belevich ce94ec661f [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+
vote.ballot instruction is gone in recent CUDA versions and
vote.sync.ballot can not be used because it needs a thread mask parameter.
Fortunately PTX 6.2 (introduced with CUDA-9.2) provides activemask.b32
instruction for this.

Differential Revision: https://reviews.llvm.org/D66665

llvm-svn: 370792
2019-09-03 17:31:58 +00:00
Pengfei Wang dea9cad10e [x86] Fix bugs of some intrinsic functions in CLANG : _mm512_stream_ps, _mm512_stream_pd, _mm512_stream_si512
Reviewers: craig.topper, pengfei, LuoYuanke, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Patch by Bing Yu (yubing)

Differential Revision: https://reviews.llvm.org/D66786

llvm-svn: 370691
2019-09-03 02:06:15 +00:00
Pengfei Wang caac097fbf [x86] Adding support for some missing intrinsics: _mm512_cvtsi512_si32
Summary:
Adding support for some missing intrinsics:
_mm512_cvtsi512_si32

Reviewers: craig.topper, pengfei, LuoYuanke, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits

Patch by Bing Yu (yubing)

Differential Revision: https://reviews.llvm.org/D66785

llvm-svn: 370297
2019-08-29 06:18:34 +00:00
Yaxun Liu af478e240b [OpenCL] Fix declaration of enqueue_marker
Differential Revision: https://reviews.llvm.org/D66512

llvm-svn: 369641
2019-08-22 11:18:59 +00:00
Anastasia Stulova ef58804ebc [OpenCL] Fix lang mode predefined macros for C++ mode.
In C++ mode we should only avoid adding __OPENCL_C_VERSION__,
all other predefined macros about the language mode are still
valid.

This change also fixes the language version check in the
headers accordingly.

Differential Revision: https://reviews.llvm.org/D65941

llvm-svn: 368552
2019-08-12 10:44:07 +00:00
Raphael Isemann c4b5b66a05 [clang] Fixed x86 cpuid NSC signature
Summary:
The signature "Geode by NSC" for NSC vendor is wrong.
In lib/Headers/cpuid.h, signature_NSC_edx and signature_NSC_ecx constants are inverted (cpuid signature order is ebx # edx # ecx).

Reviewers: teemperor, rsmith, craig.topper

Reviewed By: teemperor, craig.topper

Subscribers: craig.topper, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D65978

llvm-svn: 368510
2019-08-10 10:14:01 +00:00
Qiu Chaofan e9efaf3529 [PowerPC] [Clang] Port SSE3, SSSE3 and SSE4 intrinsics to PowerPC
Port existing headers which include x86 intrinsics implementation to
PowerPC platform (using Altivec), along with tests. Also, tests about
including these intrinsic headers are combined.

The headers are mainly developed by Steven Munroe, with contributions
from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.

Reviewed By: Jinsong Ji

Differential Revision: https://reviews.llvm.org/D65630

llvm-svn: 368392
2019-08-09 03:39:55 +00:00
Momchil Velikov a36d31478c [AArch64] Add support for Transactional Memory Extension (TME)
Re-commit r366322 after some fixes

TME is a future architecture technology, documented in

  https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
  https://developer.arm.com/docs/ddi0601/a

More about the future architectures:

  https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture

This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".

It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)

Differential Revision: https://reviews.llvm.org/D64416

Patch by Javed Absar and Momchil Velikov

llvm-svn: 367428
2019-07-31 12:52:17 +00:00
Qiu Chaofan 852d444671 [PowerPC] [Clang] Add platform guards to PPC vector intrinsics headers
Move the platform check out of PPC Linux toolchain code and add platform guards
to the intrinsic headers, since they are supported currently only on 64-bit
PowerPC targets.

Reviewed By: Jinsong Ji

Differential Revision: https://reviews.llvm.org/D64849

llvm-svn: 367281
2019-07-30 02:18:11 +00:00
Paul Robinson d2c0eefd5c [X86] Remove const from some intrinsics that shouldn't have them
llvm-svn: 366699
2019-07-22 16:14:09 +00:00
Sven van Haastregt e9e59ad79f [OpenCL] Define CLK_NULL_EVENT without cast
Defining CLK_NULL_EVENT with a `(void*)` cast has the (unintended?)
side-effect that the address space will be fixed (as generic in OpenCL
2.0 mode).  The consequence is that any target specific address space
for the clk_event_t type will not be applied.

It is not clear why the void pointer cast was needed in the first
place, and it seems we can do without it.

Differential Revision: https://reviews.llvm.org/D63876

llvm-svn: 366546
2019-07-19 09:11:48 +00:00
Qiu Chaofan 03aaef8e72 [PowerPC][Clang] Remove use of malloc in mm_malloc
Remove dependency of malloc in implementation of mm_malloc function in PowerPC
intrinsics and alignment assumption on glibc.

Reviewed By: Hal Finkel

Differential Revision: https://reviews.llvm.org/D64850

llvm-svn: 366406
2019-07-18 06:20:12 +00:00
Momchil Velikov 0e2b74a2b0 Revert [AArch64] Add support for Transactional Memory Extension (TME)
This reverts r366322 (git commit 4b8da3a503)

llvm-svn: 366355
2019-07-17 17:43:32 +00:00
Momchil Velikov 4b8da3a503 [AArch64] Add support for Transactional Memory Extension (TME)
TME is a future architecture technology, documented in

https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
https://developer.arm.com/docs/ddi0601/a

More about the future architectures:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture

This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".

It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)

Patch by Javed Absar and Momchil Velikov

Differential Revision: https://reviews.llvm.org/D64416

llvm-svn: 366322
2019-07-17 13:23:27 +00:00
Kyrylo Tkachov eb72138340 [AArch64] Implement __jcvt intrinsic from Armv8.3-A
The jcvt intrinsic defined in ACLE [1] is available when ARM_FEATURE_JCVT is defined.

This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).

make check-all didn't show any new failures.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics

Differential Revision: https://reviews.llvm.org/D64495

llvm-svn: 366197
2019-07-16 09:27:39 +00:00
Ulrich Weigand b98bf60ef7 [SystemZ] Add support for new cpu architecture - arch13
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining  __VEC__ == 10303.

Note: No currently available Z system supports the arch13
architecture.  Once new systems become available, the
official system name will be added as supported -march name.

llvm-svn: 365933
2019-07-12 18:14:51 +00:00
Craig Topper caf6b71ab2 [X86] Change the IR sequence for _mm_storeh_pi and _mm_storel_pi to perform the store as a <2 x float> instead of i64.
This is similar to what we do for loadl_pi and loadh_pi.

llvm-svn: 365669
2019-07-10 17:11:29 +00:00
Sven van Haastregt b502a44110 [OpenCL] Restore ATOMIC_VAR_INIT
We accidentally lost the ATOMIC_VAR_INIT and ATOMIC_FLAG_INIT macros
in r363794.

Also put the `memory_order` typedef back inside a `>= CL2.0` guard.

llvm-svn: 364174
2019-06-24 10:06:40 +00:00
Sven van Haastregt 853dfab799 [OpenCL] Remove more duplicates from opencl-c.h
Identified the duplicate declarations using

  sort lib/Headers/opencl-c.h | uniq -c | grep '      2'

llvm-svn: 364173
2019-06-24 10:06:34 +00:00
Anastasia Stulova 999f676d75 [OpenCL][PR41963] Add generic addr space to old atomics in C++ mode
Add overloads with generic address space pointer to old atomics.
This is currently only added for C++ compilation mode.

Differential Revision: https://reviews.llvm.org/D62335

llvm-svn: 364071
2019-06-21 16:19:16 +00:00
Sven van Haastregt 772a7a7680 [OpenCL] Remove duplicate read_image declarations
Patch by Pierre Gondois.

llvm-svn: 364020
2019-06-21 10:26:10 +00:00
Craig Topper 6d9fb68c53 [X86] Make _mm_mask_cvtps_ph, _mm_maskz_cvtps_ph, _mm256_mask_cvtps_ph, and _mm256_maskz_cvtps_ph aliases for their corresponding cvt_roundps_ph intrinsic.
These intrinsics should always take an immediate for the rounding mode.
The base instruction comes from before EVEX embdedded rounding. The
user should always provide the immediate rather than us assuming
CUR_DIRECTION.

Make the 512-bit versions also explicit aliases instead of copy
pasting the code.

llvm-svn: 363961
2019-06-20 18:24:29 +00:00
Xing Xue ab4bcd844a AIX system headers need stdint.h and inttypes.h to be re-enterable
Summary:
AIX system headers need stdint.h and inttypes.h to be re-enterable when macro _STD_TYPES_T is defined so that limit macro definitions such as UINT32_MAX can be found. This patch attempts to allow that on AIX.

Reviewers: hubert.reinterpretcast, jasonliu, mclow.lists, EricWF

Reviewed by: hubert.reinterpretcast, mclow.lists

Subscribers: jfb, jsji, christof, cfe-commits, libcxx-commits, llvm-commits

Tags: #LLVM, #clang, #libc++

Differential Revision: https://reviews.llvm.org/D59253

llvm-svn: 363939
2019-06-20 15:36:32 +00:00
Craig Topper 24151619a0 [X86] Correct the __min_vector_width__ attribute on a few intrinsics.
llvm-svn: 363890
2019-06-19 23:27:04 +00:00
Sven van Haastregt af1c230e70 [OpenCL] Split type and macro definitions into opencl-c-base.h
Using the -fdeclare-opencl-builtins option will require a way to
predefine types and macros such as `int4`, `CLK_GLOBAL_MEM_FENCE`,
etc.  Move these out of opencl-c.h into opencl-c-base.h such that the
latter can be shared by -fdeclare-opencl-builtins and
-finclude-default-header.

This changes the behaviour of -finclude-default-header when
-fdeclare-opencl-builtins is specified: instead of including the full
header, it will include the header with only the base definitions.

Differential revision: https://reviews.llvm.org/D63256

llvm-svn: 363794
2019-06-19 12:48:22 +00:00
Zi Xuan Wu cc12f68fff [PowerPC] [Clang] Port SSE2 intrinsics to PowerPC
Port emmintrin.h which include Intel SSE2 intrinsics implementation to PowerPC platform (using Altivec).

The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.

It's a follow-up patch of D62121.

Patched by: Qiu Chaofan <qiucf@cn.ibm.com>

Differential Revision: https://reviews.llvm.org/D62569

llvm-svn: 363122
2019-06-12 05:25:40 +00:00
Pengfei Wang 244062eece [X86] Enable intrinsics that convert float and bf16 data to each other
Scalar version :
_mm_cvtsbh_ss , _mm_cvtness_sbh

Vector version:
_mm512_cvtpbh_ps , _mm256_cvtpbh_ps
_mm512_maskz_cvtpbh_ps , _mm256_maskz_cvtpbh_ps
_mm512_mask_cvtpbh_ps , _mm256_mask_cvtpbh_ps

Patch by Shengchen Kan (skan)

Differential Revision: https://reviews.llvm.org/D62363

llvm-svn: 363018
2019-06-11 01:17:28 +00:00
Pengfei Wang 3a29f7c99c [X86] Add ENQCMD instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Patch by Tianqing Wang (tianqing)

Differential Revision: https://reviews.llvm.org/D62282

llvm-svn: 362685
2019-06-06 08:28:42 +00:00
Andrew Savonichev 9ed325e463 [OpenCL] Undefine cl_intel_planar_yuv extension
Summary:

Remove unnecessary definition (otherwise the extension will be defined
where it's not supposed to be defined).

Consider the code:

  #pragma OPENCL EXTENSION cl_intel_planar_yuv : begin
  // some declarations
  #pragma OPENCL EXTENSION cl_intel_planar_yuv : end

is enough for extension to become known for clang.

Patch by: Dmitry Sidorov <dmitry.sidorov@intel.com>

Reviewers: Anastasia, yaxunl

Reviewed By: Anastasia

Tags: #clang

Differential Revision: https://reviews.llvm.org/D58666

llvm-svn: 362398
2019-06-03 13:02:43 +00:00
Pengfei Wang cc3629d545 [X86] Add VP2INTERSECT instructions
Support intel AVX512 VP2INTERSECT instructions in clang

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D62367

llvm-svn: 362196
2019-05-31 06:09:35 +00:00
Zi Xuan Wu fc3ed1ec50 re-commit r361928: [PowerPC] [Clang] Port SSE intrinsics to PowerPC
Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec).

The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.

Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji

Differential Revision: https://reviews.llvm.org/D62121

llvm-svn: 362190
2019-05-31 04:42:13 +00:00
Zi Xuan Wu 48061cd999 revert rC361928: [PowerPC] [Clang] Port SSE intrinsics to PowerPC
Because test fails in other targets rather than PowerPC

llvm-svn: 361930
2019-05-29 07:09:54 +00:00
Zi Xuan Wu b3bcbb5b66 [PowerPC] [Clang] Port SSE intrinsics to PowerPC
Port xmmintrin.h which include Intel SSE intrinsics implementation to PowerPC platform (using Altivec).

The new headers containing those implemenations are located into a directory named ppc_wrappers
which has higher priority when the platform is PowerPC on Linux. They are mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.

Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji

Differential Revision: https://reviews.llvm.org/D62121

llvm-svn: 361928
2019-05-29 05:17:03 +00:00
Kevin Petit aa7754cc90 [OpenCL] Add support for the cl_arm_integer_dot_product extensions
The specification is available in the Khronos OpenCL registry:

https://www.khronos.org/registry/OpenCL/extensions/arm/cl_arm_integer_dot_product.txt

Signed-off-by: Kevin Petit <kevin.petit@arm.com>
llvm-svn: 361641
2019-05-24 14:53:52 +00:00
Craig Topper 3d7ecc4618 [X86] Remove semicolons at the end of intrinsics implemented as macros so they can be used as arguments to other intrinsics.
Also fix one intrinsic that was using variable names without underscores.

Fixes PR41932

llvm-svn: 361109
2019-05-19 01:01:52 +00:00
Gheorghe-Teodor Bercea 144291e14c [OpenMP][bugfix] Add missing math functions variants for log and abs.
Summary: When including the random header in C++, some of the math functions it relies on are not present in the CUDA headers. We include this variants in this case.

Reviewers: jdoerfert, hfinkel, tra, caomhin

Reviewed By: tra

Subscribers: efriedma, guansong, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D62046

llvm-svn: 361066
2019-05-17 19:15:53 +00:00
Craig Topper 20040db9a6 [X86] Stop implicitly enabling avx512vl when avx512bf16 is enabled.
Previously we were doing this so that the 256 bit selectw builtin could be used in the implementation of the 512->256 bit conversion intrinsic.

After this commit we now use a masked convert builtin that will emit the intrinsic call and the 256-bit select from custom code in CGBuiltin. Then the header only needs to call that one intrinsic.

llvm-svn: 360924
2019-05-16 18:28:17 +00:00
Craig Topper 58964566e0 [X86] Update doxygen comments for AVX512BF16 to not refer to masks as 'immediates'. Refer to parameter names instead of 'src', 'src1', 'src2'. NFC
llvm-svn: 360918
2019-05-16 17:34:35 +00:00
Gheorghe-Teodor Bercea 9392bd6987 [OpenMP][Bugfix] Move double and float versions of abs under c++ macro
Summary:
This is a fix for the reported bug:

[[ https://bugs.llvm.org/show_bug.cgi?id=41861 | 41861 ]]

abs functions need to be moved under the c++ macro to avoid conflicts with included headers.

Reviewers: tra, jdoerfert, hfinkel, ABataev, caomhin

Reviewed By: jdoerfert

Subscribers: guansong, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61959

llvm-svn: 360809
2019-05-15 20:28:23 +00:00
Gheorghe-Teodor Bercea 7641f310d7 [OpenMP][bugfix] Fix issues with C++ 17 compilation when handling math functions
Summary: In OpenMP device offloading we must ensure that unde C++ 17, the inclusion of cstdlib will works correctly.

Reviewers: ABataev, tra, jdoerfert, hfinkel, caomhin

Reviewed By: jdoerfert

Subscribers: Hahnfeld, guansong, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61949

llvm-svn: 360804
2019-05-15 20:18:21 +00:00
Volodymyr Sapsai 51e79f0634 [X86] Make `x86intrin.h`, `immintrin.h` includable with `-fno-gnu-inline-asm`.
Currently `immintrin.h` includes `pconfigintrin.h` and `sgxintrin.h`
which contain inline assembly. It causes failures when building with the
flag `-fno-gnu-inline-asm`.

Fix by excluding functions with inline assembly when this extension is
disabled. So far there was no need to support `_pconfig_u32`,
`_enclu_u32`, `_encls_u32`, `_enclv_u32` on platforms that require
`-fno-gnu-inline-asm`. But if developers start using these functions,
they'll have compile-time undeclared identifier errors which is
preferrable to runtime errors.

rdar://problem/49540880

Reviewers: craig.topper, GBuella, rnk, echristo

Reviewed By: rnk

Subscribers: jkorous, dexonsmith, cfe-commits

Differential Revision: https://reviews.llvm.org/D61621

llvm-svn: 360630
2019-05-13 22:40:11 +00:00
Gheorghe-Teodor Bercea 946957189d [OpenMP][Clang][BugFix] Split declares and math functions inclusion.
Summary: This patches fixes an issue in which the __clang_cuda_cmath.h header is being included even when cmath or math.h headers are not included.

Reviewers: jdoerfert, ABataev, hfinkel, caomhin, tra

Reviewed By: tra

Subscribers: tra, mgorny, guansong, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61765

llvm-svn: 360626
2019-05-13 22:11:44 +00:00
Reid Kleckner 55fab1ff48 Revert Include corecrt.h in stddef.h and vcruntime.h in stdarg.h to improve MS compatibility.
This reverts r360271 (git commit a0933bd8ec)

There are concerns on the review that this breaks EFI builds and that
the transitive includes (sal.h) are actually heavy enough that we might
care.

llvm-svn: 360291
2019-05-08 22:01:20 +00:00
Mike Rice a0933bd8ec Include corecrt.h in stddef.h and vcruntime.h in stdarg.h to improve MS
compatibility.  This allows some applications developed with MSVC to
compile with clang without any extra changes.

Fixes: llvm.org/PR40789

Differential Revision: https://reviews.llvm.org/D61646

llvm-svn: 360271
2019-05-08 17:15:21 +00:00
Gheorghe-Teodor Bercea e62c693c8e [OpenMP][Clang] Support for target math functions
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.

We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.

Authors:
@gtbercea
@jdoerfert

Reviewers: hfinkel, caomhin, ABataev, tra

Reviewed By: hfinkel, ABataev, tra

Subscribers: JDevlieghere, mgorny, guansong, cfe-commits, jdoerfert

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61399

llvm-svn: 360265
2019-05-08 15:52:33 +00:00
Jonas Devlieghere fe608c938c Revert "[OpenMP][Clang] Support for target math functions"
This commit appears to be breaking stage-2 builds on GreenDragon. The
OpenMP wrappers for cmath and math.h are copied into the root of the
resource directory and cause a cyclic dependency in module 'Darwin':
Darwin -> std -> Darwin. This blows up when CMake is testing for modules
support and breaks all stage 2 module builds, including the ThinLTO bot
and all LLDB bots.

CMake Error at cmake/modules/HandleLLVMOptions.cmake:497 (message):
  LLVM_ENABLE_MODULES is not supported by this compiler

llvm-svn: 360192
2019-05-07 21:08:15 +00:00
Gheorghe-Teodor Bercea 1e28a668bc [OpenMP][Clang] Support for target math functions
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.

We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.

Authors:
@gtbercea
@jdoerfert

Reviewers: hfinkel, caomhin, ABataev, tra

Reviewed By: hfinkel, ABataev, tra

Subscribers: mgorny, guansong, cfe-commits, jdoerfert

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61399

llvm-svn: 360063
2019-05-06 18:19:15 +00:00
Fangrui Song 041c377a59 [X86] Move files to correct directories after D60552
llvm-svn: 360022
2019-05-06 09:24:36 +00:00
Luo, Yuanke 844f662932 Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper Lake
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable intrinsics for VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
For more details about BF16 intrinsic, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

Patch by LiuTianle

Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: mgorny, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60552

llvm-svn: 360018
2019-05-06 08:25:11 +00:00
Tom Stellard dbe1c4aa6f lib/Header: Fix Visual Studio builds try #2
Summary:
This is a follow up to r355253 and a better fix than the first attempt
which was r359257.

We can't install anything from ${CMAKE_CFG_INTDIR}, because this value
is only defined at build time, but we still must make sure to copy the
headers into ${CMAKE_CFG_INTDIR}/lib/clang/$VERSION/include, because the lit
tests look for headers there.  So for this fix we revert to the
old behavior of copying the headers to ${CMAKE_CFG_INTDIR}/lib/clang/$VERSION/include
during the build and then installing them from the source tree.

Reviewers: smeenai, vzakhari, phosek

Reviewed By: smeenai, vzakhari

Subscribers: mgorny, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61220

llvm-svn: 359654
2019-05-01 06:18:03 +00:00
Javed Absar 18b0c40bc5 [AArch64] Add support for MTE intrinsics
This provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
These intrinsics are available when __ARM_FEATURE_MEMORY_TAGGING is defined.
Each intrinsic is described in detail in the ACLE Q1 2019 documentation:
https://developer.arm.com/docs/101028/latest
Reviewed By: Tim Nortover, David Spickett
Differential Revision: https://reviews.llvm.org/D60485

llvm-svn: 359348
2019-04-26 21:08:11 +00:00
Tom Stellard 0184819e81 Revert lib/Header: Fix Visual Studio builds
This reverts r359257 (git commit 00d9789509)

This broke check-clang.

llvm-svn: 359258
2019-04-26 01:43:59 +00:00
Tom Stellard 00d9789509 lib/Header: Fix Visual Studio builds
Summary:
This is a follow up to r355253, which inadvertently broke Visual
Studio builds by trying to copy files from CMAKE_CFG_INTDIR.

See https://reviews.llvm.org/D58537#inline-532492

Reviewers: smeenai, vzakhari, phosek

Reviewed By: smeenai

Subscribers: mgorny, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61054

llvm-svn: 359257
2019-04-26 01:18:59 +00:00
Jinsong Ji 12450d51a2 [PowerPC][NFC]Update licence to Apache 2
llvm-svn: 359164
2019-04-25 02:40:06 +00:00
Qiu Chaofan 19828e399b [PowerPC] [Clang] Port MMX intrinsics and basic test cases to Power
Port mmintrin.h which include x86 MMX intrinsics implementation to PowerPC platform (using Altivec).

To make the include process correct, PowerPC's toolchain class is overrided to insert new headers directory (named ppc_wrappers) into the path. Basic test cases for several intrinsic functions are added.

The header is mainly developed by Steven Munroe, with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.

Reviewed By: Jinsong Ji

Differential Revision: https://reviews.llvm.org/D59924

llvm-svn: 358949
2019-04-23 05:50:24 +00:00
Evgeny Mankov 88aa3d7237 [CUDA][Windows] Restrict long double device functions declarations to Windows
As agreed in D60220, make long double declarations unobservable on non-windows platforms.

[Testing]
{Windows 10, Ubuntu 16.04.5}/{Visual C++ 2017 15.9.11 & 2019 16.0.1, gcc+ 5.4.0}/CUDA {8.0, 9.0, 9.1, 9.2, 10.0, 10.1}

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D60818

llvm-svn: 358654
2019-04-18 10:08:55 +00:00
Craig Topper 8e364c680f [X86] Restore the pavg intrinsics.
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.

This patch restores the intrinsics and then we can start focusing
on the optimizing the intrinsics.

I've mostly reverted the original patch that removed them. Though I modified
the avx512 intrinsics to not have masking built in.

Differential Revision: https://reviews.llvm.org/D60674

llvm-svn: 358427
2019-04-15 17:17:35 +00:00
Chandler Carruth 4cf5743b77 Move the builtin headers to use the new license file header.
Summary:
These all had somewhat custom file headers with different text from the
ones I searched for previously, and so I missed them. Thanks to Hal and
Kristina and others who prompted me to fix this, and sorry it took so
long.

Reviewers: hfinkel

Subscribers: mcrosier, javed.absar, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60406

llvm-svn: 357941
2019-04-08 20:51:30 +00:00
Evgeny Mankov 66a8b07cd9 [CUDA][Windows] Last fix for the clang Bug 38811 "Clang fails to compile with CUDA-9.x on Windows" (https://bugs.llvm.org/show_bug.cgi?id=38811).
[IMPORTANT]
With that last fix, CUDA has just started being compiling by clang on Windows after nearly a year and two clang’s major releases (7 and 8).
As long as the last LLVM release, in which clang was compiling CUDA on Windows successfully, was 6.0.1, this fix and two previous have to be included into upcoming 7.1.0 and 8.0.1 releases.

[How to repro]
clang++.exe -x cuda "c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\0_Simple\simplePrintf\simplePrintf.cu" -I"c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\common\inc" --cuda-gpu-arch=sm_50 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0" -L"c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64" -lcudart.lib  -v

[Output]
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:390:11: error: no matching function for call to '__isinfl'
  return (__isinfl(a) != 0);
          ^~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2662:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __isinfl(long double a))
             ^
In file included from <built-in>:1:
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:438:11: error: no matching function for call to '__isnanl'
  return (__isnanl(a) != 0);
          ^~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2672:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __isnanl(long double a))
             ^
In file included from <built-in>:1:
In file included from C:\GIT\LLVM\trunk-for-submits\llvm-64-release-vs2017-15.9.9\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:327:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:486:11: error: no matching function for call to '__finitel'
  return (__finitel(a) != 0);
          ^~~~~~~~~
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0/include\crt/math_functions.hpp:2652:14: note: candidate function not viable: call to __host__ function from __device__ function
__func__(int __finitel(long double a))
             ^
3 errors generated when compiling for sm_50.

[Solution]
Add missing long double device functions' declarations. Provide only declarations to prevent any use of long double on the device side, because CUDA does not support long double on the device side.

[Testing]
{Windows 10, Ubuntu 16.04.5}/{Visual C++ 2017 15.9.9, gcc+ 5.4.0}/CUDA {8.0, 9.0, 9.1, 9.2, 10.0, 10.1}

Reviewed by: Artem Belevich

Differential Revision: http://reviews.llvm.org/D60220

llvm-svn: 357779
2019-04-05 16:51:10 +00:00
Craig Topper 6af0363857 [X86] Make _bswap intrinsic a function instead of a macro to hopefully fix the chromium build.
This intrinsic was added in r356848 but was implemented as a macro to match gcc.

llvm-svn: 356862
2019-03-24 18:00:20 +00:00
Craig Topper 88f4054f48 [X86] Add BSR/BSF/BSWAP intrinsics to ia32intrin.h to match gcc.
Summary:
These are all implemented by icc as well.

I made bit_scan_forward/reverse forward to the __bsfd/__bsrq since we also have
__bsfq/__bsrq.

Note, when lzcnt is enabled the bsr intrinsics generates lzcnt+xor instead of bsr.

Reviewers: RKSimon, spatel

Subscribers: cfe-commits, llvm-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D59682

llvm-svn: 356848
2019-03-24 00:56:52 +00:00
Craig Topper 1383340422 [X86] Add __popcntd and __popcntq to ia32intrin.h to match gcc and icc. Remove popcnt feature flag from _popcnt32/_popcnt64 and move to ia32intrin.h to match gcc
gcc and icc both implement popcntd and popcntq which we did not. gcc doesn't seem to require a feature flag for the _popcnt32/_popcnt64 spelling and will use a libcall if its not supported.

Differential Revision: https://reviews.llvm.org/D59567

llvm-svn: 356689
2019-03-21 17:43:53 +00:00
Craig Topper e0941cb326 [X86] Add __crc32b/__crc32w/__crc32d/__crc32q intrinsics to match gcc and icc.
gcc has these intrinsics in ia32intrin.h as well. And icc implements them
though they aren't documented in the Intel Intrinsics Guide.

Differential Revision: https://reviews.llvm.org/D59533

llvm-svn: 356609
2019-03-20 20:25:28 +00:00
Craig Topper 8b653d0308 [X86] Add gcc rotate intrinsics to ia32intrin.h
This is another attempt at what Erich Keane tried to do in r355322.

This adds rolb, rolw, rold, rolq and their ror equivalent as always_inline wrappers around __builtin_rotate* which will lower to funnel shift intrinsics in IR.

Additionally, when _MSC_VER is not defined we will define _rotl, _lrotl, _rotr, _lrotr as macros to one of the always_inline intrinsics mentioned above. Making sure that _lrotl/_lrotr use either 32 or 64 bit based on the size of long. These need to be macros because we have builtins with the same name for MS compatibility, but _MSC_VER isn't always defined when those builtins are enabled.

We also define _rotwl and _rotwr as macros aliasing to rolw/rorw just like gcc to complete the set. These don't need to be gated with _MSC_VER because these aren't MS builtins.

I've added tests both for non-MS and -ms-extensions with and without _MSC_VER being defined.

Differential Revision: https://reviews.llvm.org/D59346

llvm-svn: 356423
2019-03-18 22:25:57 +00:00
Evgeny Mankov 177301f048 [CUDA][Windows] Partial fix for bug 38811 (Step 2 of 3)
Partial fix for the clang Bug 38811 "Clang fails to compile with CUDA-9.x on Windows".

[Synopsis]
__sptr is a new Microsoft specific modifier (https://docs.microsoft.com/en-us/cpp/cpp/sptr-uptr?view=vs-2017).

[Solution]
Replace all `__sptr` occurrences with `__s` (and all `__cptr` with `__c` as well) to eliminate the below clang compilation error on Windows.

In file included from C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_runtime_wrapper.h:162:
C:\GIT\LLVM\trunk\llvm-64-release-vs2017-15.9.5\dist\lib\clang\9.0.0\include\__clang_cuda_device_functions.h:524:33: error: expected expression
  return __nv_fast_sincosf(__a, __sptr, __cptr);
                                ^
Reviewed by: Artem Belevich

Differential Revision: http://reviews.llvm.org/D59423

llvm-svn: 356291
2019-03-15 19:04:46 +00:00
Evgeny Mankov 04188fc0c6 [CUDA][Windows] Partial fix for bug #38811 (Step 1 of 3)
Partial fix for the clang Bug https://bugs.llvm.org/show_bug.cgi?id=38811 "Clang fails to compile with CUDA-9.x on Windows".

Adding defined(_WIN64) check along with existing #if defined(__LP64__) eliminates the below clang (64-bit) compilation error on Windows.

C:/GIT/LLVM/trunk/llvm-64-release-vs2017/dist/lib/clang/9.0.0\include\__clang_cuda_device_functions.h(1609,45): error GEF7559A7: no matching function for call to 'roundf'
 __DEVICE__ long lroundf(float __a) { return roundf(__a); }

Reviewed by: Artem Belevich

Differential Revision: http://reviews.llvm.org/D59361

llvm-svn: 356255
2019-03-15 12:05:36 +00:00
Shoaib Meenai 5be71faf4b [build] Rename clang-headers to clang-resource-headers
Summary:
The current install-clang-headers target installs clang's resource
directory headers. This is different from the install-llvm-headers
target, which installs LLVM's API headers. We want to introduce the
corresponding target to clang, and the natural name for that new target
would be install-clang-headers. Rename the existing target to
install-clang-resource-headers to free up the install-clang-headers name
for the new target, following the discussion on cfe-dev [1].

I didn't find any bots on zorg referencing install-clang-headers. I'll
send out another PSA to cfe-dev to accompany this rename.

[1] http://lists.llvm.org/pipermail/cfe-dev/2019-February/061365.html

Reviewers: beanz, phosek, tstellar, rnk, dim, serge-sans-paille

Subscribers: mgorny, javed.absar, jdoerfert, #sanitizers, openmp-commits, lldb-commits, cfe-commits, llvm-commits

Tags: #clang, #sanitizers, #lldb, #openmp, #llvm

Differential Revision: https://reviews.llvm.org/D58791

llvm-svn: 355340
2019-03-04 21:19:53 +00:00
Tom Stellard 4f076000c6 lib/Header: Simplify CMakeLists.txt
Summary:
Replace cut and pasted code with cmake macros and reduce the number of
install commands.  This fixes an issue where the headers were being
installed twice.

This clean up should also make future modifications easier, like
adding a cmake option to install header files into a custom resource
directory.

Reviewers: chandlerc, smeenai, mgorny, beanz, phosek

Reviewed By: smeenai

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D58537

llvm-svn: 355253
2019-03-02 00:50:13 +00:00
Louis Dionne c2d95792d6 [clang] Only provide C11 features in <float.h> starting with C++17
Summary:
In r353970, I enabled those features in C++11 and above. To be strictly
conforming, those features should only be enabled in C++17 and above.

Reviewers: jfb, eli.friedman

Subscribers: jkorous, dexonsmith, libcxx-commits

Differential Revision: https://reviews.llvm.org/D58289

llvm-svn: 354691
2019-02-22 20:48:54 +00:00
Shoaib Meenai defb5a383b [clang] Switch to LLVM_ENABLE_IDE
r344555 switched LLVM to guarding install targets with LLVM_ENABLE_IDE
instead of CMAKE_CONFIGURATION_TYPES, which expresses the intent more
directly and can be overridden by a user. Make the corresponding change
in clang. LLVM_ENABLE_IDE is computed by HandleLLVMOptions, so it should
be available for both standalone and integrated builds.

Differential Revision: https://reviews.llvm.org/D58284

llvm-svn: 354525
2019-02-20 23:08:43 +00:00
Louis Dionne defa9f8f85 [clang] Make sure C99/C11 features in <float.h> are provided in C++11
Summary:
Previously, those #defines were only provided in C or when GNU extensions were
enabled. We need those #defines in C++11 and above, too.

Reviewers: jfb, eli.friedman

Subscribers: jkorous, dexonsmith, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D58149

llvm-svn: 353970
2019-02-13 19:08:01 +00:00
Simon Atanasyan 4c22a57414 [Headers][mips] Add `__attribute__((__mode__(__unwind_word__)))` to the _Unwind_Word / _Unwind_SWord definitions
The rationale of this change is to fix _Unwind_Word / _Unwind_SWord
definitions for MIPS N32 ABI. This ABI uses 32-bit pointers,
but _Unwind_Word and _Unwind_SWord types are eight bytes long.

 # The __attribute__((__mode__(__unwind_word__))) is added to the type
   definitions. It makes them equal to the corresponding definitions used
   by GCC and allows to override types using `getUnwindWordWidth` function.
 # The `getUnwindWordWidth` virtual function override in the `MipsTargetInfo`
   class and provides correct type size values.

Differential revision: https://reviews.llvm.org/D58165

llvm-svn: 353965
2019-02-13 18:27:09 +00:00
Reid Kleckner 79d7f4114d [X86] Use __m128_u for _mm_loadu_ps after r353555
Add secondary triple to existing SSE test for it.  I audited other uses
of __attribute__((__packed__)) in the intrinsic headers, and this seemed
to be the only missing one.

llvm-svn: 353878
2019-02-12 21:04:21 +00:00
Craig Topper 4390c721cb [X86] Use the new unaligned vector typedefs for the loadu/storeu intrinsics pointer arguments.
This matches what gcc does and what was suggested by rnk in PR20670.

llvm-svn: 353802
2019-02-12 07:44:40 +00:00
Tom Tan 42b2424e4f [COFF, ARM64] Remove definitions for _byteswap library functions
_byteswap_* functions are are implemented in below file as normal function
from libucrt.lib and declared in stdlib.h. Define them in intrin.h triggers
lld error "conflicting comdat type" and "duplicate symbols" which was just
added to LLD (https://reviews.llvm.org/D57324).

C:\Program Files (x86)\Windows Kits\10\Source\10.0.17763.0\ucrt\stdlib\byteswap.cpp

Differential Revision: https://reviews.llvm.org/D57915

llvm-svn: 353740
2019-02-11 20:04:02 +00:00
Craig Topper be4cbe8726 [X86] Add explicit alignment to __m128/__m128i/__m128d/etc. to allow matching of MSVC behavior with #pragma pack.
Summary:
With MSVC, #pragma pack is ignored when there is explicit alignment. This differs from gcc. Clang emulates this difference when compiling for Windows.

It appears that MSVC and its headers consider the __m128/__m128i/__m128d/etc. types to be explicitly aligned and ignores #pragma pack for them. Since we don't have explicit alignment on them in our headers, we don't match the MSVC behavior here.

This patch adds explicit alignment to match this behavior. I'm hoping this won't cause any problems when we're not emulating MSVC. But if someone knows of something that would be different we can swith to conditionally adding the alignment based on _MSC_VER.

I had to add explicitly unaligned types as well so we could use them in the loadu/storeu intrinsics which use __attribute__(__packed__). Using the now explicitly aligned types wouldn't produce align 1 accesses when targeting Windows.

Reviewers: rnk, erichkeane, spatel, RKSimon

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D57961

llvm-svn: 353555
2019-02-08 19:45:08 +00:00
Eli Friedman 3189d5f48c [COFF, ARM64] Fix types for _ReadStatusReg, _WriteStatusReg
r344765 added those intrinsics, but used the wrong types.

Patch by Mike Hommey

Differential Revision: https://reviews.llvm.org/D57636

llvm-svn: 353493
2019-02-08 01:17:49 +00:00
Artem Belevich 4071763bb8 Basic CUDA-10 support.
Differential Revision: https://reviews.llvm.org/D57771

llvm-svn: 353232
2019-02-05 22:38:58 +00:00
Artem Belevich c62214da3d [CUDA] add support for the new kernel launch API in CUDA-9.2+.
Instead of calling CUDA runtime to arrange function arguments,
the new API constructs arguments in a local array and the kernels
are launched with __cudaLaunchKernel().

The old API has been deprecated and is expected to go away
in the next CUDA release.

Differential Revision: https://reviews.llvm.org/D57488

llvm-svn: 352799
2019-01-31 21:34:03 +00:00
Matt Arsenault 58fc8082a8 OpenCL: Use length modifier for warning on vector printf arguments
Re-enable format string warnings on printf.

The warnings are still incomplete. Apparently it is undefined to use a
vector specifier without a length modifier, which is not currently
warned on. Additionally, type warnings appear to not be working with
the hh modifier, and aren't warning on all of the special restrictions
from c99 printf.

llvm-svn: 352540
2019-01-29 20:49:54 +00:00
Matt Arsenault 297afb14ec Revert "OpenCL: Extend argument promotion rules to vector types"
This reverts r348083. This was based on a misreading of the spec
for printf specifiers.

Also revert r343653, as without a subsequent patch, a correctly
specified format for a vector will incorrectly warn.

Fixes bug 40491.

llvm-svn: 352539
2019-01-29 20:49:47 +00:00
Craig Topper 8de5abc4c8 [X86] Remove mask and passthru arguments from vpconflict builtins. Use select in IR instead.
llvm-svn: 352173
2019-01-25 07:08:22 +00:00
Craig Topper 9fddc3fd00 [X86] Remove the cvtuqq2ps256/cvtqq2ps256 mask builtins. Replace with uitofp/sitofp and select.
Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D56965

llvm-svn: 351694
2019-01-20 19:04:56 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Craig Topper d08d90ce51 [X86] Only define _XCR_XFEATURE_ENABLED_MASK in xsaveintrin.h when _MSC_VER is defined. Remove from intrin.h.
I think this was my intention when I added it xsaveintrin.h

llvm-svn: 351568
2019-01-18 17:51:51 +00:00
Craig Topper 931779761e Recommit r351160 "[X86] Make _xgetbv/_xsetbv on non-windows platforms"
V8 has been fixed now.

llvm-svn: 351391
2019-01-16 22:56:25 +00:00
Benjamin Kramer 9c53890833 Revert "[X86] Make _xgetbv/_xsetbv on non-windows platforms"
This reverts commit r351160. Breaks building v8.

llvm-svn: 351210
2019-01-15 17:23:36 +00:00
Roman Lebedev cce1c2eb0e [OpenCL] opencl-c.h: read_image*(): sampler-less, and image{1,2}d_array_t variants are OpenCL-1.2+, mark them as such
Summary:
Refer to [[ https://www.khronos.org/registry/OpenCL/specs/opencl-1.1.pdf#page=242 | `6.11.13.2 Built-in Image Functions` ]],
and [[ https://www.khronos.org/registry/OpenCL/specs/opencl-1.1.pdf#page=306 | `9.6.8 Image Read and Write Functions` ]] of the OpenCL 1.1 spec.

* There is no mention of `image1d_array_t` and `image2d_array_t` anywhere in the OpenCL 1.1 spec.
* All the `read_image{f,i,ui,h}()` functions, as of OpenCL 1.1 spec, have a second required parameter `sampler_t sampler`

Should have prevented the following regression:
https://redmine.darktable.org/issues/12493

Reviewers: yaxunl, Anastasia, echuraev, asavonic

Reviewed By: Anastasia

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D56646

llvm-svn: 351188
2019-01-15 11:20:02 +00:00
Craig Topper 69aed7c364 [X86] Make _xgetbv/_xsetbv on non-windows platforms
Summary:
This patch attempts to redo what was tried in r278783, but was reverted.

These intrinsics should be available on non-windows platforms with "xsave" feature check. But on Windows platforms they shouldn't have feature check since that's how MSVC behaves.

To accomplish this I've added a MS builtin with no feature check. And a normal gcc builtin with a feature check. When _MSC_VER is not defined _xgetbv/_xsetbv will be macros pointing to the gcc builtin name.

I've moved the forward declarations from intrin.h to immintrin.h to match the MSDN documentation and used that as the header file for the MS builtin.

I'm not super happy with this implementation, and I'm open to suggestions for better ways to do it.

Reviewers: rnk, RKSimon, spatel

Reviewed By: rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D56686

llvm-svn: 351160
2019-01-15 05:03:18 +00:00
Mandeep Singh Grang 1b9337f16b [COFF, ARM64] Add __byteswap intrinsics
Reviewers: rnk, efriedma, ssijaric, TomTan, haripul

Reviewed By: efriedma

Subscribers: javed.absar, cfe-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D56685

llvm-svn: 351147
2019-01-15 01:26:26 +00:00
Mandeep Singh Grang 0055f80b3f [COFF, ARM64] Add __nop intrinsic
Reviewers: rnk, efriedma, TomTan, haripul, ssijaric

Reviewed By: rnk, efriedma

Subscribers: javed.absar, kristof.beyls, cfe-commits

Differential Revision: https://reviews.llvm.org/D56671

llvm-svn: 351135
2019-01-14 23:26:01 +00:00
Craig Topper 49488407aa [X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead.
Fixes PR40259

llvm-svn: 351036
2019-01-14 08:46:51 +00:00
Craig Topper bdbe5c7dc7 [X86] Make the pointer arguments to avx512 gather/scatter intrinsics 'void*' to match gcc and Intel's documentation.
The avx2 gather intrinsics are documented to use 'int', 'long long', 'float', or 'double' *. So I'm leaving those. This matches gcc.

llvm-svn: 350696
2019-01-09 07:36:01 +00:00
Craig Topper cd9e232a4d Recommit r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins."
The MSVC limit hit in AutoUpgrade.cpp has been worked around for now.

llvm-svn: 350568
2019-01-07 21:00:41 +00:00
Craig Topper 33c9088783 Revert r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins."
Had to revert the LLVM patch this depends on to fix a MSVC compiler limit in AutoUpgrade.cpp

llvm-svn: 350563
2019-01-07 19:39:25 +00:00
Craig Topper e34f2bb807 [X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins.
Differential Revision: https://reviews.llvm.org/D56365

llvm-svn: 350555
2019-01-07 19:10:22 +00:00
Ulrich Weigand 22ca9c628a [SystemZ] Fix wrong codegen caused by typos in vecintrin.h
The following two bugs in SystemZ high-level vector intrinsics are
fixes by this patch:

- The float case of vec_insert_and_zero should generate a VLLEZF
  pattern, but currently erroneously generates VLLEZLF.

- The float and double versions of vec_orc erroneously generate
  and-with-complement instead of or-with-complement.

The patch also fixes a couple of typos in the associated test.

llvm-svn: 349751
2018-12-20 13:09:09 +00:00
Craig Topper 1f2b181689 [Builltins][X86] Provide implementations of __lzcnt16, __lzcnt, __lzcnt64 for MS compatibility. Remove declarations from intrin.h and implementations from lzcntintrin.h
intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature.

For MS compatibility we should always have these builtins available regardless of X86 being the target or the CPU support the lzcnt instruction. The backends should be able to gracefully fallback to something support even if its just shifts and bit ops.

Unfortunately, gcc also implements 2 of the 3 function names here on X86 when lzcnt feature is enabled.

This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep the gcc compatibility the two intrinsics that collided have been turned into macros that use the X86 specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error because -ms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available.

Should fix PR40014

Differential Revision: https://reviews.llvm.org/D55677

llvm-svn: 349098
2018-12-14 00:21:02 +00:00
Craig Topper 6d7a7ef9eb [X86] Remove the addcarry builtins. Leaving only the addcarryx builtins since that matches gcc.
The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required adx feature.

This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if its not a burden.

llvm-svn: 348738
2018-12-10 06:07:59 +00:00
Artem Belevich 43cfce88b4 [CUDA] Added missing 'inline' for functions defined in a header.
llvm-svn: 348662
2018-12-07 22:20:53 +00:00
Stefan Granitz 5c1ec2e9cb [CMake] Store path to vendor-specific headers in clang-headers target property
Summary:
LLDB.framework wants a copy these headers. With this change LLDB can easily glob for the list of files:
```
get_target_property(clang_include_dir clang-headers RUNTIME_OUTPUT_DIRECTORY)
file(GLOB_RECURSE clang_vendor_headers RELATIVE ${clang_include_dir} "${clang_include_dir}/*")
```

By default `RUNTIME_OUTPUT_DIRECTORY` is unset for custom targets like `clang-headers`.

Reviewers: aprantl, JDevlieghere, davide, friss, dexonsmith

Reviewed By: JDevlieghere

Subscribers: mgorny, #lldb, cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D55128

llvm-svn: 348116
2018-12-03 10:34:25 +00:00
Nemanja Ivanovic 2447baff84 [PowerPC] Vector load/store builtins overstate alignment of pointers
A number of builtins in altivec.h load/store vectors from pointers to scalar
types. Currently they just cast the pointer to a vector pointer, but expressions
like that have the alignment of the target type. Of course, the input pointer
did not have that alignment so this triggers UBSan (and rightly so).

This resolves https://bugs.llvm.org/show_bug.cgi?id=39704

Differential revision: https://reviews.llvm.org/D54787

llvm-svn: 347556
2018-11-26 14:35:38 +00:00
Zi Xuan Wu 71c35e13c3 [PowerPC] [Clang] [AltiVec] The second parameter of vec_sr function should be modulo the number of bits in the element
The second parameter of vec_sr function is representing shift bits and it should be modulo the number of bits in the element like what vec_sl does now. 
This is actually required by the ABI:

Each element of the result vector is the result of logically right shifting the corresponding
element of ARG1 by the number of bits specified by the value of the corresponding
element of ARG2, modulo the number of bits in the element. The bits that are shifted out
are replaced by zeros.

Differential Revision: https://reviews.llvm.org/D54087

llvm-svn: 346471
2018-11-09 03:35:32 +00:00
Andrew Savonichev 3fee351867 [OpenCL] Add support of cl_intel_device_side_avc_motion_estimation extension
Summary:
Documentation can be found at https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_device_side_avc_motion_estimation.txt

Patch by Kristina Bessonova


Reviewers: Anastasia, yaxunl, shafik

Reviewed By: Anastasia

Subscribers: arphaman, sidorovd, AlexeySotkin, krisb, bader, asavonic, cfe-commits

Differential Revision: https://reviews.llvm.org/D51484

llvm-svn: 346392
2018-11-08 11:25:41 +00:00
Andrew Savonichev 3b12b7e702 Revert r346326 [OpenCL] Add support of cl_intel_device_side_avc_motion_estimation
This patch breaks Index/opencl-types.cl LIT test:

Script:
--
: 'RUN: at line 1';   stage1/bin/c-index-test -test-print-type llvm/tools/clang/test/Index/opencl-types.cl -cl-std=CL2.0 | stage1/bin/FileCheck llvm/tools/clang/test/Index/opencl-types.cl
--
Command Output (stderr):
--
llvm/tools/clang/test/Index/opencl-types.cl:3:26: warning: unsupported OpenCL extension 'cl_khr_fp16' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:4:26: warning: unsupported OpenCL extension 'cl_khr_fp64' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:8:9: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:11:8: error: declaring variable of type 'half' is not allowed
llvm/tools/clang/test/Index/opencl-types.cl:15:3: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:16:3: error: use of type 'double4' (vector of 4 'double' values) requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:26:26: warning: unsupported OpenCL extension 'cl_khr_gl_msaa_sharing' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:35:44: error: use of type '__read_only image2d_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:36:49: error: use of type '__read_only image2d_array_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:37:49: error: use of type '__read_only image2d_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:38:54: error: use of type '__read_only image2d_array_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled

llvm-svn: 346338
2018-11-07 18:34:19 +00:00
Andrew Savonichev 35dfce723c [OpenCL] Add support of cl_intel_device_side_avc_motion_estimation extension
Summary:
Documentation can be found at https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_device_side_avc_motion_estimation.txt

Patch by Kristina Bessonova


Reviewers: Anastasia, yaxunl, shafik

Reviewed By: Anastasia

Subscribers: arphaman, sidorovd, AlexeySotkin, krisb, bader, asavonic, cfe-commits

Differential Revision: https://reviews.llvm.org/D51484

llvm-svn: 346326
2018-11-07 15:44:01 +00:00
Reid Kleckner 902fa41188 [MS] Zero out ECX in __cpuid in intrin.h
Summary:
Some CPUID leafs depend on the value of ECX as well as EAX, but we left
it uninitialized.

Originally reported as https://crbug.com/901547

Reviewers: craig.topper, hans

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D54171

llvm-svn: 346265
2018-11-06 20:45:26 +00:00
Mandeep Singh Grang 574caddc0d [COFF, ARM64] Implement InterlockedDecrement*_* builtins
This is eight in a series of patches to move intrinsic definitions out of intrin.h.

Differential: https://reviews.llvm.org/D54068
llvm-svn: 346208
2018-11-06 05:07:43 +00:00
Mandeep Singh Grang fdf74d9751 [COFF, ARM64] Implement InterlockedIncrement*_* builtins
This is seventh in a series of patches to move intrinsic definitions out of intrin.h.

Differential: https://reviews.llvm.org/D54067
llvm-svn: 346207
2018-11-06 05:05:32 +00:00
Mandeep Singh Grang c89157b5c1 [COFF, ARM64] Implement InterlockedAnd*_* builtins
This is sixth in a series of patches to move intrinsic definitions out of intrin.h.

Differential: https://reviews.llvm.org/D54066
llvm-svn: 346206
2018-11-06 05:03:13 +00:00
Mandeep Singh Grang 806f10701b [COFF, ARM64] Implement InterlockedXor*_* builtins
This is fifth in a series of patches to move intrinsic definitions out of intrin.h.

Note: This was reviewed and approved in D54065 but somehow that diff was messed
up. Committing this again with the proper diff.

llvm-svn: 346205
2018-11-06 04:55:20 +00:00
Mandeep Singh Grang d9f70b1495 Revert "[COFF, ARM64] Implement InterlockedXor*_* builtins"
This reverts commit cc3d3cd0fbeb88412d332354c261ff139c4ede6b.

llvm-svn: 346192
2018-11-06 01:14:24 +00:00
Mandeep Singh Grang d8a4455d97 [COFF, ARM64] Implement InterlockedXor*_* builtins
Summary: This is fifth in a series of patches to move intrinsic definitions out of intrin.h.

Reviewers: rnk, efriedma, mstorsjo, TomTan

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D54065

llvm-svn: 346191
2018-11-06 01:12:29 +00:00
Mandeep Singh Grang ec62b31e2c [COFF, ARM64] Implement InterlockedOr*_* builtins
This is fourth in a series of patches to move intrinsic definitions out of intrin.h.

llvm-svn: 346190
2018-11-06 01:11:25 +00:00
Mandeep Singh Grang 6b880689f0 [COFF, ARM64] Implement InterlockedCompareExchange*_* builtins
Summary: This is third in a series of patches to move intrinsic definitions out of intrin.h.

Reviewers: rnk, efriedma, mstorsjo, TomTan

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D54062

llvm-svn: 346189
2018-11-06 00:36:48 +00:00
Mandeep Singh Grang 7fa07e554d [COFF, ARM64] Implement InterlockedExchange*_* builtins
Summary: Windows SDK needs these intrinsics to be proper builtins.  This is second in a series of patches to move intrinsic defintions out of intrin.h.

Reviewers: rnk, mstorsjo, efriedma, TomTan

Reviewed By: rnk, efriedma

Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D54046

llvm-svn: 346044
2018-11-02 21:18:23 +00:00
Eli Friedman b262d1631e [ARM64] [Windows] Implement _InterlockedExchangeAdd*_* builtins.
These apparently need to be proper builtins to handle the Windows
SDK.

Differential Revision: https://reviews.llvm.org/D53916

llvm-svn: 345779
2018-10-31 21:31:09 +00:00
Andrew Savonichev 70b6bbe2bb [OpenCL] Remove PIPE_RESERVE_ID_VALID_BIT from opencl-c.h
Summary:
PIPE_RESERVE_ID_VALID_BIT is implementation defined, so lets not keep it in the header. 

Previously the topic was discussed here: https://reviews.llvm.org/D32896 

Reviewers: Anastasia, yaxunl

Reviewed By: Anastasia

Subscribers: cfe-commits, asavonic, bader

Differential Revision: https://reviews.llvm.org/D52658

llvm-svn: 345051
2018-10-23 17:05:29 +00:00
Andrew Savonichev 700c3bea9e [OpenCL] Add cl_intel_planar_yuv extension
Just adding a preprocessor #define for the extension.

Patch by Alexey Sotkin and Dmitry Sidorov

Phabricator review: https://reviews.llvm.org/D51402

llvm-svn: 345044
2018-10-23 16:13:16 +00:00
Craig Topper eae26bf737 [X86] Add more intrinsics to match icc.
This adds
_mm_loadu_epi8, _mm256_loadu_epi8, _mm512_loadu_epi8
_mm_loadu_epi16, _mm256_loadu_epi16, _mm512_loadu_epi16
_mm_storeu_epi8, _mm256_storeu_epi8, _mm512_storeu_epi8
_mm_storeu_epi16, _mm256_storeu_epi16, _mm512_storeu_epi16

llvm-svn: 344862
2018-10-20 19:28:52 +00:00
Craig Topper 58508be3c0 [X86] Add missing intrinsics to match icc.
This adds
_mm_and_epi32, _mm_and_epi64
_mm_andnot_epi32, _mm_andnot_epi64
_mm_or_epi32, _mm_or_epi64
_mm_xor_epi32, _mm_xor_epi64
_mm256_and_epi32, _mm256_and_epi64
_mm256_andnot_epi32, _mm256_andnot_epi64
_mm256_or_epi32, _mm256_or_epi64
_mm256_xor_epi32, _mm256_xor_epi64
_mm_loadu_epi32, _mm_loadu_epi64
_mm_load_epi32, _mm_load_epi64
_mm256_loadu_epi32, _mm256_loadu_epi64
_mm256_load_epi32, _mm256_load_epi64
_mm512_loadu_epi32, _mm512_loadu_epi64
_mm512_load_epi32, _mm512_load_epi64
_mm_storeu_epi32, _mm_storeu_epi64
_mm_store_epi32, _mm_load_epi64
_mm256_storeu_epi32, _mm256_storeu_epi64
_mm256_store_epi32, _mm256_load_epi64
_mm512_storeu_epi32, _mm512_storeu_epi64
_mm512_store_epi32,V _mm512_load_epi64

llvm-svn: 344861
2018-10-20 19:28:50 +00:00
Mandeep Singh Grang 2147b1af95 [COFF, ARM64] Add _ReadStatusReg and_WriteStatusReg intrinsics
Reviewers: rnk, compnerd, mstorsjo, efriedma, TomTan, haripul, javed.absar

Reviewed By: efriedma

Subscribers: dmajor, kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D53115

llvm-svn: 344765
2018-10-18 23:35:35 +00:00
Mandeep Singh Grang df7929676d [COFF, ARM64] Add _InterlockedAdd intrinsic
Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, chrib, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D52811

llvm-svn: 343894
2018-10-05 21:57:41 +00:00
Mandeep Singh Grang ecc82ef0c2 [COFF, ARM64] Add __getReg intrinsic
Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: peter.smith, efriedma, kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D52838

llvm-svn: 343824
2018-10-04 22:32:42 +00:00
Matt Arsenault a01151294a OpenCL: Mark printf format string argument
Fixes not warning on format string errors.

llvm-svn: 343653
2018-10-03 02:01:19 +00:00
Craig Topper 716e8e6858 [X86] Add more of the icc unaligned load/store to/from 128 bit vector intrinsics
Summary:
This patch adds
_mm_loadu_si32
_mm_loadu_si16
_mm_storeu_si64
_mm_storeu_si32
_mm_storeu_si16

We already had _mm_load_si64.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D52665

llvm-svn: 343388
2018-09-29 17:49:42 +00:00
Craig Topper 6ad9220067 [X86] Add the movbe instruction intrinsics from icc.
These intrinsics exist in icc. They can be found on the Intel Intrinsics Guide website.

All the backend support is in place to pattern match a load+bswap or a bswap+store pattern to the MOVBE instructions. So we just need to get the frontend to emit the correct IR. The pointer arguments in icc are declared as void so I had to jump through a packed struct to forcing a specific alignment on the load/store. Same trick we use in the unaligned vector load/store intrinsics

Differential Revision: https://reviews.llvm.org/D52586

llvm-svn: 343343
2018-09-28 17:09:51 +00:00
Craig Topper fb5d9f2849 [X86] For lzcnt/tzcnt intrinsics use cttz/ctlz intrinsics with zero_undef flag set to false.
Previously we used a select and the zero_undef=true intrinsic. In -O2 this pattern will get optimized to zero_undef=false. But in -O0 this optimization won't happen. This results in a compare and cmov being wrapped around a tzcnt/lzcnt instruction.

By using the zero_undef=false intrinsic directly without the select, we can improve the -O0 codegen to just an lzcnt/tzcnt instruction.

Differential Revision: https://reviews.llvm.org/D52392

llvm-svn: 343126
2018-09-26 17:01:44 +00:00
Artem Belevich 44ecb0e3c2 [CUDA] Added basic support for compiling with CUDA-10.0
llvm-svn: 342924
2018-09-24 23:10:44 +00:00
Craig Topper d88f76a891 [X86] Add ktest intrinsics to match gcc and icc.
These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc.

Includes these intrinsics:
_ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8
_ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8
_ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8
_ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8

llvm-svn: 341265
2018-08-31 22:29:56 +00:00
Craig Topper 42a4d0822e [X86] Add k-mask conversion and load/store instrinsics to match gcc and icc.
This adds:
_cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
_cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
_load_mask8, _load_mask16, _load_mask32, _load_mask64
_store_mask8, _store_mask16, _store_mask32, _store_mask64

These are currently missing from the Intel Intrinsics Guide webpage.

llvm-svn: 341251
2018-08-31 20:41:06 +00:00
Craig Topper 2aa8efc820 [X86] Add kshift intrinsics to match gcc and icc.
This adds the following intrinsics:
_kshiftli_mask8
_kshiftli_mask16
_kshiftli_mask32
_kshiftli_mask64
_kshiftri_mask8
_kshiftri_mask16
_kshiftri_mask32
_kshiftri_mask64

llvm-svn: 341234
2018-08-31 18:22:52 +00:00
Craig Topper a65bf65e0b [X86] Add kadd intrinsics to match gcc and icc.
This adds the following intrinsics:
_kadd_mask64
_kadd_mask32
_kadd_mask16
_kadd_mask8

These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc.

llvm-svn: 340879
2018-08-28 22:32:14 +00:00
Craig Topper cb5fd56c7f [X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks.
This matches gcc and icc despite not being documented in the Intel Intrinsics Guide.

llvm-svn: 340798
2018-08-28 06:28:25 +00:00
Craig Topper c330ca8611 [X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers.
This also adds a second intrinsic name for the 16-bit mask versions.

These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed.

llvm-svn: 340719
2018-08-27 06:20:22 +00:00
Craig Topper 9a022280b5 [X86] Remove min_vector_width 512 from some intrinsics that operate only on k-registers.
llvm-svn: 340718
2018-08-27 06:20:20 +00:00
Craig Topper e0b5d4cd9d [X86] Rename __DEFAULT_FN_ATTRS to a__DEFAULT_FN_ATTRS512 in avx512dqintrin.h and avx512bwintrin.h.
This is preparation for adding removing min_vector_width 512 from some intrinsics.

llvm-svn: 340717
2018-08-27 06:20:19 +00:00
Craig Topper 266b858705 [X86] Undef __DEFAULT_FN_ATTRS in avx512fintrin.h.
Fixes test failure after r340713

llvm-svn: 340714
2018-08-27 05:44:45 +00:00
Craig Topper f821f5314e [X86] Don't set min_vector_width to 512 on intrinsics that only operate on k registers.
llvm-svn: 340713
2018-08-27 05:27:15 +00:00
Nico Weber b2c53d3393 Make __shiftleft128 / __shiftright128 real compiler built-ins.
r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h.
Microsoft's STL plans on using these functions, and they're using intrin0.h
which just has declarations of built-ins to not pull in the huge intrin.h
header in the standard library headers. That requires that these functions are
real built-ins.

https://reviews.llvm.org/D50907

llvm-svn: 340048
2018-08-17 17:19:06 +00:00
Craig Topper 72a7606433 [X86] Remove masking from the 512-bit paddus/psubus builtins. Use a select builtin instead.
llvm-svn: 339845
2018-08-16 07:28:06 +00:00
Craig Topper 0609d1e211 [X86] Remove masking from the 512-bit padds and psubs builtins. Use select builtin instead.
llvm-svn: 339843
2018-08-16 06:20:29 +00:00
Pirama Arumuga Nainar 3c1a7bc290 [Headers] Define *_HAS_SUBNORM for FLT, DBL, LDBL
Summary:
These macros are defined in the C11 standard and can be defined based on
the __*_HAS_DENORM__ default macros.

Reviewers: bruno, rsmith, doug.gregor

Subscribers: llvm-commits, enh, srhines

Differential Revision: https://reviews.llvm.org/D37302

llvm-svn: 339284
2018-08-08 20:38:38 +00:00
Martin Storsjo 1662647995 [Headers] Expand _Unwind_Exception for SEH on MinGW/x86_64
This matches how GCC defines this struct.

Differential Revision: https://reviews.llvm.org/D50380

llvm-svn: 339170
2018-08-07 20:02:40 +00:00
Louis Dionne 58529c3f57 [clang] Fix broken include_next in float.h
Summary:
The code defines __FLOAT_H and then includes the next <float.h>, which is
guarded on __FLOAT_H so it gets skipped entirely. This commit uses the header
guard __CLANG_FLOAT_H, like other headers (such as limits.h) do.

Reviewers: jfb

Subscribers: dexonsmith, cfe-commits

Differential Revision: https://reviews.llvm.org/D50276

llvm-svn: 339016
2018-08-06 14:29:47 +00:00
Fangrui Song 6907ce2f8f Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338291
2018-07-30 19:24:48 +00:00
Nico Weber e5898c1911 [ms] Add __shiftleft128 / __shiftright128 intrinsics
Carefully match the pattern matched by ISel so that this produces shld / shrd
(unless Subtarget->isSHLDSlow() is true).

Thanks to Craig Topper for providing the LLVM IR pattern that gets successfully
matched.

Fixes PR37755.

llvm-svn: 337619
2018-07-20 21:02:09 +00:00
Artem Belevich fa07bb646c [CUDA] Provide integer SIMD functions for CUDA-9.2
CUDA-9.2 made all integer SIMD functions into compiler builtins,
so clang no longer has access to the implementation of these
functions in either headers of libdevice and has to provide
its own implementation.

This is mostly a 1:1 mapping to a corresponding PTX instructions
with an exception of vhadd2/vhadd4 that don't have an equivalent
instruction and had to be implemented with a bit hack.

Performance of this implementation will be suboptimal for SM_50
and newer GPUs where PTXAS generates noticeably worse code for
the SIMD instructions compared to the code it generates
for the inline assembly generated by nvcc (or used to come
with CUDA headers).

Differential Revision: https://reviews.llvm.org/D49274

llvm-svn: 337587
2018-07-20 17:44:34 +00:00
Mandeep Singh Grang 0054f48b44 [COFF] Add more missing MSVC ARM64 intrinsics
Summary:
Added the following intrinsics:
_BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64
_InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64,
_InterlockedExchangeAdd64, _InterlockedExchangeSub64,
_InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64.

Reviewers: compnerd, mstorsjo, rnk, javed.absar

Reviewed By: mstorsjo

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D49445

llvm-svn: 337327
2018-07-17 22:03:24 +00:00
Eric Christopher 83225b4075 Remove unnecessary trailing ; in macro intrinsic definition.
llvm-svn: 337321
2018-07-17 20:22:17 +00:00
Mikhail Dvoretckii d1bf9ef0c7 [X86] Lowering integer truncation intrinsics to native IR
This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to
native IR in cases where the result's length is less than 128 bits.

The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for
128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps)
instructions are generated instead

Differential Revision: https://reviews.llvm.org/D48712

llvm-svn: 336643
2018-07-10 08:22:44 +00:00
Craig Topper 36ab775cc1 [X86] Use masked the masked scalar fma builtins to implement the default rounding version of the fma intrinsics.
The rounding mode is checked in CGBuiltin.cpp to generate the correct intrinsic call.

Making this switch switchs the masking to use the i8 bitcast to <8 x i1> and extract i1 version of the IR for the mask. Previously we ended up with a scalar 'and' plus an icmp.

llvm-svn: 336637
2018-07-10 04:38:29 +00:00
Craig Topper 638426fc36 [X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that is suitable for use in scalar mask intrinsics.
This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.

This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.

I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.

llvm-svn: 336622
2018-07-10 00:37:25 +00:00
Craig Topper 74c10e3236 [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.

Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.

To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.

To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.

To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.

To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.

There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.

Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.

Differential Revision: https://reviews.llvm.org/D48617

llvm-svn: 336583
2018-07-09 19:00:16 +00:00
Craig Topper 0a485d13a4 [X86] Remove some unnecessarily escaped new lines from avx512fintrin.h
llvm-svn: 336499
2018-07-07 22:03:19 +00:00
Craig Topper 3e720a302c [X86] Fix a few intrinsics that were ignoring their rounding mode argument and hardcoded _MM_FROUND_CUR_DIRECTION internally.
I believe these have been broken since their introduction into clang.

I've enhanced the tests for these intrinsics to using a real rounding mode and checking all the intrinsic arguments instead of just the name.

llvm-svn: 336498
2018-07-07 22:03:16 +00:00
Craig Topper 218da62091 [X86] Change _mm512_shuffle_pd and _mm512_shuffle_ps to use target specific shuffle builtins instead of generic __builtin_shufflevector.
I added the builtins for 128, 256, and 512 bits recently but looks like I failed to convert to using the 512 bit one.

llvm-svn: 336488
2018-07-07 17:03:34 +00:00
Craig Topper 5cbeeedd27 [X86] Fix various type mismatches in intrinsic headers and intrinsic tests that cause extra bitcasts to be emitted in the IR.
Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.

llvm-svn: 336487
2018-07-07 17:03:32 +00:00
Craig Topper 10f20fc42b [X86] Add missing scalar fma intrinsics with rounding, but no mask.
We had the mask versions of the rounding intrinsics, but not one without masking.

Also change the rounding tests to not use the CUR_DIRECTION rounding mode.

llvm-svn: 336470
2018-07-06 22:08:43 +00:00
Craig Topper 0029470dde [X86] Correct the width of mask arguments in intrinsic headers and tests.
All of these found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.

Some of these were real bugs where we lost bits from the user input:
_mm512_mask_broadcast_f32x8
_mm512_maskz_broadcast_f32x8
_mm512_mask_broadcast_i32x8
_mm512_maskz_broadcast_i32x8
_mm256_mask_cvtusepi16_storeu_epi8

llvm-svn: 336042
2018-06-30 06:05:17 +00:00
Craig Topper 0e9de769a0 [X86] Remove masking from the avx512 rotate builtins. Use a select builtin instead.
llvm-svn: 336036
2018-06-30 01:32:14 +00:00
Justin Lebar 2a192abaec [CUDA] Make __host__/__device__ min/max overloads constexpr in C++14.
Summary: Tests in a separate change to the test-suite.

Reviewers: rsmith, tra

Subscribers: lahwaacz, sanjoy, cfe-commits

Differential Revision: https://reviews.llvm.org/D48151

llvm-svn: 336026
2018-06-29 22:28:09 +00:00
Justin Lebar 5cb41c2acf [CUDA] Make min/max shims host+device.
Summary:
Fixes PR37753: min/max can't be called from __host__ __device__
functions in C++14 mode.

Testcase in a separate test-suite commit.

Reviewers: rsmith

Subscribers: sanjoy, lahwaacz, cfe-commits

Differential Revision: https://reviews.llvm.org/D48036

llvm-svn: 336025
2018-06-29 22:27:56 +00:00
Craig Topper 8bf793fb35 [X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead.
llvm-svn: 335945
2018-06-29 05:43:33 +00:00
Craig Topper 1763dbb278 [X86] Correct the inline assembly implementations of __movsb/w/d/q and __stosw/d/q to mark registers/memory as modified
The inline assembly for these didn't mark that edi, esi, ecx are modified by movs/stos instruction. It also didn't mark that memory is modified.

This issue was reported to llvm-dev last year http://lists.llvm.org/pipermail/cfe-dev/2017-November/055863.html but no bug was ever filed.

Differential Revision: https://reviews.llvm.org/D48448

llvm-svn: 335270
2018-06-21 18:56:30 +00:00
Craig Topper b2431c6c33 [Intrinsics] Add/move some builtin declarations in intrin.h to get ms-intrinsics.c to not issue warnings
ud2 and int2c were missing declarations entirely. And the bitscans were only under x86_64, but they seem to be in BuiltinsARM.def as well and are tested by ms_intrinsics.c

Differential Revision: https://reviews.llvm.org/D48187

llvm-svn: 335259
2018-06-21 17:07:04 +00:00
Craig Topper ddfe69cc99 [X86] Rewrite the add/mul/or/and reduction intrinsics to make better use of other intrinsics and remove undef shuffle indices.
Similar to what was done to max/min recently.

These already reduced the vector width to 256 and 128 bit as we go unlike the original max/min code.

Differential Revision: https://reviews.llvm.org/D48346

llvm-svn: 335253
2018-06-21 16:41:28 +00:00
Craig Topper 2da60bc231 [X86] Remove masking from the 512-bit floating point max/min builtins. Use select in IR instead.
llvm-svn: 335200
2018-06-21 05:01:01 +00:00
Craig Topper d334615c23 [X86] Undefine _mm512_mask_reduce_operator macro in avx512fintrin.h before redefining it.
llvm-svn: 335086
2018-06-20 00:31:39 +00:00
Craig Topper 61495b3650 Recommit r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible.""
Test has been updated to reflect the IRGen.

llvm-svn: 335075
2018-06-19 21:00:30 +00:00
Craig Topper 79b13a0348 Revert r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible."
The test changes are failing the buildbot and its going to take me some time to fix it.

llvm-svn: 335072
2018-06-19 19:37:07 +00:00
Craig Topper 873afd0262 [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible.
We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX.

For v16i32 and floating point we have legacy 128/256 bit instructions we can use.

I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.

Differential Revision: https://reviews.llvm.org/D47401

llvm-svn: 335070
2018-06-19 19:13:54 +00:00
Craig Topper 31730ae761 [X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files.
The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin since the allowed range also got multiplied by 8.

This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic.

Fixes the remaining issue from PR37795.

llvm-svn: 334773
2018-06-14 22:02:35 +00:00
Craig Topper b521dc3acf [X86] Add inline assembly versions of _InterlockedExchange_HLEAcquire/Release and _InterlockedCompareExchange_HLEAcquire/Release for MSVC compatibility.
Clang/LLVM doesn't have a way to pass an HLE hint through to the X86 backend to emit HLE prefixed instructions. So this is a good short term fix.

Differential Revision: https://reviews.llvm.org/D47672

llvm-svn: 334751
2018-06-14 18:43:52 +00:00
Tomasz Krupa 82aa42af49 [X86] Lowering Mask Scalar intrinsics to native IR (Clang part)
Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls
to native IR.

Reviewers: craig.topper, RKSimon, spatel, sroland

Reviewed By: craig.topper

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47979

llvm-svn: 334741
2018-06-14 17:36:23 +00:00
Craig Topper 2527c378c6 [X86] Remove masking from avx512vbmi2 concat and shift by immediate builtins. Use select builtins instead.
llvm-svn: 334577
2018-06-13 07:19:28 +00:00
Craig Topper 91bbe98757 [X86] Remove masking from dbpsadbw builtins, use select builtin instead.
llvm-svn: 334385
2018-06-11 06:18:29 +00:00
Craig Topper 3614b41a8e [X86] Remove masking from the 512-bit packed floating point add/sub/mul/div builtins. Use select in IR instead.
llvm-svn: 334359
2018-06-10 06:01:42 +00:00
Craig Topper 88097d9355 [X86] Add back some masked vector truncate builtins. Custom IRgen a a few others.
I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl.

By using special buitlins we can emit a select without using the 128/256-bit select builtins.

llvm-svn: 334331
2018-06-08 21:50:08 +00:00
Craig Topper 5f50f33806 [X86] Fold masking into subvector extract builtins.
I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features.

The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported.

llvm-svn: 334330
2018-06-08 21:50:07 +00:00
Craig Topper 03f4f04b91 [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking.
llvm-svn: 334311
2018-06-08 18:00:25 +00:00
Craig Topper 9d3962f4f1 [X86] Change immediate type for some builtins from char to int.
These builtins are all handled by CGBuiltin.cpp so it doesn't much matter what the immediate type is, but int matches the intrinsic spec.

llvm-svn: 334310
2018-06-08 18:00:22 +00:00
Craig Topper 422a1bbb84 [X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking.
llvm-svn: 334266
2018-06-08 07:18:33 +00:00
Craig Topper 03de166ccd [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
llvm-svn: 334265
2018-06-08 06:13:16 +00:00
Craig Topper 573dab1553 [X86] Fix some typecasts in intrinsic headers that I messed up in r334261.
This was caught by the Header tests, but not the CodeGen tests.

llvm-svn: 334264
2018-06-08 04:09:14 +00:00
Craig Topper 3428beeb2f [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.

llvm-svn: 334261
2018-06-08 03:24:47 +00:00
Craig Topper acf5601961 [X86] Add builtins for vpermilps/pd instructions to enable target feature checking.
llvm-svn: 334256
2018-06-08 00:59:27 +00:00
Craig Topper 7d17d7278b [X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range.
llvm-svn: 334249
2018-06-08 00:00:21 +00:00
Craig Topper 9392136414 [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable target feature checking and immediate range checking.
llvm-svn: 334244
2018-06-07 23:03:08 +00:00
Reid Kleckner aa46ed9278 [MS] Re-add support for the ARM interlocked bittest intrinscs
Adds support for these intrinsics, which are ARM and ARM64 only:
  _interlockedbittestandreset_acq
  _interlockedbittestandreset_rel
  _interlockedbittestandreset_nf
  _interlockedbittestandset_acq
  _interlockedbittestandset_rel
  _interlockedbittestandset_nf

Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.

llvm-svn: 334239
2018-06-07 21:39:04 +00:00
Craig Topper e56819eb69 [X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking.
We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.

I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.

llvm-svn: 334237
2018-06-07 21:27:41 +00:00
Craig Topper d3623155a2 [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.

This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.

llvm-svn: 334208
2018-06-07 17:28:03 +00:00
Craig Topper b92c77d176 [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins.
Summary:
We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't.

I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.

The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.

I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.

Reviewers: tkrupa, RKSimon, spatel, GBuella

Reviewed By: tkrupa

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47724

llvm-svn: 334159
2018-06-07 02:46:02 +00:00
Artem Belevich a6cd1a994c [CUDA] Replace 'nv_weak' attributes in CUDA headers with 'weak'.
Differential Revision: https://reviews.llvm.org/D47804

llvm-svn: 334108
2018-06-06 17:52:55 +00:00
Craig Topper f3914b74c1 [X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics.
Previously we were just using extended vector operations in the header file.

This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load.

By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count.

User code still has the option to use extended vector operations themselves if they need non-constant indexing.

llvm-svn: 334057
2018-06-06 00:24:55 +00:00
Craig Topper 9b0c61e9de [X86] Mark all the builtins and intrinsics that require MMX and an SSE feature as requiring both mmx and the sse feature.
Previously we only checked the sse feature, but this means that if you passed -mno-mmx, the builtins/intrinsics wouldn't be disabled in the frontend and would instead fail backend isel.

llvm-svn: 333980
2018-06-05 03:12:14 +00:00
Reid Kleckner 1d9c249db5 Reimplement the bittest intrinsic family as builtins with inline asm
We need to implement _interlockedbittestandset as a builtin for
windows.h, so we might as well do the whole family. It reduces code
duplication anyway.

Fixes PR33188, a long standing bug in our bittest implementation
encountered by Chakra.

llvm-svn: 333978
2018-06-05 01:33:40 +00:00
Reid Kleckner 89fbd55145 Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms."
Adding __attribute__((aligned(32))) to __m256 breaks the implementation
of _mm256_loadu_ps on Windows. On Windows, alignment attributes have
higher precedence than packing attributes.

We also might want to carefully consider the consequences of changing
our vector typedefs, since many users copy them and invent their own
new, non-Intel specific vector type names.

llvm-svn: 333958
2018-06-04 21:39:20 +00:00
Craig Topper 95ed88a1a9 [X86] Avoid passing _mm_undefined* to builtin_shufflevector if we are able to pass the first input a second time.
This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later.

llvm-svn: 333944
2018-06-04 19:28:09 +00:00
Craig Topper ae5f0a8a78 [X86] Fix a couple places that were using macro arguments twice when of the usages could just be undefined.
One of the arguments was being used when the passthru argument is unused due to the mask being all 1s. But in that case the actual value doesn't matter so we should use undef instead to avoid expanding the macro argument unnecessarily.

llvm-svn: 333865
2018-06-04 02:56:18 +00:00
Craig Topper 1359a46c48 [X86] Remove superfluous escaped new lines from intrinsic files.
llvm-svn: 333858
2018-06-03 23:31:01 +00:00
Craig Topper b41c6b854b [X86] Explicitly make the arguments to __slwpcb intrinsic 'void'.
This is the correct way to say it takes no arguments in C.

llvm-svn: 333855
2018-06-03 22:05:19 +00:00
Craig Topper 6fb26f93ef [X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsics and a __builtin_shufflevector call.
llvm-svn: 333853
2018-06-03 19:42:59 +00:00
John McCall 280c656031 Cap "voluntary" vector alignment at 16 for all Darwin platforms.
This fixes two major problems:
- We were not capping vector alignment as desired on 32-bit ARM.
- We were using different alignments based on the AVX settings on
  Intel, so we did not have a consistent ABI.

This is an ABI break, but we think we can get away with it because
vectors tend to be used mostly in inline code (which is why not having
a consistent ABI has not proven disastrous on Intel).

Intel's AVX types are specified as having 32-byte / 64-byte alignment,
so align them explicitly instead of relying on the base ABI rule.
Note that this sort of attribute is stripped from template arguments
in template substitution, so there's a possibility that code templated
over vectors will produce inadequately-aligned objects.  The right
long-term solution for this is for alignment attributes to be
interpreted as true qualifiers and thus preserved in the canonical type.

llvm-svn: 333791
2018-06-01 21:34:26 +00:00
Craig Topper d521d16ba4 [X86] Rewrite avx512vbmi unmasked and maskz macro intrinsics to be wrappers around their __builtin function with appropriate arguments rather than just passing arguments to the masked intrinsic.
This is more consistent with all of our other avx512 macro intrinsics.

It also fixes a bad cast where an argument was casted to mmask8 when it should have been a mmask16.

llvm-svn: 333778
2018-06-01 18:26:35 +00:00
Martin Storsjo cad7a5f1aa [X86] Remove leftover semicolons at end of macros
This was missed in a few places in SVN r333613, causing compilation
errors if these macros are used e.g. as parameter to a function.

llvm-svn: 333734
2018-06-01 09:40:50 +00:00
Craig Topper a6dd2faaea [X86] Make 512-bit unmasked load/store builtins more like their 128/256-bit equivalents.
Previously we were just passing -1 mask to the masked builtin. This changes it to the more generic way that the 128/256 bit use.

llvm-svn: 333626
2018-05-31 05:02:08 +00:00
Tim Shen f811de484c [X86] Fix wrong intrinsic semantic.
llvm-svn: 333617
2018-05-31 01:51:07 +00:00
Craig Topper cbf3929bc9 [X86] Fix some places where macro arguments to intrinsics weren't cast to _m512(i|d)/_m256(i|d/_m128(i|d) first.
The majority of the cases were correct. This fixes the few that weren't.

I also removed some superfluous parentheses in non-macros that confused by attempts at grepping for missing casts.

llvm-svn: 333615
2018-05-31 01:24:40 +00:00
Craig Topper c633867944 [X86] Remove __extension__ from macro intrinsics when its not needed.
I think this is a holdover from when we used to declare variables inside the macros. And then its been copy and pasted forward for years every time a new macro intrinsic gets added.

Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.

It also removed a bogus error message on another test.

llvm-svn: 333613
2018-05-31 00:51:20 +00:00
Craig Topper 73d1d403e2 [X86] Use C style comments in intrinsic headers for overall consistency.
Most of the origial comments used C style /* */ comments, but some C++ // comments had snuck in over time.

Still need to convert all the doxygen comments. Which is much harder to do.

llvm-svn: 333603
2018-05-30 22:33:21 +00:00
Craig Topper 63ec0ea7bc [X86] Add __extension__ to a bunch of places in our intrinsic headers that fail if you run it through -pedantic -ansi.
All of these are lines that create a 'compound literal' to concatenate elements together.

llvm-svn: 333593
2018-05-30 21:08:27 +00:00
Craig Topper c5ec55e921 [X86] Simplify the implementation of _mm_sqrt_ss, _mm_rcp_ss, and _mm_rsqrt_ss.
We don't need the insertion back into the original vector at the end. The builtin already understands that.

This is different than _mm_sqrt_sd which takes two arguments and we do need to insert.

llvm-svn: 333572
2018-05-30 18:27:07 +00:00
Craig Topper dff5b311af [X86] Reduce the number of setzero intrinsics to just the set defined by the Intel Intrinsics Guide.
We had quite a few for different element sizes of integers sometimes with strange target features attached to them.

We only need a single version for each of _m128i, _m256i, and _m512i with the target feature that first introduced those types.

llvm-svn: 333568
2018-05-30 18:02:11 +00:00
Craig Topper 819f2a20c3 [X86] Remove 'return' from a bunch of intrinsics that return void and use a builtin that returns void.
Found by running the intrinsic headers through -pedantic -ansi.

llvm-svn: 333563
2018-05-30 17:23:45 +00:00
Gabor Buella 70d8d51073 [X86] Lowering FMA intrinsics to native IR (Clang part)
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.

Patch by tkrupa

Reviewers: craig.topper, sroland, spatel, RKSimon

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D47444

llvm-svn: 333555
2018-05-30 15:27:49 +00:00
Hans Wennborg 64fcb04950 Add missing curly from r333509
llvm-svn: 333515
2018-05-30 08:05:24 +00:00
Craig Topper f6e79c6d3f [X86] Remove masking from the AVX512VNNI builtins. Use a select in IR instead.
llvm-svn: 333509
2018-05-30 05:26:04 +00:00
Craig Topper 3d9305f28b [X86] Fix the names of a bunch of icelake intrinsics.
Mostly this fixes the names of all the 128-bit intrinsics to start with _mm_ instead of _mm128_ as is the convention and what the Intel docs say.

This also fixes the name of the bitshuffle intrinsics to say epi64 for 128 and 256 bit versions.

llvm-svn: 333497
2018-05-30 03:38:15 +00:00
Craig Topper 68a272d501 [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operand instead.
llvm-svn: 333387
2018-05-29 03:26:38 +00:00
Craig Topper f99532faee Revert r333347 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible."
This wasn't supposed to be commited yet.

llvm-svn: 333349
2018-05-26 18:57:41 +00:00
Craig Topper 387b1423db [X86] Remove mask from avx512ifma builtins. Use a select instruction instead.
This reduces from 12 builtins to 6 since we no longer need a mask and maskz version.

llvm-svn: 333348
2018-05-26 18:55:26 +00:00
Craig Topper e091523c82 [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible.
Summary:
We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX.

For v16i32 and floating point we have legacy 128/256 bit instructions we can use.

I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.

Reviewers: RKSimon, spatel, GBuella

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47401

llvm-svn: 333347
2018-05-26 18:55:24 +00:00
Gabor Buella 078bb99a90 [x86] invpcid intrinsic
An intrinsic for an old instruction, as described in the Intel SDM.

Reviewers: craig.topper, rnk

Reviewed By: craig.topper, rnk

Differential Revision: https://reviews.llvm.org/D47142

llvm-svn: 333256
2018-05-25 06:34:42 +00:00
Craig Topper 26df8c48da [X86] Fix a bad cast in _mm512_mask_abs_epi32 and _mm512_maskz_abs_epi32.
llvm-svn: 333211
2018-05-24 17:32:49 +00:00
Craig Topper 3e7d8dfae3 [X86] Move the include of clzerointrin.h from immintrin.h back to x86intrin.h.
This is an AMD intrinsic not an Intel intrinsic so it shouldn't be in immintrin.h

llvm-svn: 333124
2018-05-23 21:04:26 +00:00
Raphael Isemann 83bdfe5e51 [modules] Mark __wmmintrin_pclmul.h/__wmmintrin_aes.h as textual
Summary:
Since clang r332929 these two headers throw errors when included from somewhere else than their wrapper header. It seems marking them as textual is the best way to fix the builds.

Fixes this new module build error:
    While building module '_Builtin_intrinsics' imported from ...:
    In file included from <module-includes>:2:
    In file included from lib/clang/7.0.0/include/immintrin.h:54:
    In file included from lib/clang/7.0.0/include/wmmintrin.h:29:
    lib/clang/7.0.0/include/__wmmintrin_aes.h:25:2: error: "Never use <__wmmintrin_aes.h> directly; include <wmmintrin.h> instead."
    #error "Never use <__wmmintrin_aes.h> directly; include <wmmintrin.h> instead."

Reviewers: rsmith, v.g.vassilev, craig.topper

Reviewed By: craig.topper

Subscribers: craig.topper, cfe-commits

Differential Revision: https://reviews.llvm.org/D47277

llvm-svn: 333123
2018-05-23 20:59:46 +00:00
Craig Topper 664af9bc34 [X86] Move all Intel defined intrinsic includes into immintrin.h
This matches the Intel documentation which shows them available by importing immintrin.h. x86intrin.h also includes immintrin.h so anyone including x86intrin.h will still get them.

This is different than gcc, but I don't think we were a perfect match there already. I'm unclear what gcc's policy is about how they choose which to add things to.

Differential Revision: https://reviews.llvm.org/D47182

llvm-svn: 333110
2018-05-23 18:32:58 +00:00
Ekaterina Romanova 9b412153bf [DOXYGEN] Formatting changes for better intrinsics documentation rendering
(1) I added some \see cross-references to a few select intrinsics that are related (and have the same or similar semantics). 

(2) pmmintrin.h, smmintrin.h, xmmintrin.h have very few minor formatting changes. They make rendering of our intrinsics documentation better. 

llvm-svn: 333065
2018-05-23 06:33:22 +00:00
Craig Topper 9bed2e6953 [X86] Undef the vector reduction helper macros when we're done with them.
These are implementation helper macros we shouldn't expose them to user code if we don't need to.

llvm-svn: 333064
2018-05-23 06:31:36 +00:00
Craig Topper 39e0347e6a [X86] In the floating point max reduction intrinsics, negate infinity before feeding it to set1.
Previously we negated the whole vector after splatting infinity. But its better to negate the infinity before splatting. This generates IR with the negate already folded with the infinity constant.

llvm-svn: 333062
2018-05-23 05:51:52 +00:00
Craig Topper f2043b08b4 [X86] Remove mask argument from more builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead.
llvm-svn: 333061
2018-05-23 04:51:54 +00:00
Craig Topper 25caca72f4 [X86] As mentioned in post-commit feedback in D47174, move the 128 bit f16c intrinsics into f16cintrin.h and remove __emmintrin_f16c.h
These were included in emmintrin.h to match Intel Intrinsics Guide documentation. But this is because icc is capable of emulating them on targets that don't support F16C using library calls. Clang/LLVM doesn't have this emulation support. So it makes more sense to include them in immintrin.h instead.

I've left a comment behind to hopefully deter someone from trying to move them again in the future.

llvm-svn: 333033
2018-05-22 22:19:19 +00:00
Craig Topper 8e3689c066 [X86] Remove mask argument from some builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead.
llvm-svn: 333027
2018-05-22 20:48:24 +00:00
Craig Topper 99be40c363 [X86] Another attempt at fixing the intrinsic module map for rr333014.
llvm-svn: 333026
2018-05-22 20:48:20 +00:00
Craig Topper 1fceff99d8 [X86] Add two missing #endif directives to immintrin.h that should have been in r333014.
llvm-svn: 333023
2018-05-22 20:33:04 +00:00
Craig Topper a82ee182d4 [X86] Add __emmintrin_f16c.h to module map and CMakeLists.
I missed this in r333014

llvm-svn: 333020
2018-05-22 20:19:05 +00:00
Craig Topper 34c8c0d858 [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h
Intel documents the 128-bit versions as being in emmintrin.h and the 256-bit version as being in immintrin.h.

This patch makes a new __emmtrin_f16c.h to hold the 128-bit versions to be included from emmintrin.h. And makes the existing f16cintrin.h contain the 256-bit versions and include it from immintrin.h with an error if its included directly.

Differential Revision: https://reviews.llvm.org/D47174

llvm-svn: 333014
2018-05-22 18:54:19 +00:00
Craig Topper d97a95ae2c [X86] Prevent inclusion of __wmmintrin_aes.h and __wmmintrin_pclmul.h without including wmmintrin.h
llvm-svn: 332929
2018-05-22 02:02:13 +00:00
Craig Topper 842171de36 [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.
I believe this is safe assuming default default FP environment. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions.

We already do something similar for scalar conversions.

Differential Revision: https://reviews.llvm.org/D46863

llvm-svn: 332882
2018-05-21 20:19:17 +00:00
Craig Topper 092d42557b [X86] Remove some preprocessor feature checks from intrinsic headers
Summary:
These look to be a couple things that weren't removed when we switched to target attribute.

The popcnt makes including just smmintrin.h also include popcntintrin.h. The popcnt file itself already contains target attrributes.

The prefetch ones are just wrappers around __builtin_prefetch which we have graceful fallbacks for in the backend if the exact instruction isn't available. So there's no reason to hide them. And it makes them available in functions that have the write target attribute but not a -march command line flag.

Reviewers: echristo, RKSimon, spatel, DavidKreitzer

Reviewed By: echristo

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47029

llvm-svn: 332830
2018-05-21 06:07:49 +00:00
Craig Topper 55b4067350 [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead.
Someday maybe we'll use selects for all the builtins.

llvm-svn: 332825
2018-05-20 23:34:10 +00:00
Craig Topper b809fc3d63 [X86] Fix a bad cast from mask16 to mask8 in _mm256_mask_cvtepi16_epi8 introduced in r332266.
llvm-svn: 332738
2018-05-18 17:18:46 +00:00
Justin Lebar c70121b99b [CUDA] Make std::min/max work when compiling in C++14 mode with a C++11 stdlib.
Reviewers: rsmith

Subscribers: sanjoy, cfe-commits, tra

Differential Revision: https://reviews.llvm.org/D46993

llvm-svn: 332619
2018-05-17 16:12:42 +00:00
Craig Topper 9d146bbaf7 [X86] Revert part of r332266: Use __builtin_convertvector to replace some of the avx512 truncate builtins.
The masking doesn't work right in the backend for the ones that produce byte or word elements without avx512bw.

llvm-svn: 332322
2018-05-15 03:17:52 +00:00
Craig Topper 25de41cfbc [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins.
As long as the destination type is a 256 or 128 bit vector with the same number of elements we can use __builtin_convertvector to directly generate trunc IR instruction which will be handled natively by the backend.

Differential Revision: https://reviews.llvm.org/D46742

llvm-svn: 332266
2018-05-14 17:50:40 +00:00
Craig Topper 8cb261e353 [X86] Use select instrution and fpextend in the implementation of _mm512_mask_cvtps_pd and _mm512_maskz_cvtps_pd.
llvm-svn: 332213
2018-05-14 04:57:46 +00:00
Craig Topper daaf105f86 [X86] Use __builtin_convertvector to implement _mm512_cvtps_pd.
If we're using default rounding mode we can let __builtin_convertvector to generate an fpextend. This matches 128 and 256 bit.

If we're using the version that takes an explicit rounding mode argument we would need to look at the immediate to see if its CUR_DIRECTION.

llvm-svn: 332210
2018-05-14 04:05:06 +00:00
Craig Topper 6fa91254e4 [X86] Emit better code for _mm_cvtu32_sd, _mm_cvtu64_sd, _mm_cvtu32_ss, and _mm_cvtu64_ss.
We can use direct C code for these that will use uitofp and insertelement instructions.

For the versions that take an explicit rounding mode we can't do this.

llvm-svn: 332203
2018-05-13 23:03:30 +00:00
Craig Topper 65ef3280b8 [X86] Fix the file header name on fmaintrin.h
llvm-svn: 332108
2018-05-11 17:37:40 +00:00
Gabor Buella 9cd4f16601 [X86] Assume alignment of movdir64b dst argument
Reviewers: craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D46683

llvm-svn: 332091
2018-05-11 14:22:04 +00:00
Gabor Buella 3a7571259e [X86] ptwrite intrinsic
Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D46540

llvm-svn: 331962
2018-05-10 07:28:54 +00:00
Craig Topper 74ac0eda68 [X86] Change the implementation of scalar masked load/store intrinsics to not use a 512-bit intermediate vector.
This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.

Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.

llvm-svn: 331958
2018-05-10 05:43:43 +00:00
Adrian Prantl 9fc8faf9e6 Remove \brief commands from doxygen comments.
This is similar to the LLVM change https://reviews.llvm.org/D46290.

We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46320

llvm-svn: 331834
2018-05-09 01:00:01 +00:00
Gabor Buella 5e52fa9035 [x86] Introduce the encl[u|s|v] intrinsics
Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D46435

llvm-svn: 331743
2018-05-08 07:12:34 +00:00
Gabor Buella b0f310d51d [x86] Introduce the pconfig intrinsic
Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D46431

llvm-svn: 331740
2018-05-08 06:49:41 +00:00
Craig Topper 66ef4185bc [X86] Make _mm256_gf2p8mul_epi8 require avx features since its 256 bits.
Without this we throw an error on the header file instead of the user code when the right features aren't enabled in clang.

Rename the other DEFAULT_FN_ATTRS defines to _Z for 512-bit since I used _Y for this case.

llvm-svn: 331682
2018-05-07 21:47:11 +00:00
Craig Topper 934f86a848 [X86] Fix some inconsistent formatting in the first line of our intrinsics headers.
Some were too long and some were too short.

llvm-svn: 331559
2018-05-04 21:45:25 +00:00
Volodymyr Sapsai 2d77119f72 Revert "Emit an error when mixing <stdatomic.h> and <atomic>"
It reverts r331378 as it caused test failures

    ThreadSanitizer-x86_64 :: Darwin/gcd-groups-destructor.mm
    ThreadSanitizer-x86_64 :: Darwin/libcxx-shared-ptr-stress.mm
    ThreadSanitizer-x86_64 :: Darwin/xpc-race.mm

Only clang part of the change is reverted, libc++ part remains as is because it
emits error less aggressively.

llvm-svn: 331392
2018-05-02 19:52:07 +00:00
Volodymyr Sapsai c0a278aada Emit an error when mixing <stdatomic.h> and <atomic>
Atomics in C and C++ are incompatible at the moment and mixing the
headers can result in confusing error messages.

Emit an error explicitly telling about the incompatibility. Introduce
the macro `__ALLOW_STDC_ATOMICS_IN_CXX__` that allows to choose in C++
between C atomics and C++ atomics.

rdar://problem/27435938

Reviewers: rsmith, EricWF, mclow.lists

Reviewed By: mclow.lists

Subscribers: jkorous-apple, christof, bumblebritches57, JonChesterfield, smeenai, cfe-commits

Differential Revision: https://reviews.llvm.org/D45470

llvm-svn: 331378
2018-05-02 17:50:43 +00:00
Gabor Buella a51e0c2243 [X86] directstore and movdir64b intrinsics
Reviewers: spatel, craig.topper, RKSimon

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45984

llvm-svn: 331249
2018-05-01 10:05:42 +00:00
Craig Topper e95bde33df [X86] Add support for _mm512_mullox_epi64 and _mm512_mask_mullox_epi64 intrinsics to match icc.
On AVX512F targets we'll produce an emulated sequence using 3 pmuludqs with shifts and adds. On AVX512DQ we'll use vpmulld.

Fixes PR37140.

llvm-svn: 330923
2018-04-26 05:38:39 +00:00
Artem Belevich 3cce307799 [CUDA] Enable CUDA compilation with CUDA-9.2
Differential Revision: https://reviews.llvm.org/D45827

llvm-svn: 330753
2018-04-24 18:23:19 +00:00
Craig Topper 5f1d10e26e [X86] Add recently added intrinsic headers to the module map.
llvm-svn: 330744
2018-04-24 17:40:49 +00:00
Craig Topper bd16b11255 [X86] Consistently use double underscore at the beginning of the include guards in our intrinsic headers.
Most files used double underscore, but a few used single. This converges them all to double.

llvm-svn: 330743
2018-04-24 17:40:47 +00:00
Craig Topper ce281a41b5 [X86] Remove '#ifdef __x86_64__' around mask_set1_epi64 intrinsics.
The unmasked versions already didn't have this restrction. I don't think gcc or icc limit these to 64-bit mode so we shouldn't either.

llvm-svn: 330681
2018-04-24 03:36:08 +00:00
Gabor Buella eba6c42e66 [X86] WaitPKG intrinsics
Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45254

llvm-svn: 330463
2018-04-20 18:44:33 +00:00
Artem Belevich 5832eb4cfd [CUDA] added missing __ldg(const signed char *)
Differential Revision: https://reviews.llvm.org/D45780

llvm-svn: 330280
2018-04-18 18:33:43 +00:00
Gabor Buella b220dd2b6c [X86] Introduce cldemote intrinsic
Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45257

llvm-svn: 329993
2018-04-13 07:37:24 +00:00
Gabor Buella e708a09e21 [X86] Introduce wbinvd intrinsic
A previously missing intrinsic for an old instruction.

Reviewers: craig.topper, echristo

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45311

llvm-svn: 329937
2018-04-12 18:42:02 +00:00
Gabor Buella a052016ef2 [x86] wbnoinvd intrinsic
The WBNOINVD instruction writes back all modified
cache lines in the processor’s internal cache to main memory
but does not invalidate (flush) the internal caches.

Reviewers: craig.topper, zvi, ashlykov

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D43817

llvm-svn: 329848
2018-04-11 20:09:09 +00:00
Craig Topper dcdac965f1 [X86] Fix typo in intrinsic header file __mask16->__mmask16 from r329775.
llvm-svn: 329777
2018-04-11 05:17:14 +00:00
Craig Topper 2575454fe9 [X86] Replace 512-bit masked pmaddubsw and pmaddwd intrinsic with unmasked intrinsic and a select.
This makes it consistent with the 128/256-bit functions.

Someday maybe we'll have all the masking moved to selects.

llvm-svn: 329775
2018-04-11 04:55:10 +00:00
Alexander Kornienko 2a8c18d991 Fix typos in clang
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

llvm-svn: 329399
2018-04-06 15:14:32 +00:00
Douglas Yung 17d2ef90e0 [DOXYGEN] Fix doxygen and content issues in mmintrin.h
- Fix instruction mappings/listings for various intrinsics

This patch was made by Craig Flores

Differential Revision: https://reviews.llvm.org/D41517

llvm-svn: 327090
2018-03-09 00:38:51 +00:00
Craig Topper 260ed8647a [X86] Fix typo in cpuid.h, bit_AVX51SER->bit_AVX512ER.
llvm-svn: 326807
2018-03-06 16:06:44 +00:00
Alexander Ivchenko 9d3b45301f [x86][CET] Introduce _get_ssp, _inc_ssp intrinsics
Summary:
The _get_ssp intrinsic can be used to retrieve the
shadow stack pointer, independent of the current arch -- in
contract with the rdsspd and the rdsspq intrinsics.
Also, this intrinsic returns zero on CPUs which don't
support CET. The rdssp[d|q] instruction is decoded as nop,
essentially just returning the input operand, which is zero.
Example result of compilation:

```
xorl    %eax, %eax
movl    %eax, %ecx
rdsspq  %rcx         # NOP when CET is not supported
movq    %rcx, %rax   # return zero
```

Reviewers: craig.topper

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D43814

llvm-svn: 326689
2018-03-05 11:30:28 +00:00
Craig Topper 21f66a3f6b [X86] Remove some masked cvt builtins that can be replaced with legacy sse/avx buiiltins and a select.
llvm-svn: 326039
2018-02-24 18:55:13 +00:00
Craig Topper 5dc6ca8e5b [X86] Remove __builtin_ia32_permvarsf256_mask and __builtin_ia32_permvarsi256_mask and use the avx2 unmasked versions and a select instead.
llvm-svn: 326022
2018-02-24 06:46:42 +00:00
Artem Belevich df38f155ec [CUDA] Added missing functions.
Initial commit missed sincos(float), llabs() and few atomics that we
used to pull in from device_functions.hpp, which we no longer include.

Differential Revision: https://reviews.llvm.org/D43602

llvm-svn: 325814
2018-02-22 18:40:52 +00:00
Artem Belevich 4dbea99137 [CUDA] Added missing __threadfence_system() function for CUDA9.
llvm-svn: 325626
2018-02-20 21:25:30 +00:00
Craig Topper 0a70c3c7af [X86] Remove mask from 512 bit pmulhrsw/pmulhw/pmulhuw builtins.
We now use a vselect node in IR around an unmasked builtin. This makes it consistent with the 128 and 256 bit versions.

llvm-svn: 325560
2018-02-20 07:28:18 +00:00
Ekaterina Romanova f28751849e [DOXYGEN] There was a request in the review D41507 to change the notation for hex numbers in doxygen documentation from <...>h to 0x<...>. Both of these notations were used in x86 intrinsics documentation. I promised to change them to 0x<...> for consistency.
Differential Revision: https://reviews.llvm.org/D41888

llvm-svn: 325312
2018-02-16 03:11:35 +00:00
Artem Belevich fbc56a904f [CUDA] Added partial support for CUDA-9.1
Clang can use CUDA-9.1 now, though new APIs (are not implemented yet.

The major change is that headers in CUDA-9.1 went through substantial
changes that started in CUDA-9.0 which required substantial changes
in the cuda compatibility headers provided by clang.

There are two major issues:
* CUDA SDK no longer provides declarations for libdevice functions.
* A lot of device-side functions have become nvcc's builtins and
  CUDA headers no longer contain their implementations.

This patch changes the way CUDA headers are handled if we compile
with CUDA 9.x. Both 9.0 and 9.1 are affected.

* Clang provides its own declarations of libdevice functions.
* For CUDA-9.x clang now provides implementation of device-side
  'standard library' functions using libdevice.

This patch should not affect compilation with CUDA-8. There may be
some observable differences for CUDA-9.0, though they are not expected
to affect functionality.

Tested: CUDA test-suite tests for all supported combinations of:
        CUDA: 7.0,7.5,8.0,9.0,9.1
        GPU: sm_20, sm_35, sm_60, sm_70

Differential Revision: https://reviews.llvm.org/D42513

llvm-svn: 323713
2018-01-30 00:00:12 +00:00
Hiroshi Inoue 1019f8a98e [NFC] fix trivial typos in comments
"to to" -> "to"

llvm-svn: 323627
2018-01-29 05:15:18 +00:00
Craig Topper 8cdb94901d [X86] Add rdpid command line option and intrinsics.
Summary: This patch adds -mrdpid/-mno-rdpid and the rdpid intrinsic. The corresponding LLVM commit has already been made.

Reviewers: RKSimon, spatel, zvi, AndreiGrischenko

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D42272

llvm-svn: 323047
2018-01-20 18:36:52 +00:00
Abderrazek Zaafrani ce8746d178 [AArch64] Add ARMv8.2-A FP16 scalar intrinsics
https://reviews.llvm.org/D41792

llvm-svn: 323006
2018-01-19 23:11:18 +00:00
Douglas Yung 46474dae4d [DOXYGEN] Fix doxygen and content issues in xmmintrin.h
- Fix inaccurate instruction listings.
- Fix small issues in _mm_getcsr and _mm_setcsr.
- Fix description of NaN handling in comparison intrinsics.
- Fix inaccurate description of _mm_movemask_pi8.
- Fix inaccurate instruction mappings.
- Fix typos.
- Clarify wording on some descriptions.
- Fix bit ranges in return value.
- Fix typo in _mm_move_ms intrinsic instruction since it operates on singe-precision values, not double.
- This patch was made by Craig Flores

Differential Revision: https://reviews.llvm.org/D41523

llvm-svn: 322778
2018-01-17 22:53:15 +00:00
Craig Topper f517f1a516 [X86] Implement old kunpck intrinsics using vector ops on vXi1 instead of integer shift/and/or
Summary:
kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowers them as by operation on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this patter and turn it into a concat_vector operation.

I think it makes more sense to keep the IR implementation closer to vector operations on vXi1. Given that we expect these builtins to be used around other builtins that operate on k-registers which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1 leaving only the vector operations.

Reviewers: RKSimon, spatel, zvi, jina.nahias

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D42016

llvm-svn: 322461
2018-01-14 19:23:50 +00:00
Sven van Haastregt 774355e321 [OpenCL] Reorder the CLK_sRGBx/sRGBA defines, NFC
Swap them so that all channel order defines are ordered according to
their values.

llvm-svn: 322278
2018-01-11 14:05:38 +00:00
Douglas Yung 7ff91421b4 [DOXYGEN] Fix doxygen and content issues in avxintrin.h
- Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed".
- Fix a few typos and errors found during review.
- Restore new line endings.

This patch was made by Craig Flores

llvm-svn: 322027
2018-01-08 21:21:17 +00:00
Douglas Yung 4c549c31bb [DOXYGEN] Fix doxygen and content issues in smmintrin.h
- Fix formatting issue due to hyphenated terms at line breaks.
- Fix typo

This patch was made by Craig Flores

Differential Revision: https://reviews.llvm.org/D41520

llvm-svn: 321671
2018-01-02 20:45:29 +00:00
Douglas Yung df1e9ef156 [DOXYGEN] Fix doxygen and content issues in pmmintrin.h
- Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed".

This patch was made by Craig Flores

Differential Revision: https://reviews.llvm.org/D41518

llvm-svn: 321670
2018-01-02 20:42:53 +00:00
Douglas Yung 0686df106c [DOXYGEN] Fix doxygen and content issues in emmintrin.h
- Fixed innaccurate instruction mappings for various intrinsics.
- Fixed description of NaN handling in comparison intrinsics.
- Unify description of _mm_store_pd1 to match _mm_store1_pd.
- Fix incorrect wording in various intrinsic descriptions. Previously the descriptions used "low-order" and "high-order" when the intended meaning was "even-indexed" and "odd-indexed".
- Fix typos.
- Add missing italics command (\a) for params and fixed some parameter spellings.

This patch was made by Craig Flores

Differential Revision: https://reviews.llvm.org/D41516

llvm-svn: 321669
2018-01-02 20:39:29 +00:00
Coby Tayree a09663a5c1 [x86][icelake][vbmi2]
added vbmi2 feature recognition
added intrinsics support for vbmi2 instructions
_mm[128,256,512]_mask[z]_compress_epi[16,32]
_mm[128,256,512]_mask_compressstoreu_epi[16,32]
_mm[128,256,512]_mask[z]_expand_epi[16,32]
_mm[128,256,512]_mask[z]_expandloadu_epi[16,32]
_mm[128,256,512]_mask[z]_sh[l,r]di_epi[16,32,64]
_mm[128,256,512]_mask_sh[l,r]dv_epi[16,32,64]
matching a similar work on the backend (D40206)
Differential Revision: https://reviews.llvm.org/D41557

llvm-svn: 321487
2017-12-27 11:25:07 +00:00
Coby Tayree 3d9c88cfec [x86][icelake][vnni]
added vnni feature recognition
added intrinsics support for VNNI instructions
_mm256_mask_dpbusd_epi32
_mm256_maskz_dpbusd_epi32
_mm256_dpbusd_epi32
_mm256_mask_dpbusds_epi32
_mm256_maskz_dpbusds_epi32
_mm256_dpbusds_epi32
_mm256_mask_dpwssd_epi32
_mm256_maskz_dpwssd_epi32
_mm256_dpwssd_epi32
_mm256_mask_dpwssds_epi32
_mm256_maskz_dpwssds_epi32
_mm256_dpwssds_epi32
_mm128_mask_dpbusd_epi32
_mm128_maskz_dpbusd_epi32
_mm128_dpbusd_epi32
_mm128_mask_dpbusds_epi32
_mm128_maskz_dpbusds_epi32
_mm128_dpbusds_epi32
_mm128_mask_dpwssd_epi32
_mm128_maskz_dpwssd_epi32
_mm128_dpwssd_epi32
_mm128_mask_dpwssds_epi32
_mm128_maskz_dpwssds_epi32
_mm128_dpwssds_epi32
_mm512_mask_dpbusd_epi32
_mm512_maskz_dpbusd_epi32
_mm512_dpbusd_epi32
_mm512_mask_dpbusds_epi32
_mm512_maskz_dpbusds_epi32
_mm512_dpbusds_epi32
_mm512_mask_dpwssd_epi32
_mm512_maskz_dpwssd_epi32
_mm512_dpwssd_epi32
_mm512_mask_dpwssds_epi32
_mm512_maskz_dpwssds_epi32
_mm512_dpwssds_epi32
matching a similar work on the backend (D40208)
Differential Revision: https://reviews.llvm.org/D41558

llvm-svn: 321484
2017-12-27 10:37:51 +00:00
Coby Tayree 2268576fa0 [x86][icelake][bitalg]
added bitalg feature recognition
added intrinsics support for bitalg instructions
_mm512_popcnt_epi16
_mm512_mask_popcnt_epi16
_mm512_maskz_popcnt_epi16
_mm512_popcnt_epi8
_mm512_mask_popcnt_epi8
_mm512_maskz_popcnt_epi8
_mm512_mask_bitshuffle_epi64_mask
_mm512_bitshuffle_epi64_mask
_mm256_popcnt_epi16
_mm256_mask_popcnt_epi16
_mm256_maskz_popcnt_epi16
_mm128_popcnt_epi16
_mm128_mask_popcnt_epi16
_mm128_maskz_popcnt_epi16
_mm256_popcnt_epi8
_mm256_mask_popcnt_epi8
_mm256_maskz_popcnt_epi8
_mm128_popcnt_epi8
_mm128_mask_popcnt_epi8
_mm128_maskz_popcnt_epi8
_mm256_mask_bitshuffle_epi32_mask
_mm256_bitshuffle_epi32_mask
_mm128_mask_bitshuffle_epi16_mask
_mm128_bitshuffle_epi16_mask
matching a similar work on the backend (D40222)
Differential Revision: https://reviews.llvm.org/D41564

llvm-svn: 321483
2017-12-27 10:01:00 +00:00
Coby Tayree cf96c876c6 [x86][icelake][vpclmulqdq]
added vpclmulqdq feature recognition
added intrinsics support for vpclmulqdq instructions
  _mm256_clmulepi64_epi128
  _mm512_clmulepi64_epi128
matching a similar work on the backend (D40101)
Differential Revision: https://reviews.llvm.org/D41573

llvm-svn: 321480
2017-12-27 09:00:31 +00:00
Coby Tayree f4811ebc39 [x86][icelake][gfni]
added gfni feature recognition
added intrinsics support for gfni instructions
  _mm_gf2p8affineinv_epi64_epi8
  _mm_mask_gf2p8affineinv_epi64_epi8
  _mm_maskz_gf2p8affineinv_epi64_epi8
  _mm256_gf2p8affineinv_epi64_epi8
  _mm256_mask_gf2p8affineinv_epi64_epi8
  _mm256_maskz_gf2p8affineinv_epi64_epi8
  _mm512_gf2p8affineinv_epi64_epi8
  _mm512_mask_gf2p8affineinv_epi64_epi8
  _mm512_maskz_gf2p8affineinv_epi64_epi8
  _mm_gf2p8affine_epi64_epi8
  _mm_mask_gf2p8affine_epi64_epi8
  _mm_maskz_gf2p8affine_epi64_epi8
  _mm256_gf2p8affine_epi64_epi8
  _mm256_mask_gf2p8affine_epi64_epi8
  _mm256_maskz_gf2p8affine_epi64_epi8
  _mm512_gf2p8affine_epi64_epi8
  _mm512_mask_gf2p8affine_epi64_epi8
  _mm512_maskz_gf2p8affine_epi64_epi8
  _mm_gf2p8mul_epi8
  _mm_mask_gf2p8mul_epi8
  _mm_maskz_gf2p8mul_epi8
  _mm256_gf2p8mul_epi8
  _mm256_mask_gf2p8mul_epi8
  _mm256_maskz_gf2p8mul_epi8
  _mm512_gf2p8mul_epi8
  _mm512_mask_gf2p8mul_epi8
  _mm512_maskz_gf2p8mul_epi8
matching a similar work on the backend (D40373)
Differential Revision: https://reviews.llvm.org/D41582

llvm-svn: 321477
2017-12-27 08:37:47 +00:00
Coby Tayree a1e5f0c339 [x86][icelake][vaes]
added vaes feature recognition
added intrinsics support for vaes instructions, matching a similar work on the backend (D40078)
  _mm256_aesenc_epi128
  _mm512_aesenc_epi128
  _mm256_aesenclast_epi128
  _mm512_aesenclast_epi128
  _mm256_aesdec_epi128
  _mm512_aesdec_epi128
  _mm256_aesdeclast_epi128
  _mm512_aesdeclast_epi128

llvm-svn: 321474
2017-12-27 08:16:54 +00:00
Artem Belevich 3cebc738b6 [CUDA] More fixes for __shfl_* intrinsics.
* __shfl_{up,down}* uses unsigned int for the third parameter.
* added [unsigned] long overloads for non-sync shuffles.

Differential Revision: https://reviews.llvm.org/D41521

llvm-svn: 321326
2017-12-21 23:52:09 +00:00
Craig Topper 170de4b4ba [X86] Allow _mm_prefetch (both the header implementation and the builtin) to accept bit 2 which is supposed to indicate the prefetched addresses will be written to
Add the appropriate _MM_HINT_ET0/ET1 defines to match gcc.

llvm-svn: 321325
2017-12-21 23:50:22 +00:00
Craig Topper 54b3f718e4 [X86] Add more CPUID bits to cpuid.h to match gcc and support icelake features.
llvm-svn: 321129
2017-12-20 00:46:09 +00:00
Craig Topper 798f2c037c [X86] Add the two files I forgot to commit in r320915.
llvm-svn: 320916
2017-12-16 06:10:24 +00:00
Craig Topper b846d1ff76 [X86] Add builtins and tests for 128 and 256 bit vpopcntdq.
llvm-svn: 320915
2017-12-16 06:02:31 +00:00
Stephan Bergmann feed26ff07 In stdbool.h, define bool, false, true only in gnu++98
GCC has meanwhile corrected that with the similar
<https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=216679> "C++11
explicitly forbids macros for bool, true and false."

Differential Revision: https://reviews.llvm.org/D40167

llvm-svn: 320135
2017-12-08 08:28:08 +00:00
Artem Belevich a659d2590e [NVPTX,CUDA] Added llvm.nvvm.fns intrinsic and matching __nvvm_fns builtin in clang.
Differential Revision: https://reviews.llvm.org/D40872

llvm-svn: 319909
2017-12-06 17:50:05 +00:00
Artem Belevich 4631ef1e43 [CUDA] Added overloads for '[unsigned] long' variants of shfl builtins.
Differential Revision: https://reviews.llvm.org/D40871

llvm-svn: 319908
2017-12-06 17:40:35 +00:00
Jina Nahias eb0829155f [x86][AVX512] Lowering kunpack intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D39720), implements the lowering of X86 kunpack intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D39719

Change-Id: Id5d3cb394ad33b98be79a6783d1d15569e2b798d
llvm-svn: 319777
2017-12-05 15:42:47 +00:00
Shoaib Meenai 669cae1f28 [clang] Use add_llvm_install_targets
Use this function to create the install targets rather than doing so
manually, which gains us the `-stripped` install targets to perform
stripped installations.

Differential Revision: https://reviews.llvm.org/D40675

llvm-svn: 319489
2017-11-30 22:35:02 +00:00
Artem Belevich 05914bf482 [CUDA] Tweak CUDA wrappers to make cuda-9 work with libc++
CUDA-9 headers check for specific libc++ version and ifdef out
some of the definitions we need if LIBCPP_VERSION >= 3800.

Differential Revision: https://reviews.llvm.org/D40198

llvm-svn: 319485
2017-11-30 22:22:21 +00:00
Alexey Sotkin b833bf6ae1 [OpenCL] Add extensions cl_intel_subgroups and cl_intel_subgroups_short
Reviewers: yaxunl, Anastasia, bader

Reviewed By: Anastasia, bader

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D39936

llvm-svn: 319011
2017-11-27 09:14:17 +00:00
Oren Ben Simhon fec21ec0c6 Control-Flow Enforcement Technology - Shadow Stack and Indirect Branch Tracking support (Clang side)
Shadow stack solution introduces a new stack for return addresses only.
The stack has a Shadow Stack Pointer (SSP) that points to the last address to which we expect to return.
If we return to a different address an exception is triggered.
This patch includes shadow stack intrinsics as well as the corresponding CET header.
It includes CET clang flags for shadow stack and Indirect Branch Tracking.

For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Differential Revision: https://reviews.llvm.org/D40224

Change-Id: I79ad0925a028bbc94c8ecad75f6daa2f214171f1
llvm-svn: 318995
2017-11-26 12:34:54 +00:00
Craig Topper 9e032ed55a [X86] Use separate builtins for fma4 scalar intrinsics. Use negations to remove some of the scalar fma3 builtins.
fma4 instructions zero the upper bits of the xmm register. fma3 instructions leave the bits unmodified. This requires separate builtins for the different semantics.

While we're cleaning up the scalar builtins this also removes the fma3 fmsub/fnmadd/fnmsub builtins by using negates in the header file.

llvm-svn: 318985
2017-11-25 19:32:12 +00:00
Justin Lebar 370c766e40 [CUDA] Remove implementations of nexttoward.
Summary:
__builtin_nexttoward lowers to a libcall, e.g. nexttowardf(), that CUDA
does not have.

Rather than try to implement it, we simply remove these functions --
nvcc doesn't support them either, and nextafter, which does work, does
essentially the same thing on GPUs, because GPUs don't have long double.

Reviewers: tra

Subscribers: cfe-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D40152

llvm-svn: 318494
2017-11-17 01:15:43 +00:00
Uriel Korach 5b2b71d909 [X86] test/testn intrinsics lowering to IR. clang side
Change Header files of the intrinsics for lowering test and testn intrinsics to IR code.
Removed test and testn builtins from clang

Differential Revision: https://reviews.llvm.org/D38737

llvm-svn: 318035
2017-11-13 12:50:52 +00:00
Jina Nahias dca979194d [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D38671), implements the lowering of X86 shuffle i/f intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38672

Change-Id: I9b3c2f2b34323bd9ccb21d0c1832f848b88ec047
llvm-svn: 318025
2017-11-13 09:15:31 +00:00
Justin Lebar 7c56dfe441 [CUDA] Fix std::min on device side to return the min, not the max.
Summary:
How embarrassing.

This is tested in the test-suite -- fix to come there in a separate
patch.

Reviewers: tra

Subscribers: sanjoy, cfe-commits

Differential Revision: https://reviews.llvm.org/D39817

llvm-svn: 317961
2017-11-11 01:25:44 +00:00
Craig Topper b3d447356f [X86] Reduce the number of FMA builtins needed by the frontend by adding negates to operands of the fmadd and fmaddsub builtins.
The backend should be able to combine the negates to create fmsub, fnmadd, and fnmsub. faddsub converting to fsubadd still needs work I think, but should be very doable.

This matches what we already do for the masked builtins.

This only covers the packed builtins. Scalar builtins will be done after FMA4 is fixed.

llvm-svn: 317873
2017-11-10 05:20:32 +00:00
Craig Topper e5b84ec2a1 [X86] Rename the VEX scalar fma builtins to end with a '3' to match gcc
I think we need to use different builtins for the FMA4 instructions since those instructions zero the upper bits and FMA3 instructions pass the bits through.

So this moves the existing builtins to be the FMA3 versions. New versions will be added for FMA4.

llvm-svn: 317766
2017-11-09 04:10:46 +00:00
Craig Topper 57f96ac6dc [X86] Replace the mask cmpeq/cmple/cmplt/cmpgt/cmpge/cmpneq intrinsics with macros that just pass the right comparison predicate value to the regular cmp intrinsic. Remove mask cmpeq/cmpgt builtins that are now unused.
This shortens the intrinsic headers a little and allows us to get rid of the cmpeq and cmpgt handling from CGBuiltin.cpp.

llvm-svn: 317506
2017-11-06 21:00:49 +00:00
Jina Nahias 48e298b8c4 lowering broadcastm
Change-Id: I0661abea3e3742860e0a03ff9e4fcdc367eff7db
llvm-svn: 317456
2017-11-06 07:04:12 +00:00
Martin Storsjo 5aa613ed2f [Headers] Fix typoed __ARM_DWARF_EH__ ifdefs
These typos appeared in SVN r309226 and r309327.

llvm-svn: 316149
2017-10-19 07:40:45 +00:00
Craig Topper 89cd7533f7 [X86] Add CLWB intrinsic. clang part
Reviewers: RKSimon, zvi, igorb

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D38781

llvm-svn: 315607
2017-10-12 18:57:15 +00:00
Craig Topper 189576f80e [X86] Correct type for argument to clflushopt intrinsic.
Summary: According to Intel docs this should take void const *. We had char*. The lack of const is the main issue.

Reviewers: RKSimon, zvi, igorb

Reviewed By: igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38782

llvm-svn: 315470
2017-10-11 16:06:08 +00:00
Jonas Hahnfeld f21a60233c [CUDA] Fix name of __activemask()
The name has two underscores in the official CUDA documentation:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-vote-functions

Differential Revision: https://reviews.llvm.org/D38468

llvm-svn: 314691
2017-10-02 17:50:11 +00:00
Artem Belevich 93e33f8fb3 [CUDA] Work around conflicting function definitions in CUDA-9 headers.
Differential Revision: https://reviews.llvm.org/D38326

llvm-svn: 314334
2017-09-27 19:07:15 +00:00
Artem Belevich bab95c7087 [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Differential Revision: https://reviews.llvm.org/D38191

llvm-svn: 314223
2017-09-26 17:07:23 +00:00
Justin Lebar d31d5e6aa2 Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.", rL314135.
Causing assertion failures on macos:

> Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"),
> function getOperand, file
> /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h,
> line 835.

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/

llvm-svn: 314142
2017-09-25 19:41:56 +00:00
Artem Belevich 9941ee9529 [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Differential Revision: https://reviews.llvm.org/D38191

llvm-svn: 314135
2017-09-25 18:53:57 +00:00
Artem Belevich 4d80105792 [CUDA] Fix names of __nvvm_vote* intrinsics.
Also fixed a syntax error in activemask().

Differential Revision: https://reviews.llvm.org/D38188

llvm-svn: 314129
2017-09-25 17:55:26 +00:00
Jina Nahias 123c599a0f fixing a bug in mask[z]_set1 intrinsic
Differential Revision: https://reviews.llvm.org/D38231

Change-Id: I80bbff9cbe93e4be54d8a761ef9723edf3f57c57
llvm-svn: 314102
2017-09-25 13:38:08 +00:00
Artem Belevich b542f1f3df [CUDA] Fixed order of words in the names of shfl builtins.
Differential Revision: https://reviews.llvm.org/D38147

llvm-svn: 313899
2017-09-21 18:46:39 +00:00
Artem Belevich 42960b4188 [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins.
Differential Revision: https://reviews.llvm.org/D38148

llvm-svn: 313898
2017-09-21 18:44:49 +00:00
Artem Belevich 4654dc89be [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.
Differential Revision: https://reviews.llvm.org/D38090

llvm-svn: 313820
2017-09-20 21:23:07 +00:00
Jina Nahias 3ad702a1ed Lowering Mask Set1 intrinsics to LLVM IR
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37668

llvm-svn: 313624
2017-09-19 11:00:27 +00:00
Craig Topper 04370d3a82 [X86] Disable _mm512_maskz_set1_epi64 intrinsic on 32-bit targets to prevent a backend isel failure.
The __builtin_ia32_pbroadcastq512_mem_mask we were previously trying to use in 32-bit mode is not implemented in the x86 backend and causes isel to fail in release builds. In debug builds it fails even earlier during legalization with an llvm_unreachable.

While there add the missing test case for this intrinsic for this for 64-bit mode.

This fixes PR34631. D37668 should be able to recover this for 32-bit mode soon. But I wanted to fix the crash ahead of that.

llvm-svn: 313392
2017-09-15 20:27:59 +00:00
Artem Belevich 9d0052160f [CUDA] Work around a new quirk in CUDA9 headers.
In CUDA-9 some of device-side math functions that we need are conditionally
defined within '#if _GLIBCXX_MATH_H'. We need to temporarily undo the guard
around inclusion of math_functions.hpp.

Differential Revision: https://reviews.llvm.org/D37906

llvm-svn: 313369
2017-09-15 17:30:53 +00:00
Martin Storsjo 0fd7c5ccd6 [Headers] Fix the return type of _InterlockedCompareExchange_rel
This was a typo in SVN r282447, where it was added.

llvm-svn: 313232
2017-09-14 07:04:59 +00:00
Sjoerd Meijer c05609ca36 This adds the _Float16 preprocessor macro definitions.
Differential Revision: https://reviews.llvm.org/D34695

llvm-svn: 313152
2017-09-13 15:23:19 +00:00
Yael Tsafrir 23e7733230 [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR
Differential Revision: https://reviews.llvm.org/D37562

llvm-svn: 313011
2017-09-12 07:46:32 +00:00
Artem Belevich 8af4e23d1e [CUDA] Added rudimentary support for CUDA-9 and sm_70.
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.

On LLVM side NVPTX added sm_70 GPU type which bumps required
PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment.

Differential Revision: https://reviews.llvm.org/D37576

llvm-svn: 312734
2017-09-07 18:14:32 +00:00
Justin Lebar 3310888aec [CUDA] Add device overloads for non-placement new/delete.
Summary:
Tests have to live in the test-suite, and so will come in a separate
patch.

Fixes PR34360.

Reviewers: tra

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D37539

llvm-svn: 312681
2017-09-07 00:37:20 +00:00