Commit Graph

1308 Commits

Author SHA1 Message Date
James Y Knight 8799caee8d [opaque pointer types] Trivial changes towards CallInst requiring
explicit function types.

llvm-svn: 353009
2019-02-03 21:53:49 +00:00
Erik Pilkington 9c3b588db9 Add a new builtin: __builtin_dynamic_object_size
This builtin has the same UI as __builtin_object_size, but has the
potential to be evaluated dynamically. It is meant to be used as a
drop-in replacement for libraries that use __builtin_object_size when
a dynamic checking mode is enabled. For instance,
__builtin_object_size fails to provide any extra checking in the
following function:

  void f(size_t alloc) {
    char* p = malloc(alloc);
    strcpy(p, "foobar"); // expands to __builtin___strcpy_chk(p, "foobar", __builtin_object_size(p, 0))
  }

This is an overflow if alloc < 7, but because LLVM can't fold the
object size intrinsic statically, it folds __builtin_object_size to
-1. With __builtin_dynamic_object_size, alloc is passed through to
__builtin___strcpy_chk.

rdar://32212419

Differential revision: https://reviews.llvm.org/D56760

llvm-svn: 352665
2019-01-30 20:34:53 +00:00
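
A minimal sketch of the difference (the wrapper function and its use of malloc are illustrative, not from the commit):

  #include <stdlib.h>

  void g(size_t alloc) {
    char *p = malloc(alloc);
    size_t s = __builtin_object_size(p, 0);         /* folds to (size_t)-1: unknown */
    size_t d = __builtin_dynamic_object_size(p, 0); /* may evaluate to 'alloc' at run time */
    (void)s; (void)d;
    free(p);
  }
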
Erik Pilkington 600e9deacf Add a 'dynamic' parameter to the objectsize intrinsic
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.

rdar://32212419

Differential revision: https://reviews.llvm.org/D56761

llvm-svn: 352664
2019-01-30 20:34:35 +00:00
James Y Knight 3933addd30 Cleanup: replace uses of CallSite with CallBase.
llvm-svn: 352595
2019-01-30 02:54:28 +00:00
Matt Arsenault b72888647b AMDGPU: Add ds append/consume builtins
llvm-svn: 352443
2019-01-28 23:59:18 +00:00
Craig Topper 07b6d3de1b [X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument.
Custom lower the builtins to these intrinsics. This enables the middle end to optimize out bitcasts for the masks.

llvm-svn: 352344
2019-01-28 07:03:10 +00:00
Craig Topper bd7884ed79 [X86] Custom codegen 512-bit cvt(u)qqtops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics.
Summary:
The 512-bit cvt(u)qqtops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics all have the possibility of taking an explicit rounding mode argument. If the rounding mode is CUR_DIRECTION we'd like to emit a sitofp/uitofp instruction and a select like we do for 256-bit intrinsics.

For cvt(u)qqtopd and cvt(u)dqtops we do this when the form of the software intrinsics that doesn't take a rounding mode argument is used. This is done by using convertvector in the header with the select builtin. But if the explicit rounding mode form of the intrinsic is used and CUR_DIRECTION is passed, we don't do this. We shouldn't have this inconsistency.

For cvt(u)qqtops nothing is done because we can't use the select builtin in the header without avx512vl. So we need to use custom codegen for this.

Even when the rounding mode isn't CUR_DIRECTION we should also use select in IR for consistency. And it will remove another scalar integer mask from our intrinsics.

To accomplish all of these goals I've taken a slightly unusual approach. I've added two new X86 specific intrinsics for sitofp/uitofp with rounding. These intrinsics are variadic on the input and output type so we only need 2 instead of 6. This avoids the need for a switch to map them in CGBuiltin.cpp. We just need to check signed vs unsigned. I believe other targets also use variadic intrinsics like this.

So if the rounding mode is CUR_DIRECTION we'll use an sitofp/uitofp instruction. Otherwise we'll use one of the new intrinsics. After that we'll emit a select instruction if needed.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D56998

llvm-svn: 352267
2019-01-26 02:42:01 +00:00
Simon Pilgrim a7bcd72c0a [X86] Replace VPCOM/VPCOMU with generic integer comparisons (clang)
These intrinsics can always be replaced with generic integer comparisons without any regression in codegen, even for -O0/-fast-isel cases.

Noticed while cleaning up vector integer comparison costs for PR40376.

A future commit will remove/autoupgrade the existing VPCOM/VPCOMU llvm intrinsics.

llvm-svn: 351687
2019-01-20 16:40:33 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Anton Korobeynikov 81cff31ccf CodeGen: Cast llvm.flt.rounds result to match __builtin_flt_rounds
llvm.flt.rounds returns an i32, but the builtin expects an int.
On targets where int is not 32 bits, clang tries to bitcast the result, causing an assertion failure.

The patch enables newlib build for msp430.

Patch by Edward Jones!

Differential Revision: https://reviews.llvm.org/D24461

llvm-svn: 351449
2019-01-17 15:21:55 +00:00
Craig Topper 015585abb2 [X86] Add custom emission for the avx512 scatter builtins to convert from scalar integer to vXi1 for the mask arguments to the intrinsics.
llvm-svn: 351408
2019-01-17 00:34:19 +00:00
Craig Topper 931779761e Recommit r351160 "[X86] Make _xgetbv/_xsetbv on non-windows platforms"
V8 has been fixed now.

llvm-svn: 351391
2019-01-16 22:56:25 +00:00
Craig Topper bb5b06603b [X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar
We need to custom handle these so we can turn the scalar mask into a vXi1 vector.

Differential Revision: https://reviews.llvm.org/D56530

llvm-svn: 351390
2019-01-16 22:34:33 +00:00
Benjamin Kramer 9c53890833 Revert "[X86] Make _xgetbv/_xsetbv on non-windows platforms"
This reverts commit r351160. Breaks building v8.

llvm-svn: 351210
2019-01-15 17:23:36 +00:00
Roman Lebedev bd1c087019 [clang][UBSan] Sanitization for alignment assumptions.
Summary:
UB isn't nice. It's cool and powerful, but not nice.
Having a way to detect it is nice though.
[[ https://wg21.link/p1007r3 | P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says:
```
We propose to add this functionality via a library function instead of a core language attribute.
...
If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour.
```

This differential teaches clang to sanitize all the various variants of this assume-aligned attribute.

Requires D54588 for LLVM IRBuilder changes.
The compiler-rt part is D54590.

This is a second commit, the original one was r351105,
which was mass-reverted in r351159 because 2 compiler-rt tests were failing.

Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall

Reviewed By: rjmccall

Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D54589

llvm-svn: 351177
2019-01-15 09:44:25 +00:00
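
A sketch of the kind of assumption that is now checked (compile with -fsanitize=alignment; the function and the 64-byte figure are illustrative):

  float sum16(const float *data) {
    const float *p = (const float *)__builtin_assume_aligned(data, 64);
    float s = 0;
    for (int i = 0; i < 16; ++i)
      s += p[i];
    return s; /* the sanitizer now traps if 'data' is not 64-byte aligned */
  }
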
Craig Topper 69aed7c364 [X86] Make _xgetbv/_xsetbv on non-windows platforms
Summary:
This patch attempts to redo what was tried in r278783, but was reverted.

These intrinsics should be available on non-Windows platforms behind an "xsave" feature check. But on Windows they shouldn't have a feature check, since that's how MSVC behaves.

To accomplish this I've added a MS builtin with no feature check, and a normal gcc builtin with a feature check. When _MSC_VER is not defined, _xgetbv/_xsetbv will be macros pointing to the gcc builtin name.

I've moved the forward declarations from intrin.h to immintrin.h to match the MSDN documentation and used that as the header file for the MS builtin.

I'm not super happy with this implementation, and I'm open to suggestions for better ways to do it.

Reviewers: rnk, RKSimon, spatel

Reviewed By: rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D56686

llvm-svn: 351160
2019-01-15 05:03:18 +00:00
Vlad Tsyrklevich 86e68fda3b Revert alignment assumptions changes
Revert r351104-6, r351109, r351110, r351119, r351134, and r351153. These
changes fail on the sanitizer bots.

llvm-svn: 351159
2019-01-15 03:38:02 +00:00
Roman Lebedev 7892c37455 [clang][UBSan] Sanitization for alignment assumptions.
Summary:
UB isn't nice. It's cool and powerful, but not nice.
Having a way to detect it is nice though.
[[ https://wg21.link/p1007r3 | P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says:
```
We propose to add this functionality via a library function instead of a core language attribute.
...
If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour.
```

This differential teaches clang to sanitize all the various variants of this assume-aligned attribute.

Requires D54588 for LLVM IRBuilder changes.
The compiler-rt part is D54590.

Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall

Reviewed By: rjmccall

Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D54589

llvm-svn: 351105
2019-01-14 19:09:27 +00:00
Dan Gohman 51532a524e [WebAssembly] Remove old builtins
This removes the old grow_memory and mem.grow-style builtins, leaving just
the memory.grow-style builtins.

Differential Revision: https://reviews.llvm.org/D56645

llvm-svn: 351089
2019-01-14 18:28:10 +00:00
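
A usage sketch, assuming the memory.grow-style builtin name and a signature taking a memory index plus a page delta:

  /* Grow the default linear memory (index 0) by 'pages' 64KiB pages;
     returns the previous size in pages, or -1 on failure. */
  long grow_default_memory(long pages) {
    return __builtin_wasm_memory_grow(0, pages);
  }
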
Craig Topper 49488407aa [X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead.
Fixes PR40259

llvm-svn: 351036
2019-01-14 08:46:51 +00:00
Craig Topper 689b3b71af [X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a vXi1 vector.
We'll do the scalar<->vXi1 conversions with bitcasts in IR.

Fixes PR40258

llvm-svn: 351029
2019-01-14 00:03:55 +00:00
Craig Topper cd9e232a4d Recommit r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins."
The MSVC limit hit in AutoUpgrade.cpp has been worked around for now.

llvm-svn: 350568
2019-01-07 21:00:41 +00:00
Craig Topper 33c9088783 Revert r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins."
Had to revert the LLVM patch this depends on to fix a MSVC compiler limit in AutoUpgrade.cpp

llvm-svn: 350563
2019-01-07 19:39:25 +00:00
Craig Topper e34f2bb807 [X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins.
Differential Revision: https://reviews.llvm.org/D56365

llvm-svn: 350555
2019-01-07 19:10:22 +00:00
Haibo Huang 303b2333e4 Declares __cpu_model as dso local
__builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local, so the generated code looks up its address in the GOT when reading it. This makes it impossible to use these functions in an ifunc resolver, because at that point GOT entries have not yet been relocated. This change makes __cpu_model dso local.

Differential Revision: https://reviews.llvm.org/D53850

llvm-svn: 349825
2018-12-20 21:33:59 +00:00
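
A sketch of the ifunc scenario the commit describes (the functions are hypothetical); the resolver runs before GOT entries are relocated, which is why __cpu_model must be dso local:

  static int add_generic(int a, int b) { return a + b; }
  static int add_avx2(int a, int b) { return a + b; /* AVX2-tuned variant */ }

  static int (*resolve_add(void))(int, int) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? add_avx2 : add_generic;
  }
  int add(int, int) __attribute__((ifunc("resolve_add")));
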
Simon Pilgrim 4597379227 [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (clang)
This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.

LLVM counterpart: https://reviews.llvm.org/D55938

Differential Revision: https://reviews.llvm.org/D55937

llvm-svn: 349796
2018-12-20 19:01:13 +00:00
Simon Pilgrim 313dc85ce0 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (clang)
This emits SADD_SAT/SSUB_SAT generic intrinsics for the SSE signed saturated math intrinsics.

LLVM counterpart: https://reviews.llvm.org/D55894

Differential Revision: https://reviews.llvm.org/D55890

llvm-svn: 349743
2018-12-20 11:53:45 +00:00
Simon Pilgrim a7b30b4a58 [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (clang)
Sibling patch to D55855, this emits UADD_SAT/USUB_SAT generic intrinsics for the SSE saturated math intrinsics instead of expanding to an IR code sequence that could be difficult to reassemble.

Differential Revision: https://reviews.llvm.org/D55879

llvm-svn: 349631
2018-12-19 14:43:47 +00:00
Vedant Kumar 77dfca88b2 [CodeGen] Handle mixed-width ops in mixed-sign mul-with-overflow lowering
The special lowering for __builtin_mul_overflow introduced in r320902
fixed an ICE seen when passing mixed-sign operands to the builtin.

This patch extends the special lowering to cover mixed-width, mixed-sign
operands. In a few common scenarios, calls to muloti4 will no longer be
emitted.

This should address the latest comments in PR34920 and work around the
link failure seen in:

  https://bugzilla.redhat.com/show_bug.cgi?id=1657544

Testing:
- check-clang
- A/B output comparison with: https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081

Differential Revision: https://reviews.llvm.org/D55843

llvm-svn: 349542
2018-12-18 21:05:03 +00:00
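
One of the mixed-width, mixed-sign cases that no longer needs a muloti4 libcall (the wrapper is illustrative):

  #include <stdbool.h>
  #include <stdint.h>

  bool scaled_fits(int64_t a, uint32_t b, int64_t *out) {
    return !__builtin_mul_overflow(a, b, out); /* builtin returns true on overflow */
  }
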
Eric Fiselier 261875054e [Clang] Add __builtin_launder
Summary:
This patch adds `__builtin_launder`, which is required to implement `std::launder`. Additionally GCC provides `__builtin_launder`, so this brings Clang in line with GCC.

I'm not exactly sure what magic `__builtin_launder` requires, but based on previous discussions this patch applies a `@llvm.invariant.group.barrier`. As noted in previous discussions, this may not be enough to correctly handle vtables.

Reviewers: rnk, majnemer, rsmith

Reviewed By: rsmith

Subscribers: kristina, Romain-Geissler-1A, erichkeane, amharc, jroelofs, cfe-commits, Prazek

Differential Revision: https://reviews.llvm.org/D40218

llvm-svn: 349195
2018-12-14 21:11:28 +00:00
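
A sketch of how a std::launder implementation can forward to the builtin (assuming only what the commit states):

  template <typename T>
  constexpr T *launder_sketch(T *p) noexcept {
    return __builtin_launder(p); /* same bytes, pointer provenance laundered */
  }
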
Craig Topper 1f2b181689 [Builtins][X86] Provide implementations of __lzcnt16, __lzcnt, __lzcnt64 for MS compatibility. Remove declarations from intrin.h and implementations from lzcntintrin.h
intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature.

For MS compatibility we should always have these builtins available regardless of whether X86 is the target or the CPU supports the lzcnt instruction. The backends should be able to gracefully fall back to something supported even if it's just shifts and bit ops.

Unfortunately, gcc also implements 2 of the 3 function names here on X86 when the lzcnt feature is enabled.

This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep gcc compatibility, the two intrinsics that collided have been turned into macros that use the X86-specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error, because -fms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available.

Should fix PR40014

Differential Revision: https://reviews.llvm.org/D55677

llvm-svn: 349098
2018-12-14 00:21:02 +00:00
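
A usage sketch of the now-always-available MS builtins (the helper is illustrative):

  unsigned int bit_width(unsigned int x) {
    return 32 - __lzcnt(x); /* __lzcnt16 and __lzcnt64 behave analogously */
  }
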
Haibo Huang e177082972 Revert "Declares __cpu_model as dso local"
This reverts r348978

llvm-svn: 348982
2018-12-12 22:39:51 +00:00
Haibo Huang 6b22f59207 Declares __cpu_model as dso local
__builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local, so the generated code looks up its address in the GOT when reading it. This makes it impossible to use these functions in an ifunc resolver, because at that point GOT entries have not yet been relocated. This change makes __cpu_model dso local.

Differential Revision: https://reviews.llvm.org/D53850

llvm-svn: 348978
2018-12-12 22:04:12 +00:00
Raphael Isemann b23ccecbb0 Misc typos fixes in ./lib folder
Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned`

Reviewers: teemperor

Reviewed By: teemperor

Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D55475

llvm-svn: 348755
2018-12-10 12:37:46 +00:00
Craig Topper 6d7a7ef9eb [X86] Remove the addcarry builtins. Leaving only the addcarryx builtins since that matches gcc.
The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required the adx feature.

This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if it's not a burden.

llvm-svn: 348738
2018-12-10 06:07:59 +00:00
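
A sketch of the surviving builtin as exposed through <immintrin.h> (the helper is illustrative):

  #include <immintrin.h>

  unsigned char add_u32_with_carry(unsigned int a, unsigned int b,
                                   unsigned int *sum) {
    return _addcarryx_u32(0, a, b, sum); /* returns the carry-out */
  }
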
Fangrui Song 407659ab0a Revert "Revert r347417 "Re-Reinstate 347294 with a fix for the failures.""
It seems the two failing tests can be simply fixed after r348037

Fix 3 cases in Analysis/builtin-functions.cpp
Delete the bad CodeGen/builtin-constant-p.c for now

llvm-svn: 348053
2018-11-30 23:41:18 +00:00
Fangrui Song f5d3335d75 Revert r347417 "Re-Reinstate 347294 with a fix for the failures."
Kept the "indirect_builtin_constant_p" test case in test/SemaCXX/constant-expression-cxx1y.cpp
while we are investigating why the following snippet fails:

  extern char extern_var;
  struct { int a; } a = {__builtin_constant_p(extern_var)};

llvm-svn: 348039
2018-11-30 21:26:09 +00:00
Hans Wennborg 48ee4ad325 Re-commit r347417 "Re-Reinstate 347294 with a fix for the failures."
This was reverted in r347656 due to me thinking it caused a miscompile of
Chromium. Turns out it was the Chromium code that was broken.

llvm-svn: 347756
2018-11-28 14:04:12 +00:00
Hans Wennborg 8c79706e89 Revert r347417 "Re-Reinstate 347294 with a fix for the failures."
This caused a miscompile in Chrome (see crbug.com/908372) that's
illustrated by this small reduction:

  static bool f(int *a, int *b) {
    return !__builtin_constant_p(b - a) || (!(b - a));
  }

  int arr[] = {1,2,3};

  bool g() {
    return f(arr, arr + 3);
  }

  $ clang -O2 -S -emit-llvm a.cc -o -

g() should return true, but after r347417 it became false for some reason.

This also reverts the follow-up commits.

r347417:
> Re-Reinstate 347294 with a fix for the failures.
>
> Don't try to emit a scalar expression for a non-scalar argument to
> __builtin_constant_p().
>
> Third time's a charm!

r347446:
> The result of is.constant() is unsigned.

r347480:
> A __builtin_constant_p() returns 0 with a function type.

r347512:
> isEvaluatable() implies a constant context.
>
> Assume that we're in a constant context if we're asking if the expression can
> be compiled into a constant initializer. This fixes the issue where a
> __builtin_constant_p() in a compound literal was diagnosed as not being
> constant, even though it's always possible to convert the builtin into a
> constant.

r347531:
> A "constexpr" is evaluated in a constant context. Make sure this is reflected
> if a __builtin_constant_p() is a part of a constexpr.

llvm-svn: 347656
2018-11-27 14:01:40 +00:00
Sanjay Patel c6fa5bc7c7 [CodeGen] translate MS rotate builtins to LLVM funnel-shift intrinsics
This was originally part of:
D50924

and should resolve PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387

...but it was reverted because some bots using a gcc host compiler 
would crash for unknown reasons with this included in the patch. 
Trying again now to see if that's still a problem.

llvm-svn: 347527
2018-11-25 17:53:16 +00:00
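
The MS rotates now lower to llvm.fshl/llvm.fshr; a usage sketch, assuming the <intrin.h> declarations:

  #include <intrin.h>

  unsigned int ror3(unsigned int x) { return _rotr(x, 3); }
  unsigned long long rol5(unsigned long long x) { return _rotl64(x, 5); }
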
Bill Wendling 46acc72cf4 A __builtin_constant_p() returns 0 with a function type.
llvm-svn: 347480
2018-11-22 22:58:06 +00:00
Bill Wendling 2a6c59ea2a The result of is.constant() is unsigned.
llvm-svn: 347446
2018-11-22 09:31:08 +00:00
Bill Wendling 6ff1751f7d Re-Reinstate 347294 with a fix for the failures.
Don't try to emit a scalar expression for a non-scalar argument to
__builtin_constant_p().

Third time's a charm!

llvm-svn: 347417
2018-11-21 20:44:18 +00:00
Nico Weber 9f0246d473 Revert r347364 again, the fix was incomplete.
llvm-svn: 347389
2018-11-21 12:47:43 +00:00
Bill Wendling 91549ed15f Reinstate 347294 with a fix for the failures.
EvaluateAsInt() is sometimes called in a constant context. When that's the
case, we need to say so.

llvm-svn: 347364
2018-11-20 23:24:16 +00:00
Nico Weber 6438972553 Revert 347294, it turned many bots on lab.llvm.org:8011/console red.
llvm-svn: 347314
2018-11-20 15:27:43 +00:00
Bill Wendling 107b0e9881 Use is.constant intrinsic for __builtin_constant_p
Summary:
A __builtin_constant_p may end up with a constant after inlining. Use
the is.constant intrinsic if it's a variable that's in a context where
it may resolve to a constant, e.g., an argument to a function after
inlining.

Reviewers: rsmith, shafik

Subscribers: jfb, kristina, cfe-commits, nickdesaulniers, jyknight

Differential Revision: https://reviews.llvm.org/D54355

llvm-svn: 347294
2018-11-20 08:53:30 +00:00
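
A sketch of the pattern this enables (names are hypothetical):

  extern int slow_path(int);

  static inline int fast_if_constant(int x) {
    if (__builtin_constant_p(x)) /* now lowered via llvm.is.constant */
      return x * 2;              /* chosen when inlining makes 'x' constant */
    return slow_path(x);
  }

  int caller(void) { return fast_if_constant(21); } /* can fold to 42 */
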
Alexey Sotkin 692f12b389 [OpenCL] Fix invalid address space generation for clk_event_t
Summary:
Addrspace(32) was generated when passing 0 to the clk_event_t *event_ret
parameter of the enqueue_kernel function.

Patch by Viktoria Maksimova

Reviewers: Anastasia, yaxunl, AlexeySotkin

Reviewed By:  Anastasia, AlexeySotkin

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D53809

llvm-svn: 346838
2018-11-14 09:40:05 +00:00
Erich Keane de6480a38c [NFC] Move storage of dispatch-version to GlobalDecl
As suggested by Richard Smith, and initially put up for review here:
https://reviews.llvm.org/D53341, this patch removes a hack that was used
to ensure that proper target-feature lists were used when emitting
cpu-dispatch (and eventually, target-clones) implementations. As a part
of this, the GlobalDecl object is proliferated to a bunch more
locations.

Originally, this was put up for review (see above) to get acceptance on
the approach, though discussion with Richard in San Diego showed he
approved of the approach taken here. Thus, I believe this is acceptable
for Review-After-commit.

Differential Revision: https://reviews.llvm.org/D53341

Change-Id: I0a0bd673340d334d93feac789d653e03d9f6b1d5
llvm-svn: 346757
2018-11-13 15:48:08 +00:00
Jonas Devlieghere 64a2630825 Pass the function type instead of the return type to FunctionDecl::Create
Fix places where the return type of a FunctionDecl was being used in
place of the function type

FunctionDecl::Create() takes as its T parameter the type of function
that should be created, not the return type. Passing in the return type
looks to have been copypasta'd around a bit, but the number of correct
usages outweighs the incorrect ones so I've opted for keeping what T is
the same and fixing up the call sites instead.

This fixes a crash in Clang when attempting to compile the following
snippet of code with -fblocks -fsanitize=function -x objective-c++ (my
original repro case):

  void g(void(^)());
  void f()
  {
      __block int a = 0;
        g(^(){ a++; });
  }

as well as the following which only requires -fsanitize=function -x c++:

  void f(char * buf)
  {
      __builtin_os_log_format(buf, "");
  }

Patch by: Ben (bobsayshilol)

Differential revision: https://reviews.llvm.org/D53263

llvm-svn: 346601
2018-11-11 00:56:15 +00:00
Kadir Cetinkaya b1501462e2 T was unused in assertion-disabled builds.
llvm-svn: 346216
2018-11-06 08:59:25 +00:00
Akira Hatanaka 908aabb783 Cast to uint64_t instead of to unsigned.
This is a follow-up to r346211.

llvm-svn: 346212
2018-11-06 07:12:28 +00:00
Akira Hatanaka d572cf496d os_log: Allow specifying mask type in format string.
A mask type is a 1 to 8-byte string that follows the "mask." annotation
in the format string. This enables obfuscating data in the event the
provided privacy level isn't enabled.

rdar://problem/36756282

llvm-svn: 346211
2018-11-06 07:05:14 +00:00
Mandeep Singh Grang 574caddc0d [COFF, ARM64] Implement InterlockedDecrement*_* builtins
This is the eighth in a series of patches to move intrinsic definitions out of intrin.h.

Differential: https://reviews.llvm.org/D54068
llvm-svn: 346208
2018-11-06 05:07:43 +00:00
Mandeep Singh Grang fdf74d9751 [COFF, ARM64] Implement InterlockedIncrement*_* builtins
This is the seventh in a series of patches to move intrinsic definitions out of intrin.h.

Differential: https://reviews.llvm.org/D54067
llvm-svn: 346207
2018-11-06 05:05:32 +00:00
Mandeep Singh Grang c89157b5c1 [COFF, ARM64] Implement InterlockedAnd*_* builtins
This is the sixth in a series of patches to move intrinsic definitions out of intrin.h.

Differential: https://reviews.llvm.org/D54066
llvm-svn: 346206
2018-11-06 05:03:13 +00:00
Mandeep Singh Grang 806f10701b [COFF, ARM64] Implement InterlockedXor*_* builtins
This is the fifth in a series of patches to move intrinsic definitions out of intrin.h.

Note: This was reviewed and approved in D54065 but somehow that diff was messed
up. Committing this again with the proper diff.

llvm-svn: 346205
2018-11-06 04:55:20 +00:00
Mandeep Singh Grang d9f70b1495 Revert "[COFF, ARM64] Implement InterlockedXor*_* builtins"
This reverts commit cc3d3cd0fbeb88412d332354c261ff139c4ede6b.

llvm-svn: 346192
2018-11-06 01:14:24 +00:00
Mandeep Singh Grang d8a4455d97 [COFF, ARM64] Implement InterlockedXor*_* builtins
Summary: This is the fifth in a series of patches to move intrinsic definitions out of intrin.h.

Reviewers: rnk, efriedma, mstorsjo, TomTan

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D54065

llvm-svn: 346191
2018-11-06 01:12:29 +00:00
Mandeep Singh Grang ec62b31e2c [COFF, ARM64] Implement InterlockedOr*_* builtins
This is the fourth in a series of patches to move intrinsic definitions out of intrin.h.

llvm-svn: 346190
2018-11-06 01:11:25 +00:00
Mandeep Singh Grang 6b880689f0 [COFF, ARM64] Implement InterlockedCompareExchange*_* builtins
Summary: This is the third in a series of patches to move intrinsic definitions out of intrin.h.

Reviewers: rnk, efriedma, mstorsjo, TomTan

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D54062

llvm-svn: 346189
2018-11-06 00:36:48 +00:00
Mandeep Singh Grang 7fa07e554d [COFF, ARM64] Implement InterlockedExchange*_* builtins
Summary: The Windows SDK needs these intrinsics to be proper builtins. This is the second in a series of patches to move intrinsic definitions out of intrin.h.

Reviewers: rnk, mstorsjo, efriedma, TomTan

Reviewed By: rnk, efriedma

Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D54046

llvm-svn: 346044
2018-11-02 21:18:23 +00:00
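
A usage sketch of the suffixed ARM64 variants (the lock helper is illustrative):

  long try_take(long volatile *lock) {
    /* acquire-ordered exchange; _rel and _nf (no fence) variants exist too */
    return _InterlockedExchange_acq(lock, 1); /* returns the previous value */
  }
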
Mandeep Singh Grang 6cef4e5c87 [COFF, ARM64] Change setjmp for AArch64 Windows to use Intrinsic.sponentry
Summary: ARM64 setjmp expects sp on entry instead of the frame pointer.

Patch by: Yin Ma (yinma@codeaurora.org)

Reviewers: mgrang, eli.friedman, ssijaric, mstorsjo, rnk, compnerd

Reviewed By: mgrang

Subscribers: efriedma, javed.absar, kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D53998

llvm-svn: 346024
2018-11-02 18:10:07 +00:00
Tim Northover 314fbfa1c4 Reapply Logging: make os_log buffer size an integer constant expression.
The size of an os_log buffer is known at any stage of compilation, so making it
a constant expression means that the common idiom of declaring a buffer for it
won't result in a VLA. That allows the compiler to skip saving and restoring
the stack pointer around such buffers.

This also moves the OSLog and other FormatString helpers from
libclangAnalysis to libclangAST to avoid a circular dependency.

llvm-svn: 345971
2018-11-02 13:14:11 +00:00
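
The common idiom that no longer produces a VLA (a sketch using clang's os_log builtins):

  void log_int(int x) {
    char buf[__builtin_os_log_format_buffer_size("%d", x)]; /* now a constant-size array */
    __builtin_os_log_format(buf, "%d", x);
  }
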
Reid Kleckner 4dc0b1ac60 Fix clang -Wimplicit-fallthrough warnings across llvm, NFC
This patch should not introduce any behavior changes. It consists of
mostly one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
   of only 'break'.

We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
   doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
   the outer case.

I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.

Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu

Differential Revision: https://reviews.llvm.org/D53950

llvm-svn: 345882
2018-11-01 19:54:45 +00:00
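
A sketch of change (1), replacing a fall-through comment with the macro (the function is illustrative):

  #include "llvm/Support/Compiler.h"

  static void classify(int kind, int &flags) {
    switch (kind) {
    case 0:
      flags |= 1;
      LLVM_FALLTHROUGH; // previously a bare "fall through" comment
    case 1:
      flags |= 2;
      break;
    default:
      break;
    }
  }
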
Mandeep Singh Grang 5c39b6ab7f Revert "[COFF, ARM64] Change setjmp for AArch64 Windows to use Intrinsic.sponentry"
This reverts commit 619111f5ccf349b635e4987ec02d15777c571495.

llvm-svn: 345872
2018-11-01 18:38:26 +00:00
Tim Northover eedc0f0f1a Revert "Reapply Logging: make os_log buffer size an integer constant expression."
Still more dependency hell.

llvm-svn: 345871
2018-11-01 18:37:42 +00:00
Tim Northover c1ac697ab7 Reapply Logging: make os_log buffer size an integer constant expression.
The size of an os_log buffer is known at any stage of compilation, so making it
a constant expression means that the common idiom of declaring a buffer for it
won't result in a VLA. That allows the compiler to skip saving and restoring
the stack pointer around such buffers.

This also moves the OSLog helpers from libclangAnalysis to libclangAST
to avoid a circular dependency.

llvm-svn: 345866
2018-11-01 18:04:49 +00:00
Tim Northover d686dbbc7c Revert "Logging: make os_log buffer size an integer constant expression."
This also reverts a couple of follow-up commits trying to fix the
dependency issues. Latest revision added a cyclic dependency that can't
just be patched up in 5 minutes.

llvm-svn: 345846
2018-11-01 16:15:24 +00:00
Tim Northover a94ecc619b Logging: make os_log buffer size an integer constant expression.
The size of an os_log buffer is known at any stage of compilation, so making it
a constant expression means that the common idiom of declaring a buffer for it
won't result in a VLA. That allows the compiler to skip saving and restoring
the stack pointer around such buffers.

llvm-svn: 345828
2018-11-01 13:49:54 +00:00
Mandeep Singh Grang be0e78e017 [COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic
llvm-svn: 345808
2018-11-01 01:35:34 +00:00
Thomas Lively 6940328d02 [WebAssembly] Fix type names in truncation builtins
Summary: Use the same convention as all the other WebAssembly builtin names.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D53724

llvm-svn: 345804
2018-11-01 01:03:17 +00:00
Mandeep Singh Grang e7c7934a11 [COFF, ARM64] Change setjmp for AArch64 Windows to use Intrinsic.sponentry
Summary: ARM64 setjmp expects sp on entry instead of the frame pointer.

Reviewers: mgrang, rnk, TomTan, compnerd, mstorsjo, efriedma

Reviewed By: mstorsjo

Subscribers: javed.absar, kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D53684

llvm-svn: 345792
2018-10-31 23:17:36 +00:00
Eli Friedman b262d1631e [ARM64] [Windows] Implement _InterlockedExchangeAdd*_* builtins.
These apparently need to be proper builtins to handle the Windows
SDK.

Differential Revision: https://reviews.llvm.org/D53916

llvm-svn: 345779
2018-10-31 21:31:09 +00:00
Bryan Chan 223307b3dc [AArch64] Implement FP16FML intrinsics
Generate the FP16FML intrinsics into arm_neon.h (AArch64 only for now).
Add two new type modifiers to NeonEmitter to handle the new prototypes.
Define __ARM_FEATURE_FP16FML when +fp16fml is enabled and guard the
intrinsics with the macro in arm_neon.h.

Based on a patch by Gao Yiling.

Differential Revision: https://reviews.llvm.org/D53633

llvm-svn: 345344
2018-10-25 23:47:00 +00:00
Thomas Lively d4bf99a540 [WebAssembly] Bitselect and min/max builtins
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits

Differential Revision: https://reviews.llvm.org/D53685

llvm-svn: 345301
2018-10-25 19:11:41 +00:00
Thomas Lively 535b4df75a [WebAssembly] Lower to target-independent saturating add
Summary: Goes along with D53721.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits

Differential Revision: https://reviews.llvm.org/D53722

llvm-svn: 345300
2018-10-25 19:06:15 +00:00
Craig Topper 4d8ced1807 [X86] Add support for more than 32 features for __builtin_cpu_is
libgcc supports more than 32 features by adding a new 32-bit variable __cpu_features2.

This adds the clang support for checking these feature bits.

Patches for compiler-rt and llvm to support this are coming as well.

Probably still need an additional patch for target multiversioning in clang.

Differential Revision: https://reviews.llvm.org/D53458

llvm-svn: 344832
2018-10-20 03:51:52 +00:00
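
With the second feature word supported, checks like this become possible (the feature name is an assumption for illustration):

  int have_vbmi2(void) {
    return __builtin_cpu_supports("avx512vbmi2");
  }
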
Craig Topper 9c8f3c9654 [X86] When checking the bits in cpu_features for function multiversioning dispatcher in the resolver, make sure all the required bits are set. Not just one of them
Summary:
The multiversioning code repurposed the code from __builtin_cpu_supports for checking if a single feature is enabled. That code essentially performed (_cpu_features & (1 << C)) != 0. But with the multiversioning path, the mask is no longer guaranteed to be a power of 2. So we returned true any time any one of the bits in the mask was set, not just when all of the bits were set.

The correct check is (_cpu_features & mask) == mask

Reviewers: erichkeane, echristo

Reviewed By: echristo

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D53460

llvm-svn: 344824
2018-10-20 01:30:00 +00:00
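
The fix as a worked example, with hypothetical 'features' and 'mask' words:

  int has_all_features(unsigned features, unsigned mask) {
    /* old, wrong: (features & mask) != 0 was true if ANY required bit was set */
    return (features & mask) == mask; /* true only if ALL required bits are set */
  }
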
Mandeep Singh Grang 2147b1af95 [COFF, ARM64] Add _ReadStatusReg and _WriteStatusReg intrinsics
Reviewers: rnk, compnerd, mstorsjo, efriedma, TomTan, haripul, javed.absar

Reviewed By: efriedma

Subscribers: dmajor, kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D53115

llvm-svn: 344765
2018-10-18 23:35:35 +00:00
Yaxun Liu aae1e87f4b AMDGPU: add __builtin_amdgcn_update_dpp
Emit llvm.amdgcn.update.dpp for both __builtin_amdgcn_mov_dpp and
__builtin_amdgcn_update_dpp. The first argument to
llvm.amdgcn.update.dpp will be undef for __builtin_amdgcn_mov_dpp.

Differential Revision: https://reviews.llvm.org/D52320

llvm-svn: 344665
2018-10-17 02:32:26 +00:00
Thomas Lively 07ce6df879 [WebAssembly] Saturating float-to-int builtins
Summary: Depends on D53007 and D53004.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D53009

llvm-svn: 344205
2018-10-11 00:07:55 +00:00
Mandeep Singh Grang df7929676d [COFF, ARM64] Add _InterlockedAdd intrinsic
Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, chrib, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D52811

llvm-svn: 343894
2018-10-05 21:57:41 +00:00
Mandeep Singh Grang 15e0f7fa28 [COFF, ARM64] Add _InterlockedCompareExchangePointer_nf intrinsic
Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, chrib, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D52807

llvm-svn: 343881
2018-10-05 19:49:36 +00:00
Thomas Lively d2a293c562 [WebAssembly] abs and sqrt builtins
Summary: Depends on D52910.

Reviewers: aheejin, dschuff, craig.topper

Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D52913

llvm-svn: 343838
2018-10-05 01:02:54 +00:00
Thomas Lively 291d75b0de [WebAssembly] any_true and all_true builtins
Summary: Depends on D52858.

Reviewers: aheejin, dschuff, craig.topper

Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D52910

llvm-svn: 343837
2018-10-05 00:59:37 +00:00
Thomas Lively 9034a47e79 [WebAssembly] saturating arithmetic builtins
Summary: Depends on D52856.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D52858

llvm-svn: 343836
2018-10-05 00:58:56 +00:00
Thomas Lively a347436f09 [WebAssembly] __builtin_wasm_replace_lane_* builtins
Summary: Depends on D52852.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D52856

llvm-svn: 343835
2018-10-05 00:58:07 +00:00
Thomas Lively d6792c0c28 [WebAssembly] __builtin_wasm_extract_lane_* builtins
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits

Differential Revision: https://reviews.llvm.org/D52852

llvm-svn: 343834
2018-10-05 00:54:44 +00:00
Mandeep Singh Grang ecc82ef0c2 [COFF, ARM64] Add __getReg intrinsic
Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: peter.smith, efriedma, kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D52838

llvm-svn: 343824
2018-10-04 22:32:42 +00:00
Mandeep Singh Grang aef87980a9 [COFF, ARM64] Add _ReadWriteBarrier intrinsic
Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar

Reviewed By: rnk

Subscribers: kristof.beyls, chrib, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D52809

llvm-svn: 343699
2018-10-03 17:24:21 +00:00
Craig Topper fb5d9f2849 [X86] For lzcnt/tzcnt intrinsics use cttz/ctlz intrinsics with zero_undef flag set to false.
Previously we used a select and the zero_undef=true intrinsic. In -O2 this pattern will get optimized to zero_undef=false. But in -O0 this optimization won't happen. This results in a compare and cmov being wrapped around a tzcnt/lzcnt instruction.

By using the zero_undef=false intrinsic directly without the select, we can improve the -O0 codegen to just an lzcnt/tzcnt instruction.

Differential Revision: https://reviews.llvm.org/D52392

llvm-svn: 343126
2018-09-26 17:01:44 +00:00
QingShan Zhang accb65b994 [PowerPC] [Clang] Add vector int128 pack/unpack builtins
unsigned long long __builtin_unpack_vector_int128 (vector int128_t, int);
vector int128_t __builtin_pack_vector_int128 (unsigned long long, unsigned long long);

Builtins should behave the same way as in GCC.

Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52074

llvm-svn: 342614
2018-09-20 05:04:57 +00:00
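
A usage sketch based on the prototypes above (the vector type spelling, and which index selects which doubleword under each endianness, are assumptions):

  vector unsigned __int128 pack(unsigned long long hi, unsigned long long lo) {
    return __builtin_pack_vector_int128(hi, lo);
  }

  unsigned long long second_word(vector unsigned __int128 v) {
    return __builtin_unpack_vector_int128(v, 1);
  }
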
Craig Topper ecf2e2fe31 [X86] Custom emit __builtin_rdtscp so we can emit an explicit store for the out parameter
This is the clang side of D51803. The llvm intrinsic now returns two results. So we need to emit an explicit store in IR for the out parameter. This is similar to addcarry/subborrow/rdrand/rdseed.

Differential Revision: https://reviews.llvm.org/D51805

llvm-svn: 341699
2018-09-07 19:14:24 +00:00
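
A sketch via the corresponding header intrinsic (the wrapper is illustrative):

  #include <x86intrin.h>

  unsigned long long timestamp(unsigned int *aux) {
    return __rdtscp(aux); /* *aux receives IA32_TSC_AUX; that store is now explicit in IR */
  }
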
Craig Topper 52a61fc2ac [X86] Modify addcarry/subborrow builtins to emit a 2-result intrinsic and a store instruction.
This is the clang side of D51769. The llvm intrinsics now return two results instead of using an out parameter.

Differential Revision: https://reviews.llvm.org/D51771

llvm-svn: 341678
2018-09-07 16:58:57 +00:00
Craig Topper d88f76a891 [X86] Add ktest intrinsics to match gcc and icc.
These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc.

Includes these intrinsics:
_ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8
_ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8
_ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8
_ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8

llvm-svn: 341265
2018-08-31 22:29:56 +00:00
Craig Topper 42a4d0822e [X86] Add k-mask conversion and load/store instrinsics to match gcc and icc.
This adds:
_cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
_cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
_load_mask8, _load_mask16, _load_mask32, _load_mask64
_store_mask8, _store_mask16, _store_mask32, _store_mask64

These are currently missing from the Intel Intrinsics Guide webpage.

llvm-svn: 341251
2018-08-31 20:41:06 +00:00
Craig Topper 2aa8efc820 [X86] Add kshift intrinsics to match gcc and icc.
This adds the following intrinsics:
_kshiftli_mask8
_kshiftli_mask16
_kshiftli_mask32
_kshiftli_mask64
_kshiftri_mask8
_kshiftri_mask16
_kshiftri_mask32
_kshiftri_mask64

llvm-svn: 341234
2018-08-31 18:22:52 +00:00
Craig Topper a65bf65e0b [X86] Add kadd intrinsics to match gcc and icc.
This adds the following intrinsics:
_kadd_mask64
_kadd_mask32
_kadd_mask16
_kadd_mask8

These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc.

llvm-svn: 340879
2018-08-28 22:32:14 +00:00
Craig Topper cb5fd56c7f [X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks.
This matches gcc and icc despite not being documented in the Intel Intrinsics Guide.

llvm-svn: 340798
2018-08-28 06:28:25 +00:00
Craig Topper c330ca8611 [X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers.
This also adds a second intrinsic name for the 16-bit mask versions.

These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed.

llvm-svn: 340719
2018-08-27 06:20:22 +00:00
Nico Weber 14a577bfd1 Eliminate instances of `EmitScalarExpr(E->getArg(n))` in EmitX86BuiltinExpr().
EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that
work again.

This changes behavior: If e.g. ++a was passed as an arg, we incremented a twice
previously. This change fixes that bug.

https://reviews.llvm.org/D50979

llvm-svn: 340348
2018-08-21 22:19:55 +00:00
Sanjay Patel ad82390d3f [CodeGen] add rotate builtins that map to LLVM funnel shift
This is a partial retry of rL340137 (reverted at rL340138 because of gcc host compiler crashing)
with 1 change:
Remove the changes to make microsoft builtins also use the LLVM intrinsics.
 
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).

We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops) that we want to replicate, we can change the names.

The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242

With improved codegen in:
https://reviews.llvm.org/rL337966
https://reviews.llvm.org/rL339359

And basic IR optimization added in:
https://reviews.llvm.org/rL338218
https://reviews.llvm.org/rL340022

...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.

In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.

Differential Revision: https://reviews.llvm.org/D50924

llvm-svn: 340141
2018-08-19 16:50:30 +00:00
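
A usage sketch of the new builtins (names per this patch family):

  unsigned int rol32(unsigned int x, unsigned int s) {
    return __builtin_rotateleft32(x, s);
  }

  unsigned char ror8(unsigned char x, unsigned char s) {
    return __builtin_rotateright8(x, s);
  }
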
Sanjay Patel a09ae4b8a6 revert r340137: [CodeGen] add rotate builtins
At least a couple of bots (gcc host compiler on PPC only?) are showing the compiler dying while trying to compile.

llvm-svn: 340138
2018-08-19 15:31:42 +00:00
Sanjay Patel 446529b0d9 [CodeGen] add/fix rotate builtins that map to LLVM funnel shift (retry)
This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing)
with 2 changes:
1. Move the code into a helper to reduce code duplication (and hopefully work around the crash).
2. The original commit had a formatting bug in the docs (missing an underscore).

Original commit message:

This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).

We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.

The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242

With improved codegen in:
https://reviews.llvm.org/rL337966
https://reviews.llvm.org/rL339359

And basic IR optimization added in:
https://reviews.llvm.org/rL338218
https://reviews.llvm.org/rL340022

...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.

In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.

Differential Revision: https://reviews.llvm.org/D50924

llvm-svn: 340137
2018-08-19 14:44:47 +00:00
Sanjay Patel 39b4dd2da7 revert r340135: [CodeGen] add rotate builtins
At least a couple of bots (PPC only?) are showing the compiler dying while trying to compile:
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/11065/steps/build%20stage%201/logs/stdio
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/18267/steps/build%20stage%201/logs/stdio

llvm-svn: 340136
2018-08-19 13:48:06 +00:00
Sanjay Patel 9116f0438c [CodeGen] add rotate builtins
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang 
(when both halves of a funnel shift are the same value, it's a rotate).

We're free to name these as we want because we're not copying gcc, but if there's some other 
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate, 
we can change the names.

The funnel shift intrinsics were added here:
D49242

With improved codegen in:
rL337966
rL339359

And basic IR optimization added in:
rL338218
rL340022

...so these are expected to produce asm output that's equal or better to the multi-instruction 
alternatives using primitive C/IR ops.

In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.

Differential Revision: https://reviews.llvm.org/D50924

llvm-svn: 340135
2018-08-19 13:12:40 +00:00
Nico Weber b2c53d3393 Make __shiftleft128 / __shiftright128 real compiler built-ins.
r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h.
Microsoft's STL plans on using these functions, and they're using intrin0.h
which just has declarations of built-ins to not pull in the huge intrin.h
header in the standard library headers. That requires that these functions are
real built-ins.

https://reviews.llvm.org/D50907

llvm-svn: 340048
2018-08-17 17:19:06 +00:00
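
A usage sketch with the MSVC signatures (the helper is illustrative):

  unsigned long long high_after_shl(unsigned long long lo, unsigned long long hi,
                                    unsigned char n) {
    /* the high 64 bits of ((hi:lo) << n) */
    return __shiftleft128(lo, hi, n);
  }
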
Craig Topper 72a7606433 [X86] Remove masking from the 512-bit paddus/psubus builtins. Use a select builtin instead.
llvm-svn: 339845
2018-08-16 07:28:06 +00:00
Tomasz Krupa e8cf972d86 [X86] Lowering addus/subus intrinsics to native IR
Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.

Reviewers: craig.topper, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D46892

llvm-svn: 339651
2018-08-14 08:01:38 +00:00
Stephen Kelly f2ceec4811 Port getLocStart -> getBeginLoc
Reviewers: teemperor!

Subscribers: jholewinski, whisperity, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D50350

llvm-svn: 339385
2018-08-09 21:08:08 +00:00
Craig Topper 0a4f6be443 [Builtins] Implement __builtin_clrsb to be compatible with gcc
gcc defines an intrinsic called __builtin_clrsb which counts the number of extra sign bits on a number. This is equivalent to counting the number of leading zeros on a positive number or the number of leading ones on a negative number and subtracting one from the result. Since we can't count leading ones we need to invert negative numbers to count zeros.

This patch will cause the builtin to be expanded inline while gcc uses a call to a function like clrsbdi2 that is implemented in libgcc. But this is similar to what we already do for popcnt. And I don't think compiler-rt supports clrsbdi2.

Differential Revision: https://reviews.llvm.org/D50168

llvm-svn: 339282
2018-08-08 19:55:52 +00:00
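
A few worked values for a 32-bit int, per the definition above (the sign bit is followed by this many identical bits):

  void clrsb_examples(void) {
    int a = __builtin_clrsb(0);  /* 31 */
    int b = __builtin_clrsb(1);  /* 30 */
    int c = __builtin_clrsb(-1); /* 31 */
    (void)a; (void)b; (void)c;
  }
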
Scott Linder f8b3df4dec [OpenCL] Restore r338899 (reverted in r338904), fixing stack-use-after-return
Always emit alloca in entry block for enqueue_kernel builtin.

Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.

llvm-svn: 339150
2018-08-07 15:52:49 +00:00
Vlad Tsyrklevich c7d3d34b98 Revert "[OpenCL] Always emit alloca in entry block for enqueue_kernel builtin"
This reverts commit r338899, it was causing ASan test failures on sanitizer-x86_64-linux-fast.

llvm-svn: 338904
2018-08-03 17:47:58 +00:00
Scott Linder 91f578467c [OpenCL] Always emit alloca in entry block for enqueue_kernel builtin
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.

Differential Revision: https://reviews.llvm.org/D50104

llvm-svn: 338899
2018-08-03 15:50:52 +00:00
Heejin Ahn 00aa81b4df [WebAssembly] Support for atomic.wait / atomic.wake builtins
Summary:
Add support for atomic.wait / atomic.wake builtins based on the Wasm
thread proposal.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits

Differential Revision: https://reviews.llvm.org/D49396

llvm-svn: 338771
2018-08-02 21:44:40 +00:00
Matt Arsenault c65f966d76 Try to make builtin address space declarations not useless
The way address space declarations for builtins currently work
is nearly useless. The code assumes the address space used for
builtins is a confusingly named "target address space" from user
code using __attribute__((address_space(N))) that matches
the builtin declaration. There's no way to use this to declare
a builtin that returns a language specific address space.
The terminology used is highly confusing since it has nothing
to do with the address space selected by the target to use
for a language address space.

This feature is essentially unused as-is. AMDGPU and NVPTX
are the only in-tree targets attempting to use this. The AMDGPU
builtins certainly do not behave as intended (i.e. all of the
builtins returning pointers can never compile because the numbered
address space never matches the expected named address space).

The NVPTX builtins are missing tests for some, and the others
seem to rely on an implicit addrspacecast.

Change the used address space for builtins based on a target
hook to allow using a language address space for a builtin.
This allows the same builtin declaration to be used for multiple
languages with similarly purposed address spaces (e.g. the same
AMDGPU builtin can be used in OpenCL and CUDA even though the
constant address spaces are arbitrarily different).

This breaks the possibility of using arbitrary numbered
address spaces alongside the named address spaces for builtins.
If this is an issue we probably need to introduce another builtin
declaration character to distinguish language address spaces from
so-called "target address spaces".

llvm-svn: 338707
2018-08-02 12:14:28 +00:00
Fangrui Song 6907ce2f8f Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338291
2018-07-30 19:24:48 +00:00
Ivan A. Kosarev 8264bb8d34 [NEON] Fix support for vrndi_f32(), vrndiq_f32() and vrndns_f32() intrinsics
This patch adds support for vrndi_f32() and vrndiq_f32()
intrinsics in AArch32 mode and for vrndns_f32() intrinsic in
AArch64 mode.

Differential Revision: https://reviews.llvm.org/D48829

llvm-svn: 337690
2018-07-23 13:26:37 +00:00
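
A usage sketch (requires arm_neon.h and a target with the relevant FP support):

  #include <arm_neon.h>

  float32x2_t round_using_current_mode(float32x2_t v) {
    return vrndi_f32(v); /* round to integral, using the current rounding mode */
  }
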
Erich Keane 3efe00206f Implement cpu_dispatch/cpu_specific Multiversioning
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.

This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.

This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementations
each in their own translation units.

The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
3- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well;
however, declarations are also permitted, since the linker will solve the
issue of multiple emissions.

Differential Revision: https://reviews.llvm.org/D47474

llvm-svn: 337552
2018-07-20 14:13:28 +00:00
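
A sketch of the two attributes (the CPU names are illustrative):

  __attribute__((cpu_specific(generic))) void work(void) { /* baseline body */ }
  __attribute__((cpu_specific(ivybridge))) void work(void) { /* tuned body */ }

  /* The dispatcher: an empty definition or, with this patch, a declaration. */
  __attribute__((cpu_dispatch(generic, ivybridge))) void work(void);
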
Fangrui Song 99337e246c Change \t to spaces
llvm-svn: 337530
2018-07-20 08:19:20 +00:00
Nemanja Ivanovic 2600b839d5 NFC: Remove extraneous semicolons as pointed out in the differential review
The commit for
https://reviews.llvm.org/D49424
missed the comment about the extraneous semicolons. Remove them.

llvm-svn: 337451
2018-07-19 12:49:27 +00:00
Nemanja Ivanovic 1ac56bd33f [PowerPC] Handle __builtin_xxpermdi the same way as GCC does
The codegen for this builtin was initially implemented to match GCC.
However, due to interest from users GCC changed behaviour to account for the
big endian bias of the instruction and correct it. This patch brings the
handling inline with GCC.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38192

Differential Revision: https://reviews.llvm.org/D49424

llvm-svn: 337449
2018-07-19 12:44:15 +00:00
Mandeep Singh Grang 0054f48b44 [COFF] Add more missing MSVC ARM64 intrinsics
Summary:
Added the following intrinsics:
_BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64
_InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64,
_InterlockedExchangeAdd64, _InterlockedExchangeSub64,
_InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64.

Reviewers: compnerd, mstorsjo, rnk, javed.absar

Reviewed By: mstorsjo

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D49445

llvm-svn: 337327
2018-07-17 22:03:24 +00:00
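
A classic usage sketch of one of the added intrinsics (the helper is illustrative):

  #include <intrin.h>

  int lowest_set_bit(unsigned long long mask) {
    unsigned long index;
    if (_BitScanForward64(&index, mask))
      return (int)index;
    return -1; /* mask was zero */
  }
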
Craig Topper 1faf953d75 [X86] Remove custom handling for __builtin_ia32_divss_round_mask and __builtin_ia32_divsd_round_mask.
llvm-svn: 336628
2018-07-10 00:50:03 +00:00
Craig Topper 638426fc36 [X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that is suitable for use in scalar mask intrinsics.
This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)' and ternary operator used in some of the intrinsics. The old sequence was lowered to a scalar 'and' and a compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.

This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.

I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.

llvm-svn: 336622
2018-07-10 00:37:25 +00:00
Craig Topper 74c10e3236 [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU: the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.

Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.

To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.

To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h: either as an always_inline function containing calls to builtins (which can be target specific or target independent) or vector extension code, or as a macro wrapper around a target specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.

To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
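
A sketch of how such an always_inline header wrapper might be tagged (the
function itself is illustrative, not taken verbatim from immintrin.h):

  #include <immintrin.h>
  /* Tag the wrapper with the 512-bit width its vector arguments require: */
  static __inline__ __m512d
      __attribute__((__always_inline__, __min_vector_width__(512)))
  add_pd512(__m512d __A, __m512d __B) {
    return (__m512d)((__v8df)__A + (__v8df)__B);
  }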

To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.

There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.

Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.

Differential Revision: https://reviews.llvm.org/D48617

llvm-svn: 336583
2018-07-09 19:00:16 +00:00
Craig Topper 8a8d72794f [X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types.
This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma

llvm-svn: 336507
2018-07-08 01:10:47 +00:00
Craig Topper f89f62a680 [X86] When creating a select for scalar masked sqrt and div builtins make sure we optimize the all ones mask case.
This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases.

Factor the code into a helper function to make this easy.

llvm-svn: 336472
2018-07-06 22:46:52 +00:00
Craig Topper be4c2933a2 [X86] Implement __builtin_ia32_vfmaddss and __builtin_ia32_vfmaddsd with native IR using llvm.fma intrinsic.
This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns.

llvm-svn: 336417
2018-07-06 07:14:47 +00:00
Craig Topper 284c5f342c [X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission.
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.

While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.

llvm-svn: 336388
2018-07-05 20:38:31 +00:00
Gabor Buella 9679eb6527 [X86] Fix some vector cmp builtins - TRUE/FALSE predicates
This patch removes an optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.

Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D48715

llvm-svn: 336355
2018-07-05 14:26:56 +00:00
Craig Topper 8bf793fb35 [X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead.
llvm-svn: 335945
2018-06-29 05:43:33 +00:00
Craig Topper 851f363691 [X86] Rename llvm.x86.avx512.mask.fpclass.p* to exclude 'mask.' from the name to match llvm.
llvm-svn: 335745
2018-06-27 15:57:57 +00:00
Ivan A. Kosarev a9f484ac4a [NEON] Support vldNq intrinsics in AArch32 (Clang part)
This patch reworks the support for dup NEON intrinsics as
described in D48439.

Differential Revision: https://reviews.llvm.org/D48440

llvm-svn: 335734
2018-06-27 13:58:43 +00:00
Craig Topper 4ef61aecbd [X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction.
Additional IR is emitted to convert between scalar and vXi1 type to match the expected software interface for the builtin that clang exposes.

llvm-svn: 335564
2018-06-26 00:44:02 +00:00
Gabor Buella 716863c820 [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR
Summary:
Lowering some vector comparison builtins to fcmp IR instructions.
This ignores the signaling behaviour specified in the predicate
argument of said builtins.

Affected AVX512 builtins:

__builtin_ia32_cmpps128_mask
__builtin_ia32_cmpps256_mask
__builtin_ia32_cmpps512_mask
__builtin_ia32_cmppd128_mask
__builtin_ia32_cmppd256_mask
__builtin_ia32_cmppd512_mask

Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma

Reviewed By: craig.topper, spatel, efriedma

Differential Revision: https://reviews.llvm.org/D45616

llvm-svn: 335339
2018-06-22 11:59:16 +00:00
Craig Topper 342b095689 [X86] Update handling in CGBuiltin to be tolerant of out of range immediates.
D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled.

This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.

llvm-svn: 335308
2018-06-21 23:39:47 +00:00
Tomasz Krupa 83ba6fa98d Fix a bug introduced by rL334850
Summary: All *_sqrt_round_s[s|d] intrinsics should execute a square root on the
zeroth element from B (Ops[1]) and insert it into A (Ops[0]), not the other way around.
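
In intrinsic terms, the intended semantics are (a sketch; requires AVX-512F,
and the _mm_sqrt_round_sd spelling is per avx512fintrin.h):

  #include <immintrin.h>
  __m128d sqrt_low(__m128d A, __m128d B) {
    /* r[0] = sqrt(B[0]), r[1] = A[1] */
    return _mm_sqrt_round_sd(A, B, _MM_FROUND_CUR_DIRECTION);
  }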

Reviewers: itaraban, craig.topper

Reviewed By: craig.topper

Subscribers: craig.topper, cfe-commits

Differential Revision: https://reviews.llvm.org/D48288

llvm-svn: 334964
2018-06-18 17:57:05 +00:00
Tomasz Krupa f1792bb3d6 [X86] Lowering sqrt intrinsics to native IR
Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, cfe-commits

Differential Revision: https://reviews.llvm.org/D41168

llvm-svn: 334850
2018-06-15 18:05:59 +00:00
Luke Geeson da2b2e8c26 [AArch64] Reverted rC334696 with Clang VCVTA test fix
llvm-svn: 334820
2018-06-15 10:10:45 +00:00
Craig Topper 31730ae761 [X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files.
The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This created a misleading error message when we checked the range of the immediate passed to the builtin, since the allowed range also got multiplied by 8.

This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic.
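
The resulting header shape, roughly (a sketch of the macro after this change,
not the verbatim emmintrin.h text):

  #define _mm_slli_si128(a, imm) \
    ((__m128i)__builtin_ia32_pslldqi128_byteshift((__v2di)(__m128i)(a), (int)(imm)))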

Fixes the remaining issue from PR37795.

llvm-svn: 334773
2018-06-14 22:02:35 +00:00
Tomasz Krupa 82aa42af49 [X86] Lowering Mask Scalar intrinsics to native IR (Clang part)
Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls
to native IR.

Reviewers: craig.topper, RKSimon, spatel, sroland

Reviewed By: craig.topper

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47979

llvm-svn: 334741
2018-06-14 17:36:23 +00:00
Luke Geeson bb399f8013 [AArch64] reverting rC334693 due to build failures
llvm-svn: 334696
2018-06-14 08:59:33 +00:00
Luke Geeson 010bbbf390 [AArch64] Added support for the vcvta_u16_f16 intrinsic for FP16 Armv8.2-A
llvm-svn: 334693
2018-06-14 08:28:56 +00:00
Mandeep Singh Grang 2d28383097 [COFF] Add ARM64 intrinsics: __yield, __wfe, __wfi, __sev, __sevl
Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.

Reviewers: mstorsjo, compnerd, javed.absar

Reviewed By: mstorsjo

Subscribers: kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D48132

llvm-svn: 334639
2018-06-13 18:49:35 +00:00
Craig Topper 201b9dd334 [X86] Fix operand order in the shuffle created for blend builtins.
This was broken when the builtin was added in r334249.

llvm-svn: 334422
2018-06-11 17:06:01 +00:00
Craig Topper 3cce6a7ed9 [X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins.
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.

Reviewers: RKSimon, delena, spatel, GBuella

Reviewed By: RKSimon

Subscribers: cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D47693

llvm-svn: 334366
2018-06-10 17:27:05 +00:00
Ivan A. Kosarev 73c76c35a5 [NEON] Support VST1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47446

llvm-svn: 334362
2018-06-10 09:28:10 +00:00
Craig Topper 88097d9355 [X86] Add back some masked vector truncate builtins. Custom IRgen for a few others.
I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl feature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl.

By using special builtins we can emit a select without using the 128/256-bit select builtins.

llvm-svn: 334331
2018-06-08 21:50:08 +00:00
Craig Topper 5f50f33806 [X86] Fold masking into subvector extract builtins.
I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features.

The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported.

llvm-svn: 334330
2018-06-08 21:50:07 +00:00
Craig Topper 03f4f04b91 [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking.
llvm-svn: 334311
2018-06-08 18:00:25 +00:00
Craig Topper 422a1bbb84 [X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking.
llvm-svn: 334266
2018-06-08 07:18:33 +00:00
Craig Topper 03de166ccd [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
llvm-svn: 334265
2018-06-08 06:13:16 +00:00
Craig Topper 3428beeb2f [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.

llvm-svn: 334261
2018-06-08 03:24:47 +00:00
Craig Topper acf5601961 [X86] Add builtins for vpermilps/pd instructions to enable target feature checking.
llvm-svn: 334256
2018-06-08 00:59:27 +00:00
Craig Topper 7d17d7278b [X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range.
llvm-svn: 334249
2018-06-08 00:00:21 +00:00
Craig Topper 9392136414 [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shufi64x2 to enable target feature checking and immediate range checking.
llvm-svn: 334244
2018-06-07 23:03:08 +00:00
Reid Kleckner aa46ed9278 [MS] Re-add support for the ARM interlocked bittest intrinsics
Adds support for these intrinsics, which are ARM and ARM64 only:
  _interlockedbittestandreset_acq
  _interlockedbittestandreset_rel
  _interlockedbittestandreset_nf
  _interlockedbittestandset_acq
  _interlockedbittestandset_rel
  _interlockedbittestandset_nf

Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.

llvm-svn: 334239
2018-06-07 21:39:04 +00:00
Craig Topper e56819eb69 [X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking.
We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.

I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.

llvm-svn: 334237
2018-06-07 21:27:41 +00:00
Craig Topper d3623155a2 [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.

This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.

llvm-svn: 334208
2018-06-07 17:28:03 +00:00
Craig Topper b92c77d176 [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins.
Summary:
We recently switched to using selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with a rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now, the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over-expanding macros. Or if it's something we can CSE later, it would show up multiple times when it shouldn't. (The hazard is sketched below.)
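
The hazard in miniature (SELECT and FMA here are placeholder macros, not the
real header definitions):

  #define MASK_FMA(A, U, B, C) SELECT((U), FMA((A), (B), (C)), (A))
  /* (A) appears twice after expansion, so MASK_FMA(v++, k, x, y) would
     increment v twice, and a large intrinsic expression passed as A gets
     emitted twice. */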

I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.

The other option, which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin, which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version, or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.

I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.

Reviewers: tkrupa, RKSimon, spatel, GBuella

Reviewed By: tkrupa

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47724

llvm-svn: 334159
2018-06-07 02:46:02 +00:00
Reid Kleckner 11c99ed05f [MS][ARM64]: Promote _setjmp to _setjmpex as there is no _setjmp in the ARM64 libvcruntime.lib
Factor out the common setjmp call emission code.

Based on a patch by Chris January

Differential Revision: https://reviews.llvm.org/D47784

llvm-svn: 334112
2018-06-06 18:39:47 +00:00
Reid Kleckner 05df851327 Fix std::tuple errors
llvm-svn: 334060
2018-06-06 01:44:10 +00:00
Reid Kleckner 368d52b7e0 Implement bittest intrinsics generically for non-x86 platforms
I tested these locally on an x86 machine by disabling the inline asm
codepath and confirming that it does the same bitflips as we do with the
inline asm.

Addresses code review feedback.

llvm-svn: 334059
2018-06-06 01:35:08 +00:00
Craig Topper f3914b74c1 [X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics.
Previously we were just using extended vector operations in the header file.

This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load.

By adding the builtins we can check that the index is a constant and ensure it's in range of the vector element count, as shown below.
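
For example (a sketch; SSE2's _mm_extract_epi16 is one of the affected
intrinsics):

  #include <emmintrin.h>
  int third(__m128i v) {
    return _mm_extract_epi16(v, 3);  /* OK: constant index in range [0, 7] */
  }
  /* Passing a runtime variable as the index is now rejected at compile time. */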

User code still has the option to use extended vector operations themselves if they need non-constant indexing.

llvm-svn: 334057
2018-06-06 00:24:55 +00:00
Craig Topper 6b5b5ce06c [X86] Implement __builtin_ia32_vec_ext_v2si correctly even though we only use it with an index of 0.
This builtin takes an index as its second operand, but the codegen hardcodes an index of 0 and doesn't use the operand. The only use of the builtin in the header file passes 0 to the operand so this works for that usage. But it's more correct to use the real operand.

llvm-svn: 334054
2018-06-05 22:40:03 +00:00
Reid Kleckner 1d9c249db5 Reimplement the bittest intrinsic family as builtins with inline asm
We need to implement _interlockedbittestandset as a builtin for
windows.h, so we might as well do the whole family. It reduces code
duplication anyway.
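
A usage sketch of one family member (MSVC-compatible environments; the values
are illustrative):

  #include <intrin.h>
  void set_flag(volatile long *Flags) {
    /* Atomically set bit 3 and get the bit's previous value: */
    unsigned char prev = _interlockedbittestandset(Flags, 3);
    (void)prev;
  }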

Fixes PR33188, a long standing bug in our bittest implementation
encountered by Chakra.

llvm-svn: 333978
2018-06-05 01:33:40 +00:00
Reid Kleckner 89fbd55145 Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms."
Adding __attribute__((aligned(32))) to __m256 breaks the implementation
of _mm256_loadu_ps on Windows. On Windows, alignment attributes have
higher precedence than packing attributes.

We also might want to carefully consider the consequences of changing
our vector typedefs, since many users copy them and invent their own
new, non-Intel specific vector type names.

llvm-svn: 333958
2018-06-04 21:39:20 +00:00
Craig Topper 6fb26f93ef [X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with unaligned load intrinsics and a __builtin_shufflevector call.
llvm-svn: 333853
2018-06-03 19:42:59 +00:00
Craig Topper f886b44693 [X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC
llvm-svn: 333851
2018-06-03 19:02:57 +00:00
Craig Topper 8508c1db98 Revert r333848 "[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC"
Looks like I missed some changes to make this work.

llvm-svn: 333850
2018-06-03 18:41:22 +00:00
Craig Topper d4a610f6f7 [X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC
llvm-svn: 333848
2018-06-03 18:08:37 +00:00
Craig Topper 21f56f5b9c [X86] When emitting masked loads/stores don't check for all ones mask.
This seems like a premature optimization. It's unlikely a user would pass something the frontend can tell is all ones to the masked load/store intrinsics.

We do this optimization for emitting select for masking because we have builtin calls in header files that pass an all-ones mask in. Though at this point we may no longer have any builtins that emit some IR and a select; we may only have the select builtins, so maybe we can remove that optimization too.

llvm-svn: 333847
2018-06-03 18:08:36 +00:00
Ivan A. Kosarev 9c40c0ad0c [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47121

llvm-svn: 333829
2018-06-02 17:42:59 +00:00
John McCall 280c656031 Cap "voluntary" vector alignment at 16 for all Darwin platforms.
This fixes two major problems:
- We were not capping vector alignment as desired on 32-bit ARM.
- We were using different alignments based on the AVX settings on
  Intel, so we did not have a consistent ABI.

This is an ABI break, but we think we can get away with it because
vectors tend to be used mostly in inline code (which is why not having
a consistent ABI has not proven disastrous on Intel).

Intel's AVX types are specified as having 32-byte / 64-byte alignment,
so align them explicitly instead of relying on the base ABI rule.
Note that this sort of attribute is stripped from template arguments
in template substitution, so there's a possibility that code templated
over vectors will produce inadequately-aligned objects.  The right
long-term solution for this is for alignment attributes to be
interpreted as true qualifiers and thus preserved in the canonical type.

llvm-svn: 333791
2018-06-01 21:34:26 +00:00
Dan Gohman 9f8ee03772 [WebAssembly] Update to the new names for the memory builtin functions.
The WebAssembly committee has decided on the names `memory.size` and
`memory.grow` for the memory intrinsics, so update the clang builtin
functions to follow those names, keeping both sets of old names in place
for compatibility.

llvm-svn: 333712
2018-06-01 00:05:51 +00:00
Gabor Buella 70d8d51073 [X86] Lowering FMA intrinsics to native IR (Clang part)
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.

Patch by tkrupa

Reviewers: craig.topper, sroland, spatel, RKSimon

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D47444

llvm-svn: 333555
2018-05-30 15:27:49 +00:00
Simon Tatham 89e31fa7fc Support __iso_volatile_load8 etc on aarch64-win32.
These intrinsics are used by MSVC's header files on AArch64 Windows as
well as AArch32, so we should support them for both targets. I've
factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate
functions that EmitAArch64BuiltinExpr can call as well.

Reviewers: javed.absar, mstorsjo

Reviewed By: mstorsjo

Subscribers: kristof.beyls, cfe-commits

Differential Revision: https://reviews.llvm.org/D47476

llvm-svn: 333513
2018-05-30 07:54:05 +00:00
Craig Topper f2043b08b4 [X86] Remove mask argument from more builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead.
llvm-svn: 333061
2018-05-23 04:51:54 +00:00
Sanjay Patel 74c7fb002f [CodeGen] use nsw negation for builtin abs
The clang builtins have the same semantics as the stdlib functions.
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."

That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.

Differential Revision: https://reviews.llvm.org/D47202

llvm-svn: 333038
2018-05-22 23:02:13 +00:00
Craig Topper 8e3689c066 [X86] Remove mask argument from some builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead.
llvm-svn: 333027
2018-05-22 20:48:24 +00:00
Sanjay Patel 1ff6b27940 [CodeGen] produce the LLVM canonical form of abs
We chose the 'slt' form as canonical in IR with:
rL332819
...so we should generate that form directly for efficiency.

llvm-svn: 332989
2018-05-22 15:36:50 +00:00
Craig Topper 288bd2e5a0 [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead.
Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and a pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro, we could expand an expression twice, leading to weird behavior. We could maybe declare a local variable in the macro, but then we would need to worry about name collisions.

To avoid that just generate IR directly in CGBuiltin.cpp.

Differential Revision: https://reviews.llvm.org/D47125

llvm-svn: 332891
2018-05-21 20:58:23 +00:00
Daniil Fukalov 1b14a3ad3d [AMDGPU] fixes for lds f32 builtins
1. added restrictions to memory scope, order and volatile parameters
2. added custom processing for these builtins - currently unused code, needed
   to switch off the GCCBuiltin link to the builtins (ongoing change to the
   llvm tree)
3. builtins renamed as requested

Differential Revision: https://reviews.llvm.org/D43281

llvm-svn: 332848
2018-05-21 16:18:07 +00:00
Craig Topper 74ac0eda68 [X86] Change the implementation of scalar masked load/store intrinsics to not use a 512-bit intermediate vector.
This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.

Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.

llvm-svn: 331958
2018-05-10 05:43:43 +00:00
Craig Topper 2b248849ae [Builtins] Improve the IR emitted for MSVC compatible rotr/rotl builtins to match what the middle and backends understand
Previously we emitted something like

rotl(x, n) {
  n &= bitwidth-1;
  return n != 0 ? ((x << n) | (x >> (bitwidth - n))) : x;
}

We use a select to avoid the undefined behavior on the (bitwidth - n) shift.

The middle-end and backend don't really recognize this as a rotate and end up emitting a cmov or control flow because of the select.

A better pattern is (x << (n & mask)) | (x >> (-n & mask)) where mask is bitwidth - 1.
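
Written out in C for a 32-bit rotate-left (a sketch of that pattern):

  unsigned rotl32(unsigned x, unsigned n) {
    /* -n & 31 == (32 - n) & 31, and both shift amounts stay in [0, 31],
       so there is no UB and no branch for the middle/back ends to see. */
    return (x << (n & 31)) | (x >> (-n & 31));
  }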

Fixes the main complaint in PR37387. There's still some work to be done if the user writes that sequence directly on a short or char where type promotion rules can prevent it from being recognized. The builtin is emitting direct IR with unpromoted types so that isn't a problem for it.

Differential Revision: https://reviews.llvm.org/D46656

llvm-svn: 331943
2018-05-10 00:05:13 +00:00
Yaxun Liu 3cab24aa4f [OpenCL] Fix typos in emitted enqueue kernel function names
Two typos: 
vaarg => vararg
get_kernel_preferred_work_group_multiple => get_kernel_preferred_work_group_size_multiple

Differential Revision: https://reviews.llvm.org/D46601

llvm-svn: 331895
2018-05-09 17:07:06 +00:00
Adrian Prantl 9fc8faf9e6 Remove \brief commands from doxygen comments.
This is similar to the LLVM change https://reviews.llvm.org/D46290.

We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46320

llvm-svn: 331834
2018-05-09 01:00:01 +00:00
Oliver Stannard 2fcee8bd52 [ARM,AArch64] Add intrinsics for dot product instructions
The ACLE spec which describes these intrinsics hasn't been published yet, but
this is based on the final draft which will be published soon, and these have
already been implemented by GCC.
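
A usage sketch (the vdot_u32 spelling follows the ACLE draft; requires the
dot product extension):

  #include <arm_neon.h>
  uint32x2_t dot2(uint32x2_t acc, uint8x8_t a, uint8x8_t b) {
    /* Each 32-bit lane accumulates the dot product of four u8 pairs. */
    return vdot_u32(acc, a, b);
  }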

Differential revision: https://reviews.llvm.org/D46109

llvm-svn: 331039
2018-04-27 14:03:32 +00:00
Sven van Haastregt 4700faa28e [OpenCL] Add separate read_only and write_only pipe IR types
SPIR-V encodes the read_only and write_only access qualifiers of pipes,
so separate LLVM IR types are required to target SPIR-V.  Other backends
may also find this useful.

These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which
replace `opencl.pipe_t`.

This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...)
which took a read_only pipe with separate versions for read_only and
write_only pipes, namely:

 * __get_pipe_num_packets_ro(...)
 * __get_pipe_num_packets_wo(...)
 * __get_pipe_max_packets_ro(...)
 * __get_pipe_max_packets_wo(...)

These separate versions exist to avoid needing a bitcast to one of the
two qualified pipe types.

Patch by Stuart Brady.

Differential Revision: https://reviews.llvm.org/D46015

llvm-svn: 331026
2018-04-27 10:37:04 +00:00
Chandler Carruth 16429acacb [x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics
The LLVM commit introduces a crash in LLVM's instruction selection.

I filed http://llvm.org/PR37260 with the test case.

llvm-svn: 330997
2018-04-26 21:46:01 +00:00
Alexander Ivchenko d96ddccdb4 Lowering x86 adds/addus/subs/subus intrinsics (clang)
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations.

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D44786

llvm-svn: 330323
2018-04-19 12:15:11 +00:00
Artem Belevich 0ae8590354 [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.
The new instructions were added for sm_70+ GPUs in CUDA-9.1.

Differential Revision: https://reviews.llvm.org/D45068

llvm-svn: 330296
2018-04-18 21:51:48 +00:00
Keith Wyss f437e35671 [XRay] Add clang builtin for xray typed events.
Summary:
A clang builtin for xray typed events. Differs from
__xray_customevent(...) by the presence of a type tag that is vended by
compiler-rt in typical usage. This allows xray handlers to expand logged
events with their type description and plugins to process traced events
based on type.

This change depends on D45633 for the intrinsic definition.

Reviewers: dberris, pelikan, rnk, eizan

Subscribers: cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D45716

llvm-svn: 330220
2018-04-17 21:32:43 +00:00
Aaron Ballman fe93546b11 Add modifiers for unsigned char and signed char field printing for __builtin_dump_struct.
Patch by Paul Semel.

llvm-svn: 330188
2018-04-17 14:00:06 +00:00
Aaron Ballman b6a7702297 Add checks for format specifiers used by __builtin_dump_struct and added a new specifier for null-terminated constant strings.
Patch by Paul Semel.

llvm-svn: 330185
2018-04-17 11:57:47 +00:00
Ivan A. Kosarev 9cdb2c75d9 [NEON] Support vrndns_f32 intrinsic
Differential Revision: https://reviews.llvm.org/D45515

llvm-svn: 330012
2018-04-13 12:46:02 +00:00
Dean Michael Berris 488f7c2b67 [XRay][clang] Add flag to choose instrumentation bundles
Summary:
This change addresses http://llvm.org/PR36926 by allowing users to pick
which instrumentation bundles to use, when instrumenting with XRay. In
particular, the flag `-fxray-instrumentation-bundle=` has four valid
values:

- `all`: the default, emits all instrumentation kinds
- `none`: equivalent to -fno-xray-instrument
- `function`: emits the entry/exit instrumentation
- `custom`: emits the custom event instrumentation

These can be combined either as comma-separated values, or as
repeated flag values.

Reviewers: echristo, kpw, eizan, pelikan

Reviewed By: pelikan

Subscribers: mgorny, cfe-commits

Differential Revision: https://reviews.llvm.org/D44970

llvm-svn: 329985
2018-04-13 02:31:58 +00:00
Aaron Ballman 0652534131 Introduce a new builtin, __builtin_dump_struct, that is useful for dumping structure contents at runtime in circumstances where debuggers may not be easily available (such as in kernel work).
Patch by Paul Semel.
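
A usage sketch:

  #include <stdio.h>
  struct Point { int x, y; };
  void show(struct Point *p) {
    /* Prints each field name and value via the supplied printf-like function. */
    __builtin_dump_struct(p, &printf);
  }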

llvm-svn: 329762
2018-04-10 21:58:13 +00:00
Craig Topper 304edc1e75 [X86] Emit native IR for pmuldq/pmuludq builtins.
I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludq or shl/ashr for pmuldq and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ and then SimplifyDemandedBits will remove the and/shifts.

Differential Revision: https://reviews.llvm.org/D45421

llvm-svn: 329605
2018-04-09 19:17:54 +00:00
Alexander Kornienko 2a8c18d991 Fix typos in clang
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

llvm-svn: 329399
2018-04-06 15:14:32 +00:00
Krzysztof Parzyszek 49fb6b5ecf [Hexagon] Remove default values from lambda parameters
llvm-svn: 329394
2018-04-06 13:51:48 +00:00
Gor Nishanov 2a78fa5209 [coroutines] Add __builtin_coro_noop => llvm.coro.noop
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined
coroutine noop_coroutine that does nothing. To support this feature, we added
an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that
does nothing when resumed or destroyed.

This patch adds a builtin __builtin_coro_noop() that maps to llvm.coro.noop intrinsic.

Related llvm change: https://reviews.llvm.org/D45114

llvm-svn: 328993
2018-04-02 17:35:37 +00:00
Krzysztof Parzyszek 790e422be9 [Hexagon] Aid bit-reverse load intrinsics lowering with bitcode
The conversion of operations to bitcode helps to eliminate an additional
store in certain cases. We used to lower these load intrinsics during
DAG-to-DAG conversion, by which time the "Dead Store Elimination" pass has
already run. There is an associated LLVM patch.
    
Patch by Sumanth Gundapaneni.

llvm-svn: 328776
2018-03-29 13:54:31 +00:00
Krzysztof Parzyszek 1ef2a1f414 [Hexagon] Add support for "new" circular buffer intrinsics
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" versions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.

There is a related llvm patch.

Patch by Brendon Cahoon.

llvm-svn: 328725
2018-03-28 19:40:57 +00:00
Abderrazek Zaafrani b5ac56fb81 [ARM] Add ARMv8.2-A FP16 vector intrinsic
Putting back the code from commit r327189 that was reverted in r327437. The code is being committed in three stages and this one is the last stage: 1) r327455 fp16 feature flags, 2) r327836 pass half type or i16 based on FullFP16, and 3) the code here, which adds the front-end fp16 vector intrinsics for ARM.

Differential revision https://reviews.llvm.org/D43650

llvm-svn: 328277
2018-03-23 00:08:40 +00:00
Artem Belevich 30512869ff [NVPTX] Make tensor shape part of WMMA intrinsic's name.
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.

Differential Revision: https://reviews.llvm.org/D44719

llvm-svn: 328158
2018-03-21 21:55:02 +00:00
Eric Fiselier fa752f23cc [Builtins] Overload __builtin_operator_new/delete to allow forwarding to usual allocation/deallocation functions.
Summary:
Libc++'s default allocator uses `__builtin_operator_new` and `__builtin_operator_delete` in order to allow the calls to new/delete to be elided. However, libc++ now needs to support over-aligned types in the default allocator. In order to support this without disabling the existing optimization, Clang needs to support calling the aligned new overloads from the builtins.

See llvm.org/PR22634 for more information about the libc++ bug.

This patch changes `__builtin_operator_new`/`__builtin_operator_delete` to call any usual `operator new`/`operator delete` function. It does this by performing overload resolution with the arguments passed to the builtin to determine which allocation function to call. If the selected function is not a usual allocation function, a diagnostic is issued.

One open issue is whether the `align_val_t` overloads should be considered "usual" when `LangOpts::AlignedAllocation` is disabled.


In order to allow libc++ to detect this new behavior the value for `__has_builtin(__builtin_operator_new)` has been updated to `201802`.

Reviewers: rsmith, majnemer, aaron.ballman, erik.pilkington, bogner, ahatanak

Reviewed By: rsmith

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D43047

llvm-svn: 328134
2018-03-21 19:19:48 +00:00
Abderrazek Zaafrani 585051ae74 [AArch64] Add vmulxh_lane fp16 vector intrinsic
https://reviews.llvm.org/D44591

llvm-svn: 328038
2018-03-20 20:37:31 +00:00
Artem Belevich 914d4babec [NVPTX] Make tensor load/store intrinsics overloaded.
This way we can support address-space specific variants without explicitly
encoding the space in the name of the intrinsic. Fewer intrinsics to deal with ->
less boilerplate.

Added a bit of tablegen magic to match/replace an intrinsic with a pointer
argument in particular address space with the space-specific instruction
variant.

Updated tests to use non-default address spaces.

Differential Revision: https://reviews.llvm.org/D43268

llvm-svn: 328006
2018-03-20 17:18:59 +00:00
Sjoerd Meijer 87793e7599 [ARM] Pass half or i16 types for NEON intrinsics
For generating NEON intrinsics, this determines the NEON data type and whether
it should be a half type or an i16 type. I.e., we always pass a half type for
AArch64 (this hasn't changed), and now also for ARM when FullFP16 is enabled,
and an i16 type otherwise.

This is intended to be a non-functional change, but together with the backend
work in D44538 which adds support for f16 vectors, this enables adding the
AArch32 FP16 (vector) intrinsics.

Differential Revision: https://reviews.llvm.org/D44561

llvm-svn: 327836
2018-03-19 13:22:49 +00:00
Sjoerd Meijer 95da875898 This reverts "r327189 - [ARM] Add ARMv8.2-A FP16 vector intrinsic"
This is causing problems in testing, and PR36683 was raised.
Reverting it until we have sorted out how to pass f16 vectors.

llvm-svn: 327437
2018-03-13 19:38:56 +00:00
Abderrazek Zaafrani 5bd68cf742 [ARM] Add ARMv8.2-A FP16 vector intrinsic
Add the fp16 neon vector intrinsic for ARM as described in the ARM ACLE document.

Reviews in https://reviews.llvm.org/D43650

llvm-svn: 327189
2018-03-09 23:39:34 +00:00
Craig Topper ebb0838f74 [X86] Reverse the operand order of the implementation of the kunpack builtins.
The second operand needs to be in the lower bits of the concatenation. This matches llvm 5.0, gcc, and icc behavior.

Fixes PR36360.

llvm-svn: 324954
2018-02-12 22:38:52 +00:00
Abderrazek Zaafrani e7ed880761 [AArch64] Fixes for ARMv8.2-A FP16 scalar intrinsic - clang portion
https://reviews.llvm.org/D42993

llvm-svn: 324940
2018-02-12 21:26:06 +00:00
Craig Topper a57d64e30f [X86] Change the signature of the AVX512 packed fp compare intrinsics to return vXi1 mask. Make bitcasts to scalar explicit in IR
Summary: This is the clang equivalent of r324827

Reviewers: zvi, delena, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43143

llvm-svn: 324828
2018-02-10 23:34:27 +00:00
Craig Topper c0b2e982d9 [X86] Replace kortest intrinsics with native IR.
llvm-svn: 324647
2018-02-08 20:16:17 +00:00
Peter Collingbourne 9e31f0a389 IRGen: Emit an inline implementation of __builtin_wmemcmp on MSVCRT platforms.
The MSVC runtime library does not provide a definition of wmemcmp,
so we need an inline implementation.
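
The reference semantics the inline expansion needs to match (a plain-C sketch,
not the emitted IR):

  #include <stddef.h>
  #include <wchar.h>
  int wmemcmp_ref(const wchar_t *a, const wchar_t *b, size_t n) {
    for (; n; ++a, ++b, --n) {
      if (*a != *b)
        return *a < *b ? -1 : 1;
    }
    return 0;
  }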

Differential Revision: https://reviews.llvm.org/D42441

llvm-svn: 323362
2018-01-24 18:59:58 +00:00
Dan Gohman 4f637e0ccc [WebAssembly] Add mem.* builtin functions.
This corresponds to r323222 in LLVM. The new names are not yet
finalized, so use them at your own risk.

llvm-svn: 323224
2018-01-23 17:04:04 +00:00
Abderrazek Zaafrani ce8746d178 [AArch64] Add ARMv8.2-A FP16 scalar intrinsics
https://reviews.llvm.org/D41792

llvm-svn: 323006
2018-01-19 23:11:18 +00:00
Craig Topper f517f1a516 [X86] Implement old kunpck intrinsics using vector ops on vXi1 instead of integer shift/and/or
Summary:
kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowered them by operating on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this pattern and turn it into a concat_vector operation.

I think it makes more sense to keep the IR implementation closer to vector operations on vXi1, given that we expect these builtins to be used around other builtins that operate on k-registers, which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1, leaving only the vector operations.

Reviewers: RKSimon, spatel, zvi, jina.nahias

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D42016

llvm-svn: 322461
2018-01-14 19:23:50 +00:00
Craig Topper de91dff5d4 [X86] Replace cvt*2mask intrinsics with native IR using 'icmp slt X, zeroinitializer'.
llvm-svn: 322038
2018-01-08 22:37:56 +00:00
Benjamin Kramer dfecbe9ad8 Add support for a limited subset of TS 18661-3 math builtins.
These are just overloads for _Float128. They're supported by GCC 7 and used
by glibc. APFloat support is already there, so just add the overloads.

__builtin_copysignf128
__builtin_fabsf128
__builtin_huge_valf128
__builtin_inff128
__builtin_nanf128
__builtin_nansf128

This is the same support that GCC has, according to the documentation,
but limited to _Float128.

llvm-svn: 321948
2018-01-06 21:49:54 +00:00
Vedant Kumar bbafd50756 [CGBuiltin] Handle unsigned mul overflow properly (PR35750)
r320902 fixed the IRGen for some types of checked multiplications. It
did not handle unsigned overflow correctly in the case where the signed
operand is negative (PR35750).

Eli pointed out that on overflow, the result must be equal to the unique
value that is equivalent to the mathematically-correct result modulo two
raised to the k power, where k is the number of bits in the result type.
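
Concretely, for a 64-bit unsigned result (a sketch of the rule above):

  #include <stdint.h>
  int demo(void) {
    int64_t a = -3;
    uint64_t b = 2, res;
    /* The mathematical result, -6, is not representable in uint64_t, so
       overflow is reported and res holds -6 wrapped mod 2^64. */
    int overflowed = __builtin_mul_overflow(a, b, &res);
    return overflowed;  /* 1; res == 18446744073709551610 */
  }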

This patch fixes the specialized IRGen from r320902 accordingly.

Testing: Apart from check-clang, I modified the test harness from
r320902 to validate the results of all multiplications -- not just the
ones which don't overflow:

  https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081

llvm.org/PR35750, rdar://34963321

Differential Revision: https://reviews.llvm.org/D41717

llvm-svn: 321771
2018-01-03 23:11:32 +00:00
Coby Tayree 2268576fa0 [x86][icelake][bitalg]
added bitalg feature recognition
added intrinsics support for bitalg instructions
_mm512_popcnt_epi16
_mm512_mask_popcnt_epi16
_mm512_maskz_popcnt_epi16
_mm512_popcnt_epi8
_mm512_mask_popcnt_epi8
_mm512_maskz_popcnt_epi8
_mm512_mask_bitshuffle_epi64_mask
_mm512_bitshuffle_epi64_mask
_mm256_popcnt_epi16
_mm256_mask_popcnt_epi16
_mm256_maskz_popcnt_epi16
_mm128_popcnt_epi16
_mm128_mask_popcnt_epi16
_mm128_maskz_popcnt_epi16
_mm256_popcnt_epi8
_mm256_mask_popcnt_epi8
_mm256_maskz_popcnt_epi8
_mm128_popcnt_epi8
_mm128_mask_popcnt_epi8
_mm128_maskz_popcnt_epi8
_mm256_mask_bitshuffle_epi32_mask
_mm256_bitshuffle_epi32_mask
_mm128_mask_bitshuffle_epi16_mask
_mm128_bitshuffle_epi16_mask
matching similar work on the backend (D40222)
Differential Revision: https://reviews.llvm.org/D41564

llvm-svn: 321483
2017-12-27 10:01:00 +00:00
Craig Topper 170de4b4ba [X86] Allow _mm_prefetch (both the header implementation and the builtin) to accept bit 2 which is supposed to indicate the prefetched addresses will be written to
Add the appropriate _MM_HINT_ET0/ET1 defines to match gcc.
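
Usage sketch with the new write-intent hint:

  #include <xmmintrin.h>
  void warm(const void *p) {
    _mm_prefetch((const char *)p, _MM_HINT_ET0);  /* prefetch with intent to write */
  }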

llvm-svn: 321325
2017-12-21 23:50:22 +00:00
Abderrazek Zaafrani abb890b7be [AArch64] Enable fp16 data type for the Builtin for AArch64 only.
Differential Revision: https://reviews.llvm.org/D41360

llvm-svn: 321301
2017-12-21 20:10:03 +00:00
Abderrazek Zaafrani f58a132eef [AArch64] Add ARMv8.2-A FP16 vector intrinsics
Putting back the code that was reverted a few weeks ago.

Differential Revision: https://reviews.llvm.org/D34161

llvm-svn: 321294
2017-12-21 19:20:01 +00:00
Vedant Kumar 09b5bfdd85 [ubsan] Diagnose noreturn functions which return
Diagnose 'unreachable' UB when a noreturn function returns.

  1. Insert a check at the end of functions marked noreturn.

  2. A decl may be marked noreturn in the caller TU, but not marked in
     the TU where it's defined. To diagnose this scenario, strip away the
     noreturn attribute on the callee and insert check after calls to it.

Testing: check-clang, check-ubsan, check-ubsan-minimal, D40700

rdar://33660464

Differential Revision: https://reviews.llvm.org/D40698

llvm-svn: 321231
2017-12-21 00:10:25 +00:00
Adrian Prantl f3b3ccda59 Silence a bunch of implicit fallthrough warnings
llvm-svn: 321115
2017-12-19 22:06:11 +00:00
Craig Topper 5028ace602 [X86] Implement kand/kandn/kor/kxor/kxnor/knot intrinsics using native IR.
llvm-svn: 320919
2017-12-16 08:26:22 +00:00
Craig Topper b846d1ff76 [X86] Add builtins and tests for 128 and 256 bit vpopcntdq.
llvm-svn: 320915
2017-12-16 06:02:31 +00:00
Vedant Kumar fa5a0e59f0 [CodeGen] Specialize mixed-sign mul-with-overflow (fix PR34920)
This patch introduces a specialized way to lower overflow-checked
multiplications with mixed-sign operands. This fixes link failures and
ICEs on code like this:

  void mul(int64_t a, uint64_t b) {
    int64_t res;
    __builtin_mul_overflow(a, b, &res);
  }

The generic checked-binop irgen would use a 65-bit multiplication
intrinsic here, which requires runtime support for _muloti4 (128-bit
multiplication), and therefore fails to link on i386. To get an ICE
on x86_64, change the example to use __int128_t / __uint128_t.

Adding runtime and backend support for 65-bit or 129-bit checked
multiplication on all of our supported targets is infeasible.

This patch solves the problem by using simpler, specialized irgen for
the mixed-sign case.

llvm.org/PR34920, rdar://34963321

Testing: Apart from check-clang, I compared the output from this fairly
comprehensive test driver using unpatched & patched clangs:
https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081

Differential Revision: https://reviews.llvm.org/D41149

llvm-svn: 320902
2017-12-16 01:28:25 +00:00
Reid Kleckner 627f45fe52 [CodeGen][X86] Implement _InterlockedCompareExchange128 intrinsic
Summary:
InterlockedCompareExchange128 is a bit more complicated than the other
InterlockedCompareExchange functions, so it requires a bit more work. It
doesn't directly refer to 128-bit ints; instead it takes pointers to
64-bit ints for Destination and ComparandResult, and the exchange value is
taken as two 64-bit ints (high & low). The previous value is written to
ComparandResult, and success is returned. This implementation does the
following in order to produce a cmpxchg instruction (a call-site sketch
follows the list):

  1. Cast everything to 128-bit ints or int pointers, and glue together
     the Exchange values
  2. Read from ComparandResult to get the comparand
  3. Call cmpxchg volatile (on X86 this will produce a lock cmpxchg16b
     instruction)
    1. Result 0 (previous value) is written back to ComparandResult
    2. Result 1 (success bool) is zext'ed to a uchar and returned
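
A call-site sketch (MSVC signature; the values are illustrative):

  #include <intrin.h>
  unsigned char demo(void) {
    __declspec(align(16)) __int64 Dest[2] = {0, 0};  /* 128-bit destination */
    __int64 Comparand[2] = {0, 0};  /* receives the previous value on return */
    return _InterlockedCompareExchange128(Dest, /*ExchangeHigh*/ 1,
                                          /*ExchangeLow*/ 2, Comparand);
  }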

Resolves bug https://llvm.org/PR35251

Patch by Colden Cullen!

Reviewers: rnk, agutowski

Reviewed By: rnk

Subscribers: majnemer, cfe-commits

Differential Revision: https://reviews.llvm.org/D41032

llvm-svn: 320730
2017-12-14 19:00:21 +00:00
Krzysztof Parzyszek 5a6558382c [Hexagon] Intrinsic support for V62 and V65
llvm-svn: 320609
2017-12-13 19:56:03 +00:00
Sanjay Patel 08fba37e9d [CodeGen] fix mapping from fmod calls to frem instruction
Similar to D40044 and discussed in D40594.

llvm-svn: 319619
2017-12-02 17:52:00 +00:00
Sanjay Patel 0c0f77d03d [CodeGen] remove stale comment; NFC
The libm functions with LLVM intrinsic twins were moved above this blob with:
https://reviews.llvm.org/rL319593

llvm-svn: 319618
2017-12-02 16:29:34 +00:00
Sanjay Patel 3e287b4d35 [CodeGen] convert math libcalls/builtins to equivalent LLVM intrinsics
There are 20 LLVM math intrinsics that correspond to mathlib calls according to the LangRef:
http://llvm.org/docs/LangRef.html#standard-c-library-intrinsics

We were only converting 3 mathlib calls (sqrt, fma, pow) and 12 builtin calls (ceil, copysign, 
fabs, floor, fma, fmax, fmin, nearbyint, pow, rint, round, trunc) to their intrinsic equivalents.

This patch pulls the transforms together and handles all 20 cases. The switch is guarded by a 
check for const-ness to make sure we're not doing the transform if errno could possibly be set by
the libcall or builtin.

Differential Revision: https://reviews.llvm.org/D40044

llvm-svn: 319593
2017-12-01 23:15:52 +00:00
Dean Michael Berris 1a5b10d5b4 [XRay][clang] Introduce -fxray-always-emit-customevents
Summary:
The -fxray-always-emit-customevents flag instructs clang to always emit
the LLVM IR for calls to the `__xray_customevent(...)` built-in
function. The default behaviour currently respects whether the function
has an `[[clang::xray_never_instrument]]` attribute, and thus does not lower
the appropriate IR code for the custom event built-in.

This change allows users calling through to the
`__xray_customevent(...)` built-in to always see those calls lowered to
the corresponding LLVM IR to lay down instrumentation points for these
custom event calls.

Using this flag enables us to emit even just the user-provided custom
events even while never instrumenting the start/end of the function
where they appear. This is useful in cases where "phase markers" using
__xray_customevent(...) sit in functions that have very few instructions and
must never be instrumented on entry/exit.

Reviewers: rnk, dblaikie, kpw

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D40601

llvm-svn: 319388
2017-11-30 00:04:54 +00:00
Erich Keane 0a340ab31c [X86] Update CPUSupports code to reuse LLVM .def file [NFC]
llvm-svn: 318815
2017-11-22 00:54:01 +00:00
Erich Keane 8202521cf5 Simplify CpuIs code to use include from LLVM
LLVM exposes a file in the backend (X86TargetParser.def) that
contains information about the correct list of CpuIs values.

This patch removes 2 of the copied and pasted versions of this
list from clang and instead includes the data from the .def file.

Differential Revision: https://reviews.llvm.org/D40054

llvm-svn: 318234
2017-11-15 00:11:24 +00:00
Sanjay Patel 33f83995a8 [CodeGen] fix const-ness of cbrt and fma
cbrt() is always constant because it can't overflow or underflow. Therefore, it can't set errno.

fma() is not always constant because it can overflow or underflow. Therefore, it can set errno.
But we know that it never sets errno on GNU / MSVC, so make it constant in those environments.

Differential Revision: https://reviews.llvm.org/D39641

llvm-svn: 318093
2017-11-13 22:11:49 +00:00
John McCall 26d55e0346 Fix a bug with the use of __builtin_bzero in a conditional expression.
Patch by Bharathi Seshadri!

llvm-svn: 317776
2017-11-09 09:32:32 +00:00
Justin Lebar da9e0bd3a2 [NVPTX] Implement __nvvm_atom_add_gen_d builtin.
Summary:
This just seems to have been an oversight.  We already supported the f64
atomic add with an explicit scope (e.g. "cta"), but not the scopeless
version.

Reviewers: tra

Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39638

llvm-svn: 317623
2017-11-07 22:10:54 +00:00
Craig Topper 57f96ac6dc [X86] Replace the mask cmpeq/cmple/cmplt/cmpgt/cmpge/cmpneq intrinsics with macros that just pass the right comparison predicate value to the regular cmp intrinsic. Remove mask cmpeq/cmpgt builtins that are now unused.
This shortens the intrinsic headers a little and allows us to get rid of the cmpeq and cmpgt handling from CGBuiltin.cpp.

llvm-svn: 317506
2017-11-06 21:00:49 +00:00
Sanjay Patel 7cb25a888c [CodeGen] map sqrt libcalls to llvm.sqrt when errno is not set
The LLVM sqrt intrinsic definition changed with:
D28797
...so we don't have to use any relaxed FP settings other than errno handling.

This patch sidesteps a question raised in PR27435:
https://bugs.llvm.org/show_bug.cgi?id=27435

Is a programmer using __builtin_sqrt() invoking the compiler's intrinsic definition of sqrt or the mathlib definition of sqrt?

But we have an answer now: the builtin should match the behavior of the libm function including errno handling.

Differential Revision: https://reviews.llvm.org/D39204

llvm-svn: 317031
2017-10-31 20:19:39 +00:00
Yaxun Liu c2a87a05f1 [OpenCL] Emit enqueued block as kernel
In OpenCL, kernel functions and non-kernel functions have different calling
conventions. For certain targets they have different argument ABIs. Also,
kernels have special function attributes and metadata for the runtime to launch them.

The blocks passed to enqueue_kernel are supposed to be executed as kernels. As such,
the block invoke function should be emitted as a kernel with the proper calling
convention and argument ABI.

This patch emits the enqueued block as a kernel. If a block is both called directly and passed
to enqueue_kernel, separate functions will be generated.

Differential Revision: https://reviews.llvm.org/D38134

llvm-svn: 315804
2017-10-14 12:23:50 +00:00
Artem Belevich 91cc00bde6 [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions on sm_70
Differential Revision: https://reviews.llvm.org/D38742

llvm-svn: 315624
2017-10-12 21:32:19 +00:00
Craig Topper 8c8e83a15f [X86] Add support for 'amdfam17h' to __builtin_cpu_is to match gcc.
The compiler-rt implementation already supported it, it just wasn't exposed.

llvm-svn: 315517
2017-10-11 21:42:02 +00:00
Matt Arsenault f12e3b848a AMDGPU: Add read_exec_lo/hi builtins
llvm-svn: 315238
2017-10-09 20:06:37 +00:00
Erich Keane 1fe643a6d7 Split X86::BI__builtin_cpu_init handling into own function[NFC]
The Cpu Init functionality is required for the target
attribute, so this patch simply splits it out into its own
function, exactly like CpuIs and CpuSupports.

llvm-svn: 315075
2017-10-06 16:40:45 +00:00
Akira Hatanaka a46381286f Fix check strings in test case and use llvm::to_string instead of
std::to_string.

These changes were needed to fix bots that started failing after
r315045.

llvm-svn: 315046
2017-10-06 07:47:47 +00:00
Akira Hatanaka 6b103bc18c [CodeGen] Emit a helper function for __builtin_os_log_format to reduce
code size.

Currently clang expands a call to __builtin_os_log_format into a long
sequence of instructions at the call site, causing code size to
increase in some cases.

This commit attempts to reduce code size by emitting a helper function
that can be shared by calls to __builtin_os_log_format with similar
formats and arguments. The helper function has linkonce_odr linkage to
enable the linker to merge identical functions across translation units.
Attribute 'noinline' is attached to the helper function at -Oz so that
the inliner doesn't inline functions that can potentially be merged.

This commit also fixes a bug where the generated IR writes past the end
of the buffer when "%m" is the last specifier appearing in the format
string passed to __builtin_os_log_format.

Original patch by Duncan Exon Smith.

rdar://problem/34065973
rdar://problem/34196543

Differential Revision: https://reviews.llvm.org/D38606

llvm-svn: 315045
2017-10-06 07:12:46 +00:00
Artem Belevich bab95c7087 [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Differential Revision: https://reviews.llvm.org/D38191

llvm-svn: 314223
2017-09-26 17:07:23 +00:00
Justin Lebar d31d5e6aa2 Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.", rL314135.
Causing assertion failures on macos:

> Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"),
> function getOperand, file
> /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h,
> line 835.

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/

llvm-svn: 314142
2017-09-25 19:41:56 +00:00
Artem Belevich 9941ee9529 [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Differential Revision: https://reviews.llvm.org/D38191

llvm-svn: 314135
2017-09-25 18:53:57 +00:00
Heejin Ahn b29a17ba21 [WebAssembly] Restore __builtin_wasm_rethrow builtin
Summary:
Restore the `__builtin_wasm_rethrow` builtin deleted in D37931. On second
thought, it appears it can be used to implement `__cxa_rethrow`.
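
A hedged sketch of that use (libcxxabi details elided; assumes the
builtin rethrows the current in-flight exception):

  void __cxa_rethrow(void) {
    /* ... exception bookkeeping elided ... */
    __builtin_wasm_rethrow();
  }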

Reviewers: dschuff, sunfish

Reviewed By: dschuff

Subscribers: jfb, sbc100, jgravelle-google

Differential Revision: https://reviews.llvm.org/D37942

llvm-svn: 313430
2017-09-16 01:07:43 +00:00
Craig Topper 8cd7b0cd2c [X86] Use native shuffle vector for the perm2f128 intrinsics
This patch replaces the perm2f128 intrinsics with native shuffle vectors.

This uses a simple approach: allocate source 0 to the lower-half input and source 1 to the upper-half input. Then it's just a matter of filling in the indices to use either the lower or upper half of that specific source. This can result in the same source being used by both operands; InstCombine or SelectionDAGBuilder should be able to clean that up.
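
A small C sketch of what the lowering means in practice (standard
immintrin.h intrinsic; the 0x31 selector picks the high half of each
source):

  #include <immintrin.h>

  __m256 upper_halves(__m256 a, __m256 b) {
    // Now lowers to a native shufflevector taking elements {4,5,6,7}
    // of a and {4,5,6,7} of b instead of a target-specific intrinsic.
    return _mm256_permute2f128_ps(a, b, 0x31);
  }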

Differential Revision: https://reviews.llvm.org/D37892

llvm-svn: 313418
2017-09-15 23:00:59 +00:00
Heejin Ahn fa9e1fba8c Remove __builtin_wasm_rethrow builtin
Summary:
Remove the `__builtin_wasm_rethrow` builtin. I thought it was required to implement
the `__cxa_rethrow` function in libcxxabi, but it turned out that it will use
`__builtin_wasm_throw` instead.

Reviewers: dschuff, jgravelle-google

Reviewed By: jgravelle-google

Subscribers: jfb, sbc100, jgravelle-google

Differential Revision: https://reviews.llvm.org/D37931

llvm-svn: 313402
2017-09-15 22:01:22 +00:00
Uriel Korach 3fba3c3b0c [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (clang)
This patch, together with a matching llvm patch (https://reviews.llvm.org/D37693), implements the lowering of X86 ABS intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37694

llvm-svn: 313133
2017-09-13 09:02:02 +00:00
Jan Vesely 31ecb4bf60 [OpenCL] Add half load and store builtins
This enables loads and stores of the half type, without half being a legal type.
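
A hedged sketch (the builtin names __builtin_load_halff and
__builtin_store_halff are my reading of the patch; treat the exact
spellings as an assumption):

  float load_half_as_float(const __fp16 *p) {
    // Reads a 16-bit half from memory and widens it to float, without
    // half needing to be a legal type on the target.
    return __builtin_load_halff(p);
  }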

Differential Revision: https://reviews.llvm.org/D37231

llvm-svn: 312742
2017-09-07 19:39:10 +00:00
Reid Kleckner d53c39ba46 Commit changes missing from r312572
llvm-svn: 312573
2017-09-05 20:38:29 +00:00
Reid Kleckner 30701edf76 [ms] Implement the __annotation intrinsic
llvm-svn: 312572
2017-09-05 20:27:35 +00:00
Yaxun Liu 29a5ee358e [OpenCL] Do not use vararg in emitted functions for enqueue_kernel
Not all targets support varargs (e.g. amdgpu). Instead of using varargs in the emitted
functions for enqueue_kernel, this patch creates a temporary array of size_t, stores the
size arguments in it, and passes the array to the emitted functions.
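
A minimal OpenCL C sketch of a call with a trailing size argument (the
local-pointer block form is the one that previously went through
varargs; the sizes are illustrative):

  kernel void caller(queue_t q) {
    ndrange_t nd = ndrange_1D(64);
    // The trailing local-memory size is now stored into a size_t
    // array by clang, and the array is passed to the runtime function.
    enqueue_kernel(q, CLK_ENQUEUE_FLAGS_NO_WAIT, nd,
                   ^(local void *scratch) { /* ... */ },
                   (uint)(64 * sizeof(int)));
  }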

Differential Revision: https://reviews.llvm.org/D36678

llvm-svn: 312441
2017-09-03 13:52:24 +00:00
Erich Keane 9937b134c5 [CodeGen]Refactor CpuSupports/CPUIs Builtin Code Gen to better work with
"target" implementation

A small set of refactors that'll make it easier for me to implement 'target' 
support.

First, extract the CPUSupports functionality into its own function.
This has the advantage of not wasting time in this builtin dealing with
arguments.
Second, pull both the CPUSupports and CPUIs implementations into member
functions, so that they can be called from the resolver generation that I'm
working on.
Third, create an overload that simply takes the feature/CPU name (rather than
extracting it from a CallExpr), since that info isn't available later.

Note that despite how the 'diff' looks, the EmitX86CPUSupports function simply 
takes the implementation out of the 'switch'.

llvm-svn: 312355
2017-09-01 19:42:45 +00:00
Craig Topper 2c03e53f4e [X86] Add support for __builtin_cpu_init
This adds __builtin_cpu_init, which will emit a call to __cpu_indicator_init in libgcc or compiler-rt.

This is needed to support __builtin_cpu_supports/__builtin_cpu_is in an ifunc resolver.
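
A hedged sketch of the resolver pattern this enables (function names
are illustrative):

  int fast_impl(void);
  int generic_impl(void);

  static int (*resolve_foo(void))(void) {
    // Constructors have not run yet when an ifunc resolver executes,
    // so the CPU model data must be initialized explicitly.
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? fast_impl : generic_impl;
  }

  int foo(void) __attribute__((ifunc("resolve_foo")));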

Differential Revision: https://reviews.llvm.org/D36336

llvm-svn: 311874
2017-08-28 05:43:23 +00:00
John McCall de0fe07eef Extract IRGen's constant-emitter into its own helper class and clean up
the interface.

The ultimate goal here is to make it easier to do some more interesting
things in constant emission, like emit constant initializers that have
ignorable side-effects, or doing the majority of an initialization
in-place and then patching up the last few things with calls.  But for
now this is mostly just a refactoring.

llvm-svn: 310964
2017-08-15 21:42:52 +00:00
Craig Topper 699ae0c173 [X86] Implement __builtin_cpu_is
This patch adds support for __builtin_cpu_is. I've tried to match the supported strings to the latest version of gcc.

Differential Revision: https://reviews.llvm.org/D35449

llvm-svn: 310657
2017-08-10 20:28:30 +00:00
Craig Topper 41a550ccfa [X86] Support 'avx5124vnniw' and 'avx5124fmaps' for __builtin_cpu_supports.
They still need to be implemented in the intrinsics, the command line, and the backend. But this change isn't dependent on any of that and resolves a TODO.

llvm-svn: 310386
2017-08-08 17:43:44 +00:00
Joey Gouly fa76b49cef [OpenCL] Add missing subgroup builtins
This adds get_kernel_max_sub_group_size_for_ndrange and
get_kernel_sub_group_count_for_ndrange.

llvm-svn: 309678
2017-08-01 13:27:09 +00:00
Victor Leschuk 198357bbb9 Fix incorrect assertion condition.
llvm-svn: 309484
2017-07-29 08:18:38 +00:00
Vedant Kumar 10c3102071 [ubsan] Diagnose invalid uses of builtins (clang)
On some targets, passing zero to the clz() or ctz() builtins has undefined
behavior. I ran into this issue while debugging UB in __hash_table from libcxx:
the bug I was seeing manifested itself differently under -O0 vs -Os, due to a
UB call to clz() (see: libcxx/r304617).

This patch introduces a check which can detect UB calls to builtins.
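
A hedged sketch of the pattern the check diagnoses (the guard shows
the well-defined alternative):

  int leading_zeros(unsigned x) {
    // __builtin_clz(0) is undefined behavior on some targets; the new
    // UBSan check flags such calls at run time.
    return x ? __builtin_clz(x) : 32;
  }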

llvm.org/PR26979

Differential Revision: https://reviews.llvm.org/D34590

llvm-svn: 309459
2017-07-29 00:19:51 +00:00
Martin Storsjo 022e782e75 [AArch64] Add support for __builtin_ms_va_list on aarch64
Move the builtins from the x86-specific scope into the global
scope. Their use is still limited to x86_64 and aarch64, though.

This allows wine on aarch64 to properly handle variadic functions.
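
A hedged sketch of the now-portable pattern (the ms_abi attribute on
the enclosing function is required for __builtin_ms_va_start):

  __attribute__((ms_abi)) int sum_ints(int n, ...) {
    __builtin_ms_va_list ap;
    __builtin_ms_va_start(ap, n);
    int total = 0;
    for (int i = 0; i < n; ++i)
      total += __builtin_va_arg(ap, int);
    __builtin_ms_va_end(ap);
    return total;
  }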

Differential Revision: https://reviews.llvm.org/D34475

llvm-svn: 308218
2017-07-17 20:49:45 +00:00
Ulrich Weigand cac24ab04c [SystemZ] Add support for IBM z14 processor (1/3)
This patch series adds support for the IBM z14 processor.  This part includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.

Support for the -fzvector extension to vector float and the new
high-level vector intrinsics is provided by separate patches.

llvm-svn: 308197
2017-07-17 17:45:57 +00:00
Konstantin Zhuravlyov b0beb30fea Enhance synchscope representation (clang)
Relevant changes required for r307722.

Differential Revision: https://reviews.llvm.org/D33109

llvm-svn: 307723
2017-07-11 22:23:37 +00:00
Craig Topper e1ba5a3132 [X86] Move AVX512VPOPCNTDQ in __builtin_cpu_supports' enum to match trunk gcc.
There are two other features before it that we don't currently support in the frontend or backend, so I left placeholders to keep the encoding correct.

I think the compiler-rt implementation of this feature is even further out of date.

llvm-svn: 307456
2017-07-08 00:47:44 +00:00
Sjoerd Meijer 98ee78578b This reverts r305820 (ARMv8.2-A FP16 vector intrinsics) because it shows
problems in testing; see the comments in D34161 for more details.
A fix is in progress in D35011, but a revert seems better now, as the fix will
probably take some more time to land.

llvm-svn: 307277
2017-07-06 16:37:31 +00:00
Heejin Ahn b92440eab0 [WebAssembly] Add throw/rethrow builtins for exception handling
Summary:
Add new builtins for the throw/rethrow instructions. This follows the exception
handling proposal in
https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md
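
A hedged sketch of the throw builtin (assuming a tag index plus an
exception object pointer, per the proposal):

  void throw_exn(void *exn) {
    // Tag 0 is conventionally the C++ exception tag.
    __builtin_wasm_throw(0, exn);
  }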

Reviewers: sunfish, dschuff

Reviewed By: dschuff

Subscribers: jfb, dschuff, sbc100, jgravelle-google

Differential Revision: https://reviews.llvm.org/D34783

llvm-svn: 306775
2017-06-30 00:44:01 +00:00
Abderrazek Zaafrani f10ca93f34 [AArch64] Add ARMv8.2-A FP16 vector intrinsics
Differential Revision: https://reviews.llvm.org/D34161

llvm-svn: 305820
2017-06-20 18:54:57 +00:00
Dinar Temirbulatov 7b22425dff Expand vector operations to IR constants, PR28129.
llvm-svn: 305551
2017-06-16 12:09:52 +00:00
Galina Kistanova 0872d6c275 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304649
2017-06-03 06:30:46 +00:00
Vedant Kumar a44a6ac81f Revert "[AArch64] Add ARMv8.2-A FP16 vector intrinsics"
This reverts commit r304493. It breaks all the Darwin bots:
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental_check/37168

Failure:
Failing Tests (2):
    Clang :: CodeGen/aarch64-v8.2a-neon-intrinsics.c
    Clang :: CodeGen/arm_neon_intrinsics.c

llvm-svn: 304509
2017-06-02 01:22:14 +00:00
Abderrazek Zaafrani a44e5f601d [AArch64] Add ARMv8.2-A FP16 vector intrinsics
llvm-svn: 304493
2017-06-01 23:22:29 +00:00
Oren Ben Simhon 140c1fb9ec [X86] Adding avx512_vpopcntdq feature set and its intrinsics
AVX512_VPOPCNTDQ is a new feature set published by Intel.
This patch is the Clang side of adding six intrinsics for two new machine instructions (vpopcntd and vpopcntq).
It also adds the new feature set.
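
A hedged usage sketch (standard immintrin.h spelling; compile with
-mavx512vpopcntdq):

  #include <immintrin.h>

  __m512i lane_popcounts(__m512i v) {
    // vpopcntd: population count of each 32-bit lane.
    return _mm512_popcnt_epi32(v);
  }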

Differential Revision: https://reviews.llvm.org/D33170

llvm-svn: 303857
2017-05-25 13:44:11 +00:00
Tony Jiang 9aa2c0383d [PowerPC] Implement vec_xxsldwi builtin.
The vec_xxsldwi builtin is missing from altivec.h. This has been requested by
developers at Google working on VP9 support in libvpx.
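
A hedged usage sketch (xxsldwi concatenates the two sources and
extracts 16 bytes at a word (4-byte) granularity; the shift count is
illustrative):

  #include <altivec.h>

  vector int shift_words(vector int a, vector int b) {
    // Take four words starting at word 1 of the concatenation {a, b}.
    return vec_xxsldwi(a, b, 1);
  }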

The patch fixes PR: https://bugs.llvm.org/show_bug.cgi?id=32653
Differential Revision: https://reviews.llvm.org/D33236

llvm-svn: 303766
2017-05-24 15:54:13 +00:00
Tony Jiang bbc48e9164 [PowerPC] Implement vec_xxpermdi builtin.
The vec_xxpermdi builtin is missing from altivec.h. This has been requested by
developers at Google working on VP9 support in libvpx.

The patch fixes PR: https://bugs.llvm.org/show_bug.cgi?id=32653
Differential Revision: https://reviews.llvm.org/D33053

llvm-svn: 303760
2017-05-24 15:13:32 +00:00
Krzysztof Parzyszek 8f248234fa [CodeGen] Propagate LValueBaseInfo instead of AlignmentSource
The functions creating LValues propagated information about alignment
source. Extend the propagated data to also include information about
possible unrestricted aliasing. A new class LValueBaseInfo will
contain both AlignmentSource and MayAlias info.

This patch should not introduce any functional changes.

Differential Revision: https://reviews.llvm.org/D33284

llvm-svn: 303358
2017-05-18 17:07:11 +00:00
Serge Guelton 1d993270b3 Suppress all uses of LLVM_END_WITH_NULL. NFC.
Use variadic templates instead of relying on <cstdarg> + sentinel.

This enforces better type checking and makes code more readable.

Differential revision: https://reviews.llvm.org/D32550

llvm-svn: 302572
2017-05-09 19:31:30 +00:00
Dean Michael Berris 42af651358 [XRay] Add __xray_customevent(...) as a clang-supported builtin
Summary:
We define the `__xray_customevent` builtin that gets translated to
IR calls to the correct intrinsic. The default implementation of this is
a no-op function. The codegen side follows this logic (a usage sketch follows the list):

- When `-fxray-instrument` is not provided in the driver, we elide all
calls to `__xray_customevent`.
- When `-fxray-instrument` is enabled and a function is marked as "never
instrumented", we elide all calls to `__xray_customevent` in that
function; if either marked as "always instrumented" or subject to
threshold-based instrumentation, we emit a call to the
`llvm.xray.customevent` intrinsic from LLVM for each
`__xray_customevent` occurrence in the function.
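
A hedged usage sketch (the builtin takes a buffer pointer and a
length):

  void annotate_phase(const char *msg, unsigned len) {
    // Lowered to the llvm.xray.customevent intrinsic when the
    // enclosing function is XRay-instrumented; elided otherwise.
    __xray_customevent(msg, len);
  }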

This change depends on D27503 (to land in LLVM first).

Reviewers: echristo, rsmith

Subscribers: mehdi_amini, pelikan, lrl, cfe-commits

Differential Revision: https://reviews.llvm.org/D30018

llvm-svn: 302492
2017-05-09 00:45:40 +00:00
Nico Weber 050af67ea8 ANSIfy more. Still no behavior change.
llvm-svn: 302259
2017-05-05 17:16:58 +00:00
Nico Weber 0a234047eb ANSIfy. No behavior change.
llvm-svn: 302258
2017-05-05 17:15:08 +00:00
Hans Wennborg 5c3c51fe05 Implement _interlockedbittestandset as a builtin
It's used by MS headers in VS 2017 without including intrin.h, so we
can't implement it in the header anymore.
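
A hedged usage sketch:

  long flags = 0;

  unsigned char set_bit3(void) {
    // Atomically sets bit 3 of flags and returns its previous value.
    return _interlockedbittestandset(&flags, 3);
  }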

Differential Revision: https://reviews.llvm.org/D31736

llvm-svn: 299782
2017-04-07 16:41:47 +00:00
Michael Zuckerman 755a13db3d [X86][Clang] Converting __mm{|256|512}_movm_epi{8|16|32|64} LLVMIR call into generic intrinsics.
This patch is part two of two reviews, one for clang and the other for LLVM.
In this patch I cover the clang side by introducing the intrinsic to the front end.
This is done by creating a generic replacement.

Differential Revision: https://reviews.llvm.org/D31394a

llvm-svn: 299431
2017-04-04 13:29:53 +00:00
Hans Wennborg 043f402586 [X86] Implement __readgsqword (and the rest) as builtins (PR32373)
It seems MS headers have started using __readgsqword, and since it's
used in a header that doesn't include intrin.h, we can't implement it as
an inline function anymore.

That was already the case for __readfsdword, which Saleem added support
for in r220859. This patch reuses that codegen to implement all of
__read[fg]s{byte,word,dword,qword}.
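
A hedged usage sketch (0x30 is the conventional TEB self-pointer
offset on x86_64 Windows; treat the offset as illustrative):

  unsigned long long current_teb(void) {
    // Reads eight bytes from the GS segment at offset 0x30.
    return __readgsqword(0x30);
  }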

Differential Revision: https://reviews.llvm.org/D31248

llvm-svn: 298538
2017-03-22 19:13:13 +00:00
George Burgess IV a63f91574f Let llvm.objectsize be conservative with null pointers
D28494 adds another parameter to @llvm.objectsize. Clang needs to be
sure to pass that third arg whenever applicable.

llvm-svn: 298431
2017-03-21 20:09:35 +00:00
Reid Kleckner de86482ce0 Update Clang for LLVM rename AttributeSet -> AttributeList
llvm-svn: 298394
2017-03-21 16:57:30 +00:00
Sanjay Patel e795daa55e [x86] these aren't the undefs you're looking for (PR32176)
x86 has undef SSE/AVX intrinsics that should represent a bogus register operand. 
This is not the same as LLVM's undef value, which can take on multiple bit patterns.
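
A hedged sketch of one such intrinsic (standard immintrin.h spelling):

  #include <immintrin.h>

  __m128 with_low(float x) {
    // _mm_undefined_ps() stands for "whatever is in the register",
    // not LLVM's multi-pattern undef; the upper lanes stay unspecified.
    __m128 v = _mm_undefined_ps();
    return _mm_move_ss(v, _mm_set_ss(x));
  }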

There are better solutions / follow-ups to this discussed here:
https://bugs.llvm.org/show_bug.cgi?id=32176
...but this should prevent miscompiles with a one-line code change.

Differential Revision: https://reviews.llvm.org/D30834

llvm-svn: 297588
2017-03-12 19:15:10 +00:00
Yaxun Liu 4d86799219 [AMDGPU] Add builtin functions readlane ds_permute mov_dpp
Differential Revision: https://reviews.llvm.org/D30551

llvm-svn: 297436
2017-03-10 01:30:46 +00:00
Reid Kleckner b04cb9ab7a [MS] Add support for __ud2 and __int2c MSVC intrinsics
This was requested in PR31958 and elsewhere.
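
A hedged usage sketch (__ud2 executes the ud2 instruction, raising an
invalid-opcode exception; __int2c issues int 0x2C):

  void fast_fail(void) {
    __ud2();  /* traps here */
  }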

llvm-svn: 297057
2017-03-06 19:43:16 +00:00