Future versions of MSVC make these intrinsics available on x86 & x64,
according to:
http://lists.llvm.org/pipermail/cfe-dev/2019-March/061711.html
The purpose of these builtins is to emit plain, non-atomic, volatile
stores when /volatile:ms (-cc1 -fms-volatile) is enabled.
llvm-svn: 357220
This is the result of discussions on the list about how to deal with intrinsics
which require codegen to disambiguate them via only the integer/fp overloads.
It causes problems for GlobalISel as some of that information is lost during
translation, while with other operations like IR instructions the information is
encoded into the instruction opcode.
This patch changes clang to emit the new faddp intrinsic if the vector operands
to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to
upgrade existing calls to aarch64.neon.addp with fp vector arguments, and
we remove the workarounds introduced for GlobalISel in r355865.
This is a more permanent solution to PR40968.
Differential Revision: https://reviews.llvm.org/D59655
llvm-svn: 356722
Summary:
Because in wasm we merge all catch clauses into one big catchpad, in
case none of the types in catch handlers matches after we test against
each of them, we should unwind to the next EH enclosing scope. For this,
we should NOT use a call to `__cxa_rethrow` but rather a call to our own
rethrow intrinsic, because what we're trying to do here is just to
transfer the control flow into the next enclosing EH pad (or the
caller). Calls to `__cxa_rethrow` should only be used after a call to
`__cxa_begin_catch`.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59353
llvm-svn: 356317
This reverts commit r353765. After talking with our c stdlib folks, we decided
to use the existing pass_object_size attribute to implement _FORTIFY_SOURCE
wrappers, like Bionic does (I didn't realize that pass_object_size could be used
for this purpose). Sorry for the flip/flop, and thanks to James Y. Knight for
pointing this out to me.
llvm-svn: 356103
Argument evaluation order is different between gcc and clang, so pull out
the Builder calls to make the generated IR independent of the host compiler's
argument evaluation order. Thanks to rnk for reminding me of this clang/gcc
difference.
llvm-svn: 353969
This attribute applies to declarations of C stdlib functions
(sprintf, memcpy...) that have known fortified variants
(__sprintf_chk, __memcpy_chk, ...). When applied, clang will emit
calls to the fortified variant functions instead of calls to the
defaults.
In GCC, this is done by adding gnu_inline-style wrapper functions,
but that doesn't work for us for variadic functions because we don't
support __builtin_va_arg_pack (and have no intention to).
This attribute takes two arguments, the first is 'type' argument
passed through to __builtin_object_size, and the second is a flag
argument that gets passed through to the variadic checking variants.
rdar://47905754
Differential revision: https://reviews.llvm.org/D57918
llvm-svn: 353765
The various EltSize, Offset, DataLayout, and StructLayout arguments
are all computable from the Address's element type and the DataLayout
which the CGBuilder already has access to.
After having previously asserted that the computed values are the same
as those passed in, now remove the redundant arguments from
CGBuilder's Create*GEP functions.
Differential Revision: https://reviews.llvm.org/D57767
llvm-svn: 353629
When we are calling `__builtin_constant_p` with ObjC objects of
different classes, we hit the assertion
> Assertion failed: (isa<X>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file include/llvm/Support/Casting.h, line 254.
It happens because LLVM types for `ObjCInterfaceType` are opaque and
have no name (see `CodeGenTypes::ConvertType`). As the result, for
different ObjC classes we have different `is_constant` intrinsics with
the same name `llvm.is.constant.p0s_s`. When we try to reuse an
intrinsic with the same name, we fail because of type mismatch.
Fix by bitcasting `ObjCObjectPointerType` to `id` prior to passing as an
argument to `__builtin_constant_p`. This results in using intrinsic
`llvm.is.constant.p0i8` and correct types.
rdar://problem/47499250
Reviewers: rjmccall, ahatanak, void
Reviewed By: void, ahatanak
Subscribers: ddunbar, jkorous, hans, dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D57427
llvm-svn: 353577
Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a
FunctionCallee as an argument, and CreateRuntimeFunction has been
modified to return a FunctionCallee. All callers have been updated.
Additionally, CreateBuiltinFunction is removed, as it was redundant
with CreateRuntimeFunction after some previous changes.
Differential Revision: https://reviews.llvm.org/D57668
llvm-svn: 353184
This builtin has the same UI as __builtin_object_size, but has the
potential to be evaluated dynamically. It is meant to be used as a
drop-in replacement for libraries that use __builtin_object_size when
a dynamic checking mode is enabled. For instance,
__builtin_object_size fails to provide any extra checking in the
following function:
void f(size_t alloc) {
char* p = malloc(alloc);
strcpy(p, "foobar"); // expands to __builtin___strcpy_chk(p, "foobar", __builtin_object_size(p, 0))
}
This is an overflow if alloc < 7, but because LLVM can't fold the
object size intrinsic statically, it folds __builtin_object_size to
-1. With __builtin_dynamic_object_size, alloc is passed through to
__builtin___strcpy_chk.
rdar://32212419
Differential revision: https://reviews.llvm.org/D56760
llvm-svn: 352665
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.
rdar://32212419
Differential revision: https://reviews.llvm.org/D56761
llvm-svn: 352664
Summary:
The 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics all have the possibility of taking an explicit rounding mode argument. If the rounding mode is CUR_DIRECTION we'd like to emit a sitofp/uitofp instruction and a select like we do for 256-bit intrinsics.
For cvt(u)qqtopd and cvt(u)dqtops we do this when the form of the software intrinsics that doesn't take a rounding mode argument is used. This is done by using convertvector in the header with the select builtin. But if the explicit rounding mode form of the intrinsic is used and CUR_DIRECTION is passed, we don't do this. We shouldn't have this inconsistency.
For cvt(u)qqtops nothing is done because we can't use the select builtin in the header without avx512vl. So we need to use custom codegen for this.
Even when the rounding mode isn't CUR_DIRECTION we should also use select in IR for consistency. And it will remove another scalar integer mask from our intrinsics.
To accomplish all of these goals I've taken a slightly unusual approach. I've added two new X86 specific intrinsics for sitofp/uitofp with rounding. These intrinsics are variadic on the input and output type so we only need 2 instead of 6. This avoids the need for a switch to map them in CGBuiltin.cpp. We just need to check signed vs unsigned. I believe other targets also use variadic intrinsics like this.
So if the rounding mode is CUR_DIRECTION we'll use an sitofp/uitofp instruction. Otherwise we'll use one of the new intrinsics. After that we'll emit a select instruction if needed.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D56998
llvm-svn: 352267
These intrinsics can always be replaced with generic integer comparisons without any regression in codegen, even for -O0/-fast-isel cases.
Noticed while cleaning up vector integer comparison costs for PR40376.
A future commit will remove/autoupgrade the existing VPCOM/VPCOMU llvm intrinsics.
llvm-svn: 351687
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
llvm.flt.rounds returns an i32, but the builtin expects an integer.
On targets where integers are not 32-bits clang tries to bitcast the result, causing an assertion failure.
The patch enables newlib build for msp430.
Patch by Edward Jones!
Differential Revision: https://reviews.llvm.org/D24461
llvm-svn: 351449
We need to custom handle these so we can turn the scalar mask into a vXi1 vector.
Differential Revision: https://reviews.llvm.org/D56530
llvm-svn: 351390
Summary:
UB isn't nice. It's cool and powerful, but not nice.
Having a way to detect it is nice though.
[[ https://wg21.link/p1007r3 | P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says:
```
We propose to add this functionality via a library function instead of a core language attribute.
...
If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour.
```
This differential teaches clang to sanitize all the various variants of this assume-aligned attribute.
Requires D54588 for LLVM IRBuilder changes.
The compiler-rt part is D54590.
This is a second commit, the original one was r351105,
which was mass-reverted in r351159 because 2 compiler-rt tests were failing.
Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall
Reviewed By: rjmccall
Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D54589
llvm-svn: 351177
Summary:
This patch attempts to redo what was tried in r278783, but was reverted.
These intrinsics should be available on non-windows platforms with "xsave" feature check. But on Windows platforms they shouldn't have feature check since that's how MSVC behaves.
To accomplish this I've added a MS builtin with no feature check. And a normal gcc builtin with a feature check. When _MSC_VER is not defined _xgetbv/_xsetbv will be macros pointing to the gcc builtin name.
I've moved the forward declarations from intrin.h to immintrin.h to match the MSDN documentation and used that as the header file for the MS builtin.
I'm not super happy with this implementation, and I'm open to suggestions for better ways to do it.
Reviewers: rnk, RKSimon, spatel
Reviewed By: rnk
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D56686
llvm-svn: 351160
Summary:
UB isn't nice. It's cool and powerful, but not nice.
Having a way to detect it is nice though.
[[ https://wg21.link/p1007r3 | P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says:
```
We propose to add this functionality via a library function instead of a core language attribute.
...
If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour.
```
This differential teaches clang to sanitize all the various variants of this assume-aligned attribute.
Requires D54588 for LLVM IRBuilder changes.
The compiler-rt part is D54590.
Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall
Reviewed By: rjmccall
Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D54589
llvm-svn: 351105
This removes the old grow_memory and mem.grow-style builtins, leaving just
the memory.grow-style builtins.
Differential Revision: https://reviews.llvm.org/D56645
llvm-svn: 351089
__builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local. The generated code looks up the address in GOT when reading __cpu_model. This makes it impossible to use these functions in ifunc, because at that time GOT entries have not been relocated. This change makes it dso local.
Differential Revision: https://reviews.llvm.org/D53850
llvm-svn: 349825
Sibling patch to D55855, this emits UADD_SAT/USUB_SAT generic intrinsics for the SSE saturated math intrinsics instead of expanding to a IR code sequence that could be difficult to reassemble.
Differential Revision: https://reviews.llvm.org/D55879
llvm-svn: 349631
The special lowering for __builtin_mul_overflow introduced in r320902
fixed an ICE seen when passing mixed-sign operands to the builtin.
This patch extends the special lowering to cover mixed-width, mixed-sign
operands. In a few common scenarios, calls to muloti4 will no longer be
emitted.
This should address the latest comments in PR34920 and work around the
link failure seen in:
https://bugzilla.redhat.com/show_bug.cgi?id=1657544
Testing:
- check-clang
- A/B output comparison with: https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081
Differential Revision: https://reviews.llvm.org/D55843
llvm-svn: 349542
Summary:
This patch adds `__builtin_launder`, which is required to implement `std::launder`. Additionally GCC provides `__builtin_launder`, so thing brings Clang in-line with GCC.
I'm not exactly sure what magic `__builtin_launder` requires, but based on previous discussions this patch applies a `@llvm.invariant.group.barrier`. As noted in previous discussions, this may not be enough to correctly handle vtables.
Reviewers: rnk, majnemer, rsmith
Reviewed By: rsmith
Subscribers: kristina, Romain-Geissler-1A, erichkeane, amharc, jroelofs, cfe-commits, Prazek
Differential Revision: https://reviews.llvm.org/D40218
llvm-svn: 349195
intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature.
For MS compatibility we should always have these builtins available regardless of X86 being the target or the CPU support the lzcnt instruction. The backends should be able to gracefully fallback to something support even if its just shifts and bit ops.
Unfortunately, gcc also implements 2 of the 3 function names here on X86 when lzcnt feature is enabled.
This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep the gcc compatibility the two intrinsics that collided have been turned into macros that use the X86 specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error because -ms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available.
Should fix PR40014
Differential Revision: https://reviews.llvm.org/D55677
llvm-svn: 349098
__builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local. The generated code looks up the address in GOT when reading __cpu_model. This makes it impossible to use these functions in ifunc, because at that time GOT entries have not been relocated. This change makes it dso local.
Differential Revision: https://reviews.llvm.org/D53850
llvm-svn: 348978
The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required adx feature.
This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if its not a burden.
llvm-svn: 348738
It seems the two failing tests can be simply fixed after r348037
Fix 3 cases in Analysis/builtin-functions.cpp
Delete the bad CodeGen/builtin-constant-p.c for now
llvm-svn: 348053
Kept the "indirect_builtin_constant_p" test case in test/SemaCXX/constant-expression-cxx1y.cpp
while we are investigating why the following snippet fails:
extern char extern_var;
struct { int a; } a = {__builtin_constant_p(extern_var)};
llvm-svn: 348039
This was reverted in r347656 due to me thinking it caused a miscompile of
Chromium. Turns out it was the Chromium code that was broken.
llvm-svn: 347756
This caused a miscompile in Chrome (see crbug.com/908372) that's
illustrated by this small reduction:
static bool f(int *a, int *b) {
return !__builtin_constant_p(b - a) || (!(b - a));
}
int arr[] = {1,2,3};
bool g() {
return f(arr, arr + 3);
}
$ clang -O2 -S -emit-llvm a.cc -o -
g() should return true, but after r347417 it became false for some reason.
This also reverts the follow-up commits.
r347417:
> Re-Reinstate 347294 with a fix for the failures.
>
> Don't try to emit a scalar expression for a non-scalar argument to
> __builtin_constant_p().
>
> Third time's a charm!
r347446:
> The result of is.constant() is unsigned.
r347480:
> A __builtin_constant_p() returns 0 with a function type.
r347512:
> isEvaluatable() implies a constant context.
>
> Assume that we're in a constant context if we're asking if the expression can
> be compiled into a constant initializer. This fixes the issue where a
> __builtin_constant_p() in a compound literal was diagnosed as not being
> constant, even though it's always possible to convert the builtin into a
> constant.
r347531:
> A "constexpr" is evaluated in a constant context. Make sure this is reflected
> if a __builtin_constant_p() is a part of a constexpr.
llvm-svn: 347656
This was originally part of:
D50924
and should resolve PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387
...but it was reverted because some bots using a gcc host compiler
would crash for unknown reasons with this included in the patch.
Trying again now to see if that's still a problem.
llvm-svn: 347527
Summary:
A __builtin_constant_p may end up with a constant after inlining. Use
the is.constant intrinsic if it's a variable that's in a context where
it may resolve to a constant, e.g., an argument to a function after
inlining.
Reviewers: rsmith, shafik
Subscribers: jfb, kristina, cfe-commits, nickdesaulniers, jyknight
Differential Revision: https://reviews.llvm.org/D54355
llvm-svn: 347294
As suggested by Richard Smith, and initially put up for review here:
https://reviews.llvm.org/D53341, this patch removes a hack that was used
to ensure that proper target-feature lists were used when emitting
cpu-dispatch (and eventually, target-clones) implementations. As a part
of this, the GlobalDecl object is proliferated to a bunch more
locations.
Originally, this was put up for review (see above) to get acceptance on
the approach, though discussion with Richard in San Diego showed he
approved of the approach taken here. Thus, I believe this is acceptable
for Review-After-commit
Differential Revision: https://reviews.llvm.org/D53341
Change-Id: I0a0bd673340d334d93feac789d653e03d9f6b1d5
llvm-svn: 346757
Fix places where the return type of a FunctionDecl was being used in
place of the function type
FunctionDecl::Create() takes as its T parameter the type of function
that should be created, not the return type. Passing in the return type
looks to have been copypasta'd around a bit, but the number of correct
usages outweighs the incorrect ones so I've opted for keeping what T is
the same and fixing up the call sites instead.
This fixes a crash in Clang when attempting to compile the following
snippet of code with -fblocks -fsanitize=function -x objective-c++ (my
original repro case):
void g(void(^)());
void f()
{
__block int a = 0;
g(^(){ a++; });
}
as well as the following which only requires -fsanitize=function -x c++:
void f(char * buf)
{
__builtin_os_log_format(buf, "");
}
Patch by: Ben (bobsayshilol)
Differential revision: https://reviews.llvm.org/D53263
llvm-svn: 346601
A mask type is a 1 to 8-byte string that follows the "mask." annotation
in the format string. This enables obfuscating data in the event the
provided privacy level isn't enabled.
rdar://problem/36756282
llvm-svn: 346211
This is fifth in a series of patches to move intrinsic definitions out of intrin.h.
Note: This was reviewed and approved in D54065 but somehow that diff was messed
up. Committing this again with the proper diff.
llvm-svn: 346205
Summary: This is fifth in a series of patches to move intrinsic definitions out of intrin.h.
Reviewers: rnk, efriedma, mstorsjo, TomTan
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits
Differential Revision: https://reviews.llvm.org/D54065
llvm-svn: 346191
Summary: This is third in a series of patches to move intrinsic definitions out of intrin.h.
Reviewers: rnk, efriedma, mstorsjo, TomTan
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits
Differential Revision: https://reviews.llvm.org/D54062
llvm-svn: 346189
Summary: Windows SDK needs these intrinsics to be proper builtins. This is second in a series of patches to move intrinsic defintions out of intrin.h.
Reviewers: rnk, mstorsjo, efriedma, TomTan
Reviewed By: rnk, efriedma
Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits
Differential Revision: https://reviews.llvm.org/D54046
llvm-svn: 346044
The size of an os_log buffer is known at any stage of compilation, so making it
a constant expression means that the common idiom of declaring a buffer for it
won't result in a VLA. That allows the compiler to skip saving and restoring
the stack pointer around such buffers.
This also moves the OSLog and other FormatString helpers from
libclangAnalysis to libclangAST to avoid a circular dependency.
llvm-svn: 345971
This patch should not introduce any behavior changes. It consists of
mostly one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
of only 'break'.
We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
the outer case.
I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.
Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu
Differential Revision: https://reviews.llvm.org/D53950
llvm-svn: 345882
The size of an os_log buffer is known at any stage of compilation, so making it
a constant expression means that the common idiom of declaring a buffer for it
won't result in a VLA. That allows the compiler to skip saving and restoring
the stack pointer around such buffers.
This also moves the OSLog helpers from libclangAnalysis to libclangAST
to avoid a circular dependency.
llvm-svn: 345866
This also reverts a couple of follow-up commits trying to fix the
dependency issues. Latest revision added a cyclic dependency that can't
just be patched up in 5 minutes.
llvm-svn: 345846
The size of an os_log buffer is known at any stage of compilation, so making it
a constant expression means that the common idiom of declaring a buffer for it
won't result in a VLA. That allows the compiler to skip saving and restoring
the stack pointer around such buffers.
llvm-svn: 345828
Summary: Use the same convention as all the other WebAssembly builtin names.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits
Differential Revision: https://reviews.llvm.org/D53724
llvm-svn: 345804
Generate the FP16FML intrinsics into arm_neon.h (AArch64 only for now).
Add two new type modifiers to NeonEmitter to handle the new prototypes.
Define __ARM_FEATURE_FP16FML when +fp16fml is enabled and guard the
intrinsics with the macro in arm_neon.h.
Based on a patch by Gao Yiling.
Differential Revision: https://reviews.llvm.org/D53633
llvm-svn: 345344
libgcc supports more than 32 features by adding a new 32-bit variable __cpu_features2.
This adds the clang support for checking these feature bits.
Patches for compiler-rt and llvm to support this are coming as well.
Probably still need an additional patch for target multiversioning in clang.
Differential Revision: https://reviews.llvm.org/D53458
llvm-svn: 344832
Summary:
The multiversioning code repurposed the code from __builtin_cpu_supports for checking if a single feature is enabled. That code essentially performed (_cpu_features & (1 << C)) != 0. But with the multiversioning path, the mask is no longer guaranteed to be a power of 2. So we return true anytime any one of the bits in the mask is set not just all of the bits.
The correct check is (_cpu_features & mask) == mask
Reviewers: erichkeane, echristo
Reviewed By: echristo
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D53460
llvm-svn: 344824
Emit llvm.amdgcn.update.dpp for both __builtin_amdgcn_mov_dpp and
__builtin_amdgcn_update_dpp. The first argument to
llvm.amdgcn.update.dpp will be undef for __builtin_amdgcn_mov_dpp.
Differential Revision: https://reviews.llvm.org/D52320
llvm-svn: 344665
Previously we used a select and the zero_undef=true intrinsic. In -O2 this pattern will get optimized to zero_undef=false. But in -O0 this optimization won't happen. This results in a compare and cmov being wrapped around a tzcnt/lzcnt instruction.
By using the zero_undef=false intrinsic directly without the select, we can improve the -O0 codegen to just an lzcnt/tzcnt instruction.
Differential Revision: https://reviews.llvm.org/D52392
llvm-svn: 343126
unsigned long long builtin_unpack_vector_int128 (vector int128_t, int);
vector int128_t builtin_pack_vector_int128 (unsigned long long, unsigned long long);
Builtins should behave the same way as in GCC.
Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52074
llvm-svn: 342614
This is the clang side of D51803. The llvm intrinsic now returns two results. So we need to emit an explicit store in IR for the out parameter. This is similar to addcarry/subborrow/rdrand/rdseed.
Differential Revision: https://reviews.llvm.org/D51805
llvm-svn: 341699
This is the clang side of D51769. The llvm intrinsics now return two results instead of using an out parameter.
Differential Revision: https://reviews.llvm.org/D51771
llvm-svn: 341678
These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc.
Includes these intrinsics:
_ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8
_ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8
_ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8
_ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8
llvm-svn: 341265
This adds:
_cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
_cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
_load_mask8, _load_mask16, _load_mask32, _load_mask64
_store_mask8, _store_mask16, _store_mask32, _store_mask64
These are currently missing from the Intel Intrinsics Guide webpage.
llvm-svn: 341251
This adds the following intrinsics:
_kshiftli_mask8
_kshiftli_mask16
_kshiftli_mask32
_kshiftli_mask64
_kshiftri_mask8
_kshiftri_mask16
_kshiftri_mask32
_kshiftri_mask64
llvm-svn: 341234
This adds the following intrinsics:
_kadd_mask64
_kadd_mask32
_kadd_mask16
_kadd_mask8
These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc.
llvm-svn: 340879
This also adds a second intrinsic name for the 16-bit mask versions.
These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed.
llvm-svn: 340719
EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that
work again.
This changes behavior: If e.g. ++a was passed as an arg, we incremented a twice
previously. This change fixes that bug.
https://reviews.llvm.org/D50979
llvm-svn: 340348
This is a partial retry of rL340137 (reverted at rL340138 because of gcc host compiler crashing)
with 1 change:
Remove the changes to make microsoft builtins also use the LLVM intrinsics.
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops) that we want to replicate, we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340141
This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing)
with 2 changes:
1. Move the code into a helper to reduce code duplication (and hopefully work-around the crash).
2. The original commit had a formatting bug in the docs (missing an underscore).
Original commit message:
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340137
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
D49242
With improved codegen in:
rL337966
rL339359
And basic IR optimization added in:
rL338218
rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340135
r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h.
Microsoft's STL plans on using these functions, and they're using intrin0.h
which just has declarations of built-ins to not pull in the huge intrin.h
header in the standard library headers. That requires that these functions are
real built-ins.
https://reviews.llvm.org/D50907
llvm-svn: 340048
Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.
Reviewers: craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D46892
llvm-svn: 339651
gcc defines an intrinsic called __builtin_clrsb which counts the number of extra sign bits on a number. This is equivalent to counting the number of leading zeros on a positive number or the number of leading ones on a negative number and subtracting one from the result. Since we can't count leading ones we need to invert negative numbers to count zeros.
This patch will cause the builtin to be expanded inline while gcc uses a call to a function like clrsbdi2 that is implemented in libgcc. But this is similar to what we already do for popcnt. And I don't think compiler-rt supports clrsbdi2.
Differential Revision: https://reviews.llvm.org/D50168
llvm-svn: 339282
Always emit alloca in entry block for enqueue_kernel builtin.
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.
llvm-svn: 339150
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.
Differential Revision: https://reviews.llvm.org/D50104
llvm-svn: 338899
The way address space declarations for builtins currently work
is nearly useless. The code assumes the address spaces used for
builtins is a confusingly named "target address space" from user
code using __attribute__((address_space(N))) that matches
the builtin declaration. There's no way to use this to declare
a builtin that returns a language specific address space.
The terminology used is highly cofusing since it has nothing
to do with the the address space selected by the target to use
for a language address space.
This feature is essentially unused as-is. AMDGPU and NVPTX
are the only in-tree targets attempting to use this. The AMDGPU
builtins certainly do not behave as intended (i.e. all of the
builtins returning pointers can never compile because the numbered
address space never matches the expected named address space).
The NVPTX builtins are missing tests for some, and the others
seem to rely on an implicit addrspacecast.
Change the used address space for builtins based on a target
hook to allow using a language address space for a builtin.
This allows the same builtin declaration to be used for multiple
languages with similarly purposed address spaces (e.g. the same
AMDGPU builtin can be used in OpenCL and CUDA even though the
constant address spaces are arbitarily different).
This breaks the possibility of using arbitrary numbered
address spaces alongside the named address spaces for builtins.
If this is an issue we probably need to introduce another builtin
declaration character to distinguish language address spaces from
so-called "target address spaces".
llvm-svn: 338707
This patch adds support for vrndi_f32() and vrndiq_f32()
intrinsics in AArch32 mode and for vrndns_f32() intrinsic in
AArch64 mode.
Differential Revision: https://reviews.llvm.org/D48829
llvm-svn: 337690
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.
This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.
This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementions
each in their own translation units.
The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.
Differential Revision: https://reviews.llvm.org/D47474
llvm-svn: 337552
The codegen for this builtin was initially implemented to match GCC.
However, due to interest from users GCC changed behaviour to account for the
big endian bias of the instruction and correct it. This patch brings the
handling inline with GCC.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38192
Differential Revision: https://reviews.llvm.org/D49424
llvm-svn: 337449
This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.
This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.
llvm-svn: 336622
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.
There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases.
Factor the code into a helper function to make this easy.
llvm-svn: 336472
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.
While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
llvm-svn: 336388
This patch removes on optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D48715
llvm-svn: 336355