The __yield intrinsic generates a hint instruction to indicate that the thread
is not performing any useful operations at the moment. This is for
compatibility with MSVC, although, the intrinsic is also part of the ACLE, and
is enabled globally as a result.
llvm-svn: 207275
This adds support for the various NEON intrinsics used by
aarch64-neon-intrinsics.c (originally written for AArch64) and enables the
test.
My implementations are designed to be semantically correct, the actual code
quality looks like its a wash between the two backends, and is frequently
different (hence the large number of CHECK changes).
llvm-svn: 205210
This adds Clang support for the ARM64 backend. There are definitely
still some rough edges, so please bring up any issues you see with
this patch.
As with the LLVM commit though, we think it'll be more useful for
merging with AArch64 from within the tree.
llvm-svn: 205100
The main difference between __va_start and __builtin_va_start is that
the address of the va_list has already been taken, and the va_list is
always a char*.
__va_end and __va_arg are not needed.
llvm-svn: 204821
The 'f' modifier is designed for integer type arguments really (according to
its documentation). It's better to use the "half width, same number" modifier.
Should be no user-visible change.
llvm-svn: 202152
Because GCC incorrectly defines _mm_prefetch to take anything that casts
to void*, people have started using that behavior. The previous patch
that made _mm_prefetch actually take a const char * broke compatibility
with existing code. This update to the patch leaves the macro that
defines _mm_prefetch with the (void*) cast when _MSC_VER is not defined.
llvm-svn: 201901
This extends the intrinsic lookup table format slightly, and adds
entries for use the shared ARM/AArch64 definitions. The benefit is
currently smaller than for the SISD intrinsics (there's more custom
code implementing this set), but a few lines are saved and there's
scope for future expansion.
llvm-svn: 201848
This extracts the table-driven intrinsic lookup phase into a separate
function, to be used by EmitCommonNeonBuiltinExpr soon.
It also simplifies the logic used in that lookup, since VectorCastArgN
and ScalarArgN were actually identical.
llvm-svn: 201847
This breaks backwards compatibility with existing code. Previously, this
was defined as
#define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
Which basically accepts any pointer. Changing this to char* simply
breaks a lot of existing code. I have tried changing char* to
"const void*", which seems to be the right thing as per Intel
specification this should work on basically any pointer. However,
apparently this breaks windows compatibility (because of a conflicting
declaration in windows.h).
So, we probably need to #ifdef this based on whether clang is compiling
for windows. According to Chandler, this might be done by introducing an
additional symbol to a fake type in BuiltinsX86.def and then condition
the type expansion on the platform.
llvm-svn: 201775
This patch adds several built-ins that are required for ms
compatibility. _mm_prefetch must be a built-in because it takes a
compile-time constant argument and our prior approach of using a #define
to the current built-in doesn't work in the presence of re-declaration
of _mm_prefetch. The others can be obtained by including the windows
system headers. If a user includes the windows system headers but not
intrin.h they still need to work and therefore must be built-in because
we don't get a chance to implement them in intrin.h in this case.
llvm-svn: 201734
This fixes one immediate bug where an expression with side-effects
could be emitted twice during a NEON call.
It also prepares the way for folding CodeGen for many of the SISD
intrinsics into a table, reducing code size and hopefully increasing
performance eventually ("binary search + few switch cases" should be
better than "lots of switch cases").
llvm-svn: 201667
These instructions (well, the f32 ones) are supported on 32-bit ARMv8, not just
AArch64. Now that the arm_neon.td refactoring is complete, adding them is
surprisingly simple.
rdar://problem/16035743
llvm-svn: 201661
This changes ARM to use @llvm.fabs for floating-point vabs. Patterns
already existed in the backend, and it might help mid-end phases since
it's more likely to be understood than @llvm.arm.neon.vabs.
llvm-svn: 201313
The s64/u64 vcvt conversion operations are actually pretty much identical to
the s32/u32 ones in implementation, and can be shared with just one extra
variable.
llvm-svn: 201145
Now that both ARM backends use the same implementation for vshll operations,
the code can be shared. This is also a necessary LLVM/Clang interface update.
llvm-svn: 201094
Now the backend supports the natural LLVM IR, we can shamelessly steal the
AArch64 front-end code to implement the vshrn intrinsic on 32-bit ARM.
llvm-svn: 201086
Now that the back-end intrinsics are more regular, there's no need for the
special handling these got in the front-end, so they can be moved to
EmitCommonNeonBuiltinExpr.
llvm-svn: 200769
The LLVM backend now has invariant types on the various crypto-intrinsics,
because in all cases there's only really one interpretation.
llvm-svn: 200707
This should be the last routine patch: AArch64 does still delegate to
EmitARMBuiltinExpr, but the remaining instances have complications of
one sort or another so some more cunning thought will be needed.
llvm-svn: 200528
point reciprocal exponent, and floating-point reciprocal square root estimate
LLVM AArch64 intrinsics to use f32/f64 types, rather than their vector
equivalents.
llvm-svn: 197069
This is a duplicate implementation.
E.g. this patch defines:
float64_t vabd_f64(float64_t a, float64_t b)
But there is already a similar intrinsic "vabdd_f64" with the same types.
Also, this intrinsic will be conflicted to the vector type intrinsic as following(Which is implemented by me and will be committed to trunk):
float64x1_t vabd_f64(float64x1_t a, float64x1_t b).
Two functions shouldn't have a same name in arm_neon.h.
According to ARM ACLE document, such vabd_f64 with float64_t is not existing.
So I revert this commit.
llvm-svn: 196205
CodeGenABITypes is a wrapper built on top of CodeGenModule that exposes
some of the functionality of CodeGenTypes (held by CodeGenModule),
specifically methods that determine the LLVM types appropriate for
function argument and return values.
I addition to CodeGenABITypes.h, CGFunctionInfo.h is introduced, and the
definitions of ABIArgInfo, RequiredArgs, and CGFunctionInfo are moved
into this new header from the private headers ABIInfo.h and CGCall.h.
Exposing this functionality is one part of making it possible for LLDB
to determine the actual ABI locations of function arguments and return
values, making it possible for it to determine this for any supported
target without hard-coding ABI knowledge in the LLDB code.
llvm-svn: 193717
This uses function prefix data to store function type information at the
function pointer.
Differential Revision: http://llvm-reviews.chandlerc.com/D1338
llvm-svn: 193058
class. The instruction class includes the signed saturating doubling
multiply-add long, signed saturating doubling multiply-subtract long, and
the signed saturating doubling multiply long instructions.
llvm-svn: 192909
Including following 14 instructions:
4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers.
ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4).
4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers.
st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4).
llvm-svn: 192362
Including following 14 instructions:
4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers.
ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4).
4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers.
st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4).
E.g. ld1(3 registers version) will load 32-bit elements {A, B, C, D, E, F} sequentially into the three 64-bit vectors list {BA, DC, FE}.
E.g. ld3 will load 32-bit elements {A, B, C, D, E, F} into the three 64-bit vectors list {DA, EB, FC}.
llvm-svn: 192351
These IR instructions are undefined when the amount is equal to operand
size, but NEON right shifts support such shifts. Work around that by
emitting a different IR in these cases.
llvm-svn: 191953
Patch by Ana Pazos.
1.Added support for v1ix and v1fx types.
2.Added Scalar Pairwise Reduce instructions.
3.Added initial implementation of Scalar Arithmetic instructions.
llvm-svn: 191264
This restores the sqrt -> llvm.sqrt mapping, but only in fast-math mode
(specifically, when the UnsafeFPMath or NoNaNsFPMath CodeGen options are
enabled). The @llvm.sqrt* intrinsics have slightly different semantics from the
libm call, specifically, they are undefined when given a non-zero negative
number (the libm calls will always return NaN for any negative number).
This mapping was removed in r100613, and replaced with a TODO, but at that time
the fast-math flags were not yet implemented. Now that we have these, restoring
this mapping is important because it will enable autovectorization of sqrt
calls in loops (at least in fast-math mode).
llvm-svn: 190646
These operations "vector add high-half narrow" actually correspond to the
sequence:
%sum = add <4 x i32> %lhs, %rhs
%high = lshr <4 x i32> %sum, <i32 16, i32 16, i32 16, i32 16>
%res = trunc <4 x i32> %high to <4 x i16>
Now that LLVM can spot this, Clang should emit the corresponding LLVM IR.
llvm-svn: 189463
The NEON intrinsics vqdmlal and vqdmlsl are really just combinations of a
saturating-doubling-multiply (vqdmull) and a saturating add/sub, so now that
LLVM can spot those patterns Clang should emit them instead of specialised
intrinsics.
Feature already tested by existing ARM NEON intrinsics tests.
llvm-svn: 189462
Patch by Ana Pazos
- Completed implementation of instruction formats:
AdvSIMD three same
AdvSIMD modified immediate
AdvSIMD scalar pairwise
- Completed implementation of instruction classes
(some of the instructions in these classes
belong to yet unfinished instruction formats):
Vector Arithmetic
Vector Immediate
Vector Pairwise Arithmetic
- Initial implementation of instruction formats:
AdvSIMD scalar two-reg misc
AdvSIMD scalar three same
- Intial implementation of instruction class:
Scalar Arithmetic
- Initial clang changes to support arm v8 intrinsics.
Note: no clang changes for scalar intrinsics function name mangling yet.
- Comprehensive test cases for added instructions
To verify auto codegen, encoding, decoding, diagnosis, intrinsics.
llvm-svn: 187568
This patch provides basic support for powerpc64le as an LLVM target.
However, use of this target will not actually generate little-endian
code. Instead, use of the target will cause the correct little-endian
built-in defines to be generated, so that code that tests for
__LITTLE_ENDIAN__, for example, will be correctly parsed for
syntax-only testing. Code generation will otherwise be the same as
powerpc64 (big-endian), for now.
The patch leaves open the possibility of creating a little-endian
PowerPC64 back end, but there is no immediate intent to create such a
thing.
The new test case variant ensures that correct built-in defines for
little-endian code are generated.
llvm-svn: 187180
r186899 and r187061 added a preferred way for some architectures not to get
intrinsic generation for math builtins. So the code changes in r185568 can
now be undone (the test remains).
llvm-svn: 187079
This adds three overloaded intrinsics to Clang:
T __builtin_arm_ldrex(const volatile T *addr)
int __builtin_arm_strex(T val, volatile T *addr)
void __builtin_arm_clrex()
The intent is that these do what users would expect when given most sensible
types. Currently, "sensible" translates to ints, floats and pointers.
llvm-svn: 186394
& operator (ignoring any overloaded operator& for the type). The purpose of
this builtin is for use in std::addressof, to allow it to be made constexpr;
the existing implementation technique (reinterpret_cast to some reference type,
take address, reinterpert_cast back) does not permit this because
reinterpret_cast between reference types is not permitted in a constant
expression in C++11 onwards.
llvm-svn: 186053
Without fmath-errno, Clang currently generates calls to @llvm.pow.* intrinsics
when it sees pow*(). This may not be suitable for all targets (for
example le32/PNaCl), so the attached patch adds a target hook that CodeGen
queries. The target can state its preference for having or not having the
intrinsic generated. Non-PNaCl behavior remains unchanged;
PNaCl-specific test added.
llvm-svn: 185568
This will enable users in security critical applications to perform
checked-arithmetic in a fast safe manner that is amenable to c.
Tests/an update to Language Extensions is included as well.
rdar://13421498.
llvm-svn: 184497
I have had several people ask me about why this builtin was not available in
clang (since it seems like a logical conclusion). This patch implements said
builtins.
Relevant tests are included as well. I also updated the Clang language extension reference.
rdar://14192664.
llvm-svn: 184227
Current gcc's produce an error if __clear_cache is anything but
__clear_cache(char *a, char *b);
It looks like we had just implemented a gcc bug that is now fixed.
llvm-svn: 181784
aggregate types in a profoundly wrong way that has to be
worked around in every call site, to getEvaluationKind,
which classifies and distinguishes between all of these
cases.
Also, normalize the API for loading and storing complexes.
I'm working on a larger patch and wanted to pull these
changes out, but it would have be annoying to detangle
them from each other.
llvm-svn: 176656
calls and declarations.
LLVM has a default CC determined by the target triple. This is
not always the actual default CC for the ABI we've been asked to
target, and so we sometimes find ourselves annotating all user
functions with an explicit calling convention. Since these
calling conventions usually agree for the simple set of argument
types passed to most runtime functions, using the LLVM-default CC
in principle has no effect. However, the LLVM optimizer goes
into histrionics if it sees this kind of formal CC mismatch,
since it has no concept of CC compatibility. Therefore, if this
module happens to define the "runtime" function, or got LTO'ed
with such a definition, we can miscompile; so it's quite
important to get this right.
Defining runtime functions locally is quite common in embedded
applications.
llvm-svn: 176286
In ArrayRef<T>(X), X should not be temporary value. It could be rewritten more redundantly;
llvm::Type *XTy = X->getType();
ArrayRef<llvm::Type *> Ty(XTy);
llvm::Value *Callee = CGF.CGM.getIntrinsic(IntrinsicID, Ty);
Since it is safe if both XTy and Ty are temporary value in one statement, it could be shorten;
llvm::Value *Callee = CGF.CGM.getIntrinsic(IntrinsicID, ArrayRef<llvm::Type*>(X->getType()));
ArrayRef<T> has an implicit constructor to create uni-entry of T;
llvm::Value *Callee = CGF.CGM.getIntrinsic(IntrinsicID, X->getType());
MSVC-generated clang.exe crashed.
llvm-svn: 172352
PR 14529 was opened because neither Clang or LLVM was expanding
calls to creal* or cimag* into instructions that just load the
respective complex field. After some discussion, it was not
considered realistic to do this in LLVM because of the platform
specific way complex types are expanded. Thus a way to solve
this in Clang was pursued. GCC does a similar expansion.
This patch adds the feature to Clang by making the creal* and
cimag* functions library builtins and modifying the builtin code
generator to look for the new builtin types.
llvm-svn: 170455
uncovered.
This required manually correcting all of the incorrect main-module
headers I could find, and running the new llvm/utils/sort_includes.py
script over the files.
I also manually added quite a few missing headers that were uncovered by
shuffling the order or moving headers up to be main-module-headers.
llvm-svn: 169237
checks to enable. Remove frontend support for -fcatch-undefined-behavior,
-faddress-sanitizer and -fthread-sanitizer now that they don't do anything.
llvm-svn: 167413
GCC has always supported this on PowerPC and 4.8 supports it on all platforms,
so it's a good idea to expose it in clang too. LLVM supports this on all targets.
llvm-svn: 165362
The expression based expansion too often results in IR level optimizations
splitting the intermediate values into separate basic blocks, preventing
the formation of the VBSL instruction as the code author intended. In
particular, LICM would often hoist part of the computation out of a loop.
rdar://11011471
llvm-svn: 164342
the trap BB out of the individual checks and into a common function, to prepare
for making this code call into a runtime library. Rename the existing EmitCheck
to EmitTypeCheck to clarify it and to move it out of the way of the new
EmitCheck.
llvm-svn: 163451
The backend has to legalize i64 types by splitting them into two 32-bit pieces,
which leads to poor quality code. If we produce code for these intrinsics that
uses one-element vector types, which can live in Neon vector registers without
getting split up, then the generated code is much better. Radar 11998303.
llvm-svn: 161879
intrinsics. The second instruction(s) to be handled are the vector versions
of count set bits (ctpop).
The changes here are to clang so that it generates a target independent
vector ctpop when it sees an ARM dependent vector bits set count. The changes
in llvm are to match the target independent vector ctpop and in
VMCore/AutoUpgrade.cpp to update any existing bc files containing ARM
dependent vector pop counts with target-independent ctpops. There are also
changes to an existing test case in llvm for ARM vector count instructions and
to a test for the bitcode upgrade.
<rdar://problem/11892519>
There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>
llvm-svn: 160409
intrinsics with target-indepdent intrinsics. The first instruction(s) to be
handled are the vector versions of count leading zeros (ctlz).
The changes here are to clang so that it generates a target independent
vector ctlz when it sees an ARM dependent vector ctlz. The changes in llvm
are to match the target independent vector ctlz and in VMCore/AutoUpgrade.cpp
to update any existing bc files containing ARM dependent vector ctlzs with
target-independent ctlzs. There are also changes to an existing test case in
llvm for ARM vector count instructions and a new test for the bitcode upgrade.
<rdar://problem/11831778>
There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>
llvm-svn: 160201
in the ABI arrangement, and leave a hook behind so that we can easily
tweak CCs on platforms that use different CCs by default for C++
instance methods.
llvm-svn: 159894
The tablegen'd code does the same thing without this egregious duplication.
In my limited testing everything seems to work, however there can be
differences if the clang and llvm builtin definitions don't match.
llvm-svn: 159371
pointer, but such folding encounters side-effects, ignore the side-effects
rather than performing them at runtime: CodeGen generates wrong code for
__builtin_object_size in that case.
llvm-svn: 157310
r155047. See the LLVM log for the primary motivation:
http://llvm.org/viewvc/llvm-project?rev=155047&view=rev
Primary commit r154828:
- Several issues were raised in review, and fixed in subsequent
commits.
- Follow-up commits also reverted, and which should be folded into the
original before reposting:
- r154837: Re-add the 'undef BUILTIN' thing to fix the build.
- r154928: Fix build warnings, re-add (and correct) header and
license
- r154937: Typo fix.
Please resubmit this patch with the relevant LLVM resubmission.
llvm-svn: 155048
__atomic_test_and_set, __atomic_clear, plus a pile of undocumented __GCC_*
predefined macros.
Implement library fallback for __atomic_is_lock_free and
__c11_atomic_is_lock_free, and implement __atomic_always_lock_free.
Contrary to their documentation, GCC's __atomic_fetch_add family don't
multiply the operand by sizeof(T) when operating on a pointer type.
libstdc++ relies on this quirk. Remove this handling for all but the
__c11_atomic_fetch_add and __c11_atomic_fetch_sub builtins.
Contrary to their documentation, __atomic_test_and_set and __atomic_clear
take a first argument of type 'volatile void *', not 'void *' or 'bool *',
and __atomic_is_lock_free and __atomic_always_lock_free have an argument
of type 'const volatile void *', not 'void *'.
With this change, libstdc++4.7's <atomic> passes libc++'s atomic test suite,
except for a couple of libstdc++ bugs and some cases where libc++'s test
suite tests for properties which implementations have latitude to vary.
llvm-svn: 154640
<stdatomic.h> header.
In passing, fix LanguageExtensions to note that C11 and C++11 are no longer
"upcoming standards" but are now actually standardized.
llvm-svn: 154513