Summary:
int __builtin_amdgcn_ds_swizzle (int a, int imm);
while imm is a constant.
Differential Revision:
http://reviews.llvm.org/D23682
llvm-svn: 279165
constraints were added to _mm256_broadcast_{pd,ps} intel intrinsics.
The spec for these intrinics is ... pretty much silent on alignment.
This is especially frustrating considering the amount of discussion of
alignment in the load and store instrinsics. So I was forced to rely on
the specification for the VBROADCASTF128 instruction.
That instruction's spec is *also* completely silent on alignment.
Fortunately, when it comes to the instruction's spec, silence is enough.
There is no #GP fault option for an underaligned address so this
instruction, and by inference the intrinsic, can read any alignment.
As it happens, the old code worked exactly this way and in fact we have
plenty of code that hands pointers with less than 16-byte alignment to
these intrinsics. This code broke pretty spectacularly with this commit.
Fortunately, the fix is super simple! Change a 16 to a 1, and ta da!
Anyways, a lot of debugging for a really boring fix. =]
llvm-svn: 278202
Summary:
In order to re-define OpenCL built-in functions
'to_{private,local,global}' in OpenCL run-time library LLVM names must
be different from the clang built-in function names.
Reviewers: yaxunl, Anastasia
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D23120
llvm-svn: 277743
As discussed on D22460, I've updated the vbroadcastf128 pd256/ps256 builtins to map directly to generic IR - load+splat a 128-bit vector to both lanes of a 256-bit vector.
Fix for PR28657.
llvm-svn: 276417
- Added new Builtins: enqueue_kernel, get_kernel_work_group_size
and get_kernel_preferred_work_group_size_multiple.
These Builtins use custom check to diagnose parameters of the passed Blocks
i. e. variable number of 'local void*' type params, and check different
overloads specified in Table 6.31 of OpenCL v2.0.
- IR is generated as an internal library call for each OpenCL Builtin,
reusing ObjC Block implementation.
Review: http://reviews.llvm.org/D20249
llvm-svn: 274540
Currently we only have OpenCL 2.0 Builtins i.e. pipes or address space conversions.
They have to be added only in the version 2.0 compilation mode to make the identifiers
available for use in the other versions.
Review: http://reviews.llvm.org/D20249
llvm-svn: 274509
This is important for building libclc. Since r273039 tests are failing
due to now emitting calls to these functions instead of emitting the
DAG node. The libm function names are implemented for OpenCL, and should
call the locally defined versions, so -fno-builtin is used. The IR
Some functions use the __builtins and expect the intrinsics to be
emitted. Without this we end up with nobuiltin calls to intrinsics
or to unsupported library calls.
llvm-svn: 274370
Reapplying patch in r272777 which was reverted
because the llvm patch which added support
for generating the mcrr/mcrr2 instructions
from the intrinsic was causing an assertion
failure. This has now been fixed in llvm.
llvm-svn: 272983
As noted in the code comment, a potential follow-on would be to remove
the builtins themselves. Other than ord/unord, this already works as
expected. Eg:
typedef float v4sf __attribute__((__vector_size__(16)));
v4sf fcmpgt(v4sf a, v4sf b) { return a > b; }
Differential Revision: http://reviews.llvm.org/D21268
llvm-svn: 272840
Patch adds intrinsics for mrrc/mrrc2. The
intrinsics for mrrc/mrrc2 return a single
uint64_t to represent two 32 bit values.
The mcrr/mcrr2 intrinsic was changed to
accept a single uint64_t instead of two
32 bit values as the input for consistency.
Differential Revision: http://reviews.llvm.org/D21179
llvm-svn: 272777
We can now use __builtin_nontemporal_store instead of target specific builtins for naturally aligned nontemporal stores which avoids the need for handling in CGBuiltin.cpp
The scalar integer nontemporal (unaligned) store builtins will have to wait as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned load/stores.
The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load
Differential Revision: http://reviews.llvm.org/D21272
llvm-svn: 272540
_InterlockedIncrement and _InterlockedDecrement have 'long' in their
prototypes. We assumed 'long' was the same size as an i32 which is
incorrect for other targets.
This fixes PR27892.
llvm-svn: 270953
OpenCL builtin functions to_{global|local|private} accepts argument of pointer type to arbitrary pointee type, and return a pointer to the same pointee type in different addr space, i.e.
global gentype *to_global(gentype *p);
It is not desirable to declare it as
global void *to_global(void *);
in opencl header file since it misses diagnostics.
This patch implements these builtin functions as Clang builtin functions. In the builtin def file they are defined to have signature void*(void*). When handling call expressions, their declarations are re-written to have correct parameter type and return type corresponding to the call argument.
In codegen call to addr void *to_addr(void*) is generated with addrcasts or bitcasts to facilitate implementation in builtin library.
Differential Revision: http://reviews.llvm.org/D19932
llvm-svn: 270261
This is matching what trunk gcc is accepting. Also adds a missing ssse3
case. PR27779. The amount of duplication here is annoying, maybe it
should be factored into a separate .def file?
llvm-svn: 270224
Summary:
Previously it was implemented as inline asm in the CUDA headers.
This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions. This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.
Reviewers: tra, rsmith
Subscribers: jhen, cfe-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D19990
llvm-svn: 270150