It has been part of the common functions since 1.0
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 231137
Ported from the libclc/amd-builtins branch
v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
Add cospi(double) implementation instead of using llvm.cos
Notes:
The sincosD_piby4.h file is mostly the same as the builtin implementation
released by AMD. The inline attribute declaration is changed, and M_PI is
used instead of a constant double. Otherwise, the only difference is that
the header explicitly enables the fp64 pragma.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 230641
This is a simple implementation which just copies data synchronously.
v2:
- Use size_t.
v3:
- Fix possible race condition by splitting the copy among multiple
work items.
llvm-svn: 219008
We were missing the local versions of the atom_* before
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217911
Uses the algorithm:
tan(x) = sin(x) / sqrt(1-sin^2(x))
An alternative is:
tan(x) = sin(x) / cos(x)
Which produces more verbose bitcode and longer assembly.
Either way, the generated bitcode seems pretty nasty and a more optimized
but still precise-enough solution is welcome.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217511
Passes the tests that were submitted to the piglit list
Tested on R600 (Pitcairn)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217509
This was previously implemented with a macro and we were using
__builtin_copysign(), which takes double inputs for the float
version of copysign().
Reviewed-and-Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217045
This generates bitcode which is indistinguishable from what was
hand-written for int32 types in v[load|store]_impl.ll.
v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
v2: (Per Matt Arsenault) Fix alignment issues with vector load stores
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216069
v2 Changes:
- use __builtin_signbit instead of shifting by hand
- significantly improve vector shuffling
- Works correctly now for signbit(float16) on radeonsi
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 211696
Use separate implementations instead of a macro
to ensure the constant multiplied with is of
higher precision.
v2: Use the correct formula, spotted by Dan Liew <daniel.liew@imperial.ac.uk>
Reviewed-by: Aaron Warty <awatry@gmail.com>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 210891
There are two implementations of nextafter():
1. Using clang's __builtin_nextafter. Clang replaces this builtin with
a call to nextafter which is part of libm. Therefore, this
implementation will only work for targets with an implementation of
libm (e.g. most CPU targets).
2. The other implementation is written in OpenCL C. This function is
known internally as __clc_nextafter and can be used by targets that
don't have access to libm.
llvm-svn: 192383
Everything except long/ulong is handled by just casting to the next larger type,
doing the math and then shifting/casting the result.
For 64-bit types, we break the high/low parts of each operand apart, and do
a FOIL-based multiplication.
v2:
Discard the stack-overflow implementation due to copyright concerns.
- The implementation is still FOIL-based, but discards the previous code.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188684
rhadd = (x+y+1)>>1
Implemented as:
(x>>1) + (y>>1) + ((x&1)|(y&1))
This prevents us having to do assembly addition and overflow detection
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188477
(x + y) >> 1 gets changed to:
(x>>1) + (y>>1) + (x&y&1)
Saves us having to do any llvm assembly and overflow checking in the addition.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188476
Reduces all vector upsamples down to its scalar components, so probably
not the most efficient thing in the world, but it does what the
spec says it needs to do.
Another possible implementation would be to convert/cast everything as
unsigned if necessary, upsample the input vectors, create the upsampled
value, and then cast back to signed if required.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
llvm-svn: 186691
This commit gets us back to pure CLC and fixes offset calculations.
The next commit will re-enable the assembly implementation for R600,
fix bugs related to 64-bit address spaces, and also fix the
incorrect assumption that address space identifiers are the same in
all architectures.
llvm-svn: 186415
The assembly should be generic, but at least currently R600 only supports
32-bit stores of [u]int1/4, and I believe that only global is well-supported.
R600 lowers the 8/16 component stores to multiple 4-component stores.
The unoptimized C versions of the other stuff is left in place.
Patch by: Aaron Watry
llvm-svn: 185009
The assembly should be generic, but at least currently R600 only supports
32-bit loads of int1/4, and I believe that only global is well-supported.
R600 lowers the 8/16 component vectors to multiple 4-bit loads.
The unoptimized C versions of the other stuff is left in place.
Patch by: Aaron Watry
llvm-svn: 185008
Squashed commit of the following:
commit a0df0a0e86c55c1bdc0b9c0f5a739e5adef4b056
Author: Aaron Watry <awatry@gmail.com>
Date: Mon Apr 15 18:42:04 2013 -0500
libclc: Rename clz.ll to clz_if.ll to ensure it gets built.
configure.py treats files that have the same name with the .cl and .ll
extensions as overriding eachother.
E.g. If you have clz.cl and clz.ll both specified to be built in the same
SOURCES file, only the first file listed will actually be built.
Since the contents of clz.ll were an interface that is implemented in
clz_impl.ll, rename clz.ll to clz_if.ll to make sure that the interface is
built.
commit 931b62bed05c58f737de625bd415af09571a6a5a
Author: Aaron Watry <awatry@gmail.com>
Date: Sat Apr 13 12:32:54 2013 -0500
libclc: llvm assembly implementation of clz
Untested... currently crashes in the same manner as add_sat.
commit 6ef0b7b0b6d2e5584086b4b9a9243743b2e0538f
Author: Aaron Watry <awatry@gmail.com>
Date: Sat Mar 23 12:35:27 2013 -0500
libclc: Add stub clz builtin
For scalar int/uint, attempt to use the clz llvm builtin.. for all others
return 0 until an actual implementation is finished.
Patch by: Aaron Watry
llvm-svn: 185004
configure.py allows overloading *.cl with *.ll, but will only ever build
the first file listed in SOURCES of ${file}.cl and ${file}.ll
add_sat, sub_sat, (and the soon to be submitted clz) all define interfaces in
${function_name}.ll which are implemented in ${function_name}_impl.ll.
Renaming the interface files is enough to get them to build again, fixing
CL usage of these functions.
Tested on clover/r600g.
Patch by: Aaron Watry
llvm-svn: 185000
This implementation does a lot of bit shifting and masking. Suffice to say,
this is somewhat suboptimal... but it does look to produce correct results
(after the piglit tests were corrected for sign extension issues).
Someone who knows LLVM better than I could re-write this more efficiently.
Patch by: Aaron Watry
llvm-svn: 184996
Created under a new shared/ directory for functions which are available for
both integer and floating point types.
Patch by: Aaron Watry
llvm-svn: 184994