Header files that come with CUDA are assuming split host/device
compilation and are not usable by clang out of the box.
With a bit of preprocessor magic it's possible to twist them
into something clang can use.
This wrapper always includes CUDA headers exactly the same way during
host and device compilation passes and produces identical preprocessed
content during host and device side compilation for sm_35 GPUs. Device
compilation passes for older GPUs will see a smaller subset of device
functions supported by particular GPU.
The wrapper assumes specific contents of CUDA header files and works
only with CUDA 7.0 and 7.5.
Differential Revision: http://reviews.llvm.org/D13171
llvm-svn: 253388
- added detection of libdevice bitcode file and API to find one appropriate for the GPU we're compiling for.
- pass additional cc1 options for linking with detected libdevice bitcode
- added -nocudalib to prevent automatic linking with libdevice
- added test cases to verify new functionality
Differential Revision: http://reviews.llvm.org/D14556
llvm-svn: 253387
In order to compile a CUDA file clang must be able to find
include files for both both host and device.
This patch passes AuxToolchain to AddPreprocessingOptions and
uses it to add include paths for the opposite side of compilation.
We also must be able to find CUDA include files. If the driver
found CUDA installation, it adds appropriate include path
to CUDA headers. This can be disabled with '-nocudainc'.
- Added include paths for the opposite side of compilation.
- Added include paths to detected CUDA installation.
- Added -nocudainc to prevent adding CUDA include path.
- Added test cases to verify new functionality.
Differential Revision: http://reviews.llvm.org/D13170
llvm-svn: 253386
Clang needs to know target triple for both sides of compilation so that
preprocessor macros and target builtins from both sides are available.
This change augments Compilation class to carry information about
toolchains used during different CUDA compilation passes and refactors
BuildActions to use it when it constructs CUDA jobs.
Removed DeviceTriple from CudaHostAction/CudaDeviceAction as it's no
longer needed.
Differential Revision: http://reviews.llvm.org/D13144
llvm-svn: 253385
The tzcnt intrinsics are used non non-BMI targets by code (e.g. ffmpeg)
that uses it as a potentially faster BSF.
The TZCNT instruction is special in that it's encoded in a
backward-compatible way and behaves as BSF on non-BMI targets.
Differential Revision: http://reviews.llvm.org/D14748
llvm-svn: 253358
"-arch armv7k" is only normally used with a watchos target, but Clang doesn't
stop you from giving a "-miphoneos-version-min" option too, converting the
triple to "thumbv7k-apple-ios". In this case the backend will decide to use
SjLj exceptions, so Clang needs to agree so it can create the correct
predefines.
Fortunately, there's a handy function to make the decision for us now.
llvm-svn: 253355
This reverts commit r253269.
This leads to assert / segfault triggering on the following reduced example:
float foo(float U, float base, float cell) { return (U = 2 * base) - cell; }
llvm-svn: 253337
target attribute, don't include "negative" subtarget features in the
list of required features. Builtins are positive by default so don't
need this change, but we pull the default list of features from the
command line and so need to make sure that we only include features
that are turned on for code generation in our error.
llvm-svn: 253242
These two intrinsics are defined in arm_acle.h.
__rev16l needs to rotate by 16 bits, bit it was actually rotating by 2 bits.
For AArch64, where long is 64 bits, this would still be wrong.
__rev16ll was incorrect, it reversed the bytes in each 32-bit word, rather than
each 16-bit halfword. The correct implementation is to apply __rev16 to the top
and bottom words of the 64-bit value.
For AArch32 targets, these get compiled down to the hardware rev16 instruction
at -O1 and above. For AArch64 targets, the 64-bit ones get compiled to two
32-bit rev16 instructions, because there is not currently a pattern for the
64-bit rev16 instruction.
Differential Revision: http://reviews.llvm.org/D14609
llvm-svn: 253211
This has seen quite some usage and I am not aware of any issues. Also
add a style option to enable/disable include sorting. The existing
command line flag can from now on be used to override whatever is set
in the style.
llvm-svn: 253202
The TargetParser API to get the default FPU and default extensions has
changed so that it can fall back to the architecture in case of a
generic CPU.
llvm-svn: 253199
In r253186, I changed the DIBuilder API to now take size and align
for reference types as well. This was done in preparation for upcoming
changes to the Verifier that will validate that sizes match between
DI types and IR values that are declared as having those types.
This updates clang to actually pass the information through.
llvm-svn: 253190
is_empty, is_polymorphic, and is_abstract didn't handle incomplete types
correctly. Only non-union class types must be complete for these
traits.
is_final and is_sealed don't care about the particular spelling of the
FinalAttr.
is_interface_class should always return false regardless of its input.
The type trait can only be satisfied in a mode we do not support (/CLR).
llvm-svn: 253184
When calling a ObjC method on super from inside a C++ lambda, look at the
captures to find "self". This mirrors how the analyzer handles calling super in
an ObjC block and fixes an assertion failure.
rdar://problem/23550077
llvm-svn: 253176
The analyzer incorrectly treats captures as references if either the original
captured variable is a reference or the variable is captured by reference.
This causes the analyzer to crash when capturing a reference type by copy
(PR24914). Fix this by refering solely to the capture field to determine when a
DeclRefExpr for a lambda capture should be treated as a reference type.
https://llvm.org/bugs/show_bug.cgi?id=24914
rdar://problem/23524412
llvm-svn: 253157
Clang tries to figure out if a call to abs is suspicious by looking
through implicit casts to look at the underlying, implicitly converted
type.
Interestingly, C has implicit conversions from pointer-ish types like
function to less exciting types like int. This trips up our 'abs'
checker because it doesn't know which variant of 'abs' is appropriate.
Instead, diagnose 'abs' called on function types upfront. This sort of
thing is highly suspicious and is likely indicative of a missing
pointer dereference/function call/array index operation.
This fixes PR25532.
llvm-svn: 253156