Commit Graph

10 Commits

Author SHA1 Message Date
Jason Henline 7bb01a2dc4 [SE] Change CoreTests target name
Summary:
Call it StreamExecutorCoreTests in order to prevent collision with
targets from other modules.

Reviewers: jlebar, jprice

Subscribers: beanz, mgorny, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24949

llvm-svn: 282491
2016-09-27 15:32:52 +00:00
Jason Henline 9fc16d4e11 [SE] Fix config bug with CUDA tests
Summary:
It turns out CMake errors out if a processed directory contains source
files that are not used. This was causing an error with the CUDATest.cpp
file when configuring StreamExecutor with the CUDA platform disabled.

Moving CUDATest.cpp to its own directory fixes this problem.

Reviewers: jlebar, jprice

Subscribers: beanz, mgorny, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24618

llvm-svn: 281654
2016-09-15 20:26:28 +00:00
Jason Henline 70720a7e1b [SE] Support CUDA dynamic shared memory
Summary:
Add proper handling for shared memory arguments in the CUDA platform. Also add
in unit tests for CUDA.

Reviewers: jlebar

Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24596

llvm-svn: 281635
2016-09-15 18:11:04 +00:00
Jason Henline b38d8a3a3b [SE] Pack global dev handle addresses
Summary:
We were packing global device memory handles in
`PackedKernelArgumentArray`, but as I was implementing the CUDA
platform, I realized that CUDA wants the address of the handle, not the
handle itself. So this patch switches to packing the address of the
handle.

Reviewers: jlebar

Subscribers: jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24528

llvm-svn: 281424
2016-09-13 23:59:10 +00:00
Jason Henline b459eb3529 [SE] KernelSpec return best PTX
Summary:
Before, the kernel spec would only return PTX for exactly the requested
compute capability. With this patch it will now return the PTX with the
largest compute capability that does not exceed that requested compute
capability.

Reviewers: jlebar

Subscribers: jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24531

llvm-svn: 281417
2016-09-13 23:29:25 +00:00
Jason Henline 46b5e48fde [SE] Use real HostPlatformDevice for testing
Summary:
Replace uses of SimpleHostPlatformDevice in tests with
HostPlatformDevice.

Reviewers: jlebar

Subscribers: jlebar, jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24519

llvm-svn: 281384
2016-09-13 20:14:44 +00:00
Jason Henline fb62147949 [SE] Add .clang-format
Summary:
The .clang-tidy file is copied from the top-level LLVM source directory.

Also fix warnings generated by clang-format:

* Moved SimpleHostPlatformDevice.h so its header include guard could
  have the right format.
* Changed signatures of methods taking llvm::Twine by value to take it
  by const ref instead.
* Add "noexcept" to some move constructors and assignment operators.
* Removed a bunch of places where single-statement loops and
  conditionals were surrounded with braces. (This was not found by the
  current clang-tidy, but with a local patch that I hope to upstream
  soon.)

Reviewers: jlebar, jprice

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24468

llvm-svn: 281374
2016-09-13 19:25:43 +00:00
Jason Henline c16fb8748d [SE] Clean up device and host memory slices
Summary:
* Add LLVM_ATTRIBUTE_UNUSED_RESULT used to slicing methods in order to
  emphasize that the slicing is not done in place.
* Change device memory slice function name from `drop_front` to `slice`
  in order to match the naming convention of `llvm::ArrayRef` and host
  memory slice.
* Change the parameter names of host memory slice functions to
  `DropCount` and `TakeCount` to match device memory slice declarations.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24464

llvm-svn: 281239
2016-09-12 17:20:43 +00:00
Jason Henline 57ea481945 [SE] RegisteredHostMemory for async device copies
Summary:
Improve the error-prone interface that allows users to pass host
pointers that haven't been registered to asynchronous copy methods. In
CUDA, this is an extremely easy error to make, and instead of failing at
runtime, it succeeds and gives the right answers by turning the async
copy into a sync copy. So, you silently get a huge performance
degradation if you misuse the old interface. This new interface should
prevent that.

Reviewers: jlebar

Subscribers: jprice, beanz, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24353

llvm-svn: 281225
2016-09-12 16:09:41 +00:00
Justin Lebar b9e51397bf [StreamExecutor] Make SE work with an in-tree LLVM build.
Summary:
With these changes, we can put parallel-libs within llvm/projects and
build as normal.

This is kind of the minimal change I could figure out how to make while
still making us compatible with llvm's build system.  Some things I'm
not thrilled about include:

 * The creation of a CoreTests directory (the macros really seemed to
   want this)

 * Pulling SimpleHostPlatformDevice.h into CoreTests.  It seems to me
   this should live inside unittests/include, or maybe tests/include,
   but I didn't want to make that change in this patch.

One important piece of work that remains to be done is to make

  $ ninja check-streamexecutor

run all the tests.  Right now the only way I've figured out to run the
tests is

  $ ninja projects/parallel-libs/streamexecutor/unittests/StreamExecutorUnitTests
  $ projects/parallel-libs/streamexecutor/unittests/CoreTests/CoreTests

Reviewers: jhen

Subscribers: beanz, parallel_libs-commits, jprice

Differential Revision: https://reviews.llvm.org/D24368

llvm-svn: 281091
2016-09-09 21:01:02 +00:00