llvm-project/parallel-libs/streamexecutor
Jason Henline b38d8a3a3b [SE] Pack global dev handle addresses
Summary:
We were packing global device memory handles in
`PackedKernelArgumentArray`, but as I was implementing the CUDA
platform, I realized that CUDA wants the address of the handle, not the
handle itself. So this patch switches to packing the address of the
handle.
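
The distinction can be illustrated with a minimal sketch. It relies on the fact that the CUDA driver's `cuLaunchKernel` takes its `kernelParams` argument as an array of pointers to the argument values, so a device pointer argument is passed as the address of the variable holding it. The names `DeviceHandle`, `PackedArgs`, `addGlobalHandle`, and `readHandle` are hypothetical and are not taken from StreamExecutor; the real `PackedKernelArgumentArray` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an opaque device memory handle (in the CUDA
// driver API this role is played by CUdeviceptr).
using DeviceHandle = unsigned long long;

// Hypothetical, simplified packer: stores one address and size per argument.
struct PackedArgs {
  std::vector<const void *> Addresses;
  std::vector<std::size_t> Sizes;

  // Pack the address of the handle, not the handle value itself. The caller
  // must keep Handle alive until the launch has consumed the arguments.
  void addGlobalHandle(const DeviceHandle &Handle) {
    Addresses.push_back(&Handle);
    Sizes.push_back(sizeof(DeviceHandle));
  }
};

// Mimics what a launcher does with argument I: it dereferences the packed
// address to obtain the handle value.
DeviceHandle readHandle(const PackedArgs &Args, std::size_t I) {
  assert(I < Args.Addresses.size());
  return *static_cast<const DeviceHandle *>(Args.Addresses[I]);
}
```

Because only the address is stored, the packed array sees the current value of the handle at launch time, and the handle variable has to outlive the packed array.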

Reviewers: jlebar

Subscribers: jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24528

llvm-svn: 281424
2016-09-13 23:59:10 +00:00
examples [SE] Pack global dev handle addresses 2016-09-13 23:59:10 +00:00
include/streamexecutor [SE] Pack global dev handle addresses 2016-09-13 23:59:10 +00:00
lib [SE] KernelSpec return best PTX 2016-09-13 23:29:25 +00:00
tools/streamexecutor-config Add streamexecutor-config 2016-09-08 16:12:33 +00:00
unittests [SE] Pack global dev handle addresses 2016-09-13 23:59:10 +00:00
CMakeLists.txt [SE] Stop using llvm-config --cxxflags 2016-09-13 15:44:18 +00:00
Doxyfile.in Add streamexecutor-config 2016-09-08 16:12:33 +00:00
README.txt
customdoxygen.css [SE] Doc tweaks 2016-09-02 17:59:12 +00:00

README.txt

StreamExecutor
==============

StreamExecutor is a wrapper around the host-side programming models (runtimes)
of CUDA and OpenCL. This abstraction lets host code target either CUDA or
OpenCL devices with identically functioning data-parallel kernels. It manages
the execution of concurrent work targeting the accelerator, much as a host-side
Executor does.

This version of StreamExecutor can be built either as a sub-project of the LLVM
project or as a standalone project depending on LLVM as an external package.