Commit Graph

181 Commits

Author SHA1 Message Date
Tobias Grosser b99c11710c [GPGPU] Make sure managed arrays are prepared at the beginning of the scop
Summary:
This resolves some "instruction does not dominate use" errors, as we used to
prepare the arrays at the location of the first kernel, which not necessarily
dominated all other kernel calls.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D36372

llvm-svn: 310196
2017-08-06 11:10:38 +00:00
Tobias Grosser 5b307cdb8a [GPGPU] Rename all, not only the first libdevice function
llvm-svn: 310194
2017-08-06 03:04:15 +00:00
Siddharth Bhat e53c924b0f [Polly] [PPCGCodeGeneration] Deal with loops outside the Scop correctly in PPCGCodeGeneration.
A Scop with a loop outside it is not handled currently by
PPCGCodeGeneration. The test case is such that the Scop has only one inner loop
that is detected. This currently breaks codegen.

The fix is to reuse the existing mechanism in `IslNodeBuilder` within
`GPUNodeBuilder.

Differential Revision: https://reviews.llvm.org/D36290

llvm-svn: 310193
2017-08-06 02:39:05 +00:00
Siddharth Bhat 638316da5b [PPCGCodeGeneration] [NFC] Log every location from which PPCGCodegen bails.
This is useful when trying to understand why no GPU code was produced.

Differential Revision: https://reviews.llvm.org/D36318

llvm-svn: 310103
2017-08-04 19:36:40 +00:00
Tobias Grosser b5563c6817 Make sure that all parameter dimensions are set in schedule
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.

To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36243

llvm-svn: 309939
2017-08-03 13:51:15 +00:00
Siddharth Bhat eadf76d34a [PPCGCodeGeneration] Construct `isl_multi_pw_aff` of PPCGArray.bounds even when polly-ignore-parameter-bounds is turned on.
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the paramters present in the program.

The construction of the `isl_multi_pw_aff` requires all the indivisual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.

We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.

llvm-svn: 309934
2017-08-03 12:09:33 +00:00
Siddharth Bhat edf9581e4c [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue.
It is possible that the `HostPtr` that coresponds to an array could be
invariant load hoisted. Make sure we use the invariant load hoisted
value by using `IslNodeBuilder::getLatestValue`.

Differential Revision: https://reviews.llvm.org/D36001

llvm-svn: 309681
2017-08-01 14:26:39 +00:00
Siddharth Bhat 4d5820d171 [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getGridSizes to isl++.
llvm-svn: 309671
2017-08-01 10:45:41 +00:00
Siddharth Bhat ccbf4b509c [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getArrayOffset to isl++.
llvm-svn: 309669
2017-08-01 09:58:55 +00:00
Tobias Grosser 8fc6cdfb1c [GPGPU] Add support for NVIDIA libdevice
Summary:
This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp, cos and copysign functions. Other
functions will be enabled as needed.

Reviewers: bollu, singam-sanjay

Reviewed By: bollu

Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35703

llvm-svn: 309560
2017-07-31 14:03:16 +00:00
Siddharth Bhat 4ebeb3568a [PPCGCodeGeneration] Check that invariant load hoisting succeeded.
If we fail, throw an error for now. We can gracefully handle this later.

llvm-svn: 309387
2017-07-28 14:48:32 +00:00
Tobias Grosser 25271b91b2 [GPGPU] Do not require the Scop::Context to have information about all parameters
llvm-svn: 309368
2017-07-28 06:49:44 +00:00
Tobias Grosser 30caae6d23 [GPGPU] Fix compilation issue with latest CUDA upgrade to i128
llvm-svn: 309366
2017-07-28 06:38:49 +00:00
Siddharth Bhat 43f178bbc9 [PPCGCodeGeneration] Skip arrays with empty extent.
Invariant load hoisted scalars, and arrays whose size we can statically compute
to be 0 do not need to be allocated as arrays.

Invariant load hoisted scalars are sent to the kernel directly as parameters.

Earlier, we used to allocate `0` bytes of memory for these because our
computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`.

Now, since we don't invariant loads as arrays in PPCGCodeGeneration, this
problem does not occur anymore.

Differential Revision: https://reviews.llvm.org/D35795

llvm-svn: 308971
2017-07-25 12:35:36 +00:00
Tobias Grosser 206e9e3b3b Move ScopArrayInfo::getFromAccessFunction and getFromId to isl++
llvm-svn: 308892
2017-07-24 16:22:27 +00:00
Siddharth Bhat f7face4bc4 Convert GPUNodeBuilder::getArraySize to islcpp.
Note: PPCGCodeGeneration::pollyBuildAstExprForStmt is at
      https://reviews.llvm.org/D35770

Differential Revision: https://reviews.llvm.org/D35771

llvm-svn: 308870
2017-07-24 09:08:21 +00:00
Siddharth Bhat 35de900917 [NFC] Move PPCGCodeGeneration::pollyBuildAstExprForStmt to isl++.
Differential Revision: https://reviews.llvm.org/D35771

llvm-svn: 308869
2017-07-24 08:34:24 +00:00
Tobias Grosser 6a87036e0f Move MemoryAccess::getAddressFunction to isl++
llvm-svn: 308841
2017-07-23 04:08:45 +00:00
Tobias Grosser 1515f6b937 Move MemoryAccess::NewAccessRelation to isl++
We also move related accessor functions

llvm-svn: 308840
2017-07-23 04:08:38 +00:00
Tobias Grosser fe46c3ff3a Move MemoryAccess::id to isl++
llvm-svn: 308836
2017-07-23 04:08:11 +00:00
Tobias Grosser 77eef90f50 Move ScopArrayInfo to isl++
This moves the full ScopArrayInfo class to isl++

llvm-svn: 308801
2017-07-21 23:07:56 +00:00
Philipp Schaad 2f3073b5cb [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for Intel
Summary:
Added SPIR Code Generation to the PPCG Code Generator. This can be invoked using
the polly-gpu-arch flag value 'spir32' or 'spir64' for 32 and 64 bit code respectively.
In addition to that, runtime support has been added to execute said SPIR code on Intel
GPU's, where the system is equipped with Intel's open source driver Beignet (development
version). This requires the cmake flag 'USE_INTEL_OCL' to be turned on, and the polly-gpu-runtime
flag value to be 'libopencl'.
The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex
string transformations.
Has been tested (working) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).

Reviewers: bollu, grosser, Meinersbur, singam-sanjay

Reviewed By: grosser, singam-sanjay

Subscribers: pollydev, nemanjai, mgorny, Anastasia, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35185

llvm-svn: 308751
2017-07-21 16:11:06 +00:00
Siddharth Bhat a0fb8b23e1 [NFC] [PPCGCodeGeneration] Print `verifyModule` failure to debug stream.
If verifyModule fails, it is helpful to know why it failed. Add a log to
the debug stream that prints the failure.

llvm-svn: 308727
2017-07-21 11:21:44 +00:00
Tobias Grosser 018103d34e Fix typo in function name Bllock -> Block
llvm-svn: 308715
2017-07-21 06:00:38 +00:00
Tobias Grosser 54491db687 Support fabs and copysign in Polly-ACC
llvm-svn: 308649
2017-07-20 18:26:34 +00:00
Siddharth Bhat 9e3db2b756 [PPCGCodeGen] [3/3] Update PPCGCodeGen + tests to latest ppcg.
This commit *WILL COMPILE*.

1. `PPCG` now uses `isl_multi_pw_aff` instead of an array of `pw_aff`.
   This needs us to adjust how we index array bounds and how we construct
   array bounds.

2. `PPCG` introduces two new kinds of nodes: `init_device` and `clear_device`.
   We should investigate what the correct way to handle these are.

3. `PPCG` has gotten smarter with its use of live range reordering, so some of
   the tests have a qualitative improvement.

4. `PPCG` changed its output style, so many test cases need to be updated to
   fit the new style for `polly-acc-dump-code` checks.

Differential Revision: https://reviews.llvm.org/D35677

llvm-svn: 308625
2017-07-20 15:48:36 +00:00
Siddharth Bhat edfef5ae8e [NFC] [PPCGCodeGeneration] cleanup kills related code.
We extended kills in Polly to handle both `phi` nodes and scalars that
    are not used within the Scop. Update the comments and choice of
    variable names to reflect this.

llvm-svn: 308279
2017-07-18 09:15:16 +00:00
Siddharth Bhat 233d717ec1 [PPCGCodeGeneration] Generate invariant loads before trying to generate IR.
- We should call `preloadInvariantLoads` to make sure that code is
   generated for invariant loads in the kernel.

Differential Revision: https://reviews.llvm.org/D35410

llvm-svn: 308187
2017-07-17 15:57:01 +00:00
Siddharth Bhat 03346c2701 [PPCGCodeGeneration] Fix runtime check adjustments since they make assumptions about BB layout.
- There is a conditional branch that is used to switch between the old
   and new versions of the code.

- If we detect that the build was unsuccessful, `PPCGCodeGeneration` will
  change the runtime check to be always set to false.

- To actually *reach* this runtime check instruction, `PPCGCodeGeneration`
  was using assumptions about the layout of the BBs.

- However, invariant load hoisting violates this assumption by inserting
  an extra basic block in the middle.

- Fix the assumption on the layout by having `createScopConditionally`
   return the conditional branch instruction.

- Use this reference to set to always-false.

llvm-svn: 308010
2017-07-14 10:00:25 +00:00
Siddharth Bhat a1b2086a33 [Invariant Loads] Do not consider invariant loads to have dependences.
We need to relax constraints on invariant loads so that they do not
create fake RAW dependences. So, we do not consider invariant loads as
scalar dependences in a region.

During these changes, it turned out that we do not consider `llvm::Value`
replacements correctly within `PPCGCodeGeneration` and `ISLNodeBuilder`.
The replacements dictated by `ValueMap` were not being followed in all
places. This was fixed in this commit. There is no clean way to decouple
this change because this bug only seems to arise when the relaxed
version of invariant load hoisting was enabled.

Differential Revision: https://reviews.llvm.org/D35120

llvm-svn: 307907
2017-07-13 12:18:56 +00:00
Singapuram Sanjay Srivallabh 1abd9ffa37 [PPCGCodeGen] Differentiate kernels based on their parent Scop
Summary:
Add a sequence number that identifies a ptx_kernel's parent Scop within a function to it's name to differentiate it from other kernels produced from the same function, yet different Scops.

Kernels produced from different Scops can end up having the same name. Consider a function with 2 Scops and each Scop being able to produce just one kernel. Both of these kernels have the name "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985, by differentiating kernels across Scops as well.

Previously (even before D33985) while profiling kernels generated through JIT e.g. Julia, [[ https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ | kernels associated with different functions, and even different SCoPs within a function, would be grouped together due to the common name ]]. This patch prevents this grouping and the kernels are reported separately.

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: mehdi_amini, nemanjai, pollydev, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35176

llvm-svn: 307814
2017-07-12 16:46:19 +00:00
Siddharth Bhat 761e5b9310 [Polly] [PPCGCodeGeneration] Teach `must_kills` to kill scalars that are local to the scop.
- By definition, we can pass something as a `kill` to PPCG if we know
that no data can flow across a kill.
- This is useful for more complex examples where we have scalars that
are local to a scop.
- If the local is only used within a scop, we are free to kill it.

Differential Revision: https://reviews.llvm.org/D35045

llvm-svn: 307260
2017-07-06 13:42:42 +00:00
Singapuram Sanjay Srivallabh 79f13b9a80 Prefix the name of the calling host function in the name of callee GPU kernel
Summary:
Provide more context to the name of a GPU kernel by prefixing its name with the host function that calls it. E.g. The first kernel called by `gemm` would be `FUNC_gemm_KERNEL_0`.

Kernels currently follow the "kernel_#" (# = 0,1,2,3,...) nomenclature. This patch makes it easier to map host caller and device callee, especially when there are many kernels produced by Polly-ACC.

Reviewers: grosser, Meinersbur, bollu, philip.pfaffe, kbarton!

Reviewed By: grosser

Subscribers: nemanjai, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D33985

llvm-svn: 307173
2017-07-05 16:48:21 +00:00
Siddharth Bhat a82f2d264a [PPCGCodeGeneration] Teach Polly to start using live range reordering.
Polly did not use PPCG's live range reordering feature. Teach
PPCGCodeGeneration to use this.

Documentation on this is sparse, so much of the code is conservative.

We currently kill all phi nodes in a Scop by appending them to the
must_kill map we pass to PPCG. I do not have a proof of correctness,
but it seems to be intuitively correct.

We also do not handle `array_order`, which, quoting PPCG, is:
PPCG/gpu.h: "Order dependences on non-scalars."
It seems to consist of RAW dependences between arrays. We need to
pass this information for more complex privatization cases.

Differential Revision: https://reviews.llvm.org/D34941

llvm-svn: 307163
2017-07-05 14:57:04 +00:00
Singapuram Sanjay Srivallabh 02ca346e48 Introduce a hybrid target to generate code for either the GPU or CPU
Summary:
Introduce a "hybrid" `-polly-target` option to optimise code for either the GPU or CPU.

When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration.

In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes.

Reviewers: grosser, Meinersbur, bollu

Reviewed By: grosser

Subscribers: kbarton, nemanjai, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D34054

llvm-svn: 306863
2017-06-30 19:42:21 +00:00
Siddharth Bhat 65d7f72f2c [PPCGCodeGeneration] Add flag to allow polly to fail in GPU kernel fails.
- This is useful for debugging GPU code.

llvm-svn: 306290
2017-06-26 14:56:56 +00:00
Siddharth Bhat f291c8d510 [PPCGCodeGeneration] Allow intrinsics within kernels.
- In D33414, if any function call was found within a kernel, we would bail out.

- This is an over-approximation. This patch changes this by allowing the
  `llvm.sqrt.*` family of intrinsics.

- This introduces an additional step when creating a separate llvm::Module
  for a kernel (GPUModule). We now copy function declarations from the
  original module to new module.

- We also populate IslNodeBuilder::ValueMap so it replaces the function
  references to the old module to the ones in the new module
  (GPUModule).

Differential Revision: https://reviews.llvm.org/D34145

llvm-svn: 306284
2017-06-26 13:12:06 +00:00
Andreas Simbuerger 256070d85c [NFC] Return both polly.start and polly.exiting from executeScopConditionally.
This commit returns both the start and the exit block that are created
by executeScopConditionally.

In a future commit we will make use of the exit block. Before we would
have to use the implicit property that there won't be any code generated
between polly.start and polly.exiting at the time of use to find the
correct block ('polly.exiting').

All usage location are semantically unchanged.

llvm-svn: 306283
2017-06-26 12:17:11 +00:00
Siddharth Bhat a12f807f33 [PPCGCodeGeneration] Enable GPU code generation with invariant loads.
The condition that disallowed code generation in PPCGCodeGeneration with
invariant loads is not required. I haven't been able to construct a
counterexample where this generates invalid code.

Differential Revision: https://reviews.llvm.org/D34604

llvm-svn: 306245
2017-06-25 14:48:24 +00:00
Siddharth Bhat bccaea57c0 [Polly] [PPCGCodeGeneration] Skip Scops which contain function pointers.
In `PPCGCodeGeneration`, we try to take the references of every `Value`
that is used within a Scop to offload to the kernel. This occurs in
`GPUNodeBuilder::createLaunchParameters`.

This breaks if one of the values is a function pointer, since one of
these cases will trigger:

1. We try to to take the references of an intrinsic function, and this
breaks at `verifyModule`, since it is illegal to take the reference of
an intrinsic.

2. We manage to take the reference to a function, but this fails at
`verifyModule` since the function will not be present in the module that
is created in the kernel.

3. Even if `verifyModule` succeeds (which should not occur), we would
then try to call a *host function* from the *device*, which is
illegal runtime behaviour.

So, we disable this entire range of possibilities by simply not allowing
function references within a `Scop` which corresponds to a kernel.

However, note that this is too conservative. We *can* allow intrinsics
within kernels if the backend can lower the intrinsic correctly. For
example, an intrinsic like `llvm.powi.*` can actually be lowered by the `NVPTX`
backend.

We will now gradually whitelist intrinsics which are known to be safe.

Differential Revision: https://reviews.llvm.org/D33414

llvm-svn: 305185
2017-06-12 11:41:09 +00:00
Michael Kruse a6d48f59a1 Fix a lot of typos. NFC.
llvm-svn: 304974
2017-06-08 12:06:15 +00:00
Philip Pfaffe 2b852e2e42 [Polly][NewPM] Port IslAst to the new ScopPassManager
Summary: This patch ports IslAst to the new PM. The change is mostly straightforward. The only major modification required is making IslAst move-only, to correctly manage the isl resources it owns.

Reviewers: grosser, Meinersbur

Reviewed By: grosser

Subscribers: nemanjai, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D33422

llvm-svn: 303622
2017-05-23 10:12:56 +00:00
Siddharth Bhat b7f68b8c9e [Fortran Support] Materialize outermost dimension for Fortran array.
- We use the outermost dimension of arrays since we need this
information to generate GPU transfers.

- In general, if we do not know the outermost dimension of the array
(because the indexing expression is non-affine, for example) then we
simply cannot generate transfer code.

- However, for Fortran arrays, we can use the Fortran array
representation which stores the dimensions of all arrays.

- This patch uses the Fortran array representation to generate code that
computes the outermost dimension size.

Differential Revision: https://reviews.llvm.org/D32967

llvm-svn: 303429
2017-05-19 15:07:45 +00:00
Philip Pfaffe 5cc87e3ab3 [Polly][NewPM] Port ScopDetection to the new PassManager
Summary: This is a proof of concept of how to port polly-passes to the new PassManager architecture.  This approach works ootb for Function-Passes, but might not be directly applicable to Scop/Region-Passes. While we could just run the Analyses/Transforms over functions instead, we'd surrender the nice pipelining behaviour we have now.

Reviewers: Meinersbur, grosser

Reviewed By: grosser

Subscribers: pollydev, sanjoy, nemanjai, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31459

llvm-svn: 302902
2017-05-12 14:37:29 +00:00
Siddharth Bhat a90be207c6 [Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen
Summary: PPCGCodeGeneration now attaches the size of the kernel launch parameters at the end of the parameter list. For the existing CUDA Runtime, this gets ignored, but the OpenCL Runtime knows to check for kernel-argument size at the end of the parameter list. (The resulting parameters list is twice as long. This has been accounted for in the corresponding test cases).

Reviewers: grosser, Meinersbur, bollu

Reviewed By: bollu

Subscribers: nemanjai, yaxunl, Anastasia, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32961

llvm-svn: 302515
2017-05-09 10:45:52 +00:00
Siddharth Bhat 17f01968f1 [Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen
Summary:
When compiling for GPU, one can now choose to compile for OpenCL or CUDA,
with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The
GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library
for that purpose, correctly choosing the corresponding library calls to the
option chosen when compiling (via different initialization calls).

Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far).

Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay

Reviewed By: grosser, Meinersbur

Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32431

llvm-svn: 302379
2017-05-07 21:03:46 +00:00
Siddharth Bhat c1267b9baa Revert "[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen"
This reverts commit 17a84e414adb51ee375d14836d4c2a817b191933.

Patches should have been submitted in the order of:

1. D32852
2. D32854
3. D32431

I mistakenly pushed D32431(3) first. Reverting to push in the correct
order.

llvm-svn: 302217
2017-05-05 09:02:08 +00:00
Siddharth Bhat 51904ae35a [Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen
Summary:
When compiling for GPU, one can now choose to compile for OpenCL or CUDA,
with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The
GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library
for that purpose, correctly choosing the corresponding library calls to the
option chosen when compiling (via different initialization calls).

Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far).

Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay

Reviewed By: grosser, Meinersbur

Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32431

llvm-svn: 302215
2017-05-05 07:54:49 +00:00
Siddharth Bhat abed49699b [Polly] [PPCGCodeGeneration] Add managed memory support to GPU code
generation.

This needs changes to GPURuntime to expose synchronization between host
and device.

1. Needs better function naming, I want a better name than
"getOrCreateManagedDeviceArray"

2. DeviceAllocations is used by both the managed memory and the
non-managed memory path. This exploits the fact that the two code paths
are never run together. I'm not sure if this is the best design decision

Reviewed by: PhilippSchaad

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32215

llvm-svn: 301640
2017-04-28 11:16:30 +00:00
Siddharth Bhat d277feda91 [PPCGCodeGeneration] Update PPCG Code Generation for OpenCL compatibility
Added a small change to the way pointer arguments are set in the kernel
code generation. The way the pointer is retrieved now, specifically requests
global address space to be annotated. This is necessary, if the IR should be
run through NVPTX to generate OpenCL compatible PTX.

The changes do not affect the PTX Strings generated for the CUDA target
(nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl).

Additionally, the data layout has been updated to what the NVPTX Backend requests/recommends.

Contributed-by: Philipp Schaad

Reviewers: Meinersbur, grosser, bollu

Reviewed By: grosser, bollu

Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32215

llvm-svn: 301299
2017-04-25 08:08:29 +00:00
Tobias Grosser 7b5a4dfd46 Exploit BasicBlock::getModule to shorten code
Suggested-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 299914
2017-04-11 04:59:13 +00:00
Tobias Grosser 67726b3260 SAdjust to recent change in constructor definition of AllocaInst
llvm-svn: 299913
2017-04-11 04:23:38 +00:00
Philip Pfaffe 2d950f36ee [Polly][NewPM] Pull references to the legacy PM interface from utilities and helpers
Summary:
A couple of the utilities used to analyze or build IR make explicit use of the legacy PM on their interface, to access analysis results. This patch removes the legacy PM from the interface, and just passes the required results directly.

This shouldn't introduce any function changes, although the API technically allowed to obtain two different analysis results before, one passed by reference and one through the PM. I don't believe that was ever intended, however.

Reviewers: grosser, Meinersbur

Reviewed By: grosser

Subscribers: nemanjai, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31653

llvm-svn: 299423
2017-04-04 10:01:53 +00:00
Tobias Grosser de244eb450 Possible error in doc comment
If a SCoP is most probably sequential, then it's better to run it on a CPU.
Hence, there's no point in running it on a GPU.

Reviewers: grosser

Subscribers: nemanjai

Tags: #polly

Contributed-by: Singapuram Sanjay <singapuram.sanjay@gmail.com>

Differential Revision: https://reviews.llvm.org/D30864

llvm-svn: 297578
2017-03-12 08:19:01 +00:00
Tobias Grosser 24222c7357 Fix namespaces after clang-format update
llvm-svn: 296635
2017-03-01 15:54:27 +00:00
Michael Kruse 52ab4943b4 Remove all references to PostDominators. NFC.
Marking a pass as preserved is necessary if any Polly pass uses it, even
if it is not preserved within the generated code. Not marking it would
cause the the Polly pass chain to be interrupted. It is not used by any
Polly pass anymore, hence we can remove all references to it.

llvm-svn: 295983
2017-02-23 15:16:22 +00:00
Tobias Grosser ff40087a6a Update to recent formatting changes
llvm-svn: 293756
2017-02-01 10:12:09 +00:00
Tobias Grosser 587f1f57ad [Polly] [BlockGenerator] Unify ScalarMap and PhiOpsMap
Instead of keeping two separate maps from Value to Allocas, one for
MemoryType::Value and the other for MemoryType::PHI, we introduce a single map
from ScopArrayInfo to the corresponding Alloca. This change is intended, both as
a general simplification and cleanup, but also to reduce our use of
MemoryAccess::getBaseAddr(). Moving away from using getBaseAddr() makes sure
we have only a single place where the array (and its base pointer) for which we
generate code for is specified, which means we can more easily introduce new
access functions that use a different ScopArrayInfo as base. We already today
experiment with modifiable access functions, so this change does not address
a specific bug, but it just reduces the scope one needs to reason about.

Another motivation for this patch is https://reviews.llvm.org/D28518, where
memory accesses with different base pointers could possibly be mapped to a
single ScopArrayInfo object. Such a mapping is currently not possible, as we
currently generate alloca instructions according to the base addresses of the
memory accesses, not according to the ScopArrayInfo object they belong to.  By
making allocas ScopArrayInfo specific, a mapping to a single ScopArrayInfo
object will automatically mean that the same stack slot is used for these
arrays. For D28518 this is not a problem, as only MemoryType::Array objects are
mapping, but resolving this inconsistency will hopefully avoid confusion.

llvm-svn: 293374
2017-01-28 07:42:10 +00:00
Tobias Grosser 4d5a917287 Use typed enums to model MemoryKind and move MemoryKind out of ScopArrayInfo
To benefit of the type safety guarantees of C++11 typed enums, which would have
caught the type mismatch fixed in r291960, we make MemoryKind a typed enum.
This change also allows us to drop the 'MK_' prefix and to instead use the more
descriptive full name of the enum as prefix. To reduce the amount of typing
needed, we use this opportunity to move MemoryKind from ScopArrayInfo to a
global scope, which means the ScopArrayInfo:: prefix is not needed. This move
also makes historically sense. In the beginning of Polly we had different
MemoryKind enums in both MemoryAccess and ScopArrayInfo, which were later
canonicalized to one. During this canonicalization we just choose the enum in
ScopArrayInfo, but did not consider to move this shared enum to global scope.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D28090

llvm-svn: 292030
2017-01-14 20:25:44 +00:00
Tobias Grosser e29db2173b Update to recent clang-format changes
llvm-svn: 291810
2017-01-12 21:05:19 +00:00
Tobias Grosser df8f35b7b8 Update for clang-format change in r288119
llvm-svn: 288134
2016-11-29 12:52:08 +00:00
Eli Friedman acf8006471 [Polly CodeGen] Break critical edge from RTC to original loop.
This makes polly generate a CFG which is closer to what we want
in LLVM IR, with a loop preheader for the original loop. This is
just a cleanup, but it exposes some fragile assumptions.

I'm not completely happy with the changes related to expandCodeFor;
RTCBB->getTerminator() is basically a random insertion point which
happens to work due to the way we generate runtime checks. I'm not
sure what the right answer looks like, though.

Differential Revision: https://reviews.llvm.org/D26053

llvm-svn: 285864
2016-11-02 22:32:23 +00:00
Tobias Grosser bc653f2031 GPGPU: Do not run mostly sequential kernels in GPU
In case sequential kernels are found deeper in the loop tree than any parallel
kernel, the overall scop is probably mostly sequential. Hence, run it on the
CPU.

llvm-svn: 281849
2016-09-18 08:31:09 +00:00
Tobias Grosser 82f2af3508 GPGPU: Dynamically ensure 'sufficient compute'
Offloading to a GPU is only beneficial if there is a sufficient amount of
compute that can be accelerated. Many kernels just have a very small number
of dynamic compute, which means GPU acceleration is not beneficial. We
compute at run-time an approximation of how many dynamic instructions will be
executed and fall back to CPU code in case this number is not sufficiently
large. To keep the run-time checking code simple, we over-approximate the
number of instructions executed in each statement by computing the volume of
the rectangular hull of its iteration space.

llvm-svn: 281848
2016-09-18 06:50:35 +00:00
Tobias Grosser 51dfc27589 GPGPU: Store back non-read-only scalars
We may generate GPU kernels that store into scalars in case we run some
sequential code on the GPU because the remaining data is expected to already be
on the GPU. For these kernels it is important to not keep the scalar values
in thread-local registers, but to store them back to the corresponding device
memory objects that backs them up.

We currently only store scalars back at the end of a kernel. This is only
correct if precisely one thread is executed. In case more than one thread may
be run, we currently invalidate the scop. To support such cases correctly,
we would need to always load and store back from a corresponding global
memory slot instead of a thread-local alloca slot.

llvm-svn: 281838
2016-09-17 19:22:31 +00:00
Tobias Grosser fe74a7a1f5 GPGPU: Detect read-only scalar arrays ...
and pass these by value rather than by reference.

llvm-svn: 281837
2016-09-17 19:22:18 +00:00
Tobias Grosser aaabbbf886 GPGPU: Do not assume arrays start at 0
Our alias checks precisely check that the minimal and maximal accessed elements
do not overlap in a kernel. Hence, we must ensure that our host <-> device
transfers do not touch additional memory locations that are not covered in
the alias check. To ensure this, we make sure that the data we copy for a
given array is only the data from the smallest element accessed to the largest
element accessed.

We also adjust the size of the array according to the offset at which the array
is actually accessed.

An interesting result of this is: In case array are accessed with negative
subscripts ,e.g., A[-100], we automatically allocate and transfer _more_ data to
cover the full array. This is important as such code indeed exists in the wild.

llvm-svn: 281611
2016-09-15 14:05:58 +00:00
Tobias Grosser 0a893f7df4 GPGPU: Use const_cast to avoid compiler warning [NFC]
llvm-svn: 281333
2016-09-13 13:22:27 +00:00
Tobias Grosser a82c4b5df8 GPGPU: Allow region statements
llvm-svn: 281305
2016-09-13 08:42:10 +00:00
Tobias Grosser b79f4d3970 GPGPU: Extend types when array sizes have smaller types
This prevents a compiler crash.

llvm-svn: 281303
2016-09-13 08:02:14 +00:00
Roman Gareev f5aff70405 Store the size of the outermost dimension in case of newly created arrays that require memory allocation.
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D23991

llvm-svn: 281234
2016-09-12 17:08:31 +00:00
Tobias Grosser 5857b701a3 GPGPU: Bail out gracefully in case of invalid IR
Instead of aborting, we now bail out gracefully in case the kernel IR we
generate is invalid. This can currently happen in case the SCoP stores
pointer values, which we model as arrays, as data values into other arrays. In
this case, the original pointer value is not available on the device and can
consequently not be stored. As detecting this ahead of time is not so easy, we
detect these situations after the invalid IR has been generated and bail out.

llvm-svn: 281193
2016-09-12 06:06:31 +00:00
Tobias Grosser 02293ed755 GPGPU: Do not fail in case of arrays never accessed
If these arrays have never been accessed we failed to derive an upper bound
of the accesses and consequently a size for the outermost dimension. We
now explicitly check for empty access sets and then just use zero as size
for the outermost dimension.

llvm-svn: 281165
2016-09-11 13:30:12 +00:00
Tobias Grosser d58acf866a [GPGPU] Ensure arrays where only parts are modified are copied to GPU
To do so we change the way array exents are computed. Instead of the precise
set of memory locations accessed, we now compute the extent as the range between
minimal and maximal address in the first dimension and the full extent defined
by the sizes of the inner array dimensions.

We also move the computation of the may_persist region after the construction
of the arrays, as it relies on array information. Without arrays being
constructed no useful information is computed at all.

llvm-svn: 278212
2016-08-10 10:58:19 +00:00
Tobias Grosser b06ff4574e [GPGPU] Support PHI nodes used in GPU kernel
Ensure the right scalar allocations are used as the host location of data
transfers. For the device code, we clear the allocation cache before device
code generation to be able to generate new device-specific allocation and
we need to make sure to add back the old host allocations as soon as the
device code generation is finished.

llvm-svn: 278126
2016-08-09 15:35:06 +00:00
Tobias Grosser 750160e260 [GPGPU] Use separate basic block for GPU initialization code
This increases the readability of the IR and also clarifies that the GPU
inititialization is executed _after_ the scalar initialization which needs
to before the code of the transformed scop is executed.

Besides increased readability, the IR should not change. Specifically, I
do not expect any changes in program semantics due to this patch.

llvm-svn: 278125
2016-08-09 15:35:03 +00:00
Tobias Grosser cf66ef26f3 [GPGPU] Pass parameters always by using their own type
llvm-svn: 278100
2016-08-09 07:22:08 +00:00
Tobias Grosser 124534038a [GPGPU] Support Values referenced from both isl expr and llvm instructions
When adding code that avoids to pass values used in isl expressions and
LLVM instructions twice, we forgot to make single variable passed to the
kernel available in the ValueMap that makes it usable for instructions that
are not replaced with isl ast expressions. This change adds the variable
that is passed to the kernel to the ValueMap to ensure it is available
for such use cases as well.

llvm-svn: 278039
2016-08-08 19:22:19 +00:00
Tobias Grosser cb1aef8de4 [GPGPU] Create code to verify run-time conditions
llvm-svn: 278026
2016-08-08 17:35:55 +00:00
Tobias Grosser 928d7573dd GPGPU: Sort dimension sizes of multi-dimensional shared memory arrays correctly
Before this commit we generated the array type in reverse order and we also
added the outermost dimension size to the new array declaration, which is
incorrect as Polly additionally assumed an additional unsized outermost
dimension, such that we had an off-by-one error in the linearization of access
expressions.

llvm-svn: 277802
2016-08-05 08:27:24 +00:00
Tobias Grosser c1c6a2a61b GPGPU: Add cuda annotations to specify maximal number of threads per block
These annotations ensure that the NVIDIA PTX assembler limits the number of
registers used such that we can be certain the resulting kernel can be executed
for the number of threads in a thread block that we are planning to use.

llvm-svn: 277799
2016-08-05 06:47:43 +00:00
Tobias Grosser f919d8b360 GPGPU: Support scalars that are mapped to shared memory
llvm-svn: 277726
2016-08-04 13:57:29 +00:00
Tobias Grosser 8950cead7f GPGPU: Disable verbose debug output
llvm-svn: 277724
2016-08-04 12:44:03 +00:00
Tobias Grosser b0dd95bcd2 Remove leftover debug output
llvm-svn: 277723
2016-08-04 12:41:28 +00:00
Tobias Grosser 130ca30f92 GPGPU: Add private memory support
llvm-svn: 277722
2016-08-04 12:39:03 +00:00
Tobias Grosser b513b4916b GPGPU: Add support for shared memory
llvm-svn: 277721
2016-08-04 12:18:14 +00:00
Tobias Grosser 00bb5a99f5 GPGPU: Handle scalar array references
Pass the content of scalar array references to the alloca on the kernel side
and do not pass them additional as normal LLVM scalar value.

llvm-svn: 277699
2016-08-04 06:55:59 +00:00
Tobias Grosser 576932728d GPGPU: Pass subtree values correctly to the kernel
llvm-svn: 277697
2016-08-04 06:55:49 +00:00
Tobias Grosser 629109b633 GPGPU: Mark kernel functions as polly.skip
Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
generate kernels that try to offload themselves, which does not work as the
GPURuntime is not available on the accelerator and also does not make any
sense.

llvm-svn: 277589
2016-08-03 12:00:07 +00:00
Roman Gareev d7754a1245 Extend the jscop interface to allow the user to declare new arrays and to reference these arrays from access expressions
Extend the jscop interface to allow the user to export arrays. It is required
that already existing arrays of the list of arrays correspond to arrays
of the SCoP. Each array that is appended to the list will be newly created.
Furthermore, we allow the user to modify access expressions to reference
any array in case it has the same element type.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D22828

llvm-svn: 277263
2016-07-30 09:25:51 +00:00
Tobias Grosser d8b94bcac1 GPGPU: Pass context parameters to GPU kernel
llvm-svn: 276963
2016-07-28 06:47:59 +00:00
Tobias Grosser a490147c90 GPGPU: Pass host iterators to kernel
llvm-svn: 276962
2016-07-28 06:47:56 +00:00
Tobias Grosser 44143bb927 GPGPU: use current 'Index' to find slot in parameter array
Before this change we used the array index, which would result in us accessing
the parameter array out-of-bounds. This bug was visible for test cases where not
all arrays in a scop are passed to a given kernel.

llvm-svn: 276961
2016-07-28 06:47:53 +00:00
Tobias Grosser 4e18d71c71 GPGPU: Generate kernel parameter allocation with right size
Before this change we miscounted the number of function parameters.

llvm-svn: 276960
2016-07-28 06:47:50 +00:00
Tobias Grosser 79a947c233 GPGPU: Add basic support for kernel launches
llvm-svn: 276863
2016-07-27 13:20:16 +00:00
Tobias Grosser 5779359624 GPGPU: Load GPU kernels
We embed the PTX code into the host IR as a global variable and compile it
at run-time into a GPU kernel.

llvm-svn: 276645
2016-07-25 16:31:21 +00:00
Tobias Grosser 13c78e4d51 GPGPU: Emit data-transfer code
Also factor out getArraySize() to avoid code dupliciation and reorder some
function arguments to indicate the direction into which data is transferred.

llvm-svn: 276636
2016-07-25 12:47:39 +00:00
Tobias Grosser 7287aeddf1 GPGPU: Complete code to allocate and free device arrays
At the beginning of each SCoP, we allocate device arrays for all arrays
used on the GPU and we free such arrays after the SCoP has been executed.

llvm-svn: 276635
2016-07-25 12:47:33 +00:00
Tobias Grosser fa7b080218 GPGPU: initialize GPU context and simplify the corresponding GPURuntime interface.
There is no need to expose the selected device at the moment. We also pass back
pointers as return values, as this simplifies the interface.

llvm-svn: 276623
2016-07-25 09:16:01 +00:00
Tobias Grosser 8ed5e5999f IslNodeBuilder: Make finalize() virtual
This allows the finalization routine of the IslNodeBuilder to be overwritten
by derived classes. Being here, we also drop the unnecessary 'Scop' postfix
and the unnecessary 'Scop' parameter.

llvm-svn: 276622
2016-07-25 09:15:57 +00:00