When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.
llvm-svn: 311239
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly reduces code
generation performance.
This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36869
llvm-svn: 311161
Summary:
They are not used and consequently do not even need to be computed. This reduces
the overall compile time for our kernel from 1m33s to 17s.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36868
llvm-svn: 311157
Summary:
This change reduces the overall number of synchronize calls for kernels with
a lot of output data at the cost of additional synchronize calls for kernels
launched in sequence without any device to host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.
Even though the above motivation would be motivation enough, this is just
a step towards enabling ppcg to not compute to and from device copy calls
at all, which would be incorrect in case we still relied on these calls to
place our synchronization statements.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36867
llvm-svn: 311155
This avoid the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.
llvm-svn: 311127
This pass is useful to automatically convert a codebase that uses malloc/free
to use their managed memory counterparts.
Currently, rewrite malloc and free to the `polly_{malloc,free}Managed` variants.
A future patch will teach ManagedMemoryRewrite to rewrite global arrays
as pointers to globally allocated managed memory.
Differential Revision: https://reviews.llvm.org/D36513
llvm-svn: 310471
Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.
To fix this, ask data layout what the size in bytes should be.
Differential Revision: https://reviews.llvm.org/D36459
llvm-svn: 310448
To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.
Differential Revision: https://reviews.llvm.org/D36457
llvm-svn: 310350
Summary:
This resolves some "instruction does not dominate use" errors, as we used to
prepare the arrays at the location of the first kernel, which not necessarily
dominated all other kernel calls.
Reviewers: Meinersbur, bollu, singam-sanjay
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Differential Revision: https://reviews.llvm.org/D36372
llvm-svn: 310196
A Scop with a loop outside it is not handled currently by
PPCGCodeGeneration. The test case is such that the Scop has only one inner loop
that is detected. This currently breaks codegen.
The fix is to reuse the existing mechanism in `IslNodeBuilder` within
`GPUNodeBuilder.
Differential Revision: https://reviews.llvm.org/D36290
llvm-svn: 310193
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.
To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.
Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36243
llvm-svn: 309939
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the paramters present in the program.
The construction of the `isl_multi_pw_aff` requires all the indivisual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.
We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.
llvm-svn: 309934
It is possible that the `HostPtr` that coresponds to an array could be
invariant load hoisted. Make sure we use the invariant load hoisted
value by using `IslNodeBuilder::getLatestValue`.
Differential Revision: https://reviews.llvm.org/D36001
llvm-svn: 309681
Summary:
This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp, cos and copysign functions. Other
functions will be enabled as needed.
Reviewers: bollu, singam-sanjay
Reviewed By: bollu
Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35703
llvm-svn: 309560
Invariant load hoisted scalars, and arrays whose size we can statically compute
to be 0 do not need to be allocated as arrays.
Invariant load hoisted scalars are sent to the kernel directly as parameters.
Earlier, we used to allocate `0` bytes of memory for these because our
computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`.
Now, since we don't invariant loads as arrays in PPCGCodeGeneration, this
problem does not occur anymore.
Differential Revision: https://reviews.llvm.org/D35795
llvm-svn: 308971
Summary:
Added SPIR Code Generation to the PPCG Code Generator. This can be invoked using
the polly-gpu-arch flag value 'spir32' or 'spir64' for 32 and 64 bit code respectively.
In addition to that, runtime support has been added to execute said SPIR code on Intel
GPU's, where the system is equipped with Intel's open source driver Beignet (development
version). This requires the cmake flag 'USE_INTEL_OCL' to be turned on, and the polly-gpu-runtime
flag value to be 'libopencl'.
The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex
string transformations.
Has been tested (working) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).
Reviewers: bollu, grosser, Meinersbur, singam-sanjay
Reviewed By: grosser, singam-sanjay
Subscribers: pollydev, nemanjai, mgorny, Anastasia, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35185
llvm-svn: 308751
This commit *WILL COMPILE*.
1. `PPCG` now uses `isl_multi_pw_aff` instead of an array of `pw_aff`.
This needs us to adjust how we index array bounds and how we construct
array bounds.
2. `PPCG` introduces two new kinds of nodes: `init_device` and `clear_device`.
We should investigate what the correct way to handle these are.
3. `PPCG` has gotten smarter with its use of live range reordering, so some of
the tests have a qualitative improvement.
4. `PPCG` changed its output style, so many test cases need to be updated to
fit the new style for `polly-acc-dump-code` checks.
Differential Revision: https://reviews.llvm.org/D35677
llvm-svn: 308625
We extended kills in Polly to handle both `phi` nodes and scalars that
are not used within the Scop. Update the comments and choice of
variable names to reflect this.
llvm-svn: 308279
- We should call `preloadInvariantLoads` to make sure that code is
generated for invariant loads in the kernel.
Differential Revision: https://reviews.llvm.org/D35410
llvm-svn: 308187
- There is a conditional branch that is used to switch between the old
and new versions of the code.
- If we detect that the build was unsuccessful, `PPCGCodeGeneration` will
change the runtime check to be always set to false.
- To actually *reach* this runtime check instruction, `PPCGCodeGeneration`
was using assumptions about the layout of the BBs.
- However, invariant load hoisting violates this assumption by inserting
an extra basic block in the middle.
- Fix the assumption on the layout by having `createScopConditionally`
return the conditional branch instruction.
- Use this reference to set to always-false.
llvm-svn: 308010
We need to relax constraints on invariant loads so that they do not
create fake RAW dependences. So, we do not consider invariant loads as
scalar dependences in a region.
During these changes, it turned out that we do not consider `llvm::Value`
replacements correctly within `PPCGCodeGeneration` and `ISLNodeBuilder`.
The replacements dictated by `ValueMap` were not being followed in all
places. This was fixed in this commit. There is no clean way to decouple
this change because this bug only seems to arise when the relaxed
version of invariant load hoisting was enabled.
Differential Revision: https://reviews.llvm.org/D35120
llvm-svn: 307907
Summary:
Add a sequence number that identifies a ptx_kernel's parent Scop within a function to it's name to differentiate it from other kernels produced from the same function, yet different Scops.
Kernels produced from different Scops can end up having the same name. Consider a function with 2 Scops and each Scop being able to produce just one kernel. Both of these kernels have the name "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985, by differentiating kernels across Scops as well.
Previously (even before D33985) while profiling kernels generated through JIT e.g. Julia, [[ https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ | kernels associated with different functions, and even different SCoPs within a function, would be grouped together due to the common name ]]. This patch prevents this grouping and the kernels are reported separately.
Reviewers: grosser, bollu
Reviewed By: grosser
Subscribers: mehdi_amini, nemanjai, pollydev, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35176
llvm-svn: 307814