Commit Graph

74 Commits

Author SHA1 Message Date
Roman Gareev f5aff70405 Store the size of the outermost dimension in case of newly created arrays that require memory allocation.
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D23991

llvm-svn: 281234
2016-09-12 17:08:31 +00:00
Tobias Grosser c80d6979bd Drop '@brief' from doxygen comments
LLVM's coding guideline suggests to not use @brief for one-sentence doxygen
comments to improve readability. Switch this once and for all to ensure people
do not copy @brief comments from other parts of Polly, when writing new code.

llvm-svn: 280468
2016-09-02 06:33:33 +00:00
Roman Gareev 5f99f8656e Add a flag to dump SCoP optimized with the IslScheduleOptimizer pass
Dump polyhedral descriptions of Scops optimized with the isl scheduling
optimizer and the set of post-scheduling transformations applied
on the schedule tree to be able to check the work of the IslScheduleOptimizer
pass at the polyhedral level.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23740

llvm-svn: 279395
2016-08-21 11:20:39 +00:00
Roman Gareev 1c892e91e3 Perform replacement of access relations and creation of new arrays according to the packing transformation
This is the third patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform replacement of
the access relations and create empty arrays, which are steps to implement
the packing transformation. In subsequent changes we will implement copying
to created arrays.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D22187

llvm-svn: 278666
2016-08-15 12:22:54 +00:00
Tobias Grosser 2219d15748 Fix a couple of spelling mistakes
llvm-svn: 277569
2016-08-03 05:28:09 +00:00
Roman Gareev 3a18a931a8 Apply all necessary tilings and interchangings to get a macro-kernel
This is the second patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus
two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update. In this change
we create the BLIS macro-kernel by applying a combination of tiling
and interchanging. In subsequent changes we will implement the packing
transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D21491

llvm-svn: 276627
2016-07-25 09:42:53 +00:00
Roman Gareev 2cb4d133f5 [NFC] Refactor creation of the BLIS mirco-kernel and improve documentation
Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 276616
2016-07-25 07:27:59 +00:00
Tobias Grosser 3898a0468c Propagate on-error status
This ensures that the error status set with -polly-on-isl-error-abort is
maintained even after running DependenceInfo and ScheduleOptimizer. Both
passes temporarily set the error status to CONTINUE as the dependence
analysis uses a compute-out and the scheduler may not be able to derive
a schedule. In both cases we want to not abort, but to handle the error
gracefully. Before this commit, we always set the error reporting to ABORT
after these passes. After this commit, we use the error reporting mode that was
active earlier.

This comes without a test case as this would require us to introduce (memory)
errors which would trigger the isl errors.

llvm-svn: 274272
2016-06-30 20:42:58 +00:00
Tobias Grosser af14993016 Simplify: get isl_ctx only once [NFC]
... instead of call S.getIslCtx() many times.

llvm-svn: 274271
2016-06-30 20:42:56 +00:00
Tobias Grosser 522478d2c0 clang-tidy: Add llvm namespace comments
llvm commonly adds a comment to the closing brace of a namespace to indicate
which namespace is closed. clang-tidy provides with llvm-namespace-comment
a handy tool to check for this habit. We use it to ensure we consitently use
namespace comments in Polly.

There are slightly different styles in how namespaces are closed in LLVM. As
there is no large difference between the different comment styles we go for the
style clang-tidy suggests by default.

To reproduce this fix run:

for i in `ls tools/polly/lib/*/*.cpp`; \
  clang-tidy -checks='-*,llvm-namespace-comment' -p build $i -fix \
  -header-filter=".*"; \
done

This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.

llvm-svn: 273621
2016-06-23 22:17:27 +00:00
Tobias Grosser 8dd653d983 clang-tidy: apply modern-use-nullptr fixes
Instead of using 0 or NULL use the C++11 nullptr symbol when referencing null
pointers.

This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.

llvm-svn: 273435
2016-06-22 16:22:00 +00:00
Roman Gareev 397a34a08d [NFC] Use isl_schedule_node_band_n_member to get the number of dimensions of a band node.
llvm-svn: 273400
2016-06-22 12:11:30 +00:00
Roman Gareev 42402c9e89 Apply all necessary tilings and unrollings to get a micro-kernel
This is the first patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel,
plus two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update.
In this change we create the BLIS micro-kernel by applying
a combination of tiling and unrolling. In subsequent changes
we will add the extraction of the BLIS macro-kernel
and implement the packing transformation.

Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D21140

llvm-svn: 273397
2016-06-22 09:52:37 +00:00
Roman Gareev b17b9a8324 [NFC] Outline the application of register tiling.
llvm-svn: 272515
2016-06-12 17:20:05 +00:00
Roman Gareev 827264de98 [NFC] "#include <ciso646>" is unnecessary, because "and", "or" were replaced
by "&&", "||".

llvm-svn: 272168
2016-06-08 16:44:11 +00:00
Roman Gareev ba0fb97c0a [NFC] Check that a parameter of ScheduleTreeOptimizer::isMatrMultPattern contains a correct partial schedule
llvm-svn: 271780
2016-06-04 06:34:04 +00:00
Roman Gareev 4b8c7aeb62 [FIX] Fix potential issue related to subtraction from an unsigned 0 in circularShiftOutputDims
Reported-by: Mehdi Amini <mehdi.amini@apple.com>
Contributed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: http://reviews.llvm.org/D20969

llvm-svn: 271705
2016-06-03 18:46:29 +00:00
Roman Gareev 76614d3ed9 [GSoC 2016] [Polly] [FIX] Determination of statements that contain matrix
multiplication

Fix small issues related to characters, operators  and descriptions of tests.

Differential Revision: http://reviews.llvm.org/D20806

llvm-svn: 271264
2016-05-31 11:22:21 +00:00
Johannes Doerfert 99191c78c2 Decouple SCoP building logic from pass
Created a new pass ScopInfoRegionPass. As name suggests, it is a
  region pass and it is there to preserve compatibility with our
  existing Polly passes.  ScopInfoRegionPass will return a SCoP object
  for a valid region while the creation of the SCoP stays in the
  ScopInfo class.

  Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in>
  Reviewed-by: Tobias Grosser <tobias@grosser.es>,
               Johannes Doerfert <doerfert@cs.uni-saarland.de>

Differential Revision: http://reviews.llvm.org/D20770

llvm-svn: 271259
2016-05-31 09:41:04 +00:00
Michael Kruse 7410a27820 MSVC compile fix: #include <ciso646>. NFC.
This header is required to make the ISO 646 alternative operator
spellings ("and", "or" instead of "&&", "||") work. Should these
operators be replaced by the standard ones as already suggested by
Johannes, also remove this #include again.

llvm-svn: 271206
2016-05-30 14:27:14 +00:00
Roman Gareev 9c3eb5960a Determination of statements that contain matrix multiplication
Add determination of statements that contain, in particular,
matrix multiplications and can be optimized with [1] to try to
get close-to-peak performance. It can be enabled
via polly-pm-based-opts, which is false by default.

Refs:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D20575

llvm-svn: 271128
2016-05-28 16:17:58 +00:00
Michael Kruse 315aa3278e [ScheduleOptimizer] Add -polly-opt-outer-coincidence option.
Add a command line switch to set the
isl_options_set_schedule_outer_coincidence option. ISL then tries to
build schedules where the outer member of a band satisfies the
coincidence constraints.

In practice this allows loop skewing for more parallelism in inner
loops.

llvm-svn: 268222
2016-05-02 11:35:27 +00:00
Hongbin Zheng 2a798853f8 Allow the client of DependenceInfo to obtain dependences at different granularities.
llvm-svn: 262591
2016-03-03 08:15:33 +00:00
Roman Gareev 11001e1534 Annotation of SIMD loops
Use 'mark' nodes annotate a SIMD loop during ScheduleTransformation and skip
parallelism checks.

The buildbot shows the following compile/execution time changes:

  Compile time:
    Improvements    Δ     Previous  Current  σ
    …/gesummv      -6.06% 0.2640    0.2480   0.0055
    …/gemver       -4.46% 0.4480    0.4280   0.0044
    …/covariance   -4.31% 0.8360    0.8000   0.0065
    …/adi          -3.23% 0.9920    0.9600   0.0065
    …/doitgen      -2.53% 0.9480    0.9240   0.0090
    …/3mm          -2.33% 1.0320    1.0080   0.0087

  Execution time:
    Regressions     Δ     Previous  Current  σ
    …/viterbi       1.70% 5.1840    5.2720   0.0074
    …/smallpt       1.06% 12.4920   12.6240  0.0040

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D14491

llvm-svn: 261620
2016-02-23 09:00:13 +00:00
Tobias Grosser ca7f5bb767 Full/partial tile separation for vectorization
We isolate full tiles from partial tiles to be able to, for example, vectorize
loops with parametric lower and/or upper bounds.

If we use -polly-vectorizer=stripmine, we can see execution-time improvements:
correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to
0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from
8m5565s to 7m4285s (-13.18 %).

The current full/partial tile separation increases compile-time more than
necessary. As a result, we see in compile time regressions, for example, for 3mm
from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected
as we generate more IR and consequently more time is spent in the LLVM backends.
However, a first investiagation has shown that a larger portion of compile time
is unnecessarily spent inside Polly's parallelism detection and could be
eliminated by propagating existing knowledge about vector loop parallelism.
Before enabling -polly-vectorizer=stripmine by default, it is necessary to
address this compile-time issue.

Contributed-by: Roman Gareev <gareevroman@gmail.com>

Reviewers: jdoerfert, grosser

Subscribers: grosser, #polly

Differential Revision: http://reviews.llvm.org/D13779

llvm-svn: 250809
2015-10-20 09:12:21 +00:00
Johannes Doerfert 45be64464b [NFC] Consistenly use commented and annotated ScopPass functions
The changes affect methods that are part of the Pass interface and
  include:
    - Comments that describe the methods purpose.
    - A consistent use of the keywords override and virtual.
  Additionally, the printScop method is now optional and removed from
  SCoP passes that do not implement it.

llvm-svn: 248685
2015-09-27 15:43:29 +00:00
Johannes Doerfert 0f37630849 [NFC] Use releaseMemory to release internal memory
llvm-svn: 248684
2015-09-27 15:42:28 +00:00
Tobias Grosser fa57e9b7e6 Make our data-locality schedule tree transforms externally accessible
Other passes which perform different optimizations might be interested in
also applying data-locality transformations as part of their overall
transformation.

llvm-svn: 245824
2015-08-24 06:01:47 +00:00
Tobias Grosser 1ac884d73a Use marker nodes to annotate the different levels of tiling
Currently, marker nodes are ignored during AST generation, but visible in the
-debug-only=polly-ast output.

llvm-svn: 245809
2015-08-23 09:11:00 +00:00
Tobias Grosser fc490a99f5 Do really not unroll the vector loop in combination with register tiling
The previous commit lacked a test case for register tiling + pre-vectorization
and we obviously got it immediately wrong.

llvm-svn: 245599
2015-08-20 19:08:16 +00:00
Tobias Grosser 42e2489553 Add experimental support for trivial register tiling
Register tiling in Polly is for now just an additional level of tiling which
is fully unrolled. It is disabled by default. To make this useful for more than
experiments, we still need a cost function as well as possibly further
optimizations that teach LLVM to actually put some of the values we got into
scalar registers.

llvm-svn: 245564
2015-08-20 13:45:05 +00:00
Tobias Grosser 0483271662 Add support for two-level tiling
By default we only use one level of tiling for loops, but in general tiling
for multiple levels is trivial for us. Hence, we add a set of options that
allow people to play with a second level of tiling. If this is profitable for
some cases we can work on heuristics that allow us to identify these cases
and use two-level tiling for them.

llvm-svn: 245563
2015-08-20 13:45:02 +00:00
Tobias Grosser 862b9b5239 Factor out check for tileable band node.
llvm-svn: 245559
2015-08-20 12:32:45 +00:00
Tobias Grosser 9bdea573bd Introduce tileBand function to simplify code
llvm-svn: 245558
2015-08-20 12:22:37 +00:00
Tobias Grosser d891b54132 Add some forgotten isl memory annotations
llvm-svn: 245557
2015-08-20 12:16:23 +00:00
Tobias Grosser 07c1c2fcc9 Make prevectorization width configurable
Polly uses 'prevectorization' to enable outer loop vectorization. When
vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop
iterations which are than interchanged to the innermost level such that LLVM's
inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this
loop. The number of loop iterations to strip-mine is now configurable with the
option -polly-prevect-width=<number-of-prevec-dims>.

This is mostly a debugging option. We should probably add a heuristic that
derives the number of prevectorization dimensions from the target data and
the data types used.

llvm-svn: 245424
2015-08-19 08:46:11 +00:00
Tobias Grosser 161c9081e5 Do not use negative option name
Instead of -polly-no-tiling, we use -polly-tiling=false to disable tiling.

llvm-svn: 245423
2015-08-19 08:22:06 +00:00
Tobias Grosser f10f4636ff Simplify tiling code a bit
We only need to allocate the tile size vector if we actually want to perform
a tiling.

llvm-svn: 245422
2015-08-19 08:03:37 +00:00
Tobias Grosser 234a48270e AST Generation Paper published in TOPLAS
The July issue of TOPLAS contains a 50 page discussion of the AST generation
techniques used in Polly. This discussion gives not only an in-depth
description of how we (re)generate an imperative AST from our polyhedral based
mathematical program description, but also gives interesting insights about:

- Schedule trees: A tree-based mathematical program description that enables us
to perform loop transformations on an abstract level, while issues like the
generation of the correct loop structure and loop bounds will be taken care of
by our AST generator.

- Polyhedral unrolling: We discuss techniques that allow the unrolling of
non-trivial loops in the context of parameteric loop bounds, complex tile
shapes and conditionally executed statements. Such unrolling support enables
the generation of predicated code e.g. in the context of GPGPU computing.

- Isolation for full/partial tile separation: We discuss native support for
handling full/partial tile separation and -- in general -- native support for
isolation of boundary cases to enable smooth code generation for core
computations.

- AST generation with modulo constraints: We discuss how modulo mappings are
lowered to efficient C/LLVM code.

- User-defined constraint sets for run-time checks We discuss how arbitrary
sets of constraints can be used to automatically create run-time checks that
ensure a set of constrainst actually hold. This feature is very useful to
verify at run-time various assumptions that have been taken program
optimization.

Polyhedral AST generation is more than scanning polyhedra
Tobias Grosser, Sven Verdoolaege, Albert Cohen
ACM Transations on Programming Languages and Systems (TOPLAS), 37(4), July 2015

llvm-svn: 245157
2015-08-15 09:34:33 +00:00
Tobias Grosser b241d928bd Rewrite getPrevectorMap using schedule trees operations
Schedule trees are a lot easier to work with, for both humans and machines. For
humans the more structured schedule representation is easier to reason about.
Together with the more abstract isl programming interface this can result in a
lot cleaner code (see this changeset). For machines, the structured schedule and
the fact that we now use explicit piecewise affine expressions instead of
integer maps makes it easier to generate code from this schedule tree. As a
result, we can already see a slight compile-time improvement -- for 3mm from
0m0.593s to 0m0.551s seconds (-7 %). More importantly, future optimizations such
as full-partial tile separation will most likely result in more streamlined code
to be generated.

Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243458
2015-07-28 18:03:36 +00:00
Tobias Grosser 2764794ba4 Simplify some isl expression we use
Suggested-by: Sven Verdoolaege <skimo-polly@kotnet.org>
llvm-svn: 243254
2015-07-26 19:22:35 +00:00
Tobias Grosser 3b10c94062 Prevectorize the schedule of the band (or the point loop in case of tiling)
Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243214
2015-07-25 12:28:56 +00:00
Tobias Grosser 808cd69a92 Use schedule trees to represent execution order of statements
Instead of flat schedules, we now use so-called schedule trees to represent the
execution order of the statements in a SCoP. Schedule trees make it a lot easier
to analyze, understand and modify properties of a schedule, as specific nodes
in the tree can be choosen and possibly replaced.

This patch does not yet fully move our DependenceInfo pass to schedule trees,
as some additional performance analysis is needed here. (In general schedule
trees should be faster in compile-time, as the more structured representation
is generally easier to analyze and work with). We also can not yet perform the
reduction analysis on schedule trees.

For more information regarding schedule trees, please see Section 6 of
https://lirias.kuleuven.be/handle/123456789/497238

llvm-svn: 242130
2015-07-14 09:33:13 +00:00
Michael Kruse c59f22c556 Update ISL to isl-0.15-3-g532568a
This version adds small integer optimization, but is not active by
default. It will be enabled in a later commit.
    
The schedule-fuse=min/max option has been replaced by the
serialize-sccs option. Adapting Polly was necessary, but retaining the
name polly-opt-fusion=min/max.

Differential Revision: http://reviews.llvm.org/D10505

Reviewers: grosser
llvm-svn: 240027
2015-06-18 16:45:40 +00:00
Tobias Grosser 97d8745087 Dump YAML schedule tree as properly indented tree in DEBUG output
llvm-svn: 238645
2015-05-30 06:46:59 +00:00
Tobias Grosser b2f399264d Update isl to 93b8e43d
This update brings mostly interface cleanups, but also fixes two bugs in
imath (a memory leak, some undefined behavior).

llvm-svn: 238422
2015-05-28 13:32:11 +00:00
Tobias Grosser 7c3bad52dd Use value semantics for list of ScopStmt(s) instead of std::owningptr
David Blaike suggested this as an alternative to the use of owningptr(s) for our
memory management, as value semantics allow to avoid the additional interface
complexity caused by owningptr while still providing similar memory consistency
guarantees. We could also have used a std::vector, but the use of std::vector
would yield possibly changing pointers which currently causes problems as for
example the memory accesses carry pointers to their parent statements. Such
pointers should not change.

Reviewer: jblaikie, jdoerfert

Differential Revision: http://reviews.llvm.org/D10041

llvm-svn: 238290
2015-05-27 05:16:57 +00:00
Tobias Grosser 679dfafd33 Use unique_ptr to clarify ownership of ScopStmt
llvm-svn: 238090
2015-05-23 05:14:09 +00:00
Tobias Grosser 1b6ea573f2 Replace low-level constraint building with higher level functions
Instead of explicitly building constraints and adding them to our maps we
now use functions like map_order_le to add the relevant information to the
maps.

llvm-svn: 237934
2015-05-21 19:02:44 +00:00
Tobias Grosser cd524dc51d Add explicit #includes for used isl features
llvm-svn: 236931
2015-05-09 09:36:38 +00:00