llvm-project

Commit Graph

Author	SHA1	Message	Date
Roman Gareev	f5aff70405	Store the size of the outermost dimension in case of newly created arrays that require memory allocation. We do not need the size of the outermost dimension in most cases, but if we allocate memory for newly created arrays, that size is needed. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D23991 llvm-svn: 281234	2016-09-12 17:08:31 +00:00
Tobias Grosser	c80d6979bd	Drop '@brief' from doxygen comments LLVM's coding guideline suggests to not use @brief for one-sentence doxygen comments to improve readability. Switch this once and for all to ensure people do not copy @brief comments from other parts of Polly, when writing new code. llvm-svn: 280468	2016-09-02 06:33:33 +00:00
Roman Gareev	5f99f8656e	Add a flag to dump SCoP optimized with the IslScheduleOptimizer pass Dump polyhedral descriptions of Scops optimized with the isl scheduling optimizer and the set of post-scheduling transformations applied on the schedule tree to be able to check the work of the IslScheduleOptimizer pass at the polyhedral level. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D23740 llvm-svn: 279395	2016-08-21 11:20:39 +00:00
Roman Gareev	1c892e91e3	Perform replacement of access relations and creation of new arrays according to the packing transformation This is the third patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we perform replacement of the access relations and create empty arrays, which are steps to implement the packing transformation. In subsequent changes we will implement copying to created arrays. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: http://reviews.llvm.org/D22187 llvm-svn: 278666	2016-08-15 12:22:54 +00:00
Tobias Grosser	2219d15748	Fix a couple of spelling mistakes llvm-svn: 277569	2016-08-03 05:28:09 +00:00
Roman Gareev	3a18a931a8	Apply all necessary tilings and interchangings to get a macro-kernel This is the second patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we create the BLIS macro-kernel by applying a combination of tiling and interchanging. In subsequent changes we will implement the packing transformation. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: http://reviews.llvm.org/D21491 llvm-svn: 276627	2016-07-25 09:42:53 +00:00
Roman Gareev	2cb4d133f5	[NFC] Refactor creation of the BLIS micro-kernel and improve documentation Reviewed-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 276616	2016-07-25 07:27:59 +00:00
Tobias Grosser	3898a0468c	Propagate on-error status This ensures that the error status set with -polly-on-isl-error-abort is maintained even after running DependenceInfo and ScheduleOptimizer. Both passes temporarily set the error status to CONTINUE as the dependence analysis uses a compute-out and the scheduler may not be able to derive a schedule. In both cases we do not want to abort, but rather to handle the error gracefully. Before this commit, we always set the error reporting to ABORT after these passes. After this commit, we use the error reporting mode that was active earlier. This comes without a test case as this would require us to introduce (memory) errors which would trigger the isl errors. llvm-svn: 274272	2016-06-30 20:42:58 +00:00
Tobias Grosser	af14993016	Simplify: get isl_ctx only once [NFC] ... instead of calling S.getIslCtx() many times. llvm-svn: 274271	2016-06-30 20:42:56 +00:00
Tobias Grosser	522478d2c0	clang-tidy: Add llvm namespace comments llvm commonly adds a comment to the closing brace of a namespace to indicate which namespace is closed. clang-tidy provides a handy check for this habit, llvm-namespace-comment. We use it to ensure we consistently use namespace comments in Polly. There are slightly different styles in how namespaces are closed in LLVM. As there is no large difference between the different comment styles, we go for the style clang-tidy suggests by default. To reproduce this fix run: for i in `ls tools/polly/lib/*/*.cpp`; do clang-tidy -checks='-*,llvm-namespace-comment' -p build $i -fix -header-filter=".*"; done This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in http://reviews.llvm.org/D21488 and was split out to increase readability. llvm-svn: 273621	2016-06-23 22:17:27 +00:00
Tobias Grosser	8dd653d983	clang-tidy: apply modernize-use-nullptr fixes Instead of using 0 or NULL, use the C++11 nullptr keyword when referencing null pointers. This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in http://reviews.llvm.org/D21488 and was split out to increase readability. llvm-svn: 273435	2016-06-22 16:22:00 +00:00
Roman Gareev	397a34a08d	[NFC] Use isl_schedule_node_band_n_member to get the number of dimensions of a band node. llvm-svn: 273400	2016-06-22 12:11:30 +00:00
Roman Gareev	42402c9e89	Apply all necessary tilings and unrollings to get a micro-kernel This is the first patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we create the BLIS micro-kernel by applying a combination of tiling and unrolling. In subsequent changes we will add the extraction of the BLIS macro-kernel and implement the packing transformation. Contributed-by: Roman Gareev <gareevroman@gmail.com> Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: http://reviews.llvm.org/D21140 llvm-svn: 273397	2016-06-22 09:52:37 +00:00
Roman Gareev	b17b9a8324	[NFC] Outline the application of register tiling. llvm-svn: 272515	2016-06-12 17:20:05 +00:00
Roman Gareev	827264de98	[NFC] "#include <ciso646>" is unnecessary, because "and", "or" were replaced by "&&", "||". llvm-svn: 272168	2016-06-08 16:44:11 +00:00
Roman Gareev	ba0fb97c0a	[NFC] Check that a parameter of ScheduleTreeOptimizer::isMatrMultPattern contains a correct partial schedule llvm-svn: 271780	2016-06-04 06:34:04 +00:00
Roman Gareev	4b8c7aeb62	[FIX] Fix potential issue related to subtraction from an unsigned 0 in circularShiftOutputDims Reported-by: Mehdi Amini <mehdi.amini@apple.com> Contributed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: http://reviews.llvm.org/D20969 llvm-svn: 271705	2016-06-03 18:46:29 +00:00
Roman Gareev	76614d3ed9	[GSoC 2016] [Polly] [FIX] Determination of statements that contain matrix multiplication Fix small issues related to characters, operators and descriptions of tests. Differential Revision: http://reviews.llvm.org/D20806 llvm-svn: 271264	2016-05-31 11:22:21 +00:00
Johannes Doerfert	99191c78c2	Decouple SCoP building logic from pass Created a new pass ScopInfoRegionPass. As name suggests, it is a region pass and it is there to preserve compatibility with our existing Polly passes. ScopInfoRegionPass will return a SCoP object for a valid region while the creation of the SCoP stays in the ScopInfo class. Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in> Reviewed-by: Tobias Grosser <tobias@grosser.es>, Johannes Doerfert <doerfert@cs.uni-saarland.de> Differential Revision: http://reviews.llvm.org/D20770 llvm-svn: 271259	2016-05-31 09:41:04 +00:00
Michael Kruse	7410a27820	MSVC compile fix: #include <ciso646>. NFC. This header is required to make the ISO 646 alternative operator spellings ("and", "or" instead of "&&", "||") work. Should these operators be replaced by the standard ones as already suggested by Johannes, also remove this #include again. llvm-svn: 271206	2016-05-30 14:27:14 +00:00
Roman Gareev	9c3eb5960a	Determination of statements that contain matrix multiplication Add determination of statements that contain, in particular, matrix multiplications and can be optimized with [1] to try to get close-to-peak performance. It can be enabled via polly-pm-based-opts, which is false by default. Refs: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf Contributed-by: Roman Gareev <gareevroman@gmail.com> Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: http://reviews.llvm.org/D20575 llvm-svn: 271128	2016-05-28 16:17:58 +00:00
Michael Kruse	315aa3278e	[ScheduleOptimizer] Add -polly-opt-outer-coincidence option. Add a command line switch to set the isl_options_set_schedule_outer_coincidence option. ISL then tries to build schedules where the outer member of a band satisfies the coincidence constraints. In practice this allows loop skewing for more parallelism in inner loops. llvm-svn: 268222	2016-05-02 11:35:27 +00:00
Hongbin Zheng	2a798853f8	Allow the client of DependenceInfo to obtain dependences at different granularities. llvm-svn: 262591	2016-03-03 08:15:33 +00:00
Roman Gareev	11001e1534	Annotation of SIMD loops Use 'mark' nodes to annotate a SIMD loop during ScheduleTransformation and skip parallelism checks. The buildbot shows the following compile/execution time changes (Δ; previous → current; σ): Compile time improvements: …/gesummv -6.06% (0.2640 → 0.2480; σ 0.0055); …/gemver -4.46% (0.4480 → 0.4280; σ 0.0044); …/covariance -4.31% (0.8360 → 0.8000; σ 0.0065); …/adi -3.23% (0.9920 → 0.9600; σ 0.0065); …/doitgen -2.53% (0.9480 → 0.9240; σ 0.0090); …/3mm -2.33% (1.0320 → 1.0080; σ 0.0087). Execution time regressions: …/viterbi 1.70% (5.1840 → 5.2720; σ 0.0074); …/smallpt 1.06% (12.4920 → 12.6240; σ 0.0040). Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: http://reviews.llvm.org/D14491 llvm-svn: 261620	2016-02-23 09:00:13 +00:00
Tobias Grosser	ca7f5bb767	Full/partial tile separation for vectorization We isolate full tiles from partial tiles to be able to, for example, vectorize loops with parametric lower and/or upper bounds. If we use -polly-vectorizer=stripmine, we can see execution-time improvements: correlation from 1.7361s to 0.5720s (-67.05%), covariance from 1.5561s to 0.5680s (-63.50%), ary3 from 2.3201s to 1.2361s (-46.72%), CrystalMk from 8.5565s to 7.4285s (-13.18%). The current full/partial tile separation increases compile time more than necessary. As a result, we see compile-time regressions, for example, for 3mm from 0.6320s to 0.9881s (+56.34%). Some of this compile-time increase is expected, as we generate more IR and consequently more time is spent in the LLVM backends. However, a first investigation has shown that a larger portion of compile time is unnecessarily spent inside Polly's parallelism detection and could be eliminated by propagating existing knowledge about vector loop parallelism. Before enabling -polly-vectorizer=stripmine by default, it is necessary to address this compile-time issue. Contributed-by: Roman Gareev <gareevroman@gmail.com> Reviewers: jdoerfert, grosser Subscribers: grosser, #polly Differential Revision: http://reviews.llvm.org/D13779 llvm-svn: 250809	2015-10-20 09:12:21 +00:00
Johannes Doerfert	45be64464b	[NFC] Consistently use commented and annotated ScopPass functions The changes affect methods that are part of the Pass interface and include: - Comments that describe each method's purpose. - A consistent use of the keywords override and virtual. Additionally, the printScop method is now optional and removed from SCoP passes that do not implement it. llvm-svn: 248685	2015-09-27 15:43:29 +00:00
Johannes Doerfert	0f37630849	[NFC] Use releaseMemory to release internal memory llvm-svn: 248684	2015-09-27 15:42:28 +00:00
Tobias Grosser	fa57e9b7e6	Make our data-locality schedule tree transforms externally accessible Other passes which perform different optimizations might be interested in also applying data-locality transformations as part of their overall transformation. llvm-svn: 245824	2015-08-24 06:01:47 +00:00
Tobias Grosser	1ac884d73a	Use marker nodes to annotate the different levels of tiling Currently, marker nodes are ignored during AST generation, but visible in the -debug-only=polly-ast output. llvm-svn: 245809	2015-08-23 09:11:00 +00:00
Tobias Grosser	fc490a99f5	Do really not unroll the vector loop in combination with register tiling The previous commit lacked a test case for register tiling + pre-vectorization and we obviously got it immediately wrong. llvm-svn: 245599	2015-08-20 19:08:16 +00:00
Tobias Grosser	42e2489553	Add experimental support for trivial register tiling Register tiling in Polly is for now just an additional level of tiling which is fully unrolled. It is disabled by default. To make this useful for more than experiments, we still need a cost function as well as possibly further optimizations that teach LLVM to actually put some of the values we got into scalar registers. llvm-svn: 245564	2015-08-20 13:45:05 +00:00
Tobias Grosser	0483271662	Add support for two-level tiling By default we only use one level of tiling for loops, but in general tiling for multiple levels is trivial for us. Hence, we add a set of options that allow people to play with a second level of tiling. If this is profitable for some cases we can work on heuristics that allow us to identify these cases and use two-level tiling for them. llvm-svn: 245563	2015-08-20 13:45:02 +00:00
Tobias Grosser	862b9b5239	Factor out check for tileable band node. llvm-svn: 245559	2015-08-20 12:32:45 +00:00
Tobias Grosser	9bdea573bd	Introduce tileBand function to simplify code llvm-svn: 245558	2015-08-20 12:22:37 +00:00
Tobias Grosser	d891b54132	Add some forgotten isl memory annotations llvm-svn: 245557	2015-08-20 12:16:23 +00:00
Tobias Grosser	07c1c2fcc9	Make prevectorization width configurable Polly uses 'prevectorization' to enable outer loop vectorization. When vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop iterations, which are then interchanged to the innermost level such that LLVM's inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this loop. The number of loop iterations to strip-mine is now configurable with the option -polly-prevect-width=<number-of-prevec-dims>. This is mostly a debugging option. We should probably add a heuristic that derives the number of prevectorization dimensions from the target data and the data types used. llvm-svn: 245424	2015-08-19 08:46:11 +00:00
Tobias Grosser	161c9081e5	Do not use negative option name Instead of -polly-no-tiling, we use -polly-tiling=false to disable tiling. llvm-svn: 245423	2015-08-19 08:22:06 +00:00
Tobias Grosser	f10f4636ff	Simplify tiling code a bit We only need to allocate the tile size vector if we actually want to perform a tiling. llvm-svn: 245422	2015-08-19 08:03:37 +00:00
Tobias Grosser	234a48270e	AST Generation Paper published in TOPLAS The July issue of TOPLAS contains a 50-page discussion of the AST generation techniques used in Polly. This discussion not only gives an in-depth description of how we (re)generate an imperative AST from our polyhedral-based mathematical program description, but also gives interesting insights about: - Schedule trees: A tree-based mathematical program description that enables us to perform loop transformations on an abstract level, while issues like the generation of the correct loop structure and loop bounds will be taken care of by our AST generator. - Polyhedral unrolling: We discuss techniques that allow the unrolling of non-trivial loops in the context of parametric loop bounds, complex tile shapes and conditionally executed statements. Such unrolling support enables the generation of predicated code, e.g., in the context of GPGPU computing. - Isolation for full/partial tile separation: We discuss native support for handling full/partial tile separation and, in general, native support for isolating boundary cases to enable smooth code generation for core computations. - AST generation with modulo constraints: We discuss how modulo mappings are lowered to efficient C/LLVM code. - User-defined constraint sets for run-time checks: We discuss how arbitrary sets of constraints can be used to automatically create run-time checks that ensure a set of constraints actually holds. This feature is very useful to verify at run time various assumptions that have been taken during program optimization. Polyhedral AST generation is more than scanning polyhedra. Tobias Grosser, Sven Verdoolaege, Albert Cohen. ACM Transactions on Programming Languages and Systems (TOPLAS), 37(4), July 2015. llvm-svn: 245157	2015-08-15 09:34:33 +00:00
Tobias Grosser	b241d928bd	Rewrite getPrevectorMap using schedule trees operations Schedule trees are a lot easier to work with, for both humans and machines. For humans, the more structured schedule representation is easier to reason about. Together with the more abstract isl programming interface, this can result in much cleaner code (see this changeset). For machines, the structured schedule and the fact that we now use explicit piecewise affine expressions instead of integer maps make it easier to generate code from this schedule tree. As a result, we can already see a slight compile-time improvement: for 3mm, from 0m0.593s to 0m0.551s (-7%). More importantly, future optimizations such as full-partial tile separation will most likely result in more streamlined code being generated. Contributed-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 243458	2015-07-28 18:03:36 +00:00
Tobias Grosser	2764794ba4	Simplify some isl expression we use Suggested-by: Sven Verdoolaege <skimo-polly@kotnet.org> llvm-svn: 243254	2015-07-26 19:22:35 +00:00
Tobias Grosser	3b10c94062	Prevectorize the schedule of the band (or the point loop in case of tiling) Contributed-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 243214	2015-07-25 12:28:56 +00:00
Tobias Grosser	808cd69a92	Use schedule trees to represent execution order of statements Instead of flat schedules, we now use so-called schedule trees to represent the execution order of the statements in a SCoP. Schedule trees make it a lot easier to analyze, understand and modify properties of a schedule, as specific nodes in the tree can be chosen and possibly replaced. This patch does not yet fully move our DependenceInfo pass to schedule trees, as some additional performance analysis is needed here. (In general, schedule trees should be faster in compile time, as the more structured representation is generally easier to analyze and work with.) We also cannot yet perform the reduction analysis on schedule trees. For more information regarding schedule trees, please see Section 6 of https://lirias.kuleuven.be/handle/123456789/497238 llvm-svn: 242130	2015-07-14 09:33:13 +00:00
Michael Kruse	c59f22c556	Update ISL to isl-0.15-3-g532568a This version adds a small integer optimization, which is not active by default. It will be enabled in a later commit. The schedule-fuse=min/max option has been replaced by the serialize-sccs option. Polly had to be adapted accordingly, but keeps the option name polly-opt-fusion=min/max. Differential Revision: http://reviews.llvm.org/D10505 Reviewers: grosser llvm-svn: 240027	2015-06-18 16:45:40 +00:00
Tobias Grosser	97d8745087	Dump YAML schedule tree as properly indented tree in DEBUG output llvm-svn: 238645	2015-05-30 06:46:59 +00:00
Tobias Grosser	b2f399264d	Update isl to 93b8e43d This update brings mostly interface cleanups, but also fixes two bugs in imath (a memory leak, some undefined behavior). llvm-svn: 238422	2015-05-28 13:32:11 +00:00
Tobias Grosser	7c3bad52dd	Use value semantics for list of ScopStmt(s) instead of std::owningptr David Blaikie suggested this as an alternative to the use of owningptr(s) for our memory management, as value semantics allow us to avoid the additional interface complexity caused by owningptr while still providing similar memory consistency guarantees. We could also have used a std::vector, but the use of std::vector would yield possibly changing pointers, which currently causes problems as, for example, the memory accesses carry pointers to their parent statements. Such pointers should not change. Reviewer: jblaikie, jdoerfert Differential Revision: http://reviews.llvm.org/D10041 llvm-svn: 238290	2015-05-27 05:16:57 +00:00
Tobias Grosser	679dfafd33	Use unique_ptr to clarify ownership of ScopStmt llvm-svn: 238090	2015-05-23 05:14:09 +00:00
Tobias Grosser	1b6ea573f2	Replace low-level constraint building with higher level functions Instead of explicitly building constraints and adding them to our maps we now use functions like map_order_le to add the relevant information to the maps. llvm-svn: 237934	2015-05-21 19:02:44 +00:00
Tobias Grosser	cd524dc51d	Add explicit #includes for used isl features llvm-svn: 236931	2015-05-09 09:36:38 +00:00
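The BLIS commits above (42402c9e89, 3a18a931a8, 1c892e91e3) describe gemm as three nested loops around a macro-kernel plus two packing routines, a macro-kernel made of two loops around a micro-kernel, and a micro-kernel that loops over rank-1 (outer product) updates. The following is a minimal hand-written C++ sketch of that loop structure, not Polly's schedule-tree transformation. It assumes row-major `double` matrices; the block sizes `NC`, `KC`, `MC`, `NR`, `MR` and the names `gemmBlis`, `microKernel`, and `gemmNaive` are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block sizes; BLIS derives real values from cache and register sizes.
constexpr std::size_t NC = 64, KC = 32, MC = 16;  // cache blocking
constexpr std::size_t NR = 4, MR = 4;             // register blocking (micro-tile)
static_assert(NC % NR == 0 && MC % MR == 0, "micro-panels must tile the blocks");

// Micro-kernel: a loop over p around a rank-1 (outer product) update of an
// MR x NR accumulator; only the valid mr x nr corner is written back to C.
static void microKernel(std::size_t kc, std::size_t mr, std::size_t nr,
                        const double *Ap, const double *Bp, double *C,
                        std::size_t ldc) {
  assert(mr <= MR && nr <= NR);
  double acc[MR][NR] = {};
  for (std::size_t p = 0; p < kc; ++p)
    for (std::size_t i = 0; i < MR; ++i)
      for (std::size_t j = 0; j < NR; ++j)
        acc[i][j] += Ap[p * MR + i] * Bp[p * NR + j];
  for (std::size_t i = 0; i < mr; ++i)
    for (std::size_t j = 0; j < nr; ++j)
      C[i * ldc + j] += acc[i][j];
}

// C (m x n) += A (m x k) * B (k x n), all row-major.
void gemmBlis(std::size_t m, std::size_t n, std::size_t k, const double *A,
              const double *B, double *C) {
  std::vector<double> Bp(KC * NC), Ap(MC * KC);  // packing buffers
  for (std::size_t jc = 0; jc < n; jc += NC) {     // loop 1 around macro-kernel
    const std::size_t nc = std::min(NC, n - jc);
    for (std::size_t pc = 0; pc < k; pc += KC) {   // loop 2 around macro-kernel
      const std::size_t kc = std::min(KC, k - pc);
      // Packing routine for B: kc x nc block into NR-wide, zero-padded micro-panels.
      for (std::size_t jr = 0; jr < nc; jr += NR)
        for (std::size_t p = 0; p < kc; ++p)
          for (std::size_t j = 0; j < NR; ++j)
            Bp[jr * kc + p * NR + j] =
                jr + j < nc ? B[(pc + p) * n + jc + jr + j] : 0.0;
      for (std::size_t ic = 0; ic < m; ic += MC) { // loop 3 around macro-kernel
        const std::size_t mc = std::min(MC, m - ic);
        // Packing routine for A: mc x kc block into MR-tall, zero-padded micro-panels.
        for (std::size_t ir = 0; ir < mc; ir += MR)
          for (std::size_t p = 0; p < kc; ++p)
            for (std::size_t i = 0; i < MR; ++i)
              Ap[ir * kc + p * MR + i] =
                  ir + i < mc ? A[(ic + ir + i) * k + pc + p] : 0.0;
        // Macro-kernel: two loops around the micro-kernel.
        for (std::size_t jr = 0; jr < nc; jr += NR)
          for (std::size_t ir = 0; ir < mc; ir += MR)
            microKernel(kc, std::min(MR, mc - ir), std::min(NR, nc - jr),
                        &Ap[ir * kc], &Bp[jr * kc],
                        &C[(ic + ir) * n + jc + jr], n);
      }
    }
  }
}

// Straightforward triple loop, used only as a reference for checking.
void gemmNaive(std::size_t m, std::size_t n, std::size_t k, const double *A,
               const double *B, double *C) {
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t p = 0; p < k; ++p)
        C[i * n + j] += A[i * k + p] * B[p * n + j];
}
```

The zero padding in the packed buffers lets the micro-kernel always run a full `MR` x `NR` update and write back only the valid part of a boundary tile. In BLIS the block sizes are derived analytically from the target's cache and register parameters (see the TOMS paper linked in those commits) rather than fixed as in this sketch.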


74 Commits
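Commit 07c1c2fcc9 describes prevectorization as strip-mining a loop by the prevectorization width and interchanging the resulting point loop to the innermost level, and commit ca7f5bb767 separates full tiles from partial tiles so that the vectorizable loop gets a constant trip count. A hand-written C++ sketch of the combined effect on a row-sum loop nest might look like this. It assumes a fixed width `W` of 4 (hypothetical; in Polly the width is set with `-polly-prevect-width`), and the function names are illustrative only, not Polly code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical prevectorization width (Polly's is configurable via -polly-prevect-width).
constexpr std::size_t W = 4;

// Original nest: row sums. The outer i loop is parallel; the inner j loop is a
// reduction, so vectorizing across i requires moving the i loop innermost.
void rowSumsOriginal(std::size_t n, std::size_t m, const float *in, float *out) {
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < m; ++j)
      out[i] += in[i * m + j];
}

// Prevectorized nest: strip-mine i by W, interchange the point loop to the
// innermost level, and separate the full tiles from the final partial tile.
void rowSumsPrevectorized(std::size_t n, std::size_t m, const float *in,
                          float *out) {
  const std::size_t full = n - n % W;  // end of the last full tile
  assert(full % W == 0 && n - full < W);
  // Full tiles: the innermost loop has the constant trip count W.
  for (std::size_t ii = 0; ii < full; ii += W)
    for (std::size_t j = 0; j < m; ++j)
      for (std::size_t i = ii; i < ii + W; ++i)
        out[i] += in[i * m + j];
  // Partial tile: the remaining n % W rows keep the original loop order.
  for (std::size_t i = full; i < n; ++i)
    for (std::size_t j = 0; j < m; ++j)
      out[i] += in[i * m + j];
}
```

Because each `out[i]` still accumulates its `j` terms in the original order, both versions produce bit-identical results; only the full-tile nest has the fixed-width innermost loop that an inner-loop vectorizer can target, while the partial tile keeps the parametric bound.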