======================
Using Polly with Clang
======================

This documentation discusses how Polly can be used in Clang to automatically
optimize C/C++ code during compilation.

.. warning::

  clang, LLVM, and Polly need to be in sync (compiled from the same
  revision).

Make Polly available from Clang
===============================

Polly is available through clang, opt, and bugpoint, if Polly was checked out
into tools/polly before compilation. No further configuration is needed.

Optimizing with Polly
=====================

Optimizing with Polly is as easy as adding -O3 -mllvm -polly to your compiler
flags. Polly is not available unless optimizations are enabled, for example
with -O1, -O2, or -O3; optimizing for size with -Os or -Oz is not recommended.

.. code-block:: console

  clang -O3 -mllvm -polly file.c

Automatic OpenMP code generation
================================

To automatically detect parallel loops and generate OpenMP code for them you
also need to add -mllvm -polly-parallel -lgomp to your CFLAGS.

.. code-block:: console

  clang -O3 -mllvm -polly -mllvm -polly-parallel -lgomp file.c

Switching the OpenMP backend
----------------------------

The following CL switch lets you choose Polly's OpenMP backend:

       -polly-omp-backend[=BACKEND]
              choose the OpenMP backend; BACKEND can be 'GNU' (the default) or 'LLVM';

The OpenMP backends can be further influenced using the following CL switches:

       -polly-num-threads[=NUM]
              set the number of threads to use; NUM may be any positive integer (default: 0, which equals automatic/OMP runtime);

       -polly-scheduling[=SCHED]
              set the OpenMP scheduling type; SCHED can be 'static', 'dynamic', 'guided' or 'runtime' (the default);

       -polly-scheduling-chunksize[=CHUNK]
              set the chunksize (for the selected scheduling type); CHUNK may be any strictly positive integer (otherwise it will default to 1);

Note that at the time of writing, the GNU backend may only use the
`polly-num-threads` and `polly-scheduling` switches, where the latter also has
to be set to "runtime".

Example: Use the alternative (LLVM) backend with dynamic scheduling, four
threads, and a chunksize of one (additional switches).

.. code-block:: console

  -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=4
  -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1

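The switches above are only a fragment of a command line. As a minimal sketch,
they can be appended to the -polly-parallel invocation from the previous
section. The helper name polly_omp_command is made up for this illustration.
The command is compile-only (-c), because this document only names -lgomp as
the library to link for the default setup, and it is printed rather than
executed, so the snippet also runs without a Polly-enabled clang.

.. code-block:: sh

  # Hypothetical helper: print a compile-only command for the source file $1
  # that uses the additional switches from the example above.
  polly_omp_command() {
    flags="-mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=4"
    flags="$flags -mllvm -polly-scheduling=dynamic"
    flags="$flags -mllvm -polly-scheduling-chunksize=1"
    # $flags is intentionally unquoted so that it splits into separate words.
    echo clang -O3 -mllvm -polly -mllvm -polly-parallel $flags -c "$1"
  }

  polly_omp_command file.c

Removing 'echo' runs the command instead of printing it.
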
Automatic Vector code generation
================================

Automatic vector code generation can be enabled by adding -mllvm
-polly-vectorizer=stripmine to your CFLAGS.

.. code-block:: console

  clang -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine file.c

Isolate the Polly passes
========================

Polly's analysis and transformation passes are run together with many other
passes of the pass manager's pipeline.  Some of the passes that run before
Polly are essential for it to work, for instance the canonicalization
of loops.  Therefore Polly is unable to optimize code straight out of
clang's -O0 output.

To get the LLVM-IR that Polly sees in the optimization pipeline, use the
command:

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -polly-dump-before-file=before-polly.ll

This writes a file 'before-polly.ll' containing the LLVM-IR as passed to
Polly, after SSA transformation, loop canonicalization, inlining and
other passes.

Thereafter, any Polly pass can be run over 'before-polly.ll' using the
'opt' tool.  To find out which Polly passes are active in the standard
pipeline, see the output of

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -debug-pass=Arguments

Polly's passes are those between '-polly-detect' and
'-polly-codegen'. Analysis passes can be omitted.  At the time of this
writing, the default Polly pass pipeline is:

.. code-block:: console

  opt before-polly.ll -polly-simplify -polly-optree -polly-delicm -polly-simplify -polly-prune-unprofitable -polly-opt-isl -polly-codegen

Note that this uses LLVM's old/legacy pass manager.

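The dump step and the 'opt' step can be chained in a small script. The
following is a sketch under assumptions: the helper names run and
isolate_polly and the DRY_RUN variable are invented for this illustration, and
'-S -o after-polly.ll' (textual output written to a file) is added to the opt
command shown above. With DRY_RUN=1 the commands are only printed, so the
snippet runs without a Polly-enabled toolchain.

.. code-block:: sh

  # Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
  run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$*"
    else
      "$@"
    fi
  }

  # Hypothetical helper: $1 is the C source file to process.
  isolate_polly() {
    # Step 1: dump the IR that reaches Polly inside the -O3 pipeline.
    run clang "$1" -c -O3 -mllvm -polly \
      -mllvm -polly-dump-before-file=before-polly.ll || return 1
    # Step 2: run the Polly passes listed above on the dumped IR.
    run opt before-polly.ll -polly-simplify -polly-optree -polly-delicm \
      -polly-simplify -polly-prune-unprofitable -polly-opt-isl \
      -polly-codegen -S -o after-polly.ll
  }

  DRY_RUN=1
  isolate_polly file.c

Unsetting DRY_RUN executes both steps and stops after the first one if clang
fails.
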
For completeness, here are some other methods that generate IR
suitable for processing with Polly from C/C++/Objective C source code.
The previous method is the recommended one.

The following generates unoptimized LLVM-IR ('-O0', which is the
default) and runs the canonicalizing passes on it
('-polly-canonicalize'). This does *not* include all the passes that run
before Polly in the default pass pipeline.  The '-disable-O0-optnone'
option is required because otherwise clang adds an 'optnone' attribute
to all functions so that they are skipped by most optimization passes.
This is meant to stop LTO builds from optimizing these functions in the
linking phase anyway.

.. code-block:: console

  clang file.c -c -O0 -Xclang -disable-O0-optnone -emit-llvm -S -o - | opt -polly-canonicalize -S

The option '-disable-llvm-passes' disables all LLVM passes, even those
that run at -O0.  Passing -O1 (or any optimization level other than -O0)
prevents the 'optnone' attribute from being added.

.. code-block:: console

  clang file.c -c -O1 -Xclang -disable-llvm-passes -emit-llvm -S -o - | opt -polly-canonicalize -S

As another alternative, Polly can be pushed in front of the pass
pipeline, and then its output dumped.  This implicitly runs the
'-polly-canonicalize' passes.

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -polly-position=early -mllvm -polly-dump-before-file=before-polly.ll

Further options
===============

Polly supports further options that are mainly useful for the development or the
analysis of Polly. The relevant options can be added to clang by appending
-mllvm -option-name to the CFLAGS or the clang command line.

Limit Polly to a single function
--------------------------------

To limit the execution of Polly to a single function, use the option
-polly-only-func=functionname.

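Following the '-mllvm -option-name' rule from "Further options", a complete
command line might look as sketched below. The helper name
polly_only_func_command, the function name compute_kernel and the file name
file.c are placeholders, and the command is printed rather than executed.

.. code-block:: sh

  # Hypothetical helper: $1 is the function Polly should be limited to,
  # $2 is the source file. Prints the resulting compile-only command.
  polly_only_func_command() {
    echo clang -O3 -mllvm -polly -mllvm "-polly-only-func=$1" -c "$2"
  }

  polly_only_func_command compute_kernel file.c
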
Disable LLVM-IR generation
--------------------------

Polly normally regenerates LLVM-IR from the polyhedral representation. To see
only the effects of the preparing transformations, with Polly's code generation
disabled, add the option -polly-no-codegen.

Graphical view of the SCoPs
---------------------------

Polly can use graphviz to show the SCoPs it detects in a program. The relevant
options are -polly-show, -polly-show-only, -polly-dot and -polly-dot-only. The
'show' options automatically run dotty or another graphviz viewer to show the
SCoPs graphically. The 'dot' options store for each function a dot file that
highlights the detected SCoPs. If 'only' is appended at the end of the option,
the basic blocks are shown without the statements they contain.

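The dot files written by the 'dot' options can be turned into images with
Graphviz's 'dot' program. The sketch below assumes that the dot files end up in
the current working directory, which this document does not state, and
render_dot_files is a made-up helper name.

.. code-block:: sh

  # Hypothetical helper: render every .dot file in the current directory
  # to a PNG image of the same base name using Graphviz.
  render_dot_files() {
    for f in *.dot; do
      # If no file matched, the pattern stays literal; skip it.
      [ -e "$f" ] || continue
      dot -Tpng "$f" -o "${f%.dot}.png"
    done
  }

A typical use would be to compile with 'clang -O3 -mllvm -polly -mllvm
-polly-dot file.c' and then call render_dot_files in the same directory.
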
Change/Disable the Optimizer
----------------------------

By default, Polly uses the isl scheduling optimizer. The isl optimizer optimizes
for data-locality and parallelism using the Pluto algorithm.
To disable the optimizer entirely, use the option -polly-optimizer=none.

Disable tiling in the optimizer
-------------------------------

By default both optimizers perform tiling, if possible. If this is not
wanted, the option -polly-tiling=false can be used to disable it. (This option
disables tiling for both optimizers.)

Import / Export
---------------

The flags -polly-import and -polly-export allow the export and reimport of the
polyhedral representation. By exporting, modifying and reimporting the
polyhedral representation, externally calculated transformations can be
applied. This enables external optimizers or the manual optimization of
specific SCoPs.

Viewing Polly Diagnostics with opt-viewer
-----------------------------------------

The flag -fsave-optimization-record will generate .opt.yaml files when compiling
your program. These yaml files contain information about each emitted remark.
Ensure that you have Python 2.7 with the PyYAML and Pygments Python packages
installed. To run opt-viewer:

.. code-block:: console

   llvm/tools/opt-viewer/opt-viewer.py -source-dir /path/to/program/src/ \
      /path/to/program/src/foo.opt.yaml \
      /path/to/program/src/bar.opt.yaml \
      -o ./output

Include all yaml files (use \*.opt.yaml when specifying which yaml files to view)
to view all diagnostics from your program in opt-viewer. Compile with `PGO
<https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation>`_ to view
hotness information in opt-viewer. The resulting HTML files can be viewed in a
web browser.