forked from OSchip/llvm-project
				
			
		
			
				
	
	
		
			455 lines
		
	
	
		
			17 KiB
		
	
	
	
		
			HTML
		
	
	
	
			
		
		
	
	
			455 lines
		
	
	
		
			17 KiB
		
	
	
	
		
			HTML
		
	
	
	
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 | 
						|
          "http://www.w3.org/TR/html4/strict.dtd">
 | 
						|
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
 | 
						|
<html>
 | 
						|
<head>
 | 
						|
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
 | 
						|
  <title>Polly - Examples</title>
 | 
						|
  <link type="text/css" rel="stylesheet" href="menu.css">
 | 
						|
  <link type="text/css" rel="stylesheet" href="content.css">
 | 
						|
</head>
 | 
						|
<body>
 | 
						|
<div id="box">
 | 
						|
<!--#include virtual="menu.html.incl"-->
 | 
						|
<div id="content">
 | 
						|
<!--=====================================================================-->
 | 
						|
<h1>Execute the individual Polly passes manually</h1>
 | 
						|
<!--=====================================================================-->
 | 
						|
 | 
						|
<p>
 | 
						|
This example presents the individual passes that are involved when optimizing
 | 
						|
code with Polly. We show how to execute them individually and explain for each
 | 
						|
which analysis is performed or what transformation is applied. In this example
 | 
						|
the polyhedral transformation is user-provided to show how much performance
 | 
						|
improvement can be expected by an optimal automatic optimizer.</p>
 | 
						|
 | 
						|
The files used and created in this example are available in the Polly checkout
 | 
						|
in the folder <em>www/experiments/matmul</em>. They can be created automatically
 | 
						|
by running the <em>www/experiments/matmul/runall.sh</em> script.
 | 
						|
 | 
						|
<ol>
 | 
						|
<li><h4>Create LLVM-IR from the C code</h4>
 | 
						|
 | 
						|
Polly works on LLVM-IR. Hence it is necessary to translate the source files into
 | 
						|
LLVM-IR. If more than on file should be optimized the files can be combined into
 | 
						|
a single file with llvm-link.
 | 
						|
 | 
						|
<pre class="code">clang -S -emit-llvm matmul.c -o matmul.s</pre>
 | 
						|
</li>
 | 
						|
 | 
						|
 | 
						|
<li><h4>Load Polly automatically when calling the 'opt' tool</h4>
 | 
						|
 | 
						|
Polly is not built into opt or bugpoint, but it is a shared library that needs
 | 
						|
to be loaded into these tools explicitally. The Polly library is called
 | 
						|
LVMPolly.so. For a cmake build it is available in the build/lib/ directory,
 | 
						|
autoconf creates the same file in
 | 
						|
build/tools/polly/{Release+Asserts|Asserts|Debug}/lib. For convenience we create
 | 
						|
an alias that automatically loads Polly if 'opt' is called.
 | 
						|
<pre class="code">
 | 
						|
export PATH_TO_POLLY_LIB="~/polly/build/lib/"
 | 
						|
alias opt="opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.so"</pre>
 | 
						|
</li>
 | 
						|
 | 
						|
<li><h4>Prepare the LLVM-IR for Polly</h4>
 | 
						|
 | 
						|
Polly is only able to work with code that matches a canonical form. To translate
 | 
						|
the LLVM-IR into this form we use a set of canonicalication passes. They are
 | 
						|
scheduled by using '-polly-canonicalize'.
 | 
						|
<pre class="code">opt -S -polly-canonicalize matmul.s > matmul.preopt.ll</pre></li>
 | 
						|
 | 
						|
<li><h4>Show the SCoPs detected by Polly (optional)</h4>
 | 
						|
 | 
						|
To understand if Polly was able to detect SCoPs, we print the
 | 
						|
structure of the detected SCoPs. In our example two SCoPs were detected. One in
 | 
						|
'init_array' the other in 'main'.
 | 
						|
 | 
						|
<pre class="code">opt -basicaa -polly-ast -analyze -q matmul.preopt.ll</pre>
 | 
						|
 | 
						|
<pre>
 | 
						|
init_array():
 | 
						|
for (c2=0;c2<=1023;c2++) {
 | 
						|
  for (c4=0;c4<=1023;c4++) {
 | 
						|
    Stmt_5(c2,c4);
 | 
						|
  }
 | 
						|
}
 | 
						|
 | 
						|
main():
 | 
						|
for (c2=0;c2<=1023;c2++) {
 | 
						|
  for (c4=0;c4<=1023;c4++) {
 | 
						|
    Stmt_4(c2,c4);
 | 
						|
    for (c6=0;c6<=1023;c6++) {
 | 
						|
      Stmt_6(c2,c4,c6);
 | 
						|
    }
 | 
						|
  }
 | 
						|
}
 | 
						|
</pre>
 | 
						|
</li>
 | 
						|
<li><h4>Highlight the detected SCoPs in the CFGs of the program (requires graphviz/dotty)</h4>
 | 
						|
 | 
						|
Polly can use graphviz to graphically show a CFG in which the detected SCoPs are
 | 
						|
highlighted. It can also create '.dot' files that can be translated by
 | 
						|
the 'dot' utility into various graphic formats.
 | 
						|
 | 
						|
<pre class="code">opt -basicaa -view-scops -disable-output matmul.preopt.ll
 | 
						|
opt -basicaa -view-scops-only -disable-output matmul.preopt.ll</pre>
 | 
						|
The output for the different functions<br />
 | 
						|
view-scops:
 | 
						|
<a href="experiments/matmul/scops.main.dot.png">main</a>,
 | 
						|
<a href="experiments/matmul/scops.init_array.dot.png">init_array</a>,
 | 
						|
<a href="experiments/matmul/scops.print_array.dot.png">print_array</a><br />
 | 
						|
view-scops-only:
 | 
						|
<a href="experiments/matmul/scopsonly.main.dot.png">main</a>,
 | 
						|
<a href="experiments/matmul/scopsonly.init_array.dot.png">init_array</a>,
 | 
						|
<a href="experiments/matmul/scopsonly.print_array.dot.png">print_array</a>
 | 
						|
</li>
 | 
						|
 | 
						|
<li><h4>View the polyhedral representation of the SCoPs</h4>
 | 
						|
<pre class="code">opt -basicaa -polly-scops -analyze matmul.preopt.ll</pre>
 | 
						|
<pre>
 | 
						|
[...]
 | 
						|
Printing analysis 'Polly - Create polyhedral description of Scops' for region:
 | 
						|
'for.cond => for.end19' in function 'init_array':
 | 
						|
   Context:
 | 
						|
   { [] }
 | 
						|
   Statements {
 | 
						|
   	Stmt_5
 | 
						|
           Domain :=
 | 
						|
               { Stmt_5[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 };
 | 
						|
           Schedule :=
 | 
						|
               { Stmt_5[i0, i1] -> schedule[0, i0, 0, i1, 0] };
 | 
						|
           WriteAccess :=
 | 
						|
               { Stmt_5[i0, i1] -> MemRef_A[1037i0 + i1] };
 | 
						|
           WriteAccess :=
 | 
						|
               { Stmt_5[i0, i1] -> MemRef_B[1047i0 + i1] };
 | 
						|
   	FinalRead
 | 
						|
           Domain :=
 | 
						|
               { FinalRead[0] };
 | 
						|
           Schedule :=
 | 
						|
               { FinalRead[i0] -> schedule[200000000, o1, o2, o3, o4] };
 | 
						|
           ReadAccess :=
 | 
						|
               { FinalRead[i0] -> MemRef_A[o0] };
 | 
						|
           ReadAccess :=
 | 
						|
               { FinalRead[i0] -> MemRef_B[o0] };
 | 
						|
   }
 | 
						|
[...]
 | 
						|
Printing analysis 'Polly - Create polyhedral description of Scops' for region:
 | 
						|
'for.cond => for.end30' in function 'main':
 | 
						|
   Context:
 | 
						|
   { [] }
 | 
						|
   Statements {
 | 
						|
   	Stmt_4
 | 
						|
           Domain :=
 | 
						|
               { Stmt_4[i0, i1] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 };
 | 
						|
           Schedule :=
 | 
						|
               { Stmt_4[i0, i1] -> schedule[0, i0, 0, i1, 0, 0, 0] };
 | 
						|
           WriteAccess :=
 | 
						|
               { Stmt_4[i0, i1] -> MemRef_C[1067i0 + i1] };
 | 
						|
   	Stmt_6
 | 
						|
           Domain :=
 | 
						|
               { Stmt_6[i0, i1, i2] : i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023 };
 | 
						|
           Schedule :=
 | 
						|
               { Stmt_6[i0, i1, i2] -> schedule[0, i0, 0, i1, 1, i2, 0] };
 | 
						|
           ReadAccess :=
 | 
						|
               { Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] };
 | 
						|
           ReadAccess :=
 | 
						|
               { Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] };
 | 
						|
           ReadAccess :=
 | 
						|
               { Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] };
 | 
						|
           WriteAccess :=
 | 
						|
               { Stmt_6[i0, i1, i2] -> MemRef_C[1067i0 + i1] };
 | 
						|
   	FinalRead
 | 
						|
           Domain :=
 | 
						|
               { FinalRead[0] };
 | 
						|
           Schedule :=
 | 
						|
               { FinalRead[i0] -> schedule[200000000, o1, o2, o3, o4, o5, o6] };
 | 
						|
           ReadAccess :=
 | 
						|
               { FinalRead[i0] -> MemRef_C[o0] };
 | 
						|
           ReadAccess :=
 | 
						|
               { FinalRead[i0] -> MemRef_A[o0] };
 | 
						|
           ReadAccess :=
 | 
						|
               { FinalRead[i0] -> MemRef_B[o0] };
 | 
						|
   }
 | 
						|
[...]
 | 
						|
</pre>
 | 
						|
</li>
 | 
						|
 | 
						|
<li><h4>Show the dependences for the SCoPs</h4>
 | 
						|
<pre class="code">opt -basicaa -polly-dependences -analyze matmul.preopt.ll</pre>
 | 
						|
<pre>Printing analysis 'Polly - Calculate dependences for SCoP' for region:
 | 
						|
'for.cond => for.end19' in function 'init_array':
 | 
						|
   Must dependences:
 | 
						|
       {  }
 | 
						|
   May dependences:
 | 
						|
       {  }
 | 
						|
   Must no source:
 | 
						|
       {  }
 | 
						|
   May no source:
 | 
						|
       {  }
 | 
						|
Printing analysis 'Polly - Calculate dependences for SCoP' for region:
 | 
						|
'for.cond => for.end30' in function 'main':
 | 
						|
   Must dependences:
 | 
						|
       {  Stmt_4[i0, i1] -> Stmt_6[i0, i1, 0] :
 | 
						|
              i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023;
 | 
						|
          Stmt_6[i0, i1, i2] -> Stmt_6[i0, i1, 1 + i2] :
 | 
						|
              i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1022;
 | 
						|
          Stmt_6[i0, i1, 1023] -> FinalRead[0] :
 | 
						|
              i1 <= 1091540 - 1067i0 and i1 >= -1067i0 and i1 >= 0 and i1 <= 1023;
 | 
						|
          Stmt_6[1023, i1, 1023] -> FinalRead[0] :
 | 
						|
              i1 >= 0 and i1 <= 1023
 | 
						|
       }
 | 
						|
   May dependences:
 | 
						|
       {  }
 | 
						|
   Must no source:
 | 
						|
       {  Stmt_6[i0, i1, i2] -> MemRef_A[1037i0 + i2] :
 | 
						|
              i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023;
 | 
						|
          Stmt_6[i0, i1, i2] -> MemRef_B[i1 + 1047i2] :
 | 
						|
              i0 >= 0 and i0 <= 1023 and i1 >= 0 and i1 <= 1023 and i2 >= 0 and i2 <= 1023;
 | 
						|
          FinalRead[0] -> MemRef_A[o0];
 | 
						|
          FinalRead[0] -> MemRef_B[o0]
 | 
						|
          FinalRead[0] -> MemRef_C[o0] :
 | 
						|
              o0 >= 1092565 or (exists (e0 = [(o0)/1067]: o0 <= 1091540 and o0 >= 0
 | 
						|
              and 1067e0 <= -1024 + o0 and 1067e0 >= -1066 + o0)) or o0 <= -1;
 | 
						|
       }
 | 
						|
   May no source:
 | 
						|
       {  }
 | 
						|
</pre></li>
 | 
						|
 | 
						|
<li><h4>Export jscop files</h4>
 | 
						|
 | 
						|
Polly can export the polyhedral representation in so called jscop files. Jscop
 | 
						|
files contain the polyhedral representation stored in a JSON file.
 | 
						|
<pre class="code">opt -basicaa -polly-export-jscop matmul.preopt.ll</pre>
 | 
						|
<pre>Writing SCoP 'for.cond => for.end19' in function 'init_array' to './init_array___%for.cond---%for.end19.jscop'.
 | 
						|
Writing SCoP 'for.cond => for.end30' in function 'main' to './main___%for.cond---%for.end30.jscop'.
 | 
						|
</pre></li>
 | 
						|
 | 
						|
<li><h4>Import the changed jscop files and print the updated SCoP structure
 | 
						|
(optional)</h4>
 | 
						|
<p>Polly can reimport jscop files, in which the schedules of the statements are
 | 
						|
changed. These changed schedules are used to descripe transformations.
 | 
						|
It is possible to import different jscop files by providing the postfix
 | 
						|
of the jscop file that is imported.</p>
 | 
						|
<p> We apply three different transformations on the SCoP in the main function.
 | 
						|
The jscop files describing these transformations are hand written (and available
 | 
						|
in <em>www/experiments/matmul</em>).
 | 
						|
 | 
						|
<h5>No Polly</h5>
 | 
						|
 | 
						|
<p>As a baseline we do not call any Polly code generation, but only apply the
 | 
						|
normal -O3 optimizations.</p>
 | 
						|
 | 
						|
<pre class="code">
 | 
						|
opt matmul.preopt.ll -basicaa \
 | 
						|
    -polly-import-jscop \
 | 
						|
    -polly-ast -analyze
 | 
						|
</pre>
 | 
						|
<pre>
 | 
						|
[...]
 | 
						|
main():
 | 
						|
for (c2=0;c2<g;=1535;c2++) {
 | 
						|
  for (c4=0;c4<g;=1535;c4++) {
 | 
						|
    Stmt_4(c2,c4);
 | 
						|
    for (c6=0;c6<g;=1535;c6++) {
 | 
						|
      Stmt_6(c2,c4,c6);
 | 
						|
    }
 | 
						|
  }
 | 
						|
}
 | 
						|
[...]
 | 
						|
</pre>
 | 
						|
<h5>Interchange (and Fission to allow the interchange)</h5>
 | 
						|
<p>We split the loops and can now apply an interchange of the loop dimensions that
 | 
						|
enumerate Stmt_6.</p>
 | 
						|
<pre class="code">
 | 
						|
opt matmul.preopt.ll -basicaa \
 | 
						|
    -polly-import-jscop -polly-import-jscop-postfix=interchanged \
 | 
						|
    -polly-ast -analyze
 | 
						|
</pre>
 | 
						|
<pre>
 | 
						|
[...]
 | 
						|
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled'.
 | 
						|
[...]
 | 
						|
main():
 | 
						|
for (c2=0;c2<=1535;c2++) {
 | 
						|
  for (c4=0;c4<=1535;c4++) {
 | 
						|
    Stmt_4(c2,c4);
 | 
						|
  }
 | 
						|
}
 | 
						|
for (c2=0;c2<=1535;c2++) {
 | 
						|
  for (c4=0;c4<=1535;c4++) {
 | 
						|
    for (c6=0;c6<=1535;c6++) {
 | 
						|
      Stmt_6(c2,c6,c4);
 | 
						|
    }
 | 
						|
  }
 | 
						|
}
 | 
						|
[...]
 | 
						|
</pre>
 | 
						|
<h5>Interchange + Tiling</h5>
 | 
						|
<p>In addition to the interchange we tile now the second loop nest.</p>
 | 
						|
 | 
						|
<pre class="code">
 | 
						|
opt matmul.preopt.ll -basicaa \
 | 
						|
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled \
 | 
						|
    -polly-ast -analyze
 | 
						|
</pre>
 | 
						|
<pre>
 | 
						|
[...]
 | 
						|
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled'.
 | 
						|
[...]
 | 
						|
main():
 | 
						|
for (c2=0;c2<=1535;c2++) {
 | 
						|
  for (c4=0;c4<=1535;c4++) {
 | 
						|
    Stmt_4(c2,c4);
 | 
						|
  }
 | 
						|
}
 | 
						|
for (c2=0;c2<=1535;c2+=64) {
 | 
						|
  for (c3=0;c3<=1535;c3+=64) {
 | 
						|
    for (c4=0;c4<=1535;c4+=64) {
 | 
						|
      for (c5=c2;c5<=c2+63;c5++) {
 | 
						|
        for (c6=c4;c6<=c4+63;c6++) {
 | 
						|
          for (c7=c3;c7<=c3+63;c7++) {
 | 
						|
            Stmt_6(c5,c7,c6);
 | 
						|
          }
 | 
						|
        }
 | 
						|
      }
 | 
						|
    }
 | 
						|
  }
 | 
						|
}
 | 
						|
[...]
 | 
						|
</pre>
 | 
						|
<h5>Interchange + Tiling + Strip-mining to prepare vectorization</h5>
 | 
						|
To later allow vectorization we create a so called trivially parallelizable
 | 
						|
loop. It is innermost, parallel and has only four iterations. It can be
 | 
						|
replaced by 4-element SIMD instructions.
 | 
						|
<pre class="code">
 | 
						|
opt matmul.preopt.ll -basicaa \
 | 
						|
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
 | 
						|
    -polly-ast -analyze </pre>
 | 
						|
 | 
						|
<pre>
 | 
						|
[...]
 | 
						|
Reading JScop 'for.cond => for.end30' in function 'main' from './main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
 | 
						|
[...]
 | 
						|
main():
 | 
						|
for (c2=0;c2<=1535;c2++) {
 | 
						|
  for (c4=0;c4<=1535;c4++) {
 | 
						|
    Stmt_4(c2,c4);
 | 
						|
  }
 | 
						|
}
 | 
						|
for (c2=0;c2<=1535;c2+=64) {
 | 
						|
  for (c3=0;c3<=1535;c3+=64) {
 | 
						|
    for (c4=0;c4<=1535;c4+=64) {
 | 
						|
      for (c5=c2;c5<=c2+63;c5++) {
 | 
						|
        for (c6=c4;c6<=c4+63;c6++) {
 | 
						|
          for (c7=c3;c7<=c3+63;c7+=4) {
 | 
						|
            for (c8=c7;c8<=c7+3;c8++) {
 | 
						|
              Stmt_6(c5,c8,c6);
 | 
						|
            }
 | 
						|
          }
 | 
						|
        }
 | 
						|
      }
 | 
						|
    }
 | 
						|
  }
 | 
						|
}
 | 
						|
[...]
 | 
						|
</pre>
 | 
						|
 | 
						|
</li>
 | 
						|
 | 
						|
<li><h4>Codegenerate the SCoPs</h4>
 | 
						|
<p>
 | 
						|
This generates new code for the SCoPs detected by polly.
 | 
						|
If -polly-import-jscop is present, transformations specified in the imported
 | 
						|
jscop files will be applied.</p>
 | 
						|
<pre class="code">opt matmul.preopt.ll | opt -O3 > matmul.normalopt.ll</pre>
 | 
						|
<pre class="code">
 | 
						|
opt -basicaa \
 | 
						|
    -polly-import-jscop -polly-import-jscop-postfix=interchanged \
 | 
						|
    -polly-codegen matmul.preopt.ll \
 | 
						|
   | opt -O3 > matmul.polly.interchanged.ll</pre>
 | 
						|
<pre>
 | 
						|
Reading JScop 'for.cond => for.end19' in function 'init_array' from
 | 
						|
    './init_array___%for.cond---%for.end19.jscop.interchanged'.
 | 
						|
File could not be read: No such file or directory
 | 
						|
Reading JScop 'for.cond => for.end30' in function 'main' from
 | 
						|
    './main___%for.cond---%for.end30.jscop.interchanged'.
 | 
						|
</pre>
 | 
						|
<pre class="code">
 | 
						|
opt -basicaa \
 | 
						|
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled \
 | 
						|
    -polly-codegen matmul.preopt.ll \
 | 
						|
   | opt -O3 > matmul.polly.interchanged+tiled.ll</pre>
 | 
						|
<pre>
 | 
						|
Reading JScop 'for.cond => for.end19' in function 'init_array' from
 | 
						|
    './init_array___%for.cond---%for.end19.jscop.interchanged+tiled'.
 | 
						|
File could not be read: No such file or directory
 | 
						|
Reading JScop 'for.cond => for.end30' in function 'main' from
 | 
						|
    './main___%for.cond---%for.end30.jscop.interchanged+tiled'.
 | 
						|
</pre>
 | 
						|
<pre class="code">
 | 
						|
opt -basicaa \
 | 
						|
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
 | 
						|
    -polly-codegen -polly-vectorizer=polly matmul.preopt.ll \
 | 
						|
   | opt -O3 > matmul.polly.interchanged+tiled+vector.ll</pre>
 | 
						|
<pre>
 | 
						|
Reading JScop 'for.cond => for.end19' in function 'init_array' from
 | 
						|
    './init_array___%for.cond---%for.end19.jscop.interchanged+tiled+vector'.
 | 
						|
File could not be read: No such file or directory
 | 
						|
Reading JScop 'for.cond => for.end30' in function 'main' from
 | 
						|
    './main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
 | 
						|
</pre>
 | 
						|
<pre class="code">
 | 
						|
opt -basicaa \
 | 
						|
    -polly-import-jscop -polly-import-jscop-postfix=interchanged+tiled+vector \
 | 
						|
    -polly-codegen -polly-vectorizer=polly -polly-parallel matmul.preopt.ll \
 | 
						|
  | opt -O3 > matmul.polly.interchanged+tiled+openmp.ll</pre>
 | 
						|
<pre>
 | 
						|
Reading JScop 'for.cond => for.end19' in function 'init_array' from
 | 
						|
    './init_array___%for.cond---%for.end19.jscop.interchanged+tiled+vector'.
 | 
						|
File could not be read: No such file or directory
 | 
						|
Reading JScop 'for.cond => for.end30' in function 'main' from
 | 
						|
    './main___%for.cond---%for.end30.jscop.interchanged+tiled+vector'.
 | 
						|
</pre>
 | 
						|
 | 
						|
<li><h4>Create the executables</h4>
 | 
						|
 | 
						|
Create one executable optimized with plain -O3 as well as a set of executables
 | 
						|
optimized in different ways with Polly. One changes only the loop structure, the
 | 
						|
other adds tiling, the next adds vectorization and finally we use OpenMP
 | 
						|
parallelism.
 | 
						|
<pre class="code">
 | 
						|
llc matmul.normalopt.ll -o matmul.normalopt.s && \
 | 
						|
    gcc matmul.normalopt.s -o matmul.normalopt.exe
 | 
						|
llc matmul.polly.interchanged.ll -o matmul.polly.interchanged.s && \
 | 
						|
    gcc matmul.polly.interchanged.s -o matmul.polly.interchanged.exe
 | 
						|
llc matmul.polly.interchanged+tiled.ll -o matmul.polly.interchanged+tiled.s && \
 | 
						|
    gcc matmul.polly.interchanged+tiled.s -o matmul.polly.interchanged+tiled.exe
 | 
						|
llc matmul.polly.interchanged+tiled+vector.ll -o matmul.polly.interchanged+tiled+vector.s && \
 | 
						|
    gcc matmul.polly.interchanged+tiled+vector.s -o matmul.polly.interchanged+tiled+vector.exe
 | 
						|
llc matmul.polly.interchanged+tiled+vector+openmp.ll -o matmul.polly.interchanged+tiled+vector+openmp.s && \
 | 
						|
    gcc -lgomp matmul.polly.interchanged+tiled+vector+openmp.s -o matmul.polly.interchanged+tiled+vector+openmp.exe </pre>
 | 
						|
 | 
						|
<li><h4>Compare the runtime of the executables</h4>
 | 
						|
 | 
						|
By comparing the runtimes of the different code snippets we see that a simple
 | 
						|
loop interchange gives here the largest performance boost. However by adding
 | 
						|
vectorization and by using OpenMP we can further improve the performance
 | 
						|
significantly.
 | 
						|
<pre class="code">time ./matmul.normalopt.exe</pre>
 | 
						|
<pre>42.68 real, 42.55 user, 0.00 sys</pre>
 | 
						|
<pre class="code">time ./matmul.polly.interchanged.exe</pre>
 | 
						|
<pre>04.33 real, 4.30 user, 0.01 sys</pre>
 | 
						|
<pre class="code">time ./matmul.polly.interchanged+tiled.exe</pre>
 | 
						|
<pre>04.11 real, 4.10 user, 0.00 sys</pre>
 | 
						|
<pre class="code">time ./matmul.polly.interchanged+tiled+vector.exe</pre>
 | 
						|
<pre>01.39 real, 1.36 user, 0.01 sys</pre>
 | 
						|
<pre class="code">time ./matmul.polly.interchanged+tiled+vector+openmp.exe</pre>
 | 
						|
<pre>00.66 real, 2.58 user, 0.02 sys</pre>
 | 
						|
</li>
 | 
						|
</ol>
 | 
						|
 | 
						|
</div>
 | 
						|
</div>
 | 
						|
</body>
 | 
						|
</html>
 |