forked from OSchip/llvm-project
				
			
		
			
				
	
	
		
			786 lines
		
	
	
		
			31 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
			
		
		
	
	
			786 lines
		
	
	
		
			31 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
=================================
 | 
						||
MergeFunctions pass, how it works
 | 
						||
=================================
 | 
						||
 | 
						||
.. contents::
 | 
						||
   :local:
 | 
						||
 | 
						||
Introduction
 | 
						||
============
 | 
						||
Sometimes code contains equal functions, or functions that does exactly the same
 | 
						||
thing even though they are non-equal on the IR level (e.g.: multiplication on 2
 | 
						||
and 'shl 1'). It could happen due to several reasons: mainly, the usage of
 | 
						||
templates and automatic code generators. Though, sometimes the user itself could
 | 
						||
write the same thing twice :-)
 | 
						||
 | 
						||
The main purpose of this pass is to recognize such functions and merge them.
 | 
						||
 | 
						||
This document is the extension to pass comments and describes the pass logic. It
 | 
						||
describes the algorithm that is used in order to compare functions and
 | 
						||
explains how we could combine equal functions correctly to keep the module
 | 
						||
valid.
 | 
						||
 | 
						||
Material is brought in a top-down form, so the reader could start to learn pass
 | 
						||
from high level ideas and end with low-level algorithm details, thus preparing
 | 
						||
him or her for reading the sources.
 | 
						||
 | 
						||
The main goal is to describe the algorithm and logic here and the concept. If
 | 
						||
you *don't want* to read the source code, but want to understand pass
 | 
						||
algorithms, this document is good for you. The author tries not to repeat the
 | 
						||
source-code and covers only common cases to avoid the cases of needing to
 | 
						||
update this document after any minor code changes.
 | 
						||
 | 
						||
 | 
						||
What should I know to be able to follow along with this document?
 | 
						||
-----------------------------------------------------------------
 | 
						||
 | 
						||
The reader should be familiar with common compile-engineering principles and
 | 
						||
LLVM code fundamentals. In this article, we assume the reader is familiar with
 | 
						||
`Single Static Assignment
 | 
						||
<http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
 | 
						||
concept and has an understanding of
 | 
						||
`IR structure <http://llvm.org/docs/LangRef.html#high-level-structure>`_.
 | 
						||
 | 
						||
We will use terms such as
 | 
						||
"`module <http://llvm.org/docs/LangRef.html#high-level-structure>`_",
 | 
						||
"`function <http://llvm.org/docs/ProgrammersManual.html#the-function-class>`_",
 | 
						||
"`basic block <http://en.wikipedia.org/wiki/Basic_block>`_",
 | 
						||
"`user <http://llvm.org/docs/ProgrammersManual.html#the-user-class>`_",
 | 
						||
"`value <http://llvm.org/docs/ProgrammersManual.html#the-value-class>`_",
 | 
						||
"`instruction
 | 
						||
<http://llvm.org/docs/ProgrammersManual.html#the-instruction-class>`_".
 | 
						||
 | 
						||
As a good starting point, the Kaleidoscope tutorial can be used:
 | 
						||
 | 
						||
:doc:`tutorial/index`
 | 
						||
 | 
						||
It's especially important to understand chapter 3 of tutorial:
 | 
						||
 | 
						||
:doc:`tutorial/LangImpl03`
 | 
						||
 | 
						||
The reader should also know how passes work in LLVM. They could use this
 | 
						||
article as a reference and start point here:
 | 
						||
 | 
						||
:doc:`WritingAnLLVMPass`
 | 
						||
 | 
						||
What else? Well perhaps the reader should also have some experience in LLVM pass
 | 
						||
debugging and bug-fixing.
 | 
						||
 | 
						||
Narrative structure
 | 
						||
-------------------
 | 
						||
The article consists of three parts. The first part explains pass functionality
 | 
						||
on the top-level. The second part describes the comparison procedure itself.
 | 
						||
The third part describes the merging process.
 | 
						||
 | 
						||
In every part, the author tries to put the contents in the top-down form.
 | 
						||
The top-level methods will first be described followed by the terminal ones at
 | 
						||
the end, in the tail of each part. If the reader sees the reference to the
 | 
						||
method that wasn't described yet, they will find its description a bit below.
 | 
						||
 | 
						||
Basics
 | 
						||
======
 | 
						||
 | 
						||
How to do it?
 | 
						||
-------------
 | 
						||
Do we need to merge functions? The obvious answer is: Yes, that is quite a
 | 
						||
possible case. We usually *do* have duplicates and it would be good to get rid
 | 
						||
of them. But how do we detect duplicates? This is the idea: we split functions
 | 
						||
into smaller bricks or parts and compare the "bricks" amount. If equal,
 | 
						||
we compare the "bricks" themselves, and then do our conclusions about functions
 | 
						||
themselves.
 | 
						||
 | 
						||
What could the difference be? For example, on a machine with 64-bit pointers
 | 
						||
(let's assume we have only one address space), one function stores a 64-bit
 | 
						||
integer, while another one stores a pointer. If the target is the machine
 | 
						||
mentioned above, and if functions are identical, except the parameter type (we
 | 
						||
could consider it as a part of function type), then we can treat a ``uint64_t``
 | 
						||
and a ``void*`` as equal.
 | 
						||
 | 
						||
This is just an example; more possible details are described a bit below.
 | 
						||
 | 
						||
As another example, the reader may imagine two more functions. The first
 | 
						||
function performs a multiplication on 2, while the second one performs an
 | 
						||
arithmetic right shift on 1.
 | 
						||
 | 
						||
Possible solutions
 | 
						||
^^^^^^^^^^^^^^^^^^
 | 
						||
Let's briefly consider possible options about how and what we have to implement
 | 
						||
in order to create full-featured functions merging, and also what it would
 | 
						||
mean for us.
 | 
						||
 | 
						||
Equal function detection obviously supposes that a "detector" method to be
 | 
						||
implemented and latter should answer the question "whether functions are equal".
 | 
						||
This "detector" method consists of tiny "sub-detectors", which each answers
 | 
						||
exactly the same question, but for function parts.
 | 
						||
 | 
						||
As the second step, we should merge equal functions. So it should be a "merger"
 | 
						||
method. "Merger" accepts two functions *F1* and *F2*, and produces *F1F2*
 | 
						||
function, the result of merging.
 | 
						||
 | 
						||
Having such routines in our hands, we can process a whole module, and merge all
 | 
						||
equal functions.
 | 
						||
 | 
						||
In this case, we have to compare every function with every another function. As
 | 
						||
the reader may notice, this way seems to be quite expensive. Of course we could
 | 
						||
introduce hashing and other helpers, but it is still just an optimization, and
 | 
						||
thus the level of O(N*N) complexity.
 | 
						||
 | 
						||
Can we reach another level? Could we introduce logarithmical search, or random
 | 
						||
access lookup? The answer is: "yes".
 | 
						||
 | 
						||
Random-access
 | 
						||
"""""""""""""
 | 
						||
How it could this be done? Just convert each function to a number, and gather
 | 
						||
all of them in a special hash-table. Functions with equal hashes are equal.
 | 
						||
Good hashing means, that every function part must be taken into account. That
 | 
						||
means we have to convert every function part into some number, and then add it
 | 
						||
into the hash. The lookup-up time would be small, but such a approach adds some
 | 
						||
delay due to the hashing routine.
 | 
						||
 | 
						||
Logarithmical search
 | 
						||
""""""""""""""""""""
 | 
						||
We could introduce total ordering among the functions set, once ordered we
 | 
						||
could then implement a logarithmical search. Lookup time still depends on N,
 | 
						||
but adds a little of delay (*log(N)*).
 | 
						||
 | 
						||
Present state
 | 
						||
"""""""""""""
 | 
						||
Both of the approaches (random-access and logarithmical) have been implemented
 | 
						||
and tested and both give a very good improvement. What was most
 | 
						||
surprising is that logarithmical search was faster; sometimes by up to 15%. The
 | 
						||
hashing method needs some extra CPU time, which is the main reason why it works
 | 
						||
slower; in most cases, total "hashing" time is greater than total
 | 
						||
"logarithmical-search" time.
 | 
						||
 | 
						||
So, preference has been granted to the "logarithmical search".
 | 
						||
 | 
						||
Though in the case of need, *logarithmical-search* (read "total-ordering") could
 | 
						||
be used as a milestone on our way to the *random-access* implementation.
 | 
						||
 | 
						||
Every comparison is based either on the numbers or on the flags comparison. In
 | 
						||
the *random-access* approach, we could use the same comparison algorithm.
 | 
						||
During comparison, we exit once we find the difference, but here we might have
 | 
						||
to scan the whole function body every time (note, it could be slower). Like in
 | 
						||
"total-ordering", we will track every number and flag, but instead of
 | 
						||
comparison, we should get the numbers sequence and then create the hash number.
 | 
						||
So, once again, *total-ordering* could be considered as a milestone for even
 | 
						||
faster (in theory) random-access approach.
 | 
						||
 | 
						||
MergeFunctions, main fields and runOnModule
 | 
						||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
There are two main important fields in the class:
 | 
						||
 | 
						||
``FnTree``  – the set of all unique functions. It keeps items that couldn't be
 | 
						||
merged with each other. It is defined as:
 | 
						||
 | 
						||
``std::set<FunctionNode> FnTree;``
 | 
						||
 | 
						||
Here ``FunctionNode`` is a wrapper for ``llvm::Function`` class, with
 | 
						||
implemented “<” operator among the functions set (below we explain how it works
 | 
						||
exactly; this is a key point in fast functions comparison).
 | 
						||
 | 
						||
``Deferred`` – merging process can affect bodies of functions that are in
 | 
						||
``FnTree`` already. Obviously, such functions should be rechecked again. In this
 | 
						||
case, we remove them from ``FnTree``, and mark them to be rescanned, namely
 | 
						||
put them into ``Deferred`` list.
 | 
						||
 | 
						||
runOnModule
 | 
						||
"""""""""""
 | 
						||
The algorithm is pretty simple:
 | 
						||
 | 
						||
1. Put all module's functions into the *worklist*.
 | 
						||
 | 
						||
2. Scan *worklist*'s functions twice: first enumerate only strong functions and
 | 
						||
then only weak ones:
 | 
						||
 | 
						||
   2.1. Loop body: take a function from *worklist*  (call it *FCur*) and try to
 | 
						||
   insert it into *FnTree*: check whether *FCur* is equal to one of functions
 | 
						||
   in *FnTree*. If there *is* an equal function in *FnTree*
 | 
						||
   (call it *FExists*): merge function *FCur* with *FExists*. Otherwise add
 | 
						||
   the function from the *worklist* to *FnTree*.
 | 
						||
 | 
						||
3. Once the *worklist* scanning and merging operations are complete, check the
 | 
						||
*Deferred* list. If it is not empty: refill the *worklist* contents with
 | 
						||
*Deferred* list and redo step 2, if the *Deferred* list is empty, then exit
 | 
						||
from method.
 | 
						||
 | 
						||
Comparison and logarithmical search
 | 
						||
"""""""""""""""""""""""""""""""""""
 | 
						||
Let's recall our task: for every function *F* from module *M*, we have to find
 | 
						||
equal functions *F`* in the shortest time possible , and merge them into a
 | 
						||
single function.
 | 
						||
 | 
						||
Defining total ordering among the functions set allows us to organize
 | 
						||
functions into a binary tree. The lookup procedure complexity would be
 | 
						||
estimated as O(log(N)) in this case. But how do we define *total-ordering*?
 | 
						||
 | 
						||
We have to introduce a single rule applicable to every pair of functions, and
 | 
						||
following this rule, then evaluate which of them is greater. What kind of rule
 | 
						||
could it be? Let's declare it as the "compare" method that returns one of 3
 | 
						||
possible values:
 | 
						||
 | 
						||
-1, left is *less* than right,
 | 
						||
 | 
						||
0, left and right are *equal*,
 | 
						||
 | 
						||
1, left is *greater* than right.
 | 
						||
 | 
						||
Of course it means, that we have to maintain
 | 
						||
*strict and non-strict order relation properties*:
 | 
						||
 | 
						||
* reflexivity (``a <= a``, ``a == a``, ``a >= a``),
 | 
						||
* antisymmetry (if ``a <= b`` and ``b <= a`` then ``a == b``),
 | 
						||
* transitivity (``a <= b`` and ``b <= c``, then ``a <= c``)
 | 
						||
* asymmetry (if ``a < b``, then ``a > b`` or ``a == b``).
 | 
						||
 | 
						||
As mentioned before, the comparison routine consists of
 | 
						||
"sub-comparison-routines", with each of them also consisting of
 | 
						||
"sub-comparison-routines", and so on. Finally, it ends up with primitive
 | 
						||
comparison.
 | 
						||
 | 
						||
Below, we will use the following operations:
 | 
						||
 | 
						||
#. ``cmpNumbers(number1, number2)`` is a method that returns -1 if left is less
 | 
						||
   than right; 0, if left and right are equal; and 1 otherwise.
 | 
						||
 | 
						||
#. ``cmpFlags(flag1, flag2)`` is a hypothetical method that compares two flags.
 | 
						||
   The logic is the same as in ``cmpNumbers``, where ``true`` is 1, and
 | 
						||
   ``false`` is 0.
 | 
						||
 | 
						||
The rest of the article is based on *MergeFunctions.cpp* source code
 | 
						||
(found in *<llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp*). We would like
 | 
						||
to ask reader to keep this file open, so we could use it as a reference
 | 
						||
for further explanations.
 | 
						||
 | 
						||
Now, we're ready to proceed to the next chapter and see how it works.
 | 
						||
 | 
						||
Functions comparison
 | 
						||
====================
 | 
						||
At first, let's define how exactly we compare complex objects.
 | 
						||
 | 
						||
Complex object comparison (function, basic-block, etc) is mostly based on its
 | 
						||
sub-object comparison results. It is similar to the next "tree" objects
 | 
						||
comparison:
 | 
						||
 | 
						||
#. For two trees *T1* and *T2* we perform *depth-first-traversal* and have
 | 
						||
   two sequences as a product: "*T1Items*" and "*T2Items*".
 | 
						||
 | 
						||
#. We then compare chains "*T1Items*" and "*T2Items*" in
 | 
						||
   the most-significant-item-first order. The result of items comparison
 | 
						||
   would be the result of *T1* and *T2* comparison itself.
 | 
						||
 | 
						||
FunctionComparator::compare(void)
 | 
						||
---------------------------------
 | 
						||
A brief look at the source code tells us that the comparison starts in the
 | 
						||
“``int FunctionComparator::compare(void)``” method.
 | 
						||
 | 
						||
1. The first parts to be compared are the function's attributes and some
 | 
						||
properties that is outside the “attributes” term, but still could make the
 | 
						||
function different without changing its body. This part of the comparison is
 | 
						||
usually done within simple *cmpNumbers* or *cmpFlags* operations (e.g.
 | 
						||
``cmpFlags(F1->hasGC(), F2->hasGC())``). Below is a full list of function's
 | 
						||
properties to be compared on this stage:
 | 
						||
 | 
						||
  * *Attributes* (those are returned by ``Function::getAttributes()``
 | 
						||
    method).
 | 
						||
 | 
						||
  * *GC*, for equivalence, *RHS* and *LHS* should be both either without
 | 
						||
    *GC* or with the same one.
 | 
						||
 | 
						||
  * *Section*, just like a *GC*: *RHS* and *LHS* should be defined in the
 | 
						||
    same section.
 | 
						||
 | 
						||
  * *Variable arguments*. *LHS* and *RHS* should be both either with or
 | 
						||
    without *var-args*.
 | 
						||
 | 
						||
  * *Calling convention* should be the same.
 | 
						||
 | 
						||
2. Function type. Checked by ``FunctionComparator::cmpType(Type*, Type*)``
 | 
						||
method. It checks return type and parameters type; the method itself will be
 | 
						||
described later.
 | 
						||
 | 
						||
3. Associate function formal parameters with each other. Then comparing function
 | 
						||
bodies, if we see the usage of *LHS*'s *i*-th argument in *LHS*'s body, then,
 | 
						||
we want to see usage of *RHS*'s *i*-th argument at the same place in *RHS*'s
 | 
						||
body, otherwise functions are different. On this stage we grant the preference
 | 
						||
to those we met later in function body (value we met first would be *less*).
 | 
						||
This is done by “``FunctionComparator::cmpValues(const Value*, const Value*)``”
 | 
						||
method (will be described a bit later).
 | 
						||
 | 
						||
4. Function body comparison. As it written in method comments:
 | 
						||
 | 
						||
“We do a CFG-ordered walk since the actual ordering of the blocks in the linked
 | 
						||
list is immaterial. Our walk starts at the entry block for both functions, then
 | 
						||
takes each block from each terminator in order. As an artifact, this also means
 | 
						||
that unreachable blocks are ignored.”
 | 
						||
 | 
						||
So, using this walk we get BBs from *left* and *right* in the same order, and
 | 
						||
compare them by “``FunctionComparator::compare(const BasicBlock*, const
 | 
						||
BasicBlock*)``” method.
 | 
						||
 | 
						||
We also associate BBs with each other, like we did it with function formal
 | 
						||
arguments (see ``cmpValues`` method below).
 | 
						||
 | 
						||
FunctionComparator::cmpType
 | 
						||
---------------------------
 | 
						||
Consider how type comparison works.
 | 
						||
 | 
						||
1. Coerce pointer to integer. If left type is a pointer, try to coerce it to the
 | 
						||
integer type. It could be done if its address space is 0, or if address spaces
 | 
						||
are ignored at all. Do the same thing for the right type.
 | 
						||
 | 
						||
2. If left and right types are equal, return 0. Otherwise we need to give
 | 
						||
preference to one of them. So proceed to the next step.
 | 
						||
 | 
						||
3. If types are of different kind (different type IDs). Return result of type
 | 
						||
IDs comparison, treating them as numbers (use ``cmpNumbers`` operation).
 | 
						||
 | 
						||
4. If types are vectors or integers, return result of their pointers comparison,
 | 
						||
comparing them as numbers.
 | 
						||
 | 
						||
5. Check whether type ID belongs to the next group (call it equivalent-group):
 | 
						||
 | 
						||
   * Void
 | 
						||
 | 
						||
   * Float
 | 
						||
 | 
						||
   * Double
 | 
						||
 | 
						||
   * X86_FP80
 | 
						||
 | 
						||
   * FP128
 | 
						||
 | 
						||
   * PPC_FP128
 | 
						||
 | 
						||
   * Label
 | 
						||
 | 
						||
   * Metadata.
 | 
						||
 | 
						||
   If ID belongs to group above, return 0. Since it's enough to see that
 | 
						||
   types has the same ``TypeID``. No additional information is required.
 | 
						||
 | 
						||
6. Left and right are pointers. Return result of address space comparison
 | 
						||
(numbers comparison).
 | 
						||
 | 
						||
7. Complex types (structures, arrays, etc.). Follow complex objects comparison
 | 
						||
technique (see the very first paragraph of this chapter). Both *left* and
 | 
						||
*right* are to be expanded and their element types will be checked the same
 | 
						||
way. If we get -1 or 1 on some stage, return it. Otherwise return 0.
 | 
						||
 | 
						||
8. Steps 1-6 describe all the possible cases, if we passed steps 1-6 and didn't
 | 
						||
get any conclusions, then invoke ``llvm_unreachable``, since it's quite an
 | 
						||
unexpectable case.
 | 
						||
 | 
						||
cmpValues(const Value*, const Value*)
 | 
						||
-------------------------------------
 | 
						||
Method that compares local values.
 | 
						||
 | 
						||
This method gives us an answer to a very curious question: whether we could
 | 
						||
treat local values as equal, and which value is greater otherwise. It's
 | 
						||
better to start from example:
 | 
						||
 | 
						||
Consider the situation when we're looking at the same place in left
 | 
						||
function "*FL*" and in right function "*FR*". Every part of *left* place is
 | 
						||
equal to the corresponding part of *right* place, and (!) both parts use
 | 
						||
*Value* instances, for example:
 | 
						||
 | 
						||
.. code-block:: text
 | 
						||
 | 
						||
   instr0 i32 %LV   ; left side, function FL
 | 
						||
   instr0 i32 %RV   ; right side, function FR
 | 
						||
 | 
						||
So, now our conclusion depends on *Value* instances comparison.
 | 
						||
 | 
						||
The main purpose of this method is to determine relation between such values.
 | 
						||
 | 
						||
What can we expect from equal functions? At the same place, in functions
 | 
						||
"*FL*" and "*FR*" we expect to see *equal* values, or values *defined* at
 | 
						||
the same place in "*FL*" and "*FR*".
 | 
						||
 | 
						||
Consider a small example here:
 | 
						||
 | 
						||
.. code-block:: text
 | 
						||
 | 
						||
  define void %f(i32 %pf0, i32 %pf1) {
 | 
						||
    instr0 i32 %pf0 instr1 i32 %pf1 instr2 i32 123
 | 
						||
  }
 | 
						||
 | 
						||
.. code-block:: text
 | 
						||
 | 
						||
  define void %g(i32 %pg0, i32 %pg1) {
 | 
						||
    instr0 i32 %pg0 instr1 i32 %pg0 instr2 i32 123
 | 
						||
  }
 | 
						||
 | 
						||
In this example, *pf0* is associated with *pg0*, *pf1* is associated with
 | 
						||
*pg1*, and we also declare that *pf0* < *pf1*, and thus *pg0* < *pf1*.
 | 
						||
 | 
						||
Instructions with opcode "*instr0*" would be *equal*, since their types and
 | 
						||
opcodes are equal, and values are *associated*.
 | 
						||
 | 
						||
Instructions with opcode "*instr1*" from *f* is *greater* than instructions
 | 
						||
with opcode "*instr1*" from *g*; here we have equal types and opcodes, but
 | 
						||
"*pf1* is greater than "*pg0*".
 | 
						||
 | 
						||
Instructions with opcode "*instr2*" are equal, because their opcodes and
 | 
						||
types are equal, and the same constant is used as a value.
 | 
						||
 | 
						||
What we associate in cmpValues?
 | 
						||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
* Function arguments. *i*-th argument from left function associated with
 | 
						||
  *i*-th argument from right function.
 | 
						||
* BasicBlock instances. In basic-block enumeration loop we associate *i*-th
 | 
						||
  BasicBlock from the left function with *i*-th BasicBlock from the right
 | 
						||
  function.
 | 
						||
* Instructions.
 | 
						||
* Instruction operands. Note, we can meet *Value* here we have never seen
 | 
						||
  before. In this case it is not a function argument, nor *BasicBlock*, nor
 | 
						||
  *Instruction*. It is a global value. It is a constant, since it's the only
 | 
						||
  supposed global here. The method also compares: Constants that are of the
 | 
						||
  same type and if right constant can be losslessly bit-casted to the left
 | 
						||
  one, then we also compare them.
 | 
						||
 | 
						||
How to implement cmpValues?
 | 
						||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
*Association* is a case of equality for us. We just treat such values as equal,
 | 
						||
but, in general, we need to implement antisymmetric relation. As mentioned
 | 
						||
above, to understand what is *less*, we can use order in which we
 | 
						||
meet values. If both values have the same order in a function (met at the same
 | 
						||
time), we then treat values as *associated*. Otherwise – it depends on who was
 | 
						||
first.
 | 
						||
 | 
						||
Every time we run the top-level compare method, we initialize two identical
 | 
						||
maps (one for the left side, another one for the right side):
 | 
						||
 | 
						||
``map<Value, int> sn_mapL, sn_mapR;``
 | 
						||
 | 
						||
The key of the map is the *Value* itself, the *value* – is its order (call it
 | 
						||
*serial number*).
 | 
						||
 | 
						||
To add value *V* we need to perform the next procedure:
 | 
						||
 | 
						||
``sn_map.insert(std::make_pair(V, sn_map.size()));``
 | 
						||
 | 
						||
For the first *Value*, map will return *0*, for the second *Value* map will
 | 
						||
return *1*, and so on.
 | 
						||
 | 
						||
We can then check whether left and right values met at the same time with
 | 
						||
a simple comparison:
 | 
						||
 | 
						||
``cmpNumbers(sn_mapL[Left], sn_mapR[Right]);``
 | 
						||
 | 
						||
Of course, we can combine insertion and comparison:
 | 
						||
 | 
						||
.. code-block:: c++
 | 
						||
 | 
						||
  std::pair<iterator, bool>
 | 
						||
    LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), RightRes
 | 
						||
    = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
 | 
						||
  return cmpNumbers(LeftRes.first->second, RightRes.first->second);
 | 
						||
 | 
						||
Let's look, how whole method could be implemented.
 | 
						||
 | 
						||
1. We have to start with the bad news. Consider function self and
 | 
						||
cross-referencing cases:
 | 
						||
 | 
						||
.. code-block:: c++
 | 
						||
 | 
						||
  // self-reference unsigned fact0(unsigned n) { return n > 1 ? n
 | 
						||
  * fact0(n-1) : 1; } unsigned fact1(unsigned n) { return n > 1 ? n *
 | 
						||
  fact1(n-1) : 1; }
 | 
						||
 | 
						||
  // cross-reference unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0;
 | 
						||
  } unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; }
 | 
						||
 | 
						||
..
 | 
						||
 | 
						||
  This comparison has been implemented in initial *MergeFunctions* pass
 | 
						||
  version. But, unfortunately, it is not transitive. And this is the only case
 | 
						||
  we can't convert to less-equal-greater comparison. It is a seldom case, 4-5
 | 
						||
  functions of 10000 (checked in test-suite), and, we hope, the reader would
 | 
						||
  forgive us for such a sacrifice in order to get the O(log(N)) pass time.
 | 
						||
 | 
						||
2. If left/right *Value* is a constant, we have to compare them. Return 0 if it
 | 
						||
is the same constant, or use ``cmpConstants`` method otherwise.
 | 
						||
 | 
						||
3. If left/right is *InlineAsm* instance. Return result of *Value* pointers
 | 
						||
comparison.
 | 
						||
 | 
						||
4. Explicit association of *L* (left value) and *R*  (right value). We need to
 | 
						||
find out whether values met at the same time, and thus are *associated*. Or we
 | 
						||
need to put the rule: when we treat *L* < *R*. Now it is easy: we just return
 | 
						||
the result of numbers comparison:
 | 
						||
 | 
						||
.. code-block:: c++
 | 
						||
 | 
						||
   std::pair<iterator, bool>
 | 
						||
     LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())),
 | 
						||
     RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
 | 
						||
   if (LeftRes.first->second == RightRes.first->second) return 0;
 | 
						||
   if (LeftRes.first->second < RightRes.first->second) return -1;
 | 
						||
   return 1;
 | 
						||
 | 
						||
Now when *cmpValues* returns 0, we can proceed the comparison procedure.
 | 
						||
Otherwise, if we get (-1 or 1), we need to pass this result to the top level,
 | 
						||
and finish comparison procedure.
 | 
						||
 | 
						||
cmpConstants
 | 
						||
------------
 | 
						||
Performs constants comparison as follows:
 | 
						||
 | 
						||
1. Compare constant types using ``cmpType`` method. If the result is -1 or 1,
 | 
						||
goto step 2, otherwise proceed to step 3.
 | 
						||
 | 
						||
2. If types are different, we still can check whether constants could be
 | 
						||
losslessly bitcasted to each other. The further explanation is modification of
 | 
						||
``canLosslesslyBitCastTo`` method.
 | 
						||
 | 
						||
   2.1 Check whether constants are of the first class types
 | 
						||
   (``isFirstClassType`` check):
 | 
						||
 | 
						||
   2.1.1. If both constants are *not* of the first class type: return result
 | 
						||
   of ``cmpType``.
 | 
						||
 | 
						||
   2.1.2. Otherwise, if left type is not of the first class, return -1. If
 | 
						||
   right type is not of the first class, return 1.
 | 
						||
 | 
						||
   2.1.3. If both types are of the first class type, proceed to the next step
 | 
						||
   (2.1.3.1).
 | 
						||
 | 
						||
   2.1.3.1. If types are vectors, compare their bitwidth using the
 | 
						||
   *cmpNumbers*. If result is not 0, return it.
 | 
						||
 | 
						||
   2.1.3.2. Different types, but not a vectors:
 | 
						||
 | 
						||
   * if both of them are pointers, good for us, we can proceed to step 3.
 | 
						||
   * if one of types is pointer, return result of *isPointer* flags
 | 
						||
     comparison (*cmpFlags* operation).
 | 
						||
   * otherwise we have no methods to prove bitcastability, and thus return
 | 
						||
     result of types comparison (-1 or 1).
 | 
						||
 | 
						||
Steps below are for the case when types are equal, or case when constants are
 | 
						||
bitcastable:
 | 
						||
 | 
						||
3. One of constants is a "*null*" value. Return the result of
 | 
						||
``cmpFlags(L->isNullValue, R->isNullValue)`` comparison.
 | 
						||
 | 
						||
4. Compare value IDs, and return result if it is not 0:
 | 
						||
 | 
						||
.. code-block:: c++
 | 
						||
 | 
						||
  if (int Res = cmpNumbers(L->getValueID(), R->getValueID()))
 | 
						||
    return Res;
 | 
						||
 | 
						||
5. Compare the contents of constants. The comparison depends on the kind of
 | 
						||
constants, but on this stage it is just a lexicographical comparison. Just see
 | 
						||
how it was described in the beginning of "*Functions comparison*" paragraph.
 | 
						||
Mathematically, it is equal to the next case: we encode left constant and right
 | 
						||
constant (with similar way *bitcode-writer* does). Then compare left code
 | 
						||
sequence and right code sequence.
 | 
						||
 | 
						||
compare(const BasicBlock*, const BasicBlock*)
 | 
						||
---------------------------------------------
 | 
						||
Compares two *BasicBlock* instances.
 | 
						||
 | 
						||
It enumerates instructions from left *BB* and right *BB*.
 | 
						||
 | 
						||
1. It assigns serial numbers to the left and right instructions, using
 | 
						||
``cmpValues`` method.
 | 
						||
 | 
						||
2. If one of left or right is *GEP* (``GetElementPtr``), then treat *GEP* as
 | 
						||
greater than other instructions. If both instructions are *GEPs* use ``cmpGEP``
 | 
						||
method for comparison. If result is -1 or 1, pass it to the top-level
 | 
						||
comparison (return it).
 | 
						||
 | 
						||
   3.1. Compare operations. Call ``cmpOperation`` method. If result is -1 or
 | 
						||
   1, return it.
 | 
						||
 | 
						||
   3.2. Compare number of operands, if result is -1 or 1, return it.
 | 
						||
 | 
						||
   3.3. Compare operands themselves, use ``cmpValues`` method. Return result
 | 
						||
   if it is -1 or 1.
 | 
						||
 | 
						||
   3.4. Compare type of operands, using ``cmpType`` method. Return result if
 | 
						||
   it is -1 or 1.
 | 
						||
 | 
						||
   3.5. Proceed to the next instruction.
 | 
						||
 | 
						||
4. We can finish instruction enumeration in 3 cases:
 | 
						||
 | 
						||
   4.1. We reached the end of both left and right basic-blocks. We didn't
 | 
						||
   exit on steps 1-3, so contents are equal, return 0.
 | 
						||
 | 
						||
   4.2. We have reached the end of the left basic-block. Return -1.
 | 
						||
 | 
						||
   4.3. Return 1 (we reached the end of the right basic block).
 | 
						||
 | 
						||
cmpGEP
 | 
						||
------
 | 
						||
Compares two GEPs (``getelementptr`` instructions).
 | 
						||
 | 
						||
It differs from regular operations comparison with the only thing: possibility
 | 
						||
to use ``accumulateConstantOffset`` method.
 | 
						||
 | 
						||
So, if we get constant offset for both left and right *GEPs*, then compare it as
 | 
						||
numbers, and return comparison result.
 | 
						||
 | 
						||
Otherwise treat it like a regular operation (see previous paragraph).
 | 
						||
 | 
						||
cmpOperation
 | 
						||
------------
 | 
						||
Compares instruction opcodes and some important operation properties.
 | 
						||
 | 
						||
1. Compare opcodes, if it differs return the result.
 | 
						||
 | 
						||
2. Compare number of operands. If it differs – return the result.
 | 
						||
 | 
						||
3. Compare operation types, use *cmpType*. All the same – if types are
 | 
						||
different, return result.
 | 
						||
 | 
						||
4. Compare *subclassOptionalData*, get it with ``getRawSubclassOptionalData``
 | 
						||
method, and compare it like a numbers.
 | 
						||
 | 
						||
5. Compare operand types.
 | 
						||
 | 
						||
6. For some particular instructions, check equivalence (relation in our case) of
 | 
						||
some significant attributes. For example, we have to compare alignment for
 | 
						||
``load`` instructions.
 | 
						||
 | 
						||
O(log(N))
 | 
						||
---------
 | 
						||
Methods described above implement order relationship. And latter, could be used
 | 
						||
for nodes comparison in a binary tree. So we can organize functions set into
 | 
						||
the binary tree and reduce the cost of lookup procedure from
 | 
						||
O(N*N) to O(log(N)).
 | 
						||
 | 
						||
Merging process, mergeTwoFunctions
 | 
						||
==================================
 | 
						||
Once *MergeFunctions* detected that current function (*G*) is equal to one that
 | 
						||
were analyzed before (function *F*) it calls ``mergeTwoFunctions(Function*,
 | 
						||
Function*)``.
 | 
						||
 | 
						||
Operation affects ``FnTree`` contents with next way: *F* will stay in
 | 
						||
``FnTree``. *G* being equal to *F* will not be added to ``FnTree``. Calls of
 | 
						||
*G* would be replaced with something else. It changes bodies of callers. So,
 | 
						||
functions that calls *G* would be put into ``Deferred`` set and removed from
 | 
						||
``FnTree``, and analyzed again.
 | 
						||
 | 
						||
The approach is next:
 | 
						||
 | 
						||
1. Most wished case: when we can use alias and both of *F* and *G* are weak. We
 | 
						||
make both of them with aliases to the third strong function *H*. Actually *H*
 | 
						||
is *F*. See below how it's made (but it's better to look straight into the
 | 
						||
source code). Well, this is a case when we can just replace *G* with *F*
 | 
						||
everywhere, we use ``replaceAllUsesWith`` operation here (*RAUW*).
 | 
						||
 | 
						||
2. *F* could not be overridden, while *G* could. It would be good to do the
 | 
						||
next: after merging the places where overridable function were used, still use
 | 
						||
overridable stub. So try to make *G* alias to *F*, or create overridable tail
 | 
						||
call wrapper around *F* and replace *G* with that call.
 | 
						||
 | 
						||
3. Neither *F* nor *G* could be overridden. We can't use *RAUW*. We can just
 | 
						||
change the callers: call *F* instead of *G*.  That's what
 | 
						||
``replaceDirectCallers`` does.
 | 
						||
 | 
						||
Below is a detailed body description.
 | 
						||
 | 
						||
If “F” may be overridden
 | 
						||
------------------------
 | 
						||
As follows from ``mayBeOverridden`` comments: “whether the definition of this
 | 
						||
global may be replaced by something non-equivalent at link time”. If so, that's
 | 
						||
ok: we can use alias to *F* instead of *G* or change call instructions itself.
 | 
						||
 | 
						||
HasGlobalAliases, removeUsers
 | 
						||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
First consider the case when we have global aliases of one function name to
 | 
						||
another. Our purpose is  make both of them with aliases to the third strong
 | 
						||
function. Though if we keep *F* alive and without major changes we can leave it
 | 
						||
in ``FnTree``. Try to combine these two goals.
 | 
						||
 | 
						||
Do stub replacement of *F* itself with an alias to *F*.
 | 
						||
 | 
						||
1. Create stub function *H*, with the same name and attributes like function
 | 
						||
*F*. It takes maximum alignment of *F* and *G*.
 | 
						||
 | 
						||
2. Replace all uses of function *F* with uses of function *H*. It is the two
 | 
						||
steps procedure instead. First of all, we must take into account, all functions
 | 
						||
from whom *F* is called would be changed: since we change the call argument
 | 
						||
(from *F* to *H*). If so we must to review these caller functions again after
 | 
						||
this procedure. We remove callers from ``FnTree``, method with name
 | 
						||
``removeUsers(F)`` does that (don't confuse with ``replaceAllUsesWith``):
 | 
						||
 | 
						||
   2.1. ``Inside removeUsers(Value*
 | 
						||
   V)`` we go through the all values that use value *V* (or *F* in our context).
 | 
						||
   If value is instruction, we go to function that holds this instruction and
 | 
						||
   mark it as to-be-analyzed-again (put to ``Deferred`` set), we also remove
 | 
						||
   caller from ``FnTree``.
 | 
						||
 | 
						||
   2.2. Now we can do the replacement: call ``F->replaceAllUsesWith(H)``.
 | 
						||
 | 
						||
3. *H* (that now "officially" plays *F*'s role) is replaced with alias to *F*.
 | 
						||
Do the same with *G*: replace it with alias to *F*. So finally everywhere *F*
 | 
						||
was used, we use *H* and it is alias to *F*, and everywhere *G* was used we
 | 
						||
also have alias to *F*.
 | 
						||
 | 
						||
4. Set *F* linkage to private. Make it strong :-)
 | 
						||
 | 
						||
No global aliases, replaceDirectCallers
 | 
						||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
If global aliases are not supported. We call ``replaceDirectCallers``. Just
 | 
						||
go through all calls of *G* and replace it with calls of *F*. If you look into
 | 
						||
the method you will see that it scans all uses of *G* too, and if use is callee
 | 
						||
(if user is call instruction and *G* is used as what to be called), we replace
 | 
						||
it with use of *F*.
 | 
						||
 | 
						||
If “F” could not be overridden, fix it!
 | 
						||
"""""""""""""""""""""""""""""""""""""""
 | 
						||
 | 
						||
We call ``writeThunkOrAlias(Function *F, Function *G)``. Here we try to replace
 | 
						||
*G* with alias to *F* first. The next conditions are essential:
 | 
						||
 | 
						||
* target should support global aliases,
 | 
						||
* the address itself of  *G* should be not significant, not named and not
 | 
						||
  referenced anywhere,
 | 
						||
* function should come with external, local or weak linkage.
 | 
						||
 | 
						||
Otherwise we write thunk: some wrapper that has *G's* interface and calls *F*,
 | 
						||
so *G* could be replaced with this wrapper.
 | 
						||
 | 
						||
*writeAlias*
 | 
						||
 | 
						||
As follows from *llvm* reference:
 | 
						||
 | 
						||
“Aliases act as *second name* for the aliasee value”. So we just want to create
 | 
						||
a second name for *F* and use it instead of *G*:
 | 
						||
 | 
						||
1. create global alias itself (*GA*),
 | 
						||
 | 
						||
2. adjust alignment of *F* so it must be maximum of current and *G's* alignment;
 | 
						||
 | 
						||
3. replace uses of *G*:
 | 
						||
 | 
						||
   3.1. first mark all callers of *G* as to-be-analyzed-again, using
 | 
						||
   ``removeUsers`` method (see chapter above),
 | 
						||
 | 
						||
   3.2. call ``G->replaceAllUsesWith(GA)``.
 | 
						||
 | 
						||
4. Get rid of *G*.
 | 
						||
 | 
						||
*writeThunk*
 | 
						||
 | 
						||
As it written in method comments:
 | 
						||
 | 
						||
“Replace G with a simple tail call to bitcast(F). Also replace direct uses of G
 | 
						||
with bitcast(F). Deletes G.”
 | 
						||
 | 
						||
In general it does the same as usual when we want to replace callee, except the
 | 
						||
first point:
 | 
						||
 | 
						||
1. We generate tail call wrapper around *F*, but with interface that allows use
 | 
						||
it instead of *G*.
 | 
						||
 | 
						||
2. “As-usual”: ``removeUsers`` and ``replaceAllUsesWith`` then.
 | 
						||
 | 
						||
3. Get rid of *G*.
 | 
						||
 | 
						||
 |