Original commit msg:
add the 'alloc' metadata node to represent the size and offset of buffers pointed to by pointers.
This metadata can be attached to any instruction returning a pointer.
llvm-svn: 158688
Based on review discussion of r158638 with Chandler Carruth, Tobias von Koch, and Duncan Sands, and on a -Wmaybe-uninitialized warning from GCC.
llvm-svn: 158685
This patch changes the type used to hold the FU bitset from unsigned to uint64_t.
This will be needed for some upcoming PowerPC itineraries.
llvm-svn: 158679
It always returns the iterator for the first inserted element, or the passed in
iterator if the inserted range was empty. Flesh out the unit test more and fix
all the cases it uncovered so far.
llvm-svn: 158645
SmallDenseMap::swap.
First, make it parse cleanly. Yay for uninstantiated methods.
Second, make the inline-buckets case work correctly. This is way
trickier than it should be due to the uninitialized values in empty and
tombstone buckets.
Finally fix a few typos that caused construction/destruction mismatches
in the counting unittest.
llvm-svn: 158641
destruction and fix a bug in SmallDenseMap they caught.
This is kind of a poor-man's version of the testing that just adds the
addresses to a set on construction and removes them on destruction. We
check that double construction and double destruction don't occur.
Amusingly enough, this is enough to catch a lot of SmallDenseMap issues
because we spend a lot of time with fixed stable addresses in the inline
buffer.
The SmallDenseMap bug fix included makes grow() not double-destroy in
some cases. It also fixes a FIXME there; the code was pretty crappy. We
now don't have any wasted initialization, but we do move the entries in the
inline bucket array an extra time. It's probably a better tradeoff, and
is much easier to get correct.
llvm-svn: 158639
implementation.
This type includes an inline bucket array which is used initially. Once
it is exceeded, an array of 64 buckets is allocated on the heap. The
bucket count grows from there as needed. Some highlights of this
implementation:
- The inline buffer is very carefully aligned, and so supports types
with alignment constraints.
- It works hard to avoid aliasing issues.
- Supports types with non-trivial constructors, destructors, copy
constructions, etc. It works reasonably hard to minimize copies and
unnecessary initialization. The most common initialization is to set
keys to the empty key, and so that should be fast if at all possible.
This class has a performance / space trade-off. It tries to optimize for
relatively small maps, and so packs the inline bucket array densely into
the object. It will be marginally slower than a normal DenseMap in a few
use patterns, so it isn't appropriate everywhere.
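For illustration, a minimal usage sketch (assuming the map keeps the familiar DenseMap interface, as described above; the third template parameter is the number of inline buckets and must be a power of two):

#include "llvm/ADT/DenseMap.h"
#include <cstdio>

int main() {
  // A handful of entries live in the 4 inline buckets; once the map grows
  // past them it transparently switches to heap-allocated buckets.
  llvm::SmallDenseMap<int, const char *, 4> M;
  M[1] = "one";
  M[2] = "two";
  std::printf("%s\n", M.lookup(2)); // prints "two"
  return 0;
}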
The unit tests for DenseMap have been generalized a bit to support
running over different map implementations in addition to different
key/value types. They've then been automatically extended to cover the
new container through the magic of GoogleTest's typed tests.
All of this is still a bit rough though. I'm going to be cleaning up
some aspects of the implementation, documenting things better, and
adding tests which include non-trivial types. As soon as I'm comfortable
with the correctness, I plan to switch existing users of SmallMap over
to this class as it is already more correct w.r.t. construction and
destruction of objects in the map.
Thanks to Benjamin Kramer for all the reviews of this and the lead-up
patches. That said, more review on this would really be appreciated. As
I've noted a few times, I'm quite surprised how hard it is to get the
semantics for a hashtable-based map container with a small buffer
optimization correct. =]
llvm-svn: 158638
array of a suitable size and alignment for any of a number of different
types to be stored into the character array.
The mechanisms for producing an explicitly aligned type are fairly
complex because this operation is poorly supported on all compilers.
We've spent a fairly significant amount of time experimenting with
different implementations inside of Google, and the one using explicitly
expanded templates has been the most robust.
Credit goes to Nick Lewycky for writing the first 20 versions or so of
this logic we had inside of Google. I based this on the only one to
actually survive. In case anyone is worried, yes we are both explicitly
re-contributing and re-licensing it for LLVM. =]
Once the issues with actually specifying the alignment are finished, it
turns out that most compilers don't in turn align anything the way they
are instructed. Testing of this logic against both Clang and GCC
indicate that the alignment constraints are largely ignored by both
compilers! I've come up with and used a work-around by wrapping each
alignment-hinted type directly in a struct, and using that struct to
align the character array through a union. This elaborate hackery is
terrifying, but I've included testing that caught a terrifying number of
bugs in every other technique I've tried.
All of this in order to implement a poor C++98 programmer's emulation of
C++11 unrestricted unions in classes such as SmallDenseMap.
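A minimal sketch of the general union trick described above (illustrative only, GCC/Clang attribute syntax, not the exact LLVM code):

#include <cstddef>
#include <new>

// Wrapper struct carrying an explicit alignment hint; compilers honor the
// attribute on a struct member even when they ignore it on a raw array.
struct Aligned16Hint {
  char c __attribute__((aligned(16)));
};

// The union forces the character buffer to share the wrapper's alignment.
template <std::size_t Size> struct AlignedCharArray16 {
  union {
    Aligned16Hint aligner; // never constructed; only imposes alignment
    char buffer[Size];     // the storage clients placement-new into
  };
};

// Usage: AlignedCharArray16<sizeof(T)> storage; new (storage.buffer) T(...);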
llvm-svn: 158597
rather than the base class. Add a pile of boilerplate to indirect around
this.
This is pretty ugly, but it allows the super class to change the
representation of these values, which will be key for doing
a SmallDenseMap.
Suggestions on better method structuring / naming are welcome, but keep
in mind that SmallDenseMap won't have an 'unsigned' member to expose
a reference to... =/
llvm-svn: 158586
and a derived class that provides the allocation and growth strategy.
This is the first (and biggest) step toward building a SmallDenseMap
that actually behaves exactly the same as DenseMap, and supports all the
same types and interface points with the same semantics.
llvm-svn: 158585
since then the entire expression must equal zero (similarly for other operations
with an absorbing element). With this in place a bunch of reassociate code for
handling constants is dead since it is all taken care of when linearizing. No
intended functionality change.
llvm-svn: 158398
topologies, it is quite possible for a leaf node to have huge multiplicity, for
example: x0 = x*x, x1 = x0*x0, x2 = x1*x1, ... rapidly gives a value which is x
raised to a vast power (the multiplicity, or weight, of x). This patch fixes
the computation of weights by correctly computing them no matter how big they
are, rather than just overflowing and getting a wrong value. It turns out that
the weight for a value never needs more bits to represent than the value itself,
so it is enough to represent weights as APInts of the same bitwidth and do the
right overflow-avoiding dance steps when computing weights. As a side-effect it
reduces the number of multiplies needed in some cases of large powers. While
there, in view of external uses (eg by the vectorizer) I made LinearizeExprTree
static, pushing the rank computation out into users. This is progress towards
fixing PR13021.
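For illustration only (not the patch's actual algorithm): APInt offers overflow-aware arithmetic such as umul_ov, the kind of primitive those overflow-avoiding dance steps are built from when weights are kept at the value's own bitwidth.

#include "llvm/ADT/APInt.h"

static llvm::APInt combineWeights(const llvm::APInt &LHS, const llvm::APInt &RHS,
                                  bool &Overflow) {
  // Multiply and report overflow instead of silently wrapping.
  llvm::APInt Product = LHS.umul_ov(RHS, Overflow);
  // A real implementation must handle Overflow == true with a reduction step;
  // the caller decides what to do here.
  return Product;
}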
llvm-svn: 158358
thread local data, embed them in the class using a uint64_t and make sure
we get compiler errors if there's a platform where this is not big enough.
This makes ThreadLocal more safe for using it in conjunction with CrashRecoveryContext.
Related to crash in rdar://11434201.
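An illustrative sketch of the compile-error guard mentioned above (not the exact LLVM code): a negative-size array typedef breaks the build on any platform whose native TLS key does not fit in the embedded uint64_t.

#include <pthread.h>
#include <stdint.h>

struct ThreadLocalStorage {
  uint64_t data; // must be large enough to hold a pthread_key_t
};

typedef char AssertKeyFits[sizeof(pthread_key_t) <= sizeof(uint64_t) ? 1 : -1];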
llvm-svn: 158342
This showed up the first time rend() was called on a bundled instruction
in the Mips backend.
Also avoid dereferencing end() in bundle_iterator::operator++().
We still don't have a place to put unit tests for this stuff.
llvm-svn: 158310
The LiveRegMatrix represents the live range of assigned virtual
registers in a Live interval union per register unit. This is not
fundamentally different from the interference tracking in RegAllocBase
that both RABasic and RAGreedy use.
The important differences are:
- LiveRegMatrix tracks interference per register unit instead of per
physical register. This makes interference checks cheaper and
assignments slightly more expensive. For example, the ARM D7 register
has 24 aliases, so we would check 24 physregs before assigning to one.
With unit-based interference, we check 2 units before assigning to 2
units.
- LiveRegMatrix caches regmask interference checks. That is currently
duplicated functionality in RABasic and RAGreedy.
- LiveRegMatrix is a pass which makes it possible to insert
target-dependent passes between register allocation and rewriting.
Such passes could tweak the register assignments with interference
checking support from LiveRegMatrix.
Eventually, RABasic and RAGreedy will be switched to LiveRegMatrix.
llvm-svn: 158255
OK, not really. We don't want to reintroduce the old rewriter hacks.
This patch extracts virtual register rewriting as a separate pass that
runs after the register allocator. This is possible now that
CodeGen/Passes.cpp can configure the full optimizing register allocator
pipeline.
The rewriter pass uses register assignments in VirtRegMap to rewrite
virtual registers to physical registers, and it inserts kill flags based
on live intervals.
These finalization steps are the same for the optimizing register
allocators: RABasic, RAGreedy, and PBQP.
llvm-svn: 158244
The commit is intended to fix rdar://11540023.
It is implemented as part of peephole optimization. We can actually implement
this in the SelectionDAG lowering phase.
llvm-svn: 158122
LLVM is now -Wunused-private-field clean except for
- lib/MC/MCDisassembler/Disassembler.h. Not sure why it keeps all those inaccessible fields.
- gtest.
llvm-svn: 158096
There are some that I didn't remove this round because they looked like
obvious stubs. There are dead variables in gtest too, they should be
fixed upstream.
llvm-svn: 158090
Don't print out the register number and spill weight, making the TRI
argument unnecessary.
This allows callers to interpret the reg field. It can currently be a
virtual register, a physical register, a spill slot, or a register unit.
llvm-svn: 158031
Instead of computing a live interval per physreg, LiveIntervals can
compute live intervals per register unit. This eliminates the confusing
situation where aliasing registers could have overlapping live intervals.
It should also make fixed interference checking cheaper
since registers have fewer register units than aliases.
Live intervals for regunits are computed on demand, using MRI use-def
chains and the new LiveRangeCalc class. Only regunits live in to ABI
blocks are precomputed during LiveIntervals::runOnMachineFunction().
The regunit liveness computations don't depend on LiveVariables.
llvm-svn: 158029
expression (a * b + c) that can be implemented as a fused multiply-add (fma)
if the target determines that this will be more efficient. This intrinsic
will be used to implement FP_CONTRACT support and an aggressive FMA formation
mode.
If your target has a fast FMA instruction you should override the
isFMAFasterThanMulAndAdd method in TargetLowering to return true.
llvm-svn: 158014
Changed the type of the Items collection from std::vector to std::list.
Also made some small fixes in IntegersSubset.h, IntegersSubsetMapping.h, and IntegersSubsetTest.cpp.
llvm-svn: 157987
This allows a subtarget to explicitly specify the issue width and
other properties without providing pipeline stage details for every
instruction.
llvm-svn: 157979
valid itinerary but no pipeline stages.
An itinerary can contain useful scheduling information without specifying pipeline stages for each instruction.
llvm-svn: 157977
It is an old function that does a lot more than required by
CalcSpillWeights, which was the only remaining caller.
The isRematerializable() function never actually sets the isLoad
argument, so don't try to compute that.
llvm-svn: 157973
IntegersSubsetGeneric, IntegersSubsetMapping: added an IntTy template parameter that allows using either APInt or IntItem. This change makes it possible to write unit tests for these classes.
llvm-svn: 157880
These functions exposed the layout of the underlying data tables as
null-terminated uint16_t arrays.
Use the new MCSubRegIterator, MCSuperRegIterator, and MCRegAliasIterator
classes instead.
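A hedged sketch of the replacement idiom (class names from the message above; constructor details may differ between LLVM versions): walk every register that aliases Reg, including Reg itself, without touching the raw uint16_t tables.

#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/ADT/BitVector.h"

static bool aliasesReserved(const llvm::MCRegisterInfo &MCRI,
                            const llvm::BitVector &Reserved, unsigned Reg) {
  for (llvm::MCRegAliasIterator AI(Reg, &MCRI, /*IncludeSelf=*/true);
       AI.isValid(); ++AI)
    if (Reserved.test(*AI))
      return true;
  return false;
}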
llvm-svn: 157855
No functional change intended.
Sorry for the churn. The iterator classes are supposed to help avoid
giant commits like this one in the future. The TableGen-produced
register lists are getting quite large, and it may be necessary to
change the table representation.
This makes it possible to do so without changing all clients (again).
llvm-svn: 157854
IntegersSubset was divided into IntegersSubsetGeneric and IntegersSubset itself. The former has no references to ConstantInt and works with IntItem only.
IntegersSubsetMapping was also made generic. A second template parameter, "IntegersSubsetTy", was added that allows using one of the two IntegersSubset types described above.
llvm-svn: 157815
IntItem cleanup. IntItemBase, IntItemConstantIntImp, and IntItem were merged into IntItem. All arithmetic operators were propagated from APInt. Also added the comparison operators <, >, <=, >=. Currently you will find a set of macros at the beginning of IntegersSubset that propagates operators from APInt to IntItem. Note that THESE MACROS WILL BE REMOVED after all passes are case-ranges compatible. Also note that these macros are a much smaller pain than something like this:
if (V->getValue().ugt(AnotherV->getValue())) { ... }
These changes make IntItem a full-featured integer object. That makes it possible to make the IntegersSubset class generic (move out all ConstantInt references and add unit tests) in the next commits.
llvm-svn: 157810
This patch will optimize the following
movq %rdi, %rax
subq %rsi, %rax
cmovsq %rsi, %rdi
movq %rdi, %rax
to
cmpq %rsi, %rdi
cmovsq %rsi, %rdi
movq %rdi, %rax
Perform this optimization if the actual result of SUB is not used.
rdar://11540023
llvm-svn: 157755
Reg-units are named after their root registers, and most units have a
single root, so they simply print as 'AL', 'XMM0', etc. The rare dual
root reg-units print as FPSCR~FPSCR_NZCV, FP0~ST7, ...
The printing piggybacks on the existing register name tables, so no
extra const data space is required.
llvm-svn: 157754
Each register unit has one or two root registers. The full set of
registers containing a given register unit can be computed as the union
of the root registers and their super-registers.
Provide an MCRegUnitRootIterator class to enumerate the roots.
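A hedged sketch of the computation described above (iterator names from the message; exact signatures may vary by version): the registers containing a unit are the unit's roots plus all super-registers of those roots.

#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/ADT/SmallVector.h"

static void collectContainingRegs(const llvm::MCRegisterInfo &MCRI, unsigned Unit,
                                  llvm::SmallVectorImpl<unsigned> &Regs) {
  for (llvm::MCRegUnitRootIterator Root(Unit, &MCRI); Root.isValid(); ++Root) {
    Regs.push_back(*Root); // the root itself
    for (llvm::MCSuperRegIterator Supers(*Root, &MCRI); Supers.isValid(); ++Supers)
      Regs.push_back(*Supers); // every register containing the root
  }
}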
llvm-svn: 157753
This also required making recursive simplifications until
nothing changes or a hard limit (currently 3) is hit.
With the simplification in place indvars can canonicalize
loops of the form
for (unsigned i = 0; i < a-b; ++i)
into
for (unsigned i = 0; i != a-b; ++i)
which used to fail because SCEV created a weird umax expr
for the backedge taken count.
llvm-svn: 157701
Also add subclasses MCSubRegIterator, MCSuperRegIterator, and
MCRegAliasIterator.
These iterators provide an abstract interface to the MCRegisterInfo
register lists so the internal representation can be changed without
changing all clients.
llvm-svn: 157695
Besides adding the new insertPass function, this patch uses it to
enhance the existing -print-machineinstrs so that the MachineInstrs
after a specific pass can be printed.
Patch by Bin Zeng!
llvm-svn: 157655
The register unit lists are typically much shorter than the register
overlap lists, and the backing table for register units has better cache
locality because it is smaller.
This makes llc about 0.5% faster. The regsOverlap() function isn't that hot.
llvm-svn: 157651
Register units are already used internally in TableGen to compute
register pressure sets and overlapping registers. This patch makes them
available to the code generators.
The register unit lists are differentially encoded so they can be reused
for many related registers. This keeps the total size of the lists below
200 bytes for most targets. ARM has the largest table at 560 bytes.
Add an MCRegUnitIterator for traversing the register unit lists. It
provides an abstract interface so the representation can be changed in
the future without changing all clients.
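A hedged sketch of the traversal idiom (constructor details may vary between versions): enumerate the register units of a physical register; the unit list is typically much shorter than the full alias list.

#include "llvm/MC/MCRegisterInfo.h"

static unsigned countRegUnits(const llvm::MCRegisterInfo &MCRI, unsigned PhysReg) {
  unsigned N = 0;
  for (llvm::MCRegUnitIterator Units(PhysReg, &MCRI); Units.isValid(); ++Units)
    ++N;
  return N;
}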
llvm-svn: 157650
This required light surgery on the assembler and disassembler
because the instructions use an uncommon encoding. They are
the only two instructions in x86 that use register operands
and two immediates.
llvm-svn: 157634
Attribute bits above 1<<30 are now encoded correctly. Additionally,
the encoding/decoding functionality has been hoisted to helper functions
in Attributes.h in an effort to help the encoding/decoding to stay in
sync with the Attribute bitcode definitions.
llvm-svn: 157581
Implemented IntItem - the wrapper around APInt. Why not use APInt directly right now?
1. It would be very difficult to implement case ranges as a series of small patches. We would get several large and heavy patches, each about 90-120 KB. If you replace ConstantInt with APInt in SwitchInst, you need to change all Readers, Writers, and absolutely all passes that use SwitchInst at the same time.
2. We can implement an APInt pool inside and save memory. E.g. we use several switches that work with 256-bit items (switches on signatures, or strings). We can avoid duplicated values in this case.
3. IntItem can be easily replaced with APInt.
4. Currently we can interpret IntItem both as ConstantInt and as APInt. This allows providing SwitchInst methods that work with ConstantInt for non-updated passes.
Why do I need it right now? Currently I need to update the SimplifyCFG pass (EqualityComparisons). I need to work with APInts directly a lot, so pieces of code like
ConstantInt *V = ...;
if (V->getValue().ugt(AnotherV->getValue())) {
...
}
would look awful. It is much better this way:
IntItem V = ConstantIntVal->getValue();
if (AnotherV < V) {
}
Of course any reviews are welcome.
P.S.: I'm also going to rename ConstantRangesSet to IntegersSubset, and CRSBuilder to IntegersSubsetMapping (which allows mapping individual subsets of integers to BasicBlocks).
Since in the future these classes will be founded on APInt, it will be possible to use them in more generic ways.
llvm-svn: 157576
The only missing part is insert(), which uses a pair of parameters and I haven't
figured out how to convert it to rvalue references. It's now possible to use a
DenseMap with std::unique_ptr values :)
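A hedged sketch of the use case mentioned above (exactly which operations are move-aware can vary by version): move-only values such as std::unique_ptr can now live in a DenseMap because insertion moves rather than copies.

#include "llvm/ADT/DenseMap.h"
#include <memory>
#include <string>

int main() {
  llvm::DenseMap<int, std::unique_ptr<std::string> > M;
  M[42] = std::unique_ptr<std::string>(new std::string("hello"));
  return M.count(42) == 1 ? 0 : 1;
}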
llvm-svn: 157539
to pass around a struct instead of a large set of individual values. This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.
NV_CONTRIB
llvm-svn: 157479
The Hazard checker implements in-order constraints, or interlocked
resources. Ready instructions with hazards do not enter the available
queue and are not visible to other heuristics.
The major code change is the addition of SchedBoundary to encapsulate
the state at the top or bottom of the schedule, including both a
pending and available queue.
The scheduler now counts cycles in sync with the hazard checker. These
are minimum cycle counts based on known hazards.
Targets with no itinerary (x86_64) currently remain at cycle 0. To fix
this, we need to provide some maximum issue width for all targets. We
also need to add the concept of expected latency vs. minimum latency.
llvm-svn: 157427
This helps compile time when the greedy register allocator splits live
ranges in giant functions. Without the bias, we would try to grow
regions through the giant edge bundles, usually to find out that the
region became too big and expensive.
If a live range has many uses in blocks near the giant bundle, the small
negative bias doesn't make a big difference, and we still consider
regions including the giant edge bundle.
Giant edge bundles are usually connected to landing pads or indirect
branches.
llvm-svn: 157174
This class is meant to be the primary interface for examining a live
range in the vicinity of a given instruction. It avoids all the messy
dealings with iterators and early clobbers.
This is a more abstract interface to live ranges, hiding the
implementation as a vector of segments.
llvm-svn: 157141
getUDivExpr attempts to simplify by checking for overflow.
isLoopEntryGuardedByCond then evaluates the loop predicate which
may lead to the same getUDivExpr causing endless recursion.
Fixes PR12868: clang 3.2 segmentation fault.
llvm-svn: 157092
Use a dedicated MachO load command to annotate data-in-code regions.
This is the same format the linker produces for final executable images,
allowing consistency of representation and use of introspection tools
for both object and executable files.
Data-in-code regions are annotated via ".data_region"/".end_data_region"
directive pairs, with an optional region type.
data_region_directive := ".data_region" { region_type }
region_type := "jt8" | "jt16" | "jt32" | "jta32"
end_data_region_directive := ".end_data_region"
The previous handling of ARM-style "$d.*" labels was broken and has
been removed. Specifically, it didn't handle ARM vs. Thumb mode when
marking the end of the section.
rdar://11459456
llvm-svn: 157062
Many targets always use the same bitwise encoding value for physical
registers in all (or most) instructions. Add this mapping to the
.td files and TableGen'erate the information and expose an accessor
in MCRegisterInfo.
Patch by Tom Stellard.
llvm-svn: 156829
so that it can be reused in MemCpyOptimizer. This analysis is needed to remove
an unnecessary memcpy when returning a struct into a local variable.
rdar://11341081
PR12686
llvm-svn: 156776
Returning a temporary BitVector is very expensive. If you must, create
the temporary explicitly: Use BitVector(A).flip() instead of ~A.
llvm-svn: 156768
These operators were crazy slow, calling malloc to return a temporary
result. At the same time, they look very innocent when used in code.
If you need temporary BitVectors to compute your thing, create them
explicitly, and use the inplace logical operators. This makes the high
cost explicit in the code.
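A sketch of the recommended style: compute A & ~B with the in-place operators, making the one copy we really need explicit at the call site.

#include "llvm/ADT/BitVector.h"

static llvm::BitVector andNot(const llvm::BitVector &A, const llvm::BitVector &B) {
  llvm::BitVector Result(A); // explicit temporary instead of a hidden malloc
  llvm::BitVector NotB(B);
  NotB.flip();               // in-place complement instead of operator~
  Result &= NotB;            // in-place AND instead of operator&
  return Result;
}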
llvm-svn: 156767
Ordinary patch for PR1255.
Added new case-ranges oriented methods for adding/removing cases in SwitchInst. After this patch, cases are internally represented as ConstantArrays instead of ConstantInts; externally, cases are wrapped in the ConstantRangesSet object.
The old SwitchInst methods still work, but are marked as deprecated. So at this stage there are no side effects, except that I added support for case ranges in BitcodeReader/Writer; of course, a test for the Bitcode is also added. The old "switch" format is also supported.
llvm-svn: 156704
This lets you save the textual representation of the LLVM IR to a file.
Before this patch it could only be printed to STDERR from llvm-c.
Patch by Carlo Kok!
llvm-svn: 156479
Added new case-ranges oriented methods for adding/removing cases in SwitchInst. After this patch, cases are internally represented as ConstantArrays instead of ConstantInts; externally, cases are wrapped in the ConstantRangesSet object.
The old SwitchInst methods still work, but are marked as deprecated. So at this stage there are no side effects, except that I added support for case ranges in BitcodeReader/Writer; of course, a test for the Bitcode is also added. The old "switch" format is also supported.
llvm-svn: 156374
The getPointerRegClass() hook can return register classes that depend on
the calling convention of the current function (ptr_rc_tailcall).
So far, we have been able to infer the calling convention from the
subtarget alone, but as we add support for multiple calling conventions
per target, that no longer works.
Patch by Yiannis Tsiouris!
llvm-svn: 156328
This function is a generalization of getMatchingSuperRegClass() to the
symmetric case where both sides are using a sub-register index. It will
find a super-register class and sub-register indexes that make this
diagram commute:
               PreA
   SuperRC ----------> RCA
      |                 |
      |                 |
 PreB |                 | SubA
      |                 |
      |                 |
      V                 V
     RCB  ----------> SubRC
               SubB
This can be used to coalesce copies like:
%vreg1:sub16 = COPY %vreg2:sub16; GR64:%vreg1, GR32: %vreg2
llvm-svn: 156317
This will be used to determine whether it's profitable to turn a select into a
branch when the branch is likely to be predicted.
Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM.
I'm not entirely happy with the name of this flag, suggestions welcome ;)
llvm-svn: 156233
add a new Region::block_iterator which actually iterates over the basic
blocks of the region.
The old iterator, now called 'block_node_iterator', iterates over
RegionNodes which contain a single basic block. This works well with the
GraphTraits-based iterator design, however most users actually want an
iterator over the BasicBlocks inside these RegionNodes. Now the
'block_iterator' is a wrapper which exposes exactly this interface.
Internally it uses the block_node_iterator to walk all nodes which are
single basic blocks, but transparently unwraps the basic block to make
user code simpler.
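A hedged sketch of the simplified client code this enables (header location and constness details may differ between versions): iterate the basic blocks of a region directly, without unwrapping RegionNodes by hand.

#include "llvm/Analysis/RegionInfo.h"

static unsigned countRegionBlocks(llvm::Region &R) {
  unsigned N = 0;
  for (llvm::Region::block_iterator I = R.block_begin(), E = R.block_end();
       I != E; ++I) {
    llvm::BasicBlock *BB = *I;
    (void)BB; // a real client would inspect BB here
    ++N;
  }
  return N;
}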
While this patch is a bit of a wash, most of the updates are to internal
users, not external users of the RegionInfo. I have an accompanying
patch to Polly that is a strict simplification of every user of this
interface, and I'm working on a pass that also wants the same simplified
interface.
This patch alone should have no functional impact.
llvm-svn: 156202
The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX
The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.
NV_CONTRIB
llvm-svn: 156196
of the CodeExtractor utility. This allows speculatively computing input
and output sets to measure the likely size impact of the code
extraction.
These sets cannot be reused sadly -- we mutate the function prior to
forming the final sets used by the actual extraction.
The interface has been revamped slightly to make it easier to use
correctly by making the interface const and sinking the computation of
the number of exit blocks into the full extraction function and away
from the rest of this logic which just computed two output parameters.
llvm-svn: 156168
and expose it as a utility class rather than as free function wrappers.
The simple free-function interface works well for the bugpoint-specific
pass's uses of code extraction, but in an upcoming patch for more
advanced code extraction, they simply don't expose a rich enough
interface. I need to expose various stages of the process of doing the
code extraction and query information to decide whether or not to
actually complete the extraction or give up.
Rather than build up a new predicate model and pass that into these
functions, just take the class that was actually implementing the
functions and lift it up into a proper interface that can be used to
perform code extraction. The interface is cleaned up and re-documented
to work better in a header. It also is now setup to accept the blocks to
be extracted in the constructor rather than in a method.
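A hedged sketch of the intended client pattern (constructor arguments, header location, and the exact extractCodeRegion signature may differ between LLVM versions): construct the extractor from the blocks up front, query eligibility, then perform the extraction.

#include "llvm/Transforms/Utils/CodeExtractor.h"

static llvm::Function *tryExtract(llvm::ArrayRef<llvm::BasicBlock *> Blocks,
                                  llvm::DominatorTree *DT) {
  llvm::CodeExtractor CE(Blocks, DT);
  if (!CE.isEligible())  // give up early if the region cannot be extracted
    return 0;
  return CE.extractCodeRegion();
}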
In passing this essentially reverts my previous commit here exposing
a block-level query for eligibility of extraction. That is no longer
necessary with the more rich interface as clients can query the
extraction object for eligibility directly. This will reduce the number
of walks of the input basic block sequence by quite a bit which is
useful if this enters the normal optimization pipeline.
llvm-svn: 156163
This manually enumerated list of super-register classes has been
superseded by the automatically computed super-register class masks
available through SuperRegClassIterator.
llvm-svn: 156151
The masks returned by SuperRegClassIterator are computed automatically
by TableGen. This is better than depending on the manually specified
SuperRegClasses.
llvm-svn: 156147
This iterator class provides a more abstract interface to the (Idx,
Mask) lists of super-registers for a register class. The layout of the
tables shouldn't be exposed to clients.
llvm-svn: 156144
minor behavior changes with this, but nothing I have seen evidence of in
the wild or expect to be meaningful. The real goal is unifying our logic
and simplifying the interfaces. A summary of the changes follows:
- Make 'callIsSmall' actually accept a callsite so it can handle
intrinsics, and simplify callers appropriately.
- Nuke a completely bogus declaration of 'callIsSmall' that was still
lurking in InlineCost.h... No idea how this got missed.
- Teach the 'isInstructionFree' about the various more intelligent
'free' heuristics that got added to the inline cost analysis during
review and testing. This mostly surrounds int->ptr and ptr->int casts.
- Switch most of the interesting parts of the inline cost analysis that
were essentially computing 'is this instruction free?' to use the code
metrics routine instead. This way we won't keep duplicating logic.
All of this is motivated by the desire to allow other passes to compute
a roughly equivalent 'cost' metric for a particular basic block as the
inline cost analysis. Sadly, re-using the same analysis for both is
really messy because only the actual inline cost analysis is ever going
to go to the contortions required for simplification, SROA analysis,
etc.
llvm-svn: 156140
but using a FoldingSet underneath and with a largely compatible
interface to that of FoldingSet. This can be used anywhere a FoldingSet
would be natural, but iteration order is significant. The initial
intended use case is in Clang's template specialization lists to
preserve instantiation order iteration.
llvm-svn: 156131
This is a pointer into one of the tables used by
getMatchingSuperRegClass(). It makes it possible to use a shared
implementation of that function.
llvm-svn: 156121
extraction into a public interface. Also clean it up and apply it more
consistently such that we check for landing pads *anywhere* in the
extracted code, not just in single-block extraction.
This will be used to guide decisions in passes that are planning to
eventually perform a round of code extraction.
llvm-svn: 156114
Some targets have no sub-registers at all. Use the TargetRegisterInfo
versions of composeSubRegIndices(), getSubClassWithSubReg(), and
getMatchingSuperRegClass() for those targets.
llvm-svn: 156075
This ensures that virtual registers always belong to an allocatable class.
If your target attempts to create a vreg for an operand that has no
allocatable register subclass, you will crash quickly.
This ensures that targets define register classes as intended.
llvm-svn: 156046
This avoids warnings when included in an application that
uses -Wstrict-prototypes.
e.g.: AsmPrinters.def:27:1: warning: function declaration isn't a prototype [-Wstrict-prototypes]
llvm-svn: 155997
Note that support for rvalue references does not imply support
for the full set of move-related STL operations.
I've preserved support for an odd little thing in insert() where
we're trying to support inserting a new element from an existing
one. If we actually want to support that, there's a lot more we
need to do: insert can call either grow or push_back, neither of
which is safe against this particular use pattern.
llvm-svn: 155979
The TargetPassManager's default constructor wants to initialize the PassManager
to 'null'. But it's illegal to bind a null reference to a null l-value. Make the
ivar a pointer instead.
PR12468
llvm-svn: 155902
Allow the "SplitCriticalEdge" function to split the edge to a landing pad. If
the pass is *sure* that it thinks it knows what it's doing, then it may go ahead
and specify that the landing pad can have its critical edge split. The loop
unswitch pass is one of these passes. It will split the critical edges of all
edges coming from a loop to a landing pad not within the loop. Doing so will
retain important loop analysis information, such as loop simplify.
llvm-svn: 155817
This way we can enable the POD-like class optimization for a lot more classes,
saving ~120k of code in clang/i386/Release+Asserts when selfhosting.
llvm-svn: 155761
- FlatArrayMap. A very simple map container that uses a flat array inside.
- MultiImplMap. A map container interface that has two modes: one for a small number of elements and one for a large number.
- SmallMap. SmallMap is a DenseMap-compatible MultiImplMap. It uses FlatArrayMap for the small mode and DenseMap for the big mode.
Also added unit tests for the new classes and an update to the ProgrammersManual.
For more details about the new classes, see the ProgrammersManual and the comments in the source code.
llvm-svn: 155557
When an instruction match is found, but the subtarget features it
requires are not available (missing floating point unit, or thumb vs arm
mode, for example), issue a diagnostic that identifies what the feature
mismatch is.
rdar://11257547
llvm-svn: 155499
Strategy.
0. Implement the new classes. The classes don't affect anything; they still work with ConstantInt-based values at this stage.
1. Fictitious replacement of the current ConstantInt case values with ConstantRangesSet. The case ranges set will still hold a single value, and ConstantInt *getCaseValue() will return it. Additionally, implement a new method in SwitchInst that allows working with case ranges. Currently I think it should be some wrapper that returns either a single value or a ConstantRangesSet object.
2. Step-by-step replacement of the old "ConstantInt* getCaseValue()" with the new alternative. Modify the algorithms of all passes that work with SwitchInst, but don't modify LLParser or BitcodeReader/Writer. Each ConstantRangesSet object still holds a single value. At this stage some parts of LLVM will use the old-style methods, and some the new style.
3. After all getCaseValue() usages have been removed and the whole of LLVM and its clients work in the new style, modify LLParser, Reader, and Writer. Remove getCaseValue().
4. Replace ConstantInt*-based case ranges set items with APInt ones.
Currently we are at stage zero: the new classes.
ConstantRangesSet.
I selected ConstantArray as the case ranges set "holder" object (it is a temporary decision; I'll explain why below). The array items may be ConstantVectors with a single item, or ConstantVectors with two items (meaning a single number and a range, respectively).
ConstantInt will be used as the basic value representation. It will be replaced with APInt later. Of course, ConstantArray and ConstantVector will go away after the ConstantInt => APInt replacement.
Mandatory features of the new class:
- A bool isSatisfies(ConstantInt *V) method (needs a better name?). Returns true if the given value satisfies this case.
- Enumeration of the case's ranges and values. In some passes we need to analyze each case (SwitchLowering, for example).
Factory + unified clusterify.
I also propose to implement a factory that allows building the case object in a user-friendly way. For now I have called it CRSBuilder.
Currently I implemented a factory that allows adding and removing range + successor pairs. It also allows adding an existing ConstantRangesSet, decompiling it into separate ranges. The factory can emit either a set of clusters (single case range + successor) or a set of "ConstantRangesSet + Successor" pairs.
So you can use it either as a builder of a new case set for SwitchInst, or for clusterification of an existing case set.
Just call Factory.optimize() and it emits an optimized and sorted cluster collection for you!
I tested clusterification in SelectionDAGBuilder - it works fine. Don't worry, it is not included in this patch; this patch contains just the new classes.
The factory is a template. There are two parameters: SuccessorClass and IsReadonly. So you can specify which successor you need (BB or MBB), and you can also restrict your factory to use values in read-only mode (SelectionDAGBuilder needs IsReadonly=true). A read-only factory cannot build the case ranges.
llvm-svn: 155464
on X86 Atom. Some of our tests failed because the tail merging part of
the BranchFolding pass was creating new basic blocks which did not
contain live-in information. When the anti-dependency code in the Post-RA
scheduler ran, it would sometimes rename the register containing
the function return value because the fact that the return value was
live-in to the subsequent block had been lost. To fix this, it is necessary
to run the RegisterScavenging code in the BranchFolding pass.
This patch makes sure that the register scavenging code is invoked
in the X86 subtarget only when post-RA scheduling is being done.
Post RA scheduling in the X86 subtarget is only done for Atom.
This patch adds a new function to the TargetRegisterClass to control
whether or not live-ins should be preserved during branch folding.
This is necessary in order for the anti-dependency optimizations done
during the PostRASchedulerList pass to work properly when doing
Post-RA scheduling for the X86 in general and for the Intel Atom in particular.
The patch adds and invokes the new function trackLivenessAfterRegAlloc()
instead of using the existing requiresRegisterScavenging().
It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of
requiresRegisterScavenging(). It changes the all the targets that
implemented requiresRegisterScavenging() to also implement
trackLivenessAfterRegAlloc().
It adds an assertion in the Post RA scheduler to make sure that post RA
liveness information is available when it is needed.
It changes the X86 break-anti-dependencies test to use -mcpu=atom, in order
to avoid running into the added assertion.
Finally, this patch restores the use of anti-dependency checking
(which was turned off temporarily for the 3.1 release) for
Intel Atom in the Post RA scheduler.
Patch by Andy Zhang!
Thanks to Jakob and Anton for their reviews.
llvm-svn: 155395
test suite failures. The failures occur at each stage, and only get
worse, so I'm reverting all of them.
Please resubmit these patches, one at a time, after verifying that the
regression test suite passes. Never submit a patch without running the
regression test suite.
llvm-svn: 155372
The problem is that the struct file_status on UNIX systems has two
members called st_dev and st_ino; those are also members of the
struct stat, and they are reserved identifiers which can also be
provided as #define (and this is the case for st_dev on Hurd).
The solution (attached) is to rename them, for example adding a
"fs_" prefix (= file status) to them.
Patch by Pino Toscano
llvm-svn: 155354
It set NumLowBitsAvailable = 3, which may not be true on all platforms. We only
ever use 2 bits (the default), so this assumption can be safely removed.
Should fix PR12612.
llvm-svn: 155288
Now that multiple DAGUpdateListeners can be active at the same time,
ISelPosition can become a local variable in DoInstructionSelection.
We simply register an ISelUpdater with CurDAG while ISelPosition exists.
llvm-svn: 155249
Instead of passing listener pointers to RAUW, let SelectionDAG itself
keep a linked list of interested listeners.
This makes it possible to have multiple listeners active at once, like
RAUWUpdateListener was already doing. It also makes it possible to
register listeners up the call stack without controlling all RAUW calls
below.
DAGUpdateListener uses an RAII pattern to add itself to the SelectionDAG
list of active listeners.
llvm-svn: 155248
This nicely handles the most common case of virtual register sets, but
also handles anticipated cases where we will map pointers to IDs.
The goal is not to develop a completely generic SparseSet
template. Instead we want to handle the expected uses within llvm
without any template antics in the client code. I'm adding a bit of
template nastiness here, and some assumption about expected usage in
order to make the client code very clean.
The expected common uses cases I'm designing for:
- integer keys that need to be reindexed, and may map to additional
data
- densely numbered objects where we want pointer keys because no
number->object map exists.
llvm-svn: 155227
Assembly matchers for instructions with a two-operand form. ARM is full
of these, for example:
add {Rd}, Rn, Rm // Rd is optional and is the same as Rn if omitted.
The property TwoOperandAliasConstraint on the instruction definition controls
when, and if, an alias will be formed. No explicit InstAlias definitions
are required.
rdar://11255754
llvm-svn: 155172
commits have had several major issues pointed out in review, and those
issues are not being addressed in a timely fashion. Furthermore, this
was all committed leading up to the v3.1 branch, and we don't need piles
of code with outstanding issues in the branch.
It is possible that not all of these commits were necessary to revert to
get us back to a green state, but I'm going to let the Hexagon
maintainer sort that out. They can recommit, in order, after addressing
the feedback.
Reverted commits, with some notes:
Primary commit r154616: HexagonPacketizer
- There are lots of review comments here. This is the primary reason
for reverting. In particular, it introduced a large number of warnings
due to a bad construct in tablegen.
- Follow-up commits that should be folded back into this when
reposting:
- r154622: CMake fixes
- r154660: Fix numerous build warnings in release builds.
- Please don't resubmit this until the three commits above are
included, and the issues in review addressed.
Primary commit r154695: Pass to replace transfer/copy ...
- Reverted to minimize merge conflicts. I'm not aware of specific
issues with this patch.
Primary commit r154703: New Value Jump.
- Primarily reverted due to merge conflicts.
- Follow-up commits that should be folded back into this when
reposting:
- r154703: Remove iostream usage
- r154758: Fix CMake builds
- r154759: Fix build warnings in release builds
- Please incorporate these fixes and review feedback before
resubmitting.
Primary commit r154829: Hexagon V5 (floating point) support.
- Primarily reverted due to merge conflicts.
- Follow-up commits that should be folded back into this when
reposting:
- r154841: Remove unused variable (fixing build warnings)
There are also accompanying Clang commits that will be reverted for
consistency.
llvm-svn: 155047
DenseMap's hash function uses slightly more entropy and reduces hash collisions
significantly. I also experimented with Hashing.h, but it didn't give a lot of
improvement while being much more expensive to compute.
llvm-svn: 154996
also fix SimplifyLibCalls to use TLI rather than compile-time conditionals to enable optimizations on floor, ceil, round, rint, and nearbyint
llvm-svn: 154960
for the life of me remember why I wrote it this way, but I can't see any good
reason for it now. This patch replaces the custom linked list with an ilist.
This change should preserve the existing numberings exactly, so no generated code
should change (if it does, file a bug!).
llvm-svn: 154904
the MCJIT execution engine.
The GDB JIT debugging integration support works by registering a loaded
object image with a pre-defined function that GDB will monitor if GDB
is attached. GDB integration support is implemented for ELF only at this
time. This integration requires GDB version 7.0 or newer.
Patch by Andy Kaylor!
llvm-svn: 154868
through the use of 'fpmath' metadata. Currently this only provides a 'fpaccuracy'
value, which may be a number in ULPs or the keyword 'fast', however the intent is
that this will be extended with additional information about NaN's, infinities
etc later. No optimizations have been hooked up to this so far.
llvm-svn: 154822
and retrieving it from instructions. I don't have a use for this but it seems
logical for it to exist. While there, remove some 'const' markings from methods
which are in fact 'const' in practice, but aren't logically 'const'.
llvm-svn: 154811
so we don't want it to show up in the stable 3.1 interface.
While at it, add a comment about why LTOCodeGenerator manually creates the
internalize pass.
llvm-svn: 154807
thinking of generalizing it to be able to specify other freedoms beyond accuracy
(such as that NaN's don't have to be respected). I'd like the 3.1 release (the
first one with this metadata) to have the more generic name already rather than
having to auto-upgrade it in 3.2.
llvm-svn: 154744
This is a special flag for targets that really want their block
terminators in the DAG. The default scheduler cannot handle this
correctly, so it becomes the specialized scheduler's responsibility to
schedule terminators.
llvm-svn: 154712
directly instead of a user Instruction. This allows them to test
whether a def dominates a particular operand if the user instruction
is a PHI.
llvm-svn: 154631
of zero-initialized sections, virtual sections and common symbols
and preventing the loading of sections which are not required for
execution such as debug information.
Patch by Andy Kaylor!
llvm-svn: 154610
FoldingSet is implemented as a chained hash table. When there is a hash
collision during insertion, which is common as we fill the table until a
load factor of 2.0 is hit, we walk the chained elements, comparing every
operand with the new element's operands. This can be very expensive if the
MDNode has many operands.
We sacrifice a word of space in MDNode to cache the full hash value, reducing
compares on collision to a minimum. MDNode grows from 28 to 32 bytes + operands
on x86. On x86_64 the new bits fit nicely into existing padding, not growing
the struct at all.
The actual speedup depends a lot on the test case and is typically between
1% and 2% for C++ code with clang -c -O0 -g.
llvm-svn: 154497
StringMap. This was redundant and unnecessarily bloated the MDString class.
Because the MDString class is a "Value" and will never have a "name", and
because the Name field in the Value class is a pointer to a StringMap entry, we
repurpose the Name field for an MDString. It stores the StringMap entry in the
Name field, and uses the normal methods to get the string (name) back.
PR12474
llvm-svn: 154429
Take this opportunity to generalize the indirectbr bailout logic for
loop transformations. CFG transformations will never get indirectbr
right, and there's no point trying.
llvm-svn: 154386
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.
PR12419
rdar://9770785
rdar://11195178
llvm-svn: 154370
A couple of cases where we were accidentally creating constant conditions by
something like "x == a || b" instead of "x == a || x == b". In one case a
conditional & then unreachable was used - I transformed this into a direct
assert instead.
llvm-svn: 154324
optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.
I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, its just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]
I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.
llvm-svn: 154294
in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.
llvm-svn: 154292
optimizers could do this for us, but expecting partial SROA of classes
with template methods through cloning is probably expecting too much
heroics. With this change, the begin/end pointer pairs which indicate
the status of each loop iteration are actually passed directly into each
layer of the combine_data calls, and the inliner has a chance to see
when most of the combine_data function could be deleted by inlining.
Similarly for 'length'.
We have to be careful to limit the places where in/out reference
parameters are used as those will also defeat the inliner / optimizers
from properly propagating constants.
With this change, LLVM is able to fully inline and unroll the hash
computation of small sets of values, such as two or three pointers.
These now decompose into essentially straight-line code with no loops or
function calls.
There is still one code quality problem to be solved with the hashing --
LLVM is failing to nuke the alloca. It removes all loads from the
alloca, leaving only lifetime intrinsics and dead(!!) stores to the
alloca. =/ Very unfortunate.
llvm-svn: 154264
by default.
This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching to default seems
reasonable.
I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.
llvm-svn: 154235
The empty 1-argument operator delete is for the benefit of the
destructor. A couple of spot checks of running yaml-bench under
valgrind against a few of the files under test/YAMLParser did
not reveal any leaks introduced by this change.
llvm-svn: 154137
of the BBVectorizePass without using command line option. As pointed out
by Hal, we can ask the TargetLoweringInfo for the architecture specific
VectorizeConfig to perform vectorizing with architecture specific
information.
llvm-svn: 154096
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
llvm-svn: 154011
brace) so that we get more accurate line number information about the
declaration of a given function and the line where the function
first starts.
Part of rdar://11026482
llvm-svn: 153916
This patch allows llvm to recognize that a 64 bit object file is being produced
and that the subsequently generated ELF header has the correct information.
The test case checks for both big and little endian flavors.
Patch by Jack Carter.
llvm-svn: 153889
rather than a bitfield, a great suggestion by Chris during code review.
There is still quite a bit of cruft in the interface, but that requires
sorting out some awkward uses of the cost inside the actual inliner.
No functionality changed intended here.
llvm-svn: 153853
This also avoids emitting the information twice, which led to code bloat. On i386-linux-Release+Asserts
with all targets built this change shaves a whopping 1.3 MB off clang. The number is probably exaggerated
by recent inliner changes but the methods were already enormous with the old inline cost computation.
The DWARF reg -> LLVM reg mapping doesn't seem to have holes in it, so it could be a simple lookup table.
I didn't implement that optimization yet to avoid potentially changing functionality.
There is still some duplication both in tablegen and the generated code that should be cleaned up eventually.
llvm-svn: 153837