destination location of a memcpy/memmove. I'm not clear about whether
TBAA works on these, so I'm leaving it out for now. Dan, please revisit
this when convenient.
llvm-svn: 119928
allowing the memcpy to be eliminated.
Unfortunately, the requirements on byval's without explicit
alignment are really weak and impossible to predict in the
mid-level optimizer, so this doesn't kick in much with current
frontends. The fix is to change clang to set alignment on all
byval arguments.
llvm-svn: 119916
so don't claim they are. They are allocated using DAG.getNode, so attempts
to access MemSDNode fields results in reading off the end of the allocated
memory. This fixes crashes with "llc -debug" due to debug code trying to
print MemSDNode fields for these barrier nodes (since the crashes are not
deterministic, use valgrind to see this). Add some nasty checking to try
to catch this kind of thing in the future.
llvm-svn: 119901
It is now possible to navigate the B+-tree using NodeRef::subtree() and
NodeRef::size() without knowing the key and value template types used in the
tree.
llvm-svn: 119880
Key and value objects may not be destructed instantly when they are erased from
the container, but they will be destructed eventually by the IntervalMap
destructor.
llvm-svn: 119873
llvm/include/llvm/ADT/IntervalMap.h:334: error: '((llvm::IntervalMapImpl::DesiredNodeBytes / static_cast<unsigned int>(((2 * sizeof (KeyT)) + sizeof (ValT)))) >? 3u)' is not a valid template argument for type 'unsigned int' because it is a non-constant expression
llvm-svn: 119790
This is a sorted interval map data structure for small keys and values with
automatic coalescing and bidirectional iteration over coalesced intervals.
Except for coalescing intervals, it provides similar functionality to std::map.
It is however much more compact for small keys and values, and hopefully faster
too.
The container object itself can hold the first few intervals without any
allocations, then it switches to a cache conscious B+-tree representation. A
recycling allocator can be shared between many containers, even between
containers holding different types.
The IntervalMap is initially intended to be used with SlotIndex intervals for:
- Backing store for LiveIntervalUnion that is smaller and faster than std::set.
- Backing store for LiveInterval with less overhead than std::vector for typical
intervals and O(N log N) merging of large intervals. 99% of virtual registers
need 4 entries or less and would benefit from the small object optimization.
- Backing store for LiveDebugVariable which doesn't exist yet, but will track
debug variables during register allocation.
This is a work in progress. Missing items are:
- Performance metrics.
- erase().
- insert() shrinkage.
- clear().
- More performance metrics.
- Simplification and detemplatization.
llvm-svn: 119787
MCStreamer instead of just MCObjectStreamer. Address changes cannot
be as efficient as we have to use DW_LNE_set_addres, but at least
most of the logic is shared.
This will be used so that, with CodeGen still using EmitDwarfLocDirective,
llvm-gcc is able to produce debug_line sections without needing an
assembler that supports .loc.
llvm-svn: 119777
This is a sorted interval map data structure for small keys and values with
automatic coalescing and bidirectional iteration over coalesced intervals.
Except for coalescing intervals, it provides similar functionality to std::map.
It is however much more compact for small keys and values, and hopefully faster
too.
The container object itself can hold the first few intervals without any
allocations, then it switches to a cache conscious B+-tree representation. A
recycling allocator can be shared between many containers, even between
containers holding different types.
The IntervalMap is initially intended to be used with SlotIndex intervals for:
- Backing store for LiveIntervalUnion that is smaller and faster than std::set.
- Backing store for LiveInterval with less overhead than std::vector for typical
intervals and O(N log N) merging of large intervals. 99% of virtual registers
need 4 entries or less and would benefit from the small object optimization.
- Backing store for LiveDebugVariable which doesn't exist yet, but will track
debug variables during register allocation.
This is a work in progress. Missing items are:
- Performance metrics.
- erase().
- insert() shrinkage.
- clear().
- More performance metrics.
- Simplification and detemplatization.
llvm-svn: 119772
preserves LCSSA form out of ScalarEvolution and into the LoopInfo
class. Use it to check that SimplifyInstruction simplifications
are not breaking LCSSA form. Fixes PR8622.
llvm-svn: 119727
The attached patch fixes IRBuilder and the NoFolder class so that when
NoFolder is used the instructions it generates are treated just like
the ones IRBuilder creates directly (insert into block, assign them a
name and debug info, as applicable).
It does this by
1) having NoFolder return Instruction*s instead of Value*s,
2) having IRBuilder call Insert(Value, Name) on values obtained from
the folder like it does on instructions it creates directly, and
3) adding an Insert(Constant*, const Twine& = "") overload which just
returns the constant so that the other folders shouldn't have any
extra overhead as long as inlining is enabled.
While I was there, I also added some missing (CreateFNeg and various
Create*Cast) methods to NoFolder.
llvm-svn: 119614
and testing is easier. A good example is the unknown-location.ll test that
now can just look for ".loc 1 0 0". We also don't use a DW_LNE_set_address for
every address change anymore.
llvm-svn: 119613
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.
Instead, let peephole optimization pass recognize registers that are defined by
immediates and the ARM target hook will fold the immediates in.
Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out, machine sink would have sinked the computation and that ends up pessimizing
code. The peephole pass would recognize situations where the 'and' can be
toggled to define CPSR and eliminate the comparison anyway.
2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking
important optimizations.
rdar://8663787, rdar://8241368
llvm-svn: 119548
instructions out of InstCombine and into InstructionSimplify. While
there, introduce an m_AllOnes pattern to simplify matching with integers
and vectors with all bits equal to one.
llvm-svn: 119536
simplified to itself (this can only happen in unreachable blocks).
Change it to return null instead. Hopefully this will fix some
buildbot failures.
llvm-svn: 119490
cookie argument to the SourceMgr diagnostic stuff. This cleanly separates
LLVMContext's inlineasm handler from the sourcemgr error handling
definition, increasing type safety and cleaning things up.
llvm-svn: 119486
class, uses DominatorTree which is an analysis. This change moves all of
the tricky hasConstantValue logic to SimplifyInstruction, and replaces it
with a very simple literal implementation. I already taught users of
hasConstantValue that need tricky stuff to use SimplifyInstruction instead.
I didn't update InlineFunction because the IR looks like it might be in a
funky state at the point it calls hasConstantValue, which makes calling
SimplifyInstruction dangerous since it can in theory do a lot of tricky
reasoning. This may be a pessimization, for example in the case where
all phi node operands are either undef or a fixed constant.
llvm-svn: 119459
The system API's will be shifted over to returning an error_code, and returning
other return values as out parameters to the function.
Code that needs to check error conditions will use the errc enum values which
are the same as the posix_errno defines (EBADF, E2BIG, etc...), and are
compatable with the error codes in WinError.h due to some magic in system_error.
An example would be:
if (error_code ec = KillEvil("Java")) { // error_code can be converted to bool.
handle_error(ec);
}
llvm-svn: 119360
over a phi node by applying it to each operand may be wrong if the
operation and the phi node are mutually interdependent (the testcase
has a simple example of this). So only do this transform if it would
be correct to perform the operation in each predecessor of the block
containing the phi, i.e. if the other operands all dominate the phi.
This should fix the FFMPEG snow.c regression reported by İsmail Dönmez.
llvm-svn: 119347
variable if recursing fails to simplify it.
Factor AliasedSymbol to be a method of MCSymbol.
Update MCAssembler::EvaluateFixup to match the change in
EvaluateAsRelocatableImpl.
Remove the WeakRefExpr hack, as the object writer now sees the weakref with
no extra effort needed.
Nothing else is using MCTargetExpr, but keep it for now.
Now that the ELF writer sees relocations with aliases, handle
.weak foo2
foo2:
.weak bar2
.set bar2,foo2
.quad bar2
the same way gas does and produce a relocation with bar2.
llvm-svn: 119152
This moves most of the isUsed logic to the MCSymbol itself. With this we
get a bit more relaxed about allowing definitions after uses: uses that
don't evaluate their argument immediately (jmp foo) are accepted.
ddunbar, this was the smallest compromise I could think of that lets us
accept gcc (and clang!) assembly.
llvm-svn: 119144
nodes to indicate when ha16/lo16 modifiers should be used. This lets
us pass PowerPC/indirectbr.ll.
The one annoying thing about this patch is that the MCSymbolExpr isn't
expressive enough to represent ha16(label1-label2) which we need on
PowerPC. I have a terrible hack in the meantime, but this will have
to be revisited at some point.
Last major conversion item left is global variable references.
llvm-svn: 119105
testing for dereferenceable pointers into a helper function,
isDereferenceablePointer. Teach it how to reason about GEPs
with simple non-zero indices.
Also eliminate ArgumentPromtion's IsAlwaysValidPointer,
which didn't check for weak externals or out of range gep
indices.
llvm-svn: 118840
This is the first small step towards using closed intervals for liveness instead
of the half-open intervals we're using now.
We want to be able to distinguish between a SlotIndex that represents a variable
being live-out of a basic block, and an index representing a variable live-in to
its successor.
That requires two separate indexes between blocks. One for live-outs and one for
live-ins.
With this change, getMBBEndIdx(MBB).getPrevSlot() becomes stable so it stays
greater than any instructions inserted at the end of MBB.
llvm-svn: 118747
references. For example, this allows gvn to eliminate the load in
this example:
void foo(int n, int* p, int *q) {
p[0] = 0;
p[1] = 1;
if (n) {
*q = p[0];
}
}
llvm-svn: 118714
benchmarks hitting an assertion.
Adds LiveIntervalUnion::collectInterferingVRegs.
Fixes "late spilling" by checking for any unspillable live vregs among
all physReg aliases.
llvm-svn: 118701
to optionally look for constant or local (alloca) memory.
Teach BasicAliasAnalysis::pointsToConstantMemory to look through Select
and Phi nodes, and to support looking for local memory.
Remove FunctionAttrs' PointsToLocalOrConstantMemory function, now that
AliasAnalysis knows all the tricks that it knew.
llvm-svn: 118412
operand list instead of the operand list redundantly declared on the alias
or instruction.
With this change, we finally remove the ins/outs list on the alias. Before:
def : InstAlias<(outs GR16:$dst), (ins GR8 :$src),
"movsx $src, $dst",
(MOVSX16rr8W GR16:$dst, GR8:$src)>;
After:
def : InstAlias<"movsx $src, $dst",
(MOVSX16rr8W GR16:$dst, GR8:$src)>;
This also makes the alias mechanism more general and powerful, which will
be exploited in subsequent patches.
llvm-svn: 118329
To create debugging information for a pointer, using DIBUilder front-end just needs
DBuilder.CreatePointerType(Ty, Size);
instead of
DebugFactory.CreateDerivedType(llvm::dwarf::DW_TAG_pointer_type,
TheCU, "", getOrCreateMainFile(),
0, Size, 0, 0, 0, OCTy);
llvm-svn: 118248
value type, so there is no point in passing it around using
an EVT. Use the simpler MVT everywhere. Rather than trying
to propagate this information maximally in all the code that
using the calling convention stuff, I chose to do a mainly
low impact change instead.
llvm-svn: 118167
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops since multi-latency instructions is completely executed
even when the predicate is false. Also, some instruction will be "slower"
when they are predicated due to the register def becoming implicit input.
rdar://8598427
llvm-svn: 118135
// FIXME: We should compute this sooner, we don't want to recurse here, and
// we would like to be more functional.
In MCAssembler::ComputeFragmentSize.
llvm-svn: 118080
specializations provided here. This is a little annoying because its size
changes from platform to platform. If possible, I may follow up with a patch
that uses standard constants to simplify much of this, but assuming for now
that was avoided for a reason.
llvm-svn: 117880
directives, allowing things like this:
def : MnemonicAlias<"pop", "popl">, Requires<[In32BitMode]>;
def : MnemonicAlias<"pop", "popq">, Requires<[In64BitMode]>;
Move the rest of the X86 MnemonicAliases over to the .td file.
llvm-svn: 117830
smooshlab build. The breakage seems to be due to a collision
between LLVM's ATTRIBUTE_UNUSED and gcc's which was previously
hidden due to header files being included in a lucky order.
llvm-svn: 117260
framework. It's purpose is not to improve register allocation per se,
but to make it easier to develop powerful live range splitting. I call
it the basic allocator because it is as simple as a global allocator
can be but provides the building blocks for sophisticated register
allocation with live range splitting.
A minimal implementation is provided that trivially spills whenever it
runs out of registers. I'm checking in now to get high-level design
and style feedback. I've only done minimal testing. The next step is
implementing a "greedy" allocation algorithm that does some register
reassignment and makes better splitting decisions.
llvm-svn: 117174
A RegionPass is executed like a LoopPass but on the regions detected by the
RegionInfo pass instead of the loops detected by the LoopInfo pass.
llvm-svn: 116905
Pull an unsigned out of the Contents union such that it has the same size as two
pointers and no padding.
Arrange members such that the Contents union and all pointers can be 8-byte
aligned without padding.
This speeds up code generation by 0.8% on a 64-bit host. 32-bit hosts should be
unaffected.
llvm-svn: 116857
must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize
the pass's dependencies.
Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.
I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems
with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass
registration/creation, please send the testcase to me directly.
llvm-svn: 116820
strange packaging environments. The primary result of this is to expose
a (normally empty) CLANG_RESOURCE_DIR string in the autoconf and CMake builds.
This will in turn be used by a subsequent commit to Clang.
Regenerated configure and config.h.in thanks to Nick. =D
llvm-svn: 116802
"long latency" enough to hoist even if it may increase spilling. Reloading
a value from spill slot is often cheaper than performing an expensive
computation in the loop. For X86, that means machine LICM will hoist
SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
instructions.
- Enable register pressure aware machine LICM by default.
llvm-svn: 116781
does normal initialization and normal chaining. Change the default
AliasAnalysis implementation to NoAlias.
Update StandardCompileOpts.h and friends to explicitly request
BasicAliasAnalysis.
Update tests to explicitly request -basicaa.
llvm-svn: 116720
logic to use the new APInt methods. Among other things this
implements rdar://8501501 - llvm.smul.with.overflow.i32 should constant fold
which comes from "clang -ftrapv", originally brought to my attention from PR8221.
llvm-svn: 116457
getExpandedRegion() enables us to create non canonical regions. Those regions
can be used to define the largerst region, that fullfills a certain property.
llvm-svn: 116397
perform initialization without static constructors AND without explicit initialization
by the client. For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve. I hope to be able to relax
the latter requirement in the future.
llvm-svn: 116334
Added ARM specific ELF section types.
Added AttributesSection to ARMElfTargetObject
First step in unifying .cpu assembly tag with ELF/.o
llc now asserts on actual ELF emission on -filetype=obj :-)
llvm-svn: 116257
connected components. These components should be allocated different virtual
registers because there is no reason for them to be allocated together.
Add the ConnectedVNInfoEqClasses class to calculate the connected components,
and move values to new LiveIntervals.
Use it from SplitKit::rewrite by creating new virtual registers for the
components.
llvm-svn: 116006
This function is intended to be used when inserting a machine instruction that
trivially restricts the legal registers, like LEA requiring a GR32_NOSP
argument.
llvm-svn: 115875
allow target to correctly compute latency for cases where static scheduling
itineraries isn't sufficient. e.g. variable_ops instructions such as
ARM::ldm.
This also allows target without scheduling itineraries to compute operand
latencies. e.g. X86 can return (approximated) latencies for high latency
instructions such as division.
- Compute operand latencies for those defined by load multiple instructions,
e.g. ldm and those used by store multiple instructions, e.g. stm.
llvm-svn: 115755
expand to an initializeMyPass() function (in additional to the extant static ctors). Eventually, these will be called
from a big InitializeAllPasses() function, and the PassInfo's they create (which would be leaked if this code were used
at the moment) will be handed off to a PassRegistry for ownership.
llvm-svn: 115703
it in with the SSSE3 instructions.
Steward! Could you place this chair by the aft sun deck? I'm trying to get away
from the Astors. They are such boors!
llvm-svn: 115552
1) Changed ValidateDwarfFileNumber() to isValidDwarfFileNumber() to be better
named. Since it is just a predicate and isn't actually changing any state.
2) Added a missing return in the comments for setCurrentDwarfLoc() in
include/llvm/MC/MCContext.h for fix formatting.
3) Changed clearDwarfLocSeen() to ClearDwarfLocSeen() since it does change
state.
4) Simplified the last test in isValidDwarfFileNumber() to just a one line
boolean test of MCDwarfFiles[FileNumber] != 0 for the final return statement.
llvm-svn: 115551
is partly because this attribute caused trouble in the past (the
SmallVector one had to be changed from aligned to aligned(8) due
to causing crashes on i386 for example; in theory the same might
be needed in the Allocator case...). But it's mostly because
there seems to be no point in special casing gcc here. Using the
same implementation for all compilers results in better testing.
llvm-svn: 115462
LiveInterval::MergeValueNumberInto instead of trying to extend LiveRanges and
getting it wrong.
This fixed PR8249 where a valno with a multi-segment live range was defined by
an identity copy created by RemoveCopyByCommutingDef. Some of the live
segments disappeared.
llvm-svn: 115385
stick with a constant estimate of 90% (branch predictors are good!), but we might find that we want to provide
more nuanced estimates in the future.
llvm-svn: 115364
The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.
Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics.
MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces. Optimizations may occur on these forms and the
result casted back to x86_mmx, provided the result feeds into a
previous existing x86_mmx operation.
The point of all this is prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.
llvm-svn: 115243
resolved or not. Different object files have different restrictions and
different native assemblers have different idiosyncrasies we want to emulate
for now.
Move the existing MachO logic to the new place and implement an ELF one that
gets fixups to globals right.
llvm-svn: 115131
or not. TableGen needs to generate the printInstruction() function as taking
an MCInstr* or a MachineInstr*, depending. Default to the old non-MC
version so that everything not yet using MC continues to just work without
fidding.
llvm-svn: 115126
hide jump threading opportunities by turning control flow into data flow. Run an early JumpThreading pass
(adds approximately an additional 1% to optimization time on SPEC), allowing it to get a shot at these cases
first. Fixes <rdar://problem/8447345>.
llvm-svn: 115099
Rather than having arbitrary cutoffs, actually try to cost model the conversion.
For now, the constants are tuned to more or less match our existing behavior, but these will be
changed to reflect realistic values as this work proceeds.
llvm-svn: 114973