running tail duplication when doing branch folding for if-conversion, and
we also want to be able to run tail duplication earlier to fix some
reg alloc problems. Move the CanFallThrough function from BranchFolding
to MachineBasicBlock so that it can be shared by TailDuplication.
llvm-svn: 89904
it is definitely profitable to tail duplicate indirect branches for x86.
This is likely to be true to various degrees for all modern x86 processors.
llvm-svn: 89865
Make tail duplication of indirect branches much more aggressive (for targets
that indicate that it is profitable), based on further experience with
this transformation. I compiled 3 large applications with and without
this more aggressive tail duplication and measured minimal changes in code
size. ("size" on Darwin seems to round the text size up to the nearest
page boundary, so I can only say that any code size increase was less than
one 4k page.) Radar 7421267.
llvm-svn: 89814
This violates the ABI (that area is "reserved"), and
while it is safe if all code is generated with current
compilers, there is some very old code around that uses
that slot for something else, and breaks if it is stored
into. Adjust testcases looking for current behavior.
I've verified that the stack frame size is right in all
testcases, whether it changed or not. 7311323.
llvm-svn: 89811
than doing the same via constpool:
1. Load from constpool costs 3 cycles on A9, movt/movw pair - just 2.
2. Load from constpool might stall up to 300 cycles due to cache miss.
3. Movt/movw does not use load/store unit.
4. Less constpool entries => better compiler performance.
This is only enabled on ELF systems, since darwin does not have needed
relocations (yet).
llvm-svn: 89720
way for each TargetJITInfo subclass to allocate its own stubs. This
means stubs aren't as exactly-sized anymore, but it lets us get rid of
TargetJITInfo::emitFunctionStubAtAddr(), which lets ARM and PPC
support the eager JIT, fixing http://llvm.org/PR4816.
* Rename the JITEmitter's stub creation functions to describe the kind
of stub they create. So far, all of them create lazy-compilation
stubs, but they sometimes get used when far-call stubs are needed.
Fixing http://llvm.org/PR5201 will involve fixing this.
llvm-svn: 89715
Note that "hasDotLocAndDotFile"-style debug info was already broken;
people wanting this functionality should implement it in the
AsmPrinter/DwarfWriter code.
llvm-svn: 89711
ConstantExpr, not just the top-level operator. This allows it to
fold many more constants.
Also, make GlobalOpt call ConstantFoldConstantExpression on
GlobalVariable initializers.
llvm-svn: 89659
the body to not pass the name for the isSigned parameter. However it
seems that changing prototypes is a big-no-no, so here I revert the
previous change and pass "true" for isSigned, meaning this always does
a signed cast, which was the previous behaviour assuming the name was
not NULL! Some other C function needs to be introduced for the general
case of signed or unsigned casts. This hopefully unbreaks the ocaml
binding.
llvm-svn: 89648
tell debug info which base register to use to reference a frame index on a
per-index basis. This is useful, for example, in the presence of dynamic
stack realignment when local variables are indexed via the stack pointer and
stack-based arguments via the frame pointer.
llvm-svn: 89620
When splitting a critical edge, the registers live through the edge are:
- Used in a PHI instruction, or
- Live out from the predecessor, and
- Live in to the successor.
This allows the coalescer to eliminate even more phi joins.
llvm-svn: 89530
DIEs are created from MDNode, which are already uniqued. And DwarfDebug already uses ValueMaps to find and use existing DIE for a given MDNode.
llvm-svn: 89518
it may be used in contexts where preheader insertion may have failed due
to an indirectbr.
Make LoopSimplify's LoopSimplify::SeparateNestedLoop properly fail in
the case that it would require splitting an indirectbr edge.
These fix PR5502.
llvm-svn: 89484
constant pool ranges, as CPEIsInRange() makes conservative assumptions about
the potential alignment changes from branch adjustments. The verification,
on the other hand, runs after those branch adjustments are made, so the
effects on alignment are known and already taken into account. The sanity
check in verify should check the range directly instead.
llvm-svn: 89473
same object to be a non-capture; Duncan pointed out a way that such
a comparison could be a capture.
Make the rule that considers a comparison against null more specific,
and only consider noalias return values compared against null. This
still supports test/Transforms/GVN/nonescaping-malloc.ll, and is not
susceptible to the problem Duncan pointed out with noalias arguments.
llvm-svn: 89468
Makes '--comma-separated val1,val2' mean the same thing as
'--comma-separated=val1,val2' (that is, 'val1' and 'val2' are not lumped
together as 'val1,val2'). Also declutters the main loop a bit.
llvm-svn: 89463
tests/Transforms/InstCombine/shufflemask-undef.ll. If
anyone cares, the use of 2*e here (and the equivalent
all over the place in instcombine) seems wrong, though
harmless: it should really be twice the length of the
input vector. I think shufflevector used to require
that the mask have the same length as the input, but I
don't think that's true any more. I don't care enough
about vectors to do anything about this...
llvm-svn: 89456
which was an expensive checks failure due to a bug in the checking. This
patch in essence reverts the original fix for PR3393, and refixes it by a
tweak to the way expensive checking is done.
llvm-svn: 89454
because if the results from getUnderlyingObject match, the values must
be from the same underlying object, even if we don't know what that
object is.
llvm-svn: 89434
careful about crazy methods of capturing pointers using comparisons.
Comparisons of identified objects with null in the default address
space are not captures. And, comparisons of two pointers within the
same identified object are not captures.
llvm-svn: 89421
assembly can confuse things utterly, as it's assumed that instructions in
inline assembly are 4 bytes wide. For Thumb mode, that's often not true,
so the calculations for when alignment padding will be present get thrown off,
ultimately leading to out of range constant pool entry references. Making
more conservative assumptions that padding may be necessary when inline asm
is present avoids this situation.
llvm-svn: 89403
if it is not ultimately captured. Teach BasicAliasAnalysis that a
local object address which does not escape and is never stored does
not alias with a value resulting from a load.
llvm-svn: 89398
critical edges in PHIElimination.
This has a huge impact on regalloc performance, and we recover almost all of
the 10% compile time regression that edge splitting introduced.
llvm-svn: 89381
fully specified at this level. Subclasses of NLdStLN can specify selective
bit(s) for Inst{7-4}, as is done for VLD[234]LN* and VST[234]LN* inside
ARMInstrNEON.td.
llvm-svn: 89377
they are lowered to instruction sequences more complex than a simple
load, such that CodeGen cannot rematerialize them, a reload from a
spill slot is likely to be cheaper than the complex sequence.
llvm-svn: 89374
Add a -linearscan-skip-count argument (default to 0) that tells the
allocator to remember the last N registers it allocated and skip them
when looking for a register candidate. This tends to spread out
register usage and free up post-allocation scheduling at the cost of
slightly more register pressure. The primary benefit is the ability
to backschedule reloads.
This is turned off by default.
llvm-svn: 89356
All spiller calls in RegAllocLinearScan now go through the new Spiller interface.
The "-new-spill-framework" command line option has been removed. To use the trivial in-place spiller you should now pass "-spiller=trivial -rewriter=trivial".
(Note the trivial spiller/rewriter are only meant to serve as examples of the new in-place modification work. Enabling them will yield terrible, though hopefully functional, code).
llvm-svn: 89311
When TwoAddressInstructionPass deletes a dead instruction, make sure that all
register kills are accounted for. The 2-addr register does not get special
treatment.
llvm-svn: 89246
when LiveVariables is available.
The -split-phi-edges is now gone, and so is the hack to disable it when using
the local register allocator. The PHIElimination pass no longer has
LiveVariables as a prerequisite - that is what broke the local allocator.
Instead we do critical edge splitting when possible - that is when
LiveVariables is available.
llvm-svn: 89213
contents of the block to be duplicated. Use this for ARM Cortex A8/9 to
be more aggressive tail duplicating indirect branches, since it makes it
much more likely that they will be predicted in the branch target buffer.
Testcase coming soon.
llvm-svn: 89187
This is probably not confined to *just* these two things.
Anyway, the llvm-gcc front-end may look up the structure layout information for
an abstract type. That information will be stored into a table with the FE's
TD. Instruction combine can come along and also ask for information on that
abstract type, but for a separate TD (the one associated with the pass manager).
After the type is refined, the old structure layout information in the pass
manager's TD file is out of date. If a new type is allocated in the same space
as the old-unrefined type, then the structure type information in the pass
manager's TD file will be wrong, but won't know it.
Fix this by making the TD's structure type information an abstract type user.
llvm-svn: 89176
The local register allocator doesn't like it when LiveVariables is run.
We should also disable edge splitting under -O0, but that has to wait a bit.
llvm-svn: 89125
unnecessary. It is broken because the "isIdenticalTo" check should be
negated. If that is fixed, this code causes the CodeGen/X86/tail-opts.ll
test to fail, in the dont_merge_oddly function. And, I confirmed that the
regression is real -- the generated code is worse. As far as I can tell,
that tail-opts.ll test is checking for what this code is supposed to handle
and we're doing the right thing anyway.
llvm-svn: 89121
unconditional branches or fallthroghes. Instcombine/SimplifyCFG
should be simplifying branches with known conditions.
This fixes some problems caused by these transformations not
updating the MachineBasicBlock CFG.
llvm-svn: 89017
The large code model is documented at
http://www.x86-64.org/documentation/abi.pdf and says that calls should
assume their target doesn't live within the 32-bit pc-relative offset
that fits in the call instruction.
To do this, we turn off the global-address->target-global-address
conversion in X86TargetLowering::LowerCall(). The first attempt at
this broke the lazy JIT because it can separate the movabs(imm->reg)
from the actual call instruction. The lazy JIT receives the address of
the movabs as a relocation and needs to record the return address from
the call; and then when that call happens, it needs to patch the
movabs with the newly-compiled target. We could thread the call
instruction into the relocation and record the movabs<->call mapping
explicitly, but that seems to require at least as much new
complication in the code generator as this change.
To fix this, we make lazy functions _always_ go through a call
stub. You'd think we'd only have to force lazy calls through a stub on
difficult platforms, but that turns out to break indirect calls
through a function pointer. The right fix for that is to distinguish
between calls and address-of operations on uncompiled functions, but
that's complex enough to leave for someone else to do.
Another attempt at this defined a new CALL64i pseudo-instruction,
which expanded to a 2-instruction sequence in the assembly output and
was special-cased in the X86CodeEmitter's emitInstruction()
function. That broke indirect calls in the same way as above.
This patch also removes a hack forcing Darwin to the small code model.
Without far-call-stubs, the small code model requires things of the
JITMemoryManager that the DefaultJITMemoryManager can't provide.
Thanks to echristo for lots of testing!
llvm-svn: 88984
Have the asm printer emit a comment if an instruction is a spill or
reload and have the spiller mark copies it introdues so the asm printer
can also annotate those.
llvm-svn: 88911
- This is an initial step towards -march=native support in Clang, and towards
eliminating host dependencies in the targets. See PR5389.
- Patch by Roman Divacky!
llvm-svn: 88768
When splitting an edge after a machine basic block with fall-through, we
forgot to insert a jump instruction. Fix this by calling updateTerminator() on
the fall-through block when relevant.
Also be more precise in PHIElimination::isLiveIn.
llvm-svn: 88728
The BasicBlock associated with a MachineBasicBlock does not necessarily
correspond to the code in the MBB.
Don't insert a new IR BasicBlock when splitting critical edges. We are not
supposed to modify the IR during codegen, and we should be able to do just
fine with a NULL BB.
llvm-svn: 88707
code-size win, and not when it's only likely to be code-size neutral,
such as when only a single instruction would be eliminated and a new
branch would be required.
This fixes rdar://7392894.
llvm-svn: 88692
D0<def,dead> = ...
...
= S0<use, kill>
S0<def> = ...
...
D0<def> =
The first D0 def is correctly marked dead, however, livevariables should have
added an implicit def of S0 or we end up with a use without a def.
llvm-svn: 88690
Provide special isLoadFromStackSlotPostFE and isStoreToStackSlotPostFE
interfaces to explicitly request checking for post-frame ptr elimination
operands. This uses a heuristic so it isn't reliable for correctness.
llvm-svn: 87047
machine instruction loads or stores from/to a stack slot. Unlike
isLoadFromStackSlot and isStoreFromStackSlot, the instruction may be
something other than a pure load/store (e.g. it may be an arithmetic
operation with a memory operand). This helps AsmPrinter determine when
to print a spill/reload comment.
This is only a hint since we may not be able to figure this out in all
cases. As such, it should not be relied upon for correctness.
Implement for X86. Return false by default for other architectures.
llvm-svn: 87026
and don't assume that the call doesn't throw. It would be nice if there were a
way to determine which is the callee and which is a parameter. In practice, the
architecture we care about normally only have one operand for a call instruction
(x86 and arm).
llvm-svn: 87023
slots. The AsmPrinter will use this information to determine whether to
print a spill/reload comment.
Remove default argument values. It's too easy to pass a wrong argument
value when multiple arguments have default values. Make everything
explicit to trap bugs early.
Update all targets to adhere to the new interfaces..
llvm-svn: 87022
making it visible to clients and adding LLVM-style cast capability.
This will be used by AsmPrinter to determine when to emit spill comments
for an instruction.
llvm-svn: 87019
running IPSCCP early, and we run functionattrs interlaced with the inliner,
we often (particularly for small or noop functions) completely propagate
all of the information about a call to its call site in IPSSCP (making a call
dead) and functionattrs is smart enough to realize that the function is
readonly (because it is interlaced with inliner).
To improve compile time and make the inliner threshold more accurate, realize
that we don't have to inline dead readonly function calls. Instead, just
delete the call. This happens all the time for C++ codes, here are some
counters from opt/llvm-ld counting the number of times calls were deleted vs
inlined on various apps:
Tramp3d opt:
5033 inline - Number of call sites deleted, not inlined
24596 inline - Number of functions inlined
llvm-ld:
667 inline - Number of functions deleted because all callers found
699 inline - Number of functions inlined
483.xalancbmk opt:
8096 inline - Number of call sites deleted, not inlined
62528 inline - Number of functions inlined
llvm-ld:
217 inline - Number of allocas merged together
2158 inline - Number of functions inlined
471.omnetpp:
331 inline - Number of call sites deleted, not inlined
8981 inline - Number of functions inlined
llvm-ld:
171 inline - Number of functions deleted because all callers found
629 inline - Number of functions inlined
Deleting a call is much faster than inlining it, and is insensitive to the
size of the callee. :)
llvm-svn: 86975
cannot be folded into target cmp instruction.
- Avoid a phase ordering issue where early cmp optimization would prevent the
later count-to-zero optimization.
- Add missing checks which could cause LSR to reuse stride that does not have
users.
- Fix a bug in count-to-zero optimization code which failed to find the pre-inc
iv's phi node.
- Remove, tighten, loosen some incorrect checks disable valid transformations.
- Quite a bit of code clean up.
llvm-svn: 86969
making the new LVI stuff smart enough to subsume some special
cases in the old code. Disable them when LVI is around, the
testcase still passes.
llvm-svn: 86951
This allows StringRef to skip controversial if(str) check in constructor.
Buildbots, wait for corresponding clang and llvm-gcc FE check-ins!
llvm-svn: 86914
instead of typedefs for std::pair. This simplifies the type of
SameTails, which previously was std::vector<std::pair<std::vector<std::pair<unsigned, MachineBasicBlock *> >::iterator, MachineBasicBlock::iterator>
llvm-svn: 86885
tail merging support to handle more cases.
- Recognize several cases where tail merging is beneficial even when
the tail size is smaller than the generic threshold.
- Make use of MachineInstrDesc::isBarrier to help detect
non-fallthrough blocks.
- Check for and avoid disrupting fall-through edges in more cases.
llvm-svn: 86871
- Edges are split before any phis are eliminated, so the code is SSA.
- Create a proper IR BasicBlock for the split edges.
- LiveVariables::addNewBlock now has same syntax as
MachineDominatorTree::addNewBlock. Algorithm calculates predecessor live-out
set rather than successor live-in set.
This feature still causes some miscompilations.
llvm-svn: 86867
llvm.invariant.start to be used without necessarily being paired with a call
to llvm.invariant.end. If you run the entire optimization pipeline then such
calls are in fact deleted (adce does it), but that's actually a good thing since
we probably do want them to be zapped late in the game. There should really be
an integration test that checks that the llvm.invariant.start call lasts long
enough that all passes that do interesting things with it get to do their stuff
before it is deleted. But since no passes do anything interesting with it yet
this will have to wait for later.
llvm-svn: 86840
can only branch forward. To best take advantage of them, we'd like to adjust
the basic blocks around a bit when reasonable. This patch puts basics in place
to do that, with a super-simple algorithm for backwards jump table targets that
creates a new branch after the jump table which branches backwards. Real
heuristics for reordering blocks or other modifications rather than inserting
branches will follow.
llvm-svn: 86791
start using them in a trivial way when -enable-jump-threading-lvi
is passed. enable-jump-threading-lvi will be my playground for
awhile.
llvm-svn: 86789
constant whose component type is not a legal type for the target.
(If the target ConstantPool cannot handle this type either, it has
an opportunity to merge elements. In practice any target with
8-bit bytes must support i8 *as data*). 7320806 (partial).
llvm-svn: 86751
generates a sequence similar to this:
__Z4funci:
LFB2:
mflr r0
LCFI0:
stmw r30,-8(r1)
LCFI1:
stw r0,8(r1)
LCFI2:
stwu r1,-80(r1)
LCFI3:
mr r30,r1
LCFI4:
where LCFI3 and LCFI4 are used by the FDE to indicate what the FP, LR, and other
things are. We generated something more like this:
Leh_func_begin1:
mflr r0
stw r31, 20(r1)
stw r0, 8(r1)
Llabel1:
stwu r1, -80(r1)
Llabel2:
mr r31, r1
Note that we are missing the "mr" instruction. This patch makes it more like the
GCC output.
llvm-svn: 86729
Critical edges leading to a PHI node are split when the PHI source variable is
live out from the predecessor block. This help the coalescer eliminate more
PHI joins.
llvm-svn: 86725
debug intrinsics, and an unconditional branch when possible. This
reuses the TryToSimplifyUncondBranchFromEmptyBlock function split
out of simplifycfg.
llvm-svn: 86722
- Force NDEBUG on in any Release build. This drops the compile time to ~100s
from ~600s, in Release mode.
- This may just be a temporary workaround, I don't know the true nature of the
gcc-4.2 compile time performance problem.
llvm-svn: 86695
just one level deep. On the testcase we go from getting this:
F1: ; preds = %T2
%F = and i1 true, %cond ; <i1> [#uses=1]
br i1 %F, label %X, label %Y
to a fully threaded:
F1: ; preds = %T2
br label %Y
This changes gets us to the point where we're forming (too many) switch
instructions on doug's strswitch testcase.
llvm-svn: 86646
except that the result may not be a constant. Switch jump threading to
use it so that it gets things like (X & 0) -> 0, which occur when phi preds
are deleted and the remaining phi pred was a zero.
llvm-svn: 86637
This patch forbids implicit conversion of DenseMap::const_iterator to
DenseMap::iterator which was possible because DenseMapIterator inherited
(publicly) from DenseMapConstIterator. Conversion the other way around is now
allowed as one may expect.
The template DenseMapConstIterator is removed and the template parameter
IsConst which specifies whether the iterator is constant is added to
DenseMapIterator.
Actually IsConst parameter is not necessary since the constness can be
determined from KeyT but this is not relevant to the fix and can be addressed
later.
Patch by Victor Zverovich!
llvm-svn: 86636
takes decimated instructions and applies identities to them. This
is pretty minimal at this point, but I plan to pull some instcombine
logic out into these and similar routines.
llvm-svn: 86613
was generated. This caused code like this:
## The asm code for the function
.section __TEXT,__const
.align 2
lJTI11_0:
LJTI11_0:
.long LBB11_16
.long LBB11_4
.long LBB11_5
.long LBB11_6
.long LBB11_7
.long LBB11_8
.long LBB11_9
.long LBB11_10
.long LBB11_11
.long LBB11_12
.long LBB11_13
.long LBB11_14
Leh_func_end11: ## <---now in the wrong section!
The `Leh_func_end11' would then end up in the wrong section, causing the
resulting EH frame information to be wrong:
__ZL11CheckRightsjPKcbRbRP6NSData.eh:
.set Lset500eh,Leh_frame_end11-Leh_frame_begin11
.long Lset500eh ; Length of Frame Information Entry
Leh_frame_begin11:
.long Leh_frame_begin11-Leh_frame_common
.long Leh_func_begin11-.
.set Lset501eh,Leh_func_end11-Leh_func_begin11
.long Lset501eh ; FDE address range
`Lset501eh' is now something huge instead of the real value.
The X86 back-end generates the jump table after the EH information is
emitted. Do the same here.
llvm-svn: 86588
the loop. This is needed because with indirectbr it may not be possible
for LoopSimplify to guarantee that all loop exit predecessors are
inside the loop. This fixes PR5437.
LCCSA no longer actually requires LoopSimplify form, but for now it
must still have the dependency because the PassManager doesn't know
how to schedule LoopSimplify otherwise.
llvm-svn: 86569
here:
1) We need to avoid processing sigma nodes as phi nodes for constraint generation.
2) We need to generate constraints for comparisons against constants properly.
This includes our first working ABCD test!
llvm-svn: 86498
graphs being produced. The cause was that we were incorrectly marking sigma instructions as
processed after handling the sigma-specific constraints for them, potentially neglecting to
process them as normal instructions as well.
Unfortunately, the testcase that inspired this still doesn't work because of a bug in the solver,
which is next on the list to debug.
llvm-svn: 86486
when both the source and dest are illegal types, since it would cause
the phi to grow (for example, we shouldn't transform test14b's phi to
a phi on i320). This fixes an infinite loop on i686 bootstrap with
phi slicing turned on, so turn it back on.
llvm-svn: 86483
not turn a PHI in a legal type into a PHI of an illegal type, and
add a new optimization that breaks up insane integer PHI nodes into
small pieces (PR3451).
llvm-svn: 86443
1. rename the movhp patfrag to movlhps, since thats what it actually matches
2. eliminate the bogus movhps load and store patterns, they were incorrect. The load transforms are already handled (correctly) by shufps/unpack.
3. revert a recent test change to its correct form.
llvm-svn: 86415
(eliminating some extends) if the new type of the
computation is legal or if both the source and dest
are illegal. This prevents instcombine from changing big
chains of computation into i64 on 32-bit targets for
example.
llvm-svn: 86398
datatypes on a given CPU. This is intended to allow instcombine and other
transformations to avoid converting big sequences of operations to an
inconvenient width, and will help clean up after SRoA. See also "Adding
legal integer sizes to TargetData" on Feb 1, 2009 on llvmdev, and PR3451.
Comments welcome.
llvm-svn: 86370
MachineRelocations, "stub" always refers to a far-call stub or a
load-a-faraway-global stub, so this patch adds "Far" to the term. (Other stubs
are used for lazy compilation and dlsym address replacement.) The variable was
also inconsistent between the positive and negative sense, and the positive
sense ("NeedStub") was more demanding than is accurate (since a nearby-enough
function can be called directly even if the platform often requires a stub).
Since the negative sense causes double-negatives, I switched to
"MayNeedFarStub" globally.
llvm-svn: 86363
except it doesn't care if the definitions' virtual registers differ. This is
used by machine LICM and other MI passes to perform CSE.
- Teach Thumb2InstrInfo::isIdentical() to check two t2LDRpci_pic are identical.
Since pc relative constantpool entries are always different, this requires it
it check if the values can actually the same.
llvm-svn: 86328
A non-identity copy cannot be coalesced when the phi join destination register
is live at the copy site.
Also verify the condition that the PHI join source register is only used in
the PHI join. Otherwise the coalescing is invalid.
llvm-svn: 86322
was wrong and too aggressive in the sense that DPSoRegFrm includes both constant
shifts (with Inst{4} = 0) and register controlled shifts (with Inst{4} = 1 and
Inst{7} = 0). The 'rr' fragment of the multiclass definitions actually means
register/register with no shift, see A8-11.
llvm-svn: 86319
Here is the original commit message:
This commit updates malloc optimizations to operate on malloc calls that have constant int size arguments.
Update CreateMalloc so that its callers specify the size to allocate:
MallocInst-autoupgrade users use non-TargetData-computed allocation sizes.
Optimization uses use TargetData to compute the allocation size.
Now that malloc calls can have constant sizes, update isArrayMallocHelper() to use TargetData to determine the size of the malloced type and the size of malloced arrays.
Extend getMallocType() to support malloc calls that have non-bitcast uses.
Update OptimizeGlobalAddressOfMalloc() to optimize malloc calls that have non-bitcast uses. The bitcast use of a malloc call has to be treated specially here because the uses of the bitcast need to be replaced and the bitcast needs to be erased (just like the malloc call) for OptimizeGlobalAddressOfMalloc() to work correctly.
Update PerformHeapAllocSRoA() to optimize malloc calls that have non-bitcast uses. The bitcast use of the malloc is not handled specially here because ReplaceUsesOfMallocWithGlobal replaces through the bitcast use.
Update OptimizeOnceStoredGlobal() to not care about the malloc calls' bitcast use.
Update all globalopt malloc tests to not rely on autoupgraded-MallocInsts, but instead use explicit malloc calls with correct allocation sizes.
llvm-svn: 86311
of going through the global TheJIT variable. This makes it easier to use
features of JITEmitter that aren't in JITCodeEmitter for fixing PR5201.
llvm-svn: 86305
load of a GV from constantpool and then add pc. It allows the code sequence to
be rematerializable so it would be hoisted by machine licm.
- Add a late pass to break these pseudo instructions into a number of real
instructions. Also move the code in Thumb2 IT pass that breaks up t2MOVi32imm
to this pass. This is done before post regalloc scheduling to allow the
scheduler to proper schedule these instructions. It also allow them to be
if-converted and shrunk by later passes.
llvm-svn: 86304
will not accept negative values for these. LLVM's default operand printing
sign extends values, so that valid unsigned values appear as negative
immediates. Print all VMOV immediate operands as hex values to resolve this.
Radar 7372576.
llvm-svn: 86301
predicates. This allows us to jump thread things like:
_ZN12StringSwitchI5ColorE4CaseILj7EEERS1_RAT__KcRKS0_.exit119:
%tmp1.i24166 = phi i8 [ 1, %bb5.i117 ], [ %tmp1.i24165, %_Z....exit ], [ %tmp1.i24165, %bb4.i114 ]
%toBoolnot.i87 = icmp eq i8 %tmp1.i24166, 0 ; <i1> [#uses=1]
%tmp4.i90 = icmp eq i32 %tmp2.i, 6 ; <i1> [#uses=1]
%or.cond173 = and i1 %toBoolnot.i87, %tmp4.i90 ; <i1> [#uses=1]
br i1 %or.cond173, label %bb4.i96, label %_ZN12...
Where it is "obvious" that when coming from %bb5.i117 that the 'and' is always
false. This triggers a surprisingly high number of times in the testsuite,
and gets us closer to generating good code for doug's strswitch testcase.
This also make a bunch of other code in jump threading redundant, I'll rip
out in the next patch. This survived an enable-checking llvm-gcc bootstrap.
llvm-svn: 86264
unsplittable critical edges, which means the introduction of
loops which cannot be transformed to LoopSimplify form. Fix
LoopSimplify to avoid transforming such loops into invalid
code.
llvm-svn: 86176
makes several optimization passes abort in cases where they're currently
silently miscompiling code.
Remove the indirectbr assertion from SplitEdge. Indirectbr is only
a problem for critical edges, and SplitEdge defers to SplitCriticalEdge
to handle those, and SplitCriticalEdge has its own assertion for
indirectbr.
llvm-svn: 86147