Summary:
nsw are flaky and can often be removed by optimizations. This patch enhances
nsw by leveraging @llvm.assume in the IR. Specifically, NaryReassociate now
understands that
assume(a + b >= 0) && assume(a >= 0) ==> a +nsw b
As a result, it can split more sext(a + b) into sext(a) + sext(b) for CSE.
Test Plan: nary-gep.ll
Reviewers: broune, meheff
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10822
llvm-svn: 241139
This change extends the detection of base pointers for vector constructs to handle arbitrary phi and select nodes. The existing non-vector code already handles those, so this is basically just extending the vector special case to be less special cased. It still isn't generalized vector handling since we can't handle arbitrary vector instructions (e.g. shufflevectors), but it's a lot closer.
The general structure of the change is as follows:
* Extend the base defining value relation over a subset of vector instructions and vector typed phi & select instructions.
* Move scalarization from before base pointer rewriting to after base pointer rewriting. The extension of the BDV relation is sufficient to find vector base phis for vector inputs.
* Preserve the existing special case logic for when the base of a vector element is locally obvious. This general idea could be extended to the scalar case as well.
Differential Revision: http://reviews.llvm.org/D10461#inline-84275
llvm-svn: 240850
This previously caused miscompilations as a result of phi nodes receiving
undef incoming values from blocks dominated by such successors.
Differential Revision: http://reviews.llvm.org/D10726
llvm-svn: 240670
r240214 fixed some UB in IndVarSimplify, and it needed a temporary
`WeakVH` to do it. Add `simplify_type<const WeakVH>` so that this
temporary isn't necessary.
llvm-svn: 240599
We performed a simple, but incomplete, intersection when it came time to
CSE instructions. It didn't handle, for example, the 'exact' flag.
This fixes PR23922.
llvm-svn: 240595
Reassociate mutated existing instructions in order to form negations
which would create additional reassociate opportunities.
This fixes PR23926.
llvm-svn: 240593
Change 1: Unswitching on trivial conditions should always happen regardless of the computed unswitching cost, as really the cost is zero. While there is code to make that happen, the logic that checks the unswitching cost against a threshold was moved to an earlier point (revision 147935) than the point where trivial unswitching is detected, so trivial unswitching is currently blocked by the cost threshold. This change fixes that.
Change 2: Before revision 147935 (from 2012-01-11), the threshold parameter was a per-loop threshold. So an unswitching happened only if the cost of the unswitching was less than the threshold. In an indirect way (and I believe unintentionally), the logic for this since then has been that the threshold is an over-all budget across all loops for all loop unswitching done by a given LoopUnswitch loop pass object. So if an unswitching with cost 100 happens in one function, that in effect reduces the threshold from 100 to 0 for the loops even in another function. This persists for the lifetime of that loop pass object. This makes no difference for most small examples but it is important for large examples. This revision fixes that.
Change 3: The cost is currently calculated as std::min(NumInstructions, 5 * NumBlocks). So a loop with 2 blocks and a million instructions will have an unswitching cost of 10. I changed this to just NumInstructions, as it were before revision 147935, though I'm open to e.g. instead replacing std::min with std::max.
I've tried to make the change minimally invasive while staying with what I think was the original intent of the code.
Submitted on behalf of broune@.
llvm-svn: 240438
As with the previous patch, the goal is to turn the class into a general
loop-versioning class. This patch removes any references to loop
distribution.
llvm-svn: 240352
Calling operator* on a WeakVH whose Value is null hits undefined
behaviour, since we bind the value to a reference. Instead, go through
`operator Value*` so that we work with the pointer itself.
Found by ubsan.
llvm-svn: 240214
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
This is now living in MemoryLocation, which is what it pertains to. It
is also an enum there rather than a static data member which is left
never defined.
llvm-svn: 239886
that it is its own entity in the form of MemoryLocation, and update all
the callers.
This is an entirely mechanical change. References to "Location" within
AA subclases become "MemoryLocation", and elsewhere
"AliasAnalysis::Location" becomes "MemoryLocation". Hope that helps
out-of-tree folks update.
llvm-svn: 239885
A reduction is a special kind of recurrence. In the loop vectorizer we currently
identify basic reductions. Future patches will extend this to identifying basic
recurrences.
llvm-svn: 239835
This change is hopefully NFC. The only tricky part is that I changed the context instruction being used to the branch rather than the comparison. I believe both to be correct, but the branch is strictly more powerful. With the moved code, using the branch instruction is required for the basic block comparison test to return the same result. The previous code was able to directly access both the branch and the comparison where the revised code is not.
Differential Revision: http://reviews.llvm.org/D9652
llvm-svn: 239797
Summary:
Scalarizer has two data structures that hold information about changes
to the function, Gathered and Scattered. These are cleared in finish()
at the end of runOnFunction() if finish() detects any changes to the
function.
However, finish() was checking for changes by only checking if
Gathered was non-empty. The function visitStore() only modifies
Scattered without touching Gathered. As a result, Scattered could have
ended up having stale data if Scalarizer only scalarized store
instructions. Since the data in Scattered is used during the execution
of the pass, this introduced dangling pointer errors.
The fix is to check whether both Scattered and Gathered are empty
before deciding what to do in finish().
Reviewers: srhines
Reviewed By: srhines
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10422
llvm-svn: 239644
Summary:
A side effect of this change is that it IRBuilder now automatically
created debug info locations for new instructions, which is the
same as debug location of insertion point. This is fine for the
functions in questions (GetStoreValueForLoad and
GetMemInstValueForLoad), as they are used in two situations:
* GVN::processLoad, which tries to eliminate a load. In this case
new instructions would have the same debug location as the load they
eventually replace;
* MaterializeAdjustedValue, which adds new instructions to the end
of the basic blocks, which could later be used to replace the load
definition. In this case we don't yet know the way the load would
be eventually replaced (either by assembling the precomputed values
via PHI, or by using them directly), so just using the basic block
strategy seems to be reasonable. There is also a special case
in the code that *would* adjust the location of the last
instruction replacing the load definition to the location of the
load.
Test Plan: regression test suite
Reviewers: echristo, dberlin, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10405
llvm-svn: 239585
This only updates one of the uses. The other is used in cases
that may never touch memory, so I'm not sure why this is even
calling it. Perhaps there should be a new, similar hook for such
cases or pass -1 for unknown address space.
llvm-svn: 239540
Determining proper debug locations for instructions created in
PHITransAddr is tricky. We use a simple approach here and simply copy
debug locations from instructions computing load address to
"corresponding" instructions re-creating the address computation
in predecessor basic blocks.
This may not always be correct, given all the rearrangement and
simplification going on, and debug locations may jump around a lot,
as the basic blocks we copy locations between may be very far from
each other.
Still, this would work good in most simple cases (e.g. when chain
of address computing instruction is short, or our mapping turns out
to be 1-to-1), and we desire to have *some* reasonable debug locations
associated with newly inserted instructions.
See http://reviews.llvm.org/D10351 review thread for more details.
Test Plan: regression test suite
Reviewers: spatel, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10351
llvm-svn: 239479
that was resetting it.
Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
the value of function attribute "disable-tail-calls" instead. Also,
unconditionally add pass TailCallElim to the pipeline and check the function
attribute at the start of runOnFunction to disable the pass on a per-function
basis.
This is part of the work to remove TargetMachine::resetTargetOptions, and since
DisableTailCalls was the last non-fast-math option that was being reset in that
function, we should be able to remove the function entirely after the work to
propagate IR-level fast-math flags to DAG nodes is completed.
Out-of-tree users should remove the uses of DisableTailCalls and make changes
to attach attribute "disable-tail-calls"="true" or "false" to the functions in
the IR.
rdar://problem/13752163
Differential Revision: http://reviews.llvm.org/D10099
llvm-svn: 239427
on a per-function basis.
Previously some of the passes were conditionally added to ARM's pass pipeline
based on the target machine's subtarget. This patch makes changes to add those
passes unconditionally and execute them conditonally based on the predicate
functor passed to the pass constructors. This enables running different sets of
passes for different functions in the module.
rdar://problem/20542263
Differential Revision: http://reviews.llvm.org/D8717
llvm-svn: 239325
Summary:
Using some SCEV functionality helped to entirely remove SCEVCache class and FindConstantPointers SCEV visitor.
Also, this makes the code more universal - I'll take advandate of it in next patches where I start handling additional types of instructions.
Test Plan: Tests would be submitted in subsequent patches.
Reviewers: atrick, chandlerc
Reviewed By: atrick, chandlerc
Subscribers: atrick, llvm-commits
Differential Revision: http://reviews.llvm.org/D10205
llvm-svn: 239282
Summary:
canUnrollCompletely takes `unsigned` values for `UnrolledCost` and
`RolledDynamicCost` but is passed in `uint64_t`s that are silently
truncated. Because of this, when `UnrolledSize` is a large integer
that has a small remainder with UINT32_MAX, LLVM tries to completely
unroll loops with high trip counts.
Reviewers: mzolotukhin, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10293
llvm-svn: 239218
CVP wants to analyze the condition operand of a select along an edge.
It succeeds in getting back a Constant but not a ConstantInt. Instead,
it gets a ConstantExpr. It then assumes that the Constant must be equal
to false because it isn't equal to true.
Instead, perform an additional comparison.
This fixes PR23752.
llvm-svn: 239217
The new naming is (to me) much easier to understand. Here is a summary
of the new state of the world:
- '*Threshold' is the threshold for full unrolling. It is measured
against the estimated unrolled cost as computed by getUserCost in TTI
(or CodeMetrics, etc). We will exceed this threshold when unrolling
loops where unrolling exposes a significant degree of simplification
of the logic within the loop.
- '*PercentDynamicCostSavedThreshold' is the percentage of the loop's
estimated dynamic execution cost which needs to be saved by unrolling
to apply a discount to the estimated unrolled cost.
- '*DynamicCostSavingsDiscount' is the discount applied to the estimated
unrolling cost when the dynamic savings are expected to be high.
When actually analyzing the loop, we now produce both an estimated
unrolled cost, and an estimated rolled cost. The rolled cost is notably
a dynamic estimate based on our analysis of the expected execution of
each iteration.
While we're still working to build up the infrastructure for making
these estimates, to me it is much more clear *how* to make them better
when they have reasonably descriptive names. For example, we may want to
apply estimated (from heuristics or profiles) dynamic execution weights
to the *dynamic* cost estimates. If we start doing that, we would also
need to track the static unrolled cost and the dynamic unrolled cost, as
only the latter could reasonably be weighted by profile information.
This patch is sadly not without functionality change for the new unroll
analysis logic. Buried in the heuristic management were several things
that surprised me. For example, we never subtracted the optimized
instruction count off when comparing against the unroll heursistics!
I don't know if this just got lost somewhere along the way or what, but
with the new accounting of things, this is much easier to keep track of
and we use the post-simplification cost estimate to compare to the
thresholds, and use the dynamic cost reduction ratio to select whether
we can exceed the baseline threshold.
The old values of these flags also don't necessarily make sense. My
impression is that none of these thresholds or discounts have been tuned
yet, and so they're just arbitrary placehold numbers. As such, I've not
bothered to adjust for the fact that this is now a discount and not
a tow-tier threshold model. We need to tune all these values once the
logic is ready to be enabled.
Differential Revision: http://reviews.llvm.org/D9966
llvm-svn: 239164
port it to the new pass manager.
All this does is extract the inner "location" class used by AA into its
own full fledged type. This seems *much* cleaner as MemoryDependence and
soon MemorySSA also use this heavily, and it doesn't make much sense
being inside the AA infrastructure.
This will also make it much easier to break apart the AA infrastructure
into something that stands on its own rather than using the analysis
group design.
There are a few places where this makes APIs not make sense -- they were
taking an AliasAnalysis pointer just to build locations. I'll try to
clean those up in follow-up commits.
Differential Revision: http://reviews.llvm.org/D10228
llvm-svn: 239003
Summary:
Once a gc.statepoint has been rewritten to relocate live references, the
SSA values represent physical pointers instead of logical references.
Logical dereferencability does not imply physical dereferencability and
after RewriteStatepointsForGC has run any attributes that imply
dereferencability of the logical references need to be stripped.
This current approach is conservative, and can be made more precise
later if needed. For starters, we need to strip dereferencable
attributes only from pointers that live in the GC address space.
Reviewers: reames, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10105
llvm-svn: 238883
Summary:
A later change that has RewriteStatepointsForGC change function
attributes throughout the module depends on this.
Reviewers: reames, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10104
llvm-svn: 238882
If the type isn't trivially moveable emplace can skip a potentially
expensive move. It also saves a couple of characters.
Call sites were found with the ASTMatcher + some semi-automated cleanup.
memberCallExpr(
argumentCountIs(1), callee(methodDecl(hasName("push_back"))),
on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))),
hasArgument(0, bindTemporaryExpr(
hasType(recordDecl(hasNonTrivialDestructor())),
has(constructExpr()))),
unless(isInTemplateInstantiation()))
No functional change intended.
llvm-svn: 238602
The patch evaluates the expansion cost of exitValue in indVarSimplify pass, and only does the rewriting when the expansion cost is low or loop can be deleted with the rewriting. It provides an option "-replexitval=" to control the default aggressiveness of the exitvalue rewriting. It also fixes some missing cases in SCEVExpander::isHighCostExpansionHelper to enhance the evaluation of SCEV expansion cost.
Differential Revision: http://reviews.llvm.org/D9800
llvm-svn: 238507
Canonicalizing 'x [+-] (-Constant * y)' is not a win if we don't *know*
we will open up CSE opportunities.
If the multiply was 'nsw', then negating 'y' requires us to clear the
'nsw' flag. If this is actually worth pursuing, it is probably more
appropriate to do so in GVN or EarlyCSE.
This fixes PR23675.
llvm-svn: 238397
Summary:
This patch made two improvements to NaryReassociate and the NVPTX pipeline
1. Run EarlyCSE/GVN after NaryReassociate to get rid of redundant common
expressions.
2. When adding an instruction to SeenExprs, maps both the SCEV before and after
reassociation to that instruction.
Test Plan: updated @reassociate_gep_nsw in nary-gep.ll
Reviewers: meheff, broune
Reviewed By: broune
Subscribers: dberlin, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9947
llvm-svn: 238396
Long ago, the poll insertion code assumed that the insertion site was a terminator. As a result, the entry selection code would split a basic block to ensure it could pass a terminator. The insertion code was updated quite a while ago - possibly before it ever landed upstream - but the now redundant work was never removed.
While I'm at it, remove a comment which doesn't apply to the upstreamed code.
NFC intended.
llvm-svn: 238254
While working on another change, I noticed that the naming in this function was mildly deceptive. While fixing that, I took the oppurtunity to modernize some of the code. NFC intended.
llvm-svn: 238252
lazily built.
Also, make it a much more generic SCEV cache, which today exposes only
a reduced GEP model description but could be extended in the future to
do other profitable caching of SCEV information.
llvm-svn: 238124
This patch extends EarlyCSE to take advantage of the information that a controlling branch gives us about the value of a Value within this and dominated basic blocks. If the current block has a single predecessor with a controlling branch, we can infer what the branch condition must have been to execute this block. The actual change to support this is downright simple because EarlyCSE's existing scoped hash table logic deals with most of the complexity around merging.
The patch actually implements two optimizations.
1) The first is analogous to JumpThreading in that it enables EarlyCSE's CSE handling to fold branches which are exactly redundant due to a previous branch to branches on constants. (It doesn't actually replace the branch or change the CFG.) This is pretty clearly a win since it enables substantial CFG simplification before we start trying to inline.
2) The second is analogous to CVP in that it exploits the knowledge gained to replace dominated *uses* of the original value. EarlyCSE does not otherwise reason about specific uses, so this is the more arguable one. It does enable further simplication and constant folding within the rest of the visit by EarlyCSE.
In both cases, the added code only handles the easy dominance based case of each optimization. The general case is deferred to the existing passes.
Differential Revision: http://reviews.llvm.org/D9763
llvm-svn: 238071
accumulating estimated cost, and other loop-centric logic from the logic
used to analyze instructions in a particular iteration.
This makes the visitor very narrow in scope -- all it does is visit
instructions, update a map of simplified values, and return whether it
is able to optimize away a particular instruction.
The two cost metrics are now returned as an optional struct. When the
optional is left unengaged, there is no information about the unrolled
cost of the loop, when it is engaged the cost metrics are available to
run against the thresholds.
No functionality changed.
llvm-svn: 238033
simplified model for use simulating each iteration into a separate
helper function that just returns the cache.
Building this cache had nothing to do with the rest of the unroll
analysis and so this removes an unnecessary coupling, etc. It should
also make it easier to think about the concept of providing fast cached
access to basic SCEV models as an orthogonal concept to the overall
unroll simulation.
I'd really like to see this kind of caching logic folded into SCEV
itself, it seems weird for us to provide it at this layer rather than
making repeated queries into SCEV fast all on their own.
No functionality changed.
llvm-svn: 237993
a single location.
This reduces code duplication a bit and will also pave the way for
a better separation between the visitation algorithm and the unroll
analysis.
No functionality changed.
llvm-svn: 237990
PR23608 pointed out that using the preheader to gain a context instruction isn't always legal because a loop might not have a preheader. When looking into that, I realized that using the preheader to determine legality for sinking is questionable at best. Given no test covers that case and the original commit didn't seem to intend it, I restructured the code to only ask context sensative queries for hoising of loads and stores. This is effectively a partial revert of 237593.
llvm-svn: 237985
Summary:
x = &a[i];
y = &a[i + j];
=>
y = x + j;
along with some refactoring work such as extracting method
findClosestMatchingDominator.
Depends on D9786 which provides the ScalarEvolution::getGEPExpr interface.
Test Plan: nary-gep.ll
Reviewers: meheff, broune
Reviewed By: broune
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9802
llvm-svn: 237971
Correct assertion would be that there is no other uses from chain we are currently cloning. It is ok to have other uses of values not from this chain.
Differential Revision: http://reviews.llvm.org/D9882
llvm-svn: 237899
In effect a partial revert of r237858, which was a dumb shortcut.
Looking at the dependencies of the destination should be the proper
fix: if the new memset would depend on anything other than itself,
the transformation isn't correct.
llvm-svn: 237874
Fixes PR23599, another miscompile introduced by r235232: when there is
another dependency on the destination of the created memset (i.e., the
part of the original destination that the memcpy doesn't depend on)
between the memcpy and the original memset, we would insert the created
memset after the memcpy, and thus after the other dependency.
Instead, insert the created memset right after the old one.
llvm-svn: 237858
This change adds a new GC strategy for supporting the CoreCLR runtime.
This strategy is currently identical to Statepoint-example GC,
but is necessary for several upcoming changes specific to CoreCLR, such as:
1. Base-pointers not explicitly reported for interior pointers
2. Different format for stack-map encoding
3. Location of Safe-point polls: polls are only needed before loop-back edges and before tail-calls (not needed at function-entry)
4. Runtime specific handshake between calls to managed/unmanaged functions.
llvm-svn: 237753
We were special casing a handful of intrinsics as not needing a safepoint before them. After running into another valid case - memset - I took a closer look and realized that almost no intrinsics need to have a safepoint poll before them. Restructure the code to make that apparent so that we stop hitting these bugs. The only intrinsics which need a safepoint poll before them are ones which can run arbitrary code.
llvm-svn: 237744
Summary: When PlaceSafepoints pass replaces old return result with gc_result from statepoint, it asserts that gc_result can not have preceding phis in its parent block. This is only true on invoke statepoint, which terminates the block and puts its result at the beginning of the normal successor block. Call statepoint does not terminate the block and thus its result is in the same block with it. There should be no restriction on whether there are phis or not.
Reviewers: reames, igor-laevsky
Reviewed By: igor-laevsky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9803
llvm-svn: 237597
Summary:
Allow hoisting of loads from values marked with dereferenceable_or_null
attribute. For values marked with the attribute perform
context-sensitive analysis to determine whether it's known-non-null or
not.
Patch by Artur Pilipenko!
Reviewers: hfinkel, sanjoy, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9253
llvm-svn: 237593
Summary:
This allows other passes (such as SLSR) to compute the SCEV expression for an
imaginary GEP.
Test Plan: no regression
Reviewers: atrick, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9786
llvm-svn: 237589
Don't replace a phi with an identical phi. This was done long ago to
"preserve" IVUsers analysis. The code has already called
SE->forgetValue(PN) so I see no purpose in creating a new value for
the phi.
llvm-svn: 237587
There's no point in copying around constants, so, when all else fails,
we can still transform memcpy of memset into two independent memsets.
To quote the example, we can turn:
memset(dst1, c, dst1_size);
memcpy(dst2, dst1, dst2_size);
into:
memset(dst1, c, dst1_size);
memset(dst2, c, dst2_size);
When dst2_size <= dst1_size.
Like r235232 for copy constructors, this can occur in move constructors.
Differential Revision: http://reviews.llvm.org/D9682
llvm-svn: 237506
Summary:
This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%.
Credit goes to Jingyue Wu for writing an earlier version of this pass.
Patched by Bjarke Roune.
Test Plan:
This patch adds a set of tests in test/Transforms/SpeculativeExecution/spec.ll
The pass is controlled by a flag which defaults to having the pass not run.
Reviewers: eliben, dberlin, meheff, jingyue, hfinkel
Reviewed By: jingyue, hfinkel
Subscribers: majnemer, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9360
llvm-svn: 237459
Summary:
Consider (B | i) * S as (B + i) * S if B and i have no bits set in
common.
Test Plan: @or in slsr-mul.ll
Reviewers: broune, meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9788
llvm-svn: 237456
Transfer the calling convention from the invoke being replaced by
PlaceStatepoints to the new invoke to gc.statepoint created. Add a test
case that would have caught this issue.
llvm-svn: 237414
rL236672 would generate all invoke statepoints with deopt args set to a
list containing the single element "0", instead of an empty list.
Also add a test case that would have caught this.
llvm-svn: 237413
Summary:
Extract method haveNoCommonBitsSet so that we don't have to duplicate this logic in
InstCombine and SeparateConstOffsetFromGEP.
This patch also makes SeparateConstOffsetFromGEP more precise by passing
DominatorTree to computeKnownBits.
Test Plan: value-tracking-domtree.ll that tests ValueTracking indeed leverages dominating conditions
Reviewers: broune, meheff, majnemer
Reviewed By: majnemer
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9734
llvm-svn: 237407
Summary:
This implements the initial version as was proposed earlier this year
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080462.html).
Since then Loop Access Analysis was split out from the Loop Vectorizer
and was made into a separate analysis pass. Loop Distribution becomes
the second user of this analysis.
The pass is off by default and can be enabled
with -enable-loop-distribution. There is currently no notion of
profitability; if there is a loop with dependence cycles, the pass will
try to split them off from other memory operations into a separate loop.
I decided to remove the control-dependence calculation from this first
version. This and the issues with the PDT are actively discussed so it
probably makes sense to treat it separately. Right now I just mark all
terminator instruction required which keeps identical CFGs for each
distributed loop. This seems to be working pretty well for 456.hmmer
where even though there is an empty if-then block in the distributed
loop initially, it gets completely removed.
The pass keeps DominatorTree and LoopInfo updated. I've tested this
with -loop-distribute-verify with the testsuite where we distribute ~90
loops. SimplifyLoop is violated in some cases and I have a FIXME
covering this.
Reviewers: hfinkel, nadav, aschwaighofer
Reviewed By: aschwaighofer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8831
llvm-svn: 237358
ArrayRef already has a SFINAE constructor which can construct ArrayRef<const T*> from ArrayRef<T*>.
This adds methods to do the same directly from SmallVector and std::vector. This avoids an intermediate step through the use of makeArrayRef.
Also update the users of this in LICM and SROA to remove the now unnecessary makeArrayRef call.
Reviewed by David Blaikie.
llvm-svn: 237309
Summary:
This patch teaches the PlaceSafepoints pass about two `CallSite`
function attributes:
* "statepoint-id": if the string value of this attribute can be parsed
as an integer, then it is propagated to the ID parameter of the
statepoint created.
* "statepoint-num-patch-bytes": if the string value of this attribute
can be parsed as an integer, then it is propagated to the `num patch
bytes` parameter of the statepoint created.
This change intentionally does not assert on a malformed value for these
attributes, given that they're not "official" attributes.
Reviewers: reames, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9735
llvm-svn: 237286
Avoid running forever by checking we are not reassociating an expression into
the same form.
Tested with @avoid_infinite_loops in nary-add.ll
llvm-svn: 237269
This patch uses the new function profile metadata "function_entry_count"
to annotate entry counts from sample profiles.
In a sampling profile, the total samples collected at the function entry
are an approximation for the number of times that function was invoked.
llvm-svn: 237265
The array passed to LoadAndStorePromoter's constructor was a constant reference to a SmallVectorImpl, which is just the same as passing an ArrayRef.
Also, the data in the array can be 'const Instruction*' instead of 'Instruction*'. Its not possible to convert a SmallVectorImpl<T*> to SmallVectorImpl<const T*>, but ArrayRef does provide such a method.
Currently this added calls to makeArrayRef which should be a nop, but i'm going to kick off a discussion about improving ArrayRef to not need these.
llvm-svn: 237226
Reduce recalculation of the dominator tree by identifying all sites that will need a safepoint poll before doing any of the insertion. This allows us to invalidate the dominator info once, rather than once per safepoint poll inserted.
While I'm at it, update findLocationForEntrySafepoint to properly update the dom tree now that the interface has been made easy. When first written, it wasn't per comment in the code.
Differential Revision: http://reviews.llvm.org/D9727
llvm-svn: 237220
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`. `id` gets propagated to the ID field
in the generated StackMap section. If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.
This change brings statepoints one step closer to patchpoints. With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.
PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`. This can be made more sophisticated
later.
Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9546
llvm-svn: 237214
Responding to review feedback from http://reviews.llvm.org/D9585
1) Remove a variable shadow by converting the outer loop to a range for loop. We never really used the 'i' variable which was being shadowed.
2) Reduce DominatorTree recalculations by passing the DT to SplitEdge.
llvm-svn: 237212
checking and make the cache faster and smaller.
I had thought that using an APInt here would be useful, but I think
I was just wrong. Notably, we don't have to do any fancy overflow
checking, we can just bound the values as quite small and do the math in
a higher precision integer. I've switched to a signed integer so that
UBSan will even point out if we ever have integer overflow. I've added
various asserts to try to catch things as well and hoisted the overflow
checks so that we just leave the too-large offsets out of the SCEV-GEP
cache. This makes the value in the cache quite a bit smaller which is
probably worthwhile.
No functionality changed here (for trip counts under 1 billion).
llvm-svn: 237209
Summary:
If the branch that leads to the PHI node and the Select instruction
depend on correlated conditions, we might be able to directly use the
corresponding value from the Select instruction as the incoming value
for the PHI node, allowing later removal of the select instruction.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9051
llvm-svn: 237201
When relocating a pointer, we need to determine a base pointer for the derived pointer being relocated. We have limited support for handling a pointer extracted from a vector; the current code only handled the case where the entire vector was known to contain base pointers. This patch extends the reasoning to handle chains of insertelements where the indices are constants. This case turns out to be fairly common in vectorized code. We can now handle vectors which contains mixtures of base and derived pointers provided the insertelements use constant indices.
Note that this doesn't solve the general problem. To handle variable indexed insertelements, we'd need to scalarize and introduce conditional branching based on the index. Alternatively, we could eagerly scalarize, but the code structure doesn't currently make either fix easy. The patch also doesn't handle shufflevector or other vector manipulation for much the same reasons. I plan to defer this work until I have a motivating test case.
Differential Revision: http://reviews.llvm.org/D9676
llvm-svn: 237200
The pass doesn't actually modify the module outside of the function being processed. The only confusing piece is that it both inserts calls and then inlines the resulting calls. Given that, it definitely invalidates module level analysis results, but many FunctionPasses do that.
Differential Revision: http://reviews.llvm.org/D9590
llvm-svn: 237185
Switch from using a LoopPass to using a FunctionPass for the internal helper analysis pass. The next step is going to be to make this a true analysis pass which is required by the PlaceSafepoints pass itself.
p.s. The interesting semantic part here is that we're changing the iteration order over the loops. It shouldn't matter, but that's the reason to separate this into it's own distinct patch.
Differential Revision: http://reviews.llvm.org/D9588
llvm-svn: 237180
The old code computed dominators for every loop. This was terribly slow with no good reason. Just use the standard infrastructure for analysis passes.
Differential Revision: http://reviews.llvm.org/D9586
llvm-svn: 237176
As a step towards getting rid of internal pass manager hack entirely, remove the need for loop simplify to run in the inner pass manager. The new code does produce slightly different loop structures, so this isn't technically NFC.
Differential Revision: http://reviews.llvm.org/D9585
llvm-svn: 237172
We already had a method to iterate over all the incoming values of a PHI. This just changes all eligible code to use it.
Ineligible code included anything which cared about the index, or was also trying to get the i'th incoming BB.
llvm-svn: 237169
Summary:
This patch reimplements heuristic that tries to estimate optimization beneftis
from complete loop unrolling.
In this patch I kept the minimal changes - e.g. I removed code handling
branches and folding compares. That's a promising area, but now there
are too many questions to discuss before we can enable it.
Test Plan: Tests are included in the patch.
Reviewers: hfinkel, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8816
llvm-svn: 237156
Summary:
This patch is to rename some variables to CamelCase in gc_relocate
related functions. There is no functionality change.
Patch by Chen Li!
Reviewers: reames, AndyAyers, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9681
llvm-svn: 237069
This fixes another miscompile introduced by r235232: when there was a
dependency on the memcpy destination other than the memset, we would
ignore it, because we only looked at the source dependency.
It was a mistake to use SrcDepInfo. Instead, just use DepInfo.
llvm-svn: 237066
runOnCountable() allowed the caller to call on a loop without a
predictable backedge-taken count. Change the code so that only loops
with computable backdge-count can call this function, in order to catch
abuses.
llvm-svn: 237044
Summary:
In RewriteStatepointsForGC pass, we create a gc_relocate intrinsic for
each relocated pointer, and the gc_relocate has the same type with the
pointer. During the creation of gc_relocate intrinsic, llvm requires to
mangle its type. However, llvm does not support mangling of all possible
types. RewriteStatepointsForGC will hit an assertion failure when it
tries to create a gc_relocate for pointer to vector of pointers because
mangling for vector of pointers is not supported.
This patch changes the way RewriteStatepointsForGC pass creates
gc_relocate. For each relocated pointer, we erase the type of pointers
and create an unified gc_relocate of type i8 addrspace(1)*. Then a
bitcast is inserted to convert the gc_relocate to the correct type. In
this way, gc_relocate does not need to deal with different types of
pointers and the unsupported type mangling is no longer a problem. This
change would also ease further merge when LLVM erases types of pointers
and introduces an unified pointer type.
Some minor changes are also introduced to gc_relocate related part in
InstCombineCalls, CodeGenPrepare, and Verifier accordingly.
Patch by Chen Li!
Reviewers: reames, AndyAyers, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9592
llvm-svn: 237009
If we have recognized that a conditional is constant at a particular location in the code (while trying to decide if we can simplify a conditional branch), we can eagerly replace that condition with a constant if it's definition is post dominated by the branch in question.
In practice, this ends up being a compile time savings at most. JumpThreading would have visited each using branch anyways. CVP would have visited the cmp itself again. Unless LVI gives up early, we shouldn't gain any addition power by doing this transformation early. What we do gain is simplicity and compile time.
Differential Revision: http://reviews.llvm.org/D9312
llvm-svn: 236684
Renames the original CreateGCStatepoint to CreateGCStatepointCall, and
moves invoke creating functionality from PlaceSafepoints.cpp to
IRBuilder.cpp.
This changes the labels generated for PlaceSafepoints/invokes.ll so use
a regex there to make the basic block labels more resilient.
llvm-svn: 236672
I folded the check for the flag -verify-dom-info into the only caller
where I think it is supposed to be checked: verifyAnalysis. (The idea
of the flag is to enable this expensive verification in
verifyPreservedAnalysis.)
I'm assuming that when manually scheduling the verification pass
with -passes=verify<domtree>, we do want to perform the verification.
llvm-svn: 236575
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
llvm-svn: 236120
There can be various constant pointers in the IR which do not get relocated at a safepoint. One example is the address of a global variable. Another example is a pointer created via inttoptr. Note that the optimizer itself likes to create such inttoptrs when locally propagating constants through dynamically dead code.
To deal with this, we need to exclude uses of constants from contributing to the liveness of a safepoint which might reach that use. At some later date, it might be worth exploring what could be done to support the relocation of various special types of "constants", but that's future work.
Differential Revision: http://reviews.llvm.org/D9236
llvm-svn: 235821
llvm.frameescape() intrinsic is not a real call. The intrinsic can only exist in the entry block. Inserting a gc.statepoint() before llvm.frameescape() may split the entry block, and push the intrinsic out of the entry block.
Patch by: Swaroop.Sridhar@microsoft.com
Differential Revision: http://reviews.llvm.org/D8910
llvm-svn: 235820
Move isDereferenceablePointer function to Analysis. This function recursively tracks dereferencability over a chain of values like other functions in ValueTracking.
This refactoring is motivated by further changes to support dereferenceable_or_null attribute (http://reviews.llvm.org/D8650). isDereferenceablePointer will be extended to perform context-sensitive analysis and IR is not a good place to have such functionality.
Patch by: Artur Pilipenko <apilipenko@azulsystems.com>
Differential Revision: reviews.llvm.org/D9075
llvm-svn: 235611
MemIntrinsic::getDest() looks through pointer casts, and using it
directly when building the new GEP+memset results in stuff like:
%0 = getelementptr i64* %p, i32 16
%1 = bitcast i64* %0 to i8*
call ..memset(i8* %1, ...)
instead of the correct:
%0 = bitcast i64* %p to i8*
%1 = getelementptr i8* %0, i32 16
call ..memset(i8* %1, ...)
Instead, use getRawDest, which just gives you the i8* value.
While there, use the memcpy's dest, as it's live anyway.
In most cases, when the optimization triggers, the memset and memcpy
sizes are the same, so the built memset is 0-sized and eliminated.
The problem occurs when they're different.
Fixes a regression caused by r235232: PR23300.
llvm-svn: 235419
Summary:
This lets us use range based for loops.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9169
llvm-svn: 235416
Summary:
After we rewrite a candidate, the instructions used by the old form may
become unused. This patch cleans up these unused instructions so that we
needn't run DCE after SLSR.
Test Plan: removed -dce in all the SLSR tests
Reviewers: broune, meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9101
llvm-svn: 235410
Summary: so that we needn't run DCE after this pass.
Test Plan: removed -dce from the commandline in split-gep.ll and split-gep-and-gvn.ll
Reviewers: meheff
Subscribers: llvm-commits, HaoLiu, hfinkel, jholewinski
Differential Revision: http://reviews.llvm.org/D9096
llvm-svn: 235409
Harden r235258 to support any integer bitwidth. The quick glance at
the reference made me think only i32 and i64 were valid types, but
they're not special, so any overload is legal.
Thanks to David Majnemer for noticing!
llvm-svn: 235261
Followup to r235232, which caused PR23278.
We can't assume the memset and memcpy sizes have the same type, as
nothing in the language reference prevents that.
Instead, zext both to i64 if they disagree.
While there, robustify tests by using i8 %c rather than i8 0 for the
memset character.
llvm-svn: 235258
A common idiom in some code is to do the following:
memset(dst, 0, dst_size);
memcpy(dst, src, src_size);
Some of the memset is redundant; instead, we can do:
memcpy(dst, src, src_size);
memset(dst + src_size, 0,
dst_size <= src_size ? 0 : dst_size - src_size);
Original patch by: Joel Jones
Differential Revision: http://reviews.llvm.org/D498
llvm-svn: 235232
Summary:
An alternative is to use a worklist approach. However, that approach
would break the traversing order so that we couldn't lookup SeenExprs
efficiently. I don't see a clear winner here, so I picked the easier approach.
Along with two minor improvements:
1. preserves ScalarEvolution by forgetting instructions replaced
2. removes dead code locally avoiding the need of running DCE afterwards
Test Plan: add to slsr-add.ll a test that requires multiple iterations
Reviewers: broune, dberlin, atrick, meheff
Reviewed By: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9058
llvm-svn: 235151
Summary:
This fixes a left-over efficiency issue in D8950.
As Andrew and Daniel suggested, we can store the candidates in a stack
and pop the top element when it does not dominate the current
instruction. This reduces the worst-case time complexity to O(n).
Test Plan: a new test in nary-add.ll that exercises this optimization.
Reviewers: broune, dberlin, meheff, atrick
Reviewed By: atrick
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D9055
llvm-svn: 235129
Change `DIBuilder::insertDeclare()` and `insertDbgValueIntrinsic()` to
take an `MDLocation*`/`DebugLoc` parameter which it attaches to the
created intrinsic. Assert at creation time that the `scope:` field's
subprogram matches the variable's. There's a matching `clang` commit to
use the API.
The context for this is PR22778, which is removing the `inlinedAt:`
field from `MDLocalVariable`, instead deferring to the `!dbg` location
attached to the debug info intrinsic. The best way to ensure we always
have a `!dbg` attachment is to require one at creation time. I'll be
adding verifier checks next, but this API change is the best way to
shake out frontend bugs.
Note: I added an `llvm_unreachable()` in `bindings/go` and passed in
`nullptr` for the `DebugLoc`. The `llgo` folks will eventually need to
pass a valid `DebugLoc` here.
llvm-svn: 235041
Summary:
With this patch, SLSR may rewrite
S1: X = B + i * S
S2: Y = B + i' * S
to
S2: Y = X + (i' - i) * S
A secondary improvement: if (i' - i) is a power of 2, emit Y as X + (S << log(i' - i)). (S << log(i' -i)) is in a canonical form and thus more likely GVN'ed than (i' - i) * S.
Test Plan: slsr-add.ll
Reviewers: hfinkel, sanjoy, meheff, broune, eliben
Reviewed By: eliben
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8983
llvm-svn: 235019
Summary:
This transformation reassociates a n-ary add so that the add can partially reuse
existing instructions. For example, this pass can simplify
void foo(int a, int b) {
bar(a + b);
bar((a + 2) + b);
}
to
void foo(int a, int b) {
int t = a + b;
bar(t);
bar(t + 2);
}
saving one add instruction.
Fixes PR22357 (https://llvm.org/bugs/show_bug.cgi?id=22357).
Test Plan: nary-add.ll
Reviewers: broune, dberlin, hfinkel, meheff, sanjoy, atrick
Reviewed By: sanjoy, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8950
llvm-svn: 234855
Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses. Note
that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`.
llvm-svn: 234850
Summary:
Runtime unrolling of loops needs to emit an expression to compute the
loop's runtime trip-count. Avoid runtime unrolling if this computation
will be expensive.
Depends on D8993.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8994
llvm-svn: 234846
Summary:
Move isHighCostExpansion from IndVarSimplify to SCEVExpander. This
exposed function will be used in a subsequent change.
Reviewers: bogner, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8995
llvm-svn: 234844
This is along the same lines as r234832, but for `DILocation`. Clean
out all accessors from `DILocation`. Any callers should be using
`MDLocation` directly (e.g., via `operator->()`).
llvm-svn: 234835
Completely gut `DIExpression`, turning it into a simple wrapper around
`MDExpression *`. There are two bits of magic left:
- It's constructed from `const MDExpression*` but convertible to
`MDExpression*`.
- It's default-constructed to `nullptr`.
Otherwise, it should behave quite like a raw pointer. Once I've done
the same to the rest of the `DIDescriptor` subclasses, I'll come back to
delete them entirely (and update call sites as necessary to deal with
the missing magic).
llvm-svn: 234832
Before we had real liveness, we needed to track every value that base pointer
insertion code created because these now might be live. We now just rerun
the data flow liveness algorithm (which is actually faster!) and no longer
need the associated code.
llvm-svn: 234827
Use early-return style that's preferred in LLVM and updating the naming in places I touched with other changes in the last few days. Hopefully, NFC.
llvm-svn: 234785
We use dummy calls to adjust the liveness of values over statepoints in the midst of the insertion. If there are no values which need held live, there's no point in actually inserting the holder.
llvm-svn: 234779
Since we're restructuring the CFG, we also need to make sure to update the analsis passes. While I'm touching the code, I dedicided to restructure it a bit. The code involved here was very confusing. This change moves the normalization to essentially being a pre-pass before the main insertion work and updates a few comments to actually say what is happening and *why*.
The restructuring should be covered by existing tests. I couldn't easily see how to create a test for the invalidation bug. Suggestions welcome.
llvm-svn: 234769
This is related to the issues addressed in 234651. These assertions check the properties ensured by that change at the place of use. Note that a similiar property is checked in checkBasicSSA, but without the reachability constraint. Technically, the liveness would be correct to include unreachable values, but this would be problematic for actual relocation.
llvm-svn: 234766
The check in question is attempting to help find cases where we haven't relocated a pointer at a safepoint we should have. It does this by coercing the value to null at any safepoint which doesn't relocate it.
Unfortunately, this turns out to be rather expensive in terms of memory usage and time. The number of stores inserted can grow with O(number of values x number of statepoints). On at least one example I looked at, over half of peak memory usage was coming from this check.
With this change, the check is no longer enabled by default in Asserts builds. It is enabled for expensive asserts builds and has a command line option to enable it in both Asserts and non-Asserts builds.
llvm-svn: 234761
The patch is generated using clang-tidy misc-use-override check.
This command was used:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
-checks='-*,misc-use-override' -header-filter='llvm|clang' \
-j=32 -fix -format
http://reviews.llvm.org/D8925
llvm-svn: 234679
When rewriting statepoints to make relocations explicit, we need to have a conservative but consistent notion of where a particular pointer is live at a particular site. The old code just used dominance, which is correct, but decidedly more conservative then it needed to be. This patch implements a simple dataflow algorithm that's run one per function (well, twice counting fixup after base pointer insertion). There's still lots of room to make this faster, but it's fast enough for all practical purposes today.
Differential Revision: http://reviews.llvm.org/D8674
llvm-svn: 234657
After submitting 234651, I noticed I hadn't responded to a review comment by mjacob. This patch addresses that comment and fixes a Release only build problem due to an unused variable.
llvm-svn: 234653
Two related small changes:
Various dominance based queries about liveness can get confused if we're talking about unreachable blocks. To avoid reasoning about such cases, just remove them before rewriting statepoints.
Remove single entry phis (likely left behind by LCSSA) to reduce the number of live values.
Both of these are motivated by http://reviews.llvm.org/D8674 which will be submitted shortly.
Differential Revision: http://reviews.llvm.org/D8675
llvm-svn: 234651
This patch adds limited support for inserting explicit relocations when there's a vector of pointers live over the statepoint. This doesn't handle the case where the vector contains a mix of base and non-base pointers; that's future work.
The current implementation just scalarizes the vector over the gc.statepoint before doing the explicit rewrite. An alternate approach would be to plumb the vector all the way though the backend lowering, but doing that appears challenging. In particular, the size of the indirect spill slot is currently assumed to be sizeof(pointer) throughout the backend.
In practice, this is enough to allow running the SLP and Loop vectorizers before RewriteStatepointsForGC.
Differential Revision: http://reviews.llvm.org/D8671
llvm-svn: 234647
CallSite roughly behaves as a common base CallInst and InvokeInst. Bring
the behavior closer to that model by making upcasts explicit. Downcasts
remain implicit and work as before.
Following dyn_cast as a mental model checking whether a Value *V isa
CallSite now looks like this:
if (auto CS = CallSite(V)) // think dyn_cast
instead of:
if (CallSite CS = V)
This is an extra token but I think it is slightly clearer. Making the
ctor explicit has the advantage of not accidentally creating nullptr
CallSites, e.g. when you pass a Value * to a function taking a CallSite
argument.
llvm-svn: 234601
The plan here is to push the API changes out from the common components
(like Constant::getGetElementPtr and IRBuilder::CreateGEP related
functions) and just update callers to either pass the type if it's
obvious, or pass null.
Do this with LoadInst as well and anything else that comes up, then to
start porting specific uses to not pass null anymore - this may require
some refactoring in each case.
llvm-svn: 234042
Summary:
The old requirement on GEP candidates being in bounds is unnecessary.
For off-bound GEPs, we still have
&B[i * S] = B + (i * S) * e = B + (i * e) * S
Test Plan: slsr_offbound_gep in slsr-gep.ll
Reviewers: meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8809
llvm-svn: 233949
Require the pointee type to be passed explicitly and assert that it is
correct. For now it's possible to pass nullptr here (and I've done so in
a few places in this patch) but eventually that will be disallowed once
all clients have been updated or removed. It'll be a long road to get
all the way there... but if you have the cahnce to update your callers
to pass the type explicitly without depending on a pointer's element
type, that would be a good thing to do soon and a necessary thing to do
eventually.
llvm-svn: 233938
This re-adds float2int to the tree, after fixing PR23038. It turns
out the argument to APSInt() is true-if-unsigned, rather than
true-if-signed :(. Added testcase and explanatory comment.
llvm-svn: 233370
The assertion here was more expensive then it needed to be. We're only inserting allocas in the entry block, so we only need to consider ones in the entry block.
llvm-svn: 233362
Summary:
This patch enhances SLSR to handle another candidate form &B[i * S]. If
we found two candidates
S1: X = &B[i * S]
S2: Y = &B[i' * S]
and S1 dominates S2, we can replace S2 with
Y = &X[(i' - i) * S]
Test Plan:
slsr-gep.ll
X86/no-slsr.ll: verify that we do not run SLSR on GEPs that already fit into
an addressing mode
Reviewers: eliben, atrick, meheff, hfinkel
Reviewed By: hfinkel
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D7459
llvm-svn: 233286
The changes to InstCombine do seem a bit silly - it doesn't make
anything obviously better to have the caller access the pointers element
type (the thing I'm trying to remove) than the GEP itself, but it's a
helpful migration step. This will allow me to more obviously lock down
GEP (& Load, etc) API usage, then fix all the code that accesses pointer
element types except the places that need to be removed (most of the
InstCombines) anyway - at which point I'll need to just remove all that
code because it won't be meaningful anymore (there will be no pointer
types, so no bitcasts to combine)
llvm-svn: 233126
This caused PR23008, compiles failing with: "Use still stuck around after Def is
destroyed: %.sroa.speculated"
Also reverting follow-up r233064.
llvm-svn: 233105
IRCE requires the induction variables it handles to not sign-overflow.
The current scheme of checking if sext({X,+,S}) == {sext(X),+,sext(S)}
fails when SCEV simplifies sext(X) too. After this change we //also//
check no-signed-wrap by looking at the flags set on the SCEVAddRecExpr.
llvm-svn: 233102
It is possible to have code that converts from integer to float, performs operations then converts back, and the result is provably the same as if integers were used.
This can come from different sources, but the most obvious is a helper function that uses floats but the arguments given at an inlined callsites are integers.
This pass considers all integers requiring a bitwidth less than or equal to the bitwidth of the mantissa of a floating point type (23 for floats, 52 for doubles) as exactly representable in floating point.
To reduce the risk of harming efficient code, the pass only attempts to perform complete removal of inttofp/fptoint operations, not just move them around.
llvm-svn: 233062
Don't use `DebugLoc` accessors if we're pointing at null, which will be
a problem after a WIP patch to make the `DIDescriptor` accessors more
strict. Caught by Frontend/profile-sample-use-loc-tracking.c (in
clang).
llvm-svn: 232792
Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass.
Instead, call into the `DebugInfoVerifier` from inside
`VerifierLegacyPass::finalizeModule()`. This better matches the logic
in `verifyModule()` (used by the new PassManager), avoids requiring two
separate passes to verify the IR, and makes the API for "add a pass to
verify the IR" simple.
Note: the `-verify-debug-info` flag still works (for now, at least;
eventually it might make sense to just remove it).
llvm-svn: 232772
This change to IRCE gets it to recognize "half" range checks. Half
range checks are range checks that only either check if the index is
`slt` some positive integer ("length") or if the index is `sge` `0`.
The range solver does not try to be clever / aggressive about solving
half-range checks -- it transforms "I < L" to "0 <= I < L" and "0 <= I"
to "0 <= I < INT_SMAX". This is safe, but not always optimal.
llvm-svn: 232444
I'm just going to migrate these in a pretty ad-hoc & incremental way -
providing the backwards compatible API for now, then locally removing
it, fixing a few callers, adding it back in and commiting those callers.
Rinse, repeat.
The assertions should ensure that if I get this wrong we'll find out
about it and not just have one giant patch to revert, recommit, revert,
recommit, etc.
llvm-svn: 232240
This reapplies the patch previously committed at revision 232190. This was
reverted at revision 232196 as it caused test failures in tests that did not
expect operands to be commuted. I have made the tests more resilient to
reassociation in revision 232206.
llvm-svn: 232209
This patch adds initial support for vector instructions to the reassociation
pass. It enables most parts of the pass to work with vectors but to keep the
size of the patch small, optimization of Xor trees, canonicalization of
negative constants and converting shifts to muls, etc., have been left out.
This will be handled in later patches.
The patch is based on an initial patch by Chad Rosier.
Differential Revision: http://reviews.llvm.org/D7566
llvm-svn: 232190
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.
I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.
I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
Runtime unrolling is an expensive optimization which can bring benefit
only if the loop is hot and iteration number is relatively large enough.
For some loops, we know they are not worth to be runtime unrolled.
The scalar loop from vectorization is one of the cases.
llvm-svn: 231631
Specifically this:
* Prevents an "unused" warning in non-assert builds.
* In that error case return with out removing a child loop instead of
looping forever.
llvm-svn: 231459
This pass interchanges loops to provide a more cache-friendly memory access.
For e.g. given a loop like -
for(int i=0;i<N;i++)
for(int j=0;j<N;j++)
A[j][i] = A[j][i]+B[j][i];
is interchanged to -
for(int j=0;j<N;j++)
for(int i=0;i<N;i++)
A[j][i] = A[j][i]+B[j][i];
This pass is currently disabled by default.
To give a brief introduction it consists of 3 stages-
LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.
LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.
TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.
A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499
llvm-svn: 231458
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reviewers: echristo
Subscribers: resistor, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D7992
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
RewriteStatepointsForGC pass emits an alloca for each GC pointer which will be relocated. It then inserts stores after def and all relocations, and inserts loads before each use as well. In the end, mem2reg is used to update IR with relocations in SSA form.
However, there is a problem with inserting stores for values defined by invoke instructions. The code didn't expect a def was a terminator instruction, and inserting instructions after these terminators resulted in malformed IR.
This patch fixes this problem by handling invoke instructions as a special case. If the def is an invoke instruction, the store will be inserted at the beginning of the normal destination block. Since return value from invoke instruction does not dominate the unwind destination block, no action is needed there.
Patch by: Chen Li
Differential Revision: http://reviews.llvm.org/D7923
llvm-svn: 231183
The assertion was just checking a class invariant that's pretty easy to
verify by inspection (no mutating operations, and the two non-copy ctors
already ensure the state is maintained) so remove the explicit copy ctor
in favor of the default, thus allowing the use of the default copy
assignment operator without hitting the C++11 deprecation here.
llvm-svn: 231143
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.
This reverts commit r231135.
llvm-svn: 231136
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.
llvm-svn: 231135
There's really no reason to have them have entries in the symbol table
anymore. Old versions of ld64 had some bugs in this area but those have
been fixed long ago.
llvm-svn: 231041
This re-lands change r230921. r230921 was reverted because it broke a
clang test; a checkin fixing the clang test will be commited shortly.
Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533. SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow. There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.
Revert "IndVarSimplify: Allow LFTR to fire more often"
This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).
Revert "IndVarSimplify: Don't let LFTR compare against a poison value"
This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).
Reviewers: majnemer, atrick, spatel
Differential Revision: http://reviews.llvm.org/D7979
llvm-svn: 231018
Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533. SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow. There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.
Revert "IndVarSimplify: Allow LFTR to fire more often"
This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).
Revert "IndVarSimplify: Don't let LFTR compare against a poison value"
This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).
Reviewers: majnemer, atrick, spatel
Differential Revision: http://reviews.llvm.org/D7979
llvm-svn: 230921
Leaving empty blocks around just opens up a can of bugs like PR22704. Deleting
them early also slightly simplifies code.
Thanks to Sanjay for the IR test case.
llvm-svn: 230856
All of the cases were just appending from random access iterators to a
vector. Using insert/append can grow the vector to the perfect size
directly and moves the growing out of the loop. No intended functionalty
change.
llvm-svn: 230845
It turns out the naming of inserted phis and selects is sensative to the order in which two sets are iterated. We need to nail this down to avoid non-deterministic output and possible test failures.
The modified test is the one I first noticed something odd in. The change is making it more strict to report the error. With the test change, but without the code change, the test fails roughly 1 in 5. With the code change, I've run ~30 runs without error.
Long term, the right fix here is to adjust the naming scheme. I'm checking in this hack to avoid any possible non-determinism in the tests over the weekend. HJust because I only noticed one case doesn't mean it's actually the only case. I hope to get to the right change Monday.
std->llvm data structure changes bugfix change #3
llvm-svn: 230835
Inserting into a DenseMap you're iterating over is not well defined. This is unfortunate since this is well defined on a std::map.
"cleanup per llvm code style standards" bug #2
llvm-svn: 230827
These tests cover the 'base object' identification and rewritting portion of RewriteStatepointsForGC. These aren't completely exhaustive, but they've proven to be reasonable effective over time at finding regressions.
In the process of porting these tests over, I found my first "cleanup per llvm code style standards" bug. We were relying on the order of iteration when testing the base pointers found for a derived pointer. When we switched from std::set to DenseSet, this stopped being a safe assumption. I'm suspecting I'm going to find more of those. In particular, I'm now really wondering about the main iteration loop for this algorithm. I need to go take a closer look at the assumptions there.
I'm not really happy with the fact these are testing what is essentially debug output (i.e. enabled via command line flags). Suggestions for how to structure this better are very welcome.
llvm-svn: 230818
Use the IRBuilder helpers for gc.statepoint and gc.result, instead of
coding the construction by hand. Note that the gc.statepoint IRBuilder
handles only CallInst, not InvokeInst; retain that part of hand-coding.
Differential Revision: http://reviews.llvm.org/D7518
llvm-svn: 230591
This is a follow-on to r227491 which tightens the check for propagating FP
values. If a non-constant value happens to be a zero, we would hit the same
bug as before.
Bug noted and patch suggested by Eli Friedman.
llvm-svn: 230564
This refactors the core functionality of LICM: HoistRegion, SinkRegion and
PromoteAliasSet (renamed to promoteLoopAccessesToScalars) as utility functions
in LoopUtils. This will enable other transformations to make use of them
directly.
Patch by Ashutosh Nema.
llvm-svn: 230178
work with a non-canonical induction variable.
This is currently a non-functional change because we only ever call
computeSafeIterationSpace on a canonical induction variable; but the
generalization will be useful in a later commit.
llvm-svn: 230151