As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html
...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.
As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic
reassociation - 'AllowReassoc'.
We're also adding a bit to allow approximations for library functions called 'ApproxFunc'
(this was initially proposed as 'libm' or similar).
...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits),
but that's apparently already used for other purposes. Also, I don't think we can just
add a field to FPMathOperator because Operator is not intended to be instantiated.
We'll defer movement of FMF to another day.
We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.
Finally, this change is binary incompatible with existing IR as seen in the
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile
them. For example, if nsw is ever replaced with something else, dropping it would be
a valid way to upgrade the IR."
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will
fail to optimize some previously 'fast' code because it's no longer recognized as
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.
Note: an inter-dependent clang commit to use the new API name should closely follow
commit.
Differential Revision: https://reviews.llvm.org/D39304
llvm-svn: 317488
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Originally commited as r317374, but reverted in r317395 to update some missed
tests.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317408
This preserves the debug info for the cast operation in the original location.
rdar://problem/33460652
Reapplied r317340 with the test moved into an ARM-specific directory.
llvm-svn: 317375
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317374
Merging conditional stores tries to check to see if the code is if convertible after the store is moved. But the store hasn't been moved yet so its being counted against the threshold.
The patch adds 1 to the threshold comparison to make sure we don't count the store. I've adjusted a test to use a lower threshold to ensure we still do that conversion with the lower threshold.
Differential Revision: https://reviews.llvm.org/D39570
llvm-svn: 317368
Summary:
SpeculativelyExecuteBB can flatten the CFG by doing
speculative execution followed by a select instruction.
When the speculatively executed BB contained dbg intrinsics
the result could be a little bit weird, since those dbg
intrinsics were inserted before the select in the flattened
CFG. So when single stepping in the debugger, printing the
value of the variable referenced in the dbg intrinsic, it
could happen that it looked like the variable had values
that never actually were assigned to the variable.
This patch simply discards all dbg intrinsics that were found
in the speculatively executed BB.
Reviewers: aprantl, chandlerc, craig.topper
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39494
llvm-svn: 317198
Summary:
By replacing branches to CommonExitBlock, we remove the node from
CommonExitBlock's predecessors, invalidating the iterator. The problem
is exposed when the common exit block has multiple predecessors and
needs to sink lifetime info. The modification in the test case trigger
the issue.
Reviewers: davidxl, davide, wmi
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39112
llvm-svn: 317084
This formulation might be slightly slower since I eagerly compute the cheap replacements. If anyone sees this having a compile time impact, let me know and I'll use lazy population instead.
llvm-svn: 317048
Currently the selects are created with the names of their inputs concatenated together. It's possible to get cases that chain these selects together resulting in long names due to multiple levels of concatenation. Our internal branch of llvm managed to generate names over 100000 characters in length on a particular test due to an extreme compounding of the names.
This patch changes the name to a generic name that is not dependent on its inputs.
Differential Revision: https://reviews.llvm.org/D39440
llvm-svn: 317024
The optimisation remarks for loop unrolling with an unrolled remainder looks something like:
test.c:7:18: remark: completely unrolled loop with 3 iterations [-Rpass=loop-unroll]
C[i] += A[i*N+j];
^
test.c:6:9: remark: unrolled loop by a factor of 4 with run-time trip count [-Rpass=loop-unroll]
for(int j = 0; j < N; j++)
^
This removes the first of the two messages.
Differential revision: https://reviews.llvm.org/D38725
llvm-svn: 316986
As noted in the nice block comment, the previous code didn't actually handle multi-entry loops correctly, it just assumed SCEV didn't analyze such loops. Given SCEV has comments to the contrary, that seems a bit suspect. More importantly, the pass actually requires loopsimplify form which ensures a loop-preheader is available. Remove the excessive generaility and shorten the code greatly.
Note that we do successfully analyze many multi-entry loops, but we do so by converting them to single entry loops. See the added test case.
llvm-svn: 316976
Previously, the code returned early from the *function* when it couldn't find a free expansion, it should be returning from the *transform*. I don't have a test case, noticed this via inspection.
As a follow up, I'm going to revisit the logic in the extract function. I think that essentially the whole helper routine can be replaced with SCEVExpander, but I wanted to do that in a series of separate commits.
llvm-svn: 316974
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725
If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.
llvm-svn: 316967
Summary: There are certain requirements for debug location of debug intrinsics, e.g. the scope of the DILocalVariable should be the same as the scope of its debug location. As a result, we should not add discriminator encoding for debug intrinsics.
Reviewers: dblaikie, aprantl
Reviewed By: aprantl
Subscribers: JDevlieghere, aprantl, bjope, sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D39343
llvm-svn: 316703
Summary: For some irreducible CFG the domtree nodes might be dead, do not update domtree for dead nodes.
Reviewers: kuhar, dberlin, hfinkel
Reviewed By: kuhar
Subscribers: llvm-commits, mcrosier
Differential Revision: https://reviews.llvm.org/D38960
llvm-svn: 316582
As discussed in D39011:
https://reviews.llvm.org/D39011
...replacing constants with a variable is inverting the transform done
by other IR passes, so we definitely don't want to do this early.
In fact, it's questionable whether this transform belongs in SimplifyCFG
at all. I'll look at moving this to codegen as a follow-up step.
llvm-svn: 316298
The missed canonicalization/optimization in the motivating test from PR34471 leads to very different codegen:
int switcher(int x) {
switch(x) {
case 17: return 17;
case 19: return 19;
case 42: return 42;
default: break;
}
return 0;
}
int comparator(int x) {
if (x == 17) return 17;
if (x == 19) return 19;
if (x == 42) return 42;
return 0;
}
For the first example, we use a bit-test optimization to avoid a series of compare-and-branch:
https://godbolt.org/g/BivDsw
Differential Revision: https://reviews.llvm.org/D39011
llvm-svn: 316293
This avoid code duplication and allow us to add the disable unroll metadata elsewhere.
Differential Revision: https://reviews.llvm.org/D38928
llvm-svn: 315850
parameterized emit() calls
Summary: This is not functional change to adopt new emit() API added in r313691.
Reviewed By: anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38285
llvm-svn: 315476
Summary:
If the extracted region has multiple exported data flows toward the same BB which is not included in the region, correct resotre instructions and PHI nodes won't be generated inside the exitStub. The solution is simply put the restore instructions right after the definition of output values instead of putting in exitStub.
Unittest for this bug is included.
Author: myhsu
Reviewers: chandlerc, davide, lattner, silvas, davidxl, wmi, kuhar
Subscribers: dberlin, kuhar, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D37902
llvm-svn: 315041
It is possible for two modules to define the same set of external
symbols without causing a duplicate symbol error at link time,
as long as each of the symbols is a comdat member. So we cannot
use them as part of a unique id for the module.
Differential Revision: https://reviews.llvm.org/D38602
llvm-svn: 315026
This is a follow-up to https://reviews.llvm.org/D38138.
I fixed the capitalization of some functions because we're changing those
lines anyway and that helped verify that we weren't accidentally dropping
any options by using default param values.
llvm-svn: 314930
Summary: If the merged instruction is call instruction, we need to set the scope to the closes common scope between 2 locations, otherwise it will cause trouble when the call is getting inlined.
Reviewers: dblaikie, aprantl
Reviewed By: dblaikie, aprantl
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37877
llvm-svn: 314694
The type of a SCEVConstant may not match the corresponding LLVM Value.
In this case, we skip the constant folding for now.
TODO: Replace ConstantInt Zero by ConstantPointerNull
llvm-svn: 314531
JumpThreading now preserves dominance and lazy value information across the
entire pass. The pass manager is also informed of this preservation with
the goal of DT and LVI being recalculated fewer times overall during
compilation.
This change prepares JumpThreading for enhanced opportunities; particularly
those across loop boundaries.
Patch by: Brian Rzycki <b.rzycki@samsung.com>,
Sebastian Pop <s.pop@samsung.com>
Differential revision: https://reviews.llvm.org/D37528
llvm-svn: 314435
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38201
llvm-svn: 314375
This was intended to be no-functional-change, but it's not - there's a test diff.
So I thought I should stop here and post it as-is to see if this looks like what was expected
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
Notes:
1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried
through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'.
The parameter isn't passed down, so we pick up the default value from the function signature
after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the
'SimplifyCFG' calls.
2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops.
This would theoretically allow us to differentiate the transforms controlled by those params
independently.
3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too.
I just stopped here to minimize the diffs.
4. Similarly, I stopped short of messing with the pass manager layer. I have another question that
could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG
set to true no matter where in the pipeline it's creating SimplifyCFG passes?
// Create an early function pass manager to cleanup the output of the
// frontend.
EarlyFPM.addPass(SimplifyCFGPass());
-->
/// \brief Construct a pass with the default thresholds
/// and switch optimizations.
SimplifyCFGPass::SimplifyCFGPass()
: BonusInstThreshold(UserBonusInstThreshold),
LateSimplifyCFG(true) {} <-- switches get converted to lookup tables and loops may not be in canonical form
If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG'
setting via recursion was masking this bug.
Differential Revision: https://reviews.llvm.org/D38138
llvm-svn: 314308
This patch tries to transform cases like:
for (unsigned i = 0; i < N; i += 2) {
bool c0 = (i & 0x1) == 0;
bool c1 = ((i + 1) & 0x1) == 1;
}
To
for (unsigned i = 0; i < N; i += 2) {
bool c0 = true;
bool c1 = true;
}
This commit also update test/Transforms/IndVarSimplify/replace-srem-by-urem.ll to prevent constant folding.
Differential Revision: https://reviews.llvm.org/D38272
llvm-svn: 314266
Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow . This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.
Reviewers: jlebar
Subscribers: jholewinski, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38265
llvm-svn: 314253
Usually the frontend communicates the size of wchar_t via metadata and
we can optimize wcslen (and possibly other calls in the future). In
cases without the wchar_size metadata we would previously try to guess
the correct size based on the target triple; however this is fragile to
keep up to date and may miss users manually changing the size via flags.
Better be safe and stop guessing and optimizing if the frontend didn't
communicate the size.
Differential Revision: https://reviews.llvm.org/D38106
llvm-svn: 314185
Since now SCEV can handle 'urem', an 'urem' is a better canonical form than an 'srem' because it has well-defined behavior
This is a follow up of D34598
Differential Revision: https://reviews.llvm.org/D38072
llvm-svn: 314125
The fix is to avoid invalidating our insertion point in
replaceDbgDeclare:
Builder.insertDeclare(NewAddress, DIVar, DIExpr, Loc, InsertBefore);
+ if (DII == InsertBefore)
+ InsertBefore = &*std::next(InsertBefore->getIterator());
DII->eraseFromParent();
I had to write a unit tests for this instead of a lit test because the
use list order matters in order to trigger the bug.
The reduced C test case for this was:
void useit(int*);
static inline void inlineme() {
int x[2];
useit(x);
}
void f() {
inlineme();
inlineme();
}
llvm-svn: 313905
.. as well as the two subsequent changes r313826 and r313875.
This leads to segfaults in combination with ASAN. Will forward repro
instructions to the original author (rnk).
llvm-svn: 313876
I noticed this inefficiency while investigating PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
This fix will likely push another bug (we don't maintain state of 'LateSimplifyCFG')
into hiding, but I'll try to clean that up with a follow-up patch anyway.
llvm-svn: 313829
Summary:
This implements the design discussed on llvm-dev for better tracking of
variables that live in memory through optimizations:
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117222.html
This is tracked as PR34136
llvm.dbg.addr is intended to be produced and used in almost precisely
the same way as llvm.dbg.declare is today, with the exception that it is
control-dependent. That means that dbg.addr should always have a
position in the instruction stream, and it will allow passes that
optimize memory operations on local variables to insert llvm.dbg.value
calls to reflect deleted stores. See SourceLevelDebugging.rst for more
details.
The main drawback to generating DBG_VALUE machine instrs is that they
usually cause LLVM to emit a location list for DW_AT_location. The next
step will be to teach DwarfDebug.cpp how to recognize more DBG_VALUE
ranges as not needing a location list, and possibly start setting
DW_AT_start_offset for variables whose lifetimes begin mid-scope.
Reviewers: aprantl, dblaikie, probinson
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37768
llvm-svn: 313825
Summary:
With this change:
- Methods in LoopBase trip an assert if the receiver has been invalidated
- LoopBase::clear frees up the memory held the LoopBase instance
This change also shuffles things around as necessary to work with this stricter invariant.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38055
llvm-svn: 313708
Summary:
See comment for why I think this is a good idea.
This change also:
- Removes an SCEV test case. The SCEV test was not testing anything useful (most of it was `#if 0` ed out) and it would need to be updated to deal with a private ~Loop::Loop.
- Updates the loop pass manager test case to deal with a private ~Loop::Loop.
- Renames markAsRemoved to markAsErased to contrast with removeLoop, via the usual remove vs. erase idiom we already have for instructions and basic blocks.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37996
llvm-svn: 313695
In the lambda we are now returning the remark by value so we need to preserve
its type in the insertion operator. This requires making the insertion
operator generic.
I've also converted a few cases to use the new API. It seems to work pretty
well. See the LoopUnroller for a slightly more interesting case.
llvm-svn: 313691
Add a profitability heuristic to enable runtime unrolling of multi-exit
loop: There can be atmost two unique exit blocks for the loop and the
second exit block should be a deoptimizing block. Also, there can be one
other exiting block other than the latch exiting block. The reason for
the latter is so that we limit the number of branches in the unrolled
code to being at most the unroll factor. Deoptimizing blocks are rarely
taken so these additional number of branches created due to the
unrolling are predictable, since one of their target is the deopt block.
Reviewers: apilipenko, reames, evstupac, mkuper
Subscribers: llvm-commits
Reviewed by: reames
Differential Revision: https://reviews.llvm.org/D35380
llvm-svn: 313363
During runtime unrolling on loops with multiple exits, we update the
exit blocks with the correct phi values from both original and remainder
loop.
In this process, we lookup the VMap for the mapped incoming phi values,
but did not update the VMap if a default entry was generated in the VMap
during the lookup. This default value is generated when constants or
values outside the current loop are looked up.
This patch fixes the assertion failure when null entries are present in
the VMap because of this lookup. Added a testcase that showcases the
problem.
llvm-svn: 313358
Summary: Move to LoopUtils method that collects all children of a node inside a loop.
Reviewers: majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37870
llvm-svn: 313322
It now knows the tricks of both functions.
Also, fix a bug that considered allocas of non-zero address space to be always non null
Differential Revision: https://reviews.llvm.org/D37628
llvm-svn: 312869
Debug information can be, and was, corrupted when the runtime
remainder loop was fully unrolled. This is because a !null node can
be created instead of a unique one describing the loop. In this case,
the original node gets incorrectly updated with the NewLoopID
metadata.
In the case when the remainder loop is going to be quickly fully
unrolled, there isn't the need to add loop metadata for it anyway.
Differential Revision: https://reviews.llvm.org/D37338
llvm-svn: 312471
Summary:
If SimplifyCFG pass is able to merge conditional stores into single one,
it loses the alignment. This may lead to incorrect codegen. Patch
sets the alignment of the new instruction if it is set in the original
one.
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36841
llvm-svn: 312030
When peeling kicks in, it updates the loop preheader.
Later, a successful full unroll of the loop needs to update a PHI
which i-th argument comes from the loop preheader, so it'd better look
at the correct block. Fixes PR33437.
Differential Revision: https://reviews.llvm.org/D37153
llvm-svn: 311922
The 1st try was reverted because it could inf-loop by creating a dead instruction.
Fixed that to not happen and added a test case to verify.
Original commit message:
Try to fold:
memcmp(X, C, ConstantLength) == 0 --> load X == *C
Without this change, we're unnecessarily checking the alignment of the constant data,
so we miss the transform in the first 2 tests in the patch.
I noted this shortcoming of LibCallSimpifier in one of the recent CGP memcmp expansion
patches. This doesn't help the example in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13
...directly, but it's worth short-circuiting more of these simple cases since we're
already trying to do that.
The benefit of transforming to load+cmp is that existing IR analysis/transforms may
further simplify that code. For example, if the load of the variable is common to
multiple memcmp calls, CSE can remove the duplicate instructions.
Differential Revision: https://reviews.llvm.org/D36922
llvm-svn: 311366
Try to fold:
memcmp(X, C, ConstantLength) == 0 --> load X == *C
Without this change, we're unnecessarily checking the alignment of the constant data,
so we miss the transform in the first 2 tests in the patch.
I noted this shortcoming of LibCallSimpifier in one of the recent CGP memcmp expansion
patches. This doesn't help the example in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13
...directly, but it's worth short-circuiting more of these simple cases since we're
already trying to do that.
The benefit of transforming to load+cmp is that existing IR analysis/transforms may
further simplify that code. For example, if the load of the variable is common to
multiple memcmp calls, CSE can remove the duplicate instructions.
Differential Revision: https://reviews.llvm.org/D36922
llvm-svn: 311333
a function into itself.
We tried to fix this before in r306495 but that got reverted as the
assert was actually hit.
This fixes the original bug (which we seem to have lost track of with
the revert) by blocking a second remapping when the function being
inlined is also the caller and the remapping could succeed but
erroneously.
The included test case would actually load from an inlined copy of the
alloca before this change, failing to load the stored value and
miscompiling.
Many thanks to Richard Smith for diagnosing a user miscompile to this
bug, and to Kyle for the first attempt and initial analysis and David Li
for remembering the issue and how to fix it and suggesting the patch.
I'm just stitching it together and landing it. =]
llvm-svn: 311229
Summary:
This patch makes LoopUnswitch use new incremental API for updating dominators.
It also updates SplitCriticalEdge, as it is called in LoopUnswitch.
There doesn't seem to be any noticeable performance difference when bootstrapping clang with this patch.
Reviewers: dberlin, davide, sanjoy, grosser, chandlerc
Reviewed By: davide, grosser
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35528
llvm-svn: 311093
Summary: When we move then-else code to if, we need to merge its debug info, otherwise the hoisted instruction may have inaccurate debug info attached.
Reviewers: aprantl, probinson, dblaikie, echristo, loladiro
Reviewed By: aprantl
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D36778
llvm-svn: 310985
Two minor savings: avoid copying the SinkAfter map and avoid moving a cast if it
is not needed.
Differential Revision: https://reviews.llvm.org/D36408
llvm-svn: 310910
This recommits r310869, with the moved files and no extra changes.
Original commit message:
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310889
Failed to add the two files that moved. And then added an extra change I didn't mean to while trying to fix that. Reverting everything.
llvm-svn: 310873
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310869
On some targets, the penalty of executing runtime unrolling checks
and then not the unrolled loop can be significantly detrimental to
performance. This results in the need to be more conservative with
the unroll count, keeping a trip count of 2 reduces the overhead as
well as increasing the chance of the unrolled body being executed. But
being conservative leaves performance gains on the table.
This patch enables the unrolling of the remainder loop introduced by
runtime unrolling. This can help reduce the overhead of misunrolled
loops because the cost of non-taken branches is much less than the
cost of the backedge that would normally be executed in the remainder
loop. This allows larger unroll factors to be used without suffering
performance loses with smaller iteration counts.
Differential Revision: https://reviews.llvm.org/D36309
llvm-svn: 310824
This make it consistent with STATISTIC which it will often appears near.
While there move one DEBUG_COUNTER instance out of an anonymous namespace. It's already declaring a static variable so the namespace is unnecessary.
llvm-svn: 310637
To the best of my knowledge -metarenamer is used in two cases:
1) obfuscate names, when e.g. they contain informations that
can't be shared.
2) Improve clarity of the textual IR for testcases.
One of the usecases if getting the output of `opt` and passing it
to the lli interpreter to run the test. If metarenamer renames
@main, lli can't find an entry point.
llvm-svn: 309657