This adds support non-canonical compare predicates. InstSimplify can't rely on canonicalization to have occurred.
Differential Revision: https://reviews.llvm.org/D36646
llvm-svn: 310893
This recommits r310869, with the moved files and no extra changes.
Original commit message:
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310889
localized to the code that uses those analyses.
Technically, this can change behavior as we no longer require the
existence of the ProfileSummaryInfo analysis to use local profile
information via BFI. We didn't actually require the PSI to have an
interesting profile though, so this only really impacts the behavior in
non-default pass pipelines.
IMO, this makes it substantially less surprising how everything works --
before an analysis that wasn't actually used had to exist to trigger
*any* profile aware inlining. I think the new organization makes it more
obvious where various checks for profile signals happen.
Differential Revision: https://reviews.llvm.org/D36710
llvm-svn: 310888
Failed to add the two files that moved. And then added an extra change I didn't mean to while trying to fix that. Reverting everything.
llvm-svn: 310873
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310869
ValueTracking has to strike a balance when attempting to propagate information
backwards from assumes, because if the information is trivially propagated
backwards, it can appear to LLVM that the assumption is known to be true, and
therefore can be removed.
This is sound (because an assumption has no semantic effect except for causing
UB), but prevents the assume from allowing further optimizations.
The isEphemeralValueOf check exists to try and prevent this issue by not
removing the source of an assumption. This tries to make it a little bit more
general to handle the case of side-effectful instructions, such as in
%0 = call i1 @get_val()
%1 = xor i1 %0, true
call void @llvm.assume(i1 %1)
Patch by Ariel Ben-Yehuda, thanks!
Differential Revision: https://reviews.llvm.org/D36590
llvm-svn: 310859
causing compile time issues.
Moreover, the patch *deleted* the flag in addition to changing the
default, and links to a code review that doesn't even discuss the flag
and just has an update to a Clang test case.
I've followed up on the commit thread to ask for numbers on compile time
at this point, leaving the flag in place until things stabilize, and
pointing at specific code that seems to exhibit excessive compile time
with this patch.
Original commit message for r310583:
"""
[ValueTracking] Enabling ValueTracking patch by default (recommit). Part 2.
The original patch was an improvement to IR ValueTracking on
non-negative integers. It has been checked in to trunk (D18777,
r284022). But was disabled by default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
""""
llvm-svn: 310816
printing techniques with a DEBUG_TYPE controlling them.
It was a mistake to start re-purposing the pass manager `DebugLogging`
variable for generic debug printing -- those logs are intended to be
very minimal and primarily used for testing. More detailed and
comprehensive logging doesn't make sense there (it would only make for
brittle tests).
Moreover, we kept forgetting to propagate the `DebugLogging` variable to
various places making it also ineffective and/or unavailable. Switching
to `DEBUG_TYPE` makes this a non-issue.
llvm-svn: 310695
The original patch was an improvement to IR ValueTracking on non-negative
integers. It has been checked in to trunk (D18777, r284022). But was disabled by
default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
Reviewers: reames, hfinkel
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
llvm-svn: 310583
of the returned value.
Checking the returned value from inside of a scoped exit isn't actually
valid. It happens to work when NRVO fires and the stars align, which
they reliably do with Clang but don't, for example, on MSVC builds.
llvm-svn: 310547
Summary:
Avoid checking each operand and calling getValueFromCondition() before calling
constantFoldUser() when the instruction type isn't supported by
constantFoldUser().
This fixes a large compile time regression in an internal build.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36552
llvm-svn: 310545
must-alias(p, sz_p, p, sz_q) irrespective of access sizes sz_p, sz_q
As discussed a couple of weeks ago on the ML.
This makes the behavior consistent with that of BasicAA.
AA clients already check the obj size themselves and may not require the
obj size to match exactly the access size (e.g., in case of store forwarding)
llvm-svn: 310495
The recently improved support for `icmp` in ValueTracking
(r307304) exposes the fact that `isImplied` condition doesn't
really bail out if we hit the recursion limit (and calls
`computeKnownBits` which increases the depth and asserts).
Differential Revision: https://reviews.llvm.org/D36512
llvm-svn: 310481
isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.
The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().
The isFoldableMemAccess() hook has been removed everywhere.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933
llvm-svn: 310463
to Nodes when removing ref edges from a RefSCC.
This map based association turns out to be pretty expensive for large
RefSCCs and pointless as we already have embedded data members inside
nodes that we use to track the DFS state. We can reuse one of those and
the map becomes unnecessary.
This also fuses the update of those numbers into the scan across the
pending stack of nodes so that we don't walk the nodes twice during the
DFS.
With this I expect the new PM to be faster than the old PM for the test
case I have been optimizing. That said, it also seems simpler and more
direct in many ways. The side storage was always pretty awkward.
The last remaining hot-spot in the profile of the LCG once this is done
will be the edge iterator walk in the DFS. I'll take a look at improving
that next.
llvm-svn: 310456
that RefSCC still connected.
This is common and can be handled much more efficiently. As soon as we
know we've covered every node in the RefSCC with the DFS, we can simply
reset our state and return. This avoids numerous data structure updates
and other complexity.
On top of other changes, this appears to get new PM back to parity with
the old PM for a large protocol buffer message source code. The dense
map updates are very hot in this function.
llvm-svn: 310451
limited batch updates.
Specifically, allow removing multiple reference edges starting from
a common source node. There are a few constraints that play into
supporting this form of batching:
1) The way updates occur during the CGSCC walk, about the most we can
functionally batch together are those with a common source node. This
also makes the batching simpler to implement, so it seems
a worthwhile restriction.
2) The far and away hottest function for large C++ files I measured
(generated code for protocol buffers) showed a huge amount of time
was spent removing ref edges specifically, so it seems worth focusing
there.
3) The algorithm for removing ref edges is very amenable to this
restricted batching. There are just both API and implementation
special casing for the non-batch case that gets in the way. Once
removed, supporting batches is nearly trivial.
This does modify the API in an interesting way -- now, we only preserve
the target RefSCC when the RefSCC structure is unchanged. In the face of
any splits, we create brand new RefSCC objects. However, all of the
users were OK with it that I could find. Only the unittest needed
interesting updates here.
How much does batching these updates help? I instrumented the compiler
when run over a very large generated source file for a protocol buffer
and found that the majority of updates are intrinsically updating one
function at a time. However, nearly 40% of the total ref edges removed
are removed as part of a batch of removals greater than one, so these
are the cases batching can help with.
When compiling the IR for this file with 'opt' and 'O3', this patch
reduces the total time by 8-9%.
Differential Revision: https://reviews.llvm.org/D36352
llvm-svn: 310450
Summary: Currently, ICP checks the count against a fixed value to see if it is hot enough to be promoted. This does not work for SamplePGO because sampled count may be much smaller. This patch uses PSI to check if the count is hot enough to be promoted.
Reviewers: davidxl, tejohnson, eraman
Reviewed By: davidxl
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36341
llvm-svn: 310416
I want to reuse this code in SimplifyDemandedBits handling of Add/Sub. This will make that easier.
Wonder if we should use it in SelectionDAG's computeKnownBits too.
Differential Revision: https://reviews.llvm.org/D36433
llvm-svn: 310378
This was just a bad oversight on my part. The code in question should
never have worked without this fix. But it turns out, there are
relatively few places that involve libfunctions that participate in
a single SCC, and unless they do, this happens to not matter.
The effect of not having this correct is that each time through this
routine, the edge from write_wrapper to write was toggled between a call
edge and a ref edge. First time through, it becomes a demoted call edge
and is turned into a ref edge. Next time it is a promoted call edge from
a ref edge. On, and on it goes forever.
I've added the asserts which should have always been here to catch silly
mistakes like this in the future as well as a test case that will
actually infloop without the fix.
The other (much scarier) infinite-inlining issue I think didn't actually
occur in practice, and I simply misdiagnosed this minor issue as that
much more scary issue. The other issue *is* still a real issue, but I'm
somewhat relieved that so far it hasn't happened in real-world code
yet...
llvm-svn: 310342
After the previous series of patches, this is now trivial and deletes
a pretty astonishing amount of complexity. This has been a long time
coming, as the move toward a PO sequence of RefSCCs started eroding the
underlying use cases for this half of the data structure.
Among the biggest advantages here is that now there aren't two
independent data structures that need to stay in sync.
Some of my profiling has also indicated that updating the parent sets
was among the most expensive parts of the lazy call graph. Eliminating
it whole sale is likely to be a nice win in terms of compile time.
Last but not least, I had discussed with some folks previously keeping
it around for asserts and other correctness checking, but once the
fundamentals of the parent and child checking were implemented without
the parent sets their value in correctness checking was tiny and no
where near worth the cost of the complexity required to keep everything
up-to-date.
llvm-svn: 310171
isDescendantOf methods on RefSCCs in terms of the forward edges rather
than the parent sets.
This is technically slower, but probably not interestingly slower, and
all of these routines were already so expensive that they're guarded
behind both !NDEBUG and EXPENSIVE_CHECKS.
This removes another non-critical usage of parent sets.
I've also added some comments to try and help clarify to any potential
users the costs of these routines. They're mostly useful for debugging,
asserts, or other queries.
llvm-svn: 310170
walk over the parent set.
When removing a single function from the call graph, we previously would
walk the entire RefSCC's parent set and then walk every outgoing edge
just to find the ones to remove. In addition to this being quite high
complexity in theory, it is also the last fundamental use of the parent
sets.
With this change, when we remove a function we transform the node
containing it to be recognizably "dead" and then teach the edge
iterators to recognize edges to such nodes and skip them the same way
they skip null edges.
We can't move fully to using "dead" nodes -- when disconnecting two live
nodes we need to null out the edge. But the complexity this adds to the
edge sequence isn't too bad and the simplification of lazily handling
this seems like a significant win.
llvm-svn: 310169
The definition of 'false' here was already pretty vague and debatable,
and I'm about to add another potential 'false' that would actually make
much more sense in a bool operator. Especially given how rarely this is
used, a nicely named method seems better.
llvm-svn: 310165
structures, actually null out the graph pointers as well. We won't ever
update these, and we certainly shouldn't be calling any methods on them,
so it seems good to defensively nuke them.
llvm-svn: 310164
pointers in node objects, just walk the map from function to node.
It doesn't have stable ordering, but works just as well and is much
simpler. We don't need ordering when just updating internal pointers.
llvm-svn: 310163
merging RefSCCs.
The logic to directly use the reference edges is simpler and not
substantially slower (despite the comments to the contrary) because this
is not actually an especially hot part of LCG in practice.
llvm-svn: 310161
Pushes the sext onto the operands of a Sub if NSW is present.
Also adds support for propagating the nowrap flags of the
llvm.ssub.with.overflow intrinsic during analysis.
Differential Revision: https://reviews.llvm.org/D35256
llvm-svn: 310117
Summary: We originally set the hotness threshold as 99.9% to be consistent with gcc FDO. But because the inline heuristic is different between 2 compilers: llvm uses bottom-up algorithm while gcc uses priority based. The LLVM algorithm tends to inline too much early that prevents hot callsites from further inlined into its caller. Due to this restriction, we think it is reasonable to lower the hotness threshold to give priority to those that are really hot. Our experiments show that this change would improve performance on large applications. Note that the inline heuristic has great room for further tuning. Once the inline heuristics are refined, we could adjust this threshold to allow inlining for less hot callsites.
Reviewers: davidxl, tejohnson, eraman
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D36317
llvm-svn: 310065
Adds function attributes to index: ReadNone, ReadOnly, NoRecurse, NoAlias. This attributes will be used for future ThinLTO optimizations that will propagate function attributes across modules.
llvm-svn: 310061
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.
This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.
Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D33186
llvm-svn: 310054
Summary:
Detect when the working set size of a profiled application is huge,
by comparing the number of counts required to reach the hot percentile
in the profile summary to a large threshold*.
When the working set size is determined to be huge, disable peeling
to avoid bloating the working set further.
*Note that the selected threshold (15K) is significantly larger than the
largest working set value in SPEC cpu2006 (which is gcc at around 11K).
Reviewers: davidxl
Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36288
llvm-svn: 310005
Summary:
This increases the inlining threshold for hot callsites. Hotness is
defined in terms of block frequency of the callsite relative to the
caller's entry block's frequency. Since this requires BFI in the
inliner, this only affects the new PM pipeline. This is enabled by
default at -O3.
This improves the performance of some internal benchmarks. Notably, an
internal benchmark for Gipfeli compression
(https://github.com/google/gipfeli) improves by ~7%. Povray in SPEC2006
improves by ~2.5%. I am running more experiments and will update the
thread if other benchmarks show improvement/regression.
In terms of text size, LLVM test-suite shows an 1.22% text size
increase. Diving into the results, 13 of the benchmarks in the
test-suite increases by > 10%. Most of these are small, but
Adobe-C++/loop_unroll (17.6% increases) and tramp3d(20.7% size increase)
have >250K text size. On a large application, the text size increases by
2%
Reviewers: chandlerc, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36199
llvm-svn: 309994
Summary:
(This is a second attempt as https://reviews.llvm.org/D34822 was reverted.)
LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges.
But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur.
This patch adds a small logic to handle such a case in getEdgeValueLocal().
This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary.
With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36247
llvm-svn: 309986
Summary: For SamplePGO, we already record the callsite count in the call instruction itself. So we do not want to use BFI to get profile count as it is less accurate.
Reviewers: tejohnson, davidxl, eraman
Reviewed By: eraman
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36025
llvm-svn: 309964
The patch rL309080 was reverted because it did not clean up the cache on "forgetValue"
method call. This patch re-enables this change, adds the missing check and introduces
two new unit tests that make sure that the cache is cleaned properly.
Differential Revision: https://reviews.llvm.org/D36087
llvm-svn: 309925
This patch is update after the first patch (https://reviews.llvm.org/rL309651) based on the post-commit comments.
Stack coloring pass need to maintain AliasAnalysis information when merging stack slots of different types.
Actually, there is a FIXME comment in StackColoring.cpp
// FIXME: In order to enable the use of TBAA when using AA in CodeGen,
// we'll also need to update the TBAA nodes in MMOs with values
// derived from the merged allocas.
But, TBAA has been already enabled in CodeGen without fixing this pass.
The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling.
Although we observed the problem on ppc64le, this is a platform neutral issue.
This patch makes the stack coloring pass maintains AliasAnalysis information when merging multiple stack slots.
This patch fixes PR33928.
llvm-svn: 309849