When we have an existing `argmemonly` or `inaccessiblememorargmemonly`
we used to "know" that information. However, interprocedural constant
propagation can invalidate these attributes. We now ignore and remove
these attributes for internal functions (which may be affected by IP
constant propagation), if we are deriving new attributes for the
function.
As we replace values with constants interprocedurally, we also need to
do this "look-through" step during the generic value traversal or we
would derive properties from replaced values. While this is often not
problematic, it is when we use the "kind" of a value for reasoning,
e.g., accesses to arguments allow `argmemonly`.
When we categorize a pointer value we bailed at `null` before. If we
know `null` is not a valid memory location we can ignore it as there
won't be an access at all.
We now use getPointerDereferenceableBytes to determine `nonnull` and
`dereferenceable` facts from the IR. We also use getPointerAlignment in
AAAlign for the same reason. The latter can interfere with callbacks so
we do restrict it to non-function-pointers for now.
In a recent patch we introduced a problem with abstract attributes that
were assumed dead at some point. Since `Attributor::updateAA` was
introduced in 95e0d28b71, we did not
remember the dependence on the liveness AA when an abstract attribute
was assumed dead and therefore not updated.
Explicit reproducer added in liveness.ll.
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 509242 (345483/s)
temporary memory allocations: 98666 (66937/s)
peak heap memory consumption: 18.60MB
peak RSS (including heaptrack overhead): 103.29MB
total memory leaked: 269.10KB
```
After:
```
calls to allocation functions: 529332 (355494/s)
temporary memory allocations: 102107 (68574/s)
peak heap memory consumption: 19.40MB
peak RSS (including heaptrack overhead): 102.79MB
total memory leaked: 269.10KB
```
Difference:
```
calls to allocation functions: 20090 (1339333/s)
temporary memory allocations: 3441 (229400/s)
peak heap memory consumption: 801.45KB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
This is the first step to resolve a TODO in AAMemoryLocation and to fix
a bug we have when handling `byval` arguments of `readnone` call sites.
No functional change intended.
Summary:
This patch teaches shouldBeDeferred to take into account the total
cost of inlining.
Suppose we have a call hierarchy {A1,A2,A3,...}->B->C. (Each of A1,
A2, A3, ... calls B, which in turn calls C.)
Without this patch, shouldBeDeferred essentially returns true if
TotalSecondaryCost < IC.getCost()
where TotalSecondaryCost is the total cost of inlining B into As.
This means that if B is a small wraper function, for example, it would
get inlined into all of As. In turn, C gets inlined into all of As.
In other words, shouldBeDeferred ignores the cost of inlining C into
each of As.
This patch adds an option, inline-deferral-scale, to replace the
expression above with:
TotalCost < Allowance
where
- TotalCost is TotalSecondaryCost + IC.getCost() * # of As, and
- Allowance is IC.getCost() * Scale
For now, the new option defaults to -1, disabling the new scheme.
Reviewers: davidxl
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79138
Summary:
This was added in https://reviews.llvm.org/D2449, but I'm not sure it's
necessary since an inalloca value is never a Constant (should be an
AllocaInst).
Reviewers: hans, rnk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79350
Before we eagerly put dependences into the QueryMap as soon as we
encountered them (via `Attributor::getAAFor<>` or
`Attributor::recordDependence`). Now we will wait to see if the
dependence is useful, that is if the target is not already in a fixpoint
state at the end of the update. If so, there is no need to record the
dependence at all.
Due to the abstraction via `Attributor::updateAA` we will now also treat
the very first update (during attribute creation) as we do subsequent
updates.
Finally this resolves the problematic usage of QueriedNonFixAA.
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 554675 (389245/s)
temporary memory allocations: 101574 (71280/s)
peak heap memory consumption: 28.46MB
peak RSS (including heaptrack overhead): 116.26MB
total memory leaked: 269.10KB
```
After:
```
calls to allocation functions: 512465 (345559/s)
temporary memory allocations: 98832 (66643/s)
peak heap memory consumption: 22.54MB
peak RSS (including heaptrack overhead): 106.58MB
total memory leaked: 269.10KB
```
Difference:
```
calls to allocation functions: -42210 (-727758/s)
temporary memory allocations: -2742 (-47275/s)
peak heap memory consumption: -5.92MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
Attributes that only depend on the value (=bit pattern) can be
initialized from uses in the must-be-executed-context (MBEC). We did use
`AAComposeTwoGenericDeduction` and `AAFromMustBeExecutedContext` before
to do this for some positions of these attributes but not for all. This
was fairly complicated and also problematic as we did run it in every
`updateImpl` call even though we only use known information. The new
implementation removes `AAComposeTwoGenericDeduction`* and
`AAFromMustBeExecutedContext` in favor of a simple interface
`AddInformation::fromMBEContext(...)` which we call from the
`initialize` methods of the "value attribute" `Impl` classes, e.g.
`AANonNullImpl:initialize`.
There can be two types of test changes:
1) Artifacts were we miss some information that was known before a
global fixpoint was reached and therefore available in an update
but not at the beginning.
2) Deduction for values we did not derive via the MBEC before or which
were not found as the `AAFromMustBeExecutedContext::updateImpl` was
never invoked.
* An improved version of AAComposeTwoGenericDeduction can be found in
D78718. Once we find a new use case that implementation will be able
to handle "generic" AAs better.
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 468428 (328952/s)
temporary memory allocations: 77480 (54410/s)
peak heap memory consumption: 32.71MB
peak RSS (including heaptrack overhead): 122.46MB
total memory leaked: 269.10KB
```
After:
```
calls to allocation functions: 554720 (351310/s)
temporary memory allocations: 101650 (64376/s)
peak heap memory consumption: 28.46MB
peak RSS (including heaptrack overhead): 116.75MB
total memory leaked: 269.10KB
```
Difference:
```
calls to allocation functions: 86292 (556722/s)
temporary memory allocations: 24170 (155935/s)
peak heap memory consumption: -4.25MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D78719
Summary:
This factors cost and reporting out of the inlining workflow, thus
making it easier to reuse when driving inlining from the upcoming
InliningAdvisor.
Depends on: D79215
Reviewers: davidxl, echristo
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79275
Since every AbstractAttribute so far, and for the foreseeable future,
corresponds to a single IRPosition we can simplify the class structure.
We already did this for IRAttribute but there is no reason to stop
there.
Summary:
shouldInline makes a decision based on the InlineCost of a call site, as
well as an evaluation on whether the site should be deferred. This means
it's possible for the decision to be not to inline, even for an
InlineCost that would otherwise allow it.
Both uses of shouldInline performed the exact same logic after calling
it. In addition, the decision on whether to inline or not was
communicated through two values of the Option<InlineCost> return value:
None, or an InlineCost evaluating to false.
Simplified by:
- encapsulating the decision in the return object. The bool it evaluates
to communicates unambiguously the decision. The InlineCost is also
available.
- encapsulated the common post-shouldInline code into shouldInline.
Reviewers: davidxl, echristo, eraman
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79215
Summary:
Renamed 'CS' to 'CB', and, in one case, to a more specific name to avoid
naming collision with outer scope (a maintainability/readability reason,
not correctness)
Also updated comments.
Reviewers: davidxl, dblaikie, jdoerfert
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79101
Summary:
Refactored the parameter and return type where they are too generally
typed as Instruction.
Reviewers: dblaikie, wmi, craig.topper
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79027
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Summary:
Missing error mangling is noticed in
https://bugs.llvm.org/show_bug.cgi?id=45636
where inconsistent profiling input caused
llvm/lld to crash as:
```
Program aborted due to an unhandled Error:
linking module flags 'ProfileSummary':
IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'
```
The change does not change the fact that LLVM crashes
but changes error output to say what was incorrect:
```
LLVM ERROR: Function Import: link error:
linking module flags 'ProfileSummary':
IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'
```
Actual crash has yet to be fixed.
Reviewers: lattner
Reviewed By: lattner
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78676
The CallSite and ImmutableCallSite were removed in a previous
commit. So rename the file to match the remaining class and
the name of the cpp that implements it.
The motivation is to be able to play with the option and change if it is required.
Reviewers: fedor.sergeev, apilipenko, rnk, jdoerfert
Reviewed By: fedor.sergeev
Subscribers: hiraditya, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D78624
I don't believe this pass deals with vectors of pointers. I think
this getScalarType() was added during a mechanical opaque pointer
change of the interface to GetElementPtrInst::getIndexedType.
The number of different access location kinds we track is relatively
small (8 so far). With this patch we replace the DenseMap that mapped
from index (0-7) to the access set pointer with an array of access set
pointers. This reduces memory consumption.
No functional change is intended.
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 472499 (215654/s)
temporary memory allocations: 77794 (35506/s)
peak heap memory consumption: 35.28MB
peak RSS (including heaptrack overhead): 125.46MB
total memory leaked: 269.04KB
```
After:
```
calls to allocation functions: 472270 (308673/s)
temporary memory allocations: 77578 (50704/s)
peak heap memory consumption: 32.70MB
peak RSS (including heaptrack overhead): 121.78MB
total memory leaked: 269.04KB
```
Difference:
```
calls to allocation functions: -229 (346/s)
temporary memory allocations: -216 (326/s)
peak heap memory consumption: -2.58MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
---
If we have a dependence between an abstract attribute A to an abstract
attribute B such hat changes in A should trigger an update of B, we do
not need to keep the dependence around once the update was triggered. If
the dependence is still required the update will reinsert it into the
dependence map, if it is not we avoid triggering B in the future. This
replaces the "recompute interval" mechanism we used before to prune
stale dependences.
Number of required iterations is generally down, compile time for the
module pass (not really the CGSCC pass) is down quite a bit.
There is one test change which looks like an artifact in the undefined
behavior AA that needs to be looked at.
The old command line option `-attributor-disable` was too coarse grained
as we want to measure the effects of the module or cgscc pass without
the other as well.
Since `none` is the default there is no real functional change.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D78571
AbstractAttribute::initialize is used to initialize the deduction and
the object we do not always call it. To make sure we have the option to
initialize the object even if initialize is not called we pass the
Attributor to AbstractAttribute constructors now.
Before we kept the first applicable `ident_t*` during deduplication of
runtime calls. The problem is that "first" is dependent on the iteration
order of a DenseMap. Since the proper solution, which is to combine the
information from all `ident_t*`, should be deterministic on its own, we
will not try to make the iteration order deterministic. Instead, we will
create a fresh `ident_t*` if there is not a unique existing `ident_t*`
to pick.
We now also use the BumpPtrAllocator from the Attributor in the
InformationCache. The lifetime of objects in either is pretty much the
same and it should result in consistently good performance regardless of
the allocator.
Doing so requires to call more constructors manually but so far that
does not seem to be problematic or messy.
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 615359 (368257/s)
temporary memory allocations: 83315 (49859/s)
peak heap memory consumption: 75.64MB
peak RSS (including heaptrack overhead): 163.43MB
total memory leaked: 269.04KB
```
After:
```
calls to allocation functions: 613042 (359555/s)
temporary memory allocations: 83322 (48869/s)
peak heap memory consumption: 75.64MB
peak RSS (including heaptrack overhead): 162.92MB
total memory leaked: 269.04KB
```
Difference:
```
calls to allocation functions: -2317 (-68147/s)
temporary memory allocations: 7 (205/s)
peak heap memory consumption: 2.23KB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
---
There are also some adjustments to use MaybeAlign in here due
to CallBase::getParamAlignment() being deprecated. It would
be cleaner if getOrEnforceKnownAlignment was migrated
to Align/MaybeAlign.
Differential Revision: https://reviews.llvm.org/D78345
Removing CallSite left us with a bunch of explicit casts from
Instruction to CallBase. This moves the casts earlier so that
function arguments and data structure types are CallBase so
we don't have to cast when we use them.
Differential Revision: https://reviews.llvm.org/D78246
CallSite will likely be removed soon, but AbstractCallSite serves a different purpose and won't be going away.
This patch switches it to internally store a CallBase* instead of a
CallSite. The only interface changes are the removal of the getCallSite
method and getCallBackUses now takes a CallBase&. These methods had only
a few callers that were easy enough to update without needing a
compatibility shim.
In the future once the other CallSites are gone, the CallSite.h
header should be renamed to AbstractCallSite.h
Differential Revision: https://reviews.llvm.org/D78322
Summary:
Changes the type of the @__typeid_.*_unique_member imports we generate
for unique return value optimization from i8 to [0 x i8]. This
prevents assuming that these imports do not alias, such as when
two unique return values occur in the same vtable.
Fixes PR45393.
Reviewers: tejohnson, pcc
Reviewed By: pcc
Subscribers: aganea, hiraditya, rnk, george.burgess.iv, dblaikie, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77421
Since we use the fact that some uses are droppable in the Attributor we
need to handle them explicitly when we replace uses. As an example, an
assumed dead value can have live droppable users. In those we cannot
replace the value simply by an undef. Instead, we either drop the uses
(via `dropDroppableUses`) or keep them as they are. In this patch we do
both, depending on the situation. For values that are dead but not
necessarily removed we keep droppable uses around because they contain
information we might be able to use later. For values that are removed
we drop droppable uses explicitly to avoid replacement with undef.
The check if globals were accessed was not always working because two
bits are set for NO_GLOBAL_MEM. The new check works also if only on kind
of globals (internal/external) is accessed.
Running the verifier is expensive so we want to avoid it even in runs
that enable assertions. As we move closer to enabling the Attributor
this code will be executed by some buildbots but not cause overhead for
most people.
Before, we eagerly analyzed all the functions to collect information
about them, e.g. what instructions may read/write memory. This had
multiple drawbacks:
- In CGSCC-mode we can end up looking at a callee which is not in the
SCC but for which we need an initialized cache.
- We end up looking at functions that we deem dead and never need to
analyze in the first place.
- We have a implicit dependence which is easy to break.
This patch moves the function analysis into the information cache and
makes it lazy. There is no real functional change expected except due to
the first reason above.
The CallGraphUpdater allows to directly alter call site information and
we should do so. This might appease the windows buildbot that crashes
during the SCC traversal.
Summary:
Currently, the internal options -vectorize-loops, -vectorize-slp, and
-interleave-loops do not have much practical effect. This is because
they are used to initialize the corresponding flags in the pass
managers, and those flags are then unconditionally overwritten when
compiling via clang or via LTO from the linkers. The only exception was
-vectorize-loops via opt because of some special hackery there.
While vectorization could still be disabled when compiling via clang,
using -fno-[slp-]vectorize, this meant that there was no way to disable
it when compiling in LTO mode via the linkers. This only affected
ThinLTO, since for regular LTO vectorization is done during the compile
step for scalability reasons. For ThinLTO it is invoked in the LTO
backends. See also the discussion on PR45434.
This patch makes it so the internal options can actually be used to
disable these optimizations. Ultimately, the best long term solution is
to mark the loops with metadata (similar to the approach used to fix
-fno-unroll-loops in D77058), but this enables a shorter term
workaround, and actually makes these internal options useful.
I constant propagated the initial values of these internal flags into
the pass manager flags (for some reasons vectorize-loops and
interleave-loops were initialized to true, while vectorize-slp was
initialized to false). As mentioned above, they are overwritten
unconditionally so this doesn't have any real impact, and these initial
values aren't particularly meaningful.
I then changed the passes to check the internl values and return without
performing the associated optimization when false (I changed the default
of -vectorize-slp to true so the options behave similarly). I was able
to remove the hackery in opt used to get -vectorize-loops=false to work,
as well as a special option there used to disable SLP vectorization.
Finally, I changed thinlto-slp-vectorize-pm.c to:
a) Only test SLP (moved the loop vectorization checking to a new test).
b) Use code that is slp vectorized when it is enabled, and check that
instead of whether the pass is enabled.
c) Test the new behavior of -vectorize-slp.
d) Test both pass managers.
The loop vectorization (and associated interleaving) testing I moved to
a new thinlto-loop-vectorize-pm.c test, with several changes:
a) Changed the flags on the interleaving testing so that it will
actually interleave, and check that.
b) Test the new behavior of -vectorize-loops and -interleave-loops.
c) Test both pass managers.
Reviewers: fhahn, wmi
Subscribers: hiraditya, steven_wu, dexonsmith, cfe-commits, davezarzycki, llvm-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77989
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.
Differential revision: https://reviews.llvm.org/D78016
ModuleSummaryAnalysis is the only file in libAnalysis that brings a
dependency on the CodeGen layer from libAnalysis, moving it breaks this
dependency.
Differential Revision: https://reviews.llvm.org/D77994
Summary:
Updated CallPromotionUtils and impacted sites. Parameters that are
expected to be non-null, and return values that are guranteed non-null,
were replaced with CallBase references rather than pointers.
Left FIXME in places where more changes are facilitated by CallBase, but
aren't CallSites: Instruction* parameters or return values, for example,
where the contract that they are actually CallBase values.
Reviewers: davidxl, dblaikie, wmi
Reviewed By: dblaikie
Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77930
Summary:
The inline history is associated with a call site. There are two locations
we fetch inline history. In one, we fetch it together with the call
site. In the other, we initialize it under certain conditions, use it
later under same conditions (different if check), and otherwise is
uninitialized. Although currently there is no uninitialized use, the
code is more challenging to maintain correctly, than if the value were
always initialized.
Changed to the upfront initialization pattern already present in this
file.
Reviewers: davidxl, dblaikie
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77877
Summary:
Function names: camel case, lower case first letter.
Variable names: start with upper letter. For iterators that were 'i',
renamed with a descriptive name, as 'I' is 'Instruction&'.
Lambda captures simplification.
Opportunistic boolean return simplification.
Reviewers: davidxl, dblaikie
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77837
Summary:
*Almost* all uses are replaced. Left FIXMEs for the two sites that
require refactoring outside of Inliner, to scope this patch.
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77817
Attributor.cpp became quite big and we need to start provide structure.
The Attributor code is now in Attributor.cpp and the classes derived
from AbstractAttribute are in AttributorAttributes.cpp. Minor changes
were required but no intended functional changes.
We also minimized includes as part of this.
Reviewed By: baziotis
Differential Revision: https://reviews.llvm.org/D76873
dso_local leads to direct access even if the definition is not within this compilation unit (it is
still in the same linkage unit). On ELF, such a relocation (e.g. R_X86_64_PC32) referencing a
STB_GLOBAL STV_DEFAULT object can cause a linker error in a -shared link.
If the linkage is changed to available_externally, the dso_local flag should be dropped, so that no
direct access will be generated.
The current behavior is benign, because -fpic does not assume dso_local
(clang/lib/CodeGen/CodeGenModule.cpp:shouldAssumeDSOLocal).
If we do that for -fno-semantic-interposition (D73865), there will be an
R_X86_64_PC32 linker error without this patch.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D74751
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.
In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there's GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.
In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)
Differential Revision: https://reviews.llvm.org/D75661
The new and old pass managers (PassManagerBuilder.cpp and
PassBuilder.cpp) are exposed to an `extern` declaration of
`attributor-disable` option which will guard the addition of the
attributor passes to the pass pipelines.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76871
Query AAValueSimplify on pointers in memory accessing instructions to take
advantage of the constant propagation (or any other value simplification) of such values.
Summary:
A recent change in the instruction simplifier enables a call to a function that just returns one of its parameter to be simplified as simply loading the parameter. This exposes a bug in the inliner where double inlining may be involved which in turn may cause compiler ICE when an already-inlined callsite is reused for further inlining.
To put it simply, in the following-like C program, when the function call second(t) is inlined, its code t = third(t) will be reduced to just loading the return value of the callsite first(). This causes the inliner internal data structure to register the first() callsite for the call edge representing the third() call, therefore incurs a double inlining when both call edges are considered an inline candidate. I'm making a fix to break the inliner from reusing a callsite for new call edges.
```
void top()
{
int t = first();
second(t);
}
void second(int t)
{
t = third(t);
fourth(t);
}
void third(int t)
{
return t;
}
```
The actual failing case is much trickier than the example here and is only reproducible with the legacy inliner. The way the legacy inliner works is to process each SCC in a bottom-up order. That means in reality function first may be already inlined into top, or function third is either inlined to second or is folded into nothing. To repro the failure seen from building a large application, we need to figure out a way to confuse the inliner so that the bottom-up inlining is not fulfilled. I'm doing this by making the second call indirect so that the alias analyzer fails to figure out the right call graph edge from top to second and top can be processed before second during the bottom-up. We also need to tweak the test code so that when the inlining of top happens, the function body of second is not that optimized, by delaying the pass of function attribute deducer (i.e, which tells function third has no side effect and just returns its parameter). Since the CGSCC pass is iterative, additional calls are added to top to postpone the inlining of second to the second round right after the first function attribute deducing pass is done. I haven't been able to repro the failure with the new pass manager since the processing order of ininlined callsites is a bit different, but in theory the issue could happen there too.
Note that this fix could introduce a side effect that blocks the simplification of inlined code, specifically for a call site that can be folded to another call site. I hope this can probably be complemented by subsequent inlining or folding, as shown in the attached unit test. The ideal fix should be to separate the use of VMap. However, in reality this failing pattern shouldn't happen often. And even if it happens, there should be a good chance that the non-folded call site will be refolded by iterative inlining or subsequent simplification.
Reviewers: wenlei, davidxl, tejohnson
Reviewed By: wenlei, davidxl
Subscribers: eraman, nikic, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76248
There was a TODO in genericValueTraversal to provide the context
instruction and due to the lack of it users that wanted one just used
something available. Unfortunately, using a fixed instruction is wrong
in the presence of PHIs so we need to update the context instruction
properly.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D76870
This cannot be triggered right now, as far as I know, but it doesn't
make sense to deduce a constant range on arguments of declarations.
Exposed during testing of AAValueSimplify extensions.
Use DL & ABI information for better alignment deduction, e.g., if a type
is accessed and the ABI specifies an alignment requirement for such an
access we can use it. This is based on a patch by @lebedev.ri and
inspired by getBaseAlign in Loads.cpp.
Depends on D76673.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D76674
If we have a must-tail call the callee and caller need to have matching
ABIs. Part of that is alignment which we might modify when we deduce
alignment of arguments of either. Since we would need to keep them in
sync, which is not as simple, we simply avoid deducing alignment for
arguments of the must-tail caller or callee.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D76673
We create a lot of AbstractAttributes and they live as long as
the Attributor does. It seems reasonable to allocate them via a
BumpPtrAllocator owned by the Attributor.
Reviewed By: baziotis
Differential Revision: https://reviews.llvm.org/D76589
Make the attributor pass aware of aligned_alloc for converting heap
allocations to stack ones.
Depends on D76971.
Differential Revision: https://reviews.llvm.org/D76974
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it.
Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support.
Differential Revision: https://reviews.llvm.org/D76255
Minor update/fixes to comments for the Attributor pass, and dyn_cast -> cast.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76972
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value and if there are llvm.assume operand bundle uses with the right
tag we will check if they are in the must-be-executed-context (around
the context instruction). Droppable users, which is currently only
llvm::assume, are handled special in some places now as well.
[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D74888
PR35760 shows an example program which, when compiled with `clang -O0`
or gcc at any optimization level, prints '0'. However, llvm transforms
the program in a way that causes it to print '1'.
Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when
analyzing a load from a global which is used by an `icmp`. This special
case was untested [0] so this is just deleting dead code.
An alternative fix might be to change the GlobalStatus analysis for the
global to report "Stored" instead of "StoredOnce". However, "StoredOnce"
is appropriate when only one value other than the initializer is stored
to the global.
[0]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662
Differential Revision: https://reviews.llvm.org/D76645
Validation of the found runtime library functions declarations types
(return and argument types) with the expected types.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76058
We special cased must-tail calls all over the place because they cannot
be modified as other calls can be. However, we already centralized the
modification API so we can centralize the handling as well. This
simplifies the code and allows to remove must-tail calls completely.
D75801 removed the last and only user of this option, so we can
drop it now. The original idea behind this was to only run expensive
transforms under -O3, but apart from the one known bits transform,
this has never really taken off. I believe nowadays the recommendation
is to put expensive transforms in AggressiveInstCombine instead,
though that isn't terribly popular either :)
Differential Revision: https://reviews.llvm.org/D76540
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.
Differential Revision: https://reviews.llvm.org/D75660
This patch add mayContainUnboundedCycle helper function which checks whether a function has any cycle which we don't know if it is bounded or not.
Loops with maximum trip count are considered bounded, any other cycle not.
It also contains some fixed tests and some added tests contain bounded and
unbounded loops and non-loop cycles.
Reviewed By: jdoerfert, uenoku, baziotis
Differential Revision: https://reviews.llvm.org/D74691
Resolution for below fixme:
(ii) Check whether the value is captured in the scope using AANoCapture.
FIXME: This is conservative though, it is better to look at CFG and
check only uses possibly executed before this callsite.
Propagates caller argument's noalias attribute to callee.
Reviewed by: jdoerfert, uenoku
Reviewers: jdoerfert, sstefan1, uenoku
Subscribers: uenoku, sstefan1, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D71617
This patch introduces the propagation of known information based on path exploration.
For example,
```
int u(int c, int *p){
if(c) {
return *p;
} else {
return *p + 1;
}
}
```
An argument `p` is dereferenced whatever c's value is.
For an instruction `CtxI`, we accumulate branch instructions in the must-be-executed-context of `CtxI` and then, we take the conjunction of the successors' known state.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D65593
It is possible that an instruction to be changed to unreachable is
in the same block with a terminator that can be constant-folded.
In this case, as of now, the instruction will be changed to
unreachable before the terminator is folded. But, then the
whole BB becomes invalidated and so when we go ahead to fold
the terminator, we trap.
Change the order of these two.
Differential Revision: https://reviews.llvm.org/D75780
If we infer the dso_local flag for -fpic, dso_local should be dropped
when we convert a GlobalVariable a declaration. dso_local causes the
generation of direct access (e.g. R_X86_64_PC32). Such relocations referencing
STB_GLOBAL STV_DEFAULT objects are not allowed in a -shared link.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D74749
Spin-off from D75407. As described there, ConstantFoldConstant()
currently returns null for non-ConstantExpr/ConstantVector inputs,
but otherwise always returns non-null, independently of whether
any folding has happened or not.
This is confusing and makes consumer code more complicated.
I would expect either that ConstantFoldConstant() returns only if
it actually folded something, or that it always returns non-null.
I'm going to the latter possibility here, which appears to be more
useful considering existing usage.
Differential Revision: https://reviews.llvm.org/D75543
The initial placement of vector-combine in the opt pipeline revealed phase ordering bugs:
https://bugs.llvm.org/show_bug.cgi?id=45015https://bugs.llvm.org/show_bug.cgi?id=42022
This patch contains a few independent changes:
1. Move the pass up in the pipeline, so it happens just after loop-vectorization.
This is only to keep vectorization passes together in the pipeline at the moment.
I don't have evidence of interaction between these yet.
2. Add an -early-cse pass after -vector-combine to clean up redundant ops. This was
partly proposed as far back as rL219644 (which is why it's effectively being moved
in the old PM code). This is important because the subsequent -instcombine doesn't
work as well without EarlyCSE. With the CSE, -instcombine is able to squash
shuffles together in 1 of the tests (because those are simple "select" shuffles).
3. Remove the -vector-combine pass that was running after SLP. We may want to do that
eventually, but I don't have a test case to support it yet.
Differential Revision: https://reviews.llvm.org/D75145
This reverts commit 80d0a137a5, and the
follow on fix in 873c0d0786. It is
causing test failures after a multi-stage clang bootstrap. See
discussion on D73242 and D75201.
Summary:
Fixes an issue that cropped up after the changes in D73242 to delay
the lowering of type tests. LTT couldn't handle any type tests with
non-string type id (which happens for local vtables, which we try to
promote during the compile step but cannot always when there are no
exported symbols).
We can simply treat the same as having an Unknown resolution, which
delays their lowering, still allowing such type tests to be used in
subsequent optimization (e.g. planned usage during ICP). The final
lowering which simply removes these handles them fine.
Beefed up an existing ThinLTO test for such unpromoted type ids so that
the internal vtable isn't removed before lower type tests, which hides
the problem.
Reviewers: evgeny777, pcc
Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, aganea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75201
This aims to fix a missed inlining case.
If there's a virtual call in the callee on an alloca (stack allocated object) in
the caller, and the callee is inlined into the caller, the post-inline cleanup
would devirtualize the virtual call, but if the next iteration of
DevirtSCCRepeatedPass doesn't happen (under the new pass manager), which is
based on a heuristic to determine whether to reiterate, we may miss inlining the
devirtualized call.
This enables inlining in clang/test/CodeGenCXX/member-function-pointer-calls.cpp.
This is a second commit after a revert
https://reviews.llvm.org/rG4569b3a86f8a4b1b8ad28fe2321f936f9d7ffd43 and a fix
https://reviews.llvm.org/rG41e06ae7ba91.
Differential Revision: https://reviews.llvm.org/D69591
Summary:
Final patch in series to fix inlining between functions with different
nobuiltin attributes/options, which was specifically an issue in LTO.
See discussion on D61634 for background.
The prior patch in this series (D67923) enabled per-Function TLI
construction that identified the nobuiltin attributes.
Here I have allowed inlining to proceed if the callee's nobuiltins are a
subset of the caller's nobuiltins, but not in the reverse case, which
should be conservatively correct. This is controlled by a new option,
-inline-caller-superset-nobuiltin, which is enabled by default.
Reviewers: hfinkel, gchatelet, chandlerc, davidxl
Subscribers: arsenm, jvesely, nhaehnle, mehdi_amini, eraman, hiraditya, haicheng, dexonsmith, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74162
DevirtSCCRepeatedPass iteration. Needs ReviewPublic
This aims to fix a missed inlining case.
If there's a virtual call in the callee on an alloca (stack allocated object) in
the caller, and the callee is inlined into the caller, the post-inline cleanup
would devirtualize the virtual call, but if the next iteration of
DevirtSCCRepeatedPass doesn't happen (under the new pass manager), which is
based on a heuristic to determine whether to reiterate, we may miss inlining the
devirtualized call.
This enables inlining in clang/test/CodeGenCXX/member-function-pointer-calls.cpp.
If we deduplicate OpenMP runtime calls we have multiple `ident_t*` that
represent information like source location. So far, we simply kept the
one used by the replacement call. However, as exposed by PR44893, that
can cause problems if we have stack allocated `ident_t` objects. While
we need to revisit the use of these as well, it is clear that we
eventually want to merge source location information in some way. With
this patch we add the infrastructure to do so but without doing the
actual merge. Instead we pick a global `ident_t` from the replaced
calls, if possible, or create a new one with an unknown location
instead.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74925
This patch adds bindings to C and Go for
addCoroutinePassesToExtensionPoints, which is used to add coroutine
passes to the correct locations in PassManagerBuilder.
Differential Revision: https://reviews.llvm.org/D51642
We can look through calls with `returned` argument attributes when we
collect subsuming positions. This allows us to get existing attributes
from more places.
We are often interested in an assumed constant and sometimes it has to
be an integer constant. Before we only looked for the latter, now we can
ask for either.
If we propagate function pointers across function boundaries we can
create new call edges. These need to be represented in the CG if we run
as a CGSCC pass. In the new pass manager that is currently not handled
by the CallGraphUpdater so we need to prevent the situation for now.
We usually will ask for liveness of an argument anyway so we ended up
lazily creating the attribute anyway. However, that is not always the
case and even if it is we should go the eager route here. Various tests
show how this can improve the outcome. One test exposed a problem with
type mismatches between argument and call site argument, a fix is
included. For liveness various more tests were added as well.
If a function pointer is casted into a different type the resulting
expression can be a constant. If so, it can be used multiple times which
cannot be handled by the AbstractCallSite constructor alone. Instead, we
follow the cast expression uses now explicitly during the call site
traversal.
In addition to a single bit per memory locations, e.g., globals and
arguments, we now collect more information about the actual accesses,
e.g., what instruction caused it, was it a read/write/read+write, and
what the underlying base pointer was. Follow up patches will make
explicit use of this.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D73527
While the function return updateImpl did only look at call sites the
manifest method looked at return values. If we don't do this during the
updateImpl we might create new abstract attributes during manifest. This
is a problem when it comes to liveness information.
This caused an error when passes iterated over cached assumptions in the
tracker and assumed them to be `null` or an instruction. I failed to
create a test case so far.
In addition to memory behavior attributes (readonly/writeonly) we now
derive memory location attributes (argmemonly/inaccessiblememonly/...).
The former is part of AAMemoryBehavior and the latter part of
AAMemoryLocation. While they are similar in nature it got messy when
they were put in a single AA. Location attributes for arguments and
floating values will follow later.
Note that both memory attributes kinds can derive readnone. If there are
no accesses AAMemoryBehavior will derive readnone. If there are accesses
but only to stack (=local) locations AAMemoryLocation will derive
readnone.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D73426
Due to the genericValueTraversal we might visit values for which we did
not create an AAValueConstantRange object, e.g., as they are behind a
PHI or select or call with `returned` argument. As a consequence we need
to validate the types as we are about to query AAValueConstantRange for
operands.
We used coarse-grained liveness before, thus we looked if the
instruction was executed, but we did not use fine-grained liveness,
hence if the instruction was needed or could be deleted even if the
surrounding ones are live. This patches introduces this level of
liveness checks together with other liveness queries, e.g., for uses.
For more control we enforce that all liveness queries go through the
Attributor.
Test have been adjusted to reflect the changes or augmented to prevent
deletion of the parts we want to check.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D73313
If we have a replacement for a value, via AAValueSimplify, the original
value will lose all its uses. Thus, as long as a value is simplified we
can skip the uses in checkForAllUses, given that these uses are
transitive uses for the simplified version and will therefore affect the
simplified version as necessary.
Since this allowed us to remove calls without side-effects and a known
return value, we need to make sure not to eliminate `musttail` calls.
Those we keep around, or later remove the entire `musttail` call chain.
We relied on wouldInstructionBeTriviallyDead before but that functions
does not take assumed information, especially for calls, into account.
The replacement, AAIsDead::isAssumeSideEffectFree, does.
This change makes AAIsDeadCallSiteReturn more complex as we can have
a dead call or only dead users.
The test have been modified to include a side effect where there was
none in order to keep the coverage.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D73311
As an approximation to a dead edge we can check if the terminator is
dead. If so, the corresponding operand use in a PHI node is dead even if
the PHI node itself is not.
This restores commit 748bb5a0f1, along
with a fix for a Chromium test suite build issue (and a new test for
that case).
Differential Revision: https://reviews.llvm.org/D73242
The changeXXXAfterManifest functions are better suited to deal with
changes so we should prefer them. These functions also recursively
delete dead instructions which is why we see test changes.
This is a minimal but important advancement over the existing code. A
cast with an operand that is only used in the cast retains the no-alias
property of the operand.
Traversing PHI nodes is natural with the genericValueTraversal but also
a bit tricky. The problem is similar to the ones we have seen in AAAlign
and AADereferenceable, namely that we continue to increase the range in
each iteration. We use a pessimistic approach here to stop the
iterations. Nevertheless, optimistic information can now be propagated
through a PHI node.
The change is performed as stated by the FIXME and the tests are
adjusted. All changes look fine to me and values can be inferred as
undef without it being an error.
Casts can be handled natively by the ConstantRange class. We do limit it
to extends for now as we assume an integer type in different locations.
A TODO and a test case with a FIXME was added to remove that restriction
in the future.
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.
There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:
InstCombine - can't happen without TTI, but we don't want target-specific
folds there.
SDAG - too late to assist other vectorization passes; TLI is not equipped
for these kind of cost queries; limited to a single basic block.
CGP - too late to assist other vectorization passes; would need to re-implement
basic cleanups like CSE/instcombine.
SLP - doesn't fit with existing transforms; limited to a single basic block.
This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.
We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.
Differential Revision: https://reviews.llvm.org/D73480
The LoopExtractor created new functions (by definition), which violates
the restrictions of a LoopPass.
The correct implementation of this pass should be as a ModulePass.
Includes reverting rL82990 implications on the LoopExtractor.
Fixes PR3082 and PR8929.
Differential Revision: https://reviews.llvm.org/D69069
In addition to the module pass, this patch introduces a CGSCC pass that
runs the Attributor on a strongly connected component of the call graph
(both old and new PM). The Attributor was always design to be used on a
subset of functions which makes this patch mostly mechanical.
The one change is that we give up `norecurse` deduction in the module
pass in favor of doing it during the CGSCC pass. This makes the
interfaces simpler but can be revisited if needed.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D70767
Parallel regions known to be read-only, e.g., after we removed all dead
write accesses, and terminating (`willreturn`) can be removed.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69954
This adds ~27 more runtime calls to the OpenMPKinds.def file, all with
attributes. We deduplicate 16 of those automatically in function =
thread scope. And we annotate all of them automatically during the
OpenMPOpt discovery step. A test with all omp_XXXX runtime calls to
track annotation coverage is included.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69984
The OpenMPOpt pass is a CGSCC pass in which OpenMP specific
optimizations can reside.
The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime
calls and their uses. This allows targeted transformations and eases
their implementation.
This initial patch deduplicates `__kmpc_global_thread_num` and
`omp_get_thread_num` calls. We can also identify arguments that are
equivalent to such a call result and use it instead. Later we can
determine "gtid" arguments based on the use in kernel functions etc.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69930
Summary:
Currently type test assume sequences inserted for devirtualization are
removed during WPD. This patch delays their removal until later in the
optimization pipeline. This is an enabler for upcoming enhancements to
indirect call promotion, for example streamlined promotion guard
sequences that compare against vtable address instead of the target
function, when there are small number of possible vtables (either
determined via WPD or by in-progress type profiling). We need the type
tests to correlate the callsites with the address point offset needed in
the compare sequence, and optionally to associated type summary info
computed during WPD.
This depends on work in D71913 to enable invocation of LowerTypeTests to
drop type test assume sequences, which will now be invoked following ICP
in the ThinLTO post-LTO link pipelines, and also after the existing
export phase LowerTypeTests invocation in regular LTO (which is already
after ICP). We cannot simply move the existing import phase
LowerTypeTests pass later in the ThinLTO post link pipelines, as the
comment in PassBuilder.cpp notes (it must run early because when
performing CFI other passes may disturb the sequences it looks for).
This necessitated adding a new type test resolution "Unknown" that we
can use on the type test assume sequences previously removed by WPD,
that we now want LTT to ignore.
Depends on D71913.
Reviewers: pcc, evgeny777
Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73242
Summary:
A recent change to enable more importing of global variables with
references exposed some efficiency issues with export computation.
See D73724 for more information and detailed analysis.
The first was specific to variable importing. The code was marking every
copy of a referenced value (from possibly thousands of files in the case
of linkonce_odr) as exported, and we only need to mark the copy in the
module containing the variable def being imported as exported. The
reason is that this is tracking what values are newly exported as a
result of importing. Anything that was defined in another module and
simply used in the exporting module is already exported, and would have
been identified by the caller (e.g. the LTO API implementations).
The second issue is that the code was re-adding previously exported
values (along with all references). It is easy to identify when a
variable was already imported into the same module (via the
import list insert call return value), and we already did this for
function importing. However, what we weren't doing for either function
or variable importing was avoiding a re-insertion when it was previously
exported into a different importing module. The reason we couldn't do
this is there was no way of telling from the export list whether it was
previously inserted there because its definition was exported (in which
case we already marked all its references as exported) from when it was
inserted there because it was referenced by another exported value (in
which case we haven't yet inserted its own references).
To address this we can restructure the way the export list is
constructed. This patch only adds the actual imported definitions
(variable or function) to the export list for its module during the
import computation. After import computation is complete, where we were
already post-processing the export list we go ahead and add all
references made by those exported values to the export list.
These changes speed up the thin link not only with constant variable
importing enabled, but also without (due to the efficiency improvement
in function importing).
Some thin link user time measurements for one large application, average
of 5 runs:
With constant variable importing enabled:
- without this patch: 479.5s
- with this patch: 74.6s
Without constant variable importing enabled:
- without this patch: 80.6s
- with this patch: 70.3s
Note I have not re-enabled constant variable importing here, as I would
like to do additional compile time measurements with these fixes first.
Reviewers: evgeny777
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73851
If all call sites are in `norecurse` functions we can derive `norecurse`
as the ReversePostOrderFunctionAttrsPass does. This should make
ReversePostOrderFunctionAttrsLegacyPass obsolete once the Attributor is
enabled.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D72017
If we know that all call sites have been processed we can derive an
early fixpoint. The use in this patch is likely not to trigger right now
but a follow up patch will make use of it.
Reviewed By: uenoku, baziotis
Differential Revision: https://reviews.llvm.org/D72016
Summary:
In a release build this variable becomes unused and may break the build
with `-Werror,-Wunused-variable`.
Reviewers: gribozavr2, jdoerfert, sstefan1
Reviewed By: gribozavr2
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73683
A pointer is privatizeable if it can be replaced by a new, private one.
Privatizing pointer reduces the use count, interaction between unrelated
code parts. This is a first step towards replacing argument promotion.
While we can already handle recursion (unlike argument promotion!) we
are restricted to stack allocations for now because we do not analyze
the uses in the callee.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D68852
The helpers AAReturnedFromReturnedValues and
AACallSiteReturnedFromReturned are useful not only to avoid code
duplication but also to avoid recomputation of results. If we have N
call sites we should not recompute the function return information N
times but once. These are mostly straightforward usages with some minor
improvements on the helpers and addition of a new one
(IRPosition::getAssociatedType) that knows about function return types.
This commit fixes PR39321.
GlobalExtensions is not guaranteed to be destroyed when optimizer plugins are unloaded. If it is indeed destroyed after a plugin is dlclose-d, the destructor of the corresponding ExtensionFn is not mapped anymore, causing a call to unmapped memory during destruction.
This commit guarantees that extensions coming from external plugins are removed from GlobalExtensions when the plugin is unloaded if GlobalExtensions has not been destroyed yet.
Differential Revision: https://reviews.llvm.org/D71959
There was a TODO in AAValueConstantRangeArgument to reuse
AAArgumentFromCallSiteArguments. We now do this by allowing new States
to be build from the bestState.
If we invalidate an attribute we need to inform all dependent ones even
if the fixpoint state is not invalid. Before we only continued
invalidation if the fixpoint state was invalid, now we signal a change
in case the fixpoint state is valid.
The test case was already included in D71620 but the problem was hiding
because it only manifested with the old PM (for that input).
This patch modularizes the way we check for no-alias call site arguments
by putting the existing logic into helper functions. The reasoning was
not changed but special cases for readonly/readnone were added.
If `null` is not defined we cannot access it, hence the pointer is
`noalias`. While this is not helpful on it's own it simplifies later
deductions that can skip over already known `noalias` pointers in
certain situations.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
This restores 59733525d3 (D71913), along
with bot fix 19c76989bb.
The bot failure should be fixed by D73418, committed as
af954e441a.
I also added a fix for non-x86 bot failures by requiring x86 in new test
lld/test/ELF/lto/devirt_vcall_vis_public.ll.
When we use information only to short-cut deduction or improve it, we
can use OPTIONAL dependences instead of REQUIRED ones to avoid cascading
pessimistic fixpoints.
We also need to track dependences only when we use assumed information,
e.g., we act on assumed liveness information.
It can happen that we have instructions in the ToBeDeletedInsts set
which are deleted earlier already. To avoid dangling pointers we use
weak tracking handles.
When we follow uses, e.g., in AAMemoryBehavior or AANoCapture, we need
to make sure the value is a pointer before we ask for abstract
attributes only valid for pointers. This happens because we follow
pointers through calls that do not capture but may return the value.
We might accidentally ask AAValueSimplify to simplify a void value. That
can lead to very interesting, and very wrong, results. We now handle
this case gracefully.
If alignment was manifested but it is actually only as good as the
data-layout provided one we should not report it as a change.
For testing purposes we still manifest the information.
Summary:
Third part in series to support Safe Whole Program Devirtualization
Enablement, see RFC here:
http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html
This patch adds type test metadata under -fwhole-program-vtables,
even for classes without hidden visibility. It then changes WPD to skip
devirtualization for a virtual function call when any of the compatible
vtables has public vcall visibility.
Additionally, internal LLVM options as well as lld and gold-plugin
options are added which enable upgrading all public vcall visibility
to linkage unit (hidden) visibility during LTO. This enables the more
aggressive WPD to kick in based on LTO time knowledge of the visibility
guarantees.
Support was added to all flavors of LTO WPD (regular, hybrid and
index-only), and to both the new and old LTO APIs.
Unfortunately it was not simple to split the first and second parts of
this part of the change (the unconditional emission of type tests and
the upgrading of the vcall visiblity) as I needed a way to upgrade the
public visibility on legacy WPD llvm assembly tests that don't include
linkage unit vcall visibility specifiers, to avoid a lot of test churn.
I also added a mechanism to LowerTypeTests that allows dropping type
test assume sequences we now aggressively insert when we invoke
distributed ThinLTO backends with null indexes, which is used in testing
mode, and which doesn't invoke the normal ThinLTO backend pipeline.
Depends on D71907 and D71911.
Reviewers: pcc, evgeny777, steven_wu, espindola
Subscribers: emaste, Prazek, inglorion, arichardson, hiraditya, MaskRay, dexonsmith, dang, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71913
The utility method RecursivelyDeleteTriviallyDeadInstructions receives
as input a vector of Instructions, where all inputs are valid
instructions. This same vector is used as a scratch storage (per the
header comment) to recursively delete instructions. If an instruction is
added as an operand of multiple other instructions, it may be added twice,
then deleted once, then the second reference in the vector is invalid.
Switch to using a Vector<WeakTrackingVH>.
This change facilitates a clean-up in LoopStrengthReduction.
Summary:
First patch to support Safe Whole Program Devirtualization Enablement,
see RFC here: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html
Always emit !vcall_visibility metadata under -fwhole-program-vtables,
and not just for -fvirtual-function-elimination. The vcall visibility
metadata will (in a subsequent patch) be used to communicate to WPD
which vtables are safe to devirtualize, and we will optionally convert
the metadata to hidden visibility at link time. Subsequent follow on
patches will help enable this by adding vcall_visibility metadata to the
ThinLTO summaries, and always emit type test intrinsics under
-fwhole-program-vtables (and not just for vtables with hidden
visibility).
In order to do this safely with VFE, since for VFE all vtable loads must
be type checked loads which will no longer be the case, this patch adds
a new "Virtual Function Elim" module flag to communicate to GlobalDCE
whether to perform VFE using the vcall_visibility metadata.
One additional advantage of using the vcall_visibility metadata to drive
more WPD at LTO link time is that we can use the same mechanism to
enable more aggressive VFE at LTO link time as well. The link time
option proposed in the RFC will convert vcall_visibility metadata to
hidden (aka linkage unit visibility), which combined with
-fvirtual-function-elimination will allow it to be done more
aggressively at LTO link time under the same conditions.
Reviewers: pcc, ostannard, evgeny777, steven_wu
Subscribers: mehdi_amini, Prazek, hiraditya, dexonsmith, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71907
Summary:
InlineResult is used both in APIs assessing whether a call site is
inlinable (e.g. llvm::isInlineViable) as well as in the function
inlining utility (llvm::InlineFunction). It means slightly different
things (can/should inlining happen, vs did it happen), and the
implicit casting may introduce ambiguity (casting from 'false' in
InlineFunction will default a message about hight costs,
which is incorrect here).
The change renames the type to a more generic name, and disables
implicit constructors.
Reviewers: eraman, davidxl
Reviewed By: davidxl
Subscribers: kerbowa, arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72744
Summary:
This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point.
One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument).
The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty).
Currently, AAValueConstantRange is created in `getAssumedConstant` method when `AAValueSimplify` returns `nullptr`(worst state).
Supported
- BinaryOperator(add, sub, ...)
- CmpInst(icmp eq, ...)
- !range metadata
`AAValueConstantRange` is not intended to extend to polyhedral range value analysis.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: phosek, davezarzycki, baziotis, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71620
This ports the MergeFunctions pass to the NewPM. This was rather
straightforward, as no analyses are used.
Additionally MergeFunctions needs to be conditionally enabled in
the PassBuilder, but I left that part out of this patch.
Differential Revision: https://reviews.llvm.org/D72537
Summary:
An assert added to the index-based WPD was trying to verify that we only
have multiple vtables for a given guid when they are all non-external
linkage. This is too conservative because we may have multiple external
vtable with the same guid when they are in comdat. Remove the assert,
as we don't have comdat information in the index, the linker should
issue an error in this case.
See discussion on D71040 for more information.
Reviewers: evgeny777, aganea
Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72648
This reverts commit a03d7b0f24.
As discussed in D68298, this causes a compile-time regression, in case
the DTs requested are not used elsewhere in GlobalOpt. We should only
get the DTs if they are available here, but this seems not possible with
the legacy pass manager from a module pass.
Summary:
A recent fix in D69452 fixed index based WPD in the presence of
available_externally vtables. It added a cast of the vtable def
summary to a GlobalVarSummary. However, in some cases one def may be an
alias, in which case we need to get the base object before casting,
otherwise we will crash.
Reviewers: evgeny777, steven_wu, aganea
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71040
When we replace instructions with unreachable we delete instructions. We
now avoid dangling pointers to those deleted instructions in the
`ToBeChangedToUnreachableInsts` set. Other modification collections
might need to be updated in the future as well.
Summary:
For artificial cases (huge array, few usages), Global SRA optimization creates
a lot of redundant data. It creates an instance of GlobalVariable for each array
element. For huge array, that means huge compilation time and huge memory usage.
Following example compiles for 10 minutes and requires 40GB of memory.
namespace {
char LargeBuffer[64 * 1024 * 1024];
}
int main ( void ) {
LargeBuffer[0] = 0;
printf("\n ");
return LargeBuffer[0] == 0;
}
The fix is to avoid Global SRA for large arrays.
Reviewers: craig.topper, rnk, efriedma, fhahn
Reviewed By: rnk
Subscribers: xbolva00, lebedev.ri, lkail, merge_guards_bot, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71993
If we replace a function with a new one because we rewrite the
signature, dead users may still refer to the old version. With this
patch we reuse the code that deals with dead functions, which the old
versions are, to avoid problems.
An inbounds GEP results in poison if the value is not "inbounds", not in
UB. We accidentally derived nonnull and dereferenceable from these
inbounds GEPs even in the absence of accesses that would make the poison
to UB.