When we replace constant returns at the call site we did issue a cast in
the hopes it would be a no-op if the types are equal. Turns out that is
not the case and we have to check it ourselves first.
Reused an IPConstantProp test for coverage. No functional change to the
test wrt. IPConstantProp.
Even if the invoked function may-return, we can replace it with a call
and branch if it is nounwind. We had almost everything in place to do
this but did not which actually caused a crash when we removed the
landingpad from the actually dead unwind block.
Exposed by the IPConstantProp tests.
We did merge "known" and "assumed" liveness information into a single
set which caused various kinds of problems, especially because we did
not properly record when something was actually known. With this patch
we properly track the "known" bit and distinguish dead ends we know from
the ones we still need to explore in future updates.
We gave up on `noreturn` if `willreturn` was known for a while but we
now again try to always derive `noreturn`. This is useful because a
function that is `noreturn` + `willreturn` is basically dead as
executing it would lead to undefined behavior (UB).
This came up in the IPConstantProp cases where a function only contained
a unreachable but was not marked `noreturn` which caused missed
opportunities down the line.
In D69605 only the "cases" of a switch were handled but if none matched
we did not make the default case life. This is fixed now and properly
tested (with code from IPConstantProp/user-with-multiple-uses.ll).
We cannot simply replace arguments that carry attributes like `nest`,
`inalloca`, `sret`, and `byval`. Except for the last one, which we can
replace if it is not written, we bail for now.
Trying to deduce information for declarations and calls sites of
declarations is not useful in practice but only for testing. Add a flag
that disables this by default but also enable it in the tests.
The misc.ll test will verify the flag "works" as expected.
We cannot look at the subsuming positions and take their nocapture bit
as shown with the two tests for which we derived nocapture on the call
site argument and readonly on the argument of the second before.
Before we did not follow casts and geps when we looked at the users of a
pointer in the pointers must-be-executed-context. This caused us to fail
to determine if it was accessed for sure. With this change we follow
such users now.
The above extension exposed problems in getKnownNonNullAndDerefBytesForUse
which did not always check what the base pointer was. We also did not
handle negative offsets as conservative as we have to without explicit
loop handling. Finally, we should not derive a huge number if we access
a pointer that was traversed backwards first.
The problems exposed by this functional change are already tested in the
existing test cases as is the functional change.
Differential Revision: https://reviews.llvm.org/D69647
Summary:
In order to get context sensitivity from isKnownNonZero we need to
provide a context instruction *and* a dominator tree. The latter is
passed now to which actually allows to remove some initialization code.
Tests taken from PR43833.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69595
The replacement code only looks at the first index of the
extractvalue. If there are additional indices we'll end
up doing a bad replacement.
This only happens if the function returns a nested struct. Not
sure if clang ever generates such code. The original report came
from ispc.
Fixes PR43857
Differential Revision: https://reviews.llvm.org/D69656
Summary:
If control is transferred to a successor is the key question when it
comes to liveness. The new implementation puts that question in the
focus and thereby providing a clean way to assume certain CFG edges are
dead or instructions will not transfer control.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69605
Summary:
This patch introduces liveness (AAIsDead) for all positions, thus for
all kinds of values. For now, we say an instruction is dead if it would
be removed assuming all users are dead. A call site return is different
as we just look at the users. If all call site returns have been
eliminated, the return values can return undef instead of their original
value, eliminating uses.
We try to recursively delete dead instructions now and we introduce a
simple check interface for use-traversal.
This is the idea tried out in D68626 but implemented in the right way.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68925
Deleting blocks will require us to deal with dead edges, e.g.,
`br i1 false, label %live, label %dead`
explicitly. For now we just clear the blocks and move on.
This will be revisited once we actually fold branches.
Summary:
If there is a unique free of the allocated that has to be reached from
the malloc, we can apply the heap-2-stack transformation even if the
pointer escapes.
Reviewers: hfinkel, sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68958
If an attribute did not query any optimistic (=non-fixed) information to
justify its state, we know the attribute state will not change anymore.
Thus, we can indicate an optimistic fixpoint.
We pretended IRPosition came either as mutable or immutable objects
while they are basically always immutable, with a single (existing)
unfortunate exceptions. This patch cleans up the uses to deal with the
immutable version.
Summary:
Clang does not add type metadata to available_externally vtables. When
choosing a summary to look at for virtual function definitions, make
sure we skip summaries for any available externally vtables as they will
not describe any virtual function functions, which are only summarized
in the presence of type metadata on the vtable def. Simply look for the
corresponding strong def's summary.
Also add handling for same-named local vtables with the same GUID
because of same-named files without enough distinguishing path.
In that case we return a conservative result with no devirtualization.
Reviewers: pcc, davidxl, evgeny777
Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69452
To make IntegerState more flexible but also less error prone we split it
up into (1) incrementing, (2) decrementing, and (3) bit-tracking states.
This adds functionality compared to before and disallows misuse, e.g.,
"incrementing" updates on a bit-tracking state.
Part of the change is a single operator in the base class which
simplifies helper functions that deal with states.
There are certain functional changes but all of which should actually be
corrections.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<> directly and if not assert will fire for us.
llvm-svn: 375427
AAReturnedValues, AAMemoryBehavior, and AANoUnwind, can provide
information that helps during the tracking or even justifies no-capture.
We now use this information and enable no-capture in some test cases
designed a long while a ago for these cases.
llvm-svn: 375382
by ExtBinary format profile
Profile on-demand loading was added for ExtBinary format profile in rL374233,
but currently profile on-demand loading doesn't work well with profile
remapping. The patch adds the support.
Suppose a function in the current module has outline instance in the profile.
The function name in the module is different from the name of the outline
instance, but remapper knows the two names are equal. When loading profile
on-demand, the outline instance has to be loaded with remapper's help.
At the same time SampleProfileReaderItaniumRemapper is changed from a proxy
of SampleProfileReader to a helper member in SampleProfileReader.
Differential Revision: https://reviews.llvm.org/D68901
llvm-svn: 375295
Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.
Original commit message:
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.
To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.
The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.
This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.
To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.
I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.
On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.
I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.
I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).
Differential revision: https://reviews.llvm.org/D63932
llvm-svn: 375094
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374784
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374743
No-return and will-return are exclusive, assuming the latter is more
prominent we can avoid updates of the former unless will-return is not
known for sure.
llvm-svn: 374739
Even if an argument is captured, we cannot have an effect the function
does not have. This is fine except for the special case of `inalloca` as
it does not behave by the rules.
TODO: Maybe the special rule for `inalloca` is wrong after all.
llvm-svn: 374736
Summary:
This changes "CHECK" check lines to "ATTRIBUTOR" check lines where
necessary and also fixes the now exposed, mostly minor, problems.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68929
llvm-svn: 374735
Before, we eagerly split blocks even if it was not necessary, e.g., they
had a single unreachable instruction and only a single predecessor.
llvm-svn: 374703
We do not yet perform h2s because we know something is free'ed but we do
it because we know the pointer does not escape. Storing the pointer
allows it to escape so we have to prevent that.
llvm-svn: 374699
H2S did apply to mallocs of non-constant sizes if the uses were OK. This
is now forbidden through reording of the "good" and "bad" cases in the
conditional.
llvm-svn: 374698
The check for naked/optnone was insufficient for different reasons. We
now check before we initialize an abstract attribute and we do it for
all abstract attributes.
llvm-svn: 374694
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.
To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.
The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.
This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.
To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.
I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.
On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.
I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.
I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).
Differential revision: https://reviews.llvm.org/D63932
llvm-svn: 374539
Summary:
`null` in the default address space (=AS 0) cannot be captured nor can
it alias anything. We make this clear now as it can be important for
callbacks and other cases later on. In addition, this patch improves the
debug output for noalias deduction.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68624
llvm-svn: 374280
in ExtBinary format
Currently for Text, Binary and ExtBinary format profiles, when we compile a
module with samplefdo, even if there is no function showing up in the profile,
we have to load all the function profiles from the profile input. That is a
waste of compile time.
CompactBinary format profile has already had the support of loading function
profiles on demand. In this patch, we add the support to load profile on
demand for ExtBinary format. It will work no matter the sections in ExtBinary
format profile are compressed or not. Experiment shows it reduces the time to
compile a server benchmark by 30%.
When profile remapping and loading function profiles on demand are both used,
extra work needs to be done so that the loading on demand process will take
the name remapping into consideration. It will be addressed in a follow-up
patch.
Differential Revision: https://reviews.llvm.org/D68601
llvm-svn: 374233
Factor out CodeExtractor's analysis of allocas (for shrinkwrapping
purposes), and allow the analysis to be reused.
This resolves a quadratic compile-time bug observed when compiling
AMDGPUDisassembler.cpp.o.
Pre-patch (Release + LTO clang):
```
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
176.5278 ( 57.8%) 0.4915 ( 18.5%) 177.0192 ( 57.4%) 177.4112 ( 57.3%) Hot Cold Splitting
```
Post-patch (ReleaseAsserts clang):
```
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
1.4051 ( 3.3%) 0.0079 ( 0.3%) 1.4129 ( 3.2%) 1.4129 ( 3.2%) Hot Cold Splitting
```
Testing: check-llvm, and comparing the AMDGPUDisassembler.cpp.o binary
pre- vs. post-patch.
An alternate approach is to hide CodeExtractorAnalysisCache from clients
of CodeExtractor, and to recompute the analysis from scratch inside of
CodeExtractor::extractCodeRegion(). This eliminates some redundant work
in the shrinkwrapping legality check. However, some clients continue to
exhibit O(n^2) compile time behavior as computing the analysis is O(n).
rdar://55912966
Differential Revision: https://reviews.llvm.org/D68616
llvm-svn: 374089
Summary:
In D65186 and related patches, MustBeExecutedContextExplorer is introduced. This enables us to traverse instructions guaranteed to execute from function entry. If we can know the argument is used as `dereferenceable` or `nonnull` in these instructions, we can mark `dereferenceable` or `nonnull` in the argument definition:
1. Memory instruction (similar to D64258)
Trace memory instruction pointer operand. Currently, only inbounds GEPs are traced.
```
define i64* @f(i64* %a) {
entry:
%add.ptr = getelementptr inbounds i64, i64* %a, i64 1
; (because of inbounds GEP we can know that %a is at least dereferenceable(16))
store i64 1, i64* %add.ptr, align 8
ret i64* %add.ptr ; dereferenceable 8 (because above instruction stores into it)
}
```
2. Propagation from callsite (similar to D27855)
If `deref` or `nonnull` are known in call site parameter attributes we can also say that argument also that attribute.
```
declare void @use3(i8* %x, i8* %y, i8* %z);
declare void @use3nonnull(i8* nonnull %x, i8* nonnull %y, i8* nonnull %z);
define void @parent1(i8* %a, i8* %b, i8* %c) {
call void @use3nonnull(i8* %b, i8* %c, i8* %a)
; Above instruction is always executed so we can say that@parent1(i8* nonnnull %a, i8* nonnull %b, i8* nonnull %c)
call void @use3(i8* %c, i8* %a, i8* %b)
ret void
}
```
Reviewers: jdoerfert, sstefan1, spatel, reames
Reviewed By: jdoerfert
Subscribers: xbolva00, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65402
llvm-svn: 374063
Summary: This patch introduces a generic way to compose two structured deductions. This will be used for composing generic deduction with `MustBeExecutedExplorer` and other existing generic deduction.
Reviewers: jdoerfert, sstefan1
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66645
llvm-svn: 374060
The initialization logic has become part of the Attributor but the
patches that introduced these calls here were in development when the
transition happened.
We also now clean up (undefine) the macros used to create attributes.
llvm-svn: 373987
Local linkage is internal or private, and private is a specialization of
internal, so either is fine for all our "local linkage" queries.
llvm-svn: 373986
Summary:
When we iterate over uses of functions and expect them to be call sites,
we now use abstract call sites to allow callback calls.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, hfinkel, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67871
llvm-svn: 373985
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.
https://reviews.llvm.org/D68439
llvm-svn: 373935
Without this we can encounter link errors or incorrect behaviour
at runtime as a result of the wrong function being referenced.
Differential Revision: https://reviews.llvm.org/D67945
llvm-svn: 373678
Summary:
This fixes a hole in the handling of devirtualized targets that were
local but need promoting due to devirtualization in another module. We
were not correctly referencing the promoted symbol in some cases. Make
sure the code that updates the name also looks at the ExportedGUIDs set
by utilizing a callback that checks all conditions (the callback
utilized by the internalization/promotion code).
Reviewers: pcc, davidxl, hiraditya
Subscribers: mehdi_amini, Prazek, inglorion, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68159
llvm-svn: 373485
removeUnreachableBlocks knows how to preserve the DomTree, so make use
of it instead of re-computing the DT.
Reviewers: davide, kuhar, brzycki
Reviewed By: davide, kuhar
Differential Revision: https://reviews.llvm.org/D68298
llvm-svn: 373430
profile symbol list.
Currently many existing users using profile-sample-accurate want to reduce
code size as much as possible. Their use cases are different from the scenario
profile symbol list tries to handle -- the major motivation of adding profile
symbol list is to get the major memory/code size saving without introduce
performance regression. So to keep the behavior of profile-sample-accurate
unchanged, we think decoupling these two things and using a new flag to
control the handling of profile symbol list may be better.
When profile-sample-accurate and the new flag profile-accurate-for-symsinlist
are both present, since profile-sample-accurate is a user assertion we let it
have a higher precedence.
Differential Revision: https://reviews.llvm.org/D68047
llvm-svn: 373133
When a cold path is outlined, the value tracking in the assumption cache may be
invalidated due to the code motion. We would previously trip an assertion in
subsequent passes (but required the passes to happen in a single run as the
assumption cache is shared across the passes). Invalidating the cache ensures
that we get the correct information when needed with the legacy pass manager as
well.
llvm-svn: 372667
is available
In rL372232, we treated names showing up in profile as not cold when
profile-sample-accurate is enabled. This caused 70k size regression in
Chrome/Android. The patch put a guard and only enable the change when
profile symbol list is available, i.e., keep the old behavior when profile
symbol list is not available.
Differential Revision: https://reviews.llvm.org/D67931
llvm-svn: 372665
Enable flag introduced in rL294998. Security concerns are no longer valid, since function signatures for mentioned libc functions has no nonnull attribute (Clang does not generate them? I see no nonnull attr in LLVM IR for these functions) and since rL372091 we carefully annotate the callsites where we know that size is static, non zero. So let's enable this flag again..
llvm-svn: 372573
Summary:
This patch introduces `norecurse` function attribute deduction.
`norecurse` will be deduced if the following conditions hold:
* The size of SCC in which the function belongs equals to 1.
* The function doesn't have self-recursion.
* We have `norecurse` for all call site.
To avoid a large change, SCC is calculated using scc_iterator in InfoCache initialization for now.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67751
llvm-svn: 372475
is enabled.
We can save memory and reduce binary size significantly by enabling
ProfileSampleAccurate. However when ProfileSampleAccurate is true,
function without sample will be regarded as cold and this could
potentially cause performance regression.
To minimize the potential negative performance impact, we want to be
a little conservative here saying if a function shows up in the profile,
no matter as outline instance, inline instance or call targets, treat
the function as not being cold. This will handle the cases such as most
callsites of a function are inlined in sampled binary (thus outline copy
don't get any sample) but not inlined in current build (because of source
code drift, imprecise debug information, or the callsites are all cold
individually but not cold accumulatively...), so that the outline function
showing up as cold in sampled binary will actually not be cold after current
build. After the change, such function will be treated as not cold even
profile-sample-accurate is enabled.
At the same time we lower the hot criteria of callsiteIsHot check when
profile-sample-accurate is enabled. callsiteIsHot is used to determined
whether a callsite is hot and qualified for early inlining. When
profile-sample-accurate is enabled, functions without profile will be
regarded as cold and much less inlining will happen in CGSCC inlining pass,
so we can worry less about size increase and be aggressive to allow more
early inlining to happen for warm callsites and it is helpful for performance
overall.
Differential Revision: https://reviews.llvm.org/D67561
llvm-svn: 372232
Summary:
There were segfaults as we modified and iterated the instruction maps in
the cache at the same time. This was happening because we created new
instructions while we populated the cache. This fix changes the order
in which we perform these actions. First, the caches for the whole
module are created, then we start to create abstract attributes.
I don't have a unit test but the LLVM test suite exposes this problem.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67232
llvm-svn: 372105
Summary: This patch introduces a helper struct `AnalysisGetter` to put together analysis getters. In this patch, a getter for `AAResult` is also added for `noalias`.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67603
llvm-svn: 372072
D53362 gives a prototype heap-to-stack conversion pass. With addition of new attributes in the attributor, this can now be revisted and improved. This will place it in the Attributor to make it easier to use new attributes (eg. nofree, nosync, willreturn, etc.) and other attributor features.
Reviewers: jdoerfert, uenoku, hfinkel, efriedma
Subscribers: lebedev.ri, xbolva00, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D65408
llvm-svn: 371942
Summary: This should be obsolete once the functionality in D66967 is integrated.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67231
llvm-svn: 371915
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300
We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.
We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.
A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.
In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect
Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324
llvm-svn: 371635
adding new read attribute to an argument
Summary: Update optimization pass to prevent adding read-attribute to an
argument without removing its conflicting attribute.
A read attribute, based on the result of the attribute deduction
process, might be added to an argument. The attribute might be in
conflict with other read/write attribute currently associated with the
argument. To ensure the compatibility of attributes, conflicting
attribute, if any, must be removed before a new one is added.
The following snippet shows the current behavior of the compiler, where
the compilation process is aborted due to incompatible attributes.
$ cat x.ll
; ModuleID = 'x.bc'
%_type_of_d-ccc = type <{ i8*, i8, i8, i8, i8 }>
@d-ccc = internal global %_type_of_d-ccc <{ i8* null, i8 1, i8 13, i8 0,
i8 -127 }>, align 8
define void @foo(i32* writeonly %.aaa) {
foo_entry:
%_param_.aaa = alloca i32*, align 8
store i32* %.aaa, i32** %_param_.aaa, align 8
store i8 0, i8* getelementptr inbounds (%_type_of_d-ccc,
%_type_of_d-ccc* @d-ccc, i32 0, i32 3)
ret void
}
$ opt -O3 x.ll
Attributes 'readnone and writeonly' are incompatible!
void (i32*)* @foo
in function foo
LLVM ERROR: Broken function found, compilation aborted!
The purpose of this changeset is to fix the above error. This fix is
based on a suggestion from Johannes @jdoerfert (many thanks!!!)
Authored By: anhtuyen
Reviewer: nicholas, rnk, chandlerc, jdoerfert
Reviewed By: rnk
Subscribers: hiraditya, jdoerfert, llvm-commits, anhtuyen, LLVM
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D58694
llvm-svn: 371622
This reverts commit r371584. It introduced a dependency from compiler-rt
to llvm/include/ADT, which is problematic for multiple reasons.
One is that it is a novel dependency edge, which needs cross-compliation
machinery for llvm/include/ADT (yes, it is true that right now
compiler-rt included only header-only libraries, however, if we allow
compiler-rt to depend on anything from ADT, other libraries will
eventually get used).
Secondly, depending on ADT from compiler-rt exposes ADT symbols from
compiler-rt, which would cause ODR violations when Clang is built with
the profile library.
llvm-svn: 371598
Summary:
We can query to Attributor whether the value is captured in the scope or not on the following way:
```
const auto & NoCapAA = A.getAAFor<AANoCapture>(*this, IRPosition::value(V));
```
And if V is CallSiteReturned then `getDeducedAttribute` will add `nocatpure` to the callsite returned value. It is not valid.
This patch checks the position is an argument or call site argument.
This is tested in D67286.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67342
llvm-svn: 371589
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300
We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.
We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.
A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.
In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect
Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324
llvm-svn: 371584
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300
We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.
We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.
A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.
In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect
Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324
llvm-svn: 371484
Summary:
This patch introduces initial `AAValueSimplify` which simplifies a value in a context.
example
- (for function returned) If all the return values are the same and constant, then we can replace callsite returned with the constant.
- If an internal function takes the same value(constant) as an argument in the callsite, then we can replace the argument with that constant.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66967
llvm-svn: 371291
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
Summary:
Instead of building attributes for internal functions which we do not
update as long as we assume they are dead, we now do not create
attributes until we assume the internal function to be live. This
improves the number of required iterations, as well as the number of
required updates, in real code. On our tests, the results are mixed.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66914
llvm-svn: 370924
Summary:
We create attributes on-demand so we need to check the white list
on-demand. This also unifies the location at which we create,
initialize, and eventually invalidate new abstract attributes.
The tests show mixed results, a few more call site attributes are
determined which can cause more iterations.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66913
llvm-svn: 370922
Summary:
Before we tried to rule out non-exact definitions early but that lead to
on-demand attributes created for them anyway. As a consequence we needed
to look at the definition in the initialize of each attribute again.
This patch centralized this lookup and tightens the condition under
which we give up on non-exact definitions.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67115
llvm-svn: 370917
Add the no-capture argument attribute deduction to the Attributor
fixpoint framework.
The new string attributed "no-capture-maybe-returned" is introduced to
allow deduction of no-capture through functions that "capture" an
argument but only by "returning" it. It is only used by the Attributor
for testing.
Differential Revision: https://reviews.llvm.org/D59922
llvm-svn: 370817
cold versus function being newly added.
This is the second half of https://reviews.llvm.org/D66374.
Profile symbol list is the collection of function symbols showing up in
the binary which generates the current profile. It is used to discriminate
function being cold versus function being newly added. Profile symbol list
is only added for profile with ExtBinary format.
During profile use compilation, when profile-sample-accurate is enabled,
a function without profile will be regarded as cold only when it is
contained in that list.
Differential Revision: https://reviews.llvm.org/D66766
llvm-svn: 370563
Summary:
Instead of recomputing information for call sites we now use the
function information directly. This is always valid and once we have
call site specific information we can improve here.
This patch also bootstraps attributes that are created on-demand through
an initial update call. Information that is known will then directly be
available in the new attribute without causing an iteration delay.
The tests show how this improves the iteration count.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66781
llvm-svn: 370480
Summary:
Any pointer could have load/store users not only floating ones so we
move the manifest logic for alignment into the AAAlignImpl class.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66922
llvm-svn: 370479
As dependences between abstract attributes can become stale, e.g., if
one was sufficient to imply another one at some point but it has since
been wakened to the point it is not usable for the formerly implied one.
To weed out spurious dependences, and thereby eliminate unneeded
updates, we introduce an option to determine how often the dependence
cache is cleared and recomputed during the fixpoint iteration.
Note that the initial value was determined such that we see a positive
result on our tests.
Differential Revision: https://reviews.llvm.org/D63315
llvm-svn: 370230
Summary:
Until we have proper call-site information we should not recompute
liveness and return information for each call site. This patch directly
uses the function versions and introduces TODOs at the usage sites.
The required iterations to get to the fixpoint are most of the time
reduced by this change and we always avoid work duplication.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66562
llvm-svn: 370208
Summary:
During the fixpoint iteration, including the manifest stage, we should
not delete stuff as other abstract attributes might have a reference to
the value. Through the API this can now be done safely at the very end.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66779
llvm-svn: 370014
Summary:
Try to verify how many iterations we need for a fixpoint in our tests.
This patch adjust the way we count to make it easier to follow. It also
adjusts the bounds to actually account for a fixpoint and not only the
minimum number to pass all checks.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66757
llvm-svn: 369945
By default, the Attributor tracks potential dependences between abstract
attributes based on the issued Attributor::getAAFor queries. This
simplifies the development of new abstract attributes but it can also
lead to spurious dependences that might increase compile time and make
internalization harder (D63312). With this patch, abstract attributes
can opt-out of implicit dependence tracking and instead register
dependences explicitly. It is up to the implementation to make sure all
existing dependences are registered.
Differential Revision: https://reviews.llvm.org/D63314
llvm-svn: 369935
Summary:
We can now manifest alignment information in load/store instructions if
the pointer is known to have a better alignment.
Reviewers: uenoku, sstefan1, lebedev.ri
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66567
llvm-svn: 369804
Summary:
If the unique return value is a constant we now replace call uses with
that constant.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66551
llvm-svn: 369785
Summary:
If we have a loop in which the dereferenceability of a pointer decreases
we did slowly decrease it iteration by iteration, leading to a timeout.
With this patch we detect such circular reasoning and indicate a
fixpoint early.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66558
llvm-svn: 369784
Summary:
If we have a negative inbounds offset dereferenceabily "grows". However,
until we do not handle the overflow that can occur in the
dereferenceable bytes and the problem with loops, we simply do not grow
the state.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66557
llvm-svn: 369771
If the number of potentially returned values not change since the last
traversal we do not need to visit the returned values again. This works
as we only add values to the returned values set now.
Differential Revision: https://reviews.llvm.org/D66484
llvm-svn: 369770
Summary:
When we have new attributes and we end the fixpoint iteration because
the iteration limit is reached, we need to treat the new ones as if they
changed in the last iteration, as they might have.
This adds a test for which we should not derive anything regardless of
the iteration limit, e.g., if we abort there should not be any
attributes manifested in the IR.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66549
llvm-svn: 369768
Summary:
Keep aliasees alive if their alias is live, otherwise we end up with an
alias to a declaration, which is invalid. This can happen when the
aliasee is weak and non-prevailing.
This fix exposed the fact that we were then attempting to internalize
the weak symbol, which was not exported as it was not prevailing. We
should not internalize interposable symbols in general, unless this is
the prevailing copy, since it can lead to incorrect inlining and other
optimizations. Most of the changes in this patch are due to the
restructuring required to pass down the prevailing callback.
Finally, while implementing the test cases, I found that in the case of
a weak aliasee that is still marked not live because its alias isn't
live, after dropping the definition we incorrectly marked the
declaration with weak linkage when resolving prevailing symbols in the
module. This was due to some special case handling for symbols marked
WeakLinkage in the summary located before instead of after a subsequent
check for the symbol being a declaration. It turns out that we don't
actually need this special case handling any more (looking back at the
history, when that was added the code was structured quite differently)
- we will correctly mark with weak linkage further below when the
definition hasn't been dropped.
Fixes PR42542.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66264
llvm-svn: 369766
I noticed another instance of the issue where references to aliases were
being replaced with aliasees, this time in InstCombine. In the instance that
I saw it turned out to be only a QoI issue (a symbol ended up being missing
from the symbol table due to the last reference to the alias being removed,
preventing HWASAN from symbolizing a global reference), but it could easily
have manifested as incorrect behaviour.
Since this is the third such issue encountered (previously: D65118, D65314)
it seems to be time to address this common error/QoI issue once and for all
and make the strip* family of functions not look through aliases.
Includes a test for the specific issue that I saw, but no doubt there are
other similar bugs fixed here.
As with D65118 this has been tested to make sure that the optimization isn't
load bearing. I built Clang, Chromium for Linux, Android and Windows as well
as the test-suite and there were no size regressions.
Differential Revision: https://reviews.llvm.org/D66606
llvm-svn: 369697
Summary: In D65402, I want to get DerefState from AADereferenceable but it was not allowed. This patch moves DerefState definition into Attributor.h and makes AADerefenceable inherit StateWrapper.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66585
llvm-svn: 369653
For an internal function, if all its call sites are dead, the body of the function is considered dead.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D66155
llvm-svn: 369470
Summary:
StringMap is used for storing call target to frequency map for AutoFDO. However the iterating order of StringMap is non-deterministic, which leads to non-determinism in AutoFDO profile output. Now new API getSortedCallTargets and SortCallTargets are added for deterministic ordering and output.
Roundtrip test for text profile and binary profile is added.
Reviewers: wmi, davidxl, danielcdh
Subscribers: hiraditya, mgrang, llvm-commits, twoh
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66191
llvm-svn: 369440
Summary:
When the line format is wrong, we may end up accessing out of bound
memory. eg: the test with invalide line will cause assert.
Assertion `idx < size()' failed
The fix is to report fatal when we found mismatched line format.
Reviewers: qcolombet, volkan
Reviewed By: qcolombet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66444
llvm-svn: 369389
Before, we create the set of abstract attributes initially and then
dealt with the fact hat a lookup could fail, e.g., return a nullptr.
This patch will ensure we always return a valid object from a lookup,
allowing us not only to remove the nullptr checks but also to grow the
set of abstract attributes "in-flight" on-demand.
One can now start from those that have the best chance of improving
performance without the need to specify all they might depend on.
While this introduces some boilerplate, the usage of attributes is much
easier and cleaner now.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66276
llvm-svn: 369331
Summary:
This is analogous to D66128 but for AADereferenceable. We have the logic
concentrated in the floating value updateImpl and we use the combiner
helper classes for arguments and return values.
The regressions will go away with "on-demand" attribute creation.
Improvements are already visible in the existing tests.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66272
llvm-svn: 369329
Summary:
What D66126 did for AAAlign, this patch does for AANonNull. Agian, the
logic becomes more concise and localized. Again, returned poiners are
not annotated properly but that will not be an issue if this lands with
the "on-demand" generation of attributes. First improvements due to the
genericValueTraversal are already visible.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66128
llvm-svn: 369328
The clamp operator should not take the known of the given state as the
known is potentially based on assumed information. This also adds TODOs
to guide improvements.
llvm-svn: 369327
This reverts commit cedd0d9a6e.
Re-apply the original commit but make sure the variables are initialized
(even if they are not used) so UBSan is not complaining.
llvm-svn: 369294
By partially resolving returned calls we did not record that they were
not fully resolved which caused odd behavior down the line. We could
also end up with some, but not all, returned values of the callee in the
returned values map of the caller, another odd behavior we want to
avoid.
llvm-svn: 369160
As a preparation to "on-demand" abstract attribute generation we need
implementations for all attributes (as they can be queried and then
created on-demand where we now fail to find one).
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66129
llvm-svn: 369155
Summary:
This is the first commit aiming to structure the attribute deduction.
The base idea is that we have default propagation patterns as listed
below on top of which we can add specific, e.g., context sensitive,
logic.
Deduction patterns used in this patch:
- argument states are determined from call site argument states,
see AAAlignArgument and AAArgumentFromCallSiteArguments.
- call site argument states are determined as if they were floating
values, see AAAlignCallSiteArgument and AAAlignFloating.
- floating value states are determined by traversing the def-use chain
and combining the states determined for the leaves, see
AAAlignFloating and genericValueTraversal.
- call site return states are determined from function return states,
see AAAlignCallSiteReturned and AACallSiteReturnedFromReturned.
- function return states are determined from returned value states,
see AAAlignReturned and AAReturnedFromReturnedValues.
Through this strategy all logic for alignment is concentrated in the
AAAlignFloating::updateImpl method.
Note: This commit works on its own but is part of a larger change that
involves "on-demand" creation of abstract attributes that will
participate in the fixpoint iteration. Without this part, we sometimes
do not have an AAAlign abstract attribute to query, loosing information
we determined before. All tests have appropriate FIXMEs and the
information will be recovered once we added all parts.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66126
llvm-svn: 369144
Until we have call site specific liveness and/or value information there
is no need to do call site specific deduction. Though, we need the
symbols in follow up patches that make Attributor::getAAFor return a
reference.
llvm-svn: 369143
Summary:
This patch should not change the behavior except that the added
initialize methods might indicate an optimistic fixpoint earlier. The
code movement is done to keep the attribute definitions in a single
block where it makes sense. No functional changes intended there.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66258
llvm-svn: 369142
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Summary:
Instead of constantly keeping track of the nonnull status with the
dereferenceable information we can simply query the nonnull attribute
whenever we need the information (debug + manifest).
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66113
llvm-svn: 368924
Summary:
As one of the first attributes, and one of the complex ones,
AAReturnedValues was not using liveness but we filtered the result after
the fact. This change adds liveness usage during the creation. The
algorithm is also improved and shorter.
The new algorithm will collect returned values over time using the
generic facilities that work with liveness already, e.g.,
genericValueTraversal which does not look at dead PHI node predecessors.
A test to show how this leads to better results is included.
Note: Unresolved calls and resolved calls are now tracked explicitly.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66120
llvm-svn: 368922
Summary:
If the associated context instruction is assumed dead we do not need to
update or manifest the state.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66116
llvm-svn: 368921
Summary:
The next attempt to clean up the Attributor interface before we grow it
further.
Before, we used a combination of two values (associated + anchor) and an
argument number (or -1) to determine a location. This was very fragile.
The new system uses exclusively IR positions and we restrict the
generation of IR positions to special constructor methods that verify
internal constraints we have. This will catch misuse early.
The auto-conversion, e.g., in getAAFor, is now performed through the
SubsumingPositionIterator. This iterator takes an IR position and allows
to visit all IR positions that "subsume" the given one, e.g., function
attributes "subsume" argument attributes of that function. For a
detailed breakdown see the class comment of SubsumingPositionIterator.
This patch also introduces the IRPosition::getAttrs() to extract IR
attributes at a certain position. The method knows how to look up in
different positions that are equivalent, e.g., the argument position for
call site arguments. We also introduce three new positions kinds such
that we have all IR positions where attributes can be placed and one for
"floating" values.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65977
llvm-svn: 368919
Summary:
This commit fixed a race condition from multi-threaded thinLTO backends that causes non-deterministic memory corruption for a data structure used only by AutoFDO with compact binary profile.
GUIDToFuncNameMap, a static data member of type DenseMap in FunctionSamples is used as a per-module mapping from function name MD5 to name string when input AutoFDO profile is in compact binary format. However with ThinLTO, we can have parallel backends modifying and accessing the class static map concurrently. The fix is to make GUIDToFuncNameMap a member of SampleProfileLoader instead of a file static data.
Reviewers: wmi, davidxl, danielcdh
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65848
llvm-svn: 368596
The default behavior of Clang's indirect function call checker will replace
the address of each CFI-checked function in the output file's symbol table
with the address of a jump table entry which will pass CFI checks. We refer
to this as making the jump table `canonical`. This property allows code that
was not compiled with ``-fsanitize=cfi-icall`` to take a CFI-valid address
of a function, but it comes with a couple of caveats that are especially
relevant for users of cross-DSO CFI:
- There is a performance and code size overhead associated with each
exported function, because each such function must have an associated
jump table entry, which must be emitted even in the common case where the
function is never address-taken anywhere in the program, and must be used
even for direct calls between DSOs, in addition to the PLT overhead.
- There is no good way to take a CFI-valid address of a function written in
assembly or a language not supported by Clang. The reason is that the code
generator would need to insert a jump table in order to form a CFI-valid
address for assembly functions, but there is no way in general for the
code generator to determine the language of the function. This may be
possible with LTO in the intra-DSO case, but in the cross-DSO case the only
information available is the function declaration. One possible solution
is to add a C wrapper for each assembly function, but these wrappers can
present a significant maintenance burden for heavy users of assembly in
addition to adding runtime overhead.
For these reasons, we provide the option of making the jump table non-canonical
with the flag ``-fno-sanitize-cfi-canonical-jump-tables``. When the jump
table is made non-canonical, symbol table entries point directly to the
function body. Any instances of a function's address being taken in C will
be replaced with a jump table address.
This scheme does have its own caveats, however. It does end up breaking
function address equality more aggressively than the default behavior,
especially in cross-DSO mode which normally preserves function address
equality entirely.
Furthermore, it is occasionally necessary for code not compiled with
``-fsanitize=cfi-icall`` to take a function address that is valid
for CFI. For example, this is necessary when a function's address
is taken by assembly code and then called by CFI-checking C code. The
``__attribute__((cfi_jump_table_canonical))`` attribute may be used to make
the jump table entry of a specific function canonical so that the external
code will end up taking a address for the function that will pass CFI checks.
Fixes PR41972.
Differential Revision: https://reviews.llvm.org/D65629
llvm-svn: 368495
Summary:
The ever growing switch required Attribute::AttrKind values but they
might not be available for all abstract attributes we deduce. With the
new method we track statistics at the abstract attribute level. The
provided macros simplify the usage and make the messages uniform.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65732
llvm-svn: 368227
Summary:
The wrapper reduces boilerplate code and also provide a nice way to
determine the state type used by an abstract attributes statically via
AAType::StateType.
This was already discussed as part of the review of D65711.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65786
llvm-svn: 368224
If we know everything is live there is no need to query for liveness.
Indicating a pessimistic fixpoint will cause the state to be "invalid"
which will cause the Attributor to not return the AAIsDead on request,
which will prevent us from querying isAssumedDead().
llvm-svn: 368223
Summary:
So far, whenever one wants to look at returned values, one had to deal
with the AAReturnedValues and potentially with the AAIsDead attribute.
In the same spirit as other checkForAllXXX methods, we add this
functionality now to the Attributor. By adopting the use sites we got
better results when return instructions were dead.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65733
llvm-svn: 368222
Commit r368064 was necessary after r367953 (D65712) broke the module
build. That happened, apparently, because the template class IRAttribute
defined in the header had a virtual method defined in the corresponding
source file (IRAttribute::manifest). To unbreak the situation this patch
introduces a helper function IRAttributeManifest::manifestAttrs which
is used to implement IRAttribute::manifest in the header. The deifnition
of the helper function is still in the source file.
Patch by jdoerfert (Johannes Doerfert)
Differential Revision: https://reviews.llvm.org/D65821
llvm-svn: 368076
Summary:
Similar to `Attributor::checkForAllCallSites`, we now provide such
functionality for instructions of a certain opcode through
`Attributor::checkForAllInstructions` and the convenient wrapper
`Attributor::checkForAllCallLikeInstructions`. This cleans up code,
avoids duplication, and simplifies the usage of liveness information.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65731
llvm-svn: 367961
Summary:
Certain properties, e.g., an AttrKind, are not shared among all abstract
attributes. This patch extracts the functionality into a helper struct.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65712
llvm-svn: 367953
To remove boilerplate, mostly passing through values to the
AbstractAttriubute base class, we extract the state into an IRPosition
helper. There is no function change intended but the IRPosition struct
will provide more functionality down the line.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65711
llvm-svn: 367952
Summary:
The new scheme is similar to the pass manager and dyn_cast scheme where
we identify classes by the address of a static member. This is better
than the old scheme in which we had to "invent" new Attributor enums if
there was no corresponding one.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65710
llvm-svn: 367951
Summary:
Instead of storing the reference to the InformationCache we now pass it
whenever it might be needed.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65709
llvm-svn: 367950
A function is "no-return" if we never reach a return instruction, either
because there are none or the ones that exist are dead.
Test have been adjusted:
- either noreturn was added, or
- noreturn was avoided by modifying the code.
The new noreturn_{sync,async} test make sure we do handle invoke
instructions with a noreturn (and potentially nowunwind) callee
correctly, even in the presence of potential asynchronous exceptions.
llvm-svn: 367948
When we remove instructions cached references could still be live. This
patch avoids removing invoke instructions that are replaced by calls and
instead keeps them around but in a dead block.
llvm-svn: 367933
Similar to other places where we transform invokes to calls we need to
be careful if the handler (=personality) can catch asynchronous
exceptions as they are not modeled as part of nounwind.
This is tested with D59978.
llvm-svn: 367931
Summary:
This contains various fixes:
- Explicitly determine and return the next noreturn instruction.
- If an invoke calls a noreturn function which is not nounwind we
keep the unwind destination live. This also means we require an
invoke. Though we can still add the unreachable to the normal
destination block.
- Check if the return instructions are dead after we look for calls
to avoid triggering an optimistic fixpoint in the presence of
assumed liveness information.
- Make the interface work with "const" pointers.
- Some simplifications
While additional tests are included, full coverage is achieved only with
D59978.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65701
llvm-svn: 367791
When a fixpoint is indicated the change status is known due to the
fixpoint kind. This simplifies a common code pattern by making the
connection explicit.
llvm-svn: 367790
Summary:
If the DerefBytesState (and thereby the DerefState) is invalid, we
reached a fixpoint for the whole DerefState as we will not
manifest/provide information then.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65586
llvm-svn: 367789
Modifying other AbstractAttributes to use Liveness AA and skip dead instructions.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential revision: https://reviews.llvm.org/D65243
llvm-svn: 367725
This patch adds support to the WholeProgramDevirt pass to perform
index-based WPD, which is invoked from ThinLTO during the thin link.
The ThinLTO backend (WPD import phase) behaves the same regardless of
whether the WPD decisions were made with the index-based or (the
existing) IR-based analysis.
Depends on D54815.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D55153
llvm-svn: 367679
User of AAReturnedValues need to know if HasOverdefinedReturnedCalls
changed from false to true as it will impact the result of the return
value traversal (calls are not ignored anymore).
This will be tested with the tests in D59978.
llvm-svn: 367581
Summary:
While there is always a `Value::replaceAllUsesWith()`,
sometimes the replacement needs to be conditional.
I have only cleaned a few cases where `replaceUsesWithIf()`
could be used, to both add test coverage,
and show that it is actually useful.
Reviewers: jdoerfert, spatel, RKSimon, craig.topper
Reviewed By: jdoerfert
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, aheejin, george.burgess.iv, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65528
llvm-svn: 367548
Globals that are associated with globals with type metadata need to appear
in the merged module because they will reference the global's section directly.
Differential Revision: https://reviews.llvm.org/D65312
llvm-svn: 367242
Summary:
Deduce dereferenceable attribute in Attributor.
These will be added in a later patch.
* dereferenceable(_or_null)_globally (D61652)
* Deduction based on load instruction (similar to D64258)
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64876
llvm-svn: 366788
Summary: Adds a binding to the internalize pass that allows the caller to pass a function pointer that acts as the visibility-preservation predicate. Previously, one could only pass an unsigned value (not LLVMBool?) that directed the pass to consider "main" or not.
Reviewers: whitequark, deadalnix, harlanhaskins
Reviewed By: whitequark, harlanhaskins
Subscribers: kren1, hiraditya, llvm-commits, harlanhaskins
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62456
llvm-svn: 366777
[Attributor] Liveness analysis.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D64162
llvm-svn: 366769
[Attributor] Liveness analysis.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential revision: https://reviews.llvm.org/D64162
llvm-svn: 366753
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential revision: https://reviews.llvm.org/D64162
llvm-svn: 366736
Porting function return value attribute noalias to attributor.
This will be followed with a patch for callsite and function argumets.
Reviewers: jdoerfert
Subscribers: lebedev.ri, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D63067
llvm-svn: 366728
The bytes inserted before an overaligned global need to be padded according
to the alignment set on the original global in order for the initializer
to meet the global's alignment requirements. The previous implementation
that padded to the pointer width happened to be correct for vtables on most
platforms but may do the wrong thing if the vtable has a larger alignment.
This issue is visible with a prototype implementation of HWASAN for globals,
which will overalign all globals including vtables to 16 bytes.
There is also no padding requirement for the bytes inserted after the global
because they are never read from nor are they significant for alignment
purposes, so stop inserting padding there.
Differential Revision: https://reviews.llvm.org/D65031
llvm-svn: 366725
We were previously ignoring alignment entirely when combining globals
together in this pass. There are two main things that we need to do here:
add additional padding before each global to meet the alignment requirements,
and set the combined global's alignment to the maximum of all of the original
globals' alignments.
Since we now need to calculate layout as we go anyway, use the calculated
layout to produce GlobalLayout instead of using StructLayout.
Differential Revision: https://reviews.llvm.org/D65033
llvm-svn: 366722
Summary:
Deduce the "willreturn" attribute for functions.
For now, intrinsics are not willreturn. More annotation will be done in another patch.
Reviewers: jdoerfert
Subscribers: jvesely, nhaehnle, nicholas, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63046
llvm-svn: 366335
Add "memtag" sanitizer that detects and mitigates stack memory issues
using armv8.5 Memory Tagging Extension.
It is similar in principle to HWASan, which is a software implementation
of the same idea, but there are enough differencies to warrant a new
sanitizer type IMHO. It is also expected to have very different
performance properties.
The new sanitizer does not have a runtime library (it may grow one
later, along with a "debugging" mode). Similar to SafeStack and
StackProtector, the instrumentation pass (in a follow up change) will be
inserted in all cases, but will only affect functions marked with the
new sanitize_memtag attribute.
Reviewers: pcc, hctim, vitalybuka, ostannard
Subscribers: srhines, mehdi_amini, javed.absar, kristof.beyls, hiraditya, cryptoad, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64169
llvm-svn: 366123
There are scenarios where mutually recursive functions may cause the SCC
to contain both read only and write only functions. This removes an
assertion when adding read attributes which caused a crash with a the
provided test case, and instead just doesn't add the attributes.
Patch by Luke Lau <luke.lau@intel.com>
Differential Revision: https://reviews.llvm.org/D60761
llvm-svn: 366090
Attributor::getAAFor will now only return AbstractAttributes with a
valid AbstractState. This simplifies call sites as they only need to
check if the returned pointer is non-null. It also reduces the potential
for accidental misuse.
llvm-svn: 365983
Some function in the Attributor framework are unnecessarily
marked virtual. This patch removes virtual keyword
Reviewers: jdoerfert
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D64637
llvm-svn: 365925
Introduce and deduce "nosync" function attribute to indicate that a function
does not synchronize with another thread in a way that other thread might free memory.
Reviewers: jdoerfert, jfb, nhaehnle, arsenm
Subscribers: wdng, hfinkel, nhaenhle, mehdi_amini, steven_wu,
dexonsmith, arsenm, uenoku, hiraditya, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D62766
llvm-svn: 365830
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.
Most callers had an obvious memory operation handy to provide this type, but a
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.
llvm-svn: 365468
Deduce the "returned" argument attribute by collecting all potentially
returned values.
Not only the unique return value, if any, can be used by subsequent
attributes but also the set of all potentially returned values as well
as the mapping from returned values to return instructions that they
originate from (see AAReturnedValues::checkForallReturnedValues).
Change in statistics (-stats) for LLVM-TS + Spec2006, totaling ~19% more "returned" arguments.
ADDED: attributor NumAttributesManifested n/a -> 637
ADDED: attributor NumAttributesValidFixpoint n/a -> 25545
ADDED: attributor NumFnArgumentReturned n/a -> 637
ADDED: attributor NumFnKnownReturns n/a -> 25545
ADDED: attributor NumFnUniqueReturned n/a -> 14118
CHANGED: deadargelim NumRetValsEliminated 470 -> 449 ( -4.468%)
REMOVED: functionattrs NumReturned 535 -> n/a
CHANGED: indvars NumElimIdentity 138 -> 164 ( +18.841%)
Reviewers: homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames, efriedma, chandlerc
Subscribers: hiraditya, bollu, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D59919
llvm-svn: 365407
This patch adds a function attribute, nofree, to indicate that a function does
not, directly or indirectly, call a memory-deallocation function (e.g., free,
C++'s operator delete).
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D49165
llvm-svn: 365336
It's possible that some function can load and store the same
variable using the same constant expression:
store %Derived* @foo, %Derived** bitcast (%Base** @bar to %Derived**)
%42 = load %Derived*, %Derived** bitcast (%Base** @bar to %Derived**)
The bitcast expression was mistakenly cached while processing loads,
and never examined later when processing store. This caused @bar to
be mistakenly treated as read-only variable. See load-store-caching.ll.
llvm-svn: 365188
This reverts r365040 (git commit 5cacb91475)
Speculatively reverting, since this appears to have broken check-lld on
Linux. Partial analysis in https://crbug.com/981168.
llvm-svn: 365097
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).
Also added are the necessary bitcode records, and the corresponding
assembly support.
The follow-on index-based WPD patch is D55153.
Depends on D53890.
Reviewers: pcc
Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54815
llvm-svn: 364960
This allows later passes (in particular InstCombine) to optimize more
cases.
One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0.
llvm-svn: 364412
Add an AssumptionCache callback to the InlineFuntionInfo used for the
AlwaysInlinerPass to match codegen of the AlwaysInlinerLegacyPass to generate
llvm.assume. This fixes CodeGen/builtin-movdir.c when new PM is enabled by
default.
Differential Revision: https://reviews.llvm.org/D63170
llvm-svn: 363287
@jdoerfert Looks like these are placeholders for incoming abstract attributes patches so I've just commented the code out, even though this is usually frowned upon.
llvm-svn: 362592
NOTE: Note that no attributes are derived yet. This patch will not go in
alone but only with others that derive attributes. The framework is
split for review purposes.
This commit introduces the Attributor pass infrastructure and fixpoint
iteration framework. Further patches will introduce abstract attributes
into this framework.
In a nutshell, the Attributor will update instances of abstract
arguments until a fixpoint, or a "timeout", is reached. Communication
between the Attributor and the abstract attributes that are derived is
restricted to the AbstractState and AbstractAttribute interfaces.
Please see the file comment in Attributor.h for detailed information
including design decisions and typical use case. Also consider the class
documentation for Attributor, AbstractState, and AbstractAttribute.
Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames
Subscribers: mehdi_amini, mgorny, hiraditya, bollu, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59918
llvm-svn: 362578
Summary:
When we import an alias, we do so by making a clone of the aliasee. Just
as this clone uses the original alias name and linkage, it should also
use the same visibility (not the aliasee's visibility). Otherwise,
linker behavior is affected (e.g. if the aliasee was hidden, but the
alias is not, the resulting imported clone should not be hidden,
otherwise the linker will make the final symbol hidden which is
incorrect).
Reviewers: wmi
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62535
llvm-svn: 361989
Summary:
Allow struct fields SRA and dead stores. This works by considering fields accesses from getElementPtr to be considered as a possible pointer root that can be cleaned up.
We check that the variable can be SRA by recursively checking the sub expressions with the new isSafeSubSROAGEP function.
basically this allows the array in following C code to be optimized out
struct Expr {
int a[2];
int b;
};
static struct Expr e;
int foo (int i)
{
e.b = 2;
e.a[i] = 1;
return e.b;
}
Reviewers: greened, bkramer, nicholas, jmolloy
Reviewed By: jmolloy
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61911
llvm-svn: 361460
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D61943
llvm-svn: 361137
Some atomic loads are implemented as cmpxchg (particularly if large or
floating), and that usually requires write access to the memory involved
or it will segfault.
We can still propagate the constant value to users we understand though.
llvm-svn: 360662
Summary:
We hit undefined references building with ThinLTO when one source file
contained explicit instantiations of a template method (weak_odr) but
there were also implicit instantiations in another file (linkonce_odr),
and the latter was the prevailing copy. In this case the symbol was
marked hidden when the prevailing linkonce_odr copy was promoted to
weak_odr. It led to unsats when the resulting shared library was linked
with other code that contained a reference (expecting to be resolved due
to the explicit instantiation).
Add a CanAutoHide flag to the GV summary to allow the thin link to
identify when all copies are eligible for auto-hiding (because they were
all originally linkonce_odr global unnamed addr), and only do the
auto-hide in that case.
Most of the changes here are due to plumbing the new flag through the
bitcode and llvm assembly, and resulting test changes. I augmented the
existing auto-hide test to check for this situation.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, arphaman, dang, llvm-commits, steven_wu, wmi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59709
llvm-svn: 360466
Optimization pass lib/Transforms/IPO/GlobalOpt.cpp needs to insert
DW_OP_deref_size instead of DW_OP_deref to be compatible with big-endian
targets for same reasons as in D59687.
Differential Revision: https://reviews.llvm.org/D60611
llvm-svn: 360013
Summary:
Inalloca parameters require special handling in some optimizations.
This change causes globalopt to strip the inalloca attribute from
function parameters when it is safe to do so, removes the special
handling for inallocas from argpromotion, and replaces it with a
simple check that causes argpromotion to skip functions that receive
inallocas (for when the pass is invoked on code that didn't run
through globalopt first). This also avoids a case where argpromotion
would incorrectly try to pass an inalloca in a register.
Fixes PR41658.
Reviewers: rnk, efriedma
Reviewed By: rnk
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61286
llvm-svn: 359743
Summary:
Match NewPassManager behavior: add option for interleaved loops in the
old pass manager, and use that instead of the flag used to disable loop unroll.
No changes in the defaults.
Reviewers: chandlerc
Subscribers: mehdi_amini, jlebar, dmgreen, hsaito, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61030
llvm-svn: 359615
This change aims at making the file format be compatible with the
way LLVM handles command line options.
Differential Revision: https://reviews.llvm.org/D60970
llvm-svn: 359462
isValidCandidateForColdCC is much more expensive than
TTI.useColdCCForColdCall, which by default just returns false. Avoid
doing this work if we're not going to look at the answer anyway.
This change is NFC, but I see significant compile time improvements on
some code with pathologically many functions.
llvm-svn: 359253
Converting InlineCost interface and its internals into CallBase usage.
Inliners themselves are still not converted.
Reviewed By: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60636
llvm-svn: 358982
Back in August, r340525 introduced a dependency on the assumption
cache tracker in the ipsccp pass, but that commit missed a call to
INITIALIZE_PASS_DEPENDENCY, which leaves the assumption cache
improperly registered if SCCP is the only thing that pulls it in.
llvm-svn: 358903
Summary:
Make the flags in LICM + MemorySSA tuning options in the old and new
pass managers.
Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60490
llvm-svn: 358772
Summary:
Trying to add the plumbing necessary to add tuning options to the new pass manager.
Testing with the flags for loop vectorize.
Reviewers: chandlerc
Subscribers: sanjoy, mehdi_amini, jlebar, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59723
llvm-svn: 358763
removeUsers uses a work list to collect indirect users and call remove()
on those functions. However it has a bug (`if (!Visited.insert(UU).second)`).
Actually, we don't have to collect indirect users.
After the merge of F and G, G's callers will be considered (added to
Deferred). If G's callers can be merged, G's callers' callers will be
considered.
Update the test unnamed-addr-reprocessing.ll to make it clear we can
still merge indirect callers.
llvm-svn: 358741
code to `CallBase`.
This patch focuses on the legacy PM, call graph, and some of inliner and legacy
passes interacting with those APIs from `CallSite` to the new `CallBase` class.
No interesting changes.
Differential Revision: https://reviews.llvm.org/D60412
llvm-svn: 358739
We would previously drop the COMDAT on the thunk we generated when replacing a
function body with the forwarding thunk. This would result in a function that
may have been multiply emitted and multiply merged to be emitted with the same
name without the COMDAT. This is a hard error with PE/COFF where the COMDAT is
used for the deduplication of Value Witness functions for Swift.
llvm-svn: 358728
Prior to this patch, each basic block listed in the extrack-blocks-file
would be extracted to a different function.
This patch adds the support for comma separated list of basic blocks
to form group.
When the region formed by a group is not extractable, e.g., not single
entry, all the blocks of that group are left untouched.
Let us see this new format in action (comments are not part of the
file format):
;; funcName bbName[,bbName...]
foo bb1 ;; Extract bb1 in its own function
foo bb2,bb3 ;; Extract bb2,bb3 in their own function
bar bb1,bb4 ;; Extract bb1,bb4 in their own function
bar bb2 ;; Extract bb2 in its own function
Assuming all regions are extractable, this will create one function and
thus one call per region.
Differential Revision: https://reviews.llvm.org/D60746
llvm-svn: 358701
Summary:
Create a method to forget everything in SCEV.
Add a cl::opt and PassManagerBuilder option to use this in LoopUnroll.
Motivation: Certain Halide applications spend a very long time compiling in forgetLoop, and prefer to forget everything and rebuild SCEV from scratch.
Sample difference in compile time reduction: 21.04 to 14.78 using current ToT release build.
Testcase showcasing this cannot be opensourced and is fairly large.
The option disabled by default, but it may be desirable to enable by
default. Evidence in favor (two difference runs on different days/ToT state):
File Before (s) After (s)
clang-9.bc 7267.91 6639.14
llvm-as.bc 194.12 194.12
llvm-dis.bc 62.50 62.50
opt.bc 1855.85 1857.53
File Before (s) After (s)
clang-9.bc 8588.70 7812.83
llvm-as.bc 196.20 194.78
llvm-dis.bc 61.55 61.97
opt.bc 1739.78 1886.26
Reviewers: sanjoy
Subscribers: mehdi_amini, jlebar, zzheng, javed.absar, dmgreen, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60144
llvm-svn: 358304
Create method `optForNone()` testing for the function level equivalent of
`-O0` and refactor appropriately.
Differential revision: https://reviews.llvm.org/D59852
llvm-svn: 357638
Set the correct debug location on instructions which load arguments in
preparation for a call to an arg-promoted function.
This prevents location cascade from misattributing the line/scope of one
of these loads to the location of the instruction preceding the call.
Differential Revision: https://reviews.llvm.org/D60113
llvm-svn: 357500
Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets.
Reviewers: danielcdh, wmi
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59940
llvm-svn: 357484
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.
Reviewers: fedor.sergeev, majnemer
Reviewed By: majnemer
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60079
llvm-svn: 357483
LTO provides additional opportunities for tailcall elimination due to
link-time inlining and visibility of nocapture attribute. Testing showed
negligible impact on compilation times.
Differential Revision: https://reviews.llvm.org/D58391
llvm-svn: 356511
Summary:
The AliasSummary previously contained the AliaseeGUID, which was only
populated when reading the summary from bitcode. This patch changes it
to instead hold the ValueInfo of the aliasee, and always populates it.
This enables more efficient access to the ValueInfo (specifically in the
recent patch r352438 which needed to perform an index hash table lookup
using the aliasee GUID).
As noted in the comments in AliasSummary, we no longer technically need
to keep a pointer to the corresponding aliasee summary, since it could
be obtained by walking the list of summaries on the ValueInfo looking
for the summary in the same module. However, I am concerned that this
would be inefficient when walking through the index during the thin
link for various analyses. That can be reevaluated in the future.
By always populating this new field, we can remove the guard and special
handling for a 0 aliasee GUID when dumping the dot graph of the summary.
An additional improvement in this patch is when reading the summaries
from LLVM assembly we now set the AliaseeSummary field to the aliasee
summary in that same module, which makes it consistent with the behavior
when reading the summary from bitcode.
Reviewers: pcc, mehdi_amini
Subscribers: inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D57470
llvm-svn: 356268
The basic idea of the pass is to use a circular buffer to log the execution ordering of the functions. We only log the function when it is first executed. We use a 8-byte hash to log the function symbol name.
In this pass, we add three global variables:
(1) an order file buffer: a circular buffer at its own llvm section.
(2) a bitmap for each module: one byte for each function to say if the function is already executed.
(3) a global index to the order file buffer.
At the function prologue, if the function has not been executed (by checking the bitmap), log the function hash, then atomically increase the index.
Differential Revision: https://reviews.llvm.org/D57463
llvm-svn: 355133
Part 2 of CSPGO changes (mostly related to ProfileSummary).
Note that I use a default parameter in setProfileSummary() and getSummary().
This is to break the dependency in clang. I will make the parameter explicit
after changing clang in a separated patch.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 355131
Splitting can make sanitizer errors harder to understand, as the
trapping instruction may not be in the function where the bug was
detected.
rdar://48142697
llvm-svn: 354931
is false.
Right now for inliner and partial inliner, we always pass the address of a
valid ORE object to getInlineCost even if RemarkEnabled is false because of
no -Rpass is specified. Since ComputeFullInlineCost will be set to true if
ORE is non-null in getInlineCost, this introduces the problem that in
getInlineCost we cannot return early even if we already know the cost is
definitely higher than the threshold. It is a general problem for compile
time.
This patch fixes that by pass nullptr as the ORE argument if RemarkEnabled is
false.
Differential Revision: https://reviews.llvm.org/D58399
llvm-svn: 354542
With or without PGO data applied, splitting early in the pipeline
(either before the inliner or shortly after it) regresses performance
across SPEC variants. The cause appears to be that splitting hides
context for subsequent optimizations.
Schedule splitting late again, in effect reversing r352080, which
scheduled the splitting pass early for code size benefits (documented in
https://reviews.llvm.org/D57082).
Differential Revision: https://reviews.llvm.org/D58258
llvm-svn: 354158