Currently, `AANoUndefImpl::initialize` mistakenly always indicates optimistic fixpoint for function returned position.
This is because an associated value is `Function` in the case, and `isGuaranteedNotToBeUndefOrPoison` returns true for Function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86361
When trying to enable -debug-info-kind=constructor there was an assert
that occurs during debug info cloning ("mismatched subprogram between
llvm.dbg.value variable and !dbg attachment").
It appears that during llvm::CloneFunctionInto, a DISubprogram could be
duplicated when MapMetadata is called, and then added to the MD map again
when DIFinder gets a list of subprograms. This results in two different
versions of the DISubprogram.
This patch switches the order so that the DIFinder subprograms are
added before MapMetadata is called.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46784
Differential Revision: https://reviews.llvm.org/D86185
If some of gc live value are not used in gc.relocate we can remove them
from gc-live bundle of statepoint instruction.
Also the CL removes duplicated Values in gc-live bundle.
Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85959
The only def for gc.relocate is a gc.statepoint. But real dependency is not
described by def-use chain. Instead this dependency is encoded by indecies
of operands in gc-live bundle of statepoint as integer constants in gc.relocate.
InstCombine operates by def-use chain. As a result when value in gc-live bundle
is simplified the gc.statepoint itself is not simplified but it might simplify dependent
gc.relocates. To trigger the optimization of gc.relocate we now unconditionally trigger
check of all dependent gc.relocates by adding them to worklist.
This CL handles of gc.relocates as process of gc.statepoint optimization considering
gc.statepoint and related gc.relocate as whole entity.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85954
Recommit the patch after fixing an issue reported caused by the fact
that re-used values are also added to InsertedValues.
Additional tests have been added in 88818491b9
This reverts the revert commit 38884641f2.
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.
Reviewers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D81555
Before we speculatively execute a basic block, query the cost of
inserting the necessary select instructions against the phi folding
threshold. For non-trivial insertions, a more accurate decision can
probably be made during machine if-conversion. With minsize we query
the CodeSize cost, otherwise we use SizeAndLatency.
Differential Revision: https://reviews.llvm.org/D82438
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.
Differential Revision: https://reviews.llvm.org/D85980
The normal scheme for tail folding reductions is to use:
loop:
p = phi(0, a)
mask = ...
x = masked_load(..., mask)
a = add(x, p)
s = select(mask, a, p)
This means we need to keep the register p and a alive out of the loop, plus
the mask. On a target with predicated operations we can instead generate
the phi as p = phi(0, s). This ensures the select in the loop and we can
fold select(m, add(a, b), c) to something like a vaddt c, a, b using the
m predicate. This in turn allows us to tail predicate the entire loop.
Differential Revision: https://reviews.llvm.org/D84741
Currently, although we handle `CallBase` case in updateImpl, we give up in initialize in the case.
That is problematic when we propagate a range from call site returned position to floating position.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86196
When removing a non-constant store to a global in
CleanupPointerRootUsers(), the GlobalOpt pass could incorrectly return
false.
This was caught using the check introduced by D80916.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86149
Relanded since the buildbot issue was unrelated to this commit.
When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.
This was caught using the check introduced by D80916.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86085
This reverts commit dfd447c220.
After I pushed this commit, llvm-sphinx-docs started failing, due to:
Warning, treated as error:
extension 'recommonmark' has no setup() function;
is it really a Sphinx extension module?
I don't see how this commit may have caused that, but I'm still
reverting it since I don't know how to proceed with that
troubleshooting.
When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.
This was caught using the check introduced by D80916.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86085
Comparison against null is a common pattern that usually is followed by
error handling code and the likes. We now use AANonNull to simplify
these comparisons optimistically in order to make more code dead early
on.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D86145
`AADereferenceable::getAssumedDereferenceableBytes()` is actually
deducing `dereferenceable_or_null`. We should not use that information
to deduce `nonnull`, since it doesn't imply `nonnull`.
This commits breaks certain OpenMP codes (on power) because it expanded
the Attributor scope without telling the Attributor about the SCC
extend. See: https://reviews.llvm.org/D85544#2227611
This reverts commit b0b32e6490.
-force-attribute adds an attribute to function via command-line.
However, there was no counter-part to remove an attribute. This patch
adds -force-remove-attribute that removes an attribute from function.
Differential Revision: https://reviews.llvm.org/D85586
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191
But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.
Differential Revision: https://reviews.llvm.org/D86113
D81345 appears to accidentally disables vectorization when explicitly
enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert back to
the vectorization with versioning-for-unit-stride for PGSO.
Differential Revision: https://reviews.llvm.org/D85784
canBeMovedDownwards checks if the "wait" counterpart of __tgt_target_data_begin_mapper can be moved downwards, returning a pointer to the instruction that might require/modify the data transferred, and returning null it the movement is not possible or not worth it. The function splitTargetDataBeginRTC receives that returned instruction and instead of moving the "wait" it creates it at that point.
Differential Revision: https://reviews.llvm.org/D86155
And as being reported by Florian Hahn, there's a hit
in MultiSource/Benchmarks/mafft from the test-suite on X86 with -O3 -flto,
so reverting until addressed.
This reverts commit 71e0b82c9f.
Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and if it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.
Differential Revision: https://reviews.llvm.org/D85544
Now that we no longer require for this map to have stable iteration order,
we no longer need to pay for keeping the iteration order stable,
so switch from `SmallMapVector` to `SmallDenseMap`.
While it may seem like we can just "deduplicate" the case where
some basic block happens to be a predecessor more than once,
which happens for e.g. switches, that is not correct thing to do.
We must actually add a PHI operand for each predecessor.
This was initially reported to me by David Major
as a clang crash during gecko build for android.
This patch introduces a new abstract attribute `AANoUndef` which corresponds to `noundef` IR attribute and deduce them.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85184
Before this change we looked through all memory operations in a function
even if the first was an unknown call that could do anything. This did
cost a lot of time but there is little use to do so. We also avoid
creating AAs for things that we would have looked at in case no other AA
will; that is the reason for the test changes.
Running only the attributor-cgscc pass on a IR version of
`llvm-test-suite/MultiSource/Applications/SPASS/clause.c` reduced the
time we spend in `AAMemoryLocation::update` from 4% total to
0.9% (disclaimer: no accurate measurements).
Before we tired to create a dominator tree for a declaration when we
wanted to determine if the function pointer is `nonnull`. We now avoid
looking at global values if `Value::getPointerDereferenceableBytes` not
already determined `nonnull`.
WIP that tries to hide the latency of runtime calls that involve host to
device memory transfers by splitting them into their "issue" and "wait"
versions. The "issue" is moved upwards as much as possible. The "wait" is
moved downards as much as possible. The "issue" issues the memory transfer
asynchronously, returning a handle. The "wait" waits in the returned
handle for the memory transfer to finish. We still lack of the movement.
While the original implementation added in D85787 / ae7f08812e
is not incorrect, it is known to be suboptimal.
In particular, it is not incorrect to use the basic block
in which the original `insertvalue` instruction is located
as the merge point, that is not necessarily optimal,
as `@test6` shows.
We should look at all the AggElts, and, if they are all defined
in the same basic block, then that is the basic block we should use.
On RawSpeed library, this catches +4% (+50) more cases.
On vanilla LLVM test-suits, this catches +12% (+92) more cases.
In a following patch, UseBB will be detected later,
so capturing it is potentially error-prone (capture by ref vs by val).
Also, parametrized UseBB will likely be needed
for multiple levels of PHI indirections later on anyways.
This is NFC at the moment, because right now we always insert the PHI
into the same basic block in which the original `insertvalue` instruction
is, but that will change.
Also, fixes addition of the suffix to the value names.
isWriteAtEndOfFunction needs to check all memory uses of Def, which is
much more expensive than getting the underlying objects in practice.
Switch the call order, as recommended by the TODO, which was added as
per an earlier review.
This shaves off a bit of compile-time.
Currently the code does not account for the fact that getDomMemoryDef
can be called with ScanLimit == 0, if we reached the limit while
processing an earlier access. Also tighten the check a bit more and bump
the scan limit now that it is handled properly.
In some cases, this brings a 2x speedup in terms of compile-time.
With gcc 6.3.0, I hit the following compilation bug.
../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
};
^
cc1plus: all warnings being treated as errors
The error is introduced by Commit ae7f08812e ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")
This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to common block
that takes extracted `i8*` and `i32` elements (via `phi` nodes),
form a new aggregate, and finally `resume`'s the exception.
The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
said `invoke` should be a `call`.
Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has
higher cost than it could have on unwind branch :S
This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.
I think, this is a missing fold in InstCombine, so i've implemented it.
I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
If it was extracted from the aggregate with identical type,
from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
then we can just use said original source aggregate directly,
instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
we need be PHI-aware - we might have different source aggregate when coming
from each predecessor.
I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.
I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.
On RawSpeed, for example, this has the following effect:
```
| statistic name | baseline | proposed | Δ | % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified | 0 | 1253 | 1253 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 948 | 1355 | 407 | 42.93% | 42.93% |
| instcount.NumInsertValueInst | 4382 | 3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode | 574 | 458 | -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs | 1154 | 921 | -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst | 29017 | 26397 | -2620 | -9.03% | 9.03% |
| instcombine.NumDeadInst | 166618 | 174705 | 8087 | 4.85% | 4.85% |
| instcount.NumPHIInst | 51526 | 50678 | -848 | -1.65% | 1.65% |
| instcount.NumLandingPadInst | 20865 | 20609 | -256 | -1.23% | 1.23% |
| instcount.NumInvokeInst | 34023 | 33675 | -348 | -1.02% | 1.02% |
| simplifycfg.NumSimpl | 113634 | 114708 | 1074 | 0.95% | 0.95% |
| instcombine.NumSunkInst | 15030 | 14930 | -100 | -0.67% | 0.67% |
| instcount.TotalBlocks | 219544 | 219024 | -520 | -0.24% | 0.24% |
| instcombine.NumCombined | 644562 | 645805 | 1243 | 0.19% | 0.19% |
| instcount.TotalInsts | 2139506 | 2135377 | -4129 | -0.19% | 0.19% |
| instcount.NumBrInst | 156988 | 156821 | -167 | -0.11% | 0.11% |
| instcount.NumCallInst | 1206144 | 1207076 | 932 | 0.08% | 0.08% |
| instcount.NumResumeInst | 5193 | 5190 | -3 | -0.06% | 0.06% |
| asm-printer.EmittedInsts | 948580 | 948299 | -281 | -0.03% | 0.03% |
| instcount.TotalFuncs | 11509 | 11507 | -2 | -0.02% | 0.02% |
| inline.NumDeleted | 97595 | 97597 | 2 | 0.00% | 0.00% |
| inline.NumInlined | 210514 | 210522 | 8 | 0.00% | 0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.
On vanilla llvm-test-suite:
```
| statistic name | baseline | proposed | Δ | % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified | 0 | 744 | 744 | 0.00% | 0.00% |
| instcount.NumInsertValueInst | 2705 | 2053 | -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes | 1212 | 1424 | 212 | 17.49% | 17.49% |
| instcount.NumExtractValueInst | 21681 | 20139 | -1542 | -7.11% | 7.11% |
| simplifycfg.NumSinkCommonInstrs | 14575 | 14361 | -214 | -1.47% | 1.47% |
| simplifycfg.NumSinkCommonCode | 6815 | 6743 | -72 | -1.06% | 1.06% |
| instcount.NumLandingPadInst | 14851 | 14712 | -139 | -0.94% | 0.94% |
| instcount.NumInvokeInst | 27510 | 27332 | -178 | -0.65% | 0.65% |
| instcombine.NumDeadInst | 1438173 | 1443371 | 5198 | 0.36% | 0.36% |
| instcount.NumResumeInst | 2880 | 2872 | -8 | -0.28% | 0.28% |
| instcombine.NumSunkInst | 55187 | 55076 | -111 | -0.20% | 0.20% |
| instcount.NumPHIInst | 321366 | 320916 | -450 | -0.14% | 0.14% |
| instcount.TotalBlocks | 886816 | 886493 | -323 | -0.04% | 0.04% |
| instcount.TotalInsts | 7663845 | 7661108 | -2737 | -0.04% | 0.04% |
| simplifycfg.NumSimpl | 886791 | 887171 | 380 | 0.04% | 0.04% |
| instcount.NumCallInst | 553552 | 553733 | 181 | 0.03% | 0.03% |
| instcombine.NumCombined | 3200512 | 3201202 | 690 | 0.02% | 0.02% |
| instcount.NumBrInst | 741794 | 741656 | -138 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 14443 | 14445 | 2 | 0.01% | 0.01% |
| asm-printer.EmittedInsts | 7978085 | 7977916 | -169 | 0.00% | 0.00% |
| inline.NumDeleted | 73188 | 73189 | 1 | 0.00% | 0.00% |
| inline.NumInlined | 291959 | 291968 | 9 | 0.00% | 0.00% |
```
Roughly similar effect, less instructions and blocks total.
See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.
Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions
And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text
See https://bugs.llvm.org/show_bug.cgi?id=47060
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D85787
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.
This is a resubmit of https://reviews.llvm.org/D83743
When adding elements when iterating, the iterator will become
valid, which could cause errors. This fixes the issue by using
indexes instead of iterator.
This patch internalize non-exact functions and replaces of their uses
with the internalized version. Doing this enables the analysis of
non-exact functions.
We can do this because some non-exact functions with the same name
whose linkage is `linkonce_odr` or `weak_odr` should have the same
semantics, so we can safely internalize and replace use of them (the
result of the other version of this function should be the same.).
Note that not all functions can be internalized, e.g., function with
`linkonce` or `weak` linkage.
For now when specified in commandline, we internalize all functions
that meet the requirements without calculating the cost of such
internalzation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84167
This reverts commit 6dbf0cfcf7.
That commit caused failed assertions, e.g. like this:
$ cat sprintf-strcpy.c
char *ptr; void func(void) { ptr += sprintf(ptr, "%s", ""); }
$ clang -c sprintf-strcpy.c -O2 -target x86_64-linux-gnu
clang: ../lib/IR/Value.cpp:473: void llvm::Value::doRAUW(llvm::Value*,
llvm::Value::ReplaceMetadataUses): Assertion `New->getType() ==
getType() && "replaceAllUses of value with new value of different
type!"' failed.
Transformation creates big strings for big C values, so bail out for C > 128.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86004
This would be a problem if the entire instrumented function was a call
to
e.g. memcpy
Use FnPrologueEnd Instruction* instead of ActualFnStart BB*
Differential Revision: https://reviews.llvm.org/D86001
This allows us to add addtional instrumentation before the function start,
without splitting the first BB.
Differential Revision: https://reviews.llvm.org/D85985
Have the front-end use the `nounwind` attribute on atomic libcalls.
This prevents us from seeing `invoke __atomic_load` in MSAN, which
is problematic as it has no successor for instrumentation to be added.
When getUserCost was transitioned to use an explicit CostKind,
TCK_CodeSize was used even though the original kind was implicitly
SizeAndLatency so restore this behaviour. We now only query for
CodeSize when optimising for minsize.
I expect this to not change anything as, I think all, targets will
currently return the same value for CodeSize and SizeLatency. Indeed
I see no changes in the test suite for Arm, AArch64 and X86.
Differential Revision: https://reviews.llvm.org/D85829
This lets us support the scenario where a binary is linked from a mix
of object files with both instrumented and non-instrumented globals.
This is likely to occur on Android where the decision of whether to use
instrumented globals is based on the API level, which is user-facing.
Previously, in this scenario, it was possible for the comdat from
one of the object files with non-instrumented globals to be selected,
and since this comdat did not contain the note it would mean that the
note would be missing in the linked binary and the globals' shadow
memory would be left uninitialized, leading to a tag mismatch failure
at runtime when accessing one of the instrumented globals.
It is harmless to include the note when targeting a runtime that does
not support instrumenting globals because it will just be ignored.
Differential Revision: https://reviews.llvm.org/D85871
This is a fixup to commit 43bdac2906, to make sure the
address space from the original load pointer is retained in the
vector pointer.
Resolves problem with
Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
due to address space mismatch.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D85912
InstCombine adds users of transformed instruction to working list to
process on the same iteration. However gc.relocate may have a hidden
user (next gc.relocate) which is connected through gc.statepoint intrinsic and
there is no direct def-use chain between them.
In this case if the next gc.relocation is already processed it will not be added
to worklist and will not be able to be processed on the same iteration.
Let's we have the following case:
A = gc.relocate(null)
B = statepoint(A)
C = gc.relocate(B, hidden(A))
If C is already considered then after replacement of A with null, statepoint B
instruction will be added to the queue but not C.
C can be processed only on the next iteration.
If the chain of relocation is pretty long the many iteration may be required.
This change is to reduce the number of iteration to meet the latest changes
related to reducing infinite loop threshold.
This is a quick (not best) fix. In the follow up patches I plan to move gc relocation
handling into statepoint handler. This should also help to remove unused gc live
entries in statepoint bundle.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D75598
When removing instructions from unreachable blocks, and only debug info
intrinsics were removed, InstCombine could incorrectly return a false
Modified status.
This is fixed by making removeAllNonTerminatorAndEHPadInstructions()
also return how many debug info intrinsics that were removed, and take
that into account.
This was caught using the check introduced by D80916.
Reviewed By: majnemer
Differential Revision: https://reviews.llvm.org/D85839
We are re-using tryToMergePartialOverlappingStores, which requires
earlier to domiante Later. In the long run,
tryToMergeParialOverlappingStores should be re-written using MemorySSA.
Fixes PR46513.
This is a retry of rL300977 which was reverted because of infinite loops.
We have fixed all of the known places where that would happen, but there's
still a chance that this patch will cause infinite loops.
This matches the demanded bits behavior in the DAG and should fix:
https://bugs.llvm.org/show_bug.cgi?id=32706
Differential Revision: https://reviews.llvm.org/D32255
These are not correctness issues.
In visitUDivOperand(), if the (potential) divisor is undef, then udiv is
already UB, so it is not incorrect to keep undef as shift amount.
But, that is suboptimal.
We could instead simply drop that select, picking the other operand.
Afterwards, getLogBase2() could assert that there is no undef in divisor.
While x*undef is undef, shift-by-undef is poison,
which we must avoid introducing.
Also log2(iN undef) is *NOT* iN undef, because log2(iN undef) u< N.
See https://bugs.llvm.org/show_bug.cgi?id=47133
Commit 9385aaa848 ("[sancov] Fix PR33732") added zeroext to
__sanitizer_cov_trace(_const)?_cmp[1248] parameters for x86_64 only,
however, it is useful on other targets, in particular, on SystemZ: it
fixes swap-cmp.test.
Therefore, use it on all targets. This is safe: if target ABI does not
require zero extension for a particular parameter, zeroext is simply
ignored. A similar change has been implemeted as part of commit
3bc439bdff ("[MSan] Add instrumentation for SystemZ"), and there were
no problems with it.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D85689
When TTI was updated to use an explicit cost, TCK_CodeSize was used
although the default implicit cost would have been the hand-wavey
cost of size and latency. So, revert back to this behaviour. This is
not expected to have (much) impact on targets since most (all?) of
them return the same value for SizeAndLatency and CodeSize.
When optimising for size, the logic has been changed to query
CodeSize costs instead of SizeAndLatency.
This patch also adds a testing option in the unroller so that
OptSize thresholds can be specified.
Differential Revision: https://reviews.llvm.org/D85723
When visiting load and store instructions in SROA skip scalable vectors.
This is relevant in the implementation of the 'arm_sve_vector_bits'
attribute that is used to define VLS types, where an alloca of a
fixed-length vector could be bitcasted to scalable. See D85128 for more
information.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85725
Based on post-commit discussion in D81766, Hexagon sets this to "0".
I'll see if I can come up with a test, but making the obvious
code fix first to unblock that target.
When turning on -debug-info-kind=constructor we ran into a "fragment covers
entire variable" error during thinlto. The fragment is currently always
emitted if there is no type size, but sometimes the variable has a
forward declared struct type which doesn't have a size.
This changes the code to get the type size from the GlobalVariable instead.
Differential Revision: https://reviews.llvm.org/D85572
Without this patch, we attempt to distribute And over Xor even in
unsafe circumstances like so:
undef & (true ^ true) ==> (undef & true) ^ (undef & true)
and evaluate it to undef instead of false. Note that "true ^ true"
may show up implicitly with one true being part of a PHI node.
This patch fixes the problem by teaching SimplifyUsingDistributiveLaws
to not use undef as part of simplifications.
Reviewers: spatel, aqjune, nikic, lebedev.ri, fhahn, jdoerfert
Differential Revision: https://reviews.llvm.org/D85687
Introduce a helper on Instruction which can be used to update the debug
location after hoisting.
Use this in GVN and LICM, where we were mistakenly introducing new line
0 locations after hoisting (the docs recommend dropping the location in
this case).
For more context, see the discussion in https://reviews.llvm.org/D60913.
Differential Revision: https://reviews.llvm.org/D85670
Adds the binary format goff and the operating system zos to the triple
class. goff is selected as default binary format if zos is choosen as
operating system. No further functionality is added.
Reviewers: efriedma, tahonermann, hubert.reinterpertcast, MaskRay
Reviewed By: efriedma, tahonermann, hubert.reinterpertcast
Differential Revision: https://reviews.llvm.org/D82081
The entries in VectorizableTree are not necessarily ordered by their
position in basic blocks. Collect them and order them by dominance so
later instructions are guaranteed to be visited first. For instructions
in different basic blocks, we only scan to the beginning of the block,
so their order does not matter, as long as all instructions in a basic
block are grouped together. Using dominance ensures a deterministic order.
The modified test case contains an example where we compute a wrong
spill cost (2) without this patch, even though there is no call between
any instruction in the bundle.
This seems to have limited practical impact, .e.g on X86 with a recent
Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006
there are no binary changes.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D82444
SCEVExpander already tracks which instructions have been inserted n
InsertedValues/InsertedPostIncValues. This patch adds an additional
vector to collect the instructions in insertion order. This can then be
used to remove exactly the instructions inserted by the expander.
This replaces ExpandedValuesCleaner, which in some cases might remove
values not inserted by the expander (e.g. if a value was dead before
insertion and is then used during expansion).
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D84327
This patch enables `AAValueSimplify` to use information from `AAPotentialValues`
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85668
for invoke instructions.
We see a warning of "No debug information found in function foo: Function
profile not used" in a case. The function foo is called by an invoke
instruction. It has no debug information because it has attribute((nodebug))
in the definition. It shouldn't have profile instance in the sample profile
but compiler thinks it does, that turns out to be a compiler bug in
findCalleeFunctionSamples. The bug is exposed when sample-profile-merge-inlinee
is enabled recently.
Currently in findCalleeFunctionSamples, CalleeName is unset and is empty for
invoke instruction. For empty CalleeName, findFunctionSamplesAt will treat
the call as an indirect call and will return any inline instance profile at
the same location as the instruction. That leads to a wrong profile being
returned to function foo.
The patch set CalleeName when the instruction is an invoke.
Differential Revision: https://reviews.llvm.org/D85664
A GlobalAlias is an address-taken user of its aliased function.
canRenameComdatFunc has excluded such cases.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D85597
Instructions defined in the original inner loop preheader may depend on
values defined in the outer loop header, but the inner loop header will
become the entry block in the loop nest. Move the instructions from the
preheader to the outer loop header, so we do not break dominance. We
also have to check for unsafe instructions in the preheader. If there
are no unsafe instructions, all instructions should be movable.
Currently we move all instructions except the terminator and rely on
LICM to hoist out invariant instructions later.
Fixes PR45743
Values defined in the outer loop header could be used in the inner loop
latch. In that case, we need to create LCSSA phis for them, because after
interchanging they will be defined in the new inner loop and used in the
new outer loop.
This patch adds noundef to return value and arguments of standard I/O functions.
With this patch, passing undef or poison to the functions becomes undefined
behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++,
passing undef to them was already UB in source.
With this patch, the functions cannot return undef or poison anymore as well.
According to C17 standard, ungetc/ungetwc/fgetpos/ftell can generate unspecified
value; 3.19.3 says unspecified value is a valid value of the relevant type,
and using unspecified value is unspecified behavior, which is not UB, so it
cannot be undef (using undef is UB when e.g. it is used at branch condition).
— The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10).
— The details of the value stored by the fgetpos function (7.21.9.1).
— The details of the value returned by the ftell function for a text stream (7.21.9.4).
In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85345
Making use of undef is not safe if the simplification result is not used
to replace all uses of the result. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.
This patch adds an option to SimplifyQuery to disable the use of undef.
Note that I've only guarded uses if isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84792
Add support for (if enabled) splitting cold functions into a separate section
in order to further boost locality of hot code.
Authored by: rjf (Ruijie Fang)
Reviewed by: hiraditya,rcorcs,vsk
Differential Revision: https://reviews.llvm.org/D85331
This patch was adjusted to match the most basic pattern that starts with an insertelement
(so there's no extract created here). Hopefully, that removes any concern about
interfering with other passes. Ie, the transform should almost always be profitable.
We could make an argument that this could be part of canonicalization, but we
conservatively try not to create vector ops from scalar ops in passes like instcombine.
If the transform is not profitable, the backend should be able to re-scalarize the load.
Differential Revision: https://reviews.llvm.org/D81766
Currently the SCEVExpander tries to re-use existing casts, even if they
are not exactly at the insertion point it was asked to create the cast.
To do so in some case, it creates a new cast at the insertion point and
updates all users to use the new cast.
This behavior is problematic, because it changes the IR outside of the
instructions created during the expansion. Therefore we cannot
completely undo all changes made during expansion.
This re-use should be only an extra optimization, so only using the new
cast in the expanded instructions should not be a correctness issue.
There are many cases equivalent instructions are created during
expansion.
This patch also adjusts findInsertPointAfter to skip instructions
inserted during expansion. This enables re-using existing casts without
the renaming any uses, by picking a better insertion point.
Reviewed By: efriedma, lebedev.ri
Differential Revision: https://reviews.llvm.org/D84399
SimplifyCFG has two main folds for resumes - one when resume is directly
using the landingpad, and the other one where resume is using a PHI node.
While for the first case, we were already correctly ignoring all the
PHI nodes, and both the debug info intrinsics and lifetime intrinsics,
in the PHI-based-one, we weren't ignoring PHI's in the resume block,
and weren't ignoring lifetime intrinsics. That is clearly a bug.
On RawSpeed library, this results in +9.34% (+81) more invoke->call folds,
-0.19% (-39) landing pads, -0.24% (-81) invoke instructions
but +51 call instructions and -132 basic blocks.
Though, the run-time performance impact appears to be within the noise.
This patch adds an optimization that folds select(freeze(icmp eq/ne x, y), x, y)
to x or y.
This was needed to resolve slowdown after D84940 is applied.
I tried to bake this logic into foldSelectInstWithICmp, but it wasn't clear.
This patch conservatively writes the pattern in a separate function,
foldSelectWithFrozenICmp.
The output does not need freeze; https://alive2.llvm.org/ce/z/X49hNE (from @nikic)
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D85533
No verification for pass mangers since it is not needed.
No verification for skipped loop pass since the asserted condition is not used.
Add a BeforeNonSkippedPass callback for this. The callback needs more
inputs than its parameters to work so the callback is added on-the-fly.
Reviewed By: aeubanks, asbirlea
Differential Revision: https://reviews.llvm.org/D84977
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83283
This patch is a follow up of D84733.
If a function has noundef attribute in returned position, instructions that return undef or poison value cause UB.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85178
Negator knows how to do this, but the one-use reasoning is getting
a bit muddy here, we don't really want to increase instruction count,
so we need to both lie that "IsNegation" and have an one-use check
on the outermost LHS value.
Multiplication is commutative, and either of operands can be negative,
so if the RHS is a negated power-of-two, we should try to make it
true power-of-two (which will allow us to turn it into a left-shift),
by trying to sink the negation down into LHS op.
But, we shouldn't re-invent the logic for sinking negation,
let's just use Negator for that.
Tests and original patch by: Simon Pilgrim @RKSimon!
Differential Revision: https://reviews.llvm.org/D85446
Summary:
This patch takes the indices operands of `insertelement`/`insertvalue`
into account while generation of seed elements for `findBuildAggregate()`.
This function has kept the original order of `insert`s before.
Also this patch optimizes `findBuildAggregate()` preventing it from
redundant temporary vector allocations and its multiple reversing.
Fixes llvm.org/pr44067
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83779
This is a simple patch that folds freeze(undef) into a proper constant after inspecting its uses.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84948
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.
In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).
Differential Revision: https://reviews.llvm.org/D75069
This is the last JumpThreading patch for getting the performance numbers shown at
https://reviews.llvm.org/D84940#2184653 .
This patch makes ProcessBlock call ProcessBranchOnPHI when the branch condition
is freeze(phi) as well (originally it calls the function when the condition is
phi only).
Since what ProcessBranchOnPHI does is to duplicate the basic block into
predecessors if profitable, it is still valid when the condition is freeze(phi)
too.
```
p = phi [a, pred1] [b, pred2]
p.fr = freeze p
br p.fr, ...
=>
pred1:
p.fr = freeze a
br p.fr, ...
pred2:
p.fr2 = freeze b
br p.fr2, ...
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85029
We were errneously only doing that for old-style abs/nabs,
but we have no such legality check on the condition of the select.
https://rise4fun.com/Alive/xBHS
MSan removes readnone/readonly and similar attributes from callees,
because after MSan instrumentation those attributes no longer apply.
This change removes the attributes from call sites, as well.
Failing to do this may cause DSE of paramTLS stores before calls to
readonly/readnone functions.
Differential Revision: https://reviews.llvm.org/D85259
This reverts commit e9761688e4. It breaks the build:
```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>'
return ReductionOperations;
```
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.
In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).
Differential Revision: https://reviews.llvm.org/D75069
This was the most obvious regression in
f5df5cd5586ae9cfb2d9e53704dfc76f47aff149.f5df5cd5586ae9cfb2d9e53704dfc76f47aff149
We really don't want to do this if the original/outermost subtraction
isn't a negation, and therefore doesn't go away - just sinking negation
isn't a win. We are actually appear to be missing folds so hoist it.
https://rise4fun.com/Alive/tiVe
This reverts commit ac70b37a00
which reverted commit 8aeb2fe13a
because codegen tests got broken and i needed time to investigate.
This shows some regressions in tests, but they are all around GEP's,
so i'm not really sure how important those are.
https://rise4fun.com/Alive/1Gn
This is a simple patch that makes freeze as a zero-cost instruction, as bitcast already is.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85023
It is technically legal for optimizations to create an alloca that is
used by more than one dbg.declare, if one or both of them are inlined
instances of aliasing variables.
Differential Revision: https://reviews.llvm.org/D85172
This is the last remaining use of ConstantProp, migrate it to InstSimplify in the goal of removing ConstantProp.
Add -hexagon-instsimplify option to enable skipping of instsimplify in
tests that can't handle the extra optimization.
Differential Revision: https://reviews.llvm.org/D85047
If a section is supposed to hold elements of type T, then the
corresponding CreateSecStartEnd()'s Ty parameter represents T*.
Forwarding it to GlobalVariable constructor causes the resulting
GlobalVariable's type to be T*, and its SSA value type to be T**, which
is one indirection too many. This issue is mostly masked by pointer
casts, however, the global variable still gets an incorrect alignment,
which causes SystemZ to choose wrong instructions to access the
section.
This patch tries to improve readability and maintenance
of createVectorizedLoopSkeleton by reorganizing some lines,
updating some of the comments and breaking it up into
smaller logical units.
Reviewed By: pjeeva01
Differential Revision: https://reviews.llvm.org/D83824
Teach SCCP to create notconstant lattice values from inequality
assumes and nonnull metadata, and update getConstant() to make
use of them. Additionally isOverdefined() needs to be changed to
consider notconstant an overdefined value.
Handling inequality branches is delayed until our branch on undef
story in other passes has been improved.
Differential Revision: https://reviews.llvm.org/D83643
As discussed in D84949, this removes the constraint to cast since it does not
cause compile time degradation.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D85188
Compared to the optimized code with branch conditions never frozen,
limiting the type of freeze's operand causes generation of suboptimal code in
some cases.
I would like to suggest removing the constraint, as this patch does.
If the number of freeze instructions becomes significant, this can be revisited.
Differential Revision: https://reviews.llvm.org/D84949
D68041 placed `__profc_`, `__profd_` and (if exists) `__profvp_` in different comdat groups.
There are some issues:
* Cost: one or two additional section headers (`.group` section(s)): 64 or 128 bytes on ELF64.
* `__profc_`, `__profd_` and (if exists) `__profvp_` should be retained or
discarded. Placing them into separate comdat groups is conceptually inferior.
* If the prevailing group does not include `__profvp_` (value profiling not
used) but a non-prevailing group from another translation unit has `__profvp_`
(the function is inlined into another and triggers value profiling), there
will be a stray `__profvp_` if --gc-sections is not enabled.
This has been fixed by 3d6f53018f.
Actually, we can reuse an existing symbol (we choose `__profd_`) as the group
signature to avoid a string in the string table (the sole reason that D68041
could improve code size is that `__profv_` was an otherwise unused symbol which
wasted string table space). This saves one or two section headers.
For a -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_INSTRUMENTED=IR build, `ninja
clang lld`, the patch has saved 10.5MiB (2.2%) for the total .o size.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84723
We might want this if we find out that using of MustExecute analysis is too expensive.
By default we do the analysis because its complexity does not exceed the complexity
of whole loop copying in unswitching. Follow-up for D84925.
Differential Revision: https://reviews.llvm.org/D85001
Reviewed By: asbirlea
Currently, ArgPromotion may leave metadata uses of promoted values,
which will end up in the wrong function, creating invalid IR.
PR33641 fixed this for dead arguments, but it can be also be triggered
arguments with users that are promoted (see the updated test case).
We also have to drop uses to them after promoting them. We need to do
this after dealing with the non-metadata uses, so I also moved the empty
use case to the loop that deals with updating the arguments of the new
function.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D85127
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
Differential Revision: https://reviews.llvm.org/D81682
Freeze always returns a defined value. This also prevents msan from
checking the input shadow, which happened because freeze wasn't
explicitly visited.
Differential Revision: https://reviews.llvm.org/D85040
The 1st try at this (rG2265d01f2a5b) exposed what looks like
unspecified behavior in C/C++ resulting in test variations.
The arguments to BinaryOperator::CreateAnd() were both IRBuilder
function calls, and the order in which they execute determines
the order of the new instructions in the IR. But the order of
function arg evaluation is not fixed by the rules of C/C++, so
depending on compiler config, the test would fail because the
test expected a single fixed ordering of instructions.
Original commit message:
I tried to use m_Deferred() on this, but didn't find
a clean way to do that.
http://bugs.llvm.org/PR46955https://alive2.llvm.org/ce/z/2h6QTq
No widening decisions will be computed for instructions outside the
loop. Do not try to get a widening decision. The load/store will be just
a scalar load, so treating at as normal should be fine I think.
Fixes PR46950.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D85087
This patch makes it possible to handle nonnull attribute violation at callsites in AAUndefinedBehavior.
If null pointer is passed to callee at a callsite and the corresponding argument of callee has nonnull attribute, the behavior of the callee is undefined.
In this patch, violations of argument nonnull attributes is only handled.
But violations of returned nonnull attributes can be handled and I will implement that in a follow-up patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84733
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83283
As mentioned on D70376, LVI can currently cause performance issues
when running under NewPM. The problem is that, unlike the legacy
pass manager, NewPM will not immediately discard the LVI analysis
if the following pass does not need it. This is a problem, because
LVI has a high memory requirement, and mass invalidation of LVI
values is very inefficient. LVI should only be alive during passes
that actively interact with it.
This patch addresses the issue by explicitly abandoning LVI after CVP,
which gets us back to the LegacyPM behavior.
Differential Revision: https://reviews.llvm.org/D84959
Negating the input doesn't matter. I left a FIXME to copy the nsw flag if its present on the neg but not on the abs.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D85055
formLCSSAForInstructions is used by SCEVExpander, which tracks all
inserted instructions including LCSSA phis using asserting value
handles. This means cleanup needs to happen in the caller.
Extend formLCSSAForInstructions to take an optional pointer to a
vector. If this argument is non-nullptr, instead of directly deleting
the phis, add them to the vector, so the caller can process them.
This should address various PPC buildbot failures, including
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/40567
Use IRBuilder instead PHINode::Create. This should not impact the
generated code, but IRBuilder provides a way to register callbacks for
inserted instructions, which is convenient for some users.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D85037
querying getSCEV() for incomplete phis leads to wrong cache value in `ExprToIVMap`,
because incomplete phis may be simplified to same value before get SCEV expression.
Reviewed By: lebedev.ri, mkazantsev
Differential Revision: https://reviews.llvm.org/D77560
Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling.
The reason for this change is that Loop Peeling is no longer only being used by
loop unrolling; Patch D82927 introduces loop peeling with fusion, such that
loops can be modified to have to same trip count, making them legal to be
peeled.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D83056
I found that propagateAttributes was ~23% of a thin link's run time
(almost 4x higher than the second hottest function). The main reason is
that it re-examines a global var each time it is referenced. This
becomes unnecessary once it is marked both non read only and non write
only. I added a set to avoid doing redundant work, which dropped the
runtime of that thin link by almost 15%.
I made a smaller efficiency improvement (no measurable impact) to skip
all summaries for a VI if the first copy is dead. I added an assert to
ensure that all copies are dead if any is. The code in
computeDeadSymbols marks all summaries for a VI as live. There is one
corner case where it was skipping marking an alias as live, that I
fixed. However, since the code earlier marked all copies of a preserved
GUID's VI as live, and each 'visit' marks all copies live, the only case
where this could make a difference is summaries that were marked live
when they were built initially, and that is only a few special compiler
generated symbols and inline assembly symbols, so it likely is never
provoked in practice.
Differential Revision: https://reviews.llvm.org/D84985
A function call can be replicated by optimizations like loop unroll and jump threading and the replicates end up sharing the sample nested callee profile. Therefore when it comes to merging samples for uninlined callees in the sample profile inliner, a callee profile can be merged multiple times which will cause an assert to fire.
This change avoids merging same callee profile for duplicate callsites by filtering out callee profiles with a non-zero head sample count.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D84997