Summary:
Resolve indirect branch target when possible.
This potentially eliminates more basicblocks and result in better evaluation for phi and other things.
Reviewers: davide, efriedma, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30322
llvm-svn: 299830
"PredicatesFoldable" returns false for signed/unsigned mismatched pairs,
so these cases should never exist. We'll default to 'unreachable' on those
predicate combos instead.
Most of what's left in these switches belongs in InstSimplify (and may
already be there), so there's probably more that can be done to reduce
this code.
llvm-svn: 299829
In isUseTriviallyOptimizableToLiveOnEntry, pointsToConstantMemory needs to be
called on the load's pointer operand, not on the result of the load (which
might not even be a pointer).
llvm-svn: 299823
coro-split-after-phi.ll test was flaky due to non-determinism in
the coroutine frame construction that was sorting the spill
vector using a pointer to a def as a part of the key.
The sorting was intended to make sure that spills for the same def
are kept together, however, we populate the vector by processing
defs in order, so the spill entires will end up together anyways.
This change removes spill sorting and restores the determinism
in the test.
llvm-svn: 299809
Summary:
Fix a bug where we were inserting a spill in between the PHIs in the beginning of the block.
Consider this fragment:
```
begin:
%phi1 = phi i32 [ 0, %entry ], [ 2, %alt ]
%phi2 = phi i32 [ 1, %entry ], [ 3, %alt ]
%sp1 = call i8 @llvm.coro.suspend(token none, i1 false)
switch i8 %sp1, label %suspend [i8 0, label %resume
i8 1, label %cleanup]
resume:
call i32 @print(i32 %phi1)
```
Unless we are spilling the argument or result of the invoke, we were always inserting the spill immediately following the instruction.
The fix adds a check that if the spilled instruction is a PHI Node, select an appropriate insert point with `getFirstInsertionPt()` that
skips all the PHI Nodes and EH pads.
Reviewers: majnemer, rnk
Reviewed By: rnk
Subscribers: qcolombet, EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D31799
llvm-svn: 299771
This patch reapplies r298620. The original patch was reverted because of two
issues. First, the patch exposed a bug in InstCombine that caused the Chromium
builds to fail (PR32414). This issue was fixed in r299017. Second, the patch
introduced a bug in the vectorizer's scalars analysis that caused test suite
builds to fail on SystemZ. The scalars analysis was too aggressive and marked a
memory instruction scalar, even though it was going to be vectorized. This
issue has been fixed in the current patch and several new test cases for the
scalars analysis have been added.
llvm-svn: 299770
Summary:
getModRefInfo is meant to answer the question "what impact does this
instruction have on a given memory location" (not even another
instruction).
Long debate on this on IRC comes to the conclusion the answer should be "nothing special".
That is, a noalias volatile store does not affect a memory location
just by being volatile. Note: DSE and GVN and memdep currently
believe this, because memdep just goes behind AA's back after it says
"modref" right now.
see line 635 of memdep. Prior to this patch we would get modref there, then check aliasing,
and if it said noalias, we would continue.
getModRefInfo *already* has this same AA check, it just wasn't being used because volatile was
lumped in with ordering.
(I am separately testing whether this code in memdep is now dead except for the invariant load case)
Reviewers: jyknight, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31726
llvm-svn: 299741
Calling computeKnownBits on the RHS should allows us to recurse one step further. isMask is equivalent to the isPowerOf2(C+1) except in the case where C is all ones. But that was already handled earlier by creating a not which is an Xor with all ones. So this should be fine.
llvm-svn: 299710
This combine is fully handled by SimplifyDemandedInstructionBits as of r299658 where I fixed this code to ensure the Add/Sub had only a single user. Otherwise it would fire and create additional instructions. That fix resulted in an improvement to code generated for tsan which is why I committed it before deleting.
Differential Revision: https://reviews.llvm.org/D31543
llvm-svn: 299704
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299699
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.
Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.
This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.
At this moment it does not work on Gold (as in the globals are never
stripped).
This is a re-land of r298158 rebased on D31358. This time,
asan.module_ctor is put in a comdat as well to avoid quadratic
behavior in Gold.
llvm-svn: 299697
When possible, put ASan ctor/dtor in comdat.
The only reason not to is global registration, which can be
TU-specific. This is not the case when there are no instrumented
globals. This is also limited to ELF targets, because MachO does
not have comdat, and COFF linkers may GC comdat constructors.
The benefit of this is a lot less __asan_init() calls: one per DSO
instead of one per TU. It's also necessary for the upcoming
gc-sections-for-globals change on Linux, where multiple references to
section start symbols trigger quadratic behaviour in gold linker.
This is a rebase of r298756.
llvm-svn: 299696
Create the constructor in the module pass.
This in needed for the GC-friendly globals change, where the constructor can be
put in a comdat in some cases, but we don't know about that in the function
pass.
This is a rebase of r298731 which was reverted due to a false alarm.
llvm-svn: 299695
Summary:
Prior to this while it would delete the dead DIGlobalVariables, it would
leave dead DICompileUnits and everything referenced therefrom. For a bit
bitcode file with thousands of compile units those dead nodes easily
outnumbered the real ones. Clean that up.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D31720
llvm-svn: 299692
memorydefs, not just stores. Along the way, we audit and fixup issues
about how we were tracking memory leaders, and improve the verifier
to notice more memory congruency issues.
llvm-svn: 299682
Summary:
Remove all the caching the clobber walker does, and that the
caching walker does. With the patch to enable storing clobbering
access results for stores, i can find no improvement with the cache
turned on (and a number of degradations, both time and memory, from
the cost of caching. For a large program i have, we do millions of
lookups and inserts with zero hits).
I haven't tried to rename or simplify the walker otherwise yet.
(Appreciate some perf testing on this past my own testing)
Reviewers: george.burgess.iv, davide
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31576
llvm-svn: 299578
There must be some opportunity to refactor big chunks of nearly duplicated code in FoldOrOfICmps / FoldAndOfICmps.
Also, none of this works with vectors, but it should.
llvm-svn: 299568
Fix a bug in ARC contract pass where an iterator that pointed to a
deleted instruction was dereferenced.
It appears that tryToContractReleaseIntoStoreStrong was incorrectly
assuming that a call to objc_retain would not immediately follow a call
to objc_release.
rdar://problem/25276306
llvm-svn: 299507
stores with some fixes.
Summary:
This enables us to cache the clobbering access for stores, despite the
fact that we can't rewrite the use-def chains themselves.
Early testing shows that, after this change, for larger testcases, it
will be a significant net positive (memory and time) to remove the
walker caching.
Reviewers: george.burgess.iv, davide
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31567
llvm-svn: 299486
Currently we only fold with ConstantInt RHS. This generalizes to any Constant RHS.
Differential Revision: https://reviews.llvm.org/D31610
llvm-svn: 299466
This patch optimizes two memory intrinsic operations: memset and memcpy based
on the profiled size of the operation. The high level transformation is like:
mem_op(..., size)
==>
switch (size) {
case s1:
mem_op(..., s1);
goto merge_bb;
case s2:
mem_op(..., s2);
goto merge_bb;
...
default:
mem_op(..., size);
goto merge_bb;
}
merge_bb:
Differential Revision: http://reviews.llvm.org/D28966
llvm-svn: 299446
It turns out that SimplifyDemandedInstructionBits will get called earlier and remove bits from C1 first. Effectively doing (X & (C1&C2)) | C2. So by the time it got to this check there could be no common bits.
I think the DAGCombiner has the same check but its check can be executed because it handles demanded bits later. I'll look at it next.
llvm-svn: 299384
1. Improve enum, function, and variable names.
2. Improve comments.
3. Fix variable capitalization.
4. Run clang-format.
As an existing code comment suggests, this should work with vector types / splat constants too,
so making this look right first will reduce the diffs needed for that change.
llvm-svn: 299365
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.
Differential Revision: https://reviews.llvm.org/D31565
llvm-svn: 299362
The callers have already performed the necessary cast before calling. This allows us to remove a comment that says the instruction must be a BinaryOperator and make it explicit in the argument type.
Had to add a default case to the switch because BinaryOperator::getOpcode() returns a BinaryOps enum.
llvm-svn: 299339
As far as I can tell this combine is fully handled by SimplifyDemandedInstructionBits.
I was only looking at this because it is the only user of APIntOps::isShiftedMask which is itself broken. As demonstrated by r299187. I was going to fix isShiftedMask and needed to make sure we had coverage for the new cases it would expose to this combine. But looks like we can nuke it instead.
Differential Revision: https://reviews.llvm.org/D31543
llvm-svn: 299337
Summary:
Depends on D30928.
This adds support for coercion of stores and memory instructions that do not require insertion to process.
Another few tests down.
I added the relevant tests from rle.ll
Reviewers: davide
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30929
llvm-svn: 299330
Disable bypassing if one of the operands looks like a hash value. Slow
division often occurs in hashtable implementations and fast division is
never taken there because a hash value is extremely unlikely to have
enough upper bits set to zero.
A value is considered to be hash-like if it is produced by
1) XOR operation
2) Multiplication by a constant wider than the shorter type
3) PHI node with all incoming values being hash-like
Differential Revision: https://reviews.llvm.org/D28200
llvm-svn: 299329
Summary:
This enables us to cache the clobbering access for stores, despite the
fact that we can't rewrite the use-def chains themselves.
Early testing shows that, after this change, for larger testcases, it will be a significant net positive (memory and time) to remove the walker caching.
Reviewers: george.burgess.iv, davide
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31567
llvm-svn: 299322
processing the congruence class of the store.
Because we use the stored value of a store as the def, it isn't dead
just because it appears as a def when it comes from a store.
Note: I have not hit any cases with the memory code as it is where
this breaks anything, just because of what memory congruences we
actually allow. In a followup that improves memory congruence,
this bug actually breaks real stuff (but the verifier catches it).
llvm-svn: 299300
This removes a parameter from the routine that was responsible for a lot of the issue. It was a bit count that had to be set to the BitWidth of the APInt and would get passed to getLowBitsSet. This guaranteed the call to getLowBitsSet would create an all ones value. This was then compared to (V | (V-1)). So the only shifted masks we detected had to have the MSB set.
The one in tree user is a transform in InstCombine that never fires due to earlier transforms covering the case better. I've submitted a patch to remove it completely, but for now I've just adapted it to the new interface for isShiftedMask.
llvm-svn: 299273
This way we ensure we immediately revisit the instruction and do any additional optimizations before visiting the users. Otherwise we might visit the users, then the instruction, then users again, then instruction again.
llvm-svn: 299267
A common way to implement nearbyint is by fiddling with the floating
point environment and calling rint. This is used at least by the BSD
libm and musl. As such, canonicalizing the latter to the former will
create infinite loops for libm and generally pessimize performance, at
least when the generic C versions are used.
This change preserves the rint in the libcall translation and also
handles the domain truncation logic, so that rint with float argument
will be reduced to rintf etc.
llvm-svn: 299247
Summary: Currently the VP metadata was dropped when InstCombine converts a call to direct call. This patch converts the VP metadata to branch_weights so that its hotness is recorded.
Reviewers: eraman, davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31344
llvm-svn: 299228
Summary:
Triggered by commit r298620: "[LV] Vectorize GEPs".
If we encounter a vector GEP with scalar arguments, we splat the scalar
into a vector of appropriate size before we scatter the argument.
Reviewers: arsenm, mehdi_amini, bkramer
Reviewed By: arsenm
Subscribers: bjope, mssimpso, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D31416
llvm-svn: 299186
Since there is no sdiv in SCEV, an 'udiv' is a better canonical form than an 'sdiv' as the user of induction variable
Differential Revision: https://reviews.llvm.org/D31488
llvm-svn: 299118
Some of the GEP combines (e.g., descaling) can't handle vector GEPs. We have an
existing check that attempts to bail out if given a vector GEP. However, the
check only tests the GEP's pointer operand. A GEP results in a vector of
pointers if at least one of its operands is vector-typed (e.g., its pointer
operand could be a scalar, but its index could be a vector). We should just
check the type of the GEP itself. This should fix PR32414.
Reference: https://bugs.llvm.org/show_bug.cgi?id=32414
Differential Revision: https://reviews.llvm.org/D31470
llvm-svn: 299017
The vectorizer tries to replace truncations of induction variables with new
induction variables having the smaller type. After r295063, this optimization
was applied to all integer induction variables, including non-primary ones.
When optimizing the truncation of a non-primary induction variable, we still
need to transform the new induction so that it has the correct start value.
This should fix PR32419.
Reference: https://bugs.llvm.org/show_bug.cgi?id=32419
llvm-svn: 298882
Summary:
We are incorrectly folding selects into phi nodes when the incoming value of a phi
node is a constant vector. This optimization is done in `FoldOpIntoPhi` when the
select condition is a phi node with constant incoming values.
Without the fix, we are miscompiling (i.e. incorrectly folding the
select into the phi node) when the vector contains non-zero
elements.
This patch fixes the miscompile and we will correctly fold based on the
select vector operand (see added test cases).
Reviewers: majnemer, sanjoy, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31189
llvm-svn: 298845
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.
The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.
Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results 2.6%
a smaller binary.
Differential Revision: https://reviews.llvm.org/D30333
llvm-svn: 298799
This moves it to the iterator facade utilities giving it full random
access semantics, etc. It can also now be used with standard algorithms
like std::all_of and std::any_of and range adaptors like llvm::reverse.
Also make the semantics of iterating match what every other iterator
uses and forbid decrementing past the begin iterator. This was used as
a hacky way to work around iterator invalidation. However, every
instance trying to do this failed to actually avoid touching invalid
iterators despite the clear documentation that the removed and all
subsequent iterators become invalid including the end iterator. So I've
added a return of the next iterator to removeCase and rewritten the
loops that were doing this to correctly follow the iterator pattern of
either incremneting or removing and assigning fresh values to the
iterator and the end.
In one case we were trying to go backwards to make this cleaner but it
doesn't actually work. I've made that code match the code we use
everywhere else to remove cases as we iterate. This changes the order of
cases in one test output and I moved that test to CHECK-DAG so it
wouldn't care -- the order isn't semantically meaningful anyways.
llvm-svn: 298791
The first thing it did was get the User for the Use to get the instruction back. This requires looking through the Uses for the User using the waymarking walk. That's pretty fast, but its probably still better to just pass the Instruction we already had.
llvm-svn: 298772
When possible, put ASan ctor/dtor in comdat.
The only reason not to is global registration, which can be
TU-specific. This is not the case when there are no instrumented
globals. This is also limited to ELF targets, because MachO does
not have comdat, and COFF linkers may GC comdat constructors.
The benefit of this is a lot less __asan_init() calls: one per DSO
instead of one per TU. It's also necessary for the upcoming
gc-sections-for-globals change on Linux, where multiple references to
section start symbols trigger quadratic behaviour in gold linker.
llvm-svn: 298756
Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes)
LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413
Original change description:
[LV] Vectorize GEPs
This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.
With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.
The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.
Original Differential Revision: https://reviews.llvm.org/D30710
llvm-svn: 298735
Create the constructor in the module pass.
This in needed for the GC-friendly globals change, where the constructor can be
put in a comdat in some cases, but we don't know about that in the function
pass.
llvm-svn: 298731
Summary: Declarations need to be filtered out when counting functions.
Reviewers: eraman
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31336
llvm-svn: 298720
SimplifyDemandedUseBits for Add/Sub already recursed down LHS and RHS for simplifying bits. If that didn't provide any simplifications we fall back to calling computeKnownBits which will recurse again. Instead just take the known bits for LHS and RHS we already have and call into a new function in ValueTracking that can calculate the known bits given the LHS/RHS bits.
llvm-svn: 298711
This prevents crashes when attempting to instrument functions containing
C++ try.
Sanitizer coverage will still fail at runtime when an exception is
thrown through a sancov instrumented function, but that seems marginally
better than what we have now. The full solution is to color the blocks
in LLVM IR and only instrument blocks that have an unambiguous color,
using the appropriate token.
llvm-svn: 298662
Summary: In DeadArgumentElimination, the call instructions will be replaced. We also need to set the prof weights so that function inlining can find the correct profile.
Reviewers: eraman
Reviewed By: eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31143
llvm-svn: 298660
Library functions can have specific semantics that affect the behavior of
certain passes. DSE, for instance, gives special treatment to malloc-ed pointers
but not to pointers returned from an equivalently typed (but differently named)
function.
MetaRenamer ought not to alter program semantics, so library functions must
remain untouched.
Reviewers: mehdi_amini, majnemer, chandlerc, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D31304
llvm-svn: 298659
Summary:
loop unrolling and icp will make the sample profile annotation much harder in the backend. So disable these 2 optimization in the ThinLTO compile phase.
Will add a test in cfe in a separate patch.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D31217
llvm-svn: 298646
Now that we call ShrinkDemandedConstant on the RHS of sub this should be taken care of. This code doesn't trigger on any in tree regressions, but did before ShrinkDemandedConstant was added to the RHS.
llvm-svn: 298644
Summary:
The cumulative size of the bitcode files for a very large application
can be huge, particularly with -g. In a distributed build environment,
all of these files must be sent to the remote build node that performs
the thin link step, and this can exceed size limits.
The thin link actually only needs the summary along with a bitcode
symbol table. Until we have a proper bitcode symbol table, simply
stripping the debug metadata results in significant size reduction.
Add support for an option to additionally emit minimized bitcode
modules, just for use in the thin link step, which for now just strips
all debug metadata. I plan to add a cc1 option so this can be invoked
easily during the compile step.
However, care must be taken to ensure that these minimized thin link
bitcode files produce the same index as with the original bitcode files,
as these original bitcode files will be used in the backends.
Specifically:
1) The module hash used for caching is typically produced by hashing the
written bitcode, and we want to include the hash that would correspond
to the original bitcode file. This is because we want to ensure that
changes in the stripped portions affect caching. Added plumbing to emit
the same module hash in the minimized thin link bitcode file.
2) The module paths in the index are constructed from the module ID of
each thin linked bitcode, and typically is automatically generated from
the input file path. This is the path used for finding the modules to
import from, and obviously we need this to point to the original bitcode
files. Added gold-plugin support to take a suffix replacement during the
thin link that is used to override the identifier on the MemoryBufferRef
constructed from the loaded thin link bitcode file. The assumption is
that the build system can specify that the minimized bitcode file has a
name that is similar but uses a different suffix (e.g. out.thinlink.bc
instead of out.o).
Added various tests to ensure that we get identical index files out of
the thin link step.
Reviewers: mehdi_amini, pcc
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31027
llvm-svn: 298638
This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.
With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.
The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.
Differential Revision: https://reviews.llvm.org/D30710
llvm-svn: 298620
The code for generating scalar base pointers in vectorizeMemoryInstruction is
not needed. We currently scalarize all GEPs and maintain the scalarized values
in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as
that in scalarizeInstruction. The test cases that changed as a result of this
patch changed because we were able to reuse the scalarized GEP that we
previously generated instead of cloning a new one.
Differential Revision: https://reviews.llvm.org/D30587
llvm-svn: 298615
Summary: ThinLTO will annotate the CFG twice. If the branch weight is set by the first annotation, we should not set the branch weight again in the second annotation because the first annotation is more accurate as there is less optimization that could affect debug info accuracy.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: mehdi_amini, aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D31228
llvm-svn: 298602
Pass const qualified summaries into importers and unqualified summaries into
exporters. This lets us const-qualify the summary argument to thinBackend.
Differential Revision: https://reviews.llvm.org/D31230
llvm-svn: 298534
Add a const version of the getTypeIdSummary accessor that avoids
mutating the TypeIdMap.
Differential Revision: https://reviews.llvm.org/D31226
llvm-svn: 298531
insertelement (insertelement X, Y, IdxC1), ScalarC, IdxC2 -->
insertelement (insertelement X, ScalarC, IdxC2), Y, IdxC1
As noted in the code comment and seen in the test changes, the motivation is that by pulling
constant insertion up, we may be able to constant fold some insertelement instructions.
Differential Revision: https://reviews.llvm.org/D31196
llvm-svn: 298520
- First time, during calculation of the cost in InlineCost.cpp
- Second time, during calculation of the cost in Inliner.cpp
This patches fixes this.
Differential Revision: https://reviews.llvm.org/D31137
llvm-svn: 298496
Summary: Subtracts can have constants on the left side, but we don't shrink them based on demanded bits. This patch fixes that to match the right hand side.
Reviewers: davide, majnemer, spatel, sanjoy, hfinkel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31119
llvm-svn: 298478
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.
This fixes PR23277.
Differential Revision: https://reviews.llvm.org/D28494
llvm-svn: 298430
Summary: Because SamplePGO passes will be invoked twice in ThinLTO build: once at compile phase, the other at backend. We want to make sure the IR at the 2nd phase matches the hot part in profile, thus we do not want to inline hot callsites in the first phase.
Reviewers: tejohnson, eraman
Reviewed By: tejohnson
Subscribers: mehdi_amini, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D31201
llvm-svn: 298428
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
Summary: Inliner should update the branch_weights annotation to scale it to proper value.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D30767
llvm-svn: 298270
InstCombine tries to constant fold instruction operands during worklist building, but we don't print that we're doing this.
We also set a change flag here that causes us to rebuild and rerun the worklist one more time even if processing the worklist itself created no additional changes. So in the log I saw two inst combine runs that visited all instructions without printing that anything was changed. I may be submitting another patch to remove the change flag unless I can find some reason why we should be doing that.
Differential Revision: https://reviews.llvm.org/D31091
llvm-svn: 298264
NFCI.
Summary:
This is ground work for the changes to enable coercion in NewGVN.
GVN doesn't care if they end up constant because it eliminates as it goes.
NewGVN cares.
IRBuilder and ConstantFolder deliberately present the same interface,
so we use this to our advantage to templatize our functions to make
them either constant only or not.
Reviewers: davide
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30928
llvm-svn: 298262
Summary: This Idom check seems unnecessary. The immediate children of a node on the Dominator Tree should always be the IDom of its immediate children in this case.
Reviewers: hfinkel, majnemer, dberlin
Reviewed By: dberlin
Subscribers: dberlin, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D26954
llvm-svn: 298232
Summary:
In case we are loading on a phi-load in SimplifyPartiallyRedundantLoad.
Try to phi translate it into incoming values in the predecessors before
we search for available loads.
This needs https://reviews.llvm.org/D30524
Reviewers: davide, sanjoy, efriedma, dberlin, rengolin
Reviewed By: dberlin
Subscribers: junbuml, llvm-commits
Differential Revision: https://reviews.llvm.org/D30543
llvm-svn: 298217
Summary:
iterateOnFunction creates a ReversePostOrderTraversal object which does a post order traversal in its constructor and stores the results in an internal vector. Iteration over it just reads from the internal vector in reverse order.
The GVN code seems to be unaware of this and iterates over ReversePostOrderTraversal object and makes a copy of the vector into a local vector. (I think at one point in time we used a DFS here instead which would have required the local vector).
The net affect of this is that we have two vectors containing the basic block list. As I didn't want to expose the implementation detail of ReversePostOrderTraversal's constructor to GVN, I've changed the code to do an explicit post order traversal storing into the local vector and then reverse iterate over that.
I've also removed the reserve(256) since the ReversePostOrderTraversal wasn't doing that. I can add it back if we thinks it important. Though it seemed weird that it wasn't based on the size of the function.
Reviewers: davide, anemet, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31084
llvm-svn: 298191
When InstCombine calls into SimplifyLibCalls and it createa putChar calls, we don't infer the attributes. And since SimplifyLibCalls doesn't use InstCombine's IRBuilder the calls doesn't end up in the worklist on this iteration of InstCombine. So it gets picked up on the next iteration where it causes an IR change. This of course causes InstCombine to run another iteration.
So this patch just gets the attributes right the first time. We already did this for puts and some other libcalls.
Differential Revision: https://reviews.llvm.org/D31094
llvm-svn: 298171
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.
Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.
This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.
At this moment it does not work on Gold (as in the globals are never
stripped).
Differential Revision: https://reviews.llvm.org/D30121
llvm-svn: 298158
Loop unswitching can be extremely harmful for a SIMT target. In case
if hoisted condition is not uniform a SIMT machine will execute both
clones of a loop sequentially. Therefor LoopUnswitch checks if the
condition is non-divergent.
Since DivergenceAnalysis adds an expensive PostDominatorTree analysis
not needed for non-SIMT targets a new option is added to avoid unneded
analysis initialization. The method getAnalysisUsage is called when
TargetTransformInfo is not yet available and we cannot use it here.
For that reason a new field DivergentTarget is added to PassManagerBuilder
to control the behavior and set this field from a target.
Differential Revision: https://reviews.llvm.org/D30796
llvm-svn: 298104
We were not handling getelemenptr instructions of vector type before.
Since getelemenptr instructions for vector types follow the same rule as
getelementptr instructions for non-vector types, we can just handle them
in the same way.
llvm-svn: 298028
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.
In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D31052
llvm-svn: 298010
[Reapplies r297971 and punting on finding a better API for findDbgValues()]
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.
In the example in the testcase (which was extracted from XNU) there is a sequence of
%4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
%5 = bitcast %struct.entry* %4 to i8*, !dbg !42
%add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
%6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34
When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:
- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic
The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.
rdar://problem/30725338
Differential Revision: https://reviews.llvm.org/D30919
llvm-svn: 297994
As the related tests show, we're not canonicalizing to this form for scalars or vectors yet,
but this solves the immediate problem in:
https://bugs.llvm.org/show_bug.cgi?id=32306
llvm-svn: 297989
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.
In the example in the testcase (which was extracted from XNU) there is a sequence of
%4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
%5 = bitcast %struct.entry* %4 to i8*, !dbg !42
%add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
%6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34
When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:
- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic
The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.
rdar://problem/30725338
Differential Revision: https://reviews.llvm.org/D30919
llvm-svn: 297971
Summary:
The call to canEvaluateZExtd in InstCombiner::visitZExt may
return with BitsToClear == SrcTy->getScalarSizeInBits(), but
there is an assert that BitsToClear should be smaller than
SrcTy->getScalarSizeInBits().
I have a test case that triggers the assert, but it only happens
for my downstream target. I've not been able to trigger it for
any upstream target.
The assert triggered for a piece of code such as this
%shr1 = lshr i16 undef, 15
...
%shr2 = lshr i16 %shr1, 1
%conv = zext i16 %shr2 to i32
Normally the lshr instructions are constant folded before we
visit the zext (that is why it is so hard to reproduce).
The original pattern, before instcombine, is of course a lot more
complicated in my test case. The shift count in the second lshr
is for example determined by the outcome of a PHI instruction.
It seems like other rewrites by instcombine leads up to
the pattern above. And then the zext is pulled from the
worklist, and visited (hitting the assert), before we detect
that the lshr instrucions can be constant folded.
Anyway, since the canEvaluateZExtd may return with BitsToClear
equal to SrcTy->getScalarSizeInBits(), and since the rewrite
that converts the expression type to avoid a zero extend works
also for the case where SrcBitsKept ends up being zero, then
it should be OK to liberate the assert to
assert(BitsToClear <= SrcTy->getScalarSizeInBits() &&
"Unreasonable BitsToClear");
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D30993
llvm-svn: 297952
Summary:
In commit r289548 ([ADCE] Add code to remove dead branches) a redundant loop
nest was accidentally introduced, which implements exactly the same
functionality as has already been available right after. This redundancy has
been found when inspecting the ADCE code in the context of our recent
discussions on post-dominator modeling. This redundant code was also eliminated
by r296535 (which sparked the discussion), but only as part of a larger semantic
change of the post-dominance modeling. As this redundency in [ADCE] is really
just an oversight completely independent of the post-dominance changes under
discussion, we remove this redundancy independently.
Reviewers: dberlin, david2050
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31023
llvm-svn: 297929
This patch adds the value profile support to profile the size parameter of
memory intrinsic calls: memcpy, memcmp, and memmov.
Differential Revision: http://reviews.llvm.org/D28965
llvm-svn: 297897
Summary:
NSIs can be double-counted by different operations in
SelectInstVisitor. Sink the the update to VM_counting mode only.
Also reset the value for each counting operation.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: xur, llvm-commits
Differential Revision: https://reviews.llvm.org/D30999
llvm-svn: 297892
This patch refactors the code for value profile annotation to facilitate
of adding other kind of value profiles.
Differential Revision: http://reviews.llvm.org/D30989
llvm-svn: 297870
Summary:
In SamplePGO, if the profile is collected from non-LTO binary, and used to drive ThinLTO, the indirect call promotion may fail because ThinLTO adjusts local function names to avoid conflicts. There are two places of where the mismatch can happen:
1. thin-link prepends SourceFileName to front of FuncName to build the GUID (GlobalValue::getGlobalIdentifier). Unlike instrumentation FDO, SamplePGO does not use the PGOFuncName scheme and therefore the indirect call target profile data contains a hash of the OriginalName.
2. backend compiler promotes some local functions to global and appends .llvm.{$ModuleHash} to the end of the FuncName to derive PromotedFunctionName
This patch tries at the best effort to find the GUID from the original local function name (in profile), and use that in ICP promotion, and in SamplePGO matching that happens in the backend after importing/inlining:
1. in thin-link, it builds the map from OriginalName to GUID so that when thin-link reads in indirect call target profile (represented by OriginalName), it knows which GUID to import.
2. in backend compiler, if sample profile reader cannot find a profile match for PromotedFunctionName, it will try to find if there is a match for OriginalFunctionName.
3. in backend compiler, we build symbol table entry for OriginalFunctionName and pointer to the same symbol of PromotedFunctionName, so that ICP can find the correct target to promote.
Reviewers: mehdi_amini, tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30754
llvm-svn: 297757
This patch refactors the PHisToFix loop as follows:
- The loop itself now resides in its own method.
- The new method iterates on scalar-loop's header; the PHIsToFix map formerly
propagated as an output parameter and filled during phi widening is removed.
- The code handling reductions is moved into its own method, similar to the
existing fixFirstOrderRecurrence().
Differential Revision: https://reviews.llvm.org/D30755
llvm-svn: 297740
Refactoring Cost Model's selectVectorizationFactor() so that it handles only the
selection of the best VF from a pre-computed range of candidate VF's, extracting
early-exit criteria and the computation of a MaxVF upper-bound to other methods,
all driven by a newly introduced LoopVectorizationPlanner.
Differential Revision: https://reviews.llvm.org/D30653
llvm-svn: 297737
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.
Tests updates:
Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll
The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.
Review: Hal Finkel
https://reviews.llvm.org/D29540
llvm-svn: 297705
This commit is a follow-up on r297580. It fixes the FIXME added temporarily
by that commit to keep the removal of Unroller's specialized version of
scalarizeInstruction() an NFC. See https://reviews.llvm.org/D30715 for details.
llvm-svn: 297610
Unroller's specialized scalarizeInstruction() is mostly duplicating Vectorizer's
variant. OTOH Vectorizer's scalarizeInstruction() already supports the special
case of VF==1 except for avoiding mask-bit extraction in that case. This patch
removes Unroller's specialized version in favor of a unified method.
The only functional difference between the two variants seems to be setting
memcheck metadata for loads and stores only in Vectorizer's variant, which is a
bug in Unroller. To keep this patch an NFC the unified method doesn't set
memcheck metadata for VF==1.
Differential Revision: https://reviews.llvm.org/D30715
llvm-svn: 297580
This reverts r293386, r294027, r294029 and r296411.
Turns out the SLP tree isn't actually a "tree" and we don't handle
accessing the same packet of loads in several different orders well,
causing miscompiles.
Revert until we can fix this properly.
llvm-svn: 297493
It was introduced in:
r296945
WholeProgramDevirt: Implement exporting for single-impl devirtualization.
---------------------
r296939
WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users.
---------------------
Microsoft Visual Studio Community 2015
Version 14.0.23107.0 D14REL
Does not compile that code without additional brackets, showing multiple error like below:
WholeProgramDevirt.cpp(1216): error C2958: the left bracket '[' found at 'c:\access_softek\llvm\lib\transforms\ipo\wholeprogramdevirt.cpp(1216)' was not matched correctly
WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ']' before '}'
WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ';' before '}'
WholeProgramDevirt.cpp(1216): error C2059: syntax error: ']'
llvm-svn: 297451
Summary:
These are the functions used to determine when values of loads can be
extracted from stores, etc, and to perform the necessary insertions to
do this. There are no changes to the functions themselves except
reformatting, and one case where memdep was informed of a removed load
(which was pushed into the caller).
Reviewers: davide
Subscribers: mgorny, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30478
llvm-svn: 297438
Summary:
Similar to SmallPtrSet, this makes find and count work with both const
referneces and const pointers.
Reviewers: dblaikie
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30713
llvm-svn: 297424
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.
This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.
Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.
Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;
void g(const char *);
template <bool K, int N> void f(bool *B, bool *E) {
if (K)
g(str<N>);
if (B == E)
return;
if (*B)
f<true, N + 1>(B + 1, E);
else
f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }
extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```
When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.
What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.
The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.
We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.
I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.
The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
we consider for inlining.
llvm-svn: 297374
Summary:
In a .symver assembler directive like:
.symver name, name2@@nodename
"name2@@nodename" should get the same symbol binding as "name".
While the ELF object writer is updating the symbol binding for .symver
aliases before emitting the object file, not doing so when the module
inline assembly is handled by the RecordStreamer is causing the wrong
behavior in *LTO mode.
E.g. when "name" is global, "name2@@nodename" must also be marked as
global. Otherwise, the symbol is skipped when iterating over the LTO
InputFile symbols (InputFile::Symbol::shouldSkip). So, for example,
when performing any *LTO via the gold-plugin, the versioned symbol
definition is not recorded by the plugin and passed back to the
linker. If the object was in an archive, and there were no other symbols
needed from that object, the object would not be included in the final
link and references to the versioned symbol are undefined.
The llvm-lto2 tests added will give an error about an unused symbol
resolution without the fix.
Reviewers: rafael, pcc
Reviewed By: pcc
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D30485
llvm-svn: 297332