One of the checks has been removed as it seems invalid.
The LoopStep size is almost always 32-bit.
Differential Revision: https://reviews.llvm.org/D75079
This reverts commit 80d0a137a5 and the
follow-on fix in 873c0d0786. It is
causing test failures after a multi-stage clang bootstrap. See
discussion on D73242 and D75201.
Summary:
Fixes an issue that cropped up after the changes in D73242 to delay
the lowering of type tests. LowerTypeTests (LTT) couldn't handle type
tests with a non-string type id (which happens for local vtables, which
we try to promote during the compile step but cannot always do when
there are no exported symbols).
We can simply treat these the same as having an Unknown resolution,
which delays their lowering and still allows such type tests to be used
in subsequent optimizations (e.g. planned usage during ICP). The final
lowering, which simply removes these, handles them fine.
Beefed up an existing ThinLTO test for such unpromoted type ids so that
the internal vtable isn't removed before lower type tests, which would
hide the problem.
Reviewers: evgeny777, pcc
Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, aganea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75201
Summary:
The current peeling implementation bails out in the case of loop nests.
This patch introduces a field in the TargetTransformInfo structure that
certain targets can use to relax the constraint when it's
profitable (disabled by default).
An additional option is also added to enable peeling manually, for
experimentation and testing purposes.
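A minimal sketch of the shape such a TTI knob could take; the struct and field names below are illustrative assumptions, not the actual ones from the patch:
```
// Illustrative sketch only: the exact struct/field names in the patch may
// differ. The idea is a target-controlled knob, off by default.
struct PeelingPreferencesSketch {
  // Targets for which peeling a loop that contains inner loops is
  // profitable can flip this on; generic targets keep it false.
  bool AllowLoopNestsPeeling = false;
};
```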
Reviewers: fhahn, lebedev.ri, xbolva00
Reviewed By: xbolva00
Subscribers: RKSimon, xbolva00, hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D70304
getReductionVars, getInductionVars and getFirstOrderRecurrences were all
being returned from LoopVectorizationLegality as pointers to lists. This
just changes them to be references, cleaning up the interface slightly.
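A minimal sketch of the change with simplified types (the real lists live in LoopVectorizationLegality):
```
#include <map>
// Simplified sketch; not the actual LLVM declarations.
struct LegalitySketch {
  std::map<int, int> Reductions;
  // Before: callers received a pointer and had to assume it was non-null.
  // std::map<int, int> *getReductionVars() { return &Reductions; }
  // After: a reference makes the non-null guarantee explicit.
  std::map<int, int> &getReductionVars() { return Reductions; }
};
```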
Differential Revision: https://reviews.llvm.org/D75448
getFirstInsertionPt's return value must be checked for validity before
casting it to Instruction*. Don't attempt to insert casts after a phi in
a catchswitch block.
Fixes PR45033, introduced in D37832.
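A hedged sketch of the guarded pattern (uses LLVM's BasicBlock API, but is not the exact patch code):
```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// getFirstInsertionPt() returns end() for blocks where no insertion point
// exists (e.g. a catchswitch block), so check before dereferencing.
static Instruction *getSafeInsertionPt(BasicBlock *BB) {
  BasicBlock::iterator IP = BB->getFirstInsertionPt();
  if (IP == BB->end())
    return nullptr; // No valid insertion point; caller must bail out.
  return &*IP;
}
```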
Reviewed By: davidxl, hfinkel
Differential Revision: https://reviews.llvm.org/D75381
Summary:
This patch defines two freeze instructions to have the same value number if they are equivalent.
This is allowed because GVN replaces all uses of a duplicated instruction with the other one.
Partially rewriting uses is not allowed, e.g.:
```
a = freeze(x)
b = freeze(x)
use(a)
use(a)
use(b)
=>
use(a)
use(b) // This is not allowed!
use(b)
```
Reviewers: fhahn, reames, spatel, efriedma
Reviewed By: fhahn
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75398
This removes everything but int_x86_avx512_mask_vcvtph2ps_512, which provides the SAE variant, but even this can use the generic fpext if the rounding control is the default.
Differential Revision: https://reviews.llvm.org/D75162
Try again with an up-to-date version of D69471 (99317124 was a stale
revision).
---
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
Summary:
When -dfsan-event-callbacks is specified, insert a call to
__dfsan_mem_transfer_callback on every memcpy and memmove.
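A hedged sketch of the hook a runtime might define; the label width and parameter types below are inferred from the description, not verified against the patch:
```
#include <cstddef>
// Assumed shapes: dfsan_label's width and the callback's parameters are
// assumptions for illustration.
typedef unsigned short dfsan_label;
extern "C" void __dfsan_mem_transfer_callback(dfsan_label *start,
                                              size_t len) {
  // Invoked for every memcpy/memmove when -dfsan-event-callbacks is on.
}
```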
Reviewers: vitalybuka, kcc, pcc
Reviewed By: kcc
Subscribers: eugenis, hiraditya, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D75386
This change adds an assertion to prevent a tricky bug related to the
recursive approach of building the vectorization tree. The loop below
takes the number of operands directly from the tree entry rather than
from the scalars.
If the entry at this moment turns out to be incomplete (i.e. not all
operands are set yet), then not all the dependencies will be seen by
the scheduler. This can lead to failed scheduling (and thus failed
vectorization) of a perfectly vectorizable tree.
Here is a code example which is likely to fire the assertion:
  for (unsigned i = 0, e = VL0->getNumOperands(); i != e; ++i) {
    ...
    TE->setOperand(i, Operands);
    buildTree_rec(Operands, Depth + 1, ...);
  }
The correct way is a two-step process: first set all operands on the
tree entry, and then recursively process each operand.
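A sketch of that corrected order under the same pseudocode conventions; OperandsAt is a hypothetical helper, and this is an assumed shape, not the exact SLP code:
```
// First pass: complete the tree entry so the scheduler sees every operand.
for (unsigned i = 0, e = VL0->getNumOperands(); i != e; ++i)
  TE->setOperand(i, OperandsAt(i)); // OperandsAt is hypothetical
// Second pass: only recurse once the entry is fully populated.
for (unsigned i = 0, e = VL0->getNumOperands(); i != e; ++i)
  buildTree_rec(OperandsAt(i), Depth + 1, ...);
```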
Differential Revision: https://reviews.llvm.org/D75296
This aims to fix a missed inlining case.
If there's a virtual call in the callee on an alloca (stack allocated
object) in the caller, and the callee is inlined into the caller, the
post-inline cleanup would devirtualize the virtual call; but if the next
iteration of DevirtSCCRepeatedPass doesn't happen (under the new pass
manager it is based on a heuristic to determine whether to reiterate),
we may miss inlining the devirtualized call.
This enables inlining in clang/test/CodeGenCXX/member-function-pointer-calls.cpp.
This is a second commit after a revert
https://reviews.llvm.org/rG4569b3a86f8a4b1b8ad28fe2321f936f9d7ffd43 and a fix
https://reviews.llvm.org/rG41e06ae7ba91.
Differential Revision: https://reviews.llvm.org/D69591
Summary: This fixes the crash that led to the revert of D69591.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75307
This patch deletes some dead code from the SLP vectorizer:
a couple of changes taken out of D57059 to slightly lighten it,
plus one more similar case fixed.
Differential Revision: https://reviews.llvm.org/D75276
Summary:
Final patch in series to fix inlining between functions with different
nobuiltin attributes/options, which was specifically an issue in LTO.
See discussion on D61634 for background.
The prior patch in this series (D67923) enabled per-Function TLI
construction that identified the nobuiltin attributes.
Here I have allowed inlining to proceed if the callee's nobuiltins are a
subset of the caller's nobuiltins, but not in the reverse case, which
should be conservatively correct. This is controlled by a new option,
-inline-caller-superset-nobuiltin, which is enabled by default.
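A hedged sketch of the subset rule; the helper below is illustrative only, with just the option name taken from the description:
```
#include <set>
#include <string>

// Illustrative helper (not the actual LLVM code): inlining is allowed iff
// every builtin the callee disables is also disabled in the caller.
static bool calleeNoBuiltinsAreSubset(const std::set<std::string> &CalleeNB,
                                      const std::set<std::string> &CallerNB) {
  for (const std::string &NB : CalleeNB)
    if (!CallerNB.count(NB))
      return false; // Callee forbids a builtin the caller allows: block.
  return true;
}
```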
Reviewers: hfinkel, gchatelet, chandlerc, davidxl
Subscribers: arsenm, jvesely, nhaehnle, mehdi_amini, eraman, hiraditya, haicheng, dexonsmith, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74162
Summary:
This patch makes EarlyCSE fold equivalent freeze instructions.
Another optimization that I think will be useful is to remove freeze if its operand is used as a branch condition or in an llvm.assume:
```
%c = ...
br i1 %c, label %A, ..
A:
%d = freeze %c ; %d can be optimized to %c because %c cannot be poison or undef (or 'br %c' would be UB otherwise)
```
If it makes sense for EarlyCSE to support this as well, I will make a patch for it.
Reviewers: spatel, reames, lebedev.ri
Reviewed By: lebedev.ri
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75334
SROA will drop the explicit alignment on allocas when the ABI guarantees
enough alignment. Because the alignment on new load/store instructions
is set based on the alloca's alignment, SROA would end up
dropping the alignment from atomic loads and stores, which is not
allowed (see bug). For those, make sure to always carry over the
alignment from the previous instruction.
Differential revision: https://reviews.llvm.org/D75266
Summary:
For now just insert the callback for stores, similar to how MSan tracks
origins. In the future we may want to add callbacks for loads, memcpy,
function calls, CMPs, etc.
Reviewers: pcc, vitalybuka, kcc, eugenis
Reviewed By: vitalybuka, kcc, eugenis
Subscribers: eugenis, hiraditya, #sanitizers, llvm-commits, kcc
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D75312
DSE would mistakenly remove store (2):
a = calloc(n+1)
for (int i = 0; i < n; i++) {
  store 1, a[i+1] // (1)
  store 0, a[i]   // (2)
}
The fix is to do PHI translation while looking for clobbering
instructions between the store and the calloc.
Reviewed By: efriedma, bjope
Differential Revision: https://reviews.llvm.org/D68006
When InstCombine initially populates the worklist, it already
performs constant folding and DCE. However, as the instructions
are initially visited in program order, this DCE can pick up only
the last instruction of a dead chain; the rest would only get
picked up in the main InstCombine run.
To avoid this, we instead perform the DCE in a separate pass over the
collected instructions in reverse order, which will allow us to
pick up full dead instruction chains. We already need to do this
reverse iteration anyway to populate the worklist, so this
shouldn't add extra cost.
This by itself only fixes a small part of the problem though:
The same basic issue also applies during the main InstCombine loop.
We generally always want DCE to occur as early as possible,
because it will allow one-use folds to happen. Address this by also
performing DCE while adding deferred instructions to the main worklist.
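A hedged sketch of the reverse-order DCE idea (simplified; not the actual InstCombine populate loop):
```
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;

// Visiting in reverse program order erases users before their operands,
// so an entire dead chain disappears in a single pass.
static void dceInReverse(SmallVectorImpl<Instruction *> &Collected) {
  for (Instruction *I : llvm::reverse(Collected))
    if (isInstructionTriviallyDead(I))
      I->eraseFromParent();
}
```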
This drops the number of tests that perform more than 2 InstCombine
iterations from ~80 to ~40. There are some spurious test changes due
to operand order / icmp toggling.
Differential Revision: https://reviews.llvm.org/D75008
Use UnaryOperator::CreateFNeg instead.
Summary:
With the introduction of the native fneg instruction, the
fsub -0.0, %x idiom is obsolete. This patch makes LLVM
emit fneg instead of the idiom in all places.
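A minimal sketch of the replacement using the LLVM C++ API (simplified, not the full patch):
```
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// Before, BinaryOperator::CreateFNeg produced the 'fsub -0.0, %x' idiom;
// with the native instruction we build a unary fneg instead. The created
// instruction still needs to be inserted, e.g. via an IRBuilder.
static UnaryOperator *buildFNeg(Value *X) {
  return UnaryOperator::CreateFNeg(X, "neg");
}
```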
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D75130
A recent commit
(https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed
the cost for calls to functions that have a vector version for some
vectorization factor. However, no check is performed for whether the
vectorization factor matches the current one being cost modeled. This leads to
attempts to widen call instructions to a vectorization factor for which such a
function does not exist, which in turn leads to an assertion failure.
This patch adds the check for vectorization factor (i.e. not just that the
called function has a vector version for some VF, but that it has a vector
version for this VF).
Differential revision: https://reviews.llvm.org/D74944
As suggested in D75145 -
I'm not sure why, but several passes have this kind of disable/enable flag
implemented at the pass manager level. But that means we have to duplicate
the flag for both pass managers and add code to check the flag every time
the pass appears in the pipeline.
We want a debug option to see if this pass is misbehaving regardless of the
pass managers, so just add a disablement check at the single point before
any transforms run.
Differential Revision: https://reviews.llvm.org/D75204
CVP currently does not simplify cmps with instructions in the same
block, because LVI getPredicateAt() currently does not provide
much useful information for that case (D69686 would change that,
but is stuck.) However, if the instruction is a Phi node, then
LVI can compute the result of the predicate by threading it into
the predecessor blocks, which allows it simplify some conditions
that nothing else can handle. Relevant code:
6d6a4590c5/llvm/lib/Analysis/LazyValueInfo.cpp (L1904-L1927)
Differential Revision: https://reviews.llvm.org/D72169
InstCombine removes pairs of start+end intrinsics that don't
have anything in between them. Currently this is done by starting
at the start intrinsic and scanning forwards. This patch changes
it to start at the end intrinsic and scan backwards.
The motivation here is as follows: When we process the start
intrinsic, we have not yet looked at the following instructions,
which may still get folded/removed. If they do, we will only be
able to remove the start/end pair on the next iteration. When we
process the end intrinsic, all the instructions before it have
already been visited, and we don't run into this problem.
Differential Revision: https://reviews.llvm.org/D75011
See discussion on PR44792.
This reverts commit 02ce9d8ef5.
It also reverts the follow-up commits
8f46269f0 "[profile] Don't dump counters when forking and don't reset when calling exec** functions"
62c7d8402 "[profile] gcov_mutex must be static"
Summary:
Loop unswitch hoists branches on loop-invariant conditions. However, if this
condition is poison/undef and the branch wasn't originally reachable, loop
unswitch introduces UB (since the optimized code will branch on poison/undef and
the original one didn't).
We fix this problem by freezing the condition to ensure we don't introduce UB.
We will now transform the following:
while (...) {
  if (C) { A }
  else { B }
}
Into:
C' = freeze(C)
if (C') {
  while (...) { A }
} else {
  while (...) { B }
}
This patch fixes the root cause of the following bug reports (which use the old loop unswitch, but can be reproduced with minor changes in the code and -enable-nontrivial-unswitch):
- https://llvm.org/bugs/show_bug.cgi?id=27506
- https://llvm.org/bugs/show_bug.cgi?id=31652
Reviewers: reames, majnemer, chenli, sanjoy, hfinkel
Reviewed By: reames
Subscribers: hiraditya, jvesely, nhaehnle, filcab, regehr, trentxintong, nlopes, llvm-commits, mzolotukhin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D29015
If we deduplicate OpenMP runtime calls we have multiple `ident_t*` that
represent information like source location. So far, we simply kept the
one used by the replacement call. However, as exposed by PR44893, that
can cause problems if we have stack allocated `ident_t` objects. While
we need to revisit the use of these as well, it is clear that we
eventually want to merge source location information in some way. With
this patch we add the infrastructure to do so but without doing the
actual merge. Instead we pick a global `ident_t` from the replaced
calls, if possible, or create a new one with an unknown location
instead.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74925
We started seeing cases where the ARC optimizer would move retain calls
into loop bodies, causing an imbalance in the number of retain and
release calls. This started after changes were made to delete inert ARC
calls, since the inert calls that used to block code motion are gone.
Fix the bug by setting the CFG hazard flag when visiting a loop header.
rdar://problem/56908836
Summary:
Replacing uses of IV outside of the loop is likely generally useful,
but `rewriteLoopExitValues()` is cautious: if it isn't told to always
perform the replacement and there are hard uses of the IV in the loop,
it doesn't replace them.
In [[ https://bugs.llvm.org/show_bug.cgi?id=44668 | PR44668 ]],
that prevents `-indvars` from replacing uses of induction variable
after the loop, which might be one of the optimization failures
preventing that code from being vectorized.
Instead, now that the cost model is fixed, I believe we should be
a little bit more optimistic, and also perform the replacement
if we believe it is within our budget.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=44668 | PR44668 ]].
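As a hedged C-level illustration (not taken from the bug report) of what exit-value replacement enables:
```
// For any unsigned n, 'i' equals 'n' once the loop exits, so the use of
// 'i' after the loop can be rewritten to 'n', freeing the value from the
// loop and helping later passes (e.g. vectorization).
unsigned count_up(unsigned n) {
  unsigned i = 0;
  while (i < n)
    ++i;
  return i; // rewriteLoopExitValues() can turn this into: return n;
}
```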
Reviewers: reames, mkazantsev, asbirlea, fhahn, skatkov
Reviewed By: mkazantsev
Subscribers: nikic, hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73501
Summary:
In future patches `SCEVExpander::isHighCostExpansionHelper()` will respect the budget allocated by performing TTI cost modelling.
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73705
Summary:
Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73704
This reverts commit 8d22100f66.
There was a functional regression reported (https://bugs.llvm.org/show_bug.cgi?id=44996). I'm not actually sure the patch is wrong, but I don't have time to investigate currently, and this line of work isn't something I'm likely to get back to quickly.
Much like with reassociateShiftAmtsOfTwoSameDirectionShifts(),
as input, we have the following pattern:
icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0
We want to rewrite that as:
icmp eq/ne (and (x shift (Q+K)), y), 0 iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts, so it may now overflow in a smaller bitwidth.
To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, the (Q+K) u< bitwidth(x) check would be bogus.
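A hedged sketch of that safety check using LLVM's APInt (assumed shape, not the exact patch code):
```
#include "llvm/ADT/APInt.h"
using namespace llvm;

// The two shift amounts, added in the narrow shift-amount type, must not
// wrap and must stay below x's bit width, or the combined shift is bogus.
static bool totalShiftIsValid(const APInt &Q, const APInt &K,
                              unsigned XBitWidth) {
  bool Overflow = false;
  APInt Sum = Q.uadd_ov(K, Overflow);
  return !Overflow && Sum.ult(XBitWidth);
}
```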
https://bugs.llvm.org/show_bug.cgi?id=44802
As input, we have the following pattern:
Sh0 (Sh1 X, Q), K
We want to rewrite that as:
Sh x, (Q+K) iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts, so it may now overflow in a smaller bitwidth.
To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, the (Q+K) u< bitwidth(x) check would be bogus.
https://bugs.llvm.org/show_bug.cgi?id=44802
Code duplication (subsequently removed by refactoring) allowed
a logic discrepancy to creep in here.
We were being conservative about creating a vector binop -- but
not a vector cmp -- in the case where a vector op has the same
estimated cost as the scalar op. We want to be more aggressive
here because that can allow other combines based on reduced
instruction count/uses.
We can reverse the transform in DAGCombiner (potentially with a
more accurate cost model) if this causes regressions.
AFAIK, this does not conflict with InstCombine. We have a
scalarize transform there, but it relies on finding a constant
operand or a matching insertelement, so that means it eliminates
an extractelement from the sequence (so we won't have 2 extracts
by the time we get here if InstCombine succeeds).
Differential Revision: https://reviews.llvm.org/D75062
This patch adds bindings to C and Go for
addCoroutinePassesToExtensionPoints, which is used to add coroutine
passes to the correct locations in PassManagerBuilder.
Differential Revision: https://reviews.llvm.org/D51642
Summary:
There is no need to write out gcdas when forking because we can just reset the counters in the parent process.
Let's say a counter is N before the fork; then we fork, and this counter is set to 0 in the child process.
In the parent process, the counter is incremented by P and in the child process it's incremented by C.
When dump is run at exit, the parent process will dump N+P for the given counter and the child process will dump 0+C, so when the gcdas are merged the resulting counter will be N+P+C.
About exec** functions: since the current process is replaced by another one, there is no need to reset the counters; we just write out the gcdas, since the counters would otherwise be definitely lost.
To avoid leaving the lists in a bad state, we just lock them during the fork and the flush (if called explicitly), and lock them when an element is added.
Reviewers: marco-c
Reviewed By: marco-c
Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits, sylvestre.ledru
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74953
This version fixes a buildbot failure caused by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.
This reverts the revert commit
c7fc0e5da6.
Add a map from BasicBlocks to overlap intervals. For partial writes, we
can keep track of those in the IOLs. We only add candidates that are
valid for elimination.
Reviewers: dmgreen, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D73757
Summary:
Blocks in a loop can be in any order as long as the loop header is the
first block in Blocks.
With some orderings of Blocks, cloneLoopWithPreheader would trigger the
assertion in addBasicBlockToLoop.
Example:
define void @test(i64 %N) {
preheader.i:
  br label %header.i
header.i:
  %i = phi i64 [ 0, %preheader.i ], [ %inc.i, %latch.i ]
  br label %header.j
header.j:
  %j = phi i64 [ 0, %header.i ], [ %inc.j, %latch.j ]
  br label %header.k
header.k:
  %k = phi i64 [ 0, %header.j ], [ %inc.k, %latch.k ]
  call void @baz(i64 %i, i64 %j, i64 %k)
  br label %latch.k
latch.k:
  %inc.k = add nsw i64 %k, 1
  %cmp.k = icmp slt i64 %inc.k, %N
  br i1 %cmp.k, label %header.k, label %latch.j
latch.j:
  %inc.j = add nsw i64 %j, 1
  %cmp.j = icmp slt i64 %inc.j, %N
  br i1 %cmp.j, label %header.j, label %latch.i
latch.i:
  %inc.i = add nsw i64 %i, 1
  %cmp.i = icmp slt i64 %inc.i, %N
  br i1 %cmp.i, label %header.i, label %exit.i
exit.i:
  ret void
}
declare void @baz(i64, i64, i64)
If the blocks of loop-i are in the order: header.i, latch.k, header.k,
header.j, latch.j, latch.i,
then cloneLoopWithPreheader would trigger the assertion in
addBasicBlockToLoop:
  assert(contains(SameHeader) && getHeader() == SameHeader->getHeader() &&
         "Incorrect LI specified for this loop!");
As latch.k is in both loop-j and loop-k, it would be set as the header
of both loops after adding latch.k.
If we update loop headers while cloning blocks, then after adding
header.k the header of loop-k would be updated to header.k, while the
header of loop-j stays latch.k.
When adding header.j, SameHeader is loop-k, SameHeader->getHeader() is
header.k, but getHeader() is latch.k, which triggers the assertion.
Reviewers: jdoerfert, Meinersbur, fhahn, kbarton, hfinkel, bmahjour,
etiotto
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74382
This should be the last step in the current cleanup.
Follow-ups should resolve the TODO about cost calc
and enable the more general case where we extract
different elements.
This fixes a small mistake from D72944: The worklist add should
happen before assigning the new operand, not after.
In case an actual replacement happens, the old operand needs to
be added for DCE. If no actual replacement happens, then old/new
are the same, so it doesn't matter.
This drops one iteration from the annotated test case.
Followup to D73919 with another batch of replacements of
setOperand() -> replaceOperand(), to make sure the old
operand gets DCEd right away.
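A simplified sketch of why the helper matters; the worklist method name here is an assumption, and this is not the actual InstCombine helper:
```
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Swap in the new operand and re-queue the old one so that, if this was
// its last use, it is DCEd on the next visit instead of lingering until
// a later iteration.
template <typename WorklistT>
static void replaceOperandSketch(Instruction &I, unsigned OpNum, Value *V,
                                 WorklistT &Worklist) {
  Value *Old = I.getOperand(OpNum);
  I.setOperand(OpNum, V);
  if (auto *OldI = dyn_cast<Instruction>(Old))
    Worklist.add(OldI); // may now be trivially dead
}
```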
Differential Revision: https://reviews.llvm.org/D74932
ToVectorTy is defined and used in multiple places. Hoist it to
VectorUtils.h to avoid duplication and improve re-usability.
Reviewers: rengolin, hsaito, Ayal, gilr, fpetrogalli
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D74959
This changes the SimplifyLibCalls utility to accept an IRBuilderBase,
which allows us to pass through the IRBuilder used by InstCombine.
This will ensure that new instructions get added to the worklist.
The annotated test-case drops from 4 to 2 InstCombine iterations thanks
to this.
To achieve this, I'm adding an IRBuilderBase::OperandBundlesGuard,
which is basically the same as the existing InsertPointGuard and
FastMathFlagsGuard, but for operand bundles. Also add a
setDefaultOperandBundles() method so these can be set outside the
constructor.
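The real OperandBundlesGuard presumably sits alongside InsertPointGuard inside IRBuilderBase; this self-contained sketch only shows the RAII save/restore shape it follows:
```
#include <utility>

// Generic save/restore guard: capture the current value of a setting and
// restore it on scope exit, so a temporary override of, say, the default
// operand bundles cannot leak out of the scope.
template <typename T> class SettingGuardSketch {
  T &Slot;
  T Saved;
public:
  explicit SettingGuardSketch(T &S) : Slot(S), Saved(S) {}
  ~SettingGuardSketch() { Slot = std::move(Saved); }
  SettingGuardSketch(const SettingGuardSketch &) = delete;
  SettingGuardSketch &operator=(const SettingGuardSketch &) = delete;
};
```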
Differential Revision: https://reviews.llvm.org/D74792
Can be used like
-debug-counter=dse-memoryssa-skip=10,dse-memoryssa-counter-count=20
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72147
For tracked globals that are unknown after solving, we expect all
non-store uses to be replaced.
This is a follow-up to f8045b250d, which removed forcedconstant.
We should not mark unknown loads as overdefined, as they either load
from an unknown pointer or an undef global. Restore the original logic
for loads.
Summary:
After updating the cost model in the AMDGPU target (47a5c36b37), the
pass started to ignore some BBs since all of their instructions were
estimated as free.
Reviewers: arsenm, chandlerc, nhaehnle
Reviewed By: nhaehnle
Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74825
We can look through calls with `returned` argument attributes when we
collect subsuming positions. This allows us to get existing attributes
from more places.
We are often interested in an assumed constant and sometimes it has to
be an integer constant. Before, we only looked for the latter; now we
can ask for either.
If we propagate function pointers across function boundaries we can
create new call edges. These need to be represented in the CG if we run
as a CGSCC pass. In the new pass manager that is currently not handled
by the CallGraphUpdater so we need to prevent the situation for now.
We will usually ask for liveness of an argument anyway, so we ended up
lazily creating the attribute regardless. However, that is not always
the case, and even if it is, we should go the eager route here. Various
tests show how this can improve the outcome. One test exposed a problem
with type mismatches between an argument and a call site argument; a fix
is included. For liveness, various more tests were added as well.
If a function pointer is casted into a different type the resulting
expression can be a constant. If so, it can be used multiple times which
cannot be handled by the AbstractCallSite constructor alone. Instead, we
follow the cast expression uses now explicitly during the call site
traversal.
In builds with assertions enabled (!NDEBUG), IndVarSimplify does an
additional query to ScalarEvolution which may change future SCEV queries
since it fills the internal cache differently. The result is actually
only used with the -verify-indvars command line option. We fix the issue
by only calling SE->getBackedgeTakenCount(L) if -verify-indvars is
enabled such that only -verify-indvars shows the behavior, but not debug
builds themselves. Also add a remark to the description of
-verify-indvars about this behavior.
Fixes llvm.org/PR44815
Differential Revision: https://reviews.llvm.org/D74810
InstCombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPrepare via UAddWithOverflow.
This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV
This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D74228
Summary:
Depends on https://reviews.llvm.org/D71900.
The fourth in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements
'coro-cleanup'.
No existing regression tests check the behavior of coro-cleanup on its
own, so this patch adds one. (A test named 'coro-cleanup.ll' exists, but
it relies on the entire coroutines pipeline being run. It's updated to
test the new pass manager in the 5th patch of this series.)
Reviewers: GorNishanov, lewissbaker, chandlerc, junparser, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: wenlei, EricWF, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71901
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly deletes dead
instructions rather than transforming them.
The downside is that Instruction grows from 56 bytes to 64 bytes. The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.
The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.
We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.
Reviewed By: lattner, vsk, fhahn, dexonsmith
Differential Revision: https://reviews.llvm.org/D51664
Fixes https://bugs.llvm.org/show_bug.cgi?id=44922 (caused by 4698bf145d)
ThreadThroughTwoBasicBlocks assumes PredBBBranch is conditional, so the
following code can segfault:
  AddPHINodeEntriesForMappedBlock(PredBBBranch->getSuccessor(1), PredBB, NewBB,
                                  ValueMapping);
We can also allow unconditional PredBB, but the produced code is not
better.
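A hedged sketch of the guard (uses the LLVM API, but is not the exact patch code):
```
#include "llvm/IR/Instructions.h"
using namespace llvm;

// The threading path dereferences getSuccessor(1), so PredBB must end in
// a conditional branch with two successors.
static bool predBranchIsThreadable(BasicBlock *PredBB) {
  auto *PredBBBranch = dyn_cast<BranchInst>(PredBB->getTerminator());
  return PredBBBranch && PredBBBranch->isConditional();
}
```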
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D74747
The index of an ExtractElementInst is not guaranteed to be a
ConstantInt. It can be any integer value. Check explicitly for
ConstantInts.
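A hedged sketch of the explicit check (simplified from what the patch presumably does):
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// The index operand is an arbitrary Value (it may be a variable), so only
// use it as a constant after a successful dyn_cast.
static bool getConstantExtractIndex(ExtractElementInst *EE, uint64_t &Idx) {
  if (auto *CI = dyn_cast<ConstantInt>(EE->getIndexOperand())) {
    Idx = CI->getZExtValue();
    return true;
  }
  return false; // Variable index: bail out instead of assuming ConstantInt.
}
```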
The new test cases illustrate scenarios where we crash without
this patch. I've also added another test case to check the matching
of extractelement vector ops works.
Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D74758
When simplifying demanded bits, we currently only report the
instruction on which SimplifyDemandedBits was called as changed.
However, this is a recursive call, and the actually modified
instruction will usually be further up the chain. Additionally,
all the intermediate instructions should also be revisited,
as additional combines may be possible after the demanded bits
simplification. We fix this by explicitly adding them back to the
worklist.
Differential Revision: https://reviews.llvm.org/D72944