Summary:
This patch adds processing of binary operations when the def of operands are in
the same block (i.e. local processing).
Earlier we bailed out in such cases (the bail out was introduced in rL252032)
because LVI at that time was more precise about context at the end of basic
blocks, which implied local def and use analysis didn't benefit CVP.
Since then we've added support for LVI in presence of assumes and guards. The
test cases added show how local def processing in CVP helps adding more
information to the ashr, sdiv, srem and add operators.
Note: processCmp which suffers from the same problem will
be handled in a later patch.
Reviewers: philip, apilipenko, SjoerdMeijer, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38766
llvm-svn: 315634
This is a follow up for the loop predication change 313981 to support ule, sle latch predicates.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D38177
llvm-svn: 315616
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.
Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.
Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.
Differential Revision: https://reviews.llvm.org/D38406
llvm-svn: 315590
This reverts commit 4e4ee1c507e2707bb3c208e1e1b6551c3015cbf5.
This is failing due to some code that isn't built on MSVC
so I didn't catch. Not immediately obvious how to fix this
at first glance, so I'm reverting for now.
llvm-svn: 315536
There's a lot of misuse of Twine scattered around LLVM. This
ranges in severity from benign (returning a Twine from a function
by value that is just a string literal) to pretty sketchy (storing
a Twine by value in a class). While there are some uses for
copying Twines, most of the very compelling ones are confined
to the Twine class implementation itself, and other uses are
either dubious or easily worked around.
This patch makes Twine's copy constructor private, and fixes up
all callsites.
Differential Revision: https://reviews.llvm.org/D38767
llvm-svn: 315530
parameterized emit() calls
Summary: This is not functional change to adopt new emit() API added in r313691.
Reviewed By: anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38285
llvm-svn: 315476
This patch fixes the miscompile that happens when PRE hoists loads across guards and
other instructions that don't always pass control flow to their successors. PRE is now prohibited
to hoist across such instructions because there is no guarantee that the load standing after such
instruction is still valid before such instruction. For example, a load from under a guard may be
invalid before the guard in the following case:
int array[LEN];
...
guard(0 <= index && index < LEN);
use(array[index]);
Differential Revision: https://reviews.llvm.org/D37460
llvm-svn: 315440
Sinking of unordered atomic load into loop must be disallowed because it turns
a single load into multiple loads. The relevant section of the documentation
is: http://llvm.org/docs/Atomics.html#unordered, specifically the Notes for
Optimizers section. Here is the full text of this section:
> Notes for optimizers
> In terms of the optimizer, this **prohibits any transformation that
> transforms a single load into multiple loads**, transforms a store into
> multiple stores, narrows a store, or stores a value which would not be
> stored otherwise. Some examples of unsafe optimizations are narrowing
> an assignment into a bitfield, rematerializing a load, and turning loads
> and stores into a memcpy call. Reordering unordered operations is safe,
> though, and optimizers should take advantage of that because unordered
> operations are common in languages that need them.
Patch by Daniil Suchkov!
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D38392
llvm-svn: 315438
IRCE should not apply when the safe iteration range is proved to be empty.
In this case we do unneeded job creating pre/post loops and then never
go to the main loop.
This patch makes IRCE not apply to empty safe ranges, adds test for this
situation and also modifies one of existing tests where it used to happen
slightly.
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D38577
llvm-svn: 315437
Summary: In the current implementation, we only have accurate profile count for standalone symbols. For inlined functions, we do not have entry count data because it's not available in LBR. In this patch, we use the first instruction's frequency to estimiate the function's entry count, especially for inlined functions. This may be inaccurate due to debug info in optimized code. However, this is a better estimate than the static 80/20 estimation we have in the current implementation.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D38478
llvm-svn: 315369
Eliminate inttype phi with inttoptr/ptrtoint.
This version fixed a bug in finding the matching
phi -- the order of the incoming blocks may be
different (triggered in self build on Windows).
A new test case is added.
llvm-svn: 315272
There's at least one bug here - this code can fail with vector types (PR34856).
It's also being called for FREM; I'm still trying to understand how that is valid.
llvm-svn: 315127
This appears to be miscompiling Clang, as shown on two Windows bootstrap
bots:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7611http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/6870
Nothing else is in the blame list. Both emit errors on this valid code
in the Windows ucrt headers:
C:\...\ucrt\malloc.h:95:32: error: invalid operands to binary expression ('char *' and 'int')
_Ptr = (char*)_Ptr + _ALLOCA_S_MARKER_SIZE;
~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
I am attempting to reproduce this now.
This reverts r315044
llvm-svn: 315108
Summary: stripPointerCast is not reliably returning the value that's being type-casted. Instead it may look further at function attributes to further propagate the value. Instead of relying on stripPOintercast, the more reliable solution is to directly use the pointer to the promoted direct call.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38603
llvm-svn: 315077
This is a vestige from the GCC-3 days, which disables IPO passes
when set. I don't think anybody actually uses it as there are
several IPO passes which still run with this flag set and
nobody complained/noticed. This reduces the delta between
current and new pass manager and allows us to easily review
the difference when we decide to flip the switch (or audit
which passes should run, FWIW).
llvm-svn: 315043
Summary:
If the extracted region has multiple exported data flows toward the same BB which is not included in the region, correct resotre instructions and PHI nodes won't be generated inside the exitStub. The solution is simply put the restore instructions right after the definition of output values instead of putting in exitStub.
Unittest for this bug is included.
Author: myhsu
Reviewers: chandlerc, davide, lattner, silvas, davidxl, wmi, kuhar
Subscribers: dberlin, kuhar, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D37902
llvm-svn: 315041
It is possible for two modules to define the same set of external
symbols without causing a duplicate symbol error at link time,
as long as each of the symbols is a comdat member. So we cannot
use them as part of a unique id for the module.
Differential Revision: https://reviews.llvm.org/D38602
llvm-svn: 315026
Summary: In SamplePGO, when an indirect call is promoted in the profiled binary, before profile annotation, it will be promoted and inlined. For the original indirect call, the current implementation will not mark VP profile on it. This is an issue when profile becomes stale. This patch annotates VP prof on indirect calls during annotation.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38477
llvm-svn: 315016
The inliner performs some kind of dead code elimination as it goes,
but there are cases that are not really caught by it. We might
at some point consider teaching the inliner about them, but it
is OK for now to run GlobalOpt + GlobalDCE in tandem as their
benefits generally outweight the cost, making the whole pipeline
faster.
This fixes PR34652.
Differential Revision: https://reviews.llvm.org/D38154
llvm-svn: 314997
When ignoring a load that participates in an interleaved group, make sure to
move a cast that needs to sink after it.
Testcase derived from reproducer of PR34743.
Differential Revision: https://reviews.llvm.org/D38338
llvm-svn: 314986
Instead of trying to keep LastWidenRecipe updated after creating each recipe,
have tryToWiden() retrieve the last recipe of the current VPBasicBlock and check
if it's a VPWidenRecipe when attempting to extend its range. This ensures that
such extensions, optimized to maintain the original instruction order, do so
only when the instructions are to maintain their relative order. The latter does
not always hold, e.g., when a cast needs to sink to unravel first order
recurrence (r306884).
Testcase derived from reproducer of PR34711.
Differential Revision: https://reviews.llvm.org/D38339
llvm-svn: 314981
We were using an i1 type and then zero extending to a vector. Instead just create the 0/1 directly as a ConstantInt with the correct type. No need to ask ConstantExpr to zero extend for us.
This bug is a bit tricky to hit because it requires us to visit a zext of an icmp that would normally be simplified to true/false, but that icmp hasnt' been visited yet. In the test case this zext and icmp were created by visiting a udiv and due to worklist ordering we got to the zext first.
Fixes PR34841.
llvm-svn: 314971
Summary: This is to avoid e.g. merging two cheap icmps if the target is not going to expand to something nice later.
Reviewers: dberlin, spatel
Subscribers: davide, nemanjai
Differential Revision: https://reviews.llvm.org/D38232
llvm-svn: 314970
We can support ashr similar to lshr, if we know that none of the shifted in bits are used. In that case SimplifyDemandedBits would normally convert it to lshr. But that conversion doesn't happen if the shift has additional users.
Differential Revision: https://reviews.llvm.org/D38521
llvm-svn: 314945
This is a follow-up to https://reviews.llvm.org/D38138.
I fixed the capitalization of some functions because we're changing those
lines anyway and that helped verify that we weren't accidentally dropping
any options by using default param values.
llvm-svn: 314930
Recommitting r314517 with the fix for handling ConstantExpr.
Original commit message:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing
mode in the target. However, since it doesn't check its actual users, it will
return FREE even in cases where the GEP cannot be folded away as a part of
actual addressing mode. For example, if an user of the GEP is a call
instruction taking the GEP as a parameter, then the GEP may not be folded in
isel.
llvm-svn: 314923
We have found some corner cases connected to range intersection where IRCE makes
a bad thing when the latch condition is unsigned. The fix for that will go as a follow up.
This patch temporarily disables IRCE for unsigned latch conditions until the issue is fixed.
The unsigned latch conditions were introduced to IRCE by rL310027.
Differential Revision: https://reviews.llvm.org/D38529
llvm-svn: 314881
All the buildbots are red, e.g.
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld/builds/2436/
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask' of
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
>
> Reviewed By: Ayal
>
> Subscribers: hans, mzolotukhin
>
> Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314824
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314806
Apparently this works by virtue of the fact that the pointers are pointers to the APInts stored inside of the ConstantInt objects. But I really don't think we should be relying on that.
llvm-svn: 314761
Summary: If the merged instruction is call instruction, we need to set the scope to the closes common scope between 2 locations, otherwise it will cause trouble when the call is getting inlined.
Reviewers: dblaikie, aprantl
Reviewed By: dblaikie, aprantl
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37877
llvm-svn: 314694
Summary: This currently uses ConstantExpr to do its math, but as noted in a TODO it can all be done directly on APInt.
Reviewers: spatel, majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38440
llvm-svn: 314640
And follow-up r314585.
Leads to segfaults. I'll forward reproduction instructions to the patch
author.
Also, for a recommit, still add the original patch description.
Otherwise, it becomes really tedious to find out what a patch actually
does. The fact that it is a recommit with a fix is somewhat secondary.
llvm-svn: 314622
Summary: In SamplePGO ThinLTO compile phase, we will not invoke ICP as it may introduce confusion to the 2nd annotation. This patch extracted that logic and makes it clearer before profile annotation. In the mean time, we need to make function importing process both inlined callsites as well as not promoted indirect callsites.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D38094
llvm-svn: 314619
This patch will eliminate redundant intptr/ptrtoint that pessimizes
analyses such as SCEV, AA and will make optimization passes such
as auto-vectorization more powerful.
Differential revision: http://reviews.llvm.org/D37832
llvm-svn: 314561
When type shrinking reductions, we should insert the truncations and extends at
the end of the loop latch block. Previously, these instructions were inserted
at the end of the loop header block. The difference is only a problem for loops
with predicated instructions (e.g., conditional stores and instructions that
may divide by zero). For these instructions, we create new basic blocks inside
the vectorized loop, which cause the loop header and latch to no longer be the
same block. This should fix PR34687.
Reference: https://bugs.llvm.org/show_bug.cgi?id=34687
llvm-svn: 314542
The type of a SCEVConstant may not match the corresponding LLVM Value.
In this case, we skip the constant folding for now.
TODO: Replace ConstantInt Zero by ConstantPointerNull
llvm-svn: 314531
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if an user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng
Reviewed By: hfinkel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38085
llvm-svn: 314517
JumpThreading now preserves dominance and lazy value information across the
entire pass. The pass manager is also informed of this preservation with
the goal of DT and LVI being recalculated fewer times overall during
compilation.
This change prepares JumpThreading for enhanced opportunities; particularly
those across loop boundaries.
Patch by: Brian Rzycki <b.rzycki@samsung.com>,
Sebastian Pop <s.pop@samsung.com>
Differential revision: https://reviews.llvm.org/D37528
llvm-svn: 314435
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38201
llvm-svn: 314375
This reverts r314017 and similar code added in later commits. It seems to not work for pointer compares and is causing a bot failure for the last several days.
llvm-svn: 314360
reductions.
If both operands of the newly created SelectInst are Undefs the
resulting operation is also Undef, not SelectInst. It may cause crashes
when trying to propagate IR flags because function expects exactly
SelectInst instruction, nothing else.
llvm-svn: 314323
These changes faciliate positive behavior for arithmetic based select
expressions that match its translation criteria, keeping code size gated to
neutral or improved scenarios.
Patch by Michael Berg <michael_c_berg@apple.com>!
Differential Revision: https://reviews.llvm.org/D38263
llvm-svn: 314320
This was intended to be no-functional-change, but it's not - there's a test diff.
So I thought I should stop here and post it as-is to see if this looks like what was expected
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
Notes:
1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried
through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'.
The parameter isn't passed down, so we pick up the default value from the function signature
after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the
'SimplifyCFG' calls.
2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops.
This would theoretically allow us to differentiate the transforms controlled by those params
independently.
3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too.
I just stopped here to minimize the diffs.
4. Similarly, I stopped short of messing with the pass manager layer. I have another question that
could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG
set to true no matter where in the pipeline it's creating SimplifyCFG passes?
// Create an early function pass manager to cleanup the output of the
// frontend.
EarlyFPM.addPass(SimplifyCFGPass());
-->
/// \brief Construct a pass with the default thresholds
/// and switch optimizations.
SimplifyCFGPass::SimplifyCFGPass()
: BonusInstThreshold(UserBonusInstThreshold),
LateSimplifyCFG(true) {} <-- switches get converted to lookup tables and loops may not be in canonical form
If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG'
setting via recursion was masking this bug.
Differential Revision: https://reviews.llvm.org/D38138
llvm-svn: 314308
This patch tries to transform cases like:
for (unsigned i = 0; i < N; i += 2) {
bool c0 = (i & 0x1) == 0;
bool c1 = ((i + 1) & 0x1) == 1;
}
To
for (unsigned i = 0; i < N; i += 2) {
bool c0 = true;
bool c1 = true;
}
This commit also update test/Transforms/IndVarSimplify/replace-srem-by-urem.ll to prevent constant folding.
Differential Revision: https://reviews.llvm.org/D38272
llvm-svn: 314266
Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow . This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.
Reviewers: jlebar
Subscribers: jholewinski, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38265
llvm-svn: 314253
If this transformation succeeds, we're going to remove our dependency on the shift by rewriting the and. So it doesn't matter how many uses the shift has.
This distributes the one use check to other transforms in foldICmpAndConstConst that do need it.
Differential Revision: https://reviews.llvm.org/D38206
llvm-svn: 314233
This is a 2nd attempt at:
https://reviews.llvm.org/rL310055
...which was reverted at rL310123 because of PR34074:
https://bugs.llvm.org/show_bug.cgi?id=34074
In this version, we break out of the inner loop after we successfully merge and kill a pair of stores. In the
earlier rev, we were continuing instead, which meant we could process the invalid info from a now dead store.
Original commit message (authored by Filipe Cabecinhas):
This fixes PR31777.
If both stores' values are ConstantInt, we merge the two stores
(shifting the smaller store appropriately) and replace the earlier (and
larger) store with an updated constant.
In the future we should also support vectors of integers. And maybe
float/double if we can.
Differential Revision: https://reviews.llvm.org/D30703
llvm-svn: 314206
Usually the frontend communicates the size of wchar_t via metadata and
we can optimize wcslen (and possibly other calls in the future). In
cases without the wchar_size metadata we would previously try to guess
the correct size based on the target triple; however this is fragile to
keep up to date and may miss users manually changing the size via flags.
Better be safe and stop guessing and optimizing if the frontend didn't
communicate the size.
Differential Revision: https://reviews.llvm.org/D38106
llvm-svn: 314185
Summary:
Sanitizer blacklist entries currently apply to all sanitizers--there
is no way to specify that an entry should only apply to a specific
sanitizer. This is important for Control Flow Integrity since there are
several different CFI modes that can be enabled at once. For maximum
security, CFI blacklist entries should be scoped to only the specific
CFI mode(s) that entry applies to.
Adding section headers to SpecialCaseLists allows users to specify more
information about list entries, like sanitizer names or other metadata,
like so:
[section1]
fun:*fun1*
[section2|section3]
fun:*fun23*
The section headers are regular expressions. For backwards compatbility,
blacklist entries entered before a section header are put into the '[*]'
section so that blacklists without sections retain the same behavior.
SpecialCaseList has been modified to also accept a section name when
matching against the blacklist. It has also been modified so the
follow-up change to clang can define a derived class that allows
matching sections by SectionMask instead of by string.
Reviewers: pcc, kcc, eugenis, vsk
Reviewed By: eugenis, vsk
Subscribers: vitalybuka, llvm-commits
Differential Revision: https://reviews.llvm.org/D37924
llvm-svn: 314170
All this optimization cares about is knowing how many low bits of LHS is known to be zero and whether that means that the result is 0 or greater than the RHS constant. It doesn't matter where the zeros in the low bits came from. So we don't need to specifically look for an AND. Instead we can use known bits.
Differential Revision: https://reviews.llvm.org/D38195
llvm-svn: 314153
The 1st attempt at this:
https://reviews.llvm.org/rL314117
was reverted at:
https://reviews.llvm.org/rL314118
because of bot fails for clang tests that were checking optimized IR. That should be fixed with:
https://reviews.llvm.org/rL314144
...so try again.
Original commit message:
The transform to convert an extract-of-a-select-of-vectors was added at:
https://reviews.llvm.org/rL194013
And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>
Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.
The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.
The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.
Differential Revision: https://reviews.llvm.org/D38006
llvm-svn: 314147
Since now SCEV can handle 'urem', an 'urem' is a better canonical form than an 'srem' because it has well-defined behavior
This is a follow up of D34598
Differential Revision: https://reviews.llvm.org/D38072
llvm-svn: 314125
The transform to convert an extract-of-a-select-of-vectors was added at:
rL194013
And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>
Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.
The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.
The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.
Differential Revision: https://reviews.llvm.org/D38006
llvm-svn: 314117
The result of the isSignBitCheck isn't used anywhere else and this allows us to share the m_APInt call in the likely case that it isn't a sign bit check.
llvm-svn: 314018
We've found a serious issue with the current implementation of loop predication.
The current implementation relies on SCEV and this turned out to be problematic.
To fix the problem we had to rework the pass substantially. We have had the
reworked implementation in our downstream tree for a while. This is the initial
patch of the series of changes to upstream the new implementation.
For now the transformation is limited to the following case:
* The loop has a single latch with either ult or slt icmp condition.
* The step of the IV used in the latch condition is 1.
* The IV of the latch condition is the same as the post increment IV of the guard condition.
* The guard condition is ult.
See the review or the LoopPredication.cpp header for the details about the
problem and the new implementation.
Reviewed By: sanjoy, mkazantsev
Differential Revision: https://reviews.llvm.org/D37569
llvm-svn: 313981
The fix is to avoid invalidating our insertion point in
replaceDbgDeclare:
Builder.insertDeclare(NewAddress, DIVar, DIExpr, Loc, InsertBefore);
+ if (DII == InsertBefore)
+ InsertBefore = &*std::next(InsertBefore->getIterator());
DII->eraseFromParent();
I had to write a unit tests for this instead of a lit test because the
use list order matters in order to trigger the bug.
The reduced C test case for this was:
void useit(int*);
static inline void inlineme() {
int x[2];
useit(x);
}
void f() {
inlineme();
inlineme();
}
llvm-svn: 313905
.. as well as the two subsequent changes r313826 and r313875.
This leads to segfaults in combination with ASAN. Will forward repro
instructions to the original author (rnk).
llvm-svn: 313876
Summary:
There already was code that tried to remove the dbg.declare, but that code
was placed after we had called
I->replaceAllUsesWith(UndefValue::get(I->getType()));
on the alloca, so when we searched for the relevant dbg.declare, we
couldn't find it.
Now we do the search before we call RAUW so there is a chance to find it.
An existing testcase needed update due to this. Two dbg.declare with undef
were removed and then suddenly one of the two CHECKS failed.
Before this patch we got
call void @llvm.dbg.declare(metadata i24* undef, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
call void @llvm.dbg.declare(metadata %struct.prog_src_register* undef, metadata !14, metadata !DIExpression()), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
and with it we get
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
However, the CHECKs in the testcase checked things in a silly order, so
they only passed since they found things in the first dbg.declare. Now
we changed the order of the checks and the test passes.
Reviewers: rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37900
llvm-svn: 313875
This patch contains fix for reverted commit
rL312318 which was causing failure due to use
of unchecked dyn_cast to CIInit.
Patch by: Nikola Prica.
llvm-svn: 313870
Revert the patch causing the functional failures.
The patch owner is notified with test cases which fail.
Test case has been provided to Maxim offline.
llvm-svn: 313857
I noticed this inefficiency while investigating PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
This fix will likely push another bug (we don't maintain state of 'LateSimplifyCFG')
into hiding, but I'll try to clean that up with a follow-up patch anyway.
llvm-svn: 313829
Summary:
This implements the design discussed on llvm-dev for better tracking of
variables that live in memory through optimizations:
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117222.html
This is tracked as PR34136
llvm.dbg.addr is intended to be produced and used in almost precisely
the same way as llvm.dbg.declare is today, with the exception that it is
control-dependent. That means that dbg.addr should always have a
position in the instruction stream, and it will allow passes that
optimize memory operations on local variables to insert llvm.dbg.value
calls to reflect deleted stores. See SourceLevelDebugging.rst for more
details.
The main drawback to generating DBG_VALUE machine instrs is that they
usually cause LLVM to emit a location list for DW_AT_location. The next
step will be to teach DwarfDebug.cpp how to recognize more DBG_VALUE
ranges as not needing a location list, and possibly start setting
DW_AT_start_offset for variables whose lifetimes begin mid-scope.
Reviewers: aprantl, dblaikie, probinson
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37768
llvm-svn: 313825
We already did (X & C2) > C1 --> (X & C2) != 0, if any bit set in (X & C2) will produce a result greater than C1. But there is an equivalent inverse condition with <= C1 (which will be canonicalized to < C1+1)
Differential Revision: https://reviews.llvm.org/D38065
llvm-svn: 313819
This broke the buildbots, e.g.
http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/391
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask'
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Subscribers: mzolotukhin
>
> Reviewed By: ayal
>
> Differential Revision: https://reviews.llvm.org/D36130
>
> Review comments updated accordingly
>
> Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
>
> Added a TODO for sortLoadAccesses API
>
> Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
>
> Modified the TODO for sortLoadAccesses API
>
> Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
>
> Review comment update for using OpdNum to insert the mask in respective location
>
> Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
>
> Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
>
> Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313781
In these cases, two selects have constant selectable operands for
both the true and false components and have the same conditional
expression.
We then create two arithmetic operations of the same type and feed a
final select operation using the result of the true arithmetic for the true
operand and the result of the false arithmetic for the false operand and reuse
the original conditionl expression.
The arithmetic operations are naturally folded as a consequence, leaving
only the newly formed select to replace the old arithmetic operation.
Patch by: Michael Berg <michael_c_berg@apple.com>
Differential Revision: https://reviews.llvm.org/D37019
llvm-svn: 313774
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask'
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Subscribers: mzolotukhin
Reviewed By: ayal
Differential Revision: https://reviews.llvm.org/D36130
Review comments updated accordingly
Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
Added a TODO for sortLoadAccesses API
Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
Modified the TODO for sortLoadAccesses API
Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
Review comment update for using OpdNum to insert the mask in respective location
Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313771
Summary:
The fix for dead stripping analysis in the case of SamplePGO indirect
calls to local functions (r313151) introduced the possibility of an
infinite loop.
Make sure we check for the value being already live after we update it
for SamplePGO indirect call handling.
Reviewers: danielcdh
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D38086
llvm-svn: 313766
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
Commit after rebase for patch D36130
Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab
llvm-svn: 313736
Summary:
With this change:
- Methods in LoopBase trip an assert if the receiver has been invalidated
- LoopBase::clear frees up the memory held the LoopBase instance
This change also shuffles things around as necessary to work with this stricter invariant.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38055
llvm-svn: 313708
Summary:
See comment for why I think this is a good idea.
This change also:
- Removes an SCEV test case. The SCEV test was not testing anything useful (most of it was `#if 0` ed out) and it would need to be updated to deal with a private ~Loop::Loop.
- Updates the loop pass manager test case to deal with a private ~Loop::Loop.
- Renames markAsRemoved to markAsErased to contrast with removeLoop, via the usual remove vs. erase idiom we already have for instructions and basic blocks.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37996
llvm-svn: 313695
In the lambda we are now returning the remark by value so we need to preserve
its type in the insertion operator. This requires making the insertion
operator generic.
I've also converted a few cases to use the new API. It seems to work pretty
well. See the LoopUnroller for a slightly more interesting case.
llvm-svn: 313691
Summary: In the ThinLTO compilation, if a function is inlined in the profiling binary, we need to inline it before annotation. If the callee is not available in the primary module, a first step is needed to import that callee function. For the current implementation, if the call is an indirect call, which has been promoted to >1 targets and inlined, SamplePGO will only import one target with the largest sample count. This patch fixed the bug to import all targets instead.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36637
llvm-svn: 313678
Summary: Fix the bug when promoted call return type mismatches with the promoted function, we should not try to inline it. Otherwise it may lead to compiler crash.
Reviewers: davidxl, tejohnson, eraman
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38018
llvm-svn: 313658
I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates.
llvm-svn: 313450
CostModel.
The original patch added support for horizontal min/max reductions to
the SLP vectorizer.
This patch causes LLVM to miscompile fairly simple signed min
reductions. I have attached a test progrom to http://llvm.org/PR34635
that shows the behavior change after this patch. We found this in a test
for the open source Eigen library, but also in other code.
Unfortunately, the revert is moderately challenging. It required
reverting:
r313042: [SLP] Test with multiple uses of conditional op and wrong parent.
r312853: [SLP] Fix buildbots, NFC.
r312793: [SLP] Fix the warning about paths not returning the value, NFC.
r312791: [SLP] Support for horizontal min/max reduction.
And even then, I had to completely skip reverting the changes to TTI and
CostModel because r312832 rewrote so much of this code. Plus, the cost
modeling changes aren implicated in the miscompile, so they should be
fine and will just not be used until this gets re-introduced.
llvm-svn: 313409
It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with
command line options.
The diagnostic handler used to be callback now this patch adds a class
DiagnosticHandler. It has virtual method to provide custom diagnostic handler
and methods to control which particular remarks are enabled.
However LLVM-C API users can still provide callback function for diagnostic handler.
llvm-svn: 313390
It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with
command line options.
The diagnostic handler used to be callback now this patch adds a class
DiagnosticHandler. It has virtual method to provide custom diagnostic handler
and methods to control which particular remarks are enabled.
However LLVM-C API users can still provide callback function for diagnostic handler.
llvm-svn: 313382
Add a profitability heuristic to enable runtime unrolling of multi-exit
loop: There can be atmost two unique exit blocks for the loop and the
second exit block should be a deoptimizing block. Also, there can be one
other exiting block other than the latch exiting block. The reason for
the latter is so that we limit the number of branches in the unrolled
code to being at most the unroll factor. Deoptimizing blocks are rarely
taken so these additional number of branches created due to the
unrolling are predictable, since one of their target is the deopt block.
Reviewers: apilipenko, reames, evstupac, mkuper
Subscribers: llvm-commits
Reviewed by: reames
Differential Revision: https://reviews.llvm.org/D35380
llvm-svn: 313363
During runtime unrolling on loops with multiple exits, we update the
exit blocks with the correct phi values from both original and remainder
loop.
In this process, we lookup the VMap for the mapped incoming phi values,
but did not update the VMap if a default entry was generated in the VMap
during the lookup. This default value is generated when constants or
values outside the current loop are looked up.
This patch fixes the assertion failure when null entries are present in
the VMap because of this lookup. Added a testcase that showcases the
problem.
llvm-svn: 313358
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
Allows to vectorize even if the very first operation is not a binary add, but just a load.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev, davide
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 313348
Summary: Move to LoopUtils method that collects all children of a node inside a loop.
Reviewers: majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37870
llvm-svn: 313322
Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues.
Reviewers: eraman, davidxl
Reviewed By: eraman
Subscribers: vitalybuka, sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37779
llvm-svn: 313277
This patch fixes pr34283, which exposed that the computation of
maximum legal width for vectorization was wrong, because it relied
on MaxInterleaveFactor to obtain the maximum stride used in the loop,
however not all strided accesses in the loop have an interleave-group
associated with them.
Instead of recording the maximum stride in the loop, which can be over
conservative (e.g. if the access with the maximum stride is not involved
in the dependence limitation), this patch tracks the actual maximum legal
width imposed by accesses that are involved in dependencies.
Differential Revision: https://reviews.llvm.org/D37507
llvm-svn: 313237
This reland includes a fix for the LowerTypeTests pass so that it
looks past aliases when determining which type identifiers are live.
Differential Revision: https://reviews.llvm.org/D37842
llvm-svn: 313229
This broke Chromium's CFI build; see crbug.com/765004.
> We were previously handling aliases during dead stripping by adding
> the aliased global's "original name" GUID to the worklist. This will
> lead to incorrect behaviour if the global has local linkage because
> the original name GUID will not correspond to the global's GUID in
> the summary.
>
> Because an alias is just another name for the global that it
> references, there is no need to mark the referenced global as used,
> or to follow references from any other copies of the global. So all
> we need to do is to follow references from the aliasee's summary
> instead of the alias.
>
> Differential Revision: https://reviews.llvm.org/D37789
llvm-svn: 313222
Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues.
Reviewers: eraman, davidxl
Reviewed By: eraman
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37779
llvm-svn: 313195
These are changes to reduce redundant computations when calculating a
feasible vectorization factor:
1. early return when target has no vector registers
2. don't compute register usage for the default VF.
Suggested during review for D37702.
llvm-svn: 313176
Summary:
Added text options to -pgo-view-counts and -pgo-view-raw-counts that dump block frequency and branch probability info in text.
This is useful when the graph is very large and complex (the dot command crashes, lines/edges too close to tell apart, hard to navigate without textual search) or simply when text is preferred.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37776
llvm-svn: 313159
We were previously handling aliases during dead stripping by adding
the aliased global's "original name" GUID to the worklist. This will
lead to incorrect behaviour if the global has local linkage because
the original name GUID will not correspond to the global's GUID in
the summary.
Because an alias is just another name for the global that it
references, there is no need to mark the referenced global as used,
or to follow references from any other copies of the global. So all
we need to do is to follow references from the aliasee's summary
instead of the alias.
Differential Revision: https://reviews.llvm.org/D37789
llvm-svn: 313157
Summary:
SamplePGO indirect call profiles record the target as the original GUID
for statics. The importer had special handling to map to the normal GUID
in that case. The dead global analysis needs the same treatment or
inconsistencies arise, resulting in linker unsats due to some dead
symbols being exported and kept, leaving in references to other dead
symbols that are removed.
This can happen when a SamplePGO profile collected by one binary is used
for a different binary, so the indirect call profiles may not accurately
reflect live targets.
Reviewers: danielcdh
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D37783
llvm-svn: 313151
When converting a PHI into a series of 'select' instructions to combine the
incoming values together according their edge masks, initialize the first
value to the incoming value In0 of the first predecessor, instead of
generating a redundant assignment 'select(Cond[0], In0, In0)'. The latter
fails when the Cond[0] mask is null, representing a full mask, which can
happen only when there's a single incoming value.
No functional changes intended nor expected other than surviving null Cond[0]'s.
This fix follows D35725, which introduced using null to represent full masks.
Differential Revision: https://reviews.llvm.org/D37619
llvm-svn: 313119
Factor out the reachability such that multiple queries to find reachability of values are fast. This is based on finding
the ANTIC points
in the CFG which do not change during hoisting. The ANTIC points are basically the dominance-frontiers in the inverse
graph. So we introduce a data structure (CHI nodes)
to keep track of values flowing out of a basic block. We only do this for values with multiple occurrences in the
function as they are the potential hoistable candidates.
This patch allows us to hoist instructions to a basic block with >2 successors, as well as deal with infinite loops in a
trivial way.
Relevant test cases are added to show the functionality as well as regression fixes from PR32821.
Regression from previous GVNHoist:
We do not hoist fully redundant expressions because fully redundant expressions are already handled by NewGVN
Differential Revision: https://reviews.llvm.org/D35918
Reviewers: dberlin, sebpop, gberry,
llvm-svn: 313116
Summary:
This should improve optimized debug info for address-taken variables at
the cost of inaccurate debug info in some situations.
We patched this into clang and deployed this change to Chromium
developers, and this significantly improved debuggability of optimized
code. The long-term solution to PR34136 seems more and more like it's
going to take a while, so I would like to commit this change under a
flag so that it can be used as a stop-gap measure.
This flag should really help so for C++ aggregates like std::string and
std::vector, which are typically address-taken, even after inlining, and
cannot be SROA-ed.
Reviewers: aprantl, dblaikie, probinson, dberlin
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D36596
llvm-svn: 313108
Summary: This change passes down ACT to SampleProfileLoader for the new PM. Also remove the default value for SampleProfileLoader class as it is not used.
Reviewers: eraman, davidxl
Reviewed By: eraman
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37773
llvm-svn: 313080
Summary:
The current promoteLoopAccessesToScalars method receives an AliasSet, but
the information used is in fact a list of Value*, known to must alias.
Create the list ahead of time to make this method independent of the AliasSet class.
While there is no functionality change, this adds overhead for creating
a set of Value*, when promotion would normally exit earlier.
This is meant to be as a first refactoring step in order to start replacing
AliasSetTracker with MemorySSA.
And while the end goal is to redesign LICM, the first few steps will focus on
adding MemorySSA as an alternative to the AliasSetTracker using most of the
existing functionality.
Reviewers: mkuper, danielcdh, dberlin
Subscribers: sanjoy, chandlerc, gberry, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D35439
llvm-svn: 313075
Summary:
When the MaxVectorSize > ConstantTripCount, we should just clamp the
vectorization factor to be the ConstantTripCount.
This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF.
Earlier we were finding the maximum vector width, which could be greater than
the trip count itself. The Loop vectorizer does all the work for generating a
vectorizable loop, but in the end we would always choose the scalar loop (since
the VF > trip count). This allows us to choose the VF keeping in mind the trip
count if available.
This is a fix on top of rL312472.
Reviewers: Ayal, zvi, hfinkel, dneilson
Reviewed by: Ayal
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37702
llvm-svn: 313046
Not all targets support the use of absolute symbols to export
constants. In particular, ARM has a wide variety of constant encodings
that cannot currently be relocated by linkers. So instead of exporting
the constants using symbols, export them directly in the summary.
The values of the constants are left as zeroes on targets that support
symbolic exports.
This may result in more cache misses when targeting those architectures
as a result of arbitrary changes in constant values, but this seems
somewhat unavoidable for now.
Differential Revision: https://reviews.llvm.org/D37407
llvm-svn: 312967
It now knows the tricks of both functions.
Also, fix a bug that considered allocas of non-zero address space to be always non null
Differential Revision: https://reviews.llvm.org/D37628
llvm-svn: 312869
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented
as an independent pass, so there's no stretching of scope and feature creep for an existing pass.
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028
The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.
Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
Differential Revision: https://reviews.llvm.org/D37121
llvm-svn: 312862
SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
Patch fixes PR26956.
Differential revision: https://reviews.llvm.org/D27846
llvm-svn: 312791
This is required when targeting COFF, as the comdat name must match
one of the names of the symbols in the comdat.
Differential Revision: https://reviews.llvm.org/D37550
llvm-svn: 312767
r312318 - Debug info for variables whose type is shrinked to bool
r312325, r312424, r312489 - Test case for r312318
Revision 312318 introduced a null dereference bug.
Details in https://bugs.llvm.org/show_bug.cgi?id=34490
llvm-svn: 312758
Consider this type of a loop:
for (...) {
...
if (...) continue;
...
}
Normally, the "continue" would branch to the loop control code that
checks whether the loop should continue iterating and which contains
the (often) unique loop latch branch. In certain cases jump threading
can "thread" the inner branch directly to the loop header, creating
a second loop latch. Loop canonicalization would then transform this
loop into a loop nest. The problem with this is that in such a loop
nest neither loop is countable even if the original loop was. This
may inhibit subsequent loop optimizations and be detrimental to
performance.
Differential Revision: https://reviews.llvm.org/D36404
llvm-svn: 312664
This is a preliminary step towards solving the remaining part of PR27145 - IR for isfinite():
https://bugs.llvm.org/show_bug.cgi?id=27145
In order to solve that one more generally, we need to add matching for and/or of fcmp ord/uno
with a constant operand.
But while looking at those patterns, I realized we were missing a canonicalization for nonzero
constants. Rather than limiting to just folds for constants, we're adding a general value
tracking method for this based on an existing DAG helper.
By transforming everything to 0.0, we can simplify the existing code in foldLogicOfFCmps()
and pick up missing vector folds.
Differential Revision: https://reviews.llvm.org/D37427
llvm-svn: 312591
Instead of creating a Constant and then calling m_APInt with it (which will always return true). Just create an APInt initially, and use that for the checks in isSelect01 function. If it turns out we do need the Constant, create it from the APInt.
This is a refactor for a future patch that will do some more checks of the constant values here.
llvm-svn: 312517
Summary:
Improve how MaxVF is computed while taking into account that MaxVF should not be larger than the loop's trip count.
Other than saving on compile-time by pruning the possible MaxVF candidates, this patch fixes pr34438 which exposed the following flow:
1. Short trip count identified -> Don't bail out, set OptForSize:=True to avoid tail-loop and runtime checks.
2. Compute MaxVF returned 16 on a target supporting AVX512.
3. OptForSize -> choose VF:=MaxVF.
4. Bail out because TripCount = 8, VF = 16, TripCount % VF !=0 means we need a tail loop.
With this patch step 2. will choose MaxVF=8 based on TripCount.
Reviewers: Ayal, dorit, mkuper, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D37425
llvm-svn: 312472
Debug information can be, and was, corrupted when the runtime
remainder loop was fully unrolled. This is because a !null node can
be created instead of a unique one describing the loop. In this case,
the original node gets incorrectly updated with the NewLoopID
metadata.
In the case when the remainder loop is going to be quickly fully
unrolled, there isn't the need to add loop metadata for it anyway.
Differential Revision: https://reviews.llvm.org/D37338
llvm-svn: 312471
In addition to removing chunks of duplicated code, we don't
want these to diverge. If there's a fold for one, there
should be a fold of the other via DeMorgan's Laws.
llvm-svn: 312420
We had these locals:
Value *Op0RHS = LHS->getOperand(1);
Value *Op1LHS = RHS->getOperand(0);
...so we confusingly transposed the meaning of left/right and op0/op1.
llvm-svn: 312418
This makes it easier to see that they're almost duplicates.
As with the similar icmp functions, there should be identical
folds for both logic ops because those are DeMorganized variants.
llvm-svn: 312415
Summary:
After a discussion with Rekka, i believe this (or a small variant)
should fix the remaining phi-of-ops problems.
Rekka's algorithm for completeness relies on looking up expressions
that should have no leader, and expecting it to fail (IE looking up
expressions that can't exist in a predecessor, and expecting it to
find nothing).
Unfortunately, sometimes these expressions can be simplified to
constants, but we need the lookup to fail anyway. Additionally, our
simplifier outsmarts this by taking these "not quite right"
expressions, and simplifying them into other expressions or walking
through phis, etc. In the past, we've sometimes been able to find
leaders for these expressions, incorrectly.
This change causes us to not to try to phi of ops such expressions.
We determine safety by seeing if they depend on a phi node in our
block.
This is not perfect, we can do a bit better, but this should be a
"correctness start" that we can then improve. It also requires a
bunch of caching that i'll eventually like to eliminate.
The right solution, longer term, to the simplifier issues, is to make
the query interface for the instruction simplifier/constant folder
have the flags we need, so that we can keep most things going, but
turn off the possibly-invalid parts (threading through phis, etc).
This is an issue in another wrong code bug as well.
Reviewers: davide, mcrosier
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37175
llvm-svn: 312401
This patch teaches decomposeBitTestICmp to look through truncate instructions on the input to the compare. If a truncate is found it will now return the pre-truncated Value and appropriately extend the APInt mask.
This allows some code to be removed from InstSimplify that was doing this functionality.
This allows InstCombine's bit test combining code to match a pre-truncate Value with the same Value appear with an 'and' on another icmp. Or it allows us to combine a truncate to i16 and a truncate to i8. This also required removing the type check from the beginning of getMaskedTypeForICmpPair, but I believe that's ok because we still have to find two values from the input to each icmp that are equal before we'll do any transformation. So the type check was really just serving as an early out.
There was one user of decomposeBitTestICmp that didn't want to look through truncates, so I've added a flag to prevent that behavior when necessary.
Differential Revision: https://reviews.llvm.org/D37158
llvm-svn: 312382
A future patch will make the code look through truncates feeding the compare. So the compares might be different types but the pretruncated types might be the same.
This should be safe because we still require the same Value* to be used truncated or not in both compares. So that serves to ensure the types are the same.
llvm-svn: 312381
Previously we used the type from the LHS of the compare, but a future patch will change decomposeBitTestICmp to look through truncates so it will return a pretruncated Value* and the type needs to match that.
llvm-svn: 312380
Summary: When we backtranslate expressions, we can't use the predicateinfo, since we are evaluating them in a different context.
Reviewers: davide, mcrosier
Subscribers: sanjoy, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D37174
llvm-svn: 312352
Summary:
LoopVectorizer is creating casts between vec<ptr> and vec<float> types
on ARM when compiling OpenCV. Since, tIs is illegal to directly cast a
floating point type to a pointer type even if the types have same size
causing a crash. Fix the crash using a two-step casting by bitcasting
to integer and integer to pointer/float.
Fixes PR33804.
Reviewers: mkuper, Ayal, dlj, rengolin, srhines
Reviewed By: rengolin
Subscribers: aemerson, kristof.beyls, mkazantsev, Meinersbur, rengolin, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35498
llvm-svn: 312331
This patch provides such debug information for integer
variables whose type is shrinked to bool by providing
dwarf expression which returns either constant initial
value or other value.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D35994
llvm-svn: 312318
comparisons into memcmp.
Thanks to recent improvements in the LLVM codegen, the memcmp is typically
inlined as a chain of efficient hardware comparisons.
This typically benefits C++ member or nonmember operator==().
For now this is disabled by default until:
- https://bugs.llvm.org/show_bug.cgi?id=33329 is complete
- Benchmarks show that this is always useful.
Differential Revision:
https://reviews.llvm.org/D33987
llvm-svn: 312315
The BasicBlock passed to FindPredecessorRetainWithSafePath should be the
parent block of Autorelease. This fixes a crash that occurs in
FindDependencies when StartInst is not in StartBB.
rdar://problem/33866381
llvm-svn: 312266
Recurse instead of returning on the first found optimization. Also, return early in the caller
instead of continuing because that allows another round of simplification before we might
potentially lose undef information from a shuffle mask by eliminating the shuffle.
As noted in the review, we could probably do better and be more efficient by moving all of
demanded elements into a separate pass, but this is yet another quick fix to instcombine.
Differential Revision: https://reviews.llvm.org/D37236
llvm-svn: 312248
Current implementation of parseLoopStructure interprets the latch comparison as a
comarison against `iv.next`. If the actual comparison is made against the `iv` current value
then the loop may be rejected, because this misinterpretation leads to incorrect evaluation
of the latch start value.
This patch teaches the IRCE to distinguish this kind of loops and perform the optimization
for them. Now we use `IndVarBase` variable which can be either next or current value of the
induction variable (previously we used `IndVarNext` which was always the value on next iteration).
Differential Revision: https://reviews.llvm.org/D36215
llvm-svn: 312221
Renaming as a preparation step to generalizing IRCE for comparison not only against
the next value of an indvar, but also against the current.
Differential Revision: https://reviews.llvm.org/D36509
llvm-svn: 312215
This code is double-dead:
1. We simplify all selects with constant true/false condition in InstSimplify.
I've minimized/moved the tests to show that works as expected.
2. All remaining vector selects with a constant condition are canonicalized to
shufflevector, so we really can't see this pattern.
llvm-svn: 312123
Summary:
If the first insertelement instruction has multiple users and inserts at
position 0, we can re-use this instruction when folding a chain of
insertelement instructions. As we need to generate the first
insertelement instruction anyways, this should be a strict improvement.
We could get rid of the restriction of inserting at position 0 by
creating a different shufflemask, but it is probably worth to keep the
first insertelement instruction with position 0, as this is easier to do
efficiently than at other positions I think.
Reviewers: grosser, mkuper, fpetrogalli, efriedma
Reviewed By: fpetrogalli
Subscribers: gareevroman, llvm-commits
Differential Revision: https://reviews.llvm.org/D37064
llvm-svn: 312110
Summary:
When jumptable encoding does not match target code encoding (arm vs
thumb), a veneer is inserted by the linker. We can not avoid this
in all cases, because entries within one jumptable must have the same
encoding, but we can make it less common by selecting the jumptable
encoding to match the majority of its targets.
This change only covers FullLTO, and not ThinLTO.
Reviewers: pcc
Subscribers: aemerson, mehdi_amini, javed.absar, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D37171
llvm-svn: 312054
Summary:
Cross-DSO CFI needs all __cfi_check exports to use the same encoding
(ARM vs Thumb).
Reviewers: pcc
Subscribers: aemerson, srhines, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37243
llvm-svn: 312052
This is to fix PR34257. rL309059 takes an early return when FindLIVLoopCondition
fails to find a loop invariant condition. This is wrong and it will disable loop
unswitch for select. The patch fixes the bug.
Differential Revision: https://reviews.llvm.org/D36985
llvm-svn: 312045
Summary:
If SimplifyCFG pass is able to merge conditional stores into single one,
it loses the alignment. This may lead to incorrect codegen. Patch
sets the alignment of the new instruction if it is set in the original
one.
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36841
llvm-svn: 312030
This patch adds splat support to transformZExtICmp. The test cases are vector versions of tests that failed when commenting out parts of the existing scalar code.
One test didn't vectorize optimize properly due to another bug so a TODO has been added.
Differential Revision: https://reviews.llvm.org/D37253
llvm-svn: 312023
Summary:
Remove some code that was no longer needed. The first FIXME is
stale since we long ago started using the index to drive importing,
rather than doing force importing based on linkage type. And
now with r309278, we no longer import any aliases.
Reviewers: dblaikie
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D37266
llvm-svn: 312019
Summary: We originally assume that in pgo-icp, the promoted direct call will never be null after strip point casts. However, stripPointerCasts is so smart that it could possibly return the value of the function call if it knows that the return value is always an argument. In this case, the returned value cannot cast to Instruction. In this patch, null check is added to ensure null pointer will not be accessed.
Reviewers: tejohnson, xur, davidxl, djasper
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37252
llvm-svn: 312005
As suggested in D37121, here's a wrapper for removeFromParent() + insertAfter(),
but implemented using moveBefore() for symmetry/efficiency.
Differential Revision: https://reviews.llvm.org/D37239
llvm-svn: 312001
When LSR processes code like
int accumulator = 0;
for (int i = 0; i < N; i++) {
accummulator += i;
use((double) accummulator);
}
It may decide to replace integer `accumulator` with a double Shadow IV to get rid
of casts. The problem with that is that the `accumulator`'s value may overflow.
Starting from this moment, the behavior of integer and double accumulators
will differ.
This patch strenghtens up the conditions of Shadow IV mechanism applicability.
We only allow it for IVs that are proved to be `AddRec`s with `nsw`/`nuw` flag.
Differential Revision: https://reviews.llvm.org/D37209
llvm-svn: 311986
This was pretty close to working already. While I was here I went ahead and passed the ICmpInst pointer from the caller instead of doing a dyn_cast that can never fail.
Differential Revision: https://reviews.llvm.org/D37237
llvm-svn: 311960
In r311742 we marked the PCs array as used so it wouldn't be dead
stripped, but left the guard and 8-bit counters arrays alone since
these are referenced by the coverage instrumentation. This doesn't
quite work if we want the indices of the PCs array to match the other
arrays though, since elements can still end up being dead and
disappear.
Instead, we mark all three of these arrays as used so that they'll be
consistent with one another.
llvm-svn: 311959
Be more consistent with CreateFunctionLocalArrayInSection in the API
of CreatePCArray, and assign the member variable in the caller like we
do for the guard and 8-bit counter arrays.
This also tweaks the order of method declarations to match the order
of definitions in the file.
llvm-svn: 311955
We were handling some vectors in foldSelectIntoOp, but not if the operand of the bin op was any kind of vector constant. This patch fixes it to treat vector splats the same as scalars.
Differential Revision: https://reviews.llvm.org/D37232
llvm-svn: 311940
When peeling kicks in, it updates the loop preheader.
Later, a successful full unroll of the loop needs to update a PHI
which i-th argument comes from the loop preheader, so it'd better look
at the correct block. Fixes PR33437.
Differential Revision: https://reviews.llvm.org/D37153
llvm-svn: 311922
Summary:
Currently, a phi node is created in the normal destination to unify the return values from promoted calls and the original indirect call. This patch makes this phi node to be created only when the return value has uses.
This patch is necessary to generate valid code, as compiler crashes with the attached test case without this patch. Without this patch, an illegal phi node that has no incoming value from `entry`/`catch` is created in `cleanup` block.
I think existing implementation is good as far as there is at least one use of the original indirect call. `insertCallRetPHI` creates a new phi node in the normal destination block only when the original indirect call dominates its use and the normal destination block. Otherwise, `fixupPHINodeForNormalDest` will handle the unification of return values naturally without creating a new phi node. However, if there's no use, `insertCallRetPHI` still creates a new phi node even when the original indirect call does not dominate the normal destination block, because `getCallRetPHINode` returns false.
Reviewers: xur, davidxl, danielcdh
Reviewed By: xur
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37176
llvm-svn: 311906
Original commit r311077 of D32871 was reverted in r311304 due to failures
reported in PR34248.
This recommit fixes PR34248 by restricting the packing of predicated scalars
into vectors only when vectorizing, avoiding doing so when unrolling w/o
vectorizing. Added a test derived from the reproducer of PR34248.
llvm-svn: 311849
Prior to this change (and after r311371), we computed it
unconditionally, causin gsevere compile time regressions (in some
cases, 5 to 10x).
llvm-svn: 311804
Just create an all 1s demanded mask and continue recursing like normal. The recursive calls should be able to handle an all 1s mask and do the right thing.
The only time we should care about knowing whether the upper bit was demanded is when we need to know if we should clear the NSW/NUW flags.
Now that we have a consistent path through the code for all cases, use KnownBits::computeForAddSub to compute the known bits at the end since we already have the LHS and RHS.
My larger goal here is to move the code that turns add into xor if only 1 bit is demanded and no bits below it are non-zero from InstCombiner::OptAndOp to here. This will allow it to be more general instead of just looking for 'add' and 'and' with constant RHS.
Differential Revision: https://reviews.llvm.org/D36486
llvm-svn: 311789
Summary:
SimplifyIndVar may introduce zext instructions to widen arguments of the
loop exit check. They should not prevent us from splitting the loop at
the induction variable, but maybe the check should be more conservative,
e.g. making sure it only extends arguments used by a comparison?
Reviewers: karthikthecool, mcrosier, mzolotukhin
Reviewed By: mcrosier
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34879
llvm-svn: 311783
There are cases where AShr have better chance to be optimized than LShr, especially when the demanded bits are not known to be Zero, and also known to be similar to the sign bit.
Differential Revision: https://reviews.llvm.org/D36936
llvm-svn: 311773
Summary:
Add musttail to any resume instructions that is immediately followed by a
suspend (i.e. ret). We do this even in -O0 to support guaranteed tail call
for symmetrical coroutine control transfer (C++ Coroutines TS extension).
This transformation is done only in the resume part of the coroutine that has
identical signature and calling convention as the coro.resume call.
Reviewers: GorNishanov
Reviewed By: GorNishanov
Subscribers: EricWF, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D37125
llvm-svn: 311751
There are 3 small independent changes here:
1. Account for multiple uses in the pattern matching: avoid the transform if it increases the instruction count.
2. Add a missing fold for the case where the numerator is the constant: http://rise4fun.com/Alive/E2p
3. Enable all folds for vector types.
There's still one more potential change - use "shouldChangeType()" to keep from transforming to an illegal integer type.
Differential Revision: https://reviews.llvm.org/D36988
llvm-svn: 311726
Summary:
When reassociating an expression, do not drop the instruction's
original debug location in case the replacement location is
missing.
The debug location must at least not be dropped for inlinable
callsites of debug-info-bearing functions in debug-info-bearing
functions. Failing to do so would result in an "inlinable function "
"call in a function with debug info must have a !dbg location"
error in the verifier.
As preserving the original debug location is not expected
to result in overly jumpy debug line information, it is
preserved for all other cases too.
This fixes PR34231:
https://bugs.llvm.org/show_bug.cgi?id=34231
Original patch by David Stenberg
Reviewers: davide, craig.topper, mcrosier, dblaikie, aprantl
Reviewed By: davide, aprantl
Subscribers: aprantl
Differential Revision: https://reviews.llvm.org/D36865
llvm-svn: 311642
Current PGO only annotates the edge weight for branch and switch instructions
with profile counts. We should also annotate the indirectbr instruction as
all the information is there. This patch enables the annotating for indirectbr
instructions. Also uses this annotation in branch probability analysis.
Differential Revision: https://reviews.llvm.org/D37074
llvm-svn: 311604
The lowering isn't really an optimization, so optnone shouldn't make a
difference. ARM relies on the pass running when using "-mthread-model
single", because in that mode, it doesn't run AtomicExpand. See bug for
more details.
Differential Revision: https://reviews.llvm.org/D37040
llvm-svn: 311565
Summary:
If a coroutine outer calls another coroutine inner and the inner coroutine body is inlined into the outer, coro.begin from the inner coroutine should be considered for spilling if accessed across suspends.
Prior to this change, coroutine frame building code was not considering any coro.begins for spilling.
With this change, we only ignore coro.begin for the current coroutine, but, any coro.begins that were inlined into the current coroutine are eligible for spills.
Fixes PR34267
Reviewers: GorNishanov
Subscribers: qcolombet, llvm-commits, EricWF
Differential Revision: https://reviews.llvm.org/D37062
llvm-svn: 311556
..if the resulting subtract will be broken up later. This can cause us to get
into an infinite loop.
x + (-5.0 * y) -> x - (5.0 * y) ; Canonicalize neg const
x - (5.0 * y) -> x + (0 - (5.0 * y)) ; Break up subtract
x + (0 - (5.0 * y)) -> x + (-5.0 * y) ; Replace 0-X with X*-1.
PR34078
llvm-svn: 311554
InstCombine folds instructions with irrelevant conditions to undef.
This, as Nuno confirmed is a bug.
(see https://bugs.llvm.org/show_bug.cgi?id=33409#c1 )
Given the original motivation for the change is that of removing an
USE, we now fold to false instead (which reaches the same goal
without undesired side effects).
Fixes PR33409.
Differential Revision: https://reviews.llvm.org/D36975
llvm-svn: 311540
Looks like for 'and' and 'or' we end up performing at least some of the transformations this is bocking in a round about way anyway.
For 'and sext(cmp1), sext(cmp2) we end up later turning it into 'select cmp1, sext(cmp2), 0'. Then we optimize that back to sext (and cmp1, cmp2). This is the same result we would have gotten if shouldOptimizeCast hadn't blocked it. We do something analogous for 'or'.
With this patch we allow that transformation to happen directly in foldCastedBitwiseLogic. And we now support the same thing for 'xor'. This is definitely opening up many other cases, but since we already went around it for some cases hopefully it's ok.
Differential Revision: https://reviews.llvm.org/D36213
llvm-svn: 311508
We can't reuse the llvm.assume instruction's bitcast because it may not
dominate every user of the vtable pointer.
Differential Revision: https://reviews.llvm.org/D36994
llvm-svn: 311491
Summary:
Use the initialexec TLS type and eliminate calls to the TLS
wrapper. Fixes the sanitizer-x86_64-linux-fuzzer bot failure.
Reviewers: vitalybuka, kcc
Reviewed By: kcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37026
llvm-svn: 311490
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.
This is reapplies the original patch r311057 that was reverted in r311381.
The previous version wasn't using the batch update api for updating dominators,
which in vary rare cases caused assertion failures.
This also fixes PR34258.
Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki
Reviewed By: davide
Subscribers: grandinj, zhendongsu, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D35869
llvm-svn: 311467
I don't think there's any reason to have them scattered about and on all 4 operands. We already have an early check that both compares must be the same type. And within a given compare the LHS and RHS must have the same type. Beyond that I don't think there's anyway this function returns anything valid for pointer types. So let's just return early and be done with it.
Differential Revision: https://reviews.llvm.org/D36561
llvm-svn: 311383
Currently, the inline cost model will bail once the inline cost exceeds the
inline threshold in order to avoid unnecessary compile-time. However, when
debugging it is useful to compute the full cost, so this command line option
is added to override the default behavior.
I took over this work from Chad Rosier (mcrosier@codeaurora.org).
Differential Revision: https://reviews.llvm.org/D35850
llvm-svn: 311371
The 1st try was reverted because it could inf-loop by creating a dead instruction.
Fixed that to not happen and added a test case to verify.
Original commit message:
Try to fold:
memcmp(X, C, ConstantLength) == 0 --> load X == *C
Without this change, we're unnecessarily checking the alignment of the constant data,
so we miss the transform in the first 2 tests in the patch.
I noted this shortcoming of LibCallSimpifier in one of the recent CGP memcmp expansion
patches. This doesn't help the example in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13
...directly, but it's worth short-circuiting more of these simple cases since we're
already trying to do that.
The benefit of transforming to load+cmp is that existing IR analysis/transforms may
further simplify that code. For example, if the load of the variable is common to
multiple memcmp calls, CSE can remove the duplicate instructions.
Differential Revision: https://reviews.llvm.org/D36922
llvm-svn: 311366
This is similar to what was already done in foldSelectICmpAndOr. Ultimately I'd like to see if we can call foldSelectICmpAnd from foldSelectIntoOp if we detect a power of 2 constant. This would allow us to remove foldSelectICmpAndOr entirely.
Differential Revision: https://reviews.llvm.org/D36498
llvm-svn: 311362
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33786
Reviewers: anemet, davidxl, chandlerc
Reviewed By: anemet
Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D36054
llvm-svn: 311349
Summary:
If the bitsToClear from the LHS of an 'and' comes back non-zero, but all of those bits are known zero on the RHS, we can reset bitsToClear.
Without this, the 'or' in the modified test case blocks the transform because it has non-zero bits in its RHS in those bits.
Reviewers: spatel, majnemer, davide
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36944
llvm-svn: 311343
Try to fold:
memcmp(X, C, ConstantLength) == 0 --> load X == *C
Without this change, we're unnecessarily checking the alignment of the constant data,
so we miss the transform in the first 2 tests in the patch.
I noted this shortcoming of LibCallSimpifier in one of the recent CGP memcmp expansion
patches. This doesn't help the example in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13
...directly, but it's worth short-circuiting more of these simple cases since we're
already trying to do that.
The benefit of transforming to load+cmp is that existing IR analysis/transforms may
further simplify that code. For example, if the load of the variable is common to
multiple memcmp calls, CSE can remove the duplicate instructions.
Differential Revision: https://reviews.llvm.org/D36922
llvm-svn: 311333
Added a separate metadata to indicate when the loop
has already been vectorized instead of setting width and count to 1.
Patch written by Divya Shanmughan and Aditya Kumar
Differential Revision: https://reviews.llvm.org/D36220
llvm-svn: 311281
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33786
Reviewers: anemet, davidxl, chandlerc
Reviewed By: anemet
Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D36054
llvm-svn: 311273
Summary:
Follow up to fix in r311023, which fixed the case where the combined
index is written to disk. The same samplePGO logic exists for the
in-memory index when computing imports, so we need to filter out
GlobalVariable summaries there too.
Reviewers: davidxl
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D36919
llvm-svn: 311254
a function into itself.
We tried to fix this before in r306495 but that got reverted as the
assert was actually hit.
This fixes the original bug (which we seem to have lost track of with
the revert) by blocking a second remapping when the function being
inlined is also the caller and the remapping could succeed but
erroneously.
The included test case would actually load from an inlined copy of the
alloca before this change, failing to load the stored value and
miscompiling.
Many thanks to Richard Smith for diagnosing a user miscompile to this
bug, and to Kyle for the first attempt and initial analysis and David Li
for remembering the issue and how to fix it and suggesting the patch.
I'm just stitching it together and landing it. =]
llvm-svn: 311229
Clamp function was too optimistic when choosing signed or unsigned min/max function for calculations.
In fact, `!IsSignedPredicate` guarantees us that `Smallest` and `Greatest` can be compared safely using unsigned
predicates, but we did not check this for `S` which can in theory be negative.
This patch makes Clamp use signed min/max for cases when it fails to prove `S` being non-negative,
and it adds a test where such situation may lead to incorrect conditions calculation.
Differential Revision: https://reviews.llvm.org/D36873
llvm-svn: 311205
Summary:
Memcpy intrinsics have size argument of any integer type, like i32 or i64.
Fixed size type along with its value when cloning the intrinsic.
Reviewers: davidxl, xur
Reviewed By: davidxl
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D36844
llvm-svn: 311188
Summary:
Augment SanitizerCoverage to insert maximum stack depth tracing for
use by libFuzzer. The new instrumentation is enabled by the flag
-fsanitize-coverage=stack-depth and is compatible with the existing
trace-pc-guard coverage. The user must also declare the following
global variable in their code:
thread_local uintptr_t __sancov_lowest_stack
https://bugs.llvm.org/show_bug.cgi?id=33857
Reviewers: vitalybuka, kcc
Reviewed By: vitalybuka
Subscribers: kubamracek, hiraditya, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D36839
llvm-svn: 311186
Summary: This patch teaches LoopRotate to use the new incremental API to update the DominatorTree.
Reviewers: dberlin, davide, grosser, sanjoy
Reviewed By: dberlin, davide
Subscribers: hiraditya, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D35581
llvm-svn: 311125
Summary:
This patch makes LoopUnswitch use new incremental API for updating dominators.
It also updates SplitCriticalEdge, as it is called in LoopUnswitch.
There doesn't seem to be any noticeable performance difference when bootstrapping clang with this patch.
Reviewers: dberlin, davide, sanjoy, grosser, chandlerc
Reviewed By: davide, grosser
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35528
llvm-svn: 311093
In the case where dfsan provides a custom wrapper for a function,
shadow parameters are added for each parameter of the function.
These parameters are i16s. For targets which do not consider this
a legal type, the lack of sign extension information would cause
LLVM to generate anyexts around their usage with phi variables
and calling convention logic.
Address this by introducing zero exts for each shadow parameter.
Reviewers: pcc, slthakur
Differential Revision: https://reviews.llvm.org/D33349
llvm-svn: 311087
VPlan is an ongoing effort to refactor and extend the Loop Vectorizer. This
patch introduces the VPlan model into LV and uses it to represent the vectorized
code and drive the generation of vectorized IR.
In this patch VPlan models the vectorized loop body: the vectorized control-flow
is represented using VPlan's Hierarchical CFG, with predication refactored from
being a post-vectorization-step into a vectorization planning step modeling
if-then VPRegionBlocks, and generating code inline with non-predicated code. The
vectorized code within each VPBasicBlock is represented as a sequence of
Recipes, each responsible for modelling and generating a sequence of IR
instructions. To keep the size of this commit manageable the Recipes in this
patch are coarse-grained and capture large chunks of LV's code-generation logic.
The constructed VPlans are dumped in dot format under -debug.
This commit retains current vectorizer output, except for minor instruction
reorderings; see associated modifications to lit tests.
For further details on the VPlan model see docs/Proposals/VectorizationPlan.rst
and its references.
Authors: Gil Rapaport and Ayal Zaks
Differential Revision: https://reviews.llvm.org/D32871
llvm-svn: 311077
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.
I didn't notice any performance impact when bootstrapping clang with this patch.
The patch was originally committed in r311039 and reverted in r311049.
This revision fixes the problem with not adding a dependency on the
DominatorTreeWrapperPass for the LegacyPassManager.
Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki
Reviewed By: davide
Subscribers: grandinj, zhendongsu, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D35869
llvm-svn: 311057
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.
I didn't notice any performance impact when bootstrapping clang with this patch.
Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki
Reviewed By: davide
Subscribers: grandinj, zhendongsu, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D35869
llvm-svn: 311039
Summary:
Mark LoopDataPrefetch and AArch64FalkorHWPFFix passes as preserving
ScalarEvolution since they do not alter loop structure and should not
alter any SCEV values (though LoopDataPrefetch may introduce new
instructions that won't have cached SCEV values yet).
This can result in slight code differences, mainly w.r.t. nsw/nuw flags
on SCEVs, since these are computed somewhat lazily when a zext/sext
instruction is encountered. As a result, passes after the modified
passes may see SCEVs with more nsw/nuw flags present.
Reviewers: sanjoy, anemet
Subscribers: aemerson, rengolin, mzolotukhin, javed.absar, kristof.beyls, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D36716
llvm-svn: 311032
To clear assumptions that are potentially invalid after trivialization, we need
to walk the use/def chain. Normally, the only way to reach an instruction with
an unsized type is via an instruction that has side effects (or otherwise will
demand its input bits). That would stop the walk. However, if we have a
readnone function that returns an unsized type (e.g., void), we must avoid
asking for the demanded bits of the function call's return value. A
void-returning readnone function is always dead (and so we can stop walking the
use/def chain here), but the check is necessary to avoid asserting.
Fixes PR34211.
llvm-svn: 311014
Summary: When we move then-else code to if, we need to merge its debug info, otherwise the hoisted instruction may have inaccurate debug info attached.
Reviewers: aprantl, probinson, dblaikie, echristo, loladiro
Reviewed By: aprantl
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D36778
llvm-svn: 310985
We were only allowing ConstantInt before. This patch allows splat of ConstantInt too.
Differential Revision: https://reviews.llvm.org/D36763
llvm-svn: 310970
Narrow ops are better for bit-tracking, and in the case of vectors,
may enable better codegen.
As the trunc test shows, this can allow follow-on simplifications.
There's a block of code in visitTrunc that deals with shifted ops
with FIXME comments. It may be possible to remove some of that now,
but I want to make sure there are no problems with this step first.
http://rise4fun.com/Alive/Y3a
Name: hoist_ashr_ahead_of_sext_1
%s = sext i8 %x to i32
%r = ashr i32 %s, 3 ; shift value is < than source bit width
=>
%a = ashr i8 %x, 3
%r = sext i8 %a to i32
Name: hoist_ashr_ahead_of_sext_2
%s = sext i8 %x to i32
%r = ashr i32 %s, 8 ; shift value is >= than source bit width
=>
%a = ashr i8 %x, 7 ; so clamp this shift value
%r = sext i8 %a to i32
Name: junc_the_trunc
%a = sext i16 %v to i32
%s = ashr i32 %a, 18
%t = trunc i32 %s to i16
=>
%t = ashr i16 %v, 15
llvm-svn: 310942
Summary:
This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change.
What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG.
This patch makes the following assumptions:
- A sequence of updates should produce the same tree as a recalculating it.
- Any sequence of the same updates should lead to the same tree.
- Siblings and roots are unordered.
The last two properties are essential to efficiently perform batch updates in the future.
When it comes to the first one, we can decide later that the consistency between freshly built tree and an updated one doesn't matter match, as there are many correct ways to pick roots in infinite loops, and to relax this assumption. That should enable us to recalculate postdominators less frequently.
This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough.
That being said, my experiments showed that reverse-unreachable are very rare in the IR emitted by clang when bootstrapping clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call:
```
# functions: 52283
# samples: 337609
# reverse unreachable BBs: 216022
# BBs: 247840796
Percent reverse-unreachable: 0.08716159869015269 %
Max(PercRevUnreachable) in a function: 87.58620689655172 %
# > 25 % samples: 471 ( 0.1395104988314885 % samples )
... in 145 ( 0.27733680163724345 % functions )
```
Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway.
I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :).
Reviewers: dberlin, sanjoy, grosser, brzycki, davide, chandlerc, hfinkel
Reviewed By: dberlin
Subscribers: nhaehnle, javed.absar, kparzysz, uabelho, jlebar, hiraditya, llvm-commits, dberlin, david2050
Differential Revision: https://reviews.llvm.org/D35851
llvm-svn: 310940
Two minor savings: avoid copying the SinkAfter map and avoid moving a cast if it
is not needed.
Differential Revision: https://reviews.llvm.org/D36408
llvm-svn: 310910
This recommits r310869, with the moved files and no extra changes.
Original commit message:
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310889
Failed to add the two files that moved. And then added an extra change I didn't mean to while trying to fix that. Reverting everything.
llvm-svn: 310873
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310869
This change let us schedule a bundle with different opcodes in it, for example : [ load, add, add, add ]
Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D36518
llvm-svn: 310847
The assert was added with r310779 and is usually correct,
but as the test shows, not always. The 'volatile' on the
load is needed to expose the faulty path because without
it, DemandedBits would return that the load is just dead
rather than not demanded, and so we wouldn't hit the
bogus assert.
Also, since the lambda is just a single-line now, get rid
of it and inline the DB.isAllOnesValue() calls.
This should fix (prevent execution of a faulty assert):
https://bugs.llvm.org/show_bug.cgi?id=34179
llvm-svn: 310842
On some targets, the penalty of executing runtime unrolling checks
and then not the unrolled loop can be significantly detrimental to
performance. This results in the need to be more conservative with
the unroll count, keeping a trip count of 2 reduces the overhead as
well as increasing the chance of the unrolled body being executed. But
being conservative leaves performance gains on the table.
This patch enables the unrolling of the remainder loop introduced by
runtime unrolling. This can help reduce the overhead of misunrolled
loops because the cost of non-taken branches is much less than the
cost of the backedge that would normally be executed in the remainder
loop. This allows larger unroll factors to be used without suffering
performance loses with smaller iteration counts.
Differential Revision: https://reviews.llvm.org/D36309
llvm-svn: 310824
Summary:
These functions were overly complicated. The body of this function was rechecking for an And operation to find the constant, but we already knew we were looking at two Ands ORed together and the pieces are in variables. We already had earlier nearby code that checked for ConstantInts. So just inline the remaining parts into the earlier code.
Next step is to use m_APInt instead of ConstantInt.
Reviewers: spatel, efriedma, davide, majnemer
Reviewed By: spatel
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D36439
llvm-svn: 310806
Updating remark API to newer OptimizationDiagnosticInfo API. This
allows remarks to show up in diagnostic yaml file, and enables use
of opt-viewer tool.
Hotness information for remarks (L505 and L751) do not display hotness
information, most likely due to profile information not being
propagated yet. Unsure if this is the desired outcome.
Patch by Tarun Rajendran.
Differential Revision: https://reviews.llvm.org/D36127
llvm-svn: 310763
This also corrects the description to match what was actually implemented. The old comment said X^(C1|C2), but it implemented X^((C1|C2)&~(C1&C2)). I believe ((C1|C2)&~(C1&C2)) is equivalent to (C1^C2).
Differential Revision: https://reviews.llvm.org/D36505
llvm-svn: 310658
We used to try to truncate the constant vector to vXi1, but if it's already i1 this would fail. Instead we now use IRBuilder::getZExtOrTrunc which should check the type and only create a trunc if needed. I believe this should trigger constant folding in the IRBuilder and ultimately do the same thing just with the additional type check.
llvm-svn: 310639
Sometimes it would be nice to stop InstCombine mid way through its combining to see the current IR. By using a debug counter we can place an upper limit on how many instructions to process.
This will also allow skipping the first X combines, but that has the potential to change later combines since earlier canonicalizations might have been skipped.
Differential Revision: https://reviews.llvm.org/D36553
llvm-svn: 310638
This make it consistent with STATISTIC which it will often appears near.
While there move one DEBUG_COUNTER instance out of an anonymous namespace. It's already declaring a static variable so the namespace is unnecessary.
llvm-svn: 310637
This implementation of SanitizerCoverage instrumentation inserts different
callbacks depending on constantness of operands:
1. If both operands are non-const, then a usual
__sanitizer_cov_trace_cmp[1248] call is inserted.
2. If exactly one operand is const, then a
__sanitizer_cov_trace_const_cmp[1248] call is inserted. The first
argument of the call is always the constant one.
3. If both operands are const, then no callback is inserted.
This separation comes useful in fuzzing when tasks like "find one operand
of the comparison in input arguments and replace it with the other one"
have to be done. The new instrumentation allows us to not waste time on
searching the constant operands in the input.
Patch by Victor Chibotaru.
llvm-svn: 310600
I couldn't find any smaller folds to help the cases in:
https://bugs.llvm.org/show_bug.cgi?id=34046
after:
rL310141
The truncated rotate-by-variable patterns elude all of the existing transforms because
of multiple uses and knowledge about demanded bits and knownbits that doesn't exist
without the whole pattern. So we need an unfortunately large pattern match. But by
simplifying this pattern in IR, the backend is already able to generate
rolb/rolw/rorb/rorw for x86 using its existing rotate matching logic (although
there is a likely extraneous 'and' of the rotate amount).
Note that rotate-by-constant doesn't have this problem - smaller folds should already
produce the narrow IR ops.
Differential Revision: https://reviews.llvm.org/D36395
llvm-svn: 310509
Summary:
Instrumentation to copy byval arguments is now correctly inserted
after the dynamic shadow base is loaded.
Reviewers: vitalybuka, eugenis
Reviewed By: vitalybuka
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D36533
llvm-svn: 310503
isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.
The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().
The isFoldableMemAccess() hook has been removed everywhere.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933
llvm-svn: 310463
In the recursive call to isAMCompletelyFolded(), the passed offset should be
the sum of F.BaseOffset and Fixup.Offset.
Review: Quentin Colombet.
llvm-svn: 310462
When a new phi is generated for scalarpre of an expression, the phiTranslate cache
will become stale: Before PRE, the candidate expression must not be available in a
predecessor block, and phitranslate will cache the information. After PRE, the
expression will become available in all predecessor blocks, so the related entries
in phiTranslate cache becomes stale. The patch will simply remove the stale entries
so phiTranslate can be recomputed next time.
The stale entries in phitranslate cache will not affect correctness but will cause
missing PRE opportunity for later instructions.
Differential Revision: https://reviews.llvm.org/D36124
llvm-svn: 310421
Summary: Currently, ICP checks the count against a fixed value to see if it is hot enough to be promoted. This does not work for SamplePGO because sampled count may be much smaller. This patch uses PSI to check if the count is hot enough to be promoted.
Reviewers: davidxl, tejohnson, eraman
Reviewed By: davidxl
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36341
llvm-svn: 310416
We already support pulling through an add with constant RHS. We can do the same for subtract.
Differential Revision: https://reviews.llvm.org/D36443
llvm-svn: 310407
Summary:
When vectorizing fcmps we can trip on incorrect cast assertion when setting the
FastMathFlags after generating the vectorized FCmp.
This can happen if the FCmp can be folded to true or false directly. The fix
here is to set the FastMathFlag using the FastMathFlagBuilder *before* creating
the FCmp Instruction. This is what's done by other optimizations such as
InstCombine.
Added a test case which trips on cast assertion without this patch.
Reviewers: Ayal, mssimpso, mkuper, gilr
Reviewed by: Ayal, mssimpso
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36244
llvm-svn: 310389
results when a loop is completely removed.
This is very hard to manifest as a visible bug. You need to arrange for
there to be a subsequent allocation of a 'Loop' object which gets the
exact same address as the one which the unroll deleted, and you need the
LoopAccessAnalysis results to be significant in the way that they're
stale. And you need a million other things to align.
But when it does, you get a deeply mysterious crash due to actually
finding a stale analysis result. This fixes the issue and tests for it
by directly checking we successfully invalidate things. I have not been
able to get *any* test case to reliably trigger this. Changes to LLVM
itself caused the only test case I ever had to cease to crash.
I've looked pretty extensively at less brittle ways of fixing this and
they are actually very, very hard to do. This is a somewhat strange and
unusual case as we have a pass which is deleting an IR unit, but is not
running within that IR unit's pass framework (which is what handles this
cleanly for the normal loop unroll). And where there isn't a definitive
way to clear *all* of the stale cache entries. And where the pass *is*
updating the core analysis that provides the IR units!
For example, we don't have any of these problems with Function analyses
because it is easy to clear out function analyses when the functions
themselves may have been deleted -- we clear an entire module's worth!
But that is too heavy of a hammer down here in the LoopAnalysisManager
layer.
A better long-term solution IMO is to require that AnalysisManager's
make their keys durable to this kind of thing. Specifically, when
caching an analysis for one IR unit that is conceptually "owned" by
a higher level IR unit, the AnalysisManager should incorporate this into
its data structures so that we can reliably clear these results without
having to teach each and every pass to do so manually as we do here. But
that is a change for another day as it will be a fairly invasive change
to the AnalysisManager infrastructure. Until then, this fortunately
seems to be quite rare.
llvm-svn: 310333
The root cause of reverting was fixed - PR33514.
Summary:
The patch makes instruction count the highest priority for
LSR solution for X86 (previously registers had highest priority).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D30562
From: Evgeny Stupachenko <evstupac@gmail.com>
<evgeny.v.stupachenko@intel.com>
llvm-svn: 310289
Note the original code I deleted incorrectly listed this as (X | C1) & C2 --> (X & C2^(C1&C2)) | C1 Which is only valid if C1 is a subset of C2. This relied on SimplifyDemandedBits to remove any extra bits from C1 before we got to that code.
My new implementation avoids relying on that behavior so that it can be naively verified with alive.
Differential Revision: https://reviews.llvm.org/D36384
llvm-svn: 310272
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:
1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the
array. This array is processed only after the vectorization of the
first-after-these instructions key node is finished. Vectorization goes
in reverse order to try to vectorize as much code as possible.
Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon
Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D29826
llvm-svn: 310260
Summary:
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:
1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible.
Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon
Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D29826
llvm-svn: 310255
While here, rename `i` to `Rank` as the latter is more
self-explanatory (and this code also uses `I` two lines below to
identify an Instruction).
llvm-svn: 310238
Unfortunately, it looks like there's some other missed optimizations in the generated code for some of these cases. I'll try to look at some of those next.
llvm-svn: 310184
Previously we were always trying to emit the zext or truncate before any shift. This meant if the 'and' mask was larger than the size of the truncate we would skip the transformation.
Now we shift the result of the and right first leaving the bit within the range of the truncate.
This matches what we are doing in foldSelectICmpAndOr for the same problem.
llvm-svn: 310159
Summary:
The bug was uncovered after fix of PR23384 (part 3 of 3).
The patch restricts pointer multiplication in SCEV computaion for ICmpZero.
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D36170
From: Evgeny Stupachenko <evstupac@gmail.com>
<evgeny.v.stupachenko@intel.com>
llvm-svn: 310092
The frontend may have requested a higher alignment for any reason, and
downstream optimizations may already have taken advantage of it. We
should keep the same alignment when moving the allocation from the
parameter area to the local variable area.
Fixes PR34038
llvm-svn: 310071
Summary:
The (not (sext)) case is really (xor (sext), -1) which should have been simplified to (sext (xor, 1)) before we got here. So we shouldn't need to handle it.
With that taken care of we only need to two cases so don't need the swap anymore. This makes us in sync with the equivalent code in visitOr so inline this to match.
Reviewers: spatel, eli.friedman, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36240
llvm-svn: 310063
Name: narrow_shift
Pre: C1 < 8
%zx = zext i8 %x to i32
%l = lshr i32 %zx, C1
=>
%narrowC = trunc i32 C1 to i8
%ns = lshr i8 %x, %narrowC
%l = zext i8 %ns to i32
http://rise4fun.com/Alive/jIV
This isn't directly applicable to PR34046 as written, but we
need to have more narrowing folds like this to be sure that
rotate patterns are recognized.
llvm-svn: 310060