If the multiply won't overflow in the original type we can use a smaller mul and sign extend afterwards. We don't currently support this for vector constants.
llvm-svn: 341884
Summary:
Revert min/max changes in rL341674 dues to high compile times causing timeouts (PR38897).
Checking in to unblock failing builds. Patch available for post-commit review and re-revert once resolved.
Working on a smaller reproducer for PR38897.
Reviewers: craig.topper, spatel
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D51897
llvm-svn: 341883
LSR reassociates small constants that fit into add immediate operands as
unfolded offset. Since unfolded offset is not combined with loop-invariant
registers, LSR does not consider solutions that bump invariant registers by
these constants outside the loop.
llvm-svn: 341835
There were two combines not covered by the check before now, neither of which
actually differed from normal in the benefit analysis.
The most recent seems to be because it was just added at the top of the
function (naturally). The older is from way back in 2008 (r46687) when we just
didn't put those checks in so routinely, and has been diligently maintained
since.
llvm-svn: 341831
When GVN propagates an equality by replacing one value with another it also
needs to invalidate the cached information for the value being replaced.
Differential Revision: https://reviews.llvm.org/D51218
llvm-svn: 341820
Currently, `sinkUnusedInvariants` does not set Changed flag even if it makes
changes in the IR. There is no clear evidence that it can cause a crash, but it
looks highly suspicious and likely invalid.
Differential Revision: https://reviews.llvm.org/D51777
Reviewed By: skatkov
llvm-svn: 341777
Summary:
Like with other similar intrinsics, presense of strip or
launder.invariant.group should not change the result of inlining cost.
This is because they are just markers and do not perform any computation.
Reviewers: amharc, rsmith, reames, kuhar
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D51814
llvm-svn: 341725
AliasSetTracker has special case handling for memset, memcpy and memmove which pre-existed argmemonly on functions and readonly and writeonly on arguments. This patch generalizes it using the AA infrastructure to any call correctly annotated.
The motivation here is to cut down on confusion, not performance per se. For most instructions, there is a direct mapping to alias set. However, this is not guaranteed by the interface and was not in fact true for these three intrinsics *and only these three intrinsics*. I kept getting myself confused about this invariant, so I figured it would be good to clearly distinguish between a instructions and alias sets. Calls happened to be an easy target.
The nice side effect is that custom implementations of memset/memcpy/memmove - including wrappers discovered by IPO - can now be optimized the same as builts by LICM.
Note: The actual removal of the memset/memtransfer specific handling will happen in a follow on NFC patch. It was originally part of this one, but separate for ease of review and rebase.
Differential Revision: https://reviews.llvm.org/D50730
llvm-svn: 341713
shuf (sel (shuf NarrowCond, undef, WideMask), X, Y), undef, NarrowMask) -->
sel NarrowCond, (shuf X, undef, NarrowMask), (shuf Y, undef, NarrowMask)
The motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=38691
...is the last regression test. In that case, we're just left with the narrow select.
Note that if we do create new shuffles, they use the existing extraction identity mask,
so there's no danger that this transform creates arbitrary shuffles.
Differential Revision: https://reviews.llvm.org/D51496
llvm-svn: 341708
If the ~X wasn't able to simplify above the max/min, we might be able to simplify it by moving it below the max/min.
I had to modify the ~(min/max ~X, Y) transform to prevent getting stuck in a loop when we saw the new ~(max/min X, ~Y) before the ~Y had been folded away to remove the new not.
Differential Revision: https://reviews.llvm.org/D51398
llvm-svn: 341674
Fix a latent bug in loop vectorizer which generates incorrect code for
memory accesses that are executed conditionally. As pointed in review,
this bug definitely affects uniform loads and may affect conditional
stores that should have turned into scatters as well).
The code gen for conditionally executed uniform loads on architectures
that support masked gather instructions is broken.
Without this patch, we were unconditionally executing the *conditional*
load in the vectorized version.
This patch does the following:
1. Uniform conditional loads on architectures with gather support will
have correct code generated. In particular, the cost model
(setCostBasedWideningDecision) is fixed.
2. For the recipes which are handled after the widening decision is set,
we use the isScalarWithPredication(I, VF) form which is added in the
patch.
3. Fix the vectorization cost model for scalarization
(getMemInstScalarizationCost): implement and use isPredicatedInst to
identify *all* predicated instructions, not just scalar+predicated. So,
now the cost for scalarization will be increased for maskedloads/stores
and gather/scatter operations. In short, we should be choosing the
gather/scatter in place of scalarization on archs where it is
profitable.
4. We needed to weaken the assert in useEmulatedMaskMemRefHack.
Reviewers: Ayal, hsaito, mkuper
Differential Revision: https://reviews.llvm.org/D51313
llvm-svn: 341673
Find cold blocks based on profile information (or optionally with static analysis).
Forward propagate profile information to all cold-blocks.
Outline a cold region.
Set calling conv and prof hint for the callsite of the outlined function.
Worked in collaboration with: Sebastian Pop <s.pop@samsung.com>
Differential Revision: https://reviews.llvm.org/D50658
llvm-svn: 341669
If OtherOpT or OtherOpF have scalar types and the condition is a vector,
we would create an invalid select.
Reviewers: spatel, john.brawn, mssimpso, craig.topper
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D51781
llvm-svn: 341666
Currently eliminateInstructions only returns true if any instruction got
replaced. In the test case for this patch, we eliminate the trivially
dead calls, for which eliminateInstructions not do a replacement and the
function is not marked as changed, which is why the inliner crashes
while traversing the call graph.
Alternatively we could also change eliminateInstructions to return true
in case we mark instructions for deletion, but that's slightly more code
and doing it at the place where the replacement happens seems safer.
Fixes PR37517.
Reviewers: davide, mcrosier, efriedma, bjope
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D51169
llvm-svn: 341651
IndVars does not set `Changed` flag when it eliminates dead instructions. As result,
it may make IR modifications and report that it has done nothing. It leads to inconsistent
preserved analyzes results.
Differential Revision: https://reviews.llvm.org/D51770
Reviewed By: skatkov
llvm-svn: 341633
The patch tries to make sample profile loader independent of profile format
change. It moves compact format related code into FunctionSamples and
SampleProfileReader classes, and sample profile loader only has to interact
with those two classes and will be unaware of profile format changes.
The cleanup also contain some fixes to further remove the difference between
compactbinary format and binary format. After the cleanup using different
formats originated from the same profile will generate the same binaries,
which we verified by compiling two large server benchmarks w/wo thinlto.
Differential Revision: https://reviews.llvm.org/D51643
llvm-svn: 341591
This fold is needed to avoid a regression when we try
to recommit rL300977.
We can't see the most basic win currently because
demanded bits changes the patterns:
https://rise4fun.com/Alive/plpp
llvm-svn: 341559
Previously the alignment on the newly created global strings was not set,
meaning that DataLayout::getPreferredAlignment was free to overalign it
to 16 bytes. This caused unnecessary code bloat with the padding between
variables.
The main example of this happening was the printf->puts optimisation in
SimplifyLibCalls, but as the change here is made in
IRBuilderBase::CreateGlobalString, other globals using this will now be
aligned too.
Differential Revision: https://reviews.llvm.org/D51410
llvm-svn: 341527
I'm probably missing some way to use m_Deferred to remove the code
duplication, but that can be a follow-up.
The improvement in demand_shrink_nsw.ll is an example of missing
the fold because the pattern matching was deficient. I didn't try
to follow the bits in that test, but Alive says it's correct:
https://rise4fun.com/Alive/ugc
llvm-svn: 341426
Reland r341269. Use std::stable_sort when sorting constant condidates.
Reverting commit, r341365:
Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions
One of the tests is failing 50% of the time when expensive checks are
enabled. Not sure how deep the problem is so just reverting while the
author can investigate so that the bots stop repeatedly failing and
blaming things incorrectly. Will respond with details on the original
commit.
Original commit, r341269:
[Constant Hoisting] Hoisting Constant GEP Expressions
Leverage existing logic in constant hoisting pass to transform constant GEP
expressions sharing the same base global variable. Multi-dimensional GEPs are
rewritten into single-dimensional GEPs.
https://reviews.llvm.org/D51396
Differential Revision: https://reviews.llvm.org/D51654
llvm-svn: 341417
This is fix for PR38786.
First order recurrence phis were incorrectly treated as uniform,
which caused them to be vectorized as uniform instructions.
Patch by Ayal Zaks and Orivej Desh!
Reviewed by: Anna
Differential Revision: https://reviews.llvm.org/D51639
llvm-svn: 341416
There are 2 bugs shown here that were untested before:
1. We fail to perform the fold in 1/2 the possible commuted variants.
2. When the fold is done, it disregards extra uses.
llvm-svn: 341415
Recent change to deleteDeadBlocksFromLoop was not enough to
fix all the problems related to dead blocks after nontrivial
unswitching of switches.
We need to delete all the dead blocks that were created during
unswitching, otherwise we will keep having problems with phi's
or dead blocks.
This change removes all the dead blocks that are reachable from the loop,
not trying to track whether these blocks are newly created by unswitching
or not. While not completely correct, we are unlikely to get loose but
reachable dead blocks that do not belong to our loop nest.
It does fix all the failures currently known, in particular PR38778.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D51519
llvm-svn: 341398
The tests attempted to check for commuted variants
of these folds, but complexity-based canonicalization
meant we had no coverage for at least 1/2 of the cases.
Also, the folds correctly check hasOneUse(), but there
was no coverage for that.
llvm-svn: 341394
Summary:
Control height reduction merges conditional blocks of code and reduces the
number of conditional branches in the hot path based on profiles.
if (hot_cond1) { // Likely true.
do_stg_hot1();
}
if (hot_cond2) { // Likely true.
do_stg_hot2();
}
->
if (hot_cond1 && hot_cond2) { // Hot path.
do_stg_hot1();
do_stg_hot2();
} else { // Cold path.
if (hot_cond1) {
do_stg_hot1();
}
if (hot_cond2) {
do_stg_hot2();
}
}
This speeds up some internal benchmarks up to ~30%.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: xbolva00, dmgreen, mehdi_amini, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D50591
llvm-svn: 341386
One of the tests is failing 50% of the time when expensive checks are
enabled. Not sure how deep the problem is so just reverting while the
author can investigate so that the bots stop repeatedly failing and
blaming things incorrectly. Will respond with details on the original
commit.
llvm-svn: 341365
Load Hardening.
Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.
Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.
While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.
This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.
Differential Revision: https://reviews.llvm.org/D51157
llvm-svn: 341363
This patch removes the function `expandSCEVIfNeeded` which behaves not as
it was intended. This function tries to make a lookup for exact existing expansion
and only goes to normal expansion via `expandCodeFor` if this lookup hasn't found
anything. As a result of this, if some instruction above the loop has a `SCEVConstant`
SCEV, this logic will return this instruction when asked for this `SCEVConstant` rather
than return a constant value. This is both non-profitable and in some cases leads to
breach of LCSSA form (as in PR38674).
Whether or not it is possible to break LCSSA with this algorithm and with some
non-constant SCEVs is still in question, this is still being investigated. I wasn't
able to construct such a test so far, so maybe this situation is impossible. If it is,
it will go as a separate fix.
Rather than do it, it is always correct to just invoke `expandCodeFor` unconditionally:
it behaves smarter about insertion points, and as side effect of this it will choose a
constant value for SCEVConstants. For other SCEVs it may end up finding a better insertion
point. So it should not be worse in any case.
NOTE: So far the only known case for which this transform may break LCSSA is mapping
of SCEVConstant to an instruction. However there is a suspicion that the entire algorithm
can compromise LCSSA form for other cases as well (yet not proved).
Differential Revision: https://reviews.llvm.org/D51286
Reviewed By: etherzhhb
llvm-svn: 341345
The fold was implemented for the general case but use-limitation,
but the later constant version which didn't check uses was only
matching splat constants.
llvm-svn: 341292
If we have a pair of binops feeding another pair of binops, rearrange the operands so
the matching pair are together because that allows easy factorization folds to happen
in instcombine:
((X << S) & Y) & (Z << S) --> ((X << S) & (Z << S)) & Y (reassociation)
--> ((X & Z) << S) & Y (factorize shift from 'and' ops optimization)
This is part of solving PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098
Note that there's an instcombine version of this patch attached there, but we're trying
to make instcombine have less responsibility to improve compile-time efficiency.
For reasons I still don't completely understand, reassociate does this kind of transform
sometimes, but misses everything in my motivating cases.
This patch on its own is gluing an independent cleanup chunk to the end of the existing
RewriteExprTree() loop. We can build on it and do something stronger to better order the
full expression tree like D40049. That might be an alternative to the proposal to add a
separate reassociation pass like D41574.
Differential Revision: https://reviews.llvm.org/D45842
llvm-svn: 341288
Leverage existing logic in constant hoisting pass to transform constant GEP
expressions sharing the same base global variable. Multi-dimensional GEPs are
rewritten into single-dimensional GEPs.
Differential Revision: https://reviews.llvm.org/D51396
llvm-svn: 341269
It has essentially the same benefit it has on 64-bit ARM: it
substantially reduces the number of constants used by large GEP
operations. Seems to be generally helpful across a few different
codebases I've tried.
Differential Revision: https://reviews.llvm.org/D51462
llvm-svn: 341136
Summary:
Currently, the SafeStack analysis disallows out-of-bounds writes but not
out-of-bounds reads for mem intrinsics like llvm.memcpy. This could
cause leaks of pointers to the safe stack by leaking spilled registers/
frame pointers. Check for allocas used as source or destination pointers
to mem intrinsics.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: pcc, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D51334
llvm-svn: 341116
Generalize the simplification of `pow(2.0, y)` to `pow(2.0 ** n, y)` for all
scalar and vector types.
This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as
252.eon, 447.dealII, 453.povray. Otherwise, no significant regressions on
x86-64 or A64.
Differential revision: https://reviews.llvm.org/D49273
llvm-svn: 341095
Splitting an alloca can decrease the alignment of GEPs into the
partition. Normally, rewriting accounts for this, but the code was
missing for uses of PHI nodes and select instructions.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38707 .
Differential Revision: https://reviews.llvm.org/D51335
llvm-svn: 341094
We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've remove the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or user passing -mattr=+64bit.
I've changed the assert to a report_fatal_error so it's not lost in Release builds.
The test updates are to fix things that tripped the new error.
Differential Revision: https://reviews.llvm.org/D51231
llvm-svn: 341022
The cost modeling was not accounting for the fact we were duplicating the instruction once per predecessor. With a default threshold of 1, this meant we were actually creating #pred copies.
Adding to the fun, there is *absolutely no* test coverage for this. Simply bailing for more than one predecessor passes all checked in tests.
llvm-svn: 341001
Teach LICM to hoist stores out of loops when the store writes to a location otherwise unused in the loop, writes a value which is invariant, and is guaranteed to execute if the loop is entered.
Worth noting is that this transformation is partially overlapping with the existing promotion transformation. Reasons this is worthwhile anyway include:
* For multi-exit loops, this doesn't require duplication of the store.
* It kicks in for case where we can't prove we exit through a normal exit (i.e. we may throw), but can prove the store executes before that possible side exit.
Differential Revision: https://reviews.llvm.org/D50925
llvm-svn: 340974
Summary:
Assert from PR38737 happens on the dead block inside the parent loop
after unswitching nontrivial switch in the inner loop.
deleteDeadBlocksFromLoop now takes extra care to detect/remove dead
blocks in all the parent loops in addition to the blocks from original
loop being unswitched.
Reviewers: asbirlea, chandlerc
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51415
llvm-svn: 340955
This is a follow-up to rL339604 which did the same transform
for a sin libcall. The handling of intrinsics vs. libcalls
is unfortunately scattered, so I'm just adding this next to
the existing transform for llvm.cos for now.
This should resolve PR38458:
https://bugs.llvm.org/show_bug.cgi?id=38458
If the call was already negated, the negates will cancel
each other out.
llvm-svn: 340952
Expand the simplification of `pow(exp{,2}(x), y)` to all FP types.
This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as
252.eon, 447.dealII, 453.povray. Otherwise, no significant regressions on
x86-64 or A64.
Differential revision: https://reviews.llvm.org/D51195
llvm-svn: 340948
Generalize the simplification of `pow(2.0, y)` to `pow(2.0 ** n, y)` for all
scalar and vector types.
This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as
252.eon, 447.dealII, 453.povray. Otherwise, no significant regressions on
x86-64 or A64.
Differential revision: https://reviews.llvm.org/D49273
llvm-svn: 340947
In the PR, LoopSink was trying to sink into a catchswitch block, which
doesn't have a valid insertion point.
Differential Revision: https://reviews.llvm.org/D51307
llvm-svn: 340900
In Thumb1, legal imm range is [0, 255] for ADD/SUB instructions. However, the
legal imm range for LD/ST in (R+Imm) addressing mode is [0, 127]. Imms in
[128, 255] are materialized by mov R, #imm, and LD/STs use them in (R+R)
addressing mode.
This patch checks if a constant is used as offset in (R+Imm), if so, it checks
isLegalAddressingMode passing the constant value as BaseOffset.
Differential Revision: https://reviews.llvm.org/D50931
llvm-svn: 340882
Summary:
Sometimes reading an output *.ll file it is not easy to understand why some callsites are not inlined. We can read output of inline remarks (option --pass-remarks-missed=inline) and try correlating its messages with the callsites.
An easier way proposed by this patch is to add to every callsite processed by Inliner an attribute with the latest message that describes the cause of not inlining this callsite. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//.
For example in the provided test the result method //@test1// has two callsites //@bar// and inline remarks report different inlining missed reasons:
remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6)
remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive
It is not clear which remark correspond to which callsite. With the inline remark attribute enabled we get the reasons attached to their callsites:
define void @test1() {
call void @bar(i1 true) #0
call void @bar(i1 false) #2
ret void
}
attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" }
..
attributes #2 = { "inline-remark"="(cost=never): recursive" }
Patch by: yrouban (Yevgeny Rouban)
Reviewers: xbolva00, tejohnson, apilipenko
Reviewed By: xbolva00, tejohnson
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D50435
llvm-svn: 340834
Summary:
This fixes PR31105.
There is code trying to delete dead code that does so by e.g. checking if
the single predecessor of a block is the block itself.
That check fails on a block like this
bb:
br i1 undef, label %bb, label %bb
since that has two (identical) predecessors.
However, after the check for dead blocks there is a call to
ConstantFoldTerminator on the basic block, and that call simplifies the
block to
bb:
br label %bb
Therefore we now do the call to ConstantFoldTerminator before the check if
the block is dead, so it can realize that it really is.
The original behavior lead to the block not being removed, but it was
simplified as above, and then we did a call to
Dest->replaceAllUsesWith(&*I);
with old and new being equal, and an assertion triggered.
Reviewers: chandlerc, fhahn
Reviewed By: fhahn
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D51280
llvm-svn: 340820
This patch issues an error message if Darwin ABI is attempted with the PPC
backend. It also cleans up existing test cases, either converting the test to
use an alternative triple or removing the test if the coverage is no longer
needed.
Updated Tests
-------------
The majority of test cases were updated to use a different triple that does not
include the Darwin ABI. Many tests were also updated to use FileCheck, in place
of grep.
Deleted Tests
-------------
llvm/test/tools/dsymutil/PowerPC/sibling.test was originally added to test
specific functionality of dsymutil using an object file created with an old
version of llvm-gcc for a Powerbook G4. After a discussion with @JDevlieghere he
suggested removing the test.
llvm/test/CodeGen/PowerPC/combine_loads_from_build_pair.ll was converted from a
PPC test to a SystemZ test, as the behavior is also reproducible there.
All other tests that were deleted were specific to the darwin/ppc ABI and no
longer necessary.
Phabricator Review: https://reviews.llvm.org/D50988
llvm-svn: 340795
This lines up with the behavior of an existing transform where if both
operands of the binop are shuffled, we allow moving the binop before the
shuffle regardless of whether the shuffle changes the size of the vector.
llvm-svn: 340787
Fix the issue of duplicating the call to `exp{,2}()` when it's nested in
`pow()`, as exposed by rL340462.
Differential revision: https://reviews.llvm.org/D51194
llvm-svn: 340784
This reverts r319889.
Unfortunately, wrapping flags are not a part of SCEV's identity (they
do not participate in computing a hash value or in equality
comparisons) and in fact they could be assigned after the fact w/o
rebuilding a SCEV.
Grep for const_cast's to see quite a few of examples, apparently all
for AddRec's at the moment.
So, if 2 expressions get built in 2 slightly different ways: one with
flags set in the beginning, the other with the flags attached later
on, we may end up with 2 expressions which are exactly the same but
have their operands swapped in one of the commutative N-ary
expressions, and at least one of them will have "sorted by complexity"
invariant broken.
2 identical SCEV's won't compare equal by pointer comparison as they
are supposed to.
A real-world reproducer is added as a regression test: the issue
described causes 2 identical SCEV expressions to have different order
of operands and therefore compare not equal, which in its turn
prevents LoadStoreVectorizer from vectorizing a pair of consecutive
loads.
On a larger example (the source of the test attached, which is a
bugpoint) I have seen even weirder behavior: adding a constant to an
existing SCEV changes the order of the existing terms, for instance,
getAddExpr(1, ((A * B) + (C * D))) returns (1 + (C * D) + (A * B)).
Differential Revision: https://reviews.llvm.org/D40645
llvm-svn: 340777
Summary:
Patch by Marek Olsak and David Stuttard, both of AMD.
This adds a new amdgcn intrinsic supporting s.buffer.load, in particular
multiple dword variants. These are convenient to use from some front-end
implementations.
Also modified the existing llvm.SI.load.const intrinsic to common up the
underlying implementation.
This modification also requires that we can lower to non-uniform loads correctly
by splitting larger dword variants into sizes supported by the non-uniform
versions of the load.
V2: Addressed minor review comments.
V3: i1 glc is now i32 cachepolicy for consistency with buffer and
tbuffer intrinsics, plus fixed formatting issue.
V4: Added glc test.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51098
Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b
llvm-svn: 340684
Back in https://reviews.llvm.org/D19559, I tried to teach CVP about range facts implied by value/value icmps (i.e. no constants.) In the meantime, we've implemented the optimization, but I couldn't find tests checked in, so adding them.
llvm-svn: 340660
Otherwise, the debug info is incorrect. On its own, this is mostly
harmless, but the safe-stack also later inlines the call to
__safestack_pointer_address, which leads to debug info with the wrong
scope, which eventually causes an assertion failure (and incorrect debug
info in release mode).
Differential Revision: https://reviews.llvm.org/D51075
llvm-svn: 340651
Summary:
Sometimes reading an output *.ll file it is not easy to understand why some callsites are not inlined. We can read output of inline remarks (option --pass-remarks-missed=inline) and try correlating its messages with the callsites.
An easier way proposed by this patch is to add to every callsite processed by Inliner an attribute with the latest message that describes the cause of not inlining this callsite. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//.
For example in the provided test the result method //@test1// has two callsites //@bar// and inline remarks report different inlining missed reasons:
remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6)
remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive
It is not clear which remark correspond to which callsite. With the inline remark attribute enabled we get the reasons attached to their callsites:
define void @test1() {
call void @bar(i1 true) #0
call void @bar(i1 false) #2
ret void
}
attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" }
..
attributes #2 = { "inline-remark"="(cost=never): recursive" }
Patch by: yrouban (Yevgeny Rouban)
Reviewers: xbolva00, tejohnson, apilipenko
Reviewed By: xbolva00, tejohnson
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D50435
llvm-svn: 340618
Once the invariant_start is reached, we know that no instruction *after* it can modify the memory. So, if we can prove the location isn't read *between entry into the loop and the execution of the invariant_start*, we can execute the invariant_start before entering the loop.
Differential Revision: https://reviews.llvm.org/D51181
llvm-svn: 340617
This patch makes the DoesKMove argument non-optional, to force people
to think about it. Most cases where it is false are either code hoisting
or code sinking, where we pick one instruction from a set of
equal instructions among different code paths.
Reviewers: dberlin, nlopes, efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D47475
llvm-svn: 340606
When GVN sets the incoming value for a phi to undef because the incoming block
is unreachable it needs to also invalidate the cached info for that phi in
MemoryDependenceAnalysis, otherwise later queries will return stale information.
Differential Revision: https://reviews.llvm.org/D51099
llvm-svn: 340529
This version of the patch fixes cleaning up ssa_copy intrinsics, so it does not
crash for instructions in blocks that have been marked unreachable.
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 340525
If we have a min/max pair we can do a better job of counting sign bits if we look at them together. This is similar to what is done in the SelectionDAG version of computeNumSignBits for ISD::SMAX/SMIN.
Differential Revision: https://reviews.llvm.org/D51112
llvm-svn: 340480
Instead of asserting that the function doesn't have any unreachable
code, just ignore it for the purpose of computing liveness.
Differential Revision: https://reviews.llvm.org/D51070
llvm-svn: 340456
Summary:
Add MemorySSA as a dependency to LoopSimplifyCFG and preserve it.
Disabled by default until all passes preserve MemorySSA.
Reviewers: bogner, chandlerc
Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
Differential Revision: https://reviews.llvm.org/D50911
llvm-svn: 340445
Summary:
Add MemorySSA as a depency to LoopInstInstSimplify and preserve it.
Disabled by default until all passes preserve MemorySSA.
Reviewers: chandlerc
Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
Differential Revision: https://reviews.llvm.org/D50906
llvm-svn: 340444
Guard widening should not spend efforts on dealing with guards with trivial true/false conditions.
Such guards can easily be eliminated by any further cleanup pass like instcombine. However we
should not unconditionally delete them because it may be profitable to widen other conditions
into such guards.
Differential Revision: https://reviews.llvm.org/D50247
Reviewed By: fedor.sergeev
llvm-svn: 340381
CodeGenPrepare has a strategy for moving dbg.values so that a value's
definition always dominates its debug users. This cleanup was happening
too early (before certain CGP transforms were run), resulting in some
dbg.value use-before-def errors.
Perform this cleanup as late as possible to avoid use-before-def.
llvm-svn: 340370
This test shows that optimizeSelectInst splits a select and sinks a
`fdiv` operation to one side of the diamond. However, the dbg.value for
the operation isn't moved.
llvm-svn: 340369
This is preparation for landing a use-before-def verifier for debug
intrinsics (D46100).
As a drive-by, remove `tail` from debug intrinsic calls because it
doesn't mean anything in that context.
llvm-svn: 340366
Currently CodeExtractor tries to use the next node after an invoke to
place the store for the result of the invoke, if it is an out parameter
of the region. This fails, as the invoke terminates the current BB.
In that case, we can place the store in the 'normal destination' BB, as
the result will only be available in that case.
Reviewers: davidxl, davide, efriedma
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D51037
llvm-svn: 340331
Currently we assign the same value number to two calls reading the same
memory location if we do not have MemoryDependence info. Without MemDep
Info we cannot guarantee that there is no store between the two calls, so we
have to assign a new number to the second call.
It also adds a new option EnableMemDep to enable/disable running
MemoryDependenceAnalysis and also renamed NoLoads to NoMemDepAnalysis to
be more explicit what it does. As it also impacts calls that read memory,
NoLoads is a bit confusing.
Reviewers: efriedma, sebpop, john.brawn, wmi
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D50893
llvm-svn: 340319
Remove duplicate tests from InstCombine that were added with
D50582. I left negative tests there to verify that nothing
in InstCombine tries to go overboard. If isKnownNeverNaN is
improved to handle the FP binops or other cases, we should
have coverage under InstSimplify, so we could remove more
duplicate tests from InstCombine at that time.
llvm-svn: 340279
Summary:
Follow up change to rL339703, where we now vectorize loops with non-phi
instructions used outside the loop. Note that the cyclic dependency
identification occurs when identifying reduction/induction vars.
We also need to identify that we do not allow users where the PSCEV information
within and outside the loop are different. This was the fix added in rL307837
for PR33706.
Reviewers: Ayal, mkuper, fhahn
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D50778
llvm-svn: 340278
This is a slight modification of the tests from D50582;
change half of the predicates to 'uno' so we have coverage
for that side too. All of the positive tests can fold to a
constant (true/false), so that should happen in instsimplify.
llvm-svn: 340276
This patch teaches LICM to hoist guards from the loop if they are guaranteed to execute and
if there are no side effects that could prevent that.
Differential Revision: https://reviews.llvm.org/D50501
Reviewed By: reames
llvm-svn: 340256
These intrinsics are modelled as writing for control flow purposes, but they don't actually write to any location. Marking these - as we did for guards - allows LICM to hoist loads out of loops containing invariant.starts.
Differential Revision: https://reviews.llvm.org/D50861
llvm-svn: 340245
NewGVN uses InstructionSimplify for simplifications of leaders of
congruence classes. It is not guaranteed that the metadata or other
flags/keywords (like nsw or exact) of the leader is available for all members
in a congruence class, so we cannot use it for simplification.
This patch adds a InstrInfoQuery struct with a boolean field
UseInstrInfo (which defaults to true to keep the current behavior as
default) and a set of helper methods to get metadata/keywords for a
given instruction, if UseInstrInfo is true. The whole thing might need a
better name, to avoid confusion with TargetInstrInfo but I am not sure
what a better name would be.
The current patch threads through InstrInfoQuery to the required
places, which is messier then it would need to be, if
InstructionSimplify and ValueTracking would share the same Query struct.
The reason I added it as a separate struct is that it can be shared
between InstructionSimplify and ValueTracking's query objects. Also,
some places do not need a full query object, just the InstrInfoQuery.
It also updates some interfaces that do not take a Query object, but a
set of optional parameters to take an additional boolean UseInstrInfo.
See https://bugs.llvm.org/show_bug.cgi?id=37540.
Reviewers: dberlin, davide, efriedma, sebpop, hiraditya
Reviewed By: hiraditya
Differential Revision: https://reviews.llvm.org/D47143
llvm-svn: 340031
This patch performs a widening transformation of bitwise atomicrmw
{or,xor,and} and applies it prior to tryExpandAtomicRMW. This operates
similarly to convertCmpXchgToIntegerType. For these operations, the i8/i16
atomicrmw can be implemented in terms of the 32-bit atomicrmw by appropriately
manipulating the operands. There is no functional change for the handling of
partword or/xor, but the transformation for partword 'and' is new.
The advantage of performing this transformation early is that the same
code-path can be used regardless of the approach used to expand the atomicrmw
(AtomicExpansionKind). i.e. the same logic is used for
AtomicExpansionKind::CmpXchg and can also be used by the intrinsic-based
expansion in D47882.
Differential Revision: https://reviews.llvm.org/D48129
llvm-svn: 340027
Summary:
Currently, in LICM, we use the alias set tracker to identify if the
instruction (we're interested in hoisting) aliases with instruction that
modifies that memory location.
This patch adds an LICM alias analysis diagnostic tool that checks the
mod ref info of the instruction we are interested in hoisting/sinking,
with every instruction in the loop. Because of O(N^2) complexity this
is now only a diagnostic tool to show the limitation we have with the
alias set tracker and is OFF by default.
Test cases show the difference with the diagnostic analysis tool, where
we're able to hoist out loads and readonly + argmemonly calls from the
loop, where the alias set tracker analysis is not able to hoist these
instructions out.
Reviewers: reames, mkazantsev, fedor.sergeev, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50854
llvm-svn: 340026
The description of `isGuaranteedToExecute` does not correspond to its implementation.
According to description, it should return `true` if an instruction is executed under the
assumption that its loop is *entered*. However there is a sophisticated alrogithm inside
that tries to prove that the instruction is executed if the loop is *exited*, which is not the
same thing for infinite loops. There is an attempt to protect from dealing with infinite loops
by prohibiting loops without exit blocks, however an infinite loop can have exit blocks.
As result of that, MustExecute can falsely consider some blocks that are never entered as
mustexec, and LICM can hoist dangerous instructions out of them basing on this fact.
This may introduce UB to programs which did not contain it initially.
This patch removes the problematic algorithm and replaced it with a one which tries to
prove what is required in description.
Differential Revision: https://reviews.llvm.org/D50558
Reviewed By: reames
llvm-svn: 339984
This is a follow-up suggested with rL339604.
For tan(), we don't have a corresponding LLVM
intrinsic -- unlike sin/cos -- so this is the
only way/place that we can do this fold currently.
llvm-svn: 339958
The fix is fairly simple, but is says something unpleasant about the usage and testing of invariant.start/end scopes that this went undetected. To put this in perspective, *any* invariant.end in a loop flowing through LICM crashed. I haven't bothered to figure out just how far back this goes, but it's not caused by any of the recent changes. We're probably talking months if not years.
llvm-svn: 339936
Expand the number of cases when `pow(x, 0.5)` is simplified into `sqrt(x)`
by considering the math semantics with more granularity.
Differential revision: https://reviews.llvm.org/D50036
llvm-svn: 339887
This patch fixes PR38125.
Instruction extension types are recorded in PromotedInsts, it can be used later in function canGetThrough. If an instruction has two users with different extension types, it will be inserted into PromotedInsts two times in function promoteOperandForOther. The second one overwrites the first one, and the final extension type is wrong, later causes problem in canGetThrough.
This patch changes the simple bool extension type to 2-bit enum type, add a BothExtension type in addition to zero/sign extension. When an user sees BothExtension for an instruction, it actually knows nothing about how that instruction is extended.
Differential Revision: https://reviews.llvm.org/D49512
llvm-svn: 339822
The `experimental_guard` intrinsic has memory write semantics to model the thread-exiting
logic, but does not do any actual writes to memory. Currently, `AliasSetTracker` treats it as a
normal memory write. As result, a loop-invariant load cannot be hoisted out of loop because
the guard may possibly alias with it.
This patch makes `AliasSetTracker` so that it doesn't treat guards as memory writes.
Differential Revision: https://reviews.llvm.org/D50497
Reviewed By: reames
llvm-svn: 339753
Summary:
This patch teaches the loop vectorizer to vectorize loops with non
header phis that have have outside uses. This is because the iteration
dependence distance for these phis can be widened upto VF (similar to
how we do for induction/reduction) if they do not have a cyclic
dependence with header phis. When identifying reduction/induction/first
order recurrence header phis, we already identify if there are any cyclic
dependencies that prevents vectorization.
The vectorizer is taught to extract the last element from the vectorized
phi and update the scalar loop exit block phi to contain this extracted
element from the vector loop.
This patch can be extended to vectorize loops where instructions other
than phis have outside uses.
Reviewers: Ayal, mkuper, mssimpso, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50579
llvm-svn: 339703
Summary:
Calls marked 'tail' cannot read or write allocas from the current frame
because the current frame might be destroyed by the time they run.
However, a tail call may use an alloca with byval. Calling with byval
copies the contents of the alloca into argument registers or stack
slots, so there is no lifetime issue. Tail calls never modify allocas,
so we can return just ModRefInfo::Ref.
Fixes PR38466, a longstanding bug.
Reviewers: hfinkel, nlewycky, gbiv, george.burgess.iv
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D50679
llvm-svn: 339636
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}
General pattern:
X & Y
Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e. `%arg & 4294967168` can be either `4294967168` or `0`
Pattern can be one of:
%t = add i32 %arg, 128
%r = icmp ult i32 %t, 256
Or
%t0 = shl i32 %arg, 24
%t1 = ashr i32 %t0, 24
%r = icmp eq i32 %t1, %arg
Or
%t0 = trunc i32 %arg to i8
%t1 = sext i8 %t0 to i32
%r = icmp eq i32 %t1, %arg
This pattern is a signed truncation check.
And `X` is checking that some bit in that same mask is zero.
I.e. can be one of:
%r = icmp sgt i32 %arg, -1
Or
%t = and i32 %arg, 2147483648
%r = icmp eq i32 %t, 0
Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
%r = icmp ult i32 %arg, 128
The transform itself ended up being rather horrible, even though i omitted some cases.
Surely there is some infrastructure that can help clean this up that i missed?
https://rise4fun.com/Alive/3Ou
The initial commit (rL339610)
was reverted, since the first assert was being triggered.
The @positive_with_extra_and test now has coverage for that case.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: RKSimon, erichkeane, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D50465
llvm-svn: 339621
Even though this code is below a function called optimizeFloatingPointLibCall(),
we apparently can't guarantee that we're dealing with FPMathOperators, so bail
out immediately if that's not true.
llvm-svn: 339618
At least one buildbot was able to actually trigger that assert
on the top of the function. Will investigate.
This reverts commit r339610.
llvm-svn: 339612
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}
General pattern:
X & Y
Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e. `%arg & 4294967168` can be either `4294967168` or `0`
Pattern can be one of:
%t = add i32 %arg, 128
%r = icmp ult i32 %t, 256
Or
%t0 = shl i32 %arg, 24
%t1 = ashr i32 %t0, 24
%r = icmp eq i32 %t1, %arg
Or
%t0 = trunc i32 %arg to i8
%t1 = sext i8 %t0 to i32
%r = icmp eq i32 %t1, %arg
This pattern is a signed truncation check.
And `X` is checking that some bit in that same mask is zero.
I.e. can be one of:
%r = icmp sgt i32 %arg, -1
Or
%t = and i32 %arg, 2147483648
%r = icmp eq i32 %t, 0
Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
%r = icmp ult i32 %arg, 128
https://rise4fun.com/Alive/3Ou
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: RKSimon, erichkeane, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D50465
llvm-svn: 339610
Added a test case to reduction showing where it's illegal to identify
vectorize a loop.
Resetting the reduction var during loop iterations disallows us from
widening the dependency cycle to VF, thereby making it illegal to
vectorize the loop.
llvm-svn: 339605
This is a very partial fix for the reported problem. I suspect
we do not get this fold in most motivating cases because most of
the time, the libcall would have been replaced by an intrinsic,
and that optimization is handled elsewhere...but maybe it should
be handled here?
llvm-svn: 339604
This is a second part of D49974 that handles widening of conditional branches that
have very likely `false` branch.
Differential Revision: https://reviews.llvm.org/D50040
Reviewed By: reames
llvm-svn: 339537
Summary:
We've supported constant folding for sse versions for many years. This patch adds support for the avx512 versions including unsigned with the default rounding mode. We could probably do more with other roundings modes and SAE in the future.
The test cases are largely based on the sse.ll test cases. But I did add some test cases to ensure the unsigned versions don't accept negative values. Also checked the bounds of f64->i32 conversions to make sure unsigned has a larger positive range than signed.
Reviewers: RKSimon, spatel, chandlerc
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50553
llvm-svn: 339529
Try to improve the computed counts when it has been explicitly set by a pragma
or command line option. This moves the code around, so that first call to
computeUnrollCount to get a sensible count and override that if explicit unroll
and jam counts are specified.
Also added some extra debug messages for when unroll and jamming is disabled.
Differential Revision: https://reviews.llvm.org/D50075
llvm-svn: 339501
If we have an assume which is known to execute and whose operand is invariant, we can lift that into the pre-header. So long as we don't change which paths the assume executes on, this is a legal transformation. It's likely to be a useful canonicalization as other transforms only look for dominating assumes.
Differential Revision: https://reviews.llvm.org/D50364
llvm-svn: 339481
This includes a test that would have exposed the bug in rL339439
which was reverted at rL339446. The compare can be integer while
the binop is FP or vice-versa, so we need to use the binop type
when we ask for the identity constant.
llvm-svn: 339453
MemorySSA currently creates MemoryAccesses for lifetime intrinsics, and
sometimes treats them as clobbers. This may/may not be the best way
forward, but while we're doing it, we should consider
MayAlias/PartialAlias to be clobbers.
The ideal fix here is probably to remove all of this reasoning about
lifetimes from MemorySSA + put it into the passes that need to care. But
that's a wayyy broader fix that needs some consensus, and we have
miscompiles + a release branch today, and this should solve the
miscompiles just as well.
differential revision is D43269. Landing without an explicit LGTM (and
without using the special please-autoclose-this syntax) so we can still
use that revision as a place to decide what the right fix here is.
llvm-svn: 339411
The motivating case is an otherwise dead loop with a fence in it. At the moment, this goes all the way through the optimizer and we end up emitting an entirely pointless loop on x86. This case may seem a bit contrived, but we've seen it in real code as the result of otherwise reasonable lowering strategies combined w/thread local memory optimizations (such as escape analysis).
To handle this simple case, we can teach LICM to hoist must execute fences when there is no other memory operation within the loop.
Differential Revision: https://reviews.llvm.org/D50489
llvm-svn: 339378
Summary:
LoopSimplifyCFG should update ScEv for all loops after a block is deleted.
If the deleted block "Succ" is part of L, then it is part of all parent loops, so forget topmost loop.
Reviewers: greened, mkazantsev, sanjoy
Subscribers: jlebar, javed.absar, uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D50422
llvm-svn: 339363
The inalloca parameter has to be the only parameter passed in memory.
Changing the convention to fastcc can break that.
At some point we should teach global opt how to optimize ABI attributes
like inalloca and maybe byval. These attributes are mainly used to match
C ABIs. They are harder for LLVM to optimize and they don't always
generate the best code.
Fixes PR38487
llvm-svn: 339360
The main interesting case is a fence in an otherwise dead loop or one containing only arithmetic. This can happen as a result of DSE or other transforms from seemingly reasonable initial IR.
llvm-svn: 339310
The scalar cases are handled in instcombine's internal
reassociation pass for FP ops, but it misses the vector types.
These patterns are similar to what was handled in InstSimplify in:
https://reviews.llvm.org/rL339171https://reviews.llvm.org/rL339174https://reviews.llvm.org/rL339176
...but we can't use instsimplify on these because we require negation
of the original operand.
llvm-svn: 339263
This accounts for the missing IR fold noted in D50195. We don't need any fast-math to enable the negation transform.
FP negation can always be folded into an fmul/fdiv constant to eliminate the fneg.
I've limited this to one-use to ensure that we are eliminating an instruction rather than replacing fneg by a
potentially expensive fdiv or fmul.
Differential Revision: https://reviews.llvm.org/D50417
llvm-svn: 339248
Summary:
https://rise4fun.com/Alive/IT3
Comes up in the [most ugliest] `signed int` -> `signed char` case of
`-fsanitize=implicit-conversion` (https://reviews.llvm.org/D50250)
Previously, we were stuck with `not`: {F6867736}
But now we are able to completely get rid of it: {F6867737}
(FIXME: why are we loosing the metadata? that seems wrong/strange.)
Here, we only want to do that it we will be able to completely
get rid of that 'not'.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: vsk, erichkeane, llvm-commits
Differential Revision: https://reviews.llvm.org/D50301
llvm-svn: 339243