Summary:
This fixes a build break with r331269.
test/Transforms/LoopVectorize/pr23997.ll
should be in:
test/Transforms/LoopVectorize/X86/pr23997.ll
llvm-svn: 331281
Summary:
This is a fix for PR23997.
The loop vectorizer is not preserving the inbounds property of GEPs that it creates.
This is inhibiting some optimizations. This patch preserves the inbounds property in
the case where a load/store is being fed by an inbounds GEP.
Reviewers: mkuper, javed.absar, hsaito
Reviewed By: hsaito
Subscribers: dcaballe, hsaito, llvm-commits
Differential Revision: https://reviews.llvm.org/D46191
llvm-svn: 331269
phi is on lhs of a comparison op.
For the following testcase,
L1:
%t0 = add i32 %m, 7
%t3 = icmp eq i32* %t2, null
br i1 %t3, label %L3, label %L2
L2:
%t4 = load i32, i32* %t2, align 4
br label %L3
L3:
%t5 = phi i32 [ %t0, %L1 ], [ %t4, %L2 ]
%t6 = icmp eq i32 %t0, %t5
br i1 %t6, label %L4, label %L5
We know if we go through the path L1 --> L3, %t6 should always be true. However
currently, if the rhs of the eq comparison is phi, JumpThreading fails to
evaluate %t6 to true. And we know that Instcombine cannot guarantee always
canonicalizing phi to the left hand side of the comparison operation according
to the operand priority comparison mechanism in instcombine. The patch handles
the case when rhs of the comparison op is a phi.
Differential Revision: https://reviews.llvm.org/D46275
llvm-svn: 331266
instcombine should transform the relevant cases if the OverflowingBinaryOperator/PossiblyExactOperator can be proven to be safe.
Change-Id: I7aec62a31a894e465e00eb06aed80c3ea0c9dd45
llvm-svn: 331265
This test had values that differed in only in capitalization,
and that causes problems for the auto-generating check line
script. So I changed that in rL331226, but I accidentally
forgot to change a subsequent use of a param.
llvm-svn: 331228
Summary:
As discussed in D45733, we want to do this in InstCombine.
https://rise4fun.com/Alive/LGk
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: chandlerc, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D45867
llvm-svn: 331205
Summary:
Masked merge has a pattern of: `((x ^ y) & M) ^ y`.
But, there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`,
We should canonicalize the pattern to non-inverted mask.
https://rise4fun.com/Alive/Yol
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45664
llvm-svn: 331112
Summary:
Masked merge has a pattern of: `((x ^ y) & M) ^ y`.
But, there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`,
We should canonicalize the pattern to non-inverted mask.
Differential Revision: https://reviews.llvm.org/D45663
llvm-svn: 331111
The effect of doing so is not disrupting the LoopPassManager when mixing this pass with other loop passes. This should help locality of access substaintially and avoids the cost of computing PostDom.
The assumption here is that the full GuardWidening (which does use PostDom) is run as a canonicalization before loop opts and that this version is just catching cases exposed by other loop passes. (i.e. LoopPredication, IndVarSimplify, LoopUnswitch, etc..)
llvm-svn: 331094
This patch adds support for fragment expressions
TryToShrinkGlobalToBoolean() which were previously just dropped.
Thanks to Reid Kleckner for providing me a reproducer!
llvm-svn: 331086
Summary:
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the `LHS` and `RHS` matchers:
1. match `RHS` matcher to the `first` operand of binary operator,
2. and then match `LHS` matcher to the `second` operand of binary operator.
This works ok.
But it complicates writing of commutative matchers, where one would like to match
(`m_Value()`) the value on one side, and use (`m_Specific()`) it on the other side.
This is additionally complicated by the fact that `m_Specific()` stores the `Value *`,
not `Value **`, so it won't work at all out of the box.
The last problem is trivially solved by adding a new `m_c_Specific()` that stores the
`Value **`, not `Value *`. I'm choosing to add a new matcher, not change the existing
one because i guess all the current users are ok with existing behavior,
and this additional pointer indirection may have performance drawbacks.
Also, i'm storing pointer, not reference, because for some mysterious-to-me reason
it did not work with the reference.
The first one appears trivial, too.
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the ~~`LHS` and `RHS` matchers~~ **operands**:
1. match ~~`RHS`~~ **`LHS`** matcher to the ~~`first`~~ **`second`** operand of binary operator,
2. and then match ~~`LHS`~~ **`RHS`** matcher to the ~~`second`~ **`first`** operand of binary operator.
Surprisingly, `$ ninja check-llvm` still passes with this.
But i expect the bots will disagree..
The motivational unittest is included.
I'd like to use this in D45664.
Reviewers: spatel, craig.topper, arsenm, RKSimon
Reviewed By: craig.topper
Subscribers: xbolva00, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D45828
llvm-svn: 331085
As suggested in D45842
(although still not sure if we're going to advance that),
we must invalidate references to instructions that have
been recycled (operands were changed, so result is different).
llvm-svn: 331083
We currently have a hard to solve analysis problem around the order of instructions within a potentially throwing block. We can't cheaply determine whether a given instruction is before the first potential throw in the block. While we're working on that in the background, special case the first instruction within the header.
why this particular special case? Well, headers are guaranteed to execute if the loop does, and it turns out we tend to produce this form in practice.
In a follow on patch, I tend to extend LICM with an alternate approach which works for any instruction in the header before the first throw, but this is the best I can come up with other users of the analysis (such as store promotion.)
Note: I can't show the difference in the analysis result since we're ORing in the expensive instruction walk used by SCEV. Using the full walk is not suitable for a general solution.
llvm-svn: 331079
The idea is to have a pass which performs the same transformation as GuardWidening, but can be run within a loop pass manager without disrupting the pass manager structure. As demonstrated by the test case, this doesn't quite get there because of issues with post dom, but it gives a good step in the right direction. the motivation is purely to reduce compile time since we can now preserve locality during the loop walk.
This patch only includes a legacy pass. A follow up will add a new style pass as well.
llvm-svn: 331060
We currently support LCSSA PHI nodes in the outer loop exit, if their
incoming values do not come from the outer loop latch or if the
outer loop latch has a single predecessor. In that case, the outer loop latch
will be executed only if the inner loop gets executed. If we have multiple
predecessors for the outer loop latch, it may be executed even if the inner
loop does not get executed.
This is a first step to support the case described in
https://bugs.llvm.org/show_bug.cgi?id=30472
Reviewers: efriedma, karthikthecool, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D43237
llvm-svn: 331037
Since PTX has grown a <2 x half> datatype vectorization has become more
important. The late LoadStoreVectorizer intentionally only does loads
and stores, but now arithmetic has to be vectorized for optimal
throughput too.
This is still very limited, SLP vectorization happily creates <2 x half>
if it's a legal type but there's still a lot of register moving
happening to get that fed into a vectorized store. Overall it's a small
performance win by reducing the amount of arithmetic instructions.
I haven't really checked what the loop vectorizer does to PTX code, the
cost model there might need some more tweaks. I didn't see it causing
harm though.
Differential Revision: https://reviews.llvm.org/D46130
llvm-svn: 331035
It doesn't unwind, and the wrong marking leads to the creation of an
.eh_frame section when it isn't necessary.
Differential Revision: https://reviews.llvm.org/D46082
llvm-svn: 331008
Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed,
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 331002
Summary:
Simplify integer add expression X % C0 + (( X / C0 ) % C1) * C0 to
X % (C0 * C1). This is a common pattern seen in code generated by the XLA
GPU backend.
Add test cases for this new optimization.
Patch by Bixia Zheng!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: efriedma, craig.topper, lebedev.ri, llvm-commits, jlebar
Differential Revision: https://reviews.llvm.org/D45976
llvm-svn: 330992
remainder expressions as operands.
Summary:
Add test cases to prepare for the new optimization that Simplifies integer add
expression X % C0 + (( X / C0 ) % C1) * C0 to X % (C0 * C1).
Patch by Bixia Zheng!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46017
llvm-svn: 330991
This reverts commit 023c8be90980e0180766196cba86f81608b35d38.
This patch triggers miscompile of zlib on PowerPC platform. Most likely it is
caused by some pre-backend PPC-specific pass, but we don't clearly know the
reason yet. So we temporally revert this patch with intention to return it
once the problem is resolved. See bug 37229 for details.
llvm-svn: 330893
Summary:
When performing indirect call promotion, current implementation inspects "all" parameters of the callsite and attemps to match with the formal argument type of the callee function. However, it is not possible to find the type for variable length arguments, and the compiler crashes when it attemps to match the type for variable lenght argument.
It seems that the bug is introduced with D40658. Prior to that, the type matching is performed only for the parameters whose ID is less than callee->getFunctionNumParams(). The attached test case will crash without the patch.
Reviewers: mssimpso, davidxl, davide
Reviewed By: mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46026
llvm-svn: 330844
As discussed in D45862, we want these folds sometimes
because they're good improvements.
But as we can see here, the current logic doesn't
check uses and doesn't produce optimal code in all
cases.
llvm-svn: 330837
This also means we have to check if the latch is the exiting block now,
as `transform` expects the latches to be the exiting blocks too.
https://bugs.llvm.org/show_bug.cgi?id=36586
Reviewers: efriedma, davide, karthikthecool
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D45279
llvm-svn: 330806
Summary:
When Reassociate is rewriting an expression tree it may
reuse old binary expression nodes, for new expressions.
Whenever an expression node is reused, but with a non-trivial
change in the result, we need to invalidate any debug info
that is associated with the node.
If for example rewriting
x = mul a, b
y = mul c, x
into
x = mul c, b
y = mul a, x
we still get the same result for 'y', but 'x' is a new expression.
All debug info referring to 'x' must be invalidated (marked as
optimized out) since we no longer calculate the expected value.
As a side-effect this patch avoid (at least some) problems where
reassociate could end up creating IR with debug-use before def.
Earlier the dbg.value nodes where left untouched in the IR, while
the reused binary nodes where sinked to just before the root node
of the rewritten expression tree. See PR27273 for more info about
such problems.
Reviewers: dblaikie, aprantl, dexonsmith
Reviewed By: aprantl
Subscribers: JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D45975
llvm-svn: 330804
These are all but 1 of the select-of-constant tests that appear
to be transformed within foldSelectICmpAnd() and the block above
it predicated by decomposeBitTestICmp().
As discussed in D45862 (and can be seen in several tests here),
we probably want to stop doing those transforms because they
can increase the instruction count without benefitting other
passes or codegen.
The 1 test not included here is a urem test where the bit hackery
allows us to remove a urem. To preserve killing that urem, we
should do some stronger known-bits analysis or pattern matching of
'urem x, (select-of-pow2-constants)'.
llvm-svn: 330768
Patch #2 from VPlan Outer Loop Vectorization Patch Series #1
(RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
This patch introduces the basic infrastructure to detect, legality check
and process outer loops annotated with hints for explicit vectorization.
All these changes are protected under the feature flag
-enable-vplan-native-path. This should make this patch NFC for the existing
inner loop vectorizer.
Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso.
Differential Revision: https://reviews.llvm.org/D42447
llvm-svn: 330739
After D43236, we started interchanging loops with empty dependence
matrices. In isProfitableForVectorization, we try to determine if
interchanging makes the loop dependences more friendly to the
vectorizer. If there are no dependences, we should not interchange,
based on that heuristic.
Reviewers: efriedma, mcrosier, karthikthecool, blitz.opensource
Reviewed By: mcrosier
Differential Revision: https://reviews.llvm.org/D45208
llvm-svn: 330738
The first step in fixing problems raised in D45862
is to make the problems visible. Now we can more easily
see/update cases where selects have been turned into
multiple instructions with no apparent improvement in
analysis or benefits for other passes (vectorization).
llvm-svn: 330731
The current version of the script uses regex for params.
This could mask a bug (param values got wrongly swapped),
but it seems unlikely in practice, so let's just update
the whole file to reduce diffs when there is a meaningful
change here.
llvm-svn: 330729
The memory location an invariant load is using can never be clobbered by
any store, so it's safe to move the load ahead of the store.
Differential Revision: https://reviews.llvm.org/D46011
llvm-svn: 330725
loop unswitch.
This code incorrectly added the header to the loop block set early. As
a consequence we would incorrectly conclude that a nested loop body had
already been visited when the header of the outer loop was the preheader
of the nested loop. In retrospect, adding the header eagerly doesn't
really make sense. It seems nicer to let the cycle be formed naturally.
This will catch crazy bugs in the CFG reconstruction where we can't
correctly form the cycle earlier rather than later, and makes the rest
of the logic just fall out.
I've also added various asserts that make these issues *much* easier to
debug.
llvm-svn: 330707