The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.
http://reviews.llvm.org/D12992
llvm-svn: 249196
Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:
void foo(char *, char *);
void bar(int Size, bool flag) {
for (int i = 0; i < Size; ++i) {
char text[1];
char buff[1];
if (flag)
foo(text, buff); // BBFoo
}
}
the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.
This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).
This fixes PR24598: excessive compile time after r234581.
Reviewers: reames, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13305
llvm-svn: 249018
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.
Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.
We may end up with byte shuffles with constant masks as the result of inlining.
Differential Revision: http://reviews.llvm.org/D13252
llvm-svn: 248913
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.
This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector).
This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.
Differential Revision: http://reviews.llvm.org/D12935
llvm-svn: 248784
This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766
We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:
#include <stdbool.h>
bool switchy(char x1, char x2, char condition) {
bool conditionMet = false;
switch (condition) {
case 0: conditionMet = (x1 == x2); break;
case 1: conditionMet = (x1 <= x2); break;
}
return conditionMet;
}
bool switchy2(char x1, char x2, char condition) {
switch (condition) {
case 0: return (x1 == x2);
case 1: return (x1 <= x2);
}
return false;
}
As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it,
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.
Differential Revision: http://reviews.llvm.org/D12866
llvm-svn: 248689
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723
My first attempt at this was to change what I thought was the root problem:
xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32
...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!
My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I
abandoned that idea.
I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):
bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb %sil, %dil
xorb $1, %dil
movb %dil, %al
retq
Differential Revision: http://reviews.llvm.org/D12705
llvm-svn: 248634
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.
LLVM counterpart to D12835
Differential Revision: http://reviews.llvm.org/D13002
llvm-svn: 248368
The SSE4A instructions EXTRQ/INSERTQ only use the lower 64-bits (or less) for many of their input vector operands and all of them have undefined upper 64-bits results.
Differential Revision: http://reviews.llvm.org/D12680
llvm-svn: 247934
Summary:
`signum(x)` is sometimes implemented as `(x >> 63) | (-x >>> 63)` (for
an `i64` `x`). This change adds a matcher for that pattern, and an
instcombine rule to optimize `signum(x) s< 1`.
Later, we can also consider optimizing:
icmp slt signum(x), 0 --> icmp slt x, 0
icmp sle signum(x), 1 --> true
etc.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12703
llvm-svn: 247846
The patch extends the optimization to cases where the constant's
magnitude is so small or large that the rounding of the conversion
is irrelevant. The "so small" case includes negative zero.
Differential review: http://reviews.llvm.org/D11210
llvm-svn: 247708
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG. It also adds assertions in isKnownNonNull() and isKnownNonNullFromDominatingCondition() to make sure the value checked is pointer type (as defined in LLVM document). These assertions might trip failures in things which are not covered under llvm/test, but fixes should be pretty obvious.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12779
llvm-svn: 247587
Improved InstCombine support for CVTPH2PS (F16C half 2 float conversion):
<4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>) - only uses the bottom 4 i16 elements for the conversion.
Added constant folding support.
Differential Revision: http://reviews.llvm.org/D12731
llvm-svn: 247504
Summary: This fixes a variety of typos in docs, code and headers.
Subscribers: jholewinski, sanjoy, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D12626
llvm-svn: 247495
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12779
llvm-svn: 247356
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of gc.relocate return value. In this way it can handle cases where the relocated value does not have nonnull attribute but has a dominating null check from the CFG.
Reviewers: reames
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D12772
llvm-svn: 247353
removes cast by performing the lshr on smaller types. However, currently there
is no trunc(lshr (sext A), Cst) variant.
This patch add such optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.
Differential Revision: http://reviews.llvm.org/D12520
llvm-svn: 247271
with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal, this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
llvm-svn: 247167
removes cast by performing the lshr on smaller types. However, currently there
is no trunc(lshr (sext A), Cst) variant.
This patch add such optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.
Differential Revision: http://reviews.llvm.org/D12520
llvm-svn: 246997
Trivial multiplication by zero may survive the worklist. We tried to
reassociate the multiplication with a division instruction, causing us
to divide by zero; bail out instead.
This fixes PR24726.
llvm-svn: 246939
PR24605 is caused due to an incorrect insert point in instcombine's IR
builder. When simplifying
%t = add X Y
...
%m = icmp ... %t
the replacement for %t should be placed before %t, not before %m, as
there could be a use of %t between %t and %m.
llvm-svn: 246315
The original checkin was buggy, this change has a fix.
Original commit message:
[InstCombine] Transform A & (L - 1) u< L --> L != 0
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potentiall payoffs:
1. It may make the `icmp` fold away or become loop invariant.
2. It may make the `A & (L - 1)` computation dead.
This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12210
llvm-svn: 245753
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potentiall payoffs:
1. It may make the `icmp` fold away or become loop invariant.
2. It may make the `A & (L - 1)` computation dead.
This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12210
llvm-svn: 245635
and make it always preserve debug locations, since all callers wanted this
behavior anyway.
This is addressing a post-commit review feedback for r245589.
NFC (inside the LLVM tree).
llvm-svn: 245622
Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all
metadata in KnownSet, but the condition for DebugLocs was inverted.
Most users of dropUnknownMetadata() actually worked around this by not
adding LLVMContext::MD_dbg to their list of KnowIDs.
This is now made explicit.
llvm-svn: 245589
Summary: We know that -x & 1 is equivalent to x & 1, avoid using negation for testing if a negative integer is even or odd.
Reviewers: majnemer
Subscribers: junbuml, mssimpso, gberry, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D12156
llvm-svn: 245569
Bitwise arithmetic can obscure a simple sign-test. If replacing the
mask with a truncate is preferable if the type is legal because it
permits us to rephrase the comparison more explicitly.
llvm-svn: 245171