This is necessary to avoid warnings from GCC.
InstCombineLoadStoreAlloca.cpp:238:7: error: 'PointerReplacer' declared
with greater visibility than the type of its field 'PointerReplacer::IC'
llvm-svn: 294794
For function-scope variables with large initialisation list, FE usually
generates a global variable to hold the initializer, then generates
memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst
identifies such allocas which are accessed only by reading and replaces
them with the global variable. This is done by casting the global variable
to the type of the alloca and replacing all references.
However, when the global variable is in a different address space which
is disjoint with addr space 0 (e.g. for IR generated from OpenCL,
global variable cannot be in private addr space i.e. addr space 0), casting
the global variable to addr space 0 results in invalid IR for certain
targets (e.g. amdgpu).
To fix this issue, when the global variable is not in addr space 0,
instead of casting it to addr space 0, this patch chases down the uses
of alloca until reaching the load instructions, then replaces load from
alloca with load from the global variable. If during the chasing
bitcast and GEP are encountered, new bitcast and GEP based on the global
variable are generated and used in the load instructions.
Differential Revision: https://reviews.llvm.org/D27283
llvm-svn: 294786
This fold already existed for vectors but only when 'C1' was a splat
constant (but 'C2' could be any constant).
There were no tests for any vector constants, so I'm adding a test
that shows non-splat constants for both operands.
llvm-svn: 294650
This patch is based on the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109631.html
Folding to i1 should always be desirable because that's better for value tracking
and we have special folds for i1 types.
I checked for other users of shouldChangeType() where this might have an effect,
but we already handle the i1 case differently than other types in all of those cases.
Side note: the default datalayout includes i1, so it seems we only find this gap in
shouldChangeType + phi folding for the case when there is (1) an explicit datalayout
without i1, (2) casting to i1 from a legal type, and (3) a phi with exactly 2 incoming
casted operands (as Björn mentioned).
Differential Revision: https://reviews.llvm.org/D29336
llvm-svn: 294066
The code comments didn't match the code logic, and we didn't actually distinguish the fake unary (not/neg/fneg)
operators from arguments. Adding another level to the weighting scheme provides more structure and can help
simplify the pattern matching in InstCombine and other places.
I fixed regressions that would have shown up from this change in:
rL290067
rL290127
But that doesn't mean there are no pattern-matching logic holes left; some combines may just be missing regression tests.
Should fix:
https://llvm.org/bugs/show_bug.cgi?id=28296
Differential Revision: https://reviews.llvm.org/D27933
llvm-svn: 294049
Although this is 'no-functional-change-intended', I'm adding tests
for shl-shl and lshr-lshr pairs because there is no existing test
coverage for those folds.
It seems like we should be able to remove some code from foldShiftedShift()
at this point because we're handling those patterns on the general path.
llvm-svn: 293814
Summary:
If there are two adjacent guards with different conditions, we can
remove one of them and include its condition into the condition of
another one. This patch allows InstCombine to merge them by the
following pattern:
guard(a); guard(b) -> guard(a & b).
Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29378
llvm-svn: 293778
transformToIndexedCompare
If they don't have the same type, the size of the constant
index would need to be adjusted (and this wouldn't be always
possible).
Alternatively we could try the analysis with the initial
RHS value, which would guarantee that the two sides have
the same type. However it is unlikely that in practice this
would pass our transformation requirements.
Fixes PR31808 (https://llvm.org/bugs/show_bug.cgi?id=31808).
llvm-svn: 293629
The original shift is bigger, so this may qualify as 'obvious',
but here's an attempt at an Alive-based proof:
Name: exact
Pre: (C1 u< C2)
%a = shl i8 %x, C1
%b = lshr exact i8 %a, C2
=>
%c = lshr exact i8 %x, C2 - C1
%b = and i8 %c, ((1 << width(C1)) - 1) u>> C2
Optimization is correct!
llvm-svn: 293498
This is a minimal patch to avoid the infinite loop in:
https://llvm.org/bugs/show_bug.cgi?id=31751
But the general problem is bigger: we're not canonicalizing all of the min/max forms reported
by value tracking's matchSelectPattern(), and we don't define min/max consistently. Some code
uses matchSelectPattern(), other code uses matchers like m_Umax, and others have their own
inline definitions which may be subtly different from any of the above.
The reason that the test cases in this patch need a cast op to trigger is because we don't
(yet) canonicalize all min/max forms based on matchSelectPattern() in
canonicalizeMinMaxWithConstant(), but we do make min/max+cast transforms based on
matchSelectPattern() in visitSelectInst().
The location of the icmp transforms that trigger the inf-loop seems arbitrary at best, so
I'm moving those behind the min/max fence in visitICmpInst() as the quick fix.
llvm-svn: 293345
Summary:
There are many NVVM intrinsics that we can't entirely get rid of, but
that nonetheless often correspond to target-generic LLVM intrinsics.
For example, if flush denormals to zero (ftz) is enabled, we can convert
@llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32. On the other hand, if ftz is
disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a
non-ftz PTX instruction. In this case, we can, however, simplify the
non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32.
These transformations are particularly useful because they let us
constant fold instructions that appear in libdevice, the bitcode library
that ships with CUDA and essentially functions as its libm.
Reviewers: tra
Subscribers: hfinkel, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D28794
llvm-svn: 293244
This change reverts:
r293061: "[InstCombine] Canonicalize guards for NOT OR condition"
r293058: "[InstCombine] Canonicalize guards for AND condition"
They miscompile cases like:
```
declare void @llvm.experimental.guard(i1, ...)
define void @test_guard_not_or(i1 %A, i1 %B) {
%C = or i1 %A, %B
%D = xor i1 %C, true
call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ]
ret void
}
```
because they do transfer the `i32 20, i32 30` parameters to newly
created guard instructions.
llvm-svn: 293227
We already have this fold when the lshr has one use, but it doesn't need that
restriction. We may be able to remove some code from foldShiftedShift().
Also, move the similar:
(X << C) >>u C --> X & (-1 >>u C)
...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst().
That whole function seems questionable since it is called by commonShiftTransforms(),
but there's really not much in common if we're checking the shift opcodes for every
fold.
llvm-svn: 293215
This intrinsic uses bit 0 and bit 4 of an immediate argument to determine which bits of its inputs to read. This patch uses this information to simplify the demanded elements of the input vectors.
Differential Revision: https://reviews.llvm.org/D28979
llvm-svn: 293151
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: apilipenko
Differential Revision: https://reviews.llvm.org/D29075
Patch by Maxim Kazantsev.
llvm-svn: 293061
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: apilipenko
Differential Revision: https://reviews.llvm.org/D29074
Patch by Maxim Kazantsev.
llvm-svn: 293058
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: majnemer, apilipenko
Differential Revision: https://reviews.llvm.org/D29071
Patch by Maxim Kazantsev.
llvm-svn: 293056
Summary: As per title. This will add the instructiions we are interested in in the worklist.
Reviewers: mehdi_amini, majnemer, andreadb
Differential Revision: https://reviews.llvm.org/D29081
llvm-svn: 292957
Added early out for single undef input - we were already supporting (and testing) this in the constant folding code, we just do it quicker now
Drop undef handling from demanded elts code now that we handle it fully in InstCombiner::visitCallInst
llvm-svn: 292913
We may be able to assert that no shl-shl or lshr-lshr pairs ever get here
because we should have already handled those in foldShiftedShift().
llvm-svn: 292726
Simplify a packss/packus truncation based on the elements of the mask that are actually demanded.
Differential Revision: https://reviews.llvm.org/D28777
llvm-svn: 292591
As discussed on D28777 - we don't need to handle 'all element' shuffles inside InstCombiner::visitCallInst as InstCombiner::SimplifyDemandedVectorElts will do everything we need.
llvm-svn: 292365
Add missing fabs(fpext) optimzation that worked with the call,
and also fixes it creating a second fpext when there were multiple
uses.
llvm-svn: 292172
It's not clear what 'First' and 'Second' mean, so use 'Inner' and 'Outer'
to match foldShiftedShift() and add comments with formulas, so it's easier
to see what's going on.
llvm-svn: 292153
Simplify a pshufb shuffle mask based on the elements of the mask that are actually demanded.
Differential Revision: https://reviews.llvm.org/D28745
llvm-svn: 292101
a function's CFG when that CFG is unchanged.
This allows transformation passes to simply claim they preserve the CFG
and analysis passes to check for the CFG being preserved to remove the
fanout of all analyses being listed in all passes.
I've gone through and removed or cleaned up as many of the comments
reminding us to do this as I could.
Differential Revision: https://reviews.llvm.org/D28627
llvm-svn: 292054
mark it as never invalidated in the new PM.
The old PM already required this to work, and after a discussion with
Hal this seems to really be the only sensible answer. The cache
gracefully degrades as the IR is mutated, and most things which do this
should already be incrementally updating the cache.
This gets rid of a bunch of logic preserving and testing the
invalidation of this analysis.
llvm-svn: 292039
cover domtree and alias analysis. These are the pretty clear analyses
that we would always want to survive this pass.
To make these survive, we also need to preserve the assumption cache.
Added a test that verifies the important bits of this preservation.
llvm-svn: 292037
Allows LLVM to optimize sequences like the following:
%add = add nuw i32 %x, 1
%cmp = icmp ugt i32 %add, %y
Into:
%cmp = icmp uge i32 %x, %y
Previously, only signed comparisons were being handled.
Decrements could also be handled, but 'sub nuw %x, 1' is currently canonicalized to
'add %x, -1' in InstCombineAddSub, losing the nuw flag. Removing that canonicalization
seems like it might have far-reaching ramifications so I kept this simple for now.
Patch by Matti Niemenmaa!
Differential Revision: https://reviews.llvm.org/D24700
llvm-svn: 291975
Here's my second try at making @llvm.assume processing more efficient. My
previous attempt, which leveraged operand bundles, r289755, didn't end up
working: it did make assume processing more efficient but eliminating the
assumption cache made ephemeral value computation too expensive. This is a
more-targeted change. We'll keep the assumption cache, but extend it to keep a
map of affected values (i.e. values about which an assumption might provide
some information) to the corresponding assumption intrinsics. This allows
ValueTracking and LVI to find assumptions relevant to the value being queried
without scanning all assumptions in the function. The fact that ValueTracking
started doing O(number of assumptions in the function) work, for every
known-bits query, has become prohibitively expensive in some cases.
As discussed during the review, this is a pragmatic fix that, longer term, will
likely be replaced by a more-principled solution (perhaps based on an extended
SSA form).
Differential Revision: https://reviews.llvm.org/D28459
llvm-svn: 291671
Some of the callers are artificially limiting this transform to integer types;
this should make it easier to incrementally remove that restriction.
llvm-svn: 291620
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))
This is only possible if C2 is negative and C2 is greater than or equal to negative C1.
llvm-svn: 290927
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.
So just in case we decide this is the right way, we might as well
have a cleaner implementation.
llvm-svn: 290912
We correctly canonicalized (add (sext x), (sext y)) to (sext (add x, y))
where possible. However, we didn't perform the same canonicalization
for zexts or for muls.
llvm-svn: 290733
We bypassed the intrinsic and returned the passthru operand, but we should also add the intrinsic to the worklist since its now dead. This can allow DCE to find it sooner and remove it. Similar was done for InsertElement when the inserted element isn't demanded.
llvm-svn: 290704
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.
This fixes PR31286.
Differential Revision: https://reviews.llvm.org/D27992
llvm-svn: 290641
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
This builds on r290554 which added supported for 128 and 256-bit.
llvm-svn: 290582
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.
llvm-svn: 290566
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
Differential Revision: https://reviews.llvm.org/D28119
llvm-svn: 290554
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.
I'll do the same thing for packed add/sub/mul/div in a future patch.
Reviewers: delena, RKSimon, zvi, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27879
llvm-svn: 290535
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.
We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.
Reviewers: zvi, delena, spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27825
llvm-svn: 290530
We're currently doing nearly the same thing for @llvm.objectsize in
three different places: two of them are missing checks for overflow,
and one of them could subtly break if InstCombine gets much smarter
about removing alloc sites. Seems like a good idea to not do that.
llvm-svn: 290214
Background/motivation - I was circling back around to:
https://llvm.org/bugs/show_bug.cgi?id=28296
I made a simple patch for that and noticed some regressions, so added test cases for
those with rL281055, and this is hopefully the minimal fix for just those cases.
But as you can see from the surrounding untouched folds, we are missing commuted patterns
all over the place, and of course there are no regression tests to cover any of those cases.
We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but
then we still wouldn't have test coverage, and we'd still miss some fraction of commuted
patterns because they require adjustments to the match order.
I'm aware of the concern about the potential compile-time performance impact of adding
matches like this (currently being discussed on llvm-dev), but I don't think there's any
evidence yet to suggest that handling commutative pattern matching more thoroughly is not
a worthwhile goal of InstCombine.
Differential Revision: https://reviews.llvm.org/D24419
llvm-svn: 290067
Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns.
This patch won't solve the example that was attached to that thread, so something else still needs fixing.
The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.
Corresponding changes for smax/umin/umax to follow.
Differential Revision: https://reviews.llvm.org/D27531
llvm-svn: 289855
A number of new patterns for simplifying and/xor of icmp:
(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.
(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
%s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.
llvm-svn: 289813
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).
Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.
At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.
Differential Revision: https://reviews.llvm.org/D27259
llvm-svn: 289755
If all the operands to a phi node are compares that have a RHS constant,
instcombine will try to pull them through the phi node, combining them into
a single operation. When it does this, the debug location of the new op
should be the merged debug locations of the phi node arguments.
Patch 8 of 8 for D26256. Folding of a compare that has a RHS constant.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289704
If all the operands to a phi node are a binop with a RHS constant, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the new op should be the
merged debug locations of the phi node arguments.
Patch 7 of 8 for D26256. Folding of a binop with RHS constant.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289699
If all the operands to a phi node are a cast, instcombine will try to pull
them through the phi node, combining them into a single cast. When it does
this, the debug location of the new cast should be the merged debug locations
of the phi node arguments.
Patch 6 of 8 for D26256. Folding of a cast operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289693
If all the operands to a phi node are a load, instcombine will try to pull
them through the phi node, combining them into a single load. When it does
this, the debug location of the new load should be the merged debug locations
of the phi node arguments.
Patch 5 of 8 for D26256. Folding of a load operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289688
If all the operands to a phi node are getelementptr, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the new getelementptr
should be the merged debug locations of the phi node arguments.
Patch 4 of 8 for D26256. Folding of a getelementptr operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289684
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.
Patch 3 of 8 for D26256. Folding of a compare operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289681
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.
Patch 2 of 8 for D26256. Folding of a binary operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289679
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.
Differential Revision: https://reviews.llvm.org/D26671
llvm-svn: 289647
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0.
Also calculate UndefElts correctly.
Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support.
llvm-svn: 289629
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused.
Also calculate UndefElts correctly.
Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support.
llvm-svn: 289628
Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed.
Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support.
llvm-svn: 289523
We could truncate the condition and then try to fold the add into the
original condition value causing wrong case constants to be used.
Move the offset transform ahead of the truncate transform and return
after each transform, so there's no chance of getting confused values.
Fix for:
https://llvm.org/bugs/show_bug.cgi?id=31260
llvm-svn: 289442
This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them.
llvm-svn: 289377
These intrinsics don't read the upper elements of their first and second input. These are slightly different the the SSE version which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead.
llvm-svn: 289371
These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well.
llvm-svn: 289370
This solves a secondary problem seen in PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137#c6
This is similar to the bitwise logic op fold added with:
https://reviews.llvm.org/rL287707
And like that patch, I'm artificially restricting the
transform from vector <-> scalar types until we're sure
that the backend can handle that.
llvm-svn: 288584
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
Differential Revision: https://reviews.llvm.org/D26594
llvm-svn: 288458
The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal.
Differential Revision: https://reviews.llvm.org/D24365
llvm-svn: 288415
Note that the non-splat lshr+lshr test folded, but that does not
work in general. Something is missing or wrong in computeKnownBits
as the non-splat shl+shl test still shows.
llvm-svn: 288005
There are other spots where we can use this; we're currently dropping
metadata in some places, and there are proposed changes where we will
want to propagate metadata.
IRBuilder's CreateSelect() already has a parameter like this, so this
change makes the regular 'Create' API line up with that.
llvm-svn: 287976
In PR27925:
https://llvm.org/bugs/show_bug.cgi?id=27925
...we proposed adding this fold to eliminate a bitcast. In D20774, there was
some concern about changing the type of a bitwise op as well as creating
bitcasts that might not be free for a target. However, if we're strictly
eliminating an instruction (by limiting this to one-use ops), then we should
be able to do this in InstCombine.
But we're cautiously restricting the transform for now to vector types to
avoid possible backend problems. A transform to make sure the logic op is
legal for the target should be added to reverse this transform and improve
codegen.
Differential Revision: https://reviews.llvm.org/D26641
llvm-svn: 287707
This is a first step towards canonicalization and improved folding/codegen
for integer min/max as discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html
Here, we're just matching the simplest min/max patterns and adjusting the
icmp predicate while swapping the select operands.
I've included FIXME tests in test/Transforms/InstCombine/select_meta.ll
so it's easier to see how this might be extended (corresponds to the TODO
comment in the code). That's also why I'm using matchSelectPattern()
rather than a simpler check; once the backend is patched, we can just
remove some of the restrictions to allow the obfuscated min/max patterns
in the FIXME tests to be matched.
Differential Revision: https://reviews.llvm.org/D26525
llvm-svn: 287585
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches.
llvm-svn: 287316
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.
llvm-svn: 287206
Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file.
Reviewers: zvi, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26660
llvm-svn: 287083
Removing the limitation in visitInsertElementInst() causes several regressions
because we're not prepared to fold sequences of shuffles or inserts and extracts
separated by shuffles. Fixing that appears to be a difficult mission because we
are purposely trying to avoid creating shuffles with arbitrary shuffle masks
because some targets may choke on those.
https://llvm.org/bugs/show_bug.cgi?id=30923
llvm-svn: 286423
As the test change shows, we can increase the critical path by adding
a 'not' instruction, so make sure that we're actually removing an
instruction if we do this transform.
This transform could also cause us to miss folds of min/max pairs.
llvm-svn: 286315
This was reverted at r285866 because there was a crash handling a scalar
select of vectors. I added a check for that pattern and a test case based
on the example provided in the post-commit thread for r285732.
llvm-svn: 286113
This reverts commit r285732.
This change introduced a new assertion failure in the following
testcase at -O2:
typedef short __v8hi __attribute__((__vector_size__(16)));
__v8hi foo(__v8hi &V1, __v8hi &V2, unsigned mask) {
__v8hi Result = V1;
if (mask & 0x80)
Result[0] = V2[0];
return Result;
}
llvm-svn: 285866
This patch introduces the combine:
(C1 shift (A add C2)) -> ((C1 shift C2) shift A)
iff A and C2 are both positive
If both A and C2 are know to be positive then we can safely split into 2 shifts, permitting the folding of the Inner shift.
Fix for the spec benchmark case mentioned by @nadav on PR15141 (assuming we can prove that the inputs as positive).
Differential Revision: https://reviews.llvm.org/D26000
llvm-svn: 285696
The original patch of the A->B->A BitCast optimization was reverted by r274094 because it may cause infinite loop inside compiler https://llvm.org/bugs/show_bug.cgi?id=27996.
The problem is with following code
xB = load (type B);
xA = load (type A);
+yA = (A)xB; B -> A
+zAn = PHI[yA, xA]; PHI
+zBn = (B)zAn; // A -> B
store zAn;
store zBn;
optimizeBitCastFromPhi generates
+zBn = (B)zAn; // A -> B
and expects it will be combined with the following store instruction to another
store zAn
Unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.
optimizeBitCastFromPhi only generates BitCast for load/store instructions, only the BitCast before store can cause the reexecution of optimizeBitCastFromPhi, and BitCast before store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions.
Differential Revision: https://reviews.llvm.org/D23896
llvm-svn: 285116
Prefer add/zext because they are better supported in terms of value-tracking.
Note that the backend should be prepared for this IR canonicalization
(including vector types) after:
https://reviews.llvm.org/rL284015
Differential Revision: https://reviews.llvm.org/D25135
llvm-svn: 284241
As discussed by Andrea on PR30486, we have an unsafe cast to an Instruction type in the select combine which doesn't take into account that it could be a ConstantExpr instead.
Differential Revision: https://reviews.llvm.org/D25466
llvm-svn: 284000
When combining an integer load with !range metadata that does not include 0 to a pointer load, make sure emit !nonnull metadata on the newly-created pointer load. This prevents the !nonnull metadata from being dropped during a ptrtoint/inttoptr pair.
This fixes PR30597.
Patch by Ariel Ben-Yehuda!
Differential Revision: https://reviews.llvm.org/D25215
llvm-svn: 283836
If we're going to canonicalize IR towards select of constants, try harder to create those.
Also, don't lose the metadata.
This is actually 4 related transforms in one patch:
// select X, (sext X), C --> select X, -1, C
// select X, (zext X), C --> select X, 1, C
// select X, C, (sext X) --> select X, C, 0
// select X, C, (zext X) --> select X, C, 0
Differential Revision: https://reviews.llvm.org/D25126
llvm-svn: 283575
Also, make foldSelectExtConst() a member of InstCombiner, remove
unnecessary parameters from its interface, and group visitSelectInst
helpers together in the header file.
llvm-svn: 282796
The index of the new insertelement instruction was evaluated in the
wrong way, it was considered as the index of the inserted value instead
of index of the position, where the value should be inserted.
llvm-svn: 282401
This patch fixes PR30366.
Function foldUDivShl() worked under the assumption that one of the values
in input to the function was always an instance of llvm::Instruction.
However, function visitUDivOperand() (the only user of foldUDivShl) was
clearly violating that precondition; internally, visitUDivOperand() uses pattern
matches to check the operands of a udiv. Pattern matchers for binary operators
know how to handle both Instruction and ConstantExpr values.
This patch fixes the problem in foldUDivShl(). Now we use pattern matchers
instead of explicit casts to Instruction. The reduced test case from PR30366
has been added to test file InstCombine/udiv-simplify.ll.
Differential Revision: https://reviews.llvm.org/D24565
llvm-svn: 282398
If inserting more than one constant into a vector:
define <4 x float> @foo(<4 x float> %x) {
%ins1 = insertelement <4 x float> %x, float 1.0, i32 1
%ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
ret <4 x float> %ins2
}
InstCombine could reduce that to a shufflevector:
define <4 x float> @goo(<4 x float> %x) {
%shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3>
ret <4 x float> %shuf
}
Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e.
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float
undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> ->
insertelement <4 x float> %v, float 1.0, 1
Differential Revision: https://reviews.llvm.org/D24182
llvm-svn: 282237
We already have the udiv variant of this transform, so I think this is ok for
InstCombine too even though there is an increase in IR instructions. As the
tests and TODO comments show, the transform can lead to follow-on combines.
This should fix: https://llvm.org/bugs/show_bug.cgi?id=28672
Differential Revision: https://reviews.llvm.org/D24527
llvm-svn: 282209
computeKnownBits() already works for integer vectors, so allow vector types when calling that from InstCombine.
I don't think the change to use m_APInt in computeKnownBits is strictly necessary because we do check for
ConstantVector later, but it's more efficient to handle the splat case without needing to loop on vector elements.
This should work with InstSimplify, but doesn't yet, so I made that a FIXME comment on the test for PR24942:
https://llvm.org/bugs/show_bug.cgi?id=24942
Differential Revision: https://reviews.llvm.org/D24677
llvm-svn: 281777
These 2 helper functions were already using APInt internally, so just
change the API and caller to allow folds for splats. The scalar
regression tests look quite thorough, so I just added a couple of
tests to prove that vectors are handled too.
These folds should be grouped with the other cmp+shift folds though.
That can be an NFC follow-up.
llvm-svn: 281663
This pattern is matched in foldICmpBinOpEqualityWithConstant() and already works
with vectors too. I changed some comments over there to point out the current
location. The tests for this transform are currently in 'sub.ll'.
Note that the remaining folds in this block all require a sub too, so they should
get grouped with the other icmp(sub) patterns.
llvm-svn: 281627
This is a big glob of transforms that probably should work for vectors,
but currently they are disallowed because of ConstantInt guards.
llvm-svn: 281614
Everything under foldICmpInstWithConstant() should now be working for
splat vectors via m_APInt matchers. Ie, I've removed all of the FIXMEs
that I added while cleaning that section up. Note that not all of the
associated FIXMEs in the regression tests are gone though, because some
of the tests require earlier folds that are still scalar-only.
llvm-svn: 281139
I introduced this potential bug by missing this diff in:
https://reviews.llvm.org/rL280873
...however, I'm not sure how to reach this code path with a regression test.
We may be able to remove this code and assume that the transform to a constant
is always handled by InstSimplify?
llvm-svn: 280964
This is a revert of r280676 which was a revert of r280637;
ie, this is r280637 again. It was speculatively reverted to
help debug buildbot failures.
llvm-svn: 280861
This fixes a similar issue to the one already fixed by r280804
(revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.
llvm-svn: 280807
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi
The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElemet may return nullptr if the constant element
cannot be retrieved. So, all the dyn_casts can potentially fail. This is what
happens for example if a constexpr value is passed in input to an extrq/extrqi
intrinsic call.
This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.
Added reproducible test cases to x86-sse4a.ll.
Differential Revision: https://reviews.llvm.org/D24256
llvm-svn: 280804
The transform in question:
icmp (and (trunc W), C2), C1 -> icmp (and W, C2'), C1'
...is still not enabled for vectors, thus no functional change intended.
It's not clear to me if this is a good transform for vectors or even
scalars in general. Changing that behavior may be a follow-on patch.
llvm-svn: 280627
memcpy with ld/st.
When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.
Differential Revision: https://reviews.llvm.org/D23499
llvm-svn: 280617
The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards
shrinking that to a single shufflevector.
Note that the transform is intentionally limited to shuffles that are equivalent to vector
selects to avoid creating arbitrary shuffle masks that may not lower well.
This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126
Differential Revision: https://reviews.llvm.org/D23886
llvm-svn: 280504
While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger
this code path.
The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic()
or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast().
Differential Revision: https://reviews.llvm.org/D24031
llvm-svn: 280370
This is prep work before changing the callers to also use APInt which will
allow folds for splat vectors. Currently, the callers have ConstantInt
guards in place, so no functional change intended with this commit.
llvm-svn: 280282
It's much less code and easier to read if we don't duplicate
everything between the 'Inside' and not 'Inside' cases.
As noted with the FIXME, the goal is to make this vector-friendly
in a follow-up patch.
llvm-svn: 280183
Like other recent changes near here, the goal is to allow vector types for
all of these folds. Splitting things up makes it easier to incrementally
enhance the code and easier to read.
llvm-svn: 279851
Removing the redundant 'CmpRHSV' local variable exposes a bug in the caller
foldICmpShrConstant() - it was sending in the div constant instead of the
cmp constant. But I have not been able to expose this in a regression test
yet - the affected folds all appear to be handled before we ever reach this
code. I'll keep trying to find a case as I make changes to allow vector folds
in both functions.
llvm-svn: 279828
There was no logic in foldICmpDivConstant, so no need for a separate function.
The code is directly copy/pasted, so further cleanups to follow.
llvm-svn: 279685
I deleted a fold from InstCombine at:
https://reviews.llvm.org/rL279568
because it (like any InstCombine to a constant?) should always happen in InstSimplify,
however, it's not obvious what the assumptions are in the remaining code.
Add a comment and assert to make it clearer.
Differential Revision: https://reviews.llvm.org/D23819
llvm-svn: 279626
There will only be 3 lines of code in foldICmpShrConst() when the cleanup is done,
so it doesn't make much sense to have a separate function for a single fold.
llvm-svn: 279575
AFAICT, these already worked in all cases for scalar types, and I enhanced
the code to work for vector types in:
https://reviews.llvm.org/rL279543
llvm-svn: 279568
Summary: We can allow sinking if the single user block has only one unique predecessor, regardless of the number of edges. Note that a switch statement with multiple cases can have the same destination.
Reviewers: mcrosier, majnemer, spatel, reames
Subscribers: reames, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D23722
llvm-svn: 279448
The callers still have ConstantInt guards, so there is no functional change
intended from this change. But relaxing the callers will allow more folds
for vector types.
llvm-svn: 279396
This is a partial enablement (move the ConstantInt guard down) because there are many
different folds here and one of the later ones will require reworking 'isSignBitCheck'.
llvm-svn: 279339
Specifically, this is done near the end of "SimplifyICmpInst" using
computeKnownBits() as the broader solution. There are even vector
tests (yay!) for this in test/Transforms/InstSimplify/compare.ll.
I considered putting an assert here instead of just deleting, but
then we could assert every possible fold in InstSimplify in
InstCombine, so...less is more?
llvm-svn: 279300
The intended transform is:
// Simplify icmp eq (or (ptrtoint P), (ptrtoint Q)), 0
// -> and (icmp eq P, null), (icmp eq Q, null).
P and Q are both pointer types, but may have different types. We need
two calls to getNullValue() to make the icmps.
llvm-svn: 279271
Of course, we really need to refactor and fix all of the cmp predicates,
but this one is interesting because without it, we later perform an
information-losing transform of icmp (shl 1, Y), C, and we can't recover
the better fold.
llvm-svn: 279263
Clean up the existing code by:
1. Renaming variables
2. Adding local variables
3. Making it vector-safe
This is still guarded by a ConstantInt check, so no functional change is intended.
But this should be ready to go: if we move the ConstantInt check down, all of
these folds should do the right thing for vector types.
llvm-svn: 279150
Use m_APInt for the xor constant, but this is all still guarded by the initial
ConstantInt check, so no vector types should make it in here.
llvm-svn: 278957
1. Change variable names
2. Use local variables to reduce code
3. Use ? instead of if/else
4. Use the APInt variable instead of 'RHS' so the removal of the FIXME code will be direct
llvm-svn: 278944
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.
llvm-svn: 278902
1. Fix variable names
2. Add local variables to reduce code
3. Fix code comments
4. Add early exit to reduce indentation
5. Remove 'else' after if -> return
6. Hoist common predicate
llvm-svn: 278864
Besides breaking up a 700 line function to improve readability,
this sinks the 'FIXME: ConstantInt' check into each helper. So
now we can independently break that restriction within any of the
helper functions.
As much as possible, the code was only {cut/paste/clang-format}'ed
to minimize risk (no functional changes intended), so several more
readability improvements are still possible.
llvm-svn: 278828
There's some formatting and pointer deref ugliness here that I intend to fix in
subsequent patches. The overall goal is to refactor the obnoxiously long switch
and incrementally remove the restriction to scalar types (allow folds for vector
splats). This patch introduces the use of m_APInt which means the RHSV reference
is now a pointer (and may have matched a vector splat), but the check of 'RHS'
remains, so vector folds are disallowed and no functional change is intended.
llvm-svn: 278816
This is part of an effort to constify ValueTracking.cpp. This change is
to methods which need const Value* instead of Value* to go with the upcoming
changes to ValueTracking.
llvm-svn: 278528
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
Thanks to David for the suggestion.
llvm-svn: 278077
Summary:
The correctness fix here is that when we CSE a load with another load,
we need to combine the metadata on the two loads. This matches the
behavior of other passes, like instcombine and GVN.
There's also a minor optimization improvement here: for load PRE, the
aliasing metadata on the inserted load should be the same as the
metadata on the original load. Not sure why the old code was throwing
it away.
Issue found by inspection.
Differential Revision: http://reviews.llvm.org/D21460
llvm-svn: 277977
Note that this fold really belongs in InstSimplify.
Refactoring here anyway as an intermediate step because
there's a planned addition to this function in D23134.
Differential Revision: https://reviews.llvm.org/D23223
llvm-svn: 277883
Summary:
Turn (select C, (sext A), B) into (sext (select C, A, B')) when A is i1 and
B is a compatible constant, also for zext instead of sext. This will then be
further folded into logical operations.
The transformation would be valid for non-i1 types as well, but other parts of
InstCombine prefer to have sext from non-i1 as an operand of select.
Motivated by the shader compiler frontend in Mesa for AMDGPU, which emits i32
for boolean operations. With this change, the boolean logic is fully
recovered.
Reviewers: majnemer, spatel, tstellarAMD
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22747
llvm-svn: 277801
Add a generalized IRBuilderCallbackInserter, which is just given a
callback to execute after insertion. This can be used to get rid of
the custom inserter in InstCombine, which will in turn allow me to add
target specific InstCombineCalls API for intrinsics without horrible
layering violations.
llvm-svn: 277784
I'm removing a misplaced pair of more specific folds from InstCombine in this patch as well,
so we know where those folds are happening in InstSimplify.
llvm-svn: 277738
Summary:
InstCombine unfolds expressions of the form `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` such that in a later iteration of InstCombine the exposed `zext(icmp)` instructions can be optimized. We now combine this unfolding and the subsequent `zext(icmp)` optimization to be performed together. Since the unfolding doesn't happen separately anymore, we also again enable the folding of `logic(cast(icmp), cast(icmp))` expressions to `cast(logic(icmp, icmp))` which had been disabled due to its interference with the unfolding transformation.
Tested via `make check` and `lnt`.
Background
==========
For a better understanding on how it came to this change we subsequently summarize its history. In commit r275989 we've already tried to enable the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` which had to be reverted in r276106 because it could lead to an endless loop in InstCombine (also see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html). The root of this problem is that in `visitZExt()` in InstCombineCasts.cpp there also exists a reverse of the above folding transformation, that unfolds `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` in order to expose `zext(icmp)` operations which would then possibly be eliminated by subsequent iterations of InstCombine. However, before these `zext(icmp)` would be eliminated the folding from r275989 could kick in and cause InstCombine to endlessly switch back and forth between the folding and the unfolding transformation. This is the reason why we now combine the `zext`-unfolding and the elimination of the exposed `zext(icmp)` to happen at one go because this enables us to still allow the cast-folding in `logic(cast(icmp), cast(icmp))` without entering an endless loop again.
Details on the submitted changes
================================
- In `visitZExt()` we combine the unfolding and optimization of `zext` instructions.
- In `transformZExtICmp()` we have to use `Builder->CreateIntCast()` instead of `CastInst::CreateIntegerCast()` to make sure that the new `CastInst` is inserted in a `BasicBlock`. The new calls to `transformZExtICmp()` that we introduce in `visitZExt()` would otherwise cause according assertions to be triggered (in our case this happend, for example, with lnt for the MultiSource/Applications/sqlite3 and SingleSource/Regression/C++/EH/recursive-throw tests). The subsequent usage of `replaceInstUsesWith()` is necessary to ensure that the new `CastInst` replaces the `ZExtInst` accordingly.
- In InstCombineAndOrXor.cpp we again allow the folding of casts on `icmp` instructions.
- The instruction order in the optimized IR for the zext-or-icmp.ll test case is different with the introduced changes.
- The test cases in zext.ll have been adopted from the reverted commits r275989 and r276105.
Reviewers: grosser, majnemer, spatel
Subscribers: eli.friedman, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22864
Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 277635
This removes the restriction for the icmp constant, but as noted by the FIXME comments,
we still need to change individual checks for binop operand constants.
llvm-svn: 277629
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.
This is possible for code like this:
for (int i = 0; i < 3; i++) {
int x;
p = &x;
}
"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.
PR27453
Reviewers: eugenis
Differential Revision: https://reviews.llvm.org/D22842
llvm-svn: 277072
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.
This is possible for code like this:
for (int i = 0; i < 3; i++) {
int x;
p = &x;
}
"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.
PR27453
Reviewers: eugenis
Differential Revision: https://reviews.llvm.org/D22842
llvm-svn: 277068
Just because we can constant fold the result of an instruction does not
imply that we can delete the instruction. It may have side effects.
This fixes PR28655.
llvm-svn: 276389
Making smaller pieces out of some of these ~1000 line functions should make
it easier to incrementally upgrade them to handle vector types.
llvm-svn: 276304
As noted in https://reviews.llvm.org/D22537 , we can use this functionality in
visitSelectInstWithICmp() and InstSimplify, but currently we have duplicated
code.
llvm-svn: 276140
The pattern may look more obviously like a sext if written as:
define i32 @g(i16 %x) {
%zext = zext i16 %x to i32
%xor = xor i32 %zext, 32768
%add = add i32 %xor, -32768
ret i32 %add
}
We already have that fold in visitAdd().
Differential Revision: https://reviews.llvm.org/D22477
llvm-svn: 276035
Summary:
Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one:
> Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp.
Apparently, it introduced a transformation that is a reverse of the transformation that is done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions which is mirrored in the responsible check:
`if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...`
This check seems to sort out more cases than necessary because:
- the reverse transformation is obviously done for `or` instructions only
- and also not every `zext icmp` pair is necessarily the result of this reverse transformation
Therefore we now remove this check and replace it by a more finegrained one in `shouldOptimizeCast()` that now rejects only those `logic(zext(icmp), zext(icmp))` that would be able to be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. That means we are now able to also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`).
As an example, consider the following IR snippet
```
%1 = icmp sgt i64 %a, %b
%2 = zext i1 %1 to i8
%3 = icmp slt i64 %a, %c
%4 = zext i1 %3 to i8
%5 = and i8 %2, %4
```
which would now be transformed to
```
%1 = icmp sgt i64 %a, %b
%2 = icmp slt i64 %a, %c
%3 = and i1 %1, %2
%4 = zext i1 %3 to i8
```
This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. Like shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which could for example hinder ScalarEvolution and Polly on certain code.
Reviewers: grosser, vtjnash, majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22511
Contributed-by: Matthias Reisinger
llvm-svn: 275989
Summary:
This patch cleans up parts of InstCombine to raise its compliance with the LLVM coding standards and to increase its readability. The changes and according rationale are summarized in the following:
- Rename `ShouldOptimizeCast()` to `shouldOptimizeCast()` since functions should start with a lower case letter.
- Move `shouldOptimizeCast()` from InstCombineCasts.cpp to InstCombineAndOrXor.cpp since it's only used there.
- Simplify interface of `shouldOptimizeCast()`.
- Minor code style adaptions in `shouldOptimizeCast()`.
- Remove the documentation on the function definition of `shouldOptimizeCast()` since it just repeats the documentation on its declaration. Also enhance the documentation on its declaration with more information describing its intended use and make it doxygen-compliant.
- Change a comment in `foldCastedBitwiseLogic()` from `fold (logic (cast A), (cast B)) -> (cast (logic A, B))` to `fold logic(cast(A), cast(B)) -> cast(logic(A, B))` since the surrounding comments use this format.
- Remove comment `Only do this if the casts both really cause code to be generated.` in `foldCastedBitwiseLogic()` since it just repeats parts of the documentation of `shouldOptimizeCast()` and does not help to improve readability.
- Simplify the interface of `isEliminableCastPair()`.
- Removed the documentation on the function definition of `isEliminableCastPair()` which only contained obvious statements about its implementation. Instead added more general doxygen-compliant documentation to its declaration.
- Renamed parameter `DoXform` of `transformZExtIcmp()` to `DoTransform` to make its intention clearer.
- Moved documentation of `transformZExtIcmp()` from its definition to its declaration and made it doxygen-compliant.
Reviewers: vtjnash, grosser
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22449
Contributed-by: Matthias Reisinger
llvm-svn: 275964
This is a partial implementation of a general fold for associative+commutative operators:
(op (cast (op X, C2)), C1) --> (cast (op X, op (C1, C2)))
(op (cast (op X, C2)), C1) --> (op (cast X), op (C1, C2))
There are 7 associative operators and 13 cast types, so this could potentially go a lot further.
Differential Revision: https://reviews.llvm.org/D22421
llvm-svn: 275684