1. Update function name and stale code comments.
2. Use variable names that are less ambiguous.
3. Move operand checks into the function as early exits.
llvm-svn: 369390
Summary:
This is rather unconventional..
As the comment there says, we don't have much folds for xor-of-icmps,
we try to turn them into an and-of-icmps, for which we have plenty of folds.
But if the ICmp we need to invert is not single-use - we give up.
As discussed in https://reviews.llvm.org/D65148#1603922,
we may have a non-canonical CLAMP pattern, with bit match and
select-of-threshold that we'll potentially clamp.
As it can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`,
out of all 8 variations of the pattern, only two are **not** canonicalized into
the variant with and+icmp instead of bit math.
The reason is because the ICmp we need to invert is not single-use - we give up.
We indeed can't perform this fold at will, the general rule is that
we should not increase instruction count in InstCombine,
But we wouldn't end up increasing instruction count if we can adapt every other
user to the inverted value. This way the `not` we create **will** get folded,
and in the end the instruction count did not increase.
For that, of course, we need to look at the users of a Value,
which is again rather unconventional for InstCombine :S
Thus i'm proposing to be a little bit more insistive in `foldXorOfICmps()`.
The alternatives would be to not create that `not`, but add duplicate code to
manually invert all users; or to add some even less general combine to handle
some more specific pattern[s].
Reviewers: spatel, nikic, RKSimon, craig.topper
Reviewed By: spatel
Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65530
llvm-svn: 368685
Summary:
I have stumbled into this by accident while preparing to extend backend `x s% C ==/!= 0` handling.
While we did happen to handle this fold in most of the cases,
the folding is indirect - we fold `x u% y` to `x & (y-1)` (iff `y` is power-of-two),
or first turn `x s% -y` to `x u% y`; that does handle most of the cases.
But we can't turn `x s% INT_MIN` to `x u% -INT_MIN`,
and thus we end up being stuck with `(x s% INT_MIN) == 0`.
There is no such restriction for the more general fold:
https://rise4fun.com/Alive/IIeS
To be noted, the fold does not enforce that `y` is a constant,
so it may indeed increase instruction count.
This is consistent with what `x u% y`->`x & (y-1)` already does.
I think it makes sense, it's at most one (simple) extra instruction,
while `rem`ainder is really much more un-simple (and likely **very** costly).
Reviewers: spatel, RKSimon, nikic, xbolva00, craig.topper
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65046
llvm-svn: 367322
Extract method to compute overflow based on binop and signedness,
and then make the result handling code generic. This extends the
always-overflow handling to signed muls, but has currently no effect,
as we don't compute always overflow for them (thus NFC).
llvm-svn: 361721
In InstCombine, we use an idiom of "store i1 true, i1 undef" to indicate we've found a path which we've proven unreachable. We can't actually insert the unreachable instruction since that would require changing the CFG. We leave that to simplifycfg later.
This just factors out that idiom creation so we don't duplicate the same mostly undocument idiom creation in multiple places.
llvm-svn: 358600
Summary:
Enable some of the existing size optimizations for cold code under PGO.
A ~5% code size saving in big internal app under PGO.
The way it gets BFI/PSI is discussed in the RFC thread
http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html
Note it doesn't currently touch loop passes.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59514
llvm-svn: 358422
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.
Differential Revision: https://reviews.llvm.org/D57247
llvm-svn: 356590
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
This commit teaches InstCombine how to replace an atomicrmw operation
into a simple load atomic.
For a given `atomicrmw <op>`, this is possible when:
1. The ordering of that operation is compatible with a load (i.e.,
anything that doesn't have a release semantic).
2. <op> does not modify the value being stored
Differential Revision: https://reviews.llvm.org/D57854
llvm-svn: 353471
Summary:
The fix added in r352904 is not quite correct, or rather misleading:
1. When the texfailctrl (TFC) argument was non-constant, the fix assumed
non-TFE/LWE, which is incorrect.
2. Regardless, this code path cannot even be hit for correct
TFE/LWE-enabled calls, because those return a struct. Added
a test case for those for completeness.
Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf
Reviewers: hliao, dstuttard, arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57681
llvm-svn: 353097
Followup to D55745, this time handling comparisons with ugt and ult
predicates (which are the canonical forms for non-equality predicates).
For ctlz we can convert into a simple icmp, for cttz we can convert
into a mask check.
Differential Revision: https://reviews.llvm.org/D56355
llvm-svn: 351645
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.
This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.
This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).
There's an additional fix now to avoid a dmask=0
For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.
Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.
The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:
%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1
This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.
Differential revision: https://reviews.llvm.org/D48826
Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
Work around for ppcle compiler bug
Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
There's a potential small enhancement to this code that could
solve the cases currently under proposal in D54827 via SimplifyCFG.
Whether instcombine should be doing this kind of semi-non-local
analysis in the first place is an open question, but separating
the logic out can only help if/when we decide to move it to a
different pass.
AFAICT, any proposal to do this in SimplifyCFG could also be seen
as an overreach + it would be incomplete to start the fold from a
branch rather than an icmp.
There's another question here about the code for processUGT_ADDCST_ADD().
That part may be completely dead after rL234638 ?
llvm-svn: 348273
Also revert fix r347876
One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.
llvm-svn: 347911
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.
This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.
This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).
There's an additional fix now to avoid a dmask=0
For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.
Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.
The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:
%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1
Differential revision: https://reviews.llvm.org/D48826
Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
The original patch was committed here:
rL344609
...and reverted:
rL344612
...because it did not properly check/test data types before calling
ComputeNumSignBits().
The tests that caused bot failures for the previous commit are
over-reaching front-end tests that run the entire -O optimizer
pipeline:
Clang :: CodeGen/builtins-systemz-zvector.c
Clang :: CodeGen/builtins-systemz-zvector2.c
I've added a negative test here to ensure coverage for that case.
The new early exit check also tests the type of the 'B' parameter,
so we don't waste time on matching if either value is unsuitable.
Original commit message:
This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549
The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).
The backend has hooks/logic to convert back to logic ops
if that's better for the target.
llvm-svn: 345149
There's probably some vector-with-undef-element pattern
that shows an improvement, so this is probably not quite
'NFC'.
This is the last step towards removing the fake binop
queries for not/neg. Ie, there are no more uses of those
functions in trunk. Fneg should follow.
llvm-svn: 345050
I noticed a missing check and added it at rL344610, but there actually
are codegen tests that will fail without that, so I'll edit those and
submit a fixed patch with more tests.
llvm-svn: 344612
This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549
The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).
The backend has hooks/logic to convert back to logic ops
if that's better for the target.
llvm-svn: 344609
This is an attempt to get out of a local-minimum that instcombine currently
gets stuck in. We essentially combine two optimisations at once, ~a - ~b = b-a
and min(~a, ~b) = ~max(a, b), only doing the transform if the result is at
least neutral. This involves using IsFreeToInvert, which has been expanded a
little to include selects that can be easily inverted.
This is trying to fix PR35875, using the ideas from Sanjay. It is a large
improvement to one of our rgb to cmy kernels.
Differential Revision: https://reviews.llvm.org/D52177
llvm-svn: 343569
Similar to rL342278:
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.
D52075 should be able to use this code too rather than
duplicating all of the logic.
llvm-svn: 342292
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.
llvm-svn: 342278
getSafeVectorConstantForBinop() was calling getBinOpIdentity() assuming
that the constant we wanted was operand 1 (RHS). That's wrong, but I
don't think we could expose a bug or even a suboptimal fold from that
because the callers have other guards for any binop that would have
been affected.
llvm-svn: 336617
This is almost NFC, but there could be some case where the original
code had undefs in the constants (rather than just the shuffle mask),
and we'll use safe constants rather than undefs now.
The FIXME noted in foldShuffledBinop() is already visible in existing
tests, so correcting that is the next step.
llvm-svn: 336558
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.
Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix
There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:
- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16
The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).
Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.
Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad
Reviewers: arsenm, rampitec, majnemer
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48165
llvm-svn: 335230
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)
llvm-svn: 333954
As noted in the review thread for rL333782, we could have
made a bug harder to hit if we were simplifying instructions
before trying other folds.
The shuffle transform in question isn't ever a simplification;
it's just a canonicalization. So I've renamed that to make that
clearer.
This is NFCI at this point, but I've regenerated the test file
to show the cosmetic value naming difference of using
instcombine's RAUW vs. the builder.
Possible follow-ups:
1. Move reassociation folds after simplifies too.
2. Refactor common code; we shouldn't have so much repetition.
llvm-svn: 333820
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Summary:
This change adds handling of the atomic memset intrinsic to the
code path that simplifies the regular memset. In practice this means
that we will now also expand a small constant-length atomic memset
into a single unordered atomic store.
Reviewers: apilipenko, skatkov, mkazantsev, anna, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D46660
llvm-svn: 332132
Summary:
This change reworks the handling of atomic memcpy within the instcombine pass.
Previously, a constant length atomic memcpy would be lowered into loads & stores
as long as no more than 16 load/store pairs are created. This is quite different
from the lowering done for a non-atomic memcpy; which only ever lowers into a single
load/store pair of no more than 8 bytes. Larger constant-sized memcpy calls are
expanded to load/stores in later passes, such as SelectionDAG lowering.
In this change the behaviour for atomic memcpy is unified with non-atomic memcpy;
atomic memcpy is now treated in the same was as non-atomic memcpy has always been.
We leave it to later passes to lower longer-length atomic memcpy calls.
Due to the structure of the pass's handling of memtransfer intrinsics, this change
also gives us handling of atomic memmove that we did not previously have.
Reviewers: apilipenko, skatkov, mkazantsev, anna, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D46658
llvm-svn: 332093
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
Summary:
Simplify integer add expression X % C0 + (( X / C0 ) % C1) * C0 to
X % (C0 * C1). This is a common pattern seen in code generated by the XLA
GPU backend.
Add test cases for this new optimization.
Patch by Bixia Zheng!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: efriedma, craig.topper, lebedev.ri, llvm-commits, jlebar
Differential Revision: https://reviews.llvm.org/D45976
llvm-svn: 330992
Two cleanups:
1. As noted in D45453, we had tests that don't need FMF that were misplaced in the 'fast-math.ll' test file.
2. This removes the final uses of dyn_castFNegVal, so that can be deleted. We use 'match' now.
llvm-svn: 330126
Summary:
Folding patterns like:
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0
into:
%ext = extractelement <4 x i8> %insvec, i32 0
%cond = icmp eq i32 %ext, 0
Combined with existing rules, this allows us to fold patterns like:
%insvec = insertelement <4 x i8> undef, i8 %val, i32 0
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0
into:
%cond = icmp eq i8 %val, 0
When we construct a splat vector via a shuffle, and bitcast the vector into an integer type for comparison against an integer constant. Then we can simplify the the comparison to compare the splatted value against the integer constant.
Reviewers: spatel, anna, mkazantsev
Reviewed By: spatel
Subscribers: efriedma, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D44997
llvm-svn: 329087
This replaces a large chunk of code that was looking for compound
patterns that include these sub-patterns. Existing tests ensure that
all of the previous examples are still folded as expected.
We still need to loosen the FMF check.
llvm-svn: 328502
Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.
Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.
Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.
llvm-svn: 328165
Also, rename 'foldOpWithConstantIntoOperand' because that's annoyingly
vague. The constant check is redundant in some cases, but it allows
removing duplication for most of the calls.
llvm-svn: 326329
This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792
Alive proofs for the cases handled here as well as the bitwise
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97https://rise4fun.com/Alive/Lc5Ehttps://rise4fun.com/Alive/kdf
llvm-svn: 323437
Summary:
For reference, see: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116589.html
This patch fleshes out the instruction class hierarchy with respect to atomic and
non-atomic memory intrinsics. With this change, the relevant part of the class
hierarchy becomes:
IntrinsicInst
-> MemIntrinsicBase (methods-only class)
-> MemIntrinsic (non-atomic intrinsics)
-> MemSetInst
-> MemTransferInst
-> MemCpyInst
-> MemMoveInst
-> AtomicMemIntrinsic (atomic intrinsics)
-> AtomicMemSetInst
-> AtomicMemTransferInst
-> AtomicMemCpyInst
-> AtomicMemMoveInst
-> AnyMemIntrinsic (both atomicities)
-> AnyMemSetInst
-> AnyMemTransferInst
-> AnyMemCpyInst
-> AnyMemMoveInst
This involves some class renaming:
ElementUnorderedAtomicMemCpyInst -> AtomicMemCpyInst
ElementUnorderedAtomicMemMoveInst -> AtomicMemMoveInst
ElementUnorderedAtomicMemSetInst -> AtomicMemSetInst
A script for doing this renaming in downstream trees is included below.
An example of where the Any* classes should be used in LLVM is when reasoning
about the effects of an instruction (ex: aliasing).
---
Script for renaming AtomicMem* classes:
PREFIXES="[<,([:space:]]"
CLASSES="MemIntrinsic|MemTransferInst|MemSetInst|MemMoveInst|MemCpyInst"
SUFFIXES="[;)>,[:space:]]"
REGEX="(${PREFIXES})ElementUnorderedAtomic(${CLASSES})(${SUFFIXES})"
REGEX2="visitElementUnorderedAtomic(${CLASSES})"
FILES=$( grep -E "(${REGEX}|${REGEX2})" -r . | tr ':' ' ' | awk '{print $1}' | sort | uniq )
SED_SCRIPT="s~${REGEX}~\1Atomic\2\3~g"
SED_SCRIPT2="s~${REGEX2}~visitAtomic\1~g"
for f in $FILES; do
echo "Processing: $f"
sed -i ".bak" -E "${SED_SCRIPT};${SED_SCRIPT2};${EA_SED_SCRIPT};${EA_SED_SCRIPT2}" $f
done
Reviewers: sanjoy, deadalnix, apilipenko, anna, skatkov, mkazantsev
Reviewed By: sanjoy
Subscribers: hfinkel, jholewinski, arsenm, sdardis, nhaehnle, JDevlieghere, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38419
llvm-svn: 316950
Summary:
The following transformation for cmp instruction:
icmp smin(x, PositiveValue), 0 -> icmp x, 0
should only be done after checking for min/max to prevent infinite
looping caused by a reverse canonicalization. That is why this
transformation was moved to place after the mentioned check.
Reviewers: spatel, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38934
Patch by: Artur Gainullin <artur.gainullin@intel.com>
llvm-svn: 315895
Eliminate inttype phi with inttoptr/ptrtoint.
This version fixed a bug in finding the matching
phi -- the order of the incoming blocks may be
different (triggered in self build on Windows).
A new test case is added.
llvm-svn: 315272
There's at least one bug here - this code can fail with vector types (PR34856).
It's also being called for FREM; I'm still trying to understand how that is valid.
llvm-svn: 315127
This appears to be miscompiling Clang, as shown on two Windows bootstrap
bots:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7611http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/6870
Nothing else is in the blame list. Both emit errors on this valid code
in the Windows ucrt headers:
C:\...\ucrt\malloc.h:95:32: error: invalid operands to binary expression ('char *' and 'int')
_Ptr = (char*)_Ptr + _ALLOCA_S_MARKER_SIZE;
~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
I am attempting to reproduce this now.
This reverts r315044
llvm-svn: 315108
Summary: If the merged instruction is call instruction, we need to set the scope to the closes common scope between 2 locations, otherwise it will cause trouble when the call is getting inlined.
Reviewers: dblaikie, aprantl
Reviewed By: dblaikie, aprantl
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37877
llvm-svn: 314694
And follow-up r314585.
Leads to segfaults. I'll forward reproduction instructions to the patch
author.
Also, for a recommit, still add the original patch description.
Otherwise, it becomes really tedious to find out what a patch actually
does. The fact that it is a recommit with a fix is somewhat secondary.
llvm-svn: 314622
This patch will eliminate redundant intptr/ptrtoint that pessimizes
analyses such as SCEV, AA and will make optimization passes such
as auto-vectorization more powerful.
Differential revision: http://reviews.llvm.org/D37832
llvm-svn: 314561
In these cases, two selects have constant selectable operands for
both the true and false components and have the same conditional
expression.
We then create two arithmetic operations of the same type and feed a
final select operation using the result of the true arithmetic for the true
operand and the result of the false arithmetic for the false operand and reuse
the original conditionl expression.
The arithmetic operations are naturally folded as a consequence, leaving
only the newly formed select to replace the old arithmetic operation.
Patch by: Michael Berg <michael_c_berg@apple.com>
Differential Revision: https://reviews.llvm.org/D37019
llvm-svn: 313774
In addition to removing chunks of duplicated code, we don't
want these to diverge. If there's a fold for one, there
should be a fold of the other via DeMorgan's Laws.
llvm-svn: 312420
I couldn't find any smaller folds to help the cases in:
https://bugs.llvm.org/show_bug.cgi?id=34046
after:
rL310141
The truncated rotate-by-variable patterns elude all of the existing transforms because
of multiple uses and knowledge about demanded bits and knownbits that doesn't exist
without the whole pattern. So we need an unfortunately large pattern match. But by
simplifying this pattern in IR, the backend is already able to generate
rolb/rolw/rorb/rorw for x86 using its existing rotate matching logic (although
there is a likely extraneous 'and' of the rotate amount).
Note that rotate-by-constant doesn't have this problem - smaller folds should already
produce the narrow IR ops.
Differential Revision: https://reviews.llvm.org/D36395
llvm-svn: 310509
Summary:
This changes SimplifyLibCalls to use the new OptimizationRemarkEmitter
API.
In fact, as SimplifyLibCalls is only ever called via InstCombine,
(as far as I can tell) the OptimizationRemarkEmitter is added there,
and then passed through to SimplifyLibCalls later.
I have avoided changing any remark text.
This closes PR33787
Patch by Sam Elliott!
Reviewers: anemet, davide
Reviewed By: anemet
Subscribers: davide, mehdi_amini, eraman, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D35608
llvm-svn: 309158
Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference.
This patch makes it a reference everywhere including the InstCombiner class itself so there is more inconsistency. This a large, but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do.
llvm-svn: 307451
We aren't looking through any levels of IR here so I don't think we need the power of a matcher or the temporary variable it requires.
llvm-svn: 306885
Summary:
Many languages have a three way comparison idiom where comparing two values
produces not a boolean, but a tri-state value. Typical values (e.g. as used in
the lcmp/fcmp bytecodes from Java) are -1 for less than, 0 for equality, and +1
for greater than.
We actually do a great job already of converting three way comparisons into
binary comparisons when the result produced has one a single use. Unfortunately,
such values can have more than one use, and in that case, our existing
optimizations break down.
The patch adds a peephole which converts a three-way compare + test idiom into a
binary comparison on the original inputs. It focused on replacing the test on
the result of the three way compare and does nothing about removing the three
way compare itself. That's left to other optimizations (which do actually kick
in commonly.)
We currently recognize one idiom on signed integer compare. In the future, we
plan to recognize and simplify other comparison idioms on
other signed/unsigned datatypes such as floats, vectors etc.
This is a resurrection of Philip Reames' original patch:
https://reviews.llvm.org/D19452
Reviewers: majnemer, apilipenko, reames, sanjoy, mkazantsev
Reviewed by: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34278
llvm-svn: 306100
Summary:
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.
Reviewers: reames, sanjoy, efriedma
Reviewed By: reames
Subscribers: mzolotukhin, anna, llvm-commits, skatkov
Differential Revision: https://reviews.llvm.org/D33240
llvm-svn: 305558
Summary: This is the demorganed version of the case we already handle for the OR of iszero.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34244
llvm-svn: 305548
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.
Differential Revision: https://reviews.llvm.org/D33843
llvm-svn: 304864
We have wrappers for several other ValueTracking methods that take care of passing all of the analysis and assumption cache parameters. This extends it to isKnownToBeAPowerOfTwo.
llvm-svn: 303924
Summary:
Fix naming conventions and const correctness.
This completes the changes made in rL303029.
Patch by Yoav Ben-Shalom.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33377
llvm-svn: 303529
Also, fix the old-style capitalization of the related functions
and move them to the 'private' section of the class since they
are just helpers of the visit* functions.
As shown in the post-commit comments for D32143, we are missing
folds for xor-of-icmps.
llvm-svn: 303381
There should be a slight efficiency improvement from handling icmp/fcmp with one matcher and reducing duplicated code.
The larger motivation is that there are questions about how predicate canonicalization is handled, and the refactoring
should make it easier if we want to change any of that behavior.
1. As noted in the code comment, we've chosen 3 of the 16 FCMP preds as not canonical. Why those 3? It goes back to
rL32751 from what I can tell, but I'm not sure if there's a justification for that rule.
2. We currently do not canonicalize integer select conditions. Should we use the same rule that applies to branches
for selects?
3. We currently do canonicalize some FP select conditions, and those rules would conflict with the rule shown here.
Should one or both be changed?
No-functional-change-intended, but adding tests anyway because there's no coverage for most of the predicates.
Differential Revision: https://reviews.llvm.org/D33247
llvm-svn: 303261
Summary:
Merge overflow computation for signed add,
appearing both in InstCombine and ValueTracking.
As part of the merge,
cleanup the interface for overflow checks in InstCombine.
Patch by Yoav Ben-Shalom.
Reviewers: craig.topper, majnemer
Reviewed By: craig.topper
Subscribers: takuto.ikuta, llvm-commits
Differential Revision: https://reviews.llvm.org/D32946
llvm-svn: 303029