Commit Graph

12052 Commits

Author SHA1 Message Date
Craig Topper 45ad631b4c [ScalarizeMaskedMemIntrin] Add some IR only test cases for masked gather expansion.
llvm-svn: 343272
2018-09-27 21:28:55 +00:00
Craig Topper 7d234d6628 [ScalarizeMaskedMemIntrin] When expanding masked loads, start with the passthru value and insert each conditional load result over their element.
Previously we started with undef and did one final merge at the end with a select.

llvm-svn: 343271
2018-09-27 21:28:52 +00:00
Craig Topper dfc0f289fa [ScalarizeMaskedMemIntrin] Handle the case where the mask is an all zero vector.
This shouldn't really happen in practice I hope, but we tried to handle other constant cases. We missed this one because we checked for ConstantVector without realizing that zero becomes ConstantAggregateZero instead.

So instead just check for Constant and use getAggregateElement which will do the dirty work for us.

llvm-svn: 343270
2018-09-27 21:28:46 +00:00
Craig Topper a6478ac5d4 [ScalarizeMaskedMemIntrin] Add dedicated IR only tests for masked load expansion so I can begin making modifications.
llvm-svn: 343269
2018-09-27 21:28:43 +00:00
Sanjay Patel c3f50ff92e [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)
When C is not zero and infinites are not allowed (C / X) > 0 is a sign
test. Depending on the sign of C, the predicate must be swapped.

E.g.:
  foo(double X) {
    if ((-2.0 / X) <= 0) ...
  }
 =>
  foo(double X) {
    if (X >= 0) ...
  }

Patch by: @marels (Martin Elshuber)

Differential Revision: https://reviews.llvm.org/D51942

llvm-svn: 343228
2018-09-27 15:59:24 +00:00
Sanjay Patel 95a816b34a [InstCombine] add tests for FP sign-bit cmp optimization with fdiv; NFC
These are baseline tests for D51942.
Patch by: @marels (Martin Elshuber)

llvm-svn: 343222
2018-09-27 14:24:29 +00:00
Nicola Zaghen 436c012702 [InstCombine] Add new tests in preparation for a combine of icmp (mul nsw/nuw X, C2), C
Proof for the future optimisations are here:
- eq/neq: https://rise4fun.com/Alive/9PBA
- sgt/ugt: https://rise4fun.com/Alive/58yr
- slt/ult: https://rise4fun.com/Alive/VCQ

Differential Revision: https://reviews.llvm.org/D51625

llvm-svn: 343190
2018-09-27 10:08:38 +00:00
Sanjay Patel 150afce75a [InstCombine] add tests that show undef propagation failures from D52548; NFC
Differential Revision: https://reviews.llvm.org/D52556

llvm-svn: 343140
2018-09-26 20:30:47 +00:00
Florian Hahn 6feb637124 [LoopInterchange] Preserve LCSSA.
This patch extends LoopInterchange to move LCSSA to the right place
after interchanging. This is required for LoopInterchange to become a
function pass.

An alternative to the manual moving of the PHIs, we could also re-form
the LCSSA phis for a set of interchanged loops, but that's more
expensive.

Reviewers: efriedma, mcrosier, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D52154

llvm-svn: 343132
2018-09-26 19:34:25 +00:00
Sanjay Patel d938d0d4f5 [InstCombine] add tests for vector insert/extract; NFC
Preliminary step for D52439.

llvm-svn: 343128
2018-09-26 17:57:38 +00:00
David Green 353cb3d4e5 [CodeGen] Enable tail calls for functions with NonNull attributes.
Adding NonNull as attributes to returned pointers has the unfortunate side
effect of disabling tail calls. This patch ignores the NonNull attribute when
we decide whether to tail merge, in the same way that we ignore the NoAlias
attribute, as it has no affect on the call sequence.

Differential Revision: https://reviews.llvm.org/D52238

llvm-svn: 343091
2018-09-26 10:46:18 +00:00
Vyacheslav Zakharin e06831a3b2 Remove LoopID metadata from the branch instruction
that follows the peeled iterations.

Differential Revision: https://reviews.llvm.org/D52176

llvm-svn: 343054
2018-09-26 01:03:21 +00:00
Sanjay Patel f23727d972 [InstCombine] add fneg variation of shuffle-binop fold; NFC
If the fsub in this pattern was replaced by an actual fneg
instruction, we would need to add a fold to recognize that
because fneg would not be a binop.

llvm-svn: 343041
2018-09-25 22:48:58 +00:00
Anna Thomas b1e3d45318 [LV][LAA] Vectorize loop invariant values stored into loop invariant address
Summary:
We are overly conservative in loop vectorizer with respect to stores to loop
invariant addresses.
More details in https://bugs.llvm.org/show_bug.cgi?id=38546
This is the first part of the fix where we start with vectorizing loop invariant
values to loop invariant addresses.

This also includes changes to ORE for stores to invariant address.

Reviewers: anemet, Ayal, mkuper, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50665

llvm-svn: 343028
2018-09-25 20:57:20 +00:00
Teresa Johnson 7fb39dfa7c [ThinLTO] Efficiency fix for writing type id records in per-module indexes
Summary:
In D49565/r337503, the type id record writing was fixed so that only
referenced type ids were emitted into each per-module index for ThinLTO
distributed builds. However, this still left an efficiency issue: each
per-module index checked all type ids for membership in the referenced
set, yielding O(M*N) performance (M indexes and N type ids).

Change the TypeIdMap in the summary to be indexed by GUID, to facilitate
correlating with type identifier GUIDs referenced in the function
summary TypeIdInfo structures. This allowed simplifying other
places where a map from type id GUID to type id map entry was previously
being used to aid this correlation.

Also fix AsmWriter code to handle the rare case of type id GUID
collision.

For a large internal application, this reduced the thin link time by
almost 15%.

Reviewers: pcc, vitalybuka

Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51330

llvm-svn: 343021
2018-09-25 20:14:40 +00:00
Sanjay Patel 69ed4710b8 [InstCombine] narrow binops on concatenated vectors (PR33026)
The motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=33026
...has no shuffles now. This kind of pattern may occur during
vectorization when targets have lumpy ISAs like SSE/AVX.

llvm-svn: 342988
2018-09-25 15:57:37 +00:00
David Green 9108c2b921 [LoopUnroll] Add check to Latch's terminator in UnrollRuntimeLoopRemainder
In this patch, I'm adding an extra check to the Latch's terminator in llvm::UnrollRuntimeLoopRemainder,
similar to how it is already done in the llvm::UnrollLoop.

The compiler would crash if this function is called with a malformed loop.

Patch by Rodrigo Caetano Rocha!

Differential Revision: https://reviews.llvm.org/D51486

llvm-svn: 342958
2018-09-25 10:08:47 +00:00
Christy Lee e94374809e Re-submitting changes in D51550 because it failed to patch.
Reviewers: javed.absar, trentxintong, courbet

Reviewed By: trentxintong

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52433

llvm-svn: 342919
2018-09-24 20:47:12 +00:00
Christy Lee bf112ea25b Reland r342494 after fixing LIT checks.
llvm-svn: 342907
2018-09-24 17:26:30 +00:00
Sanjay Patel 7b86bc22de [InstCombine] add/move tests for extractelement; NFC
llvm-svn: 342905
2018-09-24 17:17:16 +00:00
Matt Arsenault f432011d33 AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses
If the alignment is at least 4, this should report true.

Something still seems off with how < 4-byte types are
handled here though.

Fixing this seems to change how some combines get
to where they get, but somehow isn't changing the net
result.

llvm-svn: 342879
2018-09-24 13:18:15 +00:00
Petar Jovanovic c451c9ef50 [deadargelim] Update dbg.value of 'unused' parameters
DeadArgElim pass marks unused function arguments as ‘undef’ without updating
existing dbg.values referring to it. As a consequence the debug info
metadata in the final executable was wrong.

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D51968

llvm-svn: 342871
2018-09-24 10:01:24 +00:00
Eugene Leviant 2b70d616f0 [WholeProgramDevirt] Don't process declarations when building type id map
Differential revision: https://reviews.llvm.org/D52175

llvm-svn: 342836
2018-09-23 13:27:47 +00:00
Sanjay Patel 09e02fbf51 [InstCombine][x86] try even harder to convert blendv intrinsic to generic IR (PR38814)
Follow-up to rL342324 (D52059):

Missing optimizations with blendv are shown in:
https://bugs.llvm.org/show_bug.cgi?id=38814

This is an easier and more powerful solution than adding pattern matching for a few 
special cases in the backend. The potential danger with this transform in IR is that 
the condition value can get separated from the select, and the backend might not be 
able to make a blendv out of it again.

llvm-svn: 342806
2018-09-22 14:43:55 +00:00
Craig Topper 2b3f5df73a [InstCombine] Fold (min/max ~X, Y) -> ~(max/min X, ~Y) when Y is freely invertible
Summary: This restores the combine that was reverted in r341883. The infinite loop from the failing test no longer occurs due to changes from r342163.

Reviewers: spatel, dmgreen

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52070

llvm-svn: 342797
2018-09-22 05:53:27 +00:00
Warren Ristow 4f27730eaf [Loop Vectorizer] Abandon vectorization when no integer IV found
Support for vectorizing loops with secondary floating-point induction
variables was added in r276554.  A primary integer IV is still required
for vectorization to be done.  If an FP IV was found, but no integer IV
was found at all (primary or secondary), the attempt to vectorize still
went forward, causing a compiler-crash.  This change abandons that
attempt when no integer IV is found.  (Vectorizing FP-only cases like
this, rather than bailing out, is discussed as possible future work
in D52327.)

See PR38800 for more information.

Differential Revision: https://reviews.llvm.org/D52327

llvm-svn: 342786
2018-09-21 23:03:50 +00:00
Sanjay Patel 72d627e5ec [InstCombine] add tests for extractelement; NFC
There are folds under visitExtractElementInst() that don't appear
to have any test coverage, so adding a few basic cases here.

llvm-svn: 342740
2018-09-21 14:43:49 +00:00
JF Bastien 73d8e4e531 Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue
Summary:
his code was in CGDecl.cpp and really belongs in LLVM's isBytewiseValue. Teach isBytewiseValue the tricks clang's isRepeatedBytePattern had, including merging undef properly, and recursing on more types.

clang part of this patch: D51752

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51751

llvm-svn: 342709
2018-09-21 05:17:42 +00:00
Sanjay Patel 18c29b7d74 [InstCombine] rename test file, simplify tests, regenerate full checks; NFC
Fast-math is irrelevant for these transforms.

llvm-svn: 342683
2018-09-20 21:10:14 +00:00
Sameer AbuAsal 77beee4136 [inline Cost] Don't mark functions accessing varargs as non-inlinable
Summary:
rL323619 marks functions that are calling va_end as not viable for
inlining. This patch reverses that since this va_end doesn't need
access to the vriadic arguments list that are saved on the stack, only
va_start does.

Reviewers: efriedma, fhahn

Reviewed By: fhahn

Subscribers: eraman, haicheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D52067

llvm-svn: 342675
2018-09-20 18:39:34 +00:00
Sanjay Patel dfe4380440 [InstCombine] add tests for vector concat with binop (PR33026); NFC
llvm-svn: 342665
2018-09-20 17:10:38 +00:00
Fedor Sergeev ee8d31c49e [New PM] Introducing PassInstrumentation framework
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@

The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.

Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
  and access to them.

* PassInstrumentation class that handles instrumentation-point interfaces
  that call into PassInstrumentationCallbacks.

* Callbacks accept StringRef which is just a name of the Pass right now.
  There were some ideas to pass an opaque wrapper for the pointer to pass instance,
  however it appears that pointer does not actually identify the instance
  (adaptors and managers might have the same address with the pass they govern).
  Hence it was decided to go simple for now and then later decide on what the proper
  mental model of identifying a "pass in a phase of pipeline" is.

* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
  on different IRUnits (e.g. Analyses).

* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
  usual AnalysisManager::getResult. All pass managers were updated to run that
  to get PassInstrumentation object for instrumentation calls.

* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
  args out of a generic PassManager's extra args. This is the only way I was able to explicitly
  run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
  RepeatedPass::run.
  TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
  and then get rid of getAnalysisResult by improving RepeatedPass implementation.

* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
  PassInstrumentationAnalysis. Callbacks registration should be performed directly
  through PassInstrumentationCallbacks.

* new-pm tests updated to account for PassInstrumentationAnalysis being run

* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
  Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.

  Made getName helper to return std::string (instead of StringRef initially) to fix
  asan builtbot failures on CGSCC tests.

Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858

llvm-svn: 342664
2018-09-20 17:08:45 +00:00
Jesper Antonsson 719fa055d0 [InstCombine] Handle vector compares in foldGEPIcmp()
Summary:
This is to fix PR38984 "InstCombine assertion at vector gep/icmp folding":
https://bugs.llvm.org/show_bug.cgi?id=38984

Reviewers: majnemer, spatel, lattner, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D52263

llvm-svn: 342647
2018-09-20 13:37:28 +00:00
Bjorn Pettersson cd53b7f54e [IPSCCP] Fix a problem with removing labels in a switch with undef condition
Summary:
Before removing basic blocks that ipsccp has considered as dead
all uses of the basic block label must be removed. That is done
by calling ConstantFoldTerminator on the users. An exception
is when the branch condition is an undef value. In such
scenarios ipsccp is using some internal assumptions regarding
which edge in the control flow that should remain, while
ConstantFoldTerminator don't know how to fold the terminator.

The problem addressed here is related to ConstantFoldTerminator's
ability to rewrite a 'switch' into a conditional 'br'. In such
situations ConstantFoldTerminator returns true indicating that
the terminator has been rewritten. However, ipsccp treated the
true value as if the edge to the dead basic block had been
removed. So the code for resolving an undef branch condition
did not trigger, and we ended up with assertion that there were
uses remaining when deleting the basic block.

The solution is to resolve indeterminate branches before the
call to ConstantFoldTerminator.

Reviewers: efriedma, fhahn, davide

Reviewed By: fhahn

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52232

llvm-svn: 342632
2018-09-20 09:00:17 +00:00
Eric Christopher 019889374b Temporarily Revert "[New PM] Introducing PassInstrumentation framework"
as it was causing failures in the asan buildbot.

This reverts commit r342597.

llvm-svn: 342616
2018-09-20 05:16:29 +00:00
Fedor Sergeev a5f279ea89 [New PM] Introducing PassInstrumentation framework
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@

The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.

Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
  and access to them.

* PassInstrumentation class that handles instrumentation-point interfaces
  that call into PassInstrumentationCallbacks.

* Callbacks accept StringRef which is just a name of the Pass right now.
  There were some ideas to pass an opaque wrapper for the pointer to pass instance,
  however it appears that pointer does not actually identify the instance
  (adaptors and managers might have the same address with the pass they govern).
  Hence it was decided to go simple for now and then later decide on what the proper
  mental model of identifying a "pass in a phase of pipeline" is.

* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
  on different IRUnits (e.g. Analyses).

* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
  usual AnalysisManager::getResult. All pass managers were updated to run that
  to get PassInstrumentation object for instrumentation calls.

* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
  args out of a generic PassManager's extra args. This is the only way I was able to explicitly
  run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
  RepeatedPass::run.
  TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
  and then get rid of getAnalysisResult by improving RepeatedPass implementation.

* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
  PassInstrumentationAnalysis. Callbacks registration should be performed directly
  through PassInstrumentationCallbacks.

* new-pm tests updated to account for PassInstrumentationAnalysis being run

* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
  Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.

Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858

llvm-svn: 342597
2018-09-19 22:42:57 +00:00
Matt Morehouse e62fc3d0b6 [InstCombine] Disable strcmp->memcmp transform for MSan.
Summary:
The strcmp->memcmp transform can make the resulting memcmp read
uninitialized data, which MSan doesn't like.

Resolves https://github.com/google/sanitizers/issues/993.

Reviewers: eugenis, xbolva00

Reviewed By: eugenis

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D52272

llvm-svn: 342582
2018-09-19 19:37:24 +00:00
Fedor Sergeev 25de3f83be Revert rL342544: [New PM] Introducing PassInstrumentation framework
A bunch of bots fail to compile unittests. Reverting.

llvm-svn: 342552
2018-09-19 14:54:48 +00:00
Roman Lebedev f50023d37c [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask
Summary:
The last low-bit-mask-pattern-producing-pattern i can think of.

https://rise4fun.com/Alive/UGzE <- non-canonical
But we can not canonicalize it because of extra uses.

https://bugs.llvm.org/show_bug.cgi?id=38123

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52148

llvm-svn: 342548
2018-09-19 13:35:46 +00:00
Roman Lebedev ca2bdb03d6 [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask
Summary:
Same as to D52146.
`((1 << y)+(-1))` is simply non-canoniacal version of `~(-1 << y)`: https://rise4fun.com/Alive/0vl
We can not canonicalize it due to the extra uses. But we can handle it here.

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52147

llvm-svn: 342547
2018-09-19 13:35:40 +00:00
Roman Lebedev 183a465dc6 [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask
Summary:
Two folds are happening here:
1. https://rise4fun.com/Alive/oaFX
2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4

This change doesn't just add the handling for eq/ne predicates,
it actually builds upon the previous `foldICmpWithLowBitMaskedVal()` work,
so **all** the 16 fold variants* are immediately supported.

I'm indeed only testing these two predicates.
I do not feel like re-proving all 16 folds*, because they were already proven
for the general case of constant with all-ones in low bits. So as long as
the mask produces all-ones in low bits, i'm pretty sure the fold is valid.

But required, i can re-prove, let me know.

* eq/ne are commutative - 4 folds; ult/ule/ugt/uge - are not commutative (the commuted variant is InstSimplified), 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total.

https://bugs.llvm.org/show_bug.cgi?id=38123
https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52146

llvm-svn: 342546
2018-09-19 13:35:27 +00:00
Fedor Sergeev 875c938fec [New PM] Introducing PassInstrumentation framework
Summary:
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@

The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.

Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
  and access to them.

* PassInstrumentation class that handles instrumentation-point interfaces
  that call into PassInstrumentationCallbacks.

* Callbacks accept StringRef which is just a name of the Pass right now.
  There were some ideas to pass an opaque wrapper for the pointer to pass instance,
  however it appears that pointer does not actually identify the instance
  (adaptors and managers might have the same address with the pass they govern).
  Hence it was decided to go simple for now and then later decide on what the proper
  mental model of identifying a "pass in a phase of pipeline" is.

* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
  on different IRUnits (e.g. Analyses).

* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
  usual AnalysisManager::getResult. All pass managers were updated to run that
  to get PassInstrumentation object for instrumentation calls.

* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
  args out of a generic PassManager's extra args. This is the only way I was able to explicitly
  run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
  RepeatedPass::run.
  TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
  and then get rid of getAnalysisResult by improving RepeatedPass implementation.

* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
  PassInstrumentationAnalysis. Callbacks registration should be performed directly
  through PassInstrumentationCallbacks.

* new-pm tests updated to account for PassInstrumentationAnalysis being run

* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
  Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.

Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858

llvm-svn: 342544
2018-09-19 12:25:52 +00:00
Benjamin Kramer e5e1ea79fd [InstCombine] Don't transform sin/cos -> tanl if for half types
This is still unsafe for long double, we will transform things into tanl
even if tanl is for another type. But that's for someone else to fix.

llvm-svn: 342542
2018-09-19 12:01:38 +00:00
Douglas Yung de94ea140c Revert r342494 as it was failing on a bot and the author cannot look at it until tomorrow.
Failing bot: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/36708

llvm-svn: 342509
2018-09-18 19:34:05 +00:00
Christy Lee c85da8bd9a Do not optimize atomic load to non-atomic memcmp
Differential Revision: https://reviews.llvm.org/D51998

llvm-svn: 342498
2018-09-18 17:02:42 +00:00
Christy Lee 1bc0fb4a3f Check lines before using alias analysis to check for interference
This diff is to show the difference before and after D51550

Differential Revision: https://reviews.llvm.org/D52044

llvm-svn: 342494
2018-09-18 16:43:44 +00:00
Max Kazantsev 0994abda3a [IndVars] Remove unreasonable checks in rewriteLoopExitValues
A piece of logic in rewriteLoopExitValues has a weird check on number of
users which allowed an unprofitable transform in case if an instruction has
more than 6 users.

Differential Revision: https://reviews.llvm.org/D51404
Reviewed By: etherzhhb

llvm-svn: 342444
2018-09-18 04:57:18 +00:00
Matt Arsenault c640798597 LSV: Fix adjust alloca alignment trick for AMDGPU
This was checking the hardcoded address space 0 for the stack.
Additionally, this should be checking for legality with
the adjusted alignment, so defer the alignment check.

Also try to split if the unaligned access isn't allowed.

llvm-svn: 342442
2018-09-18 02:05:44 +00:00
Alina Sbirlea a782a70ad9 [EarlyCSEwMemorySSA] Add MSSA verification and tests to make EarlyCSE failures easier to track.
Summary:
EarlyCSE can make IR changes that will leave MemorySSA with accesses claiming to be optimized, but for which a subsequent MemorySSA run will yield a different optimized result.
Due to relying on AA queries, we can't fix this in general, unless we recompute MemorySSA.
Adding some tests to track this and a basic verify for future potential failures.

Reviewers: george.burgess.iv, gberry

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D51960

llvm-svn: 342422
2018-09-17 22:35:21 +00:00
Matt Arsenault 80ea6dd1d5 Fix vectorization of canonicalize
llvm-svn: 342390
2018-09-17 13:24:30 +00:00
Roman Lebedev 6356864e6d [NFC][InstCombine] One more test pattern for comparisons with low-bit-mask.
https://rise4fun.com/Alive/UGzE <- non-canonical, but has extra uses.

https://bugs.llvm.org/show_bug.cgi?id=38123

llvm-svn: 342345
2018-09-16 12:51:09 +00:00
Roman Lebedev 3fb9414d02 [NFC][InstCombine] Some more tests for comparisons with low-bit-mask.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://bugs.llvm.org/show_bug.cgi?id=38708

llvm-svn: 342343
2018-09-16 08:05:06 +00:00
Craig Topper 2da7381678 [InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and (sub (zext x), (zext y)) --> (zext (sub x, y))
Summary:
If the sub doesn't overflow in the original type we can move it above the sext/zext.

This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported.

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52075

llvm-svn: 342335
2018-09-15 18:54:10 +00:00
Sanjay Patel 296d35a5e9 [InstCombine][x86] try harder to convert blendv intrinsic to generic IR (PR38814)
Missing optimizations with blendv are shown in:
https://bugs.llvm.org/show_bug.cgi?id=38814

If this works, it's an easier and more powerful solution than adding pattern matching 
for a few special cases in the backend. The potential danger with this transform in IR
is that the condition value can get separated from the select, and the backend might 
not be able to make a blendv out of it again. I don't think that's too likely, but 
I've kept this patch minimal with a 'TODO', so we can test that theory in the wild 
before expanding the transform.

Differential Revision: https://reviews.llvm.org/D52059

llvm-svn: 342324
2018-09-15 14:25:44 +00:00
Roman Lebedev 1b7fc87020 [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable.)
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)

The last (as far i know?) pattern, non-canonical due to the extra use.
https://godbolt.org/z/aCMsPk
https://rise4fun.com/Alive/I6f

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52062

llvm-svn: 342321
2018-09-15 12:04:13 +00:00
Vedant Kumar 1b02dad9f2 [CodeGenPrepare] Preserve debug locs in OptimizeExtractBits
CodeGenPrepare has a transform that sinks {lshr, trunc} pairs to make it
easier for the backend to emit fancy extract-bits instructions (e.g UBFX).

Teach it to preserve debug locations and salvage debug values.

llvm-svn: 342319
2018-09-15 04:08:52 +00:00
Sanjay Patel 90a36346bc [InstCombine] refactor mul narrowing folds; NFCI
Similar to rL342278:
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.

D52075 should be able to use this code too rather than
duplicating all of the logic.

llvm-svn: 342292
2018-09-14 22:23:35 +00:00
Wei Mi 6a14325dff [SampleFDO] Add FunctionOffsetTable in compact binary format profile.
The patch saves a function offset table which maps function name index to the
offset of its function profile to the start of the binary profile. By using
the function offset table, for those function profiles which will not be used
when compiling a module, the profile reader does't have to read them. For
profile size around 10~20M, it saves ~10% compile time.

Differential Revision: https://reviews.llvm.org/D51863

llvm-svn: 342283
2018-09-14 20:52:59 +00:00
Sanjay Patel 2426eb46dd [InstCombine] refactor add narrowing folds; NFCI
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.

llvm-svn: 342278
2018-09-14 20:40:46 +00:00
Sebastian Pop 0f30f08b02 HotColdSplit: fix invalid SSA due to outlining
The test used to fail with an invalid phi node: the two predecessors were outlined
and the SSA representation was left invalid. The patch adds the exit block to the
cold region.

llvm-svn: 342277
2018-09-14 20:36:19 +00:00
Sanjay Patel fcf8c7c908 [InstCombine] add more tests for add narrowing folds; NFC
llvm-svn: 342274
2018-09-14 20:33:40 +00:00
Sanjay Patel 003f452522 [InstCombine] rename test file to better describe the fold; NFC
The folds are not limited to zext, and the real goal is width
reduction of a math op. D52075 is proposing to extend this to
subtracts.

llvm-svn: 342254
2018-09-14 18:12:30 +00:00
Sanjay Patel 5a9462e42a [InstCombine] remove unnecessary target constraints for tests; NFC
These are universal folds.

llvm-svn: 342253
2018-09-14 18:06:36 +00:00
Sanjay Patel 7b9e1afd1f [InstCombine] move test next to related tests; NFC
llvm-svn: 342251
2018-09-14 18:05:14 +00:00
Sanjay Patel 1f4f26a2bb [InstCombine] remove stall comment from test file; NFC
llvm-svn: 342250
2018-09-14 18:02:17 +00:00
Sanjay Patel f7ba0ac0b5 [InstCombine] regenerate test checks; NFC
There was a bug in a check line regex that could cause the test to fail
with a naming difference. The auto-gen script seems to work as expected now.

llvm-svn: 342249
2018-09-14 17:53:44 +00:00
Sanjay Patel b437238e95 [InstCombine] add more tests for x86 blendv (PR38814); NFC
llvm-svn: 342237
2018-09-14 13:47:33 +00:00
Florian Hahn 3afb974aa5 [LoopInterchange] Preserve ScalarEvolution, by forgetting about interchanged loops.
As preparation for LoopInterchange becoming a loop pass, it needs to
preserve ScalarEvolution. Even though interchanging should not change
the trip count of the loop, it modifies loop entry, latch and exit
blocks.

I added -verify-scev to some loop interchange tests, but the verification does
not catch problems caused by missing invalidation of SE in loop interchange, as
the trip counts themselves do not change. So there might be potential to
make the SE verification covering more stuff in the future.

Reviewers: mkazantsev, efriedma, karthikthecool

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D52026

llvm-svn: 342209
2018-09-14 07:50:20 +00:00
Craig Topper e385365c40 [InstCombine] Add some test cases for (add (sext x), (sext y)) --> (sext (add int x, y)) and (mul (sext x), (sext y)) --> (sext (mul x, y)). NFC
llvm-svn: 342203
2018-09-14 05:16:58 +00:00
Hideki Saito ea7f3035a0 [VPlan] Implement initial vector code generation support for simple outer loops.
Summary:
[VPlan] Implement vector code generation support for simple outer loops.

Context: Patch Series #1 for outer loop vectorization support in LV  using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
                                                          
This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following:

  - force vector code generation using explicit vectorize_width

  - add conservative early returns in cost model and other places for VPlanNativePath

  - add code for setting up outer loop inductions 

  - support for widening non-induction PHIs that can result from inner loops and uniform conditional branches

  - support for generating uniform inner branches

We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer. 

Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal

Reviewed By: fhahn, hsaito

Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits

Differential Revision: https://reviews.llvm.org/D50820

llvm-svn: 342197
2018-09-14 00:36:00 +00:00
Roman Lebedev d2316d756e [NFC][InstCombine] PR38708 - inefficient pattern for high-bits checking 3.
The last, non-canonical variant:
https://godbolt.org/z/aCMsPk
https://rise4fun.com/Alive/I6f

It can only happen due to the extra use on the inner shift.
But here it is ok.

https://bugs.llvm.org/show_bug.cgi?id=38708

llvm-svn: 342184
2018-09-13 21:34:47 +00:00
Roman Lebedev 6dc87004fa [InstCombine] Inefficient pattern for high-bits checking 2 (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable.)
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)

More complicated, canonical pattern:
https://rise4fun.com/Alive/uhA

We do need to have two `switch()`'es like this,
to not mismatch the swappable predicates.

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52001

llvm-svn: 342173
2018-09-13 20:33:12 +00:00
Roman Lebedev 083744852a [NFC][InstCombine] Test what happens if 'unefficient high bit check' pattern is on both sides.
Came up in https://reviews.llvm.org/D52001#1233827
While we don't do a good job here, we at least want to make
sure that we don't have any inf-loops.

llvm-svn: 342171
2018-09-13 20:33:02 +00:00
Craig Topper 8fc05ce340 [InstCombine] Fold (xor (min/max X, Y), -1) -> (max/min ~X, ~Y) when X and Y are freely invertible.
This allows the xor to be removed completely.

This might help with recomitting r341674, but seems good regardless.

Coincidentally fixes PR38915.

Differential Revision: https://reviews.llvm.org/D51964

llvm-svn: 342163
2018-09-13 18:52:58 +00:00
Craig Topper 3fc5e72d84 [InstCombine] Add test cases for D51964. NFC
llvm-svn: 342162
2018-09-13 18:52:56 +00:00
Matt Arsenault 9de2fb58fa AMDGPU: Fix some outdated datalayouts in tests
llvm-svn: 342131
2018-09-13 11:56:28 +00:00
Max Kazantsev b2724d9af8 [NFC] Add Requires: asserts where needed
llvm-svn: 342108
2018-09-13 04:43:24 +00:00
Max Kazantsev 0e0e19c980 [NFC] Use expensive asserts in relevant LICM tests
llvm-svn: 342107
2018-09-13 04:00:39 +00:00
Sanjay Patel d341988c86 revert r341288 - [Reassociate] swap binop operands to increase factoring potential
This causes or exposes indeterminism that is visible in the output of -reassociate.

llvm-svn: 342083
2018-09-12 21:29:11 +00:00
Sanjay Patel 31017cd10a [InstCombine] add tests for unsigned add overflow; NFC
llvm-svn: 342082
2018-09-12 21:13:37 +00:00
Roman Lebedev e14b0282bb [NFC][InstCombine] Drop newly-added interference-tests-for-high-bit-check.ll
Now that i have actually double-checked, no,
there is no such interference possible...

llvm-svn: 342076
2018-09-12 20:06:46 +00:00
Roman Lebedev 91c668a276 [NFC][InstCombine] R38708 - inefficient pattern for high-bits checking.
More complicated, canonical pattern:
https://rise4fun.com/Alive/uhA
https://godbolt.org/z/o4RB8D

Also, we need to be careful not to skip some patters...

https://bugs.llvm.org/show_bug.cgi?id=38708

llvm-svn: 342074
2018-09-12 19:44:26 +00:00
Roman Lebedev 75404fb9f8 [InstCombine] Inefficient pattern for high-bits checking (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only `n` bits wide (where `n` is a variable.)
There are **many** ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)

Let's handle the second variant first, since it is much simpler.
https://rise4fun.com/Alive/LYjY

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51985

llvm-svn: 342067
2018-09-12 18:19:43 +00:00
Roman Lebedev 99359f391e [NFC][InstCombine] R38708 - inefficient pattern for high-bits checking.
The simplest pattern for now:
https://rise4fun.com/Alive/LYjY
https://godbolt.org/z/o4RB8D

https://bugs.llvm.org/show_bug.cgi?id=38708

llvm-svn: 342054
2018-09-12 14:11:37 +00:00
David Green e27e87cdcb [CGP] Ensure splitgep gives deterministic output
The output of splitLargeGEPOffsets does not appear to be deterministic because
of the way that we iterate over a DenseMap. I've changed it to a MapVector for
consistent output.

The test here isn't particularly great, only showing a consmetic difference in
output. The original reproducer is much larger but show a diffierence in
instruction ordering, leading to different codegen.

Differential Revision: https://reviews.llvm.org/D51851

llvm-svn: 342043
2018-09-12 10:19:10 +00:00
David Green 2352b30c96 [SimplifyCFG] Put an alignment on generated switch tables
Previously the alignment on the newly created switch table data was not set,
meaning that DataLayout::getPreferredAlignment was free to overalign it to 16
bytes. This causes unnecessary code bloat.

Differential Revision: https://reviews.llvm.org/D51800

llvm-svn: 342039
2018-09-12 09:54:17 +00:00
Florian Hahn 1086ce2397 [LV] Move InterleaveGroup and InterleavedAccessInfo to VectorUtils.h (NFC)
Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use
them for VPlan outside LoopVectorize.cpp

Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00

Reviewed By: rengolin, xbolva00

Differential Revision: https://reviews.llvm.org/D49488

llvm-svn: 342027
2018-09-12 08:01:57 +00:00
Sanjay Patel 1cf0734b2f [InstCombine] add folds for unsigned-overflow compares
Name: op_ugt_sum
  %a = add i8 %x, %y
  %r = icmp ugt i8 %x, %a
  =>
  %notx = xor i8 %x, -1
  %r = icmp ugt i8 %y, %notx

Name: sum_ult_op
  %a = add i8 %x, %y
  %r = icmp ult i8 %a, %x
  =>
  %notx = xor i8 %x, -1
  %r = icmp ugt i8 %y, %notx

https://rise4fun.com/Alive/ZRxI

AFAICT, this doesn't interfere with any add-saturation patterns
because those have >1 use for the 'add'. But this should be
better for IR analysis and codegen in the basic cases.

This is another fold inspired by PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

llvm-svn: 342004
2018-09-11 22:40:20 +00:00
Sanjay Patel 26725bdc50 [InstCombine] add folds for icmp with xor mask constant
These are the folds in Alive;
Name: xor_ult
Pre: isPowerOf2(-C1)
%xor = xor i8 %x, C1
%r = icmp ult i8 %xor, C1
=>
%r = icmp ugt i8 %x, ~C1

Name: xor_ugt
Pre: isPowerOf2(C1+1)
%xor = xor i8 %x, C1
%r = icmp ugt i8 %xor, C1
=>
%r = icmp ugt i8 %x, C1

https://rise4fun.com/Alive/Vty

The ugt case in its simplest form was already handled by DemandedBits,
but that's not ideal as shown in the multi-use test.

I'm not sure if these are all of the symmetrical folds, but I adjusted 
the existing code for one of the folds to try to show the similarities.

There's no obvious connection, but this is another preliminary step 
for PR14613...
https://bugs.llvm.org/show_bug.cgi?id=14613

llvm-svn: 341997
2018-09-11 22:00:15 +00:00
Sanjay Patel c79d964fdd [InstCombine] add tests for icmp with xor; NFC
llvm-svn: 341993
2018-09-11 21:13:20 +00:00
Alina Sbirlea a496143c9e Update MemorySSA in LoopUnswitch.
Summary:
Update MemorySSA in old LoopUnswitch pass.
Actual dependency and update is disabled by default.

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D45301

llvm-svn: 341984
2018-09-11 19:19:21 +00:00
Sanjay Patel 342c3bcf11 [InstCombine] enhance vector demanded elements to look at a vector select condition operand
I noticed that we were not back-propagating undef lanes to shuffle masks when we have a 
shuffle that reduces the vector width. This is part of investigating/solving PR38691:
https://bugs.llvm.org/show_bug.cgi?id=38691

The DAG equivalent was proposed with:
D51696

Differential Revision: https://reviews.llvm.org/D51433

llvm-svn: 341981
2018-09-11 18:49:00 +00:00
Sanjay Patel 44c1b3a331 [InstCombine] add tests for add-with-overflow compares; NFC
llvm-svn: 341979
2018-09-11 18:45:28 +00:00
Craig Topper 4e63db8387 [InstCombine] Fix incorrect usage of getPrimitiveSizeInBits when we should be using the element size for vectors
For vectors, getPrimitiveSizeInBits returns the full vector width. This code should using the element size for vectors. This could be fixed by calling getScalarSizeInBits, but its even easier to just get it from the APInt we're checking.

Differential Revision: https://reviews.llvm.org/D51938

llvm-svn: 341971
2018-09-11 17:57:20 +00:00
Florian Hahn 5b7e21a6b7 [CallSiteSplitting] Add debug location to created PHI nodes.
There are 2 cases when we create PHI nodes:
 * For the result of the call that was duplicated in the split blocks.
   Those PHI nodes should have the debug location of the call.

 * For values produced before the call. Those instructions need to be
   duplicated in the split blocks and the PHI nodes should have the
   debug locations of those instructions.

Fixes PR37962.

Reviewers: junbuml, gbedwell, vsk

Reviewed By: junbuml

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D51919

llvm-svn: 341970
2018-09-11 17:55:58 +00:00
Craig Topper a57bb61a3e [InstCombine] Support (mul (sext x), cst) --> (sext (mul x, cst')) and (mul (zext x), cst) --> (zext (mul x, cst')) for vectors constants.
Similar to D51236, but for mul instead of add.

Differential Revision: https://reviews.llvm.org/D51900

llvm-svn: 341961
2018-09-11 16:51:24 +00:00
Alexandros Lamprineas 96762b37e1 [MemorySSAUpdater] Avoid creating self-referencing MemoryDefs
Fix for https://bugs.llvm.org/show_bug.cgi?id=38807, which occurred
while compiling SemaTemplateInstantiate.cpp with clang and GVNHoist
enabled. In the following example:

      1=def(entry)
      /        \
2=def(1)       4=def(1)
3=def(2)       5=def(4)

When removing the MemoryDef 2=def(1) from its basic block, and just
before adding it to the end of the parent basic block, we first
replace all its uses with the defining memory access:

3=def(2) -> 3=def(1)

Then we call insertDef for adding 2=def(1) to the parent basic block,
where we replace the uses of 1=def(entry) with 2=def(1). Doing so we
create a self reference:

2=def(1) -> 2=def(2)  (bad)
3=def(1) -> 3=def(2)  (ok)
4=def(1) -> 4=def(2)  (ok)

Differential Revision: https://reviews.llvm.org/D51801

llvm-svn: 341947
2018-09-11 14:29:59 +00:00
Johannes Doerfert ae3cfeb3ad [FuncAttrs] Remove "access range attributes" for read-none functions
The presence of readnone and an access range attribute (argmemonly,
inaccessiblememonly, inaccessiblemem_or_argmemonly) is considered an
error by the verifier. This seems strict but also not wrong. This
patch makes sure function attribute detection will remove all access
range attributes for readnone functions.

llvm-svn: 341927
2018-09-11 11:51:29 +00:00
Max Kazantsev 9aacaffd98 [NFC] Specify test's option to reduce reliance on defaults
llvm-svn: 341904
2018-09-11 06:34:43 +00:00
Craig Topper 3de8d592d1 [InstCombine] Add testcases for (mul (sext x), cst) --> (sext (mul x, cst')) and (mul (zext x), cst) --> (zext (mul x, cst')) for vectors constants.
If the multiply won't overflow in the original type we can use a smaller mul and sign extend afterwards. We don't currently support this for vector constants.

llvm-svn: 341884
2018-09-10 23:48:21 +00:00
Alina Sbirlea 116caa2920 [InstCombine] Partially revert rL341674 due to PR38897.
Summary:
Revert min/max changes in rL341674 dues to high compile times causing timeouts (PR38897).
Checking in to unblock failing builds. Patch available for post-commit review and re-revert once resolved.
Working on a smaller reproducer for PR38897.

Reviewers: craig.topper, spatel

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D51897

llvm-svn: 341883
2018-09-10 23:47:21 +00:00
Sebastian Pop d76177869a HotColdSplitting: fix test failing because of last commit
llvm-svn: 341839
2018-09-10 15:42:17 +00:00
Gil Rapaport d874c3a480 [LSR] Add tests for small constants; NFC
LSR reassociates small constants that fit into add immediate operands as
unfolded offset. Since unfolded offset is not combined with loop-invariant
registers, LSR does not consider solutions that bump invariant registers by
these constants outside the loop.

llvm-svn: 341835
2018-09-10 14:56:24 +00:00
Tim Northover 12c1f7675f InstCombine: move hasOneUse check to the top of foldICmpAddConstant
There were two combines not covered by the check before now, neither of which
actually differed from normal in the benefit analysis.

The most recent seems to be because it was just added at the top of the
function (naturally). The older is from way back in 2008 (r46687) when we just
didn't put those checks in so routinely, and has been diligently maintained
since.

llvm-svn: 341831
2018-09-10 14:26:44 +00:00
John Brawn 8967e18c4a [GVN] Invalidate cached info for values replaced by equality propagation
When GVN propagates an equality by replacing one value with another it also
needs to invalidate the cached information for the value being replaced.

Differential Revision: https://reviews.llvm.org/D51218

llvm-svn: 341820
2018-09-10 12:23:05 +00:00
Max Kazantsev 4d10ba37b9 [IndVars] Set Changed if sinkUnusedInvariants changes IR. PR38863
Currently, `sinkUnusedInvariants` does not set Changed flag even if it makes
changes in the IR. There is no clear evidence that it can cause a crash, but it
looks highly suspicious and likely invalid.

Differential Revision: https://reviews.llvm.org/D51777
Reviewed By: skatkov

llvm-svn: 341777
2018-09-10 06:32:00 +00:00
Matt Arsenault 72d27f5525 AMDGPU: Fix tests using old number for constant address space
llvm-svn: 341770
2018-09-10 02:54:25 +00:00
Abderrazek Zaafrani c30dfb2dfc [SimplifyIndVar] Avoid generating truncate instructions with non-hoisted Laod operand.
Differential Revision: https://reviews.llvm.org/D49151

llvm-svn: 341726
2018-09-07 22:41:57 +00:00
Piotr Padlewski 9a925ba616 Set cost of invariant group intrinsics to 0
Summary:
Like with other similar intrinsics, presense of strip or
launder.invariant.group should not change the result of inlining cost.
This is because they are just markers and do not perform any computation.

Reviewers: amharc, rsmith, reames, kuhar

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D51814

llvm-svn: 341725
2018-09-07 22:29:48 +00:00
Sanjay Patel caa4de72a2 [InstCombine][x86] add tests for possible blendv transform (PR38814); NFC
llvm-svn: 341715
2018-09-07 21:40:41 +00:00
Philip Reames cb8b3278e5 [AST] Generalize argument specific aliasing
AliasSetTracker has special case handling for memset, memcpy and memmove which pre-existed argmemonly on functions and readonly and writeonly on arguments. This patch generalizes it using the AA infrastructure to any call correctly annotated.

The motivation here is to cut down on confusion, not performance per se. For most instructions, there is a direct mapping to alias set. However, this is not guaranteed by the interface and was not in fact true for these three intrinsics *and only these three intrinsics*. I kept getting myself confused about this invariant, so I figured it would be good to clearly distinguish between a instructions and alias sets. Calls happened to be an easy target.

The nice side effect is that custom implementations of memset/memcpy/memmove - including wrappers discovered by IPO - can now be optimized the same as builts by LICM.

Note: The actual removal of the memset/memtransfer specific handling will happen in a follow on NFC patch.  It was originally part of this one, but separate for ease of review and rebase.

Differential Revision: https://reviews.llvm.org/D50730

llvm-svn: 341713
2018-09-07 21:36:11 +00:00
Sanjay Patel c1416b60f2 [InstCombine] narrow vector select with padded condition and extracted result (PR38691)
shuf (sel (shuf NarrowCond, undef, WideMask), X, Y), undef, NarrowMask) -->
sel NarrowCond, (shuf X, undef, NarrowMask), (shuf Y, undef, NarrowMask)

The motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=38691
...is the last regression test. In that case, we're just left with the narrow select.

Note that if we do create new shuffles, they use the existing extraction identity mask, 
so there's no danger that this transform creates arbitrary shuffles.

Differential Revision: https://reviews.llvm.org/D51496

llvm-svn: 341708
2018-09-07 21:03:34 +00:00
Craig Topper 040c2b0acf [InstCombine] Fold (min/max ~X, Y) -> ~(max/min X, ~Y) when Y is freely invertible
If the ~X wasn't able to simplify above the max/min, we might be able to simplify it by moving it below the max/min.

I had to modify the ~(min/max ~X, Y) transform to prevent getting stuck in a loop when we saw the new ~(max/min X, ~Y) before the ~Y had been folded away to remove the new not.

Differential Revision: https://reviews.llvm.org/D51398

llvm-svn: 341674
2018-09-07 16:19:50 +00:00
Anna Thomas 110df11a1a [LV] Fix code gen for conditionally executed loads and stores
Fix a latent bug in loop vectorizer which generates incorrect code for
memory accesses that are executed conditionally. As pointed in review,
this bug definitely affects uniform loads and may affect conditional
stores that should have turned into scatters as well).

The code gen for conditionally executed uniform loads on architectures
that support masked gather instructions is broken.

Without this patch, we were unconditionally executing the *conditional*
load in the vectorized version.

This patch does the following:
1. Uniform conditional loads on architectures with gather support will
   have correct code generated. In particular, the cost model
   (setCostBasedWideningDecision) is fixed.
2. For the recipes which are handled after the widening decision is set,
   we use the isScalarWithPredication(I, VF) form which is added in the
   patch.

3. Fix the vectorization cost model for scalarization
   (getMemInstScalarizationCost): implement and use isPredicatedInst to
   identify *all* predicated instructions, not just scalar+predicated. So,
   now the cost for scalarization will be increased for maskedloads/stores
   and gather/scatter operations. In short, we should be choosing the
   gather/scatter in place of scalarization on archs where it is
   profitable.
4. We needed to weaken the assert in useEmulatedMaskMemRefHack.

Reviewers: Ayal, hsaito, mkuper

Differential Revision: https://reviews.llvm.org/D51313

llvm-svn: 341673
2018-09-07 15:53:48 +00:00
Aditya Kumar 801394a3d7 Hot cold splitting pass
Find cold blocks based on profile information (or optionally with static analysis).
Forward propagate profile information to all cold-blocks.
Outline a cold region.
Set calling conv and prof hint for the callsite of the outlined function.

Worked in collaboration with: Sebastian Pop <s.pop@samsung.com>
Differential Revision: https://reviews.llvm.org/D50658

llvm-svn: 341669
2018-09-07 15:03:49 +00:00
Florian Hahn e32ff4b28a [InstCombine] Do not fold scalar ops over select with vector condition.
If OtherOpT or OtherOpF have scalar types and the condition is a vector,
we would create an invalid select.

Reviewers: spatel, john.brawn, mssimpso, craig.topper

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D51781

llvm-svn: 341666
2018-09-07 14:40:06 +00:00
Florian Hahn b30f7aeeeb [NewGVN] Mark function as changed if we erase instructions.
Currently eliminateInstructions only returns true if any instruction got
replaced. In the test case for this patch, we eliminate the trivially
dead calls, for which eliminateInstructions not do a replacement and the
function is not marked as changed, which is why the inliner crashes
while traversing the call graph.

Alternatively we could also change eliminateInstructions to return true
in case we mark instructions for deletion, but that's slightly more code
and doing it at the place where the replacement happens seems safer.

Fixes PR37517.

Reviewers: davide, mcrosier, efriedma, bjope

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D51169

llvm-svn: 341651
2018-09-07 11:41:34 +00:00
Max Kazantsev 9e6845d8e1 [IndVars] Set Changed when we delete dead instructions. PR38855
IndVars does not set `Changed` flag when it eliminates dead instructions. As result,
it may make IR modifications and report that it has done nothing. It leads to inconsistent
preserved analyzes results.

Differential Revision: https://reviews.llvm.org/D51770
Reviewed By: skatkov

llvm-svn: 341633
2018-09-07 07:23:39 +00:00
Wei Mi 94d44c97bc [SampleFDO] Make sample profile loader unaware of compact format change.
The patch tries to make sample profile loader independent of profile format
change. It moves compact format related code into FunctionSamples and
SampleProfileReader classes, and sample profile loader only has to interact
with those two classes and will be unaware of profile format changes.

The cleanup also contain some fixes to further remove the difference between
compactbinary format and binary format. After the cleanup using different
formats originated from the same profile will generate the same binaries,
which we verified by compiling two large server benchmarks w/wo thinlto.

Differential Revision: https://reviews.llvm.org/D51643

llvm-svn: 341591
2018-09-06 22:03:37 +00:00
Sanjay Patel 93bd15a005 [InstCombine] add xor+not folds
This fold is needed to avoid a regression when we try
to recommit rL300977. 
We can't see the most basic win currently because 
demanded bits changes the patterns:
https://rise4fun.com/Alive/plpp

llvm-svn: 341559
2018-09-06 16:23:40 +00:00
Sanjay Patel 99d732052f [InstCombine] add tests for xor-not; NFC
These tests demonstrate a missing fold that would
also be needed to avoid a regression when we try 
to recommit rL300977.

llvm-svn: 341557
2018-09-06 15:35:01 +00:00
David Green e6918ca2b3 [SLC] Add an alignment to CreateGlobalString
Previously the alignment on the newly created global strings was not set,
meaning that DataLayout::getPreferredAlignment was free to overalign it
to 16 bytes. This caused unnecessary code bloat with the padding between
variables.

The main example of this happening was the printf->puts optimisation in
SimplifyLibCalls, but as the change here is made in
IRBuilderBase::CreateGlobalString, other globals using this will now be
aligned too.

Differential Revision: https://reviews.llvm.org/D51410

llvm-svn: 341527
2018-09-06 08:42:17 +00:00
Max Kazantsev e157cea3ec [NFC] Add test on full IV widening
llvm-svn: 341456
2018-09-05 10:10:59 +00:00
Sanjay Patel 63cf26cf01 [InstCombine] fix xor-or-xor fold to check uses and handle commutes
I'm probably missing some way to use m_Deferred to remove the code
duplication, but that can be a follow-up.

The improvement in demand_shrink_nsw.ll is an example of missing
the fold because the pattern matching was deficient. I didn't try
to follow the bits in that test, but Alive says it's correct:
https://rise4fun.com/Alive/ugc

llvm-svn: 341426
2018-09-04 23:22:13 +00:00
Sanjay Patel 018ce562a9 [InstCombine] update tests checks; NFC
llvm-svn: 341424
2018-09-04 23:08:23 +00:00
Zhaoshi Zheng a0aa41d793 Revert "Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions"
Reland r341269. Use std::stable_sort when sorting constant condidates.

Reverting commit, r341365:

  Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions

  One of the tests is failing 50% of the time when expensive checks are
  enabled. Not sure how deep the problem is so just reverting while the
  author can investigate so that the bots stop repeatedly failing and
  blaming things incorrectly. Will respond with details on the original
  commit.

Original commit, r341269:

  [Constant Hoisting] Hoisting Constant GEP Expressions

  Leverage existing logic in constant hoisting pass to transform constant GEP
  expressions sharing the same base global variable. Multi-dimensional GEPs are
  rewritten into single-dimensional GEPs.

  https://reviews.llvm.org/D51396

Differential Revision: https://reviews.llvm.org/D51654

llvm-svn: 341417
2018-09-04 22:17:03 +00:00
Anna Thomas dbacea188b [LV] First order recurrence phis should not be treated as uniform
This is fix for PR38786.
First order recurrence phis were incorrectly treated as uniform,
which caused them to be vectorized as uniform instructions.

Patch by Ayal Zaks and Orivej Desh!

Reviewed by: Anna

Differential Revision: https://reviews.llvm.org/D51639

llvm-svn: 341416
2018-09-04 22:12:23 +00:00
Sanjay Patel 5bbe8cd7ef [InstCombine] add tests for xor-or-xor fold; NFC
There are 2 bugs shown here that were untested before:
1. We fail to perform the fold in 1/2 the possible commuted variants.
2. When the fold is done, it disregards extra uses.

llvm-svn: 341415
2018-09-04 22:10:23 +00:00
Sanjay Patel 0f70f86ce0 [InstCombine] make ((X & C) ^ C) form consistent for vectors
It would be better to create a 'not' here, but that's not possible yet.

llvm-svn: 341410
2018-09-04 21:17:14 +00:00
Fedor Sergeev 8b6effd969 [SimpleLoopUnswitch] remove a chain of dead blocks at once
Recent change to deleteDeadBlocksFromLoop was not enough to
fix all the problems related to dead blocks after nontrivial
unswitching of switches.

We need to delete all the dead blocks that were created during
unswitching, otherwise we will keep having problems with phi's
or dead blocks.

This change removes all the dead blocks that are reachable from the loop,
not trying to track whether these blocks are newly created by unswitching
or not. While not completely correct, we are unlikely to get loose but
reachable dead blocks that do not belong to our loop nest.

It does fix all the failures currently known, in particular PR38778.

Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D51519

llvm-svn: 341398
2018-09-04 20:19:41 +00:00
Sanjay Patel 664b2e3bd6 [InstCombine] improve xor+and/or tests
The tests attempted to check for commuted variants
of these folds, but complexity-based canonicalization
meant we had no coverage for at least 1/2 of the cases.

Also, the folds correctly check hasOneUse(), but there
was no coverage for that.

llvm-svn: 341394
2018-09-04 19:06:46 +00:00
Hiroshi Yamauchi 9775a620b0 [PGO] Control Height Reduction
Summary:
Control height reduction merges conditional blocks of code and reduces the
number of conditional branches in the hot path based on profiles.

if (hot_cond1) { // Likely true.
  do_stg_hot1();
}
if (hot_cond2) { // Likely true.
  do_stg_hot2();
}

->

if (hot_cond1 && hot_cond2) { // Hot path.
  do_stg_hot1();
  do_stg_hot2();
} else { // Cold path.
  if (hot_cond1) {
    do_stg_hot1();
  }
  if (hot_cond2) {
    do_stg_hot2();
  }
}

This speeds up some internal benchmarks up to ~30%.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: xbolva00, dmgreen, mehdi_amini, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D50591

llvm-svn: 341386
2018-09-04 17:19:13 +00:00
Chandler Carruth 6cb12444cc Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions
One of the tests is failing 50% of the time when expensive checks are
enabled. Not sure how deep the problem is so just reverting while the
author can investigate so that the bots stop repeatedly failing and
blaming things incorrectly. Will respond with details on the original
commit.

llvm-svn: 341365
2018-09-04 13:36:44 +00:00
Chandler Carruth 664aa868f5 [x86/SLH] Add a real Clang flag and LLVM IR attribute for Speculative
Load Hardening.

Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.

Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.

While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.

This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.

Differential Revision: https://reviews.llvm.org/D51157

llvm-svn: 341363
2018-09-04 12:38:00 +00:00
Nicola Zaghen 9588ad9611 [InstCombine] Fold icmp ugt/ult (add nuw X, C2), C --> icmp ugt/ult X, (C - C2)
Support for sgt/slt was added in rL294898, this adds the same cases also for unsigned compares.

This is the Alive proof: https://rise4fun.com/Alive/nyY

Differential Revision: https://reviews.llvm.org/D50972

llvm-svn: 341353
2018-09-04 10:29:48 +00:00
Max Kazantsev 2cbba56337 [IndVars] Fix usage of SCEVExpander to not mess with SCEVConstant. PR38674
This patch removes the function `expandSCEVIfNeeded` which behaves not as
it was intended. This function tries to make a lookup for exact existing expansion
and only goes to normal expansion via `expandCodeFor` if this lookup hasn't found
anything. As a result of this, if some instruction above the loop has a `SCEVConstant`
SCEV, this logic will return this instruction when asked for this `SCEVConstant` rather
than return a constant value. This is both non-profitable and in some cases leads to
breach of LCSSA form (as in PR38674).

Whether or not it is possible to break LCSSA with this algorithm and with some
non-constant SCEVs is still in question, this is still being investigated. I wasn't
able to construct such a test so far, so maybe this situation is impossible. If it is,
it will go as a separate fix.

Rather than do it, it is always correct to just invoke `expandCodeFor` unconditionally:
it behaves smarter about insertion points, and as side effect of this it will choose a
constant value for SCEVConstants. For other SCEVs it may end up finding a better insertion
point. So it should not be worse in any case.

NOTE: So far the only known case for which this transform may break LCSSA is mapping
of SCEVConstant to an instruction. However there is a suspicion that the entire algorithm
can compromise LCSSA form for other cases as well (yet not proved).

Differential Revision: https://reviews.llvm.org/D51286
Reviewed By: etherzhhb

llvm-svn: 341345
2018-09-04 05:01:35 +00:00
Sanjay Patel d75064e6d5 [InstCombine] allow add+not --> sub for arbitrary vector constants.
llvm-svn: 341335
2018-09-03 18:21:59 +00:00
Sanjay Patel faa02b1abb [InstCombine] consolidate tests for ~(X+C); NFC
llvm-svn: 341332
2018-09-03 18:04:21 +00:00
Florian Hahn cc9dc599ba [SLC] Support expanding pow(x, n+0.5) to x * x * ... * sqrt(x)
Reviewers: evandro, efriedma, spatel

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D51435

llvm-svn: 341330
2018-09-03 17:37:39 +00:00
Sanjay Patel 17e709b66a [InstCombine] allow not+sub fold for arbitrary vector constants
The fold was implemented for the general case but use-limitation,
but the later constant version which didn't check uses was only
matching splat constants.

llvm-svn: 341292
2018-09-02 19:31:45 +00:00
Sanjay Patel 04ab22b3f4 [InstCombine] move/add tests for not+sub; NFC
llvm-svn: 341291
2018-09-02 19:18:13 +00:00
Sanjay Patel ca36eb4e33 [Reassociate] swap binop operands to increase factoring potential
If we have a pair of binops feeding another pair of binops, rearrange the operands so 
the matching pair are together because that allows easy factorization folds to happen 
in instcombine:
((X << S) & Y) & (Z << S) --> ((X << S) & (Z << S)) & Y (reassociation)

--> ((X & Z) << S) & Y (factorize shift from 'and' ops optimization)

This is part of solving PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098

Note that there's an instcombine version of this patch attached there, but we're trying
to make instcombine have less responsibility to improve compile-time efficiency.

For reasons I still don't completely understand, reassociate does this kind of transform
sometimes, but misses everything in my motivating cases.

This patch on its own is gluing an independent cleanup chunk to the end of the existing 
RewriteExprTree() loop. We can build on it and do something stronger to better order the 
full expression tree like D40049. That might be an alternative to the proposal to add a 
separate reassociation pass like D41574.

Differential Revision: https://reviews.llvm.org/D45842

llvm-svn: 341288
2018-09-02 14:22:54 +00:00
Zhaoshi Zheng f5297fb24b [Constant Hoisting] Hoisting Constant GEP Expressions
Leverage existing logic in constant hoisting pass to transform constant GEP
expressions sharing the same base global variable. Multi-dimensional GEPs are
rewritten into single-dimensional GEPs.

Differential Revision: https://reviews.llvm.org/D51396

llvm-svn: 341269
2018-09-01 00:04:56 +00:00
Matt Arsenault c807ce0ee4 SLPVectorizer: Fix assert with different sized address spaces
llvm-svn: 341215
2018-08-31 14:34:53 +00:00
Eli Friedman d5d0a4d27f [ARM] Enable GEP offset splitting for 32-bit ARM.
It has essentially the same benefit it has on 64-bit ARM: it
substantially reduces the number of constants used by large GEP
operations. Seems to be generally helpful across a few different
codebases I've tried.

Differential Revision: https://reviews.llvm.org/D51462

llvm-svn: 341136
2018-08-30 22:18:27 +00:00
Vlad Tsyrklevich 2499aeead9 SafeStack: Prevent OOB reads with mem intrinsics
Summary:
Currently, the SafeStack analysis disallows out-of-bounds writes but not
out-of-bounds reads for mem intrinsics like llvm.memcpy. This could
cause leaks of pointers to the safe stack by leaking spilled registers/
frame pointers. Check for allocas used as source or destination pointers
to mem intrinsics.

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: pcc, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D51334

llvm-svn: 341116
2018-08-30 20:44:51 +00:00
Evandro Menezes 2123ea7d5c [InstCombine] Expand the simplification of pow() into exp2()
Generalize the simplification of `pow(2.0, y)` to `pow(2.0 ** n, y)` for all
scalar and vector types.

This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as
252.eon, 447.dealII, 453.povray.  Otherwise, no significant regressions on
x86-64 or A64.

Differential revision: https://reviews.llvm.org/D49273

llvm-svn: 341095
2018-08-30 19:04:51 +00:00
Eli Friedman 94d3e4dd77 [SROA] Fix alignment for uses of PHI nodes.
Splitting an alloca can decrease the alignment of GEPs into the
partition.  Normally, rewriting accounts for this, but the code was
missing for uses of PHI nodes and select instructions.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38707 .

Differential Revision: https://reviews.llvm.org/D51335

llvm-svn: 341094
2018-08-30 18:59:24 +00:00
Craig Topper f0531da109 [InstCombine] Add test cases for D51398
These tests contain the pattern (neg (max ~X, C)) which we should transform to ((min X, ~C) + 1)

llvm-svn: 341023
2018-08-30 06:14:54 +00:00
Craig Topper b7b353be60 [X86] Make Feature64Bit useful
We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've remove the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or user passing -mattr=+64bit.

I've changed the assert to a report_fatal_error so it's not lost in Release builds.

The test updates are to fix things that tripped the new error.

Differential Revision: https://reviews.llvm.org/D51231

llvm-svn: 341022
2018-08-30 06:01:05 +00:00
Philip Reames 6bd16b5850 [SimplifyCFG] Fix a cost modeling oversight in branch commoning
The cost modeling was not accounting for the fact we were duplicating the instruction once per predecessor.  With a default threshold of 1, this meant we were actually creating #pred copies.

Adding to the fun, there is *absolutely no* test coverage for this.  Simply bailing for more than one predecessor passes all checked in tests.

llvm-svn: 341001
2018-08-30 00:03:02 +00:00
Reid Kleckner 9397c2a23b Revert r340947 "[InstCombine] Expand the simplification of pow() into exp2()"
It broke the clang-cl self-host.

llvm-svn: 340991
2018-08-29 22:58:33 +00:00
Philip Reames 1887c40b22 Add a todo and tests to Address a review commnt from D50925 [NFC]
llvm-svn: 340978
2018-08-29 22:09:21 +00:00
Philip Reames f562fc8dbf [LICM] Hoist stores of invariant values to invariant addresses out of loops
Teach LICM to hoist stores out of loops when the store writes to a location otherwise unused in the loop, writes a value which is invariant, and is guaranteed to execute if the loop is entered.

Worth noting is that this transformation is partially overlapping with the existing promotion transformation. Reasons this is worthwhile anyway include:
 * For multi-exit loops, this doesn't require duplication of the store.
 * It kicks in for case where we can't prove we exit through a normal exit (i.e. we may throw), but can prove the store executes before that possible side exit.

Differential Revision: https://reviews.llvm.org/D50925

llvm-svn: 340974
2018-08-29 21:49:30 +00:00
Fedor Sergeev 7b49aa03af [SimpleLoopUnswitch] After unswitch delete dead blocks in parent loops
Summary:
Assert from PR38737 happens on the dead block inside the parent loop
after unswitching nontrivial switch in the inner loop.

deleteDeadBlocksFromLoop now takes extra care to detect/remove dead
blocks in all the parent loops in addition to the blocks from original
loop being unswitched.

Reviewers: asbirlea, chandlerc

Reviewed By: asbirlea

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51415

llvm-svn: 340955
2018-08-29 19:10:44 +00:00
Sanjay Patel 0f29e953b7 [InstCombine] canonicalize fneg with llvm.sin
This is a follow-up to rL339604 which did the same transform
for a sin libcall. The handling of intrinsics vs. libcalls
is unfortunately scattered, so I'm just adding this next to
the existing transform for llvm.cos for now.

This should resolve PR38458:
https://bugs.llvm.org/show_bug.cgi?id=38458
If the call was already negated, the negates will cancel
each other out.

llvm-svn: 340952
2018-08-29 18:27:49 +00:00
Sanjay Patel 12a7ea44ed [InstCombine] add tests for llvm.sin(-x); NFC
Also add a corresponding test for llvm.cos with FMF to 
make sure that was handled correctly.

llvm-svn: 340950
2018-08-29 18:11:42 +00:00
Evandro Menezes 22e0bdf4ed [InstCombine] Expand the simplification of pow() with nested exp{,2}()
Expand the simplification of `pow(exp{,2}(x), y)` to all FP types.

This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as
252.eon, 447.dealII, 453.povray.  Otherwise, no significant regressions on
x86-64 or A64.

Differential revision: https://reviews.llvm.org/D51195

llvm-svn: 340948
2018-08-29 17:59:48 +00:00
Evandro Menezes a3a7b53571 [InstCombine] Expand the simplification of pow() into exp2()
Generalize the simplification of `pow(2.0, y)` to `pow(2.0 ** n, y)` for all
scalar and vector types.

This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as
252.eon, 447.dealII, 453.povray.  Otherwise, no significant regressions on
x86-64 or A64.

Differential revision: https://reviews.llvm.org/D49273

llvm-svn: 340947
2018-08-29 17:59:34 +00:00
Sanjay Patel 3abd9f6bdc [InstCombine] add test for vector demanded elements + shrinking; NFC
llvm-svn: 340933
2018-08-29 15:34:19 +00:00
Hans Wennborg e0f3e9283f LoopSink: Don't sink into blocks without an insertion point (PR38462)
In the PR, LoopSink was trying to sink into a catchswitch block, which
doesn't have a valid insertion point.

Differential Revision: https://reviews.llvm.org/D51307

llvm-svn: 340900
2018-08-29 06:55:27 +00:00
Zhaoshi Zheng 35818e2789 [QTOOL-37352] Consider isLegalAddressingImm in Constant Hoisting
In Thumb1, legal imm range is [0, 255] for ADD/SUB instructions. However, the
legal imm range for LD/ST in (R+Imm) addressing mode is [0, 127]. Imms in
[128, 255] are materialized by mov R, #imm, and LD/STs use them in (R+R)
addressing mode.

This patch checks if a constant is used as offset in (R+Imm), if so, it checks
isLegalAddressingMode passing the constant value as BaseOffset.

Differential Revision: https://reviews.llvm.org/D50931

llvm-svn: 340882
2018-08-28 23:00:59 +00:00
Alina Sbirlea 52e97a28d4 [SimpleLoopUnswitch] Form dedicated exits after trivial unswitches.
Summary:
Form dedicated exits after trivial unswitches.
Fixes PR38737, PR38283.

Reviewers: chandlerc, fedor.sergeev

Subscribers: sanjoy, jlebar, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D51375

llvm-svn: 340871
2018-08-28 20:41:05 +00:00
Matt Arsenault 10de2775bd AMDGPU: Remove nan tests in class if src is nnan
llvm-svn: 340850
2018-08-28 18:10:02 +00:00
Sanjay Patel 60ffc2e9a4 [InstCombine] fix baseline assertions
rL340842 contained the wrong version of the check lines.

llvm-svn: 340846
2018-08-28 17:23:20 +00:00
Sanjay Patel c9756e5a23 [InstCombine] add tests for select narrowing (PR38691); NFC
llvm-svn: 340842
2018-08-28 16:45:00 +00:00
David Bolvansky c1b27b562b [Inliner] Attribute callsites with inline remarks
Summary:
Sometimes reading an output *.ll file it is not easy to understand why some callsites are not inlined. We can read output of inline remarks (option --pass-remarks-missed=inline) and try correlating its messages with the callsites.

An easier way proposed by this patch is to add to every callsite processed by Inliner an attribute with the latest message that describes the cause of not inlining this callsite. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//.

For example in the provided test the result method //@test1// has two callsites //@bar// and inline remarks report different inlining missed reasons:
  remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6)
  remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive

It is not clear which remark correspond to which callsite. With the inline remark attribute enabled we get the reasons attached to their callsites:
  define void @test1() {
    call void @bar(i1 true) #0
    call void @bar(i1 false) #2
    ret void
  }
  attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" }
  ..
  attributes #2 = { "inline-remark"="(cost=never): recursive" }

Patch by: yrouban (Yevgeny Rouban)

Reviewers: xbolva00, tejohnson, apilipenko

Reviewed By: xbolva00, tejohnson

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D50435

llvm-svn: 340834
2018-08-28 15:27:25 +00:00
Mikael Holmen 4d652c4ce7 [CloneFunction] Constant fold terminators before checking single predecessor
Summary:
This fixes PR31105.

There is code trying to delete dead code that does so by e.g. checking if
the single predecessor of a block is the block itself.

That check fails on a block like this
 bb:
   br i1 undef, label %bb, label %bb
since that has two (identical) predecessors.

However, after the check for dead blocks there is a call to
ConstantFoldTerminator on the basic block, and that call simplifies the
block to
 bb:
   br label %bb

Therefore we now do the call to ConstantFoldTerminator before the check if
the block is dead, so it can realize that it really is.

The original behavior lead to the block not being removed, but it was
simplified as above, and then we did a call to
    Dest->replaceAllUsesWith(&*I);
with old and new being equal, and an assertion triggered.

Reviewers: chandlerc, fhahn

Reviewed By: fhahn

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D51280

llvm-svn: 340820
2018-08-28 12:40:11 +00:00
Craig Topper a6cd4b9bce [InstCombine] Extend (add (sext x), cst) --> (sext (add x, cst')) and (add (zext x), cst) --> (zext (add x, cst')) to work for vectors
Differential Revision: https://reviews.llvm.org/D51236

llvm-svn: 340796
2018-08-28 02:02:29 +00:00
Kit Barton 7c80f98b69 [PPC] Remove Darwin support from POWER backend.
This patch issues an error message if Darwin ABI is attempted with the PPC
backend. It also cleans up existing test cases, either converting the test to
use an alternative triple or removing the test if the coverage is no longer
needed.

Updated Tests
-------------
The majority of test cases were updated to use a different triple that does not
include the Darwin ABI. Many tests were also updated to use FileCheck, in place
of grep.

Deleted Tests
-------------
llvm/test/tools/dsymutil/PowerPC/sibling.test was originally added to test
specific functionality of dsymutil using an object file created with an old
version of llvm-gcc for a Powerbook G4. After a discussion with @JDevlieghere he
suggested removing the test.

llvm/test/CodeGen/PowerPC/combine_loads_from_build_pair.ll was converted from a
PPC test to a SystemZ test, as the behavior is also reproducible there.

All other tests that were deleted were specific to the darwin/ppc ABI and no
longer necessary.

Phabricator Review: https://reviews.llvm.org/D50988

llvm-svn: 340795
2018-08-28 01:18:29 +00:00
Craig Topper e23e8a4f53 [InstCombine] Add test cases for D51236. NFC
llvm-svn: 340789
2018-08-27 22:55:49 +00:00
Sanjay Patel 42d31c20a8 [InstCombine] allow shuffle+binop canonicalization with widening shuffles
This lines up with the behavior of an existing transform where if both 
operands of the binop are shuffled, we allow moving the binop before the 
shuffle regardless of whether the shuffle changes the size of the vector.

llvm-svn: 340787
2018-08-27 22:41:44 +00:00
Evandro Menezes 253991cfaf [PATCH] [InstCombine] Fix issue in the simplification of pow() with nested exp{,2}()
Fix the issue of duplicating the call to `exp{,2}()` when it's nested in
`pow()`, as exposed by rL340462.

Differential revision: https://reviews.llvm.org/D51194

llvm-svn: 340784
2018-08-27 22:11:15 +00:00
Roman Tereshin 02320eee6b Revert "[SCEV][NFC] Check NoWrap flags before lexicographical comparison of SCEVs"
This reverts r319889.

Unfortunately, wrapping flags are not a part of SCEV's identity (they
do not participate in computing a hash value or in equality
comparisons) and in fact they could be assigned after the fact w/o
rebuilding a SCEV.

Grep for const_cast's to see quite a few of examples, apparently all
for AddRec's at the moment.

So, if 2 expressions get built in 2 slightly different ways: one with
flags set in the beginning, the other with the flags attached later
on, we may end up with 2 expressions which are exactly the same but
have their operands swapped in one of the commutative N-ary
expressions, and at least one of them will have "sorted by complexity"
invariant broken.

2 identical SCEV's won't compare equal by pointer comparison as they
are supposed to.

A real-world reproducer is added as a regression test: the issue
described causes 2 identical SCEV expressions to have different order
of operands and therefore compare not equal, which in its turn
prevents LoadStoreVectorizer from vectorizing a pair of consecutive
loads.

On a larger example (the source of the test attached, which is a
bugpoint) I have seen even weirder behavior: adding a constant to an
existing SCEV changes the order of the existing terms, for instance,
getAddExpr(1, ((A * B) + (C * D))) returns (1 + (C * D) + (A * B)).

Differential Revision: https://reviews.llvm.org/D40645

llvm-svn: 340777
2018-08-27 21:41:37 +00:00
Tim Renouf 904343f879 [AMDGPU] Add support for multi-dword s.buffer.load intrinsic
Summary:
Patch by Marek Olsak and David Stuttard, both of AMD.

This adds a new amdgcn intrinsic supporting s.buffer.load, in particular
multiple dword variants. These are convenient to use from some front-end
implementations.

Also modified the existing llvm.SI.load.const intrinsic to common up the
underlying implementation.

This modification also requires that we can lower to non-uniform loads correctly
by splitting larger dword variants into sizes supported by the non-uniform
versions of the load.

V2: Addressed minor review comments.
V3: i1 glc is now i32 cachepolicy for consistency with buffer and
    tbuffer intrinsics, plus fixed formatting issue.
V4: Added glc test.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D51098

Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b
llvm-svn: 340684
2018-08-25 14:53:17 +00:00
Sanjay Patel 57a0b4edd7 [InstCombine] add tests for shuffle+binop transform; NFC
llvm-svn: 340683
2018-08-25 14:37:08 +00:00
Philip Reames 006bdad692 [CVP] Extend tests to illustrate an old patch isn't needed
Back in https://reviews.llvm.org/D19559, I tried to teach CVP about range facts implied by value/value icmps (i.e. no constants.)  In the meantime, we've implemented the optimization, but I couldn't find tests checked in, so adding them.

llvm-svn: 340660
2018-08-24 21:56:43 +00:00
Xinliang David Li bcf726a32d [PGO] add target md5sum in warning message for icall
Differential revision: http://reviews.llvm.org/D51193

llvm-svn: 340657
2018-08-24 21:38:24 +00:00
Eli Friedman 59de37ba6c [SafeStack] Set debug location for calls to __safestack_pointer_address.
Otherwise, the debug info is incorrect.  On its own, this is mostly
harmless, but the safe-stack also later inlines the call to
__safestack_pointer_address, which leads to debug info with the wrong
scope, which eventually causes an assertion failure (and incorrect debug
info in release mode).

Differential Revision: https://reviews.llvm.org/D51075

llvm-svn: 340651
2018-08-24 20:42:32 +00:00
David Bolvansky 1ccbddca2c Revert [Inliner] Attribute callsites with inline remarks
llvm-svn: 340619
2018-08-24 16:39:41 +00:00
David Bolvansky 7c0537a3ac [Inliner] Attribute callsites with inline remarks
Summary:
Sometimes reading an output *.ll file it is not easy to understand why some callsites are not inlined. We can read output of inline remarks (option --pass-remarks-missed=inline) and try correlating its messages with the callsites.

An easier way proposed by this patch is to add to every callsite processed by Inliner an attribute with the latest message that describes the cause of not inlining this callsite. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//.

For example in the provided test the result method //@test1// has two callsites //@bar// and inline remarks report different inlining missed reasons:
  remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6)
  remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive

It is not clear which remark correspond to which callsite. With the inline remark attribute enabled we get the reasons attached to their callsites:
  define void @test1() {
    call void @bar(i1 true) #0
    call void @bar(i1 false) #2
    ret void
  }
  attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" }
  ..
  attributes #2 = { "inline-remark"="(cost=never): recursive" }

Patch by: yrouban (Yevgeny Rouban)

Reviewers: xbolva00, tejohnson, apilipenko

Reviewed By: xbolva00, tejohnson

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D50435

llvm-svn: 340618
2018-08-24 16:28:36 +00:00
Philip Reames 9ec15faf20 [LICM] Hoist an invariant_start out of loops if there are no stores executed before it
Once the invariant_start is reached, we know that no instruction *after* it can modify the memory. So, if we can prove the location isn't read *between entry into the loop and the execution of the invariant_start*, we can execute the invariant_start before entering the loop.

Differential Revision: https://reviews.llvm.org/D51181

llvm-svn: 340617
2018-08-24 16:24:48 +00:00
Florian Hahn 406f1ff1cd [Local] Make DoesKMove required for combineMetadata.
This patch makes the DoesKMove argument non-optional, to force people
to think about it. Most cases where it is false are either code hoisting
or code sinking, where we pick one instruction from a set of
equal instructions among different code paths.

Reviewers: dberlin, nlopes, efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D47475

llvm-svn: 340606
2018-08-24 11:40:04 +00:00
Craig Topper dfa176e813 [ValueTracking] Fix assert message and add test case for r340546 and PR38677.
The bug was already fixed. This just adds a test case for it.

llvm-svn: 340556
2018-08-23 17:45:53 +00:00
David Bolvansky 43b0e25847 [InstCombine] Fold Select with binary op - FP opcodes
Summary:
Follow up for https://reviews.llvm.org/rL339520 and https://reviews.llvm.org/rL338300

Alive:

```
%A = fcmp oeq float %x, 0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %B, float %y
=>
%C = select i1 %A, float %z, float %y
----------                                                                      
  %A = fcmp oeq float %x, 0.0
  %B = fadd nsz float %x, %z
  %C = select %A, float %B, float %y
=>
  %C = select %A, float %z, float %y

Done: 1                                                                         
Optimization is correct

%A = fcmp une float %x, -0.0
%B = fadd nsz float %x, %z
%C = select i1 %A, float %y, float %B
=>
%C = select i1 %A, float %y, float %z
----------                                                                      
  %A = fcmp une float %x, -0.0
  %B = fadd nsz float %x, %z
  %C = select %A, float %y, float %B
=>
  %C = select %A, float %y, float %z

Done: 1                                                                         
Optimization is correct
```


Reviewers: spatel, lebedev.ri

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50714

llvm-svn: 340538
2018-08-23 15:22:15 +00:00
John Brawn 23cbf09fad [GVN] Invalidate cached info for phis when setting dead predecessors to undef
When GVN sets the incoming value for a phi to undef because the incoming block
is unreachable it needs to also invalidate the cached info for that phi in
MemoryDependenceAnalysis, otherwise later queries will return stale information.

Differential Revision: https://reviews.llvm.org/D51099

llvm-svn: 340529
2018-08-23 12:48:17 +00:00
Florian Hahn 3052290dc0 Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
This version of the patch fixes cleaning up ssa_copy intrinsics, so it does not
crash for instructions in blocks that have been marked unreachable.

This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

Differential Revision: https://reviews.llvm.org/D45330

llvm-svn: 340525
2018-08-23 11:04:00 +00:00
David Bolvansky 8715e03477 [LibCalls] Added returned attribute to libcalls
Reviewers: efriedma

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51092

llvm-svn: 340512
2018-08-23 05:18:23 +00:00
Craig Topper bec15b6516 [ValueTracking] Teach computeNumSignBits to understand min/max clamp patterns with constant/splat values
If we have a min/max pair we can do a better job of counting sign bits if we look at them together. This is similar to what is done in the SelectionDAG version of computeNumSignBits for ISD::SMAX/SMIN.

Differential Revision: https://reviews.llvm.org/D51112

llvm-svn: 340480
2018-08-22 23:27:50 +00:00
Evandro Menezes 74135fc79f [NFC] Expand test cases for simplifying pow()
llvm-svn: 340462
2018-08-22 22:44:06 +00:00
Eli Friedman f3c39a7c79 [SafeStack] Handle unreachable code with safe stack coloring.
Instead of asserting that the function doesn't have any unreachable
code, just ignore it for the purpose of computing liveness.

Differential Revision: https://reviews.llvm.org/D51070

llvm-svn: 340456
2018-08-22 21:38:57 +00:00
Alina Sbirlea 8b83d68544 Update MemorySSA in LoopSimplifyCFG.
Summary:
Add MemorySSA as a dependency to LoopSimplifyCFG and preserve it.
Disabled by default until all passes preserve MemorySSA.

Reviewers: bogner, chandlerc

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D50911

llvm-svn: 340445
2018-08-22 20:10:21 +00:00
Alina Sbirlea c1a216b251 Update MemorySSA in LoopInstSimplify.
Summary:
Add MemorySSA as a depency to LoopInstInstSimplify and preserve it.
Disabled by default until all passes preserve MemorySSA.

Reviewers: chandlerc

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D50906

llvm-svn: 340444
2018-08-22 20:05:21 +00:00
Vedant Kumar a85ca3de66 [CodeGenPrepare] Set debug locs when folding a comparison into a uadd.with.overflow
CGP can replace a branch + select with a uadd.with.overflow. Teach it to
set debug locations as it does this.

llvm-svn: 340432
2018-08-22 18:15:03 +00:00
Max Kazantsev 611d645a08 [GuardWidening] Ignore guards with trivial conditions
Guard widening should not spend efforts on dealing with guards with trivial true/false conditions.
Such guards can easily be eliminated by any further cleanup pass like instcombine. However we
should not unconditionally delete them because it may be profitable to widen other conditions
into such guards.

Differential Revision: https://reviews.llvm.org/D50247
Reviewed By: fedor.sergeev

llvm-svn: 340381
2018-08-22 02:40:49 +00:00
Vedant Kumar 4760686823 [CodeGenPrepare] Set debug loc when widening a switch condition
Set a debug location on the cast instruction used to widen a switch
condition.

llvm-svn: 340379
2018-08-22 01:23:31 +00:00
Vedant Kumar 1e8a2c963c [CodeGenPrepare] Set debug locations when splitting selects
When splitting a select into a diamond, set debug locations on
newly-created branch instructions and phi nodes.

llvm-svn: 340371
2018-08-22 00:10:37 +00:00
Vedant Kumar 30406fd789 [CodeGenPrepare] Clean up dbg.value use-before-def as late as possible
CodeGenPrepare has a strategy for moving dbg.values so that a value's
definition always dominates its debug users. This cleanup was happening
too early (before certain CGP transforms were run), resulting in some
dbg.value use-before-def errors.

Perform this cleanup as late as possible to avoid use-before-def.

llvm-svn: 340370
2018-08-21 23:43:08 +00:00
Vedant Kumar 8d652b756e [CodeGenPrepare] Pre-commit debug info test for optimizeSelectInst
This test shows that optimizeSelectInst splits a select and sinks a
`fdiv` operation to one side of the diamond. However, the dbg.value for
the operation isn't moved.

llvm-svn: 340369
2018-08-21 23:42:53 +00:00
Vedant Kumar a459b9f757 Avoid dbg.value use-before-def in a few tests (NFC)
This is preparation for landing a use-before-def verifier for debug
intrinsics (D46100).

As a drive-by, remove `tail` from debug intrinsic calls because it
doesn't mean anything in that context.

llvm-svn: 340366
2018-08-21 23:42:08 +00:00
Philip Reames 6a2a5c99c7 [LICM] Fix a test so it actualy checks what was meant [NFC]
llvm-svn: 340344
2018-08-21 21:27:26 +00:00
Anna Thomas 1d78503f6a NFC: update the test comments in LV test about early exit loops
llvm-svn: 340337
2018-08-21 21:12:02 +00:00
Florian Hahn 7cdf52e425 [CodeExtractor] Use 'normal destination' BB as insert point to store invoke results.
Currently CodeExtractor tries to use the next node after an invoke to
place the store for the result of the invoke, if it is an out parameter
of the region. This fails, as the invoke terminates the current BB.
In that case, we can place the store in the 'normal destination' BB, as
the result will only be available in that case.


Reviewers: davidxl, davide, efriedma

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D51037

llvm-svn: 340331
2018-08-21 20:07:46 +00:00
Florian Hahn 9583d4fa03 [GVN] Assign new value number to calls reading memory, if there is no MemDep info.
Currently we assign the same value number to two calls reading the same
memory location if we do not have MemoryDependence info. Without MemDep
Info we cannot guarantee that there is no store between the two calls, so we
have to assign a new number to the second call.

It also adds a new option EnableMemDep to enable/disable running
MemoryDependenceAnalysis and also renamed NoLoads to NoMemDepAnalysis to
be more explicit what it does. As it also impacts calls that read memory,
NoLoads is a bit confusing.

Reviewers: efriedma, sebpop, john.brawn, wmi

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D50893

llvm-svn: 340319
2018-08-21 19:11:27 +00:00
Nicola Zaghen 8a012cbabf [InstCombine] Add new tests for icmp ugt/ult (add nuw X, C2), C
Differential Revision: https://reviews.llvm.org/D51040

llvm-svn: 340284
2018-08-21 15:27:32 +00:00
Sanjay Patel f3ae9cc33e [InstSimplify] use isKnownNeverNaN to fold more fcmp ord/uno
Remove duplicate tests from InstCombine that were added with
D50582. I left negative tests there to verify that nothing
in InstCombine tries to go overboard. If isKnownNeverNaN is
improved to handle the FP binops or other cases, we should
have coverage under InstSimplify, so we could remove more
duplicate tests from InstCombine at that time.

llvm-svn: 340279
2018-08-21 14:45:13 +00:00
Anna Thomas b02b0ad8c7 [LV] Vectorize loops where non-phi instructions used outside loop
Summary:
Follow up change to rL339703, where we now vectorize loops with non-phi
instructions used outside the loop. Note that the cyclic dependency
identification occurs when identifying reduction/induction vars.

We also need to identify that we do not allow users where the PSCEV information
within and outside the loop are different. This was the fix added in rL307837
for PR33706.

Reviewers: Ayal, mkuper, fhahn

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D50778

llvm-svn: 340278
2018-08-21 14:40:27 +00:00
Sanjay Patel 6bb09a4291 [InstSimplify] add tests for FP uno/ord with nnan; NFC
This is a slight modification of the tests from D50582;
change half of the predicates to 'uno' so we have coverage
for that side too. All of the positive tests can fold to a
constant (true/false), so that should happen in instsimplify.

llvm-svn: 340276
2018-08-21 13:33:13 +00:00
Anna Thomas 2d33ce7701 NFC: Add loop vectorizer tests showing various control flow within loop that skip iterations
llvm-svn: 340275
2018-08-21 13:02:09 +00:00
Max Kazantsev 097ef69182 [LICM] Hoist guards with invariant conditions
This patch teaches LICM to hoist guards from the loop if they are guaranteed to execute and
if there are no side effects that could prevent that.

Differential Revision: https://reviews.llvm.org/D50501
Reviewed By: reames

llvm-svn: 340256
2018-08-21 08:11:31 +00:00
Max Kazantsev f1dc867396 [NFC] Add some LICM tests
llvm-svn: 340254
2018-08-21 07:37:02 +00:00
Philip Reames a5a8546ac6 [AST] Mark invariant.starts as being readonly
These intrinsics are modelled as writing for control flow purposes, but they don't actually write to any location. Marking these - as we did for guards - allows LICM to hoist loads out of loops containing invariant.starts.

Differential Revision: https://reviews.llvm.org/D50861

llvm-svn: 340245
2018-08-21 00:55:35 +00:00
Philip Reames 578c64da0c [LICM] Add tests from D50786 [NFC]
Exercise more use of volatiles to illustrate that nothing changes as we tweak how we detect them.

llvm-svn: 340244
2018-08-21 00:42:07 +00:00
Philip Reames efdd0a426a [LICM][NFC] Add tests from D50730
Landing tests so corresponding change can show effects clearly.  see
D50730 [AST] Generalize argument specific aliasing

llvm-svn: 340243
2018-08-21 00:37:09 +00:00
Philip Reames 4009487e5c [LICM] More tests for D50925 [NFC]
This time, the corresponding cases where we can hoist (store-like) calls out of loops.

llvm-svn: 340242
2018-08-21 00:14:14 +00:00
Philip Reames 529a590bce [LICM][Tests] Add tests for store hoisting [NFC]
https://reviews.llvm.org/D50925 will be rebased on top of this.

llvm-svn: 340233
2018-08-20 23:37:59 +00:00
Craig Topper bee74793a3 [InstCombine] Add splat vector constant support to foldICmpAddOpConst.
Differential Revision: https://reviews.llvm.org/D50946

llvm-svn: 340231
2018-08-20 23:04:25 +00:00
Michael Berg 0b838deddc extend binop folds for selects to include true and false binops flag intersection
Summary: This change address bug 38641

Reviewers: spatel, wristow

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D50996

llvm-svn: 340222
2018-08-20 22:26:58 +00:00
Matt Arsenault 450fcc77a7 ValueTracking: Handle more instructions in isKnownNeverNaN
llvm-svn: 340187
2018-08-20 16:51:00 +00:00
Sanjay Patel 5ae83a21b5 [InstCombine] add tests for insertelement+binop; NFC
llvm-svn: 340184
2018-08-20 16:49:08 +00:00
Craig Topper 5f695cc1e9 [InstCombine] Add test cases for an icmp combine that is missing support for splat vector constants.
llvm-svn: 340144
2018-08-19 18:03:34 +00:00
Matt Arsenault ea4b476a30 ValueTracking: Add tests for isKnownNeverNaN
llvm-svn: 340090
2018-08-17 21:39:52 +00:00
Evandro Menezes e219d384f9 [NFC] Expand test cases for simplifying pow()
In prepatration for the improvements that D49273 enables.

llvm-svn: 340060
2018-08-17 17:59:38 +00:00
Florian Hahn 9e50e915fa [NewGVN] Add tests for r340031.
llvm-svn: 340032
2018-08-17 14:39:53 +00:00
Florian Hahn 19f9e32f07 [InstrSimplify,NewGVN] Add option to ignore additional instr info when simplifying.
NewGVN uses InstructionSimplify for simplifications of leaders of
congruence classes. It is not guaranteed that the metadata or other
flags/keywords (like nsw or exact) of the leader is available for all members
in a congruence class, so we cannot use it for simplification.

This patch adds a InstrInfoQuery struct with a boolean field
UseInstrInfo (which defaults to true to keep the current behavior as
default) and a set of helper methods to get metadata/keywords for a
given instruction, if UseInstrInfo is true. The whole thing might need a
better name, to avoid confusion with TargetInstrInfo but I am not sure
what a better name would be.

The current patch threads through InstrInfoQuery to the required
places, which is messier then it would need to be, if
InstructionSimplify and ValueTracking would share the same Query struct.

The reason I added it as a separate struct is that it can be shared
between InstructionSimplify and ValueTracking's query objects. Also,
some places do not need a full query object, just the InstrInfoQuery.

It also updates some interfaces that do not take a Query object, but a
set of optional parameters to take an additional boolean UseInstrInfo.

See https://bugs.llvm.org/show_bug.cgi?id=37540.

Reviewers: dberlin, davide, efriedma, sebpop, hiraditya

Reviewed By: hiraditya

Differential Revision: https://reviews.llvm.org/D47143

llvm-svn: 340031
2018-08-17 14:39:04 +00:00
Alex Bradbury 3291f9aa81 [AtomicExpandPass] Widen partword atomicrmw or/xor/and before tryExpandAtomicRMW
This patch performs a widening transformation of bitwise atomicrmw 
{or,xor,and} and applies it prior to tryExpandAtomicRMW. This operates 
similarly to convertCmpXchgToIntegerType. For these operations, the i8/i16 
atomicrmw can be implemented in terms of the 32-bit atomicrmw by appropriately 
manipulating the operands. There is no functional change for the handling of 
partword or/xor, but the transformation for partword 'and' is new.

The advantage of performing this transformation early is that the same 
code-path can be used regardless of the approach used to expand the atomicrmw 
(AtomicExpansionKind). i.e. the same logic is used for 
AtomicExpansionKind::CmpXchg and can also be used by the intrinsic-based 
expansion in D47882.

Differential Revision: https://reviews.llvm.org/D48129

llvm-svn: 340027
2018-08-17 14:03:37 +00:00
Anna Thomas 1962621a7e [LICM] Add a diagnostic analysis for identifying alias information
Summary:
Currently, in LICM, we use the alias set tracker to identify if the
instruction (we're interested in hoisting) aliases with instruction that
modifies that memory location.

This patch adds an LICM alias analysis diagnostic tool that checks the
mod ref info of the instruction we are interested in hoisting/sinking,
with every instruction in the loop.  Because of O(N^2) complexity this
is now only a diagnostic tool to show the limitation we have with the
alias set tracker and is OFF by default.

Test cases show the difference with the diagnostic analysis tool, where
we're able to hoist out loads and readonly + argmemonly calls from the
loop, where the alias set tracker analysis is not able to hoist these
instructions out.

Reviewers: reames, mkazantsev, fedor.sergeev, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50854

llvm-svn: 340026
2018-08-17 13:44:00 +00:00
Max Kazantsev 7b78d3920c [MustExecute] Fix algorithmic bug in isGuaranteedToExecute. PR38514
The description of `isGuaranteedToExecute` does not correspond to its implementation.
According to description, it should return `true` if an instruction is executed under the
assumption that its loop is *entered*. However there is a sophisticated alrogithm inside
that tries to prove that the instruction is executed if the loop is *exited*, which is not the
same thing for infinite loops. There is an attempt to protect from dealing with infinite loops
by prohibiting loops without exit blocks, however an infinite loop can have exit blocks.

As result of that, MustExecute can falsely consider some blocks that are never entered as
mustexec, and LICM can hoist dangerous instructions out of them basing on this fact.
This may introduce UB to programs which did not contain it initially.

This patch removes the problematic algorithm and replaced it with a one which tries to
prove what is required in description.

Differential Revision: https://reviews.llvm.org/D50558
Reviewed By: reames

llvm-svn: 339984
2018-08-17 06:19:17 +00:00
Max Kazantsev cfa3e66b8e [NFC] Add tests to ensure that improvement of MustThrow analysis will not lead to problems in future
llvm-svn: 339983
2018-08-17 05:20:25 +00:00
Sanjay Patel 8ba631d9c8 [InstCombine] add reflection fold for tan(-x)
This is a follow-up suggested with rL339604.
For tan(), we don't have a corresponding LLVM 
intrinsic -- unlike sin/cos -- so this is the 
only way/place that we can do this fold currently.

llvm-svn: 339958
2018-08-16 22:46:20 +00:00
Sanjay Patel 75714b598d [InstCombine] add tests for tan with negated arg; NFC
llvm-svn: 339953
2018-08-16 22:05:51 +00:00
Michael Berg ed89d069f4 add a missed case for binary op FMF propagation under select folds
llvm-svn: 339938
2018-08-16 20:59:45 +00:00
Philip Reames 684fa57ef7 [MemLoc] Fix a bug causing any use of invariant.end to crash in LICM
The fix is fairly simple, but is says something unpleasant about the usage and testing of invariant.start/end scopes that this went undetected.  To put this in perspective, *any* invariant.end in a loop flowing through LICM crashed.  I haven't bothered to figure out just how far back this goes, but it's not caused by any of the recent changes.  We're probably talking months if not years.  

llvm-svn: 339936
2018-08-16 20:48:55 +00:00
Evandro Menezes 42422b33cf [NFC] Fix typo in test cases
llvm-svn: 339900
2018-08-16 17:03:22 +00:00
Evandro Menezes c05c7e11bb [InstCombine] Expand the simplification of pow(x, 0.5) to sqrt(x)
Expand the number of cases when `pow(x, 0.5)` is simplified into `sqrt(x)`
by considering the math semantics with more granularity.

Differential revision: https://reviews.llvm.org/D50036

llvm-svn: 339887
2018-08-16 15:58:08 +00:00
Sanjay Patel 039f556f44 [InstCombine] move vector compare before same-shuffled ops
This is a step towards fixing PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463

llvm-svn: 339875
2018-08-16 12:52:17 +00:00
Guozhi Wei 8c17f9a77d [CodeGenPrepare] Add BothExtension type to PromotedInsts
This patch fixes PR38125.

Instruction extension types are recorded in PromotedInsts, it can be used later in function canGetThrough. If an instruction has two users with different extension types, it will be inserted into PromotedInsts two times in function promoteOperandForOther. The second one overwrites the first one, and the final extension type is wrong, later causes problem in canGetThrough.

This patch changes the simple bool extension type to 2-bit enum type, add a BothExtension type in addition to zero/sign extension. When an user sees BothExtension for an instruction, it actually knows nothing about how that instruction is extended.

Differential Revision: https://reviews.llvm.org/D49512

llvm-svn: 339822
2018-08-15 22:08:26 +00:00
Matt Arsenault 9a389fbd79 AMDGPU: Stop producing icmp/fcmp intrinsics with invalid types
llvm-svn: 339815
2018-08-15 21:14:25 +00:00
Amara Emerson 070ac768ff [InstCombine] Fix IC trying to create a xor of pointer types.
rdar://42473741

Differential Revision: https://reviews.llvm.org/D50775

llvm-svn: 339796
2018-08-15 17:46:22 +00:00
Max Kazantsev 5a10d127b9 [AliasSetTracker] Do not treat experimental_guard intrinsic as memory writing instruction
The `experimental_guard` intrinsic has memory write semantics to model the thread-exiting
logic, but does not do any actual writes to memory. Currently, `AliasSetTracker` treats it as a
normal memory write. As result, a loop-invariant load cannot be hoisted out of loop because
the guard may possibly alias with it.

This patch makes `AliasSetTracker` so that it doesn't treat guards as memory writes.

Differential Revision: https://reviews.llvm.org/D50497
Reviewed By: reames

llvm-svn: 339753
2018-08-15 06:21:02 +00:00
Sanjay Patel b1546da0e8 [InstCombine] fix typos in tests; NFC
See D50036.

llvm-svn: 339713
2018-08-14 19:13:07 +00:00
Sanjay Patel 73b7e9f65e [InstCombine] add tests for pow->sqrt; NFC
D50036 should fix the missed optimizations.

llvm-svn: 339711
2018-08-14 19:05:37 +00:00
Anna Thomas 60a1e4dddc [LV] Teach about non header phis that have uses outside the loop
Summary:
This patch teaches the loop vectorizer to vectorize loops with non
header phis that have have outside uses.  This is because the iteration
dependence distance for these phis can be widened upto VF (similar to
how we do for induction/reduction) if they do not have a cyclic
dependence with header phis. When identifying reduction/induction/first
order recurrence header phis, we already identify if there are any cyclic
dependencies that prevents vectorization.

The vectorizer is taught to extract the last element from the vectorized
phi and update the scalar loop exit block phi to contain this extracted
element from the vector loop.

This patch can be extended to vectorize loops where instructions other
than phis have outside uses.

Reviewers: Ayal, mkuper, mssimpso, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50579

llvm-svn: 339703
2018-08-14 18:22:19 +00:00
David Bolvansky ba74d1c4ea [NFC] Tests for select with binop fold - FP opcodes
llvm-svn: 339692
2018-08-14 17:03:47 +00:00
Sanjay Patel c8e3943e89 [InstCombine] regenerate checks; NFC
llvm-svn: 339683
2018-08-14 15:21:13 +00:00
Sanjay Patel 19c7e7dab4 [InstCombine] regenerate checks; NFC
llvm-svn: 339681
2018-08-14 15:18:52 +00:00
Tomasz Krupa e766e5f636 [X86] Constant folding of adds/subs intrinsics
Summary: This adds constant folding of signed add/sub with saturation intrinsics.

Reviewers: craig.topper, spatel, RKSimon, chandlerc, efriedma

Reviewed By: craig.topper

Subscribers: rnk, llvm-commits

Differential Revision: https://reviews.llvm.org/D50499

llvm-svn: 339659
2018-08-14 09:04:01 +00:00
Reid Kleckner 40e7663b1f [BasicAA] Don't assume tail calls with byval don't alias allocas
Summary:
Calls marked 'tail' cannot read or write allocas from the current frame
because the current frame might be destroyed by the time they run.
However, a tail call may use an alloca with byval. Calling with byval
copies the contents of the alloca into argument registers or stack
slots, so there is no lifetime issue. Tail calls never modify allocas,
so we can return just ModRefInfo::Ref.

Fixes PR38466, a longstanding bug.

Reviewers: hfinkel, nlewycky, gbiv, george.burgess.iv

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D50679

llvm-svn: 339636
2018-08-14 01:24:35 +00:00
Roman Lebedev 3534874fbf [InstCombine] Re-land: Optimize redundant 'signed truncation check pattern'.
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}

General pattern:
  X & Y

Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e.  `%arg & 4294967168`  can be either  `4294967168`  or  `0`
Pattern can be one of:
  %t = add        i32 %arg,    128
  %r = icmp   ult i32 %t,      256
Or
  %t0 = shl       i32 %arg,    24
  %t1 = ashr      i32 %t0,     24
  %r  = icmp  eq  i32 %t1,     %arg
Or
  %t0 = trunc     i32 %arg  to i8
  %t1 = sext      i8  %t0   to i32
  %r  = icmp  eq  i32 %t1,     %arg
This pattern is a signed truncation check.

And `X` is checking that some bit in that same mask is zero.
I.e. can be one of:
  %r = icmp sgt i32   %arg,    -1
Or
  %t = and      i32   %arg,    2147483648
  %r = icmp eq  i32   %t,      0

Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
  %r = icmp ult i32 %arg, 128

The transform itself ended up being rather horrible, even though i omitted some cases.
Surely there is some infrastructure that can help clean this up that i missed?

https://rise4fun.com/Alive/3Ou

The initial commit (rL339610)
was reverted, since the first assert was being triggered.
The @positive_with_extra_and test now has coverage for that case.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: RKSimon, erichkeane, vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D50465

llvm-svn: 339621
2018-08-13 21:54:37 +00:00
Roman Lebedev 93f7e7f03e [NFC][InstCombine] Add a test for D50465 that used to assert
This is valid to fold, too.
https://rise4fun.com/Alive/0lz

llvm-svn: 339619
2018-08-13 21:49:33 +00:00
Sanjay Patel 15bff18c6f [SimplifyLibCalls] don't drop fast-math-flags on trig reflection folds (retry r339608)
Even though this code is below a function called optimizeFloatingPointLibCall(),
we apparently can't guarantee that we're dealing with FPMathOperators, so bail
out immediately if that's not true.

llvm-svn: 339618
2018-08-13 21:49:19 +00:00
Roman Lebedev 28a42c7706 Revert "[InstCombine] Optimize redundant 'signed truncation check pattern'."
At least one buildbot was able to actually trigger that assert
on the top of the function. Will investigate.

This reverts commit r339610.

llvm-svn: 339612
2018-08-13 20:46:22 +00:00
Roman Lebedev 4c4750771f [InstCombine] Optimize redundant 'signed truncation check pattern'.
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}

General pattern:
  X & Y

Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e.  `%arg & 4294967168`  can be either  `4294967168`  or  `0`
Pattern can be one of:
  %t = add        i32 %arg,    128
  %r = icmp   ult i32 %t,      256
Or
  %t0 = shl       i32 %arg,    24
  %t1 = ashr      i32 %t0,     24
  %r  = icmp  eq  i32 %t1,     %arg
Or
  %t0 = trunc     i32 %arg  to i8
  %t1 = sext      i8  %t0   to i32
  %r  = icmp  eq  i32 %t1,     %arg
This pattern is a signed truncation check.

And `X` is checking that some bit in that same mask is zero.
I.e. can be one of:
  %r = icmp sgt i32   %arg,    -1
Or
  %t = and      i32   %arg,    2147483648
  %r = icmp eq  i32   %t,      0

Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
  %r = icmp ult i32 %arg, 128

https://rise4fun.com/Alive/3Ou

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: RKSimon, erichkeane, vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D50465

llvm-svn: 339610
2018-08-13 20:33:08 +00:00
Sanjay Patel 66c6fe6534 revert r339608 - [SimplifyLibCalls] don't drop fast-math-flags on trig reflection folds
Can't set the builder flags without knowing this is an FPMathOperator. I'll add a test
for that and try again.

llvm-svn: 339609
2018-08-13 20:20:38 +00:00
Sanjay Patel 981f50919e [SimplifyLibCalls] don't drop fast-math-flags on trig reflection folds
llvm-svn: 339608
2018-08-13 20:14:27 +00:00
Anna Thomas cce7c24af1 NFC: Add a test to LV showing that reduction is not possible when reduction var is reset in the loop
Added a test case to reduction showing where it's illegal to identify
vectorize a loop.
Resetting the reduction var during loop iterations disallows us from
widening the dependency cycle to VF, thereby making it illegal to
vectorize the loop.

llvm-svn: 339605
2018-08-13 19:55:25 +00:00
Sanjay Patel e45a83d447 [SimplifyLibCalls] add reflection fold for -sin(-x) (PR38458)
This is a very partial fix for the reported problem. I suspect
we do not get this fold in most motivating cases because most of
the time, the libcall would have been replaced by an intrinsic,
and that optimization is handled elsewhere...but maybe it should
be handled here?

llvm-svn: 339604
2018-08-13 19:24:41 +00:00
Roman Lebedev 2da1ef5b9e [InstCombine][NFC] Tests for 'signed truncation check' optimization
See D50465 for the actual opt itself.

Differential Revision: https://reviews.llvm.org/D50464

llvm-svn: 339602
2018-08-13 18:51:09 +00:00
Sanjay Patel e33062369e [InstCombine] add more tests for trig reflections; NFC (PR38458)
llvm-svn: 339598
2018-08-13 18:34:32 +00:00
Simon Pilgrim 82edf8d329 [InstCombine] Limit simplifyAllocaArraySize constant folding to values that fit into a uint64_t
Fixes OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5223

llvm-svn: 339584
2018-08-13 16:50:20 +00:00
Sanjay Patel d379f39e18 [InstCombine] auto-generate full checks and add cos intrinsic test; NFC
llvm-svn: 339579
2018-08-13 16:29:01 +00:00
Evandro Menezes 5ecd6c1a46 [SLC] Expand simplification of pow() for vector types
Also consider vector constants when simplifying `pow()`.

Differential revision: https://reviews.llvm.org/D50035

llvm-svn: 339578
2018-08-13 16:12:37 +00:00
Max Kazantsev 5c490b49c3 [GuardWidening] Widen very likely non-taken br instructions
This is a second part of D49974 that handles widening of conditional branches that
have very likely `false` branch.

Differential Revision: https://reviews.llvm.org/D50040
Reviewed By: reames

llvm-svn: 339537
2018-08-13 07:58:19 +00:00
Craig Topper 484b342c68 [X86] Add constant folding for AVX512 versions of scalar floating point to integer conversion intrinsics.
Summary:
We've supported constant folding for sse versions for many years. This patch adds support for the avx512 versions including unsigned with the default rounding mode. We could probably do more with other roundings modes and SAE in the future.

The test cases are largely based on the sse.ll test cases. But I did add some test cases to ensure the unsigned versions don't accept negative values. Also checked the bounds of f64->i32 conversions to make sure unsigned has a larger positive range than signed.

Reviewers: RKSimon, spatel, chandlerc

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50553

llvm-svn: 339529
2018-08-12 22:09:54 +00:00
David Bolvansky cd57242587 [NFC] Fixed build, updated tests
llvm-svn: 339524
2018-08-12 18:32:53 +00:00
David Bolvansky ddfe408f9a [NFC] Renamed test file
llvm-svn: 339523
2018-08-12 17:43:27 +00:00
David Bolvansky 01d98cc03f [InstCombine] Fold Select with binary op - non-commutative opcodes
Summary:
Basic version was merged - https://reviews.llvm.org/D49954

This adds support for FP & non-commutative opcodes

Precommited tests: https://reviews.llvm.org/rL338727

Reviewers: spatel, lebedev.ri

Reviewed By: spatel

Subscribers: jfb

Differential Revision: https://reviews.llvm.org/D50190

llvm-svn: 339520
2018-08-12 17:30:07 +00:00
Sanjay Patel dc185ee275 [InstCombine] fix/enhance fadd/fsub factorization
(X * Z) + (Y * Z) --> (X + Y) * Z
  (X * Z) - (Y * Z) --> (X - Y) * Z
  (X / Z) + (Y / Z) --> (X + Y) / Z
  (X / Z) - (Y / Z) --> (X - Y) / Z

The existing code that implemented these folds failed to 
optimize vectors, and it transformed code with multiple 
uses when it should not have.

llvm-svn: 339519
2018-08-12 15:48:26 +00:00
Sanjay Patel ce104b6c16 [InstCombine] move/add tests for fadd/fsub factorization; NFC
llvm-svn: 339518
2018-08-12 15:06:15 +00:00
David Green f7111d1ece [UnJ] Improve explicit loop count checks
Try to improve the computed counts when it has been explicitly set by a pragma
or command line option. This moves the code around, so that first call to
computeUnrollCount to get a sensible count and override that if explicit unroll
and jam counts are specified.

Also added some extra debug messages for when unroll and jamming is disabled.

Differential Revision: https://reviews.llvm.org/D50075

llvm-svn: 339501
2018-08-11 07:37:31 +00:00
Philip Reames 85afd1a9a0 [LICM] Hoist assumes out of loops
If we have an assume which is known to execute and whose operand is invariant, we can lift that into the pre-header. So long as we don't change which paths the assume executes on, this is a legal transformation. It's likely to be a useful canonicalization as other transforms only look for dominating assumes.

Differential Revision: https://reviews.llvm.org/D50364

llvm-svn: 339481
2018-08-10 22:21:56 +00:00
Sanjay Patel 0b62b01129 [InstCombine] add tests for fsub factorization; NFC
The tests show that;
1. The fold doesn't fire for vectors, but it should.
2. The fold fires regardless of uses, but it shouldn't.

llvm-svn: 339470
2018-08-10 21:00:27 +00:00
Sanjay Patel 3950095edf [InstCombine] add tests to show disabling of libcall/intrinsic shrinking; NFC
llvm-svn: 339467
2018-08-10 20:12:36 +00:00
Matt Arsenault d35f46caf1 AMDGPU: Turn class x, p_zero|n_zero into fcmp oeq x, 0
The library does use this for some reason.

llvm-svn: 339461
2018-08-10 18:58:49 +00:00
Sanjay Patel 12a2911f62 [InstCombine] add/update tests for selectBinOpIdentity; NFC
This includes a test that would have exposed the bug in rL339439
which was reverted at rL339446. The compare can be integer while
the binop is FP or vice-versa, so we need to use the binop type
when we ask for the identity constant.

llvm-svn: 339453
2018-08-10 17:20:24 +00:00
David Bolvansky 5099835541 [InstCombine][NFC] Added tests for select with binop fold
llvm-svn: 339441
2018-08-10 15:29:09 +00:00
Max Kazantsev 4e9def57c7 [NFC] Add tests that demonstrate that MustExecute is fundamentally broken
llvm-svn: 339417
2018-08-10 09:20:46 +00:00
George Burgess IV ff08c80efc [MemorySSA] "Fix" lifetime intrinsic handling
MemorySSA currently creates MemoryAccesses for lifetime intrinsics, and
sometimes treats them as clobbers. This may/may not be the best way
forward, but while we're doing it, we should consider
MayAlias/PartialAlias to be clobbers.

The ideal fix here is probably to remove all of this reasoning about
lifetimes from MemorySSA + put it into the passes that need to care. But
that's a wayyy broader fix that needs some consensus, and we have
miscompiles + a release branch today, and this should solve the
miscompiles just as well.

differential revision is D43269. Landing without an explicit LGTM (and
without using the special please-autoclose-this syntax) so we can still
use that revision as a place to decide what the right fix here is.

llvm-svn: 339411
2018-08-10 05:14:43 +00:00
David Bolvansky 909889b2cb [InstCombine] Transform str(n)cmp to memcmp
Summary:
Motivation examples:
int strcmp_memcmp() {
    char buf[12];
    return strcmp(buf, "key") == 0;
}

int strcmp_memcmp2() {
    char buf[12];
    return strcmp(buf, "key") != 0;
}

int strncmp_memcmp() {
    char buf[12];
    return strncmp(buf, "key", 3) == 0;
}

can be turned to memcmp.

See test file for more cases.

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D50233

llvm-svn: 339410
2018-08-10 04:32:54 +00:00
Matt Arsenault d54b7f0592 ValueTracking: Start enhancing isKnownNeverNaN
llvm-svn: 339399
2018-08-09 22:40:08 +00:00
Sanjay Patel c6944f795d [InstSimplify] move minnum/maxnum with Inf folds from instcombine
llvm-svn: 339396
2018-08-09 22:20:44 +00:00
Philip Reames ca256d93fb [LICM] hoist fences out of loops w/o memory operations
The motivating case is an otherwise dead loop with a fence in it. At the moment, this goes all the way through the optimizer and we end up emitting an entirely pointless loop on x86. This case may seem a bit contrived, but we've seen it in real code as the result of otherwise reasonable lowering strategies combined w/thread local memory optimizations (such as escape analysis).

To handle this simple case, we can teach LICM to hoist must execute fences when there is no other memory operation within the loop.

Differential Revision: https://reviews.llvm.org/D50489

llvm-svn: 339378
2018-08-09 20:18:42 +00:00
Sanjay Patel 55accd7dd3 [InstCombine] allow fsub+fmul FMF folds for vectors
llvm-svn: 339368
2018-08-09 18:42:12 +00:00
Alina Sbirlea bf9fe79397 SCEV should forget all loops containing a deleted block.
Summary:
LoopSimplifyCFG should update ScEv for all loops after a block is deleted.
If the deleted block "Succ" is part of L, then it is part of all parent loops, so forget topmost loop.

Reviewers: greened, mkazantsev, sanjoy

Subscribers: jlebar, javed.absar, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D50422

llvm-svn: 339363
2018-08-09 17:53:26 +00:00
Sanjay Patel 373790293e [InstCombine] add vector tests for fsub+fmul; NFC
llvm-svn: 339361
2018-08-09 17:40:27 +00:00
Reid Kleckner 80c6ec11d9 [GlobalOpt] Don't apply fastcc if it would break inalloca invariants
The inalloca parameter has to be the only parameter passed in memory.
Changing the convention to fastcc can break that.

At some point we should teach global opt how to optimize ABI attributes
like inalloca and maybe byval. These attributes are mainly used to match
C ABIs. They are harder for LLVM to optimize and they don't always
generate the best code.

Fixes PR38487

llvm-svn: 339360
2018-08-09 17:29:26 +00:00
Philip Reames 954eab1087 [LICM] Add tests for future hoisting of fence instructions [NFC]
The main interesting case is a fence in an otherwise dead loop or one containing only arithmetic.  This can happen as a result of DSE or other transforms from seemingly reasonable initial IR.  

llvm-svn: 339310
2018-08-09 04:21:02 +00:00
Sanjay Patel fe839695a8 [InstCombine] fold fadd+fsub with common operand
This is a sibling to the simplify from:
https://reviews.llvm.org/rL339174

llvm-svn: 339267
2018-08-08 16:19:22 +00:00
Sanjay Patel 2054dd79c2 [InstCombine] fold fsub+fsub with common operand
This is a sibling to the simplify from:
rL339171

llvm-svn: 339266
2018-08-08 16:04:48 +00:00
Sanjay Patel abd4767a0d [InstCombine] add tests for fsub folds; NFC
The scalar cases are handled in instcombine's internal
reassociation pass for FP ops, but it misses the vector types.

These patterns are similar to what was handled in InstSimplify in:
https://reviews.llvm.org/rL339171
https://reviews.llvm.org/rL339174
https://reviews.llvm.org/rL339176
...but we can't use instsimplify on these because we require negation
of the original operand.

llvm-svn: 339263
2018-08-08 15:44:56 +00:00
Sanjay Patel a194b2d2ff [InstCombine] fold fneg into constant operand of fmul/fdiv
This accounts for the missing IR fold noted in D50195. We don't need any fast-math to enable the negation transform. 
FP negation can always be folded into an fmul/fdiv constant to eliminate the fneg.

I've limited this to one-use to ensure that we are eliminating an instruction rather than replacing fneg by a 
potentially expensive fdiv or fmul.

Differential Revision: https://reviews.llvm.org/D50417

llvm-svn: 339248
2018-08-08 14:29:08 +00:00
Roman Lebedev a677651a5a [InstCombine] De Morgan: sink 'not' into 'xor' (PR38446)
Summary:
https://rise4fun.com/Alive/IT3

Comes up in the [most ugliest]  `signed int` -> `signed char`  case of
`-fsanitize=implicit-conversion` (https://reviews.llvm.org/D50250)
Previously, we were stuck with `not`: {F6867736}
But now we are able to completely get rid of it: {F6867737}
(FIXME: why are we loosing the metadata? that seems wrong/strange.)

Here, we only want to do that it we will be able to completely
get rid of that 'not'.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: vsk, erichkeane, llvm-commits

Differential Revision: https://reviews.llvm.org/D50301

llvm-svn: 339243
2018-08-08 13:31:19 +00:00
Roman Lebedev c6a00f545c [NFC][InstCombine] Cleanup demorgan-sink-not-into-xor.ll test
We are only going to do it if it is free to do.

llvm-svn: 339223
2018-08-08 08:46:07 +00:00
Max Kazantsev c9dca6df78 [NFC] Add some tests on mustexec
llvm-svn: 339219
2018-08-08 04:40:47 +00:00
Sanjay Patel 979423c996 [InstCombine] add tests for fneg fold including FMF; NFC
llvm-svn: 339203
2018-08-07 23:24:25 +00:00
Sanjay Patel bac052ef52 [InstCombine] fix FP constant in test; NFC
Too many digits...

llvm-svn: 339200
2018-08-07 23:03:29 +00:00
Michael Berg 2e60ad2e58 [NFC] adding tests for Y - (X + Y) --> -X
llvm-svn: 339197
2018-08-07 22:52:57 +00:00
Sanjay Patel 25887da162 [InstCombine] add tests for fneg of fmul/fdiv with constant; NFC
llvm-svn: 339195
2018-08-07 22:30:43 +00:00
Sanjay Patel 9b07347033 [InstSimplify] fold fsub+fadd with common operand
llvm-svn: 339176
2018-08-07 20:32:55 +00:00
Sanjay Patel 4364d604c2 [InstSimplify] fold fadd+fsub with common operand
llvm-svn: 339174
2018-08-07 20:23:49 +00:00
Sanjay Patel f7a8fb2dee [InstSimplify] fold fsub+fsub with common operand
llvm-svn: 339171
2018-08-07 20:14:27 +00:00
Sanjay Patel 50976393ed [InstSimplify] add tests for fadd/fsub; NFC
Instcombine gets some, but not all, of these cases via
it's internal reassociation transforms. It fails in
all cases with vector types.

llvm-svn: 339168
2018-08-07 19:49:13 +00:00
Alexey Bataev 0edcd0278d [SLP] Fix insert point for reused extract instructions.
Summary:
Reworked the previously committed patch to insert shuffles for reused
extract element instructions in the correct position. Previous logic was
incorrect, and might lead to the crash with PHIs and EH instructions.

Reviewers: efriedma, javed.absar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50143

llvm-svn: 339166
2018-08-07 19:21:05 +00:00
Wei Mi b1ef2cc53d [SampleFDO] Fix a bug in getOrCompHotCountThreshold/getOrCompColdCountThreshold
getOrCompHotCountThreshold/getOrCompColdCountThreshold introduced in
https://reviews.llvm.org/D45377 contain a bad mistake and will only return 1 or 0
instead of the true hot/cold cutoff value. The patch fixes the mistake. But the
mistake seems not causing big performance difference according to internal server
benchmarks testing.

Differential Revision: https://reviews.llvm.org/D50370

llvm-svn: 339162
2018-08-07 18:13:10 +00:00
Philip Reames c792e197b4 [LICM] Strengthen assume hoisting tests [NFC]
As requested in review of https://reviews.llvm.org/D50364

llvm-svn: 339159
2018-08-07 17:54:36 +00:00
Florian Hahn 950576bdf8 [GVN,NewGVN] Keep nonnull if K does not move.
In combineMetadata, we should be able to preserve K's nonnull metadata,
if K does not move. This condition should hold for all replacements by
NewGVN/GVN, but I added a bunch of assertions to verify that.

Fixes PR35038.

There probably are additional kinds of metadata that could be preserved
using similar reasoning. This is follow-up work.

Reviewers: dberlin, davide, efriedma, nlopes

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D47339

llvm-svn: 339149
2018-08-07 15:36:11 +00:00
Sanjay Patel 948ff87d7d [InstSimplify] move minnum/maxnum with common op fold from instcombine
llvm-svn: 339144
2018-08-07 14:36:27 +00:00
Sanjay Patel b06d283909 [InstSimplify] add tests for minnum/maxnum with shared op; NFC
llvm-svn: 339142
2018-08-07 14:13:40 +00:00
Sanjay Patel b802d18df7 [InstSimplify] move misplaced minnum/maxnum tests; NFC
llvm-svn: 339141
2018-08-07 14:12:08 +00:00
Philip Reames 94b29601ef [LICM] Further strengthen tests for hoisting guards and invariant.starts [NFC]
llvm-svn: 339062
2018-08-06 21:39:43 +00:00
Philip Reames 9d7bb2f700 [LICM] Strengthen invariant.start hoisting tests [NFC]
llvm-svn: 339057
2018-08-06 21:18:34 +00:00
Philip Reames 81c7dc93d2 [LICM] Add tests highlighting missing hoists for intrinsics [NFC]
llvm-svn: 339054
2018-08-06 21:06:15 +00:00
Evandro Menezes 6e137cb9f0 [SLC] Fix shrinking of pow()
Properly shrink `pow()` to `powf()` as a binary function and, when no other
simplification applies, do not discard it.

Differential revision: https://reviews.llvm.org/D50113

llvm-svn: 339046
2018-08-06 19:40:17 +00:00
Matt Arsenault 56b31d8d75 ValueTracking: Handle canonicalize in CannotBeNegativeZero
Also fix apparently missing test coverage for any of the
handling here.

llvm-svn: 339023
2018-08-06 15:16:26 +00:00
Max Kazantsev 2dbbd64cb7 Re-enable "[ValueTracking] Teach isKnownNonNullFromDominatingCondition about AND"
The patch was reverted because of bug detected by sanitizer. The bug is fixed,
respective tests added.

Differential Revision: https://reviews.llvm.org/D50172

llvm-svn: 339005
2018-08-06 11:14:18 +00:00
Max Kazantsev 3271f379a9 Revert rL338990 to see if it causes sanitizer failures
Multiple failues reported by sanitizer-x86_64-linux, seem to be caused by this
patch. Reverting to see if they sustain without it.

Differential Revision: https://reviews.llvm.org/D50172

llvm-svn: 338994
2018-08-06 08:10:28 +00:00
Max Kazantsev 34b0666be9 [ValueTracking] Teach isKnownNonNullFromDominatingCondition about AND
`isKnownNonNullFromDominatingCondition` is able to prove non-null basing on `br` or `guard`
by `%p != null` condition, but is unable to do so basing on `(%p != null) && %other_cond`.
This patch allows it to do so.

Differential Revision: https://reviews.llvm.org/D50172
Reviewed By: reames

llvm-svn: 338990
2018-08-06 06:11:36 +00:00
Max Kazantsev eded4abef8 [GuardWidening] Widen guards with conditions of frequently taken dominated branches
If there is a frequently taken branch dominated by a guard, and its condition is available
at the point of the guard, we can widen guard with condition of this branch and convert
the branch into unconditional:

  guard(cond1)
  if (cond2) {
    // taken in 99.9% cases
    // do something
  } else {
    // do something else    
  }

Converts to

  guard(cond1 && cond2)
  // do something

Differential Revision: https://reviews.llvm.org/D49974
Reviewed By: reames

llvm-svn: 338988
2018-08-06 05:49:19 +00:00
David Bolvansky c0aa4b75a4 Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338969
2018-08-05 14:53:08 +00:00
Roman Lebedev 365fa96055 [NFC][InstCombine] Add tests for sinking 'not' into 'xor' (PR38446)
https://rise4fun.com/Alive/IT3

Comes up in the [most ugliest]  signed int -> signed char  case of
-fsanitize=implicit-conversion (https://reviews.llvm.org/D50250)

Not sure if we want to do it always, or only when it is free to invert.

llvm-svn: 338967
2018-08-05 10:15:04 +00:00
Roman Lebedev 656a478e98 [NFC][InstCombine] Regenerate set.ll test
llvm-svn: 338965
2018-08-05 08:53:40 +00:00
David Bolvansky b82a5ec1b6 [InstCombine] [NFC] Tests for strcmp to memcmp transformation
llvm-svn: 338963
2018-08-05 05:46:56 +00:00
Chijun Sima 8b5de48d62 [TailCallElim] Preserve DT and PDT
Summary:
Previously, in the NewPM pipeline, TailCallElim recalculates the DomTree when it modifies any instruction in the Function.
For example,
```
CallInst *CI = dyn_cast<CallInst>(&I);
...
CI->setTailCall();
Modified = true;
...
if (!Modified || ...)
  return PreservedAnalyses::all();
```
After applying this patch, the DomTree only recalculates if needed (plus an extra insertEdge() + an extra deleteEdge() call).

When optimizing SQLite with `-passes="default<O3>"` pipeline of the newPM, the number of DomTree recalculation decreases by 6.2%, the number of nodes visited by DFS decreases by 2.9%. The time used by DomTree will decrease approximately 1%~2.5% after applying the patch.
 
Statistics:
```
Before the patch:
 23010 dom-tree-stats               - Number of DomTree recalculations
489264 dom-tree-stats               - Number of nodes visited by DFS -- DomTree
After the patch:
 21581 dom-tree-stats               - Number of DomTree recalculations
475088 dom-tree-stats               - Number of nodes visited by DFS -- DomTree
```

Reviewers: kuhar, dmgreen, brzycki, grosser, davide

Reviewed By: kuhar, brzycki

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49982

llvm-svn: 338954
2018-08-04 08:13:47 +00:00
Chijun Sima eacad79777 [ADCE] Remove the need of DomTree
Summary: ADCE doesn't need to query domtree.

Reviewers: kuhar, brzycki, dmgreen, davide, grosser

Reviewed By: kuhar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49988

llvm-svn: 338950
2018-08-04 02:50:12 +00:00
Anastasis Grammenos 4dfe279e00 [TRE][DebugInfo] Preserve Debug Location in new branch instruction
There are two branch instructions created
so the new test covers them both.

Differential Revision: https://reviews.llvm.org/D50263

llvm-svn: 338917
2018-08-03 20:27:13 +00:00
Hiroshi Inoue 73f8b255b6 [InstSimplify] fold extracting from std::pair (2/2)
This is the second patch of the series which intends to enable jump threading for an inlined method whose return type is std::pair<int, bool> or std::pair<bool, int>. 
The first patch is https://reviews.llvm.org/rL338485.

This patch handles code sequences that merges two values using `shl` and `or`, then extracts one value using `and`.

Differential Revision: https://reviews.llvm.org/D49981

llvm-svn: 338817
2018-08-03 05:39:48 +00:00
Philip Reames 5937368d4f [LICM] Remove unneccessary safety check to increase sinking effectiveness
This one requires a bit of explaination.  It's not every day you simply delete code to implement an optimization.  :)

The transform in question is sinking an instruction from a loop to the uses in loop exiting blocks.  We know (from LCSSA) that all of the uses outside the loop must be phi nodes, and after predecessor splitting, we know all phi users must have a single operand.  Since the use must be strictly dominated by the def, we know from the definition of dominance/ssa that the exit block must execute along a (non-strict) subset of paths which reach the def.  As a result, duplicating a potentially faulting instruction can not *introduce* a fault that didn't previously exist in the program.  

The full story is that this patch builds on "rL338671: [LICM] Factor out fault legality from canHoistOrSinkInst [NFC]" which pulled this logic out of a common helper routine.  As best I can tell, this check was originally added to the helper function for hoisting legality, later an incorrect fastpath for loads/calls was added, and then the bug was fixed by duplicating the fault safety check in the hoist path.  This left the redundant check in the common code to pessimize sinking for no reason.  I split it out in an NFC, and am not removing the unneccessary check.  I wanted there to be something easy to revert in case I missed something.

Reviewed by: Anna Thomas (in person)

llvm-svn: 338794
2018-08-03 00:21:56 +00:00
Eli Friedman 1ba5e9ac24 [GlobalMerge] Allow merging globals with explicit section markings.
At least on ELF, it's impossible to tell from the object file whether
two globals with the same section marking were merged: the merged global
uses "private" linkage to hide its symbol, and the aliases look like
regular symbols. I can't think of any other reason to disallow it.
(Of course, we can only merge globals in the same section.)

The weird alignment handling matches AsmPrinter; our alignment handling
for global variables should probably be refactored.

Differential Revision: https://reviews.llvm.org/D49822

llvm-svn: 338791
2018-08-02 23:54:16 +00:00
David Bolvansky 67647bcfbe [InstCombine] [NFC] Tests for select with binop fold
llvm-svn: 338727
2018-08-02 14:59:23 +00:00
Sanjay Patel 3f6e9a71f7 [InstSimplify] move minnum/maxnum with undef fold from instcombine
llvm-svn: 338719
2018-08-02 14:33:40 +00:00
Sanjay Patel f9a0d593e9 [ValueTracking] fix maxnum miscompile for cannotBeOrderedLessThanZero (PR37776)
This adds the NAN checks suggested in PR37776:
https://bugs.llvm.org/show_bug.cgi?id=37776

If both operands to maxnum are NAN, that should get constant folded, so we don't 
have to handle that case. This is the same assumption as other FP ops in this
function. Returning 'false' is always conservatively correct.

Copying from the bug report:

Currently, we have this for "when is cannotBeOrderedLessThanZero 
(mustBePositiveOrNaN) true for maxnum":
               L
        -------------------
        | Pos | Neg | NaN |
   ------------------------
   |Pos |  x  |  x  |  x  |
   ------------------------
 R |Neg |  x  |     |  x  |
   ------------------------
   |NaN |  x  |  x  |  x  |
   ------------------------


The cases with (Neg & NaN) are wrong. We should have:

                L
        -------------------
        | Pos | Neg | NaN |
   ------------------------
   |Pos |  x  |  x  |  x  |
   ------------------------
 R |Neg |  x  |     |     |
   ------------------------
   |NaN |  x  |     |  x  |
   ------------------------

Differential Revision: https://reviews.llvm.org/D50081

llvm-svn: 338716
2018-08-02 13:46:20 +00:00
Philip Reames 24b13cb06d [LICM] Expand tests to highlight an oddity in sinking implementation
llvm-svn: 338670
2018-08-02 03:54:29 +00:00
Sanjay Patel 28c7e41c09 [InstSimplify] move minnum/maxnum with same arg fold from instcombine
llvm-svn: 338652
2018-08-01 23:05:55 +00:00
David Bolvansky fbbb83c782 Revert "Enrich inline messages", tests fail
llvm-svn: 338496
2018-08-01 08:02:40 +00:00
David Bolvansky 7f36cd9d96 Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338494
2018-08-01 07:37:16 +00:00
Hiroshi Inoue 02f79eae06 [InstSimplify] fold extracting from std::pair (1/2)
This patch intends to enable jump threading when a method whose return type is std::pair<int, bool> or std::pair<bool, int> is inlined.
For example, jump threading does not happen for the if statement in func.

std::pair<int, bool> callee(int v) {
  int a = dummy(v);
  if (a) return std::make_pair(dummy(v), true);
  else return std::make_pair(v, v < 0);
}

int func(int v) {
  std::pair<int, bool> rc = callee(v);
  if (rc.second) {
    // do something
  }

SROA executed before the method inlining replaces std::pair by i64 without splitting in both callee and func since at this point no access to the individual fields is seen to SROA.
After inlining, jump threading fails to identify that the incoming value is a constant due to additional instructions (like or, and, trunc).

This series of patch add patterns in InstructionSimplify to fold extraction of members of std::pair. To help jump threading, actually we need to optimize the code sequence spanning multiple BBs.
These patches does not handle phi by itself, but these additional patterns help NewGVN pass, which calls instsimplify to check opportunities for simplifying instructions over phi, apply phi-of-ops optimization to result in successful jump threading. 
SimplifyDemandedBits in InstCombine, can do more general optimization but this patch aims to provide opportunities for other optimizers by supporting a simple but common case in InstSimplify.

This first patch in the series handles code sequences that merges two values using shl and or and then extracts one value using lshr.

Differential Revision: https://reviews.llvm.org/D48828

llvm-svn: 338485
2018-08-01 04:40:32 +00:00
Evandro Menezes eb159e56a6 [PATCH] [SLC] Test simplification of pow() for vector types (NFC)
Add test case for the simplification of `pow()` for vector types that D50035
enables.

llvm-svn: 338463
2018-08-01 00:30:43 +00:00
Ewan Crawford d83beb804c Fix InstCombine address space assert
Workaround bug where the InstCombine pass was asserting on the IR added in lit
test, where we have a bitcast instruction after a GEP from an addrspace cast.

The second bitcast in the test was getting combined into
`bitcast <16 x i32>* %0 to <16 x i32> addrspace(3)*`, which looks like it should
be an addrspace cast instruction instead. Otherwise if control flow is allowed
to continue as it is now we create a GEP instruction
`<badref> = getelementptr inbounds <16 x i32>, <16 x i32>* %0, i32 0`. However
because the type of this instruction doesn't match the address space we hit an
assert when replacing the bitcast with that GEP.

```
void llvm::Value::doRAUW(llvm::Value*, bool): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
```

Differential Revision: https://reviews.llvm.org/D50058

llvm-svn: 338395
2018-07-31 15:53:03 +00:00
Sanjay Patel a35781fdf9 [InstCombine] regenerate checks and add tests for D50035; NFC
llvm-svn: 338392
2018-07-31 15:07:32 +00:00
Anastasis Grammenos ac3f8028da [DebugInfo][LCSSA] Preserve debug location in lcssa phis
Summary:
When inserting lcssa Phi Nodes in the exit block
mak sure to preserve the original instructions DL.

Reviewers: vsk

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D50009

llvm-svn: 338391
2018-07-31 14:54:52 +00:00
David Bolvansky ab79414f7b Revert Enrich inline messages
llvm-svn: 338389
2018-07-31 14:47:22 +00:00
Sanjay Patel 57d617d676 [InstCombine] auto-generate checks; NFC
llvm-svn: 338388
2018-07-31 14:27:30 +00:00
David Bolvansky b562dbabda Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338387
2018-07-31 14:25:24 +00:00
John Brawn cd5f37f3f1 [MemDep] Use PhiValuesAnalysis to improve alias analysis results
This is being done in order to make GVN able to better optimize certain inputs.
MemDep doesn't use PhiValues directly, but does need to notifiy it when things
get invalidated.

Differential Revision: https://reviews.llvm.org/D48489

llvm-svn: 338384
2018-07-31 14:19:29 +00:00
David Bolvansky 16d8a69b90 [InstSimplify] Fold another Select with And/Or pattern
Summary: Proof: https://rise4fun.com/Alive/L5J

Reviewers: lebedev.ri, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49975

llvm-svn: 338383
2018-07-31 14:17:15 +00:00
Alexey Bataev c0c3a6ed5e [SLP] Fix PR38339: Instruction does not dominate all uses!
Summary:
If the ExtractElement instructions can be optimized out during the
vectorization and we need to reshuffle the parent vector, this
ShuffleInstruction may be inserted in the wrong place causing compiler
to produce incorrect code.

Reviewers: spatel, RKSimon, mkuper, hfinkel, javed.absar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49928

llvm-svn: 338380
2018-07-31 14:02:43 +00:00
Sanjay Patel 9a801cb598 [InstCombine] simplify code for A & (A ^ B) --> A & ~B
This fold was written in an odd way and tried to avoid
an endless loop by bailing out on all constants instead
of the supposedly problematic case of -1. But (X & -1) 
should always be simplified before we reach here, so I'm
not sure how that is a problem.

There were no tests for the commuted patterns, so I added
those at rL338364.

llvm-svn: 338367
2018-07-31 13:00:03 +00:00
Sanjay Patel 995138ce60 [InstCombine] move/add tests for xor+add fold; NFC
llvm-svn: 338364
2018-07-31 12:31:00 +00:00
Hiroshi Inoue 2f6769be60 [InstSimplify] tests for D48828, D49981: fold extraction from std::pair
Minor touch up in the previous comment.

llvm-svn: 338351
2018-07-31 05:29:20 +00:00
Hiroshi Inoue 5427d3efc2 [InstSimplify] tests for D48828, D49981: fold extraction from std::pair
Updated unit tests for D48828 and D49981.

llvm-svn: 338350
2018-07-31 05:10:36 +00:00
David Bolvansky 6737b3a6a1 [InstCombine] Fold Select with binary op
Summary:
Fold
  %A = icmp eq i8 %x, 0
  %B = xor i8 %x, %z
  %C = select i1 %A, i8 %B, i8 %y
To
  %C = select i1 %A, i8 %z, i8 %y

Fixes https://bugs.llvm.org/show_bug.cgi?id=38345
Proof: https://rise4fun.com/Alive/43J

Reviewers: lebedev.ri, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49954

llvm-svn: 338300
2018-07-30 20:38:53 +00:00
Manoj Gupta 9d83ce9043 [Inline] Copy "null-pointer-is-valid" attribute in caller.
Summary:
Normally, inling does not happen if caller does not have
"null-pointer-is-valid"="true" attibute but callee has it.

However, alwaysinline may force callee to be inlined.
In this case, if the caller has the "null-pointer-is-valid"="true"
attribute, copy the attribute to caller.

Reviewers: efriedma, a.elovikov, lebedev.ri, jyknight

Reviewed By: efriedma

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D50000

llvm-svn: 338292
2018-07-30 19:33:53 +00:00
David Bolvansky 76e95865f3 [InstSimplify] [NFC] Tests for Select with AND/OR fold
llvm-svn: 338285
2018-07-30 18:22:18 +00:00
Evandro Menezes a7d48286fb [SLC] Refactor the simplication of pow() (NFC)
Use more meaningful variable names.  Mostly NFC.

llvm-svn: 338266
2018-07-30 16:20:04 +00:00
David Bolvansky 403826ee0f [InstCombine] [NFC] Added tests for Select with binop fold
llvm-svn: 338257
2018-07-30 15:38:42 +00:00
John Brawn cd73fe8989 [BasicAA] Use PhiValuesAnalysis if available when handling phi alias
By using PhiValuesAnalysis we can get all the values reachable from a phi, so
we can be more precise instead of giving up when a phi has phi operands. We
can't make BaseicAA directly use PhiValuesAnalysis though, as the user of
BasicAA may modify the function in ways that PhiValuesAnalysis can't cope with.

For this optional usage to work correctly BasicAAWrapperPass now needs to be not
marked as CFG-only (i.e. it is now invalidated even when CFG is preserved) due
to how the legacy pass manager handles dependent passes being invalidated,
namely the depending pass still has a pointer to the now-dead dependent pass.

Differential Revision: https://reviews.llvm.org/D44564

llvm-svn: 338242
2018-07-30 11:52:08 +00:00
Sanjay Patel 577c705752 [InstCombine] try to fold 'add+sub' to 'not+add'
These are reassociated versions of the same pattern and
similar transforms as in rL338200 and rL338118.

The motivation is identical to those commits:
Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are
commutative/associative. It can also help codegen.

llvm-svn: 338221
2018-07-29 18:13:16 +00:00
Sanjay Patel 2daf28f9ce [InstCombine] add tests for another sub-not variant; NFC
llvm-svn: 338220
2018-07-29 18:07:28 +00:00
Sanjay Patel 54421ce918 [InstSimplify] fold funnel shifts with 0-shift amount
llvm-svn: 338218
2018-07-29 16:36:38 +00:00
Sanjay Patel 46af5835af [InstSimplify] add tests for funnel shift intrinsics; NFC
llvm-svn: 338217
2018-07-29 16:27:17 +00:00
David Bolvansky f801a58c0d [InstCombine] Tests for fold Select with binary op
Differential Revision: https://reviews.llvm.org/D49961

llvm-svn: 338201
2018-07-28 17:13:33 +00:00
Sanjay Patel 818b253d3a [InstCombine] try to fold 'sub' to 'not'
https://rise4fun.com/Alive/jDd

Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are 
commutative/associative. It can also help codegen.  

llvm-svn: 338200
2018-07-28 16:48:44 +00:00
David Bolvansky 9800b710c2 [InstSimplify] Moved Select + AND/OR tests from InstCombine
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49957

llvm-svn: 338195
2018-07-28 13:52:45 +00:00
David Green fc4b0fe0a2 [GlobalOpt] Test array indices inside structs for out-of-bounds accesses
We now, from clang, can turn arrays of
  static short g_data[] = {16, 16, 16, 16, 16, 16, 16, 16, 0, 0, 0, 0, 0, 0, 0, 0};
into structs of the form
  @g_data = internal global <{ [8 x i16], [8 x i16] }> ...

GlobalOpt will incorrectly SROA it, not realising that the access to the first
element may overflow into the second. This fixes it by checking geps more
thoroughly.

I believe this makes the globalsra-partial.ll test case invalid as the %i value
could be out of bounds. I've re-purposed it as a negative test for this case.

Differential Revision: https://reviews.llvm.org/D49816

llvm-svn: 338192
2018-07-28 08:20:10 +00:00
David Bolvansky f947608ddf [InstCombine] Fold Select with AND/OR condition
Summary:
Fold
```
%A = icmp ne i8 %X, %V1
%B = icmp ne i8 %X, %V2
%C = or i1 %A, %B
%D = select i1 %C, i8 %X, i8 %V1
ret i8 %D
  =>
ret i8 %X

Fixes https://bugs.llvm.org/show_bug.cgi?id=38334
Proof: https://rise4fun.com/Alive/plI8

Reviewers: spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: craig.topper, llvm-commits

Differential Revision: https://reviews.llvm.org/D49919

llvm-svn: 338191
2018-07-28 06:55:51 +00:00
David Bolvansky 173484d78c [InstCombine] [NFC] [Tests] Fold Select with AND/OR condition - fixed
Differential Revision: https://reviews.llvm.org/D49933

llvm-svn: 338161
2018-07-27 20:29:32 +00:00
David Bolvansky 1b82617473 [InstCombine] [NFC] [Tests] Fold Select with AND/OR condition
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49932

llvm-svn: 338159
2018-07-27 20:18:12 +00:00
Evandro Menezes f611ce82c0 [SLC] Test simplification of pow(x, 0.333...) to cbrt(x) (NFC)
Add test case for simplifying `pow(x, 0.333...)` into `cbrt(x)`, which
D49040 enables.

llvm-svn: 338152
2018-07-27 18:56:47 +00:00
Sanjay Patel 78e4b4d3c4 [InstCombine] not(sub X, Y) --> add (not X), Y
The tests with constants show a missing optimization.
Analysis for adds is better than subs, so this can also
help with other transforms. And codegen is better with 
adds for targets like x86 (destructive ops, no sub-from).

https://rise4fun.com/Alive/llK

llvm-svn: 338118
2018-07-27 10:54:48 +00:00
Sanjay Patel eee52b5090 [InstCombine] add tests for not+sub; NFC
llvm-svn: 338117
2018-07-27 10:45:04 +00:00
Max Kazantsev 4d980515d2 [SimplifyIndVar] Canonicalize comparisons to unsigned while eliminating truncs
This is a follow-up for the patch rL335020. When we replace compares against
trunc with compares against wide IV, we can also replace signed predicates with
unsigned where it is legal.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D48763

llvm-svn: 338115
2018-07-27 09:43:39 +00:00
Anastasis Grammenos f6e143e67f Revert "[LV][DebugInfo] Set DL to the middle block Icmp instruction"
This reverts commit r338106.

llvm-svn: 338109
2018-07-27 08:22:54 +00:00
Hiroshi Inoue eeab694cea [InstSimplify] tests for D48828: fold extraction from std::pair
This commit includes unit tests for D48828, which enhances InstSimplify to enable jump threading with a method whose return type is std::pair<int, bool> or std::pair<bool, int>.
I am going to commit the actual transformation later.

llvm-svn: 338107
2018-07-27 07:21:02 +00:00
Anastasis Grammenos 03948d0e0f [LV][DebugInfo] Set DL to the middle block Icmp instruction
Reviewers: hsaito

Differential Revision: https://reviews.llvm.org/D49746

llvm-svn: 338106
2018-07-27 07:12:44 +00:00
Chen Zheng 567485a72f [InstCombine] canonicalize abs pattern
Differential Revision: https://reviews.llvm.org/D48754

llvm-svn: 338092
2018-07-27 01:49:51 +00:00
Reid Kleckner c6cf918f44 [InstrProf] Use comdats on COFF for available_externally functions
Summary:
r262157 added ELF-specific logic to put a comdat on the __profc_*
globals created for available_externally functions. We should be able to
generalize that logic to all object file formats that support comdats,
i.e. everything other than MachO. This fixes duplicate symbol errors,
since on COFF, linkonce_odr doesn't make the symbol weak.

Fixes PR38251.

Reviewers: davidxl, xur

Subscribers: hiraditya, dmajor, llvm-commits, aheejin

Differential Revision: https://reviews.llvm.org/D49882

llvm-svn: 338082
2018-07-26 22:59:17 +00:00
Vedant Kumar b572f64212 [DebugInfo] LowerDbgDeclare: Add derefs when handling CallInst users
LowerDbgDeclare inserts a dbg.value before each use of an address
described by a dbg.declare. When inserting a dbg.value before a CallInst
use, however, it fails to append DW_OP_deref to the DIExpression.

The DW_OP_deref is needed to reflect the fact that a dbg.value describes
a source variable directly (as opposed to a dbg.declare, which relies on
pointer indirection).

This patch adds in the DW_OP_deref where needed. This results in the
correct values being shown during a debug session for a program compiled
with ASan and optimizations (see https://reviews.llvm.org/D49520). Note
that ConvertDebugDeclareToDebugValue is already correct -- no changes
there were needed.

One complication is that SelectionDAG is unable to distinguish between
direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also
fixes this long-standing issue in order to not regress integration tests
relying on the incorrect assumption that all frame-index SDDbgValues are
indirect. This is a necessary fix: the newly-added DW_OP_derefs cannot
be lowered properly otherwise. Basically the fix prevents a direct
SDDbgValue with DIExpression(DW_OP_deref) from being dereferenced twice
by a debugger. There were a handful of tests relying on this incorrect
"FRAMEIX => indirect" assumption which actually had incorrect
DW_AT_locations: these are all fixed up in this patch.

Testing:

- check-llvm, and an end-to-end test using lldb to debug an optimized
  program.
- Existing unit tests for DIExpression::appendToStack fully cover the
  new DIExpression::append utility.
- check-debuginfo (the debug info integration tests)

Differential Revision: https://reviews.llvm.org/D49454

llvm-svn: 338069
2018-07-26 20:56:53 +00:00
Sanjay Patel 6d6eab66e0 [InstCombine] fold udiv with common factor from muls with nuw
Unfortunately, sdiv isn't as simple because of UB due to overflow.

This fold is mentioned in PR38239:
https://bugs.llvm.org/show_bug.cgi?id=38239

llvm-svn: 338059
2018-07-26 19:22:41 +00:00
Sanjay Patel b381e7e59a [InstCombine] add tests for udiv with common factor; NFC
This fold is mentioned in PR38239:
https://bugs.llvm.org/show_bug.cgi?id=38239

The general case probably belongs in -reassociate, but given that we do 
basic reassociation optimizations similar to this in instcombine already, 
we might as well be consistent within instcombine and handle this pattern?

llvm-svn: 338038
2018-07-26 16:14:53 +00:00
Fangrui Song f2822e2d9d [ConstProp] Fix calls-math-finite.ll on FreeBSD
FreeBSD's log(3.0) is less precise than glibc and musl.
Let's forgive its rounding error of more than half an ulp.

llvm-svn: 338009
2018-07-26 06:24:11 +00:00
Eli Friedman d6baff65f7 [GlobalMerge] Handle llvm.compiler.used correctly.
Reuse the handling for llvm.used, and don't transform such globals.

Fixes a failure on the asan buildbot caused by my previous commit.

llvm-svn: 337973
2018-07-25 22:03:35 +00:00
Roman Tereshin 4f10a9d3a3 [LSV] Look through selects for consecutive addresses
In some cases LSV sees (load/store _ (select _ <pointer expression>
<pointer expression>)) patterns in input IR, often due to sinking and
other forms of CFG simplification, sometimes interspersed with
bitcasts and all-constant-indices GEPs. With this
patch`areConsecutivePointers` method would attempt to handle select
instructions. This leads to an increased number of successful
vectorizations.

Technically, select instructions could appear in index arithmetic as
well, however, we don't see those in our test suites / benchmarks.
Also, there is a lot more freedom in IR shapes computing integral
indices in general than in what's common in pointer computations, and
it appears that it's quite unreliable to do anything short of making
select instructions first class citizens of Scalar Evolution, which
for the purposes of this patch is most definitely an overkill.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49428

llvm-svn: 337965
2018-07-25 21:33:00 +00:00
Eli Friedman 0887cf9cab [GlobalMerge] Allow merging globals with arbitrary alignment.
Instead of depending on implicit padding from the structure layout code,
use a packed struct and emit the padding explicitly.

Differential Revision: https://reviews.llvm.org/D49710

llvm-svn: 337961
2018-07-25 20:58:01 +00:00
Florian Hahn b6613ac665 Revert r337904: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
I suspect it is causing the clang-stage2-Rthinlto failures.

llvm-svn: 337956
2018-07-25 19:44:19 +00:00
Roman Tereshin ed047b0184 [SCEV] Add [zs]ext{C,+,x} -> (D + [zs]ext{C-D,+,x})<nuw><nsw> transform
as well as sext(C + x + ...) -> (D + sext(C-D + x + ...))<nuw><nsw>
similar to the equivalent transformation for zext's

if the top level addition in (D + (C-D + x * n)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x * n), ensuring homogeneous behaviour of the
transformation and better canonicalization of such AddRec's

(indeed, there are 2^(2w) different expressions in `B1 + ext(B2 + Y)` form for
the same Y, but only 2^(2w - k) different expressions in the resulting `B3 +
ext((B4 * 2^k) + Y)` form, where w is the bit width of the integral type)

This patch generalizes sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformations added in
r209568 relaxing the requirements the following way:

1. C2 doesn't have to be a power of 2, it's enough if it's divisible by 2
 a sufficient number of times;
2. C1 doesn't have to be less than C2, instead of extracting the entire
  C1 we can split it into 2 terms: (00...0XXX + YY...Y000), keep the
  second one that may cause wrapping within the extension operator, and
  move the first one that doesn't affect wrapping out of the extension
  operator, enabling further simplifications;
3. C1 and C2 don't have to be positive, splitting C1 like shown above
 produces a sum that is guaranteed to not wrap, signed or unsigned;
4. in AddExpr case there could be more than 2 terms, and in case of
  AddExpr the 2nd and following terms and in case of AddRecExpr the
  Step component don't have to be in the C2*X form or constant
  (respectively), they just need to have enough trailing zeros,
  which in turn could be guaranteed by means other than arithmetics,
  e.g. by a pointer alignment;
5. the extension operator doesn't have to be a sext, the same
  transformation works and profitable for zext's as well.

Apparently, optimizations like SLPVectorizer currently fail to
vectorize even rather trivial cases like the following:

 double bar(double *a, unsigned n) {
   double x = 0.0;
   double y = 0.0;
   for (unsigned i = 0; i < n; i += 2) {
     x += a[i];
     y += a[i + 1];
   }
   return x * y;
 }

If compiled with `clang -std=c11 -Wpedantic -Wall -O3 main.c -S -o - -emit-llvm`
(!{!"clang version 7.0.0 (trunk 337339) (llvm/trunk 337344)"})

it produces scalar code with the loop not unrolled with the unsigned `n` and
`i` (like shown above), but vectorized and unrolled loop with signed `n` and
`i`. With the changes made in this commit the unsigned version will be
vectorized (though not unrolled for unclear reasons).

How it all works:

Let say we have an AddExpr that looks like (C + x + y + ...), where C
is a constant and x, y, ... are arbitrary SCEVs. Let's compute the
minimum number of trailing zeroes guaranteed of that sum w/o the
constant term: (x + y + ...). If, for example, those terms look like
follows:

        i
XXXX...X000
YYYY...YY00
   ...
ZZZZ...0000

then the rightmost non-guaranteed-zero bit (a potential one at i-th
position above) can change the bits of the sum to the left (and at
i-th position itself), but it can not possibly change the bits to the
right. So we can compute the number of trailing zeroes by taking a
minimum between the numbers of trailing zeroes of the terms.

Now let's say that our original sum with the constant is effectively
just C + X, where X = x + y + .... Let's also say that we've got 2
guaranteed trailing zeros for X:

         j
CCCC...CCCC
XXXX...XX00  // this is X = (x + y + ...)

Any bit of C to the left of j may in the end cause the C + X sum to
wrap, but the rightmost 2 bits of C (at positions j and j - 1) do not
affect wrapping in any way. If the upper bits cause a wrap, it will be
a wrap regardless of the values of the 2 least significant bits of C.
If the upper bits do not cause a wrap, it won't be a wrap regardless
of the values of the 2 bits on the right (again).

So let's split C to 2 constants like follows:

0000...00CC  = D
CCCC...CC00  = (C - D)

and represent the whole sum as D + (C - D + X). The second term of
this new sum looks like this:

CCCC...CC00
XXXX...XX00
-----------  // let's add them up
YYYY...YY00

The sum above (let's call it Y)) may or may not wrap, we don't know,
so we need to keep it under a sext/zext. Adding D to that sum though
will never wrap, signed or unsigned, if performed on the original bit
width or the extended one, because all that that final add does is
setting the 2 least significant bits of Y to the bits of D:

YYYY...YY00 = Y
0000...00CC = D
-----------  <nuw><nsw>
YYYY...YYCC

Which means we can safely move that D out of the sext or zext and
claim that the top-level sum neither sign wraps nor unsigned wraps.

Let's run an example, let's say we're working in i8's and the original
expression (zext's or sext's operand) is 21 + 12x + 8y. So it goes
like this:

0001 0101  // 21
XXXX XX00  // 12x
YYYY Y000  // 8y

0001 0101  // 21
ZZZZ ZZ00  // 12x + 8y

0000 0001  // D
0001 0100  // 21 - D = 20
ZZZZ ZZ00  // 12x + 8y

0000 0001  // D
WWWW WW00  // 21 - D + 12x + 8y = 20 + 12x + 8y

therefore zext(21 + 12x + 8y) = (1 + zext(20 + 12x + 8y))<nuw><nsw>

This approach could be improved if we move away from using trailing
zeroes and use KnownBits instead. For instance, with KnownBits we could
have the following picture:

    i
10 1110...0011  // this is C
XX X1XX...XX00  // this is X = (x + y + ...)

Notice that some of the bits of X are known ones, also notice that
known bits of X are interspersed with unknown bits and not grouped on
the rigth or left.

We can see at the position i that C(i) and X(i) are both known ones,
therefore the (i + 1)th carry bit is guaranteed to be 1 regardless of
the bits of C to the right of i. For instance, the C(i - 1) bit only
affects the bits of the sum at positions i - 1 and i, and does not
influence if the sum is going to wrap or not. Therefore we could split
the constant C the following way:

    i
00 0010...0011  = D
10 1100...0000  = (C - D)

Let's compute the KnownBits of (C - D) + X:

XX1 1            = carry bit, blanks stand for known zeroes
 10 1100...0000  = (C - D)
 XX X1XX...XX00  = X
--- -----------
 XX X0XX...XX00

Will this add wrap or not essentially depends on bits of X. Adding D
to this sum, however, is guaranteed to not to wrap:

0    X
 00 0010...0011  = D
 sX X0XX...XX00  = (C - D) + X
--- -----------
 sX XXXX   XX11

As could be seen above, adding D preserves the sign bit of (C - D) +
X, if any, and has a guaranteed 0 carry out, as expected.

The more bits of (C - D) we constrain, the better the transformations
introduced here canonicalize expressions as it leaves less freedom to
what values the constant part of ((C - D) + x + y + ...) can take.

Reviewed By: mzolotukhin, efriedma

Differential Revision: https://reviews.llvm.org/D48853

llvm-svn: 337943
2018-07-25 18:01:41 +00:00
Florian Hahn 6f5c6adbcd Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
r337828 resolves a PredicateInfo issue with unnamed types.

Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

llvm-svn: 337904
2018-07-25 11:13:40 +00:00
Hideki Saito ef380b0fc5 [LV] Fix for PR38110, LV encountered llvm_unreachable()
Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is compiler crash via llvm_unreachable(). Fix is to add a case to handle PHINode and changed the worst case to NO-OP (from compiler crash).

Reviewers: sbaranga, mssimpso, hsaito

Reviewed By: hsaito

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49461

llvm-svn: 337861
2018-07-24 22:30:31 +00:00
Roman Tereshin 1ba1f9310c [SCEV] Add zext(C + x + ...) -> D + zext(C-D + x + ...)<nuw><nsw> transform
if the top level addition in (D + (C-D + x + ...)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x + ...), ensuring homogeneous behaviour of the
transformation and better canonicalization of such expressions.

This enables better canonicalization of expressions like

  1 + zext(5 + 20 * %x + 24 * %y)  and
      zext(6 + 20 * %x + 24 * %y)

which get both transformed to

  2 + zext(4 + 20 * %x + 24 * %y)

This pattern is common in address arithmetics and the transformation
makes it easier for passes like LoadStoreVectorizer to prove that 2 or
more memory accesses are consecutive and optimize (vectorize) them.

Reviewed By: mzolotukhin

Differential Revision: https://reviews.llvm.org/D48853

llvm-svn: 337859
2018-07-24 21:48:56 +00:00
Florian Hahn 36d2e25d5a [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types.
This is a workaround and it would be better to fix this generally, but
doing it generally is quite tricky. See D48541 and PR38117.

Doing it in PredicateInfo directly allows us to use the type address to
differentiate different unnamed types, because neither the created
declarations nor the ssa_copy calls should be visible after
PredicateInfo got destroyed.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D49126

llvm-svn: 337828
2018-07-24 14:49:52 +00:00
Manoj Gupta f9f50f634d ConstantFolding: Avoid a crash.
Summary:
Check if the parent basic block and caller exists
before calling CS.getCaller when constant folding
strip.invariant.group instrinsic.

This avoids a crash when the function containing the intrinsic
is being inlined. The instruction is checked for any simplifiction
but has not yet been added to a basic block.

Reviewers: Prazek, rsmith, efriedma

Reviewed By: efriedma

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D49690

llvm-svn: 337742
2018-07-23 21:20:00 +00:00
John Brawn fc18a6ad7d [GVN] Don't use the eliminated load as an available value in phi construction
In ConstructSSAForLoadSet if an available value is actually the load that we're
doing SSA construction to eliminate, then we can omit it as SSAUpdate will add
in the value for the phi that will be replacing it anyway. This can result in
simpler IR which can allow further optimisation.

Differential Revision: https://reviews.llvm.org/D44160

llvm-svn: 337686
2018-07-23 12:14:45 +00:00
Alexandros Lamprineas bf6009c234 [MemorySSAUpdater] Update Phi operands after trivial Phi elimination
Bug fix for PR37445. The underlying problem and its fix are similar to PR37808.
The bug lies in MemorySSAUpdater::getPreviousDefRecursive(), where PhiOps is
computed before the call to tryRemoveTrivialPhi() and it ends up being out of
date, pointing to stale data. We have now turned each of the PhiOps into a
TrackingVH<MemoryAccess>.

Differential Revision: https://reviews.llvm.org/D49425

llvm-svn: 337680
2018-07-23 10:56:30 +00:00
Alexandros Lamprineas 592cc78dd8 [GVNHoist] safeToHoistLdSt allows illegal hoisting
Bug fix for PR36787. When reasoning if it's safe to hoist a load we
want to make sure that the defining memory access dominates the new
insertion point of the hoisted instruction. safeToHoistLdSt calls
firstInBB(InsertionPoint,DefiningAccess) which returns false if
InsertionPoint == DefiningAccess, and therefore it falsely thinks
it's safe to hoist.

Differential Revision: https://reviews.llvm.org/D49555

llvm-svn: 337674
2018-07-23 09:42:35 +00:00
Chen Zheng 69bb064539 [InstrSimplify] fold sdiv if two operands are negated and non-overflow
Differential Revision: https://reviews.llvm.org/D49382

llvm-svn: 337642
2018-07-21 12:27:54 +00:00
Roman Tereshin 31d52847ef Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reapplies commit r337489 reverted by r337541
Additionally, this commit contains a speculative fix to the issue reported in r337541
(the report does not contain an actionable reproducer, just a stack trace)

llvm-svn: 337606
2018-07-20 20:10:04 +00:00
Chen Zheng 0f609db61a [NFC][testcases] fold sdiv if two operands are negated and non-overflow
llvm-svn: 337549
2018-07-20 13:38:59 +00:00
Florian Hahn 0a560d5d9c Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters.
This version contains a fix to add values for which the state in ParamState change
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true, if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.

Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.

Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.

Reviewers: dberlin, mssimpso, davide, efriedma

Reviewed By: mssimpso

Differential Revision: https://reviews.llvm.org/D43762

llvm-svn: 337548
2018-07-20 13:29:12 +00:00
Chen Zheng f801d0fea9 [InstSimplify] fold srem instruction if its two operands are negated.
Differential Revision: https://reviews.llvm.org/D49423

llvm-svn: 337545
2018-07-20 13:00:47 +00:00
Chen Zheng 35395a6773 [NFC][testcases] more testcases for folding srem if its two operands are negatived.
llvm-svn: 337543
2018-07-20 12:53:45 +00:00
Sam McCall 57743883f1 Revert "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reverts commit r337489.
It causes asserts to fire in some TensorFlow tests, e.g.
tensorflow/compiler/tests/gather_test.py on GPU.

Example stack trace:
Start test case: GatherTest.testHigherRank
assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits"
    @     0x5559446ebe10  __assert_fail
    @     0x55593ef32f5e  llvm::APInt::trunc()
    @     0x55593d78f86e  (anonymous namespace)::Vectorizer::lookThroughComplexAddresses()
    @     0x55593d78f2bc  (anonymous namespace)::Vectorizer::areConsecutivePointers()
    @     0x55593d78d128  (anonymous namespace)::Vectorizer::isConsecutiveAccess()
    @     0x55593d78c926  (anonymous namespace)::Vectorizer::vectorizeInstructions()
    @     0x55593d78c221  (anonymous namespace)::Vectorizer::vectorizeChains()
    @     0x55593d78b948  (anonymous namespace)::Vectorizer::run()
    @     0x55593d78b725  (anonymous namespace)::LoadStoreVectorizer::runOnFunction()
    @     0x55593edf4b17  llvm::FPPassManager::runOnFunction()
    @     0x55593edf4e55  llvm::FPPassManager::runOnModule()
    @     0x55593edf563c  (anonymous namespace)::MPPassManager::runOnModule()
    @     0x55593edf5137  llvm::legacy::PassManagerImpl::run()
    @     0x55593edf5b71  llvm::legacy::PassManager::run()
    @     0x55593ced250d  xla::gpu::IrDumpingPassManager::run()
    @     0x55593ced5033  xla::gpu::(anonymous namespace)::EmitModuleToPTX()
    @     0x55593ced40ba  xla::gpu::(anonymous namespace)::CompileModuleToPtx()
    @     0x55593ced33d0  xla::gpu::CompileToPtx()
    @     0x55593b26b2a2  xla::gpu::NVPTXCompiler::RunBackend()
    @     0x55593b21f973  xla::Service::BuildExecutable()
    @     0x555938f44e64  xla::LocalService::CompileExecutable()
    @     0x555938f30a85  xla::LocalClient::Compile()
    @     0x555938de3c29  tensorflow::XlaCompilationCache::BuildExecutable()
    @     0x555938de4e9e  tensorflow::XlaCompilationCache::CompileImpl()
    @     0x555938de3da5  tensorflow::XlaCompilationCache::Compile()
    @     0x555938c5d962  tensorflow::XlaLocalLaunchBase::Compute()
    @     0x555938c68151  tensorflow::XlaDevice::Compute()
    @     0x55593f389e1f  tensorflow::(anonymous namespace)::ExecutorState::Process()
    @     0x55593f38a625  tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()()
*** SIGABRT received by PID 7798 (TID 7837) from PID 7798; ***

llvm-svn: 337541
2018-07-20 12:03:00 +00:00
Eli Friedman a3c78f5981 [SCCP] Don't use markForcedConstant on branch conditions.
It's more aggressive than we need to be, and leads to strange
workarounds in other places like call return value inference. Instead,
just directly mark an edge viable.

Tests by Florian Hahn.

Differential Revision: https://reviews.llvm.org/D49408

llvm-svn: 337507
2018-07-19 23:02:07 +00:00
Roman Tereshin b49b2a601f [LSV] Refactoring + supporting bitcasts to a type of different size
This is mostly a preparation work for adding a limited support for
select instructions. It proved to be difficult to do due to size and
irregularity of Vectorizer::isConsecutiveAccess, this is fixed here I
believe.

It also turned out that these changes make it simpler to finish one of
the TODOs and fix a number of other small issues, namely:

1. Looking through bitcasts to a type of a different size (requires
careful tracking of the original load/store size and some math
converting sizes in bytes to expected differences in indices of GEPs).

2. Reusing partial analysis of pointers done by first attempt in proving
them consecutive instead of starting from scratch. This added limited
support for nested GEPs co-existing with difficult sext/zext
instructions. This also required a careful handling of negative
differences between constant parts of offsets.

3. Handing a case where the first pointer index is not an add, but
something else (a function parameter for instance).

I observe an increased number of successful vectorizations on a large
set of shader programs. Only few shaders are affected, but those that
are affected sport >5% less loads and stores than before the patch.

Reviewed By: rampitec

Differential-Revision: https://reviews.llvm.org/D49342
llvm-svn: 337489
2018-07-19 19:42:43 +00:00
Farhana Aleen 8c7a30baea [LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers.
Summary: Currently, isConsecutiveAccess() detects two pointers(PtrA and PtrB) as consecutive by
         comparing PtrB with BaseDelta+PtrA. This works when both pointers are factorized or
         both of them are not factorized. But isConsecutiveAccess() fails if one of the
         pointers is factorized but the other one is not.

         Here is an example:
         PtrA = 4 * (A + B)
         PtrB = 4 + 4A + 4B

         This patch uses getMinusSCEV() to compute the distance between two pointers.
         getMinusSCEV() allows combining the expressions and computing the simplified distance.

Author: FarhanaAleen

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49516

llvm-svn: 337471
2018-07-19 16:50:27 +00:00
Roman Lebedev 3cb87e905c [InstCombine] Re-commit: Fold 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what i think the solution should be:
```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
```

This is a re-commit, as the original patch, committed in rL337190
was reverted in rL337344 as it broke chromium build:
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832
Proofs that the fixed folds are ok: https://rise4fun.com/Alive/VYM

Differential Revision: https://reviews.llvm.org/D49320

llvm-svn: 337376
2018-07-18 10:55:17 +00:00
Roman Lebedev 3404d4dd41 [NFC][InstCombine] i65 tests for 'check for [no] signed truncation' pattern
Those initially broke chromium build:
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832

llvm-svn: 337364
2018-07-18 08:49:51 +00:00
Roman Lebedev ad50ae82ad Revert test changes part of "Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern""
We want the test to remain good anyway.
I think the fix is incoming.

This reverts part of commit rL337344.

llvm-svn: 337359
2018-07-18 08:15:13 +00:00
Bob Haarman 4ebe5d59b6 Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern"
This reverts r337190 (and a few follow-up commits), which caused the
Chromium build to fail. See
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832

llvm-svn: 337344
2018-07-18 02:18:28 +00:00
Vedant Kumar 9ece818291 [InstCombine] Preserve debug value when simplifying cast-of-select
InstCombine has a cast transform that matches a cast-of-select:

  Orig = cast (Src = select Cond TV FV)

And tries to replace it with a select which has the cast folded in:

  NewSel = select Cond (cast TV) (cast FV)

The combiner does RAUW(Orig, NewSel), so any debug values for Orig would
survive the transform. But debug values for Src would be lost.

This patch teaches InstCombine to replace all debug uses of Src with
NewSel (taking care of doing any necessary DIExpression rewriting).

Differential Revision: https://reviews.llvm.org/D49270

llvm-svn: 337310
2018-07-17 18:08:36 +00:00
Vedant Kumar ced3fd4736 Remove an errant piece of !dbg metadata from a test, NFC
llvm-svn: 337309
2018-07-17 18:08:34 +00:00
Florian Hahn d95761d9d0 [IPSCCP] Run Solve each time we resolved an undef in a function.
Once we resolved an undef in a function we can run Solve, which could
lead to finding a constant return value for the function, which in turn
could turn undefs into constants in other functions that call it, before
resolving undefs there.

Computationally the amount of work we are doing stays the same, just the
order we process things is slightly different and potentially there are
a few less undefs to resolve.

We are still relying on the order of functions in the IR, which means
depending on the order, we are able to resolve the optimal undef first
or not. For example, if @test1 comes before @testf, we find the constant
return value of @testf too late and we cannot use it while solving
@test1.

This on its own does not lead to more constants removed in the
test-suite, probably because currently we have to be very lucky to visit
applicable functions in the right order.

Maybe we manage to come up with a better way of resolving undefs in more
'profitable' functions first.

Reviewers: efriedma, mssimpso, davide

Reviewed By: efriedma, davide

Differential Revision: https://reviews.llvm.org/D49385

llvm-svn: 337283
2018-07-17 14:04:59 +00:00
Simon Pilgrim 1a4f3c93fb [SLPVectorizer] Don't attempt horizontal reduction on pointer types (PR38191)
TTI::getMinMaxReductionCost typically can't handle pointer types - until this is changed its better to limit horizontal reduction to integer/float vector types only.

llvm-svn: 337280
2018-07-17 13:43:33 +00:00
Chen Zheng fcfcf07104 [NFC][testcases] add testcases for folding srem whose operands are negatived.
Finish same optimization for add instruction in D49216 and sdiv instruction in 
D49382. This patch is for srem instruction.

llvm-svn: 337270
2018-07-17 12:31:54 +00:00
Chen Zheng c992cc4e97 [testcases] move testcases to right place - NFC
Differential Revision: https://reviews.llvm.org/D49409

llvm-svn: 337230
2018-07-17 01:04:41 +00:00
Roman Lebedev 5da636fb90 [NFC][InstCombine] Fine-tune 'check for [no] signed truncation' tests
We are using i8 for these tests, and shifting by 4,
which is exactly the half of i8.

But as it is seen from the proofs https://rise4fun.com/Alive/mgu
KeptBits = bitwidth(%x) - MaskedBits,
so with using shifts by 4, we are not really testing that
we actually properly handle the other cases with shifts not by half...

llvm-svn: 337208
2018-07-16 20:10:46 +00:00
Roman Lebedev b79b4f539b [InstCombine] Fold 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

Proofs for this transform: https://rise4fun.com/Alive/mgu
This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what i think the solution should be:
```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
```

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: JDevlieghere, rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D49320

llvm-svn: 337190
2018-07-16 16:45:42 +00:00
Teresa Johnson d68935c5ac Restore "[ThinLTO] Ensure we always select the same function copy to import"
This reverts commit r337081, therefore restoring r337050 (and fix in
r337059), with test fix for bot failure described after the original
description below.

In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.

Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).

There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.

I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.

(I tried a few other variations of the change but they also didn't
improve time or peak memory).

The new commit removes a test that no longer makes sense
(Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the
reverse-iteration bot. The test depends on the order of processing the
summary call edges, and actually depended on the old problematic
behavior of selecting more than one summary for a given GUID when
encountered with different thresholds. There was no guarantee even
before that we would eventually pick the linkonce copy with the hottest
call edges, it just happened to work with the test and the old code, and
there was no guarantee that we would end up importing the selected
version of the copy that had the hottest call edges (since the backend
would effectively import only one of the selected copies).

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D48670

llvm-svn: 337184
2018-07-16 15:30:27 +00:00
Chen Zheng 1ff49315ab [InstrSimplify] add testcases for fold sdiv if two operands are negatived and non-overflow
Differential Revision: https://reviews.llvm.org/D49365

llvm-svn: 337179
2018-07-16 15:06:42 +00:00
Alexandros Lamprineas f854ce84c4 [MemorySSAUpdater] Remove deleted trivial Phis from active workset
Bug fix for PR37808. The regression test is a reduced version of the
original reproducer attached to the bug report. As stated in the report,
the problem was that InsertedPHIs was keeping dangling pointers to
deleted Memory-Phis. MemoryPhis are created eagerly and sometimes get
zapped shortly afterwards. I've used WeakVH instead of an expensive
removal operation from the active workset.

Differential Revision: https://reviews.llvm.org/D48372

llvm-svn: 337149
2018-07-16 07:51:27 +00:00
Chen Zheng ccc8422464 [InstCombine] add more SPFofSPF folding
Differential Revision: https://reviews.llvm.org/D49238

llvm-svn: 337143
2018-07-16 02:23:00 +00:00
Chen Zheng b972273f98 [InstCombine] fold icmp pred (sub 0, X) C for vector type
Differential Revision: https://reviews.llvm.org/D49283

llvm-svn: 337141
2018-07-16 00:51:40 +00:00
Sanjay Patel fae8ed0104 [InstSimplify] add fixme comment for PR37776; NFC
llvm-svn: 337129
2018-07-15 16:13:58 +00:00
Sanjay Patel 92d0c1c129 [InstSimplify] fold minnum/maxnum with NaN arg
This fold is repeated/misplaced in instcombine, but I'm
not sure if it's safe to remove that yet because some
other folds appear to be asserting that the transform
has occurred within instcombine itself.

This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776

We have another test to demonstrate the more general bug.

llvm-svn: 337127
2018-07-15 14:52:16 +00:00
Sanjay Patel ef71b704c2 [InstSimplify] add tests for minnum/maxnum; NFC
This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776

We have another test to demonstrate the more general
bug.

llvm-svn: 337126
2018-07-15 14:46:48 +00:00
Roman Lebedev b972fc3e8a [InstCombine] Fold x & (-1 >> y) s< x to x s> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337111
2018-07-14 20:08:47 +00:00
Roman Lebedev edba515baa [NFC][InstCombine] Tests for x & (-1 >> y) s< x to x s> (-1 >> y) fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337110
2018-07-14 20:08:42 +00:00
Roman Lebedev f14426101e [InstCombine] Fold x & (-1 >> y) s>= x to x s<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337109
2018-07-14 20:08:37 +00:00
Roman Lebedev 53a014e61d [NFC][InstCombine] Tests for x & (-1 >> y) s>= x to x s<= (-1 >> y) fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337108
2018-07-14 20:08:31 +00:00
Roman Lebedev 1e61e358a4 [InstCombine] Fold x s<= x & (-1 >> y) to x s<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337107
2018-07-14 20:08:26 +00:00
Roman Lebedev f1a351cf3a [NFC][InstCombine] Tests for x s<= x & (-1 >> y) to x s<= (-1 >> y) fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337106
2018-07-14 20:08:21 +00:00
Roman Lebedev 859e14aeaa [InstCombine] Fold x s> x & (-1 >> y) to x s> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337105
2018-07-14 20:08:16 +00:00
Roman Lebedev 62d050d2c4 [NFC][InstCombine] Tests for x s> x & (-1 >> y) to x s> (-1 >> y) fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/I3O

This pattern is not commutative!
We must make sure not to fold the commuted version!

llvm-svn: 337104
2018-07-14 20:08:09 +00:00
Roman Lebedev 0f5ec8921b [InstCombine] Fold x u<= x & C to x u<= C
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Fqp

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337102
2018-07-14 16:44:54 +00:00
Roman Lebedev 79ee3211ba [NFC][InstCombine] Tests for x u<= x & C to x u<= C fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Fqp

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337101
2018-07-14 16:44:48 +00:00
Roman Lebedev 74f611a1f5 [InstCombine] Fold x u> x & C to x u> C
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/JvS

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337100
2018-07-14 16:44:43 +00:00
Roman Lebedev 60ea5df7c2 [NFC][InstCombine] Tests for x u> x & C to x u> C fold.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/JvS

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337099
2018-07-14 16:44:37 +00:00
Roman Lebedev e3dc587ae0 [InstCombine] Fold x & (-1 >> y) u< x to x u> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/ocb

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337098
2018-07-14 12:20:16 +00:00
Roman Lebedev cad4e3d741 [NFC][InstCombine] Tests for x & (-1 >> y) u< x to x u> (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/ocb

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337097
2018-07-14 12:20:11 +00:00
Roman Lebedev fac48474ce [InstCombine] Fold x & (-1 >> y) u>= x to x u<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/azI

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337096
2018-07-14 12:20:06 +00:00
Roman Lebedev 3591eb9296 [NFC][InstCombine] Tests for x & (-1 >> y) u>= x to x u<= (-1 >> y)
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/azI

This pattern is not commutative. But InstSimplify will
already have taken care of the 'commutative' variant.

llvm-svn: 337095
2018-07-14 12:20:01 +00:00
Roman Lebedev ea2ffc3fcf [NFC][InstCombine] Add forgotten variable tests for foldICmpWithLowBitMaskedVal()
llvm-svn: 337094
2018-07-14 12:19:56 +00:00
Teresa Johnson b78c5d0602 Revert "[ThinLTO] Ensure we always select the same function copy to import"
This reverts commits r337050 and r337059. Caused failure in
reverse-iteration bot that needs more investigation.

llvm-svn: 337081
2018-07-14 01:45:49 +00:00
Teresa Johnson 8e739f98fe Revert "[ThinLTO] Add debug output to test"
This reverts commit r337076. Not needed any more.

llvm-svn: 337080
2018-07-14 01:34:06 +00:00
Teresa Johnson 05d28c8002 [ThinLTO] Add debug output to test
Add -debug-only=function-import to get more information for debugging
reverse-iteration bot failure from r337050.

llvm-svn: 337076
2018-07-14 00:08:48 +00:00
Tim Shen a064622bd3 Re-apply "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)."
llvm-svn: 337075
2018-07-13 23:58:46 +00:00
Teresa Johnson 4f8d704cd0 [ThinLTO] Require x86 target for new test
Should fix non-x86 bot failures for new test from r337050.

llvm-svn: 337059
2018-07-13 22:36:22 +00:00
Teresa Johnson d94c0594d9 [ThinLTO] Ensure we always select the same function copy to import
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.

Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).

There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.

I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.

(I tried a few other variations of the change but they also didn't
improve time or peak memory).

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D48670

llvm-svn: 337050
2018-07-13 21:35:51 +00:00
Roman Lebedev 4d22f15a2f [NFC][InstCombine] Tests for 'check for [no] signed truncation' pattern
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

llvm-svn: 337042
2018-07-13 20:33:34 +00:00
Vlad Tsyrklevich cd1559366d [LowerTypeTests] Limit when icall jumptable entries are emitted
Summary:
Currently LowerTypeTests emits jumptable entries for all live external
and address-taken functions; however, we could limit the number of
functions that we emit entries for significantly.

For Cross-DSO CFI, we continue to emit jumptable entries for all
exported definitions.  In the non-Cross-DSO CFI case, we only need to
emit jumptable entries for live functions that are address-taken in live
functions. This ignores exported functions and functions that are only
address taken in dead functions. This change uses ThinLTO summary data
(now emitted for all modules during ThinLTO builds) to determine
address-taken and liveness info.

The logic for emitting jumptable entries is more conservative in the
regular LTO case because we don't have summary data in the case of
monolithic LTO builds; however, once summaries are emitted for all LTO
builds we can unify the Thin/monolithic LTO logic to only use summaries
to determine the liveness of address taking functions.

This change is a partial fix for PR37474. It reduces the build size for
nacl_helper by ~2-3%, the reduction is due to nacl_helper compiling in
lots of unused code and unused functions that are address taken in dead
functions no longer being being considered live due to emitted jumptable
references. The reduction for chromium is ~0.1-0.2%.

Reviewers: pcc, eugenis, javed.absar

Reviewed By: pcc

Subscribers: aheejin, dexonsmith, dschuff, mehdi_amini, eraman, steven_wu, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D47652

llvm-svn: 337038
2018-07-13 19:57:39 +00:00
Evgeniy Stepanov ca8c2f7638 Revert "CallGraphSCCPass: iterate over all functions."
This reverts commit r336419: use-after-free on CallGraph::FunctionMap elements
due to the use of a stale iterator in CGPassManager::runOnModule.

The iterator may be invalidated if a pass removes a function, ex.:
  llvm::LegacyInlinerBase::inlineCalls
  inlineCallsImpl
  llvm::CallGraph::removeFunctionFromModule

llvm-svn: 337018
2018-07-13 16:32:31 +00:00
Simon Pilgrim 36b944e778 [SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED-2)
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Reapplied with fix to only accept 2 different casts if they come from the same source type (PR38154).

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336989
2018-07-13 11:09:52 +00:00
Sanjay Patel 70043b7e9a [InstCombine] return when SimplifyAssociativeOrCommutative makes a change
This bug was created by rL335258 because we used to always call instsimplify
after trying the associative folds. After that change it became possible
for subsequent folds to encounter unsimplified code (and potentially assert
because of it). 

Instead of carrying changed state through instcombine, we can just return 
immediately. This allows instsimplify to run, so we can continue assuming
that easy folds have already occurred.

llvm-svn: 336965
2018-07-13 01:18:07 +00:00
Piotr Padlewski c63b492bcd Simplify recursive launder.invariant.group and strip
Summary:
This patch is crucial for proving equality laundered/stripped
pointers. eg:

  bool foo(A *a) {
    return a == std::launder(a);
  }

Clang with -fstrict-vtable-pointers will emit something like:

    define dso_local zeroext i1 @_Z3fooP1A(%struct.A* %a) {
    entry:
      %c = bitcast %struct.A* %a to i8*
      %call = tail call i8* @llvm.launder.invariant.group.p0i8(i8* %c)
      %0 = bitcast %struct.A* %a to i8*
      %1 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %0)
      %2 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %call)
      %cmp = icmp eq i8* %1, %2
      ret i1 %cmp
    }

and because %2 can be replaced with @llvm.strip.invariant.group(%0)
and that %2 and %1 will produce the same value (because strip is readnone)
we can replace compare with true.

Reviewers: rsmith, hfinkel, majnemer, amharc, kuhar

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D47423

llvm-svn: 336963
2018-07-12 23:55:20 +00:00
Martin Storsjo 86b95489fd Revert "[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)"
This reverts commit r336812, which broke compilation of a number
of projects, see PR38154.

llvm-svn: 336949
2018-07-12 21:33:42 +00:00
Roman Lebedev 733a8d6886 [InstCombine] icmp-logical.ll: restore the original intention of the test.
While that fold is clearly not happening [anymore],
we do now have separate test cases for these cases,
so we should be ok to slightly adjust these tests
to not potentially loose test coverage.

As suggested by Hiroshi Yamauchi in
https://reviews.llvm.org/D49179#1159345

llvm-svn: 336912
2018-07-12 14:56:17 +00:00
Roman Lebedev 74f899f0f4 [InstCombine] Fold x & (-1 >> y) != x to x u> (-1 >> y)
Summary:
A complementary fold to D49179.

https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Rny

Caveat: one more thing in `test/Transforms/InstCombine/icmp-logical.ll` breaks.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49205

llvm-svn: 336911
2018-07-12 14:56:12 +00:00
Chen Zheng 9b00a8e9d7 [InstCombine]add testcases for folding more SPFofSPF pattern
Differential Revision: https://reviews.llvm.org/D49222

llvm-svn: 336902
2018-07-12 13:28:20 +00:00
Chen Zheng fdf13ef342 [InstSimplify] simplify add instruction if two operands are negative
Differential Revision: https://reviews.llvm.org/D49216

llvm-svn: 336881
2018-07-12 03:06:04 +00:00
Eric Christopher 7289ac8a19 Temporarily revert "Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters." as it's causing miscompiles.
A testcase was provided in the original review thread.

This reverts commit r336098.

llvm-svn: 336877
2018-07-12 01:53:21 +00:00
Craig Topper 034adf2683 [X86] Remove and autoupgrade the scalar fma intrinsics with masking.
This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches.

llvm-svn: 336871
2018-07-12 00:29:56 +00:00
Craig Topper ed6acde8cf [LoopIdiomRecognize] Don't convert a do while loop to ctlz.
This commit suppresses turning loops like this into "(bitwidth - ctlz(input))".

unsigned foo(unsigned input) {
  unsigned num = 0;
  do {
    ++num;
    input >>= 1;
  } while (input != 0);
  return num;
}

The loop version returns a value of 1 for both an input of 0 and an input of 1. Converting to a naive ctlz does not preserve that.

Theoretically we could do better if we checked isKnownNonZero or we could insert a select to handle the divergence. But until we have motivating cases for that, this is the easiest solution.

llvm-svn: 336864
2018-07-11 22:35:28 +00:00
Craig Topper ef08aec935 [LoopIdiomRecognize] Add a test case showing a loop we turn into ctlz that we shouldn't.
This loop executes one iteration without checking the input value. This produces a count of 1 for an input of 0 and 1. We are turning this into 32 - ctlz(n), but that returns 0 if n is 0.

llvm-svn: 336862
2018-07-11 22:17:26 +00:00
Roman Lebedev 2c19fe52e9 [NFC][InstCombine] Tests for x & (-1 >> y) != x -> x u> (-1 >> y) fold
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Rny

llvm-svn: 336857
2018-07-11 21:28:42 +00:00
Roman Lebedev 68d54cf5b3 [InstCombine] Fold x & (-1 >> y) == x to x u<= (-1 >> y)
Summary:
https://bugs.llvm.org/show_bug.cgi?id=38123

This pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in unsigned case, therefore it is probably a good idea to improve it.

https://rise4fun.com/Alive/Rny
^ there are more opportunities for folds, i will follow up with them afterwards.

Caveat: this somehow exposes a missing opportunities
in `test/Transforms/InstCombine/icmp-logical.ll`
It seems, the problem is in `foldLogOpOfMaskedICmps()` in `InstCombineAndOrXor.cpp`.
But i'm not quite sure what is wrong, because it calls `getMaskedTypeForICmpPair()`,
which calls `decomposeBitTestICmp()` which should already work for these cases...
As @spatel notes in https://reviews.llvm.org/D49179#1158760,
that code is a rather complex mess, so we'll let it slide.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: yamauchi, majnemer, t.p.northover, llvm-commits

Differential Revision: https://reviews.llvm.org/D49179

llvm-svn: 336834
2018-07-11 19:05:04 +00:00
Sanjay Patel 14eeb5a5c0 [InstSimplify] add/move tests for add folds; NFC
isKnownNegation() is currently proposed as part of D48754,
but it could be used to make InstSimplify stronger independently
of any abs() improvements.

llvm-svn: 336822
2018-07-11 16:52:18 +00:00
Simon Pilgrim 876e99bf2c [SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Reapplied with fix to only accept 2 different casts if they come from the same source type.

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336812
2018-07-11 15:05:10 +00:00
Simon Pilgrim 09832e33b4 [SLPVectorizer] Ensure alternate/passthrough doesn't vectorize sdiv with undef elts
llvm-svn: 336809
2018-07-11 14:34:43 +00:00
Simon Pilgrim 0c7bea0dc0 [SLPVectorizer] Add some additional alternate cast tests
Initial attempt at D49135 failed as we weren't correctly handling casts with different source types.

llvm-svn: 336808
2018-07-11 14:29:13 +00:00
Simon Pilgrim 1edde95abd Revert rL336804: [SLPVectorizer] Add initial alternate opcode support for cast instructions.
Reverting due to buildbot failures

llvm-svn: 336806
2018-07-11 14:08:16 +00:00
Simon Pilgrim 2f963a7e83 [SLPVectorizer] Add initial alternate opcode support for cast instructions.
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336804
2018-07-11 13:34:09 +00:00
Roman Lebedev 76f195cdbd [NFC][InstCombine] Tests for x & (-1 >> y) == x -> x u<= (-1 >> y) fold
https://bugs.llvm.org/show_bug.cgi?id=38123

This pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in unsigned case, therefore it is probably a good idea to improve it.

https://rise4fun.com/Alive/Rny

llvm-svn: 336796
2018-07-11 12:37:12 +00:00
Roman Lebedev a042fae6e0 [NFC][InstCombine] icmp-logical.ll: add a few more tests.
The @masked_and_notA_slightly_optimized and @masked_or_A
will break when PR38123 will be fixed:
https://rise4fun.com/Alive/Rny
Clearly, they aren't optimized currently.

https://rise4fun.com/Alive/ERo

llvm-svn: 336784
2018-07-11 10:31:12 +00:00
Roman Lebedev 5260c9efc8 [NFC][InstCombine] Fix extra space padding in icmp-mul-zext.ll test
update_test_checks will drop it anyway, creating noise..

llvm-svn: 336781
2018-07-11 09:57:53 +00:00
Roman Lebedev c5e437e5eb [NFC][InstCombine] Add variable names and regenerate icmp-logical.ll test.
llvm-svn: 336780
2018-07-11 09:57:46 +00:00
Chen Zheng fb361d2525 [test cases] add test cases for find more abs pattern
Differential Revision: https://reviews.llvm.org/D49123

llvm-svn: 336752
2018-07-11 01:07:21 +00:00
Eugene Leviant 6a572b8e79 [Evaluator] Examine alias when evaluating function call
This fixes PR38120

llvm-svn: 336702
2018-07-10 16:34:23 +00:00
Sanjay Patel c8d9d812ec [InstCombine] allow flag propagation when using safe constant
This corresponds with the code for the single binop pattern
added in rL336684.

llvm-svn: 336696
2018-07-10 16:09:49 +00:00
Sanjay Patel 509a1e7a9b [InstCombine] safely allow non-commutative binop identity constant folds
This was originally intended with D48893, but as discussed there, we
have to make the folds safe from producing extra poison. This should
give the single binop folds the same capabilities as the existing
folds for 2-binops+shuffle.

LLVM binary opcode review: there are a total of 18 binops. There are 7 
commutative binops (add, mul, and, or, xor, fadd, fmul) which we already 
fold. We're able to fold 6 more opcodes with this patch (shl, lshr, ashr,
fdiv, udiv, sdiv). There are no folds for srem/urem/frem AFAIK. We don't 
bother with sub/fsub with constant operand 1 because those are 
canonicalized to add/fadd. 7 + 6 + 3 + 2 = 18.

llvm-svn: 336684
2018-07-10 15:12:31 +00:00
Sanjay Patel 3333106a62 [InstCombine] drop poison flags when shuffle mask undef propagates to constant
llvm-svn: 336679
2018-07-10 14:27:55 +00:00
Sanjay Patel 06ea4206ad [InstCombine] allow more shuffle-binop folds with safe constants
The case with 2 variables is more complicated than the case where
we eliminate the shuffle entirely because a shuffle with an undef 
mask element creates an undef result. 

I'm not aware of any current analysis/transform that recognizes that 
undef propagating to a div/rem/shift, but we have to guard against 
the possibility.

llvm-svn: 336668
2018-07-10 13:33:26 +00:00
Anastasis Grammenos 612bf7cac5 [DebugInfo][LoopVectorize] Preserve DL in induction PHI and Add
Differential Revision: https://reviews.llvm.org/D48968

llvm-svn: 336667
2018-07-10 13:29:50 +00:00
Karl-Johan Karlsson 1ffeb5d7f0 [LowerSwitch] Fixed faulty PHI nodes
Summary:
Fixed two cases of where PHI nodes need to be updated by lowerswitch.

When lowerswitch find out that the switch default branch is not
reachable it remove the old default and replace it with the most
popular block from the cases, but it forget to update the PHI
nodes in the default block.

The PHI nodes also need to be updated when the switch is replaced
with a single branch.

Reviewers: hans, reames, arsenm

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D47203

llvm-svn: 336659
2018-07-10 12:06:16 +00:00
Chandler Carruth 47dc3a346e [PM/Unswitch] Fix a collection of closely related issues with trivial
switch unswitching.

The core problem was that the way we handled unswitching trivial exit
edges through the default successor of a switch. For some reason
I thought the right way to do this was to add a block containing
unreachable and point the default successor at this block. In
retrospect, this has an amazing number of problems.

The first issue is the one that this pass has always worked around -- we
have to *detect* such edges and avoid unswitching them again. This
seemed pretty easy really. You juts look for an edge to a block
containing unreachable. However, this pattern is woefully unsound. So
many things can break it. The amazing thing is that I found a test case
where *simple-loop-unswitch itself* breaks this! When we do
a *non-trivial* unswitch of a switch we will end up splitting this exit
edge. The result will be a default successor that is an exit and
terminates in ... a perfectly normal branch. So the first test case that
I started trying to fix is added to the nontrivial test cases. This is
a ridiculous example that did just amazing things previously. With just
unswitch, it would create 10+ copies of this stuff stamped out. But if
you combine it *just right* with a bunch of other passes (like
simplify-cfg, loop rotate, and some LICM) you can get it to do this
infinitely. Or at least, I never got it to finish. =[

This, in turn, uncovered another related issue. When we are manipulating
these switches after doing a trivial unswitch we never correctly updated
PHI nodes to reflect our edits. As soon as I started changing how these
edges were managed, it became obvious there were more issues that
I couldn't realistically leave unaddressed, so I wrote more test cases
around PHI updates here and ensured all of that works now.

And this, in turn, required some adjustment to how we collect and manage
the exit successor when it is the default successor. That showed a clear
bug where we failed to include it in our search for the outer-most loop
reached by an unswitched exit edge. This was actually already tested and
the test case didn't work. I (wrongly) thought that was due to SCEV
failing to analyze the switch. In fact, it was just a simple bug in the
code that skipped the default successor. While changing this, I handled
it correctly and have updated the test to reflect that we now get
precise SCEV analysis of trip counts for the outer loop in one of these
cases.

llvm-svn: 336646
2018-07-10 08:36:05 +00:00
Sanjay Patel 69faf464ed [InstCombine] allow more shuffle folds using safe constants
getSafeVectorConstantForBinop() was calling getBinOpIdentity() assuming
that the constant we wanted was operand 1 (RHS). That's wrong, but I
don't think we could expose a bug or even a suboptimal fold from that
because the callers have other guards for any binop that would have
been affected.

llvm-svn: 336617
2018-07-09 23:22:47 +00:00
Manoj Gupta 77eeac3d9e llvm: Add support for "-fno-delete-null-pointer-checks"
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.

More details : https://lkml.org/lkml/2018/4/4/601

GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.

-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.

This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.

Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv

Reviewed By: efriedma, george.burgess.iv

Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47895

llvm-svn: 336613
2018-07-09 22:27:23 +00:00
George Burgess IV 3fbfa9c403 Make llvm.objectsize more conservative with null
In non-zero address spaces, we were reporting that an object at `null`
always occupies zero bytes. This is incorrect in many cases, so just
return `unknown` in those cases for now.

Differential Revision: https://reviews.llvm.org/D48860

llvm-svn: 336611
2018-07-09 22:21:16 +00:00
Sanjay Patel 651438c2f6 [InstCombine] correct test comments; NFC
llvm-svn: 336570
2018-07-09 17:48:08 +00:00
Sanjay Patel 7cd32419ab [InstCombine] avoid extra poison when moving shift above shuffle
As discussed in D49047 / D48987, shift-by-undef produces poison,
so we can't use undef vector elements in that case..

Note that we need to extend this for poison-generating flags,
and there's a proposal to create poison from FMF in D47963,

llvm-svn: 336562
2018-07-09 17:20:20 +00:00
Sanjay Patel 5bd36644c8 [InstCombine] fix shuffle-of-binops transform to avoid poison/undef
As noted in D48987, there are many different ways for this transform to go wrong. 
In particular, the poison potential for shifts means we have to more careful with those ops. 
I added tests to make that behavior visible for all of the different cases that I could find.

This is a partial fix. To make this review easier, I did not make changes for the single binop 
pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations 
noted with TODO comments. I'll follow-up once we're confident that things are correct here.

The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely.

Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows 
for some improvements to div/rem patterns, so there are wins along with the missed opportunities 
and fixes.

Differential Revision: https://reviews.llvm.org/D49047

llvm-svn: 336546
2018-07-09 13:21:46 +00:00
Chandler Carruth ed2965438e [PM/Unswitch] Fix a nasty bug in the new PM's unswitch introduced in
r335553 with the non-trivial unswitching of switches.

The code correctly updated most aspects of the CFG and analyses, but
missed some crucial aspects:
1) When multiple cases have the same successor, we unswitch that
   a single time and replace the switch with a direct branch. The CFG
   here is correct, but the target of this direct branch may have had
   a PHI node with multiple entries in it.
2) When we still have to clone a successor of the switch into an
   unswitched copy of the loop, we'll delete potentially multiple edges
   entering this successor, not just one.
3) We also have to delete multiple edges entering the successors in the
   original loop when they have to be retained.
4) When the "retained successor" *also* occurs as a case successor, we
   just assert failed everywhere. This doesn't happen very easily
   because its always valid to simply drop the case -- the retained
   successor for switches is always the default successor. However, it
   is likely possible through some contrivance of different loop passes,
   unrolling, and simplifying for this to occur in practice and
   certainly there is nothing "invalid" about the IR so this pass needs
   to handle it.
5) In the case of #4, we also will replace these multiple edges with
   a direct branch much like in #1 and need to collapse the entries in
   any PHI nodes to a single enrty.

All of this stems from the delightful fact that the same successor can
show up in multiple parts of the switch terminator, and each of these
are considered a distinct edge for the purpose of PHI nodes (and
iterating the successors and predecessors) but not for unswitching
itself, the dominator tree, or many other things. For the record,
I intensely dislike this "feature" of the IR in large part because of
the complexity it causes in passes like this. We already have a ton of
logic building sets and handling duplicates, and we just had to add
a bunch more.

I've added a complex test case that covers all five of the above failure
modes. I've also added a variation on it where #4 and #5 occur in loop
exit, adding fun where we have an LCSSA PHI node with "multiple entries"
despite have dedicated exits. There were no additional issues found by
this, but it seems a useful corner case to cover with testing.

One thing that working on all of this code has made painfully clear for
me as well is how amazingly inefficient our PHI node representation is
(in terms of the in-memory data structures and the APIs used to update
them). This code has truly marvelous complexity bounds because every
time we remove an entry from a PHI node we do a linear scan to find it
and then a linear update to the data structure to remove it. We could in
theory batch all of the PHI node updates into a single linear walk of
the operands making this much more efficient, but the APIs fight hard
against this and the fact that we have to handle duplicates in the
peculiar manner we do (removing all but one in some cases) makes even
implementing that very tedious and annoying. Anyways, none of this is
new here or specific to loop unswitching. All code in LLVM that updates
PHI node operands suffers from these problems.

llvm-svn: 336536
2018-07-09 10:30:48 +00:00
Chijun Sima 9e1e0c7b2a [PGOMemOPSize] Preserve the DominatorTree
Summary:
PGOMemOPSize only modifies CFG in a couple of places; thus we can preserve the DominatorTree with little effort.
When optimizing SQLite with -O3, this patch can decrease 3.8% of the numbers of nodes traversed by DFS and 5.7% of the times DominatorTreeBase::recalculation is called.

Reviewers: kuhar, davide, dmgreen

Reviewed By: dmgreen

Subscribers: mzolotukhin, vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D48914

llvm-svn: 336522
2018-07-09 08:07:21 +00:00
Craig Topper 2835278ee0 [LoopIdiomRecognize] Support for converting loops that use LSHR to CTLZ.
In the 'detectCTLZIdiom' function support for loops that use LSHR instruction instead of ASHR has been added.

This supports creating ctlz from the following code.

int lzcnt(int x) {
     int count = 0;
     while (x > 0)  {
          count++;
          x = x >> 1;
     }
    return count;
}

Patch by Olga Moldovanova

Differential Revision: https://reviews.llvm.org/D48354

llvm-svn: 336509
2018-07-08 01:45:47 +00:00
Chandler Carruth d8b0c8ce1b [PM/LoopUnswitch] Fix PR37889, producing the correct loop nest structure
after trivial unswitching.

This PR illustrates that a fundamental analysis update was not performed
with the new loop unswitch. This update is also somewhat fundamental to
the core idea of the new loop unswitch -- we actually *update* the CFG
based on the unswitching. In order to do that, we need to update the
loop nest in addition to the domtree.

For some reason, when writing trivial unswitching, I thought that the
loop nest structure cannot be changed by the transformation. But the PR
helps illustrate that it clearly can. I've expanded this to a number of
different test cases that try to cover the different cases of this. When
we unswitch, we move an exit edge of a loop out of the loop. If this
exit edge changes which loop reached by an exit is the innermost loop,
it changes the parent of the loop. Essentially, this transformation may
hoist the inner loop up the nest. I've added the simple logic to handle
this reliably in the trivial unswitching case. This just requires
updating LoopInfo and rebuilding LCSSA on the impacted loops. In the
trivial case, we don't even need to handle dedicated exits because we're
only hoisting the one loop and we just split its preheader.

I've also ported all of these tests to non-trivial unswitching and
verified that the logic already there correctly handles the loop nest
updates necessary.

Differential Revision: https://reviews.llvm.org/D48851

llvm-svn: 336477
2018-07-07 01:12:56 +00:00
Tim Shen 2ed501d656 Revert "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)."
This reverts commit r336140. Our tests shows that LSR assert fails with it.

llvm-svn: 336473
2018-07-06 23:20:35 +00:00
Sanjay Patel 5739587735 [InstCombine] add more tests for potentially poisonous shifts; NFC
llvm-svn: 336454
2018-07-06 17:44:57 +00:00
Vedant Kumar 6379a62250 [Local] replaceAllDbgUsesWith: Update debug values before RAUW
The replaceAllDbgUsesWith utility helps passes preserve debug info when
replacing one value with another.

This improves upon the existing insertReplacementDbgValues API by:

- Updating debug intrinsics in-place, while preventing use-before-def of
  the replacement value.
- Falling back to salvageDebugInfo when a replacement can't be made.
- Moving the responsibiliy for rewriting llvm.dbg.* DIExpressions into
  common utility code.

Along with the API change, this teaches replaceAllDbgUsesWith how to
create DIExpressions for three basic integer and pointer conversions:

- The no-op conversion. Applies when the values have the same width, or
  have bit-for-bit compatible pointer representations.
- Truncation. Applies when the new value is wider than the old one.
- Zero/sign extension. Applies when the new value is narrower than the
  old one.

Testing:

- check-llvm, check-clang, a stage2 `-g -O3` build of clang,
  regression/unit testing.
- This resolves a number of mis-sized dbg.value diagnostics from
  Debugify.

Differential Revision: https://reviews.llvm.org/D48676

llvm-svn: 336451
2018-07-06 17:32:39 +00:00
Sanjay Patel a212b0bc18 [InstCombine] add more tests with poison and undef; NFC
As discussed in D48987 and D48893, there are many different
ways to go wrong depending on the binop (and as shown here
we already do go wrong in some cases).

llvm-svn: 336450
2018-07-06 17:24:32 +00:00
Tim Northover 7ee46ed992 CallGraphSCCPass: iterate over all functions.
Previously we only iterated over functions reachable from the set of
external functions in the module. But since some of the passes under
this (notably the always-inliner and coroutine lowerer) are required for
correctness, they need to run over everything.

This just adds an extra layer of iteration over the CallGraph to keep
track of which functions we've already visited and get the next batch of
SCCs.

Should fix PR38029.

llvm-svn: 336419
2018-07-06 08:04:47 +00:00
Max Kazantsev 20da7e467a Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done"
llvm-svn: 336410
2018-07-06 04:04:13 +00:00
Matt Arsenault 24ce89b717 Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN
Better NaN handling for AMDGCN fmed3.

All operands are checked for NaN now. The checks
were moved before the canonicalization to provide
a better mapping from fclamp. Changed the behaviour
of fmed3(x,y,NaN) to return max(x,y) instead of
min(x,y) in light of this. Updated tests as a result
and added some new cases to cover the fix.

Patch by Alan Baker

llvm-svn: 336375
2018-07-05 17:05:36 +00:00
Craig Topper 350c5f1881 [X86] Remove X86 specific scalar FMA intrinsics and upgrade to tart independent FMA and extractelement/insertelement.
llvm-svn: 336315
2018-07-05 06:52:55 +00:00
Sanjay Patel 9c2e7ceb1a [InstCombine] allow narrowing of min/max/abs
We have bailout hacks based on min/max in various places in instcombine 
that shouldn't be necessary. The affected test was added for:
D48930 
...which is a consequence of the improvement in:
D48584 (https://reviews.llvm.org/rL336172)

I'm assuming the visitTrunc bailout in this patch was added specifically 
to avoid a change from SimplifyDemandedBits, so I'm just moving that 
below the EvaluateInDifferentType optimization. A narrow min/max is still
a min/max.

llvm-svn: 336293
2018-07-04 17:44:04 +00:00
Sanjay Patel ad32d22e13 [InstCombine] add value names to test; NFC
That makes it easier to mix and match lines into other tests.

llvm-svn: 336289
2018-07-04 16:56:35 +00:00
Gabor Buella da4a966e1c NFC - Various typo fixes in tests
llvm-svn: 336268
2018-07-04 13:28:39 +00:00
Max Kazantsev bee9ca6568 [NFC] Add test that shows that InstCombine can do better
llvm-svn: 336258
2018-07-04 10:32:02 +00:00
Anastasis Grammenos 204726b345 [DebugInfo][LoopVectorize] Preserve DL in generated phi instruction
When creating `phi` instructions to resume at the scalar part of the loop,
copy the DebugLoc from the original phi over to the new one.

Differential Revision: https://reviews.llvm.org/D48769

llvm-svn: 336256
2018-07-04 10:16:55 +00:00
Anastasis Grammenos 509d79789f [DebugInfo][InstCombine] Preserve DI after combining zext
When zext is EvaluatedInDifferentType, InstCombine
drops the dbg.value intrinsic. This patch tries to
preserve said DI, by inserting the zext's old DI in the
resulting instruction. (Only for integer type for now)

Differential Revision: https://reviews.llvm.org/D48331

llvm-svn: 336254
2018-07-04 09:55:46 +00:00
Sanjay Patel 181aa26eb8 [InstCombine] add tests for shuffle+binop with constant op1; NFC
This adds coverage for a planned enhancement for ConstantExpr::getBinOpIdentity() noted in D48830.

llvm-svn: 336220
2018-07-03 18:43:46 +00:00
Sanjay Patel 8307bc407b [Constants] add identity constants for fadd/fmul
As the test diffs show, the current users of getBinOpIdentity()
are InstCombine and Reassociate. SLP vectorizer is a candidate
for using this functionality too (D28907).

The InstCombine shuffle improvements are part of the planned
enhancements noted in D48830.

InstCombine actually has several other uses of getBinOpIdentity() 
via SimplifyUsingDistributiveLaws(), but we don't call that for 
any FP ops. Fixing that might be another part of removing the
custom reassociation in InstCombine that is only done for fadd+fmul.

llvm-svn: 336215
2018-07-03 17:12:59 +00:00
Sanjay Patel 2c38b7fd8b [Reassociate] add tests for binop with identity constant; NFC
llvm-svn: 336214
2018-07-03 16:44:18 +00:00
Sanjay Patel 5b4a003088 [Reassociate] regenerate checks; NFC
llvm-svn: 336211
2018-07-03 16:01:41 +00:00
Sanjay Patel 5a6ba018d7 [Reassociate] add test for missing FP constant analysis; NFC
llvm-svn: 336208
2018-07-03 15:56:04 +00:00
Sanjay Patel 3074b9e53f [InstCombine] fold shuffle-with-binop and common value
This is the last significant change suggested in PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806#c5
...though there are several follow-ups noted in the code comments 
in this patch to complete this transform.

It's possible that a binop feeding a select-shuffle has been eliminated 
by earlier transforms (or the code was just written like this in the 1st 
place), so we'll fail to match the patterns that have 2 binops from: 
D48401, 
D48678, 
D48662, 
D48485.

In that case, we can try to materialize identity constants for the remaining
binop to fill in the "ghost" lanes of the vector (where we just want to pass 
through the original values of the source operand).

I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups. 
For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor).

Differential Revision: https://reviews.llvm.org/D48830

llvm-svn: 336196
2018-07-03 13:44:22 +00:00
Bjorn Pettersson 8dd6cf711f [DebugInfo] Corrections for salvageDebugInfo
Summary:
When salvaging a dbg.declare/dbg.addr we should not add
DW_OP_stack_value to the DIExpression
(see test/Transforms/InstCombine/salvage-dbg-declare.ll).

Consider this example
  %vla = alloca i32, i64 2
  call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression())

Instcombine will turn it into
  %vla1 = alloca [2 x i32]
  %vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla, i64 0, i64 0
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression())

If the GEP can be eliminated, then the dbg.declare will be salvaged
and we should get
  %vla1 = alloca [2 x i32]
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression())

The problem was that salvageDebugInfo did not recognize dbg.declare
as being indirect (%vla1 points to the value, it does not hold the
value), so we incorrectly got
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value))

I also made sure that llvm::salvageDebugInfo and
DIExpression::prependOpcodes do not add DW_OP_stack_value to
the DIExpression in case no new operands are added to the
DIExpression. That way we avoid to, unneccessarily, turn a
register location expression into an implicit location expression
in some situations (see test11 in test/Transforms/LICM/sinking.ll).

Reviewers: aprantl, vsk

Reviewed By: aprantl, vsk

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48837

llvm-svn: 336191
2018-07-03 11:29:00 +00:00
Chandler Carruth 3897ded691 [PM/LoopUnswitch] Fix PR37651 by correctly invalidating SCEV when
unswitching loops.

Original patch trying to address this was sent in D47624, but that
didn't quite handle things correctly. There are two key principles used
to select whether and how to invalidate SCEV-cached information about
loops:

1) We must invalidate any info SCEV has cached before unswitching as we
   may change (or destroy) the loop structure by the act of unswitching,
   and make it hard to recover everything we want to invalidate within
   SCEV.

2) We need to invalidate all of the loops whose CFGs are mutated by the
   unswitching. Notably, this isn't the *entire* loop nest, this is
   every loop contained by the outermost loop reached by an exit block
   relevant to the unswitch.

And we need to do this even when doing trivial unswitching.

I've added more focused tests that directly check that SCEV starts off
with imprecise information and after unswitching (and simplifying
instructions) re-querying SCEV will produce precise information. These
tests also specifically work to check that an *outer* loop's information
becomes precise.

However, the testing here is still a bit imperfect. Crafting test cases
that reliably fail to be analyzed by SCEV before unswitching and succeed
afterward proved ... very, very hard. It took me several hours and
careful work to build these, and I'm not optimistic about necessarily
coming up with more to cover more elaborate possibilities. Fortunately,
the code pattern we are testing here in the pass is really
straightforward and reliable.

Thanks to Max Kazantsev for the initial work on this as well as the
review, and to Hal Finkel for helping me talk through approaches to test
this stuff even if it didn't come to much.

Differential Revision: https://reviews.llvm.org/D47624

llvm-svn: 336183
2018-07-03 09:13:27 +00:00
Max Kazantsev 3097b76e8c [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done
This patch changes order of transform in InstCombineCompares to avoid
performing transforms based on ranges which produce complex bit arithmetics
before more simple things (like folding with constants) are done. See PR37636
for the motivating example.

Differential Revision: https://reviews.llvm.org/D48584
Reviewed By: spatel, lebedev.ri

llvm-svn: 336172
2018-07-03 06:23:57 +00:00
Tim Shen c7cef4bcc4 [SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428).
Summary:
Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change
SCEV is able to prove that the loop doesn't wrap-self (due to zext i16
to i64), disabling the entire loop versioning pass. Removed the zext and
just use i64.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48409

llvm-svn: 336140
2018-07-02 20:01:54 +00:00
Farhana Aleen 3b416db19b [SLP] Recognize min/max pattern using instructions producing same values.
Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.

         %1 = extractelement <2 x i32> %a, i32 0
         %2 = extractelement <2 x i32> %a, i32 1
         %cond = icmp sgt i32 %1, %2
         %3 = extractelement <2 x i32> %a, i32 0
         %4 = extractelement <2 x i32> %a, i32 1
         %select = select i1 %cond, i32 %3, i32 %4

Author: FarhanaAleen

Reviewed By: ABataev, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D47608

llvm-svn: 336130
2018-07-02 17:55:31 +00:00
Sanjay Patel b999d74132 [InstCombine] reverse canonicalization of add --> or to allow more shuffle folding
This extends D48485 to allow another pair of binops (add/or) to be combined either
with or without a leading shuffle:
or X, C --> add X, C (when X and C have no common bits set)

Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
and we've added general infrastructure to allow extending to other opcodes or moving 
to where other passes could use that functionality.

Differential Revision: https://reviews.llvm.org/D48662

llvm-svn: 336128
2018-07-02 17:42:29 +00:00
Simon Pilgrim 35f196c179 [SLPVectorizer][X86] Begin adding alternate tests for call operators
Alternate opcode handling only supports binary operators, these tests demonstrate a missed opportunity to vectorize ceil/floor calls

llvm-svn: 336125
2018-07-02 17:23:45 +00:00
Sanjay Patel 284ba0c18f [ValueTracking] allow undef elements when matching vector abs
llvm-svn: 336111
2018-07-02 14:43:40 +00:00
Sanjay Patel 951f617e16 [InstCombine] adjust shuffle tests with IR flags; NFC
Due to current limitations in constant analysis, we need flags
on add or mul to show propagation for the potential transform
suggested in these tests (no other binops currently report 
identity constants).

llvm-svn: 336101
2018-07-02 13:40:54 +00:00
Florian Hahn 4ebba909a2 Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters.
This version contains a fix to add values for which the state in ParamState change
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true, if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.

Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.

Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.

Reviewers: dberlin, mssimpso, davide, efriedma

Reviewed By: mssimpso

Differential Revision: https://reviews.llvm.org/D43762

llvm-svn: 336098
2018-07-02 12:44:04 +00:00
Sanjay Patel d980084597 [InstCombine] add tests for shuffle-binop; NFC
This is another pattern mentioned in PR37806.

llvm-svn: 336096
2018-07-02 12:30:46 +00:00
Simon Pilgrim 265793d52a [SLPVectorizer] Fix alternate opcode + shuffle cost function to correct handle SK_Select patterns.
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.

This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...

llvm-svn: 336095
2018-07-02 11:28:01 +00:00
Max Kazantsev 66da390506 [NFC] Test that shows unprofitability of instcombine with bit ranges
llvm-svn: 336078
2018-07-02 06:55:00 +00:00
Piotr Padlewski 5b3db45e8f Implement strip.invariant.group
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2

Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar

Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47103

Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
2018-07-02 04:49:30 +00:00
Sanjay Patel 279a1a39ad [InstCombine] add abs tests with undef elts; NFC
llvm-svn: 336065
2018-07-01 17:14:37 +00:00
Sanjay Patel a9fdb9fd37 [PatternMatch] allow undef elements in vectors with m_Neg
This is similar to the m_Not change from D44076.

llvm-svn: 336064
2018-07-01 13:42:57 +00:00
David Green 963401d2be [UnrollAndJam] New Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

  for i..
    ForeBlocks(i)
    for j..
      SubLoopBlocks(i, j)
    AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

  for i... i+=2
    ForeBlocks(i)
    ForeBlocks(i+1)
    for j..
      SubLoopBlocks(i, j)
      SubLoopBlocks(i+1, j)
    AftBlocks(i)
    AftBlocks(i+1)
  Remainder Loop

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 336062
2018-07-01 12:47:30 +00:00
Simon Pilgrim 84f77ecba9 [SLPVectorizer][X86] Add some alternate tests for cast operators
Alternate opcode handling only supports binary operators, these tests demonstrate missed opportunities to vectorize some sitofp/uitofp and fptosi/fptoui style casts as well as some (successful) float bits manipulations

llvm-svn: 336060
2018-07-01 11:29:46 +00:00
Eugene Leviant 6e4134459b [Evaluator] Improve evaluation of call instruction
Recommit of r335324 after buildbot failure fix

llvm-svn: 336059
2018-07-01 11:02:07 +00:00
Sanjay Patel 16a42ca274 [InstCombine] add tests for negate vector with undef elts; NFC
llvm-svn: 336050
2018-06-30 14:11:46 +00:00
Sanjay Patel 4491c0dd45 [InstCombine] add more tests for shuffle-binop folds; NFC
The mul+shl tests add coverage for the fold enabled with D48678.
The and+or tests are not handled yet; that's D48662.

llvm-svn: 335984
2018-06-29 15:28:11 +00:00
Sanjay Patel da66753e01 [InstCombine] enhance shuffle-of-binops to allow different variable ops (PR37806)
This was discussed in D48401 as another improvement for:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have 2 different variable values, then we shuffle (select) those lanes, 
shuffle (select) the constants, and then perform the binop. This eliminates a binop.

The new shuffle uses the same shuffle mask as the existing shuffle, so there's no 
danger of creating a difficult shuffle.

All of the earlier constraints still apply, but we also check for extra uses to 
avoid creating more instructions than we'll remove.

Additionally, we're disallowing the fold for div/rem because that could expose a
UB hole.

Differential Revision: https://reviews.llvm.org/D48678

llvm-svn: 335974
2018-06-29 13:44:06 +00:00
Roman Lebedev 8d081b78e4 SCEVExpander::expandAddRecExprLiterally(): check before casting as Instruction
Summary:
An alternative to D48597.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]].

The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
    with an type assertion in cast, because `%dec` is no longer an `Instruction`,

If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.

So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.

Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi

Reviewed By: mkazantsev

Subscribers: evstupac, javed.absar, spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D48599

llvm-svn: 335950
2018-06-29 07:44:20 +00:00
Sanjay Patel 019d3cd3b4 [InstCombine] adjust shuffle tests; NFC
Use xor for the extra uses test because div/rem have other problems.

llvm-svn: 335924
2018-06-28 21:14:02 +00:00
Teresa Johnson e87868b7e9 [ThinLTO] Port InlinerFunctionImportStats handling to new PM
Summary:
The InlinerFunctionImportStats will collect and dump stats regarding how
many function inlined into the module were imported by ThinLTO.

Reviewers: wmi, dexonsmith

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D48729

llvm-svn: 335914
2018-06-28 20:07:47 +00:00
Anastasis Grammenos 425df22ee3 [SROA] Preserve DebugLoc when rewriting alloca partitions
When rewriting an alloca partition copy the DL from the
old alloca over the the new one.

Differential Revision: https://reviews.llvm.org/D48640

llvm-svn: 335904
2018-06-28 18:58:30 +00:00
Sanjay Patel 57bda365bf [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806)
This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806

We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other 
direction because that's generally better of course). This allows us to remove the shuffle 
as we do in the regular opcodes-are-the-same cases.

This requires a small hack to make sure we don't introduce any extra poison:
https://rise4fun.com/Alive/ZGv

Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already 
canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There 
are planned enhancements for opcode transforms such or -> add.

Note that there's a different fold needed if we've already managed to simplify away a binop 
as seen in the test based on PR37806, but we manage to get that one case here because this 
fold is positioned above the demanded elements fold currently.

Differential Revision: https://reviews.llvm.org/D48485

llvm-svn: 335888
2018-06-28 17:48:04 +00:00
Florian Hahn 388af14f85 [SCCP] Mark CFG as preserved.
SCCP does not change the CFG, so we can mark it as preserved.

Reviewers: dberlin, efriedma, davide

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D47149

llvm-svn: 335820
2018-06-28 09:53:38 +00:00
Max Kazantsev f5ba37182e [IndVarSimplify] Ignore unreachable users of truncs
If a trunc has a user in a block which is not reachable from entry,
we can safely perform trunc elimination as if this user didn't exist.

llvm-svn: 335816
2018-06-28 08:20:03 +00:00
Sanjay Patel 1ef49be8b6 [InstCombine] add tests for vector-select-of-binops with 2 variables; NFC
llvm-svn: 335778
2018-06-27 20:23:47 +00:00
Sanjay Patel 7e45aebe55 [InstCombine] add more tests for shuffle with different binops; NFC
llvm-svn: 335756
2018-06-27 17:21:57 +00:00
Craig Topper 31cbe75b3b [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Adhemerval Zanella cadcfed7aa [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extended the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectores. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjust the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Vedant Kumar f6c0b41fb7 [InstCombine] Avoid creating mis-sized dbg.values in commonCastTransforms()
This prevents InstCombine from creating mis-sized dbg.values when
replacing a sequence of casts with a simpler cast. For example, in:

  (fptrunc (floor (fpext X))) -> (floorf X)

We no longer emit dbg.value(X) (with a 32-bit float operand) to describe
(fpext X) (which is a 64-bit float).

This was diagnosed by the debugify check added in r335682.

llvm-svn: 335696
2018-06-27 00:47:53 +00:00
Vedant Kumar d13536e9f3 [Debugify] Handle failure to get fragment size when checking dbg.values
It's not possible to get the fragment size of some dbg.values. Teach the
mis-sized dbg.value diagnostic to detect this scenario and bail out.

Tested with:
$ find test/Transforms -print -exec opt -debugify-each -instcombine {} \;

llvm-svn: 335695
2018-06-27 00:47:52 +00:00
Michael Zolotukhin d3b8bdef01 [JumpThreading] Don't try to rewrite a use if it's already valid.
Summary:
When recording uses we need to rewrite after cloning a loop we need to
check if the use is not dominated by the original def. The initial
assumption was that the cloned basic block will introduce a new path and
thus the original def will only dominate the use if they are in the same
BB, but as the reproducer from PR37745 shows it's not always the case.

This fixes PR37745.

Reviewers: haicheng, Ka-Ka

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48111

llvm-svn: 335675
2018-06-26 22:19:48 +00:00
Matt Arsenault 7e991d30c0 ConstantFold: Don't fold global address vs. null for addrspace != 0
Not sure why this logic seems to be repeated in 2 different places,
one called by the other.

On AMDGPU addrspace(3) globals start allocating at 0, so these
checks will be incorrect (not that real code actually tries
to compare these addresses)

llvm-svn: 335649
2018-06-26 18:55:43 +00:00
Matt Arsenault 2c1a570aab LoopUnroll: Allow analyzing intrinsic call costs
I'm not sure why the code here is skipping calls since
TTI does try to do something for general calls, but it
at least should allow intrinsics.

Skip intrinsics that should not be omitted as calls, which
is by far the most common case on AMDGPU.

llvm-svn: 335645
2018-06-26 18:51:17 +00:00
Sanjay Patel ad0bfb844d [InstSimplify] fold shifts by sext bool
https://rise4fun.com/Alive/c3Y

llvm-svn: 335633
2018-06-26 17:31:38 +00:00
Sanjay Patel 3d1e4d6fa6 [InstSimplify] add tests for shifts by sext bool; NFC
llvm-svn: 335631
2018-06-26 17:15:07 +00:00
Sanjay Patel 3575f0c0b3 [InstCombine] fold urem with sext bool divisor
Similar to other patches in this series:
https://reviews.llvm.org/rL335512
https://reviews.llvm.org/rL335527
https://reviews.llvm.org/rL335597
https://reviews.llvm.org/rL335616

...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform.
I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold.

Note that in this case, the backend might want to convert the select into math:
Name: sext urem
%e = sext i1 %x to i32
%r = urem i32 %y, %e
=>
%c = icmp eq i32 %y, -1
%z = zext i1 %c to i32
%r = add i32 %z, %y

llvm-svn: 335622
2018-06-26 16:30:00 +00:00
Simon Pilgrim bbfc18b5b5 [SLPVectorizer] Recognise non uniform power of 2 constants
Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them.

As SLP works with arrays of values I don't think we can easily use the pattern match helpers here.

Differential Revision: https://reviews.llvm.org/D48214

llvm-svn: 335621
2018-06-26 16:20:16 +00:00
Sanjay Patel 0f44759b0d [InstCombine] add tests for urem with sext bool divisor; NFC
llvm-svn: 335619
2018-06-26 16:01:24 +00:00
Sanjay Patel 2b7e31095d [InstSimplify] fold srem with sext bool divisor
llvm-svn: 335616
2018-06-26 15:32:54 +00:00
Sanjay Patel 0e0dbebeed [InstSimplify] add tests for srem with sext bool divisor; NFC
llvm-svn: 335609
2018-06-26 14:47:31 +00:00
Sanjay Patel 7c45debaea [InstCombine] fold udiv with sext bool divisor
Note: I didn't add a hasOneUse() check because the existing,
related fold doesn't have that check. I suspect that the
improved analysis and codegen make these some of the rare
canonicalization cases where we allow an increase in
instructions.

llvm-svn: 335597
2018-06-26 12:41:15 +00:00
Florian Hahn 4a69b0bb36 [IPSCCP] Change dead blocks to unreachable after visiting all executable blocks.
changeToUnreachable may remove PHI nodes from executable blocks we found values
for and we would fail to replace them. By changing dead blocks to unreachable after
we replaced constants in all executable blocks, we ensure such PHI nodes are replaced
by their known value before.

Fixes PR37780.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D48421

llvm-svn: 335588
2018-06-26 10:15:02 +00:00
Bjorn Pettersson 550517bcab Improve ConvertDebugDeclareToDebugValue
Summary:
This is a follow-up to r334830 and r335031.

In the valueCoversEntireFragment check we now also handle
the situation when there is a variable length array (VLA)
involved, and the length of the array has been reduced to
a constant.

The ConvertDebugDeclareToDebugValue functions that are related
to PHI nodes and load instructions now avoid inserting dbg.value
intrinsics when the value does not, for certain, cover the
variable/fragment that should be described.
In r334830 we assumed that the value always covered the entire
var/fragment and we had assertions in the code to show that
assumption. However, those asserts failed when compiling code
with VLAs, so we removed the asserts in r335031. Now when we
know that the valueCoversEntireFragment check can fail also for
PHI/Load instructions we avoid to insert the faulty dbg.value
intrinsic in such situations. Compared to the Store instruction
scenario we simply drop the dbg.value here (as the variable does
not change its value due to PHI/Load, so an earlier dbg.value
describing the variable should still be valid).

Reviewers: aprantl, vsk, efriedma

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48547

llvm-svn: 335580
2018-06-26 06:17:00 +00:00
Gil Rapaport da2e2caa6c [InstCombine] (A + 1) + (B ^ -1) --> A - B
Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B).
This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B.

Differential Revision: https://reviews.llvm.org/D48535

llvm-svn: 335579
2018-06-26 05:31:18 +00:00
Chandler Carruth 1652996fd6 [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial
unswitching of switches.

This works much like trivial unswitching of switches in that it reliably
moves the switch out of the loop. Here we potentially clone the entire
loop into each successor of the switch and re-point the cases at these
clones.

Due to the complexity of actually doing nontrivial unswitching, this
patch doesn't create a dedicated routine for handling switches -- it
would duplicate far too much code. Instead, it generalizes the existing
routine to handle both branches and switches as it largely reduces to
looping in a few places instead of doing something once. This actually
improves the results in some cases with branches due to being much more
careful about how dead regions of code are managed. With branches,
because exactly one clone is created and there are exactly two edges
considered, somewhat sloppy handling of the dead regions of code was
sufficient in most cases. But with switches, there are much more
complicated patterns of dead code and so I've had to move to a more
robust model generally. We still do as much pruning of the dead code
early as possible because that allows us to avoid even cloning the code.

This also surfaced another problem with nontrivial unswitching before
which is that we weren't as precise in reconstructing loops as we could
have been. This seems to have been mostly harmless, but resulted in
pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we
have to get this *right*, and everything benefits from it.

While the testing may seem a bit light here because we only have two
real cases with actual switches, they do a surprisingly good job of
exercising numerous edge cases. Also, because we share the logic with
branches, most of the changes in this patch are reasonably well covered
by existing tests.

The new unswitch now has all of the same fundamental power as the old
one with the exception of the single unsound case of *partial* switch
unswitching -- that really is just loop specialization and not
unswitching at all. It doesn't fit into the canonicalization model in
any way. We can add a loop specialization pass that runs late based on
profile data if important test cases ever come up here.

Differential Revision: https://reviews.llvm.org/D47683

llvm-svn: 335553
2018-06-25 23:32:54 +00:00
Sanjay Patel 0c90400bf2 [InstCombine] add/move tests for udiv; NFC
llvm-svn: 335544
2018-06-25 22:27:36 +00:00
Sanjay Patel 6a96d90acd [InstCombine] fold sdiv with sext bool divisor
llvm-svn: 335527
2018-06-25 21:39:41 +00:00
Sanjay Patel 46f9b8c333 [InstCombine] add tests for sdiv with sext bool divisor; NFC
llvm-svn: 335526
2018-06-25 21:36:09 +00:00
Florian Hahn b10b141a79 Revert r335513: [SCEVExp] Advance found insertion point
llvm-svn: 335522
2018-06-25 20:55:26 +00:00
Florian Hahn 0b3ed5742a Force vector width for scev-expander-debug.ll test
llvm-svn: 335520
2018-06-25 20:40:50 +00:00
Florian Hahn 5947c17fd4 [SCEVExp] Advance found insertion point until we find a non-dbg instruction.
This avoids creating unnecessary casts if the IP used to be a dbg info
intrinsic. Fixes PR37727.

Reviewers: vsk, aprantl, sanjoy, efriedma

Reviewed By: vsk, efriedma

Differential Revision: https://reviews.llvm.org/D47874

llvm-svn: 335513
2018-06-25 19:17:29 +00:00
Sanjay Patel 1e911fa746 [InstSimplify] fold div/rem of zexted bool
I was looking at an unrelated fold and noticed that
we don't have this simplification (because the other
fold would break existing tests).

Name: zext udiv
  %z = zext i1 %x to i32
  %r = udiv i32 %y, %z
=>
  %r = %y

Name: zext urem
  %z = zext i1 %x to i32
  %r = urem i32 %y, %z
=>
  %r = 0

Name: zext sdiv
  %z = zext i1 %x to i32
  %r = sdiv i32 %y, %z
=>
  %r = %y

Name: zext srem
  %z = zext i1 %x to i32
  %r = srem i32 %y, %z
=>
  %r = 0

https://rise4fun.com/Alive/LZ9

llvm-svn: 335512
2018-06-25 18:51:21 +00:00
Sanjay Patel a46bcbec58 [InstSimplify] add tests for div/rem with bool divisor; NFC
llvm-svn: 335509
2018-06-25 18:27:14 +00:00
Sanjay Patel 2e8babb4fa [InstCombine] add tests for add-of-sext-bool; NFC
We canonicalize to select with a zext-add and either zext-sub or sext-sub,
so this shows a pattern that's not conforming to the general trend.

llvm-svn: 335506
2018-06-25 17:52:10 +00:00
Stanislav Mekhanoshin d8c9374797 Fix invariant fdiv hoisting in LICM
FDiv is replaced with multiplication by reciprocal and invariant
reciprocal is hoisted out of the loop, while multiplication remains
even if invariant.

Switch checks for all invariant operands and only invariant
denominator to fix the issue.

Differential Revision: https://reviews.llvm.org/D48447

llvm-svn: 335411
2018-06-23 04:01:28 +00:00
Eli Friedman 203eaaf5ba [LoopReroll] Rewrite induction variable rewriting.
This gets rid of a bunch of weird special cases; instead, just use SCEV
rewriting for everything.  In addition to being simpler, this fixes a
bug where we would use the wrong stride in certain edge cases.

The one bit I'm not quite sure about is the trip count handling,
specifically the FIXME about overflow.  In general, I think we need to
widen the exit condition, but that's probably not profitable if the new
type isn't legal, so we probably need a check somewhere.  That said, I
don't think I'm making the existing problem any worse.

As a followup to this, a bunch of IV-related code in root-finding could
be cleaned up; with SCEV-based rewriting, there isn't any reason to
assume a loop will have exactly one or two PHI nodes.

Differential Revision: https://reviews.llvm.org/D45191

llvm-svn: 335400
2018-06-22 22:58:55 +00:00
Simon Pilgrim 9d3ef8ee2b [SLPVectorizer] Support alternate opcodes in tryToVectorizeList
Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree.

NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc.

Differential Revision: https://reviews.llvm.org/D48488

llvm-svn: 335364
2018-06-22 16:37:34 +00:00
Simon Pilgrim 1e564504bb [SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair
SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.

This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.

Differential Revision: https://reviews.llvm.org/D48477

llvm-svn: 335349
2018-06-22 14:04:06 +00:00
Simon Pilgrim 229a781214 [SLPVectorizer][X86] Add alternate opcode tests for simple build vector cases
llvm-svn: 335348
2018-06-22 13:53:58 +00:00
Sanjay Patel c2b37f73f5 [InstCombine] add shuffle+binops test from PR37806; NFC
This one shows another pattern that we'll need to match
in some cases, but the current ordering of folds allows
us to match this as 2 binops before simplification takes
place.

llvm-svn: 335347
2018-06-22 13:44:42 +00:00
Sanjay Patel cfd9da038c [InstCombine] add tests for shuffle-with-different-binops; NFC
llvm-svn: 335345
2018-06-22 13:19:25 +00:00
Simon Pilgrim 9c8f9374b5 [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

llvm-svn: 335329
2018-06-22 09:45:31 +00:00
Eugene Leviant 6d711ca168 Revert r335324 due to a builtbot failure
llvm-svn: 335327
2018-06-22 08:57:01 +00:00
Eugene Leviant ea19c9473c [Evaluator] Improve evaluation of call instruction
Differential revision: https://reviews.llvm.org/D46584

llvm-svn: 335324
2018-06-22 08:29:36 +00:00
Chandler Carruth fe70b29cf7 [LegacyPM] Fix PR37888 by teaching the legacy loop pass manager how to
clear out deleted loops from the current queue beyond just the current
loop.

This is important because SimpleLoopUnswitch will now enqueue the same
loop to be re-processed. When it does this with the legacy PM, we don't
have a way of canceling the rest of the pipeline and so we can end up
deleting the loop before we reprocess it. =/

This change also makes it easy to support deleting other loops in the
queue to process, although I don't have any use cases for that.

Differential Revision: https://reviews.llvm.org/D48470

llvm-svn: 335317
2018-06-22 02:43:41 +00:00
Sanjay Patel 4784e1506e [InstCombine] fix shuffle-of-binops bug
With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in 
the other, so we have to check for that possibility and
bail out.

llvm-svn: 335312
2018-06-21 23:56:59 +00:00
Sanjay Patel 23408c25f3 [InstCombine] add test for shuffle-of-binops; NFC
This shows a miscompile that was missed in rL335283.

llvm-svn: 335311
2018-06-21 23:53:01 +00:00
Matthew Voss 30648ab233 [GVN] Avoid casting a vector of size less than 8 bits to i8
Summary:
A reprise of D25849.

This crash was found through fuzzing some time ago and was documented in PR28879.

No check for load size has been added due to the following tests:
  - Transforms/GVN/invariant.group.ll
  - Transforms/GVN/pr10820.ll

These tests expect load sizes that are not a multiple of eight.

Thanks to @davide for the original patch.

Reviewers: nlopes, davide, RKSimon, reames, efriedma

Reviewed By: efriedma

Subscribers: davide, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D48330

llvm-svn: 335294
2018-06-21 21:43:20 +00:00
Sanjay Patel a76b70069d [InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have a common variable operand used in a pair of binops with vector constants 
that are vector selected together, then we can constant shuffle the constant vectors 
to eliminate the shuffle instruction.

This has some tricky parts that are hopefully addressed in the tests and their 
respective comments:

  1. If the shuffle mask contains an undef element, then that lane of the result is 
     undef:
     http://llvm.org/docs/LangRef.html#shufflevector-instruction

     Therefore, we can replace the constant in that lane with an undef value except 
     for div/rem. With div/rem, an undef in the divisor would cause the whole op to 
     be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.

  2. Intersect the wrapping and FMF of the original binops for the new binop. There 
     should be no extra poison or fast-math potential in the new binop that wasn't 
     possible in the original code.

  3. Disregard other uses. Given that we're eliminating uses (shortening the 
     dependency chain), I think that's always the right IR canonicalization. But 
     I purposely chose the udiv test to demonstrate the scenario where both 
     intermediate values have other uses because that seems likely worse for 
     codegen with an expensive math op. This seems like a very rare possibility to 
     me, so I don't think it requires a backend patch first.

Differential Revision: https://reviews.llvm.org/D48401

llvm-svn: 335283
2018-06-21 20:15:09 +00:00
Francis Visoiu Mistrih ac599b6951 Revert r335206 "Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions."
This reverts commit r335206.

As discussed here: https://reviews.llvm.org/rL333740, a fix will come
tomorrow. In the meanwhile, revert this to fix some bots.

llvm-svn: 335272
2018-06-21 19:18:36 +00:00
Sanjay Patel 3382dc644e [InstCombine] add tests for shuffled cmps; NFC
llvm-svn: 335266
2018-06-21 18:07:38 +00:00
Sanjay Patel 3244537a3c [InstCombine] use constant pattern matchers with icmp+sext
The previous code worked with vectors, but it failed when the
vector constants contained undef elements. 
The matchers handle those cases.

llvm-svn: 335262
2018-06-21 17:51:44 +00:00
Sanjay Patel 5522e968ad [InstCombine] add vector icmp tests with undefs; NFC
llvm-svn: 335261
2018-06-21 17:37:14 +00:00
Sanjay Patel 447e8ece4d [LoopVectorize] regenerate full checks; NFC
llvm-svn: 335257
2018-06-21 16:54:32 +00:00
Nicolai Haehnle b29ee70122 InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.

Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix

There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:

- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16

The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).

Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.

Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad

Reviewers: arsenm, rampitec, majnemer

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48165

llvm-svn: 335230
2018-06-21 13:37:31 +00:00
David Green d143c65de3 [DA] Enable -da-delinearize by default
This enables da-delinearize in Dependence Analysis for delinearizing array
accesses into multiple dimensions. This can help to increase the power of
Dependence analysis on multi-dimensional arrays and prevent having to fall
back to the slower and less accurate MIV tests. It adds static checks on the
bounds of the arrays to ensure that one dimension doesn't overflow into
another, and brings our code in line with our tests.

Differential Revision: https://reviews.llvm.org/D45872

llvm-svn: 335217
2018-06-21 11:53:16 +00:00
Simon Pilgrim 2a9cde026c [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)
These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. 

llvm-svn: 335216
2018-06-21 11:37:13 +00:00
Simon Pilgrim d08fbf6486 [SLPVectorizer][X86] Add horizontal add/sub tests
Shows PR37882 perf regression

llvm-svn: 335215
2018-06-21 11:16:10 +00:00
Florian Hahn d36aa1f763 Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
r335150 should resolve the issues with the clang-with-thin-lto-ubuntu
and clang-with-lto-ubuntu builders.

Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

llvm-svn: 335206
2018-06-21 07:15:08 +00:00
Chandler Carruth d1dab0c3c0 [PM/LoopUnswitch] Add partial non-trivial unswitching for invariant
conditions feeding a chain of `and`s or `or`s for a branch.

Much like with full non-trivial unswitching, we rely on the pass manager
to handle iterating until all of the profitable unswitches have been
done. This is to allow other more profitable unswitches to fire on any
of the cloned, simpler versions of the loop if viable.

Threading the partial unswiching through the non-trivial unswitching
logic motivated some minor refactorings. If those are too disruptive to
make it reasonable to review this patch, I can separate them out, but
it'll be somewhat timeconsuming so I wanted to send it for initial
review as-is. Feel free to tell me whether it warrants pulling apart.

I've tried to re-use (and factor out) logic form the partial trivial
unswitching, but not as much could be shared as I had haped. Still, this
wasn't as bad as I naively expected.

Some basic testing is added, but I probably need more. Suggestions for
things you'd like to see tested more than welcome. One thing I'd like to
do is add some testing that when we schedule this with loop-instsimplify
it effectively cleans up the cruft created.

Last but not least, this uncovered a bug that has been in loop cloning
the entire time for non-trivial unswitching. Specifically, we didn't
correctly add the outer-most cloned loop to the list of cloned loops.
This meant that LCSSA wouldn't be updated for it hypothetically, and
more significantly that we would never visit it in the loop pass
manager. I noticed this while checking loop-instsimplify by hand. I'll
try to separate this bugfix out into its own patch with a more focused
test. But it is just one line, so shouldn't significantly confuse the
review here.

After this patch, the only missing "feature" in this unswitch I'm aware
of us non-trivial unswitching of switches. I'll try implementing *full*
non-trivial unswitching of switches (which is at least a sound thing to
implement), but *partial* non-trivial unswitching of switches is
something I don't see any sound and principled way to implement. I also
have no interesting test cases for the latter, so I'm not really
worried. The rest of the things that need to be ported are bug-fixes and
more narrow / targeted support for specific issues.

Differential Revision: https://reviews.llvm.org/D47522

llvm-svn: 335203
2018-06-21 06:14:03 +00:00
Alina Sbirlea dfd14adeb0 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owner as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

llvm-svn: 335183
2018-06-20 22:01:04 +00:00
Sanjay Patel 88de778b9b [InstCombine] fix typo in test comment; NFC
llvm-svn: 335165
2018-06-20 20:16:45 +00:00
Chandler Carruth 4da3331d3d [PM/LoopUnswitch] Support partial trivial unswitching.
The idea of partial unswitching is to take a *part* of a branch's
condition that is loop invariant and just unswitching that part. This
primarily makes sense with i1 conditions of branches as opposed to
switches. When dealing with i1 conditions, we can easily extract loop
invariant inputs to a a branch and unswitch them to test them entirely
outside the loop.

As part of this, we now create much more significant cruft in the loop
body, so this relies on adding cleanup passes to the loop pipeline and
revisiting unswitched loops to do that cleanup before continuing to
process them.

This already appears to be more powerful at unswitching than the old
loop unswitch pass, and so I'd appreciate pretty careful review in case
I'm just missing some correctness checks. The `LIV-loop-condition` test
case is not unswitched by the old unswitch pass, but is with this pass.

Thanks to Sanjoy and Fedor for the review!

Differential Revision: https://reviews.llvm.org/D46706

llvm-svn: 335156
2018-06-20 18:57:07 +00:00