Commit Graph

21919 Commits

Author SHA1 Message Date
Teresa Johnson c374a800e7 Refine ArgPromotion metadata handling
Summary:
In r353537 we now copy all metadata to the new function, with the old
being removed when the old function is eliminated. In some cases the old
function is dropped to a declaration (seems to only occur with the old
PM). Go ahead and clear all metadata from the old function to handle that
case, since verification will complain otherwise. This is consistent
with what was being done for debug metadata before r353537.

Reviewers: davidxl, uabelho

Subscribers: jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58215

llvm-svn: 354032
2019-02-14 14:14:24 +00:00
Florian Hahn 6ab83b7db6 [LoopUnrollPeel] Add case where we should forget the peeled loop from SCEV.
The test case requires the peeled loop to be forgotten after peeling,
even though it does not have a parent. When called via the unroller,
SE->forgetTopmostLoop is also called, so the test case would also pass
without any SCEV invalidation, but peelLoop is exposed as utility
function. Also, in the test case, simplifyLoop will make changes,
removing the loop from SCEV, but it is better to not rely on this
behavior.

Reviewers: sanjoy, mkazantsev

Reviewed By: mkazantsev

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58192

llvm-svn: 354031
2019-02-14 13:59:39 +00:00
Clement Courbet c6e768f0ee [Instrumentation][NFC] Fix warning.
lib/Transforms/Instrumentation/AddressSanitizer.cpp:1173:29: warning: extra ‘;’ [-Wpedantic]

llvm-svn: 354024
2019-02-14 12:10:49 +00:00
Max Kazantsev deaf2ba280 [NFC] Refactor LICM code for better readability
llvm-svn: 354013
2019-02-14 09:04:12 +00:00
Leonard Chan 436fb2bd82 [NewPM] Second attempt at porting ASan
This is the second attempt to port ASan to new PM after D52739. This takes the
initialization requried by ASan from the Module by moving it into a separate
class with it's own analysis that the new PM ASan can use.

Changes:
- Split AddressSanitizer into 2 passes: 1 for the instrumentation on the
  function, and 1 for the pass itself which creates an instance of the first
  during it's run. The same is done for AddressSanitizerModule.
- Add new PM AddressSanitizer and AddressSanitizerModule.
- Add legacy and new PM analyses for reading data needed to initialize ASan with.
- Removed DominatorTree dependency from ASan since it was unused.
- Move GlobalsMetadata and ShadowMapping out of anonymous namespace since the
  new PM analysis holds these 2 classes and will need to expose them.

Differential Revision: https://reviews.llvm.org/D56470

llvm-svn: 353985
2019-02-13 22:22:48 +00:00
Vedant Kumar 4b0cc9a7c8 [CodeExtractor] Only lift lifetime markers present in the extraction region
When CodeExtractor finds liftime markers referencing inputs to the
extraction region, it lifts these markers out of the region and inserts
them around the call to the extracted function (see r350420, PR39671).

However, it should *only* lift lifetime markers that are actually
present in the extraction region. I.e., if a start marker is present in
the extraction region but a corresponding end marker isn't (or vice
versa), only the start marker (or end marker, resp.) should be lifted.

Differential Revision: https://reviews.llvm.org/D57834

llvm-svn: 353973
2019-02-13 19:53:38 +00:00
Max Kazantsev 3fe9ad7a9f [NFC] Add const qualifiers where possible
llvm-svn: 353941
2019-02-13 11:54:45 +00:00
Jeremy Morse f10af3f134 [DebugInfo][InstCombine] Prefer to salvage debuginfo over sinking it
When instcombine sinks an instruction between two basic blocks, it sinks any
dbg.value users in the source block with it, to prevent debug use-before-free.
However we can do better by attempting to salvage the debug users, which would
avoid moving where the variable location changes. If we successfully salvage,
still sink a (cloned) dbg.value with the sunk instruction, as the sunk
instruction is more likely to be "live" later in the compilation process.

If we can't salvage dbg.value users of a sunk instruction, mark the dbg.values
in the original block as being undef. This terminates any earlier variable
location range, and represents the fact that we've optimized out the variable
location for a portion of the program.

Differential Revision: https://reviews.llvm.org/D56788

llvm-svn: 353936
2019-02-13 10:54:53 +00:00
Max Kazantsev 2bb95e7c76 [GuardWidening] Support widening of explicitly expressed guards
This patch adds support of guards expressed in explicit form via
`widenable_condition` in Guard Widening pass.

Differential Revision: https://reviews.llvm.org/D56075
Reviewed By: reames

llvm-svn: 353932
2019-02-13 09:56:30 +00:00
Max Kazantsev 5cf777e413 [LoopSimplifyCFG] Re-enable const branch folding by default
Known underlying bugs have been fixed, intensive fuzz testing did not
find any new problems. Re-enabling by default. Feel free to revert if
it causes any functional failures.

llvm-svn: 353911
2019-02-13 06:12:48 +00:00
Alina Sbirlea 0a8bc14ad7 [MemorySSA & LoopPassManager] Add remaining book keeping [NFCI].
Add plumbing to get MemorySSA in the remaining loop passes.
Also update unit test to add the dependency.
[EnableMSSALoopDependency remains disabled].

llvm-svn: 353901
2019-02-12 23:48:02 +00:00
Alina Sbirlea 8567ff0c34 [LICM] Cap the clobbering calls in LICM.
Summary:
Unlimitted number of calls to getClobberingAccess can lead to high
compile times in pathological cases.
Switching EnableLicmCap flag from bool to int, and enabling to default 100.
(tested to be appropriate for current bechmarks)
We can revisit this value when enabling MemorySSA.

Reviewers: sanjoy, chandlerc, george.burgess.iv

Subscribers: jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57968

llvm-svn: 353897
2019-02-12 23:05:40 +00:00
Jeremy Morse b33a5c7347 [DebugInfo] Don't salvage load operations (PR40628).
Salvaging a redundant load instruction into a debug expression hides a
memory read from optimisation passes. Passes that alter memory behaviour
(such as LICM promoting memory to a register) aren't aware of these debug
memory reads and leave them unaltered, making the debug variable location
point somewhere unsafe.

Teaching passes to know about these debug memory reads would be challenging
and probably incomplete. Finding dbg.value instructions that need to be fixed
would likely be computationally expensive too, as more analysis would be
required. It's better to not generate debug-memory-reads instead, alas.

Changed tests:
 * DeadStoreElim: test for salvaging of intermediate operations contributing
   to the dead store, instead of salvaging of the redundant load,
 * GVN: remove debuginfo behaviour checks completely, this behaviour is still
   covered by other tests,
 * InstCombine: don't test for salvaged loads, we're removing that behaviour.

Differential Revision: https://reviews.llvm.org/D57962

llvm-svn: 353824
2019-02-12 10:54:30 +00:00
Max Kazantsev 2a184af221 [IndVars] Fix corner case with unreachable Phi inputs. PR40454
Logic in `getInsertPointForUses` doesn't account for a corner case when `Def`
only comes to a Phi user from unreachable blocks. In this case, the incoming
value may be arbitrary (and not even available in the input block) and break
the loop-related invariants that are asserted below.

In fact, if we encounter this situation, no IR modification is needed. This
Phi will be simplified away with nearest cleanup.

Differential Revision: https://reviews.llvm.org/D58045
Reviewed By: spatel

llvm-svn: 353816
2019-02-12 09:59:44 +00:00
Max Kazantsev bf6af8fbf0 [LoopSimplifyCFG] Change logic of dead loops removal to avoid hitting asserts
The function `LI.erase` has some invariants that need to be preserved when it
tries to remove a loop which is not the top-level loop. In particular, it
requires loop's preheader to be strictly in loop's parent. Our current logic
of deletion of dead blocks may erase the information about preheader before we
handle the loop, and therefore we may hit this assertion.

This patch changes the logic of loop deletion: we make them top-level loops
before we actually erase them. This allows us to trigger the simple branch of
`erase` logic which just detatches blocks from the loop and does not try to do
some complex stuff that need this invariant.

Thanks to @uabelho for reporting this!

Differential Revision: https://reviews.llvm.org/D57221
Reviewed By: fedor.sergeev

llvm-svn: 353813
2019-02-12 09:37:00 +00:00
Max Kazantsev 9aae9da947 Delete blocks from DTU to avoid dangling pointers
llvm-svn: 353804
2019-02-12 08:10:29 +00:00
Max Kazantsev 6bf861597c [LoopSimplifyCFG] Pay respect to LCSSA when removing dead blocks
Utility function that we use for blocks deletion always unconditionally removes
one-input Phis. In LoopSimplifyCFG, it can lead to breach of LCSSA form.
This patch alters this function to keep them if needed.

Differential Revision: https://reviews.llvm.org/D57231
Reviewed By: fedor.sergeev

llvm-svn: 353803
2019-02-12 07:48:07 +00:00
Max Kazantsev 20b9189975 [NFC] Rename DontDeleteUselessPHIs --> KeepOneInputPHIs
llvm-svn: 353801
2019-02-12 07:09:29 +00:00
Max Kazantsev 0686d1ae41 [NFC] Add parameter for keeping one-input Phis in DeleteDeadBlock(s)
llvm-svn: 353799
2019-02-12 06:14:27 +00:00
Eli Friedman 806136f8ef [LoopReroll] Fix reroll root legality checking.
The code checked that the first root was an appropriate distance from
the base value, but skipped checking the other roots. This could lead to
rerolling a loop that can't be legally rerolled (at least, not without
rewriting the loop in a non-trivial way).

Differential Revision: https://reviews.llvm.org/D56812

llvm-svn: 353779
2019-02-12 00:33:25 +00:00
Michael Kruse 77a614a6e1 Refactor setAlreadyUnrolled() and setAlreadyVectorized().
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.

In doing so we fix 3 potential issues:

 * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
   metadata even though it will not be used anymore. This already caused
   problems such as http://llvm.org/PR40546. Change the behavior to the
   one of setAlreadyUnrolled which deletes this loop metadata.

 * setAlreadyUnrolled() used to create a new LoopID by calling
   MDNode::get with nullptr as the first operand, then replacing it by
   the returned references using replaceOperandWith. It is possible
   that MDNode::get would instead return an existing node (due to
   de-duplication) that then gets modified. To avoid, use a fresh
   TempMDNode that does not get uniqued with anything else before
   replacing it with replaceOperandWith.

 * LoopVectorizeHints::matchesHintMetadataName() only compares the
   suffix of the attribute to set the new value for. That is, when
   called with "enable", would erase attributes such as
   "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
   "llvm.loop.distribute.enable" instead of the one to replace.
   Fortunately, function was only called with "isvectorized".

Differential Revision: https://reviews.llvm.org/D57566

llvm-svn: 353738
2019-02-11 19:45:44 +00:00
Sanjay Patel 587fd849f0 [InstCombine] Fix matchRotate bug when one operand is a ConstantExpr shift
This bug seems to be harmless in release builds, but will cause an error in UBSAN
builds or an assertion failure in debug builds.

When it gets to this opcode comparison, it assumes both of the operands are BinaryOperators,
but the prior m_LogicalShift will also match a ConstantExpr. The cast<BinaryOperator> will
assert in a debug build, or reading an invalid value for BinaryOp from memory with
((BinaryOperator*)constantExpr)->getOpcode() will cause an error in a UBSAN build.

The test I added will fail without this change in debug/UBSAN builds, but not in release.

Patch by: @AndrewScheidecker (Andrew Scheidecker)

Differential Revision: https://reviews.llvm.org/D58049

llvm-svn: 353736
2019-02-11 19:26:27 +00:00
Alina Sbirlea 605b21739d [LICM&MSSA] Limit store hoisting.
Summary:
If there is no clobbering access for a store inside the loop, that store
can only be hoisted if there are no interfearing loads.
A more general verification introduced here: there are no loads that are
not optimized to an access outside the loop.
Addresses PR40586.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57967

llvm-svn: 353734
2019-02-11 19:07:15 +00:00
Chandler Carruth 1f5550326f Update more files added with the old header to the new one.
llvm-svn: 353667
2019-02-11 08:25:56 +00:00
Chandler Carruth b53f0e1145 Update files that were mistakenly added with the old file header to the
new one.

llvm-svn: 353665
2019-02-11 08:07:38 +00:00
Chandler Carruth 751d95fb9b [CallSite removal] Migrate ConstantFolding APIs and implementation to
`CallBase`.

Users have been updated. You can see how to update any out-of-tree
usages: pass `cast<CallBase>(CS.getInstruction())`.

llvm-svn: 353661
2019-02-11 07:51:44 +00:00
Chandler Carruth 3160734af1 [CallSite removal] Migrate the statepoint GC infrastructure to use the
`CallBase` class rather than `CallSite` wrappers.

I pushed this change down through most of the statepoint infrastructure,
completely removing the use of CallSite where I could reasonably do so.
I ended up making a couple of cut-points: generic call handling
(instcombine, TLI, SDAG). As soon as it hit truly generic handling with
users outside the immediate code, I simply transitioned into or out of
a `CallSite` to make this a reasonable sized chunk.

Differential Revision: https://reviews.llvm.org/D56122

llvm-svn: 353660
2019-02-11 07:42:30 +00:00
Fangrui Song 709a3e7488 [Local] Delete a redundant check. NFC
isInstructionTriviallyDead also performs the use_empty() check.

llvm-svn: 353637
2019-02-10 09:25:56 +00:00
Craig Topper a97857b5b5 [InstCombine] Fix an unused variable warning.
llvm-svn: 353630
2019-02-10 02:21:29 +00:00
Fangrui Song 6e679f8ba5 [GlobalOpt] Simplify __cxa_atexit elimination
cxxDtorIsEmpty checks callers recursively to determine if the
__cxa_atexit-registered function is empty, and eliminates the
__cxa_atexit call accordingly.

This recursive check is unnecessary as redundant instructions and
function calls can be removed by early-cse and inliner. In addition,
cxxDtorIsEmpty does not mark visited function and it may visit a
function exponential times (multiplication principle).

llvm-svn: 353603
2019-02-09 09:18:37 +00:00
Gabor Buella 53980b24b7 Extra processing for BitCast + PHI in InstCombine
For some specific cases with bitcast A->B->A with intervening PHI nodes InstCombiner::optimizeBitCastFromPhi transformation creates extra PHI nodes, which are actually a copy of already created PHI or in another words, they are redundant. These extra PHI nodes could lead to extra move instructions generated after DeSSA transformation. This happens when several conditions are met

 - SROA kicks in and creates new alloca;
 - there is a simple assignment L = R, which falls under 'canonicalize loads' done by combineLoadToOperationType (this transformation is by default). Exactly this transformation is the reason of bitcasts generated;
 - the alloca is then used in A->B->A + PHI chain;
 - there is a loop unrolling.

As a result optimizeBitCastFromPhi creates as many of PHI nodes for each new SROA alloca as loop unrolling factor is. These new extra PHI nodes are redundant actually except of one and should not be created. Moreover the idea of optimizeBitCastFromPhi is to get rid of the cast (when possible) but that doesn't happen in these conditions.

The proposed fix is to do the cast replacement for the whole calculated/accumulated PHI closure not for one cast only, which is an argument to the optimizeBitCastFromPhi. These will help to accomplish several things: 1) avoid extra PHI nodes generated as all casts which may trigger optimizeBitCastFromPhi transformation will be replaced, 3) bitcasts will be replaced, and 3) create more opportunities to remove dead code, which appears after the replacement.

A new test case shows that it's possible to get rid of all bitcasts completely and get quite good code reduction.

Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>

Reviewed By: Carrot

Differential Revision: https://reviews.llvm.org/D57053

llvm-svn: 353595
2019-02-09 01:44:28 +00:00
Sergey Dmitriev afd612ece9 [NFC] Avoid passing blocks vector to the OutlineRegionInfo constructor by value.
Reviewers: vsk, fhahn, davidxl

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57957

llvm-svn: 353582
2019-02-08 23:52:15 +00:00
Craig Topper 784929d045 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
Vedant Kumar 0e5dd512aa [CodeExtractor] Restore outputs after creating exit stubs
When CodeExtractor saves the result of InvokeInst at the first insertion
point of the 'normal destination' basic block, this block can be omitted
in the outlined region, so store is placed outside of the function. The
suggested solution is to process saving outputs after creating exit
stubs for new function, and stores will be placed in that blocks before
return in this case.

Patch by Sergei Kachkov!

Fixes llvm.org/PR40455.

Differential Revision: https://reviews.llvm.org/D57919

llvm-svn: 353562
2019-02-08 20:48:04 +00:00
Reid Kleckner 987d331fab [InstrProf] Implement static profdata registration
Summary:
The motivating use case is eliminating duplicate profile data registered
for the same inline function in two object files. Before this change,
users would observe multiple symbol definition errors with VC link, but
links with LLD would succeed.

Users (Mozilla) have reported that PGO works well with clang-cl and LLD,
but when using LLD without this static registration, we would get into a
"relocation against a discarded section" situation. I'm not sure what
happens in that situation, but I suspect that duplicate, unused profile
information was retained. If so, this change will reduce the size of
such binaries with LLD.

Now, Windows uses static registration and is in line with all the other
platforms.

Reviewers: davidxl, wmi, inglorion, void, calixte

Subscribers: mgorny, krytarowski, eraman, fedor.sergeev, hiraditya, #sanitizers, dmajor, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D57929

llvm-svn: 353547
2019-02-08 19:03:50 +00:00
Teresa Johnson 3ce8112dad ArgumentPromotion should copy all metadata to new Function
Summary:
ArgumentPromotion had code to specifically move the dbg metadata over to
the new function, but other metadata such as the function_entry_count
!prof metadata was not. Replace code that moved dbg metadata with a call
to copyMetadata. The old metadata is automatically removed when the old
Function is removed.

Reviewers: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57846

llvm-svn: 353537
2019-02-08 17:08:27 +00:00
Carlos Alberto Enciso 08dc50f2fb [DWARF] LLVM ERROR: Broken function found, while removing Debug Intrinsics.
Check that when SimplifyCFG is flattening a 'br', all their debug intrinsic instructions are removed, including any dbg.label referencing a label associated with the basic blocks being removed.

Differential Revision: https://reviews.llvm.org/D57444

llvm-svn: 353511
2019-02-08 10:57:26 +00:00
Max Kazantsev 6b63d3a277 [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge
`insert/deleteEdge` methods in DTU can make updates incorrectly in some cases
(see https://bugs.llvm.org/show_bug.cgi?id=40528), and it is recommended to
use `applyUpdates` methods instead when it is needed to make a mass update in CFG.

Differential Revision: https://reviews.llvm.org/D57316
Reviewed By: kuhar

llvm-svn: 353502
2019-02-08 08:12:41 +00:00
Sergey Dmitriev 807960e6ef [CodeExtractor] Update function's assumption cache after extracting blocks from it
Summary: Assumption cache's self-updating mechanism does not correctly handle the case when blocks are extracted from the function by the CodeExtractor. As a result function's assumption cache may have stale references to the llvm.assume calls that were moved to the outlined function. This patch fixes this problem by removing extracted llvm.assume calls from the function’s assumption cache.

Reviewers: hfinkel, vsk, fhahn, davidxl, sanjoy

Reviewed By: hfinkel, vsk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57215

llvm-svn: 353500
2019-02-08 06:55:18 +00:00
Quentin Colombet 96f54de8ff [InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible
This commit teaches InstCombine how to replace an atomicrmw operation
into a simple load atomic.
For a given `atomicrmw <op>`, this is possible when:
1. The ordering of that operation is compatible with a load (i.e.,
   anything that doesn't have a release semantic).
2. <op> does not modify the value being stored

Differential Revision: https://reviews.llvm.org/D57854

llvm-svn: 353471
2019-02-07 21:27:23 +00:00
Florian Hahn f557a94aa3 [LV] Remove unnecessary assignment to UserIC.
llvm-svn: 353469
2019-02-07 21:23:37 +00:00
Sanjay Patel 781d883862 [InstCombine] Fix crashing from (icmp (bitcast ([su]itofp X)), Y)
This fixes a class of bugs introduced by D44367, 
which transforms various cases of icmp (bitcast ([su]itofp X)), Y to icmp X, Y. 
If the bitcast is between vector types with a different number of elements, 
the current code will produce bad IR along the lines of: icmp <N x i32> ..., <M x i32> <...>.

This patch suppresses the transform if the bitcast changes the number of vector elements.

Patch by: @AndrewScheidecker (Andrew Scheidecker)

Differential Revision: https://reviews.llvm.org/D57871

llvm-svn: 353467
2019-02-07 21:12:01 +00:00
Sanjay Patel e7f46c3db3 [InstCombine] refactor folds for (icmp (bitcast X), Y); NFCI
llvm-svn: 353462
2019-02-07 20:54:09 +00:00
Florian Hahn ba5acbc4fe [LV] Prevent interleaving if computeMaxVF returned None.
As discussed in D57382, interleaving should be avoided if computeMaxVF
returns None, same as we currently do for vectorization.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477

Reviewers: Ayal, dcaballe, hsaito, mkuper, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D57837

llvm-svn: 353461
2019-02-07 20:49:10 +00:00
Reid Kleckner f21c022380 [InstrProf] Avoid reconstructing Triple, NFC
llvm-svn: 353439
2019-02-07 18:16:22 +00:00
Teresa Johnson c36c10ddfb [HotColdSplit] With PGO add profile entry metadata to split cold function
Summary:
When compiling with profile data, ensure the split cold function gets
cold function_entry_count metadata (just use 0 since it should be cold).
Otherwise with function sections it will not be placed in the unlikely
text section with other cold code.

Reviewers: vsk

Subscribers: sebpop, hiraditya, davidxl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57900

llvm-svn: 353434
2019-02-07 17:50:35 +00:00
Sam Parker 67756c09f2 [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373

llvm-svn: 353403
2019-02-07 13:32:54 +00:00
Alina Sbirlea 6cba96ed52 [LICM/MSSA] Add promotion to scalars by building an AliasSetTracker with MemorySSA.
Summary:
Experimentally we found that promotion to scalars carries less benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if less than AccessCapForMSSAPromotion exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.

Reviewers: sanjoy, chandlerc

Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D56625

llvm-svn: 353339
2019-02-06 20:25:17 +00:00
Sanjay Patel 68bc5fb0ad [InstCombine] X | C == C --> (X & ~C) == 0
We should canonicalize to one of these forms,
and compare-with-zero could be more conducive
to follow-on transforms. This also leads to
generally better codegen as shown in PR40611:
https://bugs.llvm.org/show_bug.cgi?id=40611

llvm-svn: 353313
2019-02-06 16:43:54 +00:00
Max Kazantsev cd48ac3661 [NFC] Simplify check in guard widening
llvm-svn: 353290
2019-02-06 11:27:00 +00:00
Max Kazantsev 36b392cbe4 [NFC] Factor out detatchment of dead blocks from their erasing
llvm-svn: 353277
2019-02-06 07:56:36 +00:00
Max Kazantsev a4ccfc1841 [LoopSimplifyCFG] Do not count dead exit blocks twice, make CFG simpler
llvm-svn: 353276
2019-02-06 07:49:17 +00:00
Max Kazantsev 0d7ad3c9a3 [NFC] Revert rL353274
llvm-svn: 353275
2019-02-06 06:33:02 +00:00
Max Kazantsev 61e6ffc398 [NFC] Extend API of DeleteDeadBlock(s) to collect updates without DTU
llvm-svn: 353274
2019-02-06 06:00:02 +00:00
Max Kazantsev bad4db8b1a [NFC] Replace readonly SmallVectorImpl with ArrayRef
llvm-svn: 353273
2019-02-06 05:40:31 +00:00
Teresa Johnson 716abbeb43 [HotColdSplit] Move splitting after instrumented PGO use
Summary:
Follow up to D57082 which moved splitting earlier in the pipeline, in
order to perform it before inlining. However, it was moved too early,
before the IR is annotated with instrumented PGO data. This caused the
splitting to incorrectly determine cold functions.

Move it to just after PGO annotation (still before inlining), in both
pass managers.

Reviewers: vsk, hiraditya, sebpop

Subscribers: mehdi_amini, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57805

llvm-svn: 353270
2019-02-06 04:29:39 +00:00
Richard Trieu 5f436fc57a Move DomTreeUpdater from IR to Analysis
DomTreeUpdater depends on headers from Analysis, but is in IR.  This is a
layering violation since Analysis depends on IR.  Relocate this code from IR
to Analysis to fix the layering violation.

llvm-svn: 353265
2019-02-06 02:52:52 +00:00
Vedant Kumar bd94b4287c [HotColdSplit] Do not split out `resume` instructions
Resumes that are not reachable from a cleanup landing pad are considered
to be unreachable. It’s not safe to split them out.

rdar://47808235

llvm-svn: 353242
2019-02-05 23:39:02 +00:00
Sanjay Patel cddb1e5469 [InstCombine] limit extracting shuffle transform based on uses
As discussed in D53037, this can lead to worse codegen, and we
don't generally expect the backend to be able to optimize
arbitrary shuffles. If there's only one use of the 1st shuffle,
that means it's getting removed, so that should always be
safe.

llvm-svn: 353235
2019-02-05 22:58:45 +00:00
Rong Xu ce10d5ead4 [PGO] Use a function for creating variable for profile file name. NFC.
Factored out the code for creating variable for profile file name to
a function.

llvm-svn: 353230
2019-02-05 22:34:45 +00:00
Jeremy Morse 84ca706be1 [DebugInfo][NFCI] Split salvageDebugInfo into helper functions
Some use cases are appearing where salvaging is needed that does not
correspond to an instruction being deleted -- for example an instruction
being sunk, or a Value not being available in a block being isel'd.

Enable more fine grained control over how salavging occurs by splitting
the logic into helper functions, separating things that are specific to
working on DbgVariableIntrinsics from those specific to interpreting IR
and building DIExpressions.

Differential Revision: https://reviews.llvm.org/D57696

llvm-svn: 353156
2019-02-05 11:11:28 +00:00
Max Kazantsev d5e595b7a6 [LSR] Check SCEV on isZero() after extend. PR40514
When LSR first adds SCEVs to BaseRegs, it only does it if `isZero()` has
returned false. In the end, in invocation of `InsertFormula`, it asserts that
all values there are still not zero constants. However between these two
points, it makes some transformations, in particular extends them to wider
type.

SCEV does not give us guarantee that if `S` is not a constant zero, then
`sext(S)` is also not a constant zero. It might have missed some optimizing
transforms when it was calculating `S` and then made them when it took `sext`.
For example, it may happen if previously optimizing transforms were limited
by depth or somehow else.

This patch adds a bailout when we may end up with a zero SCEV after extension.

Differential Revision: https://reviews.llvm.org/D57565
Reviewed By: samparker

llvm-svn: 353136
2019-02-05 04:30:37 +00:00
Richard Trieu a9354b2f33 Fix narrowing issue from r353129
llvm-svn: 353134
2019-02-05 02:26:03 +00:00
Wei Mi 4901f371a2 [SamplePGO][NFC] Minor improvement to replace a temporary vector with a
brace-enclosed init list.

Differential Revision: https://reviews.llvm.org/D57726

llvm-svn: 353129
2019-02-05 00:57:50 +00:00
Teresa Johnson 4bdf82ce79 [SamplePGO] Minor efficiency improvement in samplePGO ICP
Summary:
When attaching prof metadata to promoted direct calls in SamplePGO
mode, no need to construct and use a SmallVector to pass a single count
to the ArrayRef parameter, we can simply use a brace-enclosed init list.

This made a small but consistent improvement for a ThinLTO backend
compile I was measuring.

Reviewers: wmi

Subscribers: mehdi_amini, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57706

llvm-svn: 353123
2019-02-05 00:18:38 +00:00
Julian Lettner 29ac3a5b82 [SanitizerCoverage] Clang crashes if user declares `__sancov_lowest_stack` variable
Summary:
If the user declares or defines `__sancov_lowest_stack` with an
unexpected type, then `getOrInsertGlobal` inserts a bitcast and the
following cast fails:
```
Constant *SanCovLowestStackConstant =
       M.getOrInsertGlobal(SanCovLowestStackName, IntptrTy);
SanCovLowestStack = cast<GlobalVariable>(SanCovLowestStackConstant);
```

This variable is a SanitizerCoverage implementation detail and the user
should generally never have a need to access it, so we emit an error
now.

rdar://problem/44143130

Reviewers: morehouse

Differential Revision: https://reviews.llvm.org/D57633

llvm-svn: 353100
2019-02-04 22:06:30 +00:00
Nicolai Haehnle a69146e67e [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded
Summary:
The fix added in r352904 is not quite correct, or rather misleading:

1. When the texfailctrl (TFC) argument was non-constant, the fix assumed
   non-TFE/LWE, which is incorrect.

2. Regardless, this code path cannot even be hit for correct
   TFE/LWE-enabled calls, because those return a struct. Added
   a test case for those for completeness.

Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf

Reviewers: hliao, dstuttard, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57681

llvm-svn: 353097
2019-02-04 21:24:19 +00:00
Philip Pfaffe 0ee6a933ce [NewPM][MSan] Add Options Handling
Summary: This patch enables passing options to msan via the passes pipeline, e.e., -passes=msan<recover;kernel;track-origins=4>.

Reviewers: chandlerc, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57640

llvm-svn: 353090
2019-02-04 21:02:49 +00:00
Michael Kruse 70560a0a2c [WarnMissedTransforms] Do not warn about already vectorized loops.
LoopVectorize adds llvm.loop.isvectorized, but leaves
llvm.loop.vectorize.enable. Do not consider such a loop for user-forced
vectorization since vectorization already happened -- by prioritizing
llvm.loop.isvectorized except for TM_SuppressedByUser.

Fixes http://llvm.org/PR40546

Differential Revision: https://reviews.llvm.org/D57542

llvm-svn: 353082
2019-02-04 19:55:59 +00:00
Max Kazantsev 56b57e3f53 [NFC] Make a check in GuardWidening more obvious
llvm-svn: 353038
2019-02-04 10:41:17 +00:00
Max Kazantsev 09802f41cc [NFC] Rename variables to reflect the actual status of GuardWidening
llvm-svn: 353036
2019-02-04 10:31:18 +00:00
Max Kazantsev 13ab5cbb64 [NFC] Remove redundant parameters for better readability
llvm-svn: 353034
2019-02-04 10:20:51 +00:00
Max Kazantsev 65970aa24d [NFC] Replace equivalent condition for better readability
llvm-svn: 353032
2019-02-04 09:55:18 +00:00
Davide Italiano 73929c4d24 [LoopIdiomRecognize] @llvm.dbg values shouldn't affect the transformation.
Summary: PR40564

Reviewers: aprantl, rnk

Subscribers: llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57629

llvm-svn: 353007
2019-02-03 20:33:20 +00:00
Florian Hahn dd2ef0af46 [LCSSA] Handle case with single new PHI faster.
If there is only a single available value, all uses must be dominated by
the single value and there is no need to search for a reaching
definition.

This drastically speeds up LCSSA in some cases. For the test case
from PR37202, it speeds up LCSSA construction by 4 times.

Time-passes without this patch for test case from PR37202:

    Total Execution Time: 29.9285 seconds (29.9276 wall clock)

    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
    5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
    4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
    4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
    2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
    2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
    1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
    1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
    1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
    1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
    1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
    1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3

Time passes with this patch

  Total Execution Time: 19.2802 seconds (19.2813 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   4.4234 ( 23.2%)   0.0038 (  2.0%)   4.4272 ( 23.0%)   4.4273 ( 23.0%)  Unswitch loops
   2.3828 ( 12.5%)   0.0020 (  1.1%)   2.3848 ( 12.4%)   2.3847 ( 12.4%)  Unroll loops
   1.8714 (  9.8%)   0.0020 (  1.1%)   1.8734 (  9.7%)   1.8735 (  9.7%)  Loop Invariant Code Motion
   1.7973 (  9.4%)   0.0022 (  1.2%)   1.7995 (  9.3%)   1.8003 (  9.3%)  Value Propagation
   1.4010 (  7.3%)   0.0033 (  1.8%)   1.4043 (  7.3%)   1.4044 (  7.3%)  Induction Variable Simplification
   0.9978 (  5.2%)   0.0244 ( 13.1%)   1.0222 (  5.3%)   1.0224 (  5.3%)  Loop-Closed SSA Form Pass #2
   0.9611 (  5.0%)   0.0257 ( 13.8%)   0.9868 (  5.1%)   0.9868 (  5.1%)  Loop-Closed SSA Form Pass
   0.5856 (  3.1%)   0.0015 (  0.8%)   0.5871 (  3.0%)   0.5869 (  3.0%)  Unroll loops #2
   0.4132 (  2.2%)   0.0012 (  0.7%)   0.4145 (  2.1%)   0.4143 (  2.1%)  Loop Invariant Code Motion #3

Reviewers: efriedma, davide, mzolotukhin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D57033

llvm-svn: 352960
2019-02-02 15:26:05 +00:00
Florian Hahn 509b48a64a [LCSSA] Add expensive verification of LCSSA form for sub-loops.
This assertion makes sure all sub-loops are in LCSSA form before
bringing their parent in LCSSA form. This precondition was added to
formLCSSA in D56848.

Reviewers: davide, efriedma, mzolotukhin

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D56921

llvm-svn: 352958
2019-02-02 14:42:27 +00:00
Julian Lettner f82d8924ef [ASan] Do not instrument other runtime functions with `__asan_handle_no_return`
Summary:
Currently, ASan inserts a call to `__asan_handle_no_return` before every
`noreturn` function call/invoke. This is unnecessary for calls to other
runtime funtions. This patch changes ASan to skip instrumentation for
functions calls marked with `!nosanitize` metadata.

Reviewers: TODO

Differential Revision: https://reviews.llvm.org/D57489

llvm-svn: 352948
2019-02-02 02:05:16 +00:00
James Y Knight 291f791ef1 [opaque pointer types] Pass function type for CallBase::setCalledFunction.
Differential Revision: https://reviews.llvm.org/D57174

llvm-svn: 352914
2019-02-01 20:44:54 +00:00
James Y Knight 7716075a17 [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

llvm-svn: 352913
2019-02-01 20:44:47 +00:00
James Y Knight 14359ef1b6 [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
James Y Knight d9e85a0861 [opaque pointer types] Pass function types to InvokeInst creation.
This cleans up all InvokeInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57171

llvm-svn: 352910
2019-02-01 20:43:34 +00:00
James Y Knight 7976eb5838 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

llvm-svn: 352909
2019-02-01 20:43:25 +00:00
Michael Liao 8b323f53eb [InstCombine] Extra null-checking on TFE/LWE support
- If that operand is not ConstantInt, skip enabling TFE/LWE.

Differential Revision: https://reviews.llvm.org/D57539

llvm-svn: 352904
2019-02-01 19:53:44 +00:00
Sanjay Patel fbcbac7174 [InstCombine] reduce duplicate code; NFC
An unused variable problem was introduced with rL352870 
and stubbed out with rL352871, but we can make a better
fix by actually using the local variable in code rather 
than just the assert.  

llvm-svn: 352873
2019-02-01 14:37:49 +00:00
Fangrui Song 8495aabec2 [InstCombine] Fix -Wunused-variable when -DLLVM_ENABLE_ASSERTIONS=off
llvm-svn: 352871
2019-02-01 14:22:02 +00:00
Sanjay Patel be23a91fcd [InstCombine] try to reduce x86 addcarry to generic uaddo intrinsic
If we can reduce the x86-specific intrinsic to the generic op, it allows existing 
simplifications and value tracking folds. AFAICT, this always results in identical 
x86 codegen in the non-reduced case...which should be true because we semi-generically 
(too aggressively IMO) convert to llvm.uadd.with.overflow in CGP, so the DAG/isel must 
already combine/lower this intrinsic as expected.

This isn't quite what was requested in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but we want to have these kinds of folds early for efficiency and to enable greater 
simplifications. For the case in the bug report where we have:
_addcarry_u64(0, ahi, 0, &ahi)
...this gets completely simplified away in IR.

Differential Revision: https://reviews.llvm.org/D57453

llvm-svn: 352870
2019-02-01 14:14:47 +00:00
Yevgeny Rouban 15b17d0a7c Provide reason messages for unviable inlining
InlineCost's isInlineViable() is changed to return InlineResult
instead of bool. This provides messages for failure reasons and
allows to get more specific messages for cases where callsites
are not viable for inlining.

Reviewed By: xbolva00, anemet

Differential Revision: https://reviews.llvm.org/D57089

llvm-svn: 352849
2019-02-01 10:44:43 +00:00
Yevgeny Rouban 4cdd783955 [SLPVectorizer] Get rid of IndexQueue array from vectorizeStores. NFCI.
Indices are checked as they are generated. No need to fill the whole array of indices.

Differential Revision: https://reviews.llvm.org/D57144

llvm-svn: 352839
2019-02-01 06:44:08 +00:00
James Y Knight 13680223b9 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.

Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352827
2019-02-01 02:28:03 +00:00
Kostya Serebryany a78a44d480 [sanitizer-coverage] prune trace-cmp instrumentation for CMP isntructions that feed into the backedge branch. Instrumenting these CMP instructions is almost always useless (and harmful) for fuzzing
llvm-svn: 352818
2019-01-31 23:43:00 +00:00
James Y Knight fadf25068e Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it."
This reverts commit f47d6b38c7 (r352791).

Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.

llvm-svn: 352800
2019-01-31 21:51:58 +00:00
Alina Sbirlea e271889291 [EarlyCSE & MSSA] Cleanup special handling for removing MemoryAccesses.
Summary: Moving special handling to MemorySSAUpdater in D57199.

Reviewers: gberry, george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D57200

llvm-svn: 352794
2019-01-31 21:12:41 +00:00
James Y Knight f47d6b38c7 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352791
2019-01-31 20:35:56 +00:00
Craig Topper c1892ec15a [CallSite removal] Remove CallSite uses from InstCombine.
Reviewers: chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57494

llvm-svn: 352771
2019-01-31 17:23:29 +00:00
Teresa Johnson f59242e5ff Recommit "[ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader"
Recommit of r352763 with fix for use after free.

llvm-svn: 352770
2019-01-31 17:18:11 +00:00
Teresa Johnson 4877715ee6 Revert "[ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader"
This reverts commit r352763.

Causing a couple bot failures, root cause pointed to by sanitizer bot:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/28909/steps/annotate/logs/stdio

Use after free. I understand the issue but will revert and test with fix
before recommitting.

llvm-svn: 352768
2019-01-31 16:46:14 +00:00
Teresa Johnson 992b53fd16 [ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader
Summary:
COFF requires that COMDAT name match that of the leader. When we promote
and rename an internal leader in ThinLTO due to an import, ensure we
subsequently rename the associated COMDAT. Similar to D31963 which did
this during ThinLTO module splitting.

Fixes PR40414.

Reviewers: pcc, inglorion

Subscribers: mehdi_amini, dexonsmith, dmajor, llvm-commits

Differential Revision: https://reviews.llvm.org/D57395

llvm-svn: 352763
2019-01-31 16:00:15 +00:00
Max Kazantsev f392bc846f Default lowering for experimental.widenable.condition
Introduces a pass that provides default lowering strategy for the
`experimental.widenable.condition` intrinsic, replacing all its uses with
`i1 true`.

Differential Revision: https://reviews.llvm.org/D56096
Reviewed By: reames

llvm-svn: 352739
2019-01-31 09:10:17 +00:00
Dmitry Venikov 8817658836 [InstCombine] Missed optimization in math expression: simplify calls exp functions
Summary: This patch enables folding following expressions under -ffast-math flag: exp(X) * exp(Y) -> exp(X + Y), exp2(X) * exp2(Y) -> exp2(X + Y). Motivation: https://bugs.llvm.org/show_bug.cgi?id=35594

Reviewers: hfinkel, spatel, efriedma, lebedev.ri

Reviewed By: spatel, lebedev.ri

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D41342

llvm-svn: 352730
2019-01-31 06:28:10 +00:00
Erik Pilkington 600e9deacf Add a 'dynamic' parameter to the objectsize intrinsic
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.

rdar://32212419

Differential revision: https://reviews.llvm.org/D56761

llvm-svn: 352664
2019-01-30 20:34:35 +00:00
Philip Reames c71e996aed SimplifyDemandedVectorElts for all intrinsics
The point is that this simplifies integration of new intrinsics into SimplifiedDemandedVectorElts, and ensures we don't miss any existing ones.

This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output.  This is due to order of transforms w/in instcombine resulting in two slightly different fixed points.  That's something we should fix, but isn't a problem w/this patch per se.

Differential Revision: https://reviews.llvm.org/D57398

llvm-svn: 352653
2019-01-30 19:21:11 +00:00
Max Kazantsev 365021cc15 Properly use DT.verify in LoopSimplifyCFG
llvm-svn: 352621
2019-01-30 12:32:19 +00:00
Max Kazantsev 34eeeec3ae Enable IRCE for narrow latch by defailt
llvm-svn: 352619
2019-01-30 11:25:12 +00:00
Alina Sbirlea f9027e554a Check bool attribute value in getOptionalBoolLoopAttribute.
Summary:
Check the bool value of the attribute in getOptionalBoolLoopAttribute
not just its existance.
Eliminates the warning noise generated when vectorization is explicitly disabled.

Reviewers: Meinersbur, hfinkel, dmgreen

Subscribers: jlebar, sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D57260

llvm-svn: 352555
2019-01-29 22:33:20 +00:00
Sanjay Patel 18db56209c [InstCombine] canonicalize cmp/select form of uadd saturate with constant
I'm circling back around to a loose end from D51929.

The backend (either CGP or DAG) doesn't recognize this pattern, so we end up with different asm for these IR variants.

Regardless of any future changes to canonicalize to saturation/overflow intrinsics, we want to get raw IR variations 
into the minimal number of raw IR forms. If/when we can canonicalize to intrinsics, that will make that step easier.

  Pre: C2 == ~C1
  %a = add i32 %x, C1
  %c = icmp ugt i32 %x, C2
  %r = select i1 %c, i32 -1, i32 %a
  =>
  %a = add i32 %x, C1
  %c2 = icmp ult i32 %x, C2
  %r = select i1 %c2, i32 %a, i32 -1

  https://rise4fun.com/Alive/pkH

Differential Revision: https://reviews.llvm.org/D57352

llvm-svn: 352536
2019-01-29 20:02:45 +00:00
Bjorn Pettersson d014d576a9 [IPCP] Don't crash due to arg count/type mismatch between caller/callee
Summary:
This patch avoids an assert in IPConstantPropagation when
there is a argument count/type mismatch between the caller and
the callee.

While this is actually UB on C-level (clang emits a warning),
the IR verifier seems to accept it. I'm not sure what other
frontends/languages might think about this, so simply bailing out
to avoid hitting an assert (in CallSiteBase<>::getArgOperand or
Value::doRAUW) seems like a simple solution.

The problem is exposed by the fact that AbstractCallSites will look
through a bitcast at the callee position of a call/invoke.

Reviewers: jdoerfert, reames, efriedma

Reviewed By: jdoerfert, efriedma

Subscribers: eli.friedman, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D57052

llvm-svn: 352469
2019-01-29 10:19:44 +00:00
Philip Reames 6c5341bc5a Demanded elements support for vector GEPs
GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation, and (eventually) allowing further optimizations

Differential Revision: https://reviews.llvm.org/D57177

llvm-svn: 352440
2019-01-28 23:24:49 +00:00
Teresa Johnson 5b2f6a1bc2 [ThinLTO] Refine reachability check to fix compile time increase
Summary:
A recent fix to the ThinLTO whole program dead code elimination (D56117)
increased the thin link time on a large MSAN'ed binary by 2x.
It's likely that the time increased elsewhere, but was more noticeable
here since it was already large and ended up timing out.

That change made it so we would repeatedly scan all copies of linkonce
symbols for liveness every time they were encountered during the graph
traversal. This was needed since we only mark one copy of an aliasee as
live when we encounter a live alias. This patch fixes the issue in a
more efficient manner by simply proactively visiting the aliasee (thus
marking all copies live) when we encounter a live alias.

Two notes: One, this requires a hash table lookup (finding the aliasee
summary in the index based on aliasee GUID). However, the impact of this
seems to be small compared to the original pre-D56117 thin link time. It
could be addressed if we keep the aliasee ValueInfo in the alias summary
instead of the aliasee GUID, which I am exploring in a separate patch.

Second, we only populate the aliasee GUID field when reading summaries
from bitcode (whether we are reading individual summaries and merging on
the fly to form the compiled index, or reading in a serialized combined
index). Thankfully, that's currently the only way we can get to this
code as we don't yet support reading summaries from LLVM assembly
directly into a tool that performs the thin link (they must be converted
to bitcode first). I added a FIXME, however I have the fix under test
already. The easiest fix is to simply populate this field always, which
isn't hard, but more likely the change I am exploring to store the
ValueInfo instead as described above will subsume this. I don't want to
hold up the regression fix for this though.

Reviewers: trentxintong

Subscribers: mehdi_amini, inglorion, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D57203

llvm-svn: 352438
2019-01-28 22:27:05 +00:00
Vedant Kumar 1c3694a4d4 [CodeExtractor] Add support for the `swifterror` attribute
When passing a `swifterror` argument or alloca as an input to an
extraction region, mark the input parameter `swifterror`.

llvm-svn: 352408
2019-01-28 19:13:37 +00:00
Alina Sbirlea 932108703a [SimpleLoopUnswitch] Early check exit for trivial unswitch with MemorySSA.
Summary:
If MemorySSA is avaiable, we can skip checking all instructions if block has any Defs.
(volatile loads are also Defs).
We still need to check all instructions for "canThrow", even if no Defs are found.

Reviewers: chandlerc

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D57129

llvm-svn: 352393
2019-01-28 17:48:45 +00:00
Alina Sbirlea a34bcbf335 Revert rL352238.
llvm-svn: 352241
2019-01-25 21:12:08 +00:00
Alina Sbirlea 890a8e575f [WarnMissedTransforms] Set default to 1.
Summary:
Set default value for retrieved attributes to 1, since the check is against 1.
Eliminates the warning noise generated when the attributes are not present.

Reviewers: sanjoy

Subscribers: jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D57253

llvm-svn: 352238
2019-01-25 20:51:55 +00:00
Vedant Kumar db3f9774ee [HotColdSplit] Introduce a cost model to control splitting behavior
The main goal of the model is to avoid *increasing* function size, as
that would eradicate any memory locality benefits from splitting. This
happens when:

  - There are too many inputs or outputs to the cold region. Argument
    materialization and reloads of outputs have a cost.

  - The cold region has too many distinct exit blocks, causing a large
    switch to be formed in the caller.

  - The code size cost of the split code is less than the cost of a
    set-up call.

A secondary goal is to prevent excessive overall binary size growth.

With the cost model in place, I experimented to find a splitting
threshold that works well in practice. To make warm & cold code easily
separable for analysis purposes, I moved split functions to a "cold"
section. I experimented with thresholds between [0, 4] and set the
default to the threshold which minimized geomean __text size.

Experiment data from building LNT+externals for X86 (N = 639 programs,
all sizes in bytes):

| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os**       | 1736.3           | 0, n=0           | 10961.6        |
| -Os, thresh=0 | 1740.53          | 124.482, n=134   | 11014          |
| -Os, thresh=1 | 1734.79          | 57.8781, n=90    | 10978.6        |
| -Os, thresh=2 | ** 1733.85 **    | 65.6604, n=61    | 10977.6        |
| -Os, thresh=3 | 1733.85          | 65.3071, n=61    | 10977.6        |
| -Os, thresh=4 | 1735.08          | 67.5156, n=54    | 10965.7        |
| **-Oz**       | 1554.4           | 0, n=0           | 10153          |
| -Oz, thresh=2 | ** 1552.2 **     | 65.633, n=61     | 10176          |
| **-O3**       | 2563.37          | 0, n=0           | 13105.4        |
| -O3, thresh=2 | ** 2559.49 **    | 71.1072, n=61    | 13162.4        |

Picking thresh=2 reduces the geomean __text section size by 0.14% at
-Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that
TEXT size is page-aligned, whereas section sizes are byte-aligned.

Experiment data from building LNT+externals for ARM64 (N = 558 programs,
all sizes in bytes):

| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os**       | 1763.96          | 0, n=0           | 42934.9        |
| -Os, thresh=2 | ** 1760.9 **     | 76.6755, n=61    | 42934.9        |

Picking thresh=2 reduces the geomean __text section size by 0.17% at
-Os and causes no growth in the TEXT segment.

Measurements were done with D57082 (r352080) applied.

Differential Revision: https://reviews.llvm.org/D57125

llvm-svn: 352228
2019-01-25 18:30:37 +00:00
Max Kazantsev 38cd9acbb9 [LoopSimplifyCFG] Fix inconsistency in blocks in loop markup
2nd part of D57095 with the same reason, just in another place. We never
fold branches that are not immediately in the current loop, but this check
is missing in `IsEdgeLive` As result, it may think that the edge in subloop is
dead while it's live. It's a pessimization in the current stance.

Differential Revision: https://reviews.llvm.org/D57147
Reviewed By: rupprecht	

llvm-svn: 352170
2019-01-25 05:05:02 +00:00
Vedant Kumar 9d70f2b939 [HotColdSplit] Describe the pass in more detail, NFC
llvm-svn: 352161
2019-01-25 03:22:38 +00:00
Vedant Kumar 65de025d64 [HotColdSplit] Split more aggressively before/after cold invokes
While a cold invoke itself and its unwind destination can't be
extracted, code which unconditionally executes before/after the invoke
may still be profitable to extract.

With cost model changes from D57125 applied, this gives a 3.5% increase
in split text across LNT+externals on arm64 at -Os.

llvm-svn: 352160
2019-01-25 03:22:23 +00:00
Peter Collingbourne 1a8acfb768 hwasan: If we split the entry block, move static allocas back into the entry block.
Otherwise they are treated as dynamic allocas, which ends up increasing
code size significantly. This reduces size of Chromium base_unittests
by 2MB (6.7%).

Differential Revision: https://reviews.llvm.org/D57205

llvm-svn: 352152
2019-01-25 02:08:46 +00:00
Haojian Wu b9613a39b8 Fix a compiler error introduced in r352093.
llvm-svn: 352098
2019-01-24 20:30:48 +00:00
Alina Sbirlea 0a4367209c [LICM] Cleanup duplicated code. [NFCI]
llvm-svn: 352093
2019-01-24 19:57:30 +00:00
Alina Sbirlea 52f6e2a173 [MemorySSA +LICM CFHoist] Solve PR40317.
Summary:
MemorySSA needs updating each time an instruction is moved.
LICM and control flow hoisting re-hoists instructions, thus needing another update when re-moving those instructions.
Pending cleanup: the MSSA update is duplicated, should be moved inside moveInstructionBefore.

Reviewers: jnspaulsson

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D57176

llvm-svn: 352092
2019-01-24 19:48:35 +00:00
Vedant Kumar ef1ebed1c6 [HotColdSplit] Move splitting earlier in the pipeline
Performing splitting early has several advantages:

  - Inhibiting inlining of cold code early improves code size. Compared
    to scheduling splitting at the end of the pipeline, this cuts code
    size growth in half within the iOS shared cache (0.69% to 0.34%).

  - Inhibiting inlining of cold code improves compile time. There's no
    need to inline split cold functions, or to inline as much *within*
    those split functions as they are marked `minsize`.

  - During LTO, extra work is only done in the pre-link step. Less code
    must be inlined during cross-module inlining.

An additional motivation here is that the most common cold regions
identified by the static/conservative splitting heuristic can (a) be
found before inlining and (b) do not grow after inlining. E.g.
__assert_fail, os_log_error.

The disadvantages are:

  - Some opportunities for splitting out cold code may be missed. This
    gap can potentially be narrowed by adding a worklist algorithm to the
    splitting pass.

  - Some opportunities to reduce code size may be lost (e.g. store
    sinking, when one side of the CFG diamond is split). This does not
    outweigh the code size benefits of splitting earlier.

On net, splitting early in the pipeline has substantial code size
benefits, and no major effects on memory locality or performance. We
measured memory locality using ktrace data, and consistently found that
10% fewer pages were needed to capture 95% of text page faults in key
iOS benchmarks. We measured performance on frequency-stabilized iOS
devices using LNT+externals.

This reverses course on the decision made to schedule splitting late in
r344869 (D53437).

Differential Revision: https://reviews.llvm.org/D57082

llvm-svn: 352080
2019-01-24 18:55:49 +00:00
Julian Lettner b62e9dc46b Revert "[Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls"
This reverts commit cea84ab93a.

llvm-svn: 352069
2019-01-24 18:04:21 +00:00
Philip Reames 4d683ee7e3 [RS4GC] Be slightly less conservative for gep vector_base, scalar_idx
After submitting https://reviews.llvm.org/D57138, I realized it was slightly more conservative than needed. The scalar indices don't appear to be a problem on a vector gep, we even had a test for that.

Differential Revision: https://reviews.llvm.org/D57161

llvm-svn: 352061
2019-01-24 16:34:00 +00:00
Philip Reames a657510eb7 [RS4GC] Avoid crashing on gep scalar_base, vector_idx
This is an alternative to https://reviews.llvm.org/D57103.  After discussion, we dedicided to check this in as a temporary workaround, and pursue a true fix under the original thread.

The issue at hand is that the base rewriting algorithm doesn't consider the fact that GEPs can turn a scalar input into a vector of outputs. We had handling for scalar GEPs and fully vector GEPs (i.e. all vector operands), but not the scalar-base + vector-index forms. A true fix here requires treating GEP analogously to extractelement or shufflevector.

This patch is merely a workaround. It simply hides the crash at the cost of some ugly code gen for this presumable very rare pattern.

Differential Revision: https://reviews.llvm.org/D57138

llvm-svn: 352059
2019-01-24 16:08:18 +00:00
Florian Hahn bed7f9eab2 Revert "[HotColdSplitting] Get DT and PDT from the pass manager."
This reverts commit a6982414ed (llvm-svn: 352036),
because it causes a memory leak in the pass manager. Failing bot

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio

llvm-svn: 352041
2019-01-24 11:22:08 +00:00
Florian Hahn a6982414ed [HotColdSplitting] Get DT and PDT from the pass manager.
Instead of manually computing DT and PDT, we can get the from the pass
manager, which ideally has them already cached. With the new pass
manager, we could even preserve DT/PDT on a per function basis in a
module pass.

I think this also addresses the TODO about re-using the computed DTs for
BFI. IIUC, GetBFI will fetch the DT from the pass manager and when we
will fetch the cached version later.

Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D57092

llvm-svn: 352036
2019-01-24 09:44:52 +00:00
Max Kazantsev 56515a2c76 [LoopSimplifyCFG] Fix inconsistency in live blocks markup
When we choose whether or not we should mark block as dead, we have an
inconsistent logic in markup of live blocks.
- We take candidate IF its terminator branches on constant AND it is immediately
  in current loop;
- We mark successor live IF its terminator doesn't branch by constant OR it branches
  by constant and the successor is its always taken block.

What we are missing here is that when the terminator branches on a constant but is
not taken as a candidate because is it not immediately in the current loop, we will
mark only one (always taken) successor as live. Therefore, we do NOT do the actual
folding but may NOT mark one of the successors as live. So the result of markup is
wrong in this case, and we may then hit various asserts.

Thanks Jordan Rupprech for reporting this!

Differential Revision: https://reviews.llvm.org/D57095
Reviewed By: rupprecht

llvm-svn: 352024
2019-01-24 05:20:29 +00:00
Julian Lettner cea84ab93a [Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls
Summary:
UBSan wants to detect when unreachable code is actually reached, so it
adds instrumentation before every `unreachable` instruction. However,
the optimizer will remove code after calls to functions marked with
`noreturn`. To avoid this UBSan removes `noreturn` from both the call
instruction as well as from the function itself. Unfortunately, ASan
relies on this annotation to unpoison the stack by inserting calls to
`_asan_handle_no_return` before `noreturn` functions. This is important
for functions that do not return but access the the stack memory, e.g.,
unwinder functions *like* `longjmp` (`longjmp` itself is actually
"double-proofed" via its interceptor). The result is that when ASan and
UBSan are combined, the `noreturn` attributes are missing and ASan
cannot unpoison the stack, so it has false positives when stack
unwinding is used.

Changes:
  # UBSan now adds the `expect_noreturn` attribute whenever it removes
    the `noreturn` attribute from a function
  # ASan additionally checks for the presence of this attribute

Generated code:
```
call void @__asan_handle_no_return    // Additionally inserted to avoid false positives
call void @longjmp
call void @__asan_handle_no_return
call void @__ubsan_handle_builtin_unreachable
unreachable
```

The second call to `__asan_handle_no_return` is redundant. This will be
cleaned up in a follow-up patch.

rdar://problem/40723397

Reviewers: delcypher, eugenis

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D56624

llvm-svn: 352003
2019-01-24 01:06:19 +00:00
David Callahan d2eeb2516d Update entry count for cold calls
Summary:
Profile sample files include the number of times each entry or inlined
call site is sampled. This is translated into the entry count metadta
on functions.

When sample data is being read, if a call site that was inlined
in the sample program is considered cold and not inlined, then
the entry count of the out-of-line functions does not reflect
the current compilation.

In this patch, we note call sites where the function was not inlined
and as a last action of the sample profile loading, we update the
called function's entry count to reflect the calls from these
call sites which are not included in the profile file.

Reviewers: danielcdh, wmi, Kader, modocache

Reviewed By: wmi

Subscribers: davidxl, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D52845

llvm-svn: 352001
2019-01-24 00:55:23 +00:00
Mircea Trofin ec02630278 [llvm] Clarify responsiblity of some of DILocation discriminator APIs
Summary:
Renamed setBaseDiscriminator to cloneWithBaseDiscriminator, to match
similar APIs. Also changed its behavior to copy over the other
discriminator components, instead of eliding them.

Renamed cloneWithDuplicationFactor to
cloneByMultiplyingDuplicationFactor, which more closely matches what
this API does.

Reviewers: dblaikie, wmi

Reviewed By: dblaikie

Subscribers: zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D56220

llvm-svn: 351996
2019-01-24 00:10:25 +00:00
Hideki Saito 4e4ecae028 [LV][VPlan] Change to implement VPlan based predication for
VPlan-native path

Context: Patch Series #2 for outer loop vectorization support in LV
using VPlan. (RFC:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).

Patch series #2 checks that inner loops are still trivially lock-step
among all vector elements. Non-loop branches are blindly assumed as
divergent.

Changes here implement VPlan based predication algorithm to compute
predicates for blocks that need predication. Predicates are computed
for the VPLoop region in reverse post order. A block's predicate is
computed as OR of the masks of all incoming edges. The mask for an
incoming edge is computed as AND of predecessor block's predicate and
either predecessor's Condition bit or NOT(Condition bit) depending on
whether the edge from predecessor block to the current block is true
or false edge.

Reviewers: fhahn, rengolin, hsaito, dcaballe

Reviewed By: fhahn

Patch by Satish Guggilla, thanks!

Differential Revision: https://reviews.llvm.org/D53349

llvm-svn: 351990
2019-01-23 22:43:12 +00:00
Peter Collingbourne 020ce3f026 hwasan: Read shadow address from ifunc if we don't need a frame record.
This saves a cbz+cold call in the interceptor ABI, as well as a realign
in both ABIs, trading off a dcache entry against some branch predictor
entries and some code size.

Unfortunately the functionality is hidden behind a flag because ifunc is
known to be broken on static binaries on Android.

Differential Revision: https://reviews.llvm.org/D57084

llvm-svn: 351989
2019-01-23 22:39:11 +00:00
Florian Hahn 68cea130df [HotColdSplitting] Remove unused SSAUpdater.h include (NFC).
llvm-svn: 351945
2019-01-23 11:51:38 +00:00
Max Kazantsev d9aee3c0d1 [IRCE] Support narrow latch condition for wide range checks
This patch relaxes restrictions on types of latch condition and range check.
In current implementation, they should match. This patch allows to handle
wide range checks against narrow condition. The motivating example is the
following:

  int N = ...
  for (long i = 0; (int) i < N; i++) {
    if (i >= length) deopt;
  }

In this patch, the option that enables this support is turned off by
default. We'll wait until it is switched to true.

Differential Revision: https://reviews.llvm.org/D56837
Reviewed By: reames

llvm-svn: 351926
2019-01-23 07:20:56 +00:00
Peter Collingbourne 73078ecd38 hwasan: Move memory access checks into small outlined functions on aarch64.
Each hwasan check requires emitting a small piece of code like this:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses

The problem with this is that these code blocks typically bloat code
size significantly.

An obvious solution is to outline these blocks of code. In fact, this
has already been implemented under the -hwasan-instrument-with-calls
flag. However, as currently implemented this has a number of problems:
- The functions use the same calling convention as regular C functions.
  This means that the backend must spill all temporary registers as
  required by the platform's C calling convention, even though the
  check only needs two registers on the hot path.
- The functions take the address to be checked in a fixed register,
  which increases register pressure.
Both of these factors can diminish the code size effect and increase
the performance hit of -hwasan-instrument-with-calls.

The solution that this patch implements is to involve the aarch64
backend in outlining the checks. An intrinsic and pseudo-instruction
are created to represent a hwasan check. The pseudo-instruction
is register allocated like any other instruction, and we allow the
register allocator to select almost any register for the address to
check. A particular combination of (register selection, type of check)
triggers the creation in the backend of a function to handle the check
for specifically that pair. The resulting functions are deduplicated by
the linker. The pseudo-instruction (really the function) is specified
to preserve all registers except for the registers that the AAPCS
specifies may be clobbered by a call.

To measure the code size and performance effect of this change, I
took a number of measurements using Chromium for Android on aarch64,
comparing a browser with inlined checks (the baseline) against a
browser with outlined checks.

Code size: Size of .text decreases from 243897420 to 171619972 bytes,
or a 30% decrease.

Performance: Using Chromium's blink_perf.layout microbenchmarks I
measured a median performance regression of 6.24%.

The fact that a perf/size tradeoff is evident here suggests that
we might want to make the new behaviour conditional on -Os/-Oz.
But for now I've enabled it unconditionally, my reasoning being that
hwasan users typically expect a relatively large perf hit, and ~6%
isn't really adding much. We may want to revisit this decision in
the future, though.

I also tried experimenting with varying the number of registers
selectable by the hwasan check pseudo-instruction (which would result
in fewer variants being created), on the hypothesis that creating
fewer variants of the function would expose another perf/size tradeoff
by reducing icache pressure from the check functions at the cost of
register pressure. Although I did observe a code size increase with
fewer registers, I did not observe a strong correlation between the
number of registers and the performance of the resulting browser on the
microbenchmarks, so I conclude that we might as well use ~all registers
to get the maximum code size improvement. My results are below:

Regs | .text size | Perf hit
-----+------------+---------
~all | 171619972  | 6.24%
  16 | 171765192  | 7.03%
   8 | 172917788  | 5.82%
   4 | 177054016  | 6.89%

Differential Revision: https://reviews.llvm.org/D56954

llvm-svn: 351920
2019-01-23 02:20:10 +00:00
Vedant Kumar cde65c0fac [HotColdSplit] Calculate BFI lazily to reduce compile-time, NFC
The splitting pass does not need BFI unless the Module actually has a profile
summary. Do not calcualte BFI unless the summary is present.

For the sqlite3 amalgamation, this reduces time spent in the splitting pass
from 0.4% of the total to under 0.1%.

llvm-svn: 351894
2019-01-22 22:49:22 +00:00
Vedant Kumar 38874c8f7b [HotColdSplit] Calculate domtrees lazily to reduce compile-time, NFC
The splitting pass does not need (post)domtrees until after it's found a
cold block. Defer domtree calculation until a cold block is found.

For the sqlite3 amalgamation, this reduces time spent in the splitting
pass from 0.8% of the total to 0.4%.

llvm-svn: 351892
2019-01-22 22:49:08 +00:00
Jordan Rupprecht 0efe27e62b Revert r351520, "Re-enable terminator folding in LoopSimplifyCFG"
This is still causing compilation crashes in some targets. Will follow up shortly with a repro.

llvm-svn: 351845
2019-01-22 17:39:02 +00:00
Max Kazantsev feb475f4cf [LoopPredication] Support guards expressed as branches by widenable condition
This patch adds support of guards expressed as branches by widenable
conditions in Loop Predication.

Differential Revision: https://reviews.llvm.org/D56081
Reviewed By: reames

llvm-svn: 351805
2019-01-22 11:49:06 +00:00
Max Kazantsev ca45087826 [NFC] Factor out some reusable logic
llvm-svn: 351794
2019-01-22 10:13:36 +00:00
Philip Reames 390c0e2f72 [CVP] Use LVI to constant fold deopt operands
Deopt operands are generally intended to record information about a site in code with minimal perturbation of the surrounding code. Idiomatically, they also tend to appear down rare paths. Putting these together, we have an obvious case for extending CVP w/deopt operand constant folding. Arguably, we should be doing this for all operands on all instructions, but that's definitely a much larger and risky change.

Differential Revision: https://reviews.llvm.org/D55678

llvm-svn: 351774
2019-01-22 01:34:33 +00:00
Serge Guelton be88539b85 Replace llvm::isPodLike<...> by llvm::is_trivially_copyable<...>
As noted in https://bugs.llvm.org/show_bug.cgi?id=36651, the specialization for
isPodLike<std::pair<...>> did not match the expectation of
std::is_trivially_copyable which makes the memcpy optimization invalid.

This patch renames the llvm::isPodLike trait into llvm::is_trivially_copyable.
Unfortunately std::is_trivially_copyable is not portable across compiler / STL
versions. So a portable version is provided too.

Note that the following specialization were invalid:

    std::pair<T0, T1>
    llvm::Optional<T>

Tests have been added to assert that former specialization are respected by the
standard usage of llvm::is_trivially_copyable, and that when a decent version
of std::is_trivially_copyable is available, llvm::is_trivially_copyable is
compared to std::is_trivially_copyable.

As of this patch, llvm::Optional is no longer considered trivially copyable,
even if T is. This is to be fixed in a later patch, as it has impact on a
long-running bug (see r347004)

Note that GCC warns about this UB, but this got silented by https://reviews.llvm.org/D50296.

Differential Revision: https://reviews.llvm.org/D54472

llvm-svn: 351701
2019-01-20 21:19:56 +00:00
Simon Pilgrim e1143c1322 [X86] Auto upgrade VPCOM/VPCOMU intrinsics to generic integer comparisons
This causes a couple of changes in the upgrade tests as signed/unsigned eq/ne are equivalent and we constant fold true/false codes, these changes are the same as what we already do for avx512 cmp/ucmp.

Noticed while cleaning up vector integer comparison costs for PR40376.

llvm-svn: 351697
2019-01-20 19:27:40 +00:00
Vedant Kumar 857cacd9de [ConstantMerge] Factor out check for un-mergeable globals, NFC
llvm-svn: 351671
2019-01-20 02:44:43 +00:00
Nikita Popov 6515db205a [InstCombine] Simplify cttz/ctlz + icmp ugt/ult
Followup to D55745, this time handling comparisons with ugt and ult
predicates (which are the canonical forms for non-equality predicates).

For ctlz we can convert into a simple icmp, for cttz we can convert
into a mask check.

Differential Revision: https://reviews.llvm.org/D56355

llvm-svn: 351645
2019-01-19 09:56:01 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Johannes Doerfert 36872b5db9 Enable IPConstantPropagation to work with abstract call sites
This modification of the currently unused inter-procedural constant
propagation pass (IPConstantPropagation) shows how abstract call sites
enable optimization of callback calls alongside direct and indirect
calls. Through minimal changes, mostly dealing with the partial mapping
of callbacks, inter-procedural constant propagation was enabled for
callbacks, e.g., OpenMP runtime calls or pthreads_create.

Differential Revision: https://reviews.llvm.org/D56447

llvm-svn: 351628
2019-01-19 05:19:12 +00:00
Vedant Kumar b537b946b8 [MergeFunc] Allow merging identical vararg functions using aliases
Thanks to Nikita Popov for pointing out this missed case.

This is a follow-up to r351411, which disabled function merging for
vararg functions outright due to a miscompile (see llvm.org/PR40345).

Differential Revision: https://reviews.llvm.org/D56865

llvm-svn: 351624
2019-01-19 02:46:22 +00:00
Vedant Kumar b755a2df51 [HotColdSplit] Mark inherently cold functions as such
If an inherently cold function is found, mark it as cold. For now this
means applying the `cold` and `minsize` attributes.

As a drive-by, revisit and clean up the criteria for considering a
function for splitting. Add tests.

llvm-svn: 351623
2019-01-19 02:38:47 +00:00
Vedant Kumar 4de1962bb9 [HotColdSplit] Remove a set which tracked split functions (NFC)
Use the begin/end iterator idiom to avoid visiting split functions,
instead of doing a set lookup.

llvm-svn: 351622
2019-01-19 02:38:17 +00:00
Vedant Kumar 17d9f14bff [CodeExtractor] Emit lifetime markers around reloads of outputs
CodeExtractor permits extracting a region of blocks from a function even
when values defined within the region are used outside of it.

This is typically done by creating an alloca in the original function
and reloading the alloca after a call to the extracted function.

Wrap the reload in lifetime start/end markers to promote stack coloring.

Suggested by Sergei Kachkov!

Differential Revision: https://reviews.llvm.org/D56045

llvm-svn: 351621
2019-01-19 02:37:59 +00:00
Florian Hahn be7cbe3f70 [LCSSA] Skip blocks in sub-loops when scanning for uses.
Summary:
Scanning blocks in sub-loops for uses is unnecessary, as they were
already handled while dealing with the containing sub-loop.

This speeds up LCSSA for highly nested loops. For the test case in PR37202, it
halves the time spent in LCSSA. In cases were we won't be able to skip
any blocks, the additional lookup should be negligible.

Time-passes without this patch for test case from PR37202:

  Total Execution Time: 48.5505 seconds (48.5511 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  10.0822 ( 21.0%)   0.1406 ( 27.0%)  10.2228 ( 21.1%)  10.2228 ( 21.1%)  Loop-Closed SSA Form Pass
  10.0417 ( 20.9%)   0.1467 ( 28.2%)  10.1884 ( 21.0%)  10.1890 ( 21.0%)  Loop-Closed SSA Form Pass #2
   4.2703 (  8.9%)   0.0040 (  0.8%)   4.2742 (  8.8%)   4.2742 (  8.8%)  Unswitch loops
   2.7376 (  5.7%)   0.0229 (  4.4%)   2.7605 (  5.7%)   2.7611 (  5.7%)  Loop-Closed SSA Form Pass #5
   2.7332 (  5.7%)   0.0214 (  4.1%)   2.7546 (  5.7%)   2.7546 (  5.7%)  Loop-Closed SSA Form Pass #3
   2.7088 (  5.6%)   0.0230 (  4.4%)   2.7319 (  5.6%)   2.7324 (  5.6%)  Loop-Closed SSA Form Pass #4
   2.6855 (  5.6%)   0.0236 (  4.5%)   2.7091 (  5.6%)   2.7090 (  5.6%)  Loop-Closed SSA Form Pass #6
   2.1648 (  4.5%)   0.0018 (  0.4%)   2.1666 (  4.5%)   2.1664 (  4.5%)  Unroll loops
   1.8371 (  3.8%)   0.0009 (  0.2%)   1.8379 (  3.8%)   1.8380 (  3.8%)  Value Propagation
   1.8149 (  3.8%)   0.0021 (  0.4%)   1.8170 (  3.7%)   1.8169 (  3.7%)  Loop Invariant Code Motion
   1.6755 (  3.5%)   0.0226 (  4.3%)   1.6981 (  3.5%)   1.6980 (  3.5%)  Loop-Closed SSA Form Pass #7

Time-passes with this patch

  Total Execution Time: 29.9285 seconds (29.9276 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
   4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
   4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
   2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
   2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
   1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
   1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
   1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
   1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
   1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
   1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3

Reviewers: davide, efriedma, mzolotukhin

Reviewed By: davide, efriedma

Subscribers: hiraditya, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56848

llvm-svn: 351567
2019-01-18 17:36:22 +00:00
Max Kazantsev b4cd50be56 Re-enable terminator folding in LoopSimplifyCFG: underlying bugs fixed
llvm-svn: 351520
2019-01-18 04:57:32 +00:00
Vedant Kumar f529b50702 [HotColdSplit] Allow outlining with live outputs
Prior to r348205, extracting code regions with live output values was
disabled because of a miscompilation (PR39433). Lift the restriction as
PR39433 has been addressed.

Tested on LNT+externals, on a run of check-llvm in a stage2 build, and
with a full build of iOS (with hot/cold splitting enabled).

As a drive-by, remove an errant TODO.

llvm-svn: 351492
2019-01-17 22:36:05 +00:00
Vedant Kumar b70e20db62 [HotColdSplit] Consider resume instructions to be cold
Resuming exception unwinding is roughly as unlikely as throwing an
exception.

Tested on LNT+externals (in particular, the C++ EH regression tests
provide end-to-end test coverage), as well as with a full build of iOS.

llvm-svn: 351491
2019-01-17 22:35:47 +00:00
Vedant Kumar 4541be0686 [HotColdSplit] Relax requirement that the cold sink block be extractable
Relaxing this requirement creates opportunities to split code dominated
by an EH pad.

Tested on LNT+externals.

llvm-svn: 351483
2019-01-17 21:42:36 +00:00
Vedant Kumar 32a014d048 [HotColdSplit] Simplify tests by lowering their splitting thresholds
This gets rid of the brittle/mysterious calls to @sink()/@sideeffect()
peppered throughout the test cases. They are no longer needed to force
splitting to occur.

llvm-svn: 351480
2019-01-17 21:29:34 +00:00
Wei Mi 3bcccdfe38 [SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink
If the sample profile has no inlining hierachy information included, we call
the sample profile is flattened. For flattened profile, in ThinLTO postlink
phase, SampleProfileLoader's hot function inlining and profile annotation will
do nothing, so it is better to save the effort to read in the profile and run
the sample profile loader pass. It is helpful for reducing compile time when
the flattened profile is huge.

Differential Revision: https://reviews.llvm.org/D54819

llvm-svn: 351476
2019-01-17 20:48:34 +00:00
Reid Kleckner edd653bc07 [InstCombine] Don't sink dynamic allocas
Summary:
InstCombine's sinking algorithm only thinks about memory. It doesn't
think about non-memory constraints like stack object lifetime. It can
sink dynamic allocas across a stacksave call, which may be used with
stackrestore, which can incorrectly reduce the lifetime of the dynamic
alloca.

Fixes PR40365

Reviewers: hfinkel, efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56872

llvm-svn: 351475
2019-01-17 20:46:53 +00:00
Teresa Johnson 8d86f1ba47 Revert "[ThinLTO] Add summary entries for index-based WPD"
Mistaken commit of something still under review!

This reverts commit r351453.

llvm-svn: 351455
2019-01-17 16:05:04 +00:00
Teresa Johnson 4fcf3b1621 [ThinLTO] Add summary entries for index-based WPD
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).

Also added are the necessary bitcode records, and the corresponding
assembly support.

The index-based WPD will be sent as a follow-on.

Depends on D53890.

Reviewers: pcc

Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54815

llvm-svn: 351453
2019-01-17 15:49:03 +00:00
Max Kazantsev 61a8d3fb33 [LoopSimplifyCFG] Form LCSSA when a parent loop becomes a sibling
During the transforms in LoopSimplifyCFG, when we remove a dead exiting edge, the
parent loop may stop being reachable from the child loop, and therefore they become
siblings. If the former child loop had uses of some values from its former parent loop,
now such uses will require LCSSA Phis, even if they weren't needed before. So we must
form LCSSA for all loops that stopped being ancestors of the current loop in this case.

Differential Revision: https://reviews.llvm.org/D56144
Reviewed By: fedor.sergeev

llvm-svn: 351434
2019-01-17 12:51:10 +00:00
Max Kazantsev 8b134169f5 [LoopSimplifyCFG] Fix order of deletion of complex dead subloops
Function `DeleteDeadBlock` requires that all predecessors of a block
being deleted have already been deleted, with the exception of a
single-block loop. When we use it for removal of dead subloops that
contain more than one block, we may not fulfull this requirement and
fail an assertion.

This patch replaces invocation of `DeleteDeadBlock` with a generalized
version `DeleteDeadBlocks` that is able to deal with multiple dead blocks,
even if they contain some cycles.

Differential Revision: https://reviews.llvm.org/D56121
Reviewed By: fedor.sergeev

llvm-svn: 351433
2019-01-17 12:25:40 +00:00
Max Kazantsev ee61308595 [NFC] Factor out some local vars
llvm-svn: 351416
2019-01-17 06:20:42 +00:00
Vedant Kumar a9906c1e5e [MergeFunc] Prevent silent miscompile of vararg functions
The function merging pass miscompiles identical vararg functions. The
forwarding thunk it emits doesn't forward the full variable-length list
of arguments. Disable merging for vararg functions for now.

I've filed llvm.org/PR40345 to track the issue.

rdar://47326238

llvm-svn: 351411
2019-01-17 02:15:05 +00:00
Vedant Kumar e21ab22115 [FunctionComparator] Consider tail call kinds
Essentially, do not treat `call` and `musttail call` as the same thing.

As a drive-by, fold CallInst and InvokeInst handling together using the
CallSite helper.

Differential Revision: https://reviews.llvm.org/D56815

llvm-svn: 351405
2019-01-17 00:29:14 +00:00
Wei Mi 79c4408aa2 Fix a mistake in rL351392.
PGOInstrGen should be initialized to "" instead of false.

llvm-svn: 351397
2019-01-16 23:31:40 +00:00
Wei Mi c876e3d42b [PGO] Make pgo related options in opt more consistent.
Currently we have pgo options defined in PassManagerBuilder.cpp only for
instrument pgo, but not for sample pgo. We also have pgo options defined
in NewPMDriver.cpp in opt only for new pass manager and for all kinds of
pgo. They have some inconsistency.

To make the options more consistent and make tests writing easier, the
patch let old pass manager to share the same pgo options with new pass
manager in opt, and removes the options in PassManagerBuilder.cpp.

Differential Revision: https://reviews.llvm.org/D56749

llvm-svn: 351392
2019-01-16 23:19:02 +00:00
Alexey Bataev 18809a6bbb [SLP] Fix PR40310: The reduction nodes should stay scalar.
Summary:
Sometimes the SLP vectorizer tries to vectorize the horizontal reduction
nodes during regular vectorization. This may happen inside of the loops,
when there are some vectorizable PHIs. Patch fixes this by checking if
the node is the reduction node and thus it must not be vectorized, it must
be gathered.

Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56783

llvm-svn: 351349
2019-01-16 15:39:52 +00:00
Gabor Buella 3ec170c85a Assertion in isAllocaPromotable due to extra bitcast goes into lifetime marker
For the given test SROA detects possible replacement and creates a correct alloca. After that SROA is adding lifetime markers for this new alloca. The function getNewAllocaSlicePtr is trying to deduce the pointer type based on the original alloca, which is split, to use it later in lifetime intrinsic.

For the test we ended up with such code (rA is initial alloca [10 x float], which is split, and rA.sroa.0.0 is a new split allocation)

```
%rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]*    <----- this one causing the assertion and is an extra bitcast
%5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8*
call void @llvm.lifetime.start.p0i8(i64 4, i8* %5)
```

isAllocaPromotable code assumes that a user of alloca may go into lifetime marker through bitcast but it must be the only one bitcast to i8* type. In the test it's not a i8* type, return false and throw the assertion.

As we are creating a pointer, which will be used in lifetime markers only, the proposed fix is to create a bitcast to i8* immediately to avoid extra bitcast creation.

The test is a greatly simplified to just reproduce the assertion.

Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>

Reviewers: chandlerc, craig.topper

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D55934

llvm-svn: 351325
2019-01-16 12:06:17 +00:00
Philip Pfaffe 81101de585 [MSan] Apply the ctor creation scheme of TSan
Summary: To avoid adding an extern function to the global ctors list, apply the changes of D56538 also to MSan.

Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Differential Revision: https://reviews.llvm.org/D56734

llvm-svn: 351322
2019-01-16 11:14:07 +00:00
Philip Pfaffe 685c76d7a3 [NewPM][TSan] Reiterate the TSan port
Summary:
Second iteration of D56433 which got reverted in rL350719. The problem
in the previous version was that we dropped the thunk calling the tsan init
function. The new version keeps the thunk which should appease dyld, but is not
actually OK wrt. the current semantics of function passes. Hence, add a
helper to insert the functions only on the first time. The helper
allows hooking into the insertion to be able to append them to the
global ctors list.

Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Differential Revision: https://reviews.llvm.org/D56538

llvm-svn: 351314
2019-01-16 09:28:01 +00:00
Tom Stellard 3d36e5c3e6 Only promote args when function attributes are compatible
Summary:
Check to make sure that the caller and the callee have compatible
function arguments before promoting arguments.  This uses the same
TargetTransformInfo queries that are used to determine if attributes
are compatible for inlining.

The goal here is to avoid breaking ABI when a called function's ABI
depends on a target feature that is not enabled in the caller.

This is a very conservative fix for PR37358.  Ideally we would have a more
sophisticated check for ABI compatiblity rather than checking if the
attributes are compatible for inlining.

Reviewers: echristo, chandlerc, eli.friedman, craig.topper

Reviewed By: echristo, chandlerc

Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits

Differential Revision: https://reviews.llvm.org/D53554

llvm-svn: 351296
2019-01-16 05:15:31 +00:00
Serguei Katkov a5b0e5585b [InstCombine]Avoid introduction of unaligned mem access
InstCombine is able to transform mem transfer instrinsic to alone store or store/load pair.
It might result in generation of unaligned atomic load/store which later in backend
will be transformed to libcall. It is not an evident gain and it is better to keep intrinsic as is
and handle it at backend.

Reviewers: reames, anna, apilipenko, mkazantsev
Reviewed By: reames
Subscribers: t.p.northover, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D56582

llvm-svn: 351295
2019-01-16 04:36:26 +00:00
David Callahan d129d3e93f treat invoke like call
Summary:
InvokeInst should be treated like CallInst and
assigned a separate discriminator. This is particularly
import when an Invoke is converted to a Call
during compilation and so can invalidate sample profile
data collected wtih different link time optimizations

Reviewers: twoh, Kader, danielcdh, wmi

Reviewed By: wmi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56491

llvm-svn: 351251
2019-01-15 21:26:51 +00:00
Matt Morehouse 19ff35c481 [SanitizerCoverage] Don't create comdat for interposable functions.
Summary:
Comdat groups override weak symbol behavior, allowing the linker to keep
the comdats for weak symbols in favor of comdats for strong symbols.

Fixes the issue described in:
https://bugs.chromium.org/p/chromium/issues/detail?id=918662

Reviewers: eugenis, pcc, rnk

Reviewed By: pcc, rnk

Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56516

llvm-svn: 351247
2019-01-15 21:21:01 +00:00
David Callahan dee001207e We can improve the performance (generally) by memo-izing the action to map a debug location to its function summary.
Summary:
Here are timings (as reported by "opt -time-passes") for
sample-profile pass for some files holding hot functions from a major
service©r. Average 17% reduction. Delta column is 100*(old-new)/old.

```
Old    New    Delta
0.0537 0.0538 -0.2%
0.8155 0.6522 20.0%
0.0779 0.0751  3.6%
0.0727 0.0913 -25.6%
0.1622 0.1302 19.7%
0.0627 0.0594  5.3%
0.0766 0.0744  2.9%
0.6426 0.4387 31.7%
0.3521 0.2776 21.2%
0.3549 0.2721 23.3%
0.0912 0.0904  0.9%
0.1236 0.1059 14.3%
0.0854 0.0866 -1.4%
0.0757 0.0722  4.6%
0.1293 0.1147 11.3%
0.1354 0.1122 17.1%
0.0767 0.0770 -0.4%
0.1135 0.0968 14.7%
0.0524 0.0608 -16.0%
0.1279 0.1106 13.5%
==========
3.6820 3.0520 17.1% Total
```

Reviewers: twoh, Kader, danielcdh, wmi

Reviewed By: wmi

Subscribers: dblaikie, llvm-commits

Differential Revision: https://reviews.llvm.org/D56435

llvm-svn: 351211
2019-01-15 17:45:54 +00:00
Zaara Syeda b7dff9c9af [SimpleLoopUnswitch] Increment stats counter for unswitching switch instruction
Increment statistics counter NumSwitches at unswitchNontrivialInvariants() for
unswitching a non-trivial switch instruction. This is to fix a bug that it
increments NumBranches even for the case of switch instruction.
There is no functional change in this patch.

Differential Revision: https://reviews.llvm.org/D56408

llvm-svn: 351193
2019-01-15 15:08:01 +00:00
Florian Hahn 4094f34f78 [InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs.
Otherwise instcombine gets stuck in a cycle. The canonicalization was
added in D55961.

This patch fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12400

llvm-svn: 351187
2019-01-15 11:18:21 +00:00
Max Kazantsev f8a0e0ddf0 [NFC] Remove some code duplication
llvm-svn: 351185
2019-01-15 11:16:14 +00:00
Max Kazantsev 80242ee87e [NFC] Remove obsolete enum RangeCheckKind
llvm-svn: 351183
2019-01-15 10:48:45 +00:00
Max Kazantsev 78a5435284 [NFC] Decrease if nest
llvm-svn: 351180
2019-01-15 10:01:46 +00:00
Max Kazantsev a78dc4d6c8 [NFC] Move some functions to LoopUtils
llvm-svn: 351179
2019-01-15 09:51:34 +00:00
Marek Olsak 33eb4d947d AMDGPU: Add a fast path for icmp.i1(src, false, NE)
Summary:
This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.

And we can also get the mask from and i1, or i1, xor i1.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52060

llvm-svn: 351150
2019-01-15 02:13:18 +00:00
Jonathan Metzman e159a0dd1a [SanitizerCoverage][NFC] Use appendToUsed instead of include
Summary:
Use appendToUsed instead of include to ensure that
SanitizerCoverage's constructors are not stripped.

Also, use isOSBinFormatCOFF() to determine if target
binary format is COFF.

Reviewers: pcc

Reviewed By: pcc

Subscribers: hiraditya

Differential Revision: https://reviews.llvm.org/D56369

llvm-svn: 351118
2019-01-14 21:02:02 +00:00
David Callahan 957795973b Ignore PhiNodes when mapping sample profile data
Summary: Like branch instructions, phi nodes frequently do not have debug information related to the block they are in and so they should be ignored.

Reviewers: danielcdh, twoh, Kader, wmi

Reviewed By: wmi

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D55094

llvm-svn: 351102
2019-01-14 19:05:59 +00:00
David Callahan b1853a6a94 Revert "Merge branch 'arcpatch-D55094'"
This reverts commit a9788dd6587d67c856df74eedff5a6ad34ce8320, reversing
changes made to f1309ffebf718d16aec4fab83380556c660e2825.

unintended merge pushed

llvm-svn: 351095
2019-01-14 18:49:27 +00:00
David Callahan 1b46231764 Merge branch 'arcpatch-D55094'
llvm-svn: 351092
2019-01-14 18:35:43 +00:00
Tom Stellard d0a7676087 cmake: Don't install plugins used for examples or tests
Summary:
This patch drops install targets for LLVMHello.so,
TestPlugin.so, and BugpointPasses.so.

Reviewers: chandlerc, beanz, thakis, philip.pfaffe

Reviewed By: chandlerc

Subscribers: SquallATF, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D55965

llvm-svn: 351087
2019-01-14 18:25:35 +00:00
Jeremy Morse f216da7ee0 [DebugInfo] Remove un-necessary logic from HoistThenElseCodeToIf
Following PR39807, the way in which SimplifyCFG hoists common code on
branch paths was fixed in r347782. However this left extra code hanging
around HoistThenElseCodeToIf that wasn't necessary and needlessly
complicated matters -- we no longer need to look up through the 'if'
basic block to find a location for hoisted 'select' insts, we can instead
use the location chosen by applyMergedLocation.

This patch deletes that extra logic, and updates a regression test to
reflect the new logic (selects get the merged location, not a previous
insts location).

Differential Revision: https://reviews.llvm.org/D55272

llvm-svn: 351058
2019-01-14 12:13:12 +00:00
David Stuttard f77079f892 [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda


Work around for ppcle compiler bug

Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
2019-01-14 11:55:24 +00:00
Max Kazantsev 1f73310e1e [BasicBlockUtils] Generalize DeleteDeadBlock to deal with multiple dead blocks
Utility function `DeleteDeadBlock` expects that all predecessors of a block being
deleted are already deleted, with the exception of single-block loop. It makes it
hard to use for deletion of a set of blocks that may contain cyclic dependencies.
The is no correct order of invocations of this function that does not produce
dangling pointers on already deleted blocks.

This patch introduces a generalized version of this function `DeleteDeadBlocks`
that allows us to remove multiple blocks at once, even if there are cycles among
them. The only requirement is that no block being deleted should have a predecessor
that is not being deleted. 

The logic of `DeleteDeadBlocks` is following:
  for each block
    create relevant DT updates;
    remove all instructions (replace with undef if needed);
    replace terminator with unreacheable;
  apply DT updates;
  for each block
    delete block;

Therefore, `DeleteDeadBlock` becomes a particular case of
the general algorithm called for a single block.

Differential Revision: https://reviews.llvm.org/D56120
Reviewed By: skatkov

llvm-svn: 351045
2019-01-14 10:26:26 +00:00
Benjamin Kramer b17d2136ea Give helper classes/functions local linkage. NFC.
llvm-svn: 351016
2019-01-12 18:36:22 +00:00
Sanjay Patel 7d65fe5cd5 [LoopVectorizer] give more advice in remark about failure to vectorize call
Something like this is requested by:
https://bugs.llvm.org/show_bug.cgi?id=40265
...and it seems like a common enough case that we should acknowledge it.

Differential Revision: https://reviews.llvm.org/D56551

llvm-svn: 351010
2019-01-12 15:27:15 +00:00
Teresa Johnson 290a839891 [LTO] Record whether LTOUnit splitting is enabled in index
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).

The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.

This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.

Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.

Reviewers: pcc

Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D53890

llvm-svn: 350948
2019-01-11 18:31:57 +00:00
Vedant Kumar ee10ef737e [MergeFunc] Erase unused duplicate functions if they are discardable
MergeFunc only deletes unused duplicate functions if they have local
linkage, but it should be safe to relax this to any "discardable if
unused" linkage type.

Differential Revision: https://reviews.llvm.org/D56574

llvm-svn: 350939
2019-01-11 17:56:35 +00:00
Vedant Kumar 08fe7e02fb [MergeFunc] Use Instruction::getFunction as a cleanup, NFC
llvm-svn: 350938
2019-01-11 17:56:21 +00:00
Ehsan Amiri f452f116d2 [Jump Threading] Unfold a select insn that feeds a switch via a phi node
Currently when a select has a constant value in one branch and the select feeds
a conditional branch (via a compare/ phi and compare) we unfold the select 
statement. This results in threading the conditional branch later on. Similar
opportunity exists when a select (with a constant in one branch) feeds a 
switch (via a phi node). The patch unfolds select under this condition. 
A testcase is provided.

llvm-svn: 350931
2019-01-11 15:52:57 +00:00
Matt Davis 9cd9f41f0e [GVN] Update BlockRPONumber prior to use.
Summary:
The original patch addressed the use of BlockRPONumber by forcing a sequence point when accessing that map in a conditional.  In short we found cases where that map was being accessed with blocks that had not yet been added to that structure.  For context, I've kept the wall of text below,  to what we are trying to fix, by always ensuring a updated BlockRPONumber.

== Backstory ==

I was investigating an ICE (segfault accessing a DenseMap item).  This failure happened non-deterministically, with no apparent reason and only on a Windows build of LLVM (from October 2018).

After looking into the crashes (multiple core files) and running DynamoRio, the cores and DynamoRio (DR) log pointed to the same code in `GVN::performScalarPRE()`. The values in the map are unsigned integers, the keys are `llvm::BasicBlock*`.  Our test case that triggered this warning and periodic crash is rather involved.  But the problematic line looks to be:

GVN.cpp: Line 2197

```
     if (BlockRPONumber[P] >= BlockRPONumber[CurrentBlock] &&
```

To test things out, I cooked up a patch that accessed the items in the map outside of the condition, by forcing a sequence point between accesses. DynamoRio stopped warning of the issue, and the test didn't seem to crash after 1000+ runs.

My investigation was on an older version of LLVM, (source from October this year). What it looks like was occurring is the following, and the assembly from the latest pull of llvm in December seems to confirm this might still be an issue; however, I have not witnessed the crash on more recent builds. Of course the asm in question is generated from the host compiler on that Windows box (not clang), but it hints that we might want to consider how we access the BlockRPONumber map in this conditional (line 2197, listed above).  In any case, I don't think the host compiler is wrong, rather I think it is pointing out a possibly latent bug in llvm.

1) There is no sequence point for the `>=` operation.

2) A call to a `DenseMapBase::operator[]` can have the side effect of the map reallocating a larger store (more Buckets, via a call to `DenseMap::grow`).

3) It seems perfectly legal for a host compiler to generate assembly that stores the result of a call to `operator[]` on the stack (that's what my host compile of GVN.cpp is doing) .  A second call to `operator[]` //might// encourage the map to 'grow' thus making any pointers to the map's store invalid.  The `>=` compares the first and second values. If the first happens to be a pointer produced from operator[], it could be invalid when dereferenced at the time of comparison.

The assembly generated from the Window's host compiler does show the result of the first access to the map via `operator[]` produces a pointer to an unsigned int.  And that pointer is being stored on  the stack.  If a second call to the map (which does occur) causes the map to grow, that address (on the stack) is now invalid. 

Reviewers: t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D55974

llvm-svn: 350880
2019-01-10 19:56:03 +00:00
Alina Sbirlea cae12edaaa Use MemorySSA in LICM to do sinking and hoisting.
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.

Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.

Reviewers: sanjoy, davide, gberry, george.burgess.iv

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D40375

llvm-svn: 350879
2019-01-10 19:29:04 +00:00
James Y Knight 62df5eed16 [opaque pointer types] Remove some calls to generic Type subtype accessors.
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).

I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.

llvm-svn: 350835
2019-01-10 16:07:20 +00:00
Eli Friedman d4e7a0d83c [SimplifyLibCalls] Fix memchr expansion for constant strings.
The C standard says "The memchr function locates the first
occurrence of c (converted to an unsigned char)[...]".  The expansion
was missing the conversion to unsigned char.

Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 .

Differential Revision: https://reviews.llvm.org/D55947

llvm-svn: 350775
2019-01-09 23:39:26 +00:00
Easwaran Raman b45994b843 Refactor synthetic profile count computation. NFC.
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of callsite need not always be derived from entry count (as in
sample PGO).

Reviewers: davidxl

Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D56464

llvm-svn: 350755
2019-01-09 20:10:27 +00:00
Florian Hahn 9697d2a764 Revert r350647: "[NewPM] Port tsan"
This patch breaks thread sanitizer on some macOS builders, e.g.
http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/52725/

llvm-svn: 350719
2019-01-09 13:32:16 +00:00
Max Kazantsev 4615a505f8 [IPT] Drop cache less eagerly in GVN and LoopSafetyInfo
Current strategy of dropping `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has 2 internal strictures: `OrderedInstructions`
that is needed to be invalidated whenever the contents changes, and the map
with first special instructions in block. This second map does not need an
update if we add/remove a non-special instuction because it cannot
affect the contents of this map.

This patch changes API of `InstructionPrecedenceTracking` so that it now
accounts for reasons under which we invalidate blocks. This should lead
to much less recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.

Differential Revision: https://reviews.llvm.org/D54462
Reviewed By: efriedma

llvm-svn: 350694
2019-01-09 07:28:13 +00:00
Sanjay Patel d023dd60e9 [InstCombine] canonicalize another raw IR rotate pattern to funnel shift
This is matching the equivalent of the DAG expansion, 
so it should never end up with worse perf than the 
original code even if the target doesn't have a rotate
instruction.

llvm-svn: 350672
2019-01-08 22:39:55 +00:00
Philip Pfaffe 82f995db75 [NewPM] Port tsan
A straightforward port of tsan to the new PM, following the same path
as D55647.

Differential Revision: https://reviews.llvm.org/D56433

llvm-svn: 350647
2019-01-08 19:21:57 +00:00
Anna Thomas 2dfa412efe [UnrollRuntime] Fix domTree failures in multiexit unrolling
Summary:
This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case.
We initially had a restrictive check that the IDom is only updated when
it is the header of the loop.
However, we also need to update the IDom to the correct one when the
IDom is any block within the original loop. See added test cases (which
fail dom tree verification without the patch).

Reviewers: reames, mzolotukhin, mkazantsev, hfinkel

Reviewed by: brzycki, kuhar

Subscribers: zzheng, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56284

llvm-svn: 350640
2019-01-08 17:16:25 +00:00
Chandler Carruth 57578aaf96 [CallSite removal] Port `IndirectCallSiteVisitor` to use `CallBase` and
update client code.

Also rename it to use the more generic term `call` instead of something
that could be confused with a praticular type.

Differential Revision: https://reviews.llvm.org/D56183

llvm-svn: 350508
2019-01-07 07:15:51 +00:00
Chandler Carruth 363ac68374 [CallSite removal] Migrate all Alias Analysis APIs to use the newly
minted `CallBase` class instead of the `CallSite` wrapper.

This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.

Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.

Thanks for the review from Saleem!

Differential Revision: https://reviews.llvm.org/D55641

llvm-svn: 350503
2019-01-07 05:42:51 +00:00
Nikita Popov 65038515ee [InstCombine] Relax cttz/ctlz with select on zero
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.

Differential Revision: https://reviews.llvm.org/D55786

llvm-svn: 350463
2019-01-05 09:48:16 +00:00
Easwaran Raman 366a873f14 [Inliner] Optimize shouldBeDeferred
This has some minor optimizations to shouldBeDeferred. This is not
strictly NFC because the early exit inside the loop assumes
TotalSecondaryCost is monotonically non-decreasing, which is not true if
the threshold used by CostAnalyzer is negative. AFAICT the thresholds do
not go below 0 for the default values of the various options we use.

llvm-svn: 350456
2019-01-05 02:26:29 +00:00
Evgeniy Stepanov 0184c53cbd Revert "Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)""
This reapplies commit r348983.

llvm-svn: 350448
2019-01-05 00:44:58 +00:00
Nikita Popov 6658fce4fc [BDCE] Remove dead uses of arguments
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.

I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.

Differential Revision: https://reviews.llvm.org/D56247

llvm-svn: 350435
2019-01-04 21:21:43 +00:00
Peter Collingbourne 87f477b5e4 hwasan: Implement lazy thread initialization for the interceptor ABI.
The problem is similar to D55986 but for threads: a process with the
interceptor hwasan library loaded might have some threads started by
instrumented libraries and some by uninstrumented libraries, and we
need to be able to run instrumented code on the latter.

The solution is to perform per-thread initialization lazily. If a
function needs to access shadow memory or add itself to the per-thread
ring buffer its prologue checks to see whether the value in the
sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter
and reloads from the TLS slot. The runtime does the same thing if it
needs to access this data structure.

This change means that the code generator needs to know whether we
are targeting the interceptor runtime, since we don't want to pay
the cost of lazy initialization when targeting a platform with native
hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform}
has been introduced for selecting the runtime ABI to target. The
default ABI is set to interceptor since it's assumed that it will
be more common that users will be compiling application code than
platform code.

Because we can no longer assume that the TLS slot is initialized,
the pthread_create interceptor is no longer necessary, so it has
been removed.

Ideally, lazy initialization should only cost one instruction in the
hot path, but at present the call may cause us to spill arguments
to the stack, which means more instructions in the hot path (or
theoretically in the cold path if the spills are moved with shrink
wrapping). With an appropriately chosen calling convention for
the per-thread initialization function (TODO) the hot path should
always need just one instruction and the cold path should need two
instructions with no spilling required.

Differential Revision: https://reviews.llvm.org/D56038

llvm-svn: 350429
2019-01-04 19:27:04 +00:00
Teresa Johnson 853b962416 [ThinLTO] Handle chains of aliases
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.

Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.

Reviewers: pcc, davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54507

llvm-svn: 350423
2019-01-04 19:04:54 +00:00
Vedant Kumar a1778df474 [CodeExtractor] Do not extract unsafe lifetime markers
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):

```
               entry:
              +------------+
              | x = alloca |
              | y = alloca |
              +------------+
             /              \
   lhs:                      rhs:
  +-------------------+     +-------------------+
  | lifetime_start(x) |     | lifetime_start(x) |
  | use(x)            |     | lifetime_start(y) |
  | lifetime_end(x)   |     | use(x, y)         |
  | lifetime_start(y) |     | lifetime_end(y)   |
  | use(y)            |     | lifetime_end(x)   |
  | lifetime_end(y)   |     +-------------------+
  +-------------------+
```

Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.

This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.

Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.

Fixes llvm.org/PR39671 (rdar://45939472).

Differential Revision: https://reviews.llvm.org/D55967

llvm-svn: 350420
2019-01-04 17:43:22 +00:00
Sanjay Patel 722466e1f1 [InstCombine] reduce raw IR narrowing rotate patterns to funnel shift
Similar to rL350199 - there are no known analysis/codegen holes for
funnel shift intrinsics now, so we can canonicalize the 6+ regular
instructions to funnel shift to improve vectorization, inlining,
unrolling, etc.

llvm-svn: 350419
2019-01-04 17:38:12 +00:00
John Brawn 39ac159c24 [LICM] Adjust how moving the re-hoist point works
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.

Fix this by moving the re-hoist point when it doesn't dominate the dominator of
hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.

Differential Revision: https://reviews.llvm.org/D55266

llvm-svn: 350408
2019-01-04 17:12:09 +00:00
Xin Tong 47beee2f3f [memcpyopt] Remove a few unnecessary isVolatile() checks. NFC
We already checked for isSimple() on the store.

llvm-svn: 350378
2019-01-04 02:13:22 +00:00
Anna Thomas a470aa6701 [UnrollRuntime] Move the DomTree verification under expensive checks
Suggested by Hal as done in r349871.

llvm-svn: 350349
2019-01-03 19:43:33 +00:00
Anna Thomas 0785e7307e [UnrollRuntime] Add DomTree verification under debug mode
NFC: This adds the dom tree verification under debug mode at a point
just before we start unrolling the loop. This allows us to verify dom
tree at a state where it is much smaller and before the unrolling
actually happens.
This also implies we do not need to run -verify-dom-info everytime to
see if the DT is in a valid state when we transform the loop for runtime
unrolling.

llvm-svn: 350334
2019-01-03 17:44:44 +00:00
Philip Pfaffe b39a97c8f6 [NewPM] Port Msan
Summary:
Keeping msan a function pass requires replacing the module level initialization:
That means, don't define a ctor function which calls __msan_init, instead just
declare the init function at the first access, and add that to the global ctors
list.

Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
  library functions, for which it inserts declarations as necessary.
- Update tests.

Caveats:
- There is one test that I dropped, because it specifically tested the
  definition of the ctor.

Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka

Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji

Differential Revision: https://reviews.llvm.org/D55647

llvm-svn: 350305
2019-01-03 13:42:44 +00:00
Pete Cooper 697281df42 Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.

Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs
so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.

This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.

rdar://problem/47005143

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D56235

llvm-svn: 350284
2019-01-03 01:38:08 +00:00
Xin Tong 33e3b4b9b3 [ThinLTO] Scan all variants of vague symbol for reachability.
Summary:
Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable
from somewhere else.

Reviewers: tejohnson, grimar

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D56117

llvm-svn: 350269
2019-01-02 23:18:20 +00:00
Pete Cooper 8d58048024 Fix assert in ObjCARC optimizer when deleting retainBlock of null or undef.
The caller to EraseInstruction had this conditional:

    // ARC calls with null are no-ops. Delete them.
    if (IsNullOrUndef(Arg))

but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.

This adds support for both of these cases.

rdar://problem/47003805

llvm-svn: 350261
2019-01-02 21:00:02 +00:00
Nikita Popov cc6ef7f153 [BDCE] Remove instructions without demanded bits
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.

Differential Revision: https://reviews.llvm.org/D56185

llvm-svn: 350257
2019-01-02 20:02:14 +00:00
Pawel Bylica 119aa8fa5f Format AggresiveInstCombine.cpp. NFC
llvm-svn: 350255
2019-01-02 19:51:46 +00:00
Sanjay Patel 654e6aabb9 [InstCombine] canonicalize raw IR rotate patterns to funnel shift
The final piece of IR-level analysis to allow this was committed with:
rL350188

Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.

The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.

llvm-svn: 350199
2019-01-01 21:51:39 +00:00
Nikita Popov bc9986e9ad Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.

The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 350188
2019-01-01 10:05:26 +00:00
Chen Zheng 4952e668f8 [InstCombine] canonicalize MUL with NEG operand
-X * Y --> -(X * Y)
X * -Y --> -(X * Y)

Differential Revision: https://reviews.llvm.org/D55961

llvm-svn: 350185
2019-01-01 01:09:20 +00:00
Alexander Potapenko cea4f83371 [MSan] Handle llvm.is.constant intrinsic
MSan used to report false positives in the case the argument of
llvm.is.constant intrinsic was uninitialized.
In fact checking this argument is unnecessary, as the intrinsic is only
used at compile time, and its value doesn't depend on the value of the
argument.

llvm-svn: 350173
2018-12-31 09:42:23 +00:00
Max Kazantsev 201534d753 Drop SE cache early because loop parent can change in LoopSimplifyCFG
llvm-svn: 350145
2018-12-29 04:26:22 +00:00
Anna Thomas 98743fa77a [UnrollRuntime] NFC: Add comment and verify LCSSA
Added -verify-loop-lcssa to test cases.
Updated comments in ConnectProlog.

llvm-svn: 350131
2018-12-28 18:52:16 +00:00
Max Kazantsev 530ff8f3cc Temporarily disable term folding in LoopSimplifyCFG, add tests
llvm-svn: 350117
2018-12-28 06:22:39 +00:00
Max Kazantsev 80e4b40f3e [LoopSimplifyCFG] Delete dead blocks in RPO
Deletion of dead blocks in arbitrary order may lead to failure
of assertion in `DeleteDeadBlock` that requires that we have
deleted all predecessors before we can delete the current block.
We should instead delete them in RPO order.

llvm-svn: 350116
2018-12-28 06:08:51 +00:00
Craig Topper c9a6000755 [LoopIdiomRecognize] Add CTTZ support
Summary:
Existing LIR recognizes CTLZ where shifting input variable right until it is zero. (Shift-Until-Zero idiom)

This commit:
1. Augments Shift-Until-Zero idiom to recognize CTTZ where input variable is shifted left.
2. Prepare for BitScan idiom recognition.

Patch by Yuanfang Chen (tabloid.adroit)

Reviewers: craig.topper, evstupac

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55876

llvm-svn: 350074
2018-12-26 21:59:48 +00:00
Max Kazantsev 28298e9647 [NFC] Use utility function for guards detection
llvm-svn: 350064
2018-12-26 08:22:25 +00:00
Max Kazantsev 9b25bf3960 [NFC] Reuse variables instead of re-calling getParent
llvm-svn: 350062
2018-12-25 07:20:06 +00:00
Eugene Leviant 4dc3a3f746 [HWASAN] Instrument memorty intrinsics by default
Differential revision: https://reviews.llvm.org/D55926

llvm-svn: 350055
2018-12-24 16:02:48 +00:00
Max Kazantsev edabb9ae56 [LoopSimplifyCFG] Delete dead exiting edges
This patch teaches LoopSimplifyCFG to remove dead exiting edges
from loops.

Differential Revision: https://reviews.llvm.org/D54025
Reviewed By: fedor.sergeev

llvm-svn: 350049
2018-12-24 07:41:33 +00:00
Max Kazantsev 347c583772 Return "[LoopSimplifyCFG] Delete dead in-loop blocks"
The underlying bug that caused the revert should be fixed by rL348567.

Differential Revision: https://reviews.llvm.org/D54023

llvm-svn: 350045
2018-12-24 06:06:17 +00:00
George Burgess IV 7e12875c89 [LoopIdioms] More LocationSize::precise annotations; NFC
Both of these places reference memset-like loops. Memset is precise.

Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350044
2018-12-24 05:55:50 +00:00
George Burgess IV 5e4a03a089 [MemCpyOpt] Use LocationSize instead of ints; NFC
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

srcSize is derived from the size of an alloca, and we quit out if the
size of that is > the size of the thing we're copying to. Hence, we
should always copy everything over, so these sizes are precise.

Don't make srcSize itself a LocationSize, since optionality isn't
helpful, and we do some comparisons against other sizes elsewhere in
that function.

llvm-svn: 350019
2018-12-23 06:40:39 +00:00
Mircea Trofin b53eeb6f4c [llvm] API for encoding/decoding DWARF discriminators.
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)

The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.

Reviewers: davidxl, danielcdh, wmi, dblaikie

Reviewed By: dblaikie

Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D55681

llvm-svn: 349973
2018-12-21 22:48:50 +00:00
Vedant Kumar b264d69de7 [IR] Add Instruction::isLifetimeStartOrEnd, NFC
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.

This was suggested as a cleanup in D55967.

Differential Revision: https://reviews.llvm.org/D56019

llvm-svn: 349964
2018-12-21 21:49:40 +00:00
Anna Thomas 18be3cb606 [RuntimeUnrolling] NFC: Add TODO and comments in connectProlog
Currently, runtime unrolling does not support loops where multiple
exiting blocks exit to the latchExit. Added TODO and other code
clarifications for ConnectProlog code.

llvm-svn: 349944
2018-12-21 19:45:05 +00:00
Simon Pilgrim 5d403f6bf8 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.

Clang counterpart: https://reviews.llvm.org/D55890

Differential Revision: https://reviews.llvm.org/D55894

llvm-svn: 349892
2018-12-21 09:04:14 +00:00
Reid Kleckner b894ecf903 [memcpyopt] Add debug logs when forwarding memcpy src to dst
llvm-svn: 349873
2018-12-21 01:41:20 +00:00
Eli Friedman 3af2f53456 [LoopUnroll] Don't verify domtree by default with +Asserts.
This verification is linear in the size of the function, so it can cause
a quadratic compile-time explosion in a function with many loops to
unroll.

Differential Revision: https://reviews.llvm.org/D54732

llvm-svn: 349871
2018-12-21 01:28:49 +00:00
Tom Stellard 2f44fbe936 cmake: Remove add_llvm_loadable_module()
Summary:
This function is very similar to add_llvm_library(),  so this patch merges it
into add_llvm_library() and replaces all calls to add_llvm_loadable_module(lib ...)
with add_llvm_library(lib MODULE ...)

Reviewers: philip.pfaffe, beanz, chandlerc

Reviewed By: philip.pfaffe

Subscribers: chapuni, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D51748

llvm-svn: 349839
2018-12-20 22:04:08 +00:00
Michael Kruse 199427100b [InstCombine] Preserve access-group metadata.
Preserve llvm.access.group metadata when combining store instructions.
This was forgotten in r349725.

Fixes llvm.org/PR40117

llvm-svn: 349774
2018-12-20 17:11:02 +00:00
Piotr Sobczak deaacc17fe [InstCombine][AMDGPU] Handle more buffer intrinsics
Summary:
Include the following intrinsics in the InsctCombine
simplification:

* amdgcn_raw_buffer_load
* amdgcn_raw_buffer_load_format
* amdgcn_struct_buffer_load
* amdgcn_struct_buffer_load_format

Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55882

llvm-svn: 349735
2018-12-20 10:08:18 +00:00
Alexander Potapenko 0e3b85a730 [MSan] Don't emit __msan_instrument_asm_load() calls
LLVM treats void* pointers passed to assembly routines as pointers to
sized types.
We used to emit calls to __msan_instrument_asm_load() for every such
void*, which sometimes led to false positives.
A less error-prone (and truly "conservative") approach is to unpoison
only assembly output arguments.

llvm-svn: 349734
2018-12-20 10:05:00 +00:00
Eugene Leviant 2d98eb1b2e [HWASAN] Add support for memory intrinsics
Differential revision: https://reviews.llvm.org/D55117

llvm-svn: 349728
2018-12-20 09:04:33 +00:00
Michael Kruse 978ba61536 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).

This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Vitaly Buka 07a55f27dc [asan] Undo special treatment of linkonce_odr and weak_odr
Summary:
On non-Windows these are already removed by ShouldInstrumentGlobal.
On Window we will wait until we get actual issues with that.

Reviewers: pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55899

llvm-svn: 349707
2018-12-20 00:30:27 +00:00
Vitaly Buka d414e1bbb5 [asan] Prevent folding of globals with redzones
Summary:
ICF prevented by removing unnamed_addr and local_unnamed_addr for all sanitized
globals.
Also in general unnamed_addr is not valid here as address now is important for
ODR violation detector and redzone poisoning.

Before the patch ICF on globals caused:
1. false ODR reports when we register global on the same address more than once
2. globals buffer overflow if we fold variables of smaller type inside of large
type. Then the smaller one will poison redzone which overlaps with the larger one.

Reviewers: eugenis, pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55857

llvm-svn: 349706
2018-12-20 00:30:18 +00:00
Nikita Popov 3817ee7908 Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This reverts commit r349674. It causes a failure in
test-suite enc-3des.execution_time.

llvm-svn: 349684
2018-12-19 22:09:02 +00:00
Nikita Popov 649e125451 [BDCE][DemandedBits] Detect dead uses of undead instructions
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.

The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits or bitwise or/and/xor in combination with known
bits information.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 349674
2018-12-19 19:56:21 +00:00
Anton Afanasyev ce28791e20 Test commit
Fix typos.

llvm-svn: 349644
2018-12-19 17:18:40 +00:00
Vitaly Buka 4e4920694c [asan] Restore ODR-violation detection on vtables
Summary:
unnamed_addr is still useful for detecting of ODR violations on vtables

Still unnamed_addr with lld and --icf=safe or --icf=all can trigger false
reports which can be avoided with --icf=none or by using private aliases
with -fsanitize-address-use-odr-indicator

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: kubamracek, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55799

llvm-svn: 349555
2018-12-18 22:23:30 +00:00
Kuba Mracek 3760fc9f3d [asan] In llvm.asan.globals, allow entries to be non-GlobalVariable and skip over them
Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at https://github.com/apple/swift-llvm/pull/133. Let's look through bitcasts when iterating over entries in the llvm.asan.globals list.

Differential Revision: https://reviews.llvm.org/D55794

llvm-svn: 349544
2018-12-18 21:20:17 +00:00
Pete Cooper be4f571107 Change the objc ARC optimizer to use the new objc.* intrinsics
We're moving ARC optimisation and ARC emission in clang away from runtime methods
and towards intrinsics.  This is the part which actually uses the intrinsics in the ARC
optimizer when both analyzing the existing calls and emitting new ones.

Differential Revision: https://reviews.llvm.org/D55348

Reviewers: ahatanak
llvm-svn: 349534
2018-12-18 20:32:49 +00:00
Nikita Popov 20853a7807 [InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check
Checking whether a number has a certain number of trailing / leading
zeros means checking whether it is of the form XXXX1000 / 0001XXXX,
which can be done with an and+icmp.

Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next
step, this can be extended to non-equality predicates.

Differential Revision: https://reviews.llvm.org/D55745

llvm-svn: 349530
2018-12-18 19:59:50 +00:00
Florian Hahn 5c014037b3 [SCCP] Get rid of redundant call for getPredicateInfoFor (NFC).
We can use the result fetched a few lines above.

llvm-svn: 349527
2018-12-18 19:37:07 +00:00
Sanjay Patel e51d5bdb3c [InstCombine] refactor isCheapToScalarize(); NFC
As the FIXME indicates, this has the potential to go
overboard. So I'm not sure if it's even worth keeping 
this vs. iteratively doing simple matches, but we might 
as well clean it up.

llvm-svn: 349523
2018-12-18 19:07:38 +00:00
Michael Kruse d4eb13c880 [LoopVectorize] Rename pass options. NFC.
Rename:
NoUnrolling to InterleaveOnlyWhenForced
and
AlwaysVectorize to !VectorizeOnlyWhenForced

Contrary to what the name 'AlwaysVectorize' suggests, it does not
unconditionally vectorize all loops, but applies a cost model to
determine whether vectorization is profitable to all loops. Hence,
passing false will disable the cost model, except when a loop is marked
with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
by @hfinkel in D55716) better matches this behavior.

Similarly, 'NoUnrolling' disables the profitability cost model for
interleaving (a term to distinguish it from unrolling by the
LoopUnrollPass); rename it for consistency.

Differential Revision: https://reviews.llvm.org/D55785

llvm-svn: 349513
2018-12-18 17:46:09 +00:00
Michael Kruse 3284775b70 [LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops.
When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
the LoopUnrollPass is not not added to the (legacy) pass pipeline. This
also means that it will not process any loop metadata such as
llvm.loop.unroll.enable (which is generated by #pragma unroll or
WarnMissedTransformationsPass emits a warning that a forced
transformation has not been applied (see
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
Such explicit transformations should take precedence over disabling
heuristics.

This patch unconditionally adds LoopUnrollPass to the optimizing
pipeline (that is, it is still not added with `-O0`), but passes a flag
indicating whether automatic unrolling is dis-/enabled. This is the same
approach as LoopVectorize uses.

The new pass manager's pipeline builder has no option to disable
unrolling, hence the problem does not apply.

Differential Revision: https://reviews.llvm.org/D55716

llvm-svn: 349509
2018-12-18 17:16:05 +00:00
Dylan McKay f920da009e [IPO][AVR] Create new Functions in the default address space specified in the data layout
This modifies the IPO pass so that it respects any explicit function
address space specified in the data layout.

In targets with nonzero program address spaces, all functions should, by
default, be placed into the default program address space.

This is required for Harvard architectures like AVR. Without this, the
functions will be marked as residing in data space, and thus not be
callable.

This has no effect to any in-tree official backends, as none use an
explicit program address space in their data layouts.

Patch by Tim Neumann.

llvm-svn: 349469
2018-12-18 09:52:52 +00:00
Tim Northover 856628f707 SROA: preserve alignment tags on loads and stores.
When splitting up an alloca's uses we were dropping any explicit
alignment tags, which means they default to the ABI-required default
alignment and this can cause miscompiles if the real value was smaller.

Also refactor the TBAA metadata into a parent class since it's shared by
both children anyway.

llvm-svn: 349465
2018-12-18 09:29:39 +00:00
Peter Collingbourne d3a3e4b46d hwasan: Move ctor into a comdat.
Differential Revision: https://reviews.llvm.org/D55733

llvm-svn: 349413
2018-12-17 22:56:34 +00:00
Sanjay Patel 200885e654 [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924)
Now, that we have funnel shift intrinsics, it should be safe to convert this form of rotate to it. 
In the worst case (a target that doesn't have rotate instructions), we will expand this into a 
branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very 
likely to be a perf improvement over the original code.

The motivating source code pattern for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=34924

Background:
I looked at several different options before deciding where to try this - instcombine, simplifycfg, 
CGP - because it doesn't fit cleanly anywhere AFAIK.

The backend (CGP, SDAG, GlobalIsel?) is too late for what we're trying to accomplish. We want to 
have the IR converted before we reach things like vectorization because the reduced code can make a 
loop much simpler to transform.

Technically, this could be included in instcombine, but it's a large pattern match that includes 
control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). 
Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch.

So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. 
This only runs at -O3, but that seems like a reasonable limitation given that source code has many 
options to avoid this pattern (including the recently added clang intrinsics for rotates).

I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, 
instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug
with the new pass manager that's independent of this change (but it will be masked if we canonicalize
harder to funnel shift intrinsics in instcombine).

Differential Revision: https://reviews.llvm.org/D55604

llvm-svn: 349396
2018-12-17 21:14:51 +00:00
Sanjay Patel 1a6e9ec434 [InstCombine] don't widen an arbitrary sequence of vector ops (PR40032)
The problem is shown specifically for a case with vector multiply here:
https://bugs.llvm.org/show_bug.cgi?id=40032
...and this might mask the original backend bug for ARM shown in:
https://bugs.llvm.org/show_bug.cgi?id=39967

As the test diffs here show, we were (and probably still aren't) doing 
these kinds of transforms in a principled way. We are producing more or 
equal wide instructions than we started with in some cases, so we still 
need to restrict/correct other transforms from overstepping.

If there are perf regressions from this change, we can either carve out 
exceptions to the general IR rules, or improve the backend to do these 
transforms when we know the transform is profitable. That's probably 
similar to a change like D55448.

Differential Revision: https://reviews.llvm.org/D55744

llvm-svn: 349389
2018-12-17 20:27:43 +00:00
Davide Italiano e41e1d015f [EarlyCSE] If DI can't be salvaged, mark it as unavailable.
Fixes PR39874.

llvm-svn: 349323
2018-12-17 01:42:39 +00:00
Kamil Rytarowski 21e270a479 Add NetBSD support in needsRuntimeRegistrationOfSectionRange.
Use linker script magic to get data/cnts/name start/end.

llvm-svn: 349277
2018-12-15 16:51:35 +00:00
Kamil Rytarowski 15ae738bc8 Register kASan shadow offset for NetBSD/amd64
The NetBSD x86_64 kernel uses the 0xdfff900000000000 shadow
offset.

llvm-svn: 349276
2018-12-15 16:32:41 +00:00
Florian Hahn c214bc2b8d [NewGVN] Update use counts for SSA copies when replacing them by their operands.
The current code relies on LeaderUseCount to determine if we can remove
an SSA copy, but in that the LeaderUseCount does not refer to the SSA
copy. If a SSA copy is a dominating leader, we use the operand as dominating
leader instead. This means we removed a user of a ssa copy and we should
decrement its use count, so we can remove the ssa copy once it becomes dead.

Fixes PR38804.

Reviewers: efriedma, davide

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D51595

llvm-svn: 349217
2018-12-15 00:32:38 +00:00
Vedant Kumar 9d1827331f [Util] Refer to [s|z]exts of args when converting dbg.declares (fix PR35400)
When converting dbg.declares, if the described value is a [s|z]ext,
refer to the ext directly instead of referring to its operand.

This fixes a narrowing bug (the debugger got the sign of a variable
wrong, see llvm.org/PR35400).

The main reason to refer to the ext's operand was that an optimization
may remove the ext itself, leading to a dropped variable. Now that
InstCombine has been taught to use replaceAllDbgUsesWith (r336451), this
is less of a concern. Other passes can/should adopt this API as needed
to fix dropped variable bugs.

Differential Revision: https://reviews.llvm.org/D51813

llvm-svn: 349214
2018-12-15 00:03:33 +00:00
Michael Kruse ea9ef34558 [TransformWarning] Do not warn missed transformations in optnone functions.
Optimization transformations are intentionally disabled by the 'optnone'
function attribute. Therefore do not warn if transformation metadata is
still present.

Using the legacy pass manager structure, the `skipFunction` method takes
care for the optnone attribute (already called before this patch). For
the new pass manager, there is no equivalent, so we check for the
'optnone' attribute manually.

Differential Revision: https://reviews.llvm.org/D55690

llvm-svn: 349184
2018-12-14 19:45:43 +00:00
Michael Kruse 5948b7f30f [Transforms] Preserve metadata when converting invoke to call.
The `changeToCall` function did not preserve the invoke's metadata.
Currently, there is probably no metadata that depends on being applied
on a CallInst or InvokeInst. Therefore we can replace the instruction's
metadata.

This fixes http://llvm.org/PR39994

Suggested-by: Moritz Kreutzer <moritz.kreutzer@siemens.com>

Differential Revision: https://reviews.llvm.org/D55666

llvm-svn: 349170
2018-12-14 18:15:11 +00:00
Evgeniy Stepanov eb238ecf0f Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)"
Breaks sanitizer-android buildbot.

This reverts commit af8443a984c3b491c9ca2996b8d126ea31e5ecbe.

llvm-svn: 349092
2018-12-13 23:47:50 +00:00
Wei Mi 66c6c5abea [SampleFDO] handle ProfileSampleAccurate when initializing function entry count
ProfileSampleAccurate is used to indicate the profile has exact match to the
code to be optimized.

Previously ProfileSampleAccurate is handled in ProfileSummaryInfo::isColdCallSite
and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function
entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle
ProfileSampleAccurate in multiple places.

Differential Revision: https://reviews.llvm.org/D55660

llvm-svn: 349088
2018-12-13 21:51:42 +00:00
Nikita Popov dc73a6edde Reapply "[MemCpyOpt] memset->memcpy forwarding with undef tail"
Currently memcpyopt optimizes cases like

    memset(a, byte, N);
    memcpy(b, a, M);

to

    memset(a, byte, N);
    memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.

This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.

This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

The previous version of this patch did not perform dependency checking
properly: While the dependency is checked at the position of the memset,
the used size must be that of the memcpy. Previously the size of the
memset was used, which missed modification in the region
MemSetSize..CopySize, resulting in miscompiles. The added tests cover
variations of this issue.

Differential Revision: https://reviews.llvm.org/D55120

llvm-svn: 349078
2018-12-13 20:04:27 +00:00
Easwaran Raman 5a7056fa03 [ThinLTO] Compute synthetic function entry count
Summary:
This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.

Reviewers: tejohnson

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D43521

llvm-svn: 349076
2018-12-13 19:54:27 +00:00
Davide Italiano 9737096bb1 [LoopUtils] Use i32 instead of `void`.
The actual type of the first argument of the @dbg intrinsic
doesn't really matter as we're setting it to `undef`, but the
bitcode reader is picky about `void` types.

llvm-svn: 349069
2018-12-13 18:37:23 +00:00
Vitaly Buka a257639a69 [asan] Don't check ODR violations for particular types of globals
Summary:
private and internal: should not trigger ODR at all.
unnamed_addr: current ODR checking approach fail and rereport false violation if
a linker merges such globals
linkonce_odr, weak_odr: could cause similar problems and they are already not
instrumented for ELF.

Reviewers: eugenis, kcc

Subscribers: kubamracek, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55621

llvm-svn: 349015
2018-12-13 09:47:39 +00:00
David L. Jones 54c01ad6a9 Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail"
This revision caused trucated memsets for structs with padding. See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html

llvm-svn: 349002
2018-12-13 03:15:11 +00:00
Davide Italiano 8ee59ca653 [LoopUtils] Prefer a set over a map. NFCI.
llvm-svn: 348999
2018-12-13 01:11:52 +00:00
Davide Italiano 744c3c327f [LoopDeletion] Update debug values after loop deletion.
When loops are deleted, we don't keep track of variables modified inside
the loops, so the DI will contain the wrong value for these.

e.g.

int b() {

int i;
for (i = 0; i < 2; i++)
  ;
patatino();
return a;
-> 6 patatino();

7     return a;
8   }
9   int main() { b(); }
(lldb) frame var i
(int) i = 0

We mark instead these values as unavailable inserting a
@llvm.dbg.value(undef to make sure we don't end up printing an incorrect
value in the debugger. We could consider doing something fancier,
for, e.g. constants, in the future.

PR39868.
rdar://problem/46418795)

Differential Revision: https://reviews.llvm.org/D55299

llvm-svn: 348988
2018-12-12 23:32:35 +00:00
Nikita Popov 36e03ac6ee [InstCombine] Fix negative GEP offset evaluation for 32-bit pointers
This fixes https://bugs.llvm.org/show_bug.cgi?id=39908.

The evaluateGEPOffsetExpression() function simplifies GEP offsets for
use in comparisons against zero, basically by converting X*Scale+Offset==0
to X+Offset/Scale==0 if Scale divides Offset. However, before this is done,
Offset is masked down to the pointer size. This results in incorrect
results for negative Offsets, because we basically end up dividing the
32-bit offset *zero* extended to 64-bit bits (rather than sign extended).

Fix this by explicitly sign extending the truncated value.

Differential Revision: https://reviews.llvm.org/D55449

llvm-svn: 348987
2018-12-12 23:19:03 +00:00
Ryan Prichard e028c818f5 [hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)
Summary:
The change is needed to support ELF TLS in Android. See D55581 for the
same change in compiler-rt.

Reviewers: srhines, eugenis

Reviewed By: eugenis

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D55592

llvm-svn: 348983
2018-12-12 22:45:06 +00:00
Michael Kruse 7244852557 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288

llvm-svn: 348944
2018-12-12 17:32:52 +00:00
Mikael Holmen c06b01cb22 Fix compiler warning about unused variable [NFC]
llvm-svn: 348913
2018-12-12 06:33:45 +00:00
Gor Nishanov 20d833d5e3 [coroutines] Improve suspend point simplification
Summary:
Enable suspend point simplification for cases where:
* coro.save and coro.suspend are in different basic blocks
* where there are intervening intrinsics

Reviewers: modocache, tks2103, lewissbaker

Reviewed By: modocache

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D55160

llvm-svn: 348897
2018-12-11 21:23:09 +00:00
Fedor Sergeev a1d95c3fc4 [NewPM] fixing asserts on deleted loop in -print-after-all
IR-printing AfterPass instrumentation might be called on a loop
that has just been invalidated. We should skip printing it to
avoid spurious asserts.

Reviewed By: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D54740

llvm-svn: 348887
2018-12-11 19:05:35 +00:00
Vedant Kumar b3a7cae045 [HotColdSplitting] Disable outlining landingpad instructions (PR39917)
It's currently not safe to outline landingpad instructions (see
llvm.org/PR39917). Like @llvm.eh.typeid.for, the order and content of
previous landingpad instructions in a function alters the lowering of
subsequent landingpads by renumbering type info ID's. Outlining a
landingpad therefore breaks exception handling & unwinding.

llvm-svn: 348870
2018-12-11 18:05:31 +00:00
Sanjay Patel 2aa2dc76c2 [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927)
call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM

This has the potential to create less-than-8-bit scalar types as shown in 
some of the test diffs, but it looks like the backend knows how to deal 
with that in these patterns. This is the simple part of the fix suggested in:
https://bugs.llvm.org/show_bug.cgi?id=39927

Differential Revision: https://reviews.llvm.org/D55529

llvm-svn: 348862
2018-12-11 16:38:03 +00:00
David Stenberg 2474ce5862 [DeadArgElim] Fixes for dbg.values using dead arg/return values
Summary:
When eliminating a dead argument or return value in a function with
local linkage, all uses, including in dbg.value intrinsics, would be
replaced with null constants. This would mean that, for example for an
integer argument, the debug info would incorrectly express that the
value is 0. Instead, replace all uses with undef to indicate that the
argument/return value is optimized out.

Also, make sure that metadata uses of return values are rewritten even
if there are no non-metadata uses of the value.

As a bit of historical curiosity, the code that emitted null constants
was introduced in the initial check-in of the pass in 2003, before
'undef' values even existed in LLVM.

This fixes PR23260.

Reviewers: dblaikie, aprantl, vsk, djtodoro

Reviewed By: aprantl

Subscribers: llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D55513

llvm-svn: 348837
2018-12-11 10:33:38 +00:00
Davide Italiano 8ec7709f58 [Local] Promote an utility that could be used elsewhere. NFCI.
llvm-svn: 348804
2018-12-10 22:17:04 +00:00
Matt Arsenault 9ccde61f81 InstCombine: Scalarize single use icmp/fcmp
llvm-svn: 348801
2018-12-10 21:50:54 +00:00
Nikita Popov 94b8e2ea4e [MemCpyOpt] memset->memcpy forwarding with undef tail
Currently memcpyopt optimizes cases like

    memset(a, byte, N);
    memcpy(b, a, M);

to

    memset(a, byte, N);
    memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.

This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.

This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

For the implementation, I'm reusing a bit of code for a similar existing
optimization (direct memcpy of undef). I've also added memset support to
MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be
used, but it seems to make more sense to add this to GetLocation and thus
make the computation cachable.

Differential Revision: https://reviews.llvm.org/D55120

llvm-svn: 348645
2018-12-07 21:16:58 +00:00
Vedant Kumar 03f9f15b16 [HotColdSplitting] Refine definition of unlikelyExecuted
The splitting pass uses its 'unlikelyExecuted' predicate to statically
decide which blocks are cold.

- Do not treat noreturn calls as if they are cold unless they are actually
  marked cold. This is motivated by functions like exit() and longjmp(), which
  are not beneficial to outline.

- Do not treat inline asm as an outlining barrier. In practice asm("") is
  frequently used to inhibit basic block merging; enabling outlining in this case
  results in substantial memory savings.

- Treat invokes of cold functions as cold.

As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's
no longer needed. The pass can identify & outline blocks dominated by EH pads,
so there's no need to special-case __cxa_begin_catch etc.

Differential Revision: https://reviews.llvm.org/D54244

llvm-svn: 348640
2018-12-07 20:24:04 +00:00
Vedant Kumar 03aaa3e2aa [HotColdSplitting] Outline more than once per function
Algorithm: Identify maximal cold regions and put them in a worklist. If
a candidate region overlaps with another, discard it. While the worklist
is full, remove a single-entry sub-region from the worklist and attempt
to outline it. By the non-overlap property, this should not invalidate
parts of the domtree pertaining to other outlining regions.

Testing: LNT results on X86 are clean. With test-suite + externals, llvm
outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file
483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm
outlines over 100 times in some functions (mostly EH paths). There was
not a significant performance impact pre vs. post-patch.

Differential Revision: https://reviews.llvm.org/D53887

llvm-svn: 348639
2018-12-07 20:23:52 +00:00
Nikita Popov 110cf05203 Reapply "[DemandedBits][BDCE] Support vectors of integers"
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbirary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)

The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348602
2018-12-07 15:38:13 +00:00
Max Kazantsev b9e65cbddf Introduce llvm.experimental.widenable_condition intrinsic
This patch introduces a new instinsic `@llvm.experimental.widenable_condition`
that allows explicit representation for guards. It is an alternative to using
`@llvm.experimental.guard` intrinsic that does not contain implicit control flow.

We keep finding places where `@llvm.experimental.guard` is not supported or
treated too conservatively, and there are 2 reasons to that:

- `@llvm.experimental.guard` has memory write side effect to model implicit control flow,
  and this sometimes confuses passes and analyzes that work with memory;
- Not all passes and analysis are aware of the semantics of guards. These passes treat them
  as regular throwing call and have no idea that the condition of guard may be used to prove
  something. One well-known place which had caused us troubles in the past is explicit loop
  iteration count calculation in SCEV. Another example is new loop unswitching which is not
  aware of guards. Whenever a new pass appears, we potentially have this problem there.

Rather than go and fix all these places (and commit to keep track of them and add support
in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible.
The only significant difference between guards and regular explicit branches is that guard's condition
can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor,
and it is always legal to go there no matter what the guard condition is. The other successor is
a guarded block, and it is only legal to go there if the condition is true.

This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard`
intrinsic. Now a widenable guard can be represented in the CFG explicitly like this:


    %widenable_condition = call i1 @llvm.experimental.widenable.condition()
    %new_condition = and i1 %cond, %widenable_condition
    br i1 %new_condition, label %guarded, label %deopt

  guarded:
    ; Guarded instructions

  deopt:
    call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]

The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an
`undef`, but the intrinsic prevents the optimizer from folding it early. This form
should exploit all optimization boons provided to `br` instuction, and it still can be
widened by replacing the result of `@llvm.experimental.widenable.condition()`
with `and` with any arbitrary boolean value (as long as the branch that is taken when
it is `false` has a deopt and has no side-effects).

For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using
implicit control flow in guards".

This patch introduces this new intrinsic with respective LangRef changes and a pass
that converts old-style guards (expressed as intrinsics) into the new form.

The naming discussion is still ungoing. Merging this to unblock further items. We can
later change the name of this intrinsic.

Reviewed By: reames, fedor.sergeev, sanjoy
Differential Revision: https://reviews.llvm.org/D51207

llvm-svn: 348593
2018-12-07 14:39:46 +00:00
Markus Lavin 4dc4ebd606 [PM] Port LoadStoreVectorizer to the new pass manager.
Differential Revision: https://reviews.llvm.org/D54848

llvm-svn: 348570
2018-12-07 08:23:37 +00:00
Max Kazantsev a523a21175 [LoopSimplifyCFG] Do not deal with loops with irreducible CFG inside
The current algorithm that collects live/dead/inloop blocks relies on some invariants
related to RPO and PO traversals. In particular, the important fact it requires is that
the only loop's latch is the first block in PO traversal. It also relies on fact that during
RPO we visit all prececessors of a block before we visit this block (backedges ignored).

If a loop has irreducible non-loop cycle inside, both these assumptions may break.
This patch adds detection for this situation and prohibits the terminator folding
for loops with irreducible CFG.

We can in theory support this later, for this some algorithmic changes are needed.
Besides, irreducible CFG is not a frequent situation and we can just don't bother.

Thanks @uabelho for finding this!

Differential Revision: https://reviews.llvm.org/D55357
Reviewed By: skatkov

llvm-svn: 348567
2018-12-07 05:44:45 +00:00
Vedant Kumar b2a6f8e505 [CodeExtractor] Store outputs at the first valid insertion point
When CodeExtractor outlines values which are used by the original
function, it must store those values in some in-out parameter. This
store instruction must not be inserted in between a PHI and an EH pad
instruction, as that results in invalid IR.

This fixes the following verifier failure seen while outlining within
ObjC methods with live exit values:

  The unwind destination does not have an exception handling instruction!
    %call35 = invoke i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %exn.adjusted, i8* %1)
            to label %invoke.cont34 unwind label %lpad33, !dbg !4183
  The unwind destination does not have an exception handling instruction!
    invoke void @objc_exception_throw(i8* %call35) #12
            to label %invoke.cont36 unwind label %lpad33, !dbg !4184
  LandingPadInst not the first non-PHI instruction in the block.
    %3 = landingpad { i8*, i32 }
            catch i8* null, !dbg !1411

rdar://46540815

llvm-svn: 348562
2018-12-07 03:01:54 +00:00
Nikita Popov 14ca9a8355 Revert "[DemandedBits][BDCE] Support vectors of integers"
This reverts commit r348549. Causing assertion failures during
clang build.

llvm-svn: 348558
2018-12-07 00:42:03 +00:00
Nikita Popov cf65b9207b [DemandedBits][BDCE] Support vectors of integers
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

The getDemandedBits() method can now only be called on instructions that
have integer or vector of integer type. Previously it could be called on
any sized instruction (even if it was not particularly useful). The size
of the return value is now always the scalar size in bits (while
previously it was the type size in bits).

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348549
2018-12-06 23:50:32 +00:00
Adrian Prantl fbeeac0e1e Reapply "Adapt gcov to changes in CFE."
This reverts commit r348203 and reapplies D55085 with an additional
GCOV bugfix to make the change NFC for relative file paths in .gcno files.

Thanks to Ilya Biryukov for additional testing!

Original commit message:

    Update Diagnostic handling for changes in CFE.

    The clang frontend no longer emits the current working directory for
    DIFiles containing an absolute path in the filename: and will move the
    common prefix between current working directory and the file into the
    directory: component.

    https://reviews.llvm.org/D55085

llvm-svn: 348512
2018-12-06 18:44:48 +00:00
Alexandros Lamprineas e4c91f5c4c [GVN] Don't perform scalar PRE on GEPs
Partial Redundancy Elimination of GEPs prevents CodeGenPrepare from
sinking the addressing mode computation of memory instructions back
to its uses. The problem comes from the insertion of PHIs, which
confuse CGP and make it bail.

I've autogenerated the check lines of an existing test and added a
store instruction to demonstrate the motivation behind this change.
The store is now using the gep instead of a phi.

Differential Revision: https://reviews.llvm.org/D55009

llvm-svn: 348496
2018-12-06 16:11:58 +00:00
Ilya Biryukov cb5331eb93 Revert "[LoopSimplifyCFG] Delete dead in-loop blocks"
This reverts commit r348457.
The original commit causes clang to crash when doing an instrumented
build with a new pass manager. Reverting to unbreak our integrate.

llvm-svn: 348484
2018-12-06 13:21:01 +00:00
Roman Lebedev 98cb1216a6 [InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts
I was finally able to quantify what i thought was missing in the fix,
it was vector constants. If we have a scalar (and %x, -1),
it will be instsimplified before we reach this code,
but if it is a vector, we may still have a -1 element.

Thus, we want to avoid the fold if *at least one* element is -1.
Or in other words, ignoring the undef elements, no sign bits
should be set. Thus, m_NonNegative().

A follow-up for rL348181
https://bugs.llvm.org/show_bug.cgi?id=39861

llvm-svn: 348462
2018-12-06 08:14:24 +00:00
Max Kazantsev 0b1d069d64 [LoopSimplifyCFG] Delete dead in-loop blocks
This patch teaches LoopSimplifyCFG to delete loop blocks that have
become unreachable after terminator folding has been done.

Differential Revision: https://reviews.llvm.org/D54023
Reviewed By: anna

llvm-svn: 348457
2018-12-06 05:45:02 +00:00
Sanjay Patel 998ececef0 [InstCombine] remove dead code from visitExtractElement
Extracting from a splat constant is always handled by InstSimplify.
Move the test for this from InstCombine to InstSimplify to make
sure that stays true.

llvm-svn: 348423
2018-12-05 23:09:33 +00:00
Sanjay Patel 47b3b4b5aa [InstCombine] reduce duplication in visitExtractElementInst; NFC
llvm-svn: 348418
2018-12-05 21:57:51 +00:00
Vedant Kumar 09415a850e [CodeExtractor] Do not marked outlined calls which may resume EH as noreturn
Treat terminators which resume exception propagation as returning instructions
(at least, for the purposes of marking outlined functions `noreturn`). This is
to avoid inserting traps after calls to outlined functions which unwind.

rdar://46129950

llvm-svn: 348404
2018-12-05 19:35:37 +00:00
Christian Bruel 4ead99b3ac Allow norecurse attribute on functions that have debug infos.
Summary: debug intrinsics might be marked norecurse to enable the caller function to be norecurse and optimized if needed. This avoids code gen optimisation differences when -g is used, as in globalOpt.cpp:processInternalGlobal checks.

Reviewers: chandlerc, jmolloy, aprantl

Reviewed By: aprantl

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D55187

llvm-svn: 348381
2018-12-05 16:48:00 +00:00
Sanjay Patel baffae91b2 [InstCombine] simplify icmps with same operands based on dominating cmp
The tests here are based on the motivating cases from D54827.

More background:
1. We don't get these cases in general with SimplifyCFG because the root
   of the pattern match is an icmp, not a branch. I'm not sure how often
   we encounter this pattern vs. the seemingly more likely case with 
   branches, but I don't see evidence to leave the minimal pattern
   unoptimized.

2. This has a chance of increasing compile-time because we're using a
   ValueTracking call to handle the match. The motivating cases could be
   handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/
   isImpliedFalseByMatchingCmp, but I saw that we have a more 
   comprehensive wrapper around those, so we might as well use it here
   unless there's evidence that it's significantly slower.

3. Ideally, we'd handle the fold to constants in InstSimplify, but as
   with the existing code here, we could extend this to handle cases
   where the result is not a constant, but a new combined predicate.
   That would mean splitting the logic across the 2 passes and possibly
   duplicating the pattern-matching cost.

4. As mentioned in D54827, this seems like the kind of thing that should
   be handled in Correlated Value Propagation, but that pass is currently
   limited to dealing with instructions with constant operands, so extending
   this bit of InstCombine is the smallest/easiest way to get these patterns 
   optimized.

llvm-svn: 348367
2018-12-05 15:04:00 +00:00
Alina Sbirlea 0e216854f9 [LICM] *Actually* disable ControlFlowHoisting.
Summary:
The remaining code paths that ControlFlowHoisting introduced that were
not disabled, increased compile time by 3x for some benchmarks.
The time is spent in DominatorTree updates.

Reviewers: john.brawn, mkazantsev

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D55313

llvm-svn: 348345
2018-12-05 10:16:21 +00:00
Vitaly Buka 8076c57fd2 [asan] Add clang flag -fsanitize-address-use-odr-indicator
Reviewers: eugenis, m.ostapenko, ygribov

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55157

llvm-svn: 348327
2018-12-05 01:44:31 +00:00
Vitaly Buka d6bab09b4b [asan] Split -asan-use-private-alias to -asan-use-odr-indicator
Reviewers: eugenis, m.ostapenko, ygribov

Subscribers: mehdi_amini, kubamracek, hiraditya, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D55156

llvm-svn: 348316
2018-12-04 23:17:41 +00:00
Sanjay Patel 3d5bb15a1d [CmpInstAnalysis] fix function signature for ICmp code to predicate; NFC
The old function underspecified the return type, took an unused parameter,
and had a misleading name.

llvm-svn: 348292
2018-12-04 18:53:27 +00:00
Sanjay Patel d23b5ed857 [InstCombine] rearrange foldICmpWithDominatingICmp; NFC
Move it out from under the constant check, reorder
predicates, add comments. This makes it easier to
extend to handle the non-constant case.

llvm-svn: 348284
2018-12-04 17:44:24 +00:00
Ilya Biryukov 449a7f0dbb Revert "Adapt gcov to changes in CFE."
This reverts commit r348203.
Reason: this produces absolute paths in .gcno files, breaking us
internally as we rely on them being consistent with the filenames passed
in the command line.

Also reverts r348157 and r348155 to account for revert of r348154 in
clang repository.

llvm-svn: 348279
2018-12-04 16:30:31 +00:00
Sanjay Patel a40bf9fff7 [InstCombine] add helper for icmp with dominator; NFC
There's a potential small enhancement to this code that could
solve the cases currently under proposal in D54827 via SimplifyCFG.

Whether instcombine should be doing this kind of semi-non-local
analysis in the first place is an open question, but separating
the logic out can only help if/when we decide to move it to a
different pass. 

AFAICT, any proposal to do this in SimplifyCFG could also be seen 
as an overreach + it would be incomplete to start the fold from a 
branch rather than an icmp.

There's another question here about the code for processUGT_ADDCST_ADD().
That part may be completely dead after rL234638 ?

llvm-svn: 348273
2018-12-04 15:35:17 +00:00
Alina Sbirlea 797935f4f1 [SimpleLoopUnswitch] Remove debug dump.
llvm-svn: 348267
2018-12-04 14:43:24 +00:00
Alina Sbirlea a2eebb828e Update MemorySSA in SimpleLoopUnswitch.
Summary:
Teach SimpleLoopUnswitch to preserve MemorySSA.

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D47022

llvm-svn: 348263
2018-12-04 14:23:37 +00:00
Vitaly Buka 537cfc0352 [asan] Reduce binary size by using unnamed private aliases
Summary:
--asan-use-private-alias increases binary sizes by 10% or more.
Most of this space was long names of aliases and new symbols.
These symbols are not needed for the ODC check at all.

Reviewers: eugenis

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55146

llvm-svn: 348221
2018-12-04 00:36:14 +00:00
Vedant Kumar d129569e34 [CodeExtractor] Split PHI nodes with incoming values from outlined region (PR39433)
If a PHI node out of extracted region has multiple incoming values from it,
split this PHI on two parts. First PHI has incomings only from region and
extracts with it (they are placed to the separate basic block that added to the
list of outlined), and incoming values in original PHI are replaced by first
PHI. Similar solution is already used in CodeExtractor for PHIs in entry block
(severSplitPHINodes method). It covers PR39433 bug.

Patch by Sergei Kachkov!

Differential Revision: https://reviews.llvm.org/D55018

llvm-svn: 348205
2018-12-03 22:40:21 +00:00
Adrian Prantl 40eb622325 Adapt gcov to changes in CFE.
The clang frontend no longer emits the current working directory for
DIFiles containing an absolute path in the filename: and will move the
common prefix between current working directory and the file into the
directory: component.

This fixes the GCOV tests in compiler-rt that were broken by the Clang
change.

llvm-svn: 348203
2018-12-03 22:37:48 +00:00
Sanjay Patel 8c65515082 [InstCombine] fix undef propagation bug with shuffle+binop
When we have a shuffle that extends a source vector with undefs
and then do some binop on that, we must make sure that the extra
elements remain undef with that binop if we reverse the order of
the binop and shuffle.

'or' is probably the easiest example to show the bug because
'or C, undef --> -1' (not undef). But there are other 
opcode/constant combinations where this is true as shown by 
the 'shl' test.

llvm-svn: 348191
2018-12-03 21:15:17 +00:00
Roman Lebedev 7bf2fed167 [InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds.
These two folds are invalid for this non-constant pattern
when the mask ends up being all-ones:
https://rise4fun.com/Alive/9au
https://rise4fun.com/Alive/UcQM

Fixes https://bugs.llvm.org/show_bug.cgi?id=39861

llvm-svn: 348181
2018-12-03 20:07:58 +00:00
Sanjay Patel f2bda5e43f [InstCombine] rearrange shuffle+binop fold; NFC
This code has a bug dealing with undefs, so we need
to add another escape hatch, so doing some cleanup
ahead of that.

llvm-svn: 348175
2018-12-03 19:53:04 +00:00
Sanjay Patel 472652ef68 [CmpInstAnalysis] fix formatting; NFC
There are potential improvements to the structure of this API
raised by D54994, but remove some cosmetic blemishes before
making any functional changes.

llvm-svn: 348149
2018-12-03 15:48:30 +00:00
Alexander Potapenko 7502e5fc56 [KMSAN] Enable -msan-handle-asm-conservative by default
This change enables conservative assembly instrumentation in KMSAN builds
by default.
It's still possible to disable it with -msan-handle-asm-conservative=0
if something breaks. It's now impossible to enable conservative
instrumentation for userspace builds, but it's not used anyway.

llvm-svn: 348112
2018-12-03 10:15:43 +00:00
Sanjay Patel 7d82d37854 [ValueTracking] add helper function for testing implied condition; NFCI
We were duplicating code around the existing isImpliedCondition() that
checks for a predecessor block/dominating condition, so make that a
wrapper call.

llvm-svn: 348088
2018-12-02 13:26:03 +00:00
Nikita Popov 0c5d6ccbfc [InstCombine] Support ssub.sat canonicalization for non-splats
Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also
support non-splat vector constants. This is done by generalizing
the implementation of the isNotMinSignedValue() helper to return
true for constants that are non-splat, but don't contain any
signed min elements.

Differential Revision: https://reviews.llvm.org/D55011

llvm-svn: 348072
2018-12-01 10:58:34 +00:00
Joseph Tremoulet 27b1e3bd4f [Mem2Reg] Fix nondeterministic corner case
Summary:
When mem2reg inserts phi nodes in blocks with unreachable predecessors,
it adds undef operands for those incoming edges.  When there are
multiple such predecessors, the order is currently based on the address
of the BasicBlocks.  This change fixes that by using the BBNumbers in
the sort/search predicates, as is done elsewhere in mem2reg to ensure
determinism.

Also adds a testcase with a bunch of unreachable preds, which
(nodeterministically) fails without the fix.


Reviewers: majnemer

Reviewed By: majnemer

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D55077

llvm-svn: 348024
2018-11-30 19:20:02 +00:00
Alexey Bataev 3689747619 [SLP]PR39774: Update references of the replaced external instructions.
Summary:
An additional fix for PR39774. Need to update the references for the
RedcutionRoot instruction when it is replaced during the vectorization
phase to avoid compiler crash on reduction vectorization.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55017

llvm-svn: 347997
2018-11-30 15:14:20 +00:00
Max Kazantsev 9cf417db78 [LoopSimplifyCFG] Update MemorySSA in terminator folding. PR39783
Terminator folding transform lacks MemorySSA update for memory Phis,
while they exist within MemorySSA analysis. They need exactly the same
type of updates as regular Phis. Failing to update them properly ends up
with inconsistent MemorySSA and manifests in various assertion failures.

This patch adds Memory Phi updates to this transform.

Thanks to @jonpa for finding this!

Differential Revision: https://reviews.llvm.org/D55050
Reviewed By: asbirlea

llvm-svn: 347979
2018-11-30 10:06:23 +00:00
David Stuttard c6603861d8 Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

llvm-svn: 347911
2018-11-29 20:14:17 +00:00
Sanjay Patel d802270808 [InstSimplify] fold select with implied condition
This is an almost direct move of the functionality from InstCombine to 
InstSimplify. There's no reason not to do this in InstSimplify because 
we never create a new value with this transform.

(There's a question of whether any dominance-based transform belongs in
either of these passes, but that's a separate issue.)

I've changed 1 of the conditions for the fold (1 of the blocks for the 
branch must be the block we started with) into an assert because I'm not 
sure how that could ever be false.

We need 1 extra check to make sure that the instruction itself is in a
basic block because passes other than InstCombine may be using InstSimplify
as an analysis on values that are not wired up yet.

The 3-way compare changes show that InstCombine has some kind of 
phase-ordering hole. Otherwise, we would have already gotten the intended
final result that we now show here.

llvm-svn: 347896
2018-11-29 18:44:39 +00:00
John Brawn a7eb2c863f [LICM] Reapply r347776 "Make LICM able to hoist phis" with fix
This commit caused a large compile-time slowdown in some cases when NDEBUG is
off due to the dominator tree verification it added. Fix this by only doing
dominator tree and loop info verification when something has been hoisted.

Differential Revision: https://reviews.llvm.org/D52827

llvm-svn: 347889
2018-11-29 17:10:00 +00:00
Teresa Johnson 93f9996278 [ThinLTO] Import local variables from the same module as caller
Summary:
We can sometimes end up with multiple copies of a local variable that
have the same GUID in the index. This happens when there are local
variables with the same name that are in different source files having the
same name/path at compile time (but compiled into different bitcode objects).

In this case make sure we import the copy in the caller's module.
This enables importing both of the variables having the same GUID
(but which will have different promoted names since the module paths,
and therefore the module hashes, will be distinct).

Importing the wrong copy is particularly problematic for read only
variables, since we must import them as a local copy whenever
referenced. Otherwise we get undefs at link time.

Note that the llvm-lto.cpp and ThinLTOCodeGenerator changes are needed
for testing the distributed index case via clang, which will be sent as
a separate clang-side patch shortly. We were previously not doing the
dead code/read only computation before computing imports when testing
distributed index generation (like it was for testing importing and
other ThinLTO mechanisms alone).

Reviewers: evgeny777

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D55047

llvm-svn: 347886
2018-11-29 17:02:42 +00:00
Joseph Tremoulet 926ee459c4 [CallSiteSplitting] Report edge deletion to DomTreeUpdater
Summary:
When splitting musttail calls, the split blocks' original terminators
get removed; inform the DTU when this happens.

Also add a testcase that fails an assertion in the DTU without this fix.


Reviewers: fhahn, junbuml

Reviewed By: fhahn

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55027

llvm-svn: 347872
2018-11-29 15:27:04 +00:00
David Stuttard de02e4b1cc Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
2018-11-29 15:21:13 +00:00
Sanjay Patel 8242c82de4 [CVP] tidy processCmp(); NFC
1. The variables were confusing: 'C' typically refers to a constant, but here it was the Cmp.
2. Formatting violations.
3. Simplify code to return true/false constant.

llvm-svn: 347868
2018-11-29 14:41:21 +00:00
Martin Storsjo bfd1d27585 Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix"
This reverts commits r347776 and r347778.

The first one, r347776, caused significant compile time regressions
for certain input files, see PR39836 for details.

llvm-svn: 347867
2018-11-29 14:39:39 +00:00
Max Kazantsev 24c186ff00 Disable TermFolding in LoopSimplifyCFG until PR39783 is fixed
llvm-svn: 347844
2018-11-29 09:00:19 +00:00
Sam Parker d6ebf0108e [LoopStrengthReduce] ComplexityLimit as an option
Convert ComplexityLimit into a command line value.

Differential Revision: https://reviews.llvm.org/D54899

llvm-svn: 347843
2018-11-29 08:34:22 +00:00
Jeremy Morse 9b4cfa55b1 [DebugInfo] Give inlinable calls DILocs (PR39807)
In PR39807 we incorrectly handle circumstances where calls are common'd
from conditional blocks into the parent BB. Calls that can be inlined
must always have DebugLocs, however we strip them during commoning, which
the IR verifier asserts on.

Fix this by using applyMergedLocation: it will perform the same DebugLoc
stripping of conditional Locs, but will also generate an unknown location
DebugLoc that satisfies the requirement for inlinable calls to always have
locations.

Some of the prior logic for selecting a DebugLoc is now likely redundant;
I'll generate a follow-up to remove it (involves editing more regression
tests).

Differential Revision: https://reviews.llvm.org/D54997

llvm-svn: 347782
2018-11-28 17:58:45 +00:00
John Brawn 4557ffeb63 [LICM] Enable control flow hoisting by default
Differential Revision: https://reviews.llvm.org/D54949

llvm-svn: 347778
2018-11-28 17:23:03 +00:00
John Brawn 31c9769580 [LICM] Reapply r347190 "Make LICM able to hoist phis" with fix
This commit caused failures because it failed to correctly handle cases where
we hoist a phi, then hoist a use of that phi, then have to rehoist that use. We
need to make sure that we rehoist the use to _after_ the hoisted phi, which we
do by always rehoisting to the immediate dominator instead of just rehoisting
everything to the original preheader.

An option is also added to control whether control flow is hoisted, which is
off in this commit but will be turned on in a subsequent commit.

Differential Revision: https://reviews.llvm.org/D52827

llvm-svn: 347776
2018-11-28 17:21:49 +00:00
Nikita Popov 8d63aed459 [InstCombine] Combine saturating add/sub with constant operands
Combine
  sat(sat(X + C1) + C2) -> sat(X + (C1+C2))
and
  sat(sat(X - C1) - C2) -> sat(X - (C1+C2))
if the sign of C1 and C2 matches.

In the unsigned case we can compute C1+C2 with saturating arithmetic,
and InstSimplify will reduce this just to the saturation value. For
the signed case, we cannot perform the simplification if the result
of the addition overflows.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347773
2018-11-28 16:37:15 +00:00
Nikita Popov 42f89989a1 [InstCombine] Canonicalize ssub.sat to sadd.sat
Canonicalize ssub.sat(X, C) to ssub.sat(X, -C) if C is constant and
not signed minimum. This will help further optimizations to apply.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347772
2018-11-28 16:37:09 +00:00
Nikita Popov 78a9295e15 [InstCombine] Use known overflow information for saturating add/sub
If ValueTracking can determine that the add/sub can newer overflow,
replace it with the corresponding nuw/nsw add/sub.

Additionally, for the unsigned case, if ValueTracking determines
that the add/sub always overflows, replace the result with the
saturation value.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347770
2018-11-28 16:36:59 +00:00
Nikita Popov 085d24a8b3 [InstCombine] Canonicalize const arg for saturating adds
If a saturating add intrinsic has one constant argument, make sure
it is on the RHS. This will simplify further transformations.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347769
2018-11-28 16:36:52 +00:00
Xin Tong 53e52e47e8 [ThinLTO] Correct linkonce_any function import linkage. NFC.
Summary:
This is a NFC as we do not import non-odr vague linkage when computing
for import list for a module.

Reviewers: tejohnson, pcc

Subscribers: inglorion, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54928

llvm-svn: 347763
2018-11-28 15:16:35 +00:00
Alexey Bataev 579c2d9d64 [SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized.
Summary:
If the original reduction root instruction was vectorized, it might be
removed from the tree. It means that the insertion point may become
invalidated and the whole vectorization of the reduction leads to the
incorrect output result.
The ReductionRoot instruction must be marked as externally used so it
could not be removed. Otherwise it might cause inconsistency with the
cost model and we may end up with too optimistic optimization.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54955

llvm-svn: 347759
2018-11-28 14:34:11 +00:00
Florian Hahn fd6ea134f4 [PartialInliner] Make PHIs free in cost computation.
InlineCost also treats them as free and the current implementation
can cause assertion failures if PHI nodes are moved outside the region
from entry BBs to the region.

It also updates the code to use the instructionsWithoutDebug iterator.

Reviewers: davidxl, davide, vsk, graham-yiu-huawei

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D54748

llvm-svn: 347683
2018-11-27 18:17:27 +00:00
Tim Northover 81bff5e6ea InstCombine: add comment explaining malloc deletion. NFC.
I tried to change this, not quite realising the logic behind what we
were doing. Hopefully this comment will help the next person to come
along.

llvm-svn: 347653
2018-11-27 11:08:14 +00:00
Max Kazantsev 70b11c6d31 [LoopSimplifyCFG] Turn on term folding after underlying bug fixed
llvm-svn: 347641
2018-11-27 06:19:42 +00:00
Max Kazantsev c4e4d6449a [LoopSimplifyCFG] Fix corner case with duplicating successors
It fixes a bug that doesn't update Phi inputs of the only live successor that
is in the list of block's successors more than once.

Thanks @uabelho for finding this.

Differential Revision: https://reviews.llvm.org/D54849
Reviewed By: anna

llvm-svn: 347640
2018-11-27 06:17:21 +00:00
Xin Tong 04d49779a1 [ICP] Remove incompatible attributes at indirect-call promoted callsites.
Summary:
Removing ncompatible attributes at indirect-call promoted callsites, not removing it results in
at least a IR verification error.

Reviewers: davidxl, xur, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54913

llvm-svn: 347605
2018-11-26 22:03:52 +00:00
Sanjay Patel 790af91803 [InstCombine] add helper function to reduce code duplication; NFC
llvm-svn: 347604
2018-11-26 22:00:41 +00:00
Florian Hahn 6615a7132a [IPSCCP] Use input operand instead of OriginalOp for ssa_copy.
OriginalOp of a Predicate refers to the original IR value,
before renaming. While solving in IPSCCP, we have to use
the operand of the ssa_copy instead, to avoid missing
updates for nested conditions on the same IR value.

Fixes PR39772.

llvm-svn: 347524
2018-11-25 16:32:02 +00:00
Nikita Popov 2c779c0e34 [InstCombine] Determine demanded and known bits for funnel shifts
Support funnel shifts in InstCombine demanded bits simplification.
If the shift amount is constant, we can determine both the demanded
bits of the operands, as well as the known bits of the result.

If one of the operands has no demanded bits, it will be replaced
by undef and the funnel shift will be simplified into a simple shift
due to the simplifications added in D54778.

Differential Revision: https://reviews.llvm.org/D54869

llvm-svn: 347515
2018-11-24 19:00:45 +00:00
Nikita Popov 6e81d421e1 [InstCombine] Simplify funnel shift with zero/undef operand to shift
The following simplifications are implemented:

 * `fshl(X, 0, C) -> shl X, C%BW`
 * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0)
 * `fshl(0, X, C) -> lshr X, BW-C%BW`
 * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0)
 * `fshr(X, 0, C) -> shl X, (BW-C%BW)`
 * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0)
 * `fshr(0, X, C) -> lshr X, C%BW`
 * `fshr(undef, X, C) -> lshr, X, C%BW` (assuming undef = 0)

The simplification is only performed if the shift amount C is constant,
because we can explicitly compute C%BW and BW-C%BW in this case.

Differential Revision: https://reviews.llvm.org/D54778

llvm-svn: 347505
2018-11-23 22:45:08 +00:00
Max Kazantsev e1c2dc27d3 Disable LoopSimplifyCFG terminator folding by default
llvm-svn: 347486
2018-11-23 09:14:53 +00:00
Max Kazantsev cb8e240334 [LoopSimplifyCFG] Don't delete LCSSA Phis
When removing edges, we also update Phi inputs and may end up removing
a Phi if it has only one input. We should not do it for edges that leave the current
loop because these Phis are LCSSA Phis and need to be preserved.

Thanks @dmgreen	for finding this!

Differential Revision: https://reviews.llvm.org/D54841

llvm-svn: 347484
2018-11-23 07:56:47 +00:00
Max Kazantsev b565e6093b [NFC] Assert that all blocks staying in loop are live
llvm-svn: 347458
2018-11-22 12:43:27 +00:00
Max Kazantsev 56a2443024 [NFC] Ensure deterministic order of dead exit blocks
llvm-svn: 347457
2018-11-22 12:33:41 +00:00
Max Kazantsev d9f59f8c80 [NFC] Simplify code by using standard exit blocks collection
llvm-svn: 347454
2018-11-22 10:48:30 +00:00
Fedor Sergeev 59246b6bfe [PM] correcting return value for new-pass-manager version of Scalarizer
Obvious mistake missed during D54695 review.

llvm-svn: 347432
2018-11-21 22:01:19 +00:00
Nikita Popov 6f54fb0052 [MergeFuncs] Generate alias instead of thunk if possible
The MergeFunctions pass was originally intended to emit aliases
instead of thunks where possible (unnamed_addr). However, for a
long time this functionality was behind a flag hardcoded to false,
bitrotted and was eventually removed in r309313.

Originally the functionality was first disabled in r108417 due to
lack of support for aliases in Mach-O. I believe that this is no
longer the case nowadays, but not really familiar with this area.

In the interest of being conservative, this patch reintroduces the
aliasing functionality behind a default disabled -mergefunc-use-aliases
flag.

Differential Revision: https://reviews.llvm.org/D53285

llvm-svn: 347407
2018-11-21 19:37:19 +00:00
Mikael Holmen b6f76002d9 [PM] Port Scalarizer to the new pass manager.
Patch by: markus (Markus Lavin)

Reviewers: chandlerc, fedor.sergeev

Reviewed By: fedor.sergeev

Subscribers: llvm-commits, Ka-Ka, bjope

Differential Revision: https://reviews.llvm.org/D54695

llvm-svn: 347392
2018-11-21 14:00:17 +00:00
Guozhi Wei c21fba1bab [LoopSink] Add preheader to alias set
This patch fixes PR39695.

The original LoopSink only considers memory alias in loop body. But PR39695 shows that instructions following sink candidate in preheader should also be checked. This is a conservative patch, it simply adds whole preheader block to alias set. It may lose some optimization opportunity, but I think that is very rare because: 1 in the most common case st/ld to the same address, the load should already be optimized away. 2 usually preheader is not very large. 

Differential Revision: https://reviews.llvm.org/D54659

llvm-svn: 347325
2018-11-20 16:49:07 +00:00
Max Kazantsev c04b5307d1 Recommit "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches"
The initial version of patch lacked Phi nodes updates in destinations of removed
edges. This version contains this update and tests on this situation.

Differential Revision: https://reviews.llvm.org/D54021

llvm-svn: 347289
2018-11-20 05:43:32 +00:00
Reid Kleckner 994a8451ba [Transforms] Prefer static and avoid namespaces, NFC
Put 'static' on three functions in an anonymous namespace as per our
coding style.

Remove the 'namespace llvm {}' around the .cpp file and explicitly
declare the free function 'llvm::optimizeGlobalCtorsList' in 'llvm::'.
I prefer this style for free functions because the compiler will error
out if the .h and .cpp files don't agree on the function name or
prototype.

llvm-svn: 347269
2018-11-19 22:19:05 +00:00
Benjamin Kramer fdd9b4fc8f Revert "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches"
This reverts commits r347183 & r347184. Crashes while building libxml.

llvm-svn: 347260
2018-11-19 20:01:20 +00:00
Vedant Kumar 238533ec2e [InstCombine] Set debug loc on `mergeStoreIntoSuccessor` phi
Assigning a merged debug location to the `mergeStoreIntoSuccessor` phi
improves backtrace quality.

Fixes llvm.org/PR38083.

llvm-svn: 347257
2018-11-19 19:55:02 +00:00
Vedant Kumar 4de31bba51 [IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlock
Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.

This can be more efficient than using pred_size(), which is a linear
time operation.

We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).

With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.

See llvm.org/PR39702 for a harder but more general way of achieving
similar results.

Differential Revision: https://reviews.llvm.org/D54686

llvm-svn: 347256
2018-11-19 19:54:27 +00:00
Benjamin Kramer 2cad359c91 Revert "[LICM] Make LICM able to hoist phis"
This reverts commit r347190.

llvm-svn: 347225
2018-11-19 16:51:57 +00:00
Anna Thomas 5e9215f02b [LV] Avoid vectorizing unsafe dependencies in uniform address
Summary:
Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.

Fixes PR39653.

Reviewers: Ayal, efriedma

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D54538

llvm-svn: 347220
2018-11-19 15:39:59 +00:00
John Brawn 12c046fba0 [LICM] Make LICM able to hoist phis
The general approach taken is to make note of loop invariant branches, then when
we see something conditional on that branch, such as a phi, we create a copy of
the branch and (empty versions of) its successors and hoist using that.

This has no impact by itself that I've been able to see, as LICM typically
doesn't see such phis as they will have been converted into selects by the time
LICM is run, but once we start doing phi-to-select conversion later it will be
important.

Differential Revision: https://reviews.llvm.org/D52827

llvm-svn: 347190
2018-11-19 11:31:24 +00:00
Max Kazantsev 8e3e33d138 [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches
This patch introduces infrastructure and the simplest case for constant-folding
of branch and switch instructions within loop into unconditional branches.
It is useful as a cleanup for such passes as loop unswitching that sometimes
produce such branches.

Only the simplest case supported in this patch: after the folding, no block
should become dead or stop being part of the loop. Support for more
sophisticated cases will go separately in follow-up patches.

Differential Revision: https://reviews.llvm.org/D54021
Reviewed By: anna

llvm-svn: 347183
2018-11-19 05:54:38 +00:00
Vedant Kumar e7b789b529 [ProfileSummary] Standardize methods and fix comment
Every Analysis pass has a get method that returns a reference of the Result of
the Analysis, for example, BlockFrequencyInfo
&BlockFrequencyInfoWrapperPass::getBFI().  I believe that
ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning
a pointer.

Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock,
respectively.  Most methods use BB as the argument of variable names while
methods usually refer to Basic Blocks as Blocks, instead of BB.  For example,
Function::getEntryBlock, Loop:getExitBlock, etc.

I also fixed one of the comments.

Patch by Rodrigo Caetano Rocha!

Differential Revision: https://reviews.llvm.org/D54669

llvm-svn: 347182
2018-11-19 05:23:16 +00:00
Vedant Kumar 35f504c113 [CorrelatedValuePropagation] Preserve debug locations (PR38178)
Fix all of the missing debug location errors in CVP found by debugify.

This includes the missing-location-after-udiv-truncation case described
in llvm.org/PR38178.

llvm-svn: 347147
2018-11-18 00:29:58 +00:00
Fangrui Song 7570932977 Use llvm::copy. NFC
llvm-svn: 347126
2018-11-17 01:44:25 +00:00
Fedor Sergeev 2e3e224e71 [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with
We need to control exponential behavior of loop-unswitch so we do not get
run-away compilation.

Suggested solution is to introduce a multiplier for an unswitch cost that
makes cost prohibitive as soon as there are too many candidates and too
many sibling loops (meaning we have already started duplicating loops
by unswitching).

It does solve the currently known problem with compile-time degradation
(PR 39544).

Tests are built on top of a recently implemented CHECK-COUNT-<num>
FileCheck directives.

Reviewed By: chandlerc, mkazantsev
Differential Revision: https://reviews.llvm.org/D54223

llvm-svn: 347097
2018-11-16 21:16:43 +00:00
Adrian Prantl 83d87520ed GlobalDCE: Teach isEmptyFunction() to ignore debug intrinsics.
This fixes PR39669.
https://bugs.llvm.org/show_bug.cgi?id=39669

llvm-svn: 347065
2018-11-16 17:47:21 +00:00
Eugene Leviant bf46e7410c [ThinLTO] Internalize readonly globals
An attempt to recommit r346584 after failure on OSX build bot.
Fixed cache key computation in ThinLTOCodeGenerator and added
test case

llvm-svn: 347033
2018-11-16 07:08:00 +00:00
Xin Tong 642c8d3575 [LTO] Load sample profile in LTO link step.
Summary:
Load sample profile in LTO link step.
ThinLTO calls populateModulePassManager to load the profile

Reviewers: tejohnson, davidxl, danielcdh

Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54564

llvm-svn: 346971
2018-11-15 18:06:42 +00:00
Sanjay Patel bc56b2432d [InstCombine] fix rotate narrowing bug for non-pow-2 types
llvm-svn: 346968
2018-11-15 17:19:14 +00:00
Mandeep Singh Grang 0905fc77c1 [InstCombine] Remove a couple of asserts based on incorrect assumptions
Summary:
These asserts are based on the assumption that the order of true/false operands in a select and those in the compare would always be the same.
This fixes PR39595.

Reviewers: craig.topper, spatel, dmgreen

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54359

llvm-svn: 346874
2018-11-14 17:55:07 +00:00
Sanjay Patel 6072842770 [InstCombine] fix formatting for matchBSwap(); NFC
We should have a similar function for matching rotate and/or 
funnel shift, so tidy up the related existing call.

llvm-svn: 346871
2018-11-14 16:03:36 +00:00
Florian Hahn 6df11868b5 [VPlan, SLP] Use SmallPtrSet for Candidates.
This slightly improves the candidate handling in getBest().

llvm-svn: 346870
2018-11-14 15:58:40 +00:00
Florian Hahn 02cb67deb9 [VPlan] Remove LLVM_DEBUG from VPlanSlp::dumpBundle.
The caller should take care of only calling it with debug enabled.

llvm-svn: 346860
2018-11-14 13:33:44 +00:00
Florian Hahn 2eca3728ee [VPlan] Update ifdef.
llvm-svn: 346858
2018-11-14 13:21:26 +00:00
Florian Hahn 09e516c54b [VPlan, SLP] Add simple SLP analysis on top of VPlan.
This patch adds an initial implementation of the look-ahead SLP tree
construction described in 'Look-Ahead SLP: Auto-vectorization in the Presence
of Commutative Operations, CGO 2018 by Vasileios Porpodas, Rodrigo C. O. Rocha,
Luís F. W. Góes'.

It returns an SLP tree represented as VPInstructions, with combined
instructions represented as a single, wider VPInstruction.

This initial version does not support instructions with multiple
different users (either inside or outside the SLP tree) or
non-instruction operands; it won't generate any shuffles or
insertelement instructions.

It also just adds the analysis that builds an SLP tree rooted in a set
of stores. It does not include any cost modeling or memory legality
checks. The plan is to integrate it with VPlan based cost modeling, once
available and to only apply it to operations that can be widened.

A follow-up patch will add a support for replacing instructions in a
VPlan with their SLP counter parts.

Reviewers: Ayal, mssimpso, rengolin, mkuper, hfinkel, hsaito, dcaballe, vporpo, RKSimon, ABataev

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D4949

llvm-svn: 346857
2018-11-14 13:11:49 +00:00
Florian Hahn 505091a8f2 Recommit r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site).
The underlying problem causing the expensive-check failure was fixed in
rL346769.

llvm-svn: 346843
2018-11-14 10:04:30 +00:00
Reid Kleckner 41390b47de Revert r346810 "Preserve loop metadata when splitting exit blocks"
It broke the Windows self-host:
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/1457

llvm-svn: 346823
2018-11-14 01:47:32 +00:00
Sanjay Patel a139564896 [InstCombine] fold funnel shift amount based on demanded bits
The shift amount of a funnel shift is modulo the scalar bitwidth:
http://llvm.org/docs/LangRef.html#llvm-fshl-intrinsic
...so we can use demanded bits analysis on that operand to simplify it
when we have a power-of-2 bitwidth.

This is another step towards canonicalizing {shift/shift/or} to the 
intrinsics in IR.

Differential Revision: https://reviews.llvm.org/D54478

llvm-svn: 346814
2018-11-13 23:27:23 +00:00
Craig Topper 3c87c2a3c5 Preserve loop metadata when splitting exit blocks
LoopUtils.cpp contains a utility that splits an loop exit block, so that the new block contains only edges coming from the loop. In the case of nested loops, the exit path for the inner loop might also be the back-edge of the outer loop. The new block which is inserted on this path, is now a latch for the outer loop, and it needs to hold the loop metadata for the outer loop. (The test case gives a more concrete view of the situation.)

Patch by Chang Lin (clin1)

Differential Revision: https://reviews.llvm.org/D53876

llvm-svn: 346810
2018-11-13 23:06:49 +00:00
Sanjay Patel f8f12272e8 [InstCombine] canonicalize rotate patterns with cmp/select
The cmp+branch variant of this pattern is shown in:
https://bugs.llvm.org/show_bug.cgi?id=34924
...and as discussed there, we probably can't transform
that without a rotate intrinsic. We do have that now
via funnel shift, but we're not quite ready to 
canonicalize IR to that form yet. The case with 'select'
should already be transformed though, so that's this patch.

The sequence with negation followed by masking is what we
use in the backend and partly in clang (though that part 
should be updated).

https://rise4fun.com/Alive/TplC
  %cmp = icmp eq i32 %shamt, 0
  %sub = sub i32 32, %shamt
  %shr = lshr i32 %x, %shamt
  %shl = shl i32 %x, %sub
  %or = or i32 %shr, %shl
  %r = select i1 %cmp, i32 %x, i32 %or
  =>
  %neg = sub i32 0, %shamt
  %masked = and i32 %shamt, 31
  %maskedneg = and i32 %neg, 31
  %shl2 = lshr i32 %x, %masked
  %shr2 = shl i32 %x, %maskedneg
  %r = or i32 %shl2, %shr2

llvm-svn: 346807
2018-11-13 22:47:24 +00:00
Florian Hahn 107d0a8756 [CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater.
This patch updates DuplicateInstructionsInSplitBetween to update a DTU
instead of applying updates to the DT directly.

Given that there only are 2 users, also updated them in this patch to
avoid churn.

I slightly moved the code in CallSiteSplitting around to reduce the
places where we have to pass in DTU. If necessary, I could split those
changes in a separate patch.

This fixes missing DT updates when dealing with musttail calls in
CallSiteSplitting, by using DTU->deleteBB.

Reviewers: junbuml, kuhar, NutshellySima, indutny, brzycki

Reviewed By: NutshellySima

llvm-svn: 346769
2018-11-13 17:54:43 +00:00
Steven Wu fa43892d6f Revert "[ThinLTO] Internalize readonly globals"
This reverts commit 10c84a8f35cae4a9fc421648d9608fccda3925f2.

llvm-svn: 346768
2018-11-13 17:35:04 +00:00
Florian Hahn a4dc7feeea [VPlan] VPlan version of InterleavedAccessInfo.
This patch turns InterleaveGroup into a template with the instruction type
being a template parameter. It also adds a VPInterleavedAccessInfo class, which
only contains a mapping from VPInstructions to their respective InterleaveGroup.
As we do not have access to scalar evolution in VPlan, we can re-use
convert InterleavedAccessInfo to VPInterleavedAccess info.


Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D49489

llvm-svn: 346758
2018-11-13 15:58:18 +00:00
Zhizhou Yang cc633af55b Introduce DebugCounter into ConstProp pass
Summary:
This patch introduces DebugCounter into ConstProp pass at per-transformation level.

It will provide an option to skip first n or stop after n transformations for the whole ConstProp pass.

This will make debug easier for the pass, also providing chance to do transformation level bisecting.

Reviewers: davide, fhahn

Reviewed By: fhahn

Subscribers: llozano, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D50094

llvm-svn: 346720
2018-11-13 00:31:22 +00:00
Sanjay Patel 35b1c2d19d [InstCombine] narrow width of rotate patterns, part 3
This is a longer variant for the pattern handled in
rL346713 
This one includes zexts. 

Eventually, we should canonicalize all rotate patterns 
to the funnel shift intrinsics, but we need a bit more
infrastructure to make sure the vectorizers handle those
intrinsics as well as the shift+logic ops.

https://rise4fun.com/Alive/FMn

Name: narrow rotateright
  %neg = sub i8 0, %shamt
  %rshamt = and i8 %shamt, 7
  %rshamtconv = zext i8 %rshamt to i32
  %lshamt = and i8 %neg, 7
  %lshamtconv = zext i8 %lshamt to i32
  %conv = zext i8 %x to i32
  %shr = lshr i32 %conv, %rshamtconv
  %shl = shl i32 %conv, %lshamtconv
  %or = or i32 %shl, %shr
  %r = trunc i32 %or to i8
  =>
  %maskedShAmt2 = and i8 %shamt, 7
  %negShAmt2 = sub i8 0, %shamt
  %maskedNegShAmt2 = and i8 %negShAmt2, 7
  %shl2 = lshr i8 %x, %maskedShAmt2
  %shr2 = shl i8 %x, %maskedNegShAmt2
  %r = or i8 %shl2, %shr2
llvm-svn: 346716
2018-11-12 22:52:25 +00:00
Sanjay Patel 98e427ccf2 [InstCombine] narrow width of rotate patterns, part 2 (PR39624)
The sub-pattern for the shift amount in a rotate can take on
several different forms, and there's apparently no way to
canonicalize those without seeing the entire rotate sequence.

This is the form noted in:
https://bugs.llvm.org/show_bug.cgi?id=39624

https://rise4fun.com/Alive/qnT

  %zx = zext i8 %x to i32
  %maskedShAmt = and i32 %shAmt, 7
  %shl = shl i32 %zx, %maskedShAmt
  %negShAmt = sub i32 0, %shAmt
  %maskedNegShAmt = and i32 %negShAmt, 7
  %shr = lshr i32 %zx, %maskedNegShAmt
  %rot = or i32 %shl, %shr
  %r = trunc i32 %rot to i8
  =>
  %truncShAmt = trunc i32 %shAmt to i8
  %maskedShAmt2 = and i8 %truncShAmt, 7
  %shl2 = shl i8 %x, %maskedShAmt2
  %negShAmt2 = sub i8 0, %truncShAmt
  %maskedNegShAmt2 = and i8 %negShAmt2, 7
  %shr2 = lshr i8 %x, %maskedNegShAmt2
  %r = or i8 %shl2, %shr2

llvm-svn: 346713
2018-11-12 22:11:09 +00:00
Sanjay Patel ceab2329b6 [InstCombine] refactor code for matching shift amount of a rotate; NFC
As shown in existing test cases and with:
https://bugs.llvm.org/show_bug.cgi?id=39624
...we're missing at least 2 more patterns for rotate narrowing.

llvm-svn: 346711
2018-11-12 22:00:00 +00:00
Philip Reames b8d8db30ea [GC][InstCombine] Fix a potential iteration issue
Noticed via inspection.  Appears to be largely innocious in practice, but slight code change could have resulted in either visit order dependent missed optimizations or infinite loops.  May be a minor compile time problem today.

llvm-svn: 346698
2018-11-12 20:00:53 +00:00
Simon Pilgrim 631f2bf51e [CostModel] Add more realistic SK_ExtractSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.

llvm-svn: 346656
2018-11-12 14:25:23 +00:00
Max Kazantsev 7d49a3a816 [LICM] Hoist guards from non-header blocks
This patch relaxes overconservative checks on whether or not we could write
memory before we execute an instruction. This allows us to hoist guards out of
loops even if they are not in the header block.

Differential Revision: https://reviews.llvm.org/D50891
Reviewed By: fedor.sergeev

llvm-svn: 346643
2018-11-12 09:29:58 +00:00
Calixte Denizet c6fabeac11 [GCOV] Add options to filter files which must be instrumented.
Summary:
When making code coverage, a lot of files (like the ones coming from /usr/include) are removed when post-processing gcno/gcda so finally they doen't need to be instrumented nor to appear in gcno/gcda.
The goal of the patch is to be able to filter the files we want to instrument, there are several advantages to do that:
- improve speed (no overhead due to instrumentation on files we don't care)
- reduce gcno/gcda size
- it gives the possibility to easily instrument only few files (e.g. ones modified in a patch) without changing the build system
- need to accept this patch to be enabled in clang: https://reviews.llvm.org/D52034

Reviewers: marco-c, vsk

Reviewed By: marco-c

Subscribers: llvm-commits, sylvestre.ledru

Differential Revision: https://reviews.llvm.org/D52033

llvm-svn: 346641
2018-11-12 09:01:43 +00:00
Florian Hahn 9026d4ee9b [IPSCCP,PM] Preserve PDT in the new pass manager.
Reviewers: kuhar, chandlerc, NutshellySima, brzycki

Reviewed By: NutshellySima, brzycki

Differential Revision: https://reviews.llvm.org/D54317

llvm-svn: 346618
2018-11-11 20:22:45 +00:00
Sanjay Patel 4a12aa9791 [InstCombine] simplify code for merging stores; NFCI
llvm-svn: 346596
2018-11-10 20:29:25 +00:00
Eugene Leviant be8d19967a [ThinLTO] Internalize readonly globals
This patch allows internalising globals if all accesses to them
(from live functions) are from non-volatile load instructions

Differential revision: https://reviews.llvm.org/D49362

llvm-svn: 346584
2018-11-10 08:31:21 +00:00
Eli Friedman 15930bf352 [JumpThreading] Fix exponential time algorithm computing known values.
ComputeValueKnownInPredecessors has a "visited" set to prevent infinite
loops, since a value can be visited more than once.  However, the
implementation didn't prevent the algorithm from taking exponential
time. Instead of removing elements from the RecursionSet one at a time,
we should keep around the whole set until
ComputeValueKnownInPredecessors finishes, then discard it.

The testcase is synthetic because I was having trouble effectively
reducing the original.  But it's basically the same idea.

Instead of failing, we could theoretically cache the result instead.
But I don't think it would help substantially in practice.

Differential Revision: https://reviews.llvm.org/D54239

llvm-svn: 346562
2018-11-09 22:35:26 +00:00
Florian Hahn 9f878e9bae Revert r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site).
This cause a failure with EXPENSIVE_CHECKS

llvm-svn: 346492
2018-11-09 13:28:58 +00:00
Florian Hahn a1062f4b68 [IPSCCP,PM] Preserve DT in the new pass manager.
After D45330, Dominators are required for IPSCCP and can be preserved.

This patch preserves DominatorTreeAnalysis in the new pass manager. AFAIK the legacy pass manager cannot preserve function analysis required by a module analysis.

Reviewers: davide, dberlin, chandlerc, efriedma, kuhar, NutshellySima

Reviewed By: chandlerc, kuhar, NutshellySima

Differential Revision: https://reviews.llvm.org/D47259

llvm-svn: 346486
2018-11-09 11:52:27 +00:00
Florian Hahn 52578f95c9 [CallSiteSplitting] Only record conditions up to the IDom(call site).
We can stop recording conditions once we reached the immediate dominator
for the block containing the call site. Conditions in predecessors of the
that node will be the same for all paths to the call site and splitting
is not beneficial.

This patch makes CallSiteSplitting dependent on the DT anlysis. because
the immediate dominators seem to be the easiest way of finding the node
to stop at.

I had to update some exiting tests, because they were checking for
conditions that were true/false on all paths to the call site. Those
should now be handled by instcombine/ipsccp.

Reviewers: davide, junbuml

Reviewed By: junbuml

Differential Revision: https://reviews.llvm.org/D44627

llvm-svn: 346483
2018-11-09 10:23:46 +00:00
Carlos Alberto Enciso fa9cf89734 [DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG.
In SimplifyCFG when given a conditional branch that goes to BB1 and BB2, the hoisted common terminator instruction in the two blocks, caused debug line records associated with subsequent select instructions to become ambiguous. It causes the debugger to display unreachable source lines.

Differential Revision: https://reviews.llvm.org/D53390

llvm-svn: 346481
2018-11-09 09:42:10 +00:00
Max Kazantsev 9883d1e1a7 [NFC] Add utility function for SafetyInfo updates for moveBefore
llvm-svn: 346472
2018-11-09 05:39:04 +00:00
Florian Hahn a684a99441 [LoopInterchange] Support reductions across inner and outer loop.
This patch adds logic to detect reductions across the inner and outer
loop by following the incoming values of PHI nodes in the outer loop. If
the incoming values take part in a reduction in the inner loop or come
from outside the outer loop, we found a reduction spanning across inner
and outer loop.

With this change, ~10% more loops are interchanged in the LLVM
test-suite + SPEC2006.

Fixes https://bugs.llvm.org/show_bug.cgi?id=30472

Reviewers: mcrosier, efriedma, karthikthecool, davide, hfinkel, dmgreen

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D43245

llvm-svn: 346438
2018-11-08 20:44:19 +00:00
Pirama Arumuga Nainar e61652a384 [LTO] Drop non-prevailing definitions only if linkage is not local or appending
Summary:
This fixes PR 37422

In ELF, non-weak symbols can also be non-prevailing.  In this particular
PR, the __llvm_profile_* symbols are non-prevailing but weren't getting
dropped - causing multiply-defined errors with lld.

Also add a test, strong_non_prevailing.ll, to ensure that multiple
copies of a strong symbol are dropped.

To fix the test regressions exposed by this fix,
- do not mark prevailing copies for symbols with 'appending' linkage.
There's no one prevailing copy for such symbols.
- fix the prevailing version in dead-strip-fulllto.ll
- explicitly pass exported symbols to llvm-lto in fumcimport.ll and
funcimport_var.ll

Reviewers: tejohnson, pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith,
dang, srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D54125

llvm-svn: 346436
2018-11-08 20:10:07 +00:00
Tom Stellard 28d662164d InstCombine: Avoid introducing poison values when lowering llvm.amdgcn.[us]bfe
Summary:
When the 3rd argument to these intrinsics is zero, lowering them
to shift instructions produces poison values, since we end up with
shift amounts equal to the number of bits in the shifted value.  This
means we can only lower these intrinsics if we can prove that the
3rd argument is not zero.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: bnieuwenhuizen, jvesely, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D53739

llvm-svn: 346422
2018-11-08 17:57:57 +00:00
Vedant Kumar d6699423f1 [CodeExtractor] Mark functions noreturn when applicable
This eliminates the outlining penalty for llvm.trap/unreachable, because
callers no longer have to emit cleanup/ret instructions after calling an
outlined `noreturn` function.

rdar://45523626

llvm-svn: 346421
2018-11-08 17:57:09 +00:00
Max Kazantsev 266c087b9d Return "[IndVars] Smart hard uses detection"
The patch has been reverted because it ended up prohibiting propagation
of a constant to exit value. For such values, we should skip all checks
related to hard uses because propagating a constant is always profitable.

Differential Revision: https://reviews.llvm.org/D53691

llvm-svn: 346397
2018-11-08 11:54:35 +00:00
Gil Rapaport 7b88bab386 [LSR] Combine unfolded offset into invariant register
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.

This commit fixes a bug in the original patch (committed at r345114, reverted
at r345123).

Differential Revision: https://reviews.llvm.org/D51861

llvm-svn: 346390
2018-11-08 09:01:19 +00:00
whitequark 73cb978495 [MergeFuncs] Improve ordering of equal functions
Summary:
MergeFunctions currently tries to process strong functions before
weak functions, because weak functions can simply call strong
functions, while a strong/weak function cannot call a weak function
(a backing strong function is needed).

This patch additionally tries to process external functions before
local functions, because we definitely have to keep the external
function, but may be able to drop the local one (and definitely
can if it is also unnamed_addr).

Unfortunately, this exposes an existing bug in the implementation:
The FnTree and FNodesInTree structures can currently go out of
sync in the case where two weak functions are merged, because the
function in FnTree/FNodesInTree is RAUWed. This leaves it behind in
FnTree (this is intended, as it is the strong backing function which
should be used for further merges), while it is replaced in
FNodesInTree (this is not intended).

This is fixed by switching FNodesInTree from using a ValueMap to
using a DenseMap of AssertingVH.

This exposes another minor issue: Currently FNodesInTree is not
cleared after MergeFunctions finishes running. Currently, this is
potentially dangerous (e.g. if something else wants to RAUW a function
with a non-function), but at the very least it is unnecessary/inefficient.
After the change to use AssertingVH it becomes more problematic,
because there are certainly passes that remove functions.

This issue is fixed by clearing FNodesInTree at the end of the pass.

Reviewers: jfb, whitequark

Reviewed By: whitequark

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D53271

llvm-svn: 346386
2018-11-08 03:58:01 +00:00
whitequark 3580ac6125 [MergeFuncs] Call removeUsers() prior to unnamed_addr RAUW
Summary:
For unnamed_addr functions we RAUW instead of only replacing direct callers. However, functions in which replacements were performed currently are not added back to the worklist, resulting in missed merging opportunities.

Fix this by calling removeUsers() prior to RAUW.

Reviewers: jfb, whitequark

Reviewed By: whitequark

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D53262

llvm-svn: 346385
2018-11-08 03:57:55 +00:00
Reid Kleckner b41b372171 [sancov] Put .SCOV* sections into the right comdat groups on COFF
Avoids linker errors about relocations against discarded sections.

This was uncovered during the Chromium clang roll here:
https://chromium-review.googlesource.com/c/chromium/src/+/1321863#message-717516acfcf829176f6a2f50980f7a4bdd66469a

After this change, Chromium's libGLESv2 links successfully for me.

Reviewers: metzman, hans, morehouse

Differential Revision: https://reviews.llvm.org/D54232

llvm-svn: 346381
2018-11-08 00:57:33 +00:00
Rong Xu fb4bcc452c [PGO] Exit early if all count values are zero
If all the edge counts for a function are zero, skip count population and
annotation, as nothing will happen. This can save some compile time.

Differential Revision: https://reviews.llvm.org/D54212

llvm-svn: 346370
2018-11-07 23:51:20 +00:00
Fedor Sergeev f9a02a7006 [SimpleLoopUnswitch] partial unswitch needs to be careful when replacing invariants with constants
When partial unswitch operates on multiple conditions at once, .e.g:
   if (Cond1 || Cond2 || NonInv) ...

it should infer (and replace) values for individual conditions only on one
side of unswitch and not another.

More precisely only these derivations hold true:
   (Cond1 || Cond2) == false  =>  Cond1 == Cond2 == false
   (Cond1 && Cond2) == true   =>  Cond1 == Cond2 == true

By the way we organize unswitching it means only replacing on "continue" blocks
and never on "unswitched" ones. Since trivial unswitch does not have "unswitched"
blocks it does not have this problem.

Fixes PR 39568.

Reviewers: chandlerc, asbirlea
Differential Revision: https://reviews.llvm.org/D54211

llvm-svn: 346350
2018-11-07 20:05:11 +00:00
Mandeep Singh Grang d47d188b6f [LoopSink] Do not sink instructions into non-cold blocks
Summary: This fixes PR39570.

Reviewers: danielcdh, rnk, bkramer

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54181

llvm-svn: 346337
2018-11-07 18:26:24 +00:00
Florian Hahn ac86038b40 [NewGVN] Make sure we do not add a user to itself.
If we simplify an instruction to itself, we do not need to add a user to
itself. For congruence classes with a defining expression, we already
use a similar logic.

Fixes PR38259.

Reviewers: davide, efriedma, mcrosier

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D51168

llvm-svn: 346335
2018-11-07 17:20:07 +00:00
Sanjay Patel 57a08b3343 [InstCombine] propagate FMF for fcmp+fabs folds
By morphing the instruction rather than deleting and creating a new one,
we retain fast-math-flags and potentially other metadata (profile info?).

llvm-svn: 346331
2018-11-07 16:15:01 +00:00
Sanjay Patel bb521e63af [InstCombine] peek through fabs() when checking isnan()
That should be the end of the missing cases for this fold.
See earlier patches in this series:
rL346321
rL346324

llvm-svn: 346327
2018-11-07 15:44:26 +00:00
Sanjay Patel fa5f146872 [InstCombine] add folds for fcmp Pred fabs(X), 0.0
Similar to rL346321, we had folds for the ordered
versions of these compares already, so add the
unordered siblings for completeness.

llvm-svn: 346324
2018-11-07 15:33:03 +00:00
James Y Knight 72f76bf230 Add support for llvm.is.constant intrinsic (PR4898)
This adds the llvm-side support for post-inlining evaluation of the
__builtin_constant_p GCC intrinsic.

Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call
to a function where canConstantFoldTo returns true, and one of the
arguments is a struct.

Updated from patch initially by Janusz Sobczak.

Differential Revision: https://reviews.llvm.org/D4276

llvm-svn: 346322
2018-11-07 15:24:12 +00:00
Sanjay Patel 76faf5145d [InstCombine] add fold for fabs(X) u< 0.0
The sibling fold for 'oge' --> 'ord' was already here,
but this half was missing. 

The result of fabs() must be positive or nan, so asking 
if the result is negative or nan is the same as asking 
if the result is nan.

This is another step towards fixing:
https://bugs.llvm.org/show_bug.cgi?id=39475

llvm-svn: 346321
2018-11-07 15:11:32 +00:00
Sanjay Patel de58e93666 fix typos aggressively; NFC
llvm-svn: 346316
2018-11-07 14:35:36 +00:00
Sanjay Patel 7552d0d2e6 [InstCombine] do not shrink switch conditions to illegal types (PR29009)
This patch makes shrinking switch conditions less aggressive which was introduced by:
rL274233

Note that we have 2 new bugs to track potential follow-ups that might have solved PR29009
in different ways:
https://bugs.llvm.org/show_bug.cgi?id=39569
https://bugs.llvm.org/show_bug.cgi?id=39578

Patch by:
@dendibakh (Denis Bakhvalov)

Differential Revision: https://reviews.llvm.org/D54115

llvm-svn: 346315
2018-11-07 14:12:41 +00:00
Calixte Denizet c3bed1e8e6 [GCOV] Flush counters before to avoid counting the execution before fork twice and for exec** functions we must flush before the call
Summary:
This is replacement for patch in https://reviews.llvm.org/D49460.
When we fork, the counters are duplicate as they're and so the values are finally wrong when writing gcda for parent and child.
So just before to fork, we flush the counters and so the parent and the child have new counters set to zero.
For exec** functions, we need to flush before the call to have some data.

Reviewers: vsk, davidxl, marco-c

Reviewed By: marco-c

Subscribers: llvm-commits, sylvestre.ledru, marco-c

Differential Revision: https://reviews.llvm.org/D53593

llvm-svn: 346313
2018-11-07 13:49:17 +00:00
Sanjay Patel d1172a0c20 [IR] add optional parameter for copying IR flags to compare instructions
As shown, this is used to eliminate redundant code in InstCombine,
and there are more cases where we should be using this pattern, but
we're currently unintentionally dropping flags. 

llvm-svn: 346282
2018-11-07 00:00:42 +00:00
Teresa Johnson cb397461e1 [ThinLTO] Split NotEligibleToImport into legality and inlinability flags
Summary:
The NotEligibleToImport flag on the GlobalValueSummary was set if it
isn't legal to import (e.g. because it references unpromotable locals)
and when it can't be inlined (in which case importing is pointless).

I split out the inlinable piece into a separate flag on the
FunctionSummary (doesn't make sense for aliases or global variables),
because in the future we may want to import for reasons other than
inlining.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53345

llvm-svn: 346261
2018-11-06 19:41:35 +00:00
Vedant Kumar 1e209e284f [CodeExtractor] Do not extract calls to eh_typeid_for (PR39545)
The lowering for a call to eh_typeid_for changes when it's moved from
one function to another.

There are several proposals for fixing this issue in llvm.org/PR39545.
Until some solution is in place, do not allow CodeExtractor to extract
calls to eh_typeid_for, as that results in serious miscompilations.

llvm-svn: 346256
2018-11-06 19:06:08 +00:00
Vedant Kumar 09b7aa443d [CodeExtractor] Erase use-without-def debug intrinsics in parent func
When CodeExtractor moves instructions to a new function, debug
intrinsics referring to those instructions within the parent function
become invalid.

This results in the same verifier failure which motivated r344545, about
function-local metadata being used in the wrong function.

llvm-svn: 346255
2018-11-06 19:05:53 +00:00
Sanjay Patel 724014adde [InstCombine] allow vector types for fcmp+fpext fold
llvm-svn: 346245
2018-11-06 17:20:20 +00:00
Sanjay Patel 46bf3922c1 [InstCombine] propagate fast-math-flags when folding fcmp+fpext, part 2
llvm-svn: 346242
2018-11-06 16:45:27 +00:00
Sanjay Patel 7c3ee4da42 [InstCombine] rearrange code for fcmp+fpext; NFCI
llvm-svn: 346241
2018-11-06 16:37:35 +00:00
Sanjay Patel 1b85f00201 [InstCombine] propagate fast-math-flags when folding fcmp+fpext
llvm-svn: 346240
2018-11-06 16:23:03 +00:00
Sanjay Patel 2fd5b0ebfb [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2
llvm-svn: 346238
2018-11-06 15:58:57 +00:00
Sanjay Patel 05e70fb978 [InstCombine] reduce code; NFC
llvm-svn: 346235
2018-11-06 15:53:58 +00:00
Sanjay Patel 70282a0501 [InstCombine] propagate fast-math-flags when folding fcmp+fneg
This is another part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

This might be enough to fix that particular issue, but as noted
with the FIXME, we're still dropping FMF on other folds around here.

llvm-svn: 346234
2018-11-06 15:49:45 +00:00
Simon Pilgrim c1da5f757e [InstCombine] Ensure nested shifts are in range (OSS-Fuzz #9880)
llvm-svn: 346225
2018-11-06 11:28:22 +00:00
Max Kazantsev 0042c06e8b [LICM] Remove too conservative IsMustExecute variable
LICM relies on variable `MustExecute` which is conservatively set to `false`
in all non-headers. It is used when we decide whether or not we want to hoist
an instruction or a guard.

For the guards, it might be too conservative to use this variable, we can
instead use a more precise logic from LoopSafetyInfo. Currently it is only NFC
because `IsMemoryNotModified` is also conservatively set to `false` for all
non-headers, and we cannot hoist guards from non-header blocks. However once we
give up using `IsMemoryNotModified` and use a smarter check instead, this will
allow us to hoist guards from all mustexecute non-header blocks.

Differential Revision: https://reviews.llvm.org/D50888
Reveiwed By: fedor.sergeev

llvm-svn: 346204
2018-11-06 04:17:40 +00:00
Max Kazantsev 69f6dfa0f8 [LICM] Use ICFLoopSafetyInfo in LICM
This patch makes LICM use `ICFLoopSafetyInfo` that is a smarter version
of LoopSafetyInfo that leverages power of Implicit Control Flow Tracking
to keep track of throwing instructions and give less pessimistic answers
to queries related to throws.

The ICFLoopSafetyInfo itself has been introduced in rL344601. This patch
enables it in LICM only.

Differential Revision: https://reviews.llvm.org/D50377
Reviewed By: apilipenko

llvm-svn: 346201
2018-11-06 02:44:49 +00:00
Max Kazantsev e059f4452b Revert "[IndVars] Smart hard uses detection"
This reverts commit 2f425e9c7946b9d74e64ebbfa33c1caa36914402.

It seems that the check that we still should do the transform if we
know the result is constant is missing in this code. So the logic that
has been deleted by this change is still sometimes accidentally useful.
I revert the change to see what can be done about it. The motivating
case is the following:

@Y = global [400 x i16] zeroinitializer, align 1

define i16 @foo() {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %i = phi i16 [ 0, %entry ], [ %inc, %for.body ]

  %arrayidx = getelementptr inbounds [400 x i16], [400 x i16]* @Y, i16 0, i16 %i
  store i16 0, i16* %arrayidx, align 1
  %inc = add nuw nsw i16 %i, 1
  %cmp = icmp ult i16 %inc, 400
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  %inc.lcssa = phi i16 [ %inc, %for.body ]
  ret i16 %inc.lcssa
}

We should be able to figure out that the result is constant, but the patch
breaks it.

Differential Revision: https://reviews.llvm.org/D51584

llvm-svn: 346198
2018-11-06 02:02:05 +00:00
Sanjay Patel 1440107821 [InstSimplify] fold select (fcmp X, Y), X, Y
This is NFCI for InstCombine because it calls InstSimplify, 
so I left the tests for this transform there. As noted in
the code comment, we can allow this fold more often by using
FMF and/or value tracking.

llvm-svn: 346169
2018-11-05 21:51:39 +00:00
Taewook Oh 2b7ae47ccb [MergeICmps] Do not perform the transformation if GEP is used outside of block
Summary:
This patch prevents MergeICmps to performn the transformation if the address operand GEP of the load instruction has a use outside of the load's parent block. Without this patch, compiler crashes with the given test case because the use of `%first.i` is still around when the basic block is erased from https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/MergeICmps.cpp#L620. I think checking `isUsedOutsideOfBlock` with `GEP` is the original intention of the code, as the checking for `LoadI` is already performed in the same function.

This patch is incomplete though, as this makes the pass overly conservative and fails the test `tuple-four-int8.ll`. I believe what needs to be done is checking if GEP has a use outside of block that is not the part of "Comparisons" chain. Submit the patch as of now to prevent compiler crash.

Reviewers: courbet, trentxintong

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54089

llvm-svn: 346151
2018-11-05 18:16:32 +00:00
Sanjay Patel c26fd1e772 [InstCombine] canonicalize -0.0 to +0.0 in fcmp
As stated in IEEE-754 and discussed in:
https://bugs.llvm.org/show_bug.cgi?id=38086
...the sign of zero does not affect any FP compare predicate.

Known regressions were fixed with:
rL346097 (D54001)
rL346143

The transform will help reduce pattern-matching complexity to solve:
https://bugs.llvm.org/show_bug.cgi?id=39475
...as well as improve CSE and codegen (a zero constant is almost always
easier to produce than 0x80..00).

llvm-svn: 346147
2018-11-05 17:26:42 +00:00
Sanjay Patel 87aa10062c [InstCombine] loosen FP 0.0 constraint for fcmp+select substitution
It looks like we correctly removed edge cases with 0.0 from D50714,
but we were a bit conservative because getBinOpIdentity() doesn't
distinguish between +0.0 and -0.0 and 'nsz' is effectively always
true for fcmp (see discussion in:
https://bugs.llvm.org/show_bug.cgi?id=38086

Without this change, we would get regressions by canonicalizing
to +0.0 in all fcmp, and that's a step towards solving:
https://bugs.llvm.org/show_bug.cgi?id=39475

llvm-svn: 346143
2018-11-05 16:50:44 +00:00
Vedant Kumar d2a895a972 [HotColdSplitting] Use TTI to inform outlining threshold
Using TargetTransformInfo allows the splitting pass to factor in the
code size cost of instructions as it decides whether or not outlining is
profitable.

This did not regress the overall amount of outlining seen on the handful
of internal frameworks I tested.

Thanks to Jun Bum Lim for suggesting this!

Differential Revision: https://reviews.llvm.org/D53835

llvm-svn: 346108
2018-11-04 23:11:57 +00:00
Jordan Rupprecht 80e7e86c29 [DebugInfo][InstMerge] Fix -debugify for phi node created by -mldst-motion
Summary:
-mldst-motion creates a new phi node without any debug info. Use the merged debug location from the incoming stores to fix this.

Fixes PR38177. The test case here is (somewhat) simplified from:

```
struct S {
  int foo;
  void fn(int bar);
};
void S::fn(int bar) {
  if (bar)
    foo = 1;
  else
    foo = 0;
}
```

Reviewers: dblaikie, gbedwell, aprantl, vsk

Reviewed By: vsk

Subscribers: vsk, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D54019

llvm-svn: 346027
2018-11-02 18:25:41 +00:00
Ayal Zaks 45a3ca7be7 [LV] Avoid vectorizing loops under opt for size that involve SCEV checks
Fix PR39417, PR39497

The loop vectorizer may generate runtime SCEV checks for overflow and stride==1
cases, leading to execution of original scalar loop. The latter is forbidden
when optimizing for size. An assert introduced in r344743 triggered the above
PR's showing it does happen. This patch fixes this behavior by preventing
vectorization in such cases.

Differential Revision: https://reviews.llvm.org/D53612

llvm-svn: 345959
2018-11-02 09:16:12 +00:00
Max Kazantsev 872bb74b0a [NFC][LICM] Factor out instruction erasing logic
This patch factors out a function that makes all required updates
whenever an instruction gets erased.

Differential Revision: https://reviews.llvm.org/D54011
Reviewed By: apilipenko

llvm-svn: 345914
2018-11-02 00:21:45 +00:00
Florian Hahn de4f774783 [LoopInterchange] Fix unused variables in release build
llvm-svn: 345881
2018-11-01 19:51:13 +00:00
Florian Hahn c8bd6ea35e [LoopInterchange] Remove support for inner-only reductions.
Inner-loop only reductions require additional checks to make sure they
form a load-phi-store cycle across inner and outer loop. Otherwise the
reduction value is not properly preserved. This patch disables
interchanging such loops for now, as it causes miscompiles in some
cases and it seems to apply only for a tiny amount of loops. Across the
test-suite, SPEC2000 and SPEC2006, 61 instead of 62 loops are
interchange with inner loop reduction support disabled. With
-loop-interchange-threshold=-1000, 3256 instead of 3267.

See the discussion and history of D53027 for an outline of how such legality
checks could look like.

Reviewers: efriedma, mcrosier, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D53027

llvm-svn: 345877
2018-11-01 19:25:00 +00:00
Reid Kleckner 3f756fbabe Remove unnecessary fallthrough annotation after unreachable
Clang's -Wimplicit-fallthrough implementation warns on this. I built
clang with GCC 7.3 in +asserts and -asserts mode, and GCC doesn't warn
on this in either configuration. I think it is unnecessary. I separated
it from the large mechanical patch (https://reviews.llvm.org/D53950) in
case I am wrong and it has to be reverted.

llvm-svn: 345876
2018-11-01 19:11:05 +00:00
Max Kazantsev 46955b58ee [NFC] Reorganize code to prepare it for more transforms
llvm-svn: 345820
2018-11-01 09:42:50 +00:00
Max Kazantsev 3d347bf545 [IndVars] Smart hard uses detection
When rewriting loop exit values, IndVars considers this transform not profitable if
the loop instruction has a loop user which it believes cannot be optimized away.
In current implementation only calls that immediately use the instruction are considered
as such.

This patch extends the definition of "hard" users to any side-effecting instructions
(which usually cannot be optimized away from the loop) and also allows handling
of not just immediate users, but use chains.

Differentlai Revision: https://reviews.llvm.org/D51584
Reviewed By: etherzhhb

llvm-svn: 345814
2018-11-01 06:47:01 +00:00
Volkan Keles 3ca146d083 [InstCombine] Combine nested min/max intrinsics with constants
Reviewers: arsenm, spatel

Reviewed By: spatel

Subscribers: lebedev.ri, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D53774

llvm-svn: 345751
2018-10-31 17:50:52 +00:00
Sanjay Patel 1c254c6716 [InstCombine] refactor fabs+fcmp fold; NFC
Also, remove/replace/minimize/enhance the tests for this fold.
The code drops FMF, so it needs more tests and at least 1 fix.

llvm-svn: 345734
2018-10-31 16:34:43 +00:00
Sanjay Patel b9fe3fbb57 [InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC
The 'OLT' case was updated at rL266175, so I assume it was just an
oversight that 'UGE' was not included because that patch handled
both predicates in InstSimplify.

llvm-svn: 345727
2018-10-31 15:31:45 +00:00
Sanjay Patel 85cba3b6fb [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative
This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
...but given the current implementation (no FMF on casts), this is likely the only way to predicate the 
transform.

This is part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

Differential Revision: https://reviews.llvm.org/D53874

llvm-svn: 345725
2018-10-31 14:57:23 +00:00
Fedor Sergeev 412ed34744 [LoopUnroll] allow customization for new-pass-manager version of LoopUnroll
Unlike its legacy counterpart new pass manager's LoopUnrollPass does
not provide any means to select which flavors of unroll to run
(runtime, peeling, partial), relying on global defaults.

In some cases having ability to run a restricted LoopUnroll that
does more than LoopFullUnroll is needed.

Introduced LoopUnrollOptions to select optional unroll behaviors.
Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing.

Reviewers: chandlerc, tejohnson
Differential Revision: https://reviews.llvm.org/D53440

llvm-svn: 345723
2018-10-31 14:33:14 +00:00
Max Kazantsev 541f824d32 [IndVars] Strengthen restricton in rewriteLoopExitValues
For some unclear reason rewriteLoopExitValues considers recalculation
after the loop profitable if it has some "soft uses" outside the loop (i.e. any
use other than call and return), even if we have proved that it has a user inside
the loop which we think will not be optimized away.

There is no existing unit test that would explain this. This patch provides an
example when rematerialisation of exit value is not profitable but it passes
this check due to presence of a "soft use" outside the loop.

It makes no sense to recalculate value on exit if we are going to compute it
due to some irremovable within the loop. This patch disallows applying this
transform in the described situation.

Differential Revision: https://reviews.llvm.org/D51581
Reviewed By: etherzhhb

llvm-svn: 345708
2018-10-31 10:30:50 +00:00
Dorit Nuzman 34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Alexander Potapenko c1c4c9a494 [MSan] another take at instrumenting inline assembly - now with calls
Turns out it's not always possible to figure out whether an asm()
statement argument points to a valid memory region.
One example would be per-CPU objects in the Linux kernel, for which the
addresses are calculated using the FS register and a small offset in the
.data..percpu section.
To avoid pulling all sorts of checks into the instrumentation, we replace
actual checking/unpoisoning code with calls to
msan_instrument_asm_load(ptr, size) and
msan_instrument_asm_store(ptr, size) functions in the runtime.

This patch doesn't implement the runtime hooks in compiler-rt, as there's
been no demand in assembly instrumentation for userspace apps so far.

llvm-svn: 345702
2018-10-31 09:32:47 +00:00
Matthias Braun 9fd397b423 ADT/STLExtras: Introduce llvm::empty; NFC
This is modeled after C++17 std::empty().

Differential Revision: https://reviews.llvm.org/D53909

llvm-svn: 345679
2018-10-31 00:23:23 +00:00
Sanjay Patel 4c39dfc91e [InstCombine] use 'match' to reduce code; NFC
llvm-svn: 345647
2018-10-30 20:52:25 +00:00
Quentin Colombet 900678227c [InstCombine] Teach the move free before null test opti how to deal with noop casts
InstCombine features an optimization that essentially replaces:
if (a)
  free(a)
into:
free(a)

Right now, this optimization is gated by the minsize attribute and therefore
we only perform it if we can prove that we are going to be able to eliminate
the branch and the destination block.

However when casts are involved the optimization would fail to apply, because
the optimization was not smart enough to realize that it is possible to also
move the casts away from the destination block and that is harmless to the
performance since they are just noops.
E.g.,
foo(int *a)
if (a)
  free((char*)a)

Wouldn't be optimized by instcombine, because
- We would refuse to hoist the `bitcast i32* %a to i8` in the source block
- We would fail to see that `bitcast i32* %a to i8` and %a are the same value.

This patch fixes both these problems:
- It teaches the pattern matching of the comparison how to look
  through casts.
- It checks that whether the additional instruction in the destination block
  can be hoisted and are harmless performance-wise.
- It hoists all the code of the destination block in the source block.

Differential Revision: D53356

llvm-svn: 345644
2018-10-30 20:51:04 +00:00
Calixte Denizet 38d50545fe [GCOV] Function counters are wrong when on one line
Summary:
After commit https://reviews.llvm.org/rL344228, the function definitions have a counter but when on one line the counter is wrong (e.g. void foo() { })
I added a test in: https://reviews.llvm.org/D53601

Reviewers: marco-c

Reviewed By: marco-c

Subscribers: llvm-commits, sylvestre.ledru

Differential Revision: https://reviews.llvm.org/D53600

llvm-svn: 345624
2018-10-30 18:41:31 +00:00
Simon Pilgrim 44a9a71d2a [TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)
Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.

Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!

I've done my best to fix a number of vectorizer uses:

SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.

LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.

Differential Revision: https://reviews.llvm.org/D53573

llvm-svn: 345617
2018-10-30 18:10:02 +00:00
Sanjay Patel 68a61cb07c [InstCombine] use getFltSemantics() instead of duplicating it; NFC
llvm-svn: 345613
2018-10-30 16:21:56 +00:00
Sanjay Patel b12e410082 [InstCombine] try to turn shuffle into insertelement
shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC'

The motivating case is at least a couple of steps away: I noticed that
SLPVectorizer does not analyze shuffles as well as sequences of 
insert/extract in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
...so SLP may fail to vectorize when source code has shuffles to start 
with or instcombine has converted insert/extract to shuffles.

Independent of that, an insertelement is always a simpler op for IR 
analysis vs. a shuffle, so we should transform to insert when possible.

I don't think there's any codegen concern here - if a target can't insert 
a scalar directly to some fixed element in a vector (x86?), then this 
should get expanded to the insert+shuffle that we started with.

Differential Revision: https://reviews.llvm.org/D53507

llvm-svn: 345607
2018-10-30 15:26:39 +00:00
Jonas Paulsson 1f067c94dc [LoopVectorizer] Fix for cost values of memory accesses.
This commit is a combination of two patches:

* "Fix in getScalarizationOverhead()"

   If target returns false in TTI.prefersVectorizedAddressing(), it means the
   address registers will not need to be extracted. Therefore, there should
   be no operands scalarization overhead for a load instruction.

* "Don't pass the instruction pointer from getMemInstScalarizationCost."

   Since VF is always > 1, this is a cost query for an instruction in the
   vectorized loop and it should not be evaluated within the scalar
   context of the instruction.

Review: Ulrich Weigand, Hal Finkel
https://reviews.llvm.org/D52351
https://reviews.llvm.org/D52417

llvm-svn: 345603
2018-10-30 14:34:15 +00:00
Nicola Zaghen f96383c99e [SROA] Use offset sizes from the DataLayout instead of the pointer siezes.
This fixes an assertion when constant folding a GEP when the part of the offset
was in i32 (IndexSize, as per DataLayout) and part in the i64 (PointerSize) in
the newly created test case.

Differential Revision: https://reviews.llvm.org/D52609

llvm-svn: 345585
2018-10-30 11:15:04 +00:00
Vedant Kumar dd4be53b20 [HotColdSplitting] Allow outlining single-block cold regions
It can be profitable to outline single-block cold regions because they
may be large.

Allow outlining single-block regions if they have over some threshold of
non-debug, non-terminator instructions. I chose 3 as the threshold after
experimenting with several internal frameworks.

In practice, reducing the threshold further did not give much
improvement, whereas increasing it resulted in substantial regressions.

Differential Revision: https://reviews.llvm.org/D53824

llvm-svn: 345524
2018-10-29 19:15:39 +00:00
Florian Hahn fc7654a67b [Local] Keep K's range if K does not move when combining metadata.
As K has to dominate I, IIUC I's range metadata must be a subset of
K's. After Eli's recent clarification to the LangRef, loading a value
outside of the range is undefined behavior.
Therefore if I's range contains elements outside of K's range and we would load
one such value, K would cause undefined behavior.

In cases like hoisting/sinking, we still want the most generic range
over all code paths to/from the hoist/sink point. As suggested in the
patches related to D47339, I will refactor the handling of those
scenarios and try to decouple it from this function as follow up, once
we switched to a similar handling of metadata in most of
combineMetadata.

I updated some tests checking mostly the merging of metadata to keep the
metadata of to dominating load. The most interesting one is probably test8 in
test/Transforms/JumpThreading/thread-loads.ll. It contained a comment
about the alias metadata preventing us to eliminate the branch, but it
seem like the actual problem currently is that we merge the ranges of
both loads and cannot eliminate the icmp afterwards. With this patch, we
manage to eliminate the icmp, as the range of the first load excludes 8.

Reviewers: efriedma, nlopes, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D51629

llvm-svn: 345456
2018-10-27 16:53:45 +00:00
Simon Pilgrim a132016d4d Fix -Wdocumentation warning. NFCI.
llvm-svn: 345454
2018-10-27 15:14:42 +00:00
Leonard Chan eebecb3214 Revert "[PassManager/Sanitizer] Enable usage of ported AddressSanitizer passes with -fsanitize=address"
This reverts commit 8d6af840396f2da2e4ed6aab669214ae25443204 and commit
b78d19c287b6e4a9abc9fb0545de9a3106d38d3d which causes slower build times
by initializing the AddressSanitizer on every function run.

The corresponding revisions are https://reviews.llvm.org/D52814 and
https://reviews.llvm.org/D52739.

llvm-svn: 345433
2018-10-26 22:51:51 +00:00
Christy Lee 3cc0e935c4 Pointer types were treated as zero-size by MergeICmps
Summary:
The visitICmp analysis function would record compares of pointer types, as size 0. This causes the resulting memcmp() call to have the wrong total size.
Found with "self-build" of clang/LLVM on Windows.

Reviewers: christylee, trentxintong, courbet

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53536

llvm-svn: 345413
2018-10-26 18:02:06 +00:00
Max Kazantsev 619a83463f [SimpleLoopUnswitch] Unswitch by experimental.guard intrinsics
This patch adds support of `llvm.experimental.guard` intrinsics to non-trivial
simple loop unswitching. These intrinsics represent implicit control flow which
has pretty much the same semantics as usual conditional branches. The
algorithm of dealing with them is following:

- Consider guards as unswitching candidates;
- If a guard is considered the best candidate, turn it into a branch;
- Apply normal unswitching algorithm on this branch.

The patch has no compile time effect on code that does not contain any guards.

Differential Revision: https://reviews.llvm.org/D53744
Reviewed By: chandlerc

llvm-svn: 345387
2018-10-26 14:20:11 +00:00
Max Kazantsev bde31000b1 [SimpleLoopUnswitch] Make all checks before actual non-trivial unswitch
We should be able to make all relevant checks before we actually start the non-trivial
unswitching, so that we could guarantee that once we have started the transform,
it will always succeed.

Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D53747

llvm-svn: 345375
2018-10-26 09:52:58 +00:00
Cameron McInally 384a74b0e6 [FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes
Replacing BinaryOperator::isFNeg(...) to avoid regressions when we
separate FNeg from the FSub IR instruction.

Differential Revision: https://reviews.llvm.org/D53650

llvm-svn: 345295
2018-10-25 18:09:33 +00:00
Carlos Alberto Enciso 9a24e1a7cd [DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG.
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines.

Differential Revision: https://reviews.llvm.org/D53287

llvm-svn: 345250
2018-10-25 09:58:59 +00:00
Gabor Buella 1f6ca0ba15 Add -instcombine-code-sinking option
Reviewers: craig.topper, andrew.w.kaylor, efriedma

Reviewed By: craig.topper, andrew.w.kaylor, efriedma

Differential Revision: https://reviews.llvm.org/D52709

llvm-svn: 345248
2018-10-25 08:32:29 +00:00
Alina Sbirlea ad4d018202 Update MemorySSA in LoopRotate.
Summary:
Teach LoopRotate to preserve MemorySSA.
Enable tests for correctness, dependency disabled by default.

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D51718

llvm-svn: 345216
2018-10-24 22:46:45 +00:00
Vedant Kumar c299006879 [HotColdSplitting] Identify larger cold regions using domtree queries
The current splitting algorithm works in three stages:

  1) Identify cold blocks, then
  2) Use forward/backward propagation to mark hot blocks, then
  3) Grow a SESE region of blocks *outside* of the set of hot blocks and
  start outlining.

While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:

  - An inconsistency can arise in the internal state of the hotness
  propagation stage, as a block may end up in both the ColdBlocks set
  and the HotBlocks set. Further inconsistencies can arise as these sets
  do not match what's in ProfileSummaryInfo.

  - It isn't necessary to limit outlining to single-exit regions.

This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.

Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.

Results:
  - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
  47KB pre-patch, or a ~3x improvement. Did not see a performance impact
  across two runs.
  - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
  of TEXT were outlined. Ditto re: performance impact.
  - Outlining results improve marginally in the internal frameworks I
  tested.

Follow-ups:
  - Outline more than once per function, outline large single basic
  blocks, & try to remove unconditional branches in outlined functions.

Differential Revision: https://reviews.llvm.org/D53627

llvm-svn: 345209
2018-10-24 22:15:41 +00:00
Teresa Johnson c8dba682bb [hot-cold-split] Name split functions with ".cold" suffix
Summary:
The current default of appending "_"+entry block label to the new
extracted cold function breaks demangling. Change the deliminator from
"_" to "." to enable demangling. Because the header block label will
be empty for release compile code, use "extracted" after the "." when
the label is empty.

Additionally, add a mechanism for the client to pass in an alternate
suffix applied after the ".", and have the hot cold split pass use
"cold."+Count, where the Count is currently 1 but can be used to
uniquely number multiple cold functions split out from the same function
with D53588.

Reviewers: sebpop, hiraditya

Subscribers: llvm-commits, erik.pilkington

Differential Revision: https://reviews.llvm.org/D53534

llvm-svn: 345178
2018-10-24 18:53:47 +00:00
Sanjay Patel 3b206305fd [InstCombine] try harder to form select from logic ops (2nd try)
The original patch was committed here:
rL344609
...and reverted:
rL344612
...because it did not properly check/test data types before calling
ComputeNumSignBits(). 

The tests that caused bot failures for the previous commit are 
over-reaching front-end tests that run the entire -O optimizer 
pipeline: 
    Clang :: CodeGen/builtins-systemz-zvector.c
    Clang :: CodeGen/builtins-systemz-zvector2.c

I've added a negative test here to ensure coverage for that case.
The new early exit check also tests the type of the 'B' parameter,
so we don't waste time on matching if either value is unsuitable.

Original commit message:

This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).

The backend has hooks/logic to convert back to logic ops
if that's better for the target.

llvm-svn: 345149
2018-10-24 15:17:56 +00:00
Cameron McInally 678f43f666 [FPEnv] Convert more BinaryOperator::isFNeg(...) to m_FNeg(...)
This work is to avoid regressions when we seperate FNeg from the FSub IR instruction. 

Differential Revision: https://reviews.llvm.org/D53205

llvm-svn: 345146
2018-10-24 14:45:18 +00:00
Gil Rapaport c523036fd2 Revert r345114
Investigating fails.

llvm-svn: 345123
2018-10-24 08:41:22 +00:00
Dorit Nuzman 5114390e48 [LV] Don't have fold-tail under optsize invalidate interleave-groups when
masked-interleaving is enabled

Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remiander, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D53559

llvm-svn: 345115
2018-10-24 07:11:38 +00:00
Gil Rapaport 5012e7f6ac [LSR] Combine unfolded offset into invariant register
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.

Differential Revision: https://reviews.llvm.org/D51861

llvm-svn: 345114
2018-10-24 07:08:38 +00:00
Wei Mi 80a0c97e07 [PM] keeping history when original SCC split and then merge into itself
in the same round of SCC update.

In https://reviews.llvm.org/rL309784, inline history is added to prevent
infinite inlining across multiple run of inliner and SCC update, but the
history will only be kept when new SCC is actually generated during SCC update.

We found a case that SCC can be split and then merge into itself in the same
round of SCC update, so the same SCC will be pop out from UR.CWorklist and
then added back immediately, without any new SCC generated, that is why the
existing patch cannot catch the infinite inline case.

What the patch does is even if no new SCC is generated, if only the current
SCC appears in UR.CWorklist again, then keep the inline history.

Differential Revision: https://reviews.llvm.org/D52915

llvm-svn: 345103
2018-10-23 23:29:45 +00:00
Vedant Kumar 503154615d [HotColdSplitting] Attach MinSize to outlined code
Outlined code is cold by assumption, so it makes sense to optimize it
for minimal code size rather than performance.

After r344869 moved the splitting pass to the end of the IR pipeline,
this does not result in much of a code size reduction. This is probably
because a comparatively small number backend transforms make use of the
MinSize hint.

Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of
919 bytes of TEXT reduction. I didn't measure a significant performance
impact.

Differential Revision: https://reviews.llvm.org/D53518

llvm-svn: 345072
2018-10-23 19:41:12 +00:00
Sanjay Patel 95790c546f [InstCombine] use 'match' to simplify code
There's probably some vector-with-undef-element pattern
that shows an improvement, so this is probably not quite
'NFC'.

This is the last step towards removing the fake binop 
queries for not/neg. Ie, there are no more uses of those
functions in trunk. Fneg should follow.

llvm-svn: 345050
2018-10-23 16:54:28 +00:00
Jordan Rupprecht 2fed6ac186 [DebugInfo][GlobalOpt] Fix -debugify for globalopt shrinking globals to booleans.
Summary:
TryToShrinkGlobalToBoolean, when possible, will split store <value> + load <value> into store <bool> + select <bool ? value : 0>. This preserves DebugLoc during that pass.

Fixes PR37959. The test case here is the simplified .ll for:

```
static int foo;
int bar() {
  foo = 5;
  return foo;
}
```

Reviewers: dblaikie, gbedwell, aprantl

Reviewed By: dblaikie

Subscribers: mehdi_amini, JDevlieghere, dexonsmith, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D53531

llvm-svn: 345046
2018-10-23 16:35:51 +00:00
Sanjay Patel 5b6b090cf2 [Reassociate] replace fake binop queries with 'match' API
We need to update this code before introducing an 'fneg' instruction in IR,
so we might as well kill off the integer neg/not queries too.

This is no-functional-change-intended for scalar code and most vector code. 
For vectors, we can see that the 'match' API allows for undef elements in 
constants, so we optimize those cases better.

Ideally, there would be a test for each code diff, but I don't see evidence
of that for the existing code, so I didn't try very hard to come up with new 
vector tests for each code change.

Differential Revision: https://reviews.llvm.org/D53533

llvm-svn: 345042
2018-10-23 15:55:06 +00:00
Simon Pilgrim 532a0f122e [SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions
Expand arithmetic reduction to include mul/and/or/xor instructions.

This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests.

This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason.

Differential Revision: https://reviews.llvm.org/D53473

llvm-svn: 345037
2018-10-23 15:13:09 +00:00
Sanjay Patel 747feb28e4 [InstCombine] use 'match' to handle vectors and simplify code
This is another step towards completely removing the fake 
binop queries for not/neg/fneg.

llvm-svn: 345036
2018-10-23 15:05:12 +00:00
Sanjay Patel ad76c682c7 [InstCombine] swap select profile metadata when swapping select ops
llvm-svn: 345034
2018-10-23 14:43:31 +00:00
Sanjay Patel 5141435d23 [SLSR] use 'match' to simplify code; NFC
This pass could probably be modified slightly to allow
vector splat transforms for practically no cost, but
it only works on scalars for now. So the use of the
newer 'match' API should make no functional difference. 

llvm-svn: 345030
2018-10-23 14:07:39 +00:00
Dorit Nuzman da5dc13355 Leftover bits from https://reviews.llvm.org/D53420 that were accidentally left
out of revision 344883

llvm-svn: 345021
2018-10-23 11:51:55 +00:00
Kostya Serebryany af95597c3c [hwasan] add stack frame descriptions.
Summary:
At compile-time, create an array of {PC,HumanReadableStackFrameDescription}
for every function that has an instrumented frame, and pass this array
to the run-time at the module-init time.
Similar to how we handle pc-table in SanitizerCoverage.
The run-time is dummy, will add the actual logic in later commits.

Reviewers: morehouse, eugenis

Reviewed By: eugenis

Subscribers: srhines, llvm-commits, kubamracek

Differential Revision: https://reviews.llvm.org/D53227

llvm-svn: 344985
2018-10-23 00:50:40 +00:00
Sanjay Patel dd1c3df72d [Reassociate] add 'using namespace' to reduce bloat; NFC
llvm-svn: 344959
2018-10-22 21:37:02 +00:00
Teresa Johnson f431a2f261 [hot-cold-split] Add opt remark on success
Summary: Emit optimization remark on successful hot cold split.

Reviewers: sebpop, hiraditya

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53512

llvm-svn: 344938
2018-10-22 19:06:42 +00:00
Benjamin Kramer 3e778165d6 [CGProfile] Turn constant-size SmallVector into array
No functionality change.

llvm-svn: 344893
2018-10-22 10:51:34 +00:00
Dorit Nuzman 3ec99fe21b [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when
optimizing for size

LV is careful to respect -Os and not to create a scalar epilog in all cases
(runtime tests, trip-counts that require a remainder loop) except for peeling
due to gaps in interleave-groups. This patch fixes that; -Os will now have us
invalidate such interleave-groups and vectorize without an epilog.

The patch also removes a related FIXME comment that is now obsolete, and was
also inaccurate:
"FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller
MaxVF that does not require a scalar epilog."
(requiresScalarEpilog() has nothing to do with VF).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53420

llvm-svn: 344883
2018-10-22 06:17:09 +00:00
Aditya Kumar d9e2e383a9 Schedule Hot Cold Splitting pass after most optimization passes
Summary:
In the new+old pass manager, hot cold splitting was schedule too early.
Thanks to Vedant for pointing this out.

Reviewers: sebpop, vsk

Reviewed By: sebpop, vsk

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D53437

llvm-svn: 344869
2018-10-21 18:11:56 +00:00
Sanjay Patel 0522b0da31 [InstCombine] use 'match' to simplify code; NFC
llvm-svn: 344855
2018-10-20 17:15:57 +00:00
Sanjay Patel ec572ade20 [InstCombine] make code more flexible with lambda; NFC
I couldn't tell from svn history when these checks were added,
but it pre-dates the split of instcombine into its own directory
at rL92459.

The motivation for changing the check is partly shown by the
code in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724

There are also existing regression tests for SLPVectorizer with
sequences of extract+insert that are likely assumed to become
shuffles by the vectorizer cost models.

llvm-svn: 344854
2018-10-20 16:58:27 +00:00
Sanjay Patel 729c4362cf [InstCombine] add explanatory comment for strange vector logic; NFC
llvm-svn: 344852
2018-10-20 16:25:55 +00:00
Evandro Menezes 164ea101ab [NFC][InstCombine] Undo stray change
Undo stray change introduced by r344725.

llvm-svn: 344814
2018-10-19 20:57:45 +00:00
Thomas Lively c339250e12 [InstCombine] InstCombine and InstSimplify for minimum and maximum
Summary: Depends on D52765

Reviewers: aheejin, dschuff

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52766

llvm-svn: 344799
2018-10-19 19:01:26 +00:00
Sanjay Patel 70daf85bc2 [InstCombine] use m_Neg() in dyn_castNegVal() to match vectors with undef elts
llvm-svn: 344793
2018-10-19 17:54:53 +00:00
Fangrui Song 2e83b2e9ee Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC
llvm-svn: 344774
2018-10-19 06:12:02 +00:00
Ayal Zaks b0b5312e67 [LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size
When optimizing for size, a loop is vectorized only if the resulting vector loop
completely replaces the original scalar loop. This holds if no runtime guards
are needed, if the original trip-count TC does not overflow, and if TC is a
known constant that is a multiple of the VF. The last two TC-related conditions
can be overcome by
1. rounding the trip-count of the vector loop up from TC to a multiple of VF;
2. masking the vector body under a newly introduced "if (i <= TC-1)" condition.

The patch allows loops with arbitrary trip counts to be vectorized under -Os,
subject to the existing cost model considerations. It also applies to loops with
small trip counts (under -O2) which are currently handled as if under -Os.

The patch does not handle loops with reductions, live-outs, or w/o a primary
induction variable, and disallows interleave groups.

(Third, final and main part of -)
Differential Revision: https://reviews.llvm.org/D50480

llvm-svn: 344743
2018-10-18 15:03:15 +00:00
Mikael Holmen e3605d0f70 Add a emitUnaryFloatFnCall version that fetches the function name from TLI
Summary:
In several places in the code we use the following pattern:

  if (hasUnaryFloatFn(&TLI, Ty, LibFunc_tan, LibFunc_tanf, LibFunc_tanl)) {
    [...]
    Value *Res = emitUnaryFloatFnCall(X, TLI.getName(LibFunc_tan), B, Attrs);
    [...]
  }

In short, we check if there is a lib-function for a certain type, and then
we _always_ fetch the name of the "double" version of the lib function and
construct a call to the appropriate function, that we just checked exists,
using that "double" name as a basis.

This is of course a problem in cases where the target doesn't support the
"double" version, but e.g. only the "float" version.

In that case TLI.getName(LibFunc_tan) returns "", and
emitUnaryFloatFnCall happily appends an "f" to "", and we erroneously end
up with a call to a function called "f".

To solve this, the above pattern is changed to

  if (hasUnaryFloatFn(&TLI, Ty, LibFunc_tan, LibFunc_tanf, LibFunc_tanl)) {
    [...]
    Value *Res = emitUnaryFloatFnCall(X, &TLI, LibFunc_tan, LibFunc_tanf,
                                      LibFunc_tanl, B, Attrs);
    [...]
  }

I.e instead of first fetching the name of the "double" version and then
letting emitUnaryFloatFnCall() add the final "f" or "l", we let
emitUnaryFloatFnCall() fetch the right name from TLI.

Reviewers: eli.friedman, efriedma

Reviewed By: efriedma

Subscribers: efriedma, bjope, llvm-commits

Differential Revision: https://reviews.llvm.org/D53370

llvm-svn: 344725
2018-10-18 06:27:53 +00:00
Chandler Carruth 60b2e054dc [TI removal] Switch simple loop unswitch to `Instruction`.
llvm-svn: 344719
2018-10-18 00:40:26 +00:00
Chandler Carruth c6cad4251e [TI removal] Switch NewGVN to directly use `Instruction`.
llvm-svn: 344718
2018-10-18 00:39:46 +00:00
Chandler Carruth c8eaea71c9 [TI removal] Use `Instruction` instead of `TerminatorInst` for
a variable's type.

llvm-svn: 344717
2018-10-18 00:39:18 +00:00
Chandler Carruth 8b7a8123dd [TI removal] Update CodeExtractor to use Instruction directly.
llvm-svn: 344716
2018-10-18 00:38:54 +00:00
Chandler Carruth c1e3ee29a4 [TI removal] Switch ObjCARC code to directly use the nice range-based
successors API or directly build the iterators out of the terminator
instruction and avoid requiring a TerminatorInst variable.

llvm-svn: 344715
2018-10-18 00:38:34 +00:00
Chandler Carruth 93cf2ea27a [TI removal] Switch MergeFunctions to directly use Instruction API.
llvm-svn: 344714
2018-10-18 00:37:37 +00:00
Nicolai Haehnle 0823050b9f StructurizeCFG: Simplify inserted PHI nodes
Summary:
This improves subsequent divergence analysis in some cases.

Change-Id: I5e95e7ec7fd3fa80d414d1a53a02fea23e3d67d3

Reviewers: arsenm, rampitec

Subscribers: jvesely, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D53316

llvm-svn: 344697
2018-10-17 15:37:41 +00:00
Fedor Sergeev c297e84b97 [LoopPredication] add some simple stats
Just adding some useful statistics to LoopPredication pass
which was lacking any of these.

llvm-svn: 344681
2018-10-17 09:02:54 +00:00
Leonard Chan 423957ad3a [Sanitizer][PassManager] Fix for failing ASan tests on arm-linux-gnueabihf
Forgot to initialize the legacy pass in it's constructor.

Differential Revision: https://reviews.llvm.org/D53350

llvm-svn: 344659
2018-10-17 00:16:07 +00:00
Teresa Johnson d2c234a4cc [ThinLTO] Add importing stats to thin link
Summary:
Previously we could only get the number of imported functions and
variables from the backend. This adds stats to the thin link where the
importing is decided.

Reviewers: wmi

Subscribers: inglorion, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D53337

llvm-svn: 344658
2018-10-16 23:49:50 +00:00
Jonathan Metzman 5eb8cba280 [SanitizerCoverage] Don't duplicate code to get section pointers
Summary:
Merge code used to get section start and section end pointers
for SanitizerCoverage constructors. This includes code that handles
getting the start pointers when targeting MSVC.

Reviewers: kcc, morehouse

Reviewed By: morehouse

Subscribers: kcc, hiraditya

Differential Revision: https://reviews.llvm.org/D53211

llvm-svn: 344657
2018-10-16 23:43:57 +00:00
David Bolvansky 7c7760da7e [InstCombine] Cleanup libfunc attribute inferring
Reviewers: efriedma

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53338

llvm-svn: 344645
2018-10-16 21:18:31 +00:00
Anna Thomas 6f732bfb79 [LV] Teach vectorizer about variant value store into uniform address
Summary:
Teach vectorizer about vectorizing variant value stores to uniform
address. Similar to rL343028, we do not allow vectorization if we have
multiple stores to the same uniform address.

Cost model already has the change for considering the extract
instruction cost for a variant value store. See added test cases for how
vectorization is done.
The patch also contains changes to the ORE messages.

Reviewers: Ayal, mkuper, anemet, hsaito

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D52656

llvm-svn: 344613
2018-10-16 15:46:26 +00:00
Sanjay Patel bb3dd34e62 revert rL344609: [InstCombine] try harder to form select from logic ops
I noticed a missing check and added it at rL344610, but there actually
are codegen tests that will fail without that, so I'll edit those and
submit a fixed patch with more tests.

llvm-svn: 344612
2018-10-16 15:26:08 +00:00
Sanjay Patel f6a7c8b1fc [InstCombine] make sure type is integer before calling ComputeNumSignBits
llvm-svn: 344610
2018-10-16 14:44:50 +00:00
Sanjay Patel 0c48c977b8 [InstCombine] try harder to form select from logic ops
This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).

The backend has hooks/logic to convert back to logic ops
if that's better for the target.

llvm-svn: 344609
2018-10-16 14:35:21 +00:00
Max Kazantsev 9c90ec2fae [NFC] Make LoopSafetyInfo abstract to allow alternative implementations
llvm-svn: 344592
2018-10-16 08:31:05 +00:00
Max Kazantsev 8d56be7070 [NFC] Encapsulate work with BlockColors in LoopSafetyInfo
llvm-svn: 344590
2018-10-16 08:07:14 +00:00
David Stenberg c9163855dd [DebugInfo][LCSSA] Rewrite pre-existing debug values outside loop
Summary:
Extend LCSSA so that debug values outside loops are rewritten to
use the PHI nodes that the pass creates.

This fixes PR39019. In that case, we ran LCSSA on a loop that
was later on vectorized, which left us with something like this:

  for.cond.cleanup:
    %add.lcssa = phi i32 [ %add, %for.body ], [ %34, %middle.block ]
    call void @llvm.dbg.value(metadata i32 %add,
    ret i32 %add.lcssa

  for.body:
    %add =
    [...]
    br i1 %exitcond, label %for.cond.cleanup, label %for.body

which later resulted in the debug.value becoming undef when
removing the scalar loop (and the location would have probably
been wrong for the vectorized case otherwise).

As we now may need to query the AvailableVals cache more than
once for a basic block, FindAvailableVals() in SSAUpdaterImpl is
changed so that it updates the cache for blocks that we do not
create a PHI node for, regardless of the block's number of
predecessors. The debug value in the attached IR reproducer
would not be properly rewritten without this.

Debug values residing in blocks where we have not inserted any
PHI nodes are currently left as-is by this patch. I'm not sure
what should be done with those uses.

Reviewers: mattd, aprantl, vsk, probinson

Reviewed By: mattd, aprantl

Subscribers: jmorse, gbedwell, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D53130

llvm-svn: 344589
2018-10-16 08:06:48 +00:00
Max Kazantsev c8466f937c [NFC] Turn isGuaranteedToExecute into a method
llvm-svn: 344587
2018-10-16 06:34:53 +00:00
Lang Hames 8f9a2446e0 Change a TerminatorInst* to an Instruction* in HotColdSplitting.cpp.
r344558 added an assignment to a TerminatorInst* from
BasicBlock::getTerminatorInst(), but BasicBlock::getTerminatorInst() returns an
Instruction* rather than a TerminatorInst* since r344504 so this fails to
compile.

Changing the variable to an Instruction* should get the bots building again.

llvm-svn: 344566
2018-10-15 22:27:03 +00:00
Sebastian Pop 542e522b87 [hot-cold-split] fix static analysis of cold regions
Make the code of blockEndsInUnreachable to match the function
blockEndsInUnreachable in CodeGen/BranchFolding.cpp. I also have
added a note to make sure the code of this function will not be
modified unless the back-end version is also modified.

An early return before outlining has been added to avoid
outlining the full function body when the first block in the
function is marked cold.

The static analysis of cold code has been amended to avoid
marking the whole function as cold by back-propagation
because the back-propagation would mark blocks with return
statements as cold.

The patch adds debug statements to help discover these problems.

Differential Revision: https://reviews.llvm.org/D52904

llvm-svn: 344558
2018-10-15 21:43:11 +00:00
Vedant Kumar 15718a6190 [CodeExtractor] Erase debug intrinsics in outlined thunks (fix PR22900)
Variable updates within the outlined function are invisible to
debuggers. This could be improved by defining a DISubprogram for the
new function. For the moment, simply erase the debug intrinsics instead.

This fixes verifier failures about function-local metadata being used in
the wrong function, seen while testing the hot/cold splitting pass.

rdar://45142482

Differential Revision: https://reviews.llvm.org/D53267

llvm-svn: 344545
2018-10-15 19:22:20 +00:00
Chandler Carruth e303c87e19 [TI removal] Make `getTerminator()` return a generic `Instruction`.
This removes the primary remaining API producing `TerminatorInst` which
will reduce the rate at which code is introduced trying to use it and
generally make it much easier to remove the remaining APIs across the
codebase.

Also clean up some of the stragglers that the previous mechanical update
of variables missed.

Users of LLVM and out-of-tree code generally will need to update any
explicit variable types to handle this. Replacing `TerminatorInst` with
`Instruction` (or `auto`) almost always works. Most of these edits were
made in prior commits using the perl one-liner:
```
perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator\(\))/Instruction\1/g'
```

This also my break some rare use cases where people overload for both
`Instruction` and `TerminatorInst`, but these should be easily fixed by
removing the `TerminatorInst` overload.

llvm-svn: 344504
2018-10-15 10:42:50 +00:00
Chandler Carruth 52eaaf3ff8 [TI removal] Rework `InstVisitor` to support visiting instructions that
are terminators without relying on the specific `TerminatorInst` type.

This required cleaning up two users of `InstVisitor`s usage of
`TerminatorInst` as well.

llvm-svn: 344503
2018-10-15 10:10:54 +00:00
Chandler Carruth edb12a838a [TI removal] Make variables declared as `TerminatorInst` and initialized
by `getTerminator()` calls instead be declared as `Instruction`.

This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).

llvm-svn: 344502
2018-10-15 10:04:59 +00:00
Chandler Carruth ae98759ec5 [TI removal] Remove `TerminatorInst` from GVN.h and GVN.cpp.
This is the last interesting usage in all of LLVM's headers. The
remaining usages in headers are the core typesystem bits (Core.h,
instruction types, and InstVisitor) and as the return of
`BasicBlock::getTerminator`. The latter is the big remaining API point
that I'll remove after mass updates to user code.

llvm-svn: 344501
2018-10-15 10:00:15 +00:00
Chandler Carruth 4a2d58e16a [TI removal] Remove `TerminatorInst` from BasicBlockUtils.h
This requires updating a number of .cpp files to adapt to the new API.
I've just systematically updated all uses of `TerminatorInst` within
these files te `Instruction` so thta I won't have to touch them again in
the future.

llvm-svn: 344498
2018-10-15 09:34:05 +00:00
Chandler Carruth b99a24689b [TI removal] Remove TerminatorInst as an input parameter from all public
LLVM APIs. There weren't very many.

We still have the instruction visitor, and APIs with TerminatorInst as
a return type or an output parameter.

llvm-svn: 344494
2018-10-15 09:17:09 +00:00
Ayal Zaks e567b5b526 [LV] Fix comments reported when not vectorizing single iteration loops; NFC
Landing this as a separate part of https://reviews.llvm.org/D50480, being a
seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count
without remainder under opt for size).

llvm-svn: 344483
2018-10-14 17:53:02 +00:00
Sanjay Patel 7181146c6c [InstCombine] combine a shuffle and an extract subvector shuffle
This is part of the missing IR-level folding noted in D52912.
This should be ok as a canonicalization because the new shuffle mask can't
be any more complicated than the existing shuffle mask. If there's some 
target where the shorter vector shuffle is not legal, it should just end up 
expanding to something like the pair of shuffles that we're starting with here.

Differential Revision: https://reviews.llvm.org/D53037

llvm-svn: 344476
2018-10-14 15:25:06 +00:00
Dorit Nuzman 38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman 5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman 8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Benjamin Kramer c55e997556 Move some helpers from the global namespace into anonymous ones.
llvm-svn: 344468
2018-10-13 22:18:22 +00:00
Sanjay Patel 47579b21e2 [InstCombine] fix complexity canonicalization with fake unary vector ops
This is a preliminary step to avoid regressions when we add
an actual 'fneg' instruction to IR. See D52934 and D53205.

llvm-svn: 344458
2018-10-13 16:15:37 +00:00
David Bolvansky e8b3bba717 [InstCombine] Fixed crash with aliased functions
Summary: Fixes PR39177

Reviewers: spatel, jbuening

Reviewed By: jbuening

Subscribers: jbuening, llvm-commits

Differential Revision: https://reviews.llvm.org/D53129

llvm-svn: 344454
2018-10-13 15:21:55 +00:00
Kostya Serebryany bc504559ec move GetOrCreateFunctionComdat to Instrumentation.cpp/Instrumentation.h
Summary:
GetOrCreateFunctionComdat is currently used in SanitizerCoverage,
where it's defined. I'm planing to use it in HWASAN as well,
so moving it into a common location.
NFC

Reviewers: morehouse

Reviewed By: morehouse

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53218

llvm-svn: 344433
2018-10-12 23:21:48 +00:00
Jonathan Metzman 0b94e88007 [SanitizerCoverage] Prevent /OPT:REF from stripping constructors
Summary:
Linking with the /OPT:REF linker flag when building COFF files causes
the linker to strip SanitizerCoverage's constructors. Prevent this by
giving the constructors WeakODR linkage and by passing the linker a
directive to include sancov.module_ctor.

Include a test in compiler-rt to verify libFuzzer can be linked using
/OPT:REF

Reviewers: morehouse, rnk

Reviewed By: morehouse, rnk

Subscribers: rnk, morehouse, hiraditya

Differential Revision: https://reviews.llvm.org/D52119

llvm-svn: 344391
2018-10-12 18:11:47 +00:00
Max Moroz 4d010ca35b [SanitizerCoverage] Make Inline8bit and TracePC counters dead stripping resistant.
Summary:
Otherwise, at least on Mac, the linker eliminates unused symbols which
causes libFuzzer to error out due to a mismatch of the sizes of coverage tables.

Issue in Chromium: https://bugs.chromium.org/p/chromium/issues/detail?id=892167

Reviewers: morehouse, kcc, george.karpenkov

Reviewed By: morehouse

Subscribers: kubamracek, llvm-commits

Differential Revision: https://reviews.llvm.org/D53113

llvm-svn: 344345
2018-10-12 13:59:31 +00:00
Tim Northover 487780678f SCCP: avoid caching DenseMap entry that might be invalidated.
Later calls to getValueState might insert entries into the ValueState map and
cause reallocation, invalidating a reference.

llvm-svn: 344327
2018-10-12 09:01:59 +00:00
Eugene Leviant eddf6b5df5 [ThinLTO] Don't import GV which contains blockaddress
Differential revision: https://reviews.llvm.org/D53139

llvm-svn: 344325
2018-10-12 07:24:02 +00:00
Kostya Serebryany d891ac9794 merge two near-identical functions createPrivateGlobalForString into one
Summary:
We have two copies of createPrivateGlobalForString (in asan and in esan).
This change merges them into one. NFC

Reviewers: vitalybuka

Reviewed By: vitalybuka

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53178

llvm-svn: 344314
2018-10-11 23:03:27 +00:00
Leonard Chan 64e21b5cfd [PassManager/Sanitizer] Port of AddresSanitizer pass from legacy to new PassManager
This patch ports the legacy pass manager to the new one to take advantage of
the benefits of the new PM. This involved moving a lot of the declarations for
`AddressSantizer` to a header so that it can be publicly used via
PassRegistry.def which I believe contains all the passes managed by the new PM.

This patch essentially decouples the instrumentation from the legacy PM such
hat it can be used by both legacy and new PM infrastructure.

Differential Revision: https://reviews.llvm.org/D52739

llvm-svn: 344274
2018-10-11 18:31:51 +00:00
Amara Emerson 54f60255a2 [InstCombine] Fix SimplifyLibCalls erasing an instruction while IC still had references to it.
InstCombine keeps a worklist and assumes that optimizations don't
eraseFromParent() the instruction, which SimplifyLibCalls violates. This change
adds a new callback to SimplifyLibCalls to let clients specify their own hander
for erasing actions.

Differential Revision: https://reviews.llvm.org/D52729

llvm-svn: 344251
2018-10-11 14:51:11 +00:00
David Green 8066198442 [InstCombine] Demand bits of UMin
This is the umin alternative to the umax code from rL344237. We use
DeMorgans law on the umax case to bring us to the same thing on umin,
but using countLeadingOnes, not countLeadingZeros.

Differential Revision: https://reviews.llvm.org/D53036

llvm-svn: 344239
2018-10-11 11:28:27 +00:00
David Green 30c0e98b9c [InstCombine] Demand bits of UMax
Use the demanded bits of umax(A,C) to prove we can just use A so long as the
lowest non-zero bit of DemandMask is higher than the highest non-zero bit of C

Differential Revision: https://reviews.llvm.org/D53033

llvm-svn: 344237
2018-10-11 11:04:09 +00:00
Florian Hahn 18e07bb822 [LV] Use SmallVector instead of DenseMap in calculateRegisterUsage (NFC).
We assign indices sequentially for seen instructions, so we can just use
a vector and push back the seen instructions. No need for using a
DenseMap.

Reviewers: hsaito, rengolin, nadav, dcaballe

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D53089

llvm-svn: 344233
2018-10-11 09:46:25 +00:00
Florian Hahn 7eb5cb4ebc [LV] Ignore more debug info.
We can avoid doing some unnecessary work by skipping debug instructions
in a few loops. It also helps to ensure debug instructions do not
prevent vectorization, although I do not have any concrete test cases
for that.

Reviewers: rengolin, hsaito, dcaballe, aprantl, vsk

Reviewed By: rengolin, dcaballe

Differential Revision: https://reviews.llvm.org/D53091

llvm-svn: 344232
2018-10-11 09:27:24 +00:00
Calixte Denizet d2f290b034 [gcov] Display the hit counter for the line of a function definition
Summary:
Right now there is no hit counter on the line of function.
So the idea is add the line of the function to all the lines covered by the entry block.
Tests in compiler-rt/profile will be fixed in another patch: https://reviews.llvm.org/D49854

Reviewers: marco-c, davidxl

Reviewed By: marco-c

Subscribers: sylvestre.ledru, llvm-commits

Differential Revision: https://reviews.llvm.org/D49853

llvm-svn: 344228
2018-10-11 08:53:43 +00:00
Max Kazantsev b2e51090a4 [IndVars] Drop "exact" flag from lshr and udiv when substituting their args
There is a transform that may replace `lshr (x+1), 1` with `lshr x, 1` in case
if it can prove that the result will be the same. However the initial instruction
might have an `exact` flag set, and it now should be dropped unless we prove
that it may hold. Incorrectly set `exact` attribute may then produce poison.

Differential Revision: https://reviews.llvm.org/D53061
Reviewed By: sanjoy

llvm-svn: 344223
2018-10-11 07:22:26 +00:00
Richard Smith 6c67662816 Add a flag to remap manglings when reading profile data information.
This can be used to preserve profiling information across codebase
changes that have widespread impact on mangled names, but across which
most profiling data should still be usable. For example, when switching
from libstdc++ to libc++, or from the old libstdc++ ABI to the new ABI,
or even from a 32-bit to a 64-bit build.

The user can provide a remapping file specifying parts of mangled names
that should be treated as equivalent (eg, std::__1 should be treated as
equivalent to std::__cxx11), and profile data will be treated as
applying to a particular function if its name is equivalent to the name
of a function in the profile data under the provided equivalences. See
the documentation change for a description of how this is configured.

Remapping is supported for both sample-based profiling and instruction
profiling. We do not support remapping indirect branch target
information, but all other profile data should be remapped
appropriately.

Support is only added for the new pass manager. If someone wants to also
add support for this for the old pass manager, doing so should be
straightforward.

This is the LLVM side of Clang r344199.

Reviewers: davidxl, tejohnson, dlj, erik.pilkington

Subscribers: mehdi_amini, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51249

llvm-svn: 344200
2018-10-10 23:13:47 +00:00
George Burgess IV 6ef8002c2c Replace most users of UnknownSize with LocationSize::unknown(); NFC
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).

This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.

llvm-svn: 344186
2018-10-10 21:28:44 +00:00
Sanjay Patel 05aadf885d [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try
Re-trying r344082 because it unintentionally included extra diffs.

Original commit message:
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be
regressions unless we add more trunc folds as we're doing here
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {

  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r

}

Here's a sampling of the vector codegen for that case using
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vandps  LCPI0_0(%rip), %xmm0, %xmm0
vxorps  %xmm1, %xmm1, %xmm1
vpcmpeqd        %xmm1, %xmm0, %xmm0
vblendvps       %xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vblendvps       %xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpbroadcastd    LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd       %zmm1, %zmm0, %k1
vblendmps       %zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpslld  $31, %xmm0, %xmm0
vptestmd        %zmm0, %zmm0, %k1
vblendmps       %zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
movi    v1.4s, #1
and     v0.16b, v0.16b, v1.16b
cmeq    v0.4s, v0.4s, #0
bsl     v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
bsl     v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747

llvm-svn: 344181
2018-10-10 20:47:46 +00:00
Sanjay Patel 58fc00d0bc revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
This commit accidentally included the diffs from D53057.

llvm-svn: 344178
2018-10-10 20:39:39 +00:00
Renato Golin d8e7ca4a32 [VPlan] Fix CondBit quoting in dumpBasicBlock
Quotes were being printed for VPInstructions but not the rest.

llvm-svn: 344161
2018-10-10 17:55:21 +00:00
Scott Linder 3759efc650 Relax trivial cast requirements in CallPromotionUtils
Differential Revision: https://reviews.llvm.org/D52792

llvm-svn: 344153
2018-10-10 16:35:47 +00:00
Carlos Alberto Enciso c0952c8a08 Revert "[DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG."
This reverts commit r344120.

It was causing buildbot failures.

llvm-svn: 344135
2018-10-10 12:09:34 +00:00
Neil Henning 3d4579829e Fix an ordering bug in the scalarizer.
I've added a new test case that causes the scalarizer to try and use
dead-and-erased values - caused by the basic blocks not being in
domination order within the function. To fix this, instead of iterating
through the blocks in function order, I walk them in reverse post order.

Differential Revision: https://reviews.llvm.org/D52540

llvm-svn: 344128
2018-10-10 09:27:45 +00:00
Carlos Alberto Enciso e7a347e5f8 [DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG.
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines. 

Differential Revision: https://reviews.llvm.org/D52887

llvm-svn: 344120
2018-10-10 08:29:55 +00:00
George Burgess IV 40dc63e1f0 [Analysis] Make LocationSizes carry an 'imprecise' bit
There are places where we need to merge multiple LocationSizes of
different sizes into one, and get a sensible result.

There are other places where we want to optimize aggressively based on
the value of a LocationSizes (e.g. how can a store of four bytes be to
an area of storage that's only two bytes large?)

This patch makes LocationSize hold an 'imprecise' bit to note whether
the LocationSize can be treated as an upper-bound and lower-bound for
the size of a location, or just an upper-bound.

This concludes the series of patches leading up to this. The most recent
of which is r344108.

Fixes PR36228.

Differential Revision: https://reviews.llvm.org/D44748

llvm-svn: 344114
2018-10-10 06:39:40 +00:00
Max Kazantsev 1d893bfcea [NFC] Make a variable const
llvm-svn: 344113
2018-10-10 04:19:38 +00:00
Sanjay Patel e9ca7ea3e5 [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be 
regressions unless we add more trunc folds as we're doing here 
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r
}

Here's a sampling of the vector codegen for that case using 
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vandps	LCPI0_0(%rip), %xmm0, %xmm0
vxorps	%xmm1, %xmm1, %xmm1
vpcmpeqd	%xmm1, %xmm0, %xmm0
vblendvps	%xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpbroadcastd	LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd	%zmm1, %zmm0, %k1
vblendmps	%zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpslld	$31, %xmm0, %xmm0
vptestmd	%zmm0, %zmm0, %k1
vblendmps	%zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
movi	v1.4s, #1
and	v0.16b, v0.16b, v1.16b
cmeq	v0.4s, v0.4s, #0
bsl	v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
bsl	v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747

llvm-svn: 344082
2018-10-09 21:26:01 +00:00
Sanjay Patel 88194dfe1a [InstCombine] make helper function 'static'; NFC
llvm-svn: 344056
2018-10-09 15:29:26 +00:00
George Burgess IV f96e618017 Make LocationSize a proper Optional type; NFC
This is the second in a series of changes intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context. The first change being r344012.

Since I was requested to do all of this with post-commit review, this is
about as small as I can make this patch.

This patch makes LocationSize into an actual type that wraps a uint64_t;
users are required to call getValue() in order to get the size now. If
the LocationSize has an Unknown size (e.g. if LocSize ==
MemoryLocation::UnknownSize), getValue() will assert.

This also adds DenseMap specializations for LocationInfo, which required
taking two more values from the set of values LocationInfo can
represent. Hence, heavy users of multi-exabyte arrays or structs may
observe slightly lower-quality code as a result of this change.

The intent is for getValue()s to be very close to a corresponding
hasValue() (which is often spelled `!= MemoryLocation::UnknownSize`).
Sadly, small diff context appears to crop that out sometimes, and the
last change in DSE does require a bit of nonlocal reasoning about
control-flow. :/

This also removes an assert, since it's now redundant with the assert in
getValue().

llvm-svn: 344013
2018-10-09 03:18:56 +00:00
George Burgess IV fefc42c9bb Use locals instead of struct fields; NFC
This is one of a series of changes intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context.

Since I was requested to do all of this with post-commit review, this is
about as small as I can make it (beyond committing changes to these few
files separately, but they're incredibly similar in spirit, so...)

On its own, this change doesn't make a great deal of sense. I plan on
having a follow-up Real Soon Now(TM) to make the bits here make more
sense. :)

In particular, the next change in this series is meant to make
LocationSize an actual type, which you have to call .getValue() on in
order to get at the uint64_t inside. Hence, this change refactors code
so that:
- we only need to call the soon-to-come getValue() once in most cases,
  and
- said call to getValue() happens very closely to a piece of code that
  checks if the LocationSize has a value (e.g. if it's != UnknownSize).

llvm-svn: 344012
2018-10-09 02:14:33 +00:00
Robert Lougher 0c93ea2634 [TailCallElim] Enable marking of calls with byval as tails
In r339636 the alias analysis rules were changed with regards to tail calls
and byval arguments. Previously, tail calls were assumed not to alias
allocas from the current frame. This has been updated, to not assume this
for arguments with the byval attribute.

This patch aligns TailCallElim with the new rule. Tail marking can now be
more aggressive and mark more calls as tails, e.g.:

define void @test() {
  %f = alloca %struct.foo
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test2(%struct.foo* byval %f) {
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test3(%struct.foo* byval %f) {
  %agg.tmp = alloca %struct.foo
  %0 = bitcast %struct.foo* %agg.tmp to i8*
  %1 = bitcast %struct.foo* %f to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false)
  call void @bar(%struct.foo* byval %agg.tmp)
  ret void
}

The problematic case where a byval parameter is captured by a call is still
handled correctly, and will not be marked as a tail (see PR7272).

llvm-svn: 343986
2018-10-08 18:03:40 +00:00
Xin Tong bfdad33b82 [ThinLTO] Keep non-prevailing (linkonce|weak)_odr symbols live
Summary:
If we have a symbol with (linkonce|weak)_odr linkage, we do not want
to dead strip it even it is not prevailing.

IR level (linkonce|weak)_odr symbol can become non-prevailing when we mix
ELF objects and IR objects where the (linkonce|weak)_odr symbol in the ELF
object is prevailing and the ones in the IR objects are not. Stripping
them will prevent us from doing optimizations with them.

By not dead stripping them, We will convert these symbols to
available_externally linkage as a result of non-prevailing and eventually
dropping them after inlining.

I modified cache-prevailing.ll to use linkonce linkage as it is
testing whether cache prevailing bit is effective or not, not
we should treat linkonce_odr alive or not

Reviewers: tejohnson, pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D52893

llvm-svn: 343970
2018-10-08 15:12:48 +00:00
Neil Henning 57f5d0a885 [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.
The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:

- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
  CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
  weren't being tested (including the FMF flag that wasn't checked).

This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).

Differential Revision: https://reviews.llvm.org/D52087

llvm-svn: 343962
2018-10-08 10:32:33 +00:00
Ewan Crawford fa120cbdbc [InstCombine] Fix incongruous GEP type addrspace
Currently running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR.

Input:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
   %t1 = bitcast <16 x i32>* %t0 to [16 x i32]*
   %t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)*
   %t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0
   %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
   call void @extern_vec_pointers_func(<16 x i32 addrspace(3)*> %t4)
   ret void
}
```

Output:

```
define void @insertelem_after_gep(<16 x i32>* %t0) {
  %t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0
  %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
  call void @my_extern_func(<16 x i32 addrspace(3)*> %t4)
  ret void
}
```

Which although causes no complaints when produced, isn't valid IR as the insertelement use of the %t3 GEP expects an address space.

```
opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32*' but expected 'i32 addrspace(3)*'
  %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
```

I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier.

Reviewers: spatel, lebedev.ri
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52294

llvm-svn: 343956
2018-10-08 08:40:45 +00:00
Max Kazantsev b07369651e [LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160
At the point when we perform `emitTransformedIndex`, we have a broken IR (in
particular, we have Phis for which not every incoming value is properly set). On
such IR, it is illegal to create SCEV expressions, because their internal
simplification process may try to prove some predicates and break when it
stumbles across some broken IR.

The only purpose of using SCEV in this particular place is attempt to simplify
the generated code slightly. It seems that the result isn't worth it, because
some trivial cases (like addition of zero and multiplication by 1) can be
handled separately if needed, but more generally InstCombine is able to achieve
the goals we want to achieve by using SCEV.

This patch fixes a functional crash described in PR39160, and as side-effect it
also generates a bit smarter code in some simple cases. It also may cause some
optimality loss (i.e. we will now generate `mul` by power of `2` instead of
shift etc), but there is nothing what InstCombine could not handle later. In
case of dire need, we can support more trivial cases just in place.

Note that this patch only fixes one particular case of the general problem that
LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR.
The general solution, however, seems complex enough.

Differential Revision: https://reviews.llvm.org/D52881
Reviewed By: fhahn, hsaito

llvm-svn: 343954
2018-10-08 05:46:29 +00:00
Jonas Paulsson 29d80f07ee [LoopVectorizer] Use TTI.getOperandInfo()
Call getOperandInfo() instead of using (near) duplicated code in
LoopVectorizationCostModel::getInstructionCost().

This gets the OperandValueKind and OperandValueProperties values for a Value
passed as operand to an arithmetic instruction.

getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but
is now instead a public member.

Review: Florian Hahn
https://reviews.llvm.org/D52883

llvm-svn: 343852
2018-10-05 14:34:04 +00:00
Neil Henning d2261f617b Add missing period to comment to match style of file.
This is a test commit to show that my commit access is working.

llvm-svn: 343842
2018-10-05 09:39:07 +00:00
Craig Topper 029d1ef6eb [SimplifyCFG] Pass AggressiveInsts to DominatesMergePoint by reference. Remove null check.
Summary:
At some point in the past the recursion in DominatesMergePoint used to pass null for AggressiveInsts as part of the recursion. It no longer does this. So there is no way for AggressiveInsts to be null.

This passes it by reference and removes the null check to make this explicit.

Reviewers: efriedma, reames

Reviewed By: efriedma

Subscribers: xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D52575

llvm-svn: 343828
2018-10-04 23:40:31 +00:00
Sanjay Patel 3436dc2923 [InstCombine] drop poison flags in SimplifyVectorDemandedElts
We established the (unfortunately complicated) rules for UB/poison
propagation with vector ops in:
D48893
D48987
D49047

It's clear from the affected tests that we are potentially creating 
poison where none existed before the transforms. For add/sub/mul,
the answer is simple: just drop the flags because the extra undef
vector lanes are generally more valuable for analysis and codegen.

llvm-svn: 343819
2018-10-04 21:36:50 +00:00
Craig Topper 1d15f7b02b [SimplifyCFG] Change recursive calls to llvm::SimplifyCFG to instead use an outer while loop to revisit.
Summary:
The llvm::SimplifyCFG function creates a SimplifyCFGOpt object and calls run on it. There were numerous places reached from this run function that called back out llvm::SimplifyCFG which would create another SimplifyCFGOpt object. This is an inefficient use of stack space at minimum. We are also not passing along the LoopHeaders pointer passed into the outer llvm::SimplifyCFG call. So if its not null we lose it on the first recursion and get nullptr from there on.

This patch adds an outer loop around the main BasicBlock simplifying code and adds a flag to the SimplifyCFGOpt class that can be set by to request another iteration. I don't think we can iterate based just on the change flag alone since some of the simplifications delete a basic block entirely leaving nothing to iterate on.

Reviewers: bogner, eli.friedman, reames

Reviewed By: reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52760

llvm-svn: 343816
2018-10-04 21:11:52 +00:00
Sanjay Patel 9d6688e38d [InstCombine] reduce code duplication in SimplifyDemandedVectorElts; NFCI
llvm-svn: 343806
2018-10-04 19:12:07 +00:00
Sanjay Patel 3746e11abe [InstCombine] allow bitcast to/from FP for vector insert/extract transform
This is a follow-up to rL343482 / D52439.
This was a pattern that initially caused the commit to be reverted because
the transform requires a bitcast as shown here.

llvm-svn: 343794
2018-10-04 16:25:05 +00:00
Sanjay Patel cafdeb1aa6 [InstCombine] allow SimplifyDemandedVectorElts to work with FP binops
We're a long way from D50992 and D51553, but this is where we have to start.
We weren't back-propagating undefs into binop constant values for anything but
add/sub/mul/and/or/xor. 

This is likely because we have to be careful about not introducing UB/poison 
with div/rem/shift. But I suspect we already are getting the poison part wrong 
for add/sub/mul (although it may not be possible to expose the bug currently
because we use SimplifyDemandedVectorElts from a limited set of opcodes).
See the discussion/implementation from D48987 and D49047.

This patch just enables functionality for FP ops because those do not have 
UB/poison potential.

llvm-svn: 343727
2018-10-03 21:44:59 +00:00
Sanjay Patel 306f14ceb8 [InstCombine] clean up foldVectorBinop(); NFC
1. Fix include ordering.
2. Improve variable name (width is bitwidth not number-of-elements).
3. Add local Opcode variable to reduce code duplication.

llvm-svn: 343694
2018-10-03 15:46:03 +00:00
Sanjay Patel 79dceb2903 [InstCombine] name change: foldShuffledBinop -> foldVectorBinop; NFC
This function will deal with more than shuffles with D50992, and I 
have another potential per-element fold that could live here.

llvm-svn: 343692
2018-10-03 15:20:58 +00:00
Florian Hahn 11a1423348 [LoopInterchange] Remove unused variable PreserveLCSSA (NFC).
llvm-svn: 343676
2018-10-03 11:01:23 +00:00
Aditya Kumar a27014b851 Improve static analysis of cold basic blocks
Differential Revision: https://reviews.llvm.org/D52704

Reviewers: sebpop, tejohnson, brzycki, SirishP
Reviewed By: sebpop

llvm-svn: 343663
2018-10-03 06:21:05 +00:00
Aditya Kumar 9e20ade72a Add support for new pass manager
Modified the testcases to use both pass managers
Use single commandline flag for both pass managers.

Differential Revision: https://reviews.llvm.org/D52708
Reviewers: sebpop, tejohnson, brzycki, SirishP
Reviewed By: tejohnson, brzycki

llvm-svn: 343662
2018-10-03 05:55:20 +00:00
David Green 1e44c3b62c [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A
This is an attempt to get out of a local-minimum that instcombine currently
gets stuck in. We essentially combine two optimisations at once, ~a - ~b = b-a
and min(~a, ~b) = ~max(a, b), only doing the transform if the result is at
least neutral. This involves using IsFreeToInvert, which has been expanded a
little to include selects that can be easily inverted.

This is trying to fix PR35875, using the ideas from Sanjay. It is a large
improvement to one of our rgb to cmy kernels.

Differential Revision: https://reviews.llvm.org/D52177

llvm-svn: 343569
2018-10-02 09:48:34 +00:00
Craig Topper d616d33a96 [SimplifyCFG] Use Value::hasNUses instead of 'getNumUses() =='. NFCI
getNumUses is linear in the number of uses. Since we're looking for a specific use count, we can use hasNUses which will stop as soon as it determines there are more than N uses instead of walking all of them.

llvm-svn: 343550
2018-10-01 23:09:52 +00:00
Craig Topper 90c0a0621c [SimplifyCFG] Update comments that refer to CondBB to say ThenBB instead. NFC
There is no variable in this function named CondBB, but there is one named ThenBB and I believe the comments are all refering to it.

llvm-svn: 343548
2018-10-01 22:56:11 +00:00
Eric Christopher dcf1d97c5c Temporarily revert "[GVNHoist] Re-enable GVNHoist by default"
This reverts commit r342387 as it's showing significant performance
regressions in a number of benchmarks. Followed up with the
committer and original thread with an example and will get performance
numbers before recommitting.

llvm-svn: 343522
2018-10-01 18:57:08 +00:00
Jesper Antonsson c954b86391 [InstCombine] Handle vector compares in foldGEPIcmp(), take 2
Summary:
This is a continuation of the fix for PR34627 "InstCombine assertion at vector gep/icmp folding". (I just realized bugpoint had fuzzed the original test for me, so I had fixed another trigger of the same assert in adjacent code in InstCombine.)

This patch avoids optimizing an icmp (to look only at the base pointers) when the resulting icmp would have a different type.

The patch adds a testcase and also cleans up and shrinks the pre-existing test for the adjacent assert trigger.

Reviewers: lebedev.ri, majnemer, spatel

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52494

llvm-svn: 343486
2018-10-01 14:59:25 +00:00
Sanjay Patel 31b07198f1 [InstCombine] try to convert vector insert+extract to trunc; 2nd try
This was originally committed at rL343407, but reverted at 
rL343458 because it crashed trying to handle a case where
the destination type is FP. This version of the patch adds
a check for that possibility. Tests added at rL343480.

Original commit message:

This transform is requested for the backend in:
https://bugs.llvm.org/show_bug.cgi?id=39016
...but I figured it was worth doing in IR too, and it's probably
easier to implement here, so that's this patch.

In the simplest case, we are just truncating a scalar value. If the
extract index doesn't correspond to the LSBs of the scalar, then we
have to shift-right before the truncate. Endian-ness makes this tricky,
but hopefully the ASCII-art helps visualize the transform.

Differential Revision: https://reviews.llvm.org/D52439

llvm-svn: 343482
2018-10-01 14:40:00 +00:00
Hans Wennborg a60aa91374 Revert r343407 "[InstCombine] try to convert vector insert+extract to trunc"
This caused Chromium builds to fail with "Illegal Trunc" assertion.
See https://crbug.com/890723 for repro.

> This transform is requested for the backend in:
> https://bugs.llvm.org/show_bug.cgi?id=39016
> ...but I figured it was worth doing in IR too, and it's probably
> easier to implement here, so that's this patch.
>
> In the simplest case, we are just truncating a scalar value. If the
> extract index doesn't correspond to the LSBs of the scalar, then we
> have to shift-right before the truncate. Endian-ness makes this tricky,
> but hopefully the ASCII-art helps visualize the transform.
>
> Differential Revision: https://reviews.llvm.org/D52439

llvm-svn: 343458
2018-10-01 12:07:45 +00:00
Florian Hahn 8600fee52e Recommit r343308: [LoopInterchange] Turn into a loop pass.
llvm-svn: 343450
2018-10-01 09:59:48 +00:00
Fangrui Song 3507c6e884 Use the container form llvm::sort(C, ...)
There are a few leftovers in rL343163 which span two lines. This commit
changes these llvm::sort(C.begin(), C.end, ...) to llvm::sort(C, ...)

llvm-svn: 343426
2018-09-30 22:31:29 +00:00
Sanjay Patel 1e0f1f645a [InstCombine] try to convert vector insert+extract to trunc
This transform is requested for the backend in:
https://bugs.llvm.org/show_bug.cgi?id=39016
...but I figured it was worth doing in IR too, and it's probably 
easier to implement here, so that's this patch.

In the simplest case, we are just truncating a scalar value. If the 
extract index doesn't correspond to the LSBs of the scalar, then we 
have to shift-right before the truncate. Endian-ness makes this tricky, 
but hopefully the ASCII-art helps visualize the transform.

Differential Revision: https://reviews.llvm.org/D52439

llvm-svn: 343407
2018-09-30 14:34:01 +00:00
Sanjay Patel 26c119a9c2 [InstCombine] allow lengthening of insertelement to eliminate shuffles
As noted in post-commit comments for D52548, the limitation on 
increasing vector length can be applied by opcode.
As a first step, this patch only allows insertelement to be
widened because that has no logical downsides for IR and has 
little risk of pessimizing codegen.

This may cause PR39132 to go into hiding during a full compile,
but that bug is not fixed.

llvm-svn: 343406
2018-09-30 13:50:42 +00:00
Sanjay Patel 54d31ef87e [InstCombine] fix formatting in vector evaluators; NFC
We need to alter the functionality as shown in D52548.

llvm-svn: 343379
2018-09-29 15:05:24 +00:00
Vitaly Buka 0509070811 [cxx2a] Fix warning triggered by r343285
llvm-svn: 343369
2018-09-29 02:17:12 +00:00
whitequark 29b2980159 Revert "[LLVM-C] Add bindings for addCoroutinePassesToExtensionPoints"
This reverts commit c4baf7c2f06ff5459c4f5998ce980346e72bff97.

Broke the bots, and should really be in Transforms/Coroutines
instead.

llvm-svn: 343337
2018-09-28 16:45:18 +00:00
whitequark 937afbc365 [LLVM-C] Add bindings for addCoroutinePassesToExtensionPoints
Summary: This patch adds bindings to C and Go for addCoroutinePassesToExtensionPoints, which is used to add coroutine passes to the correct locations in PassManagerBuilder.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: mehdi_amini, modocache, llvm-commits

Differential Revision: https://reviews.llvm.org/D51642

llvm-svn: 343336
2018-09-28 16:38:11 +00:00
Sanjay Patel 242f90fe82 [InstCombine] don't propagate wider shufflevector arguments to predecessors
InstCombine would propagate shufflevector insts that had wider output vectors onto 
predecessors, which would sometimes push undef's onto the divisor of a div/rem and 
result in bad codegen.

I've fixed this by just banning propagating shufflevector back if the result of 
the shufflevector is wider than the input vectors.

Patch by: @sheredom (Neil Henning)

Differential Revision: https://reviews.llvm.org/D52548

llvm-svn: 343329
2018-09-28 15:24:41 +00:00
Florian Hahn 8d72ecc36f Revert r343308: [LoopInterchange] Turn into a loop pass.
llvm-svn: 343310
2018-09-28 10:20:07 +00:00
Florian Hahn 0694c159f7 [LoopInterchange] Turn into a loop pass.
This patch turns LoopInterchange into a loop pass. It now only
considers top-level loops and tries to move the innermost loop to the
optimal position within the loop nest. By only looking at top-level
loops, we might miss a few opportunities the function pass would get
(e.g. if we have a loop nest of 3 loops, in the function pass
we might process loops at level 1 and 2 and move the inner most loop to
level 1, and then we process loops at levels 0, 1, 2 and interchange
again, because we now have a different inner loop). But I think it would
be better to handle such cases by picking the best inner loop from the
start and avoid re-visiting the same loops again.

The biggest advantage of it being a function pass is that it interacts
nicely with the other loop passes. Without this patch, there are some
performance regressions on AArch64 with loop interchanging enabled,
where no loops were interchanged, but we missed out on some other loop
optimizations.

It also removes the SimplifyCFG run. We are just changing branches, so
the CFG should not be more complicated, besides the additional 'unique'
preheaders this pass might create.


Reviewers: chandlerc, efriedma, mcrosier, javed.absar, xbolva00

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D51702

llvm-svn: 343308
2018-09-28 09:45:50 +00:00
Sanjay Patel c3f50ff92e [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)
When C is not zero and infinites are not allowed (C / X) > 0 is a sign
test. Depending on the sign of C, the predicate must be swapped.

E.g.:
  foo(double X) {
    if ((-2.0 / X) <= 0) ...
  }
 =>
  foo(double X) {
    if (X >= 0) ...
  }

Patch by: @marels (Martin Elshuber)

Differential Revision: https://reviews.llvm.org/D51942

llvm-svn: 343228
2018-09-27 15:59:24 +00:00
Teresa Johnson f24136f17a [WPD] Fix incorrect devirtualization after indirect call promotion
Summary:
Add a dominance check to ensure that the possible devirtualizable
call is actually dominated by the type test/checked load intrinsic being
analyzed. With PGO, after indirect call promotion is performed during
the compile step, followed by inlining, we may have a type test in the
promoted and inlined sequence that allows an indirect call in that
sequence to be devirtualized. That indirect call (inserted by inlining
after promotion) will share the same vtable pointer as the fallback
indirect call that cannot be devirtualized.

Before this patch the code was incorrectly devirtualizing the fallback
indirect call.

See the new test and the example described there for more details.

Reviewers: pcc, vitalybuka

Subscribers: mehdi_amini, Prazek, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D52514

llvm-svn: 343226
2018-09-27 14:55:32 +00:00
Fangrui Song 0cac726a00 llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras is available since rL342102.

Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb

Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D52573

llvm-svn: 343163
2018-09-27 02:13:45 +00:00
Florian Hahn 6feb637124 [LoopInterchange] Preserve LCSSA.
This patch extends LoopInterchange to move LCSSA to the right place
after interchanging. This is required for LoopInterchange to become a
function pass.

An alternative to the manual moving of the PHIs, we could also re-form
the LCSSA phis for a set of interchanged loops, but that's more
expensive.

Reviewers: efriedma, mcrosier, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D52154

llvm-svn: 343132
2018-09-26 19:34:25 +00:00
Vyacheslav Zakharin e06831a3b2 Remove LoopID metadata from the branch instruction
that follows the peeled iterations.

Differential Revision: https://reviews.llvm.org/D52176

llvm-svn: 343054
2018-09-26 01:03:21 +00:00
Zhaoshi Zheng 95710337b4 Revert "Revert "[ConstHoist] Do not rebase single (or few) dependent constant""
This reverts commit bd7b44f35ee9fbe365eb25ce55437ea793b39346.

Reland r342994: disabled the optimization and explicitly enable it in test.

-mllvm -consthoist-min-num-to-rebase<unsigned>=0

[ConstHoist] Do not rebase single (or few) dependent constant

If an instance (InsertionPoint or IP) of Base constant A has only one or few
rebased constants depending on it, do NOT rebase. One extra ADD instruction is
required to materialize each rebased constant, assuming A and the rebased have
the same materialization cost.

Differential Revision: https://reviews.llvm.org/D52243

llvm-svn: 343053
2018-09-26 00:59:09 +00:00
Anna Thomas b1e3d45318 [LV][LAA] Vectorize loop invariant values stored into loop invariant address
Summary:
We are overly conservative in loop vectorizer with respect to stores to loop
invariant addresses.
More details in https://bugs.llvm.org/show_bug.cgi?id=38546
This is the first part of the fix where we start with vectorizing loop invariant
values to loop invariant addresses.

This also includes changes to ORE for stores to invariant address.

Reviewers: anemet, Ayal, mkuper, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50665

llvm-svn: 343028
2018-09-25 20:57:20 +00:00
Jessica Paquette e02de05b32 Revert "[ConstHoist] Do not rebase single (or few) dependent constant"
This caused a couple test failures on a bot:

CodeGen/X86/constant-hoisting-bfi.ll
Transforms/ConstantHoisting/X86/ehpad.ll

Example:

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/53575/

llvm-svn: 343005
2018-09-25 18:41:40 +00:00
Zhaoshi Zheng 2c1a09188f [ConstHoist] Do not rebase single (or few) dependent constant
If an instance (InsertionPoint or IP) of Base constant A has only one or few
rebased constants depending on it, do NOT rebase. One extra ADD instruction is
required to materialize each rebased constant, assuming A and the rebased have
the same materialization cost.

Differential Revision: https://reviews.llvm.org/D52243

llvm-svn: 342994
2018-09-25 17:45:37 +00:00
Sanjay Patel 69ed4710b8 [InstCombine] narrow binops on concatenated vectors (PR33026)
The motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=33026
...has no shuffles now. This kind of pattern may occur during
vectorization when targets have lumpy ISAs like SSE/AVX.

llvm-svn: 342988
2018-09-25 15:57:37 +00:00
David Green 9108c2b921 [LoopUnroll] Add check to Latch's terminator in UnrollRuntimeLoopRemainder
In this patch, I'm adding an extra check to the Latch's terminator in llvm::UnrollRuntimeLoopRemainder,
similar to how it is already done in the llvm::UnrollLoop.

The compiler would crash if this function is called with a malformed loop.

Patch by Rodrigo Caetano Rocha!

Differential Revision: https://reviews.llvm.org/D51486

llvm-svn: 342958
2018-09-25 10:08:47 +00:00