Commit Graph

22876 Commits

Author SHA1 Message Date
Florian Hahn a03d7b0f24 Recommit "[GlobalOpt] Pass DTU to removeUnreachableBlocks instead of recomputing."
The cause for the revert should be fixed by r373513 /
a80b6c1542

This reverts commit 47dbcbd8ec.

llvm-svn: 373522
2019-10-02 20:40:13 +00:00
Evgeniy Stepanov 464df87288 Handle llvm.launder.invariant.group in msan.
Summary:
[MSan] handle llvm.launder.invariant.group

    Msan used to give false-positives in

    class Foo {
     public:
      virtual ~Foo() {};
    };

    // Return true iff *x is set.
    bool f1(void **x, bool flag);

    Foo* f() {
      void *p;
      bool found;
      found = f1(&p,flag);
      if (found) {
        // p is always set here.
        return static_cast<Foo*>(p); // False positive here.
      }
      return nullptr;
    }

Patch by Ilya Tokar.

Reviewers: #sanitizers, eugenis

Reviewed By: #sanitizers, eugenis

Subscribers: eugenis, Prazek, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68236

llvm-svn: 373515
2019-10-02 19:53:19 +00:00
Florian Hahn a80b6c1542 [Local] Handle terminators with users in removeUnreachableBlocks.
Terminators like invoke can have users outside the current basic block.
We have to replace those users with undef, before replacing the
terminator.

This fixes a crash exposed by rL373430.

Reviewers: brzycki, asbirlea, davide, spatel

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D68327

llvm-svn: 373513
2019-10-02 19:38:24 +00:00
Florian Hahn eb6700b57e [Local] Remove unused LazyValueInfo pointer from removeUnreachableBlock.
There are no users that pass in LazyValueInfo, so we can simplify the
function a bit.

Reviewers: brzycki, asbirlea, davide

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D68297

llvm-svn: 373488
2019-10-02 16:58:13 +00:00
Teresa Johnson 077cc3fcb0 [ThinLTO/WPD] Ensure devirtualized targets use promoted symbol when necessary
Summary:
This fixes a hole in the handling of devirtualized targets that were
local but need promoting due to devirtualization in another module. We
were not correctly referencing the promoted symbol in some cases. Make
sure the code that updates the name also looks at the ExportedGUIDs set
by utilizing a callback that checks all conditions (the callback
utilized by the internalization/promotion code).

Reviewers: pcc, davidxl, hiraditya

Subscribers: mehdi_amini, Prazek, inglorion, steven_wu, dexonsmith, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68159

llvm-svn: 373485
2019-10-02 16:36:59 +00:00
Simon Pilgrim 91b4085b03 LowerExpectIntrinsic handlePhiDef - silence static analyzer dyn_cast<PHINode> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<PHINode> directly and if not assert will fire for us.

llvm-svn: 373481
2019-10-02 16:03:45 +00:00
Aditya Kumar c4a7b912c2 [CodeExtractor] NFC: Refactor sanity checks into isEligible
Reviewers: fhahn

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68331

llvm-svn: 373479
2019-10-02 15:36:39 +00:00
Aditya Kumar b1fe6c90e6 NFC: directly return when CommonExitBlock != Succ
Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68330

llvm-svn: 373456
2019-10-02 12:15:17 +00:00
Simon Pilgrim da4cbae696 LICM - remove unused variable and reduce scope of another variable. NFCI.
Appeases both clang static analyzer and cppcheck

llvm-svn: 373453
2019-10-02 11:49:53 +00:00
Florian Hahn 47dbcbd8ec Revert [GlobalOpt] Pass DTU to removeUnreachableBlocks instead of recomputing.
This breaks http://lab.llvm.org:8011/builders/sanitizer-windows/builds/52310

This reverts r373430 (git commit 70f7003548)

llvm-svn: 373432
2019-10-02 08:32:25 +00:00
Florian Hahn 70f7003548 [GlobalOpt] Pass DTU to removeUnreachableBlocks instead of recomputing.
removeUnreachableBlocks knows how to preserve the DomTree, so make use
of it instead of re-computing the DT.

Reviewers: davide, kuhar, brzycki

Reviewed By: davide, kuhar

Differential Revision: https://reviews.llvm.org/D68298

llvm-svn: 373430
2019-10-02 08:15:31 +00:00
Florian Hahn 167b0529be [Local] Simplify function removeUnreachableBlocks() to avoid (re-)computation.
Two small changes in llvm::removeUnreachableBlocks() to avoid unnecessary (re-)computation.

First, replace the use of count() with find(), which has better time complexity.

Second, because we have already computed the set of dead blocks, replace the second loop over all basic blocks to a loop only over the already computed dead blocks. This simplifies the loop and avoids recomputation.

Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com>

Reviewers: efriedma, spatel, fhahn, xbolva00

Reviewed By: fhahn, xbolva00

Differential Revision: https://reviews.llvm.org/D68191

llvm-svn: 373429
2019-10-02 07:37:41 +00:00
Sanjay Patel 9738fd6387 [BypassSlowDivision][CodeGenPrepare] avoid crashing on unused code (PR43514)
https://bugs.llvm.org/show_bug.cgi?id=43514

llvm-svn: 373394
2019-10-01 21:25:36 +00:00
Leonard Chan 8830975cf6 [ASan][NFC] Address remaining comments for https://reviews.llvm.org/D68287
I submitted that patch after I got the LGTM, but the comments didn't
appear until after I submitted the change. This adds `const` to the
constructor argument and makes it a pointer.

llvm-svn: 373391
2019-10-01 20:49:07 +00:00
Leonard Chan 63663616f5 [ASan] Make GlobalsMD member a const reference.
PR42924 points out that copying the GlobalsMetadata type during
construction of AddressSanitizer can result in exteremely lengthened
build times for translation units that have many globals. This can be addressed
by just making the GlobalsMD member in AddressSanitizer a reference to
avoid the copy. The GlobalsMetadata type is already passed to the
constructor as a reference anyway.

Differential Revision: https://reviews.llvm.org/D68287

llvm-svn: 373389
2019-10-01 20:30:46 +00:00
Roman Lebedev 053014f8f9 [InstCombine] Deal with -(trunc(X >>u 63)) -> trunc(X >>s 63)
Identical to it's trunc-less variant, just pretent-to hoist
trunc, and everything else still holds:
https://rise4fun.com/Alive/JRU

llvm-svn: 373364
2019-10-01 17:50:20 +00:00
Roman Lebedev 65144149d0 [InstCombine] Preserve 'exact' in -(X >>u 31) -> (X >>s 31) fold
https://rise4fun.com/Alive/yR4

llvm-svn: 373363
2019-10-01 17:50:09 +00:00
Philip Reames 0200626f0b [IndVars] An implementation of loop predication without a need for speculation
This patch implements a variation of a well known techniques for JIT compilers - we have an implementation in tree as LoopPredication - but with an interesting twist. This version does not assume the ability to execute a path which wasn't taken in the original program (such as a guard or widenable.condition intrinsic). The benefit is that this works for arbitrary IR from any frontend (including C/C++/Fortran). The tradeoff is that it's restricted to read only loops without implicit exits.

This builds on SCEV, and can thus eliminate the loop varying portion of the any early exit where all exits are understandable by SCEV. A key advantage is that fixing deficiency exposed in SCEV - already found one while writing test cases - will also benefit all of full redundancy elimination (and most other loop transforms).

I haven't seen anything in the literature which quite matches this. Given that, I'm not entirely sure that keeping the name "loop predication" is helpful. Anyone have suggestions for a better name? This is analogous to partial redundancy elimination - since we remove the condition flowing around the backedge - and has some parallels to our existing transforms which try to make conditions invariant in loops.

Factoring wise, I chose to put this in IndVarSimplify since it's a generally applicable to all workloads. I could split this off into it's own pass, but we'd then probably want to add that new pass every place we use IndVars.  One solid argument for splitting it off into it's own pass is that this transform is "too good". It breaks a huge number of existing IndVars test cases as they tend to be simple read only loops.  At the moment, I've opted it off by default, but if we add this to IndVars and enable, we'll have to update around 20 test files to add side effects or disable this transform.

Near term plan is to fuzz this extensively while off by default, reflect and discuss on the factoring issue mentioned just above, and then enable by default.  I also need to give some though to supporting widenable conditions in this framing.

Differential Revision: https://reviews.llvm.org/D67408

llvm-svn: 373351
2019-10-01 17:03:44 +00:00
David Bolvansky 4037582d6b Revert [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX)
Seems to be slower than memcpy + strlen.

llvm-svn: 373335
2019-10-01 13:19:04 +00:00
David Bolvansky 8fc6a1bf56 [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX)
llvm-svn: 373333
2019-10-01 13:03:10 +00:00
Evandro Menezes 41ead4281f [SimplifyLibCalls] Define the value of the Euler number
This patch fixes the build break on Windows hosts.

There must be a better way of accessing the equivalent POSIX math constant
`M_E`.

llvm-svn: 373274
2019-09-30 23:21:02 +00:00
Evandro Menezes 110b1138ba [InstCombine] Expand the simplification of log()
Expand the simplification of special cases of `log()` to include `log2()`
and `log10()` as well as intrinsics and more types.

Differential revision: https://reviews.llvm.org/D67199

llvm-svn: 373261
2019-09-30 20:52:21 +00:00
Alina Sbirlea 0fa07f4276 [LegacyPassManager] Deprecate the BasicBlockPass/Manager.
Summary:
The BasicBlockManager is potentially broken and should not be used.
Replace all uses of the BasicBlockPass with a FunctionBlockPass+loop on
blocks.

Reviewers: chandlerc

Subscribers: jholewinski, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68234

llvm-svn: 373254
2019-09-30 20:17:23 +00:00
David Bolvansky a05e671c7e [FunctionAttrs] Added noalias for memccpy/mempcpy arguments
llvm-svn: 373251
2019-09-30 19:43:48 +00:00
Roman Lebedev faa90eca63 [InstCombine][NFC] visitShl(): call SimplifyQuery::getWithInstruction() once
llvm-svn: 373249
2019-09-30 19:16:00 +00:00
Rong Xu 3674050087 [PGO] Don't group COMDAT variables for compiler generated profile variables in ELF
With this patch, compiler generated profile variables will have its own COMDAT
name for ELF format, which syncs the behavior with COFF. Tested with clang
PGO bootstrap. This shows a modest reduction in object sizes in ELF format.

Differential Revision: https://reviews.llvm.org/D68041

llvm-svn: 373241
2019-09-30 18:11:22 +00:00
Alina Sbirlea 8299fd9dee [EarlyCSE] Pass preserves AA.
llvm-svn: 373231
2019-09-30 17:08:40 +00:00
Sanjay Patel 712b7c2463 [InstCombine] fold negate disguised as select+mul
Name: negate if true
  %sel = select i1 %cond, i32 -1, i32 1
  %r = mul i32 %sel, %x
  =>
  %m = sub i32 0, %x
  %r = select i1 %cond, i32 %m, i32 %x

  Name: negate if false
  %sel = select i1 %cond, i32 1, i32 -1
  %r = mul i32 %sel, %x
  =>
  %m = sub i32 0, %x
  %r = select i1 %cond, i32 %x, i32 %m

https://rise4fun.com/Alive/Nlh

llvm-svn: 373230
2019-09-30 17:02:26 +00:00
Guillaume Chatelet ab11b9188d [Alignment][NFC] Remove AllocaInst::setAlignment(unsigned)
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, arsenm, jvesely, nhaehnle, eraman, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68141

llvm-svn: 373207
2019-09-30 13:34:44 +00:00
Guillaume Chatelet 17380227e8 [Alignment][NFC] Remove LoadInst::setAlignment(unsigned)
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jdoerfert

Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68142

llvm-svn: 373195
2019-09-30 09:37:05 +00:00
Aditya Kumar a6d9d31279 [LLVM-C][Ocaml] Add MergeFunctions and DCE pass
MergeFunctions and DCE pass are missing from OCaml/C-api. This patch
adds them.

Differential Revision: https://reviews.llvm.org/D65071

Reviewers: whitequark, hiraditya, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Tags: #llvm

Authored by: kren1

llvm-svn: 373170
2019-09-29 16:06:22 +00:00
Roman Lebedev d30093bb8a [DivRemPairs] Don't assert that we won't ever get expanded-form rem pairs in different BB's (PR43500)
If we happen to have the same div in two basic blocks,
and in one of those we also happen to have the rem part,
we'd match the div-rem pair, but the wrong ones.
So let's drop overly-ambiguous assert.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43500

llvm-svn: 373167
2019-09-29 15:25:24 +00:00
Alexey Bataev 8b1eeafb91 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 373166
2019-09-29 14:18:06 +00:00
Aditya Kumar 2adae76cc6 [NFC] Move hot cold splitting class to header file
Summary:  This is to facilitate unittests

Reviewers: compnerd, vsk, tejohnson, sebpop, brzycki, SirishP

Reviewed By: tejohnson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68079

llvm-svn: 373151
2019-09-28 18:13:33 +00:00
Wei Mi f0c4e70e95 [SampleFDO] Create a separate flag profile-accurate-for-symsinlist to handle
profile symbol list.

Currently many existing users using profile-sample-accurate want to reduce
code size as much as possible. Their use cases are different from the scenario
profile symbol list tries to handle -- the major motivation of adding profile
symbol list is to get the major memory/code size saving without introduce
performance regression. So to keep the behavior of profile-sample-accurate
unchanged, we think decoupling these two things and using a new flag to
control the handling of profile symbol list may be better.

When profile-sample-accurate and the new flag profile-accurate-for-symsinlist
are both present, since profile-sample-accurate is a user assertion we let it
have a higher precedence.

Differential Revision: https://reviews.llvm.org/D68047

llvm-svn: 373133
2019-09-27 22:33:59 +00:00
Roman Lebedev 269f1bea0d [InstCombine] Simplify shift-by-sext to shift-by-zext
Summary:
This is valid for any `sext` bitwidth pair:
```
Processing /tmp/opt.ll..

----------------------------------------
  %signed = sext %y
  %r = shl %x, %signed
  ret %r
=>
  %unsigned = zext %y
  %r = shl %x, %unsigned
  ret %r
  %signed = sext %y

Done: 2016
Optimization is correct!
```

(This isn't so for funnel shifts, there it's illegal for e.g. i6->i7.)

Main motivation is the C++ semantics:
```
int shl(int a, char b) {
    return a << b;
}
```
ends as
```
  %3 = sext i8 %1 to i32
  %4 = shl i32 %0, %3
```
https://godbolt.org/z/0jgqUq
which is, as this shows, too pessimistic.

There is another problem here - we can only do the fold
if sext is one-use. But we can trivially have cases
where several shifts have the same sext shift amount.
This should be resolved, later.

Reviewers: spatel, nikic, RKSimon

Reviewed By: spatel

Subscribers: efriedma, hiraditya, nlopes, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68103

llvm-svn: 373106
2019-09-27 18:12:15 +00:00
Simon Pilgrim 2e0de86808 ModuleUtils - silence static analyzer dyn_cast<> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<> directly and if not assert will fire for us.

llvm-svn: 373099
2019-09-27 16:55:49 +00:00
Simon Pilgrim f71f23d14d FunctionImportGlobalProcessing::processGlobalForThinLTO - silence static analyzer dyn_cast<FunctionSummary> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<FunctionSummary> directly and if not assert will fire for us.

llvm-svn: 373097
2019-09-27 15:49:19 +00:00
Simon Pilgrim 623b0e6963 SCCP - silence static analyzer dyn_cast<StructType> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<StructType> directly and if not assert will fire for us.

llvm-svn: 373095
2019-09-27 15:49:10 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Guillaume Chatelet d886f391af [Alignment][NFC] MaybeAlign in GVNExpression
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67922

llvm-svn: 373054
2019-09-27 08:56:43 +00:00
Peter Collingbourne c336557f02 hwasan: Compatibility fixes for short granules.
We can't use short granules with stack instrumentation when targeting older
API levels because the rest of the system won't understand the short granule
tags stored in shadow memory.

Moreover, we need to be able to let old binaries (which won't understand
short granule tags) run on a new system that supports short granule
tags. Such binaries will call the __hwasan_tag_mismatch function when their
outlined checks fail. We can compensate for the binary's lack of support
for short granules by implementing the short granule part of the check in
the __hwasan_tag_mismatch function. Unfortunately we can't do anything about
inline checks, but I don't believe that we can generate these by default on
aarch64, nor did we do so when the ABI was fixed.

A new function, __hwasan_tag_mismatch_v2, is introduced that lets code
targeting the new runtime avoid redoing the short granule check. Because tag
mismatches are rare this isn't important from a performance perspective; the
main benefit is that it introduces a symbol dependency that prevents binaries
targeting the new runtime from running on older (i.e. incompatible) runtimes.

Differential Revision: https://reviews.llvm.org/D68059

llvm-svn: 373035
2019-09-27 01:02:10 +00:00
Jordan Rupprecht f98d2c099a Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
This reverts r372626 (git commit 6a278d9073)

llvm-svn: 373019
2019-09-26 22:09:17 +00:00
Kit Barton 50bc610460 [LoopFusion] Add ability to fuse guarded loops
Summary:
This patch extends the current capabilities in loop fusion to fuse guarded loops
(as defined in https://reviews.llvm.org/D63885). The patch adds the necessary
safety checks to ensure that it safe to fuse the guarded loops (control flow
equivalent, no intervening code, and same guard conditions). It also provides an
alternative method to perform the actual fusion of guarded loops. The mechanics
to fuse guarded loops are slightly different then fusing non-guarded loops, so I
opted to keep them separate methods. I will be cleaning this up in later
patches, and hope to converge on a single method to fuse both guarded and
non-guarded loops, but for now I think the review will be easier to keep them
separate.

Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65464

llvm-svn: 373018
2019-09-26 21:42:45 +00:00
Zhaoshi Zheng 1128fa0924 [Unroll] Do NOT unroll a loop with small runtime upperbound
For a runtime loop if we can compute its trip count upperbound:

Don't unroll if:
1. loop is not guaranteed to run either zero or upperbound iterations; and
2. trip count upperbound is less than UnrollMaxUpperBound
Unless user or TTI asked to do so.

If unrolling, limit unroll factor to loop's trip count upperbound.

Differential Revision: https://reviews.llvm.org/D62989

Change-Id: I6083c46a9d98b2e22cd855e60523fdc5a4929c73
llvm-svn: 373017
2019-09-26 21:40:27 +00:00
Craig Topper 46721bb7f5 [InstCombine] Use m_Zero instead of isNullValue() when checking if a GEP index is all zeroes to prevent an infinite loop.
The test case here previously infinite looped. Only one element from the GEP is used so SimplifyDemandedVectorElts would replace the other lanes in each index with undef leading to the first index being <0, undef, undef, undef>. But there's a GEP transform that tries to replace an index into a 0 sized type with a zero index. But the zero index check only works on ConstantInt 0 or ConstantAggregateZero so it would turn the index back to zeroinitializer. Resulting in a loop.

The fix is to use m_Zero() to allow a vector of zeroes and undefs.

Differential Revision: https://reviews.llvm.org/D67977

llvm-svn: 373000
2019-09-26 17:20:50 +00:00
Jakub Kuderski d98cb81cd1 Handle successor's PHI node correctly when flattening CFG merges two if-regions
Summary:
FlattenCFG merges two 'if' basicblocks by inserting one basicblock
to another basicblock. The inserted basicblock can have a successor
that contains a PHI node whoes incoming basicblock is the inserted
basicblock. Since the existing code does not handle it, it becomes
a badref.

if (cond1)
  statement
if (cond2)
  statement
successor - contains PHI node whose predecessor is cond2

-->
if (cond1 || cond2)
  statement
(BB for cond2 was deleted)
successor - contains PHI node whose predecessor is cond2 --> bad ref!

Author: Jaebaek Seo

Reviewers: asbirlea, kuhar, tstellar, chandlerc, davide, dexonsmith

Reviewed By: kuhar

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68032

llvm-svn: 372989
2019-09-26 15:20:17 +00:00
Simon Pilgrim c15cd009ac [FlattenCFG] Silence static analyzer dyn_cast<BranchInst> null dereference warnings. NFCI.
The static analyzer is warning about a potential null dereferences, but we should be able to use cast<BranchInst> directly and if not assert will fire for us.

llvm-svn: 372977
2019-09-26 13:33:15 +00:00
Bjorn Pettersson 163c54d288 [InstCombine] Don't assume CmpInst has been visited in getFlippedStrictnessPredicateAndConstant
Summary:
Removing an assumption (assert) that the CmpInst already has been
simplified in getFlippedStrictnessPredicateAndConstant. Solution is
to simply bail out instead of hitting the assertion. Instead we
assume that any profitable rewrite will happen in the next iteration
of InstCombine.

The reason why we can't assume that the CmpInst already has been
simplified is that the worklist does not guarantee such an ordering.

Solves https://bugs.llvm.org/show_bug.cgi?id=43376

Reviewers: spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68022

llvm-svn: 372972
2019-09-26 12:16:01 +00:00
Simon Pilgrim 6b794dfd3d MemorySanitizer - silence static analyzer dyn_cast<> null dereference warnings. NFCI.
The static analyzer is warning about a potential null dereferences, but we should be able to use cast<> directly and if not assert will fire for us.

llvm-svn: 372960
2019-09-26 10:56:14 +00:00
Simon Pilgrim faa5b39e4e PGOMemOPSizeOpt - silence static analyzer dyn_cast<MemIntrinsic> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<MemIntrinsic> directly and if not assert will fire for us.

llvm-svn: 372959
2019-09-26 10:56:07 +00:00
Roman Lebedev a2fa03af3a [InstCombine] foldUnsignedUnderflowCheck(): one last pattern with 'sub' (PR43251)
https://rise4fun.com/Alive/0j9

llvm-svn: 372930
2019-09-25 22:59:59 +00:00
Eli Friedman 69dddfe268 [LICM] Don't verify domtree/loopinfo unless EXPENSIVE_CHECKS is enabled.
For large functions, verifying the whole function after each loop takes
non-linear time.

Differential Revision: https://reviews.llvm.org/D67571

llvm-svn: 372924
2019-09-25 22:35:47 +00:00
Roman Lebedev 23646952e2 [InstCombine] Fold (A - B) u>=/u< A --> B u>/u<= A iff B != 0
https://rise4fun.com/Alive/KtL

This also shows that the fold added in D67412 / r372257
was too specific, and the new fold allows those test cases
to be handled more generically, therefore i delete now-dead code.

This is yet again motivated by
D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour"

llvm-svn: 372912
2019-09-25 19:06:40 +00:00
Florian Hahn f3ab99dcf8 [InstCombine] Limit FMul constant folding for fma simplifications.
As @reames pointed out post-commit, rL371518 adds additional rounding
in some cases, when doing constant folding of the multiplication.
This breaks a guarantee llvm.fma makes and must be avoided.

This patch reapplies rL371518, but splits off the simplifications not
requiring rounding from SimplifFMulInst as SimplifyFMAFMul.

Reviewers: spatel, lebedev.ri, reames, scanon

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D67434

llvm-svn: 372899
2019-09-25 17:03:20 +00:00
Florian Hahn 5c3bc3c930 [PatternMatch] Make m_Br more flexible, add matchers for BB values.
Currently m_Br only takes references to BasicBlock*, which limits its
flexibility. For example, you have to declare a variable, even if you
ignore the result or you have to have additional checks to make sure the
matched BB matches an expected one.

This patch adds m_BasicBlock and m_SpecificBB matchers, which can be
used like the existing matchers for constants or values.

I also had a look at the existing uses and updated a few. IMO it makes
the code a bit more explicit.

Reviewers: spatel, craig.topper, RKSimon, majnemer, lebedev.ri

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D68013

llvm-svn: 372885
2019-09-25 15:05:08 +00:00
Philip Reames d9629b88ff [GCRelocate] Add a peephole to canonicalize base pointer relocation
If we generate the gc.relocate, and then later prove two arguments to the statepoint are equivalent, we should canonicalize the gc.relocate to the form we would have produced if this had been known before rewriting.

llvm-svn: 372771
2019-09-24 17:24:16 +00:00
Roman Lebedev 45fd1e9d50 [InstCombine] (a+b) < a && (a+b) != 0 -> (0-b) < a iff a/b != 0 (PR43259)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

For
```
#include <cassert>
char* test(char& base, signed long offset) {
  __builtin_assume(offset < 0);
  return &base + offset;
}
```
We produce

https://godbolt.org/z/r40U47

and again those two icmp's can be merged:
```
Name: 0
Pre: C != 0
  %adjusted = add i8 %base, C
  %not_null = icmp ne i8 %adjusted, 0
  %no_underflow = icmp ult i8 %adjusted, %base
  %r = and i1 %not_null, %no_underflow
=>
  %neg_offset = sub i8 0, C
  %r = icmp ugt i8 %base, %neg_offset
```
https://rise4fun.com/Alive/ALap
https://rise4fun.com/Alive/slnN

There are 3 other variants of this pattern,
i believe they all will go into InstSimplify.

https://bugs.llvm.org/show_bug.cgi?id=43259

Reviewers: spatel, xbolva00, nikic

Reviewed By: spatel

Subscribers: efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67849

llvm-svn: 372768
2019-09-24 16:10:50 +00:00
Roman Lebedev 5b881f356c [InstCombine] (a+b) <= a && (a+b) != 0 -> (0-b) < a (PR43259)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

This pattern isn't exactly what we get there
(strict vs. non-strict predicate), but this pattern does not
require known-bits analysis, so it is best to handle it first.

```
Name: 0
  %adjusted = add i8 %base, %offset
  %not_null = icmp ne i8 %adjusted, 0
  %no_underflow = icmp ule i8 %adjusted, %base
  %r = and i1 %not_null, %no_underflow
=>
  %neg_offset = sub i8 0, %offset
  %r = icmp ugt i8 %base, %neg_offset
```
https://rise4fun.com/Alive/knp

There are 3 other variants of this pattern,
they all will go into InstSimplify:
https://rise4fun.com/Alive/bIDZ

https://bugs.llvm.org/show_bug.cgi?id=43259

Reviewers: spatel, xbolva00, nikic

Reviewed By: spatel

Subscribers: hiraditya, majnemer, vsk, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67846

llvm-svn: 372767
2019-09-24 16:10:38 +00:00
Simon Pilgrim 934f18144d LoopVectorize - silence static analyzer dyn_cast<CmpInst> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<CmpInst> directly and if not assert will fire for us.

llvm-svn: 372732
2019-09-24 11:27:38 +00:00
Simon Pilgrim b6d11def37 [SimplifyCFG] FoldTwoEntryPHINode - silence static analyzer null dereference warning. NFCI.
Assert that we've found the DomBlock.

llvm-svn: 372728
2019-09-24 11:17:20 +00:00
Simon Pilgrim 9e8076b219 SimplifyCFG - silence static analyzer dyn_cast<LandingPadInst> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<LandingPadInst> directly and if not assert will fire for us.

llvm-svn: 372727
2019-09-24 11:17:13 +00:00
Simon Pilgrim bc58230e29 SimplifyCFG - silence static analyzer dyn_cast<Instruction> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<Instruction> directly and if not assert will fire for us.

llvm-svn: 372726
2019-09-24 11:17:06 +00:00
Alexey Lapshin 49f3c2b604 [Debuginfo] dbg.value points to undef value after Induction Variable Simplification.
Induction Variable Simplification pass does not update dbg.value intrinsic.

Before:

%add = add nuw nsw i32 %ArgIndex.06, 1
call void @llvm.dbg.value(metadata i32 %add, metadata !17, metadata !DIExpression())

After:

%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
call void @llvm.dbg.value(metadata i64 undef, metadata !17, metadata !DIExpression())

There should be:

%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
call void @llvm.dbg.value(metadata i64 %indvars.iv.next, metadata !17, metadata !DIExpression())

Differential Revision: https://reviews.llvm.org/D67770

llvm-svn: 372703
2019-09-24 08:47:03 +00:00
Sjoerd Meijer 0fcb3afb40 [LV] Forced vectorization with runtime checks and OptForSize
When vectorisation is forced with a pragma, we optimise for min size, and we
need to emit runtime memory checks, then allow this code growth and don't run
in an assert like we currently do.

This is the result of D65197 and D66803, and was a use-case not really
considered before. If this now happens, we emit an optimisation remark warning
about the code-size expansion, which can be avoided by not forcing
vectorisation or possibly source-code modifications.

Differential Revision: https://reviews.llvm.org/D67764

llvm-svn: 372694
2019-09-24 08:03:34 +00:00
Huihui Zhang a4dd98f2e9 [InstCombine] Fold a shifty implementation of clamp-to-allones.
Summary:
Fold
or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X)
into
X s> Y ? -1 : X

https://rise4fun.com/Alive/d8Ab

clamp255 is a common operator in image processing, can be implemented
in a shifty way "(255 - X) >> 31 | X & 255". Fold shift into select
enables more optimization, e.g., vmin generation for ARM target.

Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon

Reviewed By: lebedev.ri

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67800

llvm-svn: 372678
2019-09-24 00:30:09 +00:00
Huihui Zhang 8952199715 [InstCombine] Fold a shifty implementation of clamp-to-zero.
Summary:
Fold
and(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X)
into
X s> Y ? X : 0

https://rise4fun.com/Alive/lFH

Fold shift into select enables more optimization,
e.g., vmax generation for ARM target.

Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon

Reviewed By: lebedev.ri

Subscribers: xbolva00, andreadb, craig.topper, RKSimon, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67799

llvm-svn: 372676
2019-09-24 00:15:03 +00:00
Saleem Abdulrasool 082f895b1a HotColdSplitting: invalidate the AssumptionCache on split
When a cold path is outlined, the value tracking in the assumption cache may be
invalidated due to the code motion.  We would previously trip an assertion in
subsequent passes (but required the passes to happen in a single run as the
assumption cache is shared across the passes).  Invalidating the cache ensures
that we get the correct information when needed with the legacy pass manager as
well.

llvm-svn: 372667
2019-09-23 22:23:01 +00:00
Wei Mi 22fd88530b [SampleFDO] Treat names in profile as not cold only when profile symbol list
is available

In rL372232, we treated names showing up in profile as not cold when
profile-sample-accurate is enabled. This caused 70k size regression in
Chrome/Android. The patch put a guard and only enable the change when
profile symbol list is available, i.e., keep the old behavior when profile
symbol list is not available.

Differential Revision: https://reviews.llvm.org/D67931

llvm-svn: 372665
2019-09-23 22:11:35 +00:00
Roman Lebedev 23aac95a32 [InstCombine] foldOrOfICmps(): Acquire SimplifyQuery with set CxtI
Extracted from https://reviews.llvm.org/D67849#inline-610377

llvm-svn: 372654
2019-09-23 20:40:47 +00:00
Roman Lebedev 595cfda059 [InstCombine] foldAndOfICmps(): Acquire SimplifyQuery with set CxtI
Extracted from https://reviews.llvm.org/D67849#inline-610377

llvm-svn: 372653
2019-09-23 20:40:40 +00:00
David Bolvansky 48db0272d6 [InstCombine] Annotate strndup calls with dereferenceable_or_null
"Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size."

llvm-svn: 372647
2019-09-23 19:55:45 +00:00
Roman Lebedev 47e1ce4abe [IR] Add getExtendedType() to IntegerType and Type (dispatching to IntegerType or VectorType)
llvm-svn: 372638
2019-09-23 18:21:33 +00:00
Roman Lebedev 1972327d63 [InstCombine] dropRedundantMaskingOfLeftShiftInput(): improve comment
llvm-svn: 372637
2019-09-23 18:21:14 +00:00
David Bolvansky 8d52016155 [SLC] Convert some strndup calls to strdup calls
Summary:
Motivation:
- If we can fold it to strdup, we should (strndup does more things than strdup).
- Annotation mechanism. (Works for strdup well).

strdup and strndup are part of C 20 (currently posix fns), so we should optimize them.

Reviewers: efriedma, jdoerfert

Reviewed By: jdoerfert

Subscribers: lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67679

llvm-svn: 372636
2019-09-23 18:20:01 +00:00
Roman Lebedev 0a51e1f66d [InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. c/d/e with mask (PR42563)
Summary:
If we have a pattern `(x & (-1 >> maskNbits)) << shiftNbits`,
we already know (have a fold) that will drop the `& (-1 >> maskNbits)`
mask iff `(shiftNbits-maskNbits) s>= 0` (i.e. `shiftNbits u>= maskNbits`).

So even if `(shiftNbits-maskNbits) s< 0`, we can still
fold, we will just need to apply a **constant** mask afterwards:
```
Name: c, normal+mask
  %t0 = lshr i32 -1, C1
  %t1 = and i32 %t0, %x
  %r = shl i32 %t1, C2
=>
  %n0 = shl i32 %x, C2
  %n1 = i32 ((-(C2-C1))+32)
  %n2 = zext i32 %n1 to i64
  %n3 = lshr i64 -1, %n2
  %n4 = trunc i64 %n3 to i32
  %r = and i32 %n0, %n4
```
https://rise4fun.com/Alive/gslRa

Naturally, old `%masked` will have to be one-use.
This is not valid for pattern f - where "masking" is done via `ashr`.

https://bugs.llvm.org/show_bug.cgi?id=42563

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67725

llvm-svn: 372630
2019-09-23 17:04:28 +00:00
Roman Lebedev b4a1d8a84c [InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask (PR42563)
Summary:
And this is **finally** the interesting part of that fold!

If we have a pattern `(x & (~(-1 << maskNbits))) << shiftNbits`,
we already know (have a fold) that will drop the `& (~(-1 << maskNbits))`
mask iff `(maskNbits+shiftNbits) u>= bitwidth(x)`.
But that is actually ignorant, there's more general fold here:

In this pattern, `(maskNbits+shiftNbits)` actually correlates
with the number of low bits that will remain in the final value.
So even if `(maskNbits+shiftNbits) u< bitwidth(x)`, we can still
fold, we will just need to apply a **constant** mask afterwards:
```
Name: a, normal+mask
  %onebit = shl i32 -1, C1
  %mask = xor i32 %onebit, -1
  %masked = and i32 %mask, %x
  %r = shl i32 %masked, C2
=>
  %n0 = shl i32 %x, C2
  %n1 = add i32 C1, C2
  %n2 = zext i32 %n1 to i64
  %n3 = shl i64 -1, %n2
  %n4 = xor i64 %n3, -1
  %n5 = trunc i64 %n4 to i32
  %r = and i32 %n0, %n5
```
https://rise4fun.com/Alive/F5R

Naturally, old `%masked` will have to be one-use.
Similar fold exists for patterns c,d,e, will post patch later.

https://bugs.llvm.org/show_bug.cgi?id=42563

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67677

llvm-svn: 372629
2019-09-23 17:04:14 +00:00
Alexey Bataev 6a278d9073 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Summary:
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 372626
2019-09-23 16:25:03 +00:00
Roman Lebedev 01ac23ca62 [InstCombine] foldUnsignedUnderflowCheck(): s/Subtracted/ZeroCmpOp/
llvm-svn: 372625
2019-09-23 16:04:32 +00:00
David Bolvansky d90fd41f7e [FunctionAttrs] Enable nonnull arg propagation
Enable flag introduced in rL294998. Security concerns are no longer valid, since function signatures for mentioned libc functions has no nonnull attribute (Clang does not generate them? I see no nonnull attr in LLVM IR for these functions) and since rL372091 we carefully annotate the callsites where we know that size is static, non zero. So let's enable this flag again..

llvm-svn: 372573
2019-09-23 09:58:02 +00:00
Simon Pilgrim 2441455bc8 [LSR] Silence static analyzer null dereference warnings with assertions. NFCI.
Add assertions to make it clear that GenerateIVChain / NarrowSearchSpaceByPickingWinnerRegs should succeed in finding non-null values

llvm-svn: 372518
2019-09-22 17:59:24 +00:00
Simon Pilgrim db05a482bc ConstantHoisting - Silence static analyzer dyn_cast<PointerType> null dereference warning. NFCI.
llvm-svn: 372517
2019-09-22 17:45:05 +00:00
Sanjay Patel eb8d39e113 [InstCombine] allow icmp+binop folds before min/max bailout (PR43310)
This has the potential to uncover missed analysis/folds as shown in the
min/max code comment/test, but fewer restrictions on icmp folds should
be better in general to solve cases like:
https://bugs.llvm.org/show_bug.cgi?id=43310

llvm-svn: 372510
2019-09-22 14:31:53 +00:00
Simon Pilgrim a56bd6c51e [VPlan] Silence static analyzer dyn_cast null dereference warning. NFCI.
llvm-svn: 372502
2019-09-22 13:02:00 +00:00
Suyog Sarda cd629ea0a8 SROA: Check Total Bits of vector type
While Promoting alloca instruction of Vector Type, 
Check total size in bits of its slices too.
If they don't match, don't promote the alloca instruction.

Bug : https://bugs.llvm.org/show_bug.cgi?id=42585

llvm-svn: 372480
2019-09-21 18:16:37 +00:00
Suyog Sarda c62136e674 Test mail. NFC.
Testing commit acces. NFC.

llvm-svn: 372479
2019-09-21 18:03:30 +00:00
Hideto Ueno 63f6066b53 [Attributor] Implement "norecurse" function attribute deduction
Summary:
This patch introduces `norecurse` function attribute deduction.

`norecurse` will be deduced if the following conditions hold:
* The size of SCC in which the function belongs equals to 1.
* The function doesn't have self-recursion.
* We have `norecurse` for all call site.

To avoid a large change, SCC is calculated using scc_iterator in InfoCache initialization for now.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67751

llvm-svn: 372475
2019-09-21 15:13:19 +00:00
Simon Pilgrim 2e0c95edfe [AddressSanitizer] Don't dereference dyn_cast<ConstantInt> results. NFCI.
The static analyzer is warning about potential null dereference, but we can use cast<ConstantInt> directly and if not assert will fire for us.

llvm-svn: 372429
2019-09-20 20:52:21 +00:00
Akira Hatanaka 75fbb171c3 [ObjC][ARC] Skip debug instructions when computing the insert point of
objc_release calls

This fixes a bug where the presence of debug instructions would cause
ARC optimizer to change the order of retain and release calls.

rdar://problem/55319419

llvm-svn: 372352
2019-09-19 20:58:51 +00:00
Jakub Kuderski e6b2164723 Don't use invalidated iterators in FlattenCFGPass
Summary:
FlattenCFG may erase unnecessary blocks, which also invalidates iterators to those erased blocks.
Before this patch, `iterativelyFlattenCFG` could try to increment a BB iterator after that BB has been removed and crash.

This patch makes FlattenCFGPass use `WeakVH` to skip over erased blocks.

Reviewers: dblaikie, tstellar, davide, sanjoy, asbirlea, grosser

Reviewed By: asbirlea

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67672

llvm-svn: 372347
2019-09-19 19:39:42 +00:00
Roman Lebedev 7a67ed5795 [InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

In this particular case, given
```
char* test(char& base, unsigned long offset) {
  return &base - offset;
}
```
it will end up producing something like
https://godbolt.org/z/luGEju
which after optimizations reduces down to roughly
```
declare void @use64(i64)
define i1 @test(i8* dereferenceable(1) %base, i64 %offset) {
  %base_int = ptrtoint i8* %base to i64
  %adjusted = sub i64 %base_int, %offset
  call void @use64(i64 %adjusted)
  %not_null = icmp ne i64 %adjusted, 0
  %no_underflow = icmp ule i64 %adjusted, %base_int
  %no_underflow_and_not_null = and i1 %not_null, %no_underflow
  ret i1 %no_underflow_and_not_null
}
```
Without D67122 there was no `%not_null`,
and in this particular case we can "get rid of it", by merging two checks:
Here we are checking: `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset`

Alive proofs:
https://rise4fun.com/Alive/QOs

The `@llvm.usub.with.overflow` pattern itself is not handled here
because this is the main pattern, that we currently consider canonical.

https://bugs.llvm.org/show_bug.cgi?id=43251

Reviewers: spatel, nikic, xbolva00, majnemer

Reviewed By: xbolva00, majnemer

Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67356

llvm-svn: 372341
2019-09-19 17:25:19 +00:00
Sanjay Patel 13e71ce693 [Float2Int] avoid crashing on unreachable code (PR38502)
In the example from:
https://bugs.llvm.org/show_bug.cgi?id=38502
...we hit infinite looping/crashing because we have non-standard IR -
an instruction operand is used before defined.
This and other unusual constructs are allowed in unreachable blocks,
so avoid the problem by using DominatorTree to step around landmines.

Differential Revision: https://reviews.llvm.org/D67766

llvm-svn: 372339
2019-09-19 16:31:17 +00:00
Serguei Katkov a44768858c [Unroll] Add an option to control complete unrolling
Add an ability to specify the max full unroll count for LoopUnrollPass pass
in pass options.

Reviewers: fhahn, fedor.sergeev
Reviewed By: fedor.sergeev
Subscribers: hiraditya, zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D67701

llvm-svn: 372305
2019-09-19 06:57:29 +00:00
Roman Lebedev feea722cf3 [SimplifyCFG] mergeConditionalStoreToAddress(): try to pacify MSAN
MSAN bot complains that there is use-of-uninitialized-value
of this FreeStores later in IsWorthwhile().
Perhaps FreeStores needs to be stored in a vector?

llvm-svn: 372262
2019-09-18 21:04:39 +00:00
Roman Lebedev b646dd92c2 [InstCombine] foldUnsignedUnderflowCheck(): handle last few cases (PR43251)
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.

This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.

The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns..

Proofs: https://rise4fun.com/Alive/21b

Also, there is something odd with `isKnownNonZero()`, if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267)

With this, i see no other missing folds for
https://bugs.llvm.org/show_bug.cgi?id=43251

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67412

llvm-svn: 372257
2019-09-18 20:10:07 +00:00
Roman Lebedev dd0170ab24 [SimplifyCFG] mergeConditionalStoreToAddress(): consider cost, not instruction count
Summary:
As it can be see in the changed test, while `div` is really costly,
we were speculating it. This does not seem correct.

Also, the old code would run for every single insturuction in BB,
instead of eagerly bailing out as soon as there are too many instructions.

This function still has a problem that `PHINodeFoldingThreshold` is
per-basic-block, while it should be for all the basic blocks.

Reviewers: efriedma, craig.topper, dmgreen, jmolloy

Reviewed By: jmolloy

Subscribers: xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67315

llvm-svn: 372255
2019-09-18 19:46:57 +00:00
Roman Lebedev ba4cad9039 [InstCombine] dropRedundantMaskingOfLeftShiftInput(): some cleanup before upcoming patch
llvm-svn: 372245
2019-09-18 18:38:40 +00:00
Wei Mi 5f7e822dc7 [SampleFDO] Minimize performance impact when profile-sample-accurate
is enabled.

We can save memory and reduce binary size significantly by enabling
ProfileSampleAccurate. However when ProfileSampleAccurate is true,
function without sample will be regarded as cold and this could
potentially cause performance regression.

To minimize the potential negative performance impact, we want to be
a little conservative here saying if a function shows up in the profile,
no matter as outline instance, inline instance or call targets, treat
the function as not being cold. This will handle the cases such as most
callsites of a function are inlined in sampled binary (thus outline copy
don't get any sample) but not inlined in current build (because of source
code drift, imprecise debug information, or the callsites are all cold
individually but not cold accumulatively...), so that the outline function
showing up as cold in sampled binary will actually not be cold after current
build. After the change, such function will be treated as not cold even
profile-sample-accurate is enabled.

At the same time we lower the hot criteria of callsiteIsHot check when
profile-sample-accurate is enabled. callsiteIsHot is used to determined
whether a callsite is hot and qualified for early inlining. When
profile-sample-accurate is enabled, functions without profile will be
regarded as cold and much less inlining will happen in CGSCC inlining pass,
so we can worry less about size increase and be aggressive to allow more
early inlining to happen for warm callsites and it is helpful for performance
overall.

Differential Revision: https://reviews.llvm.org/D67561

llvm-svn: 372232
2019-09-18 16:06:28 +00:00
Sanjay Patel d46bf63fbb [SimplifyLibCalls] fix crash with empty function name (PR43347)
...and improve some variable names while here.

https://bugs.llvm.org/show_bug.cgi?id=43347

llvm-svn: 372227
2019-09-18 14:33:40 +00:00
Teresa Johnson fd2044f299 [PGO] Change hardcoded thresholds for cold/inlinehint to use summary
Summary:
The PGO counter reading will add cold and inlinehint (hot) attributes
to functions that are very cold or hot. This was using hardcoded
thresholds, instead of the profile summary cutoffs which are used in
other hot/cold detection and are more dynamic and adaptable. Switch
to using the summary-based cold/hot detection.

The hardcoded limits were causing some code that had a medium level of
hotness (per the summary) to be incorrectly marked with a cold
attribute, blocking inlining.

Reviewers: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67673

llvm-svn: 372189
2019-09-17 23:12:13 +00:00
Reid Kleckner 23e872a3d0 [PGO] Don't use comdat groups for counters & data on COFF
For COFF, a comdat group is really a symbol marked
IMAGE_COMDAT_SELECT_ANY and zero or more other symbols marked
IMAGE_COMDAT_SELECT_ASSOCIATIVE. Typically the associative symbols in
the group are not external and are not referenced by other TUs, they are
things like debug info, C++ dynamic initializers, or other section
registration schemes. The Visual C++ linker reports a duplicate symbol
error for symbols marked IMAGE_COMDAT_SELECT_ASSOCIATIVE even if they
would be discarded after handling the leader symbol.

Fixes coverage-inline.cpp in check-profile after r372020.

llvm-svn: 372182
2019-09-17 21:10:49 +00:00
Roman Lebedev 97bc5ae993 [NFC][InstCombine] dropRedundantMaskingOfLeftShiftInput(): some NFC diff shaving
llvm-svn: 372171
2019-09-17 19:32:26 +00:00
David Bolvansky 0c0de794f1 Reland "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)"
llvm-svn: 372142
2019-09-17 17:12:24 +00:00
Krasimir Georgiev bdff164e0e Revert "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)"
Summary:
This reverts commit r372101.

Causes ASAN build bot failures:

http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176
From http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176/steps/64-bit%20check-asan/logs/stdio:

```
[ RUN      ] AddressSanitizer.StrNCatOOBTest
/home/buildbots/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/lib/asan/tests/asan_str_test.cpp:462: Failure
Death test: strncat(to - 1, from, 0)
    Result: failed to die.
```

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67658

llvm-svn: 372125
2019-09-17 14:15:23 +00:00
Simon Pilgrim a2719f38c1 [LoopVectorize] Don't dereference a dyn_cast result. NFCI.
The static analyzer is warning about potential null dereferences of dyn_cast<> results, we can use cast<> directly as we know that these cases should all be CastInst, which is why its working atm and anyway cast<> will assert if they aren't.

llvm-svn: 372116
2019-09-17 13:24:54 +00:00
Benjamin Kramer df4b9a3f4f Hide implementation details in namespaces.
llvm-svn: 372113
2019-09-17 12:56:29 +00:00
Johannes Doerfert 3ab9e8b818 [Attributor][Fix] Initialize the cache prior to using it
Summary:
There were segfaults as we modified and iterated the instruction maps in
the cache at the same time. This was happening because we created new
instructions while we populated the cache. This fix changes the order
in which we perform these actions. First, the caches for the whole
module are created, then we start to create abstract attributes.

I don't have a unit test but the LLVM test suite exposes this problem.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67232

llvm-svn: 372105
2019-09-17 10:52:41 +00:00
David Bolvansky ded48e93e6 [SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)
llvm-svn: 372101
2019-09-17 10:25:38 +00:00
David Bolvansky be2487a2ba [InstCombine] Annotate strdup with deref_or_null
llvm-svn: 372098
2019-09-17 10:12:48 +00:00
David Bolvansky 3a3dddd9d7 [NFCI] Fixed buildbots
llvm-svn: 372097
2019-09-17 10:03:45 +00:00
Fangrui Song 8351763709 [SimplifyLibCalls] Fix -Wunused-result after D53342/r372091
llvm-svn: 372096
2019-09-17 09:56:55 +00:00
David Bolvansky e80fcf0340 [SimplifyLibCalls] Mark known arguments with nonnull
Reviewers: efriedma, jdoerfert

Reviewed By: jdoerfert

Subscribers: ychen, rsmith, joerg, aaron.ballman, lebedev.ri, uenoku, jdoerfert, hfinkel, javed.absar, spatel, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D53342

llvm-svn: 372091
2019-09-17 09:32:52 +00:00
Florian Hahn 1bd58870e5 [LoopUnroll] Use LoopSize+1 as threshold, to allow unrolling loops matching LoopSize.
We use `< UP.Threshold` later on, so we should use LoopSize + 1, to
allow unrolling if the result won't exceed to loop size.

Fixes PR43305.

Reviewers: efriedma, dmgreen, paquette

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D67594

llvm-svn: 372084
2019-09-17 09:02:48 +00:00
Hideto Ueno 30d86f1858 [Attributor] Use Alias Analysis in noalias callsite argument deduction
Summary: This patch adds a check of alias analysis in `noalias` callsite argument deduction.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67604

llvm-svn: 372075
2019-09-17 06:53:27 +00:00
Hideto Ueno 3bb5cbc20b [Attributor] Create helper struct for handling analysis getters
Summary: This patch introduces a helper struct `AnalysisGetter` to put together analysis getters. In this patch, a getter for `AAResult` is also added for  `noalias`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67603

llvm-svn: 372072
2019-09-17 05:45:18 +00:00
Reid Kleckner 32837a0c93 [PGO] Use linkonce_odr linkage for __profd_ variables in comdat groups
This fixes relocations against __profd_ symbols in discarded sections,
which is PR41380.

In general, instrumentation happens very early, and optimization and
inlining happens afterwards. The counters for a function are calculated
early, and after inlining, counters for an inlined function may be
widely referenced by other functions.

For C++ inline functions of all kinds (linkonce_odr &
available_externally mainly), instr profiling wants to deduplicate these
__profc_ and __profd_ globals. Otherwise the binary would be quite
large.

I made __profd_ and __profc_ comdat in r355044, but I chose to make
__profd_ internal. At the time, I was only dealing with coverage, and in
that case, none of the instrumentation needs to reference __profd_.
However, if you use PGO, then instrumentation passes add calls to
__llvm_profile_instrument_range which reference __profd_ globals. The
solution is to make these globals externally visible by using
linkonce_odr linkage for data as was done for counters.

This is safe because PGO adds a CFG hash to the names of the data and
counter globals, so if different TUs have different globals, they will
get different data and counter arrays.

Reviewers: xur, hans

Differential Revision: https://reviews.llvm.org/D67579

llvm-svn: 372020
2019-09-16 18:49:09 +00:00
Roman Lebedev 10151f6618 [SimplifyCFG] FoldTwoEntryPHINode(): consider *total* speculation cost, not per-BB cost
Summary:
Previously, if the threshold was 2, we were willing to speculatively
execute 2 cheap instructions in both basic blocks (thus we were willing
to speculatively execute cost = 4), but weren't willing to speculate
when one BB had 3 instructions and other one had no instructions,
even thought that would have total cost of 3.

This looks inconsistent to me.
I don't think `cmov`-like instructions will start executing
until both of it's inputs are available: https://godbolt.org/z/zgHePf
So i don't see why the existing behavior is the correct one.

Also, let's add it's own `cl::opt` for this threshold,
with default=4, so it is not stricter than the previous threshold:
will allow to fold when there are 2 BB's each with cost=2.
And since the logic has changed, it will also allow to fold when
one BB has cost=3 and other cost=1, or there is only one BB with cost=4.

This is an alternative solution to D65148:
This fix is mainly motivated by `signbit-like-value-extension.ll` test.
That pattern comes up in JPEG decoding, see e.g.
`Figure F.12 – Extending the sign bit of a decoded value in V`
of `ITU T.81` (JPEG specification).
That branch is not predictable, and it is within the innermost loop,
so the fact that that pattern ends up being stuck with a branch
instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial.

This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240)
| metric                                 |     old |     new | delta |      % |
| x86-mi-counting.NumMachineFunctions    |   37720 |   37721 |     1 |  0.00% |
| x86-mi-counting.NumMachineBasicBlocks  |  773545 |  771181 | -2364 | -0.31% |
| x86-mi-counting.NumMachineInstructions | 7488843 | 7486442 | -2401 | -0.03% |
| x86-mi-counting.NumUncondBR            |  135770 |  135543 |  -227 | -0.17% |
| x86-mi-counting.NumCondBR              |  423753 |  422187 | -1566 | -0.37% |
| x86-mi-counting.NumCMOV                |   24815 |   25731 |   916 |  3.69% |
| x86-mi-counting.NumVecBlend            |      17 |      17 |     0 |  0.00% |

We significantly decrease basic block count, notably decrease instruction count,
significantly decrease branch count and very significantly increase `cmov` count.

Performance-wise, unsurprisingly, this has great effect on
target RawSpeed benchmark. I'm seeing 5 **major** improvements:
```
Benchmark                                                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue                                 0.0000          0.0000      U Test, Repetitions: 49 vs 49
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean                                  -0.3064         -0.3064      226.9913      157.4452      226.9800      157.4384
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median                                -0.3057         -0.3057      226.8407      157.4926      226.8282      157.4828
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev                                -0.4985         -0.4954        0.3051        0.1530        0.3040        0.1534
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean                                   -0.1747         -0.1747       80.4787       66.4227       80.4771       66.4146
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median                                 -0.1742         -0.1743       80.4686       66.4542       80.4690       66.4436
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev                                 +0.6089         +0.5797        0.0670        0.1078        0.0673        0.1062
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue                                 0.0000          0.0000      U Test, Repetitions: 49 vs 49
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean                                  -0.1598         -0.1598      171.6996      144.2575      171.6915      144.2538
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median                                -0.1598         -0.1597      171.7109      144.2755      171.7018      144.2766
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev                                +0.4024         +0.3850        0.0847        0.1187        0.0848        0.1175
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean                                   -0.0550         -0.0551      280.3046      264.8800      280.3017      264.8559
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median                                 -0.0554         -0.0554      280.2628      264.7360      280.2574      264.7297
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev                                 +0.7005         +0.7041        0.2779        0.4725        0.2775        0.4729
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean                                   -0.0354         -0.0355      316.7396      305.5208      316.7342      305.4890
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median                                 -0.0354         -0.0356      316.6969      305.4798      316.6917      305.4324
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev                                 +0.0493         +0.0330        0.3562        0.3737        0.3563        0.3681
```

That being said, it's always best-effort, so there will likely
be cases where this worsens things.

Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc

Reviewed By: jmolloy

Subscribers: xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67318

llvm-svn: 372009
2019-09-16 16:18:24 +00:00
Sanjay Patel 3961a143e1 [InstCombine] remove unneeded one-use checks for icmp fold
Related folds were added in:
rL125734
...the code comment about register pressure is discussed in
more detail in:
https://bugs.llvm.org/show_bug.cgi?id=2698

But 10 years later, perf testing bzip2 with this change now
shows a slight (0.2% average) improvement on Haswell although
that's probably within test noise.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

rL371940 and rL371981 are related patches in this series.

llvm-svn: 372007
2019-09-16 16:15:25 +00:00
Sanjay Patel c5cd808156 [InstCombine] remove unneeded one-use checks for icmp fold
This fold and several others were added in:
rL125734 <https://reviews.llvm.org/rL125734>
...with no explanation for the one-use checks other than the code
comments about register pressure.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

rL371940 is a related patch in this series.

llvm-svn: 371981
2019-09-16 12:54:34 +00:00
Sanjay Patel 91c2cd0691 [InstCombine] fix comments to match code; NFC
This blob was written before match() existed, so it
could probably be reduced significantly.

But I suspect it isn't well tested, so tests would have
to be added to reduce risk from logic changes.

llvm-svn: 371978
2019-09-16 12:12:05 +00:00
Simon Pilgrim 1aaefbca24 [VPlanSLP] Don't dereference a cast_or_null<VPInstruction> result. NFCI.
The static analyzer is warning about a potential null dereference of the cast_or_null result, I've split the cast_or_null check from the ->getUnderlyingInstr() call to avoid this, but it appears that we weren't seeing any null pointers in the dumped bundles in the first place.

llvm-svn: 371975
2019-09-16 11:22:44 +00:00
Simon Pilgrim bfe6b35c70 [SLPVectorizer] Assert that we find a LastInst to silence analyzer null dereference warning. NFCI.
llvm-svn: 371974
2019-09-16 10:48:16 +00:00
Simon Pilgrim ae625d70cd [SLPVectorizer] Don't dereference a dyn_cast result. NFCI.
The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't.

llvm-svn: 371973
2019-09-16 10:35:09 +00:00
Stefan Stipanovic 431141c5cc [Attributor] Heap-To-Stack Conversion
D53362 gives a prototype heap-to-stack conversion pass. With addition of new attributes in the attributor, this can now be revisted and improved. This will place it in the Attributor to make it easier to use new attributes (eg. nofree, nosync, willreturn, etc.) and other attributor features.

Reviewers: jdoerfert, uenoku, hfinkel, efriedma

Subscribers: lebedev.ri, xbolva00, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D65408

llvm-svn: 371942
2019-09-15 21:47:41 +00:00
Sanjay Patel 3daf168fa9 [InstCombine] remove unneeded one-use checks for icmp fold
This fold and several others were added in:
rL125734
...with no explanation for the one-use checks other than the code
comments about register pressure.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

There are similar checks as noted with the TODO comments. I'm
hoping to remove those restrictions too, but if any of these
does cause a regression, it should be easier to correct by making
small, individual commits.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

llvm-svn: 371940
2019-09-15 20:56:34 +00:00
Simon Pilgrim 4e46ea3946 [LoadStoreVectorizer] vectorizeLoadChain - ensure we find a valid Type down the load chain. NFCI.
Silence static analyzer uninitialized variable warning by setting the LoadTy to null and then asserting we find a real value.

llvm-svn: 371936
2019-09-15 16:44:35 +00:00
Sanjay Patel b6a0faaa0c [SLP] limit vectorization of Constant subclasses (PR33958)
This is a fix for:
https://bugs.llvm.org/show_bug.cgi?id=33958

It seems universally true that we would not want to transform this kind of
sequence on any target, but if that's not correct, then we could view this
as a target-specific cost model problem. We could also white-list ConstantInt,
ConstantFP, etc. rather than blacklist Global and ConstantExpr.

Differential Revision: https://reviews.llvm.org/D67362

llvm-svn: 371931
2019-09-15 13:03:24 +00:00
Johannes Doerfert e7c6f97039 [Attributor][Fix] Use right type to replace expressions
Summary: This should be obsolete once the functionality in D66967 is integrated.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67231

llvm-svn: 371915
2019-09-14 02:57:50 +00:00
Florian Hahn cde8343d85 [BasicBlockUtils] Add optional BBName argument, in line with BB:splitBasicBlock
Reviewers: spatel, asbirlea, craig.topper

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D67521

llvm-svn: 371819
2019-09-13 08:03:32 +00:00
Alina Sbirlea 6943472d45 [MemorySSA] Pass (for update) MSSAU when hoisting instructions.
Summary: Pass MSSAU to makeLoopInvariant in order to properly update MSSA.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, uabelho, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67470

llvm-svn: 371748
2019-09-12 17:12:51 +00:00
Sanjay Patel 2bfb955c51 [InstCombine] rename variable for readability; NFC
There's more that can be done here, but "OpI"
doesn't convey that we casted to BinaryOperator.

llvm-svn: 371682
2019-09-11 22:31:34 +00:00
Eli Friedman 403e08d4cf [ConstantHoisting] Fix non-determinism.
Differential Revision: https://reviews.llvm.org/D66114

llvm-svn: 371644
2019-09-11 18:55:00 +00:00
Petr Hosek 7bdad08429 Reland "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300

We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.

We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.

A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.

In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect

Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324

llvm-svn: 371635
2019-09-11 16:19:50 +00:00
Florian Hahn 51de22c8ee Revert [InstCombine] Use SimplifyFMulInst to simplify multiply in fma.
This introduces additional rounding error in some cases. See D67434.

This reverts r371518 (git commit 18a1f0818b)

llvm-svn: 371634
2019-09-11 16:17:03 +00:00
Whitney Tsang 1ccba7c1a1 LLVM: Optimization Pass: Remove conflicting attribute, if any, before
adding new read attribute to an argument
Summary: Update optimization pass to prevent adding read-attribute to an
argument without removing its conflicting attribute.

A read attribute, based on the result of the attribute deduction
process, might be added to an argument. The attribute might be in
conflict with other read/write attribute currently associated with the
argument. To ensure the compatibility of attributes, conflicting
attribute, if any, must be removed before a new one is added.

The following snippet shows the current behavior of the compiler, where
the compilation process is aborted due to incompatible attributes.

$ cat x.ll
; ModuleID = 'x.bc'

%_type_of_d-ccc = type <{ i8*, i8, i8, i8, i8 }>

@d-ccc = internal global %_type_of_d-ccc <{ i8* null, i8 1, i8 13, i8 0,
i8 -127 }>, align 8

define void @foo(i32* writeonly %.aaa) {
foo_entry:
  %_param_.aaa = alloca i32*, align 8
  store i32* %.aaa, i32** %_param_.aaa, align 8
  store i8 0, i8* getelementptr inbounds (%_type_of_d-ccc,
%_type_of_d-ccc* @d-ccc, i32 0, i32 3)
  ret void
}

$ opt -O3 x.ll
Attributes 'readnone and writeonly' are incompatible!
void (i32*)* @foo
in function foo
LLVM ERROR: Broken function found, compilation aborted!
The purpose of this changeset is to fix the above error. This fix is
based on a suggestion from Johannes @jdoerfert (many thanks!!!)
Authored By: anhtuyen
Reviewer: nicholas, rnk, chandlerc, jdoerfert
Reviewed By: rnk
Subscribers: hiraditya, jdoerfert, llvm-commits, anhtuyen, LLVM
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D58694

llvm-svn: 371622
2019-09-11 14:26:22 +00:00
Sanjay Patel 80bea345d1 [InstCombine] fold sign-bit compares of srem
(srem X, pow2C) sgt/slt 0 can be reduced using bit hacks by masking
off the sign bit and the module (low) bits:
https://rise4fun.com/Alive/jSO
A '2' divisor allows slightly more folding:
https://rise4fun.com/Alive/tDBM

Any chance to remove an 'srem' use is probably worthwhile, but this is limited
to the one-use improvement case because doing more may expose other missing
folds. That means it does nothing for PR21929 yet:
https://bugs.llvm.org/show_bug.cgi?id=21929

Differential Revision: https://reviews.llvm.org/D67334

llvm-svn: 371610
2019-09-11 12:04:26 +00:00
David Bolvansky 4dae283cd3 [InstCombine] Fixed handling of isOpNewLike (PR11748)
llvm-svn: 371602
2019-09-11 10:37:03 +00:00
Florian Hahn e79381c3f7 [LoopInterchange] Drop unused splitInnerLoopHeader declaration.
llvm-svn: 371601
2019-09-11 10:32:15 +00:00
Dmitri Gribenko 57256af307 Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
This reverts commit r371584. It introduced a dependency from compiler-rt
to llvm/include/ADT, which is problematic for multiple reasons.

One is that it is a novel dependency edge, which needs cross-compliation
machinery for llvm/include/ADT (yes, it is true that right now
compiler-rt included only header-only libraries, however, if we allow
compiler-rt to depend on anything from ADT, other libraries will
eventually get used).

Secondly, depending on ADT from compiler-rt exposes ADT symbols from
compiler-rt, which would cause ODR violations when Clang is built with
the profile library.

llvm-svn: 371598
2019-09-11 09:16:17 +00:00
Florian Hahn e4961218fd [LoopInterchange] Properly move condition, induction increment and ops to latch.
Currently we only rely on the induction increment to come before the
condition to ensure the required instructions get moved to the new
latch.

This patch duplicates and moves the required instructions to the
newly created latch. We move the condition to the end of the new block,
then process its operands. We stop at operands that are defined
outside the loop, or are the induction PHI.

We duplicate the instructions and update the uses in the moved
instructions, to ensure other users remain intact. See the added
test2 for such an example.

Reviewers: efriedma, mcrosier

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D67367

llvm-svn: 371595
2019-09-11 08:23:23 +00:00
Hideto Ueno 1d68ed8c24 [Attributor] Implement "noalias" callsite argument deduction
Summary: Now, `nocapture` is deduced in Attributor therefore, this patch introduces deduction for `noalias` callsite argument using `nocapture`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67286

llvm-svn: 371590
2019-09-11 07:00:33 +00:00
Hideto Ueno 3736764657 [Attributor][Fix] Manifest nocapture only in CSArgument or Argument
Summary:
We can query to Attributor whether the value is captured in the scope or not on the following way:

```
    const auto & NoCapAA = A.getAAFor<AANoCapture>(*this, IRPosition::value(V));
```
And if V is CallSiteReturned then `getDeducedAttribute` will add `nocatpure` to the callsite returned value. It is not valid.
This patch checks the position is an argument or call site argument.

This is tested in D67286.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67342

llvm-svn: 371589
2019-09-11 06:52:11 +00:00
Alexey Lapshin 6b1c6c1287 [Debuginfo][Instcombiner] Do not clone dbg.declare.
TryToSinkInstruction() has a bug: While updating debug info for
sunk instruction, it could clone dbg.declare intrinsic.
That is wrong. There could be only one dbg.declare.
The fix is to not clone dbg.declare intrinsic and to update
it`s arguments, to not to point to sunk instruction.

Differential Revision: https://reviews.llvm.org/D67217

llvm-svn: 371587
2019-09-11 06:07:16 +00:00
Petr Hosek 394a8ed8f1 clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300

We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.

We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.

A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.

In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect

Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324

llvm-svn: 371584
2019-09-11 01:09:16 +00:00
Alina Sbirlea f9cc0393b3 [MemorySSA] MemorySSA should not model debuginfo, and need not update it.
Reverts the change in r371084, but keeps the test.
After r371565, debuginfo cannot be modelled in MemorySSA, even with a
non-standard AA pipeline.

llvm-svn: 371573
2019-09-10 23:36:43 +00:00
Philip Reames cffa630c80 [Loads] Move generic code out of vectorizer into a location it might be reused [NFC]
llvm-svn: 371558
2019-09-10 21:33:53 +00:00
Philip Reames 1e1db80048 [ValueTracking] Factor our common speculation suppression logic [NFC]
Expose a utility function so that all places which want to suppress speculation (when otherwise legal) due to ordering and/or sanitizer interaction can do so.

llvm-svn: 371556
2019-09-10 21:12:29 +00:00
Florian Hahn 18a1f0818b [InstCombine] Use SimplifyFMulInst to simplify multiply in fma.
This allows us to fold fma's that multiply with 0.0. Also, the
multiply by 1.0 case is handled there as well. The fneg/fabs cases
are not handled by SimplifyFMulInst, so we need to keep them.

Reviewers: spatel, anemet, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D67351

llvm-svn: 371518
2019-09-10 13:10:28 +00:00
Dmitri Gribenko 2bf8d77453 Revert "Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.""
This reverts commit r371502, it broke tests
(clang/test/CodeGenCXX/auto-var-init.cpp).

llvm-svn: 371507
2019-09-10 10:39:09 +00:00
Clement Courbet 612c260ec3 Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."
With a fix for sanitizer breakage (see explanation in D60318).

llvm-svn: 371502
2019-09-10 09:18:00 +00:00
Petr Hosek 7d1757aba8 Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
This reverts commit r371484: this broke sanitizer-x86_64-linux-fast bot.

llvm-svn: 371488
2019-09-10 06:25:13 +00:00
Petr Hosek a10802fd73 clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300

We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.

We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.

A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.

In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect

Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324

llvm-svn: 371484
2019-09-10 03:11:39 +00:00
Philip Reames 7403569be7 [LoopVectorize] Leverage speculation safety to avoid masked.loads
If we're vectorizing a load in a predicated block, check to see if the load can be speculated rather than predicated.  This allows us to generate a normal vector load instead of a masked.load.

To do so, we must prove that all bytes accessed on any iteration of the original loop are dereferenceable, and that all loads (across all iterations) are properly aligned.  This is equivelent to proving that hoisting the load into the loop header in the original scalar loop is safe.

Note: There are a couple of code motion todos in the code.  My intention is to wait about a day - to be sure this sticks - and then perform the NFC motion without furthe review.

Differential Revision: https://reviews.llvm.org/D66688

llvm-svn: 371452
2019-09-09 20:54:13 +00:00
Sanjay Patel aff5bee35f [InstCombine] fold extract+insert into identity shuffle
This is similar to the existing fold for splats added with:
rL365379

If we can adjust the shuffle mask to include another element
in an identity mask (if it changes vector length, that's an
extract/insert subvector operation in the backend), then that
can eliminate extractelement/insertelement pairs in IR.

All targets are expected to lower shuffles with identity masks
efficiently.

llvm-svn: 371340
2019-09-08 19:03:01 +00:00
Simon Pilgrim 879ed20bde Fix typo. NFCI
llvm-svn: 371317
2019-09-07 18:09:09 +00:00
Roman Lebedev 45ba26599b [SimplifyCFG] SpeculativelyExecuteBB(): It's SpeculatedInstructions, not SpeculationCost
It counts the number of instructions we are ok speculating
(at most 1 there), not their cost, so rename accordingly.

llvm-svn: 371294
2019-09-07 09:06:06 +00:00
Hideto Ueno f2b9dc4758 [Attributor] ValueSimplify Abstract Attribute
Summary:
This patch introduces initial `AAValueSimplify` which simplifies a value in a context.

example
- (for function returned) If all the return values are the same and constant, then we can replace callsite returned with the constant.
- If an internal function takes the same value(constant) as an argument in the callsite, then we can replace the argument with that constant.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66967

llvm-svn: 371291
2019-09-07 07:03:05 +00:00
Teresa Johnson 9c27b59cec Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.

Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.

There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.

Reviewers: chandlerc, hfinkel

Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66428

llvm-svn: 371284
2019-09-07 03:09:36 +00:00
Evandro Menezes 7d677adf2d [InstCombine] Refactor substitution of instruction in the parent BB (NFC)
Add the new method `LibCallSimplifier::substituteInParent()` that calls
`LibCallSimplifier::replaceAllUsesWith()' and
`LibCallSimplifier::eraseFromParent()` back to back, simplifying the
resulting code.

llvm-svn: 371264
2019-09-06 22:07:11 +00:00
Sanjay Patel 4f0e429acc [SimplifyLibCalls] handle pow(x,-0.0) before it can assert (PR43233)
https://bugs.llvm.org/show_bug.cgi?id=43233

llvm-svn: 371221
2019-09-06 16:10:18 +00:00
Matt Arsenault 524a9d5774 InstCombine: Fix crash on icmp of gep with addrspacecasted null
llvm-svn: 371146
2019-09-05 23:39:21 +00:00
Vitaly Buka 9020f11377 [SimplifyCFG] Don't SimplifyBranchOnICmpChain with ExtraCase
Summary:
Here we try to avoid issues with "explicit branch" with SimplifyBranchOnICmpChain
which can check on undef. Msan by design reports branches on uninitialized
memory and undefs, so we have false report here.

In general msan does not like when we convert

```
// If at least one of them is true we can MSAN is ok if another is undefs
if (a || b)
  return;
```
into
```
// If 'a' is undef MSAN will complain even if 'b' is true
if (a)
  return;
if (b)
  return;
```

Example

Before optimization we had something like this:
```
while (true) {
  bool maybe_undef = doStuff();

  while (true) {
    char c = getChar();
    if (c != 10 && c != 13)
     continue
    break;
  }

  // we know that c == 10 || c == 13 if we get here,
  // so msan know that branch is not affected by maybe_undef
  if (maybe_undef || c == 10 || c == 13)
    continue;
  return;
}
```

SimplifyBranchOnICmpChain will convert that into
```
while (true) {
  bool maybe_undef = doStuff();

  while (true) {
    char c = getChar();
    if (c != 10 && c != 13)
      continue;
    break;
  }

  // however msan will complain here:
  if (maybe_undef)
    continue;

  // we know that c == 10 || c == 13, so either way we will get continue
  switch(c) {
    case 10: continue;
    case 13: continue;
  }
  return;
}
```

Reviewers: eugenis, efriedma

Reviewed By: eugenis, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67205

llvm-svn: 371138
2019-09-05 22:49:34 +00:00
Roman Lebedev 8360c42e25 [InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned sub overflow' check
A follow-up for r329011.
This may be changed to produce @llvm.sub.with.overflow in a later patch,
but for now just make things more consistent overall.

A few observations stem from this:
* There does not seem to be a similar one-instruction fold for uadd-overflow
* I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`,
  so since the `icmp` here no longer refers to `sub`,
  reconstructing `usub.with.overflow` will be problematic,
  and will likely require standalone pass (similar to DivRemPairs).

https://rise4fun.com/Alive/Zqs

Name: (A - B) u> A --> B u> A
  %t0 = sub i8 %A, %B
  %r = icmp ugt i8 %t0, %A
=>
  %r = icmp ugt i8 %B, %A

Name: (A - B) u<= A --> B u<= A
  %t0 = sub i8 %A, %B
  %r = icmp ule i8 %t0, %A
=>
  %r = icmp ule i8 %B, %A

Name: C u< (C - D) --> C u< D
  %t0 = sub i8 %C, %D
  %r = icmp ult i8 %C, %t0
=>
  %r = icmp ult i8 %C, %D

Name: C u>= (C - D) --> C u>= D
  %t0 = sub i8 %C, %D
  %r = icmp uge i8 %C, %t0
=>
  %r = icmp uge i8 %C, %D

llvm-svn: 371101
2019-09-05 17:41:02 +00:00
Roman Lebedev ecb7ea1ae7 [InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned add overflow' check
A follow-up for r342004.
This will be changed to produce @llvm.add.with.overflow in a later patch,
but for now just make things more consistent overall.

https://rise4fun.com/Alive/qxE

Name: (Op1 + X) u< Op1 --> ~Op1 u< X
  %t0 = add i8 %Op1, %X
  %r = icmp ult i8 %t0, %Op1
=>
  %n = xor i8 %Op1, -1
  %r = icmp ult i8 %n, %X

Name: (Op1 + X) u>= Op1 --> ~Op1 u>= X
  %t0 = add i8 %Op1, %X
  %r = icmp uge i8 %t0, %Op1
=>
  %n = xor i8 %Op1, -1
  %r = icmp uge i8 %n, %X

;-------------------------------------------------------------------------------

Name: Op0 u> (Op0 + X) --> X u> ~Op0
  %t0 = add i8 %Op0, %X
  %r = icmp ugt i8 %Op0, %t0
=>
  %n = xor i8 %Op0, -1
  %r = icmp ugt i8 %X, %n

Name: Op0 u<= (Op0 + X) --> X u<= ~Op0
  %t0 = add i8 %Op0, %X
  %r = icmp ule i8 %Op0, %t0
=>
  %n = xor i8 %Op0, -1
  %r = icmp ule i8 %X, %n

llvm-svn: 371100
2019-09-05 17:40:49 +00:00
Denis Bakhvalov 58f172f05a [MergedLoadStoreMotion] Sink stores to BB with more than 2 predecessors
If we have:

bb5:
  br i1 %arg3, label %bb6, label %bb7

bb6:
  %tmp = getelementptr inbounds i32, i32* %arg1, i64 2
  store i32 3, i32* %tmp, align 4
  br label %bb9

bb7:
  %tmp8 = getelementptr inbounds i32, i32* %arg1, i64 2
  store i32 3, i32* %tmp8, align 4
  br label %bb9

bb9:  ; preds = %bb4, %bb6, %bb7
  ...

We can't sink stores directly into bb9.
This patch creates new BB that is successor of %bb6 and %bb7
and sinks stores into that block.

SplitFooterBB is the parameter to the pass that controls
that behavior.

Change-Id: I7fdf50a772b84633e4b1b860e905bf7e3e29940f
Differential: https://reviews.llvm.org/D66234
llvm-svn: 371089
2019-09-05 17:00:32 +00:00
Alina Sbirlea 2ac69aadb5 [MemorySSA] Verify MSSAUpdater exists.
llvm-svn: 371087
2019-09-05 16:58:15 +00:00
Hiroshi Yamauchi d842f2eec4 [PGO][CHR] Speed up following long, interlinked use-def chains.
Summary:
Avoid visiting an instruction more than once by using a map.

This is similar to https://reviews.llvm.org/rL361416.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67198

llvm-svn: 371086
2019-09-05 16:56:55 +00:00
Alina Sbirlea ae900d3882 [MemorySSA] Update MemorySSA when removing debug.value calls.
llvm-svn: 371084
2019-09-05 16:25:24 +00:00
Guillaume Chatelet 33671ceffa [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67223

llvm-svn: 371063
2019-09-05 13:09:42 +00:00
Johannes Doerfert 56e9b608ad [Attributor][Stats] Use the right statistics macro
llvm-svn: 370976
2019-09-04 20:34:57 +00:00
Johannes Doerfert 7ab5253704 [Attributor][Fix] Make sure we do not delete live code
Summary: Liveness needs to mark edges, not blocks as dead.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67191

llvm-svn: 370975
2019-09-04 20:34:52 +00:00
Leonard Chan eca01b031d [NewPM][Sancov] Make Sancov a Module Pass instead of 2 Passes
This patch merges the sancov module and funciton passes into one module pass.

The reason for this is because we ran into an out of memory error when
attempting to run asan fuzzer on some protobufs (pc.cc files). I traced the OOM
error to the destructor of SanitizerCoverage where we only call
appendTo[Compiler]Used which calls appendToUsedList. I'm not sure where precisely
in appendToUsedList causes the OOM, but I am able to confirm that it's calling
this function *repeatedly* that causes the OOM. (I hacked sancov a bit such that
I can still create and destroy a new sancov on every function run, but only call
appendToUsedList after all functions in the module have finished. This passes, but
when I make it such that appendToUsedList is called on every sancov destruction,
we hit OOM.)

I don't think the OOM is from just adding to the SmallSet and SmallVector inside
appendToUsedList since in either case for a given module, they'll have the same
max size. I suspect that when the existing llvm.compiler.used global is erased,
the memory behind it isn't freed. I could be wrong on this though.

This patch works around the OOM issue by just calling appendToUsedList at the
end of every module run instead of function run. The same amount of constants
still get added to llvm.compiler.used, abd we make the pass usage and logic
simpler by not having any inter-pass dependencies.

Differential Revision: https://reviews.llvm.org/D66988

llvm-svn: 370971
2019-09-04 20:30:29 +00:00
Alina Sbirlea 6da79ce1fe [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370957
2019-09-04 19:16:04 +00:00
Johannes Doerfert a7a3b3aa43 [Attributor][Fix] Ensure the attribute names are created properly
The names of the attributes were not always created properly which
caused problems with the yaml output.

llvm-svn: 370956
2019-09-04 19:01:08 +00:00
Philip Reames 4228245e41 [NFC] Switch last couple of invariant_load checks to use hasMetadata
llvm-svn: 370948
2019-09-04 18:27:31 +00:00
David Bolvansky 420cbb6190 [InstCombine] sub(xor(x, y), or(x, y)) -> neg(and(x, y))
Summary:
```
Name: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
%or = or i32 %y, %x
%xor = xor i32 %x, %y
%sub = sub i32 %xor, %or
  =>
%sub1 = and i32 %x, %y
%sub = sub i32 0, %sub1

Optimization: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/8OI

Reviewers: lebedev.ri

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67188

llvm-svn: 370945
2019-09-04 18:03:21 +00:00
David Bolvansky 0e07248704 [InstCombine] Fold sub (and A, B) (or A, B)) to neg (xor A, B)
Summary:
```
Name: sub(and(x, y), or(x, y)) -> neg(xor(x, y))
%or = or i32 %y, %x
%and = and i32 %x, %y
%sub = sub i32 %and, %or
  =>
%sub1 = xor i32 %x, %y
%sub = sub i32 0, %sub1

Optimization: sub(and(x, y), or(x, y)) -> neg(xor(x, y))
Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/VI6

Found by @lebedev.ri. Also author of the proof.

Reviewers: lebedev.ri, spatel

Reviewed By: lebedev.ri

Subscribers: llvm-commits, lebedev.ri

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67155

llvm-svn: 370934
2019-09-04 17:30:53 +00:00
Philip Reames 27820f9909 [Instruction] Add hasMetadata(Kind) helper [NFC]
It's a common idiom, so let's add the obvious wrapper for metadata kinds which are basically booleans.

llvm-svn: 370933
2019-09-04 17:28:48 +00:00
Johannes Doerfert 2f6220633c [Attributor] Look at internal functions only on-demand
Summary:
Instead of building attributes for internal functions which we do not
update as long as we assume they are dead, we now do not create
attributes until we assume the internal function to be live. This
improves the number of required iterations, as well as the number of
required updates, in real code. On our tests, the results are mixed.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66914

llvm-svn: 370924
2019-09-04 16:35:20 +00:00
Johannes Doerfert 97fd582b91 [Attributor] Use the white list for attributes consistently
Summary:
We create attributes on-demand so we need to check the white list
on-demand. This also unifies the location at which we create,
initialize, and eventually invalidate new abstract attributes.

The tests show mixed results, a few more call site attributes are
determined which can cause more iterations.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66913

llvm-svn: 370922
2019-09-04 16:26:20 +00:00
Johannes Doerfert b0412e437c [Attributor] Deal more explicit with non-exact definitions
Summary:
Before we tried to rule out non-exact definitions early but that lead to
on-demand attributes created for them anyway. As a consequence we needed
to look at the definition in the initialize of each attribute again.
This patch centralized this lookup and tightens the condition under
which we give up on non-exact definitions.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67115

llvm-svn: 370917
2019-09-04 16:16:13 +00:00
Sanjay Patel 4a2cd7be5a [InstSimplify] guard against unreachable code (PR43218)
This would crash:
https://bugs.llvm.org/show_bug.cgi?id=43218

llvm-svn: 370911
2019-09-04 15:12:55 +00:00
Alexey Lapshin cbf1f3b771 [Debuginfo][SROA] Need to handle dbg.value in SROA pass.
SROA pass processes debug info incorrecly if applied twice.
Specifically, after SROA works first time, instcombine converts dbg.declare
intrinsics into dbg.value. Inlining creates new opportunities for SROA,
so it is called again. This time it does not handle correctly previously
inserted dbg.value intrinsics.

Differential Revision: https://reviews.llvm.org/D64595

llvm-svn: 370906
2019-09-04 14:19:49 +00:00
David Bolvansky 358b80b340 [InstCombine] Fold sub (or A, B) (and A, B) to (xor A, B)
Summary:
```
Name: sub or and to xor
%or = or i32 %y, %x
%and = and i32 %x, %y
%sub = sub i32 %or, %and
  =>
%sub = xor i32 %x, %y

Optimization: sub or and to xor
Done: 1
Optimization is correct!
```
https://rise4fun.com/Alive/eJu

Reviewers: spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67153

llvm-svn: 370883
2019-09-04 12:00:33 +00:00
Philip Reames 30dc2da827 [GVN] Remove a todo introduced w/rL370791
When I dug into this, it turns out to be *much* more involved than I'd realized and doesn't actually simplify anything.  

The general purpose of the leader table is that we want to find the most-dominating definition quickly.  The problem for equivalance folding is slightly different; we want to find the most dominating *value* whose definition block dominates our use quickly.

To make this change, we'd end up having to restructure the leader table (either the sorting thereof, or maybe even introducing multiple leader tables per value) and that complexity is just not worth it.

llvm-svn: 370824
2019-09-03 21:56:17 +00:00
Alina Sbirlea ccb1862bc9 [MemorySSA] Disable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370821
2019-09-03 21:20:46 +00:00
Johannes Doerfert b19cd27b28 [Attributor] Use the delete API for liveness
Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66833

llvm-svn: 370818
2019-09-03 20:42:16 +00:00
Johannes Doerfert 7516a5e045 [Attributor] Deduce "no-capture" argument attribute
Add the no-capture argument attribute deduction to the Attributor
fixpoint framework.

The new string attributed "no-capture-maybe-returned" is introduced to
allow deduction of no-capture through functions that "capture" an
argument but only by "returning" it. It is only used by the Attributor
for testing.

Differential Revision: https://reviews.llvm.org/D59922

llvm-svn: 370817
2019-09-03 20:37:24 +00:00
Alina Sbirlea e331d50534 [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370811
2019-09-03 19:28:37 +00:00
Philip Reames 37e2f5f125 [GVN] Propagate simple equalities from assumes within the tail of the block
This extends the existing logic for propagating constant expressions in an analogous manner for what we do across basic blocks. The core point is that we chose some order of operands, and canonicalize uses towards that one.

The heuristic used is inspired by the one used across blocks; in a follow up change, I'd plan to common them so that the cross block version uses the slightly stronger ordering herein. 

As noted by the TODOs in the code, there's a good amount of room for improving the existing code and making it more powerful.  Some follow up work planned.

Differential Revision: https://reviews.llvm.org/D66977

llvm-svn: 370791
2019-09-03 17:31:19 +00:00
Roman Lebedev bdd65351d3 Revert r370454 "[LoopIdiomRecognize] BCmp loop idiom recognition"
https://bugs.llvm.org/show_bug.cgi?id=43206 was filed,
claiming that there is a miscompilation.
Reverting until i investigate.

This reverts commit r370454

llvm-svn: 370788
2019-09-03 17:14:56 +00:00
Bjorn Pettersson dd18ce4501 [LV] Fix miscompiles by adding non-header PHI nodes to AllowedExit
Summary:
Fold-tail currently supports reduction last-vector-value live-out's,
but has yet to support last-scalar-value live-outs, including
non-header phi's. As it relies on AllowedExit in order to detect
them and bail out we need to add the non-header PHI nodes to
AllowedExit, otherwise we end up with miscompiles.

Solves https://bugs.llvm.org/show_bug.cgi?id=43166

Reviewers: fhahn, Ayal

Reviewed By: fhahn, Ayal

Subscribers: anna, hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67074

llvm-svn: 370721
2019-09-03 09:33:55 +00:00
Sjoerd Meijer 718f909ccd [LV] Tail-folding, runtime scev checks
Now that we allow tail-folding, not only when we optimise for size, make
sure we do not run in this assert.

Differential revision: https://reviews.llvm.org/D66932

llvm-svn: 370711
2019-09-03 08:53:02 +00:00
Sjoerd Meijer 0469b0e4ef [LV] Tail-folding with runtime memory checks
The loop vectorizer was running in an assert when it tried to fold the tail and
had to emit runtime memory disambiguation checks.

Differential revision: https://reviews.llvm.org/D66803

llvm-svn: 370707
2019-09-03 08:38:24 +00:00
Sanjay Patel 561c39994b [InstCombine] recognize bswap disguised as shufflevector
bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N*8} --> bswap (bitcast X to i{N*8})

In PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146
...we have a more complicated case where SLP is making a mess of bswap. This patch won't
do anything for that currently, but we need to improve bswap recognition in instcombine,
SLP, and/or a standalone pass to avoid that problem.

This is limited using the data-layout so we don't try to do this transform with actual
vector types. The backend does not appear to have folds to convert in either direction,
so we don't want to mess up something that is actually better lowered as a shuffle.

On x86, we're trading something like this:

  vmovd	%edi, %xmm0
  vpshufb	LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u]
  vmovd	%xmm0, %eax

For:

  movl	%edi, %eax
  bswapl	%eax

Differential Revision: https://reviews.llvm.org/D66965

llvm-svn: 370659
2019-09-02 13:33:20 +00:00
David Bolvansky ff0ad3c43d [InstCombine] mempcpy(d,s,n) to memcpy(d,s,n) + n
Summary:
Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization.

GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n.
https://godbolt.org/z/dOCG96

Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert

Reviewed By: efriedma

Subscribers: hjl.tools, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65737

llvm-svn: 370593
2019-08-31 18:19:05 +00:00
Simon Pilgrim 757cc16ab7 Fix cppcheck shadow variable and variable scope warnings. NFCI.
llvm-svn: 370580
2019-08-31 12:30:19 +00:00
Nikita Popov b9e668f2e7 [CVP] Generate simpler code for elided with.overflow intrinsics
Use a { iN undef, i1 false } struct as the base, and only insert
the first operand, instead of using { iN undef, i1 undef } as the
base and inserting both. This is the same as what we do in InstCombine.

Differential Revision: https://reviews.llvm.org/D67034

llvm-svn: 370573
2019-08-31 09:58:37 +00:00
Wei Mi 798e59b81f [SampleFDO] Add profile symbol list section to discriminate function being
cold versus function being newly added.

This is the second half of https://reviews.llvm.org/D66374.

Profile symbol list is the collection of function symbols showing up in
the binary which generates the current profile. It is used to discriminate
function being cold versus function being newly added. Profile symbol list
is only added for profile with ExtBinary format.

During profile use compilation, when profile-sample-accurate is enabled,
a function without profile will be regarded as cold only when it is
contained in that list.

Differential Revision: https://reviews.llvm.org/D66766

llvm-svn: 370563
2019-08-31 02:27:26 +00:00
Wei Mi 5ef5829fb0 [GVN] Verify value equality before doing phi translation for call instruction
This is an updated version of https://reviews.llvm.org/D66909 to fix PR42605.

Basically, current phi translatation translates an old value number to an new
value number for a call instruction based on the literal equality of call
expression, without verifying there is no clobber in between. This is incorrect.

To get a finegrain check, use MachineDependence analysis to do the job. However,
this is still not ideal. Although given a call instruction,
`MemoryDependenceResults::getCallDependencyFrom` returns identical call
instructions without clobber in between using MemDepResult with its DepType to
be `Def`. However, identical is too strict here and we want it to be relaxed a
little to consider phi-translation -- callee is the same, param operands can be
different. That means changing the semantic of `MemDepResult::Def` and I don't
know the potential impact.

So currently the patch is still conservative to only handle
MemDepResult::NonFuncLocal, which means the current call has no function local
clobber. If there is clobber, even if the clobber doesn't stand in between the
current call and the call with the new value, we won't do phi-translate.

Differential Revision: https://reviews.llvm.org/D67013

llvm-svn: 370547
2019-08-30 23:01:22 +00:00
Johannes Doerfert 659a8707d6 [Attributor] Fix: do not pretend to preserve the CFG
llvm-svn: 370485
2019-08-30 16:35:10 +00:00
Johannes Doerfert 3fac668d83 [Attributor] Use existing function information for the call site
Summary:
Instead of recomputing information for call sites we now use the
function information directly. This is always valid and once we have
call site specific information we can improve here.

This patch also bootstraps attributes that are created on-demand through
an initial update call. Information that is known will then directly be
available in the new attribute without causing an iteration delay.

The tests show how this improves the iteration count.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66781

llvm-svn: 370480
2019-08-30 15:24:52 +00:00
Johannes Doerfert 81df452d82 [Attributor] Manifest load/store alignment generally
Summary:
Any pointer could have load/store users not only floating ones so we
move the manifest logic for alignment into the AAAlignImpl class.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66922

llvm-svn: 370479
2019-08-30 15:22:28 +00:00
Piotr Sobczak 67b979466a [InstCombine][AMDGPU] Simplify tbuffer loads
Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66926

llvm-svn: 370475
2019-08-30 14:20:04 +00:00
Haojian Wu ed170c9bf9 Remove an extra ";", NFC.
llvm-svn: 370465
2019-08-30 12:09:31 +00:00
Hideto Ueno 6381b143f6 [Attributor] Implement AANoAliasCallSiteArgument initialization
Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66927

llvm-svn: 370456
2019-08-30 10:00:32 +00:00
Roman Lebedev 5c9f3cfec7 [LoopIdiomRecognize] BCmp loop idiom recognition
Summary:
@mclow.lists brought up this issue up in IRC.
It is a reasonably common problem to compare some two values for equality.
Those may be just some integers, strings or arrays of integers.

In C, there is `memcmp()`, `bcmp()` functions.
In C++, there exists `std::equal()` algorithm.
One can also write that function manually.

libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ

libc++ does not do anything like that, it simply relies on simple C++'s
`operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)

So likely, there exists a certain performance opportunities.
Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that
is using `memcmp()` (in this case, compiled with modified compiler). {F8768213}

```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

#include "benchmark/benchmark.h"

template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct Identical {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto Tmp = getVectorOfRandomNumbers<T>(count);
    return std::make_pair(Tmp, std::move(Tmp));
  }
};

struct InequalHalfway {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto V0 = getVectorOfRandomNumbers<T>(count);
    auto V1 = V0;
    V1[V1.size() / size_t(2)]++;  // just change the value.
    return std::make_pair(std::move(V0), std::move(V1));
  }
};

template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
    benchmark::DoNotOptimize(is_equal);
  }
  state.SetComplexityN(Length);
  state.counters["eltcnt"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["eltcnt/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = []() {
    for (const benchmark::CPUInfo::CacheInfo& I :
         benchmark::CPUInfo::Get().caches) {
      if (I.level == 2) return I.size;
    }
    return 0;
  }();
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
    ->Apply(CustomArguments<uint64_t>);

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
    ->Apply(CustomArguments<uint64_t>);
```
{F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
<...>
BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
<...>
BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
<...>
BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
<...>
BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
<...>
BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
<...>
BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
<...>
BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
<...>
BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
<...>
BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
```

What can we tell from the benchmark?
* Performance of naive equality check somewhat improves with element size,
  maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
  for uint64_t. I think, that instability implies performance problems.
* Performance of `memcmp()`-aware benchmark always maxes out at around
  bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
  naive variant!
* eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
  eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
  linearly decreases with element size.
  For uint64_t, it's ~4x+ the elements/second.
* The call obvious is more pricey than the loop, with small element count.
  As it can be seen from the full output {F8768210}, the `memcmp()` is almost
  universally worse, independent of the element size (and thus buffer size) when
  element count is less than 8.

So all in all, bcmp idiom does indeed pose untapped performance headroom.
This diff does implement said idiom recognition. I think a reasonable test
coverage is present, but do tell if there is anything obvious missing.

Now, quality. This does succeed to build and pass the test-suite, at least
without any non-bundled elements. {F8768216} {F8768217}
This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp

Program                                         result-new

MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
MultiSource/Applications/d/make_dparser         3.00
SingleSource/UnitTests/vla                      2.00
MultiSource/Applications/Burg/burg              1.00
MultiSourc.../Applications/JM/lencod/lencod     1.00
MultiSource/Applications/lemon/lemon            1.00
MultiSource/Benchmarks/Bullet/bullet            1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
MultiSourc...Prolangs-C/simulator/simulator     1.00
```
The size changes are:
I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text

Program                                        result-old result-new diff
test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
Geomean difference                                                   nan%
         result-old    result-new       diff
count  1.000000e+01  10.00000      10.000000
mean   3.152090e+05  311695.40000  0.006749
std    3.790398e+05  372091.42232  0.036605
min    7.530000e+02  833.00000    -0.034981
25%    4.243300e+04  42401.00000  -0.000866
50%    1.197370e+05  119689.00000 -0.000392
75%    6.397050e+05  639705.00000 -0.000005
max    1.001697e+06  966657.00000  0.106242
```

I don't have timings though.

And now to the code. The basic idea is to completely replace the whole loop.
If we can't fully kill it, don't transform.
I have left one or two comments in the code, so hopefully it can be understood.

Also, there is a few TODO's that i have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ???

Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet

Reviewed By: courbet

Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61144

llvm-svn: 370454
2019-08-30 09:51:23 +00:00
Sanjay Patel 65f1c04000 [InstCombine] reduce duplicated code; NFC
llvm-svn: 370399
2019-08-29 19:36:18 +00:00
Alina Sbirlea 4b87023bae Revert enabling MemorySSA.
Breaks sanitizers bots.

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370397
2019-08-29 19:01:23 +00:00
Florian Hahn f9cdb98f40 [LoopUnrollAndJam] Use Lazy strategy for DTU.
We can also apply the earlier updates to the lazy DTU, instead of
applying them directly.

Reviewers: kuhar, brzycki, asbirlea, SjoerdMeijer

Reviewed By: brzycki, asbirlea, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D66918

llvm-svn: 370391
2019-08-29 17:47:58 +00:00
Alina Sbirlea 6289ee941d [MemorySSA & LoopPassManager] Enable MemorySSA as loop dependency. Update tests.
Summary:
I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance.
The feedback obtains here will guide the next steps towards enabling this.

This patch enables the use of MemorySSA in the loop pass manager.

Passes that currently use MemorySSA:
 - EarlyCSE
Passes that use MemorySSA after this patch:
 - EarlyCSE
 - LICM
 - SimpleLoopUnswitch
Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch):
 - LoopInstSimplify
 - LoopSimplifyCFG
 - LoopUnswitch
 - LoopRotate
 - LoopSimplify
 - LCSSA
Loop passes that do *not* update MemorySSA:
 - IndVarSimplify
 - LoopDelete
 - LoopIdiom
 - LoopSink
 - LoopUnroll
 - LoopInterchange
 - LoopUnrollAndJam
 - LoopVectorize
 - LoopReroll
 - IRCE

Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry

Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370384
2019-08-29 17:08:13 +00:00
Michael Liao 001871dee8 [SimplifyCFG] Skip sinking common lifetime markers of `alloca`.
Summary:
- Similar to the workaround in fix of PR30188, skip sinking common
  lifetime markers of `alloca`. They are mostly left there after
  inlining functions in branches.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66950

llvm-svn: 370376
2019-08-29 16:12:05 +00:00
Roman Lebedev 9f35d2b564 [SimplifyCFG] FoldTwoEntryPHINode(): don't bailout on i1 PHI's if we can hoist a 'not' from incoming values
Summary:
As it can be seen in the tests in D65143/D65144, even though we have formed an '@llvm.umul.with.overflow'
and got rid of potential for division-by-zero, the control flow remains, we still have that branch.

We have this condition:
```
  // Don't fold i1 branches on PHIs which contain binary operators
  // These can often be turned into switches and other things.
  if (PN->getType()->isIntegerTy(1) &&
      (isa<BinaryOperator>(PN->getIncomingValue(0)) ||
       isa<BinaryOperator>(PN->getIncomingValue(1)) ||
       isa<BinaryOperator>(IfCond)))
    return false;
```
which was added back in rL121764 to help with `select` formation i think?

That check prevents us to flatten the CFG here, even though we know
we no longer need that guard and will be able to drop everything
but the '@llvm.umul.with.overflow' + `not`.

As it can be seen from tests, we end here because the `not` is being
sinked into the PHI's incoming values by InstCombine,
so we can't workaround this by hoisting it to after PHI.

Thus i suggest that we relax that check to not bailout if we'd get to hoist the `not`.

Reviewers: craig.topper, spatel, fhahn, nikic

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65147

llvm-svn: 370349
2019-08-29 12:47:34 +00:00
Roman Lebedev 473a063a5e [InstCombine] Fold '((%x * %y) u/ %x) != %y' to '@llvm.umul.with.overflow' + overflow bit extraction
Summary:
`((%x * %y) u/ %x) != %y` is one of (3?) common ways to check that
some unsigned multiplication (will not) overflow.
Currently, we don't catch it. We could:
```
$ /repositories/alive2/build-Clang-unknown/alive -root-only ~/llvm-patch1.ll
Processing /home/lebedevri/llvm-patch1.ll..

----------------------------------------
Name: no overflow
  %o0 = mul i4 %y, %x
  %o1 = udiv i4 %o0, %x
  %r = icmp ne i4 %o1, %y
  ret i1 %r
=>
  %n0 = umul_overflow i4 %x, %y
  %o0 = extractvalue {i4, i1} %n0, 0
  %o1 = udiv %o0, %x
  %r = extractvalue {i4, i1} %n0, 1
  ret %r

Done: 1
Optimization is correct!

----------------------------------------
Name: no overflow
  %o0 = mul i4 %y, %x
  %o1 = udiv i4 %o0, %x
  %r = icmp eq i4 %o1, %y
  ret i1 %r
=>
  %n0 = umul_overflow i4 %x, %y
  %o0 = extractvalue {i4, i1} %n0, 0
  %o1 = udiv %o0, %x
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1
  ret i1 %r

Done: 1
Optimization is correct!

```

Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65144

llvm-svn: 370348
2019-08-29 12:47:20 +00:00
Roman Lebedev fb38b7aab3 [InstCombine] Fold '(-1 u/ %x) u< %y' to '@llvm.umul.with.overflow' + overflow bit extraction
Summary:
`(-1 u/ %x) u< %y` is one of (3?) common ways to check that
some unsigned multiplication (will not) overflow.
Currently, we don't catch it. We could:
```
----------------------------------------
Name: no overflow
  %o0 = udiv i4 -1, %x
  %r = icmp ult i4 %o0, %y
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %r = extractvalue {i4, i1} %n0, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: no overflow, swapped
  %o0 = udiv i4 -1, %x
  %r = icmp ugt i4 %y, %o0
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %r = extractvalue {i4, i1} %n0, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: overflow
  %o0 = udiv i4 -1, %x
  %r = icmp uge i4 %o0, %y
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1

Done: 1
Optimization is correct!

----------------------------------------
Name: overflow
  %o0 = udiv i4 -1, %x
  %r = icmp ule i4 %y, %o0
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1

Done: 1
Optimization is correct!
```

As it can be observed from tests, while simply forming the `@llvm.umul.with.overflow`
is easy, if we were looking for the inverted answer, then more work needs to be done
to cleanup the now-pointless control-flow that was guarding against division-by-zero.
This is being addressed in follow-up patches.

Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon

Reviewed By: nikic, xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65143

llvm-svn: 370347
2019-08-29 12:47:08 +00:00
Roman Lebedev f13b0e3ed8 [InstCombine] Shift amount reassociation in bittest: trunc-of-lshr (PR42399)
Summary:
Finally, the fold i was looking forward to :)

The legality check is muddy, i doubt  i've groked the full generalization,
but it handles all the cases i care about, and can come up with:
https://rise4fun.com/Alive/26j

I.e. we can perform the fold if **any** of the following is true:
* The shift amount is either zero or one less than widest bitwidth
* Either of the values being shifted has at most lowest bit set
* The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount;
* The value that is being shifted by `lshr` (which **is** truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one

I strongly suspect there is some better generalization, but i'm not aware of it as of right now.
For now i also avoided using actual `computeKnownBits()`, but restricted it to constants.

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66383

llvm-svn: 370324
2019-08-29 10:26:23 +00:00
Simon Pilgrim 920b04011b Fix variable set but no used warnings on NDEBUG builds. NFCI.
llvm-svn: 370319
2019-08-29 10:08:45 +00:00
Simon Pilgrim ef9c6a7077 Fix variable set but no used warning on NDEBUG builds. NFCI.
llvm-svn: 370317
2019-08-29 09:58:47 +00:00
Hideto Ueno cbab334e40 [Attributor] Deduce "noalias" attribute
Summary:
This patch adds very basic deduction for noalias.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Tags: LLVM

Differential Revision: https://reviews.llvm.org/D66207

llvm-svn: 370295
2019-08-29 05:52:00 +00:00
Florian Hahn 3177b92231 [LoopUnroll] Use Lazy strategy for DTU used for MergeBlockIntoPredecessor.
We do not access the DT in the loop, so we do not have to apply updates
eagerly. We can apply them lazyly and flush them after we are done
merging blocks.

As follow-up work, we might be able to use the DTU above as well,
instead of manually updating the DT.

This brings the example from PR43134 from ~100s to ~4s for a relase +
assertions build on my machine.

Reviewers: efriedma, kuhar, asbirlea, brzycki

Reviewed By: kuhar, brzycki

Differential Revision: https://reviews.llvm.org/D66911

llvm-svn: 370292
2019-08-29 04:26:29 +00:00
Johannes Doerfert bf112139ac [Attributor] Improve messages in iteration verify mode
When we now verify the iteration count we will see the actual count
and the expected count before the assertion is triggered.

llvm-svn: 370285
2019-08-29 01:29:44 +00:00
Johannes Doerfert 62a9c1da78 [Attributor][Fix] Indicate change correctly
llvm-svn: 370283
2019-08-29 01:26:58 +00:00
Johannes Doerfert 1aac182f31 [Attributor] Fix typo
llvm-svn: 370282
2019-08-29 01:26:09 +00:00
Artur Pilipenko 925afc1ce7 Fix for "DICompileUnit not listed in llvm.dbg.cu" verification error after ...
...cloning a function from a different module

Currently when a function with debug info is cloned from a different module, the 
cloned function may have hanging DICompileUnits, so that the module with the 
cloned function fails debug info verification.

The proposed fix inserts all DICompileUnits reachable from the cloned function 
to "llvm.dbg.cu" metadata operands of the cloned function module. 

Reviewed By: aprantl, efriedma

Differential Revision: https://reviews.llvm.org/D66510

Patch by Oleg Pliss (Oleg.Pliss@azul.com)

llvm-svn: 370265
2019-08-28 21:27:50 +00:00
Julian Lettner 3ae9b9d5e4 [ASan] Make insertion of version mismatch guard configurable
By default ASan calls a versioned function
`__asan_version_mismatch_check_vXXX` from the ASan module constructor to
check that the compiler ABI version and runtime ABI version are
compatible. This ensures that we get a predictable linker error instead
of hard-to-debug runtime errors.

Sometimes, however, we want to skip this safety guard. This new command
line option allows us to do just that.

rdar://47891956

Reviewed By: kubamracek

Differential Revision: https://reviews.llvm.org/D66826

llvm-svn: 370258
2019-08-28 20:40:55 +00:00
Sanjay Patel 2d4b6777c4 [InstCombine] clean up wrap propagation for reassociated ops; NFCI
Always true/false checks were flagged by static analysis;
https://bugs.llvm.org/show_bug.cgi?id=43143

I have not confirmed the logic difference in propagating nsw vs. nuw,
but presumably we would have noticed a bug by now if that was wrong.

llvm-svn: 370248
2019-08-28 18:58:06 +00:00
Pirama Arumuga Nainar 19205abaaa [ValueMapper] NFC: Remove dead code to pause metadata mapping
Summary:
This functionality was added when Mapper::mapMetadata was recursive.  It
is no longer needed after r265456, which switched it to be iterative.

Reviewers: dexonsmith, srhines

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66860

llvm-svn: 370236
2019-08-28 17:43:14 +00:00
Johannes Doerfert f7ca0fe1c8 [Attributor] Regularly clear dependences to remove spurious ones
As dependences between abstract attributes can become stale, e.g., if
one was sufficient to imply another one at some point but it has since
been wakened to the point it is not usable for the formerly implied one.
To weed out spurious dependences, and thereby eliminate unneeded
updates, we introduce an option to determine how often the dependence
cache is cleared and recomputed during the fixpoint iteration.

Note that the initial value was determined such that we see a positive
result on our tests.

Differential Revision: https://reviews.llvm.org/D63315

llvm-svn: 370230
2019-08-28 16:58:52 +00:00
Simon Pilgrim 1d8a886c59 Reduce scope of variable only used in a local pattern match. NFCI.
llvm-svn: 370224
2019-08-28 16:06:33 +00:00
Craig Topper f79d8a064c [InstCombine] Disable recursion in foldGEPICmp for vector pointer GEPs
Due to missing vector support in this function, recursion can
generate worse code in some cases.

llvm-svn: 370221
2019-08-28 15:40:34 +00:00
Simon Pilgrim b569624049 Fix uninitialized variable warning in cppcheck. NFCI.
InstCombiner::MaxArraySizeForCombine is set outside the constructor so we need to ensure it has a default initialization value.

llvm-svn: 370220
2019-08-28 15:19:49 +00:00
David Bolvansky af118bb6d0 [NFC] Added a comment to avoid possible confusion
llvm-svn: 370217
2019-08-28 15:04:48 +00:00
Simon Pilgrim 316bfb0f48 Remove duplicate 'BitWidth' variable. NFCI.
llvm-svn: 370212
2019-08-28 14:37:44 +00:00
Johannes Doerfert 07a5c129c6 [Attributor] Restrict liveness and return information to functions
Summary:
Until we have proper call-site information we should not recompute
liveness and return information for each call site. This patch directly
uses the function versions and introduces TODOs at the usage sites.

The required iterations to get to the fixpoint are most of the time
reduced by this change and we always avoid work duplication.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66562

llvm-svn: 370208
2019-08-28 14:09:14 +00:00
Simon Pilgrim 284118ce3b InstCombiner::visitSelectInst - rename Pred to MinMaxPred to stop shadow variable warning. NFCI.
We have a lot of Predicate variables, all similarly named....

llvm-svn: 370207
2019-08-28 14:05:38 +00:00
Ayal Zaks d15df0ede5 [LV] Fold tail by masking - handle reductions
Allow vectorizing loops that have reductions when tail is folded by masking.
A select is introduced in VPlan, choosing between the last value carried by the
loop-exit/live-out instruction of the reduction, and the penultimate value
carried by the reduction phi, according to the "i < n" mask of fold-tail.
This select replaces the last value as the live-out value of the loop.

Differential Revision: https://reviews.llvm.org/D66720

llvm-svn: 370173
2019-08-28 09:02:23 +00:00
David Bolvansky 05bda8b4e5 Annotate return values of allocation functions with dereferenceable_or_null
Summary:
Example
define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6
  ret i8* %call
}


Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: aaron.ballman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66651

llvm-svn: 370168
2019-08-28 08:28:20 +00:00
Fangrui Song 6964027315 [LoopFusion] Fix another -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build
llvm-svn: 370156
2019-08-28 03:12:40 +00:00
Craig Topper 5bbb604bb5 [InstCombine] Disable some portions of foldGEPICmp for GEPs that return a vector of pointers. Fix other portions.
llvm-svn: 370114
2019-08-27 21:38:56 +00:00
Philip Reames 2694522f13 [Loads/SROA] Remove blatantly incorrect code and fix a bug revealed in the process
The code we had isSafeToLoadUnconditionally was blatantly wrong. This function takes a "Size" argument which is supposed to describe the span loaded from. Instead, the code use the size of the pointer passed (which may be unrelated!) and only checks that span. For any Size > LoadSize, this can and does lead to miscompiles.

Worse, the generic code just a few lines above correctly handles the cases which *are* valid. So, let's delete said code.

Removing this code revealed two issues:
1) As noted by jdoerfert the removed code incorrectly handled external globals.  The test update in SROA is to stop testing incorrect behavior.
2) SROA was confusing bytes and bits, but this wasn't obvious as the Size parameter was being essentially ignored anyway.  Fixed.

Differential Revision: https://reviews.llvm.org/D66778

llvm-svn: 370102
2019-08-27 19:34:43 +00:00
David Bolvansky 0c2692108c [InstCombine] Fold select with ctlz to cttz
Summary:
Handle pattern [0]:

int ctz(unsigned int a)
{
  int c = __clz(a & -a);
  return a ? 31 - c : c;
}

In reality, the compiler can generate much better code for cttz, so fold away this pattern.

https://godbolt.org/z/c5kPtV

 [0] https://community.arm.com/community-help/f/discussions/2114/count-trailing-zeros

Reviewers: spatel, nikic, lebedev.ri, dmgreen, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66308

llvm-svn: 370037
2019-08-27 10:22:40 +00:00
Johannes Doerfert 39681e733c [Attributor] Introduce an API to delete stuff
Summary:
During the fixpoint iteration, including the manifest stage, we should
not delete stuff as other abstract attributes might have a reference to
the value. Through the API this can now be done safely at the very end.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66779

llvm-svn: 370014
2019-08-27 04:57:54 +00:00
Vitaly Buka aeca56964f msan, codegen, instcombine: Keep more lifetime markers used for msan
Reviewers: eugenis

Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D66695

llvm-svn: 369979
2019-08-26 22:15:50 +00:00
Evgeniy Stepanov ed4fefb0df [hwasan] Fix test failure in r369721.
Try harder to emulate "old runtime" in the test.
To get the old behavior with the new runtime library, we need both
disable personality function wrapping and enable landing pad
instrumentation.

llvm-svn: 369977
2019-08-26 21:44:55 +00:00
Philip Reames cf3b555973 Add a clarify comment for meaning of SafePointes [NFC]
Extracted from D66688 as requested.

llvm-svn: 369962
2019-08-26 20:48:35 +00:00
Philip Reames b92c971099 [InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null for vectors
Extend the transform introduced in https://reviews.llvm.org/D66608 to work for vector geps as well.

Differential Revision: https://reviews.llvm.org/D66671

llvm-svn: 369949
2019-08-26 19:11:49 +00:00
Johannes Doerfert b504eb8bb5 [Attributor] Adjust and test the iteration bound of tests
Summary:
Try to verify how many iterations we need for a fixpoint in our tests.
This patch adjust the way we count to make it easier to follow. It also
adjusts the bounds to actually account for a fixpoint and not only the
minimum number to pass all checks.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66757

llvm-svn: 369945
2019-08-26 18:55:47 +00:00
Johannes Doerfert a4a308cc25 [Attributor] Further cut down on non-determinism
llvm-svn: 369936
2019-08-26 17:51:23 +00:00
Johannes Doerfert 19b0043641 [Attributor] Allow explicit dependence tracking
By default, the Attributor tracks potential dependences between abstract
attributes based on the issued Attributor::getAAFor queries. This
simplifies the development of new abstract attributes but it can also
lead to spurious dependences that might increase compile time and make
internalization harder (D63312). With this patch, abstract attributes
can opt-out of implicit dependence tracking and instead register
dependences explicitly. It is up to the implementation to make sure all
existing dependences are registered.

Differential Revision: https://reviews.llvm.org/D63314

llvm-svn: 369935
2019-08-26 17:48:05 +00:00
Bjorn Pettersson d804bd17de [LoopUnroll] Handle certain PHIs in full unrolling properly
Summary:
When reconstructing the CFG of the loop after unrolling,
LoopUnroll could in some cases remove the phi operands of
loop-carried values instead of preserving them, resulting
in undef phi values after loop unrolling.

When doing this reconstruction, avoid removing incoming
phi values for phis in the successor blocks if the successor
is the block we are jumping to anyway.

Patch-by: ebevhan

Reviewers: fhahn, efriedma

Reviewed By: fhahn

Subscribers: bjope, lebedev.ri, zzheng, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66334

llvm-svn: 369886
2019-08-26 09:29:53 +00:00
Roman Lebedev 9cf08c6de1 [Constant] Add 'isElementWiseEqual()' method
Promoting it from InstCombine's tryToReuseConstantFromSelectInComparison().

Return true if this constant and a constant 'Y' are element-wise equal.
This is identical to just comparing the pointers, with the exception that
for vectors, if only one of the constants has an `undef` element in some
lane, the constants still match.

llvm-svn: 369842
2019-08-24 06:49:51 +00:00
Roman Lebedev de19f749e0 [InstCombine] matchThreeWayIntCompare(): commutativity awareness
Summary:
`matchThreeWayIntCompare()` looks for
```
   select i1 (a == b),
          i32 Equal,
          i32 (select i1 (a < b), i32 Less, i32 Greater)
```
but both of these selects/compares can be in it's commuted form,
so out of 8 variants, only the two most basic ones is handled.
This fixes regression being introduced in D66232.

Reviewers: spatel, nikic, efriedma, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66607

llvm-svn: 369841
2019-08-24 06:49:36 +00:00
Roman Lebedev 2c75fe7f2a [InstCombine] Try to reuse constant from select in leading comparison
Summary:
If we have e.g.:
```
  %t = icmp ult i32 %x, 65536
  %r = select i1 %t, i32 %y, i32 65535
```
the constants `65535` and `65536` are suspiciously close.
We could perform a transformation to deduplicate them:
```
Name: ult
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
  =>
%t.inv = icmp ugt i32 %x, 65535
%r = select i1 %t.inv, i32 65535, i32 %y
```
https://rise4fun.com/Alive/avb

While this may seem esoteric, this should certainly be good for vectors
(less constant pool usage) and for opt-for-size - need to have only one constant.

But the real fun part here is that it allows further transformation,
in particular it finishes cleaning up the `clamp` folding,
see e.g. `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`.
We start with e.g.
```
  %dont_need_to_clamp_positive = icmp sle i32 %X, 32767
  %dont_need_to_clamp_negative = icmp sge i32 %X, -32768
  %clamp_limit = select i1 %dont_need_to_clamp_positive, i32 -32768, i32 32767
  %dont_need_to_clamp = and i1 %dont_need_to_clamp_positive, %dont_need_to_clamp_negative
  %R = select i1 %dont_need_to_clamp, i32 %X, i32 %clamp_limit
```
without this patch we currently produce
```
  %1 = icmp slt i32 %X, 32768
  %2 = icmp sgt i32 %X, -32768
  %3 = select i1 %2, i32 %X, i32 -32768
  %R = select i1 %1, i32 %3, i32 32767
```
which isn't really a `clamp` - both comparisons are performed on the original value,
this patch changes it into
```
  %1.inv = icmp sgt i32 %X, 32767
  %2 = icmp sgt i32 %X, -32768
  %3 = select i1 %2, i32 %X, i32 -32768
  %R = select i1 %1.inv, i32 32767, i32 %3
```
and then the magic happens! Some further transform finishes polishing it and we finally get:
```
  %t1 = icmp sgt i32 %X, -32768
  %t2 = select i1 %t1, i32 %X, i32 -32768
  %t3 = icmp slt i32 %t2, 32767
  %R = select i1 %t3, i32 %t2, i32 32767
```
which is beautiful and just what we want.

Proofs for `getFlippedStrictnessPredicateAndConstant()` for de-canonicalization:
https://rise4fun.com/Alive/THl
Proofs for the fold itself: https://rise4fun.com/Alive/THl

Reviewers: spatel, dmgreen, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66232

llvm-svn: 369840
2019-08-24 06:49:25 +00:00
Fangrui Song eb70ac0249 [LoopFusion] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build
llvm-svn: 369836
2019-08-24 02:50:42 +00:00
Peter Collingbourne 5b31ac5096 hwasan: Fix use of uninitialized memory.
Reported by e.g.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/23071/steps/build%20with%20ninja/logs/stdio

llvm-svn: 369815
2019-08-23 21:37:20 +00:00
Johannes Doerfert 5a5a139917 [Attributor] Manifest alignment in load and store instructions
Summary:
We can now manifest alignment information in load/store instructions if
the pointer is known to have a better alignment.

Reviewers: uenoku, sstefan1, lebedev.ri

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66567

llvm-svn: 369804
2019-08-23 20:20:10 +00:00
Benjamin Kramer dc5f805d31 Do a sweep of symbol internalization. NFC.
llvm-svn: 369803
2019-08-23 19:59:23 +00:00
Philip Reames 9cb059fdcc Fix a bug in just submitted rL369789
Started implementing the vector case and realized the scalar case hadn't handled the GEP producing a different type than the base correctly.  It's entertaining seeing what slips through review when we're focused on the 'hard' parts.  :(

Also adding an extra vector test as it happened to be in workspace and wasn't worth separating.

llvm-svn: 369795
2019-08-23 18:27:57 +00:00
Philip Reames 5b02cfa0b3 [InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null
This generalizes the isGEPKnownNonNull rule from ValueTracking to apply when we do not know if the base is non-null, and thus need to replace one condition with another.

The core notion is that since an inbounds GEP can only form null if the base pointer is null and the offset is zero. However, if the offset is non-zero, the the "inbounds" marker makes the result poison. Thus, we're free to ignore the case where the offset is non-zero. Similarly, there's no case under which a non-null base can result in a null result without generating poison.

Differential Revision: https://reviews.llvm.org/D66608

llvm-svn: 369789
2019-08-23 17:58:58 +00:00
Johannes Doerfert 23400e618b [Attributor] Manifest constant return values
Summary:
If the unique return value is a constant we now replace call uses with
that constant.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66551

llvm-svn: 369785
2019-08-23 17:41:37 +00:00
Johannes Doerfert 785fad3202 [Attributor] Deal with shrinking dereferenceability in a loop
Summary:
If we have a loop in which the dereferenceability of a pointer decreases
we did slowly decrease it iteration by iteration, leading to a timeout.
With this patch we detect such circular reasoning and indicate a
fixpoint early.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66558

llvm-svn: 369784
2019-08-23 17:29:23 +00:00
Sanjay Patel 5a5d44e801 [SLP] use range-for loops, fix formatting; NFC
These are part of D57059, but that patch doesn't apply cleanly to trunk
at this point, so we might as well remove some of the noise.

llvm-svn: 369776
2019-08-23 16:22:32 +00:00
Cameron McInally 688f3bc240 [Reassoc] Small fix to support unary FNeg in NegateValue(...)
Differential Revision: https://reviews.llvm.org/D66612

llvm-svn: 369772
2019-08-23 15:49:38 +00:00
Johannes Doerfert 2f2d7c3add [Attributor][Fix] Deal with "growing" dereferenceability
Summary:
If we have a negative inbounds offset dereferenceabily "grows". However,
until we do not handle the overflow that can occur in the
dereferenceable bytes and the problem with loops, we simply do not grow
the state.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66557

llvm-svn: 369771
2019-08-23 15:45:46 +00:00
Johannes Doerfert deb9ea3a8c [Attributor][NFCI] Avoid lookups when resolving returned values
If the number of potentially returned values not change since the last
traversal we do not need to visit the returned values again. This works
as we only add values to the returned values set now.

Differential Revision: https://reviews.llvm.org/D66484

llvm-svn: 369770
2019-08-23 15:42:19 +00:00
Sanjay Patel 9182467886 [SLP] fix formatting; NFC
These are part of D57059, but that patch doesn't apply cleanly to trunk
at this point, so we might as well remove some of the noise.

llvm-svn: 369769
2019-08-23 15:26:12 +00:00
Johannes Doerfert 9543f1498c [Attributor] FIX: Treat new attributes as changed ones
Summary:
When we have new attributes and we end the fixpoint iteration because
the iteration limit is reached, we need to treat the new ones as if they
changed in the last iteration, as they might have.

This adds a test for which we should not derive anything regardless of
the iteration limit, e.g., if we abort there should not be any
attributes manifested in the IR.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66549

llvm-svn: 369768
2019-08-23 15:24:57 +00:00
Johannes Doerfert 695089ecfb [Attributor][NFCI] Try to avoid potential non-deterministic behavior
This commit replaces sets with set vectors in an effort to make the
behavior of the Attributor deterministic.

llvm-svn: 369767
2019-08-23 15:23:49 +00:00
Teresa Johnson ea314fd476 [ThinLTO] Fix handling of weak interposable symbols
Summary:
Keep aliasees alive if their alias is live, otherwise we end up with an
alias to a declaration, which is invalid. This can happen when the
aliasee is weak and non-prevailing.

This fix exposed the fact that we were then attempting to internalize
the weak symbol, which was not exported as it was not prevailing. We
should not internalize interposable symbols in general, unless this is
the prevailing copy, since it can lead to incorrect inlining and other
optimizations. Most of the changes in this patch are due to the
restructuring required to pass down the prevailing callback.

Finally, while implementing the test cases, I found that in the case of
a weak aliasee that is still marked not live because its alias isn't
live, after dropping the definition we incorrectly marked the
declaration with weak linkage when resolving prevailing symbols in the
module. This was due to some special case handling for symbols marked
WeakLinkage in the summary located before instead of after a subsequent
check for the symbol being a declaration. It turns out that we don't
actually need this special case handling any more (looking back at the
history, when that was added the code was structured quite differently)
- we will correctly mark with weak linkage further below when the
definition hasn't been dropped.

Fixes PR42542.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66264

llvm-svn: 369766
2019-08-23 15:18:58 +00:00
Philip Reames 2a52583d67 [IndVars] Fix a bug noticed by inspection
We were computing the loop exit value, but not ensuring the addrec belonged to the loop whose exit value we were computing.  I couldn't actually trip this; the test case shows the basic setup which *might* trip this, but none of the variations I've tried actually do.

llvm-svn: 369730
2019-08-23 04:03:23 +00:00
Fangrui Song 3fc933af8b [AlignmentFromAssumptions] getNewAlignmentDiff(): use getURemExpr()
The alignment is calculated incorrectly, thus sometimes it doesn't generate aligned mov instructions, as shown by the example below:

```
// b.cc
typedef long long index;

extern "C" index g_tid;
extern "C" index g_num;

void add3(float* __restrict__ a, float* __restrict__ b, float* __restrict__ c) {
    index n = 64*1024;
    index m = 16*1024;
    index k = 4*1024;
    index tid = g_tid;
    index num = g_num;
    __builtin_assume_aligned(a, 32);
    __builtin_assume_aligned(b, 32);
    __builtin_assume_aligned(c, 32);
    for (index i0=tid*k; i0<m; i0+=num*k)
        for (index i1=0; i1<n*m; i1+=m)
            for (index i2=0; i2<k; i2++)
                c[i1+i0+i2] = b[i0+i2] + a[i1+i0+i2];
}
```

Compile with `clang b.cc -Ofast -march=skylake -mavx2 -S`

```
vmovaps -224(%rdi,%rbx,4), %ymm0
vmovups -192(%rdi,%rbx,4), %ymm1         # should be movaps
vmovups -160(%rdi,%rbx,4), %ymm2         # should be movaps
vmovups -128(%rdi,%rbx,4), %ymm3         # should be movaps
vaddps  -224(%rsi,%rbx,4), %ymm0, %ymm0
vaddps  -192(%rsi,%rbx,4), %ymm1, %ymm1
vaddps  -160(%rsi,%rbx,4), %ymm2, %ymm2
vaddps  -128(%rsi,%rbx,4), %ymm3, %ymm3
vmovaps %ymm0, -224(%rdx,%rbx,4)
vmovups %ymm1, -192(%rdx,%rbx,4)         # should be movaps
vmovups %ymm2, -160(%rdx,%rbx,4)         # should be movaps
vmovups %ymm3, -128(%rdx,%rbx,4)         # should be movaps
```

Differential Revision: https://reviews.llvm.org/D66575
Patch by Dun Liang

llvm-svn: 369723
2019-08-23 02:17:04 +00:00
Peter Collingbourne 21a1814417 hwasan: Untag unwound stack frames by wrapping personality functions.
One problem with untagging memory in landing pads is that it only works
correctly if the function that catches the exception is instrumented.
If the function is uninstrumented, we have no opportunity to untag the
memory.

To address this, replace landing pad instrumentation with personality function
wrapping. Each function with an instrumented stack has its personality function
replaced with a wrapper provided by the runtime. Functions that did not have
a personality function to begin with also get wrappers if they may be unwound
past. As the unwinder calls personality functions during stack unwinding,
the original personality function is called and the function's stack frame is
untagged by the wrapper if the personality function instructs the unwinder
to keep unwinding. If unwinding stops at a landing pad, the function is
still responsible for untagging its stack frame if it resumes unwinding.

The old landing pad mechanism is preserved for compatibility with old runtimes.

Differential Revision: https://reviews.llvm.org/D66377

llvm-svn: 369721
2019-08-23 01:28:44 +00:00
Peter Collingbourne 2452d7030b IR. Change strip* family of functions to not look through aliases.
I noticed another instance of the issue where references to aliases were
being replaced with aliasees, this time in InstCombine. In the instance that
I saw it turned out to be only a QoI issue (a symbol ended up being missing
from the symbol table due to the last reference to the alias being removed,
preventing HWASAN from symbolizing a global reference), but it could easily
have manifested as incorrect behaviour.

Since this is the third such issue encountered (previously: D65118, D65314)
it seems to be time to address this common error/QoI issue once and for all
and make the strip* family of functions not look through aliases.

Includes a test for the specific issue that I saw, but no doubt there are
other similar bugs fixed here.

As with D65118 this has been tested to make sure that the optimization isn't
load bearing. I built Clang, Chromium for Linux, Android and Windows as well
as the test-suite and there were no size regressions.

Differential Revision: https://reviews.llvm.org/D66606

llvm-svn: 369697
2019-08-22 19:56:14 +00:00
Hideto Ueno 70576cac52 [Attributor][NFC] Move DerefState to header and use StateWrapper
Summary: In D65402, I want to get DerefState from AADereferenceable but it was not allowed. This patch moves DerefState definition into Attributor.h and makes AADerefenceable inherit StateWrapper.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66585

llvm-svn: 369653
2019-08-22 14:18:29 +00:00
Serguei Katkov 036e636aa7 [Loop Peeling] Fix silly bug in metadata update.
We must update loop metedata before we moved to parent loop if
it is present.

llvm-svn: 369637
2019-08-22 10:06:46 +00:00
Johannes Doerfert d98f975089 [Attributor] Fix: Gracefully handle non-instruction users
Function can have users that are not instructions, e.g., bitcasts. For
now, we simply give up when we see them.

llvm-svn: 369588
2019-08-21 21:48:56 +00:00
Johannes Doerfert 5427aa843b [Attributor][NFC] Fix copy & paste error
llvm-svn: 369577
2019-08-21 20:57:20 +00:00
Johannes Doerfert 2db8528fb4 [Attributor][NFC] Remove leftover semicolon
llvm-svn: 369576
2019-08-21 20:56:56 +00:00
Johannes Doerfert d410805d57 [Attributor] Use existing unreachable instead of introducing new ones
So far we split the unreachable off and placed a new one, this is not
necessary.

llvm-svn: 369575
2019-08-21 20:56:41 +00:00
Florian Hahn b5e52bfd83 [GVN] Do PHI translations across all edges between the load and the unavailable pred.
Currently we do not properly translate addresses with PHIs if LoadBB !=
LI->getParent(), because PHITranslateAddr expects a direct predecessor as argument,
because it considers all instructions outside of the current block to
not requiring translation.

The amount of cases that trigger this should be very low, as most single
predecessor blocks should be folded into their predecessor by GVN before
we actually start with value numbering. It is still not guaranteed to
happen, so we should do PHI translation along all edges between the
loads' block and the predecessor where we have to place a load.

There are a few test cases showing current limits of the PHI translation, which
could be improved later.

Reviewers: spatel, reames, efriedma, john.brawn

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65020

llvm-svn: 369570
2019-08-21 20:06:50 +00:00
Philip Reames 764b0fd5a3 [instcombine] icmp eq/ne (sub C, Y), C -> icmp eq/ne Y, 0
Noticed while looking at pr43028.  

llvm-svn: 369541
2019-08-21 15:51:57 +00:00
Sanjay Patel e728259278 [InstCombine] narrow icmp with extended operands of different widths
An intermediate extend is used to widen the narrow operand to the width of
the other (wider) operand. At that point, we have the same logic as the
existing transform that was restricted to folds of equal width zext/sext.

This mostly solves PR42700:
https://bugs.llvm.org/show_bug.cgi?id=42700

llvm-svn: 369519
2019-08-21 11:56:08 +00:00
Stefan Stipanovic 26121ae4d0 [Attributor] Liveness for internal functions.
For an internal function, if all its call sites are dead, the body of the function is considered dead.

Reviewers: jdoerfert, uenoku

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D66155

llvm-svn: 369470
2019-08-20 23:16:57 +00:00
Alexandre Ganea 21e9603030 [Sanitizer] Remove unused functions
Differential Revision: https://reviews.llvm.org/D66503

llvm-svn: 369468
2019-08-20 22:56:40 +00:00
Michael Liao a99086dbdd [Attributor] Remove unused variable. NFC.
llvm-svn: 369444
2019-08-20 21:02:31 +00:00
Wenlei He 5adace352d [AutoFDO] Make call targets order deterministic for sample profile
Summary:
StringMap is used for storing call target to frequency map for AutoFDO. However the iterating order of StringMap is non-deterministic, which leads to non-determinism in AutoFDO profile output. Now new API getSortedCallTargets and SortCallTargets are added for deterministic ordering and output.

Roundtrip test for text profile and binary profile is added.

Reviewers: wmi, davidxl, danielcdh

Subscribers: hiraditya, mgrang, llvm-commits, twoh

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66191

llvm-svn: 369440
2019-08-20 20:52:00 +00:00
Sanjay Patel 292b1087f4 [InstCombine] add helper function for icmp+zext/sext; NFC
llvm-svn: 369421
2019-08-20 18:15:17 +00:00
Sanjay Patel 2e68e4d60e [InstCombine] make fold for icmp with sext more efficient; NFC
We were creating 2 instructions and relying on a subsequent fold
to invert a not(icmp). Create the final icmp directly instead.

llvm-svn: 369411
2019-08-20 17:03:22 +00:00
Sanjay Patel a90ee0eeb6 [InstCombine] improve readability for icmp with cast folds; NFC
1. Update function name and stale code comments.
2. Use variable names that are less ambiguous.
3. Move operand checks into the function as early exits.

llvm-svn: 369390
2019-08-20 14:56:44 +00:00
Jinsong Ji cda334ba54 [BlockExtractor] Avoid assert with wrong line format
Summary:
When the line format is wrong, we may end up accessing out of bound
memory. eg: the test with invalide line will cause assert.
Assertion `idx < size()' failed

The fix is to report fatal when we found mismatched line format.

Reviewers: qcolombet, volkan

Reviewed By: qcolombet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66444

llvm-svn: 369389
2019-08-20 14:46:02 +00:00
Sanjay Patel f99d254aae [InstCombine] simplify min/max of min/max with same operands (PR35607)
This is the original integer variant requested in:
https://bugs.llvm.org/show_bug.cgi?id=35607

As noted in the TODO and several similar TODOs around this block,
we could do this in instsimplify, but then it would cost more
because we would be trying to match min/max via ValueTracking
in 2 different places.

There are 4 commuted variants for each of smin/smax/umin/umax
that are not matched here. There are also icmp predicate variants
that are not included in the affected test file because they are
already handled by instsimplify by folding the final icmp to
true/false.

https://rise4fun.com/Alive/3KVc

  Name: smax(smax, smin)
  %c1 = icmp slt i32 %x, %y
  %c2 = icmp slt i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp sgt i32 %max, %min
  %r = select i1 %c3, i32 %max, i32 %min
  =>
  %r = %max

  Name: smin(smax, smin)
  %c1 = icmp slt i32 %x, %y
  %c2 = icmp slt i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp sgt i32 %max, %min
  %r = select i1 %c3, i32 %min, i32 %max
  =>
  %r = %min

  Name: umax(umax, umin)
  %c1 = icmp ult i32 %x, %y
  %c2 = icmp ult i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp ult i32 %min, %max
  %r = select i1 %c3, i32 %max, i32 %min
  =>
  %r = %max

  Name: umin(umax, umin)
  %c1 = icmp ult i32 %x, %y
  %c2 = icmp ult i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp ult i32 %min, %max
  %r = select i1 %c3, i32 %min, i32 %max
  =>
  %r = %min

llvm-svn: 369386
2019-08-20 13:39:17 +00:00
Fangrui Song f182617352 [Attributor] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r369331
llvm-svn: 369334
2019-08-20 07:21:43 +00:00
Johannes Doerfert 12cbbab9d9 [Attributor] Create abstract attributes on-demand
Before, we create the set of abstract attributes initially and then
dealt with the fact hat a lookup could fail, e.g., return a nullptr.
This patch will ensure we always return a valid object from a lookup,
allowing us not only to remove the nullptr checks but also to grow the
set of abstract attributes "in-flight" on-demand.

One can now start from those that have the best chance of improving
performance without the need to specify all they might depend on.

While this introduces some boilerplate, the usage of attributes is much
easier and cleaner now.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66276

llvm-svn: 369331
2019-08-20 06:15:50 +00:00
Johannes Doerfert 169af994bc [Attributor][NFC] Cleanup statistics code
llvm-svn: 369330
2019-08-20 06:09:56 +00:00
Johannes Doerfert cfcca1a5b1 [Attributor] Use structured deduction for AADereferenceable
Summary:
This is analogous to D66128 but for AADereferenceable. We have the logic
concentrated in the floating value updateImpl and we use the combiner
helper classes for arguments and return values.

The regressions will go away with "on-demand" attribute creation.
Improvements are already visible in the existing tests.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66272

llvm-svn: 369329
2019-08-20 06:08:35 +00:00
Johannes Doerfert b9b8791fed [Attributor] Use structured deduction for AANonNull
Summary:
What D66126 did for AAAlign, this patch does for AANonNull. Agian, the
logic becomes more concise and localized. Again, returned poiners are
not annotated properly but that will not be an issue if this lands with
the "on-demand" generation of attributes. First improvements due to the
genericValueTraversal are already visible.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66128

llvm-svn: 369328
2019-08-20 06:02:39 +00:00
Johannes Doerfert 028b2aa56a [Attributor] Fix the "clamp" operator
The clamp operator should not take the known of the given state as the
known is potentially based on assumed information. This also adds TODOs
to guide improvements.

llvm-svn: 369327
2019-08-20 05:57:01 +00:00
Dinar Temirbulatov 081c57989e [SLP][NFC] Avoid repetitive calls to getSameOpcode()
We can avoid repetitive calls getSameOpcode() for already known tree elements by keeping MainOp and AltOp in TreeEntry.

Differential Revision: https://reviews.llvm.org/D64700

llvm-svn: 369315
2019-08-20 00:22:04 +00:00
Johannes Doerfert de7674ce76 Recommit "[Attributor] Fix: Do not partially resolve returned calls."
This reverts commit b1752f670f.

Fixed the issue with a different commit, reapply this one as it was,
afaik, not broken.

llvm-svn: 369303
2019-08-19 21:35:31 +00:00
Evgeniy Stepanov 55ccd16354 Refactor isPointerOffset (NFC).
Summary:
Simplify the API using Optional<> and address comments in
         https://reviews.llvm.org/D66165

Reviewers: vitalybuka

Subscribers: hiraditya, llvm-commits, ostannard, pcc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66317

llvm-svn: 369300
2019-08-19 21:08:04 +00:00
Johannes Doerfert 056f1b5cc7 Re-apply fixed "[Attributor] Fix: Make sure we set the changed flag"
This reverts commit cedd0d9a6e.

Re-apply the original commit but make sure the variables are initialized
(even if they are not used) so UBSan is not complaining.

llvm-svn: 369294
2019-08-19 19:14:10 +00:00
Alina Sbirlea 1a3fdaf6a6 [MemorySSA] Rename uses when inserting memory uses.
Summary:
When inserting uses from outside the MemorySSA creation, we don't
normally need to rename uses, based on the assumption that there will be
no inserted Phis (if  Def existed that required a Phi, that Phi already
exists). However, when dealing with unreachable blocks, MemorySSA will
optimize away Phis whose incoming blocks are unreachable, and these Phis end
up being re-added when inserting a Use.
There are two potential solutions here:
1. Analyze the inserted Phis and clean them up if they are unneeded
(current method for cleaning up trivial phis does not cover this)
2. Leave the Phi in place and rename uses, the same way as whe inserting
defs.
This patch use approach 2.

Resolves first test in PR42940.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66033

llvm-svn: 369291
2019-08-19 18:57:40 +00:00
Sanjay Patel b38bac3699 [SLP] reduce duplicated code; NFC
llvm-svn: 369250
2019-08-19 11:39:56 +00:00
David L. Jones cedd0d9a6e Revert [Attributor] Fix: Make sure we set the changed flag
This reverts r369159 (git commit cbaf1fdea2)

r369160 caused a test to fail under UBSAN. See thread on llvm-commits.

llvm-svn: 369241
2019-08-19 08:00:08 +00:00
David L. Jones b1752f670f Revert [Attributor] Fix: Do not partially resolve returned calls.
This reverts r369160 (git commit f72d9b1c97)

r369160 caused some tests to fail under UBSAN. See thread on llvm-commits.

llvm-svn: 369236
2019-08-19 07:16:24 +00:00
Roman Lebedev 9b957d3321 [InstCombine] Cherry-pick NFC cleanups of foldShiftIntoShiftInAnotherHandOfAndInICmp() from D66383
llvm-svn: 369207
2019-08-18 12:26:33 +00:00
Alina Sbirlea f92109dc01 [MemorySSA] Loop passes should mark MSSA preserved when available.
This patch applies only to the new pass manager.
Currently, when MSSA Analysis is available, and pass to each loop pass, it will be preserved by that loop pass.
Hence, mark the analysis preserved based on that condition, vs the current `EnableMSSALoopDependency`. This leaves the global flag to affect only the entry point in the loop pass manager (in FunctionToLoopPassAdaptor).

llvm-svn: 369181
2019-08-17 01:02:12 +00:00
Sanjay Patel a53ad0e157 Revert r367891 - "[InstCombine] combine mul+shl separated by zext"
This reverts commit 5dbb90bfe1.

As noted in the post-commit thread for r367891, this can create
a multiply that is lowered to a libcall that may not exist.

We need to improve the backend decomposition for integer multiply
before trying to re-land this (if it's still worthwhile after
doing the backend work).

llvm-svn: 369174
2019-08-16 23:36:28 +00:00
Jian Cai 16fa8b0970 Reland "[ARM] push LR before __gnu_mcount_nc"
This relands r369147 with fixes to unit tests.

https://reviews.llvm.org/D65019

llvm-svn: 369173
2019-08-16 23:30:16 +00:00
Johannes Doerfert f72d9b1c97 [Attributor] Fix: Do not partially resolve returned calls.
By partially resolving returned calls we did not record that they were
not fully resolved which caused odd behavior down the line. We could
also end up with some, but not all, returned values of the callee in the
returned values map of the caller, another odd behavior we want to
avoid.

llvm-svn: 369160
2019-08-16 21:59:52 +00:00
Johannes Doerfert cbaf1fdea2 [Attributor] Fix: Make sure we set the changed flag
The flag was updated *before* we actually run the visitor callback so we
might miss updates.

llvm-svn: 369159
2019-08-16 21:55:01 +00:00
Johannes Doerfert 6dedc78d9d [Attributor] Add all missing attribute definitions/symbols
As a preparation to "on-demand" abstract attribute generation we need
implementations for all attributes (as they can be queried and then
created on-demand where we now fail to find one).

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66129

llvm-svn: 369155
2019-08-16 21:31:11 +00:00
Jian Cai 2d957cfe02 Revert "[ARM] push LR before __gnu_mcount_nc"
This reverts commit f4cf3b9593.

llvm-svn: 369149
2019-08-16 20:40:21 +00:00
Jian Cai f4cf3b9593 [ARM] push LR before __gnu_mcount_nc
Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of
the stack on ARM32.

Differential Revision: https://reviews.llvm.org/D65019

llvm-svn: 369147
2019-08-16 20:21:08 +00:00
Johannes Doerfert 234eda563d [Attributor] Towards a more structured deduction pattern
Summary:
This is the first commit aiming to structure the attribute deduction.
The base idea is that we have default propagation patterns as listed
below on top of which we can add specific, e.g., context sensitive,
logic.

Deduction patterns used in this patch:
  - argument states are determined from call site argument states,
    see AAAlignArgument and AAArgumentFromCallSiteArguments.
  - call site argument states are determined as if they were floating
    values, see AAAlignCallSiteArgument and AAAlignFloating.
  - floating value states are determined by traversing the def-use chain
    and combining the states determined for the leaves, see
    AAAlignFloating and genericValueTraversal.
  - call site return states are determined from function return states,
    see AAAlignCallSiteReturned and AACallSiteReturnedFromReturned.
  - function return states are determined from returned value states,
    see AAAlignReturned and AAReturnedFromReturnedValues.

Through this strategy all logic for alignment is concentrated in the
AAAlignFloating::updateImpl method.

Note: This commit works on its own but is part of a larger change that
involves "on-demand" creation of abstract attributes that will
participate in the fixpoint iteration. Without this part, we sometimes
do not have an AAAlign abstract attribute to query, loosing information
we determined before. All tests have appropriate FIXMEs and the
information will be recovered once we added all parts.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66126

llvm-svn: 369144
2019-08-16 19:51:23 +00:00
Johannes Doerfert 66cf87e290 [Attributor][NFC] Introduce aliases for call site attributes
Until we have call site specific liveness and/or value information there
is no need to do call site specific deduction. Though, we need the
symbols in follow up patches that make Attributor::getAAFor return a
reference.

llvm-svn: 369143
2019-08-16 19:49:00 +00:00
Johannes Doerfert fe6dbadc0d [Attributor] Introduce initialize calls and move code to keep attributes concise
Summary:
This patch should not change the behavior except that the added
initialize methods might indicate an optimistic fixpoint earlier. The
code movement is done to keep the attribute definitions in a single
block where it makes sense. No functional changes intended there.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66258

llvm-svn: 369142
2019-08-16 19:36:17 +00:00
Sanjay Patel 39eb2324f7 [InstCombine] canonicalize a scalar-select-of-vectors to vector select
This pattern may arise more frequently with an enhancement to SLP vectorization suggested in PR42755:
https://bugs.llvm.org/show_bug.cgi?id=42755
...but we should handle this pattern to make things easier for the backend either way.

For all in-tree targets that I looked at, codegen for typical vector sizes looks better when we change
to a vector select, so this is safe to do without a cost model (in other words, as a target-independent
canonicalization).

For example, if the condition of the select is a scalar, we end up with something like this on x86:

	vpcmpgtd	%xmm0, %xmm1, %xmm0
	vpextrb	$12, %xmm0, %eax
	testb	$1, %al
	jne	LBB0_2
  ## %bb.1:
	vmovaps	%xmm3, %xmm2
  LBB0_2:
	vmovaps	%xmm2, %xmm0

Rather than the splat-condition variant:

	vpcmpgtd	%xmm0, %xmm1, %xmm0
	vpshufd	$255, %xmm0, %xmm0      ## xmm0 = xmm0[3,3,3,3]
	vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

Differential Revision: https://reviews.llvm.org/D66095

llvm-svn: 369140
2019-08-16 18:51:30 +00:00
Vasileios Porpodas 1d254f3dae [SLPVectorizer] Make the scheduler aware of the TreeEntry operands.
Summary:
The scheduler's dependence graph gets the use-def dependencies by accessing the operands of the instructions in a bundle. However, buildTree_rec() may change the order of the operands in TreeEntry, and the scheduler is currently not aware of this. This is not causing any functional issues currently, because reordering is restricted to the operands of a single instruction. Once we support operand reordering across multiple TreeEntries, as shown here: http://www.llvm.org/devmtg/2019-04/slides/Poster-Porpodas-Supernode_SLP.pdf , the scheduler will need to get the correct operands from TreeEntry and not from the individual instructions.

In short, this patch:
- Connects the scheduler's bundle with the corresponding TreeEntry. It introduces new TE and Lane fields in ScheduleData.
- Moves the location where the operands of the TreeEntry are initialized. This used to take place in newTreeEntry() setting one operand at a time, but is now moved pre-order just before the recursion of buildTree_rec(). This is required because the scheduler needs to access both operands of the TreeEntry in tryScheduleBundle().
- Updates the scheduler to access the instruction operands through the TreeEntry operands instead of accessing the instruction operands directly.

Reviewers: ABataev, RKSimon, dtemirbulatov, Ayal, dorit, hfinkel

Reviewed By: ABataev

Subscribers: hiraditya, llvm-commits, lebedev.ri, rcorcs

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62432

llvm-svn: 369131
2019-08-16 17:21:18 +00:00
Evandro Menezes 05e9c2ac2e [InstCombine] Simplify pow(2.0, itofp(y)) to ldexp(1.0, y)
Simplify `pow(2.0, itofp(y))` to `ldexp(1.0, y)`.

Differential revision: https://reviews.llvm.org/D65979

llvm-svn: 369120
2019-08-16 15:33:41 +00:00
Roman Lebedev 16244fccfe [InstCombine] Shift amount reassociation in bittest: trunc-of-shl (PR42399)
Summary:
This is continuation of D63829 / https://bugs.llvm.org/show_bug.cgi?id=42399

I thought naive pattern would solve my issue, but nope, it involved truncation,
thus more folds needed.. This isn't really the fold i'm interested in,
i need trunc-of-lshr, but i'we decided to start with `shl` because it's simpler.

In this case, no extra legality checks are needed:
https://rise4fun.com/Alive/CAb

We should be careful about not increasing instruction count,
since we need to produce `zext` because `and` is done in wider type.

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66057

llvm-svn: 369117
2019-08-16 15:10:41 +00:00
Simon Pilgrim 59894d4668 [SLPVectorizer] Silence null dereference warning. NFCI.
cppcheck + MSVC analyzer both over zealously warn that we might dereference a null Bundle pointer - add an assertion to check for null to silence the warning, plus its a good idea to check that we succeeded in finding a schedule bundle anyway....

llvm-svn: 369094
2019-08-16 10:28:23 +00:00
Evgeniy Stepanov 75344955fc Move isPointerOffset function to ValueTracking (NFC).
Summary: To be reused in MemTag sanitizer.

Reviewers: pcc, vitalybuka, ostannard

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66165

llvm-svn: 369062
2019-08-15 22:58:28 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Dorit Nuzman d57d73daed [LV] fold-tail predication should be respected even with assume_safety
assume_safety implies that loads under "if's" can be safely executed
speculatively (unguarded, unmasked). However this assumption holds only for the
original user "if's", not those introduced by the compiler, such as the
fold-tail "if" that guards us from loading beyond the original loop trip-count.
Currently the combination of fold-tail and assume-safety pragmas results in
ignoring the fold-tail predicate that guards the loads, generating unmasked
loads. This patch fixes this behavior.

Differential Revision: https://reviews.llvm.org/D66106

Reviewers: Ayal, hsaito, fhahn
llvm-svn: 368973
2019-08-15 07:12:14 +00:00
Gor Nishanov efe0093404 [coroutine] Fixes "cannot move instruction since its users are not dominated by CoroBegin" problem.
Summary:
Fixes https://bugs.llvm.org/show_bug.cgi?id=36578 and https://bugs.llvm.org/show_bug.cgi?id=36296.
Supersedes: https://reviews.llvm.org/D55966

One of the fundamental transformation that CoroSplit pass performs before splitting the coroutine is to find which values need to survive between suspend and resume and provide a slot for them in the coroutine frame to spill and restore the value as needed.

Coroutine frame becomes available once the storage for it was allocated and that point is marked in the pre-split coroutine with a llvm.coro.begin intrinsic.

FE normally puts all of the user-authored code that would be accessing those values after llvm.coro.begin, however, sometimes instructions accessing those values would end up prior to coro.begin. For example, writing out a value of the parameter into the alloca done by the FE or instructions that are added by the optimization passes such as SROA when it rewrites allocas.

Prior to this change, CoroSplit pass would try to move instructions that may end up accessing the values in the coroutine frame after CoroBegin. However it would run into problems (report_fatal_error) if some of the values would be used both in the allocation function (for example allocator is passed as a parameter to a coroutine) and in the use-authored body of the coroutine.

To handle this case and to simplify the instruction moving logic, this change removes all of the instruction moving. Instead, we only change the uses of the spilled values that are dominated by coro.begin and leave other instructions intact.

Before:

```
%var = alloca i32
%1 = getelementptr .. %var; ; will move this one after coro.begin
%f = call i8* @llvm.coro.begin(
```

After:

```
%var = alloca i32
%1 = getelementptr .. %var; stays put
%f = call i8* @llvm.coro.begin(
```
If we discover that there is a potential write into an alloca, prior to coro.begin we would copy its value from the alloca into the spill slot in the coroutine frame.

Before:

```
%var = alloca i32
store .. %var ; will move this one after coro.begin
%f = call i8* @llvm.coro.begin(
```

After:

```
%var = alloca i32
store .. %var ;stays put
%f = call i8* @llvm.coro.begin(
%tmp = load %var
store %tmp, %spill.slot.for.var
```

Note: This change does not handle array allocas as that is something that C++ FE does not produce, but, it can be added in the future if need arises

Reviewers: llvm-commits, modocache, ben-clayton, tks2103, rjmccall

Reviewed By: modocache

Subscribers: bartdesmet

Differential Revision: https://reviews.llvm.org/D66230

llvm-svn: 368949
2019-08-15 00:48:51 +00:00
Johannes Doerfert 54f6be7b83 [Attributor] Try to fix "missing field 'RetInsts' initializer" warning
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/35674/steps/build_Lld/logs/stdio

llvm-svn: 368938
2019-08-14 22:32:29 +00:00
Johannes Doerfert 5304b72a81 [Attributor][NFC] Make debug output consistent
llvm-svn: 368931
2019-08-14 22:04:28 +00:00
Philip Reames 7b0515176b [SCEV] Rename getMaxBackedgeTakenCount to getConstantMaxBackedgeTakenCount [NFC]
llvm-svn: 368930
2019-08-14 21:58:13 +00:00
Johannes Doerfert 4395b31d99 [Attributor][NFC] Try to eliminate warnings (debug build + fall through)
llvm-svn: 368928
2019-08-14 21:46:28 +00:00
Johannes Doerfert 17b578bc75 [Attributor][NFC] Introduce statistics macros for new positions
llvm-svn: 368927
2019-08-14 21:46:25 +00:00
Johannes Doerfert e1e844d6b0 [Attributor][NFC] Add merge/join/clamp operators to the IntegerState
Differential Revision: https://reviews.llvm.org/D66146

llvm-svn: 368925
2019-08-14 21:35:20 +00:00
Johannes Doerfert 6a1274a52e [Attributor] Use the AANoNull attribute directly in AADereferenceable
Summary:
Instead of constantly keeping track of the nonnull status with the
dereferenceable information we can simply query the nonnull attribute
whenever we need the information (debug + manifest).

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66113

llvm-svn: 368924
2019-08-14 21:31:32 +00:00
Johannes Doerfert def9928204 [Attributor] Use liveness during the creation of AAReturnedValues
Summary:
As one of the first attributes, and one of the complex ones,
AAReturnedValues was not using liveness but we filtered the result after
the fact. This change adds liveness usage during the creation. The
algorithm is also improved and shorter.

The new algorithm will collect returned values over time using the
generic facilities that work with liveness already, e.g.,
genericValueTraversal which does not look at dead PHI node predecessors.
A test to show how this leads to better results is included.

Note: Unresolved calls and resolved calls are now tracked explicitly.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66120

llvm-svn: 368922
2019-08-14 21:29:37 +00:00
Johannes Doerfert 9a1a1f96d9 [Attributor] Do not update or manifest dead attributes
Summary:
If the associated context instruction is assumed dead we do not need to
update or manifest the state.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66116

llvm-svn: 368921
2019-08-14 21:25:08 +00:00
Johannes Doerfert 710ebb03ed [Attributor] Use IRPosition consistently
Summary:
The next attempt to clean up the Attributor interface before we grow it
further.

Before, we used a combination of two values (associated + anchor) and an
argument number (or -1) to determine a location. This was very fragile.
The new system uses exclusively IR positions and we restrict the
generation of IR positions to special constructor methods that verify
internal constraints we have. This will catch misuse early.

The auto-conversion, e.g., in getAAFor, is now performed through the
SubsumingPositionIterator. This iterator takes an IR position and allows
to visit all IR positions that "subsume" the given one, e.g., function
attributes "subsume" argument attributes of that function. For a
detailed breakdown see the class comment of SubsumingPositionIterator.

This patch also introduces the IRPosition::getAttrs() to extract IR
attributes at a certain position. The method knows how to look up in
different positions that are equivalent, e.g., the argument position for
call site arguments. We also introduce three new positions kinds such
that we have all IR positions where attributes can be placed and one for
"floating" values.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65977

llvm-svn: 368919
2019-08-14 21:18:01 +00:00
Dinar Temirbulatov da0435a690 [SLP][NFC] Use pointers to address to ScalarToTreeEntry elements, instead of indexes.
llvm-svn: 368906
2019-08-14 19:46:50 +00:00
Philip Reames 6cca3ad43e [RLEV] Rewrite loop exit values for multiple exit loops w/o overall loop exit count
We already supported rewriting loop exit values for multiple exit loops, but if any of the loop exits were not computable, we gave up on all loop exit values. This patch generalizes the existing code to handle individual computable loop exits where possible.

As discussed in the review, this is a starting point for figuring out a better API.  The code is a bit ugly, but getting it in lets us test as we go.  

Differential Revision: https://reviews.llvm.org/D65544

llvm-svn: 368898
2019-08-14 18:27:57 +00:00
Matt Arsenault dbc1f207fa InferAddressSpaces: Move target intrinsic handling to TTI
I'm planning on handling intrinsics that will benefit from checking
the address space enums. Don't bother moving the address collection
for now, since those won't need th enums.

llvm-svn: 368895
2019-08-14 18:13:00 +00:00
Matt Arsenault 0eac2a2963 InferAddressSpaces: Remove unnecessary check for ConstantInt
The IR is invalid if this isn't a constant since immarg was added.

llvm-svn: 368893
2019-08-14 18:01:42 +00:00
David Bolvansky f94460d4b6 [SLC] Dereferenceable annonation - handle valid null pointers
Reviewers: jdoerfert, reames

Reviewed By: jdoerfert

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66161

llvm-svn: 368884
2019-08-14 17:15:20 +00:00
David Bolvansky 0e0fbae1a4 [BuildLibCalls] Noalias annotation
Summary: I think this is better solution than annotating callsites in IC/SLC.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66217

llvm-svn: 368875
2019-08-14 16:50:06 +00:00
Bill Wendling cc2bebe039 Ignore indirect branches from callbr.
Summary:
We can't speculate around indirect branches: indirectbr and invoke. The
callbr instruction needs to be included here.

Reviewers: nickdesaulniers, manojgupta, chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66200

llvm-svn: 368873
2019-08-14 16:44:07 +00:00
Simon Pilgrim 828a89e244 Fix "not all control paths return a value" MSVC warnings. NFCI.
llvm-svn: 368831
2019-08-14 11:31:05 +00:00
Simon Pilgrim 3f40bdb558 Fix "not all control paths return a value" MSVC warning. NFCI.
llvm-svn: 368830
2019-08-14 11:29:56 +00:00
Simon Pilgrim 8bba4798c2 Fix "not all control paths return a value" MSVC warnings. NFCI.
llvm-svn: 368829
2019-08-14 11:29:16 +00:00
Roman Lebedev 32f1e1a01d [InstCombine] Refactor getFlippedStrictnessPredicateAndConstant() out of canonicalizeCmpWithConstant(), NFCI
I'd like to use it elsewhere, hopefully without reinventing the wheel.
No functional change intended so far.

llvm-svn: 368820
2019-08-14 09:57:20 +00:00
Dorit Nuzman 491ca2425d [LV] Fold-tail flag
This is the compiler-flag equivalent of the Predicate pragma
(https://reviews.llvm.org/D65197), to direct the vectorizer to fold the
remainder-loop into the main-loop using predication.

Differential Revision: https://reviews.llvm.org/D66108

Reviewers: Ayal, hsaito, fhahn, SjoerdMeije
llvm-svn: 368801
2019-08-14 05:22:20 +00:00
David L. Jones d4edd9d97e Revert '[LICM] Make Loop ICM profile aware' and 'Fix pass dependency for LICM'
This reverts r368526 (git commit 7e71aa24bc)
This reverts r368542 (git commit cb5a90fd31)

llvm-svn: 368800
2019-08-14 04:50:33 +00:00
John McCall a318c55073 Coroutines: adjust for SVN r358739
CallSite has been removed in favour of CallBase.  Adjust the coroutine split to
account for that.

llvm-svn: 368798
2019-08-14 03:54:25 +00:00
John McCall 3bbf207fbc Don't run a full verifier pass in coro-splitting's private pipeline.
Potentially addresses rdar://49022293.

llvm-svn: 368797
2019-08-14 03:54:18 +00:00
John McCall 5f60b68c68 Remove unreachable blocks before splitting a coroutine.
The suspend-crossing algorithm is not correct in the presence of uses
that cannot be reached on some successor path from their defs.

llvm-svn: 368796
2019-08-14 03:54:13 +00:00
John McCall 2133feec93 Support swifterror in coroutine lowering.
The support for swifterror allocas should work in all lowerings.
The support for swifterror arguments only really works in a lowering
with prototypes where you can ensure that the prototype also has a
swifterror argument; I'm not really sure how it could possibly be
made to work in the switch lowering.

llvm-svn: 368795
2019-08-14 03:54:05 +00:00
John McCall d47801e718 In coro.retcon lowering, don't explode if the optimizer messes around with the linkage of the prototype or the exact types of the yielded values.
llvm-svn: 368793
2019-08-14 03:53:52 +00:00
John McCall ac40483276 Fix a use-after-free in the coro.alloca treatment.
llvm-svn: 368792
2019-08-14 03:53:46 +00:00
John McCall 62a5dde0c2 Add intrinsics for doing frame-bound dynamic allocations within a coroutine.
These rely on having an allocator provided to the coroutine and thus,
for now, only work in retcon lowerings.

llvm-svn: 368791
2019-08-14 03:53:40 +00:00
John McCall 137b50f0c3 Guard dumps in the coro intrinsic validation logic behind NDEBUG checks. dump() is not guaranteed to be defined in all builds.
llvm-svn: 368790
2019-08-14 03:53:31 +00:00
John McCall 3829214185 Generalize llvm.coro.suspend.retcon to allow an arbitrary number of arguments to be passed back to the continuation function.
llvm-svn: 368789
2019-08-14 03:53:26 +00:00
John McCall 94010b2b7f Extend coroutines to support a "returned continuation" lowering.
A quick contrast of this ABI with the currently-implemented ABI:

- Allocation is implicitly managed by the lowering passes, which is fine
  for frontends that are fine with assuming that allocation cannot fail.
  This assumption is necessary to implement dynamic allocas anyway.

- The lowering attempts to fit the coroutine frame into an opaque,
  statically-sized buffer before falling back on allocation; the same
  buffer must be provided to every resume point.  A buffer must be at
  least pointer-sized.

- The resume and destroy functions have been combined; the continuation
  function takes a parameter indicating whether it has succeeded.

- Conversely, every suspend point begins its own continuation function.

- The continuation function pointer is directly returned to the caller
  instead of being stored in the frame.  The continuation can therefore
  directly destroy the frame when exiting the coroutine instead of having
  to leave it in a defunct state.

- Other values can be returned directly to the caller instead of going
  through a promise allocation.  The frontend provides a "prototype"
  function declaration from which the type, calling convention, and
  attributes of the continuation functions are taken.

- On the caller side, the frontend can generate natural IR that directly
  uses the continuation functions as long as it prevents IPO with the
  coroutine until lowering has happened.  In combination with the point
  above, the frontend is almost totally in charge of the ABI of the
  coroutine.

- Unique-yield coroutines are given some special treatment.

llvm-svn: 368788
2019-08-14 03:53:17 +00:00
David Bolvansky 038d604f4f [SimplifyLibCalls] Add noalias from known callsites
Summary:
Should be fine for memcpy, strcpy, strncpy.


Reviewers: jdoerfert, efriedma

Reviewed By: jdoerfert

Subscribers: uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66135

llvm-svn: 368724
2019-08-13 17:18:46 +00:00
David Bolvansky 90a30fdcc3 [SLC] Improve dereferenceable bytes annotation
llvm-svn: 368715
2019-08-13 16:44:16 +00:00
Roman Lebedev 73f702ff19 [InstCombine] Non-canonical clamp-like pattern handling
Summary:
Given a pattern like:
```
%old_cmp1 = icmp slt i32 %x, C2
%old_replacement = select i1 %old_cmp1, i32 %target_low, i32 %target_high
%old_x_offseted = add i32 %x, C1
%old_cmp0 = icmp ult i32 %old_x_offseted, C0
%r = select i1 %old_cmp0, i32 %x, i32 %old_replacement
```
it can be rewritten as more canonical pattern:
```
%new_cmp1 = icmp slt i32 %x, -C1
%new_cmp2 = icmp sge i32 %x, C0-C1
%new_clamped_low = select i1 %new_cmp1, i32 %target_low, i32 %x
%r = select i1 %new_cmp2, i32 %target_high, i32 %new_clamped_low
```
Iff `-C1 s<= C2 s<= C0-C1`
Also, `ULT` predicate can also be `UGE`; or `UGT` iff `C0 != -1` (+invert result)
Also, `SLT` predicate can also be `SGE`; or `SGT` iff `C2 != INT_MAX` (+invert result)

If `C1 == 0`, then all 3 instructions must be one-use; else at most either `%old_cmp1` or `%old_x_offseted` can have extra uses.
NOTE: if we could reuse `%old_cmp1` as one of the comparisons we'll have to build, this could be less limiting.

So there are two icmp's, each one with 3 predicate variants, so there are 9 fold variants:

|     | ULT                            | UGE                             | UGT                             |
| SLT | https://rise4fun.com/Alive/yIJ | https://rise4fun.com/Alive/5BfN | https://rise4fun.com/Alive/INH  |
| SGE | https://rise4fun.com/Alive/hd8 | https://rise4fun.com/Alive/Abk  | https://rise4fun.com/Alive/PlzS |
| SGT | https://rise4fun.com/Alive/VYG | https://rise4fun.com/Alive/oMY  | https://rise4fun.com/Alive/KrzC |
{F9730206}

This fold was brought up in https://reviews.llvm.org/D65148#1603922 by @dmgreen, and is needed to unblock that patch.
This patch requires D65530.

Reviewers: spatel, nikic, xbolva00, dmgreen

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits, dmgreen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65765

llvm-svn: 368687
2019-08-13 12:49:28 +00:00
Roman Lebedev 0410489a34 [InstCombine][NFC] Rename IsFreeToInvert() -> isFreeToInvert() for consistency
As per https://reviews.llvm.org/D65530#inline-592325

llvm-svn: 368686
2019-08-13 12:49:16 +00:00
Roman Lebedev 2635c324da [InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if all users are freely invertible
Summary:
This is rather unconventional..

As the comment there says, we don't have much folds for xor-of-icmps,
we try to turn them into an and-of-icmps, for which we have plenty of folds.
But if the ICmp we need to invert is not single-use - we give up.

As discussed in https://reviews.llvm.org/D65148#1603922,
we may have a non-canonical CLAMP pattern, with bit match and
select-of-threshold that we'll potentially clamp.
As it can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`,
out of all 8 variations of the pattern, only two are **not** canonicalized into
the variant with and+icmp instead of bit math.
The reason is because the ICmp we need to invert is not single-use - we give up.

We indeed can't perform this fold at will, the general rule is that
we should not increase instruction count in InstCombine,

But we wouldn't end up increasing instruction count if we can adapt every other
user to the inverted value. This way the `not` we create **will** get folded,
and in the end the instruction count did not increase.

For that, of course, we need to look at the users of a Value,
which is again rather unconventional for InstCombine :S

Thus i'm proposing to be a little bit more insistive in `foldXorOfICmps()`.
The alternatives would be to not create that `not`, but add duplicate code to
manually invert all users; or to add some even less general combine to handle
some more specific pattern[s].

Reviewers: spatel, nikic, RKSimon, craig.topper

Reviewed By: spatel

Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65530

llvm-svn: 368685
2019-08-13 12:49:06 +00:00
David Bolvansky 39130314fe [SimplifyLibCalls] Add dereferenceable bytes from known callsites
Summary:
int mm(char *a, char *b) {
    return memcmp(a,b,16);
}

Currently:
define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 {
entry:
  %call = tail call i32 @memcmp(i8* %a, i8* %b, i64 16)
  ret i32 %call
}

After patch:
define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 {
entry:
  %call = tail call i32 @memcmp(i8* dereferenceable(16)  %a, i8* dereferenceable(16)  %b, i64 16)
  ret i32 %call
}




Reviewers: jdoerfert, efriedma

Reviewed By: jdoerfert

Subscribers: javed.absar, spatel, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66079

llvm-svn: 368657
2019-08-13 09:11:49 +00:00
Johannes Doerfert 26e58466de [Attributor] Use the cached data layout directly
This removes the warning by using the new DL member.
It also simplifies the code.

llvm-svn: 368625
2019-08-12 22:21:09 +00:00
Johannes Doerfert acc8079f8e [Attributor][NFC] Add IntegerState raw_ostream << operator
llvm-svn: 368622
2019-08-12 22:07:34 +00:00
Johannes Doerfert ece8190497 [Attributor] Make the InformationCache an Attributor member
The functionality is not changed but the interfaces are simplified and
repetition is removed.

llvm-svn: 368621
2019-08-12 22:05:53 +00:00
Wenlei He 4b99b58a84 [ThinLTO][AutoFDO] Fix memory corruption due to race condition from thin backends
Summary:
This commit fixed a race condition from multi-threaded thinLTO backends that causes non-deterministic memory corruption for a data structure used only by AutoFDO with compact binary profile.
GUIDToFuncNameMap, a static data member of type DenseMap in FunctionSamples is used as a per-module mapping from function name MD5 to name string when input AutoFDO profile is in compact binary format. However with ThinLTO, we can have parallel backends modifying and accessing the class static map concurrently. The fix is to make GUIDToFuncNameMap a member of SampleProfileLoader instead of a file static data.

Reviewers: wmi, davidxl, danielcdh

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65848

llvm-svn: 368596
2019-08-12 17:45:14 +00:00
David Bolvansky 20d37fab82 [InstCombine] x /c fabs(x) -> copysign(1.0, x)
Summary:
x / fabs(x) -> copysign(1.0, x)
fabs(x) / x -> copysign(1.0, x)

Reviewers: spatel, foad, RKSimon, efriedma

Reviewed By: spatel

Subscribers: lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65898

llvm-svn: 368570
2019-08-12 13:43:35 +00:00
Roman Lebedev ccdad6ef48 [InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): avoid constantexpr pitfail (PR42962)
Instead of matching value and then blindly casting to BinaryOperator
just to get the opcode, just match instruction and do no cast.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42962

llvm-svn: 368554
2019-08-12 11:28:02 +00:00
Wenlei He cb5a90fd31 Fix pass dependency for LICM
Expected to address buildbot failure http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/16285 caused by D65060.

llvm-svn: 368542
2019-08-11 22:54:05 +00:00
Wenlei He 7e71aa24bc [LICM] Make Loop ICM profile aware
Summary:
Hoisting/sinking instruction out of a loop isn't always beneficial. Hoisting an instruction from a cold block inside a loop body out of the loop could hurt performance. This change makes Loop ICM profile aware - it now checks block frequency to make sure hoisting/sinking anly moves instruction to colder block.

Test Plan:

ninja check

Reviewers: asbirlea, sanjoy, reames, nikic, hfinkel, vsk

Reviewed By: asbirlea

Subscribers: fhahn, vsk, davidxl, xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65060

llvm-svn: 368526
2019-08-11 06:05:35 +00:00
Roman Lebedev 96474d17c6 [InstCombine][NFC] Use SimplifyAddInst() instead of SimplifyBinOp(Instruction::BinaryOps::Add, )
llvm-svn: 368521
2019-08-10 19:29:10 +00:00
Roman Lebedev a8d20b4467 [InstCombine] Shift amount reassociation in bittest: relax one-use check when shifting constant
If one of the values being shifted is a constant, since the new shift
amount is known-constant, the new shift will end up being constant-folded
so, we don't need that one-use restriction then.

llvm-svn: 368519
2019-08-10 19:28:54 +00:00
Roman Lebedev 64fe806c4e [InstCombine] Shift amount reassociation in bittest: drop pointless one-use restriction
That one-use restriction is not needed for correctness - we have already
ensured that one of the shifts will go away, so we know we won't increase
the instruction count. So there is no need for that restriction.

llvm-svn: 368518
2019-08-10 19:28:44 +00:00
Sanjay Patel 21c15ef384 [Reassociate] try harder to convert negative FP constants to positive
This is an extension of a transform that tries to produce positive floating-point
constants to improve canonicalization (and hopefully lead to more reassociation
and CSE).

The original patches were:
D4904
D5363 (rL221721)

But as the test diffs show, these were limited to basic patterns by walking from
an instruction to its single user rather than recursively moving up the def-use
sequence. No fast-math is required here because we're only rearranging implicit
FP negations in intermediate ops.

A motivating bug is:
https://bugs.llvm.org/show_bug.cgi?id=32939

Differential Revision: https://reviews.llvm.org/D65954

llvm-svn: 368512
2019-08-10 13:17:54 +00:00
Peter Collingbourne 0e497d1554 cfi-icall: Allow the jump table to be optionally made non-canonical.
The default behavior of Clang's indirect function call checker will replace
the address of each CFI-checked function in the output file's symbol table
with the address of a jump table entry which will pass CFI checks. We refer
to this as making the jump table `canonical`. This property allows code that
was not compiled with ``-fsanitize=cfi-icall`` to take a CFI-valid address
of a function, but it comes with a couple of caveats that are especially
relevant for users of cross-DSO CFI:

- There is a performance and code size overhead associated with each
  exported function, because each such function must have an associated
  jump table entry, which must be emitted even in the common case where the
  function is never address-taken anywhere in the program, and must be used
  even for direct calls between DSOs, in addition to the PLT overhead.

- There is no good way to take a CFI-valid address of a function written in
  assembly or a language not supported by Clang. The reason is that the code
  generator would need to insert a jump table in order to form a CFI-valid
  address for assembly functions, but there is no way in general for the
  code generator to determine the language of the function. This may be
  possible with LTO in the intra-DSO case, but in the cross-DSO case the only
  information available is the function declaration. One possible solution
  is to add a C wrapper for each assembly function, but these wrappers can
  present a significant maintenance burden for heavy users of assembly in
  addition to adding runtime overhead.

For these reasons, we provide the option of making the jump table non-canonical
with the flag ``-fno-sanitize-cfi-canonical-jump-tables``. When the jump
table is made non-canonical, symbol table entries point directly to the
function body. Any instances of a function's address being taken in C will
be replaced with a jump table address.

This scheme does have its own caveats, however. It does end up breaking
function address equality more aggressively than the default behavior,
especially in cross-DSO mode which normally preserves function address
equality entirely.

Furthermore, it is occasionally necessary for code not compiled with
``-fsanitize=cfi-icall`` to take a function address that is valid
for CFI. For example, this is necessary when a function's address
is taken by assembly code and then called by CFI-checking C code. The
``__attribute__((cfi_jump_table_canonical))`` attribute may be used to make
the jump table entry of a specific function canonical so that the external
code will end up taking a address for the function that will pass CFI checks.

Fixes PR41972.

Differential Revision: https://reviews.llvm.org/D65629

llvm-svn: 368495
2019-08-09 22:31:59 +00:00
Evandro Menezes 59fbe516bd [InstCombine] Refactor optimizeExp2() (NFC)
Refactor `LibCallSimplifier::optimizeExp2()` to use the new
`emitBinaryFloatFnCall()` version that fetches the function name from TLI.

llvm-svn: 368457
2019-08-09 17:22:56 +00:00
Evandro Menezes 8a21214174 [Transforms] Add a emitBinaryFloatFnCall() version that fetches the function name from TLI
Add the counterpart to a similar function for single operands.

Differential revision: https://reviews.llvm.org/D65976

llvm-svn: 368453
2019-08-09 17:06:46 +00:00
Evandro Menezes c6c00cdf2e [Transforms] Rename hasUnaryFloatFn() and getUnaryFloatFn() (NFC)
Rename `hasUnaryFloatFn()` to `hasFloatFn()` and `getUnaryFloatFn()` to `getFloatFnName()`.

llvm-svn: 368449
2019-08-09 16:04:18 +00:00
Sanjay Patel 991834a516 [GlobalOpt] prevent crashing on large integer types (PR42932)
This is a minimal fix (copy the predicate for the assert) to
prevent the crashing seen in:
https://bugs.llvm.org/show_bug.cgi?id=42932
...when converting a constant integer of arbitrary width to uint64_t.

Differential Revision: https://reviews.llvm.org/D65970

llvm-svn: 368437
2019-08-09 12:43:25 +00:00
Bjorn Pettersson d218a3326e [InstSimplify] Report "Changed" also when only deleting dead instructions
Summary:
Make sure that we report that changes has been made
by InstSimplify also in situations when only trivially
dead instructions has been removed. If for example a call
is removed the call graph must be updated.

Bug seem to have been introduced by llvm-svn r367173
(commit 02b9e45a7e), since the code in question
was rewritten in that commit.

Reviewers: spatel, chandlerc, foad

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65973

llvm-svn: 368401
2019-08-09 07:08:25 +00:00
Peter Collingbourne bb17e46644 Linker: Add support for GlobalIFunc.
GlobalAlias and GlobalIFunc ought to be treated the same by the IR
linker, so we can generalize the code to be in terms of their common
base class GlobalIndirectSymbol.

Differential Revision: https://reviews.llvm.org/D55046

llvm-svn: 368357
2019-08-08 22:09:18 +00:00
Cameron McInally 8416f20f2f [LICM] Support unary FNeg in LICM
Differential Revision: https://reviews.llvm.org/D65908

llvm-svn: 368350
2019-08-08 21:38:31 +00:00
Tim Corringham 4f64f1ba3c Add llvm.licm.disable metadata
For some targets the LICM pass can result in sub-optimal code in some
cases where it would be better not to run the pass, but it isn't
always possible to suppress the transformations heuristically.

Where the front-end has insight into such cases it is beneficial
to attach loop metadata to disable the pass - this change adds the
llvm.licm.disable metadata to enable that.

Differential Revision: https://reviews.llvm.org/D64557

llvm-svn: 368296
2019-08-08 13:46:17 +00:00
Johannes Doerfert d1b79e0774 [Attributor][Stats] Locate statistics tracking with the attributes
Summary:
The ever growing switch required Attribute::AttrKind values but they
might not be available for all abstract attributes we deduce. With the
new method we track statistics at the abstract attribute level. The
provided macros simplify the usage and make the messages uniform.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65732

llvm-svn: 368227
2019-08-07 22:46:11 +00:00
Johannes Doerfert beb5150f47 [Attributor][NFC] Code simplification and style normalization
llvm-svn: 368225
2019-08-07 22:36:15 +00:00
Johannes Doerfert 344d038960 [Attributor] Introduce a state wrapper class
Summary:
The wrapper reduces boilerplate code and also provide a nice way to
determine the state type used by an abstract attributes statically via
AAType::StateType.

This was already discussed as part of the review of D65711.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65786

llvm-svn: 368224
2019-08-07 22:34:26 +00:00
Johannes Doerfert d620781872 [Attributor][NFC] Avoid unnecessary liveness queries
If we know everything is live there is no need to query for liveness.
Indicating a pessimistic fixpoint will cause the state to be "invalid"
which will cause the Attributor to not return the AAIsDead on request,
which will prevent us from querying isAssumedDead().

llvm-svn: 368223
2019-08-07 22:32:38 +00:00
Johannes Doerfert 14a0493a88 [Attributor] Provide easier checkForallReturnedValues functionality
Summary:
So far, whenever one wants to look at returned values, one had to deal
with the AAReturnedValues and potentially with the AAIsDead attribute.
In the same spirit as other checkForAllXXX methods, we add this
functionality now to the Attributor. By adopting the use sites we got
better results when return instructions were dead.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65733

llvm-svn: 368222
2019-08-07 22:27:24 +00:00
Craig Topper 005b22855e [LoopVectorize][X86] Clamp interleave factor if we have a known constant trip count that is less than VF*interleave
If we know the trip count, we should make sure the interleave factor won't cause the vectorized loop to exceed it.

Improves one of the cases from PR42674

Differential Revision: https://reviews.llvm.org/D65896

llvm-svn: 368215
2019-08-07 21:44:14 +00:00
Stefan Stipanovic aaa5270c53 [Attributor] Introduce checkForAllReadWriteInstructions(...).
Summary: Similarly to D65731 `Attributor::checkForAllReadWriteInstructions` is introduced.

Reviewers: jdoerfert, uenoku

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65825

llvm-svn: 368194
2019-08-07 18:26:02 +00:00
Jay Foad 8e8b295835 [InstCombine] Propagate fast math flags through selects
Summary:
In SimplifySelectsFeedingBinaryOp, propagate fast math flags from the
outer op into both arms of the new select, to take advantage of
simplifications that require fast math flags.

Reviewers: mcberg2017, majnemer, spatel, arsenm, xbolva00

Subscribers: wdng, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65658

llvm-svn: 368175
2019-08-07 15:16:28 +00:00
Cameron McInally 303b6dbfb4 [EarlyCSE] Add support for unary FNeg to EarlyCSE
Differential Revision: https://reviews.llvm.org/D65815

llvm-svn: 368171
2019-08-07 14:34:41 +00:00
Roman Lebedev 9bece444dd [InstCombine] Recommit: Shift amount reassociation: shl-trunc-shl pattern
This was initially committed in r368059 but got reverted in r368084
because there was a faulty logic in how the shift amounts type mismatch
was being handled (it simply wasn't).

I've added an explicit bailout before we SimplifyAddInst() - i don't think
it's designed in general to handle differently-typed values, even though
the actual problem only comes from ConstantExpr's.

I have also changed the common type deduction, to not just blindly
look past zext, but try to do that so that in the end types match.

Differential Revision: https://reviews.llvm.org/D65380

llvm-svn: 368141
2019-08-07 09:41:50 +00:00
Mitch Phillips 924359dc0f Revert "[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors."
This reverts commit fc33e33776.

This commit depends on the rolled back commit rL367901, and thus needs
to be rolled back.

llvm-svn: 368109
2019-08-06 23:38:14 +00:00
Peter Collingbourne 0930643ff6 hwasan: Instrument globals.
Globals are instrumented by adding a pointer tag to their symbol values
and emitting metadata into a special section that allows the runtime to tag
their memory when the library is loaded.

Due to order of initialization issues explained in more detail in the comments,
shadow initialization cannot happen during regular global initialization.
Instead, the location of the global section is marked using an ELF note,
and we require libc support for calling a function provided by the HWASAN
runtime when libraries are loaded and unloaded.

Based on ideas discussed with @evgeny777 in D56672.

Differential Revision: https://reviews.llvm.org/D65770

llvm-svn: 368102
2019-08-06 22:07:29 +00:00
Guanzhong Chen b3292a8469 [WebAssembly] Lower ASan constructor priority on Emscripten
Summary:
This change gives Emscripten the ability to use more than one constructor
priorities that runs before ASan. By convention, constructor priorites 0-100
are reserved for use by the system. ASan on Emscripten now uses priority 50,
leaving plenty of room for use by Emscripten before and after ASan.

This change is done in response to:
https://github.com/emscripten-core/emscripten/pull/9076#discussion_r310323723

Reviewers: kripken, tlively, aheejin

Reviewed By: tlively

Subscribers: cfe-commits, dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D65684

llvm-svn: 368101
2019-08-06 21:52:58 +00:00
Reid Kleckner e4bd38478b Revert [InstCombine] Shift amount reassociation: shl-trunc-shl pattern
This reverts r368059 (git commit 0f95710976)

This caused Clang to assert while self-hosting and compiling
SystemZInstrInfo.cpp. Reduction is running.

llvm-svn: 368084
2019-08-06 20:32:07 +00:00