Commit Graph

21919 Commits

Author SHA1 Message Date
Quentin Colombet ea3364bf85 [BlockExtractor] Extend the file format to support the grouping of basic blocks
Prior to this patch, each basic block listed in the extrack-blocks-file
would be extracted to a different function.

This patch adds the support for comma separated list of basic blocks
to form group.

When the region formed by a group is not extractable, e.g., not single
entry, all the blocks of that group are left untouched.

Let us see this new format in action (comments are not part of the
file format):
;; funcName bbName[,bbName...]
   foo      bb1        ;; Extract bb1 in its own function
   foo      bb2,bb3    ;; Extract bb2,bb3 in their own function
   bar      bb1,bb4    ;; Extract bb1,bb4 in their own function
   bar      bb2        ;; Extract bb2 in its own function

Assuming all regions are extractable, this will create one function and
thus one call per region.

Differential Revision: https://reviews.llvm.org/D60746

llvm-svn: 358701
2019-04-18 18:28:30 +00:00
Philip Reames adf288c5d9 [LoopPred] Fix a blatantly obvious bug in r358684
The bug is that I didn't check whether the operand of the invariant_loads were themselves invariant.  I don't know how this got missed in the patch and review.  I even had an unreduced test case locally, and I remember handling this case, but I must have lost it in one of the rebases.  Oops.

llvm-svn: 358688
2019-04-18 17:01:19 +00:00
Philip Reames 92a7177e6b [LoopPredication] Allow predication of loop invariant computations (within the loop)
The purpose of this patch is to eliminate a pass ordering dependence between LoopPredication and LICM. To understand the purpose, consider the following snippet of code inside some loop 'L' with IV 'i'
A = _a.length;
guard (i < A)
a = _a[i]
B = _b.length;
guard (i < B);
b = _b[i];
...
Z = _z.length;
guard (i < Z)
z = _z[i]
accum += a + b + ... + z;

Today, we need LICM to hoist the length loads, LoopPredication to make the guards loop invariant, and TrivialUnswitch to eliminate the loop invariant guard to establish must execute for the next length load. Today, if we can't prove speculation safety, we'd have to iterate these three passes 26 times to reduce this example down to the minimal form.

Using the fact that the array lengths are known to be invariant, we can short circuit this iteration. By forming the loop invariant form of all the guards at once, we remove the need for LoopPredication from the iterative cycle. At the moment, we'd still have to iterate LICM and TrivialUnswitch; we'll leave that part for later.

As a secondary benefit, this allows LoopPred to expose peeling oppurtunities in a much more obvious manner.  See the udiv test changes as an example.  If the udiv was not hoistable (i.e. we couldn't prove speculation safety) this would be an example where peeling becomes obviously profitable whereas it wasn't before.

A couple of subtleties in the implementation:
- SCEV's isSafeToExpand guarantees speculation safety (i.e. let's us expand at a new point).  It is not a precondition for expansion if we know the SCEV corresponds to a Value which dominates the requested expansion point.
- SCEV's isLoopInvariant returns true for expressions which compute the same value across all iterations executed, regardless of where the original Value is located.  (i.e. it can be in the loop)  This implies we have a speculation burden to prove before expanding them outside loops.
- invariant_loads and AA->pointsToConstantMemory are two cases that SCEV currently does not handle, but meets the SCEV definition of invariance.  I plan to sink this part into SCEV once this has baked for a bit.

Differential Revision: https://reviews.llvm.org/D60093

llvm-svn: 358684
2019-04-18 16:33:17 +00:00
Eric Christopher eff3b6fe7f Elaborate why we have an option on by default for enabling chr.
llvm-svn: 358641
2019-04-18 06:17:40 +00:00
Richard Trieu 7b6192025e Fix bad compare function over FusionCandidate.
Reverse the checking of the domiance order so that when a self compare happens,
it returns false.  This makes compare function have strict weak ordering.

llvm-svn: 358636
2019-04-18 01:39:45 +00:00
Akira Hatanaka 0b19f5aef9 Fix formatting. NFC
llvm-svn: 358623
2019-04-17 23:14:39 +00:00
Denis Bakhvalov cfd25a4b0e Test commit by Denis Bakhvalov
Change-Id: I4d85123a157d957434902fb14ba50926b2d56212
llvm-svn: 358619
2019-04-17 22:27:30 +00:00
Kit Barton 3cdf87940f Add basic loop fusion pass.
This patch adds a basic loop fusion pass. It will fuse loops that conform to the
following 4 conditions:
  1. Adjacent (no code between them)
  2. Control flow equivalent (if one loop executes, the other loop executes)
  3. Identical bounds (both loops iterate the same number of iterations)
  4. No negative distance dependencies between the loop bodies.

The pass does not make any changes to the IR to create opportunities for fusion.
Instead, it checks if the necessary conditions are met and if so it fuses two
loops together.

The pass has not been added to the pass pipeline yet, and thus is not enabled by
default. It can be run stand alone using the -loop-fusion option.

Differential Revision: https://reviews.llvm.org/D55851

llvm-svn: 358607
2019-04-17 18:53:27 +00:00
Philip Reames 88679717ce [InstCombine] Factor out unreachable inst idiom creation [NFC]
In InstCombine, we use an idiom of "store i1 true, i1 undef" to indicate we've found a path which we've proven unreachable.  We can't actually insert the unreachable instruction since that would require changing the CFG.  We leave that to simplifycfg later.

This just factors out that idiom creation so we don't duplicate the same mostly undocument idiom creation in multiple places.

llvm-svn: 358600
2019-04-17 17:37:58 +00:00
Florian Hahn 893aea58ea [LoopUnroll] Allow unrolling if the unrolled size does not exceed loop size.
Summary:
In the following cases, unrolling can be beneficial, even when
optimizing for code size:
 1) very low trip counts
 2) potential to constant fold most instructions after fully unrolling.

We can unroll in those cases, by setting the unrolling threshold to the
loop size. This might highlight some cost modeling issues and fixing
them will have a positive impact in general.

Reviewers: vsk, efriedma, dmgreen, paquette

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D60265

llvm-svn: 358586
2019-04-17 15:57:43 +00:00
Roman Lebedev 0080645846 [CVP] processOverflowIntrinsic(): don't crash if constant-holding happened
As reported by Mikael Holmén in post-commit review in
https://reviews.llvm.org/D60791#1469765

llvm-svn: 358559
2019-04-17 06:35:07 +00:00
Eric Christopher e29874eaa0 Revert "Add basic loop fusion pass." Per request.
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358553
2019-04-17 04:55:24 +00:00
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher 0ebbf72a63 Remove the run-slp-after-loop-vectorization option.
It's been on by default for 4 years and cleans up the pass
hierarchy.

llvm-svn: 358548
2019-04-17 02:26:27 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Kit Barton ab70da0728 Add basic loop fusion pass.
This patch adds a basic loop fusion pass. It will fuse loops that conform to the
following 4 conditions:
  1. Adjacent (no code between them)
  2. Control flow equivalent (if one loop executes, the other loop executes)
  3. Identical bounds (both loops iterate the same number of iterations)
  4. No negative distance dependencies between the loop bodies.

The pass does not make any changes to the IR to create opportunities for fusion.
Instead, it checks if the necessary conditions are met and if so it fuses two
loops together.

The pass has not been added to the pass pipeline yet, and thus is not enabled by
default. It can be run stand alone using the -loop-fusion option.

Phabricator: https://reviews.llvm.org/D55851
llvm-svn: 358543
2019-04-17 01:37:00 +00:00
Ali Tamur e8de5cd602 Fix a typo in comments. [NFC]
llvm-svn: 358531
2019-04-16 21:37:43 +00:00
Sanjay Patel e08783e2f5 [EarlyCSE] detect equivalence of selects with inverse conditions and commuted operands (PR41101)
This is 1 of the problems discussed in the post-commit thread for:
rL355741 / http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190311/635516.html
and filed as:
https://bugs.llvm.org/show_bug.cgi?id=41101

Instcombine tries to canonicalize some of these cases (and there's room for improvement
there independently of this patch), but it can't always do that because of extra uses.
So we need to recognize these commuted operand patterns here in EarlyCSE. This is similar
to how we detect commuted compares and commuted min/max/abs.

Differential Revision: https://reviews.llvm.org/D60723

llvm-svn: 358523
2019-04-16 20:41:20 +00:00
Nikita Popov 52b24ee932 [CVP] Simplify umulo and smulo that cannot overflow
If a umul.with.overflow or smul.with.overflow operation cannot
overflow, simplify it to a simple mul nuw / mul nsw. After the
refactoring in D60668 this is just a matter of removing an
explicit check against multiplications.

Differential Revision: https://reviews.llvm.org/D60791

llvm-svn: 358521
2019-04-16 20:31:41 +00:00
Simon Pilgrim 82ffa88a04 [SLP] Refactoring of the operand reordering code.
This is a refactoring patch which should have all the functionality of the current code. Its goal is twofold:
i. Cleanup and simplify the reordering code, and
ii. Generalize reordering so that it will work for an arbitrary number of operands, not just 2.

This is the second patch in a series of patches that will enable operand reordering across chains of operations. An example of this was presented in EuroLLVM'18 https://www.youtube.com/watch?v=gIEn34LvyNo .

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59973

llvm-svn: 358519
2019-04-16 19:27:00 +00:00
Nikita Popov 5ecd6a48b9 [InstCombine] Prune fshl/fshr with masked operands
If a constant shift amount is used, then only some of the LHS/RHS
operand bits are demanded and we may be able to simplify based on
that. InstCombineSimplifyDemanded already had the necessary support
for that, we just weren't calling it with fshl/fshr as root.

In particular, this allows us to relax some masked funnel shifts
into simple shifts, as shown in the tests.

Patch by Shawn Landden.

Differential Revision: https://reviews.llvm.org/D60660

llvm-svn: 358515
2019-04-16 19:05:49 +00:00
Nikita Popov 79dffc67b5 [IR] Add WithOverflowInst class
This adds a WithOverflowInst class with a few helper methods to get
the underlying binop, signedness and nowrap type and makes use of it
where sensible. There will be two more uses in D60650/D60656.

The refactorings are all NFC, though I left some TODOs where things
could be improved. In particular we have two places where add/sub are
handled but mul isn't.

Differential Revision: https://reviews.llvm.org/D60668

llvm-svn: 358512
2019-04-16 18:55:16 +00:00
Hans Wennborg 21eb771dcb Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
The original commit caused false positives from AddressSanitizer's
use-after-scope checks, which have now been fixed in r358478.

> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 358483
2019-04-16 12:13:25 +00:00
Hans Wennborg 6ae05777b8 Asan use-after-scope: don't poison allocas if there were untraced lifetime intrinsics in the function (PR41481)
If there are any intrinsics that cannot be traced back to an alloca, we
might have missed the start of a variable's scope, leading to false
error reports if the variable is poisoned at function entry. Instead, if
there are some intrinsics that can't be traced, fail safe and don't
poison the variables in that function.

Differential revision: https://reviews.llvm.org/D60686

llvm-svn: 358478
2019-04-16 07:54:20 +00:00
Quentin Colombet 474a9679bd [CodeExtractor] Add a few debug lines to understand why a region is not extracted
The CodeExtractor is not smart enough to compute which basic block is
the entry of a region. Instead it relies on the order of the list
of basic blocks that is handed to it and assumes that the entry
is the first block in the list.

Without the additional debug information, it is hard to understand
why a valid region does not get extracted, because we would miss
that the order of in the list just doesn't match what the CodeExtractor
wants.

NFC

llvm-svn: 358471
2019-04-16 02:12:05 +00:00
Quentin Colombet fda0426888 [LSR] Rewrite misses some fixup locations if it splits critical edge
If LSR split critical edge during rewriting phi operands and
phi node has other pending fixup operands, we need to
update those pending fixups. Otherwise formulae will not be
implemented completely and some instructions will not be eliminated.

llvm.org/PR41445

Differential Revision: https://reviews.llvm.org/D60645

Patch by: Denis Bakhvalov <denis.bakhvalov@intel.com>

llvm-svn: 358457
2019-04-15 22:23:46 +00:00
Philip Reames e46d77d1d9 [LoopPred] Stop passing around builders [NFC]
This is a preparatory patch for D60093. This patch itself is NFC, but while preparing this I noticed and committed a small hoisting change in rL358419.

The basic structure of the new scheme is that we pass around the guard ("the using instruction"), and select an optimal insert point by examining operands at each construction point. This seems conceptually a bit cleaner to start with as it isolates the knowledge about insertion safety at the actual insertion point.

Note that the non-hoisting path is not actually used at the moment. That's not exercised until D60093 is rebased on this one.

Differential Revision: https://reviews.llvm.org/D60718

llvm-svn: 358434
2019-04-15 18:15:08 +00:00
Wolfgang Pieb 4fe42214e2 [DEBUGINFO] Prevent Instcombine from dropping debuginfo when removing zexts
Zexts can be treated like no-op casts when it comes to assessing whether their
removal affects debug info.

Reviewer: aprantl

Differential Revision: https://reviews.llvm.org/D60641

llvm-svn: 358431
2019-04-15 17:36:29 +00:00
Hiroshi Yamauchi 09e539fcae [PGO] Profile guided code size optimization.
Summary:
Enable some of the existing size optimizations for cold code under PGO.

A ~5% code size saving in big internal app under PGO.

The way it gets BFI/PSI is discussed in the RFC thread

http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html 

Note it doesn't currently touch loop passes.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59514

llvm-svn: 358422
2019-04-15 16:49:00 +00:00
Philip Reames fbe64a2cfb [LoopPred] Hoist and of predicated checks where legal
If we have multiple range checks which can be predicated, hoist the and of the results outside the loop.  This minorly cleans up the resulting IR, but the main motivation is as a building block for D60093.

llvm-svn: 358419
2019-04-15 15:53:25 +00:00
Sanjay Patel 5e13cd2e61 [InstCombine] canonicalize fdiv after fmul if reassociation is allowed
(X / Y) * Z --> (X * Z) / Y

This can allow other optimizations/reassociations as shown in the test diffs.

llvm-svn: 358404
2019-04-15 13:23:38 +00:00
Alexander Potapenko 6a63e5aa7b [Transforms][ASan] Move findAllocaForValue() to Utils/Local.cpp. NFC
Summary:
Factor out findAllocaForValue() from ASan so that we can use it in
MSan to handle lifetime intrinsics.

Reviewers: eugenis, pcc

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60615

llvm-svn: 358380
2019-04-15 08:59:56 +00:00
Fangrui Song de20429cfc [Mem2Reg] Delete unused PointerAllocaValues
It is unused after AliasSetTracker support was removed.

llvm-svn: 358352
2019-04-14 07:28:29 +00:00
Fangrui Song e57c53df4f [Mem2Reg] Simplify and micro optimize
* Rearrange continu/break
* BBNumbers.lookup(A) -> BBNumbers.find(A)->second
  BBNumbers has been computed, thus we can assume the value exists in the predicate.

llvm-svn: 358351
2019-04-14 07:20:03 +00:00
Fangrui Song f42990e687 [Mem2Reg] Don't call LBI.deleteValue on AllocInst/DbgVariableIntrinsic
Only StoreInst/LoadInst are assigned numbers. Other types of instructions are not in LBI.

llvm-svn: 358350
2019-04-14 06:27:07 +00:00
Fangrui Song 8f9bb2250b [Mem2Reg] Simplify rewriteSingleStoreAlloca
llvm-svn: 358349
2019-04-14 05:48:13 +00:00
Nikita Popov 95e5f28337 [InstCombine] Remove redundant/bogus mul_with_overflow combines
As pointed out in D60518 folding mulo(%x, undef) to {undef, undef}
isn't correct. As a correct version of this already exists in
InstructionSimplify (bd8056ef32/lib/Analysis/InstructionSimplify.cpp (L4750-L4757)) this is just
dead code though. Drop it together with the mul(%x, 0) -> {0, false}
fold that is also already handled by InstSimplify.

Differential Revision: https://reviews.llvm.org/D60649

llvm-svn: 358339
2019-04-13 19:43:35 +00:00
Fangrui Song 8540486974 [Mem2Reg] Delete unused AllocaPointerVal
It is no longer used after the AliasSetTracker updating logic was removed.

llvm-svn: 358334
2019-04-13 15:41:42 +00:00
Chen Zheng 87dd0e06dc [InstCombine] Canonicalize (-X srem Y) to -(X srem Y).
Differential Revision: https://reviews.llvm.org/D60647

llvm-svn: 358328
2019-04-13 09:21:22 +00:00
Alina Sbirlea 2312a06c87 [SCEV] Add option to forget everything in SCEV.
Summary:
Create a method to forget everything in SCEV.
Add a cl::opt and PassManagerBuilder option to use this in LoopUnroll.

Motivation: Certain Halide applications spend a very long time compiling in forgetLoop, and prefer to forget everything and rebuild SCEV from scratch.
Sample difference in compile time reduction: 21.04 to 14.78 using current ToT release build.
Testcase showcasing this cannot be opensourced and is fairly large.

The option disabled by default, but it may be desirable to enable by
default. Evidence in favor (two difference runs on different days/ToT state):

File Before (s) After (s)
clang-9.bc 7267.91 6639.14
llvm-as.bc 194.12 194.12
llvm-dis.bc 62.50 62.50
opt.bc 1855.85 1857.53

File Before (s) After (s)
clang-9.bc 8588.70 7812.83
llvm-as.bc 196.20 194.78
llvm-dis.bc 61.55 61.97
opt.bc 1739.78 1886.26

Reviewers: sanjoy

Subscribers: mehdi_amini, jlebar, zzheng, javed.absar, dmgreen, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60144

llvm-svn: 358304
2019-04-12 19:16:07 +00:00
Philip Reames b091cc081d [InstCombine] Fix a nasty miscompile introduced w/masked.gather demanded elts
This fixes a miscompile which was introduced in r356510 (https://reviews.llvm.org/D57372).

The problem is that the original patch removed pointer operands where the load results we're demanded, but without considering the legality of the load itself.  If the masked.gather had active, but undemanded, lanes, then we could end up creating a load which loaded from an undef address.  The result could be a segfault, or, in theory, an arbitrary read from a random memory location into an used register.  

llvm-svn: 358299
2019-04-12 18:26:56 +00:00
Nikita Popov 00a0d5d1de [CVP] Set NSW/NUW flags when simplifying with.overflow
When CVP determines that a with.overflow intrinsic cannot overflow,
it currently inserts a simple add/sub. As we already determined that
there can be no overflow, we should add the appropriate NUW/NSW flag.

Differential Revision: https://reviews.llvm.org/D60585

llvm-svn: 358298
2019-04-12 18:18:17 +00:00
Jeremy Morse 32afe6a1f8 [DebugInfo] Fix pr41175 Dead Store Elimination missing debug loc
Bug: https://bugs.llvm.org/show_bug.cgi?id=41175

In the bug test case the DSE pass is shortening the range of memory that a
memset is working on. A getelementptr is generated so that the new
starting address can be passed to memset. This instruction was not given
a DebugLoc.

To fix the bug, copy the DebugLoc from the memset instruction.

Patch by Orlando Cazalet-Hyams!

Differential Revision: https://reviews.llvm.org/D60556

llvm-svn: 358270
2019-04-12 09:47:35 +00:00
Fangrui Song cecc435250 Use llvm::lower_bound. NFC
This reapplies rL358161. That commit inadvertently reverted an exegesis file to an old version.

llvm-svn: 358246
2019-04-12 02:02:06 +00:00
Rong Xu 959ef16859 [PGO] Better handling of profile hash mismatch
We currently assume profile hash conflicts will be caught by an upfront
check and we assert for the cases that escape the check. The assumption
is not always true as there are chances of conflict. This patch prints
a warning and skips annotating the function for the escaped cases,.

Differential Revision: https://reviews.llvm.org/D60154

llvm-svn: 358225
2019-04-11 20:54:17 +00:00
Ali Tamur 7822b46188 Revert "Use llvm::lower_bound. NFC"
This reverts commit rL358161.

This patch have broken the test:
llvm/test/tools/llvm-exegesis/X86/uops-CMOV16rm-noreg.s

llvm-svn: 358199
2019-04-11 17:35:20 +00:00
Fangrui Song 71cce580b9 Use llvm::lower_bound. NFC
llvm-svn: 358161
2019-04-11 10:25:41 +00:00
Nikita Popov 0a8228fd28 [InstCombine] Handle ssubo always overflow
Following D60483 and D60497, this adds support for AlwaysOverflows
handling for ssubo. This is the last case we can handle right now.

Differential Revision: https://reviews.llvm.org/D60518

llvm-svn: 358100
2019-04-10 16:32:15 +00:00
Nikita Popov 7a543c3758 [InstCombine] ssubo X, C -> saddo X, -C
ssubo X, C is equivalent to saddo X, -C. Make the transformation in
InstCombine and allow the logic implemented for saddo to fold prior
usages of add nsw or sub nsw with constants.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D60061

llvm-svn: 358099
2019-04-10 16:27:36 +00:00
Nikita Popov ef23e88480 [InstCombine] Handle saddo always overflow
Followup to D60483: Handle AlwaysOverflow conditions for saddo as
well.

Differential Revision: https://reviews.llvm.org/D60497

llvm-svn: 358095
2019-04-10 16:18:01 +00:00
Florian Hahn db1a69c250 [VPLAN] Minor improvement to testing and debug messages.
1. Use computed VF for stress testing.
2. If the computed VF does not produce vector code (VF smaller than 2), force VF to be 4.
3. Test vectorization of i64 data on AArch64 to make sure we generate VF != 4 (on X86 that was already tested on AVX).

Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D59952

llvm-svn: 358056
2019-04-10 08:17:28 +00:00
Nikita Popov 09020ec2a7 [InstCombine] Handle usubo always overflow
Check AlwaysOverflow condition for usubo. The implementation is the
same as the existing handling for uaddo and umulo. Handling for saddo
and ssubo will follow (smulo doesn't have the necessary ValueTracking
support).

Differential Revision: https://reviews.llvm.org/D60483

llvm-svn: 358052
2019-04-10 07:10:53 +00:00
Nikita Popov 596cbeb705 [InstCombine] Directly call computeOverflow methods in OptimizeOverflowCheck; NFC
Instead of using the willOverflow helpers. This makes it easier to
extend handling of AlwaysOverflows.

llvm-svn: 358051
2019-04-10 07:10:44 +00:00
Chen Zheng 5e13ff1da2 [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y).
Differential Revision: https://reviews.llvm.org/D60395

llvm-svn: 358050
2019-04-10 06:52:09 +00:00
Akira Hatanaka 9ca9d32b6b [ObjC][ARC] Convert the retainRV marker that is passed as a named
metadata into a module flag in the auto-upgrader and make the ARC
contract pass read the marker as a module flag.

This is needed to fix a bug where ARC contract wasn't inserting the
retainRV marker when LTO was enabled, which caused objects returned
from a function to be auto-released.

rdar://problem/49464214

Differential Revision: https://reviews.llvm.org/D60303

llvm-svn: 358047
2019-04-10 06:20:20 +00:00
Nikita Popov 2f5e9de8d1 Revert "[InstCombine] [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y)."
This reverts commit 1383a91689.

sdiv-canonicalize.ll fails after this revision. The fold needs to be
moved outside the branch handling constant operands. However when this
is done there are further test changes, so I'm reverting this in the
meantime.

llvm-svn: 358026
2019-04-09 18:32:38 +00:00
Nikita Popov eda3b9326e [InstCombine] Restructure OptimizeOverflowCheck; NFC
Change the code to always handle the unsigned+signed cases together
with the same basic structure for add/sub/mul. The simple folds are
always handled first and then the ValueTracking overflow checks are
used.

llvm-svn: 358025
2019-04-09 18:32:28 +00:00
Chen Zheng 1383a91689 [InstCombine] [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y).
Differential Revision: https://reviews.llvm.org/D60395

llvm-svn: 358017
2019-04-09 16:34:31 +00:00
Sanjay Patel 49d9d17a77 [InstCombine] prevent possible miscompile with sdiv+negate of vector op
Similar to:
rL358005

Forego folding arbitrary vector constants to fix a possible miscompile bug.
We can enhance the transform if we do want to handle the more complicated
vector case.

llvm-svn: 358013
2019-04-09 15:13:03 +00:00
Sanjay Patel f62dcea7ed [InstCombine] prevent possible miscompile with negate+sdiv of vector op
// 0 - (X sdiv C)  -> (X sdiv -C)  provided the negation doesn't overflow.

This fold has been around for many years and nobody noticed the potential
vector miscompile from overflow until recently...
So it seems unlikely that there's much demand for a vector sdiv optimization
on arbitrary vector constants, so just limit the matching to splat constants
to avoid the possible bug.

Differential Revision: https://reviews.llvm.org/D60426

llvm-svn: 358005
2019-04-09 14:09:06 +00:00
Peter Collingbourne df57979ba7 hwasan: Enable -hwasan-allow-ifunc by default.
It's been on in Android for a while without causing problems, so it's time
to make it the default and remove the flag.

Differential Revision: https://reviews.llvm.org/D60355

llvm-svn: 357960
2019-04-09 00:25:59 +00:00
Sanjay Patel 773e04c883 [InstCombine] peek through fdiv to find a squared sqrt
A more general canonicalization between fdiv and fmul would not
handle this case because that would have to be limited by uses
to prevent 2 values from becoming 3 values:
(x/y) * (x/y) --> (x*x) / (y*y)

(But we probably should still have that limited -- but more general --
canonicalization independently of this change.)

llvm-svn: 357943
2019-04-08 21:23:50 +00:00
Brian M. Rzycki 887865c1ad [JumpThreading] Fix incorrect fold conditional after indirectbr/callbr
Fixes bug 40992: https://bugs.llvm.org/show_bug.cgi?id=40992

There is potential for miscompiled code emitted from JumpThreading when
analyzing a block with one or more indirectbr or callbr predecessors. The
ProcessThreadableEdges() function incorrectly folds conditional branches
into an unconditional branch.

This patch prevents incorrect branch folding without fully pessimizing
other potential threading opportunities through the same basic block.

This IR shape was manually fed in via opt and is unclear if clang and the
full pass pipeline will ever emit similar code shapes.

Thanks to Matthias Liedtke for the bug report and simplified IR example.

Differential Revision: https://reviews.llvm.org/D60284

llvm-svn: 357930
2019-04-08 18:20:35 +00:00
Sanjay Patel b33938df7a [InstCombine] remove overzealous assert for shuffles (PR41419)
As the TODO indicates, instsimplify could be improved.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=41419

llvm-svn: 357910
2019-04-08 13:28:29 +00:00
Simon Pilgrim b4f1bfa659 [InstCombine][X86] Expand MOVMSK to generic IR (PR39927)
First step towards removing the MOVMSK intrinsics completely - this patch expands MOVMSK to the pattern:

e.g. PMOVMSKB(v16i8 x):
%cmp = icmp slt <16 x i8> %x, zeroinitializer
%int = bitcast <16 x i8> %cmp to i16
%res = zext i16 %int to i32

Which is correctly handled by ISel and FastIsel (give or take an annoying movzx move....): https://godbolt.org/z/rkrSFW

Differential Revision: https://reviews.llvm.org/D60256

llvm-svn: 357909
2019-04-08 13:17:51 +00:00
Chen Zheng 923c7c9daa [InstCombine] sdiv exact flag fixup.
Differential Revision: https://reviews.llvm.org/D60396

llvm-svn: 357904
2019-04-08 12:08:03 +00:00
Fangrui Song 2c5c12c041 Change some dyn_cast to more apropriate isa. NFC
llvm-svn: 357773
2019-04-05 16:16:23 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Luqman Aden 8911c5be46 [InstCombine] Combine no-wrap sub and icmp w/ constant.
Teach InstCombine the transformation `(icmp P (sub nuw|nsw C2, Y), C) -> (icmp swap(P) Y, C2-C)`

Reviewers: majnemer, apilipenko, sanjoy, spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: dmgreen, lebedev.ri, nikic, hiraditya, JDevlieghere, jfb, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59916

llvm-svn: 357674
2019-04-04 07:08:30 +00:00
David L. Jones 8b8a02175a Revert r357452 - 'SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)'
This revision causes tests to fail under ASAN. Since the cause of the failures
is not clear (could be ASAN, could be a Clang bug, could be a bug in this
revision), the safest course of action seems to be to revert while investigating.

llvm-svn: 357667
2019-04-04 02:27:57 +00:00
Evandro Menezes 7c711ccf36 [IR] Create new method in `Function` class (NFC)
Create method `optForNone()` testing for the function level equivalent of
`-O0` and refactor appropriately.

Differential revision: https://reviews.llvm.org/D59852

llvm-svn: 357638
2019-04-03 21:27:03 +00:00
David Bolvansky 937720e75b [InstCombine] Simplify ctpop with bitreverse/bswap
Summary: Fixes PR41337

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60148

llvm-svn: 357564
2019-04-03 08:08:44 +00:00
Alon Zakai b4f9991f38 [WebAssembly] Add Emscripten OS definition + small_printf
The Emscripten OS provides a definition of __EMSCRIPTEN__, and also that it
supports iprintf optimizations.

Also define small_printf optimizations, which is a printf with float support
but not long double (which in wasm can be useful since long doubles are 128
bit and force linking of float128 emulation code). This part is based on
sunfish's https://reviews.llvm.org/D57620 (which can't land yet since
the WASI integration isn't ready yet).

Differential Revision: https://reviews.llvm.org/D60167

llvm-svn: 357552
2019-04-03 01:08:35 +00:00
David Bolvansky 5ba60b22a4 [InstCombine] Simplify ctlz/cttz with bitreverse
Summary: Fixes PR41273

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60096

llvm-svn: 357521
2019-04-02 20:13:28 +00:00
Vedant Kumar 9da8a68d6b [ArgPromotion] Set debug location at updated callsites
Set the correct debug location on instructions which load arguments in
preparation for a call to an arg-promoted function.

This prevents location cascade from misattributing the line/scope of one
of these loads to the location of the instruction preceding the call.

Differential Revision: https://reviews.llvm.org/D60113

llvm-svn: 357500
2019-04-02 17:42:17 +00:00
Vedant Kumar c6bceec01a [DebugInfo] Fix pr41180 : Loop Vectorization Debugify Failure
Bug: https://bugs.llvm.org/show_bug.cgi?id=41180

In the bug test case the debug location was missing for the cmp instruction in
the "middle block" BB. This patch fixes the bug by copying the debug location
from the cmp of the scalar loop's terminator branch, if it exists.

The patch also fixes the debug location on the subsequent branch instruction.
It was previously using the location of the of the original loop's pre-header
block terminator. Both of these instructions will now map to the source line of
the conditional branch in the original loop.

A regression test has been added that covers these issues.

Patch by Orlando Cazalet-Hyams!

Differential Revision: https://reviews.llvm.org/D59944

llvm-svn: 357499
2019-04-02 17:28:34 +00:00
Simon Pilgrim a3b71018d9 [SLP] reorderInputsAccordingToOpcode is const method. NFCI.
llvm-svn: 357490
2019-04-02 16:27:11 +00:00
Joseph Tremoulet fb4d9f7287 [SimplifyCFG] Don't split musttail call from ret
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.

Reviewers: fedor.sergeev, majnemer

Reviewed By: majnemer

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60080

llvm-svn: 357485
2019-04-02 15:48:58 +00:00
Taewook Oh 6a27c48be2 [SampleProfile] Repeat indirect call promotion only when the target is actually hot.
Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets.

Reviewers: danielcdh, wmi

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59940

llvm-svn: 357484
2019-04-02 15:48:21 +00:00
Joseph Tremoulet b69afa8e9b [PruneEH] Don't split musttail call from ret
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.

Reviewers: fedor.sergeev, majnemer

Reviewed By: majnemer

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60079

llvm-svn: 357483
2019-04-02 15:47:11 +00:00
Brian Gesiak 1c44ed8b76 [Transforms] Redundant getValueOperand (NFC)
`StoreInst::getValueOperand` is identical to `getOperand(0)`, so the call to
`getOperand(0)` can be replaced. Further, `SI->getValueOperand` is redundantly
called just a few lines down, despite its return value being stored in variable
`DV`. No functional change.

llvm-svn: 357479
2019-04-02 14:57:56 +00:00
Fangrui Song f421978858 [Internalize] Replace uses of std::set with DenseSet
This makes it faster and saves 104 bytes for my build.

llvm-svn: 357458
2019-04-02 09:25:31 +00:00
Fangrui Song 32029135e0 [Internalize] Replace fstream with line_iterator for -internalize-public-api-file
This makes my libLLVMipo.so.9svn smaller by 360 bytes.

llvm-svn: 357457
2019-04-02 09:11:18 +00:00
Hans Wennborg b669fea42f SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
The code was previously checking that candidates for sinking had exactly
one use or were a store instruction (which can't have uses). This meant
we could sink call instructions only if they had a use.

That limitation seemed a bit arbitrary, so this patch changes it to
"instruction has zero or one use" which seems more natural and removes
the need to special-case stores.

Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 357452
2019-04-02 08:01:38 +00:00
Philip Reames adb3ece216 [LoopPredication] Simplify widenable condition handling [NFC]
The code doesn't actually need any of the information about the widenable condition at this level.  The only thing we need is to ensure the WC call is the last thing anded in, and even that is a quirk we should really look to remove.

llvm-svn: 357448
2019-04-02 02:42:57 +00:00
Philip Reames f608678f1f [LoopPred] Rename a variable to simply a future patch [NFC]
llvm-svn: 357433
2019-04-01 22:39:54 +00:00
Nick Lewycky 1e1e212d27 [NFC] Remove dead parameter "FreeInLoop", fix some typos and trailing whitespace.
Differential Revision: https://reviews.llvm.org/D60084

llvm-svn: 357427
2019-04-01 20:37:56 +00:00
Simon Pilgrim b06935fa8c [SLP] getVectorElementSize and isTreeTinyAndNotFullyVectorizable are const methods. NFCI.
llvm-svn: 357416
2019-04-01 17:48:03 +00:00
Simon Pilgrim f6c04ad486 [SLP] getGatherCost and isFullyVectorizableTinyTree are const methods. NFCI.
llvm-svn: 357414
2019-04-01 17:32:46 +00:00
Philip Reames 05e3e554b4 [LoopPred] Be uniform about proving generated conditions
We'd been optimizing the case where the predicate was obviously true, do the same for the false case.  Mostly just for completeness sake, but also may improve compile time in loops which will exit through the guard.  Such loops are presumed rare in fastpath code, but may be present down untaken paths, so optimizing for them is still useful.

llvm-svn: 357408
2019-04-01 16:26:08 +00:00
Philip Reames d109e2a7c3 [LoopPred] Delete the old condition expressions if unused
LoopPredication was replacing the original condition, but leaving the instructions to compute the old conditions around.  This would get cleaned up by other passes of course, but we might as well do it eagerly.  That also makes the test output less confusing.  

llvm-svn: 357406
2019-04-01 16:05:15 +00:00
Mikael Holmen 150a7ec2dc [InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder
Summary:
This fixes PR41270.

The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.

Reviewers: reames, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60058

llvm-svn: 357389
2019-04-01 14:10:10 +00:00
Mikael Holmen 3e527cd823 Revert "[InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder"
This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5.

I'll recommit with a better commit message with reference to the
phabricator review.

llvm-svn: 357387
2019-04-01 14:06:45 +00:00
Mikael Holmen d66a47f90a [InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder
This fixes PR41270.

The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.

llvm-svn: 357385
2019-04-01 13:48:56 +00:00
Sanjay Patel 97d1bc4454 [InstCombine] eliminate commuted select-shuffles + binop (PR41304)
If we have a commutable vector binop with inverted select-shuffles,
we don't care about the order of the operands in each vector lane:

LHS = shuffle V1, V2, <0, 5, 6, 3>
RHS = shuffle V2, V1, <0, 5, 6, 3>
LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2

PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...is currently titled as an SLP enhancement, but at least for the
given example, we can reduce that in instcombine because we are just
eliminating shuffles.

As noted in the TODO, this could be generalized, but I haven't thought
through those patterns completely, so this is limited to what appears
to be always safe.

Differential Revision: https://reviews.llvm.org/D60048

llvm-svn: 357382
2019-04-01 13:36:40 +00:00
Sanjay Patel b276dd195a [InstCombine] canonicalize select shuffles by commuting
In PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...we have a case where we want to fold a binop of select-shuffle (blended) values.

Rather than try to match commuted variants of the pattern, we can canonicalize the
shuffles and check for mask equality with commuted operands.

We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a
special case that the backend is required to handle because we already canonicalize
vector select to this shuffle form.

So there should be no codegen difference from this change. It's possible that this
improves CSE in IR though.

Differential Revision: https://reviews.llvm.org/D60016

llvm-svn: 357366
2019-03-31 15:01:30 +00:00
Philip Reames b55637b5d7 [LoopPredication] Remove stale TODO
llvm-svn: 357331
2019-03-29 23:10:01 +00:00
Philip Reames 3d4e108237 [LoopPredication] Use the builder's insertion point everywhere [NFC]
llvm-svn: 357330
2019-03-29 23:06:57 +00:00
Sanjay Patel 3f4d1b4abd [InstCombine] move shuffle canonicalizations before other transforms
This may not be NFC, but I'm not sure how to expose any diffs in
tests. In theory, it should be slightly more efficient and possibly
more profitable to do the canonicalizations (which can increase the
undef elements in the mask) ahead of SimplifyDemandedVectorElts().

llvm-svn: 357272
2019-03-29 16:49:38 +00:00
Simon Pilgrim 6a75c36ea9 [SLP] Add support for commutative icmp/fcmp predicates
For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands.

This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?).

Differential Revision: https://reviews.llvm.org/D59992

llvm-svn: 357266
2019-03-29 15:28:25 +00:00
Florian Hahn 9b41a7320d Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."
Updated to use DenseMap::insert instead of [] operator for insertion, to
avoid a crash caused by epoch checks.

This reverts commit 2b85de4383.

llvm-svn: 357257
2019-03-29 14:10:24 +00:00
Simon Pilgrim 62f0d1650a [SLP] Add support for swapping icmp/fcmp predicates to permit vectorization
We should be able to match elements with the swapped predicate as well - as long as we commute the source operands.

Differential Revision: https://reviews.llvm.org/D59956

llvm-svn: 357243
2019-03-29 10:41:00 +00:00
Florian Hahn 2b85de4383 Revert Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."
Another buildbot failure

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/20402

clang-9: /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/llvm/include/llvm/ADT/DenseMap.h:1228: llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::value_type* llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::operator->() const [with KeyT = const llvm::Instruction*; ValueT = unsigned int; KeyInfoT = llvm::DenseMapInfo<const llvm::Instruction*>; Bucket = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>; bool IsConst = false; llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::pointer = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>*; llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::value_type = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>]: Assertion `isHandleInSync() && "invalid iterator access!"' failed.

0.	Program arguments: /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/bin/clang-9 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -main-file-name ArchiveCommandLine.cpp -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu skylake-avx512 -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -coverage-notes-file /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip/Output/ArchiveCommandLine.llvm.gcno -resource-dir /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/lib/clang/9.0.0 -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/include -I ../../../include -D _GNU_SOURCE -D __STDC_LIMIT_MACROS -D NDEBUG -D BREAK_HANDLER -D UNICODE -D _UNICODE -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/C -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/myWindows -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/include_windows -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP -I . -D _FILE_OFFSET_BITS=64 -D _LARGEFILE_SOURCE -D NDEBUG -D _REENTRANT -D ENV_UNIX -D _7ZIP_LARGE_PAGES -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/backward -internal-isystem /usr/local/include -internal-isystem /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -std=gnu++98 -fdeprecated-macro -fdebug-compilation-dir /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip -ferror-limit 19 -fmessage-length 0 -pthread -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -vectorize-loops -vectorize-slp -o Output/ArchiveCommandLine.llvm.o -x c++ /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/7zip/UI/Common/ArchiveCommandLine.cpp -faddrsig

This reverts r357222 (git commit 64cccfcc72)

llvm-svn: 357227
2019-03-29 00:22:26 +00:00
Florian Hahn 64cccfcc72 Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."
Recommitting after addressing a buildbot failure.

This reverts commit c87869ebea.

llvm-svn: 357222
2019-03-28 23:11:00 +00:00
Florian Hahn 45682fd633 [LSR] Fix signed overflow in GenerateCrossUseConstantOffsets.
For the attached test case, unchecked addition of immediate starts and
ends overflows, as they can be arbitrary i64 constants.

Proof: https://rise4fun.com/Alive/Plqc

Reviewers: qcolombet, gilr, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59218

llvm-svn: 357217
2019-03-28 22:17:29 +00:00
Florian Hahn c87869ebea Revert [DSE] Preserve basic block ordering using OrderedBasicBlock.
This reverts r357208 (git commit c0bfd37d38)

This causes a buildbot failure:  http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/16124

FAILED: lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o
/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/install/stage2/bin/clang++   -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/IR -I/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/IR -Iinclude -I/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/include -fPIC -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -O3    -UNDEBUG  -fno-exceptions -fno-rtti -MD -MT lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o -MF lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o.d -o lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o -c /home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/IR/IRBuilder.cpp
clang-9: /home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/Analysis/OrderedBasicBlock.cpp:38: bool llvm::OrderedBasicBlock::comesBefore(const llvm::Instruction *, const llvm::Instruction *): Assertion `!(LastInstFound == BB->end() && NextInstPos != 0) && "Instruction supposed to be in NumberedInsts"' failed.

llvm-svn: 357211
2019-03-28 20:36:24 +00:00
Florian Hahn c0bfd37d38 [DSE] Preserve basic block ordering using OrderedBasicBlock.
By extending OrderedBB to allow removing and replacing cached
instructions, we can preserve OrderedBBs in DSE easily. This eliminates
one source of quadratic compile time in DSE.

Fixes PR38829.

Reviewers: rnk, efriedma, hfinkel

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59789

llvm-svn: 357208
2019-03-28 20:02:33 +00:00
Benjamin Kramer ba2ea93ad1 Make helper functions static. NFC.
llvm-svn: 357187
2019-03-28 17:18:42 +00:00
Pierre Gousseau a833c2bd3e [asan] Add options -asan-detect-invalid-pointer-cmp and -asan-detect-invalid-pointer-sub options.
This is in preparation to a driver patch to add gcc 8's -fsanitize=pointer-compare and -fsanitize=pointer-subtract.
Disabled by default as this is still an experimental feature.

Reviewed By: morehouse, vitalybuka

Differential Revision: https://reviews.llvm.org/D59220

llvm-svn: 357157
2019-03-28 10:51:24 +00:00
Florian Hahn e21ed594d8 [VPlan] Determine Vector Width programmatically.
With this change, the VPlan native path is triggered with the directive:

   #pragma clang loop vectorize(enable)

There is no need to specify the vectorize_width(N) clause.

Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D57598

llvm-svn: 357156
2019-03-28 10:37:12 +00:00
Nikita Popov 7462303e06 [InstCombine] Use uadd.sat and usub.sat for canonicalization
Start using the uadd.sat and usub.sat intrinsics for the existing
canonicalizations. These intrinsics should optimize better than
expanded IR, have better handling in the X86 backend and should
be no worse than expanded IR in other backends, as far as we know.

rL357012 already introduced use of uadd.sat for the add+umin pattern.

Differential Revision: https://reviews.llvm.org/D58872

llvm-svn: 357103
2019-03-27 17:56:15 +00:00
Sanjay Patel 81e8d76f5b [InstCombine] form uaddsat from add+umin (PR14613)
This is the last step towards solving the examples shown in:
https://bugs.llvm.org/show_bug.cgi?id=14613

With this change, x86 should end up with psubus instructions
when those are available.

All known codegen issues with expanding the saturating intrinsics
were resolved with:
D59006 / rL356855

We also have some early evidence in D58872 that using the intrinsics
will lead to better perf. If some target regresses from this, custom
lowering of the intrinsics (as in the above for x86) may be needed.

llvm-svn: 357012
2019-03-26 17:50:08 +00:00
Simon Pilgrim 6f96795b88 [SLPVectorizer] Merge reorderAltShuffleOperands into reorderInputsAccordingToOpcode
As discussed on D59738, this generalizes reorderInputsAccordingToOpcode to handle multiple + non-commutative instructions so we can get rid of reorderAltShuffleOperands and make use of the extra canonicalizations that reorderInputsAccordingToOpcode brings.

Differential Revision: https://reviews.llvm.org/D59784

llvm-svn: 356939
2019-03-25 20:05:27 +00:00
Simon Pilgrim ff3abef395 [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.

This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops.

Differential Revision: https://reviews.llvm.org/D59738

llvm-svn: 356913
2019-03-25 15:53:55 +00:00
Simon Pilgrim 5cd4eb96f6 [SLPVectorizer] shouldReorderOperands - just check for reordering. NFCI.
Remove the I.getOperand() calls from inside shouldReorderOperands - reorderInputsAccordingToOpcode should handle the creation of the operand lists and shouldReorderOperands should just check to see whether the i'th element should be commuted.

llvm-svn: 356854
2019-03-24 13:36:32 +00:00
Nikita Popov 977934f00f [ConstantRange] Add getFull() + getEmpty() named constructors; NFC
This adds ConstantRange::getFull(BitWidth) and
ConstantRange::getEmpty(BitWidth) named constructors as more readable
alternatives to the current ConstantRange(BitWidth, /* full */ false)
and similar. Additionally private getFull() and getEmpty() member
functions are added which return a full/empty range with the same bit
width -- these are commonly needed inside ConstantRange.cpp.

The IsFullSet argument in the ConstantRange(BitWidth, IsFullSet)
constructor is now mandatory for the few usages that still make use of it.

Differential Revision: https://reviews.llvm.org/D59716

llvm-svn: 356852
2019-03-24 09:34:40 +00:00
Simon Pilgrim 1466e5c383 Fix unused variable warning on non-asserts builds. NFCI.
llvm-svn: 356841
2019-03-23 16:56:23 +00:00
Simon Pilgrim 64feec7977 Remove unused function argument. NFCI.
llvm-svn: 356840
2019-03-23 16:20:34 +00:00
Simon Pilgrim c7ba9555cf [SLPVectorizer] reorderInputsAccordingToOpcode - use InstructionState directly. NFCI.
llvm-svn: 356832
2019-03-23 13:44:06 +00:00
Nikita Popov 0125e4484e [LowerSwitch] Use ConstantRange::fromKnownBits(); NFC
Using an unsigned range to stay NFC, but a signed range would really
be more useful here.

llvm-svn: 356831
2019-03-23 12:48:54 +00:00
Simon Pilgrim f4f01f3cff [SLPVectorizer] Don't repeat VL.size() call. NFCI.
llvm-svn: 356830
2019-03-23 12:11:25 +00:00
Simon Pilgrim b68322f9d0 [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059

llvm-svn: 356814
2019-03-22 21:27:11 +00:00
Daniel Sanders ef8761fd3b Fix non-determinism in Reassociate caused by address coincidences
Summary:
Between building the pair map and querying it there are a few places that
erase and create Values. It's rare but the address of these newly created
Values is occasionally the same as a just-erased Value that we already
have in the pair map. These coincidences should be accounted for to avoid
non-determinism.

Thanks to Roman Tereshin for the test case.

Reviewers: rtereshin, bogner

Reviewed By: rtereshin

Subscribers: mgrang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59401

llvm-svn: 356803
2019-03-22 20:16:35 +00:00
Tim Renouf 94c163c34e InstCombineSimplifyDemanded: Allow v3 results for AMDGCN buffer and image intrinsics
This helps to avoid the situation where RA spots that only 3 of the
v4f32 result of a load are used, and immediately reallocates the 4th
register for something else, requiring a stall waiting for the load.

Differential Revision: https://reviews.llvm.org/D58906

Change-Id: I947661edfd5715f62361a02b100f14aeeada29aa
llvm-svn: 356768
2019-03-22 15:53:50 +00:00
Akira Hatanaka b576c77a9e Don't add a tail keyword to calls to ObjC runtime functions if the calls
are annotated with notail.

r356705 annotated calls to objc_retainAutoreleasedReturnValue with
notail on x86-64. This commit teaches ARC optimizer to check the notail
marker on the call before turning it into a tail call.

rdar://problem/38675807

llvm-svn: 356707
2019-03-21 20:16:09 +00:00
Craig Topper 16dc165046 [InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use.
If they have other users we'll just end up increasing the instruction count.

We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed.

Fixes PR41164.

Differential Revision: https://reviews.llvm.org/D59630

llvm-svn: 356690
2019-03-21 17:50:49 +00:00
Simon Pilgrim 92cbcfc325 Fix -Wmisleading-indentation gcc7 warning. NFCI.
llvm-svn: 356658
2019-03-21 11:58:22 +00:00
Mikael Holmen 5b1754f93d Silence warning about unused variable in builds without asserts [NFC]
llvm-svn: 356648
2019-03-21 07:54:44 +00:00
Philip Reames 60212be619 [instcombine] Add some todos, and arrange code for readibility
llvm-svn: 356642
2019-03-21 03:23:40 +00:00
Alina Sbirlea f69f807321 [NFC] Fix brace indentation.
llvm-svn: 356596
2019-03-20 19:18:55 +00:00
Robert Lougher f2158a8ef0 Resubmit r356511 "[TailCallElim] Add tailcall elimination pass to LTO pipelines"
Failing LLD tests have been fixed in r356593.

llvm-svn: 356594
2019-03-20 19:08:18 +00:00
Philip Reames e4588bbf80 Simplify operands of masked stores and scatters based on demanded elements
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.

Differential Revision: https://reviews.llvm.org/D57247

llvm-svn: 356590
2019-03-20 18:44:58 +00:00
Alina Sbirlea 5baa72ea74 [LICM & MemorySSA] Don't sink/hoist stores in the presence of ordered loads.
Summary:
Before this patch, if any Use existed in the loop, with a defining
access in the loop, we conservatively decide to not move the store.
What this approach was missing, is that ordered loads are not Uses, they're Defs
in MemorySSA. So, even when the clobbering walker does not find that
volatile load to interfere, we still cannot hoist a store past a
volatile load.
Resolves PR41140.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59564

llvm-svn: 356588
2019-03-20 18:33:37 +00:00
Nikita Popov 37cf25c3c6 [InstCombine] Fold add nuw + uadd.with.overflow
Fold add nuw and uadd.with.overflow with constants if the
addition does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D59471

llvm-svn: 356584
2019-03-20 18:00:27 +00:00
Philip Reames 484d07c828 [instcombine] Add todos describing missing transforms for masked.* intrinsics
llvm-svn: 356536
2019-03-20 03:36:05 +00:00
Robert Lougher c67a759c99 Revert r356511 "[TailCallElim] Add tailcall elimination pass to LTO pipelines"
Due to buildbot failures (LLD tests).

llvm-svn: 356516
2019-03-19 20:54:20 +00:00
Robert Lougher de548ccab9 [TailCallElim] Add tailcall elimination pass to LTO pipelines
LTO provides additional opportunities for tailcall elimination due to
link-time inlining and visibility of nocapture attribute. Testing showed
negligible impact on compilation times.

Differential Revision: https://reviews.llvm.org/D58391

llvm-svn: 356511
2019-03-19 20:24:28 +00:00
Philip Reames 70537abe52 Demanded elements support for masked.load and masked.gather
Teach instcombine to propagate demanded elements through a masked load or masked gather instruction. This is in the broader context of improving vector pointer instcombine under https://reviews.llvm.org/D57140.

Differential Revision: https://reviews.llvm.org/D57372

llvm-svn: 356510
2019-03-19 20:10:00 +00:00
Sanjay Patel 5b820323ca [InstCombine] fold logic-of-nan-fcmps (PR41069)
Combine 2 fcmps that are checking for nan-ness:
   and (fcmp ord X, 0), (and (fcmp ord Y, 0), Z) --> and (fcmp ord X, Y), Z
   or  (fcmp uno X, 0), (or  (fcmp uno Y, 0), Z) --> or  (fcmp uno X, Y), Z

This is an exact match for a minimal reassociation pattern.
If we want to handle this more generally that should go in
the reassociate pass and allow removing this code.

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=41069

llvm-svn: 356471
2019-03-19 16:39:17 +00:00
Markus Lavin b86ce219f4 [DebugInfo] Introduce DW_OP_LLVM_convert
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.

The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.

For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.

This is a recommit of r356442 with trivial fixes for the failing tests.

Differential Revision: https://reviews.llvm.org/D56587

llvm-svn: 356451
2019-03-19 13:16:28 +00:00
Markus Lavin ad78768d59 Revert "[DebugInfo] Introduce DW_OP_LLVM_convert"
This reverts commit 1cf4b593a7ebd666fc6775f3bd38196e8e65fafe.

Build bots found failing tests not detected locally.

Failing Tests (3):
  LLVM :: DebugInfo/Generic/convert-debugloc.ll
  LLVM :: DebugInfo/Generic/convert-inlined.ll
  LLVM :: DebugInfo/Generic/convert-linked.ll

llvm-svn: 356444
2019-03-19 09:17:28 +00:00
Markus Lavin cd8a940b37 [DebugInfo] Introduce DW_OP_LLVM_convert
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.

The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.

For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.

Differential Revision: https://reviews.llvm.org/D56587

llvm-svn: 356442
2019-03-19 08:48:19 +00:00
Sanjay Patel 6063393536 [InstCombine] allow general vector constants for funnel shift to shift transforms
Follow-up to:
rL356338
rL356369

We can calculate an arbitrary vector constant minus the bitwidth, so there's
no need to limit this transform to scalars and splats.

llvm-svn: 356372
2019-03-18 14:27:51 +00:00
Sanjay Patel 84de8a30a0 [InstCombine] extend rotate-left-by-constant canonicalization to funnel shift
Follow-up to:
rL356338

Rotates are a special case of funnel shift where the 2 input operands
are the same value, but that does not need to be a restriction for the
canonicalization when the shift amount is a constant.

llvm-svn: 356369
2019-03-18 14:10:11 +00:00
Sanjay Patel b3bcd95771 [InstCombine] canonicalize rotate right by constant to rotate left
This was noted as a backend problem:
https://bugs.llvm.org/show_bug.cgi?id=41057
...and subsequently fixed for x86:
rL356121
But we should canonicalize these in IR for the benefit of all targets
and improve IR analysis such as CSE.

llvm-svn: 356338
2019-03-17 19:08:00 +00:00
Philip Reames 68a2e4d48b [SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs)
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.

The result is that we replace a vector gep with at least one undef in each lane with a undef.  We can also do the same for vector intrinsics.  Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.

Differential Revision: https://reviews.llvm.org/D57468

llvm-svn: 356293
2019-03-15 19:54:06 +00:00
Robert Widmann 2f1ebe6ee8 [LLVM-C] Expose the "Add Discriminators" Pass To LLVM-C
Summary: Add bindings to create a wrapped "Add Discriminators" pass.  Now that we have debug info support, this is a handy transform to have.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: dblaikie, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58624

llvm-svn: 356272
2019-03-15 16:57:23 +00:00
Teresa Johnson 70ec64cb72 [ThinLTO] Restructure AliasSummary to contain ValueInfo of Aliasee
Summary:
The AliasSummary previously contained the AliaseeGUID, which was only
populated when reading the summary from bitcode. This patch changes it
to instead hold the ValueInfo of the aliasee, and always populates it.
This enables more efficient access to the ValueInfo (specifically in the
recent patch r352438 which needed to perform an index hash table lookup
using the aliasee GUID).

As noted in the comments in AliasSummary, we no longer technically need
to keep a pointer to the corresponding aliasee summary, since it could
be obtained by walking the list of summaries on the ValueInfo looking
for the summary in the same module. However, I am concerned that this
would be inefficient when walking through the index during the thin
link for various analyses. That can be reevaluated in the future.

By always populating this new field, we can remove the guard and special
handling for a 0 aliasee GUID when dumping the dot graph of the summary.

An additional improvement in this patch is when reading the summaries
from LLVM assembly we now set the AliaseeSummary field to the aliasee
summary in that same module, which makes it consistent with the behavior
when reading the summary from bitcode.

Reviewers: pcc, mehdi_amini

Subscribers: inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D57470

llvm-svn: 356268
2019-03-15 15:11:38 +00:00
Florian Hahn d9e88f7b7f [LSR] Check for signed overflow in NarrowSearchSpaceByDetectingSupersets.
We are adding a sign extended IR value to an int64_t, which can cause
signed overflows, as in the attached test case, where we have a formula
with BaseOffset = -1 and a constant with numeric_limits<int64_t>::min().

If the addition would overflow, skip the simplification for this
formula. Note that the target triple is required to trigger the failure.

Reviewers: qcolombet, gilr, kparzysz, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59211

llvm-svn: 356256
2019-03-15 12:17:36 +00:00
Matt Arsenault 1d83670dbd AMDGPU: Remove intrinsic operand assert
Before r355981, this was under LLVM_DEBUG. I don't think the assert is
quite right, but this really should be a verifier check. Instcombine
should not be asserting on this sort of thing.

llvm-svn: 356219
2019-03-14 23:45:09 +00:00
Sanjay Patel de1d5d3675 [InstCombine] canonicalize funnel shift constant shift amount to be modulo bitwidth
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.

We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.

Differential Revision: https://reviews.llvm.org/D59374

llvm-svn: 356192
2019-03-14 19:22:08 +00:00
Sam Parker a86ff8640d Fix for buildbots
Remove unused private field.

llvm-svn: 356135
2019-03-14 11:38:55 +00:00
Sam Parker eb0b8019e8 [NFC][LSR] Cleanup Cost API
Create members for Loop, ScalarEvolution, DominatorTree,
TargetTransformInfo and Formula.

Differential Revision: https://reviews.llvm.org/D58389

llvm-svn: 356131
2019-03-14 11:05:07 +00:00
Matt Arsenault caf1316f71 IR: Add immarg attribute
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.

Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.

This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.

llvm-svn: 355981
2019-03-12 21:02:54 +00:00
Philip Reames 9b6b4fac83 [SROA] Fix a crash when trying to convert a memset to an non-integral pointer type
The included test case currently crashes on tip of tree. Rather than adding a bailout, I chose to restructure the code so that the existing helper function could be used. Given that, the majority of the diff is NFC-ish, but the key difference is that canConvertValue returns false when only one side is a non-integral pointer.

Thanks to Cherry Zhang for the test case.

Differential Revision: https://reviews.llvm.org/D59000

llvm-svn: 355962
2019-03-12 20:15:05 +00:00
Craig Topper 03e93f514a [SanitizerCoverage] Avoid splitting critical edges when destination is a basic block containing unreachable
This patch adds a new option to SplitAllCriticalEdges and uses it to avoid splitting critical edges when the destination basic block ends with unreachable. Otherwise if we split the critical edge, sanitizer coverage will instrument the new block that gets inserted for the split. But since this block itself shouldn't be reachable this is pointless. These basic blocks will stick around and generate assembly, but they don't end in sane control flow and might get placed at the end of the function. This makes it look like one function has code that flows into the next function.

This showed up while compiling the linux kernel with clang. The kernel has a tool called objtool that detected the code that appeared to flow from one function to the next. https://github.com/ClangBuiltLinux/linux/issues/351#issuecomment-461698884

Differential Revision: https://reviews.llvm.org/D57982

llvm-svn: 355947
2019-03-12 18:20:25 +00:00
Liang Zou 4a8afeb970 [format] \t => ' '
Summary:
1. \t => '  '
2. test commit access

Reviewers: Higuoxing, liangdzou

Reviewed By: Higuoxing, liangdzou

Subscribers: kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59243

llvm-svn: 355924
2019-03-12 14:48:32 +00:00
Fangrui Song b1dfbebe8b [SimplifyLibCalls] Simplify optimizePuts
The code might intend to replace puts("") with putchar('\n') even if the
return value is used. It failed because use_empty() was used to guard
the whole block. While returning '\n' (putchar('\n')) is technically
correct (puts is only required to return a nonnegative number on
success), doing this looks weird and there is really little benefit to
optimize puts whose return value is used. So don't do that.

llvm-svn: 355921
2019-03-12 14:20:22 +00:00
Simon Pilgrim d3a8fd8bfb Revert rL355906: [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059
........

Reverted due to buildbot failures that I don't have time to track down.

llvm-svn: 355913
2019-03-12 11:51:59 +00:00
Simon Pilgrim 5db95efdbd Try to fix SLPVectorizer BoUpSLP::BoEdgeInfo::dump visibility on non-debug builds
llvm-svn: 355912
2019-03-12 11:31:06 +00:00
Simon Pilgrim 2086a8894d [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059

llvm-svn: 355906
2019-03-12 10:51:51 +00:00
Fangrui Song f260967055 [SimplifyLibCalls] Fix comments about fputs, memchr, and s[n]printf. NFC
llvm-svn: 355905
2019-03-12 10:31:52 +00:00
Kristina Brooks 5b1e1c0537 Very minor typo. NFC
Typo `we we're` => `we were` in the pass EarlyCSE

Patch by liangdzou (Liang ZOU)

Differential Revision: https://reviews.llvm.org/D59241

llvm-svn: 355895
2019-03-12 07:08:19 +00:00
Sanjoy Das 3f5ce18658 Reland "Relax constraints for reduction vectorization"
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.

Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355889
2019-03-12 01:31:44 +00:00
Sanjoy Das 2136a5bc49 Revert "Relax constraints for reduction vectorization"
This reverts commit r355868.  Breaks hexagon.

llvm-svn: 355873
2019-03-11 22:37:31 +00:00
Sanjoy Das 93f8cc186a Relax constraints for reduction vectorization
Summary:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355868
2019-03-11 21:36:41 +00:00
Nico Weber 885b790f89 Remove esan.
It hasn't seen active development in years, and it hasn't reached a
state where it was useful.

Remove the code until someone is interested in working on it again.

Differential Revision: https://reviews.llvm.org/D59133

llvm-svn: 355862
2019-03-11 20:23:40 +00:00
Brian Gesiak d7b68132d8 [coroutines][PR40979] Ignore unreachable uses across suspend points
Summary:
Depends on https://reviews.llvm.org/D59069.

https://bugs.llvm.org/show_bug.cgi?id=40979 describes a bug in which the
-coro-split pass would assert that a use was across a suspend point from
a definition. Normally this would mean that a value would "spill" across
a suspend point and thus need to be stored in the coroutine frame. However,
in this case the use was unreachable, and so it would not be necessary
to store the definition on the frame.

To prevent the assert, simply remove unreachable basic blocks from a
coroutine function before computing spills. This avoids the assert
reported in PR40979.

Reviewers: GorNishanov, tks2103

Reviewed By: GorNishanov

Subscribers: EricWF, jdoerfert, llvm-commits, lewissbaker

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59068

llvm-svn: 355852
2019-03-11 18:31:28 +00:00
Brian Gesiak 4349dc76fa [Utils] Extract EliminateUnreachableBlocks (NFC)
Summary:
Extract the functionality of eliminating unreachable basic blocks
within a function, previously encapsulated within the
-unreachableblockelim pass, and make it available as a function within
BlockUtils.h. No functional change intended other than making the logic
reusable.

Exposing this logic makes it easier to implement
https://reviews.llvm.org/D59068, which fixes coroutines bug
https://bugs.llvm.org/show_bug.cgi?id=40979.

Reviewers: mkazantsev, wmi, davidxl, silvas, davide

Reviewed By: davide

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59069

llvm-svn: 355846
2019-03-11 17:51:57 +00:00
Jeremy Morse 90ede5f4bf [SimplifyCFG] Retain debug info when threading jumps with critical edges
Fixes bug 38023: https://bugs.llvm.org/show_bug.cgi?id=38023

The SimplifyCFG pass will perform jump threading in some cases where
doing so is trivial and would simplify the CFG. When folding a series
of blocks with redundant conditional branches into an unconditional "critical
edge" block, it does not keep the debug location associated with the previous
conditional branch.

This patch fixes the bug described by copying the debug info from the
old conditional branch to the new unconditional branch instruction, and
adds a regression test for the SimplifyCFG pass that covers this case.

Patch by Stephen Tozer!

Differential Revision: https://reviews.llvm.org/D59206

llvm-svn: 355833
2019-03-11 16:23:59 +00:00
Jeremy Morse b60aea4131 [JumpThreading] Retain debug info when replacing branch instructions
Fixes bug 37966: https://bugs.llvm.org/show_bug.cgi?id=37966

The Jump Threading pass will replace certain conditional branch
instructions with unconditional branches when it can prove that only one
branch can occur. Prior to this patch, it would not carry the debug
info from the old instruction to the new one.

This patch fixes the bug described by copying the debug info from the
conditional branch instruction to the new unconditional branch
instruction, and adds a regression test for the Jump Threading pass that
covers this case.

Patch by Stephen Tozer!

Differential Revision: https://reviews.llvm.org/D58963

llvm-svn: 355822
2019-03-11 11:48:57 +00:00
Clement Courbet 8e16d73346 [SelectionDAG] Allow the user to specify a memeq function.
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.

This is sub-optimal because memcmp has to compute much more than
equality.

This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.

`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.

Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56593

llvm-svn: 355672
2019-03-08 09:07:45 +00:00
David Green ffc922ec35 [LSR] Attempt to increase the accuracy of LSR's setup cost
In some loops, we end up generating loop induction variables that look like:
  {(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
  {(zext i16 (%i0 * %i1) to i32),+,-1}
i.e we count up from -limit to 0, not the simpler counting down from limit to
0. This is because the scores, as LSR calculates them, are the same and the
second is filtered in place of the first. We end up with a redundant SUB from 0
in the code.

This patch tries to make the calculation of the setup cost a little more
thoroughly, recursing into the scev members to better approximate the setup
required. The cost function for comparing LSR costs is:

return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
       std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                C2.ScaleCost, C2.ImmCost, C2.SetupCost);
So this will only alter results if none of the other variables turn out to be
different.

Differential Revision: https://reviews.llvm.org/D58770

llvm-svn: 355597
2019-03-07 13:44:40 +00:00
Fangrui Song b0f764c737 [BDCE] Optimize find+insert with early insert
llvm-svn: 355583
2019-03-07 06:38:03 +00:00
Nick Desaulniers 212c8ac23f [LoopRotate] fix crash encountered with callbr
Summary:
While implementing inlining support for callbr
(https://bugs.llvm.org/show_bug.cgi?id=40722), I hit a crash in Loop
Rotation when trying to build the entire x86 Linux kernel
(drivers/char/random.c). This is a small fix up to r353563.

Test case is drivers/char/random.c (with callbr's inlined), then ran
through creduce, then `opt -opt-bisect-limit=<limit>`, then bugpoint.

Thanks to Craig Topper for immediately spotting the fix, and teaching me
how to fish.

Reviewers: craig.topper, jyknight

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58929

llvm-svn: 355564
2019-03-06 23:04:40 +00:00
Nikita Popov 884feb1b69 [InstCombine] Fold add nsw + sadd.with.overflow
Fold `add nsw` and `sadd.with.overflow` with constants if the addition
does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D58881

llvm-svn: 355530
2019-03-06 18:30:00 +00:00
Evgeniy Stepanov 53d7c5cd44 [msan] Instrument x86 BMI intrinsics.
Summary:
They simply shuffle bits. MSan needs to do the same with shadow bits,
after making sure that the shuffle mask is fully initialized.

Reviewers: pcc, vitalybuka

Subscribers: hiraditya, #sanitizers, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D58858

llvm-svn: 355348
2019-03-04 22:58:20 +00:00
Jordan Rupprecht 090683b85e [NFC] Fix PGO link error in shared libs build
llvm-svn: 355346
2019-03-04 22:54:44 +00:00
Sanjay Patel 6e32b46b1d [ConstantHoisting] avoid hang/crash from unreachable blocks (PR40930)
I'm not too familiar with this pass, so there might be a better
solution, but this appears to fix the degenerate:
PR40930
PR40931
PR40932
PR40934
...without affecting any real-world code.

As we've seen in several other passes, when we have unreachable blocks,
they can contain semi-bogus IR and/or cause unexpected conditions. We
would not typically expect these patterns to make it this far, but we
have to guard against them anyway.

llvm-svn: 355337
2019-03-04 20:57:14 +00:00
Rong Xu db29a3a438 [PGO] Context sensitive PGO (part 3)
Part 3 of CSPGO changes (mostly related to PassMananger).

Differential Revision: https://reviews.llvm.org/D54175

llvm-svn: 355330
2019-03-04 20:21:27 +00:00
Davide Italiano 672bec223d [InstCombine] Mark debug values as unavailable after DCE.
Fixes PR40838.

llvm-svn: 355301
2019-03-04 04:38:58 +00:00
Sanjay Patel 1f65903dc1 [InstCombine] move add after smin/smax
Follow-up to rL355221.
This isn't specifically called for within PR14613,
but we'll get there eventually if it's not already
requested in some other bug report.

https://rise4fun.com/Alive/5b0

  Name: smax
  Pre: WillNotOverflowSignedSub(C1,C0)
  %a = add nsw i8 %x, C0
  %cond = icmp sgt i8 %a, C1
  %r = select i1 %cond, i8 %a, i8 C1
  =>
  %c2 = icmp sgt i8 %x, C1-C0
  %u2 = select i1 %c2, i8 %x, i8 C1-C0
  %r = add nsw i8 %u2, C0

  Name: smin
  Pre: WillNotOverflowSignedSub(C1,C0)
  %a = add nsw i32 %x, C0
  %cond = icmp slt i32 %a, C1
  %r = select i1 %cond, i32 %a, i32 C1
  =>
  %c2 = icmp slt i32 %x, C1-C0
  %u2 = select i1 %c2, i32 %x, i32 C1-C0
  %r = add nsw i32 %u2, C0

llvm-svn: 355272
2019-03-02 16:45:10 +00:00
Philip Reames cf0a978e1f [InstCombine] Extend saturating idempotent atomicrmw transform to FP
I'm assuming that the nan propogation logic for InstructonSimplify's handling of fadd and fsub is correct, and applying the same to atomicrmw.

Differential Revision: https://reviews.llvm.org/D58836

llvm-svn: 355222
2019-03-01 19:50:36 +00:00
Sanjay Patel 6e1e7e1c3e [InstCombine] move add after umin/umax
In the motivating cases from PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
...moving the add enables us to narrow the
min/max which eliminates zext/trunc which
enables signficantly better vectorization.
But that bug is still not completely fixed.

https://rise4fun.com/Alive/5KQ

  Name: umax
  Pre: C1 u>= C0
  %a = add nuw i8 %x, C0
  %cond = icmp ugt i8 %a, C1
  %r = select i1 %cond, i8 %a, i8 C1
  =>
  %c2 = icmp ugt i8 %x, C1-C0
  %u2 = select i1 %c2, i8 %x, i8 C1-C0
  %r = add nuw i8 %u2, C0

  Name: umin
  Pre: C1 u>= C0
  %a = add nuw i32 %x, C0
  %cond = icmp ult i32 %a, C1
  %r = select i1 %cond, i32 %a, i32 C1
  =>
  %c2 = icmp ult i32 %x, C1-C0
  %u2 = select i1 %c2, i32 %x, i32 C1-C0
  %r = add nuw i32 %u2, C0

llvm-svn: 355221
2019-03-01 19:42:40 +00:00
Philip Reames 2226e9a745 [LICM] Infer proper alignment from loads during scalar promotion
This patch fixes an issue where we would compute an unnecessarily small alignment during scalar promotion when no store is not to be guaranteed to execute, but we've proven load speculation safety. Since speculating a load requires proving the existing alignment is valid at the new location (see Loads.cpp), we can use the alignment fact from the load.

For non-atomics, this is a performance problem. For atomics, this is a correctness issue, though an *incredibly* rare one to see in practice. For atomics, we might not be able to lower an improperly aligned load or store (i.e. i32 align 1). If such an instruction makes it all the way to codegen, we *may* fail to codegen the operation, or we may simply generate a slow call to a library function. The part that makes this super hard to see in practice is that the memory location actually *is* well aligned, and instcombine knows that. So, to see a failure, you have to have a) hit the bug in LICM, b) somehow hit a depth limit in InstCombine/ValueTracking to avoid fixing the alignment, and c) then have generated an instruction which fails codegen rather than simply emitting a slow libcall. All around, pretty hard to hit.

Differential Revision: https://reviews.llvm.org/D58809

llvm-svn: 355217
2019-03-01 18:45:05 +00:00
Philip Reames 77982868c5 [InstCombine] Extend "idempotent" atomicrmw optimizations to floating point
An idempotent atomicrmw is one that does not change memory in the process of execution.  We have already added handling for the various integer operations; this patch extends the same handling to floating point operations which were recently added to IR.

Note: At the moment, we canonicalize idempotent fsub to fadd when ordering requirements prevent us from using a load.  As discussed in the review, I will be replacing this with canonicalizing both floating point ops to integer ops in the near future.

Differential Revision: https://reviews.llvm.org/D58251

llvm-svn: 355210
2019-03-01 18:00:07 +00:00
Jonas Hahnfeld e071cd86df Hide two unused debugging methods, NFCI.
GCC correctly moans that PlainCFGBuilder::isExternalDef(llvm::Value*) and
StackSafetyDataFlowAnalysis::verifyFixedPoint() are defined but not used
in Release builds. Hide them behind 'ifndef NDEBUG'.

llvm-svn: 355205
2019-03-01 17:15:21 +00:00
Manman Ren 576124a319 Try to fix NetBSD buildbot breakage introduced in D57463.
By including the header file in the source.

llvm-svn: 355202
2019-03-01 15:25:24 +00:00
Fangrui Song f4b25f700a [ConstantHoisting] Call cleanup() in ConstantHoistingPass::runImpl to avoid dangling elements in ConstIntInfoVec for new PM
Summary:
ConstIntInfoVec contains elements extracted from the previous function.
In new PM, releaseMemory() is not called and the dangling elements can
cause segfault in findConstantInsertionPoint.

Rename releaseMemory() to cleanup() to deliver the idea that it is
mandatory and call cleanup() in ConstantHoistingPass::runImpl to fix
this.

Reviewers: ormris, zzheng, dmgreen, wmi

Reviewed By: ormris, wmi

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58589

llvm-svn: 355174
2019-03-01 05:27:01 +00:00
Reid Kleckner 701593f1db [sancov] Instrument reachable blocks that end in unreachable
Summary:
These sorts of blocks often contain calls to noreturn functions, like
longjmp, throw, or trap. If they don't end the program, they are
"interesting" from the perspective of sanitizer coverage, so we should
instrument them. This was discussed in https://reviews.llvm.org/D57982.

Reviewers: kcc, vitalybuka

Subscribers: llvm-commits, craig.topper, efriedma, morehouse, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58740

llvm-svn: 355152
2019-02-28 22:54:30 +00:00
Manman Ren 1829512dd3 Add a module pass for order file instrumentation
The basic idea of the pass is to use a circular buffer to log the execution ordering of the functions. We only log the function when it is first executed. We use a 8-byte hash to log the function symbol name.

In this pass, we add three global variables:
(1) an order file buffer: a circular buffer at its own llvm section.
(2) a bitmap for each module: one byte for each function to say if the function is already executed.
(3) a global index to the order file buffer.

At the function prologue, if the function has not been executed (by checking the bitmap), log the function hash, then atomically increase the index.

Differential Revision:  https://reviews.llvm.org/D57463

llvm-svn: 355133
2019-02-28 20:13:38 +00:00
Rong Xu a6ff69f6dd [PGO] Context sensitive PGO (part 2)
Part 2 of CSPGO changes (mostly related to ProfileSummary).
Note that I use a default parameter in setProfileSummary() and getSummary().
This is to break the dependency in clang. I will make the parameter explicit
after changing clang in a separated patch.

Differential Revision: https://reviews.llvm.org/D54175

llvm-svn: 355131
2019-02-28 19:55:07 +00:00
Sanjay Patel 4a47f5f550 [InstCombine] fold adds of constants separated by sext/zext
This is part of a transform that may be done in the backend:
D13757
...but it should always be beneficial to fold this sooner in IR
for all targets.

https://rise4fun.com/Alive/vaiW

  Name: sext add nsw
  %add = add nsw i8 %i, C0
  %ext = sext i8 %add to i32
  %r = add i32 %ext, C1
  =>
  %s = sext i8 %i to i32
  %r = add i32 %s, sext(C0)+C1

  Name: zext add nuw
  %add = add nuw i8 %i, C0
  %ext = zext i8 %add to i16
  %r = add i16 %ext, C1
  =>
  %s = zext i8 %i to i16
  %r = add i16 %s, zext(C0)+C1

llvm-svn: 355118
2019-02-28 19:05:26 +00:00
Chijun Sima 586187639a Make MergeBlockIntoPredecessor conformant to the precondition of calling DTU.applyUpdates
Summary:
It is mentioned in the document of DTU that "It is illegal to submit any update that has already been submitted, i.e., you are supposed not to insert an existent edge or delete a nonexistent edge." It is dangerous to violet this rule because DomTree and PostDomTree occasionally crash on this scenario.

This patch fixes `MergeBlockIntoPredecessor`, making it conformant to this precondition.

Reviewers: kuhar, brzycki, chandlerc

Reviewed By: brzycki

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58444

llvm-svn: 355105
2019-02-28 16:47:18 +00:00
Bjorn Pettersson d30f308a9f Add support for computing "zext of value" in KnownBits. NFCI
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.

This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.

Reviewers: craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58650

llvm-svn: 355099
2019-02-28 15:45:29 +00:00
Eric Christopher 07944353fc Temporarily revert "ArgumentPromotion should copy all metadata to new Function" and the dependent patch "Refine ArgPromotion metadata handling" as they're causing segfaults in argument promotion.
This reverts commits r354032 and r353537.

llvm-svn: 355060
2019-02-28 01:11:12 +00:00
Reid Kleckner 4fb3502bc9 [InstrProf] Use separate comdat group for data and counters
Summary:
I hadn't realized that instrumentation runs before inlining, so we can't
use the function as the comdat group. Doing so can create relocations
against discarded sections when references to discarded __profc_
variables are inlined into functions outside the function's comdat
group.

In the future, perhaps we should consider standardizing the comdat group
names that ELF and COFF use. It will save object file size, since
__profv_$sym won't appear in the symbol table again.

Reviewers: xur, vsk

Subscribers: eraman, hiraditya, cfe-commits, #sanitizers, llvm-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D58737

llvm-svn: 355044
2019-02-27 23:38:44 +00:00
Alina Sbirlea fcfa7c5f92 [MemorySSA] Make insertDef insert corresponding phi nodes.
Summary:
The original assumption for the insertDef method was that it would not
materialize Defs out of no-where, hence it will not insert phis needed
after inserting a Def.

However, when cloning an instruction (use case used in LICM), we do
materialize Defs "out of no-where". If the block receiving a Def has at
least one other Def, then no processing is needed. If the block just
received its first Def, we must check where Phi placement is needed.
The only new usage of insertDef is in LICM, hence the trigger for the bug.

But the original goal of the method also fails to apply for the move()
method. If we move a Def from the entry point of a diamond to either the
left or right blocks, then the merge block must add a phi.
While this usecase does not currently occur, or may be viewed as an
incorrect transformation, MSSA must behave corectly given the scenario.

Resolves PR40749 and PR40754.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58652

llvm-svn: 355040
2019-02-27 22:20:22 +00:00
Rong Xu 6cdf3d8086 Recommit r354930 "[PGO] Context sensitive PGO (part 1)"
Fixed UBSan failures.

llvm-svn: 355005
2019-02-27 17:24:33 +00:00
Vlad Tsyrklevich c01643087e Revert "[PGO] Context sensitive PGO (part 1)"
This reverts commit r354930, it was causing UBSan failures.

llvm-svn: 354953
2019-02-27 03:45:28 +00:00
Vedant Kumar 73522d1678 [HotColdSplit] Disable splitting for sanitized functions
Splitting can make sanitizer errors harder to understand, as the
trapping instruction may not be in the function where the bug was
detected.

rdar://48142697

llvm-svn: 354931
2019-02-26 22:55:46 +00:00
Rong Xu 35d2d51369 [PGO] Context sensitive PGO (part 1)
Current PGO profile counts are not context sensitive. The branch probabilities
for the inlined functions are kept the same for all call-sites, and they might
be very different from the actual branch probabilities. These suboptimal
profiles can greatly affect some downstream optimizations, in particular for
the machine basic block placement optimization.

In this patch, we propose to have a post-inline PGO instrumentation/use pass,
which we called Context Sensitive PGO (CSPGO). For the users who want the best
possible performance, they can perform a second round of PGO instrument/use on
the top of the regular PGO. They will have two sets of profile counts. The
first pass profile will be manly for inline, indirect-call promotion, and
CGSCC simplification pass optimizations. The second pass profile is for
post-inline optimizations and code-gen optimizations.

A typical usage:
// Regular PGO instrumentation and generate pass1 profile.
> clang -O2 -fprofile-generate source.c -o gen
> ./gen
> llvm-profdata merge default.*profraw -o pass1.profdata
// CSPGO instrumentation.
> clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate -o gen2
> ./gen2
// Merge two sets of profiles
> llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata
// Use the combined profile. Pass manager will invoke two PGO use passes.
> clang -O2 -fprofile-use=profile.profdata -o use

This change touches many components in the compiler. The reviewed patch
(D54175) will committed in phrases.

Differential Revision: https://reviews.llvm.org/D54175

llvm-svn: 354930
2019-02-26 22:37:46 +00:00
Alina Sbirlea 9026404125 [MemorySSA & SimpleLoopUnswitch] Update MemorySSA in ReplaceUsesOfWith.
SimpleLoopUnswitch must update MemorySSA when removing instructions.
Resolves PR39197.

llvm-svn: 354919
2019-02-26 19:44:52 +00:00
Sanjay Patel e8bf0f79bd [InstCombine] canonicalize more unsigned saturated add with 'not'
Yet another pattern variation suggested by:
https://bugs.llvm.org/show_bug.cgi?id=14613

There are 8 more potential commuted patterns here on top of the
8 that were already handled (rL354221, rL354276, rL354393).
We have the obvious commute of the 'add' + commute of the cmp
predicate/operands (ugt/ult) + commute of the select operands:

Name: base
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ult i32 %x, %y
%r = select i1 %c, i32 -1, i32 %a
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: ugt
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ugt i32 %y, %x
%r = select i1 %c, i32 -1, i32 %a
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: commute select
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ult i32 %y, %x
%r = select i1 %c, i32 %a, i32 -1
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: ugt + commute select
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ugt i32 %x, %y
%r = select i1 %c, i32 %a, i32 -1
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/den

llvm-svn: 354887
2019-02-26 15:18:49 +00:00
Simon Pilgrim a066f1f9e6 [Vectorizer] Add vectorization support for fixed smul/umul intrinsics
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument.

Differential Revision: https://reviews.llvm.org/D58616

llvm-svn: 354790
2019-02-25 15:42:02 +00:00
Sanjay Patel 9907d3c8b4 [InstCombine] canonicalize add/sub with bool
add A, sext(B) --> sub A, zext(B)

We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.

The backend should be prepared for this change after:
D57401
rL353433

This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That should be waiting to happen at a later optimization stage.

The seeming regression in the fuzzer test was discussed in:
D58359

We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.

llvm-svn: 354748
2019-02-24 16:57:45 +00:00
Matt Arsenault 65b4ab9921 BreakCriticalEdges: Update PostDominatorTree
llvm-svn: 354673
2019-02-22 15:01:41 +00:00
Roman Tereshin 99a6672bba [LowerSwitch][AMDGPU] Do not handle impossible values
This patch adds LazyValueInfo to LowerSwitch to compute the range of the
value being switched over and reduce the size of the tree LowerSwitch
builds to lower a switch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D58096

llvm-svn: 354670
2019-02-22 14:33:46 +00:00
Chijun Sima 70e97163e0 [DTU] Refine the interface and logic of applyUpdates
Summary:
This patch separates two semantics of `applyUpdates`:
1. User provides an accurate CFG diff and the dominator tree is updated according to the difference of `the number of edge insertions` and `the number of edge deletions` to infer the status of an edge before and after the update.
2. User provides a sequence of hints. Updates mentioned in this sequence might never happened and even duplicated.

Logic changes:

Previously, removing invalid updates is considered a side-effect of deduplication and is not guaranteed to be reliable. To handle the second semantic, `applyUpdates` does validity checking before deduplication, which can cause updates that have already been applied to be submitted again. Then, different calls to `applyUpdates` might cause unintended consequences, for example,
```
DTU(Lazy) and Edge A->B exists.
1. DTU.applyUpdates({{Delete, A, B}, {Insert, A, B}}) // User expects these 2 updates result in a no-op, but {Insert, A, B} is queued
2. Remove A->B
3. DTU.applyUpdates({{Delete, A, B}}) // DTU cancels this update with {Insert, A, B} mentioned above together (Unintended)
```
But by restricting the precondition that updates of an edge need to be strictly ordered as how CFG changes were made, we can infer the initial status of this edge to resolve this issue.

Interface changes:
The second semantic of `applyUpdates`  is separated to `applyUpdatesPermissive`.
These changes enable DTU(Lazy) to use the first semantic if needed, which is quite useful in `transforms/utils`.

Reviewers: kuhar, brzycki, dmgreen, grosser

Reviewed By: brzycki

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58170

llvm-svn: 354669
2019-02-22 13:48:38 +00:00
Alina Sbirlea 90d2e3a16d [MemorySSA & LoopPassManager] Resolve PR40038.
The correct edge being deleted is not to the unswitched exit block, but to the
original block before it was split. That's the key in the map, not the
value.
The insert is correct. The new edge is to the .split block.

The splitting turns OriginalBB into:
OriginalBB -> OriginalBB.split.
Assuming the orignal CFG edge: ParentBB->OriginalBB, we must now delete
ParentBB->OriginalBB, not ParentBB->OriginalBB.split.

llvm-svn: 354656
2019-02-22 07:18:37 +00:00
Chijun Sima f131d6110e [DTU] Deprecate insertEdge*/deleteEdge*
Summary: This patch converts all existing `insertEdge*/deleteEdge*` to `applyUpdates` and marks `insertEdge*/deleteEdge*` as deprecated.

Reviewers: kuhar, brzycki

Reviewed By: kuhar, brzycki

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58443

llvm-svn: 354652
2019-02-22 05:41:43 +00:00
Alina Sbirlea 97468e9282 [MemorySSA & LoopPassManager] Update MemorySSA in formDedicatedExitBlocks.
MemorySSA is now updated when forming dedicated exit blocks.
Resolves PR40037.

llvm-svn: 354623
2019-02-21 21:13:34 +00:00
Alina Sbirlea d2d3244363 [LoopSimplifyCFG] Update MemorySSA after r353911.
Summary:
MemorySSA is not properly updated in LoopSimplifyCFG after recent changes. Use SplitBlock utility to resolve that and clear all updates once handleDeadExits is finished.
All updates that follow are removal of edges which are safe to handle via the removeEdge() API.
Also, deleting dead blocks is done correctly as is, i.e. delete from MemorySSA before updating the CFG and DT.

Reviewers: mkazantsev, rtereshin

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58524

llvm-svn: 354613
2019-02-21 19:54:05 +00:00
Alina Sbirlea 73446cd567 [EarlyCSE] Cleanup deadcode. [NFCI]
Summary: Cleanup nop assignments.

Reviewers: george.burgess.iv, davide

Subscribers: sanjoy, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58308

llvm-svn: 354612
2019-02-21 19:49:57 +00:00
Joey Gouly fdf651ee8d [InferAddressSpaces] Fix fallthrough error
llvm-svn: 354580
2019-02-21 13:10:37 +00:00
Joey Gouly 92af1360f3 [InferAddressSpaces] Fix crash on select of non-ptr operands
Check the operands of a select are pointers, to determine if it is an address
expression or not.

https://reviews.llvm.org/D58226

llvm-svn: 354576
2019-02-21 12:31:36 +00:00
Max Kazantsev 10489d76f6 [LoopSimplifyCFG] Add missing MSSA edge deletion
When we create fictive switch in preheader, we should take
care about MSSA and delete edge between old preheader and
header.

llvm-svn: 354547
2019-02-21 05:51:29 +00:00
Wei Mi 500606f270 [Inliner] Pass nullptr for the ORE param of getInlineCost if RemarkEnabled
is false.

Right now for inliner and partial inliner, we always pass the address of a
valid ORE object to getInlineCost even if RemarkEnabled is false because of
no -Rpass is specified. Since ComputeFullInlineCost will be set to true if
ORE is non-null in getInlineCost, this introduces the problem that in
getInlineCost we cannot return early even if we already know the cost is
definitely higher than the threshold. It is a general problem for compile
time.

This patch fixes that by pass nullptr as the ORE argument if RemarkEnabled is
false.

Differential Revision: https://reviews.llvm.org/D58399

llvm-svn: 354542
2019-02-21 02:57:52 +00:00
Philip Reames 79d5e16f51 [GVN] Small tweaks to comments, style, and missed vector handling
Noticed these while doing a final sweep of the code to make sure I hadn't missed anything in my last couple of patches.  The (minor) missed optimization was noticed because of the stylistic fix to avoid an overly specific cast.

llvm-svn: 354412
2019-02-20 00:31:28 +00:00
Philip Reames a259dc3263 [GVN] Fix last crasher w/non-integral pointers
Same case as for memset and memcpy, but this time for clobbering stores and loads.  We still can't allow coercion to or from non-integrals, regardless of the transform.

Now that I'm done the whole little sequence, it seems apparent that we'd entirely missed reasoning about clobbers in the original GVN support for non-integral pointers.

My appologies, I thought we'd upstreamed all of this, but it turns out we were still carrying a downstream hack which hid all of these issues.  My chanks to Cherry Zhang for helping debug.

llvm-svn: 354407
2019-02-20 00:15:54 +00:00
Philip Reames 952d234d00 [GVN] Fix a crash bug w/non-integral pointers and memtransfers
Problem is very similiar to the one fixed for memsets in r354399, we try to coerce a value to non-integral type, and then crash while try to do so.  Since we shouldn't be doing such coercions to start with, easy fix.  From inspection, I see two other cases which look to be similiar and will follow up with most test cases and fixes if confirmed.

llvm-svn: 354403
2019-02-19 23:49:38 +00:00
Philip Reames 322eb7660e [GVN] Fix a non-integral pointer bug w/vector types
GVN generally doesn't forward structs or array types, but it *will* forward vector types to non-vectors and vice versa.  As demonstrated in tests, we need to inhibit the same set of transforms for vector of non-integral pointers as for non-integral pointers themselves.

llvm-svn: 354401
2019-02-19 23:19:51 +00:00
Philip Reames 92756a80e7 [GVN] Fix a crash bug around non-integral pointers
If we encountered a location where we tried to forward the value of a memset to a load of a non-integral pointer, we crashed.  Such a forward is not legal in general, but we can forward null pointers.  Test for both cases are included.

llvm-svn: 354399
2019-02-19 23:07:15 +00:00
Sanjay Patel c1e0184317 [InstCombine] reduce even more unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

  Name: uaddsat, -1 fval
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ugt i32 %notx, %y
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

  Name: uaddsat, -1 fval + ult
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ult i32 %y, %notx
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/nTp

llvm-svn: 354393
2019-02-19 22:14:21 +00:00
Sanjay Patel dcb93c0dda [InstCombine] rearrange saturated add folds; NFC
This is no-functional-change-intended, but that was also
true when it was part of rL354276, and I managed to lose
2 predicates for the fold with constant...causing much bot
distress. So this time I'm adding a couple of negative tests
to avoid that.

llvm-svn: 354384
2019-02-19 21:46:13 +00:00
Max Kazantsev ebd95ea86e [NFC] API for signaling that the current loop is being deleted
We are planning to be able to delete the current loop in LoopSimplifyCFG
in the future. Add API to notify the loop pass manager that it happened.

llvm-svn: 354314
2019-02-19 11:14:05 +00:00
Max Kazantsev 30095d9795 [NFC] Store loop header in a local to keep it available after the loop is deleted
llvm-svn: 354313
2019-02-19 11:13:58 +00:00
Sanjay Patel 8a35d339c9 Revert "[InstCombine] reduce even more unsigned saturated add with 'not' op"
This reverts commit 079b610c29.
Bots are failing after this change on a stage 2 compile of clang.

llvm-svn: 354277
2019-02-18 16:04:22 +00:00
Sanjay Patel 079b610c29 [InstCombine] reduce even more unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

  Name: uaddsat, -1 fval
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ugt i32 %notx, %y
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

  Name: uaddsat, -1 fval + ult
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ult i32 %y, %notx
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/nTp

llvm-svn: 354276
2019-02-18 15:21:39 +00:00
Max Kazantsev 4561475e09 [NFC] Teach getInnermostLoopFor walk up the loop trees
This should be NFC in current use case of this method, but it will
help to use it for solving more compex tasks in follow-up patches.

llvm-svn: 354227
2019-02-17 18:21:51 +00:00
Sanjay Patel b341ee7071 [InstCombine] reduce more unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

  Name: not op
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ult i32 %notx, %y
  %r = select i1 %c, i32 -1, i32 %a
  =>
  %a = add i32 %x, %y
  %c2 = icmp ult i32 %a, %y
  %r = select i1 %c2, i32 -1, i32 %a

  Name: not op ugt
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ugt i32 %y, %notx
  %r = select i1 %c, i32 -1, i32 %a
  =>
  %a = add i32 %x, %y
  %c2 = icmp ult i32 %a, %y
  %r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/niom

(The matching here is still incomplete.)

llvm-svn: 354224
2019-02-17 16:48:50 +00:00
Sanjay Patel bee2073542 [InstCombine] reduce unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

(The matching here is incomplete. Trying to take minimal steps
to make sure we don't induce infinite looping from existing
canonicalizations of the 'select'.)

llvm-svn: 354221
2019-02-17 15:58:48 +00:00
Max Kazantsev d72c1a0c5c [NFC] Fix name and clarifying comment for factored-out function
llvm-svn: 354220
2019-02-17 15:22:48 +00:00
Max Kazantsev 0f943269a0 [NFC] Factor out a function for future reuse
llvm-svn: 354218
2019-02-17 15:04:09 +00:00
Alina Sbirlea 383ccfb360 [EarlyCSE & MSSA] Cap the clobbering calls in EarlyCSE.
Summary:
Unlimitted number of calls to getClobberingAccess can lead to high
compile times in pathological cases.
Limitting getClobberingAccess to a fairly high number. Can be adjusted
based on users/need.
Note: this is the only user of MemorySSA currently enabled by default.
The same handling exists in LICM (disabled atm). As MemorySSA gains more
users, this logic of capping will need to move inside MemorySSA.

Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D58248

llvm-svn: 354182
2019-02-15 22:47:54 +00:00
Philip Reames 8220ecbce1 [InstCombine] Address a couple stylistic issues pointed out by reviewer [NFC]
Better addressing comments from https://reviews.llvm.org/D58290.

llvm-svn: 354171
2019-02-15 21:31:39 +00:00
Philip Reames cae6c767e8 [InstCombine] Convert atomicrmws to xchg or store where legal
Implement two more transforms of atomicrmw:
1) We can convert an atomicrmw which produces a known value in memory into an xchg instead.
2) We can convert an atomicrmw xchg w/o users into a store for some orderings.

Differential Revision: https://reviews.llvm.org/D58290

llvm-svn: 354170
2019-02-15 21:23:51 +00:00
Vedant Kumar 5f5cac3ae2 [CodeExtractor] Do not lift lifetime.end markers for region inputs
If a lifetime.end marker occurs along one path through the extraction
region, but not another, then it's still incorrect to lift the marker,
because there is some path through the extracted function which would
ordinarily not reach the marker. If the call to the extracted function
is in a loop, unrolling can cause inputs to the function to become
optimized out as undef after the first iteration.

To prevent incorrect stack slot merging in the calling function, it
should be sufficient to lift lifetime.start markers for region inputs.
I've tested this theory out by doing a stage2 check-all with randomized
splitting enabled.

This is a follow-up to r353973, and there's additional context for this
change in https://reviews.llvm.org/D57834.

rdar://47896986

Differential Revision: https://reviews.llvm.org/D58253

llvm-svn: 354159
2019-02-15 18:46:58 +00:00
Vedant Kumar 47a0c9b69c [HotColdSplit] Schedule splitting late to fix perf regression
With or without PGO data applied, splitting early in the pipeline
(either before the inliner or shortly after it) regresses performance
across SPEC variants. The cause appears to be that splitting hides
context for subsequent optimizations.

Schedule splitting late again, in effect reversing r352080, which
scheduled the splitting pass early for code size benefits (documented in
https://reviews.llvm.org/D57082).

Differential Revision: https://reviews.llvm.org/D58258

llvm-svn: 354158
2019-02-15 18:46:44 +00:00
Sanjay Patel 8a2b543a13 [InstCombine] fix crash while trying to narrow a binop of shuffles (PR40734)
https://bugs.llvm.org/show_bug.cgi?id=40734

llvm-svn: 354144
2019-02-15 16:31:55 +00:00
Clement Courbet f7e84a2ccc [MergeICmps] Make base ordering really deterministic.
Summary:
The idea is that we now manipulate bases through a `unsigned BaseID` based on
order of appearance in the comparison chain rather than through the `Value*`.

Fixes 40714.

Reviewers: gchatelet

Subscribers: mgrang, jfb, jdoerfert, llvm-commits, hans

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58274

llvm-svn: 354131
2019-02-15 14:17:17 +00:00
Clement Courbet cc004df7eb [MergeICmps][NFC] Improve doc.
llvm-svn: 354128
2019-02-15 12:58:06 +00:00
Max Kazantsev c065b025a6 [NFCI] Factor out block removal from stack of nested loops
llvm-svn: 354124
2019-02-15 12:18:10 +00:00
Simon Pilgrim 623c38d6cd Fix "field 'DFS' will be initialized after field 'DTU'" warning. NFCI.
llvm-svn: 354123
2019-02-15 12:13:16 +00:00
Max Kazantsev 136f09bea1 [NFC] Promote DFS to field for further use
llvm-svn: 354118
2019-02-15 11:39:35 +00:00
Max Kazantsev 73db5c137a [NFC] Tweak SplitBlockAndInsertIfThen to use existing ThenBlock
llvm-svn: 354107
2019-02-15 08:18:00 +00:00
Teresa Johnson d0b1f30b32 [ThinLTO] Detect partially split modules during the thin link
Summary:
The changes to disable LTO unit splitting by default (r350949) and
detect inconsistently split LTO units (r350948) are causing some crashes
when the inconsistency is detected in multiple threads simultaneously.
Fix that by having the code always look for the inconsistently split
LTO units during the thin link, by checking for the presence of type
tests recorded in the summaries.

Modify test added in r350948 to remove single threading required to fix
a bot failure due to this issue (and some debugging options added in the
process of diagnosing it).

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57561

llvm-svn: 354062
2019-02-14 21:22:50 +00:00
Philip Reames db57ef6238 [InstCombine] Add todos for possible atomicrmw transforms
llvm-svn: 354059
2019-02-14 20:48:36 +00:00
Philip Reames 485474208e Canonicalize all integer "idempotent" atomicrmw ops
For "idempotent" atomicrmw instructions which we can't simply turn into load, canonicalize the operation and constant. This reduces the matching needed elsewhere in the optimizer, but doesn't directly impact codegen.

For any architecture where OR/Zero is not a good default choice, you can extend the AtomicExpand lowerIdempotentRMWIntoFencedLoad mechanism. I reviewed X86 to make sure this works well, haven't audited other backends.

Differential Revision: https://reviews.llvm.org/D58244

llvm-svn: 354058
2019-02-14 20:41:17 +00:00
Philip Reames 97067d3c73 Teach instcombine about remaining "idempotent" atomicrmw types
Expand on Quentin's r353471 patch which converts some atomicrmws into loads. Handle remaining operation types, and fix a slight bug. Atomic loads are required to have alignment. Since this was within the InstCombine fixed point, somewhere else in InstCombine was adding alignment before the verifier saw it, but still, we should fix.

Terminology wise, I'm using the "idempotent" naming that is used for the same operations in AtomicExpand and X86ISelLoweringInfo. Once this lands, I'll add similar tests for AtomicExpand, and move the pattern match function to a common location. In the review, there was seemingly consensus that "idempotent" was slightly incorrect for this context.  Once we setle on a better name, I'll update all uses at once.

Differential Revision: https://reviews.llvm.org/D58242

llvm-svn: 354046
2019-02-14 18:39:14 +00:00
Teresa Johnson c374a800e7 Refine ArgPromotion metadata handling
Summary:
In r353537 we now copy all metadata to the new function, with the old
being removed when the old function is eliminated. In some cases the old
function is dropped to a declaration (seems to only occur with the old
PM). Go ahead and clear all metadata from the old function to handle that
case, since verification will complain otherwise. This is consistent
with what was being done for debug metadata before r353537.

Reviewers: davidxl, uabelho

Subscribers: jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58215

llvm-svn: 354032
2019-02-14 14:14:24 +00:00
Florian Hahn 6ab83b7db6 [LoopUnrollPeel] Add case where we should forget the peeled loop from SCEV.
The test case requires the peeled loop to be forgotten after peeling,
even though it does not have a parent. When called via the unroller,
SE->forgetTopmostLoop is also called, so the test case would also pass
without any SCEV invalidation, but peelLoop is exposed as utility
function. Also, in the test case, simplifyLoop will make changes,
removing the loop from SCEV, but it is better to not rely on this
behavior.

Reviewers: sanjoy, mkazantsev

Reviewed By: mkazantsev

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58192

llvm-svn: 354031
2019-02-14 13:59:39 +00:00
Clement Courbet c6e768f0ee [Instrumentation][NFC] Fix warning.
lib/Transforms/Instrumentation/AddressSanitizer.cpp:1173:29: warning: extra ‘;’ [-Wpedantic]

llvm-svn: 354024
2019-02-14 12:10:49 +00:00
Max Kazantsev deaf2ba280 [NFC] Refactor LICM code for better readability
llvm-svn: 354013
2019-02-14 09:04:12 +00:00
Leonard Chan 436fb2bd82 [NewPM] Second attempt at porting ASan
This is the second attempt to port ASan to new PM after D52739. This takes the
initialization requried by ASan from the Module by moving it into a separate
class with it's own analysis that the new PM ASan can use.

Changes:
- Split AddressSanitizer into 2 passes: 1 for the instrumentation on the
  function, and 1 for the pass itself which creates an instance of the first
  during it's run. The same is done for AddressSanitizerModule.
- Add new PM AddressSanitizer and AddressSanitizerModule.
- Add legacy and new PM analyses for reading data needed to initialize ASan with.
- Removed DominatorTree dependency from ASan since it was unused.
- Move GlobalsMetadata and ShadowMapping out of anonymous namespace since the
  new PM analysis holds these 2 classes and will need to expose them.

Differential Revision: https://reviews.llvm.org/D56470

llvm-svn: 353985
2019-02-13 22:22:48 +00:00
Vedant Kumar 4b0cc9a7c8 [CodeExtractor] Only lift lifetime markers present in the extraction region
When CodeExtractor finds liftime markers referencing inputs to the
extraction region, it lifts these markers out of the region and inserts
them around the call to the extracted function (see r350420, PR39671).

However, it should *only* lift lifetime markers that are actually
present in the extraction region. I.e., if a start marker is present in
the extraction region but a corresponding end marker isn't (or vice
versa), only the start marker (or end marker, resp.) should be lifted.

Differential Revision: https://reviews.llvm.org/D57834

llvm-svn: 353973
2019-02-13 19:53:38 +00:00
Max Kazantsev 3fe9ad7a9f [NFC] Add const qualifiers where possible
llvm-svn: 353941
2019-02-13 11:54:45 +00:00
Jeremy Morse f10af3f134 [DebugInfo][InstCombine] Prefer to salvage debuginfo over sinking it
When instcombine sinks an instruction between two basic blocks, it sinks any
dbg.value users in the source block with it, to prevent debug use-before-free.
However we can do better by attempting to salvage the debug users, which would
avoid moving where the variable location changes. If we successfully salvage,
still sink a (cloned) dbg.value with the sunk instruction, as the sunk
instruction is more likely to be "live" later in the compilation process.

If we can't salvage dbg.value users of a sunk instruction, mark the dbg.values
in the original block as being undef. This terminates any earlier variable
location range, and represents the fact that we've optimized out the variable
location for a portion of the program.

Differential Revision: https://reviews.llvm.org/D56788

llvm-svn: 353936
2019-02-13 10:54:53 +00:00
Max Kazantsev 2bb95e7c76 [GuardWidening] Support widening of explicitly expressed guards
This patch adds support of guards expressed in explicit form via
`widenable_condition` in Guard Widening pass.

Differential Revision: https://reviews.llvm.org/D56075
Reviewed By: reames

llvm-svn: 353932
2019-02-13 09:56:30 +00:00
Max Kazantsev 5cf777e413 [LoopSimplifyCFG] Re-enable const branch folding by default
Known underlying bugs have been fixed, intensive fuzz testing did not
find any new problems. Re-enabling by default. Feel free to revert if
it causes any functional failures.

llvm-svn: 353911
2019-02-13 06:12:48 +00:00
Alina Sbirlea 0a8bc14ad7 [MemorySSA & LoopPassManager] Add remaining book keeping [NFCI].
Add plumbing to get MemorySSA in the remaining loop passes.
Also update unit test to add the dependency.
[EnableMSSALoopDependency remains disabled].

llvm-svn: 353901
2019-02-12 23:48:02 +00:00
Alina Sbirlea 8567ff0c34 [LICM] Cap the clobbering calls in LICM.
Summary:
Unlimitted number of calls to getClobberingAccess can lead to high
compile times in pathological cases.
Switching EnableLicmCap flag from bool to int, and enabling to default 100.
(tested to be appropriate for current bechmarks)
We can revisit this value when enabling MemorySSA.

Reviewers: sanjoy, chandlerc, george.burgess.iv

Subscribers: jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57968

llvm-svn: 353897
2019-02-12 23:05:40 +00:00
Jeremy Morse b33a5c7347 [DebugInfo] Don't salvage load operations (PR40628).
Salvaging a redundant load instruction into a debug expression hides a
memory read from optimisation passes. Passes that alter memory behaviour
(such as LICM promoting memory to a register) aren't aware of these debug
memory reads and leave them unaltered, making the debug variable location
point somewhere unsafe.

Teaching passes to know about these debug memory reads would be challenging
and probably incomplete. Finding dbg.value instructions that need to be fixed
would likely be computationally expensive too, as more analysis would be
required. It's better to not generate debug-memory-reads instead, alas.

Changed tests:
 * DeadStoreElim: test for salvaging of intermediate operations contributing
   to the dead store, instead of salvaging of the redundant load,
 * GVN: remove debuginfo behaviour checks completely, this behaviour is still
   covered by other tests,
 * InstCombine: don't test for salvaged loads, we're removing that behaviour.

Differential Revision: https://reviews.llvm.org/D57962

llvm-svn: 353824
2019-02-12 10:54:30 +00:00
Max Kazantsev 2a184af221 [IndVars] Fix corner case with unreachable Phi inputs. PR40454
Logic in `getInsertPointForUses` doesn't account for a corner case when `Def`
only comes to a Phi user from unreachable blocks. In this case, the incoming
value may be arbitrary (and not even available in the input block) and break
the loop-related invariants that are asserted below.

In fact, if we encounter this situation, no IR modification is needed. This
Phi will be simplified away with nearest cleanup.

Differential Revision: https://reviews.llvm.org/D58045
Reviewed By: spatel

llvm-svn: 353816
2019-02-12 09:59:44 +00:00
Max Kazantsev bf6af8fbf0 [LoopSimplifyCFG] Change logic of dead loops removal to avoid hitting asserts
The function `LI.erase` has some invariants that need to be preserved when it
tries to remove a loop which is not the top-level loop. In particular, it
requires loop's preheader to be strictly in loop's parent. Our current logic
of deletion of dead blocks may erase the information about preheader before we
handle the loop, and therefore we may hit this assertion.

This patch changes the logic of loop deletion: we make them top-level loops
before we actually erase them. This allows us to trigger the simple branch of
`erase` logic which just detatches blocks from the loop and does not try to do
some complex stuff that need this invariant.

Thanks to @uabelho for reporting this!

Differential Revision: https://reviews.llvm.org/D57221
Reviewed By: fedor.sergeev

llvm-svn: 353813
2019-02-12 09:37:00 +00:00
Max Kazantsev 9aae9da947 Delete blocks from DTU to avoid dangling pointers
llvm-svn: 353804
2019-02-12 08:10:29 +00:00
Max Kazantsev 6bf861597c [LoopSimplifyCFG] Pay respect to LCSSA when removing dead blocks
Utility function that we use for blocks deletion always unconditionally removes
one-input Phis. In LoopSimplifyCFG, it can lead to breach of LCSSA form.
This patch alters this function to keep them if needed.

Differential Revision: https://reviews.llvm.org/D57231
Reviewed By: fedor.sergeev

llvm-svn: 353803
2019-02-12 07:48:07 +00:00
Max Kazantsev 20b9189975 [NFC] Rename DontDeleteUselessPHIs --> KeepOneInputPHIs
llvm-svn: 353801
2019-02-12 07:09:29 +00:00
Max Kazantsev 0686d1ae41 [NFC] Add parameter for keeping one-input Phis in DeleteDeadBlock(s)
llvm-svn: 353799
2019-02-12 06:14:27 +00:00
Eli Friedman 806136f8ef [LoopReroll] Fix reroll root legality checking.
The code checked that the first root was an appropriate distance from
the base value, but skipped checking the other roots. This could lead to
rerolling a loop that can't be legally rerolled (at least, not without
rewriting the loop in a non-trivial way).

Differential Revision: https://reviews.llvm.org/D56812

llvm-svn: 353779
2019-02-12 00:33:25 +00:00
Michael Kruse 77a614a6e1 Refactor setAlreadyUnrolled() and setAlreadyVectorized().
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.

In doing so we fix 3 potential issues:

 * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
   metadata even though it will not be used anymore. This already caused
   problems such as http://llvm.org/PR40546. Change the behavior to the
   one of setAlreadyUnrolled which deletes this loop metadata.

 * setAlreadyUnrolled() used to create a new LoopID by calling
   MDNode::get with nullptr as the first operand, then replacing it by
   the returned references using replaceOperandWith. It is possible
   that MDNode::get would instead return an existing node (due to
   de-duplication) that then gets modified. To avoid, use a fresh
   TempMDNode that does not get uniqued with anything else before
   replacing it with replaceOperandWith.

 * LoopVectorizeHints::matchesHintMetadataName() only compares the
   suffix of the attribute to set the new value for. That is, when
   called with "enable", would erase attributes such as
   "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
   "llvm.loop.distribute.enable" instead of the one to replace.
   Fortunately, function was only called with "isvectorized".

Differential Revision: https://reviews.llvm.org/D57566

llvm-svn: 353738
2019-02-11 19:45:44 +00:00
Sanjay Patel 587fd849f0 [InstCombine] Fix matchRotate bug when one operand is a ConstantExpr shift
This bug seems to be harmless in release builds, but will cause an error in UBSAN
builds or an assertion failure in debug builds.

When it gets to this opcode comparison, it assumes both of the operands are BinaryOperators,
but the prior m_LogicalShift will also match a ConstantExpr. The cast<BinaryOperator> will
assert in a debug build, or reading an invalid value for BinaryOp from memory with
((BinaryOperator*)constantExpr)->getOpcode() will cause an error in a UBSAN build.

The test I added will fail without this change in debug/UBSAN builds, but not in release.

Patch by: @AndrewScheidecker (Andrew Scheidecker)

Differential Revision: https://reviews.llvm.org/D58049

llvm-svn: 353736
2019-02-11 19:26:27 +00:00
Alina Sbirlea 605b21739d [LICM&MSSA] Limit store hoisting.
Summary:
If there is no clobbering access for a store inside the loop, that store
can only be hoisted if there are no interfearing loads.
A more general verification introduced here: there are no loads that are
not optimized to an access outside the loop.
Addresses PR40586.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57967

llvm-svn: 353734
2019-02-11 19:07:15 +00:00
Chandler Carruth 1f5550326f Update more files added with the old header to the new one.
llvm-svn: 353667
2019-02-11 08:25:56 +00:00
Chandler Carruth b53f0e1145 Update files that were mistakenly added with the old file header to the
new one.

llvm-svn: 353665
2019-02-11 08:07:38 +00:00
Chandler Carruth 751d95fb9b [CallSite removal] Migrate ConstantFolding APIs and implementation to
`CallBase`.

Users have been updated. You can see how to update any out-of-tree
usages: pass `cast<CallBase>(CS.getInstruction())`.

llvm-svn: 353661
2019-02-11 07:51:44 +00:00
Chandler Carruth 3160734af1 [CallSite removal] Migrate the statepoint GC infrastructure to use the
`CallBase` class rather than `CallSite` wrappers.

I pushed this change down through most of the statepoint infrastructure,
completely removing the use of CallSite where I could reasonably do so.
I ended up making a couple of cut-points: generic call handling
(instcombine, TLI, SDAG). As soon as it hit truly generic handling with
users outside the immediate code, I simply transitioned into or out of
a `CallSite` to make this a reasonable sized chunk.

Differential Revision: https://reviews.llvm.org/D56122

llvm-svn: 353660
2019-02-11 07:42:30 +00:00
Fangrui Song 709a3e7488 [Local] Delete a redundant check. NFC
isInstructionTriviallyDead also performs the use_empty() check.

llvm-svn: 353637
2019-02-10 09:25:56 +00:00
Craig Topper a97857b5b5 [InstCombine] Fix an unused variable warning.
llvm-svn: 353630
2019-02-10 02:21:29 +00:00
Fangrui Song 6e679f8ba5 [GlobalOpt] Simplify __cxa_atexit elimination
cxxDtorIsEmpty checks callers recursively to determine if the
__cxa_atexit-registered function is empty, and eliminates the
__cxa_atexit call accordingly.

This recursive check is unnecessary as redundant instructions and
function calls can be removed by early-cse and inliner. In addition,
cxxDtorIsEmpty does not mark visited function and it may visit a
function exponential times (multiplication principle).

llvm-svn: 353603
2019-02-09 09:18:37 +00:00
Gabor Buella 53980b24b7 Extra processing for BitCast + PHI in InstCombine
For some specific cases with bitcast A->B->A with intervening PHI nodes InstCombiner::optimizeBitCastFromPhi transformation creates extra PHI nodes, which are actually a copy of already created PHI or in another words, they are redundant. These extra PHI nodes could lead to extra move instructions generated after DeSSA transformation. This happens when several conditions are met

 - SROA kicks in and creates new alloca;
 - there is a simple assignment L = R, which falls under 'canonicalize loads' done by combineLoadToOperationType (this transformation is by default). Exactly this transformation is the reason of bitcasts generated;
 - the alloca is then used in A->B->A + PHI chain;
 - there is a loop unrolling.

As a result optimizeBitCastFromPhi creates as many of PHI nodes for each new SROA alloca as loop unrolling factor is. These new extra PHI nodes are redundant actually except of one and should not be created. Moreover the idea of optimizeBitCastFromPhi is to get rid of the cast (when possible) but that doesn't happen in these conditions.

The proposed fix is to do the cast replacement for the whole calculated/accumulated PHI closure not for one cast only, which is an argument to the optimizeBitCastFromPhi. These will help to accomplish several things: 1) avoid extra PHI nodes generated as all casts which may trigger optimizeBitCastFromPhi transformation will be replaced, 3) bitcasts will be replaced, and 3) create more opportunities to remove dead code, which appears after the replacement.

A new test case shows that it's possible to get rid of all bitcasts completely and get quite good code reduction.

Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>

Reviewed By: Carrot

Differential Revision: https://reviews.llvm.org/D57053

llvm-svn: 353595
2019-02-09 01:44:28 +00:00
Sergey Dmitriev afd612ece9 [NFC] Avoid passing blocks vector to the OutlineRegionInfo constructor by value.
Reviewers: vsk, fhahn, davidxl

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57957

llvm-svn: 353582
2019-02-08 23:52:15 +00:00
Craig Topper 784929d045 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
Vedant Kumar 0e5dd512aa [CodeExtractor] Restore outputs after creating exit stubs
When CodeExtractor saves the result of InvokeInst at the first insertion
point of the 'normal destination' basic block, this block can be omitted
in the outlined region, so store is placed outside of the function. The
suggested solution is to process saving outputs after creating exit
stubs for new function, and stores will be placed in that blocks before
return in this case.

Patch by Sergei Kachkov!

Fixes llvm.org/PR40455.

Differential Revision: https://reviews.llvm.org/D57919

llvm-svn: 353562
2019-02-08 20:48:04 +00:00
Reid Kleckner 987d331fab [InstrProf] Implement static profdata registration
Summary:
The motivating use case is eliminating duplicate profile data registered
for the same inline function in two object files. Before this change,
users would observe multiple symbol definition errors with VC link, but
links with LLD would succeed.

Users (Mozilla) have reported that PGO works well with clang-cl and LLD,
but when using LLD without this static registration, we would get into a
"relocation against a discarded section" situation. I'm not sure what
happens in that situation, but I suspect that duplicate, unused profile
information was retained. If so, this change will reduce the size of
such binaries with LLD.

Now, Windows uses static registration and is in line with all the other
platforms.

Reviewers: davidxl, wmi, inglorion, void, calixte

Subscribers: mgorny, krytarowski, eraman, fedor.sergeev, hiraditya, #sanitizers, dmajor, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D57929

llvm-svn: 353547
2019-02-08 19:03:50 +00:00
Teresa Johnson 3ce8112dad ArgumentPromotion should copy all metadata to new Function
Summary:
ArgumentPromotion had code to specifically move the dbg metadata over to
the new function, but other metadata such as the function_entry_count
!prof metadata was not. Replace code that moved dbg metadata with a call
to copyMetadata. The old metadata is automatically removed when the old
Function is removed.

Reviewers: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57846

llvm-svn: 353537
2019-02-08 17:08:27 +00:00
Carlos Alberto Enciso 08dc50f2fb [DWARF] LLVM ERROR: Broken function found, while removing Debug Intrinsics.
Check that when SimplifyCFG is flattening a 'br', all their debug intrinsic instructions are removed, including any dbg.label referencing a label associated with the basic blocks being removed.

Differential Revision: https://reviews.llvm.org/D57444

llvm-svn: 353511
2019-02-08 10:57:26 +00:00
Max Kazantsev 6b63d3a277 [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge
`insert/deleteEdge` methods in DTU can make updates incorrectly in some cases
(see https://bugs.llvm.org/show_bug.cgi?id=40528), and it is recommended to
use `applyUpdates` methods instead when it is needed to make a mass update in CFG.

Differential Revision: https://reviews.llvm.org/D57316
Reviewed By: kuhar

llvm-svn: 353502
2019-02-08 08:12:41 +00:00
Sergey Dmitriev 807960e6ef [CodeExtractor] Update function's assumption cache after extracting blocks from it
Summary: Assumption cache's self-updating mechanism does not correctly handle the case when blocks are extracted from the function by the CodeExtractor. As a result function's assumption cache may have stale references to the llvm.assume calls that were moved to the outlined function. This patch fixes this problem by removing extracted llvm.assume calls from the function’s assumption cache.

Reviewers: hfinkel, vsk, fhahn, davidxl, sanjoy

Reviewed By: hfinkel, vsk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57215

llvm-svn: 353500
2019-02-08 06:55:18 +00:00
Quentin Colombet 96f54de8ff [InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible
This commit teaches InstCombine how to replace an atomicrmw operation
into a simple load atomic.
For a given `atomicrmw <op>`, this is possible when:
1. The ordering of that operation is compatible with a load (i.e.,
   anything that doesn't have a release semantic).
2. <op> does not modify the value being stored

Differential Revision: https://reviews.llvm.org/D57854

llvm-svn: 353471
2019-02-07 21:27:23 +00:00
Florian Hahn f557a94aa3 [LV] Remove unnecessary assignment to UserIC.
llvm-svn: 353469
2019-02-07 21:23:37 +00:00
Sanjay Patel 781d883862 [InstCombine] Fix crashing from (icmp (bitcast ([su]itofp X)), Y)
This fixes a class of bugs introduced by D44367, 
which transforms various cases of icmp (bitcast ([su]itofp X)), Y to icmp X, Y. 
If the bitcast is between vector types with a different number of elements, 
the current code will produce bad IR along the lines of: icmp <N x i32> ..., <M x i32> <...>.

This patch suppresses the transform if the bitcast changes the number of vector elements.

Patch by: @AndrewScheidecker (Andrew Scheidecker)

Differential Revision: https://reviews.llvm.org/D57871

llvm-svn: 353467
2019-02-07 21:12:01 +00:00
Sanjay Patel e7f46c3db3 [InstCombine] refactor folds for (icmp (bitcast X), Y); NFCI
llvm-svn: 353462
2019-02-07 20:54:09 +00:00
Florian Hahn ba5acbc4fe [LV] Prevent interleaving if computeMaxVF returned None.
As discussed in D57382, interleaving should be avoided if computeMaxVF
returns None, same as we currently do for vectorization.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477

Reviewers: Ayal, dcaballe, hsaito, mkuper, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D57837

llvm-svn: 353461
2019-02-07 20:49:10 +00:00
Reid Kleckner f21c022380 [InstrProf] Avoid reconstructing Triple, NFC
llvm-svn: 353439
2019-02-07 18:16:22 +00:00
Teresa Johnson c36c10ddfb [HotColdSplit] With PGO add profile entry metadata to split cold function
Summary:
When compiling with profile data, ensure the split cold function gets
cold function_entry_count metadata (just use 0 since it should be cold).
Otherwise with function sections it will not be placed in the unlikely
text section with other cold code.

Reviewers: vsk

Subscribers: sebpop, hiraditya, davidxl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57900

llvm-svn: 353434
2019-02-07 17:50:35 +00:00
Sam Parker 67756c09f2 [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373

llvm-svn: 353403
2019-02-07 13:32:54 +00:00
Alina Sbirlea 6cba96ed52 [LICM/MSSA] Add promotion to scalars by building an AliasSetTracker with MemorySSA.
Summary:
Experimentally we found that promotion to scalars carries less benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if less than AccessCapForMSSAPromotion exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.

Reviewers: sanjoy, chandlerc

Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D56625

llvm-svn: 353339
2019-02-06 20:25:17 +00:00
Sanjay Patel 68bc5fb0ad [InstCombine] X | C == C --> (X & ~C) == 0
We should canonicalize to one of these forms,
and compare-with-zero could be more conducive
to follow-on transforms. This also leads to
generally better codegen as shown in PR40611:
https://bugs.llvm.org/show_bug.cgi?id=40611

llvm-svn: 353313
2019-02-06 16:43:54 +00:00
Max Kazantsev cd48ac3661 [NFC] Simplify check in guard widening
llvm-svn: 353290
2019-02-06 11:27:00 +00:00
Max Kazantsev 36b392cbe4 [NFC] Factor out detatchment of dead blocks from their erasing
llvm-svn: 353277
2019-02-06 07:56:36 +00:00
Max Kazantsev a4ccfc1841 [LoopSimplifyCFG] Do not count dead exit blocks twice, make CFG simpler
llvm-svn: 353276
2019-02-06 07:49:17 +00:00
Max Kazantsev 0d7ad3c9a3 [NFC] Revert rL353274
llvm-svn: 353275
2019-02-06 06:33:02 +00:00
Max Kazantsev 61e6ffc398 [NFC] Extend API of DeleteDeadBlock(s) to collect updates without DTU
llvm-svn: 353274
2019-02-06 06:00:02 +00:00
Max Kazantsev bad4db8b1a [NFC] Replace readonly SmallVectorImpl with ArrayRef
llvm-svn: 353273
2019-02-06 05:40:31 +00:00
Teresa Johnson 716abbeb43 [HotColdSplit] Move splitting after instrumented PGO use
Summary:
Follow up to D57082 which moved splitting earlier in the pipeline, in
order to perform it before inlining. However, it was moved too early,
before the IR is annotated with instrumented PGO data. This caused the
splitting to incorrectly determine cold functions.

Move it to just after PGO annotation (still before inlining), in both
pass managers.

Reviewers: vsk, hiraditya, sebpop

Subscribers: mehdi_amini, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57805

llvm-svn: 353270
2019-02-06 04:29:39 +00:00
Richard Trieu 5f436fc57a Move DomTreeUpdater from IR to Analysis
DomTreeUpdater depends on headers from Analysis, but is in IR.  This is a
layering violation since Analysis depends on IR.  Relocate this code from IR
to Analysis to fix the layering violation.

llvm-svn: 353265
2019-02-06 02:52:52 +00:00
Vedant Kumar bd94b4287c [HotColdSplit] Do not split out `resume` instructions
Resumes that are not reachable from a cleanup landing pad are considered
to be unreachable. It’s not safe to split them out.

rdar://47808235

llvm-svn: 353242
2019-02-05 23:39:02 +00:00
Sanjay Patel cddb1e5469 [InstCombine] limit extracting shuffle transform based on uses
As discussed in D53037, this can lead to worse codegen, and we
don't generally expect the backend to be able to optimize
arbitrary shuffles. If there's only one use of the 1st shuffle,
that means it's getting removed, so that should always be
safe.

llvm-svn: 353235
2019-02-05 22:58:45 +00:00
Rong Xu ce10d5ead4 [PGO] Use a function for creating variable for profile file name. NFC.
Factored out the code for creating variable for profile file name to
a function.

llvm-svn: 353230
2019-02-05 22:34:45 +00:00
Jeremy Morse 84ca706be1 [DebugInfo][NFCI] Split salvageDebugInfo into helper functions
Some use cases are appearing where salvaging is needed that does not
correspond to an instruction being deleted -- for example an instruction
being sunk, or a Value not being available in a block being isel'd.

Enable more fine grained control over how salavging occurs by splitting
the logic into helper functions, separating things that are specific to
working on DbgVariableIntrinsics from those specific to interpreting IR
and building DIExpressions.

Differential Revision: https://reviews.llvm.org/D57696

llvm-svn: 353156
2019-02-05 11:11:28 +00:00
Max Kazantsev d5e595b7a6 [LSR] Check SCEV on isZero() after extend. PR40514
When LSR first adds SCEVs to BaseRegs, it only does it if `isZero()` has
returned false. In the end, in invocation of `InsertFormula`, it asserts that
all values there are still not zero constants. However between these two
points, it makes some transformations, in particular extends them to wider
type.

SCEV does not give us guarantee that if `S` is not a constant zero, then
`sext(S)` is also not a constant zero. It might have missed some optimizing
transforms when it was calculating `S` and then made them when it took `sext`.
For example, it may happen if previously optimizing transforms were limited
by depth or somehow else.

This patch adds a bailout when we may end up with a zero SCEV after extension.

Differential Revision: https://reviews.llvm.org/D57565
Reviewed By: samparker

llvm-svn: 353136
2019-02-05 04:30:37 +00:00
Richard Trieu a9354b2f33 Fix narrowing issue from r353129
llvm-svn: 353134
2019-02-05 02:26:03 +00:00
Wei Mi 4901f371a2 [SamplePGO][NFC] Minor improvement to replace a temporary vector with a
brace-enclosed init list.

Differential Revision: https://reviews.llvm.org/D57726

llvm-svn: 353129
2019-02-05 00:57:50 +00:00
Teresa Johnson 4bdf82ce79 [SamplePGO] Minor efficiency improvement in samplePGO ICP
Summary:
When attaching prof metadata to promoted direct calls in SamplePGO
mode, no need to construct and use a SmallVector to pass a single count
to the ArrayRef parameter, we can simply use a brace-enclosed init list.

This made a small but consistent improvement for a ThinLTO backend
compile I was measuring.

Reviewers: wmi

Subscribers: mehdi_amini, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57706

llvm-svn: 353123
2019-02-05 00:18:38 +00:00
Julian Lettner 29ac3a5b82 [SanitizerCoverage] Clang crashes if user declares `__sancov_lowest_stack` variable
Summary:
If the user declares or defines `__sancov_lowest_stack` with an
unexpected type, then `getOrInsertGlobal` inserts a bitcast and the
following cast fails:
```
Constant *SanCovLowestStackConstant =
       M.getOrInsertGlobal(SanCovLowestStackName, IntptrTy);
SanCovLowestStack = cast<GlobalVariable>(SanCovLowestStackConstant);
```

This variable is a SanitizerCoverage implementation detail and the user
should generally never have a need to access it, so we emit an error
now.

rdar://problem/44143130

Reviewers: morehouse

Differential Revision: https://reviews.llvm.org/D57633

llvm-svn: 353100
2019-02-04 22:06:30 +00:00
Nicolai Haehnle a69146e67e [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded
Summary:
The fix added in r352904 is not quite correct, or rather misleading:

1. When the texfailctrl (TFC) argument was non-constant, the fix assumed
   non-TFE/LWE, which is incorrect.

2. Regardless, this code path cannot even be hit for correct
   TFE/LWE-enabled calls, because those return a struct. Added
   a test case for those for completeness.

Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf

Reviewers: hliao, dstuttard, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57681

llvm-svn: 353097
2019-02-04 21:24:19 +00:00
Philip Pfaffe 0ee6a933ce [NewPM][MSan] Add Options Handling
Summary: This patch enables passing options to msan via the passes pipeline, e.e., -passes=msan<recover;kernel;track-origins=4>.

Reviewers: chandlerc, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57640

llvm-svn: 353090
2019-02-04 21:02:49 +00:00
Michael Kruse 70560a0a2c [WarnMissedTransforms] Do not warn about already vectorized loops.
LoopVectorize adds llvm.loop.isvectorized, but leaves
llvm.loop.vectorize.enable. Do not consider such a loop for user-forced
vectorization since vectorization already happened -- by prioritizing
llvm.loop.isvectorized except for TM_SuppressedByUser.

Fixes http://llvm.org/PR40546

Differential Revision: https://reviews.llvm.org/D57542

llvm-svn: 353082
2019-02-04 19:55:59 +00:00
Max Kazantsev 56b57e3f53 [NFC] Make a check in GuardWidening more obvious
llvm-svn: 353038
2019-02-04 10:41:17 +00:00
Max Kazantsev 09802f41cc [NFC] Rename variables to reflect the actual status of GuardWidening
llvm-svn: 353036
2019-02-04 10:31:18 +00:00
Max Kazantsev 13ab5cbb64 [NFC] Remove redundant parameters for better readability
llvm-svn: 353034
2019-02-04 10:20:51 +00:00
Max Kazantsev 65970aa24d [NFC] Replace equivalent condition for better readability
llvm-svn: 353032
2019-02-04 09:55:18 +00:00
Davide Italiano 73929c4d24 [LoopIdiomRecognize] @llvm.dbg values shouldn't affect the transformation.
Summary: PR40564

Reviewers: aprantl, rnk

Subscribers: llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57629

llvm-svn: 353007
2019-02-03 20:33:20 +00:00
Florian Hahn dd2ef0af46 [LCSSA] Handle case with single new PHI faster.
If there is only a single available value, all uses must be dominated by
the single value and there is no need to search for a reaching
definition.

This drastically speeds up LCSSA in some cases. For the test case
from PR37202, it speeds up LCSSA construction by 4 times.

Time-passes without this patch for test case from PR37202:

    Total Execution Time: 29.9285 seconds (29.9276 wall clock)

    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
    5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
    4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
    4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
    2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
    2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
    1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
    1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
    1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
    1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
    1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
    1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3

Time passes with this patch

  Total Execution Time: 19.2802 seconds (19.2813 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   4.4234 ( 23.2%)   0.0038 (  2.0%)   4.4272 ( 23.0%)   4.4273 ( 23.0%)  Unswitch loops
   2.3828 ( 12.5%)   0.0020 (  1.1%)   2.3848 ( 12.4%)   2.3847 ( 12.4%)  Unroll loops
   1.8714 (  9.8%)   0.0020 (  1.1%)   1.8734 (  9.7%)   1.8735 (  9.7%)  Loop Invariant Code Motion
   1.7973 (  9.4%)   0.0022 (  1.2%)   1.7995 (  9.3%)   1.8003 (  9.3%)  Value Propagation
   1.4010 (  7.3%)   0.0033 (  1.8%)   1.4043 (  7.3%)   1.4044 (  7.3%)  Induction Variable Simplification
   0.9978 (  5.2%)   0.0244 ( 13.1%)   1.0222 (  5.3%)   1.0224 (  5.3%)  Loop-Closed SSA Form Pass #2
   0.9611 (  5.0%)   0.0257 ( 13.8%)   0.9868 (  5.1%)   0.9868 (  5.1%)  Loop-Closed SSA Form Pass
   0.5856 (  3.1%)   0.0015 (  0.8%)   0.5871 (  3.0%)   0.5869 (  3.0%)  Unroll loops #2
   0.4132 (  2.2%)   0.0012 (  0.7%)   0.4145 (  2.1%)   0.4143 (  2.1%)  Loop Invariant Code Motion #3

Reviewers: efriedma, davide, mzolotukhin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D57033

llvm-svn: 352960
2019-02-02 15:26:05 +00:00
Florian Hahn 509b48a64a [LCSSA] Add expensive verification of LCSSA form for sub-loops.
This assertion makes sure all sub-loops are in LCSSA form before
bringing their parent in LCSSA form. This precondition was added to
formLCSSA in D56848.

Reviewers: davide, efriedma, mzolotukhin

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D56921

llvm-svn: 352958
2019-02-02 14:42:27 +00:00
Julian Lettner f82d8924ef [ASan] Do not instrument other runtime functions with `__asan_handle_no_return`
Summary:
Currently, ASan inserts a call to `__asan_handle_no_return` before every
`noreturn` function call/invoke. This is unnecessary for calls to other
runtime funtions. This patch changes ASan to skip instrumentation for
functions calls marked with `!nosanitize` metadata.

Reviewers: TODO

Differential Revision: https://reviews.llvm.org/D57489

llvm-svn: 352948
2019-02-02 02:05:16 +00:00
James Y Knight 291f791ef1 [opaque pointer types] Pass function type for CallBase::setCalledFunction.
Differential Revision: https://reviews.llvm.org/D57174

llvm-svn: 352914
2019-02-01 20:44:54 +00:00
James Y Knight 7716075a17 [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

llvm-svn: 352913
2019-02-01 20:44:47 +00:00
James Y Knight 14359ef1b6 [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
James Y Knight d9e85a0861 [opaque pointer types] Pass function types to InvokeInst creation.
This cleans up all InvokeInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57171

llvm-svn: 352910
2019-02-01 20:43:34 +00:00
James Y Knight 7976eb5838 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

llvm-svn: 352909
2019-02-01 20:43:25 +00:00
Michael Liao 8b323f53eb [InstCombine] Extra null-checking on TFE/LWE support
- If that operand is not ConstantInt, skip enabling TFE/LWE.

Differential Revision: https://reviews.llvm.org/D57539

llvm-svn: 352904
2019-02-01 19:53:44 +00:00
Sanjay Patel fbcbac7174 [InstCombine] reduce duplicate code; NFC
An unused variable problem was introduced with rL352870 
and stubbed out with rL352871, but we can make a better
fix by actually using the local variable in code rather 
than just the assert.  

llvm-svn: 352873
2019-02-01 14:37:49 +00:00
Fangrui Song 8495aabec2 [InstCombine] Fix -Wunused-variable when -DLLVM_ENABLE_ASSERTIONS=off
llvm-svn: 352871
2019-02-01 14:22:02 +00:00
Sanjay Patel be23a91fcd [InstCombine] try to reduce x86 addcarry to generic uaddo intrinsic
If we can reduce the x86-specific intrinsic to the generic op, it allows existing 
simplifications and value tracking folds. AFAICT, this always results in identical 
x86 codegen in the non-reduced case...which should be true because we semi-generically 
(too aggressively IMO) convert to llvm.uadd.with.overflow in CGP, so the DAG/isel must 
already combine/lower this intrinsic as expected.

This isn't quite what was requested in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but we want to have these kinds of folds early for efficiency and to enable greater 
simplifications. For the case in the bug report where we have:
_addcarry_u64(0, ahi, 0, &ahi)
...this gets completely simplified away in IR.

Differential Revision: https://reviews.llvm.org/D57453

llvm-svn: 352870
2019-02-01 14:14:47 +00:00
Yevgeny Rouban 15b17d0a7c Provide reason messages for unviable inlining
InlineCost's isInlineViable() is changed to return InlineResult
instead of bool. This provides messages for failure reasons and
allows to get more specific messages for cases where callsites
are not viable for inlining.

Reviewed By: xbolva00, anemet

Differential Revision: https://reviews.llvm.org/D57089

llvm-svn: 352849
2019-02-01 10:44:43 +00:00
Yevgeny Rouban 4cdd783955 [SLPVectorizer] Get rid of IndexQueue array from vectorizeStores. NFCI.
Indices are checked as they are generated. No need to fill the whole array of indices.

Differential Revision: https://reviews.llvm.org/D57144

llvm-svn: 352839
2019-02-01 06:44:08 +00:00
James Y Knight 13680223b9 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.

Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352827
2019-02-01 02:28:03 +00:00
Kostya Serebryany a78a44d480 [sanitizer-coverage] prune trace-cmp instrumentation for CMP isntructions that feed into the backedge branch. Instrumenting these CMP instructions is almost always useless (and harmful) for fuzzing
llvm-svn: 352818
2019-01-31 23:43:00 +00:00
James Y Knight fadf25068e Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it."
This reverts commit f47d6b38c7 (r352791).

Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.

llvm-svn: 352800
2019-01-31 21:51:58 +00:00
Alina Sbirlea e271889291 [EarlyCSE & MSSA] Cleanup special handling for removing MemoryAccesses.
Summary: Moving special handling to MemorySSAUpdater in D57199.

Reviewers: gberry, george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D57200

llvm-svn: 352794
2019-01-31 21:12:41 +00:00
James Y Knight f47d6b38c7 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352791
2019-01-31 20:35:56 +00:00
Craig Topper c1892ec15a [CallSite removal] Remove CallSite uses from InstCombine.
Reviewers: chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57494

llvm-svn: 352771
2019-01-31 17:23:29 +00:00
Teresa Johnson f59242e5ff Recommit "[ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader"
Recommit of r352763 with fix for use after free.

llvm-svn: 352770
2019-01-31 17:18:11 +00:00
Teresa Johnson 4877715ee6 Revert "[ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader"
This reverts commit r352763.

Causing a couple bot failures, root cause pointed to by sanitizer bot:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/28909/steps/annotate/logs/stdio

Use after free. I understand the issue but will revert and test with fix
before recommitting.

llvm-svn: 352768
2019-01-31 16:46:14 +00:00
Teresa Johnson 992b53fd16 [ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader
Summary:
COFF requires that COMDAT name match that of the leader. When we promote
and rename an internal leader in ThinLTO due to an import, ensure we
subsequently rename the associated COMDAT. Similar to D31963 which did
this during ThinLTO module splitting.

Fixes PR40414.

Reviewers: pcc, inglorion

Subscribers: mehdi_amini, dexonsmith, dmajor, llvm-commits

Differential Revision: https://reviews.llvm.org/D57395

llvm-svn: 352763
2019-01-31 16:00:15 +00:00
Max Kazantsev f392bc846f Default lowering for experimental.widenable.condition
Introduces a pass that provides default lowering strategy for the
`experimental.widenable.condition` intrinsic, replacing all its uses with
`i1 true`.

Differential Revision: https://reviews.llvm.org/D56096
Reviewed By: reames

llvm-svn: 352739
2019-01-31 09:10:17 +00:00
Dmitry Venikov 8817658836 [InstCombine] Missed optimization in math expression: simplify calls exp functions
Summary: This patch enables folding following expressions under -ffast-math flag: exp(X) * exp(Y) -> exp(X + Y), exp2(X) * exp2(Y) -> exp2(X + Y). Motivation: https://bugs.llvm.org/show_bug.cgi?id=35594

Reviewers: hfinkel, spatel, efriedma, lebedev.ri

Reviewed By: spatel, lebedev.ri

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D41342

llvm-svn: 352730
2019-01-31 06:28:10 +00:00
Erik Pilkington 600e9deacf Add a 'dynamic' parameter to the objectsize intrinsic
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.

rdar://32212419

Differential revision: https://reviews.llvm.org/D56761

llvm-svn: 352664
2019-01-30 20:34:35 +00:00
Philip Reames c71e996aed SimplifyDemandedVectorElts for all intrinsics
The point is that this simplifies integration of new intrinsics into SimplifiedDemandedVectorElts, and ensures we don't miss any existing ones.

This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output.  This is due to order of transforms w/in instcombine resulting in two slightly different fixed points.  That's something we should fix, but isn't a problem w/this patch per se.

Differential Revision: https://reviews.llvm.org/D57398

llvm-svn: 352653
2019-01-30 19:21:11 +00:00
Max Kazantsev 365021cc15 Properly use DT.verify in LoopSimplifyCFG
llvm-svn: 352621
2019-01-30 12:32:19 +00:00
Max Kazantsev 34eeeec3ae Enable IRCE for narrow latch by defailt
llvm-svn: 352619
2019-01-30 11:25:12 +00:00
Alina Sbirlea f9027e554a Check bool attribute value in getOptionalBoolLoopAttribute.
Summary:
Check the bool value of the attribute in getOptionalBoolLoopAttribute
not just its existance.
Eliminates the warning noise generated when vectorization is explicitly disabled.

Reviewers: Meinersbur, hfinkel, dmgreen

Subscribers: jlebar, sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D57260

llvm-svn: 352555
2019-01-29 22:33:20 +00:00
Sanjay Patel 18db56209c [InstCombine] canonicalize cmp/select form of uadd saturate with constant
I'm circling back around to a loose end from D51929.

The backend (either CGP or DAG) doesn't recognize this pattern, so we end up with different asm for these IR variants.

Regardless of any future changes to canonicalize to saturation/overflow intrinsics, we want to get raw IR variations 
into the minimal number of raw IR forms. If/when we can canonicalize to intrinsics, that will make that step easier.

  Pre: C2 == ~C1
  %a = add i32 %x, C1
  %c = icmp ugt i32 %x, C2
  %r = select i1 %c, i32 -1, i32 %a
  =>
  %a = add i32 %x, C1
  %c2 = icmp ult i32 %x, C2
  %r = select i1 %c2, i32 %a, i32 -1

  https://rise4fun.com/Alive/pkH

Differential Revision: https://reviews.llvm.org/D57352

llvm-svn: 352536
2019-01-29 20:02:45 +00:00
Bjorn Pettersson d014d576a9 [IPCP] Don't crash due to arg count/type mismatch between caller/callee
Summary:
This patch avoids an assert in IPConstantPropagation when
there is a argument count/type mismatch between the caller and
the callee.

While this is actually UB on C-level (clang emits a warning),
the IR verifier seems to accept it. I'm not sure what other
frontends/languages might think about this, so simply bailing out
to avoid hitting an assert (in CallSiteBase<>::getArgOperand or
Value::doRAUW) seems like a simple solution.

The problem is exposed by the fact that AbstractCallSites will look
through a bitcast at the callee position of a call/invoke.

Reviewers: jdoerfert, reames, efriedma

Reviewed By: jdoerfert, efriedma

Subscribers: eli.friedman, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D57052

llvm-svn: 352469
2019-01-29 10:19:44 +00:00
Philip Reames 6c5341bc5a Demanded elements support for vector GEPs
GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation, and (eventually) allowing further optimizations

Differential Revision: https://reviews.llvm.org/D57177

llvm-svn: 352440
2019-01-28 23:24:49 +00:00
Teresa Johnson 5b2f6a1bc2 [ThinLTO] Refine reachability check to fix compile time increase
Summary:
A recent fix to the ThinLTO whole program dead code elimination (D56117)
increased the thin link time on a large MSAN'ed binary by 2x.
It's likely that the time increased elsewhere, but was more noticeable
here since it was already large and ended up timing out.

That change made it so we would repeatedly scan all copies of linkonce
symbols for liveness every time they were encountered during the graph
traversal. This was needed since we only mark one copy of an aliasee as
live when we encounter a live alias. This patch fixes the issue in a
more efficient manner by simply proactively visiting the aliasee (thus
marking all copies live) when we encounter a live alias.

Two notes: One, this requires a hash table lookup (finding the aliasee
summary in the index based on aliasee GUID). However, the impact of this
seems to be small compared to the original pre-D56117 thin link time. It
could be addressed if we keep the aliasee ValueInfo in the alias summary
instead of the aliasee GUID, which I am exploring in a separate patch.

Second, we only populate the aliasee GUID field when reading summaries
from bitcode (whether we are reading individual summaries and merging on
the fly to form the compiled index, or reading in a serialized combined
index). Thankfully, that's currently the only way we can get to this
code as we don't yet support reading summaries from LLVM assembly
directly into a tool that performs the thin link (they must be converted
to bitcode first). I added a FIXME, however I have the fix under test
already. The easiest fix is to simply populate this field always, which
isn't hard, but more likely the change I am exploring to store the
ValueInfo instead as described above will subsume this. I don't want to
hold up the regression fix for this though.

Reviewers: trentxintong

Subscribers: mehdi_amini, inglorion, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D57203

llvm-svn: 352438
2019-01-28 22:27:05 +00:00
Vedant Kumar 1c3694a4d4 [CodeExtractor] Add support for the `swifterror` attribute
When passing a `swifterror` argument or alloca as an input to an
extraction region, mark the input parameter `swifterror`.

llvm-svn: 352408
2019-01-28 19:13:37 +00:00
Alina Sbirlea 932108703a [SimpleLoopUnswitch] Early check exit for trivial unswitch with MemorySSA.
Summary:
If MemorySSA is avaiable, we can skip checking all instructions if block has any Defs.
(volatile loads are also Defs).
We still need to check all instructions for "canThrow", even if no Defs are found.

Reviewers: chandlerc

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D57129

llvm-svn: 352393
2019-01-28 17:48:45 +00:00
Alina Sbirlea a34bcbf335 Revert rL352238.
llvm-svn: 352241
2019-01-25 21:12:08 +00:00
Alina Sbirlea 890a8e575f [WarnMissedTransforms] Set default to 1.
Summary:
Set default value for retrieved attributes to 1, since the check is against 1.
Eliminates the warning noise generated when the attributes are not present.

Reviewers: sanjoy

Subscribers: jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D57253

llvm-svn: 352238
2019-01-25 20:51:55 +00:00
Vedant Kumar db3f9774ee [HotColdSplit] Introduce a cost model to control splitting behavior
The main goal of the model is to avoid *increasing* function size, as
that would eradicate any memory locality benefits from splitting. This
happens when:

  - There are too many inputs or outputs to the cold region. Argument
    materialization and reloads of outputs have a cost.

  - The cold region has too many distinct exit blocks, causing a large
    switch to be formed in the caller.

  - The code size cost of the split code is less than the cost of a
    set-up call.

A secondary goal is to prevent excessive overall binary size growth.

With the cost model in place, I experimented to find a splitting
threshold that works well in practice. To make warm & cold code easily
separable for analysis purposes, I moved split functions to a "cold"
section. I experimented with thresholds between [0, 4] and set the
default to the threshold which minimized geomean __text size.

Experiment data from building LNT+externals for X86 (N = 639 programs,
all sizes in bytes):

| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os**       | 1736.3           | 0, n=0           | 10961.6        |
| -Os, thresh=0 | 1740.53          | 124.482, n=134   | 11014          |
| -Os, thresh=1 | 1734.79          | 57.8781, n=90    | 10978.6        |
| -Os, thresh=2 | ** 1733.85 **    | 65.6604, n=61    | 10977.6        |
| -Os, thresh=3 | 1733.85          | 65.3071, n=61    | 10977.6        |
| -Os, thresh=4 | 1735.08          | 67.5156, n=54    | 10965.7        |
| **-Oz**       | 1554.4           | 0, n=0           | 10153          |
| -Oz, thresh=2 | ** 1552.2 **     | 65.633, n=61     | 10176          |
| **-O3**       | 2563.37          | 0, n=0           | 13105.4        |
| -O3, thresh=2 | ** 2559.49 **    | 71.1072, n=61    | 13162.4        |

Picking thresh=2 reduces the geomean __text section size by 0.14% at
-Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that
TEXT size is page-aligned, whereas section sizes are byte-aligned.

Experiment data from building LNT+externals for ARM64 (N = 558 programs,
all sizes in bytes):

| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os**       | 1763.96          | 0, n=0           | 42934.9        |
| -Os, thresh=2 | ** 1760.9 **     | 76.6755, n=61    | 42934.9        |

Picking thresh=2 reduces the geomean __text section size by 0.17% at
-Os and causes no growth in the TEXT segment.

Measurements were done with D57082 (r352080) applied.

Differential Revision: https://reviews.llvm.org/D57125

llvm-svn: 352228
2019-01-25 18:30:37 +00:00
Max Kazantsev 38cd9acbb9 [LoopSimplifyCFG] Fix inconsistency in blocks in loop markup
2nd part of D57095 with the same reason, just in another place. We never
fold branches that are not immediately in the current loop, but this check
is missing in `IsEdgeLive` As result, it may think that the edge in subloop is
dead while it's live. It's a pessimization in the current stance.

Differential Revision: https://reviews.llvm.org/D57147
Reviewed By: rupprecht	

llvm-svn: 352170
2019-01-25 05:05:02 +00:00
Vedant Kumar 9d70f2b939 [HotColdSplit] Describe the pass in more detail, NFC
llvm-svn: 352161
2019-01-25 03:22:38 +00:00
Vedant Kumar 65de025d64 [HotColdSplit] Split more aggressively before/after cold invokes
While a cold invoke itself and its unwind destination can't be
extracted, code which unconditionally executes before/after the invoke
may still be profitable to extract.

With cost model changes from D57125 applied, this gives a 3.5% increase
in split text across LNT+externals on arm64 at -Os.

llvm-svn: 352160
2019-01-25 03:22:23 +00:00
Peter Collingbourne 1a8acfb768 hwasan: If we split the entry block, move static allocas back into the entry block.
Otherwise they are treated as dynamic allocas, which ends up increasing
code size significantly. This reduces size of Chromium base_unittests
by 2MB (6.7%).

Differential Revision: https://reviews.llvm.org/D57205

llvm-svn: 352152
2019-01-25 02:08:46 +00:00
Haojian Wu b9613a39b8 Fix a compiler error introduced in r352093.
llvm-svn: 352098
2019-01-24 20:30:48 +00:00
Alina Sbirlea 0a4367209c [LICM] Cleanup duplicated code. [NFCI]
llvm-svn: 352093
2019-01-24 19:57:30 +00:00
Alina Sbirlea 52f6e2a173 [MemorySSA +LICM CFHoist] Solve PR40317.
Summary:
MemorySSA needs updating each time an instruction is moved.
LICM and control flow hoisting re-hoists instructions, thus needing another update when re-moving those instructions.
Pending cleanup: the MSSA update is duplicated, should be moved inside moveInstructionBefore.

Reviewers: jnspaulsson

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D57176

llvm-svn: 352092
2019-01-24 19:48:35 +00:00
Vedant Kumar ef1ebed1c6 [HotColdSplit] Move splitting earlier in the pipeline
Performing splitting early has several advantages:

  - Inhibiting inlining of cold code early improves code size. Compared
    to scheduling splitting at the end of the pipeline, this cuts code
    size growth in half within the iOS shared cache (0.69% to 0.34%).

  - Inhibiting inlining of cold code improves compile time. There's no
    need to inline split cold functions, or to inline as much *within*
    those split functions as they are marked `minsize`.

  - During LTO, extra work is only done in the pre-link step. Less code
    must be inlined during cross-module inlining.

An additional motivation here is that the most common cold regions
identified by the static/conservative splitting heuristic can (a) be
found before inlining and (b) do not grow after inlining. E.g.
__assert_fail, os_log_error.

The disadvantages are:

  - Some opportunities for splitting out cold code may be missed. This
    gap can potentially be narrowed by adding a worklist algorithm to the
    splitting pass.

  - Some opportunities to reduce code size may be lost (e.g. store
    sinking, when one side of the CFG diamond is split). This does not
    outweigh the code size benefits of splitting earlier.

On net, splitting early in the pipeline has substantial code size
benefits, and no major effects on memory locality or performance. We
measured memory locality using ktrace data, and consistently found that
10% fewer pages were needed to capture 95% of text page faults in key
iOS benchmarks. We measured performance on frequency-stabilized iOS
devices using LNT+externals.

This reverses course on the decision made to schedule splitting late in
r344869 (D53437).

Differential Revision: https://reviews.llvm.org/D57082

llvm-svn: 352080
2019-01-24 18:55:49 +00:00
Julian Lettner b62e9dc46b Revert "[Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls"
This reverts commit cea84ab93a.

llvm-svn: 352069
2019-01-24 18:04:21 +00:00
Philip Reames 4d683ee7e3 [RS4GC] Be slightly less conservative for gep vector_base, scalar_idx
After submitting https://reviews.llvm.org/D57138, I realized it was slightly more conservative than needed. The scalar indices don't appear to be a problem on a vector gep, we even had a test for that.

Differential Revision: https://reviews.llvm.org/D57161

llvm-svn: 352061
2019-01-24 16:34:00 +00:00
Philip Reames a657510eb7 [RS4GC] Avoid crashing on gep scalar_base, vector_idx
This is an alternative to https://reviews.llvm.org/D57103.  After discussion, we dedicided to check this in as a temporary workaround, and pursue a true fix under the original thread.

The issue at hand is that the base rewriting algorithm doesn't consider the fact that GEPs can turn a scalar input into a vector of outputs. We had handling for scalar GEPs and fully vector GEPs (i.e. all vector operands), but not the scalar-base + vector-index forms. A true fix here requires treating GEP analogously to extractelement or shufflevector.

This patch is merely a workaround. It simply hides the crash at the cost of some ugly code gen for this presumable very rare pattern.

Differential Revision: https://reviews.llvm.org/D57138

llvm-svn: 352059
2019-01-24 16:08:18 +00:00
Florian Hahn bed7f9eab2 Revert "[HotColdSplitting] Get DT and PDT from the pass manager."
This reverts commit a6982414ed (llvm-svn: 352036),
because it causes a memory leak in the pass manager. Failing bot

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio

llvm-svn: 352041
2019-01-24 11:22:08 +00:00
Florian Hahn a6982414ed [HotColdSplitting] Get DT and PDT from the pass manager.
Instead of manually computing DT and PDT, we can get the from the pass
manager, which ideally has them already cached. With the new pass
manager, we could even preserve DT/PDT on a per function basis in a
module pass.

I think this also addresses the TODO about re-using the computed DTs for
BFI. IIUC, GetBFI will fetch the DT from the pass manager and when we
will fetch the cached version later.

Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D57092

llvm-svn: 352036
2019-01-24 09:44:52 +00:00
Max Kazantsev 56515a2c76 [LoopSimplifyCFG] Fix inconsistency in live blocks markup
When we choose whether or not we should mark block as dead, we have an
inconsistent logic in markup of live blocks.
- We take candidate IF its terminator branches on constant AND it is immediately
  in current loop;
- We mark successor live IF its terminator doesn't branch by constant OR it branches
  by constant and the successor is its always taken block.

What we are missing here is that when the terminator branches on a constant but is
not taken as a candidate because is it not immediately in the current loop, we will
mark only one (always taken) successor as live. Therefore, we do NOT do the actual
folding but may NOT mark one of the successors as live. So the result of markup is
wrong in this case, and we may then hit various asserts.

Thanks Jordan Rupprech for reporting this!

Differential Revision: https://reviews.llvm.org/D57095
Reviewed By: rupprecht

llvm-svn: 352024
2019-01-24 05:20:29 +00:00
Julian Lettner cea84ab93a [Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls
Summary:
UBSan wants to detect when unreachable code is actually reached, so it
adds instrumentation before every `unreachable` instruction. However,
the optimizer will remove code after calls to functions marked with
`noreturn`. To avoid this UBSan removes `noreturn` from both the call
instruction as well as from the function itself. Unfortunately, ASan
relies on this annotation to unpoison the stack by inserting calls to
`_asan_handle_no_return` before `noreturn` functions. This is important
for functions that do not return but access the the stack memory, e.g.,
unwinder functions *like* `longjmp` (`longjmp` itself is actually
"double-proofed" via its interceptor). The result is that when ASan and
UBSan are combined, the `noreturn` attributes are missing and ASan
cannot unpoison the stack, so it has false positives when stack
unwinding is used.

Changes:
  # UBSan now adds the `expect_noreturn` attribute whenever it removes
    the `noreturn` attribute from a function
  # ASan additionally checks for the presence of this attribute

Generated code:
```
call void @__asan_handle_no_return    // Additionally inserted to avoid false positives
call void @longjmp
call void @__asan_handle_no_return
call void @__ubsan_handle_builtin_unreachable
unreachable
```

The second call to `__asan_handle_no_return` is redundant. This will be
cleaned up in a follow-up patch.

rdar://problem/40723397

Reviewers: delcypher, eugenis

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D56624

llvm-svn: 352003
2019-01-24 01:06:19 +00:00
David Callahan d2eeb2516d Update entry count for cold calls
Summary:
Profile sample files include the number of times each entry or inlined
call site is sampled. This is translated into the entry count metadta
on functions.

When sample data is being read, if a call site that was inlined
in the sample program is considered cold and not inlined, then
the entry count of the out-of-line functions does not reflect
the current compilation.

In this patch, we note call sites where the function was not inlined
and as a last action of the sample profile loading, we update the
called function's entry count to reflect the calls from these
call sites which are not included in the profile file.

Reviewers: danielcdh, wmi, Kader, modocache

Reviewed By: wmi

Subscribers: davidxl, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D52845

llvm-svn: 352001
2019-01-24 00:55:23 +00:00
Mircea Trofin ec02630278 [llvm] Clarify responsiblity of some of DILocation discriminator APIs
Summary:
Renamed setBaseDiscriminator to cloneWithBaseDiscriminator, to match
similar APIs. Also changed its behavior to copy over the other
discriminator components, instead of eliding them.

Renamed cloneWithDuplicationFactor to
cloneByMultiplyingDuplicationFactor, which more closely matches what
this API does.

Reviewers: dblaikie, wmi

Reviewed By: dblaikie

Subscribers: zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D56220

llvm-svn: 351996
2019-01-24 00:10:25 +00:00
Hideki Saito 4e4ecae028 [LV][VPlan] Change to implement VPlan based predication for
VPlan-native path

Context: Patch Series #2 for outer loop vectorization support in LV
using VPlan. (RFC:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).

Patch series #2 checks that inner loops are still trivially lock-step
among all vector elements. Non-loop branches are blindly assumed as
divergent.

Changes here implement VPlan based predication algorithm to compute
predicates for blocks that need predication. Predicates are computed
for the VPLoop region in reverse post order. A block's predicate is
computed as OR of the masks of all incoming edges. The mask for an
incoming edge is computed as AND of predecessor block's predicate and
either predecessor's Condition bit or NOT(Condition bit) depending on
whether the edge from predecessor block to the current block is true
or false edge.

Reviewers: fhahn, rengolin, hsaito, dcaballe

Reviewed By: fhahn

Patch by Satish Guggilla, thanks!

Differential Revision: https://reviews.llvm.org/D53349

llvm-svn: 351990
2019-01-23 22:43:12 +00:00
Peter Collingbourne 020ce3f026 hwasan: Read shadow address from ifunc if we don't need a frame record.
This saves a cbz+cold call in the interceptor ABI, as well as a realign
in both ABIs, trading off a dcache entry against some branch predictor
entries and some code size.

Unfortunately the functionality is hidden behind a flag because ifunc is
known to be broken on static binaries on Android.

Differential Revision: https://reviews.llvm.org/D57084

llvm-svn: 351989
2019-01-23 22:39:11 +00:00
Florian Hahn 68cea130df [HotColdSplitting] Remove unused SSAUpdater.h include (NFC).
llvm-svn: 351945
2019-01-23 11:51:38 +00:00
Max Kazantsev d9aee3c0d1 [IRCE] Support narrow latch condition for wide range checks
This patch relaxes restrictions on types of latch condition and range check.
In current implementation, they should match. This patch allows to handle
wide range checks against narrow condition. The motivating example is the
following:

  int N = ...
  for (long i = 0; (int) i < N; i++) {
    if (i >= length) deopt;
  }

In this patch, the option that enables this support is turned off by
default. We'll wait until it is switched to true.

Differential Revision: https://reviews.llvm.org/D56837
Reviewed By: reames

llvm-svn: 351926
2019-01-23 07:20:56 +00:00
Peter Collingbourne 73078ecd38 hwasan: Move memory access checks into small outlined functions on aarch64.
Each hwasan check requires emitting a small piece of code like this:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses

The problem with this is that these code blocks typically bloat code
size significantly.

An obvious solution is to outline these blocks of code. In fact, this
has already been implemented under the -hwasan-instrument-with-calls
flag. However, as currently implemented this has a number of problems:
- The functions use the same calling convention as regular C functions.
  This means that the backend must spill all temporary registers as
  required by the platform's C calling convention, even though the
  check only needs two registers on the hot path.
- The functions take the address to be checked in a fixed register,
  which increases register pressure.
Both of these factors can diminish the code size effect and increase
the performance hit of -hwasan-instrument-with-calls.

The solution that this patch implements is to involve the aarch64
backend in outlining the checks. An intrinsic and pseudo-instruction
are created to represent a hwasan check. The pseudo-instruction
is register allocated like any other instruction, and we allow the
register allocator to select almost any register for the address to
check. A particular combination of (register selection, type of check)
triggers the creation in the backend of a function to handle the check
for specifically that pair. The resulting functions are deduplicated by
the linker. The pseudo-instruction (really the function) is specified
to preserve all registers except for the registers that the AAPCS
specifies may be clobbered by a call.

To measure the code size and performance effect of this change, I
took a number of measurements using Chromium for Android on aarch64,
comparing a browser with inlined checks (the baseline) against a
browser with outlined checks.

Code size: Size of .text decreases from 243897420 to 171619972 bytes,
or a 30% decrease.

Performance: Using Chromium's blink_perf.layout microbenchmarks I
measured a median performance regression of 6.24%.

The fact that a perf/size tradeoff is evident here suggests that
we might want to make the new behaviour conditional on -Os/-Oz.
But for now I've enabled it unconditionally, my reasoning being that
hwasan users typically expect a relatively large perf hit, and ~6%
isn't really adding much. We may want to revisit this decision in
the future, though.

I also tried experimenting with varying the number of registers
selectable by the hwasan check pseudo-instruction (which would result
in fewer variants being created), on the hypothesis that creating
fewer variants of the function would expose another perf/size tradeoff
by reducing icache pressure from the check functions at the cost of
register pressure. Although I did observe a code size increase with
fewer registers, I did not observe a strong correlation between the
number of registers and the performance of the resulting browser on the
microbenchmarks, so I conclude that we might as well use ~all registers
to get the maximum code size improvement. My results are below:

Regs | .text size | Perf hit
-----+------------+---------
~all | 171619972  | 6.24%
  16 | 171765192  | 7.03%
   8 | 172917788  | 5.82%
   4 | 177054016  | 6.89%

Differential Revision: https://reviews.llvm.org/D56954

llvm-svn: 351920
2019-01-23 02:20:10 +00:00
Vedant Kumar cde65c0fac [HotColdSplit] Calculate BFI lazily to reduce compile-time, NFC
The splitting pass does not need BFI unless the Module actually has a profile
summary. Do not calcualte BFI unless the summary is present.

For the sqlite3 amalgamation, this reduces time spent in the splitting pass
from 0.4% of the total to under 0.1%.

llvm-svn: 351894
2019-01-22 22:49:22 +00:00
Vedant Kumar 38874c8f7b [HotColdSplit] Calculate domtrees lazily to reduce compile-time, NFC
The splitting pass does not need (post)domtrees until after it's found a
cold block. Defer domtree calculation until a cold block is found.

For the sqlite3 amalgamation, this reduces time spent in the splitting
pass from 0.8% of the total to 0.4%.

llvm-svn: 351892
2019-01-22 22:49:08 +00:00
Jordan Rupprecht 0efe27e62b Revert r351520, "Re-enable terminator folding in LoopSimplifyCFG"
This is still causing compilation crashes in some targets. Will follow up shortly with a repro.

llvm-svn: 351845
2019-01-22 17:39:02 +00:00
Max Kazantsev feb475f4cf [LoopPredication] Support guards expressed as branches by widenable condition
This patch adds support of guards expressed as branches by widenable
conditions in Loop Predication.

Differential Revision: https://reviews.llvm.org/D56081
Reviewed By: reames

llvm-svn: 351805
2019-01-22 11:49:06 +00:00
Max Kazantsev ca45087826 [NFC] Factor out some reusable logic
llvm-svn: 351794
2019-01-22 10:13:36 +00:00
Philip Reames 390c0e2f72 [CVP] Use LVI to constant fold deopt operands
Deopt operands are generally intended to record information about a site in code with minimal perturbation of the surrounding code. Idiomatically, they also tend to appear down rare paths. Putting these together, we have an obvious case for extending CVP w/deopt operand constant folding. Arguably, we should be doing this for all operands on all instructions, but that's definitely a much larger and risky change.

Differential Revision: https://reviews.llvm.org/D55678

llvm-svn: 351774
2019-01-22 01:34:33 +00:00
Serge Guelton be88539b85 Replace llvm::isPodLike<...> by llvm::is_trivially_copyable<...>
As noted in https://bugs.llvm.org/show_bug.cgi?id=36651, the specialization for
isPodLike<std::pair<...>> did not match the expectation of
std::is_trivially_copyable which makes the memcpy optimization invalid.

This patch renames the llvm::isPodLike trait into llvm::is_trivially_copyable.
Unfortunately std::is_trivially_copyable is not portable across compiler / STL
versions. So a portable version is provided too.

Note that the following specialization were invalid:

    std::pair<T0, T1>
    llvm::Optional<T>

Tests have been added to assert that former specialization are respected by the
standard usage of llvm::is_trivially_copyable, and that when a decent version
of std::is_trivially_copyable is available, llvm::is_trivially_copyable is
compared to std::is_trivially_copyable.

As of this patch, llvm::Optional is no longer considered trivially copyable,
even if T is. This is to be fixed in a later patch, as it has impact on a
long-running bug (see r347004)

Note that GCC warns about this UB, but this got silented by https://reviews.llvm.org/D50296.

Differential Revision: https://reviews.llvm.org/D54472

llvm-svn: 351701
2019-01-20 21:19:56 +00:00
Simon Pilgrim e1143c1322 [X86] Auto upgrade VPCOM/VPCOMU intrinsics to generic integer comparisons
This causes a couple of changes in the upgrade tests as signed/unsigned eq/ne are equivalent and we constant fold true/false codes, these changes are the same as what we already do for avx512 cmp/ucmp.

Noticed while cleaning up vector integer comparison costs for PR40376.

llvm-svn: 351697
2019-01-20 19:27:40 +00:00
Vedant Kumar 857cacd9de [ConstantMerge] Factor out check for un-mergeable globals, NFC
llvm-svn: 351671
2019-01-20 02:44:43 +00:00
Nikita Popov 6515db205a [InstCombine] Simplify cttz/ctlz + icmp ugt/ult
Followup to D55745, this time handling comparisons with ugt and ult
predicates (which are the canonical forms for non-equality predicates).

For ctlz we can convert into a simple icmp, for cttz we can convert
into a mask check.

Differential Revision: https://reviews.llvm.org/D56355

llvm-svn: 351645
2019-01-19 09:56:01 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Johannes Doerfert 36872b5db9 Enable IPConstantPropagation to work with abstract call sites
This modification of the currently unused inter-procedural constant
propagation pass (IPConstantPropagation) shows how abstract call sites
enable optimization of callback calls alongside direct and indirect
calls. Through minimal changes, mostly dealing with the partial mapping
of callbacks, inter-procedural constant propagation was enabled for
callbacks, e.g., OpenMP runtime calls or pthreads_create.

Differential Revision: https://reviews.llvm.org/D56447

llvm-svn: 351628
2019-01-19 05:19:12 +00:00
Vedant Kumar b537b946b8 [MergeFunc] Allow merging identical vararg functions using aliases
Thanks to Nikita Popov for pointing out this missed case.

This is a follow-up to r351411, which disabled function merging for
vararg functions outright due to a miscompile (see llvm.org/PR40345).

Differential Revision: https://reviews.llvm.org/D56865

llvm-svn: 351624
2019-01-19 02:46:22 +00:00
Vedant Kumar b755a2df51 [HotColdSplit] Mark inherently cold functions as such
If an inherently cold function is found, mark it as cold. For now this
means applying the `cold` and `minsize` attributes.

As a drive-by, revisit and clean up the criteria for considering a
function for splitting. Add tests.

llvm-svn: 351623
2019-01-19 02:38:47 +00:00
Vedant Kumar 4de1962bb9 [HotColdSplit] Remove a set which tracked split functions (NFC)
Use the begin/end iterator idiom to avoid visiting split functions,
instead of doing a set lookup.

llvm-svn: 351622
2019-01-19 02:38:17 +00:00