Commit Graph

19845 Commits

Author SHA1 Message Date
Sanjay Patel b2ab3f28d5 [SimplifyLibcalls] Realloc(null, N) -> Malloc(N)
Patch by Dávid Bolvanský!

Differential Revision: https://reviews.llvm.org/D45413

llvm-svn: 330259
2018-04-18 14:21:31 +00:00
Sam Parker 3c19051bf0 [IRCE] Only check for NSW on equality predicates
After investigation discussed in D45439, it would seem that the nsw
flag restriction is unnecessary in most cases. So the IsInductionVar
lambda has been removed, the functionality extracted, and now only
require nsw when using eq/ne predicates.

Differential Revision: https://reviews.llvm.org/D45617

llvm-svn: 330256
2018-04-18 13:50:28 +00:00
Florian Hahn ac27758895 [LoopUnroll] Only peel if a predicate becomes known in the loop body.
If a predicate does not become known after peeling, peeling is unlikely
to be beneficial.

Reviewers: mcrosier, efriedma, mkazantsev, junbuml

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D44983

llvm-svn: 330250
2018-04-18 12:29:24 +00:00
Bjorn Pettersson bc4f19b6bd [DebugInfo] Sink related dbg users when sinking in InstCombine
Summary:
When sinking an instruction in InstCombine we now also sink
the DbgInfoIntrinsics that are using the sunken value.

Example)

When sinking the load in this input

bb.X:
  %0 = load i64, i64* %start, align 4, !dbg !31
  tail call void @llvm.dbg.value(metadata i64 %0, ...)
  br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
  br label %for.body

we now also move the dbg.value, like this

bb.X:
  br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
  %0 = load i64, i64* %start, align 4, !dbg !31
  tail call void @llvm.dbg.value(metadata i64 %0, ...)
  br label %for.body

In the past we haven't moved the dbg.value so we got

bb.X:
  tail call void @llvm.dbg.value(metadata i64 %0, ...)
  br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
  %0 = load i64, i64* %start, align 4, !dbg !31
  br label %for.body


So in the past we got a debug-use before the def of %0.
And that dbg.value was also on the path jumping to %for.end, for
which %0 never was defined.

CodeGenPrepare normally comes to rescue later (when not moving
the dbg.value), since it moves dbg.value instrinsics quite
brutally, without really analysing if it is correct to move
the intrinsic (see PR31878).
So at the moment this patch isn't expected to have much impact,
besides that it is moving the dbg.value already in opt, making
the IR look more sane directly.

This can be seen as a preparation to (hopefully) make it possible
to turn off CodeGenPrepare::placeDbgValues later as a solution
to PR31878.

I also adjusted test/DebugInfo/X86/sdagsplit-1.ll to make the
IR in the test case up-to-date with this behavior in InstCombine.

Reviewers: rnk, vsk, aprantl

Reviewed By: vsk, aprantl

Subscribers: mattd, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D45425

llvm-svn: 330243
2018-04-18 08:08:04 +00:00
Sanjay Patel aea15131db [InstCombine] peek through bitcasted vector/array pointer GEP operand
The bitcast may be interfering with other combines or vectorization 
as shown in PR16739:
https://bugs.llvm.org/show_bug.cgi?id=16739

Most pointer-related optimizations are probably able to look through 
this bitcast, but removing the bitcast shrinks the IR, so it's at
least a size savings.

Differential Revision: https://reviews.llvm.org/D44833

llvm-svn: 330237
2018-04-18 00:36:40 +00:00
Vedant Kumar b0585893cc [Mem2Reg] Create merged debug locations for inserted phis
Track the debug locations of the incoming values to newly-created phis,
and apply merged debug locations to the phis.

A merged location will be on line 0, but will have the correct scope
set. This improves crash reporting when an inlined instruction with a
merged location triggers a machine exception. A debugger will be able to
narrow down the crash to the correct inlined scope, instead of simply
pointing to the outer scope of the caller.

Taken together with a change allows generating merged line-0 locations
for  instructions which aren't calls, this results in a 0.5% increase in
the uncompressed size of the .debug_line section of a stage2+Release
build of clang (-O3 -g).

rdar://33858697

Differential Revision: https://reviews.llvm.org/D45397

llvm-svn: 330227
2018-04-17 22:03:08 +00:00
Vedant Kumar 4b29172d09 [Mem2Reg] Make RenamePassData a struct, NFC
llvm-svn: 330226
2018-04-17 22:03:07 +00:00
Stanislav Mekhanoshin 0bee630814 LoadStoreVectorizer crashes due to unsized type
When we skip bitcasts while looking for GEP in LoadSoreVectorizer
we should also verify that the type is sized otherwise we assert

Differential Revision: https://reviews.llvm.org/D45709

llvm-svn: 330221
2018-04-17 21:40:04 +00:00
Michael Zolotukhin 21458fdc55 Revert "Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." again."
This reverts r330175. There are still stage3/stage4 miscompares.

llvm-svn: 330180
2018-04-17 07:31:27 +00:00
Michael Zolotukhin a6e7bd7001 [SSAUpdaterBulk] Add debug logging.
llvm-svn: 330176
2018-04-17 04:45:40 +00:00
Michael Zolotukhin 3f5fd1b129 Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." again.
One more, hopefully the last, bug is fixed: when forming UsesToRewrite
we should ignore phi operands coming from edges that we want to delete.

This reverts r329910.

llvm-svn: 330175
2018-04-17 04:45:22 +00:00
Haicheng Wu f7466f3164 [SLP] Use getExtractWithExtendCost() to compute the scalar cost of extractelement/ext pair
We use getExtractWithExtendCost to calculate the cost of extractelement and
s|zext together when computing the extract cost after vectorization, but we
calculate the cost of extractelement and s|zext separately when computing the
scalar cost which is larger than it should be.

Differential Revision: https://reviews.llvm.org/D45469

llvm-svn: 330143
2018-04-16 18:09:49 +00:00
Sanjay Patel f4c4fc77cd [InstCombine] simplify code in SimplifyAssociativeOrCommutative; NFCI
llvm-svn: 330137
2018-04-16 17:15:13 +00:00
Sanjay Patel d93b8a0740 [InstCombine] simplify getBinOpsForFactorization(); NFC
llvm-svn: 330129
2018-04-16 15:19:24 +00:00
Sanjay Patel 1170daa277 [InstCombine] simplify fneg+fadd folds; NFC
Two cleanups:
1. As noted in D45453, we had tests that don't need FMF that were misplaced in the 'fast-math.ll' test file.
2. This removes the final uses of dyn_castFNegVal, so that can be deleted. We use 'match' now.

llvm-svn: 330126
2018-04-16 14:13:57 +00:00
Sanjay Patel 77e990d887 [InstCombine] fix formatting; NFC
llvm-svn: 330124
2018-04-16 13:21:15 +00:00
Roman Lebedev f84bfb2147 [InstCombine] Simplify 'xor' to 'or' if no common bits are set.
Summary:
In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]],
let's first handle the simple straight-forward things.
Let's start with the `and` -> `or` simplification.

The one obvious thing missing here: the constant mask is not handled.
I have an idea how to handle it, but it will require some thinking,
and is not strictly required here, so i've left that for later.

https://rise4fun.com/Alive/Pkmg

Reviewers: spatel, craig.topper, eli.friedman, jingyue

Reviewed By: spatel

Subscribers: llvm-commits

Was reviewed as part of https://reviews.llvm.org/D45631

llvm-svn: 330103
2018-04-15 18:59:44 +00:00
Roman Lebedev 25cbb62d18 [NFC] ConstantOffsetExtractor::CanTraceInto(): add FIXME: no tests
As suggested in https://reviews.llvm.org/D45631#1068338,
looking at haveNoCommonBitsSet() users, and *trying* to
show the change effect elsewhere.

llvm-svn: 330100
2018-04-15 18:59:27 +00:00
Sanjay Patel 34ea6cdfab [InstCombine] simplify more code for distributive property; NFCI
Also, fix capitalization to current style. Follow-up to:
rL330096

llvm-svn: 330097
2018-04-15 16:20:58 +00:00
Sanjay Patel f1aa0d7af2 [InstCombine] simplify code for distributive property; NFCI
llvm-svn: 330096
2018-04-15 15:39:57 +00:00
Warren Ristow 8b2f27ce3a [InstCombine] Enable Add/Sub simplifications with only 'reassoc' FMF
These simplifications were previously enabled only with isFast(), but that
is more restrictive than required. Since r317488, FMF has 'reassoc' to
control these cases at a finer level.

llvm-svn: 330089
2018-04-14 19:18:28 +00:00
Hiroshi Inoue ae17900997 [NFC] fix trivial typos in document and comments
"not not" -> "not" etc

llvm-svn: 330083
2018-04-14 08:59:00 +00:00
Roman Tereshin dab10b5468 [DebugInfo][OPT] NFC follow-up on "Fixing a couple of DI duplication bugs of CloneModule"
llvm-svn: 330070
2018-04-13 21:23:11 +00:00
Roman Tereshin d769eb36ab [DebugInfo][OPT] Fixing a couple of DI duplication bugs of CloneModule
As demonstrated by the regression tests added in this patch, the
following cases are valid cases:

1. A Function with no DISubprogram attached, but various debug info
  related to its instructions, coming, for instance, from an inlined
  function, also defined somewhere else in the same module;
2. ... or coming exclusively from the functions inlined and eliminated
  from the module entirely.

The ValueMap shared between CloneFunctionInto calls within CloneModule
needs to contain identity mappings for all of the DISubprogram's to
prevent them from being duplicated by MapMetadata / RemapInstruction
calls, this is achieved via DebugInfoFinder collecting all the
DISubprogram's. However, CloneFunctionInto was missing calls into
DebugInfoFinder for functions w/o DISubprogram's attached, but still
referring DISubprogram's from within (case 1). This patch fixes that.

The fix above, however, exposes another issue: if a module contains a
DISubprogram referenced only indirectly from other debug info
metadata, but not attached to any Function defined within the module
(case 2), cloning such a module causes a DICompileUnit duplication: it
will be moved in indirecty via a DISubprogram by DebugInfoFinder first
(because of the first bug fix described above), without being
self-mapped within the shared ValueMap, and then will be copied during
named metadata cloning. So this patch makes sure DebugInfoFinder
visits DICompileUnit's referenced from DISubprogram's as it goes w/o
re-processing llvm.dbg.cu list over and over again for every function
cloned, and makes sure that CloneFunctionInto self-maps
DICompileUnit's referenced from the entire function, not just its own
DISubprogram attached that may also be missing.

The most convenient way of tesing CloneModule I found is to rely on
CloneModule call from `opt -run-twice`, instead of writing tedious
unit tests. That feature has a couple of properties that makes it hard
to use for this purpose though:

1. CloneModule doesn't copy source filename, making `opt -run-twice`
  report it as a difference.
2. `opt -run-twice` does the second run on the original module, not
  its clone, making the result of cloning completely invisible in opt's
  actual output with and without `-run-twice` both, which directly
  contradicts `opt -run-twice`s own error message.

This patch fixes this as well.

Reviewed By: aprantl

Reviewers: loladiro, GorNishanov, espindola, echristo, dexonsmith

Subscribers: vsk, debug-info, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D45593

llvm-svn: 330069
2018-04-13 21:22:24 +00:00
Krzysztof Parzyszek dfed941eec [LV] Introduce TTI::getMinimumVF
The function getMinimumVF(ElemWidth) will return the minimum VF for
a vector with elements of size ElemWidth bits. This value will only
apply to targets for which TTI::shouldMaximizeVectorBandwidth returns
true. The value of 0 indicates that there is no minimum VF.

Differential Revision: https://reviews.llvm.org/D45271

llvm-svn: 330062
2018-04-13 20:16:32 +00:00
Mandeep Singh Grang 636d94db3b [Transforms] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.

Reviewers: kcc, pcc, danielcdh, jmolloy, sanjoy, dberlin, ruiu

Reviewed By: ruiu

Subscribers: ruiu, llvm-commits

Differential Revision: https://reviews.llvm.org/D45142

llvm-svn: 330059
2018-04-13 19:47:57 +00:00
Andrey Konovalov 1ba9d9c6ca hwasan: add -fsanitize=kernel-hwaddress flag
This patch adds -fsanitize=kernel-hwaddress flag, that essentially enables
-hwasan-kernel=1 -hwasan-recover=1 -hwasan-match-all-tag=0xff.

Differential Revision: https://reviews.llvm.org/D45046

llvm-svn: 330044
2018-04-13 18:05:21 +00:00
Roman Lebedev c00659328a [InstCombine]: foldSelectICmpAndAnd(): and is commutative
Summary:
The fold added in D45108 did not account for the fact that
the and instruction is commutative, and if the mask is a variable,
the mask variable and the fold variable may be swapped.

I have noticed this by accident when looking into [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]]

This extends/generalizes that fold, so it is handled too.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45539

llvm-svn: 330001
2018-04-13 09:57:57 +00:00
Craig Topper 254ed028a4 [X86] Remove the pmuldq/pmuldq intrinsics and replace with native IR.
This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics.

We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well.

llvm-svn: 329990
2018-04-13 06:07:18 +00:00
Xin Tong d83c883d29 [CallSiteSplit] Fix comment. NFC
llvm-svn: 329987
2018-04-13 04:35:38 +00:00
Eli Friedman e1938cbc87 Don't call skipModule for CFI lowering passes.
opt-bisect shouldn't skip these passes; they lower intrinsics which
no other pass can handle.

llvm-svn: 329961
2018-04-12 22:04:11 +00:00
Benjamin Kramer b4ba3988bb Revert "Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." one more time."
This reverts commit r329865. Causes stage2/stage3 miscompare.

llvm-svn: 329910
2018-04-12 13:52:02 +00:00
Sam Parker 9737535943 [IRCE] isKnownNonNegative helper function
Created a helper function to query for non negative SCEVs. Uses the
SGE predicate to catch constants that could be interpreted as
negative.

Differential Revision: https://reviews.llvm.org/D45481

llvm-svn: 329907
2018-04-12 12:49:40 +00:00
Hiroshi Inoue bcadfee2ad [NFC] fix trivial typos in documents and comments
"is is" -> "is", "if if" -> "if", "or or" -> "or"

llvm-svn: 329878
2018-04-12 05:53:20 +00:00
George Burgess IV 48ee59b6f0 [DeadArgElim] Remove allocsize attributes on callsites
We're already removing allocsize attributes from Functions that we
remove args from, since removing arguments from a function may make the
allocsize attribute incorrect. It appears we forgot to also remove them
from callsites.

Without this, I get verifier errors on `@Test2`.

It probably wouldn't be too hard to make DAE properly update allocsize
attributes instead of dropping them, but I can't think of a scenario
where that'd be useful in practice.

llvm-svn: 329868
2018-04-12 02:06:01 +00:00
Michael Zolotukhin 815f453f76 Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." one more time.
This reapplies commit r329644.

llvm-svn: 329865
2018-04-11 23:37:53 +00:00
Michael Zolotukhin 4fbb93003b [SSAUpdaterBulk] Fix linux bootstrap/sanitizer failures: explicitly specify order of evaluation.
The standard says that the order of evaluation of an expression
  s[x] = foo()
is unspecified. In our case, we first create an empty entry in the map,
then call foo(), then store its return value to the created entry. The
problem is that foo uses the map as a cache, so if it finds that there
is an entry in the map, it stops computation. This change explicitly
sets the order, thus fixing this heisenbug.

llvm-svn: 329864
2018-04-11 23:37:37 +00:00
Sanjay Patel ff98682c9c [InstCombine] limit X - (cast(-Y) --> X + cast(Y) with hasOneUse()
llvm-svn: 329821
2018-04-11 15:57:18 +00:00
Artur Gainullin d928201ac5 Eliminate a bitwise 'not' op of 'not' min/max by inverting the min/max.
Bitwise 'not' of the min/max could be eliminated in the pattern:

%notx = xor i32 %x, -1
%cmp1 = icmp sgt[slt/ugt/ult] i32 %notx, %y
%smax = select i1 %cmp1, i32 %notx, i32 %y
%res = xor i32 %smax, -1

https://rise4fun.com/Alive/lCN

Reviewers: spatel

Reviewed by: spatel

Subscribers: a.elovikov, llvm-commits

Differential Revision: https://reviews.llvm.org/D45317

llvm-svn: 329791
2018-04-11 10:29:37 +00:00
Sriraman Tallam 182f2df7c5 Simplification of libcall like printf->puts must check for RtLibUseGOT metadata.
With -fno-plt, for example, calls to printf when getting converted to puts
still use the PLT. This patch checks for the metadata "RtLibUseGOT" and
annotates the declaration with the right attributes.

Differential Revision: https://reviews.llvm.org/D45180

llvm-svn: 329768
2018-04-10 23:32:36 +00:00
Sanjay Patel 3b6d46761f [CVP] simplify phi with constant incoming values that match common variable edge values
This is based on an example that was recently posted on llvm-dev:

void *propagate_null(void* b, int* g) {
  if (!b) {
    return 0;
  }
  (*g)++;
  return b;
}

https://godbolt.org/g/xYk3qG

The original code or constant propagation in other passes has obscured the fact 
that the phi can be removed completely.

Differential Revision: https://reviews.llvm.org/D45448

llvm-svn: 329755
2018-04-10 20:42:39 +00:00
Michael Zolotukhin d6beefd5d3 Revert "[PR16756] Use SSAUpdaterBulk in JumpThreading." one more time.
This reverts r329661. Bots are still unhappy.

llvm-svn: 329666
2018-04-10 03:40:29 +00:00
Michael Zolotukhin 8a13f6d4a7 Revert "Revert "[PR16756] Use SSAUpdaterBulk in JumpThreading.""
This reapplies commit r329644.

llvm-svn: 329661
2018-04-10 02:16:45 +00:00
Michael Zolotukhin aa7868594e [SSAUpdaterBulk] Handle CFG with unreachable from entry blocks.
llvm-svn: 329660
2018-04-10 02:16:29 +00:00
Michael Zolotukhin 0274632ee6 Revert "[PR16756] Use SSAUpdaterBulk in JumpThreading."
This reverts commit r329644.

llvm-svn: 329650
2018-04-10 00:42:43 +00:00
Hideki Saito d829973794 Fix for the buildbot failure. Now-unused private field TTI deleted.
llvm-svn: 329649
2018-04-10 00:38:36 +00:00
Hideki Saito dfa932b049 [NFC][LV] Move InterleaveInfo from Legal to CostModel
Summary:
Another clean up, following D43208.

Interleaved memory access analysis/optimization has nothing to do with vectorization legality. It doesn't really belong there. On the other hand, cost model certainly has to know about it.

In principle, vectorization should proceed like Legality ==> Optimization ==> CostModel ==> CodeGen, and this change just does that,
by moving the interleaved access analysis/decision out of Legal, and run it just before CostModel object is created.

After this, I can move LoopVectorizationLegality and Hints/Requirements classes into it's own header file, making it shareable within Transform tree. I have the patch already but I don't want to mix with this change. Eventual goal is to move to Analysis tree, but I first need to move RecurrenceDescriptor/InductionDescriptor from Transform/Util/LoopUtil.* to Analysis.

Reviewers: rengolin, hfinkel, mkuper, dcaballe, sguggill, fhahn, aemerson

Reviewed By: rengolin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45072

llvm-svn: 329645
2018-04-09 23:45:40 +00:00
Michael Zolotukhin c6d2d65f37 [PR16756] Use SSAUpdaterBulk in JumpThreading.
Summary:
SSAUpdater is a bottleneck in JumpThreading, and this patch improves the
situation by using SSAUpdaterBulk instead.

Compile time impact: no noticable changes on CTMark, a big improvement
on the test from PR16756.

Reviewers: dberlin, davide, MatzeB

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D44282

llvm-svn: 329644
2018-04-09 23:37:37 +00:00
Michael Zolotukhin 52b064f3d3 [PR16756] Add SSAUpdaterBulk.
Summary:
SSAUpdater is a bottleneck in a number of passes, and one of the reasons
is that it performs a lot of unnecessary computations (DT/IDF) over and
over again. This patch adds a new SSAUpdaterBulk that uses existing DT
and avoids recomputing IDF when possible.

Reviewers: dberlin, davide, MatzeB

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D44282

llvm-svn: 329643
2018-04-09 23:37:20 +00:00
Simon Pilgrim 23c2182c2b Support generic expansion of ordered vector reduction (PR36732)
Without the fast math flags, the llvm.experimental.vector.reduce.fadd/fmul intrinsic expansions must be expanded in order.

This patch scalarizes the reduction, applying the accumulator at the start of the sequence: ((((Acc + Scl[0]) + Scl[1]) + Scl[2]) + ) ... + Scl[NumElts-1]

Differential Revision: https://reviews.llvm.org/D45366

llvm-svn: 329585
2018-04-09 15:44:20 +00:00
Xin Tong fdad23bc36 [MergeICmp] Update debug msg.NFC
llvm-svn: 329572
2018-04-09 14:29:13 +00:00
Xin Tong 0efadbbcde [MergeICmp] Split blocks that do other work.
Summary:
We do not try to move the instructions and split the block till we
know the blocks can be split, i.e. BCE-cmp-insts can be separated from
non-BCE-cmp-insts.

Reviewers: davide, courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44443

llvm-svn: 329564
2018-04-09 13:14:06 +00:00
Max Kazantsev 8624a4786a [IRCE] Relax restriction on collected range checks
In IRCE, we have a very old legacy check that works when we collect comparisons that we
treat as range checks. It ensures that the value against which the indvar is compared is
loop invariant and is also positive.

This latter condition remained there since the times when IRCE was only able to handle
signed latch comparison. As the optimization evolved, it now learned how to intersect
signed or unsigned ranges, and this logic has no reliance on the fact that the right border
of each range should be positive.

The old implementation of this non-negativity check was also naive enough and just looked
into ranges (while most of other IRCE logic tries to use power of SCEV implications), so this
check did not allow to deal with the most simple case that looks like follows:

  int size; // not known non-negative
  int length; //known non-negative;
  i = 0;
  if (size != 0) {
    do {
      range_check(i < size);
      range_check(i < length);
    ++i;
    } while (i < size)
  }

In this case, even if from some dominating conditions IRCE could parse loop
structure, it could only remove the range check against `length` and simply
ignored the check against `size`.

In this patch we remove this obsolete check. It will allow IRCE to pick comparison
against `size` as a potential range check and then let Range Intersection logic
decide whether it is OK to eliminate it or not.

Differential Revision: https://reviews.llvm.org/D45362
Reviewed By: samparker

llvm-svn: 329547
2018-04-09 06:01:22 +00:00
Hiroshi Inoue 9ff2380ea6 [NFC] fix trivial typos in comments and error message
"is is" -> "is", "are are" -> "are"

llvm-svn: 329546
2018-04-09 04:37:53 +00:00
Xin Tong 99c4e2f364 [LIR] Reorder header. NFC
llvm-svn: 329530
2018-04-08 13:19:53 +00:00
Sanjay Patel 2a24958923 [InstCombine] simplify code that propagates FMF; NFC
llvm-svn: 329503
2018-04-07 14:14:23 +00:00
Roman Lebedev 41922f1a6d [InstCombine] Get rid of select of bittest (PR36950 / PR17564)
Summary:
See [[ https://bugs.llvm.org/show_bug.cgi?id=36950 | PR36950 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=17564 | PR17564 ]], D45065, D45107
https://godbolt.org/g/iAYRup

Alive proof: https://rise4fun.com/Alive/uiH

Testing: `ninja check-llvm`

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45108

llvm-svn: 329492
2018-04-07 10:37:24 +00:00
Nico Weber b64da22db7 Remove trailing space in build file.
llvm-svn: 329479
2018-04-07 03:30:28 +00:00
Vitaly Buka 9cb59b92cc Fix warning by cl::opt<int> -> cl::opt<unsigned>
llvm-svn: 329461
2018-04-06 21:41:17 +00:00
Vitaly Buka 66f53d71f7 Runtime flag to control branch funnel threshold
Reviewers: pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45193

llvm-svn: 329459
2018-04-06 21:32:36 +00:00
Geoff Berry 5bf4a5eafa [EarlyCSE] Add debug counter for debugging mis-optimizations. NFC.
Reviewers: reames, spatel, davide, dberlin

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D45162

llvm-svn: 329443
2018-04-06 18:47:33 +00:00
Sanjay Patel a9ca709011 [InstCombine] limit nsz: -(X - Y) --> Y - X to hasOneUse()
As noted in the post-commit discussion for r329350, we shouldn't
generally assume that fsub is the same cost as fneg.

llvm-svn: 329429
2018-04-06 17:24:08 +00:00
Simon Pilgrim a74f4ae404 Strip trailing whitespace. NFCI.
llvm-svn: 329421
2018-04-06 17:01:54 +00:00
Mircea Trofin aa3fea6cb0 [GlobalOpt] Fix support for casts in ctors.
Summary:
Fixing an issue where initializations of globals where constructors use
casts were silently translated to 0-initialization.

Reviewers: davidxl, evgeny777

Reviewed By: evgeny777

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45198

llvm-svn: 329409
2018-04-06 15:54:47 +00:00
Chad Rosier 45735b8e40 [LoopUnroll] Make LoopPeeling respect the AllowPeeling preference.
The SimpleLoopUnrollPass isn't suppose to perform loop peeling.

Differential Revision: https://reviews.llvm.org/D45334

llvm-svn: 329395
2018-04-06 13:57:21 +00:00
Hans Wennborg b230c763a4 EntryExitInstrumenter: Handle musttail calls
Inserting instrumentation between a musttail call and ret instruction
would create invalid IR. Instead, treat musttail calls as function
exits.

llvm-svn: 329385
2018-04-06 10:14:09 +00:00
Max Kazantsev 832563a782 [NFC] Add missing end of line symbols
llvm-svn: 329383
2018-04-06 09:47:06 +00:00
Sanjay Patel 04683de82f [InstCombine] FP: Z - (X - Y) --> Z + (Y - X)
This restores what was lost with rL73243 but without
re-introducing the bug that was present in the old code.

Note that we already have these transforms if the ops are 
marked 'fast' (and I assume that's happening somewhere in
the code added with rL170471), but we clearly don't need 
all of 'fast' for these transforms.

llvm-svn: 329362
2018-04-05 23:21:15 +00:00
Sanjay Patel 03e2526728 [InstCombine] nsz: -(X - Y) --> Y - X
This restores part of the fold that was removed with rL73243 (PR4374).

llvm-svn: 329350
2018-04-05 21:37:17 +00:00
Daniel Neilson 367c2aea4e [InstCombine] Properly change GEP type when reassociating loop invariant GEP chains
Summary:
This is a fix to PR37005.

Essentially, rL328539 ([InstCombine] reassociate loop invariant GEP chains to enable LICM) contains a bug
whereby it will convert:
%src = getelementptr inbounds i8, i8* %base, <2 x i64> %val
%res = getelementptr inbounds i8, <2 x i8*> %src, i64 %val2
into:
%src = getelementptr inbounds i8, i8* %base, i64 %val2
%res = getelementptr inbounds i8, <2 x i8*> %src, <2 x i64> %val

By swapping the index operands if the GEPs are in a loop, and %val is loop variant while %val2
is loop invariant.

This fix recreates new GEP instructions if the index operand swap would result in the type
of %src changing from vector to scalar, or vice versa.

Reviewers: sebpop, spatel

Reviewed By: sebpop

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45287

llvm-svn: 329331
2018-04-05 18:51:45 +00:00
Sanjay Patel deaf4f354e [InstCombine] use pattern matchers for fsub --> fadd folds
This allows folding for vectors with undef elements.

llvm-svn: 329316
2018-04-05 17:06:45 +00:00
Sanjay Patel 236442e063 [InstCombine] cleanup; NFC
llvm-svn: 329282
2018-04-05 13:24:26 +00:00
Florian Hahn 6e0043365b [LoopInterchange] Add stats counter for number of interchanged loops.
Reviewers: samparker, karthikthecool, blitz.opensource

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D45209

llvm-svn: 329269
2018-04-05 10:39:23 +00:00
Florian Hahn 831a757728 [LoopInterchange] Preserve LoopInfo after interchanging.
LoopInterchange relies on LoopInfo being up-to-date, so we should
preserve it after interchanging. This patch updates restructureLoops to
move the BBs of the interchanged loops to the right place.

Reviewers: davide, efriedma, karthikthecool, mcrosier

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D45278

llvm-svn: 329264
2018-04-05 09:48:45 +00:00
Taewook Oh e0db533feb [CallSiteSplitting] Do not perform callsite splitting inside landing pad
Summary:
If the callsite is inside landing pad, do not perform callsite splitting.

Callsite splitting uses utility function llvm::DuplicateInstructionsInSplitBetween, which eventually calls llvm::SplitEdge. llvm::SplitEdge calls llvm::SplitCriticalEdge with an assumption that the function returns nullptr only when the target edge is not a critical edge (and further assumes that if the return value was not nullptr, the predecessor of the original target edge always has a single successor because critical edge splitting was successful). However, this assumtion is not true because SplitCriticalEdge returns nullptr if the destination block is a landing pad. This invalid assumption results assertion failure.

Fundamental solution might be fixing llvm::SplitEdge to not to rely on the invalid assumption. However, it'll involve a lot of work because current API assumes that llvm::SplitEdge never fails. Instead, this patch makes callsite splitting to not to attempt splitting if the callsite is in a landing pad.

Attached test case will crash with assertion failure without the fix.

Reviewers: fhahn, junbuml, dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45130

llvm-svn: 329250
2018-04-05 04:16:23 +00:00
Evgeniy Stepanov 1f1a7a719d hwasan: add -hwasan-match-all-tag flag
Sometimes instead of storing addresses as is, the kernel stores the address of
a page and an offset within that page, and then computes the actual address
when it needs to make an access. Because of this the pointer tag gets lost
(gets set to 0xff). The solution is to ignore all accesses tagged with 0xff.

This patch adds a -hwasan-match-all-tag flag to hwasan, which allows to ignore
accesses through pointers with a particular pointer tag value for validity.

Patch by Andrey Konovalov.

Differential Revision: https://reviews.llvm.org/D44827

llvm-svn: 329228
2018-04-04 20:44:59 +00:00
Benjamin Kramer 1fc0da4849 Make helpers static. NFC.
llvm-svn: 329170
2018-04-04 11:45:11 +00:00
Nicolai Haehnle eb7311ffb1 StructurizeCFG: Test for branch divergence correctly
Fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform, so the branch is
non-uniform.

This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.

As discovered after committing an earlier version of this change, this
exposes a subtle interaction between this pass and DivergenceAnalysis:
since we remove and re-create branch instructions, we can no longer rely
on DivergenceAnalysis for branches in subregions that were already
processed by the pass.

Explicitly remove branch instructions from DivergenceAnalysis to
avoid dangling pointers as a matter of defensive programming, and
change how we detect non-uniform subregions.

Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4

Differential Revision: https://reviews.llvm.org/D43743

llvm-svn: 329165
2018-04-04 10:58:15 +00:00
Craig Topper 7d3aba6687 [SimplifyCFG] Teach merge conditional stores to handle cases where the PostBB has more than 2 predecessors by inserting a new block for the store.
Summary:
Currently merge conditional stores can't handle cases where PostBB (the block we need to move the store to) has more than 2 predecessors.

This patch removes that restriction by creating a new block with only the 2 predecessors we care about and an unconditional branch to the original block. This provides a place to put the store.

Reviewers: efriedma, jmolloy, ABataev

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39760

llvm-svn: 329142
2018-04-04 03:47:17 +00:00
Ikhlas Ajbar 1376d934ed [Hexagon] peel loops with runtime small trip counts
Move the check canPeel() to Hexagon Target before setting PeelCount.

Differential Revision: https://reviews.llvm.org/D44880

llvm-svn: 329129
2018-04-03 22:55:09 +00:00
Sanjay Patel 81b3b10a95 [InstCombine] allow more fmul folds with 'reassoc'
The tests marked with 'FIXME' require loosening the check
in SimplifyAssociativeOrCommutative() to optimize completely;
that's still checking isFast() in Instruction::isAssociative().

llvm-svn: 329121
2018-04-03 22:19:19 +00:00
Vlad Tsyrklevich 07cf78cdad Fix bad copy-and-paste in r329108
llvm-svn: 329118
2018-04-03 21:40:27 +00:00
Gor Nishanov d4712715dd [coroutines] Respect alloca alignment requirements when building coroutine frame
Summary:
If an alloca need to be stored in the coroutine frame and it has an alignment specified and the alignment does not match the natural alignment of the alloca type. Insert appropriate padding into the coroutine frame to make sure that it gets requested alignment.

For example for a packet type (which natural alignment is 1), but alloca alignment is 8, we may need to insert a padding field with required number of bytes to make sure it is properly aligned.

```
%PackedStruct = type <{ i64 }>
...
  %data = alloca %PackedStruct, align 8
```

If the previous field in the coroutine frame had alignment 2, we would have [6 x i8] inserted before %PackedStruct in the coroutine frame:

```
%f.Frame = type { ..., i16, [6 x i8], %PackedStruct }
```

Reviewers: rnk, lewissbaker, modocache

Reviewed By: modocache

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D45221

llvm-svn: 329112
2018-04-03 20:54:20 +00:00
Florian Hahn 9467ccf447 [LoopInterchange] Add remark for calls preventing interchanging.
It also updates test/Transforms/LoopInterchange/call-instructions.ll
to use accesses where we can prove dependence after D35430.


Reviewers: sebpop, karthikthecool, blitz.opensource

Reviewed By: sebpop

Differential Revision: https://reviews.llvm.org/D45206

llvm-svn: 329111
2018-04-03 20:54:04 +00:00
Vlad Tsyrklevich d17f61ea3b Add the ShadowCallStack attribute
Summary:
Introduce the ShadowCallStack function attribute. It's added to
functions compiled with -fsanitize=shadow-call-stack in order to mark
functions to be instrumented by a ShadowCallStack pass to be submitted
in a separate change.

Reviewers: pcc, kcc, kubamracek

Reviewed By: pcc, kcc

Subscribers: cryptoad, mehdi_amini, javed.absar, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D44800

llvm-svn: 329108
2018-04-03 20:10:40 +00:00
Alexey Bataev d5b1f7892f [SLP] Fixed formatting, NFC.
llvm-svn: 329091
2018-04-03 17:48:14 +00:00
Daniel Neilson 901acfab0c [InstCombine] Fold compare of int constant against a splatted vector of ints
Summary:
Folding patterns like:
  %vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
  %cast = bitcast <4 x i8> %vec to i32
  %cond = icmp eq i32 %cast, 0
into:
  %ext = extractelement <4 x i8> %insvec, i32 0
  %cond = icmp eq i32 %ext, 0

Combined with existing rules, this allows us to fold patterns like:
  %insvec = insertelement <4 x i8> undef, i8 %val, i32 0
  %vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
  %cast = bitcast <4 x i8> %vec to i32
  %cond = icmp eq i32 %cast, 0
into:
  %cond = icmp eq i8 %val, 0

When we construct a splat vector via a shuffle, and bitcast the vector into an integer type for comparison against an integer constant. Then we can simplify the the comparison to compare the splatted value against the integer constant.

Reviewers: spatel, anna, mkazantsev

Reviewed By: spatel

Subscribers: efriedma, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D44997

llvm-svn: 329087
2018-04-03 17:26:20 +00:00
Alexey Bataev 428e9d9d87 [SLP] Fix PR36481: vectorize reassociated instructions.
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.

Patch does not support reordering of the repeated instruction, this must
be handled in the separate patch.

Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43776

llvm-svn: 329085
2018-04-03 17:14:47 +00:00
Alexey Bataev df989c54cf Recommit "[SLP] Fix issues with debug output in the SLP vectorizer."
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.

I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
errs() rather than dbgs().

llvm-svn: 329082
2018-04-03 16:40:33 +00:00
Benjamin Kramer 2fc3b18922 Revert "[SLP] Fix PR36481: vectorize reassociated instructions."
This reverts commit r328980 and r329046. Makes the vectorizer crash.

llvm-svn: 329071
2018-04-03 14:40:33 +00:00
Alexander Potapenko ac70668cff MSan: introduce the conservative assembly handling mode.
The default assembly handling mode may introduce false positives in the
cases when MSan doesn't understand that the assembly call initializes
the memory pointed to by one of its arguments.

We introduce the conservative mode, which initializes the first
|sizeof(type)| bytes for every |type*| pointer passed into the
assembly statement.

llvm-svn: 329054
2018-04-03 09:50:06 +00:00
Chandler Carruth 597bfd8448 [SLP] Fix issues with debug output in the SLP vectorizer.
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.

I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
`errs()` rather than `dbgs()`.

llvm-svn: 329046
2018-04-03 05:27:28 +00:00
Ikhlas Ajbar b7322e8ac7 peel loops with runtime small trip counts
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.

Differential Revision: https://reviews.llvm.org/D44880

llvm-svn: 329042
2018-04-03 03:39:43 +00:00
Haicheng Wu 7f0daaeb86 [SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth
We use two approaches for determining the minimum bitwidth.

   * Demanded bits
   * Value tracking

If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.

But there is a missing piece though. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in

%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i

are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in

%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0

it doesn't make sense to shrink %tmp15 and we can skip the value tracking.

Ideas are from Matthew Simpson!

Differential Revision: https://reviews.llvm.org/D44868

llvm-svn: 329035
2018-04-03 00:05:10 +00:00
Brian Gesiak 64521bed0d [Coroutines] Avoid assert splitting hidden coros
Summary:
When attempting to split a coroutine with 'hidden' visibility (for
example, a C++ coroutine that is inlined when compiled with the option
'-fvisibility-inlines-hidden'), LLVM would hit an assertion in
include/llvm/IR/GlobalValue.h:240: "local linkage requires default
visibility". The issue is that the visibility is copied from the source
of the function split in the `CloneFunctionInto` function, but the linkage
is not. To fix, create the new function first with external linkage,
then copy the linkage from the original function *after* `CloneFunctionInto`
is called.

Since `GlobalValue::setLinkage` in turn calls `maybeSetDsoLocal`, the
explicit call to `setDSOLocal` can be removed in CoroSplit.cpp.

Test Plan: check-llvm

Reviewers: GorNishanov, lewissbaker, EricWF, majnemer, rnk

Reviewed By: rnk

Subscribers: llvm-commits, eric_niebler

Differential Revision: https://reviews.llvm.org/D44185

llvm-svn: 329033
2018-04-02 23:39:40 +00:00
Reid Kleckner 298ffc609b [InstCombine] Don't strip function type casts from musttail calls
Summary:
The cast simplifications that instcombine does here do not make any
attempt to obey the verifier rules for musttail calls. Therefore we have
to disable them.

Reviewers: efriedma, majnemer, pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45186

llvm-svn: 329027
2018-04-02 22:49:44 +00:00
Reid Kleckner a9e9918ee4 Treat inlining a notail call as a regular, non-tail call
Otherwise, we end up inlining a musttail call into a non-tail position,
which breaks verifier invariants.

Fixes PR31014

llvm-svn: 329015
2018-04-02 21:23:16 +00:00
Sanjay Patel cbb0450540 [InstCombine] add folds for icmp + sub (PR36969)
(A - B) >u A --> A <u B
C <u (C - D) --> C <u D

https://rise4fun.com/Alive/e7j

Name: ugt
  %sub = sub i8 %x, %y
  %cmp = icmp ugt i8 %sub, %x
=>
  %cmp = icmp ult i8 %x, %y
  
Name: ult
  %sub = sub i8 %x, %y
  %cmp = icmp ult i8 %x, %sub
=>
  %cmp = icmp ult i8 %x, %y

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=36969

llvm-svn: 329011
2018-04-02 20:37:40 +00:00
Rong Xu 5a8d4c3357 [DeadArgumentElim] Clone function level metadatas
Some Function level metadatas, such as function entry count, are not cloned in
DeadArgumentElim. This happens a lot in lto/thinlto because of DeadArgumentElim
after internalization.

This patch clones the metadatas in the original function to the new function.

Differential Revision: https://reviews.llvm.org/D44127

llvm-svn: 328991
2018-04-02 17:27:38 +00:00
Gor Nishanov b0316d96ae [coroutines] Add support for llvm.coro.noop intrinsics
Summary:
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing.
To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed.

Reviewers: EricWF, modocache, rnk, lewissbaker

Reviewed By: modocache

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45114

llvm-svn: 328986
2018-04-02 16:55:12 +00:00