Commit Graph

9116 Commits

Author SHA1 Message Date
Nikita Popov 15bc4dc9a8 [LVI] Normalize pointer behavior
Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.

This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this change not fully NFC, because we can now fold
terminator icmps based on the dereferencability information in the
same block. This is the reason why I changed one JumpThreading test
(it would optimize the condition away without the change).

Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.

Differential Revision: https://reviews.llvm.org/D69914
2019-11-08 17:57:14 +01:00
Hans Wennborg eaff300401 Revert f0c2a5a "[LV] Generalize conditions for sinking instrs for first order recurrences."
It broke Chromium, causing "Instruction does not dominate all uses!" errors.
See https://bugs.chromium.org/p/chromium/issues/detail?id=1022297#c1 for a
reproducer.

> If the recurrence PHI node has a single user, we can sink any
> instruction without side effects, given that all users are dominated by
> the instruction computing the incoming value of the next iteration
> ('Previous'). We can sink instructions that may cause traps, because
> that only causes the trap to occur later, but not on any new paths.
>
> With the relaxed check, we also have to make sure that we do not have a
> direct cycle (meaning PHI user == 'Previous), which indicates a
> reduction relation, which potentially gets missed by
> ReductionDescriptor.
>
> As follow-ups, we can also sink stores, iff they do not alias with
> other instructions we move them across and we could also support sinking
> chains of instructions and multiple users of the PHI.
>
> Fixes PR43398.
>
> Reviewers: hsaito, dcaballe, Ayal, rengolin
>
> Reviewed By: Ayal
>
> Differential Revision: https://reviews.llvm.org/D69228
2019-11-07 11:00:02 +01:00
Philip Reames 686f449e3d [WC] Fix a subtle bug in our definition of widenable branch
We had a subtle, but nasty bug in our definition of a widenable branch, and thus in the transforms which used that utility. Specifically, we returned true for any branch which included a widenable condition within it's condition, regardless of whether that widenable condition also had other uses.

The problem is that the result of the WC() call is defined to be one particular value. As such, all users must agree as to what that value is. If we widen a branch without also updating *all other users* of the WC in the same way, we have broken the required semantics.

Most of the textual diff is updating existing transforms not to leave dead uses hanging around. They're largely NFC as the dead instructions would be immediately deleted by other passes. The reason to make these changes is so that the transforms preserve the widenable branch form.

In practice, we don't get bitten by this only because it isn't profitable to CSE WC() calls and the lowering pass from guards uses distinct WC calls per branch.

Differential Revision: https://reviews.llvm.org/D69916
2019-11-06 14:16:34 -08:00
Philip Reames 9bfa5ab3d1 [LoopPred] Fix two subtle issues found by inspection
This patch fixes two issues noticed by inspection when going to enable the loop predication code in IndVarSimplify.

Issue 1 - Both the LoopPredication transform, and the already on by default optimizeLoopExits transform, modify the exit count of the exits they modify. (either to 0 or Infinity) Looking at the code more closely, this was not reflected into SCEV and we were instead running later transforms with incorrect SCEVs. Fixing this requires forgetting the loop, weakening a too strong assert, and updating SCEV to not pessimize results when a loop is provable untaken. I haven't been able to find a test case to demonstrate the miscompile.

Issue 2 - For modules without a data layout, we can end up with unsized pointer typed exit counts. Just bail out of this case.

I think these are the last two issues which need addressed before we enable this by default. The code has already survived a decent amount of fuzzing without revealing either of the above.

Differential Revision: https://reviews.llvm.org/D69695
2019-11-06 14:04:45 -08:00
Sjoerd Meijer 6c2a4f5ff9 [TTI][LV] preferPredicateOverEpilogue
We have two ways to steer creating a predicated vector body over creating a
scalar epilogue. To force this, we have 1) a command line option and 2) a
pragma available. This adds a third: a target hook to TargetTransformInfo that
can be queried whether predication is preferred or not, which allows the
vectoriser to make the decision without forcing it.

While this change behaves as a non-functional change for now, it shows the
required TTI plumbing, usage of this new hook in the vectoriser, and the
beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vector_predicate(disable)"
  and its corresponding loophint.

Differential Revision: https://reviews.llvm.org/D69040
2019-11-06 10:14:20 +00:00
Sanjay Patel 659bd73d13 [InstSimplify] use FMF to improve fcmp+select fold
This is part of a series of patches needed to solve PR39535:
https://bugs.llvm.org/show_bug.cgi?id=39535
2019-11-04 08:29:56 -05:00
Dávid Bolvanský decd8c4844 [SCEV] Fixed 'Uninitialized variable 'ContainsAddRec' used.' warning. NFCI. 2019-11-03 20:29:49 +01:00
Dávid Bolvanský 717965ae57 [MemorySSA] Fixed null check after dereferencing warning. NFCI. 2019-11-03 20:27:40 +01:00
Florian Hahn f0c2a5af76 [LV] Generalize conditions for sinking instrs for first order recurrences.
If the recurrence PHI node has a single user, we can sink any
instruction without side effects, given that all users are dominated by
the instruction computing the incoming value of the next iteration
('Previous'). We can sink instructions that may cause traps, because
that only causes the trap to occur later, but not on any new paths.

With the relaxed check, we also have to make sure that we do not have a
direct cycle (meaning PHI user == 'Previous), which indicates a
reduction relation, which potentially gets missed by
ReductionDescriptor.

As follow-ups, we can also sink stores, iff they do not alias with
other instructions we move them across and we could also support sinking
chains of instructions and multiple users of the PHI.

Fixes PR43398.

Reviewers: hsaito, dcaballe, Ayal, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D69228
2019-11-02 22:08:27 +01:00
Hiroshi Yamauchi 0d987e411a [PGO][PGSO] TargetLowering/TargetTransformationInfo/SwitchLoweringUtils part.
Summary:
(Split of off D67120)

TargetLowering/TargetTransformationInfo/SwitchLoweringUtils changes for profile
guided size optimization.

Reviewers: davidxl

Subscribers: eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69580
2019-10-31 13:22:56 -07:00
Johannes Doerfert 57dd4b03e4 [ValueTracking] Allow context-sensitive nullness check for non-pointers
Same as D60846 but with a fix for the problem encountered there which
was a missing context adjustment in the handling of PHI nodes.

The test that caused D60846 to be reverted was added in e15ab8f277.

Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69571
2019-10-31 14:37:38 -05:00
Johannes Doerfert cb19ea45a7 [FIX] Make LSan happy by *not* leaking memory
I left a memory leak in a printer pass which made LSan sad so I remove
the memory leak now to make LSan happy.

Reported and tested by vlad.tsyrklevich.
2019-10-31 12:16:54 -05:00
Mikael Holmen c950495405 [MustExecute] Silence clang warning about unused captured 'this'
New code introduced in fe799c97fa caused clang to complain with

../lib/Analysis/MustExecute.cpp:360:34: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
  GetterTy<LoopInfo> LIGetter = [this](const Function &F) {
                                 ^~~~
../lib/Analysis/MustExecute.cpp:365:44: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
  GetterTy<PostDominatorTree> PDTGetter = [this](const Function &F) {
                                           ^~~~
2 errors generated.
2019-10-31 09:41:05 +01:00
Johannes Doerfert fe799c97fa [MustExecute] Forward iterate over conditional branches
Summary:
If a conditional branch is encountered we can try to find a join block
where the execution is known to continue. This means finding a suitable
block, e.g., the immediate post dominator of the conditional branch, and
proofing control will always reach that block.

This patch implements different techniques that work with and without
provided analysis.

Reviewers: uenoku, sstefan1, hfinkel

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68933
2019-10-31 00:06:43 -05:00
Guillaume Chatelet a4783ef58d [Alignment][NFC] getMemoryOpCost uses MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69307
2019-10-25 21:26:59 +02:00
Philip Reames 34f68253ca [SCEV] Expose and use maximum constant exit counts for individual loop exits
We were already going to all of the trouble of computing maximum constant exit counts for each loop exit, we might as well expose them through the API.  The change in IndVars is mostly to demonstrate that the wired up code works, but it als very slightly strengthens the transform.  The strengthened case is rather narrow though: it requires one exactly analyzeable exit, one imprecisely analyzeable exit (with the upper bound less than the precise one), and one unanalyzeable exit.  I coudn't construct a reasonably stable test case.

This does increase the memory usage of the BackedgeTakenCount by a factor of 2 in the worst case.

I also noticed the loop in IndVars is O(#Exits ^ 2).  This doesn't change with this patch.  A future patch will cache this result inside of SCEV to avoid requering.
2019-10-24 19:07:33 -07:00
David Blaikie 0e8fc21c2e Fix Clang -Wcovered-switch-default warning by moving llvm_unreachable default to after the switch 2019-10-24 18:56:45 -07:00
Philip Reames c27010ef76 [SCEV] Start reworking backedge taken count APIs to unify max handling [NFC]
This is a first step in figuring out a proper API for maximum (non constant) exit counts.  This may evolve a bit as we get experience with the API needs; suggestions very welcome.  This patch just tried to provide a framework that we can later add maximum too in a clean and obvious way.
2019-10-24 18:21:55 -07:00
Simon Pilgrim c39ba0429c Fix MSVC "switch statement contains 'default' but no 'case' labels" warning. NFCI. 2019-10-24 13:40:13 -07:00
Roman Lebedev 8eda8f8ce8
[LVI][NFC] Factor solveBlockValueSaturatingIntrinsic() out of solveBlockValueIntrinsic()
Now that there's SaturatingInst class, this is cleaner.
2019-10-23 18:17:33 +03:00
Roman Lebedev 1f665046fb
[LVI][CVP] LazyValueInfoImpl::solveBlockValueBinaryOp(): use no-wrap flags from `add` op
Summary:
This was suggested in https://reviews.llvm.org/D69277#1717210
In this form (this is what was suggested, right?), the results aren't staggering
(especially since given LVI cross-block focus)
this does catch some things (as per test-suite), but not too much:

| statistic                                        |       old |       new | delta | % change |
| correlated-value-propagation.NumAddNSW           |      4981 |      4982 |     1 |  0.0201% |
| correlated-value-propagation.NumAddNW            |     12125 |     12126 |     1 |  0.0082% |
| correlated-value-propagation.NumCmps             |      1199 |      1202 |     3 |  0.2502% |
| correlated-value-propagation.NumDeadCases        |       112 |       111 |    -1 | -0.8929% |
| correlated-value-propagation.NumMulNSW           |       275 |       278 |     3 |  1.0909% |
| correlated-value-propagation.NumMulNUW           |      1323 |      1326 |     3 |  0.2268% |
| correlated-value-propagation.NumMulNW            |      1598 |      1604 |     6 |  0.3755% |
| correlated-value-propagation.NumNSW              |      7158 |      7167 |     9 |  0.1257% |
| correlated-value-propagation.NumNUW              |     13304 |     13310 |     6 |  0.0451% |
| correlated-value-propagation.NumNW               |     20462 |     20477 |    15 |  0.0733% |
| correlated-value-propagation.NumOverflows        |         4 |         7 |     3 | 75.0000% |
| correlated-value-propagation.NumPhis             |     15366 |     15381 |    15 |  0.0976% |
| correlated-value-propagation.NumSExt             |      6273 |      6277 |     4 |  0.0638% |
| correlated-value-propagation.NumShlNSW           |      1172 |      1171 |    -1 | -0.0853% |
| correlated-value-propagation.NumShlNUW           |      2793 |      2794 |     1 |  0.0358% |
| correlated-value-propagation.NumSubNSW           |       730 |       736 |     6 |  0.8219% |
| correlated-value-propagation.NumSubNUW           |      2044 |      2046 |     2 |  0.0978% |
| correlated-value-propagation.NumSubNW            |      2774 |      2782 |     8 |  0.2884% |
| instcount.NumAddInst                             |    277586 |    277569 |   -17 | -0.0061% |
| instcount.NumAndInst                             |     66056 |     66054 |    -2 | -0.0030% |
| instcount.NumBrInst                              |    709147 |    709146 |    -1 | -0.0001% |
| instcount.NumCallInst                            |    528579 |    528576 |    -3 | -0.0006% |
| instcount.NumExtractValueInst                    |     18307 |     18301 |    -6 | -0.0328% |
| instcount.NumOrInst                              |    102660 |    102665 |     5 |  0.0049% |
| instcount.NumPHIInst                             |    318008 |    318007 |    -1 | -0.0003% |
| instcount.NumSelectInst                          |     46373 |     46370 |    -3 | -0.0065% |
| instcount.NumSExtInst                            |     79496 |     79488 |    -8 | -0.0101% |
| instcount.NumShlInst                             |     40654 |     40657 |     3 |  0.0074% |
| instcount.NumTruncInst                           |     62251 |     62249 |    -2 | -0.0032% |
| instcount.NumZExtInst                            |     68211 |     68221 |    10 |  0.0147% |
| instcount.TotalBlocks                            |    843910 |    843909 |    -1 | -0.0001% |
| instcount.TotalInsts                             |   7387448 |   7387423 |   -25 | -0.0003% |

Reviewers: nikic, reames

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69321
2019-10-23 18:17:32 +03:00
Guillaume Chatelet 301b4128ac [Alignment][NFC] Finish transition for `Loads`
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69253

llvm-svn: 375419
2019-10-21 15:10:26 +00:00
Guillaume Chatelet 3cc4835c00 Use Align for TFL::TransientStackAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69216

llvm-svn: 375398
2019-10-21 08:31:25 +00:00
Philip Reames 1d509201e2 [SCEV] Simplify umin/max of zext and sext of the same value
This is a common idiom which arises after induction variables are widened, and we have two or more exit conditions.  Interestingly, we don't have instcombine or instsimplify support for this either.

Differential Revision: https://reviews.llvm.org/D69006

llvm-svn: 375349
2019-10-19 17:23:02 +00:00
Victor Campos e64863d192 [SCEV] Removing deprecated comment in ScalarEvolutionExpander
Removing a comment in the ScalarEvolutionExpander.cpp file that was about the
class SCEVSDivExpr, which has been long gone from LLVM.

llvm-svn: 375232
2019-10-18 13:33:45 +00:00
Oliver Stannard 3b598b9c86 Reland: Dead Virtual Function Elimination
Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.

Original commit message:

Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

llvm-svn: 375094
2019-10-17 09:58:57 +00:00
Guillaume Chatelet bae629b966 [Alignment][NFC] Value::getPointerAlignment returns MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68398

llvm-svn: 374889
2019-10-15 13:58:22 +00:00
Jorge Gorbe Moya b052331bd6 Revert "Dead Virtual Function Elimination"
This reverts commit 9f6a873268.

llvm-svn: 374844
2019-10-14 23:25:25 +00:00
Sam Parker 527a35e155 [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]
Add an extra parameter so the backend can take the alignment into
consideration.

Differential Revision: https://reviews.llvm.org/D68400

llvm-svn: 374763
2019-10-14 10:00:21 +00:00
Zi Xuan Wu 9802268ad3 recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374634
2019-10-12 02:53:04 +00:00
Oliver Stannard 9f6a873268 Dead Virtual Function Elimination
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

llvm-svn: 374539
2019-10-11 11:59:55 +00:00
Florian Hahn 77fbf069f6 [SCEV] Add stricter verification option.
Currently -verify-scev only fails if there is a constant difference
between two BE counts. This misses a lot of cases.

This patch adds a -verify-scev-strict options, which fails for any
non-zero differences, if used together with -verify-scev.

With the stricter checking, some unit tests fail because
of mis-matches, especially around IndVarSimplify.

If there is no reason I am missing for just checking constant deltas, I
am planning on looking into the various failures.

Reviewers: efriedma, sanjoy.google, reames, atrick

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D68592

llvm-svn: 374535
2019-10-11 11:46:40 +00:00
Alina Sbirlea 6442b56974 [MemorySSA] Update Phi simplification.
When simplifying a Phi to the unique value found incoming, check that
there wasn't a Phi already created to break a cycle. If so, remove it.
Resolves PR43541.

Some additional nits included.

llvm-svn: 374471
2019-10-10 23:27:21 +00:00
Rong Xu 686fa4bbfb [ValueTracking] Improve pointer offset computation for cases of same base
This patch improves the handling of pointer offset in GEP expressions where
one argument is the base pointer. isPointerOffset() is being used by memcpyopt
where current code synthesizes consecutive 32 bytes stores to one store and
two memset intrinsic calls. With this patch, we convert the stores to one
memset intrinsic.

Differential Revision: https://reviews.llvm.org/D67989

llvm-svn: 374454
2019-10-10 21:30:43 +00:00
Alina Sbirlea 67f0c5c085 [MemorySSA] Additional handling of unreachable blocks.
Summary:
Whenever we get the previous definition, the assumption is that the
recursion starts ina  reachable block.
If the recursion starts in an unreachable block, we may recurse
indefinitely. Handle this case by returning LoE if the block is
unreachable.

Resolves PR43426.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68809

llvm-svn: 374447
2019-10-10 20:43:06 +00:00
David Greene 7c562f1286 [System Model] [TTI] Move default cache/prefetch implementations
Move the default implementations of cache and prefetch queries to
TargetTransformInfoImplBase and delete them from NoTIIImpl.  This brings these
interfaces in line with how other TTI interfaces work.

Differential Revision: https://reviews.llvm.org/D68804

llvm-svn: 374446
2019-10-10 20:39:27 +00:00
Guillaume Chatelet 837a1b84ce [Alignment][NFC] Make VectorUtils uas llvm::Align
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, rogfer01, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68784

llvm-svn: 374330
2019-10-10 12:35:04 +00:00
David Greene 2e6f6b4dad [System Model] [TTI] Update cache and prefetch TTI interfaces
Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon.  This involved
moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo.

Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model.  Changes include:

- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
  implementation
- Moving the default TargetTransformInfoImplBase implementation to a default
  MCSubtarget implementation

Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC
and SystemZ.  AArch64 already has a custom subtarget implementation, so its
custom TTI implementation is migrated to use the new facilities in BasicTTIImpl
to invoke its custom subtarget implementation.  The custom TTI implementations
continue to exist for the other targets with this change.  They are not moved
over to subtarget-based implementations.

The end goal is to have the default subtarget implementation defer to the system
model defined by the target.  With this change, the default MCSubtargetInfo
implementation essentially returns the defaults TargetTransformInfoImplBase used
to return.  Existing users of TTI defaults will hit the defaults now in
MCSubtargetInfo.  Targets that define their own custom TTI implementations won't
use the BasicTTIImpl implementations that route to the subtarget.

Once system models are in place for the targets that use these interfaces, their
custom TTI implementations can be removed.

Differential Revision: https://reviews.llvm.org/D63614

llvm-svn: 374205
2019-10-09 19:51:48 +00:00
Alina Sbirlea 7faa14a98b [MemorySSA] Make the use of moveAllAfterMergeBlocks consistent.
Summary:
The rule for the moveAllAfterMergeBlocks API si for all instructions
from `From` to have been moved to `To`, while keeping the CFG edges (and
block terminators) unchanged.
Update all the callsites for moveAllAfterMergeBlocks to follow this.

Pending follow-up: since the same behavior is needed everytime, merge
all callsites into one. The common denominator may be the call to
`MergeBlockIntoPredecessor`.

Resolves PR43569.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68659

llvm-svn: 374177
2019-10-09 15:54:24 +00:00
Jinsong Ji 9912232b46 Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0.
This reverts commit 18b6fe07bc.

The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.

llvm-svn: 374091
2019-10-08 17:32:56 +00:00
Graham Hunter b302561b76 [SVE][IR] Scalable Vector size queries and IR instruction support
* Adds a TypeSize struct to represent the known minimum size of a type
  along with a flag to indicate that the runtime size is a integer multiple
  of that size
* Converts existing size query functions from Type.h and DataLayout.h to
  return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
  to uint64_t) so that most existing code 'just works' as if the return
  values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
  supported instructions used with scalable vectors can be constructed
  in IR.

Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen

Reviewed By: rovka, sdesmalen

Differential Revision: https://reviews.llvm.org/D53137

llvm-svn: 374042
2019-10-08 12:53:54 +00:00
Zi Xuan Wu 9f41deccc0 [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374017
2019-10-08 03:28:33 +00:00
Jordan Rose fdaa742174 Second attempt to add iterator_range::empty()
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.

https://reviews.llvm.org/D68439

llvm-svn: 373935
2019-10-07 18:14:24 +00:00
Martin Storsjo dfc1aee25b Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine"
This reverts SVN r373833, as it caused a failed assert "Non-zero loop
cost expected" on building numerous projects, see PR43582 for details
and reproduction samples.

llvm-svn: 373882
2019-10-07 08:21:37 +00:00
Whitney Tsang dcb75bf843 [LOOPGUARD] Remove asserts in getLoopGuardBranch
Summary: The assertion in getLoopGuardBranch can be a 'return nullptr'
under if condition.
Authored By: DTharun
Reviewer: Whitney, fhahn
Reviewed By: Whitney, fhahn
Subscribers: fhahn, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D66084

llvm-svn: 373857
2019-10-06 16:39:43 +00:00
Sanjay Patel e2321bb448 [SLP] avoid reduction transform on patterns that the backend can load-combine
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).

Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 373833
2019-10-05 18:03:58 +00:00
Alina Sbirlea 24ae5ce54b [MemorySSA] Update Phi creation when inserting a Def.
MemoryPhis should be added in the IDF of the blocks newly gaining Defs.
This includes the blocks that gained a Phi and the block gaining a Def,
if the block did not have one before.
Resolves PR43427.

llvm-svn: 373505
2019-10-02 18:42:33 +00:00
Aditya Kumar 0cacf136fc Fix: Actually erase remove the elements from AssumeHandles
Reviewers: sdmitriev, tejohnson

Reviewed by: tejohnson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68318

llvm-svn: 373494
2019-10-02 17:35:06 +00:00
Simon Pilgrim b635964abc MemorySSAUpdater::applyInsertUpdates - silence static analyzer dyn_cast<MemoryAccess> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<MemoryAccess> directly and if not assert will fire for us.

llvm-svn: 373467
2019-10-02 13:09:12 +00:00
Simon Pilgrim 65e1150988 MemorySSA tryOptimizePhi - assert that we've found a DefChainEnd. NFCI.
Silences static analyzer null dereference warning.

llvm-svn: 373466
2019-10-02 13:09:04 +00:00
Simon Pilgrim e2ded3d131 LoopAccessAnalysis isConsecutiveAccess() - silence static analyzer dyn_cast<SCEVConstant> null dereference warning. NFCI.
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<SCEVConstant> directly and if not assert will fire for us.

llvm-svn: 373465
2019-10-02 13:08:56 +00:00
Florian Hahn 067ed96e8e [InstCombine] Simplify fma multiplication to nan for undef or nan operands.
In similar fashion to D67721, we can simplify FMA multiplications if any
of the operands is NaN or undef. In instcombine, we will simplify the
FMA to an fadd with a NaN operand, which in turn gets folded to NaN.

Note that this just changes SimplifyFMAFMul, so we still not catch the
case where only the Add part of the FMA is Nan/Undef.

Reviewers: cameron.mcinally, mcberg2017, spatel, arsenm

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D68265

llvm-svn: 373459
2019-10-02 12:32:52 +00:00
Sanjay Patel be21ceb565 [InstSimplify] fold fma/fmuladd with a NaN or undef operand
This is intended to be similar to the constant folding results from
D67446
and earlier, but not all operands are constant in these tests, so the
responsibility for folding is left to InstSimplify.

Differential Revision: https://reviews.llvm.org/D67721

llvm-svn: 373455
2019-10-02 12:12:02 +00:00
Jay Foad c38188c5fe Remove an unnecessary cast. NFC.
llvm-svn: 373434
2019-10-02 08:56:33 +00:00
Bardia Mahjour 91b62d5c89 [DDG] Data Dependence Graph - Root Node
Summary:
This patch adds Root Node to the DDG. The purpose of the root node is to create a single entry node that allows graph walk iterators to iterate through all nodes of the graph, making sure that no node is left unvisited during a graph walk (eg. SCC or DFS). Once the DDG is fully constructed it will have exactly one root node. Every node in the graph is reachable from the root. The algorithm for connecting the root node is based on depth-first-search that keeps track of visited nodes to try to avoid creating unnecessary edges.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack

Tag: #llvm

Differential Revision: https://reviews.llvm.org/D67970

llvm-svn: 373386
2019-10-01 19:32:42 +00:00
Alina Sbirlea 890090f7f5 [MemorySSA] Check for unreachable blocks when getting last definition.
If a single predecessor is found, still check if the block is
unreachable. The test that found this had a self loop unreachable block.
Resolves PR43493.

llvm-svn: 373383
2019-10-01 19:09:50 +00:00
Alina Sbirlea ae40dfc1e3 [MemorySSA] Update last_access_in_block check.
The check for "was there an access in this block" should be: is the last
access in this block and is it not a newly inserted phi.
Resolves new test in PR43438.

Also fix a typo when simplifying trivial Phis to match the comment.

llvm-svn: 373380
2019-10-01 18:34:39 +00:00
Sam Parker 95aee9da4c [NFC][HardwareLoops] Update some iterators
llvm-svn: 373309
2019-10-01 07:53:28 +00:00
Evandro Menezes 22cb3d2e58 [ConstantFolding] Fold constant calls to log2()
Somehow, folding calls to `log2()` with a constant was missing.

Differential revision: https://reviews.llvm.org/D67300

llvm-svn: 373262
2019-09-30 20:53:23 +00:00
Tim Northover 58e8c793d0 Revert "[SCEV] add no wrap flag for SCEVAddExpr."
This reverts r366419 because the analysis performed is within the context of
the loop and it's only valid to add wrapping flags to "global" expressions if
they're always correct.

llvm-svn: 373184
2019-09-30 07:46:52 +00:00
Sanjay Patel 8cecc30c99 [InstSimplify] generalize FP folds with undef/NaN; NFC
We can reuse this logic for things like fma.

llvm-svn: 373119
2019-09-27 20:09:09 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Wei Mi 9c8efeda5c Revert "[LoopInfo] Limit the iterations to check whether a loop has dedicated
exits"

Get a better approach in https://reviews.llvm.org/D68107 to solve the problem.
Revert the initial patch and will commit the new one soon.

This reverts commit rL372990.

llvm-svn: 373044
2019-09-27 05:43:30 +00:00
Whitney Tsang 9c5fbcf920 [LOOPGUARD] Disable loop with multiple loop exiting blocks.
Summary: As discussed in the loop group meeting. With the current
definition of loop guard, we should not allow multiple loop exiting
blocks. For loops that has multiple loop exiting blocks, we can simply
unable to find the loop guard.
When getUniqueExitBlock() obtains a vector size not equals to one, that
means there is either no exit blocks or there exists more than one
unique block the loop exit to.
If we don't disallow loop with multiple loop exit blocks, then with our
current implementation, there can exist exit blocks don't post dominated
by the non pre-header successor of the guard block.
Reviewer: reames, Meinersbur, kbarton, etiotto, bmahjour
Reviewed By: Meinersbur, kbarton
Subscribers: fhahn, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D66529

llvm-svn: 373011
2019-09-26 20:20:42 +00:00
Simon Pilgrim 514e6b6e6e ConstantFold - silence static analyzer dyn_cast<ExtractValueInst> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<ExtractValueInst> directly and if not assert will fire for us.

llvm-svn: 372993
2019-09-26 16:30:36 +00:00
Wei Mi 67d93f0d91 [LoopInfo] Limit the iterations to check whether a loop has dedicated exits
for extreme large case.

We had a case that a single loop which has 4000 exits and the average number
of predecessors of each exit is > 1000, and we found compiling the case spent
a significant amount of time on checking whether a loop has dedicated exits.
This patch adds a limit for the iterations to the check. With the patch, the
time to compile our testcase reduced from 1000s to 200s (clang release build).

Differential Revision: https://reviews.llvm.org/D67359

llvm-svn: 372990
2019-09-26 15:36:25 +00:00
Simon Pilgrim 7573845061 Remove local shadow constant. NFCI.
ValueTracking.cpp already has a local static MaxDepth = 6 constant - this one seems to have been missed when rL124183 landed.

llvm-svn: 372964
2019-09-26 11:30:35 +00:00
Simon Pilgrim 2dcee966ad [ValueTracking] Silence static analyzer dyn_cast<Operator> null dereference warnings. NFCI.
The static analyzer is warning about a potential null dereferences, but since the pointer is only used in a switch statement for Operator::getOpcode() (with an empty default) then its easiest just to wrap this in a null test as the dyn_cast might return null here.

llvm-svn: 372962
2019-09-26 11:09:08 +00:00
Keno Fischer cea8882254 [ConstantFolding] Use FoldBitCast correctly
Previously we might attempt to use a BitCast to turn bits into vectors of pointers,
but that requires an inttoptr cast to be legal. Add an assertion to detect the formation of illegal bitcast attempts
early (in the tests, we often constant-fold away the result before getting to this assertion check),
while being careful to still handle the early-return conditions without adding extra complexity in the result.

Patch by Jameson Nash <jameson@juliacomputing.com>.

Differential Revision: https://reviews.llvm.org/D65057

llvm-svn: 372940
2019-09-26 02:07:51 +00:00
Alina Sbirlea 6720ed851b [MemorySSA] Avoid adding Phis in the presence of unreachable blocks.
Summary:
If a block has all incoming values with the same MemoryAccess (ignoring
incoming values from unreachable blocks), then use that incoming
MemoryAccess and do not create a Phi in the first place.

Revert IDF work-around added in rL372673; it should not be required unless
the Def inserted is the first in its block.

The patch also cleans up a series of tests, added during the many
iterations on insertDef.

The patch also fixes PR43438.
The same issue that occurs in insertDef with "adding phis, hence the IDF of
Phis is needed", can also occur in fixupDefs: the `getPreviousRecursive`
call only adds Phis walking on the predecessor edges, which means there
may be the case of a Phi added walking the CFG "backwards" which
triggers the needs for an additional Phi in successor blocks.
Such Phis are added during fixupDefs only in the presence of unreachable
blocks.
Hence this highlights the need to avoid adding Phis in blocks with
unreachable predecessors in the first place.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67995

llvm-svn: 372932
2019-09-25 23:24:39 +00:00
Roman Lebedev 914a3d1cf2 [InstSimplify] Handle more 'A </>/>=/<= B &&/|| (A - B) !=/== 0' patterns (PR43251)
https://rise4fun.com/Alive/sl9s
https://rise4fun.com/Alive/2plN

https://bugs.llvm.org/show_bug.cgi?id=43251

llvm-svn: 372928
2019-09-25 22:59:41 +00:00
Florian Hahn d663efe23a [InstSimplify] Match 1.0 and 0.0 for both operands in SimplifyFMAMul
Because we do not constant fold multiplications in SimplifyFMAMul,
we match 1.0 and 0.0 for both operands, as multiplying by them
is guaranteed to produce an exact result (if it is allowed to do so).

Note that it is not enough to just swap the operands to ensure a
constant is on the RHS, as we want to also cover the case with
2 constants.

Reviewers: lebedev.ri, spatel, reames, scanon

Reviewed By: lebedev.ri, reames

Differential Revision: https://reviews.llvm.org/D67553

llvm-svn: 372915
2019-09-25 19:33:26 +00:00
Florian Hahn f3ab99dcf8 [InstCombine] Limit FMul constant folding for fma simplifications.
As @reames pointed out post-commit, rL371518 adds additional rounding
in some cases, when doing constant folding of the multiplication.
This breaks a guarantee llvm.fma makes and must be avoided.

This patch reapplies rL371518, but splits off the simplifications not
requiring rounding from SimplifFMulInst as SimplifyFMAFMul.

Reviewers: spatel, lebedev.ri, reames, scanon

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D67434

llvm-svn: 372899
2019-09-25 17:03:20 +00:00
Florian Hahn 5c3bc3c930 [PatternMatch] Make m_Br more flexible, add matchers for BB values.
Currently m_Br only takes references to BasicBlock*, which limits its
flexibility. For example, you have to declare a variable, even if you
ignore the result or you have to have additional checks to make sure the
matched BB matches an expected one.

This patch adds m_BasicBlock and m_SpecificBB matchers, which can be
used like the existing matchers for constants or values.

I also had a look at the existing uses and updated a few. IMO it makes
the code a bit more explicit.

Reviewers: spatel, craig.topper, RKSimon, majnemer, lebedev.ri

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D68013

llvm-svn: 372885
2019-09-25 15:05:08 +00:00
Sanjay Patel 6d4ea22e70 [IR] allow fast-math-flags on phi of FP values (2nd try)
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917 <https://reviews.llvm.org/D61917>

As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086

The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535

But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.

The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.

Differential Revision: https://reviews.llvm.org/D67564

llvm-svn: 372878
2019-09-25 14:35:02 +00:00
Sanjay Patel 2cec4b58f5 Revert [IR] allow fast-math-flags on phi of FP values
This reverts r372866 (git commit dec03223a9)

llvm-svn: 372868
2019-09-25 13:29:09 +00:00
Sanjay Patel dec03223a9 [IR] allow fast-math-flags on phi of FP values
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917

As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086

The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535

But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.

The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.

Differential Revision: https://reviews.llvm.org/D67564

llvm-svn: 372866
2019-09-25 13:14:12 +00:00
Artur Pilipenko 5c1447cd43 [SCEV] Disable canonical expansion for non-affine addrecs.
Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D65276

Patch by Evgeniy Brevnov (ybrevnov@azul.com)

llvm-svn: 372789
2019-09-24 23:21:07 +00:00
Hiroshi Yamauchi 857424d185 [PGO][PGSO] ProfileSummary changes.
(Split of off D67120)

ProfileSummary changes for profile guided size optimization.

Differential Revision: https://reviews.llvm.org/D67377

llvm-svn: 372783
2019-09-24 22:17:51 +00:00
Alina Sbirlea 2c5e6646ef [MemorySSA] Update Phi insertion.
Summary:
MemoryPhis may be needed following a Def insertion inthe IDF of all the
new accesses added (phis + potentially a def). Ensure this also  occurs when
only the new MemoryPhis are the defining accesses.

Note: The need for computing IDF here is because of new Phis added with
edges incoming from unreachable code, Phis that had previously been
simplified. The preferred solution is to not reintroduce such Phis.
This patch is the needed fix while working on the preferred solution.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67927

llvm-svn: 372673
2019-09-23 23:50:16 +00:00
Simon Pilgrim f62293e8fe [ValueTracking] Fix uninitialized variable warnings in matchSelectPattern const wrapper. NFCI.
Static analyzer complains about const_cast uninitialized variables, we should explicitly set these to null.

Ideally that const wrapper would go away though.......

llvm-svn: 372603
2019-09-23 13:15:52 +00:00
Roman Lebedev baf809811b [InstSimplify] simplifyUnsignedRangeCheck(): X >= Y && Y == 0 --> Y == 0
https://rise4fun.com/Alive/v9Y4

llvm-svn: 372491
2019-09-21 22:27:39 +00:00
Roman Lebedev e94f156f77 [InstSimplify][NFC] Reorganize simplifyUnsignedRangeCheck() to emphasize and/or symmetry
Only a single `X >= Y && Y == 0  -->  Y == 0` fold appears to be missing.

llvm-svn: 372490
2019-09-21 22:27:28 +00:00
Teresa Johnson 2f32e5d84d [Inliner] Remove incorrect early exit during switch cost computation
Summary:
The CallAnalyzer::visitSwitchInst has an early exit when the estimated
lower bound of the switch cost will put the overall cost of the inline
above the threshold. However, this code is not correctly estimating the
lower bound for switches that can be transformed into bit tests, leading
to unnecessary lost inlines, and also differing behavior with
optimization remarks enabled.

First, the early exit is controlled by whether ComputeFullInlineCost is
enabled or not, and that in turn is disabled by default but enabled when
enabling -pass-remarks=missed. This by itself wouldn't lead to a
problem, except that as described below, the lower bound can be above
the real lower bound, so we can sometimes get different inline decisions
with inline remarks enabled, which is problematic.

The early exit was added in along with a new switch cost model in D31085.
The reason why this early exit was added is due to a concern one reviewer
raised about compile time for large switches:
https://reviews.llvm.org/D31085?id=94559#inline-276200

However, the code just below there calls
getEstimatedNumberOfCaseClusters, which in turn immediately calls
BasicTTIImpl getEstimatedNumberOfCaseClusters, which in the worst case
does a linear scan of the cases to get the high and low values. The
bit test handling in particular is guarded by whether the number of
cases fits into the max bit width. There is no suggestion that anyone
measured a compile time issue, it appears to be theoretical.

The problem is that the reviewer's comment about the lower bound
calculation is incorrect, specifically in the case of a switch that can
be lowered to a bit test. This isn't followed up on the comment
thread, but the author does add a FIXME to that effect above the early
exit added when they subsequently revised the patch.

As a result, we were incorrectly early exiting and not inlining
functions with switch statements that would be lowered to bit tests in
cases where we were nearing the threshold. Combined with the fact that
this early exit was skipped with opt remarks enabled, this caused
different inlining decisions to be made when -pass-remarks=missed is
enabled to debug the missing inline.

Remove the early exit for the above reasons.

I also copied over an existing AArch64 inlining test to X86, and
adjusted the threshold so that the bit test inline only occurs with the
fix in this patch.

Reviewers: davidxl

Subscribers: eraman, kristof.beyls, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67716

llvm-svn: 372440
2019-09-20 23:29:17 +00:00
Shoaib Meenai d89f2d872d [Analysis] Allow -scalar-evolution-max-iterations more than once
At present, `-scalar-evolution-max-iterations` is a `cl::Optional`
option, which means it demands to be passed exactly zero or one times.
Our build system makes it pretty tricky to guarantee this. We often
accidentally pass the flag more than once (but always with the same
value) which results in an error, after which compilation fails:

```
clang (LLVM option parsing): for the -scalar-evolution-max-iterations option: may only occur zero or one times!
```

It seems reasonable to allow -scalar-evolution-max-iterations to be
passed more than once. Quoting the [[ http://llvm.org/docs/CommandLine.html#controlling-the-number-of-occurrences-required-and-allowed | documentation ]]:

> The cl::ZeroOrMore modifier ... indicates that your program will allow the option to be specified zero or more times.
> ...
> If an option is specified multiple times for an option of the cl::opt class, only the last value will be retained.

Original patch by: Enrico Bern Hardy Tanuwidjaja <etanuwid@fb.com>

Differential Revision: https://reviews.llvm.org/D67512

llvm-svn: 372346
2019-09-19 18:21:32 +00:00
Francesco Petrogalli cb032aa2c7 [SVFS] Vector Function ABI demangling.
This patch implements the demangling functionality as described in the
Vector Function ABI. This patch will be used to implement the
SearchVectorFunctionSystem (SVFS) as described in the RFC:

http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html

A fuzzer is added to test the demangling utility.

Patch by Sumedh Arani <sumedh.arani@arm.com>

Differential revision: https://reviews.llvm.org/D66024

llvm-svn: 372343
2019-09-19 17:47:32 +00:00
Bardia Mahjour db800c267d Data Dependence Graph Basics
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.

The algorithm for building the graph involves the following steps in order:

  1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
  2. For each node in the graph establish def-use edges to/from other nodes in the graph.
  3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur, fhahn, myhsu

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto

Tag: #llvm

Differential Revision: https://reviews.llvm.org/D65350

llvm-svn: 372238
2019-09-18 17:43:45 +00:00
Jay Foad c92e51d84b [SDA] Don't stop divergence propagation at the IPD.
Summary:
This fixes B42473 and B42706.

This patch makes the SDA propagate branch divergence until the end of the RPO traversal. Before, the SyncDependenceAnalysis propagated divergence only until the IPD in rpo order. RPO is incompatible with post dominance in the presence of loops. This made the SDA crash because blocks were missed in the propagation.

Reviewers: foad, nhaehnle

Reviewed By: foad

Subscribers: jvesely, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65274

llvm-svn: 372223
2019-09-18 13:40:22 +00:00
Bardia Mahjour 6476d7cf0b Revert "Data Dependence Graph Basics"
This reverts commit c98ec60993, which broke the sphinx-docs build.

llvm-svn: 372168
2019-09-17 19:22:01 +00:00
Bardia Mahjour c98ec60993 Data Dependence Graph Basics
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.

The algorithm for building the graph involves the following steps in order:

  1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
  2. For each node in the graph establish def-use edges to/from other nodes in the graph.
  3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur, fhahn, myhsu

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto

Tag: #llvm

Differential Revision: https://reviews.llvm.org/D65350

llvm-svn: 372162
2019-09-17 18:55:44 +00:00
Alina Sbirlea 4e9082ef95 [MemorySSA] Fix phi insertion when inserting a def.
Summary:
When inserting a Def, the current algorithm is walking edges backward
and inserting new Phis where needed. There may be additional Phis needed
in the IDF of the newly inserted Def and Phis.
Adding Phis in the IDF of the Def was added ina  previous patch, but we
may also need other Phis in the IDF of the newly added Phis.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67637

llvm-svn: 372138
2019-09-17 16:33:35 +00:00
Alina Sbirlea 6b2d1346d8 [MemorySSA] Update MSSA for non-conventional AA.
Summary:
Regularly when moving an instruction that may not read or write memory,
the instruction is not modelled in MSSA, so not action is necessary.
For a non-conventional AA pipeline, MSSA needs to explicitly check when
creating accesses, so as to not model instructions that may not read and
write memory.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67562

llvm-svn: 372137
2019-09-17 16:31:37 +00:00
Simon Pilgrim c52a7093df InterleavedAccessInfo - Don't dereference a dyn_cast result. NFCI.
llvm-svn: 372117
2019-09-17 13:25:56 +00:00
David Bolvansky be2487a2ba [InstCombine] Annotate strdup with deref_or_null
llvm-svn: 372098
2019-09-17 10:12:48 +00:00
Guillaume Chatelet 98cb8db836 [NFC] remove unused functions
Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67616

llvm-svn: 371994
2019-09-16 14:48:58 +00:00
Roman Lebedev 9c5a4a4527 [InstSimplify] simplifyUnsignedRangeCheck(): handle few tautological cases (PR43251)
Summary:
This is split off from D67356, since these cases produce a constant,
no real need to keep them in instcombine.

Alive proofs:
https://rise4fun.com/Alive/u7Fk
https://rise4fun.com/Alive/4lV

https://bugs.llvm.org/show_bug.cgi?id=43251

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67498

llvm-svn: 371921
2019-09-14 13:47:27 +00:00
Philip Reames bdf608477e [SCEV] Add smin support to getRangeRef
We were failing to compute trip counts (both exact and maximum) for any loop which involved a comparison against either an umin or smin. It looks like this simply got missed when we added smin/umin to SCEV.  (Note: umin was submitted separately earlier today.  Turned out two folks hit this at the same time.)

Differential Revision: https://reviews.llvm.org/D67514

llvm-svn: 371776
2019-09-12 21:32:27 +00:00
Evandro Menezes 08df6e64d5 [ConstantFolding] Expand folding of some library functions
Expanding the folding of `nearbyint()`, `rint()` and `trunc()` to library
functions, in addition to the current support for intrinsics.

Differential revision: https://reviews.llvm.org/D67468

llvm-svn: 371774
2019-09-12 21:23:22 +00:00
Florian Hahn a31ee37624 [SCEV] Support SCEVUMinExpr in getRangeRef.
This patch adds support for SCEVUMinExpr to getRangeRef,
similar to the support for SCEVUMaxExpr.

Reviewers: sanjoy.google, efriedma, reames, nikic

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D67177

llvm-svn: 371768
2019-09-12 20:03:32 +00:00
Alina Sbirlea 18f5204db4 [LICM/AST] Check if the AliasAny set is removed from the tracker.
Summary:
Resolves PR38513.
Credit to @bjope for debugging this.

Reviewers: hfinkel, uabelho, bjope

Subscribers: sanjoy.google, bjope, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67417

llvm-svn: 371752
2019-09-12 18:09:47 +00:00
Philip Reames b90f94f42e [LV] Support invariant addresses in speculation logic
Implement a TODO from rL371452, and handle loop invariant addresses in predicated blocks. If we can prove that the load is safe to speculate into the header, then we can avoid using a masked.load in favour of a normal load.

This is mostly about vectorization robustness. In the common case, it's generally expected that LICM/LoadStorePromotion would have eliminated such loads entirely.

Differential Revision: https://reviews.llvm.org/D67372

llvm-svn: 371745
2019-09-12 16:49:10 +00:00
Sanjay Patel 3f5a808365 [ConstProp] allow folding for fma that produces NaN
Folding for fma/fmuladd was added here:
rL202914
...and as seen in existing/unchanged tests, that works to propagate NaN
if it's already an input, but we should fold an fma() that creates NaN too.

From IEEE-754-2008 7.2 "Invalid Operation", there are 2 clauses that apply
to fma, so I added tests for those patterns:

  c) fusedMultiplyAdd: fusedMultiplyAdd(0, ∞, c) or fusedMultiplyAdd(∞, 0, c)
     unless c is a quiet NaN; if c is a quiet NaN then it is implementation
     defined whether the invalid operation exception is signaled
  d) addition or subtraction or fusedMultiplyAdd: magnitude subtraction of
     infinities, such as: addition(+∞, −∞)

Differential Revision: https://reviews.llvm.org/D67446

llvm-svn: 371735
2019-09-12 14:10:50 +00:00
Roman Lebedev f1286621eb [InstSimplify] simplifyUnsignedRangeCheck(): handle more cases (PR43251)
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.

This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.

The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns..

Proofs: https://rise4fun.com/Alive/21b

Also, there is something odd with `isKnownNonZero()`, if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267)

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67411

llvm-svn: 371718
2019-09-12 09:26:17 +00:00
Evandro Menezes ed5f452645 [ConstantFolding] Refactor math functions to use LLVM ones (NFC)
When possible, replace calls to library routines on the host with equivalent
ones in LLVM.

Differential revision: https://reviews.llvm.org/D67459

llvm-svn: 371677
2019-09-11 21:46:57 +00:00
Roman Lebedev 00c1ee48e4 [InstSimplify] Pass SimplifyQuery into simplifyUnsignedRangeCheck() and use it for isKnownNonZero()
This was actually the original intention in D67332,
but i messed up and forgot about it.
This patch was originally part of D67411, but precommitting this.

llvm-svn: 371630
2019-09-11 15:32:46 +00:00
Tim Renouf c26b3940c3 [TLI][AMDGPU] AMDPAL does not have library functions
Configure TLI to say that r600/amdgpu does not have any library
functions, such that InstCombine does not do anything like turn sin/cos
into the library function @tan with sufficient fast math flags.

Differential Revision: https://reviews.llvm.org/D67406

Change-Id: I02f907d3e64832117ea9800e9f9285282856e5df
llvm-svn: 371592
2019-09-11 07:26:39 +00:00
Alina Sbirlea f7b4022db1 [MemorySSA] Do not create memoryaccesses for debug info intrinsics.
Summary:
Do not model debuginfo intrinsics in MemorySSA.
Regularly these are non-memory modifying instructions. With -disable-basicaa, they were being modelled as Defs.

Reviewers: george.burgess.iv

Subscribers: aprantl, Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67307

llvm-svn: 371565
2019-09-10 22:35:27 +00:00
Philip Reames cffa630c80 [Loads] Move generic code out of vectorizer into a location it might be reused [NFC]
llvm-svn: 371558
2019-09-10 21:33:53 +00:00
Philip Reames 1e1db80048 [ValueTracking] Factor our common speculation suppression logic [NFC]
Expose a utility function so that all places which want to suppress speculation (when otherwise legal) due to ordering and/or sanitizer interaction can do so.

llvm-svn: 371556
2019-09-10 21:12:29 +00:00
Guozhi Wei b329e0728b [BPI] Adjust the probability for floating point unordered comparison
Since NaN is very rare in normal programs, so the probability for floating point unordered comparison should be extremely small. Current probability is 3/8, it is too large, this patch changes it to a tiny number.

Differential Revision: https://reviews.llvm.org/D65303

llvm-svn: 371541
2019-09-10 17:25:11 +00:00
Roman Lebedev 6e2c5c8710 [InstSimplify] simplifyUnsignedRangeCheck(): if we know that X != 0, handle more cases (PR43246)
Summary:
This is motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

In this particular case, given
```
char* test(char& base, unsigned long offset) {
  return &base + offset;
}
```
it will end up producing something like
https://godbolt.org/z/LK5-iH
which after optimizations reduces down to roughly
```
define i1 @t0(i8* nonnull %base, i64 %offset) {
  %base_int = ptrtoint i8* %base to i64
  %adjusted = add i64 %base_int, %offset
  %non_null_after_adjustment = icmp ne i64 %adjusted, 0
  %no_overflow_during_adjustment = icmp uge i64 %adjusted, %base_int
  %res = and i1 %non_null_after_adjustment, %no_overflow_during_adjustment
  ret i1 %res
}
```
Without D67122 there was no `%non_null_after_adjustment`,
and in this particular case we can get rid of the overhead:

Here we add some offset to a non-null pointer,
and check that the result does not overflow and is not a null pointer.
But since the base pointer is already non-null, and we check for overflow,
that overflow check will already catch the null pointer,
so the separate null check is redundant and can be dropped.

Alive proofs:
https://rise4fun.com/Alive/WRzq

There are more patterns of "unsigned-add-with-overflow", they are not handled here,
but this is the main pattern, that we currently consider canonical,
so it makes sense to handle it.

https://bugs.llvm.org/show_bug.cgi?id=43246

Reviewers: spatel, nikic, vsk

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits, reames

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67332

llvm-svn: 371349
2019-09-08 20:14:15 +00:00
Bjorn Pettersson 5e331e4ce8 [Intrinsic] Add the llvm.umul.fix.sat intrinsic
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.

This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.

Patch by: leonardchan, bjope

Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel

Reviewed By: leonardchan

Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57836

llvm-svn: 371308
2019-09-07 12:16:14 +00:00
Nikita Popov fdc6977ff3 [LVI] Look through extractvalue of insertvalue
This addresses the issue mentioned on D19867. When we simplify
with.overflow instructions in CVP, we leave behind extractvalue
of insertvalue sequences that LVI no longer understands. This
means that we can not simplify any instructions based on the
with.overflow anymore (until some over pass like InstCombine
cleans them up).

This patch extends LVI extractvalue handling by calling
SimplifyExtractValueInst (which doesn't do anything more than
constant folding + looking through insertvalue) and using the block
value of the simplification.

A possible alternative would be to do something similar to
SimplifyIndVars, where we instead directly try to replace
extractvalue users of the with.overflow. This would need some
additional structural changes to CVP, as it's currently not legal
to remove anything but the current instruction -- we'd have to
introduce a worklist with instructions scheduled for deletion or similar.

Differential Revision: https://reviews.llvm.org/D67035

llvm-svn: 371306
2019-09-07 12:03:59 +00:00
Teresa Johnson 9c27b59cec Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.

Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.

There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.

Reviewers: chandlerc, hfinkel

Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66428

llvm-svn: 371284
2019-09-07 03:09:36 +00:00
Evandro Menezes 7feb812ccd [ConstantFolding] Refactor functions not available before C99 (NFC)
Note the cases when calling a function at compile time may fail if the host
does not support the C99 run time library.

llvm-svn: 371236
2019-09-06 18:24:21 +00:00
Evandro Menezes 6f1369755d [ConstantFolding] Refactor function match for better speed (NFC)
Use an `enum` instead of string comparison to match the candidate function.

llvm-svn: 371228
2019-09-06 16:49:49 +00:00
Guillaume Chatelet 33671ceffa [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67223

llvm-svn: 371063
2019-09-05 13:09:42 +00:00
Alina Sbirlea 6da79ce1fe [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370957
2019-09-04 19:16:04 +00:00
Philip Reames 4228245e41 [NFC] Switch last couple of invariant_load checks to use hasMetadata
llvm-svn: 370948
2019-09-04 18:27:31 +00:00
Evandro Menezes 3b705ef712 [TargetLibraryInfo] Define enumerator for no library function (NFC)
Add a null enumerator do designate no library function.

llvm-svn: 370947
2019-09-04 18:15:58 +00:00
Philip Reames 27820f9909 [Instruction] Add hasMetadata(Kind) helper [NFC]
It's a common idiom, so let's add the obvious wrapper for metadata kinds which are basically booleans.

llvm-svn: 370933
2019-09-04 17:28:48 +00:00
Alina Sbirlea 594f0e0927 [MemorySSA] Move two verify calls under expensive checks.
llvm-svn: 370831
2019-09-04 00:44:54 +00:00
Alina Sbirlea ccb1862bc9 [MemorySSA] Disable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370821
2019-09-03 21:20:46 +00:00
Alina Sbirlea e331d50534 [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370811
2019-09-03 19:28:37 +00:00
Roman Lebedev ff21e3f055 [ConstantFolding] Fix 'undef' folding for @llvm.[us]{add,sub}.with.overflow ops (PR43188)
As we have already established/fixed in
  https://bugs.llvm.org/show_bug.cgi?id=42209
  https://reviews.llvm.org/D63065
  https://reviews.llvm.org/rL363522
the InstSimplify handling for @llvm.with.overflow ops with undefs
is correct. Therefore if ConstantFolding produces different results,
then it is wrong.

This duplication of code hints at the need for some refactoring,
but for now address the brokenness of ConstantFolding by
copying the known-good handling from rL363522.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43188

llvm-svn: 370608
2019-09-01 11:56:52 +00:00
Nikita Popov ac5821395b [LVI] Extract solveBlockValueExtractValue(); NFC
Extract this method in preparation for additional extractvalue
support.

llvm-svn: 370575
2019-08-31 09:58:50 +00:00
Alina Sbirlea 3d03769ba0 [MemorySSA] Rename all phi entries.
When renaming Phis incoming values, there may be multiple edges incoming
from the same block (switch). Rename all.

llvm-svn: 370548
2019-08-30 23:02:53 +00:00
Alina Sbirlea 4b87023bae Revert enabling MemorySSA.
Breaks sanitizers bots.

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370397
2019-08-29 19:01:23 +00:00
Alina Sbirlea 6289ee941d [MemorySSA & LoopPassManager] Enable MemorySSA as loop dependency. Update tests.
Summary:
I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance.
The feedback obtains here will guide the next steps towards enabling this.

This patch enables the use of MemorySSA in the loop pass manager.

Passes that currently use MemorySSA:
 - EarlyCSE
Passes that use MemorySSA after this patch:
 - EarlyCSE
 - LICM
 - SimpleLoopUnswitch
Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch):
 - LoopInstSimplify
 - LoopSimplifyCFG
 - LoopUnswitch
 - LoopRotate
 - LoopSimplify
 - LCSSA
Loop passes that do *not* update MemorySSA:
 - IndVarSimplify
 - LoopDelete
 - LoopIdiom
 - LoopSink
 - LoopUnroll
 - LoopInterchange
 - LoopUnrollAndJam
 - LoopVectorize
 - LoopReroll
 - IRCE

Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry

Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370384
2019-08-29 17:08:13 +00:00
Joerg Sonnenberger 799c96693f Allow replaceAndRecursivelySimplify to list unsimplified visitees.
This is part of D65280 and split it to avoid ABI changes on the 9.0
release branch.

llvm-svn: 370355
2019-08-29 13:22:30 +00:00
Roman Lebedev c584786854 [InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` inverted overflow bit
Summary:
Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow or zero
  %iszero = icmp eq i4 %y, 0
  %umul = smul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %umul.ov.not = xor %umul.ov, -1
  %retval.0 = or i1 %iszero, %umul.ov.not
  ret i1 %retval.0
=>
  %iszero = icmp eq i4 %y, 0
  %umul = smul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %umul.ov.not = xor %umul.ov, -1
  %retval.0 = or i1 %iszero, %umul.ov.not
  ret i1 %umul.ov.not

Done: 1
Optimization is correct!
```
Note that this is inverted from what we have in a previous patch,
here we are looking for the inverted overflow bit.
And that inversion is kinda problematic - given this particular
pattern we neither hoist that `not` closer to `ret` (then the pattern
would have been identical to the one without inversion,
and would have been handled by the previous patch), neither
do the opposite transform. But regardless, we should handle this too.
I've filled [[ https://bugs.llvm.org/show_bug.cgi?id=42720 | PR42720 ]].

Reviewers: nikic, spatel, xbolva00, RKSimon

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65151

llvm-svn: 370351
2019-08-29 12:48:04 +00:00
Roman Lebedev aaf6ab4410 [InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` overflow bit
Summary:
Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow and not zero
  %iszero = icmp ne i4 %y, 0
  %umul = umul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %retval.0 = and i1 %iszero, %umul.ov
  ret i1 %retval.0
=>
  %iszero = icmp ne i4 %y, 0
  %umul = umul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %retval.0 = and i1 %iszero, %umul.ov
  ret %umul.ov

Done: 1
Optimization is correct!
```

Reviewers: nikic, spatel, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65150

llvm-svn: 370350
2019-08-29 12:47:50 +00:00
Roman Lebedev cc95a45f8a [CostModel] Model all `extractvalue`s as free.
Summary:
As disscussed in https://reviews.llvm.org/D65148#1606412,
`extractvalue` don't actually generate any code,
so we should treat them as free.

Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen

Reviewed By: jmolloy

Subscribers: javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66098

llvm-svn: 370339
2019-08-29 11:50:30 +00:00
David Bolvansky 05bda8b4e5 Annotate return values of allocation functions with dereferenceable_or_null
Summary:
Example
define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6
  ret i8* %call
}


Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: aaron.ballman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66651

llvm-svn: 370168
2019-08-28 08:28:20 +00:00
Philip Reames 93a26ec98d [NFC] Assert preconditions and merge all users into one codepath in Loads.cpp
llvm-svn: 370128
2019-08-27 23:36:31 +00:00
Philip Reames 2694522f13 [Loads/SROA] Remove blatantly incorrect code and fix a bug revealed in the process
The code we had isSafeToLoadUnconditionally was blatantly wrong. This function takes a "Size" argument which is supposed to describe the span loaded from. Instead, the code use the size of the pointer passed (which may be unrelated!) and only checks that span. For any Size > LoadSize, this can and does lead to miscompiles.

Worse, the generic code just a few lines above correctly handles the cases which *are* valid. So, let's delete said code.

Removing this code revealed two issues:
1) As noted by jdoerfert the removed code incorrectly handled external globals.  The test update in SROA is to stop testing incorrect behavior.
2) SROA was confusing bytes and bits, but this wasn't obvious as the Size parameter was being essentially ignored anyway.  Fixed.

Differential Revision: https://reviews.llvm.org/D66778

llvm-svn: 370102
2019-08-27 19:34:43 +00:00
Philip Reames 20650eda99 [NFC] Replace the FIXME I added in rL369989 with a comment clarifying the current code
The current approach is restrictive (as all of geps must be multiples of the alignment), but correct.  

llvm-svn: 370013
2019-08-27 04:52:35 +00:00
Alina Sbirlea 228ffac678 [MemorySSA] Fix insertUse.
Actually call the renamePass on inserted Phis.
Fixes PR42940.

Subscribers: llvm-commits
llvm-svn: 369997
2019-08-27 00:34:47 +00:00
Philip Reames 2f858c2e91 Reorganize code and add a fixme to point out a bug in existing code [NFC]
llvm-svn: 369989
2019-08-26 23:57:27 +00:00
Benjamin Kramer d5e60669c4 [TLI] Simplify code. NFCI.
llvm-svn: 369854
2019-08-24 17:30:12 +00:00
Benjamin Kramer 7043477042 Fix some accidental global initializers by using StringLiteral instead of StringRef
llvm-svn: 369850
2019-08-24 15:24:25 +00:00
Johannes Doerfert 22e6e108e1 [BasicAA] Use dereferenceability to reason about aliasing
Summary:
We already use the fact that an object with known size X does not alias
another objection of size Y > X before. With this commit, we use
dereferenceability information to determine a lower bound for Y and not
only rely on the user provided query size.

The result for @global_and_deref_arg_2() and @local_and_deref_ret_2()
in test/Analysis/BasicAA/dereferenceable.ll improved with this patch.

Reviewers: asbirlea, chandlerc, hfinkel, sanjoy

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66157

llvm-svn: 369786
2019-08-23 17:56:10 +00:00
Johannes Doerfert a5b10b464e [MustExec] Add a generic "must-be-executed-context" explorer
Given an instruction I, the MustBeExecutedContextExplorer allows to
easily traverse instructions that are guaranteed to be executed whenever
I is. For now, these instruction have to be statically "after" I, in
the same or different basic blocks.

This patch also adds a pass which prints the must-be-executed-context
for each instruction in a module. It is used to test the
MustBeExecutedContextExplorer, for now on the examples given in the
class comment of the MustBeExecutedIterator.

Differential Revision: https://reviews.llvm.org/D65186

llvm-svn: 369765
2019-08-23 15:17:27 +00:00
Peter Collingbourne 2452d7030b IR. Change strip* family of functions to not look through aliases.
I noticed another instance of the issue where references to aliases were
being replaced with aliasees, this time in InstCombine. In the instance that
I saw it turned out to be only a QoI issue (a symbol ended up being missing
from the symbol table due to the last reference to the alias being removed,
preventing HWASAN from symbolizing a global reference), but it could easily
have manifested as incorrect behaviour.

Since this is the third such issue encountered (previously: D65118, D65314)
it seems to be time to address this common error/QoI issue once and for all
and make the strip* family of functions not look through aliases.

Includes a test for the specific issue that I saw, but no doubt there are
other similar bugs fixed here.

As with D65118 this has been tested to make sure that the optimization isn't
load bearing. I built Clang, Chromium for Linux, Android and Windows as well
as the test-suite and there were no size regressions.

Differential Revision: https://reviews.llvm.org/D66606

llvm-svn: 369697
2019-08-22 19:56:14 +00:00
Alina Sbirlea 7425179fee [LoopPassManager + MemorySSA] Only enable use of MemorySSA for LPMs known to preserve it.
Summary:
Add a flag to the FunctionToLoopAdaptor that allows enabling MemorySSA only for the loop pass managers that are known to preserve it.

If an LPM is known to have only loop transforms that *all* preserve MemorySSA, then use MemorySSA if `EnableMSSALoopDependency` is set.
If an LPM has loop passes that do not preserve MemorySSA, then the flag passed is `false`, regardless of the value of `EnableMSSALoopDependency`.

When using a custom loop pass pipeline via `passes=...`, use keyword `loop` vs `loop-mssa` to use MemorySSA in that LPM. If a loop that does not preserve MemorySSA is added while using the `loop-mssa` keyword, that's an error.

Add the new `loop-mssa` keyword to a few tests where a difference occurs when enabling MemorySSA.

Reviewers: chandlerc

Subscribers: mehdi_amini, Prazek, george.burgess.iv, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66376

llvm-svn: 369548
2019-08-21 17:00:57 +00:00
Guillaume Chatelet 1c18a9cb9e [LLVM][Alignment] Introduce Alignment In MachineFrameInfo
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: jfb

Subscribers: hiraditya, dexonsmith, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65800

llvm-svn: 369531
2019-08-21 14:29:30 +00:00
Alina Sbirlea 2863721f05 [MemorySSA] Make Phi cleanups consistent.
Summary:
Make Phi cleanups consistent: remove self as a trivial Phi and
recurse to potentially remove other trivial phis.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66454

llvm-svn: 369466
2019-08-20 22:47:58 +00:00
Alina Sbirlea 1c528e8f1b [MemorySSA] Fix existing phis when inserting defs.
Summary:
When inserting a new Def, and inserting Phis in the IDF when needed,
also mark the already existing Phis in the IDF as non-optimized, since
these may need fixing as well.
In the test attached, there is a Phi in the IDF that happens to be
trivial, and is wrongfully removed by the call to getLastDef that
follows. This is a valid situation and the existing IDF Phis need to
marked as "may need fixing" as well.
Resolves PR43044.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66495

llvm-svn: 369464
2019-08-20 22:29:06 +00:00
Johannes Doerfert 8b962f2814 [CaptureTracker] Let subclasses provide dereferenceability information
Summary:
CaptureTracker subclasses might have better dereferenceability
information which allows null pointer checks to be no-capturing.
The first user will be D59922.

Reviewers: sanjoy, hfinkel, aykevl, sstefan1, uenoku, xbolva00

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66371

llvm-svn: 369305
2019-08-19 21:56:38 +00:00
Evgeniy Stepanov 55ccd16354 Refactor isPointerOffset (NFC).
Summary:
Simplify the API using Optional<> and address comments in
         https://reviews.llvm.org/D66165

Reviewers: vitalybuka

Subscribers: hiraditya, llvm-commits, ostannard, pcc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66317

llvm-svn: 369300
2019-08-19 21:08:04 +00:00
Alina Sbirlea 1a3fdaf6a6 [MemorySSA] Rename uses when inserting memory uses.
Summary:
When inserting uses from outside the MemorySSA creation, we don't
normally need to rename uses, based on the assumption that there will be
no inserted Phis (if  Def existed that required a Phi, that Phi already
exists). However, when dealing with unreachable blocks, MemorySSA will
optimize away Phis whose incoming blocks are unreachable, and these Phis end
up being re-added when inserting a Use.
There are two potential solutions here:
1. Analyze the inserted Phis and clean them up if they are unneeded
(current method for cleaning up trivial phis does not cover this)
2. Leave the Phi in place and rename uses, the same way as whe inserting
defs.
This patch use approach 2.

Resolves first test in PR42940.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66033

llvm-svn: 369291
2019-08-19 18:57:40 +00:00
Johannes Doerfert 17cb918536 [CaptureTracking] Allow null to be in either icmp operand
Summary:
Before we required the comparison against null to be "canonical", hence
null to be operand #1. This patch allows null to be in either operand,
similar to the handling of loaded globals that follows.

Reviewers: sanjoy, hfinkel, aykevl, sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66321

llvm-svn: 369158
2019-08-16 21:53:49 +00:00
Benjamin Kramer 31a47f9890 Revert "[CallGraph] Refine call graph for indirect calls with !callees metadata"
This reverts commit r369025. Crashes clang, test case is on the mailing
list.

llvm-svn: 369096
2019-08-16 10:59:18 +00:00
Tim Northover 22970d66be AssumptionCache: remove old affected values after RAUW.
If they're left in the cache then they can't be removed efficiently when the
cache is notified to unlink a @llvm.assume call, and that can lead to values
from different functions entirely remaining there.

llvm-svn: 369091
2019-08-16 09:34:27 +00:00
Florian Hahn 75be1a9e58 [ValueTracking] Fix recurrence detection to check both PHI operands.
Summary:
Currently we fail to compute known bits for recurrences where the
first incoming value is the start value of the recurrence.

Instead of exiting the loop when the first incoming value is not
the step of the recurrence, continue to check the second incoming
value.

The original code uses a loop to handle both cases, but incorrectly
exits instead of continuing.

Reviewers: lebedev.ri, spatel, nikic

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66216

llvm-svn: 369088
2019-08-16 09:15:02 +00:00
Evgeniy Stepanov 75344955fc Move isPointerOffset function to ValueTracking (NFC).
Summary: To be reused in MemTag sanitizer.

Reviewers: pcc, vitalybuka, ostannard

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66165

llvm-svn: 369062
2019-08-15 22:58:28 +00:00
Alina Sbirlea 79ff20428e [MemorySSA] Remove restrictive asserts.
The verification I added has overly restrictive asserts.
Unreachable blocks can have any incoming value in practice, after an
update due to a "replaceAllUses" call when the repalced entry is
LiveOnEntry.

llvm-svn: 369050
2019-08-15 21:20:08 +00:00
Florian Hahn 3f2850bc60 [ValueTracking] Look through ptrmask intrinsics during getUnderlyingObject.
Reviewers: nlopes, efriedma, hfinkel, sanjoy, aqjune, jdoerfert

Reviewed By: jdoerfert

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61669

llvm-svn: 369036
2019-08-15 18:39:56 +00:00
Mark Lacey 626ed22fbe [CallGraph] Refine call graph for indirect calls with !callees metadata
For indirect call sites having a small set of possible callees,
!callees metadata can be used to indicate what those callees are.
This patch updates the call graph and lazy call graph analyses so
that they consider this metadata when encountering call sites. For
the call graph, it adds a new external call graph node to the graph
for each unique !callees metadata node. A call graph edge connects
an indirect call site with the external node associated with the
!callees metadata that is attached to it. And there is an edge from
this external node to each of the callees indicated by the metadata.
Similarly, for the lazy call graph, the patch adds Ref edges from a
caller to the possible callees indicated by the metadata.

The primary purpose of the patch is to facilitate iterating over the
functions in a module such that all of the callees indicated by a
given !callees metadata node will be visited prior to the functions
containing call sites annotated by that node. This property is
required by optimizations performing a bottom-up traversal of the
SCC DAG. For example, the inliner can be made to inline through an
indirect call. If the call site is annotated with !callees metadata,
this patch ensures that the inliner will have visited all of the
callees prior to the caller, allowing it to reliably compute the
cost of inlining one or more of the potential callees.

Original patch by @mssimpso. I've made some small changes to get it
to apply, build, and pass tests on the top of tree, as well as
some minor tweaks to formatting and functionality.

Subscribers: mehdi_amini, hiraditya, llvm-commits, mssimpso

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D39339

llvm-svn: 369025
2019-08-15 17:47:53 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Florian Hahn fd72bf21c9 [ValueTracking] Add MustPreserveNullness arg to functions analyzing calls. (NFC)
Some uses of getArgumentAliasingToReturnedPointer and
isIntrinsicReturningPointerAliasingArgumentWithoutCapturing require the
calls/intrinsics to preserve the nullness of the argument.

For alias analysis, the nullness property does not really come into
play.

This patch explicitly sets it to true. In D61669, the alias analysis
uses will be switched to not require preserving nullness.

Reviewers: nlopes, efriedma, hfinkel, sanjoy, aqjune, jdoerfert

Reviewed By: jdoerfert

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64150

llvm-svn: 368993
2019-08-15 12:13:02 +00:00
Philip Reames 7b0515176b [SCEV] Rename getMaxBackedgeTakenCount to getConstantMaxBackedgeTakenCount [NFC]
llvm-svn: 368930
2019-08-14 21:58:13 +00:00
Matt Arsenault dbc1f207fa InferAddressSpaces: Move target intrinsic handling to TTI
I'm planning on handling intrinsics that will benefit from checking
the address space enums. Don't bother moving the address collection
for now, since those won't need th enums.

llvm-svn: 368895
2019-08-14 18:13:00 +00:00
Nikita Popov 2a4f26b4c2 [ValueTracking] Improve reverse assumption inference
Use isGuaranteedToTransferExecutionToSuccessor() instead of
isSafeToSpeculativelyExecute() when seeing whether we can propagate
the information in an assume backwards in isValidAssumeForContext().
The latter is more general - it also allows arbitrary loads/stores -
and is also the condition we want: if our assume is guaranteed to
execute, its condition not holding would be UB.

Original patch by arielb1.

Differential Revision: https://reviews.llvm.org/D37215

llvm-svn: 368723
2019-08-13 17:15:42 +00:00
Stanislav Mekhanoshin 4c9c98f36b [AMDGPU] Printf runtime binding pass
This pass is a port of the according pass from the HSAIL compiler.
It parses printf calls and setup runtime printf buffer.
After that it copies printf arguments to the buffer and fills in
module metadata for runtime.

Differential Revision: https://reviews.llvm.org/D24035

llvm-svn: 368592
2019-08-12 17:12:29 +00:00
Fedor Sergeev 92e160abab [MemDep] allow to select block-scan-limit when constructing MemoryDependenceAnalysis
Introducing non-global control for default block-scan-limit in MemDep analysis.
Useful when there are many compilations per initialized LLVM instance (e.g. JIT).

Reviewed By: asbirlea
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65806

llvm-svn: 368502
2019-08-10 01:23:38 +00:00
Whitney Tsang dd3b6498b0 Title: Loop Cache Analysis
Summary: Implement a new analysis to estimate the number of cache lines
required by a loop nest.
The analysis is largely based on the following paper:

Compiler Optimizations for Improving Data Locality
By: Steve Carr, Katherine S. McKinley, Chau-Wen Tseng
http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf
The analysis considers temporal reuse (accesses to the same memory
location) and spatial reuse (accesses to memory locations within a cache
line). For simplicity the analysis considers memory accesses in the
innermost loop in a loop nest, and thus determines the number of cache
lines used when the loop L in loop nest LN is placed in the innermost
position.

The result of the analysis can be used to drive several transformations.
As an example, loop interchange could use it determine which loops in a
perfect loop nest should be interchanged to maximize cache reuse.
Similarly, loop distribution could be enhanced to take into
consideration cache reuse between arrays when distributing a loop to
eliminate vectorization inhibiting dependencies.

The general approach taken to estimate the number of cache lines used by
the memory references in the inner loop of a loop nest is:

Partition memory references that exhibit temporal or spatial reuse into
reference groups.
For each loop L in the a loop nest LN: a. Compute the cost of the
reference group b. Compute the 'cache cost' of the loop nest by summing
up the reference groups costs
For further details of the algorithm please refer to the paper.
Authored By: etiotto
Reviewers: hfinkel, Meinersbur, jdoerfert, kbarton, bmahjour, anemet,
fhahn
Reviewed By: Meinersbur
Subscribers: reames, nemanjai, MaskRay, wuzish, Hahnfeld, xusx595,
venkataramanan.kumar.llvm, greened, dmgreen, steleman, fhahn, xblvaOO,
Whitney, mgorny, hiraditya, mgrang, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63459

llvm-svn: 368439
2019-08-09 13:56:29 +00:00
Craig Topper 66c08430f6 [ValueTracking] When calculating known bits for integer abs, make sure we're looking at a negate and not just any instruction with the nsw flag set.
The matchSelectPattern code can match patterns like (x >= 0) ? x : -x
for absolute value. But it can also match ((x-y) >= 0) ? (x-y) : (y-x).
If the latter form was matched we can only use the nsw flag if its
set on both subtracts.

This match makes sure we're looking at the former case only.

Differential Revision: https://reviews.llvm.org/D65692

llvm-svn: 368195
2019-08-07 18:28:16 +00:00
Nikolai Bozhenov 03edcd68dd [SCEV] Return zero from computeConstantDifference(X, X)
Without this patch computeConstantDifference returns None for cases like
these:

  computeConstantDifference(%x, %x)
  computeConstantDifference({%x,+,16}, {%x,+,16})

Differential Revision: https://reviews.llvm.org/D65474

llvm-svn: 368193
2019-08-07 17:38:38 +00:00
Alex Lorenz 8d5c280316 TLI: darwin does not support _bcmp
Not all Darwin targets support _bcmp in all circumstances.

Differential Revision: https://reviews.llvm.org/D65834

llvm-svn: 368113
2019-08-07 00:03:37 +00:00
Fangrui Song cb4327d7db Change two unnecessary uses of llvm::size(C) to C.size()
llvm-svn: 368011
2019-08-06 10:24:36 +00:00
David Bolvansky ef72cded32 [TLI][NFC] Fixed typo
llvm-svn: 367827
2019-08-05 10:14:09 +00:00
Fangrui Song d9b948b6eb Rename F_{None,Text,Append} to OF_{None,Text,Append}. NFC
F_{None,Text,Append} are kept for compatibility since r334221.

llvm-svn: 367800
2019-08-05 05:43:48 +00:00
Sanjay Patel 9ce5f41851 [InstCombine] fold cmp+select using select operand equivalence
As discussed in PR42696:
https://bugs.llvm.org/show_bug.cgi?id=42696
...but won't help that case yet.

We have an odd situation where a select operand equivalence fold was
implemented in InstSimplify when it could have been done more generally
in InstCombine if we allow dropping of {nsw,nuw,exact} from a binop operand.

Here's an example:
https://rise4fun.com/Alive/Xplr

  %cmp = icmp eq i32 %x, 2147483647
  %add = add nsw i32 %x, 1
  %sel = select i1 %cmp, i32 -2147483648, i32 %add
  =>
  %sel = add i32 %x, 1

I've left the InstSimplify code in place for now, but my guess is that we'd
prefer to remove that as a follow-up to save on code duplication and
compile-time.

Differential Revision: https://reviews.llvm.org/D65576

llvm-svn: 367695
2019-08-02 17:39:32 +00:00
Hideki Saito 09fac2450b [LV] Avoid building interleaved group in presence of WAW dependency
Reviewers: hsaito, Ayal, fhahn, anna, mkazantsev

Reviewed By: hsaito

Patch by evrevnov, thanks!

Differential Revision: https://reviews.llvm.org/D63981

llvm-svn: 367654
2019-08-02 06:31:50 +00:00
Roman Lebedev 081e990d08 [IR] Value: add replaceUsesWithIf() utility
Summary:
While there is always a `Value::replaceAllUsesWith()`,
sometimes the replacement needs to be conditional.

I have only cleaned a few cases where `replaceUsesWithIf()`
could be used, to both add test coverage,
and show that it is actually useful.

Reviewers: jdoerfert, spatel, RKSimon, craig.topper

Reviewed By: jdoerfert

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, aheejin, george.burgess.iv, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65528

llvm-svn: 367548
2019-08-01 12:32:08 +00:00
Alina Sbirlea 7153f2784c [SCCP] Update condition to avoid overflow.
Summary:
Update condition to remove addition that may cause an overflow.
Resolves PR42814.

Reviewers: sanjoy, RKSimon

Subscribers: jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65417

llvm-svn: 367461
2019-07-31 18:22:22 +00:00
Alina Sbirlea 63e97fa0b3 [MemorySSA] Add additional verification for phis.
Summary:
Verify that the incoming defs into phis are the last defs from the
respective incoming blocks.
When moving blocks, insertDef must RenameUses. Adding this verification
makes GVNHoist tests fail that uncovered this issue.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63147

llvm-svn: 367451
2019-07-31 17:41:04 +00:00
Florian Hahn 189efe295b Recommit "[GVN] Preserve loop related analysis/canonical forms."
This fixes some pipeline tests.
This reverts commit d0b6f42936.

llvm-svn: 367401
2019-07-31 09:27:54 +00:00
Alina Sbirlea 4bc625cae0 [MemorySSA] Extend allowed behavior for simplified instructions.
Summary:
LoopRotate may simplify instructions, leading to the new instructions not having memory accesses created for them.
Allow this behavior, by allowing the new access to be null when the template is null, and looking upwards for the proper defined access when dealing with simplified instructions.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65338

llvm-svn: 367352
2019-07-30 20:10:33 +00:00
Hideto Ueno 6e2be4eab3 [FunctionAttrs] Annotate "willreturn" for AssumeLikeInst
Summary:
In D37215, AssumeLikeInstruction are regarded as `willreturn`. In this patch, annotation is added to those which don't have `willreturn` now(`sideeffect, object_size, experimental_widenable_condition`).

Reviewers: jdoerfert, nikic, sstefan1

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65455

llvm-svn: 367342
2019-07-30 18:35:29 +00:00
Florian Hahn d0b6f42936 Revert [GVN] Preserve loop related analysis/canonical forms.
This reverts r367332 (git commit 2d7227ec3a)

llvm-svn: 367335
2019-07-30 17:04:58 +00:00
Florian Hahn 2d7227ec3a [GVN] Preserve loop related analysis/canonical forms.
LoopInfo can be easily preserved by passing it to the functions that
modify the CFG (SplitCriticalEdge and MergeBlockIntoPredecessor.
SplitCriticalEdge also preserves LoopSimplify and LCSSA form when when passing in
LoopInfo. The test case shows that we preserve LoopSimplify and
LoopInfo. Adding addPreservedID(LCSSAID) did not preserve LCSSA for some
reason.

Also I am not sure if it is possible to preserve those in the new pass
manager, as they aren't analysis passes.

Reviewers: reames, hfinkel, davide, jdoerfert

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D65137

llvm-svn: 367332
2019-07-30 16:43:39 +00:00
Hideto Ueno 98d281a99f [ValueTracking] Remove volatile check in isGuaranteedToTransferExecutionToSuccessor
Summary: As clarified in D53184, volatile load and store do not trap. Therefore, we should remove volatile checks for instructions in  `isGuaranteedToTransferExecutionToSuccessor`.

Reviewers: jdoerfert, efriedma, nikic

Reviewed By: nikic

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65375

llvm-svn: 367226
2019-07-29 13:35:34 +00:00
Jay Foad dcb7532479 [DivergenceAnalysis] Add methods for querying divergence at use
Summary:
The existing isDivergent(Value) methods query whether a value is
divergent at its definition. However even if a value is uniform at its
definition, a use of it in another basic block can be divergent because
of divergent control flow between the def and the use.

This patch adds new isDivergent(Use) methods to DivergenceAnalysis,
LegacyDivergenceAnalysis and GPUDivergenceAnalysis.

This might allow D63953 or other similar workarounds to be removed.

Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue

Reviewed By: nhaehnle

Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65141

llvm-svn: 367218
2019-07-29 10:22:09 +00:00
Whitney Tsang 8ee361ebe5 [LOOPINFO] Introduce the loop guard API.
Summary:
This is the first patch for the loop guard. We introduced
getLoopGuardBranch() and isGuarded().
This currently only works on simplified loop, as it requires a preheader
and a latch to identify the guard.
It will work on loops of the form:
/// GuardBB:
///   br cond1, Preheader, ExitSucc <== GuardBranch
/// Preheader:
///   br Header
/// Header:
///  ...
///   br Latch
/// Latch:
///   br cond2, Header, ExitBlock
/// ExitBlock:
///   br ExitSucc
/// ExitSucc:
Prior discussions leading upto the decision to introduce the loop guard
API: http://lists.llvm.org/pipermail/llvm-dev/2019-May/132607.html
Reviewer: reames, kbarton, hfinkel, jdoerfert, Meinersbur, dmgreen
Reviewed By: reames
Subscribers: wuzish, hiraditya, jsji, llvm-commits, bmahjour, etiotto
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63885

llvm-svn: 367033
2019-07-25 16:13:18 +00:00
Jay Foad 565c54320e [InstSimplify] Rename SimplifyFPUnOp and SimplifyFPBinOp
Summary:
SimplifyFPBinOp is a variant of SimplifyBinOp that lets you specify
fast math flags, but the name is misleading because both functions
can simplify both FP and non-FP ops. Instead, overload SimplifyBinOp
so that you can optionally specify fast math flags.

Likewise for SimplifyFPUnOp.

Reviewers: spatel

Reviewed By: spatel

Subscribers: xbolva00, cameron.mcinally, eraman, hiraditya, haicheng, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64902

llvm-svn: 366902
2019-07-24 12:50:10 +00:00
Peter Collingbourne 710605c085 Analysis: Don't look through aliases when simplifying GEPs.
It is not safe in general to replace an alias in a GEP with its aliasee
if the alias can be replaced with another definition (i.e. via strong/weak
resolution (linkonce_odr) or via symbol interposition (default visibility
in ELF)) while the aliasee cannot. An example of how this can go wrong is
in the included test case.

I was concerned that this might be a load-bearing misoptimization (it's
possible for us to use aliases to share vtables between base and derived
classes, and on Windows, vtable symbols will always be aliases in RTTI
mode, so this change could theoretically inhibit trivial devirtualization
in some cases), so I built Chromium for Linux and Windows with and without
this change. The file sizes of the resulting binaries were identical, so it
doesn't look like this is going to be a problem.

Differential Revision: https://reviews.llvm.org/D65118

llvm-svn: 366754
2019-07-22 22:13:46 +00:00
Michael Liao 17a8a9277c [LAA] Re-check bit-width of pointers after stripping.
Summary:
- As the pointer stripping now tracks through `addrspacecast`, prepare
  to handle the bit-width difference from the result pointer.

Reviewers: jdoerfert

Subscribers: jvesely, nhaehnle, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64928

llvm-svn: 366470
2019-07-18 17:30:27 +00:00
Chen Zheng c38e3efe27 [SCEV] add no wrap flag for SCEVAddExpr.
Differential Revision: https://reviews.llvm.org/D64868

llvm-svn: 366419
2019-07-18 09:23:19 +00:00
Evgeniy Stepanov 6abd78cc7c Make DT a transitive dependency of LI.
Summary:
LoopInfoWrapperPass::verify uses DT, which means DT must be alive
even if it has no direct users.

Fixes a crash in expensive checks mode.

Reviewers: pcc, leonardchan

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64896

llvm-svn: 366388
2019-07-17 23:31:59 +00:00
Evgeniy Stepanov d752f5e953 Basic codegen for MTE stack tagging.
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.

Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.

Differential Revision: https://reviews.llvm.org/D64172

llvm-svn: 366360
2019-07-17 19:24:02 +00:00
Daniil Fukalov d912a9ba9b [AMDGPU] Tune inlining parameters for AMDGPU target
Summary:
Since the target has no significant advantage of vectorization,
vector instructions bous threshold bonus should be optional.

amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value tuned then respectively.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64642

llvm-svn: 366348
2019-07-17 16:51:29 +00:00
Michael Liao 543ba4e9e0 [InstructionSimplify] Apply sext/trunc after pointer stripping
Summary:
- As the pointer stripping could trace through `addrspacecast` now, need
  to sext/trunc the offset to ensure it has the same width as the
  pointer after stripping.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64768

llvm-svn: 366162
2019-07-16 01:03:06 +00:00
Nick Desaulniers c4f245b40a [LoopUnroll+LoopUnswitch] do not transform loops containing callbr
Summary:
There is currently a correctness issue when unrolling loops containing
callbr's where their indirect targets are being updated correctly to the
newly created labels, but their operands are not.  This manifests in
unrolled loops where the second and subsequent copies of callbr
instructions have blockaddresses of the label from the first instance of
the unrolled loop, which would result in nonsensical runtime control
flow.

For now, conservatively do not unroll the loop.  In the future, I think
we can pursue unrolling such loops provided we transform the cloned
callbr's operands correctly.

Such a transform and its legalities are being discussed in:
https://reviews.llvm.org/D64101

Link: https://bugs.llvm.org/show_bug.cgi?id=42489
Link: https://groups.google.com/forum/#!topic/clang-built-linux/z-hRWP9KqPI

Reviewers: fhahn, hfinkel, efriedma

Reviewed By: fhahn, hfinkel, efriedma

Subscribers: efriedma, hiraditya, zzheng, dmgreen, llvm-commits, pirama, kees, nathanchance, E5ten, craig.topper, chandlerc, glider, void, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64368

llvm-svn: 366130
2019-07-15 21:16:29 +00:00
Johannes Doerfert 2d63fbb7b1 [ValueTracking] Look through constant Int2Ptr/Ptr2Int expressions
Summary:
This is analogous to the int2ptr/ptr2int instruction handling introduced
in D54956.

Reviewers: fhahn, efriedma, spatel, nlopes, sanjoy, lebedev.ri

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64708

llvm-svn: 366036
2019-07-15 03:24:35 +00:00
Vitaly Buka b1bff76e22 isBytewiseValue checks ConstantVector element by element
Summary: Vector of the same value with few undefs will sill be considered "Bytewise"

Reviewers: eugenis, pcc, jfb

Reviewed By: jfb

Subscribers: dexonsmith, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64031

llvm-svn: 365971
2019-07-12 22:37:55 +00:00
Alina Sbirlea db101864bd [MemorySSA] Use SetVector to avoid nondeterminism.
Summary:
Use a SetVector for DeadBlockSet.
Resolves PR42574.

Reviewers: george.burgess.iv, uabelho, dblaikie

Subscribers: jlebar, Prazek, mgrang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64601

llvm-svn: 365970
2019-07-12 22:30:30 +00:00
Fangrui Song b251cc0d91 Delete dead stores
llvm-svn: 365903
2019-07-12 14:58:15 +00:00
Vitaly Buka 52096ee9a9 Return Undef from isBytewiseValue for empty arrays or structs
Reviewers: pcc, eugenis

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64052

llvm-svn: 365864
2019-07-12 02:23:07 +00:00
Vitaly Buka c559e63798 Handle IntToPtr in isBytewiseValue
Summary:
This helps with more efficient use of memset for pattern initialization

From @pcc prototype for -ftrivial-auto-var-init=pattern optimizations

Binary size change on CTMark, (with -fuse-ld=lld -Wl,--icf=all, similar results with default linker options)
```
                   master           patch      diff
Os           8.238864e+05    8.238864e+05       0.0
O3           1.054797e+06    1.054797e+06       0.0
Os zero      8.292384e+05    8.292384e+05       0.0
O3 zero      1.062626e+06    1.062626e+06       0.0
Os pattern   8.579712e+05    8.338048e+05 -0.030299
O3 pattern   1.090502e+06    1.067574e+06 -0.020481
```

Zero vs Pattern on master
```
               zero       pattern      diff
Os     8.292384e+05  8.579712e+05  0.036578
O3     1.062626e+06  1.090502e+06  0.025124
```

Zero vs Pattern with the patch
```
               zero       pattern      diff
Os     8.292384e+05  8.338048e+05  0.003333
O3     1.062626e+06  1.067574e+06  0.003193
```

Reviewers: pcc, eugenis

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D63967

llvm-svn: 365858
2019-07-12 01:42:03 +00:00
Tim Northover 67828edbbd OpaquePtr: switch to GlobalValue::getValueType in a few places. NFC.
llvm-svn: 365770
2019-07-11 13:13:02 +00:00
Tim Northover 030bb3d363 InstructionSimplify: Simplify InstructionSimplify. NFC.
The interface predates CallBase, so both it and implementation were
significantly more complicated than they needed to be. There was even
some redundancy that could be eliminated.

Should also help with OpaquePointers by not trying to derive a
function's type from it's PointerType.

llvm-svn: 365767
2019-07-11 13:11:44 +00:00
Chen Zheng 627095ec5b [SCEV] teach SCEV symbolical execution about overflow intrinsics folding.
Differential Revision: https://reviews.llvm.org/D64422

llvm-svn: 365726
2019-07-11 02:18:22 +00:00
Johannes Doerfert 3ed286a388 Replace three "strip & accumulate" implementations with a single one
This patch replaces the three almost identical "strip & accumulate"
implementations for constant pointer offsets with a single one,
combining the respective functionalities. The old interfaces are kept
for now.

Differential Revision: https://reviews.llvm.org/D64468

llvm-svn: 365723
2019-07-11 01:14:48 +00:00
Vitaly Buka d03bd1db59 NFC: Pass DataLayout into isBytewiseValue
Summary:
We will need to handle IntToPtr which I will submit in a separate patch as it's
not going to be NFC.

Reviewers: eugenis, pcc

Reviewed By: eugenis

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D63940

llvm-svn: 365709
2019-07-10 22:53:52 +00:00
Jinsong Ji 06fef0b359 Revert "[HardwareLoops] NFC - move hardware loop checking code to isHardwareLoopProfitable()"
This reverts commit d955573065.

llvm-svn: 365520
2019-07-09 17:53:09 +00:00
Chen Zheng d955573065 [HardwareLoops] NFC - move hardware loop checking code to isHardwareLoopProfitable()
Differential Revision: https://reviews.llvm.org/D64197

llvm-svn: 365497
2019-07-09 14:56:17 +00:00
Tim Northover 60afa49abe OpaquePtr: add Type parameter to Loads analysis API.
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.

Most callers had an obvious memory operation handy to provide this type, but a
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.

llvm-svn: 365468
2019-07-09 11:35:35 +00:00
Denis Bakhvalov 74be349bcf [SCEV] Fix for PR42397. SCEVExpander wrongly adds nsw to shl instruction.
Change-Id: I76c9f628c092ae3e6e78ebdaf55cec726e25d692
llvm-svn: 365363
2019-07-08 18:03:43 +00:00
Brian Homerding b4b21d807e Add, and infer, a nofree function attribute
This patch adds a function attribute, nofree, to indicate that a function does
not, directly or indirectly, call a memory-deallocation function (e.g., free,
C++'s operator delete).

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D49165

llvm-svn: 365336
2019-07-08 15:57:56 +00:00
Eugene Leviant 3aef35288b [ThinLTO] Attempt to recommit r365188 after alignment fix
llvm-svn: 365215
2019-07-05 15:25:05 +00:00
Eugene Leviant e91f86f0ac Reverted r365188 due to alignment problems on i686-android
llvm-svn: 365206
2019-07-05 13:26:05 +00:00
Eugene Leviant 820cc01d1e [ThinLTO] Attempt to recommit r365040 after caching fix
It's possible that some function can load and store the same
variable using the same constant expression:

store %Derived* @foo, %Derived** bitcast (%Base** @bar to %Derived**)
%42 = load %Derived*, %Derived** bitcast (%Base** @bar to %Derived**)

The bitcast expression was mistakenly cached while processing loads,
and never examined later when processing store. This caused @bar to
be mistakenly treated as read-only variable. See load-store-caching.ll.

llvm-svn: 365188
2019-07-05 12:00:10 +00:00
Reid Kleckner f7e52fbdb5 Revert [ThinLTO] Optimize writeonly globals out
This reverts r365040 (git commit 5cacb91475)

Speculatively reverting, since this appears to have broken check-lld on
Linux. Partial analysis in https://crbug.com/981168.

llvm-svn: 365097
2019-07-04 00:03:30 +00:00
Evgeniy Stepanov 50dc28b556 Teach ValueTracking that aarch64.irg result aliases its input.
Reviewers: javed.absar, olista01

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64103

llvm-svn: 365079
2019-07-03 20:19:14 +00:00
Philip Reames 39e7a97ad7 [SCEV] Preserve flags on add/muls in getSCEVATScope
We haven't changed the set of users, just specialized an operand for those users.  Given that, the previous wrap flags must still be correct.

Sorry for the lack of test case.  Noticed this while working on something else, and haven't figured out to exercise this standalone.

llvm-svn: 365053
2019-07-03 16:34:08 +00:00
Eugene Leviant 5cacb91475 [ThinLTO] Optimize writeonly globals out
Differential revision: https://reviews.llvm.org/D63444

llvm-svn: 365040
2019-07-03 14:14:52 +00:00
Eugene Leviant ac407a7b4a [SCEV][LSR] Prevent using undefined value in binops
On some occasions ReuseOrCreateCast may convert previously
expanded value to undefined. That value may be passed by
SCEVExpander as an argument to InsertBinop making IV chain
undefined.

Differential revision: https://reviews.llvm.org/D63928 

llvm-svn: 365009
2019-07-03 09:36:32 +00:00
Jordan Rupprecht 02647f73d4 Revert [InlineCost] cleanup calculations of Cost and Threshold
This reverts r364422 (git commit 1a3dc76186)

The inlining cost calculation is incorrect, leading to stack overflow due to large stack frames from heavy inlining.

llvm-svn: 365000
2019-07-03 04:01:51 +00:00
Chen Zheng dfdccbb26b [PowerPC] exclude ICmpZero in LSR if icmp can be replaced in later hardware loop.
Differential Revision: https://reviews.llvm.org/D63477

llvm-svn: 364993
2019-07-03 01:49:03 +00:00
Teresa Johnson a700436323 [ThinLTO] Add summary entries for index-based WPD
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).

Also added are the necessary bitcode records, and the corresponding
assembly support.

The follow-on index-based WPD patch is D55153.

Depends on D53890.

Reviewers: pcc

Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54815

llvm-svn: 364960
2019-07-02 19:38:02 +00:00
Kristof Umann 9353421ecd [IDF] Generalize IDFCalculator to be used with Clang's CFG
I'm currently working on a GSoC project that aims to improve the the bug reports
of the analyzer. The main heuristic I plan to use is to explain values that are
a control dependency of the bug location better.

01 bool b = messyComputation();
02 int i = 0;
03 if (b) // control dependency of the bug site, let's explain why we assume val
04        // to be true
05   10 / i; // warn: division by zero

Because of this, I'd like to generalize IDFCalculator so that I could use it for
Clang's CFG: D62883.

In detail:

* Rename IDFCalculator to IDFCalculatorBase, make it take a general CFG node
  type as a template argument rather then strictly BasicBlock (but preserve
  ForwardIDFCalculator and ReverseIDFCalculator)
* Move IDFCalculatorBase from llvm/include/llvm/Analysis to
  llvm/include/llvm/Support (but leave the BasicBlock variants in
  llvm/include/llvm/Analysis)
* clang-format the file since this patch messes up git blame anyways
* Change typedef to using
* Add the new type ChildrenGetterTy, and store an instance of it in
  IDFCalculatorBase. This is important because I'll have to specialize it for
  Clang's CFG to filter out nullpointer successors, similarly to D62507.

Differential Revision: https://reviews.llvm.org/D63389

llvm-svn: 364911
2019-07-02 11:30:12 +00:00
Fangrui Song 78ee2fbf98 Cleanup: llvm::bsearch -> llvm::partition_point after r364719
llvm-svn: 364720
2019-06-30 11:19:56 +00:00
Johannes Doerfert 6ed459fd41 Use "willreturn" in isGuaranteedToTransferExecutionToSuccessor
The `willreturn` function attribute guarantees that a function call will
come back to the call site if the call is also known not to throw.
Therefore, this attribute can be used in
`isGuaranteedToTransferExecutionToSuccessor`.

Patch by Hideto Ueno (@uenoku)

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63372

llvm-svn: 364580
2019-06-27 19:29:48 +00:00
Philip Reames 1cf9e72cbc Update -analyze -scalar-evolution output for multiple exit loops w/computable exit values
The previous output was next to useless if *any* exit was not computable.  If we have more than one exit, show the exit count for each so that it's easier to see what's going from with SCEV analysis when debugging.

llvm-svn: 364579
2019-06-27 19:22:43 +00:00
Fedor Sergeev 1a3dc76186 [InlineCost] cleanup calculations of Cost and Threshold
Summary:
Doing better separation of Cost and Threshold.
Cost counts the abstract complexity of live instructions, while Threshold is an upper bound of complexity that inlining is comfortable to pay.
There are two parts:
     - huge 15K last-call-to-static bonus is no longer subtracted from Cost
       but rather is now added to Threshold.

       That makes much more sense, as the cost of inlining (Cost) is not changed by the fact
       that internal function is called once. It only changes the likelyhood of this inlining
       being profitable (Threshold).

     - bonus for calls proved-to-be-inlinable into callee is no longer subtracted from Cost
       but added to Threshold instead.

While calculations are somewhat different,  overall InlineResult should stay the same since Cost >= Threshold compares the same.

Reviewers: eraman, greened, chandlerc, yrouban, apilipenko
Reviewed By: apilipenko
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60740

llvm-svn: 364422
2019-06-26 13:24:24 +00:00
Chen Zheng aa99952896 [HardwareLoops] NFC - move loop with irreducible control flow checking logic to HarewareLoopInfo.
llvm-svn: 364415
2019-06-26 12:02:43 +00:00
Chen Zheng 46ce9e4fff [HardwareLoops] NFC - move loop with irreducible control flow checking logic to isHardwareLoopProfitable()
llvm-svn: 364397
2019-06-26 09:12:52 +00:00
Clement Courbet 3bc5ad551a [ExpandMemCmp] Move all options to TargetTransformInfo.
Split off from D60318.

llvm-svn: 364281
2019-06-25 08:04:13 +00:00
Bjorn Pettersson 485a421876 [ConstantFolding] Use hasVectorInstrinsicScalarOpd. NFC
Summary:
Use the hasVectorInstrinsicScalarOpd helper function
in ConstantFoldVectorCall.

Reviewers: rengolin, RKSimon, dblaikie

Reviewed By: rengolin, RKSimon

Subscribers: tschuett, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63705

llvm-svn: 364178
2019-06-24 12:07:17 +00:00
Bjorn Pettersson 512b118779 [Scalarizer] Add scalarizer support for smul.fix.sat
Summary:
Handle smul.fix.sat in the scalarizer. This is done by
adding smul.fix.sat to the set of "isTriviallyVectorizable"
intrinsics.

The addition of smul.fix.sat in isTriviallyVectorizable and
hasVectorInstrinsicScalarOpd can also be seen as a preparation
to be able to use hasVectorInstrinsicScalarOpd in ConstantFolding.

Reviewers: rengolin, RKSimon, dblaikie

Reviewed By: rengolin

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63704

llvm-svn: 364177
2019-06-24 12:07:11 +00:00
Fangrui Song dc8de6037c Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC
llvm-svn: 364006
2019-06-21 05:40:31 +00:00
Sanjay Patel b342f026a4 [InstSimplify] simplify power-of-2 (single bit set) sequences
As discussed in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

Improving the canonicalization for these patterns:
rL363956
...means we should adjust/enhance the related simplification.

https://rise4fun.com/Alive/w1cp

  Name: isPow2 or zero
  %x = and i32 %xx, 2048
  %a = add i32 %x, -1
  %r = and i32 %a, %x
  =>
  %r = i32 0

llvm-svn: 363997
2019-06-20 22:55:28 +00:00
Alina Sbirlea 109d2ea153 [MemorySSA] Cleanup trivial phis.
Summary:
This is unfortunately needed for correctness, if we are to extend the tolerance of the update API to the way simple loop unswitch is doing cloning.

In simple loop unswitch (as opposed to loop unswitch), not all blocks are cloned. This can create unreachable cloned blocks (no predecessor), which are later cleaned up.

In MemorySSA, the  APIs for supporting these kind of updates (clone + update exit blocks), make certain assumption on the integrity of the CFG. When cloning, if something was not cloned, it's values in MemorySSA default to LiveOnEntry. When updating exit blocks, it is safe to assume that we can first insert phis in the blocks merging two clones, then add additional phis in the IDF of the blocks that received phis. This no longer holds true if one of the clones being merged comes from an unreachable block. We'd conservatively need to add all phis before filling in their incoming definitions. In practice this restriction can be relaxed if we clean up trivial phis after the first round of insertion.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63354

llvm-svn: 363880
2019-06-19 21:33:09 +00:00
Alina Sbirlea 238b8e62b6 [MemorySSA] Use GraphDiff info when computing IDF.
Summary:
When computing IDF for insert updates, ensure we use the snapshot CFG offered by GraphDiff.
Caught by D63389.

Reviewers: kuhar, george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits, Szelethus

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63443

llvm-svn: 363879
2019-06-19 21:17:31 +00:00
Bjorn Pettersson 16ff5fea87 [ConstantFolding] Add constant folding for smul.fix and smul.fix.sat
Summary:
This patch teaches ConstantFolding to constant fold
both scalar and vector variants of llvm.smul.fix and
llvm.smul.fix.sat.

As described in the LangRef rounding is unspecified for
these instrinsics. If the result cannot be represented
exactly the default behavior in ConstantFolding is to
round down towards negative infinity. If a target has a
preferred rounding that is different some kind of target
hook would be needed (same strategy as used by the
SelectionDAG legalizer).

Reviewers: nikic, leonardchan, RKSimon

Reviewed By: leonardchan

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63385

llvm-svn: 363811
2019-06-19 14:28:03 +00:00
Bjorn Pettersson b81b9a4e7b [ConstantFolding] Refactor ConstantFoldScalarCall. NFC
This patch splits ConstantFoldScalarCall into several
functions.

Benefits:
- Reduces indentation levels and avoids long if-statements.
- Makes it easier to add support for > 3 operands.

llvm-svn: 363810
2019-06-19 14:27:51 +00:00
Jay Foad 45d19fb470 [ConstantFolding] Fix assertion failure on non-power-of-two vector load.
Summary:
The test case does an (out of bounds) load from a global constant with
type <3 x float>. InstSimplify tried to turn this into an integer load
of the whole alloc size of the vector, which is 128 bits due to
alignment padding, and then bitcast this to <3 x vector> which failed
an assertion due to the type size mismatch.

The fix is to do an integer load of the normal size of the vector, with
no alignment padding.

Reviewers: tpr, arsenm, majnemer, dstuttard

Reviewed By: arsenm

Subscribers: hfinkel, wdng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63375

llvm-svn: 363784
2019-06-19 10:28:48 +00:00
Chen Zheng c5b918de58 [NFC] move some hardware loop checking code to a common place for other using.
Differential Revision: https://reviews.llvm.org/D63478

llvm-svn: 363758
2019-06-19 01:26:31 +00:00
Amara Emerson 146882242f [GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block.
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.

The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.

One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced attempt to prevent code size regressions when localizing these.

Overall, these changes improve CTMark code size on arm64 by 1.2%.

Full code size results:

Program                                         baseline       new       diff
------------------------------------------------------------------------------
 test-suite...-typeset/consumer-typeset.test    1249984      1217216     -2.6%
 test-suite...:: CTMark/ClamAV/clamscan.test    1264928      1232152     -2.6%
 test-suite :: CTMark/SPASS/SPASS.test          1394092      1361316     -2.4%
 test-suite...Mark/mafft/pairlocalalign.test    731320       714928      -2.2%
 test-suite :: CTMark/lencod/lencod.test        1340592      1324200     -1.2%
 test-suite :: CTMark/kimwitu++/kc.test         3853512      3820420     -0.9%
 test-suite :: CTMark/Bullet/bullet.test        3406036      3389652     -0.5%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    8017000      8016992     -0.0%
 test-suite...TMark/7zip/7zip-benchmark.test    2856588      2856588      0.0%
 test-suite...:: CTMark/sqlite3/sqlite3.test    765704       765704       0.0%
 Geomean difference                                                      -1.2%

Differential Revision: https://reviews.llvm.org/D63303

llvm-svn: 363632
2019-06-17 23:20:29 +00:00
Philip Reames 44475363e8 Teach getSCEVAtScope how to handle loop phis w/invariant operands in loops w/taken backedges
This patch really contains two pieces:
    Teach SCEV how to fold a phi in the header of a loop to the value on the backedge when a) the backedge is known to execute at least once, and b) the value is safe to use globally within the scope dominated by the original phi.
    Teach IndVarSimplify's rewriteLoopExitValues to allow loop invariant expressions which already exist (and thus don't need new computation inserted) even in loops where we can't optimize away other uses.

Differential Revision: https://reviews.llvm.org/D63224

llvm-svn: 363619
2019-06-17 21:06:17 +00:00
Alina Sbirlea 7a0098aa6e [MemorySSA] Don't use template when the clone is a simplified instruction.
Summary:
LoopRotate doesn't create a faithful clone of an instruction, it may
simplify it beforehand. Hence the clone of an instruction that has a
MemoryDef associated may not be a definition, but a use or not a memory
alternig instruction.
Don't rely on the template when the clone may be simplified.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63355

llvm-svn: 363597
2019-06-17 18:58:40 +00:00
Alina Sbirlea 05f77803f4 [MemorySSA] Add all MemoryPhis before filling their values.
Summary:
Add all MemoryPhis in IDF before filling in their incomign values.
Otherwise, a new Phi can be added that needs to become the incoming
value of another Phi.
Test fails the verification in verifyPrevDefInPhis.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63353

llvm-svn: 363590
2019-06-17 18:16:53 +00:00
Warren Ristow 6452bdd29b [LV] Suppress vectorization in some nontemporal cases
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.

This adds two new functions:
  bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
  bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;

to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.

This fixes https://llvm.org/PR40759

Differential Revision: https://reviews.llvm.org/D61764

llvm-svn: 363581
2019-06-17 17:20:08 +00:00
Sam Parker 60d6fb2a63 [SCEV] Use NoWrapFlags when expanding a simple mul
Second functional change following on from rL362687. Pass the
NoWrapFlags from the MulExpr to InsertBinop when we're generating a
shl or mul.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 363540
2019-06-17 10:05:18 +00:00
Roman Lebedev 5a663bd77a [InstSimplify] Fix addo/subo undef folds (PR42209)
Fix folds of addo and subo with an undef operand to be:

`@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`,
 as per LLVM undef rules.
Same for commuted variants.

Based on the original version of the patch by @nikic.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 | PR42209 ]]

Differential Revision: https://reviews.llvm.org/D63065

llvm-svn: 363522
2019-06-16 20:39:45 +00:00
Nikita Popov 8550fb386a [SCEV] Use unsigned/signed intersection type in SCEV
Based on D59959, this switches SCEV to use unsigned/signed range
intersection based on the sign hint. This will prefer non-wrapping
ranges in the relevant domain. I've left the one intersection in
getRangeForAffineAR() to use the smallest intersection heuristic,
as there doesn't seem to be any obvious preference there.

Differential Revision: https://reviews.llvm.org/D60035

llvm-svn: 363490
2019-06-15 09:15:52 +00:00
Akira Hatanaka a704a8f28c [ObjC][ARC] Delete ObjC runtime calls on global variables annotated
with 'objc_arc_inert'

Those calls are no-ops, so they can be safely deleted.

rdar://problem/49839633

Differential Revision: https://reviews.llvm.org/D62433

llvm-svn: 363468
2019-06-14 22:06:32 +00:00
Matt Arsenault 282dac717e SROA: Allow eliminating addrspacecasted allocas
There is a circular dependency between SROA and InferAddressSpaces
today that requires running both multiple times in order to be able to
eliminate all simple allocas and addrspacecasts. InferAddressSpaces
can't remove addrspacecasts when written to memory, and SROA helps
move pointers out of memory.

This should avoid inserting new commuting addrspacecasts with GEPs,
since there are unresolved questions about pointer wrapping between
different address spaces.

For now, don't replace volatile operations that don't match the alloca
addrspace, as it would change the address space of the access. It may
be still OK to insert an addrspacecast from the new alloca, but be
more conservative for now.

llvm-svn: 363462
2019-06-14 21:38:31 +00:00
Sam Parker 0cf9639a9c [SCEV] Pass NoWrapFlags when expanding an AddExpr
InsertBinop now accepts NoWrapFlags, so pass them through when
expanding a simple add expression.

This is the first re-commit of the functional changes from rL362687,
which was previously reverted.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 363364
2019-06-14 09:19:41 +00:00
Nikita Popov ad81d427ca [LangRef] Clarify poison semantics
I find the current documentation of poison somewhat confusing,
mainly because its use of "undefined behavior" doesn't seem to
align with our usual interpretation (of immediate UB). Especially
the sentence "any instruction that has a dependence on a poison
value has undefined behavior" is very confusing.

Clarify poison semantics by:

 * Replacing the introductory paragraph with the standard rationale
   for having poison values.
 * Spelling out that instructions depending on poison return poison.
 * Spelling out how we go from a poison value to immediate undefined
   behavior and give the two examples we currently use in ValueTracking.
 * Spelling out that side effects depending on poison are UB.

Differential Revision: https://reviews.llvm.org/D63044

llvm-svn: 363320
2019-06-13 19:45:36 +00:00
Philip Reames 038e01dc9a Add a clarifying comment about branching on poison
I recently got this wrong (again), and I'm sure I'm not the only one.  Put a comment in the logical place someone would look to "fix" the obvious "missed optimization" which arrises based on the common misunderstanding.  Hopefully, this will save others time.  :)

llvm-svn: 363318
2019-06-13 19:27:56 +00:00
Joseph Tremoulet 3bc6e2a7aa [EarlyCSE] Ensure equal keys have the same hash value
Summary:
The logic in EarlyCSE that looks through 'not' operations in the
predicate recognizes e.g. that `select (not (cmp sgt X, Y)), X, Y` is
equivalent to `select (cmp sgt X, Y), Y, X`.  Without this change,
however, only the latter is recognized as a form of `smin X, Y`, so the
two expressions receive different hash codes.  This leads to missed
optimization opportunities when the quadratic probing for the two hashes
doesn't happen to collide, and assertion failures when probing doesn't
collide on insertion but does collide on a subsequent table grow
operation.

This change inverts the order of some of the pattern matching, checking
first for the optional `not` and then for the min/max/abs patterns, so
that e.g. both expressions above are recognized as a form of `smin X, Y`.

It also adds an assertion to isEqual verifying that it implies equal
hash codes; this fires when there's a collision during insertion, not
just grow, and so will make it easier to notice if these functions fall
out of sync again.  A new flag --earlycse-debug-hash is added which can
be used when changing the hash function; it forces hash collisions so
that any pair of values inserted which compare as equal but hash
differently will be caught by the isEqual assertion.

Reviewers: spatel, nikic

Reviewed By: spatel, nikic

Subscribers: lebedev.ri, arsenm, craig.topper, efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62644

llvm-svn: 363274
2019-06-13 15:24:11 +00:00
Philip Reames e51c3d8b82 [SCEV] Teach computeSCEVAtScope benefit from one-input Phi. PR39673
SCEV does not propagate arguments through one-input Phis so as to make it easy for the SCEV expander (and related code) to preserve LCSSA.  It's not entirely clear this restriction is neccessary, but for the moment it exists.   For this reason, we don't analyze single-entry phi inputs.  However it is possible that when an this input leaves the loop through LCSSA Phi, it is a provable constant.  Missing that results in an order of optimization issue in loop exit value rewriting where we miss some oppurtunities based on order in which we visit sibling loops.

This patch teaches computeSCEVAtScope about this case. We can generalize it later, but so far we can only replace LCSSA Phis with their constant loop-exiting values.  We should probably also add similiar logic directly in the SCEV construction path itself.

Patch by: mkazantsev (with revised commit message by me)
Differential Revision: https://reviews.llvm.org/D58113

llvm-svn: 363180
2019-06-12 17:21:47 +00:00
Matt Arsenault 2466ba97bc LoopDistribute/LAA: Respect convergent
This case is slightly tricky, because loop distribution should be
allowed in some cases, and not others. As long as runtime dependency
checks don't need to be introduced, this should be OK. This is further
complicated by the fact that LoopDistribute partially ignores if LAA
says that vectorization is safe, and then does its own runtime pointer
legality checks.

Note this pass still does not handle noduplicate correctly, as this
should always be forbidden with it. I'm not going to bother trying to
fix it, as it would require more effort and I think noduplicate should
be removed.

https://reviews.llvm.org/D62607

llvm-svn: 363160
2019-06-12 13:34:19 +00:00
Nico Weber 8bbdea447e Fix a Wunused-lambda-capture warning.
The capture was added in the first commit of https://reviews.llvm.org/D61934
when it was used. In the reland, the use was removed but the capture
wasn't removed.

llvm-svn: 363155
2019-06-12 12:46:46 +00:00
Sam Parker 61de6a4e9c [NFC][SCEV] Add NoWrapFlag argument to InsertBinOp
'Use wrap flags in InsertBinop' (rL362687) was reverted due to
miscompiles. This patch introduces the previous change to pass
no-wrap flags but now only FlagAnyWrap is passed.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 363147
2019-06-12 11:53:55 +00:00
Philip Reames 02f0b379f5 Fix a bug in getSCEVAtScope w.r.t. non-canonical loops
The issue is that if we have a loop with multiple predecessors outside the loop, the code was expecting to merge them and only return if equal, but instead returned the first one seen.

I have no idea if this actually tripped anywhere.  I noticed it by accident when reading the code and have no idea how to go about constructing a test case.

llvm-svn: 363112
2019-06-11 23:21:24 +00:00
Sanjay Patel 40e3bdf876 [Analysis] add isSplatValue() for vectors in IR
We have the related getSplatValue() already in IR (see code just above the proposed addition).
But sometimes we only need to know that the value is a splat rather than capture the splatted
scalar value. Also, we have an isSplatValue() function already in SDAG.

Motivation - recent bugs that would potentially benefit from improved splat analysis in IR:
https://bugs.llvm.org/show_bug.cgi?id=37428
https://bugs.llvm.org/show_bug.cgi?id=42174

Differential Revision: https://reviews.llvm.org/D63138

llvm-svn: 363106
2019-06-11 22:25:18 +00:00
Alina Sbirlea cb4ed8a7bc [MemorySSA] When applying updates, clean unnecessary Phis.
Summary: After applying a set of insert updates, there may be trivial Phis left over. Clean them up.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63033

llvm-svn: 363094
2019-06-11 19:09:34 +00:00
Alina Sbirlea 3cef1f7d64 Only passes that preserve MemorySSA must mark it as preserved.
Summary:
The method `getLoopPassPreservedAnalyses` should not mark MemorySSA as
preserved, because it's being called in a lot of passes that do not
preserve MemorySSA.
Instead, mark the MemorySSA analysis as preserved by each pass that does
preserve it.
These changes only affect the new pass mananger.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62536

llvm-svn: 363091
2019-06-11 18:27:49 +00:00
Philip Reames 4bf1c23990 Factor out a helper function for readability and reuse in a future patch [NFC]
llvm-svn: 362980
2019-06-10 20:41:27 +00:00
Sanjay Patel 866db10228 [InstSimplify] reduce code duplication for fcmp folds; NFC
llvm-svn: 362904
2019-06-09 13:58:46 +00:00
Sanjay Patel 73f5a855b3 [InstSimplify] enhance fcmp fold with never-nan operand
This is another step towards correcting our usage of fast-math-flags when applied on an fcmp.
In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of
the fcmp. But I'm leaving that clause in until we're more confident that we can stop
relying on fcmp's FMF.

By using the more general "isKnownNeverNaN()", we gain a simplification shown on the
tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN).
On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction
in addition to the FMF on the fcmp.

This is a continuation of D62979 / rL362879.

llvm-svn: 362903
2019-06-09 13:48:59 +00:00
Ayke van Laethem f18cf230e4 [CaptureTracking] Don't let comparisons against null escape inbounds pointers
Pointers that are in-bounds (either through dereferenceable_or_null or
thorough a getelementptr inbounds) cannot be captured with a comparison
against null. There is no way to construct a pointer that is still in
bounds but also NULL.

This helps safe languages that insert null checks before load/store
instructions. Without this patch, almost all pointers would be
considered captured even for simple loads. With this patch, an icmp with
null will not be seen as escaping as long as certain conditions are met.

There was a lot of discussion about this patch. See the Phabricator
thread for detals.

Differential Revision: https://reviews.llvm.org/D60047

llvm-svn: 362900
2019-06-09 10:20:33 +00:00
Sanjay Patel 4329c15f11 [InstSimplify] enhance fcmp fold with never-nan operand
This is 1 step towards correcting our usage of fast-math-flags when applied on an fcmp.
In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of
the fcmp. But I'm leaving that clause in until we're more confident that we can stop
relying on fcmp's FMF.

By using the more general "isKnownNeverNaN()", we gain a simplification shown on the
tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN).
On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction
in addition to the FMF on the fcmp.

I'll update the 'ult' case below here as a follow-up assuming no problems here.

Differential Revision: https://reviews.llvm.org/D62979

llvm-svn: 362879
2019-06-08 15:12:33 +00:00
Sanjay Patel e490e4a0e7 [Analysis] simplify code for getSplatValue(); NFC
AFAIK, this is only currently called by TTI, but it could be
used from instcombine or CGP to help solve problems like:
https://bugs.llvm.org/show_bug.cgi?id=37428
https://bugs.llvm.org/show_bug.cgi?id=42174

llvm-svn: 362810
2019-06-07 16:09:54 +00:00
Joerg Sonnenberger b2e96169b0 [NFC] Don't export helpers of ConstantFoldCall
llvm-svn: 362799
2019-06-07 13:28:52 +00:00
Sam Parker c5ef502ee8 [CodeGen] Generic Hardware Loop Support
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
    
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
  Takes as a single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
  Takes the maximum number of elements processed in an iteration of
  the loop body and subtracts this from the total count. Returns
  false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
  Takes the number of elements remaining to be processed as well as
  the maximum numbe of elements processed in an iteration of the loop
  body. Returns the updated number of elements remaining.

llvm-svn: 362774
2019-06-07 07:35:30 +00:00
Craig Topper ca541b20d0 [CFLGraph] Add support for unary fneg instruction.
Differential Revision: https://reviews.llvm.org/D62791

llvm-svn: 362737
2019-06-06 19:21:23 +00:00
Craig Topper 6cda33ba36 [InlineCost] Add support for unary fneg.
This adds support for unary fneg based on the implementation of BinaryOperator without the soft float FP cost.

Previously we would just delegate to visitUnaryInstruction. I think the only real change is that we will pass the FastMath flags to SimplifyFNeg now.

Differential Revision: https://reviews.llvm.org/D62699

llvm-svn: 362732
2019-06-06 19:02:18 +00:00
Whitney Tsang 03e8369a72 [DA] Add an option to control delinearization validity checks
Summary: Dependence Analysis performs static checks to confirm validity
of delinearization. These checks often fail for 64-bit targets due to
type conversions and integer wrapping that prevent simplification of the
SCEV expressions. These checks would also fail at compile-time if the
lower bound of the loops are compile-time unknown.

For example:

void foo(int n, int m, int a[][m]) {
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j) {
      a[i][j] = a[i+1][j-2];
    }
}

opt -mem2reg -instcombine -indvars -loop-simplify -loop-rotate -inline
-pass-remarks=.* -debug-pass=Arguments
-da-permissive-validity-checks=false k3.ll -analyze -da
will produce the following by default:

da analyze - anti [* *|<]!
but will produce the following expected dependence vector if the
validity checks are disabled:

da analyze - consistent anti [1 -2]!
This revision will introduce a debug option that will leave the validity
checks in place by default, but allow them to be turned off. New tests
are added for cases where it cannot be proven at compile-time that the
individual subscripts stay in-bound with respect to a particular
dimension of an array. These tests enable the option to provide user
guarantee that the subscripts do not over/under-flow into other
dimensions, thereby producing more accurate dependence vectors.

For prior discussion on this topic, leading to this change, please see
the following thread:
http://lists.llvm.org/pipermail/llvm-dev/2019-May/132372.html

Reviewers: Meinersbur, jdoerfert, kbarton, dmgreen, fhahn
Reviewed By: Meinersbur, jdoerfert, dmgreen
Subscribers: fhahn, hiraditya, javed.absar, llvm-commits, Whitney,
etiotto
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D62610

llvm-svn: 362711
2019-06-06 15:12:49 +00:00
Benjamin Kramer f1249442cf Revert "[SCEV] Use wrap flags in InsertBinop"
This reverts commit r362687. Miscompiles llvm-profdata during selfhost.

llvm-svn: 362699
2019-06-06 12:35:46 +00:00
Sam Parker 7cc580f5e9 [SCEV] Use wrap flags in InsertBinop
If the given SCEVExpr has no (un)signed flags attached to it, transfer
these to the resulting instruction or use them to find an existing
instruction.

Differential Revision: https://reviews.llvm.org/D61934

llvm-svn: 362687
2019-06-06 08:56:26 +00:00
Whitney Tsang 2d0896c1cb [LOOPINFO] Extend Loop object to add utilities to get the loop bounds,
step, and loop induction variable.

Summary: This PR extends the loop object with more utilities to get loop
bounds, step, and loop induction variable. There already exists passes
which try to obtain the loop induction variable in their own pass, e.g.
loop interchange. It would be useful to have a common area to get these
information.

/// Example:
/// for (int i = lb; i < ub; i+=step)
///   <loop body>
/// --- pseudo LLVMIR ---
/// beforeloop:
///   guardcmp = (lb < ub)
///   if (guardcmp) goto preheader; else goto afterloop
/// preheader:
/// loop:
///   i1 = phi[{lb, preheader}, {i2, latch}]
///   <loop body>
///   i2 = i1 + step
/// latch:
///   cmp = (i2 < ub)
///   if (cmp) goto loop
/// exit:
/// afterloop:
///
/// getBounds
///   getInitialIVValue      --> lb
///   getStepInst            --> i2 = i1 + step
///   getStepValue           --> step
///   getFinalIVValue        --> ub
///   getCanonicalPredicate  --> '<'
///   getDirection           --> Increasing
/// getInductionVariable          --> i1
/// getAuxiliaryInductionVariable --> {i1}
/// isCanonical                   --> false

Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara,
fhahn
Reviewed By: kbarton
Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya,
llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D60565

llvm-svn: 362644
2019-06-05 20:42:47 +00:00
Whitney Tsang 590b1aee60 Revert "Title: [LOOPINFO] Extend Loop object to add utilities to get the loop"
This reverts commit d34797dfc2.

llvm-svn: 362615
2019-06-05 15:32:56 +00:00
Benjamin Kramer b90b354798 [LoopInfo] Fix unused variable warning. NFC.
llvm-svn: 362610
2019-06-05 14:43:58 +00:00
Whitney Tsang d34797dfc2 Title: [LOOPINFO] Extend Loop object to add utilities to get the loop
bounds, step, and loop induction variable.

Summary: This PR extends the loop object with more utilities to get loop
bounds, step, and loop induction variable. There already exists passes
which try to obtain the loop induction variable in their own pass, e.g.
loop interchange. It would be useful to have a common area to get these
information.

/// Example:
/// for (int i = lb; i < ub; i+=step)
///   <loop body>
/// --- pseudo LLVMIR ---
/// beforeloop:
///   guardcmp = (lb < ub)
///   if (guardcmp) goto preheader; else goto afterloop
/// preheader:
/// loop:
///   i1 = phi[{lb, preheader}, {i2, latch}]
///   <loop body>
///   i2 = i1 + step
/// latch:
///   cmp = (i2 < ub)
///   if (cmp) goto loop
/// exit:
/// afterloop:
///
/// getBounds
///   getInitialIVValue      --> lb
///   getStepInst            --> i2 = i1 + step
///   getStepValue           --> step
///   getFinalIVValue        --> ub
///   getCanonicalPredicate  --> '<'
///   getDirection           --> Increasing
/// getInductionVariable          --> i1
/// getAuxiliaryInductionVariable --> {i1}
/// isCanonical                   --> false

Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara,
fhahn
Reviewed By: kbarton
Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya,
llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D60565

llvm-svn: 362609
2019-06-05 14:34:12 +00:00
Nemanja Ivanovic fe97754acf Initial support for IBM MASS vector library
This is the LLVM portion of patch https://reviews.llvm.org/D59881.
The clang portion is to follow.

llvm-svn: 362568
2019-06-05 01:31:43 +00:00
Johannes Doerfert 40107ce753 Introduce Value::stripPointerCastsSameRepresentation
This patch allows current users of Value::stripPointerCasts() to force
the result of the function to have the same representation as the value
it was called on. This is useful in various cases, e.g., (non-)null
checks.

In this patch only a single call site was adjusted to fix an existing
misuse that would cause nonnull where they may be wrong. Uses in
attribute deduction and other areas, e.g., D60047, are to be expected.

For a discussion on this topic, please see [0].

[0] http://lists.llvm.org/pipermail/llvm-dev/2018-December/128423.html

Reviewers: hfinkel, arsenm, reames

Subscribers: wdng, hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61607

llvm-svn: 362545
2019-06-04 20:21:46 +00:00
Nikita Popov df621bdfc8 [LVI][CVP] Add support for urem, srem and sdiv
The underlying ConstantRange functionality has been added in D60952,
D61207 and D61238, this just exposes it for LVI.

I'm switching the code from using a whitelist to a blacklist, as
we're down to one unsupported operation here (xor) and writing it
this way seems more obvious :)

Differential Revision: https://reviews.llvm.org/D62822

llvm-svn: 362519
2019-06-04 16:24:09 +00:00
George Burgess IV c24a2f4ad9 CFLAA: reflow comments; NFC
llvm-svn: 362442
2019-06-03 19:56:22 +00:00
Craig Topper 7a4eabef39 [CFLGraph] Add FAdd to visitConstantExpr.
This looks like an oversight as all the other binary operators are present.

Accidentally noticed while auditing places that need FNeg handling.

No test because as noted in the review it would be contrived and amount to "don't crash"

Differential Revision: https://reviews.llvm.org/D62790

llvm-svn: 362441
2019-06-03 19:35:52 +00:00
Craig Topper 7cebf0af40 [InlineCost] Don't add the soft float function call cost for the fneg idiom, fsub -0.0, %x
Summary: Fneg can be implemented with an xor rather than a function call so we don't need to add the function call overhead. This was pointed out in D62699

Reviewers: efriedma, cameron.mcinally

Reviewed By: efriedma

Subscribers: javed.absar, eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62747

llvm-svn: 362304
2019-06-01 19:40:07 +00:00
Erik Pilkington abb2a93c53 [SimplifyLibCalls] Fold more fortified functions into non-fortified variants
When the object size argument is -1, no checking can be done, so calling the
_chk variant is unnecessary. We already did this for a bunch of these
functions.

rdar://50797197

Differential revision: https://reviews.llvm.org/D62358

llvm-svn: 362272
2019-05-31 22:41:36 +00:00
Russell Gallop 802c9b59d5 ftime-trace: Trace loop passes
These can take a significant amount of time in some builds.

Suggested by Andrea Di Biagio.

Differential Revision: https://reviews.llvm.org/D62666

llvm-svn: 362219
2019-05-31 10:14:04 +00:00
Craig Topper b457e430f3 [InstructionSimplify] Add missing implementation of llvm::SimplifyUnOp. NFC
There are no callers currently, but the function is declared so we should at
least implement it.

llvm-svn: 362205
2019-05-31 08:10:23 +00:00
Nikita Popov 332c100562 [ValueTracking][ConstantRange] Distinguish low/high always overflow
In order to fold an always overflowing signed saturating add/sub,
we need to know in which direction the always overflow occurs.
This patch splits up AlwaysOverflows into AlwaysOverflowsLow and
AlwaysOverflowsHigh to pass through this information (but it is
not used yet).

Differential Revision: https://reviews.llvm.org/D62463

llvm-svn: 361858
2019-05-28 18:08:31 +00:00
Craig Topper ab53c5e5ab [InlineCost] Fix a couple comments. NFC
Replace "unary operator" with "unary instruction" in visitUnaryInstruction since
we now have a UnaryOperator class which might needs its own visit function.

Fix a copy/paste in visitCastInst that appears to have been copied from
visitPtrToInt.

llvm-svn: 361794
2019-05-28 07:25:27 +00:00
Craig Topper 50d502826b [CostModel] Add really basic support for being able to query the cost of the FNeg instruction.
Summary:
This reuses the getArithmeticInstrCost, but passes dummy values of the second
operand flags.

The X86 costs are wrong and can be improved in a follow up. I just wanted to
stop it from reporting an unknown cost first.

Reviewers: RKSimon, spatel, andrew.w.kaylor, cameron.mcinally

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62444

llvm-svn: 361788
2019-05-28 04:09:18 +00:00
Xing Xue 3860aad6e7 [MustExecute] Improve MustExecute to correctly handle loop nest
Summary:
for.outer:
  br for.inner
for.inner:
  LI <loop invariant load instruction>
for.inner.latch:
  br for.inner, for.outer.latch
for.outer.latch:
  br for.outer, for.outer.exit

LI is a loop invariant load instruction that post dominate for.outer, so LI should be able to move out of the loop nest. However, there is a bug in allLoopPathsLeadToBlock().

Current algorithm of allLoopPathsLeadToBlock()

  1. get all the transitive predecessors of the basic block LI belongs to (for.inner) ==> for.outer, for.inner.latch
  2. if any successors of any of the predecessors are not for.inner or for.inner's predecessors, then return false
  3. return true

Although for.inner.latch is for.inner's predecessor, but for.inner dominates for.inner.latch, which means if for.inner.latch is ever executed, for.inner should be as well. It should not return false for cases like this.

Author: Whitney (committed by xingxue)

Reviewers: kbarton, jdoerfert, Meinersbur, hfinkel, fhahn

Reviewed By: jdoerfert

Subscribers: hiraditya, jsji, llvm-commits, etiotto, bmahjour

Tags: #LLVM

Differential Revision: https://reviews.llvm.org/D62418

llvm-svn: 361762
2019-05-27 13:57:28 +00:00
Nikita Popov d0f13e618f [ValueTracking] Base computeOverflowForUnsignedMul() on ConstantRange code; NFCI
The implementation in ValueTracking and ConstantRange are equally
powerful, reuse the one in ConstantRange, which will make this easier
to extend.

llvm-svn: 361723
2019-05-26 13:22:01 +00:00
Nikita Popov 6bb5041e94 [LVI][CVP] Add support for saturating add/sub
Adds support for the uadd.sat family of intrinsics in LVI, based on
ConstantRange methods from D60946.

Differential Revision: https://reviews.llvm.org/D62447

llvm-svn: 361703
2019-05-25 16:44:14 +00:00
Nikita Popov 024b18aca7 [LVI][CVP] Calculate with.overflow result range
In LVI, calculate the range of extractvalue(op.with.overflow(%x, %y), 0)
as the range of op(%x, %y). This is mainly useful in conjunction with
D60650: If the result of the operation is extracted in a branch guarded
against overflow, then the value of %x will be appropriately constrained
and the result range of the operation will be calculated taking that
into account.

Differential Revision: https://reviews.llvm.org/D60656

llvm-svn: 361693
2019-05-25 09:53:45 +00:00
Nikita Popov 17367b0d89 [LVI] Extract helper for binary range calculations; NFC
llvm-svn: 361692
2019-05-25 09:53:37 +00:00
Sanjay Patel 8869a98e82 [InstSimplify] fold insertelement-of-extractelement
This was partly handled in InstCombine (only the constant
index case), so delete that and zap it more generally in
InstSimplify.

llvm-svn: 361576
2019-05-24 00:13:58 +00:00
Sanjay Patel e60cb7d1be [InstSimplify] insertelement V, undef, ? --> V
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.

llvm-svn: 361559
2019-05-23 21:49:47 +00:00
Kit Barton 987fdfd9a7 Revert [LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, induction variable, and guard branch.
This reverts r361517 (git commit 2049e4dd8f)

llvm-svn: 361553
2019-05-23 20:53:05 +00:00
Kit Barton 2049e4dd8f [LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, induction variable, and guard branch.
Summary:
    This PR extends the loop object with more utilities to get loop bounds, step, induction variable, and guard branch. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. Moreover, loop fusion (https://reviews.llvm.org/D55851) is planning to use getGuard() to extend the kind of loops it is able to fuse, e.g. rotated loop with non-constant upper bound, which would have a loop guard.

      /// Example:
      /// for (int i = lb; i < ub; i+=step)
      ///   <loop body>
      /// --- pseudo LLVMIR ---
      /// beforeloop:
      ///   guardcmp = (lb < ub)
      ///   if (guardcmp) goto preheader; else goto afterloop
      /// preheader:
      /// loop:
      ///   i1 = phi[{lb, preheader}, {i2, latch}]
      ///   <loop body>
      ///   i2 = i1 + step
      /// latch:
      ///   cmp = (i2 < ub)
      ///   if (cmp) goto loop
      /// exit:
      /// afterloop:
      ///
      /// getBounds
      ///   getInitialIVValue      --> lb
      ///   getStepInst            --> i2 = i1 + step
      ///   getStepValue           --> step
      ///   getFinalIVValue        --> ub
      ///   getCanonicalPredicate  --> '<'
      ///   getDirection           --> Increasing
      /// getGuard             --> if (guardcmp) goto loop; else goto afterloop
      /// getInductionVariable          --> i1
      /// getAuxiliaryInductionVariable --> {i1}
      /// isCanonical                   --> false

    Committed on behalf of @Whitney (Whitney Tsang).

    Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara, fhahn

    Reviewed By: kbarton

    Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya, llvm-commits

    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D60565

llvm-svn: 361517
2019-05-23 17:56:35 +00:00
Sanjay Patel 63fa690617 [InstSimplify] update stale comment; NFC
Missed this diff with rL361118.

llvm-svn: 361180
2019-05-20 17:52:18 +00:00
Nick Desaulniers 639b29b1b5 [INLINER] allow inlining of blockaddresses if sole uses are callbrs
Summary:
It was supposed that Ref LazyCallGraph::Edge's were being inserted by
inlining, but that doesn't seem to be the case.  Instead, it seems that
there was no test for a blockaddress Constant in an instruction that
referenced the function that contained the instruction. Ex:

```
define void @f() {
  %1 = alloca i8*, align 8
2:
  store i8* blockaddress(@f, %2), i8** %1, align 8
  ret void
}
```

When iterating blockaddresses, do not add the function they refer to
back to the worklist if the blockaddress is referring to the contained
function (as opposed to an external function).

Because blockaddress has sligtly different semantics than GNU C's
address of labels, there are 3 cases that can occur with blockaddress,
where only 1 can happen in GNU C due to C's scoping rules:
* blockaddress is within the function it refers to (possible in GNU C).
* blockaddress is within a different function than the one it refers to
(not possible in GNU C).
* blockaddress is used in to declare a global (not possible in GNU C).

The second case is tested in:

```
$ ./llvm/build/unittests/Analysis/AnalysisTests \
  --gtest_filter=LazyCallGraphTest.HandleBlockAddress
```

This patch adjusts the iteration of blockaddresses in
LazyCallGraph::visitReferences to not revisit the blockaddresses
function in the first case.

The Linux kernel contains code that's not semantically valid at -O0;
specifically code passed to asm goto. It requires that asm goto be
inline-able. This patch conservatively does not attempt to handle the
more general case of inlining blockaddresses that have non-callbr users
(pr/39560).

https://bugs.llvm.org/show_bug.cgi?id=39560
https://bugs.llvm.org/show_bug.cgi?id=40722
https://github.com/ClangBuiltLinux/linux/issues/6
https://reviews.llvm.org/rL212077

Reviewers: jyknight, eli.friedman, chandlerc

Reviewed By: chandlerc

Subscribers: george.burgess.iv, nathanchance, mgorny, craig.topper, mengxu.gatech, void, mehdi_amini, E5ten, chandlerc, efriedma, eraman, hiraditya, haicheng, pirama, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58260

llvm-svn: 361173
2019-05-20 16:48:09 +00:00
Cameron McInally 2d2a46db8e [InstSimplify] Teach fsub -0.0, (fneg X) ==> X about unary fneg
Differential Revision: https://reviews.llvm.org/D62077

llvm-svn: 361151
2019-05-20 13:13:35 +00:00
Sanjay Patel 9ef99b4b11 [InstSimplify] fold fcmp (maxnum, X, C1), C2
This is the sibling transform for rL360899 (D61691):

  maxnum(X, GreaterC) == C --> false
  maxnum(X, GreaterC) <= C --> false
  maxnum(X, GreaterC) <  C --> false
  maxnum(X, GreaterC) >= C --> true
  maxnum(X, GreaterC) >  C --> true
  maxnum(X, GreaterC) != C --> true

llvm-svn: 361118
2019-05-19 14:26:39 +00:00
Cameron McInally 067e946859 [InstSimplify] Add unary fneg to `fsub 0.0, (fneg X) ==> X` transform
Differential Revision: https://reviews.llvm.org/D62013

llvm-svn: 361047
2019-05-17 16:47:00 +00:00
Sanjay Patel 152f81fae8 [InstSimplify] fold fcmp (minnum, X, C1), C2
minnum(X, LesserC) == C --> false
   minnum(X, LesserC) >= C --> false
   minnum(X, LesserC) >  C --> false
   minnum(X, LesserC) != C --> true
   minnum(X, LesserC) <= C --> true
   minnum(X, LesserC) <  C --> true

maxnum siblings will follow if there are no problems here.

We should be able to perform some other combines when the constants
are equal or greater-than too, but that would go in instcombine.

We might also generalize this by creating an FP ConstantRange
(similar to what we do for integers).

Differential Revision: https://reviews.llvm.org/D61691

llvm-svn: 360899
2019-05-16 14:03:10 +00:00
Fangrui Song 3e92df3e39 Add Triple::isPPC64()
llvm-svn: 360864
2019-05-16 08:31:22 +00:00
Cameron McInally 0c82d9b5a2 Teach InstSimplify -X + X --> 0.0 about unary FNeg
Differential Revision: https://reviews.llvm.org/D61916

llvm-svn: 360777
2019-05-15 14:31:33 +00:00
Nikita Popov 48c4e4fa80 [LVI][CVP] Add support for abs/nabs select pattern flavor
Based on ConstantRange support added in D61084, we can now handle
abs and nabs select pattern flavors in LVI.

Differential Revision: https://reviews.llvm.org/D61794

llvm-svn: 360700
2019-05-14 18:53:47 +00:00
Kit Barton 37b7922daa Save the induction binary operator in IVDescriptors for non FP induction variables.
Summary:
Currently InductionBinOps are only saved for FP induction variables, the PR extends it with non FP induction variable, so user of IVDescriptors can query the InductionBinOps for integer induction variables.

The changes in hasUnsafeAlgebra() and getUnsafeAlgebraInst() are required for the existing LIT test cases to pass. As described in the comment of the two functions, one of the requirement to return true is it is a FP induction variable. The checks was not needed because InductionBinOp was not set on non FP cases before.

https://reviews.llvm.org/D60565 depends on the patch.

Committed on behalf of @Whitney (Whitney Tsang).

Reviewers: jdoerfert, kbarton, fhahn, hfinkel, dmgreen, Meinersbur

Reviewed By: jdoerfert

Subscribers: mgorny, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61329

llvm-svn: 360671
2019-05-14 13:26:36 +00:00
Teresa Johnson 37b80122bd [ThinLTO] Auto-hide prevailing linkonce_odr only when all copies eligible
Summary:
We hit undefined references building with ThinLTO when one source file
contained explicit instantiations of a template method (weak_odr) but
there were also implicit instantiations in another file (linkonce_odr),
and the latter was the prevailing copy. In this case the symbol was
marked hidden when the prevailing linkonce_odr copy was promoted to
weak_odr. It led to unsats when the resulting shared library was linked
with other code that contained a reference (expecting to be resolved due
to the explicit instantiation).

Add a CanAutoHide flag to the GV summary to allow the thin link to
identify when all copies are eligible for auto-hiding (because they were
all originally linkonce_odr global unnamed addr), and only do the
auto-hide in that case.

Most of the changes here are due to plumbing the new flag through the
bitcode and llvm assembly, and resulting test changes. I augmented the
existing auto-hide test to check for this situation.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, arphaman, dang, llvm-commits, steven_wu, wmi

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59709

llvm-svn: 360466
2019-05-10 20:08:24 +00:00
Philip Reames 76ea748d2d Compile time tweak for libcall lookup
If we have a large module which is mostly intrinsics, we hammer the lib call lookup path from CodeGenPrepare.  Adding a fastpath reduces compile by 15% for one such example.

The problem is really more general than intrinsics - a module with lots of non-intrinsics non-libcall calls has the same problem - but we might as well avoid an easy case quickly.

llvm-svn: 360391
2019-05-09 23:13:09 +00:00
Warren Ristow d27b0c6247 [SCEV] Suppress hoisting insertion point of binops when unsafe
InsertBinop tries to move insertion-points out of loops for expressions
that are loop-invariant. This patch adds a new parameter, IsSafeToHost,
to guard that hoisting. This allows callers to suppress that hoisting
for unsafe situations, such as divisions that may have a zero
denominator.

This fixes PR38697.

Differential Revision: https://reviews.llvm.org/D55232

llvm-svn: 360280
2019-05-08 18:50:07 +00:00
Alina Sbirlea f31eba6494 [MemorySSA] Teach LoopSimplify to preserve MemorySSA.
Summary:
Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available.
Do not preserve it in the new pass manager.
Update tests.

Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60833

llvm-svn: 360270
2019-05-08 17:05:36 +00:00
Nikita Popov 9fd02a71a3 Revert "[ValueTracking] Improve isKnowNonZero for Ints"
This reverts commit 3b137a4956.

As reported in https://reviews.llvm.org/D60846, this is causing
miscompiles.

llvm-svn: 360260
2019-05-08 14:50:01 +00:00
Dan Robertson 3b137a4956 [ValueTracking] Improve isKnowNonZero for Ints
Improve isKnownNonZero for integers in order to improve cttz
optimizations.

Differential Revision: https://reviews.llvm.org/D60846

llvm-svn: 360222
2019-05-08 02:25:08 +00:00
Sanjay Patel e088d03b9c [ValueTracking] add logic for known-never-nan with minnum/maxnum
From the LangRef: "Returns NaN only if both operands are NaN."

llvm-svn: 360206
2019-05-07 22:58:31 +00:00
Keno Fischer a1a4adf4b9 [SCEV] Add explicit representations of umin/smin
Summary:
Currently we express umin as `~umax(~x, ~y)`. However, this becomes
a problem for operands in non-integral pointer spaces, because `~x`
is not something we can compute for `x` non-integral. However, since
comparisons are generally still allowed, we are actually able to
express `umin(x, y)` directly as long as we don't try to express is
as a umax. Support this by adding an explicit umin/smin representation
to SCEV. We do this by factoring the existing getUMax/getSMax functions
into a new function that does all four. The previous two functions were
largely identical.

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D50167

llvm-svn: 360159
2019-05-07 15:28:47 +00:00
Cameron McInally c3167696bc Add FNeg support to InstructionSimplify
Differential Revision: https://reviews.llvm.org/D61573

llvm-svn: 360053
2019-05-06 16:05:10 +00:00
Cameron McInally 1d0c845d9d Add FNeg IR constant folding support
llvm-svn: 359982
2019-05-05 16:07:09 +00:00
Alina Sbirlea 0363c3b8bb [MemorySSA] Check that block is reachable when adding phis.
Summary:
Originally the insertDef method was only used when building MemorySSA, and was limiting the number of Phi nodes that it created.
Now it's used for updates as well, and it can create additional Phis needed for correctness.
Make sure no Phis are created in unreachable blocks (condition met during MSSA build), otherwise the renamePass will find a null DTNode.

Resolves PR41640.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61410

llvm-svn: 359845
2019-05-02 23:41:58 +00:00
Alina Sbirlea 151ab4844a [MemorySSA] Refactor removing multiple trivial phis [NFC].
Summary: Create a method to clean up multiple potentially trivial phis, since we will need this often.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61471

llvm-svn: 359842
2019-05-02 23:12:49 +00:00
Keno Fischer a3e4b3bd33 [SCEV] Use isKnownViaNonRecursiveReasoning for smax simplification
Summary:
Commit
	rL331949: SCEV] Do not use induction in isKnownPredicate for simplification umax

changed the codepath for umax from isKnownPredicate to
isKnownViaNonRecursiveReasoning to avoid compile time blow up (and as
I found out also stack overflows). However, there is an exact copy of
the code for umax that was lacking this change. In D50167 I want to unify
these codepaths, but to avoid that being a behavior change for the smax
case, pull this independent bit out of it.

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D61166

llvm-svn: 359693
2019-05-01 15:58:24 +00:00
Keno Fischer d8f856d265 [LoopInfo] Faster implementation of setLoopID. NFC.
Summary:
This change was part of D46460. However, in the meantime rL341926 fixed the
correctness issue here. What remained was the performance issue in setLoopID
where it would iterate through all blocks in the loop and their successors,
rather than just the predecessor of the header (the later presumably being
much faster). We already have the `getLoopLatches` to compute precisely these
basic blocks in an efficient manner, so just use it (as the original commit
did for `getLoopID`).

Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D61215

llvm-svn: 359684
2019-05-01 14:39:11 +00:00
Alina Sbirlea b468320313 [MemorySSA] Invalidate MemorySSA if AA or DT are invalidated.
Summary:
MemorySSA keeps internal pointers of AA and DT.
If these get invalidated, so should MemorySSA.

Reviewers: george.burgess.iv, chandlerc

Subscribers: jlebar, Prazek, llvm-commits

Tags: LLVM

Differential Revision: https://reviews.llvm.org/D61043

llvm-svn: 359627
2019-04-30 22:43:55 +00:00
Alina Sbirlea ba48a2c5e8 [AliasAnalysis/NewPassManager] Invalidate AAManager less often.
Summary:
This is a redo of D60914.

The objective is to not invalidate AAManager, which is stateless, unless
there is an explicit invalidate in one of the AAResults.

To achieve this, this patch adds an API to PAC, to check precisely this:
is this analysis not invalidated explicitly == is this analysis not abandoned == is this analysis stateless, so preserved without explicitly being marked as preserved by everyone

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61284

llvm-svn: 359622
2019-04-30 22:15:47 +00:00
Fedor Sergeev eeae45dc77 [NFC][InlineCost] cleanup - comments, overflow handling.
Reviewed By: apilipenko
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60751

llvm-svn: 359609
2019-04-30 20:44:53 +00:00
Simon Pilgrim f5e8f222d6 Revert rL359519 : [MemorySSA] Invalidate MemorySSA if AA or DT are invalidated.
Summary:
MemorySSA keeps internal pointers of AA and DT.
If these get invalidated, so should MemorySSA.

Reviewers: george.burgess.iv, chandlerc

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61043
........
This was causing windows build bot failures

llvm-svn: 359555
2019-04-30 12:34:21 +00:00
Sjoerd Meijer ea31ddb36f [ARM] Implement TTI::getMemcpyCost
This implements TargetTransformInfo method getMemcpyCost, which estimates the
number of instructions to which a memcpy instruction expands to.

Differential Revision: https://reviews.llvm.org/D59787

llvm-svn: 359547
2019-04-30 10:28:50 +00:00
Alina Sbirlea 9a1edd14a2 [MemorySSA] Invalidate MemorySSA if AA or DT are invalidated.
Summary:
MemorySSA keeps internal pointers of AA and DT.
If these get invalidated, so should MemorySSA.

Reviewers: george.burgess.iv, chandlerc

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61043

llvm-svn: 359519
2019-04-29 23:53:04 +00:00
Nikita Popov 7a94795b2b [ConstantRange] Add makeExactNoWrapRegion()
I got confused on the terminology, and the change in D60598 was not
correct. I was thinking of "exact" in terms of the result being
non-approximate. However, the relevant distinction here is whether
the result is

 * Largest range such that:
   Forall Y in Other: Forall X in Result: X BinOp Y does not wrap.
   (makeGuaranteedNoWrapRegion)
 * Smallest range such that:
   Forall Y in Other: Forall X not in Result: X BinOp Y wraps.
   (A hypothetical makeAllowedNoWrapRegion)
 * Both. (makeExactNoWrapRegion)

I'm adding a separate makeExactNoWrapRegion method accepting a
single APInt (same as makeExactICmpRegion) and using it in the
places where the guarantee is relevant.

Differential Revision: https://reviews.llvm.org/D60960

llvm-svn: 359402
2019-04-28 15:40:56 +00:00
Philip Reames 88cd69b56f Consolidate existing utilities for interpreting vector predicate maskes [NFC]
llvm-svn: 359163
2019-04-25 02:30:17 +00:00
Xinliang David Li 499c80b890 Add optional arg to profile count getters to filter
synthetic profile count.

Differential Revision: http://reviews.llvm.org/D61025

llvm-svn: 359131
2019-04-24 19:51:16 +00:00
Bjorn Pettersson 71e8c6f20f Add "const" in GetUnderlyingObjects. NFC
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.

It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.

Reviewers: hfinkel, materi, jkorous

Reviewed By: jkorous

Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61038

llvm-svn: 359072
2019-04-24 06:55:50 +00:00
Alina Sbirlea b341efce31 Revert [AliasAnalysis] AAResults preserves AAManager.
Triggers use-after-free.

llvm-svn: 359055
2019-04-24 00:28:29 +00:00
Josh Stone 27924c3a3c [Lint] Permit aliasing noalias readonly arguments
Summary:
If two arguments are both readonly, then they have no memory dependency
that would violate noalias, even if they do actually overlap.

Reviewers: hfinkel, efriedma

Reviewed By: efriedma

Subscribers: efriedma, hiraditya, llvm-commits, tstellar

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60239

llvm-svn: 359047
2019-04-23 23:43:47 +00:00
Alina Sbirlea 4fd1f266b1 [MemorySSA] LCSSA preserves MemorySSA.
Summary:
Enabling MemorySSA in the old pass manager leads to MemorySSA being run
twice due to the fact that LCSSA and LoopSimplify do not preserve
MemorySSA. This is the first step to address that: target LCSSA.

LCSSA does not make any changes that invalidate MemorySSA, so it
preserves it by design. It must preserve AA as well, for this to hold.

After this patch, MemorySSA is still run twice in the old pass manager.
Step two follows: target LoopSimplify.

Subscribers: mehdi_amini, jlebar, Prazek, llvm-commits, george.burgess.iv, chandlerc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60832

llvm-svn: 359032
2019-04-23 20:59:44 +00:00
Alina Sbirlea a809e8e5e7 [AliasAnalysis] AAResults preserves AAManager.
Summary:
AAResults should not invalidate AAManager.
Update tests.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60914

llvm-svn: 359014
2019-04-23 17:21:18 +00:00
Fangrui Song efd94c56ba Use llvm::stable_sort
While touching the code, simplify if feasible.

llvm-svn: 358996
2019-04-23 14:51:27 +00:00
Fedor Sergeev 652168a99b [CallSite removal] move InlineCost to CallBase usage
Converting InlineCost interface and its internals into CallBase usage.
Inliners themselves are still not converted.

Reviewed By: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60636

llvm-svn: 358982
2019-04-23 12:43:27 +00:00
Philip Reames d8d9b7b20e [InstSimplify] Move masked.gather w/no active lanes handling to InstSimplify from InstCombine
In the process, use the existing masked.load combine which is slightly stronger, and handles a mix of zero and undef elements in the mask.  

llvm-svn: 358913
2019-04-22 19:30:01 +00:00
Nikita Popov 5aacc7a573 Revert "[ConstantRange] Rename make{Guaranteed -> Exact}NoWrapRegion() NFC"
This reverts commit 7bf4d7c07f2fac862ef34c82ad0fef6513452445.

After thinking about this more, this isn't right, the range is not exact
in the same sense as makeExactICmpRegion(). This needs a separate
function.

llvm-svn: 358876
2019-04-22 09:01:38 +00:00
Nikita Popov 5299e25f50 [ConstantRange] Rename make{Guaranteed -> Exact}NoWrapRegion() NFC
Following D60632 makeGuaranteedNoWrapRegion() always returns an
exact nowrap region. Rename the function accordingly. This is in
line with the naming of makeExactICmpRegion().

llvm-svn: 358875
2019-04-22 08:36:05 +00:00
Nikita Popov dbc3fbafe7 [ConstantRange] Add getNonEmpty() constructor
ConstantRanges have an annoying special case: If upper and lower are
the same, it can be either an empty or a full set. When constructing
constant ranges nearly always a full set is intended, but this still
requires an explicit check in many places.

This revision adds a getNonEmpty() constructor that disambiguates this
case: If upper and lower are the same, a full set is created.

Differential Revision: https://reviews.llvm.org/D60947

llvm-svn: 358854
2019-04-21 15:22:54 +00:00
Chandler Carruth ce3f75df1f [CallSite removal] Move the legacy PM, call graph, and some inliner
code to `CallBase`.

This patch focuses on the legacy PM, call graph, and some of inliner and legacy
passes interacting with those APIs from `CallSite` to the new `CallBase` class.
No interesting changes.

Differential Revision: https://reviews.llvm.org/D60412

llvm-svn: 358739
2019-04-19 05:59:42 +00:00
Nicolai Haehnle 523f90a2ba [SDA] Bug fix: Use IPD outside the loop as divergence bound
Summary:
The immediate post dominator of the loop header may be part of the divergent loop.
Since this /was/ the divergence propagation bound the SDA would not detect joins of divergent paths outside the loop.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: mmasten, arsenm, jvesely, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59042

llvm-svn: 358681
2019-04-18 16:17:35 +00:00
Philip Reames b2c9fc02d5 Fix a bug in SCEV's isSafeToExpand around speculation safety
isSafeToExpand was making a common, but dangerously wrong, mistake in assuming that if any instruction within a basic block executes, that all instructions within that block must execute.  This can be trivially shown to be false by considering the following small example:
bb:
  add x, y  <-- InsertionPoint
  call @throws()
  udiv x, y <-- SCEV* S
  br ...

It's clearly not legal to expand S above the throwing call, but the previous logic would do so since S dominates (but not properlyDominates) the block containing the InsertionPoint.

Since iterating instructions w/in a block is expensive, this change special cases two cases: 1) S is an operand of InsertionPoint, and 2) InsertionPoint is the terminator of it's block.  These two together are enough to keep all current optimizations triggering while fixing the latent correctness issue.

As best I can tell, this is a silent bug in current ToT.  Given that, there's no tests with this change.  It was noticed in an upcoming optimization change (D60093), and was reviewed as part of that.  That change will include the test which caused me to notice the issue.  I'm submitting this seperately so that anyone bisecting a problem gets a clear explanation.

llvm-svn: 358680
2019-04-18 16:10:21 +00:00
Nikita Popov 2039581002 [LVI][CVP] Constrain values in with.overflow branches
If a branch is conditional on extractvalue(op.with.overflow(%x, C), 1)
then we can constrain the value of %x inside the branch based on
makeGuaranteedNoWrapRegion(). We do this by extending the edge-value
handling in LVI. This allows CVP to then fold comparisons against %x,
as illustrated in the tests.

Differential Revision: https://reviews.llvm.org/D60650

llvm-svn: 358597
2019-04-17 16:57:42 +00:00
Nikita Popov 79dffc67b5 [IR] Add WithOverflowInst class
This adds a WithOverflowInst class with a few helper methods to get
the underlying binop, signedness and nowrap type and makes use of it
where sensible. There will be two more uses in D60650/D60656.

The refactorings are all NFC, though I left some TODOs where things
could be improved. In particular we have two places where add/sub are
handled but mul isn't.

Differential Revision: https://reviews.llvm.org/D60668

llvm-svn: 358512
2019-04-16 18:55:16 +00:00
Alina Sbirlea f9f073a861 [MemorySSA] Add previous def to cache when found, even if trivial.
Summary:
When inserting a new Def, MemorySSA may be have non-minimal number of Phis.
While inserting, the walk to find the previous definition may cleanup minimal Phis.
When the last definition is trivial to obtain, we do not cache it.

It is possible while getting the previous definition for a Def to get two different answers:
- one that was straight-forward to find when walking the first path (a trivial phi in this case), and
- another that follows a cleanup of the trivial phi, it determines it may need additional Phi nodes, it inserts them and returns a new phi in the same position as the former trivial one.
While the Phis added for the second path are all redundant, they are not complete (the walk is only done upwards), and they are not properly cleaned up afterwards.

A way to fix this problem is to cache the straight-forward answer we got on the first walk.
The caching is only kept for the duration of a getPreviousDef call, and for Phis we use TrackingVH, so removing the trivial phi will lead to replacing it with the next dominating phi in the cache.
Resolves PR40749.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60634

llvm-svn: 358313
2019-04-12 21:58:52 +00:00