Commit Graph

9705 Commits

Author SHA1 Message Date
Juneyoung Lee 684b43c0cf [IR] Add NoUndef attribute to Intrinsics.td
This patch adds NoUndef to Intrinsics.td.
The attribute is attached to llvm.assume's operand, because llvm.assume(undef)
is UB.
It is attached to pointer operands of several memory accessing intrinsics
as well.

This change makes ValueTracking::getGuaranteedNonPoisonOps' intrinsic check
unnecessary, so it is removed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86576
2020-08-27 02:54:48 +09:00
Mircea Trofin 7cfcecece0 [MLInliner] Simplify TFUTILS_SUPPORTED_TYPES
We only need the C++ type and the corresponding TF Enum. The other
parameter was used for the output spec json file, but we can just
standardize on the C++ type name there.

Differential Revision: https://reviews.llvm.org/D86549
2020-08-25 14:19:39 -07:00
Juneyoung Lee f753f5b050 [ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands
This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.

Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86477
2020-08-26 04:40:21 +09:00
Nikita Popov 3a54b6a4b7 [MemDep] Use BatchAA when computing pointer dependencies
We're not changing IR while running a single MemDep query, so it's
safe to cache alias analysis results using BatchAA. This adds BatchAA
usage to getSimplePointerDependencyFrom(), which is non-intrusive --
covering larger parts (like a whole processNonLocalLoad query) is
also possible, but requires threading BatchAA through a bunch of APIs.

For the ThinLTO configuration, this is a 1% geomean improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D85583
2020-08-25 21:34:34 +02:00
Ta-Wei Tu abbd652dd6 [LoopNest] False negative of `arePerfectlyNested` with LCSSA loops
Summary: The LCSSA pass (required for all loop passes) sometimes adds
additional blocks containing LCSSA variables, and checkLoopsStructure
may return false even when the loops are perfectly nested in this case.
This is because the successor of the exit block of the inner loop now
points to the LCSSA block instead of the latch block of the outer loop.
Examples are shown in the test nests-with-lcssa.ll.

To fix the issue, the successor of the exit block of the inner loop can
now point to a block in which all instructions are LCSSA phi node
(except the terminator), and the sole successor of that block should
point to the latch block of the outer loop.

Reviewed By: Whitney, etiotto

Differential Revision: https://reviews.llvm.org/D86133
2020-08-25 16:20:52 +00:00
Mircea Trofin 8c63df2416 [MLInliner] Support training that doesn't require partial rewards
If we use training algorithms that don't need partial rewards, we don't
need to worry about an ir2native model. In that case, training logs
won't contain a 'delta_size' feature either (since that's the partial
reward).

Differential Revision: https://reviews.llvm.org/D86481
2020-08-24 17:36:29 -07:00
Sam Parker 2e194fe73b [SCEV] Still trying to fix windows buildbots 2020-08-24 10:26:48 +01:00
Alina Sbirlea f55ad3973d [DomTree] Extend update API to allow a post CFG view.
Extend the `applyUpdates` in DominatorTree to allow a post CFG view,
different from the current CFG.
This patch implements the functionality of updating an already up to
date DT, to the desired PostCFGView.
Combining a set of updates towards an up to date DT and a PostCFGView is
not yet supported.

Differential Revision: https://reviews.llvm.org/D85472
2020-08-21 17:23:08 -07:00
Arthur Eubanks b79889c2b1 [opt][NewPM] Add basic-aa in legacy PM compatibility mode
The legacy PM alias analysis pipeline by default includes basic-aa.
When running `opt -foo-pass` under the NPM and -disable-basic-aa is not
specified, use basic-aa.

This decreases the number of check-llvm failures under NPM from 913 to 752.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86167
2020-08-21 14:05:07 -07:00
Roman Lebedev 5d7c5a5e99
[NFC] Port InstCount pass to new pass manager 2020-08-21 12:39:42 +03:00
Yevgeny Rouban 18bc400f97 [NewPM][PassInstrumentation] Add PreservedAnalyses parameter to AfterPass* callbacks
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.

Reviewers: fedor.sergeev

Differential Revision: https://reviews.llvm.org/D81555
2020-08-21 16:10:42 +07:00
David Green 2b69efded0 [ARM][LV] Add a preferPredicatedReductionSelect target hook
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.

Differential Revision: https://reviews.llvm.org/D85980
2020-08-21 08:48:12 +01:00
Sanjay Patel 6f3511a01a [ValueTracking] define/use max recursion depth in header
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191

But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.

Differential Revision: https://reviews.llvm.org/D86113
2020-08-19 16:56:59 -04:00
Mehdi Amini a407ec9b6d Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private.""
Was reverted because MLIR/Flang builds were broken, these APIs have been
fixed in the meantime.
2020-08-19 17:26:36 +00:00
Mehdi Amini 4fc56d70aa Revert "[NFC][llvm] Make the contructors of `ElementCount` private."
This reverts commit 264afb9e6a.
(and dependent 6b742cc48 and fc53bd610f)

MLIR/Flang are broken.
2020-08-19 17:21:37 +00:00
Francesco Petrogalli 264afb9e6a [NFC][llvm] Make the contructors of `ElementCount` private.
Differential Revision: https://reviews.llvm.org/D86120
2020-08-19 16:26:44 +00:00
Mircea Trofin 62fc44ca3c [MLInliner] In development mode, obtain the output specs from a file
Different training algorithms may produce models that, besides the main
policy output (i.e. inline/don't inline), produce additional outputs
that are necessary for the next training stage. To facilitate this, in
development mode, we require the training policy infrastructure produce
a description of the outputs that are interesting to it, in the form of
a JSON file. We special-case the first entry in the JSON file as the
inlining decision - we care about its value, so we can guide inlining
during training - but treat the rest as opaque data that we just copy
over to the training log.

Differential Revision: https://reviews.llvm.org/D85674
2020-08-17 16:56:47 -07:00
Tyker a79e604462 [AssumeBundles] Fix Bug in Assume Queries
this bug was causing miscompile.
now clang cant properly selfhost with -mllvm --enable-knowledge-retention

Reviewed By: jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D83507
2020-08-17 21:36:53 +02:00
Dávid Bolvanský 0f14b2e6cb Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 50c743fa71. Patch will be split to smaller ones.
2020-08-17 20:44:33 +02:00
Simon Pilgrim c1f6ce0c73 [DemandedBits] Improve accuracy of Add propagator
The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.

I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.

This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.

Known A:     0_______0_______0_______0_______
Known B:     0_______0_______0_______0_______
AOut:        00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch:   00000000001111111000000000000000

Committed on behalf of: @rrika (Erika)

Differential Revision: https://reviews.llvm.org/D72423
2020-08-17 12:54:09 +01:00
Cullen Rhodes 2ccde3c96b [InlineCost] Fix scalable vectors in visitAlloca
Discovered as part of the VLS type work (see D85128).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85848
2020-08-17 10:34:27 +00:00
Vitaly Buka 3b348d9102 [NFC][StackSafety] Move out sort from the loop 2020-08-17 03:30:14 -07:00
Vitaly Buka e10e7829bf [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-16 18:05:52 -07:00
Vitaly Buka 47552a614a [StackSafety] Change how callee searched in index
Handle other than local linkage types.
2020-08-16 04:37:19 -07:00
Wenlei He 577e58bcc7 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.

This is a resubmit of https://reviews.llvm.org/D83743
2020-08-15 20:17:21 -07:00
Vitaly Buka fc4fd89852 [StackSafety] Use ValueInfo in ParamAccess::Call
This avoid GUID lookup in Index.findSummaryInModule.
Follow up for D81242.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D85269
2020-08-14 12:42:44 -07:00
Matt Morehouse 891b2be85d Revert "[NFC][StackSafety] Move out sort from the loop"
This reverts commit 0426e28419 due to ASan
buildbot failure.
2020-08-14 08:17:35 -07:00
Vitaly Buka 4c30d4b4e5 [NFC][StackSafety] Change map key comparison 2020-08-14 04:23:15 -07:00
Vitaly Buka 0426e28419 [NFC][StackSafety] Move out sort from the loop 2020-08-14 04:19:10 -07:00
Vitaly Buka 798eb71c3a [NFC][StackSafety] Dedup callees 2020-08-14 01:14:52 -07:00
Dávid Bolvanský 50c743fa71 [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 19:54:27 +02:00
Simon Pilgrim 63863451d1 Fix unused variable warning. NFC.
Reduce the dyn_cast<> to a isa<> as that's all non-assert builds require, and move the cast<> inside the assert.
2020-08-13 15:43:20 +01:00
Dávid Bolvanský f9264995a6 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 44587e2f7e. Sanitizer tests need to be updated.
2020-08-13 14:37:40 +02:00
Dávid Bolvanský 44587e2f7e [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 14:23:58 +02:00
Dávid Bolvanský a0485421d2 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 385c9d673f.
2020-08-13 12:59:15 +02:00
Dávid Bolvanský 385c9d673f [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 12:45:40 +02:00
Ali Tamur 0581c0b0ee Revert "[SCEV] Look through single value PHIs."
This reverts commit e441b7a7a0.

This patch causes a compile error in tensorflow opensource project. The stack trace looks like:

Point of crash:
llvm/include/llvm/Analysis/LoopInfoImpl.h : line 35

(gdb) ptype *this
type = const class llvm::LoopBase<llvm::BasicBlock, llvm::Loop> [with BlockT = llvm::BasicBlock, LoopT = llvm::Loop]

(gdb) p *this
$1 = {ParentLoop = 0x0, SubLoops = std::vector of length 0, capacity 0, Blocks = std::vector of length 0, capacity 1,
  DenseBlockSet = {<llvm::SmallPtrSetImpl<llvm::BasicBlock const*>> = {<llvm::SmallPtrSetImplBase> = {<llvm::DebugEpochBase> = {Epoch = 3}, SmallArray = 0x1b2bf6c8, CurArray = 0x1b2bf6c8,
        CurArraySize = 8, NumNonEmpty = 0, NumTombstones = 0}, <No data fields>}, SmallStorage = {0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, IsInvalid = true}

(gdb) p *this->DenseBlockSet->CurArray
$2 = (const void *) 0xfffffffffffffffe

I will try to get a case from tensorflow or use creduce to get a small case.
2020-08-12 23:13:24 -07:00
Nikita Popov eba5f5f798 [ValueTracking] Add abs intrinsics support to computeConstantRange()
Implementation is the same as for SPF_ABS.
2020-08-12 22:28:46 +02:00
Nikita Popov e2040d38a1 [ValueTracking] Support min/max intrinsics in computeConstantRange()
The implementation is the same as for the SPF_* case.
2020-08-12 22:07:29 +02:00
Craig Topper a7a06ded8b Recommit "[InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms" and its follow up patches
This recommits the following patches now that D85684 has landed

1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
2020-08-12 10:45:27 -07:00
Florian Hahn e441b7a7a0 [SCEV] Look through single value PHIs.
Now that SCEVExpander can preserve LCSSA form,
we do not have to worry about LCSSA form when
trying to look through PHIs. SCEVExpander will take
care of inserting LCSSA PHI nodes as required.

This increases precision of the analysis in some cases.

Reviewed By: mkazantsev, bmahjour

Differential Revision: https://reviews.llvm.org/D71539
2020-08-12 10:03:42 +01:00
Nikita Popov 06d567059e [InstSimplify] Respect CanUseUndef in more places
Similar to what we do in IIQ, add an isUndefValue() helper that
checks for undef values while respective CanUseUndef. This makes
it much easier to search for places that don't respect the flag
yet.
2020-08-11 21:53:33 +02:00
Dávid Bolvanský d68a2859ab [BPI] Teach BPI about bcmp function
bcmp is similar to memcmp
2020-08-11 20:44:53 +02:00
Nikita Popov d110d4aaff [InstSimplify] Forbid undef folds in expandBinOp
This is the replacement for D84250 based on D84792. As we recursively
fold with the same value twice, we need to disable undef folds,
to prevent an undef from being folded to two different values.

Reverting rG00f3579aea6e3d4a4b7464c3db47294f71cef9e4 and using the
test case from https://reviews.llvm.org/D83360#2145793, it no longer
performs the incorrect fold.

Differential Revision: https://reviews.llvm.org/D85684
2020-08-11 18:39:24 +02:00
Sanjay Patel 1470ce4a76 [InstSimplify] fold min/max with matching min/max operands
I think this is the last remaining translation of an existing
instcombine transform for the corresponding cmp+sel idiom.

This interpretation is more general though - we can remove
mismatched signed/unsigned combinations in addition to the
more obvious cases.

min/max(X, Y) must produce X or Y as the result, so this is
just another clause in the existing transform that was already
matching a min/max of min/max.
2020-08-11 11:23:15 -04:00
Florian Hahn 3483c28c5b [SCEV] ] If RHS >= Start, simplify (Start smax RHS) to RHS for trip counts.
This is the max version of D85046.

This change causes binary changes in 44 out of 237 benchmarks (out of
MultiSource/SPEC2000/SPEC2006)

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D85189
2020-08-11 13:20:24 +01:00
Juneyoung Lee 63b5b92bc9 [LazyValueInfo] Let getEdgeValueLocal look into freeze instructions
This patch makes getEdgeValueLocal more precise when a freeze instruction is
given, by adding support for freeze into constantFoldUser

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84629
2020-08-11 16:39:34 +09:00
Thomas Lively 514445e035 [WebAssembly][ConstantFolding] Fold fp-to-int truncation intrinsics
Constant fold both the trapping and saturating versions of the
WebAssembly truncation intrinsics. The tests are adapted from the
WebAssembly spec tests for the corresponding instructions.

Requested in PR46982.

Differential Revision: https://reviews.llvm.org/D85392
2020-08-10 12:40:05 -07:00
Mircea Trofin 211117b660 [NFC][MLInliner] remove curly braces for a few sinle-line loops 2020-08-10 09:32:21 -07:00
Mircea Trofin d5c81be3ca [NFC][MLInliner] Set up the logger outside the development mode advisor
This allows us to subsequently configure the logger for the case when we
use a model evaluator and want to log additional outputs.

Differential Revision: https://reviews.llvm.org/D85577
2020-08-10 09:22:17 -07:00
Vitaly Buka 1970eefb17 [NFC][StackSafety] Add a couple of early returns 2020-08-09 23:42:09 -07:00
Vitaly Buka 8d91ce8f58 [NFC][StackSafety] Count dataflow inputs 2020-08-09 23:32:41 -07:00
Vitaly Buka dee812a297 [StackSafety] Fix union which produces wrapped sets 2020-08-09 23:20:17 -07:00
Vitaly Buka a6feeb1c6b [NFC][StackSafety] Avoid assert in getBaseObjec 2020-08-09 23:20:17 -07:00
Vitaly Buka 3a34228bff [StackSafety] Don't keep FullSet in index
Optimization. Missing record is enterpreted as FullSet anyway.
2020-08-09 15:01:46 -07:00
Florian Hahn d236e1c7b6 [InstSimplify/NewGVN] Add option to control the use of undef.
Making use of undef is not safe if the simplification result is not used
to replace all uses of the result. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.

This patch adds an option to SimplifyQuery to disable the use of undef.

Note that I've only guarded uses if isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84792
2020-08-09 19:16:56 +01:00
Vitaly Buka 648228bcc3 [NFC][StackSafety] Fix statistics 2020-08-07 16:18:52 -07:00
Mircea Trofin 64372d93bc [NFC][MLInliner] Refactor logging implementation
This prepares it for logging externally-specified outputs.

Differential Revision: https://reviews.llvm.org/D85451
2020-08-07 14:56:56 -07:00
Vitaly Buka 7547508b7a Revert "[StackSafety] Skip ambiguous lifetime analysis"
This reverts commit 0b2616a804.

Crashes with safe-stack.
2020-08-07 14:02:50 -07:00
Vitaly Buka 7d4996033b [StackSafety,NFC] Add Stats counters 2020-08-07 14:02:50 -07:00
Vitaly Buka 7fb9de2c6f [StackSafety,NFC] Fix tests in debug 2020-08-06 20:46:39 -07:00
Vitaly Buka 58b95c9b2b [StackSafety,NFC] Add debug counters 2020-08-06 19:24:02 -07:00
Vitaly Buka 92dcf12b2f [StackSafety,NFC] Use CHECK-EMPTY in tests 2020-08-06 19:19:51 -07:00
Vitaly Buka 0b2616a804 [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-06 19:10:33 -07:00
Vitaly Buka 5c6d9b2bbf [LTO,NFC] Skip generateParamAccessSummary when empty
addGlobalValueSummary can check newly added FunctionSummary
and set HasParamAccess to mark that generateParamAccessSummary
is needed.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D85182
2020-08-06 19:01:19 -07:00
Jessica Paquette c8a282bcf7 [GlobalISel] Fix computing known bits for loads with range metadata
In GlobalISel, if you have a load into a small type with a range, you'll hit
an assert if you try to compute known bits on it starting at a larger type.

e.g.

```
%x:_(s8) = G_LOAD %whatever(p0) :: (load 1 ... !range !n)
...
%y:_(s32) = G_SOMETHING %x
```

When we walk through G_SOMETHING and hit the load, the width of our known bits
is 32. However, the width of the range is going to be 8. This will cause us
to hit an assert.

To fix this, make computeKnownBitsFromRangeMetadata zero extend or truncate
the range type to match the bitwidth of the known bits we're calculating.

Add a testcase in CodeGen/GlobalISel/KnownBitsTest.cpp to reflect that this
works now.

https://reviews.llvm.org/D85375
2020-08-06 16:47:07 -07:00
Sanjay Patel 250a167c41 [InstSimplify] avoid crashing by trying to rem-by-zero
Bug was noted in the post-commit comments for:
rGe8760bb9a8a3
2020-08-06 16:06:31 -04:00
Mircea Trofin ca7973cf18 [NFC]{MLInliner] Point out the tests' model dependencies 2020-08-06 09:57:26 -07:00
Mircea Trofin 87fb7aa137 [llvm][MLInliner] Don't log 'mandatory' events
We don't want mandatory events in the training log. We do want to handle
them, to keep the native size accounting accurate, but that's all.

Fixed the code, also expanded the test to capture this.

Differential Revision: https://reviews.llvm.org/D85373
2020-08-06 09:04:15 -07:00
David Green 745bf6cf44 [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
2020-08-06 10:10:50 +01:00
Sanjay Patel e8760bb9a8 [InstSimplify] fold icmp with mul nsw and constant operands
https://rise4fun.com/Alive/slvl

  Name: mul nsw with icmp eq
  Pre: (C2 % C1) != 0
  %a = mul nsw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = false

  Name: mul nsw with icmp ne
  Pre: (C2 % C1) != 0
  %a = mul nsw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = true

Follow-up to the 'nuw' variation added with:
rGf879c9b79621
2020-08-05 14:38:39 -04:00
Sanjay Patel f879c9b796 [InstSimplify] fold icmp with mul nuw and constant operands
https://rise4fun.com/Alive/pZEr

  Name: mul nuw with icmp eq
  Pre: (C2 %u C1) != 0
  %a = mul nuw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = false

  Name: mul nuw with icmp ne
  Pre: (C2 %u C1) != 0
  %a = mul nuw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = true

There are potentially several other transforms we need to add based on:
D51625
...but it doesn't look like there was follow-up to that patch.
2020-08-05 14:32:17 -04:00
Jordan Rupprecht 3c39db0c44 Revert "[LoopVectorizer] Inloop vector reductions"
This reverts commit e9761688e4. It breaks the build:

```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>'
  return ReductionOperations;
```
2020-08-05 10:24:15 -07:00
Mircea Trofin b18c41c66f [TFUtils] Expose untyped accessor to evaluation result tensors
These were implementation detail, but become necessary for generic data
copying.

Also added const variations to them, and move assignment, since we had a
move ctor (and the move assignment helps in a subsequent patch).

Differential Revision: https://reviews.llvm.org/D85262
2020-08-05 10:22:45 -07:00
David Green e9761688e4 [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
2020-08-05 18:14:05 +01:00
Sanjay Patel bd2c88b253 [InstSimplify] reduce code duplication in simplifyICmpWithMinMax(); NFC 2020-08-05 11:39:28 -04:00
Evgeniy Brevnov 02a629daad [BPI][NFC] Unify handling of normal and SCC based loops
This is one more NFC part extracted from D79485. Normal and SCC based loops have very different representation and have to be handled separatly each time we deal with loops. D79485 is going to introduce much more extensive use of loops what will be problematic with out this change.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D84838
2020-08-05 11:19:24 +07:00
Mircea Trofin 90b9c49ca6 [llvm] Expose type and element count-related APIs on TensorSpec
Added a mechanism to check the element type, get the total element
count, and the size of an element.

Differential Revision: https://reviews.llvm.org/D85250
2020-08-04 17:32:16 -07:00
Mircea Trofin 65b6dbf939 [llvm][NFC] Moved implementation of TrainingLogger outside of its decl
Also renamed a method - printTensor - to print; and added comments.
2020-08-04 14:35:35 -07:00
Xavier Denis 29fe3fe615 [InstSimplify] Peephole optimization for icmp (urem X, Y), X
This revision adds the following peephole optimization
and it's negation:

    %a = urem i64 %x, %y
    %b = icmp ule i64 %a, %x
    ====>
    %b = true

With John Regehr's help this optimization was checked with Alive2
which suggests it should be valid.

This pattern occurs in the bound checks of Rust code, the program

    const N: usize = 3;
    const T = u8;

    pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
        let len = slice.len() / N;
        slice.split_at(len * N)
    }

the method call slice.split_at will check that len * N is within
the bounds of slice, this bounds check is after some transformations
turned into the urem seen above and then LLVM fails to optimize it
any further. Adding this optimization would cause this bounds check
to be fully optimized away.

ref: https://github.com/rust-lang/rust/issues/74938

Differential Revision: https://reviews.llvm.org/D85092
2020-08-04 20:48:37 +02:00
Sanjay Patel a16882047a [InstSimplify] refactor min/max folds with shared operand; NFC 2020-08-04 12:21:05 -04:00
Sanjay Patel 04e45ae1c6 [InstSimplify] fold nested min/max intrinsics with constant operands
This is based on the existing code for the non-intrinsic idioms
in InstCombine.

The vector constant constraint is non-obvious: undefs should be
ok in the outer call, but they can't propagate safely from the
inner call in all cases. Example:

https://alive2.llvm.org/ce/z/-2bVbM
  define <2 x i8> @src(<2 x i8> %x) {
  %0:
    %m = umin <2 x i8> %x, { 7, undef }
    %m2 = umin <2 x i8> { 9, 9 }, %m
    ret <2 x i8> %m2
  }
  =>
  define <2 x i8> @tgt(<2 x i8> %x) {
  %0:
    %m = umin <2 x i8> %x, { 7, undef }
    ret <2 x i8> %m
  }
  Transformation doesn't verify!
  ERROR: Value mismatch

  Example:
  <2 x i8> %x = < undef, undef >

  Source:
  <2 x i8> %m = < #x00 (0)	[based on undef value], #x00 (0) >
  <2 x i8> %m2 = < #x00 (0), #x00 (0) >

  Target:
  <2 x i8> %m = < #x07 (7), #x10 (16) >
  Source value: < #x00 (0), #x00 (0) >
  Target value: < #x07 (7), #x10 (16) >
2020-08-04 08:44:48 -04:00
Sanjay Patel 20c71e55aa [InstSimplify] reduce code for min/max analysis; NFC
This should probably be moved up to some common area eventually
when there's another user.
2020-08-04 08:02:33 -04:00
David Green 3c7e7d40a9 [BasicAA] Enable -basic-aa-recphi by default
This option was added a while back, to help improve AA around pointer
phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can
then prove more precise aliasing info.

Differential Revision: https://reviews.llvm.org/D82998
2020-08-04 10:43:42 +01:00
Alina Sbirlea 1ce82015f6 [MemorySSA] Restrict optimizations after a PhiTranslation.
Merging alias results from different paths, when a path did phi
translation is not necesarily correct. Conservatively terminate such paths.
Aimed to fix PR46156.

Differential Revision: https://reviews.llvm.org/D84905
2020-08-03 14:46:41 -07:00
Sanjay Patel 9e5cf6bde5 [InstSimplify] fold variations of max-of-min with common operand
https://alive2.llvm.org/ce/z/ZtxpZ3
2020-08-03 15:02:46 -04:00
Mircea Trofin 4b1b109c51 [llvm] Add a parser from JSON to TensorSpec
A JSON->TensorSpec utility we will use subsequently to specify
additional outputs needed for certain training scenarios.

Differential Revision: https://reviews.llvm.org/D84976
2020-08-03 09:49:31 -07:00
Florian Hahn ee1c12708a [SCEV] If Start>=RHS, simplify (Start smin RHS) = RHS for trip counts.
In some cases, it seems like we can get rid of unnecessary s/umins by
using information from the loop guards (unless I am missing something).

One place where this seems to be helpful in practice is when computing
loop trip counts. This patch just changes howManyGreaterThans for now.
Note that this requires a loop for which we can check 'is guarded'.

On SPEC2000/SPEC2006/MultiSource, there are some notable changes for
some programs in the number of loops unrolled and trip counts computed.

```
Same hash: 179 (filtered out)
Remaining: 58
Metric: scalar-evolution.NumTripCountsComputed

Program                                        base    patch   diff
 test-suite...langs-C/compiler/compiler.test    25.00   31.00  24.0%
 test-suite.../Applications/SPASS/SPASS.test   2020.00 2323.00 15.0%
 test-suite...langs-C/allroots/allroots.test    29.00   32.00  10.3%
 test-suite.../Prolangs-C/loader/loader.test    17.00   18.00   5.9%
 test-suite...fice-ispell/office-ispell.test   253.00  265.00   4.7%
 test-suite...006/450.soplex/450.soplex.test   3552.00 3692.00  3.9%
 test-suite...chmarks/MallocBench/gs/gs.test   453.00  470.00   3.8%
 test-suite...ngs-C/assembler/assembler.test    29.00   30.00   3.4%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test   263.00  270.00   2.7%
 test-suite...rks/FreeBench/pifft/pifft.test   722.00  741.00   2.6%
 test-suite...count/automotive-bitcount.test    41.00   42.00   2.4%
 test-suite...0/253.perlbmk/253.perlbmk.test   1417.00 1451.00  2.4%
 test-suite...000/197.parser/197.parser.test   387.00  396.00   2.3%
 test-suite...lications/sqlite3/sqlite3.test   1168.00 1189.00  1.8%
 test-suite...000/255.vortex/255.vortex.test   173.00  176.00   1.7%

Metric: loop-unroll.NumUnrolled

Program                                        base   patch  diff
 test-suite...langs-C/compiler/compiler.test     1.00   3.00 200.0%
 test-suite.../Applications/SPASS/SPASS.test   134.00 234.00 74.6%
 test-suite...count/automotive-bitcount.test     3.00   4.00 33.3%
 test-suite.../Prolangs-C/loader/loader.test     3.00   4.00 33.3%
 test-suite...langs-C/allroots/allroots.test     3.00   4.00 33.3%
 test-suite...Source/Benchmarks/sim/sim.test    10.00  12.00 20.0%
 test-suite...fice-ispell/office-ispell.test    21.00  25.00 19.0%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test    32.00  38.00 18.8%
 test-suite...006/450.soplex/450.soplex.test   300.00 352.00 17.3%
 test-suite...rks/FreeBench/pifft/pifft.test    60.00  69.00 15.0%
 test-suite...chmarks/MallocBench/gs/gs.test    57.00  63.00 10.5%
 test-suite...ngs-C/assembler/assembler.test    10.00  11.00 10.0%
 test-suite...0/253.perlbmk/253.perlbmk.test   145.00 157.00  8.3%
 test-suite...000/197.parser/197.parser.test    43.00  46.00  7.0%
 test-suite...TimberWolfMC/timberwolfmc.test   205.00 214.00  4.4%
 Geomean difference                                           7.6%
```

Fixes https://bugs.llvm.org/show_bug.cgi?id=46939
Fixes https://bugs.llvm.org/show_bug.cgi?id=46924 on X86.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D85046
2020-08-03 17:22:42 +01:00
Vitaly Buka 08cf49658c [StackSafety, NFC] Don't insert empty objects into the map
Result should be the same but it makes generateParamAccessSummary 5x
faster.
2020-08-02 13:58:56 -07:00
Sanjay Patel 4abc69c6f5 [InstSimplify] fold max (max X, Y), X --> max X, Y
https://alive2.llvm.org/ce/z/VGgG3M
2020-08-02 11:50:58 -04:00
Nikita Popov a0addbb4ec [InstSimplify] Reduce code duplication in icmp of binop folds (NFC)
For folds where we check for the binop on both the LHS and RHS,
extract a function that expects it on the LHS and call it with
swapped order.
2020-08-02 15:47:18 +02:00
Kazu Hirata 60434989e5 Use llvm::is_contained where appropriate (NFC)
Use llvm::is_contained where appropriate (NFC)

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D85083
2020-08-01 21:51:06 -07:00
Craig Topper 85b5315dbe [InstSimplify] Fold abs(abs(x)) -> abs(x)
It's always safe to pick the earlier abs regardless of the nsw flag. We'll just lose it if it is on the outer abs but not the inner abs.

Differential Revision: https://reviews.llvm.org/D85053
2020-08-01 13:25:00 -07:00
Sanjay Patel 04b99a4d18 [InstSimplify] simplify abs if operand is known non-negative
abs() should be rare enough that using value tracking is not going
to be a compile-time cost burden, so use it to reduce a variety of
potential patterns. We do this in DAGCombiner too.

Differential Revision: https://reviews.llvm.org/D85043
2020-08-01 07:47:06 -04:00
Craig Topper 86dea1f39b [ValueTracking] Improve llvm.abs handling in computeKnownBits.
Add the optimizations we have in the SelectionDAG version.
Known non-negative copies all known bits. Any known one other than
the sign bit makes result non-negative.

Differential Revision: https://reviews.llvm.org/D85000
2020-07-31 15:55:03 -07:00
Sanjay Patel e591713bff [ConstantFolding] fold abs intrinsic
The handling for minimum value is similar to cttz/ctlz with 0 just above this case.

Differential Revision: https://reviews.llvm.org/D84942
2020-07-31 14:08:44 -04:00
Craig Topper 0e0aebc527 [ValueTracking] Add ComputeNumSignBits support for llvm.abs intrinsic
If absolute value needs turn a negative number into a positive number it reduces the number of sign bits by at most 1.

Differential Revision: https://reviews.llvm.org/D84971
2020-07-31 10:59:12 -07:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject paramenter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Vitaly Buka b256cb88a7 [ValueTracking] Remove AllocaForValue parameter
findAllocaForValue uses AllocaForValue to cache resolved values.
The function is used only to resolve arguments of lifetime
intrinsic which usually are not fare for allocas. So result reuse
is likely unnoticeable.

In followup patches I'd like to replace the function with
GetUnderlyingObjects.

Depends on D84616.

Differential Revision: https://reviews.llvm.org/D84617
2020-07-30 18:48:34 -07:00
Vitaly Buka 61cab352e3 [NFC] Move findAllocaForValue into ValueTracking.h
Differential Revision: https://reviews.llvm.org/D84616
2020-07-30 18:22:59 -07:00
Craig Topper 24f5235d93 [ValueTracking] Add basic computeKnownBits support for llvm.abs intrinsic
This includes basic support for computeKnownBits on abs. I've left FIXMEs for more complicated things we could do.

Differential Revision: https://reviews.llvm.org/D84963
2020-07-30 16:26:54 -07:00
Florian Hahn 2062b3707c [LAA] Avoid adding pointers to the checks if they are not needed.
Currently we skip alias sets with only reads or a single write and no
reads, but still add the pointers to the list of pointers in RtCheck.

This can lead to cases where we try to access a pointer that does not
exist when grouping checks.  In most cases, the way we access
PositionMap masked that, as the value would default to index 0.

But in the example in PR46854 it causes a crash.

This patch updates the logic to avoid adding pointers for alias sets
that do not need any checks. It makes things slightly more verbose, by
first checking the numbers of reads/writes and bailing out early if we don't
need checks for the alias set.

I think this makes the logic a bit simpler to follow.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D84608
2020-07-30 19:21:14 +01:00
Yuanfang Chen 555cf42f38 [NewPM][PassInstrument] Add PrintPass callback to StandardInstrumentations
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.

Solution:
Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder)

This patch move debug logging for pass as a PassInstrument callback. We could be sure that all running passes are logged and in the correct order.

This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want.

The test fixes looks messy. It includes changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)

Reviewed By: asbirlea, aeubanks

Differential Revision: https://reviews.llvm.org/D84774
2020-07-30 10:07:57 -07:00
Mircea Trofin 71059257bd [llvm][NFC] TensorSpec abstraction for ML evaluator
Further abstracting the specification of a tensor, to more easily
support different types and shapes of tensor, and also to perform
initialization up-front, at TFModelEvaluator construction time.

Differential Revision: https://reviews.llvm.org/D84685
2020-07-29 16:29:21 -07:00
Sanjay Patel fef513f5cc [InstSimplify] fold min/max intrinsic with undef operand 2020-07-29 17:03:50 -04:00
Sanjay Patel 5cd695dd7f [InstSimplify] fold min/max with opposite of limit value 2020-07-29 17:03:50 -04:00
Nikita Popov 897bdca4b8 [ConstantRange] Add API for intrinsics (NFC)
This adds a common API for compute constant ranges of intrinsics.
The intention here is that
a) we can reuse the same code across different passes that handle
   constant ranges, i.e. this can be reused in SCCP
b) we only have to add knowledge about supported intrinsics to
   ConstantRange, not any consumers.

Differential Revision: https://reviews.llvm.org/D84587
2020-07-29 22:16:27 +02:00
Craig Topper 3efc978bae [LV] Add abs/smin/smax/umin/umax intrinsics to isTriviallyVectorizable
This patch adds support for vectorizing these intrinsics.

Differential Revision: https://reviews.llvm.org/D84796
2020-07-29 10:23:07 -07:00
Sanjay Patel ee9617e96b [InstSimplify] try constant folding intrinsics before general simplifications
This matches the behavior of simplify calls for regular opcodes -
rely on ConstantFolding before spending time on folds with variables.

I am not aware of any diffs from this re-ordering currently, but there was
potential for unintended behavior from the min/max intrinsics because that
code is implicitly assuming that only 1 of the input operands is constant.
2020-07-29 13:18:40 -04:00
Sanjay Patel 3e8534fbc6 [InstSimplify] allow partial undef constants for vector min/max folds 2020-07-29 11:53:41 -04:00
Sanjay Patel 3c20ede18b [InstSimplify] fold integer min/max intrinsic with same args 2020-07-29 11:53:41 -04:00
Sanjay Patel 9ee7d7122c [ConstantFolding] fold integer min/max intrinsics
If both operands are undef, return undef.
If one operand is undef, clamp to limit constant.
2020-07-29 11:01:13 -04:00
David Green 60280e9818 [Analysis] TTI: Add CastContextHint for getCastInstrCost
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types.  Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization.  Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.

For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.

To fix that, this path adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.

Original patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79162
2020-07-29 13:32:53 +01:00
Sanjay Patel 3fb13b8484 [InstSimplify] allow undefs in icmp with vector constant folds
This is the main icmp simplification shortcoming seen in D84655.

Alive2 agrees that the basic examples are correct at least:

define <2 x i1> @src(<2 x i8> %x) {
%0:
  %r = icmp sle <2 x i8> { undef, 128 }, %x
  ret <2 x i1> %r
}
=>
define <2 x i1> @tgt(<2 x i8> %x) {
%0:
  ret <2 x i1> { 1, 1 }
}
Transformation seems to be correct!

define <2 x i1> @src(<2 x i32> %X) {
%0:
  %A = or <2 x i32> %X, { 63, 63 }
  %B = icmp ult <2 x i32> %A, { undef, 50 }
  ret <2 x i1> %B
}
=>
define <2 x i1> @tgt(<2 x i32> %X) {
%0:
  ret <2 x i1> { 0, 0 }
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/omt2ee
https://alive2.llvm.org/ce/z/GW4nP_

Differential Revision: https://reviews.llvm.org/D84762
2020-07-28 15:13:53 -04:00
Evgeniy Brevnov 412b3932c6 [BPI] Fix memory leak reported by sanitizer bots
There is a silly mistake where release() is used instead of reset() for free resources of unique pointer.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D84747
2020-07-28 19:53:46 +07:00
Evgeniy Brevnov 3a2b05f9fe [BPI][NFC] Consolidate code to deal with SCCs under a dedicated data structure.
In order to facilitate review of D79485 here is a small NFC change which restructures code around handling of SCCs in BPI.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D84514
2020-07-28 17:42:33 +07:00
Alina Sbirlea f1d4db4f0c [GraphDiff] Use class method getChildren instead of GraphTraits.
Summary:
Use getChildren() method in GraphDiff instead of GraphTraits.

This simplifies the code and allows for refactorigns inside GraphDiff.
All usecase need not have a light-weight/copyable range.
Clean GraphTraits implementation.

Reviewers: dblaikie

Subscribers: hiraditya, llvm-commits, george.burgess.iv

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84562
2020-07-27 16:12:34 -07:00
Kazu Hirata 902cbcd59e Use llvm::is_contained where appropriate (NFC)
Summary:
This patch replaces std::find with llvm::is_contained where
appropriate.

Reviewers: efriedma, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84489
2020-07-27 10:20:44 -07:00
Sergey Dmitriev bec77ece14 [CallGraph] Preserve call records vector when replacing call edge
Summary:
Try not to resize vector of call records in a call graph node when
replacing call edge. That would prevent invalidation of iterators
stored in the CG SCC pass manager's scc_iterator.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84295
2020-07-27 06:02:55 -07:00
Sanjay Patel 0481e1ae3c [InstSimplify] fold integer min/max intrinsics with limit constant 2020-07-26 09:41:54 -04:00
Sanjay Patel b89ae102e6 [InstSimplify] fold fcmp using isKnownNeverInfinity + isKnownNeverNaN
Follow-up to D84035 / rG7393d7574c09.
This sidesteps a question of FMF/poison on fcmp raised in PR46077:
http://bugs.llvm.org/PR46077

https://alive2.llvm.org/ce/z/TCsyzD
  define i1 @src(float %x) {
  %0:
    %x42 = fadd nnan ninf float %x, 42.000000
    %r = fcmp ueq float %x42, inf
    ret i1 %r
  }
  =>
  define i1 @tgt(float %x) {
  %0:
    ret i1 0
  }
  Transformation seems to be correct!

https://alive2.llvm.org/ce/z/FQaH7a
  define i1 @src(i8 %x) {
  %0:
    %cast = uitofp i8 %x to float
    %r = fcmp one float inf, %cast
    ret i1 %r
  }
  =>
  define i1 @tgt(i8 %x) {
  %0:
    ret i1 1
  }
  Transformation seems to be correct!
2020-07-26 09:04:37 -04:00
Juneyoung Lee 32088f4f7f [ConstantFolding] Fold freeze if it is never undef or poison
This is a simple patch that adds constant folding for freeze
instruction.

IIUC, it isn't needed to update ConstantFold.cpp because there is no freeze
constexpr.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84597
2020-07-26 21:54:44 +09:00
Juneyoung Lee 9f074214b7 [ValueTracking] Instruction::isBinaryOp should be used for constexprs
This is a simple patch that makes canCreateUndefOrPoison use
Instruction::isBinaryOp because BinaryOperator inherits Instruction.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84596
2020-07-26 21:48:51 +09:00
Nikita Popov bc79ed7e16 [LVI] Don't require operand number for range (NFC)
Pass the Value* instead of the operand number, rename I to CxtI.
This makes the function a bit more generally useful.
2020-07-25 16:33:45 +02:00
Johannes Doerfert ce8928f2e4 [Mem2Reg] Teach promote to register about droppable instructions
This is the first of two patches to address PR46753. We basically allow
mem2reg to promote allocas that are used in doppable instructions, for
now that means `llvm.assume`. The uses of the alloca (or a bitcast or
zero offset GEP from there) are replaced by `undef` in the droppable
instructions.

Reviewed By: Tyker

Differential Revision: https://reviews.llvm.org/D83976
2020-07-24 15:15:38 -05:00
Arthur Eubanks 9bb6ce78be Rename scoped-noalias -> scoped-noalias-aa
Summary: To match NewPM name. Also the new name is clearer and more consistent.

Subscribers: jvesely, nhaehnle, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84542
2020-07-24 12:14:27 -07:00
Florian Hahn 1c7c69c795 [ValueTracking] Check for ConstantExpr before using recursive helpers.
Make sure we do not call
constainsConstantExpression/containsUndefElement on ConstantExpression,
which is not supported.

In particular, containsUndefElement/constainsConstantExpression are only
supported on constants which are supported by getAggregateElement.

Unfortunately there's no convenient way to check if a constant supports
getAggregateElement, so just check for non-constantexpressions with
vector type. Other users of those functions do so too.

Reviewers: spatel, nikic, craig.topper, lebedev.ri, jdoerfert, aqjune

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84512
2020-07-24 17:37:09 +01:00
Simon Pilgrim 0128b9505c Revert rG5dd566b7c7b78bd- "PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI."
This reverts commit 5dd566b7c7.

Causing some buildbot failures that I'm not seeing on MSVC builds.
2020-07-24 13:02:33 +01:00
Simon Pilgrim 5dd566b7c7 PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI.
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.

This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
2020-07-24 12:40:50 +01:00
Eric Christopher 3ac828b8f7 Use llvm::size rather than an empty loop to get the number of top
level loops.
2020-07-23 14:55:50 -07:00
Tarindu Jayatilaka 06283661b3 Add new function properties to FunctionPropertiesAnalysis
Added  LoadInstCount, StoreInstCount, MaxLoopDepth, LoopCount

Reviewed By: jdoerfert, mtrofin

Differential Revision: https://reviews.llvm.org/D82283
2020-07-23 12:46:47 -07:00
Tarindu Jayatilaka ee6f0e109c Add a Printer to the FunctionPropertiesAnalysis
A printer pass and a lit test case was added.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D82523
2020-07-23 11:57:11 -07:00
Tarindu Jayatilaka 2f56046d7c Refactor FunctionPropertiesAnalysis
this separates  `analyze` logic from  `FunctionPropertiesAnalysis`

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D82521
2020-07-23 11:49:10 -07:00
Simon Pilgrim 7eb213499e RegionInfo.cpp - remove duplicate includes that already exist in RegionInfo.h. NFC.
Also remove some unnecessary forward declarations in RegionInfo.h.
2020-07-23 17:50:22 +01:00
Sanjay Patel 7485e92412 [InstSimplify] reduce code duplication for binop expansion; NFC
D84250 proposes to extend this code, so the duplication for
the commuted case would continue to grow.
2020-07-23 08:35:21 -04:00
Christopher Tetreault 23c5e59d9f [SVE] Remove calls to VectorType::getNumElements from Analysis
Reviewers: efriedma, fpetrogalli, c-rhodes, asbirlea, RKSimon

Reviewed By: RKSimon

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81504
2020-07-22 15:19:05 -07:00
Tarindu Jayatilaka 418121c30a Reapply "Rename InlineFeatureAnalysis to FunctionPropertiesAnalysis"
(This reverts commit a5e0194709, and
corrects author).

Rename the pass to be able to extend it to function properties other than inliner features.

    Reviewed By: mtrofin

    Differential Revision: https://reviews.llvm.org/D82044
2020-07-22 10:07:35 -07:00
Mircea Trofin a5e0194709 Revert "Rename InlineFeatureAnalysis to FunctionPropertiesAnalysis"
This reverts commit 44a6bda19b. I forgot
to correctly attibute it to tarinduj. Fixing and resubmitting.
2020-07-22 09:42:17 -07:00
Mircea Trofin 44a6bda19b Rename InlineFeatureAnalysis to FunctionPropertiesAnalysis
Rename the pass to be able to extend it to function properties other than inliner features.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D82044
2020-07-22 09:24:15 -07:00
Sebastian Neubauer 2a6c871596 [InstCombine] Move target-specific inst combining
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.

D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).

This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic

A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.

This allows to move about 3000 lines out from InstCombine to the targets.

Differential Revision: https://reviews.llvm.org/D81728
2020-07-22 15:59:49 +02:00
Max Kazantsev b96114c1e1 [SCEV] Remove premature assert. PR46786
This assert was added to verify assumption that GEP's SCEV will be of pointer type,
basing on fact that it should be a SCEVAddExpr with (at least) last operand being
pointer. Two notes:
- GEP's SCEV does not have to be a SCEVAddExpr after all simplifications;
- In current state, GEP's SCEV does not have to have at least one pointer operands
  (all of them can become int during the transforms).

However, we might want to be at a point where it is true. We are currently removing
this assert and will try to enumerate the cases where "is pointer" notion might be
lost during the transforms. When all of them are fixed, we can return it.

Differential Revision: https://reviews.llvm.org/D84294
Reviewed By: lebedev.ri
2020-07-22 15:43:16 +07:00
Juneyoung Lee ace0bf7490 [ValueTracking] Fix incorrect handling of canCreateUndefOrPoison
.. in isGuaranteedNotToBeUndefOrPoison.

This caused early exit of isGuaranteedNotToBeUndefOrPoison, making it return
imprecise result.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84251
2020-07-22 09:31:16 +09:00
Nico Weber 4fe912f186 Build: Move TF source file inclusion from build system to source files
Outside of compiler-rt (where it's arguably an anti-pattern too),
LLVM tries to keep its build files as simple as possible. See e.g.
llvm/docs/SupportLibrary.rst, "Code Organization".

Differential Revision: https://reviews.llvm.org/D84243
2020-07-21 13:02:34 -04:00
David Green becaa6803a [ARM] Constant fold VCTP intrinsics
We can sometimes get into the situation where the operand to a vctp
intrinsic becomes constant, such as after a loop is fully unrolled. This
adds the constant folding needed for them, allowing them to simplify
away and hopefully simplifying remaining instructions.

Differential Revision: https://reviews.llvm.org/D84110
2020-07-21 11:39:31 +01:00
Nico Weber e37b220442 [gn build] (manually) hack around 70f8d0ac8a 2020-07-21 06:35:36 -04:00
Mircea Trofin 70f8d0ac8a [llvm] Development-mode InlineAdvisor
Summary:
This is the InlineAdvisor used in 'development' mode. It enables two
scenarios:

 - loading models via a command-line parameter, thus allowing for rapid
 training iteration, where models can be used for the next exploration
 phase without requiring recompiling the compiler. This trades off some
 compilation speed for the added flexibility.

 - collecting training logs, in the form of tensorflow.SequenceExample
 protobufs. We generate these as textual protobufs, which simplifies
 generation and testing. The protobufs may then be readily consumed by a
 tensorflow-based training algorithm.

To speed up training, training logs may also be collected from the
'default' training policy. In that case, this InlineAdvisor does not
use a model.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html

Reviewers: jdoerfert, davidxl

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83733
2020-07-20 11:01:56 -07:00
Matt Arsenault 5e999cbe8d IR: Define byref parameter attribute
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.

This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.

My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.

This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.

Background:

There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.

It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.

The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.

This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.

Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.

The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.

Additional context is in D79744.
2020-07-20 10:23:09 -04:00
Juneyoung Lee 30201d3b61 [ValueTracking] Let isGuaranteedNotToBeUndefOrPoison use canCreateUndefOrPoison
This patch adds support more operations.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D83926
2020-07-20 09:21:39 +09:00
Jameson Nash 8b354cc8db [ConstantFolding] check applicability of AllOnes constant creation first
The getAllOnesValue can only handle things that are bitcast from a
ConstantInt, while here we bitcast through a pointer, so we may see more
complex objects (like Array or Struct).

Differential Revision: https://reviews.llvm.org/D83870
2020-07-19 13:13:57 -04:00