Commit Graph

11816 Commits

Author SHA1 Message Date
Nikita Popov f3cbe60aa9 [AAEval] Remove unused function (NFC) 2022-03-16 10:25:45 +01:00
Nikita Popov 57d57b1afd [AAEval] Make compatible with opaque pointers
With opaque pointers, we cannot use the pointer element type to
determine the LocationSize for the AA query. Instead, -aa-eval
tests are now required to have an explicit load or store for any
pointer they want to compute alias results for, and the load/store
types are used to determine the location size.

This may affect ordering of results, and sorting within one result,
as the type is not considered part of the sorted string anymore.

To somewhat minimize the churn, printing still uses faux typed
pointer notation.
2022-03-16 10:02:11 +01:00
Dmitry Makogon 361034ba78 [NFC] Add LazyValueInfo::clear method
This method just calls LazyValueInfoImpl::clear
2022-03-15 17:52:50 +07:00
Arthur Eubanks 4fc7c55fff [NewPM] Actually recompute GlobalsAA before module optimization pipeline
RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA.
GlobalsAA isn't invalidated (unless specifically invalidated) because
it's self-updating via ValueHandles, but can be imprecise during the
self-updates.

Rather than invalidating GlobalsAA, which would invalidate AAManager and
any analyses that use AAManager, create a new pass that recomputes
GlobalsAA.

Fixes #53131.

Differential Revision: https://reviews.llvm.org/D121167
2022-03-14 09:42:34 -07:00
Arthur Eubanks 55cf09ae26 [ValueTracking] Simplify llvm::isPointerOffset()
We still need the code after stripAndAccumulateConstantOffsets() since
it doesn't handle GEPs of scalable types and non-constant but identical
indexes.

Differential Revision: https://reviews.llvm.org/D120523
2022-03-14 09:32:36 -07:00
Nikita Popov 04b717c423 [TLI] Check that malloc argument has type size_t
DSE assumes that this is the case when forming a calloc from a
malloc + memset pair.

For tests, either update the malloc signature or change the
data layout.
2022-03-14 17:22:24 +01:00
Andrew Litteken 0c4bbd293e [IRSim] Make sure the first instruction of a block doesn't get missed if it is the first valid instruction in Module.
If an instruction is first legal instruction in the module, and is the only legal instruction in its basic block, it will be ignored by the outliner due to a length check inherited from the older version of the outliner that was restricted to outlining within a single basic block. This removes that check, and updates any tests that broke because of it.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D120786
2022-03-13 23:13:09 -05:00
Andrew Litteken 1643f01232 [IRSim][IROutliner] Ignoring Musttail Function
Musttail calls require extra handling to properly propagate the calling convention information and tail call information. The outliner does not currently do this, so we ignore call instructions that utilize the swifttailcc and tailcc calling convention as well as functions marked with the attribute musttail.

Reviewers: paquette, aschwaighofer

Differential Revision: https://reviews.llvm.org/D120733
2022-03-13 19:27:25 -05:00
Andrew Litteken 66f90fdff1 Revert "[IRSim][IROutliner] Ignoring Musttail Function"
This reverts commit c7037c7257.

Pushed too soon
2022-03-13 19:26:51 -05:00
Andrew Litteken c7037c7257 [IRSim][IROutliner] Ignoring Musttail Function 2022-03-13 18:57:24 -05:00
serge-sans-paille 3d219d805c Add missing include under EXPENSIVE_CHECKS 2022-03-12 18:54:29 +01:00
serge-sans-paille ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Johannes Doerfert d6e09ce86f [CaptureTracking][NFCI] Expose capture tracking logic
The logic exposed by this patch via `llvm::DetermineUseCaptureKind` was
part of `llvm::PointerMayBeCaptured`. In the Attributor we want to keep
track of the work list items but still reuse the logic if a use might
capture a value. A follow up for the Attributor removes ~100 lines of
code and complexity while making future handling of simplified values
possible.

Differential Revision: https://reviews.llvm.org/D121272
2022-03-11 22:56:16 -06:00
Anna Thomas a4aa97d578 [InlineCost] Add cl::opt for target attributes compatibility check. NFC
This patch adds a CL option for avoiding the attribute compatibility
check between caller and callee in TTI. TTI attribute compatibility
checks for target CPU and target features.
In our downstream compiler, this attribute always remains the same
between callee and caller. By avoiding the addition of this attribute to
each of our inline candidate (and then checking them here during inline
cost), we save some compile time.

The option is kept false, so this change is an NFC upstream.
2022-03-11 18:05:16 -05:00
Nikita Popov 806450805d [ConstFold] Don't fold calls with mismatching function type
With opaque pointers, this is no longer ensured through pointer
type identity.
2022-03-11 14:09:23 +01:00
Nikita Popov 02c2106002 [InstSimplify] Handle vector GEP when simplifying zero indices
If the base is a scalar and the index is a vector, we can't
simplify, as this is effectively a splat operation.
2022-03-11 10:56:44 +01:00
Sanjay Patel b48fe158e0 [Analysis] remove bogus smin/smax pattern detection
This is a revert of cfcc42bdc. The analysis is wrong as shown by
the minimal tests for instcombine:
https://alive2.llvm.org/ce/z/y9Dp8A

There may be a way to salvage some of the other tests,
but that can be done as follow-ups. This avoids a miscompile
and fixes #54311.
2022-03-09 17:50:34 -05:00
Florian Hahn f98125abb2
Revert "[PassManager] Add pretty stack entries before P->run() call."
This reverts commit 128745cc26.

This increased compile-time unnecessarily. Revert this change and follow
ups 2c7afadb47 & add0c5856d.

http://llvm-compile-time-tracker.com/compare.php?from=338dfcd60f843082bb589b287d890dbd9394eb82&to=128745cc2681c284bc6d0150a319673a6d6e8424&stat=instructions
2022-03-09 18:46:32 +00:00
Florian Hahn 128745cc26
[PassManager] Add pretty stack entries before P->run() call.
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.

The information is used the print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.

The improved stack trace now includes:

Stack dump:
0.	Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1.	Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2.	Running pass 'LoopVectorizePass' on function '@a'

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D120993
2022-03-09 13:01:09 +00:00
Nikita Popov ba8ee4a43e [SCEV] Verify all IR -> SCEV mappings
This extends SCEV verification to check not only backedge-taken
counts, but all entries in the IR -> SCEV cache. The restrictions
are the same as for the BECount case, i.e. we ignore expressions
based on undef, we only diagnose constant deltas (there are way
too many false positives otherwise) and we limit to reachable code.

Differential Revision: https://reviews.llvm.org/D121104
2022-03-09 09:33:22 +01:00
Arthur Eubanks 53e5e58670 [NewPM][Inliner] Make inlined calls to functions in same SCC as callee exponentially expensive
Introduce a new attribute "function-inline-cost-multiplier" which
multiplies the inline cost of a call site (or all calls to a callee) by
the multiplier.

When processing the list of calls created by inlining, check each call
to see if the new call's callee is in the same SCC as the original
callee. If so, set the "function-inline-cost-multiplier" attribute of
the new call site to double the original call site's attribute value.
This does not happen when the original call site is intra-SCC.

This is an alternative to D120584, which marks the call sites as
noinline.

Hopefully fixes PR45253.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D121084
2022-03-07 23:51:09 -08:00
Florian Hahn a2979c8399
[IVDescriptors] Bail out instead of asserting that order is expected.
When dealing with multiple phis that depend on each other, the order
might have been changed and may not match the expectation. If that
happens, bail out, rather than asserting.

Fixes https://github.com/llvm/llvm-project/issues/54218
Fixes https://github.com/llvm/llvm-project/issues/54233
Fixes https://github.com/llvm/llvm-project/issues/54254
2022-03-07 19:57:26 +00:00
Nikita Popov 81b43b23e4 [SCEV] Enable verification under EXPENSIVE_CHECKS
SCEV verification should no longer affect results of subsequent
queries, and our lit tests as well as llvm-test-suite pass with
SCEV verification enabled, so I think we can enable it by default
under EXPENSIVE_CHECKS now.

Differential Revision: https://reviews.llvm.org/D120708
2022-03-07 09:53:00 +01:00
Nikita Popov d1e880acaa [SCEV] Enable verification in LoopPM
Currently, we hardly ever actually run SCEV verification, even in
tests with -verify-scev. This is because the NewPM LPM does not
verify SCEV. The reason for this is that SCEV verification can
actually change the result of subsequent SCEV queries, which means
that you see different transformations depending on whether
verification is enabled or not.

To allow verification in the LPM, this limits verification to
BECounts that have actually been cached. It will not calculate
new BECounts.

BackedgeTakenInfo::getExact() is still not entirely readonly,
it still calls getUMinFromMismatchedTypes(). But I hope that this
is not problematic in the same way. (This could be avoided by
performing the umin in the other SCEV instance, but this would
require duplicating some of the code.)

Differential Revision: https://reviews.llvm.org/D120551
2022-03-07 09:46:20 +01:00
Nikita Popov 8133778d3c [SCEV] Fully invalidate SCEVUnknown on RAUW
When a SCEVUnknown gets RAUWd, we currently drop it from the folding
set, but don't forget memoized values. I believe we should be
treating RAUW the same way as deletion here and invalidate all
caches and dependent expressions.

I don't have any specific cases where this causes issues right now,
but it does address the FIXME in https://reviews.llvm.org/D119488.

Differential Revision: https://reviews.llvm.org/D120033
2022-03-07 09:28:28 +01:00
Florian Hahn de8ac485e5
[IVDescriptor] Remove SinkCandidate from SinkAfter before re-sinking.
This ensures the right order in the sink-after map is maintained. If we
re-sink an instruction, it must be sunk after all earlier instructions
have been sunk.

Fixes https://github.com/llvm/llvm-project/issues/54223
2022-03-05 19:48:26 +00:00
Arthur Eubanks f909aed671 Revert "[SCEV] Infer ranges for SCC consisting of cycled Phis"
This reverts commit fc539b0004.

Causes miscompiles, see D110620.
2022-03-04 19:52:44 -08:00
Augie Fackler dba73135c8 getAllocAlignment: respect allocalign attribute if present
As with allocsize(), we prefer the table data to attributes.

Differential Revision: https://reviews.llvm.org/D118263
2022-03-04 15:57:54 -05:00
Augie Fackler 5e4c75db3b InstructionCombining: avoid eliding mismatched alloc/free pairs
Prior to this change LLVM would happily elide a call to any allocation
function and a call to any free function operating on the same unused
pointer. This can cause problems in some obscure cases, for example if
the body of operator::new can be inlined but the body of
operator::delete can't, as in this example from jyknight:

    #include <stdlib.h>
    #include <stdio.h>

    int allocs = 0;

    void *operator new(size_t n) {
        allocs++;
        void *mem = malloc(n);
        if (!mem) abort();
        return mem;
    }

    __attribute__((noinline)) void operator delete(void *mem) noexcept {
        allocs--;
        free(mem);
    }

    void deleteit(int*i) { delete i; }
    int main() {
        int*i = new int;
        deleteit(i);
        if (allocs != 0)
          printf("MEMORY LEAK! allocs: %d\n", allocs);
    }

This patch addresses the issue by introducing the concept of an
allocator function family and uses it to make sure that alloc/free
function pairs are only removed if they're in the same family.

Differential Revision: https://reviews.llvm.org/D117356
2022-03-04 10:41:10 -05:00
Florian Hahn 5a60260efe
[IVDescriptor] Use DT to check order of Previous, OtherPrev.
Previous and OhterPrev may not be in the same block. Use DT::dominates
instead of local comesBefore. DT::dominates is already used earlier to
check the order of Previous and SinkCandidate.

Fixes https://github.com/llvm/llvm-project/issues/54195
2022-03-04 11:07:42 +00:00
Jez Ng dd29597e10 [LTO] Initialize canAutoHide() using canBeOmittedFromSymbolTable()
Per discussion on
https://reviews.llvm.org/D59709#inline-1148734, this seems like the
right course of action. `canBeOmittedFromSymbolTable()` subsumes and
generalizes the previous logic. In addition to handling `linkonce_odr`
`unnamed_addr` globals, we now also internalize `linkonce_odr` +
`local_unnamed_addr` constants.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D120173
2022-03-03 19:04:11 -05:00
Arthur Eubanks 41e792d725 [CostModel] Change printer pass wording to work with update_analyze_test_checks.py
update_analyze_test_checks.py looks for very specific wording, update
the printer pass to match the legacy `-analyze -cost-model` wording.
2022-03-03 10:10:48 -08:00
Craig Topper 608161225e [InstCombine][Analysis] Move getFCmpCode and getPredForFCmpCode to CmpInstAnalysis. NFC
The similar getICmpCode and getPredForICmpCode are already there.
This moves FP for consistency.

I think InstCombine is currently the only user of both.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120754
2022-03-03 09:33:24 -08:00
Florian Hahn 139215af8e
[IVDescriptor] Find original 'Previous' for first-order recurrences.
This patch extends first-order recurrence handling to support cases
where we already sunk an instruction for a different recurrence, but
LastPrev comes before Previous.

To handle those cases correctly, we need to find the earliest entry for
the sink-after chain, because this is references the Previous from the
original recurrence. This is needed to ensure we use the correct
instruction as sink point.

Depends on D118558.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D118642
2022-03-03 16:41:26 +00:00
serge-sans-paille 81a1760cac Revert "Add missing include under EXPENSIVE_CHECK"
This reverts commit eeaca53df7.

It's a duplicate of
https://reviews.llvm.org/rG50874a188b94a25827963956887b878d3701509a
2022-03-03 07:56:34 +01:00
serge-sans-paille a494ae43be Cleanup includes: TransformsUtils
Estimation on the impact on preprocessor output:
before: 1065307662
after:  1064800684

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120741
2022-03-01 21:00:07 +01:00
serge-sans-paille eeaca53df7 Add missing include under EXPENSIVE_CHECK
This is a followup to 344f8ec3048b6eeef94569800acb012f794ad372

It should fix
https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/21961/console
2022-03-01 21:00:06 +01:00
Fangrui Song 50874a188b Fix -DLLVM_ENABLE_EXPENSIVE_CHECKS=on build after D120659 2022-03-01 11:36:25 -08:00
Mircea Trofin 261419273a Fix build breaks on ml-* bots introduced by include cleanups 2022-03-01 11:29:18 -08:00
Craig Topper 7bc6667845 [Analysis] Simplify the interface to llvm::getICmpCode. NFC
Instead of passing an InstCmpInt * and a bool just pass the predicate
from the caller.

I'm considering moving the similar FCmp functions from InstCombine
over here and this makes the interface consistent with what is used
for FCmp.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120609
2022-03-01 09:53:27 -08:00
serge-sans-paille 71c3a5519d Cleanup includes: LLVMAnalysis
Number of lines output by preprocessor:
before: 1065940348
after:  1065307662

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120659
2022-03-01 18:01:54 +01:00
Nikita Popov aeab6167b0 [SCEV] Only verify BECounts for reachable loops (PR50523)
For unreachable loops, any BECount is legal, and since D98706 SCEV
can make use of this for loops that are unreachable due to constant
branches. To avoid false positives, adjust SCEV verification to only
check BECounts in reachable loops.

Fixes https://github.com/llvm/llvm-project/issues/50523.

Differential Revision: https://reviews.llvm.org/D120651
2022-03-01 11:52:35 +01:00
Nikita Popov 3c53d3a733 [InlineCost] Use SmallPtrSet for DeadBlocks (NFC)
This set is only used with contains operations, so there is no
need to use a SetVector.
2022-02-28 15:26:22 +01:00
Serge Pavlov 6982c38cb1 [ConstantFolding] Fix folding of constrained compare intrinsics
The change fixes treatment of constrained compare intrinsics if
compared values are of vector type.

Differential revision: https://reviews.llvm.org/D110322
2022-02-27 10:19:19 +07:00
Nikita Popov 2d0fc3e46f [SCEV] Return ArrayRef from getSCEVValues() (NFC)
Return a read-only view on this set. For the one internal use,
directly access ExprValueMap.
2022-02-25 09:32:22 +01:00
Nikita Popov d9715a7266 [SCEV] Don't try to reuse expressions with offset
SCEVs ExprValueMap currently tracks not only which IR Values
correspond to a given SCEV expression, but additionally stores that
it may be expanded in the form X+Offset. In theory, this allows
reusing existing IR Values in more cases.

In practice, this doesn't seem to be particularly useful (the test
changes are rather underwhelming) and adds a good bit of complexity.
Per https://github.com/llvm/llvm-project/issues/53905, we have an
invalidation issue with these offseted expressions.

Differential Revision: https://reviews.llvm.org/D120311
2022-02-25 09:16:48 +01:00
Mircea Trofin 7e3606f43c [ScalarEvolution] Control flag for nonstrict inequalities in finite loops
D118090 causes a pretty significant (19%) regression in some Eigen
benchmarks. Investigating is a bit time consuming as the compilation
unit where this occurs is large. Rather than revert, this patch adds a
flag controlling that behavior (enabled by default).
2022-02-23 17:56:35 -08:00
Malhar Jajoo 9f1c6fbf11 [LAA] Add remarks for unbounded array access
Adds new optimization remarks when loop vectorization fails due to
the compiler being unable to find bound of an array access inside
a loop

Differential Revision: https://reviews.llvm.org/D115873
2022-02-23 15:57:39 +00:00
Sanjay Patel fc3b34c508 [InstSimplify] remove shift that is redundant with part of funnel shift
In D111530, I suggested that we add some relatively basic pattern-matching
folds for shifts and funnel shifts and avoid a more specialized solution
if possible.

We can start by implementing at least one of these in IR because it's
easier to write the code and verify with Alive2:
https://alive2.llvm.org/ce/z/qHpmNn

This will need to be adapted/extended for SDAG to handle the motivating
bug ( #49541 ) because the patterns only appear later with that example
(added some tests: bb850d422b)

This can be extended within InstSimplify to handle cases where we 'and'
with a shift too (in that case, kill the funnel shift).
We could also handle patterns where the shift and funnel shift directions
are inverted, but I think it's better to canonicalize that instead to
avoid pattern-match case explosion.

Differential Revision: https://reviews.llvm.org/D120253
2022-02-23 09:10:01 -05:00
Thomas Preud'homme 40f9081958 [LAA] Add missing newline in debug print 2022-02-23 13:25:16 +00:00
Nikita Popov 6777ec9e4d [ValueTracking] Support signed intrinsic clamp
This is the same special logic we apply for SPF signed clamps
when computing the number of sign bits, just for intrinsics.

This just uses the same logic as the select case, but there's
multiple directions this could be improved in: We could also use
the num sign bits from the clamped value, we could do this during
constant range calculation, and there's probably unsigned analogues
for the constant range case at least.
2022-02-23 12:45:16 +01:00
Bill Wendling a5bbc6ef99 [NFC] Remove unnecessary "#include"s from header files 2022-02-23 01:20:48 -08:00
Kerry McLaughlin 12fb133eba [LoopVectorize] Support conditional in-loop vector reductions
Extends getReductionOpChain to look through Phis which may be part of
the reduction chain. adjustRecipesForReductions will now also create a
CondOp for VPReductionRecipe if the block is predicated and not only if
foldTailByMasking is true.

Changes were required in tryToBlend to ensure that we don't attempt
to convert the reduction Phi into a select by returning a VPBlendRecipe.
The VPReductionRecipe will create a select between the Phi and the reduction.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D117580
2022-02-22 12:04:35 +00:00
Max Kazantsev ad3b1fe472 [SCEV] Do not erase LoopUsers. PR53969
This patch fixes a logical error in how we work with `LoopUsers` map.
It maps a loop onto a set of AddRecs that depend on it. The Addrecs
are added to this map only once when they are created and put to
the UniqueSCEVs` map.

The only purpose of this map is to make sure that, whenever we forget
a loop, all (directly or indirectly) dependent SCEVs get forgotten too.

Current code erases SCEVs from dependent set of a given loop whenever
we forget this loop. This is not a correct behavior due to the following scenario:

1. We have a loop `L` and an AddRec `AR` that depends on it;
2. We modify something in the loop, but don't destroy it. We still call forgetLoop on it;
3. `AR` is no longer dependent on `L` according to `LoopUsers`. It is erased from
    ValueExprMap` and `ExprValue map, but still exists in UniqueSCEVs;
4. We can later request the very same AddRec for the very same loop again, and get existing
    SCEV `AR`.
5. Now, `AR` exists and is used again, but its notion that it depends on `L` is lost;
6. Then we decide to delete `L`. `AR` will not be forgotten because we have lost it;
7. Just you wait when you run into a dangling pointer problem, or any other kind of problem
   because an active SCEV is now referecing a non-existent loop.

The solution to this is to stop erasing values from `LoopUsers`. Yes, we will maybe forget something
that is already not used, but it's cheap.

This fixes a functional bug and potentially may have negative compile time impact on methods with
huge or numerous loops.

Differential Revision: https://reviews.llvm.org/D120303
Reviewed By: nikic
2022-02-22 17:24:39 +07:00
David Sherwood dc0657277f Fix warning introduced by 47eff645d8 2022-02-22 09:37:16 +00:00
David Sherwood 47eff645d8 [InstCombine] Bail out of load-store forwarding for scalable vector types
This patch fixes an invalid TypeSize->uint64_t implicit conversion in
FoldReinterpretLoadFromConst. If the size of the constant is scalable
we bail out of the optimisation for now.

Tests added here:

  Transforms/InstCombine/load-store-forward.ll

Differential Revision: https://reviews.llvm.org/D120240
2022-02-22 09:26:04 +00:00
Max Kazantsev 40d06c4ce9 [SCEV][NFC] Replace contains+insert check with insert.second 2022-02-21 20:11:13 +07:00
Philip Reames 34a9642af8 Revert "[instsimplify] Simplify HaveNonOverlappingStorage per review suggestion on D120133 [NFC]"
This reverts commit 3a6be124cc.  This appears to have caused a stage2 build failure: https://lab.llvm.org/buildbot/#/builders/168/builds/4813

Will investigate further on Monday and recommit.
2022-02-18 15:36:15 -08:00
Whitney Tsang e7afbea8ca [MemorySSA] Clear VisitedBlocks per query
The problem can be shown from the newly added test case.
There are two invocations to MemorySSAUpdater::moveToPlace, and the
internal data structure VisitedBlocks is changed in the first
invocation, and reused in the second invocation. In between the two
invocations, there is a change to the CFG, and MemorySSAUpdater is
notified about the change.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D119898
2022-02-18 15:36:19 -05:00
Philip Reames 3a6be124cc [instsimplify] Simplify HaveNonOverlappingStorage per review suggestion on D120133 [NFC] 2022-02-18 11:33:15 -08:00
Philip Reames ff2e4c04c4 [instsimplify] Assume storage for byval args doesn't overlap allocas, globals, or other byval args
This allows us to discharge many pointer comparisons based on byval arguments.

Differential Revision: https://reviews.llvm.org/D120133
2022-02-18 11:08:01 -08:00
Philip Reames bf296ea6bb [instsimplify] Clarify assumptions about disjoint memory regions [NFC] 2022-02-18 08:51:18 -08:00
Philip Reames 5ecf218eca [instsimplify] Add a comment hinting how compares involving two globals are handled [NFC] 2022-02-18 08:41:30 -08:00
Philip Reames f6510e6d6f [instsimplify] Factor out a helper for alloca bounds checking [NFC]
At the moment, this just groups comments with a reasonably named predicate, but I plan to add other cases to this in the near future.
2022-02-18 07:40:22 -08:00
Serguei Katkov b45d0b3e8e [MemoryDependency] Simplfy re-ordering condition. Cleanup. NFC.
Make the reading of condition for restricting re-ordering simpler.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D120005
2022-02-18 12:40:26 +07:00
Philip Reames cf5e88864b [instsimplify] When compare allocas, consider their minimal size
The code was using exact sizing only, but since what we really need is just to make sure the offsets are in bounds, a minimum bound on the object size is sufficient.

To demonstrate the difference, support computing minimum sizes from obects of scalable vector type.
2022-02-17 09:53:24 -08:00
Philip Reames 2404313d80 [instsimplify] Fix a miscompile with zero sized allocas
Remove some code which tried to handle the case of comparing two allocas where an object size could not be precisely computed.  This code had zero coverage in tree, and at least one nasty bug.

The bug comes from the fact that the code uses the size of the result pointer as a proxy for whether the alloca can be of size zero.  Since the result of an alloca is *always* a pointer type, and a pointer type can *never* be empty, this check was a nop.  As a result, we blindly consider a zero offset from two allocas to never be equal.  They can in fact be equal when one or more of the allocas is zero sized.

This is particularly ugly because instcombine contains the exact opposite rule.  If instcombine reaches the allocas first, it combines them into one (making them equal).  If instsimplify reaches the compare first, it would consider them not equal.  This creates all kinds of fun scenarios for order of optimization reaching different and contradictory conclusions.
2022-02-17 09:27:34 -08:00
Max Kazantsev fc539b0004 [SCEV] Infer ranges for SCC consisting of cycled Phis
Our current strategy of computing ranges of SCEVUnknown Phis was to simply
compute the union of ranges of all its inputs. In order to avoid infinite recursion,
we mark Phis as pending and conservatively return full set for them. As result,
even simplest patterns of cycled phis always have a range of full set.

This patch makes this logic a bit smarter. We basically do the same, but instead
of taking inputs of single Phi we find its strongly connected component (SCC)
and compute the union of all inputs that come into this SCC from outside.

Processing entire SCC together has one more advantage: we can set range for all
of them at once, because the only thing that happens to them is the same value is
being passed between those Phis. So, despite we spend more time analyzing a
single Phi, overall we may save time by not processing other SCC members, so
amortized compile time spent should be approximately the same.

Differential Revision: https://reviews.llvm.org/D110620
Reviewed By: reames
2022-02-17 18:03:52 +07:00
Nikita Popov c3c5280b0e [InstSimplify] Delay creation of constants for offsets (NFC)
Return APInt from stripAndComputeConstantOffsets(), and only
create corresponding Constants later, if we actually need them.
2022-02-17 09:56:32 +01:00
Serguei Katkov 194899caef [MemoryDependency] Relax the re-ordering of atomic store and unordered load/store
Atomic store with Release semantic allows re-ordering of unordered load/store before the store.
Implement it.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119844
2022-02-17 10:53:25 +07:00
Roman Lebedev ae48af582b
[NFC][SCEV] Recognize umin_seq when operand is zext'ed in zero-check
zext(umin(x,y)) == umin(zext(x),zext(y))
zext(x) == 0  ->  x == 0

While it is not a very likely scenario, we probably should not expect
that instcombine already dropped such a redundant zext,
but handle directly. Moreover, perhaps there was no ZExtInst,
and SCEV somehow managed to  pull out said zext out of the SCEV expression.
2022-02-16 22:16:02 +03:00
Roman Lebedev 3c7d48ed90
[NFC][SCEV] Recognize umin_seq when operand is zext'ed in umin but not in zero-check
zext(umin(x,y)) == umin(zext(x),zext(y))
zext(x) == 0  ->  x == 0

Extra leading zeros do not affect the result of comparison with zero,
nor do they matter for the unsigned min/max,
so we should not be dissuaded when we find a zero-extensions,
but instead we should just skip it.
2022-02-16 22:16:02 +03:00
Kevin P. Neal 8290f2535b [FPEnv][FMF] Move helper function to header, move fast math flags to new include file.
In a prior review I was asked to move the helper function canIgnoreSNaN()
out to FPEnv.h. This wasn't possible at the time because that function
needs the fast math flags, and including them includes lots of other stuff
that isn't needed.

This patch moves the fast math flags out into a new FMF.h file unchanged,
and moves the helper function out to FPEnv.h also unchanged. This ticket
only moves code around.

Differential Revision: https://reviews.llvm.org/D119752
2022-02-16 12:34:53 -05:00
Kevin P. Neal c7400892ca [FPEnv][InstSimplify] Fold fsub X, -0 ==> X, when we know X is not -0
Currently the fsub optimizations in InstSimplify don't know how to fold
X - -0.0 to X when we know X is not zero and the constrained intrinsics
are used. This adds the support.

This review is split out from D107285.

Differential Revision: https://reviews.llvm.org/D119746
2022-02-16 10:10:13 -05:00
Chuanqi Xu a2609be0b2 [ValueTracking] Checking haveNoCommonBitsSet for (x & y) and ~(x | y)
This one tries to fix:
https://github.com/llvm/llvm-project/issues/53357.

Simply, this one would check (x & y) and ~(x | y) in
haveNoCommonBitsSet. Since they shouldn't have common bits (we could
traverse the case by enumerating), and we could convert this one to (x &
y) | ~(x | y) . Then the compiler could handle it in
InstCombineAndOrXor.
Further more, since ((x & y) + (~x & ~y)) would be converted to ((x & y)
+ ~(x | y)), this patch would fix it too.

https://alive2.llvm.org/ce/z/qsKzRS

Reviewed By: spatel, xbolva00, RKSimon, lebedev.ri

Differential Revision: https://reviews.llvm.org/D118094
2022-02-16 13:42:52 +08:00
Serguei Katkov 15f1cffb3a [MemoryDependency] Relax the re-ordering with volatile store.
Volatile store does not provide any special rules for reordering with
atomics. Usual must alias anaylsis is enough here.

This makes the bahavior similar to how volatile load is handled.

Reviewers: reames, nikic
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119818
2022-02-16 10:58:48 +07:00
Sanjay Patel 7cc0a29b3f [Analysis] propagate poison through add/sub saturate intrinsics
A more general enhancement needs to add tests and make sure
that intrinsics that return structs are correct. There are also
target-specific intrinsics, and I'm not sure what behavior is
expected for those.
2022-02-15 10:45:32 -05:00
Sanjay Patel 00218c188b [Analysis] propagate poison through integer min/max intrinsics
A more general enhancement needs to add tests and make sure
that intrinsics that return structs are correct. There are also
target-specific intrinsics, and I'm not sure what behavior is
expected for those.
2022-02-15 10:45:32 -05:00
Nikita Popov f35af77573 [InstSimplify] Strip offsets once in computePointerICmp()
Instead of doing an inbounds strip first and another non-inbounds
strip afterward for equality comparisons, directly do a single
inbounds or non-inbounds strip based on whether we have an equality
predicate or not.

This is NFC-ish in that the alloca equality codepath is the only
part that sees additional non-inbounds offsets now, and for that
codepath it doesn't matter whether or not the GEP is inbounds, as
it does a stronger check itself. InstCombine would infer inbounds
for such GEPs.
2022-02-15 12:04:24 +01:00
Dávid Bolvanský f0e6ec1547 [Inliner] Respect noinline call site attribute
```
always_inline foo() { }

bar () {

noinline foo();
}
```
We should prefer call site attribute over attribute on decl.

Related to https://reviews.llvm.org/D119061

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D119579
2022-02-14 18:35:52 +01:00
Kevin P. Neal 22bd65fbe7 [FPEnv][InstSimplify] Fold fsub X, +0 ==> X
Currently the fsub optimizations in InstSimplify don't know how to fold X
- +0.0 to X when using the constrained intrinsics. This adds the support.

This review is split out from D107285.

Differential Revision: https://reviews.llvm.org/D118928
2022-02-14 11:56:45 -05:00
zhongyunde b2f5164deb [IVDescriptors] Support FOR where we have multiple sink pointed
Handles the case where Previous doesn't come before LastPrev incorrectly.
Fix https://github.com/llvm/llvm-project/issues/53483

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D118558
2022-02-14 09:30:35 +08:00
Arthur Eubanks a9029a33ff [OpaquePtr][ValueTracking] Check GEP source element type in isPointerOffset()
Fixes a MemCpyOpt miscompile with opaque pointers.
This function can be further cleaned up, but let's just fix the miscompile first.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D119652
2022-02-13 10:35:38 -08:00
Philip Reames c02deae18c [SCEVPredicate] Remove getExpr mechanism [NFC]
This mechanism was used for a couple of purposes, but the primary one was keeping track of which predicates in a union might apply to an expression.  As these sets are small and agressively deduped, this has little value.
2022-02-11 11:35:58 -08:00
Roman Lebedev 97484f46eb
[NFCI][SCEV] `SCEVTraversal`: if search terminated, don't push further ops of nary
Even if the search is marked as terminated after only looking at
the first operand, we'd still look at the remaining operands
before actually ending the search.

This seems pointless and wasteful, let's not do that.
2022-02-11 21:58:19 +03:00
Roman Lebedev 65715ac72a
[SCEV] Generalize umin_seq matching
Since we don't greedily flatten `umin_seq(a, umin(b, c))` into `umin_seq(a, b, c)`,
just looking at the operands of the outer-level `umin` is not sufficient,
and we need to recurse into all same-typed `umin`'s.
2022-02-11 21:58:19 +03:00
Roman Lebedev c234809ff8
[SCEV] Recognize `x == 0 ? 0 : umin_seq(..., x, ...) -> umin_seq(x, umin_seq(...))` 2022-02-11 21:58:19 +03:00
Roman Lebedev 281421693b
[SCEV] Recognize `x == 0 ? 0 : umin(..., x, ...) -> umin_seq(x, umin(...))`
That is the canonical expansion for umin_seq,
so we really should roundtrip it.
2022-02-11 21:58:19 +03:00
Roman Lebedev 4d0c0e6cc2
[SCEV] `createNodeForSelectOrPHIInstWithICmpInstCond()`: generalize eq handling
The current logic was: https://alive2.llvm.org/ce/z/j8muXk
but in reality the offset to the Y in the 'true' hand
does not need to exist: https://alive2.llvm.org/ce/z/MNQ7DZ
https://alive2.llvm.org/ce/z/S2pMQD

To catch that, instead of computing the Y's in both
hands and checking their equality, compute Y and C,
and check that C is 0 or 1.
2022-02-11 21:58:19 +03:00
Roman Lebedev a473c457f6
[NFC][SCEV] `createNodeForSelectOrPHIInstWithICmpInstCond()`: dedup eq/ne pred handling 2022-02-11 21:58:19 +03:00
Philip Reames 3e27fb8590 [PSE] Allow duplicate predicates in debug output
This lets us avoid redundant implication work in the constructor of SCEVUnionPredicate which simplifies an upcoming change.  If we're actually building a predicate via PSE, that goes through addPredicate which does include the implication check.
2022-02-11 10:39:01 -08:00
Nikita Popov e9c0720010 [PHITransAddr] Check GEP source element type
It's not the same GEP if the source element type is different.
2022-02-11 16:22:48 +01:00
Nikita Popov 5d63903465 [SCCP] Check that load/store and global type match
SCCP requires that the load/store type and global type are the
same (it does not support bitcasts of tracked globals). With
typed pointers this was implicitly enforced.
2022-02-11 11:01:18 +01:00
Philip Reames 5ba115031d [PSE] Remove assumption that top level predicate is union from public interface [NFC*]
Note that this doesn't actually cause the top level predicate to become a non-union just yet.

The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.
2022-02-10 16:14:52 -08:00
Philip Reames 01b56b8bdd [SCEVPredicateRewriter] Remove assumption top level predicate is a union [NFC] 2022-02-10 15:51:15 -08:00
Roman Lebedev c94ec7997a
[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: use sub instead of add
For booleans, xor/add/sub are interchangeable:
https://alive2.llvm.org/ce/z/ziav3d

But for larger bitwidths, we'll need sub, so change it now.
2022-02-11 01:21:45 +03:00
Philip Reames e43b1ce4d5 [SCEV] Constify some uses of SCEVUnionPredicate* [NFC]
This exploits the immutability introduced in d334fec.
2022-02-10 12:42:19 -08:00
Nikita Popov 87a0b1bd23 [InstSimplify] Remove zero-index opaque pointer GEP
With opaque pointers, a zero-index GEP is a no-op. It does not
need to be retained for the pointer element type change it may
perform.
2022-02-10 16:01:56 +01:00
Roman Lebedev 580d3a14b2
[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: refactor `i1 cond ? i1 x : i1 y` handling
While that effectively concludes i1 select handling,
that boolean restriction can be lifted later.
2022-02-10 17:42:56 +03:00
Roman Lebedev 9a322e430f
[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: refactor `i1 cond ? i1 C : i1 y` pattern
https://alive2.llvm.org/ce/z/uRvVtN
2022-02-10 17:42:56 +03:00
Roman Lebedev 576a45f20d
[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: refactor `i1 cond ? i1 x : i1 C` pattern
https://alive2.llvm.org/ce/z/2Q7Du_
2022-02-10 17:42:55 +03:00
Roman Lebedev 9766a0cca0
[SCEV] Recognize `cond ? i1 0 : i1 y` as `umin_seq ~cond, x`
By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/N6XwV-
2022-02-10 17:42:55 +03:00
Roman Lebedev 418604fd90
[SCEV] Recognize `cond ? i1 x : i1 1` as `~umin_seq cond, ~x`
By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/aqe9GK
2022-02-10 17:42:55 +03:00
Roman Lebedev 49d9acc242
[SCEV] Recognize logical `or` as `not umin_seq (not, not)`
By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/MUfbTL
2022-02-10 17:42:55 +03:00
Roman Lebedev 16bc24e7be
[SCEV] Recognize logical `and` as `umin_seq`
By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/59KuZZ
2022-02-10 17:42:55 +03:00
Roman Lebedev 1c69444863
[SCEV] `createNodeForSelectOrPHI()`: try constant-folding even if not an Instruction
We'd catch the tautological select pattern later anyways
due to constant folding, so that leaves PHI-like select,
but it does not appear to fire there.
2022-02-10 17:42:55 +03:00
Roman Lebedev 97930f85af
[NFC][SCEV] Prepare `createNodeForSelectOrPHI()` for gaining additional strategy
Currently `createNodeForSelectOrPHI()` takes an Instruction,
and only works on the Cond that is an ICmpInst,
but that can be relaxed somewhat.

For now, simply rename the existing function,
and add a thin wrapper ontop that still does
the same thing as it used to.
2022-02-10 17:42:55 +03:00
Roman Lebedev 73990ff8a7
[SCEV] Recognize binary `xor` as bit-wise `add`
https://alive2.llvm.org/ce/z/ULuZxB

We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `add`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
2022-02-10 17:42:55 +03:00
Roman Lebedev 503541fa93
[SCEV] Recognize binary `and` as bit-wise `umin`
https://alive2.llvm.org/ce/z/aKAr94

We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umin`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
2022-02-10 17:42:54 +03:00
Roman Lebedev e7e0834f07
[SCEV] Recognize binary `or` as bit-wise `umax`
https://alive2.llvm.org/ce/z/SMEaoc

We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umax`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
2022-02-10 17:42:54 +03:00
David Sherwood 1badfbb4fc Fix incorrect TypeSize->uint64_t cast in InductionDescriptor::isInductionPHI
The code was relying upon the implicit conversion of TypeSize to
uint64_t and assuming the type in question was always fixed. However,
I discovered an issue when running the canon-freeze pass with some
IR loops that contains scalable vector types. I've changed the code
to bail out if the size is unknown at compile time, since we cannot
compute whether the step is a multiple of the type size or not.

I added a test here:

  Transforms/CanonicalizeFreezeInLoops/phis.ll

Differential Revision: https://reviews.llvm.org/D118696
2022-02-10 09:39:12 +00:00
Philip Reames d334fec140 [SCEV] Make SCEVUnionPredicate externally immutable [NFC]
This is the last major stepping stone before being able to allocate the node via the folding set allocator.  That will in turn allow more general SCEV predicate expression trees.
2022-02-09 13:47:28 -08:00
Philip Reames e6d9bab558 [SCEV] Remove a direct call to SCEVUnionPredicate::add [NFC] 2022-02-09 13:04:12 -08:00
Philip Reames d39f4ac494 [SCEV] Unwind SCEVUnionPredicate from getPredicatedBackedgeTakenCount [NFC]
For those curious, the whole reason for tracking the predicate set seperately as opposed to just immediately registering the dependencies appears to be allowing the printing code to print a result without changing the PSE state.  It's slightly questionable if this justifies the complexity, but since we can preserve it with local ugliness, I did so.
2022-02-09 12:55:40 -08:00
Philip Reames aa845d7a24 [SCEV] Remove conversion to SCEVUnionPredicate in ExitNotTakenInfo [NFC]
This removes one of the places where we mutate an existing union predicate.
2022-02-09 12:10:23 -08:00
Philip Reames 83f895d952 [SCEV] Add interface for constructing generic SCEVComparePredicate [NFC} 2022-02-09 10:29:04 -08:00
Arthur Eubanks ff31020ee6 [OpaquePtr][LoopAccessAnalysis] Support opaque pointers
Previously we relied on the pointee type to determine what type we need
to do runtime pointer access checks.

With opaque pointers, we can access a pointer with more than one type,
so now we keep track of all the types we're accessing a pointer's
memory with.

Also some other minor getPointerElementType() removals.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D119047
2022-02-09 09:11:27 -08:00
Philip Reames c302f1e677 [SCEV] Generalize SCEVEqualsPredicate to any compare [NFC]
PredicatedScalarEvolution has a predicate type for representing A == B.  This change generalizes it into something which can represent a A <pred> B.

This generality is currently unused, but is motivated by a couple of recent cases which have come up.  In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.
2022-02-08 08:18:09 -08:00
Roman Lebedev ae9414d562
[ValueTracking] Only check for non-undef/poison if already known to be a self-multiply
https://godbolt.org/z/js9fTTG9h
^ we don't care what `isGuaranteedNotToBeUndefOrPoison()` says
unless we already knew that the operands were equal.
2022-02-08 18:35:29 +03:00
Simon Pilgrim 1468202748 [ValueTracking] Add support for X*X self-multiplication
D108992 added KnownBits handling for 'Quadratic Reciprocity' self-multiplication patterns (bit[1] == 0), which can be used for non-undef values (poison is OK).

This patch adds noundef selfmultiply handling to value tracking so demanded bits patterns can make use of it.

Differential Revision: https://reviews.llvm.org/D117995
2022-02-08 13:33:27 +00:00
Simon Pilgrim e2537f6b19 [ValueTracking] Replace dyn_cast with dyn_cast_or_null to account for getTerminator returning null
Noticed while running checks on D117995 - a hexagon regression test was managing to return a block without a terminator
2022-02-08 13:33:26 +00:00
Johannes Doerfert 29c8ebad10 [MemoryBuiltins][FIX] Adjust index type size properly wrt. AS casts
Use existing functionality to strip constant offsets that works well
with AS casts and avoids the code duplication.

Since we strip AS casts during the computation of the offset we also
need to adjust the APInt properly to avoid mismatches in the bit width.
This code ensures the caller of `compute` sees APInts that match the
index type size of the value passed to `compute`, not the value result
of the strip pointer cast.

Fixes #53559.

Differential Revision: https://reviews.llvm.org/D118727
2022-02-07 20:19:19 -06:00
Kazu Hirata 3a3cb929ab [llvm] Use = default (NFC) 2022-02-06 22:18:35 -08:00
Bill Wendling c6f0940d99 [NFC] Remove unnecessary #includes
An attempt to reduce the number of files that are recompiled due to a change.

Differential Revision: https://reviews.llvm.org/D119055
2022-02-04 21:22:41 -08:00
serge-sans-paille ffe8720aa0 Reduce dependencies on llvm/BinaryFormat/Dwarf.h
This header is very large (3M Lines once expended) and was included in location
where dwarf-specific information were not needed.

More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.

This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].

As a consequence of that patch, downstream user may need to manually some extra
files:

llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h

In some situations, codes maybe relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden
dependency now needs to be explicit.

$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after:   10978519
before:  11245451

Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions

Differential Revision: https://reviews.llvm.org/D118781
2022-02-04 11:44:03 +01:00
Augie Fackler b2d091aa5d [NFC] MemoryBuiltins: tease out a getFreeFunctionDataForFunction helper 2022-02-03 08:36:36 -08:00
Augie Fackler bad0301cc5 MemoryBuiltins: simplify isLibFreeFunction [NFC]
This is in anticipation of my next patch, where I need to store more information about free functions than just their argument count. It felt invasive enough on this function that it seemed worthwhile to just extract this as its own commit that makes no functional changes.

Differential Revision: https://reviews.llvm.org/D117350
2022-02-03 08:30:02 -08:00
Serge Pavlov d2f132f0b7 [ConstantFolding] Fold constrained compare intrinsics
The change implements constant folding of ‘llvm.experimental.constrained.fcmp’
and ‘llvm.experimental.constrained.fcmps’ intrinsics.

Differential Revision: https://reviews.llvm.org/D110322
2022-02-03 16:45:56 +07:00
Andrew Litteken 30420bc344 [IRSim] Make sure that commutative intrinsics are treated as function calls without commutativity
Created to fix: https://github.com/llvm/llvm-project/issues/53537

Some intrinsics functions are considered commutative since they are performing operations like addition or multiplication. Some of these have extra parameters to provide extra information that are not part of the operation itself and are not commutative. This makes sure that if an instruction that is an intrinsic takes the non commutative path to handle this case.

Reviewer: paquette

Closes Issue #53537

Differential Revision: https://reviews.llvm.org/D118807
2022-02-02 13:24:56 -06:00
Malhar Jajoo 778b455dd6 [LAA] Add Memory dependence remarks.
Adds new optimization remarks when vectorization fails.

More specifically, new remarks are added for following 4 cases:

- Backward dependency
- Backward dependency that prevents Store-to-load forwarding
- Forward dependency that prevents Store-to-load forwarding
- Unknown dependency

It is important to note that only one of the sources
of failures (to vectorize) is reported by the remarks.
This source of failure may not be first in program order.

A regression test has been added to test the following cases:

a) Loop can be vectorized: No optimization remark is emitted
b) Loop can not be vectorized: In this case an optimization
remark will be emitted for one source of failure.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D108371
2022-02-02 12:07:51 +00:00
William S. Moses 8cb9c73609 [LoopIdiom] Keep TBAA when creating memcpy/memmove
When upgrading a loop of load/store to a memcpy, the existing pass does not keep existing aliasing information. This patch allows existing aliasing information to be kept.

Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D108221
2022-01-31 16:28:13 -05:00
Eli Friedman b2837bf2f2 [ScalarEvolution] Add bailout to avoid zext of pointer.
The RHS of an isImpliedCond call can be a pointer even if the LHS is
not. This is similar to bfa2a81e.

Not going to include a testcase; an IR testcase would be extremely
complicated and fragile.

Fixes https://github.com/llvm/llvm-project/issues/51936 .

Differential Revision: https://reviews.llvm.org/D114555
2022-01-31 11:41:39 -08:00
Kazu Hirata cda7b6aaf3 [Analysis] Drop an unnecessary const from a return type (NFC)
Identified with readability-const-return-type.
2022-01-30 16:04:58 -08:00
Kazu Hirata 49fdee13c1 [Analysis] Use != to compare strings (NFC)
Identified with readability-string-compare.
2022-01-30 12:32:57 -08:00
Nuno Lopes 0dc20e321c [InstSimplify] fold 'xor X, poison' and 'div/rem X, poison' to poison 2022-01-30 10:46:54 +00:00
William S. Moses 99d2582164 [ScalarEvolution] Handle <= and >= in non infinite loops
Extend scalar evolution to handle >= and <= if a loop is known to be finite and the induction variable guards the condition. Specifically, with these assumptions lhs <= rhs is equivalent to lhs < rhs + 1 and lhs >= rhs to lhs > rhs -1.

In the case of lhs <= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmax, which would have resulted in an infinite loop.

In the case of lhs >= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmin, which would again have resulted in an infinite loop.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D118090
2022-01-28 17:41:08 -05:00
Andrew Litteken 3785c1d055 [IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.

This also adds an additional command line flag debug option to disable outlining intrinsics.

Recommit of: 8de76bd569
Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls.

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D109450
2022-01-28 13:52:21 -06:00
William S. Moses 0d04c77856 [ScalarEvolution] Mark a loop as finite if in a willreturn function
A limited version of (https://reviews.llvm.org/D118090) that only marks a loop as finite if in a willreturn function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118429
2022-01-28 14:17:05 -05:00
Nikita Popov 8a4293f3ef [Loads] Require Align in isDereferenceableAndAlignedPointer() (NFC)
Now that loads always have an alignment, we should not perform an
ABI alignment fallback here.
2022-01-28 16:23:32 +01:00
Evgeniy Brevnov d7424939a6 [BasicAA] Add support for memmove intrinsic
Currently, basic AA has special support for llvm.memcpy.* intrinsics. This change extends this support for any memory trancsfer opration and in particular llvm.memmove.* intrinsic.

Reviewed By: reames, nikic

Differential Revision: https://reviews.llvm.org/D117095
2022-01-28 18:19:36 +07:00
Florian Hahn 1ca02bddb4
[ConstraintSystem] Mark function as const (NFC). 2022-01-27 13:44:47 +00:00
Congzhe Cao f3e1f44340 [IVDescriptor] Get the exact FP instruction that does not allow reordering
This is a bugfix in IVDescriptor.cpp.

The helper function `RecurrenceDescriptor::getExactFPMathInst()`
is supposed to return the 1st FP instruction that does not allow
reordering. However, when constructing the RecurrenceDescriptor,
we trace the use-def chain staring from a PHI node and for each
instruction in the use-def chain, its descriptor overrides the
previous one. Therefore in the final RecurrenceDescriptor we
constructed, we lose previous FP instructions that does not allow
reordering.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D118073
2022-01-27 00:33:46 -05:00
Nikita Popov 44cfc3a816 [LICM] Generalize unwinding check during scalar promotion
This extract a common isNotVisibleOnUnwind() helper into
AliasAnalysis, which handles allocas, byval arguments and noalias
calls. After D116998 this could also handle sret arguments. We
have similar logic in DSE and MemCpyOpt, which will be switched
to use this helper as well.

The noalias call case is a bit different from the others, because
it also requires that the object is not captured. The caller is
responsible for doing the appropriate check.

Differential Revision: https://reviews.llvm.org/D117000
2022-01-26 11:15:03 +01:00
Andrew Litteken ba79295c48 [NFC][IROutliner] fix namespace and unused variable 2022-01-25 18:41:30 -06:00
Andrew Litteken e8f4e41b6b [IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region.
We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow for non entry blocks within the extracted region to have predecessors, so there are not conflicts to handle with respect to predecessors no longer contained in the function.

Recommit of 515eec3553

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106997
2022-01-25 18:25:50 -06:00
Andrew Litteken e50b217b4e Revert "[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region."
This reverts commit 515eec3553.

By mistake, commit message was not complete.
2022-01-25 18:24:19 -06:00
Andrew Litteken 515eec3553 [IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region. 2022-01-25 18:20:10 -06:00
Andrew Litteken 9c2daf648c Revert "[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions"
This reverts commit 8de76bd569.

Reverting due to failure of different-intrinsics.ll on lld-x86_64-win buildbot.
2022-01-25 18:19:33 -06:00
Andrew Litteken 8de76bd569 [IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.

This also adds an additional command line flag debug option to disable outlining intrinsics.

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D109450
2022-01-25 17:06:09 -06:00
Andrew Litteken f5f377d1fc [IRSim][IROutliner] Adding support for recognizing and outlining indirect function calls, and function calls with different names, but the same type
The outliner currently requires that function calls not be indirect calls, and have that the function name, and function type must match, as well as other attributes such as calling conventions. This patch treats called functions as values, and just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or input and output into the outlined functions.

There are also debugging flags added to enforce the old behaviors where indirect calls not be allowed, and to enforce the old rule that function calls names must also match.

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D109448
2022-01-25 15:19:28 -06:00
Nikita Popov 3e2ae92d3f [SCEV] Remove an unnecessary GEP type check
The code already checked that the addrec step size and type alloc
size are the same. The actual pointer element type is irrelevant
here.
2022-01-25 12:56:46 +01:00
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Max Kazantsev c913dccfde [SCEV] Use lshr in implications
This patch adds support for implication inference logic for the
following pattern:
```
  lhs < (y >> z) <= y, y <= rhs --> lhs < rhs
```
We should be able to use the fact that value shifted to right is
not greater than the original value (provided it is non-negative).

Differential Revision: https://reviews.llvm.org/D116150
Reviewed-By: apilipenko
2022-01-25 13:25:19 +07:00
Ahmed Bougacha e7298464c5 [ObjCARC] Use "UnsafeClaimRV" to refer to unsafeClaim in enums. NFC.
This matches the actual runtime function more closely.
I considered also renaming both RetainRV/UnsafeClaimRV to end with
"ARV", for AutoreleasedReturnValue, but there's less potential
for confusion there.
2022-01-24 19:37:01 -08:00
Evgeniy Brevnov 0e55d4fab0 [AA] Refine ModRefInfo for llvm.memcpy.* in presence of operand bundles
Presence of operand bundles changes semantics in respect to ModRef. In particular, spec says: "From the compilers perspective, deoptimization operand bundles make the call sites theyre attached to at least readonly. They read through all of their pointer typed operands (even if theyre not otherwise escaped) and the entire visible heap. Deoptimization operand bundles do not capture their operands except during deoptimization, in which case control will not be returned to the compiled frame". Fix handling of llvm.memcpy.* according to the spec.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D118033
2022-01-25 10:15:23 +07:00
Mircea Trofin b1af01fe6a [NFC][MLGO] Simplify conditional compilation
Most of the code that's shared between 'release' and 'development'
modes doesn't depend on anything special.
2022-01-24 11:19:04 -08:00
Kazu Hirata b752eb887f [Analysis] Use default member initialization (NFC)
Identified with modernize-use-default-member-init.
2022-01-23 20:32:56 -08:00
Kazu Hirata 448d0dfab7 [Analysis] Remove a redundant const from a return type (NFC)
Identified with readability-const-return-type.
2022-01-23 14:00:03 -08:00
Sanjay Patel 2e26633af0 [IR] document and update ctlz/cttz intrinsics to optionally return poison rather than undef
The behavior in Analysis (knownbits) implements poison semantics already,
and we expect the transforms (for example, in instcombine) derived from
those semantics, so this patch changes the LangRef and remaining code to
be consistent. This is one more step in removing "undef" from LLVM.

Without this, I think https://github.com/llvm/llvm-project/issues/53330
has a legitimate complaint because that report wants to allow subsequent
code to mask off bits, and that is allowed with undef values. The clang
builtins are not actually documented anywhere AFAICT, but we might want
to add that to remove more uncertainty.

Differential Revision: https://reviews.llvm.org/D117912
2022-01-23 11:22:48 -05:00
Nikita Popov b4900296e4 [ConstantFold] Allow all float types in reinterpret load folding
Rather than hardcoding just half, float and double, allow all
floating point types.
2022-01-21 09:26:51 +01:00
Nikita Popov 6a19cb837c [ConstantFold] Support pointers in reinterpret load folding
Peculiarly, the necessary code to handle pointers (including the
check for non-integral address spaces) is already in place,
because we were already allowing vectors of pointers here, just
not plain pointers.
2022-01-21 09:13:37 +01:00
Nikita Popov 05cd9a0596 [ConstantFold] Simplify type check in reinterpret load folding (NFC)
Keep a list of allowed types, but then always construct the map
type the same way. We need an integer with the same width as the
original type.
2022-01-21 09:06:35 +01:00
Mircea Trofin f29256a64a [MLGO] Improved support for AOT cross-targeting scenarios
The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run, but clang can be otherwise
built.

To simplify such scenarios given we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case by case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.

This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.

The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.

Differential Revision: https://reviews.llvm.org/D117752
2022-01-20 07:05:39 -08:00
Mircea Trofin e67430cca4 [MLGO] ML Regalloc Eviction Advisor
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information (currently after the Virtual Register
Rewriter) and then produce the log file.

This patch also introduces the score injection pass, 'Register
Allocation Pass Scoring', which is trivially just logging the score in
development mode.

Differential Revision: https://reviews.llvm.org/D117147
2022-01-19 11:00:32 -08:00
Nikita Popov d8bff13a8a [NFC] Add missing <map> includes
These were relying on a transitive include.
2022-01-19 12:29:03 +01:00
Philip Reames 215bd46905 [MemoryBuiltins] Demote isMallocLikeFn to implementation routine since last use has been removed
Try 2, this time including the test.
2022-01-18 15:24:52 -08:00
Philip Reames fcab2d1309 Revert "[MemoryBuiltins] Demote isMallocLikeFn to implementation routine since last use has been removed"
This reverts commit 167af7bbfe. Buildbot breaks since I forgot to remove a unit test.
2022-01-18 15:16:12 -08:00
Philip Reames 167af7bbfe [MemoryBuiltins] Demote isMallocLikeFn to implementation routine since last use has been removed 2022-01-18 15:12:07 -08:00
Mircea Trofin 3e8553aab4 [mlgo][inline] Improve global state tracking
The global state refers to the number of the nodes currently in the
module, and the number of direct calls between nodes, across the
module.

Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.

This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable - they do not get deleted while cgscc passes are
run over the module; and cgscc pass manager invariants.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D115847
2022-01-18 17:45:34 +00:00
Jan Svoboda 5f4ae56457 [llvm] Remove uses of `std::vector<bool>`
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.

This patch does just that for llvm.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D117121
2022-01-18 18:20:45 +01:00
Nikita Popov 3ec7f46e99 [LVI] Handle implication from icmp of trunc (PR51867)
Similar to the existing urem code, if we have (trunc X) >= C,
then also X >= C.

Proof: https://alive2.llvm.org/ce/z/RF4YR2

Fixes https://github.com/llvm/llvm-project/issues/51867.
2022-01-18 11:24:11 +01:00
Nikita Popov 9e68557e64 [LVI] Handle commuted SPF min/max operands
We need to check that the operands of the min/max are the operands
of the select, but we don't care which order they are in.
2022-01-18 10:43:00 +01:00
Nikita Popov d15823e300 [LVI] Compute SPF range even if one operands is overdefined
If we have a constant range for one operand but not the other,
we can generally still compute a useful results for SPF min/max.
2022-01-18 10:40:49 +01:00
Nikita Popov 202d590a01 [LVI] Consistently intersect assumes
Integrate intersection with assumes into getBlockValue(), to ensure
that it is consistently performed.

We were doing it in nearly all places, but for example missed it
for select inputs.
2022-01-18 10:15:31 +01:00
Nikita Popov f104cc38f4 [ConstantFold] Don't fold load from non-byte-sized vector
Following up on 1470f94d71 (r63981173):

The result here (probably) depends on endianness. Don't bother
trying to handle this exotic case, just bail out.
2022-01-17 17:01:47 +01:00
Nikita Popov af12a3f4a9 [ValueTracking] Remove ComputeMultiple() function
This function is no longer used since
499f1ca79f.
2022-01-17 10:28:31 +01:00
Bryce Wilson dd13744bfb
Revert "[BasicAliasAnalysis] Remove isMallocOrCallocLikeFn"
This reverts commit 1f2cfc4fdc.
2022-01-14 14:42:53 -08:00
Roman Lebedev 650fc40b6d
[NFC][SCEV] Introduce `getCastExpr()` QoL helper 2022-01-15 00:52:22 +03:00
Bryce Wilson 1f2cfc4fdc
[BasicAliasAnalysis] Remove isMallocOrCallocLikeFn
Allocation functions should be marked with onlyAccessesInaccessibleMemory (when that is correct for the given function) which is checked elsewhere so this check is no longer needed.

Differential Revision: https://reviews.llvm.org/D117180
2022-01-14 12:22:01 -08:00
Philip Reames dac82b53e2 Revert "[MemoryBuiltins] [NFC] Add missing section comments"
This reverts commit 83338d5032.  Comments in source are non-idiomatic and naming choice in head is unclear.
2022-01-14 08:34:21 -08:00
Roman Lebedev b32077234b
[NFCI][SCEV] `computeExitLimitFromCondFromBinOp()`: rely on `getSequentialMinMaxExpr()` constant relaxation
`getSequentialMinMaxExpr()` has been taught to perform this relaxation,
so rely on that now. Not sure this can be tested.
2022-01-14 17:07:48 +03:00
Roman Lebedev 8dcba20674
[SCEV] `getSequentialMinMaxExpr()`: relax 2-op umin_seq w/ constant to umin
Currently, `computeExitLimitFromCondFromBinOp()` does that directly.
2022-01-14 17:07:48 +03:00
Roman Lebedev c86a982d7d
[SCEV] `getSequentialMinMaxExpr()`: rewrite deduplication to be fully recursive
Since we don't merge/expand non-sequential umin exprs into umin_seq exprs,
we may have umin_seq(umin(umin_seq())) chain, and the innermost umin_seq
can have duplicate operands still.
2022-01-14 15:42:26 +03:00
Florian Hahn 1ef9bfa013
[InstSimplify] Pass pointer and indices separately to SimplifyGEPInst.
This doesn't require callers to put the pointer operand and the indices
in a container like a vector when calling the function. This is not
really an issue with the existing callers. But when using it from
IRBuilder the inputs are available as separate pointer value and indices
ArrayRef.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D117038
2022-01-14 09:59:52 +00:00
Nikita Popov 20d9c51dc0 [ConstantFold] Check for uniform value before reinterpret load
The reinterpret load code will convert undef values into zero.
Check the uniform value case before it to produce a better result
for all-undef initializers.

However, the uniform value handling will return the uniform value
even if the access is out of bounds, while the reinterpret load
code will return undef. Add an explicit check to retain the
previous result in this case.
2022-01-14 10:18:02 +01:00
Bryce Wilson 83338d5032
[MemoryBuiltins] [NFC] Add missing section comments 2022-01-13 17:43:43 -08:00
Philip Reames ee02cf0797 [MemoryBuiltins] Demote isCallocLikeFn and isAlignedAllocLikeFn to local helpers after removal of last external use [NFC] 2022-01-13 15:51:17 -08:00
Philip Reames cf66f01ec1 [Attributor] Share code for abstract interpretation of allocation sizes with getObjectSize [NFC-ish]
The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during it's abstract interpretation, and allows us to have one copy of the code.

Note this is not NFC for two reasons:
* The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space.
* I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test wise contradicts that.

Differential Revision: https://reviews.llvm.org/D117241
2022-01-13 15:33:24 -08:00
Bryce Wilson 68874d8b5f
[MemoryBuiltins] [NFC] Remove unused overload of isAlignedAllocLikeFn
Differential Revision: https://reviews.llvm.org/D117245
2022-01-13 15:19:04 -08:00
Arthur Eubanks 757e044dce [Inliner] Don't removeDeadConstantUsers() when checking if a function is dead
If a function has many uses, this can take a good chunk of compile times.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D117236
2022-01-13 14:29:45 -08:00
Philip Reames cd36b29ec7 [MemoryBuiltins] (Slightly) clean up abuse of MallocLike bitmask [NFC] 2022-01-13 12:39:22 -08:00
Nikita Popov aba7c3c033 [ConstantFold] Check uniform value in ConstantFoldLoadFromConst()
This case is automatically handled if ConstantFoldLoadFromConstPtr()
is used. Make sure that ConstantFoldLoadFromConst() also handles it.
2022-01-13 14:40:19 +01:00
Hans Wennborg 2bc57d85eb Don't override __attribute__((no_stack_protector)) by inlining (PR52886)
Since 26c6a3e736, LLVM's inliner will "upgrade" the caller's stack protector
attribute based on the callee. This lead to surprising results with Clang's
no_stack_protector attribute added in 4fbf84c173 (D46300). Consider the
following code compiled with clang -fstack-protector-strong -Os
(https://godbolt.org/z/7s3rW7a1q).

  extern void h(int* p);

  inline __attribute__((always_inline)) int g() {
    return 0;
  }

  int __attribute__((__no_stack_protector__)) f() {
    int a[1];
    h(a);
    return g();
  }

LLVM will inline g() into f(), and f() would get a stack protector, against the
users explicit wishes, potentially breaking the program e.g. if h() changes the
value of the stack cookie. That's a miscompile.

More recently, bc044a88ee (D91816) addressed this problem by preventing
inlining when the stack protector is disabled in the caller and enabled in the
callee or vice versa. However, the problem remained if the callee is marked
always_inline as in the example above. This affected users, see e.g.
http://crbug.com/1274129 and http://llvm.org/pr52886.

One way to fix this would be to prevent inlining also in the always_inline
case. Despite the name, always_inline does not guarantee inlining, so this
would be legal but potentially surprising to users.

However, I think the better fix is to not enable the stack protector in a
caller based on the callee. The motivation for the old behaviour is unclear, it
seems counter-intuitive, and causes real problems as we've seen.

This commit implements that fix, which means in the example above, g() gets
inlined into f() (also without always_inline), and f() is emitted without stack
protector. I think that matches most developers' expectations, and that's also
what GCC does.

Another effect of this change is that a no_stack_protector function can now be
inlined into a stack protected function, e.g. (https://godbolt.org/z/hafP6W856):

  extern void h(int* p);

  inline int __attribute__((__no_stack_protector__)) __attribute__((always_inline)) g() {
    return 0;
  }

  int f() {
    int a[1];
    h(a);
    return g();
  }

I think that's fine. Such code would be unusual since no_stack_protector is
normally applied to a program entry point which sets up the stack canary. And
even if such code exists, inlining doesn't change the semantics: there is still
no stack cookie setup/check around entry/exit of the g() code region, but there
may be in the surrounding context, as there was before inlining. This also
matches GCC.

See also the discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722

Differential revision: https://reviews.llvm.org/D116589
2022-01-13 12:04:49 +01:00
Sanjay Patel 6bd127b079 [InstSimplify] use knownbits to fold more udiv/urem
We could use knownbits on both operands for even more folds (and there are
already tests in place for that), but this is enough to recover the example
from:
https://github.com/llvm/llvm-project/issues/51934
(the tests are derived from the code in that example)

I am assuming no noticeable compile-time impact from this because udiv/urem
are rare opcodes.

Differential Revision: https://reviews.llvm.org/D116616
2022-01-12 14:59:43 -05:00
Rosie Sumpter 552eb372cb [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter
This is required to query the legality more precisely in the LoopVectorizer.

This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.

Differential Revision: https://reviews.llvm.org/D115329
2022-01-12 13:34:12 +00:00
Mircea Trofin 248d55af3e [NFC][MLGO] Use LazyCallGraph::Node to track functions.
This avoids the InlineAdvisor carrying the responsibility of deleting
Function objects. We use LazyCallGraph::Node objects instead, which are
stable in memory for the duration of the Module-wide performance of CGSCC
passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which
is the case here)

Differential Revision: https://reviews.llvm.org/D116964
2022-01-11 19:23:47 -08:00
Mircea Trofin 1f5dceb1d0 [MLGO] Add support for multiple training traces per module
This happens in e.g. regalloc, where we trace decisions per function,
but wouldn't want to spew N log files (i.e. one per function). So we
output a key-value association, where the key is an ID for the
sub-module object, and the value is the tensorflow::SequenceExample.

The current relation with protobuf is tenuous, so we're avoiding a
custom message type in favor of using the `Struct` message, but that
requires the values be wire-able strings, hence base64 encoding.

We plan on resolving the protobuf situation shortly, and improve the
encoding of such logs, but this is sufficient for now for setting up
regalloc training.

Differential Revision: https://reviews.llvm.org/D116985
2022-01-11 16:13:31 -08:00
Mircea Trofin a81b0c978f [NFC][MLGO] Remove the word "inliner" in a generic error message. 2022-01-11 12:39:16 -08:00
Arthur Eubanks bf52210e25 [NFC][LazyCallGraph] Remove check in removeDeadFunction() if graph is empty
If we're in removeDeadFunction(), we should have already constructed the call graph.

Differential Revision: https://reviews.llvm.org/D115676
2022-01-11 10:17:13 -08:00
Florian Hahn f0ef1ea6dd
[IRBuilder] Introduce folder using inst-simplify, use for Or fold.
Alternative to D116817.

This introduces a new value-based folding interface for Or (FoldOr),
which takes 2 values and returns an existing Value or a constant if the
Or can be simplified. Otherwise nullptr is returned. This replaces the
more restrictive CreateOr which takes 2 constants.

This is the used to implement a folder that uses InstructionSimplify.
The logic to simplify `Or` instructions is moved there. Subsequent
patches are going to transition other CreateXXX to the more general
FoldXXX interface.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D116935
2022-01-11 17:30:48 +00:00
Philip Reames 8f553da492 [instsimplify] Add a comment and test for a highly confusing case 2022-01-11 09:24:10 -08:00
Philip Reames e838949bee [GlobalsModRef] Apply indirect-global rule to all globals initialized from noalias calls
Extend the existing malloc-family specific optimization to all noalias calls.  This allows us to handle allocation wrappers, and removes a dependency on a lib-func check in favor of generic attribute usage.

Differential Revision: https://reviews.llvm.org/D116980
2022-01-11 08:44:31 -08:00
Florian Hahn 8a469e2050
[InstSimplify] Fold inbounds GEP to poison if base is undef.
D92270 updated constant expression folding to fold inbounds GEP to
poison if the base is undef. Apply the same logic to SimplifyGEPInst.

The justification is that we can choose an out-of-bounds pointer as base
pointer.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D117015
2022-01-11 16:11:22 +00:00
Roman Lebedev 5ceb070bbb
[SCEV] `getSequentialMinMaxExpr()`: look into `umin` when deduplicating operands
We could just merge all umin into umin_seq, but that is likely
a pessimization, so don't do that, but pretend that we did
for the purpose of deduplication.
2022-01-11 18:51:57 +03:00
Roman Lebedev 5e16650792
[SCEV] `getSequentialMinMaxExpr()`: keep only the first instance of an operand
Having the same operand more than once doesn't change the outcome here,
neither reduction-wise nor poison-wise.
We must keep the first instance specifically though.
2022-01-11 16:51:53 +03:00
Roman Lebedev 76a0abbc13
[SCEV] Reenable umin_seq support and fix the `computeSCEVAtScope()`
This reverts commit f62f47f5e1.
2022-01-11 16:03:35 +03:00
Nikita Popov 3946095b88 [MemoryBuiltins] Remove unused isOpNewLikeFn() (NFC)
This function is no longer used since
2cafbcb560.
2022-01-11 12:27:23 +01:00
Nikita Popov b56f6f1913 [MemoryBuiltins] Remove unused isStrdupLikeFn() function (NFC)
This function is no longer used after
dcbc91f40c.
2022-01-11 12:26:20 +01:00
Philip Reames f62f47f5e1 Partial revert of 82fb4f4
Two crashes have been reported.  This change disables the new logic while leaving the new node in tree.  Hopefully, that's enough to allow investigation without breakage while avoiding massive churn.
2022-01-10 18:18:34 -08:00
Philip Reames 5265ac72c6 [MemoryBuiltin] Add an API for checking if an unused allocation can be removed [NFC]
Not all allocation functions are removable if unused.  An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++.  An example of a removable one - at least according to historical practice - would be malloc.
2022-01-10 15:43:39 -08:00
Roman Lebedev 82fb4f4b22
[SCEV] Sequential/in-order `UMin` expression
As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692,
SCEV is forbidden from reasoning about 'backedge taken count'
if the branch condition is a poison-safe logical operation,
which is conservatively correct, but is severely limiting.

Instead, we should have a way to express those
poison blocking properties in SCEV expressions.

The proposed semantics is:
```
Sequential/in-order min/max SCEV expressions are non-commutative variants
of commutative min/max SCEV expressions. If none of their operands
are poison, then they are functionally equivalent, otherwise,
if the operand that represents the saturation point* of given expression,
comes before the first poison operand, then the whole expression is not poison,
but is said saturation point.
```
* saturation point - the maximal/minimal possible integer value for the given type

The lowering is straight-forward:
```
compare each operand to the saturation point,
perform sequential in-order logical-or (poison-safe!) ordered reduction
over those checks, and if reduction returned true then return
saturation point else return the naive min/max reduction over the operands
```
https://alive2.llvm.org/ce/z/Q7jxvH (2 ops)
https://alive2.llvm.org/ce/z/QCRrhk (3 ops)
Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS
Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97

That allows us to handle the patterns in question.

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D116766
2022-01-10 20:51:26 +03:00
Philip Reames 1d127315e7 Minor style tweaks following fb93659 2022-01-10 09:32:29 -08:00
Bryce Wilson 7febd60a90 [instcombine] Add align return attributes for operator new(..., align_val)
(Split from original patch to separate non-NFC part and add coverage.  I typoed when adding the new test, so this change includes the typo fix to let libfunc recongize the signature.  Didn't figure it was worth another separate commit.)

Differential Revision: https://reviews.llvm.org/D116851 (part 2 of 2)
2022-01-10 09:15:20 -08:00
Bryce Wilson fb936595fa [MemoryBuiltins] Add field for alignment argument [NFC]
There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds an getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically.

This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack.  The former will follow shortly - I split Bryce's patch for purpose of having the large change be NFC.  The later will be reviewed separately.

Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)
2022-01-10 09:15:20 -08:00
Simon Pilgrim fd1094f318 [ConstantFolding] Clean up Intrinsics::abs undef handling
Match cttz/ctlz handling by assuming C1 == 0 if C1 != 1 - I've added an assertion as well.

Fixes static analyzer nullptr dereference warnings.
2022-01-10 17:04:03 +00:00
Nikita Popov 92d55e7336 [MemoryBuiltins] Remove isNoAliasFn() in favor of isNoAliasCall()
We currently have two similar implementations of this concept:
isNoAliasCall() only checks for the noalias return attribute.
isNoAliasFn() also checks for allocation functions.

We should switch to only checking the attribute. SLC is responsible
for inferring the noalias return attribute for non-new allocation
functions (with a missing case fixed in
348bc76e35).
For new, clang is responsible for setting the attribute,
if -fno-assume-sane-operator-new is not passed.

Differential Revision: https://reviews.llvm.org/D116800
2022-01-10 09:18:15 +01:00
Simon Pilgrim be7dbd674c [DivergenceAnalysis] Simplify inRegion test based on whether the RegionLoop pointer is null or not
More closely matches the documentation

Requested by @nikic
2022-01-08 14:30:10 +00:00
Simon Pilgrim b3f193a980 [DivergenceAnalysis] Fix static analyzer warning about dereference of nullptr
We're testing that the RegionLoop pointer is null in the first part of the check, so we need to check that its non-null before dereferencing it in a later part of the check.
2022-01-08 13:57:33 +00:00
Kazu Hirata b932bdf59f [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-07 17:45:09 -08:00
Philip Reames f38873537b [MemoryBuiltin] Cleanup stale todo comments [NFC]
strdup/strndup are already partially implemented, move remaining comment to relevant place.  Remaining named routines are copy routines and mostly handled via intrinsics already - they do not allocate new memory.
2022-01-07 13:57:20 -08:00
Roman Lebedev 32300375f5
[NFCI] `ScalarEvolution::getRangeRef()`: collapse `SCEVMinMaxExpr` handling 2022-01-08 00:23:08 +03:00
Arthur Eubanks d51e3474e0 [LazyCallGraph] Ignore empty RefSCCs rather than shift RefSCCs when removing dead functions
This is in preparation for D115545 which attempts to delete discardable functions if they are unused. With that change, shifting RefSCCs becomes noticeable in compile time. This change makes the LCG update negligible again.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D116776
2022-01-07 09:42:23 -08:00
Philip Reames 6b0ff0969d Extract utility function for checking initial value of allocation [NFC, try 2]
This is a reoccuring pattern, we can consolidate three copies into one.  The main motivation is to reduce usages of isMallocLike.

The original commit (which was quickly reverted) didn't account for the allocation function could be an invoke, test coverage for that case added in this commit.
2022-01-07 08:44:08 -08:00
Roman Lebedev a5a6960d1c
[NFCI][IR] MinMaxIntrinsic: add some more helper methods, and use them 2022-01-07 13:02:11 +03:00
Philip Reames c6a0c1585a Revert "Extract utility function for checking initial value of allocation [NFC]"
This reverts commit 9ce30fe86f.  Appears to be causing a problem on a buildbot, revert while investigating.

https://green.lab.llvm.org/green//job/clang-stage1-RA/26818/consoleFull#-1502953973d489585b-5106-414a-ac11-3ff90657619c
2022-01-06 19:05:51 -08:00
Philip Reames 9ce30fe86f Extract utility function for checking initial value of allocation [NFC]
This is a reoccuring pattern, we can consolidate three copies into one.  The main motivation is to reduce usages of isMallocLike.
2022-01-06 18:02:14 -08:00
Philip Reames 5d1cfd4348 Remove unused LookThroughBitCast param in isXAllocLike functions [NFC]
This parameter took the non-default value exactly twice, and neither had semantic effect.
2022-01-06 18:02:13 -08:00
Philip Reames 7052670e96 Move getMallocAllocatedType and getMallocArraySize to GlobalOpt [NFC]
These are implementation details of the global-opt transform and not easily reuseable, so remove them from the analysis header.
2022-01-06 18:02:13 -08:00
Philip Reames 67a3331e4f Inline extractMallocCall to sole use and delete [NFC] 2022-01-06 18:02:13 -08:00
Philip Reames 4b0fc924a9 Delete unused extractCallocCall routine [NFC] 2022-01-06 18:02:13 -08:00
Philip Reames cffd268316 Demote getMallocType to implementation routine in MemoryBuiltins [NFC] 2022-01-06 18:02:13 -08:00
Daniil Suchkov 524abc68f2 Introduce NewPM .dot printers for DomTree
This patch adds a couple of NewPM function passes (dot-dom and
dot-dom-only) that dump DomTree into .dot files.

Reviewed-By: aeubanks
Differential Revision: https://reviews.llvm.org/D116629
2022-01-05 23:25:40 +00:00
Nico Weber 085f078307 Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`.""
This reverts commit 859ebca744.
The change contained many unrelated changes and e.g. restored
unit test failes for the old lld port.
2022-01-05 13:10:25 -05:00
David Salinas 859ebca744 Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."
This reverts commit 640beb38e7.

That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup

Change-Id: Ibf8e397df94001f248fba609f072088a46abae08

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D115960

Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
2022-01-05 17:57:32 +00:00
Philip Reames c16fd6a376 Rename doesNotReadMemory to onlyWritesMemory globally [NFC]
The naming has come up as a source of confusion in several recent reviews.  onlyWritesMemory is consist with onlyReadsMemory which we use for the corresponding readonly case as well.
2022-01-05 08:52:55 -08:00
Nikita Popov 3dc1907d06 [ConstantFold] Use ConstantFoldLoadFromUniformValue() in more places
In particular, this also preserves undef when loading from padding,
rather than converting it to zero through a different codepath.

This is the remaining part of D115924.
2022-01-05 12:47:50 +01:00
Nikita Popov 99c6b12b92 [ConstantFolding] Unify handling of load from uniform value
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case b) it bypasses casts that might not be legal generally but
do work with uniform values.

We had multiple implementations of this, with a different set of
supported values each time. This replaces two usages with a more
complete helper. Other usages will be replaced separately, because
they have larger impact.

This is part of D115924.
2022-01-05 12:30:46 +01:00
Mircea Trofin a120fdd337 [NFC][MLGO]Add RTTI support for MLModelRunner and simplify runner setup 2022-01-04 19:46:14 -08:00
Sanjay Patel 1e50d06466 [Analysis] fix swapped operands to computeConstantRange
This was noted in post-commit review for D116322 / 0edf99950e .

I am not seeing how to expose the bug in a test though because
we don't pass an assumption cache into this analysis from there.
2022-01-04 13:13:50 -05:00
Philip Reames b061d86c69 [SCEV] Compute exit count from overflow check expressed w/ x.with.overflow intrinsics
This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them.

Motivation comes from compiling a simple example with -ftrapv.

Differential Revision: https://reviews.llvm.org/D116499
2022-01-04 09:44:23 -08:00
Florian Hahn d8276208be
[LAA] Remove overeager assertion for aggregate types.
0a00d64 turned an early exit here into an assertion, but the assertion
can be triggered, as PR52920 shows.

The later code is agnostic to the accessed type, so just drop the
assert. The patch also adds tests for LAA directly and
loop-load-elimination to show the behavior is sane.
2022-01-04 15:20:35 +00:00
Nikita Popov 71b2c4a3cf [ConstantFolding] Remove unused ConstantFoldLoadThroughGEPConstantExpr()
This API is no longer used since bbeaf2aac6.
2022-01-04 12:37:12 +01:00
Rosie Sumpter 961f51fdf0 [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions without loads/stores
For loops that contain in-loop reductions but no loads or stores, large
VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes
has no element types to check through and so returns the default widths
(-1U for the smallest and 8 for the widest). This results in the widest
VF being chosen for the following example,

float s = 0;
for (int i = 0; i < N; ++i)
  s += (float) i*i;

which, for more computationally intensive loops, leads to large loop
sizes when the operations end up being scalarized.

In this patch, for the case where ElementTypesInLoop is empty, the widest
type is determined by finding the smallest type used by recurrences in
the loop instead of falling back to a default value of 8 bits. This
results in the cost model choosing a more sensible VF for loops like
the one above.

Differential Revision: https://reviews.llvm.org/D113973
2022-01-04 10:12:57 +00:00
Craig Topper cbcbbd6ac8 [ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC
This function returns an upper bound on the number of bits needed
to represent the signed value. Use "Max" to match similar functions
in KnownBits like countMaxActiveBits.

Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old
name around to keep this patch size down. Will do a bulk rename as
follow up.

Rename KnownBits::countMaxSignedBits->countMaxSignificantBits.

Reviewed By: lebedev.ri, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D116522
2022-01-03 11:33:30 -08:00
Kazu Hirata e5947760c2 Revert "[llvm] Remove redundant member initialization (NFC)"
This reverts commit fd4808887e.

This patch causes gcc to issue a lot of warnings like:

  warning: base class ‘class llvm::MCParsedAsmOperand’ should be
  explicitly initialized in the copy constructor [-Wextra]
2022-01-03 11:28:47 -08:00
Nikita Popov 5afbfe33e7 [ConstantFold] Make icmp of gep fold offset based
We can fold an equality or unsigned icmp between base+offset1 and
base+offset2 with inbounds offsets by comparing the offsets directly.

This replaces a pair of specialized folds that tried to reason
based on the GEP structure instead. One of those folds was plain
wrong (because it does not account for negative offsets), while
the other is unnecessarily complicated and limited (e.g. it will
fail with bitcasts involved).

The disadvantage of this change is that it requires data layout,
so the fold is no longer performed by datalayout-independent
constant folding. I don't think this is a loss in practice, but
it does regress the ConstantExprFold.ll test, which checks folding
without running any passes.

Differential Revision: https://reviews.llvm.org/D116332
2022-01-03 09:41:37 +01:00
Philip Reames 890e685492 [SCEV] Drop unused param from new version of computeExitLimitFromICmp [NFC] 2022-01-02 10:15:17 -08:00
Philip Reames f19a95bbed [SCEV] Split computeExitLimitFromICmp into two versions [NFC]
This is in advance of a following change which needs to the non-icmp API.
2022-01-02 09:58:32 -08:00
Kazu Hirata fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
Sanjay Patel c054402170 [InstSimplify] fold or-nand-xor
~(A & B) | (A ^ B) --> ~(A & B)

https://alive2.llvm.org/ce/z/hXQucg
2021-12-31 15:11:13 -05:00
Nuno Lopes 64af9f61c3 [InstSimplify] add 'x + poison -> poison' (needed for NewGVN) 2021-12-30 11:52:42 +00:00
Fangrui Song b69fe48ccf [IROutliner] Move global namespace cl::opt inside llvm:: 2021-12-30 01:12:55 -08:00
Sanjay Patel 0edf99950e [Analysis] allow caller to choose signed/unsigned when computing constant range
We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the other.

This patch is modeled on a similar construct that was added with
D59386.

I don't think it is possible to expose a problem with an unsigned
compare because of the way this was coded (nuw is handled first).

InstCombine has an assert that fires with the example from:
https://github.com/llvm/llvm-project/issues/52884
...because it was expecting InstSimplify to handle this kind of
pattern with an smax.

Fixes #52884

Differential Revision: https://reviews.llvm.org/D116322
2021-12-28 09:45:37 -05:00
Sanjay Patel 773ab3c665 [Analysis] remove unneeded casts; NFC
The callee does the casting too; this matches a plain call later in the same function for 'shl'.
2021-12-27 13:41:50 -05:00
Nikita Popov ae64c5a0fd [DSE][MemLoc] Handle intrinsics more generically
Remove the special casing for intrinsics in MemoryLocation::getForDest()
and handle them through the general attribute based code. On the DSE
side, this means that isRemovable() now needs to handle more than a
hardcoded list of intrinsics. We consider everything apart from
volatile memory intrinsics and lifetime markers to be removable.

This allows us to perform DSE on intrinsics that DSE has not been
specially taught about, using a matrix store as an example here.

There is an interesting test change for invariant.start, but I
believe that optimization is correct. It only looks a bit odd
because the code is immediate UB anyway.

Differential Revision: https://reviews.llvm.org/D116210
2021-12-24 09:29:57 +01:00
Mehrnoosh Heidarpour 0ff20f2f44 [InstSimplify] Fold logic AND to zero
Adding following fold opportunity:
((A | B) ^ A) & ((A | B) ^ B) --> 0

Reviewed By: spatel, rampitec

Differential Revision: https://reviews.llvm.org/D115755
2021-12-23 10:06:26 -05:00
Mircea Trofin edf8e3ea5e [NFC][mlgo]Make the test model generator inlining-specific
When looking at building the generator for regalloc, we realized we'd
need quite a bit of custom logic, and that perhaps it'd be easier to
just have each usecase (each kind of mlgo policy) have it's own
stand-alone test generator.

This patch just consolidates the old `config.py` and
`generate_mock_model.py` into one file, and does away with
subdirectories under Analysis/models.
2021-12-22 13:38:45 -08:00
Nikita Popov 8a0e35f3a7 [MemoryLocation] Don't require nocapture in getForDest()
As reames mentioned on related reviews, we don't need the nocapture
requirement here. First of all, from an API perspective, this is
not something that MemoryLocation::getForDest() should be checking
in the first place, because it does not affect which memory this
particular call can access; it's an orthogonal concern that should
be handled by the caller if necessary.

However, for both of the motivating users in DSE and InstCombine,
we don't need the nocapture requirement, because the capture can
either be purely local to the call (a pointer identity check that
is irrelevant to us), be part of the return value (which we check
is unused), or be written in the dest location, which we have
determined to be dead.

This allows us to remove the special handling for libcalls as well.

Differential Revision: https://reviews.llvm.org/D116148
2021-12-22 12:20:13 +01:00
Nikita Popov f5ac23b5ae [ArgPromotion][TTI] Pass types to ABI compatibility hook
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.

This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.

This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).

Differential Revision: https://reviews.llvm.org/D116031
2021-12-22 09:37:51 +01:00
Serge Pavlov 77b923d0db [ConstantFolding] Do not remove side effect from constrained functions
According to the discussion in https://reviews.llvm.org/D110322 the code
that removes side effect from replaced function call is deleted.

Differential Revision: https://reviews.llvm.org/D115870
2021-12-22 13:45:49 +07:00
Nikita Popov 2926d6d335 [ConstantFold][GlobalOpt] Don't create x86_mmx null value
This fixes the assertion failure reported at
https://reviews.llvm.org/D114889#3198921 with a straightforward
check, until the cleaner fix in D115924 can be reapplied.
2021-12-21 09:11:41 +01:00
Kazu Hirata 500c4b68dc [llvm] Construct SmallVector with iterator ranges (NFC) 2021-12-20 23:43:24 -08:00
Philip Reames 44d23d5345 [DSE] Remove calls with known writes to dead memory
This is a reapply of a8a51fe5, which was reverted in 1ba99e due to a failing compiler-rt test.   That test was a false positive because it was checking asan failures not accounting for the fact the call could be validly optimized out.  I hopefully managed to stablize that test in 9b955f.  (That's a speculative fix due to disk consumption needed to build compiler-rt tests locally being absurd.)

Original commit message follows..

The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.

Merging the code in this was unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).

In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.

Differential Revision: https://reviews.llvm.org/D115904
2021-12-20 18:10:23 -08:00
Sanjay Patel a56803b8f8 [Analysis] fix cast in ValueTracking to allow constant expression
The test would crash because a non-instruction negate op made it in here.

Fixes #51506
2021-12-20 17:16:47 -05:00
Sander de Smalen b1ff20fd35 [LV] Enable scalable vectorization by default for SVE cores.
The availability of SVE should be sufficient to enable scalable
auto-vectorization.

This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.

Differential Revision: https://reviews.llvm.org/D115651
2021-12-20 16:23:29 +00:00
Nikita Popov aeb36ae0f4 Revert "[ConstantFolding] Unify handling of load from uniform value"
This reverts commit 9fd4f80e33.

This breaks SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c
in test-suite. Either the test is incorrect, or clang is generating
incorrect union initialization code. I've submitted
https://reviews.llvm.org/D115994 to fix the test, assuming my
interpretation is correct. Reverting this in the meantime as it
may take some time to resolve.
2021-12-18 20:46:52 +01:00
Ricky Zhou 9927a06f74 [AA] Handle callbr instructions in alias analysis
Before this change, AAResults::getModRefInfo() was missing a case for
callbr instructions (asm goto), which may read/write memory. In PR52735,
this led to a miscompile where a load was incorrect eliminated.

Add this missing case, as well as an assert verifying that all
memory-accessing instructions are handled properly.

Fixes #52735.

Differential Revision: https://reviews.llvm.org/D115992
2021-12-18 18:49:17 +01:00
Nikita Popov 1ba99eaf70 Revert "[DSE] Remove calls with known writes to dead memory"
This reverts commit a8a51fe556.

This breaks the strncpy-overflow.cpp test case.
2021-12-18 09:23:41 +01:00
Philip Reames a8a51fe556 [DSE] Remove calls with known writes to dead memory
The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.

Merging the code in this was unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).

In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.

Differential Revision: https://reviews.llvm.org/D115904
2021-12-17 13:42:36 -08:00
Philip Reames 793c0da89e [capturetracking] Explicitly check for callee operand [NFC]
Pull out an explicit check rather than relying on the fact that the callee operand is not a data operand.  The only real value is it gives us a clear place to move the comment, and makes the code slightly more understandable.
2021-12-17 09:21:35 -08:00
Nikita Popov 9fd4f80e33 [ConstantFolding] Unify handling of load from uniform value
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case and b) it bypasses casts that might not be legal generally
but do work with uniform values.

We had multiple implementations of this, with a different set of
supported values each time, as well as incomplete type checks in
some cases. In particular, this fixes the assertion reported in
https://reviews.llvm.org/D114889#3198921, as well as a similar
assertion that could be triggered via constant folding.

Differential Revision: https://reviews.llvm.org/D115924
2021-12-17 17:05:06 +01:00
Momchil Velikov 6192c312cf [AA] Correctly maintain the sign of PartiaAlias offset
Preserve the invariant that offset reported in the case of a
`PartialAlias` between `Loc1` and `Loc2`, is such that
`Loc1 + Offset = Loc2`, where `Loc1` and `Loc2` are the first and
the second argument, respectively, in alias queries.

Differential Revision: https://reviews.llvm.org/D115927
2021-12-17 15:45:26 +00:00
Florian Hahn f5f421e0ee
[SCEV] Apply loop guards in reverse order.
This patch updates applyLoopGuards to first collect all conditions and
then applies them in reverse order. This ensures the SCEVs with the
shortest dependency chains are constructed first, limiting the required
stack size.

This fixes a crash reported in D113578.

Note that the order conditions are applied can impact the accuracy of
the result, mostly due to missing min/max simplifications when
constructing SCEVs.

The changed test highlights the impact of the evaluation order. I will
follow up with a SCEV patch to improve min/max simplifications to get
the same results for both orders.
2021-12-16 10:52:37 +00:00
Nikita Popov a8c2ba105d [Inline] Disable deferred inlining
After the switch to the new pass manager, we have observed multiple
instances of catastrophic inlining, where the inliner produces huge
functions with many hundreds of thousands of instructions from small
input IR. We were forced to back out the switch to the new pass
manager for this reason. This patch fixes at least one of the root
cause issues.

LLVM uses a bottom-up inliner, and the fact that functions are processed
bottom-up is not just a question of optimality -- it is an imporant
requirement to prevent runaway inlining. The premise of the current
inlining approach and cost model is that after all calls inside a function
have been inlined, it may get large enough that inlining it into its
callers is no longer considered profitable. This safeguard does not
exist if inlining doesn't happen bottom-up, as inlining the callees,
and their callees, and their callees etc. will always seem individually
profitable, and the inliner can easily flatten the whole call tree.

There are instances where we necessarily have to deviate from bottom-up
inlining: When inlining in an SCC there is no natural "bottom", so
inlining effectively happens top-down. This requires special care,
and the inliner avoids exponential blowup by ensuring that functions
in the SCC grow in a balanced way and will eventually hit the threshold.

However, there is one instance where the inlining advisor explicitly
violates the bottom-up principle: Deferred inlining tries to "defer"
inlining a call if it determines that inlining the caller into all
its call-sites would be more profitable. Something very important to
understand about deferred inlining is that it doesn't make one inlining
choice in place of another -- it effectively chooses to do both. If we
have a call chain A -> B -> C and cost modelling tells us that inlining
B -> C is profitable, but we defer this and instead inline A -> B first,
then we'll now have a call A -> C, and the cost model will (a few special
cases notwithstanding) still tell us that this is profitable. So the end
result is that we inlined *both* B and C, even though under the usual
cost model function B would have been too large to further inline after
C has been integrated into it.

Because deferred inlining violates the bottom-up invariant of the inliner,
it can result in exponential inlining. The exponential-deferred-inlining.ll
test case illustrates this on a simple example (see
https://gist.github.com/nikic/1262b5f7d27278e1b34a190ae10947f5 for a
much more catastrophic case with about 5000x size blowup). If the call
chain A -> B -> C is not a chain but a tree of calls, then we end up
deferring inlining across the tree and end up flattening everything into
the root node.

This patch proposes to address this by disabling deferred inlining
entirely (currently still behind an option). Beyond the issue of
exponential inlining, I don't think that the whole concept makes sense,
at least as long as deferred inlining still ends up inlining both call
edges.

I believe the motivation for having deferred inlining in the first place
is that you might have a small wrapper function with local linkage that
could be eliminated if inlined. This would automatically happen if there
was a single caller, due to the large "last call to local" bonus. However,
this bonus is not extended if there are multiple callers, even if we
would eventually end up inlining into all of them (if the bonus were
extended).

Now, unlike the normal inlining cost model, the deferred inlining cost
model does look at all callers, and will extend the "last call to local"
bonus if it determines that we could inline all of them as long as we
defer the current inlining decision. This makes very little sense.
The "last call to local" bonus doesn't really cost model anything.
It's basically an "infinite" bonus that ensures we always inline the
last call to a local. The fact that it's not literally infinite just
prevents inlining of huge functions, which can easily result in
scalability issues. I very much doubt that it was an intentional
cost-modelling choice to say that getting rid of a small local function
is worth adding 15000 instructions elsewhere, yet this is exactly how
this value is getting used here.

The main alternative I see to complete removal is to change deferred
inlining to an actual either/or decision. That is, to mark deferred
calls as noinline so we're actually trading off one inlining decision
against another, and not just adding a side-channel to the cost model
to do both.

Apart from fixing the catastrophic inlining case, the effect on rustc
is a modest compile-time improvement on average (up to 8% for a
parsing-type crate, where tree-like calls are expected) and pretty
neutral where run-time performance is concerned (mix of small wins
and losses, usually in the sub-1% category).

Differential Revision: https://reviews.llvm.org/D115497
2021-12-16 09:59:50 +01:00
Mircea Trofin db5aceb979 [NFC] Expose the ReleaseModeModelRunner
The type was pretty much generic, just needed a bit of parameterization.

Differential Revision: https://reviews.llvm.org/D115764
2021-12-15 23:21:58 -08:00
Fangrui Song cf9e61a9bb [LTO][WPD] Simplify mustBeUnreachableFunction and test after D115492
An well-formed IR function definition must have an entry basic block and
a well-formed IR basic block must have one terminator so the emptiness
check can be simplified.
Also simplify the test a bit.

Reviewed By: luna

Differential Revision: https://reviews.llvm.org/D115780
2021-12-15 15:43:35 -08:00
Arthur Eubanks 5a81a60391 [NFC] Remove more calls to getAlignment()
These are deprecated and should be replaced with getAlign().

Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.
2021-12-15 14:40:57 -08:00
Mingming Liu 09a704c5ef [LTO] Ignore unreachable virtual functions in WPD in hybrid LTO.
Differential Revision: https://reviews.llvm.org/D115492
2021-12-14 20:18:04 +00:00
Philip Reames 423f19680a Add FMF to hasPoisonGeneratingFlags/dropPoisonGeneratingFlags
These flags are documented as generating poison values for particular input values. As such, we should really be consistent about their handling with how we handle nsw/nuw/exact/inbounds.

Differential Revision: https://reviews.llvm.org/D115460
2021-12-14 08:43:00 -08:00
Florian Hahn ddfac0759c
Revert "[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest."
This reverts commit ac60263ad1.

It looks like the test fails on certain non-Darwin system, even though
the triple is explicitly set to macos. Revert while I investigate.
2021-12-14 14:48:47 +00:00
Nikita Popov 7abf299fed [InlineAdvisor] Add option to control deferred inlining (NFC)
This change is split out from D115497 to add the option
independently from the switch of the default value.
2021-12-14 15:46:11 +01:00
Florian Hahn ac60263ad1
[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest.
memset_pattern{4,8,16} writes to the first argument. Use getForDest
to return the corresponding MemoryLocation.

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114906
2021-12-14 14:41:28 +00:00
Kazu Hirata d2377f24e1 Ensure newlines at the end of files (NFC) 2021-12-12 11:04:44 -08:00
Nikita Popov 9932d4db0d [SCEV] Fix unused variable warning (NFC) 2021-12-11 21:03:54 +01:00
Mircea Trofin 04f2712ef4 [NFC][MLGO] Factor ModelUnderTrainingRunner for reuse
This is so we may reuse it. It was very non-inliner specific already.

Differential Revision: https://reviews.llvm.org/D115465
2021-12-10 11:24:15 -08:00
Nikita Popov 65bec04295 [ConstantFold] Handle same type in ConstantFoldLoadThroughBitcast
Usually the case where the types are the same ends up being handled
fine because it's legal to do a trivial bitcast to the same type.
However, this is not true for aggregate types. Short-circuit the
whole code if the types match exactly to account for this.
2021-12-10 16:39:50 +01:00
Sameer Sahasrabuddhe 1d0244aed7 Reapply CycleInfo: Introduce cycles as a generalization of loops
Reverts 02940d6d22. Fixes breakage in the modules build.

LLVM loops cannot represent irreducible structures in the CFG. This
change introduce the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.

The cycle analysis is implemented as a generic template and then
instatiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.

This review is a restart of an older review request:
https://reviews.llvm.org/D83094

Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>

Differential Revision: https://reviews.llvm.org/D112696
2021-12-10 14:36:43 +05:30
Hasyimi Bahrudin c1cd698a52 [InstSimplify] Simplify bool icmp with not in LHS
Refer to https://llvm.org/PR52546.

Simplifies the following cases:
    not(X) == 0 -> X != 0 -> X
    not(X) <=u 0 -> X >u 0 -> X
    not(X) >=s 0 -> X <s 0 -> X
    not(X) != 1 -> X == 1 -> X
    not(X) <=u 1 -> X >=u 1 -> X
    not(X) >s 1 -> X <=s -1 -> X

Differential Revision: https://reviews.llvm.org/D114666
2021-12-09 16:26:46 -05:00
Arthur Eubanks 1172712f46 [NFC] Replace some deprecated getAlignment() calls with getAlign()
Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D115370
2021-12-09 08:43:19 -08:00
Nikita Popov 3beafecedf [InlineAdvisor] Remove outdated comment (NFC)
This just returns None nowadays, so this comment doesn't apply
anymore.
2021-12-09 15:11:56 +01:00
Florian Hahn d74a8a78ad
[LV] Mark various functions as const (NFC).
Make sure various accessors do not modify any state, in preparation for
D115111.
2021-12-09 10:51:29 +00:00
Mircea Trofin 059e03476c [NFC][mlgo] Generalize model runner interface
This prepares it for the regalloc work. Part of it is making model
evaluation accross 'development' and 'release' scenarios more reusable.
This patch:
- extends support to tensors of any shape (not just scalars, like we had
in the inliner -Oz case). While the tensor shape can be anything, we
assume row-major layout and expose the tensor as a buffer.
- exposes the NoInferenceModelRunner, which we use in the 'development'
mode to keep the evaluation code path consistent and simplify logging,
as we'll want to reuse it in the regalloc case.

Differential Revision: https://reviews.llvm.org/D115306
2021-12-08 20:10:58 -08:00
Florian Hahn 3c55acc4a6
[MemoryLocation] Support memset_pattern{4,8} in getForArgument.
memset_pattern{4,8} behave as memset_pattern16, with the only difference
being the size of the pattern location.

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114905
2021-12-08 19:39:45 +00:00
Jolanta Jensen 77b2bb5567 [LAA] Use type sizes when determining dependence.
In the isDependence function the code does not try hard enough
to determine the dependence between types. If the types are
different it simply gives up, whereas in fact what we really
care about are the type sizes. I've changed the code to compare
sizes instead of types.

Reviewed By: fhahn, sdesmalen

Differential Revision: https://reviews.llvm.org/D108763
2021-12-08 15:00:58 +00:00
James Farrell 219672b8dd Revert "Revert "Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible.""
This reverts commit 63a6348cad.

Differential Revision: https://reviews.llvm.org/D115254
2021-12-07 23:15:21 +00:00
Jonas Devlieghere 02940d6d22 Revert "CycleInfo: Introduce cycles as a generalization of loops"
This reverts commit 0fe61ecc2c because it
breaks the modules build.

https://green.lab.llvm.org/green/job/clang-stage2-rthinlto/4858/
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39112/
2021-12-07 13:06:34 -08:00
Sanjay Patel 8a69b04478 [InstSimplify] add logic fold for 'or' with 'xor'+'and'
This replaces the 'or' from 4b30076f16 with an 'and'.
We have to guard against propagating undef elements from
vector 'not' values:
https://alive2.llvm.org/ce/z/irMwRc
2021-12-07 11:08:26 -05:00
Cullen Rhodes 0395e01583 [IR] Split vscale_range interface
Interface is split from:

  std::pair<unsigned, unsigned> getVScaleRangeArgs()

into separate functions for min/max:

  unsigned getVScaleRangeMin();
  Optional<unsigned> getVScaleRangeMax();

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D114075
2021-12-07 10:38:26 +00:00
Sameer Sahasrabuddhe 0fe61ecc2c CycleInfo: Introduce cycles as a generalization of loops
LLVM loops cannot represent irreducible structures in the CFG. This
change introduce the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.

The cycle analysis is implemented as a generic template and then
instatiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.

This review is a restart of an older review request:
https://reviews.llvm.org/D83094

Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>

Differential Revision: https://reviews.llvm.org/D112696
2021-12-07 12:02:34 +05:30
James Farrell 63a6348cad Revert "Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible."
This reverts commit 5032467034.
2021-12-06 17:35:26 +00:00
Bardia Mahjour dfcfd14070 [VP] getVPMemoryOpCost interface
Added TTI queries for the cost of a VP Memory operation, and added Opcode,
DataType and Alignment to the hasActiveVectorLength() interface.

Reviewed By: Roland Froese

Differential Revision: https://reviews.llvm.org/D109416
2021-12-06 11:27:07 -05:00
James Farrell 5032467034 Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible.
This reverts commit 40d5eeac6c.

Differential Revision: https://reviews.llvm.org/D114885
2021-12-06 14:57:47 +00:00
Kazu Hirata 1457e78352 [llvm] Use range-based for loops (NFC) 2021-12-05 08:33:02 -08:00
Sanjay Patel c65e651e60 [InstSimplify] fix logic fold of 'or' for vectors
Reduce code duplication for commutative pattern matching
and fix a miscompile.

We can't safely propagate an undef element in this transform:
https://alive2.llvm.org/ce/z/s5xy55
2021-12-05 09:57:07 -05:00
Florian Hahn 203f29b40c
[MemoryLocation] Use getForArgument in getForSource/getForDest. (NFC)
getForArgument already knows how to extract a memory location for all
memory intrinsics. Use it instead of duplicating the logic.
2021-12-05 11:13:14 +00:00
Florian Hahn a9125792b3
[MemoryLocation] Support missing atomic intrinsics in getForArg.
getForArgument is missing support for atomic memory transfer
intrinsics. In terms of accessed locations they behave like regular
memory transfer intrinsics and we already support them as such in
getForSource/getForDest.
2021-12-04 22:18:39 +00:00
Mehrnoosh Heidarpour e94134052f [InstSimplify] Add logic 'or' fold to -1
Adding the following folding opportunity:
(~A | B) | (A ^ B) --> -1

https://alive2.llvm.org/ce/z/PMtdYB

Differential revision: https://reviews.llvm.org/D114996
2021-12-04 15:04:18 -05:00
Florian Hahn ead3979a92
[MemoryLocation] Move DSE intrinsic handling to MemoryLocation. (NFC)
Suggested in D114872.
2021-12-03 16:00:39 +00:00
David Green ab0c5cea0b [ARM] Use v2i1 for MVE and CDE intrinsics
This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal
type, to use a <2 x i1> as opposed to emulating the predicate with a
<4 x i1>. The v4i1 workarounds have been removed leaving the natural
v2i1 types, notably in vctp64 which now generates a v2i1 type.

AutoUpgrade code has been added to upgrade old IR, which needs to
convert the old v4i1 to a v2i1 be converting it back and forth to an
integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be
optimized away in the final assembly.

Differential Revision: https://reviews.llvm.org/D114455
2021-12-03 15:27:58 +00:00
Florian Hahn af86aa7980
[MemoryLocation] Use None instead of {}. (NFC) 2021-12-03 13:19:00 +00:00
Florian Hahn f078536f46
[MemoryLocation] Move DSE's logic to new MemLoc::getForDest helper (NFC).
DSE has some extra logic to determine the write location of library
calls like str*cpy and str*cat. This patch moves the logic to a new
MemoryLocation:getForDest variant, which takes a call and TLI.

This patch should be NFC, because no other places take advantage of the
new helper yet.

Suggested by @reames post-commit 7eec832def.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114872
2021-12-03 09:12:01 +00:00
Nikita Popov 49d040ac97 [SCEV] Fix ValuesAtScopesUsers consistency
Fixes verification failure reported at:
https://reviews.llvm.org/rGc9f9be0381d1

The issue is that getSCEVAtScope() might compute a result without
inserting it in the ValuesAtScopes map in degenerate cases,
specifically if the ValuesAtScopes entry is invalidated during the
calculation. Arguably we should still insert the result if no
existing placeholder is found, but for now just tweak the logic
to only update ValuesAtScopesUsers if ValuesAtScopes is updated.
2021-12-03 10:03:10 +01:00
Florian Hahn 829b29b619
[MemoryLocation] strcat/strncat/strcpy read/write after their args.
strcpy/strcat/strncat access memory starting from the passed in
pointers. Construct memory locations for their args using getAfter.

Discussed in D114872.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D114969
2021-12-03 08:48:23 +00:00
Florian Hahn 639a78a4bf
[MemoryLocation] Support strncpy in getForArgument.
The size argument of strncpy can be used as bound for the size of
its pointer arguments.

strncpy is guaranteed to write N bytes and reads up to N bytes.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D114871
2021-12-02 14:18:05 +00:00
Sanjay Patel 97e921c81f [PatternMatch] create and use matcher for 'not' that excludes undef elements
We needed a stricter version of m_Not for D114462, but I wasn't
sure if that was going to be required anywhere else, so I didn't bother
to make that reusable.

It turns out we have one more existing simplification that needs
this (currently miscompiles):
https://alive2.llvm.org/ce/z/9-nTKi

And there's at least one more fold in that family that we could add.

Differential Revision: https://reviews.llvm.org/D114882
2021-12-02 08:51:13 -05:00
Florian Hahn 9f9e8ba114
[MemoryLocation] Support memset_chk in getForArgument.
The size argument for memset_chk is an upper bound for the size of the
pointer argument. memset_chk may write less than the specified length,
if it exceeds the specified max size and aborts.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114870
2021-12-02 13:45:58 +00:00
Florian Hahn ad88a37cea
[TLI] Add memset_pattern4, memset_pattern8 lib functions.
Similar to memset_pattern16, memset_pattern4, memset_pattern8 are
available on Darwin platforms.

https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/memset_pattern4.3.html

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114881
2021-12-01 21:18:19 +00:00
Nikita Popov 67704801c6 [SCEV] Track backedge taken count users (NFCI)
Track which SCEVs are used as ExactNotTaken counts in
BackedgeTakenInfo structures, so we can directly determine which
loops need to be invalidated, rather than iterating over all BECounts.

This gives a small compile-time improvement on average, but the
motivation here is more to ensure there are no degenerate cases,
if the number of backedge taken counts is large.

Differential Revision: https://reviews.llvm.org/D114784
2021-12-01 10:16:47 +01:00
Nikita Popov c9f9be0381 [SCEV] Verify integrity of ValuesAtScopes and users (NFC)
Make sure that ValuesAtScopes and ValuesAtScopesUsers are
consistent during SCEV verification.
2021-11-30 21:08:40 +01:00
Sanjay Patel 4b30076f16 [InstSimplify] add logic fold for 'or'
https://alive2.llvm.org/ce/z/4PaPDy

There's a related fold where the inner 'or' is replaced by 'and',
but that needs to be more careful about matching a 'not'.
2021-11-30 14:08:54 -05:00
Sanjay Patel c49ef1448d [InstSimplify] reduce code duplication for 'or' logic folds; NFC 2021-11-30 14:08:54 -05:00
Sanjay Patel 7a7c059d86 [InstSimplify] reduce code duplication for 'or' logic fold; NFC 2021-11-30 12:55:37 -05:00
Sanjay Patel 8dec0b23da [InstSimplify] refactor 'or' logic folds; NFC
Reduce duplication for handling the top-level commuted operands.
There are several other folds that should be moved in here, but
we need to make sure there's good test coverage.
2021-11-30 12:55:36 -05:00
Nikita Popov 40d5eeac6c Revert "Use VersionTuple for parsing versions in Triple. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible."
This reverts commit 1e82864670.

llvm/test/Transforms/LoopStrengthReduce/X86/2009-11-10-LSRCrash.ll fails
with assertion failure:

llc: /home/nikic/llvm-project/llvm/include/llvm/ADT/Optional.h:196: T& llvm::optional_detail::OptionalStorage<T, true>::getValue() & [with T = unsigned int]: Assertion `hasVal' failed.
...
 #8 0x00005633843af5cb llvm::MCStreamer::emitVersionForTarget(llvm::Triple const&, llvm::VersionTuple const&)
 #9 0x0000563383b47f14 llvm::AsmPrinter::doInitialization(llvm::Module&)
2021-11-30 18:36:32 +01:00
kpyzhov a356dae74c [RegionPass] Added check for -filter-print-funcs option to the region IR dumps.
Differential Revision: https://reviews.llvm.org/D114310
2021-11-30 12:30:15 -05:00
Nikita Popov 37d72991c1 [SCEV] Track and invalidate ValuesAtScopes users
ValuesAtScopes maps a SCEV and a Loop to another SCEV. While we
invalidate entries if the left-hand SCEV is invalidated, we
currently don't do this for the right-hand SCEV. Fix this by
tracking users in a reverse map and using it for invalidation.

This is conceptually the same change as D114738, but using the
reverse map to avoid performance issues.

Differential Revision: https://reviews.llvm.org/D114788
2021-11-30 18:21:14 +01:00
James Farrell 1e82864670 Use VersionTuple for parsing versions in Triple. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible.
See also https://github.com/android/ndk/issues/1455.

Differential Revision: https://reviews.llvm.org/D114163
2021-11-30 15:44:23 +00:00
Nikita Popov 77dd579827 [SCEV] Remove incorrect assert
Fix assertion failure reported on D113349 by removing the assert.
While the produced expression should be equivalent, it may not
be strictly the same, e.g. due to lazy nowrap flag updates. Similar
to what the main createSCEV() code does, simply retain the old
value map entry if one already exists.
2021-11-29 17:09:12 +01:00
Florian Hahn 7b75110fac
[SCEV] Turn validity check in getExistingSCEV into assert (NFC).
Now that we track users of SCEV expressions, we should be able to always
invalidate containing expressions.

With that, I think the case where a value gets removed but
SCEVs containing references to it should not be possible any longer.
Turn check into an assert.

This slightly reduces compile-time:

NewPM-O3: -0.27%
NewPM-ReleaseThinLTO: -0.21%
NewPM-ReleaseLTO-g: -0.26%

http://llvm-compile-time-tracker.com/compare.php?from=c3dc6b081da6ba503e67d260033f81f61eb38ea3&to=95a4a028b1f1dd0bc3d221435953b7d2c031b3d5&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114633
2021-11-28 12:16:55 +00:00
Nikita Popov f492a414ba [SCEV] Simplify forgetSymbolicName() (NFCI)
With the recently introduced tracking as well as D113349, we can
greatly simplify forgetSymbolicName(). In fact, we can simply
replace it with forgetMemoizedResults().

What forgetSymbolicName() used to do is to walk the IR use-def
chain to find all SCEVs that mention the SymbolicName. However,
thanks to use tracking, we can now determine the relevant SCEVs
in a more direct way. D113349 is needed to also clear out the
actual IR to SCEV mapping in ValueExprMap.

Differential Revision: https://reviews.llvm.org/D114263
2021-11-27 16:42:38 +01:00
Nikita Popov c2550e3427 [SCEV] Simplify invalidation after BE count calculation (NFCI)
After backedge taken counts have been calculated, we want to
invalidate all addrecs and dependent expressions in the loop,
because we might compute better results with the newly available
backedge taken counts. Previously this was done with a forgetLoop()
style use-def walk. With recent improvements to SCEV invalidation,
we can instead directly invalidate any SCEVs using addrecs in this
loop. This requires a great deal less subtlety to avoid invalidating
more than necessary, and in particular gets rid of the hack from
D113349. The change is similar to D114263 in spirit.
2021-11-27 16:35:06 +01:00
Nikita Popov 2b160e95c8 Reland [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency
Relative to the previous landing attempt, this introduces an additional
flag on forgetMemoizedResults() to not remove SCEVUnknown phis from
the value map. The invalidation after BECount calculation wants to
leave these alone and skips them in its own use-def walk, but we can
still end up invalidating them via forgetMemoizedResults() if there
is another IR value with the same SCEV. This is intended as a temporary
workaround only, and the need for this should go away once the
getBackedgeTakenInfo() invalidation is refactored in the spirit of
D114263.

-----

This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:

* Addrec construction directly wrote to ValueExprMap in a few places,
  without updating ExprValueMap. Add a helper to ensures they stay
  consistent. The adjustment in forgetSymbolicName() explicitly
  drops the old value from the map, so that we don't rely on it
  being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
  ExprValueMap, but not dropping the corresponding entries from
  ValueExprMap.

Differential Revision: https://reviews.llvm.org/D113349
2021-11-27 12:37:15 +01:00
Erik Desjardins 53b00b8215 [InstSimplify] Fold X {lshr,udiv} C <u X --> true for nonzero X, non-identity C
This eliminates the bounds check in Rust code like

pub fn mid(data: &[i32]) -> i32 {
  if data.is_empty() { return 0; }
  return data[data.len()/2];
}

(from https://blog.sigplan.org/2021/11/18/undefined-behavior-deserves-a-better-reputation/)

Alive proofs:
lshr https://alive2.llvm.org/ce/z/nyTu8D
udiv https://alive2.llvm.org/ce/z/CNUZH7

Differential Revision: https://reviews.llvm.org/D114279
2021-11-26 16:48:33 -05:00
Nikita Popov 719354a571 Revert "[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency"
This reverts commit bee8dcda1f.

Some sanitizer buildbots fail with:
> Attempt to use a SCEVCouldNotCompute object!

For example:
https://lab.llvm.org/buildbot/#/builders/85/builds/7020/steps/9/logs/stdio
2021-11-26 22:18:23 +01:00
Nikita Popov bee8dcda1f [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency
Relative to the previous landing attempt, this makes
insertValueToMap() resilient against the value already being
present in the map -- previously I only checked this for the
createSimpleAffineAddRec() case, but the same issue can also
occur for the general createNodeForPHI(). In both cases, the
addrec may be constructed and added to the map in a recursive
query trying to create said addrec. In this case, this happens
due to the invalidation when the BE count is computed, which
ends up clearing out the symbolic name as well.

-----

This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:

* Addrec construction directly wrote to ValueExprMap in a few places,
  without updating ExprValueMap. Add a helper to ensures they stay
  consistent. The adjustment in forgetSymbolicName() explicitly
  drops the old value from the map, so that we don't rely on it
  being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
  ExprValueMap, but not dropping the corresponding entries from
  ValueExprMap.

Differential Revision: https://reviews.llvm.org/D113349
2021-11-26 20:57:47 +01:00
Florian Hahn b927aa69bf
[SCEV] Turn check in createSimpleAffineAddRec to assertion. (NFC)
Accum is guaranteed to be defined outside L (via Loop::isLoopInvariant
checks above). I think that should guarantee that the more powerful
ScalarEvolution::isLoopInvariant also determines that the value is loop
invariant.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114634
2021-11-26 13:23:48 +00:00
Zarko Todorovski 95875d246a [LLVM][NFC]Inclusive language: remove occurances of sanity check/test from llvm
Part of work to use more inclusive language in clang/llvm. Rewording
some comments and change function and variable names.
2021-11-24 17:29:55 -05:00
Peter Waller 787b66eb5f [LoopAccessAnalysis][SVE] Bail out for scalable vectors
The supplied test case, reduced from real world code, crashes with a
'Invalid size request on a scalable vector.' error.

Since it's similar in spirit to an existing LAA test, rename the file to
generalize it to both.

Differential Revision: https://reviews.llvm.org/D114155
2021-11-24 15:52:20 +00:00
Sanjay Patel b326c05814 [InstSimplify] fold xor logic of 2 variables, part 2
(~a & b) ^ (a | b) --> a

This is the swapped and/or (Demorgan?) sibling fold for
the fold added with D114462 ( 892648b18a ).

This case is easier to specify because we are returning
a root value, not a 'not':
https://alive2.llvm.org/ce/z/SRzj4f
2021-11-24 08:15:47 -05:00
Rosie Sumpter c2441b6b89 [LoopVectorize] Add vector reduction support for fmuladd intrinsic
Enables LoopVectorize to handle reduction patterns involving the
llvm.fmuladd intrinsic.

Differential Revision: https://reviews.llvm.org/D111555
2021-11-24 08:50:04 +00:00
Florian Mayer 6c06d8e310 [stack-safety] Check SCEV constraints at memory instructions.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D113160
2021-11-23 15:29:23 -08:00
Florian Hahn 73a05cc8df
[LAA] Move visitPointers up in file (NFC).
This allows easier re-use in earlier functions.
2021-11-23 22:47:26 +00:00
Sanjay Patel 892648b18a [InstSimplify] fold xor logic of 2 variables
(a & b) ^ (~a | b) --> ~a

I was looking for a shortcut to reduce some of the complex logic
folds that are currently up for review (D113216
and others in that stack), and I found this missing from
instcombine/instsimplify.

There is a trade-off in putting it into instsimplify: because
we can't create new values here, we need a strict 'not' op (no
undef elements). Otherwise, the fold is not valid:
https://alive2.llvm.org/ce/z/k_AGGj

If this was in instcombine instead, we could create the proper
'not'. But having the fold here benefits other passes like GVN
that use instsimplify as an analysis.

There is a related fold where 'and' and 'or' are swapped, and
that is planned as a follow-up commit.

Differential Revision: https://reviews.llvm.org/D114462
2021-11-23 16:50:23 -05:00
Florian Hahn 0a00d64e32
[LAA] Turn aggregate type check into assertion (NFCI).
getPtrStride should not be called with aggregate access types. There's
also an old TODO.

Turn the check into an assertion.
2021-11-23 17:37:30 +00:00
Paul Robinson c075566c8d [PS4][TLI] Remove redundant line 2021-11-23 08:42:32 -08:00
Nikita Popov 62e9acad0a Revert "[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency"
This reverts commit d633db8f9d.

Causes bootstrap assertion failures:
https://lab.llvm.org/buildbot/#/builders/168/builds/3459/steps/9/logs/stdio
2021-11-22 15:47:33 +01:00
Nikita Popov d633db8f9d [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency
This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:

* Addrec construction directly wrote to ValueExprMap in a few places,
  without updating ExprValueMap. Add a helper to ensures they stay
  consistent. The adjustment in forgetSymbolicName() explicitly
  drops the old value from the map, so that we don't rely on it
  being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
  ExprValueMap, but not dropping the corresponding entries from
  ValueExprMap.

Differential Revision: https://reviews.llvm.org/D113349
2021-11-22 15:27:25 +01:00
Simon Moll 56db1c072c [DA][NFC] Update publication - add remarks
Update the reference publication for the SyncDependenceAnalysis and Divergence Analysis.  Fix phrasing, formatting. Add comments on reducible loop limitation.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D114146
2021-11-22 12:58:19 +01:00
Sjoerd Meijer 4d21b64464 [BPI] Look-up tables for non-loop branches. NFC.
This adds and uses look-up tables for non-loop branch probabilities, which have
have probabilities directly encoded into the tables for the different condition
codes. Compared to having this logic inlined in different functions, as it used
to be the case, I think this is compacter and thus also easier to check/cross
reference. This also adds a test for pointer heuristics that was missing.

Differential Revision: https://reviews.llvm.org/D114009
2021-11-22 10:30:42 +00:00
Kazu Hirata f6bce30cf9 [llvm] Use range-based for loops (NFC) 2021-11-20 18:42:10 -08:00
Nikita Popov 0a2bde94a0 [LVI] Drop requirement that modulus is constant
If we're looking only at the lower bound, the actual modulus
doesn't matter. This is a leftover from when I wanted to consider
the upper bound as well, where the modulus does matter.
2021-11-20 21:06:08 +01:00
Nikita Popov cd84cab6b3 [LVI] Support urem in implied conditions
If (X urem M) >= C we know that X >= C. Make use of this fact
when computing the implied condition range.

In some cases we could also establish an upper bound, but that's
both tricker and not interesting in practice.

Alive: https://alive2.llvm.org/ce/z/R5ZGSW
2021-11-20 21:01:26 +01:00
Philip Reames 28000587e1 [SCEV] Revert two speculative compile time optimizations which made no difference
Revert "[SCEV] Defer all work from ea12c2cb as late as possible"
Revert "[SCEV] Defer loop property checks from ea12c2cb as late as possible"

This reverts commit 734abbad79 and  1a5666acb2.

Both of these changes were speculative attempts to address a compile time regression.  Neither worked, and both complicated the code in undesirable ways.
2021-11-19 08:45:56 -08:00
Philip Reames 734abbad79 [SCEV] Defer all work from ea12c2cb as late as possible
This is a second speculative compile time optimization to address a reported regression.  My actual suspicion is that availability of no-self-wrap is making some *other* bit of code trigger, but let's rule this out.
2021-11-18 17:19:52 -08:00
Philip Reames 1a5666acb2 [SCEV] Defer loop property checks from ea12c2cb as late as possible
This is a speculative compile time optimization to address a reported regression.  It's the only thing which vaguely makes sense.
2021-11-18 13:47:45 -08:00
Philip Reames ea12c2cb9c [SCEV] Move mustprogress based no-self-wrap logic so it applies to all exit conditions
This change moves logic which we'd added specifically for less than tests so that it applies to equalities and greater than tests as well. The basic idea is that if we can show an IV cycles infinitely through the same series on self-wrap, and that the exit condition must be taken to prevent UB, we can conclude that it must be taken before self-wrap and thus infer said flag.

The motivation here is simple loops with unsigned induction variables w/non-one steps and inequality tests. A toy example would be:
for (unsigned i = 0; i != N; i += 2) { body; }

If body contains no side effects, and this is a mustprogress function, we can assume that this must be a finite loop and thus that the exit count is N/2.

Differential Revision: https://reviews.llvm.org/D103991
2021-11-18 10:07:44 -08:00
Kazu Hirata 7ca14f6044 [llvm] Use range-based for loops (NFC) 2021-11-18 09:09:52 -08:00
Kerry McLaughlin ff64b2933a [LoopVectorize] Check the number of uses of an FAdd before classifying as ordered
checkOrderedReductions looks for Phi nodes which can be classified as in-order,
meaning they can be vectorised without unsafe math. In order to vectorise the
reduction it should also be classified as in-loop by getReductionOpChain, which
checks that the reduction has two uses.

In this patch, a similar check is added to checkOrderedReductions so that we
now return false if there are more than two uses of the FAdd instruction.
This fixes PR52515.

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D114002
2021-11-18 16:41:19 +00:00
Florian Hahn da9f2ba3b1
[SCEV] Reorder operands checks in collectConditions.
The initial two cases require a SCEVConstant as RHS. Pull up the condition
to check and swap SCEVConstants from below. Also remove a redundant
check & swap if RHS is SCEVUnknown.
2021-11-18 09:36:16 +00:00
Philip Reames ad69402f3e [SCEVAA] Avoid forming malformed pointer diff expressions
This solves the same crash as in D104503, but with a different approach.

The test case test_non_dom demonstrates a case where scev-aa crashes today. (If exercised either by -eval-aa or -licm.) The basic problem is that SCEV-AA expects to be able to compute a pointer difference between two SCEVs for any two pair of pointers we do an alias query on. For (valid, but out of scope) reasons, we can end up asking whether expressions in different sub-loops can alias each other. This results in a subtraction expression being formed where neither operand dominates the other.

The approach this patch takes is to leverage the "defining scope" notion we introduced for flag semantics to detect and disallow the formation of the problematic SCEV. This ends up being relatively straight forward on that new infrastructure. This change does hint that we should probably be verifying a similar property for all SCEVs somewhere, but I'll leave that to a follow on change.

Differential Revision: D114112
2021-11-17 12:38:04 -08:00
Arthur Eubanks e3e25b5112 [NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor
In a CGSCC pass manager, we may visit the same function multiple times
due to SCC mutations. In the inliner pipeline, this results in running
the function simplification pipeline on a function multiple times even
if it hasn't been changed since the last function simplification
pipeline run.

We use a newly introduced analysis to keep track of whether or not a
function has changed since the last time the function simplification
pipeline has run on it. If we see this analysis available for a function
in a CGSCCToFunctionPassAdaptor, we skip running the function passes on
the function. The analysis is queried at the end of the function passes
so that it's available after the first time the function simplification
pipeline runs on a function. This is a per-adaptor option so it doesn't
apply to every adaptor.

The goal of this is to improve compile times. However, currently we
can't turn this on by default at least for the higher optimization
levels since the function simplification pipeline is not robust enough
to be idempotent in many cases, resulting in performance regressions if
we stop running the function simplification pipeline on a function
multiple times. We may be able to turn this on for -O1 in the near
future, but turning this on for higher optimization levels would require
more investment in the function simplification pipeline.

Heavily inspired by D98103.

Example compile time improvements with flag turned on:
https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113947
2021-11-17 09:06:46 -08:00
Florian Hahn e8b55cf7b7
[SCEV] Apply loop guards when computing max BTC for arbitrary steps.
Similar other cases in the current function (e.g. when the step is 1 or
-1), applying loop guards can lead to tighter upper bounds for the
backedge-taken counts.

Fixes PR52464.

Reviewed By: reames, nikic

Differential Revision: https://reviews.llvm.org/D113578
2021-11-17 11:00:49 +00:00
Philip Reames 8d85e945b2 [SCEV] Canonicalize X - urem X, Y patterns
There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version.

The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert the the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency.

The ARM loop hardware interaction in the test diff is opague to me, but the comments in the review from others knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions.

Differential Revision: https://reviews.llvm.org/D114018
2021-11-16 11:59:21 -08:00
Arthur Eubanks c95a9f46c9 [Loads] Handle addrspacecast constant expressions when determining dereferenceability
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114015
2021-11-16 11:17:57 -08:00
Florian Hahn b7aec4f08e
[SCEV] Support rewriting ZExt expressions with loop guard info.
So far, applying loop guard information has been restricted to
SCEVUnknown. In a few cases, like PR40961 and PR52464, this leads to
SCEV failing to determine tight upper bounds for the backedge taken
count.

This patch adjusts SCEVLoopGuardRewriter and applyLoopGuards to support
re-writing ZExt expressions.

This is a first step towards fixing  PR40961 and PR52464.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D113577
2021-11-16 11:16:07 +00:00
Mehrnoosh Heidarpour 62c51a72f9 [InstSimplify] Fold A|B | (A^B) --> A|B
This patch adds the following fold opportunity:
A|B | (A^B) --> A|B

that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479

https://alive2.llvm.org/ce/z/33-My-

Test cases with base results are added in D113860

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D113861
2021-11-15 18:55:04 -05:00
Stanislav Mekhanoshin 833cdb0a07 Revert "[InstSimplify] Fold A|B | (A^B) --> A|B"
This reverts commit 193c40e966.
2021-11-15 14:56:20 -08:00
Arthur Eubanks 19867de9e7 [NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved since we've already handled function analysis invalidation.

So far this only touches the inliner, argpromotion, function-attrs, and
updateCGAndAnalysisManager(), since they are the most used.

This is part of an effort to investigate running the function
simplification pipeline less on functions we visit multiple times in the
inliner pipeline.

However, this causes major memory regressions especially on larger IR.
To counteract this, turn on the option to eagerly invalidate function
analyses. This invalidates analyses on functions immediately after
they're processed in a module or scc to function adaptor for specific
parts of the pipeline.

Within an SCC, if a pass only modifies one function, other functions in
the SCC do not have their analyses invalidated, so in later function
passes in the SCC pass manager the analyses may still be cached. It is
only after the function passes that the eager invalidation takes effect.
For the default pipelines this makes sense because the inliner pipeline
runs the function simplification pipeline after all other SCC passes
(except CoroSplit which doesn't request any analyses).

Overall this has mostly positive effects on compile time and positive effects on memory usage.
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss

D113196 shows that we slightly regressed compile times in exchange for
some memory improvements when turning on eager invalidation.  D100917
shows that we slightly improved compile times in exchange for major
memory regressions in some cases when invalidating less in SCC passes.
Turning these on at the same time keeps the memory improvements while
keeping compile times neutral/slightly positive.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113304
2021-11-15 14:44:53 -08:00
Stanislav Mekhanoshin 193c40e966 [InstSimplify] Fold A|B | (A^B) --> A|B
This patch adds the following fold opportunity:
A|B | (A^B) --> A|B

that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479

https://alive2.llvm.org/ce/z/33-My-

Test cases with base results are added in D113860

(authored by MehrHeidar, committed by rampitec).

Differential Revision:  https://reviews.llvm.org/D113861
2021-11-15 13:49:20 -08:00
Florian Hahn 112c1c346a
[IVDescriptor] Make sure the sign is included for negative extension.
At the moment, computeRecurrenceType does not include any sign bits in
the maximum bit width. If the value can be negative, this means the sign
bit will be missing and the sext won't properly extend the value.

If the value can be negative, increment the bitwidth by one to make sure
there is at least one sign bit in the result value.

Note that the increment is also needed *if* the value is *known* to be
negative, as a sign bit needs to be preserved for the sext to work.

Note that this at the moment prevents vectorization, because the
analysis computes i1 as type for the recurrence when looking through the
AND in lookThroughAnd.

Fixes PR51794, PR52485.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D113056
2021-11-15 13:12:57 +00:00
Kazu Hirata d243cbf8ea [llvm] Use isa instead of dyn_cast (NFC) 2021-11-14 19:40:46 -08:00
Mircea Trofin a32c2c3808 [NFC] Use Optional<ProfileCount> to model invalid counts
ProfileCount could model invalid values, but a user had no indication
that the getCount method could return bogus data. Optional<ProfileCount>
addresses that, because the user must dereference the optional. In
addition, the patch removes concept duplication.

Differential Revision: https://reviews.llvm.org/D113839
2021-11-14 19:03:30 -08:00
Kazu Hirata 7379736774 [llvm] Use range-based for loops with User::operands (NFC) 2021-11-14 09:32:38 -08:00
Roman Lebedev e876698a5d
[NFC][TTI] `getReplicationShuffleCost()`: s/Replicated/Dst/
'Replicated' is mouthful and somewhat ambigious,
while 'destination' is pretty self-explanatory.
2021-11-14 20:01:38 +03:00
Florian Hahn 8ed8d37088
[SCEV] Update SCEVLoopGuardRewriter to hold reference to map. (NFC)
SCEVLoopGuardRewriter doesn't need to copy the rewrite map. It can just
hold a const reference instead, to avoid an unnecessary copy.
2021-11-13 09:39:14 +00:00
Florian Hahn 03cfea68c6
[SCEV] Update SCEVLoopGuardRewriter to take SCEV -> SCEV map (NFC).
Split off refactoring from D113577 to reduce the diff. NFC as the new
interface will only be used in D113577.
2021-11-12 18:16:03 +00:00
Florian Hahn 819bca9b90
[SCEV] Use APIntOps::umin to select best max BC count (NFC).
Suggested in D102267, but I missed this in the committed version.
2021-11-12 12:20:01 +00:00
Mircea Trofin f64eee1625 [NFC][InlineAdvisor] Inform advisor when the module is invalidated
This avoids unnecessary re-calculation of module-wide features in the
MLInlineAdvisor. In cases where function passes don't invalidate
functions (and, thus, don't invalidate the module), but we re-process a
CGSCC, we currently refreshed module features unnecessarily. The
overhead of fetching cached results (albeit they weren't themselves
invalidated) was noticeable in certain modules' compilations.

We don't want to just invalidate the advisor object, though, via the
analysis manager, because we'd then need to re-create expensive state
(like the model evaluator in the ML 'development' mode).

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D113644
2021-11-11 10:23:49 -08:00
duanbo.db 53dc525828 [LoopInfo] Fix function getInductionVariable
The way function gets the induction variable is by judging whether
StepInst or IndVar in the phi statement is one of the operands of CMP.
But if the LatchCmpOp0/LatchCmpOp1 is a constant,  the subsequent
comparison may result in null == null, which is meaningless. This patch
fixes the typo.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D112980
2021-11-11 16:22:42 +08:00
Bin Cheng bf76e64854 [BPI] Push exit block rather than exiting ones in getSccExitBlocks
The function BranchProbabilityInfo::SccInfo::getSccExitBlocks is
supposed to collect all exit blocks for SCC rather than all exiting
blocks. This patch fixes the typo.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D113344
2021-11-11 14:22:19 +08:00
Chris Jackson 116dc70cf3 [DebugInfo][LSR] Add more stringent checks on IV selection and salvage
attempts

Prevent the selection of IVs that have a SCEV containing an undef. Also
prevent salvaging attempts for values for which a SCEV could not be
created by ScalarEvolution and have only SCEVUknown.

Reviewed by: Orlando

Differential Revision: https://reviews.llvm.org/D111810
2021-11-09 13:09:37 +00:00
Roman Lebedev d484cc152b
[TTI] Adjust `getReplicationShuffleCost()` interface
It is trivial to produce DemandedSrcElts given DemandedReplicatedElts,
so don't pass the former. Also, it isn't really useful so far
to have the overload taking the Mask, so just inline it.
2021-11-09 14:07:59 +03:00
Michael Liao bf225939bc [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.
- CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC.
- This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g.,

```
  foo(float *p) {
    __builtin_assume(__isGlobal(p));
    // From there, we could assume p is a global pointer instead of a
    // generic one.
  }
```

This makes the code portable without introducing the implementation-specific features.

Note that NVCC starts to support __builtin_assume from version 11.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112041
2021-11-08 16:51:57 -05:00
Sander de Smalen 2829376bb2 [LV] Use VScaleForTuning to fine-tune the cost per lane.
When targeting a specific CPU with scalable vectorization, the knowledge
of that particular CPU's vscale value can be used to tune the cost-model
and make the cost per lane less pessimistic.

If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane
is calculated as:

  Cost / (VScaleForTuning * VF.KnownMinLanes)

Otherwise, it assumes a value of 1 meaning that the behavior
is unchanged and calculated as:

  Cost / VF.KnownMinLanes

Reviewed By: kmclaughlin, david-arm

Differential Revision: https://reviews.llvm.org/D113209
2021-11-08 16:59:46 +00:00
Nikita Popov a8c318b50e [BasicAA] Use index size instead of pointer size
When accumulating the GEP offset in BasicAA, we should use the
pointer index size rather than the pointer size.

Differential Revision: https://reviews.llvm.org/D112370
2021-11-07 18:56:11 +01:00
Benjamin Kramer 9b8b16457c Put implementation details into anonymous namespaces. NFCI. 2021-11-07 15:18:30 +01:00
Kazu Hirata 843d1eda18 [llvm] Use llvm::reverse (NFC) 2021-11-06 19:31:18 -07:00
Nikita Popov e3cec17b2d [InstSimplify] Remove incorrect icmp of gep fold (PR52429)
As described in https://bugs.llvm.org/show_bug.cgi?id=52429 this
fold is incorrect, because inbounds only guarantees that the
pointers don't wrap in the unsigned space: It is possible that
the sign boundary is crossed by an object.

I'm dropping the fold entirely rather than adjusting it, because
computePointerICmp() fully subsumes it (just with correct predicate
handling).

Differential Revision: https://reviews.llvm.org/D113343
2021-11-06 21:03:21 +01:00
Roman Lebedev a30ec4778a
[TTI][CostModel] `getUserCost()`: recognize replication shuffles and query their cost
This finally creates proper test coverage for replication shuffles,
that are used by LV for conditional loads, and will allow to add
proper costmodel at least for AVX512.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113324
2021-11-06 16:45:15 +03:00
Roman Lebedev f8efc5c0ac
[NFC][TTI] Add/extract `getReplicationShuffleCost()` method, deduplicate it's implementations
Hiding it in `getInterleavedMemoryOpCost()` is problematic for a number of reasons,
including testability and reuse, let's do better.

In a followup `getUserCost()` will be taught to use to to estimate the mask costs,
which will allow for better cost model tests for it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113313
2021-11-06 16:45:15 +03:00
Kazu Hirata 87e53a0ad8 [llvm] Use make_early_inc_range (NFC) 2021-11-05 19:39:07 -07:00
Philip Reames d24a0e8857 [SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic
The basic idea here is that given a zero extended narrow IV, we can prove the inner IV to be NUW if we can prove there's a value the inner IV must take before overflow which must exit the loop.

Differential Revision: https://reviews.llvm.org/D109457
2021-11-05 15:36:47 -07:00
David Green 61225c0818 [ValueTracking][InstCombine] Introduce and use ComputeMinSignedBits
This introduces a new ComputeMinSignedBits method for ValueTracking that
returns the BitWidth - SignBits + 1 from ComputeSignBits, and represents
the minimum bit size for the value as a signed integer.  Similar to the
existing APInt::getMinSignedBits method, this can make some of the
reasoning around ComputeSignBits more natural.

See https://reviews.llvm.org/D112298
2021-11-05 14:41:37 +00:00
Arthur Eubanks 7175886a0f [NewPM] Make eager analysis invalidation per-adaptor
Follow-up change to D111575.
We don't need eager invalidation on every adaptor. Most notably,
adaptors running passes that use very few analyses, or passes that
purely invalidate specific analyses.

Also allow testing of this via a pipeline string
"function<eager-inv>()".

The compile time/memory impact of this is very comparable to D111575.
https://llvm-compile-time-tracker.com/compare.php?from=9a2eec512a29df45c90c2fcb741e9d5c693b1383&to=b9f20bcdea138060967d95a98eab87ce725b22bb&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D113196
2021-11-04 17:16:11 -07:00
Liren Peng 57e093162e [ScalarEvolution] Infer loop max trip count from array accesses
Data references in a loop should not access elements over the
statically allocated size. So we can infer a loop max trip count
from this undefined behavior.

Reviewed By: reames, mkazantsev, nikic

Differential Revision: https://reviews.llvm.org/D109821
2021-11-03 10:40:18 +08:00
Nikita Popov c00e9c6345 [BasicAA] Check known access sizes earlier (NFC)
All heuristics for variable accesses require both access sizes to
be known, so check this once at the start, rather than for each
particular heuristic.
2021-11-02 21:26:26 +01:00
Nikita Popov 0b6ed92c8a [BasicAA] Use early returns (NFC)
Reduce nesting in aliasGEP() a bit by returning early.
2021-11-02 21:17:36 +01:00
Nikita Popov 51e9f33603 [BasicAA] Use saturating multiply on range if nsw
If we know that the var * scale multiplication is nsw, we can use
a saturating multiplication on the range (as a good approximation
of an nsw multiply). This recovers some cases where the fix from
D112611 is unnecessarily strict. (This can be further strengthened
by using a saturating add, but we currently don't track all the
necessary information for that.)

This exposes an issue in our NSW tracking for multiplies. The code
was assuming that (X +nsw Y) *nsw Z results in
(X *nsw Z) +nsw (Y *nsw Z) -- however, it is possible that the
distributed multiplications overflow, even if the non-distributed
one does not. We should discard the nsw flag if the the offset is
non-zero. If we just have (X *nsw Y) *nsw Z then concluding
X *nsw (Y *nsw Z) is fine.

Differential Revision: https://reviews.llvm.org/D112848
2021-11-02 20:27:39 +01:00
Arthur Eubanks e2024d72fa Revert "[NFC] Remove LinkAll*.h"
This reverts commit fe364e5dc7.

Causes breakages, e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/5266
2021-11-02 09:08:09 -07:00
Arthur Eubanks fe364e5dc7 [NFC] Remove LinkAll*.h
These were added to prevent functions from being removed by WPO.

But that doesn't make sense, correct WPO will not remove functions we actually use.

I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112971
2021-11-02 08:43:17 -07:00
Arthur Eubanks 029f1a5344 [LazyCallGraph] Skip blockaddresses
blockaddresses do not participate in the call graph since the only
instructions that use them must all return to someplace within the
current function. And passes cannot retrieve a function address from a
blockaddress.

This was suggested by efriedma in D58260.

Fixes PR50881.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112178
2021-11-01 13:10:24 -07:00
Nikita Popov 4972d12185 [SCEV] Only add direct loop users (NFC)
It it now sufficient to track only direct addrec users of a loop,
and let the SCEVUsers mechanism track and invalidate transitive users.

Differential Revision: https://reviews.llvm.org/D112875
2021-11-01 18:49:43 +01:00
Max Kazantsev e512c5b166 [SCEV][NFC] Factor out common API for getting unique operands of a SCEV
This function is used at least in 2 places, to it makes sense to make it separate.

Differential Revision: https://reviews.llvm.org/D112516
Reviewed By: reames
2021-11-01 11:36:47 +07:00
Kazu Hirata c8b1ed5fb2 [clang, llvm] Use Optional::getValueOr (NFC) 2021-10-30 19:00:21 -07:00
David Green 2c4a9e830c [ValueTracking] Teach computeConstantRange that the maximum value of a half is 65504
The maximal value of a half is 0x7bff, which is 65504 when converted to
an integer. This patch teaches that to computeConstantRange to compute a
constant range with the correct maximum value.
https://alive2.llvm.org/ce/z/BV_Spb
https://alive2.llvm.org/ce/z/Nwuqvb

The maximum value for a float converted in the same way is 3.4e38, which
requires 129bits of data. I have not added that here as integer types so
larger are rare, compared to integers types larger than 17 bits require
for half floats.

The MVE tests change because instsimplify happens to be run as a part of
the backend, where it doesn't tend to for other backends.

Differential Revision: https://reviews.llvm.org/D112694
2021-10-30 14:27:38 +01:00
Kazu Hirata 972d4133e9 Use {DenseSet,SmallPtrSet}::contains (NFC) 2021-10-29 20:26:07 -07:00
Nikita Popov cdf45f98ca [BasicAA] Extract linear expression multiplication (NFC)
Extract a common method for multiplying a linear expression by a
factor.
2021-10-29 22:41:40 +02:00
Nikita Popov 7cf7378a9d [BasicAA] Don't treat non-inbounds GEP as nsw
The scale multiplication is only guaranteed to be nsw if the GEP
is inbounds (or the multiplication is trivial). Previously we were
only considering explicit muls in GEP indices.
2021-10-29 22:30:44 +02:00
modimo 5caad9b5d3 [InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner
Adds the following switches:

1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are:

 1. Original: defers to original advisor
 2. AlwaysInline: inline all sites not in replay
 3. NeverInline: inline no sites not in replay

2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are:

  1. Line
  2. LineColumn
  3. LineDiscriminator
  4. LineColumnDiscriminator

Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks.

All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking in how to apply replay. In my testing, I'm using:
1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function
2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system
3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang

An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks which closer matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another.

Updated various tests to cover the new switches and negative remarks

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D112040
2021-10-29 12:32:03 -07:00
Peter Waller 98f08752f7 [InstCombine][ConstantFolding] Make ConstantFoldLoadThroughBitcast TypeSize-aware
The newly added test previously caused the compiler to fail an
assertion. It looks like a strightforward TypeSize upgrade.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D112142
2021-10-28 12:15:15 +00:00
Max Kazantsev 513914e1f3 [SCEV] Invalidate user SCEVs along with operand SCEVs to avoid cache corruption
Following discussion in D110390, it seems that we are suffering from unability
to traverse users of a SCEV being invalidated. The result of that is that ScalarEvolution's
inner caches may store obsolete data about SCEVs even if their operands are
forgotten. It creates problems when we try to verify the contents of those caches.

It's also a frequent situation when messing with cache causes very sneaky and
hard-to-analyze bugs related to corruption of memory when dealing with cached
data. They are lurking there because ScalarEvolution's veirfication is not powerful
enough and misses many problematic cases. I plan to make SCEV's verification
much stricter in follow-ups, and this requires dangling-pointers-free caches.

This patch makes sure that, whenever we forget cached information for a SCEV,
we also forget it for all SCEVs that (transitively) use it.

This may have negative compile time impact. It's a sacrifice we are more
than willing to make to enforce correctness. We can also save some time by
reworking invokers of forgetMemoizedResults (maybe we can forget multiple
SCEVs with single query).

Differential Revision: https://reviews.llvm.org/D111533
Reviewed By: reames
2021-10-28 09:39:24 +07:00
Nikita Popov 665060ea45 [BasicAA] Remove misleading overflow check
GEP decomposition currently checks whether the multiplication of
the linear expression offset and GEP scale overflows. However, if
everything else works correctly, this overflow check is both
unnecessary and dangerously misleading. While it will avoid an
overflow in Scale * Offset in particular, other parts of the
calculation (including those on dynamic values) may still overflow.
The code working on the decomposed GEPs is responsible for ensuring
that it remains correct in the presence of overflow. D112611 fixes
the last issue of that kind that I'm aware of (in fact, the overflow
check was originally introduced to work around precisely that issue).

Differential Revision: https://reviews.llvm.org/D112618
2021-10-27 20:56:03 +02:00
Philip Reames 425cbbc602 [Operator] Add hasPoisonGeneratingFlags [mostly NFC]
This method parallels the dropPoisonGeneratingFlags on Instruction, but is hoisted to operator to handle constant expressions as well.

This is mostly code movement, but I did go ahead and add the inrange constexpr gep case.  This had been discussed previously, but apparently never followed up o.
2021-10-27 11:25:40 -07:00
Nikita Popov fbc0c308d5 [BasicAA] Handle known bits as ranges
BasicAA currently tries to determine that the offset is positive by
checking whether all variable indices are positive based on known
bits, multiplied by a positive scale. However, this is incorrect
if the scale multiplication might overflow. In the modified test
case the original value is positive, but may be negative after a
left shift.

Fix this by converting known bits into a constant range and reusing
the range-based logic, which handles overflow correctly.

Differential Revision: https://reviews.llvm.org/D112611
2021-10-27 14:41:31 +02:00
Nikita Popov 9bc7e543b4 [BasicAA] Make range check more precise
Make the range check more precise by calculating the range of
potentially accessed bytes for both accesses and checking whether
their intersection is empty. In that case there can be no overlap
between the accesses and the result is NoAlias.

This is more powerful than the previous approach, because it can
deal with sign-wrapped ranges. In the test case the original range
is [-1, INT_MAX] but becomes [0, INT_MIN] after applying the offset.
This is a wrapping range, so getSignedMin/getSignedMax will treat
it as a full range. However, the range excludes the elements
[INT_MIN+1, -1], which is enough to prove NoAlias with an access
at offset -1.

Differential Revision: https://reviews.llvm.org/D112486
2021-10-27 12:40:58 +02:00
Max Kazantsev 5961f0308f [SCEV][NFC] Verify intergity of SCEVUsers
Make sure that, for every living SCEV, we have all its direct
operand tracking it as their user.

Differential Revision: https://reviews.llvm.org/D112402
Reviewed By: reames
2021-10-27 09:54:49 +07:00
Nikita Popov 3a995c918e [SCEV] Move SCEVLostPoisonFlags() check into SCEVExpander
Always insert values into ExprValueMap, and instead skip using them
in SCEVExpander if poison-generating flags have been lost. This
ensures that all values that are in ValueExprMap are also in
ExprValueMap, so we can use the latter to invalidate the former.

This change is probably not entirely NFC for the case where
originally the SCEV had no nowrap flags but they were inferred
later, in which case that would now allow reusing the existing
value for expansion.

Differential Revision: https://reviews.llvm.org/D112389
2021-10-25 22:37:20 +02:00
Nikita Popov 0d20ebf686 [BasicAA] Use ranges for more than one index
D109746 made BasicAA use range information to determine the
minimum/maximum GEP offset. However, it was limited to the case of
a single variable index. This patch extends support to multiple
indices by adding all the ranges together.

Differential Revision: https://reviews.llvm.org/D112378
2021-10-25 15:30:50 +02:00
Nikita Popov 75384ecdf8 [InstSimplify] Refactor invariant.group load folding
Currently strip.invariant/launder.invariant are handled by
constructing constant expressions with the intrinsics skipped.
This takes an alternative approach of accumulating the offset
using stripAndAccumulateConstantOffsets(), with a flag to look
through invariant.group intrinsics.

Differential Revision: https://reviews.llvm.org/D112382
2021-10-25 10:56:25 +02:00
Kazu Hirata 3729a5abf4 [SCEV] Fix a warning on an unused lambda capture
This patch fixes:

  llvm/lib/Analysis/ScalarEvolution.cpp:12770:37: error: lambda
  capture 'this' is not used [-Werror,-Wunused-lambda-capture]
2021-10-25 00:45:18 -07:00
Max Kazantsev f8623b0783 [SCEV][NFC] Win some compile time from mass forgetMemoizedResults
Mass forgetMemoizedResults can be done more efficiently than bunch
of individual invocations of helper because we can traverse maps being
updated just once, rather than doing this for each invidivual SCEV.

Should be NFC and supposedly improves compile time.

Differential Revision: https://reviews.llvm.org/D112294
Reviewed By: reames
2021-10-25 14:09:41 +07:00
Max Kazantsev dbab339ea4 [SCEV][NFC] Apply mass forgetMemoizedResults queries where possible
When forgetting multiple SCEVs, rather than doing this one by one, we can
instead use mass updates. We plan to make them more efficient than they
are now, potentially improving compile time.

Differential Revision: https://reviews.llvm.org/D111602
Reviewed By: reames
2021-10-25 13:50:49 +07:00
Max Kazantsev a6096b7f9e [SCEV][NFC] Introduce API for mass forgetMemoizedResults query
This patch changes signature of forgetMemoizedResults to be able to work with
multiple SCEVs. Usage will come in follow-ups. We also plan to optimize it in the
future to work faster than individual invalidation updates. Should not change
behavior in any sense.

Split-off from D111602.

Differential Revision: https://reviews.llvm.org/D112293
Reviewed By: reames
2021-10-25 13:49:31 +07:00
Max Kazantsev 1c18ebb2cc [NFC][SCEV] Do not track users of SCEVConstants
Follow-up from D112295, suggested by Nikita: we can avoid tracking
users of SCEVConstants because dropping their cached info is unlikely
to give any new prospects for fact inference, and it should not introduce
any correctness problems.
2021-10-25 12:30:46 +07:00
Max Kazantsev fea4a48c0b [SCEV][NFC] API for tracking of SCEV users
This patch introduces API that keeps track of SCEVs users of
another SCEVs, required to handle invalidations of users along
with operands that comes in follow-up patches.

Differential Revision: https://reviews.llvm.org/D112295
Reviewed By: reames
2021-10-25 12:14:18 +07:00
Kazu Hirata 4bd46501c3 Use llvm::any_of and llvm::none_of (NFC) 2021-10-24 17:35:33 -07:00
Philip Reames a461fa64bb Treat branch on poison as immediate UB (under an off by default flag)
The LangRef clearly states that branching on a undef or poison value is immediate undefined behavior, but historically, we have not been consistent about implementing that interpretation in the optimizer. Historically, we used (in some cases) a more relaxed model which essentially looked for provable UB along both paths which was control dependent on the condition. However, we've never been 100% consistent here. For instance SCEV uses the strong model for increments which form AddRecs (and only addrecs).

At the moment, the last big blocker for finally making this switch is enabling the fix landed in D106041. Loop unswitching (in it's classic form) is incorrect as it creates many "branch on poisons" when unswitching conditions originally unreachable within the loop.

This change adds a flag to value tracking which allows to easily test the optimization potential of treating branch on poison as immediate UB. It's intended to help ease work on getting us finally through this transition and avoid multiple independent rediscovers of the same issues.

Differential Revision: https://reviews.llvm.org/D112026
2021-10-24 14:42:03 -07:00
Nikita Popov 0c7f85d786 [InstSimplify] Simplify fetching of index size (NFC)
Directly fetch the size instead of going through the index type
first.
2021-10-23 22:08:15 +02:00
Nikita Popov 710596a1e1 [ConstantFolding] Accept offset in ConstantFoldLoadFromConstPtr (NFCI)
As this API is now internally offset-based, we can accept a starting
offset and remove the need to create a temporary bitcast+gep
sequence to perform an offset load. The API now mirrors the
ConstantFoldLoadFromConst() API.
2021-10-23 17:59:39 +02:00
Kazu Hirata d8e4170b0a Ensure newlines at the end of files (NFC) 2021-10-23 08:45:29 -07:00
Kazu Hirata d14d7068b6 [llvm] Use StringRef::contains (NFC) 2021-10-23 08:45:27 -07:00
Nikita Popov c5b5b7f621 [ConstantFolding] Remove ConstantFoldLoadThroughGEPIndices() API (NFC)
The last user of this API went away in
4f5e9a2bb2.
2021-10-23 16:59:29 +02:00
Nikita Popov 4f5e9a2bb2 [SCEV] Remove computeLoadConstantCompareExitLimit() (NFCI)
The functionality of this method is already covered by
computeExitCountExhaustively() in a more general fashion. It was
added at a time when exhaustive exit count calculation did not
support constant folding loads yet. I double checked that dropping
this code causes no binary changes in test-suite.

Differential Revision: https://reviews.llvm.org/D112343
2021-10-23 15:34:25 +02:00
Nikita Popov 61cfdf636d [BasicAA] Model implicit trunc of GEP indices
GEP indices larger than the GEP index size are implicitly truncated
to the index size. BasicAA currently doesn't model this, resulting
in incorrect alias analysis results.

Fix this by explicitly modelling truncation in CastedValue in the
same way we do zext and sext. Additionally we need to disable a
number of optimizations for truncated values, in particular
"non-zero" and "non-equal" may no longer hold after truncation.
I believe the constant offset heuristic is also not necessarily
correct for truncated values, but wasn't able to come up with a
test for that one.

A possible followup here would be to use the new mechanism to
model explicit trunc as well (which should be much more common,
as it is the canonical form). This is straightforward, but omitted
here to separate the correctness fix from the analysis improvement.

(Side note: While I say "index size" above, BasicAA currently uses
the pointer size instead. Something for another day...)

Differential Revision: https://reviews.llvm.org/D110977
2021-10-22 23:47:02 +02:00
Nikita Popov 3a10fe2d89 [Loads] Use more powerful constant folding API
This follows up on D111023 by exporting the generic "load value
from constant at given offset as given type" and using it in the
store to load forwarding code. We now need to make sure that the
load size is smaller than the store size, previously this was
implicitly ensured by ConstantFoldLoadThroughBitcast().

Differential Revision: https://reviews.llvm.org/D112260
2021-10-22 18:33:03 +02:00
Nikita Popov 1848525842 [CodeMetrics] Don't require speculatability for ephemeral values
As discussed in D112016, our current requirement of speculatability
for ephemeral is overly strict: What we really care about is that
the instruction will be DCEd once the assume is dropped. For that
it is sufficient that the instruction is side-effect free and not
a terminator.

In particular, this allows non-dereferenceable loads to be ephemeral
values.

Differential Revision: https://reviews.llvm.org/D112179
2021-10-21 20:30:01 +02:00
Arthur Eubanks 3781a46c3c Revert "[IPT] Restructure cache to allow lazy update following invalidation [NFC]"
This reverts commit baea663a6e.

Causes crashes, e.g. https://lab.llvm.org/buildbot/#/builders/77/builds/10715.
2021-10-21 10:48:41 -07:00
Philip Reames baea663a6e [IPT] Restructure cache to allow lazy update following invalidation [NFC]
This change restructures the cache used in IPT to point not to the first special instruction, but to the first instruction which *could* be special. That is, the cached reference is always equal to the first special, or comes before it in the block.

This avoids expensive block scans when we are removing special instructions from the beginning of the block. At the moment, this case is not heavily used, though it does trigger in GVN when doing CSE of calls. The main motivation was a change I'm no longer planning to move forward with, but the cache optimization seemed worthwhile as a minor perf win at low cost.

Differential Revision: https://reviews.llvm.org/D111768
2021-10-21 09:16:21 -07:00
Arthur Eubanks 00500d5bad [NFC] De-template LazyCallGraph::visitReferences() and move into .cpp file
This makes changing it and recompiling it much faster.
2021-10-20 10:50:00 -07:00
Bjorn Pettersson 9c44a0996c [SCEV] Fix formatting error introduced by D112080
Accidentally pushed D112080 without this clang-format cleanup.
2021-10-19 21:44:07 +02:00
Bjorn Pettersson 08619006a0 [SCEV] Avoid compile time explosion in ScalarEvolution::isImpliedCond
As seen in PR51869 the ScalarEvolution::isImpliedCond function might
end up spending lots of time when doing the isKnownPredicate checks.

Calling isKnownPredicate for example result in isKnownViaInduction
being called, which might result in isLoopBackedgeGuardedByCond being
called, and then we might get one or more new calls to isImpliedCond.
Even if the scenario described here isn't an infinite loop, using
some random generated C programs as input indicates that those
isKnownPredicate checks quite often returns true. On the other hand,
the third condition that needs to be fulfilled in order to "prove
implications via truncation", i.e. the isImpliedCondBalancedTypes
check, is rarely fulfilled.
I also made some similar experiments to look at how often we would
get the same result when using isKnownViaNonRecursiveReasoning instead
of isKnownPredicate. So far I haven't seen a single case when codegen
is negatively impacted by using isKnownViaNonRecursiveReasoning. On
the other hand, it seems like we get rid of the compile time explosion
seen in PR51869 that way. Hence this patch.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112080
2021-10-19 21:37:57 +02:00
Arthur Eubanks ecd25edfc5 [InlineCost] Add empty line between call sites when printing inline costs 2021-10-18 13:56:48 -07:00
Arthur Eubanks b8ce97372d [NewPM] Add PipelineTuningOption to eagerly invalidate analyses
This trades off more compile time for less peak memory usage. Right now
it invalidates all function analyses after a module->function or
cgscc->function adaptor.

https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=max-rss

For now this is just experimental.

See comments on why this may affect optimizations.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D111575
2021-10-18 13:20:35 -07:00
modimo 313c657fce [InlineAdvisor] Add -inline-replay-scope=<Function|Module> to control replay scope
The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from dwarf information and generating the equivalent remarks.

This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself.

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D110658
2021-10-18 13:08:39 -07:00
Kirill Stoimenov 62627c7217 [Sanitizers] Replaced getMaxPointerSizeInBits with getPointerSizeInBits, which was causing failures for 32bit x86.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D111829
2021-10-18 09:31:14 -07:00
Sanjay Patel 2a3cc4d461 [Analysis] add utility function for unary shuffle mask creation
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )

Differential Revision: https://reviews.llvm.org/D111891
2021-10-18 09:00:39 -04:00
Nikita Popov 274b2439f8 [ConstantRange] Add fast signed multiply
The multiply() implementation is very slow -- it performs six
multiplications in double the bitwidth, which means that it will
typically work on allocated APInts and bypass fast-path
implementations. Add an additional implementation that doesn't
try to produce anything better than a full range if overflow is
possible. At least for the BasicAA use-case, we really don't care
about more precise modeling of overflow behavior. The current
use of multiply() is fine while the implementation is limited to
a single index, but extending it to the multiple-index case makes
the compile-time impact untenable.
2021-10-17 16:41:49 +02:00
Simon Pilgrim d464a9d476 [Analysis] Replace assert(isa)/dyn_cast with cast. NFC.
cast<> will perform the assertion for us.

Removes a static analysis null dereference warning.
2021-10-16 11:40:19 +01:00
Simon Pilgrim a1b43d2bc9 [LazyValueInfo] getPredicateAt - remove unnecessary null pointer check. NFC.
We already dereference the CxtI pointer several times before reaching the "if(CxtI)", we have no need to check it again.

Fixes a coverity warning.
2021-10-16 11:20:19 +01:00
Simon Pilgrim c288241795 [ConstantFolding] ConstantFoldScalarCall2 - early-out if getLibFunc fails. NFC. 2021-10-16 11:20:19 +01:00
Simon Pilgrim c18cf10a04 [ConstantFolding] Use getValueAPF const ref value where possible. NFC.
Don't copy the value if we can avoid it.
2021-10-16 11:20:19 +01:00
Simon Pilgrim 76ca0d67ab [ConstantFolding] ConstantFoldScalarCall1 - early-out if getLibFunc fails. NFC. 2021-10-16 11:20:18 +01:00
Nikita Popov 0c52c271a5 [BasicAA] Rename ExtendedValue to CastedValue (NFC)
As suggested on D110977, rename ExtendedValue to CastedValue,
because it will contain more than just extensions in the future.
2021-10-15 21:56:54 +02:00
Max Kazantsev 90ae538cab [SCEV] Prove implication of predicates to their sign-flipped counterparts
This patch teaches SCEV two implication rules:

  x <u y && y >=s 0 --> x <s y,
  x <s y && y <s 0 --> x <u y.

And all equivalents with signs/parts swapped.

Differential Revision: https://reviews.llvm.org/D110517
Reviewed By: nikic
2021-10-15 11:49:18 +07:00
Max Kazantsev 1202d280c6 [SCEV][NFC] Reduce memory footprint & compile time via DFS refactoring
Current implementations of DFS in SCEV check unique-visited of traversed
values on pop, and not on push. As result, the same value may be pushed
multiple times just to be thrown away when popped. These operations are
meaningless and only waste time and increase memory footprint of the
worklist.

This patch reworks the DFS strategy to check uniqueness before push.
Should be NFC.

Differential Revision: https://reviews.llvm.org/D111774
Reviewed By: nikic, reames
2021-10-15 10:19:15 +07:00
Artur Pilipenko 3f96f7b30c Fix getInlineCost with ComputeFullInlineCost enabled
Fix a bug when getInlineCost incorrectly returns a
cost/threshold pair instead of an explicit never inline.

Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D111687
2021-10-14 17:41:41 -07:00
Nikita Popov 69853f9920 [IVUsers] Move preheader check into SCEVExpander
Rather than checking for loop nest preheaders upfront in IVUsers,
move this requirement into isSafeToExpand() from SCEVExpander.

Historically, LSR did not check whether SCEVs are safe to expand
and fully relied on IVUsers to validate this. Later, support for
non-expandable SCEVs was added via rigid formulas.

Checking this in isSafeToExpand() makes it more obvious what
exactly this check is guarding against, and avoids the awkward
loop nest scan.

This is a followup to https://reviews.llvm.org/D111493#3055286.

Differential Revision: https://reviews.llvm.org/D111681
2021-10-14 21:52:31 +02:00
Nikita Popov 5f05ff081f [BasicAA] Improve scalable vector handling
Currently, DecomposeGEP() bails out on the whole decomposition if
it encounters a scalable GEP type anywhere. However, it is fine to
still analyze other GEPs that we look through before hitting the
scalable GEP. This does mean that the decomposed GEP base is no
longer required to be the same as the underlying object. However,
I don't believe this property is necessary for correctness anymore.

This allows us to compute slightly more precise aliasing results
for GEP chains containing scalable vectors, though my primary
interest here is simplifying the code.

Differential Revision: https://reviews.llvm.org/D110511
2021-10-14 20:23:50 +02:00
Kevin P. Neal 727a891ec8 [FPEnv][InstSimplify] Fold fadd X, 0 ==> X, when we know X is not -0
Currently the fadd optimizations in InstSimplify don't know how to do this
NoSignedZeros "X + 0.0 ==> X" fold when using the constrained intrinsics.
This adds the support.

This review is derived from D106362 with some improvements from D107285
and is a follow-on to D111085.

Differential Revision: https://reviews.llvm.org/D111450
2021-10-14 12:32:45 -04:00
Nikita Popov a8e7d11aca [ValueTracking] Simplify getKnowledgeValidInContext() call (NFC)
This accepts an ArrayRef, there's no need to create a SmallVector.
2021-10-14 18:17:54 +02:00
Max Kazantsev 6e1308bc10 [SCEV][NFC] Simplify check with CI->isZero() exit condition
Replace check with
    if ((ExitIfTrue && CI->isZero()) || (!ExitIfTrue && CI->isOne()))
with equivalent and simpler version
    if (ExitIfTrue == CI->isZero())
2021-10-14 14:06:52 +07:00
Max Kazantsev 46a1dd47e6 [SCEV][NFC] Reorder checks to delay call of all_of
Check lightweight getter condition before calling all_of.
2021-10-14 13:30:51 +07:00
Mircea Trofin 6c76d01011 [mlgo][aot] requrie the model is autogenerated for test determinism
The tests that exercise the 'release' mode, where the model is AOT-ed,
check the output has certain properties, to validate that, indeed, a
different policy from the default one was exercised. For determinism, we
can't reliably check that output for an arbitrary learned policy, since
it could be that policy happens to mimic the default one in that
particular case.

This patch adds a requirement that those tests run only when the model
is autogenerated (e.g. on build bots).

Differential Revision: https://reviews.llvm.org/D111747
2021-10-13 14:02:41 -07:00
Arthur Eubanks 3628bb7436 Make various assume bundle data structures use uint64_t
Following D110451, we need to make sure to support 64 bit values.
2021-10-13 10:38:41 -07:00
Philip Reames 24c9016574 [instcombine] propagate single use freeze(gep inbounds X)
This is a follow on for D111675 which implements the gep case. I'd originally left it out because I was hoping to actually implement the inrange todo, but after a bit of staring at the code, decided to leave it as is since it doesn't effect this use case (i.e. instcombine requires the op to freeze to be an instruction).

Differential Revision: https://reviews.llvm.org/D111691
2021-10-13 09:25:00 -07:00
Philip Reames 4c5702cb12 Fix bug introduced with 6f34839 (poison flags on floating point ops)
The newly introduced API for checking whether poison comes solely from flags which can be dropped was out of sync.  This was noticed by a reviewer post commit.

For the moment, disable the floating point flags.  In a follow up change, I plan to add support in dropPoisonGeneratingFlags, but that deserves to be a change of it's own.
2021-10-12 20:25:00 -07:00
Philip Reames 6f34839407 [instcombine] propagate freeze through single use poison producing flag instruction
If we have an instruction which produces poison only when flags are specified on the instruction, then we know that freezing the operands and dropping flags is equivalent to freezing the result. If we know those flags don't result in any undefined behavior being executed, then there's no point in preserving the flags as we gain no knowledge by having them.

This patch extends the existing propagation logic which sinks freeze to single potential non-poison operands to allow dropping of flags when we know the freeze is the sole use of the instruction with poison flags.

The main value is that we tend to sink freezes towards the phi in IV cycles where the incoming value to the phi is the freeze of an IV increment. This will in turn (in a future patch), let us fold the freeze through the phi into the loop preheader. Motivated by eliminating need for CanonicalizeFreezeInLoops for the clearly profitable cases from onephi.ll test case in the test directory.

Differential Revision: https://reviews.llvm.org/D111675
2021-10-12 13:52:41 -07:00
Hongtao Yu 098a0d8fbc [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Not exactly like DbgIntrinsics, PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probe from clobbering memory SSA
- Unblocked induction variable simpliciation
- Allow empty loop deletion by treating probe intrinsic isDroppable
- Some refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110847
2021-10-12 09:44:12 -07:00
Nikita Popov 2a2a37d972 [IVUsers] Check for preheader instead of loop simplify form
IVUsers currently makes sure that all loops dominating a user are
in loop simplify form, because SCEVExpander needs a preheader to
insert into. However, loop simplify form requires much more than
that. In particular, it requires dedicated exits, which means that
exits need to be found and walked. For large functions with many
nested loops, this can result in pathological compile-time explosion.

Fix this by only checking the property we're actually interested in,
which is incidentally cheap to check.

Differential Revision: https://reviews.llvm.org/D111493
2021-10-11 23:13:13 +02:00
Roman Lebedev 684cbae89a
[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places 2021-10-11 23:36:06 +03:00
Arthur Eubanks 259390de9a [LCG] Don't skip invalidation of LazyCallGraph if CFG analyses are preserved
The CFG being changed and the overall call graph are not related, we can introduce/remove calls without changing the CFG.

Resolves one of the issues in PR51946.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D111275
2021-10-11 13:30:47 -07:00
Philip Reames 7f55209cee [SCEV] Extend trip count to avoid overflow by default
As a brief reminder, an "exit count" is the number of times the backedge executes before some event. It can be zero if we exit before the backedge is reached. A "trip count" is the number of times the loop header is entered if we branch into the loop. In general, TC = BTC + 1 and thus a zero trip count is ill defined

There is a cornercases which we don't handle well. Let's assume i8 for our examples to keep things simple. If BTC = 255, then the correct trip count is 256. However, 256 is not representable in i8.

In theory, code which needs to reason about trip counts is responsible for checking for this cornercase, and either bailing out, or handling it correctly. Historically, we don't have a great track record about actually doing so.

When reviewing D109676, I found myself asking a basic question. Was there any good reason to preserve the current wrap-to-zero behavior when converting from backedge taken counts to trip counts? After reviewing existing code, I could not find a single case which appears to correctly and precisely handle the overflow case.

This patch changes the default behavior to extend instead of wrap. That is, if the result might be 256, we return a value of i9 type to ensure we interpret the count correctly. I did leave the legacy behavior as an option since a) loop-flatten stops triggering if I extend due to weirdly specific pattern matching I didn't understand and b) we could reasonably use the mode if we'd externally established a lack of overflow.

I want to emphasize that this change is *not* NFC. There are two call sites (one in ScalarEvolution.cpp, one in LoopCacheAnalysis.cpp) which are switched to the extend semantics. The former appears imprecise (but correct) for a constant 255 BTC. The later appears incorrect, though I don't have a test case.

Differential Revision: https://reviews.llvm.org/D110587
2021-10-11 09:55:55 -07:00
David Sherwood 26b7d9d622 [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-11 09:41:38 +01:00
Clement Courbet 83ded5d323 re-land "[AA] Teach BasicAA to recognize basic GEP range information."
Now that PR52104 is fixed.
2021-10-11 10:04:22 +02:00
Nick Desaulniers 9697f93587 [InlineCost] model calls to llvm.is.constant* more carefully
llvm.is.constant* intrinsics are evaluated to 0 or 1 integral values.

A common use case for llvm.is.constant comes from the higher level
__builtin_constant_p. A common usage pattern of __builtin_constant_p in
the Linux kernel is:

    void foo (int bar) {
      if (__builtin_constant_p(bar)) {
        // lots of code that will fold away to a constant.
      } else {
        // a little bit of code, usually a libcall.
      }
    }

A minor issue in InlineCost calculations is when `bar` is _not_ Constant
and still will not be after inlining, we don't discount the true branch
and the inline cost of `foo` ends up being the cost of both branches
together, rather than just the false branch.

This leads to code like the above where inlining will not help prove bar
Constant, but it still would be beneficial to inline foo, because the
"true" branch is irrelevant from a cost perspective.

For example, IPSCCP can sink a passed constant argument to foo:

    const int x = 42;
    void bar (void) { foo(x); }

This improves our inlining decisions, and fixes a few head scratching
cases were the disassembly shows a relatively small `foo` not inlined
into a lone caller.

We could further improve this modeling by tracking whether the argument
to llvm.is.constant* is a parameter of the function, and if inlining
would allow that parameter to become Constant. This idea is noted in a
FIXME comment.

Link: https://github.com/ClangBuiltLinux/linux/issues/1302

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D111272
2021-10-08 15:27:30 -07:00
Philip Reames edf31b4db1 [IPT] Add a statistic to track instructions scanned to answer queries
I'm planning some changes to the invalidation mechanism here, and having a concrete mechanism to track progress is key.
2021-10-08 10:59:35 -07:00
Philip Reames b4498e6b8d [IPT] Narrow scope of removeInstruction invalidation [NFC]
We only need to invalidate if the instruction being removed is the cached "first special instruction".  If the instruction is before that one, it can't (by assumption) be special.  If it is after that one, it wasn't the first.
2021-10-08 10:35:03 -07:00
Philip Reames d694dd0f0d Add iterator range variants of isGuaranteedToTransferExecutionToSuccessor [mostly-nfc]
This factors out utilities for scanning a bounded block of instructions since we have this code repeated in a bunch of places.  The change to InlineFunction isn't strictly NFC as the limit mechanism there didn't handle debug instructions correctly.
2021-10-08 09:50:10 -07:00
Nikita Popov c77a5c21bb [BasicAA] Use base of decomposed GEP in recursive queries (NFC)
DecompGEP.Base and UnderlyingV are currently always the same.
However, logically DecompGEP.Base is the right value to use here,
because the decomposed offset is relative to that base.
2021-10-07 22:08:41 +02:00
Paul Robinson aec66f895b [PS4][TargetLibraryInfo] Set TLI info correctly for PS4 2021-10-07 10:03:31 -07:00
Sanjay Patel fdbf2bb4ee [InstSimplify] (x || y) && (x || !y) --> x
https://alive2.llvm.org/ce/z/4BE33w

This is the logical (select-form) equivalent of the bitwise logic fold:
e36d351d19

This is another part of solving the regression from:
https://llvm.org/PR52077
2021-10-07 12:25:25 -04:00
Erik Desjardins 11c8efd4db [Inline] Introduce Constant::hasOneLiveUse, use it instead of hasOneUse in inline cost model (PR51667)
Otherwise, inlining costs may be pessimized by dead constants.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51667.

Reviewed By: mtrofin, aeubanks

Differential Revision: https://reviews.llvm.org/D109294
2021-10-07 08:33:25 -07:00
Itay Bookstein 40ec1c0f16 [IR][NFC] Rename getBaseObject to getAliaseeObject
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
2021-10-06 19:33:10 -07:00
Kuba Mracek 7329abf2f8 [GlobalDCE] In VFE, replace the whole 'sub' expression of unused relative-pointer-based vtable slots
Differential Revision: https://reviews.llvm.org/D109114
2021-10-06 15:55:55 -07:00
Philip Reames 1183d65b4d [SCEV] Search operand tree for scope bound when inferring flags from IR
When checking to see if we can apply IR flags to a SCEV, we need to identify a bound on the defining scope of the SCEV to be produced.  We'd previously added support for a couple SCEVExpr types which trivially imply bounds, but hadn't handled types such as umax where the bounds come from the bounds of the operands.  This does the obvious thing, and recurses through operands searching for a tighter bound on the defining scope.

I'm honestly surprised by how little this seems to mater on existing tests, but it's worth doing for completeness sake alone.

Differential Revision: https://reviews.llvm.org/D111191
2021-10-06 15:10:02 -07:00
Nikita Popov 17c20a6dfb [SCEV] Avoid unnecessary domination checks (NFC)
When determining the defining scope, avoid repeatedly querying
dominationg against the function entry instruction. This ends up
begin a very common case that we can handle more efficiently.
2021-10-06 22:14:04 +02:00
Philip Reames a7ae227baf [scev] minor style improvement [nfc] 2021-10-06 12:15:16 -07:00
Philip Reames 67896f494e Returning poison from a function w/ noundef return attribute is UB
This does for readability of returns within said function as what we do for the caller side when reasoning about what might be poison.

Differential Revision: https://reviews.llvm.org/D111180
2021-10-06 11:52:18 -07:00
Philip Reames 0658bab870 [SCEV] Infer flags from add/gep in any block
This patch removes a compile time restriction from isSCEVExprNeverPoison. We've strengthened our ability to reason about flags on scopes other than addrecs, and this bailout prevents us from using it. The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*.

Differential Revision: https://reviews.llvm.org/D111186
2021-10-06 11:11:54 -07:00
Kevin P. Neal f86c930cc9 [FPEnv][InstSimplify] Fold constrained X + -0.0 ==> X
Currently the fadd optimizations in InstSimplify don't know how to do this
"X + -0.0 ==> X" fold when using the constrained intrinsics. This adds the
support.

This commit is derived from D106362 with some improvements from D107285.

Differential Revision: https://reviews.llvm.org/D111085
2021-10-06 13:52:31 -04:00
Nikita Popov 1301a8b473 [BasicAA] Don't unnecessarily extend pointer size
BasicAA GEP decomposition currently performs all calculation on the
maximum pointer size, but at least 64-bit, with an option to double
the size. The code comment claims that this improves analysis power
when working with uint64_t indices on 32-bit systems. However, I don't
see how this can be, at least while maintaining correctness:

When working on canonical code, the GEP indices will have GEP index
size. If the original code worked on uint64_t with a 32-bit size_t,
then there will be truncs inserted before use as a GEP index. Linear
expression decomposition does not look through truncs, so this will
be an opaque value as far as GEP decomposition is concerned. Working
on a wider pointer size does not help here (or have any effect at all).

When working on non-canonical code (before first InstCombine), the
GEP indices are implicitly truncated to GEP index size. The BasicAA
code currently just ignores this fact completely, and pretends that
this truncation doesn't happen. This is incorrect and will be
addressed by D110977.

I believe that for correctness reasons, it is important to work on
the actual GEP index size to properly model potential overflow.
BasicAA tries to patch over the fact that it uses the wrong size
(see adjustToPointerSize), but it only does that in limited cases
(only for constant values, and not all of them either). I'd like to
move this code towards always working on the correct size, and
dropping these artificial pointer size adjustments is the first step
towards that.

Differential Revision: https://reviews.llvm.org/D110657
2021-10-06 18:40:21 +02:00
Sanjay Patel e36d351d19 [InstSimplify] (x | y) & (x | !y) --> x
https://alive2.llvm.org/ce/z/QagQMn

This fold is handled by instcombine via SimplifyUsingDistributiveLaws(),
but we are missing the sibliing fold for 'logical and' (implemented with
'select'). Retrofitting the code in instcombine looks much harder
than just adding a small adjustment here, and this is potentially more
efficient and beneficial to other passes.
2021-10-06 12:31:25 -04:00
Clement Courbet 3255015407 Fix incomplete conflict resolution in ff41fc07b1 2021-10-06 16:55:14 +02:00
Clement Courbet ff41fc07b1 Revert "[AA] Teach BasicAA to recognize basic GEP range information."
We have found a miscompile with this change, reverting while working on a
reproducer.

This reverts commit 455b60ccfb.
2021-10-06 16:49:10 +02:00
Mircea Trofin 7d541eb4d4 [inliner] Mandatory inlining decisions produce remarks
This also removes the need to disable the mandatory inlining phase in
tests.

In a departure from the previous remark, we don't output a 'cost' in
this case, because there's no such thing. We just report that inlining
happened because of the attribute.

Differential Revision: https://reviews.llvm.org/D110891
2021-10-05 14:01:25 -07:00
Nikita Popov 0be9940ef2 [SCEV] Don't check if propagation safe if there are no flags (NFC)
If there are no nowrap flags, then we don't need to determine
whether propagating flags is safe -- it will make no difference.
2021-10-05 22:25:41 +02:00
Philip Reames c608b49d67 [SCEV] Tweak the algorithm for figuring out if flags must apply to a SCEV [mostly-NFC]
Behavior wise, this patch should be mostly NFC.  The only behavior difference known is that on the isSCEVExprNeverPoison path we'll consider a bound imposed by the SCEVable operands (if any).

Algorithmically, it's an invert of the existing code.  Previously, we checked for each operand if we could find a bound, then checked for must-execute given that bound.  With the patch, we use dominance to refine the innermost bound, then check must execute once.  The interesting case is when we have multiple unknowns within a single basic block.  While both dominance and must-execute are worst-case linear walks within the block, only dominance is cached.  As such, refining based on dominance should be more efficient.
2021-10-05 11:20:48 -07:00
Nikita Popov c117d77e93 [ConstantFold] Refactor load folding
This refactors load folding to happen in two cleanly separated
steps: ConstantFoldLoadFromConstPtr() takes a pointer to load from
and decomposes it into a constant initializer base and an offset.
Then ConstantFoldLoadFromConst() loads from that initializer at
the given offset. This makes the core logic independent of having
actual GEP expressions (and those GEP expressions having certain
structure) and will allow exposing ConstantFoldLoadFromConst() as
an independent API in the future.

This is mostly only a refactoring, but it does make the folding
logic slightly more powerful.

Differential Revision: https://reviews.llvm.org/D111023
2021-10-05 18:07:57 +02:00
Nikita Popov 30001af84e [BasicAA] Ignore CanBeFreed in minimal extent reasoning
When determining NoAlias based on object size and dereferenceability
information, we can ignore frees for the same reason we can ignore
possible null pointers (if null is not a valid pointer): Actually
accessing the null pointer / freed pointer would be immediate UB,
and AA results are only valid under the assumption of an access.

This addresses a minor regression from D110745.

Differential Revision: https://reviews.llvm.org/D111028
2021-10-04 22:08:57 +02:00
Bjorn Pettersson 7f84fa4ad4 [TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC
In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer
need the IsSizeTTy lambda function and the SizeTTy object. Instead
we just follow the regular structure of checking for integer types
given an exepected number of bits.
2021-10-04 15:46:39 +02:00