Commit Graph

11632 Commits

Author SHA1 Message Date
Sinan Lin 7a2f5dca09 [CodeMetrics] use hasOneLiveUse instead of hasOneUse while analyzing inlinable callsites
It would be better for CodeMetrics to use hasOneLiveUse while analyzing
static and called once callsites, since inline cost now uses
hasOneLiveUse instead of hasOneUse to avoid overpessimization on dead
constant cases (since this patch https://reviews.llvm.org/D109294).
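
A minimal sketch of the distinction (the predicate is the Function API named
above; the surrounding check is illustrative, not the actual CodeMetrics code):

```cpp
#include "llvm/IR/Function.h"

// Illustrative only: classify a callee as "static and called once" using
// hasOneLiveUse(), which ignores uses reachable only through dead constants,
// where hasOneUse() would still count them.
static bool isStaticAndCalledOnce(const llvm::Function &F) {
  return F.hasLocalLinkage() && F.hasOneLiveUse();
}
```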

This change has no noticeable influence now, but it helps improve the
accuracy of cost models of passes that use CodeMetrics.

Reviewed By: fhahn, nikic

Differential Revision: https://reviews.llvm.org/D130461
2022-07-26 13:46:19 +08:00
Augie Fackler 85063090e9 MemoryBuiltins: remove malloc-family funcs from list
We no longer need specialized knowledge of these allocator functions in
this file since we have the correct attributes available now.

As far as I can tell the changes in the attributor tests are due to
things getting more consistent on alloc-family once we remove the static
list entries.

The two test changes in NewGVN merit extra scrutiny: NewGVN appears to
be _extremely_ sensitive to the inaccessiblememonly attribute for
reasons that are beyond me. As a result, I hand-enumerated all the
attributes on allocation functions in those two tests instead of using
-inferattrs. I assume the two -disable-simplify-libcalls tests there are
no longer sensible since the function declaration now includes all the
relevant attributes.

Differential Revision: https://reviews.llvm.org/D130107
2022-07-25 17:29:01 -04:00
Benjamin Kramer 5fde785186 [ValueTracking] Fix unused variable warning in release builds. NFC 2022-07-25 13:28:32 +02:00
Peter Waller f8919d2f7e [NFC][GVN] Put phi-translation of 'add' behind a switch
The code in this `#if 0` block appears to be a net benefit. Put it
behind a switch defaulting to off to support experimentation and as a
request for comment.
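
A hedged sketch of the mechanism: a flag that defaults to off guarding the
formerly `#if 0` code. The flag name below is illustrative; the actual
spelling is in D130241.

```cpp
#include "llvm/Support/CommandLine.h"

// Illustrative flag guarding the previously #if 0'd code; off by default so
// behaviour only changes when explicitly requested.
static llvm::cl::opt<bool> EnableAddPhiTranslation(
    "gvn-add-phi-translation", llvm::cl::init(false), llvm::cl::Hidden,
    llvm::cl::desc("Enable phi-translation of add instructions (experimental)"));

// Usage at the former #if 0 site:
//   if (EnableAddPhiTranslation) {
//     // phi-translate the 'add' here
//   }
```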

The codegen impact of enabling this that I'm currently pursuing is that
it allows PRE to take place more frequently, particularly in loops with
second-order recurrences.

Preliminary experimental data:

Across LNT on AArch64, 54 benchmarks are sped up by >1% and 42 are
regressed by >1%; the geomean (exec_time_enabled / exec_time_disabled)
of these 96 "1% or greater significance" benchmarks is 0.991. For the
full set of 770 benchmarks it's 0.998.

There are two benchmarks which experience a >30% speedup, and the worst
slowdown is ~12%; for every benchmark with a slowdown there is a
benchmark which is sped up by a greater factor.

Differential Revision: https://reviews.llvm.org/D130241
2022-07-25 07:59:47 +00:00
Max Kazantsev a053f35990 [SCEV][NFC][CT] Cheaper handling of guards in isBasicBlockEntryGuardedByCond
Handle guards uniformly with assumes, rather than iterating through all
block instructions in an attempt to find them.

Differential Revision: https://reviews.llvm.org/D129874
Reviewed By: nikic
2022-07-25 13:38:59 +07:00
Kazu Hirata b5188591a0 [llvm] Remove redundant virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Kazu Hirata acf648b5e9 Use llvm::less_first and llvm::less_second (NFC) 2022-07-24 16:21:29 -07:00
Sanjay Patel a925bef70c [ValueTracking] allow vector types in isImpliedCondition()
The matching of constants assumed integers, but we can handle
splat vector constants seamlessly with m_APInt.
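
A small illustration of why m_APInt makes this seamless (not the
isImpliedCondition code itself):

```cpp
#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Value.h"

// m_APInt binds the same way for a plain ConstantInt and for a splat vector
// constant, so one code path handles both scalars and <N x iM> splats.
static bool matchConstantOrSplat(llvm::Value *V, const llvm::APInt *&C) {
  using namespace llvm::PatternMatch;
  return match(V, m_APInt(C));
}
```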
2022-07-24 17:46:48 -04:00
Kazu Hirata 97718180d7 [Analysis] Remove a redundant return statement (NFC)
Identified with readability-redundant-control-flow.
2022-07-23 11:35:19 -07:00
Malhar Jajoo 41958f76d8 [Costmodel] Add "type-based-intrinsic-cost" cli option
This patch adds a command line flag to be able to test
the type based cost-model analysis for Intrinsics.

Differential Revision: https://reviews.llvm.org/D129109
2022-07-22 15:50:57 +01:00
Teresa Johnson 1dad6247d2 [MemProf] Add memprof metadata related analysis utilities
Adds a number of utilities that are used to help create and update
memprof related metadata. These will be used during profile matching
and annotation, as well as by the inliner when updating the metadata.
Also adds unit tests for the utilities.

See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]
(Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceding patch adding the basic
metadata and verification support.)

Depends on D128141.

Differential Revision: https://reviews.llvm.org/D128854
2022-07-21 13:46:01 -07:00
Augie Fackler 62f48cadfd MemoryBuiltins: accept non-TLI funcs with attribs as allocator funcs
This allows us to accept annotations from out-of-tree languages (the
example test is derived from Rust) so they can enjoy the benefits of
LLVM's optimizations without requiring LLVM to have language-specific
knowledge.
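
A hedged sketch of what such an annotation could look like from the frontend
side. The attribute names are the real IR attributes; the helper and family
names are made up:

```cpp
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"

// Illustrative only: annotate an out-of-tree language's allocator so that
// MemoryBuiltins can reason about it without a TargetLibraryInfo entry.
static void annotateAllocator(llvm::Function &F) {
  llvm::LLVMContext &Ctx = F.getContext();
  // allockind("alloc,uninitialized")
  F.addFnAttr(llvm::Attribute::get(
      Ctx, llvm::Attribute::AllocKind,
      uint64_t(llvm::AllocFnKind::Alloc) |
          uint64_t(llvm::AllocFnKind::Uninitialized)));
  // allocsize(0): the first argument is the allocation size in bytes.
  F.addFnAttr(llvm::Attribute::getWithAllocSizeArgs(Ctx, /*ElemSizeArg=*/0,
                                                    /*NumElemsArg=*/llvm::None));
  F.addFnAttr("alloc-family", "my_lang_alloc"); // made-up family name
}
```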

Differential Revision: https://reviews.llvm.org/D123091
2022-07-21 15:31:16 -04:00
Augie Fackler 5a3e3675f6 MemoryBuiltins: start using properties of functions
Prior to this change, we relied on the hard-coded list for all of the
information used by MemoryBuiltins. With this change, we're able to
start relying on properties of functions described in attributes, which
opens the door to out-of-tree compilers being able to describe their
allocator functions to LLVM's optimizer logic without having to register
their implementation details with LLVM.

Differential Revision: https://reviews.llvm.org/D123090
2022-07-21 15:31:15 -04:00
Arthur Eubanks 04d398db46 [LoopAccessAnalysis] Simplify D119047
No need to add checks for every type of a pointer that we couldn't
create a check for the first time around; just the types that weren't
successful.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D119376
2022-07-21 12:16:02 -07:00
David Sherwood f15b6b2907 [AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.

Currently the default option is "disabled", but this will be
changed in a later patch.

I've added new tests to show the options behave as expected here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560
2022-07-21 17:20:06 +01:00
Nikita Popov 1f69503107 [MemoryBuiltins] Add getReallocatedOperand() function (NFC)
Replace the value-accepting isReallocLikeFn() overload with a
getReallocatedOperand() function, which returns which operand is
the one being reallocated. Currently, this is always the first one,
but once allockind(realloc) is respected, the reallocated operand
will be determined by the allocptr parameter attribute.
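
Roughly the intended usage, as a sketch (the wrapper is illustrative; the
signature lives in MemoryBuiltins.h at this revision):

```cpp
#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/TargetLibraryInfo.h"

// Sketch: act on the operand being reallocated instead of asking whether the
// call "is realloc-like".
static void visitRealloc(const llvm::CallBase *CB,
                         const llvm::TargetLibraryInfo *TLI) {
  if (llvm::Value *Reallocated = llvm::getReallocatedOperand(CB, TLI)) {
    // `Reallocated` is the pointer whose allocation is resized; today this is
    // always operand 0, later it is whichever argument carries allocptr.
    (void)Reallocated;
  }
}
```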
2022-07-21 14:54:16 +02:00
Nikita Popov 46e6dd84b7 [MemoryBuiltins] Remove isFreeCall() function (NFC)
Remove isFreeCall() in favor of getFreedOperand(). Replace the
two remaining uses with a getFreedOperand() != nullptr check, as
they only care that something is getting freed. (The usage in DSE
is correct as such. The allocator-related checks in CFLGraph look
rather questionable in general.)
2022-07-21 14:44:23 +02:00
Nikita Popov c81dff3c30 [MemoryBuiltins] Add getFreedOperand() function (NFCI)
We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.

To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.
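
A sketch of the new query, assuming a CallBase *CB and TargetLibraryInfo *TLI
in scope (the wrapper itself is illustrative):

```cpp
#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/TargetLibraryInfo.h"

// Sketch: replace isFreeCall(CB, TLI) with a query for the freed operand, so
// nothing assumes "argument 0 is what gets freed".
static void visitFree(const llvm::CallBase *CB,
                      const llvm::TargetLibraryInfo *TLI) {
  if (llvm::Value *FreedPtr = llvm::getFreedOperand(CB, TLI)) {
    // FreedPtr is the pointer being deallocated; once allockind(free) is
    // respected it will be identified by the allocptr parameter attribute.
    (void)FreedPtr;
  }
}
```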

This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
2022-07-21 12:39:35 +02:00
Nikita Popov d144ae6e1b [MemoryBuiltins] Default to trivial mapper in getAllocSize() (NFC)
Default getAllocSize() to use the trivial mapper. Also switch
from using std::function to function_ref.
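
A sketch of a typical caller with the new default (illustrative wrapper;
assumes the Optional<APInt> return type declared in MemoryBuiltins.h):

```cpp
#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/TargetLibraryInfo.h"

// With the trivial mapper as the default argument, most callers no longer
// need to spell out a mapper at all.
static void queryAllocSize(const llvm::CallBase *CB,
                           const llvm::TargetLibraryInfo *TLI) {
  if (llvm::Optional<llvm::APInt> Size = llvm::getAllocSize(CB, TLI)) {
    // *Size is the number of bytes allocated by this call, when computable.
    (void)Size;
  }
}
```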

Furthermore, update the doc comment to point out a subtle difference
between getAllocSize() and getObjectSize(): The latter may also
return something for calls that return their argument (via "returned"
attribute or special intrinsics like invariant groups).
2022-07-21 11:43:48 +02:00
Nikita Popov 235fb602ed [MemoryBuiltins] Don't query TLI for non-pointer functions (NFC)
Fetching allocation data for calls is a rather hot operation, and
TLI lookups are slow. We can greatly reduce the number of calls
for which TLI is queried by checking that they return a pointer
value first, as this is a requirement for allocation functions
anyway.
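
A sketch of the fast path (illustrative wrapper, not the exact code):

```cpp
#include "llvm/IR/InstrTypes.h"

// Allocation functions must return a pointer, so a cheap type check filters
// out most calls before any (slow) TargetLibraryInfo lookup.
static bool mightBeAllocationCall(const llvm::CallBase *CB) {
  return CB->getType()->isPointerTy(); // only then is a TLI query worthwhile
}
```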
2022-07-21 11:28:36 +02:00
Nikita Popov f45ab43332 [MemoryBuiltins] Avoid isAllocationFn() call before checking removable alloc
Allow directly checking whether a given call is a removable
allocation, instead of first checking whether it is an allocation
at all.
2022-07-21 09:39:19 +02:00
Congzhe Cao 05ccde8023 [LoopCacheAnalysis] Fix a type mismatch problem in cost calculation
There is a problem in loop cache analysis where the types of SCEV variables
`Coeff` and `ElemSize` in function `isConsecutive()` may not match. The
mismatch would cause SCEV failures when `Coeff` is multiplied with `ElemSize`.

The fix in this patch is to extend the type of both `Coeff` and `ElemSize` to
whichever of the two is wider. As a clean-up, duplicate calculations
of `Stride` in `computeRefCost()` are then removed.
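
A sketch of the idea using ScalarEvolution's existing helpers (not the exact
LoopCacheAnalysis code; the choice of extension here is illustrative):

```cpp
#include "llvm/Analysis/ScalarEvolution.h"

// Bring both SCEVs to the wider of their two types before forming the product,
// so getMulExpr never sees mismatched operand types.
static const llvm::SCEV *mulWithMatchedTypes(llvm::ScalarEvolution &SE,
                                             const llvm::SCEV *Coeff,
                                             const llvm::SCEV *ElemSize) {
  llvm::Type *Wider = SE.getWiderType(Coeff->getType(), ElemSize->getType());
  Coeff = SE.getNoopOrAnyExtend(Coeff, Wider);
  ElemSize = SE.getNoopOrAnyExtend(ElemSize, Wider);
  return SE.getMulExpr(Coeff, ElemSize);
}
```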

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D128877
2022-07-21 01:57:05 -04:00
Schrodinger ZHU Yifan 304027206c [ThinLTO] Support aliased GlobalIFunc
Fixes https://github.com/llvm/llvm-project/issues/56290: when an ifunc is
aliased in LTO, clang will attempt to create an alias summary; however, as the
ifunc is not included in the module summary, doing so will lead to a crash.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D129009
2022-07-20 15:30:38 -07:00
Philip Reames f494f89b2a [LAA] Fix latent missing check bug when mixing scalable and non-scalable strides
Noticed via inspection; to my knowledge, impossible to hit today.  In theory, we could have a fixed stride check be analyzed, then a scalable one.  With the old code, the scalable one would be silently dropped, and the runtime guard would go ahead with only the fixed one.  This would be a miscompile.
2022-07-20 11:56:45 -07:00
Max Kazantsev e0ccd190ae [SCEV][NFC][CT] Do not waste time proving contextual facts for unreached loops and blocks
In fact, in unreached code we can say that every fact is true. So do not waste time trying to
do something smarter.
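
A sketch of the short-circuit (illustrative wrapper, not the exact SCEV code):

```cpp
#include "llvm/IR/Dominators.h"

// Any fact holds vacuously in a block that is unreachable from the entry, so
// don't spend compile time constructing a real proof there.
static bool provedOrUnreachable(const llvm::DominatorTree &DT,
                                const llvm::BasicBlock *BB) {
  if (!DT.isReachableFromEntry(BB))
    return true; // vacuously true in dead code
  return false;  // fall through to the real reasoning
}
```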

Formally it's not an NFC because it may change query results in unreached code, but they
won't have any impact on execution.

Hypothetical CT boost expected but not measured in practice.

Differential Revision: https://reviews.llvm.org/D129878
2022-07-20 19:02:28 +07:00
Chuanqi Xu 645d2dd3a9 Revert "Don't treat readnone call in presplit coroutine as not access memory"
This reverts commit 57224ff4a6. This
commit may trigger crashes on some workloads. Revert it for clarity.
2022-07-20 17:00:58 +08:00
Chuanqi Xu 57224ff4a6 Don't treat readnone call in presplit coroutine as not access memory
To solve the readnone problems in coroutines. See
https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015
for details.

According to the discussion, we decide to fix the problem by inserting
isPresplitCoroutine() checks in different passes instead of
wrapping/unwrapping readnone attributes in CoroEarly/CoroCleanup passes.
In this direction, we might not be able to cover every case at first.
Let's take a "find and fix" strategy.
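
A hedged sketch of what such a per-pass check might look like (the predicate
is the real Function API; the wrapper is illustrative):

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

// A readnone call inside a presplit coroutine may still observe the current
// thread, which can change across a suspend point, so stay conservative there.
static bool mayTreatAsNotAccessingMemory(const llvm::CallBase &CB) {
  if (CB.getFunction()->isPresplitCoroutine())
    return false;
  return CB.doesNotAccessMemory();
}
```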

Reviewed By: nikic, nhaehnle, jyknight

Differential Revision: https://reviews.llvm.org/D127383
2022-07-20 10:37:23 +08:00
Nikita Popov 534b9246a2 [LoopInfo] Allow cloning of callbr
After D129288, callbr is safe to clone without special handling.
This permits optimizations like loop unroll and loop unswitch on
loops containing callbrs.

Fixes https://github.com/llvm/llvm-project/issues/41834.

Differential Revision: https://reviews.llvm.org/D129993
2022-07-19 09:57:28 +02:00
Max Kazantsev 51f837a680 [NFC] Introduce API to detect tokens penetrating LCSSA form
Following discussion in PR56243, we need to somehow detect the situation
when token values penetrate LCSSA form for transforms that require that
it is maintained by all values (for example, to sustain use-def dominance
invariants). This patch introduces a parameter to LCSSA checkers to control
whether they ignore tokens.

Differential Revision: https://reviews.llvm.org/D129983
Reviewed By: efriedma
2022-07-19 13:52:30 +07:00
Benjamin Kramer 4bd072c56b [LAA] Fix the build with older versions of Clang
llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: no viable conversion from returned value of type 'SmallVector<[...], 2>' to function return type 'SmallVector<[...], (default)
      CalculateSmallVectorDefaultInlinedElements<T>::value aka 3>'
    return Scevs;
           ^~~~~
2022-07-18 14:01:47 +02:00
Graham Hunter db8fcb2c25 [LAA] Add recursive IR walker for forked pointers
This builds on the previous forked pointers patch, which only accepted
a single select as the pointer to check. A recursive function to walk
through IR has been added, which searches for either a loop-invariant
or addrec SCEV.

This will only handle a single fork at present, so selects of selects
or a GEP with a select for both the base and offset will be rejected.

There is also a recursion limit with a cli option to change it.
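
An illustrative source pattern that produces such a forked pointer (not taken
from the tests):

```cpp
// The address is a select between two loop-invariant bases feeding the same
// GEP, i.e. select(cond, b, a) used as the pointer to check.
void fork_example(float *dst, const float *a, const float *b,
                  const int *use_b, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = (use_b[i] ? b : a)[i];
}
```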

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D108699
2022-07-18 12:06:17 +01:00
Kazu Hirata 601b3a13de [Analysis] Qualify auto variables in for loops (NFC) 2022-07-16 23:26:34 -07:00
Kazu Hirata 92a1b2afc8 [Analysis] Remove isArithmeticRecurrenceKind
The last use was removed on Jul 30, 2021 in commit
9d35594993.
2022-07-16 13:23:32 -07:00
Max Kazantsev 883e83d5fe [NFC][SCEV] Rename variable to correspond to its current meaning 2022-07-15 22:33:57 +07:00
Nikita Popov 2659e1bf4b [SCEV] List all binops in getOperandsToCreate()
Explicitly list all binops rather than having a default case. There
were two bugs here:
1. U->getOpcode() was used instead of BO->Opcode, which means we
   used the logic for the wrong opcode in some cases.
2. SCEV construction does not support LShr. We should return
   unknown for it rather than recursing into the operands.
2022-07-15 17:08:48 +02:00
Florian Hahn e7ec1746a6
[SCEV] Avoid creating unnecessary SCEVs for SelectInsts.
After 675080a453, we always create SCEVs for all operands of a
SelectInst. This can cause notable compile-time regressions compared to
the recursive algorithm, which only evaluates the operands if the select
is in a form for which we can create a usable expression.

This approach adds additional logic to getOperandsToCreate to only
queue operands for selects if we will later be able to construct a
usable SCEV.

Unfortunately this introduces a bit of coupling between actual SCEV
construction for selects and getOperandsToCreate, but I am not sure if
there are better alternatives to address the regression mentioned for
675080a453.

This doesn't have any notable compile-time impact on CTMark.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129731
2022-07-14 09:23:47 -07:00
Philip Reames 3bc09c7da5 [SCEVExpander] Allow udiv with isKnownNonZero(RHS) + add vscale case
Motivation here is to unblock LSRs ability to use ICmpZero uses - the major effect of which is to enable count down IVs. The test changes reflect this goal, but the potential impact is much broader since this isn't a change in LSR at all.

SCEVExpander needs(*) to prove that expanding the expression is safe anywhere the SCEV expression is valid. In general, we can't expand any node which might fault (or exhibit UB) unless we can either a) prove it won't fault, or b) guard the faulting case. We'd been allowing non-zero constants here; this change extends it to non-zero values.

vscale is never zero. This is already implemented in ValueTracking, and this change just adds the same logic in SCEV's range computation (which in turn drives isKnownNonZero). We should common up some logic here, but let's do that in separate changes.

(*) As an aside, "needs" is such an interesting word here. First, we don't actually need to guard this at all; we could choose to emit a select for the RHS of every udiv and remove this code entirely. Secondly, the property being checked here is way too strong. What the client actually needs is to expand the SCEV at some particular point in some particular loop. In the examples, the original urem dominates that loop and yet we completely ignore that information when analyzing legality. I don't plan to actively pursue either direction, just noting it for future reference.

Differential Revision: https://reviews.llvm.org/D129710
2022-07-14 08:56:58 -07:00
Dawid Jurczak d71128d97d [NFC][Metadata] Change MDNode::operands()'s return type from op_range to ArrayRef<MDOperand>
This patch is a follow-up to https://reviews.llvm.org/D129468 and addresses
one of the comments from that review: https://reviews.llvm.org/D129468#3643295

Differential Revision: https://reviews.llvm.org/D129565
2022-07-14 17:22:32 +02:00
Kazu Hirata 611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
Kazu Hirata 30d3f56e33 [Analysis] clang-format InlineAdvisor.cpp (NFC) 2022-07-13 13:38:50 -07:00
Max Kazantsev 30e33b4b81 [SCEV][NFC] Make getStrengthenedNoWrapFlagsFromBinOp return optional 2022-07-13 18:54:25 +07:00
Peter Waller 8acf74fd56 [InstCombine][SVE] Bail out of isSafeToLoadUnconditionally for scalable types
`isSafeToLoadUnconditionally` currently assumes sized types. Bail out for now.
This fixes a TypeSize warning reachable from instcombine via (load (select
cond, ptr, ptr)).
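
A sketch of the bail-out (illustrative; the patch itself may phrase the check
differently):

```cpp
#include "llvm/IR/DerivedTypes.h"

// The analysis reasons in fixed byte sizes, so give up for scalably-sized
// types rather than emitting a TypeSize warning.
static bool sizeIsUsable(llvm::Type *Ty) {
  return !llvm::isa<llvm::ScalableVectorType>(Ty);
}
```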

Differential Revision: https://reviews.llvm.org/D129477
2022-07-13 10:07:36 +00:00
Dawid Jurczak 165240fe38 [NFC] Fix compile time regression seen on some benchmarks after a630ea3003 commit
The goal of this change is to fix most of the compile-time slowdown seen after commit a630ea3003 on the lencod and sqlite3 benchmarks.
There are 3 improvements included in this patch:

1. In getNumOperands, when possible, get the value directly from SmallNumOps.
2. Inline getLargePtr by moving its definition to the header.
3. In TBAAStructTypeNode::getField, get all operands at once instead of taking them one by one in a loop.

Differential Revision: https://reviews.llvm.org/D129468
2022-07-12 15:00:27 +02:00
Aiden Grossman f3939dc509 [mlgo] Simplify autogenerated regalloc model
Currently the autogenerated regalloc model will sometimes
output an incorrect LR index to evict instead of the first LR
with the mask set to 1. This trips an assertion within
the MLRegallocAdvisor that the evicted LR has a mask of 1. This
patch, made possible by https://reviews.llvm.org/D124565, simplifies
the autogenerated model by taking away all unnecessary features and
getting rid of the functions that were previously used to mix in all
the necessary inputs so they wouldn't get pruned by the Tensorflow
XLA AOT compiler. This is no longer necessary after the previously
mentioned patch. This also fixes the nondeterministic behavior
that is sometimes observed where the autogenerated model will
simply output 0 instead of the correct index.

Reviewed By: yundiqian

Differential Revision: https://reviews.llvm.org/D129254
2022-07-11 13:23:31 -07:00
Mircea Trofin 24c6c35270 [mlgo] Don't provide default model URLs
Pointed out in Issue #56432: the current reference models may not be
quite friendly to open source projects. Their purpose is only
illustrative - the expectation is that projects would train their own.
To avoid unintentionally pulling such a model, this change makes the URL
cmake setting require an explicit user setting.

Differential Revision: https://reviews.llvm.org/D129342
2022-07-11 07:37:14 -07:00
David Sherwood 03fee6712a [LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extract the first bit of this mask and use this bit for the
conditional branch.
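
A rough scalar model of the two steps above (illustrative only, not the
vectorizer's code; assumes TripCount >= 1):

```cpp
#include <cstdint>
#include <vector>

// Lane I of the mask is active while Base + I < TripCount.
static std::vector<bool> activeLaneMask(uint64_t Base, uint64_t TC,
                                        unsigned VF) {
  std::vector<bool> M(VF);
  for (unsigned I = 0; I < VF; ++I)
    M[I] = Base + I < TC;
  return M;
}

// The mask for the *next* iteration is computed inside the loop and its first
// lane, rather than an IV/trip-count compare, controls the back edge.
void vectorLoopModel(uint64_t TripCount, unsigned VF) {
  uint64_t IV = 0;
  std::vector<bool> Mask = activeLaneMask(0, TripCount, VF);
  do {
    // ... vector body, with work predicated on Mask ...
    IV += VF;
    Mask = activeLaneMask(IV, TripCount, VF); // mask for the next iteration
  } while (Mask[0]);                          // branch on the first bit
}
```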

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
2022-07-11 13:46:55 +01:00
Nicolai Hähnle ede600377c ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)

Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.
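
A sketch of the typical replacement pattern described above (the type and
names are placeholders):

```cpp
// Before:
//   static llvm::ManagedStatic<SomeRegistry> TheRegistry;
// After: a getter with a function-local static, constructed on first use.
struct SomeRegistry { /* stand-in for the real type */ };

static SomeRegistry &getSomeRegistry() {
  static SomeRegistry R; // initialised lazily and thread-safely (C++11)
  return R;
}
```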

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 10:29:15 +02:00
Nicolai Hähnle e9ce1a5880 Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f06245.

Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
2022-07-10 09:54:30 +02:00
Nicolai Hähnle e6f1f06245 ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 09:15:08 +02:00
Wenlei He a78f436c3f [Inliner] Make recursive inlinee stack size limit tunable
For recursive callers, we want to be conservative when inlining callees with large stack size. We currently have a limit `InlineConstants::TotalAllocaSizeRecursiveCaller`, but that is hard coded.

We found the current limit insufficient to suppress problematic inlining that bloats stack size for deep recursion. This change adds a switch to make the limit tunable as a mitigation.
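
An illustrative sketch of such a switch; the flag name below is hypothetical
and the real name and wiring are whatever D129411 uses:

```cpp
#include "llvm/Analysis/InlineCost.h"
#include "llvm/Support/CommandLine.h"

// Hypothetical flag exposing the previously hard-coded limit as a tunable.
static llvm::cl::opt<uint64_t> RecursiveCallerAllocaLimit(
    "hypothetical-recursive-caller-alloca-limit", llvm::cl::Hidden,
    llvm::cl::init(llvm::InlineConstants::TotalAllocaSizeRecursiveCaller),
    llvm::cl::desc("Max total alloca size of a callee inlined into a "
                   "recursive caller"));
```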

Differential Revision: https://reviews.llvm.org/D129411
2022-07-08 21:32:39 -07:00