Commit Graph

11852 Commits

Author SHA1 Message Date
Nikita Popov 5b1d10bda6 [AA] Drop setModAndRef() function (NFC)
Without the "must" state, this function is pointless, because we
can just directly create a ModRef instead.
2022-08-01 07:55:39 +02:00
Nikita Popov 34683c3e35 [MSSA] Fix expensive checks build 2022-08-01 07:28:52 +02:00
Nikita Popov f96ea53e89 [AA] Do not track Must in ModRefInfo
getModRefInfo() queries currently track whether the result is a
MustAlias on a best-effort basis. The only user of this functionality
is the optimized memory access type in MemorySSA -- which in turn
has no users. Given that this functionality has not found a user
since it was introduced five years ago (in D38862), I think we
should drop it again.

The context is that I'm working to separate FunctionModRefBehavior
to track mod/ref for different location kinds (like argmem or
inaccessiblemem) separately, and the fact that ModRefInfo also has
an unrelated Must flag makes this quite awkward, especially as this
means that NoModRef is not a zero value. If we want to retain the
functionality, I would probably split getModRefInfo() results into
a part that just contains the ModRef information, and a separate
part containing a (best-effort) AliasResult.

Differential Revision: https://reviews.llvm.org/D130713
2022-08-01 07:14:31 +02:00
Kazu Hirata bf6021709a Use drop_begin (NFC) 2022-07-31 15:17:09 -07:00
Sanjay Patel 02b3a35892 [InstSimplify] fold FP rounding intrinsic with rounded operand
issue #56775

I rearranged the Thumb2 codegen test to avoid simplifying the chain
of rounding instructions. I'm assuming the intent of the test is
to verify lowering of each of those intrinsics.
2022-07-31 10:00:27 -04:00
Kazu Hirata 60db8d9b4e Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2022-07-30 10:35:48 -07:00
Nuno Lopes d4b4747de5 ConstantFolding: fold OOB accesses to poison instead of undef 2022-07-30 15:20:32 +01:00
Nuno Lopes fffabd5348 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-07-30 13:55:56 +01:00
Florian Hahn 214e2d8fe5
[SCEV] Avoid repeated proveNoSignedWrapViaInduction calls.
At the moment, proveNoSignedWrapViaInduction may be called for the
same AddRec a large number of times via getSignExtendExpr. This can have
a severe compile-time impact for very loop-heavy code.

If proveNoSignedWrapViaInduction failed to prove NSW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.

This is the signed version of 8daa338297 / D130648.

This can drastically improve compile-time in some excessive cases and
also has a slightly positive compile-time impact on CTMark:

NewPM-O3: -0.06%
NewPM-ReleaseThinLTO: -0.04%
NewPM-ReleaseLTO-g: -0.04%

https://llvm-compile-time-tracker.com/compare.php?from=8daa338297d533db4d1ae8d3770613eb25c29688&to=aed126a196e7a5a9803543d9b4d6bdb233d0009c&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130694
2022-07-29 09:15:03 +01:00
Liqiang Tao d52e775b05 [llvm][ModuleInliner] Add inline cost priority for module inliner
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130012
2022-07-28 22:44:03 +08:00
Liqiang Tao c113594378 Revert "[llvm][ModuleInliner] Add inline cost priority for module inliner"
This reverts commit bb7f62bbbd.
2022-07-28 22:36:28 +08:00
Liqiang Tao bb7f62bbbd [llvm][ModuleInliner] Add inline cost priority for module inliner
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130012
2022-07-28 21:28:07 +08:00
Florian Hahn 8daa338297
[SCEV] Avoid repeated proveNoUnsignedWrapViaInduction calls.
At the moment, proveNoUnsignedWrapViaInduction may be called for the
same AddRec a large number of times via getZeroExtendExpr. This can have
a severe compile-time impact for very loop-heavy code. One one
particular workload, LSR takes ~51s without this patch, almost
exlusively in proveNoUnsignedWrapViaInduction. With this patch, the time
in LSR drops to ~0.4s.

If proveNoUnsignedWrapViaInduction failed to prove NUW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.

Besides drastically improving compile-time in some excessive cases, this
also has a slightly positive compile-time impact on CTMark:

NewPM-O3: -0.07%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.06

https://llvm-compile-time-tracker.com/compare.php?from=b435da027d7774c24cdb8c88d09f6b771e07fb14&to=f2729e33e8284b502f6c35a43345272252f35d12&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130648
2022-07-28 10:02:19 +01:00
Max Kazantsev 2d1c6e0b44 [LAA] Remove block order sensitivity in LAA algorithm. PR56672
As test in PR56672 shows, LAA produces different results which lead to either
positive or negative vectorization decisions depending on the order of blocks
in loop. The exact reason of this is not clear to me, however this makes investigation
of related bugs extremely complex.

Current order of blocks in the loop is arbitrary. It may change, for example, if loop
info analysis is dropped and recomputed. Seems that it interferes with LAA's logic.
This patch chooses fixed traversal order of blocks in loops, making it RPOT.

Note: this is *not* a fix for bug with incorrect analysis result. It just makes
the answer more robust to make the investigation easier.

Differential Revision: https://reviews.llvm.org/D130482
Reviewed By: aeubanks, fhahn
2022-07-28 13:36:56 +07:00
Paul Kirth 6e9bab71b6 Revert "[llvm][NFC] Refactor code to use ProfDataUtils"
This reverts commit 300c9a7881.

We will reland once these issues are ironed out.
2022-07-27 21:38:11 +00:00
Paul Kirth 300c9a7881 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-07-27 21:13:54 +00:00
Stanislav Mekhanoshin 0562cf442f Allow data prefetch into non-default address space
I am playing with the LoopDataPrefetch pass and found out that it
bails to work with a pointer in a non-zero address space. This
patch adds the target callback to check if an address space is to
be considered for prefetching. Default implementation still only
allows address space 0, so this is NFCI.

This does not currently affect any known targets, but seems to be
generally useful for the future.

Differential Revision: https://reviews.llvm.org/D129795
2022-07-27 10:01:26 -07:00
Martin Sebor 4447603616 [InstCombine] Fold strtoul and strtoull and avoid PR #56293
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D129224
2022-07-26 14:11:40 -06:00
Sanjay Patel dcd09467b0 [InstSimplify] remove redundant calls to 'isImplied'; NFCI
We already call the more general isImpliedCondition() (which calls
isImpliedTrueByMatchingCmp() internally) from simplifyAndInst()
and simplifyOrInst().

There was a difference visible with this change on a vector test
before a925bef70c, but I can't find any gaps now.
2022-07-26 14:47:21 -04:00
Arthur Eubanks 2eade1dba4 [WPD] Use new llvm.public.type.test intrinsic for potentially publicly visible classes
Turning on opaque pointers has uncovered an issue with WPD where we currently pattern match away `assume(type.test)` in WPD so that a later LTT doesn't resolve the type test to undef and introduce an `assume(false)`. The pattern matching can fail in cases where we transform two `assume(type.test)`s into `assume(phi(type.test.1, type.test.2))`.

Currently we create `assume(type.test)` for all virtual calls that might be devirtualized. This is to support `-Wl,--lto-whole-program-visibility`.

To prevent this, all virtual calls that may not be in the same LTO module instead use a new `llvm.public.type.test` intrinsic in place of the `llvm.type.test`. Then when we know if `-Wl,--lto-whole-program-visibility` is passed or not, we can either replace all `llvm.public.type.test` with `llvm.type.test`, or replace all `llvm.public.type.test` with `true`. This prevents WPD from trying to pattern match away `assume(type.test)` for public virtual calls when failing the pattern matching will result in miscompiles.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D128955
2022-07-26 08:01:08 -07:00
Sinan Lin 7a2f5dca09 [CodeMetrics] use hasOneLiveUse instead of hasOneUse while analyzing inlinable callsites
It would be better for CodeMetrics to use hasOneLiveUse while analyzing
static and called once callsites, since inline cost now uses
hasOneLiveUse instead of hasOneUse to avoid overpessimization on dead
constant cases (since this patch https://reviews.llvm.org/D109294).

This change has no noticeable influence now, but it helps improve the
accuracy of cost models of passes that use CodeMetrics.

Reviewed By: fhahn, nikic

Differential Revision: https://reviews.llvm.org/D130461
2022-07-26 13:46:19 +08:00
Augie Fackler 85063090e9 MemoryBuiltins: remove malloc-family funcs from list
We no longer need specialized knowledge of these allocator functions in
this file since we have the correct attributes available now.

As far as I can tell the changes in the attributor tests are due to
things getting more consistent on alloc-family once we remove the static
list entries.

The two test changes in NewGVN merit extra scrutiny: NewGVN appears to
be _extremely_ sensitive to the inaccessiblememonly for reasons that
are beyond me. As a result, I had-enumerated all the attributes on
allocation functions in those two tests instead of using -inferattrs.
I assumed that the two -disable-simplify-libcalls tests there no
longer are sensible since the function declaration now includes all the
relevant attributes.

Differential Revision: https://reviews.llvm.org/D130107
2022-07-25 17:29:01 -04:00
Benjamin Kramer 5fde785186 [ValueTracking] Fix unused variable warning in release builds. NFC 2022-07-25 13:28:32 +02:00
Peter Waller f8919d2f7e [NFC][GVN] Put phi-translation of 'add' behind a switch
The code in this `#if 0` block appears to be a net benefit. Put it
behind a switch defaulting to off to support experimentation and as a
request for comment.

The codegen impact of enabling this that I'm currently persuing is that
it allows PRE to take place more frequently, particularly in loops with
second order recurrences.

Preliminary experimental data:

Across LNT on AArch64, 54 benchmarks are sped up by >1%, and 42 are
regressed by >1%, the geomean (exec_time_enabled / exec_time_disabled)
of these 96 "1% or greater significance" benchmarks is 0.991. For the
full set of 770 benchmarks it's 0.998.

There are two benchmarks which experience a >30% speedup, and the worst
slowdown is ~12%, and for every benchmark with a slowdown there is a
benckmark which is sped up by a greater factor.

Differential Revision: https://reviews.llvm.org/D130241
2022-07-25 07:59:47 +00:00
Max Kazantsev a053f35990 [SCEV][NFC][CT] Cheaper handling of guards in isBasicBlockEntryGuardedByCond
Handle guards uniformly with assumes, rather than iterating through all
block instructions in attempt to find them.

Differential Revision: https://reviews.llvm.org/D129874
Reviewed By: nikic
2022-07-25 13:38:59 +07:00
Kazu Hirata b5188591a0 [llvm] Remove redundaunt virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Kazu Hirata acf648b5e9 Use llvm::less_first and llvm::less_second (NFC) 2022-07-24 16:21:29 -07:00
Sanjay Patel a925bef70c [ValueTracking] allow vector types in isImpliedCondition()
The matching of constants assumed integers, but we can handle
splat vector constants seamlessly with m_APInt.
2022-07-24 17:46:48 -04:00
Kazu Hirata 97718180d7 [Analysis] Remove a redundant return statement (NFC)
Identified with readability-redundant-control-flow.
2022-07-23 11:35:19 -07:00
Malhar Jajoo 41958f76d8 [Costmodel] Add "type-based-intrinsic-cost" cli option
This patch adds a command line flag to be able to test
the type based cost-model analysis for Intrinsics.

Differential Revision: https://reviews.llvm.org/D129109
2022-07-22 15:50:57 +01:00
Teresa Johnson 1dad6247d2 [MemProf] Add memprof metadata related analysis utilities
Adds a number of utilities that are used to help create and update
memprof related metadata. These will be used during profile matching
and annotation, as well as by the inliner when updating the metadata.
Also adds unit tests for the utilities.

See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]
(Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceeding patch adding the basic
metadata and verification support.)

Depends on D128141.

Differential Revision: https://reviews.llvm.org/D128854
2022-07-21 13:46:01 -07:00
Augie Fackler 62f48cadfd MemoryBuiltins: accept non-TLI funcs with attribs as allocator funcs
This allows us to accept annotations from out-of-tree languages (the
example test is derived from Rust) so they can enjoy the benefits of
LLVM's optimizations without requiring LLVM to have language-specific
knowledge.

Differential Revision: https://reviews.llvm.org/D123091
2022-07-21 15:31:16 -04:00
Augie Fackler 5a3e3675f6 MemoryBuiltins: start using properties of functions
Prior to this change, we relied on the hard-coded list for all of the
information performed by MemoryBuiltins. With this change, we're able to
start relying on properites of functions described in attributes, which
opens the door to out-of-tree compilers being able to describe their
allocator functions to LLVM's optimizer logic without having to register
their implementation details with LLVM.

Differential Revision: https://reviews.llvm.org/D123090
2022-07-21 15:31:15 -04:00
Arthur Eubanks 04d398db46 [LoopAccessAnalysis] Simplify D119047
No need to add checks for every type per pointer that we couldn't create
a check for the first time around, just the types that weren't
successful.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D119376
2022-07-21 12:16:02 -07:00
David Sherwood f15b6b2907 [AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.

Currently the default option is "disabled", but this will be
changed in a later patch.

I've added new tests to show the options behave as expected here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560
2022-07-21 17:20:06 +01:00
Nikita Popov 1f69503107 [MemoryBuiltins] Add getReallocatedOperand() function (NFC)
Replace the value-accepting isReallocLikeFn() overload with a
getReallocatedOperand() function, which returns which operand is
the one being reallocated. Currently, this is always the first one,
but once allockind(realloc) is respected, the reallocated operand
will be determined by the allocptr parameter attribute.
2022-07-21 14:54:16 +02:00
Nikita Popov 46e6dd84b7 [MemoryBuiltins] Remove isFreeCall() function (NFC)
Remove isFreeCall() in favor of getFreedOperand(). Replace the
two remaining uses with a getFreedOperand() != nullptr check, as
they only care that something is getting freed. (The usage in DSE
is correct as such. The allocator-related checks in CFLGraph look
rather questionable in general.)
2022-07-21 14:44:23 +02:00
Nikita Popov c81dff3c30 [MemoryBuiltins] Add getFreedOperand() function (NFCI)
We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.

To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.

This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
2022-07-21 12:39:35 +02:00
Nikita Popov d144ae6e1b [MemoryBuiltins] Default to trivial mapper in getAllocSize() (NFC)
Default getAllocSize() to use the trivial mapper. Also switch
from using std::function to function_ref.

Furthermore, update the doc comment to point out a subtle difference
between getAllocSize() and getObjectSize(): The latter may also
return something for calls that return their argument (via "returned"
attribute or special intrinsics like invariant groups).
2022-07-21 11:43:48 +02:00
Nikita Popov 235fb602ed [MemoryBuiltins] Don't query TLI for non-pointer functions (NFC)
Fetching allocation data for calls is a rather hot operation, and
TLI lookups are slow. We can greatly reduce the number of calls
for which TLI is queried by checking that they return a pointer
value first, as this is a requirement for allocation functions
anyway.
2022-07-21 11:28:36 +02:00
Nikita Popov f45ab43332 [MemoryBuiltins] Avoid isAllocationFn() call before checking removable alloc
Alloc directly checking whether a given call is a removable
allocation, instead of first checking whether it is an allocation
first.
2022-07-21 09:39:19 +02:00
Congzhe Cao 05ccde8023 [LoopCacheAnalysis] Fix a type mismatch problem in cost calculation
There is a problem in loop cache analysis that the types of SCEV variables
`Coeff` and `ElemSize` in function `isConsecutive()` may not match. The
mismatch would cause SCEV failures when `Coeff` is multiplied with `ElemSize`.

The fix in this patch is to extend the type of both `Coeff` and `ElemSize` to
whichever is wider in those two variables. As a clean-up, duplicate calculations
of `Stride` in `computeRefCost()` is then removed.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D128877
2022-07-21 01:57:05 -04:00
Schrodinger ZHU Yifan 304027206c [ThinLTO] Support aliased GlobalIFunc
Fixes https://github.com/llvm/llvm-project/issues/56290: when an ifunc is
aliased in LTO, clang will attempt to create an alias summary; however, as ifunc
is not included in the module summary, doing so will lead to crash.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D129009
2022-07-20 15:30:38 -07:00
Philip Reames f494f89b2a [LAA] Fix latent missing check bug when mixing scalable and non-scalabe strides
Noticed via inspection; to my knowledge, impossible to hit today.  In theory, we could have a fixed stride check be analyzed, then a scalable one.  With the old code, the scalable one would be silently dropped, and the runtime guard would go ahead with only the fixed one.  This would be a miscompile.
2022-07-20 11:56:45 -07:00
Max Kazantsev e0ccd190ae [SCEV][NFC][CT] Do not waste time proving contextual facts for unreached loops and blocks
In fact, in unreached code we can say that every fact is true. So do not waste time trying to
do something smarter.

Formally it's not an NFC because it may change query results in unreached code, but they
won't have any impact on execution.

Hypothetical CT boost expected but not measured in practice.

Differential Revision: https://reviews.llvm.org/D129878
2022-07-20 19:02:28 +07:00
Chuanqi Xu 645d2dd3a9 Revert "Don't treat readnone call in presplit coroutine as not access memory"
This reverts commit 57224ff4a6. This
commit may trigger crashes on some workloads. Revert it for clearness.
2022-07-20 17:00:58 +08:00
Chuanqi Xu 57224ff4a6 Don't treat readnone call in presplit coroutine as not access memory
To solve the readnone problems in coroutines. See
https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015
for details.

According to the discussion, we decide to fix the problem by inserting
isPresplitCoroutine() checks in different passes instead of
wrapping/unwrapping readnone attributes in CoroEarly/CoroCleanup passes.
In this direction, we might not be able to cover every case at first.
Let's take a "find and fix" strategy.

Reviewed By: nikic, nhaehnle, jyknight

Differential Revision: https://reviews.llvm.org/D127383
2022-07-20 10:37:23 +08:00
Nikita Popov 534b9246a2 [LoopInfo] Allow cloning of callbr
After D129288, callbr is safe to clone without special handling.
This permits optimizations like loop unroll and loop unswitch on
loops containing callbrs.

Fixes https://github.com/llvm/llvm-project/issues/41834.

Differential Revision: https://reviews.llvm.org/D129993
2022-07-19 09:57:28 +02:00
Max Kazantsev 51f837a680 [NFC] Introduce API to detect tokens penetrating LCSSA form
Following discussion in PR56243, we need to somehow detect the situation
when token values penetrate LCSSA form for transforms that require that
it is maintained by all values (for example, to sustain use-def dominance
invarians). This patch introduces a parameter to LCSSA checkers to control
their ignorance about tokens.

Differential Revision: https://reviews.llvm.org/D129983
Reviewed By: efriedma
2022-07-19 13:52:30 +07:00
Benjamin Kramer 4bd072c56b [LAA] Fix the build with older versions of Clang
llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: no viable conversion from returned value of type 'SmallVector<[...], 2>' to function return type 'SmallVector<[...], (default)
      CalculateSmallVectorDefaultInlinedElements<T>::value aka 3>'
    return Scevs;
           ^~~~~
2022-07-18 14:01:47 +02:00
Graham Hunter db8fcb2c25 [LAA] Add recursive IR walker for forked pointers
This builds on the previous forked pointers patch, which only accepted
a single select as the pointer to check. A recursive function to walk
through IR has been added, which searches for either a loop-invariant
or addrec SCEV.

This will only handle a single fork at present, so selects of selects
or a GEP with a select for both the base and offset will be rejected.

There is also a recursion limit with a cli option to change it.

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D108699
2022-07-18 12:06:17 +01:00
Kazu Hirata 601b3a13de [Analysis] Qualify auto variables in for loops (NFC) 2022-07-16 23:26:34 -07:00
Kazu Hirata 92a1b2afc8 [Analysis] Remove isArithmeticRecurrenceKind
The last use was removed on Jul 30, 2021 in commit
9d35594993.
2022-07-16 13:23:32 -07:00
Max Kazantsev 883e83d5fe [NFC][SCEV] Rename variable to correspond its current meaning 2022-07-15 22:33:57 +07:00
Nikita Popov 2659e1bf4b [SCEV] List all binops in getOperandsToCreate()
Explicitly list all binops rather than having a default case. There
were two bugs here:
1. U->getOpcode() was used instead of BO->Opcode, which means we
   used the logic for the wrong opcode in some cases.
2. SCEV construction does not support LShr. We should return
   unknown for it rather than recursing into the operands.
2022-07-15 17:08:48 +02:00
Florian Hahn e7ec1746a6
[SCEV] Avoid creating unnecessary SCEVs for SelectInsts.
After 675080a453, we always create SCEVs for all operands of a
SelectInst. This can cause notable compile-time regressions compared to
the recursive algorithm, which only evaluates the operands if the select
is in a form we can create a usable expression.

This approach adds additional logic to getOperandsToCreate to only
queue operands for selects if we will later be able to construct a
usable SCEV.

Unfortunately this introduces a bit of coupling between actual SCEV
construction for selects and getOperandsToCreate, but I am not sure if
there are better alternatives to address the regression mentioned for
675080a453.

This doesn't have any notable compile-time impact on CTMark.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D129731
2022-07-14 09:23:47 -07:00
Philip Reames 3bc09c7da5 [SCEVExpander] Allow udiv with isKnownNonZero(RHS) + add vscale case
Motivation here is to unblock LSRs ability to use ICmpZero uses - the major effect of which is to enable count down IVs. The test changes reflect this goal, but the potential impact is much broader since this isn't a change in LSR at all.

SCEVExpander needs(*) to prove that expanding the expression is safe anywhere the SCEV expression is valid. In general, we can't expand any node which might fault (or exhibit UB) unless we can either a) prove it won't fault, or b) guard the faulting case. We'd been allowing non-zero constants here; this change extends it to non-zero values.

vscale is never zero. This is already implemented in ValueTracking, and this change just adds the same logic in SCEV's range computation (which in turn drives isKnownNonZero). We should common up some logic here, but let's do that in separate changes.

(*) As an aside, "needs" is such an interesting word here. First, we don't actually need to guard this at all; we could choose to emit a select for the RHS of ever udiv and remove this code entirely. Secondly, the property being checked here is way too strong. What the client actually needs is to expand the SCEV at some particular point in some particular loop. In the examples, the original urem dominates that loop and yet we completely ignore that information when analyzing legality. I don't plan to actively pursue either direction, just noting it for future reference.

Differential Revision: https://reviews.llvm.org/D129710
2022-07-14 08:56:58 -07:00
Dawid Jurczak d71128d97d [NFC][Metadata] Change MDNode::operands()'s return type from op_range to ArrayRef<MDOperand>
This patch is https://reviews.llvm.org/D129468 follow-up and address one of comment
coming from that review: https://reviews.llvm.org/D129468#3643295

Differential Revision: https://reviews.llvm.org/D129565
2022-07-14 17:22:32 +02:00
Kazu Hirata 611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
Kazu Hirata 30d3f56e33 [Analysis] clang-format InlineAdvisor.cpp (NFC) 2022-07-13 13:38:50 -07:00
Max Kazantsev 30e33b4b81 [SCEV][NFC] Make getStrengthenedNoWrapFlagsFromBinOp return optional 2022-07-13 18:54:25 +07:00
Peter Waller 8acf74fd56 [InstCombine][SVE] Bail out of isSafeToLoadUnconditionally for scalable types
`isSafeToLoadUnconditionally` currently assumes sized types. Bail out for now.
This fixes a TypeSize warning reachable from instcombine via (load (select
cond, ptr, ptr)).

Differential Revision: https://reviews.llvm.org/D129477
2022-07-13 10:07:36 +00:00
Dawid Jurczak 165240fe38 [NFC] Fix compile time regression seen on some benchmarks after a630ea3003 commit
The goal of this change is fixing most of compile time slowdown seen after a630ea3003 commit on lencod and sqlite3 benchmarks.
There are 3 improvements included in this patch:

1. In getNumOperands when possible get value directly from SmallNumOps.
2. Inline getLargePtr by moving its definition to header.
3. In TBAAStructTypeNode::getField get all operands once instead taking operands in loop one after one.

Differential Revision: https://reviews.llvm.org/D129468
2022-07-12 15:00:27 +02:00
Aiden Grossman f3939dc509 [mlgo] Simplify autogenerated regalloc model
Currently the autogenerated regalloc model will sometimes
output an incorrect LR index to evict instead of the first LR
with with the mask set to 1. This trips an assertion within
the MLRegallocAdvisor that the evicted LR has a mask of 1. This
patch, made possible by https://reviews.llvm.org/D124565, simplifies
the autogenerated model by taking away all unnecessary features and
getting rid of the functions that were previously to mix in all
the necessary inputs so they wouldn't get pruned by the Tensorflow
XLA AOT compiler. This is no longer necessary after the previously
mentioned patch. This also fixes the nondeterministic behavior
that is sometimes observed where the autogenerated model will
simply output 0 instead of the correct index.

Reviewed By: yundiqian

Differential Revision: https://reviews.llvm.org/D129254
2022-07-11 13:23:31 -07:00
Mircea Trofin 24c6c35270 [mlgo] Don't provide default model URLs
Pointed out in Issue #56432: the current reference models may not be
quite friendly to open source projects. Their purpose is only
illustrative - the expectation is that projects would train their own.
To avoid unintentionally pulling such a model, made the URL cmake
setting require explicit user setting.

Differential Revision: https://reviews.llvm.org/D129342
2022-07-11 07:37:14 -07:00
David Sherwood 03fee6712a [LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extract the first bit of this mask and use this bit for the
conditional branch.

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
2022-07-11 13:46:55 +01:00
Nicolai Hähnle ede600377c ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)

Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 10:29:15 +02:00
Nicolai Hähnle e9ce1a5880 Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f06245.

Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
2022-07-10 09:54:30 +02:00
Nicolai Hähnle e6f1f06245 ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 09:15:08 +02:00
Wenlei He a78f436c3f [Inliner] Make recusive inlinee stack size limit tunable
For recursive callers, we want to be conservative when inlining callees with large stack size. We currently have a limit `InlineConstants::TotalAllocaSizeRecursiveCaller`, but that is hard coded.

We found the current limit insufficient to suppress problematic inlining that bloats stack size for deep recursion. This change adds a switch to make the limit tunable as a mitigation.

Differential Revision: https://reviews.llvm.org/D129411
2022-07-08 21:32:39 -07:00
Nikita Popov d686ea32b1 [ConstantFolding] Guard against unfolded FP binop
Check that the operation actually folded before trying to flush
denormals. A minor variation of the pr33453 test exposed this
with the FP binops marked as undesirable.
2022-07-08 17:45:33 +02:00
Nikita Popov 4a579abd9f [GlobalsModRef] Don't override getModRefBehavior() for CallBase
BasicAA will already call getModRefBehavior() on the Function of
the CallBase if there are no operand bundles. This happens through
getBestAAResults(), i.e. it is a recursive call that will query
other AA providers, not just the BasicAA implementation.

As such, there is no need to reimplement the same functionality
in GlobalsModRef, a combination of BasicAA and GlobalsModRef already
handles it. This does mean that this no longer works under
-disable-basic-aa, but that's a testing only option.
2022-07-07 10:35:44 +02:00
Nikita Popov f96cb66d19 [ValueTracking] Accept Instruction in isSafeToSpeculativelyExecute() (NFC)
As constant expressions can no longer trap, it only makes sense to
call isSafeToSpeculativelyExecute on Instructions, so limit the
API to accept only them, rather than general Operators or Values.
2022-07-06 11:12:49 +02:00
Nikita Popov 8ee913d83b [IR] Remove Constant::canTrap() (NFC)
As integer div/rem constant expressions are no longer supported,
constants can no longer trap and are always safe to speculate.
Remove the Constant::canTrap() method and its usages.
2022-07-06 10:36:47 +02:00
Nikita Popov 935570b2ad [ConstExpr] Don't create div/rem expressions
This removes creation of udiv/sdiv/urem/srem constant expressions,
in preparation for their removal. I've added a
ConstantExpr::isDesirableBinOp() predicate to determine whether
an expression should be created for a certain operator.

With this patch, div/rem expressions can still be created through
explicit IR/bitcode, forbidding them entirely will be the next step.

Differential Revision: https://reviews.llvm.org/D128820
2022-07-05 15:54:53 +02:00
Nikita Popov e4d1d0cc2c [SCEV] Fix isImpliedViaMerge() with values from previous iteration (PR56242)
When trying to prove an implied condition on a phi by proving it
for all incoming values, we need to be careful about values coming
from a backedge, as these may refer to a previous loop iteration.
A variant of this issue was fixed in D101829, but the dominance
condition used there isn't quite right: It checks that the value
dominates the incoming block, which doesn't exclude backedges
(values defined in a loop will usually dominate the loop latch,
which is the incoming block of the backedge).

Instead, we should be checking for domination of the phi block.
Any values defined inside the loop will not dominate the loop
header phi.

Fixes https://github.com/llvm/llvm-project/issues/56242.

Differential Revision: https://reviews.llvm.org/D128640
2022-07-05 15:31:23 +02:00
Nikita Popov f93cd56262 [BPI] Avoid ConstantExpr::get()
Use ConstantFoldBinaryOpOperands() instead, to prepare for the case
where not all binary operators have a constant expression form.

I believe this code actually intended to set OnlyIfReduced=true,
however ConstantExpr::get() actually accepts a Flags argument at
that position (and OnlyIfReducedTy as the next argument), so this
ended up creating a constant expression with some random flag
(probably exact or nuw depending on which).
2022-07-04 16:04:26 +02:00
Nikita Popov 4905bcac00 [ConstantFolding] Check return value of ConstantFoldInstOperandsImpl()
This operation is fallible, but ConstantFoldConstantImpl() is not.
If we fail to fold, we should simply return the original expression.

I don't think this can cause any issues right now, but it becomes
a problem if once make ConstantFoldInstOperandsImpl() not create a
constant expression for everything it possibly could.
2022-07-04 14:19:59 +02:00
Chen Zheng 2c3784cff8 [SCEV] recognize llvm.annotation intrinsic
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D127835
2022-07-03 21:02:50 -04:00
Nuno Lopes 53dc0f1078 [NFC] Switch a few uses of undef to poison as placeholders for unreachble code 2022-07-03 14:34:03 +01:00
Nikita Popov 560e694d48 [AST] Don't assert instruction reads/writes memory (PR51333)
This function is well-defined for an instruction that doesn't access
memory (and thus trivially doesn't alias anything in the AST), so
drop the assert. We can end up with a readnone call here if we
originally created a MemoryDef for an indirect call, which was
later replaced with a direct readnone call.

Fixes https://github.com/llvm/llvm-project/issues/51333.

Differential Revision: https://reviews.llvm.org/D127947
2022-07-01 17:04:48 +02:00
Nikita Popov c8bd3e7825 [SCEV] Remove unnecessary pointer handling in BuildConstantFromSCEV (NFCI)
Nowadays, we do not allow pointers in multiplies, and adds can only
have a single pointer, which is also guaranteed to be last by
complexity sorting. As such, we can somewhat simplify the treatment
of pointer types.
2022-07-01 16:28:56 +02:00
Chen Zheng 758de0e931 [InstructionSimplify] handle denormal input for fcmp
Handle denormal constant input for fcmp instructions based on the
denormal handling mode.

Reviewed By: spatel, dcandler

Differential Revision: https://reviews.llvm.org/D128647
2022-07-01 03:51:28 -04:00
Nikita Popov 9ac386495d [ConstExpr] Don't create insertvalue expressions
In preparation for the removal in D128719, this stops creating
insertvalue constant expressions (well, unless they are directly
used in LLVM IR).

Differential Revision: https://reviews.llvm.org/D128792
2022-07-01 09:23:28 +02:00
Fangrui Song 27abff670b Remove unneeded cl::ZeroOrMore. NFC 2022-06-30 19:11:27 -07:00
Nikita Popov 0445c340ff [ConstantFold] Support loads in ConstantFoldInstOperands()
This allows all constant folding to happen through a single
function, without requiring special handling for loads at each
call-site.

This may not be NFC because some callers currently don't do that
special handling.
2022-06-30 12:18:15 +02:00
Nikita Popov 54fcde42c0 [InlineCost] Simplify constant folding
Use a common ConstantFoldInstOperands-based constant folding
implementation, instead of specifying the folding function for
each function individually. Going through the generic handling
doesn't appear to have any significant compile-time impact.

As the test change shows, this is not NFC, because we now use
DataLayout-aware constant folding, which can do slightly better
in some cases (e.g. those involving GEPs).
2022-06-30 11:49:17 +02:00
Nikita Popov a6d4b4138f [ConstantFold] Supports compares in ConstantFoldInstOperands()
Support compares in ConstantFoldInstOperands(), instead of
forcing the use of ConstantFoldCompareInstOperands(). Also handle
insertvalue (extractvalue was already handled).

This removes a footgun, where many uses of ConstantFoldInstOperands()
need a separate check for compares beforehand. It's particularly
insidious if called on a constant expression, because it doesn't
fail in that case, but will just not do DL-dependent folding.
2022-06-30 11:05:24 +02:00
Chuanqi Xu 0b5ead6590 [WebAssembly] Don't set musttail for coroutines when tail-call is not
enabled

The C++20 Coroutines couldn't be compiled to WebAssembly due to an
optimization named symmetric transfer requires the support for musttail
calls but WebAssembly doesn't support it yet.

This patch tries to fix the problem by adding a supportsTailCalls
method to TargetTransformImpl to skip the symmetric transfer when
tail-call feature is not supported.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D128794
2022-06-30 11:15:40 +08:00
Nikita Popov 30ea6a0636 [SCEV] Don't create udiv constant expression (NFC)
Work on APInts to make it clear that this will not create a
constant expression.

This code path is not reached if the RHS is zero.
2022-06-29 14:35:05 +02:00
Florian Hahn 675080a453
[SCEV] Construct SCEV iteratively.
This patch updates SCEV construction to work iteratively instead of recursively
in most cases. It resolves stack overflow issues when trying to construct SCEVs
for certain inputs, e.g. PR45201.

The basic approach is to to use a worklist to queue operands of V which
need to be created before V. To do so, the current patch adds a
getOperandsToCreate function which collects the operands SCEV
construction depends on for a given value. This is a slight duplication
with createSCEV.

At the moment, SCEVs for phis are still created recursively.

Fixes #32078, #42594, #44546, #49293, #49599,  #55333, #55511

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114650
2022-06-29 11:29:31 +01:00
Nikita Popov 16033ffdd9 [ConstExpr] Remove more leftovers of extractvalue expression (NFC)
Remove some leftover bits of extractvalue handling after the
removal in D125795.
2022-06-29 10:45:19 +02:00
Martin Sebor e263a7670e [InstCombine] Look through more casts when folding memchr and memcmp
Enhance getConstantDataArrayInfo to let the memchr and memcmp library
call folders look through arbitrarily long sequences of bitcast and
GEP instructions.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128364
2022-06-28 15:58:42 -06:00
Nikita Popov 5548e807b5 [IR] Remove support for extractvalue constant expression
This removes the extractvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
extractvalue is already not supported in bitcode, so we do not need
to worry about bitcode auto-upgrade.

Uses of ConstantExpr::getExtractValue() should be replaced with
IRBuilder::CreateExtractValue() (if the fact that the result is
constant is not important) or ConstantFoldExtractValueInstruction()
(if it is). Though for this particular case, it is also possible
and usually preferable to use getAggregateElement() instead.

The C API function LLVMConstExtractValue() is removed, as the
underlying constant expression no longer exists. Instead,
LLVMBuildExtractValue() should be used (which will constant fold
or create an instruction). Depending on the use-case,
LLVMGetAggregateElement() may also be used instead.

Differential Revision: https://reviews.llvm.org/D125795
2022-06-28 10:40:17 +02:00
Bradley Smith a83aa33d1b [IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976
2022-06-27 10:48:45 +00:00
Nikita Popov 327307d9d4 [SCEV] Assert that GEP source element type is sized (NFC)
This is checked by the IR verifier, so replace the condition with
an assert.
2022-06-27 10:51:09 +02:00
Florian Hahn e4e22b6d80
[SCEV] Use SCEVUnknown(poison) instead of SCEVUnknown(undef).
Use poison instead of undef for SCEVUnkown of unreachable values.
This should be in line with the movement to replace undef with poison
when possible.

Suggested in D114650.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128586
2022-06-27 09:33:05 +01:00
Kazu Hirata a81b64a1fb [llvm] Use Optional::has_value instead of Optional::hasValue (NFC)
This patch replaces x.hasValue() with x.has_value() where x is not
contextually convertible to bool.
2022-06-26 16:10:42 -07:00
Kazu Hirata a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Mircea Trofin 7ae92a69c2 [MLInliner] No need to invalidate everything post-inlining.
We really just need to invalidate loop info and the dominator tree, in
addition to the FunctionPropertiesInfo we were invalidating originally.
Doing more adds unnecessary compile time overhead.
2022-06-24 18:22:06 -07:00
Mingming Liu e0d069598b [Inline] Annotate inline pass name with link phase information for analysis.
The annotation is flag gated; flag is turned off by default.

Differential Revision: https://reviews.llvm.org/D125495
2022-06-24 10:06:43 -07:00
Dawid Jurczak b7e7f4e1b6 [InlineCost] Improve debugging experience by adding print about initial inlining cost
Differential Revision: https://reviews.llvm.org/D127597
2022-06-24 16:27:26 +02:00
Nikita Popov 871197d0a3 [MemoryBuiltins] Accept any value in getInitialValueOfAllocation() (NFC)
Drop the requirement that getInitialValueOfAllocation() must be
passed an allocator function, shifting the responsibility for
checking that into the function (which it does anyway). The
motivation is to avoid some calls to isAllocationFn(), which has
somewhat ill-defined semantics (given the number of
allocator-related attributes we have floating around...)

(For this function, all we eventually need is an allockind of
zeroed or uninitialized.)

Differential Revision: https://reviews.llvm.org/D127274
2022-06-24 16:08:07 +02:00
Nikita Popov 54eff7da3c [AA] Export isEscapeSource() API (NFC)
Export API that was previously private to BasicAliasAnalysis and
will be used in D127202.
2022-06-24 11:59:15 +02:00
Nikita Popov bcadfc2595 [BasicAA] Handle passthru calls in isEscapeSource()
isEscapeSource() currently considers all call return values as
escape sources. However, CaptureTracking can look through certain
calls, so we shouldn't consider these as escape sources either.

The corresponding CaptureTracking code is:
7c9a3825b8/llvm/lib/Analysis/CaptureTracking.cpp (L332-L333)

Differential Revision: https://reviews.llvm.org/D128444
2022-06-24 11:00:57 +02:00
Wolfgang Pieb c50e6f590c [Inline] Introduce a backend option to suppress inlining of functions with large stack sizes.
The hidden option max-inline-stacksize=<N> prevents the inlining of functions
with a stack size larger than N.

Reviewed By: mtrofin, aeubanks

Differential Review: https://reviews.llvm.org/D127988
2022-06-23 10:57:46 -07:00
David Green bd1a4c8565 [ValueTracking] Teach isKnownNonZero that a vscale is never 0.
A llvm.vscale will always be at least 1, never zero. Teaching that to
isKnownNonZero can help fold away some statically known compares.

Differential Revision: https://reviews.llvm.org/D128217
2022-06-23 15:25:24 +01:00
Mingming Liu bc856eb3fc [SampleProfile][Inline] Annotate sample profile inline remarks with link phase (prelink/postlink) information.
Differential Revision: https://reviews.llvm.org/D126833
2022-06-22 17:00:53 -07:00
Vasileios Porpodas 7a9ad25769 Recommit "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6d6268dcbf.

Review: https://reviews.llvm.org/D125712
2022-06-21 18:35:29 -07:00
Vasileios Porpodas 6d6268dcbf Revert "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6f88acf410.
2022-06-21 17:07:21 -07:00
Vasileios Porpodas 6f88acf410 [SLP][X86] Improve reordering to consider alternate instruction bundles
During the reordering transformation we should try to avoid reordering bundles
like fadd,fsub because this may block them being matched into a single vector
instruction in x86.
We do this by checking if a TreeEntry is such a pattern and adding it to the
list of TreeEntries with orders that need to be considered.

Differential Revision: https://reviews.llvm.org/D125712
2022-06-21 16:44:48 -07:00
Martin Sebor b19194c032 [InstCombine] handle subobjects of constant aggregates
Remove the known limitation of the library function call folders to only
work with top-level arrays of characters (as per the TODO comment in
the code) and allows them to also fold calls involving subobjects of
constant aggregates such as member arrays.
2022-06-21 11:55:14 -06:00
Mircea Trofin 3f8e4169c1 [FunctionPropertiesAnalysis] Generalize support for unreachable
Generalized support for subgraphs that get rendered unreachable, for
both `call` and `invoke` cases.

Differential Revision: https://reviews.llvm.org/D127921
2022-06-21 08:18:01 -07:00
Nikita Popov ed63fcb232 [GlobalsModRef] Remove check for allocator calls
As the FIXME already indicates, I don't see why this code would be
necessary. If there's a call to an allocator function, that should
get treated just like any other function call -- usually it will be
a declaration and handled conservatively based on memory attributes
only. There should be no need to explicitly force it to be modref.
No test failures either, so I think this is just dead code.

Differential Revision: https://reviews.llvm.org/D127273
2022-06-21 14:24:13 +02:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Kazu Hirata 064a08cd95 Don't use Optional::hasValue (NFC) 2022-06-20 20:05:16 -07:00
Kazu Hirata 5413bf1bac Don't use Optional::hasValue (NFC) 2022-06-20 11:33:56 -07:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
David Candler d3919a8cc5 [ConstantFolding] Respect denormal handling mode attributes when folding instructions
Depending on the environment, a floating point instruction should
treat denormal inputs as zero, and/or flush a denormal output to zero.
Denormals are not currently accounted for when an instruction gets
folded to a constant, which can lead to differences in output between
a folded and a unfolded instruction when running on the target. The
denormal handling mode can be set by the function level attribute
denormal-fp-math, which this patch uses to determine whether any
denormal inputs to or outputs from folding should be zero, and that
the sign is set appropriately.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D116952
2022-06-20 16:41:46 +01:00
Guillaume Chatelet d3cf49e984 [Alignment] Remove alignTo version taking a MaybeAlign 2022-06-20 15:15:53 +00:00
Arthur Eubanks 6dd17a2b34 [CallGraph] Don't preserve CallGraph when function CFG analyses are preserved
The call graph has nothing to with function CFGs.

Fixes a crash in a future change that exposes this bug.
2022-06-19 13:01:08 -07:00
Sanjay Patel 4022551a15 [ValueTracking] recognize sub X, (X -nuw Y) as not overflowing
This extends a similar pattern from D125500 and D127754.
If we know that operand 1 (RHS) of a subtract is itself a
non-overflowing subtract from operand 0 (LHS), then the
final/outer subtract is also non-overflowing:
https://alive2.llvm.org/ce/z/Bqan8v

InstCombine uses this analysis to trigger a narrowing
optimization, so that is what the first changed test shows.

The last test models a motivating case from issue #48013.
In that example, we determine 'nuw' on the first sub from
the urem, then we determine that the 2nd sub can be narrowed,
and that leads to eliminating both subtracts.

here are still several missing subtract narrowing optimizations
demonstrated in the tests above the diffs shown here - those
should be handled in InstCombine with another set of patches.
2022-06-19 15:12:19 -04:00
Kazu Hirata 129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Kazu Hirata b254d67160 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 08:32:54 -07:00
Florian Hahn e9cced2739
Recommit "[LAA] Initial support for runtime checks with pointer selects."
This reverts commit 7aa8a67882.

This version includes fixes to address issues uncovered after
the commit landed and discussed at D11448.

Those include:

* Limit select-traversal to selects inside the loop.
* Freeze pointers resulting from looking through selects to avoid
  branch-on-poison.
2022-06-17 21:06:26 +02:00
Sterling Augustine df6087ee37 Move debug-only code inside LLVM_DEUG to prevent unused variable warnings. 2022-06-16 14:01:26 -07:00
Congzhe Cao 4c77d0276b [Delinearization] Refactoring of fixed-size array delinearization
This is a follow-up patch to D122857 where we added delinearization of
fixed-size arrays to loop cache analysis, which resulted in some duplicate
code, i.e., "tryDelinearizeFixedSize()", in LoopCacheCost.cpp and
DependenceAnalysis.cpp. Refactoring is done in this patch.

This patch refactors out the main logic of "tryDelinearizeFixedSize()" as
"tryDelinearizeFixedSizeImpl()" and moves it to Delinearization.cpp, such that
clients can reuse "llvm::tryDelinearizeFixedSizeImpl()" wherever they would
like to delinearize fixed-size arrays. Currently it has two users, i.e.,
DependenceAnalysis.cpp and LoopCacheCost.cpp.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D124745
2022-06-16 16:03:41 -04:00
Congzhe Cao a9dccb0072 [TargetTransformInfo] Added an opt/llc option for cache line size
In some passes we need a valid number of cache line size to do analysis or
transformation, e.g., loop cache analysis and loop date prefetch. However,
for some backend targets, `TTIImpl->getCacheLineSize()` is not implemented
and hence 'TTI.getCacheLineSize()' would just return 0 which eventually might
produce invalid result.

In this patch we add a user-specified opt/llc option for cache line size.
If the option is specified by users we use the value supplied, otherwise we
fall-back to the default value obtained from `TTIImpl->->getCacheLineSize()`.
The powerpc target already has such an option, this patch generalizes
this option to TargetTransformInfo.cpp.

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D127342
2022-06-16 15:57:51 -04:00
Mircea Trofin 7f24e574d4 [MLInliner] Don't inline call sites in unreachable basic blocks
This requires DominatorTree be updated, which we do in the ml inliner
case, but not in the default case, and the cost of doing so is
noticeable to compile time for the latter[1]. So the patch only affects
the ML inliner.

[1] https://llvm-compile-time-tracker.com/compare.php?from=9fc0aa45e3312944431ba7e1ca0cec99c613992b&to=7af461b1ce0d9138211ef5f883f35d5b9ddf47be&stat=wall-time

Differential Revision: https://reviews.llvm.org/D127899
2022-06-16 09:14:22 -07:00
Jin Xin Ng aaff3fb6d5 [mlgo] Fix accounting for SCC splits
Previously if the inliner split an SCC such that an empty one remained, the MLInlineAdvisor could potentially lose track of the EdgeCount if a subsequent CGSCC pass modified the calls of a function that was initially in the SCC pre-split. Saving the seen nodes in onPassEntry resolves this.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D127693
2022-06-15 10:53:23 -07:00
Mircea Trofin 22a1f998f7 FunctionPropertiesAnalysis: handle callsite BBs that lose edges
There could be successors that were reached before but now are only
reachable from elsewhere in the CFG.

Suppose the following diamond CFG (lines are arrows pointing down):
    A
  /   \
 B     C
  \   /
    D
There's a call site in C that is inlined. Upon doing that, it turns out
it expands to:
   call void @llvm.trap()
   unreachable
D isn't reachable from C anymore, but we did discount it when we set up
FunctionPropertiesUpdater, so we need to re-include it here.

The patch also updates loop accounting to use LoopInfo rather than
traverse BBs.

Differential Revision: https://reviews.llvm.org/D127353
2022-06-14 15:19:44 -07:00
Paul Robinson 10affe74ed [PS5] Make library function availability match PS4 2022-06-14 12:47:06 -07:00
Sanjay Patel 8605b4d8c5 [ValueTracking] recognize sub X, (X -nsw Y) as not overflowing
This extends a similar pattern from D125500.
If we know that operand 1 (RHS) of a subtract is itself a
non-overflowing subtract from operand 0 (LHS), then the
final/outer subtract is also non-overflowing:
https://alive2.llvm.org/ce/z/Bqan8v

InstCombine uses this analysis to trigger a narrowing
optimization, so that is what the first changed test shows.

The last test models the motivating case from issue #48013.
In that example, we determine 'nsw' on the first sub from
the srem, then we determine that the 2nd sub can be narrowed,
and that leads to eliminating both subtracts.

This works for unsigned sub too, but I left that out to keep
the patch minimal. If this looks ok, I will follow up with
that change. There are also several missing subtract narrowing
optimizations demonstrated in the tests above the diffs shown
here - those should be handled in InstCombine with another set
of patches.

Differential Revision: https://reviews.llvm.org/D127754
2022-06-14 14:51:49 -04:00
Paul Robinson c36eebb52e [PS5] Use __gxx_personality_v0 for TSan 2022-06-14 10:39:34 -07:00
Jin Xin Ng 9f2b873a7d [inliner] Add per-SCC-pass InlineAdvisor printing option
Adds option to print the contents of the Inline Advisor after each SCC Inliner pass

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D127689
2022-06-14 08:06:52 -07:00
Kazu Hirata 5c41b0f429 [Analysis] Remove getUniqueInstruction (NFC)
The last use was removed on Apr 7, 2022 in commit
5cefe7d9f5.
2022-06-13 14:26:20 -07:00
Nikita Popov 7e64a29e58 [InstSimplify][IR] Handle trapping constant aggregate (PR49839)
Unfortunately, it's not just constant expressions that can trap,
we might also have a trapping constant expression nested inside
a constant aggregate.

Perform the check during phi folding on Constant rather than
ConstantExpr, and extend the Constant::mayTrap() implementation
to also recursive into ConstantAggregates, not just ConstantExprs.

Fixes https://github.com/llvm/llvm-project/issues/49839.
2022-06-13 12:35:17 +02:00
Mircea Trofin 7e7021ca1a [mlgo] Update FunctionPropertyCache after invalidating analyses
The update depends on LoopInfo, so we need that refreshed first, not
after.

Differential Revision: https://reviews.llvm.org/D127467
2022-06-10 16:18:14 -07:00
Guillaume Chatelet 38637ee477 [clang] Add support for __builtin_memset_inline
In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`.

The idea is to get support from the compiler to easily write efficient memory function implementations.

This patch could be split in two:
 - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics.
 - and another one for the Clang part providing the instrinsic as a builtin.

Differential Revision: https://reviews.llvm.org/D126903
2022-06-10 13:13:59 +00:00
Nikita Popov d77f944832 [LoopInfo] Add getOutermostLoop() (NFC)
This is a recurring pattern, add an API function for it.
2022-06-10 11:48:21 +02:00
Philip Reames f85c5079b8 Pipe potentially invalid InstructionCost through CodeMetrics
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.

On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.

I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.

Differential Revision: https://reviews.llvm.org/D127131
2022-06-09 15:17:24 -07:00
Florian Hahn 20d798bd47
Recommit "[SCEV] Look through single value PHIs." (take 3)
This reverts commit 1fbdbb5595.

All known issues surfaced by this patch should have been fixed now.

The fixes included fixing issues with SCEV expansion in LV and DA's
reliance on LCSSA phis.
2022-06-09 15:20:10 +01:00
Simon Moll b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Mircea Trofin b8c39eb275 Fix FunctionPropertiesAnalysis updating callsite in 1-BB loop
If the callsite is in a single BB loop, we need to exclude the BB from
the successor set (in which it'd be a member), because that set forms a
boundary at which we stop traversing the CFG, when re-ingesting BBs
after inlining; but after inlining, the callsite BB's new successors
should be visited.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D127178
2022-06-08 14:32:00 -07:00
Jin Xin Ng a3a7826d82 [mlgo] Disable accounting upon ForceStop
Once ForceStop is set to true, we only return positive inlining advice when it is mandatory; There is no need for further node/edge accounting.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D127245
2022-06-08 14:26:06 -07:00
Bardia Mahjour c9677f6db4 [DA] Handle mismatching loop levels by considering them non-linear
To represent various loop levels within a nest, DA implements a special
numbering scheme (see comment atop establishNestingLevels). The goal of
this numbering scheme appears to be representing each unique loop
distinctively by using as little memory as possible. This numbering
scheme is simple when the source and destination of the dependence are
in the same loop. In such cases the level is simply the depth of the
loop in which src and dst reside. When the src and dst are not in the
same loop, we could run into the following situation exposed by
https://reviews.llvm.org/D71539. This patch fixes this by detecting
such cases in checkSubscripts and treating them as non-linear/non-affine.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D110973
2022-06-08 11:15:37 -04:00
Chuanqi Xu 0e10f12844 [NFC] Remove commented cerr debugging loggings
There are some unused cerr debugging loggings in the codes. It is weird
to remain such commented debug helpers in the product.
2022-06-08 15:58:06 +08:00
Philip Reames 8a0cd23326 Revert "[MemDep][NFCI] Remove redundant dyn_cast, replace with cast"
This reverts commit 180d3f251d.  This commit is simply wrong.  IsLoad is set within the same file based on modref state, not whether the instruction is a LoadInst.

This went uncaught because cast<Ty>(X) has been broken.  See https://discourse.llvm.org/t/cast-x-is-broken-implications-and-proposal-to-address/63033 for context.
2022-06-07 13:21:31 -07:00
William Huang ba26e45ca9 [ValueTracking] Add support to deduce a PHI node being a power of 2 if each incoming value is a power of 2.
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D124889
2022-06-07 18:52:31 +00:00
Fangrui Song d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 00:31:44 -07:00
Fangrui Song 36c7d79dc4 Remove unneeded cl::ZeroOrMore for cl::opt options
Similar to 557efc9a8b.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.
2022-06-04 00:10:42 -07:00
Fangrui Song 557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
Max Kazantsev 8555e59a71 [NFC][MemDep] Remove unnecessary Worklist.clear
This execution path leads to return 'false' where the Worklist
will be deallocated anyways. No need to clear it separately.
2022-06-03 12:31:44 +07:00
Florian Hahn 78c6b1488f
[CaptureTracking] Increase limit and use it for all visited uses.
Currently the MaxUsesToExplore limit only applies to the number of users
per value, not the total number of users to explore.

The current limit of 20 pessimizes IR with opaque pointers in some
cases. Without opaque pointers, we have deeper pointer def-use chains in
general due to extra bitcasts and geps for structs with index 0.

With opaque pointers the def-use chain is not as deep but wider, due to
bitcasts & 0-geps missing.

To improve the situation for opaque pointers, this patch does 2 things:

 1. Apply the limit to the total number of uses visited. From the
    wording in the description of the option it seems like this may be
    the original intention. With the current implementation we could
    still end up walking a lot of uses.
 2. Increase the limit to 100. This is quite arbitrary, but enables
    a good number of additional optimizations.

Those adjustments have a noticeable compile-time impact though. In part
that is likely due to additional transformations (and conversely
the current baseline misses optimizations after switching to opaque
pointers).

This recovers some regressions that showed up after enabling opaque
pointers.

Limit=100:

* NewPM-O3: +0.21%
* NewPM-ReleaseThinLTO: +0.87%
* NewPM-ReleaseLTO-g: +0.46%

https://llvm-compile-time-tracker.com/compare.php?from=2e50ecb2ef4e1da1aeab05bcf66380068e680991&to=7e6fbe519d958d09f32f01d5d44a622f551e2031&stat=instructions

Limit=60:

* NewPM-O3: +0.14%
* NewPM-ReleaseThinLTO: +0.41%
* NewPM-ReleaseLTO-g: +0.21%

https://llvm-compile-time-tracker.com/compare.php?from=aeb19817d66f1a15754163c7f48e01e9ebdd6d45&to=520563fdc146319aae90d06f88d87f2e9e1247b7&stat=instructions

Limit=40:
* NewPM-O3: +0.11%
* NewPM-ReleaseThinLTO: +0.12%
* NewPM-ReleaseLTO-g: +0.09%

https://llvm-compile-time-tracker.com/compare.php?from=aeb19817d66f1a15754163c7f48e01e9ebdd6d45&to=c9182576e9fe3f1c84a71479665aef91a416318c&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126236
2022-06-02 21:43:58 +01:00
Mingming Liu 8601f269f1 [Inline][Remark][NFC] Optionally provide inline context to inline
advisor.

This patch has no functional change, and merely a preparation patch for
main functional change. The motivating use case is to annotate inline
remark pass name with context information (e.g. prelink or postlink,
CGSCC or always-inliner), see D125495 for more details.

Differential Revision: https://reviews.llvm.org/D126824
2022-06-02 13:14:30 -07:00
Alexander Kornienko 7aa8a67882 Revert "[LAA] Initial support for runtime checks with pointer selects."
This reverts commit 5890b30105 as per discussion
on the review thread: https://reviews.llvm.org/D114487#3547560.
2022-06-01 15:24:27 +02:00
Nikita Popov 03aceab08b [ValueTracking] Enable -branch-on-poison-as-ub by default
Now that SimpleLoopUnswitch and other transforms no longer introduce
branch on poison, enable the -branch-on-poison-as-ub option by
default. The practical impact of this is mostly better flag
preservation in SCEV, and some freeze instructions no longer being
necessary.

Differential Revision: https://reviews.llvm.org/D125299
2022-06-01 10:46:06 +02:00
Mircea Trofin f46dd19b48 [mlgo] Incrementally update FunctionPropertiesInfo during inlining
Re-computing FunctionPropertiesInfo after each inlining may be very time
consuming: in certain cases, e.g. large caller with lots of callsites,
and when the overall IR doesn't increase (thus not tripping a size bloat
threshold).

This patch addresses this by incrementally updating
FunctionPropertiesInfo.

Differential Revision: https://reviews.llvm.org/D125841
2022-05-31 17:27:32 -07:00
Mehdi Amini aff271930e Fix warning for unused variable in the non-assert build (NFC) 2022-05-30 16:21:38 +00:00
Simon Moll 18c1ee04de Re-land "[VP] vp intrinsics are not speculatable" with test fix
Update the llvmir-intrinsics.mlir test to account for the modified
attribute sets.

This reverts commit 2e2a8a2d90.
2022-05-30 14:41:15 +02:00
Mehdi Amini 2e2a8a2d90 Revert "[VP] vp intrinsics are not speculatable"
This reverts commit 78a18d2b54.

Break MLIR bot: https://lab.llvm.org/buildbot/#/builders/61/builds/27127
2022-05-30 12:26:16 +00:00
Max Kazantsev 7e5a730473 [MemDep][NFC] Remove duplicating check in `if` and `else` branch
Same check is done whether the condition is true or false. Just hoist
it out of conditional.
2022-05-30 17:43:00 +07:00
Simon Moll 78a18d2b54 [VP] vp intrinsics are not speculatable
VP intrinsics show UB if the %evl parameter is out of bounds - they must
not carry the speculatable attribute.  The out-of-bounds UB disappears
when the %evl parameter is expanded into the mask or expansion replaces
the entire VP intrinsic with non-VP code.

This patch
- Removes the speculatable attribute on all VP intrinsics.
- Generalizes the isSafeToSpeculativelyExecute function to let VP
  expansion know whether the VP intrinsic replacement will be
  speculatable.  VP expansion may only discard %evl where this is the
  case.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D125296
2022-05-30 12:20:05 +02:00
Max Kazantsev 180d3f251d [MemDep][NFCI] Remove redundant dyn_cast, replace with cast
When `IsLoad` is `true`, we don't need to check if the instruction
is actually a load with dyn_cast. Saves some petty amount of CT.
2022-05-30 17:17:55 +07:00
eopXD 6a84579243 [LSR][TTI][PowerPC][SystemZ][X86] Add const-ness to TTI::isLSRCostLess. NFC
Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D126350
2022-05-27 15:22:23 -07:00
William Huang 35b0955aa5 [ValueTracking] Added support to deduce PHI Nodes values being a power of 2
Add Value Tracking support to deduce induction variable being a power of 2, allowing urem optimizations

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D126018
2022-05-26 20:30:31 +00:00
Florian Hahn 6af5f5697c
[SCEV] Collect conditions from assumes same way as for branches.
Also collect conditions from assume up-front in applyLoopGuards.
This allows re-using the logic to handle logical ANDs as assume
conditions.

It should should pave the road for a fix for #55645.
2022-05-26 18:17:13 +01:00
serge-sans-paille fb67d683db [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since 7030654296 detected a few
regressions, fixing them.

Differential Revision: https://reviews.llvm.org/D126417
2022-05-26 08:12:34 +02:00
Nikita Popov 8a6698b523 [ValueTracking] Loads with !dereferenceable metadata cannot be undef/poison
A load with !dereferenceable or !dereferenceable_or_null metadata
must return a well-defined (non-undef/poison) value. Effectively
they imply !noundef. This is the same as we do for the
dereferenceable(N) attribute.

This should fix https://github.com/llvm/llvm-project/issues/55672,
or at least the specific case discussed there.

Differential Revision: https://reviews.llvm.org/D126296
2022-05-25 09:54:04 +02:00
Sanjay Patel e8c20d995b [IR] add and use pattern match specialization for sqrt intrinsic; NFC
This was included in D126190 originally, but it's
independent and a useful change for readability.
2022-05-23 14:16:30 -04:00
Jingu Kang bb82f74612 Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269.

The commmit from https://reviews.llvm.org/D125918 has fixed the stage 2 build
failure.

Differential Revision: https://reviews.llvm.org/D118979
2022-05-23 16:15:45 +01:00
Peter Waller ade47bdc31 [LV] Improve register pressure estimate at high VFs
Previously, `getRegUsageForType` was implemented using
`getTypeLegalizationCost`.  `getRegUsageForType` is used by the loop
vectorizer to estimate the register pressure caused by using a vector
type.  However, `getTypeLegalizationCost` currently only appears to
understand splitting and not scalarization, so significantly
underestimates the register requirements.

Instead, use `getNumRegisters`, which understands when scalarization
can occur (via computeRegisterProperties).

This was discovered while investigating D118979 (Set maximum VF with
shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the
loop vectorizer previously ends up costing an v128i1 as 2 v64i*
registers where it actually occupies 128 i32 registers.

I'm sending this patch early for comment, I'm still doing some sanity checking
with LNT.  I note that getRegisterClassForType appears to return VectorRC even
though the type in question (large vNi1 types) end up occupying scalar
registers. That might be worth fixing too.

Differential Revision: https://reviews.llvm.org/D125918
2022-05-23 07:57:45 +00:00
Nikita Popov c8b675eaa1 [SCEV] Use umin_seq for BECount of multi-exit loops
When computing the BECount for multi-exit loops, we need to combine
individual exit counts using umin_seq rather than umin. This is
because an earlier exit may exit on the first iteration, in which
case later exit expressions will not be evaluated and could be
poisonous. We cannot propagate potential poison values from later
exits.

In particular, this avoids the introduction of "branch on poison"
UB when optimizing multi-exit loops.

Differential Revision: https://reviews.llvm.org/D124910
2022-05-21 15:48:14 +02:00
Craig Topper f2df53b750 [InstructionSimplify] Remove multiple 'break' after 'return'. NFC 2022-05-20 10:23:57 -07:00
Nico Weber 304a5a7a14 Revert "[ValueTracking] Added support to deduce PHI Nodes values being a power of 2"
This reverts commit d5c130f17e.
Breaks tests, see https://reviews.llvm.org/D125332#3525819
2022-05-19 15:05:30 -04:00
William Huang d5c130f17e [ValueTracking] Added support to deduce PHI Nodes values being a power of 2
Add Value Tracking support to deduce induction variable being a power of 2, allowing urem optimizations

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125332
2022-05-19 18:39:13 +00:00
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
Philip Reames f7988d08a8 Revert "[BasicAA] Remove unneeded special case for malloc/calloc"
This reverts commit 9b1e00738c.

Nikic reported in commit thread that I had forgotten history here, and that a) we'd tried this before, and b) had to revert due to an unexpected codegen impact.  Current measurements confirm the same issue still exists.
2022-05-18 07:35:27 -07:00
NAKAMURA Takumi 6ca7eb2c6d [SCEV] Part 1, Serialize function calls in function arguments.
Evaluation odering in function call arguments is implementation-dependent.
In fact, gcc evaluates bottom-top and clang does top-bottom.

Fixes #55283 partially.

Part of https://reviews.llvm.org/D125627
2022-05-18 23:20:08 +09:00
Philip Reames 9b1e00738c [BasicAA] Remove unneeded special case for malloc/calloc
This code pre-exists the generic handling for inaccessiblememonly.  If we remove it and update one test with inaccessiblememonly, nothing else changes.  Note that simply running O1 on that test would annotate malloc with the missing inaccessiblememonly.
2022-05-17 20:45:14 -07:00
Nikita Popov b9b71c2b87 [LVI] Compute range for xor
We do have a non-trivial implementation for binaryXor() now.
2022-05-17 10:18:38 +02:00
Yang Keao 7dce9eb6e5 [DomPrinter] Migrate -dot-dom to the new pass manager.
In D123677, @YangKeao provided an implementation of `DOTGraphTraits{Viewer,Printer}` in the new pass manager. This commit migrates the `DomPrinter` and `DomViewer` to the new pass manager.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D124904
2022-05-16 15:07:16 -05:00
Nikita Popov 356d47ccb9 [ValueTracking] Handle and/or on RHS of isImpliedCondition()
isImpliedCondition() currently handles and/or on the LHS, but not
on the RHS, resulting in asymmetric behavior. This patch adds two
new implication rules:

 * LHS ==> (RHS1 || RHS2) if LHS ==> RHS1 or LHS ==> RHS2
 * LHS ==> !(RHS1 && RHS2) if LHS ==> !RHS1 or LHS ==> !RHS2

Differential Revision: https://reviews.llvm.org/D125551
2022-05-16 16:30:26 +02:00
Florian Hahn b7315ffc3c
[LAA,LV] Add initial support for pointer-diff memory checks.
This patch adds initial support for a pointer diff based runtime check
scheme for vectorization. This scheme requires fewer computations and
checks than the existing full overlap checking, if it is applicable.

The main idea is to only check if source and sink of a dependency are
far enough apart so the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to the
`VF * UF * AccessSize`. It is sufficient to check
`(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards
dependence in the vector loop with the given VF and UF. If Src >=u Sink,
there is not dependence preventing vectorization, hence the overflow
should not matter and using the ULT should be sufficient.

Note that the initial version is restricted in multiple ways:

1. Pointers must only either be read or written, by a single
   instruction (this allows re-constructing source/sink for
   dependences with the available information)
 2. Source and sink pointers must be add-recs, with matching steps
 3. The step must be a constant.
 3. abs(step) == AccessSize.

Most of those restrictions can be relaxed in the future.

See https://github.com/llvm/llvm-project/issues/53590.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D119078
2022-05-16 15:27:22 +01:00
NAKAMURA Takumi da7d8de1e4 ScalarEvolution.cpp: Reformat. 2022-05-15 20:51:27 +09:00
Sanjay Patel ee6754c277 [ValueTracking] recognize sub X, (X % Y) as not overflowing
I fixed some poison-safety violations on related patterns in InstCombine
and noticed that we missed adding nsw/nuw on them, so this adds clauses
to the underlying analysis for that.

We need the undef input restriction to make this safe according to Alive2:
https://alive2.llvm.org/ce/z/48g9K8

Differential Revision: https://reviews.llvm.org/D125500
2022-05-13 09:59:41 -04:00
Nikita Popov ddfee07519 [InstSimplify] Fold and/or using implied conditions
This adds two conjugated folds:

 * A | B -> B if A implies B (https://alive2.llvm.org/ce/z/R6GU4j)
 * A & B -> A if A implies B (https://alive2.llvm.org/ce/z/EGMqyy)

If A and B are icmps themselves, we will usually fold this through
other logic already (though the tests show a couple additional cases
we previously missed). However, isImpliedCond() also supports A
being of the form X & Y, which allows us to handle cases like
(X & Y) | B where X implies B. This addresses the regression from
D125398.

Something that notably doesn't work yet is the (X | Y) & B case.
This is due to an asymmetry in the isImpliedCondition()
implementation that will have to be addressed separately.

Differential Revision: https://reviews.llvm.org/D125530
2022-05-13 15:09:14 +02:00
Florian Hahn 5890b30105
[LAA] Initial support for runtime checks with pointer selects.
Scaffolding support for generating runtime checks for multiple SCEV expressions
per pointer. The initial version just adds support for looking through
a single pointer select.

The more sophisticated logic for analyzing forks is in D108699

Reviewed By: huntergr

Differential Revision: https://reviews.llvm.org/D114487
2022-05-12 19:33:48 +01:00
Arthur Eubanks 7e0802aeb5 [BasicAA] Fix order in which we pass MemoryLocations to alias()
D98718 caused the order of Values/MemoryLocations we pass to alias() to
be significant due to storing the offset in the PartialAlias case. But
some callers weren't audited and were still passing swapped arguments,
causing the returned PartialAlias offset to be negative in some
cases. For example, the newly added unittests would return -1
instead of 1.

Fixes #55343, a miscompile.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D125328
2022-05-10 12:05:38 -07:00
Nikita Popov c077510bb1 [InstSimplify] Handle unknown function context in pointer icmp fold (PR54615)
This issue reproduces in the context of LoopDeletion, because the
bitcast does not get simplified away there. For a plain -inst-simplify
run the bitcast would get folded away first.

Fixes https://github.com/llvm/llvm-project/issues/54615.
2022-05-10 11:48:43 +02:00
Andrew Litteken 96345f773c [IRSim] Remove early check from similarity matching such that commutative instructions are checked correctly when using the same value.
When the first commutative instruction in a region using the same value in both positions was compared to a corresponding instruction with two different values, there was an early check that determined that since the values were new, it was true that these values acted in the same way structurally. If this was not contradicted later in the program, the regions were marked as similar. This removes that check, so that it is clear that the same value cannot be mapped to two different values.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D124775
2022-05-09 22:59:09 -05:00
Mircea Trofin c35ad9ee4f [mlgo] Support exposing more features than those supported by models
This allows the compiler to support more features than those supported by a
model. The only requirement (development mode only) is that the new
features must be appended at the end of the list of features requested
from the model. The support is transparent to compiler code: for
unsupported features, we provide a valid buffer to copy their values;
it's just that this buffer is disconnected from the model, so insofar
as the model is concerned (AOT or development mode), these features don't
exist. The buffers are allocated at setup - meaning, at steady state,
there is no extra allocation (maintaining the current invariant). These
buffers has 2 roles: one, keep the compiler code simple. Second, allow
logging their values in development mode. The latter allows retraining
a model supporting the larger feature set starting from traces produced
with the old model.

For release mode (AOT-ed models), this decouples compiler evolution from
model evolution, which we want in scenarios where the toolchain is
frequently rebuilt and redeployed: we can first deploy the new features,
and continue working with the older model, until a new model is made
available, which can then be picked up the next time the compiler is built.

Differential Revision: https://reviews.llvm.org/D124565
2022-05-09 18:01:21 -07:00
Michael Kruse 6b3b87376b [polly] migrate -polly-show to the new pass manager
Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D123678
2022-05-09 14:04:29 -05:00
Michael Kruse a6b399ad79 [PassManager] Implement DOTGraphTraitsViewer under NPM
Rename the legacy `DOTGraphTraits{Module,}{Viewer,Printer}` to the corresponding `DOTGraphTraits...WrapperPass`, and implement a new `DOTGraphTraitsViewer` with new pass manager.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D123677
2022-05-09 14:04:28 -05:00
Alexey Bataev 9dc4ced204 [SLP]Try partial store vectorization if supported by target.
We can try to vectorize number of stores less than MinVecRegSize
/ scalar_value_size, if it is allowed by target. Gives an extra
opportunity for the vectorization.

Fixes PR54985.

Differential Revision: https://reviews.llvm.org/D124284
2022-05-09 09:48:15 -07:00
Nikita Popov 68e1ba8188 [SCEV] Fold umin_seq using known predicate
Fold %x umin_seq %y to %x if %x ule %y. This also subsumes the
special handling for constant operands, as if %y is constant this
folds to umin via implied poison reasoning, and if %x is constant
then either %x is not zero and it folds to umin, or it is known
zero, in which case it is ule anything.
2022-05-09 16:35:08 +02:00
Nikita Popov 18eaff1510 [ScalarEvolution] Fold %x umin_seq %y if %x cannot be zero
Fold %x umin_seq %y to %x umin %y if %x cannot be zero. They only
differ in semantics for %x==0.

More generally %x *_seq %y folds to %x * %y if %x cannot be the
saturation fold (though currently we only have umin_seq).
2022-05-09 15:11:05 +02:00
Serge Pavlov eb28da89a6 [InstCombine] Remove side effect of replaced constrained intrinsics
If a constrained intrinsic call was replaced by some value, it was not
removed in some cases. The dangling instruction resulted in useless
instructions executed in runtime. It happened because constrained
intrinsics usually have side effect, it is used to model the interaction
with floating-point environment. In some cases side effect is actually
absent or can be ignored.

This change adds specific treatment of constrained intrinsics so that
their side effect can be removed if it actually absents.

Differential Revision: https://reviews.llvm.org/D118426
2022-05-07 19:04:11 +07:00
Nikita Popov 47c559d6c1 [SCEV] Fold umin_seq to umin using implied poison reasoning
Similar to how we convert logical and/or to bitwise and/or, we should
also convert umin_seq to umin based on implied poison reasoning. In
%x umin_seq %y, if %y being poison implies %x being poison, then we
don't need the sequential evaluation: Having %y contribute towards
the result will never make the result more poisonous. An important
corollary of this is that if %y is never poison, we also don't need
the sequential evaluation.

This avoids some of the regressions in D124910.

Differential Revision: https://reviews.llvm.org/D124921
2022-05-05 09:43:49 +02:00
Yangguang Li 3a8266902b [SCEV] Removed an unnecessary assertion
The assertion is to check we always get backedge taken count
(`BECount`) of zero when the exit condition is in select form
(`isa<BinaryOperation>(ExitCond)`) and the exit limit for the
first operand is zero `EL0.ExactNotTaken->isZero()`). However
the assertion is checking that the exit condition is NOT in
select form. Removing the the whole assertion since we now handle
select form in ScalarEvolution::getSequentialMinMaxExpr.

Reviewed By: reames, nikic

Differential Revision: https://reviews.llvm.org/D122835
2022-05-03 17:26:27 -04:00
Augie Fackler 1deea714b3 BuildLibCalls: simplify switch statement slightly
Per feedback on D123086 after submit.

Also added a test for vec_malloc et al attribute inference to show it's
doing the right thing.

The new tests exposed a defect, corrected by adding vec_free to the list of
free functions in MemoryBuiltins.cpp, which had been overlooked all the
way back in D94710, over a year ago.

Differential Revision: https://reviews.llvm.org/D124859
2022-05-03 13:17:33 -04:00
Nikita Popov 47255834e7 [ValueTracking] A and (B & ~A) have no common bits set
This extends haveNoCommonBitsSet() to two additional cases, allowing
the following folds:

 * `A + (B & ~A)` --> `A | (B & ~A)`
    (https://alive2.llvm.org/ce/z/crxxhN)
 * `A + ((A & B) ^ B)` --> `A | ((A & B) ^ B)`
    (https://alive2.llvm.org/ce/z/A_wsH_)

These should further fold to just `A | B`, though this currently
only works in the first case.

The reason why the second fold is necessary is that we consider
this to be the canonical form if B is a constant. (I did check
whether we can change that, but it looks like a number of folds
depend on the current canonicalization, so I ended up adding both
patterns here.)

Differential Revision: https://reviews.llvm.org/D124763
2022-05-03 11:33:27 +02:00
Igor Kirillov 4e5e042d9a [LoopVectorize] Support reductions that store intermediary result
Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.

Ordered fadd reductions are not yet supported.

Differential Revision: https://reviews.llvm.org/D110235
2022-05-03 10:12:30 +01:00
David Green 6f81903e89 [LV][SLP] Mark fptosi_sat as vectorizable
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.

The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as a overloaded (but not scalar) operand.

Differential Revision: https://reviews.llvm.org/D124358
2022-05-03 09:32:34 +01:00
Bardia Mahjour 363b3a645a fix warning caused by ef4ecc3cef 2022-05-02 17:06:27 -04:00
Bardia Mahjour ef4ecc3cef [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost
Reviewed By: congzhe, etiotto

Differential Revision: https://reviews.llvm.org/D123400
2022-05-02 16:49:10 -04:00
Nikita Popov 597946a4dd [ConstantFold] Don't convert getelementptr to ptrtoint+inttoptr
ConstantFolding currently converts "getelementptr i8, Ptr, (sub 0, V)"
to "inttoptr (sub (ptrtoint Ptr), V)". This transform is, taken by
itself, correct, but does came with two issues:

1. It unnecessarily broadens provenance by introducing an inttoptr.
   We generally prefer not to introduce inttoptr during optimization.
2. For the case where V == ptrtoint Ptr, this folds to inttoptr 0,
   which further folds to null. In that case provenance becomes
   incorrect. This has been observed as a real-world miscompile with
   rustc.

We should probably address that incorrect inttoptr 0 fold at some
point, but in either case we should also drop this inttoptr-introducing
fold. Instead, replace it with a fold rooted at
ptrtoint(getelementptr), which seems to cover the original
motivation for this fold (test2 in the changed file).

Differential Revision: https://reviews.llvm.org/D124677
2022-05-02 10:24:46 +02:00
Congzhe Cao c428a3d2a0 [LoopCacheAnalysis] Enable delinearization of fixed sized arrays
Currently loop cache cost (LCC) cannot analyze fix-sized arrays
since it cannot delinearize them. This patch adds the capability
to delinearize fix-sized arrays to LCC. Most of the code is ported
from DependenceAnalysis.cpp and some refactoring will be done in a
next patch.

Reviewed By: #loopoptwg, Meinersbur

Differential Revision: https://reviews.llvm.org/D122857
2022-04-29 16:01:27 -04:00
Roman Lebedev 981ed72a17
[NFC][SCEV] Refactor `createNodeForSelectViaUMinSeq()` out of `createNodeForSelectOrPHIViaUMinSeq()` 2022-04-29 02:37:06 +03:00
Mircea Trofin 49942d595f [NFC] remove const from FunctionPropertiesAnalysis::run, keep on Result
The goal in 75881d8b02 was just modifying what `Result` is, didn't
need to also modify ::run.
2022-04-28 15:10:21 -07:00
Mircea Trofin 75881d8b02 [NFC] const-ed the return type of FunctionPropertiesAnalysis
The result is a data bag, this makes sure it's signaled to a user that
the data can't be mutated when, for example, doing something like:

auto &R = FAM.getResult<FunctionPropertiesAnalysis>(F)
...
R.Uses++
2022-04-28 12:42:16 -07:00
Alexey Bataev 75e1cf4a6a [COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-28 10:04:41 -07:00
Alexey Bataev 9861ca0c23 Revert "[COST]Improve cost model for shuffles in SLP."
This reverts commit 29a470e380 to fix
a crash reported in https://reviews.llvm.org/D100486#3479989.
2022-04-28 08:11:56 -07:00
Chris Jackson c792884589 [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
Reland 3f2b76ec90 with the test corrected
to require x86-registered-target.

Differential Revision: https://reviews.llvm.org/D120169
2022-04-28 14:21:56 +01:00
Chris Jackson cd5f9efc4d Revert "[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]"
This reverts commit 3f2b76ec90.
2022-04-28 14:07:31 +01:00
Chris Jackson 3f2b76ec90 [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
Reland commit 74273d575f following a fix
for a memory leak. The DVIRecoveryRecord vectors now use unique_ptr.

Differential Revision: https://reviews.llvm.org/D120169
2022-04-28 13:55:49 +01:00
Kirill Stoimenov 761366e6ae Revert "[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]"
This reverts commit 74273d575f.

Buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/22795
Failing with memory leak.
2022-04-27 23:11:48 +00:00
Alexey Bataev 29a470e380 [COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-27 10:56:26 -07:00
Chris Jackson 74273d575f [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
This relands commit 8f550368b1.

The test is amended with REQUIRES: x86-registered-target, in line with
the other debuginfo-scev-salvage tests.

Differential Revision: https://reviews.llvm.org/D120169
2022-04-27 13:10:30 +01:00
Chris Jackson 855752e563 Revert [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics[2/2]
This reverts commit 8f550368b1.
2022-04-27 13:06:03 +01:00
Chris Jackson 8f550368b1 [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
Second of two patches to extend SCEV-based salvaging to dbg.value
intrinsics that have multiple location ops pre-LSR. This second patch
adds the core implementation.

Reviewers: @StephenTozer, @djtodoro

Differential Revision: https://reviews.llvm.org/D120169
2022-04-27 12:47:35 +01:00
Vasileios Porpodas fa8a9fea47 Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20.

Code review: https://reviews.llvm.org/D124202
2022-04-26 14:02:40 -07:00
Vasileios Porpodas 6a9bbd9f20 Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 55ce296d6f.
2022-04-26 11:25:26 -07:00
Vasileios Porpodas 55ce296d6f [SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`
Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.

Differential Revision: https://reviews.llvm.org/D124202
2022-04-26 11:11:29 -07:00
Mircea Trofin b1fa5ac3ba [mlgo] Factor out TensorSpec
This is a simple datatype with a few JSON utilities, and is independent
of the underlying executor. The main motivation is to allow taking a
dependency on it on the AOT side, and allow us build a correctly-sized
buffer in the cases when the requested feature isn't supported by the
model. This, in turn, allows us to grow the feature set supported by the
compiler in a backward-compatible way; and also collect traces exposing
the new features, but starting off the older model, and continue
training from those new traces.

Differential Revision: https://reviews.llvm.org/D124417
2022-04-25 18:35:46 -07:00
David Green 9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Jun Ma c0022b4bb1 [InlineCost] Set LastCallToStaticBonus in ML inlining models.
This patch set LastCallToStaticBonus based on check, it has
no noticeable size reduction on an internal workload and linux kernel
with Os/Oz.

Differential Revision: https://reviews.llvm.org/D124233
2022-04-24 09:34:19 +08:00
Florian Hahn d43c083ab6
[SCEV] Use getConstant to construct SCEV for ConstantInt (NFC).
We already know that we will construct a SCEVConstant. Directly use
getConstant, rather than going through getSCEV.
2022-04-23 11:12:59 +01:00
Chang-Sun Lin Jr 7ee30a0e24 [NFC][LAA] Match-up type sizes for possible extensions, based on actual bit-size rather than rounded-up byte size.
Differential Revision: https://reviews.llvm.org/D119200
2022-04-22 23:16:20 -07:00
Mircea Trofin e4794ff5c6 [mlgo][nfc] Decouple TensorSpec from tensorflow.
The motivation is twofold:

1) Allow plugging in a different training-time evaluator, e.g.
   TFLite-based, etc.

2) Allow using TensorSpec for AOT, too, to support evolution: we start
   by extracting a superset of the features currently supported by a
   model. For the tensors the model does not support, we just return a
   valid, but useless, buffer. This makes using a 'smaller' model (less
   supported tensors) transparent to the compiler. The key is to
   dimension the buffer appropriately, and we already have TensorSpec
   modeling that info.

The only coupling was due to the reliance of a TF internal API for
getting the element size, but for the types we are interested in,
`sizeof` is sufficient.

A subsequent change will yank out TensorSpec in its own module.

Differential Revision: https://reviews.llvm.org/D124045
2022-04-21 15:37:01 -07:00
Vasileios Porpodas 889588ee97 [SLP] Refactoring isLegalBroadcastLoad() to use `ElementCount`.
Replacing `unsigned` with `ElementCount` in the argument of `isLegalBroadcastLoad()`.
This helps reduce the diff of a future SLP patch for AArch64.
2022-04-21 10:19:00 -07:00
Alexey Bataev 2cca53c815 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
represantion into account. We can build more correct representation of
register shuffles, improve number of recognised buildvector sequences.
Also, same function can be used to improve the cost model for the
shuffles. in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 09:37:16 -07:00
Alexey Bataev 5f7ac15912 Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer."
This reverts commit 2f49163b33 to fix
a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284
2022-04-20 06:35:55 -07:00
Alexey Bataev 2f49163b33 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
represantion into account. We can build more correct representation of
register shuffles, improve number of recognised buildvector sequences.
Also, same function can be used to improve the cost model for the
shuffles. in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 05:32:56 -07:00
Nikita Popov f767a7d115 [DomTreeUpdater] Remove deprecated methods
Remove the insertEdge(), insertEdgeRelaxed(), deleteEdge() and
deleteEdgeRelaxed() methods, which have been deprecated three
years ago.
2022-04-20 12:14:29 +02:00
Andrew Litteken 3de29ad209 [IRSim] Ignore debug instructions when creating canonical numbering
When constructing canonical relationships between two regions, the first instruction of a basic block from the first region is used to find the corresponding basic block from the second region. However, debug instructions are not included in similarity matching, and therefore do not have a canonical numbering. This patch makes sure to ignore the debug instructions when finding the first instruction in a basic block.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D123903
2022-04-19 13:18:28 -05:00
Arthur Eubanks a7e20a8a7a [CallPrinter] Port CallPrinter passes to new pass manager
Port the legacy CallGraphViewer and CallGraphDOTPrinter to work with the new pass manager.

Addresses issue https://github.com/llvm/llvm-project/issues/54323
Adds back related tests that were removed in commits d53a4e7b4a and 9e9d9aba14

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D122989
2022-04-18 10:02:18 -07:00
Andrew Litteken a919d3d888 [IROutliner] Ensure that incoming blocks of PHINodes are included in the unique numbering gneration for phi nodes for each exit path
Issue: https://github.com/llvm/llvm-project/issues/54431

PHINodes that need to be generated to accommodate a PHINode outside the region due to different output paths need to have their own numbering to determine the number of output schemes required to properly handle all the outlined regions. This numbering was previously only determined by the order and values of the incoming values, as well as the parent block of the PHINode. This adds the incoming blocks to the calculation of a hash value for these PHINodes as well, and the supporting infrastructure to give each block in a region a corresponding canonical numbering.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D122207
2022-04-14 12:13:17 -05:00
Kevin P. Neal d43d9e1d5c [FPEnv][InstSimplify] Fold fsub -0.0, -X ==> X
Currently the fsub optimizations in InstSimplify don't know how to fold
-0.0 - (-X) to X when the constrained intrinsics are used. This adds partial
support. The rest of the support will come later with work on the IR
matchers.

This review is split out from D107285.

Differential Revision: https://reviews.llvm.org/D123396
2022-04-14 11:48:54 -04:00
Congzhe Cao 557b131c88 [DA] Refactor with a better API
Refactor from iteratively using BitCastInst::getOperand()
to using stripPointerCasts() instead. This is an improvement
since now we are able to analyze more cases, please refer
to test cases added in this patch.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D123559
2022-04-13 14:51:48 -04:00
serge-sans-paille 262eba01b3 Revert "[ValueTracking] Make getStringLenth aware of strdup"
This reverts commit e810d55809.

The commit was not taken into account the fact that strduped string could be
modified. Checking if such modification happens would make the function very
costly, without a test case in mind it's not worth the effort.
2022-04-13 19:17:28 +02:00
Muhammad Omair Javaid 42ebfa8269 Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e81.

This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:

https://lab.llvm.org/buildbot/#/builders/176/builds/1515

llvm-tblgen crashes after applying this patch.
2022-04-13 04:53:07 +05:00
Johannes Doerfert 9dc7da3f9c [GlobalsModRef][FIX] Ensure we honor synchronizing effects of intrinsics
This is a long standing problem that resurfaces once in a while [0].
There might actually be two problems because I'm not 100% sure if the
issue underlying https://reviews.llvm.org/D115302 would be solved by
this or not. Anyway.

In 2008 we thought intrinsics do not read/write globals passed to them:
d4133ac315
This is not correct given that intrinsics can synchronize threads and
cause effects to effectively become visible.

NOTE: I did not yet modify any tests but only tried out the reproducer
      of https://github.com/llvm/llvm-project/issues/54851.

Fixes: https://github.com/llvm/llvm-project/issues/54851

[0] https://discourse.llvm.org/t/bug-gvn-memdep-bug-in-the-presence-of-intrinsics/59402

Differential Revision: https://reviews.llvm.org/D123531
2022-04-12 16:42:50 -05:00
Nikita Popov 1d530b914e [InstSimplify] Don't fold phi of poison and trapping const expr (PR49839)
Folding this case would result in the constant expression being
executed unconditionally, which may introduce a new trap.

Fixes https://github.com/llvm/llvm-project/issues/49839.
2022-04-12 17:32:25 +02:00
serge-sans-paille e810d55809 [ValueTracking] Make getStringLenth aware of strdup
During strlen compile-time evaluation, make it possible to track size of
strduped strings.

Differential Revision: https://reviews.llvm.org/D123497
2022-04-12 14:47:29 +02:00
Nikita Popov 8d5c8d57c6 [InlineCost] Check that function types match
Retain the behavior we get without opaque pointers: A call to a
known function with different function type is considered an
indirect call.

This fixes the crash reported in https://reviews.llvm.org/D123300#3444772.
2022-04-12 11:05:33 +02:00
Arthur Eubanks b22ffc7b98 [CaptureTracking] Ignore ephemeral values in EarliestEscapeInfo
And thread DSE's ephemeral values to EarliestEscapeInfo.

This allows more precise analysis in DSEState::isReadClobber() via BatchAA.

Followup to D123162.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D123342
2022-04-08 10:07:26 -07:00