Commit Graph

31780 Commits

Author SHA1 Message Date
Florian Hahn 9d31d1c214
[ConstraintElimination] Use logic from 3771310eed for queries only.
The logic added in 3771310eed was placed sub-optimally. Applying the
transform in ::getConstraint meant that it would also impact conditions
that are added to the system by the signed <-> unsigned transfer logic.

This meant we failed to add some signed facts to the signed system. To
make sure we still add as many useful facts to the signed/unsigned
systems, move the logic to the point where we query the system.
2022-10-08 11:03:45 +01:00
Florian Hahn 13ac102726
[LoopSimplifyCFG] Invalidate SCEV dispositions.
Clear all dispositions if there are any dead blocks (which will get
removed later) and also clear dispositions for removed instructions.

Clearing all dispositions in case there are dead blocks happens first,
which should avoid traversing SCEV use-lists for invalidating
dispositions for individual values.

Fixes #58179.
2022-10-07 21:35:42 +01:00
Florian Hahn 19ad1cd5ce
Recommit "[SCEV] Support clearing Block/LoopDispositions for a single value."
This reverts commit 92f698f01f.

The updated version of the patch includes handling for non-SCEVable
types. A test case has been added in ec86e9a99b.
2022-10-07 20:15:44 +01:00
Philip Reames cb66e123c6 Remove PlaceSafepoints pass
This patch was added way back in the beginning of the work which became the statepoint infrastructure. The idea was that safepoints could be inserted late in the optimization pipeline. This is true if the only concern is garbage collection, but this approach turned out to be incompatible with the requirement to also support deoptimization at safepoints.

In theory, this pass would still be quite useful for an AOT compiled language which wants to support garbage collection, but we have no known users, and haven't for over 5 years. Time to remove unused code. If someone wants to use this, restoring it would not be hard. The immediate motivation for removal is that this is one of the last passes remaining which hasn't been ported to the new pass manager and the (straight forward) work to do so is not justified for unused code.

Differential Revision: https://reviews.llvm.org/D135371
2022-10-07 11:51:00 -07:00
Sanjay Patel 3e6767ed5f [InstCombine] propagate 'exact' when converting ashr to lshr
The shift amount is not changing, so if we guaranteed
shifting out zeros before, those bits are still zeros.

https://alive2.llvm.org/ce/z/sokQca
2022-10-07 13:17:19 -04:00
Florian Hahn 92f698f01f
Revert "[SCEV] Support clearing Block/LoopDispositions for a single value."
This reverts commit 9e931439dd.

This commit causes a crash when TSan, e.g. with
https://lab.llvm.org/buildbot/#/builders/70/builds/28309/steps/10/logs/stdio

Reverting while I extract a reproducer and submit a fix.
2022-10-07 17:58:54 +01:00
Sanjay Patel bdfefac9a4 [InstCombine] refactor sdiv by (negative) power-of-2 folds; NFCI
It's probably better to try harder on this kind of
pattern by using ValueTracking.
2022-10-07 11:35:17 -04:00
Florian Hahn 9e931439dd
[SCEV] Support clearing Block/LoopDispositions for a single value.
Extend forgetBlockAndLoopDisposition to allow clearing information for a
single value. This can be useful when only a single value is changed,
e.g. because the instruction is moved.

We also need to clear the cached values for all SCEV users, because they
may depend on the starting value's disposition.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D134614
2022-10-07 16:07:17 +01:00
Florian Hahn 3771310eed
[ConstraintElimination] Convert to unsigned Pred if possible.
Convert SLE/SLT predicates to unsigned equivalents if both operands are
known to be signed-positive.

https://alive2.llvm.org/ce/z/tBeiZr
2022-10-07 12:27:36 +01:00
Nikita Popov b43a4d0850 [LoopPeeling] Support peeling loops with non-latch exits
Loop peeling currently requires that a) the latch is exiting
b) a branch and c) other exits are unreachable/deopt. This patch
removes all of these limitations, and adds the necessary branch
weight updating support. It essentially works the same way as
before with latch -> exiting terminator and
loop trip count -> per exit trip count.

It's worth noting that there are still other limitations in
profitability heuristics: This patch enables peeling of loops to
make conditions invariant (which is pretty much always highly
profitable if possible), while peeling to make loads dereferenceable
still checks that non-latch exits are unreachable and PGO-based
peeling has even more conditions. Those checks could be relaxed
later if we consider those cases profitable.

The motivation for this change is that loops using iterator adaptors
in Rust often optimize very badly, and end up with a loop phi of the
form phi(true, false) in the final result. Peeling eliminates that
phi and conditions based on it, which enables a lot of follow-on
simplification.

Differential Revision: https://reviews.llvm.org/D134803
2022-10-07 12:35:52 +02:00
Nikita Popov ccf53cae32 [ValueTracking] Remove unused Offset argument in getConstantStringInfo() (NFC) 2022-10-07 11:35:55 +02:00
Dmitry Makogon 8307f6c854 [LoopPredication] Insert assumes of conditions of predicated guards
As LoopPredication performs non-equivalent transforms removing some
checks from loops, other passes may not be able to perform transforms
they'd be able to do if the checks were left in loops.

This patch makes LoopPredication insert assumes of the replaced
conditions either after a guard call or in the true block of
widenable condition branch.

Differential Revision: https://reviews.llvm.org/D135354
2022-10-07 16:10:24 +07:00
Nikita Popov 333246b48e Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify
Relative to the previous attempt, this adjusts simplification to
use the correct context instruction: We need to use the terminator
of the incoming block, not the original instruction.

-----

foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.

This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.

This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.

Differential Revision: https://reviews.llvm.org/D134954
2022-10-07 11:04:19 +02:00
Alina Sbirlea b9898e7ed1 Revert "Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify"
This reverts commit e94619b955.
2022-10-06 13:12:24 -07:00
Alexey Bataev 323ed2308a [SLP]Improve/fix CSE analysis of the blocks/instructions.
Added analysis for invariant extractelement instructions and improved
detection of the CSE blocks for generated extractelement instructions.

Differential Revision: https://reviews.llvm.org/D135279
2022-10-06 12:08:48 -07:00
Alex Brachet 4ea1a647ff [PGO] Make emitted symbols hidden
Differential Revision: https://reviews.llvm.org/D135340
2022-10-06 18:28:16 +00:00
Bjorn Pettersson 0db4b1d1a8 [SimplifyLibCalls] Adjust code comment in optimizeStringLength. NFC
The limitation in LibCallSimplifier::optimizeStringLength to only
optimize when the string is an i8 array was changed already in
commit 50ec0b5dce back in 2017.

We still only simplify when 's' points at an array of 'CharSize', so
the comment is still valid in the sense that we do not support
arbitrary array types.

Differential Revision: https://reviews.llvm.org/D135261
2022-10-06 20:00:27 +02:00
Arthur Eubanks ae5733346f Revert "[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI"
This reverts commit cd8f3e7581.

Causes miscompiles, see D132657
2022-10-06 10:36:02 -07:00
Sanjay Patel 8da2fa856f [InstCombine] fold sdiv with hidden common factor
(X * Y) s/ (X << Z) --> Y s/ (1 << Z)

https://alive2.llvm.org/ce/z/yRSddG

issue #58137
2022-10-06 13:11:50 -04:00
Florian Hahn a7ac0dd0cf
[ConstraintElimination] Generalize AND matching.
Extend more general matching used for chains of ORs to also support
chains of ANDs.
2022-10-06 17:17:38 +01:00
Sanjay Patel 6b869be810 [InstCombine] fold udiv with hidden common factor
(X * Y) u/ (X << Z) --> Y u>> Z

https://alive2.llvm.org/ce/z/4G9D_W
2022-10-06 11:35:27 -04:00
Florian Hahn 8e3e96298f
[ConstraintElimination] Order cmps for signed <-> unsigned transfer first.
Make sure conditions with constant operands come before conditions
without constant operands. This increases the effectiveness of the
current signed <-> unsigned fact transfer logic.
2022-10-06 15:56:25 +01:00
Florian Hahn 349375d093
[ConstraintElimination] Generalize OR matching.
Extend OR handling to traverse chains of ORs.
2022-10-06 11:56:22 +01:00
Nikita Popov 028874dd61 [Local] Fix unused variable warnings (NFC) 2022-10-06 10:30:59 +02:00
Florian Hahn 7449570ff7
[ConstraintElimination] Use ConstraintTy::IsSigned instead of Predicate.
This should be NFC and ensure the sign of the constraint is used
consistently in the future.
2022-10-06 07:51:49 +01:00
Carl Ritson c316332e17 [Sink] Allow sinking of invariant loads across critical edges
Invariant loads can always be sunk.

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D135133
2022-10-06 09:21:12 +09:00
Florian Hahn 9aa004a04c
[ConstraintElimination] Convert NewIndices to vector and rename (NFCI).
The callers of getConstraint only require a list of new variables.
Update the naming and types to make this clearer.
2022-10-05 16:25:00 +01:00
Johannes Doerfert e18736149c [Attributor] Teach AAPointerInfo about atomic cmxchg and rmw
The atomic operations behave similar to a store except that we don't
know the new value and we read the result first.
2022-10-05 06:48:00 -07:00
Johannes Doerfert 93e51fa444 [Attributor] AAPointerInfo can model non-escaping call uses
If a call base use will not capture a pointer we can approximate the
effects. This is important especially for readnone/only uses. Even
may-write uses are not too bad with reachability in place. Capturing
is the problem as we loose track of update sides.
2022-10-05 06:29:14 -07:00
Johannes Doerfert 477e8e10f0 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-10-05 06:19:47 -07:00
Nikita Popov 5fa14ee835 [MemCpyOpt] Don't hoist above producer of pointer operand
This was already handled correctly below, but not checked for the
original store pointer operand. Encountered when converting tests
to opaque pointers, where the intermediate bitcast goes away.
2022-10-05 14:52:33 +02:00
David Stuttard d1d7d2235c [AggressiveInstCombine] Fix cases where non-opaque pointers are used
In the case of non-opaque pointers, when combining consecutive loads,
need to bitcast the pointer source to the combined type size, otherwise
asserts are triggered.

Differential Revision: https://reviews.llvm.org/D135249
2022-10-05 13:42:46 +01:00
Nikita Popov e94619b955 Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify
The infinite loop seen on buildbots should be fixed by
11897708c0 (assuming there are not
multiple infinite combine loops...)

-----

foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.

This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.

This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.

Differential Revision: https://reviews.llvm.org/D134954
2022-10-05 14:00:19 +02:00
Nikita Popov 11897708c0 [InstCombine] Directly replace instr in foldIntegerTypedPHI() (NFCI)
Rather than inserting a ptrtoint + inttoptr pair, directly replace
the inttoptr with the new phi node. This ensures that no other
transform can undo it before the pair gets folded away.

This avoids the infinite loop when combined with D134954.

This is NFCI in the sense that it shouldn't make a difference, but
could due to different worklist order.
2022-10-05 13:28:23 +02:00
Florian Hahn 469f0fc6a6
[SimpleLoopUnswitch] Clear dispos in deleteDeadBlocksFromLoop.
SimpleLoopUnswitch may remove blocks from loops. Clear block and loop
dispositions in that case, to clean up invalid entries in the cache.

Fixes #58158.

Fixes #58159.
2022-10-05 10:28:15 +01:00
Johannes Doerfert a9557115b4 [Attributor] Qualify variables to avoid clashes in the future 2022-10-04 19:43:04 -07:00
Gulfem Savrun Yeniceri d7592bbb03 Revert "Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify"
This reverts commit e1dd2cd063 because
the original commit b20e34b39f had
a dramatic increase in the build time of RTfuzzer, which caused Fuchsia
Clang toolchain builders to timeout:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8801248587754572961/overview
2022-10-04 20:57:34 +00:00
Florian Hahn 4f827318e3
[LoopVersioning,LLE] Clear LoopAccessInfoManager after making changes.
Loop versioning changes the control-flow, which may impact SCEVs cached
by for other loops in LoopAccessInfoManager. Clear the manager after
making changes.

Fixes #57825.

Depends on D134609.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134611
2022-10-04 21:35:42 +01:00
Ram-NK a58b6acf1f [NFC][LoopInterchange] Clean up of irrelevent dependency checking with
isOuterMostDepPositive()

The function isOuterMostDepPositive() is checked after negative dependence
vectors are normalized to be non-negative, so there will not be any negative
dependency ('>' as the outermost non-equal sign) after normalization. And
therefore the check in isOuterMostDepPositive() is irrelevent and redundant.

Reviewed By: congzhe

Differential Revision: https://reviews.llvm.org/D132982
2022-10-04 14:54:08 -04:00
Alexey Bataev ab9a81f736 [SLP]Try to emit canonical shuffle with undef operand.
In the canonical form of the shuffle the poison/undef operand is the
second operand, the patch tries to emit canonical form for partial
vectorization of the buildvector sequence.
Also, this patch starts emitting freeze instruction for shuffles with undef indices if the second shuffle operan is undef, not poison. It is an initial step to D93818, where undef mask element are treated as returning poison value.

Differential Revision: https://reviews.llvm.org/D134377
2022-10-04 08:16:07 -07:00
Nikita Popov e1dd2cd063 Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify
Reapply with a fix for the case where an operand simplified back
to the original phi: We need to map this case to the new phi node.

-----

foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.

This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.

This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
2022-10-04 15:18:34 +02:00
Alex Richardson 16f9c5577d [SimplifyLibCalls] Retain attributes added by Builder.CreateMem*
This currently does not make much of a difference (only one tests is
affected), but it is helpful e.g. for the out-of-tree CHERI target where
Builder.CreateMemCpy() can add attributes other than parameter alignment.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D135075
2022-10-04 13:11:34 +00:00
Bjorn Pettersson 491ac8f3e8 [LibCalls] Cast Char argument to 'int' before calling emitFPutC
The helpers in BuildLibCalls normally expect that the Value
arguments already have the correct type (matching the lib call
signature). And exception has been emitFPutC which casted the Char
argument to 'int' using CreateIntCast. This patch moves the cast to
the caller instead of doing it inside emitFPutC.

I think it makes sense to make the BuildLibCall API:s a bit
more consistent this way, despite the need to handle the int cast
in two different places now.

Differential Revision: https://reviews.llvm.org/D135066
2022-10-04 12:52:05 +02:00
Bjorn Pettersson aa1b64cc42 [BuildLibCalls] Use TLI to get 'int' and 'size_t' type sizes
Stop assuming that an 'int' is 32 bits in helpers that emit libcalls
to lib functions that had 'int' in the signature. For most targets
this is NFC. For a target with 16 bit 'int' type this could help out
detecting if trying to emit a libcall with incorrect signature.

Similarly we now derive the type mapping to 'size_t' by asking TLI
about the size of 'size_t'. This should be NFC (at least for in-tree
targets) since getSizeTSize(), in TLI, is deriving the size in the
same way as DataLayout::getIntPtrType().

Differential Revision: https://reviews.llvm.org/D135065
2022-10-04 12:52:05 +02:00
Bjorn Pettersson 73e8d95d28 [BuildLibCalls] Name types to identify when 'int' and 'size_t' is assumed. NFC
Lots of BuildLibCalls helpers are using Builder::getInt32Ty to get
a type matching an 'int', and DataLayout::getIntPtrType to get a
type matching 'size_t'. The former is not true for all targets, since
and 'int' isn't always 32 bits. And the latter is a bit weird as well
as the definition of DataLayout::getIntPtrType isn't clearly mapping
it to 'size_t'.

This patch is not aiming at solving any such problems. It is merely
highlighting when a libcall is expecting to use 'int' and 'size_t'
by naming the types as IntTy and SizeTTy when preparing the type
signatures for the emitted libcalls.

Differential Revision: https://reviews.llvm.org/D135064
2022-10-04 12:52:05 +02:00
Florian Hahn 825e16969e
[LAA] Pass LoopAccessInfoManager instead of GetLAA function.
Use LoopAccessInfoManager directly instead of various GetLAA lambdas.

Depends on D134608.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134609
2022-10-04 11:51:25 +01:00
Florian Hahn e399dd601f
[SimpleLoopUnswitch] Clear block and loop dispos after destroying loop.
SimpleLoopUnswitch may remove loops. Clear block and loop dispositions,
to clean up invalid entries in the cache.

Fixes #58136.
2022-10-04 10:27:52 +01:00
Nikita Popov 635f93dff7 [SimplifyLibCalls] Place deref attr even if nonnull already set
If nonnull is already set, we currently skip setting both nonnull
and dereferenceable. Make these independent, to avoid regressions
when additional nonnull attributes are inferred earlier.
2022-10-04 11:26:15 +02:00
Nikita Popov 0f32f0e147 Revert "[InstCombine] Switch foldOpIntoPhi() to use InstSimplify"
This reverts commit b20e34b39f.

This causes RAUW type mismatch assertions on some buildbots,
reverting for now.
2022-10-04 11:17:09 +02:00
Nikita Popov b20e34b39f [InstCombine] Switch foldOpIntoPhi() to use InstSimplify
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.

This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.

This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
2022-10-04 10:12:14 +02:00
Florian Hahn b2daeb9706
[ConstraintElimination] Re-enable debug print when adding facts. (NFC)
Also add test coverage for important debug output.
2022-10-03 20:51:58 +01:00
Florian Hahn bdfe986d66
[ConstraintElimination] Simplify logic for using inverse predicate (NFC)
Recent improvements to the code structure mean we don't need to reset
the condition's predicate in the IR and later restore it. Remove the
restorer logic.
2022-10-03 18:35:59 +01:00
Florian Hahn 017552e8b8
[ConstraintElimination] Remove stray comment (NFC).
The comment doens't apply in the current context, remove it.
2022-10-03 18:20:48 +01:00
Florian Hahn a2efc29e99
[ConstraintElimination] Remove unused StackEntry::IsNot field. (NFC)
The field is no unused and can be removed.
2022-10-03 18:05:25 +01:00
Alex Richardson 3890a456d8 [SimplifyLibCalls] Reduce code duplication. NFC
Reviewed By: nikic, nickdesaulniers, xbolva00

Differential Revision: https://reviews.llvm.org/D135073
2022-10-03 15:44:00 +00:00
Yevgeny Rouban e351821088 Fix compilation of CodeLayout.cpp for MacOS
llvm/lib/Transforms/Utils/CodeLayout.cpp uses std::abs() with double argument,
which is provided by cmath header, which is not explicitly included into CodeLayout.cpp.
The implicit include in llvm/include/llvm/Support/MathExtras.h was removed in
commit 16544cbe64

Inserting explicit include of cmath into CodeLayout.cpp in order to fix build on MacOS.

Committed on behalf of alsemenov (Aleksei Semenov)
Reviewed By: thieta
Differential Revision: https://reviews.llvm.org/D135072
2022-10-03 21:47:43 +07:00
Bjorn Pettersson 66fcdfca4d [Analysis][SimplifyLibCalls] Refactor code related to size_t in lib func signatures. NFC
Added a helper in TargetLibraryInfo to get size of "size_t" in bits,
given a Module reference. The new getSizeTSize helper is using the
same strategy as for example isValidProtoForLibFunc has been using
in the past, assuming that the size can be derived by asking
DataLayout about the size/type of a pointer to int.

FortifiedLibCallSimplifier::optimizeStrpCpyChk was changed to use
the new getSizeTSize helper instead of assuming that sizeof(size_t)
is equal to sizeof(int*) by itself (that is the assumption used in
TargetLibraryInfoImpl::getSizeTSize so the result will be the same).

Having a common helper for this ensure that we use the same strategy
when deriving the size of "size_t" in different parts of the code.
One bonus with this refactoring (basing it on Module instead of just
DataLayout) is that it makes it easier to override this for a specific
target triple, in case the assumption of using getPointerSizeInBits
wouldn't hold.

Differential Revision: https://reviews.llvm.org/D110585
2022-10-03 12:02:50 +02:00
Sanjay Patel 2e87333bfe [InstCombine] convert mul by negative-pow2 to negate and shift
This is an unusual canonicalization because we create an extra instruction,
but it's likely better for analysis and codegen (similar reasoning as D133399).

InstCombine::Negator may create this kind of multiply from negate and shift,
but this should not conflict because of the narrow negation.

I don't know how to create a fully general proof for this kind of transform in
Alive2, but here's an example with bitwidths similar to one of the regression
tests:
https://alive2.llvm.org/ce/z/J3jTjR

Differential Revision: https://reviews.llvm.org/D133667
2022-10-02 12:22:25 -04:00
Florian Hahn 3fe6ddd999
[ConstraintElimination] Update Changed status in ssub simplification.
Update tryToSimplifyOverflowMath to indicate whether the function made
any changes to the IR.
2022-10-02 14:25:51 +01:00
Arthur Eubanks 5df4ab55f9 [llvm] Migrate PAEval to new pass manager 2022-10-01 16:41:58 -07:00
Florian Hahn 7c0ff64b0f
[LAA] Change to function analysis for new PM.
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
withing a loop pipeline.

To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.

Fixes #50940.

Fixes #51669.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134606
2022-10-01 15:44:27 +01:00
Teresa Johnson 43417d8159 [MemProf] Update metadata during inlining
Update both memprof and callsite metadata to reflect inlined functions.

For callsite metadata this is simply a concatenation of each cloned
call's call stack with that of the inlined callsite's.

For memprof metadata, each profiled memory info block (MIB) is either
moved to the cloned allocation call or left on the original allocation
call depending on whether its context matches the newly refined call
stack context on the cloned call. We also reapply context trimming
optimizations based on the refined set of contexts on each of the calls
(cloned and original).

Depends on D128142.

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D128143
2022-09-30 19:21:15 -07:00
Teresa Johnson 4d243348fb Revert "[MemProf] Update metadata during inlining" and preceeding commit
This reverts commit 0d7f3464ce and
commit f9403ca41e. The latter was
"Profile matching and IR annotation for memprof profiles." and was left
from a bad rebase from a commit already pushed upstream.
2022-09-30 17:01:30 -07:00
Teresa Johnson 0d7f3464ce [MemProf] Update metadata during inlining
Update both memprof and callsite metadata to reflect inlined functions.

For callsite metadata this is simply a concatenation of each cloned
call's call stack with that of the inlined callsite's.

For memprof metadata, each profiled memory info block (MIB) is either
moved to the cloned allocation call or left on the original allocation
call depending on whether its context matches the newly refined call
stack context on the cloned call. We also reapply context trimming
optimizations based on the refined set of contexts on each of the calls
(cloned and original), via utilities in MemoryProfileInfo.

Depends on D128142.

Differential Revision: https://reviews.llvm.org/D128143
2022-09-30 16:46:17 -07:00
Teresa Johnson f9403ca41e Profile matching and IR annotation for memprof profiles.
See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*

* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceeding patch adding the basic
metadata and verification support.

The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.

The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.

Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
alloction type. This is the same attributed that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].

Depends on D128141 and D128854.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
[2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
[3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165
[4] ab87cf382d

Differential Revision: https://reviews.llvm.org/D128142
2022-09-30 16:46:17 -07:00
Sanjay Patel 2053070443 [SCCP] remove unnecessary check for constant when folding sext->zext
I'm not sure how to test this because we seem to constant-fold
all examples already. We changed this code to use the common
isNonNegative() helper, so it should not be necessary to avoid
a constant. This makes the code uniform for all transforms.
2022-09-30 17:26:10 -04:00
Sanjay Patel 89a5d804c1 [SCCP] add a code comment about sitofp -> uitofp; NFC
D134975 would have added this fold, but we decided it's
not worth doing without some evidence of benefit.
2022-09-30 17:26:10 -04:00
Florian Hahn 04c711c78d
[ConstraintElimination] Make sure the variable is available before use.
This fixes a crash when trying to access an index for a value where we
don't have a known index.

Fixes #58009.
2022-09-30 18:09:01 +01:00
Nikita Popov d40dcb0b8d [LICM] Collect more scalar promotion stats (NFC)
Collect more statistics for scalar promotion. In particular,
keep track of how many promotion candidates there were, and
whether it is a load or a load/store promotion.
2022-09-30 16:07:52 +02:00
Simon Pilgrim 5849fcb635 Revert rG1b7089fe67b924bdd5ecef786a34bdba7a88778f "[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions"
Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI."
Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds"

Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.
2022-09-30 11:22:48 +01:00
Florian Hahn 8ae0d9aa07
[LoopDeletion] Clear block & loop dispo cache after breaking backedge.
breakLoopBackedge may remove blocks and loops. Also clear block &
loop disposition to avoid the cache containing invalid blocks and loops.

The coverage for the change is provided when using an ASAN build of opt
to run the LoopDeletion unit tests; without the fix, pointers to invalid
objects would be used.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D134663
2022-09-30 11:21:58 +01:00
Florian Hahn 9933a2e9fd
[SCEVExpander] Move LCSSA fixup to ::expand.
Move LCSSA fixup from ::expandCodeForImpl to ::expand(). This has
the advantage that we directly preserve LCSSA nodes here instead of
relying on doing so in rememberInstruction. It also ensures that we
 don't add the non-LCSSA-safe value to InsertedExpressions.

Alternative to D132704.

Fixes #57000.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134739
2022-09-29 20:49:56 +01:00
luxufan f079ba76cf [DSE] Eliminate noop store even through has clobbering between LoadI and StoreI
For noop store of the form of LoadI and StoreI,
An invariant should be kept is that the memory state of the related
MemoryLoc before LoadI is the same as before StoreI.
For this example:
```
define void @pr49927(i32* %q, i32* %p) {
  %v = load i32, i32* %p, align 4
  store i32 %v, i32* %q, align 4
  store i32 %v, i32* %p, align 4
  ret void
}
```
Here the definition of the store's destination is different with the
definition of the load's destination, which it seems that the
invariant mentioned above is broken. But the definition of the
store's destination would write a value that is LoadI, actually, the
invariant is still kept. So we can safely ignore it.

Fixes https://github.com/llvm/llvm-project/issues/49271

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D132657
2022-09-29 00:51:56 +00:00
Nikola Tesic b5d28f3ea5 [Debugify][OriginalDIMode] Make HTML reporting infrastructure more resilient
Debugify in OriginalDebugInfo mode (verify-each-debuginfo-preserve), when used
in parallel builds of large projects, can produce incorrect report. More
precisely, simultaneous writes to JSON report file, could form incorrect JSON
objects, which describe found Debug Info bugs.
This patch uses the lock/unlock mechanism to protect JSON report file and also
makes script llvm/utils/llvm-original-di-preservation.py resilient to corrupted
lines in the report file. So, it ensures the creation of HTML report.

Differential Revision: https://reviews.llvm.org/D115616
2022-09-29 16:48:06 +02:00
eopXD 8cbdb1e081 [LSR][NFC] Add missing constness 2022-09-29 06:30:50 -07:00
Nikita Popov 412141663c Reapply [FunctionAttrs] Infer precise FMRB
The previous version of the patch would incorrect convert an
existing argmemonly attribute into an inaccessiblemem_or_argmemonly
attribute.

-----

This updates checkFunctionMemoryAccess() to infer a precise
FunctionModRefBehavior, rather than an approximation split into
read/write and argmemonly.

Afterwards, we still map this back to imprecise function attributes.
This still allows us to infer some cases that we previously did not
handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly.
In practice, this means we get better memory attributes in the
presence of intrinsics like @llvm.assume.

Differential Revision: https://reviews.llvm.org/D134527
2022-09-29 14:02:15 +02:00
Florian Hahn 080a1e2bbb
[LV] Create createInductionResumeValue helper (NFC).
Factor out the logic to create induction resume values for a specific
induction. This will be used in D92132 to support widened IVs during
epilogue vectorization.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D134211
2022-09-29 11:13:01 +01:00
Juan Manuel MARTINEZ CAAMAÑO 52545e603b [DebugInfo][InferAddressSpaces] Propagate DebugLoc when cloning an instruction in InferAddressSpaces
Differential Revision: https://reviews.llvm.org/D134428
2022-09-29 08:43:37 +00:00
Juan Manuel MARTINEZ CAAMAÑO e9716c64ec [StructurizeCFG] Remove imposible case and replace by assert
In addition, replace outdated XFAIL test by a new one.

Differential Revision: https://reviews.llvm.org/D134439
2022-09-29 08:27:49 +00:00
Florian Hahn 9247b012d6
[SCEVExpander] Use CreateBitOrPointerCast instead of builder (NFC).
Simplify the code by using CastInst::CreateBitOrPointerCast directly. By
not going through the builder, the temporary instruction also won't get
registered in InsertedValues & co, which means less work overall and
simplifies the clean-up.
2022-09-29 09:24:39 +01:00
Gulfem Savrun Yeniceri 5bdf22e743 [InstrProfiling] Fix emitting runtime hook once
https://reviews.llvm.org/D134254 introduced an issue on Fuchsia
target, which does not unconditionally emit runtime hook.
It used containsProfilingIntrinsics(M) after intrinsics are lowered.
So, this patch fixes the issue by capturing the result of that
function invocation before intrinsics are lowered.

Differential Revision: https://reviews.llvm.org/D134841
2022-09-29 01:21:49 +00:00
Florian Mayer e06c9b63bc [NFC] [HWASan] remove unnecessary cast 2022-09-28 17:48:19 -07:00
Florian Mayer 0401dc2913 [MTE] [HWASan] unify isInterestingAlloca
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D134779
2022-09-28 15:52:34 -07:00
serge-sans-paille 16544cbe64 [iwyu] Move <cmath> out of llvm/Support/MathExtras.h
Interestingly, MathExtras.h doesn't use <cmath> declaration, so move it out of
that header and include it when needed.

No functional change intended, but there's no longer a transitive include
fromMathExtras.h to cmath.
2022-09-28 20:49:01 +02:00
Mingming Liu ac28efa6c1 [SimplifyCFG][TranformUtils]Do not simplify away a trivial basic block if both this block and at least one of its predecessors are loop latches.
- Before this patch, loop metadata (if exists) will override the metadata of each predecessor; if the predecessor block already has loop metadata, the orignal loop metadata won't be preserved and could cause missed loop transformations (see 'test2' in llvm/test/Transforms/SimplifyCFG/preserve-llvm-loop-metadata.ll).

To illustrate how inner-loop metadata might be dropped before this patch:

CFG Before

      entry
        |
        v
 ---> while.cond   ------------->  while.end
 |       |
 |       v
 |   while.body
 |       |
 |       v
 |    for.body <---- (md1)
 |       |  |______|
 |       v
 |    while.cond.exit (md2)
 |       |
 |_______|

CFG After

       entry
         |
         v
 ---> while.cond.rewrite  ------------->  while.end
 |       |
 |       v
 |   while.body
 |       |
 |       v
 |    for.body <---- (md2)
 |_______|  |______|

Basically, when 'while.cond.exit' is folded into 'while.cond', 'md2' overrides 'md1' and 'md1' is dropped from the CFG.

Differential Revision: https://reviews.llvm.org/D134152
2022-09-28 10:48:14 -07:00
bipmis 3b49a9fcf6 [AggressiveInstCombine] Combine consecutive loads which are being merged to form a wider load.
The patch simplifies some of the patterns as below

1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)

The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc.

Fix the error reported on reverse load merge.

Differential Revision: https://reviews.llvm.org/D127392
2022-09-28 17:32:47 +01:00
Sanjay Patel e239198cdb [InstCombine] fold select shuffles with shared operand together
We don't combine generic shuffles together in IR, but select
shuffles are a special-case because a select shuffle of a
select shuffle is just another select shuffle; codegen is
expected to efficiently lower those (select shuffles are also
the canonical form of a vector select with constant condition).
2022-09-28 11:56:27 -04:00
Sameer Sahasrabuddhe 3f078b308b [AAPointerInfo] OffsetInfo: Unassigned is distinct from Unknown
A User like the PHINode may be visited multiple times for the same pointer along
different def-use edges. The uninitialized state of OffsetInfo at the first
visit needs to be distinct from the Unknown value that may be assigned after
processing the PHINode. Without that, a PHINode with all inputs Unknown is never
followed to its uses. This results in incorrect optimization because some
interfering accessess are missed.

Differential Revision: https://reviews.llvm.org/D134704
2022-09-28 20:31:36 +05:30
Benjamin Kramer 0fb2676c24 Revert "[FunctionAttrs] Infer precise FMRB"
This reverts commit 97dfa53626.

It can make DSE crash. Reduced test case at
https://reviews.llvm.org/P8291
2022-09-28 16:57:43 +02:00
Florian Hahn ed47bc8b58
[SCEVExpander] Remove dead Root argument from expandCodeForImpl (NFC).
The argument is unused and can be removed.
2022-09-28 12:08:36 +01:00
Florian Hahn 2b23a58924
[LoopDeletion] Forget block and loop dispositions after deleting loop.
After deleting a loop, the block and loop dispositions need to be
cleared. As we don't know which SCEVs in the loop/blocks may be
impacted, completely clear the cache. This should also fix some cases
where deleted loops remained in the LoopDispositions cache.

This fixes a verification failure surfaced by D134531.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D134613
2022-09-28 11:33:43 +01:00
Simon Pilgrim 926ccfef03 [SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds
Workaround for a chromium bug reported on D134605 - test case will be added later
2022-09-28 11:03:37 +01:00
Igor Kirillov 2d60d7ba1a [LoopVectorize][Fix] Crash when invariant store address is calculated inside loop
Fixes #57572

Generally LICM pass is responsible for sinking out code that calculates
invariant address inside loop as it only needed to be calculated once.
But in rare case it does not happen we will not be vectorizing the
loop.

Differential Revision: https://reviews.llvm.org/D133687
2022-09-28 10:33:50 +01:00
Philip Reames f6d110e26f [LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc]
This is purely NFC restructure in advance of a change which actually exposes zero strides.  This is mostly because I find this interface confusing each time I look at it.
2022-09-27 15:55:44 -07:00
Philip Reames 899ebd7e99 [LV] Remove two unused default arguments [nfc] 2022-09-27 14:33:53 -07:00
Martin Sebor e80e134c77 [InstCombine] Add support for stpncpy folding
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130922
2022-09-27 14:44:33 -06:00
Doru Bercea c9adeca501 Move allocas converted from __kmpc_alloc_shared to entry block. 2022-09-27 17:16:58 +00:00
Philip Reames dc7387b587 [LV] Adjust cost model to use uniform store lowering for unpredicated uniform stores
Follow up to D133580; adjust the cost model to prefer uniform store lowering for scalable stores which are unpredicated.

The impact here isn't in the uniform store lowering quality itself. InstCombine happily converts the scatter form into the single store form. The main impact is in letting the rest of the cost model make choices based on the knowledge that the vector will be scalarized on use.

Differential Revision: https://reviews.llvm.org/D134460
2022-09-27 07:28:40 -07:00
Simon Pilgrim ef89409a59 Fix 'unused-lambda-capture' gcc warning. NFCI. 2022-09-27 15:15:43 +01:00
Simon Pilgrim 1b7089fe67 [SLP] Add ScalarizationOverheadBuilder helper to track vector extractions
Instead of accumulating all extraction costs separately and then adjusting for repeated subvector extractions, this patch collects all the extractions and then converts to calls to getScalarizationOverhead to improve the accuracy of the costs.

I'm not entirely satisfied with the getExtractWithExtendCost handling yet - this still just adds all the getExtractWithExtendCost costs together - it really needs to be replaced with a "getScalarizationOverheadWithExtend", but that will require further refactoring first.

This replaces my initial attempt in D124769.

Differential Revision: https://reviews.llvm.org/D134605
2022-09-27 14:49:07 +01:00
Florian Hahn 3abaa3760d
[LSR] Preserve LCSSA in expander when rewriting loop exit values.
The expanded values when rewriting exit values need to preserve LCSSA.
Ask SCEVExpander to preserve LCSSA to ensure that.

Fixes #58007.
2022-09-27 09:58:48 +01:00
Nikita Popov 97dfa53626 [FunctionAttrs] Infer precise FMRB
This updates checkFunctionMemoryAccess() to infer a precise
FunctionModRefBehavior, rather than an approximation split into
read/write and argmemonly.

Afterwards, we still map this back to imprecise function attributes.
This still allows us to infer some cases that we previously did not
handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly.
In practice, this means we get better memory attributes in the
presence of intrinsics like @llvm.assume.

Differential Revision: https://reviews.llvm.org/D134527
2022-09-27 10:14:35 +02:00
Florian Hahn 275bee32ad
[LoopUnroll] Forget block and loop dispositions during unrolling.
After unrolling a loop, the block and loop dispositions need to be
cleared. As we don't know which SCEVs in the loop/blocks may be
impacted, completely clear the cache. This should also fix some cases
where deleted loops remained in the LoopDispositions cache.

This fixes a verification failure surfaced by D134531.

I am planning on reviewing/updating the existing uses of
forgetLoopDispositions to check if they should be replaced by
forgetBlockAndLoopDispositions.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D134612
2022-09-27 08:49:04 +01:00
Sebastian Peryt 46fc75ab28 [NFC][2/n] Remove PrunePH pass
Second patch in the series to remove legacy PM and
associated -enable-new-pm=0 flag targets pass that
has not been ported to new PM - PruneEH.
Discussion about this can be found in D44415.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134686
2022-09-26 18:38:04 -07:00
Sanjay Patel def6cbd2bd [InstCombine] add assert/test for zext to i1
This is a test to verify that we do not crash with the
problem noted in issue #57986. The root problem should
be fixed with a prior change to InstSimplify.
2022-09-26 16:01:25 -04:00
Matt Arsenault 473e83b95a GuardWidening: Pass through AssumptionCache (NFC) 2022-09-26 14:53:00 -04:00
Matt Arsenault 9bf1aea224 LoopPeel: Pass through AssumptionCache (NFC) 2022-09-26 14:52:59 -04:00
Matt Arsenault 53fa00b3ae LoopUnroll: Pass through AssumptionCache (NFC)
Using these queries with a context instruction and without a cache
seems to be about 2x slower than with it so this theoretically
improves compile time.
2022-09-26 14:52:59 -04:00
Ruiling Song a5676a3a7e StructurizeCFG: Set Undef for non-predecessors in setPhiValues()
During structurization process, we may place non-predecessor blocks
between the predecessors of a block in the structurized CFG. Take
the typical while-break case as an example:
```
 /---A(v=...)
 |  / \
 ^ B   C
 |  \ /|
 \---L |
     \ /
      E (r = phi (v:C)...)
```
After structurization, the CFG would be look like:
```
 /---A
 |   |\
 |   | C
 |   |/
 |   F1
 ^   |\
 |   | B
 |   |/
 |   F2
 |   |\
 |   | L
 \   |/
  \--F3
     |
     E
```
We can see that block B is placed between the predecessors(C/L) of E.
During phi reconstruction, to achieve the same sematics as before, we
are reconstructing the PHIs as:
  F1: v1 = phi (v:C), (undef:A)
  F3: r = phi (v1:F2), ...
But this is also saying that `v1` would be live through B, which is not
quite necessary. The idea in the change is to say the incoming value
from B is Undef for the PHI in E. With this change, the reconstructed
PHI would be:
  F1: v1 = phi (v:C), (undef:A)
  F2: v2 = phi (v1:F1), (undef:B)
  F3: r  = phi (v2:F2), ...

Reviewed by: sameerds

Differential Revision: https://reviews.llvm.org/D132450
2022-09-26 09:54:47 +08:00
Ruiling Song 40e9284f3c StructurizeCFG: prefer reduced number of live values
The instruction simplification will try to simplify the affected phis.
In some cases, this might extend the liveness of values. For example:

  BB0:
   | \
   | BB1
   | /
  BB2:phi (BB0, v), (BB1, undef)

The phi in BB2 will be simplified to v as v dominates BB2, but this is
increasing the number of active values in BB1. By setting CanUseUndef
to false, we will not simplify the phi in this way, this would help
register pressure. This is mandatory for the later change to help
reducing VGPR pressure for AMDGPU.

Reviewed by: foad, sameerds

Differential Revision: https://reviews.llvm.org/D132449
2022-09-26 09:54:47 +08:00
Douglas Yung 91e0423595 Revert "[SROA] Create additional vector type candidates based on store and load slices"
This reverts commit de3445e0ef.

This is causing GHI #57796 and #57821.
2022-09-23 12:24:07 -07:00
Douglas Yung 0a7f4e03a9 Revert "[SROA] Check typeSizeEqualsStoreSize in isVectorPromotionViable"
This reverts commit 3f08d248c4.

The commit this change is fixing is being reverted due to GHI #57796 and #37821, so revert this commit as well.
2022-09-23 12:24:07 -07:00
Teresa Johnson b1926f308f Restore "[MemProf] Memprof profile matching and annotation"
This reverts commit 794b7ea960, and
thus restores commit a212d8da94, and
follow on fixes 0cd6763fa9,
e9ff53d42f, and
37c6a25e9a.

Use a hash function (BLAKE3) instead of hash_combine/hash_code which are
not guaranteed to be stable across executions.

Additionally, it adds a "REQUIRES: x86_64-linux" to the tests that have
raw profile inputs to avoid failures on big endian bots.

Reviewers: snehasish, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D128142
2022-09-23 11:38:47 -07:00
Florian Hahn 2c692d891e
[LV] Update handling of scalable pointer inductions after b73d2c8.
The dependent code has been changed quite a lot since 151c144 which
b73d2c8 effectively reverts. Now we run into a case where lowering
didn't expect/support the behavior pre 151c144 any longer.

Update the code dealing with scalable pointer inductions to also check
for uniformity in combination with isScalarAfterVectorization. This
should ensure scalable pointer inductions are handled properly during
epilogue vectorization.

Fixes #57912.
2022-09-23 18:23:02 +01:00
Dmitri Gribenko 954d3cd2c6 Revert "[AggressiveInstCombine] Combine consecutive loads which are being merged to form a wider load."
This reverts commit 3c70c8c1df.

After this commit, during the 3-stage bootstrap the second-stage Clang
crashes.
2022-09-23 19:21:09 +02:00
Philip Reames 954c1ed009 [SLP] Adjust debug output for store vectorization failure
When store vectorization is infeasible, it's helpful to have a debug logging indication of why.  A case I've hit a couple times now is accidentally using -march instead of -mtriple and getting the default TTI results.  This causes max-vf to become 1, and thus hits the added logging line.
2022-09-23 09:58:15 -07:00
Philip Reames 42ef572049 [SLP] Fix cost model w.r.t. operand properties
We allow the target to report different costs depending on properties of the operands; given this, we have to make sure we pass the right set of operands and account for the fact that different scalar instructions can have operands with different properties.

As a motivating example, consider a set of multiplies which each multiply by a constant (but not all the same constant). Most of the constants are power of two (but not all).

If the target doesn't have support for non-uniform constant immediates, this will likely require constant materialization and a non-uniform multiply. However, depending on the balance of target costs for constant scalar multiplies vs a single vector multiply, this might or might not be a profitable vectorization.

This ends up basically being a rewrite of the existing code. Normally, I'd scope the change more narrowly, but I kept noticing things which seemed highly suspicious, and none of the existing code appears to have any test coverage at all. I think this is a case where simply throwing out the existing code and starting from scratch is reasonable.

This is a follow on to Alexey's D126885, but also handles the arithmetic instruction case since the existing code appears to have the same problem.

Differential Revision: https://reviews.llvm.org/D132566
2022-09-23 08:40:23 -07:00
Florian Hahn d72eb9c985
[LoopDeletion] Invalidate SCEV after moving instruction.
LoopDeletion may hoist instructions out of a loop using
makeLoopInvariant without invalidating the SCEV for the moved
instruction.

Moving the instruction to a different block may change its
cached block disposition, so invalidate the cached info.

Fixes #57837.
2022-09-23 15:14:11 +01:00
Simon Pilgrim a6e9141505 [TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436)
The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both.

This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.

Fixes #50778

Differential Revision: https://reviews.llvm.org/D111968
2022-09-23 14:03:18 +01:00
Nikita Popov fe196380cc [FunctionAttrs] Use MemoryLocation::getOrNone() when infering memory attrs
MemoryLocation::getOrNone() already has the necessary logic to
handle different instruction types. Use it, rather than repeating
a subset of the logic. This adds support for previously unhandled
instructions like atomicrmw.
2022-09-23 13:56:55 +02:00
Florian Hahn 623c4a7a55
[LoopVersioning] Invalidate SCEV for phi if new values are added.
After 20d798bd47, SCEV looks through PHIs with a single incoming
value. This means adding a new incoming value may change the SCEV for a
phi. Add missing invalidation when an existing PHI is reused during
LoopVersioning. New incoming values will be added later from the
versioned loop.

Similar issues have been fixed by also adding missing invalidation.

Fixes #57825.

Note that the test case unfortunately requires running loop-vectorize
followed by loop-load-elimination, which does the actual versioning. I
don't think it is possible to reproduce the failure without that
combination.
2022-09-23 11:53:29 +01:00
bipmis 3c70c8c1df [AggressiveInstCombine] Combine consecutive loads which are being merged to form a wider load.
The patch simplifies some of the patterns as below

1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)

The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc.

Differential Revision: https://reviews.llvm.org/D127392
2022-09-23 10:19:50 +01:00
Teresa Johnson 794b7ea960 Revert "[MemProf] Memprof profile matching and annotation"
This reverts commit a212d8da94, and follow
on fixes 0cd6763fa9,
e9ff53d42f, and
37c6a25e9a.

After re-reading the documentation for hash_combine, I don't think this
is the appropriate hash function to use for computing the hash to use as
a stack id in the metadata, since it is not guaranteed to produce stable
values across executions. I have not hit this problem, but plan to
switch to using an MD5 hash. I am hitting an issue with one of the bots
(https://lab.llvm.org/buildbot/#/builders/171/builds/20732)
where the values produced are only the lower 32 bits of the expected
hash values, however, which I assume is related to the implementation of
hash_combine and hash_code.

I believe I fixed all of the other bot failures with the follow on fixes,
which I'll merge into the new version before reapplying.
2022-09-22 16:08:03 -07:00
Teresa Johnson e9ff53d42f [MemProf] Fix buildbot error due to unused variable from bad merge
Fix an unused variable warning introduced by a212d8da94
due to a bad merge with a recent change. E.g. in
https://lab.llvm.org/buildbot/#/builders/77/builds/22095
2022-09-22 13:24:33 -07:00
Teresa Johnson a212d8da94 [MemProf] Memprof profile matching and annotation
Profile matching and IR annotation for memprof profiles.

See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*

* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceeding patch adding the basic
metadata and verification support.

The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.

The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.

Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
alloction type. This is the same attributed that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].

Depends on D128141 and D128854.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
[2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
[3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165
[4] ab87cf382d

Differential Revision: https://reviews.llvm.org/D128142
2022-09-22 12:48:31 -07:00
Leonard Chan 21b03bf970 [llvm] Handle dso_local_equivalent in FunctionComparator
This addresses https://github.com/llvm/llvm-project/issues/51066.

Prior to this, dso_local_equivalent would lead to an llvm_unreachable in
a switch in the FunctionComparator. This adds a conservative case in
that switch that just compares the underlying functions.

Differential Revision: https://reviews.llvm.org/D134300
2022-09-22 18:42:31 +00:00
Philip Reames 32dc1151e2 [VPlan] Only generate single instr for unpredicated stores of varying value to invariant address
This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.)

This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.)

Differential Revision: https://reviews.llvm.org/D133580
2022-09-22 08:53:46 -07:00
Nikita Popov 8df376db72 [InstCombine] Remove buggy zext of icmp eq with pow2 fold (PR57899)
For the case where the constant is a power of two rather than zero,
the fold is incorrect, because it fails to check that the bit set
in the LHS matches the bit in the RHS.

Rather than fixing this, remove the power of two handling entirely,
as a different fold will already canonicalize such comparisons to
use a zero constant.

Fixes https://github.com/llvm/llvm-project/issues/57899.
2022-09-22 16:37:10 +02:00
Nikita Popov c2e76f914c [InstCombine] Use simplifyWithOpReplaced() for non-bool selects
Perform the simplifyWithOpReplaced() fold even for non-bool
selects. This subsumes a number of recently added folds for
zext/sext of the condition.

We still need to manually handle variations with both sext/zext
and not, because simplifyWithOpReplaced() only performs one
level of replacements.
2022-09-22 15:46:00 +02:00
Nikita Popov 41dde5d858 [InstSimplify] Support vectors in simplifyWithOpReplaced()
We can handle vectors inside simplifyWithOpReplaced(), as long as
cross-lane operations are excluded. The equality can hold (or not
hold) for each vector lane independently, so we shouldn't use the
replacement value from other lanes.

I believe the only operations relevant here are shufflevector (where
all previous bugs were seen) and calls (which might use shuffle-like
intrinsics and would require more careful classification).

Differential Revision: https://reviews.llvm.org/D134348
2022-09-22 10:45:42 +02:00
Congzhe Cao 22c91df52c [LoopInterchange][PR57148] Ensure the correct form of IR after transformation
This is a bugfix patch that resolves the following two bugs in loop interchange:

1. PR57148 which is an assertion error due to of loss of LCSSA form after interchange,
   as referred to test1() in pr57148.ll.
2. Use before def for the outermost loop induction variables after interchange,
   as referred to test2() in pr57148.ll.

The fix in this patch is that:

1. In cases where the LCSSA form is not maintained after interchange, we update the IR
   to the LCSSA form again.
2. We split the phi nodes in the inner loop header into a separate basic block to avoid
   the situation where use of the outer indvar appears before its def after interchange.
   Previously we already did this for innermost loops, now we do it for non-innermost
   loops (e.g., middle loops) as well.

Reviewed By: bmahjour, Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D132055
2022-09-22 00:20:53 -04:00
Congzhe Cao 6782d71680 [LoopPassManager] Ensure to construct loop nests with the outermost loop
This patch is to resolve the bug reported and discussed in
https://reviews.llvm.org/D124926#3718761 and https://reviews.llvm.org/D124926#3719876.

The problem is that loop interchange is a loopnest pass under the new pass manager,
but the loop nest may not be constructed correctly by the loop pass manager after
running loop interchange and before running the next pass, which might cause problems
when it continues running the next pass.

The reason that the loop nest is constructed incorrectly is that the outermost
loop might have changed after interchange, and what was the original outermost
loop is not the current outermost loop anymore. Constructing the loop nest based
on the original outermost loop would generate an invalid loop nest.

The fix in this patch is that, in the loop pass manager before running each loopnest
pass, we re-cosntruct the loop nest based on the current outermost loop, if LPMUpdater
notifies the loop pass manager that the previous loop nest has been invalidated by passes
like loop interchange.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D132199
2022-09-21 23:59:26 -04:00
Vitaly Buka ba39a6e14a [msan] Instrument vtest instrinsics
Instrumentation just ORs shadow of inputs.
I assume some result shadow bits can be reset if we go into specifics of particular checks,
but as-is it is still an improvement against existing default strict instruction handler, when
every set bit of input shadow is  reported as an error.

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D134123
2022-09-21 16:57:44 -07:00
Vitaly Buka 6fd959d625 [msan] Handle x86_avx_cmp_pd_256 and x86_avx_cmp_ps_256
Removed FIXME which looks irrelevant. The error message happens only without -mattr=+avx.
E.g.
GOOD: opt llvm/test/Instrumentation/MemorySanitizer/avx-intrinsics-x86.ll -passes=msan -o - | llc -O3 -o /dev/null -mattr=+avx
BAD: opt llvm/test/Instrumentation/MemorySanitizer/avx-intrinsics-x86.ll -passes=msan -o - | llc -O3 -o /dev/null

So nothing to fix here.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D134119
2022-09-21 15:17:02 -07:00
Sanjay Patel ee0bf64722 [InstCombine] try to fold mul by neg-power-of-2 to shl
`(A * -2**C) + B --> B - (A << C)`

https://alive2.llvm.org/ce/z/A6BWkf

This inverts what Negator was doing before:
D134310 / 0f32a5dea0

Analysis and codegen are generally better without multiply,
so we should favor this form even if we trade add for sub
(because those are generally equivalent cost operations).
2022-09-21 15:09:39 -04:00
Sanjay Patel 64d309131a [InstCombine] try multi-use demanded bits fold for 'sub'
This is similar to D133788 / 73919a87e9, but for sub
the transform is valid only for low zeros in operand 1.

https://alive2.llvm.org/ce/z/EmRsXC
2022-09-21 14:13:05 -04:00
Konstantina 80d3ed6fb1 [NFC][NewGVN] Remove OpIsSafeForPHIOfOpsHelper()
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D130949
2022-09-21 09:25:59 -07:00
Alexey Bataev e664dea182 [SLP]Fix write-after-bounds.
Mask might be larger than the NumElts-OffsetBeg, need to use actual
indices to avoid acces out of bounds.
2022-09-21 08:00:15 -07:00
Matt Arsenault 0d1f040749 LICM: Pass through AssumptionCache 2022-09-21 09:16:17 -04:00
Sanjay Patel 0f32a5dea0 [InstCombine] don't canonicalize shl+sub to mul+add
This stops Negator from transforming:
`C1 - shl X, C2 --> mul X, (1<<C2) + C1`
...in the general case. There does not seem to be any analysis
benefit to using mul in IR, and there's definitely downside in
codegen (particularly when the multiply has to be expanded).

If `C1` is 0, then there's a stronger argument that the single
mul is a better canonicalization than negate-of-shl, but we may
want to remove that too.

This was noted as a potential conflict for D133667.

Differential Revision: https://reviews.llvm.org/D134310
2022-09-21 08:39:07 -04:00
Bjorn Pettersson 3f08d248c4 [SROA] Check typeSizeEqualsStoreSize in isVectorPromotionViable
Commit de3445e0ef (https://reviews.llvm.org/D132096) made
changes to isVectorPromotionViable basically doing

  // Create Vector with size of V, and each element of type Ty
  ...
  uint64_t ElementSize = DL.getTypeStoreSizeInBits(Ty).getFixedSize();
  uint64_t VectorSize = DL.getTypeSizeInBits(V).getFixedSize();
  ...
  VectorType *VTy = VectorType::get(Ty, VectorSize / ElementSize, false);

Not quite sure why it uses the TypeStoreSize for the ElementSize,
but the new vector would only match in size with the old vector in
situations when the TypeStoreSize equals the TypeSize for Ty.
Therefore this patch adds a typeSizeEqualsStoreSize check as yet
another condition for allowing the the new type as a promotion
candidate.

Without this fix the new @test15 test would fail with an assert
like this:

opt: ../lib/Transforms/Scalar/SROA.cpp:1966:
  auto isVectorPromotionViable(llvm::sroa::Partition &,
                               const llvm::DataLayout &)
       ::(anonymous class)::operator()(llvm::VectorType *,
                                       llvm::VectorType *) const:
  Assertion `DL.getTypeSizeInBits(RHSTy).getFixedSize() ==
             DL.getTypeSizeInBits(LHSTy).getFixedSize() &&
             "Cannot have vector types of different sizes!"' failed.
 ...
 #8  isVectorPromotionViable(...)::$_10::operator()...
 #9  llvm::SROAPass::rewritePartition(...)
#10  llvm::SROAPass::splitAlloca(...)
#11  llvm::SROAPass::runOnAlloca(...)
#12  llvm::SROAPass::runImpl(...)
#13  llvm::SROAPass::run(...)

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D134032
2022-09-21 09:45:05 +02:00
Michael Berg 897a79f970 [DSE] Add value type info checks for masked store candidates in Dead Store Elimination.
The type information of the store values can diverge when checking for valid
mask store candidates to eliminate via DSE. This patch checks for equivalence
wrt to size and element count.

Reviewed By: fhahn, rui.zhang

Differential Revision: https://reviews.llvm.org/D132700
2022-09-20 15:54:25 -07:00
Markus Böck b751da43b2 [InstCombine] Handle integer extension in `select` patterns using the condition as value
These patterns were previously only implemented for i1 type but can be extended for any integer type by also handling zext and sext operands.

Differential Revision: https://reviews.llvm.org/D134142
2022-09-20 22:25:13 +02:00
Zain Jaffal 68cc35d52c
[InstCombine] Matrix multiplication negation optimisation
If one of the operands in a matrix multiplication is negated we can optimise the equation by moving the negation to the smallest element of the operands or the result.

Reviewed By: spatel, fhahn

Differential Revision: https://reviews.llvm.org/D133300
2022-09-20 19:50:39 +01:00
Gulfem Savrun Yeniceri f039a9fa32 [InstrProfiling] Emit runtime hook only once
This patch fixes the issue about calling emitRuntimeHook() twice
when we need to unconditionally emit runtime hook as discussed in
https://reviews.llvm.org/rGd6aed77f0d19.

Differential Revision: https://reviews.llvm.org/D134254
2022-09-20 17:00:46 +00:00
Kazu Hirata 00874c48ea [IPO] Reorder parameters of InlineFunction (NFC)
With the recent addition of new parameter MergeAttributes (D134117),
callers need to specify several default parameters before getting to
specify the new parameter.

This patch reorders the parameters so that callers do not have to
specify as many default parameters.

Differential Revision: https://reviews.llvm.org/D134125
2022-09-20 09:09:38 -07:00
Simon Pilgrim 09cb9fdef9 [InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635)
Alive2: https://alive2.llvm.org/ce/z/sZ6wwS

As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero.

Differential Revision: https://reviews.llvm.org/D134172
2022-09-20 16:44:41 +01:00
Florian Hahn dcbc8a0daa
[LV] Remove unused widenCallInstruction declaration (NFC).
The definition and uses have been removed a while ago. Clean up the
unused declaration.
2022-09-20 15:20:28 +01:00
Djordje Todorovic f0f8b46863 Recommit "[AggressiveInstCombine] Lower Table Based CTTZ
The bug reported on the [0] has been fixed.
The issue was we have not checked if the global variables that
represent cttz tables was constant.
There is a new negative test added in negative-lower-table-based-cttz.ll
that represents this.

[0] https://reviews.llvm.org/rGdf868edee561eb973edd85ec9df41c67aa0bff6b
2022-09-20 13:12:47 +02:00
Dmitri Gribenko 5d7ff0d87c Fix an unused warning in release build 2022-09-20 11:29:39 +02:00