Commit Graph

31127 Commits

Author SHA1 Message Date
Nikita Popov bdba8278d9 [VectorCombine] Avoid ConstantExpr::get() (NFC)
Use IRBuilder APIs instead, which will still constant fold.
2022-06-29 17:17:52 +02:00
Nikita Popov 2124b2f0e6 [JumpThreading] Avoid ConstantExpr::get() (NFCI)
This code requires the result to be an UndefValue/ConstantInt
anyway (checked by getKnownConstant), so we are only interested
in the case where this folds.
2022-06-29 16:43:05 +02:00
Nikita Popov df698a5762 [InstCombine] Avoid some calls to ConstantExpr::get() (NFCI)
Replace some calls to ConstantExpr::get() with IRBuilder APIs
(which will also constant fold if possible).
2022-06-29 16:26:02 +02:00
Nikita Popov 0af53fcb99 [SROA] Don't create constant expressions (NFC)
Use IRBuilder instead, which will fold these. Just to clarify
that this does not actually create any udiv expression.
2022-06-29 11:51:22 +02:00
Pavel Samolysov 3d9ce9e43d [ArgPromotion] Remove all the getters and ReplaceCallSite (NFC)
AARGetter is an abstraction over a source of the `AAResults` introduced
to support the legacy pass manager as well as the modern one. Since the
Argument Promotion pass doesn't support the legacy pass manager anymore,
the abstraction is not required and `AAResults` may be used directly.

The instance of the `FunctionAnalysisManager` is passed through the
functions to get all the required analyses just wherever they are
required and do not use the awkward getter callbacks.

The `ReplaceCallSite` parameter was required for the legacy pass manager
only and isn't used anymore, so the parameter has been eliminated.

Differential Revision: https://reviews.llvm.org/D128727
2022-06-29 10:45:11 +03:00
Pavel Samolysov 8958057fb1 [ArgPromotion] Move isDenselyPacked static member (NFC)
The `isDenselyPacked` static member of the `ArgumentPromotionPass` class
is not used in the class itself anymore. The single known user of the
function is in the `AttributorAttributes.cpp` file, so the function has
been moved into the file.

Differential Revision: https://reviews.llvm.org/D128725
2022-06-29 10:45:10 +03:00
Martin Sebor 8827679826 [InstCombine] Fold strncmp of constant arrays and variable size
Extend the solution accepted in D127766 to strncmp and simplify
strncmp(A, B, N) calls with constant A and B and variable N to
the equivalent of

  N <= Pos ? 0 : (A < B ? -1 : B < A ? +1 : 0)

where Pos is the offset of either the first mismatch between A
and B or the terminating null character if both A and B are equal
strings.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D128089
2022-06-28 15:59:14 -06:00
Martin Sebor e263a7670e [InstCombine] Look through more casts when folding memchr and memcmp
Enhance getConstantDataArrayInfo to let the memchr and memcmp library
call folders look through arbitrarily long sequences of bitcast and
GEP instructions.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128364
2022-06-28 15:58:42 -06:00
Alexey Bataev bf4dcbd2df [SLP]Fix PR56251: Do not remove the reordering from the root node, being used as an operand.
If the root order itself does not require reordering, we can just
remove its reorder mask safely (e.g., if the root node is a vector of
phis). But if this node is used as an operand in the graph, we cannot
delete the reordering, need to keep it. Otherwise the graph nodes are
not synchronized with the operands. It may cause an extra gather
instruction(s) or a compiler crash.
Also, need to be very careful when selecting the gather nodes for
reordering since there might several gather nodes with the same scalars
and we can try to reorder just the same node many times instead of
different nodes.

Differential Revision: https://reviews.llvm.org/D128680
2022-06-28 13:42:05 -07:00
Leonard Chan 9553d69580 [NFC][HWASan] Refactor hwasan pass
This moves some code for getting PC and SP into their own functions. Since SP
is also retrieved in the prologue and getting the stack tag, we can cache the
SP if we get it once in the prologue. This caching will really only be relevant
in D128387 where StackBaseTag may not be set in the prologue if __hwasan_tls
is not used.

Differential Revision: https://reviews.llvm.org/D128551
2022-06-28 12:09:20 -07:00
Pavel Samolysov 170c4d21bd [ArgPromotion] Unify byval promotion with non-byval
It makes sense to handle byval promotion in the same way as non-byval
but also allowing `store` instructions. However, these should
use the same checks as the `load` instructions do, i.e. be part of the
`ArgsToPromote` collection. For these instructions, the check for
interfering modifications can be disabled, though. The promotion
algorithm itself has been modified a lot: all the accesses (i.e. loads
and stores) are rewritten to the emitted `alloca` instructions. To
optimize these new `alloca`s out, the `PromoteMemToReg` function from
`Transforms/Utils/PromoteMemoryToRegister.cpp` file is invoked after
promotion.

In order to let the `PromoteMemToReg` promote as many `alloca`s as it
is possible, there should be no `GEP`s from the `alloca`s. To
eliminate the `GEP`s, its own `alloca` is generated for every argument
part because a single `alloca` for the whole argument (that
significantly simplifies the code of the pass though) unfortunately
cannot be used.

The idea comes from the following discussion:
https://reviews.llvm.org/D124514#3479676

Differential Revision: https://reviews.llvm.org/D125485
2022-06-28 15:19:58 +03:00
Mikhail Goncharov c6c124ca80 Fixed unused variable warning. 2022-06-28 11:44:16 +02:00
Florian Hahn 03975b7f0e
[VPlan] Move recipe implementations to separate file (NFC).
This patch moves the code for recipe implementations to a separate file.

The benefits are:
 * Keep VPlan.cpp smaller => faster compile-time during parallel builds.
 * Keep code for logical units together

As a follow-up I am also planning on moving all ::execute
implemetnations from LoopVectorize.cpp over to the new file, which
should help to reduce the size of the file a bit.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127965
2022-06-28 10:34:30 +01:00
Nikita Popov 5548e807b5 [IR] Remove support for extractvalue constant expression
This removes the extractvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
extractvalue is already not supported in bitcode, so we do not need
to worry about bitcode auto-upgrade.

Uses of ConstantExpr::getExtractValue() should be replaced with
IRBuilder::CreateExtractValue() (if the fact that the result is
constant is not important) or ConstantFoldExtractValueInstruction()
(if it is). Though for this particular case, it is also possible
and usually preferable to use getAggregateElement() instead.

The C API function LLVMConstExtractValue() is removed, as the
underlying constant expression no longer exists. Instead,
LLVMBuildExtractValue() should be used (which will constant fold
or create an instruction). Depending on the use-case,
LLVMGetAggregateElement() may also be used instead.

Differential Revision: https://reviews.llvm.org/D125795
2022-06-28 10:40:17 +02:00
Guillaume Chatelet 3c126d5fe4 [Alignment] Replace commonAlignment with std::min
`commonAlignment` is a shortcut to pick the smallest of two `Align`
objects. As-is it doesn't bring much value compared to `std::min`.

Differential Revision: https://reviews.llvm.org/D128345
2022-06-28 07:15:02 +00:00
wlei 7e86b13c63 [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie
This is the followup patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner.

One structure needs to be specially treated is the `FuncToCtxtProfiles`, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map `ProfileToNodeMap` for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node.

Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic.

Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127031
2022-06-27 23:22:21 -07:00
wlei aa58b7b1e3 [CSSPGO][llvm-profgen] Reimplement computeSummaryAndThreshold using context trie
Follow-up patch to https://reviews.llvm.org/D125246, support `computeSummaryAndThreshold` based on context trie.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127026
2022-06-27 23:22:21 -07:00
Congzhe Cao b941857b40 [LoopInterchange] New cost model for loop interchange
This is another attempt to land this patch.

The patch proposed to use a new cost model for loop interchange,
which is obtained from loop cache analysis.

Given a loopnest, what loop cache analysis returns is a vector of
loops [loop0, loop1, loop2, ...] where loop0 should be replaced as
the outermost loop, loop1 should be placed one more level inside, and
loop2 one more level inside, etc. What loop cache analysis does is not
only more comprehensive than the current cost model, it is also a "one-shot"
query which means that we only need to query it once during the entire
loop interchange pass, which is better than the current cost model where
we query it every time we check whether it is profitable to interchange
two loops. Thus complexity is reduced, especially after D120386 where we
do more interchanges to get the globally optimal loop access pattern.

Updates made to test cases are mostly minor changes and some
corrections. One change that applies to all tests is that we added an option
`-cache-line-size=64` to the RUN lines. This is ensure that loop
cache analysis receives a valid number of cache line size for correct
analysis. Test coverage for loop interchange is not reduced.

Currently we did not completely remove the legacy cost model, but
keep it as fall-back in case the new cost model did not run successfully.
This is because currently we have some limitations in delinearization, which
sometimes makes loop cache analysis bail out. The longer term goal is to
enhance delinearization and eventually remove the legacy cost model
compeletely.

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D124926
2022-06-28 00:08:37 -04:00
Vitaly Buka 6824eee942 [asan] Add missing dependency on Demangle
Follow up to D127911.
2022-06-27 15:10:02 -07:00
Mitch Phillips dacfa24f75 Delete 'llvm.asan.globals' for global metadata.
Now that we have the sanitizer metadata that is actually on the global
variable, and now that we use debuginfo in order to do symbolization of
globals, we can delete the 'llvm.asan.globals' IR synthesis.

This patch deletes the 'location' part of the __asan_global that's
embedded in the binary as well, because it's unnecessary. This saves
about ~1.7% of the optimised non-debug with-asserts clang binary.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D127911
2022-06-27 14:40:40 -07:00
Philip Reames 20dd3297b1 [LV] Allow scalable vectorization with vscale = 1
This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling.

(I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers. e.g. RISCV LMULs.)

This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable.

This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable to fixed length vector or even scalar registers. If that ever happens, we'd need to revisit this code.

In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.

Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1.

Differential Revision: https://reviews.llvm.org/D128542
2022-06-27 13:38:57 -07:00
Yuanfang Chen e2e9e708e5 [Coroutine] Remove the '!func_sanitize' metadata for split functions
There is no proper RTTI for these split functions. So just delete the
metadata.

Fixes https://github.com/llvm/llvm-project/issues/49689.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D116130
2022-06-27 12:09:13 -07:00
Yuanfang Chen 6678f8e505 [ubsan] Using metadata instead of prologue data for function sanitizer
Information in the function `Prologue Data` is intentionally opaque.
When a function with `Prologue Data` is duplicated. The self (global
value) references inside `Prologue Data` is still pointing to the
original function. This may cause errors like `fatal error: error in backend: Cannot represent a difference across sections`.

This patch detaches the information from function `Prologue Data`
and attaches it to a function metadata node.

This and D116130 fix https://github.com/llvm/llvm-project/issues/49689.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D115844
2022-06-27 12:09:13 -07:00
Joseph Huber c7243f21d3 [OpenMP] Only strip runtime attributes if needed
Summary:
Currently in OpenMPOpt we strip `noinline` attributes from runtime
functions. This is here because the device bitcode library that we link
has problems with needed definitions getting prematurely optimized out.
This is only necessary for OpenMP offloading to GPUs so we should narrow
the scope for where we spend time doing this. In the future this
shouldn't be necessary as we move to using a linked library rather than
pulling in a bitcode library in Clang.
2022-06-27 13:35:41 -04:00
Nikita Popov f65c88c42f [GlobalOpt] Fix memset handling in global ctor evaluation (PR55859)
The global ctor evaluator currently handles by checking whether the
memset memory is already zero, and skips it in that case. However,
it only actually checks the first byte of the memory being set.

This patch extends the code to check all bytes being set. This is
done byte-by-byte to avoid converting undef values to zeros in
larger reads. However, the handling is still not completely correct,
because there might still be padding bytes (though probably this
doesn't matter much in practice, as I'd expect global variable
padding to be zero-initialized in practice).

Mostly fixes https://github.com/llvm/llvm-project/issues/55859.

Differential Revision: https://reviews.llvm.org/D128532
2022-06-27 16:50:49 +02:00
Bradley Smith a83aa33d1b [IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976
2022-06-27 10:48:45 +00:00
Nikita Popov cde402778a [FunctionAttrs] Add missing pass dependency
This pass depends on AAResults. This fixes the ocaml IPO binding
tests.
2022-06-27 10:15:06 +02:00
Nikita Popov 217e85761c [ArgPromotion] Remove legacy PM support
Support for the legacy pass manager in ArgPromotion causes
complications in D125485. As the legacy pass manager for middle-end
optimizations is unsupported, drop ArgPromotion from the legacy
pipeline, rather than introducing additional complexity to deal
with it.

Differential Revision: https://reviews.llvm.org/D128536
2022-06-27 09:42:17 +02:00
Chuanqi Xu 24e53b01d5 Revert "[Coroutines] Only do symmetric transfer if optimization is on"
This reverts commit 7782e080e8. According
to the discussion of WG21, symmetric transfer is a desired feature.
2022-06-27 10:54:56 +08:00
Kazu Hirata d08f34b592 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-26 18:31:51 -07:00
Kazu Hirata a81b64a1fb [llvm] Use Optional::has_value instead of Optional::hasValue (NFC)
This patch replaces x.hasValue() with x.has_value() where x is not
contextually convertible to bool.
2022-06-26 16:10:42 -07:00
Nuno Lopes 6ef9a2ad01 [LICM] Use poison to replace unreachable values instead of undef [NFC] 2022-06-26 14:56:35 +01:00
Nuno Lopes 3fa2411dc5 [LoopSimplifyCFG] use poison when replacing dead instructions instead of undef [NFC] 2022-06-26 14:15:55 +01:00
Nuno Lopes d46fa1fc58 [ArgumentPromotion] use poison when replacing dead instructions instead of undef [NFC] 2022-06-26 13:44:05 +01:00
Kazu Hirata a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Pavel Samolysov 6e3d4712b9 [DeadArgElim] Replace insert with emplace (NFC) 2022-06-25 10:31:27 +03:00
Mitch Phillips f57066401e [HWASan] Use new IR attribute for communicating unsanitized globals.
Globals that shouldn't be sanitized are currently communicated to HWASan
through the use of the llvm.asan.globals IR metadata. Now that we have
an on-GV attribute, use it.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D127543
2022-06-24 12:04:11 -07:00
Mingming Liu e0d069598b [Inline] Annotate inline pass name with link phase information for analysis.
The annotation is flag gated; flag is turned off by default.

Differential Revision: https://reviews.llvm.org/D125495
2022-06-24 10:06:43 -07:00
Alexey Bataev 2faacf61a5 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-24 09:28:01 -07:00
Arthur Eubanks e422c0d3b2 [GlobalOpt] Perform store->dominated load forwarding for stored once globals
The initial land incorrectly optimized forwarding non-Constants in non-nosync/norecurse functions. Bail on non-Constants since norecurse should cause global -> alloca promotion anyway.

The initial land also incorrectly assumed that StoredOnceStore was the only store to the global, but it actually means that only one value other than the global initializer is stored. Add a check that there's only one store.

Compile time tracker:
https://llvm-compile-time-tracker.com/compare.php?from=c80b88ee29f34078d2149de94e27600093e6c7c0&to=ef2c2b7772424b6861a75e794f3c31b45167304a&stat=instructions

Reviewed By: nikic, asbirlea, jdoerfert

Differential Revision: https://reviews.llvm.org/D128128
2022-06-24 09:09:26 -07:00
Florian Hahn cb69ba4faa
[LV] Create RT checks once VF/IC are selected, track scalar cost.
This patch updates LV to generate runtime after the VF & IC are selected. It
allows deciding whether to vectorize with runtime checks or not based on
their cost compared to the vector loop.

It also updates VectorizationFactor to include the scalar cost.

Reviewed By: lebedev.ri, dmgreen

Differential Revision: https://reviews.llvm.org/D75981
2022-06-24 17:42:11 +02:00
Nikita Popov 871197d0a3 [MemoryBuiltins] Accept any value in getInitialValueOfAllocation() (NFC)
Drop the requirement that getInitialValueOfAllocation() must be
passed an allocator function, shifting the responsibility for
checking that into the function (which it does anyway). The
motivation is to avoid some calls to isAllocationFn(), which has
somewhat ill-defined semantics (given the number of
allocator-related attributes we have floating around...)

(For this function, all we eventually need is an allockind of
zeroed or uninitialized.)

Differential Revision: https://reviews.llvm.org/D127274
2022-06-24 16:08:07 +02:00
Nikita Popov e523baa664 [InlineFunction] Slightly clarify noalias scope calculation (NFC)
Rename CanDeriveViaCapture -> RequiresNoCaptureBefore, drop
unnecessary const cast, reformat some code avoid an ugly
super-indented comment.
2022-06-24 12:31:46 +02:00
Florian Hahn b18141a8f2
[VPlan] Set VFs included in plan before last set of VPTransforms (NFC).
This allows VPlanTransforms to query the VFs included in the plan in the
future.
2022-06-24 10:16:56 +02:00
Florian Hahn 92f87787b3
Recommit "[ConstraintElimination] Transfer info from ULT to signed system."
This reverts commit 94ed2caf70.

The issue with no-determinism with the test has been fixed in
d9526e8a52.
2022-06-24 09:27:14 +02:00
Evgenii Stepanov 878309cc54 Revert "[LoopInterchange] New cost model for loop interchange"
llvm/lib/Analysis/LoopCacheAnalysis.cpp:702:30: runtime error: signed
integer overflow: 6148914691236517209 * 100 cannot be represented in
type 'long'

https://lab.llvm.org/buildbot/#/builders/5/builds/25185

This reverts commit 1b24fe34b0.
2022-06-23 16:10:53 -07:00
Congzhe Cao 1b24fe34b0 [LoopInterchange] New cost model for loop interchange
This is the second attempt to land this patch.

The patch proposed to use a new cost model for loop interchange,
which is obtained from loop cache analysis.

Given a loopnest, what loop cache analysis returns is a vector of
loops [loop0, loop1, loop2, ...] where loop0 should be replaced as the
outermost loop, loop1 should be placed one more level inside, and loop2
one more level inside, etc. What loop cache analysis does is not only more
comprehensive than the current cost model, it is also a "one-shot" query
which means that we only need to query it once during the entire loop
interchange pass, which is better than the current cost model where we
query it every time we check whether it is profitable to interchange two
loops. Thus complexity is reduced, especially after D120386 where we do
more interchanges to get the globally optimal loop access pattern.

Updates made to test cases are mostly minor changes and some corrections.
One change that applies to all tests is that we added an option
`-cache-line-size=64` to the RUN lines. This is ensure that loop cache
analysis receives a valid number of cache line size for correct analysis.
Test coverage for loop interchange is not reduced.

Currently we did not completely remove the legacy cost model, but keep it
as fall-back in case the new cost model did not run successfully. This is
because currently we have some limitations in delinearization, which sometimes
makes loop cache analysis bail out. The longer term goal is to enhance
delinearization and eventually remove the legacy cost model compeletely.

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D124926
2022-06-23 16:34:57 -04:00
Philip Reames 46ea4b5ea1 [LV] Avoid a crash when costing a uniform store which doesn't correspond to a legal scatter
If we have an unaligned uniform store, then when costing a scalable VF we can't emit code to scalarize it.  (Well, we could, but we haven't implemented that case.)  This change replaces an assert with a cost-model bailout such that we reject vectorization with the scalable VF instead of crashing.
2022-06-23 12:41:09 -07:00
Alexey Bataev 3b6edef15d [SLP]Fix a crash when reorder masked gather nodes with reused scalars.
If the masked gather nodes must be reordered, we can just reorder
scalars, just like for gather nodes. But if the node contains reused
scalars, it must be handled same way as a regular vectorizable node,
since need to reorder reused mask, not the scalars directly.

Differential Revision: https://reviews.llvm.org/D128360
2022-06-23 11:32:30 -07:00
Florian Hahn d9526e8a52
[ConstraintElimination] Use stable_sort to sort worklist.
If there are multiple constraints in the same block, at the moment the
order they are processed may be different depending on the sort
implementation.

Use stable_sort to ensure consistent ordering.
2022-06-23 19:22:15 +02:00
Florian Hahn 94ed2caf70
Revert "[ConstraintElimination] Transfer info from ULT to signed system."
This reverts commit 316e106f49.

This breaks a bot with expensive checks.
2022-06-23 17:27:33 +02:00
Florian Hahn 316e106f49
[ConstraintElimination] Transfer info from ULT to signed system.
If A u< B holds, then A s>= 0 && A s< B holds if B s>= 0.

https://alive2.llvm.org/ce/z/RrNxHh
2022-06-23 17:17:01 +02:00
Florian Hahn 9a33f3975e
[ConstraintElimination] Transfer info from SLT to unsigned system.
If A s< B holds, then A u< also holds, if A s>= 0.

https://alive2.llvm.org/ce/z/J4JZuN
2022-06-23 15:57:59 +02:00
chenglin.bi 30e49a3794 [InstCombine] Optimise shift+and+boolean conversion pattern to simple comparison
if (`C1` is pow2) & (`(C2 & ~(C1-1)) + C1)` is pow2):
    ((C1 << X) & C2) == 0 -> X >= (Log2(C2+C1) - Log2(C1));
https://alive2.llvm.org/ce/z/EJAl1R
    ((C1 << X) & C2) != 0 -> X  < (Log2(C2+C1) - Log2(C1));
https://alive2.llvm.org/ce/z/3bVRVz

And remove dead code.

Fix: https://github.com/llvm/llvm-project/issues/56124

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126591
2022-06-23 21:53:07 +08:00
Florian Hahn 569d84fe99
[VPlan] Remove dead recipes across whole plan.
This extends removeDeadRecipe to remove recipes across the whole plan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127580
2022-06-23 13:36:02 +02:00
Florian Hahn 24a98881cd
[ConstraintElimination] Transfer info from SGT to unsigned system.
If A >s B then A >=u 0, if B >=s -1.

https://alive2.llvm.org/ce/z/cncGKi
2022-06-23 11:04:51 +02:00
Fangrui Song 1ffd2d99c2 Revert D115462 "[SLP]Improve shuffles cost estimation where possible."
This reverts commit cac60940b7.

Caused -Os -fsanitize=memory -march=haswell miscompile to pytorch/cpuinfo.
See my latest comment (may update) on D115462.
2022-06-22 23:16:31 -07:00
Fangrui Song a411bc11d6 Revert "[SLP]Fix a crash when insert subvector is out of range."
This reverts commit f1ee2738b3.

Revert due to the revert of a dependent commit `[SLP]Improve shuffles cost estimation where possible.`
2022-06-22 23:16:25 -07:00
Serguei Katkov 5e1ccdf960 [RS4GC] Handle freeze case for vector
Finding BDV for vector value does not handle freeze instruction.
Adding its handling as it is done for scalar case.

Reviewed By: apilipenko
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D128254
2022-06-23 11:58:41 +07:00
Mingming Liu bc856eb3fc [SampleProfile][Inline] Annotate sample profile inline remarks with link phase (prelink/postlink) information.
Differential Revision: https://reviews.llvm.org/D126833
2022-06-22 17:00:53 -07:00
Florian Mayer 9320a32bb9 [MTE] [HWASan] Use LoopInfo for reachability queries.
The reachability queries default to "reachable" after exploring too many
basic blocks. LoopInfo helps it skip over the whole loop.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D127917
2022-06-22 15:28:49 -07:00
Adrian Tong 4e555a3df4 Fix a misspell. NFC 2022-06-22 21:23:21 +00:00
Brendon Cahoon f1b05a0a2b [StructurizeCFG] Improve basic block ordering
StructurizeCFG linearizes the successors of branching basic block
by adding Flow blocks to record the true/false path for branches
and back edges. This patch reduces the number of Phi values needed
to capture the control flow path by improving the basic block
ordering.

Previously, StructurizeCFG adds loop exit blocks outside of the
loop. StructurizeCFG sets a boolean value to indicate the path
taken, and all exit block live values extend to after the loop.
For loops with a large number of exits blocks, this creates a
huge number of values that are maintained, which increases
compilation time and register pressure. This is problem
especially with ASAN, which adds early exits to blocks with
unreachable instructions for each instrumented check in the loop.

In specific cases, this patch reduces the number of values needed
after the loop by moving the exit block into the loop. This is
done for blocks that have a single predecessor and single successor
by moving the block to appear just after the predecessor.

Differential Revision: https://reviews.llvm.org/D123231
2022-06-22 16:10:41 -05:00
Brendon Cahoon e13248ab0e [UnifyLoopExits] Reduce number of guard blocks
UnifyLoopExits creates a single exit, a control flow hub, for
loops with multiple exits. There is an input to the block for
each loop exiting block and an output from the block for each
loop exit block. Multiple checks, or guard blocks, are needed
to branch to the correct exit block.

For large loops with lots of exit blocks, all the extra guard
blocks cause problems for StructurizeCFG and subsequent passes.
This patch reduces the number of guard blocks needed when the
exit blocks branch to a common block (e.g., an unreachable
block). The guard blocks are reduced by changing the inputs
and outputs of the control flow hub. The inputs are the exit
blocks and the outputs are the common block.

Reducing the guard blocks enables StructurizeCFG to reorder the
basic blocks in the CFG to reduce the values that exit a loop
with multiple exits. This reduces the compile-time of
StructurizeCFG and also reduces register pressure.

Differential Revision: https://reviews.llvm.org/D123230
2022-06-22 15:44:23 -05:00
Evgenii Stepanov 5011b4ca0e Revert "[Attributor] Ensure to use the proper liveness AA"
Reason: memory leaks

This reverts commit 083010312a.
2022-06-22 13:40:45 -07:00
Florian Mayer 476ced4b89 [MTE] [HWASan] Support diamond lifetimes.
We were overly conservative and required a ret statement to be dominated
completely be a single lifetime.end marker. This is quite restrictive
and leads to two problems:

* limits coverage of use-after-scope, as we degenerate to
  use-after-return;
* increases stack usage in programs, as we have to remove all lifetime
  markers if we degenerate to use-after-return, which prevents
  reuse of stack slots by the stack coloring algorithm.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D127905
2022-06-22 11:16:34 -07:00
Florian Mayer acc9721e38 [NFC] [HWASan] Remove indirection for getting analyses.
This was necessary for code reuse between the old and new passmanager.
With the old pass-manager gone, this is no longer necessary.

Reviewed By: eugenis, myhsu

Differential Revision: https://reviews.llvm.org/D127913
2022-06-22 10:53:20 -07:00
Mingming Liu 67dc8021a1 [Support] Change TrackingStatistic and NoopStatistic to use uint64_t instead of unsigned.
Binary size of `clang` is trivial; namely, numerical value doesn't
change when measured in MiB, and `.data` section increases from 139Ki to
173 Ki.

Differential Revision: https://reviews.llvm.org/D128070
2022-06-22 10:11:40 -07:00
Max Kazantsev cff4f04e2e [LSR] Don't allow zero quotient as scale ref. PR56160
Scale reg should never be zero, so when the quotient is zero, we
cannot assign it there. Limit this transform to avoid this situation.

Differential Revision: https://reviews.llvm.org/D128339
Reviewed By: eopXD
2022-06-22 23:33:57 +07:00
Guillaume Chatelet 57ffff6db0 Revert "[NFC] Remove dead code"
This reverts commit 8ba2cbff70.
2022-06-22 14:55:47 +00:00
Guillaume Chatelet 8ba2cbff70 [NFC] Remove dead code 2022-06-22 13:33:58 +00:00
Florian Hahn 098b0b18a7
[ConstraintElimination] Transfer info from SGE to unsigned system.
This patch adds a new transferToOtherSystem helper that tries to
transfer information from signed predicates to the unsigned system and
vice versa.

The initial version adds A >=u B for A >=s B && B >=s 0

https://alive2.llvm.org/ce/z/8b6F9i
2022-06-22 15:27:59 +02:00
Nikita Popov 1f88d80408 [SCCP] Don't mark edges feasible when resolving undefs
As branch on undef is immediate undefined behavior, there is no need
to mark one of the edges as feasible. We can leave all the edges
non-feasible. In IPSCCP, we can replace the branch with an unreachable
terminator.

Differential Revision: https://reviews.llvm.org/D126962
2022-06-22 10:28:27 +02:00
Florian Hahn ac62b8f704
[ConstraintElimination] Update addFact to take Predicate and ops (NFC).
This allows adding facts without necessarily having a corresponding
CmpInst.
2022-06-22 08:36:41 +02:00
Pavel Samolysov f44bf3805a [DeadArgElim] Reformat the pass in accordance with the code style
The code has been reformatted in accordance with the code style. Some
function comments were extended to the Doxygen ones and reworded a bit
to eliminate the duplication of the function's/class' name in the
comment.

Differential Revision: https://reviews.llvm.org/D128168
2022-06-22 09:13:00 +03:00
chenglin.bi 810b5c471f [NewGVN] add context instruction for SimplifyQuery
NewGVN will find operator from other context. ValueTracking currently doesn't have a way to run completely without context instruction.
So it will use operator itself as conext instruction.
If the operator in another branch will never be executed but it has an assume, it may caused value tracking use the assume to do wrong simpilfy.

It would be better to make these simplification queries not use context at all, but that would require some API changes.
For now we just use the orignial instruction as context instruction to fix the issue.

Fix #56039

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D127942
2022-06-22 12:25:24 +08:00
Serguei Katkov 8f891b7c39 [LoopVectorize] Uninitialized phi node leads to a crash in SSAUpdater.
createInductionResumeValues creates a phi node placeholder
without filling incoming values. Then it generates the incoming values.

It includes triggering of SCEV expander which may invoke SSAUpdater.
SSAUpdater has an optimization to detect number of predecessors
basing on incoming values if there is phi node.
In case phi node is not filled with incoming values - the number of predecessors
is detected as 0 and this leads to segmentation fault.

In other words SSAUpdater expects that phi is in good shape while
LoopVectorizer breaks this requirement.

The fix is just prepare all incoming values first and then build a phi node.

Reviewed By: fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D128033
2022-06-22 10:49:27 +07:00
Johannes Doerfert b7cc3b10c5 [Attributor][FIX] Avoid empty bin in AAPointerInfo
This avoid creating empty bins in AAPointerInfo which can lead to
segfaults. Also ensure we do not try to translate from callee to caller
except if we really take the argument state and move it to the call site
argument state.

Fixes: https://github.com/llvm/llvm-project/issues/55726
2022-06-21 21:30:57 -05:00
Johannes Doerfert 083010312a [Attributor] Ensure to use the proper liveness AA
When determining liveness via Attributor::isAssumedDead(...) we might
end up without a liveness AA or with one pointing into another function.
Neither is helpful and we will avoid both from now on.

Reapplied after fixing the ASAN error which caused the revert:
db68a25ca9
2022-06-21 21:28:26 -05:00
Vasileios Porpodas 7a9ad25769 Recommit "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6d6268dcbf.

Review: https://reviews.llvm.org/D125712
2022-06-21 18:35:29 -07:00
Vasileios Porpodas 6d6268dcbf Revert "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6f88acf410.
2022-06-21 17:07:21 -07:00
Vasileios Porpodas 6f88acf410 [SLP][X86] Improve reordering to consider alternate instruction bundles
During the reordering transformation we should try to avoid reordering bundles
like fadd,fsub because this may block them being matched into a single vector
instruction in x86.
We do this by checking if a TreeEntry is such a pattern and adding it to the
list of TreeEntries with orders that need to be considered.

Differential Revision: https://reviews.llvm.org/D125712
2022-06-21 16:44:48 -07:00
Florian Hahn 88ce403c6a
[LV] Add new block to place recurrence splice, if needed.
In some cases, a recurrence splice instructions needs to be inserted
between to regions, for example if the regions get re-arranged during
sinking.

Fixes #56146.
2022-06-21 21:54:37 +02:00
Heejin Ahn 27e4afcea7 [DSE] Don't remove nounwind invokes
For non-mem-intrinsic and non-lifetime `CallBase`s, the current
`isRemovable` function only checks if the `CallBase` 1. has no uses 2.
will return 3. does not throw:
80fb782336/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp (L1017)

But we should also exclude invokes even in case they don't throw,
because they are terminators and thus cannot be removed. While it
doesn't seem to make much sense for `invoke`s to have an `nounwind`
target, this kind of code can be generated and is also valid bitcode.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128224
2022-06-21 11:54:09 -07:00
Martin Sebor b19194c032 [InstCombine] handle subobjects of constant aggregates
Remove the known limitation of the library function call folders to only
work with top-level arrays of characters (as per the TODO comment in
the code) and allows them to also fold calls involving subobjects of
constant aggregates such as member arrays.
2022-06-21 11:55:14 -06:00
Alexey Bataev d4ee43153d [SLP][NFC]Fix a warning in a comparison, NFC.
Fixed signedness warning.
2022-06-21 10:19:47 -07:00
serge-sans-paille aaf1630ac3 [Scalarizer] No need to gather a scattered extracted element
ExtractElement does not produce a vector out of a vector, so there's no need to
call a gather once done.

Fix #54469

Credits to npopov@redhat.com for the original approach.

Differential Revision: https://reviews.llvm.org/D126012
2022-06-21 18:43:54 +02:00
Arthur Eubanks b5db65e0da Reland [GlobalOpt] Preserve CFG analyses
The only place we modify the CFG is when calling
removeUnreachableBlocks(), so insert a callback there which invalidates
analyses for that function (or recomputes DT in the legacy PM).

We may delete functions, make sure to clear analyses for those
functions. (this was missed in the original revision)

Small compile time wins across the board:
https://llvm-compile-time-tracker.com/compare.php?from=f444ea8ce0aaaa5ec1a4129809389da15cc41396&to=698f41f4fc26cbf1006ed5d88e9d658edfc5b749&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128145
2022-06-21 09:19:59 -07:00
Alexey Bataev f1ee2738b3 [SLP]Fix a crash when insert subvector is out of range.
If the OffsetBeg + InsertVecSz is greater than VecSz, need to estimate
the cost as shuffle of 2 vector, not as insert of subvector. Otherwise,
the inserted subvector is out of range and compiler may crash.

Differential Revision: https://reviews.llvm.org/D128071
2022-06-21 07:16:35 -07:00
Florian Hahn 4ea6891f95
[ConstraintElimination] Remove unneeded StackEntry::Condition (NFC).
The field was only used for debug printing. Print constraint from the
system instead.
2022-06-21 15:57:29 +02:00
Florian Hahn 2a9313ee0b
[ConstraintElimination] Move logic to check condition to helper (NFC). 2022-06-21 11:50:33 +02:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata d66cbc565a Don't use Optional::hasValue (NFC) 2022-06-20 20:26:05 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Florian Hahn 6dd772d348
[ConstraintElimination] Move logic to get a constraint to helper (NFC). 2022-06-20 21:34:07 +02:00
Kazu Hirata ad7ce1e769 Don't use Optional::hasValue (NFC) 2022-06-20 11:49:10 -07:00
Kazu Hirata 5413bf1bac Don't use Optional::hasValue (NFC) 2022-06-20 11:33:56 -07:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Arthur Eubanks 13ff7d6f39 Revert "[GlobalOpt] Perform store->dominated load forwarding for stored once globals"
This reverts commit 6f348b146b.

Am seeing internal test failures plus a linux kernel breakage reported due to this.
2022-06-20 10:26:47 -07:00
Arthur Eubanks 1cd2c72bef Revert "[GlobalOpt] Preserve CFG analyses"
This reverts commit cc65f3e167.

Causes crashes: https://github.com/llvm/llvm-project/issues/56131
2022-06-20 10:25:10 -07:00
Guillaume Chatelet 589c8d6fb9 [NFC] Simplify alignment code in MemorySanitizer 2022-06-20 15:15:53 +00:00
Guillaume Chatelet 7296811910 [NFC] Simplify alignment code in CoroFrame 2022-06-20 15:15:52 +00:00
Florian Hahn cebe7ae881
[ConstraintElimination] Move logic to add constraint to helper (NFC). 2022-06-20 17:08:35 +02:00
Florian Hahn bd9632afd2
[ConstraintElimination] Move StackEntry up, to allow use earlier (NFC). 2022-06-20 16:40:42 +02:00
Florian Hahn cfc741bc0e
[LoopPeel] Forget SCEV for updated exit phi values.
LoopPeel add new incoming values to exit phi nodes which can change the
SCEV for the phi after 20d798bd47.

Forget SCEVs for such phis.

Fixes #56044.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128164
2022-06-20 13:19:27 +02:00
Guillaume Chatelet f1255186c7 [NFC][Alignment] Remove max functions between Align and MaybeAlign
`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are
not used often enough to be required. They also make the code more opaque.

Differential Revision: https://reviews.llvm.org/D128121
2022-06-20 08:37:48 +00:00
Guillaume Chatelet 009fe0755e [Alignment] Remove multiply by MaybeAlign 2022-06-20 08:37:15 +00:00
Nikita Popov 2b089e9ae0 [SimplifyCFG] Try to merge edge block when threading (PR55765)
When threading, we always create a new block for the threaded edge
(even if the edge is not critical), which will later get folded back
into the predecessor if possible. Depending on precise processing
order, this separate block may break the detection of trivial
cycles in the threading code, which normally avoids infinite
threading of loops. Explicitly merge the created edge block into
the predecessor to avoid this.

Fixes https://github.com/llvm/llvm-project/issues/55765.

Differential Revision: https://reviews.llvm.org/D127216
2022-06-20 10:29:33 +02:00
Chuanqi Xu 7782e080e8 [Coroutines] Only do symmetric transfer if optimization is on
Symmetric transfer is not a part of C++ standards. So the vendors is not
forced to implement it any way. Given the symmetric transfer nowadays is
an optimization. It makes more sense to enable it only if the
optimization is enabled. It is also helpful for the compilation speed in
O0.
2022-06-20 16:20:36 +08:00
Chenbing Zheng 0eff6c6ba8 [InstCombine] add vector support for (A >> C) == (B >> C) --> (A^B) u< (1 << C)
Reviewed By: spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D127398
2022-06-20 10:55:47 +08:00
Arthur Eubanks cc65f3e167 [GlobalOpt] Preserve CFG analyses
The only place we modify the CFG is when calling
removeUnreachableBlocks(), so insert a callback there which invalidates
analyses for that function (or recomputes DT in the legacy PM).

Small compile time wins across the board:
https://llvm-compile-time-tracker.com/compare.php?from=f444ea8ce0aaaa5ec1a4129809389da15cc41396&to=698f41f4fc26cbf1006ed5d88e9d658edfc5b749&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128145
2022-06-19 16:13:02 -07:00
Eric Gullufsen 73202130e5 [InstCombine] Optimize test for same-sign of values
(icmp slt (X & Y), 0) | (icmp sgt (X | Y), -1) -> (icmp sgt (X ^ Y), -1)
(icmp slt (X | Y), 0) & (icmp sgt (X & Y), -1) -> (icmp slt (X ^ Y), 0)

[[ https://alive2.llvm.org/ce/z/qXxEFP | alive2 example ]]
[[ https://godbolt.org/z/aWf9c6j74 | godbolt  ]]

[[ https://godbolt.org/z/5Ydn5TehY | godbolt for inverted form ]]
[[ https://alive2.llvm.org/ce/z/93AODr | alive2 for inverted form ]]
[[ https://github.com/llvm/llvm-project/issues/55988 | issue #55988 ]]

Differential Revision: https://reviews.llvm.org/D127903
2022-06-19 16:18:19 -04:00
Arthur Eubanks 6f348b146b [GlobalOpt] Perform store->dominated load forwarding for stored once globals
Compile time tracker:
https://llvm-compile-time-tracker.com/compare.php?from=1e556f459b44dd0ca4073e932f66ecb6f40fe31a&to=6d7bed4e1e72c6a8592748626091274209740a40&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128128
2022-06-19 10:27:20 -07:00
Sanjay Patel 0399473de8 [InstCombine] add fold for (ShiftC >> X) <u C
https://alive2.llvm.org/ce/z/RcdzM-

This fixes a regression noted in issue #56046.
2022-06-19 11:03:28 -04:00
Kazu Hirata 129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Arthur Eubanks 4a5201f484 [NFC][GlobalOpt] Remove unused parameters 2022-06-18 21:23:39 -07:00
Kazu Hirata c399b3a608 [Vectorize] Use llvm::is_contained (NFC) 2022-06-18 15:49:15 -07:00
Kazu Hirata 726b2dd040 [IPO] Use default member initialization (NFC)
Identified with modernize-use-default-member-init.
2022-06-18 15:41:20 -07:00
Arthur Eubanks e4406cefa0 [RPOFuncAttrs] Fix norecurse detection
We wanted to check if all uses of the function are direct calls, but the
code didn't account for passing the function as a parameter.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D128104
2022-06-18 12:20:10 -07:00
Kazu Hirata eb15c80c89 [IPO] Use default member initialization (NFC)
Identified with modernize-use-default-member-init.
2022-06-18 12:17:09 -07:00
Kazu Hirata f8b5be64ab [IPO] Call *set::insert without checking membership first (NFC) 2022-06-18 10:37:04 -07:00
Kazu Hirata b254d67160 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 08:32:54 -07:00
Florian Hahn e9cced2739
Recommit "[LAA] Initial support for runtime checks with pointer selects."
This reverts commit 7aa8a67882.

This version includes fixes to address issues uncovered after
the commit landed and discussed at D11448.

Those include:

* Limit select-traversal to selects inside the loop.
* Freeze pointers resulting from looking through selects to avoid
  branch-on-poison.
2022-06-17 21:06:26 +02:00
Martin Sebor 5fb67e32f8 [InstCombine] Fold memcmp of constant arrays and variable size
The memcmp simplifier is limited to folding to constants calls with constant
arrays and constant sizes.  This change adds the ability to simplify
memcmp(A, B, N) calls with constant A and B and variable N to the pseudocode
equivalent of

N <= Pos ? 0 : (A < B ? -1 : B < A ? +1 : 0)

where Pos is the offset of the first mismatch between A and B.

Differential Revision: https://reviews.llvm.org/D127766
2022-06-17 10:35:35 -06:00
Sanjay Patel bfde861935 [InstCombine] convert mask and shift of power-of-2 to cmp+select
When the mask is a power-of-2 constant and op0 is a shifted-power-of-2
constant, test if the shift amount equals the offset bit index:

(ShiftC << X) & C --> X == (log2(C) - log2(ShiftC)) ? C : 0
(ShiftC >> X) & C --> X == (log2(ShiftC) - log2(C)) ? C : 0

This is an alternate to D127610 with a more general pattern.
We match only shift+and instead of the trailing xor, so we see a few
more tests diffs. I think we discussed this initially in D126617.

Here are proofs for shifts in both directions:
https://alive2.llvm.org/ce/z/CFrLs4

The test diffs look equal or better for IR, and this makes the
patterns more uniform in IR. The backend can partially invert this
in both cases if that is profitable. It is not trivially reversible,
however, so if we find perf regressions that are not easy to undo,
then we may want to revert this.

Differential Revision: https://reviews.llvm.org/D127801
2022-06-17 10:51:57 -04:00
Malhar Jajoo 6bb40552f2 [LoopVectorize] Add support for invariant stores of ordered reductions
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D126772
2022-06-17 14:56:21 +01:00
Nikita Popov c6b88cb918 [InstCombine] Push freeze through recurrence phi
We really want to push freezes through recurrence phis, so that we
freeze only the start value, rather than the IV value on every
iteration. foldOpIntoPhi() already handles this for the case where
the transfer function doesn't produce poison, e.g.
%iv.next = add %iv, 1. However, this does not work if nowrap flags
are present, e.g. the very common %iv.next = add nuw %iv, 1 case.

This patch adds a fold that pushes freeze instructions to the start
value by checking whether all backedge values will be non-poison
after poison generating flags have been dropped. This allows pushing
freezes out of loops in most cases. I suspect that this also
obsoletes the CanonicalizeFreezeInLoops pass, and we can probably
drop it.

Fixes https://github.com/llvm/llvm-project/issues/56048.

Differential Revision: https://reviews.llvm.org/D127960
2022-06-17 15:01:41 +02:00
Amanieu d'Antras caa2a829cd [MergeFunctions] Preserve symbols used llvm.used/llvm.compiler.used
llvm.used and llvm.compiler.used are often used with inline assembly
that refers to a specific symbol so that the symbol is kept through to
the linker even though there are no references to it from LLVM IR.

This fixes the MergeFunctions pass to preserve references to these
symbols in llvm.used/llvm.compiler.used so they are not deleted from the
IR. This doesn't prevent these functions from being merged, but
guarantees that an alias or thunk with the expected symbol name is kept
in the IR.

Differential Revision: https://reviews.llvm.org/D127751
2022-06-16 21:36:39 +01:00
Paul Robinson 3f6030255d Reland "[PS4/PS5][profiling] Go back to the old way of doing a runtime hook"
Profiling stopped working for us after D98061, which was largely a
Fuschia-specific patch but in one place used `isOSBinFormatELF` to
make a decision.  I'm adding a PS4/PS5 exception to that, so we can
get profiling to work again.

Differential Revision: https://reviews.llvm.org/D127506
2022-06-16 11:53:48 -07:00
Paul Robinson d0e60b6d7e Revert "[PS4/PS5][profiling] Go back to the old way of doing a runtime hook"
This reverts commit 39fb84343e.

Pushed without verifying the test still works.
2022-06-16 11:41:23 -07:00
Paul Robinson 39fb84343e [PS4/PS5][profiling] Go back to the old way of doing a runtime hook
Profiling stopped working for us after D98061, which was largely a
Fuschia-specific patch but in one place used `isOSBinFormatELF` to
make a decision.  I'm adding a PS4/PS5 exception to that, so we can
get profiling to work again.

Differential Revision: https://reviews.llvm.org/D127506
2022-06-16 11:37:07 -07:00
Paul Robinson 593fa3ab30 [PS5] Set address sanitizer shadow offset 2022-06-16 11:28:30 -07:00
Alexey Bataev 76782a65ee [SLP]Use original vector if need to shuffle truncated root.
If the root scalar is mapped to to the smallest bit width, the vector is
truncated and the types between original buildvector and extracted value
mismatched. For extract, we emit sext/zext instructions, for shuffles we
can reuse oringal vector instead of the truncated one.

Differential Revision: https://reviews.llvm.org/D127974
2022-06-16 10:41:18 -07:00
Samuel Eubanks bf02ed240d Prevent crash when TurnSwitchRangeIntoICmp receives default unreachable destination
TurnSwitchRangeIntoICmp crashes when given a switch with a default
destination of unreachable
Addresses issue #53208
https://github.com/llvm/llvm-project/issues/53208

Differential revision: https://reviews.llvm.org/D127712
2022-06-16 16:11:24 +02:00
Florian Hahn 949c13649c
[LV] Remove widenPHIInstruction dependence on underlying instr (NFC).
Instead of using the underlying instruction and VF to get the type, use
the type of the incoming value. This removes an unnecessary dependence
on the underlying instruction and enables using the recipe without an
underlying instruction.
2022-06-16 16:03:01 +02:00
Alexey Bataev 7236d49fd5 [SLP]Extend vectorization for scatter vectorize nodes.
Currently scatter vectorize nodes can be emitted only for GEPs with
constant indices. But we can also emit such nodes for GEPs with the same
ptr and non-constant vectorizable/gathered indices, if profitable. Patch
adds support for such nodes and tries to improve handling of GEPs with
non-const indeces for such nodes.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
                                                                                              results                   results0 diff
                    test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  5243.00                   5240.00  -0.1%
                     test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  5243.00                   5240.00  -0.1%
                     test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27550.00                  27507.00  -0.2%
                               test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  5395.00                   5380.00  -0.3%
                       test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5389.00                   5374.00  -0.3%
                    test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   961.00                    958.00  -0.3%
                   test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   961.00                    958.00  -0.3%
                               test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  5664.00                   5643.00  -0.4%
                       test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 13202.00                  13127.00  -0.6%
                                test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   212.00                    207.00  -2.4%
                                test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   890.00                    850.00  -4.5%
                            test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  1695.00                   1581.00  -6.7%
                                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test  2338.00                   2140.00  -8.5%
                                  test-suite :: SingleSource/UnitTests/matrix-types-spec.test    63.00                     55.00 -12.7%
                             test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   468.00                    356.00 -23.9%
                                                                           Geomean difference                                     -0.3%

All numbers show increased number of generated vector instructions.

Diff:
SingleSource/Benchmarks/Adobe-C++/loop_unroll - better without LTO, but
need an extra analysis with LTO (with LTO compiler generates
masked_gather, while before regular loads were emitted because of extra
data, availbale at LTO time).
SingleSource/UnitTests/matrix-types-spec - more vector code.
MultiSource/Applications/JM/lencod/lencod - same.
External/SPEC/CINT2006/464.h264ref/464.h264ref - same.
MultiSource/Benchmarks/7zip/7zip-benchmark - same.
External/SPEC/CINT2006/445.gobmk/445.gobmk - no changes.
External/SPEC/CFP2017rate/510.parest_r/510.parest_r - more vector code.
External/SPEC/CFP2006/447.dealII/447.dealII - same
External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s - same
External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp - same
External/SPEC/CFP2017rate/511.povray_r/511.povray - same
External/SPEC/CFP2006/453.povray/453.povray - same
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - same
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r - same
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - same

Differential Revision: https://reviews.llvm.org/D127219
2022-06-16 06:05:48 -07:00
Jin Xin Ng aaff3fb6d5 [mlgo] Fix accounting for SCC splits
Previously if the inliner split an SCC such that an empty one remained, the MLInlineAdvisor could potentially lose track of the EdgeCount if a subsequent CGSCC pass modified the calls of a function that was initially in the SCC pre-split. Saving the seen nodes in onPassEntry resolves this.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D127693
2022-06-15 10:53:23 -07:00
Florian Hahn 5ff5b460d9
[LV] Remove unneeded CustomBuilder arg from setDebugLocFromInst (NFC).
The only user that passed in a custom builder was passing in
VPTransformState::Builder, which is the same as ILV::Builder.
2022-06-15 18:48:02 +01:00
Alexey Bataev c60c13f7eb [SLP] Improve reordering in presence of constant only nodes.
We can skip the analysis of the constant nodes, their order should not
affect the ordering of the trees/subtrees.

Differential Revision: https://reviews.llvm.org/D127775
2022-06-15 06:17:34 -07:00
Heejin Ahn b2f4112f25 [InstCombine] Improve check for catchswitch BBs (NFC)
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D127810
2022-06-15 01:06:13 -07:00
Nikita Popov 2dac2c4f76 [SimplifyLibCalls] Drop duplicate check (NFC)
The same condition already exists inside optimizeMemCmpConstantSize().
2022-06-15 09:37:09 +02:00
Venkata Ramanaiah Nalamothu 340b0ca900 [llvm] Add DW_CC_nocall to function debug metadata when either return values or arguments are removed
Adding the `DW_CC_nocall` calling convention to the function debug metadata is needed when either the return values or the arguments of a function are removed as this helps in informing debugger that it may not be safe to call this function or try to interpret the return value.
This translates to setting `DW_AT_calling_convention` with `DW_CC_nocall` for appropriate DWARF DIEs.

The DWARF5 spec (section 3.3.1.1 Calling Convention Information) says:

If the `DW_AT_calling_convention` attribute is not present, or its value is the constant `DW_CC_normal`, then the subroutine may be safely called by obeying the `standard` calling conventions of the target architecture. If the value of the calling convention attribute is the constant `DW_CC_nocall`, the subroutine does not obey standard calling conventions, and it may not be safe for the debugger to call this subroutine.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D127134
2022-06-15 03:30:15 +05:30
Florian Hahn 7c0089d735
[Matrix] Check if iterator is at beginning of BB in optimizeTranspose.
If an instruction at the beginning of a block is erased,  this may
trigger crash due to dereferencing an invalid iterator.

Check if II is at the end before dereferencing it.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D127736
2022-06-14 21:37:02 +01:00
Jin Xin Ng 9f2b873a7d [inliner] Add per-SCC-pass InlineAdvisor printing option
Adds option to print the contents of the Inline Advisor after each SCC Inliner pass

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D127689
2022-06-14 08:06:52 -07:00
Serguei Katkov d713f0eab8 Revert "[MachineSSAUpdater] compile time improvement in GetValueInMiddleOfBlock"
It looks like it causes buildbot failures.
As an example: https://lab.llvm.org/buildbot/#/builders/121/builds/20364

Revert to investigate...

This reverts commit 6bf2791814.
2022-06-14 20:27:21 +07:00
Serguei Katkov 6bf2791814 [MachineSSAUpdater] compile time improvement in GetValueInMiddleOfBlock
GetValueInMiddleOfBlock uses result of GetValueAtEndOfBlockInternal if there is no value
defined for current basic block.

If there is already a value it tries (in this order):

to find single register coming from all predecessors
find existing phi node which matches our incoming registers
build new phi.
The compile time improvement is to use current available value if
it is defined out of current BB or it is a PHI register.
This is due to it can be used in the middle basic block.

Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126523
2022-06-14 18:00:34 +07:00
Guillaume Chatelet 77bba68de6 [NFC][Alignment] Use Align in CoroFrame 2022-06-14 10:56:36 +00:00
Guillaume Chatelet 6fd480d957 [NFC][Alignment] use getAlign in AddressSanitizer 2022-06-14 10:56:36 +00:00
Florian Hahn 782e912246
[ConstraintElimination] Support constraints with only const ops.
Remove the early exit if both constraints contain no variables. This
restriction is unnecessayr for correctness and removing it simplifies
handling of trivial constant conditions in follow-up changes.
2022-06-14 10:37:12 +01:00
Chuanqi Xu d029db9e8a [NFC] Fix Wswitch warning triggered by 735e6c 2022-06-14 14:45:15 +08:00
Chuanqi Xu 735e6c40b5 [Coroutines] Convert coroutine.presplit to enum attr
This is required by @nikic in https://reviews.llvm.org/D127383 to
decrease the cost to check whether a function is a coroutine and this
fixes a FIXME too.

Reviewed By: rjmccall, ezhulenev

Differential Revision: https://reviews.llvm.org/D127471
2022-06-14 14:23:46 +08:00
chenglin.bi 286198ff04 [InstCombine] Optimize lshr+shl+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3) >= C2`:
    ((C1 >> X) << C2) & C3 -> X == (Log2(C1)+C2-Log2(C3)) ? C3 : 0
https://alive2.llvm.org/ce/z/zvrkKF

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127469
2022-06-14 11:06:10 +08:00
Heejin Ahn ac4006b0d6 [InstCombine] Don't slice up PHIs when pred BB has catchswitch
If an integer PHI has an illegal type (according to the data layout) and
it is only used by `trunc` or `trunc(lshr)` operations, we split the PHI
into various instructions in its predecessors:
6d1543a167/llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp (L1536-L1543)

So this can produce code like the following:
Before:
```
pred:
  ...

bb:
  %p = phi i8 [ %somevalue, %pred ], ...
  ...
  %tobool = trunc i8 %p to i1
  use %tobool
  ...
```
In this code, `%p` has an illegal integer type, `i8`, and its only used
in a `trunc` instruction later. In this case this pass puts extraction
code in its predecessors:

After:
```
pred:
  ...
  %t = and i8 %somevalue, 1
  %extract = icmp ne i8 %t, 0

bb:
  %p.new = phi i1 [ %extract, %pred ], ...
  use %p.new instead of %tobool
```

But this doesn't work if `pred` is a `catchswitch` BB because it cannot
have any non-PHI instructions. This CL ensures we bail out in that case.

Fixes https://github.com/llvm/llvm-project/issues/55803.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D127699
2022-06-13 18:32:09 -07:00
Florian Hahn 9129e7bb54
[LV] Replace OrigPHIsToFix in native with VPlan traversal. (NFC)
OrigPHIsToFix is only used in the native path. Collecting phis can be
replaced by iterating over the plan. This also removes another
unnecessary use of a late getVPValue.

This also reduces the coupling between ILV and the VPlan utilities.
2022-06-13 22:20:58 +01:00
Hubert Tong 5efb380c26 [NFC] Undo AIX build compiler workaround
Removes the workaround from https://reviews.llvm.org/D98509#2732628 for
an AIX build compiler issue.

The AIX build compiler product that caused the issue has since been
fixed. Also, the AIX build compiler has been changed to one based on
LLVM.
2022-06-13 17:00:33 -04:00
Guillaume Chatelet 111b32ecb4 [NFC][Alignment] Use getAlign in Attributor classes 2022-06-13 15:13:05 +00:00
Guillaume Chatelet 2887dd754e [NFC][Alignment] Use getAlign in VNCoercion 2022-06-13 15:13:05 +00:00
Sanjay Patel 310adb658c [InstCombine] reorder mask folds for efficiency
This shows narrowing improvements on the logic tests
(transforms recently added with e247b0e5c9).

This is not a complete fix. That would require adding
folds to visitOr/visitXor. But it enables the expected
transforms for the basic patterns in the affected tests.
2022-06-13 09:49:57 -04:00
Guillaume Chatelet 45a5cd41e5 [NFC][Alignment] Simplify code in MemorySanitizer 2022-06-13 13:36:36 +00:00
Guillaume Chatelet 86f455750b [NFC] Remove dead code 2022-06-13 12:59:38 +00:00
Guillaume Chatelet a6c2ab0c3f [NFC][Alignment] Use proper type in instrumentLoadOrStore 2022-06-13 12:59:38 +00:00
Nikita Popov 571c713144 [SimplifyCFG] Handle trapping aggregates (PR49839)
Handle the fact that not only constant expressions, but also
constant aggregates containing expressions can trap.

This still doesn't fix the original C reproducer, probably due to
more issues remaining in other passes.
2022-06-13 14:56:49 +02:00
Guillaume Chatelet f9bb8c24ac [NFC][Alignment] Convert MemCpyOptimizer.cpp 2022-06-13 10:07:09 +00:00
David Sherwood 83251896d7 [NFC][InstCombine] Refactor InstCombinerImpl::foldSelectIntoOp
Introduce a lambda function so that we remove a lot of code
duplication.

Differential Revision: https://reviews.llvm.org/D127493
2022-06-13 10:37:07 +01:00
Nikita Popov 92a9b1c918 [InstCombine] Don't push operation across loop phi
When pushing an operation across a phi node, we should avoid doing
so across a loop backedge. This is generally non-profitable, because
it does not reduce the number of times the operation is executed,
and could lead to an infinite combine loop.

The code was already guarding against this, but using an
insufficiently strong condition, which did not cover the case where
the operation was originally outside the loop (in which case the
transform moves the operation from outside the loop into the loop,
which is particularly undesirable).

Differential Revision: https://reviews.llvm.org/D127499
2022-06-13 10:48:09 +02:00
Kazu Hirata df792bcb02 [Transforms] Use default member initialization (NFC)
Identified with modernize-use-default-member-init.
2022-06-12 18:39:05 -07:00
Florian Hahn 763f2bdba5
[VPlan] Remove dead OrigLoop argument from removeDeadRecipes (NFC).
The use of the argument has been remove a while ago. Remove the dead
argument.
2022-06-11 23:36:47 +01:00
Kazu Hirata a838043f38 [llvm] Use contains (NFC) 2022-06-11 11:46:16 -07:00
Kazu Hirata 5d7b1a5f1b [Scalar] Use llvm::append_range (NFC) 2022-06-10 23:09:01 -07:00
Mitch Phillips db68a25ca9 Revert "[Attributor] Ensure to use the proper liveness AA"
This reverts commit a3273c0c06.

Reason: Broke the ASan buildbots with a memory leak. See
https://reviews.llvm.org/rG94841c713fdd2bce3276015d1e946d414bb74ee8 for
more information.
2022-06-10 14:05:09 -07:00
Nuno Lopes e5c5f92e12 [InstCombine] switch synthetic unreachable to use undef instead of poison (NFC) 2022-06-10 21:54:09 +01:00
Sanjay Patel e247b0e5c9 [InstCombine] add narrowing transform for low-masked binop with zext operand (2nd try)
The 1st try ( afa192cfb6 ) was reverted because it could
cause an infinite loop with constant expressions.

A test for that and an extra condition to enable the transform
are added now. I also added code comments to better describe
the transform and the existing, related transform.

Original commit message:
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
casts around, and we already have a related fold for a binop
with a constant operand.
2022-06-10 12:42:27 -04:00
Paul Robinson 3dbb5cb273 [PS5] Use linker scripting to find profiling data, like PS4 2022-06-10 09:00:51 -07:00
Guillaume Chatelet dc9c2eac98 [NFC][Alignment] Simplify code 2022-06-10 15:25:28 +00:00
Hans Wennborg 3800b157d7 [SimplifyCFG] Share code to compute switch density between ShouldBuildLookupTable() and ReduceSwitchRange()
They're computing the same thing. No functionality change.

Differential revision: https://reviews.llvm.org/D127482
2022-06-10 15:29:36 +02:00
Guillaume Chatelet 12ccdd67aa [NFC] Use proper getSliceAlign type in SROA 2022-06-10 12:37:41 +00:00
Sanjay Patel 6fedc6a2b4 Revert "[InstCombine] add narrowing transform for low-masked binop with zext operand"
This reverts commit afa192cfb6.
This can cause an infinite loop as shown with an example in the
post-commit thread.
2022-06-10 08:25:10 -04:00
David Sherwood 8daaea206b [InstCombine] Use +0.0 instead of -0.0 as the FP identity for some folds
In foldSelectIntoOp we sometimes transform a select of a fadd into a
fadd of a select, where we select between data and an identity value.
For both fadd and fsub the identity is always -0.0, but if the nsz
flag is set on the select instruction we can use +0.0 instead. Doing
so then triggers other optimisations, such as when folding the select
of masked load into a new masked load.

Differential Revision: https://reviews.llvm.org/D126774
2022-06-10 12:42:34 +01:00
Bin Cheng 8b360c69e9 [FuncSpec]Fix assertion failure when value is not added to solver
This patch improves the fix in D110529 to prevent from crashing on value
with byval attribute that is not added in SCCP solver.

Authored-by: sinan.lin@linux.alibaba.com
Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D126355
2022-06-10 18:45:53 +08:00
Nikita Popov d77f944832 [LoopInfo] Add getOutermostLoop() (NFC)
This is a recurring pattern, add an API function for it.
2022-06-10 11:48:21 +02:00
David Green 4a5cb957a1 [AggressiveInstcombine] Conditionally fold saturated fptosi to llvm.fptosi.sat
This adds a fold for aggressive instcombine that converts
smin(smax(fptosi(x))) into a llvm.fptosi.sat, providing that the
saturation constants are correct and the cost of the llvm.fptosi.sat is
lower.

Unfortunately, a llvm.fptosi.sat cannot always be converted back to a
smin/smax/fptosi. The llvm.fptosi.sat intrinsic is more defined that the
original, which produces poison if the original fptosi was out of range.
The llvm.fptosi.sat will saturate any value, so needs to be expanded to
a fptosi(fpmin(fpmax(x))), which can be worse for codegeneration
depending on the target.

So this change thais conditional on the backend reporting that the
llvm.fptosi.sat is cheaper that the original smin+smax+fptost.  This is
a change to the way that AggressiveInstrcombine has worked in the past.
Instead of just being a canonicalization pass, that canonicalization can
be dependant on the target in certain specific cases.

Differential Revision: https://reviews.llvm.org/D125755
2022-06-10 09:36:09 +01:00
chenglin.bi de7a6ae1ff [InstCombine] Optimize shl+lshr+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3)+C2 < BitWidth`:
    ((C1 << X) >> C2) & C3 -> X == (Log2(C3)+C2-Log2(C1)) ? C3 : 0;

https://alive2.llvm.org/ce/z/Pus5bd

Fix issue https://github.com/llvm/llvm-project/issues/55739

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126617
2022-06-10 09:36:58 +08:00
Philip Reames 206f10d3f6 Plumb InstructionCost through unroll costing
Teach the unroller(s) how to handle an invalid cost. This avoids crashes when the backend can't provide a cost due to either a fundemental limitation or an unimplemented cost model case.

Differential Revision: https://reviews.llvm.org/D127305
2022-06-09 15:42:53 -07:00
Philip Reames f85c5079b8 Pipe potentially invalid InstructionCost through CodeMetrics
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.

On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.

I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.

Differential Revision: https://reviews.llvm.org/D127131
2022-06-09 15:17:24 -07:00
Sanjay Patel afa192cfb6 [InstCombine] add narrowing transform for low-masked binop with zext operand
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
cast around, and we already have a related fold for a binop
with a constant operand.
2022-06-09 16:59:26 -04:00
Johannes Doerfert 6555558a80 Revert "[Attributor] Replace AAValueSimplify with AAPotentialValues"
This reverts commit da50dab1ae.

Patch broke AMD GPU OpenMP offload buildbots.
https://lab.llvm.org/buildbot/#/builders/193/builds/13246
2022-06-09 17:04:01 +02:00
Johannes Doerfert da50dab1ae [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good.

Fixes: https://github.com/llvm/llvm-project/issues/54981
2022-06-09 16:48:53 +02:00
Johannes Doerfert 94841c713f [Attributor] Try to delete stores and simplify stored values
By default we should try to eliminate unused stores and simplify values
stored while we are at it.
2022-06-09 16:48:53 +02:00
Johannes Doerfert a3273c0c06 [Attributor] Ensure to use the proper liveness AA
When determining liveness via Attributor::isAssumedDead(...) we might
end up without a liveness AA or with one pointing into another function.
Neither is helpful and we will avoid both from now on.
2022-06-09 16:48:53 +02:00
Simon Moll b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Johannes Doerfert ae10b8a582 [Attributor][FIX] Give registered simplification callbacks precedence
We accidentally checked for constants before we looked for registered
simplification callbacks. The latter needs to take precedence though.
2022-06-09 15:31:53 +02:00
Johannes Doerfert 982053e85e [Attributor][NFC] Improve debug code and comments 2022-06-09 13:41:23 +02:00
Johannes Doerfert 0ece283f03 [Attributor] Add checks needed as we strengthen value simplify 2022-06-09 13:41:23 +02:00
Johannes Doerfert 393be12b74 [Attributor] Look at base values for align, nonnull, and deref
Stripping bitcasts and 0-geps helps normalization and minimizes the
impact of a follow up change.
2022-06-09 13:41:23 +02:00
Johannes Doerfert cb8adf76f7 [Attributor] Simplify loads from constant globals
If a global is constant and the initializer is known we can simplify
loads from it as the value has to be the initializer.
2022-06-09 13:41:23 +02:00
Florian Hahn 85983ca42e
[VPlan] Replace remaining use of needsScalarIV.
All information is already available in VPlan. Note that there are some
test changes, because we now can correctly look through instructions
like truncates to analyze the actual users.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123541
2022-06-09 12:05:37 +01:00
Johannes Doerfert 14899bc43d [Attributor] Generalize interface from ConstantInt to Constant
We can use constant to allow undef and there is no need to force
integers in the API anyway. The user can decide if a non integer
constant is fine or not.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 7a07b88f37 [Attributor][FIX] Replace call site argument uses, not values
We need to be careful replacing values as call site arguments
(IRPosition::IRP_CALL_SITE_ARGUMENT) is representing a use and not a
value. This patch replaces the interface to take a IR position instead
making it harder to misuse accidentally. It does not change our tests
right now but a follow up exposed the potential footgun.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 1df6e171c3 [Attributor] Simplify (integer range) state handling
We used to be very conservative when integer states were merged.
Instead of adding the known range (which is large due to uncertainty)
into the assumed range (which is hopefully small), we can also only
allow to merge in both at the same time into their respective
counterpart. This will ensure we keep the invariant that assumed is part
of known.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 481b8f31df [Attributor][NFC] Introduce helper struct
We often use a context associated with a value. For now only one use
case has been changed.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 4277c1be88 [Attributor][FIX] Avoid metadata and duplicate replication assertion
When we recreate instructions as part of simplification we need to take
care of debug metadata and replacing the value multiple times. For now,
we handle both conservatively.
2022-06-09 12:00:26 +02:00
Biplob Mishra d87bfa9ad0 [InstCombine] Combine instructions of type or/and where AND masks can be combined.
The patch simplifies some of the patterns as below

(A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
((B & C0) | A) | (B & C1) -> (B & C0|C1) | A

In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns.

Additionally this commit fixes the issue reported with the test case.
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

The previous revision/commit did not check one-use of an intermediate value that this transform re-uses.
When that value has another use, an existing transform will try to invert the transform here.
By adding one-use checks, we avoid the infinite loops seen with the earlier commit.

Differential Revision: https://reviews.llvm.org/D124119
2022-06-09 10:58:30 +01:00
Chenbing Zheng 38992d2c5e [InstCombine] improve fold for icmp-ugt-ashr
Existing condition for
fold icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1
missed some boundary. It cause this fold don't work for some cases, and the
reason is due to signed number overflow.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127188
2022-06-09 16:22:12 +08:00
Nikita Popov 56c9976d46 [IndVarSimplify] Don't assert that terminator is not SCEVable (PR55925)
The IV widening code currently asserts that terminators aren't SCEVable
-- however, this is not the case for invokes with a returned attribute.

As far as I can tell, this assertions is not necessary -- even if we
have a critical edge (the second test case), the trunc gets inserted
in a legal position.

Fixes https://github.com/llvm/llvm-project/issues/55925.

Differential Revision: https://reviews.llvm.org/D127288
2022-06-09 10:12:13 +02:00
Fangrui Song 11136a6032 [DeadArgElim] Remove dead code after r128810 2022-06-08 21:11:54 -07:00
Florian Hahn cedfd7a2e5
Recommit "[VPlan] Remove uneeded needsVectorIV check."
This reverts commit 266ea446ab.

The reasons for the revert have been addressed by cleaning up condition
handling in VPlan and properly marking VPBranchOnMaskRecipe as using
scalars.

The test case for the revert from D123720 has been added in 3d663308a5.
2022-06-08 14:06:45 +01:00
Chuanqi Xu 0e10f12844 [NFC] Remove commented cerr debugging loggings
There are some unused cerr debugging loggings in the codes. It is weird
to remain such commented debug helpers in the product.
2022-06-08 15:58:06 +08:00
Chuanqi Xu 733d7cf964 [Debug] [Coroutines] Add deref operator for non complex expression
Background:

When we construct coroutine frame, we would insert a dbg.declare
intrinsic for it:
```
%hdl = call void @llvm.coro.begin() ; would return coroutine handle
call void @llvm.dbg.declare(metadata ptr %hdl, metadata
![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression())
```

And in the splitted coroutine, it looks like:
```
define void @coro_func.resume(ptr *hdl) {
entry.resume:
    call void @llvm.dbg.declare(metadata ptr %hdl, metadata
![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression())
}
```

And we would salvage the debug info by inserting a new alloca here:
```
define void @coro_func.resume(ptr %hdl) {
entry.resume:
    %frame.debug = alloca ptr
    call void @llvm.dbg.declare(metadata ptr %frame.debug, metadata
![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression())
    store ptr %hdl, %frame.debug
}
```

But now, the problem comes since the `dbg.declare` refers to the address
of that alloca instead of actual coroutine handle. I saw there are codes
to solve the problem but it only applies to complex expression only. I
feel if it is OK to relax the condition to make it work for
`__coro_frame`.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D126277
2022-06-08 10:53:51 +08:00
Wael Yehia 0952cf5bbb [InstCombine] decomposeSimpleLinearExpr should bail out on negative operands.
InstCombine tries to rewrite

  %prod = mul nsw i64 %X,   Scale
  %acc = add nsw i64 %prod,   Offset
  %0 = alloca i8, i64 %acc, align 4
  %1 = bitcast i8* %0 to i32*
  Use ( %1 )

into

  %prod = mul nsw i64 %X,   Scale/4
  %acc = add nsw i64 %prod,   Offset/4
  %0 = alloca i32, i64 %acc, align 4
  Use (%0)

But it assumes Scale is unsigned, and performs an unsigned division.
So we should bail out if Scale cannot be interpreted as an unsigned safely.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126546
2022-06-08 00:57:25 +00:00
Sanjay Patel cae993d4c8 [InstCombine] [InstCombine] reduce left-shift-of-right-shifted constant via demanded bits
If we don't demand low bits and it is valid to pre-shift a constant:
(C2 >> X) << C1 --> (C2 << C1) >> X

https://alive2.llvm.org/ce/z/_UzTMP

This is the reverse-order shift sibling to 82040d414b ( D127122 ).
It seems likely that we would want to add this to the SDAG version of
the code too to keep it on par with IR.
2022-06-07 18:43:27 -04:00
Sanjay Patel a4d2c5ecaa [InstCombine] reduce code duplication for accessing type; NFC 2022-06-07 18:43:27 -04:00
Philip Reames 89c4b29e8d [GuardWidening] Fix a nasty cast bug in c2eccc6
c2eccc6 introduced a call to etHasNoUnsignedWrap which implicitly assumes that Inst is a OverflowingBinaryOperator.  This is frequently untrue, but was not caught because cast<Ty>(X) has been broken, see https://discourse.llvm.org/t/cast-x-is-broken-implications-and-proposal-to-address/63033 for context.

I considered reverting this, but since doing so re-introduces a nasty miscompile of its own, I decided to fix forward instead.

I'll note that this is a particularly nasty form of the cast<Ty>(X) issue.  Because the cast was succeeding unexpected, we were writing data to instructions which weren't OBOs.  This could result in near arbitrary data or memory corruption.  I'm a bit shocked that the sanitizers didn't find this TBH.
2022-06-07 13:27:13 -07:00
Florian Hahn b0c9a71be0
[VPlan] Handle VPInst without underlying instr in VPInterleavedAccess.
This violation is hidden while `cast` is missing an isa assertion after
D123901.
2022-06-07 21:00:49 +01:00
Martin Sebor dd2a6d78ee [InstCombine] Fold memchr of sequences of same characters
Enhance memchr libcall folder to handle constant arrays consisting
of one or two sequences of cosecutive equal characters.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126515
2022-06-07 13:45:10 -06:00
Martin Sebor fb6627fa0c [InstCombine] Add substr helper function (NFC).
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126515
2022-06-07 13:27:36 -06:00
Sanjay Patel 82040d414b [InstCombine] reduce right-shift-of-left-shifted constant via demanded bits
If we don't demand high bits (zeros) and it is valid to pre-shift a constant:
(C2 << X) >> C1 --> (C2 >> C1) << X

https://alive2.llvm.org/ce/z/P3dWDW

There are a variety of related patterns, but I haven't found a single solution
that gets all of the motivating examples - so pulling this piece out of
D126617 along with more tests.

We should also handle the case where we shift-right followed by shift-left,
but I'll make that a follow-on patch assuming this one is ok. It seems likely
that we would want to add this to the SDAG version of the code too to keep it
on par with IR.

Differential Revision: https://reviews.llvm.org/D127122
2022-06-07 13:28:18 -04:00
Craig Topper d73684e223 [LoopFlatten] Fix crash if the inner loop trip count comes from a sext instruction.
If we look through a truncate in matchLinearIVUser, it's possible
we find a sext/zext instruction that didn't come from widening.
This will fail the MatchedItCount->getType() == InnerInductionPHI->getType()
assertion.

Fix this by checking that we did not look through a truncate already.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D127149
2022-06-07 08:21:21 -07:00
Craig Topper fdd5843572 [LoopFlatten] Replace unchecked dyn_cast with cast.
Spotted while reading through the code.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D127146
2022-06-07 08:21:00 -07:00
David Sherwood 997ecb0036 [LoopVectorize] Add FastMathFlags to the select used for reductions with tail-folding
Based on reviewer comments on https://reviews.llvm.org/D126692 I've
added FastMathFlags to the select instruction used when tail-folding
with reductions. These flags can then be used by InstCombine to
decide upon the most optimal floating point identity value for
fadd/fsub. Doing so unlocks further optimisations, such as folding
selects into masked loads.

Differential Revision: https://reviews.llvm.org/D126778
2022-06-07 10:21:31 +01:00
Nikita Popov 7fa97b473c [SCCP] Don't mark ranges from branch conditions as potentially undef
Now that transforms introducing branch on poison have been removed,
we can stop marking ranges that have been derived from branch
conditions as containing undef. The existing comment explains why
this is legal. I've checked that alive2 is happy with SCCP tests
after this change.

Differential Revision: https://reviews.llvm.org/D126647
2022-06-07 10:20:24 +02:00
Enna1 e52a38c8f1 [ASan] Skip any instruction inserted by another instrumentation.
Currently, we only check !nosanitize metadata for instruction passed to function `getInterestingMemoryOperands()` or instruction which is a cannot return callable instruction.
This patch add this check to any instruction.

E.g. ASan shouldn't instrument the instruction inserted by UBSan/pointer-overflow.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D126269
2022-06-07 11:17:07 +08:00
Kevin P. Neal a1f1bd547b [IPSCCP] Switch away from Instruction::isSafeToRemove()
In D115737 I found that I needed to teach Instruction::isSafeToRemove()
about strictfp/constrained intrinsics. It was pointed out that this is
probably the wrong function to use isInstructionTriviallyDead(). It doesn't
make sense to have a "second, worse implementation".

I also believe that the Instruction class is the wrong place for this
functionality. The information about whether or not an instruction can be
removed is in the transform passes and should stay there.

Differential Revision: https://reviews.llvm.org/D118387
2022-06-06 09:24:11 -04:00
Florian Hahn eaf48dd9b0
[VPlan] Replace BranchOnCount with BranchOnCond if TC <= UF * VF.
Try to simplify BranchOnCount to `BranchOnCond true` if TC <= UF * VF.

This is an alternative to D121899 which simplifies the VPlan directly
instead of doing so late in code-gen.

The potential benefit of doing this in VPlan is that this may help
cost-modeling in the future. The reason this is done in prepareToExecute
at the moment is that a single plan may be used for multiple VFs/UFs.

There are further simplifications that can be applied as follow ups:

1. Replace inductions with constants
2. Replace vector region with regular block.

Fixes #55354.

Depends on D126679.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126680
2022-06-06 09:38:53 +01:00
Kazu Hirata 8daf23d364 [Scalar] Use llvm::make_early_inc_range (NFC) 2022-06-05 23:53:18 -07:00
Sanjay Patel 3f33d67d8a [InstCombine] fold mul with masked low bit operand to trunc+select
https://alive2.llvm.org/ce/z/o7rQ5q

This shows an extra instruction in some cases, but that is
caused by an existing canonicalization of trunc -> and+icmp.

Codegen should be better for any target where a multiply is
more costly than the most simple ALU op.

This ends up producing the requested x86 asm from issue #55618,
but it's not the same IR. We are missing a canonicalization
from the negate+mask pattern to the trunc+select created here.
2022-06-05 20:07:18 -04:00
Kazu Hirata 3b9707dbc0 [llvm] Convert for_each to range-based for loops (NFC) 2022-06-05 12:07:14 -07:00
Kazu Hirata 30f19382c6 [Scalar] Remove isValidSingle (NFC)
The last use was removed on Feb 18, 2022 in commit
00ab91b70d.
2022-06-05 08:45:11 -07:00
Fangrui Song 95a134254a Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 01:07:51 -07:00
Fangrui Song d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 00:31:44 -07:00
Kazu Hirata 2c4d52467a [Transforms/Utils] Use predecessors (NFC) 2022-06-05 00:16:14 -07:00
Fangrui Song d0d1c416cb Remove unneeded cl::ZeroOrMore for cl::list options 2022-06-04 23:51:13 -07:00
Kazu Hirata e0039b8d6a Use llvm::less_second (NFC) 2022-06-04 22:48:32 -07:00
Kazu Hirata f83a88a179 [Transforms] Use llvm::is_contained (NFC) 2022-06-04 20:48:26 -07:00
Florian Hahn 416a5080d8
[VPlan] Update vector latch terminator edge to exit block after execution.
Instead of setting the successor to the exit using CFG.ExitBB, set it to
nullptr initially. The successor to the exit block is later set either
through createEmptyBasicBlock or after VPlan execution (because at the
moment, no block is created by VPlan for the exit block, the existing
one is reused).

This also enables BranchOnCond to be used as terminator for the exiting
block of the topmost vector region.

Depends on D126618.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126679
2022-06-04 21:22:32 +01:00
Fangrui Song 36c7d79dc4 Remove unneeded cl::ZeroOrMore for cl::opt options
Similar to 557efc9a8b.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.
2022-06-04 00:10:42 -07:00
Fangrui Song 557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
Alexey Bataev cac60940b7 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-03 08:06:22 -07:00
Arnold Schwaighofer 5c902af572 [coro async] Add code to support dynamic aligment of over-aligned types in async frames
Async context frames are allocated with a maximum alignment. If a type
requests an alignment bigger than that dynamically align the address
in the frame.

Differential Revision: https://reviews.llvm.org/D126715
2022-06-03 07:06:14 -07:00
Benjamin Kramer a8d2a381a2 [VPlan] Silence another unused variable warning in release builds 2022-06-03 14:07:56 +02:00
Benjamin Kramer 6b7c186390 [VPlan] Inline variable into assertion. NFC.
Avoids a warning in release builds
llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp:311:14: warning: unused variable 'BrCond' [-Wunused-variable]
      Value *BrCond = Br->getCondition();
2022-06-03 13:59:48 +02:00
Florian Hahn a5bb4a3b4d
[VPlan] Replace CondBit with BranchOnCond VPInstruction.
This patch removes CondBit and Predicate from VPBasicBlock. To do so,
the patch introduces a new branch-on-cond VPInstruction opcode to model
a branch on a condition explicitly.

This addresses a long-standing TODO/FIXME that blocks shouldn't be users
of VPValues. Those extra users can cause issues for VPValue-based
analyses that don't expect blocks. Addressing this fixme should allow us
to re-introduce 266ea446ab.

The generic branch opcode can also be used in follow-up patches.

Depends on D123005.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126618
2022-06-03 11:48:31 +01:00
Fangrui Song df0f30dc36 Revert "[SLP]Improve shuffles cost estimation where possible."
This reverts commit 9980c99718.

Caused assertion failures: https://reviews.llvm.org/D115462#3555350
2022-06-03 00:30:34 -07:00
Daniil Suchkov f1940a5895 Revert "[LoopInterchange] New cost model for loop interchange"
Reverting the commit due to numerous buildbot failures.

This reverts commit 006334470d.
2022-06-03 00:52:08 +00:00
Congzhe Cao 006334470d [LoopInterchange] New cost model for loop interchange
This patch proposed to use a new cost model for loop interchange, which
is obtained from loop cache analysis.

Given a loopnest, what loop cache analysis returns is a vector of loops
[loop0, loop1, loop2, ...] where loop0 should be replaced as the outermost
loop, loop1 should be placed one more level inside, and loop2 one more level
inside, etc. What loop cache analysis does is not only more comprehensive than
the current cost model, it is also a "one-shot" query which means that we only
need to query it once during the entire loop interchange pass, which is better
than the current cost model where we query it every time we check whether it is
profitable to interchange two loops. Thus complexity is reduced, especially after
D120386 where we do more interchanges to get the globally optimal loop access pattern.

Updates made to test cases are mostly minor changes and some corrections.
Test coverage for loop interchange is not reduced.

Currently we did not completely remove the legacy cost model, but keep it as
fall-back in case the new cost model did not run successfully. This is because
currently we have some limitations in delinearization, which sometimes makes
loop cache analysis bail out. The longer term goal is to enhance delinearization
and eventually remove the legacy cost model compeletely.

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D124926
2022-06-02 19:07:14 -04:00
Sanjay Patel 8689463bfb [InstCombine] make pattern matching more consistent; NFC
We could go either way on this and several similar matches.
Just matching as a binop is possibly slightly more efficient;
we don't need to re-confirm the opcode of the instruction.
2022-06-02 16:01:23 -04:00
Alexey Bataev 9980c99718 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-02 11:18:14 -07:00
Liqiang Tao 14e8add939 [llvm][ModuleInliner] Refactor InlineSizePriority and PriorityInlineOrder
This patch introduces the abstract base class InlinePriority to serve as
the comparison function for the priority queue.  A derived class, such
as SizePriority, may choose to cache the priorities for different
functions for performance reasons.

This design shields the type used for the priority away from classes
outside InlinePriority and classes derived from it.  In turn,
PriorityInlineOrder no longer needs to be a template class.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D126300
2022-06-02 23:40:26 +08:00
Liqiang Tao 5c6ed60c51 Revert "[llvm][ModuleInliner] Refactor InlineSizePriority and PriorityInlineOrder"
This reverts commit 50de7f1e77.
2022-06-02 23:18:47 +08:00
Liqiang Tao 50de7f1e77 [llvm][ModuleInliner] Refactor InlineSizePriority and PriorityInlineOrder
This patch introduces the abstract base class InlinePriority to serve as
the comparison function for the priority queue.  A derived class, such
as SizePriority, may choose to cache the priorities for different
functions for performance reasons.

This design shields the type used for the priority away from classes
outside InlinePriority and classes derived from it.  In turn,
PriorityInlineOrder no longer needs to be a template class.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D126300
2022-06-02 22:28:33 +08:00
Florian Hahn 4f1c86e3d5
[VPlan] Remove dead VPlan-native special case from BranchOnCount (NFC).
After 05776122b6 this special case doesn't exist any longer.
2022-06-02 12:07:54 +01:00
eopXD 6eab5cade7 [LSR] Early exit for RateFormula when it is already losing. NFC
This patch does not effect any behavior of the current code.

The codebase implicitly implies that `Cost::RateFormula` is only called
when the `Cost` is not in losing status, or else there may be possible
to trigger the assertion of `Cost::isValid`.

The intention here is to prevent mis-use where future development
allow `Cost` that is already loser to call `Cost::RateFormula` - Early
exit when `Cost` is already losing.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D125670
2022-06-01 21:02:40 -07:00
Alexey Bataev 73020b4540 Revert "[SLP]Improve shuffles cost estimation where possible."
This reverts commit fd5a6ce9dc to fix
a crash detected by a buildbot
https://lab.llvm.org/buildbot/#/builders/179/builds/3805/steps/11/logs/stdio.
2022-06-01 15:44:51 -07:00
Florian Hahn 08482830eb
[LV] Update var name to Exiting, in line with terminology (NFC)
Recently the terminology used has been changed from Exit->Exiting in
line with common LLVM loop terminology. Update a remaining use of the
old terminology.
2022-06-01 22:13:29 +01:00
Alexey Bataev fd5a6ce9dc [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-01 11:01:37 -07:00
Alexey Bataev fe4949942d [SLP]Fix PR55796: insert point for extractelements from different basic blocks.
Extractelement instructions may come from different basic blocks, need
to take it into account when looking for a last instruction in the
bundle to prevent compiler crash.

Differential Revision: https://reviews.llvm.org/D126777
2022-06-01 09:44:53 -07:00
Alexander Kornienko aa98e7e1eb Revert "[InstCombine] Combine instructions of type or/and where AND masks can be combined."
This reverts commit ec4adf1f6c. The commit causes
clang to hang on a certain input:
```
$ cat q.cc
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

$ time ./clang-15-10515 --target=x86_64--linux-gnu -O1  -c q.cc
^C

real    0m45.072s
user    0m0.025s
sys     0m0.099s
```
2022-06-01 14:20:00 +02:00
Florian Hahn 05776122b6
[VPlan] Use region for each loop in native path.
This patch updates the VPlan native path to use VPRegionBlocks for all
loops in a loop nest. Up to now, only the outermost loop used a region.

This is a step towards unifying both paths and keep things consistent
between them. It also prepares various code-gen parts for modeling the
pre-header in the inner loop vectorizer (D121624).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123005
2022-06-01 10:41:05 +01:00
Florian Hahn d157019482
[VPlan] Remove unused native utilities incompatible with nested regions.
The implementations of VPlanDominatorTree, VPlanLoopInfo and VPlanPredicator
are all incompatible with modeling loops in VPlans as region without
explicit back-edges.

Those pieces are not actively used and only exercised by a few gtest
unit tests. They are at the moment blocking progress towards unifying
the native and inner-loop vectorizer paths in D121624 and D123005.

I think we should not block forward progress on unused pieces of code,
so this patch removes the utilities for now. The plan is to re-introduce
them as needed in a way that is compatible with the unified VPlan scheme
used in both the inner loop vectorizer and the native path.

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D123017
2022-06-01 09:32:59 +01:00
Eli Friedman abdf0da800 [LoopIdiom] Fix bailout for aliasing in memcpy transform.
Commit dd5991cc modified the aliasing checks here to allow transforming
a memcpy where the source and destination point into the same object.
However, the change accidentally made the code skip the alias check for
other operations in the loop.

Instead of completely skipping the alias check, just skip the check for
whether the memcpy aliases itself.

Differential Revision: https://reviews.llvm.org/D126486
2022-05-31 17:24:23 -07:00
Sanjay Patel 2bf6123f22 [InstCombine] fold icmp of sext bool based on limited range
X <=u (sext i1 Y) --> (X == 0) | Y

https://alive2.llvm.org/ce/z/W_tZzo

This is the conjugate/sibling pattern suggested with D126171
for a sign-extended bool value.
2022-05-31 12:37:56 -04:00
Augie Fackler 73f664601c BuildLibCalls: infer allockind attributes on relevant functions
Differential Revision: https://reviews.llvm.org/D123089
2022-05-31 10:01:17 -04:00
Augie Fackler 42861faa8e attributes: introduce allockind attr for describing allocator fn behavior
I chose to encode the allockind information in a string constant because
otherwise we would get a bit of an explosion of keywords to deal with
the possible permutations of allocation function types.

I'm not sure that CodeGen.h is the correct place for this enum, but it
seemed to kind of match the UWTableKind enum so I put it in the same
place. Constructive suggestions on a better location most certainly
encouraged.

Differential Revision: https://reviews.llvm.org/D123088
2022-05-31 10:01:17 -04:00
Nikita Popov 36cbdaa163 [InstCombine] Fix inbounds preservation when swapping GEPs (PR44206)
When reassociating GEPs, we can only keep inbounds if both original
GEPs were inbounds, and their offsets have the same sign. For the
sake of simplicity, I only handle the case where both offsets are
non-negative here.

It would probably be fine to just not preserve inbounds at all here,
but as I don't see a compile-time impact for adding the
isKnownNonNegative() calls I went with this more conservative
approach.

Fixes https://github.com/llvm/llvm-project/issues/44206.

Differential Revision: https://reviews.llvm.org/D126687
2022-05-31 15:45:02 +02:00
Mel Chen b0fc765350 [NFC] Change LoopVectorizationCostModel::useOrderedReductions() to be a const function.
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D126200
2022-05-31 05:39:13 -07:00
Danila Malyutin 4fb3fd7d82 [InstCombine] Fix const folding of switches with default case
In case phi was in the default block it could lead to multi-edge.
Fixes #55721.

Differential Revision: https://reviews.llvm.org/D126650
2022-05-31 15:13:58 +03:00
Nikita Popov 872d69e5d4 [InstCombine] Fix inbounds preservation when merging GEPs (PR55722)
Even if the total offset is inbounds, we might represent it by first
performing a large negative offset and then a small positive one.
With inbounds semantics as currently specified, each offset must
be inbounds individually, not just the overall offset of the GEP.

Fix this by checking that the sign of all offsets is the same.

Fixes https://github.com/llvm/llvm-project/issues/55722.
2022-05-31 11:54:01 +02:00
Sanjay Patel a0c3c60728 [InstCombine] fold shift-right-by-constant with shift-right-of-constant operand
(C2 >> X) >> C1 --> (C2 >> C1) >> X

The shift-left form of this transform has existed since:
16f18ed7b5

...but it applies to matching shift right opcodes too:
https://alive2.llvm.org/ce/z/c5eQms
2022-05-30 15:30:01 -04:00
Sanjay Patel c5d942a4fb [InstCombine] remove unnecessary one-use check from (C2 << X) << C1 fold
The restriction goes back to:
16f18ed7b5
...but the fold only replaces a shift with a shift, so that's not necessary.
Generalizing to other opcodes is planned as a follow-up.
2022-05-30 15:17:54 -04:00
Nuno Lopes 80b3dcc045 [Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace
There are a few places where we use report_fatal_error when the input is broken.
Currently, this function always crashes LLVM with an abort signal, which
then triggers the backtrace printing code.
I think this is excessive, as wrong input shouldn't give a link to
LLVM's github issue URL and tell users to file a bug report.
We shouldn't print a stack trace either.

This patch changes report_fatal_error so it uses exit() rather than
abort() when its argument GenCrashDiag=false.

Reviewed by: nikic, MaskRay, RKSimon

Differential Revision: https://reviews.llvm.org/D126550
2022-05-30 19:19:23 +01:00
Nikita Popov a770f534e6 [InstCombine] When swapping GEPs, only keep inbounds if both are
If only one of the GEPs is inbounds, then after swapping, there is
no guarantee that one of them will be inbounds as well
(see e.g. https://alive2.llvm.org/ce/z/agaCnp).

This is only a partial fix, because even if both are inbounds, the
result is not necessarily inbounds (if the offsets have different
signs).
2022-05-30 17:04:42 +02:00
Nikita Popov 2d7bab666f [InstCombine] Always create new GEPs when swapping GEPs
As the long explanatory comment attests, performing the modification
in place is pretty tricky. Drop this unnecessary complexity and
always create new instructions.

This should be NFC-ish, but can probably cause difference due to
worklist order.
2022-05-30 16:48:52 +02:00
Nikita Popov 2e101cca69 [Local] Don't remove invoke of non-willreturn function
The code was only checking for memory side-effects, but not for
divergence side-effects. Replace this with a generic check.
2022-05-30 15:37:46 +02:00
zhongyunde 3e6ba89055 [InstCombine] Fold a mul with bool value into and
Fixes https://github.com/llvm/llvm-project/issues/55599
  X * Y --> X & Y, iff X, Y can be only {0, 1}.
https://alive2.llvm.org/ce/z/_RsTKF

Reviewed By: spatel, nikic

Differential Revision: https://reviews.llvm.org/D126040
2022-05-30 21:05:00 +08:00
Nikita Popov 1721ff1dfd [GVN] Enable enable-split-backedge-in-load-pre option by default
This option was added in D89854. It prevents GVN from performing
load PRE in a loop, if doing so would require critical edge
splitting on the backedge. From the review:

> I know that GVN Load PRE negatively impacts peeling,
> loop predication, so the passes expecting that latch has
> a conditional branch.

In the PhaseOrdering test in this patch, splitting the backedge
negatively affects vectorization: After critical edge splitting,
the loop gets rotated, effectively peeling off the first loop
iteration. The effect is that the first element is handled
separately, then the bulk of the elements use a vectorized
reduction (but using unaligned, off-by-one memory accesses) and
then a tail of 15 elements is handled separately again.

It's probably worth noting that the loop load PRE from D99926 is
not affected by this change (as it does not need backedge
splitting). This is about normal load PRE that happens to occur
inside a loop.

Differential Revision: https://reviews.llvm.org/D126382
2022-05-30 09:55:58 +02:00
Max Kazantsev 503d5771b6 [JumpThreading][NFCI] Reuse existing DT instead of recomputation
This whole part with recomputation of BPI and BFI looks redundant,
and we tried to get rid of it in D124439. Unfortunately, it causes
some hard-to-reproduce failures due to invalid state of analysis.
Until this is investigated and fixed, let's try to reuse at least
part of available analyzes.

DT is available at this point, and there is no need to recompute it.

Please revert if you see it causing *any* behavior changes.
2022-05-30 12:48:10 +07:00
Chenbing Zheng ef256ed58e [InstCombine] bitcast (extractelement <1 x elt>, dest) -> bitcast(<1 x elt>, dest)
Only solve dest type is vector to avoid inverse transform in visitBitCast.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125951
2022-05-30 10:16:32 +08:00
Florian Hahn 0776c48f9b
Recommit "[LICM] Only create load in ph when promoting load or store doesn't exec."
This reverts the revert commit ad95255b92.

The updated version also creates a load when the store may not execute.
In those cases, we still need to introduce a load in a function where
there may not have been one before, so this doesn't completely resolve
issue #51248.

Original message:

    When only a store is sunk, there is no need to create a load in the
    pre-header, as the result of the load will never get used.

    The dead load can can introduce UB, if the function is marked as
    writeonly.

    Reviewed By: nikic

    Differential Revision: https://reviews.llvm.org/D123473
2022-05-29 21:57:14 +01:00
Florian Hahn 6abce17fc2
[VPlan] Use Exiting-block instead of Exit-block terminology (NFC).
In LLVM's common loop terminology, an exit block is a block outside a
loop with a predecessor inside the loop. An exiting block is a block
inside the loop which branches to an exit block outside the loop.

This patch updates a few places where VPlan was using ExitBlock for a
block exiting a region. Those instances have been updated to use
ExitingBlock.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126173
2022-05-28 21:16:05 +01:00
eopXD 6a84579243 [LSR][TTI][PowerPC][SystemZ][X86] Add const-ness to TTI::isLSRCostLess. NFC
Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D126350
2022-05-27 15:22:23 -07:00
Sanjay Patel b5b6aa4d53 [InstCombine] fold multiply by signbit-splat to cmp+select
(ashr i32 X, 31) * C --> (X < 0) ? -C : 0
https://alive2.llvm.org/ce/z/G8u9SS

With a constant operand, this is an improvement in IR
and codegen (where it can be converted to a mask op).

Without a constant operand, we would have to negate
the operand, so that is probably better left to the backend.

This is similar but not the same optimization that is requested
in #55618.
2022-05-27 11:54:19 -04:00
Sanjay Patel 5a6e085757 [InstCombine] reduce code duplication; NFC 2022-05-27 11:54:19 -04:00
Enna1 52992f136b Add !nosanitize to FixedMetadataKinds
This patch adds !nosanitize metadata to FixedMetadataKinds.def, !nosanitize indicates that LLVM should not insert any sanitizer instrumentation.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D126294
2022-05-27 09:46:13 +08:00
Arthur Eubanks 36096c2b38 [NFC][JumpThreading] Remove InsertFreezeWhenUnfoldingSelect pass parameter
All callers pass true.

select-unfold-freeze.ll is now a subset of select.ll so delete it.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126501
2022-05-26 16:13:34 -07:00
Sanjay Patel c4c750058f [InstCombine] fold mul of signbit directly to X < 0 ? Y : 0
This is effectively NFC (intentionally no test diffs)
because we already have the related fold that converts
the 'and' pattern to select. So this is just an efficiency
improvement.
2022-05-26 16:19:15 -04:00
Sanjay Patel 49f8b05137 [InstCombine] fold icmp equality with sdiv and SMIN
This extends the fold from D126410 / 3952c905ef
to allow for the only case where it works with signed
division:
https://alive2.llvm.org/ce/z/k7_ypu

(X s/ Y) == SMIN --> (X == SMIN) && (Y == 1)
(X s/ Y) != SMIN --> (X != SMIN) || (Y != 1)

This is another improvement based on #55695.
2022-05-26 16:19:15 -04:00
Sanjay Patel ed5be1523f [InstCombine] reduce code duplication in icmp+div folds; NFC 2022-05-26 16:19:15 -04:00
Owen Anderson 939a43461b Revert "Replace the custom linked list in LeaderTableEntry with TinyPtrVector."
This reverts commit 1e91149844.

Pending further discussion.
2022-05-26 09:50:36 -07:00
Nikita Popov c8eb83f2d0 [ControlHeightReduction] Use logical and
Use logical instead of bitwise and to combine conditions, to avoid
propagating poison from a later condition if an earlier one is
already false. This avoids introducing branch on poison.

Differential Revision: https://reviews.llvm.org/D125898
2022-05-26 18:03:35 +02:00
Alexey Bataev 7b809c30b9 [SLP]Improve compile time, NFC.
Patch improves compile time. For function calls, which cannot be
vectorized, create a unique group for each such a call instead of
subgroup. It prevents them from being grouped by a subgroups and
attempts for their vectorization.

Also, looks through casts operand to try to check their
groups/subgroups.

Reduces number of vectorization attempts. No changes in the statistics
for SPEC2017/2006/llvm-test-suite.

Differential Revision: https://reviews.llvm.org/D126476
2022-05-26 08:40:59 -07:00
Alexey Bataev 120d52b0ef [SLP]Fix PR55653: emit undefs where required, not poison.
Need to handle a corner case correctly, if all elements are Undefs/Poisons,
need to emit actual values, not just poisons.

Differential Revision: https://reviews.llvm.org/D126298
2022-05-26 08:38:50 -07:00
Alex Zhikhartsev 8b0d763474 [DFAJumpThreading] Relax analysis to handle unpredictable initial values
Responding to a feature request from the Rust community:

https://github.com/rust-lang/rust/issues/80630

    void foo(X) {
      for (...)
	switch (X)
	  case A
	    X = B
	  case B
	    X = C
    }

Even though the initial switch value is non-constant, the switch
statement can still be threaded: the initial value will hit the switch
statement but the rest of the state changes will proceed by jumping
unconditionally.

The early predictability check is relaxed to allow unpredictable values
anywhere, but later, after the paths through the switch statement have
been enumerated, no non-constant state values are allowed along the
paths. Any state value not along a path will be an initial switch value,
which can be safely ignored.

Differential Revision: https://reviews.llvm.org/D124394
2022-05-26 11:29:54 -04:00
Simon Pilgrim 14258d6fb5 [SLP] Move canVectorizeLoads implementation to simplify the diff in D105986. NFC. 2022-05-26 15:23:58 +01:00
Alexey Bataev 9139d484d4 [SLP]Fix crash on reordering of ScatterVectorize nodes.
ScatterVectorize nodes should be handled same way as gathers in
reorderBottomToTop function, since we can simple reorder the loads in
this node. Because of that need to include such nodes to the list of
gathered nodes to fix compiler crash.

Differential Revision: https://reviews.llvm.org/D126378
2022-05-26 06:25:58 -07:00
Sanjay Patel 3952c905ef [InstCombine] fold icmp equality with udiv and large constant
With large compare constant:
(X u/ Y) == C --> (X == C) && (Y == 1)
(X u/ Y) != C --> (X != C) || (Y != 1)

https://alive2.llvm.org/ce/z/EhKwh6

There are various potential missing icmp (div) transforms shown here:
https://github.com/llvm/llvm-project/issues/55695

This is a generalization for part of the udiv + equality.
I didn't check in detail, but some of those may only make sense as
codegen transforms.

This results in one extra instruction in IR, but it is better for
analysis, and looks much better in codegen on all targets that I tried.

Differential Revision: https://reviews.llvm.org/D126410
2022-05-26 09:08:47 -04:00
Florian Hahn f96aa493f0
[SimpleLoopUnswitch] Always skip trivial select and set condition.
When updating the branch instruction outside the loopduring non-trivial
 unswitching, always skip trivial selects and update the condition.

Otherwise we might create invalid IR, because the trivial select is
inside the loop, while the condition is outside the loop.

Fixes #55697.
2022-05-26 09:46:24 +01:00
Florian Hahn 390c0ac28d
[LV] Fix indentation in tryToCreateWidenRecipe (NFC). 2022-05-26 08:53:34 +01:00
Owen Anderson 1e91149844 Replace the custom linked list in LeaderTableEntry with TinyPtrVector.
The purpose of the custom linked list was to optimize for the case
of a single-element list. It turns out that TinyPtrVector handles
the same basic scenario even better, reducing the size of
LeaderTableEntry by 33%, and requiring only log2(N) allocations
as the size of the list grows. The only downside is that we have
to store the Value's and BasicBlock's in separate vectors, which
is slightly awkward in a few cases. Fortunately that ends up being
entirely encapsulated inside helper functions.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D125205
2022-05-25 23:52:44 -07:00
Serguei Katkov c2eccc67ce [GuardWidening] Remove nuw/nsw flags for hoisted instructions
When we hoist instructions over guard we must clear flags due to these flags
might be implied using this guard, so they make sense only after the guard.

As an example of the bug due to current behavior.
L is known to be in range say [0, 100)
c1 = x u< L
guard (c1)
x1 = add x, 1
c2 = x1 u< L
guard(c2)

basing on guard(c1) we can say that x1 = add nuw nsw x, 1
after guard widening we get
c1 = x u< L
x1 = add nuw nsw x, 1
c2 = x1 u< L
c = and c1, c2
guard(c)

now, basing on fact that x + 1 < L and x >= 0 due to x + 1 is nuw
we can prove that x + 1 u< L implies that x u< L, so we can just remove c1
x1 = add nuw nsw x, 1
c2 = x1 u< L
guard(c2)

But that is not correct due to we will pass x == -1 value.

Reviewed By: mkazantsev
Subscribers: llvm-commits, nikic
Differential Revision: https://reviews.llvm.org/D126354
2022-05-26 13:20:55 +07:00