Commit Graph

28640 Commits

Author SHA1 Message Date
David Sherwood c2634fc6ab [Analysis] Fix issues when querying vscale attributes on functions
There are several places in the code that are currently broken as
they assume an Instruction always has a parent Function when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction has a parent.

I've added a test for a parentless @llvm.vscale intrinsic call here:

  unittests/Analysis/ValueTrackingTest.cpp

Differential Revision: https://reviews.llvm.org/D110158
2021-09-24 09:58:10 +01:00
Nikita Popov 1e3c6fc7cb [JumpThreading] Ignore free instructions
This is basically D108837 but for jump threading. Free instructions
should be ignored for the threading decision. JumpThreading already
skips some free instructions (like pointer bitcasts), but does not
skip various free intrinsics -- in fact, it currently gives them a
fairly large cost of 2.

Differential Revision: https://reviews.llvm.org/D110290
2021-09-23 18:28:36 +02:00
Fangrui Song 1a6e1ee42a Resolve {GlobalValue,GlobalIndirectSymbol}::getBaseObject confusion
While both GlobalAlias and GlobalIFunc are GlobalIndirectSymbol, their
`getIndirectSymbol()` usage is quite different (GlobalIFunc's resolver
is an entity different from GlobalIFunc itself).

As discussed on https://lists.llvm.org/pipermail/llvm-dev/2020-September/144904.html
("[IR] Modelling of GlobalIFunc"), the name `getBaseObject` is confusing when
used with GlobalIFunc.

To resolve the confusion:

* Move GlobalIndirectSymbol::getBaseObject to GlobalAlias:: (GlobalIFunc should use `getResolver` instead)
* Change GlobalValue::getBaseObject not to inspect GlobalIFunc. Note: the function has 7 references.
* Add GlobalIFunc::getResolverFunction to peel off potential ConstantExpr indirection
  (`strlen` in `test/LTO/Resolution/X86/ifunc.ll`)

Note: GlobalIFunc::getResolver (like GlobalAlias::getAliasee which does not peel
off ConstantExpr indirection) is kept to be used by ValueEnumerator.

Reviewed By: ibookstein

Differential Revision: https://reviews.llvm.org/D109792
2021-09-23 09:23:35 -07:00
Simon Pilgrim 5f2c53bdf4 Pass some DataLayout arguments by const-ref
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-23 15:50:31 +01:00
Sanjay Patel bb9333c350 [InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)
The 1st try at this was reverted because it caused an infinite loop in instcombine.
That should be fixed after:
1cd6b44f26

(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.
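
For illustration, a small IR sketch of the pattern (hypothetical test, not from the patch; the mask guarantees the high bits are not demanded):

  define i8 @narrow_shr(i32 %x) {
    %sh = lshr i32 %x, 3
    %tr = trunc i32 %sh to i8
    %r = and i8 %tr, 15
    ; expected to narrow to roughly:
    ;   %tr2 = trunc i32 %x to i8
    ;   %sh2 = lshr i8 %tr2, 3
    ;   %r2  = and i8 %sh2, 15
    ret i8 %r
  }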

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170
2021-09-23 09:41:37 -04:00
Florian Hahn 5ce89279c0
[DSE] Track earliest escape, use for loads in isReadClobber.
At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.

Doing context-sensitive escape queries in isReadClobber has been removed
a while ago in d1a1cce5b1 to save compile-time. See PR50220 for more
context.

This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than an instruction
B if A dominates B. If two escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses can be
analyzed, the earliest escape is set to the first instruction in the
function entry block.

If the query instruction dominates the earliest escape and is not in a
cycle, then the pointer does not escape before the query instruction.

This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.
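
A rough IR sketch of the kind of case this enables (hypothetical function, not taken from the patch's tests):

  define i32 @remove_dead_store(i32* %p) {
    %a = alloca i32
    store i32 1, i32* %a          ; dead: overwritten by the later store
    %v = load i32, i32* %p        ; cannot read %a, which has not escaped yet
    store i32 %v, i32* %a         ; killing store
    call void @use(i32* %a)       ; %a escapes only here, after the load
    ret i32 %v
  }
  declare void @use(i32*)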

I will share a follow-up patch to also use the information for call
instructions to fix PR50220.

In terms of compile-time, the impact is low in general:
    NewPM-O3: +0.05%
    NewPM-ReleaseThinLTO: +0.05%
    NewPM-ReleaseLTO-g: +0.03%

with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%

The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109844
2021-09-23 12:45:05 +01:00
Bjorn Pettersson 85a586501b [BasicBlockUtils] Fixup of an assumed typo in MergeBlockIntoPredecessor
The NFC commit e5692a564a changed the logic for
DomTreeUpdates to use the range [succ_begin, succ_begin) when
looking for SuccsOfPredBB rather than using [succ_begin, succ_end).

As the commit was intended to be NFC, this is identified as a typo (it has
been discussed briefly on Phabricator).

The typo was found when inspecting the code, so I've got no idea if
changing back to the old range has any significant impact (such as
solving any PRs or causing some new problems). But at least this
restores the code to the originally intended behavior.
2021-09-23 13:03:26 +02:00
Alex Richardson 05663dc146 [InstSimplify] Don't lose inbounds when simplifying a GEP
I noticed this while working on a (ptrtoint (gep null, x)) -> x fold.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D110168
2021-09-23 09:25:06 +01:00
Bjorn Pettersson c5e0313e44 [ModuleInlinerWrapperPass] Do some naive printing of wrapped pipeline with -print-pipeline-passes
Bisecting and reducing opt pipelines that include the
ModuleInlinerWrapperPass has turned out to be a bit problematic.
This is far from perfect (it still lacks information about inline
advisor params etc.), but it should give some kind of hint to what
the wrapped pipeline looks like when using -print-pipeline-passes.

Reviewed By: aeubanks, mtrofin

Differential Revision: https://reviews.llvm.org/D109878
2021-09-23 09:54:42 +02:00
Johannes Doerfert c6457dcae8 [OpenMP][FIX] Be more deliberate about invalidating the AAKernelInfo state
This patch fixes a problem where the AAKernelInfo state was invalidated,
e.g., due to `optnone` for a kernel, but not all parts indicated the
invalidation properly. We further eliminate most full state
invalidations as they should never be necessary.

Differential Revision: https://reviews.llvm.org/D109468
2021-09-23 00:04:30 -05:00
Johannes Doerfert 0a16c56010 [OpenMP][NFC] Improve debug output 2021-09-23 00:04:29 -05:00
hyeongyu kim 10a5632550 [NFC][InstCombine] Fix inconsistent comments 2021-09-23 09:31:39 +09:00
Shilei Tian 423d34f74a [OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit`
This is a follow-up of D110029, which uses a bitset to indicate the execution mode. This patch makes the corresponding changes in the function calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110279
2021-09-22 17:16:41 -04:00
Sanjay Patel 1cd6b44f26 [InstCombine] add one-use check to shift-shift transform
We don't want to create extra instructions, and this
could infinite loop with the proposed transform in D110170.
2021-09-22 16:31:12 -04:00
Arthur Eubanks e7249e4acf [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest
When determining whether to fold branches to a common destination by
merging two blocks, SimplifyCFG will count the number of instructions to
be moved into the first basic block. However, there's no reason to count
free instructions like bitcasts and other similar instructions.

This resolves missed branch foldings with -fstrict-vtable-pointers in
llvm-test-suite's lambda benchmark.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D108837
2021-09-22 09:52:37 -07:00
Simon Pilgrim 8a44281f47 [SLP] getReductionCost - use explicit TTI::TCK_RecipThroughput CostKind. NFCI.
Avoid relying on the default cost kinds in TTI calls (we already do this in other places in SLP) - noticed while trying to see how much work it'd be to extend D110242 and remove all remaining uses of default CostKind arguments.
2021-09-22 16:52:22 +01:00
hyeongyu kim 98e96663f6 [InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (3/3)
This patch is for fixing potential shufflevector-related bugs like D93818.
As in D93818, this patch changes shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps.
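
For illustration, a single-input shuffle created by InstCombine now uses poison rather than undef for the unused second operand (hypothetical example, not from the tests):

  define <4 x i32> @reverse(<4 x i32> %v) {
    %rev = shufflevector <4 x i32> %v, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
    ret <4 x i32> %rev
  }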

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110230
2021-09-23 00:48:24 +09:00
Shilei Tian ca999f7191 [OpenMP][Offloading] Use bitset to indicate execution mode instead of value
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - generic mode
- 2 - SPMD mode execution with generic mode semantics

We are going to add support for SIMD execution mode. It will come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible enough.

This patch changes to a bitset-based solution to encode the execution mode. Each
bit position means:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)

In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future, after we add support for
SIMD mode, `0b1xx` will indicate SIMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110029
2021-09-22 11:40:52 -04:00
hyeongyu kim ec8311444a [InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (2/3)
This patch is for fixing potential shufflevector-related bugs like D93818.
As in D93818, this patch changes shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110227
2021-09-23 00:14:50 +09:00
hyeongyu kim e5aaf03326 [InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (1/3)
This patch is for fixing potential shufflevector-related bugs like D93818.
As in D93818, this patch changes shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110226
2021-09-22 23:18:51 +09:00
Joseph Huber 1cf86df883 [OpenMP] Make sure the Thread ID function is not removed
Summary:
The thread ID function was reintroduced in D110195, but could
potentially be removed by the optimizer. Make the function noinline to
preserve the call sites and add it to the externalization RAII so its
definition is not removed by the attributor.
2021-09-22 10:13:18 -04:00
Florian Hahn a7c6471a85
[Passes] Run vector-combine early with -fenable-matrix.
IR with matrix intrinsics is likely to also contain large vector
operations, which can benefit from early simplifications.

This is the last step in a series of changes to improve code-gen for
code using matrix subscript operators with the C/C++ matrix extension in
Clang, like

    using matrix_t = double __attribute__((matrix_type(15, 15)));

    void foo(unsigned i, matrix_t &A, matrix_t &B) {
      for (unsigned j = 0; j < 4; ++j)
        for (unsigned k = 0; k < i; k++)
          B[k][j] -= A[k][j] * B[i][j];
    }

https://clang.godbolt.org/z/6dKxK1Ed7

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D102496
2021-09-22 12:48:32 +01:00
Sanjay Patel c6013f71a4 Revert "[InstCombine] fold cast of right-shift if high bits are not demanded"
This reverts commit 2f6b07316f.

This caused several bots to hit an infinite loop at stage 2,
so it needs to be reverted while figuring out how to fix that.
2021-09-22 07:45:21 -04:00
Yi Kong d0746f2e9b Don't fold (select C, (gep Ptr, Idx), Ptr) if C is vector but Idx is scalar
The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C,
Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar.

  SELECT VecC, ScalarIdx, 0

We could splat Idx to a vector, but that defeats the purpose of the
optimisation. Don't apply the folding rule in this case.
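
A rough IR sketch of the problematic shape (hypothetical function) that the fold now leaves alone:

  define <2 x i32*> @no_fold(<2 x i1> %c, <2 x i32*> %p, i64 %idx) {
    %gep = getelementptr i32, <2 x i32*> %p, i64 %idx
    %sel = select <2 x i1> %c, <2 x i32*> %gep, <2 x i32*> %p
    ; folding would need "select <2 x i1> %c, i64 %idx, i64 0", which is malformed
    ret <2 x i32*> %sel
  }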

This fixes a regression from commit d561b6fbdb.
2021-09-22 18:11:33 +08:00
Florian Mayer 36daf074d9 [hwasan] also omit safe mem[cpy|mov|set].
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D109816
2021-09-22 11:08:27 +01:00
Florian Hahn 300870a95c
[VectorCombine] Switch to using a worklist.
This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.

Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.

Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D110171
2021-09-22 09:54:58 +01:00
Florian Hahn e08a5dc86f
[InstCombine] Move InstCombineWorklist to Utils to allow reuse (NFC).
InstCombine's worklist can be re-used by other passes like
VectorCombine. Move it to llvm/Transforms/Utils and rename it to
InstructionWorklist.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110181
2021-09-22 08:47:21 +01:00
Wenlei He 5f187f0afa [SamplePGO] Add switch to honor zero count on block level as accurate
Add a new LLVM switch `-profile-sample-block-accurate` to trust zero block counts for branches. Currently we leave out such zero counts when annotating branch weight metadata, which would lead to weights being considered as unknown.

Differential Revision: https://reviews.llvm.org/D110117
2021-09-21 17:06:37 -07:00
Sanjay Patel 2f6b07316f [InstCombine] fold cast of right-shift if high bits are not demanded
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C

Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.

Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF

Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb

Differential Revision: https://reviews.llvm.org/D110170
2021-09-21 16:09:08 -04:00
Nikita Popov e4a1af3724 [MergeICmps] Remove unused NumMerged variable 2021-09-21 21:43:25 +02:00
Nikita Popov f2fa6ad047 [MergeICmps] Don't reorder unmerged comparisons
MergeICmps will currently sort (by offset) all comparisons in a chain,
including those that do not get merged. This is problematic in two ways:

 * We may end up moving the original first block into the middle of
   the chain, in which case the "extra work" instructions will also
   be in the middle of the chain, resulting in invalid IR
   (reported in https://reviews.llvm.org/D108782#3005583).
 * Reordering branches is generally not legal, because it may
   introduce branch on poison, which is UB (PR51845). The merging
   done by MergeICmps is legal as long as we assume that memcmp()
   works on frozen memory, but the reordering of unmerged comparisons
   is definitely incorrect (without inserting freeze instructions),
   so we should avoid it.

There are easier ways to fix the first issue, but I figured it was
worthwhile to do this properly to also fix the second one. What we
now do is to restore the original relative order of (potentially
merged) comparisons.

I took the liberty of dropping the MERGEICMPS_DOT_ON functionality,
because it would be more awkward to implement now (as the before and
after representation is different) and it doesn't seem terribly
useful nowadays.

Differential Revision: https://reviews.llvm.org/D110024
2021-09-21 21:22:12 +02:00
Owen Anderson b5fbbdd202 Teach InstCombine to eliminate malloc-realloc-free triplets.
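
A minimal IR sketch of the eliminated pattern (hypothetical example; assumes the allocation has no other uses):

  define void @triplet() {
    %m = call i8* @malloc(i64 8)
    %r = call i8* @realloc(i8* %m, i64 16)
    call void @free(i8* %r)
    ret void
  }
  declare i8* @malloc(i64)
  declare i8* @realloc(i8*, i64)
  declare void @free(i8*)
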
Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D109988
2021-09-21 18:07:49 +00:00
Danila Malyutin 78b51c7a2c [LSR] Make sure that Factor fits into Base type
Fixes pr42770

Differential Revision: https://reviews.llvm.org/D108772
2021-09-21 20:50:50 +03:00
Ayal Zaks ab6a69dfea [LV] Fix crash for reverse interleaved loads with gap under fold-tail.
This patch fixes the crash found by PR51614:
whenever doing tail folding, interleave groups must be considered under mask.

Another fix D108900 follows for targets that support masked loads and stores:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse - which is currently not supported; rather than (only) asserting when
computing cost and generating code.

Differential Revision: https://reviews.llvm.org/D108891
2021-09-21 20:13:32 +03:00
Dávid Bolvanský c0fdfc9af2 [InstCombine] powi(x, y) * powi(x, z) -> powi(x, y + z)
We already have the pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing the same transformation for powi (where the power is an integer).

Requires reassoc.
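
A small IR sketch (hypothetical test, not from the patch; reassoc on the calls and the multiply):

  define double @powi_powi(double %x, i32 %y, i32 %z) {
    %p1 = call reassoc double @llvm.powi.f64.i32(double %x, i32 %y)
    %p2 = call reassoc double @llvm.powi.f64.i32(double %x, i32 %z)
    %m = fmul reassoc double %p1, %p2
    ; expected: %s = add i32 %y, %z
    ;           %m = call reassoc double @llvm.powi.f64.i32(double %x, i32 %s)
    ret double %m
  }
  declare double @llvm.powi.f64.i32(double, i32)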

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109954
2021-09-21 18:20:46 +02:00
Florian Hahn 5131037ea9
[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.

The VectorCombine use is updated to pass through the DT to enable
additional scalarization.

Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110175
2021-09-21 16:54:47 +01:00
Anna Thomas 69921f6f45 [InstCombine] Improve TryToSinkInstruction with multiple uses
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.

Also added an API for retrieving a unique undroppable user.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109700
2021-09-21 10:04:04 -04:00
Simon Pilgrim fc8f1e4419 [InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824)
If getAggregateElement() returns null for any element, early out, as otherwise we will assert when creating a new constant vector.

Fixes PR51824; OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057
2021-09-21 13:01:09 +01:00
Simon Pilgrim f5d23d36de RewriteStatepointsForGC - Use const-ref iterator in for-range loops. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-21 13:01:08 +01:00
David Stenberg 7b4cc09b14 [LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist
This fixes PR51730, a heap-use-after-free bug in
replaceConditionalBranchesOnConstant().

With the attached reproducer we were left with a function looking
something like this after replaceAndRecursivelySimplify():

  [...]

  cont2.i:
    br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i

  handler.type_mismatch3.i:
    %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
    unreachable

  cont4.i:
    unreachable

  [...]

with both the branch instruction and PHI node being in the worklist. As
a result of replacing the branch instruction with an unconditional
branch, the PHI node in %handler.type_mismatch3.i would be removed. This
then resulted in a heap-use-after-free bug due to accessing that removed
PHI node in the next worklist iteration.

This is solved by using a value handle worklist. I am unsure if this
is the most idiomatic solution. Another solution could have been to
produce a worklist just containing the interesting branch instructions,
but I thought that it perhaps was a bit cleaner to keep all worklist
filtering in the loop that does the rewrites.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D109221
2021-09-21 11:33:07 +02:00
Evgeniy Brevnov 129cf33604 [DSE][NFC] Rename Later->Killing, Earlier->Dead
The first (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" as the base for names in DSE. For example, [Maybe]DeadLoc is a location killed by the KillingI instruction. I believe such names are more descriptive and easier to understand than the current ones.

Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too.

Third, reordered the parameters of isPartialOverwrite, tryToMergePartialOverlappingStores and isOverwrite to make them consistent with each other. This greatly reduces potential mistakes.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D106947
2021-09-21 13:44:12 +07:00
Max Kazantsev 073b254cff [SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block
When following a case of a switch instruction is guaranteed to lead to
UB, we can safely break these edges and redirect those cases into a newly
created unreachable block. As a result, the CFG becomes simpler and we can
remove some of the Phi inputs to make further analyses easier.
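
A rough IR sketch of the kind of case this targets (hypothetical example; the store to null makes the case block immediate UB):

  define i32 @switch_ub(i32 %x) {
  entry:
    switch i32 %x, label %exit [ i32 0, label %case0 ]
  case0:                                     ; guaranteed UB
    store i32 1, i32* null
    br label %exit
  exit:
    %r = phi i32 [ 0, %entry ], [ 1, %case0 ]
    ret i32 %r
  }

The case-0 edge can be redirected to a new unreachable block, and the corresponding phi input removed.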

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D109428
Reviewed By: lebedev.ri
2021-09-21 10:45:19 +07:00
Kazu Hirata 85b4b21c8b [llvm] Use make_early_inc_range (NFC) 2021-09-20 19:30:02 -07:00
Usman Nadeem f417d9d821 [InstCombine] Eliminate vector reverse if all inputs/outputs to an instruction are reverses
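
A rough IR sketch of the kind of pattern that can now be simplified (hypothetical example, not from the patch's tests):

  define <4 x i32> @add_of_reverses(<4 x i32> %a, <4 x i32> %b) {
    %ra = call <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32> %a)
    %rb = call <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32> %b)
    %add = add <4 x i32> %ra, %rb
    %r = call <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32> %add)
    ; expected to simplify to: %r = add <4 x i32> %a, %b
    ret <4 x i32> %r
  }
  declare <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32>)
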
Differential Revision: https://reviews.llvm.org/D109808

Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e
2021-09-20 18:32:24 -07:00
Florian Mayer 16b5f4502c [NFC] [hwasan] Separate outline and inline instrumentation.
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D110067
2021-09-20 21:49:09 +01:00
Nikita Popov dd0226561e [IR] Add helper to convert offset to GEP indices
We implement logic to convert a byte offset into a sequence of GEP
indices for that offset in a number of places. This patch adds a
DataLayout::getGEPIndicesForOffset() method, which implements the
core logic. I've updated SROA, ConstantFolding and InstCombine to
use it, and there are a few more places where it looks relevant.
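
For illustration, a hypothetical sketch of the mapping the helper computes: a byte offset into an aggregate becomes a sequence of GEP indices.

  %struct.S = type { i32, [4 x i32] }

  ; byte offset 12 into %struct.S corresponds to indices 0, 1, 2:
  define i32* @offset_12(%struct.S* %s) {
    %p = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1, i64 2
    ret i32* %p
  }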

Differential Revision: https://reviews.llvm.org/D110043
2021-09-20 20:18:16 +02:00
Alexey Bataev bc69dd62c0 [SLP]Improve graph reordering.
Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.

The patch provides a two-way approach to the graph reordering problem. First, all
reordering is done in place; it does not require tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph nodes.

The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. It then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered and their users too.
This allows removing double shuffles to the same ordering of the operands in
many cases and just reordering the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows removing an extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, the patch improves the cost model for gathering of loads, which improves
the x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, and improves most of the other benchmarks.
The compile and link times are almost the same, though in some cases they
should be better (we're not doing extra instruction scheduling
anymore), and we may vectorize more code for the large basic blocks again
because of the saved scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-09-20 08:42:19 -07:00
Bjorn Pettersson c8cb7f611f [NewPM] Make InlinerPass (aka 'inline') a parameterized pass
In default pipelines the ModuleInlinerWrapperPass is adding the
InlinerPass to the pipeline twice, once due to MandatoryFirst (passing
true in the ctor) and then a second time with false as argument.

To make it possible to bisect and reduce opt test cases for this
part of the pipeline we need to be able to choose between the two
different variants of the InlinerPass when running opt. This patch is
changing 'inline' to a CGSCC_PASS_WITH_PARAMS in the PassRegistry,
making it possible to run opt with both -passes=cgscc(inline) and
-passes=cgscc(inline<only-mandatory>).

Reviewed By: aeubanks, mtrofin

Differential Revision: https://reviews.llvm.org/D109877
2021-09-20 12:52:52 +02:00
Max Kazantsev e9d34c5429 [NFC] Add assert and test showing that revert of D109596 wasn't justified
All IndVars transforms have a prerequisite requirement of LCSSA and LoopSimplify
form and rely on it. Added a test that shows that this actually holds.
2021-09-20 12:01:12 +07:00
Max Kazantsev 471217cff8 Revert "Revert "[IndVars] Replace PHIs if loop exits on 1st iteration""
This reverts commit 6fec6552f5.

The patch was reverted on the incorrect claim that this patch may break LCSSA form
when the loop is not in simplified form. All IndVars transforms ensure that
the loop is in simplified and LCSSA form, so if it wasn't broken before this
transform, it will also not be broken after it.
2021-09-20 12:01:10 +07:00
Kazu Hirata 84b07c9b3a [llvm] Use pop_back_val (NFC) 2021-09-19 13:44:23 -07:00
Chris Jackson 5ba8020326 [DebugInfo][LSR] Emit shorter expressions from scev-based salvaging
The scev-based salvaging for LSR can sometimes produce unnecessarily
verbose expressions. This patch adds logic to detect when the value to
be recovered and the induction variable differ by only a constant
offset. Then, the expression to derive the current iteration count can
be omitted from the dbg.value in favour of the offset.

Reviewed by: aprantl

Differential Revision: https://reviews.llvm.org/D109044
2021-09-19 21:41:44 +01:00
Joseph Huber 27905eeb89 [Attributor] Change AAExecutionDomain to check intrinsic edges
The AAExecutionDomain instance checks if a BB is executed by the main
thread only. Currently, this only checks the `__kmpc_kernel_init` call
for generic regions to indicate the path taken by the main thread. In
the new runtime, we want to be able to detect basic blocks even in SPMD
mode. For this we enable it to check thread-ID intrinsics being compared
to zero as well.
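
A rough sketch of the IR pattern this recognizes (hypothetical AMDGPU example; the corresponding NVPTX thread-ID intrinsic would be handled analogously):

  define void @kernel_part() {
  entry:
    %tid = call i32 @llvm.amdgcn.workitem.id.x()
    %is.main = icmp eq i32 %tid, 0
    br i1 %is.main, label %main.only, label %exit
  main.only:                                 ; executed by the main thread only
    br label %exit
  exit:
    ret void
  }
  declare i32 @llvm.amdgcn.workitem.id.x()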

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109849
2021-09-17 19:51:38 -04:00
Arthur Eubanks 0db9481208 [NFC] Remove FIXMEs about calling LLVMContext::yield()
Nobody has complained about this, and the documentation for
LLVMContext::yield() states that LLVM is allowed to never call it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D110008
2021-09-17 14:59:34 -07:00
Andrew Browne c533b88a6d [DFSan] Add force_zero_label abilist option to DFSan. This can be used as a work-around for overtainting.
Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D109847
2021-09-17 12:57:40 -07:00
Kazu Hirata e2febc2ed4 [llvm] Use drop_begin (NFC) 2021-09-17 09:16:40 -07:00
Sanjay Patel 41ff7612b3 [InstCombine] allow splat vectors for narrowing masked fold
Mostly cosmetic diffs, but the use of m_APInt matches splat constants.
2021-09-17 11:24:16 -04:00
Simon Pilgrim 77f6c0bcaa Fix Wdocumentation warnings. NFCI.
Fix parameter name typos and drop return statements from void functions
2021-09-17 12:45:56 +01:00
Sjoerd Meijer 97cc678cc4 [FuncSpec] Specialising on addresses of const global values.
This introduces an option to allow specialising on the address of global
values. This option is off by default because it is likely not that profitable
to do so and needs more investigation. Before, we were specialising on addresses
and thus this changes the default behaviour.

Differential Revision: https://reviews.llvm.org/D109775
2021-09-17 08:07:05 +01:00
Christudasan Devadasan 167ff5280d [GlobalOpt] Do not shrink global to bool for an unfavorable AS
Do not call `TryToShrinkGlobalToBoolean` for address spaces
that don't allow initializers. It inserts an initializer value
while shrinking to bool. Used the target hook introduced with
D109337 to skip this call for the restricted address spaces.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109823
2021-09-16 23:13:30 -04:00
Daniil Suchkov 0e36288318 [LoopPredication] Report changes correctly when attempting loop exit predication
To make the IR easier to analyze, this pass makes some minor transformations.
After that, even if it doesn't decide to optimize anything, it can't report that
it changed nothing and preserved all the analyses.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D109855
2021-09-16 22:49:55 +00:00
Jon Roelofs 4b19e7dfae [LoopIdiomRecognize][Remarks] Track loop-strided store to/from blocks
Differential revision: https://reviews.llvm.org/D109929
2021-09-16 15:46:26 -07:00
Teresa Johnson 88cb3e2cb6 [MemProf] Don't instrument stack accesses unless requested
Skip stack accesses unless requested, as the memory profiler runtime
does not currently look at or report accesses for these addresses.

Differential Revision: https://reviews.llvm.org/D109868
2021-09-16 12:21:51 -07:00
Nikita Popov 0fc624f029 [IR] Return AAMDNodes from Instruction::getMetadata() (NFC)
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.

Differential Revision: https://reviews.llvm.org/D109852
2021-09-16 21:06:57 +02:00
Arthur Eubanks d49cb5b303 [SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest
This makes some tests in vector-reductions-logical.ll more stable when
applying D108837.

The cost of branching is higher when vector ops are involved due to
potential SLP transformations.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D108935
2021-09-16 10:50:36 -07:00
Dávid Bolvanský a4a426c9e0 [InstCombine] Added llvm.powi optimizations
If power is even:
powi(-x, p) -> powi(x, p)
powi(fabs(x), p) -> powi(x, p)
powi(copysign(x, y), p) -> powi(x, p)
2021-09-16 19:42:21 +02:00
Kazu Hirata cfc7402419 [llvm] Use drop_begin (NFC) 2021-09-16 08:46:26 -07:00
Bjorn Pettersson d9fc3d879e [NewPM] Replace 'kasan-module' by 'asan-module<kernel>'
Change the asan-module pass into a MODULE_PASS_WITH_PARAMS in the
pass registry, and add a single parameter called 'kernel' that
can be set instead of having a special pass name 'kasan-module'
to trigger that special pass config.

Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.

This is a follow-up to D105006 and D105007.
2021-09-16 14:58:42 +02:00
Bjorn Pettersson 8f8616655c [NewPM] Use a separate struct for ModuleThreadSanitizerPass
Split ThreadSanitizerPass into ThreadSanitizerPass (as a function
pass) and ModuleThreadSanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.

This is a follow-up to D105006 and D105007.
2021-09-16 14:58:42 +02:00
Bjorn Pettersson ab41eef9ac [NewPM] Use a separate struct for ModuleMemorySanitizerPass
Split MemorySanitizerPass into MemorySanitizerPass (as a function
pass) and ModuleMemorySanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.

This is a follow-up to D105006 and D105007.
2021-09-16 14:58:42 +02:00
Anton Afanasyev 6a5f49a1ac [AggressiveInstCombine] Add `{insert/extract}element` to `TruncInstCombine` DAG
Alive2 for `{insert/extract}element`: https://alive2.llvm.org/ce/z/hwy_E-

Actually, not a single file of the test suite is touched by this change,
which means this is a rare pattern not generated by the frontend. But
it's worth having in place.

Differential Revision: https://reviews.llvm.org/D109236
2021-09-16 11:24:31 +03:00
Kazu Hirata 24c8eaec94 [Transforms] Use make_early_inc_range (NFC) 2021-09-15 19:55:24 -07:00
Owen Anderson 68079ef0eb Teach SimplifyCFG to fold switches into lookup tables in more cases.
In particular, it couldn't handle cases where lookup table constant
expressions involved bitcasts. This does not seem to come up
frequently in C++, but comes up reasonably often in Rust via
`#[derive(Debug)]`.

Originally reported by pcwalton.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109565
2021-09-15 22:07:08 +00:00
Anna Thomas f9e4aebe4a Revert "[InstCombine] Improve TryToSinkInstruction with multiple uses"
This reverts commit 4ac4e52189.
There are a couple of test failures, which need updates to the test cases.

Doing a clean revert and will recommit the change along with fixed
testcases.
2021-09-15 18:03:11 -04:00
Anna Thomas b6cb03e6b9 Revert use of getUniqueUndroppableUser in AssumeBundleBuilder
Fix build bot failure in rG4ac4e521 caused by AssumeBundleBuilder
using the new API (getUniqueUndroppableUser).
We now continue using the existing API for AssumeBundleBuilder
(getSingleUndroppableUser).

Sorry for the noise here.

Tests-Run: failing testcase passes.
2021-09-15 17:45:09 -04:00
Anna Thomas 4ac4e52189 [InstCombine] Improve TryToSinkInstruction with multiple uses
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.

Also, the API for retrieving the undroppable user has been updated accordingly, since
in both use cases (Attributor and InstCombine) we seem to care about the user
rather than the use.

Reviewed-By: nikic

Differential Revision: https://reviews.llvm.org/D109700
2021-09-15 20:39:38 +00:00
Sanjay Patel e5a32d720e [InstCombine] move extend after insertelement if both operands are extended
I was wondering how instcombine does on the examples in D109236,
and we're missing a basic transform:

inselt (ext X), (ext Y), Index --> ext (inselt X, Y, Index)
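
A small IR sketch of the fold (hypothetical example with zext; sext behaves the same):

  define <4 x i32> @ext_after_inselt(<4 x i16> %x, i16 %y) {
    %xe = zext <4 x i16> %x to <4 x i32>
    %ye = zext i16 %y to i32
    %r = insertelement <4 x i32> %xe, i32 %ye, i32 1
    ; expected: %i = insertelement <4 x i16> %x, i16 %y, i32 1
    ;           %r = zext <4 x i16> %i to <4 x i32>
    ret <4 x i32> %r
  }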

https://alive2.llvm.org/ce/z/z2aBu9

Note that there are several possible extensions of this fold
(see TODO comments).

Differential Revision: https://reviews.llvm.org/D109537
2021-09-15 14:38:03 -04:00
Filipp Zhinkin f5d8952356 [InstCombine] Transform X == 0 ? 0 : X * Y --> X * freeze(Y)
Enabled a mul folding optimization that was previously disabled
for being incorrect.
To preserve correctness, the mul operand that is not compared
with zero in the select's condition is now frozen.
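
A minimal IR sketch (hypothetical test, not from the patch):

  define i32 @select_mul(i32 %x, i32 %y) {
    %c = icmp eq i32 %x, 0
    %mul = mul i32 %x, %y
    %r = select i1 %c, i32 0, i32 %mul
    ; expected: %fy = freeze i32 %y
    ;           %r = mul i32 %x, %fy
    ret i32 %r
  }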

Related bug: https://bugs.llvm.org/show_bug.cgi?id=51286

Correctness:
https://alive2.llvm.org/ce/z/bHef7J
https://alive2.llvm.org/ce/z/QcR7sf
https://alive2.llvm.org/ce/z/vvBLzt
https://alive2.llvm.org/ce/z/jGDXgq
https://alive2.llvm.org/ce/z/3Pe8Z4
https://alive2.llvm.org/ce/z/LGga8M
https://alive2.llvm.org/ce/z/CTG5fs

Differential Revision: https://reviews.llvm.org/D108408
2021-09-15 09:04:06 -04:00
Florian Hahn e90d55e1c9
[VPlan] Support sinking recipes with uniform users outside sink target.
This is a first step towards addressing the last remaining limitation of
the VPlan version of sinkScalarOperands: the legacy version can
partially sink operands. For example, if a GEP has uniform users outside
the sink target block, then the legacy version will sink all scalar
GEPs, other than the one for lane 0.

This patch works towards addressing this case in the VPlan version by
detecting such cases and duplicating the sink candidate. All users
outside of the sink target will be updated to use the uniform clone.

Note that this highlights an issue with VPValue naming. If we duplicate
a replicate recipe, they will share the same underlying IR value and
both VPValues will have the same name ir<%gep>.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104254
2021-09-15 09:21:39 +01:00
Markus Lavin 1ac209ed76 [NPM] Added -print-pipeline-passes print params for a few passes.
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.

Note that it only prints the parameters declared inside *_WITH_PARAMS, as
in a few cases there appear to be additional parameters that are not parsable.

The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).

LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch

Differential Revision: https://reviews.llvm.org/D109310
2021-09-15 08:34:04 +02:00
Matt Arsenault 88146230e1 SeparateConstOffsetFromGEP: Fix stack overflow in unreachable code
ConstantOffsetExtractor::Find was infinitely recursing on the add
referencing itself.
2021-09-14 19:49:38 -04:00
Matt Arsenault fdd9761dd1 Attributor: Fix crash on undef in !callees 2021-09-14 19:49:34 -04:00
Florian Hahn 7359450e6a
[VPlan] Queue (block, operand) pairs together (NFC).
Instead of discovering the sink-to block for each operand in the main
loop, the sink-to block can be directly queued with the
operands.

This simplifies processing in the main loop and is a NFC change split
off from D104254 as suggested there.
2021-09-14 20:02:51 +01:00
Kazu Hirata d9e46beace [IPO] Use make_early_inc_range (NFC) 2021-09-14 08:59:36 -07:00
Jon Chesterfield 71052ea1e3 [openmp] Apply code change from D109500 2021-09-13 18:33:53 +01:00
Jon Chesterfield bfcf979978 Revert "[openmp] Fix 51647, corrupt bitcode on amdgpu"
This reverts commit d5c049a3f6.
Going to re-commit it in pieces for easier application to 13
2021-09-13 18:25:07 +01:00
Philip Reames 6fec6552f5 Revert "[IndVars] Replace PHIs if loop exits on 1st iteration"
This reverts commit 5a6dfb27ca.  See original review for why.
2021-09-13 10:11:18 -07:00
Philip Reames 5746c76f3f Revert "[IndVars] Break backedge and replace PHIs if loop exits on 1st iteration"
This reverts commit d9ca444835.  See review for why.
2021-09-13 10:10:49 -07:00
Kazu Hirata abca4c012f [Utils] Use make_early_inc_range (NFC) 2021-09-13 08:57:23 -07:00
dpalermo d5c049a3f6 [openmp] Fix 51647, corrupt bitcode on amdgpu
Patch by @dpalermo

The corrupt bitcode reported in https://bugs.llvm.org/show_bug.cgi?id=51647 seems to be a result of a later pass changing the workfn variable to addrspace(5) (thread private, on the stack). That seems reasonable for an alloca without an address space so it's an open question why that can crash the bitcode reader.

This change puts it in the thread private address space to begin with which means whatever misfired further down the pipeline does not break it. That matches the codegen from clang where stack variables are always annotated (5) and then addrspace cast prior to following use.

This therefore patches around whatever unsuccessfully moved the alloca variable to addrspace(5). That solves the problem of openmp opt producing code that crashes the bitcode reader. It should be possible to create a minimal repro for the underlying bug based on some handwritten IR that uses an alloca in a generic address space.

Reviewed By: ronlieb, jdoerfert, dpalermo-phab

Differential Revision: https://reviews.llvm.org/D109500
2021-09-13 15:24:48 +01:00
Anna Thomas b4e787d8f4 [InstCombining] Refactor checks for TryToSinkInstruction. NFC
Moved out the checks for profitability of TryToSinkInstructions
into a lambda function.
This will also allow us to easily add checks for bailing out if the
transform is not profitable.

Tests-Run: instCombine tests.
2021-09-13 09:04:34 -04:00
Florian Hahn c24fc37e47
[VectorCombine] Support AND/UREM indices that require freezing.
38b098be66 limited scalarization to indices that are known non-poison.
For certain patterns that restrict the range of an index, we can insert
a freeze of the original value, to prevent propagation of poison.
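
A rough IR sketch of such a pattern (hypothetical example; the AND clamps the index into range, and the original index gets frozen when scalarizing):

  define i32 @load_extract(<4 x i32>* %p, i64 %idx) {
    %l = load <4 x i32>, <4 x i32>* %p
    %idx.clamped = and i64 %idx, 3
    %e = extractelement <4 x i32> %l, i64 %idx.clamped
    ; expected to scalarize to roughly:
    ;   %idx.frozen = freeze i64 %idx
    ;   %idx.clamped = and i64 %idx.frozen, 3
    ;   %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %p, i32 0, i64 %idx.clamped
    ;   %e = load i32, i32* %gep
    ret i32 %e
  }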

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107580
2021-09-13 11:21:45 +01:00
Jingu Kang 2a26d47a2d [LoopBoundSplit] Check the start value of split cond AddRec
After the transformation, we assume the split condition of the pre-loop is always
true. In order to guarantee it, we need to check that the start value of the split
cond AddRec satisfies the split condition.

Differential Revision: https://reviews.llvm.org/D109354
2021-09-13 10:32:35 +01:00
David Sherwood bbada9ff45 [NFC] Replace unsigned VF with ElementCount in EpilogueLoopVectorizationInfo
This patch simply replaces any unsigned VFs with ElementCounts. It's
still NFC because at the moment epilogue vectorisation is disabled
when the main vector loop uses scalable vectors.

Differential Revision: https://reviews.llvm.org/D109364
2021-09-13 10:18:30 +01:00
Arthur Eubanks 6a92ab07cb [NFC][CoroSplit] Directly use Function::getFunctionType() 2021-09-12 21:34:19 -07:00
Max Kazantsev d9ca444835 [IndVars] Break backedge and replace PHIs if loop exits on 1st iteration
Implement the TODO in optimizeLoopExits. Now if we have proved that some loop exit
is taken on the 1st iteration, we make all branches in the following exiting blocks
always branch out of the loop, and their conditions are simplified away.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D108910
Reviewed By: lebedev.ri
2021-09-13 11:30:55 +07:00
Max Kazantsev 5a6dfb27ca [IndVars] Replace PHIs if loop exits on 1st iteration
This is a part of D108910.
We replace all loop PHIs with values coming from the loop preheader if
we proved that backedge is never taken.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D109596
Reviewed By: lebedev.ri
2021-09-13 10:50:33 +07:00
Kuter Dinel 9a193bdc81 [Attributor][FIX] AACallEdges, fix propagation error.
This patch fixes an error made in 2cc6f7c8e1. That patch
added a call site position, but there was a small error with the way
the presence of an unknown call edge was being propagated from call site
to function. This patch fixes that error. The error was affecting some
AMDGPU tests.
2021-09-13 03:45:26 +03:00
Kuter Dinel 66a0b3464c [Attributor] AAFunctionReachability, Handle CallBase Reachability.
This patch makes it possible to query callbase reachability
(can a callbase reach a function Fn transitively?).
The patch moves the reachability query handling logic to a member class;
this class will have more users within the AA once we add other function
reachability queries.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106402
2021-09-13 01:35:44 +03:00
Kuter Dinel 2cc6f7c8e1 [Attributor] Create a call site position for AACallEdges
This patch adds a call site position for AACallEdges; this
allows us to ask questions about which functions a specific
`CallBase` might call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106208
2021-09-13 01:17:05 +03:00