Commit Graph

27882 Commits

Author SHA1 Message Date
Arthur Eubanks 2983053d23 [NFC][OpaquePtr] Explicitly pass GEP source type to IRBuilder in more places 2021-06-01 13:13:37 -07:00
Harald van Dijk f126e8ec28
[SLPVectorizer] Ignore unreachable blocks
As the existing test unreachable.ll shows, we should be doing more
work to avoid entering unreachable blocks: we should not stop
vectorization just because a PHI incoming value from an unreachable
block cannot be vectorized. We know that particular value will never
be used so we can just replace it with poison.
2021-06-01 20:21:04 +01:00
Alexey Bataev 36911971a5 [SLP]Better detection of perfect/shuffles matches for gather nodes.
Implemented better scheme for perfect/shuffled matches of the gather
nodes which allows to fix the performance regressions introduced by
earlier patches. Starting detecting matches for broadcast nodes and
extractelement gathering.

Differential Revision: https://reviews.llvm.org/D102920
2021-06-01 07:08:07 -07:00
Daniil Seredkin 13140120dc [InstCombine] Relax constraints of uses for exp(X) * exp(Y) -> exp(X + Y)
InstCombine didn't perform the transformations when fmul's operands were
the same instruction because it required to have one use for each of them
which is false in the case. This patch fixes this + adds tests for them
and introduces a new function isOnlyUserOfAnyOperand to check these cases
in a single place.

This patch is a result of discussion in D102574.

Differential Revision: https://reviews.llvm.org/D102698
2021-06-01 08:33:23 -04:00
Florian Hahn 1b84acb23a
[LoopDeletion] Consider infinite loops alive, unless mustprogress.
The current loop or any of its sub-loops may be infinite. Unless the
function or the loops are marked as mustprogress, this in itself makes
the loop *not* dead.

This patch moves the logic to check whether the current loop is finite
or mustprogress to `isLoopDead` and also extends it to check the
sub-loops. This should fix PR50511.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D103382
2021-06-01 13:07:36 +01:00
Florian Hahn d4c070d801
[VectorCombine] Freeze index unless it is known to be non-poison.
If the index itself is already poison, the poison propagates through
instructions clamping the index to a valid range. This still causes
introducing a load of poison, as flagged by Alive2 and pointed out
at 575e2aff55.

This patch updates the code to freeze the index, unless it is proven to
not be poison.

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D103378
2021-06-01 10:40:57 +01:00
Nathan Chancellor e6b086bef2
Revert "[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)"
This reverts commit 4f2fd3818b.

The Linux kernel fails to build after this commit. See
https://reviews.llvm.org/D99481 for a reproducer.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2021-05-31 20:21:26 -07:00
Arthur Eubanks 372237487e [OpaquePtr] Remove some uses of PointerType::getElementType() 2021-05-31 16:11:25 -07:00
Congzhe Cao bfefde22b6 [LoopInterhcange] Handle movement of reduction phis appropriately
This patch fixes pr43326 and pr48212.

Currently when we move reduction phis to the right place,
loop interchange assumes the first phi in loop headers is
an induction phi, skips the first phi and assumes the rest
of phis are candidate reduction phis to move. However, it
may not always be the case.

This patch loops over all phis in loop headers and considers
a phi node as a candidate reduction phi to move only when it
is indeed a reduction phi across outer and inner loop.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102743
2021-05-31 16:27:38 -04:00
Florian Hahn aa00b1d763
[LV] Try to sink users recursively for first-order recurrences.
Update isFirstOrderRecurrence to  explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.

Fixes PR44286 (and another PR or two).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84951
2021-05-31 19:55:33 +01:00
Roman Lebedev f7c95c3322
[NFC] ScalarEvolution: apply SSO to the ExprValueMap value
ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pair,
and while the map itself will likely be big-ish (have many keys),
it is a reasonable assumption that each key will refer to a small-ish
number of pairs.

In particular looking at n=512 case from
https://bugs.llvm.org/show_bug.cgi?id=50384,
the small-size of 4 appears to be the sweet spot,
it results in the least allocations while minimizing memory footprint.
```
$ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i | tail -n 6; echo ""; done
heaptrack.opt.0-orig.gz
total runtime: 14.32s.
calls to allocation functions: 8222442 (574192/s)
temporary memory allocations: 2419000 (168924/s)
peak heap memory consumption: 190.98MB
peak RSS (including heaptrack overhead): 239.65MB
total memory leaked: 67.58KB

heaptrack.opt.1-n1.gz
total runtime: 13.72s.
calls to allocation functions: 7184188 (523705/s)
temporary memory allocations: 2419017 (176338/s)
peak heap memory consumption: 191.38MB
peak RSS (including heaptrack overhead): 239.64MB
total memory leaked: 67.58KB

heaptrack.opt.2-n2.gz
total runtime: 12.24s.
calls to allocation functions: 6146827 (502355/s)
temporary memory allocations: 2418997 (197695/s)
peak heap memory consumption: 163.31MB
peak RSS (including heaptrack overhead): 211.01MB
total memory leaked: 67.58KB

heaptrack.opt.3-n4.gz
total runtime: 12.28s.
calls to allocation functions: 6068532 (494260/s)
temporary memory allocations: 2418985 (197017/s)
peak heap memory consumption: 155.43MB
peak RSS (including heaptrack overhead): 201.77MB
total memory leaked: 67.58KB

heaptrack.opt.4-n8.gz
total runtime: 12.06s.
calls to allocation functions: 6068042 (503321/s)
temporary memory allocations: 2418992 (200646/s)
peak heap memory consumption: 166.03MB
peak RSS (including heaptrack overhead): 213.55MB
total memory leaked: 67.58KB

heaptrack.opt.5-n16.gz
total runtime: 12.14s.
calls to allocation functions: 6067993 (499958/s)
temporary memory allocations: 2418999 (199307/s)
peak heap memory consumption: 187.24MB
peak RSS (including heaptrack overhead): 233.69MB
total memory leaked: 67.58KB
```

While that test may be an edge worst-case scenario,
https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions
agrees that this also results in improvements in the usual situations.
2021-05-31 15:34:03 +03:00
Juneyoung Lee 7161bb87c9 [InsCombine] Fix a few remaining vec transforms to use poison instead of undef
This is a patch that replaces shufflevector and insertelement's placeholder value with poison.

Underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead
(D93818)
The consensus has been made in the late 2020 via mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 .

This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.
2021-05-31 18:47:09 +09:00
David Green 222aeb4d51 [DSE] Remove stores in the same loop iteration
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];

After LICM, this becomes:
for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, and with this patch is now removed.

Differntial Revision: https://reviews.llvm.org/D100464
2021-05-31 10:22:37 +01:00
Hyeongyu Kim 4f2fd3818b [InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered as Eli say when sext(idx) * ElementSize overflows.

```
   // assume that GV is an array of 4-byte elements
   GEP = gep GV, 0, Idx // this is accessing Idx * 4
   L = load GEP
   ICI = icmp eq L, value
 =>
   ICI = icmp eq Idx, NewIdx
```

The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp.
And there is a problem because Idx * ElementSize can overflow.

Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.

This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx.

```
   ...
 =>
   Idx' = and Idx, 0x3F..FF
   ICI = icmp eq Idx', NewIdx
```

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D99481
2021-05-31 14:08:20 +09:00
Sanjay Patel 7bb8bfa062 [InstCombine] fix miscompile from vector select substitution
This is similar to the fix in c590a9880d ( PR49832 ), but
we missed handling the pattern for select of bools (no compare
inst).

We can't substitute a vector value because the equality condition
replacement that we are attempting requires that the condition
is true/false for the entire value. Vector select can be partly
true/false.

I added an assert for vector types, so we shouldn't hit this again.
Fixed formatting while auditing the callers.

https://llvm.org/PR50500
2021-05-30 07:11:58 -04:00
Mindong Chen 71acce68da [NFCI] Move DEBUG_TYPE definition below #includes
When you try to define a new DEBUG_TYPE in a header file, DEBUG_TYPE
definition defined around the #includes in files include it could
result in redefinition warnings even compile errors.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D102594
2021-05-30 17:31:01 +08:00
Sanjay Patel c7da0c383a [InstCombine] fold zext of masked bit set/clear
This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8

We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.

Proofs:
https://rise4fun.com/Alive/uVB

Name: masked bit set
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp ne i32 %and, 0
%r = zext i1 %cmp to i32
=>
%s = lshr i32 %x, %y
%r = and i32 %s, 1

Name: masked bit clear
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp eq i32 %and, 0
%r = zext i1 %cmp to i32
=>
%xn = xor i32 %x, -1
%s = lshr i32 %xn, %y
%r = and i32 %s, 1

Note: this is a re-post of a patch that I committed at:
rGa041c4ec6f7a

The commit was reverted because it exposed another bug:
rGb212eb7159b40

But that has since been corrected with:
rG8a156d1c2795189 ( D101191 )

Differential Revision: https://reviews.llvm.org/D72396
2021-05-29 08:52:26 -04:00
Sanjay Patel 52f2970036 [InstCombine] reduce code duplication; NFC 2021-05-29 08:33:25 -04:00
Nikita Popov 625920dabf [LoopUnroll] Make DomTree explicitly required (NFC)
Some of the code was already assuming that DT is non-null, so
make that requirement more explicit and remove unnecessary null
checks.
2021-05-29 09:37:32 +02:00
Fangrui Song 38dbdde792 [Internalize] Simplify comdat renaming with noduplicates after D103043
I realized that we can use `comdat noduplicates` which is available on ELF.
Add a special case for wasm which doesn't support the feature.
2021-05-28 16:58:38 -07:00
Nikita Popov 90310dfff8 [LoopUnroll] Use changeToUnreachable() (NFC)
When fulling unrolling with a non-latch exit, the latch block is
folded to unreachable. Replace this folding with the existing
changeToUnreachable() helper, rather than performing it manually.

This also moves the fold to happen after the manual DT update
for exit blocks. I believe this is correct in that the conversion
of an unconditional backedge into unreachable should not affect
the DT at all.

Differential Revision: https://reviews.llvm.org/D103340
2021-05-29 00:11:21 +02:00
Nikita Popov f765445a69 [LoopUnroll] Clean up exit folding (NFC)
This does some non-functional cleanup of exit folding during
unrolling. The two main changes are:

 * First rewrite latch->header edges, which is unrelated to exit
   folding.
 * Combine folding for latch and non-latch exits. After the
   previous change, the only difference in their logic is that
   for non-latch exits we currently only fold "known non-exit"
   cases, but not "known exit" cases.

I think this helps a lot to clarify this code and prepare it for
future changes.

Differential Revision: https://reviews.llvm.org/D103333
2021-05-28 22:31:13 +02:00
Bardia Mahjour 06eaffa858 [NFC] Remove confusing info about MainLoop VF/UF from debug message 2021-05-28 16:10:04 -04:00
Florian Hahn 007f268c35
[VectorCombine] Check indices for all extracts we scalarize.
We need to make sure that the indices of all extracts we scalarize are
valid.
2021-05-28 18:35:29 +01:00
Stefan Pintilie 0159652058 Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2)"
This reverts commit be1a23203b.
2021-05-28 12:21:22 -05:00
Stefan Pintilie 24bd657202 Revert "[NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop"
This reverts commit b0b2bf3b5d.
2021-05-28 12:21:22 -05:00
Stefan Pintilie fd55331203 Revert "[NFC] Formatting fix"
This reverts commit 59d938e649.
2021-05-28 12:21:22 -05:00
Stefan Pintilie 807fc7cdc9 Revert "[NFC] Reuse existing variables instead of re-requesting successors"
This reverts commit c467585682.
2021-05-28 12:21:22 -05:00
Stefan Pintilie dd226803c2 Revert "[NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC"
This reverts commit 7d418dadf6.
2021-05-28 12:21:21 -05:00
Sanjay Patel 403cfe5d70 [PassManager] unify late simplifycfg options between regular and LTO pipelines
This is split off from D102002, and I think it is clear that
the difference in behavior was not intended. Options were
added to SimplifyCFG over time, but different chunks of
the pass pipelines were not kept in sync.
2021-05-28 13:06:49 -04:00
eopXD fa488ea864 [LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.

Utilize LoopNest and let function 'Flatten' generate information from it.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102904
2021-05-28 15:43:12 +00:00
dongAxis 66ff1cbd71 [NFC][Transforms][Utils] remove useless variable in CloneBasicBlock 2021-05-28 17:50:38 +08:00
eopXD e96d6f4821 Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass"
This reverts commit 7952ddb21f.

Differential Revision: https://reviews.llvm.org/D103302
2021-05-28 07:58:06 +00:00
eopXD 7e06cf8f1b Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass"
This reverts commit ffc4d3e068.
2021-05-28 07:48:04 +00:00
eopXD ffc4d3e068 [LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.

Utilize LoopNest and let function 'Flatten' generate information from it.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102904
2021-05-28 07:25:53 +00:00
eopXD 7952ddb21f [LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.

Utilize LoopNest and let function 'Flatten' generate information from it.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102904
2021-05-28 07:11:26 +00:00
Max Kazantsev 6a2af607ad Revert "[NFCI] Lazily evaluate SCEVs of PHIs"
This reverts commit 51d334a845.

Reported failures, need to analyze.
2021-05-28 11:05:30 +07:00
Jinsong Ji b2581196eb [AIX] Enable stackprotect feature
AIX use `__ssp_canary_word` instead of `__stack_chk_guard`.
This patch update the target hook to use correct symbol,
so that the basic stackprotect feature can work.

The traceback will be handled in follow up patch.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D103100
2021-05-28 02:18:15 +00:00
Jianzhou Zhao fc1d39849e [dfsan] Add a flag about whether to propagate offset labels at gep
DFSan has flags to control flows between pointers and objects referred
by pointers. For example,

a = *p;
L(a) = L(*p)        when -dfsan-combine-pointer-labels-on-load = false
L(a) = L(*p) + L(p) when -dfsan-combine-pointer-labels-on-load = true

*p = b;
L(*p) = L(b)        when -dfsan-combine-pointer-labels-on-store = false
L(*p) = L(b) + L(p) when -dfsan-combine-pointer-labels-on-store = true
The question is what to do with p += c.

In practice we found many confusing flows if we propagate labels from c
to p. So a new flag works like this

p += c;
L(p) = L(p)        when -dfsan-propagate-via-pointer-arithmetic = false
L(p) = L(p) + L(c) when -dfsan-propagate-via-pointer-arithmetic = true

Reviewed-by: gbalats

Differential Revision: https://reviews.llvm.org/D103176
2021-05-28 00:06:19 +00:00
Arthur Eubanks 2d2a902078 [SanCov] Properly set ABI parameter attributes
Arguments need to have the proper ABI parameter attributes set.

Followup to D101806.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D103288
2021-05-27 15:27:21 -07:00
Adrian Prantl f3869a5c32 Support stripping indirectly referenced DILocations from !llvm.loop metadata
in stripDebugInfo().  This patch fixes an oversight in
https://reviews.llvm.org/D96181 and also takes into account loop
metadata pointing to other MDNodes that point into the debug info.

rdar://78487175

Differential Revision: https://reviews.llvm.org/D103220
2021-05-27 13:23:33 -07:00
maekawatoshiki 2165360003 [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-05-28 01:17:23 +09:00
Florian Hahn 38641ddf3e
[VPlan] Do not sink uniform recipes in sinkScalarOperands.
For uniform ReplicateRecipes, only the first lane should be used, so
sinking them would mean we have to compute the value of the first lane
multiple times. Also, at the moment, sinking them causes a crash because
the value of the first lane is re-used by all users.

Reported post-commit for D100258.
2021-05-27 14:07:48 +01:00
Max Kazantsev 7d418dadf6 [NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC 2021-05-27 15:29:37 +07:00
Max Kazantsev c467585682 [NFC] Reuse existing variables instead of re-requesting successors 2021-05-27 15:29:37 +07:00
Max Kazantsev 51d334a845 [NFCI] Lazily evaluate SCEVs of PHIs
Eager evaluation has cost of compile time. Only query them if they are
required for proving predicates.
2021-05-27 13:35:31 +07:00
Max Kazantsev 59d938e649 [NFC] Formatting fix 2021-05-27 12:50:54 +07:00
Max Kazantsev b0b2bf3b5d [NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop 2021-05-27 12:44:22 +07:00
Yevgeny Rouban 4d26f41f76 [RS4GC] Introduce intrinsics to get base ptr and offset
There can be a need for some optimizations to get (base, offset)
for any GC pointer. The base can be calculated by generating
needed instructions as it is done by the
RewriteStatepointsForGC::findBasePointer() function. The offset
can be calculated in the same way. Though to not expose the base
calculation and to make the offset calculation as simple as
ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal
outside RS4GC, this patch introduces 2 intrinsics:

 @llvm.experimental.gc.get.pointer.base(%derived_ptr)
 @llvm.experimental.gc.get.pointer.offset(%derived_ptr)

These intrinsics are inlined by RS4GC along with generation of
statepoint sequences.

With these new intrinsics the GC parseable lowering for atomic
memcpy intrinsics (6ec2c5e402)
could be implemented as a separate pass.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100445
2021-05-27 09:14:14 +07:00
Heejin Ahn 5bfe06ad35 [SimplifyCFG] Use make_early_inc_range() while deleting instructions
We are deleting `phi` nodes within the for loop, so this makes sure we
increment the iterator before we delete the instruction pointed by the
iterator.

This started to break in
a0be081646.

Reviewed By: dschuff, lebedev.ri

Differential Revision: https://reviews.llvm.org/D103181
2021-05-26 11:43:11 -07:00
Alexey Bataev 27d3528acf [SLP]Fix vectorization of insertelements with multiple uses.
SLP vectorizer should not consider in sertelements with multiple uses as
a part of high level build vector, it must be considered as
a terminating insertelement in the vector build, otherwise it may
produce incorrect code.

Differential Revision: https://reviews.llvm.org/D103164
2021-05-26 09:42:18 -07:00
Stephen Tozer a0bd6105d8 [DebugInfo] Limit the number of values that may be referenced by a dbg.value
Following the addition of salvaging dbg.values using DIArgLists to
reference multiple values, a case has been found where excessively large
DIArgLists are produced as a result of this salvaging, resulting in
large enough performance costs to effectively freeze the compiler.

This patch introduces an upper bound of 16 to the number of values that
may be salvaged into a dbg.value, to limit the impact of these extreme
cases to performance.

Differential Revision: https://reviews.llvm.org/D103162
2021-05-26 17:34:05 +01:00
Philip Reames 9cc2181ec3 [unroll] Use value domain for symbolic execution based cost model
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost.

By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928.

Differential Revision: https://reviews.llvm.org/D102934
2021-05-26 08:41:25 -07:00
Kerry McLaughlin 9f76a85260 [LoopVectorize] Enable strict reductions when allowReordering() returns false
When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:

  bool allowReordering() const {
    // When enabling loop hints are provided we allow the vectorizer to change
    // the order of operations that is given by the scalar loop. This is not
    // enabled by default because can be unsafe or inefficient.

The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.

This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101836
2021-05-26 13:59:12 +01:00
Max Kazantsev be1a23203b Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2)
The patch was reverted due to compile time impact of contextual SCEV
queries. It also appeared that it introduced a miscompile on irreducible CFG.

Changes made:
1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate;
2. Irreducible CFG in live code is now detected and excluded from processing.

Differential Revision: https://reviews.llvm.org/D102615
2021-05-26 19:47:14 +07:00
Max Kazantsev 0de553dce0 Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration""
This reverts commit 43d2e51c2e.

Commited wrong version.
2021-05-26 19:29:07 +07:00
Max Kazantsev 43d2e51c2e Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"
The patch was reverted due to compile time impact of contextual SCEV
queries. It also appeared that it introduced a miscompile on irreducible CFG.

Changes made:
1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate;
2. Irreducible CFG in live code is now detected and excluded from processing.

Differential Revision: https://reviews.llvm.org/D102615
2021-05-26 19:23:21 +07:00
David Sherwood 70d8365e33 Fix warning introduced by 9c766f4090 2021-05-26 10:20:39 +01:00
David Sherwood 9c766f4090 [InstCombine] Fold extractelement + vector GEP with one use
We sometimes see code like this:

Case 1:
  %gep = getelementptr i32, i32* %a, <2 x i64> %splat
  %ext = extractelement <2 x i32*> %gep, i32 0

or this:

Case 2:
  %gep = getelementptr i32, <4 x i32*> %a, i64 1
  %ext = extractelement <4 x i32*> %gep, i32 0

where there is only one use of the GEP. In such cases it makes
sense to fold the two together such that we create a scalar GEP:

Case 1:
  %ext = extractelement <2 x i64> %splat, i32 0
  %gep = getelementptr i32, i32* %a, i64 %ext

Case 2:
  %ext = extractelement <2 x i32*> %a, i32 0
  %gep = getelementptr i32, i32* %ext, i64 1

This may create further folding opportunities as a result, i.e.
the extract of a splat vector can be completely eliminated. Also,
even for the general case where the vector operand is not a splat
it seems beneficial to create a scalar GEP and extract the scalar
element from the operand. Therefore, in this patch I've assumed
that a scalar GEP is always preferrable to a vector GEP and have
added code to unconditionally fold the extract + GEP.

I haven't added folds for the case when we have both a vector of
pointers and a vector of indices, since this would require
generating an additional extractelement operation.

Tests have been added here:

  Transforms/InstCombine/gep-vector-indices.ll

Differential Revision: https://reviews.llvm.org/D101900
2021-05-26 09:54:26 +01:00
Teresa Johnson d35fe04fa3 [LTT] Handle merged llvm.assume when dropping type tests
When the lower type test pass is invoked a second time with
DropTypeTests set to true, it expects that all remaining type tests feed
assume instructions, which are removed along with the type tests.

In some cases the llvm.assume might have been merged with another one,
i.e. from a builtin_assume instruction, in which case the type test
would actually feed a phi that in turn feeds the merged assume
instruction. In this case we can simply replace that operand of the phi
with "true" before removing the type test.

Differential Revision: https://reviews.llvm.org/D103073
2021-05-25 17:02:13 -07:00
Kevin Athey 52ac114771 LLVM Detailed IR tests for introduction of flag -fsanitize-address-detect-stack-use-after-return-mode.
Rework all tests that interact with use after return to correctly handle the case where the mode has been explicitly set to Never or Always.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D102462
2021-05-25 16:17:39 -07:00
Fangrui Song b426b45d10 [Internalize] Rename instead of removal if a to-be-internalized comdat has more than one member
Beside the `comdat any` deduplication feature, instrumentations use comdat to
establish dependencies among a group of sections, to prevent section based
linker garbage collection from discarding some members without discarding all.
LangRef acknowledges this usage with the following wording:

> All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key.

On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated
`__llvm_prf_data` section are placed in the same GRP_COMDAT group.  A
`__llvm_prf_data` is usually not referenced and expects the liveness of its
associated `__llvm_prf_cnts` to retain it.

The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the
use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained).
The main goal of this patch is to fix the dependency relationship.

I think it makes sense for InternalizePass to internalize a comdat and thus
suppress the deduplication feature, e.g. a relocatable link of a regular LTO can
create an object file affected by InternalizePass.
If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the
a.o references to the comdat definitions will be non-resolvable (references
cannot bind to STB_LOCAL definitions in b.o).

On PE-COFF, for a non-external selection symbol, deduplication is naturally
suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend
to believe the spec creator has not thought about this use case (see D102973).

GNU ld and gold are still using the "signature is name based" interpretation.
So even if D102973 for ld.lld is accepted, for portability, a better approach is
to rename the comdat. A comdat with one single member is the common case,
leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by
deleting the comdat; otherwise we rename the comdat.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103043
2021-05-25 14:15:27 -07:00
Matt Morehouse 832c99f727 Revert "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"
This reverts commit 2531fd70d1 due to
performance regression on the PPC buildbot.
2021-05-25 13:58:42 -07:00
Benjamin Kramer d2d4f16806 [Matrix] Use LLVM_DEBUG for a debug flag
dump() doesn't exist in release builds.

ld.lld: error: undefined symbol: llvm::Value::dump() const
>>> referenced by LowerMatrixIntrinsics.cpp
>>>               LowerMatrixIntrinsics.o:((anonymous namespace)::LowerMatrixIntrinsics::Visit())
2021-05-25 21:10:19 +02:00
Nikita Popov 9c91614959 [CVP] Guard against poison in common phi value transform (PR50399)
The common phi value transform replaces constants with values that
have the same value as the constant on a given edge. However, LVI
generally only provides information that is correct up to poison,
so this can end up replacing a well-defined value with poison.
D69442 addressed an instance of this problem by clearing poison
flags on the generating instruction, which was sufficient at the
time. rGa917fb89dc28 made LVI's edge value analysis slightly more
powerful, and clearing poison flags is no longer sufficient.

This patch changes the transform to instead explicitly guard against
a poison value instead. This should be satisfied for most cases due
to a prior branch on poison.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50399.

Differential Revision: https://reviews.llvm.org/D102966
2021-05-25 20:47:17 +02:00
Adam Nemet dfd1bbd00a [Matrix] Factor and distribute transposes across multiplies
Now that we can fold some transposes into multiplies (CM: A * B^t and RM:
A^t * B), we want to move them around to create the optimal expressions:

* fold away double transposes while still using them to assert the shape
* sink transposes hoping they cancel out
* lift transposes when both operands are transposed

This also modifies the matrix remarks to include the number of exposed
transposes (i.e. transposes that we couldn't fold into a multiply).

The adjustment to the test remarks-inlining is a bit subtle: I am changing the
double transpose to a single transpose so that we don't remove it completely.
More importantly this changes some of the total instruction count, most
notable stores because we can no longer use a vector store.

Differential Revision: https://reviews.llvm.org/D102733
2021-05-25 11:12:20 -07:00
Roman Lebedev 149e018d12
[LoopIdiom] 'arithmetic right-shift until zero': don't turn potentially infinite loops into finite ones
Nowadays LLVM does not assume that all loops are finite,
so if we want to produce a finite loop from a potentially-infinite one,
we must ensure that the original loop is known to be a finite one.

For this transform, it only matters for arithmetic right-shifts.
For them, either the function or the loop must be known to
be `mustprogress`, or the original value being shifted must be known
to be non-negative (because iff the sign bit was set,
it will never become zero, but will become `-1` in the "end").

It would be really good for alive2 to actually complain about this,
but it currently does not: https://github.com/AliveToolkit/alive2/issues/726
2021-05-25 21:02:28 +03:00
Sanjay Patel ae1bc9ebf3 [InstCombine] avoid infinite loop from vector select transforms
The 2nd test is based on the fuzzer example in post-commit
comments of D101191 -
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661

The 1st test shows that we don't deal with this symmetrically.
We should be able to reduce both examples (possibly in
instsimplify instead of instcombine).
2021-05-25 13:28:38 -04:00
Florian Hahn 8e83ff58c9
[VectorCombine] Remove unneeded InsertPointGuard (NFCI).
All users of the builder should set an insert point before using the
builder. There should be no need for using InsertPointGuard here.
2021-05-25 17:01:05 +01:00
Florian Hahn 575e2aff55
[VectorCombine] Use constant range info for index scalarization legality.
We can only scalarize memory accesses if we know the index is valid.

This patch adjusts canScalarizeAcceess to fall back to
computeConstantRange to check if the index is known to be valid.

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D102476
2021-05-25 13:58:42 +01:00
Sanjay Patel 0bab0f6161 [InstCombine] canonicalize cast before unary shuffle
We could go either direction on this transform. VectorCombine already goes this
way for bitcasts (and handles more complicated cases using the cost model), so
let's try cast-first.

Deferring completely to VectorCombine is another possibility. But the backend
should be able to invert this easily when the vectors have the same shape, so
it doesn't seem like a transform that we need to avoid.

The motivating example from https://llvm.org/PR49081 has an int-to-float
sandwiched between 2 shuffles, and the backend currently does not reduce that,
so on x86, we get something like:

  pshufd	$249, %xmm0, %xmm0]
  cvtdq2ps	%xmm0, %xmm0
  shufps	$144, %xmm0, %xmm0

...instead of just a single conversion instruction.

Differential Revision: https://reviews.llvm.org/D103038
2021-05-25 08:43:09 -04:00
Chuanqi Xu 400a9d3501 [NFC] [Coroutines] Remove unused variable: UnreachableCache 2021-05-25 20:33:46 +08:00
Roman Lebedev 8f4db14d1c
[LoopIdiom] Support 'left-shift until zero' idiom
This adds support for the "count active bits" pattern, i.e.:
```
int countBits(unsigned val) {
    int cnt = 0;
    for( ; (val << cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one:
```
int countBits(unsigned val, int start, int off) {
    int cnt;
    for (cnt = start; val << (cnt + off); cnt++)
        ;
    return cnt;
}
```

alive2 is happy with all the tests there.

Note that, again, much like with the right-shift cases,
we don't require the `val != 0` guard.

This is the last pattern that was supported by
`detectShiftUntilZeroIdiom()`, which now becomes obsolete.
2021-05-25 15:26:35 +03:00
Roman Lebedev f1c5f78d38
[LoopIdiom] Support 'arithmetic right-shift until zero' idiom
This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(signed val) {
    int cnt = 0;
    for( ; (val >> cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one:
```
int countActiveBits(signed val, int start, int off) {
    int cnt;
    for (cnt = start; val >> (cnt + off); cnt++)
        ;
    return cnt;
}
```

This directly matches the existing 'logical right-shift until zero' idiom.
alive2 is happy with all the tests there.

Note that, again, much like with the original unsigned case,
we don't require the `val != 0` guard.

The old `detectShiftUntilZeroIdiom()` already supports this pattern,
the idea here is that the `val` must be positive (have at least one
leading zero), because otherwise the loop is non-terminating,
but since it is not `while(1)`, that would have been UB.
2021-05-25 14:30:49 +03:00
Marco Elver 280333021e [SanitizeCoverage] Add support for NoSanitizeCoverage function attribute
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.

Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.

Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035

Reviewed By: vitalybuka, morehouse

Differential Revision: https://reviews.llvm.org/D102772
2021-05-25 12:57:14 +02:00
Alexey Lapshin 10c2e26159 [TRE] Reland: allow TRE for non-capturing calls.
The D82085 "allow TRE for non-capturing calls" caused failure during bootstrap.
This patch does the same as D82085 plus fixes bootstrap error.

The problem with D82085 is that it does not create copies for byval
operands, while replacing function call with a branch.

Consider following example:

```
    int zoo ( S p1 );

    int foo ( int count, S p1 ) {
      if ( count > 10 )
        return zoo(p1);

      // temporarily variable created for passing byvalue parameter
      // p1 could be used when zoo(p1) is called(after TRE is done).
      // lifetime.start p1.byvalue.temp
      return foo(count+1, p1);
      // lifetime.end p1.byvalue.temp
    }
```

After recursive call to foo is replaced with a jump into
start of the function, its parameters could be passed to
zoo function. i.e. temporarily variable created for byvalue
parameter "p1" could be passed to zoo. Finally zoo receives
broken operand:

```
    int foo ( int count, S p1 ) {
    :tailrecurse
      p1_tr = phi p1, p1.byvalue.temp
      if ( count > 10 )
        return zoo(p1_tr);

      // temporarily variable created for passing byvalue parameter
      // p1 could be used when zoo(p1) is called(after TRE is done).
      lifetime.start p1.byvalue.temp
      memcpy (p1.byvalue.temp, p1_tr)
      count = count + 1
      lifetime.end p1.byvalue.temp
      br tailrecurse
    }
```

To prevent using p1.byvalue.temp after its scope finished by
lifetime.end marker this patch copies value from p1.byvalue.temp
into another temporarily variable and then copies this variable
into the input parameter for next iteration.

This patch passes bootstrap build and bootstrap build with AddressSanitizer.

Differential Revision: https://reviews.llvm.org/D85614
2021-05-25 11:35:48 +03:00
Max Kazantsev 2531fd70d1 [LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration
This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topoligical
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach header via the latch, then the backedge can be broken.

Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: reames
2021-05-25 12:43:31 +07:00
maekawatoshiki e77d24f70a Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"
This reverts commit d65c32fb41.
2021-05-25 11:39:49 +09:00
Anton Afanasyev b2cd895011 [SLP] Fix "gathering" of insertelement instructions
For rare exceptional case vector tree node (insertelements for now only)
is marked as `NeedToGather`, this case is processed by patch. Follow-up
of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135.

Differential Revision: https://reviews.llvm.org/D102675
2021-05-25 01:35:43 +03:00
Jon Roelofs 095e91c973 [Remarks] Add analysis remarks for memset/memcpy/memmove lengths
Re-landing now that the crasher this patch previously uncovered has been fixed
in: https://reviews.llvm.org/D102935

Differential revision: https://reviews.llvm.org/D102452
2021-05-24 10:10:44 -07:00
Jon Roelofs 694068d0db [Remarks] Look through inttoptr/ptrtoint for -ftrivial-auto-var-init remarks.
The crasher is a related problem that @aemerson found broke speck2k6/403.gcc
when I landed https://reviews.llvm.org/D102452. It has been reduced & modified
to reproduce without that patch.

Differential revision: https://reviews.llvm.org/D102935
2021-05-24 09:23:22 -07:00
Adrian Prantl 4cba0a4f11 CoroSplit: Replace ad-hoc implementation of reachability with API from CFG.h
The current ad-hoc implementation used to determine whether a basic
block is unreachable doesn't work correctly in the general case (for
example it won't detect successors of unreachable blocks as
unreachable). This patch replaces it with the correct API that uses a
DominatorTree to answer the question correctly and quickly.

rdar://77181156

Differential Revision: https://reviews.llvm.org/D102963
2021-05-24 09:18:33 -07:00
Florian Hahn 65d3dd7c88
[VPlan] Add first VPlan version of sinkScalarOperands.
This patch adds a first VPlan-based implementation of sinking of scalar
operands.

The current version traverse a VPlan once and processes all operands of
a predicated REPLICATE recipe. If one of those operands can be sunk,
it is moved to the block containing the predicated REPLICATE recipe.
Continue with processing the operands of the sunk recipe.

The initial version does not re-process candidates after other recipes
have been sunk. It also cannot partially sink induction increments at
the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the
induction is used for example in a GEP, only the first lane is used and
in the lowered IR the adds for the other lanes can be sunk into the
predicated blocks.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100258
2021-05-24 15:29:58 +01:00
Florian Hahn e9d97d7d9d
[VPlan] Add mayReadOrWriteMemory & friends.
This patch adds initial implementation of mayReadOrWriteMemory,
mayReadFromMemory and mayWriteToMemory to VPRecipeBase.

Used by D100258.
2021-05-24 13:11:32 +01:00
Florian Hahn 4e8c28b6fb
Recommit "[VectorCombine] Scalarize vector load/extract."
This reverts commit 94d54155e2.

This fixes a sanitizer failure by moving scalarizeLoadExtract(I)
before foldSingleElementStore(I), which may remove instructions.
2021-05-24 11:35:07 +01:00
Roman Lebedev 32bee42719
[NFCI][LoopIdiom] 'left-shift until bittest': assert that BaseX is loop-invariant
Given that BaseX is an incoming value when coming from the preheader,
it *should* be loop-invariant, but let's just document this assumption.
2021-05-24 12:15:06 +03:00
Roman Lebedev aa3dac95ed
[LoopIdiom] 'logical right shift until zero': the value must be loop-invariant
As per the reproducer provided by Mikael Holmén in post-commit review.
2021-05-24 12:15:06 +03:00
Florian Hahn 94d54155e2
Revert "[VectorCombine] Scalarize vector load/extract."
This reverts commit 86497785d5.

One of the tests causes an ASAN failure.
https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio
2021-05-24 10:11:00 +01:00
Florian Hahn 86497785d5
[VectorCombine] Scalarize vector load/extract.
This patch adds a new combine that tries to scalarize chains of
`extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is
profitable when extracting only a few elements out of a large vector.

At the moment, `store (extractelement (load %ptr), %idx), %ptr`
operations on large vectors result in huge code in the backend.

This can easily be triggered by using the matrix extension, e.g.
https://clang.godbolt.org/z/qsccPdPf4

This should complement D98240.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D100273
2021-05-24 09:29:08 +01:00
Johannes Doerfert 6caea8a7fa [Attributor] Introduce a helper do deal with constant type mismatches
If we simplify values we sometimes end up with type mismatches. If the
value is a constant we can often cast it though to still allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.

This also introduces the AA namespace for abstract attribute related
functions and types.
2021-05-23 23:00:40 -05:00
Johannes Doerfert 55e9c28212 [Attributor] Teach AAIsDead about undef values
Not only if the branch or switch condition is dead but also if it is
assumed `undef` we can delay AAIsDead exploration.
2021-05-23 23:00:40 -05:00
Johannes Doerfert 4878d73419 [Attributor] Deal with address spaces gracefully
When we do value propagation we need to cast address spaces properly.
2021-05-23 23:00:39 -05:00
Johannes Doerfert 1ba2929bb8 [Attributor] Be more careful to not disturb the CG outside the SCC
We have seen various problems when the call graph was not updated or
the updated did not succeed because it involved functions outside the
SCC. This patch adds assertions and checks to avoid accidentally
changing something outside the SCC that would impact the call graph.
It also prevents us from reanalyzing functions outside the current
SCC which could cause problems on its own. Note that the transformations
we do might cause the CG to be "more precise" but the original one would
always be a super set of the most precise one. Since the call graph is
by nature an approximation, it is good enough to have a super set of all
call edges.
2021-05-23 23:00:39 -05:00
Johannes Doerfert e93ac1e2de [Attributor][FIX] Account for undef in the constant value lattice
The constant value lattice looks like this

```
  <None>
     |
  <undef>
  /  |   \
... <0>  ...
 \   |   /
 <unknown>
```
We did not account for the undef and assumed a value meant we could not
change anymore. Now we actually check if we have the same value as
before, which will signal CHANGED to the users when we go from undef to
a specific constant.

This fixes, among other things, the bug exposed by @ipccp4 in
`value-simplify.ll`.
2021-05-23 20:47:06 -05:00
Johannes Doerfert 5cdc29f795 [Attributor][FIX] Ensure we replace undef if we see the first "real" value
The state of AAPotentialValues tracks if undef is contained. It should
fold undef into the first non-undef value. However we missed a case
before. There was also a shadowing definition of two variables that
caused trouble. The test exposes both problems.
2021-05-23 20:47:06 -05:00
Johannes Doerfert 2bc51d39db [Attributor][NFC] Add helpful debug outputs 2021-05-23 20:47:05 -05:00
Johannes Doerfert cb511531b9 [Attributor][NFC] Clang format the Attributor source files 2021-05-23 20:47:05 -05:00
maekawatoshiki d65c32fb41 [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-05-23 22:32:01 +09:00
Yaxun (Sam) Liu bf6124580d [HIP] support ThinLTO
Add options -[no-]offload-lto and -foffload-lto=[thin,full] for controlling
LTO for offload compilation. Allow LTO for AMDGPU target.

AMDGPU target does not support codegen of object files containing
call of external functions, therefore the LLVM module passed to
AMDGPU backend needs to contain definitions of all the callees.
An LLVM option is added to allow function importer to import
functions with noinline attribute.

HIP toolchain passes proper LLVM options to lld to make sure
function importer imports definitions of all the callees.

Reviewed by: Teresa Johnson, Artem Belevich

Differential Revision: https://reviews.llvm.org/D99683
2021-05-22 10:48:34 -04:00
Nikita Popov 9a9421a461 Reapply [InstCombine] Fold multiuse shr eq zero
This was reverted due to performance regressions in ARM benchmarks,
which have since been addressed by D101196 (SCEV analysis improvement)
and D101778 (CGP reverse transform).

-----

The single-use case is handled implicity by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.

https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
2021-05-22 14:46:50 +02:00
Florian Hahn a6de8d95db
[Matrix] Bail out early if there are no matrix intrinsics.
If there are no matrix intrinsics in a function, we can directly bail
out, as there's nothing left to do.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D102931
2021-05-22 11:37:25 +01:00
Arthur Eubanks f7788e1bff Revert "[NewPM] Only invalidate modified functions' analyses in CGSCC passes"
This reverts commit d14d84af2f.

Causes unacceptable memory regressions.
2021-05-21 16:38:03 -07:00
Florian Hahn a0ce6439ca
[Matrix] Remove unused matrix-propagate-shape option.
The option was used during the initial bringup, but it does not add any
value at this point. Remove it.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D102930
2021-05-21 19:01:54 +01:00
maekawatoshiki fd53cb4148 Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"
This reverts commit cea7a3fe3d.
To investigate sanitizer-x86_64-linux-fast failure.
2021-05-22 01:40:43 +09:00
maekawatoshiki cea7a3fe3d [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-05-21 23:57:39 +09:00
Alexey Bataev 8dab25954b [SLP]Improve handling of compensate external uses cost.
External insertelement users can be represented as a result of shuffle
of the vectorized element and noconsecutive insertlements too. Added
support for handling non-consecutive insertelements.

Differential Revision: https://reviews.llvm.org/D101555
2021-05-21 07:45:31 -07:00
Djordje Todorovic cd49b3ae1a [DebugInfo] Salvage dbg.value() during ADCE
This has been found by using the [0].

[0] https://llvm.org/docs/HowToUpdateDebugInfo.html#\
    test-original-debug-info-preservation-in-optimizations

 Differential Revision: https://reviews.llvm.org/D100844
2021-05-21 05:25:59 -07:00
Daniil Fukalov e8e88c3353 [TTI] NFC: Change getRegUsageForType to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102541
2021-05-21 15:17:23 +03:00
Stephen Tozer 36ec97f76a 3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This reapplies c0f3dfb9, which was reverted following the discovery of
crashes on linux kernel and chromium builds - these issues have since
been fixed, allowing this patch to re-land.

This reverts commit 4397b7095d.
2021-05-21 11:06:20 +01:00
Djordje Todorovic b9076d119a Recommit: "[Debugify][Original DI] Test dbg var loc preservation""
[Debugify][Original DI] Test dbg var loc preservation

    This is an improvement of [0]. This adds checking of
    original llvm.dbg.values()/declares() instructions in
    optimizations.

    We have picked a real issue that has been found with
    this (actually, picked one variable location missing
    from [1] and resolved the issue), and the result is
    the fix for that -- D100844.

    Before applying the D100844, using the options from [0]
    (but with this patch applied) on the compilation of GDB 7.11,
    the final HTML report for the debug-info issues can be found
    at [1] (please scroll down, and look for
    "Summary of Variable Location Bugs"). After applying
    the D100844, the numbers has improved a bit -- please take
    a look into [2].

    [0] https://llvm.org/docs/HowToUpdateDebugInfo.html#\
        test-original-debug-info-preservation-in-optimizations
    [1] https://djolertrk.github.io/di-check-before-adce-fix/
    [2] https://djolertrk.github.io/di-check-after-adce-fix/

    Differential Revision: https://reviews.llvm.org/D100845

The Unit test was failing because the pass from the test that
modifies the IR, in its runOnFunction() didn't return 'true',
so the expensive-check configuration triggered an assertion.
2021-05-21 02:04:29 -07:00
Xiang1 Zhang 5684851cb0 [HWASAN] No code changed, Only clang-format for HWAddressSanitizer.cpp 2021-05-21 14:00:34 +08:00
Jon Roelofs 0af3105b64 Revert "[Remarks] Add analysis remarks for memset/memcpy/memmove lengths"
This reverts commit 4bf69fb52b.

This broke spec2k6/403.gcc under -global-isel. Details to follow once I've
reduced the problem.
2021-05-20 12:19:16 -07:00
Kevin P. Neal f21f1eea05 [FPEnv] EarlyCSE support for constrained intrinsics, default FP environment edition
EarlyCSE cannot distinguish between floating point instructions and
constrained floating point intrinsics that are marked as running in the
default FP environment. Said intrinsics are supposed to behave exactly the
same as the regular FP instructions. Teach EarlyCSE to handle them in that
case.

Differential Revision: https://reviews.llvm.org/D99962
2021-05-20 14:40:51 -04:00
Reid Kleckner 8f20ac9595 [PGO] Don't reference functions unless value profiling is enabled
This reduces the size of chrome.dll.pdb built with optimizations,
coverage, and line table info from 4,690,210,816 to 2,181,128,192, which
makes it possible to fit under the 4GB limit.

This change can greatly reduce binary size in coverage builds, which do
not need value profiling. IR PGO builds are unaffected. There is a minor
behavior change for frontend PGO.

PGO and coverage both use InstrProfiling to create profile data with
counters. PGO records the address of each function in the __profd_
global. It is used later to map runtime function pointer values back to
source-level function names. Coverage does not appear to use this
information.

Recording the address of every function with code coverage drastically
increases code size. Consider this program:

  void foo();
  void bar();
  inline void inlineMe(int x) {
    if (x > 0)
      foo();
    else
      bar();
  }
  int getVal();
  int main() { inlineMe(getVal()); }

With code coverage, the InstrProfiling pass runs before inlining, and it
captures the address of inlineMe in the __profd_ global. This greatly
increases code size, because now the compiler can no longer delete
trivial code.

One downside to this approach is that users of frontend PGO must apply
the -mllvm -enable-value-profiling flag globally in TUs that enable PGO.
Otherwise, some inline virtual method addresses may not be recorded and
will not be able to be promoted. My assumption is that this mllvm flag
is not popular, and most frontend PGO users don't enable it.

Differential Revision: https://reviews.llvm.org/D102818
2021-05-20 11:09:24 -07:00
Sanjay Patel f34311c402 [GlobalOpt] recompute alignments for loads and stores of updated globals
GlobalOpt can slice structs/arrays and change GEPs in the process,
but it was not updating alignments for load/store users. This
eventually causes the crashing seen in:
https://llvm.org/PR49661
https://llvm.org/PR50253

On x86, this required SLP+codegen to create an aligned vector
store on an invalid address. The bugs would be easier to
demonstrate on a target with stricter alignment requirements.

I'm not sure if this is a complete solution. The alignment
updating code is adapted from InstCombine, so I assume that
part is tested and good.

Differential Revision: https://reviews.llvm.org/D102552
2021-05-20 12:12:21 -04:00
Alexey Bataev 182162b616 [SLP]Try to vectorize tiny trees with shuffled gathers of extractelements.
If we gather extract elements and they actually are just shuffles, it
might be profitable to vectorize them even if the tree is tiny.

Differential Revision: https://reviews.llvm.org/D101460
2021-05-20 08:36:16 -07:00
Djordje Todorovic 0ae3c1d4d7 Revert "[Debugify][Original DI] Test dbg var loc preservation"
This reverts commit 76f375f3d9.

This will be pushed again, after investigating a test failure:
https://lab.llvm.org/buildbot/#/builders/16/builds/11254
2021-05-20 07:11:35 -07:00
Djordje Todorovic 76f375f3d9 [Debugify][Original DI] Test dbg var loc preservation
This is an improvement of [0]. This adds checking of
original llvm.dbg.values()/declares() instructions in
optimizations.

We have picked a real issue that has been found with
this (actually, picked one variable location missing
from [1] and resolved the issue), and the result is
the fix for that -- D100844.

Before applying the D100844, using the options from [0]
(but with this patch applied) on the compilation of GDB 7.11,
the final HTML report for the debug-info issues can be found
at [1] (please scroll down, and look for
"Summary of Variable Location Bugs"). After applying
the D100844, the numbers has improved a bit -- please take
a look into [2].

[0] https://llvm.org/docs/HowToUpdateDebugInfo.html\
[1] https://djolertrk.github.io/di-check-before-adce-fix/
[2] https://djolertrk.github.io/di-check-after-adce-fix/

Differential Revision: https://reviews.llvm.org/D100845
2021-05-20 06:42:02 -07:00
Xiang1 Zhang 02f2d739e0 Revert "[HWASAN] Update the tag info for X86_64."
This reverts commit 81c18ce03c.
2021-05-20 13:12:59 +08:00
Xiang1 Zhang 81c18ce03c [HWASAN] Update the tag info for X86_64.
In LAM model X86_64 will use bits 57-62 (of 0-63) as HWASAN tag.
So here we make sure the tag shift position and tag mask is correct for x86-64.

Differential Revision: https://reviews.llvm.org/D102472
2021-05-20 11:22:12 +08:00
Zhiwei Chen dbc641deb9 [sanitizer] Reduce redzone size for small size global objects
Currently 1 byte global object has a ridiculous 63 bytes redzone.
This patch reduces the redzone size to be less than 32 if the size of global object is less than or equal to half of 32 (the minimal size of redzone).
A 12 bytes object has a 20 bytes redzone, a 20 bytes object has a 44 bytes redzone.

Reviewed By: MaskRay, #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D102469
2021-05-19 19:18:50 -07:00
Jon Roelofs 3d2ffc88e6 Fix warnings in windows bots. NFC 2021-05-19 17:42:34 -07:00
Jon Roelofs 4bf69fb52b [Remarks] Add analysis remarks for memset/memcpy/memmove lengths
Differential revision: https://reviews.llvm.org/D102452
2021-05-19 15:09:18 -07:00
wlei 6539a80bc9 [CSSPGO] Avoid deleting probe instruction in FoldValueComparisonIntoPredecessors
This change tries to fix a place missing `moveAndDanglePseudoProbes `. In FoldValueComparisonIntoPredecessors, it folds the BB into predecessors and then marked the BB unreachable. However, the original logic from the BB is still alive, deleting the probe will mislead the SampleLoader mark it as zero count sample.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D102721
2021-05-19 13:39:05 -07:00
Joseph Huber 2db182ff8d [Diagnostics] Allow emitting analysis and missed remarks on functions
Summary:
Currently, only `OptimizationRemarks` can be emitted using a Function.
Add constructors to allow this for `OptimizationRemarksAnalysis` and
`OptimizationRemarkMissed` as well.

Reviewed By: jdoerfert thegameg

Differential Revision: https://reviews.llvm.org/D102784
2021-05-19 15:10:20 -04:00
Roman Lebedev 40fb4eeff9
[NFCI][Local] TryToSimplifyUncondBranchFromEmptyBlock(): use DeleteDeadBlocks() 2021-05-19 20:38:30 +03:00
Roman Lebedev c60ca9856c
[NFCI][Local] MergeBlockIntoPredecessor(): use DeleteDeadBlocks() 2021-05-19 20:38:30 +03:00
Roman Lebedev b0bb2149b3
[NFCI][Local] removeUnreachableBlocks(): use DeleteDeadBlocks() 2021-05-19 20:38:30 +03:00
Philip Reames 449d14ebd2 Do actual DCE in LoopUnroll (try 4)
Turns out simplifyLoopIVs sometimes returns a non-dead instruction in it's DeadInsts out param.  I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed it's documentation.  I'm simplfy dropping that part of the change.

Commit message from try 3:

Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug. Oops. :)

Original commit message:

The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.

Differential Revision: https://reviews.llvm.org/D102511
2021-05-19 10:25:31 -07:00
Hongtao Yu 4ca6e37b98 [CSSPGO] Overwrite branch weight annotated in previous pass.
Sample profile loader can be run in both LTO prelink and postlink. Currently the counts annoation in postilnk doesn't fully overwrite what's done in prelink. I'm adding a switch (`-overwrite-existing-weights=1`) to enable a full overwrite, which includes:

1. Clear old metadata for calls when their parent block has a zero count. This could be caused by prelink code duplication.

2. Clear indirect call metadata if somehow all the rest targets have a sum of zero count.

3. Overwrite branch weight for basic blocks.

With a CS profile, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and CGSCC inliner working on obsolete metadata, which come from a partial global inlining in prelink.  It's not expected to work well for non-CS case with a less-accurate post-inline count quality.

It's worth calling out that some prelink optimizations can damage counts quality in an irreversible way. One example is the loop rotate optimization. Due to lack of exact loop entry count (profiling can only give loop iteration count and loop exit count), moving one iteration out of the loop body leaves the rest iteration count unknown. We had to turn off prelink loop rotate to achieve a better postlink counts quality. A even better postlink counts quality can be archived by turning off prelink CGSCC inlining which is not context-sensitive.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D102537
2021-05-19 09:12:24 -07:00
Amy Huang 517857421d Revert "Do actual DCE in LoopUnroll (try 3)"
This reverts commit b6320eeb86
as it causes clang to assert; see
https://reviews.llvm.org/rGb6320eeb8622f05e4a5d4c7f5420523357490fca.
2021-05-19 08:53:38 -07:00
David Sherwood 7e95a563c8 Remove scalable vector assert from InnerLoopVectorizer::setDebugLocFromInst
In InnerLoopVectorizer::setDebugLocFromInst we were previously
asserting that the VF is not scalable. This is because we want to
use the number of elements to create a duplication factor for the
debug profiling data. However, for scalable vectors we only know the
minimum number of elements. I've simply removed the assert for now
and added a FIXME saying that we assume vscale is always 1. When
vscale is not 1 it just means that the profiling data isn't as
accurate, but shouldn't cause any functional problems.
2021-05-19 13:33:10 +01:00
Roman Lebedev 8c2b535d6c
[NFCI][SimplifyCFG] removeEmptyCleanup(): use DeleteDeadBlock()
This required some changes to, instead of eagerly making PHI's
in the UnwindDest valid as-if the BB is already not a predecessor,
to be valid while BB is still a predecessor.
2021-05-19 14:08:25 +03:00
Roman Lebedev bb5d613aba
[NFCI][SimplifyCFG] removeEmptyCleanup(): streamline PHI node updating 2021-05-19 14:08:25 +03:00
Roman Lebedev a0be081646
[NFC][SimplifyCFG] removeEmptyCleanup(): use BasicBlock::phis() 2021-05-19 14:08:24 +03:00
Sander de Smalen 4f86aa650c [LV] Add -scalable-vectorization=<option> flag.
This patch adds a new option to the LoopVectorizer to control how
scalable vectors can be used.

Initially, this suggests three levels to control scalable
vectorization, although other more aggressive options can be added in
the future.

The possible options are:
- Disabled:   Disables vectorization with scalable vectors.
- Enabled:    Vectorize loops using scalable vectors or fixed-width
              vectors, but favors fixed-width vectors when the cost
              is a tie.
- Preferred:  Like 'Enabled', but favoring scalable vectors when the
              cost-model is inconclusive.

Reviewed By: paulwalker-arm, vkmr

Differential Revision: https://reviews.llvm.org/D101945
2021-05-19 10:40:56 +01:00
Roman Lebedev 57d20cbf46
[NFCI][SimplifyCFG] simplifyUnreachable(): use DeleteDeadBlock() 2021-05-19 12:04:22 +03:00
Roman Lebedev 69a43e5fc5
[NFCI][SimplifyCFG] simplifyReturn(): use DeleteDeadBlock() 2021-05-19 12:04:22 +03:00
Roman Lebedev 00f90e3fca
[NFCI][SimplifyCFG] simplifySingleResume(): use DeleteDeadBlock() 2021-05-19 12:04:22 +03:00
Roman Lebedev a4eb24c688
[NFCI][SimplifyCFG] simplifyCommonResume(): use DeleteDeadBlock() 2021-05-19 12:04:22 +03:00
Roman Lebedev 729e18cbf4
[NFCI] SimplifyCFGPass: mergeEmptyReturnBlocks(): use DeleteDeadBlocks()
In this case, it does the same thing as the original pattern does.

SimplifyCFG has a few lurking miscompilations about deleting blocks that
have their address taken, and consistently using DeleteDeadBlocks() instead
 of a hand-rolled pattern will allow to weed those cases out easierly.
2021-05-19 11:32:24 +03:00
Joseph Huber 68abc3d264 [Attributor] Change AAExecutionDomain to only accept intrinsics
Summary:
The OpenMP runtime functions don't always provide unique thread ID's to
determine if a basic block is truly single-threaded. Change the implementation
to only check NVPTX intrinsics for now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102700
2021-05-18 21:19:26 -04:00
Rong Xu 886629a8c9 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This patch implements first part of Flow Sensitive SampleFDO (FSAFDO).
It has the following changes:
(1) disable current discriminator encoding scheme,
(2) new hierarchical discriminator for FSAFDO.

For this patch, option "-enable-fs-discriminator=true" turns on the new
functionality. Option "-enable-fs-discriminator=false" (the default)
keeps the current SampleFDO behavior. When the fs-discriminator is
enabled, we insert a flag variable, namely, llvm_fs_discriminator, to
the object. This symbol will checked by create_llvm_prof tool, and used
to generate a profile with FS-AFDO discriminators enabled. If this
happens, for an extbinary format profile, create_llvm_prof tool
will add a flag to profile summary section.

Differential Revision: https://reviews.llvm.org/D102246
2021-05-18 16:23:43 -07:00
Arthur Eubanks b86302e500 [MSan] Set zeroext on call arguments to msan functions with zeroext parameter attribute
ABI attributes need to match between the caller and callee.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D102667
2021-05-18 14:07:39 -07:00
Arthur Eubanks 6b9524a05b [NewPM] Don't mark AA analyses as preserved
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.

SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102032
2021-05-18 13:49:03 -07:00
Nikita Popov e81334a754 [LICM] Remove MaybePromotable set (PR50367)
The MaybePromotable set keeps track of loads/stores for which
promotion was not attempted yet. Normally, any load/stores that
are promoted in the current iteration will be removed from this
set, because they naturally MustAlias with the promoted value.
However, if the source program has UB with metadata claiming that
a store is NoAlias, while it is actually MustAlias, and multiple
different pointers are promoted in the same iteration, it can
happen that a store is removed that is still in the MaybePromotable
set, causing a use-after-free.

While this could be fixed by explicitly invalidating values in
MaybePromotable in the LoopPromoter, I'm going with the more
radical option of dropping the set entirely here and check all
load/stores on each promotion iteration. As promotion, and especially
repeated promotion, are quite rare, this doesn't seem to have any
impact on compile-time.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50367.
2021-05-18 20:26:01 +02:00
Sanjay Patel 6d949a9c8f [InstCombine] restrict funnel shift match to avoid miscompile
As noted in the post-commit discussion for:
https://reviews.llvm.org/rGabd7529625a73f405e40a63dcc446c41d51a219e

...that change exposed a logic hole that allows a miscompile
if the shift amount could exceed the narrow width:
https://alive2.llvm.org/ce/z/-i_CiM
https://alive2.llvm.org/ce/z/NaYz28

The restriction isn't necessary for a rotate (same operand for
both shifts), so we should adjust the matching for the shift
value as a follow-up enhancement:
https://alive2.llvm.org/ce/z/ahuuQb
2021-05-18 13:32:07 -04:00
Florian Hahn cc1a6361d3
[VPlan] Add VPUserID to distinguish between recipes and others.
This allows cast/dyn_cast'ing from VPUser to recipes. This is needed
because there are VPUsers that are not recipes.

Reviewed By: gilr, a.elovikov

Differential Revision: https://reviews.llvm.org/D100257
2021-05-18 09:17:28 +01:00
Sander de Smalen 81fdc73e5d [LV] Return both fixed and scalable Max VF from computeMaxVF.
This patch introduces a new class, MaxVFCandidates, that holds the
maximum vectorization factors that have been computed for both scalable
and fixed-width vectors.

This patch is intended to be NFC for fixed-width vectors, although
considering a scalable max VF (which is disabled by default) pessimises
tail-loop elimination, since it can no longer determine if any chosen VF
(less than fixed/scalable MaxVFs) is guaranteed to handle all vector
iterations if the trip-count is known. This issue will be addressed in
a future patch.

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D98721
2021-05-18 08:03:48 +01:00
Adam Nemet ab1f6ffa56 [GVN] Improve analysis for missed optimization remark
This change tries to handle multiple dominating users of the pointer operand
by choosing the most immediately dominating one, if possible.  While making
this change I also found that the previous implementation had a missing break
statement, making all loads with an odd number of dominating users emit an
OtherAccess value, so that has also been fixed.

Patch by Henrik G Olsson!

Differential Revision: https://reviews.llvm.org/D79097
2021-05-17 21:51:15 -07:00
Philip Reames ed9d70781b Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)"
This reverts commit 6d3e3ae8a9.

Still seeing PPC build bot failures, and one arm self host bot failing.  I'm officially stumped, and need help from a bot owner to reduce.
2021-05-17 20:53:28 -07:00
Serguei Katkov 7bed58d28f [Inliner] Copy attributes when deoptimize intrinsic is inlined
During inlining of call-site with deoptimize intrinsic callee we miss
attributes set on this call site. As a result attributes like deopt-lowering are
disappeared resulting in inefficient behavior of register allocator in codegen.

Just copy attributes for deoptimize call like we do for others calls.

Reviewers: reames, apilipenko
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D102602
2021-05-18 10:08:37 +07:00
Adam Nemet fcffd087c6 [Matrix] Fold the transpose into the matmul operand used to fetch scalars
For column-major this is:
  A * B^t
whereas for row-major:
  A^t * B

Differential Revision: https://reviews.llvm.org/D101762
2021-05-17 17:40:46 -07:00
Philip Reames 6d3e3ae8a9 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)
Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll

Previous commit message...

This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-05-17 16:59:25 -07:00
Philip Reames d16da7343d Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"
This reverts commit c23ce54b36.  I apparently missed some newly added non-x86 tests.
2021-05-17 16:49:32 -07:00
Philip Reames c23ce54b36 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute
This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-05-17 16:33:56 -07:00
Philip Reames b6320eeb86 Do actual DCE in LoopUnroll (try 3)
Recommitting after fixing a bug found post commit.  Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug.  Oops.  :)

The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration.  Test case added in LoopUnroll/dce.ll to cover this case.

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.

Differential Revision: https://reviews.llvm.org/D102511
2021-05-17 14:47:02 -07:00
Sanjay Patel 3cdd05e519 [InstCombine] fold fnegs around select
This is one of the folds requested in:
https://llvm.org/PR39480

https://alive2.llvm.org/ce/z/NczU3V

Note - this uses the normal FMF propagation logic
(flags transfer from the final value to new/intermediate ops).
It's not clear if this matches what Alive2 implements,
so we may want to adjust one or the other.
2021-05-17 14:53:49 -04:00
Roman Lebedev 0633d5ce7b
[LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition.
I think i've added exhaustive test coverage, and i have verified that alive2 is happy with all the tests,
so in principle i'm fine with landing this without review, but just in case..

This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(unsigned val) {
    int cnt = 0;
    for( ; (val >> cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one, since that is what i need:
```
int countActiveBits(unsigned val, int start, int off) {
    int cnt;
    for (cnt = start; val >> (cnt + off); cnt++)
        ;
    return cnt;
}
```

I've followed in footstep of 'left-shift until bittest' idiom (D91038),
in the sense that iff the `ctlz` intrinsic is cheap, we'll transform,
regardless of all other factors.

This can have a shocking effect on certain benchmarks:
```
raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf
RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm
2021-05-09T01:06:05+03:00
Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
Run on (32 X 3600.24 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 5.26, 6.29, 3.49
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean          145 ms          145 ms          128   0.145319         0.999981   10.1568M       69.8949M        69.8936M      6.88159       6.88146   0.145322
p1319978.orf/threads:32/process_time/real_time_median        145 ms          145 ms          128   0.145317         0.999986   10.1568M       69.8941M        69.8931M      6.88151       6.88141   0.145319
p1319978.orf/threads:32/process_time/real_time_stddev      0.766 ms        0.766 ms          128   766.586u         15.1302u          0       354.167k        354.098k    0.0348699     0.0348631   766.469u
RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0
2021-05-09T01:06:24+03:00
Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Run on (32 X 3599.95 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 4.05, 5.95, 3.43
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean         99.8 ms         99.8 ms          128  0.0997758         0.999972   10.1568M       101.797M        101.794M      10.0225       10.0222  0.0997786
p1319978.orf/threads:32/process_time/real_time_median       99.7 ms         99.7 ms          128  0.0997165         0.999985   10.1568M       101.857M        101.854M      10.0284       10.0281  0.0997195
p1319978.orf/threads:32/process_time/real_time_stddev      0.224 ms        0.224 ms          128   224.166u          34.345u          0        226.81k        227.231k    0.0223309     0.0223723   224.586u
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
p1319978.orf/threads:32/process_time/real_time_mean                  -0.3134         -0.3134           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_median                -0.3138         -0.3138           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_stddev                -0.7073         -0.7078             1             0             1             0

```

Reviewed By: craig.topper, zhuhan0

Differential Revision: https://reviews.llvm.org/D102116
2021-05-17 20:33:33 +03:00
Hongtao Yu f28ee1a2b3 [CSSPGO] Update pseudo probe distribution factor based on inline context.
With prelink inlining, pseudo probes with same ID can come from different inline contexts. Such probes should not share samples and their factors should be fixed up separately.

I'm seeing 0.3% speedup for SPEC2017 overall. Benchmark 631.deepsjeng_s benefits the most, about 4%.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D102429
2021-05-16 23:11:36 -07:00
Philip Reames 6ae9893ed2 Revert "Do actual DCE in LoopUnroll (try 2)"
This reverts commit 653fa0b46a.

Reported to trigger pr50354.  Reverting until investigated.
2021-05-16 09:38:36 -07:00
Kuter Dinel 64ef29bc66 [Attributor] Call site specific AAValueSimplification and AAIsDead.
This patch makes it possible to do call site specific deductions
for AAValueSimplification and AAIsDead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84722
2021-05-15 21:39:07 +00:00
Simon Pilgrim e30540a603 SampleProfileLoader::inlineHotFunctionsWithPriority - Fix uninitialized variable warning. NFCI.
findIndirectCallFunctionSamples will leave Sum uninitialized if it returns an empty vector, we don't really use Sum in this case (but we do make a copy that isn't used either) - so ensure we initialize the value to zero to at least silence the static analysis warning.
2021-05-15 15:02:52 +01:00
Nikita Popov f9e9b0cdb4 [CFG] Move reachable from entry checks into basic block variant
These checks are not specific to the instruction based variant of
isPotentiallyReachable(), they are equally valid for the basic
block based variant. Move them there, to make sure that switching
between the instruction and basic block variants cannot introduce
regressions.
2021-05-15 15:42:02 +02:00
Simon Pilgrim f0660a977e [Local] collectBitParts - bail out if we find more than one root input value.
All the uses that we have for collectBitParts revolve around us matching down to an operation with a single root value - I don't think we're intending to change that (and a lot of collectBitParts assumes it).

The binops cases (OR/FSHL/FSHR) already check if the providers are the same, but that would still mean we waste time collecting through unaryops before getting to them.
2021-05-15 13:58:42 +01:00
Simon Pilgrim 401d6685c0 [InstCombine] InstCombinerImpl::visitOr - enable bitreverse matching
Currently we only match bswap intrinsics from or(shl(),lshr()) style patterns when we could often match bitreverse intrinsics almost as cheaply.

Differential Revision: https://reviews.llvm.org/D90170
2021-05-15 13:39:09 +01:00
Simon Pilgrim 28aa7d378a [Local] collectBitParts - early-out from binops. NFCI.
Minor speedup by not bothering to attempt to collect the second operand's bit parts if we already know its failed in the first operand.
2021-05-15 13:04:10 +01:00
Nikita Popov fb9ed1979a [IR] Add BasicBlock::isEntryBlock() (NFC)
This is a recurring and somewhat awkward pattern. Add a helper
method for it.
2021-05-15 12:41:58 +02:00
Vitaly Buka 6ce7b2f026 Fix "is not used" warning 2021-05-14 20:58:58 -07:00
Philip Reames fcd12fed41 Extract a helper routine to simplify D91481 [NFC] 2021-05-14 18:40:23 -07:00
Nick Desaulniers 8c72749bd9 [LowerConstantIntrinsics] reuse isManifestLogic from ConstantFolding
GlobalVariables are Constants, yet should not unconditionally be
considered true for __builtin_constant_p.

Via the LangRef
https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic:

    This intrinsic generates no code. If its argument is known to be a
    manifest compile-time constant value, then the intrinsic will be
    converted to a constant true value. Otherwise, it will be converted
    to a constant false value.

    In particular, note that if the argument is a constant expression
    which refers to a global (the address of which _is_ a constant, but
    not manifest during the compile), then the intrinsic evaluates to
    false.

Move isManifestConstant from ConstantFolding to be a method of
Constant so that we can reuse the same logic in
LowerConstantIntrinsics.

pr/41459

Reviewed By: rsmith, george.burgess.iv

Differential Revision: https://reviews.llvm.org/D102367
2021-05-14 15:35:21 -07:00
wlei e475d4d69f [CSSPGO] Fix return value of getProbeWeight
Currently we didn't support multiple return type, we work around to use error_code to represent:

1)  The dangling probe.
2)  Ignore the weight of non-probe instruction

While merging the instructions' weight for the whole BB, it will filter out the error code. But If all instructions of the BB give error_code, the outside logic will mark it as a BB requiring the inference algorithm to infer its weight. This is different from the zero value which will be treated as a cold block.

Fix one place that if we can't find the FunctionSamples in the profile data which indicates the BB is cold, we choose to return zero.

Also refine the comments.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D102007
2021-05-14 14:06:09 -07:00
Sanjay Patel e82db87fb1 [InstCombine] drop poison flags when simplifying 'shl' based on demanded bits
As with other transforms in demanded bits, we must be careful not to
wrongly propagate nsw/nuw if we are reducing values leading up to the shift.

This bug was introduced with 1b24f35f84 and leads to the miscompile
shown in:
https://llvm.org/PR50341
2021-05-14 13:54:13 -04:00
Philip Reames 653fa0b46a Do actual DCE in LoopUnroll (try 2)
Recommitting after addressing a missed review comment, and updating an aarch64 test I'd missed.

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.

Differential Revision: https://reviews.llvm.org/D102511
2021-05-14 10:42:36 -07:00
Philip Reames e488bf815f Revert "Do actual DCE in LoopUnroll"
This reverts commit 9d1a61e695.

I'd missed some review feedback, and had missed updating an aarch64 test.  Reverting while I fix both.
2021-05-14 10:15:30 -07:00
Philip Reames 9d1a61e695 Do actual DCE in LoopUnroll
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.

Differential Revision: https://reviews.llvm.org/D102511
2021-05-14 10:05:25 -07:00
Philip Reames 3f1c218318 [rs4gc] Strip memory related attributes consistently
I noticed that rs4gc is not stripping a number of memory aliasing related attributes. We do strip some from call sites, but don't strip the same ones from declarations or parameters.

Why do we need to strip these? Two answers:

    Safepoints conceptually read and write to the entire garbage collected heap in the physical model. We need this to preserve ordering of all loads and stores with respect to possible relocation.
    We can infer other attributes from these. For instance, readnone can imply both nofree and nosync. Both of which don't hold after physical rewriting.

Note: This exposed a latent issue which was fixed a couple weeks back in 01801d5274.

Differential Revision: https://reviews.llvm.org/D99802
2021-05-14 07:54:56 -07:00
Djordje Todorovic 01c90bbd4f [Transforms][Debugify] Fix "Missing line" false alarm on PHI nodes
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=49959

The "Missing line" false alarm was introduced in D75242.

Patch by Yilong Guo<yilong.guo@intel.com>

Differential Revision: https://reviews.llvm.org/D100446
2021-05-14 14:06:13 +02:00
Sander de Smalen f82966d19a [LoopVectorizationLegality] NFC: Mark some interfaces as 'const'
This patch marks blockNeedsPredication, isConsecutivePtr, isMaskRequired
and getSymbolicStrides as 'const'.
2021-05-14 11:53:54 +01:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
2021-05-14 11:43:58 +01:00
Simon Pilgrim 079bbea2b2 [Local] collectBitParts - for bswap-only matches, limit shift amounts to whole bytes to reduce compile time. 2021-05-14 11:42:52 +01:00
Simon Pilgrim 78c8451cd7 [Local] collectBitParts - reduce maximum recursion depth.
As noticed on D90170, the recursion depth for matching a maximum of a i128 bitwidth was too high.

@lebedev.ri mentioned that we can probably do better by limiting the number of collected Values instead of just depth, but I'll look at that later.
2021-05-14 11:42:51 +01:00
Anton Afanasyev 207cdd7ed9 [SLP] Fix spill cost computation for insertelement tree node
This is follow up for D98714, bugfixing.
2021-05-14 13:14:41 +03:00
Sander de Smalen 459c48e04f NFCI: Remove VF argument from isScalarWithPredication
As discussed in D102437, the VF argument to isScalarWithPredication
seems redundant, so this is intended to be a non-functional change. It
seems wrong to query the widening decision at this point. Removing the
operand and code to get the widening decision causes no unit/regression
tests to fail. I've also found no issues running the LLVM test-suite.

This subsequently removes the VF argument from isPredicatedInst as well,
since it is no longer required.
2021-05-14 10:34:40 +01:00
dfukalov fdae3fc8b3 [GVN] Clobber partially aliased loads.
Use offsets stored in `AliasResult` implemented in D98718.

Updated with fix of issue reported in https://reviews.llvm.org/D95543#2745161

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-05-14 11:17:14 +03:00
David Green f7cb654763 [DSE] Move isOverwrite into DSEState. NFC
This moves the isOverwrite function into the DSEState so that it can
share the analyses and members from the state.

A few extra loop tests were also added to test stores in and around
multi block loops for D100464.
2021-05-14 09:16:51 +01:00
Joseph Huber 8b57ed09bd [OpenMP] Prevent Attributor from deleting functions in OpenMPOptCGSCC pass
Summary:
This patch prevents the Attributor instances made in the CGSCC pass from
deleting functions. This prevents the attributor from changing the call
graph while OpenMPOpt is working with it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102363
2021-05-13 16:35:23 -04:00
cynecx 8ec9fd4839 Support unwinding from inline assembly
I've taken the following steps to add unwinding support from inline assembly:

1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:

```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
    to label %exit unwind label %uexit
```

2.) Add Bitcode writing/reading support + LLVM-IR parsing.

3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.

4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.

5.) Add clang support by introducing a new clobber: "unwind", which lower to the `canThrow` being enabled.

6.) Don't allow unwinding callbr.

Reviewed By: Amanieu

Differential Revision: https://reviews.llvm.org/D95745
2021-05-13 19:13:03 +01:00
Florian Hahn bdada7546e
[VPlan] Adjust assert in splitBlock to allow splitting at end.
SplitAt should only be dereferenced in the assert if it does not point
to the end of the block. This fixes a crash in the added test case.
2021-05-13 13:36:35 +01:00
Jingu Kang 107d19eb01 Revert "[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch"
This reverts commit 88b259c014.

It needs to fix below bugs.

https://bugs.llvm.org/show_bug.cgi?id=50279
https://bugs.llvm.org/show_bug.cgi?id=50302
2021-05-13 08:40:49 +01:00
Chuanqi Xu c1359ef07e [Coroutines] Salvege Debug.values
Summary: The previous implementation of coro-split didn't collect values
used by dbg instructions into the spills which made a log debug info
unavailable with optimization on.
This patch tries to collect these uses which are used by dbg.values. In
this way, the debugbility of coroutine could be as powerful as normal
functions with optimization on.

To avoid enlarging the coroutine frame, this patch only collects
`dbg.value` whose value is already in the coroutine frame. This decision
may make some debug info getting unavailable. But if we are with
optimization on, the performance issue should be considered first. And
this patch would make the debugbility of coroutine to be better only
without changing the layout of the frame.

Test-plan: check-llvm

Reviewed By: aprantl, lxfind

Differential Revision: https://reviews.llvm.org/D97673
2021-05-13 13:06:33 +08:00
Chuanqi Xu 6e5b8f489a [Coroutines] Enable printing coroutine frame when dbg info is available
Summary: This patch tries to build debug info for coroutine frame in the
middle end. Although the coroutine frame is constructed and maintained by
the compiler and the programmer shouldn't care about the coroutine frame
by the design of C++20 coroutine,
a lot of programmers told me that they want to see the layout of the
coroutine frame strongly. Although C++ is designed as an abstract layer
so that the programmers shouldn't care about the actual memory in bits,
many experienced C++ programmers  are familiar with assembler and
debugger to see the memory layout in fact, After I was been told they
want to see the coroutine frame about 3 times, I think it is an actual
and desired demand.

However, the debug information is constructed in the front end and
coroutine frame is constructed in the middle end. This is a natural and
clear gap. So I could only try to construct the debug information in the
middle end after coroutine frame constructed. It is unusual, but we are
in consensus that the approch is the best one.

One hard part is we need construct the name for variables since there
isn't a map from llvm variables to DIVar. Then here is the strategy this
patch uses:
- The name `__resume_fn `, `__destroy_fn` and `__coro_index ` are
  constructed by the patch.
- Then the name `__promise` comes from the dbg.variable of corresponding
  dbg.declare of PromiseAlloca, which shows highest priority to
construct the debug information for the member of coroutine frame.
- Then if the member is struct, we would try to get the name of the llvm
  struct directly. Then replace ':' and '.' with '_' to make it
printable for debugger.
- If the member is a basic type like integer or double, we would try to
  emit the corresponding name.
- Then if the member is a Pointer Type, we would add `Ptr` after
  corresponding pointee type.
- Otherwise, we would name it with 'UnknownType'.

Reviewered by: lxfind, aprantl, rjmcall, dblaikie

Differential Revision: https://reviews.llvm.org/D99179
2021-05-13 12:43:08 +08:00
Anton Afanasyev ab2c499d3a [SLP] Add insertelement instructions to vectorizable tree
Add new type of tree node for `InsertElementInst` chain forming vector.
These instructions could be either removed, or replaced by shuffles during
vectorization and we can add this node to cost model, so naturally estimating
their cost, getting rid of `CompensateCost` tricks and reducing further work
for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this
patch is the first step towards revectorization of partially vectorization
(to fix PR42022 completely). After adding inserts to tree the next step is
to add vector instructions there (for instance, to merge `store <2 x float>`
and `store <2 x float>` to `store <4 x float>`).

Fixes PR40522 and PR35732.

Differential Revision: https://reviews.llvm.org/D98714
2021-05-13 07:41:45 +03:00
Justin Bogner e7d26aceca Change the context instruction for computeKnownBits in LoadStoreVectorizer pass
This change enables cases for which the index value for the first
load/store instruction in a pair could be a function argument. This
allows using llvm.assume to provide known bits information in such
cases.

Patch by Viacheslav Nikolaev. Thanks!

Differential Revision: https://reviews.llvm.org/D101680
2021-05-12 15:29:29 -07:00
Nikita Popov a8f7dee1df [InstCombine] Support one-hot merge for logical and/or
If a logical and/or is used, we need to be careful not to propagate
a potential poison value from the RHS by inserting a freeze
instruction. Otherwise it works the same way as bitwise and/or.

This is intended to address the regression reported at
https://reviews.llvm.org/D101191#2751002.

Differential Revision: https://reviews.llvm.org/D102279
2021-05-12 21:01:18 +02:00
Stelios Ioannou 1124ad2f5d [LoopFlatten] Simplify loops so that the pass can operate on unsimplified loops.
The loop flattening pass requires loops to be in simplified form. If the
loops are not in simplified form, the pass cannot operate. This patch
simplifies all loops before flattening. As a result, all loops will be
simplified regardless of whether anything ends up being flattened.

This change was inspired by observing a certain loop that was not flatten
because the loops were not in simplified form. This loop is added as a
test to verify that it is now flattened.

Differential Revision: https://reviews.llvm.org/D102249

Change-Id: I45bcabe70fb99b0d89f0effafc82eb9e0585ec30
2021-05-12 19:22:01 +01:00
Roman Lebedev 554b1bced3
[InstCombine] ~(C + X) --> ~C - X (PR50308)
We can not rely on (C+X)-->(X+C) already happening,
because we might not have visited that `add` yet.
The added testcase would get stuck in an endless combine loop.
2021-05-12 16:10:55 +03:00
David Sherwood b7a11274f9 [LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors
In InnerLoopVectorizer::widenPHIInstruction there are cases where we have
to scalarise a pointer induction variable after vectorisation. For scalable
vectors we already deal with the case where the pointer induction variable
is uniform, but we currently crash if not uniform. For fixed width vectors
we calculate every lane of the scalarised pointer induction variable for a
given VF, however this cannot work for scalable vectors. In this case I
have added support for caching the whole vector value for each unrolled
part so that we can always extract an arbitrary element. Additionally, we
still continue to cache the known minimum number of lanes too in order
to improve code quality by avoiding an extractelement operation.

I have adapted an existing test `pointer_iv_mixed` from the file:

  Transforms/LoopVectorize/consecutive-ptr-uniforms.ll

and added it here for scalable vectors instead:

  Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Differential Revision: https://reviews.llvm.org/D101294
2021-05-12 11:02:11 +01:00
Qiu Chaofan 6d2df18163 [VectorComine] Restrict single-element-store index to inbounds constant
Vector single element update optimization is landed in 2db4979. But the
scope needs restriction. This patch restricts the index to inbounds and
vector must be fixed sized. In future, we may use value tracking to
relax constant restrictions.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D102146
2021-05-12 13:18:20 +08:00
Congzhe Cao 3f8be15f29 [LoopInterchange] Handle lcssa PHIs with multiple predecessors
This is a bugfix in the transformation phase.

If the original outer loop header branches to both the inner loop
(header) and the outer loop latch, and if there is an lcssa PHI
node outside the loop nest, then after interchange the new outer latch
will have an lcssa PHI node inserted which has two predecessors, i.e.,
the original outer header and the original outer latch. Currently
the transformation assumes it has only one predecessor (the original
outer latch) and crashes, since the inserted lcssa PHI node does
not take both predecessors as incoming BBs.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D100792
2021-05-11 21:30:54 -04:00
Jordan Rupprecht fec2945998 Revert "[GVN] Clobber partially aliased loads."
This reverts commit 6c57044231.

It causes assertion errors due to widening atomic loads, and potentially causes miscompile elsewhere too. Repro, also posted to D95543:

```
$ cat repro.ll
; ModuleID = 'repro.ll'
source_filename = "repro.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.widget = type { i32 }
%struct.baz = type { i32, %struct.snork }
%struct.snork = type { %struct.spam }
%struct.spam = type { i32, i32 }

@global = external local_unnamed_addr global %struct.widget, align 4
@global.1 = external local_unnamed_addr global i8, align 1
@global.2 = external local_unnamed_addr global i32, align 4

define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 {
bb:
  %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1
  %tmp1 = bitcast %struct.snork* %tmp to i64*
  %tmp2 = load i64, i64* %tmp1, align 4
  %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1
  %tmp4 = icmp ugt i64 %tmp2, 4294967295
  br label %bb5

bb5:                                              ; preds = %bb14, %bb
  %tmp6 = load i32, i32* %tmp3, align 4
  %tmp7 = icmp ne i32 %tmp6, 0
  %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false
  %tmp9 = zext i1 %tmp8 to i8
  store i8 %tmp9, i8* @global.1, align 1
  %tmp10 = load i32, i32* @global.2, align 4
  switch i32 %tmp10, label %bb11 [
    i32 1, label %bb12
    i32 2, label %bb12
  ]

bb11:                                             ; preds = %bb5
  br label %bb14

bb12:                                             ; preds = %bb5, %bb5
  %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4
  br label %bb14

bb14:                                             ; preds = %bb12, %bb11
  br label %bb5
}
$ opt -O2 repro.ll -disable-output
opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value *llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst *, unsigned int, llvm::Type *, llvm::Instruction *, const llvm::DataLayout &): Assertion `SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output
...
```
2021-05-11 16:08:53 -07:00
Congzhe Cao 40e3aa39bd [LoopInterchange] Fix legality for triangular loops
This is a bug fix in legality check.

When we encounter triangular loops such as the following form:
    for (int i = 0; i < m; i++)
      for (int j = 0; j < i; j++), or

    for (int i = 0; i < m; i++)
      for (int j = 0; j*i < n; j++),

we should not perform interchange since the number of executions
of the loop body will be different before and after interchange,
resulting in incorrect results.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D101305
2021-05-11 18:36:53 -04:00
Congzhe Cao d3f89d4d16 Revert "[LoopInterchange] Fix legality for triangular loops"
This reverts commit 29342291d2.

The test case requires an assert build. Will add REQUIRES and re-commit.
2021-05-11 18:10:58 -04:00
Nikita Popov 1556540372 [InstCombine] Clean up one-hot merge optimization (NFC)
Remove the requirement that the instruction is a BinaryOperator,
make the predicate check more compact and use slightly more
meaningful naming for the and operands.
2021-05-11 23:22:11 +02:00
Fangrui Song 129f466e22 [GlobalOpt] Remove heap SROA
GlobalOpt implements a heap SROA (SROA for an malloc allocatated struct or array
of structs) which is largely undertested (heap-sra-[1234].ll are basically the
same test with very little difference) and does not trigger at all when
bootstrapping clang (it only supports the case of one single store).

The heap SROA implementation causes PR50027 (GEP is not properly handled; crash or miscompile).
Just drop the implementation. I have deleted some obviously duplicated tests
but kept `heap-sra-[12]{,-no-nullopt}.ll`.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102257
2021-05-11 11:34:37 -07:00
Eli Friedman 61cbbba7a6 [ArgumentPromotion] Fix byval alignment handling.
Make sure the alignment of the generated operations matches the
alignment of the byval argument.  Previously, we were just ignoring
alignment and getting lucky.

While I'm here, also delete the unnecessary "tail" handling.
Passing a pointer to a byval argument to a "tail" call is UB, so
rewriting to an alloca doesn't require any special handling.

Differential Revision: https://reviews.llvm.org/D89819
2021-05-11 11:22:18 -07:00
Congzhe Cao 29342291d2 [LoopInterchange] Fix legality for triangular loops
This is a bug fix in legality check.

When we encounter triangular loops such as the following form:
    for (int i = 0; i < m; i++)
      for (int j = 0; j < i; j++), or

    for (int i = 0; i < m; i++)
      for (int j = 0; j*i < n; j++),

we should not perform interchange since the number of executions of the loop body
will be different before and after interchange, resulting in incorrect results.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D101305
2021-05-11 11:00:46 -04:00
Florian Hahn faebc6bf10
[VPlan] Register recipe for instr if the simplified value is recipe.
If the simplified VPValue is a recipe, we need to register it for Instr,
in case it needs to be recorded. The way this is handled in general may
change soon, following some post-commit comments.

This fixes PR50298.
2021-05-11 14:32:34 +01:00
Sanjay Patel 49950cb1f6 [SLP] restrict matching of load combine candidates
The test example from https://llvm.org/PR50256 (and reduced here)
shows that we can match a load combine candidate even when there
are no "or" instructions. We can avoid that by confirming that we
do see an "or". This doesn't apply when matching an or-reduction
because that match begins from the operands of the reduction.

Differential Revision: https://reviews.llvm.org/D102074
2021-05-11 08:46:40 -04:00
Sanjay Patel 5577e86691 [InstCombine] fold extract subvector of bitcast insertelt
This is visible in the original example from:
https://llvm.org/PR50055
(but this change doesn't solve the bug)

https://alive2.llvm.org/ce/z/vM_Yq-
2021-05-10 17:20:10 -04:00
Nikita Popov 463ea28e96 [InstCombine] Fold comparison of integers by parts
Let's say you represent (i32, i32) as an i64 from which the parts
are extracted with lshr/trunc. Then, if you compare two tuples by
parts you get something like A[0] == B[0] && A[1] == B[1], just
that the part extraction happens by lshr/trunc and not a narrow
load or similar.

The fold implemented here reduces such equality comparisons by
converting them into a comparison on a larger part of the integer
(which might be the whole integer). It handles both the "and of eq"
and the conjugated "or of ne" case.

I'm being conservative with one-use for now, though this could be
relaxed if profitable (the base pattern converts 11 instructions
into 5 instructions, but there's quite a few variations on how it
can play out).

Differential Revision: https://reviews.llvm.org/D101232
2021-05-10 22:22:39 +02:00
Nikita Popov aa9b02ac75 [Inliner] Fix noalias metadata handling for instructions simplified during cloning (PR50270)
Instead of using VMap, which may include instructions from the
caller as a result of simplification, iterate over the
(FirstNewBlock, Caller->end()) range, which will only include new
instructions.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50270.

Differential Revision: https://reviews.llvm.org/D102110
2021-05-10 21:59:59 +02:00
Sanjay Patel 88d8f10baf [PassManager] add helper function to hold set of vector passes (2nd try)
This is better no-functional-change-intended than the 1st attempt.
As noted in D102002, there were at least 2 diffs that went
unchecked in pass manager regressions tests: different pass
parameters (SimplifyCFG) and an extension point/callback.
Those should be lifted from the original code blocks correctly
now.
2021-05-10 14:43:00 -04:00
Sanjay Patel 822be4bec8 Revert "[PassManager] add helper function to hold set of vector passes"
This reverts commit fefcb1f878.
It was supposed to be NFC, but as noted in the post-commit
comments in D102002, that was not true: SimplifyCFG uses
different parameters and there's a difference in an
extension point / callback.
2021-05-10 10:59:30 -04:00
Alexey Bataev 30463bc3f1 [SLP]Do not count perfect diamond matches for gathers several times.
Need to remove the old code for avoiding double counting of the gather
nodes with perfect diamond matches within the tree after we started
detecting perfect/shuffled matching in the previous patch D100495. We
may skip the cost for such nodes completely.

Differential Revision: https://reviews.llvm.org/D102023
2021-05-10 07:08:07 -07:00
Teresa Johnson 220f6e5271 [SimplifyCFG] Ignore ephemeral values when counting insts for threading
Ignore ephemeral values (only feeding llvm.assume intrinsics) when
computing the instruction count to decide if a block is small enough for
threading. This is similar to the handling of these values in the
InlineCost computation. These instructions will eventually be removed
and shouldn't count against code size (similar to the existing ignoring
of phis).

Without this change, when enabling -fwhole-program-vtables, which causes
type test / assume sequences to be inserted by clang, we can get
different threading decisions. In particular, when building with
instrumentation FDO it can affect the optimizations decisions before FDO
matching, leading to some mismatches.

Differential Revision: https://reviews.llvm.org/D101494
2021-05-09 19:06:54 -07:00
Roman Lebedev 1acd9a1a29
Revert "[LICM] Hoist loads with invariant.group metadata"
This appears to miscompile google benchmark's GetCacheSizesFromKVFS()
when compiling with -fstrict-vtable-pointers.
Runnable reproducer: https://godbolt.org/z/f9ovKqTzb
The "f.fail()" crashes with BUS error, it is compiled into testb,
and the adress it is testing is non-sensical.

This reverts commit 4c89bcadf6.
2021-05-08 15:44:49 +03:00
Qiu Chaofan 2db4979c0f [VectorCombine] Simplify to scalar store if only one element updated
This patch simplifies load-insertelt-store pattern into
getelementptr-store.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D98240
2021-05-08 18:14:51 +08:00
Arthur Eubanks 34a8a437bf [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797
2021-05-07 21:51:47 -07:00
Adrian Prantl c6ddf669dc Fix the module-enabled build by removing a redundant type definition. 2021-05-07 14:45:17 -07:00
Florian Hahn 75b9997760
[LV] Remove reference of PHI from comment, they are not recorded (NFC).
The comment incorrectly states that the PHI is recorded. That's not
accurate, only the recipe for the incoming value is recorded.

Suggested post-commit for 4ba8720f88.
2021-05-07 21:34:23 +01:00
Florian Hahn 337d765282
[LV] Assert if trying to sink replicate region into another region (NFC)
Currently sinking a replicate region into another replicate region is
not supported. Add an assert, to make the problem more obvious, should
it occur.

Discussed post-commit for ccebf7a109.
2021-05-07 21:25:35 +01:00
Florian Hahn 01c26d4e04
[LV] Rename Region to TargetRegion, similar to SinkRegion (NFC).
Adjust the name to make it clearer this is the region containing the
target recipe, similar to SinkRegion below.

Suggested post-commit for ccebf7a109.
2021-05-07 21:25:35 +01:00
Fangrui Song d8aba75a76 Internalize some cl::opt global variables or move them under namespace llvm 2021-05-07 11:15:43 -07:00
Caroline Concatto cf06c8eee3 [LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction
The function fixReduction used to assert/crash for scalable vector when
a vector reduce could be done with a smaller vector.
This patch removes this assertion as it is safe to use scalable vector for
vector reduce and truncate.

Differential Revision: https://reviews.llvm.org/D101260
2021-05-07 09:37:37 +01:00
Sanjay Patel fefcb1f878 [PassManager] add helper function to hold set of vector passes
This is no-functional-change-intended (NFC) and split off from
D102002 (which proposes to eliminate the LTO-based differences).
2021-05-06 15:36:15 -04:00
Simon Pilgrim 338c1b701f [SLP] Constify the TreeEntry* input into getEntryCost() + setInsertPointAfterBundle(). NFCI. 2021-05-06 16:20:19 +01:00
Simon Pilgrim 2dab059021 [SLP] Constify the TreeEntry* input into dumpTreeCosts(). NFCI. 2021-05-06 16:20:19 +01:00
Simon Pilgrim 1b47489fd0 [SLP] Use empty() instead of size() == 0. NFCI. 2021-05-06 16:20:18 +01:00
David Green 4979c90458 [LV] Account for tripcount when calculation vectorization profitability
The loop vectorizer will currently assume a large trip count when
calculating which of several vectorization factors are more profitable.
That is often not a terrible assumption to make as small trip count
loops will usually have been fully unrolled. There are cases however
where we will try to vectorize them, and especially when folding the
tail by masking can incorrectly choose to vectorize loops that are not
beneficial, due to the folded tail rounding the iteration count up for
the vectorized loop.

The motivating example here has a trip count of 5, so either performs 5
scalar iterations or 2 vector iterations (with VF=4). At a high enough
trip count the vectorization becomes profitable, but the rounding up to
2 vector iterations vs only 5 scalar makes it unprofitable.

This adds an alternative cost calculation when we know the max trip
count and are folding tail by masking, rounding the iteration count up
to the correct number for the vector width. We still do not account for
anything like setup cost or the mixture of vector and scalar loops, but
this is at least an improvement in a few cases that we have had
reported.

Differential Revision: https://reviews.llvm.org/D101726
2021-05-06 12:36:46 +01:00
Kerry McLaughlin 8c9742bd23 [SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences
Adds support for scalable vectorization of loops containing first-order recurrences, e.g:
```
for(int i = 0; i < n; i++)
  b[i] =  a[i] + a[i - 1]
```
This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into
account when inserting into and extracting from the last lane of a vector.
CreateVectorSplice has been added to construct a vector for the recurrence, which
returns a splice intrinsic for scalable types. For fixed-width the behaviour
remains unchanged as CreateVectorSplice will return a shufflevector instead.

The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D101076
2021-05-06 11:35:39 +01:00
Juneyoung Lee 8a156d1c27 [InstCombine] Fully disable select to and/or i1 folding
This is a patch that disables the poison-unsafe select -> and/or i1 folding.

It has been blocking D72396 and also has been the source of a few miscompilations
described in llvm.org/pr49688 .
D99674 conditionally blocked this folding and successfully fixed the latter one.
The former one was still blocked, and this patch addresses it.

Note that a few test functions that has `_logical` suffix are now deoptimized.
These are created by @nikic to check the impact of disabling this optimization
by copying existing original functions and replacing and/or with select.

I can see that most of these are poison-unsafe; they can be revived by introducing
freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll)
because I think they are good targets for freeze fix.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101191
2021-05-06 09:29:52 +09:00
Coplin, Jared 6251b2f7f6 Attach metadata to simplified masked loads and stores 2021-05-05 18:01:49 -05:00
Roman Lebedev 8048005739
[NFC][SimplifyCFG] Update documentation comments for SinkCommonCodeFromPredecessors() after 1886aad 2021-05-05 20:34:59 +03:00
Philip Reames 80e8025083 [LV] Workaround PR49900 (a crash due to analyzing partially mutated IR)
LoopVectorize has a fairly deeply baked in design problem where it will try to query analysis (primarily SCEV, but also ValueTracking) in the midst of mutating IR. In particular, the intermediate IR state does not represent the semantics of the original (or final) program.

Fixing this for real is hard, but all of the cases seen so far share a common symptom. In cases seen to date, the analysis being queried is the computation of the original loop's trip count. We can fix this particular instance of the issue by simply computing the trip count early, and caching it.

I want to be really clear that this is nothing but a workaround. It does nothing to fix the root issue, and at best, delays the time until we have to fix this for real. Florian and I have discussed an eventual solution in the review comments for https://reviews.llvm.org/D100663, but it's a lot of work.

Test taken from https://reviews.llvm.org/D100663.

Differential Revision: https://reviews.llvm.org/D101487
2021-05-05 09:56:28 -07:00
Sanjay Patel 0034197874 [InstCombine] improve readability; NFC 2021-05-05 11:05:47 -04:00
Juneyoung Lee 1fef5c88a6 [InstCombine] Fold more select of selects using isImpliedCondition
This is a simple folding that does these:

```
select x_inv, true, (select y, x, false)
=>
select x_inv, true, y
```
https://alive2.llvm.org/ce/z/-STJ2d

```
select (select y, x, false), true, x_inv
=>
select y, true, x_inv
```
https://alive2.llvm.org/ce/z/6ruYt6

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101807
2021-05-05 13:44:58 +09:00
Han Zhu da1cdffbb1 [loop-idiom] Hoist loop memcpys to loop preheader
For a simple loop like:
```
struct S {
  int x;
  int y;
  char b;
};

unsigned foo(S* __restrict__ a, S* b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i];

  return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %0 = bitcast %struct.S* %arrayidx2 to i8*
  %1 = bitcast %struct.S* %arrayidx to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.

With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %a1 = bitcast %struct.S* %a to i8*
  %b2 = bitcast %struct.S* %b to i8*
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  %0 = zext i32 %n to i64
  %1 = mul nuw nsw i64 %0, 12
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %2 = bitcast %struct.S* %arrayidx2 to i8*
  %3 = bitcast %struct.S* %arrayidx to i8*
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```

Reviewed By: zino

Differential Revision: https://reviews.llvm.org/D97667
2021-05-04 17:05:04 -07:00
Florian Hahn ccebf7a109
[VPlan] Properly handle sinking of replicate regions.
This patch updates the code that sinks recipes required for first-order
recurrences to properly handle replicate-regions. At the moment, the
code would just move the replicate recipe out of its replicate-region,
producing an invalid VPlan.

When sinking a recipe in a replicate-region, we have to sink the whole
region. To do that, we first need to split the block at the target
recipe and move the region in between.

This patch also adds a splitAt helper to VPBasicBlock to split a
VPBasicBlock at a given iterator.

Fixes PR50009.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100751
2021-05-04 22:36:01 +01:00
Xun Li def86413d4 [Coroutines] Do not add alloca to the frame if the size is 0
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=49916.
When the size of an alloca is 0, it will trigger an assertion in OptimizedStructLayout when being added to the frame.
Fix it by not adding it at all. We return index 0 (beginning of the frame) for all 0-sized allocas.

Differential Revision: https://reviews.llvm.org/D101841
2021-05-04 12:55:40 -07:00
Nikita Popov e20897726f [SimplifyCFG] Create logical or in SimplifyCondBranchToCondBranch()
We need to use a logical or instead of a bitwise or to preserve
poison behavior. Poison from the second condition should not
propagate if the first condition is true.

We were already handling this correctly in FoldBranchToCommonDest(),
but not in this fold. (There are still other folds with this issue.)
2021-05-04 19:51:30 +02:00
Nikita Popov 44fd4575b3 [SimplifyCFG] Extract helper for creating logical op (NFC) 2021-05-04 19:51:30 +02:00
Sanjay Patel a6f79b5671 [InstCombine] avoid infinite loops with select/icmp transforms
This fixes https://llvm.org/PR48900 , but as seen in the
regression tests prevents some optimizations.

There are a few options to restore those (switch to min/max
intrinsics, add larger pattern matching for select with
dominating condition, improve CVP), but we need to prevent
the bug 1st.
2021-05-04 11:54:06 -04:00
Florian Hahn 4ba8720f88
[VPlan] Representing backedge def-use feeding reduction phis.
This patch updates the code handling reduction recipes to also keep
track of the incoming value from the latch in the recipe. This is needed
to model the def-use chains completely in VPlan, so that it is possible
to replace the incoming value with an arbitrary VPValue.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D99294
2021-05-04 16:33:22 +01:00
Sander de Smalen 9931ae645e Reland "[LV] Calculate max feasible scalable VF."
Relands https://reviews.llvm.org/D98509

This reverts commit 51d648c119.
2021-05-04 15:44:41 +01:00
Simon Pilgrim b04148f777 Local.cpp - Avoid DebugLoc copies - use const reference from getDebugLoc. NFCI. 2021-05-04 14:31:50 +01:00
Simon Pilgrim 2bb41851a1 [Utils] recognizeBSwapOrBitReverseIdiom - support matching from funnel shift roots (PR40058)
We were missing bitreverse matches in cases where InstCombine had seen a byte-level rotation at the end of a bitreverse sequence (replacing or() with fshl()), hindering the exhaustive bitreverse matching in CodeGenPrepare later on.
2021-05-04 13:46:45 +01:00
Alexey Bataev 369cd2ae52 Revert "[SLP]Allow masked gathers only if allowed by target."
This reverts commit fd18547e07. Need to
add a check for the size of the vectorization tree to avoid some extra
vectorization.
2021-05-04 04:53:22 -07:00
Dávid Bolvanský 80b897e21b [InstCombine] ctpop(X) ^ ctpop(Y) & 1 --> ctpop(X^Y) & 1 (PR50094)
Original pattern: (__builtin_parity(x) ^ __builtin_parity(y))

LLVM rewrites it as: (__builtin_popcount(x) ^ __builtin_popcount(y)) & 1

Optimized form:  __builtin_popcount(X^Y) & 1

Alive proof: https://alive2.llvm.org/ce/z/-GdWFr

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D101802
2021-05-04 13:16:18 +02:00
Philip Reames 11326cbcdb [IndVarSimplify][NFC] Removed mayThrow from if-condition in predicateLoopExits of IndVarSimplify
Instruction has mayHaveSideEffects method that returns true if mayThrow return true because this is called internally in the first method.  As such, the call being removed is redundant.

Patch By: vdsered (Daniil Seredkin)
Differential Revision: https://reviews.llvm.org/D101685
2021-05-03 18:25:07 -07:00
Juneyoung Lee 24ce194cfe [InstCombine] generalize select + select/and/or folding using implied conditions
This patch optimizes the remaining possible cases in D101191 by generalizing isImpliedCondition()-based
foldings.

Assume that there is `op a, (select b, _, _)` where op is one of `and i1`, `or i1` or their select forms.

We can do the following optimization based on the result of `isImpliedCondition(a, b)`:

If a = true implies…
- b = true:
    - select a, (select b, A, B), false => select a, A, false : https://alive2.llvm.org/ce/z/WCnZYh
    - and a, (select b, A, B) => select a, A, false : https://alive2.llvm.org/ce/z/uZhcMG
- b = false:
    - select a, (select b, A, B), false => select a, B, false : https://alive2.llvm.org/ce/z/c2hJpV
    - and a, (select b, A, B) => select a, B, false : https://alive2.llvm.org/ce/z/5ggwMM

If a = false implies…
- b = true:
    - select a, true, (select b, A, B) => select a, true, A : https://alive2.llvm.org/ce/z/tidKvH
    - or a, (select b, A, B) =>  select a, true, A : https://alive2.llvm.org/ce/z/cC-uyb
- b = false:
    - select a, true, (select b, A, B) => select a, true, B : https://alive2.llvm.org/ce/z/ZXpJq9
    - or a, (select b, A, B) => select a, true, B : https://alive2.llvm.org/ce/z/hnDrJj

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101720
2021-05-04 09:42:06 +09:00
Arthur Eubanks d14d84af2f [NewPM] Only invalidate modified functions' analyses in CGSCC passes
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved.

So far this only touches the inliner, argpromotion, funcattrs, and
updateCGAndAnalysisManager(), since they are the most used.

Slight compile time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=326da4adcb8def2abdd530299d87ce951c0edec9&to=8942c7669f330082ef159f3c6c57c3c28484f4be&stat=instructions

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D100917
2021-05-03 17:21:44 -07:00
Joseph Huber 182831258b [Attributor] Add AAExecutionDomainInfo interface to OpenMPOpt
Summary:
Add the AAExecutionDomainInfo attributor instance to OpenMPOpt.
This will infer information relating to domain information that an
instruction might be expecting in. Right now this only includes a very
crude check for instructions that will be executed by the master thread
by comparing a thread-id function with a constant zero.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101578
2021-05-03 19:24:19 -04:00
Dávid Bolvanský 08c08577f9 [InstCombine] cttz(sext(x)) -> cttz(zext(x))
```

----------------------------------------
define i32 @src(i16 %x, i1 %b) {
%0:
  %z = sext i16 %x to i32
  %p = cttz i32 %z, %b
  ret i32 %p
}
=>
define i32 @tgt(i16 %x, i1 %b) {
%0:
  %z = zext i16 %x to i32
  %p = cttz i32 %z, %b
  ret i32 %p
}
Transformation seems to be correct!
```

https://alive2.llvm.org/ce/z/evomeg

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101764
2021-05-03 23:59:30 +02:00
Teresa Johnson ea817d79be [SimplifyCFG] Look for control flow changes instead of side effects.
When passingValueIsAlwaysUndefined scans for an instruction between an
inst with a null or undef argument and its first use, it was checking
for instructions that may have side effects, which is a superset of the
instructions it intended to find (as per the comments, control flow
changing instructions that would prevent reaching the uses). Switch
to using isGuaranteedToTransferExecutionToSuccessor() instead.

Without this change, when enabling -fwhole-program-vtables, which causes
assumes to be inserted by clang, we can get different simplification
decisions. In particular, when building with instrumentation FDO it can
affect the optimizations decisions before FDO matching, leading to some
mismatches.

I had to modify d83507-knowledge-retention-bug.ll since this fix enables
more aggressive optimization of that code such that it no longer tested
the original bug it was meant to test. I removed the undef which still
provokes the original failure (confirmed by temporarily reverting the
fix) and also changed it to just invoke the passes of interest to narrow
the testing.

Similarly I needed to adjust code for UnreachableEliminate.ll to avoid
an undef which was causing the function body to get optimized away with
this fix.

Differential Revision: https://reviews.llvm.org/D101507
2021-05-03 13:32:22 -07:00
Alexey Bataev fd18547e07 [SLP]Allow masked gathers only if allowed by target.
Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297
2021-05-03 08:06:20 -07:00
Dávid Bolvanský 27b651ca47 [InstCombine] cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true' (PR50172)
Zext doesn't change the number of trailing zeros, so narrow cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true'.

Proofs:
https://alive2.llvm.org/ce/z/o2dnjY

Solves https://bugs.llvm.org/show_bug.cgi?id=50172

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101582
2021-05-03 17:05:12 +02:00
Alexey Bataev 2e4cc9a725 Revert "[SLP]Allow masked gathers only if allowed by target."
This reverts commit b5f64768cf to fix
a compiler crash revealed by buildbots.
2021-05-03 07:20:00 -07:00
Alexey Bataev b5f64768cf [SLP]Allow masked gathers only if allowed by target.
Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297
2021-05-03 06:45:42 -07:00
Florian Hahn 2b7fa7f744 [LV] Iterate over recipes in VPlan to fix PHI (NFC).
As we gradually move more elements of LV to VPlan, we are trying to
reduce the number of places that still has to check IR of the original
loop.

This patch adjusts the code to fix cross iteration phis to get the PHIs
to fix directly from the VPlan that is executed. We still need the
original PHI to check for first-order recurrences, but we can get rid of
that once we model that explicitly in VPlan as well.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D99293
2021-05-03 14:09:46 +01:00
Sanjay Patel 1b24f35f84 [InstCombine] improve demanded bits analysis of left-shifted operand
If we don't demand high bits, then we also don't care about those
high bits of a left-shift operand regardless of shift amount.
I noticed the sext/trunc pattern in a motivating example.
It seems like there should be a low-bits with right-shift sibling,
but I haven't looked at that yet.

https://alive2.llvm.org/ce/z/JuS6jc
https://rise4fun.com/Alive/Trm (not sure how to use 'width' with Alive1)
https://alive2.llvm.org/ce/z/gRadbF

Differential Revision: https://reviews.llvm.org/D101489
2021-05-03 08:39:20 -04:00
Reshabh Sharma 9f51f1b927 [ASAN][AMDGPU] Add support for accesses to global and constant addrspaces
Add address sanitizer instrumentation support for accesses to global
and constant address spaces in AMDGPU. It strictly avoids instrumenting
the stack and assumes x86 as the host.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D99071
2021-05-03 09:01:15 +05:30
Florian Hahn 942e068d7a [VPlan] Add VPBasicBlock::phis() helper (NFC).
This patch introduces a helper to obtain an iterator range for the
PHI-like recipes in a block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100101
2021-05-02 19:20:13 +01:00
Juneyoung Lee 39eb2665d9 [InstCombine] Add a few more patterns for folding select of select
This is a patch that folds select of select to salvage some optimizations after select -> and/or folding is disabled.

```
select (select a, true, b), c, false -> select a, c, false
select c, (select a, true, b), false -> select c, a, false
  if c implies that b is false (isImpliedCondition).
```
https://alive2.llvm.org/ce/z/ANatjt, https://alive2.llvm.org/ce/z/rv8zTB

```
sel (sel c, a, false), true, (sel !c, b, false) -> sel c, a, b
sel (sel !c, a, false), true, (sel c, b, false) -> sel c, b, a
```
https://alive2.llvm.org/ce/z/U2kp-t, https://alive2.llvm.org/ce/z/bc88EE

See D101191

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101375
2021-05-02 19:00:42 +09:00
Juneyoung Lee 1977c53b2a [InstCombine] Fold overflow bit of [u|s]mul.with.overflow in a poison-safe way
As discussed in D101191, this patch adds a poison-safe folding of overflow bit check:
```
  %Op0 = icmp ne i4 %X, 0
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = select i1 %Op0, i1 %Op1, i1 false
=>
  %Y.fr = freeze %Y
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y.fr)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = %Op1
```

https://alive2.llvm.org/ce/z/zgPUGT
https://alive2.llvm.org/ce/z/h2gZ_6

Note that there are cases where inserting freeze is not necessary: e.g. %Y is `noundef`.
In this case, LLVM is already good because `%ret` is already successfully folded into `and`,
triggering the pre-existing optimization in InstSimplify: https://godbolt.org/z/v6qena15K

Differential Revision: https://reviews.llvm.org/D101423
2021-05-02 11:54:12 +09:00
Nikita Popov ffa5a402a9 [IndVars] Remove redundant loop invariance check (NFC)
This is checked again directly below this condition.
2021-05-01 17:22:00 +02:00
Nathan Chancellor 4397b7095d
Revert "Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
This reverts commit 791930d740, as per
https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy.

I observed breakage with the Linux kernel, as reported at
https://reviews.llvm.org/D91722#2724321

Fixes exist at
https://reviews.llvm.org/D101523
https://reviews.llvm.org/D101540

but they have not landed so to unbreak the tree for the weekend, revert
this commit.

Commit b11e4c9907 ("Revert "[DebugInfo] Drop DBG_VALUE_LISTs with an
excessive number of debug operands"") only reverted one follow-up fix,
not the original patch that broke the kernel.

e
2021-04-30 20:23:21 -07:00
George Balatsouras a45fd436ae [dfsan] Fix origin tracking for fast8
The problem is the following. With fast8, we broke an important
invariant when loading shadows.  A wide shadow of 64 bits used to
correspond to 4 application bytes with fast16; so, generating a single
load was okay since those 4 application bytes would share a single
origin.  Now, using fast8, a wide shadow of 64 bits corresponds to 8
application bytes that should be backed by 2 origins (but we kept
generating just one).

Let’s say our wide shadow is 64-bit and consists of the following:
0xABCDEFGH. To check if we need the second origin value, we could do
the following (on the 64-bit wide shadow) case:

 - bitwise shift the wide shadow left by 32 bits (yielding 0xEFGH0000)
 - push the result along with the first origin load to the shadow/origin vectors
 - load the second 32-bit origin of the 64-bit wide shadow
 - push the wide shadow along with the second origin to the shadow/origin vectors.

The combineOrigins would then select the second origin if the wide
shadow is of the form 0xABCDE0000.  The tests illustrate how this
change affects the generated bitcode.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D101584
2021-04-30 15:57:33 -07:00
Justin Bogner 9542721085 Add support for llvm.assume intrinsic to the LoadStoreVectorizer pass
Patch by Viacheslav Nikolaev. Thanks!
2021-04-30 13:39:46 -07:00
Alexey Bataev a3fd82c289 [SLP]Fix the crash on cost calculation if non-compatible vectors shuffled.
If the extracts from the non-power-2 vectors are recognized as shuffles,
need some extra checks to not crash cost calculations if trying to gext
the ecost for subvector extracts. In this case need to check carefully
that we do not exit out of bounds of the original vector, otherwise the
TTI's cost model will crash on assert.

Differential Revision: https://reviews.llvm.org/D101477
2021-04-30 09:34:20 -07:00
Jingu Kang 88b259c014 [SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch
Differential Revision: https://reviews.llvm.org/D99354
2021-04-30 15:55:56 +01:00
Evgeniy Brevnov 7861cb600c [NARY] Don't optimize min/max if there are side uses (part2)
Previous attempt to fix infinite recursion in min/max reassociation was not fully successful (D100170). Newly discovered failing case is due to not properly handled when there is a single use. It should be processed separately from 2 uses case.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D101359
2021-04-30 19:02:02 +07:00
Florian Hahn ed9df5bd2f
[Passes] Run sinking/hoisting in SimplifyCFG earlier.
Hoisting and sinking instructions out of conditional blocks enables
additional vectorization by:

1. Executing memory accesses unconditionally.
2. Reducing the number of instructions that need predication.

After disabling early hoisting / sinking, we miss out on a few
vectorization opportunities. One of those is causing a ~10% performance
regression in one of the Geekbench benchmarks on AArch64.

This patch tires to recover the regression by running hoisting/sinking
as part of a SimplifyCFG run after LoopRotate and before LoopVectorize.

Note that in the legacy pass-manager, we run LoopRotate just before
vectorization again and there's no SimplifyCFG run in between, so the
sinking/hoisting may impact the later run on LoopRotate. But the impact
should be limited and the benefit of hosting/sinking at this stage
should outweigh the risk of not rotating.

Compile-time impact looks slightly positive for most cases.
http://llvm-compile-time-tracker.com/compare.php?from=2ea7fb7b1c045a7d60fcccf3df3ebb26aa3699e5&to=e58b4a763c691da651f25996aad619cb3d946faf&stat=instructions

NewPM-O3: geomean -0.19%
NewPM-ReleaseThinLTO: geoman -0.54%
NewPM-ReleaseLTO-g: geomean -0.03%

With a few benchmarks seeing a notable increase, but also some
improvements.

Alternative to D101290.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101468
2021-04-30 12:23:57 +01:00
Alexey Bataev 12c51f2358 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 12:48:00 -07:00
Alexey Bataev 6e859f3cd4 Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."
This reverts commit 9239932221 to fix
a compiler crash on mask checks.
2021-04-29 12:40:33 -07:00
Sanjay Patel 0f8b6686ac [InstCombine] narrow popcount with zext operand
https://llvm.org/PR50141
2021-04-29 15:07:16 -04:00
Roman Lebedev cc63203908
[SimplifyCFG] Common code sinking: fix application of profitability check
The profitability check is: we don't want to create more than a single PHI
per instruction sunk. We need to create the PHI unless we'll sink
all of it's would-be incoming values.

But there is a caveat there.
This profitability check doesn't converge on the first iteration!
If we first decide that we want to sink 10 instructions,
but then determine that 5'th one is unprofitable to sink,
that may result in us not sinking some instructions that
resulted in determining that some other instruction
we've determined to be profitable to sink becoming unprofitable.

So we need to iterate until we converge, as in determine
that all leftover instructions are profitable to sink.

But, the direct approach of just re-iterating seems dumb,
because in the worst case we'd find that the last instruction
is unprofitable, which would result in revisiting instructions
many many times.

Instead, i think we can get away with just two passes - forward and backward.
However then it isn't obvious what is the most performant way to update
InstructionsToSink.
2021-04-29 21:11:40 +03:00
Alexey Bataev 9239932221 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 09:42:56 -07:00
Sander de Smalen 51d648c119 Revert "[LV] Calculate max feasible scalable VF."
Temporarily reverting this patch due to some unexpected issue found
by one of the PPC buildbots.

This reverts commit 584e9b6e4b.
2021-04-29 16:04:37 +01:00
Florian Hahn a0e1313c23
[VPlan] Add getVPSingleValue helper.
As suggested in D99294, this adds a getVPSingleValue helper to use for
recipes that are guaranteed to define a single value. This replaces uses
of getVPValue() which used to default to I = 0.
2021-04-29 13:37:38 +01:00
Reshabh Sharma 60c60dd138 [ASAN] NFC: Use addrspace cast for pointers in non-zero addrspace
Pointers in non-zero address spaces need to be address space
casted before appending to the used list.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D101363
2021-04-29 11:06:00 +05:30
Reshabh Sharma fc1df36e6e [ASAN] NFC: Copy address space when creating globals with redzones
This patch makes sure that globals in supported address spaces
will be replaced by globals with red zones in the same address
space by copying the address space.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D101362
2021-04-29 10:21:43 +05:30
Amanieu d'Antras ad9ce8142d [ConstantMerge] Don't merge thread_local constants with non-thread_local constants
Fixes PR49932

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D100322
2021-04-28 23:44:20 +01:00
Roman Lebedev 707ad01399
[SimplifyCFG] Common code sinking: fixup variable name
As noticed in post-commit review.

I've gone through several iterations of that name,
and somehow managed to end up with an incorrect one.
2021-04-29 01:24:16 +03:00
Dávid Bolvanský e20b32ff3b [BuildLibCalls] Remove inaccessiblememonly inference for calloc
Solves regression mentioned in PR50143.

As noted in D101440, proper modelling for calloc would require new attribute inaccessible_or_returned_memonly.
2021-04-29 00:17:37 +02:00
Roman Lebedev 1886aad9d0
[SimplifyCFG] Common code sinking: relax restriction on non-uncond predecessors
While we have a known profitability issue for sinking in presence of
non-unconditional predecessors, there isn't any known issues
for having multiple such non-unconditional predecessors,
so said restriction appears to be artificial. Lift it.
2021-04-29 01:01:01 +03:00
Roman Lebedev a8e273f2ed
[NFC][SimplifyCFG] Add test showing that profitability check for sinking is broken
Essentially, we can't promise that the instruction is sinkable without
introducing PHI's until we know that it is profitable to sink.
2021-04-29 01:01:00 +03:00
Roman Lebedev 12c8027ce3
[NFC][SimplifyCFG] Common code sinking: check profitability once
We can just eagerly pre-check all the instructions that we *could*
sink that we'd actually want to sink them, clamping the number of
instructions that we'll sink to stop just before the first unprofitable one.
2021-04-29 01:01:00 +03:00
Roman Lebedev 4c27ca21d9
[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): reword comment about PR30244 2021-04-29 01:01:00 +03:00
Bardia Mahjour ddb3b26a12 [LV] Consider Loop Unroll Hints When Making Interleave Decisions
This patch causes the loop vectorizer to not interleave loops that have
nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)).
Note that if a particular interleave count is being requested
(through llvm.loop.interleave_count), it will still be honoured, regardless
of the presence of nounroll hints.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101374
2021-04-28 17:27:52 -04:00
Sanjay Patel abd7529625 [InstCombine] relax masking requirement for truncated funnel/rotate match
I was investigating a seemingly unrelated improvement in demanded
bits for shift-left, but that caused regressions on these tests
because we were able to look through/eliminate the mask.

https://alive2.llvm.org/ce/z/Ztdr22

  define i8 @src(i32 %x, i32 %y, i32 %shift) {
  %and = and i32 %shift, 3
  %conv = and i32 %x, 255
  %shr = lshr i32 %conv, %and
  %sub = sub i32 8, %and
  %shl = shl i32 %y, %sub
  %or = or i32 %shr, %shl
  %conv2 = trunc i32 %or to i8
  ret i8 %conv2
  }

  define i8 @tgt(i32 %x, i32 %y, i32 %shift) {
  %x8 = trunc i32 %x to i8
  %y8 = trunc i32 %y to i8
  %shift8 = trunc i32 %shift to i8
  %and = and i8 %shift8, 3
  %conv2 = call i8 @llvm.fshr.i8(i8 %y8, i8 %x8, i8 %and)
  ret i8 %conv2
  }

  declare i8 @llvm.fshr.i8(i8,i8,i8)
2021-04-28 16:49:50 -04:00
Roman Lebedev d16d820c2e
[SimplifyCFG] Try 2: sink all-indirect indirect calls
Note that we don't want to turn a partially-direct call
into an indirect one, that will break ICP amongst other things.
2021-04-28 19:08:54 +03:00
Roman Lebedev 262c679d32
Revert "[SimplifyCFG] Sinking indirect calls - they're already indirect anyways"
Seems to break indirect call promotion, LTO/Resolution/X86/load-sample-prof-icp.ll fails.

This reverts commit e57cf128b3.
2021-04-28 17:46:59 +03:00
Roman Lebedev e57cf128b3
[SimplifyCFG] Sinking indirect calls - they're already indirect anyways 2021-04-28 17:36:23 +03:00
Dawid Jurczak 5f5974aeac [SimplifyLibCalls] Transform printf("%s", str) --> puts(str)/noop
Before this change LLVM cannot simplify printf in following cases:

printf("%s", "") --> noop
printf("%s", str"\n") --> puts(str)

From the other hand GCC can perform such transformations for many years:
https://godbolt.org/z/7nnqbedfe

Differential Revision: https://reviews.llvm.org/D100724
2021-04-28 10:29:07 -04:00
David Sherwood 00e65f3345 [LoopVectorize][SVE] Fix crash when vectorising FP negation
This patch fixes a crash encountered when vectorising the following loop:

 void foo(float *dst, float *src, long long n) {
   for (long long i = 0; i < n; i++)
     dst[i] = -src[i];
 }

using scalable vectors. I've added a test to

 Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

as well as cleaned up the other tests in the same file.

Differential Revision: https://reviews.llvm.org/D98054
2021-04-28 15:22:35 +01:00
Tres Popp f0e848e63d Silence unused variable warning 2021-04-28 15:46:09 +02:00
Alexey Bataev 8af4723c58 [SLP]Try to vectorize tiny trees with shuffled gathers.
If the first tree element is vectorize and the second is gather, it
still might be profitable to vectorize it if the gather node contains
less scalars to vectorize than the original tree node. It might be
profitable to use shuffles.

Differential Revision: https://reviews.llvm.org/D101397
2021-04-28 06:35:31 -07:00
David Sherwood 6998f8ae2d [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-28 13:41:07 +01:00
Sander de Smalen 584e9b6e4b [LV] Calculate max feasible scalable VF.
This patch also refactors the way the feasible max VF is calculated,
although this is NFC for fixed-width vectors.

After this change scalable VF hints are no longer truncated/clamped
to a shorter scalable VF, nor does it drop the 'scalable flag' from
the suggested VF to vectorize with a similar VF that is fixed.

Instead, the hint is ignored which means the vectorizer is free
to find a more suitable VF, using the CostModel to determine the
best possible VF.

Reviewed By: c-rhodes, fhahn

Differential Revision: https://reviews.llvm.org/D98509
2021-04-28 12:30:00 +01:00
Tres Popp efce19c3b0 Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
This reverts commit 75d6b8bb40.

The reasoning is mentioned in https://reviews.llvm.org/D97667
2021-04-28 13:16:34 +02:00
Kerry McLaughlin 9cc217ab36 [LoopVectorize] Prevent multiple Phis being generated with in-order reductions
When using the -enable-strict-reductions flag where UF>1 we generate multiple
Phi nodes, though only one of these is used as an input to the vector.reduce.fadd
intrinsics. The unused Phi nodes are removed later by instcombine.

This patch changes widenPHIInstruction/fixReduction to only generate
one Phi, and adds an additional test for unrolling to strict-fadd.ll

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D100570
2021-04-28 11:29:01 +01:00
Benjamin Kramer 7e5682ee62 [ADT] Make TrackingStatistic's ctor constexpr
This lets clang diagnose unused statistics, so remove them.
2021-04-28 12:00:17 +02:00
Dávid Bolvanský e81819377e [DSE] Eliminate zero memset after calloc
Solves PR11896

As noted, this can be improved futher (calloc -> malloc) in some cases. But for know, this is the first step.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101391
2021-04-28 03:30:52 +02:00
Han Zhu 75d6b8bb40 [loop-idiom] Hoist loop memcpys to loop preheader
For a simple loop like:
```
struct S {
  int x;
  int y;
  char b;
};

unsigned foo(S* __restrict__ a, S* b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i];

  return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %0 = bitcast %struct.S* %arrayidx2 to i8*
  %1 = bitcast %struct.S* %arrayidx to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.

With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %a1 = bitcast %struct.S* %a to i8*
  %b2 = bitcast %struct.S* %b to i8*
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  %0 = zext i32 %n to i64
  %1 = mul nuw nsw i64 %0, 12
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %2 = bitcast %struct.S* %arrayidx2 to i8*
  %3 = bitcast %struct.S* %arrayidx to i8*
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```

Reviewed By: zino

Differential Revision: https://reviews.llvm.org/D97667
2021-04-27 17:37:51 -07:00
Han Zhu cd13f19031 [loop-idiom][NFC] Extract processLoopStoreOfLoopLoad into a helper function
Differential Revision: https://reviews.llvm.org/D100979
2021-04-27 13:42:30 -07:00
Sanjay Patel 025bb52903 [InstCombine] fold clamp to 2 values from min/max intrinsics
The "select" versions of these folds is also missing and can
cause infinite loops as shown in:
https://llvm.org/PR48900
...but it seems easier to match these as max/min as a first fix.

https://alive2.llvm.org/ce/z/wv-_dT
2021-04-27 15:35:49 -04:00
David Sherwood 6968520c3b Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost"
This reverts commit 4afeda9157.
2021-04-27 15:46:03 +01:00
David Sherwood 4afeda9157 [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-27 15:26:15 +01:00
Alexey Bataev 24590d8d67 [SLP]Improved isGatherShuffledEntry, NFC.
Reworked isGatherShuffledEntry function, simplified and moved
common code to the lambda (it shall go away when non-power-2 patch will
be landed).
2021-04-27 05:59:46 -07:00
Florian Hahn cb96d802d4
[LV] Hoist code to get vector loop latch (NFC).
Address suggestion from D99294.
2021-04-27 13:30:17 +01:00
Sanjay Patel e808289fe6 [IndVars] avoid crash in LFTR when assuming an add recurrence
The test is a crasher reduced from:
https://llvm.org/PR49993

linearFunctionTestReplace() assumes that we have an add recurrence,
so check for that as a condition of matching a loop counter.

Differential Revision: https://reviews.llvm.org/D101291
2021-04-27 08:26:02 -04:00
Florian Hahn 160e729cf0
[VPlan] Use recursive traversal iterator in VPSlotTracker.
This patch simplifies VPSlotTracker by using the recursive traversal
iterator to traverse all blocks in a VPlan in reverse post-order when
numbering VPValues in a plan.

This depends on a fix to RPOT (D100169). It also extends the traversal
unit tests to check RPOT.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D100176
2021-04-27 12:39:06 +01:00
Vitaly Buka f2a585e6d3 [NFC] Fix "not used" warning 2021-04-26 22:09:23 -07:00
Arthur Eubanks fd1ff5ee03 [Inliner] Make ModuleInlinerWrapperPass return PreservedAnalyses::all()
The ModulePassManager should already have taken care of all analysis
invalidation. Without this change, upcoming changes will cause more
invalidation than necessary.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D101320
2021-04-26 17:22:35 -07:00
William S. Moses 7aa3cad46a [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 20:12:12 -04:00
Hongtao Yu 30bb5be389 [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2.
As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudu probe instrumention.

The optimizations unblocked are:
		- In-block load propagation.
		- In-block dead store elimination
		- Memory copy optimization that turns stores to consecutive memories into a memset.

These optimizations are local to a block, so they shouldn't affect the profile quality.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100075
2021-04-26 16:52:33 -07:00
Fangrui Song 18839be9c5 [ADT] Remove StatisticBase and make NoopStatistic empty
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 mostly unused pointers. GlobalOpt considers that the pointers can
potentially retain allocated objects, so GlobalOpt cannot optimize out the
`NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage
2 clang.

This patch makes `NoopStatistic` empty and thus reclaims the wasted space.  The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238

# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
    FILE SIZE        VM SIZE
 --------------  --------------
  -0.0%     -32  -0.0%     -24    .eh_frame
  -0.0%    -336  [ = ]       0    .symtab
  -0.0%    -360  [ = ]       0    .strtab
  [ = ]       0  -0.2%    -880    .bss
  -0.0% -2.11Ki  -0.0% -2.11Ki    .rodata
  -0.0% -2.89Ki  -0.0% -2.89Ki    .text
  -0.0% -5.71Ki  -0.0% -5.88Ki    TOTAL
```

Note: LoopFuse is a disabled pass. For now this patch adds
`#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in
LLVM_ENABLE_STATS==0 builds.  If these `OptimizationRemarkMissed` are useful in
LLVM_ENABLE_STATS==0 builds, we can replace `llvm::Statistic` with
`llvm::TrackingStatistic`, or use a different abstraction to keep track of the strings.

Similarly, skip the code in `mlir/lib/Pass/PassStatistics.cpp` which
calls `getName`/`getDesc`/`getValue`.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D101211
2021-04-26 16:47:32 -07:00
William S. Moses 8ede96493c Revert "[NVPTX] Enable lowering of atomics on local memory"
This reverts commit fede99d386.
2021-04-26 19:33:01 -04:00
William S. Moses fede99d386 [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 19:27:27 -04:00
Lei Zhang 254e289d45 Revert "[ADT] Remove StatisticBase and make NoopStatistic empty"
This reverts commit b540311781
because it breaks MLIR build:

https://buildkite.com/mlir/mlir-core/builds/13299#ad0f8901-dfa4-43cf-81b8-7940e2c6c15b
2021-04-26 18:31:04 -04:00
Michael Kruse b99466eb45 [SimplifyCFG] Preserve metadata when unconditionalizing branches (same target).
When replacing a conditional branch by an unconditional one because the targets are identical, transfer the metadata to the new branch instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101226
2021-04-26 17:23:01 -05:00
Fangrui Song b540311781 [ADT] Remove StatisticBase and make NoopStatistic empty
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 unused pointers. GlobalOpt considers that the pointers can potentially
retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic`
variables (see D69428 for more context), wasting 23KiB for stage 2 clang.

This patch makes `NoopStatistic` empty and thus reclaims the wasted space.  The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238

# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
    FILE SIZE        VM SIZE
 --------------  --------------
  -0.0%     -32  -0.0%     -24    .eh_frame
  -0.0%    -336  [ = ]       0    .symtab
  -0.0%    -360  [ = ]       0    .strtab
  [ = ]       0  -0.2%    -880    .bss
  -0.0% -2.11Ki  -0.0% -2.11Ki    .rodata
  -0.0% -2.89Ki  -0.0% -2.89Ki    .text
  -0.0% -5.71Ki  -0.0% -5.88Ki    TOTAL
```

Note: LoopFuse is a disabled pass. This patch adds `#if LLVM_ENABLE_STATS` so
`OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds.  If these
`OptimizationRemarkMissed` are useful and not noisy, we can replace
`llvm::Statistic` with `llvm::TrackingStatistic` in the future.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D101211
2021-04-26 13:39:35 -07:00
Fangrui Song 614de225c9 [gcov] Set nounwind and respect module flags metadata "frame-pointer" & "uwtable" for synthesized functions
This applies the D100251 mechanism to the gcov instrumentation pass.

With this patch, `-fno-omit-frame-pointer` in
`clang -fprofile-arcs -O1 -fno-omit-frame-pointer` will be respected for synthesized
`__llvm_gcov_writeout,__llvm_gcov_reset,__llvm_gcov_init` functions: the frame pointer
will be kept (note: on many targets -O1 eliminates the frame pointer by default).

`clang -fno-exceptions -fno-asynchronous-unwind-tables -g -fprofile-arcs` will
produce .debug_frame instead of .eh_frame.

Fix: https://github.com/ClangBuiltLinux/linux/issues/955

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D101129
2021-04-26 13:30:21 -07:00
Michael Kruse 153144be40 [SimplifyCFG] Preserve metadata when unconditionalizing branches (constant condition).
When replacing a conditional branch by an unconditional one because the condition is a constant, transfer the metadata to the new branch instruction.

Part of fix for llvm.org/PR50060

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101141
2021-04-26 10:57:31 -05:00
Dávid Bolvanský 691badc3d6 [InstCombine] C - ctpop(a) - > ctpop(~a)) if C is bitwidth (PR50104)
Proof: https://alive2.llvm.org/ce/z/mncA9K
Solves https://bugs.llvm.org/show_bug.cgi?id=50104

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101257
2021-04-26 15:40:54 +02:00
Yuanbo Li cc7803ee3f [LSR][DebugInfo] Don't unnecessarily drop DebugLocs
When transforming a loop terminating condition into a "max" comparison,
the DebugLoc from the old condition should be set on the newly created
comparison. They are the same operation, just optimized. Fixes PR48067.

Differential Revision: https://reviews.llvm.org/D98218
2021-04-26 13:14:42 +01:00
Florian Hahn 7302fe4328
[VPlan] Make blocksOnly work properly with ranges over const pointers.
When iterating over const blocks, the base type in the lambdas needs
to use const VPBlockBase *, otherwise it cannot be used with input
iterators over const VPBlockBase.

Also adjust the type of the input iterator range to const &, as it
does not take ownership of the input range.
2021-04-26 10:52:35 +01:00
Florian Hahn 4b9be5ac08
[VPlan] Add VPBlockUtils::blocksOnly helper.
This patch adds a blocksOnly helpers which take an iterator range
over VPBlockBase * or const VPBlockBase * and returns an interator
range that only include BlockTy blocks. The accesses are casted to
BlockTy.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D101093
2021-04-25 17:38:09 +01:00
Florian Hahn fa2f162e76
[NewGVN] Properly transfer PredDep in move constructor. 2021-04-25 11:22:59 +01:00
Florian Hahn 1d8ef761be
[NewGVN] Use ExprResult to add extra predicate users.
This patch updates performSymbolicPredicateInfoEvaluation to manage
registering additional dependencies using ExprResult. Similar to D99987,
this fixes an issues where we failed to track the correct dependency for
a phi-of-ops value, which is marked as temporary.

Fixes PR49873.

Reviewed By: asbirlea, ruiling

Differential Revision: https://reviews.llvm.org/D100560
2021-04-25 11:13:32 +01:00
Florian Hahn 1cc5946cc8
[NewGVN] Use performSymbolicEvaluation instead of createExpression.
performSymbolicEvaluation is used to obtain the symbolic expression when
visiting instructions and this is used to determine their congruence
class.

performSymbolicEvaluation only creates expressions for certain
instructions (via createExpression). For unsupported instructions,
'unknown' expression are created.

The use of createExpression in processOutgoingEdges means we may
simplify the condition in processOutgoingEdges to a constant in the
initial round of processing, but we use Unknown(I) for the congruence
class. If an operand of I changes the expression Unknown(I) stays the
same, so there is no update of the congruence class of I. Hence it
won't get re-visited. So if an operand of I changes in a way that causes
createExpression to return different result, this update is missed.

This patch updates the code to use performSymbolicEvaluation, to be
symmetric with the congruence class updating code.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D99990
2021-04-24 18:49:07 +01:00
Dávid Bolvanský 137568e579 [InstCombine] Fixed UB in foldCtpop 2021-04-24 19:44:16 +02:00
Dávid Bolvanský de3fa35cdb [InstCombine] ctpop(rot(X)) -> ctpop(X)
Proof:
https://alive2.llvm.org/ce/z/ss2zyt - rotl
https://alive2.llvm.org/ce/z/ZM7Aue - rotr

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101235
2021-04-24 18:25:03 +02:00
Dávid Bolvanský d4ec8ea19c [InstCombine] ctpop(X) + ctpop(Y) => ctpop(X | Y) if X and Y have no common bits (PR48999)
For example:

```
int src(unsigned int a, unsigned int b)
{
    return __builtin_popcount(a << 16) + __builtin_popcount(b >> 16);
}

int tgt(unsigned int a, unsigned int b)
{
    return __builtin_popcount((a << 16)  | (b >> 16));
}
```

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101210
2021-04-24 17:52:10 +02:00
dfukalov 6c57044231 [GVN] Clobber partially aliased loads.
Use offsets stored in `AliasResult` implemented in D98718.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-04-24 14:14:20 +03:00
wlei 3d1aecbd28 [CSSPGO] Fix missing debug info of dangling pseudo probe
While doing speculative execution opt, it conservatively drops all insn's debug info in the merged `ThenBB`(see the loop at line 2384) including the dangling probe. The missing debug info of the dangling probe will cause the wrong inference computation.

So we should avoid dropping the debug info from pseudo probe, this change try to fix this by moving the to-be dangling probe to the merging target BB before the debug info is dropped.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D101195
2021-04-23 14:26:47 -07:00
Dávid Bolvanský 9aee07abd0 [InstCombine] X - usub.sat(X, Y) => umin(X, Y)
Pattern regressed in LLVM 9 with the introduction of usub.sat.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42178#c2

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101184
2021-04-23 21:13:07 +02:00
Hongtao Yu 5f2d730073 [CSSPGO] Fix incorrect prorating indirect call distribution factor that leads to target count loss.
Pseudo probe distribution factor is used to scale down profile samples to avoid misleading the counts inference due to the usage of "maximum" in `getBlockWeight`. For callsites, the scaling down can come from code duplication prior to the sample profile loader (prelink or postlink), or due to the indirect call promotion in sample loader inliner. This patch fixes an issue in sample loader ICP where the leftover indirect callsite scaling down causes the loss of non-promoted call target samples unexpectedly. While the scaling down is to favor BFI/BPI with accurate an callsite count, it doesn't fit in the current distribution factor that represents code duplication changes. Ideally,  we would need two factors, one is for code duplication, the other is for ICP. However this seems over complicated. I'm going to trade one usage (callsite counts) for the other (call target counts).

Seeing perf win on one benchmark (mcf) of SPEC2017 with others unchanged.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D100993
2021-04-23 11:09:22 -07:00
Sanjay Patel e10d7d455d [InstCombine] fold 'not' of ctpop in parity pattern
As discussed in https://llvm.org/PR50096 , we could
convert the 'not' into a 'sub' and see the same
fold. That's because we already have another demanded
bits optimization for 'sub'.

We could add a related transform for
odd-number-of-type-bits, but that seems unlikely
to be practical.

https://alive2.llvm.org/ce/z/TWJZXr
2021-04-23 13:23:24 -04:00
Florian Hahn 89c4dda076
[VPlan] Add GraphTraits impl to traverse through VPRegionBlock.
This patch adds a new iterator to traverse through VPRegionBlocks and a
GraphTraits specialization using the iterator to traverse through
VPRegionBlocks.

Because there is already a GraphTraits specialization for VPBlockBase *
and co, a new VPBlockRecursiveTraversalWrapper helper is introduced.
This allows us to provide a new GraphTraits specialization for that
type. Users can use the new recursive traversal by using this wrapper.

The graph trait visits both the entry block of a region, as well as all
its successors. Exit blocks of a region implicitly have their parent
region's successors. This ensures all blocks in a region are visited
before any blocks in a successor region when doing a reverse post-order
traversal of the graph.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D100175
2021-04-23 17:26:47 +01:00
Sander de Smalen f9a50f04ba [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
Sander de Smalen 43ace8b5ce [TTI] NFC: Change getScalingFactorCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100564
2021-04-23 16:06:36 +01:00
Timm Bäder e60d6e91e1 [llvm][NFC] Fix assert indentation
This triggers GCC's misleading-indentation checker.
2021-04-23 14:44:05 +02:00
Dávid Bolvanský 5f77e7708a [InstCombine] Fixed crash when setting align attr for memalign 2021-04-23 14:04:08 +02:00
Florian Hahn 2b15262f89
Recommit "[NewGVN] Track simplification dependencies for phi-of-ops."
This recommits 4f5da356ff, including
explicit implementations of move a constructor and deleted copy
constructors/assignment operators, to fix failures with some compilers.

This reverts the revert 74854d00e8.
2021-04-23 11:27:43 +01:00
Stephen Tozer 791930d740 Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.

This reverts commit dad5caa59e.
2021-04-23 10:54:01 +01:00
Florian Hahn 74854d00e8
Revert "[NewGVN] Track simplification dependencies for phi-of-ops."
This reverts commit 4f5da356ff.

This causes some  buildbot failures, e.g.
https://lab.llvm.org/buildbot/#/builders/139/builds/3019
2021-04-23 09:56:17 +01:00
Florian Hahn 4f5da356ff
[NewGVN] Track simplification dependencies for phi-of-ops.
If we are using a simplified value, we need to add an extra
dependency this value , because changes to the class of the
simplified value may require us to invalidate any decision based on
that value.

This is done by adding such values as additional users, however the
current code does not excludes temporary instructions.

At the moment, this means that we miss those dependencies for
phi-of-ops, because they are temporary instructions at this point. We
instead need to add the extra dependencies to the root instruction of
the phi-of-ops.

This patch pushes the responsibility of adding extra users to the
callers of createExpression & performSymbolicEvaluation. At those
points, it is clearer which real instruction to pick.

Alternatively we could either pass the 'real' instruction as additional
argument or use another map, but I think the approach in the patch makes
things a bit easier to follow.

Fixes PR35074.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D99987
2021-04-23 09:48:38 +01:00
KAWASHIMA Takahiro d9a9c992d1 [LoopReroll] Fix rerolling loop with extra instructions
Fixes PR47627

This fix suppresses rerolling a loop which has an unrerollable
instruction.

Sample IR for the explanation below:

```
define void @foo([2 x i32]* nocapture %a) {
entry:
  br label %loop

loop:
  ; base instruction
  %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]

  ; unrerollable instructions
  %stptrx = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %indvar, i64 0
  store i32 999, i32* %stptrx, align 4

  ; extra simple arithmetic operations, used by root instructions
  %plus20 = add nuw nsw i64 %indvar, 20
  %plus10 = add nuw nsw i64 %indvar, 10

  ; root instruction 0
  %ldptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 0
  %value0 = load i32, i32* %ldptr0, align 4
  %stptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 0
  store i32 %value0, i32* %stptr0, align 4

  ; root instruction 1
  %ldptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 1
  %value1 = load i32, i32* %ldptr1, align 4
  %stptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 1
  store i32 %value1, i32* %stptr1, align 4

  ; loop-increment and latch
  %indvar.next = add nuw nsw i64 %indvar, 1
  %exitcond = icmp eq i64 %indvar.next, 5
  br i1 %exitcond, label %exit, label %loop

exit:
  ret void
}
```

In the loop rerolling pass, `%indvar` and `%indvar.next` are appended
to the `LoopIncs` vector in the `LoopReroll::DAGRootTracker::findRoots`
function.

Before this fix, two instructions with `unrerollable instructions`
comment above are marked as `IL_All` at the end of the
`LoopReroll::DAGRootTracker::collectUsedInstructions` function,
as well as instructions with `extra simple arithmetic operations`
comment and `loop-increment and latch` comment. It is incorrect
because `IL_All` means that the instruction should be executed in all
iterations of the rerolled loop but the `store` instruction should
not.

This fix rejects instructions which may have side effects and don't
belong to def-use chains of any root instructions and reductions.

See https://bugs.llvm.org/show_bug.cgi?id=47627 for more information.
2021-04-23 15:14:46 +09:00
Elia Geretto 2627f99613 [dfsan] Fix Len argument type in call to __dfsan_mem_transfer_callback
This patch is supposed to solve: https://bugs.llvm.org/show_bug.cgi?id=50075

The function `__dfsan_mem_transfer_callback` takes a `Len` argument of type `i64`; however, when processing a `MemTransferInst` such as `llvm.memcpy.p0i8.p0i8.i32`, the `len` argument has type `i32`. In order to make the type of `len` compatible with the one of the callback argument, this change zero-extends it when necessary.

Reviewed By: stephan.yichao.zhao, gbalats

Differential Revision: https://reviews.llvm.org/D101048
2021-04-22 21:12:20 +00:00
Arthur Eubanks 16ff1a7023 [GlobalOpt] Don't replace alias with aliasee if aliasee is interposable
Both the alias and aliasee linkage are important.

PR27866 provides some background.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99629
2021-04-22 13:12:34 -07:00
Philip Reames 15e19a2599 Revert "[instcombine] Exploit UB implied by nofree attributes"
This change effectively reverts 86664638, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert.

Why revert this now?  Two main reasons:
1) There are continuing discussion around what the semantics of nofree.  I am getting increasing uncomfortable with the seeming possibility we might redefine nofree in a way incompatible with these changes.
2) There was a reported miscompile triggered by this change (https://github.com/emscripten-core/emscripten/issues/9443).  At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs.  Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree.  In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.
2021-04-22 10:53:17 -07:00
Jianzhou Zhao 7fdf270965 [dfsan] Track origin at loads
The first version of origin tracking tracks only memory stores. Although
    this is sufficient for understanding correct flows, it is hard to figure
    out where an undefined value is read from. To find reading undefined values,
    we still have to do a reverse binary search from the last store in the chain
    with printing and logging at possible code paths. This is
    quite inefficient.

    Tracking memory load instructions can help this case. The main issues of
    tracking loads are performance and code size overheads.

    With tracking only stores, the code size overhead is 38%,
    memory overhead is 1x, and cpu overhead is 3x. In practice #load is much
    larger than #store, so both code size and cpu overhead increases. The
    first blocker is code size overhead: link fails if we inline tracking
    loads. The workaround is using external function calls to propagate
    metadata. This is also the workaround ASan uses. The cpu overhead
    is ~10x. This is a trade off between debuggability and performance,
    and will be used only when debugging cases that tracking only stores
    is not enough.

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D100967
2021-04-22 16:25:24 +00:00
Alexey Bataev 18c61fc498 [SLP]Skip undefs trying to find perfect/shuffled tree entries matching.
We can skip check for undefs trying to find perfect/shuffled tree
entries matching, they can be ignored completely improving the final
cost/vectorization results.

Differential Revision: https://reviews.llvm.org/D101061
2021-04-22 08:59:07 -07:00
Joe Ellis 2c551aedcf [LoopVectorize] Fix bug where predicated loads/stores were dropped
This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

     1  void foo(int *restrict data1, int *restrict data2)
     2  {
     3    int counter = 1024;
     4    while (counter--)
     5      if (data1[counter] > data2[counter])
     6        data1[counter] = data2[counter];
     7  }

... could previously be transformed in such a way that the predicated
store implied by:

    if (data1[counter] > data2[counter])
       data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.

Differential Revision: https://reviews.llvm.org/D99569
2021-04-22 15:05:54 +00:00
Alexey Bataev d4f5f23bbb [SLP]Replace more `TTI` with `TTIRef`, NFC.
To pacify MSVC buildbots.
2021-04-22 07:53:20 -07:00
Alexey Bataev da2cdfd421 [SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC
buildbots, NFC.
2021-04-22 07:49:48 -07:00
Alexey Bataev e99b98cb1b [SLP]Improve cost model for the vectorized extractelements.
1. No need to call `areAllUsersVectorized` as later the cost is
   calculated only if the instruction has one use and gets vectorized.
2. Need to calculate the cost of the dead extractelement more precisely,
   taking the vector type of the vector operand, not the resulting
   vector type.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D99980
2021-04-22 07:40:17 -07:00
Dawid Jurczak 57f443c348 [SimplifyLibCalls][NFC] Use StringRef::back instead explicit indexing.
Split off from D100724.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D101032
2021-04-22 15:02:47 +02:00
David Sherwood 5a229a6702 [LoopVectorize] Don't create unnecessary vscale intrinsic calls
In quite a few cases in LoopVectorize.cpp we call createStepForVF
with a step value of 0, which leads to unnecessary generation of
llvm.vscale intrinsic calls. I've optimised IRBuilder::CreateVScale
and createStepForVF to return 0 when attempting to multiply
vscale by 0.

Differential Revision: https://reviews.llvm.org/D100763
2021-04-22 09:01:52 +01:00
Max Kazantsev 8fe62b7af1 [GVN] Introduce loop load PRE
This patch allows PRE of the following type of loads:

```
preheader:
  br label %loop

loop:
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  br label %merge

merge:
  ...
  br i1 ..., label %loop, label %exit

```

Into
```
preheader:
  %x0 = load %p
  br label %loop

loop:
  %x.pre = phi(x0, x2)
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  %x1 = load %p
  br label %merge

merge:
  x2 = phi(x.pre, x1)
  ...
  br i1 ..., label %loop, label %exit

```

So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.

The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.

There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
  we don't know if their sum is colder than the header.

Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
2021-04-22 12:50:38 +07:00
Chuanqi Xu 77ca2a6893 [Coroutine] Collect CoroBegin if all of terminators are dominated by one coro.destroy
Summary: The original logic seems to be we could collecting a CoroBegin
if one of the terminators could be dominated by one of coro.destroy,
which doesn't make sense.
This patch rewrites the logics to collect CoroBegin if all of
terminators are dominated by one coro.destroy. If there is no such
coro.destroy, we would call hasEscapePath to evaluate if we should
collect it.

Test Plan: check-llvm

Reviewed by: lxfind

Differential Revision: https://reviews.llvm.org/D100614
2021-04-22 11:21:37 +08:00
Giorgis Georgakoudis a2dbfb6b72 [OpenMP] Simplify offloading parallel call codegen
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.

Reviewed By: jdoerfert, Meinersbur

Differential Revision: https://reviews.llvm.org/D95976
2021-04-21 18:46:07 -07:00
Fangrui Song 775a9483e5 [IR][sanitizer] Set nounwind on module ctor/dtor, additionally set uwtable if -fasynchronous-unwind-tables
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.

Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)

Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.

This patch

* sets the nounwind attribute on sanitizer module ctor/dtor.
* let Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.

The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::create` if they want to get some default attributes which
have more of module semantics.

Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955
https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.

Differential Revision: https://reviews.llvm.org/D100251
2021-04-21 15:58:20 -07:00
Olle Fredriksson f5446b769a [MemCpyOpt] Allow variable lengths in memcpy optimizer
This makes the memcpy-memcpy and memcpy-memset optimizations work for
variable sizes as long as they are equal, relaxing the old restriction
that they are constant integers. If they're not equal, the old
requirement that they are constant integers with certain size
restrictions is used.

The implementation works by pushing the length tests further down in the
code, which reveals some places where it's enough that the lengths are
equal (but not necessarily constant).

Differential Revision: https://reviews.llvm.org/D100870
2021-04-21 23:23:38 +02:00
Arthur Eubanks b606e2df4d [Evaluator] Bitcast result of pointer stripping
Trying to evaluate a GEP would assert with
  "Ty == cast<PointerType>(C->getType()->getScalarType())->getElementType()"
because the type of the pointer we would evaluate the GEP argument to
would be a different type than the GEP was expecting. We should treat
pointer stripping as a bitcast.

The test adds a redundant GEP that would crash due to type mismatch.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D100970
2021-04-21 13:32:29 -07:00
Nikita Popov 24e9fbc1a3 Revert "[InstCombine] Fold multiuse shr eq zero"
This reverts commit 9423f78240.

A performance regression with this patch has been reported at
https://reviews.llvm.org/rG9423f78240a2#990953. Reverting for now.
2021-04-21 21:40:52 +02:00
sstefan1 62cdcd6c5a [FuncAttrs] Don't infer willreturn for nonexact definitions
Discovered during attributor testing comparing stats with
and without the attributor. Willreturn should not be inferred
for nonexact definitions.

Differential Revision: https://reviews.llvm.org/D100988
2021-04-21 21:26:09 +02:00
sstefan1 656ebd519e [SimplifyLibCalls] Don't change alignment when creating memset
Fix for PR49984
This was discovered during Attributor testing.
Memset was always created with alignment of 1
and in case when strncpy alignment was changed
it triggered an assertion in the AttrBuilder.
Memset will now be created with appropriate alignment.

Differential Revision: https://reviews.llvm.org/D100875
2021-04-21 20:34:13 +02:00
Nico Weber ba7a92c01e [Support] Don't include VirtualFileSystem.h in CommandLine.h
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.

(Already remarked by jhenderson on D70769.)

No behavior change.

Differential Revision: https://reviews.llvm.org/D100957
2021-04-21 10:19:01 -04:00
George Balatsouras 79b5280a6c [dfsan] Enable origin tracking with fast8 mode
All related instrumentation tests have been updated.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D100903
2021-04-20 18:10:32 -07:00
Arthur Eubanks 326da4adcb [FuncAttrs] Always preserve FunctionAnalysisManagerCGSCCProxy
FunctionAnalysisManagerCGSCCProxy should not be preserved if any of its
keys may be invalid. Since we are not removing/adding functions in
FuncAttrs, it's fine to preserve it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D100893
2021-04-20 16:37:45 -07:00
Reid Kleckner 91f7a4fff7 Revert "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)"
This reverts commit 13ec913bdf.

This commit introduces new uses of the overflow checking intrinsics that
depend on implementations in compiler-rt, which Windows users generally
do not link against. I filed an issue (somewhere) to make clang
auto-link the builtins library to resolve this situation, but until that
happens, it isn't reasonable for the optimizer to introduce new link
time dependencies.
2021-04-20 15:53:34 -07:00
Philip Reames 4824d876f0 Revert "Allow invokable sub-classes of IntrinsicInst"
This reverts commit d87b9b81cc.

Post commit review raised concerns, reverting while discussion happens.
2021-04-20 15:38:38 -07:00
Roman Lebedev 5a654bfeab
Revert "[InstCombine] `sext(trunc(x)) --> sext(x)` iff trunc is NSW (PR49543)"
I forgot about the case where we sign-extend to width smaller than the original.

This reverts commit 1e6ca23ab8.
2021-04-21 01:11:15 +03:00
Roman Lebedev 1e68d338c1
Revert "[InstCombine] "Bypass" NUW trunc of lshr if we are going to sext the result (PR49543)"
I forgot about the case where we sign-extend to width smaller than the original.

This reverts commit 41b71f718b.
2021-04-21 01:11:14 +03:00
Philip Reames d87b9b81cc Allow invokable sub-classes of IntrinsicInst
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.

This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.

After this lands and has baked for a couple days, planned cleanups:
    Make GCStatepointInst a IntrinsicInst subclass.
    Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
    Do the same in SelectionDAG.
    Do the same in FastISEL.

Differential Revision: https://reviews.llvm.org/D99976
2021-04-20 15:03:49 -07:00
Roman Lebedev 41b71f718b
[InstCombine] "Bypass" NUW trunc of lshr if we are going to sext the result (PR49543)
This is a more convoluted form of the same pattern "sext of NSW trunc",
but in this case the operand of trunc was a right-shift,
and the truncation chops off just the zero bits that were shifted-in.
2021-04-21 00:31:46 +03:00
Roman Lebedev 1e6ca23ab8
[InstCombine] `sext(trunc(x)) --> sext(x)` iff trunc is NSW (PR49543)
If we can tell that trunc only chops off sign bits, and not all of them,
then we can simply sign-extend the trunc's source.
2021-04-21 00:31:45 +03:00
Sanjay Patel 1e202e8f39 [InstCombine] fold shift-of-srem-by-2 to mask+shift
There are several potential srem-by-2 folds
because the result is known {-1,0,1}.

https://alive2.llvm.org/ce/z/LuVyeK
2021-04-20 17:10:16 -04:00
Roman Lebedev 13ec913bdf
[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)
We already had support for it's unsigned variant, so simply extend it
to also handle the signed variant.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48769
2021-04-20 21:29:43 +03:00
Joseph Huber b2ad63d3cf [OpenMP] Add OpenMPOpt as a Module pass
Summary:
This patch registers OpenMPOpt as a Module pass in addition to a CGSCC
pass. This is so certain optimzations that are sensitive to intact
call-sites can happen before inlining. The old `openmpopt` pass name is
changed to `openmp-opt-cgscc` and `openmp-opt` calls the Module pass.
The current module pass only runs a single check but will be expanded in
the future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D99202
2021-04-20 12:28:58 -04:00
Alexey Bataev af870e11ae [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 09:08:46 -07:00
Philip Reames 3b1474cab2 free(nullptr) does not violate the nofree specification
This fixes a subtle and nasty bug in my 86664638. The problem is that free(nullptr) is well defined (and common).

The specification for the nofree attributes talks about memory objects, and doesn't explicitly address null, but I think it's reasonable to assume that nofree doesn't disallow a call to free(nullptr). If it did, we'd have to prove nonnull on an argument to ever infer nofree which doesn't seem to be the intent.

This was found by Nuno and Alive2 over in https://reviews.llvm.org/D100141#2697374.

Differential Revision: https://reviews.llvm.org/D100779
2021-04-20 09:08:05 -07:00
Alexey Bataev b82344a019 Revert "[SLP] Add detection of shuffled/perfect matching of tree entries."
This reverts commit daf6e18c55 to fix the
compiler crash.
2021-04-20 08:29:32 -07:00
Alexey Bataev daf6e18c55 [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 07:46:49 -07:00
Alexey Bataev cf00cb8bed Revert "[SLP] Add detection of shuffled/perfect matching of tree entries."
This reverts commit b232771aca to fix
buildbots.
2021-04-20 07:16:11 -07:00
Alexey Bataev b232771aca [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 06:55:55 -07:00
Sander de Smalen 86729538bd [LV] Let selectVectorizationFactor reason directly on VectorizationFactor.
Rather than maintaining two separate values, a `float` for the per-lane
cost and a Width for the VF, maintain a single VectorizationFactor which
comprises the two and also removes the need for converting an integer value
to float.

This simplifies the query when asking if one VF is more profitable than
another when we want to extend this for scalable vectors (which may
require additional options to determine if e.g. a scalable VF of the
some cost, is more profitable than a fixed VF of the same cost).

The patch isn't entirely NFC because it also fixes an issue in
selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs
no longer truncates the floating-point cost from `float` to `unsigned` to
then perform the calculation on the truncated cost. It now does
a cost comparison with the correct precision.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100121
2021-04-20 09:54:45 +01:00
Luo, Yuanke bcdaccfe34 [X86][AMX] Verify illegal types or instructions for x86_amx.
This patch is related to https://reviews.llvm.org/D100032 which define
some illegal types or operations for x86_amx. There are no arguments,
arrays, pointers, vectors or constants of x86_amx.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D100472
2021-04-20 16:14:22 +08:00
Arthur Eubanks 5e71b9fa93 Explicitly pass type to cast load constant folding result
Previously we would use the type of the pointee to determine what to
cast the result of constant folding a load. To aid with opaque pointer
types, we should explicitly pass the type of the load rather than
looking at pointee types.

ConstantFoldLoadThroughBitcast() converts the const prop'd value to the
proper load type (e.g. [1 x i32] -> i32). Instead of calling this in
every intermediate step like bitcasts, we only call this when we
actually see the global initializer value.

In some existing uses of this API, we don't know the exact type we're
loading from immediately (e.g. first we visit a bitcast, then we visit
the load using the bitcast). In those cases we have to manually call
ConstantFoldLoadThroughBitcast() when simplifying the load to make sure
that we cast to the proper type.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100718
2021-04-20 00:53:21 -07:00
Dávid Bolvanský 324d641b75 [InstCombine] Enhance deduction of alignment for aligned_alloc
This patch improves https://reviews.llvm.org/D76971 (Deduce attributes for aligned_alloc in InstCombine) and implements "TODO" item mentioned in the review of that patch.

> The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment.

Currently, we simply bail out if we see a non-constant size - change that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100785
2021-04-20 02:04:18 +02:00
Alexey Bataev 8030481065 Revert "[SLP]Add detection of shuffled/perfect matching of tree entries."
This reverts commit d6fde91379 to fix
compiler crashes.
2021-04-19 14:10:04 -07:00
Zequan Wu e28435caf6 [ThinLTO] Copy UnnamedAddr when spliting module.
The unnamedaddr property of a function is lost when using
`-fwhole-program-vtables` and thinlto which causes size increase under linker's
safe icf mode.

The size increase of chrome on Linux when switching from all icf to safe icf
drops from 5 MB to 3 MB after this change, and from 6 MB to 4 MB on Windows.

There is a repro:
```
# a.h
struct A {
  virtual int f();
  virtual int g();
};

# a.cpp
#include "a.h"
int A::f() { return 10; }
int A::g() { return 10; }

# main.cpp
#include "a.h"

int g(A* a) {
  return a->f();
}

int main(int argv, char** args) {
  A a;
  return g(&a);
}

$ clang++ -O2 -ffunction-sections -flto=thin -fwhole-program-vtables -fsplit-lto-unit -c main.cpp -o main.o  && clang++ -Wl,--icf=safe -fuse-ld=lld  -flto=thin main.o -o a.out && llvm-readobj -t a.out | grep -A 1 -e _ZN1A1fEv -e _ZN1A1gEv
    Name: _ZN1A1fEv (480)
    Value: 0x201830
--
    Name: _ZN1A1gEv (490)
    Value: 0x201840
```

Differential Revision: https://reviews.llvm.org/D100498
2021-04-19 14:04:58 -07:00
Alexey Bataev d6fde91379 [SLP]Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Differential Revision: https://reviews.llvm.org/D100495
2021-04-19 13:29:30 -07:00
Philip Reames 3c54762226 [funcattrs] Consistently check call site attributes
This is mostly stylistic cleanup after D100226, but not entirely. When skimming the code, I found one case where we weren't accounting for attributes on the callsite at all. I'm also suspicious we had some latent bugs related to operand bundles (which are supposed to be able to *override* attributes on declarations), but I don't have concrete test cases for those, just suspicions.

Aside: The only case left in the file which directly checks attributes on the declaration is the norecurse logic. I left that because I didn't understand it; it looks obviously wrong, so I suspect I'm misinterpreting the intended semantics of the attribute.

Differential Revision: https://reviews.llvm.org/D100689
2021-04-19 13:20:50 -07:00
Philip Reames 01801d5274 [rs4gc] Fix a latent bug around attribute stripping for intrinsics
This change fixes a latent bug which was exposed by a change currently in review (https://reviews.llvm.org/D99802#2685032).

The story on this is a bit involved.  Without this change, what ended up happening with the pending review was that we'd strip attributes off intrinsics, and then selectiondag would fail to lower the intrinsic.  Why?  Because the lowering of the intrinsic relies on the presence of the readonly attribute.  We don't have a matcher to select the case where there's a glue node needed.

Now, on the surface, this still seems like a codegen bug.  However, here it gets fun.  I was unable to reproduce this with a standalone test at all, and was pretty much struck until skatkov provided the critical detail.  This reproduces only when RS4GC and codegen are run in the same process and context.  Why?  Because it turns out we can't roundtrip the stripped attribute through serialized IR!

We'll happily print out the missing attribute, but when we parse it back, the auto-upgrade logic has a side effect of blindly overwriting attributes on intrinsics with those specified in Intrinsics.td.  This makes it impossible to exercise SelectionDAG from a standalone test case.

At this point, I decided to treat this an RS4GC bug as a) we don't need to strip in this case, and b) I could write a test which shows the correct behavior to ensure this doesn't break again in the future.

As an aside, I'd originally set out to handle libfuncs too - since in theory they might have the same issues - but backed away quickly when I realized how the semantics of builtin, nobuiltin, and no-builtin-x all interacted.  I'm utterly convinced that no part of the optimizer handles that correctly, and decided not to open that can of worms here.
2021-04-19 13:14:07 -07:00
Nikita Popov 9423f78240 [InstCombine] Fold multiuse shr eq zero
The single-use case is handled implicity by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.

https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
2021-04-19 22:13:11 +02:00
Nikita Popov d440f9a326 [LICM] Make capture check more precise
During store promotion, we check whether the pointer was captured
to exclude potential reads from other threads. However, we're only
interested in captures before or inside the loop. Check this using
PointerMayBeCapturedBefore against the loop header.

Differential Revision: https://reviews.llvm.org/D100706
2021-04-19 20:34:23 +02:00
Roman Lebedev d746fefb6f
[SCEVExpander] ReuseOrCreateCast(): use IRBuilder to actually create the cast
In particular, this allows to create constant expressions
instead of IR Instruction's if the argumen is a constant.
2021-04-19 18:38:39 +03:00
Roman Lebedev ecc9d7e913
[SCEVExpander] Expand explicit PtrToInt casts just like we would implicit ones
I.e., use GetOptimalInsertionPointForCastOf() helper to get the insertion
point, and try to reuse casts first.
2021-04-19 18:38:39 +03:00
Roman Lebedev 442c408e0e
[SCEVExpander] GetOptimalInsertionPointForCastOf(): gracefully handle Constant's
I guess this case hasn't come up thus far, and i'm not sure if it can
really happen for the existing usages, thus no test in *this* commit.

But, the following commit adds test coverage,
there we'd expirience a crash without this fix.
2021-04-19 18:38:39 +03:00
Roman Lebedev b8a3705896
[NFCI][SCEVExpander] Extract GetOptimalInsertionPointForCastOf() helper 2021-04-19 18:38:38 +03:00
Roman Lebedev 73f60e3988
[SCEVExpander] generateOverflowCheck(): explicitly PtrToInt the Start
Currently, InsertNoopCastOfTo() would implicitly insert that cast,
but now that we have SCEVPtrToIntExpr, i'm hoping we could stop
InsertNoopCastOfTo() from doing that. But first all users must be fixed.
2021-04-19 18:38:38 +03:00
Cullen Rhodes f0bc2782f2 [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100377
2021-04-19 11:01:34 +00:00
OCHyams 0ebf9a8e34 [DebugInfo] Move the findDbg* functions into DebugInfo.cpp
Move the findDbg* functions into lib/IR/DebugInfo.cpp from
lib/Transforms/Utils/Local.cpp.

D99169 adds a call to a function (findDbgUsers) that lives in
lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp
(LLVMCore). The Core lib doesn't include TransformUtils. The builtbots caught
this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch
moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp
which is part of LLVMCore.

Reviewed By: dblaikie, rnk

Differential Revision: https://reviews.llvm.org/D100632
2021-04-19 10:30:25 +01:00
Evgeniy Brevnov 35e95c6817 [CVP] processCallSite returns wrong status
Recently processMinMaxIntrinsic has been added and we started to observe a number of analysis get invalidated after CVP. The problem is CVP conservatively returns 'true'  even if there were no modifications to IR. I found one more place besides processMinMaxIntrinsic  which has the same problem. I think processMinMaxIntrinsic and similar should better have boolean return status to prevent similar issue reappear in future.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100538
2021-04-19 12:13:22 +07:00
Xun Li 5faba87938 Revert "[Coroutines] Set presplit attribute in Clang instead of CoroEarly pass"
This reverts commit fa6b54c44a.
The commited patch broke mlir tests. It seems that mlir tests depend on coroutine function properties set in CoroEarly pass.
2021-04-18 17:22:28 -07:00
Xun Li fa6b54c44a [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920

To fix this, we set the attributes in the Clang front-end instead of in CoroEarly pass.

Reviewed By: rjmccall, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D100282
2021-04-18 15:41:09 -07:00
Xun Li c0211e8d7d Revert "[Coroutines] Move CoroEarly pass to before AlwaysInliner"
This reverts commit 2b50f5a434.
Forgot to update the description of the commit to sync with phabricator. Going to redo the commit.
2021-04-18 15:38:19 -07:00
Xun Li 2b50f5a434 [Coroutines] Move CoroEarly pass to before AlwaysInliner
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920

Differential Revision: https://reviews.llvm.org/D100282
2021-04-18 14:54:04 -07:00
Juneyoung Lee 1c10201d96 Update InstCombine to use undef matcher instead
This is a patch to use m_Undef() matcher instead of isa<UndefValue>().

As suggested in D100122, this update is separately committed.
2021-04-18 11:05:36 +09:00
Florian Hahn af523514c4
[SimplifyCFG] Skip dbg intrinsics when checking for branch-only BBs.
Debug intrinsics are free to hoist and should be skipped when looking
for terminator-only blocks. As a consequence, we have to delegate to the
main hoisting loop to hoist any dbg intrinsics instead of jumping to the
terminator case directly.

This fixes PR49982.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100640
2021-04-17 15:17:50 +01:00
Nikita Popov e68b12c99e [Inline] Don't add noalias metadata to inaccessiblememonly calls
It will not do anything useful for them, as we already know that
they don't modref with any accessible memory.

In particular, this prevents noalias metadata from being placed
on noalias.scope.decl intrinsics. This reduces the amount of
metadata needed, and makes it more likely that unnecessary decls
can be eliminated.
2021-04-17 14:56:13 +02:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Philip Reames 11707435cc [inferattrs] Don't infer lib func attributes for nobuiltin functions
If we have a nobuiltin function, we can't assume we know anything about the implementation.

I noticed this when tracing through a log from an in the wild miscompile (https://github.com/emscripten-core/emscripten/issues/9443) triggered after 8666463.  We were incorrectly assuming that a custom allocator could not free.  (It's not clear yet this is the only problem in said issue.)

I also noticed something similiar mentioned in the commit message of ab243e when scrolling back through history.  Through, from what I can tell, that commit fixed symptom not root cause.

The interface we have for library function detection is extremely error prone, but given the interaction between ``nobuiltin`` decls and ``builtin`` callsites, it's really hard to imagine something much cleaner.  I may iterate on that, but it'll be invasive enough I didn't want to hold an obvious functional fix on it.
2021-04-16 15:36:15 -07:00
Philip Reames f549176ad9 [funcattrs] Add the maximal set of implied attributes to definitions
Have funcattrs expand all implied attributes into the IR. This expands the infrastructure from D100400, but for definitions not declarations this time.

Somewhat subtly, this mostly isn't semantic. Because the accessors did the inference, any client which used the accessor was already getting the stronger result. Clients that directly checked presence of attributes (there are some), will see a stronger result now.

The old behavior can end up quite confusing for two reasons:
* Without this change, we have situations where function-attrs appears to fail when inferring an attribute (as seen by a human reading IR), but that consuming code will see that it should have been implied. As a human trying to sanity check test results and study IR for optimization possibilities, this is exceeding error prone and confusing. (I'll note that I wasted several hours recently because of this.)
* We can have transforms which trigger without the IR appearing (on inspection) to meet the preconditions. This change doesn't prevent this from happening (as the accessors still involve multiple checks), but it should make it less frequent.

I'd argue in favor of deleting the extra checks out of the accessors after this lands, but I want that in it's own review as a) it's purely stylistic, and b) I already know there's some disagreement.

Once this lands, I'm also going to do a cleanup change which will delete some now redundant duplicate predicates in the inference code, but again, that deserves to be a change of it's own.

Differential Revision: https://reviews.llvm.org/D100226
2021-04-16 14:22:19 -07:00
Philip Reames ff55d01a8e [nofree] Restrict semantics to memory visible to caller
This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee.

This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute".

The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work.

Differential Revision: https://reviews.llvm.org/D100141
2021-04-16 11:38:55 -07:00
Marcythm f8cf3b9931
[LICM][NFC] Fix typo
fixed some typos which may lead to misunderstandings in LICM.cpp

Reviewed By: nikic, asbirlea
Differential Revision: https://reviews.llvm.org/D100470
2021-04-16 09:42:00 +08:00
Arthur Eubanks 9c776c2fa2 [NFC][NewPM] Remove some AnalysisManager invalidate methods
These were misleading, they're more of a "clear" than an "invalidate".

We shouldn't be individually clearing analysis results. Either we clear
all analyses when some IR becomes invalid, or we properly go through
invalidation.

There was only one use of this, which can be simulated with
AM.invalidate(F, PA).

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D100519
2021-04-15 16:51:26 -07:00
Florian Hahn 3e7ee5428d
[InferAttrs] Do not mark first argument of str(n)cat as writeonly.
str(n)cat appends a copy of the second argument to the end of the first
argument. To find the end of the first argument, str(n)cat has to read
from it until it finds the terminating 0. So it should not be marked as
writeonly. I think this means the argument should not be marked as
writeonly.

(This is causing a mis-compile with legacy DSE, before it got removed)

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D100601
2021-04-15 23:00:21 +01:00
Florian Hahn 49999d4364 [VPlan] Replace a few unnecessary includes with forward decls. 2021-04-15 20:08:31 +01:00
Danilo C. Grael 55487079a9 [LoopUnrollAndJam] Avoid repeated instructions for UAJ analysis
Avoid visiting repeated instructions for processHeaderPhiOperands as it can cause a scenario of endless loop. Test case is attached and can be ran with `opt -basic-aa -tbaa -loop-unroll-and-jam  -allow-unroll-and-jam -unroll-and-jam-count=4`.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D97407
2021-04-15 12:59:42 -04:00
Mark Johnston f511dc75e4 [asan] Add an offset for the kernel address sanitizer on FreeBSD
This is based on a port of the sanitizer runtime to the FreeBSD kernel
that has been commited as https://cgit.freebsd.org/src/commit/?id=38da497a4dfcf1979c8c2b0e9f3fa0564035c147
and the following commits.

Reviewed By: emaste, dim
Differential Revision: https://reviews.llvm.org/D98285
2021-04-15 17:49:00 +01:00
Stelios Ioannou bf147c4653 [LSR] Fix for pre-indexed generated constant offset
This patch changed the isLegalUse check to ensure that
LSRInstance::GenerateConstantOffsetsImpl generates an
offset that results in a legal addressing mode and
formula. The check is changed to look similar to the
assert check used for illegal formulas.

Differential Revision: https://reviews.llvm.org/D100383

Change-Id: Iffb9e32d59df96b8f072c00f6c339108159a009a
2021-04-15 16:44:42 +01:00
Florian Hahn 6adebe3fd2 [VPlan] Add VPRecipeBase::mayHaveSideEffects.
Add an initial version of a helper to determine whether a recipe may
have side-effects.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D100259
2021-04-15 11:49:40 +01:00
David Sherwood ea14df695e [SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction
There were a few places in widenPHIInstruction where calculations of
offsets were failing to take the runtime calculation of VF into
account for scalable vectors. I've fixed those cases in this patch
as well as adding an assert that we should not be scalarising for
scalable vectors.

Tests are added here:

  Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Differential Revision: https://reviews.llvm.org/D99254
2021-04-15 10:51:49 +01:00
David Sherwood 7120f89f7d [NFC][LoopVectorize] Remove unnecessary VF.isScalable asserts
There are a few places in LoopVectorize.cpp where we have been too
cautious in adding VF.isScalable() asserts and it can be confusing.
It also makes it more difficult to see the genuine places where
work needs doing to improve scalable vectorization support.

This patch changes getMemInstScalarizationCost to return an
invalid cost instead of firing an assert for scalable vectors. Also,
vectorizeInterleaveGroup had multiple asserts all for the same
thing. I have removed all but one assert near the start of the
function, and added a new assert that we aren't dealing with masks
for scalable vectors.

Differential Revision: https://reviews.llvm.org/D99727
2021-04-15 09:41:03 +01:00
Florian Hahn 5a3ff24b12
[NewGVN] Add phi-of-ops operands if no real PHI is created.
If the PHI-of-ops simplifies to an existing value, no real PHI is
created, which means the dependencies between the
PHI-of-ops and its operands is not materialized in IR. At the
moment, we fail to create a real PHI node for the PHI-of-ops,
because the PHI-of-ops root instruction is not re-visited if
one of the PHI-of-ops operands changes. We need to add the
operands as additional users in this case.

Even with this patch, there are still some dependencies
missing. I will continue tackling the outstanding
reporeted crashes in this area.

Fixes PR36501, PR42422, PR42557.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D66924
2021-04-15 08:25:10 +01:00
Philip Reames dd985551c2 Reapply "[InferAttributes] Materialize all infered attributes for declaration"" and follow on patches.
This reverts commit ab98f2c712 and 98eea392cd.

It includes a fix for the clang test which triggered the revert.  I failed to notice this one because there was another AMDGPU llvm test with a similiar name and the exact same text in the error message.  Odd.  Since only one build bot reported the clang test, I didn't notice that one.
2021-04-14 16:38:07 -07:00
Nico Weber ab98f2c712 Revert "[InferAttributes] Materialize all infered attributes for declaration"
Breaks check-clang, see comments on D100400

Also revert follow-up "[NFC] Move a recently added utility into a location to enable reuse"

This reverts commit 3ce61fb6d6.
This reverts commit 61a85da882.
2021-04-14 18:41:20 -04:00
Philip Reames 3ce61fb6d6 [NFC] Move a recently added utility into a location to enable reuse
About to refresh a patch that uses this in FunctionAtrrs, doing the move seperately to control build times.
2021-04-14 15:05:16 -07:00
Philip Reames 61a85da882 [InferAttributes] Materialize all infered attributes for declaration
We have some cases today where attributes can be inferred from another on access, but the result is not explicitly materialized in IR. This change is a step towards changing that.

Why? Two main reasons:

* Human clarity. It's really confusing trying to figure out why a transform is triggering when the IR doesn't appear to have the required attributes.
* This avoids the need to special case declarations in e.g. functionattrs. Since we can assume the attribute is present, we can work directly from attributes (and only attributes) without also needing to query accessors on Function to avoid missing cases due to unannotated (but infered on use) declarations. (This piece will appear must easier to follow once D100226 also lands.)

Differential Revision: https://reviews.llvm.org/D100400
2021-04-14 14:45:24 -07:00
Mehrnoosh Heidarpour 29f189f90d [InstCombine] Conditionally emit nowrap flags when combining two adds
Currently, the InstCombineCompare is combining two add operations
into a single add operation which always has a nsw flag, without
checking the conditions to see if this flag should be present
according to the original two add operations or not.

This patch will change the InstCombineCompare to emit the nsw or
nuw only when these flags are allowed to be generated according to
the original add operations and remove the possibility of applying
wrong optimization with passes that will perform on the IR later
in the pipeline.

To confirm that the current results are buggy and the results after
proposed patch are the correct IR the following examples from Alive2
are attached; the same results can be seen in the case of nuw flag
and nsw is just used as an example. The following link shows that
the generated IR with current LLVM is a buggy IR when none of the
original add operations have nsw flag.
https://alive2.llvm.org/ce/z/WGaDrm
The following link proves that the generated IR after the patch in
the former case is the correct IR.
https://alive2.llvm.org/ce/z/wQ7G_e

Differential Revision: https://reviews.llvm.org/D100095
2021-04-14 20:53:06 +02:00
Sjoerd Meijer 39d29817f3 [SCCP] Follow up of rGbbab9f986c6d. NFC.
This addresses the linter messages, mainly the inconsistent capitalisation of
member functions.
2021-04-14 17:14:46 +01:00
Benjamin Kramer cf4161673c [Instcombine] Disable memcpy of alloca bypass for instruction sources
This transformation is fundamentally broken when it comes to dominance,
it just happened to work when the source of the memcpy can be moved into
the place of the alloca. The bug shows up a lot more often since
077bff39d4 allows the source to be a
switch.

It would be possible to check dominance of the source and all its
operands, but that seems very heavy for instcombine.
2021-04-14 16:52:09 +02:00
Simon Pilgrim b49c41afba [SLP] createOp - fix null dereference warning. NFCI.
Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.
2021-04-14 15:24:41 +01:00
Sjoerd Meijer bbab9f986c [SCCP] Create SCCP Solver
This refactors SCCP and creates a SCCPSolver interface and class so that it can
be used by other passes and transformations. We will use this in D93838, which
adds a function specialisation pass.

This is based on an early version by Vinay Madhusudan.

Differential Revision: https://reviews.llvm.org/D93762
2021-04-14 14:58:03 +01:00
Roman Lebedev 2fea5d5d4a
[InstCombine] tmp alloca bypass: ensure that the replacement dominates all alloca uses
After 077bff39d4,
isDereferenceableForAllocaSize() can recurse into selects,
which is causing a problem for the new test case,
reduced from https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20210412/904154.html
because the replacement (the select) is defined after the first use
of an alloca, so we'd end up with a verifier error.

Now, this new check is too restrictive.
We likely can handle *some* cases, by trying to sink all uses of an alloca
to after the the def.
2021-04-14 13:04:12 +03:00
Sterling Augustine 32e264921b Revert "[GlobalOpt] Revert valgrind hacks"
This reverts commit dbc16ed199.
2021-04-13 17:47:07 -07:00
Evgeny Leviant dbc16ed199 [GlobalOpt] Revert valgrind hacks
Differential revision: https://reviews.llvm.org/D69428
2021-04-13 19:11:10 +03:00
Sander de Smalen bd86824d98 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
Sander de Smalen 92d8421f49 [TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100199
2021-04-13 14:20:58 +01:00
Florian Hahn 467b1f1cd2
[SimplifyCFG] Allow hoisting terminators only with HoistCommonInsts=false.
As a side-effect of the change to default HoistCommonInsts to false
early in the pipeline, we fail to convert conditional branch & phis to
selects early on, which prevents vectorization for loops that contain
conditional branches that effectively are selects (or if the loop gets
vectorized, it will get vectorized very inefficiently).

This patch updates SimplifyCFG to perform hoisting if the only
instruction in both BBs is an equal branch. In this case, the only
additional instructions are selects for phis, which should be cheap.

Even though we perform hoisting, the benefits of this kind of hoisting
should by far outweigh the negatives.

For example, the loop in the code below will not get vectorized on
AArch64 with the current default, but will with the patch. This is a
fundamental pattern we should definitely vectorize. Besides that, I
think the select variants should be easier to use for reasoning across
other passes as well.

https://clang.godbolt.org/z/sbjd8Wshx

```
double clamp(double v) {
  if (v < 0.0)
    return 0.0;
  if (v > 6.0)
    return 6.0;
  return v;
}

void loop(double* X, double *Y) {
  for (unsigned i = 0; i < 20000; i++) {
    X[i] = clamp(Y[i]);
  }
}
```

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100329
2021-04-13 10:33:35 +01:00
Amy Huang dad5caa59e Revert "Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
This change causes an assert / segmentation fault in LTO builds.

This reverts commit f2e4f3eff3.
2021-04-12 20:10:17 -07:00
Evgeniy Brevnov e50aa1af2d [NARY][NFC] Use hasNUsesOrMore instead of getNumUses since it's more
efficient.
2021-04-13 09:29:49 +07:00
Gulfem Savrun Yeniceri e96df3e531 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-04-13 01:29:41 +00:00
Nick Desaulniers 237d4ee835 [JumpThreading] merge debug info when merging select+br
Jump threading can replace select then unconditional branch with
conditional branch, but when doing so loses debug info.

This destructive transform is eventually leading to a failed Verifier
run during full LTO builds of the Linux kernel with CFI and KCOV
enabled, as reported in PR39531.

ModuleSanitizerCoveragePass will insert calls to
__sanitizer_cov_trace_pc, and sometimes split critical edges,
using whatever debug info may or may not exist for the branch for
the added libcall. Since we can inline calls to
__sanitizer_cov_trace_pc due to LTO, this can lead to the error
observed in PR39531 when the debug info isn't propagated to
the libcall, because of prior destructive transforms that failed to
retain debug info.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100137
2021-04-12 17:51:21 -07:00
Arthur Eubanks a8ab1f98d2 [Evaluator] Look through invariant.group intrinsics
Turning on -fstrict-vtable-pointers in Chrome caused an extra global
initializer. Turns out that a llvm.strip.invariant.group intrinsic was
causing GlobalOpt to fail to step through some simple code.

We can treat *.invariant.group uses as simply their operand.
Value::stripPointerCastsForAliasAnalysis() does exactly this. This
should be safe because the Evaluator does not skip memory accesses due
to invariants or alias analysis.

However, we don't want to leak that we've stripped arbitrary pointer
casts to users of Evaluator, so we bail out if we evaluate a function to
any constant, since we may have looked through *.invariant.group calls
and aliasing pointers cannot be arbitrarily substituted.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D98843
2021-04-12 16:12:15 -07:00