Commit Graph

11157 Commits

Author SHA1 Message Date
Craig Topper e38b7e8948 [DivRemPairs] Add an initial case for hoisting to a common predecessor.
This patch adds support for hoisting the division and maybe the
remainder for control flow graphs like this.

```
PredBB
  |  \
  |  Rem
  |  /
 Div
```

If we have DivRem we'll hoist both to PredBB. If not we'll just
hoist Div and expand Rem using the Div.

This improves our codegen for something like this

```
__uint128_t udivmodti4(__uint128_t dividend, __uint128_t divisor, __uint128_t *remainder) {
    if (remainder != 0)
        *remainder = dividend % divisor;
    return dividend / divisor;
}
```

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D87555
2021-07-11 10:03:07 -07:00
Eli Friedman 9c4baf5101 [ScalarEvolution] Strictly enforce pointer/int type rules.
Rules:

1. SCEVUnknown is a pointer if and only if the LLVM IR value is a
   pointer.
2. SCEVPtrToInt is never a pointer.
3. If any other SCEV expression has no pointer operands, the result is
   an integer.
4. If a SCEVAddExpr has exactly one pointer operand, the result is a
   pointer.
5. If a SCEVAddRecExpr's first operand is a pointer, and it has no other
   pointer operands, the result is a pointer.
6. If every operand of a SCEVMinMaxExpr is a pointer, the result is a
   pointer.
7. Otherwise, the SCEV expression is invalid.

I'm not sure how useful rule 6 is in practice.  If we exclude it, we can
guarantee that ScalarEvolution::getPointerBase always returns a
SCEVUnknown, which might be a helpful property. Anyway, I'll leave that
for a followup.

This is basically mop-up at this point; all the changes with significant
functional effects have landed.  Some of the remaining changes could be
split off, but I don't see much point.

Differential Revision: https://reviews.llvm.org/D105510
2021-07-09 17:29:26 -07:00
Arthur Eubanks 9a7afae492 [OpaquePtr][InferAddrSpace] Use PointerType::getWithSamePointeeType() 2021-07-09 10:29:08 -07:00
Roman Lebedev 4e332cd41a
Revert "Transform memset + malloc --> calloc (PR25892)"
It broke `check-msan`, see e.g. https://lab.llvm.org/buildbot/#/builders/18/builds/1934

This reverts commit 375694a07b.
2021-07-09 16:26:48 +03:00
Max Kazantsev 9c5e65691e [LoopDeletion] Handle switch in proving that loop exits on first iteration
Added check for switch-terminated blocks in loops.
Now if a block is terminated with a switch, we try to find out which of the
cases is taken on 1st iteration and mark corresponding edge from the block
to the case successor as live.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D105688
Reviewed By: nikic, mkazantsev
2021-07-09 18:03:34 +07:00
Dawid Jurczak 375694a07b Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It's https://reviews.llvm.org/D101440 follow-up.

Differential Revision: https://reviews.llvm.org/D103009
2021-07-09 11:01:12 +02:00
Bjorn Pettersson 472462c472 [NewPM] Consistently use 'simplifycfg' rather than 'simplify-cfg'
There was an alias between 'simplifycfg' and 'simplify-cfg' in the
PassRegistry. That was the original reason for this patch, which
effectively removes the alias.

This patch also replaces all occurrances of 'simplify-cfg'
by 'simplifycfg'. Reason for choosing that form for the name is
that it matches the DEBUG_TYPE for the pass, and the legacy PM name
and also how it is spelled out in other passes such as
'loop-simplifycfg', and in other options such as
'simplifycfg-merge-cond-stores'.

I for some reason the name should be changed to 'simplify-cfg' in
the future, then I think such a renaming should be more widely done
and not only impacting the PassRegistry.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D105627
2021-07-09 09:47:03 +02:00
David Blaikie 1def2579e1 PR51018: Remove explicit conversions from SmallString to StringRef to future-proof against C++23
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
2021-07-08 13:37:57 -07:00
Max Kazantsev 19885c7adf [NFC] Remove duplicate function calls
Removed repeated call of L->getHeader(). Now using previously stored return value.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D105535
Reviewed By: mkazantsev
2021-07-07 17:02:36 +07:00
Eli Friedman 7ac1c7bead Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.

Recommitting with fix to MemoryDepChecker::isDependent.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 12:16:05 -07:00
Eli Friedman a6d081b2cb Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers."
This reverts commit 74d6ce5d5f.

Seeing crashes on buildbots in MemoryDepChecker::isDependent.
2021-07-06 11:17:13 -07:00
Eli Friedman 74d6ce5d5f [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 10:54:41 -07:00
Jon Roelofs 37b6e03c18 [Intrinsics] Make MemCpyInlineInst a MemCpyInst
This opens up more optimization opportunities in passes that already handle MemCpyInst's.

Differential revision: https://reviews.llvm.org/D105247
2021-07-02 10:25:24 -07:00
Florian Hahn a3ca578eb9
[Matrix] Fix crash during fusion if the same load is re-used.
This patch fixes a crash when the same load is used for both operands of
a fuseable multiply.
2021-07-02 14:00:17 +01:00
Florian Hahn 7655061cc6
[Matrix] Hoist address computation before multiply to enable fusion.
If the store address does not dominate the matrix multiply, try to hoist
address computation instructions without side-effects and/or memory
reads before the multiply, to allow fusion.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D105193
2021-07-02 09:52:11 +01:00
Evgeniy Brevnov 9568811cb8 [NFC][DSE]Change 'do-while' to 'for' loop to simplify code structure
With 'for' loop there is is a single place where 'Current' is adjusted. It helps to avoid copy paste and makes a bit easy to understand overall loop controll flow.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D101044
2021-07-02 10:00:47 +07:00
Craig Topper 066524ea54 [ScalarizeMaskedMemIntrin][SelectionDAGBuilder] Use the element type to calculate alignment for gather/scatter when alignment operand is 0.
Previously we used the vector type, but we're loading/storing
invididual elements so I think only element alignment should matter.

Noticed while looking at the code for something else so I don't
have a test case.

Differential Revision: https://reviews.llvm.org/D105220
2021-07-01 19:08:47 -07:00
Reshabh Sharma ae983de6cc [InferAddressSpaces] NFC: For noop IntToPtr/PtrToInt pair cast to operator instead of PtrToInt
Compiler crashes at an assertion while casting operands to PtrToIntInst at some cases when
ptrtoint is present as an explicit operand to inttoptr. Explicit instruction operator as
operand can not be casted to an Instruction.

This patch replaces cast from PtrToInst to Operator which are later checked for constant
expressions.

Differential Revision: https://reviews.llvm.org/D105002
2021-06-28 19:24:26 +05:30
Max Kazantsev d58514d41c [LSR][NFC] Make sure that after the canonicalization the formula is canonical 2021-06-28 12:50:04 +07:00
Max Kazantsev 7c73c2ede8 [LoopDeletion] Benefit from branches by undef conditions when symbolically executing 1st iteration
We can exploit branches by `undef` condition. Frankly, the LangRef says that
such branches are UB, so we can assume that all outgoing edges of such blocks
are dead.

However, from practical perspective, we know that this is not supported correctly
in some other places. So we are being conservative about it.

Branch by undef is treated in the following way:
- If it is a loop-exiting branch, we always assume it exits the loop;
- If not, we arbitrarily assume it takes `true` value.

Differential Revision: https://reviews.llvm.org/D104689
Reviewed By: nikic
2021-06-28 11:39:46 +07:00
Nikita Popov e81702912e [DSE] Preserve address space
Preserve address space when inserting i8* cast.
2021-06-27 20:26:00 +02:00
Nikita Popov 9aa951e80e [MemCpyOpt] Preserve address space
Preserve address space when generating the cast to i8*.
2021-06-27 20:21:19 +02:00
Nikita Popov f00941e061 [DSE] Support opaque pointers
For the start shortening optimization, always use a i8 type for
the GEP, as it is a raw offset calculation.

Handling of non-i8* memset/memcpy arguments requires insertion
of casts. These cases were previously miscompiled, as the offset
calculation was performed on the wrong type.
2021-06-27 17:41:40 +02:00
Nikita Popov f025053977 [MemCpyOpt] Handle unusual memcpy element type
Apparently, it is legal to use memcpy/memset with pointer types
other than i8*. Prior to 81fcdae68c
this case was silently miscompiled, as the i8 offset calculation
was performed on some other type. Now it would crash due to a
type mismatch. Fix this by inserting an explicit bitcast to i8*.
2021-06-27 16:21:44 +02:00
Nikita Popov 81fcdae68c [MemCpyOpt] Support opaque pointers 2021-06-27 15:52:38 +02:00
Roman Lebedev d064182612
[SimplifyCFG] Tail-merging all blocks with `resume` terminator
Similar to what we already do for `ret` terminators.
As noted by @rnk, clang seems to already generate a single `ret`/`resume`,
so this isn't likely to cause widespread changes.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104849
2021-06-24 21:25:06 +03:00
Roman Lebedev 9c4c2f2472
[SimplifyCFG] Tail-merging all blocks with `ret` terminator
Based ontop of D104598, which is a NFCI-ish refactoring.
Here, a restriction, that only empty blocks can be merged, is lifted.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104597
2021-06-24 13:15:39 +03:00
Roman Lebedev ff4b1d379f
[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging
This changes the approach taken to tail-merge the blocks
to always create a new block instead of trying to reuse some block,
and generalizes it to support dealing not with just the `ret` in the future.

This effectively lifts the CallBr restriction, although this isn't really intentional.
That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it.

Other restrictions of the transform remain.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104598
2021-06-23 14:33:18 +03:00
Max Kazantsev 842b4c83cb [LoopDeletion] Exploit undef Phi inputs when symbolically executing 1st iteration
Follow-up on Roman's idea expressed in D103959.
- If a Phi has undefined inputs from live blocks:
   - and no other inputs, assume it is undef itself;
   - and exactly one non-undef input, we can assume that all undefs are equal to this input.

Differential Revision: https://reviews.llvm.org/D104618
Reviewed By: lebedev.ri, nikic
2021-06-23 11:53:48 +07:00
Max Kazantsev b7d2c173eb [LSR] Filter out zero factors. PR50765
Zero factor leads to division by zero and failure of corresponding
assert as shown in PR50765. We should filter out such factors.

Differential Revision: https://reviews.llvm.org/D104702
Reviewed By: huihuiz, reames
2021-06-23 10:43:06 +07:00
Jingu Kang 873ff5a728 [SimpleLoopUnswich] Fixa a bug on ComputeUnswitchedCost with partial unswitch
There was a bug from cost calculation for partially invariant unswitch.

The costs of non-duplicated blocks are substracted from the total LoopCost, so
anything that is duplicated should not be counted.

Differential Revision: https://reviews.llvm.org/D103816
2021-06-22 16:18:00 +01:00
Max Kazantsev 4c4f1ae93e Re-land "[LoopDeletion] Handle Phis with similar inputs from different blocks"
Patch was reverted due to a bug that existed before it and was exposed
by it. Returning after the underlying bug has been fixed.

Differential Revision: https://reviews.llvm.org/D103959
2021-06-22 12:28:46 +07:00
Max Kazantsev 575253887b [LoopDeletion] Require loop to have a predecessor when executing 1st iteration symbolically
Two predecessors break the further logic, and the loop may come to the
opt in non-canonicalized state.
2021-06-22 12:20:55 +07:00
Nikita Popov 862313cf59 [LoopUnroll] Don't modify TripCount/TripMultiple in computeUnrollCount() (NFCI)
As these are no longer passed to UnrollLoop(), there is no need to
modify them in computeUnrollCount(). Make them non-reference parameters.

Differential Revision: https://reviews.llvm.org/D104590
2021-06-21 21:34:17 +02:00
Nathan Chancellor f52666985d
Revert "[LoopDeletion] Handle Phis with similar inputs from different blocks"
This reverts commit bb1dc876eb.

This patch causes an assertion failure when building an arm64 defconfig
Linux kernel.

See https://reviews.llvm.org/D103959 for a link to the original bug
report and a reduced reproducer.
2021-06-21 10:18:55 -07:00
Max Kazantsev bb1dc876eb [LoopDeletion] Handle Phis with similar inputs from different blocks
This patch lifts the requirement to have the only incoming live block
for Phis. There can be multiple live blocks if the same value comes to
phi from all of them.

Differential Revision: https://reviews.llvm.org/D103959
Reviewed By: nikic, lebedev.ri
2021-06-21 11:37:06 +07:00
Nikita Popov 1ae266f452 [LoopUnroll] Use smallest exact trip count from any exit
This is a more general alternative/extension to D102635. Rather than
handling the special case of "header exit with non-exiting latch",
this unrolls against the smallest exact trip count from any exit.
The latch exit is no longer treated as priviledged when it comes to
full unrolling.

The motivating case is in full-unroll-one-unpredictable-exit.ll.
Here the header exit is an IV-based exit, while the latch exit is
a data comparison. This kind of loop does not get rotated, because
the latch is already exiting, and loop rotation doesn't try to
distinguish IV-based/analyzable latches.

Differential Revision: https://reviews.llvm.org/D102982
2021-06-20 20:58:26 +02:00
David Green a24b02193a [DSE] Remove stores in the same loop iteration
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];

After LICM, this becomes:
for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, and with this patch is now removed.

Differntial Revision: https://reviews.llvm.org/D100464
2021-06-20 17:03:30 +01:00
Nikita Popov 1bd4085e0b [LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop()
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.

Differential Revision: https://reviews.llvm.org/D104487
2021-06-19 09:25:57 +02:00
Hongtao Yu bd52495518 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Max Kazantsev de92287cf8 [LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration (try 3)
This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topoligical
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach header via the latch, then the backedge can be broken.

This implementation uses InstSimplify. SCEV version was rejected due to high
compile time impact.

Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: nikic
2021-06-18 17:31:57 +07:00
Max Kazantsev fa5eb22ad4 [NFC] Assert non-zero factor before division
This is to ensure that zero denominator leads to controlled
assertion failure rather than UB.
2021-06-18 15:50:50 +07:00
Roman Lebedev 84eeb82888
[NFC][SimpleLoopUnswitch] unswitchTrivialBranch(): add debug output explaining unswitching failure
It's not prohibitively verbose, and allows easier understanding
why certain unswitching ultimately wasn't performed.
2021-06-18 00:46:04 +03:00
Craig Topper 99e95856fb [PartiallyInlineLibCalls] Disable sqrt expansion for strictfp.
This pass emits a floating point compare and a conditional branch,
but if strictfp is enabled we don't emit a constrained compare
intrinsic.

The backend also won't expand the readonly sqrt call this pass inserts
to a sqrt instruction under strictfp. So we end up with 2 libcalls as
seen here. https://godbolt.org/z/oax5zMEWd

Fix these things by disabling the pass.

Differential Revision: https://reviews.llvm.org/D104479
2021-06-17 14:15:12 -07:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Roman Lebedev e52364532a
[NewPM] Remove SpeculateAroundPHIs pass
Addition of this pass has been botched.
There is no particular reason why it had to be sold as an inseparable part
of new-pm transition. It was added when old-pm was still the default,
and very *very* few users were actually tracking new-pm,
so it's effects weren't measured.

Which means, some of the turnoil of the new-pm transition
are actually likely regressions due to this pass.

Likewise, there has been a number of post-commit feedback
(post new-pm switch), namely
* https://reviews.llvm.org/D37467#2787157 (regresses HW-loops)
* https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before)
* https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata)
and in the half year past, the pass authors (google) still haven't found time to respond to any of that.

Hereby it is proposed to backout the pass from the pipeline,
until someone who cares about it can address the issues reported,
and properly start the process of adding a new pass into the pipeline,
with proper performance evaluation.

Furthermore, neither google nor facebook reports any perf changes
from this change, so i'm dropping the pass completely.
It can always be re-reverted should/if anyone want to pick it up again.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D104099
2021-06-15 20:35:55 +03:00
Florian Hahn f7fc8927c0
[LoopDeletion] Check for irreducible cycles when deleting loops.
Loops with irreducible cycles may loop infinitely. Those cannot be
removed, unless the loop/function is marked as mustprogress.

Also discussed in D103382.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104238
2021-06-15 12:56:12 +01:00
Huihui Zhang 1c096bf09f [SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE.
Currently, Loop strengh reduce is not handling loops with scalable stride very well.

Take loop vectorized with scalable vector type <vscale x 8 x i16> for instance,
(refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added).

Memory accesses are incremented by "16*vscale", while induction variable is incremented
by "8*vscale". The scaling factor "2" needs to be extracted to build candidate formula
i.e., "reg(%in) + 2*reg({0,+,(8 * %vscale)}". So that addrec register reg({0,+,(8*vscale)})
can be reused among Address and ICmpZero LSRUses to enable optimal solution selection.

This patch allow LSR getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y",
and pull out "C1 /s C2" as scaling factor whenever possible. Without this change, LSR
is missing candidate formula with proper scaled factor to leverage target scaled-index
addressing mode.

Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable
vector. But allow simple valid scale to pass through.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D103939
2021-06-14 16:42:34 -07:00
Jeroen Dobbelaere bb8ce25e88 Intrinsic::getName: require a Module argument
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.

This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.

Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99173
2021-06-14 14:52:29 +02:00
Simon Pilgrim 56541d1377 GVN.cpp - remove unused <vector> include. NFCI. 2021-06-13 14:06:32 +01:00
Simon Pilgrim c14fd171fe LoopUnrollAndJamPass.cpp - remove unused <vector> include. NFCI. 2021-06-13 14:06:32 +01:00
Adam Nemet e0efebb8eb [Matrix] In transpose opts, handle a^t * a^t
Without the fix the testcase crashes because we remove the same instruction
twice.

Differential Revision: https://reviews.llvm.org/D104127
2021-06-11 09:29:43 -07:00
Sjoerd Meijer 9907746f5d Move Function Specialization to its correct location. NFC.
As a follow up of rGc4a0969b9c14, and as part of D104102, move it to
the IPO transformations directory.
2021-06-11 15:00:10 +01:00
Sjoerd Meijer c4a0969b9c Function Specialization Pass
This adds a function specialization pass to LLVM. Constant parameters
like function pointers and constant globals are propagated to the callee by
specializing the function.

This is a first version with a number of limitations:
- The pass is off by default, so needs to be enabled on the command line,
- It does not handle specialization of recursive functions,
- It does not yet handle constants and constant ranges,
- Only 1 argument per function is specialised,
- The cost-model could be further looked into, and perhaps related,
- We are not yet caching analysis results.

This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan.
More recently this was also discussed on the list, see:

https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html.

The motivation for this work is that function specialisation often comes up as
a reason for performance differences of generated code between LLVM and GCC,
which has this enabled by default from optimisation level -O3 and up. And while
this certainly helps a few cpu benchmark cases, this also triggers in real
world codes and is thus a generally useful transformation to have in LLVM.

Function specialisation has great potential to increase compile-times and
code-size.  The summary from some investigations with this patch is:
- Compile-time increases for short compile jobs is high relatively, but the
  increase in absolute numbers still low.
- For longer compile-jobs, the extra compile time is around 1%, and very much
  in line with GCC.
- It is difficult to blame one thing for compile-time increases: it looks like
  everywhere a little bit more time is spent processing more functions and
  instructions.
- But the function specialisation pass itself is not very expensive; it doesn't
  show up very high in the profile of the optimisation passes.

The goal of this work is to reach parity with GCC which means that eventually
we would like to get this enabled by default. But first we would like to address
some of the limitations before that.

Differential Revision: https://reviews.llvm.org/D93838
2021-06-11 09:11:29 +01:00
Andy Kaylor 41555eaf65 Preserve more MD_mem_parallel_loop_access and MD_access_group in SROA
SROA sometimes preserves MD_mem_parallel_loop_access and MD_access_group metadata on loads/stores, and sometimes fails to do so. This change adds copying of the MD after other CreateAlignedLoad/CreateAlignedStores. Also fix a case where the metadata was being copied from a load, rather than the store.

Added a LIT test to catch one case.

Patch by Mark Mendell

Differential Revision: https://reviews.llvm.org/D103254
2021-06-10 15:47:03 -07:00
Philip Reames 7629b2a09c [LI] Add a cover function for checking if a loop is mustprogress [nfc]
Essentially, the cover function simply combines the loop level check and the function level scope into one call.  This simplifies several callers and is (subjectively) less error prone.
2021-06-10 13:37:32 -07:00
LemonBoy d3faef6eef [SROA] Avoid splitting loads/stores with irregular type
Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D99435
2021-06-09 16:36:58 +02:00
Nico Weber 205cde63c7 Revert "[SROA] Avoid splitting loads/stores with irregular type"
This reverts commit 905f4eb537.
Breaks check-llvm on most (all?) bots, see https://reviews.llvm.org/D99435
2021-06-09 06:32:58 -04:00
LemonBoy 905f4eb537 [SROA] Avoid splitting loads/stores with irregular type
Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D99435
2021-06-09 11:48:20 +02:00
Jingu Kang 8eee02020b [LoopBoundSplit] Ignore phi node which is not scevable
There was a bug in LoopBoundSplit. The pass should ignore phi node which is not
scevable.

Differential Revision: https://reviews.llvm.org/D103913
2021-06-09 09:44:36 +01:00
David Green 297088d1ad Revert "[DSE] Remove stores in the same loop iteration"
Apparently non-dead stores are being removed, as noted in D100464.

This reverts commit 222aeb4d51.
2021-06-08 21:23:08 +01:00
maekawatoshiki 09e92c607c [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Also, a crash problem on legacy pass manager is fixed.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-06-08 20:30:02 +09:00
Simon Pilgrim 596004a947 MemCpyOptimizer.cpp - hasUndefContentsMSSA - Pass DataLayout by reference. NFCI. 2021-06-08 10:41:02 +01:00
Philip Reames c880d5e583 [RS4GC] Treat inttoptr as base pointer
This is a modified version of a patch by tolziplohu with a style change, and most importantly, a revised commit message.

inttoptr for a non-integral address space is currently ill defined in the LangRef.  Figuring out exactly what the dynamic semantics of such a cast would be is hard, and not yet settled.  Despite that, we still need to go ahead and implement something in RS4GC for a couple of reasons.

First, as a simple consistency argument.  We're apparently added support for constexpr inttoptrs a while back, and even have tests which exercised them.  Having a lack of constant folding trigger a crash during lowering is non-ideal.

Second, and more fundementally, the optimizer is allowed to insert undefined constructs in unreachable code.  At the same time, we can't assume that dynamically dead code is always pruned before lowering.  As a result, we must assume that inttoptrs can occur (even if completely ill defined) along dead paths.  We need the lowering to not crash.  The stackmaps produced can be garbage (as the assumption is the code is dynamically dead), but the lowering itself can't crash.

Differential Revision: https://reviews.llvm.org/D103492
2021-06-07 10:27:23 -07:00
Jingu Kang a2a0ac42ab [SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV
This pass transforms loops that contain a conditional branch with induction
variable. For example, it transforms left code to right code:

                             newbound = min(n, c)
 while (iv < n) {            while(iv < newbound) {
   A                           A
   if (iv < c)                 B
     B                         C
   C                         }
 }                           if (iv != n) {
                               while (iv < n) {
                                 A
                                 C
                               }
                             }

Differential Revision: https://reviews.llvm.org/D102234
2021-06-07 10:55:25 +01:00
maekawatoshiki 0a9d079931 Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"
This reverts commit 2165360003.

To fix the crash problem in legacy pass manager
2021-06-07 01:26:47 +09:00
Nikita Popov db45746821 [LoopUnroll] Separate peeling from unrolling
Loop peeling is currently performed as part of UnrollLoop().
Outside test scenarios, it is always performed with an unroll
count of 1. This means that unrolling doesn't actually do anything
apart from performing post-unroll simplification.

When testing, it's currently possible to specify both an explicit
peel count and an explicit unroll count. This doesn't perform any
sensible operation and may result in miscompiles, see
https://bugs.llvm.org/show_bug.cgi?id=45939.

This patch moves peeling from UnrollLoop() into tryToUnrollLoop(),
so that peeling does not also perform a susequent unroll. We only
run the post-unroll simplifications. Specifying both an explicit
peel count and unroll count is forbidden.

In the future, we may want to support both (non-PGO) peeling a
loop and unrolling it, but this needs to be done by first performing
the peel and then recalculating unrolling heuristics on a now
possibly analyzable loop.

Differential Revision: https://reviews.llvm.org/D103362
2021-06-05 10:32:00 +02:00
Adam Nemet ffde966cd9 [Matrix] Fix transpose-multiply folding if transpose has multiple uses
Don't add it to FusedInsts in this case.

Differential Revision: https://reviews.llvm.org/D103627
2021-06-04 10:55:03 -07:00
Philip Reames 5c0d1b2f90 [LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process
This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed.

The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile.

Differential Revision: https://reviews.llvm.org/D103620
2021-06-03 14:09:16 -07:00
Philip Reames 44d70d298a [LoopUnroll] Eliminate PreserveOnlyFirst parameter [nfc]
This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly.

Differential Revision: https://reviews.llvm.org/D103584
2021-06-03 10:33:14 -07:00
Hamza Mahfooz 83235b07e3
[Matrix] Preserve existing fast-math flags during lowering
This patch makes it so, floating-point instructions created in
LowerMatrixIntrinsics retain fast-math flags from instructions that are
higher up the chain.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49738

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D103233
2021-06-03 15:29:31 +01:00
Stephen Tozer 4316b0e59c [LoopStrengthReduce] Ensure that debug intrinsics do not affect LSR's output
During Loop Strength Reduce, if the terminating condition for the loop
is not immediately adjacent to the terminating branch and it has more
than one use, a clone of the condition will be created just before the
terminating branch and will be used as the branch condition. Currently,
whether the instructions are "immediately adjacent" is determined by
checking whether the next instruction after the condition is the
terminating branch; this is incorrect however, as the presence of a
debug intrinsic between the two will result in a change to the output.
This is fixed by using getNextNonDebugInstruction() instead.

Differential Revision: https://reviews.llvm.org/D103033
2021-06-02 15:56:23 +01:00
Jingu Kang f3a27511c9 [SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch
This re-enables commit 107d19eb01 with bug fixes.

Differential Revision: https://reviews.llvm.org/D99354
2021-06-02 10:58:22 +01:00
Daniil Fukalov 0b34acdab7 [NFC] Fix 'Load' name masking.
Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D103456
2021-06-02 11:09:53 +03:00
Florian Hahn 1b84acb23a
[LoopDeletion] Consider infinite loops alive, unless mustprogress.
The current loop or any of its sub-loops may be infinite. Unless the
function or the loops are marked as mustprogress, this in itself makes
the loop *not* dead.

This patch moves the logic to check whether the current loop is finite
or mustprogress to `isLoopDead` and also extends it to check the
sub-loops. This should fix PR50511.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D103382
2021-06-01 13:07:36 +01:00
Congzhe Cao bfefde22b6 [LoopInterhcange] Handle movement of reduction phis appropriately
This patch fixes pr43326 and pr48212.

Currently when we move reduction phis to the right place,
loop interchange assumes the first phi in loop headers is
an induction phi, skips the first phi and assumes the rest
of phis are candidate reduction phis to move. However, it
may not always be the case.

This patch loops over all phis in loop headers and considers
a phi node as a candidate reduction phi to move only when it
is indeed a reduction phi across outer and inner loop.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102743
2021-05-31 16:27:38 -04:00
David Green 222aeb4d51 [DSE] Remove stores in the same loop iteration
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];

After LICM, this becomes:
for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, and with this patch is now removed.

Differntial Revision: https://reviews.llvm.org/D100464
2021-05-31 10:22:37 +01:00
Mindong Chen 71acce68da [NFCI] Move DEBUG_TYPE definition below #includes
When you try to define a new DEBUG_TYPE in a header file, DEBUG_TYPE
definition defined around the #includes in files include it could
result in redefinition warnings even compile errors.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D102594
2021-05-30 17:31:01 +08:00
Stefan Pintilie 0159652058 Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2)"
This reverts commit be1a23203b.
2021-05-28 12:21:22 -05:00
Stefan Pintilie 24bd657202 Revert "[NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop"
This reverts commit b0b2bf3b5d.
2021-05-28 12:21:22 -05:00
Stefan Pintilie fd55331203 Revert "[NFC] Formatting fix"
This reverts commit 59d938e649.
2021-05-28 12:21:22 -05:00
Stefan Pintilie 807fc7cdc9 Revert "[NFC] Reuse existing variables instead of re-requesting successors"
This reverts commit c467585682.
2021-05-28 12:21:22 -05:00
Stefan Pintilie dd226803c2 Revert "[NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC"
This reverts commit 7d418dadf6.
2021-05-28 12:21:21 -05:00
eopXD fa488ea864 [LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.

Utilize LoopNest and let function 'Flatten' generate information from it.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102904
2021-05-28 15:43:12 +00:00
eopXD e96d6f4821 Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass"
This reverts commit 7952ddb21f.

Differential Revision: https://reviews.llvm.org/D103302
2021-05-28 07:58:06 +00:00
eopXD 7e06cf8f1b Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass"
This reverts commit ffc4d3e068.
2021-05-28 07:48:04 +00:00
eopXD ffc4d3e068 [LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.

Utilize LoopNest and let function 'Flatten' generate information from it.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102904
2021-05-28 07:25:53 +00:00
eopXD 7952ddb21f [LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.

Utilize LoopNest and let function 'Flatten' generate information from it.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102904
2021-05-28 07:11:26 +00:00
Max Kazantsev 6a2af607ad Revert "[NFCI] Lazily evaluate SCEVs of PHIs"
This reverts commit 51d334a845.

Reported failures, need to analyze.
2021-05-28 11:05:30 +07:00
maekawatoshiki 2165360003 [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-05-28 01:17:23 +09:00
Max Kazantsev 7d418dadf6 [NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC 2021-05-27 15:29:37 +07:00
Max Kazantsev c467585682 [NFC] Reuse existing variables instead of re-requesting successors 2021-05-27 15:29:37 +07:00
Max Kazantsev 51d334a845 [NFCI] Lazily evaluate SCEVs of PHIs
Eager evaluation has cost of compile time. Only query them if they are
required for proving predicates.
2021-05-27 13:35:31 +07:00
Max Kazantsev 59d938e649 [NFC] Formatting fix 2021-05-27 12:50:54 +07:00
Max Kazantsev b0b2bf3b5d [NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop 2021-05-27 12:44:22 +07:00
Yevgeny Rouban 4d26f41f76 [RS4GC] Introduce intrinsics to get base ptr and offset
There can be a need for some optimizations to get (base, offset)
for any GC pointer. The base can be calculated by generating
needed instructions as it is done by the
RewriteStatepointsForGC::findBasePointer() function. The offset
can be calculated in the same way. Though to not expose the base
calculation and to make the offset calculation as simple as
ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal
outside RS4GC, this patch introduces 2 intrinsics:

 @llvm.experimental.gc.get.pointer.base(%derived_ptr)
 @llvm.experimental.gc.get.pointer.offset(%derived_ptr)

These intrinsics are inlined by RS4GC along with generation of
statepoint sequences.

With these new intrinsics the GC parseable lowering for atomic
memcpy intrinsics (6ec2c5e402)
could be implemented as a separate pass.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100445
2021-05-27 09:14:14 +07:00
Philip Reames 9cc2181ec3 [unroll] Use value domain for symbolic execution based cost model
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost.

By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928.

Differential Revision: https://reviews.llvm.org/D102934
2021-05-26 08:41:25 -07:00
Max Kazantsev be1a23203b Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2)
The patch was reverted due to compile time impact of contextual SCEV
queries. It also appeared that it introduced a miscompile on irreducible CFG.

Changes made:
1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate;
2. Irreducible CFG in live code is now detected and excluded from processing.

Differential Revision: https://reviews.llvm.org/D102615
2021-05-26 19:47:14 +07:00
Max Kazantsev 0de553dce0 Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration""
This reverts commit 43d2e51c2e.

Commited wrong version.
2021-05-26 19:29:07 +07:00
Max Kazantsev 43d2e51c2e Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"
The patch was reverted due to compile time impact of contextual SCEV
queries. It also appeared that it introduced a miscompile on irreducible CFG.

Changes made:
1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate;
2. Irreducible CFG in live code is now detected and excluded from processing.

Differential Revision: https://reviews.llvm.org/D102615
2021-05-26 19:23:21 +07:00
Matt Morehouse 832c99f727 Revert "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"
This reverts commit 2531fd70d1 due to
performance regression on the PPC buildbot.
2021-05-25 13:58:42 -07:00
Benjamin Kramer d2d4f16806 [Matrix] Use LLVM_DEBUG for a debug flag
dump() doesn't exist in release builds.

ld.lld: error: undefined symbol: llvm::Value::dump() const
>>> referenced by LowerMatrixIntrinsics.cpp
>>>               LowerMatrixIntrinsics.o:((anonymous namespace)::LowerMatrixIntrinsics::Visit())
2021-05-25 21:10:19 +02:00
Nikita Popov 9c91614959 [CVP] Guard against poison in common phi value transform (PR50399)
The common phi value transform replaces constants with values that
have the same value as the constant on a given edge. However, LVI
generally only provides information that is correct up to poison,
so this can end up replacing a well-defined value with poison.
D69442 addressed an instance of this problem by clearing poison
flags on the generating instruction, which was sufficient at the
time. rGa917fb89dc28 made LVI's edge value analysis slightly more
powerful, and clearing poison flags is no longer sufficient.

This patch changes the transform to instead explicitly guard against
a poison value instead. This should be satisfied for most cases due
to a prior branch on poison.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50399.

Differential Revision: https://reviews.llvm.org/D102966
2021-05-25 20:47:17 +02:00
Adam Nemet dfd1bbd00a [Matrix] Factor and distribute transposes across multiplies
Now that we can fold some transposes into multiplies (CM: A * B^t and RM:
A^t * B), we want to move them around to create the optimal expressions:

* fold away double transposes while still using them to assert the shape
* sink transposes hoping they cancel out
* lift transposes when both operands are transposed

This also modifies the matrix remarks to include the number of exposed
transposes (i.e. transposes that we couldn't fold into a multiply).

The adjustment to the test remarks-inlining is a bit subtle: I am changing the
double transpose to a single transpose so that we don't remove it completely.
More importantly this changes some of the total instruction count, most
notable stores because we can no longer use a vector store.

Differential Revision: https://reviews.llvm.org/D102733
2021-05-25 11:12:20 -07:00
Roman Lebedev 149e018d12
[LoopIdiom] 'arithmetic right-shift until zero': don't turn potentially infinite loops into finite ones
Nowadays LLVM does not assume that all loops are finite,
so if we want to produce a finite loop from a potentially-infinite one,
we must ensure that the original loop is known to be a finite one.

For this transform, it only matters for arithmetic right-shifts.
For them, either the function or the loop must be known to
be `mustprogress`, or the original value being shifted must be known
to be non-negative (because iff the sign bit was set,
it will never become zero, but will become `-1` in the "end").

It would be really good for alive2 to actually complain about this,
but it currently does not: https://github.com/AliveToolkit/alive2/issues/726
2021-05-25 21:02:28 +03:00
Roman Lebedev 8f4db14d1c
[LoopIdiom] Support 'left-shift until zero' idiom
This adds support for the "count active bits" pattern, i.e.:
```
int countBits(unsigned val) {
    int cnt = 0;
    for( ; (val << cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one:
```
int countBits(unsigned val, int start, int off) {
    int cnt;
    for (cnt = start; val << (cnt + off); cnt++)
        ;
    return cnt;
}
```

alive2 is happy with all the tests there.

Note that, again, much like with the right-shift cases,
we don't require the `val != 0` guard.

This is the last pattern that was supported by
`detectShiftUntilZeroIdiom()`, which now becomes obsolete.
2021-05-25 15:26:35 +03:00
Roman Lebedev f1c5f78d38
[LoopIdiom] Support 'arithmetic right-shift until zero' idiom
This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(signed val) {
    int cnt = 0;
    for( ; (val >> cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one:
```
int countActiveBits(signed val, int start, int off) {
    int cnt;
    for (cnt = start; val >> (cnt + off); cnt++)
        ;
    return cnt;
}
```

This directly matches the existing 'logical right-shift until zero' idiom.
alive2 is happy with all the tests there.

Note that, again, much like with the original unsigned case,
we don't require the `val != 0` guard.

The old `detectShiftUntilZeroIdiom()` already supports this pattern,
the idea here is that the `val` must be positive (have at least one
leading zero), because otherwise the loop is non-terminating,
but since it is not `while(1)`, that would have been UB.
2021-05-25 14:30:49 +03:00
Alexey Lapshin 10c2e26159 [TRE] Reland: allow TRE for non-capturing calls.
The D82085 "allow TRE for non-capturing calls" caused failure during bootstrap.
This patch does the same as D82085 plus fixes bootstrap error.

The problem with D82085 is that it does not create copies for byval
operands, while replacing function call with a branch.

Consider following example:

```
    int zoo ( S p1 );

    int foo ( int count, S p1 ) {
      if ( count > 10 )
        return zoo(p1);

      // temporarily variable created for passing byvalue parameter
      // p1 could be used when zoo(p1) is called(after TRE is done).
      // lifetime.start p1.byvalue.temp
      return foo(count+1, p1);
      // lifetime.end p1.byvalue.temp
    }
```

After recursive call to foo is replaced with a jump into
start of the function, its parameters could be passed to
zoo function. i.e. temporarily variable created for byvalue
parameter "p1" could be passed to zoo. Finally zoo receives
broken operand:

```
    int foo ( int count, S p1 ) {
    :tailrecurse
      p1_tr = phi p1, p1.byvalue.temp
      if ( count > 10 )
        return zoo(p1_tr);

      // temporarily variable created for passing byvalue parameter
      // p1 could be used when zoo(p1) is called(after TRE is done).
      lifetime.start p1.byvalue.temp
      memcpy (p1.byvalue.temp, p1_tr)
      count = count + 1
      lifetime.end p1.byvalue.temp
      br tailrecurse
    }
```

To prevent using p1.byvalue.temp after its scope finished by
lifetime.end marker this patch copies value from p1.byvalue.temp
into another temporarily variable and then copies this variable
into the input parameter for next iteration.

This patch passes bootstrap build and bootstrap build with AddressSanitizer.

Differential Revision: https://reviews.llvm.org/D85614
2021-05-25 11:35:48 +03:00
Max Kazantsev 2531fd70d1 [LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration
This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topoligical
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach header via the latch, then the backedge can be broken.

Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: reames
2021-05-25 12:43:31 +07:00
maekawatoshiki e77d24f70a Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"
This reverts commit d65c32fb41.
2021-05-25 11:39:49 +09:00
Jon Roelofs 095e91c973 [Remarks] Add analysis remarks for memset/memcpy/memmove lengths
Re-landing now that the crasher this patch previously uncovered has been fixed
in: https://reviews.llvm.org/D102935

Differential revision: https://reviews.llvm.org/D102452
2021-05-24 10:10:44 -07:00
Roman Lebedev 32bee42719
[NFCI][LoopIdiom] 'left-shift until bittest': assert that BaseX is loop-invariant
Given that BaseX is an incoming value when coming from the preheader,
it *should* be loop-invariant, but let's just document this assumption.
2021-05-24 12:15:06 +03:00
Roman Lebedev aa3dac95ed
[LoopIdiom] 'logical right shift until zero': the value must be loop-invariant
As per the reproducer provided by Mikael Holmén in post-commit review.
2021-05-24 12:15:06 +03:00
maekawatoshiki d65c32fb41 [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-05-23 22:32:01 +09:00
Florian Hahn a6de8d95db
[Matrix] Bail out early if there are no matrix intrinsics.
If there are no matrix intrinsics in a function, we can directly bail
out, as there's nothing left to do.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D102931
2021-05-22 11:37:25 +01:00
Florian Hahn a0ce6439ca
[Matrix] Remove unused matrix-propagate-shape option.
The option was used during the initial bringup, but it does not add any
value at this point. Remove it.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D102930
2021-05-21 19:01:54 +01:00
maekawatoshiki fd53cb4148 Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"
This reverts commit cea7a3fe3d.
To investigate sanitizer-x86_64-linux-fast failure.
2021-05-22 01:40:43 +09:00
maekawatoshiki cea7a3fe3d [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-05-21 23:57:39 +09:00
Djordje Todorovic cd49b3ae1a [DebugInfo] Salvage dbg.value() during ADCE
This has been found by using the [0].

[0] https://llvm.org/docs/HowToUpdateDebugInfo.html#\
    test-original-debug-info-preservation-in-optimizations

 Differential Revision: https://reviews.llvm.org/D100844
2021-05-21 05:25:59 -07:00
Jon Roelofs 0af3105b64 Revert "[Remarks] Add analysis remarks for memset/memcpy/memmove lengths"
This reverts commit 4bf69fb52b.

This broke spec2k6/403.gcc under -global-isel. Details to follow once I've
reduced the problem.
2021-05-20 12:19:16 -07:00
Kevin P. Neal f21f1eea05 [FPEnv] EarlyCSE support for constrained intrinsics, default FP environment edition
EarlyCSE cannot distinguish between floating point instructions and
constrained floating point intrinsics that are marked as running in the
default FP environment. Said intrinsics are supposed to behave exactly the
same as the regular FP instructions. Teach EarlyCSE to handle them in that
case.

Differential Revision: https://reviews.llvm.org/D99962
2021-05-20 14:40:51 -04:00
Jon Roelofs 4bf69fb52b [Remarks] Add analysis remarks for memset/memcpy/memmove lengths
Differential revision: https://reviews.llvm.org/D102452
2021-05-19 15:09:18 -07:00
Roman Lebedev 729e18cbf4
[NFCI] SimplifyCFGPass: mergeEmptyReturnBlocks(): use DeleteDeadBlocks()
In this case, it does the same thing as the original pattern does.

SimplifyCFG has a few lurking miscompilations about deleting blocks that
have their address taken, and consistently using DeleteDeadBlocks() instead
 of a hand-rolled pattern will allow to weed those cases out easierly.
2021-05-19 11:32:24 +03:00
Arthur Eubanks 6b9524a05b [NewPM] Don't mark AA analyses as preserved
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.

SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102032
2021-05-18 13:49:03 -07:00
Nikita Popov e81334a754 [LICM] Remove MaybePromotable set (PR50367)
The MaybePromotable set keeps track of loads/stores for which
promotion was not attempted yet. Normally, any load/stores that
are promoted in the current iteration will be removed from this
set, because they naturally MustAlias with the promoted value.
However, if the source program has UB with metadata claiming that
a store is NoAlias, while it is actually MustAlias, and multiple
different pointers are promoted in the same iteration, it can
happen that a store is removed that is still in the MaybePromotable
set, causing a use-after-free.

While this could be fixed by explicitly invalidating values in
MaybePromotable in the LoopPromoter, I'm going with the more
radical option of dropping the set entirely here and check all
load/stores on each promotion iteration. As promotion, and especially
repeated promotion, are quite rare, this doesn't seem to have any
impact on compile-time.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50367.
2021-05-18 20:26:01 +02:00
Adam Nemet ab1f6ffa56 [GVN] Improve analysis for missed optimization remark
This change tries to handle multiple dominating users of the pointer operand
by choosing the most immediately dominating one, if possible.  While making
this change I also found that the previous implementation had a missing break
statement, making all loads with an odd number of dominating users emit an
OtherAccess value, so that has also been fixed.

Patch by Henrik G Olsson!

Differential Revision: https://reviews.llvm.org/D79097
2021-05-17 21:51:15 -07:00
Adam Nemet fcffd087c6 [Matrix] Fold the transpose into the matmul operand used to fetch scalars
For column-major this is:
  A * B^t
whereas for row-major:
  A^t * B

Differential Revision: https://reviews.llvm.org/D101762
2021-05-17 17:40:46 -07:00
Roman Lebedev 0633d5ce7b
[LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition.
I think i've added exhaustive test coverage, and i have verified that alive2 is happy with all the tests,
so in principle i'm fine with landing this without review, but just in case..

This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(unsigned val) {
    int cnt = 0;
    for( ; (val >> cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one, since that is what i need:
```
int countActiveBits(unsigned val, int start, int off) {
    int cnt;
    for (cnt = start; val >> (cnt + off); cnt++)
        ;
    return cnt;
}
```

I've followed in footstep of 'left-shift until bittest' idiom (D91038),
in the sense that iff the `ctlz` intrinsic is cheap, we'll transform,
regardless of all other factors.

This can have a shocking effect on certain benchmarks:
```
raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf
RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm
2021-05-09T01:06:05+03:00
Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
Run on (32 X 3600.24 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 5.26, 6.29, 3.49
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean          145 ms          145 ms          128   0.145319         0.999981   10.1568M       69.8949M        69.8936M      6.88159       6.88146   0.145322
p1319978.orf/threads:32/process_time/real_time_median        145 ms          145 ms          128   0.145317         0.999986   10.1568M       69.8941M        69.8931M      6.88151       6.88141   0.145319
p1319978.orf/threads:32/process_time/real_time_stddev      0.766 ms        0.766 ms          128   766.586u         15.1302u          0       354.167k        354.098k    0.0348699     0.0348631   766.469u
RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0
2021-05-09T01:06:24+03:00
Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Run on (32 X 3599.95 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 4.05, 5.95, 3.43
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean         99.8 ms         99.8 ms          128  0.0997758         0.999972   10.1568M       101.797M        101.794M      10.0225       10.0222  0.0997786
p1319978.orf/threads:32/process_time/real_time_median       99.7 ms         99.7 ms          128  0.0997165         0.999985   10.1568M       101.857M        101.854M      10.0284       10.0281  0.0997195
p1319978.orf/threads:32/process_time/real_time_stddev      0.224 ms        0.224 ms          128   224.166u          34.345u          0        226.81k        227.231k    0.0223309     0.0223723   224.586u
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
p1319978.orf/threads:32/process_time/real_time_mean                  -0.3134         -0.3134           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_median                -0.3138         -0.3138           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_stddev                -0.7073         -0.7078             1             0             1             0

```

Reviewed By: craig.topper, zhuhan0

Differential Revision: https://reviews.llvm.org/D102116
2021-05-17 20:33:33 +03:00
Nikita Popov fb9ed1979a [IR] Add BasicBlock::isEntryBlock() (NFC)
This is a recurring and somewhat awkward pattern. Add a helper
method for it.
2021-05-15 12:41:58 +02:00
Vitaly Buka 6ce7b2f026 Fix "is not used" warning 2021-05-14 20:58:58 -07:00
Philip Reames fcd12fed41 Extract a helper routine to simplify D91481 [NFC] 2021-05-14 18:40:23 -07:00
Nick Desaulniers 8c72749bd9 [LowerConstantIntrinsics] reuse isManifestLogic from ConstantFolding
GlobalVariables are Constants, yet should not unconditionally be
considered true for __builtin_constant_p.

Via the LangRef
https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic:

    This intrinsic generates no code. If its argument is known to be a
    manifest compile-time constant value, then the intrinsic will be
    converted to a constant true value. Otherwise, it will be converted
    to a constant false value.

    In particular, note that if the argument is a constant expression
    which refers to a global (the address of which _is_ a constant, but
    not manifest during the compile), then the intrinsic evaluates to
    false.

Move isManifestConstant from ConstantFolding to be a method of
Constant so that we can reuse the same logic in
LowerConstantIntrinsics.

pr/41459

Reviewed By: rsmith, george.burgess.iv

Differential Revision: https://reviews.llvm.org/D102367
2021-05-14 15:35:21 -07:00
Philip Reames 3f1c218318 [rs4gc] Strip memory related attributes consistently
I noticed that rs4gc is not stripping a number of memory aliasing related attributes. We do strip some from call sites, but don't strip the same ones from declarations or parameters.

Why do we need to strip these? Two answers:

    Safepoints conceptually read and write to the entire garbage collected heap in the physical model. We need this to preserve ordering of all loads and stores with respect to possible relocation.
    We can infer other attributes from these. For instance, readnone can imply both nofree and nosync. Both of which don't hold after physical rewriting.

Note: This exposed a latent issue which was fixed a couple weeks back in 01801d5274.

Differential Revision: https://reviews.llvm.org/D99802
2021-05-14 07:54:56 -07:00
dfukalov fdae3fc8b3 [GVN] Clobber partially aliased loads.
Use offsets stored in `AliasResult` implemented in D98718.

Updated with fix of issue reported in https://reviews.llvm.org/D95543#2745161

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-05-14 11:17:14 +03:00
David Green f7cb654763 [DSE] Move isOverwrite into DSEState. NFC
This moves the isOverwrite function into the DSEState so that it can
share the analyses and members from the state.

A few extra loop tests were also added to test stores in and around
multi block loops for D100464.
2021-05-14 09:16:51 +01:00
Jingu Kang 107d19eb01 Revert "[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch"
This reverts commit 88b259c014.

It needs to fix below bugs.

https://bugs.llvm.org/show_bug.cgi?id=50279
https://bugs.llvm.org/show_bug.cgi?id=50302
2021-05-13 08:40:49 +01:00
Stelios Ioannou 1124ad2f5d [LoopFlatten] Simplify loops so that the pass can operate on unsimplified loops.
The loop flattening pass requires loops to be in simplified form. If the
loops are not in simplified form, the pass cannot operate. This patch
simplifies all loops before flattening. As a result, all loops will be
simplified regardless of whether anything ends up being flattened.

This change was inspired by observing a certain loop that was not flatten
because the loops were not in simplified form. This loop is added as a
test to verify that it is now flattened.

Differential Revision: https://reviews.llvm.org/D102249

Change-Id: I45bcabe70fb99b0d89f0effafc82eb9e0585ec30
2021-05-12 19:22:01 +01:00
Congzhe Cao 3f8be15f29 [LoopInterchange] Handle lcssa PHIs with multiple predecessors
This is a bugfix in the transformation phase.

If the original outer loop header branches to both the inner loop
(header) and the outer loop latch, and if there is an lcssa PHI
node outside the loop nest, then after interchange the new outer latch
will have an lcssa PHI node inserted which has two predecessors, i.e.,
the original outer header and the original outer latch. Currently
the transformation assumes it has only one predecessor (the original
outer latch) and crashes, since the inserted lcssa PHI node does
not take both predecessors as incoming BBs.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D100792
2021-05-11 21:30:54 -04:00
Jordan Rupprecht fec2945998 Revert "[GVN] Clobber partially aliased loads."
This reverts commit 6c57044231.

It causes assertion errors due to widening atomic loads, and potentially causes miscompile elsewhere too. Repro, also posted to D95543:

```
$ cat repro.ll
; ModuleID = 'repro.ll'
source_filename = "repro.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.widget = type { i32 }
%struct.baz = type { i32, %struct.snork }
%struct.snork = type { %struct.spam }
%struct.spam = type { i32, i32 }

@global = external local_unnamed_addr global %struct.widget, align 4
@global.1 = external local_unnamed_addr global i8, align 1
@global.2 = external local_unnamed_addr global i32, align 4

define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 {
bb:
  %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1
  %tmp1 = bitcast %struct.snork* %tmp to i64*
  %tmp2 = load i64, i64* %tmp1, align 4
  %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1
  %tmp4 = icmp ugt i64 %tmp2, 4294967295
  br label %bb5

bb5:                                              ; preds = %bb14, %bb
  %tmp6 = load i32, i32* %tmp3, align 4
  %tmp7 = icmp ne i32 %tmp6, 0
  %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false
  %tmp9 = zext i1 %tmp8 to i8
  store i8 %tmp9, i8* @global.1, align 1
  %tmp10 = load i32, i32* @global.2, align 4
  switch i32 %tmp10, label %bb11 [
    i32 1, label %bb12
    i32 2, label %bb12
  ]

bb11:                                             ; preds = %bb5
  br label %bb14

bb12:                                             ; preds = %bb5, %bb5
  %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4
  br label %bb14

bb14:                                             ; preds = %bb12, %bb11
  br label %bb5
}
$ opt -O2 repro.ll -disable-output
opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value *llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst *, unsigned int, llvm::Type *, llvm::Instruction *, const llvm::DataLayout &): Assertion `SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output
...
```
2021-05-11 16:08:53 -07:00
Congzhe Cao 40e3aa39bd [LoopInterchange] Fix legality for triangular loops
This is a bug fix in legality check.

When we encounter triangular loops such as the following form:
    for (int i = 0; i < m; i++)
      for (int j = 0; j < i; j++), or

    for (int i = 0; i < m; i++)
      for (int j = 0; j*i < n; j++),

we should not perform interchange since the number of executions
of the loop body will be different before and after interchange,
resulting in incorrect results.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D101305
2021-05-11 18:36:53 -04:00
Congzhe Cao d3f89d4d16 Revert "[LoopInterchange] Fix legality for triangular loops"
This reverts commit 29342291d2.

The test case requires an assert build. Will add REQUIRES and re-commit.
2021-05-11 18:10:58 -04:00
Congzhe Cao 29342291d2 [LoopInterchange] Fix legality for triangular loops
This is a bug fix in legality check.

When we encounter triangular loops such as the following form:
    for (int i = 0; i < m; i++)
      for (int j = 0; j < i; j++), or

    for (int i = 0; i < m; i++)
      for (int j = 0; j*i < n; j++),

we should not perform interchange since the number of executions of the loop body
will be different before and after interchange, resulting in incorrect results.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D101305
2021-05-11 11:00:46 -04:00
Roman Lebedev 1acd9a1a29
Revert "[LICM] Hoist loads with invariant.group metadata"
This appears to miscompile google benchmark's GetCacheSizesFromKVFS()
when compiling with -fstrict-vtable-pointers.
Runnable reproducer: https://godbolt.org/z/f9ovKqTzb
The "f.fail()" crashes with BUS error, it is compiled into testb,
and the adress it is testing is non-sensical.

This reverts commit 4c89bcadf6.
2021-05-08 15:44:49 +03:00
Arthur Eubanks 34a8a437bf [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797
2021-05-07 21:51:47 -07:00
Han Zhu da1cdffbb1 [loop-idiom] Hoist loop memcpys to loop preheader
For a simple loop like:
```
struct S {
  int x;
  int y;
  char b;
};

unsigned foo(S* __restrict__ a, S* b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i];

  return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %0 = bitcast %struct.S* %arrayidx2 to i8*
  %1 = bitcast %struct.S* %arrayidx to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.

With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %a1 = bitcast %struct.S* %a to i8*
  %b2 = bitcast %struct.S* %b to i8*
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  %0 = zext i32 %n to i64
  %1 = mul nuw nsw i64 %0, 12
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %2 = bitcast %struct.S* %arrayidx2 to i8*
  %3 = bitcast %struct.S* %arrayidx to i8*
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```

Reviewed By: zino

Differential Revision: https://reviews.llvm.org/D97667
2021-05-04 17:05:04 -07:00
Philip Reames 11326cbcdb [IndVarSimplify][NFC] Removed mayThrow from if-condition in predicateLoopExits of IndVarSimplify
Instruction has mayHaveSideEffects method that returns true if mayThrow return true because this is called internally in the first method.  As such, the call being removed is redundant.

Patch By: vdsered (Daniil Seredkin)
Differential Revision: https://reviews.llvm.org/D101685
2021-05-03 18:25:07 -07:00
Nikita Popov ffa5a402a9 [IndVars] Remove redundant loop invariance check (NFC)
This is checked again directly below this condition.
2021-05-01 17:22:00 +02:00
Jingu Kang 88b259c014 [SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch
Differential Revision: https://reviews.llvm.org/D99354
2021-04-30 15:55:56 +01:00
Evgeniy Brevnov 7861cb600c [NARY] Don't optimize min/max if there are side uses (part2)
Previous attempt to fix infinite recursion in min/max reassociation was not fully successful (D100170). Newly discovered failing case is due to not properly handled when there is a single use. It should be processed separately from 2 uses case.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D101359
2021-04-30 19:02:02 +07:00
Tres Popp efce19c3b0 Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
This reverts commit 75d6b8bb40.

The reasoning is mentioned in https://reviews.llvm.org/D97667
2021-04-28 13:16:34 +02:00