Commit Graph

10771 Commits

Author SHA1 Message Date
Nikita Popov 8d54c8a0c3 [SCEV] Fix applyLoopGuards() with range check idiom (PR51760)
Due to a typo, this replaced %x with umax(C1, umin(C2, %x + C3))
rather than umax(C1, umin(C2, %x)). This didn't make a difference
for the existing tests, because the result is only used for range
calculation, and %x will usually have an unknown starting range,
and the additional offset keeps it unknown. However, if %x already
has a known range, we may compute a result range that is too
small.
2021-09-06 22:22:41 +02:00
Andrew Litteken bd4b1b5f6d [IRSim] Adding support for recognizing branch similarity
The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them.

This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether if the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar.

We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter.

Reviewers: paquette, yroux

Differential Revision: https://reviews.llvm.org/D106989
2021-09-06 11:55:38 -07:00
Sander de Smalen 96f6785bc9 [VectorUtils] Teach findScalarElement to return splat value.
If the vector is a splat of some scalar value, findScalarElement()
can simply return the scalar value if it knows the requested lane
is in the vector.

This is only needed for scalable vectors, because the InsertElement/ShuffleVector
case is already handled explicitly for the fixed-width case.

This helps to recognize an InstCombine fold like:
  extractelt(bitcast(splat(%v))) -> bitcast(%v)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D107254
2021-09-06 10:56:06 +01:00
Michael Kruse 650bbc5620 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.

Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-04 19:18:58 -05:00
Chen Zheng 34badc409c Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount."
This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and
is not a right patch according to comments in D91724

This reverts commit 42eaf4fe0a.
2021-09-03 02:55:43 +00:00
Arthur Eubanks 813a7f1ad7 [MemorySSA] Properly handle liveOnEntry in the walker printer
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109177
2021-09-02 12:51:27 -07:00
Wenlei He f7fff46acc [CSSPGO] Allow inlining recursive call for preinliner
When preinliner is used for CSSPGO, we try to honor global preinliner decision as much as we can except for uninlinable callees. We rely on InlineCost::Never to prevent us from illegal inlining.

However, it turns out that we use InlineCost::Never for both illeagle inlining and some of the "not-so-beneficial" inlining.

The most common one is recursive inlining, while it can bloat size a lot during CGSCC bottom-up inlining, it's less of a problem when recursive inlining is guided by profile and done in top-down manner.

Ideally it'd be better to have a clear separation between inline legality check vs cost-benefit check, but that requires a bigger change.

This change enables InlineCost computation to allow inlining recursive calls, controlled by InlineParams. In SampleLoader, we now enable recursive inlining for CSSPGO when global preinliner decision is used.

With this change, we saw a few perf improvements on SPEC2017 with CSSPGO and preinliner on: 2% for povray_r, 6% for xalancbmk_s, 3% omnetpp_s, while size is about the same (no noticeable perf change for all other benchmarks)

Differential Revision: https://reviews.llvm.org/D109104
2021-09-02 11:24:27 -07:00
Daniil Suchkov 5c97507e2b [InlineCost] Introduce attributes to override InlineCost for inliner testing
This patch introduces four new string attributes: function-inline-cost,
function-inline-threshold, call-inline-cost and call-threshold-bonus.
These attributes allow you to selectively override some aspects of
InlineCost analysis. That would allow us to test inliner separately from
the InlineCost analysis.

That could be useful when you're trying to write tests for inliner and
you need to test some very specific situation, like "the inline cost has
to be this high", or "the threshold has to be this low". Right now every
time someone does that, they have get creative to come up with a way to
make the InlineCost give them the number they need (like adding ~30
load/add pairs for a trivial test). This process can be somewhat tedious
which can discourage some people from writing enough tests for their
changes. Also, that results in tests that are fragile and can be easily
broken without anyone noticing it because the test writer can't
explicitly control what input the inliner will get from the inline cost
analysis.

These new attributes will alleviate those problems to an extent.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D109033
2021-09-02 17:35:06 +00:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Roman Lebedev 50634deaa5
Revert "[OpenMP][OpenMPIRBuilder] Implement loop unrolling."
Breaks build with -DBUILD_SHARED_LIBS=ON
```
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "LLVMFrontendOpenMP" of type SHARED_LIBRARY
    depends on "LLVMPasses" (weak)
  "LLVMipo" of type SHARED_LIBRARY
    depends on "LLVMFrontendOpenMP" (weak)
  "LLVMCoroutines" of type SHARED_LIBRARY
    depends on "LLVMipo" (weak)
  "LLVMPasses" of type SHARED_LIBRARY
    depends on "LLVMCoroutines" (weak)
    depends on "LLVMipo" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed.  Build files cannot be regenerated correctly.
```

This reverts commit 707ce34b06.
2021-09-02 12:42:23 +03:00
Michael Kruse 707ce34b06 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-02 02:37:25 -05:00
Arthur Eubanks 7b08d9da55 Reland [MemorySSA] Add pass to print results of MemorySSA walker
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109028
2021-09-01 18:58:57 -07:00
Arthur Eubanks 0f63496ea4 Revert "[MemorySSA] Add pass to print results of MemorySSA walker"
This reverts commit 8f98477c2d.

Breaks bots
2021-09-01 18:45:19 -07:00
Arthur Eubanks 8f98477c2d [MemorySSA] Add pass to print results of MemorySSA walker
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109028
2021-09-01 18:29:15 -07:00
Philip Reames bb0fa3ea02 Revert "snapshot - do not push"
This reverts commit 91f4655d92.

This wasn't intented to be pushed, sorry.
2021-09-01 16:59:23 -07:00
Philip Reames 91f4655d92 snapshot - do not push 2021-09-01 16:59:01 -07:00
Alina Sbirlea a10409fe23 [MemorySSAUpdater] Simplify updates when only deleting edges.
When performing only edge deletion, we don't need to do the DT updates
back and forth. Check for the existance of insert updates to simplify
this.
2021-09-01 15:48:20 -07:00
Philip Reames 73b951a7f7 [SCEV] Clarify requirements for zero-stride to be UB
There's a silent bug in our reasoning about zero strides. We assume that having a single static exit implies that if that exit is not taken, then the loop must be infinite. This ignores the potential for abnormal exits via exceptions. Consider the following example:

for (uint_8 i = 0; i < 1; i += 0) {
  throw_on_thousandth_call();
}

Our reasoning is such that we'd conclude this loop can't take the backedge as that would lead to a (presumed) infinite loop.

In practice, this is a silent bug because the loopIsFiniteByAssumption returns false strictly more often than the loopHaNoAbnormalExits property. We could reasonable want to change that in the future, so fixing the codeflow now is worthwhile.

Differential Revision: https://reviews.llvm.org/D109029
2021-09-01 14:01:13 -07:00
Nikita Popov 02f74eadbe [IVDescriptors] Make pointer inductions compatible with opaque pointers
Store the used element type in the InductionDescriptor. For typed
pointers, it remains the pointer element type. For opaque pointers,
we always use an i8 element type, such that the step is a simple
offset.

A previous version of this patch instead tried to guess the element
type from an induction GEP, but this is not reliable, as the GEP
may be hidden (see @both in iv_outside_user.ll).

Differential Revision: https://reviews.llvm.org/D104795
2021-09-01 21:02:05 +02:00
Philip Reames 29fa37ec9f [SCEV] If max BTC is zero, then so is the exact BTC [2 of 2]
This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remain cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch.

The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because the happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll

Differential Revision: https://reviews.llvm.org/D109015
2021-09-01 11:51:48 -07:00
Philip Reames 6600e1759b [SCEV] If max BTC is zero, then so is the exact BTC [1 of N]
This patch is specifically the howManyLessThan case.  There will be a couple of followon patches for other codepaths.

The subtle bit is explaining why the two codepaths have a difference while both are correct. The test case with modifications is a good example, so let's discuss in terms of it.
* The previous exact bounds for this example of (-126 + (126 smax %n))<nsw> can evaluate to either 0 or 1. Both are "correct" results, but only one of them results in a well defined loop. If %n were 127 (the only possible value producing a trip count of 1), then the loop must execute undefined behavior. As a result, we can ignore the TC computed when %n is 127. All other values produce 0.
* The max taken count computation uses the limit (i.e. the maximum value END can be without resulting in UB) to restrict the bound computation. As a result, it returns 0 which is also correct.

WARNING: The logic above only holds for a single exit loop. The current logic for max trip count would be incorrect for multiple exit loops, except that we never call computeMaxBECountForLT except when we can prove either a) no overflow occurs in this IV before exit, or b) this is the sole exit.

An alternate approach here would be to add the limit logic to the symbolic path. I haven't played with this extensively, but I'm hesitant because a) the term is optional and b) I'm not sure it'll reliably simplify away. As such, the resulting code quality from expansion might actually get worse.

This was noticed while trying to figure out why D108848 wasn't NFC, but is otherwise standalone.

Differential Revision: https://reviews.llvm.org/D108921
2021-08-31 08:50:11 -07:00
Kuba Mracek 4c066bd08b [GlobalDCE] Handle relative pointers in VFE (for Swift vtables)
To support Virtual Function Elimination to Swift, this PR adds support for Swift
vtables which contain "relative pointers" instead of direct pointer references.
These are in the form of:

@symbol = ... {
  i32 trunc (i64 sub (i64 ptrtoint (<type> @target to i64), i64 ptrtoint (... @symbol to i64)) to i32)
}

The PR extends GlobalDCE's way of looking up a vtable offset into a dependency
to be able to see through this expression and find the target symbol.

Differential Revision: https://reviews.llvm.org/D107645
2021-08-31 07:07:22 -07:00
Andrew Litteken cf56b08d15 [IRSim] Adding missing comments canonical relation commit
Adding missing comments to IRSimilarityIdentifier.cpp since
they were not properly added in commit 063af63b96.
2021-08-30 08:41:05 -07:00
Nikita Popov 9f7873784d [SCEVExpander] Reuse removePointerBase() for canonical addrecs
ExposePointerBase() in SCEVExpander implements basically the same
functionality as removePointerBase() in SCEV, so reuse it.

The SCEVExpander code assumes that the pointer operand on adds is
the last one -- I'm not sure that always holds. As such this might
not be strictly NFC.
2021-08-29 21:12:35 +02:00
Nikita Popov e6a5dd60ff [SCEV] Assert unique pointer base (NFC)
Add expressions can contain at most one pointer operand nowadays,
assert that in getPointerBase() and removePointerBase().
2021-08-29 20:06:24 +02:00
Kazu Hirata 0003d57434 [Analysis] Fix a "set but not used" warning 2021-08-28 06:37:01 -07:00
Andrew Litteken 063af63b96 [IRSim][IROutliner] Canonicalizing commutative value numbering between similarity sections.
When the initial relationship between two pairs of values between
similar sections is ambiguous to commutativity, arguments to the
outlined functions can be passed in such that the order is incorrect,
causing miscompilations.  This adds a canonical mapping to each
similarity section, so that we can maintain the relationship of global
value numbering from one section to another.

Added Tests:
Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll
unittests/Analysis/IRSimilarityIdentifierTest.cpp - IRSimilarityCandidate:CanonicalNumbering

Reviewers: jroelofs, jpaquette, yroux

Differential Revision: https://reviews.llvm.org/D104143
2021-08-27 15:02:56 -07:00
Philip Reames ec8d87e9f5 [SCEV] Infer nuw from nw for addrecs
This was previously committed in 914836b, and reverted due to confusion on the status of the review.

Differential Revision: https://reviews.llvm.org/D108601
2021-08-24 14:24:05 -07:00
Sanjay Patel 204038d52e [InstSimplify] fold or+shifted -1 to -1
These are similar to the rotate pattern added with:
dcf659e821
...but we don't have guard ops on the shift amount,
so we don't canonicalize to the intrinsic.

  declare void @llvm.assume(i1)

  define i32 @src(i32 %shamt, i32 %bitwidth) {
    ; subtract must be in range of bitwidth
    %lt = icmp ule i32 %bitwidth, 32
    call void @llvm.assume(i1 %lt)

    %r = lshr i32 -1, %shamt
    %s = sub i32 %bitwidth, %shamt
    %l = shl i32 -1, %s
    %o = or i32 %r, %l
    ret i32 %o
  }

  define i32 @tgt(i32 %shamt, i32 %bitwidth) {
    ret i32 -1
  }

https://alive2.llvm.org/ce/z/aF7WHx
2021-08-24 15:38:38 -04:00
Philip Reames 58582bae63 Revert "[SCEV] Infer nsw/nuw from nw for addrecs"
This reverts commit 914836b1c8.  Further comments on review came up after initial approval.  Reverting while addressing.
2021-08-24 09:28:37 -07:00
Philip Reames 914836b1c8 [SCEV] Infer nsw/nuw from nw for addrecs
If we no an addrec doesn't self-wrap, the increment is strictly positive, and the start value is the smallest representable value, then we know that the corresponding wrap type can not occur.

Differential Revision: https://reviews.llvm.org/D108601
2021-08-24 08:53:21 -07:00
Philip Reames 96ef794fd0 [SCEV] Add a hasFlags utility to improve readability [NFC] 2021-08-23 17:36:52 -07:00
Mircea Trofin 1055c5e1d3 [MLGO] Make sure inliner logs when deleting callees
When using final reward (which is now the default), we were skipping
logging decisions that were leading to callee deletion. This fixes that.

Differential Revision: https://reviews.llvm.org/D108587
2021-08-23 14:54:46 -07:00
Florian Hahn d024a01511
Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts the revert ab9296f13b.

The issue causing the revert should be fixed in 9baed023b4.
2021-08-23 11:25:27 +01:00
Sanjay Patel dcf659e821 [InstSimplify] fold rotate of -1 to -1
This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575

https://alive2.llvm.org/ce/z/GpkFCt
2021-08-22 09:15:48 -04:00
Sanjay Patel d41e308f10 [InstSimplify] fold rotate of zero to zero
This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575

https://alive2.llvm.org/ce/z/fjKwqv
2021-08-22 09:15:48 -04:00
Mircea Trofin 8dc3fe0cd1 [NFC][MLGO] Use std::move when moving protobufs
Because of an odd linking problem, we need to temporarily support
building with TF C API 1.15 + tensorflow 2.50 pip package in
'development' mode scenarios. Protobuf Message 'Swap' is partially
implemented in the header (2.50) and relies on a symbol not found in TF
C API 1.15. std::move avoids that, at no semantic cost.
2021-08-20 13:40:35 -07:00
Florian Hahn ab9296f13b
Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts commit f4122398e7 to
investigate a crash exposed by it.

The patch breaks building the code below with `clang -O2 --target=aarch64-linux`

     int a;
     double b, c;
     void d() {
       for (; a; a++) {
         b += c;
         c = a;
       }
     }
2021-08-20 21:24:28 +01:00
Augie Fackler e59c88294b MemoryBuiltins: trailing , on collection literal
This was probably bugging more than is reasonable, but it makes merging
changes in this file slightly less annoying to have the trailing comma
here. I only noticed this because Rust is currently carrying a patch to
this file and it kept making life a little difficult.
2021-08-19 17:59:23 +02:00
David Sherwood f4122398e7 [LoopVectorize][AArch64] Enable ordered reductions by default for AArch64
I have added a new TTI interface called enableOrderedReductions() that
controls whether or not ordered reductions should be enabled for a
given target. By default this returns false, whereas for AArch64 it
returns true and we rely upon the cost model to make sensible
vectorisation choices. It is still possible to override the new TTI
interface by setting the command line flag:

  -force-ordered-reductions=true|false

I have added a new RUN line to show that we use ordered reductions by
default for SVE and Neon:

  Transforms/LoopVectorize/AArch64/strict-fadd.ll
  Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

Differential Revision: https://reviews.llvm.org/D106653
2021-08-19 09:29:40 +01:00
Peter Collingbourne 6f85225ef3 StackLifetime: Remove asserts for multiple lifetime intrinsics.
According to the langref, it is valid to have multiple consecutive
lifetime start or end intrinsics on the same object.

For llvm.lifetime.start:
"If ptr [...] is a stack object that is already alive, it simply
fills all bytes of the object with poison."

For llvm.lifetime.end:
"Calling llvm.lifetime.end on an already dead alloca is no-op."

However, we currently fail an assertion in such cases. I've observed
the assertion failure when the loop vectorization pass duplicates
the intrinsic.

We can conservatively handle these intrinsics by ignoring all but
the first one, which can be implemented by removing the assertions.

Differential Revision: https://reviews.llvm.org/D108337
2021-08-18 18:45:28 -07:00
Arthur Eubanks 7557d6c896 [NFC] Cleanup calls to CallBase::getAttribute() 2021-08-18 09:39:33 -07:00
Arthur Eubanks 3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Mark Danial 4018d25da8 LoopNest Analysis expansion to return instructions that prevent a Loop
Nest from being perfect

Expand LoopNestAnalysis to return the full list of instructions that
cause a loop nest to be imperfect. This is useful for other passes to
know if they should continue for in the inner loops.
Added New function getInterveningInstructions
that returns a small vector with the instructions that prevent a loop
for being perfect. Also added a couple of helper functions to reduce
code duplication.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D107773
2021-08-17 22:25:49 +00:00
Nikita Popov 735a590471 [MemorySSA] Remove -enable-mssa-loop-dependency option
This option has been enabled by default for quite a while now.
The practical impact of removing the option is that MSSA use
cannot be disabled in default pipelines (both LPM and NPM) and
in manual LPM invocations. NPM can still choose to enable/disable
MSSA using loop vs loop-mssa.

The next step will be to require MSSA for LICM and drop the
AST-based implementation entirely.

Differential Revision: https://reviews.llvm.org/D108075
2021-08-16 20:59:37 +02:00
Paul Robinson 94b4598d77 [PS4] stp[n]cpy not available on PS4 2021-08-16 09:06:52 -07:00
Sanjay Patel ca637014f1 [Analysis][SimplifyLibCalls] improve function signature check for memcmp
This would assert/crash as shown in:
https://llvm.org/PR50850

The matching for bcmp/bcopy should probably also be updated,
but that's another patch.
2021-08-15 16:11:26 -04:00
Arthur Eubanks 92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
Arthur Eubanks a0c42ca56c [NFC] Remove AttributeList::hasParamAttribute()
It's the same as AttributeList::hasParamAttr().
2021-08-13 10:58:21 -07:00
Roman Lebedev 0dc6b597db
Revert "[SCEV] Remove premature assert. PR46786"
Since then, the SCEV pointer handling as been improved,
so the assertion should now hold.

This reverts commit b96114c1e1,
relanding the assertion from commit 141e845da5.
2021-08-13 17:50:22 +03:00