Created to fix: https://github.com/llvm/llvm-project/issues/53537
Some intrinsics functions are considered commutative since they are performing operations like addition or multiplication. Some of these have extra parameters to provide extra information that are not part of the operation itself and are not commutative. This makes sure that if an instruction that is an intrinsic takes the non commutative path to handle this case.
Reviewer: paquette
Closes Issue #53537
Differential Revision: https://reviews.llvm.org/D118807
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.
This also adds an additional command line flag debug option to disable outlining intrinsics.
Recommit of: 8de76bd569
Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109450
We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow for non entry blocks within the extracted region to have predecessors, so there are not conflicts to handle with respect to predecessors no longer contained in the function.
Recommit of 515eec3553
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D106997
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.
This also adds an additional command line flag debug option to disable outlining intrinsics.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109450
The outliner currently requires that function calls not be indirect calls, and have that the function name, and function type must match, as well as other attributes such as calling conventions. This patch treats called functions as values, and just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or input and output into the outlined functions.
There are also debugging flags added to enforce the old behaviors where indirect calls not be allowed, and to enforce the old rule that function calls names must also match.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109448
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.
This patch does just that for llvm.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D117121
getLoopIndex() is added to get the loop index of a given loop.
getLoopsAtDepth() is added to get the loops in the nest at a given
depth.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D115590
This happens in e.g. regalloc, where we trace decisions per function,
but wouldn't want to spew N log files (i.e. one per function). So we
output a key-value association, where the key is an ID for the
sub-module object, and the value is the tensorflow::SequenceExample.
The current relation with protobuf is tenuous, so we're avoiding a
custom message type in favor of using the `Struct` message, but that
requires the values be wire-able strings, hence base64 encoding.
We plan on resolving the protobuf situation shortly, and improve the
encoding of such logs, but this is sufficient for now for setting up
regalloc training.
Differential Revision: https://reviews.llvm.org/D116985
We currently have two similar implementations of this concept:
isNoAliasCall() only checks for the noalias return attribute.
isNoAliasFn() also checks for allocation functions.
We should switch to only checking the attribute. SLC is responsible
for inferring the noalias return attribute for non-new allocation
functions (with a missing case fixed in
348bc76e35).
For new, clang is responsible for setting the attribute,
if -fno-assume-sane-operator-new is not passed.
Differential Revision: https://reviews.llvm.org/D116800
We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the other.
This patch is modeled on a similar construct that was added with
D59386.
I don't think it is possible to expose a problem with an unsigned
compare because of the way this was coded (nuw is handled first).
InstCombine has an assert that fires with the example from:
https://github.com/llvm/llvm-project/issues/52884
...because it was expecting InstSimplify to handle this kind of
pattern with an smax.
Fixes#52884
Differential Revision: https://reviews.llvm.org/D116322
Preserve the invariant that offset reported in the case of a
`PartialAlias` between `Loc1` and `Loc2`, is such that
`Loc1 + Offset = Loc2`, where `Loc1` and `Loc2` are the first and
the second argument, respectively, in alias queries.
Differential Revision: https://reviews.llvm.org/D115927
This prepares it for the regalloc work. Part of it is making model
evaluation accross 'development' and 'release' scenarios more reusable.
This patch:
- extends support to tensors of any shape (not just scalars, like we had
in the inliner -Oz case). While the tensor shape can be anything, we
assume row-major layout and expose the tensor as a buffer.
- exposes the NoInferenceModelRunner, which we use in the 'development'
mode to keep the evaluation code path consistent and simplify logging,
as we'll want to reuse it in the regalloc case.
Differential Revision: https://reviews.llvm.org/D115306
The way function gets the induction variable is by judging whether
StepInst or IndVar in the phi statement is one of the operands of CMP.
But if the LatchCmpOp0/LatchCmpOp1 is a constant, the subsequent
comparison may result in null == null, which is meaningless. This patch
fixes the typo.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D112980
- CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC.
- This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g.,
```
foo(float *p) {
__builtin_assume(__isGlobal(p));
// From there, we could assume p is a global pointer instead of a
// generic one.
}
```
This makes the code portable without introducing the implementation-specific features.
Note that NVCC starts to support __builtin_assume from version 11.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D112041
Data references in a loop should not access elements over the
statically allocated size. So we can infer a loop max trip count
from this undefined behavior.
Reviewed By: reames, mkazantsev, nikic
Differential Revision: https://reviews.llvm.org/D109821
blockaddresses do not participate in the call graph since the only
instructions that use them must all return to someplace within the
current function. And passes cannot retrieve a function address from a
blockaddress.
This was suggested by efriedma in D58260.
Fixes PR50881.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112178
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
We can use the raw_string_ostream::str() method to perform the implicit flush() and return a reference to the std::string container that we can then wrap inside Twine().
When using a datalayout that has pointer index width != pointer size this
code triggers an assertion in Value::stripAndAccumulateConstantOffsets().
I encountered this this while compiling FreeBSD for CHERI-RISC-V.
Also update LoadsTest.cpp to use a DataLayout with index width != pointer
width to ensure this case is tested.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D110406
This is a followup to D109844 (and alternative to D109907), which
integrates the new "earliest escape" tracking into AliasAnalysis.
This is done by replacing the pre-existing context-free capture
cache in AAQueryInfo with a replaceable (virtual) object with two
implementations: The SimpleCaptureInfo implements the previous
behavior (check whether object is captured at all), while
EarliestEscapeInfo implements the new behavior from DSE.
This combines the "earliest escape" analysis with the full power of
BasicAA: It subsumes the call handling from D109907, considers a
wider range of escape sources, and works with AA recursion. The
compile-time cost is slightly higher than with D109907.
Differential Revision: https://reviews.llvm.org/D110368
There are several places in the code that are currently broken where
we assume an Instruction is always a member of a BasicBlock that
lives in a Function. This is a problem specifically when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction's parent also has a parent!
I've added a test for a function-less @llvm.vscale intrinsic call here:
unittests/Analysis/ValueTrackingTest.cpp
There are several places in the code that are currently broken as
they assume an Instruction always has a parent Function when
attempting to get the vscale_range attribute. This patch adds checks
that an Instruction has a parent.
I've added a test for a parentless @llvm.vscale intrinsic call here:
unittests/Analysis/ValueTrackingTest.cpp
Differential Revision: https://reviews.llvm.org/D110158
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.
The use VectorCombine is updated to pass through the DT to enable
additional scalarization.
Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110175
The implementation is mostly copied from MemDepAnalysis. We want to look
at all loads and stores to the same pointer operand. Bitcasts and zero
GEPs of a pointer are considered the same pointer value. We choose the
most dominating instruction.
Since updating MemorySSA with invariant.group is non-trivial, for now
handling of invariant.group is not cached in any way, so it's part of
the walker. The number of loads/stores with invariant.group is small for
now anyway. We can revisit if this actually noticeably affects compile
times.
To avoid invariant.group affecting optimized uses, we need to have
optimizeUsesInBlock() not use invariant.group in any way.
Co-authored-by: Piotr Padlewski <prazek@google.com>
Reviewed By: asbirlea, nikic, Prazek
Differential Revision: https://reviews.llvm.org/D109134
The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them.
This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether if the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar.
We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter.
Reviewers: paquette, yroux
Differential Revision: https://reviews.llvm.org/D106989
When the initial relationship between two pairs of values between
similar sections is ambiguous to commutativity, arguments to the
outlined functions can be passed in such that the order is incorrect,
causing miscompilations. This adds a canonical mapping to each
similarity section, so that we can maintain the relationship of global
value numbering from one section to another.
Added Tests:
Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll
unittests/Analysis/IRSimilarityIdentifierTest.cpp - IRSimilarityCandidate:CanonicalNumbering
Reviewers: jroelofs, jpaquette, yroux
Differential Revision: https://reviews.llvm.org/D104143
Nest from being perfect
Expand LoopNestAnalysis to return the full list of instructions that
cause a loop nest to be imperfect. This is useful for other passes to
know if they should continue for in the inner loops.
Added New function getInterveningInstructions
that returns a small vector with the instructions that prevent a loop
for being perfect. Also added a couple of helper functions to reduce
code duplication.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D107773
1) add some self-diagnosis (when asserts are enabled) to check that all
features have the same nr of entries
2) avoid storing pointers to mutable fields because the proto API
contract doesn't actually guarantee those stay fixed even if no further
mutation of the object occurs.
Differential Revision: https://reviews.llvm.org/D107594
ValueTracking should allow for value ranges that may satisfy
llvm.assume, instead of restricting the ranges only to values that
will always satisfy the condition.
Differential Revision: https://reviews.llvm.org/D107298
Avoid buffering just to copy the buffered data, in 'development
mode', when logging. Instead, just populate the underlying protobuf.
Differential Revision: https://reviews.llvm.org/D106592
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106496
It turns out that during training, the time required to parse the
textual protobuf of a training log is about the same as the time it
takes to compile the module generating that log. Using binary protobufs
instead elides that cost almost completely.
Differential Revision: https://reviews.llvm.org/D106157
The ceiling variant was recently added (due to the work towards D105216), and we're spending a lot of time trying to find optimizations for the expression. This patch brute forces the space of i8 unsigned divides and checks that we get a correct (well consistent with APInt) result for both udiv and udiv ceiling.
(This is basically what I've been doing locally in a hand rolled C++ program, and I realized there no good reason not to check it in as a unit test which directly exercises the logic on constants.)
Differential Revision: https://reviews.llvm.org/D106083
Rules:
1. SCEVUnknown is a pointer if and only if the LLVM IR value is a
pointer.
2. SCEVPtrToInt is never a pointer.
3. If any other SCEV expression has no pointer operands, the result is
an integer.
4. If a SCEVAddExpr has exactly one pointer operand, the result is a
pointer.
5. If a SCEVAddRecExpr's first operand is a pointer, and it has no other
pointer operands, the result is a pointer.
6. If every operand of a SCEVMinMaxExpr is a pointer, the result is a
pointer.
7. Otherwise, the SCEV expression is invalid.
I'm not sure how useful rule 6 is in practice. If we exclude it, we can
guarantee that ScalarEvolution::getPointerBase always returns a
SCEVUnknown, which might be a helpful property. Anyway, I'll leave that
for a followup.
This is basically mop-up at this point; all the changes with significant
functional effects have landed. Some of the remaining changes could be
split off, but I don't see much point.
Differential Revision: https://reviews.llvm.org/D105510
This change yields an additional 2% size reduction on an internal search
binary, and an additional 0.5% size reduction on fuchsia.
Differential Revision: https://reviews.llvm.org/D104751
Summary:
The changes to globalization introduced in D97680 created two new functions to
push / pop shareably memory on the GPU, __kmpc_alloc_shared and
__kmpc_free_shared. This patch adds these new runtime functions to the
library info so they can be used by the HeapToStack attributor interface. This
optimization replaces malloc / free pairs with stack memory if legal.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D102087
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.
Differential Revision: https://reviews.llvm.org/D99439
I'm not sure what behavior we want if the FP environment is
not default (also not sure if there's a way to enumerate
the full list of intrinsics programmatically), but currently
these are all defaulting to 'false' (doesn't propagate).
Currently, NoWrapFlags are dropped if we inline operands of SCEVAddExpr
operands. As a consequence, we always drop flags when building
expressions like `getAddExpr(A, getAddExpr(B, C, NUW), NUW)`.
We should be able to retain NUW flags common among all inlined
SCEVAddExpr and the original flags.
Reviewed By: nikic, mkazantsev
Differential Revision: https://reviews.llvm.org/D103877