Instruction being hoisted could have nuw/nsw flags inferred from the old
context, and we cannot simply move it to the new location keeping them
because we are going to introduce new uses to them that didn't exist before.
Example in https://github.com/llvm/llvm-project/issues/57187 shows how
this can produce branch by poison from initially well-defined program.
This patch forcefully recomputes poison-generating flag in the new context.
Differential Revision: https://reviews.llvm.org/D132022
Reviewed By: fhahn, nikic
I'm planning to deprecate and eventually remove llvm::empty.
I thought about replacing llvm::empty(x) with std::empty(x), but it
turns out that all uses can be converted to x.empty(). That is, no
use requires the ability of std::empty to accept C arrays and
std::initializer_list.
Differential Revision: https://reviews.llvm.org/D133677
TargetLibraryInfo isn't optional, so we have to provide it even with the
lageacy stuff. Ideally we wouldn't need it anymore but there are still
users out there that are stuck on the legacy PM.
Differential Revision: https://reviews.llvm.org/D133685
This patch adds additional vector types to be considered when doing
promotion in SROA, based on the types of the store and load slices. This
provides more promotion opportunities, by potentially using an optimal
"intermediate" vector type.
For example, the following code would currently not be promoted to a
vector, since `__m128i` is a `<2 x i64>` vector.
```
__m128i packfoo0(int a, int b, int c, int d) {
int r[4] = {a, b, c, d};
__m128i rm;
std::memcpy(&rm, r, sizeof(rm));
return rm;
}
```
```
packfoo0(int, int, int, int):
mov dword ptr [rsp - 24], edi
mov dword ptr [rsp - 20], esi
mov dword ptr [rsp - 16], edx
mov dword ptr [rsp - 12], ecx
movaps xmm0, xmmword ptr [rsp - 24]
ret
```
By also considering the types of the elements, we could find that the
`<4 x i32>` type would be valid for promotion, hence removing the memory
accesses for this function. In other words, we can explore other new
vector types, with the same size but different element types based on
the load and store instructions from the Slices, which can provide us
more promotion opportunities.
Additionally, the step for removing duplicate elements from the
`CandidateTys` vector was not using an equality comparator, which has
been fixed.
Differential Revision: https://reviews.llvm.org/D132096
As shown in the examples in issue #57683, we allow matching
vectors with poison (undef) in this transform (and possibly more),
but we can't then use the partially defined value as a replacement
value in other expressions blindly.
This seems to be avoided in simpler examples of reassociation,
and other passes should be able to clean up the redundant op
seen in these tests.
Adding the pre-header to CSEBlocks ensures instructions are CSE'd even
after hoisting.
This was original discovered by @atrick a while ago.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D133649
If the reused scalars are clustered, i.e. each part of the reused mask
contains all elements of the original scalars exactly once, we can
reorder those clusters to improve the whole ordering of of the clustered
vectors.
Differential Revision: https://reviews.llvm.org/D133524
Revert "[Attributor] Teach AAPointerInfo to look into aggregates"
This reverts commit 844f6c5d03 and
4ed0a88cd8 as they broke the buildbots
that run openmp/libomptarget/test/offloading/bug49021.cpp.
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
The previous implementation of time tracing in NewPassManager is direct but messive.
The key codes are like the demo below:
```
/// Runs the function pass across every function in the module.
PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
LazyCallGraph &CG, CGSCCUpdateResult &UR) {
/// ...
PreservedAnalyses PassPA;
{
TimeTraceScope TimeScope(Pass.name());
PassPA = Pass.run(F, FAM);
}
/// ...
}
```
It can be bothered to judge where should we add the tracing codes by hands.
With the PassInstrumentation framework, we can easily add `Before/After` callback
functions to add time tracing codes.
Differential Revision: https://reviews.llvm.org/D131960
Fixes#57531
This transformation may be particularly useful on x86-64,
because x & (x - 1) can be performed by a single blsr instruction.
Differential Revision: https://reviews.llvm.org/D133362
In situations when a submodule is extracted from big module (i.e. using
CloneModule) a lot of debug info is copied via metadata nodes. Despite of
the fact that part of that info is not linked to any instruction in extracted
IR file, StripDeadDebugInfo pass doesn't drop them.
Strengthen criteria for debug info that should be kept in a module:
- Only those compile units are left that referenced by a subprogram debug info
node that is attached to a function definition in the module or to an instruction
in the module that belongs to an inlined function.
Signed-off-by: Mikhail Lychkov <mikhail.lychkov@intel.com>
Differential Revision: https://reviews.llvm.org/D122163
Replacing the following instances of UndefValue with PoisonValue, where the UndefValue is used as an arbitrary value:
- llvm/lib/CodeGen/WinEHPrepare.cpp
`demotePHIsOnFunclets`: RAUW arbitrary value for lingering uses of removed PHI nodes
- llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
`FoldSingleEntryPHINodes`: Removes a self-referential single entry phi node.
- llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
`finalize`: Remove all references to removed functions.
- llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
`cleanup`: the result is not used then the inserted instructions are removed.
- llvm/tools/bugpoint/CrashDebugger.cpp
`TestInts`: the program is cloned and instructions are removed to narrow down source of crash.
Differential Revision: https://reviews.llvm.org/D133640
Use VPExpandSCEVRecipe to expand the step of pointer inductions. This
cleanup addresses a corresponding FIXME.
It should be NFC, as steps for pointer induction must be constants,
which makes expansion trivial.
The LLVM performance tips suggest that allocas should be placed at the
beginning of the entry block. So far, llvm doesn’t provide any helper to
find that position.
Add BasicBlock::getFirstNonPHIOrDbgOrAlloca and IRBuilder::SetInsertPointPastAllocas(Function*)
that get an insert position after the (static) allocas at the start of a
function and use it in ShadowStackGCLowering.
Differential Revision: https://reviews.llvm.org/D132554
If there are non-load/store users of the promoted pointer, we
currently abort promotion. However, having such users isn't really
relevant to the transform. We already separately check that a)
there are no instructions that modref the promoted pointer and
b) that a pointer capture disables store promotion.
In the affected @test_captured_in_loop test case we have a readnone
capture of the promoted pointer, which means that load promotion
can be performed (while store promotion cannot).
Differential Revision: https://reviews.llvm.org/D133485
This reverts commit 053841c562.
We faced a use-after-free after pushing the D113291, since the
foldSqrt() has a call to eraseFromParent(). The function
should be at the end of the main loop that folds the patterns.
This patch fixes that.
If multiple warnings created on the same instruction (debug location)
it can be difficult to figure out which input value is the cause.
This patches chains origins just before the warning using last origins
update debug information.
To avoid inflating the binary unnecessarily, do this only when uncertainty is
high enough, 3 warnings by default. On average it adds 0.4% to the
.text size.
Reviewed By: kda, fmayer
Differential Revision: https://reviews.llvm.org/D133232
GlobalsAA is considered stateless as usually transformations do not introduce
new global accesses, and removed global access is not a problem for GlobalsAA
users.
Sanitizers introduce new global accesses:
- Msan and Dfsan tracks origins and parameters with TLS, and to store stack origins.
- Sancov uses global counters. HWAsan store tag state in TLS.
- Asan modifies globals, but I am not sure if invalidation is required.
I see no evidence that TSan needs invalidation.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D133394
(trunc (1 << Y) to iN) == 2**C --> Y == C
(trunc (1 << Y) to iN) != 2**C --> Y != C
https://alive2.llvm.org/ce/z/xnFPo5
Follow-up to d9e1f9d759. This was a suggested
enhancement mentioned in issue #51889.
This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well.
Differential Revision: https://reviews.llvm.org/D132591
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead.
Differential Revision: https://reviews.llvm.org/D133429
(trunc (1 << Y) to iN) == 0 --> Y u>= N
(trunc (1 << Y) to iN) != 0 --> Y u< N
These can be generalized in several ways as noted by the TODO
items, but this handles the pattern in the motivating bug report.
Fixes#51889
Differential Revision: https://reviews.llvm.org/D115480
VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a
scalar instruction is generated per-part.
This is a potential alternative D132892. For now the current patch only
catches cases where the address is trivially invariant (defined outside
VPlan), while D132892 catches any address that is considered invariant
by SCEV AFAICT.
It should be possible to hoist fully invariant recipes feeding loads out
of the vector loop region as well, but in practice LICM should do that
already.
This version of the patch artificially limits this to loads to make it
easier to compare, but this restriction should be easily liftable.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133019
Introduces the SanitizerBinaryMetadata instrumentation pass which uses
the new MD_pcsections metadata kinds to instrument certain types of
instructions and functions required for breakpoint-based sanitizers.
The first intended user of the binary metadata emitted will be a variant
of GWP-TSan [1]. GWP-TSan will require information about atomic
accesses; to unambiguously determine if an access is atomic or not, we
also require "covered" information which code has been compiled with
SanitizerBinaryMetadata instrumentation enabled.
[1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D130887
The original commit ( fe1f3cfc26 ) was reverted because it could
crash / assert when trying to fold a value that was replaced
by a constant. In that case, there might not be an entry for the
constant in the solver yet.
This version adds a check for that possibility along with tests to
exercise that pattern (they used to crash).
Original commit message:
This extends the transform added with D81756 to handle div/rem opcodes.
For example:
https://alive2.llvm.org/ce/z/cX6za6
This replicates part of what CVP already does, but the motivating example
from issue #57472 demonstrates a phase ordering problem - we convert
branches to select before CVP runs and miss the transform.
Differential Revision: https://reviews.llvm.org/D133198
This transform came up as a potential DAGCombine in D133282,
so I wanted to see how it escaped in IR too.
We do general folds in InstCombiner::SimplifySelectsFeedingBinaryOp()
by checking if either arm of a select simplifies when the trailing
binop is threaded into the select.
So as long as one side simplifies, it's a good fold to combine a
negate and add into 1 subtract.
This is an example with a zero arm in the select:
https://alive2.llvm.org/ce/z/Hgu_Tj
And this models the tests with a cancelling 'not' op:
https://alive2.llvm.org/ce/z/BuzVV_
Differential Revision: https://reviews.llvm.org/D133369
Currently, instructions in the preheader of the second of two fusion
candidates are sunk and hoisted whenever possible, to try to allow the
loops to fuse. Memory instructions are skipped, and are never sunk or
hoisted. This change adds memory instructions for sinking/hoisting
consideration.
This change uses DependenceAnalysis to check if a mem inst in the
preheader of FC1 depends on an instruction in FC0's header, across
which it will be hoisted, or FC1's header, across which it will be
sunk. We reject cases where the dependency is a data hazard.
Differential Revision: https://reviews.llvm.org/D131606