The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
This reduces peak memory overhead by 15% when building CTMark's tramp3d-v4 with
-O2 -g with assignment tracking enabled.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133321
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
The inliner requires two additions:
fixupAssignments - Update inlined instructions' DIAssignID metadata so that
inlined DIAssignID attachments are unique to the inlined instance.
trackInlinedStores - Treat inlined stores to caller-local variables
(i.e. callee stores to argument pointers that point to the caller's allocas) as
assignments. Track them using trackAssignments, which is the same method as is
used by the AssignmentTrackingPass. This means that we're able to detect stale
memory locations due to DSE after inlining. Because the stores are only tracked
_after_ inlining, any DSE or movement of stores _before_ inlining will not be
accounted for. This is an accepted limitation mentioned in the RFC.
One change is also required:
Update CloneBlock to preserve debug use-before-defs. Otherwise the assignments
will be dropped due to having the intrinsic operands replaced with empty
metadata (see use-before-def.ll in this patch and this related discourse post.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133318
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Update simplifycfg:
sinkLastInstruction - preserve debug use-before-defs.
SpeculativelyExecuteBB - replace the value component of dbg.assign intrinsics
when stores are hoisted and merged using a select, and don't delete them.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133310
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Most of the updates here are just to ensure DIAssignID attachments are
maintained and propagated correctly.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133307
llvm.coro.begin
Previously we've taken care of the writes to allocas prior to llvm.coro.begin.
However, since the promise alloca is special so that we never handled it
before. For the long time, since the programmers can't access the
promise_type due to the c++ language specification, we still failed to
recognize the problem until a recent report:
https://github.com/llvm/llvm-project/issues/57861
And we've tested many codes that the problem gone away after we handle
the writes to the promise alloca prior to @llvm.coro.begin()
prope until a recent report:
https://github.com/llvm/llvm-project/issues/57861
And we've tested many codes that the problem gone away after we handle
the writes to the promise alloca prior to @llvm.coro.begin() properly.
Closes https://github.com/llvm/llvm-project/issues/57861
Need to count the reduced values, vectorized in the tree but not in the top node. Such scalars still must be extracted out of the vector node instead of the original scalar.
This reverts commit eb2a57ebc7.
llvm/include/llvm/Transforms/Instrumentation/KCFI.h including
llvm/CodeGen is a layering violation. We should use an approach where
Instrumementation/ doesn't need to include CodeGen/.
Sorry for not spotting this in the review.
The KCFI sanitizer emits "kcfi" operand bundles to indirect
call instructions, which the LLVM back-end lowers into an
architecture-specific type check with a known machine instruction
sequence. Currently, KCFI operand bundle lowering is supported only
on 64-bit X86 and AArch64 architectures.
As a lightweight forward-edge CFI implementation that doesn't
require LTO is also useful for non-Linux low-level targets on
other machine architectures, add a generic KCFI operand bundle
lowering pass that's only used when back-end lowering support is not
available and allows -fsanitize=kcfi to be enabled in Clang on all
architectures.
Reviewed By: nickdesaulniers, MaskRay
Differential Revision: https://reviews.llvm.org/D135411
The current code of validDepInterchange() enumerates cases that are
legal for interchange. This could be simplified by checking
lexicographically order of the swapped direction matrix.
Reviewed By: congzhe, Meinersbur, bmahjour
Differential Revision: https://reviews.llvm.org/D137461
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.
Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.
Differential Revision: https://reviews.llvm.org/D124217
Currently, JT creates and updates local instances of BPI\BFI. As a result global ones have to be invalidated if JT made any changes.
In fact, JT doesn't use any information from BPI/BFI for the sake of the transformation itself. It only creates BPI/BFI to keep them up to date. But since it updates local copies (besides cases when it updates profile metadata) it just waste of time.
Current patch is a rework of D124439. D124439 makes one step and replaces local copies with global ones retrieved through AnalysisPassManager. Here we do one more step and don't create BPI/BFI if the only reason of creation is to keep BPI/BFI up to date. Overall logic is the following. If there is cached BPI/BFI then update it along the transformations. If there is no existing BPI/BFI, then create it only if it is required to update profile metadata.
Please note if BPI/BFI exists on exit from JT (either cached or created) it is always up to date and no reason to invalidate it.
Differential Revision: https://reviews.llvm.org/D136827
For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)
This fixes two problems.
(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```
(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```
If a GlobalAlias references a GlobalValue which is just changed to
available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to
cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other
available_externally functions, so it cannot easily be removed.
Depends on D137441: we use available_externally to mark a GlobalAlias in a
non-prevailing COMDAT, similar to how we handle GlobalVariable/Function.
GlobalAlias may refer to a ConstantExpr, not changing GlobalAlias to
GlobalVariable gives flexibility for future extensions (the use case is niche.
For simplicity we don't handle it yet). In addition, available_externally
GlobalAlias is the most straightforward implementation and retains the aliasee
information to help optimizers.
See windows-vftable.ll: Windows vftable uses an alias pointing to a
private constant where the alias is the COMDAT leader. The COMDAT use case
is skeptical and ThinLTO does not discard the alias in the non-prevailing COMDAT.
This patch retains the behavior.
See new tests ctor-dtor-alias2.ll: depending on whether the complete object
destructor emitted, when ctor/dtor aliases are used, we may see D0/D2 COMDATs in
one TU and D0/D1/D2 in a D5 COMDAT in another TU.
Allow such a mix-and-match with `if (GO->getComdat()->getName() == GO->getName()) NonPrevailingComdats.insert(GO->getComdat());`
GlobalAlias handling in ThinLTO is still weird, but this patch should hopefully
improve the situation for at least all cases I can think of.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
Update comment to reflect current code, which also allows for
VPScalarIVStepsRecipes to be uniform.
Suggested by @Ayal during review of D136068, thanks!
The existing code already unconditionally dereferences RepR, so
cast_or_null can be replaced by just cast.
Suggested by @Ayal during review of D136068, thanks!
The return value of getDef is guaranteed to be a VPRecipeBase and all
users can also accept a VPRecipeBase *. Most users actually case to
VPRecipeBase or a specific recipe before using it, so this change
removes a number of redundant casts.
Also rename it to getDefiningRecipe to make the name a bit clearer.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D136068
If same instruction is reduced several times, but in one graph is part
of buildvector sequence and in another it is vectorized, we may loose
information that it was part of buildvector and must be extracted from
later vectorized value.
If the graph is only the buildvector node without main operation, need
to inherit insrtpoint from the redution instruction. Otherwise the
compiler crashes trying to insert instruction at the entry block.
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Update the RemoveRedundantDbgInstrs utility to avoid sometimes losing
information when deleting dbg.assign intrinsics.
removeRedundantDbgInstrsUsingBackwardScan - treat dbg.assign intrinsics that
are not linked to any instruction just like dbg.values. That is, in a block of
contiguous debug intrinsics, delete all other than the last definition for a
fragment. Leave linked dbg.assign intrinsics in place.
removeRedundantDbgInstrsUsingForwardScan - Don't delete linked dbg.assign
intrinsics and don't delete the next intrinsic found even if it would otherwise
be eligible for deletion.
remomveUndefDbgAssignsFromEntryBlock - Delete undef and unlinked dbg.assign
intrinsics encountered in the entry block that come before non-undef
non-unlinked intrinsics for the same variable.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133294
According to report by @JojoR, the assertion error was hit hence we need
to have this check before the actual transformation.
Reviewed By: Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D136415
As part of legacy PM optimization pipeline removal.
This shouldn't be used in codegen pipelines so it should be ok to remove.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D137116
Need to use advanced check for the same vectorized node to avoid
possible compiler crash. We may have 2 similar nodes (vector one and
gather) after graph nodes rotation, need to do extra checks for the
exact match.
Instead of using the public, interposable symbol, we can use a private
alias and avoid relocations and addends.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D137982
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
The SLP-Vectorizer can merge a set of scalar stores into a single vectorized
store. Merge DIAssignID intrinsics from the scalar stores onto the new
vectorized store.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133320
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
DeadStoreElimmination shortens stores that are shadowed by later stores such
that the overlapping part of the earlier store is omitted. Insert an unlinked
dbg.assign intrinsic with a variable fragment that describes the omitted part
to signal that that fragment of the variable has a stale value in memory.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133315
This restores commit b756096b0c, which was
originally reverted in 00b09a7b18.
AAPointerInfo now maintains a list of all Access objects that it owns, along
with the following maps:
- OffsetBins: OffsetAndSize -> { Access }
- InstTupleMap: RemoteI x LocalI -> Access
A RemoteI is any instruction that accesses memory. RemoteI is different from
LocalI if and only if LocalI is a call; then RemoteI is some instruction in the
callgraph starting from LocalI.
Motivation: When AAPointerInfo recomputes the offset for an instruction, it sets
the value to Unknown if the new offset is not the same as the old offset. The
instruction must now be moved from its current bin to the bin corresponding to
the new offset. This happens for example, when:
- A PHINode has operands that result in different offsets.
- The same remote inst is reachable from the same local inst via different paths
in the callgraph:
```
A (local inst)
|
B
/ \
C1 C2
\ /
D (remote inst)
```
This fixes a bug where a store is incorrectly eliminated in a lit test.
Reviewed By: jdoerfert, ye-luo
Differential Revision: https://reviews.llvm.org/D136526
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
In an attempt to preserve more info, don't delete dbg.assign intrinsics that
are considered "out of scope" if they're linked to instructions.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133314
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Merge DIAssignID attachments on stores that are merged and sunk out of
loops. The store may be sunk into multiple exit blocks, and in this case all
the copies of the store get the same DIAssignID.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133313
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Maintain and propagate DIAssignID attachments in memcpyopt.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133312
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
mldst-motion will merge and sink the stores in if-diamond branches into the
common successor. Attach a merged DIAssignID to the merged store.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133311
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
The changes for assignment tracking in mem2reg don't require much of a
deviation from existing behaviour. dbg.assign intrinsics linked to an alloca
are treated much in the same way as dbg.declare users of an alloca, except that
we don't insert dbg.value intrinsics to describe assignments when there is
already a dbg.assign intrinsic present, e.g. one linked to a store that is
going to be removed.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133295
Move these to the new PM if they're used there.
Part of removing the legacy pass manager for optimization pipeline.
Reland with UseNewGVN usage in clang removed.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D137915
Move these to the new PM if they're used there.
Part of removing the legacy pass manager for optimization pipeline.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D137915
The LoopVersioningLICM object is only ever used for a single loop,
but there was various unnecessary code for handling the case where
it is reused across loops. Drop that code, and pass the loop to the
constructor.
Same as for async-style lowering, if there are no resume points in a
function, the coroutine frame pointer will be replaced by an undef,
making all accesses to the frame undefinde behavior.
Fix this by not adding allocas to the coroutine frame if there are no
resume points.
Differential Revision: https://reviews.llvm.org/D137866
Replace the vector of DecompEntry with a struct that stores the
constant offset separately. I think this is cleaner than giving the
first element special handling.
This probably also fixes some potential ubsan errors by more
consistently using addWithOverflow/multiplyWithOverflow.
decompose() currently returns a mix of {} and 0 + 1*V on failure.
This changes it to always return the 0 + 1*V form, thus making
decompose() infallible.
This makes the code marginally more powerful, e.g. we now fold
sub_decomp_i80 by treating the constant as a symbolic value.
Differential Revision: https://reviews.llvm.org/D137847
When IRCE runs on outer loop and sees a check of an AddRec of
inner loop, it crashes with an assert in SCEV that the AddRec
must be loop invariant.
This adds a bail out if the AddRec which is checked in icmp
is for another loop.
Fixes https://github.com/llvm/llvm-project/issues/58912.
Differential Revision: https://reviews.llvm.org/D137822
Method forgetLoop only forgets expression of phi or its users. SCEV
expressions except the above mentioned may still has loop dispositions
that point to the destroyed loop, which might cause a crash.
Fixes: https://github.com/llvm/llvm-project/issues/58865
Reviewed By: nikic, fhahn
Differential Revision: https://reviews.llvm.org/D137651