This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610.
In the computed goto pattern, there is usually a list of basic blocks that are all targets of an indirectbr instruction, and each of these basic blocks also has its address taken and stored in a variable.
The CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and cloned versions of all the basic blocks in the list.
However, these cloned basic blocks will not have their addresses taken and stored anywhere, so a later SimplifyCFG pass will simply remove them all, resulting in incorrect code.
To fix this, when searching for scopes, we skip scopes that contain BBs with addresses taken.
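A minimal IR sketch of the pattern that must not be cloned (block and value names are illustrative, not from the actual test cases):
```
@targets = global [2 x i8*] [i8* blockaddress(@f, %bb1), i8* blockaddress(@f, %bb2)]

define void @f(i8** %next) {
entry:
  %dest = load i8*, i8** %next
  indirectbr i8* %dest, [label %bb1, label %bb2]
bb1:                      ; address taken and stored in @targets
  br label %bb2
bb2:                      ; address taken and stored in @targets
  ret void
}
```
A cloned bb1/bb2 would have no blockaddress referring to it, which is what allows SimplifyCFG to delete the clones.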
Added a few test cases.
Reviewed By: aeubanks, wenlei, hoy
Differential Revision: https://reviews.llvm.org/D103867
Also:
- add driver test (fsanitize-use-after-return.c)
- add basic IR test (asan-use-after-return.cpp)
- (NFC) cleaned up the logic for generating the table of __asan_stack_malloc
depending on the flag.
for issue: https://github.com/google/sanitizers/issues/1394
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D104076
This was found by chance; it revealed a discrepancy between the comment (a few lines above),
the condition, and how the re-ordering of instructions is done inside the if statement
it guards. The condition always evaluated to true.
Differential Revision: https://reviews.llvm.org/D104064
Adds the basic instrumentation needed for stack tagging.
Currently does not support stack short granules or TLS stack histories,
since a different code path is followed for the callback instrumentation
we use.
We may simply wait to support these two features until we switch to
a custom calling convention.
Patch By: xiangzhangllvm, morehouse
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D102901
The problematic code pattern in the test is based on:
https://llvm.org/PR50638
If the IfCond is itself the phi that we are trying to remove,
then the loop around line 2835 can end up with something like:
%cmp = select i1 %cmp, i1 false, i1 true
That can then lead to a use-after-free and assert (although
I'm still not seeing that locally in my release + asserts build).
I think this can only happen with unreachable code.
Differential Revision: https://reviews.llvm.org/D104063
<string> is currently the highest impact header in a clang+llvm build:
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.
This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.
Differential Revision: https://reviews.llvm.org/D103888
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).
Differential Revision: https://reviews.llvm.org/D104029
This adds a function specialization pass to LLVM. Constant parameters
like function pointers and constant globals are propagated to the callee by
specializing the function.
This is a first version with a number of limitations:
- The pass is off by default, so needs to be enabled on the command line,
- It does not handle specialization of recursive functions,
- It does not yet handle constants and constant ranges,
- Only 1 argument per function is specialised,
- The cost-model could be further looked into, and perhaps related,
- We are not yet caching analysis results.
This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan.
More recently this was also discussed on the list, see:
https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html.
The motivation for this work is that function specialisation often comes up as
a reason for performance differences of generated code between LLVM and GCC,
which has this enabled by default from optimisation level -O3 and up. And while
this certainly helps a few cpu benchmark cases, this also triggers in real
world codes and is thus a generally useful transformation to have in LLVM.
Function specialisation has great potential to increase compile-times and
code-size. The summary from some investigations with this patch is:
- Compile-time increases for short compile jobs are relatively high, but the
increase in absolute numbers is still low.
- For longer compile-jobs, the extra compile time is around 1%, and very much
in line with GCC.
- It is difficult to blame one thing for compile-time increases: it looks like
everywhere a little bit more time is spent processing more functions and
instructions.
- But the function specialisation pass itself is not very expensive; it doesn't
show up very high in the profile of the optimisation passes.
The goal of this work is to reach parity with GCC which means that eventually
we would like to get this enabled by default. But first we would like to address
some of the limitations before that.
Differential Revision: https://reviews.llvm.org/D93838
This fixes the concern in single-element store scalarization that the
alignment of the new store may be larger than it should be. It calculates
the largest alignment if the index is constant, and a safe one if not.
Reviewed By: lebedev.ri, spatel
Differential Revision: https://reviews.llvm.org/D103419
First we refactor the code that matches no-wrap add sequences:
we need to allow different operand orders for
the key add instructions involved in the match.
Then we use the refactored code to try 4 variants of matching operands.
Originally the code relied on the fact that the matching operands
of the two last add instructions of memory index calculations
had the same LHS argument. But which operand is the same
in the two instructions is actually not essential, so now we allow
that to be any of LHS or RHS of each of the two instructions.
This increases the chances of vectorization to happen.
Reviewed By: volkan
Differential Revision: https://reviews.llvm.org/D103912
SROA sometimes preserves MD_mem_parallel_loop_access and MD_access_group metadata on loads/stores, and sometimes fails to do so. This change adds copying of the MD after other CreateAlignedLoad/CreateAlignedStores. Also fix a case where the metadata was being copied from a load, rather than the store.
Added a LIT test to catch one case.
Patch by Mark Mendell
Differential Revision: https://reviews.llvm.org/D103254
As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety when a loop is annotated parallel (`!llvm.loop.parallel_accesses`) is not what one would expect and can lead to invalid reads; the documentation for this behavior has since been removed from the LangRef again.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.
The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous.
Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D103907
Essentially, the cover function simply combines the loop level check and the function level scope into one call. This simplifies several callers and is (subjectively) less error prone.
There is no need to schedule insertelement instructions. The compiler
did not schedule them before it started supporting their vectorization, and
it should not do so after. We pre-schedule them manually when finding
a build vector sequence.
Disabling scheduling of insertelement instructions improves compile
time and vectorization of the very large basic blocks by saving
scheduling budget for other instructions.
Differential Revision: https://reviews.llvm.org/D104026
```
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:19: warning: loop variable 'VF' of type 'const llvm::ElementCount' creates a copy from type 'const llvm::ElementCount' [-Wrange-loop-analysis]
for (const auto VF : VFCandidates) {
^
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:8: note: use reference type 'const llvm::ElementCount &' to prevent copying
for (const auto VF : VFCandidates) {
^~~~~~~~~~~~~~~
&
1 warning generated.
```
Differential Revision: https://reviews.llvm.org/D103970
This patch allows folding stepvector + extract to the lane when the lane is
lower than the minimum size of the scalable vector. This fold is possible
because lane X of a stepvector is also X!
For instance, extracting element 3 of a <vscale x 4 x i64> stepvector yields 3.
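A sketch of the fold, using the llvm.experimental.stepvector intrinsic (function name illustrative):
```
declare <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()

define i64 @lane3() {
  %step = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
  %lane = extractelement <vscale x 4 x i64> %step, i32 3
  ret i64 %lane    ; lane 3 < minimum vector length of 4, so this folds to: ret i64 3
}
```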
Differential Revision: https://reviews.llvm.org/D103153
Summary:
The current implementation of AANoFreeFloating will incorrectly list floating
point loads and stores as may-free. This prevents other attributor instances
like HeapToStack from pushing some allocations to the stack.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D103975
This allows for using the frame record feature (which uses __hwasan_tls)
independently from however the user wants to access the shadow base, which
previously was only usable if the shadow wasn't accessed through the TLS variable or ifuncs.
Frame recording can be explicitly set according to ShadowMapping::WithFrameRecord
in ShadowMapping::init. Currently, it is only enabled on Fuchsia and if TLS is
used, so this should mimic the old behavior.
Added an extra case to prologue.ll that covers this new case.
Differential Revision: https://reviews.llvm.org/D103841
Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D99435
1. Better sorting of scalars to be gathered. Try to insert
constants/arguments/out-of-loop instructions first, and only then
the instructions that are inside the loop. This improves hoisting of
invariant insertelement instructions.
2. Better detection of shuffle candidates in gathering function.
3. The cost of insertelement for constants is 0.
Part of D57059.
Differential Revision: https://reviews.llvm.org/D103458
Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D99435
As discussed in the post-commit comments for:
3cdd05e519
It seems to be safe to propagate all flags from the final fneg
except for 'nsz' to the new select:
https://alive2.llvm.org/ce/z/J_APDc
nsz has unique FMF semantics: it is not poison, it is only
"insignificant" in the calculation according to the LangRef.
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.
This reverts commit 36ec97f76a.
The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.
Reverting to unbreak people's builds until that can be addressed.
This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d8.
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.
Also, a crash problem on legacy pass manager is fixed.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D99149
If the `-enable-strict-reductions` flag is set to true, then currently we will
always choose to vectorize the loop with strict in-order reductions. This is
not necessary where we allow the reordering of FP operations, such as
when loop hints are passed via metadata.
This patch moves useOrderedReductions so that we can also check whether
loop hints allow reordering, in which case we should use the default
behaviour of vectorizing with unordered reductions.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D103814
Complete support for fast8:
- amend shadow size and mapping in runtime
- remove fast16 mode and -dfsan-fast-16-labels flag
- remove legacy mode and make fast8 mode the default
- remove dfsan-fast-8-labels flag
- remove functions in dfsan interface only applicable to legacy
- remove legacy-related instrumentation code and tests
- update documentation.
Reviewed By: stephan.yichao.zhao, browneee
Differential Revision: https://reviews.llvm.org/D103745
Needs to be discussed more.
This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46
This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1
This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23
This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd
Unrolling with more iterations than MaxTripCount is pointless, as
those iterations can never be executed. As such, we clamp ULO.Count
to MaxTripCount if it is known. This means we no longer need to
consider iterations after MaxTripCount for exit folding, and the
CompletelyUnroll flag becomes independent of ULO.TripCount.
Differential Revision: https://reviews.llvm.org/D103748
This is a modified version of a patch by tolziplohu with a style change, and most importantly, a revised commit message.
inttoptr for a non-integral address space is currently ill defined in the LangRef. Figuring out exactly what the dynamic semantics of such a cast would be is hard, and not yet settled. Despite that, we still need to go ahead and implement something in RS4GC for a couple of reasons.
First, as a simple consistency argument: we apparently added support for constexpr inttoptrs a while back, and even have tests which exercised them. Having a lack of constant folding trigger a crash during lowering is non-ideal.
Second, and more fundamentally, the optimizer is allowed to insert undefined constructs in unreachable code. At the same time, we can't assume that dynamically dead code is always pruned before lowering. As a result, we must assume that inttoptrs can occur (even if completely ill defined) along dead paths. We need the lowering to not crash. The stackmaps produced can be garbage (as the assumption is the code is dynamically dead), but the lowering itself can't crash.
Differential Revision: https://reviews.llvm.org/D103492
We need to adjust the FMF propagation on at least
one of these transforms as discussed in:
https://llvm.org/PR49654
...so this should make it easier to intersect flags.
The non-DOT printing does not include the successors of VPregionBlocks.
This patch uses the same style for printing successors as for
VPBasicBlock.
I think the printing of successors could be a bit improved further, as
at the moment it is hard to ensure a check line matches all successors.
But that can be done as follow-up.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D103515
This patch is an extension of D103421. It allows the InstCombiner to
generate the negated form of integer scalable-vector splats. It can
technically handle fixed-length vectors too but those are completely
covered by the preceding logic.
This enables extra combining opportunities for scalable vector types.
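A rough sketch of the negated splat form this enables (purely illustrative IR, not taken from the tests):
```
%ins   = insertelement <vscale x 4 x i32> poison, i32 %x, i32 0
%splat = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
%neg   = sub <vscale x 4 x i32> zeroinitializer, %splat
; can be rewritten as a splat of the negated scalar:
%x.neg     = sub i32 0, %x
%ins.neg   = insertelement <vscale x 4 x i32> poison, i32 %x.neg, i32 0
%splat.neg = shufflevector <vscale x 4 x i32> %ins.neg, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
```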
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103801
This patch abstracts Calls in Inliner::run() into InlineOrder.
With this patch, it's possible to customize the inlining order,
e.g. use a queue or a priority queue.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D103315
This pass transforms loops that contain a conditional branch with an induction
variable. For example, it transforms the following loop:
  while (iv < n) {
    A
    if (iv < c)
      B
    C
  }
into:
  newbound = min(n, c)
  while (iv < newbound) {
    A
    B
    C
  }
  if (iv != n) {
    while (iv < n) {
      A
      C
    }
  }
Differential Revision: https://reviews.llvm.org/D102234
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.
If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.
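A sketch of the resulting vector loop skeleton (names illustrative; %Step == 4 here):
```
vector.body:
  %iv      = phi i64 [ 0, %vector.ph ], [ %iv.next, %vector.body ]
  ; ... vectorized loop body ...
  %iv.next = add nuw i64 %iv, 4
  %done    = icmp eq i64 %iv.next, %n.vec
  br i1 %done, label %middle.block, label %vector.body
```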
This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.
At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.
Note that this could probably be further improved by using information
from the original IV.
An attempt at modeling the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g
Part of a set of fixes required for PR50412.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D103255
This patch abstracts Calls in Inliner::run() into InlineOrder.
With this patch, it's possible to customize the inlining order, e.g. use a queue or a priority queue.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D103315
We might want to use it when creating SCEV proper in createSCEV(),
now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`,
which might have caused us to lose some optimization potential.
Loop peeling is currently performed as part of UnrollLoop().
Outside test scenarios, it is always performed with an unroll
count of 1. This means that unrolling doesn't actually do anything
apart from performing post-unroll simplification.
When testing, it's currently possible to specify both an explicit
peel count and an explicit unroll count. This doesn't perform any
sensible operation and may result in miscompiles, see
https://bugs.llvm.org/show_bug.cgi?id=45939.
This patch moves peeling from UnrollLoop() into tryToUnrollLoop(),
so that peeling does not also perform a subsequent unroll. We only
run the post-unroll simplifications. Specifying both an explicit
peel count and unroll count is forbidden.
In the future, we may want to support both (non-PGO) peeling a
loop and unrolling it, but this needs to be done by first performing
the peel and then recalculating unrolling heuristics on a now
possibly analyzable loop.
Differential Revision: https://reviews.llvm.org/D103362
`__profd_*` variables are referenced by code only when value profiling is
enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just
waste space on ELF/Mach-O. We change the comdat symbol from `__profd_*` to
`__profc_*` because an internal symbol does not provide deduplication features
on COFF. The choice doesn't matter on ELF.
(In a -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there are now no `__profd_*` symbols.)
On Windows this enables further optimization. We are no longer affected by the
link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can
cause duplicate definition error.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html
We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585).
This avoids many `/INCLUDE:` directives in `.drectve`.
Here is rnk's measurement for Chrome:
```
This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:
#BEFORE
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867
$ du -cksh base_unittests.exe
82M base_unittests.exe
82M total
# AFTER
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499
$ du -cksh base_unittests.exe
78M base_unittests.exe
78M total
```
The change is NFC for Mach-O.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103372
When SimplifyIndVars infers IR nowrap flags from SCEV, this may
happen in two ways: Either nowrap flags were already present in
SCEV and just get transferred to IR. Or zero/sign extension of
addrecs infers additional nowrap flags, and those get transferred
to IR. In the latter case, calling forgetValue() ensures that the
newly inferred nowrap flags get propagated to any other SCEV
expressions based on the addrec. However, the invalidation can
also have a major compile-time effect in some cases. For
https://bugs.llvm.org/show_bug.cgi?id=50384 with n=512 compile-
time drops from 7.1s to 0.8s without this invalidation. At the
same time, removing the invalidation doesn't affect any codegen
in test-suite.
Differential Revision: https://reviews.llvm.org/D103424
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is the llvm-profdata part of the change. It sets the bit masks for the
profile reader in llvm-profdata. It also adds an internal option
"-fs-discriminator-pass" for the show and merge commands to process the profile
offline.
This patch also moved setDiscriminatorMaskedBitFrom() to
SampleProfileReader::create() to simplify the interface.
Differential Revision: https://reviews.llvm.org/D103550
This patch changes the `isKnownHeapToStack` and `isAssumedHeapToStack`
member functions to return if a function call is going to be altered by
HeapToStack.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D103574
This is similar to b865eead76 ( D103617 ) and fixes:
https://llvm.org/PR50575
41b71f718b did this and more (noted with TODO
comments in the tests), but it didn't handle the case
where the destination is narrower than the source, so
it got reverted.
This is a simple match-and-replace. If there's evidence
that the TODO cases are useful, we can revisit/extend.
Some floating point lib calls have ABI attributes that need to be set on
the caller. Found via D103412.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D103415
This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed.
The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile.
Differential Revision: https://reviews.llvm.org/D103620
`__profd_*` variables are referenced by code only when value profiling is
enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just
waste space on ELF/Mach-O. We change the comdat symbol from `__profd_*` to
`__profc_*` because an internal symbol does not provide deduplication features
on COFF. The choice doesn't matter on ELF.
(In a -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there are now no `__profd_*` symbols.)
On Windows this enables further optimization. We are no longer affected by the
link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can
cause duplicate definition error.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html
We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585).
This avoids many `/INCLUDE:` directives in `.drectve`.
Here is rnk's measurement for Chrome:
```
This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:
#BEFORE
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867
$ du -cksh base_unittests.exe
82M base_unittests.exe
82M total
# AFTER
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499
$ du -cksh base_unittests.exe
78M base_unittests.exe
78M total
```
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103372
This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly.
Differential Revision: https://reviews.llvm.org/D103584
No need to recalculate the cost of extractelements; rather, we should not
compensate for the cost of all extractelements without first checking whether
each one is actually going to be removed by the vectorization. Also, there is no
need to generate a new extractelement instruction; we may just regenerate the
original one. This may improve the final vectorization.
Differential Revision: https://reviews.llvm.org/D102933
This cleans up the unroll action into two phases. Phase 1 does the mechanical act of unrolling, and leaves all conditional branches in place. Phase 2 optimizes away some of the conditional branches and then simplifies the loop. The primary benefit of the reordering is that we can delete some special cases dom tree update logic.
Differential Revision: https://reviews.llvm.org/D103561
The tryToVectorizeList function allows reordering of only 2 scalars. This patch
allows reordering of more than 2 scalars. Also, to avoid possible regressions, it
allows extra vectorization of the remaining parts of the scalar
elements, if possible.
Part of D57059.
Differential Revision: https://reviews.llvm.org/D103247
As noticed by NAKAMURA Takumi back in 2017, we cannot use
properlyDominates for std::stable_sort as properlyDominates only
partially orders blocks. That is, for blocks A, B, C, D, where A
dominates B and C dominates D, we have A == C, B == C, but A < B. This
is not a valid comparison function for std::stable_sort and causes
different results between libstdc++ and libc++. This change uses DFS
numbering to give deterministic results for all reachable blocks.
Unreachable blocks are ignored already, so do not need special
consideration.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103441
Calls must properly match argument ABI attributes with the callee.
Found via D103412.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D103414
The linkage/visibility of `__profn_*` variables are derived
from the profiled functions.
extern_weak => linkonce
available_externally => linkonce_odr
internal => private
extern => private
_ => unchanged
The linkage/visibility of `__profc_*`/`__profd_*` variables are derived from
`__profn_*` with linkage/visibility wrestling for Windows.
The changes can be folded to the following without changing semantics.
```
if (TT.isOSBinFormatCOFF() && !NeedComdat) {
Linkage = GlobalValue::InternalLinkage;
Visibility = GlobalValue::DefaultVisibility;
}
```
That said, I think we can just delete the code block.
An extern/internal function will now use private `__profc_*`/`__profd_*`
variables, instead of internal ones. This saves some symbol table entries.
A non-comdat {linkonce,weak}_odr function will now use hidden external
`__profc_*`/`__profd_*` variables instead of internal ones. There is potential
object file size increase because such symbols need `/INCLUDE:` directives.
However such non-comdat functions are rare (note that non-comdat weak
definitions don't prevent duplicate definition error).
The behavior changes match ELF.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D103355
Coro-split functions with an active suspend point have their scope line set to
the line of the suspend point. However, for compiler-generated functions, this
results in debug info with an unconventional combination: a file named
`<compiler-generated>` with a non-zero line number. The convention for
`<compiler-generated>` is that the line number is zero.
This change propagates the scope line only for non-compiler generated
functions.
Differential Revision: https://reviews.llvm.org/D102412
Without this change, a callsite like:
[[clang::musttail]] return func_call(x);
will cause an error like:
fatal error: error in backend: failed to perform tail call elimination
on a call site marked musttail
due to DFSan inserting instrumentation between the musttail call and
the return.
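A minimal IR sketch of the constraint (names illustrative): a musttail call must be immediately followed by the return, so no shadow-propagation instructions may be inserted between the two.
```
define i32 @caller(i32 %x) {
entry:
  %r = musttail call i32 @callee(i32 %x)
  ; DFSan must not insert return-label stores here
  ret i32 %r
}

declare i32 @callee(i32)
```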
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D103542
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is mainly the ProfileData part of the change. It will load
the FS profile when such a profile is detected. For an extbinary format profile,
create_llvm_prof tool will add a flag to profile summary section.
For other format profiles, the users need to use an internal option
(-profile-isfs) to tell the compiler that the profile uses FS discriminators.
This patch also simplified the bit API used by FS discriminators.
Differential Revision: https://reviews.llvm.org/D103041
During Loop Strength Reduce, if the terminating condition for the loop
is not immediately adjacent to the terminating branch and it has more
than one use, a clone of the condition will be created just before the
terminating branch and will be used as the branch condition. Currently,
whether the instructions are "immediately adjacent" is determined by
checking whether the next instruction after the condition is the
terminating branch; this is incorrect however, as the presence of a
debug intrinsic between the two will result in a change to the output.
This is fixed by using getNextNonDebugInstruction() instead.
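An illustrative IR shape of the problem (metadata ids are placeholders): with a dbg.value between the condition and the branch, the old adjacency check failed and the condition was needlessly cloned.
```
%exitcond = icmp eq i64 %iv.next, %n
call void @llvm.dbg.value(metadata i64 %iv.next, metadata !10, metadata !DIExpression())
br i1 %exitcond, label %exit, label %loop
```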
Differential Revision: https://reviews.llvm.org/D103033
Transfer the swiftasync attribute to the resume partial function according to
suspend.async specification. Its first argument denotes which argument is the
async context.
rdar://71499498
Differential Revision: https://reviews.llvm.org/D103285
This patch uses the calculated maximum scalable VFs to build VPlans,
cost them and select a suitable scalable VF.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D98722
llvm::getLoadStoreType was added recently and has the same implementation
as 'getMemInstValueType' in LoopVectorize.cpp. Since there is no
value in having two implementations, this patch removes the custom LV
implementation in favor of the generic one defined in Instructions.h.
When rewriting
powf(2.0, itofp(x)) -> ldexpf(1.0, x)
exp2(sitofp(x)) -> ldexp(1.0, sext(x))
exp2(uitofp(x)) -> ldexp(1.0, zext(x))
the wrong type was used for the second argument in the ldexp/ldexpf
libc call, for target architectures with 16 bit "int" type.
The transform incorrectly used a bitcasted function pointer with
a 32-bit argument when emitting the ldexp/ldexpf call for such
targets.
The fault is solved by using the correct function prototype
in the call, by asking TargetLibraryInfo about the size of "int".
TargetLibraryInfo by default derives the size of the int type by
assuming that it is 16 bits for 16-bit architectures, and
32 bits otherwise. If this isn't true for a target it should be
possible to override that default in the TargetLibraryInfo
initializer.
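A rough before/after sketch for a target whose "int" is 16 bits (declarations omitted; names illustrative):
```
; before
%fp = sitofp i16 %x to float
%r  = call float @powf(float 2.0, float %fp)
; after: the second argument must use the target's "int" type, i16 here
%r  = call float @ldexpf(float 1.0, i16 %x)
```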
Differential Revision: https://reviews.llvm.org/D99438
As the existing test unreachable.ll shows, we should be doing more
work to avoid entering unreachable blocks: we should not stop
vectorization just because a PHI incoming value from an unreachable
block cannot be vectorized. We know that particular value will never
be used so we can just replace it with poison.
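A sketch of the situation (illustrative): the incoming value from the unreachable predecessor can never be observed, so it can simply become poison instead of blocking vectorization.
```
bb:
  %p = phi i32 [ %good, %reachable.pred ], [ %bad, %unreachable.pred ]
  ; => %p = phi i32 [ %good, %reachable.pred ], [ poison, %unreachable.pred ]
```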
Implemented a better scheme for perfect/shuffled matches of the gather
nodes, which allows fixing the performance regressions introduced by
earlier patches. Started detecting matches for broadcast nodes and
extractelement gathering.
Differential Revision: https://reviews.llvm.org/D102920
InstCombine didn't perform the transformations when fmul's operands were
the same instruction because it required each of them to have a single use,
which is not the case there. This patch fixes this, adds tests for these
cases, and introduces a new function, isOnlyUserOfAnyOperand, to check such
cases in a single place.
This patch is a result of discussion in D102574.
Differential Revision: https://reviews.llvm.org/D102698
The current loop or any of its sub-loops may be infinite. Unless the
function or the loops are marked as mustprogress, this in itself makes
the loop *not* dead.
This patch moves the logic to check whether the current loop is finite
or mustprogress to `isLoopDead` and also extends it to check the
sub-loops. This should fix PR50511.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D103382
If the index itself is already poison, the poison propagates through
instructions clamping the index to a valid range. This still causes
introducing a load of poison, as flagged by Alive2 and pointed out
at 575e2aff55.
This patch updates the code to freeze the index, unless it is proven to
not be poison.
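A rough sketch of the scalarized access with the new freeze (names illustrative, not from the actual tests):
```
%idx.frozen = freeze i64 %idx
%gep = getelementptr inbounds <4 x i32>, <4 x i32>* %ptr, i64 0, i64 %idx.frozen
%ld  = load i32, i32* %gep
```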
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D103378
This reverts commit 4f2fd3818b.
The Linux kernel fails to build after this commit. See
https://reviews.llvm.org/D99481 for a reproducer.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
This patch fixes pr43326 and pr48212.
Currently when we move reduction phis to the right place,
loop interchange assumes the first phi in loop headers is
an induction phi, skips the first phi and assumes the rest
of phis are candidate reduction phis to move. However, it
may not always be the case.
This patch loops over all phis in loop headers and considers
a phi node as a candidate reduction phi to move only when it
is indeed a reduction phi across outer and inner loop.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D102743
Update isFirstOrderRecurrence to explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.
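A sketch of a loop with a first-order recurrence and two users that both need sinking (illustrative fragment):
```
loop:
  %rec  = phi i32 [ %start, %entry ], [ %cur, %loop ]
  %use1 = add i32 %rec, 1          ; sunk past %cur
  %use2 = mul i32 %rec, 3          ; also sunk past %cur
  %cur  = load i32, i32* %ptr
  br i1 %cond, label %exit, label %loop
```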
Fixes PR44286 (and another PR or two).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D84951
ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pairs,
and while the map itself will likely be big-ish (have many keys),
it is a reasonable assumption that each key will refer to a small-ish
number of pairs.
In particular looking at n=512 case from
https://bugs.llvm.org/show_bug.cgi?id=50384,
the small-size of 4 appears to be the sweet spot,
it results in the least allocations while minimizing memory footprint.
```
$ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i | tail -n 6; echo ""; done
heaptrack.opt.0-orig.gz
total runtime: 14.32s.
calls to allocation functions: 8222442 (574192/s)
temporary memory allocations: 2419000 (168924/s)
peak heap memory consumption: 190.98MB
peak RSS (including heaptrack overhead): 239.65MB
total memory leaked: 67.58KB
heaptrack.opt.1-n1.gz
total runtime: 13.72s.
calls to allocation functions: 7184188 (523705/s)
temporary memory allocations: 2419017 (176338/s)
peak heap memory consumption: 191.38MB
peak RSS (including heaptrack overhead): 239.64MB
total memory leaked: 67.58KB
heaptrack.opt.2-n2.gz
total runtime: 12.24s.
calls to allocation functions: 6146827 (502355/s)
temporary memory allocations: 2418997 (197695/s)
peak heap memory consumption: 163.31MB
peak RSS (including heaptrack overhead): 211.01MB
total memory leaked: 67.58KB
heaptrack.opt.3-n4.gz
total runtime: 12.28s.
calls to allocation functions: 6068532 (494260/s)
temporary memory allocations: 2418985 (197017/s)
peak heap memory consumption: 155.43MB
peak RSS (including heaptrack overhead): 201.77MB
total memory leaked: 67.58KB
heaptrack.opt.4-n8.gz
total runtime: 12.06s.
calls to allocation functions: 6068042 (503321/s)
temporary memory allocations: 2418992 (200646/s)
peak heap memory consumption: 166.03MB
peak RSS (including heaptrack overhead): 213.55MB
total memory leaked: 67.58KB
heaptrack.opt.5-n16.gz
total runtime: 12.14s.
calls to allocation functions: 6067993 (499958/s)
temporary memory allocations: 2418999 (199307/s)
peak heap memory consumption: 187.24MB
peak RSS (including heaptrack overhead): 233.69MB
total memory leaked: 67.58KB
```
While that test may be an edge worst-case scenario,
https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions
agrees that this also results in improvements in the usual situations.
This is a patch that replaces shufflevector and insertelement's placeholder value with poison.
The underlying motivation is to fix the semantics of shufflevector with an undef mask to return poison instead
(D93818)
The consensus has been made in the late 2020 via mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 .
This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.
The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];
After LICM, this becomes:
  for(i=..)
    dst[i] = 0;
    sum = 0;
    for(j=..)
      sum += src[i*n+j];
    dst[i] = sum;
The first store is dead, and with this patch is now removed.
Differential Revision: https://reviews.llvm.org/D100464
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows.
```
// assume that GV is an array of 4-byte elements
GEP = gep GV, 0, Idx // this is accessing Idx * 4
L = load GEP
ICI = icmp eq L, value
=>
ICI = icmp eq Idx, NewIdx
```
The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp.
And there is a problem because Idx * ElementSize can overflow.
Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.
This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx.
```
...
=>
Idx' = and Idx, 0x3F..FF
ICI = icmp eq Idx', NewIdx
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D99481
This is similar to the fix in c590a9880d ( PR49832 ), but
we missed handling the pattern for select of bools (no compare
inst).
We can't substitute a vector value because the equality condition
replacement that we are attempting requires that the condition
is true/false for the entire value. Vector select can be partly
true/false.
I added an assert for vector types, so we shouldn't hit this again.
Fixed formatting while auditing the callers.
https://llvm.org/PR50500
When you try to define a new DEBUG_TYPE in a header file, a DEBUG_TYPE
definition placed around the #includes in files that include it could
result in redefinition warnings or even compile errors.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102594
This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8
We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.
Proofs:
https://rise4fun.com/Alive/uVB
Name: masked bit set
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp ne i32 %and, 0
%r = zext i1 %cmp to i32
=>
%s = lshr i32 %x, %y
%r = and i32 %s, 1
Name: masked bit clear
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp eq i32 %and, 0
%r = zext i1 %cmp to i32
=>
%xn = xor i32 %x, -1
%s = lshr i32 %xn, %y
%r = and i32 %s, 1
Note: this is a re-post of a patch that I committed at:
rGa041c4ec6f7a
The commit was reverted because it exposed another bug:
rGb212eb7159b40
But that has since been corrected with:
rG8a156d1c2795189 ( D101191 )
Differential Revision: https://reviews.llvm.org/D72396
When fully unrolling with a non-latch exit, the latch block is
folded to unreachable. Replace this folding with the existing
changeToUnreachable() helper, rather than performing it manually.
This also moves the fold to happen after the manual DT update
for exit blocks. I believe this is correct in that the conversion
of an unconditional backedge into unreachable should not affect
the DT at all.
Differential Revision: https://reviews.llvm.org/D103340
This does some non-functional cleanup of exit folding during
unrolling. The two main changes are:
* First rewrite latch->header edges, which is unrelated to exit
folding.
* Combine folding for latch and non-latch exits. After the
previous change, the only difference in their logic is that
for non-latch exits we currently only fold "known non-exit"
cases, but not "known exit" cases.
I think this helps a lot to clarify this code and prepare it for
future changes.
Differential Revision: https://reviews.llvm.org/D103333
This is split off from D102002, and I think it is clear that
the difference in behavior was not intended. Options were
added to SimplifyCFG over time, but different chunks of
the pass pipelines were not kept in sync.
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.
Utilize LoopNest and let function 'Flatten' generate information from it.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D102904
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.
Utilize LoopNest and let function 'Flatten' generate information from it.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D102904
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.
Utilize LoopNest and let function 'Flatten' generate information from it.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D102904
AIX uses `__ssp_canary_word` instead of `__stack_chk_guard`.
This patch updates the target hook to use the correct symbol,
so that the basic stack protector feature can work.
The traceback will be handled in a follow-up patch.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D103100
DFSan has flags to control flows between pointers and objects referred
by pointers. For example,
a = *p;
L(a) = L(*p) when -dfsan-combine-pointer-labels-on-load = false
L(a) = L(*p) + L(p) when -dfsan-combine-pointer-labels-on-load = true
*p = b;
L(*p) = L(b) when -dfsan-combine-pointer-labels-on-store = false
L(*p) = L(b) + L(p) when -dfsan-combine-pointer-labels-on-store = true
The question is what to do with p += c.
In practice we found many confusing flows if we propagate labels from c
to p. So a new flag works like this
p += c;
L(p) = L(p) when -dfsan-propagate-via-pointer-arithmetic = false
L(p) = L(p) + L(c) when -dfsan-propagate-via-pointer-arithmetic = true
Reviewed-by: gbalats
Differential Revision: https://reviews.llvm.org/D103176
Arguments need to have the proper ABI parameter attributes set.
Followup to D101806.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D103288
in stripDebugInfo(). This patch fixes an oversight in
https://reviews.llvm.org/D96181 and also takes into account loop
metadata pointing to other MDNodes that point into the debug info.
rdar://78487175
Differential Revision: https://reviews.llvm.org/D103220
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D99149
For uniform ReplicateRecipes, only the first lane should be used, so
sinking them would mean we have to compute the value of the first lane
multiple times. Also, at the moment, sinking them causes a crash because
the value of the first lane is re-used by all users.
Reported post-commit for D100258.
There can be a need for some optimizations to get (base, offset)
for any GC pointer. The base can be calculated by generating
needed instructions as it is done by the
RewriteStatepointsForGC::findBasePointer() function. The offset
can be calculated in the same way. Though to not expose the base
calculation and to make the offset calculation as simple as
ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal
outside RS4GC, this patch introduces 2 intrinsics:
@llvm.experimental.gc.get.pointer.base(%derived_ptr)
@llvm.experimental.gc.get.pointer.offset(%derived_ptr)
These intrinsics are inlined by RS4GC along with generation of
statepoint sequences.
With these new intrinsics the GC parseable lowering for atomic
memcpy intrinsics (6ec2c5e402)
could be implemented as a separate pass.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100445
We are deleting `phi` nodes within the for loop, so this makes sure we
increment the iterator before we delete the instruction pointed to by the
iterator.
This started to break in
a0be081646.
Reviewed By: dschuff, lebedev.ri
Differential Revision: https://reviews.llvm.org/D103181
The SLP vectorizer should not consider insertelements with multiple uses as
part of a high-level build vector; such an insertelement must be considered
a terminating insertelement in the vector build, otherwise it may
produce incorrect code.
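Illustrative IR (not from the actual test): %ins1 has an extra use, so the build-vector sequence must be treated as terminating at %ins1.
```
%ins1 = insertelement <2 x float> poison, float %a, i32 0
%ins2 = insertelement <2 x float> %ins1, float %b, i32 1
call void @use(<2 x float> %ins1)   ; extra use of the intermediate insertelement
```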
Differential Revision: https://reviews.llvm.org/D103164
Following the addition of salvaging dbg.values using DIArgLists to
reference multiple values, a case has been found where excessively large
DIArgLists are produced as a result of this salvaging, resulting in
large enough performance costs to effectively freeze the compiler.
This patch introduces an upper bound of 16 to the number of values that
may be salvaged into a dbg.value, to limit the impact of these extreme
cases to performance.
Differential Revision: https://reviews.llvm.org/D103162
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost.
By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928.
Differential Revision: https://reviews.llvm.org/D102934
When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:
bool allowReordering() const {
// When enabling loop hints are provided we allow the vectorizer to change
// the order of operations that is given by the scalar loop. This is not
// enabled by default because can be unsafe or inefficient.
The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.
This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D101836
The patch was reverted due to compile time impact of contextual SCEV
queries. It also appeared that it introduced a miscompile on irreducible CFG.
Changes made:
1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate;
2. Irreducible CFG in live code is now detected and excluded from processing.
Differential Revision: https://reviews.llvm.org/D102615
The patch was reverted due to compile time impact of contextual SCEV
queries. It also appeared that it introduced a miscompile on irreducible CFG.
Changes made:
1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate;
2. Irreducible CFG in live code is now detected and excluded from processing.
Differential Revision: https://reviews.llvm.org/D102615
We sometimes see code like this:
Case 1:
%gep = getelementptr i32, i32* %a, <2 x i64> %splat
%ext = extractelement <2 x i32*> %gep, i32 0
or this:
Case 2:
%gep = getelementptr i32, <4 x i32*> %a, i64 1
%ext = extractelement <4 x i32*> %gep, i32 0
where there is only one use of the GEP. In such cases it makes
sense to fold the two together such that we create a scalar GEP:
Case 1:
%ext = extractelement <2 x i64> %splat, i32 0
%gep = getelementptr i32, i32* %a, i64 %ext
Case 2:
%ext = extractelement <2 x i32*> %a, i32 0
%gep = getelementptr i32, i32* %ext, i64 1
This may create further folding opportunities as a result, i.e.
the extract of a splat vector can be completely eliminated. Also,
even for the general case where the vector operand is not a splat
it seems beneficial to create a scalar GEP and extract the scalar
element from the operand. Therefore, in this patch I've assumed
that a scalar GEP is always preferable to a vector GEP and have
added code to unconditionally fold the extract + GEP.
I haven't added folds for the case when we have both a vector of
pointers and a vector of indices, since this would require
generating an additional extractelement operation.
Tests have been added here:
Transforms/InstCombine/gep-vector-indices.ll
Differential Revision: https://reviews.llvm.org/D101900
When the lower type test pass is invoked a second time with
DropTypeTests set to true, it expects that all remaining type tests feed
assume instructions, which are removed along with the type tests.
In some cases the llvm.assume might have been merged with another one,
i.e. from a builtin_assume instruction, in which case the type test
would actually feed a phi that in turn feeds the merged assume
instruction. In this case we can simply replace that operand of the phi
with "true" before removing the type test.
Differential Revision: https://reviews.llvm.org/D103073
Beside the `comdat any` deduplication feature, instrumentations use comdat to
establish dependencies among a group of sections, to prevent section based
linker garbage collection from discarding some members without discarding all.
LangRef acknowledges this usage with the following wording:
> All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key.
On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated
`__llvm_prf_data` section are placed in the same GRP_COMDAT group. A
`__llvm_prf_data` is usually not referenced and expects the liveness of its
associated `__llvm_prf_cnts` to retain it.
The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the
use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained).
The main goal of this patch is to fix the dependency relationship.
I think it makes sense for InternalizePass to internalize a comdat and thus
suppress the deduplication feature, e.g. a relocatable link of a regular LTO can
create an object file affected by InternalizePass.
If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the
a.o references to the comdat definitions will be non-resolvable (references
cannot bind to STB_LOCAL definitions in b.o).
On PE-COFF, for a non-external selection symbol, deduplication is naturally
suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend
to believe the spec creator has not thought about this use case (see D102973).
GNU ld and gold are still using the "signature is name based" interpretation.
So even if D102973 for ld.lld is accepted, for portability, a better approach is
to rename the comdat. A comdat with one single member is the common case,
leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by
deleting the comdat; otherwise we rename the comdat.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D103043
The common phi value transform replaces constants with values that
have the same value as the constant on a given edge. However, LVI
generally only provides information that is correct up to poison,
so this can end up replacing a well-defined value with poison.
D69442 addressed an instance of this problem by clearing poison
flags on the generating instruction, which was sufficient at the
time. rGa917fb89dc28 made LVI's edge value analysis slightly more
powerful, and clearing poison flags is no longer sufficient.
This patch changes the transform to instead explicitly guard against
a poison value instead. This should be satisfied for most cases due
to a prior branch on poison.
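A sketch of the transform being guarded (illustrative): the constant incoming value is replaced by %v because LVI proves %v equals it on that edge, but %v, unlike the constant, may be poison.
```
bb:
  %p = phi i32 [ 7, %pred ], [ %q, %other.pred ]
  ; => %p = phi i32 [ %v, %pred ], [ %q, %other.pred ]   ; only safe if %v is known not to be poison
```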
Fixes https://bugs.llvm.org/show_bug.cgi?id=50399.
Differential Revision: https://reviews.llvm.org/D102966
Now that we can fold some transposes into multiplies (CM: A * B^t and RM:
A^t * B), we want to move them around to create the optimal expressions:
* fold away double transposes while still using them to assert the shape
* sink transposes hoping they cancel out
* lift transposes when both operands are transposed
This also modifies the matrix remarks to include the number of exposed
transposes (i.e. transposes that we couldn't fold into a multiply).
The adjustment to the test remarks-inlining is a bit subtle: I am changing the
double transpose to a single transpose so that we don't remove it completely.
More importantly, this changes some of the total instruction counts, most
notably stores, because we can no longer use a vector store.
Differential Revision: https://reviews.llvm.org/D102733
Nowadays LLVM does not assume that all loops are finite,
so if we want to produce a finite loop from a potentially-infinite one,
we must ensure that the original loop is known to be a finite one.
For this transform, it only matters for arithmetic right-shifts.
For them, either the function or the loop must be known to
be `mustprogress`, or the original value being shifted must be known
to be non-negative (because iff the sign bit was set,
it will never become zero, but will become `-1` in the "end").
It would be really good for alive2 to actually complain about this,
but it currently does not: https://github.com/AliveToolkit/alive2/issues/726
The 2nd test is based on the fuzzer example in post-commit
comments of D101191 -
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661
The 1st test shows that we don't deal with this symmetrically.
We should be able to reduce both examples (possibly in
instsimplify instead of instcombine).
We can only scalarize memory accesses if we know the index is valid.
This patch adjusts canScalarizeAccess to fall back to
computeConstantRange to check if the index is known to be valid.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D102476
We could go either direction on this transform. VectorCombine already goes this
way for bitcasts (and handles more complicated cases using the cost model), so
let's try cast-first.
Deferring completely to VectorCombine is another possibility. But the backend
should be able to invert this easily when the vectors have the same shape, so
it doesn't seem like a transform that we need to avoid.
The motivating example from https://llvm.org/PR49081 has an int-to-float
sandwiched between 2 shuffles, and the backend currently does not reduce that,
so on x86, we get something like:
pshufd $249, %xmm0, %xmm0
cvtdq2ps %xmm0, %xmm0
shufps $144, %xmm0, %xmm0
...instead of just a single conversion instruction.
Differential Revision: https://reviews.llvm.org/D103038
This adds support for the "count active bits" pattern, i.e.:
```
int countBits(unsigned val) {
  int cnt = 0;
  for (; (val << cnt) != 0; ++cnt)
    ;
  return cnt;
}
```
but a somewhat more general one:
```
int countBits(unsigned val, int start, int off) {
  int cnt;
  for (cnt = start; val << (cnt + off); cnt++)
    ;
  return cnt;
}
```
alive2 is happy with all the tests there.
Note that, again, much like with the right-shift cases,
we don't require the `val != 0` guard.
This is the last pattern that was supported by
`detectShiftUntilZeroIdiom()`, which now becomes obsolete.
This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(signed val) {
  int cnt = 0;
  for (; (val >> cnt) != 0; ++cnt)
    ;
  return cnt;
}
```
but a somewhat more general one:
```
int countActiveBits(signed val, int start, int off) {
  int cnt;
  for (cnt = start; val >> (cnt + off); cnt++)
    ;
  return cnt;
}
```
This directly matches the existing 'logical right-shift until zero' idiom.
alive2 is happy with all the tests there.
Note that, again, much like with the original unsigned case,
we don't require the `val != 0` guard.
The old `detectShiftUntilZeroIdiom()` already supports this pattern;
the idea here is that `val` must be positive (i.e. have at least one
leading zero), because otherwise the loop is non-terminating,
but since it is not `while(1)`, that would have been UB.
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue supporting coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.
Unfortunately, SanitizeCoverage has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
instead rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.
Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.
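For reference, the user-facing side is the existing attribute with the new string; a minimal usage sketch:
```
// With this change, individual functions can opt out of
// -fsanitize-coverage=... instrumentation just like for other sanitizers.
__attribute__((no_sanitize("coverage")))
void uninstrumented(void) {
  // should receive no coverage instrumentation
}

void still_instrumented(void) {
  // instrumented as usual
}
```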
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035
Reviewed By: vitalybuka, morehouse
Differential Revision: https://reviews.llvm.org/D102772
D82085 ("allow TRE for non-capturing calls") caused a failure during bootstrap.
This patch does the same as D82085 and additionally fixes the bootstrap error.
The problem with D82085 is that it does not create copies for byval
operands while replacing a function call with a branch.
Consider the following example:
```
int zoo(S p1);

int foo(int count, S p1) {
  if (count > 10)
    return zoo(p1);
  // A temporary variable is created for passing the byval parameter;
  // p1 could be used when zoo(p1) is called (after TRE is done).
  // lifetime.start p1.byvalue.temp
  return foo(count + 1, p1);
  // lifetime.end p1.byvalue.temp
}
```
After the recursive call to foo is replaced with a jump to the
start of the function, its parameters could be passed to the
zoo function, i.e. the temporary variable created for the byval
parameter "p1" could be passed to zoo. As a result, zoo receives a
broken operand:
```
int foo(int count, S p1) {
tailrecurse:
  p1_tr = phi p1, p1.byvalue.temp
  if (count > 10)
    return zoo(p1_tr);
  // A temporary variable is created for passing the byval parameter;
  // p1 could be used when zoo(p1) is called (after TRE is done).
  lifetime.start p1.byvalue.temp
  memcpy(p1.byvalue.temp, p1_tr)
  count = count + 1
  lifetime.end p1.byvalue.temp
  br tailrecurse
}
```
To prevent p1.byvalue.temp from being used after its scope is ended by the
lifetime.end marker, this patch copies the value from p1.byvalue.temp
into another temporary variable and then copies that variable
into the input parameter for the next iteration.
This patch passes the bootstrap build and the bootstrap build with AddressSanitizer.
Differential Revision: https://reviews.llvm.org/D85614
This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topological
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach the header via the latch, then the backedge can be broken.
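As an illustration of the kind of loop this targets (a made-up example, not one of the patch's tests): on the first iteration the branch condition below is decided by the known starting value of `i`, the body exits before the latch, and the backedge can therefore be removed:
```
int f(const int *a, int n) {
  int i = 0;
  do {
    if (i >= 0)   // trivially true on the 1st iteration since i == 0
      break;      // exits before ever reaching the latch
    i += a[i];
  } while (i < n);
  return i;
}
```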
Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: reames
The current ad-hoc implementation used to determine whether a basic
block is unreachable doesn't work correctly in the general case (for
example it won't detect successors of unreachable blocks as
unreachable). This patch replaces it with the correct API that uses a
DominatorTree to answer the question correctly and quickly.
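A minimal sketch of the replacement query (illustrative only, not the exact patch):
```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Dominators.h"

// Unlike an ad-hoc check, this also classifies successors of unreachable
// blocks as unreachable, since they are not reachable from the entry block.
static bool isUnreachableBlock(const llvm::BasicBlock &BB,
                               const llvm::DominatorTree &DT) {
  return !DT.isReachableFromEntry(&BB);
}
```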
rdar://77181156
Differential Revision: https://reviews.llvm.org/D102963
This patch adds a first VPlan-based implementation of sinking of scalar
operands.
The current version traverses the VPlan once and processes all operands of
a predicated REPLICATE recipe. If one of those operands can be sunk,
it is moved to the block containing the predicated REPLICATE recipe,
and processing continues with the operands of the sunk recipe.
The initial version does not re-process candidates after other recipes
have been sunk. It also cannot partially sink induction increments at
the moment. The VPlan only contains WIDEN-INDUCTION recipes; if the
induction is used, for example, in a GEP, only the first lane is used, and
in the lowered IR the adds for the other lanes can be sunk into the
predicated blocks.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D100258
This reverts commit 94d54155e2.
This fixes a sanitizer failure by moving scalarizeLoadExtract(I)
before foldSingleElementStore(I), which may remove instructions.
This patch adds a new combine that tries to scalarize chains of
`extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is
profitable when extracting only a few elements out of a large vector.
At the moment, `store (extractelement (load %ptr), %idx), %ptr`
operations on large vectors result in huge code in the backend.
This can easily be triggered by using the matrix extension, e.g.
https://clang.godbolt.org/z/qsccPdPf4
This should complement D98240.
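At the source level, the pattern looks roughly like this (my own example using Clang vector extensions, not the godbolt reproducer): a load of a wide vector from which only a single, provably in-bounds lane is extracted:
```
typedef int v16i __attribute__((vector_size(64)));

// This currently lowers to a full <16 x i32> load plus an extractelement;
// the new combine turns it into a single scalar load of the selected lane.
int getLane(const v16i *p, unsigned idx) {
  idx &= 15;     // keeps the index provably in bounds
  v16i v = *p;   // wide vector load
  return v[idx]; // extractelement
}
```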
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D100273
If we simplify values, we sometimes end up with type mismatches. If the
value is a constant, we can often cast it to still allow
propagation. The logic is now put into a helper that replaces some
ad hoc things we did before.
This also introduces the AA namespace for abstract attribute related
functions and types.
We have seen various problems when the call graph was not updated or
the update did not succeed because it involved functions outside the
SCC. This patch adds assertions and checks to avoid accidentally
changing something outside the SCC that would impact the call graph.
It also prevents us from reanalyzing functions outside the current
SCC, which could cause problems on its own. Note that the transformations
we do might cause the CG to be "more precise" but the original one would
always be a superset of the most precise one. Since the call graph is
by nature an approximation, it is good enough to have a superset of all
call edges.
The constant value lattice looks like this
```
      <None>
        |
     <undef>
    /   |   \
 ...   <0>   ...
    \   |   /
    <unknown>
```
We did not account for undef and assumed that having a value meant we could not
change anymore. Now we actually check if we have the same value as
before, which will signal CHANGED to the users when we go from undef to
a specific constant.
This fixes, among other things, the bug exposed by @ipccp4 in
`value-simplify.ll`.
The state of AAPotentialValues tracks if undef is contained. It should
fold undef into the first non-undef value. However, we missed a case
before. There was also a shadowing definition of two variables that
caused trouble. The test exposes both problems.