LoopInfo doesn't give all loops in a loop nest; it gives only the top-level
loops. Meanwhile, isLoopSimplifyForm() only checks the outermost loop of a
loop nest. As a result, inner loops that are not in simplified form could
not be simplified by the original code.
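For illustration, a minimal sketch of the intended fix (assuming the `simplifyLoop()` utility from LoopSimplify.h; the helper name and analysis plumbing are ours, not the patch's):
```cpp
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/Transforms/Utils/LoopSimplify.h"
using namespace llvm;

// Visit every loop in every nest (not just the top-level loops reported by
// plain LoopInfo iteration) and simplify the ones not in simplified form.
static bool ensureSimplifiedLoops(LoopInfo &LI, DominatorTree &DT,
                                  ScalarEvolution *SE, AssumptionCache *AC) {
  bool Changed = false;
  for (Loop *L : LI.getLoopsInPreorder())
    if (!L->isLoopSimplifyForm())
      Changed |= simplifyLoop(L, &DT, &LI, SE, AC, /*MSSAU=*/nullptr,
                              /*PreserveLCSSA=*/true);
  return Changed;
}
```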
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D137672
The 'and' case showed up in a recent bug report and prevented
more follow-on transforms from happening.
We could handle more patterns (for example, when the select arms
simplify, but not to constant values), but this seems
like a safe, conservative enhancement. The backend can
convert select-of-constants to math/logic in many cases
if it is profitable.
There is a lot of overlapping logic for these kinds of patterns
(see SimplifySelectsFeedingBinaryOp() and FoldOpIntoSelect()),
so there may be some opportunity to improve efficiency.
There are also optimization gaps/inconsistency because we do
not call this code for all bin-opcodes (see TODO for ashr test).
A local linkage GlobalObject in a non-prevailing COMDAT remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled).
This fixes two problems.
(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```
(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```
If a GlobalAlias references a GlobalValue which is just changed to
available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to
cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other
available_externally functions, so it cannot easily be removed.
Depends on D137441: we use available_externally to mark a GlobalAlias in a
non-prevailing COMDAT, similar to how we handle GlobalVariable/Function.
A GlobalAlias may refer to a ConstantExpr; not changing GlobalAlias to
GlobalVariable gives flexibility for future extensions (the use case is niche,
so for simplicity we don't handle it yet). In addition, an available_externally
GlobalAlias is the most straightforward implementation and retains the aliasee
information to help optimizers.
See windows-vftable.ll: Windows vftable uses an alias pointing to a
private constant where the alias is the COMDAT leader. The COMDAT use case
is dubious, and ThinLTO does not discard the alias in the non-prevailing COMDAT.
This patch retains the behavior.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
This adapts/copies code from the existing fold that allows
widening of load scalar+insert. It can help in IR because
it removes a shuffle, and the backend can already narrow
loads if that is profitable in codegen.
We might be able to consolidate more of the logic, but
handling this basic pattern should be enough to make a small
difference on one of the motivating examples from issue #17113.
The final goal of combining the loads in those patterns is not
solved yet, though.
Differential Revision: https://reviews.llvm.org/D137341
Gather nodes are vectorized as a simple vector of the scalars instead of
relying on the actual node. As a result, in some cases we may fail to detect
an incorrect transformation (a non-matching set of scalars just ends up as
a gather node instead of a possible vectorized/gather node). It is better
to rely on the actual nodes; this improves stability and helps detect
missed cases.
Differential Revision: https://reviews.llvm.org/D135174
`CreateSecStartEnd()` returns a pointer to the input type, so when called as `CreateSecStartEnd(M, SanCovCFsSectionName, IntptrPtrTy)`, `SecStartEnd.first` and `SecStartEnd.second` will have type `IntptrPtrPtrTy`, not `IntptrPtrTy`.
This problem should not impact functionality, and with opaque pointers enabled it will not trigger any alarm. But when run with `-no-opaque-pointers`, the mismatched pointer type causes the type-check assertion in `CallInst::init()` to fail.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D137310
With https://reviews.llvm.org/D136627, we now have metrics for profile staleness based on profile statistics, and monitoring profile staleness in real time can help users quickly identify performance issues. In a production scenario, the build is usually incremental, so to get real-time metrics we would have to store/cache all the old objects' metrics somewhere and pull them at post-build time. To make this more convenient, this patch adds an option to persist the metrics into the object binary; they can then be reported right away by decoding the binary rather than polling previous stdout/stderr output from a cache system.
For the implementation, the statistics are first written into a new metadata section (llvm.stats) and then encoded into a special ELF `.llvm_stats` section. The section data is formatted as a list of key/value pairs so that future statistics can easily be added. This is guarded by a new switch (`-persist-profile-staleness`).
In terms of size overhead, the metrics are computed at module level, so the overhead should be small; measured on one of our internal services, it costs less than 1MB for a 10GB+ binary.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D136698
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Plumb in salvaging for the address part of dbg.assign intrinsics.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133293
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Add method:
Instruction::mergeDIAssignID(
ArrayRef<const Instruction *> SourceInstructions)
which merges the DIAssignID metadata attachments on `SourceInstructions` and
`this` and replaces uses of the original IDs with the new shared one.
This is used when stores are merged, for example when sinking stores out of
an if-diamond CFG or vectorizing contiguous stores.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133291
ProvenanceAnalysis::relatedCheck was giving different answers depending
on the order in which the pointers were passed.
Specifically, it was returning different values when A and B were both
loads and were both referring to identifiable objects, but only one was
used by a store instruction.
Reverted in b22d80dc6a.
Move getDebugValueLoc so that it can be accessed from DebugInfo.h for the
Assignment Tracking patch stack and remove redundant parameter Src.
Reviewed By: jryans
Differential Revision: https://reviews.llvm.org/D132357
Need to check if the insertelement mask size is reached during cost analysis to avoid a compiler crash.
Differential Revision: https://reviews.llvm.org/D137639
This was done as a test for D137302, and it makes sense to push these changes.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D137493
This reverts commit 80378a4ca7.
I am reverting this patch because I need to revert 171f7024cc, and without reverting this patch, reverting 171f7024cc causes conflicts.
Patch 171f7024cc introduced a cyclic dependency in the module build.
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/48197/consoleFull#-69937453049ba4694-19c4-4d7e-bec5-911270d8a58c
```
In file included from <module-includes>:1:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/Argument.h:18:10: fatal error: cyclic dependency in module 'LLVM_IR': LLVM_IR -> LLVM_intrinsic_gen -> LLVM_IR
^
While building module 'LLVM_MC' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:
While building module 'LLVM_IR' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCPseudoProbe.h:57:
In file included from <module-includes>:12:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/DebugInfo.h:24:10: fatal error: could not build module 'LLVM_intrinsic_gen'
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
While building module 'LLVM_MC' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:
In file included from <module-includes>:15:
In file included from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCContext.h:23:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCPseudoProbe.h:57:10: fatal error: could not build module 'LLVM_IR'
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:10: fatal error: could not build module 'LLVM_MC'
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
4 errors generated.
```
A local linkage GlobalObject in a non-prevailing COMDAT remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled).
This fixes two problems.
(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```
(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```
If a GlobalAlias references a GlobalValue which is just changed to
available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to
cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other
available_externally functions, so it cannot easily be removed.
Depends on D137441: we use available_externally to mark a GlobalAlias in a
non-prevailing COMDAT, similar to how we handle GlobalVariable/Function.
A GlobalAlias may refer to a ConstantExpr; not changing GlobalAlias to
GlobalVariable gives flexibility for future extensions (the use case is niche,
so for simplicity we don't handle it yet). In addition, an available_externally
GlobalAlias is the most straightforward implementation and retains the aliasee
information to help optimizers.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands is the result of a first-vector-element splat, can be
simplified to splatting the first vector element of the result of the BinOp.
Differential Revision: https://reviews.llvm.org/D135876
Try to simplify comparisons with the smallest normalized value. If
denormals will be treated as 0, we can simplify by using an equality
comparison with 0.
fcmp olt fabs(x), smallest_normalized_number -> fcmp oeq x, 0.0
fcmp ult fabs(x), smallest_normalized_number -> fcmp ueq x, 0.0
fcmp oge fabs(x), smallest_normalized_number -> fcmp one x, 0.0
fcmp uge fabs(x), smallest_normalized_number -> fcmp une x, 0.0
The device libraries have a few range checks that look like
this for denormal handling paths.
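For reference, a runnable C++ illustration of the source-level shape of such a range check (our example, not the device-library code; `FLT_MIN` is the smallest normalized float, and the equivalence holds only when denormals are flushed to zero, e.g. FTZ/DAZ mode):
```cpp
#include <cfloat>
#include <cmath>

// Under denormal flushing, fabs(x) < FLT_MIN can only hold when x is (or
// flushes to) zero, so the range check collapses to an equality with 0.0f.
bool isDenormalOrZero(float x) {
  return std::fabs(x) < FLT_MIN; // fcmp olt fabs(x), smallest_normalized
}

bool isZeroWhenFlushed(float x) {
  return x == 0.0f;              // fcmp oeq x, 0.0: equivalent under FTZ/DAZ
}
```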
Move getDebugValueLoc so that it can be accessed from DebugInfo.h for the
Assignment Tracking patch stack and remove redundant parameter Src.
Reviewed By: jryans
Differential Revision: https://reviews.llvm.org/D132357
Gather nodes are vectorized as a simple vector of the scalars instead of
relying on the actual node. As a result, in some cases we may fail to detect
an incorrect transformation (a non-matching set of scalars just ends up as
a gather node instead of a possible vectorized/gather node). It is better
to rely on the actual nodes; this improves stability and helps detect
missed cases.
Differential Revision: https://reviews.llvm.org/D135174
Currently call slot optimization may be prevented because the
lifetime markers for the destination only start after the call.
In this case, rather than aborting the transform, we should move
the lifetime.start before the call to enable the transform.
Differential Revision: https://reviews.llvm.org/D135886
It is possible that we can do better on some of these transforms
by passing some subset of attributes, but we were not doing that
in any of the changed code. So it's better to give that a name
to indicate we're clearing attributes or make that more obvious
by using the default-constructed empty list.
As noted in the code comment, we could generalize this:
https://alive2.llvm.org/ce/z/N5m-eZ
It saves an instruction even without a constant operand,
but the 'and' is wider. We can do that as another step
if it doesn't harm anything.
I noticed that this missing pattern with a constant operand
inhibited other transforms in a recent bug report, so this
is enough to solve that case.
Unswitching adjusts the CFG in ways that may invalidate cached loop
dispositions. Clear all cached block and loop dispositions during
trivial unswitching. The same is already done for non-trivial
unswitching.
Fixes #58751.
Previously, we only checked for duplicate zero entries when merging a
MemOPSize table (see D92074), but a user recently provided a reproducer
demonstrating that other entries can also be duplicated. As demonstrated
by the test in this patch, PGOMemOPSizeOpt can potentially generate
invalid IR for non-zero, non-consecutive duplicate entries. This seems
to be a rare case, since the duplicate entry is often below the
threshold, but possible. This patch extends the existing warning to
check for any duplicate values in the table, both in the optimization
and in llvm-profdata.
Differential Revision: https://reviews.llvm.org/D136211
Additional SCEV verification highlighted a case where the cached loop
dispositions were incorrect after simplifying a phi node in IndVars.
Fix it by invalidating the phi before replacing it.
Fixes #58750.
If PRE is performed as part of the main GVN pass (to PRE GEP
operands before processing loads), and it is performed across a
backedge, we will end up adding the new instruction to the leader
table of a block that has not yet been processed. When it will be
processed, GVN will incorrectly assume that the value is already
available, even though it is only available at the end of the
block.
Avoid this by not performing PRE across backedges.
Fixes https://github.com/llvm/llvm-project/issues/58418.
Differential Revision: https://reviews.llvm.org/D136095
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.
The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.
High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.
Differential Revision: https://reviews.llvm.org/D135780
SpeculativelyExecuteBB(), which converts a branch + phi structure
into a select, currently bails out if the block contains an assume
(because it is not speculatable).
Adjust the fold to ignore ephemeral values (i.e. assumes and values
only used in assumes) for cost modelling purposes, and drop them
when performing the fold.
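As an illustration, here is the kind of source pattern affected (our example, using clang's `__builtin_assume`, not code from the patch):
```cpp
// The taken arm contains only the speculated value plus an assume. The fold
// can now cost-model the assume as ephemeral, drop it, and emit a select.
int pick(bool c, int a, int b) {
  int v = b;
  if (c) {
    __builtin_assume(a > 0); // previously blocked SpeculativelyExecuteBB()
    v = a;
  }
  return v; // effectively: v = c ? a : b
}
```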
Theoretically, we could try to preserve the assume information by
generating an assume(br_cond || assume_cond) style assume, but this
is very unlikely to be useful (because we don't do anything
useful with assumes of this form) and it would make things
substantially more complicated once we take operand bundle assumes
into account (which don't really support a || operation).
I'd prefer not to do that without good motivation.
Differential Revision: https://reviews.llvm.org/D137339
This is the bugfix to the miscompile mentioned in
https://reviews.llvm.org/D132055#3814831. The IR
that reproduced the bug is added as the test case in
this patch.
What this patch does is that, during the legality phase,
instead of checking the phi nodes only in `InnerLoop`
and `OuterLoop`, we check phi nodes in all subloops
of the `OuterLoop`. For example, if the loop nest is triply
nested, and `InnerLoop` and `OuterLoop` are the middle
loop and the outermost loop respectively, we'll check
phi nodes in the innermost loop as well, in addition to
the ones in the middle and outermost loops.
Reviewed By: Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D134930
CVP currently only tries to simplify comparisons if there is a
constant operand. However, even if both are non-constant, we may
be able to determine the result of the comparison based on range
information.
IPSCCP is already capable of doing this, but because it runs very
early, it may miss some cases.
Differential Revision: https://reviews.llvm.org/D137253
This patch addresses the test cases in which the load has to be inserted at the right point. This happens when there is a store between the loads.
This patch reverts the load merge in all cases where stores are present between the loads; this will eventually be replaced with a proper fix and test cases.
Differential Revision: https://reviews.llvm.org/D137333
Instrumentation passes now use the proper shadow offset. There will be many
asan test failures without this patch. For example:
```
$ ./lib/asan/tests/LOONGARCH64LinuxConfig/Asan-loongarch64-calls-Test
AddressSanitizer:DEADLYSIGNAL
=================================================================
==651209==ERROR: AddressSanitizer: SEGV on unknown address 0x1ffffe2dfa9b (pc 0x5555585e151c bp 0x7ffffb9ec070 sp 0x7ffffb9ebfd0 T0)
==651209==The signal is caused by a UNKNOWN memory access.
```
Before the patch:
```
$ make check-asan
Testing Time: 36.13s
Unsupported : 205
Passed : 83
Expectedly Failed: 1
Failed : 239
```
After the patch:
```
$ make check-asan
Testing Time: 58.98s
Unsupported : 205
Passed : 421
Expectedly Failed: 1
Failed : 89
```
Differential Revision: https://reviews.llvm.org/D137013
This enables odr indicators on all platforms and private aliases on non-Windows.
Note that GCC also uses private aliases: this fixes bogus
`The following global variable is not properly aligned.` errors for interposed global variables.
Fix https://github.com/google/sanitizers/issues/398
Fix https://github.com/google/sanitizers/issues/1017
Fix https://github.com/llvm/llvm-project/issues/36893 (we can restore D46665)
Global variables of non-hasExactDefinition() linkages (i.e.
linkonce/linkonce_odr/weak/weak_odr/common/external_weak) are not instrumented.
If an instrumented variable gets interposed to an uninstrumented variable due to
symbol interposition (e.g. in issue 36893, _ZTS1A in foo.so is resolved to _ZTS1A in
the executable), there may be a bogus error.
With private aliases, the register code will not resolve to a definition in
another module, and thus prevent the issue.
Cons: minor size increase. This is mainly due to extra `__odr_asan_gen_*` symbols.
(ELF) In addition, in relocatable files private aliases replace some relocations
referencing global symbols with .L symbols and may introduce some STT_SECTION symbols.
For lld, with -g0, the size increase is 0.07~0.09% for many configurations I
have tested: -O0, -O1, -O2, -O3, -O2 -ffunction-sections -fdata-sections
-Wl,--gc-sections. With -g1 or above, the size increase ratio will be even smaller.
This patch obsoletes D92078.
Don't migrate Windows for now: the static data member of a specialization
`std::num_put<char>::id` is a weak symbol, as well as its ODR indicator.
Unfortunately, link.exe (and lld without -lldmingw) generally doesn't support
duplicate weak definitions (weak symbols in different TUs likely pick different
defined external symbols and conflict).
Differential Revision: https://reviews.llvm.org/D137227
For some auto-generated sources, we have a huge number of critical
edges (like from switch statements). We have seen an instance of 183777
critical edges in one function.
After we split the critical edges in PGO instrumentation/profile-use
pass, the CFG is so large that we have compiler time issues in
downstream passes (like in machine CSE and block placement). Here I
add a threshold to skip PGO if the number of critical edges is too
large.
The threshold is large enough so that it will not affect the majority
of PGO compilation.
Also sync the logic for skipping instrumentation and profile-use. I
think this is the correct thing to do.
Differential Revision: https://reviews.llvm.org/D137184
This is a corrected version of:
bc886e9b58
I made a copy-paste error that created an "add" instead of the
intended "sub" on that attempt. The regression tests showed the
bug, but I overlooked that.
As I said in a comment on issue #58717, the bug reports resulting
from the botched patch confirm that the pattern does occur in
many real-world applications, so hopefully eliminating the multiply
results in better code.
I added one more regression test in this version of the patch,
and here's an Alive2 proof to show that exact example:
https://alive2.llvm.org/ce/z/dge7VC
Original commit message:
This is a sibling to:
6064e92b0a
...but we canonicalize the shl+add to shl+xor,
so the pattern is different than I expected:
https://alive2.llvm.org/ce/z/8CX16e
I have not found any patterns that are safe
to propagate no-wrap, so that is not included
here.
Differential Revision: https://reviews.llvm.org/D137157
Using a DebugVariable as the set key rather than std::pair<DIVariable *,
DIExpression *> ensures we don't accidentally confuse multiple instances of
inlined variables.
Reviewed By: jryans
Differential Revision: https://reviews.llvm.org/D133303
Follow-on to:
ec0b406e16
This should prevent crashing for example like issue #58552
by not matching a select-of-vectors-with-scalar-condition.
The test that shows a regression seems unlikely to occur
in real code.
This also picks up an optimization in the case where a real
(bitwise) logic op is used. We could already convert some
similar select ops to real logic via impliesPoison(), so
we don't see more diffs on commuted tests. Using commutative
matchers (when safe) might also handle one of the TODO tests.
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands is the result of a first-vector-element splat, can be
simplified to splatting the first vector element of the result of the BinOp.
Differential Revision: https://reviews.llvm.org/D135876
This prevents duplicates from being pushed onto the stack and hypothetically
should reduce the memory footprint in ugly corner cases with multiple repeating
duplicates in an 'and' tree.
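The usual shape of such a guard, as a self-contained sketch (not the actual pass code):
```cpp
#include <unordered_set>
#include <vector>

// Push each value at most once: a visited set filters duplicates before they
// reach the worklist, bounding its size by the number of distinct values.
template <typename T>
void pushOnce(std::vector<T *> &Worklist, std::unordered_set<T *> &Visited,
              T *V) {
  if (Visited.insert(V).second) // second is false if V was already present
    Worklist.push_back(V);
}
```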
Use DL-aware ConstantFoldCompareInstOperands() API instead of
ConstantExpr API. The practical effect of this is that SCCP can
now fold comparisons that require DL.
Relative to the previous attempt, this also updates the
ValueLattice unit tests.
-----
Resolve the TODO about incorrect getCompare() behavior. This can
be made more precise (e.g. by materializing the undef value and
performing constant folding on it), but for now just return an
unknown result to fix the correctness issue.
This should be NFC in terms of user-visible behavior, because the
only user of this method (SCCP) was already guarding against
UndefValue results.
Commit 359bc5c541 caused
Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"'
failures in decomposeGEP when the GEP pointer operand is a vector.
Fix is to use DataLayout::getIndexTypeSizeInBits when fetching the
index size, as it will use the scalar type in case of a ptr vector.
Differential Revision: https://reviews.llvm.org/D137185
This should prevent crashing for the example in issue #58552
by not matching a select-of-vectors-with-scalar-condition.
A similar change is likely needed for the related fold to
properly fix that kind of bug.
The test that shows a regression seems unlikely to occur
in real code.
This also picks up an optimization in the case where a real
(bitwise) logic op is used. We could already convert some
similar select ops to real logic via impliesPoison(), so
we don't see more diffs on commuted tests. Using commutative
matchers (when safe) might also handle one of the TODO tests.
Varargs and inalloca have a weird interaction where varargs are actually
passed via the inalloca alloca. Removing inalloca breaks the varargs
because they're still not passed as separate arguments.
Fixes #58718.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D137182
Resolve the TODO about incorrect getCompare() behavior. This can
be made more precise (e.g. by materializing the undef value and
performing constant folding on it), but for now just return an
unknown result.
At the moment, the implementation requires that the outer GEP has a
single index; the inner GEP can have arbitrary indices, because the
general `decompose` helper is used.
AAPointerInfo now maintains a list of all Access objects that it owns, along
with the following maps:
- OffsetBins: OffsetAndSize -> { Access }
- InstTupleMap: RemoteI x LocalI -> Access
A RemoteI is any instruction that accesses memory. RemoteI is different from
LocalI if and only if LocalI is a call; then RemoteI is some instruction in the
callgraph starting from LocalI.
Motivation: When AAPointerInfo recomputes the offset for an instruction, it sets
the value to Unknown if the new offset is not the same as the old offset. The
instruction must now be moved from its current bin to the bin corresponding to
the new offset. This happens, for example, when:
- A PHINode has operands that result in different offsets.
- The same remote inst is reachable from the same local inst via different paths
in the callgraph:
```
A (local inst)
|
B
/ \
C1 C2
\ /
D (remote inst)
```
This fixes a bug where a store is incorrectly eliminated in a lit test.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D136526
Do not duplicate a BB if it has a lot of PHI nodes.
If a threadable chain is too long then the number of duplicated PHI nodes
can add up, leading to a substantial increase in compile time when rewriting
the SSA.
Fixes https://github.com/llvm/llvm-project/issues/58203
Differential Revision: https://reviews.llvm.org/D136716
The threshold of 76 in this patch is reasonably high and reduces the compile
time of cldwat2m_macro.f90 in SPEC2017/cam4 from 80+min to <2min.
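A sketch of such a guard (the value 76 comes from this message; the helper name is ours, not the patch's):
```cpp
#include "llvm/IR/BasicBlock.h"
#include <iterator>
using namespace llvm;

// Refuse to duplicate blocks whose PHI count would make SSA rewriting of a
// long threadable chain too expensive.
static constexpr unsigned PHIDuplicateThreshold = 76;

static bool hasTooManyPHIsToDuplicate(const BasicBlock &BB) {
  auto Phis = BB.phis();
  return (unsigned)std::distance(Phis.begin(), Phis.end()) >
         PHIDuplicateThreshold;
}
```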
The transform was just added with:
115d2f69a5
...but as noted in post-commit feedback, it was
confusingly coded. Now, we create the final
expected canonicalized form directly and put
an extra use check on the match, so we should
not ever end up with more instructions.
The pointsToConstantMemory() method returns true only if the memory pointed to
by the memory location is globally invariant. However, the LLVM memory model
also has the semantic notion of *locally-invariant*: memory that is known to be
invariant for the life of the SSA value representing that pointer. The most
common example of this is a pointer argument that is marked readonly noalias,
which the Rust compiler frequently emits.
It'd be desirable for LLVM to treat locally-invariant memory the same way as
globally-invariant memory when it's safe to do so. This patch implements that,
by introducing the concept of a *ModRefInfo mask*. A ModRefInfo mask is a bound
on the Mod/Ref behavior of an instruction that writes to a memory location,
based on the knowledge that the memory is globally-constant memory (in which
case the mask is NoModRef) or locally-constant memory (in which case the mask
is Ref). ModRefInfo values for an instruction can be combined with the
ModRefInfo mask by simply using the & operator. Where appropriate, this patch
has modified uses of pointsToConstantMemory() to instead examine the mask.
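A self-contained sketch of the masking idea (a toy mirror of LLVM's `ModRefInfo`, not the real type):
```cpp
#include <cstdint>

// Mod/Ref lattice as bits: the mask is an upper bound on an access's effect,
// so intersecting with '&' can only tighten the result.
enum ModRefInfo : uint8_t { NoModRef = 0, Ref = 1, Mod = 2, ModRef = Ref | Mod };

constexpr ModRefInfo operator&(ModRefInfo A, ModRefInfo B) {
  return ModRefInfo(uint8_t(A) & uint8_t(B));
}

// A call that may read and write, masked by locally-invariant memory
// (mask = Ref), leaves only a read; masked by globally-invariant memory
// (mask = NoModRef), leaves no effect at all.
static_assert((ModRef & Ref) == Ref, "locally-invariant: read remains");
static_assert((ModRef & NoModRef) == NoModRef, "globally-invariant: nothing");
```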
The most notable optimization change I noticed with this patch is that now
redundant loads from readonly noalias pointers can be eliminated across calls,
even when the pointer is captured. Internally, before this patch,
AliasAnalysis was assigning Ref to reads from constant memory; now AA can
assign NoModRef, which is a tighter bound.
Differential Revision: https://reviews.llvm.org/D136659
https://alive2.llvm.org/ce/z/EfHlWN
In the motivating case from issue #58313,
this allows forming a duplicate 'not' op
which then gets CSE'd and simplifyCFG'd
and combined into the expected 'xor'.
When translating offset info from the callee at a call site, first check if the
offset is Unknown. Any offset in the caller should be added only if the callee
offset is valid.
Differential Revision: https://reviews.llvm.org/D137011
Loop Fusion (a function pass) requires loops in simplified form. With
the legacy PM, the loop-simplify pass is added as a dependency of
loop-fusion, but the new pass manager does not always ensure this form.
This patch invokes simplifyLoop() on loops that are not in simplified
form, for the new PM only.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D136781
The existing way of creating the predicate in the guard blocks uses
a boolean value per outgoing block. This increases the number of live
booleans as the number of outgoing blocks increases. The new way added
in this change is to store one integer to represent the outgoing block
we want to branch to; then at each guard block, an integer equality
check is performed to decide whether a specific outgoing block is taken.
Using an integer reduces the number of live values and decreases
register pressure especially in cases where there are a large number
of outgoing blocks. The integer based approach is used when the
number of outgoing blocks crosses a threshold, which is currently set
to 32.
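Schematically (plain C++ standing in for the structurizer's IR; block numbers are illustrative):
```cpp
// With booleans, each of the N outgoing blocks needs its own live i1 flag.
// With the integer scheme, one live value encodes the target, and each guard
// block performs a single equality test against its block's number.
int selectSuccessor(int TargetBlock) {
  if (TargetBlock == 0) return 100; // guard for outgoing block 0
  if (TargetBlock == 1) return 200; // guard for outgoing block 1
  return 300;                       // remaining case: outgoing block 2
}
```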
Patch by Ruiling Song.
Differential Revision: https://reviews.llvm.org/D127831
This is a sibling to:
6064e92b0a
...but we canonicalize the shl+add to shl+xor,
so the pattern is different than I expected:
https://alive2.llvm.org/ce/z/8CX16e
I have not found any patterns that are safe
to propagate no-wrap, so that is not included
here.
Currently, InstCombine can elide a memcpy from a constant to a local alloca if
that alloca is passed as a nocapture parameter to a *function* that's readnone
or readonly, but it can't forward the memcpy if the *argument* is marked
readonly nocapture, even though readonly guarantees that the callee won't
mutate the pointee through that pointer. This patch adds support for detecting
and handling such situations, which arise relatively frequently in Rust, a
frontend that liberally emits readonly.
A more general version of this optimization would use alias analysis to check
the call's ModRef info for the pointee, but I was concerned about blowing up
compile time, so for now I'm just checking for one of readnone on the function,
readonly on the function, or readonly on the parameter.
Differential Revision: https://reviews.llvm.org/D136822
InstCombine can replace memcpy to an alloca with a pointer directly to the
source in certain cases. Unfortunately, it also did so for volatile memcpys.
This patch makes it stop doing that.
This was discovered in D136822.
Differential Revision: https://reviews.llvm.org/D137031
Replace custom code to check if only the first lane is used by generic
helper `onlyFirstLaneUsed`. This enables VPlan-based sinking in a few
additional cases and was suggested in D133760.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D136368
X * ((1 << Z) + 1) --> (X << Z) + X
https://alive2.llvm.org/ce/z/P-7WK9
It's possible that we could do better with propagating
no-wrap, but this carries over the existing logic and
appears to be correct.
The naming differences on the existing folds are a result
of using getName() to set the final value via Builder.
That makes it easier to transfer no-wrap rather than the
gymnastics required from the raw create instruction APIs.
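As a sanity check, the identity holds under wrapping unsigned arithmetic; here is an exhaustive 8-bit verification (our test, not part of the patch):
```cpp
#include <cassert>
#include <cstdint>

int main() {
  // X * ((1 << Z) + 1) == (X << Z) + X for all 8-bit X and in-range Z.
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Z = 0; Z < 8; ++Z) {
      uint8_t Lhs = uint8_t(X * ((1u << Z) + 1));
      uint8_t Rhs = uint8_t(uint8_t(X << Z) + X);
      assert(Lhs == Rhs);
    }
  return 0;
}
```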
The `FunctionSpecialization` pass needs loop analysis results for its
cost function. For this purpose, it computes the `DominatorTree` and
`LoopInfo` for a function in `getSpecializationBonus`. This function,
however, is called O(number of call sites x number of arguments) times, but
the DominatorTree/LoopInfo can be computed just once.
This patch plugs into the PassManager infrastructure to obtain
LoopInfo for a function and removes the ad-hoc computation from
`getSpecializationBonus`.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136332
[recommitting after recommitting a dependency]
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting` )
only once per argument, i.e. doing N-args checks instead of
N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls
lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
In InstCombine we treat i8/i16 as desirable, even if they are not legal.
The current logic in shouldChangeType will decide to convert from an
illegal but desirable type (such as an i8) to an illegal and undesirable
type (such as i3). This patch prevents changing the switch conditions to
an irregular type on targets like Arm/AArch64 where i8/i16 are not legal.
This is the same issue as https://reviews.llvm.org/D54115. In the case I
was looking at, it was converting an i32 switch to an i8 switch, which
then became an i3 switch.
Differential Revision: https://reviews.llvm.org/D136763
There's no reason to shrink a constant or simplify
an operand in 2 steps.
This matches what we currently do for 'add' (although that
seems like it should be altered to handle the commutative
case).
LSR may suggest a less profitable transformation for the loop. This
patch adds a check to prevent LSR from generating worse code than what
we already have.
Since LSR affects nearly all targets, the patch is guarded by the
option 'lsr-drop-solution' and defaults to disabled for now.
The next step should be extending a TTI interface to allow target(s)
to enable this enhancement.
A debug log is added to remind the user of the choice to skip the LSR
solution.
Reviewed By: Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D126043
When calculating the specialization bonus for a given function argument,
we recursively traverse the chain of (certain) users, accumulating the
instruction costs. Then we exponentially increase the bonus to account
for loop nests. This is problematic for two reasons: (a) the users might
not themselves be inside the loop nest; (b) if they are, we are accounting
for it multiple times. Instead we should be adjusting the bonus before
traversing the user chain.
This reduces the instruction count for CTMark (newPM-O3) when Function
Specialization is enabled, without actually reducing the number of
specializations performed (geomean: -0.001% non-LTO, -0.406% LTO).
Differential Revision: https://reviews.llvm.org/D136692
This is copying the code that was added for 'add' with D130075.
(That patch removed a fallthrough in the cases, but we can
probably still share at least some code again as a follow-up
cleanup; I didn't want to risk it here.)
The reasoning is similar to the carry propagation for 'add':
if we don't demand low bits of the subtraction and the
subtrahend (aka RHS or operand 1) is known zero in those low
bits, then there can't be any borrowing required from the
higher bits of operand 0, so the low bits don't matter.
Also, the no-wrap flags can be propagated (and I think that
should be true for add too).
Here's an attempt to prove that in Alive2:
https://alive2.llvm.org/ce/z/xqh7Pa
(can add nsw or nuw to src and tgt, and it should still pass)
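The borrow argument can also be checked exhaustively at 8 bits, demanding only the top nibble with the subtrahend's low nibble known zero (our test, not part of the patch):
```cpp
#include <cassert>
#include <cstdint>

int main() {
  // If b's low 4 bits are zero, the top 4 bits of (a - b) never depend on
  // the low 4 bits of a: no borrow can propagate out of the low nibble.
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; b += 16) { // low nibble of b is zero
      uint8_t Full = uint8_t(a - b);
      uint8_t LowCleared = uint8_t((a & 0xF0) - b); // low bits of a zeroed
      assert((Full & 0xF0) == (LowCleared & 0xF0));
    }
  return 0;
}
```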
Differential Revision: https://reviews.llvm.org/D136788
[fixed test to work with reverse iteration]
The `FunctionSpecialization` pass has support for specialising
functions which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` . There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is an incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
It also changes the criteria for picking an actual argument to
specialise; the argument must:
* be an LLVM IR constant, or
* have a `constant` lattice value, or
* have a `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
The struct OffsetAndSize is a simple tuple of two int64_t. Treating it as a
derived class of std::pair has no special benefit, but it makes the code
verbose since we need get/set functions that avoid using "first" and "second" in
client code. Eliminating the std::pair makes this more readable.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D136745
When a profile is stale and a profile mismatch could happen, the mismatched samples are discarded, so we'd like to compute mismatch metrics to quantify how stale the profile is; a high number suggests the user should refresh the profile.
Two sets of metrics are introduced here:
- (Num_of_mismatched_funchash / Total_profiled_funchash), (Samples_of_mismatched_func_hash / Samples_of_profiled_function): this leverages the checksum attribute of FunctionSamples, which is a feature of pseudo probes. When the source code CFG changes, the function checksums differ, and the sample loader later discards the whole function's samples; this metric shows the percentage of samples discarded for that reason.
- (Num_of_mismatched_callsite / Total_profiled_callsite), (Samples_of_mismatched_callsite / Samples_of_profiled_callsite): this shows how much mismatch there is for callsite locations; a callsite location mismatch affects inlining, which is highly correlated with performance. It goes through all the callsite locations in the IR and the profile, matches them by call target name, and reports the number of samples in the profile that don't match an IR callsite.
This is implemented in a new class (SampleProfileMatcher) and guarded by a switch (`--report-profile-staleness`); we plan to extend it with a fuzzy profile matching feature in the future.
Reviewed By: hoy, wenlei, davidxl
Differential Revision: https://reviews.llvm.org/D136627
If we run LTO optimization we might end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a
follow-up will introduce a check for the SPMD conversion, this already
prevents the eager custom state machine generation: only if the kernel init
function is defined, rather than declared, will we emit a custom state
machine. SPMD-ization can happen eagerly though. Tests are adjusted via a
weak definition. The LTO
test was added to verify this works as expected.
Differential Revision: https://reviews.llvm.org/D136740
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Then reverted again because it broke tests on macOS; they should be
fixed now.
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting` ) only once per argument, i.e. doing N-args checks instead of N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g. for a function
void f(int x, int y)
a call like `f(1, 2)` could be matched by either
void f.1(int x /* int y == 2 */);
or
void f.2(/* int x == 1, int y == 2 */);
The `FunctionSpecialisation` pass tries to match specialisations in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.
This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136180
The `FunctionSpecialization` pass has support for specialising
functions which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` . There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is an incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
It also changes the criteria for picking an actual argument to
specialise; the argument must:
* be an LLVM IR constant, or
* have a `constant` lattice value, or
* have a `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
When collecting the possible constant arguments with which to
specialise a function, the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with
different set of specialisations.
With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135867
Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or to leave a huge remainder.
In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.
In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.
As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.
As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.
Differential Revision: https://reviews.llvm.org/D136695
Small functions with size under a given threshold are not
considered for specialisation, on the presumption that they
are easy to inline. This does not apply to `noinline`
functions, though.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135862
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove DataFlowSanitizerLegacyPass.
Differential Revision: https://reviews.llvm.org/D124594
This reverts commit bd7949bcd8.
Revert this patch since reviewers have different opinions regarding
the approach in post-commit review.
Will open RFC for further discussion.
Differential Revision: https://reviews.llvm.org/D132408
This reverts commit 04877284b4.
Looks like this is still breaking the test
Profile-x86_64 :: instrprof-darwin-dead-strip.c
(see comment on https://reviews.llvm.org/D135340).
This addresses a bug where vector versions of ctlz are creating false positive reports.
Depends on D136369
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D136523
When -asan-max-inline-poisoning-size=0, all shadow memory access should be
outlined (through asan calls). This was not occurring when partial poisoning
was required on the right side of a variable's redzone. This diff contains
the changes necessary to implement and utilize __asan_set_shadow_01() through
__asan_set_shadow_07(). The change is necessary for the full abstraction of
the asan implementation and will enable experimentation with alternate strategies.
Differential Revision: https://reviews.llvm.org/D136197
This is a sibling transform to the fold just above it. That was changed
to allow the corresponding commuted patterns with:
3073074562e1bd759ea58628e6df70
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
Improve O(N^2) to O(N) in some cases, and reduce the number of allocations
by reserving memory.
Also, improve the analysis of load reduction values to avoid analyzing
non-vectorizable cases.
This teaches the SCCP Solver how to constant fold more intrinsics. Constant
folding appears to be just as good as D115737 but with much, much lower
code-change impact, as suggested by nikic.
The constrained floating-point intrinsics all take at least one metadata
argument and were the motivation for the change.
Differential Revision: https://reviews.llvm.org/D136466
(sign|resign) + (auth|resign) can be folded by omitting the middle
sign+auth component if the key and discriminator match.
Differential Revision: https://reviews.llvm.org/D132383
Without a freeze, this transform can leak poison to the output:
https://alive2.llvm.org/ce/z/GJuF9i
This makes the transform as uniform as possible, and it can help
reduce patterns like issue #58313 (although that particular
example probably still needs another transform).
Differential Revision: https://reviews.llvm.org/D136527
This doesn't touch objc-arc-contract because that's in the codegen pipeline.
However, this does move its corresponding initialize function into initializeCodegen().
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D135041
When the common value is part of either select condition,
this is safe to reduce. Otherwise, it is not poison-safe
(with the select form of the pattern):
https://alive2.llvm.org/ce/z/FxQTzB
This is another patch motivated by issue #58313.
1) Use a static array of pointers to retain the dummy vars.
2) Associate liveness of the array with that of the runtime hook variable
__llvm_profile_runtime.
3) Perform the runtime initialization through the runtime hook variable.
4) Preserve the runtime hook variable using the -u linker flag.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D136192
This is obviously correct for real logic instructions,
and it also works for the poison-safe variants that use
selects:
https://alive2.llvm.org/ce/z/wyHiwX
This is motivated by the lack of 'xor' folding seen in issue #58313.
This more general fold should help reduce some of those patterns,
but I'm not sure if this specific case does anything for that
particular example.
This allows the regular bitwise logic opcodes in addition to the
poison-safe select variants:
https://alive2.llvm.org/ce/z/8xB9gy
Handling commuted variants safely is likely trickier, so that's
left to another patch.
Additional SCEV verification highlighted a case where the cached loop
dispositions were incorrect after simplifying a condition in IndVars
and moving the user in LoopDeletion. Fix it by invalidating ICmp and all
its users.
Fixes #58515.
This implements IR and bitcode support for the memory attribute,
as specified in https://reviews.llvm.org/D135597.
The new attribute is not used for anything yet (and as such, the
old memory attributes are unaffected).
Differential Revision: https://reviews.llvm.org/D135592
Currently, compiling a program with the `-pg` flag will result in an
undefined symbol error for `.mcount`. This revision fixes the call to
use `__mcount`, which requires a pointer argument to a pointer-sized
object (unique per inserted call) on AIX.
This is only a partial fix. This patch should fix the `-pg` flag's
behaviour on AIX to work with code you are compiling, but it will not
link against standard libraries with `mcount` instrumentation calls. The
next step is to add profiled libraries to the linker search paths in the
Clang driver for the AIX toolchain when linking with `-pg`.
Differential Revision: https://reviews.llvm.org/D135384
Canonicalize GEP of GEP by swapping a GEP with some suffix constant indices to the back (and a GEP with all constant indices to the back of that); this allows more constant-index GEP merging to happen. Exceptions are when swapping would violate use-def relations or anti-optimize LICM.
For constant-indexed GEP of GEP, if they cannot be merged directly, they will be cast to i8* and merged.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D125845
The code in buildScalarSteps already properly handles creating the
scalar induction values with VF = 1. Use it directly instead of using
extra code to handle that case.
Suggested by @Ayal in D133760.
Reapplying after the fix for volatile modelling in D135863.
-----
Don't add argmem if the pointer is clearly not an argument (e.g.
a global). I don't think this makes a difference right now, but
gives more obvious results with D135780.
Bug introduced in e239198cdb.
The assert() makes the assumption that the resulting shuffle mask
will always select elements from both vectors; this is untrue in the
case of two shuffles being folded if the former shuffle has a mask with
undef elements in it. In such a case, folding the shuffles might result
in a mask which only selects from one of the vectors, because the other
elements (in the mask) are undef.
Differential Revision: https://reviews.llvm.org/D136256
Per LangRef, volatile operations are allowed to access the location
of their pointer argument, plus inaccessible memory:
> Any volatile operation can have side effects, and any volatile
> operation can read and/or modify state which is not accessible
> via a regular load or store in this module.
> [...]
> The allowed side-effects for volatile accesses are limited. If
> a non-volatile store to a given address would be legal, a volatile
> operation may modify the memory at that address. A volatile
> operation may not modify any other memory accessible by the
> module being compiled. A volatile operation may not call any
> code in the current module.
FunctionAttrs currently does not model this and ends up marking
functions with volatile accesses on arguments as argmemonly,
even though they should be inaccessiblemem_or_argmemonly.
Differential Revision: https://reviews.llvm.org/D135863
This reverts commit 8ef3fd8d59.
I mentioned that GlobalAlias was not handled. It turns out GlobalAlias has to be handled in the same patch (as opposed to in a follow-up),
as otherwise clang codegen of C5/D5 constructor/destructor would regress (https://reviews.llvm.org/D135427#3869003).
@Ayal suggested a better-named helper than using `!getDef()` to check if
a value is invariant across all parts.
The property we are using here is that the VPValue is defined outside
any vector loop region. There's a TODO left to handle recipes defined in
pre-header blocks.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133666
This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.
Differential Revision: https://reviews.llvm.org/D135137
Followup to D135962 to rename remaining uses of
FunctionModRefBehavior to MemoryEffects. Does not touch API names
yet, but also updates variables names FMRB/MRB to ME, to match the
new type name.
Generalized the cost model estimation: improved the estimate for
repeated scalars (their cost no longer needs to be counted) and
improved the cost model for extractelement instructions.
cpu2017
511.povray_r 0.57
520.omnetpp_r -0.98
521.wrf_r -0.01
525.x264_r 3.59 <+
526.blender_r -0.12
531.deepsjeng_r -0.07
538.imagick_r -1.42
Geometric mean: 0.21
Differential Revision: https://reviews.llvm.org/D115757
Fixes an SROA crash.
This is fallout from opaque pointers: with typed pointers, we would have
bailed out at the bitcast.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136119
Moving an instruction can invalidate the cached block dispositions of
the corresponding SCEV. Invalidate the cached dispositions.
Also fixes a copy-paste error in forgetBlockAndLoopDispositions where
the start expression S was removed from BlockDispositions in the loop
but not the current values. This was also exposed by the new test case.
Fixes #58439.
When SimplifyLibCalls fails to optimize printf and sprintf, it adds
NoUndef/NonNull/Dereferenceable attributes. This patch adds the same
attributes when SimplifyLibCalls optimizes printf/sprintf into the
integer-only iprintf/siprintf.
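A sketch of the intended result (the exact attribute placement here is
illustrative, not copied from the patch):
```
@.fmt = private unnamed_addr constant [4 x i8] c"%d\0A\00"
declare i32 @iprintf(ptr, ...)

define void @f(i32 %x) {
  ; the simplified call carries the pointer attributes that the original
  ; printf call would have received
  %r = call i32 (ptr, ...) @iprintf(ptr noundef nonnull dereferenceable(4) @.fmt, i32 %x)
  ret void
}
```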
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136140
When unrolling, the exit values in LCSSA phis will get updated.
Invalidate cached SCEV values for those phis in case SCEV looked through
an exit phi.
Fixes #58340.
If the arithmetic for indices of inbounds GEPs overflows, the result is
poison, so it is also OK for the coefficients to overflow. GEP
decomposition is limited to cases where the index size is <= 64 bits,
so all offsets can be represented by the int64_t used for the
coefficients in the constraint system.
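For instance (a hypothetical case):
```
define ptr @f(ptr %p, i64 %off) {
  ; the byte offset is 4 * %off; with inbounds, overflow in that
  ; arithmetic makes %gep poison anyway, so the int64_t coefficient in
  ; the constraint system is allowed to wrap as well
  %gep = getelementptr inbounds i32, ptr %p, i64 %off
  ret ptr %gep
}
```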
This reverts commit 233659c7ae.
I see some sanitizer buildbot failures. Not sure if this change is
causing them, but let's see if a revert returns the bots to green...
We can't assume that operand 0 is the negated operand because
the matcher handles "fsub -0.0, X" (and also +0.0 with FMF).
By capturing the extract within the match, we avoid the bug
and make the transform more robust (can't assume that this
pass will only see canonical IR).
Relative to the previous attempt, this is rebased over the
InstSimplify fix in ac74e7a780,
which addresses the miscompile reported in PR58401.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
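A sketch of the general shape (hypothetical IR, not a test from the
patch):
```
define i32 @f(i1 %c, i32 %x) {
entry:
  br i1 %c, label %if, label %join
if:
  br label %join
join:
  ; 'or %phi, 7' simplifies for the -1 input (to -1) but not for %x
  %phi = phi i32 [ %x, %if ], [ -1, %entry ]
  %r = or i32 %phi, 7
  ret i32 %r
}
; after the fold (sketch), the 'or' moves into the predecessor block
; with the non-simplified operand:
;   if:   %r.if = or i32 %x, 7
;   join: %r = phi i32 [ %r.if, %if ], [ -1, %entry ]
```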
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
LoopFlatten has been in the code base off by default for years; this
enables it to run by default. Downstream, it has been running for years,
so it has been exposed to quite a lot of code. Around the time we
switched to the NPM, several fixes went in related to updating the
MemorySSA state, and we moved it to a loop pass manager, which both
helped prevent rerunning certain analysis passes and thus helped a bit
with compile-times.
Regarding compile-times, adding a pass isn't free, but this should see
only very minor increases. The pass is relatively simple, and there
shouldn't be anything algorithmically expensive, because all it does is
look at inner/outer loops and check assumptions on loop increments and
indices. If we do see increases, I expect them to come mainly from
invalidation of analysis info, and perhaps from subsequent passes
triggering and doing more. Despite its simplicity/restrictions, it
triggers in most code-bases, which makes it worth enabling by default.
Differential Revision: https://reviews.llvm.org/D109958
Another alternative to fix the thread identification problem in
coroutines.
We plan to fix this problem by unifying the memory-effect attributes. See
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
But that may be a long-term project, and it is a pity that coroutines
haven't been able to resume in different threads for years. So this is a
temporary fix. It may cause unnecessary performance regressions for
coroutines, but correctness is more important. This is planned to be
reverted once we actually unify the memory-effect attributes.
Reviewed By: jdoerfert, rjmccall
Differential Revision: https://reviews.llvm.org/D135550
Instead of duplicating the existing decomposition code for GEP indices,
reuse it by calling the decompose function on the index expression and
multiplying the result's coefficients by the scale of the index.
This both reduces code duplication and generalizes the pattern we can
handle.
If both operands of an `add nsw` are known positive, it can be treated
the same as `add nuw` and added to the unsigned system.
https://alive2.llvm.org/ce/z/6gprff
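A hypothetical example that becomes provable with this change:
```
declare void @llvm.assume(i1)

define i1 @f(i32 %a, i32 %b) {
  %a.pos = icmp sgt i32 %a, 0
  call void @llvm.assume(i1 %a.pos)
  %b.pos = icmp sgt i32 %b, 0
  call void @llvm.assume(i1 %b.pos)
  ; both operands are positive, so this 'add nsw' cannot wrap in the
  ; unsigned sense either and can be added to the unsigned system
  %sum = add nsw i32 %a, %b
  %cmp = icmp uge i32 %sum, %a
  ret i1 %cmp   ; can now be folded to true
}
```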
If the divisor is a power-of-2 or negative-power-of-2 and the dividend
is known to have at least as many trailing zeros as the divisor, the
division is exact:
https://alive2.llvm.org/ce/z/UGBksM (general proof)
https://alive2.llvm.org/ce/z/D4yPS- (examples based on regression tests)
This isn't the most direct optimization (we could create ashr in these
examples instead of relying on existing folds for exact divides), but
it's possible that there's a more general constraint than just a pow2
divisor, so this might be extended in the future.
This should solve issue #58348.
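For example (a sketch; the 'exact' flag then lets existing folds turn
the division into a shift):
```
define i32 @f(i32 %x) {
  %shl = shl i32 %x, 3     ; at least 3 trailing zero bits
  %div = sdiv i32 %shl, 8  ; remainder is always 0, so this can be
                           ; marked 'sdiv exact'
  ret i32 %div
}
```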
Differential Revision: https://reviews.llvm.org/D135970
This should be functionally equivalent - both calls are thin
wrappers around computeKnownBits(). We'll probably want to use
known-bits directly in follow-up patches because that could
determine "exact" for example (see issue #58348).
Support decomposition for `mul/shl nuw` with constant operand for unsigned
queries. Those expressions should not wrap in the unsigned sense and can
be added directly to the unsigned system.
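For instance (hypothetical):
```
define i1 @f(i8 %a) {
  ; 'shl nuw' by 2 is exactly 4 * %a with no unsigned wrap, so the
  ; equality %s == 4 * %a can be added to the unsigned system
  %s = shl nuw i8 %a, 2
  %c = icmp uge i8 %s, %a
  ret i1 %c   ; provable: 4 * %a >= %a when the shift cannot wrap
}
```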
makeLoopInvariant may recursively move its operands to make them
invariant, before moving the passed in instruction. Those recursively
moved instructions are currently missed when invalidating block and loop
dispositions.
To address this, move the invalidation code to Loop::makeLoopInvariant.
Fixes #58314.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D135909
`ProvenanceAnalysis::related()` was assuming that the order of the
parameters to `relatedCheck()` did not affect the result, but this was
not the case when both parameters were `PHINode`s.
Due to this assumption, `ProvenanceAnalysis::related()` was ordering the
parameters based on pointer value, which resulted in non-deterministic
behavior.
To address this, change `relatedPHI()` so that it gives the same result
independent of the parameter order.
rdar://100325456
Differential Revision: https://reviews.llvm.org/D135376
Instead of checking whether a map entry exists to decide if we should
initialize it or add to it, we can rely on the map entry being constructed
and initialized to 0 before the addition happens.
For the std::max case, I've made a reference to the map entry to
avoid looking it up twice.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135977
A BB with a nonzero count whose successor blocks all have 0 counts
could trigger an assertion. Don't create any branch weights in this
case, avoiding the assertion.
Reviewed By: xur
Differential Revision: https://reviews.llvm.org/D134203
(A | B) & ~(A & B) --> A ^ B
https://alive2.llvm.org/ce/z/qpFMns
We already have the equivalent fold for real
logic instructions, but this pattern may occur
with selects too.
This is part of solving issue #58313.
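The select (poison-safe logical) form of the pattern looks like this
(sketch):
```
define i1 @f(i1 %a, i1 %b) {
  %or  = select i1 %a, i1 true, i1 %b     ; logical A | B
  %and = select i1 %a, i1 %b, i1 false    ; logical A & B
  %not = xor i1 %and, true
  %r   = select i1 %or, i1 %not, i1 false ; (A | B) & ~(A & B)
  ret i1 %r                               ; --> xor i1 %a, %b
}
```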
This was reverted because it broke Darwin targets, which tried to
export these symbols that are now hidden. It should be safe to simply
stop attempting to export these symbols in the clang driver, though
Apple folks will need to change their TAPI allow list described in the
commit where these symbols were originally exported:
f538018562
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
Move logic to check and replace conditions to a helper function. This
isolates the code, allows using early returns, reduces the
indentation and simplifies eliminateConstraints.
When generating masked gather nodes, the SLP vectorizer accounts for
the cost of the GEPs for loads as part of the scalar-to-vector
transformation cost estimation. But it does not do so for vectorized
loads/stores, even though vectorization may remove some of the GEPs
entirely. Because of this, in some cases a masked gather operation can
appear much more profitable than regular vectorization (masked-gather
cost + vector GEP - scalar loads + GEPs, compared to vectorized loads -
scalar loads).
Added analysis of the removed scalar GEPs for vectorized load/store
nodes for better cost estimation.
Differential Revision: https://reviews.llvm.org/D135282
This reverts commit b05f5b90a1.
There are thread sanitizer buildbot failures in simple_stack.c.
I think that's because this ended up affecting the handling of
volatile accesses to allocas. Reverting for now.
Limit pointer decomposition to pointers with index sizes of at most 64
bits. int64_t is used for coefficients, so as long as the index size <=
64 bits we should be able to represent all pointer offsets.
Pointer decomposition is limited to inbounds GEPs, where an index
computation that overflows yields poison, so it doesn't matter if the
coefficient overflows.
This allows replacing MulOverflow with regular multiplications.
We have to account for accesses to argument memory via captures.
I don't think there's any way to make this produce incorrect
results right now (because as soon as "other" is set, we lose the
ability to infer argmemonly), but this avoids incorrect results
once we have a more precise representation.
The code for inferring memory attributes on arguments claims that
inalloca/preallocated arguments are always clobbered:
d71ad41080/llvm/lib/Transforms/IPO/FunctionAttrs.cpp (L640-L642)
However, we would still infer memory attributes for the whole
function without taking this into account, so we could still end
up inferring readnone for the function. This adds an argument
clobber if there are any inalloca/preallocated arguments.
Differential Revision: https://reviews.llvm.org/D135783
Add a check (which can be disabled via a flag) that the pipeline we
generate is actually parsable. It can be disabled because we don't
expect to handle every pass in -print-pipeline-passes.
Fixes #58280.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135703
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/CLKzqT
This requires a surprising "nuw" constraint because we have
to guard against immediate UB via signed-div overflow with
-1 divisor.
This extends 008a89037a and is another transform
derived from issue #58137.
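A sketch with no-wrap flags on both shifts (the alive2 link above has
the precise preconditions):
```
define i8 @f(i8 %x, i8 %y, i8 %z) {
  %xs = shl nuw nsw i8 %x, %z
  %ys = shl nuw nsw i8 %y, %z
  ; folds to 'sdiv i8 %x, %y'; the nuw requirement guards against
  ; creating an INT_MIN / -1 overflow that the original could not hit
  %d = sdiv i8 %xs, %ys
  ret i8 %d
}
```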
Need to set the insert point for extractelement to the first
instruction in the node to avoid a possible crash during the
external-uses combining process. Without it, we may end up with an
incorrect transformation.
Differential Revision: https://reviews.llvm.org/D135591
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/E5eaxU
This fixes the motivating example from issue #58137,
but it is not the most general transform. We should
probably also convert left-shift in the divisor to
right-shift in the dividend for that, but that exposes
another missed canonicalization for shifts and adds.
The freeze instruction in some cases makes codegen worse, so we need to
be very careful when emitting it. Instead, improve the analysis in the
isUndefVector function to generate a mask of unused elements and use it
in the analysis.
Differential Revision: https://reviews.llvm.org/D135382
optimizeInductions may leave dead recipes which can prevent sinking.
Sinking on the other hand should not introduce new dead recipes, so
clean up dead recipes before sinking.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D133762