Commit Graph

30067 Commits

Author SHA1 Message Date
minglotus-6 e2074de6a8 [ProfSampleLoader] When disable-sample-loader-inlining is true, merge profiles of inlined instances to outlining versions.
When --disable-sample-loader-inlining is true, skip inline transformation, but merge profiles of inlined instances to outlining versions.

Differential Revision: https://reviews.llvm.org/D121862
2022-03-23 13:02:48 -07:00
chenglin.bi 52f323d0f1 [InstCombine] Fold abs of known negative operand when source is sub
When abs source comes from (x - y), check if a "x > y" dominating
condition exists.

Fixes #54132

Differential Revision: https://reviews.llvm.org/D122013
2022-03-23 15:21:33 -04:00
Arthur Eubanks 9bd66b312c [PassManager][Coroutine] Run passes under -O0 conditionally and run GlobalDCE
CoroSplit lowers various coroutine intrinsics. It's a CGSCC pass and
CGSCC passes don't run on unreachable functions. Normally GlobalDCE will
come along and delete unreachable functions, but we don't run GlobalDCE
under -O0, so an unreachable function with coroutine intrinsics may
never have CoroSplit run on it.

This patch adds GlobalDCE when coroutines intrinsics are present. It
also now runs all coroutine passes conditional when coroutine intrinsics
are present. This should also solve the -O0 regression reported in
D105877 due to LazyCallGraph construction.

Fixes https://github.com/llvm/llvm-project/issues/54117

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D122275
2022-03-23 11:03:26 -07:00
Arthur Eubanks e6ead19b77 Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash."
This reverts commit 27bd8f9492.

Causes crashes, see comments in D121973
2022-03-23 10:57:45 -07:00
Nikita Popov 29fada4a3d [EarlyCSE] Don't eagerly optimize MemoryUses
EarlyCSE currently optimizes all MemoryUses upfront. However,
EarlyCSE only actually queries the clobbering memory access for
a subset of uses, namely those where a CSE candidate has already
been identified. Delaying use optimization to the clobber query
improves compile-time in practice.

This change is not NFC because EarlyCSE has a limit on the number
of clobber queries (EarlyCSEMssaOptCap), in which case it falls
back to the defining access. The defining access for uses will now
no longer coincide with the optimized access.

If there are performance regressions from this change, we should
be able to address them by raising this limit.

Differential Revision: https://reviews.llvm.org/D121987
2022-03-23 16:47:35 +01:00
Sanjay Patel 0fcff69bcb [InstCombine] try to narrow shifted bswap-of-zext (2nd try)
The first attempt at this missed a validity check.
This version includes a test of the narrow source
type for modulo-16-bits.

Original commit message:

This is the IR counterpart to 370ebc9d9a
which provided a bswap narrowing fix for issue #53867.

Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo

This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.

Differential Revision: https://reviews.llvm.org/D122166
2022-03-23 11:28:37 -04:00
Alexandros Lamprineas a687f96b0f [FuncSpec][NFC] Clang-format the source code and fix debug typo. 2022-03-23 14:39:58 +00:00
Nikita Popov ba36556145 [InstrProfiling] Account for missing bitcast/GEP
This code is supposed to clean up a constexpr bitcast/GEP, but
with opaque pointers this ends up dropping references to the
global.
2022-03-23 15:39:39 +01:00
serge-sans-paille 1b89c83254 Cleanup includes: Transforms/Instrumentation & Transforms/Vectorize
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D122181
2022-03-23 11:06:13 +01:00
Nathan Chancellor 4e0008dcbe
Revert "[InstCombine] try to narrow shifted bswap-of-zext"
This reverts commit 9e9bda2e8f.

This causes a backend error when building the Linux kernel for arm64.
See https://reviews.llvm.org/D122166 for a simplified reproducer.
2022-03-22 17:32:33 -07:00
Vasileios Porpodas 27bd8f9492 Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit f7d7d2a08d.
2022-03-22 16:41:55 -07:00
Philip Reames 7abefc4222 [instcombine] Fold away memset/memmove from otherwise unused alloca
The motivation for this is that while both memcpyopt and dse will catch this case, both are limited by MSSA's walk back threshold when finding clobbers.  As such, if you have a memcpy of an otherwise dead alloca placed towards the end of a long basic block with lots of other memory instructions, it would be missed.  This is a bit undesirable for such an "obviously" useless bit of code.

As noted in comments, we should probably generalize instcombine's escape analysis peephole (see visitAllocInst) to allow read xor write.  Doing that would subsume this code in a more general way, but is also a more involved change.  For the moment, I went with the easiest fix.
2022-03-22 13:48:48 -07:00
Arthur Eubanks f7d7d2a08d Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads.""
This reverts commit 79613185d3.

Causes crashes, see comments in https://reviews.llvm.org/D121973.
2022-03-22 13:33:49 -07:00
Sanjay Patel ccf8c969c2 [InstCombine] reorder code, fix formatting; NFC
The affected code can be updated to solve #54364,
so make some cosmetic diffs before real changes.
2022-03-22 16:33:01 -04:00
Florian Hahn 50c8588e44
[LV] Remove Loop argument from createInductionResumeValues (NFCI).
createInductionResumeValues only uses its loop argument only to get the
pre-header, but the pre-header is already known (we created/cached it
earlier). Remove the unneeded loop argument.
2022-03-22 14:23:12 +00:00
Sanjay Patel 60820e53ec [InstCombine] try to canonicalize logical shift after bswap
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C

This is an IR implementation of a transform suggested in D120648.
The "swaps cancel" test models the motivating optimization from
that proposal.

Alive2 checks (as noted in the other review, we could use
knownbits to handle shift-by-variable-amount, but that can be an
enhancement patch):
https://alive2.llvm.org/ce/z/pXUaRf
https://alive2.llvm.org/ce/z/ZnaMLf

Differential Revision: https://reviews.llvm.org/D122010
2022-03-22 09:10:55 -04:00
Djordje Todorovic 91ea247039 [Debugify] Use DebugifyLevel in Debugify original mode
Before this patch the DebugifyLevel option was used for
the synthetic mode, so after this, it will be used in
the original mode as well.

Differential Revision: https://reviews.llvm.org/D115623
2022-03-22 14:04:56 +01:00
Nikita Popov afb526b3f4 [LICM] Handle store of pointer to itself (PR54495)
Rather than iterating over users and comparing operands, iterate
over uses and check operand number. Otherwise, we'll end up
promoting a store twice if it has two equal operands.

This can only happen with opaque pointers, as otherwise both
operands differ by a level of indirection, so a bitcast would have
to be involved.

Fixes https://github.com/llvm/llvm-project/issues/54495.
2022-03-22 14:00:07 +01:00
Sanjay Patel 9e9bda2e8f [InstCombine] try to narrow shifted bswap-of-zext
This is the IR counterpart to 370ebc9d9a
which provided a bswap narrowing fix for issue #53867.

Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo

This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.

Differential Revision: https://reviews.llvm.org/D122166
2022-03-22 08:22:30 -04:00
Djordje Todorovic 73777b4c35 [Debugify] Optimize debugify original mode
Before we start addressing the issue with having
a lot of false positives when using debugify in
the original mode, we have made a few patches that
should speed up the execution of the testing
utility Passes.

For example, when testing a large project
(let's say LLVM project itself), we can face
a lot of potential DI issues. Usually, we use
-verify-each-debuginfo-preserve (that is very
similar to -debugify-each) -- it collects
DI metadata before each Pass, and after the Pass
it checks if the Pass preserved the DI metadata.
However, we can speed up this process, since we
don't need to collect DI metadata before each
Pass -- we could use the DI metadata that are
collected after the previous Pass from
the pipeline as an input for the next Pass.

This patch speeds up the utility for ~2x.

Differential Revision: https://reviews.llvm.org/D115622
2022-03-22 12:14:00 +01:00
serge-sans-paille a53b689f0c Fix missing include under -DEXPENSIVE_CHECK
Regression introduced by f1985a3f85
2022-03-22 10:37:56 +01:00
serge-sans-paille f1985a3f85 Cleanup includes: Transforms/IPO
Preprocessor output diff: -238205 lines
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D122183
2022-03-22 10:06:28 +01:00
Chuanqi Xu 902f4708fe [NFC] [Coroutines] Remove unnecessary check and constraints on SmallVector
The CoroSplit pass would check the existence of coroutine intrinsic
before starting work. It is not necessary and wasteful since it would
iterate over the Module.

This patch also removes the constraint on the corresponding of the
SmallVector for the possible coroutines in the Modules. The original
value is 4. Given coroutines is used actually in practice. 4 is really
relatively a low threshold.
2022-03-22 14:24:46 +08:00
Vasileios Porpodas 79613185d3 Recommit "[SLP] Fix lookahead operand reordering for splat loads."
Original review: https://reviews.llvm.org/D121354

The original commit 9136145eb0 broke the build on several targets.

Differential Revision: https://reviews.llvm.org/D121973
2022-03-21 15:57:32 -07:00
Hirochika Matsumoto 86f970e595 [IROutliner][NFC] Fix typo in doc of findOrCreatePHIInBlock
Typo Fix in Documentation

Author: hkmatsumoto

Reviewers: AndrewLitteken

Differential Revision: https://reviews.llvm.org/D121627
2022-03-21 12:34:20 -05:00
Philip Reames ee7324b898 Rename mayBeMemoryDependent to mayHaveNonDefUseDependency [nfc] 2022-03-21 10:01:40 -07:00
Andrew Litteken 4e500df89e [IROutliner] Fix phi nodes when self referential within block but doesn't contain branch
When outlining a phi node, if the the incoming branch is a block contained in the region and the branch from that block is not outlined, we create broken code. The fix is to recognize when that branch from the included incoming block is not contained, and ignore the region.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D121311
2022-03-21 11:05:15 -05:00
psamolysov-intel 2ed030ba88 [InferAddressSpaces][NFC] Small code improvements for the InferAddressSpaces pass
There is a bunch of code improvements in the patch: marking as const everything what can be
const and fixing some typos in comments.

Also the patch removes the shadowing parameter TTI from the rewriteWithNewAddressSpaces
method, the TTI parameter is not required because the same field is in the class.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D121671
2022-03-21 11:03:12 -05:00
Alexey Bataev 79a182371e [SLP]Make stricter check for instructions that do not require
scheduling.

Need to check that the instructions with external operands can be
reordered safely before actualy exclude them from the scheduling.
2022-03-21 06:09:12 -07:00
Sophia 72bde608d2 [LV] Fix typo in comment
Reviewed by: fhahn (Florian Hahn)

    Differential Revision: https://reviews.llvm.org/D121781
2022-03-21 20:30:05 +08:00
Florian Hahn 0ebac76e6e
[LV] Remove unneeded Loop argument from completeLoopSkeleton. (NFCI)
completeLoopSkeleton only uses its loop argument only to get the
pre-header, but the pre-header is already known (we created/cached it
earlier). Remove the unneeded loop argument.
2022-03-21 10:07:25 +00:00
Andrew Litteken 38e8880e93 [IROutliner] Do not outlined from functions with optnone
Since the IROutliner is performing an optimization, it should not outline from functions explicitly marked with optnone. This adds an extra check and test to make sure this does not occur.

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D121567
2022-03-20 23:39:23 -05:00
Florian Hahn 487629cc61
[LV] Remove dead Loop argument from emitMemRuntimeChecks. (NFC) 2022-03-20 21:01:15 +00:00
Philip Reames b7806c8b37 [SLP] Explicit track required stacksave/alloca dependency
The semantics of an inalloca alloca instruction requires that it not be reordered with a preceeding stacksave intrinsic call.  Unfortunately, there's no def/use edge or memory dependence edge.  (THe memory point is slightly subtle, but in general a new allocation can't alias with a call which executes strictly before it comes into existance.)

I'd tried to tackle this same case previously in 689babdf6, but the fix chosen there turned out to be incomplete.  As such, this change contains a fully revert of the first fix attempt.

This was noticed when investigating problems which surfaced with D118538, but this is definitely an existing bug.  This time around, I managed to reduce a couple of additional cases, including one which was being actively miscompiled even without the new scheduling change.  (See test diffs)

Compile time wise, we only spend extra time when seeing a stacksave (rare), and even then we walk the block at most once per schedule window extension.  Likely a non-issue.
2022-03-20 13:58:45 -07:00
Kazu Hirata bce1bf0ee2 [Transform] Apply clang-tidy fixes for readability-redundant-smartptr-get (NFC) 2022-03-20 10:41:22 -07:00
Philip Reames 6253b77da9 [SLP] Respect control dependence within a block during scheduling
This fixes an active miscompile visible in the test changes.  The basic problem is that the scheduling dependency graph didn't have any edges for control dependence within a single basic block.  The result is that we could (and in some rare cases *did*) perform reorderings within a block which could introduce new undefined behavior along paths which didn't previously contain any.

Impact wise, we have two major cases where control is not guaranteed to reach a later instruction in the block: may throw calls, and calls containing infinite loops.
* The former case was mostly covered by the memory dependencies, and to trigger require a function which can throw, but not write to memory.  In theory, such a case is possible, but not likely in practice.
* The later case is likely more of an issue in practice.  After this code was first written, we changed the IR semantics to allow well defined infinite loops without satisifying mustprogress.  Even for C/C++ - which do imply mustprogress - recent changes to how we treat atomics (e.g. an atomic read does not always imply a write) could expose this issue.  I'm a bit shocked we don't seem to have a bug report which hit this in real code actually.

Compile time wise, this results in a single extra scan of the scheduling window in the common case.  Since we stop scanning at the next instruction which isn't guaranteed to execute, no matter what order we traverse instructions in, we scan the block once.  The exception to this is that when we extend the scheduling window downwards, we invalidate all dependencies, and thus rescan.  So the potentially expensive case is when we a call in a big schedule window which is frequently extended.  We could optimize this case (by caching the last instruction not guaranteeed to transfer execution and scanning only the extended window) and starting there), but I decided to leave the complexity until it mattered.  That same case is already degenerate with memory dependences which is more expensive than the control dependence scan.

We could also consider combining the memory dependence and control dependence sets to reduce memory usage, but since it complicates the code slightly and makes debugging a bit harder, I went with the simplest scheme for now.

This was noticed while trying to understand the failures reported against D118538, but is not otherwise related to that change.
2022-03-19 13:36:24 -07:00
Florian Hahn 1a820ff039
[LV] Remove unnecessary uses of Loop* (NFC).
Update functions that previously took a loop pointer but only to get the
pre-header. Instead, pass the block directly. This removes the
requirement for the loop object to be created up-front.
2022-03-19 20:18:47 +00:00
Johannes Doerfert 4166738c38 [OpenMP][FIX] Do not crash when kernels are debug wrapper functions
With debug information enabled (-g) Clang will wrap the actual target
region into a new function which is called from the "kernel". The problem
is that the "kernel" is now basically a wrapper without all the things
we expect. More importantly, if we end up asking for an AAKernelInfo
for the "target region function" we might try to turn it into SPMD mode.
That used to cause an assertion as that function doesn't have an
appropriately named `_exec_mode` global. While the global is going away
soon we still need to make sure to properly handle this case, e.g.,
perform optimizations reliably.

Differential Revision: https://reviews.llvm.org/D122043
2022-03-19 14:15:55 -05:00
Fangrui Song c6692f819e [GlobalOpt] Don't replace alias with aliasee if either alias/aliasee may be preemptible
Generalize D99629 for ELF. A default visibility non-local symbol is preemptible
in a -shared link. `isInterposable` is an insufficient condition.

Moreover, a non-preemptible alias may be referenced in a sub constant expression
which intends to lower to a PC-relative relocation. Replacing the alias with a
preemptible aliasee may introduce a linker error.

Respect dso_preemptable and suppress optimization to fix the abose issues. With
the change, `alias = 345` will not be rewritten to use aliasee in a `-fpic`
compile.
```
int aliasee;
extern int alias __attribute__((alias("aliasee"), visibility("hidden")));
void foo() { alias = 345; } // intended to access the local copy
```

While here, refine the condition for the alias as well.

For some binary formats like COFF, `isInterposable` is a sufficient condition.
But I think canonicalization for the changed case has little advantage, so I
don't bother to add the `Triple(M.getTargetTriple()).isOSBinFormatELF()` or
`getPICLevel/getPIELevel` complexity.

For instrumentations, it's recommended not to create aliases that refer to
globals that have a weak linkage or is preemptible. However, the following is
supported and the IR needs to handle such cases.
```
int aliasee __attribute__((weak));
extern int alias __attribute__((alias("aliasee")));
```

There are other places where GlobalAlias isInterposable usage may need to be
fixed.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D107249
2022-03-18 14:17:05 -07:00
Philip Reames 1093949cff [SLP] Add comment clarifying assumption that tripped me up [NFC]
I keep thinking this assumption is probably exploitable for a bug in the existing implementation, but all of my attempts at writing a test case have failed.  So for the moment, just document this very subtle assumption.
2022-03-18 11:40:19 -07:00
Kazu Hirata 3e0f7c7881 [Vectorize] Fix an 'unused function' warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3917:13: error:
  unused function 'needToScheduleSingleInstruction'
  [-Werror,-Wunused-function]
2022-03-18 11:24:57 -07:00
Kazu Hirata b3d8c0d069 [Vectorize] Fix an 'unused variable' warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8148:18: error:
  unused variable 'SDTE' [-Werror,-Wunused-variable]
2022-03-18 11:24:54 -07:00
Nick Desaulniers e1bae23f6f [SCCP] do not clean up dead blocks that have their address taken
[SCCP] do not clean up dead blocks that have their address taken

Fixes a crash observed in IPSCCP.

Because the SCCPSolver has already internalized BlockAddresses as
Constants or ConstantExprs, we don't want to try to update their Values
in the ValueLatticeElement. Instead, continue to propagate these
BlockAddress Constants, continue converting BasicBlocks to unreachable,
but don't delete the "dead" BasicBlocks which happen to have their
address taken.  Leave replacing the BlockAddresses to another pass.

Fixes: https://github.com/llvm/llvm-project/issues/54238
Fixes: https://github.com/llvm/llvm-project/issues/54251

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D121744
2022-03-18 11:02:15 -07:00
Philip Reames 8f108c32bc Revert "[SLP] Optionally preserve MemorySSA"
This reverts commit 1cfa986d68.  See https://github.com/llvm/llvm-project/issues/54256 for why I'm discontinuing the project.

Seperately, it turns out that while this patch does correctly preserve MSSA, it's correct only at the end of the pass; not between vectorization attempts.  Even if we decide to resurrect this, we'll need to fix that before reapplying.
2022-03-18 10:45:59 -07:00
Florian Mayer 078b546555 [HWASan] do not replace lifetime intrinsics with tagged address.
Quote from the LLVM Language Reference
  If ptr is a stack-allocated object and it points to the first byte of the
  object, the object is initially marked as dead. ptr is conservatively
  considered as a non-stack-allocated object if the stack coloring algorithm
  that is used in the optimization pipeline cannot conclude that ptr is a
  stack-allocated object.

By replacing the alloca pointer with the tagged address before this change,
we confused the stack coloring algorithm.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D121835
2022-03-18 10:39:51 -07:00
Florian Mayer dbc918b649 Revert "[HWASan] do not replace lifetime intrinsics with tagged address."
Failed on buildbot:

/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/llc: error: : error: unable to get target for 'aarch64-unknown-linux-android29', see --version and --triple.
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-project/llvm/test/Instrumentation/HWAddressSanitizer/stack-coloring.ll --check-prefix=COLOR

This reverts commit 208b923e74.
2022-03-18 10:04:48 -07:00
Florian Hahn 5ab421fb4e
[LICM] Add allowspeculation pass options.
This adds a new option to control AllowSpeculation added in D119965 when
using `-passes=...`.

This allows reproducing #54023 using opt.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D121944
2022-03-18 16:51:57 +00:00
Florian Mayer 208b923e74 [HWASan] do not replace lifetime intrinsics with tagged address.
Quote from the LLVM Language Reference
  If ptr is a stack-allocated object and it points to the first byte of the
  object, the object is initially marked as dead. ptr is conservatively
  considered as a non-stack-allocated object if the stack coloring algorithm
  that is used in the optimization pipeline cannot conclude that ptr is a
  stack-allocated object.

By replacing the alloca pointer with the tagged address before this change,
we confused the stack coloring algorithm.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D121835
2022-03-18 09:45:05 -07:00
Nikita Popov ab2284a643 [LowerConstantIntrinsics] Make TLI a required dependency
The way the pass is actually used in the optimization pipeline,
TLI will be available, but this is not the case when running just
-lower-constant-intrinsics in tests, which ends up being quite
confusing.

Require TLI unconditionally, as we usually do.
2022-03-18 14:59:18 +01:00
Nikita Popov fc8946fae7 [InstCombine] Remove integer SPF of SPF folds (NFCI)
Now that we canonicalize to intrinsics, these folds should no
longer be needed. Only one fold that also applies to floating-point
min/max is retained.
2022-03-18 10:20:48 +01:00