Commit Graph

27111 Commits

Author SHA1 Message Date
Nikita Popov 2fe85dd289 [Attributor] Don't access pointer elem type in constructPointer (NFC)
Splitting this out as the change is non-trivial: The way this code
handled pointer types doesn't really make sense, as GEPs can only
apply an offset to the outermost pointer, but can't drill down
into interior pointer types (which would require dereferencing
memory).

Instead give special treatment to the first (pointer) index.
I've hardcoded it to zero as that's the only way the function is
used right now, but handling non-zero indexes would be
straightforward.

The original goal here was to have an element type for CreateGEP.
2021-03-11 21:36:40 +01:00
Petr Hosek 87fd09b25f [InstrProfiling] Generate runtime hook for ELF platforms
When using -fprofile-list to selectively apply instrumentation only
to certain files or functions, we may end up with a binary that doesn't
have any counters in the case where no files were selected. However,
because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
runtime would still be pulled in and incur some non-trivial overhead,
especially in the case when the continuous or runtime counter relocation
mode is being used. A better way would be to pull in the profile runtime
only when needed by declaring the __llvm_profile_runtime symbol in the
translation unit only when needed.

This approach was already used prior to 9a041a7522, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation.
Since TAPI is only used on Mach-O platforms, we could use the early
emission of __llvm_profile_runtime there, and on other platforms we
could change back to the earlier approach where the symbol is generated
later only when needed. We can stop passing -u__llvm_profile_runtime to
the linker on Linux and Fuchsia since the generated undefined symbol in
each translation unit that needed it serves the same purpose.

Differential Revision: https://reviews.llvm.org/D98061
2021-03-11 12:29:01 -08:00
Valery N Dmitriev 73f94969b2 [SLP] Fix crash when matching associative reduction for integer min/max.
Associative reduction matcher in SLP begins with select instruction but when
it reached call to llvm.umax (or alike) via def-use chain the latter also matched
as UMax kind. The routine's later code assumes matched instruction to be a select
and thus it merely died on the first encountered cast that did not fit.

Differential Revision: https://reviews.llvm.org/D98432
2021-03-11 11:52:57 -08:00
Wenlei He 051f2c144e [SamplePGO] Skip inlinee profile scaling for sample loader inlining
For CGSCC inline, we need to scale down a function's branch weights and entry counts when thee it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weigths for the inlined clone based on call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using context profile that is separated from inlinee's own profile. This change skip the inlinee profile scaling for sample loader inlining.

Differential Revision: https://reviews.llvm.org/D98187
2021-03-11 10:18:26 -08:00
Hiroshi Yamauchi 365b225d46 [PGO] Fix two issues in PGOMemOPSizeOpt.
1. PGOMemOPSizeOpt grabs only the first, up to five (by default) entries from
the value profile metadata and preserves the remaining entries for the fallback
memop call site. If there are more than five entries, the rest of the entries
would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes
up to 3 (by default) values, but potentially not for other downstream passes
that may use the value profile metadata.

2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept
track of. When the range buckets were introduced, it was changed to skip the
range buckets, but since it does not grab all entries (only five), if some range
buckets exist in the first five entries, it could potentially cause fewer
promotion opportunities (eg. if 4 out of 5 were range buckets, it may be able to
promote up to one non-range bucket, as opposed to 3.) Also, combined with 1, it
means that wrong entries may be preserved, as it didn't correctly keep track of
which were entries were skipped.

To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number
of value profile buckets), keeps track of which entries were skipped, and
preserves all the remaining entries.

Differential Revision: https://reviews.llvm.org/D97592
2021-03-11 09:53:05 -08:00
Stephen Tozer f40976bd01 Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This reverts commit c0f3dfb9f1.

Reverted due to an error on the clang-x64-windows-msvc buildbot.
2021-03-11 14:48:01 +00:00
Nikita Popov 46354bac76 [OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC)
Explicitly pass loaded type when creating loads, in preparation
for the deprecation of these APIs.

There are still a couple of uses left.
2021-03-11 14:40:57 +01:00
gbtozers c0f3dfb9f1 [DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands
This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic
operations with two or more non-const operands; this includes the GetElementPtr
instruction, and most Binary Operator instructions. These salvages produce
DIArgList locations and are only valid for dbg.values, as currently variadic
DIExpressions must use DW_OP_stack_value. This functionality is also only added
for salvageDebugInfoForDbgValues; other functions that directly call
salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be
updated in a later patch.

Differential Revision: https://reviews.llvm.org/D91722
2021-03-11 13:33:49 +00:00
Nikita Popov 403da6a69a Reapply [LICM] Make promotion faster
Relative to the previous implementation, this always uses
aliasesUnknownInst() instead of aliasesPointer() to correctly
handle atomics. The added test case was previously miscompiled.

-----

Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-11 10:50:28 +01:00
Djordje Todorovic 9f41c03f82 [Debugify][OriginalDIMode] Export the report into JSON file
By using the original-di check with debugify in the combination with
the llvm/utils/llvm-original-di-preservation.py it becomes very user
friendly tool. An example of the HTML page with the issues
related to debug info can be found at [0].

[0] https://djolertrk.github.io/di-checker-html-report-example/

Differential Revision: https://reviews.llvm.org/D82546
2021-03-11 01:11:13 -08:00
Petr Hosek c7712087cb [InstrProfiling] Don't generate __llvm_profile_runtime_user
This is no longer needed, we can add __llvm_profile_runtime directly
to llvm.compiler.used or llvm.used to achieve the same effect.

Differential Revision: https://reviews.llvm.org/D98325
2021-03-10 22:33:51 -08:00
Ruiling Song 8b7d3bed0f [ValueMapper] Add debug output for metadata remapping
This is useful for debugging which pointers are updated during remapping
process.

Differential Revision: https://reviews.llvm.org/D95775
2021-03-11 09:54:55 +08:00
kuterd d75c9e61a5 [Attributor] Attributor call site specific AAValueConstantRange
This patch makes uses of the context bridges introduced in D83299 to make
AAValueConstantRange call site specific.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83744
2021-03-11 01:19:44 +03:00
Mauri Mustonen 0de8aeae72
[VPlan] Support to widen select intructions in VPlan native path
Add support to widen select instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer.

Previously select instructions get handled by the wrong recipe and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97136
2021-03-10 20:59:53 +00:00
Matteo Favaro 989051d5f8 [DSE] Extending isOverwrite to support offsetted fully overlapping stores
The isOverwrite function is making sure to identify if two stores
are fully overlapping and ideally we would like to identify all the
instances of OW_Complete as they'll yield possibly killable stores.
The current implementation is incapable of spotting instances where
the earlier store is offsetted compared to the later store, but
still fully overlapped. The limitation seems to lie on the
computation of the base pointers with the
GetPointerBaseWithConstantOffset API that often yields different
base pointers even if the stores are guaranteed to partially overlap
(e.g. the alias analysis is returning AliasResult::PartialAlias).

The patch relies on the offsets computed and cached by BatchAAResults
(available after D93529) to determine if the offsetted overlapping
is OW_Complete.

Differential Revision: https://reviews.llvm.org/D97676
2021-03-10 21:09:33 +01:00
Sriraman Tallam 0ba1ebcbb7 Remove original implementation of UniqueInternalLinkageNames pass.
D96109 was recently submitted which contains the refactored implementation of
-funique-internal-linakge-names by adding the unique suffixes in clang rather
than as an LLVM pass. Deleting the former implementation in this change.

Differential Revision: https://reviews.llvm.org/D98234
2021-03-10 11:57:40 -08:00
gbtozers 81b8357e70 [DebugInfo][NFC] Refactor BinOp+GEP salvaging in salvageDebugInfoImpl
This patch refactors out the salvaging of GEP and BinOp instructions into
separate functions, in preparation for further changes to the salvaging of these
instructions coming in another patch; there should be no functional change as a
result of this refactor.

Differential Revision: https://reviews.llvm.org/D92851
2021-03-10 18:03:12 +00:00
Daniil Seredkin 7c49f3c75b [InstCombine][SimplifyLibCalls] An extra sqrtf was produced because of transformations in optimizePow function
See: https://bugs.llvm.org/show_bug.cgi?id=47613

There was an extra sqrt call because shrinking emitted a new powf and at the same time optimizePow replaces the previous pow with sqrt and as the result we have two instructions that will be in worklist of InstCombie despite the fact that %powf is not used by anyone (it is alive because of errno).

As the result we have two instructions:

  %powf = call fast float @powf(float %x, float 5.000000e-01)
  %sqrt = call fast double @sqrt(double %dx)

%powf will be converted to %sqrtf on a later iteration.

As a quick fix for that I moved shrinking to the end of optimizePow so that pow is replaced with sqrt at first that allows not to emit a new shrunk powf.

Differential Revision: https://reviews.llvm.org/D98235
2021-03-10 12:33:05 -05:00
Ta-Wei Tu 7ff2768be1 Revert "[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`"
This reverts commit df9158c9a4.
2021-03-11 01:24:43 +08:00
Jianzhou Zhao 6a9a686ce7 [dfsan] Tracking origins at phi nodes
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D98268
2021-03-10 17:02:58 +00:00
Dávid Bolvanský c68b560be3 [DSE] Handle memmove with equal non-const sizes
Follow up for fhahn's D98284. Also fixes a case from PR47644.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D98346
2021-03-10 17:52:00 +01:00
Florian Hahn 8d9b9c0edc
[DSE] Handle memcpy/memset with equal non-const sizes.
Currently DSE misses cases where the size is a non-const IR value, even
if they match. For example, this means that llvm.memcpy/llvm.memset
calls are not eliminated, even if they write the same number of bytes.

This patch extends isOverwite to try to get IR values for the number of
bytes written from the analyzed instructions. If the values match,
alias checks are performed and the result is returned.

At the moment this only covers llvm.memcpy/llvm.memset. In the future,
we may enable MemoryLocation to also track variable sizes, but this
simple approach should allow us to cover the important cases in DSE.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98284
2021-03-10 10:13:58 +00:00
Wei Mi ee35784a90 [SampleFDO] Support enabling -funique-internal-linkage-name.
now -funique-internal-linkage-name flag is available, and we want to flip
it on by default since it is beneficial to have separate sample profiles
for different internal symbols with the same name. As a preparation, we
want to avoid regression caused by the flip.

When we flip -funique-internal-linkage-name on, the profile is collected
from binary built without -funique-internal-linkage-name so it has no uniq
suffix, but the IR in the optimized build contains the suffix. This kind of
mismatch may introduce transient regression.

To avoid such mismatch, we introduce a NameTable section flag indicating
whether there is any name in the profile containing uniq suffix. Compiler
will decide whether to keep uniq suffix during name canonicalization
depending on the NameTable section flag. The flag is only available for
extbinary format. For other formats, by default compiler will keep uniq
suffix so they will only experience transient regression when
-funique-internal-linkage-name is just flipped.

Another type of regression is caused by places where we miss to call
getCanonicalFnName. Those places are fixed.

Differential Revision: https://reviews.llvm.org/D96932
2021-03-09 21:41:40 -08:00
Philip Reames cf1899e0a9 [rs4gc] common bdv operand visitation [nfc] 2021-03-09 20:28:47 -08:00
Arnold Schwaighofer 590ac0a26a [coro async] Transfer the original function's attributes to the clone
rdar://75052917

Differential Revision: https://reviews.llvm.org/D98051
2021-03-09 17:01:41 -08:00
Leonard Chan cf371573b0 [llvm] Change DSOLocalEquivalent type if the underlying global value type changes
We encountered an issue where LTO running on IR that used the DSOLocalEquivalent
constant would result in bad codegen. The underlying issue was ValueMapper wasn't
properly handling DSOLocalEquivalent, so this just adds the machinery for handling
it. This code path is triggered by a fix to DSOLocalEquivalent::handleOperandChangeImpl
where DSOLocalEquivalent could potentially not have the same type as its underlying GV.

This updates DSOLocalEquivalent::handleOperandChangeImpl to change the type if
the GV type changes and handles this constant in ValueMapper.

Differential Revision: https://reviews.llvm.org/D97978
2021-03-09 15:09:48 -08:00
Sanjay Patel 23fd647cc6 [SLP] remove dead null check; NFC
We cast<> to Instruction (not dyn_cast<>), so we already
required/assumed that Cmp is not null.
2021-03-09 17:43:07 -05:00
Jianzhou Zhao 8506fe5b41 [dfsan] Tracking origins at memory transfer
This is a part of https://reviews.llvm.org/D95835.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D98192
2021-03-09 22:15:07 +00:00
Juneyoung Lee f49354838e Revert "[InstCombine] Add simplification of two logical and/ors"
This reverts commit 07c3b97e18 due to a reported failure in two-stage build.
2021-03-10 05:48:31 +09:00
gbtozers df69c69427 [DebugInfo] Handle multiple variable location operands in IR
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.

Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.

Differential Revision: https://reviews.llvm.org/D88232
2021-03-09 16:44:38 +00:00
Sanjay Patel 2986a9c7e2 [InstCombine] canonicalize 'not' op after min/max intrinsic
This is another step towards parity between existing select
transforms and min/max intrinsics (D98152)..

The existing 'not' folds around select are complicated, so
it's likely that we will need to enhance this, but this
should be a safe step.
2021-03-09 11:33:28 -05:00
Sanjay Patel 41b9209a12 [InstCombine] fold min/max intrinsics with not ops
This is a partial translation of the existing select-based
folds. We need to recreate several different transforms to
avoid regressions as noted in D98152.

https://alive2.llvm.org/ce/z/teuZ_J
2021-03-09 08:55:48 -05:00
Florian Hahn 92da5b7119
[InstCombine] Simplify phis with incoming pointer-casts.
If the incoming values of a phi are pointer casts of the same original
value, replace the phi with a single cast. Such redundant phis are
somewhat common after loop-rotate and removing them can avoid some
unnecessary code bloat, e.g. because an iteration of a loop is peeled
off to make the phi invariant. It should also simplify further analysis
on its own.

InstCombine already uses stripPointerCasts in a couple of places and
also simplifies phis based on the incoming values, so the patch should
fit in the existing scope.

The patch causes binary changes in 47 out of 237 benchmarks in
MultiSource/SPEC2000/SPEC2006 with -O3 -flto on X86.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98058
2021-03-09 11:40:18 +00:00
Hongtao Yu 4c3d759d00 [CSSPGO] Always use callsite samples as callsite probe counts.
For CS profile, the callsite count of previously inlined callees is populated with the entry count of the callees. Therefore when trying to get a weight for calliste probe after inlinining, the callsite count should always be used. The same fix has already been made for non-probe case.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D98094
2021-03-08 22:52:36 -08:00
Akira Hatanaka dca5737945 Move ObjCARCUtil.h back to llvm/Analysis
Instead of adding the header to llvm/IR, just duplicate the marker
string in the auto upgrader.
2021-03-08 16:35:24 -08:00
Alina Sbirlea 29482426b5 Revert "[LICM] Make promotion faster"
Revert 3d8f842712
Revision triggers a miscompile sinking a store incorrectly outside a
threading loop. Detected by tsan.
Reverting while investigating.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-08 12:53:03 -08:00
Philip Reames ebc61f9d3c [instcombine] Collapse trivial or recurrences
If we have a recurrence of the form <Start, Or, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.

Differential Revision: https://reviews.llvm.org/D97578 (or part)
2021-03-08 09:21:38 -08:00
Philip Reames 239a618180 [instcombine] Collapse trivial and recurrences
If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.

Differential Revision: https://reviews.llvm.org/D97578 (and part)
2021-03-08 09:21:38 -08:00
Philip Reames 97a7bc5831 [gvn] Precisely propagate equalities to phi operands
The code used for propagating equalities (e.g. assume facts) was conservative in two ways - one of which this patch fixes. Specifically, it shifts the code reasoning about whether a use is dominated by the end of the assume block to consider phi uses to exist on the predecessor edge. This matches the dominator tree handling for dominates(Edge, Use), and simply extends it to dominates(BB, Use).

Note that the decision to use the end of the block is itself a conservative choice. The more precise option would be to use the later of the assume and the value, and replace all uses after that. GVN handles that case separately (with the replace operand mechanism) because it used to be expensive to ask dominator questions within blocks. With the new instruction ordering support, we should probably rewrite this code at some point to simplify.

Differential Revision: https://reviews.llvm.org/D98082
2021-03-08 08:59:00 -08:00
Sanne Wouda 05a6e2eb9a [InstCombine] Add a combine for a shuffle of similar bitcasts
Some intrinsics wrapper code has the habit of ignoring the type of the
elements in vectors, thinking of vector registers as a "bag of bits". As
a consequence, some operations are shared between vectors of different
types are shared. For example, functions that rearrange elements in a
vector can be shared between vectors of int32 and float.

This can result in bitcasts in awkward places that prevent the backend
from recognizing some instructions. For AArch64 in particular, it
inhibits the selection of dup from a general purpose register (GPR), and
mov from GPR to a vector lane.

This patch adds a pattern in InstCombine to move the bitcasts past the
shufflevector if this is possible. Sometimes this even allows
InstCombine to remove the bitcast entirely, as in the included tests.

Alternatively this could be done with a few extra patterns in the
AArch64 backend, but InstCombine seems like a better place for this.

Differential Revision: https://reviews.llvm.org/D97397
2021-03-08 16:32:30 +00:00
Sanne Wouda 5e963a2441 Rehome an orphaned comment [NFC]
As seen in 35827164c4, the "shuffle x, x, mask" comment has drifted away
from the implementation of the pattern. Put it back.
2021-03-08 16:32:30 +00:00
Stephen Tozer 4343c68fa3 Fix: [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch removed the only use of a lambda capture, triggering an error
on `-Werror -Wunused-lambda-capture` builds.
2021-03-08 14:57:11 +00:00
gbtozers e5d958c456 [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.

Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
2021-03-08 14:36:13 +00:00
Ta-Wei Tu df9158c9a4 [LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`
The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`.
In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour.
Replacing it with the much robust version provided by `LoopNest` reduces code duplications and fixes https://bugs.llvm.org/show_bug.cgi?id=48113.

`LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`.
Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D97290
2021-03-08 11:36:08 +08:00
Mehdi Amini 8d5a981a13 Revert "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe"
This reverts commit 99108c791d.
Clang is miscompiling LLVM with this change, a stage-2 build hits
multiple failures.

As a repro, I built clang in a stage1 directory and used it this way:

cmake -G Ninja ../llvm \
  -DCMAKE_CXX_COMPILER=`pwd`/../build-stage1/bin/clang++ \
  -DCMAKE_C_COMPILER=`pwd`/../build-stage1/bin/clang \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_BUILD_EXAMPLES=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=On
ninja check-mlir
2021-03-08 00:15:47 +00:00
Juneyoung Lee 07c3b97e18 [InstCombine] Add simplification of two logical and/ors
This is a patch that adds folding of two logical and/ors that share one variable:

a && (a && b) -> a && b
a && (a & b)  -> a && b
...

This is towards removing the poison-unsafe select optimization (D93065 has more context).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96945
2021-03-08 02:38:43 +09:00
Juneyoung Lee d672c81126 [InstCombine] use safe transformation by default
.. since it will be folded into and/or anyway
2021-03-08 02:25:29 +09:00
Nikita Popov 2b494f85f1 [CVP] Remove -cvp-dont-add-nowrap-flags option
This option was originally added to work around a bug in LFTR.
The bug has long since been fixed.
2021-03-07 18:19:31 +01:00
Nikita Popov 176bbcae11 [DSE] Remove MemDep-based implementation
The MemorySSA-based implementation has been enabled without issue
for a while now, so keeping the old implementation around doesn't
seem useful anymore. This drops the MemDep-based implementation.

Differential Revision: https://reviews.llvm.org/D97877
2021-03-07 18:17:31 +01:00
Juneyoung Lee 33590ed4f2 [InstCombine] fix another poison-unsafe select transformation
This fixes another unsafe select folding by disabling it if
EnableUnsafeSelectTransform is set to false.

EnableUnsafeSelectTransform's default value is true, hence it won't
affect generated code (unless the flag is explicitly set to false).
2021-03-08 02:11:04 +09:00
Juneyoung Lee 99108c791d [SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe
This patch makes FoldBranchToCommonDest merge branch conditions into `select i1` rather than `and/or i1` when it is called by SimplifyCFG.
It is known that merging conditions into and/or is poison-unsafe, and this is towards making things *more* correct by removing possible miscompilations.
Currently, InstCombine simply consumes these selects into and/or of i1 (which is also unsafe), so the visible effect would be very small. The unsafe select -> and/or transformation will be removed in the future.
There has been efforts for updating optimizations to support the select form as well, and they are linked to D93065.

The safe transformation is fired when it is called by SimplifyCFG only. This is done by setting the new `PoisonSafe` argument as true.
Another place that calls FoldBranchToCommonDest is LoopSimplify. `PoisonSafe` flag is set to false in this case because enabling it has a nontrivial impact in performance because SCEV is more conservative with select form and InductiveRangeCheckElimination isn't aware of select form of and/or i1.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95026
2021-03-08 01:38:03 +09:00
Juneyoung Lee 5bb38e84d3 [LoopUnswitch] unswitch if cond is in select form of and/or as well
Hello all,
I'm trying to fix unsafe propagation of poison values in and/or conditions by using
equivalent select forms (`select i1 A, i1 B, i1 false` and `select i1 A, i1 true, i1 false`)
instead.
D93065 has links to patches for this.

This patch allows unswitch to happen if the condition is in this form as well.
`collectHomogenousInstGraphLoopInvariants` is updated to keep traversal if
Root and the visiting I matches both m_LogicalOr()/m_LogicalAnd().
Other than this, the remaining changes are almost straightforward and simply replaces
Instruction::And/Or check with match(m_LogicalOr()/m_LogicalAnd()).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D97756
2021-03-08 01:19:43 +09:00
Nikita Popov 3fedaf2a52 [GVN] Don't explicitly materialize undefs from dead blocks
When materializing an available load value, do not explicitly
materialize the undef values from dead blocks. Doing so will
will force creation of a phi with an undef operand, even if there
is a dominating definition. The phi will be folded away on
subsequent GVN iterations, but by then we may have already
poisoned MDA cache slots.

Simply don't register these values in the first place, and let
SSAUpdater do its thing.
2021-03-06 23:46:24 +01:00
Fangrui Song fb2cf0dd60 [FunctionImport] Delete unneeded setLive. NFC
ValueInfo's in Worklist are guaranteed to be live.
2021-03-06 14:09:54 -08:00
Mauri Mustonen 494b5ba364
[VPlan] Support to widen call intructions in VPlan native path
Add support to widen call instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer.

Previously call instructions got handled by wrong recipes and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139.

Patch by Mauri Mustonen <mauri.mustonen@tuni.fi>

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97278
2021-03-06 21:59:52 +00:00
Roman Lebedev 2ad1f5eb1a
[InstCombine] Don't canonicalize (gep i8* X, -(ptrtoint Y)) as (inttoptr (sub (ptrtoint X), (ptrtoint Y)))
It's just a wrong thing to do.

We introduce inttoptr where there were none, which results in
loosing all provenance information because we no longer have a GEP{i,},
and pessimize all future optimizations,
because we are basically not allowed to look past `inttoptr`.

(gep i8* X, -(ptrtoint Y))  *is* the canonical form.
So just drop this fold.

Noticed while reviewing D98120.
2021-03-06 23:00:25 +03:00
Fangrui Song 9fb6782c69 [rs4gc] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds 2021-03-06 11:42:27 -08:00
Roman Lebedev b46c085d2b
[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.

This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.

Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.

Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
2021-03-06 21:52:46 +03:00
Ta-Wei Tu 8a003861a3 [NPM] Add -enable-loopinterchange option to NPM
We have the `enable-loopinterchange` option in legacy pass manager but not in NPM.
Add `LoopInterchange` pass to the optimization pipeline (at the same position as before)
when `enable-loopinterchange` is turned on.

Reviewed By: aeubanks, fhahn

Differential Revision: https://reviews.llvm.org/D98116
2021-03-07 02:39:28 +08:00
William S. Moses d163e75c81 [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-06 12:57:32 -05:00
Philip Reames 5db2735af9 [gvn] Handle simply phi equivalence cases
GVN basically doesn't handle phi nodes at all. This is for a reason - we can't value number their inputs since the predecessor blocks have probably not been visited yet.

However, it also creates a significant pass ordering problem. As it stands, instcombine and simplifycfg ends up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress.

This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN since blocks which use the cse'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.)

Compile time wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect.

Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi.

Differential Revision: https://reviews.llvm.org/D98080
2021-03-06 09:31:12 -08:00
Philip Reames 8fe59ba51e [rs4gc] track the original value in the state use for base pointer rewriting
I'd originally intended to build on this for another purpose and have decided not to, but at a minimum, the stronger asserts are useful.
2021-03-06 08:46:15 -08:00
Philip Reames 6334952ff0 [rs4gc] minor code style improvement 2021-03-06 08:46:15 -08:00
Philip Reames 51b13a7ea0 [gvn] CSE gc.relocates based on meaning, not spelling
The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoints multiple return values.)

We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particular useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass.

Differential Revision: https://reviews.llvm.org/D97974

(Note: Part of the reviewed change was split and landed as f352463a)
2021-03-05 10:16:12 -08:00
Philip Reames f352463ade Mark gc.relocate and gc.result as readnone
For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism.  gc.relocate and gc.result are simply projections of the return values from the associated statepoint.  Note that the LangRef has always declared them readnone.

The EarlyCSE change is simply moving the special casing from readonly to readnone handling.

As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice.

This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing.  The motivation is to enable the GVN changes in that patch.
2021-03-05 10:07:17 -08:00
Philip Reames 99f93dd3a5 [rs4gc] avoid insert base computation instructions for deopt uses
If we have a value live over a call which is used for deopt at the call, we know that the value must be a base pointer. We can avoid potentially inserting IR to materialize a base for this value.

In it's current form, this is mostly a compile time optimization.   Building the base pointer graph (and then optimizing it away again) is a relatively expensive operation.  We also sometimes end up with better codegen in practice - due to failures in optimizing away the inserted base pointer propogation - but those are optimization bugs we're fixing concurrently.

The alternative to this would be to extend the base pointer inference with the ability to generally reuse multiple-base input instructions (phis and selects).  That's somewhat invasive and complicated, so we're defering it a bit longer.

Differential Revision: https://reviews.llvm.org/D97885
2021-03-05 09:55:36 -08:00
gbtozers 65600cb2a7 [DebugInfo] Add DIArgList MD to store multple values in DbgVariableIntrinsics
This patch adds a new metadata node, DIArgList, which contains a list of SSA
values. This node is in many ways similar in function to the existing
ValueAsMetadata node, with the difference being that it tracks a list instead of
a single value. Internally, it uses ValueAsMetadata to track the individual
values, but there is also a reasonable amount of DIArgList-specific
value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special
case in parsing and printing due to the fact that it requires a function state
(as it may reference function-local values).

This patch should not result in any immediate functional change; it allows for
DIArgLists to be parsed and printed, but debug variable intrinsics do not yet
recognize them as a valid argument (outside of parsing).

Differential Revision: https://reviews.llvm.org/D88175
2021-03-05 17:02:24 +00:00
David Sherwood fec0a0adac [SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector
There are certain loops like this below:

  for (int i = 0; i < n; i++) {
    a[i] = b[i] + 1;
    *inv = a[i];
  }

that can only be vectorised if we are able to extract the last lane of the
vectorised form of 'a[i]'. For fixed width vectors this already works since
we know at compile time what the final lane is, however for scalable vectors
this is a different story. This patch adds support for extracting the last
lane from a scalable vector using a runtime determined lane value. I have
added support to VPIteration for runtime-determined lanes that still permit
the caching of values. I did this by introducing a new class called VPLane,
which describes the lane we're dealing with and provides interfaces to get
both the compile-time known lane and the runtime determined value. Whilst
doing this work I couldn't find any explicit tests for extracting the last
lane values of fixed width vectors so I added tests for both scalable and
fixed width vectors.

Differential Revision: https://reviews.llvm.org/D95139
2021-03-05 09:57:56 +00:00
Michael Kruse b119120673 [clang][OpenMP] Use OpenMPIRBuilder for workshare loops.
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
 * Only the worksharing-loop directive.
 * Recognizes only the nowait clause.
 * No loop nests with more than one loop.
 * Untested with templates, exceptions.
 * Semantic checking left to the existing infrastructure.

This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
 * The distance function: How many loop iterations there will be before entering the loop nest.
 * The loop variable function: Conversion from a logical iteration number to the loop variable.

These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.

The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.

For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.

Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.

Reviewed By: jdenny

Differential Revision: https://reviews.llvm.org/D94973
2021-03-04 22:52:59 -06:00
Wei Mi 2357d29335 [SampleFDO] Another fix to prevent repeated indirect call promotion in
sample loader pass.

In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
to prevent repeated indirect call promotion for the same indirect call
and the same target, we used zero-count value profile to indicate an
indirect call has been promoted for a certain target. We removed
PromotedInsns cache in the same patch. However, there was a problem in
that patch described below, and that problem led me to add PromotedInsns
back as a mitigation in
https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf.

When we get value profile from metadata by calling getValueProfDataFromInst,
we need to specify the maximum possible number of values we expect to read.
We uses MaxNumPromotions in the last patch so the maximum number of value
information extracted from metadata is MaxNumPromotions. If we have many
values including zero-count values when we write the metadata, some of them
will be dropped when we read them because we only read MaxNumPromotions
values. It will allow repeated indirect call promotion again. We need to
make sure if there are values indicating promoted targets, those values need
to be saved in metadata with higher priority than other values.

The patch fixed that problem. We change to use -1 to represent the count
of a promoted target instead of 0 so it is easier to sort the values.
When we prepare to update the metadata in updateIDTMetaData, we will sort
the values in the descending count order and extract only MaxNumPromotions
values to write into metadata. Since -1 is the max uint64_t number, if we
have equal to or less than MaxNumPromotions of -1 count values, they will
all be kept in metadata. If we have more than MaxNumPromotions of -1 count
values, we will only save MaxNumPromotions such values maximally. In such
case, we have logic in place in doesHistoryAllowICP to guarantee no more
promotion in sample loader pass will happen for the indirect call, because
it has been promoted enough.

With this change, now we can remove PromotedInsns without problem.

Differential Revision: https://reviews.llvm.org/D97350
2021-03-04 18:44:12 -08:00
David Blaikie a2a55def35 Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering.
This is included from IR files, and IR doesn't/can't depend on Analysis
(because Analysis depends on IR).

Also fix the implementation - don't use non-member static in headers, as
it leads to ODR violations, inaccurate "unused function" warnings, etc.
And fix the header protection macro name (we don't generally include
"LIB" in the names, so far as I can tell).
2021-03-04 16:14:53 -08:00
Jianzhou Zhao db7fe6cd4b [dfsan] Propagate origin tracking at store
This is a part of https://reviews.llvm.org/D95835.

Reviewed By: morehouse, gbalats

Differential Revision: https://reviews.llvm.org/D97789
2021-03-04 23:34:44 +00:00
William S. Moses 2b896e39bf Revert "[Attributor] Enable heap-to-stack of any size"
This reverts commit 51bd42ef9b.
2021-03-04 17:24:56 -05:00
Sanjay Patel 1bee549737 [LoopVectorize] propagate fast-math-flags from induction instructions
This code assumed that FP math was only permissable if it was
fully "fast", so it hard-coded "fast" when creating new instructions.

The underlying code already allows matching recurrences/reductions
that are only "reassoc", so this change should prevent the potential
miscompile seen in the test diffs (we created "fast" ops even though
none existed in the original code).

I don't know if we need to create the temporary IRBuilder objects
used here, so that could be follow-up clean-up.

There's an open question about whether we should require "nsz" in
addition to "reassoc" here. InstCombine uses that combo for its
reassociative folds, but I think codegen is not as strict.
2021-03-04 17:21:32 -05:00
William S. Moses 51bd42ef9b [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-04 17:17:23 -05:00
Francis Visoiu Mistrih 365b78396a [Remarks] Emit variable info in auto-init remarks
This enhances the auto-init remark with information about the variable
that is auto-initialized.

This is based of debug info if available, or alloca names (mostly for
development purposes).

```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes.Variables: var (4096 bytes). [-Rpass-missed=annotation-remarks]
  int var[1024];
      ^
```

This allows to see things like partial initialization of a variable that
the optimizer won't be able to completely remove.

Differential Revision: https://reviews.llvm.org/D97734
2021-03-04 12:51:22 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Adrian Prantl d268febc56 Improve the debug info for coro-split .resume functions
This patch updates the scope line to point to the suspend point. This
makes the first address in the function point to the first source line
in the resume function rather than the function declaration. Without
this the line table "jumps" from the beginning of the function to the
suspend point at the beginning.

rdar://73386346

Differential Revision: https://reviews.llvm.org/D97345
2021-03-04 11:05:35 -08:00
Jianzhou Zhao 72abc9bf07 [dfsan] add a missing zero origin at atomic commands 2021-03-04 16:50:05 +00:00
Alexey Bataev 04ba80ca4d [Instcombiner]Improve emission of logical or/and reductions.
For logical or/and reductions we emit regular intrinsics @llvm.vector.reduce.or/and.vxi1 calls.
These intrinsics are not effective for the logical or/and reductions,
especially if the optimizer is able to emit short circuit versions of
the scalar or/and instructions and vector code gets less effective than
the scalar version.
Instead, or reduction for i1 can be represented as:
```
%val = bitcast <ReduxWidth x i1> to iReduxWidth
%res = cmp ne iReduxWidth %val, 0
```
and reduction for i1 can be represented as:
```
%val = bitcast <ReduxWidth x i1> to iReduxWidth
%res = cmp eq iReduxWidth %val, 11111
```
This improves perfromance of the vector code significantly and make it
to outperform short circuit scalar code.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D97406
2021-03-04 08:01:02 -08:00
Sanjay Patel 36a489d194 [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
Similar to b3a33553ae, but this shows a TODO and a potential
miscompile is already present.

We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification may be possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.

The new test shows that there is an existing bug somewhere in
the callers. We assumed that the original code was fully 'fast'
and so we produced IR with 'fast' even though it was just 'reassoc'.
2021-03-04 10:40:26 -05:00
Sanjay Patel b3a33553ae [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification seems possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.
2021-03-04 08:53:04 -05:00
Hongtao Yu c75da238b4 [CSSPGO] Deduplicating dangling pseudo probes.
Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far.

This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added.

An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed.

Differential Revision: https://reviews.llvm.org/D97482
2021-03-03 22:44:42 -08:00
Hongtao Yu 8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Johannes Doerfert 5b70c12f3e [Attributor] Make DepClass a required argument
We often used a sub-optimal dependence class in the past because we
didn't see the argument. Let's make it explicit so we remember to think
about it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert e592dad82e [Attributor] Fold "TrackDependence" into the DepClassTy enum
We don't need a bool and an enum to express the three options we
currently have. This makes the interface nicer and much easier to
use optional dependencies. Also avoids mistakes where the bool is
false and enum ignored.
2021-03-04 00:35:52 -06:00
Johannes Doerfert c8c93fdf0a [Attributor] Avoid work for GEPs and wait till the users are visited 2021-03-04 00:35:52 -06:00
Johannes Doerfert f3f88287c5 [Attributor] Use known alignment as lower bound to avoid work
If we know already more than available from a use, we don't need to
invest time on it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert c14213e030 [Attributor][NFC] Move some trivial checks up 2021-03-04 00:35:52 -06:00
Johannes Doerfert 09c3eebf5f [Attributor] Use sensible initialization in AANoCaptureCallSiteReturned 2021-03-04 00:35:51 -06:00
Evgeniy Brevnov e94125f054 [DSE] Add support for not aligned begin/end
This is an attempt to improve handling of partial overlaps in case of unaligned begin\end.

Existing implementation just bails out if it encounters such cases. Even when it doesn't I believe existing code checking alignment constraints is not quite correct. It tries to ensure alignment of the "later" start/end offset while should be preserving relative alignment between earlier and later start/end.

The idea behind the change is simple. When start/end is not aligned as we wish instead of bailing out let's adjust it as necessary to get desired alignment.

I'll update with performance results as measured by the test-suite...it's still running...

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D93530
2021-03-04 12:24:23 +07:00
Serguei Katkov a0ff0f30df [InstCombine] Move statepoint intrinsic handling from visitCall to visitCallBase
statepoint intrinsic can be used in invoke context,
so it should be handled in visitCallBase to cover both call and invoke.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97833
2021-03-04 11:00:22 +07:00
Xun Li 03f668613c [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions
See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks semantic of coro.suspend intrinsic which return to caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in multithread environment.

This patch disable promotion by check whether loop contains coro.suspend intrinsics.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is a better option not only for correctness purpose but also for performance purpose.
In most cases LICM sinks memory operations. In the case of coroutine, sinking memory operation out of the loop does not improve performance since coroutien needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame.

Differential Revision: https://reviews.llvm.org/D96928
2021-03-03 15:21:57 -08:00
Whitney Tsang 58d531fd6f [LoopUnrollRuntime] Add option to assume the non latch exit block to be
predictable.

Reviewed By: Meinersbur, bmahjour

Differential Revision: https://reviews.llvm.org/D97747
2021-03-03 20:43:31 +00:00
Philip Reames 89d331a31e Address review comment from D97219 (follow up to 8051156)
Probably should have done this before landing, but I forgot.

Basic idea is to avoid using the SCEV predicate when it doesn't buy us anything.  Also happens to set us up for handling non-add recurrences in the future if desired.
2021-03-03 12:20:27 -08:00
Philip Reames 99f5417346 Sink routine for replacing a operand bundle to CallBase [NFC]
We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.
2021-03-03 12:07:55 -08:00
Philip Reames 805115655e [LSR] Unify scheduling of existing and inserted addrecs
LSR goes to some lengths to schedule IV increments such that %iv and %iv.next never need to overlap. This is fairly fundamental to LSRs cost model. LSR assumes that an addrec can be represented with a single register. If %iv and %iv.next have to overlap, then that assumption does not hold.

The bug - which this patch is fixing - is that LSR only does this scheduling for IVs which it inserts, but it's cost model assumes the same for existing IVs that it reuses. It will rewrite existing IV users such that the no-overlap property holds, but will not actually reschedule said IV increment.

As you can see from the relatively lack of test updates, this doesn't actually impact codegen much. The main reason for doing it is to make a follow up patch series which improves post-increment use and scheduling easier to follow.

Differential Revision: https://reviews.llvm.org/D97219
2021-03-03 12:07:55 -08:00
Fangrui Song a84f4fc0df [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker does not let `__start_/__stop_` references retain their sections.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on COFF/Mach-O are not affected.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D97649
2021-03-03 11:32:24 -08:00
Arnold Schwaighofer a42bea211a [coro async] Allow a coro.suspend.async to specify which argument is the context argument
Before we used the same argument as the entry point. The resume partial
function might want to use a different ABI for its context argument

Differential Revision: https://reviews.llvm.org/D97333
2021-03-03 08:27:37 -08:00
Nico Weber 64f5d7e972 Revert "[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF"
This reverts commit 04c3040f41.
Breaks instrprof-value-merge.c in bootstrap builds.
2021-03-03 10:21:17 -05:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Jianzhou Zhao ac4c1760b2 Fix the build error caused by D97570 2021-03-03 04:47:00 +00:00
Jianzhou Zhao d866b9c99d [dfsan] Propagate origin tracking at load
This is a part of https://reviews.llvm.org/D95835.

One issue is about origin load optimization: see the
comments of useCallbackLoadLabelAndOrigin

@gbalats This change may have some conflicts with your 8bit change. PTAL the change at visitLoad.

Reviewed By: morehouse, gbalats

Differential Revision: https://reviews.llvm.org/D97570
2021-03-03 04:32:30 +00:00
George Balatsouras 6ff18b08e6 [dfsan] Fix clang-tidy warnings
This addresses ~50 clang-tidy warnings on dfsan instrumentation pass.
It also contains some refactoring (all non-functional changes) to eliminate some variables and simplify code.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D97714
2021-03-02 17:37:45 -08:00
Andrei Elovikov b24afec8ae [NFCI][VPlan] Modify Recipes' print methods to honor Indent parameter
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97787
2021-03-02 15:32:10 -08:00
Nikita Popov 3d8f842712 [LICM] Make promotion faster
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-02 22:10:48 +01:00
Simon Pilgrim 232f32f0da [DSE] eliminateDeadStoresMemorySSA - fix "initialization is never read" clang-tidy warning. NFCI. 2021-03-02 15:01:33 +00:00
Alexey Bataev a054e94e9e [SLP]Merge reorder and reuse shuffles.
It is possible to merge reuse and reorder shuffles and reduce the total
cost of the vectorization tree/number of final instructions.

Differential Revision: https://reviews.llvm.org/D94992
2021-03-02 06:39:47 -08:00
Juneyoung Lee 365f5e2475 [JumpThreading] Fix tryToUnfoldSelectInCurrBB to treat and/or and its select form equally
This is a minor fix to update tryToUnfoldSelectInCurrBB to ignore select
form of and/ors because the function does not look into binops as well
2021-03-02 18:35:18 +09:00
Ta-Wei Tu ea1a1ebbc6 [NFC] Use std::swap in LoopInterchange 2021-03-02 11:42:48 +08:00
Fangrui Song 04c3040f41 [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker no longer lets `__start_/__stop_` references retain them.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on other COFF/Mach-O are not affected.

Differential Revision: https://reviews.llvm.org/D97649
2021-03-01 13:43:23 -08:00
Arthur Eubanks 040c1b49d7 Move EntryExitInstrumentation pass location
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.

Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.

Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D97608
2021-03-01 10:08:10 -08:00
Florian Hahn a6c81d3366
[VPlan] Remove recipes from back to front.
Update the deletion order when destroying VPBasicBlocks. This ensures
recipes that depend on earlier ones in the block are removed first.
Otherwise this may cause issues when recipes have remaining users later
in the block.
2021-03-01 16:06:30 +00:00
Florian Hahn 53dacb7b67
[LV] Generate RT checks up-front and remove them if required.
This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Reviewed By: lebedev.ri, ebrevnov

Differential Revision: https://reviews.llvm.org/D75980
2021-03-01 10:48:04 +00:00
Juneyoung Lee 5419b67137 [SimplifyCFG] Update FoldTwoEntryPHINode to handle and/or of select and binop equally
This is a minor change that fixes FoldTwoEntryPHINode to handle
phis with and/ors of select form and binop form equally.
2021-03-01 13:34:51 +09:00
Kazu Hirata d639120983 [llvm] Use set_is_subset (NFC) 2021-02-28 10:59:20 -08:00
Sanjay Patel 9502061bcc [InstCombine] avoid infinite loop in demanded bits for select
https://llvm.org/PR49205
2021-02-28 10:17:53 -05:00
William S. Moses b077d82b00 [Attributor] Conditinoally delete fns
Allow the attributor to delete functions only if requested

Differential Revision: https://reviews.llvm.org/D97238
2021-02-27 20:37:42 -05:00
Sanjay Patel 356cdabd3a [SimplifyCFG] avoid illegal phi with both poison and undef
In the example based on:
https://llvm.org/PR49218
...we are crashing because poison is a subclass of undef, so we merge blocks and create:

PHI node has multiple entries for the same basic block with different incoming values!
  %k3 = phi i64 [ poison, %entry ], [ %k3, %g ], [ undef, %entry ]

If both poison and undef values are incoming, we soften the poison values to undef.

Differential Revision: https://reviews.llvm.org/D97495
2021-02-27 09:10:32 -05:00
Kazu Hirata 1d4a2f3778 [Transforms/Utils] Use range-based for loops (NFC) 2021-02-26 22:36:40 -08:00
Fangrui Song bf176c49e8 [InstrProfiling] Use llvm.compiler.used instead of llvm.used for ELF
Many optimizers (e.g.  GlobalOpt/ConstantMerge) do not respect linker semantics
for comdat and may not discard the sections as a unit.

The interconnected `__llvm_prf_{cnts,data}` sections (in comdat for ELF)
are similar to D97432: `__profd_` is not directly referenced, so
`__profd_` may be discarded while `__profc_` is retained, breaking the
interconnection.  We currently conservatively add all such sections to
`llvm.used` and let the linker do GC for ELF.

In D97448, we will change GlobalObject's in the llvm.used list to use SHF_GNU_RETAIN,
causing the metadata sections to be unnecessarily retained (some `check-profile` tests check for GC).
Use `llvm.compiler.used` to retain the current GC behavior.

Differential Revision: https://reviews.llvm.org/D97585
2021-02-26 16:14:03 -08:00
George Balatsouras c9075a1c8e [dfsan] Record dfsan metadata in globals
This will allow identifying exactly how many shadow bytes were used
during compilation, for when fast8 mode is introduced.

Also, it will provide a consistent matching point for instrumentation
tests so that the exact llvm type used (i8 or i16) for the shadow can
be replaced by a pattern substitution. This is handy for tests with
multiple prefixes.

Reviewed by: stephan.yichao.zhao, morehouse

Differential Revision: https://reviews.llvm.org/D97409
2021-02-26 14:42:46 -08:00
Jianzhou Zhao a47d435bc4 [dfsan] Propagate origins for callsites
This is a part of https://reviews.llvm.org/D95835.

Each customized function has two wrappers. The
first one dfsw is for the normal shadow propagation. The second one dfso is used
when origin tracking is on. It calls the first one, and does additional
origin propagation. Which one to use can be decided at instrumentation
time. This is to ensure minimal additional overhead when origin tracking
is off.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97483
2021-02-26 19:12:03 +00:00
Fangrui Song b55f29c194 [SanitizerCoverage] Clarify llvm.used/llvm.compiler.used and partially fix unmatched metadata sections on Windows
`__sancov_pcs` parallels the other metadata section(s). While some optimizers
(e.g. GlobalDCE) respect linker semantics for comdat and retain or discard the
sections as a unit, some (e.g.  GlobalOpt/ConstantMerge) do not. So we have to
conservatively retain all unconditionally in the compiler.

When a comdat is used, the COFF/ELF linkers' GC semantics ensure the
associated parallel array elements are retained or discarded together,
so `llvm.compiler.used` is sufficient.

Otherwise (MachO (see rL311955/rL311959), COFF special case where comdat is not
used), we have to use `llvm.used` to conservatively make all sections retain by
the linker. This will fix the Windows problem once internal linkage
GlobalObject's in `llvm.used` are retained via `/INCLUDE:`.

Reviewed By: morehouse, vitalybuka

Differential Revision: https://reviews.llvm.org/D97432
2021-02-26 11:10:03 -08:00
Simon Pilgrim 455d43b951 [Utils] collectBitParts - bail for integers > 128-bits
collectBitParts uses int8_t for the bit indices, leaving a 128-bit limit.

We already test for this before calling collectBitParts, but rGb94c215592bd added truncate handling which meant we could end up processing wider integers.

Thanks to @manojgupta for the repro.
2021-02-26 14:58:01 +00:00
Stephen Tozer ec7b9b0c18 [InstCombine] Avoid redundant or out-of-order debug value sinking
This patch modifies TryToSinkInstruction in the InstCombine pass, to prevent
redundant debug intrinsics from being produced, and also prevent the intrinsics
from being emitted in an incorrect order. It does this by ensuring that when
this pass sinks an instruction and creates clones of the debug intrinsics that
use that instruction, it inserts those debug intrinsics in their original order,
and only inserts the last debug intrinsic for each variable in the Instruction's
block.

Differential revision: https://reviews.llvm.org/D95463
2021-02-26 13:04:33 +00:00
Evgeniy Brevnov 13a5cac2ba Revert "[NARY-REASSOCIATE] Support reassociation of min/max"
This reverts commit 83d134c3c4.
2021-02-26 19:47:54 +07:00
Kazu Hirata 5fc9e30985 [Scalar] Use range-based for loops (NFC) 2021-02-25 19:54:38 -08:00
Jianzhou Zhao c88fedef2a [dfsan] Conservative solution to atomic load/store
DFSan at store does store shadow data; store app data; and at load does
load shadow data; load app data.

When an application data is atomic, one overtainting case is

thread A: load shadow
thread B: store shadow
thread B: store app
thread A: load app

If the application address had been used by other flows, thread A reads
previous shadow, causing overtainting.

The change is similar to MSan's solution.
1) enforce ordering of app load/store
2) load shadow after load app; store shadow before shadow app
3) do not track atomic store by reseting its shadow to be 0.
The last one is to address a case like this.

Thread A: load app
Thread B: store shadow
Thread A: load shadow
Thread B: store app

This approach eliminates overtainting as a trade-off between undertainting
flows via shadow data race.

Note that this change addresses only native atomic instructions, but
does not support builtin libcalls yet.
   https://llvm.org/docs/Atomics.html#libcalls-atomic

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97310
2021-02-25 23:34:58 +00:00
James Y Knight 24539f1ef2 Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg.
And then push those change throughout LLVM.

Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).

The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.

Differential Revision: https://reviews.llvm.org/D97223
2021-02-25 18:29:42 -05:00
Francis Visoiu Mistrih fee9abe69c [Remarks] Provide more information about auto-init calls
This now analyzes calls to both intrinsics and functions.

For intrinsics, grab the ones we know and care about (mem* family) and
analyze the arguments.

For calls, use TLI to get more information about the libcalls, then
analyze the arguments if known.

```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks]
  int var[1024];
      ^
```

Differential Revision: https://reviews.llvm.org/D97489
2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih 4753a69a31 [Remarks] Provide more information about auto-init stores
This adds support for analyzing the instruction with the !annotation
"auto-init" in order to generate a more user-friendly remark.

For now, support the store size, and whether it's atomic/volatile.

Example:

```
auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks]
  int var;
      ^
```

Differential Revision: https://reviews.llvm.org/D97412
2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih c49b600b2f [Remarks] Emit remarks for "auto-init" !annotations
Using the !annotation metadata, emit remarks pointing to code added by
`-ftrivial-auto-var-init` that survived the optimizer.

Example:

```
auto-init.c:4:7: remark: Initialization inserted by -ftrivial-auto-var-init. [-Rpass-missed=annotation-remarks]
  int buf[1024];
      ^
```

The tests are testing various situations like calls/stores/other
instructions, with debug locations, and extra debug information on
purpose: more patches will come to improve the reporting to make it more
user-friendly, and these tests will show how the reporting evolves.

Differential Revision: https://reviews.llvm.org/D97405
2021-02-25 15:14:09 -08:00
Adrian Prantl 1693180884 Add a nullptr check.
This doesn't actually reproduce with a dbg.declare(i8* null, ...)
which produces a non-null null Value, but I have seen this show up in
crash logs. I'm suspecting that there may be another pass forcibly
setting the operand to a nullptr.
2021-02-25 12:01:11 -08:00
Fangrui Song 4d63892acb [SanitizerCoverage] Drop !associated on metadata sections
In SanitizerCoverage, the metadata sections (`__sancov_guards`,
`__sancov_cntrs`, `__sancov_bools`) are referenced by functions.  After
inlining, such a `__sancov_*` section can be referenced by more than one
functions, but its sh_link still refers to the original function's section.
(Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to
section violates the invariant.)

If the original function's section is discarded (e.g. LTO internalization +
`ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error.

This above reasoning means that `!associated` is not appropriate to be called by
an inlinable function. Non-interposable functions are inline candidates, so we
have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections
but is expected to parallel a metadata section, so we have to make sure the two
sections are retained or discarded at the same time. A section group does the
trick.  (Note: we have a module ctor, so `getUniqueModuleId` guarantees to
return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return
non-null.)

For interposable functions, we could keep using `!associated`, but
LTO can change the linkage to `internal` and allow such functions to be inlinable,
so we have to drop `!associated`, too. To not interfere with section
group resolution, we need to use the `noduplicates` variant (section group flag 0).
(This allows us to get rid of the ModuleID parameter.)
In -fno-pie and -fpie code (mostly dso_local), instrumented interposable
functions have WeakAny/LinkOnceAny linkages, which are rare. So the
section group header overload should be low.

This patch does not change the object file output for COFF (where `!associated` is ignored).

Reviewed By: morehouse, rnk, vitalybuka

Differential Revision: https://reviews.llvm.org/D97430
2021-02-25 11:59:23 -08:00
Jon Roelofs 7f6e331645 Support `#pragma clang section` directives on MachO targets
rdar://59560986

Differential Revision: https://reviews.llvm.org/D97233
2021-02-25 09:30:10 -08:00
Rong Xu 6103b6ad69 [SampleFDO][NFC] Refactor: make SampleProfileLoaderBaseImpl a template class
This patch makes SampleProfileLoaderBaseImpl a template class so it
can be used in CodeGen transformation.

Noticeable changes:
 * use one template parameter and use IRTraits to get other used
   types an type specific functions.
 * remove the temporary "inline" keywords in previous refactor
   patch.
 * change the template function findEquivalencesFor to a regular
   function. This function has a single caller with type of
   PostDominatorTree. It's simpler to use the type directly
   because MachinePostDominatorTree is not a derived type of
   template DominatorTreeBase.

Differential Revision: https://reviews.llvm.org/D96981
2021-02-25 08:26:17 -08:00
Evgeniy Brevnov d0a6f8bb65 [NFC] Fix build failure after 83d134c3c4 2021-02-25 18:43:00 +07:00
Evgeniy Brevnov 83d134c3c4 [NARY-REASSOCIATE] Support reassociation of min/max
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88287
2021-02-25 18:22:39 +07:00
Xun Li c38000a9fb [Coroutine] Check indirect uses of alloca when checking lifetime info
In the existing logic, we look at the lifetime.start marker of each alloca, and check all uses of the alloca, to see if any pair of the lifetime marker and an use of alloca crosses suspension point.
This approach is unfortunately incorrect. An use of alloca does not need to be a direct use, but can be an indirect use through alias.
Only checking direct uses can miss cases where indirect uses are crossing suspension point.
This can be demonstrated in the newly added test case 007.
In the test case, both x and y are only directly used prior to suspend, but they are captured into an alias, merged through a PHINode (so they couldn't be materialized), and used after CoroSuspend.
If we only check whether the lifetime starts cross suspension points with direct uses, we will put the allocas to the stack, and then capture their addresses in the frame.

Instead of fixing it in D96441 and D96566, this patch takes a different approach which I think is better.
We still checks the lifetime info in the same way as before, but with two differences:
1. The collection of liftime.start is moved into AllocaUseVisitor to make the logic more concentrated.
2. When looking at lifetime.start and use pairs, we not only checks the direct uses as before, but in this patch we check all uses collected by AllocaUseVisitor, which would include all indirect uses through alias. This will make the analysis more accurate without throwing away the lifetime optimization.

Differential Revision: https://reviews.llvm.org/D96922
2021-02-24 18:29:23 -08:00
Sanjay Patel a7cee55762 [InstCombine] fold fdiv with powi divisor (PR49147)
This extends b40fde062c for the especially non-standard
powi pattern. We want to avoid being completely wrong
on the negation-of-int-min corner case, so I'm adding
an extra FMF check for 'ninf' assuming that gives us
the flexibility to handle that possibility.
https://llvm.org/PR49147
2021-02-24 16:44:36 -05:00
Sanjay Patel 868d43fbd6 [InstCombine] add helper for x/pow(); NFC
We at least want to add powi to this list, so
split it off into a switch to reduce code duplication.
2021-02-24 16:44:36 -05:00
Duncan P. N. Exon Smith 01701646d5 Transforms: Clone distinct nodes in metadata mapper unless RF_ReuseAndMutateDistinctMDs
This is a follow up to 22a52dfddc and a
revert of df763188c9.

With this change, we only skip cloning distinct nodes in
MDNodeMapper::mapDistinct if RF_ReuseAndMutateDistinctMDs, dropping the
no-longer-needed local helper `cloneOrBuildODR()`.  Skipping cloning in
other cases is unsound and breaks CloneModule, which is why the textual
IR for PR48841 didn't pass previously. This commit adds the test as:
Transforms/ThinLTOBitcodeWriter/cfi-debug-info-cloned-type-references-global-value.ll

Cloning less often exposed a hole in subprogram cloning in
CloneFunctionInto thanks to df763188c9a1ecb1e7e5c4d4ea53a99fbb755903's
test ThinLTO/X86/Inputs/dicompositetype-unique-alias.ll. If a function
has a subprogram attachment whose scope is a DICompositeType that
shouldn't be cloned, but it has no internal debug info pointing at that
type, that composite type was being cloned. This commit plugs that hole,
calling DebugInfoFinder::processSubprogram from CloneFunctionInto.

As hinted at in 22a52dfddcefad4f275eb8ad1cc0e200074c2d8a's commit
message, I think we need to formalize ownership of metadata a bit more
so that ValueMapper/CloneFunctionInto (and similar functions) can deal
with cloning (or not) metadata in a more generic, less fragile way.

This fixes PR48841.

Differential Revision: https://reviews.llvm.org/D96734
2021-02-24 12:57:52 -08:00
Sander de Smalen 5e19208d96 [InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep.
This fixes the types of a few more cost variables to be of type InstructionCost.
2021-02-24 14:30:03 +00:00
Pierre Gousseau 27830bc2b1 [asan] Avoid putting globals in a comdat section when targetting elf.
Putting globals in a comdat for dead-stripping changes the semantic and
can potentially cause false negative odr violations at link time.
If odr indicators are used, we keep the comdat sections, as link time
odr violations will be dectected for the odr indicator symbols.

This fixes PR 47925
2021-02-24 12:01:56 +00:00
Simon Pilgrim b94c215592 [Utils] collectBitParts - add truncate() handling 2021-02-24 11:48:34 +00:00
Florian Hahn 6240f436dd
Recommit "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends."
This reverts the revert commit 437f0bbcd5.

It adds a new toVPRecipeResult, which forces VPRecipeOrVPValueTy to be
constructed with a VPRecipeBase *. This should address ambiguous
constructor issues for recipe sub-types that also inherit from VPValue.
2021-02-24 10:36:02 +00:00
Dan Liew 7d3ef103b5 [ASan] Introduce a way set different ways of emitting module destructors.
Previously there was no way to control how module destructors were emitted
by `ModuleAddressSanitizerPass`. However, we want language frontends (e.g. Clang)
to be able to decide how to emit these destructors (if at all).

This patch introduces the `AsanDtorKind` enum that represents the different ways
destructors can be emitted. There are currently only two valid ways to emit destructors.

* `Global` - Use `llvm.global_dtors`. This was the previous behavior and is the default.
* `None`   - Do not emit module destructors.

The `ModuleAddressSanitizerPass` and the various wrappers around it have been updated
to take the `AsanDtorKind` as an argument.

The `-asan-destructor-kind=` command line argument has been introduced to make this
easy to test from `opt`. If this argument is specified it overrides the value passed
to the `ModuleAddressSanitizerPass` constructor.

Note that `AsanDtorKind` is not `bool` because we will introduce a new way to
emit destructors in a subsequent patch.

Note that `AsanDtorKind` is given its own header file because if it is declared
in `Transforms/Instrumentation/AddressSanitizer.h` it leads to compile error
(Module is ambiguous) when trying to use it in
`clang/Basic/CodeGenOptions.def`.

rdar://71609176

Differential Revision: https://reviews.llvm.org/D96571
2021-02-23 20:01:21 -08:00
Juneyoung Lee 56d228a14e [SimplifyCFG] Update passingValueIsAlwaysUndefined to check more attributes
This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes.

A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97244
2021-02-24 10:40:50 +09:00
Fangrui Song ef312951fd collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128
And delete the SmallPtrSetImpl overload.

While here, decrease inline element counts from 8 to 4. See D97128 for the choice.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97257
2021-02-23 16:09:06 -08:00
Fangrui Song ed02f52d28 Fix unstable SmallPtrSet iteration issues due to collectUsedGlobalVariables
While here, decrease inline element counts from 8 to 4. See D97128 for the choice.

Depends on D97128 (which added a new SmallVecImpl overload for collectUsedGlobalVariables).

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97139
2021-02-23 16:09:05 -08:00
Fangrui Song 3adb89bb9f [ThinLTO] Make cloneUsedGlobalVariables deterministic
Iterating on `SmallPtrSet<GlobalValue *, 8>` with more than 8 elements
is not deterministic. Use a SmallVector instead because `Used` is guaranteed to contain unique elements.

While here, decrease inline element counts from 8 to 4. The number of
`llvm.used`/`llvm.compiler.used` elements is usually 0 or 1. For full
LTO/hybrid LTO, the number may be large, so we need to be careful.

According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is
good for a large project with WholeProgramDevirt, when available_externally
vtables are placed in the llvm.compiler.used set.

Differential Revision: https://reviews.llvm.org/D97128
2021-02-23 16:09:05 -08:00
Teresa Johnson 0a5949dcfa [WPD] Fix handling of pure virtual base class
The fix in 3c4c205060 caused an assert in
the case of a pure virtual base class. In that case, the vTableFuncs
list on the summary will be empty, so we were hitting the new assert
that the linkage type was not available_externally.

In the case of pure virtual, we do not want to assert, and additionally
need to set VS so that we don't treat it conservatively and quit the
analysis of the type id early.

This exposed a pre-existing issue where we were not updating the vcall
visibility on pure virtual functions when whole program visibility was
specified. We were skipping updating the visibility on any global vars
that didn't have any vTableFuncs, which meant all pure virtual were not
updated, and the later analysis would block any devirtualization of
calls that had a type id used on those pure virtual vtables (see the
handling in the other code modified in this patch). Simply remove that
check. It will mean that we may update the vcall visibility on global
vars that aren't vtables, but that setting is ignored for any global
vars that didn't have type metadata anyway.

Added a new test case that asserted without removing the assert, and
that requires the other fixes in this patch (updateVCallVisibilityInIndex
and not skipping all vtables without virtual funcs) to get a successful
devirtualization with index-only WPD. I added cases to test hybrid and
regular LTO for completeness, although those already worked without the
fixes here.

With this final fix, a clang multistage bootstrap with WPD builds and
runs all tests successfully.

Differential Revision: https://reviews.llvm.org/D97126
2021-02-23 16:07:09 -08:00
Jianzhou Zhao a05aa0dd5e [dfsan] Update memset and dfsan_(set|add)_label with origin tracking
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97302
2021-02-23 23:16:33 +00:00
Matthew Voss 6da7d31416 [llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata
Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducible.

Differential Revision: https://reviews.llvm.org/D92074
2021-02-23 12:51:54 -08:00
Andrei Elovikov 3605b873f6 [NFC][VPlan] Use VPUser to store block's predicate
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96529
2021-02-23 11:08:27 -08:00
Florian Hahn de40423c85
[LV] Ensure fixNonInductionPHIs uses a valid insertion point.
In some cases, Builder's insertion point may be invalidated before using
it in VPTransformState::get. Make sure the insertion point is
up-to-date.

This should fix various sanitizer errors, like
https://lab.llvm.org/buildbot/#/builders/5/builds/4933/steps/9/logs/stdio
2021-02-23 18:51:05 +00:00
Florian Hahn 437f0bbcd5
Revert "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends."
This reverts commit 4efa097eb4, because
some the compilers used for some bots do not support automatic
conversions to PointerUnion.
2021-02-23 16:57:21 +00:00
Florian Hahn 4efa097eb4
[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends.
Generalize the return value of tryToCreateWidenRecipe to return either a
newly create recipe or an existing VPValue. Use this to avoid creating
unnecessary VPBlendRecipes.

Fixes PR44800.
2021-02-23 16:52:03 +00:00
Juneyoung Lee 19c2e12947 [JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns
This allows JumpThreading's computeValueKnownInPredecessors to
recognize select form of and/or patterns as well.
2021-02-24 00:06:10 +09:00
Nate Chandler 01b4890e47 Add @llvm.coro.async.size.replace intrinsic.
The new intrinsic replaces the size in one specified AsyncFunctionPointer with
the size in another.  This ability is necessary for functions which merely
forward to async functions such as those defined for partial applications.

Reviewed By: aschwaighofer

Differential Revision: https://reviews.llvm.org/D97229
2021-02-23 06:43:52 -08:00
David Green dd2dbf7ee2 [TTI] Change getOperandsScalarizationOverhead to take Type args
As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.

This adjusts one SVE test as it is now calling the wrong intrinsic vs
veccall. Without invalid InstructCosts the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.

Differential Revision: https://reviews.llvm.org/D96287
2021-02-23 13:04:59 +00:00
David Green bd4b61efbd [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-23 13:03:26 +00:00
Matteo Favaro 633e090528
[DSE] Allow ptrs defined in the entry block in IsGuaranteedLoopInvariant.
The **IsGuaranteedLoopInvariant** function is making sure to check if the
incoming pointer is guaranteed to be loop invariant, therefore I think
the case where the pointer is defined in the entry block of a function
automatically guarantees the pointer to be loop invariant, as the entry
block of a function cannot have predecessors or be part of a loop.

I implemented this small patch and tested it using
**ninja check-llvm-unit** and **ninja check-llvm**. I added a contained test
file that shows the problem and used **opt -O3 -debug** on it to make sure
the case is not currently handled (in fact the debug log is showing that
the DSE pass is bailing out when testing if the killer store is able to
clobber the dead store).

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96979
2021-02-23 12:00:44 +00:00
Juneyoung Lee 481c62277d [BuildLibCalls] Add noundef to allocator fns' size
This is a patch to explicitly mark the size parameter of allocator functions like malloc/realloc/... as noundef.

For C/C++: undef can be created from reading an uninitialized variable or padding.
Calling a function with uninitialized variable is already UB.
Calling malloc with padding value is.. something that's not expected. Padding bits may appear in a coerced aggregate, which doesn't apply to malloc's size.
Therefore, malloc's size can be marked as noundef.

For transformations that introduce malloc/realloc/..: I ran LLVM unit tests with an updated Alive2 semantics, and found no regression, so it seems okay.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97045
2021-02-23 13:58:03 +09:00
Kazu Hirata 4ed47858ab [llvm] Use llvm::drop_begin (NFC) 2021-02-22 20:17:16 -08:00
ksyx 4125cabce1 [GVN] Fix a typo in comment
NFC.

Differential Revision: https://reviews.llvm.org/D97200

Reviewed By: fhahn
2021-02-23 10:39:34 +08:00
Jianzhou Zhao 7424efd5ad [dfsan] Propagate origins at non-memory/phi/call instructions
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97200
2021-02-23 02:12:45 +00:00
Petr Hosek c24b7a16b1 [InstrProfiling] Use ELF section groups for counters, data and values
__start_/__stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.

The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.

Note that `noduplicates` COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.

Differential Revision: https://reviews.llvm.org/D96757
2021-02-22 14:00:02 -08:00
Alexey Bataev 9a4dd4de9d [SLP]No need to mark scatter load pointer as scalar as it gets vectorized.
Pointer operand of scatter loads does not remain scalar in the tree (it
gest vectorized) and thus must not be marked as the scalar that remains
scalar in vectorized form.

Differential Revision: https://reviews.llvm.org/D96818
2021-02-22 11:58:28 -08:00
Petr Hosek 4827492d9f Revert "[InstrProfiling] Use ELF section groups for counters, data and values"
This reverts commits:
5ca21175e0
97184ab99c

The instrprof-gc-sections.c is failing on AArch64 LLD bot.
2021-02-22 11:13:55 -08:00
Florian Hahn 95daec6a84
[ConstraintElimination] Use unsigned > 0 instead of != 0.
ICMP_NE predicates cannot be directly represented as constraint. But we
can use ICMP_UGT instead ICMP_NE for %x != 0.

See https://alive2.llvm.org/ce/z/XlLCsW
2021-02-22 17:54:36 +00:00
Nikita Popov 4125afc357 [MemCpyOpt] Fix handling of readnone byval arguments
If the call is readnone, then there may not be any MemoryAccess
associated with the call. Bail out in that case.

This fixes the issue reported at
https://reviews.llvm.org/D94376#2578312.
2021-02-22 18:48:31 +01:00
Nikita Popov 5e7e499b91 [JumpThreading] Clone noalias.scope.decl when threading blocks
When cloning instructions during jump threading, also clone and
adapt any declared scopes. This is primarily important when
threading loop exits, because we'll end up with two dominating
scope declarations in that case (at least after additional loop
rotation). This addresses a loose thread from
https://reviews.llvm.org/rG2556b413a7b8#975012.

Differential Revision: https://reviews.llvm.org/D97154
2021-02-22 18:35:30 +01:00
Florian Hahn c7ee57f1dc
[LV] Directly use incoming value for single VPBlendRecipes.
VPBlendRecipes with single incoming (value, mask) pair are no-ops. Use
the incoming value directly.
2021-02-22 16:10:08 +00:00
Florian Hahn c11fd0df64
[VPlan] Skip VPWidenPHIRecipe in VPInterleavedACcessInfo.
Update unit tests that did not expect VPWidenPHIRecipes after
15a74b64df.
2021-02-22 10:35:09 +00:00
Florian Hahn 15a74b64df
[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe.
This patch extends VPWidenPHIRecipe to manage pairs of incoming
(VPValue, VPBasicBlock) in the VPlan native path. This is made possible
because we now directly manage defined VPValues for recipes.

By keeping both the incoming value and block in the recipe directly,
code-generation in the VPlan native path becomes independent of the
predecessor ordering when fixing up non-induction phis, which currently
can cause crashes in the VPlan native path.

This fixes PR45958.

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D96773
2021-02-22 09:44:25 +00:00
Petr Hosek 5ca21175e0 [InstrProfiling] Use ELF section groups for counters, data and values
__start_/__stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.

The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.

Note that `noduplicates` COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.

Differential Revision: https://reviews.llvm.org/D96757
2021-02-21 16:13:06 -08:00
Nikita Popov e0615bcd39 [Loads] Add optimized FindAvailableLoadedValue() overload (NFCI)
FindAvailableLoadedValue() accepts an iterator by reference. If no
available value is found, then the iterator will either be left
at a clobbering instruction or the beginning of the basic block.
This allows using FindAvailableLoadedValue() across multiple blocks.

If this functionality is not needed, as is the case in InstCombine,
then we can use a much more efficient implementation: First try
to find an available value, and only perform clobber checks if
we actually found one. As this function only looks at a very small
number of instructions (6 by default) and usually doesn't find an
available value, this saves many expensive alias analysis queries.
2021-02-21 18:42:56 +01:00
Kristina Bessonova e97aab8d15 [ThinLTO] Fix import of multiply defined global variables
Currently, if there is a module that contains a strong definition of
a global variable and a module that has both a weak definition for
the same global and a reference to it, it may result in an undefined symbol error
while linking with ThinLTO.

It happens because:
* the strong definition become internal because it is read-only and can be imported;
* the weak definition gets replaced by a declaration because it's non-prevailing;
* the strong definition failed to be imported because the destination module
  already contains another definition of the global yet this def is non-prevailing.

The patch adds a check to computeImportForReferencedGlobals() that allows
considering a global variable for being imported even if the module contains
a definition of it in the case this def has an interposable linkage type.

Note that currently the check is based only on the linkage type
(and this seems to be enough at the moment), but it might be worth to account
the information whether the def is prevailing or not.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D95943
2021-02-21 18:34:12 +02:00
Jianzhou Zhao 9524632fa2 [dfsan] Comment out unused methods by D97087 temporarily 2021-02-21 03:31:19 +00:00
Sanjay Patel e772618f1e [InstCombine] fold fdiv with exp/exp2 divisor (PR49147)
Follow-up to:
D96648 / b40fde062
...for the special-case base calls.

From the earlier commit:
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.
2021-02-20 16:02:58 -05:00
Teresa Johnson fde55a9c9b [LTO] Fix cloning of llvm*.used when splitting module
Refines the fix in 3c4c205060 to only
put globals whose defs were cloned into the split regular LTO module
on the cloned llvm*.used globals. This avoids an issue where one of the
attached values was a local that was promoted in the original module
after the module was cloned. We only need to have the values defined in
the new module on those globals.

Fixes PR49251.

Differential Revision: https://reviews.llvm.org/D97013
2021-02-20 09:46:43 -08:00
Simon Pilgrim 609d0c9772 [InstCombine] matchBSwapOrBitReverse - remove pattern matching early-out. NFCI.
recognizeBSwapOrBitReverseIdiom + collectBitParts have pattern matching to bail out early if a bswap/bitreverse pattern isn't possible - we should be able to rely on this instead without any notable change in compile time.

This is part of a cleanup towards letting matchBSwapOrBitReverse /recognizeBSwapOrBitReverseIdiom use 'root' instructions that aren't ORs (FSHL/FSHRs in particular which can be prematurely created).

Differential Revision: https://reviews.llvm.org/D97056
2021-02-20 13:15:34 +00:00
Dávid Bolvanský cd54c57919 Reland "[Libcalls, Attrs] Annotate libcalls with noundef"
Fixed Clang tests.
2021-02-20 06:18:48 +01:00
Dávid Bolvanský 94d034fb86 Revert "[Libcalls, Attrs] Annotate libcalls with noundef"
This reverts commit 33b0c63775. Bots are failing. Some Clang tests need to be updated too.
2021-02-20 04:18:42 +01:00
Dávid Bolvanský 33b0c63775 [Libcalls, Attrs] Annotate libcalls with noundef
I think we can use here same logic as for nonnull.

strlen(X) - X must be noundef => valid pointer.

for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones)

Reviewed By: jdoerfert, aqjune

Differential Revision: https://reviews.llvm.org/D95122
2021-02-20 04:10:07 +01:00
Dávid Bolvanský 68e6025cf7 Revert "[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly"
This reverts commit 05d891a19e.
2021-02-20 03:58:53 +01:00
Dávid Bolvanský 05d891a19e [BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94850
2021-02-20 03:56:01 +01:00
Jianzhou Zhao dab953c8e4 [dfsan] Add utils that get/set origins
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97087
2021-02-20 00:52:33 +00:00
Jianzhou Zhao cb1f1aab90 [dfsan] Add origin address calculation
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D97065
2021-02-19 21:30:07 +00:00
Jianzhou Zhao efc8f3311b [msan] Set cmpxchg shadow precisely
In terms of https://llvm.org/docs/LangRef.html#cmpxchg-instruction,
the return type of chmpxchg is a pair {ty, i1}, while I think we
only wanted to set the shadow for the address 0th op, and it has type
ty.

Reviewed-by: eugenis

Differential Revision: https://reviews.llvm.org/D97029
2021-02-19 20:23:23 +00:00
Wei Mi 4ffad1fb48 [SampleFDO] Add PromotedInsns to prevent repeated ICP.
In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
We use 0 count value profile to memorize which target has been promoted
and prevent repeated ICP for the same target, so we delete PromotedInsns.
However, I found the implementation in the patch has some shortcomings
to be fixed otherwise there will still be repeated ICP. So I add
PromotedInsns back temorarily. Will remove it after I get a thorough fix.
2021-02-19 10:01:49 -08:00
Benjamin Kramer 59f442e6bb [LV] Fold single-use variable into assert. NFC. 2021-02-19 18:11:39 +01:00
Nikita Popov 71a8e4e7d6 [MemCopyOpt] Enable MemorySSA by default
This enables use of MemorySSA instead of MemDep in MemCpyOpt. To
allow this without significant compile-time impact, the MemCpyOpt
pass is moved directly before DSE (in the cases where this was not
already the case), which allows us to reuse the existing MemorySSA
analysis.

Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt
can also perform simple optimizations across basic blocks.

Differential Revision: https://reviews.llvm.org/D94376
2021-02-19 18:06:25 +01:00
Florian Hahn edc92a1c42
[LV] Remove VPCallback.
Now that all state for generated instructions is managed directly in
VPTransformState, VPCallBack is no longer needed. This patch updates the
last use of `getOrCreateScalarValue` to instead manage the value
directly in VPTransformState and removes VPCallback.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D95383
2021-02-19 12:50:41 +00:00
Nikita Popov 2f17ed294f [DCE] Don't remove non-willreturn calls
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.

Differential Revision: https://reviews.llvm.org/D96993
2021-02-19 12:35:40 +01:00
Nikita Popov 370addb996 [IR] Move willReturn() to Instruction
This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).

I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)

Differential Revision: https://reviews.llvm.org/D96992
2021-02-19 11:56:01 +01:00
Djordje Todorovic 1a2b3536ef Reland "[Debugify] Make the debugify aware of the original (-g) Debug Info"
As discussed on the RFC [0], I am sharing the set of patches that
    enables checking of original Debug Info metadata preservation in
    optimizations. The proof-of-concept/proposal can be found at [1].

    The implementation from the [1] was full of duplicated code,
    so this set of patches tries to merge this approach into the existing
    debugify utility.

    For example, the utility pass in the original-debuginfo-check
    mode could be invoked as follows:

      $ opt -verify-debuginfo-preserve -pass-to-test sample.ll

    Since this is very initial stage of the implementation,
    there is a space for improvements such as:
      - Add support for the new pass manager
      - Add support for metadata other than DILocations and DISubprograms

    [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ
    [1] https://github.com/djolertrk/llvm-di-checker

    Differential Revision: https://reviews.llvm.org/D82545

The test that was failing is now forced to use the old PM.
2021-02-18 23:29:22 -08:00
Xun Li 3bf8f162a0 [Coroutine] Relax CoroElide musttail check
As discussed in D94834, we don't really need to do complicated analysis. It's safe to just drop the tail call attribute.

Differential Revision: https://reviews.llvm.org/D96926
2021-02-18 19:36:11 -08:00
Wei Mi 5fb65c02ca [SampleFDO] Stop repeated indirect call promotion for the same target.
Found a problem in indirect call promotion in sample loader pass. Currently
if an indirect call is promoted for a target, and if the parent function is
inlined into some other function, the indirect call can be promoted for the
same target again. That is redundent which can harm performance and can cause
excessive compile time in some extreme case.

The patch fixes the issue. If a target is promoted for an indirect call, the
patch will write ICP metadata with the target call count being set to 0.
In the later ICP in sample profile loader, if it sees a target has 0 count
for an indirect call, it knows the target has been promoted and won't do
indirect call promotion for the indirect call.

The fix brings 0.1~0.2% performance on our search benchmark.

Differential Revision: https://reviews.llvm.org/D96806
2021-02-18 17:01:32 -08:00
Jianzhou Zhao 7e658b2fdc [dfsan] Instrument origin variable and function definitions
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse, gbalats

Differential Revision: https://reviews.llvm.org/D96977
2021-02-18 23:50:05 +00:00
Hongtao Yu e87b1b1d4e [CSSPGO] Use callsite sample counts to annotate indirect call sites.
With CSSPGO all indirect call targets are counted torwards the original indirect call site in the profile, including both inlined and non-inlined targets. Therefore no need to look for callee entry counts. This also fixes the issue where callee entry count doesn't match callsite count due to the nature of CS sampling.

I'm also cleaning up the orginal code that called `findIndirectCallFunctionSamples` just to compute the sum, the return value of which was disgarded.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D96990
2021-02-18 14:52:34 -08:00
Nikita Popov 70e3c9a8b6 [BasicAA] Always strip single-argument phi nodes
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.

With this patch we get marginally better results:

  aa.NumMayAlias | 5010069 | 5009861
  aa.NumMustAlias | 347518 | 347674
  aa.NumNoAlias | 27201336 | 27201528
  ...
  licm.NumPromoted | 1293 | 1296

I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.

Differential Revision: https://reviews.llvm.org/D96668
2021-02-18 23:07:50 +01:00
Ta-Wei Tu f70cdc5b5c [NPM] Properly reset parent loop after loop passes
This fixes https://bugs.llvm.org/show_bug.cgi?id=49185

When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`.
If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes.
However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes.

This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`.

Reviewed By: asbirlea, ychen

Differential Revision: https://reviews.llvm.org/D96727
2021-02-19 02:50:53 +08:00
Jianzhou Zhao 406dc54903 [dfsan] Refactor defining TLS variables
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96941
2021-02-18 18:04:21 +00:00
Jianzhou Zhao 2e6cd338c6 [dfsan] Refactor runtime functions checking
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96940
2021-02-18 18:01:46 +00:00
Philip Reames 8666463889 [instcombine] Exploit UB implied by nofree attributes
This patch simply implements the documented UB of the current nofree attributes as specified. It doesn't try to be fancy about inference (yet), it just implements the cases already specified and inferred.

Note: When this lands, it may expose miscompiles. If so, please revert and provide a test case. It's likely the bug is in the existing inference code and without a relatively complete test case, it will be hard to debug.

Differential Revision: https://reviews.llvm.org/D96349
2021-02-18 08:34:22 -08:00
Djordje Todorovic c1e23894fc Revert "[Debugify] Make the debugify aware of the original (-g) Debug Info"
This reverts rG8ee7c7e02953.
One test is failing, I'll reland this as soon as possible.
2021-02-18 02:04:27 -08:00
Djordje Todorovic 8ee7c7e029 [Debugify] Make the debugify aware of the original (-g) Debug Info
As discussed on the RFC [0], I am sharing the set of patches that
enables checking of original Debug Info metadata preservation in
optimizations. The proof-of-concept/proposal can be found at [1].

The implementation from the [1] was full of duplicated code,
so this set of patches tries to merge this approach into the existing
debugify utility.

For example, the utility pass in the original-debuginfo-check
mode could be invoked as follows:

  $ opt -verify-debuginfo-preserve -pass-to-test sample.ll

Since this is very initial stage of the implementation,
there is a space for improvements such as:
  - Add support for the new pass manager
  - Add support for metadata other than DILocations and DISubprograms

[0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ
[1] https://github.com/djolertrk/llvm-di-checker

Differential Revision: https://reviews.llvm.org/D82545
2021-02-18 01:52:16 -08:00
Joseph Huber c3a3d20093 [LV] Add analysis remark for mixed precision conversions
Floating point conversions inside vectorized loops have performance
implications but are very subtle. The user could specify a floating
point constant, or call a function without realizing that it will
force a change in the vector width. An example of this behaviour is
seen in https://godbolt.org/z/M3nT6c . The vectorizer should indicate
when this happens becuase it is most likely unintended behaviour.

This patch adds a simple check for this behaviour by following floating
point stores in the original loop and checking if a floating point
conversion operation occurs.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D95539
2021-02-17 21:37:08 -05:00
Teresa Johnson d55d46f43b [WPD] Add an optional checking mode for debugging devirtualization
This adds an internal option -wholeprogramdevirt-check which if enabled
will guard each devirtualization with a runtime check against the
expected target, and an invocation of a debug trap if the check fails.
This is useful for debugging WPD failures involving undefined behavior
(e.g. casting to another class type not in the inheritance chain).

Differential Revision: https://reviews.llvm.org/D95969
2021-02-17 16:46:15 -08:00
Rong Xu 7397905ab0 [SampleFDO] Third Try: Refactor SampleProfile.cpp
Apply the patch for the third time after fixing buildbot failures.

Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
(4) Add inline keyword to avoid duplicated symbols -- they will
be removed later when the class is changed to a template.

Differential Revision: https://reviews.llvm.org/D96455
2021-02-17 15:31:50 -08:00
Teresa Johnson 3c4c205060 [WPD][lld] Test handling of vtable definition from shared libraries
Adds a lld test for a case that the handling added for dynamically
exported symbols in 1487747e99 already
fixes. Because isExportDynamic returns true when the symbol is
SharedKind with default visibility, it will treat as dynamically
exported and block devirtualization when the definition of a vtable
comes from a shared library. This is desireable as it is dangerous to
devirtualize in that case, since there could be hidden overrides in the
shared library. Typically that happens when the shared library header
contains available externally definitions, which applications can
override. An example is std::error_category, which is overridden in LLVM
and causing failures after a self build with WPD enabled, because
libstdc++ contains hidden overrides of the virtual base class methods.

The regular LTO case in the new test already worked, but there are
2 fixes in this patch needed for the index-only case and the hybrid
LTO case. For the index-only case, WPD should not simply ignore
available externally vtables. A follow on fix will be made to clang to
emit type metadata for those vtables, which the new test is modeling.
For the hybrid case, we need to ensure when the module is split that any
llvm.*used globals are cloned to the regular LTO split module so
available externally vtable definitions are not prematurely deleted.

Another follow on fix will add the equivalent gold test, which requires
a small fix to the plugin to treat symbols in dynamic libraries the same
way lld already is.

Differential Revision: https://reviews.llvm.org/D96721
2021-02-17 12:49:24 -08:00
Vedant Kumar c28622fbf3 Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp"
Revert "[SampleFDO] Add missing #includes to unbreak modules build after D96455"

This reverts commit c73cbf218a.

Revert "[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC)"

This reverts commit a23e6b321c.

Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp"

This reverts commit 6fd5ccff72.

Still seeing link failures when building llc (or other tools), due to
the new SampleProfileLoaderBaseImpl.h containing definitions that get
duplicated across multiple TU's.

```
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::findEquivalenceClasses(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::buildEdges(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getFunctionLoc(llvm::Function&)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getBlockWeight(llvm::BasicBlock const*)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockWeight(llvm::raw_ostream&, llvm::BasicBlock const*) const' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockEquivalence(llvm::raw_ostream&, llvm::BasicBlock const*)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printEdgeWeight(llvm::raw_ostream&, std::__1::pair<llvm::BasicBlock const*, llvm::BasicBlock const*>)' in:
    tools/llc/CMakeFiles/llc.dir/llc.cpp.o
    lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o)
```
2021-02-17 10:22:24 -08:00
William S. Moses 40862b1a74 [SROA] Propagate correct TBAA/TBAA Struct offsets
SROA does not correctly account for offsets in TBAA/TBAA struct metadata.
This patch creates functionality for generating new MD with the corresponding
offset and updates SROA to use this functionality.

Differential Revision: https://reviews.llvm.org/D95826
2021-02-17 11:59:00 -05:00
Ta-Wei Tu 0eeaec2a6d [NFC] Refactor LoopInterchange into a loop-nest pass
This is the preliminary patch of converting `LoopInterchange` pass to a loop-nest pass and has no intended functional change.
Changes that are not loop-nest related are split to D96650.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D96644
2021-02-18 00:55:38 +08:00
Sjoerd Meijer f78aa8b2c2 [LSR] Add a flag that overrides the target's preferred addressing mode
This adds a new flag -lsr-preferred-addressing-mode to override the target's
preferred addressing mode. It replaces flag -lsr-backedge-indexing, which is
equivalent to preindexed addressing that is one of the options that
-lsr-preferred-addressing-mode accepts.

Differential Revision: https://reviews.llvm.org/D96855
2021-02-17 16:50:21 +00:00
Sanjay Patel 85294703a7 [InstCombine] fold fcmp-of-copysign idiom
As discussed in:
https://llvm.org/PR49179
...this pattern shows up in library code.
There are several potential generalizations as noted,
but we need to be careful that we get FP special-values
right, and it's not clear how much variation we should
expect to see from this exact idiom.
2021-02-17 10:32:33 -05:00
Sjoerd Meijer 5a641cf194 Follow up of rGdea4a63e6359, which committed a slightly different version than
intended.
2021-02-17 10:07:26 +00:00
Sjoerd Meijer dea4a63e63 [LSR] Cleanup of getPreferredAddresingMode. NFC.
This is a follow up D96600 and cleans up most calls to
getPreferredAddresingMode. I.e., we really don't need to query the same things
again and again, but get the preferred addressing mode once for each loop. So
this should be a lot friendlier for compile times, especially if we start
implementing getPreferredAddresingMode.

Differential Revision: https://reviews.llvm.org/D96772
2021-02-17 09:45:29 +00:00
Rong Xu 6fd5ccff72 [SampleFDO] Reapply: Refactor SampleProfile.cpp
Reapply patch after fixing buildbot failure.
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.

Differential Revision: https://reviews.llvm.org/D96455
2021-02-16 16:43:21 -08:00
Mehdi Amini c761fe77bd Revert "[SampleFDO][NFC] Refactor SampleProfile.cpp"
This reverts commit 310b35304c.
The build is broken with -DBUILD_SHARED_LIBS=ON :

lib/ProfileData/CMakeFiles/LLVMProfileData.dir/SampleProfileLoaderBaseUtil.cpp.o: In function `llvm::sampleprofutil::callsiteIsHot(llvm::sampleprof::FunctionSamples const*, llvm::ProfileSummaryInfo*, bool)':
SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x1a): undefined reference to `llvm::ProfileSummaryInfo::isColdCount(unsigned long) const'
SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x28): undefined reference to `llvm::ProfileSummaryInfo::isHotCount(unsigned long) const'
...
2021-02-16 22:11:42 +00:00
Rong Xu 310b35304c [SampleFDO][NFC] Refactor SampleProfile.cpp
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.

Differential Revision: https://reviews.llvm.org/D96455
2021-02-16 11:18:21 -08:00
Arnold Schwaighofer 627cfd4394 [coro async] Don't promote allocas to the frame or rewrite swifterror if there are no suspend points
Also don't call function to update the call graph if there are no
clones. The function will fail.

rdar://74277860

Differential Revision: https://reviews.llvm.org/D96620
2021-02-16 09:05:38 -08:00
Ta-Wei Tu 6b612a7baf [NFC][LoopInterchange] Explicitly pass both `InnerLoop` and `OuterLoop` to `processLoop`
This is a split patch of D96644.

Explicitly pass both `InnerLoop` and `OuterLoop` to function `processLoop` to remove the need to swap elements in loop list and allow making loop list an `ArrayRef`.
Also, fix inconsistent spellings of `OuterLoopId` and `Inner Loop Id` in debug log.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96650
2021-02-16 22:17:44 +08:00
Florian Hahn f64c626069
[VPlan] Remove unused Phi member from VPWidenPHIRecipe (NFC).
The member is not needed any longer after recent changes.
2021-02-16 13:53:06 +00:00
Kerry McLaughlin ba1e150d03 [SVE] Add support for scalable vectorization of loops with int/fast FP reductions
This patch enables scalable vectorization of loops with integer/fast reductions, e.g:

```
unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];
}
```

A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.

Reviewed By: david-arm, fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D95245
2021-02-16 13:50:06 +00:00
Sander de Smalen 00fe10c6a6 [SCEVExpander] Migrate costAndCollectOperands to use InstructionCost.
This patch changes costAndCollectOperands to use InstructionCost for
accumulated cost values.

isHighCostExpansion will return true if the cost has exceeded the budget.

Reviewed By: CarolineConcatto, ctetreau

Differential Revision: https://reviews.llvm.org/D92238
2021-02-16 09:27:34 +00:00
Florian Hahn 54a14c264a
[VPlan] Manage scalarized values using VPValues.
This patch updates codegen to use VPValues to manage the generated
scalarized instructions.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D92285
2021-02-16 09:04:10 +00:00
Akira Hatanaka 32dc79c5ef [ObjC][ARC] Do not perform code motion on precise release calls
This fixes a bug where an object can get deallocated before reaching the
end of its full formal lifetime.

rdar://72110887
rdar://74123176
2021-02-15 17:39:37 -08:00
Duncan P. N. Exon Smith 22a52dfddc TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto)
This commit fixes how metadata is handled in CloneModule to be sound,
and improves how it's handled in CloneFunctionInto (although the latter
is still awkward when called within a module).

Ruiling Song pointed out in PR48841 that CloneModule was changed to
unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in
fa35c1f80f for clarity). This flag papered
over a crash caused by other various changes made to CloneFunctionInto
over the past few years that made it unsound to use cloning between
different modules.

(This commit partially addresses PR48841, fixing the repro from
preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode
became unsound in df763188c9 and this
commit does not address that regression.)

RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use,
avoiding unnecessary clones of all referenced metadata when linking
between modules (with IRMover, the source module is discarded after
linking). It never makes sense to use when you're not discarding the
source. This commit drops its incorrect use in CloneModule.

Sadly, the right thing to do with metadata when cloning a function is
complicated, and this patch doesn't totally fix it.

The first problem is that there are two different types of referenceable
metadata and it's not obvious what to with one of them when remapping.

- `!0 = !{!1}` is metadata's version of a constant. Programatically it's
  called "uniqued" (probably a better term would be "constant") because,
  like `ConstantArray`, it's stored in uniquing tables. Once it's
  constructed, it's illegal to change its arguments.
- `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal
  to change the operands after construction.

What should be done with distinct metadata when cloning functions within
the same module?

- Should new, cloned nodes be created?
- Should all references point to the same, old nodes?

The answer depends on whether that metadata is effectively owned by a
function.

And that's the second problem. Referenceable metadata's ownership model
is not clear or explicit. Technically, it's all stored on an
LLVMContext. However, any metadata that is `distinct`, that transitively
references a `distinct` node, or that transitively references a
GlobalValue is specific to a Module and is effectively owned by it. More
specifically, some metadata is effectively owned by a specific Function
within a module.

Effectively function-local metadata was introduced somewhere around
c10d0e5ccd, which made it illegal for two
functions to share a DISubprogram attachment.

When cloning a function within a module, you need to clone the
function-local debug info and suppress cloning of global debug info (the
status quo suppresses cloning some global debug info but not all). When
cloning a function to a new/different module, you need to clone all of
the debug info.

Here's what I think we should do (eventually? soon? not this patch
though):
- Distinguish explicitly (somehow) between pure constant metadata owned
  by the LLVMContext, global metadata owned by the Module, and local
  metadata owned by a GlobalValue (such as a function).
- Update CloneFunctionInto to trigger cloning of all "local" metadata
  (only), perhaps by adding a bit to RemapFlag. Alternatively, split
  out a separate function CloneFunctionMetadataInto to prime the
  metadata map that callers are updated to call ahead of time as
  appropriate.

Here's the somewhat more isolated fix in this patch:
- Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to
  an enum called `CloneFunctionChangeType` that is one of
  LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule.
- The code maintaining the "functions uniquely own subprograms"
  invariant is now only active in the first two cases, where a function
  is being cloned within a single module. That's necessary because this
  code inhibits cloning of (some) "global" metadata that's effectively
  owned by the module.
- The code maintaining the "all compile units must be explicitly
  referenced by !llvm.dbg.cu" invariant is now only active in the
  DifferentModule case, where a function is being cloned into a new
  module in isolation.
- CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create
  uses LocalChangeOnly, since fa635d730f
  only set `ModuleLevelChanges` to trigger cloning of local metadata.
- CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs
  and special handling of !llvm.dbg.cu.
- Fixed some outdated header docs and left a couple of FIXMEs.

Differential Revision: https://reviews.llvm.org/D96531
2021-02-15 11:56:00 -08:00
Sjoerd Meijer 357237e93e Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode"
This reverts commit effc3b0799, with the build
problem fixed.
2021-02-15 11:33:00 +00:00
Max Kazantsev e3c759bd58 [LoopLoadElim] Pass ScalarEvolution in old pass manager. PR49141
Loop canonicalization may end up deleting blocks from CFG. And
Scalar Evolution may still keep cached referenced to those blocks
unless updated properly.
2021-02-15 18:08:23 +07:00
Sjoerd Meijer effc3b0799 Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode"
This reverts commit cd6de0e8de.
2021-02-15 11:01:23 +00:00
Sjoerd Meijer cd6de0e8de [TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode
This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into
getPreferredAddressingMode() so that we have one interface to steer LSR in
generating the preferred addressing mode.

Differential Revision: https://reviews.llvm.org/D96600
2021-02-15 10:44:15 +00:00
Florian Hahn 3df5d5aace [ConstraintElimination] Fix variables used for pattern matching.
Re-using the matched variable in the pattern does not work as expected.
This patch fixes that by introducing a new variable for the 2nd level
match.
2021-02-14 18:42:37 +00:00
Kazu Hirata 910e2d1e57 [llvm] Use llvm::is_contained (NFC) 2021-02-14 08:36:20 -08:00
Sanjay Patel b40fde062c [InstCombine] fold fdiv with pow divisor (PR49147)
This is unusual in the general (non-reciprocal) case because we need
an extra instruction, but that should be better for general FP
reassociation and codegen. We conservatively check for "arcp" FMF
here as we do with existing fdiv folds, but it is not strictly
necessary to have that.

This is part of solving:
https://llvm.org/PR49147
(The powi variant potentially has a different constraint.)

Differential Revision: https://reviews.llvm.org/D96648
2021-02-14 08:07:36 -05:00
Juneyoung Lee ed253ef772 [LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask
This patch fixes pr48832 by correctly generating the mask when a poison value is involved.

Consider this CFG (which is a part of the input):

```
for.body:                                         ; preds = %for.cond
  br i1 true, label %cond.false, label %land.rhs

land.rhs:                                         ; preds = %for.body
  br i1 poison, label %cond.end, label %cond.false

cond.false:                                       ; preds = %for.body, %land.rhs
  br label %cond.end

cond.end:                                         ; preds = %land.rhs, %cond.false
  %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ]

```

The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead.
The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation.

SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026).

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D95217
2021-02-14 21:12:34 +09:00
Teresa Johnson a80232bd5f [LTT] Address post-review comments (NFC)
Implement some post-review cleanup suggestions for D96083.
2021-02-13 15:52:59 -08:00
Tyker 642e9225c6 reland [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-13 13:03:11 +01:00
Wei Wang 80dc0661bd [LTO] Perform DSOLocal propagation in combined index
Perform DSOLocal propagation within summary list of every GV. This
avoids the repeated query of this information during function
importing.

Differential Revision: https://reviews.llvm.org/D96398
2021-02-12 22:58:26 -08:00
Arnold Schwaighofer e760ec2a01 [coro] Add support for polymorphic return typed coro.suspend.async
This allows for suspend point specific resume function types.

Return values from a suspend point can therefore be modelled as
arguments to the resume function. Allowing for directly passed return
types.

Differential Revision: https://reviews.llvm.org/D96136
2021-02-12 10:08:00 -08:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Kerry McLaughlin fea06efe7c [SVE][LoopVectorize] Support for vectorization of loops with function calls
Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs
and adds some simple tests for loops containing a function for which
there is a vectorized variant available.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96356
2021-02-12 13:47:43 +00:00
Sanjay Patel 79b1b4a581 [Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics
The vector reduction intrinsics started life as experimental ops, so backend support
was lacking. As part of promoting them to 1st-class intrinsics, however, codegen
support was added/improved:
D58015
D90247

So I think it is safe to now remove this complication from IR.

Note that we still have an IR-level codegen expansion pass for these as discussed
in D95690. Removing that is another step in simplifying the logic. Also note that
x86 was already unconditionally forming reductions in IR, so there should be no
difference for x86.

I spot checked a couple of the tests here by running them through opt+llc and did
not see any asm diffs.

If we do find functional differences for other targets, it should be possible
to (at least temporarily) restore the shuffle IR with the ExpandReductions IR
pass.

Differential Revision: https://reviews.llvm.org/D96552
2021-02-12 08:13:50 -05:00
Florian Hahn 85fe5c9345
[VPlan] Make VPRecipeBase inherit from VPUser directly (NFC).
The individual recipes have been updated to manage their operands using
VPUser a while back. Now that the transition is done, we can instead
make VPRecipeBase a VPUser and get rid of the toVPUser helper.
2021-02-12 13:06:58 +00:00
David Sherwood 01b87444cb [NFC][Analysis] Change struct VecDesc to use ElementCount
This patch changes the VecDesc struct to use ElementCount
instead of an unsigned VF value, in preparation for
future work that adds support for vectorized versions of
math functions using scalable vectors. Since all I'm doing
in this patch is switching the type I believe it's a
non-functional change. I changed getWidestVF to now return
both the widest fixed-width and scalable VF values, but
currently the widest scalable value will be zero.

Differential Revision: https://reviews.llvm.org/D96011
2021-02-12 11:07:58 +00:00
David Sherwood 9700228abc [Analysis] Change VFABI::mangleTLIVectorName to use ElementCount
Adds support for mangling TLI vector names for scalable vectors.

Differential Revision: https://reviews.llvm.org/D96338
2021-02-12 09:38:12 +00:00
Kazu Hirata 9dc62d1dc1 [PGO] Drop unnecessary const from return types (NFC) 2021-02-11 23:31:29 -08:00
Hongtao Yu de40f6d623 [CSSPGO] Process functions in a top-down order on a dynamic call graph.
Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining.

The profile top-down order for SCC is also extended to support non-CS profiles.

Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95988
2021-02-11 12:36:59 -08:00
Michael Kruse 606aa622b2 Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache"
This reverts commit b7d870eae7 and the
subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)"
(commit e6810cab09).

It caused indeterminism in the output, such that e.g. the
polly-x86_64-linux buildbot failed accasionally.
2021-02-11 12:17:38 -06:00
Sander de Smalen 703130fb01 [TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount
This will be needed in the loop-vectorizer where the minimum VF
requested may be a scalable VF. getMinimumVF now takes an additional
operand 'IsScalableVF' that indicates whether a scalable VF is required.

Reviewed By: kparzysz, rampitec

Differential Revision: https://reviews.llvm.org/D96020
2021-02-11 09:08:48 +00:00
Markus Lavin 9498315c9b Expand masked mem intrinsics correctly wrt big-endian
Need to take endianness into account when doing vector to scalar casts
such as %bc = bitcast <8 x i1> %v to i8
Companion commit for https://reviews.llvm.org/D94867
Upload in response to
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147862.html
Attempting to document the actual memory layout rules for vectors in
https://reviews.llvm.org/D94964

Differential Revision: https://reviews.llvm.org/D94765
2021-02-11 08:59:52 +00:00
Sander de Smalen be9bbb57f4 [LoopVectorize] NFC: Change selectVectorizationFactor to work on ElementCount.
This patch is NFC and changes occurrences of `unsigned Width`
and `unsigned i` to work on type ElementCount instead.

This patch is a preparatory patch with the ultimate goal of making
`computeMaxVF()` return both a max fixed VF and a max scalable VF,
so that `selectVectorizationFactor()` can pick the most cost-effective
vectorization factor.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96019
2021-02-11 08:47:59 +00:00
Kazu Hirata d12a0f4fc0 [GCOV] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2021-02-10 20:01:18 -08:00
Duncan P. N. Exon Smith fa35c1f80f ValueMapper: Rename RF_MoveDistinctMDs => RF_ReuseAndMutateDistinctMDs, NFC
Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and
`MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more
precisely describe its effect and clarify the header documentation.

Found this while helping to investigate PR48841, which pointed out an
unsound use of the flag in `CloneModule()`. For now I've just added a
FIXME there, but I'm hopeful that the new (more precise) name will
prevent other similar errors.
2021-02-10 16:53:21 -08:00
Benjamin Kramer 8fb4a4f7bb [SampleFDO] Silence -Wnon-virtual-dtor warning
There's no polymorphic deletion happening here.
2021-02-10 23:37:15 +01:00
Rong Xu db0d7d0ba9 [SampleFDO][NFC] Refactor SampleProfileLoader to reuse in CodeGen
Break SampleProfileLoader into to a base and a derived class.
Base class (SampleProfileLoaderBaseImpl) includes the common
code for IR and MachineIR (CodeGen) sample loader.
It will be templatelized in the later patch.

Inline and Probe related code will remain in the derived class of
SampleProfileLoader and stays in SampleProfile.cpp.

We need to refactor some functions:
(1) getInstWeight() to enable the code sharing -- put the core into
getInstWeightImpl().
(2) emitAnnotation() and propagateWeights() to carve out the code
specific to SampleProfileLoader.
(3) make getInstWeight() and findFunctionSamples() virtual and override
in SampleProfileLoader as they need to access the fields in the derived
class.

Differential Revision: https://reviews.llvm.org/D95832
2021-02-10 13:29:15 -08:00
Hongtao Yu 1cb47a063e [CSSPGO] Unblock optimizations with pseudo probe instrumentation.
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:

1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95982
2021-02-10 12:43:17 -08:00
Adrian Prantl 19fc8eede4 Add missing nullptr check.
salvageDebugInfoImpl() may fail and return a nullptr.
2021-02-10 12:15:24 -08:00
Sanjay Patel 6e2053983e [InstCombine] fold lshr(mul X, SplatC), C2
This is a special-case multiply that replicates bits of
the source operand. We need this fold to avoid regression
if we make canonicalization to `mul` more aggressive for
shl+or patterns.

I did not see a way to make Alive generalize the bit width
condition for even-number-of-bits only, but an example of
the proof is:
  Name: i32
  Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2))
  %m = mul nuw i32 %x, C1
  %t = lshr i32 %m, C2
  =>
  %t = and i32 %x, C1 - 2

  Name: i14
  %m = mul nuw i14 %x, 129
  %t = lshr i14 %m, 7
  =>
  %t = and i14 %x, 127

https://rise4fun.com/Alive/e52
2021-02-10 15:02:31 -05:00
Sander de Smalen 9db6e97a86 [LoopVectorize] NFC: Change computeFeasibleMaxVF to operate on ElementCount.
This patch is NFC and changes occurrences of `unsigned MaxVectorSize`
to work on type ElementCount.

This patch is a preparatory patch with the ultimate goal of making
`computeMaxVF()` return both a max fixed VF and a max scalable VF,
so that `selectVectorizationFactor()` can pick the most cost-effective
vectorization factor.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D96018
2021-02-10 08:52:10 +00:00
Tyker 5652e192fc Revert "[InstCombine] convert assumes to operand bundles"
This reverts commit 5eb2e994f9.
2021-02-10 01:32:00 +01:00
Florian Hahn fd8afa41eb
[VPlan] Use VPUser to manage CondBit
VP blocks keep track of a condition, which is a VPValue. This patch
updates VPBlockBase to manage the value using VPUser, so
replaceAllUsesWith properly updates the condition bit as well.

This is required to enable VP2VP transformations and it helps with
simplifying some of the code required to manage condition bits.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D95382
2021-02-09 21:53:50 +00:00
Johannes Doerfert 81429abd83 [Attributor][FIX] Do not create UB by introducing a `noundef undef`
This was reported as PR49104. The reproducer uses varargs but the issue
is the same, we know an argument is dead but can't change the signature
for some reason. The PR49104 situation was: We are in an CG-SCC
traversal and we remove all the uses of an argument and proof it thereby
dead. However, if we do not remove the argument, via signature rewrite,
we need to ensure that the `undef` we introduce at the call site doesn't
clash with a `noundef` attribute.
2021-02-09 13:02:38 -06:00
Tyker 5eb2e994f9 [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-09 19:33:53 +01:00
Jianzhou Zhao 9887fdebd6 [dfsan] Refactor loadShadow
To simplify the review of https://reviews.llvm.org/D95835.

Reviewed-by: gbalats, morehouse

Differential Revision: https://reviews.llvm.org/D96180
2021-02-09 17:21:41 +00:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Chuanqi Xu 88d7876e1e [NFC] [Coroutine] Remove Unused Variables 2021-02-09 15:55:06 +08:00
Kazu Hirata 302313a264 [Transforms] Use range-based for loops (NFC) 2021-02-08 22:33:53 -08:00
Kazu Hirata de6c49ae31 [Transforms/Utils] Drop unnecessary const from a return type (NFC)
Identified with const-return-type.
2021-02-08 22:33:49 -08:00
Jinsong Ji 9202806241 Revert "[CostModel] Remove VF from IntrinsicCostAttributes"
This reverts commit 502a67dd7f.

This expose a failure in test-suite build on PowerPC,
revert to unblock buildbot first,
Dave will re-commit in https://reviews.llvm.org/D96287.

Thanks Dave.
2021-02-09 02:14:14 +00:00
Arthur Eubanks 0eda454796 [SimpleLoopUnswitch] Don't non-trivially unswitch loops that are unsafe to clone
Non-trivial unswitching can clone loops.

The legacy -loop-unswitch pass also checks for this.

Fixes PR49085.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D96288
2021-02-08 13:19:24 -08:00
Jianzhou Zhao 64b448b983 [dfsan] Refactor visitCallBase
To simplify the review of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96177
2021-02-08 19:55:18 +00:00
Florian Hahn 68dc90b347
[ConstraintElimination] Decompose a few more GEP indices.
This patch adds handling for zero-extended GEP indices.
2021-02-08 18:06:38 +00:00
Florian Hahn 1f1f037ed3
[ConstraintElimination] Improve index handing during constraint building.
This patch improves the index management during constraint building.
Previously, the code rejected constraints which used values that were not
part of Value2Index, but after combining the coefficients of the new
indices were 0 (if ShouldAdd was 0).

In those cases, no new indices need to be added. Instead of adding to
Value2Index directly, add new indices to the NewIndices map. The caller
can then check if it needs to add any new indices.

This enables checking constraints like `a + x <= a + n` to `x <= n`,
even if there is no constraint for `a` directly.
2021-02-08 13:05:13 +00:00
Florian Hahn ca268ed285
[ConstraintElimination] Decompose zext for unsigned compares.
For unsigned compares, zext should be a no-op and we can add the
extended value to the constraint system.
2021-02-07 20:53:06 +00:00
Florian Hahn 3bb6dc0b26
[LV] Replace some uses of VectorLoopValueMap with VPTransformState (NFC)
This patch updates some places where VectorLoopValueMap is accessed
directly to instead go through VPTransformState.

As we move towards managing created values exclusively in VPTransformState,
this ensures the use always can fetch the correct value.

This is in preparation for D92285, which switches to managing scalarized
values through VPValues.

In the future, the various fix* functions should be moved directly into
the VPlan codegen stage.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D95757
2021-02-07 18:28:21 +00:00
Kazu Hirata be23012d5a [Transforms/Utils] Use range-based for loops (NFC) 2021-02-07 09:49:36 -08:00
Sanjay Patel 6fd91be354 [Reassociate] allow or->add with shl operands
As discussed in:
https://llvm.org/PR49055

We invert instcombine's add->or transform here
because it makes it easier to identify factorization
transforms like the mul in the motivating test.

This extends the logic added with:
https://reviews.llvm.org/rG70472f3
https://reviews.llvm.org/rG93f3d7f

(I intentionally kept the formatting fix in this patch
to provide more context about the calling logic.)
2021-02-07 09:45:19 -05:00
Florian Hahn 853c52c988
[ConstraintElimination] Require GEPs to be inbounds for decomposition.
During decomposition, we assume the no-wrap properties of inbound GEPs.
Make sure the decomposed GEP is actually inbounds.
2021-02-07 11:08:53 +00:00
Johannes Doerfert b7d870eae7 [AssumptionCache] Avoid dangling llvm.assume calls in the cache
PR49043 exposed a problem when it comes to RAUW llvm.assumes. While
D96106 would fix it for GVNSink, it seems a more general concern. To
avoid future problems this patch moves away from the vector of weak
reference model used in the assumption cache. Instead, we track the
llvm.assume calls with a callback handle which will remove itself from
the cache if the call is deleted.

Fixes PR49043.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96168
2021-02-06 12:18:39 -06:00
Teresa Johnson 3a27933ec2 [LTT] Don't attempt to lower type tests used only by assumes
Type tests used only by assumes were original for devirtualization, but
are meant to be kept through the first invocation of LTT so that they
can be used for additional optimization. In the regular LTO case where
the IR is analyzed we may find a resolution for the type test and end up
rewriting the associated vtable global, which can have implications on
section splitting. Simply ignore these type tests.

Fixes PR48245.

Differential Revision: https://reviews.llvm.org/D96083
2021-02-06 09:02:10 -08:00
Sander de Smalen 79a6cfc29e NFC: Migrate LoopIdiomRecognize to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
2021-02-06 14:39:19 +00:00
Sander de Smalen ae27274b2f NFC: Migrate LoopFlatten to work on InstructionCost.
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96029
2021-02-06 11:47:04 +00:00
Kazu Hirata ea3175c15b [Transforms/Instrumentation] Use range-based for loops (NFC) 2021-02-05 21:02:08 -08:00
Sidharth Baveja 22ebbc4765 LoopUnrollAndJam] Only allow loops with single exit(ing) blocks
Summary:
This resolves an issue posted on Bugzilla. https://bugs.llvm.org/show_bug.cgi?id=48764
In this issue, the loop had  multiple exit blocks, which resulted in the
function getExitBlock to return a nullptr, which resulted in hitting the assert.
This patch ensures that loops which only have one exit block as allowed to be
unrolled and jammed.

Reviewed By: Whitney, Meinersbur, dmgreen

Differential Revision: https://reviews.llvm.org/D95806
2021-02-05 16:10:53 +00:00
Akira Hatanaka 4a64d8fe39 [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.

Original commit message:

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 06:09:42 -08:00
Arnold Schwaighofer 8a7f5ad0fd We can only move static allocas into the resume entry points
Dynamic allocas that still exist have been verified to be only used
'locally' not accross a suspend point.

rdar://73903220

Differential Revision: https://reviews.llvm.org/D96071
2021-02-05 06:06:10 -08:00
Akira Hatanaka 2fbbb18c1d Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 3fe3946d9a.

The commit violates layering by including a header from Analysis in
lib/IR/AutoUpgrade.cpp.
2021-02-05 06:00:05 -08:00
Akira Hatanaka 3fe3946d9a [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 05:55:18 -08:00
Adrian Kuegel 7fe41ac3df Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"
This reverts commit 3e5ce49e53.

Tests started failing on PPC, for example:
http://lab.llvm.org:8011/#/builders/105/builds/5569
2021-02-05 12:51:03 +01:00
Simon Pilgrim f7d07dbb29 IROutliner.cpp - fix Wdocumentation warning. NFCI.
Remove duplicate param
2021-02-05 11:38:09 +00:00
Simon Pilgrim 476b912e7c SampleProfile.cpp - fix Wdocumentation warning. NFCI.
Remove duplicate param
2021-02-05 11:31:17 +00:00
Simon Pilgrim 89edda7084 IROutliner.cpp - fix Wdocumentation warnings. NFCI. 2021-02-05 11:21:00 +00:00
David Green 502a67dd7f [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-05 09:34:24 +00:00
Kazu Hirata fb74e1e78a [Transforms/Scalar] Use range-based for loops (NFC) 2021-02-04 21:18:05 -08:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Philip Reames 3e5ce49e53 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute
If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the change that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-02-04 17:28:30 -08:00
Richard Smith ab243efb26 Don't infer attributes on '::operator new'.
These attributes were all incorrect or inappropriate for LLVM to infer:
- inaccessiblememonly is generally wrong; user replacement operator new
  can access memory that's visible to the caller, as can a new_handler
  function.
- willreturn is generally wrong; a custom new_handler is not guaranteed
  to terminate.
- noalias is inappropriate: Clang has a flag to determine whether this
  attribute should be present and adds it itself when appropriate.
- noundef and nonnull on the return value should be specified by the
  frontend on all 'operator new' functions if we want them, not here.

In any case, inferring attributes on functions declared 'nobuiltin' (as
these are when Clang emits them) seems questionable.
2021-02-04 13:59:49 -08:00
Richard Smith 1484ad4137 Revert "[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete"
Several of the new attributes here were incorrect, and even the ones
that are generally correct were being added even to nobuiltin calls.

This reverts commit bb3f169b59.
2021-02-04 13:59:49 -08:00
Sander de Smalen 75b2555d6e NFC: Migrate LoopUnrollPass to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D95817
2021-02-04 14:05:40 +00:00
Florian Hahn 703f6a6828
[ConstraintElimination] Support conditions from loop preheaders
This patch extends the condition collection logic to allow adding
conditions from pre-headers to loop headers, by allowing cases where the
target block dominates some of its predecessors.
2021-02-04 13:58:32 +00:00
Chuanqi Xu 9511fa2dda [NFC][Coroutine] Remove redundant comment
The functionallity in the TODO was added before:
https://reviews.llvm.org/rGb3a722e66b75328ab5e2eb5c8572022cb083855b
2021-02-04 12:54:30 +08:00
Kazu Hirata be37475897 [Transforms/IPO] Use range-based for loops (NFC) 2021-02-03 20:41:20 -08:00
Nico Weber b995314143 Revert "[InstrProfiling] Use !associated metadata for counters, data and values"
This reverts commit 97ba5cde52.
Still breaks tests: https://reviews.llvm.org/D76802#2540647
2021-02-03 19:14:34 -05:00
Arthur Eubanks f020544601 [NewPM][HelloWorld] Move HelloWorld to Utils
To prevent creating a new component, which creates a new library.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D95907
2021-02-03 12:59:40 -08:00
Rong Xu b8f13db5b7 [SampleFDO][NFC] Detach SampleProfileLoader from SampleCoverageTracker
This patch detaches SampleProfileLoader from class
SampleCoverageTracker. We plan to move SampleProfileLoader
to a template class. This would remain SampleCoverageTracker
as a class.
Also make callsiteIsHot() as a file static function.

Differential Revision: https://reviews.llvm.org/D95823
2021-02-03 11:38:04 -08:00
Florian Hahn daaa0e3501
[VPlan] Manage induction value creation using VPValues.
This patch updates the induction value creation to use VPValues of
recipes to map the created values. This should bring is one step closer
to being able to optimize induction recipes directly in VPlan.

Currently widenIntOrFpInduction also generates vector values for a cast
of the induction, if it exists. Make this explicit by adding the cast
instruction to the values defined by the recipe.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D92284
2021-02-03 17:45:03 +00:00
David Sherwood d4626eb0bd [VPlan][NFC] Introduce constructors for VPIteration
This patch adds constructors to VPIteration as a cleaner way of
initialising the struct and replaces existing constructions of
the form:

  {Part, Lane}

with

  VPIteration(Part, Lane)

I have also added a default constructor, which is used by VPlan.cpp
when deciding whether to replicate a block or not.

This refactoring will be required in a later patch that adds more
members and functions to VPIteration.

Differential Revision: https://reviews.llvm.org/D95676
2021-02-03 08:52:27 +00:00
Petr Hosek 97ba5cde52 [InstrProfiling] Use !associated metadata for counters, data and values
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.

The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.

Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.

Differential Revision: https://reviews.llvm.org/D76802
2021-02-02 23:19:51 -08:00
Kazu Hirata dc3d5453bc [Transforms/Utils] Use range-based for loops (NFC) 2021-02-02 22:52:47 -08:00
Florian Hahn d8e90716df
[ConstraintElimination] Skip pointer casts.
We should be able to look through pointer casts that do not impact the
value.
2021-02-02 21:25:29 +00:00
Hongtao Yu 3d89b3cbec [CSSPGO] Introducing distribution factor for pseudo probe.
Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count.

This change also introduces a pseudo probe verifier that can be run after each IR passes to detect duplicated pseudo probes.

A saturated distribution factor stands for 1.0. A pesudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated to each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead.

Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees’s probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple callisites. The original samples should be distributed across them. This is fixed by adjusting the callisites' distribution factors.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D93264
2021-02-02 11:55:01 -08:00
Fangrui Song 51da12680f [ConstraintElimination] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build 2021-02-02 10:23:14 -08:00
Jeroen Dobbelaere 50c523a9d4 [InlineFunction] Only update noalias scopes once for an instruction.
Inlining sometimes maps different instructions to be inlined onto the same instruction.

We must ensure to only remap the noalias scopes once. Otherwise the scope might disappear (at best).
This patch ensures that we only replace scopes for which the mapping is known.

This approach is preferred over tracking which instructions we already handled in a SmallPtrSet,
as that one will need more memory.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95862
2021-02-02 17:57:10 +01:00
Florian Hahn 3e09bc2500
[ConstraintElimination] Add nicer way to dump constraints (NFC).
Use ConstraintSystem::dump(Names) to display the result of decomposing a
condition.
2021-02-02 16:36:45 +00:00
Wenlei He 1645f465be [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline
Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path.

This is resubmit of D95024, with build break and overtighten assertion fixed.

Test Plan:
2021-02-02 07:55:08 -08:00
Roman Lebedev 485c4b552b
[InstCombine] Host inversion out of ashr's value operand (PR48995)
This is a yet another hint that we will eventually need InstCombineInverter,
which would consistently sink inversions, but but for that we'll need
to consistently hoist inversions where possible, so let's do that here.

Example of a proof: https://alive2.llvm.org/ce/z/78SbDq

See https://bugs.llvm.org/show_bug.cgi?id=48995
2021-02-02 17:56:43 +03:00
Tom Weaver 4f1320b77d Revert "[InstrProfiling] Use !associated metadata for counters, data and values"
This reverts commit df3e39f60b.

introduced failing test instrprof-gc-sections.c
causing build bot to fail:
http://lab.llvm.org:8011/#/builders/53/builds/1184
2021-02-02 14:19:31 +00:00
Sander de Smalen 3d3ca8f8eb NFC: Migrate SpeculateAroundPHIs to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: ctetreau

Differential Revision: https://reviews.llvm.org/D95353
2021-02-02 13:32:45 +00:00
Sander de Smalen 00da322788 NFC: Migrate SimpleLoopUnswitch to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D95352
2021-02-02 13:32:44 +00:00
Adrian Kuegel 48ca6da9d2 Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline"
This reverts commit 9a03058d63.
2021-02-02 11:51:04 +01:00
Adrian Kuegel 3a65ec4bf9 Revert "Fix build break from D95024"
This reverts commit 09cd849fde.
2021-02-02 11:51:04 +01:00
David Sherwood d4d4ceeb8f [SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE
This patch updates IRBuilder::CreateMaskedGather/Scatter to work
with ScalableVectorType and adds isLegalMaskedGather/Scatter functions
to AArch64TargetTransformInfo. In addition I've fixed up
isLegalMaskedLoad/Store to return true for supported scalar types,
since this is what the vectorizer asks for.

In LoopVectorize.cpp I've changed
LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid
cost for scalable vectors, since currently this relies upon using shuffle
vector for reversing vectors. In addition, in
LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed
that the cost of scalarising memory ops is infinitely expensive.

I have added some simple masked load/store and gather/scatter tests,
including cases where we use gathers and scatters for conditional invariant
loads and stores.

Differential Revision: https://reviews.llvm.org/D95350
2021-02-02 09:52:39 +00:00
Wenlei He 09cd849fde Fix build break from D95024 2021-02-02 01:01:12 -08:00
Wenlei He 9a03058d63 [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline
Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path.

Test Plan:

Differential Revision: https://reviews.llvm.org/D95024
2021-02-02 00:34:06 -08:00
Wenlei He 6bae5973c4 [CSSPGO] Call site prioritized inlining for sample PGO
This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`).

Motivation

With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO.
 - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat.
 - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately.
 - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO.

New FDO Inliner

Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed.
 - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655.
 - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check.
 - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites.
 - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path.

Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO).

Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO.

Tunings and knobs

I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO.

Results

Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it).

Differential Revision: https://reviews.llvm.org/D94001
2021-02-01 23:46:34 -08:00
Gil Rapaport d475030dc2 [SCEV] Apply loop guards to divisibility tests
Extend applyLoopGuards() to take into account conditions/assumes proving some
value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the
loop unroller and the loop vectorizer identify more loops as not requiring
remainder loops.

Differential Revision: https://reviews.llvm.org/D95521
2021-02-02 08:09:39 +02:00
Petr Hosek df3e39f60b [InstrProfiling] Use !associated metadata for counters, data and values
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.

The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.

Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.

Differential Revision: https://reviews.llvm.org/D76802
2021-02-01 15:01:43 -08:00
Hongtao Yu 224fee8219 [CSSPGO] Tweaking inlining with pseudo probes.
Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites.

Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95791
2021-02-01 13:56:40 -08:00
Sanjay Patel bbed5f2f8a [LoopVectorize] improve IR fast-math-flags propagation in reductions
This is another step (see D95452) towards correcting fast-math-flags
bugs in vector reductions.

There are multiple bugs visible in the test diffs, and this is still
not working as it should. We still use function attributes (rather
than FMF) to drive part of the logic, but we are not checking for
the correct FP function attributes.

Note that FMF may not be propagated optimally on selects (example
in https://llvm.org/PR35607 ). That's why I'm proposing to union the
FMF of a fcmp+select pair and avoid regressions on existing vectorizer
tests.

Differential Revision: https://reviews.llvm.org/D95690
2021-02-01 16:21:36 -05:00
Florian Hahn 0b28d756af
[ConstraintElimination] Add support for EQ predicates.
A == B map to A >= B && A <= B
(https://alive2.llvm.org/ce/z/_dwxKn).

This extends the constraint construction to return a list of
constraints, which can be used to properly de-compose nested AND & OR.
2021-02-01 20:48:31 +00:00
Michael Holman 8bfef78722 [ConstantHoisting] Fix bug where constant materialization could insert into EH pad
If the incoming block to a phi node is an EH pad, then we will
materialize into an EH pad, which is not supposed to happen. To fix
this, I added a check to see if incoming block of a phi node is an EH
pad before using it as the insertion point.

Differential Revision: https://reviews.llvm.org/D95019
2021-02-01 11:23:56 -08:00
Sanjay Patel 0ce2920f17 [InstCombine] try to narrow min/max intrinsics with constant operand
The constant trunc/ext may not be the optimal pre-condition,
but I think that handles the common cases.

Example of Alive2 proof:
https://alive2.llvm.org/ce/z/sREeLC

This is another step towards canonicalizing to the intrinsics.
Narrowing was identified as source of potential regression for
abs(), so we need to handle this for min/max - see:
https://llvm.org/PR48816

If this is not enough, we could process intrinsics in
the trunc-driven matching in canEvaluateTruncated().
2021-02-01 13:44:13 -05:00
Florian Hahn ce190e4144
[ConstraintElimination] Negate IR condition directly.
Instead of using ConstraintSystem::negate when adding new constraints,
flip the condition in IR.

The main advantage is that EQ predicates can be represented by 2
constraints, which makes negating based on the constraint tricky. The IR
condition can easily negated.
2021-02-01 17:21:40 +00:00
Sander de Smalen bf294953e7 NFC: Migrate SimplifyCFG to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D95351
2021-02-01 16:14:05 +00:00
Sander de Smalen 880b64aa22 [SimplifyCFG] NFC: Rename static methods to clang-tidy standards.
This patch is a precursor to D95351, which changes the signature
of these methods.
2021-02-01 16:14:05 +00:00
Cullen Rhodes 8cda227432 [LV] Fix crash when computing max VF too early
D90687 introduced a crash:

  llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int):
    Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
    "No decisions should have been taken at this point"' failed.

when compiling the following C code:

  typedef struct {
  char a;
  } b;

  b *c;
  int d, e;

  int f() {
    int g = 0;
    for (; d; d++) {
      e = 0;
      for (; e < c[d].a; e++)
        g++;
    }
    return g;
  }

with:

  clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o -

This occurred since prior to D90687 computeFeasibleMaxVF would only be
called in computeMaxVF when a scalar epilogue was allowed, but now it's
always called. This causes the assert above since computeFeasibleMaxVF
collects all viable VFs larger than the default MaxVF, and for each VF
calculates the register usage which results in analysis being done the
assert above guards against. This can occur in computeFeasibleMaxVF if
TTI.shouldMaximizeVectorBandwidth and this target hook is implemented in
the hexagon backend to always return true.

Reported by @iajbar.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94869
2021-02-01 12:14:59 +00:00
Sander de Smalen 3b8a1d581e NFC: Migrate SpeculativeExecution to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D95356
2021-02-01 12:13:23 +00:00
Florian Hahn a9583a1923
[LoopUnswitch] Pacify compiler warnings.
Attempt to fix some compiler warnings on some bots after
b8c81fa5c7.
2021-02-01 09:13:06 +00:00
Florian Hahn b8c81fa5c7
[LoopUnswitch] Add shortcut if unswitched path is a no-op.
If we determine that the invariant path through the loop has no effects,
we can directly branch to the exit block, instead to unswitching first.

Besides avoiding some extra work (unswitching first, then deleting the
loop again) this allows to be more aggressive than regular unswitching
with respect to cost-modeling. This approach should always be be
desirable.

This is similar in spirit to D93734, just that it uses the previously
added checks for loop-unswitching.

I tried to add the required no-op checks from scratch, as we only check
a subset of the loop. There is potential to unify the checks with
LoopDeletion, at the cost of adding a predicate whether a block should
be considered.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95468
2021-02-01 09:03:30 +00:00
Jeroen Dobbelaere 80cdd30eb9 [LoopPeel] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed.
The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95544
2021-02-01 10:01:17 +01:00
Kazu Hirata 3d1200b9f6 [llvm] Drop unnecessary const from return types (NFC)
Identified with const-return-type.
2021-01-31 10:23:43 -08:00
Florian Hahn 39486753d5
[ConstraintElimination] Verify CS and DFSInStack are in sync.(NFC)
After the main loop is done, we should have one constraint per item in
DFSInStack. Otherwise we added a constraint without a proper DFSInStack
item.
2021-01-30 18:27:04 +00:00
Florian Hahn 10c57268c0 [LoopUnswitch] Properly update MSSA if header has non-clobbering stores.
This patch fixes updating MemorySSA if the header contains memory
defs that do not clobber a duplicated instruction. We need to find the
first defining access outside the loop body and use that as defining
access of the duplicated instruction.

This fixes a crash caused by bee486851c.
2021-01-30 13:51:05 +00:00
Kazu Hirata 8ed1636184 [llvm] Use isa instead of dyn_cast (NFC) 2021-01-29 23:23:37 -08:00
Roman Lebedev c2534a7097
[ShadowStackGCLowering] Preserve Dominator Tree, if avaliable
This doesn't help avoid any Dominator Tree recalculations just yet,
there's one more pass to go..
2021-01-30 01:14:51 +03:00
Roman Lebedev a78d8feb48
[LowerConstantIntrinsics] Preserve Dominator Tree, if avaliable 2021-01-30 01:14:50 +03:00
Sriraman Tallam 9a81a4ef79 Emit metadata when instr. profiles hash mismatch occurs.
This patch emits "instr_prof_hash_mismatch" function annotation metadata if
there is a hash mismatch while applying instrumented profiles.

During the PGO optimized build using instrumented profiles, if the CFG of
the function has changed since generating the profile, a hash mismatch is
encountered. This patch emits this information as annotation metadata. We
plan to use this with Propeller which is done at the machine IR level.
Propeller is usually applied on top of PGO and a hash mismatch during
PGO could be used to detect source drift.

Differential Revision: https://reviews.llvm.org/D95495
2021-01-29 12:56:01 -08:00
Florian Hahn f3a710cade [LTO] Update splitCodeGen to take a reference to the module. (NFC)
splitCodeGen does not need to take ownership of the module, as it
currently clones the original module for each split operation.

There is an ~4 year old fixme to change that, but until this is
addressed, the function can just take a reference to the module.

This makes the transition of LTOCodeGenerator to use LTOBackend a bit
easier, because under some circumstances, LTOCodeGenerator needs to
write the original module back after codegen.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D95222
2021-01-29 11:53:11 +00:00
Yang Fan 59bd2068e9
[NFC][ScalarizeMaskedMemIntrin] Fix unused variable warning
GCC warning:
```
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedStore(llvm::CallInst*, llvm::DomTreeUpdater*, bool&)’:
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:295:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable]
  295 |   BasicBlock *IfBlock = CI->getParent();
      |               ^~~~~~~
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedScatter(llvm::CallInst*, llvm::DomTreeUpdater*, bool&)’:
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:555:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable]
  555 |   BasicBlock *IfBlock = CI->getParent();
      |               ^~~~~~~
```
2021-01-29 15:15:58 +08:00
Roman Lebedev 056385921d
[ScalarizeMaskedMemIntrin] Preserve Dominator Tree, if avaliable
This de-pessimizes the arguably more usual case of no masked mem intrinsics,
and gets rid of one more Dominator Tree recalculation.

As per llvm/test/CodeGen/X86/opt-pipeline.ll,
there's one more Dominator Tree recalculation left, we could get rid of.
2021-01-29 01:11:36 +03:00
Roman Lebedev 577fdcaa93
[PartiallyInlineLibCalls] Preserve Dominator Tree, if avaliable
This doesn't get rid of any Dominator Tree recalculations just yet,
there is one more pass to update..
2021-01-29 01:11:36 +03:00
Roman Lebedev 573f74117b
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedCompressStore(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:35 +03:00
Roman Lebedev 2e4bb3f119
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedExpandLoad(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:35 +03:00
Roman Lebedev e8efc03a1e
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedScatter(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:35 +03:00
Roman Lebedev 1356399a11
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedGather(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:34 +03:00
Roman Lebedev 22b8421156
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedStore(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:34 +03:00
Roman Lebedev 0ea45a412a
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedLoad(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:34 +03:00
Roman Lebedev 394685481c
[NFC][PartiallyInlineLibCalls] Port to SplitBlockAndInsertIfThen()
This makes follow-up patch for Dominator Tree preservation
somewhat more straight-forward.
2021-01-29 01:11:33 +03:00
Roman Lebedev 2de2d84ed0
[NFC][EntryExitInstrumenter] Mark Dominator Tree as preserved in legacy-PM too
This is correctly handled in new-PM wrappers, but not in old-PM.
2021-01-29 01:11:33 +03:00
Adrian Prantl 62140d943c Better document the limitations of coro::salvageDebugInfo()
and fix a few edge cases that show up in the Swift compiler but
weren't caught by the existing tests. Most notably the old code wasn't
salvaging load operations correctly. The patch also gets rid of the
LoadFromFramePtr argument and replaces it with a more generalized
mechanism.
2021-01-28 09:53:19 -08:00
Roman Lebedev 8cfa963463
[SimplifyCFG] If provided, preserve Dominator Tree
SimplifyCFG is an utility pass, and the fact that it does not
preserve DomTree's, forces it's users to somehow workaround that,
likely by not preserving DomTrees's themselves.

Indeed, simplifycfg pass didn't know how to preserve dominator tree,
it took me just under a month (starting with e113317958)
do rectify that, now it fully knows how to,
there's likely some problems with that still,
but i've dealt with everything i can spot so far.

I think we now can flip the switch.

Note that this is functionally an NFC change,
since this doesn't change the users to pass in the DomTree,
that is a separate question.

Reviewed By: kuhar, nikic

Differential Revision: https://reviews.llvm.org/D94827
2021-01-28 14:11:34 +03:00
Yang Fan 8644eb024b
[NFC][Transforms][Coroutines] Remove unused variable 2021-01-28 16:42:30 +08:00
Kazu Hirata 0da15ea581 [llvm] Use append_range (NFC) 2021-01-27 23:25:41 -08:00
Hongtao Yu 7e99bddfea [CSSPGO] Support of CS profiles in extended binary format.
This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95547
2021-01-27 21:29:46 -08:00
Teresa Johnson 1487747e99 [LTO] Prevent devirtualization for symbols dynamically exported
Identify dynamically exported symbols (--export-dynamic[-symbol=],
--dynamic-list=, or definitions needed to preempt shared objects) and
prevent their LTO visibility from being upgraded.
This helps avoid use of whole program devirtualization when there may
be overrides in dynamic libraries.

Differential Revision: https://reviews.llvm.org/D91583
2021-01-27 15:54:13 -08:00
Sanjay Patel ab93c18c12 [LoopVectorize] use IR fast-math-flags exclusively (not FP function attributes)
I am trying to untangle the fast-math-flags propagation logic
in the vectorizers (see a6f022127 for SLP).

The loop vectorizer has a mix of checking FP function attributes,
IR-level FMF, and just wrong assumptions.

I am trying to avoid regressions while fixing this, and I think
the IR-level logic is good enough for that, but it's hard to say
for sure. This would be the 1st step in the clean-up.

The existing test that I changed to include 'fast' actually shows
a miscompile: the function only had the equivalent of nnan, but we
created new instructions that had fast (all FMF set). This is
similar to the example in https://llvm.org/PR35538

Differential Revision: https://reviews.llvm.org/D95452
2021-01-27 14:17:11 -05:00
Fangrui Song 54fb3ca96e [ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags
Imported functions and variable get the visibility from the module supplying the
definition.  However, non-imported definitions do not get the visibility from
(ELF) the most constraining visibility among all modules (Mach-O) the visibility
of the prevailing definition.

This patch

* adds visibility bits to GlobalValueSummary::GVFlags
* computes the result visibility and propagates it to all definitions

Protected/hidden can imply dso_local which can enable some optimizations (this
is stronger than GVFlags::DSOLocal because the implied dso_local can be
leveraged for ELF -shared while default visibility dso_local has to be cleared
for ELF -shared).

Note: we don't have summaries for declarations, so for ELF if a declaration has
the most constraining visibility, the result visibility may not be that one.

Differential Revision: https://reviews.llvm.org/D92900
2021-01-27 10:43:51 -08:00
Florian Hahn 28410d17f5
[LoopUtils] Pass SCEVExpander instead SE to addRuntimeChecks.
This gives the user control over which expander to use, which in turn
allows the user to decide what to do with the expanded instructions.

Used in D75980.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D94295
2021-01-27 17:36:19 +00:00
Petr Hosek bb9eb19829 Support for instrumenting only selected files or functions
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.

Differential Revision: https://reviews.llvm.org/D94820
2021-01-26 17:13:34 -08:00
Adrian Prantl 0554541b44 Salvage debug info for function arguments in coro-split funclets.
This patch improves the availability for variables stored in the
coroutine frame by emitting an alloca to hold the pointer to the frame
object and rewriting dbg.declare intrinsics to point inside the frame
object using salvaged DIExpressions. Finally, a new alloca is created
in the funclet to hold the FramePtr pointer to ensure that it is
available throughout the entire function at -O0.

This path also effectively reverts D90772. The testcase updates
highlight nicely how every removed CHECK for a dbg.value is preceded
by a new CHECK for a dbg.declare.

Thanks to JunMa, Yifeng, and Bruno for their thoughtful reviews!

Differential Revision: https://reviews.llvm.org/D93497

rdar://71866936
2021-01-26 15:01:26 -08:00
Bjorn Pettersson a9bd3d37bd [NewPM] Add ExtraVectorizerPasses support
As it looks like NewPM generally is using SimpleLoopUnswitch
instead of LoopUnswitch, this patch also use SimpleLoopUnswitch
in the ExtraVectorizerPasses sequence (compared with LegacyPM
which use the LoopUnswitch pass).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D95457
2021-01-26 22:59:10 +01:00
Valery N Dmitriev 716b9dd0d8 [InstCombine] Preserve FMF for powi simplifications.
Differential Revision: https://reviews.llvm.org/D95455
2021-01-26 13:26:06 -08:00
Petr Hosek 1e634f3952 Revert "Support for instrumenting only selected files or functions"
This reverts commit 4edf35f11a because
the test fails on Windows bots.
2021-01-26 12:25:28 -08:00
Petr Hosek 4edf35f11a Support for instrumenting only selected files or functions
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.

Differential Revision: https://reviews.llvm.org/D94820
2021-01-26 11:11:39 -08:00
Sanjay Patel 09b1c56366 [LoopUtils] do not initialize Cmp predicate unnecessarily; NFC
The switch must set the predicate correctly; anything else
should lead to unreachable/assert.

I'm trying to fix FMF propagation here and the callers,
so this is a preliminary cleanup.
2021-01-26 11:22:51 -05:00
Florian Hahn 1272f16d14
[LoopUnswitch] Avoid partially unswitching too aggressively.
This patch adds additional checks to avoid partial unswitching
in cases where it won't be profitable, e.g. because the path directly
exits the loop anyways.
2021-01-26 15:18:41 +00:00
Florian Hahn 35b3989a30
[Passes] Run peeling as part of simple/full loop unrolling.
Loop peeling removes conditions from loop bodies that become invariant
after a small number of iterations. When triggered, this leads to fewer
compares and possibly PHIs in loop bodies, enabling further
optimizations. The current cost-model of loop peeling should be quite
conservative/safe, i.e. only peel if a condition in the loop becomes
known after peeling.

For example, see PR47671, where loop peeling enables vectorization by
removing a PHI the vectorizer does not understand. Granted, the
loop-vectorizer could also be taught about constant PHIs, but loop
peeling is likely to enable other optimizations as well.

This has an impact on quite a few benchmarks from
MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example

    Same hash: 186 (filtered out)
    Remaining: 51
    Metric: loop-vectorize.LoopsVectorized

    Program                                        base   patch  diff
     test-suite...ve-susan/automotive-susan.test     8.00   9.00 12.5%
     test-suite...nal/skidmarks10/skidmarks.test    35.00  31.00 -11.4%
     test-suite...lications/sqlite3/sqlite3.test    41.00  43.00  4.9%
     test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test    25.00  26.00  4.0%
     test-suite...006/450.soplex/450.soplex.test    88.00  89.00  1.1%
     test-suite...TimberWolfMC/timberwolfmc.test   120.00 119.00 -0.8%
     test-suite.../CINT2006/403.gcc/403.gcc.test   215.00 216.00  0.5%
     test-suite...006/447.dealII/447.dealII.test   957.00 958.00  0.1%
     test-suite...ternal/HMMER/hmmcalibrate.test    75.00  75.00  0.0%

    Same hash: 186 (filtered out)
    Remaining: 51
    Metric: loop-vectorize.LoopsAnalyzed

    Program                                        base    patch   diff
     test-suite...ks/Prolangs-C/agrep/agrep.test   440.00  434.00  -1.4%
     test-suite...nal/skidmarks10/skidmarks.test   312.00  308.00  -1.3%
     test-suite...marks/7zip/7zip-benchmark.test   6399.00 6323.00 -1.2%
     test-suite...lications/minisat/minisat.test   134.00  135.00   0.7%
     test-suite...rks/FreeBench/pifft/pifft.test   295.00  297.00   0.7%
     test-suite...TimberWolfMC/timberwolfmc.test   1879.00 1869.00 -0.5%
     test-suite...pplications/treecc/treecc.test   689.00  691.00   0.3%
     test-suite...T2000/300.twolf/300.twolf.test   1593.00 1597.00  0.3%
     test-suite.../Benchmarks/Bullet/bullet.test   1394.00 1392.00 -0.1%
     test-suite...ications/JM/ldecod/ldecod.test   1431.00 1429.00 -0.1%
     test-suite...6/464.h264ref/464.h264ref.test   2229.00 2230.00  0.0%
     test-suite...lications/sqlite3/sqlite3.test   2590.00 2589.00 -0.0%
     test-suite...ications/JM/lencod/lencod.test   2732.00 2733.00  0.0%
     test-suite...006/453.povray/453.povray.test   3395.00 3394.00 -0.0%

Note the -11% regression in number of loops vectorized for skidmarks. I
suspect this corresponds to the fact that those loops are gone now (see
the reduction in number of loops analyzed by LV).

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D88471
2021-01-26 13:52:30 +00:00
Sergey Dmitriev 13cedcaf45 [llvm-link] Fix crash when materializing appending global
This patch fixes llvm-link crash when materializing global variable
with appending linkage and initializer that depends on another
global with appending linkage.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D95329
2021-01-25 18:08:07 -08:00
modimo ce7f9cdb50 [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks
This change leverages the work done in D83743 to replay in the SampleProfile inliner to also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only.

The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics to be separated from this behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining.

In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places.

Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay.

Testing:
ninja check-llvm
newly added test correctly replays CGSCC inlining decisions

Reviewed By: mtrofin, wenlei

Differential Revision: https://reviews.llvm.org/D94334
2021-01-25 15:38:57 -08:00
Nikita Popov 835104a114 [LSR] Drop potentially invalid nowrap flags when switching to post-inc IV (PR46943)
When LSR converts a branch on the pre-inc IV into a branch on the
post-inc IV, the nowrap flags on the addition may no longer be valid.
Previously, a poison result of the addition might have been ignored,
in which case the program was well defined. After branching on the
post-inc IV, we might be branching on poison, which is undefined behavior.

Fix this by discarding nowrap flags which are not present on the SCEV
expression. Nowrap flags on the SCEV expression are proven by SCEV
to always hold, independently of how the expression will be used.
This is essentially the same fix we applied to IndVars LFTR, which
also performs this kind of pre-inc to post-inc conversion.

I believe a similar problem can also exist for getelementptr inbounds,
but I was not able to come up with a problematic test case. The
inbounds case would have to be addressed in a differently anyway
(as SCEV does not track this property).

Fixes https://bugs.llvm.org/show_bug.cgi?id=46943.

Differential Revision: https://reviews.llvm.org/D95286
2021-01-25 23:13:48 +01:00
Richard Smith 925ae8c790 Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV"
This reverts commit 53176c1680, which
introduceed a layering violation. LLVM's IR library can't include
headers from Analysis.
2021-01-25 13:53:38 -08:00
Akira Hatanaka 53176c1680 [ObjC][ARC] Annotate calls with attributes instead of emitting retainRV
or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end annotates calls with attribute "clang.arc.rv"="retain"
  or "clang.arc.rv"="claim", which indicates the call is implicitly
  followed by a marker instruction and a retainRV/claimRV call that
  consumes the call result. This is currently done only when the target
  is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the
  annotated calls in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the annotated
  calls. It doesn't remove the attribute on the call since the backend
  needs it to emit the marker instruction. The retainRV/claimRV calls
  are emitted late in the pipeline to prevent optimization passes from
  transforming the IR in a way that makes it harder for the ARC
  middle-end passes to figure out the def-use relationship between the
  call and the retainRV/claimRV calls (which is the cause of PR31925).

- The function inliner removes the autoreleaseRV call in the callee that
  returns the result if nothing in the callee prevents it from being
  paired up with the calls annotated with "clang.arc.rv"="retain/claim"
  in the caller. If the call is annotated with "claim", a release call
  is inserted since autoreleaseRV+claimRV is equivalent to a release. If
  it cannot find an autoreleaseRV call, it tries to transfer the
  attributes to a function call in the callee. This is important since
  ARC optimizer can remove the autoreleaseRV call returning the callee
  result, which makes it impossible to pair it up with the retainRV or
  claimRV call in the caller. If that fails, it simply emits a retain
  call in the IR if the call is annotated with "retain" and does nothing
  if it's annotated with "claim".

- This patch teaches dead argument elimination pass not to change the
  return type of a function if any of the calls to the function are
  annotated with attribute "clang.arc.rv". This is necessary since the
  pass can incorrectly determine nothing in the IR uses the function
  return, which can happen since the front-end no longer explicitly
  emits retainRV/claimRV calls in the IR, and change its return type to
  'void'.

Future work:

- Use the attribute on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the attributes.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-01-25 11:57:08 -08:00
Florian Hahn 76afbf60ed
[VPlan] Replace uses with new value in VPInstructionsToVPRecipe (NFC).
Now that VPRecipeBase inherits from VPDef, we can always use the new
VPValue for replacement, if the recipe defines one. Given the recipes
that are supported at the moment, all new recipes must have either 0 or
1 defined values.
2021-01-25 19:38:08 +00:00
Nick Desaulniers d36812892c [GVN] do not repeat PRE on failure to split critical edge
Fixes an infinite loop encountered in GVN.

GVN will delay PRE if it encounters critical edges, attempt to split
them later via calls to SplitCriticalEdge(), then restart.

The caller of GVN::splitCriticalEdges() assumed a return value of true
meant that critical edges were split, that the IR had changed, and that
PRE should be re-attempted, upon which we loop infinitely.

This was exposed after D88438, by compiling the Linux kernel for s390,
but the test case is reproducible on x86.

Fixes: https://github.com/ClangBuiltLinux/linux/issues/1261

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D94996
2021-01-25 11:23:44 -08:00
Wei Mi c9cd9a0066 [SampleFDO] Report error when reading a bad/incompatible profile instead of
turning off SampleFDO silently.

Currently sample loader pass turns off SampleFDO optimization silently when
it sees error in reading the profile. This behavior will defeat the tests
which could have caught those bad/incompatible profile problems. This patch
change the behavior to report error.

Differential Revision: https://reviews.llvm.org/D95269
2021-01-25 10:28:23 -08:00
Xun Li 17c3538aef Revert "Fix unused variable in CoroFrame.cpp when building Release with GCC 10"
This reverts commit ff5e896425.
2021-01-25 08:37:45 -08:00
Florian Hahn 3201274dea
[VPlan] Handle scalarized values in VPTransformState.
This patch adds plumbing to handle scalarized values directly in
VPTransformState.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D92282
2021-01-25 14:21:56 +00:00
Sanjay Patel 09a136bcc6 [InstCombine] narrow min/max intrinsics with extended inputs
We can sink extends after min/max if they match and would
not change the sign-interpreted compare. The only combo
that doesn't work is zext+smin/smax because the zexts
could change a negative number into positive:
https://alive2.llvm.org/ce/z/D6sz6J

Sext+umax/umin works:

  define i32 @src(i8 %x, i8 %y) {
  %0:
    %sx = sext i8 %x to i32
    %sy = sext i8 %y to i32
    %m = umax i32 %sx, %sy
    ret i32 %m
  }
  =>
  define i32 @tgt(i8 %x, i8 %y) {
  %0:
    %m = umax i8 %x, %y
    %r = sext i8 %m to i32
    ret i32 %r
  }
  Transformation seems to be correct!
2021-01-25 07:52:50 -05:00
Sander de Smalen 171d12489f [SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost.
This change also changes getReductionCost to return InstructionCost,
and it simplifies two expressions by removing a redundant 'isValid' check.
2021-01-25 12:27:01 +00:00
Nikita Popov 8b9df70bf7 [Utils] Use NoAliasScopeDeclInst in a few more places (NFC)
In the cloning infrastructure, only track an MDNode mapping,
without explicitly storing the Metadata mapping, same as is done
during inlining. This makes things slightly simpler.
2021-01-24 16:24:11 +01:00
Sanjay Patel 77adbe6a8c [SLP] fix fast-math requirements for fmin/fmax reductions
a6f0221276 enabled intersection of FMF on reduction instructions,
so it is safe to ease the check here.

There is still some room to improve here - it looks like we
have nearly duplicate flags propagation logic inside of the
LoopUtils helper but it is limited targets that do not form
reduction intrinsics (they form the shuffle expansion).
2021-01-24 08:55:56 -05:00
Jeroen Dobbelaere dcc7706fcf [InstCombine] Remove unused llvm.experimental.noalias.scope.decl
A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope.
When that is not the case for at least one of the two, the intrinsic call can as well be removed.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95141
2021-01-24 13:55:50 +01:00
Jeroen Dobbelaere 659c7bcde6 [LoopRotate] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed
Similar to D92887, LoopRotation also needs duplicate the noalias scopes when rotating a `@llvm.experimental.noalias.scope.decl` across a block boundary.
This is based on the version from the Full Restrict paches (D68511).

The problem it fixes also showed up in Transforms/Coroutines/ex5.ll after D93040 (when enabling strict checking with -verify-noalias-scope-decl-dom).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94306
2021-01-24 13:53:13 +01:00
Jeroen Dobbelaere 774629641b [LoopUnroll] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patched (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed.

Notes:
- it also includes changes and tests from D90104

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D92887
2021-01-24 13:48:20 +01:00
Roman Lebedev 6f2753273e
[NFC][SimplifyCFG] Extract CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses() out of PerformBranchToCommonDestFolding()
To be used in PerformValueComparisonIntoPredecessorFolding()
2021-01-24 00:54:55 +03:00