Commit Graph

26805 Commits

Author SHA1 Message Date
Nikita Popov 46354bac76 [OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC)
Explicitly pass loaded type when creating loads, in preparation
for the deprecation of these APIs.

There are still a couple of uses left.
2021-03-11 14:40:57 +01:00
gbtozers c0f3dfb9f1 [DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands
This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic
operations with two or more non-const operands; this includes the GetElementPtr
instruction, and most Binary Operator instructions. These salvages produce
DIArgList locations and are only valid for dbg.values, as currently variadic
DIExpressions must use DW_OP_stack_value. This functionality is also only added
for salvageDebugInfoForDbgValues; other functions that directly call
salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be
updated in a later patch.

Differential Revision: https://reviews.llvm.org/D91722
2021-03-11 13:33:49 +00:00
Nikita Popov 403da6a69a Reapply [LICM] Make promotion faster
Relative to the previous implementation, this always uses
aliasesUnknownInst() instead of aliasesPointer() to correctly
handle atomics. The added test case was previously miscompiled.

-----

Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-11 10:50:28 +01:00
Djordje Todorovic 9f41c03f82 [Debugify][OriginalDIMode] Export the report into JSON file
By using the original-di check with debugify in the combination with
the llvm/utils/llvm-original-di-preservation.py it becomes very user
friendly tool. An example of the HTML page with the issues
related to debug info can be found at [0].

[0] https://djolertrk.github.io/di-checker-html-report-example/

Differential Revision: https://reviews.llvm.org/D82546
2021-03-11 01:11:13 -08:00
Petr Hosek c7712087cb [InstrProfiling] Don't generate __llvm_profile_runtime_user
This is no longer needed, we can add __llvm_profile_runtime directly
to llvm.compiler.used or llvm.used to achieve the same effect.

Differential Revision: https://reviews.llvm.org/D98325
2021-03-10 22:33:51 -08:00
Ruiling Song 8b7d3bed0f [ValueMapper] Add debug output for metadata remapping
This is useful for debugging which pointers are updated during remapping
process.

Differential Revision: https://reviews.llvm.org/D95775
2021-03-11 09:54:55 +08:00
kuterd d75c9e61a5 [Attributor] Attributor call site specific AAValueConstantRange
This patch makes uses of the context bridges introduced in D83299 to make
AAValueConstantRange call site specific.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83744
2021-03-11 01:19:44 +03:00
Mauri Mustonen 0de8aeae72
[VPlan] Support to widen select intructions in VPlan native path
Add support to widen select instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer.

Previously select instructions get handled by the wrong recipe and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97136
2021-03-10 20:59:53 +00:00
Matteo Favaro 989051d5f8 [DSE] Extending isOverwrite to support offsetted fully overlapping stores
The isOverwrite function is making sure to identify if two stores
are fully overlapping and ideally we would like to identify all the
instances of OW_Complete as they'll yield possibly killable stores.
The current implementation is incapable of spotting instances where
the earlier store is offsetted compared to the later store, but
still fully overlapped. The limitation seems to lie on the
computation of the base pointers with the
GetPointerBaseWithConstantOffset API that often yields different
base pointers even if the stores are guaranteed to partially overlap
(e.g. the alias analysis is returning AliasResult::PartialAlias).

The patch relies on the offsets computed and cached by BatchAAResults
(available after D93529) to determine if the offsetted overlapping
is OW_Complete.

Differential Revision: https://reviews.llvm.org/D97676
2021-03-10 21:09:33 +01:00
Sriraman Tallam 0ba1ebcbb7 Remove original implementation of UniqueInternalLinkageNames pass.
D96109 was recently submitted which contains the refactored implementation of
-funique-internal-linakge-names by adding the unique suffixes in clang rather
than as an LLVM pass. Deleting the former implementation in this change.

Differential Revision: https://reviews.llvm.org/D98234
2021-03-10 11:57:40 -08:00
gbtozers 81b8357e70 [DebugInfo][NFC] Refactor BinOp+GEP salvaging in salvageDebugInfoImpl
This patch refactors out the salvaging of GEP and BinOp instructions into
separate functions, in preparation for further changes to the salvaging of these
instructions coming in another patch; there should be no functional change as a
result of this refactor.

Differential Revision: https://reviews.llvm.org/D92851
2021-03-10 18:03:12 +00:00
Daniil Seredkin 7c49f3c75b [InstCombine][SimplifyLibCalls] An extra sqrtf was produced because of transformations in optimizePow function
See: https://bugs.llvm.org/show_bug.cgi?id=47613

There was an extra sqrt call because shrinking emitted a new powf and at the same time optimizePow replaces the previous pow with sqrt and as the result we have two instructions that will be in worklist of InstCombie despite the fact that %powf is not used by anyone (it is alive because of errno).

As the result we have two instructions:

  %powf = call fast float @powf(float %x, float 5.000000e-01)
  %sqrt = call fast double @sqrt(double %dx)

%powf will be converted to %sqrtf on a later iteration.

As a quick fix for that I moved shrinking to the end of optimizePow so that pow is replaced with sqrt at first that allows not to emit a new shrunk powf.

Differential Revision: https://reviews.llvm.org/D98235
2021-03-10 12:33:05 -05:00
Ta-Wei Tu 7ff2768be1 Revert "[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`"
This reverts commit df9158c9a4.
2021-03-11 01:24:43 +08:00
Jianzhou Zhao 6a9a686ce7 [dfsan] Tracking origins at phi nodes
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D98268
2021-03-10 17:02:58 +00:00
Dávid Bolvanský c68b560be3 [DSE] Handle memmove with equal non-const sizes
Follow up for fhahn's D98284. Also fixes a case from PR47644.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D98346
2021-03-10 17:52:00 +01:00
Florian Hahn 8d9b9c0edc
[DSE] Handle memcpy/memset with equal non-const sizes.
Currently DSE misses cases where the size is a non-const IR value, even
if they match. For example, this means that llvm.memcpy/llvm.memset
calls are not eliminated, even if they write the same number of bytes.

This patch extends isOverwite to try to get IR values for the number of
bytes written from the analyzed instructions. If the values match,
alias checks are performed and the result is returned.

At the moment this only covers llvm.memcpy/llvm.memset. In the future,
we may enable MemoryLocation to also track variable sizes, but this
simple approach should allow us to cover the important cases in DSE.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98284
2021-03-10 10:13:58 +00:00
Wei Mi ee35784a90 [SampleFDO] Support enabling -funique-internal-linkage-name.
now -funique-internal-linkage-name flag is available, and we want to flip
it on by default since it is beneficial to have separate sample profiles
for different internal symbols with the same name. As a preparation, we
want to avoid regression caused by the flip.

When we flip -funique-internal-linkage-name on, the profile is collected
from binary built without -funique-internal-linkage-name so it has no uniq
suffix, but the IR in the optimized build contains the suffix. This kind of
mismatch may introduce transient regression.

To avoid such mismatch, we introduce a NameTable section flag indicating
whether there is any name in the profile containing uniq suffix. Compiler
will decide whether to keep uniq suffix during name canonicalization
depending on the NameTable section flag. The flag is only available for
extbinary format. For other formats, by default compiler will keep uniq
suffix so they will only experience transient regression when
-funique-internal-linkage-name is just flipped.

Another type of regression is caused by places where we miss to call
getCanonicalFnName. Those places are fixed.

Differential Revision: https://reviews.llvm.org/D96932
2021-03-09 21:41:40 -08:00
Philip Reames cf1899e0a9 [rs4gc] common bdv operand visitation [nfc] 2021-03-09 20:28:47 -08:00
Arnold Schwaighofer 590ac0a26a [coro async] Transfer the original function's attributes to the clone
rdar://75052917

Differential Revision: https://reviews.llvm.org/D98051
2021-03-09 17:01:41 -08:00
Leonard Chan cf371573b0 [llvm] Change DSOLocalEquivalent type if the underlying global value type changes
We encountered an issue where LTO running on IR that used the DSOLocalEquivalent
constant would result in bad codegen. The underlying issue was ValueMapper wasn't
properly handling DSOLocalEquivalent, so this just adds the machinery for handling
it. This code path is triggered by a fix to DSOLocalEquivalent::handleOperandChangeImpl
where DSOLocalEquivalent could potentially not have the same type as its underlying GV.

This updates DSOLocalEquivalent::handleOperandChangeImpl to change the type if
the GV type changes and handles this constant in ValueMapper.

Differential Revision: https://reviews.llvm.org/D97978
2021-03-09 15:09:48 -08:00
Sanjay Patel 23fd647cc6 [SLP] remove dead null check; NFC
We cast<> to Instruction (not dyn_cast<>), so we already
required/assumed that Cmp is not null.
2021-03-09 17:43:07 -05:00
Jianzhou Zhao 8506fe5b41 [dfsan] Tracking origins at memory transfer
This is a part of https://reviews.llvm.org/D95835.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D98192
2021-03-09 22:15:07 +00:00
Juneyoung Lee f49354838e Revert "[InstCombine] Add simplification of two logical and/ors"
This reverts commit 07c3b97e18 due to a reported failure in two-stage build.
2021-03-10 05:48:31 +09:00
gbtozers df69c69427 [DebugInfo] Handle multiple variable location operands in IR
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.

Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.

Differential Revision: https://reviews.llvm.org/D88232
2021-03-09 16:44:38 +00:00
Sanjay Patel 2986a9c7e2 [InstCombine] canonicalize 'not' op after min/max intrinsic
This is another step towards parity between existing select
transforms and min/max intrinsics (D98152)..

The existing 'not' folds around select are complicated, so
it's likely that we will need to enhance this, but this
should be a safe step.
2021-03-09 11:33:28 -05:00
Sanjay Patel 41b9209a12 [InstCombine] fold min/max intrinsics with not ops
This is a partial translation of the existing select-based
folds. We need to recreate several different transforms to
avoid regressions as noted in D98152.

https://alive2.llvm.org/ce/z/teuZ_J
2021-03-09 08:55:48 -05:00
Florian Hahn 92da5b7119
[InstCombine] Simplify phis with incoming pointer-casts.
If the incoming values of a phi are pointer casts of the same original
value, replace the phi with a single cast. Such redundant phis are
somewhat common after loop-rotate and removing them can avoid some
unnecessary code bloat, e.g. because an iteration of a loop is peeled
off to make the phi invariant. It should also simplify further analysis
on its own.

InstCombine already uses stripPointerCasts in a couple of places and
also simplifies phis based on the incoming values, so the patch should
fit in the existing scope.

The patch causes binary changes in 47 out of 237 benchmarks in
MultiSource/SPEC2000/SPEC2006 with -O3 -flto on X86.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98058
2021-03-09 11:40:18 +00:00
Hongtao Yu 4c3d759d00 [CSSPGO] Always use callsite samples as callsite probe counts.
For CS profile, the callsite count of previously inlined callees is populated with the entry count of the callees. Therefore when trying to get a weight for calliste probe after inlinining, the callsite count should always be used. The same fix has already been made for non-probe case.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D98094
2021-03-08 22:52:36 -08:00
Akira Hatanaka dca5737945 Move ObjCARCUtil.h back to llvm/Analysis
Instead of adding the header to llvm/IR, just duplicate the marker
string in the auto upgrader.
2021-03-08 16:35:24 -08:00
Alina Sbirlea 29482426b5 Revert "[LICM] Make promotion faster"
Revert 3d8f842712
Revision triggers a miscompile sinking a store incorrectly outside a
threading loop. Detected by tsan.
Reverting while investigating.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-08 12:53:03 -08:00
Philip Reames ebc61f9d3c [instcombine] Collapse trivial or recurrences
If we have a recurrence of the form <Start, Or, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.

Differential Revision: https://reviews.llvm.org/D97578 (or part)
2021-03-08 09:21:38 -08:00
Philip Reames 239a618180 [instcombine] Collapse trivial and recurrences
If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence.

Differential Revision: https://reviews.llvm.org/D97578 (and part)
2021-03-08 09:21:38 -08:00
Philip Reames 97a7bc5831 [gvn] Precisely propagate equalities to phi operands
The code used for propagating equalities (e.g. assume facts) was conservative in two ways - one of which this patch fixes. Specifically, it shifts the code reasoning about whether a use is dominated by the end of the assume block to consider phi uses to exist on the predecessor edge. This matches the dominator tree handling for dominates(Edge, Use), and simply extends it to dominates(BB, Use).

Note that the decision to use the end of the block is itself a conservative choice. The more precise option would be to use the later of the assume and the value, and replace all uses after that. GVN handles that case separately (with the replace operand mechanism) because it used to be expensive to ask dominator questions within blocks. With the new instruction ordering support, we should probably rewrite this code at some point to simplify.

Differential Revision: https://reviews.llvm.org/D98082
2021-03-08 08:59:00 -08:00
Sanne Wouda 05a6e2eb9a [InstCombine] Add a combine for a shuffle of similar bitcasts
Some intrinsics wrapper code has the habit of ignoring the type of the
elements in vectors, thinking of vector registers as a "bag of bits". As
a consequence, some operations are shared between vectors of different
types are shared. For example, functions that rearrange elements in a
vector can be shared between vectors of int32 and float.

This can result in bitcasts in awkward places that prevent the backend
from recognizing some instructions. For AArch64 in particular, it
inhibits the selection of dup from a general purpose register (GPR), and
mov from GPR to a vector lane.

This patch adds a pattern in InstCombine to move the bitcasts past the
shufflevector if this is possible. Sometimes this even allows
InstCombine to remove the bitcast entirely, as in the included tests.

Alternatively this could be done with a few extra patterns in the
AArch64 backend, but InstCombine seems like a better place for this.

Differential Revision: https://reviews.llvm.org/D97397
2021-03-08 16:32:30 +00:00
Sanne Wouda 5e963a2441 Rehome an orphaned comment [NFC]
As seen in 35827164c4, the "shuffle x, x, mask" comment has drifted away
from the implementation of the pattern. Put it back.
2021-03-08 16:32:30 +00:00
Stephen Tozer 4343c68fa3 Fix: [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch removed the only use of a lambda capture, triggering an error
on `-Werror -Wunused-lambda-capture` builds.
2021-03-08 14:57:11 +00:00
gbtozers e5d958c456 [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.

Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
2021-03-08 14:36:13 +00:00
Ta-Wei Tu df9158c9a4 [LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`
The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`.
In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour.
Replacing it with the much robust version provided by `LoopNest` reduces code duplications and fixes https://bugs.llvm.org/show_bug.cgi?id=48113.

`LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`.
Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D97290
2021-03-08 11:36:08 +08:00
Mehdi Amini 8d5a981a13 Revert "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe"
This reverts commit 99108c791d.
Clang is miscompiling LLVM with this change, a stage-2 build hits
multiple failures.

As a repro, I built clang in a stage1 directory and used it this way:

cmake -G Ninja ../llvm \
  -DCMAKE_CXX_COMPILER=`pwd`/../build-stage1/bin/clang++ \
  -DCMAKE_C_COMPILER=`pwd`/../build-stage1/bin/clang \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_BUILD_EXAMPLES=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=On
ninja check-mlir
2021-03-08 00:15:47 +00:00
Juneyoung Lee 07c3b97e18 [InstCombine] Add simplification of two logical and/ors
This is a patch that adds folding of two logical and/ors that share one variable:

a && (a && b) -> a && b
a && (a & b)  -> a && b
...

This is towards removing the poison-unsafe select optimization (D93065 has more context).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96945
2021-03-08 02:38:43 +09:00
Juneyoung Lee d672c81126 [InstCombine] use safe transformation by default
.. since it will be folded into and/or anyway
2021-03-08 02:25:29 +09:00
Nikita Popov 2b494f85f1 [CVP] Remove -cvp-dont-add-nowrap-flags option
This option was originally added to work around a bug in LFTR.
The bug has long since been fixed.
2021-03-07 18:19:31 +01:00
Nikita Popov 176bbcae11 [DSE] Remove MemDep-based implementation
The MemorySSA-based implementation has been enabled without issue
for a while now, so keeping the old implementation around doesn't
seem useful anymore. This drops the MemDep-based implementation.

Differential Revision: https://reviews.llvm.org/D97877
2021-03-07 18:17:31 +01:00
Juneyoung Lee 33590ed4f2 [InstCombine] fix another poison-unsafe select transformation
This fixes another unsafe select folding by disabling it if
EnableUnsafeSelectTransform is set to false.

EnableUnsafeSelectTransform's default value is true, hence it won't
affect generated code (unless the flag is explicitly set to false).
2021-03-08 02:11:04 +09:00
Juneyoung Lee 99108c791d [SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe
This patch makes FoldBranchToCommonDest merge branch conditions into `select i1` rather than `and/or i1` when it is called by SimplifyCFG.
It is known that merging conditions into and/or is poison-unsafe, and this is towards making things *more* correct by removing possible miscompilations.
Currently, InstCombine simply consumes these selects into and/or of i1 (which is also unsafe), so the visible effect would be very small. The unsafe select -> and/or transformation will be removed in the future.
There has been efforts for updating optimizations to support the select form as well, and they are linked to D93065.

The safe transformation is fired when it is called by SimplifyCFG only. This is done by setting the new `PoisonSafe` argument as true.
Another place that calls FoldBranchToCommonDest is LoopSimplify. `PoisonSafe` flag is set to false in this case because enabling it has a nontrivial impact in performance because SCEV is more conservative with select form and InductiveRangeCheckElimination isn't aware of select form of and/or i1.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95026
2021-03-08 01:38:03 +09:00
Juneyoung Lee 5bb38e84d3 [LoopUnswitch] unswitch if cond is in select form of and/or as well
Hello all,
I'm trying to fix unsafe propagation of poison values in and/or conditions by using
equivalent select forms (`select i1 A, i1 B, i1 false` and `select i1 A, i1 true, i1 false`)
instead.
D93065 has links to patches for this.

This patch allows unswitch to happen if the condition is in this form as well.
`collectHomogenousInstGraphLoopInvariants` is updated to keep traversal if
Root and the visiting I matches both m_LogicalOr()/m_LogicalAnd().
Other than this, the remaining changes are almost straightforward and simply replaces
Instruction::And/Or check with match(m_LogicalOr()/m_LogicalAnd()).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D97756
2021-03-08 01:19:43 +09:00
Nikita Popov 3fedaf2a52 [GVN] Don't explicitly materialize undefs from dead blocks
When materializing an available load value, do not explicitly
materialize the undef values from dead blocks. Doing so will
will force creation of a phi with an undef operand, even if there
is a dominating definition. The phi will be folded away on
subsequent GVN iterations, but by then we may have already
poisoned MDA cache slots.

Simply don't register these values in the first place, and let
SSAUpdater do its thing.
2021-03-06 23:46:24 +01:00
Fangrui Song fb2cf0dd60 [FunctionImport] Delete unneeded setLive. NFC
ValueInfo's in Worklist are guaranteed to be live.
2021-03-06 14:09:54 -08:00
Mauri Mustonen 494b5ba364
[VPlan] Support to widen call intructions in VPlan native path
Add support to widen call instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer.

Previously call instructions got handled by wrong recipes and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139.

Patch by Mauri Mustonen <mauri.mustonen@tuni.fi>

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97278
2021-03-06 21:59:52 +00:00
Roman Lebedev 2ad1f5eb1a
[InstCombine] Don't canonicalize (gep i8* X, -(ptrtoint Y)) as (inttoptr (sub (ptrtoint X), (ptrtoint Y)))
It's just a wrong thing to do.

We introduce inttoptr where there were none, which results in
loosing all provenance information because we no longer have a GEP{i,},
and pessimize all future optimizations,
because we are basically not allowed to look past `inttoptr`.

(gep i8* X, -(ptrtoint Y))  *is* the canonical form.
So just drop this fold.

Noticed while reviewing D98120.
2021-03-06 23:00:25 +03:00
Fangrui Song 9fb6782c69 [rs4gc] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds 2021-03-06 11:42:27 -08:00
Roman Lebedev b46c085d2b
[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.

This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.

Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.

Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
2021-03-06 21:52:46 +03:00
Ta-Wei Tu 8a003861a3 [NPM] Add -enable-loopinterchange option to NPM
We have the `enable-loopinterchange` option in legacy pass manager but not in NPM.
Add `LoopInterchange` pass to the optimization pipeline (at the same position as before)
when `enable-loopinterchange` is turned on.

Reviewed By: aeubanks, fhahn

Differential Revision: https://reviews.llvm.org/D98116
2021-03-07 02:39:28 +08:00
William S. Moses d163e75c81 [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-06 12:57:32 -05:00
Philip Reames 5db2735af9 [gvn] Handle simply phi equivalence cases
GVN basically doesn't handle phi nodes at all. This is for a reason - we can't value number their inputs since the predecessor blocks have probably not been visited yet.

However, it also creates a significant pass ordering problem. As it stands, instcombine and simplifycfg ends up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress.

This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN since blocks which use the cse'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.)

Compile time wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect.

Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi.

Differential Revision: https://reviews.llvm.org/D98080
2021-03-06 09:31:12 -08:00
Philip Reames 8fe59ba51e [rs4gc] track the original value in the state use for base pointer rewriting
I'd originally intended to build on this for another purpose and have decided not to, but at a minimum, the stronger asserts are useful.
2021-03-06 08:46:15 -08:00
Philip Reames 6334952ff0 [rs4gc] minor code style improvement 2021-03-06 08:46:15 -08:00
Philip Reames 51b13a7ea0 [gvn] CSE gc.relocates based on meaning, not spelling
The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoints multiple return values.)

We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particular useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass.

Differential Revision: https://reviews.llvm.org/D97974

(Note: Part of the reviewed change was split and landed as f352463a)
2021-03-05 10:16:12 -08:00
Philip Reames f352463ade Mark gc.relocate and gc.result as readnone
For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism.  gc.relocate and gc.result are simply projections of the return values from the associated statepoint.  Note that the LangRef has always declared them readnone.

The EarlyCSE change is simply moving the special casing from readonly to readnone handling.

As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice.

This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing.  The motivation is to enable the GVN changes in that patch.
2021-03-05 10:07:17 -08:00
Philip Reames 99f93dd3a5 [rs4gc] avoid insert base computation instructions for deopt uses
If we have a value live over a call which is used for deopt at the call, we know that the value must be a base pointer. We can avoid potentially inserting IR to materialize a base for this value.

In it's current form, this is mostly a compile time optimization.   Building the base pointer graph (and then optimizing it away again) is a relatively expensive operation.  We also sometimes end up with better codegen in practice - due to failures in optimizing away the inserted base pointer propogation - but those are optimization bugs we're fixing concurrently.

The alternative to this would be to extend the base pointer inference with the ability to generally reuse multiple-base input instructions (phis and selects).  That's somewhat invasive and complicated, so we're defering it a bit longer.

Differential Revision: https://reviews.llvm.org/D97885
2021-03-05 09:55:36 -08:00
gbtozers 65600cb2a7 [DebugInfo] Add DIArgList MD to store multple values in DbgVariableIntrinsics
This patch adds a new metadata node, DIArgList, which contains a list of SSA
values. This node is in many ways similar in function to the existing
ValueAsMetadata node, with the difference being that it tracks a list instead of
a single value. Internally, it uses ValueAsMetadata to track the individual
values, but there is also a reasonable amount of DIArgList-specific
value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special
case in parsing and printing due to the fact that it requires a function state
(as it may reference function-local values).

This patch should not result in any immediate functional change; it allows for
DIArgLists to be parsed and printed, but debug variable intrinsics do not yet
recognize them as a valid argument (outside of parsing).

Differential Revision: https://reviews.llvm.org/D88175
2021-03-05 17:02:24 +00:00
David Sherwood fec0a0adac [SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector
There are certain loops like this below:

  for (int i = 0; i < n; i++) {
    a[i] = b[i] + 1;
    *inv = a[i];
  }

that can only be vectorised if we are able to extract the last lane of the
vectorised form of 'a[i]'. For fixed width vectors this already works since
we know at compile time what the final lane is, however for scalable vectors
this is a different story. This patch adds support for extracting the last
lane from a scalable vector using a runtime determined lane value. I have
added support to VPIteration for runtime-determined lanes that still permit
the caching of values. I did this by introducing a new class called VPLane,
which describes the lane we're dealing with and provides interfaces to get
both the compile-time known lane and the runtime determined value. Whilst
doing this work I couldn't find any explicit tests for extracting the last
lane values of fixed width vectors so I added tests for both scalable and
fixed width vectors.

Differential Revision: https://reviews.llvm.org/D95139
2021-03-05 09:57:56 +00:00
Michael Kruse b119120673 [clang][OpenMP] Use OpenMPIRBuilder for workshare loops.
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
 * Only the worksharing-loop directive.
 * Recognizes only the nowait clause.
 * No loop nests with more than one loop.
 * Untested with templates, exceptions.
 * Semantic checking left to the existing infrastructure.

This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
 * The distance function: How many loop iterations there will be before entering the loop nest.
 * The loop variable function: Conversion from a logical iteration number to the loop variable.

These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.

The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.

For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.

Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.

Reviewed By: jdenny

Differential Revision: https://reviews.llvm.org/D94973
2021-03-04 22:52:59 -06:00
Wei Mi 2357d29335 [SampleFDO] Another fix to prevent repeated indirect call promotion in
sample loader pass.

In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
to prevent repeated indirect call promotion for the same indirect call
and the same target, we used zero-count value profile to indicate an
indirect call has been promoted for a certain target. We removed
PromotedInsns cache in the same patch. However, there was a problem in
that patch described below, and that problem led me to add PromotedInsns
back as a mitigation in
https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf.

When we get value profile from metadata by calling getValueProfDataFromInst,
we need to specify the maximum possible number of values we expect to read.
We uses MaxNumPromotions in the last patch so the maximum number of value
information extracted from metadata is MaxNumPromotions. If we have many
values including zero-count values when we write the metadata, some of them
will be dropped when we read them because we only read MaxNumPromotions
values. It will allow repeated indirect call promotion again. We need to
make sure if there are values indicating promoted targets, those values need
to be saved in metadata with higher priority than other values.

The patch fixed that problem. We change to use -1 to represent the count
of a promoted target instead of 0 so it is easier to sort the values.
When we prepare to update the metadata in updateIDTMetaData, we will sort
the values in the descending count order and extract only MaxNumPromotions
values to write into metadata. Since -1 is the max uint64_t number, if we
have equal to or less than MaxNumPromotions of -1 count values, they will
all be kept in metadata. If we have more than MaxNumPromotions of -1 count
values, we will only save MaxNumPromotions such values maximally. In such
case, we have logic in place in doesHistoryAllowICP to guarantee no more
promotion in sample loader pass will happen for the indirect call, because
it has been promoted enough.

With this change, now we can remove PromotedInsns without problem.

Differential Revision: https://reviews.llvm.org/D97350
2021-03-04 18:44:12 -08:00
David Blaikie a2a55def35 Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering.
This is included from IR files, and IR doesn't/can't depend on Analysis
(because Analysis depends on IR).

Also fix the implementation - don't use non-member static in headers, as
it leads to ODR violations, inaccurate "unused function" warnings, etc.
And fix the header protection macro name (we don't generally include
"LIB" in the names, so far as I can tell).
2021-03-04 16:14:53 -08:00
Jianzhou Zhao db7fe6cd4b [dfsan] Propagate origin tracking at store
This is a part of https://reviews.llvm.org/D95835.

Reviewed By: morehouse, gbalats

Differential Revision: https://reviews.llvm.org/D97789
2021-03-04 23:34:44 +00:00
William S. Moses 2b896e39bf Revert "[Attributor] Enable heap-to-stack of any size"
This reverts commit 51bd42ef9b.
2021-03-04 17:24:56 -05:00
Sanjay Patel 1bee549737 [LoopVectorize] propagate fast-math-flags from induction instructions
This code assumed that FP math was only permissable if it was
fully "fast", so it hard-coded "fast" when creating new instructions.

The underlying code already allows matching recurrences/reductions
that are only "reassoc", so this change should prevent the potential
miscompile seen in the test diffs (we created "fast" ops even though
none existed in the original code).

I don't know if we need to create the temporary IRBuilder objects
used here, so that could be follow-up clean-up.

There's an open question about whether we should require "nsz" in
addition to "reassoc" here. InstCombine uses that combo for its
reassociative folds, but I think codegen is not as strict.
2021-03-04 17:21:32 -05:00
William S. Moses 51bd42ef9b [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-04 17:17:23 -05:00
Francis Visoiu Mistrih 365b78396a [Remarks] Emit variable info in auto-init remarks
This enhances the auto-init remark with information about the variable
that is auto-initialized.

This is based of debug info if available, or alloca names (mostly for
development purposes).

```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes.Variables: var (4096 bytes). [-Rpass-missed=annotation-remarks]
  int var[1024];
      ^
```

This allows to see things like partial initialization of a variable that
the optimizer won't be able to completely remove.

Differential Revision: https://reviews.llvm.org/D97734
2021-03-04 12:51:22 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Adrian Prantl d268febc56 Improve the debug info for coro-split .resume functions
This patch updates the scope line to point to the suspend point. This
makes the first address in the function point to the first source line
in the resume function rather than the function declaration. Without
this the line table "jumps" from the beginning of the function to the
suspend point at the beginning.

rdar://73386346

Differential Revision: https://reviews.llvm.org/D97345
2021-03-04 11:05:35 -08:00
Jianzhou Zhao 72abc9bf07 [dfsan] add a missing zero origin at atomic commands 2021-03-04 16:50:05 +00:00
Alexey Bataev 04ba80ca4d [Instcombiner]Improve emission of logical or/and reductions.
For logical or/and reductions we emit regular intrinsics @llvm.vector.reduce.or/and.vxi1 calls.
These intrinsics are not effective for the logical or/and reductions,
especially if the optimizer is able to emit short circuit versions of
the scalar or/and instructions and vector code gets less effective than
the scalar version.
Instead, or reduction for i1 can be represented as:
```
%val = bitcast <ReduxWidth x i1> to iReduxWidth
%res = cmp ne iReduxWidth %val, 0
```
and reduction for i1 can be represented as:
```
%val = bitcast <ReduxWidth x i1> to iReduxWidth
%res = cmp eq iReduxWidth %val, 11111
```
This improves perfromance of the vector code significantly and make it
to outperform short circuit scalar code.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D97406
2021-03-04 08:01:02 -08:00
Sanjay Patel 36a489d194 [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
Similar to b3a33553ae, but this shows a TODO and a potential
miscompile is already present.

We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification may be possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.

The new test shows that there is an existing bug somewhere in
the callers. We assumed that the original code was fully 'fast'
and so we produced IR with 'fast' even though it was just 'reassoc'.
2021-03-04 10:40:26 -05:00
Sanjay Patel b3a33553ae [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC
We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification seems possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.
2021-03-04 08:53:04 -05:00
Hongtao Yu c75da238b4 [CSSPGO] Deduplicating dangling pseudo probes.
Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far.

This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added.

An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed.

Differential Revision: https://reviews.llvm.org/D97482
2021-03-03 22:44:42 -08:00
Hongtao Yu 8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Johannes Doerfert 5b70c12f3e [Attributor] Make DepClass a required argument
We often used a sub-optimal dependence class in the past because we
didn't see the argument. Let's make it explicit so we remember to think
about it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert e592dad82e [Attributor] Fold "TrackDependence" into the DepClassTy enum
We don't need a bool and an enum to express the three options we
currently have. This makes the interface nicer and much easier to
use optional dependencies. Also avoids mistakes where the bool is
false and enum ignored.
2021-03-04 00:35:52 -06:00
Johannes Doerfert c8c93fdf0a [Attributor] Avoid work for GEPs and wait till the users are visited 2021-03-04 00:35:52 -06:00
Johannes Doerfert f3f88287c5 [Attributor] Use known alignment as lower bound to avoid work
If we know already more than available from a use, we don't need to
invest time on it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert c14213e030 [Attributor][NFC] Move some trivial checks up 2021-03-04 00:35:52 -06:00
Johannes Doerfert 09c3eebf5f [Attributor] Use sensible initialization in AANoCaptureCallSiteReturned 2021-03-04 00:35:51 -06:00
Evgeniy Brevnov e94125f054 [DSE] Add support for not aligned begin/end
This is an attempt to improve handling of partial overlaps in case of unaligned begin\end.

Existing implementation just bails out if it encounters such cases. Even when it doesn't I believe existing code checking alignment constraints is not quite correct. It tries to ensure alignment of the "later" start/end offset while should be preserving relative alignment between earlier and later start/end.

The idea behind the change is simple. When start/end is not aligned as we wish instead of bailing out let's adjust it as necessary to get desired alignment.

I'll update with performance results as measured by the test-suite...it's still running...

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D93530
2021-03-04 12:24:23 +07:00
Serguei Katkov a0ff0f30df [InstCombine] Move statepoint intrinsic handling from visitCall to visitCallBase
statepoint intrinsic can be used in invoke context,
so it should be handled in visitCallBase to cover both call and invoke.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97833
2021-03-04 11:00:22 +07:00
Xun Li 03f668613c [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions
See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks semantic of coro.suspend intrinsic which return to caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in multithread environment.

This patch disable promotion by check whether loop contains coro.suspend intrinsics.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is a better option not only for correctness purpose but also for performance purpose.
In most cases LICM sinks memory operations. In the case of coroutine, sinking memory operation out of the loop does not improve performance since coroutien needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame.

Differential Revision: https://reviews.llvm.org/D96928
2021-03-03 15:21:57 -08:00
Whitney Tsang 58d531fd6f [LoopUnrollRuntime] Add option to assume the non latch exit block to be
predictable.

Reviewed By: Meinersbur, bmahjour

Differential Revision: https://reviews.llvm.org/D97747
2021-03-03 20:43:31 +00:00
Philip Reames 89d331a31e Address review comment from D97219 (follow up to 8051156)
Probably should have done this before landing, but I forgot.

Basic idea is to avoid using the SCEV predicate when it doesn't buy us anything.  Also happens to set us up for handling non-add recurrences in the future if desired.
2021-03-03 12:20:27 -08:00
Philip Reames 99f5417346 Sink routine for replacing a operand bundle to CallBase [NFC]
We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.
2021-03-03 12:07:55 -08:00
Philip Reames 805115655e [LSR] Unify scheduling of existing and inserted addrecs
LSR goes to some lengths to schedule IV increments such that %iv and %iv.next never need to overlap. This is fairly fundamental to LSRs cost model. LSR assumes that an addrec can be represented with a single register. If %iv and %iv.next have to overlap, then that assumption does not hold.

The bug - which this patch is fixing - is that LSR only does this scheduling for IVs which it inserts, but it's cost model assumes the same for existing IVs that it reuses. It will rewrite existing IV users such that the no-overlap property holds, but will not actually reschedule said IV increment.

As you can see from the relatively lack of test updates, this doesn't actually impact codegen much. The main reason for doing it is to make a follow up patch series which improves post-increment use and scheduling easier to follow.

Differential Revision: https://reviews.llvm.org/D97219
2021-03-03 12:07:55 -08:00
Fangrui Song a84f4fc0df [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker does not let `__start_/__stop_` references retain their sections.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on COFF/Mach-O are not affected.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D97649
2021-03-03 11:32:24 -08:00
Arnold Schwaighofer a42bea211a [coro async] Allow a coro.suspend.async to specify which argument is the context argument
Before we used the same argument as the entry point. The resume partial
function might want to use a different ABI for its context argument

Differential Revision: https://reviews.llvm.org/D97333
2021-03-03 08:27:37 -08:00
Nico Weber 64f5d7e972 Revert "[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF"
This reverts commit 04c3040f41.
Breaks instrprof-value-merge.c in bootstrap builds.
2021-03-03 10:21:17 -05:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Jianzhou Zhao ac4c1760b2 Fix the build error caused by D97570 2021-03-03 04:47:00 +00:00
Jianzhou Zhao d866b9c99d [dfsan] Propagate origin tracking at load
This is a part of https://reviews.llvm.org/D95835.

One issue is about origin load optimization: see the
comments of useCallbackLoadLabelAndOrigin

@gbalats This change may have some conflicts with your 8bit change. PTAL the change at visitLoad.

Reviewed By: morehouse, gbalats

Differential Revision: https://reviews.llvm.org/D97570
2021-03-03 04:32:30 +00:00
George Balatsouras 6ff18b08e6 [dfsan] Fix clang-tidy warnings
This addresses ~50 clang-tidy warnings on dfsan instrumentation pass.
It also contains some refactoring (all non-functional changes) to eliminate some variables and simplify code.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D97714
2021-03-02 17:37:45 -08:00
Andrei Elovikov b24afec8ae [NFCI][VPlan] Modify Recipes' print methods to honor Indent parameter
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97787
2021-03-02 15:32:10 -08:00
Nikita Popov 3d8f842712 [LICM] Make promotion faster
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-02 22:10:48 +01:00