Commit Graph

29427 Commits

Author SHA1 Message Date
Nikita Popov 4dc4815f56 [MemCpyOpt] Add some debug output to call slot optimization (NFC) 2022-01-19 15:51:10 +01:00
Nikita Popov 5ba73c924d [BuildLibCalls] Mark calloc as inaccessiblememonly
Now that DSE handles inaccessiblememonly calloc, mark it as such,
as we do with other memory allocation functions.
2022-01-19 12:55:09 +01:00
Nikita Popov 26f81984e7 [DSE] Handle inaccessiblememonly calloc
Change the DSE calloc handling to assume that it is
inaccessiblememonly, i.e. the defining access is liveOnEntry.

Differential Revision: https://reviews.llvm.org/D117543
2022-01-19 12:55:09 +01:00
Sjoerd Meijer d544a89a37 [LoopFlatten] Update MemorySSA state
I would like to move LoopFlatten from LoopPass Manager LPM2 to LPM1 (D116612),
but that is a LPM that is using MemorySSA and so LoopFlatten needs to preserve
MemorySSA and this adds that. More specifically, LoopFlatten restructures the
CFG and with this change the MSSA state is updated accordingly, where we also
update the DomTree. LoopFlatten doesn't rewrite/optimise/delete load or store
instructions, so I have not added any MSSA updates for that.

Differential Revision: https://reviews.llvm.org/D116660
2022-01-19 10:57:33 +00:00
Nikita Popov d56b0ad441 [ConstantHoist] Remove check for notional overindexing
ConstantHoist currently only hoists GEPs if there is no notional
overindexing. As this transform only hoists address arithmetic,
it shouldn't care about whether any overindexing occurs or not.

There is one caveat: If the hoisted base GEP is inbounds, and a
later non-inbounds GEP is rewritten in terms of it, the value
may be incorrectly poisoned. To avoid this, restrict the transform
to inbounds GEPs for now, as the notional overindexing check
effectively did that as well. The inbounds restriction could be
dropped by dropping inbounds from the base GEP expression.

Differential Revision: https://reviews.llvm.org/D117201
2022-01-19 11:32:10 +01:00
Nikita Popov a115bbea9b [Attributor] Remove notional overindexing check
AAPointerInfo currently bails on constant expression GEPs with
notional overindexing. I don't think this is necessary, as the
following code handling GEPOperator will deal with arbitrary
indices appropriately.

Differential Revision: https://reviews.llvm.org/D117203
2022-01-19 11:30:04 +01:00
Florian Hahn 165e36bf18
[VPlan] Assert can IV is only used by increments during epilogue vec.
After resetting the start value of the canonical IV, it might not be
canonical any more. Add an assertion to make sure it is only used by its
increment, to avoid potential mis-use. Suggested in D117140.
2022-01-19 10:10:05 +00:00
Chuanqi Xu c8ecf12bc3 [Coroutines] Offering llvm.coro.align intrinsic
It is a known problem that we can't align the switch-based coroutine
frame if the alignment exceeds std::max_align_t (which is 16 usually).

We could solve the problem on the middle-end by dynamically transforming
or in the frontend by emitting aligned allocation function.

If we need to solve it in the frontend, the middle end need to offer an
intrinsic to tell the alignment at least. This patch tries to offer such
an intrinsic called llvm.coro.align.

Reviewed By: https://reviews.llvm.org/D117542

Differential revision: https://reviews.llvm.org/D117542
2022-01-19 09:52:45 +08:00
spupyrev 13d1364a34 A better profi rebalancer
This is an extension of **profi** post-processing step that rebalances counts
in CFGs that have basic blocks w/o probes (aka "unknown" blocks). Specifically,
the new version finds many more "unknown" subgraphs and marks more "unknown"
basic blocks as hot (which prevents unwanted optimization passes).

I see up to 0.5% perf on some (large) binaries, e.g., clang-10 and gcc-8.

The algorithm is still linear and yields no build time overhead.
2022-01-18 12:14:24 -08:00
Ellis Hoag 5b9358d774 [InstrProf][NFC] Add InstrProfInstBase base
The `InstrProfInstBase` class is for all `llvm.instrprof.*` intrinsics. In a
later diff we will add new instrinsic of this type. Also refactor some
logic in `InstrProfiling.cpp`.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D117261
2022-01-18 11:12:00 -08:00
Adrian Tong ea27adb45b [NFC] Test commit.
This is just a test commit to check whether I got commit permission.
2022-01-18 19:01:04 +00:00
Mircea Trofin 3e8553aab4 [mlgo][inline] Improve global state tracking
The global state refers to the number of the nodes currently in the
module, and the number of direct calls between nodes, across the
module.

Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.

This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable - they do not get deleted while cgscc passes are
run over the module; and cgscc pass manager invariants.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D115847
2022-01-18 17:45:34 +00:00
Jan Svoboda 5f4ae56457 [llvm] Remove uses of `std::vector<bool>`
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.

This patch does just that for llvm.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D117121
2022-01-18 18:20:45 +01:00
Sanjay Patel 2d50630efb [InstCombine] reduce code duplication; NFC 2022-01-18 12:13:45 -05:00
Hans Wennborg 53a51acc36 Revert "[MemCpyOpt] Make capture check during call slot optimization more precise"
This casued a miscompile due to call slot optimization replacing a call
argument without considering the call's !noalias metadata, see discussion on
the code review.

> Call slot optimization is currently supposed to be prevented if
> the call can capture the source pointer. Due to an implementation
> bug, this check currently doesn't trigger if a bitcast of the source
> pointer is passed instead. I'm somewhat afraid of the fallout of
> fixing this bug (due to heavy reliance on call slot optimization
> in rust), so I'd like to strengthen the capture reasoning a bit first.
>
> In particular, I believe that the capture is fine as long as a)
> the call itself cannot depend on the pointer identity, because
> neither dest has been captured before/at nor src before the
> call and b) there is no potential use of the captured pointer
> before the lifetime of the source alloca ends, either due to
> lifetime.end or a return from a function. At that point the
> potentially captured pointer becomes dangling.
>
> Differential Revision: https://reviews.llvm.org/D115615

Also reverting the dependent commit:

> [MemCpyOpt] Look through pointer casts when checking capture
>
> The user scanning loop above looks through pointer casts, so we
> also need to strip pointer casts in the capture check. Previously
> the source was incorrectly considered not captured if a bitcast
> was passed to the call.

This reverts commit 487a34ed9d
and 00e6869463.
2022-01-18 17:41:49 +01:00
Daniil Kovalev d8e0e125a2 [InstCombine] Simplify addends reordering logic
Previously some constants were not pushed to the top of the resulting
expression tree as intended by the algorithm. We can remove the logic
from simplifyFAdd and rely on SimplifyAssociativeOrCommutative to do
that.

Differential Revision: https://reviews.llvm.org/D117302
2022-01-18 16:00:47 +03:00
David Sherwood e781620dee [LoopVectorize][AArch64] Use get.active.lane.mask intrinsic when SVE is enabled
When SVE is enabled for AArch64 targets it makes more sense to use the
get.active.lane.mask intrinsic, because SVE has an exact 1-1 mapping
from the intrinsic to the 'whilelo' instruction for legal vector types.
This instruction neatly takes overflow into account as well. This patch
fixes an issue in VPInstruction::generateInstruction that assumed we are
only dealing with fixed-width vectors.

Differential Revision: https://reviews.llvm.org/D117109
2022-01-18 11:59:30 +00:00
pvellien 4e1c207726 [SimplifyCFG] Fix assertion failure when reusing table switch comparison
After D116332, some icmps no longer fold with the target-independent
constant folder. The SimplifyCFG code assumed that the comparison
would always fold, which is not guaranteed. Explicitly check that the
result is either true or false.

Differential Revision: https://reviews.llvm.org/D117184
2022-01-18 09:30:54 +01:00
Philip Reames 26049b8ce3 [GlobalOpt] Generalize malloc-to-global for any allocation function
We can generalize the malloc-to-global transform for other allocation functions which are both a) removable, and b) have a known initialization value.

One subtlety that I want to point out - mostly because I hadn't realized it was true until I took a closer look - is that the existing code doesn't prove that initialization/malloc happens only once. The initialization function can be called multiple times. This is correct without special handling for malloc as undef can map to any value previously written, but a non-undef initializing allocation it means we may end up memseting the new global repeatedly. In particular, this means it's not legal to fold the memset into the initializer of the global.

Differential Revision: https://reviews.llvm.org/D117503
2022-01-17 15:06:23 -08:00
Philip Reames 6ca192de58 [LoopDeletion] Add back statistic update lost in 523573e
Caught by a couple of builders as an unused variable warning (e.g. https://lab.llvm.org/buildbot#builders/57/builds/13973).
2022-01-17 12:20:51 -08:00
Philip Reames 523573e90d [LoopDeletion] Revert 3af8a11 and add test coverage for breakage
This reverts 3af8a11 because I'd used an upper bound where an lower bound was required.  The included reduced test case demonstrates the issue.
2022-01-17 11:44:03 -08:00
Stephen Tozer 32417b3203 [DebugInfo] ValueMapper impl for DIArgList respects IgnoreMissingLocals
This patch fixes an issue in which SSA value reference within a
DIArgList would be unnecessarily dropped by llvm-link, even when
invoking on a single file (which should be a no-op). The reason for the
difference is that the ValueMapper does not refer to the
RF_IgnoreMissingLocals flag for LocalAsMetadata contained within a
DIArgList; this flag is used for direct LocalAsMetadata uses to preserve
SSA references even when the ValueMapper does not have an explicit
mapping for the referenced SSA value, which appears to always be the
case when using llvm-link in this manner.

Differential Revision: https://reviews.llvm.org/D114355
2022-01-17 17:17:32 +00:00
Sanjay Patel 4cdf30d9d3 [InstCombine] FP with reassoc FMF: (X * C) + X --> X * (MulC + 1.0)
This fold already exists for scalars via FAddCombine (and that's
why 2 of the tests are only changed cosmetically), but that code
misses vectors and has largely been replaced by simpler folds
over time, so this is another step towards removing it.
2022-01-17 10:38:05 -05:00
Florian Hahn aa7f0e6a55
[DSE] Remove commented-out InvisibleToCallerBeforeRet. (NFC)
This code was is a leftover from earlier changes and should be removed.
2022-01-17 13:59:13 +00:00
Sanjay Patel 7037d110fa [InstCombine] propagate IR flags from binop through select
The tests with constant folding that produces poison
could potentially remove the select entirely:
https://alive2.llvm.org/ce/z/e-WUqF
...but this patch just removes the FMF-only limitation on
propagation.
2022-01-17 08:42:48 -05:00
Florian Hahn 500fe60957
[VPlan] Drop unnecessary uses of getVPSingleValue (NFC). 2022-01-17 13:27:33 +00:00
Nikita Popov 12bee2c054 [GlobalOpt] Drop an incorrect check
This was a last-minute addition to D117249, and of course I ended
up inverting the condition in a way that caused an uninitialized
memory read.

I've dropped it entirely, as I don't think we actually care whether
the size is zero or not here. The previous code wasn't checking
this either.
2022-01-17 10:10:56 +01:00
Nikita Popov 499f1ca79f [GlobalOpt] Use generic type when converting malloc to global
The malloc to global transform currently determines the type of the
global by looking at bitcasts of the malloc. This is limited (the
transform fails if there are multiple different types) and
incompatible with opaque pointers.

My initial approach was to construct an appropriate struct type
based on usage in loads/stores. What this patch does instead is
to always create an [i8 x AllocSize] global, without trying to
guess types at all.

This does mean that other transforms that require a certain global
type may break. I fixed two of these in D117034 and D117223, which
I believe should be sufficient to avoid regressions. In particular,
the global SRA change should end up splitting the global into
naturally-typed sub-globals, at which point all other optimizations
should work.

Differential Revision: https://reviews.llvm.org/D117092
2022-01-17 09:55:33 +01:00
Nikita Popov 4796b4ae7b [GlobalOpt] Make global SRA offset based
Currently global SRA uses the GEP structure to determine how to
split the global. This patch instead analyses the loads and stores
that are performed on the global, and collects which types are used
at which offset, and then splits the global according to those.

This is both more general, and works fine with opaque pointers.
This is also closer to how ordinary SROA is performed.

Differential Revision: https://reviews.llvm.org/D117223
2022-01-17 09:28:36 +01:00
Nikita Popov 00b77d917c [DSE] Remove alloc function check in canSkipDef()
canSkipDef() currently skips inaccessiblememonly calls, but not
if they are allocation functions. This check was added in D103009,
but actually seems to be a leftover from a previous implementation
in D101440. canSkipDef() is not used on the storeIsNoop() path,
where the relevant transform ended up being implemented.

Differential Revision: https://reviews.llvm.org/D117005
2022-01-17 09:23:51 +01:00
Florian Hahn 070d1034da
[LV] Restore metadata to disable runtime unrolling for epilogue loop.
After d4a8fc3a87 LV stopped adding metadata to disable runtime
unrolling to the vectorized epilogue loop. This was missed because
278aa65cc4 removed the relevant test coverage.

This patch fixes that by adding the relevant metadata after
vector loop generation.
2022-01-16 13:14:16 +00:00
Florian Hahn 62739204d4
[LV] Move AddRuntimeUnrollDisableMetaData so it can be used earlier (NFC)
Move up the definition of AddRuntimeUnrollDisableMetaData, so it can be
re-used earlier in the file in a follow-up patch.
2022-01-16 10:30:24 +00:00
Nikita Popov c63a3175c2 [AttrBuilder] Remove ctor accepting AttributeList and Index
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should exact the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
2022-01-15 22:39:31 +01:00
Nikita Popov d1675e4944 [AttrBuilder] Remove empty() / td_empty() methods
The empty() method is a footgun: It only checks whether there are
non-string attributes, which is not at all obvious from its name,
and of dubious usefulness. td_empty() is entirely unused.

Drop these methods in favor of hasAttributes(), which checks
whether there are any attributes, regardless of whether these are
string or enum attributes.
2022-01-15 17:57:18 +01:00
Florian Hahn e00158ed5c
[LoopUtils] Use InstSimplifyFolder in addRuntimeChecks.
Use the InstSimplifyFolder introduced earlier to perform initial
simplification during runtime check construction.
2022-01-15 15:21:16 +00:00
Vitaly Buka 35d00fdc10 [msan] Reset shadow of byval before call
If function is not sanitized we must reset shadow, not copy.

Depends on D117285

Reviewed By: kda, eugenis

Differential Revision: https://reviews.llvm.org/D117286
2022-01-14 22:35:43 -08:00
Quentin Colombet a8ca4046e2 [LSR] Fix crash in Phi node with EHPad block
This fixes a crash I observed in issue #48708 where the LSR
pass tries to insert an instruction in a basic block with only a
catchswitch statement in there. This happens because the Phi node
being evaluated assumes the same value for different basic blocks.

If the basic block associated with the incoming value of the operand
being evaluated has an EHPad terminator LSR skips optimizing it.
But if that incoming value can come from multiple different blocks
there can be some incoming basic blocks which are terminated in
an EHPad. If these are then rewritten in RewriteForPhi the ones
containing an EHPad terminator will hit the "Insertion point must
be a normal instruction" assert in AdjustInsertPositionForExpand.

This fix makes CollectLoopInvariantFixupsAndFormulae also ignore
cases where the same value has another incoming basic block with an
EHPad, same as it already does in case the primary value has one.

Patch by Lorenz Brun <lorenz@brun.one>

Differential Revision: https://reviews.llvm.org/D98378
2022-01-14 18:53:18 -08:00
Vitaly Buka 0a46b6ec4e [msan] Clear byval shadow in ignored functions
If function has no sanitize_memory we still reset shadow for nested calls.
The first return from getShadow() correctly returned shadow for argument,
but it didn't reset shadow of byval pointee.

Depends on D117277

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D117278
2022-01-14 17:32:07 -08:00
Vitaly Buka 4959708502 [NFC][msan] Consolidate clean shadow handling
Depends on D117276

Reviewed By: kda, eugenis

Differential Revision: https://reviews.llvm.org/D117277
2022-01-14 17:06:39 -08:00
Vitaly Buka 18e4369e19 [NFC][msan] Don't setOrigin for byval pointer
It's NFC because shadow of pointer is clean so origins will not be
propagated anyway.

Depends on D117275

Reviewed By: kda, eugenis

Differential Revision: https://reviews.llvm.org/D117276
2022-01-14 16:42:26 -08:00
Heejin Ahn c3a68c5d63 [SROA] Bail out on PHIs in catchswitch BBs
In the process of rewriting `alloca`s and `phi`s that use them, the SROA
pass can try to insert a non-PHI instruction by calling
`getFirstInsertionPt()`, which is not possible in a catchswitch BB. This
CL makes we bail out on these cases.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D117168
2022-01-14 14:55:07 -08:00
Congzhe Cao fa6a2876c7 [LoopInterchange] Enable interchange with multiple inner loop indvars
Currently loop interchange only supports loops with one inner loop
induction variable. This patch adds support for transformation with
more than one inner loop induction variables. The induction PHIs and
induction increment instructions are moved/duplicated properly to the
new outer header and the new outer latch, respectively.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D114917
2022-01-14 16:28:41 -05:00
Vitaly Buka 3552177229 [NFC][msan] Reorder branches in complex if
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D117274
2022-01-14 13:22:43 -08:00
Nadav Rotem 9551fc57b7 Fold ashr-exact into a icmp-ugt.
This commit optimizes the code sequence:
  icmp-XXX (ashr-exact (X, C_1), C_2).

Instcombine already implements this optimization for sgt, and this
patch adds support to additional predicates. The transformation is legal
for all predicates if the 'exact' flag is set, and to SGE, UGE, SLT, ULT
when the exact flag is not present.

This pattern is found in the std::vector bounds checks code of the at()
method.

Alive2 proof:
https://alive2.llvm.org/ce/z/JT_WL8

Differential Revision: https://reviews.llvm.org/D117252
2022-01-14 12:58:44 -08:00
Jessica Paquette acb8de565e [JumpThreading] Change asserts for WantInteger into actual checks
After e734e8286b, it is possible to end up in
a situation where an `indirectbr` is fed by a cast, which is in turn fed by
an operation which only produces integers.

`indirectbr` expects a block address, however these operations can't produce
that.

There were several asserts in `computeValueKnownInPredecessorsImpl` which check
that we're not looking for a block address if we're walking through something
which can never produce one.

Since it's now possible to hit these asserts, this changes them into actual
checks which return false if `Preference` is not `WantInteger`.

This adds a testcase which verifies that we don't crash anymore in these
situations.

Differential Revision: https://reviews.llvm.org/D99814
2022-01-14 11:15:14 -08:00
Florian Hahn 42b34facfd
Recommit "[LV] Inline CreateSplatIV call for scalar VFs."
This reverts the revert commit 073c27b5e5.

A reduced test case has been added in 5e4966cbae and the code has
been updated to handle the case where getInductionOpcode returns
BinaryOpsEnd. In this case, the original code was always using
Instruction::Add. Do the same in the patch.

Note this commit may slightly change the value naming, because it now
also assigns the 'induction' name in the floating point case.
2022-01-14 19:03:49 +00:00
Sanjay Patel 02455bea6b [InstCombine] remove unnecessary use check on X >>exact == 0 fold
The transform replaces one icmp with another, so we should
not care if the shift has another use.
2022-01-14 12:52:16 -05:00
Florian Hahn 1ef9bfa013
[InstSimplify] Pass pointer and indices separately to SimplifyGEPInst.
This doesn't require callers to put the pointer operand and the indices
in a container like a vector when calling the function. This is not
really an issue with the existing callers. But when using it from
IRBuilder the inputs are available as separate pointer value and indices
ArrayRef.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D117038
2022-01-14 09:59:52 +00:00
Caroline Concatto 8e5a5b619d [InstCombine] Fold for masked scatters to a uniform address
When masked scatter intrinsic does a uniform store to a destination
address from a source vector, and in this case, the mask is all one value.
This patch replaces the masked scatter with an extracted element of the
last lane of the source vector and stores it in the destination vector.
This patch also folds when the value in the masked scatter is a splat.
In this case, the mask cannot be all zero, and it folds to a scalar store
of the value in the destination pointer.

Differential Revision: https://reviews.llvm.org/D115724
2022-01-14 09:44:34 +00:00
Bryce Wilson 28b6e2cb3d
[Attributor] [NFC] Use canonical variable name
Differential Revision: https://reviews.llvm.org/D117241
2022-01-13 23:06:00 -08:00