Commit Graph

31623 Commits

Author SHA1 Message Date
Matt Arsenault ce44357216 Analysis: Add AssumptionCache to isSafeToSpeculativelyExecute
Does not update any of the uses.
2022-09-19 19:25:22 -04:00
Matt Arsenault fd37ab6cf6 InstCombine: Pass AssumptionCache through isDereferenceablePointer 2022-09-19 19:10:51 -04:00
Matt Arsenault 0d8ffcc532 Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer
This does not try to pass it through from the end users.
2022-09-19 18:57:33 -04:00
Alexey Bataev ce39bdbd65 [SLP][NFC]Reorder gather nodes with reused scalars, NFC.
The compiler does not reorder the gather nodes with reused scalars; it only
does so for the operands of the user nodes. This currently does not affect
the compiler but breaks the internal logic of the SLP graph. In the future, it
is supposed to actually use all nodes instead of just the list of operands,
and this will affect the vectorization result.
Also, added an early check to avoid complex logic in cost estimation
analysis, which should improve compile time a bit.
2022-09-19 14:00:17 -07:00
Vitaly Buka 6f3276d57e [msan] Check mask and pointers shadow
Msan has a default handler for unknown instructions, which
previously applied to these as well. However, depending on the
mask, not all pointers or passthru parts will be used. This
allows other passes to insert undef into some arguments.
As a result, the default strict instruction handler can produce false reports.

Reviewed By: kda, kstoimenov

Differential Revision: https://reviews.llvm.org/D133678
2022-09-19 13:09:56 -07:00
Florian Hahn 582f8ef19f [LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.
Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.

At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.

This can lead to widened phis with incorrect start values being created
in the epilogue vector body.

This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization.

Fixes #57712.
2022-09-19 18:14:35 +01:00
Craig Topper 90a004b4a1 [LV] Remove FIXME about NoImplicitFloat. NFC
My understanding is that NoImplicitFloat, despite its name, is
supposed to disable all vectors not just float vectors.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134084
2022-09-19 10:01:02 -07:00
Nikita Popov dd61726d5b Revert "[SimplifyCFG] accumulate bonus insts cost"
This reverts commit e5581df60a.

This causes major compile-time regressions, about 2-3% end-to-end
on CTMark.
2022-09-19 14:46:43 +02:00
Max Kazantsev 92e9bddc49 [LoopRotate] Drop loop dispositions when rotating loops. PR56260
This is required because if there is a pure loop-invariant instruction, Loop Rotation
may decide to not clone it and just hoist it instead. If SCEV has previously cached
that it was loop-variant (not being smart enough to prove invariance), we may end
up with inconsistent cache state (which may later trigger false-negative assertion
failures checking that something was invariant).

This is a conservative fix that unconditionally drops the dispositions. We could
drop them only if the hoisting has actually happened, but it would take some time
to understand whether that is safe with everything else this function does.

Differential Revision: https://reviews.llvm.org/D134167
Reviewed By: fhahn
2022-09-19 18:01:02 +07:00
Max Kazantsev 21a9abc1ce [LoopFuse] Drop loop dispositions before reassigning blocks to other loop
This bug was found by a recent improvement in the SCEV verifier. The code in LoopFuse
directly reassigns blocks to be a part of a different loop, which should automatically
invalidate all related cached loop dispositions.

Differential Revision: https://reviews.llvm.org/D134173
Reviewed By: nikic
2022-09-19 17:43:06 +07:00
Max Kazantsev 818b1ab84e [SCEV][NFC] Remove unused parameter from forgetLoopDispositions
Let's be honest about it, we don't drop loop dispositions for
particular loops. Remove the parameter that misleadingly makes
it apparent that we do.
2022-09-19 14:06:42 +07:00
Yaxun (Sam) Liu e5581df60a [SimplifyCFG] accumulate bonus insts cost
SimplifyCFG folds

bool foo() {
  if (cond1) return false;
  if (cond2) return false;
  return true;
}

as

bool foo() {
  if (cond1 | cond2) return false;
  return true;
}

'cond2' is called a 'bonus inst' in branch folding because it introduces overhead:
the original CFG could exit early, but the folded CFG always executes it.
SimplifyCFG calculates the cost of the 'bonus insts' of folding a BB into
its predecessor BB which shares the destination. If it is below bonus-inst-threshold,
SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed.

When SimplifyCFG calculates the cost of 'bonus insts', it only considers the 'bonus insts'
in the BB currently being considered for folding. This causes an issue for unrolled loops
which share destinations, e.g.

bool foo(int *a) {
  for (int i = 0; i < 32; i++)
    if (a[i] > 0) return false;
  return true;
}

After unrolling, it becomes

bool foo(int *a) {
  if(a[0]>0) return false;
  if(a[1]>0) return false;
  //...
  if(a[31]>0) return false;
  return true;
}

SimplifyCFG will merge each BB with its predecessor BB
and end up with 32 'bonus insts' which are always executed, which
is much slower than the original CFG.

The root cause is that SimplifyCFG does not consider the
accumulated cost of 'bonus insts' which are folded from
different BB's.

This patch fixes that by introducing a ValueMap to track
costs of 'bonus insts' coming from different BB's into
the same BB, and cuts off if the accumulated cost
exceeds a threshold.

Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D132408
2022-09-18 20:21:14 -04:00
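The accumulation idea described in e5581df60a can be sketched roughly as follows. This is a minimal standalone illustration, not the patch's code: `BonusCostTracker`, `tryFold`, and the integer block ids are hypothetical names, and a `std::map` stands in for the ValueMap that the commit message mentions.

```cpp
#include <map>

// Hypothetical sketch: track the bonus-instruction cost already folded into
// each destination block and refuse further folds past a threshold.
struct BonusCostTracker {
  unsigned Threshold;                   // assumed analogue of bonus-inst-threshold
  std::map<int, unsigned> Accumulated;  // block id -> cost folded in so far

  // Returns true and records the cost if the accumulated total stays within
  // the threshold; returns false (and records nothing) once it would exceed it.
  bool tryFold(int DestBlock, unsigned Cost) {
    unsigned Total = Accumulated[DestBlock] + Cost;
    if (Total > Threshold)
      return false;
    Accumulated[DestBlock] = Total;
    return true;
  }
};
```

With a threshold of 2 and unit-cost folds into the same block, the first two folds would be accepted and the third rejected, which is the cut-off behavior the message describes for the unrolled loop.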
Sanjay Patel d6498abc24 [InstCombine] remove multi-use add demanded constant fold
This was originally part of D133788. There are no visible
regressions. All of the diffs show a large unsigned constant
becoming a small negative constant. This should be better
for analysis (and slightly less compile-time) and codegen.
2022-09-18 14:23:43 -04:00
Kazu Hirata 5e5a6c5b07 Use std::conditional_t (NFC) 2022-09-18 10:25:06 -07:00
Marc Auberer f52dd920d4 [InstCombine] Fix bug when folding x + (x | -x) to x & (x - 1)
Addresses concern: https://reviews.llvm.org/rG09cdddea0c4d284c2c22f5dfade40a60850c5ea7

There was a copy/paste mistake in the code. Updated code and test ref.

Differential Revision: https://reviews.llvm.org/D134135
2022-09-18 13:16:12 -04:00
Sanjay Patel 1d1d1e6f22 [InstCombine] fold full-shift of sdiv to icmp+extend
This is a disguised sign-bit test with offset:
(X / +DivC) >> (Width - 1) --> ext (X <= -DivC)
(X / -DivC) >> (Width - 1) --> ext (X >= +DivC)

https://alive2.llvm.org/ce/z/cO8JO4

We don't match/test poison in the sdiv constant because
that would be immediate undefined behavior.
2022-09-18 13:13:14 -04:00
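The two equivalences in 1d1d1e6f22 can be sanity-checked by brute force at a small bit width. The sketch below is an assumption-laden illustration rather than anything from the patch: it models an 8-bit value with plain `int` arithmetic, checks only the logical-shift (zero-extended) form, and skips the one overflowing case (-128 / -1). The function name is hypothetical.

```cpp
#include <cstdint>

// Exhaustively compares the sign bit of an 8-bit signed quotient with the
// corresponding compare, for every dividend X and divisor magnitude C.
bool sdivSignBitFoldHolds() {
  for (int C = 1; C <= 127; ++C) {
    for (int X = -128; X <= 127; ++X) {
      // (X / +C) >> 7  versus  X <= -C
      int Q = X / C;
      unsigned SignBit = (static_cast<uint8_t>(Q) >> 7) & 1u;
      if (SignBit != (X <= -C ? 1u : 0u))
        return false;

      // (X / -C) >> 7  versus  X >= +C; skip the overflowing -128 / -1.
      if (X == -128 && C == 1)
        continue;
      int QN = X / -C;
      unsigned SignBitN = (static_cast<uint8_t>(QN) >> 7) & 1u;
      if (SignBitN != (X >= C ? 1u : 0u))
        return false;
    }
  }
  return true;
}
```

The check relies on C++ integer division truncating toward zero, which matches the semantics of sdiv.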
Kazu Hirata d3b95ecc98 [ModuleInliner] Remove InlineOrder::front (NFC)
InlineOrder::front is a remnant from the era when we had nested
"while" loops in the module inliner, with the inner one grouping the
call sites with the same caller.

Now that we have a simple "while" loop draining the priority queue, we
can just use InlineOrder::pop.

Differential Revision: https://reviews.llvm.org/D134121
2022-09-18 08:49:44 -07:00
Benjamin Kramer b987fe4972 Silence unused variable warning in release builds. NFC 2022-09-18 09:15:32 +02:00
Kazu Hirata 284f0397e2 [Transforms] Merge function attributes within InlineFunction (NFC)
In the past, we've had a bug resulting in a compiler crash after
forgetting to merge function attributes (D105729).

This patch teaches InlineFunction to merge function attributes.  This
way, we minimize the "time" when the IR is valid, but the function
attributes are not.

Differential Revision: https://reviews.llvm.org/D134117
2022-09-17 23:10:23 -07:00
Kazu Hirata 6e4fbd2f51 [ModuleInliner] Set Changed earlier (NFC)
It makes more sense to set Changed to true immediately after a
successful inlining.
2022-09-17 14:16:32 -07:00
Kazu Hirata 31b91356bc [ModuleInliner] Don't include SetVector.h (NFC)
We don't use SetVector in the module inliner.
2022-09-17 12:17:52 -07:00
Kazu Hirata 5faf4bf195 [ModuleInliner] Move UseInlinePriority to InlineOrder.cpp (NFC)
UseInlinePriority specifies the priority function.  This patch
simplifies the code by moving UseInlinePriority closer to the actual
consumer -- the switch statement inside getInlineOrder.

Differential Revision: https://reviews.llvm.org/D134100
2022-09-17 11:41:28 -07:00
Florian Hahn 7914e53e31 [ConstraintElimination] Fix crash when combining results.
f213128b29 didn't account for the possibility that the result of
decompose may be empty. Fix that by explicitly checking. Use a newly
introduced helper to also reduce some duplication.

Thanks @bjope for finding the issue!
2022-09-17 14:47:38 +01:00
Kazu Hirata 6e30a9cc08 [Inliner] Retire DefaultInlineOrder (NFC)
DefaultInlineOrder was largely an exercise in generalizing the
traversal order of call sites within the inliner.

Now that the module inliner is starting to form its shape, there is no
point in sharing DefaultInlineOrder between the module inliner and the
CGSCC inliner.  DefaultInlineOrder and all the other inline orders are
mutually exclusive in the following sense:

- The use of DefaultInlineOrder doesn't make sense in the module
  inliner because there is no priority inherent in the order in which
  call sites are added to the list of call sites -- SmallVector.

- The use of any other inline order doesn't make sense in the CGSCC
  inliner because little prioritization can be done within one CGSCC.

This patch essentially reverts the addition of DefaultInlineOrder so
that the loop structure of Inliner.cpp looks like the state just
before we started working on the module inliner (circa June 2021).

At the same time, we remove the choice of DefaultInlineOrder from
UseInlinePriority.

Differential Revision: https://reviews.llvm.org/D134080
2022-09-16 15:36:40 -07:00
Alexey Bataev 5d13b12674 [SLP]Improve isUndefVector function by adding insertelement analysis.
Added the mask and the analysis of the buildvector sequence to the
isUndefVector function, which improves codegen and cost estimation.

Metric: SLP.NumVectorInstructions

Program                                                                   results   results0  diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  27362.00  27360.00  -0.0%

Metric: size..text

Program                                                             results    results0   diff
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  805299.00  806035.00  0.1%

526.blender_r - some extra code is vectorized.
508.namd_r - some extra code is optimized out.

Differential Revision: https://reviews.llvm.org/D133891
2022-09-16 14:36:38 -07:00
Teresa Johnson c2cf93c1a9 [WPD/LTT] Lower type test feeding assumes via phi correctly
This fixes https://github.com/llvm/llvm-project/issues/57616.

Type test lowering in ThinLTO modules relies on having type id
summaries set up for the referenced types, which provide the type
test resolution. If there is no summary, the type tests are lowered
to false. At the very least, a default type id summary gives the
type tests a resolution of Unknown, which is handled correctly (ignored
by the first invocation of LTT, and lowered to true by the second).

WPD sets up the type id summaries (with a default type test resolution)
as it is processing the type tests, but only does this for the patterns
handled by WPD, which is a type test directly feeding an assume. In the
case of type tests feeding an assume via a phi, the type id summary was
not being set up, leading to the type tests being lowered to false
incorrectly.

Fix this by adding the default type id summary entries for all type ids
used on globals during index-only WPD.

This is not an issue for hybrid (split-lto-unit) LTO, as in that case
the type test resolution is determined and set up during LTT, since the
type definitions are in the regular LTO split module, and exported via
the summary to the ThinLTO split module.

Differential Revision: https://reviews.llvm.org/D134012
2022-09-16 13:50:01 -07:00
Kazu Hirata 9111920af8 [ModuleInliner] clang-format ModuleInliner.cpp (NFC) 2022-09-16 09:41:42 -07:00
Kazu Hirata 4475470529 [ModuleInliner] Remove a stale comment (NFC)
These comments refer to the nested loop in the module inliner where
the inner loop grouped call sites from the same caller.  We don't
group call sites anymore, so the comment has become stale.
2022-09-16 09:37:43 -07:00
Kazu Hirata 42a90e6017 [ModuleInliner] Remove a redundant variable (NFC)
In the CGSCC inliner, DidInline was used as an indicator to update the call graph.

In the module inliner, DidInline is always true at the end of the
"while" loop, so we can just drop it.
2022-09-16 09:32:02 -07:00
Kazu Hirata 513717ddd0 [ModuleInliner] Remove a write-only variable (NFC)
InlinedCallees is a remnant from the CGSCC inliner.  We don't use it
in the module inliner.
2022-09-16 09:15:53 -07:00
Kazu Hirata 77501bfab8 [IPO] Simplify the module inliner loop (NFC)
In the bottom-up inliner, we have a two-level nested "while" loop,
with the inner one grouping call sites with the same caller.  We need
to do so to keep CGSCC up to date.

Now, with the module inliner, we don't have any per-caller work.  We
don't update CGSCC.  Plus, the caller will likely keep changing as we
pop call sites in some priority order.

This patch simply removes the inner "while" loop while indenting its
body.  Further cleanup is possible, but that's left for follow-up
patches.

Differential Revision: https://reviews.llvm.org/D133969
2022-09-16 08:56:18 -07:00
Sanjay Patel 6174da2299 [InstCombine] reduce code duplication in foldICmpMulConstant(); NFC 2022-09-16 10:39:54 -04:00
Vitaly Buka f0c2ffa8f8 [msan] Add msan-insert-check DEBUG_COUNTER 2022-09-15 21:52:58 -07:00
Gulfem Savrun Yeniceri d6aed77f0d [InstrProfiling] No runtime hook for unused funcs
This is a reland of https://reviews.llvm.org/D122336.
Original patch caused a problem in collecting coverage in
Fuchsia because it was returning early without putting unused
function names into __llvm_prf_names section. This patch
fixes that issue.

The original commit message is as follows:
CoverageMappingModuleGen generates a coverage mapping record
even for unused functions with internal linkage, e.g.
static int foo() { return 100; }
Clang frontend eliminates such functions, but InstrProfiling pass
still emits runtime hook since there is a coverage record.
Fuchsia uses runtime counter relocation, and pulling in profile
runtime for unused functions causes a linker error:
undefined hidden symbol: __llvm_profile_counter_bias.
Since https://reviews.llvm.org/D98061, we do not hook the profile
runtime in Fuchsia for binaries in which none of the translation units
have been instrumented. This patch extends that to
instrumented binaries that consist only of unused functions.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D122336
2022-09-16 02:05:09 +00:00
Navid Emamdoost 3e52c0926c Add -fsanitizer-coverage=control-flow
Reviewed By: kcc, vitalybuka, MaskRay

Differential Revision: https://reviews.llvm.org/D133157
2022-09-15 15:56:04 -07:00
Sanjay Patel aafaa2f4fc [SCCP] convert ashr to lshr for non-negative shift value
This is similar to the existing signed instruction folds.
We get the obvious minimal patterns in other passes, but
this avoids potential missed folds when the multi-block
tests are converted to selects.
2022-09-15 13:54:52 -04:00
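The reason the ashr-to-lshr conversion in aafaa2f4fc is sound can be shown with a small sketch: the two shifts differ only in how they fill the vacated high bits, and for a value whose sign bit is clear both fill with zeros. The helper names below are hypothetical, and the arithmetic shift is written out by hand on unsigned values so the behavior does not depend on the compiler.

```cpp
#include <cstdint>

// Logical shift right: vacated high bits are filled with zeros.
// Shift is assumed to be in the range [0, 31].
uint32_t lshr32(uint32_t X, unsigned Shift) { return X >> Shift; }

// Arithmetic shift right, spelled out portably: vacated high bits are
// filled with copies of the sign bit. Shift is assumed to be in [0, 31].
uint32_t ashr32(uint32_t X, unsigned Shift) {
  uint32_t Logical = X >> Shift;
  if ((X >> 31) == 0 || Shift == 0)
    return Logical;  // sign bit clear: identical to the logical shift
  return Logical | ~(UINT32_MAX >> Shift);
}
```

For any input with a clear sign bit, `ashr32` takes the early return and yields exactly the `lshr32` result; the two only diverge when the sign bit is set.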
Craig Topper ace05124f5 [IntegerDivision][AMDGPU] Use CreateLogicalOr to block poison propagation.
There are two ctlz intrinsics here with the zero_is_poison flag
set. There are also two comparisons that check if either of the
inputs to the ctlzs is zero. We need to use a logical or to block
the poison from the ctlz if either of the inputs is zero.

Reviewed By: arsenm, aqjune

Differential Revision: https://reviews.llvm.org/D130680
2022-09-15 09:38:02 -07:00
Sanjay Patel 02a27b3890 [InstCombine] fold X*X == 0 --> X == 0
This is safe when the mul does not overflow:
https://alive2.llvm.org/ce/z/LedVVP

This could be extended to handle non-zero compare constants
and non-squared multiplies.
2022-09-15 12:02:50 -04:00
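The no-overflow condition in 02a27b3890 matters because a wrapping multiply can produce zero from a non-zero input. The following sketch uses 8-bit unsigned arithmetic as a stand-in for an arbitrary integer width; the function names are hypothetical.

```cpp
#include <cstdint>

// X*X == 0 evaluated with wrapping 8-bit arithmetic, i.e. with no
// guarantee that the multiply does not overflow.
bool squareIsZeroWrapping(uint8_t X) {
  return static_cast<uint8_t>(X * X) == 0;
}

// Checks that X*X == 0 agrees with X == 0 for every 8-bit X whose square
// fits in 8 unsigned bits (the no-overflow case).
bool foldHoldsWithoutOverflow() {
  for (unsigned X = 0; X <= 255; ++X) {
    if (X * X > 255)
      continue;  // the multiply would overflow; the fold makes no claim here
    if (squareIsZeroWrapping(static_cast<uint8_t>(X)) != (X == 0))
      return false;
  }
  return true;
}
```

As a counterexample for the overflowing case, 16 * 16 is 256, which wraps to 0 in 8 bits even though 16 is not zero.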
Evgeniy Brevnov 03a102e3b2 [JumpThreading][NFC] Reuse existing DT instead of recomputation (newPM)
This is the same change as
503d5771b6 with the same intent but for new pass manager.
2022-09-15 12:27:57 +07:00
Dhruva Chakrabarti 839ac62c50 Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit 7539e9cf81.
2022-09-15 03:08:46 +00:00
Vitaly Buka f221720e82 [nfc][msan] getShadowOriginPtr on <N x ptr>
Some vector instructions can benefit from
of Addr as <N x ptr>.

Differential Revision: https://reviews.llvm.org/D133681
2022-09-14 19:18:52 -07:00
Vitaly Buka f404169f24 [NFC][msan] Rename variables to match definition 2022-09-14 19:16:27 -07:00
Vitaly Buka 2209be15a5 [NFC][msan] Convert some code to early returns
Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133673
2022-09-14 19:16:11 -07:00
Vitaly Buka bcf3d666b4 [NFC][msan] Simplify llvm.masked.load origin code
Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133652
2022-09-14 19:14:29 -07:00
Vitaly Buka d421223e25 [msan] Resolve FIXME from D133880
We don't need to change tests we convertToBool
unconditionally only before OR.
2022-09-14 18:55:57 -07:00
Giorgis Georgakoudis 7539e9cf81 [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments, (3) forwarded arguments must be cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert, jhuber6, ABataev

Differential Revision: https://reviews.llvm.org/D102107
2022-09-15 00:54:05 +00:00
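The aggregate-capture approach from 7539e9cf81 can be pictured with a toy model. Nothing below is the OpenMP runtime interface: `Captures`, `toyForkCall`, `outlinedBody`, and `runToyRegion` are hypothetical names, and the "fork" simply calls the outlined function once on the current thread.

```cpp
// Hypothetical aggregate holding everything the parallel region captures.
struct Captures {
  int *A;  // captured array
  int N;   // captured length
};

// The outlined function has a fixed, non-variadic signature.
using OutlinedFn = void (*)(void *);

// Toy stand-in for a fork call: forwards one opaque aggregate pointer.
void toyForkCall(OutlinedFn Fn, void *Aggregate) { Fn(Aggregate); }

// Outlined region body: unpacks the captures from the aggregate.
void outlinedBody(void *Raw) {
  auto *C = static_cast<Captures *>(Raw);
  for (int I = 0; I < C->N; ++I)
    C->A[I] += 1;
}

// Builds the aggregate, "forks", and returns the sum of the updated array.
int runToyRegion() {
  int Data[3] = {1, 2, 3};
  Captures C{Data, 3};
  toyForkCall(outlinedBody, &C);
  return Data[0] + Data[1] + Data[2];
}
```

Because only one pointer crosses the call boundary, the toy fork call needs no variadic argument list and imposes no limit on how many variables are captured.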
Vitaly Buka bf204881b6 [msan] Change logic of ClInstrumentationWithCallThreshold
According to logs, ClInstrumentationWithCallThreshold is a workaround
for a slow backend with a large number of basic blocks.
However, I can't reproduce that, but I do see a significant slowdown
after ClCheckConstantShadow. Without ClInstrumentationWithCallThreshold
the compiler is able to eliminate many of the branches.

So maybe we should drop ClInstrumentationWithCallThreshold completely.

For now I just change the logic to ignore constant shadow so it will
not trigger the callback fallback too early.

Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D133880
2022-09-14 14:58:12 -07:00
Florian Hahn 7f3ff9d3c0 [ConstraintElimination] Track if variables are positive in constraint.
Keep track if variables are known positive during constraint
decomposition, aggregate the information when building the constraint
object and encode the extra information as constraints to be used during
reasoning.
2022-09-14 18:43:54 +01:00
Alexey Bataev d647312e3f [SLP][NFC]Extract getLastInstructionInBundle function for better dependence checking, NFC.

Part of D110978
2022-09-14 08:43:15 -07:00
Zain Jaffal 8253f7e286 [InstCombine] Optimize multiplication where both operands are negated
Handle the case where both operands are negated in matrix multiplication

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133695
2022-09-14 16:29:39 +01:00
Nikita Popov b1cd393f9e [AA] Tracking per-location ModRef info in FunctionModRefBehavior (NFCI)
Currently, FunctionModRefBehavior tracks whether the function reads
or writes memory (ModRefInfo) and which locations it can access
(argmem, inaccessiblemem and other). This patch changes it to track
ModRef information per-location instead.

To give two examples of why this is useful:

* D117095 highlights a weakness of ModRef modelling in the presence
  of operand bundles. For a memcpy call with deopt operand bundle,
  we want to say that it can read any memory, but only write argument
  memory. This would allow them to be treated like any other calls.
  However, we currently can't express this and have to say that it
  can read or write any memory.
* D127383 would ideally be modelled as a separate threadid location,
  where threadid Refs outside pre-split coroutines can be ignored
  (like other accesses to constant memory). The current representation
  does not allow modelling this precisely.

The patch as implemented is intended to be NFC, but there are some
obvious opportunities for improvements and simplification. To fully
capitalize on this we would also want to change the way we represent
memory attributes on functions, but that's a larger change, and I
think it makes sense to separate out the FunctionModRefBehavior
refactoring.

Differential Revision: https://reviews.llvm.org/D130896
2022-09-14 16:34:41 +02:00
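The per-location representation described in b1cd393f9e can be sketched as a small bit-packed class. This is a toy model under stated assumptions, not LLVM's implementation: the `Toy*` names, the two-bits-per-location encoding, and the helper `memcpyWithDeoptBundle` are all hypothetical.

```cpp
#include <cstdint>

// Two-bit mod/ref lattice: bit 0 = may read, bit 1 = may write.
enum class ToyModRef : uint8_t { NoModRef = 0, Ref = 1, Mod = 2, ModRef = 3 };

// The three location kinds named in the commit message.
enum class ToyLocation : unsigned { ArgMem = 0, InaccessibleMem = 1, Other = 2 };

class ToyMemoryEffects {
  uint8_t Bits = 0;  // two bits per location, all NoModRef initially

public:
  void set(ToyLocation Loc, ToyModRef MR) {
    unsigned Shift = 2 * static_cast<unsigned>(Loc);
    Bits = static_cast<uint8_t>((Bits & ~(3u << Shift)) |
                                (static_cast<unsigned>(MR) << Shift));
  }
  ToyModRef get(ToyLocation Loc) const {
    unsigned Shift = 2 * static_cast<unsigned>(Loc);
    return static_cast<ToyModRef>((Bits >> Shift) & 3u);
  }
  bool mayRead(ToyLocation Loc) const {
    return (static_cast<unsigned>(get(Loc)) & 1u) != 0;
  }
  bool mayWrite(ToyLocation Loc) const {
    return (static_cast<unsigned>(get(Loc)) & 2u) != 0;
  }
};

// The memcpy-with-deopt-bundle example: reads any memory, writes only
// argument memory.
ToyMemoryEffects memcpyWithDeoptBundle() {
  ToyMemoryEffects ME;
  ME.set(ToyLocation::ArgMem, ToyModRef::ModRef);
  ME.set(ToyLocation::InaccessibleMem, ToyModRef::Ref);
  ME.set(ToyLocation::Other, ToyModRef::Ref);
  return ME;
}
```

A single global ModRefInfo plus a location set cannot express "read anywhere, write only arguments"; storing one ModRef value per location can.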
Florian Hahn efd3ec47d9 [ConstraintElimination] Clear new indices directly in getConstraint(NFC)
Instead of checking if any of the new indices has a non-zero coefficient
before using the constraint, do this directly when constructing the
constraint.
2022-09-14 15:31:25 +01:00
Sanjay Patel 73919a87e9 [InstCombine] try multi-use demanded bits folds for 'add'
This patch enables a multi-use demanded bits fold (motivated by issue #57576):
https://alive2.llvm.org/ce/z/DsZakh

This mimics transforms that we already do on the single-use path.

Originally, this patch did not include the last part to form a constant, but
that can be removed independently to reduce risk. It's not clear what the
effect of either change will be when viewed end-to-end.

This is expected to be neutral or a slight win for compile-time.
See the "add-demand2" series for experimental timing results:
https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright

Differential Revision: https://reviews.llvm.org/D133788
2022-09-14 09:30:59 -04:00
Alexey Bataev 796af0c027 [SLP] Move getInsertIndex function, NFC.
Part of D110978.
2022-09-14 06:22:52 -07:00
Florian Hahn f213128b29 [ConstraintElimination] Further de-compose operands of add operations.
This simply extends the existing logic to look through adds and combine
the components as done in other places already.
2022-09-14 12:00:32 +01:00
Kazu Hirata d3649c2be4 [Vectorize] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5879:5: error:
  expression result unused [-Werror,-Wunused-value]
2022-09-13 09:30:06 -07:00
Arthur Eubanks 5a33d1f0b9 [SimplifyCFG] Don't hoist allocas
D129370 started hoisting allocas across stacksave/stackrestore
boundaries which is wrong.

Reviewed By: chill, rnk

Differential Revision: https://reviews.llvm.org/D133730
2022-09-13 09:23:39 -07:00
Valery N Dmitriev 18dde772d6 [SLP] Unify main/alternate selection for CmpInst instructions
Make main/alternate operation selection logic for CmpInst
consistent across SLP vectorizer.

Differential Revision: https://reviews.llvm.org/D133430
2022-09-13 09:20:25 -07:00
Florian Hahn ac80b0e84f [LV] Mark Instr as const in scalarizeInstruction. (NFC).
This is to reduce the diff in follow-up changes.
2022-09-13 09:10:02 +01:00
Max Kazantsev 86d5586d78 [SCEVExpander] Recompute poison-generating flags on hoisting. PR57187
Instruction being hoisted could have nuw/nsw flags inferred from the old
context, and we cannot simply move it to the new location keeping them
because we are going to introduce new uses to them that didn't exist before.

Example in https://github.com/llvm/llvm-project/issues/57187 shows how
this can produce branch by poison from initially well-defined program.

This patch forcefully recomputes poison-generating flag in the new context.

Differential Revision: https://reviews.llvm.org/D132022
Reviewed By: fhahn, nikic
2022-09-13 12:56:35 +07:00
Kazu Hirata 9606608474 [llvm] Use x.empty() instead of llvm::empty(x) (NFC)
I'm planning to deprecate and eventually remove llvm::empty.

I thought about replacing llvm::empty(x) with std::empty(x), but it
turns out that all uses can be converted to x.empty().  That is, no
use requires the ability of std::empty to accept C arrays and
std::initializer_list.

Differential Revision: https://reviews.llvm.org/D133677
2022-09-12 13:34:35 -07:00
Sanjay Patel 53eede597e [InstCombine] look through 'not' of ctlz/cttz op with 0-is-undef
https://alive2.llvm.org/ce/z/MNsC1S

This pattern was flagged at:
https://discourse.llvm.org/t/instcombines-select-optimizations-dont-trigger-reliably/64927
2022-09-12 15:06:21 -04:00
Benjamin Kramer 2675c41671 [DFSan] Don't crash with the legacy pass manager
TargetLibraryInfo isn't optional, so we have to provide it even with the
legacy stuff. Ideally we wouldn't need it anymore but there are still
users out there that are stuck on the legacy PM.

Differential Revision: https://reviews.llvm.org/D133685
2022-09-12 19:11:55 +02:00
A-Wadhwani de3445e0ef [SROA] Create additional vector type candidates based on store and load slices
This patch adds additional vector types to be considered when doing
promotion in SROA, based on the types of the store and load slices. This
provides more promotion opportunities, by potentially using an optimal
"intermediate" vector type.

For example, the following code would currently not be promoted to a
vector, since `__m128i` is a `<2 x i64>` vector.

```

__m128i packfoo0(int a, int b, int c, int d) {
  int r[4] = {a, b, c, d};
  __m128i rm;
  std::memcpy(&rm, r, sizeof(rm));
  return rm;
}
```

```
packfoo0(int, int, int, int):
        mov     dword ptr [rsp - 24], edi
        mov     dword ptr [rsp - 20], esi
        mov     dword ptr [rsp - 16], edx
        mov     dword ptr [rsp - 12], ecx
        movaps  xmm0, xmmword ptr [rsp - 24]
        ret
```

By also considering the types of the elements, we could find that the
`<4 x i32>` type would be valid for promotion, hence removing the memory
accesses for this function. In other words, we can explore other new
vector types, with the same size but different element types based on
the load and store instructions from the Slices, which can provide us
more promotion opportunities.

Additionally, the step for removing duplicate elements from the
`CandidateTys` vector was not using an equality comparator, which has
been fixed.

Differential Revision: https://reviews.llvm.org/D132096
2022-09-12 09:55:37 -07:00
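The point made in de3445e0ef, that the same 16 bytes can be viewed either as two 64-bit lanes or as four 32-bit lanes, can be illustrated without intrinsics. In this sketch `TwoLanes` is a hypothetical stand-in for `__m128i` and `FourLanes` for the `<4 x i32>` candidate type; the helper names are likewise assumptions.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using TwoLanes = std::array<uint64_t, 2>;  // stand-in for a <2 x i64> value
using FourLanes = std::array<int32_t, 4>;  // stand-in for a <4 x i32> value

static_assert(sizeof(TwoLanes) == sizeof(FourLanes),
              "both views must cover the same 16 bytes");

// Mirrors packfoo0: copy four 32-bit values into the two-lane type.
TwoLanes packAsTwoLanes(const FourLanes &R) {
  TwoLanes Out;
  std::memcpy(&Out, &R, sizeof(Out));
  return Out;
}

// Reinterprets the two-lane value as four 32-bit lanes again.
FourLanes viewAsFourLanes(const TwoLanes &V) {
  FourLanes Out;
  std::memcpy(&Out, &V, sizeof(Out));
  return Out;
}
```

The round trip is lossless because only the element type changes, not the size, which is why a same-size vector type with a different element type is a valid promotion candidate.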
Sanjay Patel 4ca25c66d4 [Reassociate] prevent partial undef negation replacement
As shown in the examples in issue #57683, we allow matching
vectors with poison (undef) in this transform (and possibly more),
but we can't then use the partially defined value as a replacement
value in other expressions blindly.

This seems to be avoided in simpler examples of reassociation,
and other passes should be able to clean up the redundant op
seen in these tests.
2022-09-12 12:28:34 -04:00
Florian Hahn 3fd1cc2574 [SLP] Add Preheader to CSE blocks after hoisting CSE-able instrs.
Adding the pre-header to CSEBlocks ensures instructions are CSE'd even
after hoisting.

This was originally discovered by @atrick a while ago.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D133649
2022-09-12 15:53:31 +01:00
Alexey Bataev dfe1e9dd79 [SLP]Improve reordering of clustered reused scalars.
If the reused scalars are clustered, i.e. each part of the reused mask
contains all elements of the original scalars exactly once, we can
reorder those clusters to improve the whole ordering of the clustered
vectors.

Differential Revision: https://reviews.llvm.org/D133524
2022-09-12 06:52:25 -07:00
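One way to read the "clustered" condition in dfe1e9dd79 is as a per-part permutation check. The sketch below is an interpretation of the commit message, not the SLP vectorizer's code: `isClusteredReuseMask` is a hypothetical name, and the mask is assumed to be a flat list of scalar indices split into parts of `NumScalars` elements.

```cpp
#include <cstddef>
#include <vector>

// Returns true if every NumScalars-sized part of Mask contains each index
// in [0, NumScalars) exactly once.
bool isClusteredReuseMask(const std::vector<int> &Mask,
                          std::size_t NumScalars) {
  if (NumScalars == 0 || Mask.empty() || Mask.size() % NumScalars != 0)
    return false;
  for (std::size_t Start = 0; Start < Mask.size(); Start += NumScalars) {
    std::vector<bool> Seen(NumScalars, false);
    for (std::size_t I = 0; I < NumScalars; ++I) {
      int Idx = Mask[Start + I];
      // Reject out-of-range indices and indices repeated within this part.
      if (Idx < 0 || static_cast<std::size_t>(Idx) >= NumScalars || Seen[Idx])
        return false;
      Seen[Idx] = true;
    }
  }
  return true;
}
```

Under this reading, a mask such as 0, 1, 1, 0 over two scalars is clustered, while 0, 0, 1, 1 is not, because its first part repeats index 0.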
Max Kazantsev 0e465c0c2f [IRCE] Bail in case of pointer types. PR40539
We should not unconditionally expect that SCEVable types are all integers
because SCEV can also be computed for pointers. Bail in this case.
2022-09-12 16:01:25 +07:00
Djordje Todorovic b080d0bae8 Revert ""Recommit "[AggressiveInstCombine] Lower Table Based CTTZ"""
This reverts commit df868edee5, as it
introduces a bug found by Alive2 (more on the rGdf868edee561).
2022-09-12 08:23:07 +02:00
Johannes Doerfert c922cac868 Revert "[Attributor] AAPointerInfo should allow "harmless" uses"
Revert "[Attributor] Teach AAPointerInfo to look into aggregates"

This reverts commit 844f6c5d03 and
4ed0a88cd8 as they broke the buildbots
that run openmp/libomptarget/test/offloading/bug49021.cpp.
2022-09-11 21:37:54 -07:00
Johannes Doerfert 844f6c5d03 [Attributor] AAPointerInfo should allow "harmless" uses
If a call base use will not capture a pointer we can approximate the
effects. This is important especially for readnone/only uses.
2022-09-11 20:16:11 -07:00
Johannes Doerfert 4ed0a88cd8 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-09-11 20:16:11 -07:00
Johannes Doerfert b046ebdc01 [Attributor][FIX] Conservatively handle ptr2int, don't crash
If a pointer-2-int cast is found we give up on AAPointerInfo for now.
This caused a crash before.

Reported by John Tramm (@jtramm).
2022-09-11 20:16:11 -07:00
Johannes Doerfert 21711039e3 [OpenMP] Allow the Attributor to look at functions we also internalized
This is important as we have accesses to globals in those which we need to
categorize.
2022-09-11 20:16:11 -07:00
Junduo Dong 6975ab7126 [Clang] Reimplement time tracing of NewPassManager by PassInstrumentation framework
The previous implementation of time tracing in NewPassManager is direct but messy.

The key code looks like the demo below:
```
  /// Runs the function pass across every function in the module.
  PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
                        LazyCallGraph &CG, CGSCCUpdateResult &UR) {
      /// ...
      PreservedAnalyses PassPA;
      {
        TimeTraceScope TimeScope(Pass.name());
        PassPA = Pass.run(F, FAM);
      }
      /// ...
  }
```

It is tedious to decide by hand where the tracing code should be added.

With the PassInstrumentation framework, we can easily add `Before/After` callback
functions that carry the time tracing code.

Differential Revision: https://reviews.llvm.org/D131960
2022-09-11 05:42:55 -07:00
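The callback-based structure that 6975ab7126 describes can be sketched with a toy registry. This is not the PassInstrumentation API: `ToyInstrumentation`, `runPass`, and `traceOnePass` are hypothetical names, and the callbacks record events in a vector instead of opening and closing real time-trace scopes.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a pass instrumentation registry.
struct ToyInstrumentation {
  std::vector<std::function<void(const std::string &)>> Before;
  std::vector<std::function<void(const std::string &)>> After;

  // Invokes every Before callback, then the pass body, then every After
  // callback, so the pass manager itself contains no tracing code.
  void runPass(const std::string &Name, const std::function<void()> &Body) {
    for (auto &CB : Before)
      CB(Name);
    Body();
    for (auto &CB : After)
      CB(Name);
  }
};

// Registers one begin/end pair and returns the recorded event sequence.
std::vector<std::string> traceOnePass(const std::string &Name) {
  std::vector<std::string> Events;
  ToyInstrumentation PI;
  PI.Before.push_back(
      [&](const std::string &N) { Events.push_back("begin " + N); });
  PI.After.push_back(
      [&](const std::string &N) { Events.push_back("end " + N); });
  PI.runPass(Name, [&] { Events.push_back("run"); });
  return Events;
}
```

The tracing logic is registered once and applies to every pass that goes through `runPass`, instead of being repeated around each `Pass.run` call site.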
Florian Hahn 69d9bb2aad [VPlan] Check recipe uses instead of type of underlying instr (NFC).
Suggested by @Ayal post-commit, to reduce the dependence on the
underlying instruction in favor of information available directly for
the recipe.
2022-09-11 12:24:44 +01:00
Marc Auberer 09cdddea0c [InstCombine] Fold x + (x | -x) to x & (x - 1)
Fixes #57531

This transformation may be particularly useful on x86-64,
because x & (x - 1) can be performed by a single blsr instruction.

Differential Revision: https://reviews.llvm.org/D133362
2022-09-11 06:14:24 -04:00
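The identity behind 09cdddea0c (and the follow-up fix f52dd920d4) can be checked directly with wrapping unsigned arithmetic. In this sketch the function names are hypothetical and the check samples 32-bit values rather than proving the identity: `x | -x` sets every bit at and above the lowest set bit, so adding it to `x` subtracts that lowest bit, which is also what `x & (x - 1)` does.

```cpp
#include <cstdint>

// The pattern before the fold: x + (x | -x), with wrapping arithmetic.
uint32_t originalForm(uint32_t X) { return X + (X | (0u - X)); }

// The pattern after the fold: x & (x - 1), which clears the lowest set bit.
uint32_t foldedForm(uint32_t X) { return X & (X - 1u); }

// Compares both forms on all 16-bit values and on a variant of each value
// that also has high bits set.
bool formsAgreeOnSamples() {
  for (uint32_t X = 0; X < 65536; ++X) {
    uint32_t Hi = (X << 16) | X;
    if (originalForm(X) != foldedForm(X) ||
        originalForm(Hi) != foldedForm(Hi))
      return false;
  }
  return true;
}
```

For example, with x = 6 the original form gives 6 + (-2) = 4 and the folded form gives 6 & 5 = 4.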
Alexey Bader 2bb5535b58 [StripDeadDebugInfo] Drop dead CUs
When a submodule is extracted from a big module (i.e. using
CloneModule), a lot of debug info is copied via metadata nodes. Even though
part of that info is not linked to any instruction in the extracted
IR file, the StripDeadDebugInfo pass doesn't drop it.
Strengthen the criteria for debug info that should be kept in a module:
- Only those compile units are kept that are referenced by a subprogram debug info
node that is attached to a function definition in the module or to an instruction
in the module that belongs to an inlined function.

Signed-off-by: Mikhail Lychkov <mikhail.lychkov@intel.com>

Differential Revision: https://reviews.llvm.org/D122163
2022-09-11 01:31:03 -07:00
Vitaly Buka b51d1f1fbd [msan] Don't depend on arguments evaluation order 2022-09-10 15:28:32 -07:00
Vitaly Buka 71c5e7b26a [msan] Do not depend on arguments evaluation order
Clang and GCC do this differently making IR inconsistent.
https://lab.llvm.org/buildbot#builders/6/builds/13120
2022-09-10 13:50:32 -07:00
Vitaly Buka 1819d5999c [NFC][msan] Remove unused return type 2022-09-10 12:20:54 -07:00
Vitaly Buka 6fc31712f1 [msan] Relax handling of llvm.masked.expandload and llvm.masked.gather
This is a workaround for new false positives. A real implementation will
follow.
2022-09-10 12:19:16 -07:00
Manuel Brito b51c6130ef Use PoisonValue instead of UndefValue when RAUWing unreachable code [NFC]
Replacing the following instances of UndefValue with PoisonValue, where the UndefValue is used as an arbitrary value:

- llvm/lib/CodeGen/WinEHPrepare.cpp
`demotePHIsOnFunclets`: RAUW arbitrary value for lingering uses of removed PHI nodes

 - llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
`FoldSingleEntryPHINodes`: Removes a self-referential single entry phi node.

 - llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
`finalize`: Remove all references to removed functions.

- llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
`cleanup`: if the result is not used, the inserted instructions are removed.

 - llvm/tools/bugpoint/CrashDebugger.cpp
`TestInts`:  the program is cloned and instructions are removed to narrow down source of crash.

Differential Revision: https://reviews.llvm.org/D133640
2022-09-10 14:28:01 +01:00
Florian Hahn da734473fa
[LV] Remove now dead variable after 2a78890b7b (NFC). 2022-09-09 20:25:55 +01:00
Florian Hahn 2a78890b7b
[VPlan] Move SCEV expansion for pointer induction to VPExpandSCEV (NFC).
Use VPExpandSCEVRecipe to expand the step of pointer inductions. This
cleanup addresses a corresponding FIXME.

It should be NFC, as steps for pointer induction must be constants,
which makes expansion trivial.
2022-09-09 19:20:13 +01:00
Sanjay Patel 6113e6738d [InstCombine] move/adjust comments about demanded bits; NFC
The code has been moved/copied around, but the comments were not updated to match.
2022-09-09 11:48:20 -04:00
Philip Reames a33d98e20a [LV] Pull out common expression [nfc] 2022-09-09 07:31:46 -07:00
Philip Reames edb26268ce [VPlan] Only generate single instr for stores uniform across all parts.
Extend the approach taken by D133019 to store instructions.

Differential Revision: https://reviews.llvm.org/D133497
2022-09-09 07:15:12 -07:00
Nikita Popov a9f312c7f4 [AST] Use BatchAA in aliasesUnknownInst() (NFCI) 2022-09-09 15:54:48 +02:00
Sebastian Neubauer c7750c522e Add helper func to get first non-alloca position
The LLVM performance tips suggest that allocas should be placed at the
beginning of the entry block. So far, llvm doesn’t provide any helper to
find that position.

Add BasicBlock::getFirstNonPHIOrDbgOrAlloca and IRBuilder::SetInsertPointPastAllocas(Function*)
that get an insert position after the (static) allocas at the start of a
function and use it in ShadowStackGCLowering.

Differential Revision: https://reviews.llvm.org/D132554
2022-09-09 15:39:53 +02:00
Nikita Popov 4ab77d1677 [LICM] Allow promotion with non-load/store users
If there are non-load/store users of the promoted pointer, we
currently abort promotion. However, having such users isn't really
relevant to the transform. We already separately check that a)
there are no instructions that modref the promoted pointer and
b) that a pointer capture disables store promotion.

In the affected @test_captured_in_loop test case we have a readnone
capture of the promoted pointer, which means that load promotion
can be performed (while store promotion cannot).

Differential Revision: https://reviews.llvm.org/D133485
2022-09-09 13:09:59 +02:00
Djordje Todorovic df868edee5 "Recommit "[AggressiveInstCombine] Lower Table Based CTTZ""
This reverts commit 053841c562.

We faced a use-after-free after pushing D113291, since
foldSqrt() has a call to eraseFromParent(). The function
should be at the end of the main loop that folds the patterns.
This patch fixes that.
2022-09-09 10:29:39 +02:00
Vitaly Buka 1cf5c7fe8c [msan] Disambiguate warnings debug location
If multiple warnings are created on the same instruction (debug location),
it can be difficult to figure out which input value is the cause.

This patch chains origins just before the warning, using the debug
information of the last origins update.

To avoid inflating the binary unnecessarily, do this only when uncertainty is
high enough, 3 warnings by default. On average it adds 0.4% to the
.text size.

Reviewed By: kda, fmayer

Differential Revision: https://reviews.llvm.org/D133232
2022-09-08 14:17:07 -07:00
Vitaly Buka 0f2f1c2be1 [sanitizers] Invalidate GlobalsAA
GlobalsAA is considered stateless, as transformations usually do not introduce
new global accesses, and a removed global access is not a problem for GlobalsAA
users.
Sanitizers introduce new global accesses:
 - Msan and Dfsan track origins and parameters with TLS, and also need global accesses to store stack origins.
 - Sancov uses global counters. HWAsan stores tag state in TLS.
 - Asan modifies globals, but I am not sure if invalidation is required.

I see no evidence that TSan needs invalidation.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D133394
2022-09-08 14:00:43 -07:00
Sanjay Patel 444f08c832 [InstCombine] fold icmp of truncated left shift, part 2
(trunc (1 << Y) to iN) == 2**C --> Y == C
(trunc (1 << Y) to iN) != 2**C --> Y != C
https://alive2.llvm.org/ce/z/xnFPo5
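
A brute-force sketch of the equality fold, assuming a 32-bit source type truncated to i8 and a constant `2**C` that fits in the narrow type (C < 8). The helper names are hypothetical, and the Alive2 link above remains the authoritative proof.

```cpp
#include <cassert>
#include <cstdint>

// (trunc (1 << Y) to i8) == 2**C, with a 32-bit source type.
// Y must be < 32: a larger shift amount is poison in LLVM IR and UB in C++.
constexpr bool truncShlEqPow2(uint32_t Y, uint32_t C) {
  return static_cast<uint8_t>(1u << Y) == static_cast<uint8_t>(1u << C);
}

// Checks the fold "Y == C" for every valid shift amount and every C < 8.
inline bool part2FoldHolds() {
  for (uint32_t C = 0; C < 8; ++C)
    for (uint32_t Y = 0; Y < 32; ++Y)
      if (truncShlEqPow2(Y, C) != (Y == C))
        return false;
  return true;
}
```

The `!=` form follows by negating both sides.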

Follow-up to d9e1f9d759. This was a suggested
enhancement mentioned in issue #51889.
2022-09-08 12:44:02 -04:00
Philip Reames 4c4c0d2c06 [LV] Use safe-divisor lowering for fixed vectors if profitable
This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well.

Differential Revision: https://reviews.llvm.org/D132591
2022-09-08 09:15:54 -07:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.
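
For illustration only (the `Primes` table below is a made-up example, not code from the tree), this is how `std::size` behaves on a C-style array in C++17:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical table used only for illustration.
constexpr int Primes[] = {2, 3, 5, 7, 11};

// std::size deduces the extent of a C-style array at compile time (C++17).
constexpr std::size_t NumPrimes = std::size(Primes);
static_assert(NumPrimes == 5, "array extent is known at compile time");

// Sums the table, using std::size as the loop bound.
inline int sumPrimes() {
  int Sum = 0;
  for (std::size_t I = 0; I != std::size(Primes); ++I)
    Sum += Primes[I];
  return Sum;
}
```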

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Djordje Todorovic 7aec9ddcfd Revert "Recommit "[AggressiveInstCombine] Lower Table Based CTTZ""
This reverts commit f879939157.
2022-09-08 17:01:16 +02:00
Sanjay Patel d9e1f9d759 [InstCombine] Fold icmp of truncated left shift
(trunc (1 << Y) to iN) == 0 --> Y u>= N
(trunc (1 << Y) to iN) != 0 --> Y u<  N
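
The two rewrites can be checked exhaustively for one concrete width. The sketch below assumes a 32-bit source truncated to i8 (so N = 8); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// (trunc (1 << Y) to i8) == 0, with a 32-bit source type.
// Y must be < 32: a larger shift amount is poison in LLVM IR and UB in C++.
constexpr bool truncShlIsZero(uint32_t Y) {
  return static_cast<uint8_t>(1u << Y) == 0;
}

// The single set bit survives truncation only while Y is below N = 8,
// so the compare with zero is equivalent to "Y u>= 8".
inline bool zeroCompareFoldHolds() {
  for (uint32_t Y = 0; Y < 32; ++Y)
    if (truncShlIsZero(Y) != (Y >= 8u))
      return false;
  return true;
}
```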

These can be generalized in several ways as noted by the TODO
items, but this handles the pattern in the motivating bug report.

Fixes #51889

Differential Revision: https://reviews.llvm.org/D115480
2022-09-08 10:48:14 -04:00
Djordje Todorovic f879939157 Recommit "[AggressiveInstCombine] Lower Table Based CTTZ" 2022-09-08 16:36:46 +02:00
Florian Hahn 422cf99161
[VPlan] Only generate single instr for loads uniform across all parts.
VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a
scalar instruction is generated per-part.

This is a potential alternative to D132892. For now the current patch only
catches cases where the address is trivially invariant (defined outside
VPlan), while D132892 catches any address that is considered invariant
by SCEV AFAICT.

It should be possible to hoist fully invariant recipes feeding loads out
of the vector loop region as well, but in practice LICM should do that
already.

This version of the patch artificially limits this to loads to make it
easier to compare, but this restriction should be easily liftable.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133019
2022-09-08 14:27:58 +01:00
Chenbing Zheng 01cea7ac10 [InstCombine] extractvalue (any_mul_with_overflow X, 2^n), 0 -> X << n
Alive2: https://alive2.llvm.org/ce/z/JLmabt (umul)
        https://alive2.llvm.org/ce/z/J_ruXR  (smul)
        https://alive2.llvm.org/ce/z/o9SVSz (vector)
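
A small sketch of the value part of this fold for i8 (helper names are hypothetical). It only models element 0 of the intrinsic result, the wrapped product; the low bits of a product are the same for signed and unsigned multiplication, so one unsigned check covers the value. The overflow bit and any side conditions of the real patch are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Element 0 of {u,s}mul.with.overflow on i8: the product wrapped to 8 bits.
constexpr uint8_t wrappedMul(uint8_t X, uint8_t K) {
  return static_cast<uint8_t>(X * K);
}

// X << N, wrapped to 8 bits.
constexpr uint8_t shl8(uint8_t X, unsigned N) {
  return static_cast<uint8_t>(X << N);
}

// Checks wrappedMul(X, 2^N) == X << N for every i8 value and every N < 8.
inline bool mulPow2FoldHolds() {
  for (unsigned N = 0; N < 8; ++N)
    for (unsigned X = 0; X < 256; ++X)
      if (wrappedMul(static_cast<uint8_t>(X), static_cast<uint8_t>(1u << N)) !=
          shl8(static_cast<uint8_t>(X), N))
        return false;
  return true;
}
```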

Reviewed By: spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D133188
2022-09-08 11:12:55 +08:00
Sami Tolvanen 52967a5306 [InstCombine] Fix a crash in -kcfi debug block
Don't attempt to print out DebugLoc as we may not have one.
2022-09-07 22:59:12 +00:00
Marco Elver 97c2220565 [SanitizerBinaryMetadata] Introduce SanitizerBinaryMetadata instrumentation pass
Introduces the SanitizerBinaryMetadata instrumentation pass which uses
the new MD_pcsections metadata kinds to instrument certain types of
instructions and functions required for breakpoint-based sanitizers.

The first intended user of the binary metadata emitted will be a variant
of GWP-TSan [1]. GWP-TSan will require information about atomic
accesses; to unambiguously determine if an access is atomic or not, we
also require "covered" information about which code has been compiled with
SanitizerBinaryMetadata instrumentation enabled.

[1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D130887
2022-09-07 21:25:40 +02:00
Sanjay Patel 85b289377b [SCCP] convert signed div/rem to unsigned for non-negative operands, 2nd try
The original commit ( fe1f3cfc26 ) was reverted because it could
crash / assert when trying to fold a value that was replaced
by a constant. In that case, there might not be an entry for the
constant in the solver yet.

This version adds a check for that possibility along with tests to
exercise that pattern (they used to crash).

Original commit message:
This extends the transform added with D81756 to handle div/rem opcodes.
For example:
https://alive2.llvm.org/ce/z/cX6za6

This replicates part of what CVP already does, but the motivating example
from issue #57472 demonstrates a phase ordering problem - we convert
branches to select before CVP runs and miss the transform.
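
To see why the conversion is sound, here is an exhaustive i8 sketch (hypothetical helper names, not code from SCCP). It assumes both operands are already known to be non-negative, which is what the solver has to prove in the real transform.

```cpp
#include <cassert>
#include <cstdint>

// Signed and unsigned 8-bit division and remainder; B must be nonzero.
constexpr int8_t sdiv8(int8_t A, int8_t B) { return static_cast<int8_t>(A / B); }
constexpr int8_t srem8(int8_t A, int8_t B) { return static_cast<int8_t>(A % B); }
constexpr uint8_t udiv8(uint8_t A, uint8_t B) { return static_cast<uint8_t>(A / B); }
constexpr uint8_t urem8(uint8_t A, uint8_t B) { return static_cast<uint8_t>(A % B); }

// For non-negative operands the signed and unsigned results coincide.
inline bool nonNegativeFoldHolds() {
  for (int A = 0; A <= 127; ++A)
    for (int B = 1; B <= 127; ++B) {
      const auto SA = static_cast<int8_t>(A), SB = static_cast<int8_t>(B);
      const auto UA = static_cast<uint8_t>(A), UB = static_cast<uint8_t>(B);
      if (sdiv8(SA, SB) != udiv8(UA, UB) || srem8(SA, SB) != urem8(UA, UB))
        return false;
    }
  return true;
}
```

With a negative operand the two differ: `sdiv8(-8, 2)` is -4, while the same bit pattern divided as unsigned, `udiv8(0xF8, 2)`, is 124.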

Differential Revision: https://reviews.llvm.org/D133198
2022-09-07 11:56:29 -04:00
Sanjay Patel 7c57180900 [InstCombine] fold add+negate through select into sub
This transform came up as a potential DAGCombine in D133282,
so I wanted to see how it escaped in IR too.

We do general folds in InstCombiner::SimplifySelectsFeedingBinaryOp()
by checking if either arm of a select simplifies when the trailing
binop is threaded into the select.

So as long as one side simplifies, it's a good fold to combine a
negate and add into 1 subtract.

This is an example with a zero arm in the select:
https://alive2.llvm.org/ce/z/Hgu_Tj

And this models the tests with a cancelling 'not' op:
https://alive2.llvm.org/ce/z/BuzVV_

Differential Revision: https://reviews.llvm.org/D133369
2022-09-07 08:23:35 -04:00
Aaron Kogon ae05b9dc30 Sink/hoist memory instructions between loop fusion candidates
Currently, instructions in the preheader of the second of two fusion
candidates are sunk and hoisted whenever possible, to try to allow the
loops to fuse. Memory instructions are skipped, and are never sunk or
hoisted. This change adds memory instructions for sinking/hoisting
consideration.

This change uses DependenceAnalysis to check if a mem inst in the
preheader of FC1 depends on an instruction in FC0's header, across
which it will be hoisted, or FC1's header, across which it will be
sunk. We reject cases where the dependency is a data hazard.

Differential Revision: https://reviews.llvm.org/D131606
2022-09-07 07:42:00 -04:00
Nikita Popov f42d92611d [Reassociate] Avoid ConstantExpr::getFNeg() (NFCI)
Use ConstantFoldUnaryOpOperand() instead. Also make the code below
robust against non-instruction users, just in case it doesn't fold.
2022-09-07 10:48:08 +02:00
Vitaly Buka 4c18670776 [NFC][sancov] Rename ModuleSanitizerCoveragePass 2022-09-06 20:55:39 -07:00
Vitaly Buka 5e38b2a456 [NFC][msan] Rename ModuleMemorySanitizerPass 2022-09-06 20:30:35 -07:00
Ruobing Han fb45f3c948 [SimpleLoopUnswitch] Skip non-trivial unswitching of cold functions
On the current main branch, non-trivial unswitching is not applied to any cold loop. As reported in D129599, skipping these cold loops causes a regression in the SPEC benchmark.
Thus, instead of skipping cold loops, we now only skip loops in cold functions.

Reviewed By: alexgatea, aeubanks

Differential Revision: https://reviews.llvm.org/D133275
2022-09-06 19:13:31 -04:00
Vitaly Buka 93600eb50c [NFC][asan] Rename ModuleAddressSanitizerPass 2022-09-06 15:02:11 -07:00
Vitaly Buka e7bac3b9fa [msan] Convert Msan to ModulePass
The MemorySanitizerPass function pass violated requirement 4 of function
passes: do not insert globals. Msan needs to insert globals for origin
tracking and parameter tracking.

https://llvm.org/docs/WritingAnLLVMPass.html#the-functionpass-class

Reviewed By: kstoimenov, fmayer

Differential Revision: https://reviews.llvm.org/D133336
2022-09-06 15:01:04 -07:00
Vitaly Buka b4257d3bf5 [tsan] Replace mem intrinsics with calls to interceptors
After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm
intrinsics with calls to glibc functions. However, this approach is
fragile, as slight changes in the pipeline can bring llvm intrinsics back.
In particular, InstCombine can do that.

Msan/Asan already declare their own versions of these memory
functions for a similar purpose.

KCSAN, or anything that uses something other than compiler-rt, needs to
implement these callbacks.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D133268
2022-09-06 13:09:31 -07:00
Florian Hahn 27e7db54eb
Revert "[SCCP] convert signed div/rem to unsigned for non-negative operands"
This reverts commit fe1f3cfc26.

It looks like this commit breaks building llvm-test-suite.

To reproduce, run `opt -passes=ipsccp` on the IR below.

    @g = internal global i32 256, align 4

    define void @test() {
    entry:
      %0 = load i32, ptr @g, align 4
      %div = sdiv i32 %0, undef
      ret void
    }
2022-09-06 18:21:51 +01:00
Florian Hahn 2fb68c0628
[ConstraintElimination] Replace pair with named struct (NFC).
This slightly improves the readability and allows further extensions in
follow-ups.
2022-09-06 18:04:04 +01:00
Vitaly Buka c51a12d598 Revert "[tsan] Replace mem intrinsics with calls to interceptors"
Breaks
http://45.33.8.238/macm1/43944/step_4.txt
https://lab.llvm.org/buildbot/#/builders/70/builds/26926

This reverts commit 77654a65a3.
2022-09-06 09:47:33 -07:00
Sanjay Patel ae117e1c1b [InstCombine] remove dead code for add (select cond, (sub), 0); NFC
This pattern is handled more generally in SimplifySelectsFeedingBinaryOp().
Tests to confirm that added to the add.ll test file in the previous commit.
2022-09-06 12:19:50 -04:00
Doru Bercea 0b1160fdeb Fix OpenMP Opt for target without a parallel region.
Remove ctx redeclaration.

Format code.

Remove parallel check. Modify tests. Clean-up code.

Fix another test.

Move code to helper functions.

Format file.

Minor fixes.
2022-09-06 16:04:53 +00:00
Vitaly Buka 77654a65a3 [tsan] Replace mem intrinsics with calls to interceptors
After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm
intrinsics with calls to glibc functions. However, this approach is
fragile, as slight changes in the pipeline can bring llvm intrinsics back.
In particular, InstCombine can do that.

Msan/Asan already declare their own versions of these memory
functions for a similar purpose.

KCSAN, or anything that uses something other than compiler-rt, needs to
implement these callbacks.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D133268
2022-09-06 08:25:32 -07:00
Sanjay Patel fe1f3cfc26 [SCCP] convert signed div/rem to unsigned for non-negative operands
This extends the transform added with D81756 to handle div/rem opcodes.
For example:
https://alive2.llvm.org/ce/z/cX6za6

This replicates part of what CVP already does, but the motivating example
from issue #57472 demonstrates a phase ordering problem - we convert
branches to select before CVP runs and miss the transform.

Differential Revision: https://reviews.llvm.org/D133198
2022-09-06 08:58:15 -04:00
Sanjay Patel dd6eb4d67f [InstCombine] reduce code duplication; NFC 2022-09-06 08:19:30 -04:00
Arthur Eubanks 7e3aa8f01a Revert "[LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests"
This reverts commit 57fd866551.

Causes crashes, see comments in D132581.
2022-09-05 15:42:48 -07:00
Momchil Velikov 078899cd64 [SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions
SimplifyCFG does some common code hoisting, which is limited
to hoisting a sequence of identical instructions in identical
order and stops at the first non-identical instruction.

This patch allows hoisting instruction pairs over
same-length sequences of non-matching instructions. The
linear asymptotic complexity of the algorithm stays the
same, there's an extra parameter
`simplifycfg-hoist-common-skip-limit` serving to limit
compilation time and/or the size of the hoisted live ranges.

The patch improves SPECv6/525.x264_r by about 10%.

Reviewed By: nikic, dmgreen

Differential Revision: https://reviews.llvm.org/D129370
2022-09-05 15:13:46 +01:00
Tian Zhou 8fa432be4f [InstCombine] reduce test-for-overflow of shifted value
Fixes #57338.

The added code makes the following transformations:

For unsigned predicates / eq / ne:
icmp pred (x << 1), x --> icmp getSignedPredicate(pred) x, 0
icmp pred x, (x << 1) --> icmp getSignedPredicate(pred) 0, x
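
The first rewrite can be checked exhaustively on i8. This is a standalone sketch with hypothetical helper names; each unsigned predicate on `(x << 1), x` is compared against the corresponding signed predicate on `x, 0`, and eq/ne are compared against themselves.

```cpp
#include <cassert>
#include <cstdint>

// x << 1, wrapped to 8 bits.
constexpr uint8_t shl1(uint8_t X) { return static_cast<uint8_t>(X << 1); }

// Reinterprets an i8 bit pattern as a signed value without relying on
// implementation-defined narrowing conversions.
constexpr int toSigned(uint8_t X) { return X >= 128 ? int(X) - 256 : int(X); }

// Checks all six predicates for every 8-bit value of x.
inline bool shiftCompareFoldHolds() {
  for (unsigned V = 0; V < 256; ++V) {
    const auto X = static_cast<uint8_t>(V);
    const uint8_t S = shl1(X);
    const int SX = toSigned(X);
    if ((S < X) != (SX < 0)) return false;   // ult -> slt
    if ((S <= X) != (SX <= 0)) return false; // ule -> sle
    if ((S > X) != (SX > 0)) return false;   // ugt -> sgt
    if ((S >= X) != (SX >= 0)) return false; // uge -> sge
    if ((S == X) != (SX == 0)) return false; // eq stays eq
    if ((S != X) != (SX != 0)) return false; // ne stays ne
  }
  return true;
}
```

The second rewrite is the same statement with the operands of both compares swapped.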

Some examples:
https://alive2.llvm.org/ce/z/ckn4cj
https://alive2.llvm.org/ce/z/h-4bAQ

Differential Revision: https://reviews.llvm.org/D132888
2022-09-05 09:51:51 -04:00
Florian Hahn 408ebe5e3a
[VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC).
Depends on D132585.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132586
2022-09-05 10:48:29 +01:00
Nikita Popov 388b684354 [LICM] Separate check for writability and thread-safety (NFCI)
This used a single check to make sure that the object is both
writable and thread-local. Separate them out to make the
deficiencies in the current code more obvious.
2022-09-05 09:43:17 +02:00
Florian Hahn ba3d29f871
[LCSSA] Update unreachable uses with poison.
Users of LCSSA may not expect non-phi uses when checking the uses
outside a loop, which may cause crashes. This is due to the fact that we
do not update uses in unreachable blocks.

To ensure all reachable uses outside the loop are phis, update uses in
unreachable blocks to use poison in dead code.

Fixes #57508.
2022-09-04 22:26:18 +01:00
Kazu Hirata 7d8c2d17eb [llvm] Use range-based for loops (NFC)
Identified with modernize-loop-convert.
2022-09-03 23:27:25 -07:00
Fangrui Song 9fc679b87c [SanitizerCoverage] Simplify pc-table and improve test. NFC 2022-09-03 14:29:21 -07:00
Kazu Hirata 9eca5ed790 [llvm] Use std::enable_if_t (NFC) 2022-09-03 11:17:44 -07:00
Kazu Hirata fedc59734a [llvm] Use range-based for loops (NFC) 2022-09-03 11:17:40 -07:00
Sanjay Patel 22e1f66f26 [SCCP] add helper function for replacing signed operations; NFC
Preliminary refactoring for planned enhancement in D133198.
2022-09-03 10:30:10 -04:00
Sanjay Patel 5c759edc57 [InstCombine] reduce another or-xor bitwise logic pattern
~(A & ?) | (A ^ B) --> ~((A & ?) & B)
https://alive2.llvm.org/ce/z/mxex6V
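
A quick exhaustive check of this identity, written as a standalone sketch (the `?` operand is called `Q` here, and the function names are made up). Where `A & Q` has a 1 bit, `A` is 1 and `A ^ B` equals `~B`. Elsewhere both sides are 1.

```cpp
#include <cassert>
#include <cstdint>

// Before the fold: ~(A & Q) | (A ^ B).
constexpr uint8_t orXorBefore(uint8_t A, uint8_t B, uint8_t Q) {
  return static_cast<uint8_t>(~(A & Q) | (A ^ B));
}

// After the fold: ~((A & Q) & B).
constexpr uint8_t orXorAfter(uint8_t A, uint8_t B, uint8_t Q) {
  return static_cast<uint8_t>(~((A & Q) & B));
}

// Bitwise identities hold per bit, so 4-bit operands already cover every
// combination of (A, B, Q) bits, including the all-zero upper bits.
inline bool orXorFoldHolds() {
  for (unsigned A = 0; A < 16; ++A)
    for (unsigned B = 0; B < 16; ++B)
      for (unsigned Q = 0; Q < 16; ++Q)
        if (orXorBefore(A, B, Q) != orXorAfter(A, B, Q))
          return false;
  return true;
}
```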

This is similar to 9d218b61cc where we peeked through
another logic op to find a common operand.
2022-09-03 09:32:08 -04:00
Richard Smith 053841c562 Revert "[AggressiveInstCombine] Lower Table Based CTTZ"
This reverts commit fec01ee3f5.

According to asan, this patch introduces a heap use after free.
2022-09-02 16:19:09 -07:00
Francis Visoiu Mistrih c5b10f348e [Matrix] Use print instead of dump for matrix-print-after-transpose-opt
We should be able to use this option even if LLVM_ENABLE_DUMP is not on.

(should fix the bots too)
2022-09-02 16:12:21 -07:00
Francis Visoiu Mistrih 81bdb4068d [Matrix] Simplify matmuls with scalars
If one of the operands is a transposed splat, the transpose can be
removed.

This is useful to simplify when transposes are distributed to operands
of a matmul:

* k^T -> k
* (A * k)^t -> A^t * k
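
The second rule is ordinary linear algebra and can be illustrated with plain arrays. In this sketch the splat `k` is modeled as a scalar, and `Mat2x3`, `Mat3x2`, `transpose` and `scale` are hypothetical helpers unrelated to the matrix intrinsics.

```cpp
#include <array>
#include <cassert>

using Mat2x3 = std::array<std::array<int, 3>, 2>;
using Mat3x2 = std::array<std::array<int, 2>, 3>;

// Returns A^T.
inline Mat3x2 transpose(const Mat2x3 &A) {
  Mat3x2 T{};
  for (int R = 0; R < 2; ++R)
    for (int C = 0; C < 3; ++C)
      T[C][R] = A[R][C];
  return T;
}

// Multiplies every element by the scalar K (the splat operand).
inline Mat2x3 scale(Mat2x3 A, int K) {
  for (auto &Row : A)
    for (int &E : Row)
      E *= K;
  return A;
}

inline Mat3x2 scale(Mat3x2 A, int K) {
  for (auto &Row : A)
    for (int &E : Row)
      E *= K;
  return A;
}
```

Transposing after scaling gives the same matrix as scaling after transposing, so the transpose can be moved onto `A` alone.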

Differential Revision: https://reviews.llvm.org/D130177
2022-09-02 15:50:25 -07:00
Sameer Sahasrabuddhe 46b293cb3f [Attributor] Simplify offset calculation for a constant GEP
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D132931
2022-09-02 23:53:51 +05:30
Arthur Eubanks 57fd866551 [LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests
The current code is basically just emulating what the analysis manager does.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D132581
2022-09-02 10:55:53 -07:00
Djordje Todorovic fec01ee3f5 [AggressiveInstCombine] Lower Table Based CTTZ
This patch introduces recognition of table-based ctz implementation
during the AggressiveInstCombine.

This fixes [0].

[0] https://bugs.llvm.org/show_bug.cgi?id=46434

Differential Revision: https://reviews.llvm.org/D113291
2022-09-02 17:26:55 +02:00
Jolanta Jensen 958abe864a [LoopLoadElim] Add stores with matching sizes as load-store candidates
We are not building up a proper list of load-store candidates because
we are throwing away stores where the type doesn't match the load.
This patch adds stores with matching store sizes as candidates.
Author of the original patch: David Sherwood.

Differential Revision: https://reviews.llvm.org/D130233
2022-09-02 13:11:25 +01:00
Muhammad Omair Javaid 18de7c6a3b Revert "[InstCombine] Treat passing undef to noundef params as UB"
This reverts commit c911befaec.

It has broken LLDB Arm/AArch64 Linux buildbots. I don't really understand
the underlying reason. Reverting for now to make the buildbots green.

https://reviews.llvm.org/D133036
2022-09-02 16:09:50 +05:00
Mikael Holmen 51d4c7ceea [GlobalOpt] Fix debug variance problem in hasOnlyColdCalls
hasOnlyColdCalls skipped over calls to intrinsics, but it did so after
checking the linkage of the called function. This meant that the presence
of a call to a debug intrinsic could affect the outcome of the
optimization.

In my original reproducer (for an out of tree target) it was particularly
interesting, because the actual IR after GlobalOpt was not different with
debug intrinsics present, so -print-after-all printouts didn't show
anything there.

However, without debuginfo, GlobalOpt went further and ran
BlockFrequencyAnalysis and (more importantly) LoopAnalysis, and later on in
the pipeline, instcombine behaved in different ways when LoopInfo was
present.

So a call to a dbg.declare prevented running LoopAnalysis in
GlobalOpt, which later prevented InstCombine from doing an optimization.

The dbg-intrinsic-loopanalysis.ll testcase tries to expose this.

Then I also noted that adding a dbg.declare actually made the existing
testcase colccc_coldsites.ll generate different code, so I modified that
to now test it behaves the same way with and without the dbg.declare.

Reviewed By: nikic, fhahn

Differential Revision: https://reviews.llvm.org/D133193
2022-09-02 12:29:44 +02:00
Sergey Kachkov be37caca00 [JumpThreading] Process range comparisons with non-local cmp instructions
Use getPredicateOnEdge method if value is a non-local
compare-with-a-constant instruction, that can give more precise
results than getConstantOnEdge.

Differential Revision: https://reviews.llvm.org/D131956
2022-09-02 12:22:45 +02:00
Nikita Popov c453e5b901 Revert "[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI"
This reverts commit cd8f3e7581.

As pointed out by Eli on the review, this is missing an alignment
check. The value might be written at an offset.
2022-09-02 09:28:48 +02:00
Nikita Popov 639d912282 [LICM] Allow load-only scalar promotion in the presence of unwinding
Currently, we bail out of scalar promotion if the loop may unwind
and the memory may be visible on unwind. This is because we can't
insert stores of the promoted value on unwind edges.

However, nowadays scalar promotion also has support for only
promoting loads, while leaving stores in place. This kind of
promotion is safe even in the presence of unwinding.

Differential Revision: https://reviews.llvm.org/D133111
2022-09-02 09:27:13 +02:00
luxufan cd8f3e7581 [DSE] Eliminate noop store even through has clobbering between LoadI and StoreI
For a noop store of the form LoadI followed by StoreI,
the invariant that should hold is that the memory state of the related
MemoryLoc before LoadI is the same as before StoreI.
For this example:
```
define void @pr49927(i32* %q, i32* %p) {
  %v = load i32, i32* %p, align 4
  store i32 %v, i32* %q, align 4
  store i32 %v, i32* %p, align 4
  ret void
}
```
Here the definition of the store's destination is different from the
definition of the load's location, so it seems that the
invariant mentioned above is broken. But the definition of the
store's destination writes the value loaded by LoadI, so the
invariant actually still holds and we can safely ignore it.

Differential Revision: https://reviews.llvm.org/D132657
2022-09-02 06:37:41 +00:00
Vitaly Buka ad3a77df2d [msan] Fix debug info with getNextNode
When we want to add instrumentation after
an instruction, instrumentation still should
keep debug info of the instruction.

Reviewed By: kda, kstoimenov

Differential Revision: https://reviews.llvm.org/D133091
2022-09-01 20:13:56 -07:00
Chenbing Zheng d30cf77cb1 [InstCombine] complete fold extractvalue (any_mul_with_overflow X, -1)
When we do extractvalue (any_mul_with_overflow X, -1) --> (-X and icmp),
the fold partly failed to match a vector constant with a poison element.
This patch tries to fix it.

Alive2: https://alive2.llvm.org/ce/z/2rGp_3

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D132996
2022-09-02 10:58:42 +08:00
Vitaly Buka ad2b356f85 [msan] Use no-origin functions when possible
Saves 1.8% of .text size on CTMark

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133077
2022-09-01 19:18:38 -07:00