Commit Graph

23151 Commits

Author SHA1 Message Date
Florian Hahn f02ff5348f
[LV] Move new epilog-vectorization-widen-inductions.ll to AArch64 dir.
The test requires the AArch64 backend, so move it to the right subdir.
2022-09-19 17:13:06 +01:00
Florian Hahn 6087b6386e
[LV] Add tests for epilogue vectorization with widened inductions.
Includes a test for the miscompile in #57712.
2022-09-19 17:10:41 +01:00
Mingming Liu 34db7c64df [NFC] Use opaqueptr in llvm/test/Transforms/SimplifyCFG/preserve-llvm-loop-metadata.ll
Use opaqueptr for test case
llvm/test/Transforms/SimplifyCFG/preserve-llvm-loop-metadata.ll.

- Adjust variable number accordingly since bitcast between different pointer
  types are not necessary.

Differential Revision: https://reviews.llvm.org/D134159
2022-09-19 09:01:11 -07:00
Simon Pilgrim 2538adde5c [CostModel][X86] Add CostKinds handling for cttz
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 15:57:03 +01:00
Simon Pilgrim d90a42d64c [CostModel][X86] Add CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF cost handling
Without LZCNT/BMI, the *_ZERO_UNDEF costs are cheaper as they can avoid the zero handling.
2022-09-19 14:06:33 +01:00
Nikita Popov dd61726d5b Revert "[SimplifyCFG] accumulate bonus insts cost"
This reverts commit e5581df60a.

This causes major compile-time regressions, about 2-3% end-to-end
on CTMark.
2022-09-19 14:46:43 +02:00
Simon Pilgrim 5665d0941a [SLP][X86] Add AVX512 test coverage to CTLZ/CTTZ tests
Only AVX512 has decent CTTZ/CTLZ vector ops, add tests to ensure we definitely vectorize these
2022-09-19 13:07:55 +01:00
Max Kazantsev 92e9bddc49 [LoopRotate] Drop loop dispositions when rotating loops. PR56260
This is required because if there is a pure loop-invariant instruction, Loop Rotation
may decide to not clone it and just hoist it instead. If SCEV has previously cached
that it was loop-variant (not being smart enough to prove invariance), we may end
up with inconsistent cache state (which may later trigger false-negative assertion
failures checking that something was invariant).

This is a conservative fix that unconditionally drops the dispositions. We could
only drop it if the hoisting has actually happened, but it should take some time
understanding whether it's safe with all other things this function does.

Differential Revision: https://reviews.llvm.org/D134167
Reviewed By: fhahn
2022-09-19 18:01:02 +07:00
Max Kazantsev 21a9abc1ce [LoopFuse] Drop loop dispositions before reassigning blocks to other loop
This bug was found by recent improvement in SCEV verifier. The code in LoopFuse
directly reassigns blocks to be a part of a different loop, which should automatically
invalidate all related cached loop dispositions.

Differential Revision: https://reviews.llvm.org/D134173
Reviewed By: nikic
2022-09-19 17:43:06 +07:00
Max Kazantsev bb68b2402d [SCEV] Verify contents of loop disposition cache
It seems that it is sometimes broken. Initial motivation for this was
investigation of https://github.com/llvm/llvm-project/issues/56260, but
it also seems that we have found an unrelated bug in LoopFusion that leaves
broken caches.

Differential Revision: https://reviews.llvm.org/D134158
Reviewed By: nikic
2022-09-19 17:43:00 +07:00
Simon Pilgrim 393cc6a354 [LoopVectorize] Regenerate runtime-check.ll 2022-09-19 10:25:48 +01:00
Simon Pilgrim 7e626d7a89 [LoopVectorize][X86] Use quotes around the pass list to appease DOS cmd evaluation
DOS can't handle -passes='default<O3>' correctly
2022-09-19 10:24:37 +01:00
Mingming Liu 7392b45162 [NFC][SimplifyCFG]Precommit test case to show inner-loop metadata may not be preserved
- There is an outer while-loop and an inner for-loop in the test case.
  Inner-loop has `llvm.loop.unroll.enable` metadata that is not
  preserved. This happens around [1], when the loop metadata of outer loop
  overrides the inner loop metadata directly, without looking at whether inner-loop
  itself has loop metadata.

 [1] ab755e6562/llvm/lib/Transforms/Utils/Local.cpp (L1146)

Differential Revision: https://reviews.llvm.org/D134014
2022-09-18 22:48:09 -07:00
Yaxun (Sam) Liu e5581df60a [SimplifyCFG] accumulate bonus insts cost
SimplifyCFG folds

bool foo() {
  if (cond1) return false;
  if (cond2) return false;
  return true;
}

as

bool foo() {
  if (cond1 | cond2) return false
  return true;
}

'cond2' is called 'bonus insts' in branch folding since they introduce overhead
since the original CFG could do early exit but the folded CFG always executes
them. SimplifyCFG calculates the costs of 'bonus insts' of a folding a BB into
its predecessor BB which shares the destination. If it is below bonus-inst-threshold,
SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed.

When SimplifyCFG calculates the cost of 'bonus insts', it only consider 'bonus' insts
in the current BB to be considered for folding. This causes issue for unrolled loops
which share destinations, e.g.

bool foo(int *a) {
  for (int i = 0; i < 32; i++)
    if (a[i] > 0) return false;
  return true;
}

After unrolling, it becomes

bool foo(int *a) {
  if(a[0]>0) return false
  if(a[1]>0) return false;
  //...
  if(a[31]>0) return false;
  return true;
}

SimplifyCFG will merge each BB with its predecessor BB,
and ends up with 32 'bonus insts' which are always executed, which
is much slower than the original CFG.

The root cause is that SimplifyCFG does not consider the
accumulated cost of 'bonus insts' which are folded from
different BB's.

This patch fixes that by introducing a ValueMap to track
costs of 'bonus insts' coming from different BB's into
the same BB, and cuts off if the accumulated cost
exceeds a threshold.

Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D132408
2022-09-18 20:21:14 -04:00
Sanjay Patel d6498abc24 [InstCombine] remove multi-use add demanded constant fold
This was originally part of D133788. There are no visible
regressions. All of the diffs show a large unsigned constant
becoming a small negative constant. This should be better
for analysis (and slightly less compile-time) and codegen.
2022-09-18 14:23:43 -04:00
Marc Auberer f52dd920d4 [InstCombine] Fix bug when folding x + (x | -x) to x & (x - 1)
Addresses concern: https://reviews.llvm.org/rG09cdddea0c4d284c2c22f5dfade40a60850c5ea7

There was a copy/paste mistake in the code. Updated code and test ref.

Differential Revision: https://reviews.llvm.org/D134135
2022-09-18 13:16:12 -04:00
Sanjay Patel 1d1d1e6f22 [InstCombine] fold full-shift of sdiv to icmp+extend
This is a disguised sign-bit test with offset:
(X / +DivC) >> (Width - 1) --> ext (X <= -DivC)
(X / -DivC) >> (Width - 1) --> ext (X >= +DivC)

https://alive2.llvm.org/ce/z/cO8JO4

We don't match/test poison in the sdiv constant because
that would be immediate undefined behavior.
2022-09-18 13:13:14 -04:00
Sanjay Patel 0ae6bc0771 [InstCombine] add tests for full-right-shift of sdiv; NFC 2022-09-18 13:13:14 -04:00
Yaxun (Sam) Liu 97b5736975 [SimplifyCFG] add a test for branch folding multiple BB
Reviewed by: Florian Hahn

Differential Revision: https://reviews.llvm.org/D132910
2022-09-17 21:17:55 -04:00
Florian Hahn 7914e53e31
[ConstraintElimination] Fix crash when combining results.
f213128b29 didn't account for the possibility that the result of
decompose may be empty. Fix that by explicitly checking. Use a newly
introduced helper to also reduce some duplication.

Thanks @bjope for finding the issue!
2022-09-17 14:47:38 +01:00
Alexey Bataev 5d13b12674 [SLP]Improve isUndefVector function by adding insertelement analysis.
Added the mask and the analysis of the buildvector sequence in the
isUndefVector function, improves codegen and cost estimation.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
                                                                          results                   results0 diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27362.00                  27360.00 -0.0%

Metric: size..text

Program                                                                                                           size..text
                                                                   results     results0    diff
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   805299.00   806035.00  0.1%

526.blender_r - some extra code is vectorized.
508.namd_r - some extra code is optimized out.

Differential Revision: https://reviews.llvm.org/D133891
2022-09-16 14:36:38 -07:00
Simon Pilgrim 95c2c9c5c5 [LoopIdiom][X86] Add non-LZCNT test coverage to 'rshift until zero' idiom tests 2022-09-16 17:23:54 +01:00
Simon Pilgrim 23cb1c42cd [CostModel][X86] Update throughput costs for CTLZ ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (and recent fixes to the bdver2 + alderlake models)

Adding full CostKinds costs are affecting some other tests as they make assumptions about SizeLatency costs, so they need addressing first
2022-09-16 16:56:49 +01:00
Dmitry Makogon 7acc1f1fb1 [Test] Add tests showing instcombine sinking opportunity (NFC)
InstCombine could sink instruction to NCD of its users.
2022-09-16 15:06:00 +07:00
Jeffrey Byrnes f90cf68003 [NFC] Fix tests in commit 20cf170e68 2022-09-15 15:37:58 -07:00
Vitaly Buka ed188b39ab [test] Regenerate few tests 2022-09-15 12:36:32 -07:00
Florian Hahn b2c195da6d
[CGP] Update failing test missed in 81a11da762. 2022-09-15 19:35:25 +01:00
Florian Hahn 81a11da762
[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.
This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle  creating a v64i8 vector, selecting groups of 3
zero elements and an element from the input.

This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.

This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no way other convenient way.

This improves the generated code for loops like the one below in
combination with D96522.

    int foo(uint8_t *p, int N) {
      unsigned long long sum = 0;
      for (int i = 0; i < N ; i++, p++) {
	unsigned int v = *p;
	sum += (v < 127) ? v : 256 - v;
      }
      return sum;
    }

https://clang.godbolt.org/z/Wco866MjY

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D120571
2022-09-15 19:18:13 +01:00
Sanjay Patel aafaa2f4fc [SCCP] convert ashr to lshr for non-negative shift value
This is similar to the existing signed instruction folds.
We get the obvious minimal patterns in other passes, but
this avoids potential missed folds when the multi-block
tests are converted to selects.
2022-09-15 13:54:52 -04:00
Sanjay Patel 204a2fff1f [SCCP] add tests for ashr range transforms; NFC 2022-09-15 13:54:52 -04:00
Sanjay Patel 02a27b3890 [InstCombine] fold X*X == 0 --> X == 0
This is safe when the mul does not overflow:
https://alive2.llvm.org/ce/z/LedVVP

This could be extended to handle non-zero compare constants
and non-squared multiplies.
2022-09-15 12:02:50 -04:00
Sanjay Patel 5fd883377c [InstCombine] add tests for X*X == 0; NFC 2022-09-15 12:02:50 -04:00
Simon Pilgrim 94620e4fc3 [CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts
These are the worst case generic vector shift costs, where nothing is known about the shift amounts - in particular this should stop us using the default sizelatency cost of 1 for so many pre-AVX2 vector shifts that can often actually expand during lowering to +20 uops, just for 128-bit vectors, resulting in some horrible inline/unroll decisions.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
2022-09-15 16:51:58 +01:00
Simon Pilgrim 0ec028fe10 [CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops
Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
2022-09-15 14:05:30 +01:00
Craig Topper 50a699e362 [IR][VP] Remove IntrArgMemOnly from vp.gather/scatter.
IntrArgMemOnly is only valid for intrinsics that use a scalar
pointer argument. These intrinsics use a vector of pointer.

Alias analysis will try to find a scalar pointer argument and
will return incorrect alias results when it doesn't find one.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133898
2022-09-14 15:00:07 -07:00
Craig Topper 6384044df4 [GVN][VP] Add test case for incorrect removal of a vp.gather. NFC
Pre-commit for D133898

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133899
2022-09-14 15:00:07 -07:00
Florian Hahn 7f3ff9d3c0
[ConstraintElimination] Track if variables are positive in constraint.
Keep track if variables are known positive during constraint
decomposition, aggregate the information when building the constraint
object and encode the extra information as constraints to be used during
reasoning.
2022-09-14 18:43:54 +01:00
Zain Jaffal 8253f7e286
[InstCombine] Optimize multiplication where both operands are negated
Handle the case where both operands are negated in matrix multiplication

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133695
2022-09-14 16:29:39 +01:00
Sanjay Patel 73919a87e9 [InstCombine] try multi-use demanded bits folds for 'add'
This patch enables a multi-use demanded bits fold (motivated by issue #57576):
https://alive2.llvm.org/ce/z/DsZakh

This mimics transforms that we already do on the single-use path.

Originally, this patch did not include the last part to form a constant, but
that can be removed independently to reduce risk. It's not clear what the
effect of either change will be when viewed end-to-end.

This is expected to be neutral or a slight win for compile-time.
See the "add-demand2" series for experimental timing results:
https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright

Differential Revision: https://reviews.llvm.org/D133788
2022-09-14 09:30:59 -04:00
Florian Hahn f213128b29
[ConstraintElimination] Further de-compose operands of add operations.
This simply extends the existing logic to look through adds and combine
the components as done in other places already.
2022-09-14 12:00:32 +01:00
Florian Hahn aba2085e52
[ConstraintElimination] Add tests where info from zext can be used. 2022-09-14 10:04:07 +01:00
Florian Hahn 62c1928437
[ConstraintElimination] Add tests for chained adds.
Add test coverage for reasoning about chains of adds.
2022-09-14 09:27:18 +01:00
jacquesguan ecf327f154 [RISCV] Add cost model for vector insert/extract element.
This patch adds cost model for vector insert/extract element instructions. In RVV, we could use vector scalar move instruction to insert or extract the first element, and use vslide to move it. But for mask vector or i64 vector in i32 target, we need special instructions to make it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133007
2022-09-14 11:10:18 +08:00
Alex Bradbury c44c1e9d3e [RISCV] Implement isMaskAndCmp0FoldingBeneficial hook
This hook is currently only used by CodeGenPrepare, which will sink *and
duplicate* an 'and' into a block that has an 'icmp 0' user of it if the
hook returns true.

This hook is less useful for RISC-V than for targets like AArch64 that
have a TBZ (test bit and branch if zero instruction), but may still be
profitable if Zbs is available and a BEXTI can be selected.

Conservatively, we return false even if Zbs is enabled for any masks
that fit in the ANDI immediate because it's possible the only use is a
branch on the result, and ANDI+BNEZ => BEXTI+BNEZ isn't a profitable
transformation.

Differential Revision: https://reviews.llvm.org/D131492
2022-09-13 18:54:00 +01:00
Arthur Eubanks 5a33d1f0b9 [SimplifyCFG] Don't hoist allocas
D129370 started hoisting allocas across stacksave/stackrestore
boundaries which is wrong.

Reviewed By: chill, rnk

Differential Revision: https://reviews.llvm.org/D133730
2022-09-13 09:23:39 -07:00
Valery N Dmitriev 18dde772d6 [SLP] Unify main/alternate selection for CmpInst instructions
Make main/alternate operation selection logic for CmpInst
consistent across SLP vectorizer.

Differential Revision: https://reviews.llvm.org/D133430
2022-09-13 09:20:25 -07:00
Simon Pilgrim 8ae9cf550b [LoopVectorize][X86] Add uniform shift costs checks for VF=1/2/4 2022-09-13 13:46:52 +01:00
Zain Jaffal 67a7854719
[InstCombine] Test for matrix multiplication negation optimisation.
If one of the operands is negated in a multiplication we can optimise the operation by moving the negation to the smallest operand or to the result

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133287
2022-09-13 11:35:18 +01:00
Matthias Gehre 6bf1b4e8e0 Move ExpandLargeDivRem to llvm/test/CodeGen/X86 because they need a triple 2022-09-13 08:29:54 +01:00
Max Kazantsev 86d5586d78 [SCEVExpander] Recompute poison-generating flags on hoisting. PR57187
Instruction being hoisted could have nuw/nsw flags inferred from the old
context, and we cannot simply move it to the new location keeping them
because we are going to introduce new uses to them that didn't exist before.

Example in https://github.com/llvm/llvm-project/issues/57187 shows how
this can produce branch by poison from initially well-defined program.

This patch forcefully recomputes poison-generating flag in the new context.

Differential Revision: https://reviews.llvm.org/D132022
Reviewed By: fhahn, nikic
2022-09-13 12:56:35 +07:00
Arthur Eubanks 724664c56b [test][SimplifyCFG] Precommit test with hoisting inallocas 2022-09-12 15:07:52 -07:00
Sanjay Patel 53eede597e [InstCombine] look through 'not' of ctlz/cttz op with 0-is-undef
https://alive2.llvm.org/ce/z/MNsC1S

This pattern was flagged at:
https://discourse.llvm.org/t/instcombines-select-optimizations-dont-trigger-reliably/64927
2022-09-12 15:06:21 -04:00
Sanjay Patel 5ddfac40c3 [InstCombine] add tests for select of ctlz/cttz with 'not' value; NFC 2022-09-12 15:06:20 -04:00
A-Wadhwani de3445e0ef [SROA] Create additional vector type candidates based on store and load slices
This patch adds additional vector types to be considered when doing
promotion in SROA, based on the types of the store and load slices. This
provides more promotion opportunities, by potentially using an optimal
"intermediate" vector type.

For example, the following code would currently not be promoted to a
vector, since `__m128i` is a `<2 x i64>` vector.

```

__m128i packfoo0(int a, int b, int c, int d) {
  int r[4] = {a, b, c, d};
  __m128i rm;
  std::memcpy(&rm, r, sizeof(rm));
  return rm;
}
```

```
packfoo0(int, int, int, int):
        mov     dword ptr [rsp - 24], edi
        mov     dword ptr [rsp - 20], esi
        mov     dword ptr [rsp - 16], edx
        mov     dword ptr [rsp - 12], ecx
        movaps  xmm0, xmmword ptr [rsp - 24]
        ret
```

By also considering the types of the elements, we could find that the
`<4 x i32>` type would be valid for promotion, hence removing the memory
accesses for this function. In other words, we can explore other new
vector types, with the same size but different element types based on
the load and store instructions from the Slices, which can provide us
more promotion opportunities.

Additionally, the step for removing duplicate elements from the
`CandidateTys` vector was not using an equality comparator, which has
been fixed.

Differential Revision: https://reviews.llvm.org/D132096
2022-09-12 09:55:37 -07:00
Sanjay Patel 4ca25c66d4 [Reassociate] prevent partial undef negation replacement
As shown in the examples in issue #57683, we allow matching
vectors with poison (undef) in this transform (and possibly more),
but we can't then use the partially defined value as a replacement
value in other expressions blindly.

This seems to be avoided in simpler examples of reassociation,
and other passes should be able to clean up the redundant op
seen in these tests.
2022-09-12 12:28:34 -04:00
Sanjay Patel eb2ac0a3c9 [Reassociate] add tests for vector negate with undef elements; NFC
Reduced/expanded from issue #57683.
2022-09-12 12:28:34 -04:00
Matthias Gehre c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Florian Hahn 3fd1cc2574
[SLP] Add Preheader to CSE blocks after hoisting CSE-able instrs.
Adding the pre-header to CSEBlocks ensures instructions are CSE'd even
after hoisting.

This was original discovered by @atrick a while ago.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D133649
2022-09-12 15:53:31 +01:00
Simon Pilgrim e80b9a8f37 [SimplifyCFG][X86] Regenerate speculate-cttz-ctlz.ll
There's no difference between generic/bmi/lzcnt targets atm
2022-09-12 15:16:44 +01:00
Alexey Bataev dfe1e9dd79 [SLP]Improve reordering of clustered reused scalars.
If the reused scalars are clustered, i.e. each part of the reused mask
contains all elements of the original scalars exactly once, we can
reorder those clusters to improve the whole ordering of of the clustered
vectors.

Differential Revision: https://reviews.llvm.org/D133524
2022-09-12 06:52:25 -07:00
zhongyunde 8a15695be2 [AA] Improve the BasicAA analysis capability
According https://discourse.llvm.org/t/memoryssa-does-the-accessedbetween-support-scalable-vector-pointer/65052,
scalable vector support in BasicAA is currently essentially limited,
and should be improved effectively for a constant offset GEP if the scalable index is zero, eg:
  getelementptr <vscale x 4 x i32>, ptr %p, i64 0, i64 %i

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D133567
2022-09-12 19:41:17 +08:00
Max Kazantsev 0e465c0c2f [IRCE] Bail in case of pointer types. PR40539
We should not unconditionally expect that SCEVable types are all integers
because SCEV can also be computed for pointers. Bail in this case.
2022-09-12 16:01:25 +07:00
Djordje Todorovic b080d0bae8 Revert ""Recommit "[AggressiveInstCombine] Lower Table Based CTTZ"""
This reverts commit df868edee5, as it
introduces a bug found by Alive2 (more on the rGdf868edee561).
2022-09-12 08:23:07 +02:00
Johannes Doerfert c922cac868 Revert "[Attributor] AAPointerInfo should allow "harmless" uses"
Revert "[Attributor] Teach AAPointerInfo to look into aggregates"

This reverts commit 844f6c5d03 and
4ed0a88cd8 as they broke the buildbots
that run openmp/libomptarget/test/offloading/bug49021.cpp.
2022-09-11 21:37:54 -07:00
Johannes Doerfert 844f6c5d03 [Attributor] AAPointerInfo should allow "harmless" uses
If a call base use will not capture a pointer we can approximate the
effects. This is important especially for readnone/only uses.
2022-09-11 20:16:11 -07:00
Johannes Doerfert 4ed0a88cd8 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-09-11 20:16:11 -07:00
Johannes Doerfert b046ebdc01 [Attributor][FIX] Conservatively handle ptr2int, don't crash
If a pointer-2-int cast is found we give up on AAPointerInfo for now.
This caused a crash before.

Reported by John Tramm (@jtramm).
2022-09-11 20:16:11 -07:00
Johannes Doerfert 21711039e3 [OpenMP] Allow the Attributor to look at functions we also internalized
This is important as we have accesses to globals in those which we need to
categorize.
2022-09-11 20:16:11 -07:00
Marc Auberer 09cdddea0c [InstCombine] Fold x + (x | -x) to x & (x - 1)
Fixes #57531

This transformation may be particularly useful on x86-64,
because x & (x - 1) can be performed by a single blsr instruction.

Differential Revision: https://reviews.llvm.org/D133362
2022-09-11 06:14:24 -04:00
Sanjay Patel a91effa0b8 [InstCombine] add tests for mul-by-neg-pow2; NFC 2022-09-11 06:14:24 -04:00
Sanjay Patel a04fada354 [InstCombine] add tests for demanded bits of add with multi-use; NFC 2022-09-11 06:14:24 -04:00
Alexey Bader 2bb5535b58 [StripDeadDebugInfo] Drop dead CUs
In situations when a submodule is extracted from big module (i.e. using
CloneModule) a lot of debug info is copied via metadata nodes. Despite of
the fact that part of that info is not linked to any instruction in extracted
IR file, StripDeadDebugInfo pass doesn't drop them.
Strengthen criteria for debug info that should be kept in a module:
- Only those compile units are left that referenced by a subprogram debug info
node that is attached to a function definition in the module or to an instruction
in the module that belongs to an inlined function.

Signed-off-by: Mikhail Lychkov <mikhail.lychkov@intel.com>

Differential Revision: https://reviews.llvm.org/D122163
2022-09-11 01:31:03 -07:00
Florian Hahn 3ff7892005
[SLP] Add test case showing missing CSE in hoisted instructions. 2022-09-10 20:55:04 +01:00
Simon Pilgrim 10edf88458 [CostModel][X86] Update CTPOP costs
With the bdver2 model updates, many of the AVX1 costs were far too high - it also helped expose some costs mismatches for Atom/Silvermont
2022-09-10 17:57:20 +01:00
Augie Fackler 4fea8ee540 OpenMP: mark allocptr attribute on __kmpc_free_shared
Differential Revision: https://reviews.llvm.org/D124491
2022-09-09 14:09:18 -04:00
Jamie Schmeiser 5e3ac79690 Loop names used in reporting can grow very large
Summary:
The code for generating a name for loops for various reporting scenarios
created a name by serializing the loop into a string.  This may result in
a very large name for a loop containing many blocks.  Use the getName()
function on the loop instead.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: Whitney (Whitney Tsang), aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D133587
2022-09-09 13:45:14 -04:00
Mingming Liu 8aa800614b [AArch64][CostModel] Detects that {extract,insert}-element at lane 0 has the same cost as the other lane for vector instructions in the IR.
Currently, {extract,insert}-element has zero cost at lane 0 [1]. However, there is a cost (by fmov instruction [2], or ext/ins instruction) to move values from SIMD registers to GPR registers, when the element is used explicitly as integers.

See https://godbolt.org/z/faPE1nTn8, when fmov is generated for d* register -> x* register conversion.

Implementation-wise, add a private method `AArch64TTIImpl::getVectorInstrCostHelper` as a helper function. This way, instruction-based method could share the core logic (e.g.,
returning zero cost if type is legalized to scalar).

[1] 2cf320d41e/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (L1853)
[2] 2cf320d41e/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L8150-L8157)

Differential Revision: https://reviews.llvm.org/D128302
2022-09-09 09:47:30 -07:00
Philip Reames 4e295cb1ce [LV] Autogen a test for ease of update 2022-09-09 08:16:22 -07:00
Philip Reames edb26268ce [VPlan] Only generate single instr for stores uniform across all parts.
Extend the approach taken by D133019 to store instructions.

Differential Revision: https://reviews.llvm.org/D133497
2022-09-09 07:15:12 -07:00
Nikita Popov 5b1df2e951 [LICM] Regenerate test checks (NFC) 2022-09-09 15:30:17 +02:00
Nikita Popov 4ab77d1677 [LICM] Allow promotion with non-load/store users
If there are non-load/store users of the promoted pointer, we
currently abort promotion. However, having such users isn't really
relevant to the transform. We already separately check that a)
there are no instructions that modref the promoted pointer and
b) that a pointer capture disables store promotion.

In the affected @test_captured_in_loop test case we have a readnone
capture of the promoted pointer, which means that load promotion
can be performed (while store promotion cannot).

Differential Revision: https://reviews.llvm.org/D133485
2022-09-09 13:09:59 +02:00
Graham Hunter 1f639d1bd2 [NFC][LV] Convert masked call tests to use update script 2022-09-09 10:07:39 +01:00
Djordje Todorovic df868edee5 "Recommit "[AggressiveInstCombine] Lower Table Based CTTZ""
This reverts commit 053841c562.

We faced a use-after-free after pushing the D113291, since the
foldSqrt() has a call to eraseFromParent(). The function
should be at the end of the main loop that folds the patterns.
This patch fixes that.
2022-09-09 10:29:39 +02:00
Craig Topper 5f3a8b585b [RISCV] Add RecurKind::FMulAdd to isLegalToVectorizeReduction for scalable vectors.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133511
2022-09-08 12:34:59 -07:00
Sanjay Patel 444f08c832 [InstCombine] fold icmp of truncated left shift, part 2
(trunc (1 << Y) to iN) == 2**C --> Y == C
(trunc (1 << Y) to iN) != 2**C --> Y != C
https://alive2.llvm.org/ce/z/xnFPo5

Follow-up to d9e1f9d759. This was a suggested
enhancement mentioned in issue #51889.
2022-09-08 12:44:02 -04:00
Philip Reames 4c4c0d2c06 [LV] Use safe-divisor lowering for fixed vectors if profitable
This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well.

Differential Revision: https://reviews.llvm.org/D132591
2022-09-08 09:15:54 -07:00
Djordje Todorovic 7aec9ddcfd Revert "Recommit "[AggressiveInstCombine] Lower Table Based CTTZ""
This reverts commit f879939157.
2022-09-08 17:01:16 +02:00
Sanjay Patel d9e1f9d759 [InstCombine] Fold icmp of truncated left shift
(trunc (1 << Y) to iN) == 0 --> Y u>= N
(trunc (1 << Y) to iN) != 0 --> Y u<  N

These can be generalized in several ways as noted by the TODO
items, but this handles the pattern in the motivating bug report.

Fixes #51889

Differential Revision: https://reviews.llvm.org/D115480
2022-09-08 10:48:14 -04:00
Djordje Todorovic f879939157 Recommit "[AggressiveInstCombine] Lower Table Based CTTZ" 2022-09-08 16:36:46 +02:00
Florian Hahn 422cf99161
[VPlan] Only generate single instr for loads uniform across all parts.
VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a
scalar instruction is generated per-part.

This is a potential alternative D132892. For now the current patch only
catches cases where the address is trivially invariant (defined outside
VPlan), while D132892 catches any address that is considered invariant
by SCEV AFAICT.

It should be possible to hoist fully invariant recipes feeding loads out
of the vector loop region as well, but in practice LICM should do that
already.

This version of the patch artificially limits this to loads to make it
easier to compare, but this restriction should be easily liftable.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133019
2022-09-08 14:27:58 +01:00
Nikita Popov 52f7eb3151 [LICM] Add test for sret with conditional store (NFC) 2022-09-08 14:53:06 +02:00
Nikita Popov 96cb7c2273 [ConstantExpr] Remove fneg expression
As part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179,
this removes the fneg constant expression (which is, incidentally,
the only unary operator expression).

Differential Revision: https://reviews.llvm.org/D133418
2022-09-08 10:24:55 +02:00
Chenbing Zheng 01cea7ac10 [InstCombine] extractvalue (any_mul_with_overflow X, 2^n), 0 -> X << n
Alive2: https://alive2.llvm.org/ce/z/JLmabt (umul)
        https://alive2.llvm.org/ce/z/J_ruXR  (smul)
        https://alive2.llvm.org/ce/z/o9SVSz (vector)

Reviewed By: spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D133188
2022-09-08 11:12:55 +08:00
Michael Berg ba755f7951 [NFC][DSE] Add a masked dead store test that should rely on additional guards for removal. 2022-09-07 19:20:13 -07:00
Marc Auberer a002fd6678 [InstCombine] Baseline tests for folding x + (x | -x) to x & (x - 1)
Baseline tests for D133362

Differential Revision: https://reviews.llvm.org/D133449
2022-09-07 19:12:23 -04:00
Sanjay Patel 85b289377b [SCCP] convert signed div/rem to unsigned for non-negative operands, 2nd try
The original commit ( fe1f3cfc26 ) was reverted because it could
crash / assert when trying to fold a value that was replaced
by a constant. In that case, there might not be an entry for the
constant in the solver yet.

This version adds a check for that possibility along with tests to
exercise that pattern (they used to crash).

Original commit message:
This extends the transform added with D81756 to handle div/rem opcodes.
For example:
https://alive2.llvm.org/ce/z/cX6za6

This replicates part of what CVP already does, but the motivating example
from issue #57472 demonstrates a phase ordering problem - we convert
branches to select before CVP runs and miss the transform.

Differential Revision: https://reviews.llvm.org/D133198
2022-09-07 11:56:29 -04:00
Sanjay Patel 7c57180900 [InstCombine] fold add+negate through select into sub
This transform came up as a potential DAGCombine in D133282,
so I wanted to see how it escaped in IR too.

We do general folds in InstCombiner::SimplifySelectsFeedingBinaryOp()
by checking if either arm of a select simplifies when the trailing
binop is threaded into the select.

So as long as one side simplifies, it's a good fold to combine a
negate and add into 1 subtract.

This is an example with a zero arm in the select:
https://alive2.llvm.org/ce/z/Hgu_Tj

And this models the tests with a cancelling 'not' op:
https://alive2.llvm.org/ce/z/BuzVV_

Differential Revision: https://reviews.llvm.org/D133369
2022-09-07 08:23:35 -04:00
Sanjay Patel e6c5d13ac7 [InstCombine] add tests for SimplifySelectsFeedingBinaryOp(); NFC 2022-09-07 08:23:35 -04:00
Aaron Kogon ae05b9dc30 Sink/hoist memory instructions between loop fusion candidates
Currently, instructions in the preheader of the second of two fusion
candidates are sunk and hoisted whenever possible, to try to allow the
loops to fuse. Memory instructions are skipped, and are never sunk or
hoisted. This change adds memory instructions for sinking/hoisting
consideration.

This change uses DependenceAnalysis to check if a mem inst in the
preheader of FC1 depends on an instruction in FC0's header, across
which it will be hoisted, or FC1's header, across which it will be
sunk. We reject cases where the dependency is a data hazard.

Differential Revision: https://reviews.llvm.org/D131606
2022-09-07 07:42:00 -04:00
Marco Elver f0d6709e4a [AtomicExpandPass] Always copy pcsections Metadata to expanded atomics
When expanding IR atomics to target-specific atomics, copy all
!pcsections Metadata to expanded atomics automatically.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130885
2022-09-07 11:36:01 +02:00
Nikita Popov 98a3a340c3 [ConstantExpr] Don't create fneg expressions
Don't create fneg expressions unless explicitly requested by IR or
bitcode.
2022-09-07 11:27:25 +02:00
Xiang1 Zhang 16743c9534 [CodeGen] Limit building time in CodeGenPrepare for huge function
Details:

Currently CodeGenPrepare is very time consuming in handling big functions.

Old Algorithm :
It iterate each BB in function, and go on handle very instructions in BB.
Due to some instruction optimizations may affect the BBs' dominate tree.
The old logic will re-iterate and try optimize for each BB.

Suppose we have a big function with 20000 BBs, If we handled the last BB
with fine tuning the dominate tree. We need totally re-iterate and try optimize
the 20000 BBs from the beginning.

The Complex is near N!

And we really encounter somes big tests (> 20000 BBs) that cost more than 30
mins in this pass. (Debug version compiler will cost 2 hours here)

What this patch do for huge function ?
It mainly changes the iteration way for optimization.

1 We do optimizeBlock for each BB (that is same with old way).
And, in the meaning time, If BB is changed/updated in the optimization, it will
be put into FreshBBs (try do optimizeBlock again).
The new created BB at previous iteration will also put into FreshBBs.

2 For the BBs which not updated at previous iteration, we directly skip it.
Strictly speaking, here may miss some opportunity, but the probability is very
small.

3 For Instructions in single BB, we do optimizeInst for each instruction.
If optimizeInst change the instruction dominator in this BB, rather than break
and go back to optimize the first BB (the old way), we directly iterate
instructions (to do optimizeInst) in this updated BB again (the new way).

What this patch do for small/normal (not huge) function ?
It is same with the Old Algorithm. (NFC)

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D129352
2022-09-07 10:05:40 +08:00
Ruobing Han fb45f3c948 [SimpleLoopUnswitch] Skip non-trivial unswitching of cold functions
In the current main branch, all cold loops will not be applied non-trivial unswitch. As reported in D129599, skipping these cold loops will incur regression in SPEC benchmark.
Thus, instead of skipping cold loops, now only skipping loops in cold functions.

Reviewed By: alexgatea, aeubanks

Differential Revision: https://reviews.llvm.org/D133275
2022-09-06 19:13:31 -04:00
Joseph Huber 58645d3252 [OpenMP] Fix `omp_get_wtime` function being marked incorrectly as readonly
OpenMP has a list of of optimistic attributes that can be attached to
known runtime functions to aid some analysis. The `omp_get_wtime`
function incorrectly used the `readonly` attribute. This is not correct
at the `omp_get_wtime` function changes values depending on some
external state. This is more correctly modeled with
`inaccessiblememonly` meaning that the value does not depend on anything
within the module, but can not be removes as it depends on external
state.

Fixes #57578

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D133360
2022-09-06 12:59:00 -05:00
Florian Hahn 27e7db54eb
Revert "[SCCP] convert signed div/rem to unsigned for non-negative operands"
This reverts commit fe1f3cfc26.

It looks like this commit breaks building llvm-test-suite.

To reproduce, run `opt -passes=ipsccp` on the IR below.

    @g = internal global i32 256, align 4

    define void @test() {
    entry:
      %0 = load i32, ptr @g, align 4
      %div = sdiv i32 %0, undef
      ret void
    }
2022-09-06 18:21:51 +01:00
Sanjay Patel e028121ed0 [InstCombine] add/move tests for add with select operands that simplify; NFC 2022-09-06 12:19:50 -04:00
Sanjay Patel d4a4004c0f [InstCombine] add tests for add of select with 0 and negate arms; NFC 2022-09-06 12:19:49 -04:00
Doru Bercea 0b1160fdeb Fix OpenMP Opt for target without a parallel region.
Remove ctx redeclaration.

Format code.

Remove parallel check. Modify tests. Clean-up code.

Fix another test.

Move code to helper functions.

Format file.

Minor fixes.
2022-09-06 16:04:53 +00:00
Matthias Gehre af3758d678 Fix remaining test failures for "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64" 2022-09-06 16:38:43 +01:00
Sanjay Patel fe1f3cfc26 [SCCP] convert signed div/rem to unsigned for non-negative operands
This extends the transform added with D81756 to handle div/rem opcodes.
For example:
https://alive2.llvm.org/ce/z/cX6za6

This replicates part of what CVP already does, but the motivating example
from issue #57472 demonstrates a phase ordering problem - we convert
branches to select before CVP runs and miss the transform.

Differential Revision: https://reviews.llvm.org/D133198
2022-09-06 08:58:15 -04:00
Sanjay Patel a8fcb51242 [InstSimplify] allow poison/undef in constant match for "C - X ==/!= X -> false/true"
This fold was added with 5e9522c311, but over-specified.
We can assume that an undef element is an odd number:
https://alive2.llvm.org/ce/z/djQmWU
2022-09-06 08:19:30 -04:00
Sanjay Patel 1184f8cca5 [InstCombine] add tests for icmp-of-trunc; NFC 2022-09-06 08:19:30 -04:00
LiaoChunyu 5e9522c311 [InstSimplify] Odd - X ==/!= X -> false/true
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D132989
2022-09-05 23:51:45 +08:00
LiaoChunyu 14eea55445 [InstSimplify][NFC][test] Add tests for Odd - X ==/!= X -> false/true
testcases will be updated by D132989

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D133306
2022-09-05 23:17:26 +08:00
Momchil Velikov 078899cd64 [SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions
SimplifyCFG does some common code hoisting, which is limited
to hoisting a sequence of identical instruction in identical
order and stops at the first non-identical instruction.

This patch allows hoisting instruction pairs over
same-length sequences of non-matching instructions. The
linear asymptotic complexity of the algorithm stays the
same, there's an extra parameter
`simplifycfg-hoist-common-skip-limit` serving to limit
compilation time and/or the size of the hoisted live ranges.

The patch improves SPECv6/525.x264_r by about 10%.

Reviewed By: nikic, dmgreen

Differential Revision: https://reviews.llvm.org/D129370
2022-09-05 15:13:46 +01:00
Tian Zhou 8fa432be4f [InstCombine] reduce test-for-overflow of shifted value
Fixes #57338.

The added code makes the following transformations:

For unsigned predicates / eq / ne:
icmp pred (x << 1), x --> icmp getSignedPredicate(pred) x, 0
icmp pred x, (x << 1) --> icmp getSignedPredicate(pred) 0, x

Some examples:
https://alive2.llvm.org/ce/z/ckn4cj
https://alive2.llvm.org/ce/z/h-4bAQ

Differential Revision: https://reviews.llvm.org/D132888
2022-09-05 09:51:51 -04:00
Samuel Parker e893345589 [NFC][TypePromotion] Add test 2022-09-05 09:01:23 +01:00
Chenbing Zheng 3e84a955a3 [InstCombine] Precommit tests for smul_with_overflow. nfc 2022-09-05 10:43:40 +08:00
Florian Hahn ba3d29f871
[LCSSA] Update unreachable uses with poison.
Users of LCSSA may not expect non-phi uses when checking the uses
outside a loop, which may cause crashes. This is due to the fact that we
do not update uses in unreachable blocks.

To ensure all reachable uses outside the loop are phis, update uses in
unreachable blocks to use poison in dead code.

Fixes #57508.
2022-09-04 22:26:18 +01:00
Florian Hahn a10d42dd45
[LV] Update test use opaque pointers, regenerate checks.
Modernize the test to make it easier to extend in a follow-up patch.
2022-09-04 22:26:18 +01:00
Florian Hahn e27e826d80
[LCSSA] Update test use opaque pointers, regenerate checks.
Modernize the test to make it easier to extend in a follow-up patch.
2022-09-04 22:26:17 +01:00
Simon Pilgrim 626a84db47 [CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry
Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers
2022-09-04 17:59:08 +01:00
Simon Pilgrim 8dc99180a6 [PhaseOrdering] Move X86 unsigned-multiply-overflow-check.ll test under X86 2022-09-04 17:54:32 +01:00
Simon Pilgrim 80d4b3a275 Revert rG06e73626cf0fc33b025a0f98f1eee4a302279982 "[CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry"
Some arm buildbots are complaining about a phase ordering test failure in unsigned-multiply-overflow-check.ll - I guess this test needs making x86 specific first
2022-09-04 17:51:11 +01:00
Simon Pilgrim 06e73626cf [CostModel][X86] getTypeBasedIntrinsicInstrCost - convert to CostKindTblEntry
Begin the refactoring to use CostKindTblEntry and return real latency/codesize/sizelatency costs instead of reusing the throughput numbers
2022-09-04 17:28:45 +01:00
Ruobing Han 07341e3b37 [test] pre-submission for the following SimpleLoopUnswitch update 2022-09-04 11:49:51 -04:00
Sanjay Patel 5c759edc57 [InstCombine] reduce another or-xor bitwise logic pattern
~(A & ?) | (A ^ B) --> ~((A & ?) & B)
https://alive2.llvm.org/ce/z/mxex6V

This is similar to 9d218b61cc where we peeked through
another logic op to find a common operand.
2022-09-03 09:32:08 -04:00
Sanjay Patel fbfac8e388 [InstCombine] add tests for or-xor-nand; NFC 2022-09-03 09:32:08 -04:00
Richard Smith 053841c562 Revert "[AggressiveInstCombine] Lower Table Based CTTZ"
This reverts commit fec01ee3f5.

According to asan, this patch introduces a heap use after free.
2022-09-02 16:19:09 -07:00
Francis Visoiu Mistrih 81bdb4068d [Matrix] Simplify matmuls with scalars
If one of the operands is a transposed splat, the transpose can be
removed.

This is useful to simplify when transposes are distributed to operands
of a matmul:

* k^T -> k
* (A * k)^t -> A^t * k

Differential Revision: https://reviews.llvm.org/D130177
2022-09-02 15:50:25 -07:00
Djordje Todorovic fec01ee3f5 [AggressiveInstCombine] Lower Table Based CTTZ
This patch introduces recognition of table-based ctz implementation
during the AggressiveInstCombine.

This fixes the [0].

[0] https://bugs.llvm.org/show_bug.cgi?id=46434

Differential Revision: https://reviews.llvm.org/D113291
2022-09-02 17:26:55 +02:00
Tian Zhou a5880b5f9c [InstCombine] Baseline tests for reducing test-for-overflow of shifted value
Baseline tests for this patch https://reviews.llvm.org/D132888

Differential Revision: https://reviews.llvm.org/D133182
2022-09-02 09:01:43 -04:00
Jolanta Jensen 958abe864a [LoopLoadElim] Add stores with matching sizes as load-store candidates
We are not building up a proper list of load-store candidates because
we are throwing away stores where the type don't match the load.
This patch adds stores with matching store sizes as candidates.
Author of the original patch: David Sherwood.

Differential Revision: https://reviews.llvm.org/D130233
2022-09-02 13:11:25 +01:00
Muhammad Omair Javaid 18de7c6a3b Revert "[InstCombine] Treat passing undef to noundef params as UB"
This reverts commit c911befaec.

It has broken LLDB Arm/AArch64 Linux buildbots. I dont really understand
the underlying reason. Reverting for now make buildbot green.

https://reviews.llvm.org/D133036
2022-09-02 16:09:50 +05:00
Florian Hahn 91e67c0749
[GlobalOpt] Add test case for #56762.
Add test case where GlobalOpt fails to remove loads to global fields
with struct types.
2022-09-02 11:33:07 +01:00
Mikael Holmen 51d4c7ceea [GlobalOpt] Fix debug variance problem in hasOnlyColdCalls
hasOnlyColdCalls skipped over calls to intrinsics, but it did so after
checking the linkage of the called function. This meant that the presence
of a call to a debug intrinsic could affect the outcome of the
optimization.

In my original reproducer (for an out of tree target) it was particularly
interesting, because the actual IR after GlobalOpt was not different with
debug instrinsics present, so -print-after-all printouts didn't show
anything there.

However, without debuginfo, GlobalOpt went further and ran
BlockFrequencyAnalysis and (more importanly) LoopAnalysis, and later on in
the pipeline, instcombine behaved in different ways when LoopInfo was
present.

So a call to a dbg.declare prevented running LoopAnalysis in
GlobalOpt, which later prevented InstCombine from doing an optimization.

The dbg-intrinsic-loopanalysis.ll testcase tries to expose this.

Then I also noted that adding a dbg.declare actually made the existing
testcase colccc_coldsites.ll generate different code, so I modified that
to now test it behaves the same way with and without the dbg.declare.

Reviewed By: nikic, fhahn

Differential Revision: https://reviews.llvm.org/D133193
2022-09-02 12:29:44 +02:00
Sergey Kachkov be37caca00 [JumpThreading] Process range comparisions with non-local cmp instructions
Use getPredicateOnEdge method if value is a non-local
compare-with-a-constant instruction, that can give more precise
results than getConstantOnEdge.

Differential Revision: https://reviews.llvm.org/D131956
2022-09-02 12:22:45 +02:00
Nikita Popov 10dfcf1f87 [LICM] Add test for missed load promotion opportunity (NFC) 2022-09-02 11:36:07 +02:00
Nikita Popov c453e5b901 Revert "[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI"
This reverts commit cd8f3e7581.

As pointed out by Eli on the review, this is missing an alignment
check. The value might be written at an offset.
2022-09-02 09:28:48 +02:00
Nikita Popov 639d912282 [LICM] Allow load-only scalar promotion in the presence of unwinding
Currently, we bail out of scalar promotion if the loop may unwind
and the memory may be visible on unwind. This is because we can't
insert stores of the promoted value on unwind edges.

However, nowadays scalar promotion also has support for only
promoting loads, while leaving stores in place. This kind of
promotion is safe even in the presence of unwinding.

Differential Revision: https://reviews.llvm.org/D133111
2022-09-02 09:27:13 +02:00
luxufan cd8f3e7581 [DSE] Eliminate noop store even through has clobbering between LoadI and StoreI
For noop store of the form of LoadI and StoreI,
An invariant should be kept is that the memory state of the related
MemoryLoc before LoadI is the same as before StoreI.
For this example:
```
define void @pr49927(i32* %q, i32* %p) {
  %v = load i32, i32* %p, align 4
  store i32 %v, i32* %q, align 4
  store i32 %v, i32* %p, align 4
  ret void
}
```
Here the definition of the store's destination is different with the
definition of the load's destination, which it seems that the
invariant mentioned above is broken. But the definition of the
store's destination would write a value that is LoadI, actually, the
invariant is still kept. So we can safely ignore it.

Differential Revision: https://reviews.llvm.org/D132657
2022-09-02 06:37:41 +00:00
Chenbing Zheng bb0e6b7721 [InstCombine] Precommit tests for umul_with_overflow. nfc 2022-09-02 11:18:17 +08:00
Chenbing Zheng d30cf77cb1 [InstCombine] complete fold extractvalue (any_mul_with_overflow X, -1)
When we do extractvalue (any_mul_with_overflow X, -1) --> (-X and icmp),
which left partly failed to match vector constant with poison element.
This patch try to fix it.

Alive2: https://alive2.llvm.org/ce/z/2rGp_3

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D132996
2022-09-02 10:58:42 +08:00
Chenbing Zheng 4db9edfdb6 [NFC] fix typo 2022-09-02 10:04:52 +08:00
Arthur Eubanks c911befaec [InstCombine] Treat passing undef to noundef params as UB
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D133036
2022-09-01 15:16:45 -07:00
Rong Xu 0caa4a9559 [PGO] Support PGO annotation of CallBrInst
We currently instrument CallBrInst but do not annotate it with
the branch weight. This patch enables PGO annotation of CallBrInst.

Differential Revision: https://reviews.llvm.org/D133040
2022-09-01 14:13:50 -07:00
Sanjay Patel c29c6170fd [SCCP][PhaseOrdering] add tests for sdiv/srem range transforms; NFC
issue #57472
2022-09-01 16:23:00 -04:00
Arthur Eubanks 79a4fa366c [test][InstCombine] Update precommitted test 2022-09-01 09:08:44 -07:00
Nuno Lopes 858fe8664e Expand Div/Rem: consider the case where the dividend is zero
So we can't use ctlz in poison-producing mode
2022-09-01 17:04:26 +01:00
Arthur Eubanks 9599393eeb Revert "[Pipelines] Introduce DAE after ArgumentPromotion"
This reverts commit b10a341aa5.

This commit exposes the pre-existing https://github.com/llvm/llvm-project/issues/56503 in some edge cases. Will fix that and then reland this.
2022-09-01 08:52:19 -07:00
Nikita Popov 4f046bc8e0 [PHITranslateAddr] Require dominance when searching for translated address (PR57025)
This is a fix for PR57025 and an alternative to D131776. The problem
in the phi-translation-to-wrong-context.ll test case is that phi
translation of %gep.j into if2 pick %gep.i as the result. While this
instruction has the correct pointer address, it occurs in a context
where %i != 0. As such, we get a NoAlias result for the store in
if2, even though they do alias for %i == 0 (which is legal in the
original context of the pointer).

PHITranslateValue already has a MustDominate option, which can be
used to restrict PHI translation results to values that dominate the
translated-into block. However, this is more aggressive than what we
need and would significantly regress GVN results. In particular, if
we have a pointer value that does not require any translation, then
it is fine to continue using that value in the predecessor, because
the context is still correct for the original query. We only run into
problems if PHITranslateSubExpr() picks a completely random
instruction in a context that may have preconditions that do not hold.

Fix this by always performing the dominance checks in
PHITranslateSubExpr(), without enabling the more general MustDominate
requirement.

Fixes https://github.com/llvm/llvm-project/issues/57025. This also
fixes the test case for https://github.com/llvm/llvm-project/issues/30999,
but I'm not sure whether that's just the particular test case,
or a general solution to the problem.

Differential Revision: https://reviews.llvm.org/D132935
2022-09-01 16:26:42 +02:00
Nikita Popov 26347adf96 [LICM] Regenerate test checks (NFC) 2022-09-01 16:06:38 +02:00
Nikita Popov 315aef667e [LICM] Fix thread safety checks for promotion of byval args
This code was relying on a very subtle contract: The expectation
was that for non-allocas, the unwind safety check would already
perform a capture check, so we don't need to perform it later.
This held true when this unwind safety was only handled for allocas
and noalias calls, but became incorrect when byval support was
added.

To avoid this kind of issue, just remove the dependency between the
unwind and thread-safety checks entirely. At worst, this means we
perform a redundant capture check. If this should turn out to be
problematic for compile-time, we can cache that query in a more
explicit way.
2022-09-01 15:33:46 +02:00
Nikita Popov 20524a3c94 [LICM] Add another byval capture test (NFC)
Variant with capture after the loop, in which case promotion is
safe.
2022-09-01 15:18:10 +02:00
Nikita Popov e1826326af [LICM] Add test for byval scalar promotion miscompile (NFC) 2022-09-01 15:03:20 +02:00
Sanjay Patel c3d1504d63 [InstCombine] fix crash on type mismatch with fcmp fold
The existing predicate doesn't work for a single-element
vector, so make sure we are not crossing scalar/vector types.

Test (was crashing) based on the post-commit example for:
4827771234
2022-09-01 08:57:55 -04:00
Sanjay Patel addbdac5d5 [InstCombine] fold power-of-2 ctlz/cttz with inverted result
When X is a power-of-two or zero and zero input is poison:
ctlz(i32 X) ^ 31 --> cttz(X)
cttz(i32 X) ^ 31 --> ctlz(X)

https://alive2.llvm.org/ce/z/Cs7sFE
2022-09-01 08:57:55 -04:00
Bjorn Pettersson 3aab9d2bb7 [GVN] Pre-commit test case showing miscompile in github issue #57025
This commit adds a reproducer for
  https://github.com/llvm/llvm-project/issues/57025
showing a miscompile in GVN.

Not sure how likely this kind of faults would be in a normal pipeline,
considering that the input IR has some dead code in it. On the other
hand, GVN itself sometimes creates dead basic blocks when splitting
critical edges. Anyway, the fault was found when doing fuzzy testing
using random pass pipelines.

Differential Revision: https://reviews.llvm.org/D131775
2022-09-01 14:43:24 +02:00
Alexey Bataev 982d9ef1c1 [SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison.
Need either follow the original order of the operands for bool logical
ops, or emit freeze instruction to avoid poison propagation.

Differential Revision: https://reviews.llvm.org/D126877
2022-09-01 05:34:45 -07:00
Yuanbo Li ebd0249fcf [DebugInfo] Missing debug location after replacement in processSRem function
This patch fixes an issue in which CorrelatedValuePropagation::processSRem
would create new instructions to represent the SRem instruction, but would not
correctly copy any existing debug location metadata to the new instruction.

Differential Revision: https://reviews.llvm.org/D132218
2022-09-01 13:18:17 +01:00
Florian Hahn fc444ddc77
[VPlan] Add field to track if intrinsic should be used for call. (NFC)
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132585
2022-09-01 13:14:40 +01:00
Nuno Lopes fa154a9170 Revert "Expand Div/Rem: consider the case where the dividend is zero"
This reverts commit 4aed09868b.
2022-09-01 12:11:22 +01:00
Nuno Lopes 4aed09868b Expand Div/Rem: consider the case where the dividend is zero
So we can't use ctlz in poison-producing mode
2022-09-01 12:00:03 +01:00
Nikita Popov 43e7d9af1d [InstCombine] Fold extractvalue of phi
Just as we do for most other operations, we should push
extractvalue instructions through phis, if this does not increase
unfolded instruction count.
2022-09-01 10:51:54 +02:00
Nikita Popov 5b219dd9e9 [InstCombine] Add tests for extractvalue of phi (NFC) 2022-09-01 10:47:10 +02:00
Nikita Popov ab6876a40d reland: [Local] Allow creating callbr with duplicate successors
Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs.

This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in 8201e3ef5c.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D129997

Originally landed as
commit 08860f525a ("[Local] Allow creating callbr with duplicate successors")

Reverted in
commit 1cf6b93df1 ("Revert "[Local] Allow creating callbr with duplicate successors"")
2022-08-31 13:23:00 -07:00
Arthur Eubanks 8c7b103055 [test][InstCombine] Precommit some undef/noundef tests 2022-08-31 11:19:31 -07:00
Florian Hahn e68594ca20
[SLP] Add FMA test case with missing or partial fast-math flags.
Add extra FMA tests with missing or partial fast-math flags.
2022-08-31 16:25:18 +01:00
Florian Hahn faad567589
[LV] Add test case where SCEV is needed to remove vector backedge.
Test case mentioned in the discussion for D115261.
2022-08-31 14:01:42 +01:00
Florian Hahn 1ed555a62b
[LV] Fix test cases where vector loop never executed.
It looks like the vector loops in the modified test cases
unintentionally never get executed. Update the exit condition to ensure
it does to avoid them getting optimized away in upcoming changes.
2022-08-31 13:24:49 +01:00
David Green 225faddf8a [ARM] Add a phase ordering test for MVE intrinsic remainder vectorization/unrolling. NFC 2022-08-31 12:08:38 +01:00
Nikita Popov ad66bc42b0 [InstCombine] Use getInsertionPointAfterDef() in freeze fold
This simplifies the code and fixes handling of catchswitch, in
which case we have no insertion point for the freeze.

Originally part of D129660.
2022-08-31 11:32:57 +02:00
Nikita Popov 8f3fd26b74 [Reassociate] Use getInsertionPointerAfterDef()
This simplifies the code and fixes handling for the callbr case,
where the instruction needs to be inserted in the normal
destination, rather than after the terminator.

Originally part of D129660.
2022-08-31 11:10:24 +02:00
Nikita Popov b10e508c19 [GVN] Add another test for phi translation miscompile (NFC) 2022-08-31 09:14:53 +02:00
Chenbing Zheng 35a3048c25 [InstCombine] add support for multi-use Y of (X op Y) op Z --> (Y op Z) op X
For (X op Y) op Z --> (Y op Z) op X
we can still do transform when Y is multi-use. In D131356 limit it to one-use,
this patch remove this limit.

This is still not a complete solution, I add a todo test to show it.
In this case, X and Y are both multi use, we can't differentiate how to convert based on this.
But at least we don't make the code worse,and it can solve half the scenarios.
2022-08-31 10:55:05 +08:00
Kai Luo ad2f7fd286 [AtomicExpand] Make floating point conversion happens before fence insertion
IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations.
This also fixes atomic load of floating point values which requires fence on PowerPC.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D127609
2022-08-31 09:54:58 +08:00
Alexey Bataev ec06df9459 [SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed.
The pointer operands for the ScatterVectorize node may contain
non-instruction values and they are not checked for "already being
vectorized". Need to check that such pointers are already vectorized and
gather them instead of trying to build vectorize node to avoid compiler
crash.

Differential Revision: https://reviews.llvm.org/D132949
2022-08-30 12:30:14 -07:00
Sanjay Patel 16e96a7e60 [InstCombine] add tests for xor-of-ctlz/cttz; NFC 2022-08-30 14:21:29 -04:00
Sanjay Patel 67cbd25dcd [InstCombine] add tests for signbit test using lshr; NFC 2022-08-30 14:21:29 -04:00
Hendrik Greving 3d5ea53906 [BasicBlockUtils] Amend test for loop metadata.
Amends test Transforms/LoopSimplify/update_latch_md2.ll
with auto-generated checks.

Differential Revision: https://reviews.llvm.org/D125574
2022-08-30 09:29:52 -07:00
Jolanta Jensen 92c4172756 [NFC][LoopLoadElim] Extending type-mismatch testing
Added IR for int-pointer type mismatch and int-vector
type mismatch. Regenerated CHECK lines using
the update_test_checks.py script.

Differential Revision: https://reviews.llvm.org/D132239
2022-08-30 16:44:19 +01:00
zhongyunde 23a5de4294 [InstCombine] Distributive or+mul with const operand
We aleady support the transform: `(X+C1)*CI -> X*CI+C1*CI`
Here the case is a little special as the form of `(X+C1)*CI` is transformed into `(X|C1)*CI`,
so we should also support the transform: `(X|C1)*CI -> X*CI+C1*CI`
Fixes https://github.com/llvm/llvm-project/issues/57278

Reviewed By: bcl5980, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D132658
2022-08-30 20:36:52 +08:00
Florian Hahn b5e208fcba
[DSE] Support looking through memory phis at end of function.
Update isWriteAtEndOfFunction to look through MemoryPhis. The reason
MemoryPhis were skipped so far was the known AliasAnalysis issue with it
missing loop-carried dependences.

This problem is already addressed in other parts of the code by skipping
MemoryDefs that may be in difference loops. I think the same logic can
be applied here.

This can have a substantial impact on the number of stores removed in
some cases. For MultiSource/SPEC2006/SPEC2017 with -O3:

```
Metric: dse.NumFastStores

Program                                       dse.NumFastStores
                                              base              patch   diff
External/S...CINT2017rate/557.xz_r/557.xz_r     14.00             45.00 221.4%
External/S...te/538.imagick_r/538.imagick_r    439.00           1267.00 188.6%
MultiSourc...e/Applications/SIBsim4/SIBsim4      6.00             15.00 150.0%
MultiSourc...Prolangs-C/simulator/simulator      3.00              7.00 133.3%
MultiSource/Applications/siod/siod               3.00              7.00 133.3%
MultiSourc...arks/FreeBench/distray/distray      6.00              9.00  50.0%
MultiSourc...e/Applications/obsequi/Obsequi     22.00             30.00  36.4%
MultiSource/Benchmarks/Ptrdist/bc/bc            23.00             28.00  21.7%
External/S...NT2017rate/502.gcc_r/502.gcc_r   1258.00           1512.00  20.2%
External/S...te/520.omnetpp_r/520.omnetpp_r    954.00           1143.00  19.8%
External/S...rate/510.parest_r/510.parest_r   5961.00           7122.00  19.5%
External/S...C/CINT2006/445.gobmk/445.gobmk     47.00             56.00  19.1%
External/S...00.perlbench_r/500.perlbench_r    241.00            286.00  18.7%
External/S...NT2006/471.omnetpp/471.omnetpp     36.00             42.00  16.7%
External/S...06/400.perlbench/400.perlbench    183.00            210.00  14.8%
MultiSource/Applications/SPASS/SPASS            72.00             81.00  12.5%
External/S...17rate/541.leela_r/541.leela_r     72.00             80.00  11.1%
External/SPEC/CINT2006/403.gcc/403.gcc         585.00            642.00   9.7%
MultiSourc...e/Applications/sqlite3/sqlite3    120.00            131.00   9.2%
MultiSourc...Applications/hexxagon/hexxagon     11.00             12.00   9.1%
External/S.../CFP2006/453.povray/453.povray    566.00            615.00   8.7%
External/S...rate/511.povray_r/511.povray_r    578.00            627.00   8.5%
External/S...FP2006/482.sphinx3/482.sphinx3     12.00             13.00   8.3%
MultiSource/Applications/oggenc/oggenc         130.00            140.00   7.7%
MultiSourc...e/Applications/ClamAV/clamscan    250.00            268.00   7.2%
MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg     19.00             20.00   5.3%
MultiSourc...ch/consumer-jpeg/consumer-jpeg     19.00             20.00   5.3%
External/S...te/526.blender_r/526.blender_r   3747.00           3928.00   4.8%
MultiSourc...OE-ProxyApps-C++/miniFE/miniFE    104.00            108.00   3.8%
MultiSourc...ch/consumer-lame/consumer-lame     54.00             56.00   3.7%
MultiSource/Benchmarks/Bullet/bullet          1222.00           1264.00   3.4%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4    973.00           1005.00   3.3%
External/S.../CFP2006/447.dealII/447.dealII   2699.00           2780.00   3.0%
External/S...06/483.xalancbmk/483.xalancbmk    788.00            810.00   2.8%
External/S.../CFP2006/450.soplex/450.soplex    180.00            185.00   2.8%
MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR    338.00            345.00   2.1%
MultiSourc...Benchmarks/7zip/7zip-benchmark    685.00            699.00   2.0%
External/S...FP2017rate/544.nab_r/544.nab_r    158.00            160.00   1.3%
MultiSourc...sumer-typeset/consumer-typeset    772.00            781.00   1.2%
External/S...2017rate/525.x264_r/525.x264_r    410.00            414.00   1.0%
External/S...23.xalancbmk_r/523.xalancbmk_r    998.00           1002.00   0.4%
```

Compile-time is almost neutral:

https://llvm-compile-time-tracker.com/compare.php?from=b3125ad3d60531a97eea20009cc9629a87755862&to=84007eee59004f43464eda7f5ba8263ed5158df8&stat=instructions

NewPM-O3: +0.03%
NewPM-ReleaseThinLTO: -0.01%
NewPM-ReleaseLTO-g: +0.03%

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D132365
2022-08-30 13:27:51 +01:00
Nikita Popov 07bfbce988 [GVN] Regenerate test checks (NFC) 2022-08-30 12:06:37 +02:00
jacquesguan df525c7705 [InstCombine] fold fake floating point vector extract to shift+trunc.
This patch supports the FP part of D111082.

Differential Revision: https://reviews.llvm.org/D125750
2022-08-30 10:12:16 +08:00
jacquesguan f98153eac0 [InstCombine] Precommit test for D125750.
Differential Revision: https://reviews.llvm.org/D126054
2022-08-30 09:57:35 +08:00
Rong Xu d7ef0c3970 [llvm-profdata] Improve profile supplementation
Current implementation promotes a non-cold function in the SampleFDO profile
into a hot function in the FDO profile. This is too aggressive. This patch
promotes a hot functions in the SampleFDO profile into a hot function, and a
warm function in SampleFDO into a warm function in FDO.

Differential Revision: https://reviews.llvm.org/D132601
2022-08-29 16:50:42 -07:00
Philip Reames 4c10646367 [LV] Refresh autogen tests to reflect naming changes [nfc]
Purely so that these can be easily autogened without spurious diffs
2022-08-29 14:16:54 -07:00
Valery N Dmitriev 329b972d41 [SLP] Try to match reductions before trying to vectorize a vector build sequence.
This patch changes order of searching for reductions vs other vectorization possibilities.
The idea is if we do not match a reduction it won't be harmful for further attempts to
find vectorizable operations on a vector build sequences. But doing it in the opposite
order we have good chance to ruin opportunity to match a reduction later.
We also don't want to try vectorizing binary operations too early as 2-way vectorization
may effectively prohibit wider ones leading to producing less effective code.

Differential Revision: https://reviews.llvm.org/D132590
2022-08-29 13:32:14 -07:00
Philip Reames c37b1a5f76 [RLEV] Pick a correct insert point when incoming instruction is itself a phi node
This fixes https://github.com/llvm/llvm-project/issues/57336. It was exposed by a recent SCEV change, but appears to have been a long standing issue.

Note that the whole insert into the loop instead of a split exit edge is slightly contrived to begin with; it's there solely because IndVarSimplify preserves the CFG.

Differential Revision: https://reviews.llvm.org/D132571
2022-08-29 11:44:33 -07:00
Alexey Bataev beacf9bd9e [SLP]Fix PR57322: vectorize constant float stores.
Stores for constant floats must be vectorized, improve analysis in SLP
vectorizer for stores.

Differential Revision: https://reviews.llvm.org/D132750
2022-08-29 11:02:53 -07:00
Florian Hahn 3c5e24a51c
[SLP] Add tests showing over-eager SLP when scalar fma can be used.
Add test cases for AArch64 that show over-eager SLP vectorization on
AArch64, where keeping the things scalar allows efficient lowering using
scalar fmas.
2022-08-29 18:58:56 +01:00
Florian Hahn 197332a1f8
[DSE] Add extra test for loop invariant store in loop, update comments.
Add extra test coverage and updates some slightly stale comments as
pointed out in D132365.
2022-08-29 17:00:00 +01:00
Alexey Bataev e6345bf644 [SLP]Improve lookup of the buildvector top insertelement instruction.
When estimating the cost of the in-tree vectorized scalars in
buildvector sequences, need to take into account the vectorized
insertelement instruction. The top of the buildvector seuences is the
topmost vectorized insertelement instruction, because it will have
> than 1 use after the vectorization.

For the affected test case improves througput from 21 to 16 (per
llvm-mca).

Differential Revision: https://reviews.llvm.org/D132740
2022-08-29 08:19:52 -07:00
Sanjay Patel 6c39a3aae1 [InstCombine] fold not-shift of signbit to icmp+zext
https://alive2.llvm.org/ce/z/j_8Wz9

The arithmetic shift was converted to logical shift with:
246078604c

That does not seem to uncover any other missing/conflicting folds,
so convert directly to signbit test + cast.

We still need to fold the pattern with logical shift to test + cast.

This allows reducing patterns where the output type is not
the same as the input value:
https://alive2.llvm.org/ce/z/nydwFV

Fixes #57394
2022-08-29 10:06:31 -04:00
Sanjay Patel 246078604c [InstCombine] fold inc-of-signbit-splat to not+lshr
(iN X s>> (N - 1)) + 1 --> (~X) u>> (N - 1)

https://alive2.llvm.org/ce/z/wzS474
2022-08-29 08:48:22 -04:00
Sanjay Patel ac4c46d24b [InstCombine] add tests for increment-of-ashr; NFC 2022-08-29 08:48:22 -04:00
Florian Hahn 005d1a8ff5
[LV] Add test where either a libfunc or intrinsic is chosen.
In the newly added test either a libfunc (VF=2) or a intrinsic (VF=4)
can be chosen.

Test coverage for D132585.
2022-08-29 10:51:20 +01:00
zhongyunde eb438c80df [tests] precommit tests for D132658
Reviewed By: bcl5980
Differential Revision: https://reviews.llvm.org/D132820
2022-08-29 14:17:37 +08:00
Sanjay Patel ab6892967c [InstCombine] allow sext in fold of mask using signbit, part 2
https://alive2.llvm.org/ce/z/rcbZmx

Sibling tranform to 275aa24c0a

This pattern is seen in the examples in issue #57381.
2022-08-28 11:50:52 -04:00