Commit Graph

23151 Commits

Author SHA1 Message Date
Sanjay Patel 2e87333bfe [InstCombine] convert mul by negative-pow2 to negate and shift
This is an unusual canonicalization because we create an extra instruction,
but it's likely better for analysis and codegen (similar reasoning as D133399).

InstCombine::Negator may create this kind of multiply from negate and shift,
but this should not conflict because of the narrow negation.

I don't know how to create a fully general proof for this kind of transform in
Alive2, but here's an example with bitwidths similar to one of the regression
tests:
https://alive2.llvm.org/ce/z/J3jTjR

Differential Revision: https://reviews.llvm.org/D133667
2022-10-02 12:22:25 -04:00
Sanjay Patel 4490cfbaf4 [ValueTracking] peek through fpext in isKnownNeverInfinity()
https://alive2.llvm.org/ce/z/BkNoRW
2022-10-02 11:20:23 -04:00
Sanjay Patel 0243b424d7 [InstSimplify] add tests for FP infinity compare with fpext; NFC 2022-10-02 11:20:23 -04:00
Florian Hahn 3fe6ddd999
[ConstraintElimination] Update Changed status in ssub simplification.
Update tryToSimplifyOverflowMath to indicate whether the function made
any changes to the IR.
2022-10-02 14:25:51 +01:00
Arthur Eubanks 5df4ab55f9 [llvm] Migrate PAEval to new pass manager 2022-10-01 16:41:58 -07:00
Florian Hahn 05b0b48b50
[SimpleLoopUnswitch] Pass -verify-cfg-preserved to test.
This ensures PreservedCFGCheckerAnalysis is always added, independent of
whether opt was built with assertions enabled or not.

This fixes a few buildbot failures for bots that don't have assertions
enabled.
2022-10-01 17:19:02 +01:00
Florian Hahn 7c0ff64b0f
[LAA] Change to function analysis for new PM.
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
withing a loop pipeline.

To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.

Fixes #50940.

Fixes #51669.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134606
2022-10-01 15:44:27 +01:00
Teresa Johnson 43417d8159 [MemProf] Update metadata during inlining
Update both memprof and callsite metadata to reflect inlined functions.

For callsite metadata this is simply a concatenation of each cloned
call's call stack with that of the inlined callsite's.

For memprof metadata, each profiled memory info block (MIB) is either
moved to the cloned allocation call or left on the original allocation
call depending on whether its context matches the newly refined call
stack context on the cloned call. We also reapply context trimming
optimizations based on the refined set of contexts on each of the calls
(cloned and original).

Depends on D128142.

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D128143
2022-09-30 19:21:15 -07:00
Teresa Johnson 4d243348fb Revert "[MemProf] Update metadata during inlining" and preceeding commit
This reverts commit 0d7f3464ce and
commit f9403ca41e. The latter was
"Profile matching and IR annotation for memprof profiles." and was left
from a bad rebase from a commit already pushed upstream.
2022-09-30 17:01:30 -07:00
Teresa Johnson 0d7f3464ce [MemProf] Update metadata during inlining
Update both memprof and callsite metadata to reflect inlined functions.

For callsite metadata this is simply a concatenation of each cloned
call's call stack with that of the inlined callsite's.

For memprof metadata, each profiled memory info block (MIB) is either
moved to the cloned allocation call or left on the original allocation
call depending on whether its context matches the newly refined call
stack context on the cloned call. We also reapply context trimming
optimizations based on the refined set of contexts on each of the calls
(cloned and original), via utilities in MemoryProfileInfo.

Depends on D128142.

Differential Revision: https://reviews.llvm.org/D128143
2022-09-30 16:46:17 -07:00
Matthias Braun 189900eb14 X86: Stop assigning register costs for longer encodings.
This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`.
This was previously done because instruction encoding require a REX
prefix when using them resulting in longer instruction encodings. I
found that this regresses the quality of the register allocation as the
costs impose an ordering on eviction candidates. I also feel that there
is a bit of an impedance mismatch as the actual costs occure when
encoding instructions using those registers, but the order of VReg
assignments is not primarily ordered by number of Defs+Uses.

I did extensive measurements with the llvm-test-suite wiht SPEC2006 +
SPEC2017 included, internal services showed similar patterns. Generally
there are a log of improvements but also a lot of regression. But on
average the allocation quality seems to improve at a small code size
regression.

Results for measuring static and dynamic instruction counts:

Dynamic Counts (scaled by execution frequency) / Optimization Remarks:
    Spills+FoldedSpills   -5.6%
    Reloads+FoldedReloads -4.2%
    Copies                -0.1%

Static / LLVM Statistics:
    regalloc.NumSpills    mean -1.6%, geomean -2.8%
    regalloc.NumReloads   mean -1.7%, geomean -3.1%
    size..text            mean +0.4%, geomean +0.4%

Static / LLVM Statistics:
    mean -2.2%, geomean -3.1%) regalloc.NumSpills
    mean -2.6%, geomean -3.9%) regalloc.NumReloads
    mean +0.6%, geomean +0.6%) size..text

Static / LLVM Statistics:
    regalloc.NumSpills   mean -3.0%
    regalloc.NumReloads  mean -3.3%
    size..text           mean +0.3%, geomean +0.3%

Differential Revision: https://reviews.llvm.org/D133902
2022-09-30 16:01:33 -07:00
Florian Hahn 586784a2e4
[ConstraintElimination] Simplify check lines in test added in 2812a141.
The CHECK lines in the test are too specific and cause mis-matches on
some platforms. Reduce them to make them less fragile.
2022-09-30 19:51:05 +01:00
Florian Hahn 2812a1413f
[ConstraintElimination] Add test showing bug in analysis invalidation. 2022-09-30 19:37:58 +01:00
Arthur Eubanks e23aee7175 [test] Update some legacy PM tests 2022-09-30 11:31:02 -07:00
Arthur Eubanks 865406d21e [FixIrreducible][opt] Mark -fix-irreducible as a codegen pass
So we don't have to specify -enable-new-pm=0.
2022-09-30 10:34:04 -07:00
Arthur Eubanks a7264e5549 [StructurizeCFG][opt] Mark -structurizecfg as a codegen pass
So we don't have to specify -enable-new-pm=0.
2022-09-30 10:27:09 -07:00
Florian Hahn 04c711c78d
[ConstraintElimination] Make sure the variable is available before use.
This fixes a crash when trying to access an index for a value where we
don't have a known index.

Fixes #58009.
2022-09-30 18:09:01 +01:00
Sanjay Patel 88b7c178f2 [SCCP] regenerate test checks; NFC
Avoid names with "tmp" because that can go wrong with the auto-generated script names.
2022-09-30 10:26:01 -04:00
Sanjay Patel bf1fe2497d [SCCP] add tests for sitofp; NFC
Adapted from the existing tests for ashr, sdiv, srem.
2022-09-30 09:28:57 -04:00
Sanjay Patel 3f906f057c [InstSimplify] look through vector select (shuffle) in min/max fold
This is an extension of the existing min/max+select fold (which already
has a very large number of variations) to allow a vector shuffle because
that's what we have in the motivating example from issue #42100.

A couple of Alive2 checks of variants (I don't know how to generalize
these in Alive):
https://alive2.llvm.org/ce/z/jUFAqT

And verify the PR42100 test:
https://alive2.llvm.org/ce/z/3EcASf

It's possible there is some generalization of the fold or a
VectorCombine/SLP answer for the motivating test, but I haven't found a
better/smaller solution yet.

We can also add even more variants here as follow-up patches. For example,
we can have shuffle followed by min/max; we also don't have this
canonicalization or the reverse:
https://alive2.llvm.org/ce/z/StHD9f

Differential Revision: https://reviews.llvm.org/D134879
2022-09-30 08:27:00 -04:00
Simon Pilgrim 5fc7bbfaa3 [SLP][X86] Add test coverage for Issue #58054 2022-09-30 13:26:31 +01:00
Nikita Popov 5c5ac3490c [InstCombine] Add test for PR57488 (NFC) 2022-09-30 14:25:13 +02:00
Nikita Popov c720fad16f [InstCombine] Add test for phi translation during select of phi fold (NFC)
The phi translation performed during this fold is important for
correctness, but was apparently untested.
2022-09-30 13:07:58 +02:00
Nikita Popov b6042cdf8b [InstCombine] Regenerate test checks (NFC) 2022-09-30 13:04:36 +02:00
Simon Pilgrim 5849fcb635 Revert rG1b7089fe67b924bdd5ecef786a34bdba7a88778f "[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions"
Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI."
Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds"

Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.
2022-09-30 11:22:48 +01:00
Simon Pilgrim 19782a46f8 [SLP][X86] Add test case for crash reported on D134605 2022-09-30 11:07:54 +01:00
Simon Pilgrim 28a3fc39a6 [SLP][AArch64] Add test case for vectorization regression case reported on D134605 2022-09-30 11:07:54 +01:00
Ben Dunbobbin 7eee2a2d44 [IR] Don't allow DLL storage-class and local linkage
Disallow this meaningless combination. Doing so simplifies analysis
of LLVM code w.r.t t DLL storage-class, and prevents mistakes with
DLL storage class.

- Change the assembler to reject DLL storage class on symbols with
  local linkage.
- Change the bitcode reader to clear the DLL Storage class when the
  linkage is local for auto-upgrading
- Update LangRef.

There is an existing restriction on non-default visibility and local
linkage which this is modelled on.

Differential Review: https://reviews.llvm.org/D134784
2022-09-30 00:26:01 +01:00
Florian Hahn 9933a2e9fd
[SCEVExpander] Move LCSSA fixup to ::expand.
Move LCSSA fixup from ::expandCodeForImpl to ::expand(). This has
the advantage that we directly preserve LCSSA nodes here instead of
relying on doing so in rememberInstruction. It also ensures that we
 don't add the non-LCSSA-safe value to InsertedExpressions.

Alternative to D132704.

Fixes #57000.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134739
2022-09-29 20:49:56 +01:00
luxufan f079ba76cf [DSE] Eliminate noop store even through has clobbering between LoadI and StoreI
For noop store of the form of LoadI and StoreI,
An invariant should be kept is that the memory state of the related
MemoryLoc before LoadI is the same as before StoreI.
For this example:
```
define void @pr49927(i32* %q, i32* %p) {
  %v = load i32, i32* %p, align 4
  store i32 %v, i32* %q, align 4
  store i32 %v, i32* %p, align 4
  ret void
}
```
Here the definition of the store's destination is different with the
definition of the load's destination, which it seems that the
invariant mentioned above is broken. But the definition of the
store's destination would write a value that is LoadI, actually, the
invariant is still kept. So we can safely ignore it.

Fixes https://github.com/llvm/llvm-project/issues/49271

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D132657
2022-09-29 00:51:56 +00:00
Sanjay Patel 8bfba17b40 [InstSimplify][PhaseOrdering] add tests for vector select of min/max; NFC
The phase ordering test is the almost unoptimized IR for the example
in issue #42100; it was passed through -mem2reg to reduce obvious
excessive load/store and other noise.

D134879
2022-09-29 12:06:55 -04:00
luxufan ffb2a1534d [DSE][NFC] Update noop-stores.ll using update_test_checks.py
Differential Revision: https://reviews.llvm.org/D134630
2022-09-28 23:25:33 +00:00
Nikita Popov aa25c92f33 [ValueTracking] Fix CannotBeOrderedLessThanZero() for fdiv (PR58046)
When checking the RHS of fdiv, we should set the SignBitOnly flag,
because a negative zero can become -Inf, which is ordered less
than zero.

Fixes https://github.com/llvm/llvm-project/issues/58046.

Differential Revision: https://reviews.llvm.org/D134876
2022-09-29 17:07:48 +02:00
Philip Reames 02bfe2de7c [RISCV] Adjust vector immediate store materialization cost
This change updates the costs to make constant pool loads match their actual cost, and adds the broadcast special case to avoid too many regressions. We really need more information about the constants being rematerialized, but this is an incremental improvement.

Differential Revision: https://reviews.llvm.org/D134746
2022-09-29 07:37:13 -07:00
Nikita Popov ea32658288 [InstSimplify] Add test for PR58046 (NFC) 2022-09-29 15:22:13 +02:00
Nikita Popov 412141663c Reapply [FunctionAttrs] Infer precise FMRB
The previous version of the patch would incorrect convert an
existing argmemonly attribute into an inaccessiblemem_or_argmemonly
attribute.

-----

This updates checkFunctionMemoryAccess() to infer a precise
FunctionModRefBehavior, rather than an approximation split into
read/write and argmemonly.

Afterwards, we still map this back to imprecise function attributes.
This still allows us to infer some cases that we previously did not
handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly.
In practice, this means we get better memory attributes in the
presence of intrinsics like @llvm.assume.

Differential Revision: https://reviews.llvm.org/D134527
2022-09-29 14:02:15 +02:00
Nikita Popov e7f1331910 [FunctionAttrs] Add test for argmemonly function that already has attr (NFC)
Test for the issue reported in https://reviews.llvm.org/D134527#3821010.
2022-09-29 13:56:31 +02:00
Juan Manuel MARTINEZ CAAMAÑO 52545e603b [DebugInfo][InferAddressSpaces] Propagate DebugLoc when cloning an instruction in InferAddressSpaces
Differential Revision: https://reviews.llvm.org/D134428
2022-09-29 08:43:37 +00:00
Juan Manuel MARTINEZ CAAMAÑO e9716c64ec [StructurizeCFG] Remove imposible case and replace by assert
In addition, replace outdated XFAIL test by a new one.

Differential Revision: https://reviews.llvm.org/D134439
2022-09-29 08:27:49 +00:00
Mingming Liu ac28efa6c1 [SimplifyCFG][TranformUtils]Do not simplify away a trivial basic block if both this block and at least one of its predecessors are loop latches.
- Before this patch, loop metadata (if exists) will override the metadata of each predecessor; if the predecessor block already has loop metadata, the orignal loop metadata won't be preserved and could cause missed loop transformations (see 'test2' in llvm/test/Transforms/SimplifyCFG/preserve-llvm-loop-metadata.ll).

To illustrate how inner-loop metadata might be dropped before this patch:

CFG Before

      entry
        |
        v
 ---> while.cond   ------------->  while.end
 |       |
 |       v
 |   while.body
 |       |
 |       v
 |    for.body <---- (md1)
 |       |  |______|
 |       v
 |    while.cond.exit (md2)
 |       |
 |_______|

CFG After

       entry
         |
         v
 ---> while.cond.rewrite  ------------->  while.end
 |       |
 |       v
 |   while.body
 |       |
 |       v
 |    for.body <---- (md2)
 |_______|  |______|

Basically, when 'while.cond.exit' is folded into 'while.cond', 'md2' overrides 'md1' and 'md1' is dropped from the CFG.

Differential Revision: https://reviews.llvm.org/D134152
2022-09-28 10:48:14 -07:00
Matt Arsenault 01adf1f3e5 AtomicExpand: Add some more overaligned atomic tests 2022-09-28 12:51:30 -04:00
Matt Arsenault a61c3455c0 AtomicExpand: Use llvm.ptrmask instead of ptrtoint
This removes the ptrtoint from the load's pointer operand, although we
can't entirely eliminate these to get the LSB shift. In a future
patch, this will avoid ptrtoint in the case where the atomic is
overaligned to the word size.
2022-09-28 12:51:30 -04:00
bipmis 3b49a9fcf6 [AggressiveInstCombine] Combine consecutive loads which are being merged to form a wider load.
The patch simplifies some of the patterns as below

1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)

The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc.

Fix the error reported on reverse load merge.

Differential Revision: https://reviews.llvm.org/D127392
2022-09-28 17:32:47 +01:00
bipmis 48b8dee773 remove LE,BE labels inserted incorrectly 2022-09-28 17:07:26 +01:00
Sanjay Patel e239198cdb [InstCombine] fold select shuffles with shared operand together
We don't combine generic shuffles together in IR, but select
shuffles are a special-case because a select shuffle of a
select shuffle is just another select shuffle; codegen is
expected to efficiently lower those (select shuffles are also
the canonical form of a vector select with constant condition).
2022-09-28 11:56:27 -04:00
Sanjay Patel d743aff790 [InstCombine] add tests for shuffle-of-shuffle; NFC 2022-09-28 11:56:27 -04:00
bipmis 1dd7e576d7 Add reverse load tests to test load combine patch 2022-09-28 16:51:23 +01:00
Sameer Sahasrabuddhe 3f078b308b [AAPointerInfo] OffsetInfo: Unassigned is distinct from Unknown
A User like the PHINode may be visited multiple times for the same pointer along
different def-use edges. The uninitialized state of OffsetInfo at the first
visit needs to be distinct from the Unknown value that may be assigned after
processing the PHINode. Without that, a PHINode with all inputs Unknown is never
followed to its uses. This results in incorrect optimization because some
interfering accessess are missed.

Differential Revision: https://reviews.llvm.org/D134704
2022-09-28 20:31:36 +05:30
Benjamin Kramer 0fb2676c24 Revert "[FunctionAttrs] Infer precise FMRB"
This reverts commit 97dfa53626.

It can make DSE crash. Reduced test case at
https://reviews.llvm.org/P8291
2022-09-28 16:57:43 +02:00
Igor Kirillov 2d60d7ba1a [LoopVectorize][Fix] Crash when invariant store address is calculated inside loop
Fixes #57572

Generally LICM pass is responsible for sinking out code that calculates
invariant address inside loop as it only needed to be calculated once.
But in rare case it does not happen we will not be vectorizing the
loop.

Differential Revision: https://reviews.llvm.org/D133687
2022-09-28 10:33:50 +01:00