Commit Graph

31465 Commits

Author SHA1 Message Date
Vitaly Buka 4c18670776 [NFC][sancov] Rename ModuleSanitizerCoveragePass 2022-09-06 20:55:39 -07:00
Vitaly Buka 5e38b2a456 [NFC][msan] Rename ModuleMemorySanitizerPass 2022-09-06 20:30:35 -07:00
Ruobing Han fb45f3c948 [SimpleLoopUnswitch] Skip non-trivial unswitching of cold functions
In the current main branch, all cold loops will not be applied non-trivial unswitch. As reported in D129599, skipping these cold loops will incur regression in SPEC benchmark.
Thus, instead of skipping cold loops, now only skipping loops in cold functions.

Reviewed By: alexgatea, aeubanks

Differential Revision: https://reviews.llvm.org/D133275
2022-09-06 19:13:31 -04:00
Vitaly Buka 93600eb50c [NFC][asan] Rename ModuleAddressSanitizerPass 2022-09-06 15:02:11 -07:00
Vitaly Buka e7bac3b9fa [msan] Convert Msan to ModulePass
MemorySanitizerPass function pass violatied requirement 4 of function
pass to do not insert globals. Msan nees to insert globals for origin
tracking, and paramereters tracking.

https://llvm.org/docs/WritingAnLLVMPass.html#the-functionpass-class

Reviewed By: kstoimenov, fmayer

Differential Revision: https://reviews.llvm.org/D133336
2022-09-06 15:01:04 -07:00
Vitaly Buka b4257d3bf5 [tsan] Replace mem intrinsics with calls to interceptors
After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm
intrinsics with calls to glibc functions. However this approach is
fragile, as slight changes in pipeline can return llvm intrinsics back.
In particular InstCombine can do that.

Msan/Asan already declare own version of these memory
functions for the similar purpose.

KCSAN, or anything that uses something else than compiler-rt, needs to
implement this callbacks.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D133268
2022-09-06 13:09:31 -07:00
Florian Hahn 27e7db54eb
Revert "[SCCP] convert signed div/rem to unsigned for non-negative operands"
This reverts commit fe1f3cfc26.

It looks like this commit breaks building llvm-test-suite.

To reproduce, run `opt -passes=ipsccp` on the IR below.

    @g = internal global i32 256, align 4

    define void @test() {
    entry:
      %0 = load i32, ptr @g, align 4
      %div = sdiv i32 %0, undef
      ret void
    }
2022-09-06 18:21:51 +01:00
Florian Hahn 2fb68c0628
[ConstraintElimination] Replace pair with named struct (NFC).
This slightly improves the readability and allows further extensions in
follow-ups.
2022-09-06 18:04:04 +01:00
Vitaly Buka c51a12d598 Revert "[tsan] Replace mem intrinsics with calls to interceptors"
Breaks
http://45.33.8.238/macm1/43944/step_4.txt
https://lab.llvm.org/buildbot/#/builders/70/builds/26926

This reverts commit 77654a65a3.
2022-09-06 09:47:33 -07:00
Sanjay Patel ae117e1c1b [InstCombine] remove dead code for add (select cond, (sub), 0); NFC
This pattern is handled more generally in SimplifySelectsFeedingBinaryOp().
Tests to confirm that added to the add.ll test file in the previous commit.
2022-09-06 12:19:50 -04:00
Doru Bercea 0b1160fdeb Fix OpenMP Opt for target without a parallel region.
Remove ctx redeclaration.

Format code.

Remove parallel check. Modify tests. Clean-up code.

Fix another test.

Move code to helper functions.

Format file.

Minor fixes.
2022-09-06 16:04:53 +00:00
Vitaly Buka 77654a65a3 [tsan] Replace mem intrinsics with calls to interceptors
After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm
intrinsics with calls to glibc functions. However this approach is
fragile, as slight changes in pipeline can return llvm intrinsics back.
In particular InstCombine can do that.

Msan/Asan already declare own version of these memory
functions for the similar purpose.

KCSAN, or anything that uses something else than compiler-rt, needs to
implement this callbacks.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D133268
2022-09-06 08:25:32 -07:00
Sanjay Patel fe1f3cfc26 [SCCP] convert signed div/rem to unsigned for non-negative operands
This extends the transform added with D81756 to handle div/rem opcodes.
For example:
https://alive2.llvm.org/ce/z/cX6za6

This replicates part of what CVP already does, but the motivating example
from issue #57472 demonstrates a phase ordering problem - we convert
branches to select before CVP runs and miss the transform.

Differential Revision: https://reviews.llvm.org/D133198
2022-09-06 08:58:15 -04:00
Sanjay Patel dd6eb4d67f [InstCombine] reduce code duplication; NFC 2022-09-06 08:19:30 -04:00
Arthur Eubanks 7e3aa8f01a Revert "[LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests"
This reverts commit 57fd866551.

Causes crashes, see comments in D132581.
2022-09-05 15:42:48 -07:00
Momchil Velikov 078899cd64 [SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions
SimplifyCFG does some common code hoisting, which is limited
to hoisting a sequence of identical instruction in identical
order and stops at the first non-identical instruction.

This patch allows hoisting instruction pairs over
same-length sequences of non-matching instructions. The
linear asymptotic complexity of the algorithm stays the
same, there's an extra parameter
`simplifycfg-hoist-common-skip-limit` serving to limit
compilation time and/or the size of the hoisted live ranges.

The patch improves SPECv6/525.x264_r by about 10%.

Reviewed By: nikic, dmgreen

Differential Revision: https://reviews.llvm.org/D129370
2022-09-05 15:13:46 +01:00
Tian Zhou 8fa432be4f [InstCombine] reduce test-for-overflow of shifted value
Fixes #57338.

The added code makes the following transformations:

For unsigned predicates / eq / ne:
icmp pred (x << 1), x --> icmp getSignedPredicate(pred) x, 0
icmp pred x, (x << 1) --> icmp getSignedPredicate(pred) 0, x

Some examples:
https://alive2.llvm.org/ce/z/ckn4cj
https://alive2.llvm.org/ce/z/h-4bAQ

Differential Revision: https://reviews.llvm.org/D132888
2022-09-05 09:51:51 -04:00
Florian Hahn 408ebe5e3a
[VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC).
Depends on D132585.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132586
2022-09-05 10:48:29 +01:00
Nikita Popov 388b684354 [LICM] Separate check for writability and thread-safety (NFCI)
This used a single check to make sure that the object is both
writable and thread-local. Separate them out to make the
deficiencies in the current code more obvious.
2022-09-05 09:43:17 +02:00
Florian Hahn ba3d29f871
[LCSSA] Update unreachable uses with poison.
Users of LCSSA may not expect non-phi uses when checking the uses
outside a loop, which may cause crashes. This is due to the fact that we
do not update uses in unreachable blocks.

To ensure all reachable uses outside the loop are phis, update uses in
unreachable blocks to use poison in dead code.

Fixes #57508.
2022-09-04 22:26:18 +01:00
Kazu Hirata 7d8c2d17eb [llvm] Use range-based for loops (NFC)
Identified with modernize-loop-convert.
2022-09-03 23:27:25 -07:00
Fangrui Song 9fc679b87c [SanitizerCoverage] Simplify pc-table and improve test. NFC 2022-09-03 14:29:21 -07:00
Kazu Hirata 9eca5ed790 [llvm] Use std::enable_if_t (NFC) 2022-09-03 11:17:44 -07:00
Kazu Hirata fedc59734a [llvm] Use range-based for loops (NFC) 2022-09-03 11:17:40 -07:00
Sanjay Patel 22e1f66f26 [SCCP] add helper function for replacing signed operations; NFC
Preliminary refactoring for planned enhancement in D133198.
2022-09-03 10:30:10 -04:00
Sanjay Patel 5c759edc57 [InstCombine] reduce another or-xor bitwise logic pattern
~(A & ?) | (A ^ B) --> ~((A & ?) & B)
https://alive2.llvm.org/ce/z/mxex6V

This is similar to 9d218b61cc where we peeked through
another logic op to find a common operand.
2022-09-03 09:32:08 -04:00
Richard Smith 053841c562 Revert "[AggressiveInstCombine] Lower Table Based CTTZ"
This reverts commit fec01ee3f5.

According to asan, this patch introduces a heap use after free.
2022-09-02 16:19:09 -07:00
Francis Visoiu Mistrih c5b10f348e [Matrix] Use print instead of dump for matrix-print-after-transpose-opt
We should be able to use this option even if LLVM_ENABLE_DUMP is not on.

(should fix the bots too)
2022-09-02 16:12:21 -07:00
Francis Visoiu Mistrih 81bdb4068d [Matrix] Simplify matmuls with scalars
If one of the operands is a transposed splat, the transpose can be
removed.

This is useful to simplify when transposes are distributed to operands
of a matmul:

* k^T -> k
* (A * k)^t -> A^t * k

Differential Revision: https://reviews.llvm.org/D130177
2022-09-02 15:50:25 -07:00
Sameer Sahasrabuddhe 46b293cb3f [Attributor] Simplify offset calculation for a constant GEP
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D132931
2022-09-02 23:53:51 +05:30
Arthur Eubanks 57fd866551 [LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests
The current code is basically just emulating what the analysis manager does.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D132581
2022-09-02 10:55:53 -07:00
Djordje Todorovic fec01ee3f5 [AggressiveInstCombine] Lower Table Based CTTZ
This patch introduces recognition of table-based ctz implementation
during the AggressiveInstCombine.

This fixes the [0].

[0] https://bugs.llvm.org/show_bug.cgi?id=46434

Differential Revision: https://reviews.llvm.org/D113291
2022-09-02 17:26:55 +02:00
Jolanta Jensen 958abe864a [LoopLoadElim] Add stores with matching sizes as load-store candidates
We are not building up a proper list of load-store candidates because
we are throwing away stores where the type don't match the load.
This patch adds stores with matching store sizes as candidates.
Author of the original patch: David Sherwood.

Differential Revision: https://reviews.llvm.org/D130233
2022-09-02 13:11:25 +01:00
Muhammad Omair Javaid 18de7c6a3b Revert "[InstCombine] Treat passing undef to noundef params as UB"
This reverts commit c911befaec.

It has broken LLDB Arm/AArch64 Linux buildbots. I dont really understand
the underlying reason. Reverting for now make buildbot green.

https://reviews.llvm.org/D133036
2022-09-02 16:09:50 +05:00
Mikael Holmen 51d4c7ceea [GlobalOpt] Fix debug variance problem in hasOnlyColdCalls
hasOnlyColdCalls skipped over calls to intrinsics, but it did so after
checking the linkage of the called function. This meant that the presence
of a call to a debug intrinsic could affect the outcome of the
optimization.

In my original reproducer (for an out of tree target) it was particularly
interesting, because the actual IR after GlobalOpt was not different with
debug instrinsics present, so -print-after-all printouts didn't show
anything there.

However, without debuginfo, GlobalOpt went further and ran
BlockFrequencyAnalysis and (more importanly) LoopAnalysis, and later on in
the pipeline, instcombine behaved in different ways when LoopInfo was
present.

So a call to a dbg.declare prevented running LoopAnalysis in
GlobalOpt, which later prevented InstCombine from doing an optimization.

The dbg-intrinsic-loopanalysis.ll testcase tries to expose this.

Then I also noted that adding a dbg.declare actually made the existing
testcase colccc_coldsites.ll generate different code, so I modified that
to now test it behaves the same way with and without the dbg.declare.

Reviewed By: nikic, fhahn

Differential Revision: https://reviews.llvm.org/D133193
2022-09-02 12:29:44 +02:00
Sergey Kachkov be37caca00 [JumpThreading] Process range comparisions with non-local cmp instructions
Use getPredicateOnEdge method if value is a non-local
compare-with-a-constant instruction, that can give more precise
results than getConstantOnEdge.

Differential Revision: https://reviews.llvm.org/D131956
2022-09-02 12:22:45 +02:00
Nikita Popov c453e5b901 Revert "[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI"
This reverts commit cd8f3e7581.

As pointed out by Eli on the review, this is missing an alignment
check. The value might be written at an offset.
2022-09-02 09:28:48 +02:00
Nikita Popov 639d912282 [LICM] Allow load-only scalar promotion in the presence of unwinding
Currently, we bail out of scalar promotion if the loop may unwind
and the memory may be visible on unwind. This is because we can't
insert stores of the promoted value on unwind edges.

However, nowadays scalar promotion also has support for only
promoting loads, while leaving stores in place. This kind of
promotion is safe even in the presence of unwinding.

Differential Revision: https://reviews.llvm.org/D133111
2022-09-02 09:27:13 +02:00
luxufan cd8f3e7581 [DSE] Eliminate noop store even through has clobbering between LoadI and StoreI
For noop store of the form of LoadI and StoreI,
An invariant should be kept is that the memory state of the related
MemoryLoc before LoadI is the same as before StoreI.
For this example:
```
define void @pr49927(i32* %q, i32* %p) {
  %v = load i32, i32* %p, align 4
  store i32 %v, i32* %q, align 4
  store i32 %v, i32* %p, align 4
  ret void
}
```
Here the definition of the store's destination is different with the
definition of the load's destination, which it seems that the
invariant mentioned above is broken. But the definition of the
store's destination would write a value that is LoadI, actually, the
invariant is still kept. So we can safely ignore it.

Differential Revision: https://reviews.llvm.org/D132657
2022-09-02 06:37:41 +00:00
Vitaly Buka ad3a77df2d [msan] Fix debug info with getNextNode
When we want to add instrumentation after
an instruction, instrumentation still should
keep debug info of the instruction.

Reviewed By: kda, kstoimenov

Differential Revision: https://reviews.llvm.org/D133091
2022-09-01 20:13:56 -07:00
Chenbing Zheng d30cf77cb1 [InstCombine] complete fold extractvalue (any_mul_with_overflow X, -1)
When we do extractvalue (any_mul_with_overflow X, -1) --> (-X and icmp),
which left partly failed to match vector constant with poison element.
This patch try to fix it.

Alive2: https://alive2.llvm.org/ce/z/2rGp_3

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D132996
2022-09-02 10:58:42 +08:00
Vitaly Buka ad2b356f85 [msan] Use no-origin functions when possible
Saves 1.8% of .text size on CTMark

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133077
2022-09-01 19:18:38 -07:00
Arthur Eubanks c911befaec [InstCombine] Treat passing undef to noundef params as UB
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D133036
2022-09-01 15:16:45 -07:00
Rong Xu 0caa4a9559 [PGO] Support PGO annotation of CallBrInst
We currently instrument CallBrInst but do not annotate it with
the branch weight. This patch enables PGO annotation of CallBrInst.

Differential Revision: https://reviews.llvm.org/D133040
2022-09-01 14:13:50 -07:00
Vitaly Buka ef0f866718 [msan] Combine shadow check of the same instruction
Reduces .text size by 1% on our large binary.

On CTMark (-O2 -fsanitize=memory -fsanitize-memory-use-after-dtor -fsanitize-memory-param-retval)
Size -0.4%
Time -0.8%

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133071
2022-09-01 13:55:59 -07:00
Vitaly Buka 9110673062 [nfc][msan] Group checks per instruction
It's a preparation of to combine shadow checks of the same instruction

Reviewed By: kda, kstoimenov

Differential Revision: https://reviews.llvm.org/D133065
2022-09-01 13:10:16 -07:00
Jordan Rupprecht 3031a250de [MSan] Fix determinism issue when using msan-track-origins.
When instrumenting `alloca`s, we use a `SmallSet` (i.e. `SmallPtrSet`). When there are fewer elements than the `SmallSet` size, it behaves like a vector, offering stable iteration order. Once we have too many `alloca`s to instrument, the iteration order becomes unstable. This manifests as non-deterministic builds because of the global constant we create while instrumenting the alloca.

The test added is a simple IR file, but was discovered while building `libcxx/src/filesystem/operations.cpp` from libc++. A reduced C++ example from that:

```
// clang++ -fsanitize=memory -fsanitize-memory-track-origins \
//   -fno-discard-value-names -S -emit-llvm \
//   -c op.cpp -o op.ll
struct Foo {
  ~Foo();
};
bool func1(Foo);
void func2(Foo);
void func3(int) {
  int f_st, t_st;
  Foo f, t;
  func1(f) || func1(f) || func1(t) || func1(f) && func1(t);
  func2(f);
}
```

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133034
2022-09-01 09:15:57 -07:00
Nuno Lopes 858fe8664e Expand Div/Rem: consider the case where the dividend is zero
So we can't use ctlz in poison-producing mode
2022-09-01 17:04:26 +01:00
Nikita Popov f5c178b6a4 [LICM] Remove unnecessary condition (NFC) 2022-09-01 15:42:35 +02:00
Nikita Popov 315aef667e [LICM] Fix thread safety checks for promotion of byval args
This code was relying on a very subtle contract: The expectation
was that for non-allocas, the unwind safety check would already
perform a capture check, so we don't need to perform it later.
This held true when this unwind safety was only handled for allocas
and noalias calls, but became incorrect when byval support was
added.

To avoid this kind of issue, just remove the dependency between the
unwind and thread-safety checks entirely. At worst, this means we
perform a redundant capture check. If this should turn out to be
problematic for compile-time, we can cache that query in a more
explicit way.
2022-09-01 15:33:46 +02:00
Sanjay Patel c3d1504d63 [InstCombine] fix crash on type mismatch with fcmp fold
The existing predicate doesn't work for a single-element
vector, so make sure we are not crossing scalar/vector types.

Test (was crashing) based on the post-commit example for:
4827771234
2022-09-01 08:57:55 -04:00
Sanjay Patel addbdac5d5 [InstCombine] fold power-of-2 ctlz/cttz with inverted result
When X is a power-of-two or zero and zero input is poison:
ctlz(i32 X) ^ 31 --> cttz(X)
cttz(i32 X) ^ 31 --> ctlz(X)

https://alive2.llvm.org/ce/z/Cs7sFE
2022-09-01 08:57:55 -04:00
Nikita Popov 3f8b1d0f15 [LICM] Add some debug output to scalar promotion (NFC) 2022-09-01 14:46:30 +02:00
Alexey Bataev 982d9ef1c1 [SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison.
Need either follow the original order of the operands for bool logical
ops, or emit freeze instruction to avoid poison propagation.

Differential Revision: https://reviews.llvm.org/D126877
2022-09-01 05:34:45 -07:00
Yuanbo Li ebd0249fcf [DebugInfo] Missing debug location after replacement in processSRem function
This patch fixes an issue in which CorrelatedValuePropagation::processSRem
would create new instructions to represent the SRem instruction, but would not
correctly copy any existing debug location metadata to the new instruction.

Differential Revision: https://reviews.llvm.org/D132218
2022-09-01 13:18:17 +01:00
Florian Hahn fc444ddc77
[VPlan] Add field to track if intrinsic should be used for call. (NFC)
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132585
2022-09-01 13:14:40 +01:00
Nuno Lopes fa154a9170 Revert "Expand Div/Rem: consider the case where the dividend is zero"
This reverts commit 4aed09868b.
2022-09-01 12:11:22 +01:00
Nuno Lopes 4aed09868b Expand Div/Rem: consider the case where the dividend is zero
So we can't use ctlz in poison-producing mode
2022-09-01 12:00:03 +01:00
Pavel Samolysov 527b9a9d90 [DeadArgElim] Use structure bindings in foreach loops. NFC
Differential Revision: https://reviews.llvm.org/D133026
2022-09-01 13:48:46 +03:00
Nikita Popov 43e7d9af1d [InstCombine] Fold extractvalue of phi
Just as we do for most other operations, we should push
extractvalue instructions through phis, if this does not increase
unfolded instruction count.
2022-09-01 10:51:54 +02:00
Arthur Eubanks 04f3c20989 [NFC][LICM] Stop passing around unused BFI
Uses of this were removed in 1a25d0bfbb.
2022-08-31 19:15:34 -07:00
Vitaly Buka 53d1ae88f8 [nfc][msan] Prepare the code for check sorting 2022-08-31 15:36:49 -07:00
Nikita Popov ab6876a40d reland: [Local] Allow creating callbr with duplicate successors
Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs.

This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in 8201e3ef5c.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D129997

Originally landed as
commit 08860f525a ("[Local] Allow creating callbr with duplicate successors")

Reverted in
commit 1cf6b93df1 ("Revert "[Local] Allow creating callbr with duplicate successors"")
2022-08-31 13:23:00 -07:00
Alexey Bataev 588115c117 [SLP][NFC]Add a check for SelectInst to match description, NFC. 2022-08-31 13:04:21 -07:00
Alexey Bataev d8d9ee10bb [SLP][NFC]Fix comment and make function following naming standard, NFC. 2022-08-31 12:37:55 -07:00
Philip Reames 8524622bdc [SLP] Simplify getOperandInfo implementation and be consistent
This is NOT nfc.  Specifically, the following behavior changes:
* Pointers are now allowed.  Both uniform, and constants.
* FP uniform non-constants can now be recognized.
* FP undefs are no longer considered constant.  This matches int behavior which we had tests for.  FP behavior was untested.  Its not clear to me int behavior is reasonable, but it's what tests seem to expect, so go with minimum impact for now.
2022-08-31 12:24:05 -07:00
Nikita Popov ad66bc42b0 [InstCombine] Use getInsertionPointAfterDef() in freeze fold
This simplifies the code and fixes handling of catchswitch, in
which case we have no insertion point for the freeze.

Originally part of D129660.
2022-08-31 11:32:57 +02:00
Nikita Popov 8f3fd26b74 [Reassociate] Use getInsertionPointerAfterDef()
This simplifies the code and fixes handling for the callbr case,
where the instruction needs to be inserted in the normal
destination, rather than after the terminator.

Originally part of D129660.
2022-08-31 11:10:24 +02:00
Nikita Popov 972840aa3b [IR] Add Instruction::getInsertionPointAfterDef()
Transforms occasionally want to insert an instruction directly
after the definition point of a value. This involves quite a few
different edge cases, e.g. for phi nodes the next insertion point
is not the next instruction, and for invokes and callbrs its not
even in the same block. Additionally, the insertion point may not
exist at all if catchswitch is involved.

This adds a general Instruction::getInsertionPointAfterDef() API to
implement the necessary logic. For now it is used in two places
where this should be mostly NFC. I will follow up with additional
uses where this fixes specific bugs in the existing implementations.

Differential Revision: https://reviews.llvm.org/D129660
2022-08-31 10:50:10 +02:00
Fangrui Song 13f0795425 [SLPVectorizer] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off build 2022-08-30 23:01:22 -07:00
Chenbing Zheng 35a3048c25 [InstCombine] add support for multi-use Y of (X op Y) op Z --> (Y op Z) op X
For (X op Y) op Z --> (Y op Z) op X
we can still do transform when Y is multi-use. In D131356 limit it to one-use,
this patch remove this limit.

This is still not a complete solution, I add a todo test to show it.
In this case, X and Y are both multi use, we can't differentiate how to convert based on this.
But at least we don't make the code worse,and it can solve half the scenarios.
2022-08-31 10:55:05 +08:00
Alexey Bataev ec06df9459 [SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed.
The pointer operands for the ScatterVectorize node may contain
non-instruction values and they are not checked for "already being
vectorized". Need to check that such pointers are already vectorized and
gather them instead of trying to build vectorize node to avoid compiler
crash.

Differential Revision: https://reviews.llvm.org/D132949
2022-08-30 12:30:14 -07:00
Sanjay Patel 8a19842c0e [InstCombine] delete redundant folds; NFC
InstSimplify does this via isKnownNonEqual(), so it's already
using knownbits on these patterns and trying other folds.
2022-08-30 14:21:29 -04:00
Alexey Bataev afbf5466ba [SLP]Improve operands kind analaysis for constants.
Removed EnableFP parameter in getOperandInfo function since it is not
needed, the operands kinds also controlled by the operation code, which
allows to remove extra check for the type of the operands. Also, added
analysis for uniform constant float values.

This change currently does not trigger any changes in the code since TTI
does not do analysis for constant floats, so it can be considered NFC.
Tested with llvm-test-suite + SPEC2017, no changes.

Differential Revision: https://reviews.llvm.org/D132886
2022-08-30 06:35:39 -07:00
zhongyunde 23a5de4294 [InstCombine] Distributive or+mul with const operand
We aleady support the transform: `(X+C1)*CI -> X*CI+C1*CI`
Here the case is a little special as the form of `(X+C1)*CI` is transformed into `(X|C1)*CI`,
so we should also support the transform: `(X|C1)*CI -> X*CI+C1*CI`
Fixes https://github.com/llvm/llvm-project/issues/57278

Reviewed By: bcl5980, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D132658
2022-08-30 20:36:52 +08:00
Florian Hahn b5e208fcba
[DSE] Support looking through memory phis at end of function.
Update isWriteAtEndOfFunction to look through MemoryPhis. The reason
MemoryPhis were skipped so far was the known AliasAnalysis issue with it
missing loop-carried dependences.

This problem is already addressed in other parts of the code by skipping
MemoryDefs that may be in difference loops. I think the same logic can
be applied here.

This can have a substantial impact on the number of stores removed in
some cases. For MultiSource/SPEC2006/SPEC2017 with -O3:

```
Metric: dse.NumFastStores

Program                                       dse.NumFastStores
                                              base              patch   diff
External/S...CINT2017rate/557.xz_r/557.xz_r     14.00             45.00 221.4%
External/S...te/538.imagick_r/538.imagick_r    439.00           1267.00 188.6%
MultiSourc...e/Applications/SIBsim4/SIBsim4      6.00             15.00 150.0%
MultiSourc...Prolangs-C/simulator/simulator      3.00              7.00 133.3%
MultiSource/Applications/siod/siod               3.00              7.00 133.3%
MultiSourc...arks/FreeBench/distray/distray      6.00              9.00  50.0%
MultiSourc...e/Applications/obsequi/Obsequi     22.00             30.00  36.4%
MultiSource/Benchmarks/Ptrdist/bc/bc            23.00             28.00  21.7%
External/S...NT2017rate/502.gcc_r/502.gcc_r   1258.00           1512.00  20.2%
External/S...te/520.omnetpp_r/520.omnetpp_r    954.00           1143.00  19.8%
External/S...rate/510.parest_r/510.parest_r   5961.00           7122.00  19.5%
External/S...C/CINT2006/445.gobmk/445.gobmk     47.00             56.00  19.1%
External/S...00.perlbench_r/500.perlbench_r    241.00            286.00  18.7%
External/S...NT2006/471.omnetpp/471.omnetpp     36.00             42.00  16.7%
External/S...06/400.perlbench/400.perlbench    183.00            210.00  14.8%
MultiSource/Applications/SPASS/SPASS            72.00             81.00  12.5%
External/S...17rate/541.leela_r/541.leela_r     72.00             80.00  11.1%
External/SPEC/CINT2006/403.gcc/403.gcc         585.00            642.00   9.7%
MultiSourc...e/Applications/sqlite3/sqlite3    120.00            131.00   9.2%
MultiSourc...Applications/hexxagon/hexxagon     11.00             12.00   9.1%
External/S.../CFP2006/453.povray/453.povray    566.00            615.00   8.7%
External/S...rate/511.povray_r/511.povray_r    578.00            627.00   8.5%
External/S...FP2006/482.sphinx3/482.sphinx3     12.00             13.00   8.3%
MultiSource/Applications/oggenc/oggenc         130.00            140.00   7.7%
MultiSourc...e/Applications/ClamAV/clamscan    250.00            268.00   7.2%
MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg     19.00             20.00   5.3%
MultiSourc...ch/consumer-jpeg/consumer-jpeg     19.00             20.00   5.3%
External/S...te/526.blender_r/526.blender_r   3747.00           3928.00   4.8%
MultiSourc...OE-ProxyApps-C++/miniFE/miniFE    104.00            108.00   3.8%
MultiSourc...ch/consumer-lame/consumer-lame     54.00             56.00   3.7%
MultiSource/Benchmarks/Bullet/bullet          1222.00           1264.00   3.4%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4    973.00           1005.00   3.3%
External/S.../CFP2006/447.dealII/447.dealII   2699.00           2780.00   3.0%
External/S...06/483.xalancbmk/483.xalancbmk    788.00            810.00   2.8%
External/S.../CFP2006/450.soplex/450.soplex    180.00            185.00   2.8%
MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR    338.00            345.00   2.1%
MultiSourc...Benchmarks/7zip/7zip-benchmark    685.00            699.00   2.0%
External/S...FP2017rate/544.nab_r/544.nab_r    158.00            160.00   1.3%
MultiSourc...sumer-typeset/consumer-typeset    772.00            781.00   1.2%
External/S...2017rate/525.x264_r/525.x264_r    410.00            414.00   1.0%
External/S...23.xalancbmk_r/523.xalancbmk_r    998.00           1002.00   0.4%
```

Compile-time is almost neutral:

https://llvm-compile-time-tracker.com/compare.php?from=b3125ad3d60531a97eea20009cc9629a87755862&to=84007eee59004f43464eda7f5ba8263ed5158df8&stat=instructions

NewPM-O3: +0.03%
NewPM-ReleaseThinLTO: -0.01%
NewPM-ReleaseLTO-g: +0.03%

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D132365
2022-08-30 13:27:51 +01:00
OCHyams 84a71d5259 [DebugInfo] Fix line number attribution in mldst-motion
Taking the example from the test included in this patch:

$ cat test.cpp -n
     1	void fun(int *a, int cond) {
     2	  if (cond)
     3	    a[1] = 1;
     4	  else
     5	    a[1] = 2;
     6	}

mldst-motion will merge and sink the stores in if.then and if.else into
if.end. The resultant PHI, gep and store should be attributed line zero
with the innermost common scope rather than picking a debug location from
one of the original stores.

Reviewed By: djtodoro

Differential Revision: https://reviews.llvm.org/D132741
2022-08-30 10:03:53 +01:00
jacquesguan df525c7705 [InstCombine] fold fake floating point vector extract to shift+trunc.
This patch supports the FP part of D111082.

Differential Revision: https://reviews.llvm.org/D125750
2022-08-30 10:12:16 +08:00
Rong Xu d7ef0c3970 [llvm-profdata] Improve profile supplementation
Current implementation promotes a non-cold function in the SampleFDO profile
into a hot function in the FDO profile. This is too aggressive. This patch
promotes a hot functions in the SampleFDO profile into a hot function, and a
warm function in SampleFDO into a warm function in FDO.

Differential Revision: https://reviews.llvm.org/D132601
2022-08-29 16:50:42 -07:00
Philip Reames 8936d86469 [LV] Add debug output for force scalar tracing [nfc]
I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.
2022-08-29 15:17:51 -07:00
Valery N Dmitriev 329b972d41 [SLP] Try to match reductions before trying to vectorize a vector build sequence.
This patch changes order of searching for reductions vs other vectorization possibilities.
The idea is if we do not match a reduction it won't be harmful for further attempts to
find vectorizable operations on a vector build sequences. But doing it in the opposite
order we have good chance to ruin opportunity to match a reduction later.
We also don't want to try vectorizing binary operations too early as 2-way vectorization
may effectively prohibit wider ones leading to producing less effective code.

Differential Revision: https://reviews.llvm.org/D132590
2022-08-29 13:32:14 -07:00
Philip Reames 033a97a8f3 [LV] Minor code restructure of isUniformAfterVectorization [nfc]
Mostly just to make a future patch easier to review.
2022-08-29 12:48:27 -07:00
Philip Reames c37b1a5f76 [RLEV] Pick a correct insert point when incoming instruction is itself a phi node
This fixes https://github.com/llvm/llvm-project/issues/57336. It was exposed by a recent SCEV change, but appears to have been a long standing issue.

Note that the whole insert into the loop instead of a split exit edge is slightly contrived to begin with; it's there solely because IndVarSimplify preserves the CFG.

Differential Revision: https://reviews.llvm.org/D132571
2022-08-29 11:44:33 -07:00
Alexey Bataev beacf9bd9e [SLP]Fix PR57322: vectorize constant float stores.
Stores for constant floats must be vectorized, improve analysis in SLP
vectorizer for stores.

Differential Revision: https://reviews.llvm.org/D132750
2022-08-29 11:02:53 -07:00
Alexey Bataev e6345bf644 [SLP]Improve lookup of the buildvector top insertelement instruction.
When estimating the cost of the in-tree vectorized scalars in
buildvector sequences, need to take into account the vectorized
insertelement instruction. The top of the buildvector seuences is the
topmost vectorized insertelement instruction, because it will have
> than 1 use after the vectorization.

For the affected test case improves througput from 21 to 16 (per
llvm-mca).

Differential Revision: https://reviews.llvm.org/D132740
2022-08-29 08:19:52 -07:00
Sanjay Patel 6c39a3aae1 [InstCombine] fold not-shift of signbit to icmp+zext
https://alive2.llvm.org/ce/z/j_8Wz9

The arithmetic shift was converted to logical shift with:
246078604c

That does not seem to uncover any other missing/conflicting folds,
so convert directly to signbit test + cast.

We still need to fold the pattern with logical shift to test + cast.

This allows reducing patterns where the output type is not
the same as the input value:
https://alive2.llvm.org/ce/z/nydwFV

Fixes #57394
2022-08-29 10:06:31 -04:00
Sanjay Patel 246078604c [InstCombine] fold inc-of-signbit-splat to not+lshr
(iN X s>> (N - 1)) + 1 --> (~X) u>> (N - 1)

https://alive2.llvm.org/ce/z/wzS474
2022-08-29 08:48:22 -04:00
Florian Hahn c78696813f
[LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC).
Suggested as independent fix during the review of D132585.
2022-08-29 10:19:47 +01:00
Kazu Hirata 2ad7fd3ac7 [Instrumentation] Use std::clamp (NFC)
The use of std::clamp should be safe here.  MinRZ is at most 32, while
kMaxRZ is 1 << 18, so we have MinRZ <= kMaxRZ, avoiding the undefind
behavior of std::clamp.
2022-08-28 23:28:57 -07:00
Kazu Hirata c63f823875 [llvm] Use range-based for loops (NFC) 2022-08-28 17:35:04 -07:00
Sanjay Patel ab6892967c [InstCombine] allow sext in fold of mask using signbit, part 2
https://alive2.llvm.org/ce/z/rcbZmx

Sibling tranform to 275aa24c0a

This pattern is seen in the examples in issue #57381.
2022-08-28 11:50:52 -04:00
zhongyunde 84d6966e4d [InstCombine] Propagate the nuw for combine of add+mul
As the commit of D132658, make the 'nuw' change separately.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D132777
2022-08-28 23:01:11 +08:00
Florian Hahn af98b875e8
[VPlan] Use range check in VPHeaderPHIRecipe::classof (NFC).
This addresses a suggestion to simplify the check from D131989. This
also makes it easier to ensure that VPHeaderPHIRecipe::classof checks
for all header phi ids.
2022-08-28 15:54:12 +01:00
Sanjay Patel 275aa24c0a [InstCombine] allow sext in fold of mask using signbit
~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y -- with optional sext

https://alive2.llvm.org/ce/z/wFFnZT
2022-08-28 09:01:30 -04:00
Kazu Hirata b18ff9c461 [Transform] Use range-based for loops (NFC) 2022-08-27 23:54:32 -07:00
Kazu Hirata d0166c617d [Utils] Remove redundaunt declarations (NFC)
Identified with readability-redundant-declaration.
2022-08-27 23:54:31 -07:00
Kazu Hirata d1688e9ddf [llvm] Use std::gcd (NFC)
This patch replaces calls to greatestCommonDivisor with std::gcd where
both arguments are known to be of unsigned.  This means that
std::common_type_t of the two argument types should just be the wider
one of the two.
2022-08-27 23:54:29 -07:00
Kazu Hirata 56ea4f9bd3 [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-27 21:21:02 -07:00
Kazu Hirata 7a617fdf39 Use std::gcd (NFC)
This patch replaces calls to GreatestCommonDivisor64 with std::gcd
where both arguments are known to be of unsigned types no larger than
64 bits in size.
2022-08-27 21:20:59 -07:00
Florian Hahn 7743badafa
[VPlan] Verify that header only contains header phi recipes.
Add verification that VPHeaderPHIRecipes are only in header VPBBs. Also
adds missing checks for VPPointerInductionRecipe to
VPHeaderPHIRecipe::classof.

Split off from D119661.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D131989
2022-08-27 22:06:12 +01:00