Commit Graph

27221 Commits

Author SHA1 Message Date
Sanjay Patel e808289fe6 [IndVars] avoid crash in LFTR when assuming an add recurrence
The test is a crasher reduced from:
https://llvm.org/PR49993

linearFunctionTestReplace() assumes that we have an add recurrence,
so check for that as a condition of matching a loop counter.

Differential Revision: https://reviews.llvm.org/D101291
2021-04-27 08:26:02 -04:00
Florian Hahn 160e729cf0
[VPlan] Use recursive traversal iterator in VPSlotTracker.
This patch simplifies VPSlotTracker by using the recursive traversal
iterator to traverse all blocks in a VPlan in reverse post-order when
numbering VPValues in a plan.

This depends on a fix to RPOT (D100169). It also extends the traversal
unit tests to check RPOT.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D100176
2021-04-27 12:39:06 +01:00
Vitaly Buka f2a585e6d3 [NFC] Fix "not used" warning 2021-04-26 22:09:23 -07:00
Arthur Eubanks fd1ff5ee03 [Inliner] Make ModuleInlinerWrapperPass return PreservedAnalyses::all()
The ModulePassManager should already have taken care of all analysis
invalidation. Without this change, upcoming changes will cause more
invalidation than necessary.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D101320
2021-04-26 17:22:35 -07:00
William S. Moses 7aa3cad46a [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 20:12:12 -04:00
Hongtao Yu 30bb5be389 [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2.
As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

The optimizations unblocked are:
- In-block load propagation.
- In-block dead store elimination.
- Memory copy optimization that turns stores to consecutive memories into a memset.

These optimizations are local to a block, so they shouldn't affect the profile quality.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100075
2021-04-26 16:52:33 -07:00
Fangrui Song 18839be9c5 [ADT] Remove StatisticBase and make NoopStatistic empty
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 mostly unused pointers. GlobalOpt considers that the pointers can
potentially retain allocated objects, so GlobalOpt cannot optimize out the
`NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage
2 clang.

This patch makes `NoopStatistic` empty and thus reclaims the wasted space.  The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238

# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
    FILE SIZE        VM SIZE
 --------------  --------------
  -0.0%     -32  -0.0%     -24    .eh_frame
  -0.0%    -336  [ = ]       0    .symtab
  -0.0%    -360  [ = ]       0    .strtab
  [ = ]       0  -0.2%    -880    .bss
  -0.0% -2.11Ki  -0.0% -2.11Ki    .rodata
  -0.0% -2.89Ki  -0.0% -2.89Ki    .text
  -0.0% -5.71Ki  -0.0% -5.88Ki    TOTAL
```

Note: LoopFuse is a disabled pass. For now this patch adds
`#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in
LLVM_ENABLE_STATS==0 builds.  If these `OptimizationRemarkMissed` are useful in
LLVM_ENABLE_STATS==0 builds, we can replace `llvm::Statistic` with
`llvm::TrackingStatistic`, or use a different abstraction to keep track of the strings.

Similarly, skip the code in `mlir/lib/Pass/PassStatistics.cpp` which
calls `getName`/`getDesc`/`getValue`.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D101211
2021-04-26 16:47:32 -07:00
William S. Moses 8ede96493c Revert "[NVPTX] Enable lowering of atomics on local memory"
This reverts commit fede99d386.
2021-04-26 19:33:01 -04:00
William S. Moses fede99d386 [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 19:27:27 -04:00
Lei Zhang 254e289d45 Revert "[ADT] Remove StatisticBase and make NoopStatistic empty"
This reverts commit b540311781
because it breaks MLIR build:

https://buildkite.com/mlir/mlir-core/builds/13299#ad0f8901-dfa4-43cf-81b8-7940e2c6c15b
2021-04-26 18:31:04 -04:00
Michael Kruse b99466eb45 [SimplifyCFG] Preserve metadata when unconditionalizing branches (same target).
When replacing a conditional branch by an unconditional one because the targets are identical, transfer the metadata to the new branch instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101226
2021-04-26 17:23:01 -05:00
Fangrui Song b540311781 [ADT] Remove StatisticBase and make NoopStatistic empty
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 unused pointers. GlobalOpt considers that the pointers can potentially
retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic`
variables (see D69428 for more context), wasting 23KiB for stage 2 clang.

This patch makes `NoopStatistic` empty and thus reclaims the wasted space.  The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238

# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
    FILE SIZE        VM SIZE
 --------------  --------------
  -0.0%     -32  -0.0%     -24    .eh_frame
  -0.0%    -336  [ = ]       0    .symtab
  -0.0%    -360  [ = ]       0    .strtab
  [ = ]       0  -0.2%    -880    .bss
  -0.0% -2.11Ki  -0.0% -2.11Ki    .rodata
  -0.0% -2.89Ki  -0.0% -2.89Ki    .text
  -0.0% -5.71Ki  -0.0% -5.88Ki    TOTAL
```

Note: LoopFuse is a disabled pass. This patch adds `#if LLVM_ENABLE_STATS` so
`OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds.  If these
`OptimizationRemarkMissed` are useful and not noisy, we can replace
`llvm::Statistic` with `llvm::TrackingStatistic` in the future.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D101211
2021-04-26 13:39:35 -07:00
Fangrui Song 614de225c9 [gcov] Set nounwind and respect module flags metadata "frame-pointer" & "uwtable" for synthesized functions
This applies the D100251 mechanism to the gcov instrumentation pass.

With this patch, `-fno-omit-frame-pointer` in
`clang -fprofile-arcs -O1 -fno-omit-frame-pointer` will be respected for synthesized
`__llvm_gcov_writeout,__llvm_gcov_reset,__llvm_gcov_init` functions: the frame pointer
will be kept (note: on many targets -O1 eliminates the frame pointer by default).

`clang -fno-exceptions -fno-asynchronous-unwind-tables -g -fprofile-arcs` will
produce .debug_frame instead of .eh_frame.

Fix: https://github.com/ClangBuiltLinux/linux/issues/955

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D101129
2021-04-26 13:30:21 -07:00
Michael Kruse 153144be40 [SimplifyCFG] Preserve metadata when unconditionalizing branches (constant condition).
When replacing a conditional branch by an unconditional one because the condition is a constant, transfer the metadata to the new branch instruction.

Part of fix for llvm.org/PR50060

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101141
2021-04-26 10:57:31 -05:00
Dávid Bolvanský 691badc3d6 [InstCombine] C - ctpop(a) -> ctpop(~a) if C is bitwidth (PR50104)
Proof: https://alive2.llvm.org/ce/z/mncA9K
Solves https://bugs.llvm.org/show_bug.cgi?id=50104

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101257
2021-04-26 15:40:54 +02:00
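A minimal C sketch of the identity behind 691badc3d6, assuming a 32-bit operand so that C is 32. The helper `popcount32` and the names `src_sub_ctpop`/`tgt_ctpop_not` are hypothetical and only illustrate the arithmetic; the actual fold operates on LLVM IR.

```c
#include <stdint.h>

/* Hypothetical portable population count for a 32-bit value. */
static unsigned popcount32(uint32_t v) {
    unsigned n = 0;
    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Before the fold: bit width minus the population count. */
unsigned src_sub_ctpop(uint32_t a) { return 32u - popcount32(a); }

/* After the fold: population count of the complement. */
unsigned tgt_ctpop_not(uint32_t a) { return popcount32((uint32_t)~a); }
```

Every bit that is clear in `a` is set in `~a`, so the number of clear bits (width minus popcount) equals the popcount of the complement.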
Yuanbo Li cc7803ee3f [LSR][DebugInfo] Don't unnecessarily drop DebugLocs
When transforming a loop terminating condition into a "max" comparison,
the DebugLoc from the old condition should be set on the newly created
comparison. They are the same operation, just optimized. Fixes PR48067.

Differential Revision: https://reviews.llvm.org/D98218
2021-04-26 13:14:42 +01:00
Florian Hahn 7302fe4328
[VPlan] Make blocksOnly work properly with ranges over const pointers.
When iterating over const blocks, the base type in the lambdas needs
to use const VPBlockBase *, otherwise it cannot be used with input
iterators over const VPBlockBase.

Also adjust the type of the input iterator range to const &, as it
does not take ownership of the input range.
2021-04-26 10:52:35 +01:00
Florian Hahn 4b9be5ac08
[VPlan] Add VPBlockUtils::blocksOnly helper.
This patch adds a blocksOnly helper which takes an iterator range
over VPBlockBase * or const VPBlockBase * and returns an iterator
range that only includes BlockTy blocks. The accesses are cast to
BlockTy.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D101093
2021-04-25 17:38:09 +01:00
Florian Hahn fa2f162e76
[NewGVN] Properly transfer PredDep in move constructor. 2021-04-25 11:22:59 +01:00
Florian Hahn 1d8ef761be
[NewGVN] Use ExprResult to add extra predicate users.
This patch updates performSymbolicPredicateInfoEvaluation to manage
registering additional dependencies using ExprResult. Similar to D99987,
this fixes an issue where we failed to track the correct dependency for
a phi-of-ops value, which is marked as temporary.

Fixes PR49873.

Reviewed By: asbirlea, ruiling

Differential Revision: https://reviews.llvm.org/D100560
2021-04-25 11:13:32 +01:00
Florian Hahn 1cc5946cc8
[NewGVN] Use performSymbolicEvaluation instead of createExpression.
performSymbolicEvaluation is used to obtain the symbolic expression when
visiting instructions and this is used to determine their congruence
class.

performSymbolicEvaluation only creates expressions for certain
instructions (via createExpression). For unsupported instructions,
'unknown' expressions are created.

The use of createExpression in processOutgoingEdges means we may
simplify the condition in processOutgoingEdges to a constant in the
initial round of processing, but we use Unknown(I) for the congruence
class. If an operand of I changes the expression Unknown(I) stays the
same, so there is no update of the congruence class of I. Hence it
won't get re-visited. So if an operand of I changes in a way that causes
createExpression to return a different result, this update is missed.

This patch updates the code to use performSymbolicEvaluation, to be
symmetric with the congruence class updating code.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D99990
2021-04-24 18:49:07 +01:00
Dávid Bolvanský 137568e579 [InstCombine] Fixed UB in foldCtpop 2021-04-24 19:44:16 +02:00
Dávid Bolvanský de3fa35cdb [InstCombine] ctpop(rot(X)) -> ctpop(X)
Proof:
https://alive2.llvm.org/ce/z/ss2zyt - rotl
https://alive2.llvm.org/ce/z/ZM7Aue - rotr

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101235
2021-04-24 18:25:03 +02:00
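A minimal C sketch of why de3fa35cdb is sound, assuming a 32-bit value. `popcount32` and `rotl32` are hypothetical helpers written for this illustration; the commit itself matches the rotate intrinsics in LLVM IR.

```c
#include <stdint.h>

/* Hypothetical portable population count for a 32-bit value. */
static unsigned popcount32(uint32_t v) {
    unsigned n = 0;
    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Rotate left; the amount is reduced modulo 32 to avoid undefined shifts. */
static uint32_t rotl32(uint32_t x, unsigned r) {
    r &= 31u;
    return r ? (uint32_t)((x << r) | (x >> (32u - r))) : x;
}

/* Before the fold: population count of the rotated value. */
unsigned src_ctpop_rot(uint32_t x, unsigned r) { return popcount32(rotl32(x, r)); }

/* After the fold: the rotate is dropped. */
unsigned tgt_ctpop(uint32_t x) { return popcount32(x); }
```

A rotate only permutes bit positions, so the number of set bits cannot change; the same reasoning applies to a right rotate.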
Dávid Bolvanský d4ec8ea19c [InstCombine] ctpop(X) + ctpop(Y) => ctpop(X | Y) if X and Y have no common bits (PR48999)
For example:

```
int src(unsigned int a, unsigned int b)
{
    return __builtin_popcount(a << 16) + __builtin_popcount(b >> 16);
}

int tgt(unsigned int a, unsigned int b)
{
    return __builtin_popcount((a << 16)  | (b >> 16));
}
```

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101210
2021-04-24 17:52:10 +02:00
dfukalov 6c57044231 [GVN] Clobber partially aliased loads.
Use offsets stored in `AliasResult` implemented in D98718.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-04-24 14:14:20 +03:00
wlei 3d1aecbd28 [CSSPGO] Fix missing debug info of dangling pseudo probe
The speculative execution optimization conservatively drops the debug info of all instructions in the merged `ThenBB` (see the loop at line 2384), including the dangling probe. The missing debug info of the dangling probe then causes wrong inference computation.

To avoid dropping the debug info from the pseudo probe, this change moves the to-be-dangling probe to the merging target BB before the debug info is dropped.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D101195
2021-04-23 14:26:47 -07:00
Dávid Bolvanský 9aee07abd0 [InstCombine] X - usub.sat(X, Y) => umin(X, Y)
Pattern regressed in LLVM 9 with the introduction of usub.sat.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42178#c2

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101184
2021-04-23 21:13:07 +02:00
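A minimal C sketch of the identity in 9aee07abd0, assuming 32-bit unsigned operands. `usub_sat` is a hypothetical stand-in for the `usub.sat` intrinsic, and `src`/`tgt` are illustrative names only.

```c
#include <stdint.h>

/* Hypothetical model of usub.sat: unsigned subtraction clamped at zero. */
static uint32_t usub_sat(uint32_t x, uint32_t y) {
    return x > y ? x - y : 0u;
}

/* Before the fold: X - usub.sat(X, Y). */
uint32_t src(uint32_t x, uint32_t y) { return x - usub_sat(x, y); }

/* After the fold: umin(X, Y). */
uint32_t tgt(uint32_t x, uint32_t y) { return x < y ? x : y; }
```

When X > Y the saturating subtraction yields X - Y, so X - (X - Y) is Y; otherwise it yields 0 and the result is X. In both cases the result is the smaller operand.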
Hongtao Yu 5f2d730073 [CSSPGO] Fix incorrect prorating indirect call distribution factor that leads to target count loss.
Pseudo probe distribution factor is used to scale down profile samples to avoid misleading the counts inference due to the usage of "maximum" in `getBlockWeight`. For callsites, the scaling down can come from code duplication prior to the sample profile loader (prelink or postlink), or from the indirect call promotion in the sample loader inliner. This patch fixes an issue in sample loader ICP where the leftover indirect callsite scaling down unexpectedly causes the loss of non-promoted call target samples. While the scaling down is meant to favor BFI/BPI with an accurate callsite count, it doesn't fit the current distribution factor, which represents code duplication changes. Ideally, we would need two factors, one for code duplication and the other for ICP. However, this seems overcomplicated. I'm going to trade one usage (callsite counts) for the other (call target counts).

Seeing perf win on one benchmark (mcf) of SPEC2017 with others unchanged.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D100993
2021-04-23 11:09:22 -07:00
Sanjay Patel e10d7d455d [InstCombine] fold 'not' of ctpop in parity pattern
As discussed in https://llvm.org/PR50096 , we could
convert the 'not' into a 'sub' and see the same
fold. That's because we already have another demanded
bits optimization for 'sub'.

We could add a related transform for
odd-number-of-type-bits, but that seems unlikely
to be practical.

https://alive2.llvm.org/ce/z/TWJZXr
2021-04-23 13:23:24 -04:00
Florian Hahn 89c4dda076
[VPlan] Add GraphTraits impl to traverse through VPRegionBlock.
This patch adds a new iterator to traverse through VPRegionBlocks and a
GraphTraits specialization using the iterator to traverse through
VPRegionBlocks.

Because there is already a GraphTraits specialization for VPBlockBase *
and co, a new VPBlockRecursiveTraversalWrapper helper is introduced.
This allows us to provide a new GraphTraits specialization for that
type. Users can use the new recursive traversal by using this wrapper.

The graph trait visits both the entry block of a region, as well as all
its successors. Exit blocks of a region implicitly have their parent
region's successors. This ensures all blocks in a region are visited
before any blocks in a successor region when doing a reverse post-order
traversal of the graph.

Reviewed By: a.elovikov

Differential Revision: https://reviews.llvm.org/D100175
2021-04-23 17:26:47 +01:00
Sander de Smalen f9a50f04ba [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
Sander de Smalen 43ace8b5ce [TTI] NFC: Change getScalingFactorCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100564
2021-04-23 16:06:36 +01:00
Timm Bäder e60d6e91e1 [llvm][NFC] Fix assert indentation
This triggers GCC's misleading-indentation checker.
2021-04-23 14:44:05 +02:00
Dávid Bolvanský 5f77e7708a [InstCombine] Fixed crash when setting align attr for memalign 2021-04-23 14:04:08 +02:00
Florian Hahn 2b15262f89
Recommit "[NewGVN] Track simplification dependencies for phi-of-ops."
This recommits 4f5da356ff, including
explicit implementations of a move constructor and deleted copy
constructors/assignment operators, to fix failures with some compilers.

This reverts the revert 74854d00e8.
2021-04-23 11:27:43 +01:00
Stephen Tozer 791930d740 Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.

This reverts commit dad5caa59e.
2021-04-23 10:54:01 +01:00
Florian Hahn 74854d00e8
Revert "[NewGVN] Track simplification dependencies for phi-of-ops."
This reverts commit 4f5da356ff.

This causes some buildbot failures, e.g.
https://lab.llvm.org/buildbot/#/builders/139/builds/3019
2021-04-23 09:56:17 +01:00
Florian Hahn 4f5da356ff
[NewGVN] Track simplification dependencies for phi-of-ops.
If we are using a simplified value, we need to add an extra
dependency on this value, because changes to the class of the
simplified value may require us to invalidate any decision based on
that value.

This is done by adding such values as additional users, however the
current code does not exclude temporary instructions.

At the moment, this means that we miss those dependencies for
phi-of-ops, because they are temporary instructions at this point. We
instead need to add the extra dependencies to the root instruction of
the phi-of-ops.

This patch pushes the responsibility of adding extra users to the
callers of createExpression & performSymbolicEvaluation. At those
points, it is clearer which real instruction to pick.

Alternatively we could either pass the 'real' instruction as additional
argument or use another map, but I think the approach in the patch makes
things a bit easier to follow.

Fixes PR35074.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D99987
2021-04-23 09:48:38 +01:00
KAWASHIMA Takahiro d9a9c992d1 [LoopReroll] Fix rerolling loop with extra instructions
Fixes PR47627

This fix suppresses rerolling a loop which has an unrerollable
instruction.

Sample IR for the explanation below:

```
define void @foo([2 x i32]* nocapture %a) {
entry:
  br label %loop

loop:
  ; base instruction
  %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]

  ; unrerollable instructions
  %stptrx = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %indvar, i64 0
  store i32 999, i32* %stptrx, align 4

  ; extra simple arithmetic operations, used by root instructions
  %plus20 = add nuw nsw i64 %indvar, 20
  %plus10 = add nuw nsw i64 %indvar, 10

  ; root instruction 0
  %ldptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 0
  %value0 = load i32, i32* %ldptr0, align 4
  %stptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 0
  store i32 %value0, i32* %stptr0, align 4

  ; root instruction 1
  %ldptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 1
  %value1 = load i32, i32* %ldptr1, align 4
  %stptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 1
  store i32 %value1, i32* %stptr1, align 4

  ; loop-increment and latch
  %indvar.next = add nuw nsw i64 %indvar, 1
  %exitcond = icmp eq i64 %indvar.next, 5
  br i1 %exitcond, label %exit, label %loop

exit:
  ret void
}
```

In the loop rerolling pass, `%indvar` and `%indvar.next` are appended
to the `LoopIncs` vector in the `LoopReroll::DAGRootTracker::findRoots`
function.

Before this fix, two instructions with `unrerollable instructions`
comment above are marked as `IL_All` at the end of the
`LoopReroll::DAGRootTracker::collectUsedInstructions` function,
as well as instructions with `extra simple arithmetic operations`
comment and `loop-increment and latch` comment. It is incorrect
because `IL_All` means that the instruction should be executed in all
iterations of the rerolled loop but the `store` instruction should
not.

This fix rejects instructions which may have side effects and don't
belong to def-use chains of any root instructions and reductions.

See https://bugs.llvm.org/show_bug.cgi?id=47627 for more information.
2021-04-23 15:14:46 +09:00
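For readers less used to IR, the sample loop in d9a9c992d1 corresponds roughly to the C function below. This is a hand translation made for illustration, not code taken from the patch or its test; the parameter type `int (*a)[2]` mirrors `[2 x i32]*`.

```c
/* Rough C equivalent of @foo from the sample IR. */
void foo(int (*a)[2]) {
    for (long i = 0; i < 5; ++i) {
        /* "unrerollable" store: happens once per original iteration */
        a[i][0] = 999;

        /* root instruction 0 */
        a[i + 10][0] = a[i + 20][0];

        /* root instruction 1 */
        a[i + 10][1] = a[i + 20][1];
    }
}
```

The two copies differ only in the last index, which is what makes them candidates for rerolling into a single copy executed ten times. The store of 999 has no such counterpart, so it cannot be executed once per rerolled iteration without changing behavior.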
Elia Geretto 2627f99613 [dfsan] Fix Len argument type in call to __dfsan_mem_transfer_callback
This patch is supposed to solve: https://bugs.llvm.org/show_bug.cgi?id=50075

The function `__dfsan_mem_transfer_callback` takes a `Len` argument of type `i64`; however, when processing a `MemTransferInst` such as `llvm.memcpy.p0i8.p0i8.i32`, the `len` argument has type `i32`. In order to make the type of `len` compatible with the one of the callback argument, this change zero-extends it when necessary.

Reviewed By: stephan.yichao.zhao, gbalats

Differential Revision: https://reviews.llvm.org/D101048
2021-04-22 21:12:20 +00:00
Arthur Eubanks 16ff1a7023 [GlobalOpt] Don't replace alias with aliasee if aliasee is interposable
Both the alias and aliasee linkage are important.

PR27866 provides some background.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99629
2021-04-22 13:12:34 -07:00
Philip Reames 15e19a2599 Revert "[instcombine] Exploit UB implied by nofree attributes"
This change effectively reverts 86664638, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert.

Why revert this now?  Two main reasons:
1) There is continuing discussion around the semantics of nofree.  I am getting increasingly uncomfortable with the seeming possibility that we might redefine nofree in a way incompatible with these changes.
2) There was a reported miscompile triggered by this change (https://github.com/emscripten-core/emscripten/issues/9443).  At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs.  Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree.  In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.
2021-04-22 10:53:17 -07:00
Jianzhou Zhao 7fdf270965 [dfsan] Track origin at loads
The first version of origin tracking tracks only memory stores. Although
this is sufficient for understanding correct flows, it is hard to figure
out where an undefined value is read from. To find reads of undefined values,
we still have to do a reverse binary search from the last store in the chain
with printing and logging at possible code paths. This is
quite inefficient.

Tracking memory load instructions can help in this case. The main issues of
tracking loads are performance and code size overheads.

With tracking only stores, the code size overhead is 38%,
memory overhead is 1x, and cpu overhead is 3x. In practice #load is much
larger than #store, so both code size and cpu overhead increase. The
first blocker is code size overhead: link fails if we inline tracking
loads. The workaround is using external function calls to propagate
metadata. This is also the workaround ASan uses. The cpu overhead
is ~10x. This is a trade-off between debuggability and performance,
and will be used only when debugging cases where tracking only stores
is not enough.

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D100967
2021-04-22 16:25:24 +00:00
Alexey Bataev 18c61fc498 [SLP]Skip undefs trying to find perfect/shuffled tree entries matching.
We can skip the check for undefs when trying to find matching
perfect/shuffled tree entries; they can be ignored completely, improving
the final cost/vectorization results.

Differential Revision: https://reviews.llvm.org/D101061
2021-04-22 08:59:07 -07:00
Joe Ellis 2c551aedcf [LoopVectorize] Fix bug where predicated loads/stores were dropped
This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

    void foo(int *restrict data1, int *restrict data2)
    {
      int counter = 1024;
      while (counter--)
        if (data1[counter] > data2[counter])
          data1[counter] = data2[counter];
    }

... could previously be transformed in such a way that the predicated
store implied by:

    if (data1[counter] > data2[counter])
       data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.

Differential Revision: https://reviews.llvm.org/D99569
2021-04-22 15:05:54 +00:00
Alexey Bataev d4f5f23bbb [SLP]Replace more `TTI` with `TTIRef`, NFC.
To pacify MSVC buildbots.
2021-04-22 07:53:20 -07:00
Alexey Bataev da2cdfd421 [SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC
buildbots, NFC.
2021-04-22 07:49:48 -07:00
Alexey Bataev e99b98cb1b [SLP]Improve cost model for the vectorized extractelements.
1. No need to call `areAllUsersVectorized` as later the cost is
   calculated only if the instruction has one use and gets vectorized.
2. Need to calculate the cost of the dead extractelement more precisely,
   taking the vector type of the vector operand, not the resulting
   vector type.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D99980
2021-04-22 07:40:17 -07:00
Dawid Jurczak 57f443c348 [SimplifyLibCalls][NFC] Use StringRef::back instead of explicit indexing.
Split off from D100724.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D101032
2021-04-22 15:02:47 +02:00
David Sherwood 5a229a6702 [LoopVectorize] Don't create unnecessary vscale intrinsic calls
In quite a few cases in LoopVectorize.cpp we call createStepForVF
with a step value of 0, which leads to unnecessary generation of
llvm.vscale intrinsic calls. I've optimised IRBuilder::CreateVScale
and createStepForVF to return 0 when attempting to multiply
vscale by 0.

Differential Revision: https://reviews.llvm.org/D100763
2021-04-22 09:01:52 +01:00
Max Kazantsev 8fe62b7af1 [GVN] Introduce loop load PRE
This patch allows PRE of the following type of loads:

```
preheader:
  br label %loop

loop:
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  br label %merge

merge:
  ...
  br i1 ..., label %loop, label %exit

```

Into
```
preheader:
  %x0 = load %p
  br label %loop

loop:
  %x.pre = phi(x0, x2)
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  %x1 = load %p
  br label %merge

merge:
  x2 = phi(x.pre, x1)
  ...
  br i1 ..., label %loop, label %exit

```

So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.

The worst-case overhead is one extra load overall (in the preheader), incurred if we always take
the clobber block. It only matters if the loop has very few iterations. If the clobber block is
skipped at least once, the transform is neutral or profitable.

Several prospective improvements open up:
- We can sometimes be smarter in loop-exiting blocks via splitting of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
  we don't know if their sum is colder than the header.

Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
2021-04-22 12:50:38 +07:00
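A source-level sketch of the loop load PRE described in 8fe62b7af1, written in C for illustration. The names `clobber`, `sum_before`, and `sum_after` are hypothetical, and the sketch assumes the original load sits at the top of the loop body; the pass itself works on LLVM IR and applies legality checks that are not modeled here.

```c
#include <stddef.h>

/* Hypothetical stand-in for the side-effecting call on the cold path. */
static void clobber(int *p) { *p += 1; }

/* Before: *p is loaded on every iteration. */
long sum_before(int *p, const int *take_cold_path, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        int x = *p; /* load in the loop header */
        sum += x;
        if (take_cold_path[i])
            clobber(p);
    }
    return sum;
}

/* After: one load in the preheader, reload only after the clobber. */
long sum_after(int *p, const int *take_cold_path, size_t n) {
    long sum = 0;
    int x = *p; /* preheader load (x0) */
    for (size_t i = 0; i < n; ++i) {
        sum += x; /* x plays the role of x.pre */
        if (take_cold_path[i]) {
            clobber(p);
            x = *p; /* reload only on the cold path (x1) */
        }
    }
    return sum;
}
```

If the cold path is taken on every iteration, `sum_after` performs one more load than `sum_before`, which matches the worst case described in the commit message.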
Chuanqi Xu 77ca2a6893 [Coroutine] Collect CoroBegin if all of terminators are dominated by one coro.destroy
Summary: The original logic seems to be that we could collect a CoroBegin
if one of the terminators could be dominated by one of the coro.destroy calls,
which doesn't make sense.
This patch rewrites the logic to collect a CoroBegin if all of the
terminators are dominated by one coro.destroy. If there is no such
coro.destroy, we call hasEscapePath to evaluate whether we should
collect it.

Test Plan: check-llvm

Reviewed by: lxfind

Differential Revision: https://reviews.llvm.org/D100614
2021-04-22 11:21:37 +08:00
Giorgis Georgakoudis a2dbfb6b72 [OpenMP] Simplify offloading parallel call codegen
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.

Reviewed By: jdoerfert, Meinersbur

Differential Revision: https://reviews.llvm.org/D95976
2021-04-21 18:46:07 -07:00
Fangrui Song 775a9483e5 [IR][sanitizer] Set nounwind on module ctor/dtor, additionally set uwtable if -fasynchronous-unwind-tables
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.

Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)

Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.

This patch

* sets the nounwind attribute on sanitizer module ctor/dtor.
* lets Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.

The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::create` if they want to get some default attributes which
have more of module semantics.

Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955
https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.

Differential Revision: https://reviews.llvm.org/D100251
2021-04-21 15:58:20 -07:00
Olle Fredriksson f5446b769a [MemCpyOpt] Allow variable lengths in memcpy optimizer
This makes the memcpy-memcpy and memcpy-memset optimizations work for
variable sizes as long as they are equal, relaxing the old restriction
that they are constant integers. If they're not equal, the old
requirement that they are constant integers with certain size
restrictions is used.

The implementation works by pushing the length tests further down in the
code, which reveals some places where it's enough that the lengths are
equal (but not necessarily constant).

Differential Revision: https://reviews.llvm.org/D100870
2021-04-21 23:23:38 +02:00
Arthur Eubanks b606e2df4d [Evaluator] Bitcast result of pointer stripping
Trying to evaluate a GEP would assert with
  "Ty == cast<PointerType>(C->getType()->getScalarType())->getElementType()"
because the type of the pointer we would evaluate the GEP argument to
would be a different type than the GEP was expecting. We should treat
pointer stripping as a bitcast.

The test adds a redundant GEP that would crash due to type mismatch.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D100970
2021-04-21 13:32:29 -07:00
Nikita Popov 24e9fbc1a3 Revert "[InstCombine] Fold multiuse shr eq zero"
This reverts commit 9423f78240.

A performance regression with this patch has been reported at
https://reviews.llvm.org/rG9423f78240a2#990953. Reverting for now.
2021-04-21 21:40:52 +02:00
sstefan1 62cdcd6c5a [FuncAttrs] Don't infer willreturn for nonexact definitions
Discovered during attributor testing comparing stats with
and without the attributor. Willreturn should not be inferred
for nonexact definitions.

Differential Revision: https://reviews.llvm.org/D100988
2021-04-21 21:26:09 +02:00
sstefan1 656ebd519e [SimplifyLibCalls] Don't change alignment when creating memset
Fix for PR49984
This was discovered during Attributor testing.
Memset was always created with an alignment of 1,
and when the strncpy alignment had been changed,
this triggered an assertion in the AttrBuilder.
Memset is now created with the appropriate alignment.

Differential Revision: https://reviews.llvm.org/D100875
2021-04-21 20:34:13 +02:00
Nico Weber ba7a92c01e [Support] Don't include VirtualFileSystem.h in CommandLine.h
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.

(Already remarked by jhenderson on D70769.)

No behavior change.

Differential Revision: https://reviews.llvm.org/D100957
2021-04-21 10:19:01 -04:00
George Balatsouras 79b5280a6c [dfsan] Enable origin tracking with fast8 mode
All related instrumentation tests have been updated.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D100903
2021-04-20 18:10:32 -07:00
Arthur Eubanks 326da4adcb [FuncAttrs] Always preserve FunctionAnalysisManagerCGSCCProxy
FunctionAnalysisManagerCGSCCProxy should not be preserved if any of its
keys may be invalid. Since we are not removing/adding functions in
FuncAttrs, it's fine to preserve it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D100893
2021-04-20 16:37:45 -07:00
Reid Kleckner 91f7a4fff7 Revert "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)"
This reverts commit 13ec913bdf.

This commit introduces new uses of the overflow checking intrinsics that
depend on implementations in compiler-rt, which Windows users generally
do not link against. I filed an issue (somewhere) to make clang
auto-link the builtins library to resolve this situation, but until that
happens, it isn't reasonable for the optimizer to introduce new link
time dependencies.
2021-04-20 15:53:34 -07:00
Philip Reames 4824d876f0 Revert "Allow invokable sub-classes of IntrinsicInst"
This reverts commit d87b9b81cc.

Post commit review raised concerns, reverting while discussion happens.
2021-04-20 15:38:38 -07:00
Roman Lebedev 5a654bfeab
Revert "[InstCombine] `sext(trunc(x)) --> sext(x)` iff trunc is NSW (PR49543)"
I forgot about the case where we sign-extend to width smaller than the original.

This reverts commit 1e6ca23ab8.
2021-04-21 01:11:15 +03:00
Roman Lebedev 1e68d338c1
Revert "[InstCombine] "Bypass" NUW trunc of lshr if we are going to sext the result (PR49543)"
I forgot about the case where we sign-extend to width smaller than the original.

This reverts commit 41b71f718b.
2021-04-21 01:11:14 +03:00
Philip Reames d87b9b81cc Allow invokable sub-classes of IntrinsicInst
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for an intrinsic has a false negative if the intrinsic is invoked.

This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.
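
The false negative can be reproduced with a toy hierarchy. The class names below echo the LLVM ones but are simplified stand-ins, and `isIntrinsicOld` / `isIntrinsicNew` are hypothetical helpers rather than LLVM functions.

```cpp
#include <cassert>

// Simplified stand-ins: a common base with two concrete kinds of call site.
struct CallBase {
  bool calleeIsIntrinsic;
  explicit CallBase(bool intrinsic) : calleeIsIntrinsic(intrinsic) {}
  virtual ~CallBase() = default;
};
struct CallInst : CallBase {
  using CallBase::CallBase;
};
struct InvokeInst : CallBase {
  using CallBase::CallBase;
};

// Old shape: the intrinsic check is tied to CallInst, so an invoked
// intrinsic is never recognized.
bool isIntrinsicOld(const CallBase &cb) {
  return dynamic_cast<const CallInst *>(&cb) != nullptr &&
         cb.calleeIsIntrinsic;
}

// New shape: the check lives at the CallBase level and covers both kinds.
bool isIntrinsicNew(const CallBase &cb) { return cb.calleeIsIntrinsic; }
```

With an `InvokeInst` whose callee is an intrinsic, the old check returns false and the new one returns true; code that truly needs a call then has to cast to `CallInst` explicitly.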

After this lands and has baked for a couple days, planned cleanups:
    Make GCStatepointInst an IntrinsicInst subclass.
    Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
    Do the same in SelectionDAG.
    Do the same in FastISEL.

Differential Revision: https://reviews.llvm.org/D99976
2021-04-20 15:03:49 -07:00
Roman Lebedev 41b71f718b
[InstCombine] "Bypass" NUW trunc of lshr if we are going to sext the result (PR49543)
This is a more convoluted form of the same pattern "sext of NSW trunc",
but in this case the operand of trunc was a right-shift,
and the truncation chops off just the zero bits that were shifted-in.
2021-04-21 00:31:46 +03:00
Roman Lebedev 1e6ca23ab8
[InstCombine] `sext(trunc(x)) --> sext(x)` iff trunc is NSW (PR49543)
If we can tell that trunc only chops off sign bits, and not all of them,
then we can simply sign-extend the trunc's source.
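
A small sketch of why the fold holds when the destination is wider than the source, using fixed widths (i32 truncated to i8, then extended to i64) chosen only for illustration. The helper names are hypothetical. The unsigned-to-signed narrowing relies on modular conversion, which is guaranteed since C++20 and is what mainstream compilers did before.

```cpp
#include <cassert>
#include <cstdint>

// trunc i32 -> i8: keep the low 8 bits.
std::int8_t truncTo8(std::int32_t x) {
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(x));
}

// The trunc is "NSW" when only copies of the sign bit were chopped off,
// i.e. sign-extending the result gives back the original value.
bool truncIsNSW(std::int32_t x) {
  return static_cast<std::int32_t>(truncTo8(x)) == x;
}

// sext(trunc(x)) with an i64 destination.
std::int64_t sextOfTrunc(std::int32_t x) {
  return static_cast<std::int64_t>(truncTo8(x));
}

// sext(x) with an i64 destination.
std::int64_t sextDirect(std::int32_t x) {
  return static_cast<std::int64_t>(x);
}
```

For every `x` in [-128, 127] the two results agree; for `x = 200` the trunc is not NSW and they differ. If the destination were narrower than the source (say i16 here), there would be no `sext(x)` to form at all, which is the case that the revert of this commit mentions.
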
2021-04-21 00:31:45 +03:00
Sanjay Patel 1e202e8f39 [InstCombine] fold shift-of-srem-by-2 to mask+shift
There are several potential srem-by-2 folds
because the result is known {-1,0,1}.
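
One identity of this kind can be checked directly. The sketch below assumes a 32-bit width and a logical shift by 31; it is an illustration of the idea, not necessarily the exact pattern matched by the commit, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// lshr (srem x, 2), 31: yields 1 only when the remainder is -1.
std::uint32_t shiftOfSrem(std::int32_t x) {
  return static_cast<std::uint32_t>(x % 2) >> 31;
}

// Mask+shift form: the sign bit of x ANDed with x itself, which is 1
// exactly when x is both negative and odd.
std::uint32_t maskAndShift(std::int32_t x) {
  std::uint32_t ux = static_cast<std::uint32_t>(x);
  return (ux >> 31) & ux;
}
```

Both forms return 1 for negative odd inputs and 0 otherwise, because `x % 2` is -1 only for negative odd `x`.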

https://alive2.llvm.org/ce/z/LuVyeK
2021-04-20 17:10:16 -04:00
Roman Lebedev 13ec913bdf
[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)
We already had support for its unsigned variant, so simply extend it
to also handle the signed variant.
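
To make the idiom concrete, here is a sketch in source-level terms, assuming 32-bit operands. The function names are hypothetical. The multiplication is done in unsigned arithmetic so that it wraps instead of being undefined, and the `x == 0` and `x == -1` cases are handled separately so the division is always defined; the conversion back to signed relies on modular conversion (guaranteed since C++20).

```cpp
#include <cassert>
#include <cstdint>

// The idiom: multiply with wraparound, divide back, compare with y.
bool overflowsByIdiom(std::int32_t x, std::int32_t y) {
  if (x == 0)
    return false;            // 0 * y never overflows
  if (x == -1)
    return y == INT32_MIN;   // avoids INT32_MIN / -1
  std::int32_t prod = static_cast<std::int32_t>(
      static_cast<std::uint32_t>(x) * static_cast<std::uint32_t>(y));
  return prod / x != y;
}

// Reference answer: compute the product in 64 bits and range-check it.
bool overflowsByWidening(std::int32_t x, std::int32_t y) {
  std::int64_t wide = static_cast<std::int64_t>(x) * y;
  return wide < INT32_MIN || wide > INT32_MAX;
}
```

The two functions agree on the boundary cases, for example `46340 * 46340` (fits) versus `46341 * 46341` (overflows), which is why the division-based pattern can be treated as an overflow check.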

Fixes https://bugs.llvm.org/show_bug.cgi?id=48769
2021-04-20 21:29:43 +03:00
Joseph Huber b2ad63d3cf [OpenMP] Add OpenMPOpt as a Module pass
Summary:
This patch registers OpenMPOpt as a Module pass in addition to a CGSCC
pass. This is so certain optimizations that are sensitive to intact
call-sites can happen before inlining. The old `openmpopt` pass name is
changed to `openmp-opt-cgscc` and `openmp-opt` calls the Module pass.
The current module pass only runs a single check but will be expanded in
the future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D99202
2021-04-20 12:28:58 -04:00
Alexey Bataev af870e11ae [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but does not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. This patch adds
support for this matching to improve the cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 09:08:46 -07:00
Philip Reames 3b1474cab2 free(nullptr) does not violate the nofree specification
This fixes a subtle and nasty bug in my 86664638. The problem is that free(nullptr) is well defined (and common).

The specification for the nofree attributes talks about memory objects, and doesn't explicitly address null, but I think it's reasonable to assume that nofree doesn't disallow a call to free(nullptr). If it did, we'd have to prove nonnull on an argument to ever infer nofree which doesn't seem to be the intent.

This was found by Nuno and Alive2 over in https://reviews.llvm.org/D100141#2697374.

Differential Revision: https://reviews.llvm.org/D100779
2021-04-20 09:08:05 -07:00
Alexey Bataev b82344a019 Revert "[SLP] Add detection of shuffled/perfect matching of tree entries."
This reverts commit daf6e18c55 to fix the
compiler crash.
2021-04-20 08:29:32 -07:00
Alexey Bataev daf6e18c55 [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but does not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. This patch adds
support for this matching to improve the cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 07:46:49 -07:00
Alexey Bataev cf00cb8bed Revert "[SLP] Add detection of shuffled/perfect matching of tree entries."
This reverts commit b232771aca to fix
buildbots.
2021-04-20 07:16:11 -07:00
Alexey Bataev b232771aca [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but does not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. This patch adds
support for this matching to improve the cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 06:55:55 -07:00
Sander de Smalen 86729538bd [LV] Let selectVectorizationFactor reason directly on VectorizationFactor.
Rather than maintaining two separate values, a `float` for the per-lane
cost and a Width for the VF, maintain a single VectorizationFactor which
comprises the two and also removes the need for converting an integer value
to float.

This simplifies the query when asking if one VF is more profitable than
another when we want to extend this for scalable vectors (which may
require additional options to determine if e.g. a scalable VF of the
same cost is more profitable than a fixed VF of the same cost).

The patch isn't entirely NFC because it also fixes an issue in
selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs
no longer truncates the floating-point cost from `float` to `unsigned` to
then perform the calculation on the truncated cost. It now does
a cost comparison with the correct precision.
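
The precision issue can be illustrated with a toy comparison. The struct below borrows the name `VectorizationFactor` from the commit message, but its field types and both helper functions are assumptions made for this sketch.

```cpp
#include <cassert>

// Toy pairing of a vector width with the total cost at that width.
struct VectorizationFactor {
  unsigned Width;
  unsigned Cost;
};

// Lossy shape: the per-lane cost is truncated to an integer before comparing.
bool moreProfitableTruncated(VectorizationFactor a, VectorizationFactor b) {
  unsigned perLaneA =
      static_cast<unsigned>(static_cast<float>(a.Cost) / a.Width);
  unsigned perLaneB =
      static_cast<unsigned>(static_cast<float>(b.Cost) / b.Width);
  return perLaneA < perLaneB;
}

// Precise shape: compare the per-lane costs without truncation.
bool moreProfitable(VectorizationFactor a, VectorizationFactor b) {
  return static_cast<float>(a.Cost) / a.Width <
         static_cast<float>(b.Cost) / b.Width;
}
```

With a cost of 3 at width 2 (1.5 per lane) against a cost of 7 at width 4 (1.75 per lane), the truncated comparison sees 1 versus 1 and reports no winner, while the precise comparison prefers the first.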

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100121
2021-04-20 09:54:45 +01:00
Luo, Yuanke bcdaccfe34 [X86][AMX] Verify illegal types or instructions for x86_amx.
This patch is related to https://reviews.llvm.org/D100032, which defines
some illegal types or operations for x86_amx. There are no arguments,
arrays, pointers, vectors or constants of x86_amx.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D100472
2021-04-20 16:14:22 +08:00
Arthur Eubanks 5e71b9fa93 Explicitly pass type to cast load constant folding result
Previously we would use the type of the pointee to determine what to
cast the result of constant folding a load. To aid with opaque pointer
types, we should explicitly pass the type of the load rather than
looking at pointee types.

ConstantFoldLoadThroughBitcast() converts the const prop'd value to the
proper load type (e.g. [1 x i32] -> i32). Instead of calling this in
every intermediate step like bitcasts, we only call this when we
actually see the global initializer value.

In some existing uses of this API, we don't know the exact type we're
loading from immediately (e.g. first we visit a bitcast, then we visit
the load using the bitcast). In those cases we have to manually call
ConstantFoldLoadThroughBitcast() when simplifying the load to make sure
that we cast to the proper type.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100718
2021-04-20 00:53:21 -07:00
Dávid Bolvanský 324d641b75 [InstCombine] Enhance deduction of alignment for aligned_alloc
This patch improves https://reviews.llvm.org/D76971 (Deduce attributes for aligned_alloc in InstCombine) and implements "TODO" item mentioned in the review of that patch.

> The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment.

Currently, we simply bail out if we see a non-constant size - change that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100785
2021-04-20 02:04:18 +02:00
Alexey Bataev 8030481065 Revert "[SLP]Add detection of shuffled/perfect matching of tree entries."
This reverts commit d6fde91379 to fix
compiler crashes.
2021-04-19 14:10:04 -07:00
Zequan Wu e28435caf6 [ThinLTO] Copy UnnamedAddr when spliting module.
The unnamedaddr property of a function is lost when using
`-fwhole-program-vtables` and ThinLTO, which causes a size increase under the
linker's safe ICF mode.

The size increase of chrome on Linux when switching from all icf to safe icf
drops from 5 MB to 3 MB after this change, and from 6 MB to 4 MB on Windows.

There is a repro:
```
# a.h
struct A {
  virtual int f();
  virtual int g();
};

# a.cpp
#include "a.h"
int A::f() { return 10; }
int A::g() { return 10; }

# main.cpp
#include "a.h"

int g(A* a) {
  return a->f();
}

int main(int argv, char** args) {
  A a;
  return g(&a);
}

$ clang++ -O2 -ffunction-sections -flto=thin -fwhole-program-vtables -fsplit-lto-unit -c main.cpp -o main.o  && clang++ -Wl,--icf=safe -fuse-ld=lld  -flto=thin main.o -o a.out && llvm-readobj -t a.out | grep -A 1 -e _ZN1A1fEv -e _ZN1A1gEv
    Name: _ZN1A1fEv (480)
    Value: 0x201830
--
    Name: _ZN1A1gEv (490)
    Value: 0x201840
```

Differential Revision: https://reviews.llvm.org/D100498
2021-04-19 14:04:58 -07:00
Alexey Bataev d6fde91379 [SLP]Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but does not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. This patch adds
support for this matching to improve the cost of the vectorized tree.

Differential Revision: https://reviews.llvm.org/D100495
2021-04-19 13:29:30 -07:00
Philip Reames 3c54762226 [funcattrs] Consistently check call site attributes
This is mostly stylistic cleanup after D100226, but not entirely. When skimming the code, I found one case where we weren't accounting for attributes on the callsite at all. I'm also suspicious we had some latent bugs related to operand bundles (which are supposed to be able to *override* attributes on declarations), but I don't have concrete test cases for those, just suspicions.

Aside: The only case left in the file which directly checks attributes on the declaration is the norecurse logic. I left that because I didn't understand it; it looks obviously wrong, so I suspect I'm misinterpreting the intended semantics of the attribute.

Differential Revision: https://reviews.llvm.org/D100689
2021-04-19 13:20:50 -07:00
Philip Reames 01801d5274 [rs4gc] Fix a latent bug around attribute stripping for intrinsics
This change fixes a latent bug which was exposed by a change currently in review (https://reviews.llvm.org/D99802#2685032).

The story on this is a bit involved.  Without this change, what ended up happening with the pending review was that we'd strip attributes off intrinsics, and then selectiondag would fail to lower the intrinsic.  Why?  Because the lowering of the intrinsic relies on the presence of the readonly attribute.  We don't have a matcher to select the case where there's a glue node needed.

Now, on the surface, this still seems like a codegen bug. However, here it gets fun. I was unable to reproduce this with a standalone test at all, and was pretty much stuck until skatkov provided the critical detail. This reproduces only when RS4GC and codegen are run in the same process and context. Why? Because it turns out we can't roundtrip the stripped attribute through serialized IR!

We'll happily print out the missing attribute, but when we parse it back, the auto-upgrade logic has a side effect of blindly overwriting attributes on intrinsics with those specified in Intrinsics.td.  This makes it impossible to exercise SelectionDAG from a standalone test case.

At this point, I decided to treat this as an RS4GC bug as a) we don't need to strip in this case, and b) I could write a test which shows the correct behavior to ensure this doesn't break again in the future.

As an aside, I'd originally set out to handle libfuncs too - since in theory they might have the same issues - but backed away quickly when I realized how the semantics of builtin, nobuiltin, and no-builtin-x all interacted.  I'm utterly convinced that no part of the optimizer handles that correctly, and decided not to open that can of worms here.
2021-04-19 13:14:07 -07:00
Nikita Popov 9423f78240 [InstCombine] Fold multiuse shr eq zero
The single-use case is handled implicitly by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.

https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
2021-04-19 22:13:11 +02:00
Nikita Popov d440f9a326 [LICM] Make capture check more precise
During store promotion, we check whether the pointer was captured
to exclude potential reads from other threads. However, we're only
interested in captures before or inside the loop. Check this using
PointerMayBeCapturedBefore against the loop header.

Differential Revision: https://reviews.llvm.org/D100706
2021-04-19 20:34:23 +02:00
Roman Lebedev d746fefb6f
[SCEVExpander] ReuseOrCreateCast(): use IRBuilder to actually create the cast
In particular, this allows creating constant expressions
instead of IR instructions if the argument is a constant.
2021-04-19 18:38:39 +03:00
Roman Lebedev ecc9d7e913
[SCEVExpander] Expand explicit PtrToInt casts just like we would implicit ones
I.e., use GetOptimalInsertionPointForCastOf() helper to get the insertion
point, and try to reuse casts first.
2021-04-19 18:38:39 +03:00
Roman Lebedev 442c408e0e
[SCEVExpander] GetOptimalInsertionPointForCastOf(): gracefully handle Constant's
I guess this case hasn't come up thus far, and I'm not sure if it can
really happen for the existing usages, thus no test in *this* commit.

But the following commit adds test coverage;
there we'd experience a crash without this fix.
2021-04-19 18:38:39 +03:00
Roman Lebedev b8a3705896
[NFCI][SCEVExpander] Extract GetOptimalInsertionPointForCastOf() helper 2021-04-19 18:38:38 +03:00
Roman Lebedev 73f60e3988
[SCEVExpander] generateOverflowCheck(): explicitly PtrToInt the Start
Currently, InsertNoopCastOfTo() would implicitly insert that cast,
but now that we have SCEVPtrToIntExpr, I'm hoping we could stop
InsertNoopCastOfTo() from doing that. But first all users must be fixed.
2021-04-19 18:38:38 +03:00
Cullen Rhodes f0bc2782f2 [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100377
2021-04-19 11:01:34 +00:00
OCHyams 0ebf9a8e34 [DebugInfo] Move the findDbg* functions into DebugInfo.cpp
Move the findDbg* functions into lib/IR/DebugInfo.cpp from
lib/Transforms/Utils/Local.cpp.

D99169 adds a call to a function (findDbgUsers) that lives in
lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp
(LLVMCore). The Core lib doesn't include TransformUtils. The buildbots caught
this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch
moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp
which is part of LLVMCore.

Reviewed By: dblaikie, rnk

Differential Revision: https://reviews.llvm.org/D100632
2021-04-19 10:30:25 +01:00
Evgeniy Brevnov 35e95c6817 [CVP] processCallSite returns wrong status
Recently processMinMaxIntrinsic was added, and we started to observe a number of analyses being invalidated after CVP. The problem is that CVP conservatively returns 'true' even if there were no modifications to the IR. I found one more place besides processMinMaxIntrinsic that has the same problem. I think processMinMaxIntrinsic and similar functions should have a boolean return status to prevent similar issues from reappearing in the future.
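
The convention being asked for can be sketched outside LLVM. The function below is a hypothetical toy transform, not CVP code: it reports `true` only when it actually modified its input, so a caller could keep cached analyses when nothing changed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy transform: clamp every value into [lo, hi].
// Returns true only if at least one element was modified.
bool clampAll(std::vector<int> &values, int lo, int hi) {
  bool changed = false;
  for (int &v : values) {
    int clamped = std::min(std::max(v, lo), hi);
    if (clamped != v) {
      v = clamped;
      changed = true;
    }
  }
  return changed;
}
```

A conservative `return true;` at the end would still be correct for the data, but it would tell every caller that the input changed, which is the kind of unnecessary invalidation described above.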

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100538
2021-04-19 12:13:22 +07:00
Xun Li 5faba87938 Revert "[Coroutines] Set presplit attribute in Clang instead of CoroEarly pass"
This reverts commit fa6b54c44a.
The committed patch broke mlir tests. It seems that mlir tests depend on coroutine function properties set in CoroEarly pass.
2021-04-18 17:22:28 -07:00
Xun Li fa6b54c44a [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass
Presplit coroutines cannot be inlined. During AlwaysInliner, we check whether a function is a presplit coroutine and, if so, skip inlining.
The presplit coroutine attributes are set in the CoroEarly pass.
However, in the O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and the coroutine is still inlined.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920

To fix this, we set the attributes in the Clang front-end instead of in CoroEarly pass.

Reviewed By: rjmccall, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D100282
2021-04-18 15:41:09 -07:00
Xun Li c0211e8d7d Revert "[Coroutines] Move CoroEarly pass to before AlwaysInliner"
This reverts commit 2b50f5a434.
Forgot to update the description of the commit to sync with phabricator. Going to redo the commit.
2021-04-18 15:38:19 -07:00