Commit Graph

Wang, Pengfei 8dcf8d1da5 [msan] Fix bugs when instrumenting x86.avx512*_cvt* intrinsics.
Scalar intrinsics x86.avx512*_cvt* have an extra rounding-mode operand.
We can simply ignore it and reuse the SSE/AVX math.
This fixes https://bugs.llvm.org/show_bug.cgi?id=48298.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D92206
2020-11-27 16:33:14 +08:00
Markus Lavin 808fcfe594 Revert "[DebugInfo] Improve dbg preservation in LSR."
This reverts commit 06758c6a61.

Bug: https://bugs.llvm.org/show_bug.cgi?id=48166
Additional discussion in: https://reviews.llvm.org/D91711
2020-11-27 08:52:32 +01:00
Max Kazantsev faf183874c [IndVars] LCSSA Phi users should not prevent widening
When widening an IndVar that has LCSSA Phi users outside
the loop, we can safely widen it as usual and then truncate
the result outside the loop without hurting performance.
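
A minimal sketch of the idea in C++, with hypothetical helper and variable names (not the actual IndVars code):

```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// After widening the IV, an LCSSA phi outside the loop still expects the
// narrow type; feed it a trunc of the wide IV placed in the exit block.
static void fixupLCSSAUser(PHINode *LCSSAPhi, Value *WideIV, Type *NarrowTy) {
  BasicBlock *ExitBB = LCSSAPhi->getParent();
  IRBuilder<> B(&*ExitBB->getFirstInsertionPt());
  Value *Trunc = B.CreateTrunc(WideIV, NarrowTy, "lcssa.trunc");
  LCSSAPhi->replaceAllUsesWith(Trunc);
}
```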

Differential Revision: https://reviews.llvm.org/D91593
Reviewed By: skatkov
2020-11-27 11:19:54 +07:00
Roman Lebedev f3abd54958
Revert "[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions"
Many bots are unhappy; at the very least this missed a few codegen tests,
and possibly it has a logic hole inducing a miscompile
(it would be really awesome to have a ready reproducer..)

Need to investigate.

This reverts commit 2245fb8aaa.
2020-11-26 23:13:43 +03:00
Roman Lebedev 2245fb8aaa
[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions
1. It doesn't make sense to enforce that the bonus instruction
   is only used once in its basic block. What matters is
   whether those user instructions fit within our budget, sure,
   but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
   are only used within their basic block. Perhaps the branch
   condition isn't using the value computed by said bonus instruction,
   and said bonus instruction is simply being calculated
   to be used in successors?

So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.

Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.

We don't introduce any new PHI nodes, but only "move" the instructions
around, so I'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.

This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.

The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)
^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v
That being said, even though it seemed like this would fix it: https://godbolt.org/z/xGq3TM
apparently that fold is happening somewhere else after all,
so something else also has a similar 'artificial' restriction.
2020-11-26 22:51:22 +03:00
Roman Lebedev 65db7d38e0
[NFC][SimplifyCFG] Add statistic to `FoldBranchToCommonDest()` fold 2020-11-26 22:51:21 +03:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.
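
For illustration, a minimal sketch using the two new factory functions introduced here (the wrapper helpers are hypothetical):

```
#include "llvm/Analysis/MemoryLocation.h"
using namespace llvm;

// Access of unknown size that starts at Ptr and only extends forward.
MemoryLocation accessAfter(const Value *Ptr) {
  return MemoryLocation(Ptr, LocationSize::afterPointer());
}

// Access that may touch bytes both before and after Ptr, anywhere
// within the underlying object.
MemoryLocation accessBeforeOrAfter(const Value *Ptr) {
  return MemoryLocation(Ptr, LocationSize::beforeOrAfterPointer());
}
```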

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Florian Hahn bd0b1311db
[VPlan] Turn VPReplicateRecipe into a VPValue.
Update VPReplicateRecipe to inherit from VPValue. This still does not
update scalarizeInstruction to set the result for the VPValue of
VPReplicateRecipe, because this first requires tracking scalar values in
VPTransformState.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D91500
2020-11-26 13:50:24 +00:00
David Stenberg 384996f9e1 [IndVarSimplify] Fix Modified status when handling dead PHI nodes
When bailing out in rewriteLoopExitValues() you could be left with PHI
nodes in the DeadInsts vector. Those would not be handled by the use of
RecursivelyDeleteTriviallyDeadInstructions() in IndVarSimplify. This
resulted in the IndVarSimplify pass returning an incorrect modified
status. This was caught by the expensive check introduced in D86589.

This patch changes IndVarSimplify so that it deletes those PHI nodes,
using RecursivelyDeleteDeadPHINode().

This fixes PR47486.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D91153
2020-11-26 14:28:21 +01:00
Zhengyang Liu 345fcccb33 Fix use-of-uninitialized-value in rG75f50e15bf8f
Differential Revision: https://reviews.llvm.org/D71126
2020-11-26 01:39:22 -07:00
Max Kazantsev 664e1da485 [LoopLoadElim] Make sure all loops are in simplify form. PR48150
LoopLoadElim may end up expanding an AddRec from a loop
which is not the current loop. This loop may not be in simplify
form. We figure this out after the point of no return, so we cannot
bail out in this case.

AddRec requires simplify form to expand. The only way to ensure
this does not crash is to simplify all loops beforehand.

The issue only exists in the new PM. The old PM requires the LoopSimplify
pass, which simplifies all loops before the opt begins.

Differential Revision: https://reviews.llvm.org/D91525
Reviewed By: asbirlea, aeubanks
2020-11-26 10:51:11 +07:00
Roman Lebedev a8d74517dc
[PassManager] Run Induction Variable Simplification pass *after* Recognize loop idioms pass, not before
Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does.
I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand,
but I'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as it can be seen in the phase-ordering test.

LoopIdiom runs on two types of loops: countable ones, and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, well, they should have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter if IndVars has already canonicalized them.
So I don't really see why we'd want the current ordering.

Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we could then adjust accordingly.

While this is quite likely beneficial in-the-wild already,
it's a required part for the full motivational pattern
behind `left-shift-until-bittest` loop idiom (D91038).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91800
2020-11-25 19:20:07 +03:00
Cullen Rhodes 1ba4b82f67 [LAA] NFC: Rename [get]MaxSafeRegisterWidth -> [get]MaxSafeVectorWidthInBits
MaxSafeRegisterWidth is a misnomer since it actually returns the maximum
safe vector width. 'Register' suggests it relates directly to a physical
register, whereas it could be a vector spanning one or more physical
registers.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91727
2020-11-25 13:06:26 +00:00
Florian Hahn ad5b83ddcf
[VPlan] Add VPReductionSC to VPUser::classof, unify VPValue IDs.
This is a follow-up to 00a6601136 to make
isa<VPReductionRecipe> work and unifies the VPValue ID names, by making
sure they all consistently start with VPV*.
2020-11-25 11:08:25 +00:00
David Green e0c479cd0e [VPlan] Switch VPWidenRecipe to be a VPValue
Similar to other patches, this makes VPWidenRecipe a VPValue. Because of
the way it interacts with the reduction code it also slightly alters the
way that VPValues are registered, removing the up-front NeedDef and
using getOrAddVPValue to create them on-demand if needed instead.

Differential Revision: https://reviews.llvm.org/D88447
2020-11-25 08:25:06 +00:00
David Green 00a6601136 [VPlan] Turn VPReductionRecipe into a VPValue
This converts the VPReductionRecipe into a VPValue, like other
VPRecipes, in preparation for traversing def-use chains. It also makes
it a VPUser, now storing the used VPValues as operands.

It doesn't yet change how the VPReductionRecipes are created. It will
need to call replaceAllUsesWith from the original recipe they replace,
but that is not done yet, as VPWidenRecipe needs to be created first.

Differential Revision: https://reviews.llvm.org/D88382
2020-11-25 08:25:05 +00:00
Kazu Hirata 1c82d32089 [CHR] Use pred_size (NFC) 2020-11-24 22:52:30 -08:00
Max Kazantsev 28d7ba1543 [IndVars] Use more precise context when eliminating narrowing
When deciding to widen a narrow use, we may need to prove some facts
about it. For the proof, a context instruction is used. Currently we take
the instruction being widened as the context.

However, we may be more precise here if we take as context the point that
dominates all users of the instruction being widened.

Differential Revision: https://reviews.llvm.org/D90456
Reviewed By: skatkov
2020-11-25 11:47:39 +07:00
Philip Reames 10ddb927c1 [SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC]
Some older code - and code copied from older code - still directly tested against the singleton result of SE::getCouldNotCompute.  Using the isa<SCEVCouldNotCompute> form is both shorter and more readable.
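
A small before/after sketch (variable names hypothetical):

```
#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;

bool hasComputableExitCount(ScalarEvolution &SE, const SCEV *ExitCount) {
  // Older style: compare against the singleton directly.
  //   return ExitCount != SE.getCouldNotCompute();
  // Preferred style from this commit: use the isa<> form.
  return !isa<SCEVCouldNotCompute>(ExitCount);
}
```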
2020-11-24 18:47:49 -08:00
Sanjay Patel 678b9c5dde [InstCombine] try difference-of-shifts factorization before negator
We need to preserve wrapping flags to allow better folds.
The cases with geps may be non-intuitive, but that appears to agree with Alive2:
https://alive2.llvm.org/ce/z/JQcqw7
We create 'nsw' ops independent from the original wrapping on the sub.
2020-11-24 13:56:30 -05:00
Philip Reames 075468621c [LoopVec] Add a minor clarifying comment 2020-11-24 10:45:06 -08:00
Teresa Johnson 6e4c1cf293 [ThinLTO/WPD] Enable -wholeprogramdevirt-skip in ThinLTO backends
Previously this option could be used to skip devirtualizations of the
given functions in regular LTO and in the ThinLTO indexing step. This
change allows them to be skipped in the backend as well, which is useful
when debugging WPD in a distributed ThinLTO backend.

Differential Revision: https://reviews.llvm.org/D91812
2020-11-24 09:35:07 -08:00
Ayal Zaks 32d9a386bf [LV] Keep Primary Induction alive when folding tail by masking
Fix PR47390.

The primary induction should be considered alive when folding tail by masking,
because it will be used by said masking; even when it may otherwise appear
useless: feeding only its own 'bump' (which is correctly considered dead) and
serving as the 'bump' of another induction variable, which may wrongfully want
to consider its bump (= the primary induction) dead.

Differential Revision: https://reviews.llvm.org/D92017
2020-11-24 15:12:54 +02:00
Arthur Eubanks 932e4f8815 [FunctionAttrs][NPM] Fix handling of convergent
The legacy pass didn't properly detect indirect calls.

We can still remove the convergent attribute when there are indirect
calls. The LangRef says:

> When it appears on a call/invoke, the convergent attribute indicates
that we should treat the call as though we’re calling a convergent
function. This is particularly useful on indirect calls; without this we
may treat such calls as though the target is non-convergent.

So don't skip handling of convergent when there are unknown calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89826
2020-11-23 21:09:41 -08:00
Philip Reames 1a9c72f8a8 [LoopVec] Reuse a lambda [NFC]
Minor code refactor to improve readability.
2020-11-23 21:07:34 -08:00
Philip Reames b06a2ad94f [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE)
A uniform load is one which loads from a uniform address across all lanes. As currently implemented, we cost model such loads as if we did a single scalar load + a broadcast, but the actual lowering replicates the load once per lane.

This change tweaks the lowering to use the REPLICATE strategy by marking such loads (and the computation leading to their memory operand) as uniform after vectorization. This is a useful change in itself, but its real purpose is to pave the way for a following change which will generalize our uniformity logic.

In review discussion, there was an issue raised with coupling cost modeling with the lowering strategy for uniform inputs.  The discussion on that item remains unsettled and is pending larger architectural discussion.  We decided to move forward with this patch as is, and revise as warranted once the bigger picture design questions are settled.

Differential Revision: https://reviews.llvm.org/D91398
2020-11-23 15:32:17 -08:00
Sanjay Patel ab29f091eb [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps
This is a retry of 324a53205. I cautiously reverted that at 6aa3fc4
because the rules about gep math were not clear. Since then, we
have added this line to LangRef for gep inbounds:
"The successive addition of offsets (without adding the base address)
does not wrap the pointer index type in a signed sense (nsw)."

See D90708 and post-commit comments on the revert patch for more details.
2020-11-23 16:50:09 -05:00
Arthur Eubanks 3c811ce4f3 [NPM] Share pass building options with legacy PM
We should share options when possible.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91741
2020-11-23 13:04:05 -08:00
Sjoerd Meijer 33b2c88fa8 [LoopFlatten] Widen IV, support ZExt.
I disabled the widening in fa5cb4b because it ran into an assert, which was
related to replacing values with different types. I forgot that an extend could
also be a zero-extend, which I have added now. This means that the approach now
is to create and insert a trunc value of the outer loop for each user, and use
that to replace IV values.

Differential Revision: https://reviews.llvm.org/D91690
2020-11-23 08:57:19 +00:00
Kazu Hirata df73b8c174 [ValueMapper] Remove unused declaration remapFunction (NFC)
The function declaration with two parameters was introduced on Apr 16
2016 in commit f0d73f95c1 without a
corresponding definition.
2020-11-22 21:52:03 -08:00
Kazu Hirata 186d129320 [hwasan] Remove unused declaration shadowBase (NFC)
The function was introduced on Jan 23, 2019 in commit
73078ecd38.

Its definition was removed on Oct 27, 2020 in commit
0930763b4b, leaving the declaration
unused.
2020-11-22 20:08:51 -08:00
Kazu Hirata def7cfb7ff [InstCombine] Use is_contained (NFC) 2020-11-21 15:47:11 -08:00
Alexey Bataev 0b420d674a [SLP][NFC] Fix assert condition in newTreeEntry, NFC. 2020-11-20 13:25:21 -08:00
Hongtao Yu f3c445697d [CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persistent. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persistent, or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple of issues:

1. IR optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are in the IR till the very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that come with an atomic operation while blocking desired optimizations as little as possible. More specifically, the semantics associated with the new intrinsic enforce that a pseudo probe be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect, so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to each intrinsic so that they are uniquely identified and not merged, in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
2020-11-20 10:39:24 -08:00
Jamie Schmeiser 7f6360cdc6 Reland: Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager.  Enable memoryssa for loopsink with new pass manager.  This
combination exposed a bug that was previously fixed for loopsink
without memoryssa.  When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation.  This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.

Respond to review comments.  Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control.  Update tests accordingly.

Respond to review comments.  Add options controlling whether memoryssa is
used for loop sink, defaulting to off.  Expand testing based on these
options.

Respond to review comments.  Properly indicated preserved analyses.

This relanding addresses a compile-time performance problem by moving the
test for profile data earlier to avoid unnecessary computations.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
2020-11-20 10:26:33 -05:00
Arthur Eubanks b77436047a [PGO] Make -disable-preinline work with NPM
Fixes cspgo_profile_summary.ll under NPM.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D91826
2020-11-19 22:58:55 -08:00
Arthur Eubanks 513d165b80 Port -lower-matrix-intrinsics-minimal to NPM
This reuses the existing lower-matrix-intrinsics pass rather than going
the legacy pass route of creating a new pass.

Use this new variant in the NPM -O0 pipeline.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91811
2020-11-19 17:42:48 -08:00
Florian Hahn 7fa14a7c69
[ConstraintElimination] Decompose GEP with arbitrary offsets.
This patch decomposes `GEP %x, %offset` as 0 + 1 * %x + 1 * %offset.
2020-11-19 22:49:21 +00:00
Geoffrey Martin-Noble b156514f8d Remove unused private fields
Unused since https://reviews.llvm.org/D91762 and triggering
-Wunused-private-field

```
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:365:13: error: private field 'GetArgTLS' is not used [-Werror,-Wunused-private-field]
  Constant *GetArgTLS;
            ^
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:366:13: error: private field 'GetRetvalTLS' is not used [-Werror,-Wunused-private-field]
  Constant *GetRetvalTLS;
```

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D91820
2020-11-19 13:54:54 -08:00
Roman Lebedev a91e96702a
[InstCombine] Fold `and(shl(zext(x), width(SIGNMASK) - width(%x)), SIGNMASK)` to `and(sext(%x), SIGNMASK)`
One less instruction, and it reduces the use count of the zext.
As alive2 confirms, we're fine with all the weird combinations of
undef elts in constants, but unless the shift amount was undef
for a lane, we must sanitize undef mask to zero, since sign bits
are no longer zeros.

https://rise4fun.com/Alive/d7r
```
----------------------------------------
Optimization: zz
Precondition: ((C1 == (width(%r) - width(%x))) && isSignBit(C2))
  %o0 = zext %x
  %o1 = shl %o0, C1
  %r = and %o1, C2
=>
  %n0 = sext %x
  %r = and %n0, C2

Done: 2016
Optimization is correct!
```
2020-11-20 00:31:27 +03:00
Jianzhou Zhao 6c1c308c0e Remove dead code from DFSanFunction::get*TLS*()
Clean up more dead code after D84704

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D91762
2020-11-19 21:10:37 +00:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
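
A hypothetical call site after this change, with the size stated explicitly:

```
#include "llvm/Analysis/MemoryLocation.h"
using namespace llvm;

MemoryLocation locForStore(const Value *Ptr) {
  // Explicitly state the accessed size (8 bytes here, as an example)
  // rather than silently falling back to LocationSize::unknown().
  return MemoryLocation(Ptr, LocationSize::precise(8));
}
```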
2020-11-19 21:45:52 +01:00
Sander de Smalen 41c9f4c1ce [LoopVectorize] NFC: Fix unused variable warning for MaxSafeDepDist
rGf571fe6df585127d8b045f8e8f5b4e59da9bbb73 led to a warning of an unused
variable for MaxSafeDepDist (written but not used). It seems this
variable and assignment can be safely removed.
2020-11-19 17:41:35 +00:00
Joseph Huber da8bec47ab [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.

Reviewers: jdoerfert, grokos

Differential Revision: https://reviews.llvm.org/D87946
2020-11-19 12:01:53 -05:00
Simon Moll a1de391dae [LV][NFC-ish] Allow vector widths over 256 elements
The assertion that vector widths are <= 256 elements was hard-wired in the LV code. E.g., VE allows for vectors up to 512 elements. Test against the TTI vector register bit width instead - this is an NFC for non-asserting builds.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D91518
2020-11-19 10:58:29 +01:00
Max Kazantsev 515105f46b [NFC] Remove comment (committed ahead of time by mistake) 2020-11-19 16:28:34 +07:00
Max Kazantsev 7c601d09a7 [NFC] Move code earlier as preparation for further changes 2020-11-19 16:27:23 +07:00
Andrew Wei ea7ab5a42c [IndVarSimplify] Notify top most loop to drop cached exit counts
Some nested loops may share the same ExitingBB, so after we finish FoldExit,
we need to notify OuterLoop and SCEV to drop any stored trip count.

Patched by: guopeilin
Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D91325
2020-11-19 15:37:54 +08:00
Kazu Hirata 43c0e4f665 [Transforms] Use llvm::is_contained (NFC) 2020-11-18 20:42:22 -08:00
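A hedged sketch of the pattern behind this kind of cleanup (the helper is hypothetical):

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"

// Before: std::find(Values.begin(), Values.end(), V) != Values.end()
// After: the shorter llvm::is_contained form.
template <typename T>
bool contains(llvm::ArrayRef<T> Values, const T &V) {
  return llvm::is_contained(Values, V);
}
```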
Jamie Schmeiser cff479b145 Revert "Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."""
This reverts commit e29292969b.

This apparently causes a regression in compile time (i.e., it slows down).
2020-11-18 16:07:16 -05:00
Roman Lebedev 7bf89c2174
[NFC][Reassociate] Delay checking isLoadCombineCandidate() until after ShouldConvertOrWithNoCommonBitsToAdd() but before haveNoCommonBitsSet()
This appears to improve -O3 compile-time performance somewhat:
https://llvm-compile-time-tracker.com/compare.php?from=87369c626114ae17f4c637635c119e6de0856a9a&to=c04b8271e1609b0dfb20609b40844b0c4324517e&stat=instructions
It doesn't look like delaying it until after haveNoCommonBitsSet() is better:
https://llvm-compile-time-tracker.com/compare.php?from=c04b8271e1609b0dfb20609b40844b0c4324517e&to=b2943d450eaf41b5f76d2dc7350f0a279f64cd99&stat=instructions
2020-11-18 23:57:12 +03:00
Jamie Schmeiser e29292969b Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.""
This reverts commit 562addba65.

Reverted the change too quickly; the failing test cases passed on the next build.
So reverting the revert (to include the changes).
2020-11-18 15:33:02 -05:00
Florian Hahn 2fead1ac61
[ConstraintElimination] Decompose add nuw/sub nuw.
Make use of the more flexible constraint handling added in
a8a79c9069 to decompose add nuw/sub nuw.
2020-11-18 20:29:30 +00:00
Joseph Huber 97e55cfef5 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped; otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-11-18 15:28:39 -05:00
Nikita Popov f4a3969bff [Inline] Fix incorrectly dropped noalias metadata
This is the same fix as 23aeadb89d,
just for CloneScopedAliasMetadata rather than PropagateCallSiteMetadata.

In this case the previous outcome was incorrectly dropped metadata,
as it was not part of the computed metadata map.

The real change in the test is that the first load now retains
metadata, the rest of the changes are due to changes in metadata
numbering.
2020-11-18 21:22:50 +01:00
Jamie Schmeiser 562addba65 Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."
This reverts commit d4ba28bddc.
2020-11-18 15:17:53 -05:00
Nikita Popov 23aeadb89d [Inline] Fix incorrect noalias metadata application (PR48209)
The VMap also contains a mapping from Argument => Instruction,
where the instruction is part of the original function, not the
inlined one. The code was assuming that all the instructions in
the VMap were inlined.

This was a pre-existing problem for the loop access metadata, but
was extended to the more common noalias metadata by
27f647d117, thus causing miscompiles.

There is a similar assumption inside CloneAliasScopeMetadata(), so
that one likely needs to be fixed as well.
2020-11-18 20:52:58 +01:00
Jamie Schmeiser d4ba28bddc Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager.  Enable memoryssa for loopsink with new pass manager.  This
combination exposed a bug that was previously fixed for loopsink
without memoryssa.  When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation.  This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.

Respond to review comments.  Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control.  Update tests accordingly.

Respond to review comments.  Add options controlling whether memoryssa is
used for loop sink, defaulting to off.  Expand testing based on these
options.

Respond to review comments.  Properly indicated preserved analyses.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
2020-11-18 14:08:42 -05:00
Piotr Sobczak b3b9be4ae7 SpeculativeExecution: Allow speculating more instruction types
Support more instructions in the SpeculativeExecution pass (see the sketch after the list):
- ExtractValue
- InsertValue
- Trunc
- Freeze
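
A simplified sketch of the extended opcode check (the actual pass computes a speculation cost; this form is illustrative only):

```
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Simplified: treat the newly supported opcodes as cheap to speculate.
static bool isNewlySupportedOpcode(const Instruction &I) {
  switch (I.getOpcode()) {
  case Instruction::ExtractValue:
  case Instruction::InsertValue:
  case Instruction::Trunc:
  case Instruction::Freeze:
    return true;
  default:
    return false;
  }
}
```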

Differential Revision: https://reviews.llvm.org/D91688
2020-11-18 17:00:19 +01:00
Roman Lebedev 34ff90ad5d
[Reassociate] Don't convert add-like-or's into add's if they appear to be part of load-combining idiom
As Wei Mi reported in post-commit review
  https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201116/853479.html
teaching -reassociate about add-like-or's (70472f3) results in breaking apart
load widening patterns, and reassociating them.

For now, simply exclude any such `or` that appears to be a root of
load widening idiom from the or->add transformation.

Note that the heuristic is greedy; it doesn't ensure that loads
can *actually* be widened into a single load.
2020-11-18 17:55:02 +03:00
Florian Hahn a8a79c9069
[ConstraintElimination] Refactor constraint extraction (NFC).
This patch generalizes the extraction of a constraint for a given
condition. It allows decompose to return a vector of c * X pairs, which
allows decomposing multiple instructions in the future.

It also adds more clarifying comments.
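
A sketch of the generalized result shape (type and field names are hypothetical, not the patch's actual ones):

```
#include <cstdint>
#include <utility>
#include <vector>

namespace llvm { class Value; }

// A condition decomposes into Constant + sum(Coefficient * Value), so a
// `GEP %x, %offset` becomes 0 + 1 * %x + 1 * %offset.
struct Decomposition {
  int64_t Constant = 0;
  std::vector<std::pair<int64_t, llvm::Value *>> Terms;
};
```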
2020-11-18 13:59:18 +00:00
Benjamin Kramer 4dbe12e866 [SLP] Use the minimum alignment of the load bundle when forming a masked.gather
Instead of the first load. That works when vectorizing contiguous loads,
but not for gathers.

Fixes a miscompile introduced in fcad8d3635.
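
A minimal sketch of the idea (the helper is hypothetical; the real change lives in the SLP vectorizer):

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Alignment.h"
using namespace llvm;

// A masked.gather may load through any pointer in the bundle, so the
// safe alignment is the minimum over all loads, not the first load's.
static Align bundleAlignment(ArrayRef<LoadInst *> Bundle) {
  Align A = Bundle.front()->getAlign();
  for (LoadInst *LI : Bundle.drop_front())
    A = commonAlignment(A, LI->getAlign());
  return A;
}
```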
2020-11-18 12:53:39 +01:00
Max Kazantsev f33118c61c [IndVars] Support different types of ExitCount when optimizing exit conds
In some cases we can handle an IV and an iter count of different types. This is a typical
situation after the IV has been widened. This patch adds support for such cases, when legal.

Differential Revision: https://reviews.llvm.org/D88528
Reviewed By: skatkov
2020-11-18 18:20:05 +07:00
Piotr Sobczak c173f1b8eb SpeculativeExecution: Allow speculating more instruction types
Support more instructions in the SpeculativeExecution pass:
- ExtractElement
- InsertElement
- ShuffleVector

Differential Revision: https://reviews.llvm.org/D91633
2020-11-18 09:46:43 +01:00
Arthur Eubanks 9e3b4f4941 [JumpThreading] Make -print-lvi-after-jump-threading work with NPM 2020-11-17 23:15:20 -08:00
Arthur Eubanks ee7d315cd9 [DCE] Always get TargetLibraryInfo
I don't see any reason not to unconditionally retrieve TLI; it's fairly
cheap.

Fixes calls-errno.ll under NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91476
2020-11-17 20:41:05 -08:00
Nick Desaulniers f4c6080ab8 Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch"
This reverts commit b7926ce6d7.

Going with a simpler approach.
2020-11-17 17:27:14 -08:00
Sanjay Patel 08834979e3 [SLP] avoid unreachable code crash/infloop
Example based on the post-commit comments for D88735.
2020-11-17 15:10:23 -05:00
Sanjay Patel 4a66a1d17a [InstCombine] allow vectors for masked-add -> xor fold
https://rise4fun.com/Alive/I4Ge

  Name: add with pow2 mask
  Pre: isPowerOf2(C2) && (C1 & C2) != 0 && (C1 & (C2-1)) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %n = and i8 %x, C2
  %r = xor i8 %n, C2
2020-11-17 13:36:08 -05:00
Simon Pilgrim f7ebdec987 [InstCombine] visitAnd - remove unnecessary Value *X, *Y shadow variables. NFCI.
Fixes a number of Wshadow warnings.
2020-11-17 17:59:21 +00:00
Simon Pilgrim abf29d9862 [InstCombine] visitAnd - use m_SpecificInt instead of m_APInt + comparison. NFCI.
m_SpecificInt has the same 'no undef element' behaviour as m_APInt so no change there, and anyway we have test coverage for undef elements in the fold.

Noticed while fixing a Wshadow warning about shadow Value *X, *Y variables.
2020-11-17 17:37:10 +00:00
Sanjay Patel f791ad7e1e [InstCombine] remove scalar constraint for mask-of-add fold
https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2
2020-11-17 12:13:45 -05:00
Sanjay Patel 433696911a [InstCombine] relax constraints on mask-of-add
There are 2 changes:
1. Remove the unnecessary one-use check.
2. Remove the unnecessary power-of-2 check.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2
2020-11-17 12:13:44 -05:00
Florian Hahn 52f3714dae [VPlan] Add VPDef class.
This patch introduces a new VPDef class, which can be used to
manage VPValues defined by recipes/VPInstructions.

The idea here is to mirror VPUser for values defined by a recipe. A
VPDef can produce either zero (e.g. a store recipe), one (most recipes)
or multiple (VPInterleaveRecipe) result VPValues.

To traverse the def-use chain from a VPDef to its users, one has to
traverse the users of all values defined by a VPDef.

VPValues now contain a pointer to their corresponding VPDef, if one
exists. To traverse the def-use chain upwards from a VPValue, we first
need to check if the VPValue is defined by a VPDef. If it does not have
a VPDef, this means we have a VPValue that is not directly defined
iniside the plan and we are done.

If we have a VPDef, it is defined inside the region by a recipe, which
is a VPUser, and the upwards def-use chain traversal continues by
traversing all its operands.

Note that we need to add an additional field to VPValue to link them
to their defs. The space increase is going to be offset by being able to
remove the SubclassID field in future patches.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D90558
2020-11-17 16:18:11 +00:00
Matt Arsenault c5ce6036c1 Linker: Fix linking of byref types
This wasn't properly remapping the type like with the other
attributes, so this would end up hitting a verifier error after
linking different modules using byref.
2020-11-17 11:02:04 -05:00
Anton Afanasyev 0a1d315f9f [SLPVectorizer] Fix assert 2020-11-17 18:46:31 +03:00
Anton Afanasyev fcad8d3635 [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic
For the scattered operands of load instructions it makes sense
to use the gathering load intrinsic, which can be lowered to a native instruction
for X86/AVX512 and ARM/SVE. This also enables building the
vectorization tree with entries containing scattered operands.
The next step is to add scattered store.

Fixes PR47629 and PR47623

Differential Revision: https://reviews.llvm.org/D90445
2020-11-17 18:11:45 +03:00
Florian Hahn 13042da5cb
[ConstraintElimination] Add support for And.
When processing conditional branches, if the condition is an AND of 2 compares
and the true successor only has the current block as predecessor, queue both
conditions for the true successor.
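
A hedged sketch of this handling (the worklist type and helper are hypothetical):

```
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// For `br (and %a, %b), %T, %F`: if %T has the current block as its only
// predecessor, both %a and %b are known to hold on entry to %T.
static void queueConditions(
    BranchInst *Br, BasicBlock *BB,
    SmallVectorImpl<std::pair<BasicBlock *, Value *>> &Worklist) {
  Value *A, *B;
  BasicBlock *TrueBB = Br->getSuccessor(0);
  if (match(Br->getCondition(), m_And(m_Value(A), m_Value(B))) &&
      TrueBB->getSinglePredecessor() == BB) {
    Worklist.push_back({TrueBB, A});
    Worklist.push_back({TrueBB, B});
  }
}
```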
2020-11-17 14:12:15 +00:00
Sander de Smalen f571fe6df5 Reland [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
This relands https://reviews.llvm.org/D91059 and reverts commit
30fded75b4.

GetRegUsage now returns 0 when Ty is not a valid vector element type.
2020-11-17 13:45:10 +00:00
Yevgeny Rouban a57fe210ff [JumpThreading] Fix branch probabilities in DuplicateCondBranchOnPHIIntoPred()
When instructions are cloned from block BB to PredBB in the method
DuplicateCondBranchOnPHIIntoPred(), the number of successors of PredBB
changes from 1 to the number of successors of BB. So we have to copy
branch probabilities from BB to PredBB.

Reviewed By: Kazu Hirata

Differential Revision: https://reviews.llvm.org/D90841
2020-11-17 14:40:50 +07:00
Max Kazantsev 63dd1734b2 [NFC] Collect ext users into vector instead of finding them twice 2020-11-17 14:01:43 +07:00
Kazu Hirata 1da60f1d44 [Transforms] Use pred_empty (NFC) 2020-11-16 22:09:14 -08:00
Kazu Hirata 5935952c31 [SanitizerCoverage] Use [&] for lambdas (NFC) 2020-11-16 21:45:21 -08:00
Arthur Eubanks 7de6dcd246 [Debugify] Skip debugifying on special/immutable passes
With a function pass manager, debuginfo metadata would be inserted while
processing the pass manager itself, before getting to the function passes,
causing debugify to be skipped while running the function passes.

Skip special passes + verifier + printing passes. Compared to the legacy
implementation of -debugify-each, this additionally skips verifier
passes. Probably no need to update the legacy version since it will be
obsolete soon.

This fixes 2 instcombine tests using -debugify-each under NPM.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D91558
2020-11-16 20:39:46 -08:00
Sjoerd Meijer fa5cb4b936 [LoopFlatten] Disable IV widening
Disable widening of the IV in LoopFlatten while I investigate an assertion
failure. Please note that the pass is also not yet enabled by default.
2020-11-16 22:30:52 +00:00
Michael Liao f375885ab8 [InferAddrSpace] Teach to handle assumed address space.
- In certain cases, a generic pointer could be assumed as a pointer to
  the global memory space or other spaces. With a dedicated target hook
  to query that address space from a given value, the infer-address-space
  pass could infer and propagate it to all its users.

Differential Revision: https://reviews.llvm.org/D91121
2020-11-16 17:06:33 -05:00
Florian Hahn 5a4ca8b550 [ConstraintElimination] Add support for Or.
When processing conditional branches, if the condition is an OR of 2 compares
and the false successor only has the current block as predecessor, queue both
negated conditions for the false successor.
2020-11-16 21:48:38 +00:00
Philip Reames 2240d3d054 [LoopVec] Introduce an api for detecting uniform memory ops
Split off D91398 at request of reviewer.
2020-11-16 13:30:48 -08:00
Arnold Schwaighofer d861cc0e43 [coro] Async coroutines: Make sure we can handle control flow in suspend point dispatch function
Create a valid basic block with a terminator before we call
InlineFunction.

Differential Revision: https://reviews.llvm.org/D91547
2020-11-16 11:59:02 -08:00
Sanjay Patel 4e68bc0999 Revert "[InstCombine] add multi-use demanded bits fold for add with low-bit mask"
This reverts commit e56103d250.
There is a stage2 msan failure blamed on this commit:
http://lab.llvm.org:8011/#/builders/74/builds/888/steps/9/logs/stdio
2020-11-16 14:48:09 -05:00
Arthur Eubanks aeb0fdff35 [SimplifyCFG] Respect optforfuzzing in NPM pass
Regression caused by refactoring in
cdd006eec9.

See discussion in https://reviews.llvm.org/D89917.

Reviewed By: arsenm, morehouse

Differential Revision: https://reviews.llvm.org/D91473
2020-11-16 09:56:37 -08:00
Xun Li 985c524001 [Coroutine] Allocas used by StoreInst does not always escape
In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped.
This is a bit too conservative. Specifically, in non-optimized build mode, it's common to have patterns of code that first store an alloca somewhere and then load it right away.
These uses should be handled without conservatively marking them escaped.

This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.

Differential Revision: https://reviews.llvm.org/D91305
2020-11-16 09:14:44 -08:00
Florian Hahn 8dbe44cb29 Add pass to add !annotate metadata from @llvm.global.annotations.
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.annotations, which is generated using
__attribute__((annotate("_name"))) on functions in Clang.

This has been discussed on llvm-dev as part of
    RFC: Combining Annotation Metadata and Remarks
    http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D91195
2020-11-16 14:57:11 +00:00
Benjamin Kramer 2e7455f00a [LoopFlatten] Fold variable into assert. NFC. 2020-11-16 11:51:39 +01:00
Sjoerd Meijer 9aa773381b [LoopFlatten] Widen the IV
Widen the IV to the widest available and legal integer type, which makes this
transformation always safe so that we can skip overflow checks.

Motivation is to let this pass trigger on 64-bit targets too, and this is the
last patch in a series to achieve this: D90402 moves pass LoopFlatten to just
before IndVarSimplify so that IVs are not already widened, D90421 factors out
widening from IndVarSimplify into Utils/SimplifyIndVar so that we can also use
it in LoopFlatten.

Differential Revision: https://reviews.llvm.org/D90640
2020-11-16 10:20:13 +00:00
Max Kazantsev b4624f65cf Recommit "[NFC] Move code between functions as a preparation step for further improvement"
The bug should be fixed now.
2020-11-16 14:30:34 +07:00
Kazu Hirata 147ccc848a [JumpThreading] Call eraseBlock when folding a conditional branch
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.

Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.
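
A sketch of the call site this adds (surrounding code hypothetical):

```
#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/IR/BasicBlock.h"
using namespace llvm;

// After folding BB's conditional branch into an unconditional one,
// drop BPI's cached probabilities so no stale edge data survives.
static void notifyBPIAfterFold(BranchProbabilityInfo *BPI, BasicBlock *BB) {
  if (BPI)
    BPI->eraseBlock(BB);
}
```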

This patch is one of the steps before we can safely re-apply D91017.

Differential Revision: https://reviews.llvm.org/D91511
2020-11-15 22:29:30 -08:00
Kazu Hirata 0888eaf3fd [Loop Fusion] Use pred_empty and succ_empty (NFC) 2020-11-15 20:32:57 -08:00
Kazu Hirata 0c03d1328c [ADCE] Use succ_empty (NFC) 2020-11-15 19:52:59 -08:00
Kazu Hirata 43a6a1e928 [TRE] Use successors(BB) (NFC) 2020-11-15 19:12:49 -08:00
Kazu Hirata 918e3439e2 [SanitizerCoverage] Use llvm::all_of (NFC) 2020-11-15 19:01:20 -08:00
Serguei Katkov 400f6edce7 [IRCE] Use the same min runtime iteration threshold for BPI and BFI checks
In the last change to IRCE the BPI is ignored if BFI is present, however
BFI and BPI have different thresholds. Specifically, the BPI approach checks only
the latch exit probability, so it is expected that if the loop has only one exit block (the latch)
the behavior with BFI and BPI should be the same.

The BPI approach by default uses a threshold of 10, so a loop with an estimated
number of iterations below 10 is not considered for IRCE optimization.
The BFI approach uses the default value 3, and this is inconsistent.

The CL modifies the code to use the same threshold for both approaches.

The test is updated because it has two side exits (besides the latch), each with a
probability of 1/16, so BFI estimates the number of runtime iterations at about 7
(1/16 + 1/16 + some for the latch) and the test fails.

Reviewers: mkazantsev, ebrevnov
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D91230
2020-11-16 09:21:50 +07:00
Sanjay Patel 6ddc237766 [InstCombine] reduce code for flip of masked bit; NFC
There are 1-2 potential follow-up NFC commits to reduce
this further on the way to generalizing this for vectors.

The operand replacing path should be dead code because demanded
bits handles that more generally (D91415).
2020-11-15 15:43:34 -05:00
Sanjay Patel e56103d250 [InstCombine] add multi-use demanded bits fold for add with low-bit mask
I noticed an add example like the one from D91343, so here's a similar patch.
The logic is based on existing code for the single-use demanded bits fold.
But I only matched a constant instead of using computeKnownBits on the
operands because that was the motivating pattern that I noticed.

I think this will allow removing a special-case (but incomplete) dedicated
fold within visitAnd(), but I need to untangle the existing code to be sure.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2

Differential Revision: https://reviews.llvm.org/D91415
2020-11-15 15:09:49 -05:00
Florian Hahn 0c119ba8a8 [VPlan] Use VPValue def for VPWidenGEPRecipe.
This patch turns VPWidenGEPRecipe into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84683
2020-11-15 15:12:47 +00:00
Arthur Eubanks 6e04da0a5a [DCE] Port -redundant-dbg-inst-elim to NPM
This is used to test RemoveRedundantDbgInstrs(), which is used by other
passes.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D91477
2020-11-14 16:55:20 -08:00
Florian Hahn a70b511e78 Recommit "[VPlan] Use VPValue def for VPWidenSelectRecipe."
This reverts the revert commit c8d73d939f.

It includes a fix for cases where we missed inserting VPValues
for some selects, which should fix PR48142.
2020-11-14 20:00:25 +00:00
Arnold Schwaighofer 8fb73cecfd [Coroutines] Make sure that async coroutine context size is a multiple of the alignment requirement
This simplifies the code the allocator has to execute.

Differential Revision: https://reviews.llvm.org/D91471
2020-11-14 04:56:56 -08:00
Roman Lebedev 6861d938e5
Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bug report was open for a significant amount of time,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.

I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.

This reverts commit 7bdad08429,
and some of its follow-ups that don't stand on their own.
2020-11-14 13:12:38 +03:00
Akira Hatanaka 2ed3a76745 [ObjC][ARC] Add and use a function which finds and returns the single
dependency. NFC

Use findSingleDependency in place of FindDependencies and stop passing a
set of Instructions around. Modify FindDependencies to return a boolean
flag which indicates whether the dependencies it has found are all
valid.
2020-11-13 14:02:58 -08:00
Akira Hatanaka 00d0974e62 Move variable declarations to functions in which they are used. NFC 2020-11-13 14:02:58 -08:00
Guozhi Wei a20220d25b [AlwaysInliner] Call mergeAttributesForInlining after inlining
As in inlineCallIfPossible and InlinerPass, mergeAttributesForInlining
should be called after inlining to merge the callee's attributes into the caller.
But it is not called in AlwaysInliner, which causes the caller's attributes to be
inconsistent with the inlined code.

The attached test case demonstrates that the attribute "min-legal-vector-width"="512" is
not merged into the caller without this patch, which causes a failure in SelectionDAG
when lowering the inlined AVX512 intrinsic.
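
A hedged sketch of the missing step (the helper and its shape are hypothetical; InlineFunction and mergeAttributesForInlining are the real APIs):

```
#include "llvm/IR/Attributes.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/Transforms/Utils/Cloning.h"
using namespace llvm;

// Sketch: after a successful inline, merge the callee's attributes
// (e.g. "min-legal-vector-width") into the caller.
static bool inlineAndMerge(CallBase &CB, Function &Caller, Function &Callee) {
  InlineFunctionInfo IFI;
  if (!InlineFunction(CB, IFI).isSuccess())
    return false;
  AttributeFuncs::mergeAttributesForInlining(Caller, Callee);
  return true;
}
```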

Differential Revision: https://reviews.llvm.org/D91446
2020-11-13 12:01:35 -08:00
Jianzhou Zhao 06c9b4aaa9 Extend the dfsan store/load callback with write/read address
This helped debugging.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D91236
2020-11-13 19:46:32 +00:00
Nikita Popov 02dda1c659 [Local] Clean up EmitGEPOffset
Handle the emission of the add in a single place, instead of three
different ones.

Don't emit an unnecessary add with zero to start with. It will get
dropped by InstCombine, but we may as well not create it in the
first place. This also means that InstCombine does not need to
specially handle this extra add.

This is conceptually NFC, but can affect worklist order etc.
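
A hedged sketch of single-place add emission (names hypothetical):

```
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Accumulate offset terms, materializing an `add` only when needed and
// never emitting the initial add-with-zero.
static Value *accumulateOffset(IRBuilder<> &B, Value *Acc, Value *Term) {
  if (auto *C = dyn_cast<ConstantInt>(Term))
    if (C->isZero())
      return Acc; // nothing to add
  if (!Acc)
    return Term; // first term: no add needed yet
  return B.CreateAdd(Acc, Term);
}
```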
2020-11-13 18:30:56 +01:00
David Zarzycki 5a327f3337 Revert "[NFC] Move code between functions as a preparation step for further improvement"
This reverts commit 08016ac32b.

A bunch of tests are failing my local two stage builder.
2020-11-13 10:52:49 -05:00
serge-sans-paille 95537f4508 llvmbuildectomy - compatibility with ocaml bindings
Use exact component name in add_ocaml_library.
Make expand_topologically compatible with new architecture.
Fix quoting in is_llvm_target_library.
Fix LLVMipo component name.
Write release note.
2020-11-13 14:35:52 +01:00
Florian Hahn 8bb6347939
Add !annotation metadata and remarks pass.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.

It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.

The intended use cases for this new metadata are annotating
'interesting' instructions, and the remarks should provide additional
insight into transformations applied to a program.

To motivate this, consider these specific questions we would like to get answered:

* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?

Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

Reviewed By: thegameg, jdoerfert

Differential Revision: https://reviews.llvm.org/D91188
2020-11-13 13:24:10 +00:00
Max Kazantsev 08016ac32b [NFC] Move code between functions as a preparation step for further improvement 2020-11-13 18:12:45 +07:00
Max Kazantsev 185cface2e [NFC] Refactor lambda into static function 2020-11-13 17:42:23 +07:00
Max Kazantsev 68490aec4e [NFC] Move lambdae into static functions 2020-11-13 17:07:25 +07:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These functions store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Max Kazantsev 9224d322a2 [IndVars] Fix branches exiting by true with invariant conditions
Forgot to invert the condition for them.
2020-11-13 15:52:00 +07:00
Max Kazantsev 0a1d394bf3 [NFC] Refactor loop-invariant getters to return Optional 2020-11-13 15:03:10 +07:00
Arthur Eubanks b9406121a0 [NFC] Removed unused variable
Obsolete as of https://reviews.llvm.org/D91046.
2020-11-12 22:24:57 -08:00
Akira Hatanaka 09266e4af0 [ObjC][ARC] Clear the lists of basic blocks and instructions before
continuing the loop

This fixes a bug introduced in c6f1713c46.
2020-11-12 22:20:02 -08:00
Max Kazantsev 77efb73c67 [IndVars] Replace checks with invariants if we cannot remove them
If we cannot prove that the check is trivially true, but can prove that it either
fails on the 1st iteration or never fails, we can replace it with a first-iteration check.

Differential Revision: https://reviews.llvm.org/D88527
Reviewed By: skatkov
2020-11-13 12:23:12 +07:00
Sanjay Patel 0abde4bc92 [InstCombine] fold sub of low-bit masked value from offset of same value
There might be some demanded/known bits way to generalize this,
but I'm not seeing it right now.

This came up as a regression when I was looking at a different
demanded bits improvement.

https://rise4fun.com/Alive/5fl

  Name: general
  Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0
  %a1 = add i8 %x, C1
  %a2 = and i8 %x, C2
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, ~C2

  Name: test 1
  %a1 = add i8 %x, 192
  %a2 = and i8 %x, 10
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -11

  Name: test 2
  %a1 = add i8 %x, -108
  %a2 = and i8 %x, 3
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -4
2020-11-12 20:10:28 -05:00
Jianzhou Zhao 2d96859ea6 [msan] Break the getShadow loop after matching an argument
Reviewed-by: eugenis

Differential Revision: https://reviews.llvm.org/D91320
2020-11-12 19:48:59 +00:00
Alexander Kornienko 76b6cb515b Fix unused variable warning in release builds 2020-11-12 18:14:06 +01:00
Jamie Schmeiser f79b483385 [NFC intended] Refactor SinkAndHoistLICMFlags to allow others to construct without exposing internals
Summary:
Refactor SinkAndHoistLICMFlags from a struct to a class with accessors and constructors to allow other
classes to construct flags with meaningful defaults while not exposing LICM internal details.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea)

Differential Revision: https://reviews.llvm.org/D90482
2020-11-12 15:06:59 +00:00
Xun Li 94a45a8098 Revert "[Coroutine] Allocas used by StoreInst does not always escape"
This reverts commit 8bc7b9278e, which landed by accident.
2020-11-11 21:09:39 -08:00
Max Kazantsev d6dd938589 [IndVars] IV user should not prevent use widening
Sometimes an instruction we are trying to widen is used by the IV
(which means the instruction is the IV increment). Currently this may
prevent its widening. We should ignore such a user because it will be
dead once the transform is done anyway.

Differential Revision: https://reviews.llvm.org/D90920
Reviewed By: fhahn
2020-11-12 12:02:01 +07:00
Xun Li 8bc7b9278e [Coroutine] Allocas used by StoreInst does not always escape
In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped.
This is a bit too conservative. Specifically, in non-optimized build mode, it's common to have patterns of code that first store an alloca somewhere and then load it right away.
These uses should be handled without conservatively marking them escaped.

This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.

Differential Revision: https://reviews.llvm.org/D91305
2020-11-11 20:53:51 -08:00
Max Kazantsev 2e01ceafaa [IndVars] Recognize 'sub nuw' expressed as 'add' for widening
InstCombine canonicalizes 'sub nuw' instructions to 'add' without the
`nuw` flag. The typical case where we see it is decrementing induction
variables. For them, IndVars fails to prove that it's legal to widen them,
and inserts unprofitable `zext`s.

This patch adds recognition of such pattern using SCEV.

Differential Revision: https://reviews.llvm.org/D89550
Reviewed By: fhahn, skatkov
2020-11-12 10:51:29 +07:00
Arnold Schwaighofer 431337662e [coro] Async coroutines: Allow more than 3 arguments in the dispatch function
We need to be able to call function pointers. Inline the dispatch
function.

Also inline the context projection function.

Transfer debug locations from the suspend point to the inlined functions.

Use the function argument index instead of the function argument in
coro.id.async. This solves any spurious use issues.

Coerce the arguments of the tail call function at a suspend point. The LLVM
optimizer seems to drop casts leading to a vararg intrinsic.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D91098
2020-11-11 15:25:28 -08:00
Arthur Eubanks d9cbceb041 [CGSCC][Inliner] Handle new non-trivial edges in updateCGAndAnalysisManagerForPass
Previously the inliner did a bit of a hack by adding ref edges for all
new edges introduced by performing an inline before calling
updateCGAndAnalysisManagerForPass(). This was because
updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call
edges.

This adds handling of non-trivial call edges to
updateCGAndAnalysisManagerForPass(). The inliner previously called
updateCGAndAnalysisManagerForFunctionPass() because it added the newly
introduced edges itself (so updateCGAndAnalysisManagerForPass() only had
to handle promotion), but now it needs to call
updateCGAndAnalysisManagerForCGSCCPass(), since
updateCGAndAnalysisManagerForPass() now handles the new call edges and
function passes cannot add new edges.

We follow the previous path of adding trivial ref edges then letting promotion
handle changing the ref edges to call edges and the CGSCC updates. So
this still does not allow adding call edges that result in an addition
of a non-trivial ref edge.

This is in preparation for better detecting devirtualization. Previously
since the inliner itself would add ref edges,
updateCGAndAnalysisManagerForPass() would think that promotion and thus
devirtualization had happened after any sort of inlining.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91046
2020-11-11 13:43:49 -08:00
Jianzhou Zhao 0dd87825db Add a flag to control whether to propagate labels from condition values to results
Before the change, DFSan always did the propagation. Without
origin tracking, such flows are harder to understand. After
the change, the flag is off by default.

Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D91234
2020-11-11 20:41:42 +00:00
Akira Hatanaka 5e85d00ed6 Move variable declarations to functions in which they are used. NFC 2020-11-11 10:58:43 -08:00
Sander de Smalen 30fded75b4 Revert "[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost."
This reverts commits:
* [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
  b873aba394.
* [LoopVectorizer] Silence warning in GetRegUsage.
  9ff701100a.
2020-11-11 14:41:55 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we merge two KnownBits to get the common/shared bits, and I just fell for the gotcha of trying to use the & operator to merge them.
2020-11-11 12:15:54 +00:00
Sander de Smalen 9ff701100a [LoopVectorizer] Silence warning in GetRegUsage.
This patch silences the warning:
	error: lambda capture 'DL' is not used [-Werror,-Wunused-lambda-capture]
	  auto GetRegUsage = [&DL, &TTI=TTI](Type *Ty, ElementCount VF) {
	                      ~^~~
	1 error generated.

Introduced in:
  https://reviews.llvm.org/rGb873aba3943c067a5efd5303cbdf5aeb0732cf88
2020-11-11 10:54:20 +00:00
Sander de Smalen b873aba394 [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
This is more accurate than dividing the bitwidth (derived from the element count) by the
maximum register size, as it can just reuse whatever has already been calculated for
the legalization of these types.

This change is also necessary when calculating register usage for scalable vectors, where
the legalization of these types cannot be done based on the widest register size, because
that does not take the 'vscale' component into account.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91059
2020-11-11 10:18:50 +00:00
Sander de Smalen 0141f5a49d [LoopVectorizer] NFC: Return ElementCount from compute[Feasible]MaxVF
Interfaces changed to return `ElementCount`:
* LoopVectorizationCostModel::computeMaxVF
* LoopVectorizationCostModel::computeFeasibleMaxVF

This is NFC for fixed-width vectors.

Reviewed By: dmgreen, ctetreau

Differential Revision: https://reviews.llvm.org/D90880
2020-11-11 09:55:06 +00:00
Chen Zheng 4eb8359e74 [EarlyCSE] delete abs/nabs handling
Delete abs/nabs handling in the EarlyCSE pass to avoid bugs related to
hashing values. After abs/nabs are canonicalized to intrinsics in D87188,
we should get the CSE ability for abs/nabs back.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D90734
2020-11-10 21:10:58 -05:00
Florian Hahn c8d73d939f Revert "[VPlan] Use VPValue def for VPWidenSelectRecipe."
This reverts commit a8e50f1c6e.

This reportedly breaks building the Linux kernel.
  https://bugs.llvm.org/show_bug.cgi?id=48142
2020-11-10 22:50:46 +00:00
Bruno Cardoso Lopes dc14542a71 [Coroutines] Add missing llvm.dbg.declare's to cover for more allocas
Tracking local variables across suspend points is still somewhat incomplete.
Consider this coroutine snippet:

```
resumable foo() {
  int x[10] = {};
  int a = 3;
  co_await std::experimental::suspend_always();
  a++;
  x[0] = 1;
  a += 2;
  x[1] = 2;
  a += 3;
  x[2] = 3;
}
```

We can't print `a` or `x` if they turn out to be allocas during
CoroSplit (which happens if you build this code with `-O0` prior to this
commit):

```
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003729 main-noprint`foo() at main-noprint.cpp:43:5
   40     co_await std::experimental::suspend_always();
   41     a++;
   42     x[0] = 1;
-> 43     a += 2;
   44     x[1] = 2;
   45     a += 3;
   46     x[2] = 3;
(lldb) p x
error: <user expression 21>:1:1: use of undeclared identifier 'x'
x
^
```

The generated IR contains a `llvm.dbg.declare` for `x` in its initialization
basic block. After CoroSplit, the `llvm.dbg.declare` might not dominate all of
`x`'s uses, and we lose debug-info quality.

Add `llvm.dbg.value`s to all relevant basic blocks so that, if later
transformations break the dominance, reliable debug info is already in
place. For instance, this BB:

```
await.ready:
  ...
  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
  ...
  %arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
  ...
  %arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```

becomes:

```
await.ready:
  ...
  call void @llvm.dbg.value(metadata [10 x i32]* %x.reload.addr, metadata !751, metadata !DIExpression()), !dbg !753
  ...
  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
  ...
  %arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
  ...
  %arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```

Differential Revision: https://reviews.llvm.org/D90772
2020-11-10 12:36:07 -08:00
Sjoerd Meijer 2ef47910d5 [LoopFlatten] Run it earlier, just before IndVarSimplify
This is a prep step for widening induction variables in LoopFlatten if this is
possible (D90640), to avoid having to perform certain overflow checks. Since
IndVarSimplify may already widen induction variables, we want to run
LoopFlatten just before IndVarSimplify. This is a minor reshuffle, as both
passes were already close to each other.

Differential Revision: https://reviews.llvm.org/D90402
2020-11-10 20:22:41 +00:00
Sjoerd Meijer 706ead0e87 [LoopFlatten] Make it a FunctionPass
This converts LoopFlatten from a LoopPass to a FunctionPass so that we don't
run into the problem of a loop pass deleting an (inner) loop.

Differential Revision: https://reviews.llvm.org/D90940
2020-11-10 20:03:31 +00:00
Florian Hahn a8e50f1c6e
[VPlan] Use VPValue def for VPWidenSelectRecipe.
This patch turns VPWidenSelectRecipe into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84682
2020-11-10 19:39:37 +00:00
Jonas Paulsson 89a1042b6a Make inferLibFuncAttributes() add SExt attribute on second arg to ldexp.
This was missing, as discovered by the SystemZ multistage bot
(http://lab.llvm.org:8011/#/builders/8), where wrong code resulted when this
extension was not performed.
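
Roughly what the inferred attribute looks like (illustrative IR):

```
; The i32 exponent argument is sign-extended to the width the callee's
; ABI expects.
declare double @ldexp(double, i32 signext)

define double @scale(double %x) {
  %r = call double @ldexp(double %x, i32 signext 8)
  ret double %r
}
```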

Thanks for review by Ulrich Weigand and Roman Lebedev.

Differential Revision: https://reviews.llvm.org/D90760
2020-11-10 18:32:15 +01:00
David Green c7e275388e [ARM] Don't aggressively unroll vector remainder loops
We already do not unroll loops with vector instructions under MVE, but
that does not include the remainder loops that the vectorizer produces.
These remainder loops will be rarely executed and are not worth
unrolling, as the trip count is likely to be low if they get executed at
all. Luckily they get llvm.loop.isvectorized to make recognizing them
simpler.

We have wanted to do this for a while but hit issues with low-overhead
loops being reverted due to difficult register allocation. With recent
changes that seems to be less of an issue now.

Differential Revision: https://reviews.llvm.org/D90055
2020-11-10 17:01:31 +00:00
Sanne Wouda dd03881bd5 Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-10 12:04:32 +00:00
Sander de Smalen f47573f9bf [LoopVectorizer] NFC: Propagate ElementCount to more interfaces.
Interfaces changed to take `ElementCount` as parameters:
* LoopVectorizationPlanner::buildVPlans
* LoopVectorizationPlanner::buildVPlansWithVPRecipes
* LoopVectorizationCostModel::selectVectorizationFactor

This patch is NFC for fixed-width vectors.

Reviewed By: dmgreen, ctetreau

Differential Revision: https://reviews.llvm.org/D90879
2020-11-10 11:11:02 +00:00
Max Kazantsev 25755a0159 [NFC] Add flag to disable IV widening in indvar instance
This allows us to have control over IV widening in the pipeline.
2020-11-10 15:10:44 +07:00
Arthur Eubanks 1cbf8e89b5 [NewPM] Port -separate-const-offset-from-gep
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91095
2020-11-09 17:42:36 -08:00
Michael Kruse e5dba2d7e5 [OMPIRBuilder] Start 'Create' methods with lower case. NFC.
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has method names starting with lower-case letters, as all other OpenMPIRBuilder methods already do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.

This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.

I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.

Reviewed By: mehdi_amini, fghanim

Differential Revision: https://reviews.llvm.org/D91109
2020-11-09 19:35:11 -06:00
Xun Li c2cb093d9b [Coroutine] Move all used local allocas to the .resume function
Prior to D89768, any alloca that's used after suspension points will be put on the coroutine frame, and hence will always be reloaded in the resume function.
However, D89768 introduced a more precise way to determine whether an alloca should live on the frame. Allocas that are only used within one suspension region (and hence do not need to live across suspension points) will not be put on the frame. They will remain local to the resume function.
When creating the new entry for the .resume function, the existing logic only moved the allocas from the old entry to the new entry. This covers every alloca from the old entry. However, allocas that are defined after coro.begin are put into a separate basic block during CoroSplit (the PostSpill basic block). We need to make sure these allocas are moved to the new entry as well if they are used.
This patch walks through all allocas and checks whether they are still used but not reachable from the new entry; if so, we move them to the new entry.

Differential Revision: https://reviews.llvm.org/D90977
2020-11-09 17:24:49 -08:00
Sjoerd Meijer e2dcea4489 [LoopFlatten] FlattenInfo bookkeeping. NFC.
Introduce struct FlattenInfo to group some of the bookkeeping. Besides this
being a bit of a clean-up, it is a prep step for upcoming additions (D90640). I
could take things a bit further, but thought this was a good first step that also
keeps this change from getting too large.

Differential Revision: https://reviews.llvm.org/D90408
2020-11-09 14:50:26 +00:00
Florian Hahn f0d76275cb
[VPlan] Print result value for loads in VPWidenMemoryInst (NFC).
For loads, print the result value.
2020-11-09 14:01:29 +00:00
Florian Hahn 537829f2a7
[VPlan] Add isStore helper to VPWidenMemoryInstructionRecipe (NFC).
Move logic to check if the recipe is a store to a helper for easier
reuse.
2020-11-09 14:01:29 +00:00
Florian Hahn fec64de261
[VPlan] Use VPValue def for VPWidenCall.
This patch turns VPWidenCall into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84681
2020-11-09 13:29:41 +00:00
Florian Hahn 091c5c9a18
[VPlan] Add printOperands helper to VPUser (NFC).
Factor out the code for printing operands of a VPUser so it can be
re-used when printing other recipes.
2020-11-09 12:30:57 +00:00
LemonBoy 42732d33cc
[InstCombine] Fix constant-folding of overflowing arithmetic ops on vectors
Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated.
This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead.
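
A hedged sketch (assumed names) of the kind of vector overflow op involved:

```
define { <2 x i32>, <2 x i1> } @h(<2 x i32> %a) {
  ; For vector operands, the overflow member of the result must be a
  ; vector of i1, not a scalar boolean.
  %r = call { <2 x i32>, <2 x i1> } @llvm.sadd.with.overflow.v2i32(<2 x i32> %a, <2 x i32> <i32 1, i32 1>)
  ret { <2 x i32>, <2 x i1> } %r
}
declare { <2 x i32>, <2 x i1> } @llvm.sadd.with.overflow.v2i32(<2 x i32>, <2 x i32>)
```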

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D89628
2020-11-09 14:41:07 +03:00
Tim Northover f7fe7ea24d [MergeFunctions] fix function attribute comparison in FunctionComparator
The comparison of AttributeSets stopped after seeing a matching type attribute.
Subsequent mismatching attributes were not detected causing a crash.
2020-11-09 09:19:11 +00:00
Simon Pilgrim b11eaf5617 [DSE] Don't dereference a dyn_cast<> result - use cast<> instead. NFCI.
We were relying on the dyn_cast<> succeeding - better to use cast<> and have it assert that it's the correct type than dereference a null result.
2020-11-08 13:07:45 +00:00
Simon Pilgrim 0fe91ad463 [InstCombine] foldSelectFunnelShift - block poison in funnel shift value
As raised by @nlopes on D90382 - if this is not a rotate, then the select was blocking poison from the 'shift-by-zero' non-TVal, but a funnel shift won't, so freeze it.
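
A sketch of the resulting IR (illustrative, assuming the select has already been folded away):

```
define i8 @sel_fshl(i8 %x, i8 %y, i8 %sh) {
  ; Replaces: select (%sh == 0), %x, fshl(%x, %y, %sh).
  ; When %sh == 0 the select returned %x and blocked poison coming from
  ; %y; the plain funnel shift would not, so %y is frozen.
  %yf = freeze i8 %y
  %r = call i8 @llvm.fshl.i8(i8 %x, i8 %yf, i8 %sh)
  ret i8 %r
}
declare i8 @llvm.fshl.i8(i8, i8, i8)
```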
2020-11-08 12:58:30 +00:00
Florian Hahn e8dc17a2b7
[LoopInterchange] Skip non SCEV-able operands in cost function.
This fixes a crash when trying to get a SCEV expression for operands
that are not SCEV-able.
2020-11-08 11:41:19 +00:00
Pedro Tammela 5e8ecff0d8 [Reg2Mem] add support for the new pass manager
This patch refactors the pass to accommodate the new pass manager
boilerplate.

Differential Revision: https://reviews.llvm.org/D91005
2020-11-08 11:14:05 +00:00
Kazu Hirata 75e46c6328 [Mem2Reg] Use llvm::count instead of std::count (NFC) 2020-11-07 20:18:47 -08:00
Kazu Hirata c95fff5be7 [JumpThreading] Fix function names (NFC) 2020-11-07 19:35:03 -08:00
Atmn Patel 04a0896487 Revert "[LoopDeletion] Allows deletion of possibly infinite side-effect free loops"
This reverts commit 0b17c6e447. This patch
causes a compile-time error in SCEV.
2020-11-07 00:32:12 -05:00
Atmn Patel 0b17c6e447 [LoopDeletion] Allows deletion of possibly infinite side-effect free loops
From C11 and C++11 onwards, a forward-progress requirement has been
introduced for both languages. In the case of C, loops with non-constant
conditionals that do not have any observable side-effects (as defined by
6.8.5p6) can be assumed by the implementation to terminate, and in the
case of C++, this assumption extends to all functions. The clang
frontend will emit the `mustprogress` function attribute for C++
functions (D86233, D85393, D86841) and emit the loop metadata
`llvm.loop.mustprogress` for every loop in C11 or later that has a
non-constant conditional.

This patch modifies LoopDeletion so that only loops with
the `llvm.loop.mustprogress` metadata or loops contained in functions
that are required to make progress (`mustprogress` or `willreturn`) are
checked for observable side-effects. If these loops do not have an
observable side-effect, then we delete them.

Loops without observable side-effects that do not satisfy the above
conditions will not be deleted.
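
A minimal sketch (illustrative IR) of a loop this can now delete:

```
; No observable side effects, but termination is not provable (e.g. if
; %n == 0 the loop may never exit). The mustprogress attribute (or the
; llvm.loop.mustprogress metadata) now allows LoopDeletion to delete it.
define void @drop_me(i32 %n) mustprogress {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %i.next = add i32 %i, %n
  %c = icmp ne i32 %i.next, 42
  br i1 %c, label %loop, label %exit
exit:
  ret void
}
```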

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86844
2020-11-06 22:06:58 -05:00
Atmn Patel babc224c5d [LoopDeletion] Remove dead loops with no exit blocks
Currently, LoopDeletion refuses to remove dead loops with no exit blocks
because it cannot statically determine the control flow after it removes
the block. This leads to miscompiles if the loop is an infinite loop and
should've been removed.

Differential Revision: https://reviews.llvm.org/D90115
2020-11-06 17:08:34 -05:00
Quentin Colombet a585228027 Prevent LICM and machineLICM from hoisting convergent operations
Results of convergent operations are implicitly affected by the
enclosing control flow and should not be hoisted out of arbitrary
loops.

Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>

Differential Revision: https://reviews.llvm.org/D90361
2020-11-06 10:26:39 -08:00
Arnold Schwaighofer c6543cc6b8 llvm.coro.id.async lowering: Parameterize how to restore the current continuation's context and restart the pipeline after splitting
The `llvm.coro.suspend.async` intrinsic takes a function pointer as its
argument that describes how to restore the current continuation's
context from the context argument of the continuation function. Before,
we assumed that the current context could be restored by loading from the
context argument's first pointer field (`first_arg->caller_context`).

This allows for defining suspension points that reuse the current
context for example.

Also:

llvm.coro.id.async lowering: Add llvm.coro.prepare.async intrinsic

Blocks inlining until after the async coroutine was split.

Also, change the async function pointer's context size position

   struct async_function_pointer {
     uint32_t relative_function_pointer_to_async_impl;
     uint32_t context_size;
   }

And make the position of the `async context` argument configurable. The
position is specified by the `llvm.coro.id.async` intrinsic.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D90783
2020-11-06 06:22:46 -08:00
Florian Hahn d8d1cc647d [SLP] Also try to vectorize incoming values of PHIs.
Currently we do not consider incoming values of PHIs as roots for SLP
vectorization. This means we miss scenarios like the one in the test
case and PR47670.

It appears quite straight-forward to consider incoming values of PHIs as
roots for vectorization, but I might be missing something that makes
this problematic.

In terms of vectorized instructions, this applies to quite a few
benchmarks across MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto

    Same hash: 185 (filtered out)
    Remaining: 52
    Metric: SLP.NumVectorInstructions

    Program                                        base    patch   diff
     test-suite...ProxyApps-C++/HPCCG/HPCCG.test     9.00   27.00  200.0%
     test-suite...C/CFP2000/179.art/179.art.test     8.00   22.00  175.0%
     test-suite...T2006/458.sjeng/458.sjeng.test    14.00   30.00  114.3%
     test-suite...ce/Benchmarks/PAQ8p/paq8p.test    11.00   18.00  63.6%
     test-suite...s/FreeBench/neural/neural.test    12.00   18.00  50.0%
     test-suite...rimaran/enc-3des/enc-3des.test    65.00   95.00  46.2%
     test-suite...006/450.soplex/450.soplex.test    63.00   89.00  41.3%
     test-suite...ProxyApps-C++/CLAMR/CLAMR.test   177.00  250.00  41.2%
     test-suite...nchmarks/McCat/18-imp/imp.test    13.00   18.00  38.5%
     test-suite.../Applications/sgefa/sgefa.test    26.00   35.00  34.6%
     test-suite...pplications/oggenc/oggenc.test   100.00  133.00  33.0%
     test-suite...6/482.sphinx3/482.sphinx3.test   103.00  134.00  30.1%
     test-suite...oxyApps-C++/miniFE/miniFE.test   169.00  213.00  26.0%
     test-suite.../Benchmarks/Olden/tsp/tsp.test    59.00   73.00  23.7%
     test-suite...TimberWolfMC/timberwolfmc.test   503.00  622.00  23.7%
     test-suite...T2006/456.hmmer/456.hmmer.test    65.00   79.00  21.5%
     test-suite...libquantum/462.libquantum.test    58.00   68.00  17.2%
     test-suite...ternal/HMMER/hmmcalibrate.test    84.00   98.00  16.7%
     test-suite...ications/JM/ldecod/ldecod.test   351.00  401.00  14.2%
     test-suite...arks/VersaBench/dbms/dbms.test    52.00   57.00   9.6%
     test-suite...ce/Benchmarks/Olden/bh/bh.test   118.00  128.00   8.5%
     test-suite.../Benchmarks/Bullet/bullet.test   6355.00 6880.00  8.3%
     test-suite...nsumer-lame/consumer-lame.test   480.00  519.00   8.1%
     test-suite...000/183.equake/183.equake.test   226.00  244.00   8.0%
     test-suite...chmarks/Olden/power/power.test   105.00  113.00   7.6%
     test-suite...6/471.omnetpp/471.omnetpp.test    92.00   99.00   7.6%
     test-suite...ications/JM/lencod/lencod.test   1173.00 1261.00  7.5%
     test-suite...0/253.perlbmk/253.perlbmk.test    55.00   59.00   7.3%
     test-suite...oxyApps-C/miniAMR/miniAMR.test    92.00   98.00   6.5%
     test-suite...chmarks/MallocBench/gs/gs.test   446.00  473.00   6.1%
     test-suite.../CINT2006/403.gcc/403.gcc.test   464.00  491.00   5.8%
     test-suite...6/464.h264ref/464.h264ref.test   998.00  1055.00  5.7%
     test-suite...006/453.povray/453.povray.test   5711.00 6007.00  5.2%
     test-suite...FreeBench/distray/distray.test   102.00  107.00   4.9%
     test-suite...:: External/Povray/povray.test   4184.00 4378.00  4.6%
     test-suite...DOE-ProxyApps-C/CoMD/CoMD.test   112.00  117.00   4.5%
     test-suite...T2006/445.gobmk/445.gobmk.test   104.00  108.00   3.8%
     test-suite...CI_Purple/SMG2000/smg2000.test   789.00  819.00   3.8%
     test-suite...yApps-C++/PENNANT/PENNANT.test   233.00  241.00   3.4%
     test-suite...marks/7zip/7zip-benchmark.test   417.00  428.00   2.6%
     test-suite...arks/mafft/pairlocalalign.test   627.00  643.00   2.6%
     test-suite.../Benchmarks/nbench/nbench.test   259.00  265.00   2.3%
     test-suite...006/447.dealII/447.dealII.test   4641.00 4732.00  2.0%
     test-suite...lications/ClamAV/clamscan.test   106.00  108.00   1.9%
     test-suite...CFP2000/177.mesa/177.mesa.test   1639.00 1664.00  1.5%
     test-suite...oxyApps-C/RSBench/rsbench.test    66.00   65.00  -1.5%
     test-suite.../CINT2000/252.eon/252.eon.test   3416.00 3444.00  0.8%
     test-suite...CFP2000/188.ammp/188.ammp.test   1846.00 1861.00  0.8%
     test-suite.../CINT2000/176.gcc/176.gcc.test   152.00  153.00   0.7%
     test-suite...CFP2006/444.namd/444.namd.test   3528.00 3544.00  0.5%
     test-suite...T2006/473.astar/473.astar.test    98.00   98.00   0.0%
     test-suite...frame_layout/frame_layout.test    NaN     39.00   nan%

On ARM64, there appears to be a slight regression on SPEC2006, which
might be interesting to investigate:

   test-suite...T2006/473.astar/473.astar.test   0.9%

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D88735
2020-11-06 12:50:32 +00:00
Sander de Smalen 4a3bb9ea6c [VPlan] NFC: Change VFRange to take ElementCount
This patch changes the type of Start, End in VFRange to be an ElementCount
instead of `unsigned`. This is done as preparation to make VPlans for
scalable vectors, but is otherwise NFC.

Reviewed By: dmgreen, fhahn, vkmr

Differential Revision: https://reviews.llvm.org/D90715
2020-11-06 09:50:20 +00:00
Roman Lebedev 8d0fdd36a3
[IR] CmpInst: Add getFlippedSignednessPredicate()
And refactor a few places to use it
2020-11-06 11:31:09 +03:00
Giorgis Georgakoudis 700d2417d8 [CodeExtractor] Replace uses of extracted bitcasts in out-of-region lifetime markers
CodeExtractor handles bitcasts in the extracted region that have
lifetime markers users in the outer region as outputs. That
creates unnecessary alloca/reload instructions and extra lifetime
markers. The patch identifies those cases, and replaces uses in
out-of-region lifetime markers with new bitcasts in the outer region.

**Example**
```
define void @foo() {
entry:
  %0 = alloca i32
  br label %extract

extract:
  %1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1)
  call void @use(i32* %0)
  br label %exit

exit:
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %1)
  ret void
}
```

**Current extraction**
```
define void @foo() {
entry:
  %.loc = alloca i8*, align 8
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast = bitcast i8** %.loc to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast)
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0, i8** %.loc)
  %.reload = load i8*, i8** %.loc, align 8
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* %lt.cast)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %.reload)
  ret void
}

define internal void @foo.extract(i32* %0, i8** %.out) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  store i8* %1, i8** %.out, align 8
  call void @use(i32* %0)
  br label %exit.exitStub
}
```

**Extraction with patch**
```
define void @foo() {
entry:
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  %lt.cast = bitcast i32* %0 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %lt.cast)
  ret void
}

define internal void @foo.extract(i32* %0) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  call void @use(i32* %0)
  br label %exit.exitStub
}
```

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D90689
2020-11-05 17:01:08 -08:00
Sjoerd Meijer 7eb70158e4 [IndVarSimplify][SimplifyIndVar] Move WidenIV to Utils/SimplifyIndVar. NFCI.
This moves WidenIV from IndVarSimplify to Utils/SimplifyIndVar so that we have
createWideIV available as a generic helper utility. I.e., this is not only
useful in IndVarSimplify, but could be useful for loop transformations. For
example, motivation for this refactoring is the loop flatten transformation: if
induction variables in a loop nest can be widened, we can avoid having to
perform certain overflow checks, enabling this transformation.

Differential Revision: https://reviews.llvm.org/D90421
2020-11-05 16:52:47 +00:00
Florian Hahn be0578f0b4 [GVN] Fix MemorySSA update when replacing assume(false) with stores.
When replacing an assume(false) with a store, we have to be more careful
with the order we insert the new access. This patch updates the code to
look at the accesses in the block to find a suitable insertion point.

Alternatively we could check the defining access of the assume, but IIRC
there has been some discussion about making assume() readnone, so
looking at the access list might be more future-proof.

Fixes PR48072.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90784
2020-11-05 12:09:32 +00:00
Simon Pilgrim 4b2be681f4 [InstCombine] Remove orphan InstCombinerImpl method declarations. NFCI. 2020-11-05 10:13:16 +00:00
Arnold Schwaighofer ea5989b43a Start of an llvm.coro.async implementation
This patch adds the `async` lowering of coroutines.

This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.

This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.

rdar://70097093

Reapply with fix for memory sanitizer failure and sphinx failure.

Differential Revision: https://reviews.llvm.org/D90612
2020-11-04 10:29:21 -08:00
Arnold Schwaighofer 42f1916640 Revert "Start of an llvm.coro.async implementation"
This reverts commit ea606cced0.

This patch causes memory sanitizer failures sanitizer-x86_64-linux-fast.
2020-11-04 08:26:20 -08:00
Arnold Schwaighofer ea606cced0 Start of an llvm.coro.async implementation
This patch adds the `async` lowering of coroutines.

This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.

This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D90612
2020-11-04 07:32:29 -08:00
Roman Lebedev 93f3d7f7b3
[Reassociate] Guard `add`-like `or` conversion into an `add` with profitability check
This is slightly better compile-time wise,
since we avoid potentially-costly knownbits analysis that will
ultimately not allow us to actually do anything with said `add`.
2020-11-04 16:10:34 +03:00
Martin Storsjö 36cf1e7d0e Revert "[AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts"
This reverts commit 59b22e495c.

That commit broke building for ARM and AArch64, reproducible like this:

$ cat apedec-reduced.c
a;
b(e) {
  int c;
  unsigned d = f();
  c = d >> 32 - e;
  return c;
}
g() {
  int h = i();
  if (a)
    h = h << a | b(a);
  return h;
}
$ clang -target aarch64-linux-gnu -w -c -O3 apedec-reduced.c
clang: ../lib/Transforms/InstCombine/InstructionCombining.cpp:3656: bool llvm::InstCombinerImpl::run(): Assertion `DT.dominates(BB, UserParent) && "Dominance relation broken?"' failed.

Same thing for e.g. an armv7-linux-gnueabihf target.
2020-11-04 08:39:32 +02:00
Xun Li 7f34aca083 [musttail] Unify musttail call preceding return checking
There is already an API in BasicBlock that checks and returns the musttail call if it precedes the return instruction.
Use it instead of manually checking in each place.

Differential Revision: https://reviews.llvm.org/D90693
2020-11-03 11:39:27 -08:00
Roman Lebedev 70472f34b2
[Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation
InstCombine is quite aggressive in doing the opposite transform,
folding an `add` of operands with no common bits set into an `or`,
and not many things support that new pattern.

In this case, teaching Reassociate about it is easy,
there's preexisting art for `sub`/`shl`:
just convert such an `or` into an `add`:
https://rise4fun.com/Alive/Xlyv
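
A tiny illustrative example of the conversion:

```
define i32 @or_to_add(i32 %x) {
  ; After the shift, the low bit of %t is known zero, so the 'or' sets
  ; no common bits and is equivalent to 'add i32 %t, 1', which
  ; Reassociate can now work with.
  %t = shl i32 %x, 1
  %r = or i32 %t, 1
  ret i32 %r
}
```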
2020-11-03 22:30:51 +03:00
Sanne Wouda 2ec26d3a23 Revert "Add loop distribution to the LTO pipeline"
This reverts commit 6e80318eec.
2020-11-03 19:29:27 +00:00
Sanne Wouda 6e80318eec Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-03 18:54:24 +00:00
Jameson Nash 59a6ab28c4 [GVN] small improvements to comments 2020-11-03 13:21:48 -05:00
Roman Lebedev c009d11bda
[InstCombine] Perform C-(X+C2) --> (C-C2)-X transform before using Negator
In particular, it makes it fire for C=0, because negator doesn't want
to perform that fold since in general it's not beneficial.
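
An illustrative instance of the fold:

```
define i32 @fold(i32 %x) {
  ; 10 - (%x + 3)  -->  7 - %x, done directly rather than via Negator.
  %a = add i32 %x, 3
  %r = sub i32 10, %a
  ret i32 %r
}
```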
2020-11-03 16:06:52 +03:00
Roman Lebedev e465f9c303
[InstCombine] Negator: - (C - %x) --> %x - C (PR47997)
This relaxes one-use restriction on that `sub` fold,
since apparently the addition of Negator broke
preexisting `C-(C2-X) --> X+(C-C2)` (with C=0) fold.
2020-11-03 16:06:51 +03:00
Florian Hahn d68bed0fa9 [SCCP] Handle bitcast of vector constants.
Vectors where all elements have the same known constant range are treated as a
single constant range in the lattice. When bitcasting such vectors, there is a
mismatch between the width of the lattice value (a single constant range) and
the original operand (a vector). Go to overdefined in that case.

Fixes PR47991.
2020-11-03 12:58:39 +00:00
Simon Pilgrim 59b22e495c [AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts
The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns.

This should allow us to begin resolving PR46896 et al.

Differential Revision: https://reviews.llvm.org/D90625
2020-11-03 10:49:49 +00:00
Florian Hahn d9cbf39a37 [SLP] Pass VecPred argument to getCmpSelInstrCost.
Check if all compares in VL have the same predicate and pass it to
getCmpSelInstrCost, to improve cost-modeling on targets that only
support compare/select combinations for certain uniform predicates.

This leads to additional vectorization in some cases

```
Same hash: 217 (filtered out)
Remaining: 19
Metric: SLP.NumVectorInstructions

Program                                        base    slp2    diff
 test-suite...marks/SciMark2-C/scimark2.test    11.00   26.00  136.4%
 test-suite...T2006/445.gobmk/445.gobmk.test    79.00  135.00  70.9%
 test-suite...ediabench/gsm/toast/toast.test    54.00   71.00  31.5%
 test-suite...telecomm-gsm/telecomm-gsm.test    54.00   71.00  31.5%
 test-suite...CI_Purple/SMG2000/smg2000.test   426.00  542.00  27.2%
 test-suite...ch/g721/g721encode/encode.test    30.00   24.00  -20.0%
 test-suite...000/186.crafty/186.crafty.test   116.00  138.00  19.0%
 test-suite...ications/JM/ldecod/ldecod.test   697.00  765.00   9.8%
 test-suite...6/464.h264ref/464.h264ref.test   822.00  886.00   7.8%
 test-suite...chmarks/MallocBench/gs/gs.test   154.00  162.00   5.2%
 test-suite...nsumer-lame/consumer-lame.test   621.00  651.00   4.8%
 test-suite...lications/ClamAV/clamscan.test   223.00  231.00   3.6%
 test-suite...marks/7zip/7zip-benchmark.test   680.00  695.00   2.2%
 test-suite...CFP2000/177.mesa/177.mesa.test   2121.00 2129.00  0.4%
 test-suite...:: External/Povray/povray.test   2406.00 2412.00  0.2%
 test-suite...TimberWolfMC/timberwolfmc.test   634.00  634.00   0.0%
 test-suite...CFP2006/433.milc/433.milc.test   1036.00 1036.00  0.0%
 test-suite.../Benchmarks/nbench/nbench.test   321.00  321.00   0.0%
 test-suite...ctions-flt/Reductions-flt.test    NaN      5.00   nan%
```

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D90124
2020-11-03 10:16:43 +00:00
Max Kazantsev 46b2e85f0f [NFC] Refactor code in IndVars, preparing for further improvement 2020-11-03 15:08:12 +07:00
Max Kazantsev a44b7322a2 [NFC] Split lambda into 2 parts for further reuse 2020-11-03 14:13:55 +07:00
Max Kazantsev f847094c24 [IndVars] Use knowledge about execution on last iteration when removing checks
If we know that some check will not be executed on the last iteration, we can use this
fact to eliminate the check.

Differential Revision: https://reviews.llvm.org/D88210
Reviewed By: ebrevnov
2020-11-03 13:38:58 +07:00
Alina Sbirlea f514b32a89 [LICM] Add assert of AST/MSSA exclusiveness.
The API `canSinkOrHoistInst` may be called by LoopSink. Add assert to
avoid having two analyses passed in.
2020-11-02 18:04:43 -08:00
Akira Hatanaka b0f1d7d562 Remove unused parameter 2020-11-02 17:40:06 -08:00
Ettore Tiotto 4274cbba1c [PartialInliner]: Handle code regions in switch stmt cases
This patch enhances computeOutliningColdRegionsInfo() to allow it to
consider regions containing a single basic block and a single
predecessor as candidate for partial inlining.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89911
2020-11-02 14:32:45 -05:00
Simon Pilgrim 55f15f99cb [AggressiveInstCombine] foldGuardedRotateToFunnelShift - generalize rotation to funnel shift matcher.
Replace matchRotate with a more general matchFunnelShift - at the moment this is still just used for rotation patterns.
2020-11-02 17:09:17 +00:00
Fangrui Song 98b9338588 [Debugify] Port -debugify-each to NewPM
Preemptively switch 2 tests to the new PM

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D90365
2020-11-02 08:16:43 -08:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Teresa Johnson 0949f96dc6 [MemProf] Pass down memory profile name with optional path from clang
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.

Additionally, always pass a default filename (in $cwd if a directory
name is not specified via the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.

Change the existing memprof tests to specify log_path=stderr when that
was being relied on.

Depends on D89086.

Differential Revision: https://reviews.llvm.org/D89087
2020-11-01 17:38:23 -08:00
Florian Hahn ca38652b9a [VPlan] Assert no users remaining when deleting a VPValue.
When deleting a VPValue, all users must already be deleted. Add an
assertion to make sure and catch violations.
2020-11-01 17:44:53 +00:00
Florian Hahn aab71d4443 [DSE] Use same logic as legacy impl to check if free kills a location.
This patch updates DSE + MemorySSA to use the same check as the legacy
implementation to determine if a location is killed by a free call.

This changes the existing behavior so that a free does not kill
locations before the start of the freed pointer.

This should fix PR48036.
2020-10-31 20:09:25 +00:00
Florian Hahn 799033d8c5 Reland "[SLP] Consider alternatives for cost of select instructions."
This reverts the revert commit a1b53db324.

This patch includes a fix for a reported issue, caused by
matchSelectPattern returning UMIN for selects of pointers in
some cases by looking through some connected casts.

For now, ensure integer intrinsics are only returned for selects of
ints or int vectors.
2020-10-31 16:52:36 +00:00
Simon Pilgrim 538fdb0189 [InstCombine] foldSelectRotate - generalize to foldSelectFunnelShift
This is the last of the rotate->funnel shift InstCombine generalizations for PR46896

We still have foldGuardedRotateToFunnelShift to deal with in AggressiveInstCombine

Differential Revision: https://reviews.llvm.org/D90382
2020-10-31 12:32:34 +00:00
Simon Pilgrim 4da6a48399 [CSE] Make some basic EarlyCSE::StackNode helper methods const. NFCI.
Fixes a number of cppcheck remarks.
2020-10-31 12:16:48 +00:00
Nikita Popov 27f647d117 [Inliner] Consistently apply callsite noalias metadata
Previously, !noalias and !alias.scope metadata on the call site was
applied as part of CloneAliasScopeMetadata(), which short-circuits
if the callee does not use any noalias metadata itself. However,
these two things have no relation to each other.

Consistently apply !noalias and !alias.scope metadata by integrating
this into an existing function that handled !llvm.access.group and
!llvm.mem.parallel_loop_access metadata. The handling for all of
these metadata kinds is essentially the same.
2020-10-31 10:54:45 +01:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Florian Hahn a1b53db324 Revert "[SLP] Consider alternatives for cost of select instructions."
This reverts commit 1922570489.

This appears to cause a crash in the following example

 a, b, c;
 l() {
   int e = a, f = l, g, h, i, j;
   float *d = c, *k = b;
   for (;;)
     for (; g < f; g++) {
       k[h] = d[i];
       k[h - 1] = d[j];
       h += e << 1;
       i += e;
     }
 }

 clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c

 llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.
2020-10-30 21:26:14 +00:00
Florian Hahn 408c4408fa Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df5.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Peter Collingbourne 3d049bce98 hwasan: Support for outlined checks in the Linux kernel.
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain
introduce an enum with the bit field positions for the access info.

Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.

With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:

         code size    boot time
before    92824064      6.18s
after     38822400      6.65s

[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76

Depends on D90425

Differential Revision: https://reviews.llvm.org/D90426
2020-10-30 14:25:40 -07:00
Peter Collingbourne 0930763b4b hwasan: Move fixed shadow behind opaque no-op cast as well.
This is a workaround for poor heuristics in the backend where we can
end up materializing the constant multiple times. This is particularly
bad when using outlined checks because we materialize it for every call
(because the backend considers it trivial to materialize).

As a result the field containing the shadow base value will always
be set so simplify the code taking that into account.

Differential Revision: https://reviews.llvm.org/D90425
2020-10-30 13:23:52 -07:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Pedro Tammela 86e0c1acdb [NFC][Reg2Mem] modernize loop iterators
This patch updates the Reg2Mem loops to use more modern iterators.

Differential Revision: https://reviews.llvm.org/D90122
2020-10-30 16:50:07 +00:00
Pedro Tammela 70a495c7f0 [NFC][LoopSimplify] modernize for loops over LoopInfo
This patch modifies two for loops to use the range based syntax.
Since they are equivalent, this patch is tagged NFC.

Differential Revision: https://reviews.llvm.org/D90069
2020-10-30 16:50:07 +00:00
Michael Liao c82403d025 [gvn] PRE needs to skip convergent intrinsics/calls.
- As convergent intrinsics/calls can only be moved to
  control-equivalent blocks, or, more precisely, within the same
  divergent branch, PRE needs to skip them.

Differential Revision: https://reviews.llvm.org/D90391
2020-10-30 11:24:40 -04:00
Evgeniy Brevnov 3d31adaec4 [DSE] Improve partial overlap detection
Currently isOverwrite returns OW_MaybePartial even for accesses known not to overlap. This is not a big problem for the legacy implementation (since isPartialOverwrite follows isOverwrite and clarifies the result). By contrast, the SSA-based version does a lot of work only to later find out that the accesses don't overlap. Besides the negative impact on compile time, we quickly reach MemorySSAPartialStoreLimit and miss optimization opportunities.

Note: In fact, I think it would be a cleaner implementation if isOverwrite returned a fully clarified result in the first place, without the need to call isPartialOverwrite. This can be done as a follow-up. What do you think?

Reviewed By: fhahn, asbirlea

Differential Revision: https://reviews.llvm.org/D90371
2020-10-30 22:23:20 +07:00
Simon Pilgrim ed577892cf Use cast<> instead of dyn_cast<> as we dereference the pointers immediately. NFCI.
Fix clang static analyzer warnings - we're better off relying on cast<> asserting on failure rather than a null dereference crash.
2020-10-30 15:20:40 +00:00
Florian Hahn aa1a198a64 [VPlan] Use isa<> instead getVPRecipeID in getFirstNonPhi (NFC).
As per the comment in VPRecipeBase, clients should not rely on
getVPRecipeID, as it may change in the future. It should only be used in
classof implementations. Use isa instead in getFirstNonPhi.
2020-10-30 14:56:06 +00:00
Simon Pilgrim b7c91a9b8e [SCEV] SCEVExpander::InsertNoopCastOfTo - reduce scope of pointer type. NFCI.
By reducing the scope of the dyn_cast<PointerType> we can make this a cast<PointerType> and avoid clang static analyzer null deference warnings.
2020-10-30 14:55:09 +00:00
Florian Hahn 73f01e3df5 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
Simon Pilgrim 9e154f1aca [SROA] Pass Twine by const reference. NFCI.
Fixes clang-tidy warnings.
2020-10-30 11:36:58 +00:00
Max Kazantsev bd341bafbf [NFC] Simplify code in IndVars 2020-10-30 17:49:32 +07:00
Florian Hahn 05e4f7bde9 [DSE] Remove noop stores after killing stores for a MemoryDef.
Currently we fail to eliminate some noop stores if there is a kill-able
store between the starting def and the load. This is because we
eliminate noop stores first.

In practice it seems like eliminating noop stores after the main
elimination for a def covers slightly more cases.

This patch improves the number of stores slightly in 2 cases for X86 -O3
-flto

Same hash: 235 (filtered out)
Remaining: 2
Metric: dse.NumRedundantStores

Program                                          base      patch diff
 test-suite...ce/Benchmarks/PAQ8p/paq8p.test     2.00   3.00 50.0%
 test-suite...006/453.povray/453.povray.test    18.00  21.00 16.7%

There might be other phase ordering issues, but it appears that they do
not show up in the test-suite/SPEC2000/SPEC2006. We can always tune the
ordering later.

Partly fixes PR47887.

Reviewed By: asbirlea, zoecarver

Differential Revision: https://reviews.llvm.org/D89650
2020-10-30 09:40:15 +00:00
Roman Lebedev 81fc53a36a
[SCEV] Introduce SCEVPtrToIntExpr (PR46786)
And use it to model LLVM IR's `ptrtoint` cast.

This is essentially an alternative to D88806, but with no chance for
all the problems it caused due to having the cast as implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)

As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
  and their operands, instead of treating them as `SCEVUnknown`
* It should help with the initial problem of PR46786 - this should eventually allow us
  to not lose the pointer-ness of an expression in more cases (see the sketch below)
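
A tiny IR sketch (illustrative only) of the kind of expression this models:

```
define i64 @p2i(i8* %p) {
  ; SCEV can now model this as a SCEVPtrToIntExpr of %p instead of
  ; treating the whole expression as SCEVUnknown.
  %i = ptrtoint i8* %p to i64
  %r = add i64 %i, 8
  ret i64 %r
}
```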

As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.

But I think that it isn't the best solution, because memory consumption doesn't
really matter here - there probably won't be *that* many `SCEVPtrToIntExpr`s
for it to matter - and a dedicated node allows for much better discoverability.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89456
2020-10-30 11:13:35 +03:00
Vitaly Buka 36fa658db5 [NFC] Fix "ambiguous overload for ‘operator=’"
From D89768
2020-10-30 00:43:32 -07:00
Vitaly Buka 1455259546 [NFC] Fix "ambiguous overload for ‘operator=’" 2020-10-30 00:36:50 -07:00
Xun Li 9f5a2beadc [Coroutine] Properly determine whether an alloca should live on the frame
The existing logic for determining whether an alloca should live on the frame only looks at explicit def-use relationships. However, a value defined by an alloca may be implicitly needed across suspension points, either because an alias has an across-suspension-point def-use relationship, or because it is escaped by store/call/memory intrinsics. To properly handle all these cases, we have to properly visit the alloca pointer up-front. This patch extends the existing alloca use visitor to determine whether an alloca should live on the frame.

Differential Revision: https://reviews.llvm.org/D89768
2020-10-29 23:56:05 -07:00
Stefanos Baziotis a3345300b6 [LCSSA] Doc for special treatment of PHIs
Differential Revision: https://reviews.llvm.org/D89739
2020-10-29 22:50:07 +02:00
Nikita Popov 20b386aae0 [LoopUtils] Fix neutral value for vector.reduce.fadd
Use -0.0 instead of 0.0 as the start value. The previous use of 0.0
was fine for all existing uses of this function though, as it is
always generated with fast flags right now, and thus nsz.
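
An illustrative IR example of the new start value:

```
define float @red(<4 x float> %v) {
  ; -0.0 is the true neutral element for fadd: (-0.0) + x == x for
  ; every x, including x == -0.0, whereas 0.0 + (-0.0) yields +0.0.
  %r = call float @llvm.vector.reduce.fadd.v4f32(float -0.0, <4 x float> %v)
  ret float %r
}
declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)
```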
2020-10-29 21:45:13 +01:00
Florian Hahn 1922570489 [SLP] Consider alternatives for cost of select instructions.
Some architectures do not have general vector select instructions (e.g.
AArch64). But some cmp/select patterns can be vectorized using other
instructions/intrinsics.

One example is using min/max instructions for certain patterns.

This patch updates the cost calculations for selects in the SLP
vectorizer to consider using min/max intrinsics.

This patch does not change SLP vectorizer's codegen itself to actually
generate those intrinsics, but relies on the backends to lower the
vector cmps & selects. This keeps things simple on the SLP side and
works well in practice for AArch64.

This exposes additional SLP vectorization opportunities in some
benchmarks on AArch64 (-O3 -flto).

Metric: SLP.NumVectorInstructions

Program                                        base    slp     diff
 test-suite...ications/JM/ldecod/ldecod.test   502.00  697.00  38.8%
 test-suite...ications/JM/lencod/lencod.test   1023.00 1414.00 38.2%
 test-suite...-typeset/consumer-typeset.test    56.00   65.00  16.1%
 test-suite...6/464.h264ref/464.h264ref.test   804.00  822.00   2.2%
 test-suite...006/453.povray/453.povray.test   3335.00 3357.00  0.7%
 test-suite...CFP2000/177.mesa/177.mesa.test   2110.00 2121.00  0.5%
 test-suite...:: External/Povray/povray.test   2378.00 2382.00  0.2%

Reviewed By: RKSimon, samparker

Differential Revision: https://reviews.llvm.org/D89969
2020-10-29 20:39:50 +00:00
Dávid Bolvanský 7a2abf5aca [InferAttrs] Add nocapture/writeonly to string/mem libcalls
One step closer to fixing PR47644.
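
A hedged sketch of the kind of declaration this inference produces (the exact attribute set per libcall is in the patch, not this example):

```
; Illustrative only: destination is written, source is read and does
; not escape.
declare i8* @strcpy(i8* writeonly, i8* nocapture readonly)
```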

Differential Revision: https://reviews.llvm.org/D89645
2020-10-29 20:06:43 +01:00
Simon Pilgrim dcb3dc101d [InstCombine] visitShl - ensure inner shifts have inrange amounts
Noticed when fixing OSS Fuzz #26716
2020-10-29 15:28:15 +00:00
Max Kazantsev a5b2e795c3 [NFC][SCEV] Refactor monotonic predicate checks to return enums instead of bools
This patch gets rid of an output parameter which is not needed by most users
and prepares this API for further refactoring.
2020-10-29 16:01:25 +07:00
Johannes Doerfert d39f574dcc [Attributor][FIX] Properly promote arguments pointers to arrays
When promoting pointer arguments we computed a wrong offset and used
a wrong type for the array case.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
2020-10-29 00:45:32 -05:00
Fangrui Song 39856d5d0b [Debugify] Move global namespace functions into llvm::
Also move exportDebugifyStats from tools/opt to Debugify.cpp
2020-10-28 19:11:41 -07:00
Florian Hahn 53f4c4b2cc [InstCombine] Do not introduce bitcasts for swifterror arguments.
The following constraints hold for swifterror values:

    A swifterror value (either the parameter or the alloca) can only
    be loaded and stored from, or used as a swifterror argument.

This patch updates instcombine to not try to convert a bitcast of a
function into a bitcast of a swifterror argument.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D90258
2020-10-28 21:52:12 +00:00
Benjamin Kramer 207cf71fa9 Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API"
This reverts commit d981c7b758 and
a87d7b3d44. Test fails under msan.
2020-10-28 13:58:14 +01:00
Max Kazantsev 160a453138 Return "[IndVars] Remove monotonic checks with unknown exit count"
This reverts commit e038b60d91.
This reverts commit a0d84d8031.

This revert was a mistake. The reason for the failures was
"Use uint64_t for branch weights instead of uint32_t"

Differential Revision: https://reviews.llvm.org/D87832
2020-10-28 18:51:40 +07:00
Florian Hahn b82f80057d [DSE] Use walker to skip noalias stores between current & clobber def.
Instead of getting the defining access we should be able to use
getClobberingMemoryAccess to skip non-aliasing MemoryDefs. No additional
checks should be needed, because we only remove the starting def if it
matches the defining access of the load. All we need to worry about is
that there are no (may)alias stores between the starting def and the
load and getClobberingMemoryAccess should guarantee that.

Partly fixes PR47887.

This improves the number of redundant stores removed in some cases
(numbers below for MultiSource, SPEC2000, SPEC2006 on X86 with -flto
-O3).

Same hash: 226 (filtered out)
Remaining: 11
Metric: dse.NumRedundantStores

Program                                        base   patch1 diff
 test-suite...:: External/Povray/povray.test     1.00   5.00 400.0%
 test-suite...chmarks/MallocBench/gs/gs.test     1.00   3.00 200.0%
 test-suite...0/253.perlbmk/253.perlbmk.test    21.00  37.00 76.2%
 test-suite...0.perlbench/400.perlbench.test    24.00  37.00 54.2%
 test-suite.../Applications/SPASS/SPASS.test     3.00   4.00 33.3%
 test-suite...006/453.povray/453.povray.test    15.00  18.00 20.0%
 test-suite...T2006/445.gobmk/445.gobmk.test    27.00  29.00  7.4%
 test-suite.../CINT2006/403.gcc/403.gcc.test   136.00 137.00  0.7%
 test-suite.../CINT2000/176.gcc/176.gcc.test     6.00   6.00  0.0%
 test-suite.../Benchmarks/Bullet/bullet.test    NaN     3.00  nan%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test    NaN     1.00  nan%

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89647
2020-10-28 11:01:25 +00:00
Luqman Aden 4c0a016927 Rename EHPersonality::MSVC_Win64SEH to EHPersonality::MSVC_TableSEH. NFC.
The types of SEH aren't x86(-32) vs x64 but rather stack-based exception chaining
vs table-based exception handling. x86-32 is the only arch for which Windows
uses the former. 32-bit ARM would use what is called Win64SEH today, which
is a bit confusing so instead let's just rename it to be a bit more clear.

Reviewed By: compnerd, rnk

Differential Revision: https://reviews.llvm.org/D90117
2020-10-27 23:22:13 -07:00
Kazu Hirata b2f05fae80 [JumpThreading] Remove extraneous calls to setEdgeProbability
This patch removes extraneous calls to setEdgeProbability introduced
in c91487769d.

The follow-up patch, a7b662d0f4, has
since fixed BranchProbabilityInfo::eraseBlock, so we don't need to
worry about getting stale values from getEdgeProbability.

Also, since getEdgeProbability(BB, BB->getSingleSuccessor()) returns
edge probability 1/1 by default for BB with exactly one successor
edge, we don't need to explicitly call setEdgeProbability.

This patch introduces almost no functional change, but we do end up
reducing debug messages from setEdgeProbability.

Differential Revision: https://reviews.llvm.org/D90284
2020-10-27 21:12:54 -07:00
Johannes Doerfert d13daa4018 [Attributor] Finalize the CGUpdater after each SCC
This matches the new PM model.
2020-10-27 22:07:56 -05:00
Johannes Doerfert 50d34958df [Attributor][NFC] Introduce a debug counter for `AA::manifest`
This will simplify debugging and tracking down problems.
2020-10-27 22:07:56 -05:00
Johannes Doerfert 1d57b7f503 [Attributor][NFC] Print the right value in debug output 2020-10-27 22:07:55 -05:00
Johannes Doerfert 1c2531c9e1 [Attributor][FIX] Delete all unreachable static functions
Before we used to only mark unreachable static functions as dead if all
uses were known dead. Now we optimistically assume uses to be dead until
proven otherwise.
2020-10-27 22:07:55 -05:00
Johannes Doerfert bfe05b1aff [Attributor][FIX] Do not attach range metadata to the wrong Instruction
If we are looking at a call site argument it might be a load or call
which is in a different context than the call site argument. We cannot
simply use the call site argument range for the call or load.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
2020-10-27 22:07:55 -05:00
Johannes Doerfert 724fcce109 [Attributor][NFC] Clang-format 2020-10-27 22:07:55 -05:00
Johannes Doerfert d504f7b91a [Attributor][NFC] Hoist call out of a lambda
The call is not free. It is unclear whether hoisting it is needed, but it
does not make things worse either.
2020-10-27 22:07:54 -05:00
Johannes Doerfert 30e5a1f0be [Attributor][FIX] Properly check uses in the call not uses of the call
In the AANoAlias logic we determine if a pointer may have been captured
before a call. We need to look at the other uses in the call, not at uses
of the call itself.

The new code is not perfect as it does not allow trivial cases where the
call has multiple arguments but it is at least not unsound and a TODO
was added.
2020-10-27 22:07:54 -05:00
Johannes Doerfert cb813ab66a [Attributor][NFC] Improve time trace output 2020-10-27 22:07:54 -05:00
Kazu Hirata c91487769d [JumpThreading] Set edge probabilities when creating basic blocks
This patch teaches the jump threading pass to set edge probabilities
whenever the pass creates new basic blocks.

Without this patch, the compiler sometimes produces non-deterministic
results.  The non-determinism comes from the jump threading pass using
stale edge probabilities in BranchProbabilityInfo.  Specifically, when
the jump threading pass creates a new basic block, we don't initialize
its outgoing edge probability.

Edge probabilities are maintained in:

  DenseMap<Edge, BranchProbability> Probs;

in class BranchProbabilityInfo, where Edge is an ordered pair of
BasicBlock * and a successor index declared as:

  using Edge = std::pair<const BasicBlock *, unsigned>;

Probs maps edges to their corresponding probabilities.

Now, we rarely remove entries from this map, so if we happen to
allocate a new basic block at the same address as a previously deleted
basic block with an edge probability assigned, the newly created basic
block appears to have an edge probability, albeit a stale one.

This patch fixes the problem by explicitly setting edge probabilities
whenever the jump threading pass creates new basic blocks.

Differential Revision: https://reviews.llvm.org/D90106
2020-10-27 16:07:27 -07:00
Joseph Huber a87d7b3d44 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original declaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information is passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-10-27 16:09:19 -04:00
Nicolai Hähnle e025d09b21 Revert multiple patches based on "Introduce CfgTraits abstraction"
These logically belong together since it's a base commit plus
followup fixes to less common build configurations.

The patches are:

Revert "CfgInterface: rename interface() to getInterface()"

This reverts commit a74fc48158.

Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5"

This reverts commit f2a06875b6.

Revert "Try to make GCC5 happy about the CfgTraits thing"

This reverts commit 03a5f7ce12.

Revert "Introduce CfgTraits abstraction"

This reverts commit c0cdd22c72.
2020-10-27 20:33:30 +01:00
Nicolai Hähnle ce6900c6cb Revert "DomTree: Extract (mostly) read-only logic into type-erased base classes"
This reverts commit 848a68a032.
2020-10-27 20:33:29 +01:00
Vedant Kumar 5a3ef55a52 [Utils] Skip RemoveRedundantDbgInstrs in MergeBlockIntoPredecessor (PR47746)
This patch changes MergeBlockIntoPredecessor to skip the call to
RemoveRedundantDbgInstrs, in effect partially reverting D71480 due to
some compile-time issues spotted in LoopUnroll and SimplifyCFG.

The call to RemoveRedundantDbgInstrs appears to have changed the
worst-case behavior of the merging utility. Loosely speaking, it seems
to have gone from O(#phis) to O(#insts).

It might not be possible to mitigate this by scanning a block to
determine whether there are any debug intrinsics to remove, since such a
scan costs O(#insts).

So: skip the call to RemoveRedundantDbgInstrs. There's surprisingly
little fallout from this, and most of it can be addressed by doing
RemoveRedundantDbgInstrs later. The exception is (the block-local
version of) SimplifyCFG, where it might just be too expensive to call
RemoveRedundantDbgInstrs.

Differential Revision: https://reviews.llvm.org/D88928
2020-10-27 10:12:59 -07:00
Raphael Isemann e038b60d91 Revert "[IndVars] Remove monotonic checks with unknown exit count"
This reverts commit c6ca26c0bf.
This breaks stage2 builds due to hitting this assert:
```
   Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights
```
when compiling AArch64RegisterBankInfo.cpp in LLVM.
2020-10-27 15:31:37 +01:00
Raphael Isemann a0d84d8031 Revert "[NFC] Factor away lambda's redundant parameter"
This reverts commit fdc845b361.
It seems to be a follow-up to c6372b3fb495 which will be reverted.
2020-10-27 15:30:52 +01:00
Simon Pilgrim bce770ffa6 Revert rG0905bd5c2fa42bd4c "[InstCombine] collectBitParts - add trunc support."
This reverts commit 0905bd5c2f.

Causing failures in multistage buildbots that I need to investigate
2020-10-27 13:43:54 +00:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Simon Pilgrim 0905bd5c2f [InstCombine] collectBitParts - add trunc support.
This should allow us to remove the rather limited matchOrConcat fold and just use recognizeBSwapOrBitReverseIdiom.
2020-10-27 13:14:54 +00:00
Roman Lebedev 0ac56e8eaa
[InstCombine] Fold `(X >>? C1) << C2` patterns to shift+bitmask (PR37872)
This essentially finalizes a revert of rL155136,
because nowadays the situation has improved: SCEV can model
all these patterns well, and we canonicalize rotate-like patterns
into funnel shift intrinsics in InstCombine.
So this should not cause any pessimization.

I've verified the canonicalize-{a,l}shr-shl-to-masking.ll transforms
with alive, which confirms that we can freely preserve exact-ness,
and no-wrap flags.

Proofs:
* base: https://rise4fun.com/Alive/gPQ
* exact-ness preservation: https://rise4fun.com/Alive/izi
* nuw preservation: https://rise4fun.com/Alive/DmD
* nsw preservation: https://rise4fun.com/Alive/SLN6N
* nuw nsw preservation: https://rise4fun.com/Alive/Qp7

Refs. https://reviews.llvm.org/D46760
2020-10-27 14:42:53 +03:00
Florian Hahn f067bc3c0a [LoopRotation] Allow loop header duplication if vectorization is forced.
-Oz normally does not allow loop header duplication, so this loop wouldn't be
vectorized. However, the vectorization pragma should override this and allow
loop rotation.

rdar://problem/49281061

Original patch by Adam Nemet.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D59832
2020-10-27 09:28:01 +00:00
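A hedged IR sketch (reduced, names hypothetical): at -Oz this while-style loop would normally be left unrotated, but the vectorize-enable loop metadata now lets the rotation (header duplication) happen.

```llvm
define void @sketch(i32 %n) {
entry:
  br label %header

header:                     ; exit check in the header: unrotated form
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
  %cmp = icmp ult i32 %iv, %n
  br i1 %cmp, label %latch, label %exit

latch:
  %iv.next = add i32 %iv, 1
  br label %header, !llvm.loop !0

exit:
  ret void
}

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.vectorize.enable", i1 true}
```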
Max Kazantsev fdc845b361 [NFC] Factor away lambda's redundant parameter 2020-10-27 12:56:52 +07:00
Serguei Katkov b69919b537 [GVN LoadPRE] Add an option to disable splitting backedge
GVN load PRE can split the backedge, breaking the loop structure in which the latch
contains the conditional branch on, for example, an induction variable.

Different optimizations expect this form of the loop, so it is better to preserve it for some time.
This CL adds an option to control the ability to split the backedge.

The default value is true, so technically this is NFC and the current behavior is not changed.

Reviewers: fedor.sergeev, mkazantsev, nikic, reames, fhahn
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89854
2020-10-27 11:59:52 +07:00
Max Kazantsev c6ca26c0bf [IndVars] Remove monotonic checks with unknown exit count
Even if the exact exit count is unknown, we can still prove that this
exit will not be taken. If we can prove that the predicate is monotonic,
fulfilled on first & last iteration, and no overflow happened in between,
then the check can be removed.

Differential Revision: https://reviews.llvm.org/D87832
Reviewed By: apilipenko
2020-10-27 11:35:16 +07:00
Arthur Eubanks 42f76e193b Reland [AlwaysInliner] Pass callee AAResults to InlineFunction()
Test copied from noalias-calls.ll with small changes.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89609
2020-10-26 20:40:46 -07:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Arthur Eubanks 4af5ba1726 Revert "[AlwaysInliner] Pass callee AAResults to InlineFunction()"
This reverts commit 504fbec7a6.

Test failure.
2020-10-26 20:23:38 -07:00
Arthur Eubanks 504fbec7a6 [AlwaysInliner] Pass callee AAResults to InlineFunction()
Test copied from noalias-calls.ll with small changes.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89609
2020-10-26 20:10:09 -07:00
Arthur Eubanks 3dd1c72458 Port -objc-arc-expand to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90182
2020-10-26 20:05:10 -07:00
Arthur Eubanks 90c0b0d3d6 Port -objc-arc-apelim to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90181
2020-10-26 20:01:46 -07:00
Chen Zheng 00e573cadb [LSR] fix typo in comments and rename for a newly added hook. 2020-10-26 22:29:22 -04:00
TaWeiTu 0efbfa38ae [NPM] Port -slsr to NPM
`-separate-const-offset-from-gep` has not yet been ported, so some tests are not updated.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D90149
2020-10-27 09:21:40 +08:00
Sriraman Tallam ad1b9daa4b Prepend "__uniq" to symbol names hash with -funique-internal-linkage-names.
Prepend the module name hash with a fixed string ".__uniq." which helps tools
that consume sampled profiles and attribute it to functions to understand
that this symbol belongs to a unique internal linkage type symbol.

Symbols with suffixes can result from various optimizations in the compiler:
function multiversioning, function splitting, parameter constant propagation,
and unique internal linkage names.

External tools like sampled profile aggregators combine profiles from multiple
runs of a binary. They use various heuristics with symbols that have suffixes
to try to attribute the profile to the right function instance. For instance,
multi-versioned symbols like foo.avx and foo.sse4.2, even though different,
should be attributed to the same source function if a single function is
versioned using the target_clones attribute (supported in GCC but yet to land
in LLVM). Similarly, functions that are split (the split part having a .cold suffix)
could have profiles for both the original and split symbols but would be
aggregated and attributed to the original function that was split.

Unique internal linkage functions however have different source instances and
the aggregator must not put them together but attribute it to the appropriate
function instance. To be sure that we are dealing with a symbol of a unique
internal linkage function, we would like to prepend the hash with a known
string ".__uniq." which these tools can check to understand the suffix type.

Differential Revision: https://reviews.llvm.org/D89617
2020-10-26 14:24:28 -07:00
Sanjay Patel 5a6e66ec72 [InstCombine] add folds for icmp+ctpop
https://alive2.llvm.org/ce/z/XjFPQJ

  define void @src(i64 %value) {
    %t0 = call i64 @llvm.ctpop.i64(i64 %value)
    %gt = icmp ugt i64 %t0, 63
    %lt = icmp ult i64 %t0, 64
    call void @use(i1 %gt, i1 %lt)
    ret void
  }

  define void @tgt(i64 %value) {
    %eq = icmp eq i64 %value, -1
    %ne = icmp ne i64 %value, -1
    call void @use(i1 %eq, i1 %ne)
    ret void
  }

  declare i64 @llvm.ctpop.i64(i64) #1
  declare void @use(i1, i1)
2020-10-26 16:48:56 -04:00
Sanjay Patel 437d7551c5 [InstCombine] reduce code duplication in icmp intrinsic folds; NFC 2020-10-26 16:48:56 -04:00
Stanislav Mekhanoshin 00928a1956 Fix SROA with a PHI merging values from the same block
This fixes bug 47945. It is legal for a PHI to have multiple incoming
values from the same block, but those values must be identical. In this
case it is illegal to merge different values.

Differential Revision: https://reviews.llvm.org/D89978
2020-10-26 12:58:27 -07:00
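A minimal hedged sketch of the rule behind the fix: a conditional branch may list the same successor twice, so a PHI in that block carries two incoming entries for one predecessor, and those entries must hold the same value.

```llvm
define i32 @sketch(i1 %c, i32 %a) {
entry:
  br i1 %c, label %merge, label %merge
merge:
  ; Both incoming entries are for %entry and must be identical.
  %phi = phi i32 [ %a, %entry ], [ %a, %entry ]
  ret i32 %phi
}
```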
Joe Ellis 0f83505593 [SVE][InstCombine] Fix TypeSize warning in canReplaceGEPIdxWithZero
The warning would fire when calling canReplaceGEPIdxWithZero on a GEP
whose source element type is a scalable vector. The size of scalable
vector types is not known, so this optimization cannot be performed.

This patch fixes the issue by:

- bailing out early in this routine if the GEP instruction's source
  element type is a scalable vector.

- making use of getFixedSize -- this removes the dependency on the
  deprecated interface.

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D89968
2020-10-26 17:40:26 +00:00
Joe Ellis 467e5cf40f [SVE][AArch64] Fix TypeSize warning in loop vectorization legality
The warning would fire when calling isDereferenceableAndAlignedInLoop
with a scalable load. Calling isDereferenceableAndAlignedInLoop with a
scalable load would result in the use of the now deprecated implicit
cast of TypeSize to uint64_t through the overloaded operator.

This patch fixes this issue by:

- no longer considering vector loads as candidates in
  canVectorizeWithIfConvert. This doesn't make sense in the context of
  identifying scalar loads to vectorize.

- making use of getFixedSize inside isDereferenceableAndAlignedInLoop --
  this removes the dependency on the deprecated interface, and will
  trigger an assertion error if the function is ever called with a
  scalable type.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89798
2020-10-26 17:40:04 +00:00
Simon Pilgrim 532f3bec3e [InstCombine] collectBitParts - add bitreverse intrinsic support. 2020-10-26 14:36:36 +00:00
Simon Pilgrim 6b2eb31e1e [InstCombine] Add support for zext(and(neg(amt),width-1)) rotate shift amount patterns
Alive2: https://alive2.llvm.org/ce/z/bCvvHd
2020-10-26 11:22:41 +00:00
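A hedged sketch of the newly matched shift-amount shape (a standard masked-rotate idiom; names hypothetical): the right-shift amount is zext(and(neg(amt), width-1)), and the whole pattern should now become @llvm.fshl.

```llvm
define i32 @rotl_sketch(i32 %x, i8 %amt) {
  %masked = and i8 %amt, 31
  %neg = sub i8 0, %amt
  %negmasked = and i8 %neg, 31        ; and(neg(amt), width-1)
  %shamt = zext i8 %masked to i32
  %negshamt = zext i8 %negmasked to i32
  %shl = shl i32 %x, %shamt
  %lshr = lshr i32 %x, %negshamt
  %r = or i32 %shl, %lshr             ; rotl(%x, %amt & 31)
  ret i32 %r
}
```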
Max Kazantsev bfabd7878b Fix broken build after previous commit 2020-10-26 14:55:46 +07:00
Max Kazantsev cdccc82f48 [NFC] Remove unused funciton param 2020-10-26 14:53:22 +07:00
Max Kazantsev 4b5e848bef [NFC] Factor out common code into lambda for further improvement 2020-10-26 14:50:45 +07:00
Max Kazantsev c019099053 [IndVars] Use contextual knowledge when proving trivial conds
No exact example where it would help, but it's a generally a more
powerful way to prove predicates.
2020-10-26 13:48:32 +07:00
Simon Pilgrim 3052e474ec [InstCombine] matchBSwapOrBitReversem - recognise or(fshl(),fshl()) bswap patterns.
I'm not certain InstCombinerImpl::matchBSwapOrBitReverse needs to filter the or(op0(),op1()) ops - there are just too many cases that recognizeBSwapOrBitReverseIdiom/collectBitParts handle now (and quickly).
2020-10-25 10:17:45 +00:00
TaWeiTu 65a36bbc3d [NPM] Port -loop-versioning-licm to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89371
2020-10-24 21:51:18 +08:00
TaWeiTu 060a4fccf1 [LoopVersioning] Form dedicated exits for versioned loop to preserve simplify form
The exit blocks of the versioned and non-versioned loops are not dedicated and thus the two loops are not in simplify form.
Insert dummy exit blocks after loop versioning with `formDedicatedExits()` to preserve the simplify form for subsequent passes.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89569
2020-10-24 21:40:46 +08:00
Simon Pilgrim 310f62b4ff [InstCombine] narrowFunnelShift - fold trunc/zext or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) (PR35155)
As discussed on PR35155, this extends narrowFunnelShift (recently renamed from narrowRotate) to support basic funnel shift patterns.

Unlike matchFunnelShift we don't include the computeKnownBits limitation, as extracting the pattern from the zext/trunc layers should be an indicator of reasonable funnel shift codegen; in D89139 we demonstrated how to efficiently promote funnel shifts to wider types.

Differential Revision: https://reviews.llvm.org/D89542
2020-10-24 12:42:43 +01:00
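A hedged sketch of the basic pattern (reduced from the PR; names hypothetical): once the zext/trunc layers are peeled away, this computes fshl(%a, %b, %amt & 7) and can now be folded to the @llvm.fshl.i8 intrinsic even though the two shifted values differ.

```llvm
define i8 @fshl_sketch(i8 %a, i8 %b, i8 %amt) {
  %m = and i8 %amt, 7
  %wa = zext i8 %a to i32
  %wb = zext i8 %b to i32
  %wm = zext i8 %m to i32
  %inv = sub i32 8, %wm
  %shl = shl i32 %wa, %wm        ; high half of the funnel
  %lshr = lshr i32 %wb, %inv     ; low half of the funnel
  %or = or i32 %shl, %lshr
  %r = trunc i32 %or to i8
  ret i8 %r
}
```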
Hongtao Yu a16cbdd676 [AutoFDO] Remove a broken assert in merging inlinee samples
Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of the callee's callee to be shared. This breaks the assert introduced earlier by D84997 in a tricky way.

To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. A calls B twice and B calls C once. Some optimization performed prior to the sample profile loader duplicates the first callsite to `B`, and the program may look like

```
A()
{
  B();  // with nested profile B1 and C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2
}
```

For some reason, the sample profile loader inliner then decides to only inline the first callsite in `A` and transforms `A` into

```
A()
{
  C();  // with nested profile C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2.
}
```

Here is what happens next:

1. Failing to inline the callsite `C()` results in `C1`'s samples returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite, which shares `C1` with the first callsite.
2. Failing to inline the middle callsite results in `B1` returned to `B`'s base profile, which in turn will cause `C1` merged into `B`'s base profile. Note that the nested `C` profile in `B`'s base has a non-zero head sample count now. The value actually equals `C1`'s entry count.
3. Failing to inline the last callsite results in `B2` returned to `B`'s base profile. Note that the nested `C` profile in `B`'s base now has an entry count equal to the sum of that of `C1` and `C2`, with the head count equal to that of `C1`. This will trigger the assert later on.
4. Compiling `B` using `B`'s base profile. Failing to inline `C` there triggers the returning of the nested `C` profile. Since the nested `C` profile has a non-zero head count, the returning doesn't go through. Instead, the assert goes off.

It's good that `C1` is only returned once, based on using a non-zero head count to ensure an inline profile is only returned once. However, `C2` is never returned. While it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be reasonably fixed by the upcoming CSSPGO work, where counts returning is based on context sensitivity and a distribution factor for callsite probes.

The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and duplicate one having different inline behavior is a magic. It has to do with imperfect counts in profile and extra complicated inlining that makes the hotness for them different.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D90056
2020-10-23 17:42:21 -07:00
Arthur Eubanks baffd052b0 [StructurizeCFG][NewPM] Port -structurizecfg to NPM
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.

The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.

This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89026
2020-10-23 15:54:03 -07:00
Arthur Eubanks ba22c403b2 [Inliner][NPM] Properly pass callee AAResults
Fixes noalias-calls.ll under NPM.

Differential Revision: https://reviews.llvm.org/D89592
2020-10-23 15:37:18 -07:00
Artur Pilipenko 6ec2c5e402 GC-parseable element atomic memcpy/memmove
This change introduces a GC parseable lowering for element atomic
memcpy/memmove intrinsics. This way runtime can provide an
implementation which can take a safepoint during copy operation.

See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ

Differential Revision: https://reviews.llvm.org/D88861
2020-10-23 14:06:09 -07:00
Nick Desaulniers b7926ce6d7 [IR] add fn attr for no_stack_protector; prevent inlining on mismatch
It's currently ambiguous in IR whether the source language explicitly
did not want a stack protector (in C, via the function attribute
no_stack_protector) or simply doesn't care, for any given function.

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) to
want to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, that is generally not as ergonomic as a function
attribute, and it still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining.  Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
2020-10-23 11:55:39 -07:00
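A hedged IR sketch (function names hypothetical): the attributes differ in the way the patch describes, so the inliner now refuses to inline @callee into @caller instead of silently weakening the caller's no-stack-protector guarantee.

```llvm
define void @callee() sspstrong {
  ret void
}

define void @caller() nossp {
  ; Inlining is now blocked: the callee is sspstrong, the caller is nossp.
  call void @callee()
  ret void
}
```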
Chen Zheng 1e0b6c1df0 [LSR] ignore profitable chain when reg num is not major cost.
Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D89665
2020-10-23 09:35:48 -04:00
Simon Pilgrim 1cab3bf004 [InstCombine] matchBSwapOrBitReverse - expose bswap/bitreverse matching flags.
matchBSwapOrBitReverse was hardcoded to just match bswaps - we're going to need to expose the ability to match bitreverse as well, so make this part of the function call.
2020-10-23 12:35:28 +01:00
Simon Pilgrim 19a13bf538 [InstCombine] Rename InstCombinerImpl::matchBSwap to matchBSwapOrBitReverse. NFCI.
This matches bswap and bitreverse intrinsics, so we should make that clear in the function name.
2020-10-23 12:35:27 +01:00
OCHyams fea067bdfd [mem2reg] Remove dbg.values describing contents of dead allocas
This patch copies @vsk's fix to instcombine from D85555 over to mem2reg. The
motivation and rationale are exactly the same: When mem2reg removes an alloca,
it erases the dbg.{addr,declare} instructions which refer to the alloca. It
would be better to instead remove all debug intrinsics which describe the
contents of the dead alloca, namely all dbg.value(<dead alloca>, ...,
DW_OP_deref)'s.

As far as I can tell, prior to D80264 these `dbg.value+deref`s would have been
silently dropped instead of being made `undef`, so we're just returning to
previous behaviour with these patches.

Testing:
`llvm-lit llvm/test` and `ninja check-clang` gave no unexpected failures. Added
3 tests, each of which covers a dbg.value deletion path in mem2reg:
  mem2reg-promote-alloca-1.ll
  mem2reg-promote-alloca-2.ll
  mem2reg-promote-alloca-3.ll
The first is based on the dexter test inlining.c from D89543. This patch also
improves the debugging experience for loop.c from D89543, which suffers
similarly after arg promotion instead of inlining.
2020-10-23 04:46:56 +00:00
Caroline Concatto 2415636475 [SVE]Clarify TypeSize comparisons in llvm/lib/Transforms
Use the isKnownXY comparators when one of the operands may be a
scalable vector, and getFixedSize() in all other cases.

This patch also does bug fixes for getPrimitiveSizeInBits by using
getFixedSize() near the places with the TypeSize comparison.

Differential Revision: https://reviews.llvm.org/D89703
2020-10-23 09:15:17 +01:00
Max Kazantsev 6e574abf61 [SCEV][NFC] Cache symbolic max exit count
We want to have a caching version of symbolic BE exit count
rather than recompute it every time we need it.

Differential Revision: https://reviews.llvm.org/D89954
Reviewed By: nikic, efriedma
2020-10-23 12:29:37 +07:00
Arthur Eubanks 0291e2c933 [Inliner] Run always-inliner in inliner-wrapper
An alwaysinline function may not get inlined in inliner-wrapper due to
the inlining order.

Previously for the following, the inliner would first inline @a() into @b(),

```
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @a()
  br label %for.cond
}
```

making @b() recursive and unable to be inlined into @a(), ending at

```
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @b()
  br label %for.cond
}
```

Running always-inliner first makes sure that we respect alwaysinline in more cases.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46945.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D86988
2020-10-22 19:16:25 -07:00
Vedant Kumar 099bffe7f7 Revert "[CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)"
This reverts commit 26ee8aff2b.

It's necessary to insert bitcast the pointer operand of a lifetime
marker if it has an opaque pointer type.

rdar://70560161
2020-10-22 12:25:50 -07:00
Arthur Eubanks 92d9a3868a Port -instnamer to NPM
Some clang tests use this.

Reviewed By: akhuang

Differential Revision: https://reviews.llvm.org/D89931
2020-10-22 12:08:36 -07:00
Layton Kifer d49911c282 [InstCombine][NFC] Use ConstantExpr::getBinOpIdentity
Delete duplicate implementation getSelectFoldableConstant and
replace with ConstantExpr::getBinOpIdentity.

Differential Revision: https://reviews.llvm.org/D89839
2020-10-22 20:44:57 +02:00
Nikita Popov 3e37543111 [MemCpyOpt] Move GEP during call slot optimization
When performing a call slot optimization to a GEP destination, it
will currently usually fail, because the GEP is directly before the
memcpy and as such does not dominate the call. We should move it
above the call if that satisfies the domination requirement.

I think that a constant-index GEP is the only useful thing to move
here, as otherwise isDereferenceablePointer couldn't look through
it anyway. As such I'm not trying to generalize this further.

Differential Revision: https://reviews.llvm.org/D89623
2020-10-22 20:40:56 +02:00
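A hedged sketch of the shape this handles (names hypothetical; assumes the usual call slot preconditions otherwise hold): %dst.gep is defined between the call and the memcpy, so it previously failed the dominance check; being a constant-index GEP, it can simply be moved above @fill.

```llvm
declare void @fill(i8* nocapture)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1 immarg)

define void @sketch(i8* noalias dereferenceable(32) %dst) {
  %tmp = alloca [16 x i8]
  %src = bitcast [16 x i8]* %tmp to i8*
  call void @fill(i8* %src)
  %dst.gep = getelementptr inbounds i8, i8* %dst, i64 16  ; constant index: hoistable
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst.gep, i8* %src, i64 16, i1 false)
  ret void
}
```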
Ettore Tiotto e6521ce064 [NFC][PartialInliner]: Clean up code
Make member functions const where possible, use LLVM_DEBUG to print debug traces
rather than a custom option, pass by reference to avoid null checking, ...

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89895
2020-10-22 14:40:15 -04:00
Vedant Kumar 3419252a79 [InstCombine] Remove dbg.values describing contents of dead allocas
When InstCombine removes an alloca, it erases the dbg.{addr,declare}
instructions which refer to the alloca. It would be better to instead
remove all debug intrinsics which describe the contents of the dead
alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s.

This effectively undoes work performed in an InstCombine run earlier in
the pipeline by LowerDbgDeclare, which inserts DW_OP_deref dbg.values
before CallInst users of an alloca. The motivating example looks like:

```
  define void @foo(i32 %0) {
    %a = alloca i32              ; This alloca is erased.
    store i32 %0, i32* %a
    dbg.value(i32 %0, "arg0")    ; This dbg.value survives.
    dbg.value(i32* %a, "arg0", DW_OP_deref)
    call void @trivially_inlinable_no_op(i32* %a)
    ret void
  }
```

If the DW_OP_deref dbg.value is not erased, it becomes dbg.value(undef)
after inlining, making "arg0" unavailable. But we already have dbg.value
descriptions of the alloca's value (from LowerDbgDeclare), so the
DW_OP_deref dbg.value cannot serve its purpose of describing an
initialization of the alloca by some callee. It invalidates other useful
dbg.values, causing large gaps in location coverage, so we should delete
it (even though doing so may cause stale dbg.values to appear, if
there's a dead store to `%a` in @trivially_inlinable_no_op).

OTOH, it wouldn't be correct to delete all dbg.value descriptions of an
alloca. Note that it's possible to describe a variable that takes on
different pointer values, e.g.:

```
  void use(int *);
  void t(int a, int b) {
    int *local = &a;     // dbg.value(i32* %a.addr, "local")
    local = &b;          // dbg.value(i32* undef, "local")
    use(&a);             //           (note: %b.addr is optimized out)
    local = &a;          // dbg.value(i32* %a.addr, "local")
  }
```

In this example, the alloca for "b" is erased, but we need to describe
the value of "local" as <unavailable> before the call to "use". This
prevents "local" from appearing to be equal to "&a" at the callsite.

rdar://66592859

Differential Revision: https://reviews.llvm.org/D85555
2020-10-22 10:00:13 -07:00
Serguei Katkov 75d0e0cd5f [IRCE] consolidate profitability check
Use BFI if it is available and BPI otherwise.
This is a promised follow-up after D89541.

Reviewers: ebrevnov, mkazantsev
Reviewed By: ebrevnov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89773
2020-10-22 11:26:45 +07:00
Zequan Wu 2f29341114 Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation""
This reverts commit 716f7636e1.
2020-10-21 17:08:56 -07:00
Zequan Wu 716f7636e1 Revert "SimplifyCFG: Clean up optforfuzzing implementation"
See discussion: https://reviews.llvm.org/D89590
This reverts commit cdd006eec9.
2020-10-21 16:56:32 -07:00
Arthur Eubanks 8d9466a385 [BlockExtract][NewPM] Port -extract-blocks to NPM
Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89015
2020-10-21 12:51:11 -07:00
Arthur Eubanks aa6c305344 [LowerMatrixIntrinsics][NewPM] Fix PreservedAnalyses result
PreservedCFGCheckerInstrumentation was saying that LowerMatrixIntrinsics
didn't properly preserve CFG even though it claimed to. The legacy pass
says it doesn't. Match the legacy pass's preserved analyses.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89175
2020-10-21 12:42:16 -07:00
Artur Pilipenko e8cce5ad89 [RS4GC] NFC. Preparatory refactoring to make GC parseable memcpy
For GC parseable element atomic memcpy/memmove we'll need to
shuffle statepoint arguments. Make it possible by storing the
arguments as Value *, not Use *.
2020-10-21 12:38:20 -07:00
Simon Pilgrim 7b4a828452 [InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. 2020-10-21 11:53:45 +01:00
Florian Hahn 88241ffb56 [Passes] Move ADCE before DSE & LICM.
The adjustment seems to have very little impact on optimizations.
The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is
in consumer-typeset and the size there actually decreases by -0.1%, with
not significant changes in the stats.

On its own, it is mildly positive in terms of compile-time, most likely
due to LICM & DSE having to process slightly less instructions. It
should also be unlikely that DSE/LICM make much new code dead.

http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions

With DSE & MemorySSA, it gives some nice compile-time improvements, due
to the fact that DSE can re-use the PDT from ADCE, if it does not make
any changes:

http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D87322
2020-10-21 10:30:56 +01:00
Martin Storsjö 4de215ff18 Revert "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support"
Also revert "[InstCombine] foldOrOfICmps - use m_Specific instead of
explicit comparisons. NFCI." to make the primarily intended revert
work.

This reverts commits ce13549761 and
e372a5f86f.

This commit caused failed asserts e.g. like this:

$ cat repro.cpp
bool a(char b) {
  return b >= '0' && b <= '9' || (b | 32) >= 'a' && (b | 32) <= 'z';
}
$ clang++ -target x86_64-linux-gnu -c -O2 repro.cpp
clang++: ../include/llvm/ADT/APInt.h:1151: bool llvm::APInt::operator==(const
llvm::APInt&) const: Assertion `BitWidth == RHS.BitWidth && "Comparison
requires equal bit widths"' failed.
2020-10-21 09:47:18 +03:00
Geoffrey Martin-Noble c17ae2916c Remove unnecessary header include which violates layering
This was introduced in https://reviews.llvm.org/D89774, but I don't
think it should be necessary.

Reviewed By: TaWeiTu, aeubanks

Differential Revision: https://reviews.llvm.org/D89843
2020-10-20 20:14:03 -07:00
Nicolai Hähnle 848a68a032 DomTree: Extract (mostly) read-only logic into type-erased base classes
Avoid having to instantiate and compile a subset of the dominator tree logic
separately for each node type. More importantly, this allows generic
algorithms to be built on top of dominator trees without writing them as
templates -- such algorithms can now use opaque CfgBlockRef and
CfgInterface instead.

A type-erased implementation of dominator trees could be written in
terms of CfgInterface as well, but doing so would change the current
trade-off: it would slightly reduce code size at the cost of a slight
runtime overhead.

This patch does not change the trade-off, as it only does type-erasure
where basic blocks can be treated in a fully opaque way, i.e. it only
moves methods that don't require iteration over CFG successors and
predecessors.

v5:
- rename generic_{begin,end,children} back without the generic_ prefix
  and refer explicitly to base class methods in NewGVN, which wants to
  mutate the order of dominator tree node children directly

v6:
- style change: iDom -> idom; it's arguable whether this is really
  invalid, since it is actually standard camelCase, but clang-tidy
  complains about it so... *shrug*
- rename {to,from}Generic -> {wrap,unwrap}Ref

Change-Id: Ib860dc04cf8bb093d8ed00be7def40d662213672

Differential Revision: https://reviews.llvm.org/D83089
2020-10-20 19:53:07 +02:00
Ta-Wei Tu 529ecd19df [NPM] port -unify-loop-exits to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89774
2020-10-20 10:46:57 -07:00
Ta-Wei Tu 59286b36df [NPM] Port -mergereturn to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89781
2020-10-20 10:33:58 -07:00
Florian Hahn 2e58010208 [DSE] Do not scan users of memory terminators for further reads.
isMemTerminator checks if the current def is a memory terminator that
terminates the memory pointed to by DefLoc. We do not have to add any of
their users to the worklist, because the follow-on users cannot read the
memory in question.

This leads to more stores eliminated in the presence of lifetime calls.
Previously we added the users of those intrinsics to the worklist,
limiting elimination.

In terms of removed stores, this gives a nice boost on some benchmarks
(MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3):

Same hash: 205 (filtered out)
Remaining: 32
Metric: dse.NumFastStores

Program                                          base   patch   diff
 test-suite...000/197.parser/197.parser.test     4.00    8.00  100.0%
 test-suite...rolangs-C++/family/family.test     4.00    7.00  75.0%
 test-suite...marks/7zip/7zip-benchmark.test   1722.00 2189.00 27.1%
 test-suite...CFP2000/177.mesa/177.mesa.test    30.00   38.00  26.7%
 test-suite :: External/Nurbs/nurbs.test        44.00   49.00  11.4%
 test-suite...lications/sqlite3/sqlite3.test   115.00  128.00  11.3%
 test-suite...006/447.dealII/447.dealII.test   2715.00 3013.00 11.0%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   237.00  261.00  10.1%
 test-suite...tions/lambda-0.1.3/lambda.test    40.00   44.00  10.0%
 test-suite...3.xalancbmk/483.xalancbmk.test   1366.00 1475.00  8.0%
 test-suite...abench/jpeg/jpeg-6a/cjpeg.test    13.00   14.00   7.7%
 test-suite...oxyApps-C++/miniFE/miniFE.test    43.00   46.00   7.0%
 test-suite...lications/ClamAV/clamscan.test   230.00  246.00   7.0%
 test-suite...006/450.soplex/450.soplex.test   284.00  299.00   5.3%
 test-suite...nsumer-jpeg/consumer-jpeg.test    21.00   22.00   4.8%
2020-10-20 16:55:22 +01:00
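A hedged minimal example: the lifetime.end terminates the memory of %a, so the store above it is dead, and the users of the intrinsic itself no longer need to be scanned for reads.

```llvm
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture)
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture)

define void @sketch() {
  %a = alloca i32
  %p = bitcast i32* %a to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %p)
  store i32 1, i32* %a       ; dead: the memory is terminated just below
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %p)
  ret void
}
```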
Simon Pilgrim ec228fbfc0 [InstCombine] SimplifyDemandedUseBits - replace dyn_cast<ConstantInt> with m_ConstantInt. NFCI. 2020-10-20 16:45:16 +01:00
Simon Pilgrim ce13549761 [InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. 2020-10-20 16:26:41 +01:00
Florian Hahn 6439fde6d4 [DSE] Bail out from getLocForWriteEx if call is not argmemonly/inacc_mem.
This change should currently not have any impact, but guard against
further inconsistencies between MemoryLocation and function attributes.
2020-10-20 14:37:53 +01:00
Simon Pilgrim e372a5f86f [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support
Reapplied rGa704d8238c86 with a check for integer/integervector types to prevent matching with pointer types
2020-10-20 14:14:26 +01:00
Nicolai Hähnle c0cdd22c72 Introduce CfgTraits abstraction
The CfgTraits abstraction simplifies writing algorithms that are
generic over the type of CFG, and enables writing such algorithms
as regular non-template code that operates on opaque references
to CFG blocks and values.

Implementations of CfgTraits provide operations on the concrete
CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.

CfgInterface is an abstract base class which provides operations
on opaque types CfgBlockRef and CfgValueRef. Those opaque types
encapsulate a `void *`, but the meaning depends on the concrete
CFG type. For example, MachineCfgTraits -- for use with MachineIR
in SSA form -- encodes a Register inside CfgValueRef. Converting
between concrete references and opaque/generic ones is done by
CfgTraits::{fromGeneric,toGeneric}. Convenience methods
CfgTraits::{un}wrap{Iterator,Range} are available as well.

Writing algorithms in terms of CfgInterface adds some overhead
(virtual method calls, plus in some cases it removes the
opportunity to inline iterators), but can be much more convenient
since generic algorithms can be written as non-templates.

This patch adds implementations of CfgTraits for all CFGs on
which dominator trees are calculated, so that the dominator
tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
and MachineCfgTraits (Machine IR in SSA form) are complete, the
other implementations are limited to the absolute minimum
required to make the upcoming dominator tree changes work.

v5:
- fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
  the instructions in a bundle
- use MachineBasicBlock::printName

v6:
- implement predecessors/successors for all CfgTraits implementations
- fix error in unwrapRange
- rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
  that is consistent with {wrap,unwrap}{Iterator,Range}
- use getVRegDef instead of getUniqueVRegDef

v7:
- std::forward fix in wrapping_iterator
- fix typos

v8:
- cleanup operators on CfgOpaqueType
- address other review comments

Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d

Differential Revision: https://reviews.llvm.org/D83088
2020-10-20 13:50:52 +02:00
Simon Pilgrim e346ea9905 [InstCombine] SimplifyDemandedUseBits - pass APInt by const reference. NFCI. 2020-10-20 12:13:08 +01:00
Atmn Patel 595c615606 [IR] Adds mustprogress as a LLVM IR attribute
This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions with in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85393
2020-10-20 03:09:57 -04:00
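A hedged minimal sketch: under mustprogress, a side-effect-free infinite loop is not required to be preserved and may be simplified away; without the attribute the loop must stay.

```llvm
define void @sketch() mustprogress {
entry:
  br label %loop

loop:           ; no side effects and no exit: UB under mustprogress
  br label %loop
}
```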
Serguei Katkov 38799975ce [IRCE] Do not transform if loop has small number of iterations
IRCE has some overhead for runtime checks, and if the number of iterations is
small the overhead can kill the benefit of the optimization.

This CL uses the BlockFrequencyInfo of the pre-header and header to estimate the
number of loop iterations. If it is less than irce-min-estimated-iters, we do not transform the loop.

It would probably be better to build a more complex cost model, but for simplicity this seems to be enough.

BFI is used only with the new pass manager, and the code tries to use it efficiently.

Reviewers: ebrevnov, dantrushin, asbirlea, mkazantsev
Reviewed By: mkazantsev
Subscribers: llvm-commits, fhahn
Differential Revision: https://reviews.llvm.org/D89541
2020-10-20 10:33:59 +07:00
Jordan Rupprecht 8a377f1e3c [NFC] Inline assertion-only variable 2020-10-19 15:11:37 -07:00
Roman Lebedev e0567582b8
[NFCI][SCEV] Always refer to enum SCEVTypes as enum, not integer
The main tricky thing here is forward-declaring the enum:
we have to specify its underlying data type.

In particular, this avoids the danger of thinking we are switching over
SCEVTypes while actually switching over an integer, and thus not being
notified when some case is not handled.

I have updated most such switches to be exhaustive and not have
a default case where that is pretty obviously the intent,
however not all of them.
2020-10-20 00:10:22 +03:00
Roman Lebedev 3355284b2d
[NFC][SCEVExpander] isHighCostExpansionHelper(): rewrite as a switch
If we switch over an enum, the compiler can easily issue a diagnostic
if some case is not handled. However, with an if cascade that isn't so.
Experimental evidence suggests the new behavior is superior.
2020-10-20 00:10:22 +03:00
Simon Pilgrim adb52e5f9e [InstCombine] foldOrOfICmps - only fold (icmp_eq B, 0) | (icmp_ult/gt A, B) for integer types
Fixes a number of stage2 buildbots that were failing when I generalized the m_ConstantInt() logic - that didn't match for pointer types, but m_Zero() does.
2020-10-19 17:05:38 +01:00
Simon Pilgrim 482e6f0041 Revert rGa704d8238c86bac: "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support"
This reverts commit a704d8238c.

Causing stage2 build failures on some bots.
2020-10-19 16:03:36 +01:00
Simon Pilgrim de885f1b2a [InstCombine] Add (icmp ne A, 0) | (icmp ne B, 0) --> (icmp ne (A|B), 0) vector support
Scalar cases were already being handled by foldLogOpOfMaskedICmps (so this was dead code), but refactoring to support non-uniform vectors will take some time, so tweak this fold in the meantime.
2020-10-19 15:41:21 +01:00
Simon Pilgrim ecd25086d1 [InstCombine] Add (icmp eq B, 0) | (icmp ult/gt A, B) -> (icmp ule A, B-1) vector support 2020-10-19 15:23:48 +01:00
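A hedged vector instance of the fold: if %b is zero, %b-1 wraps to all-ones and the `ule` is trivially true, matching the is-zero side; otherwise a < b is exactly a <= b-1.

```llvm
define <2 x i1> @sketch(<2 x i32> %a, <2 x i32> %b) {
  %iszero = icmp eq <2 x i32> %b, zeroinitializer
  %ult = icmp ult <2 x i32> %a, %b
  ; expected to fold to: icmp ule %a, (%b - 1)
  %r = or <2 x i1> %iszero, %ult
  ret <2 x i1> %r
}
```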
Simon Pilgrim a704d8238c [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support 2020-10-19 14:55:18 +01:00
Simon Pilgrim 1d90e53044 [InstCombine] foldOrOfICmps - pull out repeated getOperand() calls. NFCI. 2020-10-19 14:28:08 +01:00
Hans Wennborg 0628bea513 Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting"
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.

> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar

This reverts commit 273c299d5d.
2020-10-19 12:31:14 +02:00
Simon Pilgrim 0b7b446a40 [InstCombine] Support vectors-with-undef in and(logicalshift(1,X),1) --> zext(X == 0) fold 2020-10-19 11:10:32 +01:00
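A hedged sketch with an undef lane now accepted in the splat constants: (1 << %x) & 1 is 1 exactly when %x is 0, so this should become zext(icmp eq %x, 0).

```llvm
define <2 x i8> @sketch(<2 x i8> %x) {
  %shl = shl <2 x i8> <i8 1, i8 undef>, %x
  %and = and <2 x i8> %shl, <i8 1, i8 undef>
  ; expected to fold to: zext (icmp eq %x, 0) to <2 x i8>
  ret <2 x i8> %and
}
```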
Roman Lebedev d083d55c2c
[NFC][SCEV] Rename SCEVCastExpr into SCEVIntegralCastExpr
All existing SCEV cast types operate on integers.
D89456 will add SCEVPtrToIntExpr cast expression type.
I believe this is best for consistency.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89455
2020-10-19 10:59:53 +03:00
Florian Hahn f5cf7f544b [DSE] Do not consider 'noop' intrinsics as read-clobbers.
isNoopIntrinsic returns true for some intrinsics that are modeled in
MemorySSA but do not actually read or write any memory and do not block
DSE. Such intrinsics should not be considered as read-clobbers.
2020-10-18 15:51:05 +01:00
Dávid Bolvanský 65e94cc946 [InferAttrs] Add argmemonly attribute to string libcalls
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89602
2020-10-18 01:33:26 +02:00
Dávid Bolvanský 2a75e956e5 Revert "[InferAttrs] Add argmemonly attribute to string libcalls"
This reverts commit b77dd32a6f. Sanitizer tests are broken.
2020-10-17 23:29:02 +02:00
Dávid Bolvanský b77dd32a6f [InferAttrs] Add argmemonly attribute to string libcalls
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89602
2020-10-17 22:42:36 +02:00
Sanjay Patel 53e92b4c0e [InstCombine] (~A & B) ^ A -> A | B
Differential Revision: https://reviews.llvm.org/D86395
2020-10-17 12:20:18 -04:00
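A hedged minimal instance of the fold: when a bit of A is set, the and side is masked off and the xor passes A's bit through; when it is clear, the xor passes B's bit through, so the result is A | B.

```llvm
define i32 @sketch(i32 %a, i32 %b) {
  %nota = xor i32 %a, -1
  %and = and i32 %nota, %b
  %r = xor i32 %and, %a    ; expected to fold to: or i32 %a, %b
  ret i32 %r
}
```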
Nikita Popov 50cc9a0e61 [MemCpyOpt] Extract common function for unwinding check
These two cases should be using the same logic. Not NFC, as this
resolves the TODO regarding use of the underlying object.
2020-10-17 15:30:39 +02:00
Pedro Tammela 60b19424bb [NFC] fix some typos in LoopUnrollPass
This patch fixes a couple of typos in the LoopUnrollPass.cpp comments

Differential Revision: https://reviews.llvm.org/D89603
2020-10-17 14:20:55 +01:00
Juneyoung Lee 62a0ec1612 Add support for !noundef metadata on loads
This patch adds the metadata !noundef and allows load instructions to optionally carry it.
A load with !noundef always returns a well-defined value (it has no undef bits and isn't poison).
If the loaded value isn't well defined, the behavior is undefined.

This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has a well-defined value, and showing that a load from an arbitrary location is well-defined is usually hard otherwise.

The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89050
2020-10-17 13:50:10 +09:00
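A hedged sketch of the intended use: the !noundef annotation promises the loaded value is neither undef nor poison, which lets the freeze be folded away.

```llvm
define i32 @sketch(i32* dereferenceable(4) %p) {
  %v = load i32, i32* %p, !noundef !0
  %f = freeze i32 %v       ; removable: %v is already well defined
  ret i32 %f
}

!0 = !{}
```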
Artem Belevich c36c0fabd1 [VectorCombine] Avoid crossing address space boundaries.
We can not bitcast pointers across different address spaces, and VectorCombine
should be careful when it attempts to find the original source of the loaded
data.

Differential Revision: https://reviews.llvm.org/D89577
2020-10-16 13:19:31 -07:00
Benjamin Kramer b740899c50 [Indvars][NFCI] Simplify assertion.
This should be semantically identical. Also avoids unused variable
warnings in Release builds.
2020-10-16 19:58:55 +02:00
Matt Arsenault 0a7cd99a70 Reapply "OpaquePtr: Add type to sret attribute"
This reverts commit eb9f7c28e5.

Previously this was incorrectly handling linking of the contained
type, so this merges the fixes from D88973.
2020-10-16 11:05:02 -04:00
Simon Pilgrim 83ae625f0c [InstCombine] visitAnd - pull out repeated I.getType() calls. NFCI. 2020-10-16 15:43:11 +01:00
Simon Pilgrim 253f24cf4c [InstCombine] Remove custom and(trunc(and(x,c1)),c2) fold
This is more correctly handled by canEvaluateTruncated (one use checks etc.) and covers all the tests cases that were added for this fold.
2020-10-16 15:43:10 +01:00
Michael Liao 98f254960f [globalopt] Teach to look through `addrspacecast`.
- so that global variables in numbered address spaces could be properly
  analyzed.

Differential Revision: https://reviews.llvm.org/D89140
2020-10-16 08:43:09 -04:00
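A hedged sketch (address spaces illustrative): the store to @g is reached only through an addrspacecast, which globalopt can now look through when analyzing the global.

```llvm
@g = internal addrspace(1) global i32 0

define void @sketch() {
  %p = addrspacecast i32 addrspace(1)* @g to i32*
  store i32 1, i32* %p
  ret void
}
```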
Max Kazantsev 0857029011 [Indvars][NFC] Merge two functions together
Unlike any other transform in IndVars, the logic of widenWithVariantUse
is split into a check part and a transform part. We want to pass some
extra flags from the analysis to the transform part and standardize
the code at once, so merge them together.
2020-10-16 19:21:57 +07:00
Simon Pilgrim 981fdf01d5 [InstCombine] foldSelectRotate - canonicalize to OR(SHL,LSHR). NFCI.
Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15
2020-10-16 13:18:53 +01:00
Max Kazantsev bb39372e5e [Indvars][NFCI] Remove meaningless restrictive code in IndVars
Variable ExtendOperExpr only exists to check whether it is a SCEV ext.
We create it as SCEV ext right here, so semantically this check is
trivially true. In theory, it may fail if SCEV is smart enough and can
simplify the expression. However, no matter whether it is an ext or not,
we never use this fact for further reasoning. So this code is currently
useless and in theory may become harmful with SCEV's development.

We do not expect any behavior changes from removing it. If it causes
negative changes, the patch should be reverted.
2020-10-16 18:04:31 +07:00
Max Kazantsev 0ee0c7dcc3 [Indvars][NFC] Remove duplicating checks
Some facts have already been checked in widenWithVariantUse and then
checked again in widenWithVariantUseCodegen. The latter is redundant,
we can replace it with asserts.
2020-10-16 17:35:14 +07:00
Simon Pilgrim 1cf347e48b [InstCombine] narrowRotate - minor refactoring for funnel shift support. NFC.
Prep work for PR35155 - renamed narrowRotate to narrowFunnelShift, rewrote some comments and adjusted code to collect separate shift values, although we bail if they don't match (still, only rotations are actually folded).

I'm trying to match matchFunnelShift as much as possible in case we finally get to merge these one day.
2020-10-16 11:27:28 +01:00
Simon Pilgrim 55991b44b7 [InstCombine] foldAndOrOfICmpsOfAndWithPow2 - add vector support
Support vector cases for folding:

 (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2)
 (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2)
2020-10-16 10:41:40 +01:00
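A hedged vector instance of the first fold with K1 = 1 and K2 = 8: the or of the two is-zero tests is false only when both bits are set, i.e. (A & 9) != 9.

```llvm
define <2 x i1> @sketch(<2 x i32> %a) {
  %a1 = and <2 x i32> %a, <i32 1, i32 1>
  %a2 = and <2 x i32> %a, <i32 8, i32 8>
  %z1 = icmp eq <2 x i32> %a1, zeroinitializer
  %z2 = icmp eq <2 x i32> %a2, zeroinitializer
  ; expected to fold to: icmp ne (and %a, 9), 9
  %r = or <2 x i1> %z1, %z2
  ret <2 x i1> %r
}
```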
Florian Hahn 51ff04567b Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.

This patch enables MemorySSA DSE again.

This reverts commit 915310bf14.
2020-10-16 09:02:53 +01:00
Vedant Kumar 273c299d5d [PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).

To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.

Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
2020-10-15 23:13:33 +00:00
Florian Hahn 89c0124273 [LoopVersion] Unify SCEVChecks and alias check handling (NFC).
This is an initial cleanup of the way LoopVersioning interacts with LAA.

Currently LoopVersioning has 2 ways of initializing things:

1. Passing LAI and passing UseLAIChecks = true
2. Passing UseLAIChecks = false, followed by calling setSCEVChecks and
   setAliasChecks.

Both ways of initializing lead to the same result and the duplication
seems more complicated than necessary.

This patch removes the UseLAIChecks flag from the constructor and the
setSCEVChecks & setAliasChecks helpers and move initialization
exclusively to the constructor.

This simplifies things, by providing a single way to initialize
LoopVersioning and reducing duplication.

Reviewed By: Meinersbur, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84406
2020-10-15 22:02:17 +01:00
David Green 13ec3dd66f [LV] Add a getRecurrenceBinOp and make use of it. NFC 2020-10-15 18:21:41 +01:00
Hiroshi Yamauchi 1ebee7adf8 [PGO] Remove the old memop value profiling buckets.
Following up D81682 and D83903, remove the code for the old value profiling
buckets, which have been replaced with the new, extended buckets and disabled by
default.

Also syncing InstrProfData.inc between compiler-rt and llvm.

Differential Revision: https://reviews.llvm.org/D88838
2020-10-15 10:09:49 -07:00
Simon Pilgrim 23f1616626 [InstCombine] Use m_SpecificInt instead of m_APInt + comparison. NFCI. 2020-10-15 16:06:27 +01:00
Simon Pilgrim b3330ae42c [InstCombine] SimplifyDemandedUseBits - xor - refactor cast<ConstantInt> usage to PatternMatch. NFCI.
First step towards replacing these to add full vector support.
2020-10-15 16:06:23 +01:00
Simon Pilgrim 2b45639ea0 [InstCombine] InstCombineAndOrXor - refactor cast<ConstantInt> usages to PatternMatch. NFCI.
First step towards replacing these to add full vector support.
2020-10-15 16:06:17 +01:00
Simon Pilgrim 09be7623e4 [InstCombine] visitXor - refactor ((X^C1)>>C2)^C3 -> (X>>C2)^((C1>>C2)^C3) fold. NFCI.
This is still ConstantInt-only (scalar) but is refactored to use PatternMatch to make adding vector support in the future relatively trivial.
2020-10-15 14:38:15 +01:00
Simon Pilgrim fadd152317 [AggressiveInstCombine] foldAnyOrAllBitsSet - add uniform vector support
Replace m_ConstantInt with m_APInt to support uniform vectors (with no undef elements)

Adding non-uniform support would involve some refactoring of the MaskOps struct, but this might still be worth it.
2020-10-15 11:02:35 +01:00
Simon Pilgrim 60ba9233d1 Revert rG25a97c3a43d7 - "[InstCombine] visitCallInst - retain undefs in vector funnel shift amounts"
This reverts commit 25a97c3a43.

We have other constant folds that fold undef funnel shift amounts to 0 - so we need to be consistent.

If we end up with regressions where we lose a splat shift amount pattern we'll have to investigate other canonicalizations, but matchFunnelShift currently protects us from that.
2020-10-14 18:14:37 +01:00
Matt Arsenault 6a9484f4bf InstCombine: Fix losing load properties in copy-constant-to-alloca
Preserve the alignment and metadata. Atomic loads are skipped for
this, but pass along the properties for consistency.
2020-10-14 12:55:25 -04:00
Matt Arsenault 6da31fa4a6 InstCombine: Fix infinite loop in copy-constant-to-alloca transform
This was broken by 16295d521e, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.

This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.
2020-10-14 12:55:25 -04:00
Florian Hahn 93f6c6b79c Recommit "[VPlan] Use VPValue def for VPMemoryInstructionRecipe."
This reverts the revert commit 710aceb645
and includes a fix for a memsan failure.

Original message:

    This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
    during VPlan construction and code generation instead of the plain IR
    reference where possible.
2020-10-14 17:41:23 +01:00
Simon Pilgrim 89657b3a3b [InstCombine] narrowRotate - canonicalize to OR(SHL,LSHR). NFCI.
Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15
2020-10-14 16:45:00 +01:00
Simon Pilgrim 89a2a47870 [InstCombine] Add m_SpecificIntAllowUndef pattern matcher
m_SpecificInt doesn't accept undef elements in a vector splat value - tweak specific_intval to optionally allow undefs and add the m_SpecificIntAllowUndef variants.

Allows us to remove the m_APIntAllowUndef + comparison hack inside matchFunnelShift
2020-10-14 16:15:53 +01:00
Simon Pilgrim 25a97c3a43 [InstCombine] visitCallInst - retain undefs in vector funnel shift amounts
By always performing a modulo on the shift amount constants, this was causing undef amounts to be replaced with zero, meaning we were losing funnel shift by splat (with undef) patterns.

Tweaked the shift amount bounds check to support (passthrough) undefs, and use Constant::mergeUndefsWith to preserve the undefs after folding.
2020-10-14 14:38:21 +01:00
Roman Lebedev 7ee6c40247
Revert "Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"" and it's follow-ups
While we haven't encountered an earth-shattering problem with this yet,
by now it is pretty evident that trying to model the ptr->int cast
implicitly leads to having to update every single place that assumed
no such cast could be needed. That is of course the wrong approach.

Let's back this out, and re-attempt with another approach,
possibly one originally suggested by Eli Friedman in
https://bugs.llvm.org/show_bug.cgi?id=46786#c20
which should hopefully spare us this pain and more.

This reverts commits 1fb6104293,
7324616660,
aaafe350bb,
e92a8e0c74.

I've kept&improved the tests though.
2020-10-14 16:09:18 +03:00
Juneyoung Lee 9b3c2a72e4 [ValueTracking] Use assume's noundef operand bundle
This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89219
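
For example, a query can now make use of a bundle like this (a sketch; %x is
an arbitrary value):

```
call void @llvm.assume(i1 true) [ "noundef"(i32 %x) ]
; isGuaranteedNotToBeUndefOrPoison(%x) may now return true at points
; dominated by the assume
```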
2020-10-14 20:16:33 +09:00
Evgeniy Brevnov d0c95808e5 [LV] Unroll factor is expected to be > 0
LV fails with an assertion checking that UF > 0. We already set UF to 1 if it is 0, except in the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87679
2020-10-14 16:48:17 +07:00
Simon Pilgrim 1e4d882f9a [InstCombine] matchFunnelShift - add support for non-uniform vectors containing undefs.
Replace m_SpecificInt with m_APIntAllowUndef to match splats containing undefs, then use ConstantExpr::mergeUndefsWith to merge the undefs together in the result.

The undef funnel shift amounts are getting replaced with zero later on - I'll address this in a later patch, otherwise we lose potential shift by splat value patterns.
2020-10-14 10:42:27 +01:00
sstefan1 ce16be253c [Attributor][NFC] Make `createShallowWrapper()` available outside of Attributor
D85703 will need to create shallow wrappers in order to track the spmd icv. We need to make it available.

Differential Revision: https://reviews.llvm.org/D89342
2020-10-14 10:08:59 +02:00
Arthur Eubanks 518ec05a10 [LoopExtract][NewPM] Port -loop-extract to NPM
-loop-extract-single is just -loop-extract on one loop.

-loop-extract depended on -break-crit-edges and -loop-simplify in the
legacy PM, but the NPM doesn't allow specifying pass dependencies like
that, so manually add those passes to the RUN lines where necessary.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89016
2020-10-13 22:55:42 -07:00
Nikita Popov 3b31f05372 [LICM] Don't require AST in LoopPromoter (NFC)
While promotion currently always has an AST available, it is only
relevant for invalidation purposes in LoopPromoter, so we do not
need to have it as a hard dependency.
2020-10-13 22:08:49 +02:00
Nikita Popov cd6f40f432 [MemCpyOpt] Add test scaffolding for MSSA based MemCpyOpt
This adds an -enable-memcpyopt-memoryssa option that currently does
nothing apart from requiring MSSA as a dependency. The tests are
split to run both with the option disabled and enabled. I went with
this rather than the separate directory DSE uses, as I found it
convenient to have a direct side-by-side comparison of differences.

Differential Revision: https://reviews.llvm.org/D89206
2020-10-13 21:45:05 +02:00
Nikita Popov e79ca751fc [MemCpyOpt] Fix MemorySSA preservation
moveUp() moves instructions, so we should move the corresponding
memory accesses as well. We should also move the store instruction
itself: Even though we'll end up removing it later, this gives us
a correct MemoryDef to replace.

The implementation is somewhat more complicated than it should be,
because we also handle the case where P does not have a memory
access due to a degenerate AA pipeline. Hopefully, the need for this
will go away in the future, when the rest of the pass is based on
MSSA.

Differential Revision: https://reviews.llvm.org/D88778
2020-10-13 21:39:09 +02:00
Nikita Popov baa3b87015 [MemCpyOpt] Don't shorten memset if memcpy operands may be the same
If the memcpy operands are the same (which is allowed since D86815)
then the memcpy is effectively a no-op and the partially overlapping
memset is not dead.

Differential Revision: https://reviews.llvm.org/D89192
2020-10-13 21:19:19 +02:00
Nikita Popov 39c39e8a7f [MemCpyOpt] Don't shorten memset if destination observable through unwinding
MemCpyOpt can shorten a memset if it is later partially overwritten
by a memcpy. It checks that the destination is not read in between,
but we also need to make sure that the destination cannot be observed
via unwinding.

Differential Revision: https://reviews.llvm.org/D89190
2020-10-13 21:12:19 +02:00
Xun Li 0ccf9263cc [ASAN] Make sure we are only processing lifetime markers with offset 0 to alloca
This patch addresses https://bugs.llvm.org/show_bug.cgi?id=47787 (and hence https://bugs.llvm.org/show_bug.cgi?id=47767 as well).
In the later instrumentation code, we always use the beginning of the alloca as the base for instrumentation, ignoring any offset into the alloca.
Because of that, we should only instrument a lifetime marker if it's actually pointing to the beginning of the alloca.

Differential Revision: https://reviews.llvm.org/D89191
2020-10-13 10:21:45 -07:00
Nikita Popov 6713332fdd [LoopVersioningLICM] Fix noalias metadata emission
The previous code added the scope on each iteration, so that the
same scope was represented many times in the same !noalias metadata.
That's legal, and semantically equivalent to only storing the scope
once, but it's also wasteful and may pessimize further optimization
if AATags get intersected naively, as done by the AliasSetTracker.
2020-10-13 18:58:05 +02:00
Simon Pilgrim 9c3138bd6d [InstCombine] visitTrunc - pass through undefs for trunc(shift(trunc/ext(x),c)) patterns
Based on the recent patches D88475 and D88429 where we are losing undef values due to extension/comparisons.

I've added a Constant::mergeUndefsWith method that merges the undef scalar/elements from another Constant into a specific Constant.

Differential Revision: https://reviews.llvm.org/D88687
2020-10-13 14:35:18 +01:00
Vitaly Buka 710aceb645 Revert "[VPlan] Use VPValue def for VPMemoryInstructionRecipe."
It introduced a memory leak.

This reverts commit 525b085a65.
2020-10-13 03:14:08 -07:00
Simon Pilgrim 5df61724a1 [InstCombine] Support uniform vector splats in ((((X >> C) & CC) + Y) << C) folds.
Add support for uniform vector splats (no undefs).
2020-10-13 09:28:39 +01:00
Roman Lebedev 1fb6104293
Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
This relands commit 1c021c64ca which was
reverted in commit 17cec6a11a because
an assertion was being triggered, since `BuildConstantFromSCEV()`
wasn't updated to handle the case where the constant we want to truncate
is actually a pointer. I was unsuccessful in coming up with a test case
where we'd end up there with a constant zext/sext of a pointer,
so I didn't handle those cases until there is a test case.

Original commit message:

While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
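
Roughly, assuming a datalayout with 64-bit pointers, the new modeling is:

```
%a = ptrtoint i8* %p to i32   ; modeled as a trunc of the unknown %p
%b = ptrtoint i8* %p to i64   ; modeled as the unknown %p itself
%c = ptrtoint i8* %p to i128  ; modeled as a zext of the unknown %p
```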
2020-10-12 23:02:55 +03:00
Simon Pilgrim 4ff7136268 [InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI.
There's no need to create constant vector splats manually - missed this one in rG24dd0cd1edd5
2020-10-12 18:39:45 +01:00
Simon Pilgrim 24dd0cd1ed [InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI.
There's no need to create constant vector splats manually.
2020-10-12 18:17:20 +01:00
Simon Pilgrim 2de368f6a7 [InstCombine] FoldShiftByConstant - merge equivalent types. NFCI.
Consistently use the original shift instruction's Type/BitWidth instead of the operands, casted values etc.
2020-10-12 18:17:19 +01:00
Florian Hahn 525b085a65 [VPlan] Use VPValue def for VPMemoryInstructionRecipe.
This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84680
2020-10-12 18:02:33 +01:00
Hans Wennborg 17cec6a11a Revert 1c021c64c "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
> While we indeed can't treat them as no-ops, I believe we can/should
> do better than just modelling them as `unknown`. The `inttoptr` story
> is complicated, but for `ptrtoint`, it seems straightforward
> to model it just as a zext-or-trunc of unknown.
>
> This may be important now that we track towards
> making inttoptr/ptrtoint casts not no-op,
> and towards preventing folding them into loads/etc
> (see D88979/D88789/D88788)
>
> Reviewed By: mkazantsev
>
> Differential Revision: https://reviews.llvm.org/D88806

It caused the following assert during Chromium builds:

  llvm/lib/IR/Constants.cpp:1868:
  static llvm::Constant *llvm::ConstantExpr::getTrunc(llvm::Constant *, llvm::Type *, bool):
  Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed.

See code review for a link to a reproducer.

This reverts commit 1c021c64ca.
2020-10-12 18:39:35 +02:00
Florian Hahn ea058d289c [VPlan] Use operands for printing of VPWidenMemoryInstructionRecipe.
Now that operands of the recipe are managed through VPUser, we can
simplify the printing by just using the operands.
2020-10-12 16:51:54 +01:00
Florian Hahn ad5541045a [LoopDeletion] Remove over-eager SCEV verification.
60b852092c introduced SCEV verification to
deleteDeadLoop, but it appears this check is currently a bit over-eager
and some users of deleteDeadLoop appear to only patch up SE after
calling it (e.g. PR47753).

Remove the extra check for now. We can consider adding it back once we
have tracked down the source of the inconsistency for PR47753.
2020-10-12 16:18:30 +01:00
Simon Pilgrim bbf3925879 [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw (REAPPLIED)
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).

Reapplied after the shift canonicalization in rG02295e6d1a15 which removed the need to flip the shift values.

Differential Revision: https://reviews.llvm.org/D88783
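
A minimal sketch of the pattern, assuming value tracking can prove the
shift amount is below the bitwidth (here via the mask):

```
%x = and i32 %shamt, 15
%sub = sub i32 32, %x
%shl = shl i32 %a, %x
%shr = lshr i32 %b, %sub
%r = or i32 %shl, %shr
; expected: %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %x)
```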
2020-10-12 16:06:41 +01:00
Simon Pilgrim fa56623370 [InstCombine] matchFunnelShift - remove shift value commutation. NFCI.
After rG02295e6d1a15 we no longer need to invert the shift values for fshr - this is just hidden at the moment as funnel shifts only ever match for constant values so never use the fshr "Sub on SHL" path.
2020-10-12 15:55:18 +01:00
Simon Pilgrim 02295e6d1a [InstCombine] matchFunnelShift - canonicalize to OR(SHL,LSHR). NFCI.
Simplify the shift amount matching code by canonicalizing the shift ops first.
2020-10-12 15:10:59 +01:00
Simon Pilgrim 45d785e22b Revert rGb97093e520036f8 - "[InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw"
This reverts commit b97093e520.

Funnel shift argument commutation isn't working correctly
2020-10-12 11:38:52 +01:00
Roman Lebedev 1c021c64ca
[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown
While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 11:04:03 +03:00
David Sherwood c5ba0d33cc [SVE] Make ElementCount and TypeSize use a new PolySize class
I have introduced a new template PolySize class, where the template
parameter determines the type of quantity, i.e. for an element
count this is just an unsigned value. The ElementCount class is
now just a simple derivation of PolySize<unsigned>, whereas TypeSize
is more complicated because it still needs to contain the uint64_t
cast operator, since there are still many places in the code that
rely upon this implicit cast. As such the class also still needs
some of its own operators.

I've tried to minimise the amount of code in the base PolySize
class, which led to a couple of changes:

1. In some places we were relying on '==' operator comparisons
between ElementCounts and the scalar value 1. I didn't put this
operator in the new PolySize class, and thought it was actually
clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls
to isKnownMultipleOf(8).

I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so
that it's more consistent with coefficientDivideBy.

Differential Revision: https://reviews.llvm.org/D88409
2020-10-12 08:23:38 +01:00
Roman Lebedev 544a6aa267
[InstCombine] combineLoadToOperationType(): don't fold int<->ptr cast into load
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

As we've been establishing (see D88788/D88789), if there is an int<->ptr cast,
it basically must stay as-is, we can't do much with it.

I've looked, and the biggest source of new such casts being introduced,
as far as I can tell, is this transform, which, ironically,
tries to reduce the count of casts..

On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in
-33.58% less `IntToPtr`s (19014 -> 12629)
and +76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of +20.69% in total.

However, just on RawSpeed, where I know there are basically
no `IntToPtr`s in the original source code,
this results in -99.27% less `IntToPtr`s (2724 -> 20)
and +82.92% more `PtrToInt`s (4513 -> 8255).
which is again an increase of 14.34% in total.

To me this does seem like a step in the right direction,
we end up with strictly less `IntToPtr`, but strictly more `PtrToInt`,
which seems like a reasonable trade-off.

See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.

(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)

Reviewed By: nlopes, nikic

Differential Revision: https://reviews.llvm.org/D88979
2020-10-11 20:24:28 +03:00
David Green be6e8e50f4 [LV] Tail folded inloop reductions.
This expands upon the inloop reductions added in e9761688e41cb9e976,
allowing them to be inserted into tail folded loops. Reductions are
generated in the form:

  x = select(mask, vecop, zero)
  v = vecreduce.add(x)
  c = add chain, v

Where zero here is chosen as the identity value for add reductions. The
backend is then expected to fold the select and the vecreduce into a
single predicated instruction.

Most of the code is fairly straightforward, except for the creation of
block masks, which need to be created in dominance order. The
order they are added is altered to be after any phis, keeping the
requirements for the underlying IR.

Differential Revision: https://reviews.llvm.org/D84451
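
A minimal sketch of that form for a <4 x i32> add reduction (the names are
illustrative):

```
%x = select <4 x i1> %mask, <4 x i32> %vecop, <4 x i32> zeroinitializer
%v = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
%c = add i32 %chain, %v
```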
2020-10-11 16:58:34 +01:00
Sanjay Patel 3f3356bdd9 [InstCombine] allow vector splats for add+xor --> shifts 2020-10-11 09:04:24 -04:00
Sanjay Patel f81200ae99 [InstCombine] add one-use check to add+xor transform
As shown in the affected test, we could increase instruction
count without this limitation. There's another test with an extra
use that shows we still convert directly to a real "sext" if
possible.
2020-10-11 09:04:24 -04:00
Simon Pilgrim b97093e520 [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).

Differential Revision: https://reviews.llvm.org/D88783
2020-10-11 10:37:20 +01:00
Simon Pilgrim b752daa26b [InstCombine] Replace getLogBase2 internal helper with ConstantExpr::getExactLogBase2. NFCI.
This exposes the helper for other power-of-2 instcombine folds that I'm intending to add vector support to.

The helper only operated on power-of-2 constants so getExactLogBase2 is a more accurate name.
2020-10-11 10:31:17 +01:00
Xun Li 667dfe39ca [Coroutines] Refactor/Rewrite Spill and Alloca processing
This patch is a refactoring of how we process spills and allocas during CoroSplit.
In the previous implementation, everything that needs to go to the heap is put into Spills, including all the values defined by allocas.
And the way to identify a Spill is to check whether there exists a use-def relationship that crosses suspension points.

This approach is fundamentally confusing, and unfortunately, incorrect.
First of all, allocas are always processed differently than spills, hence it's quite confusing to put them together. It's much cleaner to separate them and process them separately.
Doing so simplifies lots of code and makes the logic clearer and easier to reason about.

Secondly, a use-def relationship is insufficient to decide whether a value defined by an AllocaInst needs to go to the heap.
There are many cases where a value defined by AllocaInst can implicitly be used across suspension points without a direct use-def relationship.
For example, you can store the address of an alloca into the heap, and load that address after suspension. Or you can escape the address into an object through a function call.
Or you can have a PHINode that takes two allocas, and this PHINode is used across a suspension point (when this happens, the existing implementation will spill the PHINode, a.k.a. a stack address, to the heap!).
All these issues suggest that we need to separate spill and alloca in order to properly implement this.
This patch does not yet fix these bugs, however it sets up the code in a better shape so that we can start fixing them in the next patch.

The core idea of this patch is to add a new struct called FrameDataInfo, which contains all Spills, all Allocas, and a map from each definition to its layout index in the frame (FieldIndexMap).
Spills and Allocas are identified, stored and processed independently. When they are initially added to the frame, we record their field index through FieldIndexMap. When the frame layout is finalized, we update each index to its final layout index.

In doing so, I also cleaned up a few things and also discovered a few other bugs.

Cleanups:
1. Found out that PromiseFieldId is not used, delete it.
2. Previously, SpillInfo was a vector, which was strange because every def can have multiple users. This patch cleans it up by turning it into a map from def to users.
3. Previously, a frame Field struct contained a list of the Spills that the field corresponds to. This isn't necessary since we only need the layout index for each given definition. This patch removes that list. Instead, we connect each field and definition using the FieldIndexMap.
4. All the loops that process Spills are simplified now because we use a map instead of a vector.

Bugs:
It seems that we are only keeping llvm.dbg.declare intrinsics in the .resume part of the function. The ramp function will no longer have them. This means we are dropping some debug information in the ramp function.

The next step is to start fixing the bugs where the implementation fails to identify some allocas that should live on the frame.

Differential Revision: https://reviews.llvm.org/D88872
2020-10-10 22:21:34 -07:00
Simon Pilgrim 702ccb40e2 [InstCombine] getLogBase2(undef) -> 0.
Move the undef element handling into the getLogBase2 helper instead of pre-empting with replaceUndefsWith.
2020-10-10 20:29:03 +01:00
Simon Pilgrim 3aab3cbd4a [InstCombine] getLogBase2 - no need to specify Type. NFCI.
In all the getLogBase2 uses, the specified Type is always the same as the constant being folded.
2020-10-10 20:09:55 +01:00
Nikita Popov 5e855f1e80 [MemCpyOpt] Don't hoist store that's not guaranteed to execute
MemCpyOpt can hoist stores while merging load+store pairs into a memcpy.
This hoisting can currently result in stores being executed that
weren't guaranteed to execute in the original program.

Differential Revision: https://reviews.llvm.org/D89154
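
A minimal sketch of the hazard (@may_throw is a hypothetical external
call):

```
%v = load i32, i32* %src
call void @may_throw()   ; if this unwinds, the store never executes
store i32 %v, i32* %dst  ; hoisting this store above the call is unsound
```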
2020-10-10 10:26:28 +02:00
Changpeng Fang f192a27ed3 Sink: Handle instruction sink when a user is dead
Summary:
  The current instruction sink pass uses findNearestCommonDominator of all users to find the block to sink the instruction to.
However, a user may be in a dead block, which will result in unexpected behavior.

This patch handles such cases by skipping dead blocks. This patch fixes:
https://bugs.llvm.org/show_bug.cgi?id=47415

Reviewers:
  MaskRay, arsenm

Differential Revision:
  https://reviews.llvm.org/D89166
2020-10-09 16:20:26 -07:00
Eli Friedman 278299b0f0 [SCCP] Reduce the number of times ResolvedUndefsIn is called for large modules.
If a module has many values that need to be resolved by
ResolvedUndefsIn, compilation takes quadratic time overall. Solve should
do a small amount of work, since not much is added to the worklists each
time markOverdefined is called. But ResolvedUndefsIn is linear over the
length of the function/module, so resolving one undef at a time is
quadratic in general.

To solve this, make ResolvedUndefsIn resolve every undef value at once,
instead of resolving them one at a time. This loses a little
optimization power, but can be a lot faster.

We still need a loop around ResolvedUndefsIn because markOverdefined
could change the set of blocks that are live. That should be uncommon,
hopefully. We could optimize it by tracking which blocks transition from
dead to live, instead of iterating over the whole module to find them.
But I'll leave that for later. (The whole function will become a lot
simpler once we start pruning branches on undef.)

The regression test changes seem minor. The specific cases in question
could probably be optimized with a bit more work, but they seem like
edge cases that don't really matter.

Fixes an "infinite" compile issue my team found on an internal workoad.

Differential Revision: https://reviews.llvm.org/D89080
2020-10-09 15:24:16 -07:00
Giorgis Georgakoudis 3a6bfcf2f9 [OpenMPOpt] Merge parallel regions
There are cases that generated OpenMP code consists of multiple,
consecutive OpenMP parallel regions, either due to high-level
programming models, such as RAJA, Kokkos, lowering to OpenMP code, or
simply because the programmer parallelized code this way.  This
optimization merges consecutive parallel OpenMP regions to: (1) reduce
the runtime overhead of re-activating a team of threads; (2) enlarge the
scope for other OpenMP optimizations, e.g., runtime call deduplication
and synchronization elimination.

This implementation defensively merges parallel regions, only when they
are within the same BB and any in-between instructions are safe to
execute in parallel.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83635
2020-10-09 09:59:04 -07:00
Arthur Eubanks 0689dab844 [FixIrreducible][NewPM] Port -fix-irreducible to NPM
In the NPM, a pass cannot depend on another non-analysis pass. So pin
the test that tests that -lowerswitch is run automatically to legacy PM.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D89051
2020-10-09 09:22:09 -07:00
Arthur Eubanks 9c21c6c966 [LoopInterchange][NewPM] Port -loop-interchange to NPM
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89058
2020-10-09 09:21:31 -07:00
Simon Pilgrim 8a836daaa9 [InstCombine] Support lshr(trunc(lshr(x,c1)), c2) -> trunc(lshr(lshr(x,c1),c2)) uniform vector tests
FoldShiftByConstant is hardcoded for scalar/uniform outer shift amounts at the moment, so that needs to be fixed first to support non-uniform cases
2020-10-09 16:54:46 +01:00
Simon Pilgrim 1c040a3e56 [InstCombine] commonShiftTransforms - add support for pow2 nonuniform constant vectors in srem fold
Note: we already fold srem to undef if any denominator vector element is undef.
2020-10-09 15:59:33 +01:00
Sanjay Patel 080e6bc205 [InstCombine] allow vector splats for add+and with high-mask
There might be a better way to specify the pre-conditions,
but this is hopefully clearer than the way it was written:
https://rise4fun.com/Alive/Jhk3

  Pre: C2 < 0 && isShiftedMask(C2) && (C1 == C1 & C2)
  %a = and %x, C2
  %r = add %a, C1
  =>
  %a2 = add %x, C1
  %r = and %a2, C2
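
For instance, one concrete splat instance satisfying the pre-conditions
(C2 = -4 is a negative shifted mask, and C1 = 4 == (C1 & C2)):

  %a = and <2 x i8> %x, <i8 -4, i8 -4>
  %r = add <2 x i8> %a, <i8 4, i8 4>
  =>
  %a2 = add <2 x i8> %x, <i8 4, i8 4>
  %r = and <2 x i8> %a2, <i8 -4, i8 -4>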
2020-10-09 10:39:11 -04:00
Simon Pilgrim 9e796d5e71 [InstCombine] foldShiftOfShiftedLogic - add support for nonuniform constant vectors 2020-10-09 14:25:12 +01:00
Simon Pilgrim 556316cf72 [InstCombine] foldShiftOfShiftedLogic - replace cast<BinaryOperator> with m_BinOp matcher. NFCI.
Allows us to drop the !isa<ConstantExpr> check.
2020-10-09 14:10:12 +01:00
Max Kazantsev 225df71951 [NFC] Add option to disable IV widening if needed
IV widening is sometimes a strictly harmful transform (some examples
of this are shown in tests 11, 12 in widen-loop-comp.ll). One of the
reasons for this is that sometimes SCEV fails to prove some facts after
some of the guards have been widened.

Though each single such case looks like a bug that can be addressed,
it seems that disabling IV widening may be profitable in some cases.
We want to have an option to do so. By default, existing behavior is
preserved and IV widening is on.
2020-10-09 18:32:03 +07:00
Simon Pilgrim d9f064dc0b [InstCombine] visitTrunc - trunc(shl(X, C)) --> shl(trunc(X),trunc(C)) vector support
Annoyingly vectors aren't supported by shouldChangeType(), but we have precedents for always performing this on vector types (e.g. narrowBinOp).

Differential Revision: https://reviews.llvm.org/D89067
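
A minimal vector sketch of the fold:

```
%s = shl <2 x i64> %x, <i64 8, i64 8>
%t = trunc <2 x i64> %s to <2 x i32>
=>
%x32 = trunc <2 x i64> %x to <2 x i32>
%t = shl <2 x i32> %x32, <i32 8, i32 8>
```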
2020-10-08 22:07:51 +01:00
Sanjay Patel f688ae7a0e [InstCombine] allow vector splats for add+xor with low-mask
This can be allowed with undef elements too, but that can be another step:
https://alive2.llvm.org/ce/z/hnC4Z-
2020-10-08 15:53:38 -04:00
Simon Pilgrim 6aa10ae5bf [Transforms] visitCmpBlock - don't dereference a dyn_cast<>. NFCI.
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.

Prevents a clang static analyzer warning that we could dereference a null pointer.
2020-10-08 20:18:32 +01:00
Sanjay Patel 5ac89add1e [InstCombine] remove unnecessary one-use check from add-xor transform
Pre-conditions seem to be optimal, but we don't need a use check
because we are only replacing an add with a sub.

https://rise4fun.com/Alive/hzN

  Pre: (~C1 | C2 == -1) && isPowerOf2(C2+1)
  %m = and i8 %x, C1
  %f = xor i8 %m, C2
  %r = add i8 %f, C3
  =>
  %r = sub i8 C2 + C3, %m
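
One concrete instance satisfying the pre-conditions (C1 = 3 and C2 = 3, so
~C1 | C2 == -1 and C2 + 1 == 4 is a power of 2; with C3 = 5, C2 + C3 == 8):

  %m = and i8 %x, 3
  %f = xor i8 %m, 3
  %r = add i8 %f, 5
  =>
  %r = sub i8 8, %m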
2020-10-08 15:08:51 -04:00
Simon Pilgrim 0716805c02 [SLP] optimizeGatherSequence - assert every Instruction in the worklist is non-null.
Fixes clang static analyzer warning.
2020-10-08 20:02:18 +01:00
Simon Pilgrim 8f0658ae67 [Transforms] CodeExtractor::verifyAssumptionCache - don't dereference a dyn_cast<>. NFCI.
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.

Prevents a clang static analyzer warning that we could dereference a null pointer.
2020-10-08 19:04:30 +01:00
Sanjay Patel b57451b011 [InstCombine] allow vector splats for add+xor with signmask 2020-10-08 10:46:34 -04:00
Simon Pilgrim 5415fef3ab [InstCombine] matchFunnelShift - support non-uniform constant vector shift amounts (PR46895)
Complete basic PR46895 fixes by refactoring D87452/D88402 to allow us to match non-uniform constant values.

We still don't handle non-uniform vectors that contain undef elements, but that can wait until we have a decent generic mechanism for this.

Differential Revision: https://reviews.llvm.org/D88420
2020-10-08 12:56:27 +01:00
Markus Lavin 06758c6a61 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow matching with an offset by inserting
compensation code in the DIExpression.

Includes a fix for the nullptr deref that caused the original commit
to be reverted in 9d63029770.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-08 13:16:43 +02:00
Simon Pilgrim e1d4ca0009 [InstCombine] matchRotate - add support for matching general funnel shifts with constant shift amounts (PR46896)
First step towards extending the existing rotation support to full funnel shift handling now that the backend legalization support has improved.

This enables us to match the shift by constant cases, which are pretty trivial to expand again if necessary.

D88420 will add non-uniform support for funnel shifts as well once its been finalized.

Differential Revision: https://reviews.llvm.org/D88834
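
A minimal sketch of a constant-amount case this now matches (note the two
distinct operands, i.e. a general funnel shift rather than a rotate):

```
%shl = shl i32 %a, 5
%shr = lshr i32 %b, 27
%r = or i32 %shl, %shr
; expected: %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 5)
```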
2020-10-08 11:05:14 +01:00
Simon Pilgrim aa47962cc9 [InstCombine] canNarrowShiftAmt - replace custom Constant matching with m_SpecificInt_ICMP
The existing code ignores undef values, which matches m_SpecificInt_ICMP. Although m_SpecificInt_ICMP returns false for an all-undef constant, I've added test coverage at rGfe0197e194a64f9 to show that undef folding should already have dealt with that case.
2020-10-08 10:53:32 +01:00
David Green 498f89d188 [LV] Collect dead induction truncates
We currently collect the ICmp and Add from an induction variable,
marking them as dead so that vplan values are not created for them. This
extends that to include any single-use trunc from the ICmp, which allows
the Add to more readily be removed too.

This can help with costing vplan nodes, as the ICmp and Add are more
reliably removed and are not double-counted.

Differential Revision: https://reviews.llvm.org/D88873
2020-10-08 08:28:58 +01:00
Reid Kleckner 940d7aaea9 Port StripGCRelocates pass to NPM
Fixes one test under NPM

Differential Revision: https://reviews.llvm.org/D88766
2020-10-07 14:41:29 -07:00
Reid Kleckner da48fe1732 [NPM] Port strip nonlinetable debuginfo pass to the new pass manager
Fixes a few tests in llvm/test/Transforms/Utils.

Differential Revision: https://reviews.llvm.org/D88762
2020-10-07 14:35:36 -07:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Roman Lebedev fed0f890e5
InstCombine: Negator: don't rely on complexity sorting already being performed (PR47752)
In some cases, we can negate an instruction if only one of its operands
negates. Previously, we assumed that constants would have been
canonicalized to the RHS already, but that isn't guaranteed to happen,
because of InstCombine worklist visitation order,
as the added test (previously-hanging) shows.

So if we only need to negate a single operand,
we should make sure that we try the constant operand first.
Do that by redoing the complexity sorting ourselves
when we actually care about it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47752
2020-10-07 15:09:50 +03:00
Max Kazantsev fba42aea43 [NFC] Use getZero instead of getConstant(0) 2020-10-07 13:53:36 +07:00
Roman Lebedev 7fa503ef4a
[SROA] rewritePartition()/findCommonType(): if uses have conflicting type, try getTypePartition() before falling back to largest integral use type (PR47592)
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

In this case, when load/store uses have conflicting types,
instead of falling back to the iN, we can try to use the allocated sub-type.
As discussed, this isn't the best idea overall (we shouldn't rely on the
allocated type), but it works fine as a temporary measure.

I've measured, and @ `-O3` as of vanilla llvm test-suite + RawSpeed,
this results in +0.05% more bitcasts, -5.51% less inttoptr
and -1.05% less ptrtoint (at the end of middle-end opt pipeline)

See https://bugs.llvm.org/show_bug.cgi?id=47592

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D88788
2020-10-07 09:20:19 +03:00
Johannes Doerfert 7993d61177 [Attributor] Use smarter way to determine alignment of GEPs
Use the same logic that exists in other places to deal with base-case GEPs.

Add the original Attributor talk example.
2020-10-06 19:31:08 -05:00
Johannes Doerfert c4cfe7a435 [Attributor] Ignore read accesses to constant memory
The old function attribute deduction pass ignores reads of constant
memory and we need to copy this behavior to replace the pass completely.
First step are constant globals. TBAA can also describe constant
accesses and there are other possibilities. We might want to consider
asking the alias analyses that are available but for now this is simpler
and cheaper.
2020-10-06 19:31:07 -05:00
Johannes Doerfert 3f540c05df [Attributor] Give up early on AANoReturn::initialize
If the function is not assumed `noreturn` we should not wait for an
update to mark the call site as "may-return".

This has two kinds of consequences:
  - We have fewer iterations in many tests.
  - We have fewer deductions based on "known information" (since we ask
    earlier, point 1, and therefore assumed information is not "known"
    yet).
The latter is an artifact that we might want to tackle properly at some
point but which is not easily fixable right now.
2020-10-06 19:31:07 -05:00
Nikita Popov 616f545048 [MemCpyOpt] Use dereferenceable pointer helper
The call slot optimization has some home-grown code for checking
whether the destination is dereferenceable. Replace this with the
generic isDereferenceableAndAlignedPointer() helper.

I'm not checking alignment here, because that is currently handled
separately and may be an enforced alignment for allocas. The clean
way of integrating that part would probably be to accept a callback
in isDereferenceableAndAlignedPointer() for the actual isAligned check,
which would then have a chance to use an enforced alignment instead.

This allows the destination to be a GEP (among other things), though
the two open TODOs may prevent it from working in practice.

Differential Revision: https://reviews.llvm.org/D88805
2020-10-06 18:41:19 +02:00
Nikita Popov 6b441ca523 [MemCpyOpt] Check for throwing calls during call slot optimization
When performing call slot optimization for a non-local destination,
we need to check whether there may be throwing calls between the
call and the copy. Otherwise, the early write to the destination
may be observable by the caller.

This was already done for call slot optimization of load/store,
but not for memcpys. For the sake of clarity, I'm moving this check
into the common optimization function, even if that does need an
additional instruction scan for the load/store case.

As efriedma pointed out, this check is not sufficient due to
potential accesses from another thread. This case is left as a TODO.

Differential Revision: https://reviews.llvm.org/D88799
2020-10-06 18:24:40 +02:00
Nikita Popov 80cde02e85 [MemCpyOpt] Add separate statistic for call slot optimization (NFC) 2020-10-06 18:14:10 +02:00
Dávid Bolvanský 86429c4eaf [SimplifyLibCalls] Optimize mempcpy_chk to mempcpy 2020-10-06 17:08:46 +02:00
Johannes Doerfert 4a7a988442 [Attributor][FIX] Move assertion to make it not trivially fail
The idea of this assertion was to check the simplified value before we
assign it, not after, which caused this to trivially fail all the time.
2020-10-06 09:32:18 -05:00
Johannes Doerfert 04f6951397 [Attributor][FIX] Dead return values are not `noundef`
When we assume a return value is dead we might still visit return
instructions via `Attributor::checkForAllReturnedValuesAndReturnInsts(..)`.
When we do so the "returned value" is potentially simplified to `undef`
as it is the assumed "returned value". This is a problem if there was a
preexisting `noundef` attribute that will only be removed as we manifest
the `undef` return value. We should not use this combination to derive
`unreachable` though. Two test cases fixed.
2020-10-06 09:32:18 -05:00
Johannes Doerfert 957094e31b [Attributor][NFC] Ignore benign uses in AAMemoryBehaviorFloating
In AAMemoryBehaviorFloating we used to track benign uses in a SetVector.
With this change we look through benign uses eagerly to reduce the
number of elements (=Uses) we look at during an update.

The test does not actually fail prior to this commit, but I had already
written it, so I kept it.
2020-10-06 09:32:18 -05:00
Simon Pilgrim 17b9a91ec2 [InstCombine] canRewriteGEPAsOffset - don't dereference a dyn_cast<>. NFCI.
We know V is an IntToPtrInst or PtrToIntInst, so we know it's a CastInst - use cast<> directly.

Prevents a clang static analyzer warning that we could dereference a null pointer.
2020-10-06 14:48:34 +01:00
Simon Pilgrim 75d33a3a97 [InstCombine] FoldShiftByConstant - consistently use ConstantExpr in logicalshift(trunc(shift(x,c1)),c2) fold. NFCI.
This still only gets used for scalar types but now always uses ConstantExpr in preparation for vector support - it was using APInt methods in some places.
2020-10-06 14:48:34 +01:00
Simon Pilgrim 21100f885d [InstCombine] FoldShiftByConstant - use PatternMatch for logicalshift(trunc(shift(x,c1)),c2) fold. NFCI. 2020-10-06 13:13:08 +01:00
Simon Pilgrim 0b402e985e [InstCombine] FoldShiftByConstant - remove unnecessary cast<>. NFC.
Op1 is already a Constant*
2020-10-06 13:13:08 +01:00
Serguei Katkov b988898013 [GVN LoadPRE] Extend the scope of optimization by using context to prove safety of speculation
Use the context to prove that the load can be safely executed at the point where it is being hoisted.

Postpone the decision about the safety of speculative load execution until the moment we know
where the load will be hoisted, and check safety in that context.

Reviewers: nikic, fhahn, mkazantsev, lebedev.ri, efriedma, reames
Reviewed By: reames, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88725
2020-10-06 09:25:16 +07:00
Vedant Kumar 9afb1c566e Revert "Outline non returning functions unless a longjmp"
This reverts commit 20797989ea.

This patch (https://reviews.llvm.org/D69257) cannot complete a stage2
build due to the change:

```
CI->getCalledFunction()->getName().contains("longjmp")
```

There are several concrete issues here:

  - The callee may not be a function, so `getCalledFunction` can assert.
  - The called value may not have a name, so `getName` can assert.
  - There's no distinction made between "my_longjmp_test_helper" and the
    actual longjmp libcall.

At a higher level, there's a serious layering problem here. The
splitting pass makes policy decisions in a general way (e.g. based on
attributes or profile data). Special-casing certain names breaks the
layering. It subverts the work of library maintainers (who may now need
to opt-out of unexpected optimization behavior for any affected
functions) and can lead to inconsistent optimization behavior (as not
all llvm passes special-case ".*longjmp.*" in the same way).

The patch may need significant revision to address these issues.

But the immediate issue is that this crashes while compiling llvm's unit
tests in a stage2 build (due to the `getName` problem).
2020-10-05 14:10:25 -07:00
Roman Lebedev e00f189d39
[InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)

This canonicalization seems dubious.

Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.

I think it's pretty obvious that this is an undesirable outcome;
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-ops, and we are no longer eager to look past them.
Which e.g. means that given
```
%a = load i32, i32* %p
%b = inttoptr i32 %a to i8*
%c = inttoptr i32 %a to i8*
```
we likely won't be able to tell that `%b` and `%c` are the same thing.

As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without https://bugs.llvm.org/show_bug.cgi?id=47592, at least).
And we can't recover the situation post-inlining in instcombine.

So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means this fold isn't helpful in exposing the passes
that are otherwise unaware of the patterns it produces.

Thusly, i propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so I'm not sure
how this plays out in the larger picture.

On vanilla llvm test-suite + RawSpeed, this results in an
increase of asm instructions and final object size by ~+0.05%,
and a decrease of the final count of bitcasts by 4.79% (-28990),
of ptrtoint casts by 15.41% (-3423),
and of inttoptr casts by 25.59% (-6919, *sic*).
Overall, there are 0.04% fewer IR blocks and 0.39% fewer instructions.

See https://bugs.llvm.org/show_bug.cgi?id=47592

Differential Revision: https://reviews.llvm.org/D88789
2020-10-06 00:00:30 +03:00
Dávid Bolvanský a4bae56ab8 Revert "[SLC] Optimize mempcpy_chk to mempcpy"
This reverts commit 3f1fd59de3.
2020-10-05 22:27:14 +02:00
Dávid Bolvanský 3f1fd59de3 [SLC] Optimize mempcpy_chk to mempcpy
As reported in PR46735:

void* f(void *d, const void *s, size_t l)
{
    return __builtin___mempcpy_chk(d, s, l, __builtin_object_size(d, 0));
}

This can be optimized to `return mempcpy(d, s, l);`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86019
2020-10-05 22:18:36 +02:00
Roman Lebedev 59127de243
[NFC][GCOV] Fix build: there's `llvm::stable_partition()` wrapper 2020-10-05 22:52:32 +03:00
Fangrui Song e338f8fe69 [gcov] Fix non-determinism (DenseMap iteration order) of checksum computation
... by using MapVector. The issue was caused by 63182c2ac0.

Also use stable_partition instead of partition to get stable results
across different STL implementations.
2020-10-05 12:39:36 -07:00
Nikita Popov 3641d375f6 [InstCombine] Handle GEP inbounds in select op replacement (PR47730)
When retrying the "simplify with operand replaced" select
optimization without poison flags, also handle inbounds on GEPs.

Of course, this particular example would also be safe to transform
while keeping inbounds, but the underlying machinery does not
know this (yet).
2020-10-05 21:13:02 +02:00
Simon Pilgrim 8fb4645321 [InstCombine] FoldShiftByConstant - use m_Specific. NFCI.
Use m_Specific instead of m_Value followed by an equality check - we already do this for the similar folds above; it looks like an oversight in rG2b459fe7e1e, where the original pattern match code looked a little different.
2020-10-05 18:56:10 +01:00
Simon Pilgrim 4ce61144cb [InstCombine] canEvaluateShifted - remove dead (and never used code). NFC.
This was already #if'd out when it was added back in 2010 at rG18d7fc8fc6767 and has never been touched since.
2020-10-05 18:05:13 +01:00
Nikita Popov 9d63029770 Revert "[DebugInfo] Improve dbg preservation in LSR."
This reverts commit a3caf7f610.

The ReleaseLTO-g test-suite configuration has been failing
to build since this commit, because clang segfaults while
building 7zip.
2020-10-05 19:02:30 +02:00
Florian Hahn 348d85a6c7 [VPlan] Clean up uses/operands on VPBB deletion.
Update the code responsible for deleting VPBBs and recipes to properly
update users and release operands.

This is another preparation for D84680 & following patches towards
enabling modeling def-use chains in VPlan.
2020-10-05 14:43:52 +01:00
Markus Lavin a3caf7f610 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow matching with an offset by inserting
compensation code in the DIExpression.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-05 09:55:16 +02:00
Florian Hahn 357bbaab66 [VPlan] Add VPRecipeBase::toVPUser helper (NFC).
This adds a helper to convert a VPRecipeBase pointer to a VPUser, for
recipes that inherit from VPUser. Once VPRecipeBase directly inherits
from VPUser this helper can be removed.
2020-10-04 19:43:27 +01:00
Florian Hahn f5fe7abe8a [VPlan] Account for removed users in replaceAllUsesWith.
Make sure we do not iterate using an invalid iterator.

Another small fix/step towards traversing the def-use chains in VPlan.
2020-10-04 18:18:58 +01:00
Anatoly Parshintsev a566f0525a [RISCV][ASAN] instrumentation pass now uses proper shadow offset
[10/11] patch series to port ASAN for riscv64

Depends On D87580

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D87581
2020-10-04 16:30:38 +03:00
Roman Lebedev 03bd5198b6
[OldPM] Pass manager: run SROA after (simple) loop unrolling
I have stumbled into this pretty accidentally, when rewriting
some spaghetti-like code into something more structured,
which involved using some `std::array<>`s. And to my surprise,
the `alloca`s remained, causing about a `+160%` perf regression.

https://llvm-compile-time-tracker.com/compare.php?from=bb6f4d32aac3eecb51909f4facc625219307ee68&to=d563e66f40f9d4d145cb2050e41cb961e2b37785&stat=instructions
suggests that this has geomean compile-time cost of `+0.08%`.

Note that D68593 / cecc0d27ad
already did this change for NewPM, but left OldPM in a pessimized state.

This fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40011 | PR40011 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=42794 | PR42794 ]] and probably some other reports.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D87972
2020-10-04 11:53:50 +03:00
Florian Hahn 82dcd383c4 [VPlan] Properly update users when updating operands.
When updating operands of a VPUser, we also have to adjust the list of
users for the new and old VPValues. This is required once we start
transitioning recipes to become VPValues.
2020-10-03 20:54:58 +01:00
Michał Górny 66e493f81e [asan] Stop instrumenting user-defined ELF sections
Do not instrument user-defined ELF sections (whose names resemble valid
C identifiers).  They may have special use semantics and modifying them
may break programs.  This is e.g. the case with NetBSD __link_set API
that expects these sections to store consecutive array elements.

Differential Revision: https://reviews.llvm.org/D76665
2020-10-03 19:54:38 +02:00
Simon Pilgrim aacfe2be53 [InstCombine] recognizeBSwapOrBitReverseIdiom - add vector support
Add basic vector handling to recognizeBSwapOrBitReverseIdiom/collectBitParts - this works at the element level, all vector element operations must match (splat constants etc.) and there is no cross-element support (insert/extract/shuffle etc.).
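
For illustration, a splat-constant vector variant of the classic i32 bswap
idiom that the element-level matching can now handle (assuming no undef
elements):

```
%t0 = lshr <2 x i32> %x, <i32 24, i32 24>
%t1 = lshr <2 x i32> %x, <i32 8, i32 8>
%t2 = and <2 x i32> %t1, <i32 65280, i32 65280>
%t3 = shl <2 x i32> %x, <i32 8, i32 8>
%t4 = and <2 x i32> %t3, <i32 16711680, i32 16711680>
%t5 = shl <2 x i32> %x, <i32 24, i32 24>
%o0 = or <2 x i32> %t0, %t2
%o1 = or <2 x i32> %o0, %t4
%r = or <2 x i32> %o1, %t5
; expected: %r = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> %x)
```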
2020-10-03 16:26:46 +01:00
Simon Pilgrim 347fd9955a [InstCombine] recognizeBSwapOrBitReverseIdiom - use generic CreateIntegerCast
Try to appease buildbot breakages due to D88578
2020-10-03 15:29:22 +01:00
Simon Pilgrim 3aa93f690b [InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191) (Reapplied)
If we're bswap'ing some bytes and zero'ing the remainder, we can perform this as a bswap+mask, which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.

Reapplied with early-out if recognizeBSwapOrBitReverseIdiom collects a source wider than the result type.

Differential Revision: https://reviews.llvm.org/D88578
2020-10-03 14:52:42 +01:00
Nikita Popov fbf818724f [MemCpyOpt] Make moveUp() a member method (NFC)
So we don't have to pass through more parameters in the future.
2020-10-03 11:28:49 +02:00
Arthur Eubanks 321986fe68 [MetaRenamer][NewPM] Port metarenamer to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D88690
2020-10-02 15:42:25 -07:00
Vitaly Buka aff896dea1 [NFC][MSAN] Extract llvm.abs handling into a function
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D88519
2020-10-02 15:01:25 -07:00
Nikita Popov 94704ed008 [MemCpyOpt] Add helper to erase instructions (NFC)
Next to erasing the instruction, we also always want to remove
it from MSSA and MD. Use a common function to do so.

This is a refactoring split out from D26739.
2020-10-02 21:52:10 +02:00
Nikita Popov 87b63c1726 [MemCpyOpt] Avoid double invalidation (NFCI)
The removal of the cpy instruction is left to the caller of
performCallSlotOptzn(), including the invalidation of MD. Both
call-sites already do this.

Also handle incrementation of NumMemCpyInstr consistently at the
call-site. One of the call-sites was already doing this, which
ended up incrementing the statistic twice.

This fix was part of D26739.
2020-10-02 21:50:46 +02:00
Arthur Eubanks 7468afe9ca [DAE] MarkLive in MarkValue(MaybeLive) if any use is live
While looping through all args or all return values, we may mark a use
of a later iteration as live. Previously when we got to that later value
it would ignore that and continue adding to Uses instead of marking it
live. For example, when looping through arg#0 and arg#1,
MarkValue(arg#0, Live) may cause some use of arg#1 to be live, but
MarkValue(arg#1, MaybeLive) will not notice that and continue adding
into Uses.

Now MarkValue(RA, MaybeLive) will MarkLive(RA) if any use is live.

Fixes PR47444.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D88529
2020-10-02 10:55:08 -07:00
Arthur Eubanks eb55735073 Reland [AlwaysInliner] Update BFI when inlining
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88324
2020-10-02 10:46:57 -07:00
Arthur Eubanks 9b8c0b8b46 Revert "[AlwaysInliner] Update BFI when inlining"
This reverts commit b1bf24667f.
2020-10-02 10:34:51 -07:00
Arthur Eubanks b1bf24667f [AlwaysInliner] Update BFI when inlining
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88324
2020-10-02 10:26:34 -07:00
Simon Pilgrim 0364721e3e Revert rG3d14a1e982ad27 - "[InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191)"
This reverts commit 3d14a1e982.

This is breaking on some 2-stage clang buildbots
2020-10-02 18:17:14 +01:00
Florian Hahn 0867a9e85a [VPlan] Use isa<> instead of directly checking VPRecipeID (NFC).
getVPRecipeID is intended to be only used in `classof` helpers. Instead
of checking it directly, use isa<> with the correct recipe type.
2020-10-02 17:47:35 +01:00
Simon Pilgrim 3d14a1e982 [InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191)
If we're bswap'ing some bytes and zero'ing the remainder, we can perform this as a bswap+mask, which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.

Differential Revision: https://reviews.llvm.org/D88578
2020-10-02 17:25:12 +01:00
Simon Pilgrim 0347f3ea72 TruncInstCombine.cpp - fix header include ordering to fix llvm-include-order clang-tidy warning. NFCI. 2020-10-02 17:25:12 +01:00
Simon Pilgrim 5e8e89d814 TruncInstCombine.cpp - use auto * to fix llvm-qualified-auto clang-tidy warning. NFCI. 2020-10-02 17:25:11 +01:00
Philip Reames f29645e7af [gvn] Handle a corner case w/vectors of non-integral pointers
If we try to coerce a vector of non-integral pointers to a narrower type (either a narrower vector or a single pointer), we use inttoptr and violate the semantics of non-integral pointers.  In theory, we can handle many of these cases; we just need to use a different code idiom to convert without going through inttoptr and back.

This shows up as wrong code bugs, and in some cases, crashes due to failed asserts.  Modeled after a change which has lived downstream for a couple years, though completely rewritten to be more idiomatic.
2020-10-01 19:20:21 -07:00
Joseph Huber 82453e759c [OpenMP] Add Missing Runtime Call for Globalization Remarks
Summary:
Add a missing runtime call to perform data globalization checks.

Reviewers: jdoerfert

Subscribers: guansong, hiraditya, llvm-commits, sstefan1, yaxunl

Tags: #LLVM #OpenMP

Differential Revision: https://reviews.llvm.org/D88621
2020-10-01 21:19:53 -04:00
Philip Reames bb0344644a [memcpyopt] Conservatively handle non-integral pointers
If we allow the non-integral pointers to become memset and memcpy, we lose the ability to reason about pointer propagation.  This patch is modeled on changes we've carried downstream for a long time; figured it was worth being equally conservative for other users.  There is room to refine the semantics and handling here if anyone is motivated.
2020-10-01 16:46:56 -07:00
Philip Reames de3cb9548d Fix a bug in memset formation with vectors of non-integral pointers
We were converting the non-integral store into an integer store, which is not legal.
2020-10-01 16:11:11 -07:00
Nikita Popov 9d1c8c0ba9 [InstCombine] Fix select operand simplification with undef (PR47696)
When replacing X == Y ? f(X) : Z with X == Y ? f(Y) : Z, make sure
that Y cannot be undef. If it may be undef, we might end up picking
a different value for undef in the comparison and the select
operand.
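A hand-written illustration of the hazard (not one of the committed tests):

```
%c = icmp eq i32 %x, %y
%f = add i32 %x, 1
%s = select i1 %c, i32 %f, i32 %z
; rewriting %f to use %y is only safe when %y is known not to be undef;
; otherwise the icmp and the select arm could pick different values for
; the same undef
```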
2020-10-01 21:15:48 +02:00
zoecarver 6c25816d7b [DSE] Look through memory PHI arguments when removing noop stores in MSSA.
Summary:
Adds support for "following" memory through MSSA PHI arguments. This will help catch more noop stores that exist between blocks.

Originally part of D79391.

Reviewers: fhahn, jfb, asbirlea

Differential Revision: https://reviews.llvm.org/D82588
2020-10-01 10:42:02 -07:00
Simon Pilgrim 29ac9fae54 [InstCombine] collectBitParts - convert to use PatterMatch matchers and avoid IntegerType casts.
Make sure we're using getScalarSizeInBits instead of cast<IntegerType> to get Type bit widths.

This is preliminary cleanup before we can start adding vector support to the bswap/bitreverse (element level) matching.
2020-10-01 16:44:14 +01:00
Simon Pilgrim 567049f892 [InstCombine] Use m_FAbs matcher helper. NFCI. 2020-10-01 14:42:34 +01:00
Sjoerd Meijer d53b4bee0c [LoopFlatten] Add a loop-flattening pass
This is a simple pass that flattens nested loops.  The intention is to optimise
loop nests like this, which together access an array linearly:

  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      f(A[i*M+j]);

into one loop:

  for (int i = 0; i < (N*M); ++i)
    f(A[i]);

It can also flatten loops where the induction variables are not used in the
loop. This can help with codesize and runtime, especially on simple CPUs
without advanced branch prediction.

This is only worth flattening if the induction variables are only used in an
expression like i*M+j. If they had any other uses, we would have to insert a
div/mod to reconstruct the original values, so this wouldn't be profitable.

This partially fixes PR40581, as this pass triggers on one of the two cases. I
will follow up on this to teach LoopFlatten a few more (small) tricks. Please
note that LoopFlatten is not yet enabled by default.

Patch by Oliver Stannard, with minor tweaks from Dave Green and myself.

Differential Revision: https://reviews.llvm.org/D42365
2020-10-01 13:54:45 +01:00
Simon Pilgrim bc730b5e43 [InstCombine] collectBitParts - use APInt directly to check for out of range bit shifts. NFCI. 2020-10-01 12:50:36 +01:00
Arthur Eubanks 460dda071e [WholeProgramDevirt][NewPM] Add NPM testing path to match legacy pass
The legacy pass's default constructor sets UseCommandLine = true and
goes down a separate testing route. Match that in the NPM pass.

This fixes all tests in llvm/test/Transforms/WholeProgramDevirt under NPM.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D88588
2020-09-30 17:27:37 -07:00
Simon Pilgrim 323d08e50a [InstCombine] Fix bswap(trunc(bswap(x))) -> trunc(lshr(x, c)) vector support
Use getScalarSizeInBits not getPrimitiveSizeInBits to determine the shift value at the element level.
2020-09-30 16:01:08 +01:00
Simon Pilgrim c722b32596 [InstCombine] recognizeBSwapOrBitReverseIdiom - merge the regular/trunc+zext paths. NFCI.
There doesn't seem to be any good reason for having a separate path for when we bswap/bitreverse at a smaller size than the destination size - so merge these to make the instruction generation a lot clearer.
2020-09-30 14:54:04 +01:00
Simon Pilgrim d5545a8993 [InstCombine] recognizeBSwapOrBitReverseIdiom - remove unnecessary cast. NFCI. 2020-09-30 14:44:15 +01:00
Florian Hahn d856365470 [VPlan] Change recipes to inherit from VPUser instead of a member var.
Now that VPUser is not inheriting from VPValue, we can take the next
step and turn the recipes that already manage their operands via VPUser
into VPUsers directly. This is another small step towards traversing
def-use chains in VPlan.

This is NFC with respect to the generated code, but makes the interface
more powerful.
2020-09-30 14:39:00 +01:00
Simon Pilgrim 621c6c8962 [InstCombine] recognizeBSwapOrBitReverseIdiom - cleanup bswap/bitreverse detection loop. NFCI.
Early out if both pattern matches have failed (or we don't want them). Fix case of bit index iterator (and avoid Wshadow issue).
2020-09-30 14:19:18 +01:00
Simon Pilgrim 413b4998bd [InstCombine] recognizeBSwapOrBitReverseIdiom - use ArrayRef::back() helper. NFCI.
Post-commit feedback on D88316
2020-09-30 13:39:18 +01:00
Simon Pilgrim 05290eead3 InstCombine] collectBitParts - cleanup variable names. NFCI.
Fix a number of WShadow warnings (I was used as the instruction and index......) and fix cases to match style.

Also, replaced the Bit APInt mask check in AND instructions with a direct APInt[] bit check.
2020-09-30 13:25:32 +01:00
Simon Pilgrim af47d40b9c [InstCombine] recognizeBSwapOrBitReverseIdiom - recognise zext(bswap(trunc(x))) patterns (PR39793)
PR39793 demonstrated an issue where we fail to recognize 'partial' bswap patterns of the lower bytes of an integer source.

In fact, most of this is already in place: collectBitParts suitably tags zero bits, so we just need to handle this case correctly by finding the zero'd upper bits and reducing the bswap pattern to just the active demanded bits.

Differential Revision: https://reviews.llvm.org/D88316
2020-09-30 12:07:19 +01:00
Simon Pilgrim ec3f24d453 [InstCombine] recognizeBSwapOrBitReverseIdiom - assert for correct bit provenance indices. NFCI.
As suggested by @spatel on D88316
2020-09-30 11:16:33 +01:00
Jeremy Morse 05659606a2 Revert "[gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC"
Some of the buildbots have croaked with this patch; for example, failures
that begin in this build:

  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/29933

This reverts commit 674f57870f.
2020-09-30 09:52:12 +01:00
Vedant Kumar 674f57870f [gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC 2020-09-29 17:39:07 -07:00
Vedant Kumar 26ee8aff2b [CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)
Lifetime marker intrinsics support any pointer type, so CodeExtractor
does not need to bitcast to `i8*` in order to use these markers.
2020-09-29 16:34:36 -07:00
Sanjay Patel 0527c8749b [InstCombine] ease alignment restriction for converting masked load to normal load
I think we initially made this fold conservative to be safer, but we do not
need the alignment attribute/metadata limitation because the masked load
intrinsic itself specifies the alignment. A normal vector load is better for
IR transforms and should be no worse in codegen than the masked alternative.
If it is worse for some target, the backend can reverse this transform.
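As a rough sketch (hand-written here, assuming a constant all-ones mask, which is what the fold requires):

```
%v = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(
         <4 x float>* %p, i32 16,
         <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
         <4 x float> undef)
; ==> a plain load carrying the intrinsic's own alignment
%v = load <4 x float>, <4 x float>* %p, align 16
```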

Differential Revision: https://reviews.llvm.org/D88505
2020-09-29 15:26:22 -04:00
Simon Pilgrim 0cf48a7065 [InstCombine] visitTrunc - trunc (*shr (trunc A), C) --> trunc(*shr A, C)
Attempt to fold trunc (*shr (trunc A), C) --> trunc(*shr A, C) iff the shift amount is small enough that all zero/sign bits created by the shift are removed by the last trunc.
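A hand-written instance of the pattern, with a shift amount small enough for the fold to be lossless:

```
%t = trunc i32 %a to i16
%s = lshr i16 %t, 4
%r = trunc i16 %s to i8      ; result is bits 4..11 of %a
; ==>
%s2 = lshr i32 %a, 4
%r2 = trunc i32 %s2 to i8    ; same bits, one shift on the wide type
```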

Helps fix the regressions encountered in D88316.

I've tweaked a couple of shift values as suggested by @lebedev.ri to ensure we have coverage of shift values close (above/below) to the max limit.

Differential Revision: https://reviews.llvm.org/D88429
2020-09-29 18:27:42 +01:00
Juneyoung Lee 67aac915ba [BuildLibCalls] Add noundef to the returned pointers of allocators and argument of free
This patch adds noundef to the returned pointers of allocators (malloc, calloc, ...)
and the pointer argument of free.
The returned pointer of allocators cannot be poison or (partially) undef.
Since the pointer that is given to free should precisely have zero offset,
it cannot be poison or (partially) undef too.
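Roughly, the annotated declarations would look like this (a sketch; the exact attribute sets emitted by BuildLibCalls also include others such as noalias):

```
declare noundef i8* @malloc(i64)
declare noundef i8* @calloc(i64, i64)
declare void @free(i8* noundef)
```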

For the size arguments of allocators, noundef wasn't attached simply because
I wasn't sure whether attaching it is okay or not.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D87984
2020-09-30 02:13:48 +09:00
Simon Pilgrim b610d73b3f [InstCombine] visitTrunc - remove dead trunc(lshr (zext A), C) combine. NFCI.
I added additional test coverage at rG7a55989dc4305 - but all are handled independently of this combine and http://lab.llvm.org:8080/coverage/coverage-reports/ indicates the code is never used.

Differential revision: https://reviews.llvm.org/D88492
2020-09-29 17:15:16 +01:00
Simon Pilgrim 89a8a0c910 [InstCombine] Inherit exact flags on extended shifts in trunc (lshr (sext A), C) --> (ashr A, C)
This was missed in D88475
2020-09-29 15:32:09 +01:00
Simon Pilgrim 14ff38e235 [InstCombine] visitTrunc - trunc (lshr (sext A), C) --> (ashr A, C) non-uniform support
This came from @lebedev.ri's suggestion to use m_SpecificInt_ICMP for D88429 - since I was going to change the m_APInt to m_Constant for that patch, I thought I would do it for the only other user of the APInt first.

I've added a ConstantExpr::getUMin helper - it's trivial to add UMAX/SMIN/SMAX, but I thought I'd wait until we have use cases.
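A hand-written example of the underlying fold with non-uniform vector shift amounts (both below the source bit width):

```
%e = sext <2 x i8> %a to <2 x i32>
%s = lshr <2 x i32> %e, <i32 3, i32 5>
%r = trunc <2 x i32> %s to <2 x i8>
; ==> the truncated bits are the original bits plus sign fill
%r = ashr <2 x i8> %a, <i8 3, i8 5>
```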

Differential Revision: https://reviews.llvm.org/D88475
2020-09-29 15:01:16 +01:00
Florian Hahn 7bae2bc5a8 [LoopUtils] Only verify SE in builds with assertions.
Follow up to 60b852092c.
2020-09-29 13:39:23 +01:00
Daniel Kiss c5a4900e1a [AArch64] Add BTI to CFI jumptables.
With branch protection the jump to the jump table entries requires a landing pad.

Reviewed By: eugenis, tamas.petz

Differential Revision: https://reviews.llvm.org/D81251
2020-09-29 13:50:23 +02:00
David Stenberg e6f332ef1e [IndVarSimplify] Fix Modified status for removal of overflow intrinsics
When removing an overflow intrinsic the Changed status in SimplifyIndvar
was not set, leading to the IndVarSimplify pass returning an incorrect
status.

This was caught using the check introduced by D80916.

As pointed out in the code review, a similar bug may exist for
eliminateTrunc().

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D85971
2020-09-29 13:20:59 +02:00
Vitaly Buka 4aa6abe4ef [msan] Fix llvm.abs.v intrinsic
The last argument of the intrinsic is a boolean
flag to control INT_MIN handling and does
not affect msan metadata.
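For reference, the intrinsic's shape (the trailing i1 is only the INT_MIN-is-poison flag):

```
declare i32 @llvm.abs.i32(i32, i1 immarg)
; shadow propagation should depend on the i32 operand alone, not on the
; i1 flag
%a = call i32 @llvm.abs.i32(i32 %x, i1 false)
```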
2020-09-29 03:52:27 -07:00
sstefan1 cb9cfa0d2f [OpenMPOpt][Fix] Only initialize ICV initial values once.
Reviewers: jdoerfert, ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D88441
2020-09-29 12:22:58 +02:00
Florian Hahn 60b852092c [LoopDeletion] Forget loop before setting values to undef
After D71539, we need to forget the loop before setting the incoming
values of phi nodes in exit blocks, because we are looking through those
phi nodes now and the SCEV expression could depend on the loop phi. If
we update the phi nodes before forgetting the loop, we miss those users
during invalidation.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D88167
2020-09-29 10:38:44 +01:00
Florian Hahn b76df593eb Revert "Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one.""
Looks like there is still another remaining issue:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/22273/steps/build%20libcxx%2Fmsan/logs/stdio

This reverts commit 86a20d9e34.
2020-09-29 09:18:19 +01:00
Florian Hahn 86a20d9e34 Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one."
This version includes a small fix allowing function pointers to be
unconditionally replaced for now.

This reverts commit 4c5e4aa89b.
2020-09-29 09:10:27 +01:00
Max Kazantsev e862e78b63 [NFC] Use assert instead of checking the guaranteed condition
From the preconditions it is known that either A dominates B or
B dominates A. If A does not dominate B, we do not really need
to check it; an assert should be enough. This should save some
compile time.
2020-09-29 11:38:45 +07:00
Max Kazantsev d266fd960e [IndVars] Remove exiting conditions that are trivially true/false
When removing exiting loop conditions, we only consider checks for
which we know the exact exit count. We could also eliminate checks for
which the condition is always true/false.

Differential Revision: https://reviews.llvm.org/D87344
Reviewed By: lebedev.ri, reames
2020-09-29 11:35:32 +07:00
Philip Reames e46d74b589 [CVP] Allow two transforms in one invocation
For a call site which had both constant deopt operands and nonnull arguments, we were missing the opportunity to recognize the latter by bailing early.

This is somewhat of a speculative fix.  Months ago, I'd had a private report of performance and compile time regressions from the deopt operand folding.  I never received a test case.  However, the only possibility I see is that after that change CVP missed the nonnull fold, and we ended up with a pass ordering/missed simplification issue.  So, since it's a real issue, fix it and hope.
2020-09-28 15:11:42 -07:00
Dominic Chen 06e68f05da
[AddressSanitizer] Copy type metadata to prevent miscompilation
When ASan and e.g. Dead Virtual Function Elimination are enabled, the
latter will rely on type metadata to determine if certain virtual calls can be
removed. However, ASan currently does not copy type metadata, which can cause
virtual function calls to be incorrectly removed.

Differential Revision: https://reviews.llvm.org/D88368
2020-09-28 13:56:05 -04:00
Simon Pilgrim 63ee42a06b [InstCombine] matchRotate - force splat of uniform constant rotation amounts (PR46895)
Fixes minor bug in D88402 where we were using the original shift constant (with undefs) instead of one with the splat values (re)splatted to all elements.
2020-09-28 15:12:41 +01:00
Simon Pilgrim dabb14cadd [InstCombine] matchRotate - allow undef in uniform constant rotation amounts (PR46895)
An extension to D87452, we can safely permit undefs in the uniform/splat detection

https://alive2.llvm.org/ce/z/nT-ptN
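A hand-written sketch of a splat-with-undef rotate that can now be matched (the defined lanes must still sum to the bit width):

```
%hi = shl  <2 x i32> %x, <i32 5, i32 undef>
%lo = lshr <2 x i32> %x, <i32 27, i32 undef>
%r  = or   <2 x i32> %hi, %lo
; ==> treated as a uniform rotate-left by 5
%r = call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %x, <2 x i32> %x,
                                     <2 x i32> <i32 5, i32 5>)
```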

Differential Revision: https://reviews.llvm.org/D88402
2020-09-28 13:36:13 +01:00
Benjamin Kramer 7e5a356d2b [Coroutines] Remove unused includes. NFC. 2020-09-28 10:27:23 +02:00
Chuanqi Xu b3a722e66b [Coroutines] Reuse storage for local variables with non-overlapping lifetimes
Bug 45566 shows that the process of building the coroutine frame does not
consider that the lifetimes of different local variables may not overlap,
which means the compiler could generate a smaller frame.

This patch calculates the lifetime range of each alloca with the
StackLifetime class, then builds non-overlapping sets of allocas whose
lifetime ranges do not overlap. We use the largest type in a
non-overlapping set as the field type in the frame. In the insertSpills
process, if we find that the type of the field is not the same as the
alloca's, we cast the pointer to the field type to a pointer to the
alloca type. Since the lifetime ranges of the allocas in one
non-overlapping set do not overlap with each other, it is safe to reuse
the storage space in the frame.

Test plan: check-llvm, check-clang, cppcoro, folly

Reviewers: junparser, lxfind, modocache

Differential Revision: https://reviews.llvm.org/D87596
2020-09-28 15:48:00 +08:00
Dávid Bolvanský 155ac33394 [BuildLibCalls] Add noalias for strcat and stpcpy
strcat:
destination and source shall not overlap. (http://www.cplusplus.com/reference/cstring/strcat/)

stpcpy:
The strings may not overlap, and the destination string dest must be large enough to receive the copy. (https://man7.org/linux/man-pages/man3/stpcpy.3.html)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88335
2020-09-27 21:37:09 +02:00
Nikita Popov fe79061be2 [LVI][CVP] Use block value when simplifying icmps
Add a flag to getPredicateAt() that allows making use of the block
value. This allows us to take into account range information from
the current block, rather than only information that is threaded
over edges, making the icmp simplification in CVP a lot more
powerful.

I'm not changing getPredicateAt() to use the block value
unconditionally to avoid any impact on the JumpThreading pass,
which is somewhat picky about LVI query order.

Most test changes here are just icmps that now get dropped (while
previously only a result used in a return was replaced). The three
tests in icmp.ll show some representative improvements. Some of
the folds this enables have been covered by IPSCCP in the meantime,
but LVI can reason about some cases which are hard to support in
IPSCCP, such as in test_br_cmp_with_offset.

The compile-time time cost of doing this is fairly minimal, with
a ~0.05% CTMark regression for ReleaseThinLTO:
https://llvm-compile-time-tracker.com/compare.php?from=709d03f8af4da4204849a70f01798e7cebba2e32&to=6236fd503761f43c99f4537121e057a01056f185&stat=instructions

This is because the block values will typically already be queried
and cached by other CVP optimizations anyway.

Differential Revision: https://reviews.llvm.org/D69686
2020-09-27 20:25:16 +02:00
Fangrui Song 50bd71e1d7 [NewPM] Port ConstraintElimination to the new pass manager
If -enable-constraint-elimination is specified, add it to the -O2/-O3 pipeline.
(-O1 uses a separate function now.)

Reviewed By: fhahn, aeubanks

Differential Revision: https://reviews.llvm.org/D88365
2020-09-27 11:12:26 -07:00
Benjamin Kramer 7b782062b4 [InstCombine] Simplify code. NFCI. 2020-09-27 19:11:07 +02:00
Nikita Popov 9b959b59df [LVI] Require context instruction in external API (NFCI)
Require CxtI in getConstant() and getConstantRange() APIs.
Accordingly drop the BB parameter, as it is implied by
CxtI->getParent().

This makes sure we don't forget to pass the context instruction,
and makes the API contract clearer (also clean up the comments to
that effect -- the value holds at the context instruction, not
the end of the block).
2020-09-27 18:07:24 +02:00
Nikita Popov c8abf1c12d [CVP] Pass context instruction when narrowing div/rem
This fold was the only place not passing the context instruction.
The tests worked around that fact by introducing a basic block split,
which is now no longer necessary.
2020-09-27 17:51:30 +02:00
Fangrui Song 82420b4e49 [DivRemPairs] Use DenseMapBase::find instead of operator[]. NFC 2020-09-27 01:13:14 -07:00
Fangrui Song 400bdbc422 [ConstraintElimination] Internalize function/class and delete an implied condition. NFC
Delete an implied condition (E.NumIn <= CB.NumIn)
2020-09-26 15:04:39 -07:00
Florian Hahn 915310bf14 Revert "[DSE] Switch to MemorySSA-backed DSE by default."
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.

This patch temporarily switches back to the legacy DSE
implementation, while we investigate.

This reverts commit 9d172c8e9c.
2020-09-26 18:35:27 +01:00
Florian Hahn 8f0466edc0 [DSE] Unify & fix mem terminator location checks.
When looking for memory defs killed by memory terminators the code
currently incorrectly ignores the size argument of llvm.lifetime.end.

This patch updates the code to use isMemTerminator and updates
isMemTerminator to use isOverwrite() to make sure locations that are
outside the range marked as dead by llvm.lifetime.end are not
considered. Note that isOverwrite is only used for llvm.lifetime.end,
because free-like functions make the whole underlying object dead.
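A hand-written illustration of the check (assuming %p and %q point into the same 16-byte object, with %q at offset 12):

```
store i8 1, i8* %q                                 ; offset 12
call void @llvm.lifetime.end.p0i8(i64 8, i8* %p)   ; only bytes 0..7 die
; the store at offset 12 lies outside the dead range and must not be
; treated as killed by the lifetime.end
```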
2020-09-26 13:47:50 +01:00
Tyker 8d5b289a46 [LoopDelete][Assume] Allow deleting loops with assumes
This prevents very poor optimization caused by a single assume like https://godbolt.org/z/EK3oMh

baseline flags: -O3
patched flags: -O3 -mllvm --enable-knowledge-retention

Before the patch
```
Metric: compile_time
Program                                                      baseline patched diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test  20.72    29.74  43.5%
                     test-suite :: CTMark/Bullet/bullet.test  24.39    24.91   2.2%
               test-suite :: CTMark/7zip/7zip-benchmark.test  37.39    38.06   1.8%
                      test-suite :: CTMark/kimwitu++/kc.test  11.76    11.94   1.5%
                   test-suite :: CTMark/sqlite3/sqlite3.test  12.94    12.91  -0.3%
                       test-suite :: CTMark/SPASS/SPASS.test  11.72    11.70  -0.2%
                     test-suite :: CTMark/lencod/lencod.test  16.12    16.10  -0.1%
                   test-suite :: CTMark/ClamAV/clamscan.test  13.31    13.30  -0.1%
              test-suite :: CTMark/mafft/pairlocalalign.test   9.12     9.12  -0.1%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test   9.34     9.34  -0.1%
                                          Geomean difference                   4.2%

Metric: compiler_Kinst_count
Program                                                      baseline     patched      diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test 107576069.87 172886418.90 60.7%
                     test-suite :: CTMark/Bullet/bullet.test 123291865.66 125457117.96  1.8%
                      test-suite :: CTMark/kimwitu++/kc.test  56347884.64  57298544.14  1.7%
               test-suite :: CTMark/7zip/7zip-benchmark.test 180637699.58 183341656.57  1.5%
                   test-suite :: CTMark/sqlite3/sqlite3.test  66723788.85  66664692.80 -0.1%
                   test-suite :: CTMark/ClamAV/clamscan.test  69581500.56  69597863.92  0.0%
                     test-suite :: CTMark/lencod/lencod.test  94236501.48  94216545.32 -0.0%
                       test-suite :: CTMark/SPASS/SPASS.test  58516756.95  58505089.07 -0.0%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test  48832815.53  48841989.39  0.0%
              test-suite :: CTMark/mafft/pairlocalalign.test  49682720.53  49686324.34  0.0%
                                          Geomean difference                            5.4%
```

After the patch
```
Metric: compile_time
Program                                                      baseline patched diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test  20.70    22.40   8.2%
               test-suite :: CTMark/7zip/7zip-benchmark.test  37.13    38.05   2.5%
                     test-suite :: CTMark/Bullet/bullet.test  24.25    24.83   2.4%
                      test-suite :: CTMark/kimwitu++/kc.test  11.69    11.94   2.2%
                   test-suite :: CTMark/ClamAV/clamscan.test  13.19    13.36   1.3%
                     test-suite :: CTMark/lencod/lencod.test  16.02    16.19   1.1%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test   9.29     9.36   0.7%
                       test-suite :: CTMark/SPASS/SPASS.test  11.64    11.73   0.7%
              test-suite :: CTMark/mafft/pairlocalalign.test   9.10     9.15   0.5%
                   test-suite :: CTMark/sqlite3/sqlite3.test  12.95    12.96   0.0%
                                          Geomean difference                   1.9%

Metric: compiler_Kinst_count
Program                                                      baseline     patched      diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test 107590933.61 114044834.72  6.0%
                      test-suite :: CTMark/kimwitu++/kc.test  56344526.77  57235806.29  1.6%
                     test-suite :: CTMark/Bullet/bullet.test 123291285.10 125128334.97  1.5%
               test-suite :: CTMark/7zip/7zip-benchmark.test 180641540.10 183155706.39  1.4%
                   test-suite :: CTMark/sqlite3/sqlite3.test  66725619.22  66668713.92 -0.1%
                       test-suite :: CTMark/SPASS/SPASS.test  58509029.85  58478704.75 -0.1%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test  48843711.23  48826894.68 -0.0%
                     test-suite :: CTMark/lencod/lencod.test  94233305.79  94207544.23 -0.0%
                   test-suite :: CTMark/ClamAV/clamscan.test  69587887.66  69603549.90  0.0%
              test-suite :: CTMark/mafft/pairlocalalign.test  49686968.65  49689291.04  0.0%
                                          Geomean difference                            1.0%
```

Reviewed By: jdoerfert, efriedma

Differential Revision: https://reviews.llvm.org/D86816
2020-09-26 12:32:44 +02:00
Arthur Eubanks 83e3ea2cfc [LowerTypeTests][NewPM] Add constructor that uses command line flags
This matches the legacy PM pass by having one constructor use command
line flags, and the other use parameters to the pass.

This fixes all tests under Transforms/LowerTypeTests using NPM.

Reviewed By: ychen, pcc

Differential Revision: https://reviews.llvm.org/D87845
2020-09-25 17:39:59 -07:00
Layton Kifer 48961ba0de [TRE][NFC] Refactor Basic Block Processing
Simplify and improve readability.

Differential Revision: https://reviews.llvm.org/D82269
2020-09-25 16:01:05 -07:00
Simon Pilgrim 9ff9c1d8ee [InstCombine] matchRotate - support (uniform) constant rotation amounts (PR46895)
This patch adds handling of rotation patterns with constant shift amounts - the next bit will be how we want to support non-uniform constant vectors.

Differential Revision: https://reviews.llvm.org/D87452
2020-09-25 22:03:10 +01:00
Simon Pilgrim 2a0ca17f66 [InstCombine] collectBitParts - add fshl/fshr handling
Pulled from D87452, this is a fixed version of the collectBitParts fshl/fshr handling which, as @nikic noticed, wasn't checking for different providers or for correct bit ordering (which was hidden by only testing shift amounts of bitwidth/2).

Differential Revision: https://reviews.llvm.org/D88292
2020-09-25 20:34:59 +01:00
Arthur Eubanks d3f6972abb [LoopReroll][NewPM] Port -loop-reroll to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87957
2020-09-25 12:09:06 -07:00
Daniel Paoliello d2166076b8 [Coroutine] Split PHI Nodes in `cleanuppad` blocks in a way that obeys EH pad rules
Issue Details:
In order to support coroutine splitting, any multi-value PHI node in a coroutine is split into multiple blocks with single-value PHI nodes, which then allows a subsequent transform to generate `reload` instructions as required (i.e., to reload the value if the coroutine has been resumed). This causes issues with EH pads (`catchswitch` and `catchpad`) as all pads within a `catchswitch` must have the same unwind destination, but the coroutine splitting logic may modify them to each have a unique unwind destination if there is a PHI node in the unwind `cleanuppad` that is set from values in the `catchswitch` and `cleanuppad` blocks.

Fix Details:
During splitting, if such a PHI node is detected, then create a "dispatcher" `cleanuppad` as well as the blocks with single-value PHI Nodes: thus the "dispatcher" is the unwind destination and it will detect which predecessor called it and then branch to the appropriate single-value PHI node block, which will then branch back to the original `cleanuppad` block.

Reviewed By: GorNishanov, lxfind

Differential Revision: https://reviews.llvm.org/D88059
2020-09-25 11:30:38 -07:00
Joseph Huber a22814194e [OpenMP] OpenMPOpt Support for Globalization Remarks
Summary:
This patch add support for printing analysis messages relating to data
globalization on the GPU. This occurs when data is shared between the
threads in a GPU context and must be pushed to global or shared memory.

Reviewers: jdoerfert

Subscribers: guansong, hiraditya, llvm-commits, ormris, sstefan1, yaxunl

Tags: #OpenMP #LLVM

Differential Revision: https://reviews.llvm.org/D88243
2020-09-24 18:23:12 -04:00
Vedant Kumar dfc5a9eb57 [Instruction] Add dropLocation and updateLocationAfterHoist helpers
Introduce a helper which can be used to update the debug location of an
Instruction after the instruction is hoisted. This can be used to safely
drop a source location as recommended by the docs.

For more context, see the discussion in https://reviews.llvm.org/D60913.

Differential Revision: https://reviews.llvm.org/D85670
2020-09-24 15:00:04 -07:00
Sanjay Patel 0a349d5827 [SLP] clean up - use 'const' and ArrayRef constructor; NFC
Follow-on tidying suggested in the post-commit review of 6a23668.
2020-09-24 15:31:07 -04:00
Craig Topper 03f22b08e2 [SLP] Remove LHS and RHS from OperationData.
These were only really used for 2 things. One was to check if the operand matches the phi if it exists. The other was for the createOp method to build the reduction.

For the first case we still have the operation; we just need to know how to index its operands. So I've modified getLHS/getRHS to just use the opcode/kind to know how to find the right operands on an instruction that is now passed in.

For the other case we had to create an OperationData object to set the LHS/RHS values and copy the opcode/kind from another object. We would then just call createOp on that temporary object. Instead I've made LHS/RHS arguments to createOp and removed all these temporary objects.

Differential Revision: https://reviews.llvm.org/D88193
2020-09-24 10:57:11 -07:00
Simon Pilgrim 81a408808f [Scalar] ConstantHoistingPass - iterate with const references. NFCI.
Fix some clang-tidy warnings.
2020-09-24 18:40:50 +01:00
Matt Arsenault d65a7003c4 OpaquePtr: Add helpers for sret to mirror byval
Sret should really have a type parameter like byval does.
2020-09-24 09:57:28 -04:00
Zequan Wu f5435399e8 [CGProfile] don't emit cgprofile entry if called function is dllimport
Differential Revision: https://reviews.llvm.org/D88127
2020-09-23 16:56:54 -07:00
Arthur Eubanks 6b1ce83a12 [NewPM][CGSCC] Handle newly added functions in updateCGAndAnalysisManagerForPass
This seems to fit the CGSCC updates model better than calling
addNewFunctionInto{Ref,}SCC() on newly created/outlined functions.
Now addNewFunctionInto{Ref,}SCC() are no longer necessary.

However, this doesn't work on newly outlined functions that aren't
referenced by the original function. e.g. if a() was outlined into b()
and c(), but c() is only referenced by b() and not by a(), this will
trigger an assert.

This also fixes an issue I was seeing with newly created functions not
having passes run on them.

Ran check-llvm with expensive checks.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87798
2020-09-23 15:22:18 -07:00
Craig Topper 7a3c643c35 [SLP] Make HorizontalReduction::getOperationData take an Instruction* instead of a Value*. NFCI
All of the callers already have an Instruction *. Many of them
from a dyn_cast.

Also update the OperationData constructor to use an Instruction&
to remove a dyn_cast and make it clear that the pointer is non-null.

Differential Revision: https://reviews.llvm.org/D88132
2020-09-23 10:51:03 -07:00
Krzysztof Parzyszek 76e8c1899e Break long line accidentally left in the previous commit 2020-09-23 12:24:45 -05:00
Krzysztof Parzyszek e976fb1e54 [EarlyCSE] Fix crash with expensive checks after D87691
D87691 reordered some checks, which turned out to be unsafe. More
specifically, when examining a store instruction, the check against
getOrCreateResult should be done before attempting to call
isSameMemGeneration. Otherwise a crash in the MSSA walker can occur.

This patch restores the order of these calls to what it was originally.
2020-09-23 12:21:34 -05:00
Simon Pilgrim 91589cf679 Add missing namespace closure comments. NFCI.
Fixes some clang-tidy llvm-namespace-comment warnings.
2020-09-23 16:19:25 +01:00
Simon Pilgrim 474dc33d07 Add missing namespace closure comment. NFCI.
Fixes clang-tidy llvm-namespace-comment warning.
2020-09-23 16:19:25 +01:00
Florian Hahn 31923f6b36 [VPlan] Disconnect VPValue and VPUser.
This refactors VPUser to not inherit from VPValue to facilitate
introducing operations that introduce multiple VPValues (e.g.
VPInterleaveRecipe).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84679
2020-09-23 14:44:31 +01:00
David Sherwood 59c4d5aad0 [SVE] Fix InstCombinerImpl::PromoteCastOfAllocation for scalable vectors
In this patch I've fixed some warnings that arose from the implicit
cast of TypeSize -> uint64_t. I tried writing a variety of different
cases to show how this optimisation might work for scalable vectors
and found:

1. The optimisation does not work for cases where the cast type
is scalable and the allocated type is not. This is because we need to
know how many times the cast type fits into the allocated type.
2. If we pass all the various checks for the case when the allocated
type is scalable and the cast type is not, then when creating the
new alloca we have to take vscale into account. This leads to
sub-optimal IR that is worse than the original IR.
3. For the remaining case when both the alloca and cast types are
scalable it is hard to find examples where the optimisation would
kick in, except for simple bitcasts, because we typically fail the
ABI alignment checks.

For now I've changed the code to bail out if only one of the alloca
and cast types is scalable. This means we continue to support the
existing cases where both types are fixed, and also the specific case
when both types are scalable with the same size and alignment, for
example a simple bitcast of an alloca to another type.

I've added tests that show we don't attempt to promote the alloca,
except for simple bitcasts:

  Transforms/InstCombine/AArch64/sve-cast-of-alloc.ll

Differential revision: https://reviews.llvm.org/D87378
2020-09-23 08:43:05 +01:00
Martin Storsjö b90132399a [CVP] Remove a redundant trailing semicolon, fixing GCC warnings. NFC. 2020-09-23 09:03:01 +03:00
Martin Storsjö 2c4c659666 [InstCombine] Add parentheses in assert to silence GCC warning. NFC. 2020-09-23 09:03:01 +03:00
Hubert Tong 32c9991dab [InstCombine] Fix errno bug in pow expansion to sqrt
A conversion from `pow` to `sqrt` shall not call an `errno`-setting
`sqrt` with -infinity: the `sqrt` will set `EDOM` where the `pow`
call need not.

This patch avoids the erroneous (pun not intended) transformation by
applying the restrictions discussed in the thread for
https://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html.

The existing tests are updated (depending on emphasis in the checks for
library calls, avoidance of overlap, and overall coverage):
  - to add `ninf`, retaining the intended library call,
  - to use the intrinsic, retaining the use of `select`, or
  - to expect the replacement to not occur.

The following is tested:
  - The pow intrinsic folds to a `select` instruction to
    handle -infinity.
  - The pow library call folds, with `ninf`, to `sqrt` without the
    `select` instruction associated with handling -infinity.
  - The pow library call does not fold to `sqrt` without `ninf`.
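For context, a rough, hand-written sketch (details simplified) of the select-based expansion the intrinsic case keeps:

```
%s   = call double @llvm.sqrt.f64(double %x)
%abs = call double @llvm.fabs.f64(double %s)    ; pow(-0.0, 0.5) is +0.0
%inf = fcmp oeq double %x, 0xFFF0000000000000   ; x == -infinity?
%r   = select i1 %inf, double 0x7FF0000000000000, double %abs
; pow(-inf, 0.5) is +inf, while a raw sqrt(-inf) is NaN (and an
; errno-setting sqrt libcall would set EDOM)
```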

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87877
2020-09-22 18:58:05 -04:00
Alexey Bataev d6ac649ccd [SLP]Fix coding style, NFC. 2020-09-22 17:44:29 -04:00
Stefanos Baziotis 89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Roman Lebedev b289dc5306
[CVP] Narrow SDiv/SRem to the smallest power-of-2 that's sufficient to contain its operands
This is practically identical to what we already do for UDiv/URem:
  https://rise4fun.com/Alive/04K

Name: narrow udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow exact udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv exact i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow urem
Pre: C0 u<= 255 && C1 u<= 255
%r = urem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = urem i8 %t0, %t1
%r = zext i8 %t2 to i16

... only here we need to look for 'min signed bits', not 'active bits',
and there's an UB to be aware of:
  https://rise4fun.com/Alive/KG86
  https://rise4fun.com/Alive/LwR

Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv exact i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = srem i9 %t0, %t1
%r = sext i9 %t2 to i16


Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv exact i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = srem i8 %t0, %t1
%r = sext i8 %t2 to i16


The ConstantRangeTest.losslessSignedTruncationSignext test sanity-checks
the logic, that we can losslessly truncate ConstantRange to
`getMinSignedBits()` and signext it back, and it will be identical
to the original CR.

On vanilla llvm test-suite + RawSpeed, this fires 1262 times,
while the same fold for UDiv/URem only fires 384 times. Sic!

Additionally, this causes +606.18% (+1079) extra cases of
aggressive-instcombine.NumDAGsReduced, and +473.14% (+1145)
of aggressive-instcombine.NumInstrsReduced folds.
2020-09-22 21:37:30 +03:00
Roman Lebedev 4977eadee5
[NFC][CVP] Give a better name STATISTIC() counting udiv i16 -> udiv i8 xforms 2020-09-22 21:37:30 +03:00
Roman Lebedev ba5afe5588
[NFC][CVP] processUDivOrURem(): refactor to use ConstantRange::getActiveBits()
As an exhaustive test shows, this logic is fully identical to the old
implementation, with the exception of the case where both of the operands
had empty ranges:

```
TEST_F(ConstantRangeTest, CVP_UDiv) {
  unsigned Bits = 4;
  EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) {
    if (CR0.isEmptySet())
      return;
    EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) {
      if (CR1.isEmptySet()) // the inner lambda should check CR1, not CR0
        return;

      unsigned MaxActiveBits = 0;
      for (const ConstantRange &CR : {CR0, CR1})
        MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits());

      ConstantRange OperandRange(Bits, /*isFullSet=*/false);
      for (const ConstantRange &CR : {CR0, CR1})
        OperandRange = OperandRange.unionWith(CR);
      unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits();

      EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1;
    });
  });
}
```
2020-09-22 21:37:29 +03:00
Roman Lebedev 4eeeb356fc
[CVP] Enhance SRem -> URem fold to work not just on non-negative operands
This is a continuation of 8d487668d0,
the logic is pretty much identical for SRem:

Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, C1

Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, -C1

Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0

Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0

https://rise4fun.com/Alive/Vd6

Now, this new logic does not result in any new catches
as of vanilla llvm test-suite + RawSpeed.
but it should be virtually compile-time free,
and it may be important to be consistent in their handling,
because if we had a pair of sdiv-srem, and only converted one of them,
-divrempairs will no longer see them as a pair,
and thus not "merge" them.
2020-09-22 21:37:28 +03:00
Hubert Tong 6801950192 [InstCombine] For pow(x, +/-0.5), stop falling into pow(x, 1.5), etc. case
The current code for handling pow(x, y) where y is an integer plus 0.5
is not explicitly guarded against attempting to transform the case where
abs(y) is exactly 0.5.

The latter case is meant to be handled by `replacePowWithSqrt`. Indeed,
if the pow(x, integer+0.5) case proceeds past a certain point, it will
hit an assertion by attempting to form pow(x, 0) using `getPow`.

This patch adds an explicit check to prevent attempting the
pow(x, integer+0.5) transformation on pow(x, +/-0.5) as suggested during
the review of D87877. This has the effect of retaining the shrinking of
`pow` to `powf` when the `sqrt` libcall cannot be formed.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D88066
2020-09-22 14:23:32 -04:00
Hamilton Tobon Mosquera bd31abc1d0 [OpenMPOpt] Refactored "issue" and "wait" declarations for data map runtime call.
Refactored __tgt_target_data_begin_mapper_<issue|wait> to receive the handle as an input/output argument.
This was done given the compiler warning about returning the handle as a copy.

Differential Revision: https://reviews.llvm.org/D88029
2020-09-22 10:50:17 -05:00
Florian Hahn c671e34bf2 [VPlan] Add dump() helper to VPValue & VPRecipeBase.
This provides a convenient way to print VPValues and recipes in a
debugger. In particular it saves the user from instantiating
VPSlotTracker to print recipes or values.
2020-09-22 15:55:16 +01:00
Sanjay Patel 0c3bfbe4bc [SLP] reduce code duplication for checking parent block; NFC 2020-09-22 09:21:20 -04:00
Sanjay Patel bbd49a0266 [SLP] move misplaced code comments; NFC 2020-09-22 09:21:20 -04:00
Sanjay Patel 062276c691 [SLP] clean up code in gather(); NFC
1. Use range for-loop to avoid repeatedly accessing end index.
2. Better variable names.
2020-09-22 09:21:20 -04:00
Simon Pilgrim d682a36ef9 [SLP] Merge null and dyn_cast<> checks into dyn_cast_or_null<>. NFCI. 2020-09-22 14:01:47 +01:00
Meera Nakrani a3d0dce260 [ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop
Changes TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to check whether it is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.

Differential Revision: https://reviews.llvm.org/D87457
2020-09-22 11:54:10 +00:00
Arthur Eubanks 3bf703fb6d [AlwaysInliner] Emit optimization remarks
To match the normal inliner in preparation for https://reviews.llvm.org/D86988.

Also change a FIXME to an assert.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88067
2020-09-21 22:09:28 -07:00
Serguei Katkov 5502cfa091 [LoopUnswitch] Trivial simplification: remove trivial dead condition after unswitch
Non-trivial loop unswitching can keep the dead condition instruction.
This CL adds trivial dead code elimination for the unused condition.

Reviewers: asbirlea, aqjune, fhahn, DaniilSuchkov, reames
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88014
2020-09-22 09:04:59 +07:00
Krzysztof Parzyszek ae3f54c1e9 [EarlyCSE] Handle masked loads and stores
Extend the handling of memory intrinsics to also include non-
target-specific intrinsics, in particular masked loads and stores.

Invent "isHandledNonTargetIntrinsic" to distinguish between intrin-
sics that should be handled natively from intrinsics that can be
passed to TTI.

Add code that handles masked loads and stores and update the
testcase to reflect the results.
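A minimal sketch of the new CSE opportunity (hand-written; it assumes the pointer, mask, and passthrough all match and nothing clobbers memory in between):

```
%v1 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(
          <4 x i32>* %p, i32 4, <4 x i1> %m, <4 x i32> undef)
; ... no intervening writes to %p ...
%v2 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(
          <4 x i32>* %p, i32 4, <4 x i1> %m, <4 x i32> undef)
; %v2 can now be CSE'd to %v1
```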

Differential Revision: https://reviews.llvm.org/D87340
2020-09-21 18:47:10 -05:00
Arthur Eubanks 1747f77764 [SimplifyCFG] Override options in default constructor
SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87718
2020-09-21 16:33:01 -07:00
Krzysztof Parzyszek 2c768c7d6c [EarlyCSE] Small refactoring changes, NFC
1. Store intrinsic ID in ParseMemoryInst instead of a boolean flag
   "IsTargetMemInst". This will make it easier to add support for
   target-independent intrinsics.
2. Extract the complex multiline conditions from EarlyCSE::processNode
   into a new function "getMatchingValue".

Differential Revision: https://reviews.llvm.org/D87691
2020-09-21 16:11:06 -05:00
Sanjay Patel 7451bf0b0b [SLP] use std::distance/find to reduce code; NFC
We were already using this code pattern right after
the loop, so this makes it consistent.
2020-09-21 16:22:55 -04:00
Sanjay Patel 6bad3caeb0 [InstCombine] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 15:34:24 -04:00
Sanjay Patel be93505986 [LoopVectorize] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 15:34:24 -04:00
Arthur Eubanks f4f7df037e [DIE] Remove DeadInstEliminationPass
This pass is like DeadCodeEliminationPass, but only does one pass
through a function instead of iterating on users of eliminated
instructions.

DeadCodeEliminationPass should be used in all cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87933
2020-09-21 12:12:25 -07:00
Arthur Eubanks 746a2c3775 [ObjCARC] Initialize return value
Mistakenly removed initialization of `Changed` in https://reviews.llvm.org/D87806.
2020-09-21 11:03:44 -07:00
Sanjay Patel a44238cb44 [SLP] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 13:54:06 -04:00
Sanjay Patel 1e6b240d7d [IRBuilder][VectorCombine] make and use a convenience function for unary shuffle; NFC
This reduces code duplication for common construct.
Follow-ups can use this in SLP, LoopVectorizer, and other passes.
2020-09-21 13:47:01 -04:00
Simon Pilgrim 005f826a05 [SLP] Use for-range loops across ValueLists. NFCI.
Also rename some existing loops that used a 'j' iterator to consistently use 'V'.
2020-09-21 18:24:23 +01:00
Sanjay Patel 46075e0b78 [SLP] simplify interface for gather(); NFC
The implementation of gather() should be reduced too,
but this change by itself makes things a little clearer:
we don't try to gather to a different type or
number-of-values than whatever is passed in as the value
list itself.
2020-09-21 12:57:28 -04:00
Arthur Eubanks 024979b7b6 [ObjCARC][NewPM] Port objc-arc-contract to NPM
Similar to https://reviews.llvm.org/D86178.

This is a module pass instead of a function pass since
ARCRuntimeEntryPoints can lazily add function declarations.

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D87806
2020-09-21 09:40:14 -07:00
Simon Pilgrim 3ddecfd220 SLPVectorizer.cpp - fix include ordering. NFCI. 2020-09-21 17:17:11 +01:00
Alexey Bataev 3ff07fcd54 [SLP] Allow reordering of vectorization trees with reused instructions.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has fewer
elements than the root node). In this case we simply cannot try to reorder
the tree, and we may calculate the wrong number of nodes that require the
same reordering.
For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves
are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first
leaf, it will be shrunk to \<a, b\>. If instructions in this leaf should
be reordered, the best order will be \<1, 0\>. We need to extend this
order for the root node. For the root node this order should look like
\<3, 0, 1, 2\>. This patch allows extension of the orders of the nodes
with the reused instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D45263
2020-09-21 10:51:03 -04:00
Florian Hahn 57ae9bb932 [LSR] Preserve MSSA when using SplitCriticalEdge.
LSR claims to preserve MemorySSA, but we also have to make sure it is
preserved when splitting critical edges. This can be done by passing
MSSAU to SplitCriticalEdge.

Fixes PR47557.
2020-09-21 09:51:26 +01:00
Sanjay Patel 7903ae4720 [InstCombine] factorize left shifts of add/sub
We do similar factorization folds in SimplifyUsingDistributiveLaws,
but that drops no-wrap properties. Propagating those optimally may
help solve:
https://llvm.org/PR47430

The propagation is all-or-nothing for these patterns: when all
3 incoming ops have nsw or nuw, the 2 new ops should have the
same no-wrap property:
https://alive2.llvm.org/ce/z/Dv8wsU
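A hand-written instance of the all-or-nothing propagation (per the Alive2 link above):

```
%xs = shl nsw i8 %x, 3
%ys = shl nsw i8 %y, 3
%r  = add nsw i8 %xs, %ys
; ==> all three input ops are nsw, so both new ops can be nsw
%a  = add nsw i8 %x, %y
%r2 = shl nsw i8 %a, 3
```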

This also solves:
https://llvm.org/PR47584
2020-09-20 12:55:24 -04:00
Sanjay Patel cf75e83275 [InstCombine] replace zombie unreachable values with 'undef' before erasing
The test (currently crashing) is reduced from the example provided
in the post-commit discussion in D87149.

Differential Revision: https://reviews.llvm.org/D87965
2020-09-20 12:25:08 -04:00
Fangrui Song 871d03a675 [FunctionAttrs] Inline setDoesNotRecurse() and delete it. NFC
It always returns true, which may lead to confusion. Inline it because it is
trivial and only called twice.
2020-09-19 22:24:52 -07:00
Fangrui Song 0526713aa8 [FunctionAttrs] Remove redundant check. NFC 2020-09-19 20:46:18 -07:00
Fangrui Song 6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Nikita Popov f4e5541809 [Local] Clean up enforceKnownAlignment() (NFC)
I want to export this function, and the current API was a bit
weird: It took an additional Alignment argument that didn't really
have anything to do with what the function does. Drop it, and
perform a max at the callsite.

Also rename it to tryEnforceAlignment().
2020-09-19 22:29:40 +02:00
Florian Hahn 1d8f2e5292 [SCEVExpander] Support expanding nonintegral pointers with constant base.
Currently SCEVExpander creates an inttoptr for non-integral pointers if
the base is, for example, a null constant. This results in invalid IR.

This patch changes InsertNoopCastOfTo to emit a GEP & bitcast to convert
to a non-integral pointer. First, a GEP of i8* null is generated and the
integral value is used as index. The GEP is then bitcasted to the target
type.
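Roughly, for a non-integral address space (a hand-written sketch; address space 4 here is just an example):

```
; instead of the invalid: %p = inttoptr i64 %offset to float addrspace(4)*
%g = getelementptr i8, i8 addrspace(4)* null, i64 %offset
%p = bitcast i8 addrspace(4)* %g to float addrspace(4)*
```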

This was exposed by D71539.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87827
2020-09-19 17:19:53 +01:00
Xun Li 11453740bc [ASAN] Properly deal with musttail calls in ASAN
When address sanitizing a function, stack-unpoisoning code is inserted before each ret instruction. However, if the ret instruction is preceded by a musttail call, such a transformation breaks the musttail call contract and generates invalid IR.
This patch fixes the issue by moving the insertion point prior to the musttail call if there is one.
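A hand-written sketch of the constraint:

```
; the musttail contract: nothing may sit between the call (plus an
; optional bitcast) and the ret, so the unpoisoning code has to be
; inserted above the call rather than before the ret
;   <stack-unpoisoning instructions go here>
%r = musttail call i32 @callee(i32 %a)
ret i32 %r
```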

Differential Revision: https://reviews.llvm.org/D87777
2020-09-18 23:10:34 -07:00
Fangrui Song 76eec6c95b [SCEV] Fix an unused variable in -DLLVM_ENABLE_ASSERTIONS=off build 2020-09-18 16:19:05 -07:00
Eric Christopher ecfd8161bf Temporarily Revert "[SLP] Allow reordering of vectorization trees with reused instructions."
as it's infinite looping on occasion.

This reverts commit 455ca0ebb6.
2020-09-18 12:50:04 -07:00
Huihui Zhang 9ad6049736 [InstCombine][SVE] Skip scalable type for InstCombiner::getFlippedStrictnessPredicateAndConstant.
We cannot iterate over a scalable vector; the number of elements is unknown at compile time.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87918
2020-09-18 11:26:36 -07:00
Alexey Bataev 455ca0ebb6 [SLP] Allow reordering of vectorization trees with reused instructions.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has fewer
elements than the root node). In this case we simply cannot try to reorder
the tree, and we may calculate the wrong number of nodes that require the
same reordering.
For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves
are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first
leaf, it will be shrunk to \<a, b\>. If instructions in this leaf should
be reordered, the best order will be \<1, 0\>. We need to extend this
order for the root node. For the root node this order should look like
\<3, 0, 1, 2\>. This patch allows extension of the orders of the nodes
with the reused instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D45263
2020-09-18 09:34:59 -04:00
Florian Hahn 9d172c8e9c Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
This switches to using DSE + MemorySSA by default again, after
fixing the issues reported after the first commit.

Notable fixes fc82006331, a0017c2bc2.

This reverts commit 3a59628f3c.
2020-09-18 11:05:00 +01:00
Nikita Popov 13e19d2e7c Revert "[InstCombine] Canonicalize SPF_ABS to abs intrinsic"
This reverts commit 05d4c4ebc2.

mstorsjo reports a miscompile after this change in
https://reviews.llvm.org/D87188#2281093. Reverting until I can
investigate this.
2020-09-18 09:38:26 +02:00
Nikita Popov 05d4c4ebc2 [InstCombine] Canonicalize SPF_ABS to abs intrinsic
Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic.

To be conservative, the one-use check on the comparison is retained,
this may be relaxed if all goes well.

It's pretty likely that this will uncover places that are missing
handling for the abs() intrinsic. Please report any observed
performance regressions.

Differential Revision: https://reviews.llvm.org/D87188
2020-09-17 22:28:34 +02:00
Whitney Tsang 1cee33e9db [LoopUnrollAndJam] Allow unroll and jam loops forced by user.
Summary: Allow unroll and jam loops forced by user.
LoopUnrollAndJamPass is still disabled by default in the NPM pipeline,
and can be controlled by -enable-npm-unroll-and-jam.

Reviewed By: Meinersbur, dmgreen

Differential Revision: https://reviews.llvm.org/D87786
2020-09-17 19:40:14 +00:00
Nikita Popov 91ce8e121b [GVN] Use that assume(!X) implies X==false (PR47496)
We already use that assume(X) implies X==true, do the same for
assume(!X) implying X==false. This fixes PR47496.
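A hand-written sketch of the new case:

```
%not = xor i1 %x, true
call void @llvm.assume(i1 %not)
; below the assume, %x is known to be false
%r = select i1 %x, i32 %a, i32 %b   ; ==> %r = %b
```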
2020-09-17 21:34:44 +02:00
Sanjay Patel 48a23bccf3 [VectorCombine] limit load+insert transform to one-use
As discussed in:
https://llvm.org/PR47558
...there are several potential fixes/follow-ups visible
in the test case, but this is the quickest and safest
fix of the perf regression.
2020-09-17 14:29:15 -04:00
Sanjay Patel ddd9575d15 [VectorCombine] rearrange bailouts for load insert for efficiency; NFC 2020-09-17 13:50:37 -04:00
Xun Li 5b533d6cde [Coroutine] Fix a bug where Coroutine incorrectly spills phi and invoke defs before CoroBegin
When a spill definition is before CoroBegin, we cannot spill it to the frame immediately after the definition. We have to spill it after the frame is ready.
The current implementation handles it properly for all other kinds of instructions except for PHINode and InvokeInst, which could also be defined before CoroBegin.
This patch fixes it by moving the CoroBegin dominance check earlier, so that it covers all cases.
Added a test.

Differential Revision: https://reviews.llvm.org/D87810
2020-09-17 08:13:07 -07:00
Sanjay Patel 03783f19dc [SLP] sort candidates to increase chance of optimal compare reduction
This is one (small) part of improving PR41312:
https://llvm.org/PR41312

As shown there and in the smaller tests here, if we have some member of the
reduction values that does not match the others, we want to push it to the
end (bring the matching members forward and together).

In the regression tests, we have 5 candidates for the 4 slots of the reduction.
If the one "wrong" compare is grouped with the others, it prevents forming the
ideal v4i1 compare reduction.

Differential Revision: https://reviews.llvm.org/D87772
2020-09-17 08:49:27 -04:00
Roman Lebedev aadf55d1ce
[NFC] EliminateDuplicatePHINodes(): small-size optimization: if there are <= 32 PHI's, O(n^2) algo is faster (geomean -0.08%)
This is functionally equivalent to the old implementation.

As per https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=4739e6e4eb54d3736e6457249c0919b30f6c855a&stat=instructions
this is a clear geomean compile-time regression-free win with overall geomean of `-0.08%`

32 PHI's appears to be the sweet spot; both 16 and 64 performed worse:
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=c4efe1fbbfdf0305ac26cd19eacb0c7774cdf60e&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=e4989d1c67010d3339d1a40ff5286a31f10cfe82&stat=instructions

If we have more PHI's than that, we fall back to the original DenseSet-based implementation,
so the not-so-fast cases will still be handled.

However compile-time isn't the main motivation here.
I can name at least 3 limitations of this CSE:
1. Assumes that all PHI nodes have incoming basic blocks in the same order (can be fixed while keeping the DenseMap)
2. Does not special-handle `undef` incoming values (I don't see how we can do this with hashing)
3. Does not special-handle backedge incoming values (maybe can be fixed by hashing backedge as some magical value)
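
For context, the kind of duplicate being CSE'd (an illustrative sketch):

```
; %phi2 has the same incoming values and blocks as %phi1,
; so all its uses can be replaced with %phi1
%phi1 = phi i32 [ %a, %bb1 ], [ %b, %bb2 ]
%phi2 = phi i32 [ %a, %bb1 ], [ %b, %bb2 ]
```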

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87408
2020-09-17 11:29:03 +03:00
Michael Liao 4e4c89b22c [EarlyCSE] Simplify max/min pattern matching. NFC. 2020-09-16 18:34:46 -04:00
Nikita Popov 222bf3ffbc Reapply [InstCombine] Simplify select operand based on equality condition
Reapply after fixing SimplifyWithOpReplaced() to never return
the original value, which would lead to an infinite loop in this
transform.

-----

For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.
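
An illustrative instance (hypothetical IR, not from the patch):

```
; X == Y ? A : B, where A can be simplified using the equality
%cmp = icmp eq i32 %x, 0
%add = add i32 %x, 1
%sel = select i1 %cmp, i32 %add, i32 %b
; in the true arm %x is known to be 0, so the select becomes
%sel2 = select i1 %cmp, i32 1, i32 %b
```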

Differential Revision: https://reviews.llvm.org/D87480
2020-09-16 20:53:58 +02:00
Arthur Eubanks c27b64bbe1 [Coro][NewPM] Handle llvm.coro.prepare.retcon in NPM coro-split pass
Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D87731
2020-09-16 09:09:10 -07:00
Dangeti Tharun kumar 01e2b394ee [Partial Inliner] Compute intrinsic cost through TTI
https://bugs.llvm.org/show_bug.cgi?id=45932

The assert(OutlinedFunctionCost >= Cloner.OutlinedRegionCost && "Outlined function cost should be no less than the outlined region") in computeBBInlineCost was getting triggered.

Intrinsics like "assume" are considered regular function calls while computing costs.
This patch enables computeBBInlineCost to query TTI for the intrinsic call cost.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87132
2020-09-16 15:12:31 +01:00
Sanjay Patel 24238f09ed [SLP] fix formatting; NFC
Also move variable declarations closer to usage and add code comments.
2020-09-16 08:50:27 -04:00
Sanjay Patel 6a23668e78 [SLP] remove uses of 'auto' that obscure functionality; NFC 2020-09-16 08:26:21 -04:00
Sanjay Patel 0cee1bf5d1 [SLP] remove redundant size check; NFC
We bail out on small array size anyway.
2020-09-16 08:11:19 -04:00
Sanjay Patel bbad998bab [SLP] move loop index variable declaration to its use; NFC 2020-09-16 07:59:31 -04:00
Sanjay Patel 158989184e [SLP] change poorly named variable; NFC
'V' shadows a function argument.
2020-09-16 07:59:31 -04:00
Arthur Eubanks ba12e77ec1 [NewPM] Port strip* passes to NPM
strip-nondebug and strip-debug-declare have no existing associated tests

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87639
2020-09-15 18:25:12 -07:00
Arthur Eubanks f7aa1563eb [LowerSwitch][NewPM] Port lowerswitch to NPM
Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87726
2020-09-15 18:18:31 -07:00
Wenlei He 2c391a5a14 [LICM] Make Loop ICM profile aware again
D65060 was reverted because it introduced non-determinism by using BFI counts from already freed blocks. The parent of this revision fixes that by using a VH callback on blocks to prevent this from happening and makes sure BFI data is passed correctly in LoopStandardAnalysisResults.

This re-introduces the previous optimization of using BFI data to prevent LICM from hoisting/sinking if the instruction will end up moving to a colder block.

Internally at Facebook this change results in a ~7% win in a CPU-related metric in one of our big services, by preventing the hoisting of cold code into a hot pre-header, as the added test case demonstrates.

Testing:
ninja check

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87551
2020-09-15 17:21:58 -07:00
Wenlei He 2ea4c2c598 [BFI] Make BFI information available through loop passes inside LoopStandardAnalysisResults
~~D65060 uncovered that trying to use BFI in loop passes can lead to non-deterministic behavior when blocks are re-used while retaining old BFI data.~~

~~To make sure BFI is preserved through loop passes a Value Handle (VH) callback is registered on blocks themselves. When a block is freed it now also wipes out the accompanying BFI entry such that stale BFI data can no longer persist resolving the determinism issue. ~~

~~An optimistic approach would be to incrementally update BFI information throughout the loop passes rather than only invalidating them on removed blocks. The issues with that are:~~
~~1. It is not clear how BFI information should be incrementally updated: If a block is duplicated does its BFI information come with? How about if it's split/modified/moved around? ~~
~~2. Assuming we can address these problems the implementation here will be a massive undertaking. ~~

~~There's a known need of BFI in LICM analysis which requires correct but not incrementally updated BFI data. A follow-up change can register BFI in all loop passes so this preserved but potentially lossy data is available to any loop pass that wants it.~~

See: D75341 for an identical implementation of preserving BFI via VH callbacks. The previous statements do still apply but this change no longer has to be in this diff because it's already upstream 😄.

This diff also moves BFI to be a part of LoopStandardAnalysisResults since the previous method using getCachedResults now (correctly!) statically asserts (D72893) that this data isn't static through the loop passes.

Testing
Ninja check

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D86156
2020-09-15 16:16:24 -07:00
Xun Li 7b4cc0961b [TSAN] Handle musttail call properly in EscapeEnumerator (and TSAN)
Call instructions with the musttail tag must be optimized as tail calls, otherwise they could lead to incorrect program behavior.
When TSAN instruments functions, it broke that contract by adding a call to the tsan exit function in between the musttail call and the return instruction, and by inserting exception handling code.
This happened through EscapeEnumerator, which adds exception handling code and returns ret instructions as the place to insert instrumentation calls.
This is especially problematic for coroutines, because coroutines rely on tail calls to do symmetric transfers properly.
To fix this, this patch moves the insertion point of instrumentation calls to just before the musttail call for ret instructions that follow musttail calls, and no longer adds exception handling code around musttail calls.
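
A sketch of the required placement (illustrative IR):

```
; nothing may sit between a musttail call and its ret, so the
; instrumentation exit call goes before the musttail call instead
call void @__tsan_func_exit()
%r = musttail call i32 @callee(i32 %x)
ret i32 %r
```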

Differential Revision: https://reviews.llvm.org/D87620
2020-09-15 15:20:05 -07:00
Huihui Zhang 3b7f5166bd [SLPVectorizer][SVE] Skip scalable-vector instructions before vectorizeSimpleInstructions.
For scalable types, the aggregated size is unknown at compile time.
Skip instructions with scalable types to ensure the list of instructions
for vectorizeSimpleInstructions does not contain any scalable-vector instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87550
2020-09-15 13:10:15 -07:00
Matt Arsenault 7d6ca2ec57 InferAddressSpaces: Fix assert with unreachable code
Invalid IR in unreachable code is technically valid IR. In this case,
the address space of the value was never inferred, and we tried to
rewrite it with an invalid address space value which would assert.
2020-09-15 15:48:43 -04:00
Florian Hahn 3d42d54955 [ConstraintElimination] Add constraint elimination pass.
This patch is a first draft of a new pass that adds a more flexible way
to eliminate compares based on more complex constraints collected from
dominating conditions.

In particular, it aims at simplifying conditions of the forms below
using a forward propagation approach, rather than instcombine-style
ad-hoc backwards walking of def-use chains.

    if (x < y)
      if (y < z)
        if (x < z) <- simplify

or

    if (x + 2 < y)
        if (x + 1 < y) <- simplify assuming no wraps

The general approach is to collect conditions and blocks, sort them by
dominance and then iterate over the sorted list. Each condition is turned
into a linear inequality and added to a system containing the linear
inequalities that hold on entry to the block. For each block, we check each
compare against the system and see if it is implied by the constraints
in the system.

We also keep a stack of processed conditions and remove conditions from
the stack and the constraint system once they go out-of-scope (= do not
dominate the current block any longer).
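
As an illustrative IR sketch of the first form above (all names hypothetical):

```
define i1 @f(i32 %x, i32 %y, i32 %z) {
entry:
  %c1 = icmp ult i32 %x, %y
  br i1 %c1, label %bb1, label %exit
bb1:
  %c2 = icmp ult i32 %y, %z
  br i1 %c2, label %bb2, label %exit
bb2:
  ; x < y and y < z hold here, so x < z is implied and folds to true
  %c3 = icmp ult i32 %x, %z
  ret i1 %c3
exit:
  ret i1 false
}
```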

Currently there are still at least the following areas for improvement:

* Currently large unsigned constants cannot be added to the system
  (coefficients must be represented as integers)
* The way constraints are managed currently is not very optimized.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D84547
2020-09-15 19:31:11 +01:00
Florian Hahn 3a59628f3c Revert "[DSE] Switch to MemorySSA-backed DSE by default."
This reverts commit fb109c42d9.

Temporarily revert due to a mis-compile pointed out at D87163.
2020-09-15 18:07:56 +01:00
Fangrui Song 4452cc4086 [VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan
Similar to the tsan suppression in
`Utils/VNCoercion.cpp:getLoadLoadClobberFullWidthSize` (rL175034; load widening used by GVN),
the D81766 optimization should be suppressed under tsan due to potential
spurious data race reports:

  struct A {
    int i;
    const short s; // the load cannot be vectorized because
    int modify;    // it overlaps with bytes being concurrently modified
    long pad1, pad2;
  };
  // __tsan_read16 does not know that some bytes are undef and accessing is safe

Similarly, under asan, users can mark memory regions with
`__asan_poison_memory_region`. A widened load can lead to a spurious
use-after-poison error. hwasan/memtag should be similarly suppressed.

`mustSuppressSpeculation` suppresses asan/hwasan/tsan but not memtag, so
we need to exclude memtag in `vectorizeLoadInsert`.

Note, memtag suppression can be relaxed if the load is aligned to
its granule (usually 16), but that is out of scope of this patch.

Reviewed By: spatel, vitalybuka

Differential Revision: https://reviews.llvm.org/D87538
2020-09-15 09:47:21 -07:00
Simon Pilgrim 2b42d53e5e SLPVectorizer.h - remove unnecessary AliasAnalysis.h include. NFCI.
Forward declare AAResults instead of the (old) AliasAnalysis type.

Remove includes from SLPVectorizer.cpp that are already included in SLPVectorizer.h.
2020-09-15 16:24:05 +01:00
Simon Pilgrim 65c6ae3b6a [Utils] isLegalToPromote - Fix missing null check before writing to FailureReason.
The FailureReason input parameter may be null; we check this in all other cases in the method, but this one was missed somehow.

Fixes clang-tidy warning.
2020-09-15 14:49:04 +01:00
Sanjay Patel aa57c1c967 [InstCombine] fix bug in pow expansion
There at least one other bug related to pow -> sqrt transforms:
http://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html
...but we probably can't solve that without fixing this first.
2020-09-15 09:29:48 -04:00
Simon Pilgrim 796c805269 ProvenanceAnalysis.h - remove unnecessary AliasAnalysis.h include. NFCI.
Forward declare AAResults instead of the (old) AliasAnalysis type.
2020-09-15 13:34:35 +01:00
Bjorn Pettersson aa8be5aeea [Scalarizer] Avoid changing name of non-instructions
The "takeName" logic in ScalarizerVisitor::gather did not consider
that the value vector could refer to non-instructions, such as
global variables. This patch makes sure that we avoid changing the
name of a value if it isn't an instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D87685
2020-09-15 14:15:50 +02:00
Benjamin Kramer b768546fe0 Revert "[InstCombine] Simplify select operand based on equality condition"
This reverts commit cfff88c03c. Sends
instcombine into an infinite loop.

```
define i1 @foo(i32 %arg, i32 %arg1) {
bb:
  %tmp = udiv i32 %arg, %arg1
  %tmp2 = mul nsw i32 %tmp, %arg1
  %tmp3 = icmp eq i32 %tmp2, %arg
  %tmp4 = select i1 %tmp3, i32 %tmp, i32 undef
  %tmp5 = icmp sgt i32 %tmp4, 255
  ret i1 %tmp5
}
```
2020-09-15 12:22:47 +02:00
Simon Pilgrim 5f13d6c1ee [Transforms][Coroutines] Add missing header path to CMakeLists.txt
Helps Visual Studio check include dependencies.
2020-09-15 10:37:25 +01:00
David Sherwood 69cccb3189 [SVE] Fix isLoadInvariantInLoop for scalable vectors
I've amended the isLoadInvariantInLoop function to bail out for
scalable vectors for now since the invariant.start intrinsic is only
ever generated by the clang frontend for thread locals or struct
and class constructors, neither of which supports sizeless types.
In addition, the intrinsic itself does not currently support the
concept of a scaled size, which makes it impossible to compare
the sizes of different scalable objects, e.g. <vscale x 32 x i8>
and <vscale x 16 x i8>.

Added new tests here:

  Transforms/LICM/AArch64/sve-load-hoist.ll
  Transforms/LICM/hoisting.ll

Differential Revision: https://reviews.llvm.org/D87227
2020-09-15 08:30:19 +01:00
Arthur Eubanks 10b12d4035 Reland [docs][NewPM] Add docs for writing NPM passes
So as not to conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.

Much of the doc structure was taken from WritingAnLLVMPass.rst.

This adds a HelloWorld pass which simply prints out each function name.

More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.

Relanded with missing "Support" dependency in LLVMBuild.txt.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86979
2020-09-14 16:06:19 -07:00
Arthur Eubanks 39ec36415d Revert "[docs][NewPM] Add docs for writing NPM passes"
This reverts commit c2590de30d.

Breaks shared libs build
2020-09-14 15:55:17 -07:00
Arthur Eubanks f3d8344854 [PruneEH][NFC] Use CallGraphUpdater in PruneEH
In preparation for porting the pass to NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87632
2020-09-14 14:43:19 -07:00
Arthur Eubanks c2590de30d [docs][NewPM] Add docs for writing NPM passes
So as not to conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.

Much of the doc structure was taken from WritingAnLLVMPass.rst.

This adds a HelloWorld pass which simply prints out each function name.

More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86979
2020-09-14 13:26:03 -07:00
Teresa Johnson 226d80ebe2 [MemProf] Rename HeapProfiler to MemProfiler for consistency
This is consistent with the clang option added in
7ed8124d46, and the comments on the
runtime patch in D87120.

Differential Revision: https://reviews.llvm.org/D87622
2020-09-14 13:14:57 -07:00
Nikita Popov cfff88c03c [InstCombine] Simplify select operand based on equality condition
For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.

Differential Revision: https://reviews.llvm.org/D87480
2020-09-14 20:07:06 +02:00
Simon Pilgrim 4ff4708d39 collectBitParts - use const references. NFCI.
Fixes clang-tidy warnings first noticed on D87452.
2020-09-14 18:23:00 +01:00
Florian Hahn f715d81c9d [DSE] Only eliminate candidates that always store the same loc.
AliasAnalysis/MemoryLocation does not account for loops. Two
MemoryLocations can be must-overwrite, even if the first one writes
multiple locations in a loop.
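
For example (illustrative IR), a store like the one below writes a different
location on every iteration, so a single later store must not be treated as
overwriting it:

```
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %p = getelementptr inbounds i32, i32* %a, i64 %i
  store i32 0, i32* %p         ; writes %a[%i], varies per iteration
  %i.next = add nuw i64 %i, 1
  %ec = icmp eq i64 %i.next, %n
  br i1 %ec, label %exit, label %loop
```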

This patch prevents removing such stores, by only considering candidates
that are known to be loop invariant, or executed in the same BB.

Currently the invariant check is quite conservative and only considers
Alloca and Alloca-like instructions and arguments as invariant base pointers.
It also considers GEPs with all constant indices and invariant bases as
invariant.

This can be improved in the future, but the current implementation has
only minor impact on the total number of stores eliminated (25903 vs
26047 for the baseline). There are some 2-10% swings for some individual
benchmarks. In roughly half of the cases, the number of stores removed
increases actually, because we skip candidates that are unlikely to be
valid candidates early.
2020-09-14 12:06:58 +01:00
David Sherwood 816663adb5 [SVE] In LoopIdiomRecognize::isLegalStore bail out for scalable vectors
The function LoopIdiomRecognize::isLegalStore looks for stores in loops
that could be transformed into memset or memcpy. However, the algorithm
currently requires that we know how big the store is at runtime, i.e.
that the store size will not overflow an unsigned integer. For scalable
vectors we cannot guarantee this so I have changed the code to bail out
for now. In addition, even if we add a way to query the maximum value of
vscale in future we will still need to update the algorithm to cope with
non-constant strides. The additional cost associated with calculating
the memset and memcpy arguments will need to be taken into account as
well.

This patch also fixes up an implicit TypeSize -> uint64_t cast,
thereby removing a warning. I've added tests here showing a fixed
width vector loop being transformed into memcpy, and a scalable
vector loop remaining unchanged:

  Transforms/LoopIdiom/memcpy-vectors.ll

Differential Revision: https://reviews.llvm.org/D87439
2020-09-14 11:28:31 +01:00
David Stenberg bfcb824ba5 [JumpThreading] Fix an incorrect Modified status
This fixes PR47297.

When ProcessBlock() was able to constant fold the terminator's
condition, but not do any more transformations, the function would
return false, which would lead to the JumpThreading pass returning an
incorrect modified status. This patch makes so that ProcessBlock()
returns true in such cases. This will trigger an unnecessary invocation
of ProcessBlock() in such cases, but this should be rare to occur.

This was caught using the check introduced by D80916.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87392
2020-09-14 10:36:13 +02:00
Jay Foad 9a4476072e [UnifyLoopExits] Fix non-deterministic iteration order
This was causing random minor codegen differences in shaders compiled
with the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D87548
2020-09-14 09:09:58 +01:00
David Blaikie 6e06f1cd08 GCOVProfiling: Avoid use-after-move
Turns out this was a use-after-move of a function_ref, which is trivially
copyable and movable, so the move did nothing and the use after move was
safe.

But since this function_ref is being copied into a std::function, change
the function_ref to be a std::function to avoid extra layers of type
erasure indirection. Then it is a real use after move; fix that
by referring to the moved-to member variable rather than the moved-from
parameter.
2020-09-13 12:54:36 -07:00
Fangrui Song 5f4e9bf641 [gcov] Fix memory leak due to BranchProbabilityInfoWrapperPass
This is weird.
2020-09-13 00:44:32 -07:00
Fangrui Song 63182c2ac0 [gcov] Add spanning tree optimization
gcov is an "Edge Profiling with Edge Counters" application according to
Optimally Profiling and Tracing Programs (1994).

The minimum number of counters necessary is |E|-(|V|-1). The unmeasured edges
form a spanning tree. Both GCC --coverage and clang -fprofile-generate leverage
this optimization. This patch implements the optimization for clang --coverage.
The produced .gcda files are much smaller now.
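
As a worked example of that formula: a simple diamond CFG (entry branching
to a then and an else block, both joining at exit) has |V| = 4 and |E| = 4,
so only 4 - (4 - 1) = 1 edge needs a counter; the counts on the other three
edges, which form the spanning tree, are recovered from flow conservation.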
2020-09-13 00:07:31 -07:00
Fangrui Song f086e85eea [gcov] Assign names to some types and loaded values used in @__llvm_internal*
This makes the generated IR much more readable.
2020-09-12 22:42:37 -07:00
Fangrui Song d6fadc49e3 [gcov] Process .gcda immediately after the accompanying .gcno instead of doing all .gcda after all .gcno
i.e. change the workflow from

* .gcno for function A
* .gcno for function B
* .gcno for function C
* .gcda for function A
* .gcda for function B
* .gcda for function C

to

* .gcno for function A
* .gcda for function A
* .gcno for function B
* .gcda for function B
* .gcno for function C
* .gcda for function C

Currently there is duplicate logic in .gcno & .gcda processing: how functions
are filtered, which edges are instrumented, etc. This refactor enables simplification.

Since we always process .gcno, in -fprofile-arcs -fno-test-coverage mode,
__llvm_internal_gcov_emit_function_args.0 will have non-zero checksums.
2020-09-12 13:53:03 -07:00
Fangrui Song 7d3825ed95 Revert "[gcov] emitProfileArcs: iterate over GCOVFunction's instead of Function's to avoid duplicated filtering"
This reverts commit 412c9c0bf2.
2020-09-12 12:34:43 -07:00
Fangrui Song 412c9c0bf2 [gcov] emitProfileArcs: iterate over GCOVFunction's instead of Function's to avoid duplicated filtering 2020-09-12 12:21:32 -07:00
Fangrui Song c55c14837e [gcov] Clean up by getting llvm.dbg.cu earlier 2020-09-12 12:21:32 -07:00
Florian Hahn e082dee2b5 [DSE] Bail out on MemoryPhis when deleting stores at end of function.
When deleting stores at the end of a function, we have to do PHI
translation, otherwise we might miss reads in different iterations of a
loop. See multiblock-loop-carried-dependence.ll for details.

This fixes a mis-compile and surprisingly also increases the number of
eliminated stores from 26047 to 26572 for MultiSource/SPEC2000/SPEC2006
on X86 with -O3 -flto. This is most likely because we save budget by not
exploring through MemoryPhis, which are less likely to result in valid
candidates for elimination.

The issue was reported post-commit for fb109c42d9.
2020-09-12 19:05:59 +01:00
David Green 74760bb00f [LV][ARM] Add preferInloopReduction target hook.
This allows the backend to tell the vectorizer to produce inloop
reductions through a TTI hook.

For the moment on ARM under MVE this means allowing integer add
reductions of the correct size. In the future this can include integer
min/max too, under -Os.

Differential Revision: https://reviews.llvm.org/D75512
2020-09-12 17:47:04 +01:00
Tyker 78de7297ab Reland [AssumeBundles] Use operand bundles to encode alignment assumptions
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
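
As a sketch (illustrative IR; the pointer name and alignment are made up),
the simplification looks roughly like this:

```
; before: the alignment assumption spelled out as pointer arithmetic
%ptrint = ptrtoint i8* %p to i64
%maskedptr = and i64 %ptrint, 31
%maskcond = icmp eq i64 %maskedptr, 0
call void @llvm.assume(i1 %maskcond)

; after: a single "align" operand bundle on the assume
call void @llvm.assume(i1 true) [ "align"(i8* %p, i64 32) ]
```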
2020-09-12 15:36:06 +02:00
Nikita Popov 36e2e2e12e [InstCombine] Fix incorrect SimplifyWithOpReplaced transform (PR47322)
This is a followup to D86834, which partially fixed this issue in
InstSimplify. However, InstCombine repeats the same transform while
dropping poison flags -- which does not cover cases where poison is
introduced in some other way.

The fix here is a bit more comprehensive, because things are quite
entangled, and it's hard to only partially address it without
regressing optimization. There are really two changes here:

 * Export the SimplifyWithOpReplaced API from InstSimplify, with an
   added AllowRefinement flag. For replacements inside the TrueVal
   we don't actually care whether refinement occurs or not, the
   replacement is always legal. This part of the transform is now
   done in InstSimplify only. (It should be noted that the current
   AllowRefinement check is not sufficient -- that's an issue we
   need to address separately.)
 * Change the InstCombine fold to work by temporarily dropping
   poison generating flags, running the fold and then restoring the
   flags if it didn't work out. This will ensure that the InstCombine
   fold is correct as long as the InstSimplify fold is correct.

Differential Revision: https://reviews.llvm.org/D87445
2020-09-12 14:45:06 +02:00
Sanjay Patel 40f12ef621 [SLP] further limit bailout for load combine candidate (PR47450)
The test example based on PR47450 shows that we can
match non-byte-sized shifts, but those won't ever be
bswap opportunities. This isn't a full fix (we'd still
match if the shifts were by 8-bits for example), but
this should be enough until there's evidence that we
need to do more (this is a borderline case for
vectorization in the first place).
2020-09-11 11:56:11 -04:00
Krzysztof Parzyszek f92908cc74 [DSE] Make sure that DSE+MSSA can handle masked stores
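
An illustrative case (hypothetical IR) DSE+MSSA can now reason about: the
first masked store below is dead because the second overwrites the same
location with the same mask.

```
call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %v0, <4 x i32>* %p, i32 4, <4 x i1> %m)
call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %v1, <4 x i32>* %p, i32 4, <4 x i1> %m)
```
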
Differential Revision: https://reviews.llvm.org/D87414
2020-09-11 10:00:21 -05:00
Sanjay Patel 6aa3fc4a5b Revert "[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)"
This reverts commit 324a53205a.

On closer examination of at least one of the test diffs,
this does not appear to be correct in all cases. Even the
existing 'nsw' creation may be wrong based on this example:
https://alive2.llvm.org/ce/z/uL4Hw9
https://alive2.llvm.org/ce/z/fJMKQS
2020-09-11 10:54:48 -04:00
Sanjay Patel 324a53205a [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)
There's no signed wrap if both geps have 'inbounds':
https://alive2.llvm.org/ce/z/nZkQTg
https://alive2.llvm.org/ce/z/7qFauh
2020-09-11 10:39:09 -04:00
Simon Pilgrim 48b510c4bc [NFC] Fix compiler warnings due to integer comparison of different signedness
Fix by directly using INT_MAX and INT32_MAX.

Patch by: @nullptr.cpp (Yang Fan)

Differential Revision: https://reviews.llvm.org/D87347
2020-09-11 15:32:03 +01:00
David Sherwood 1e1770a07e [SVE][CodeGen] Fix InlineFunction for scalable vectors
When inlining functions containing allocas of scalable vectors we
cannot specify the size in the lifetime markers, since we don't
know this at compile time.

Added new test here:

  test/Transforms/Inline/AArch64/sve-alloca-merge.ll

Differential Revision: https://reviews.llvm.org/D87139
2020-09-11 08:34:51 +01:00
Michael Liao f787fe15d8 [EarlyCSE] Remove unnecessary operand swap.
- As min/max are commutative operators, there is no need to swap
  operands; doing so breaks the convention used when calculating the hash value.
2020-09-11 02:14:04 -04:00
Michael Liao 41e68f7ee7 [EarlyCSE] Fix and recommit the revised c9826829d7
In addition to calculating the hash consistently by swapping SELECT's
operands, we also need to invert the select pattern flavor to match the
original logic.

[EarlyCSE] Equivalent SELECTs should hash equally

DenseMap<SimpleValue> assumes that, if its isEqual method returns true
for two elements, then its getHashValue method must return the same value
for them. This invariant is broken when one SELECT node is a min/max
operation, and the other can be transformed into an equivalent min/max by
inverting its predicate and swapping its operands. This patch fixes an
assertion failure that would occur intermittently while compiling the
following IR:

    define i32 @t(i32 %i) {
      %cmp = icmp sle i32 0, %i
      %twin1 = select i1 %cmp, i32 %i, i32 0
      %cmpinv = icmp sgt i32 0, %i
      %twin2 = select i1 %cmpinv,  i32 0, i32 %i
      %sink = add i32 %twin1, %twin2
      ret i32 %sink
    }

Differential Revision: https://reviews.llvm.org/D86843
2020-09-10 23:30:56 -04:00
Michael Liao 39dc75f66c Revert "[EarlyCSE] Equivalent SELECTs should hash equally"
This reverts commit c9826829d7 as it
breaks regression tests.
2020-09-10 22:37:35 -04:00
Florian Hahn fb109c42d9 [DSE] Switch to MemorySSA-backed DSE by default.
The tests have been updated and I plan to move them from the MSSA
directory up.

Some end-to-end tests needed small adjustments. One difference from the
legacy DSE is that it also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.

One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not correctly handling
instructions that may throw in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat its return value as
belonging to the same underlying object as the passed pointer.

There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.

This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores
goes from ~17500 (legacy DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.

Impact on CTMark:
```
                                     Legacy Pass Manager
                        exec instrs    size-text
O3                       + 0.60%        - 0.27%
ReleaseThinLTO           + 1.00%        - 0.42%
ReleaseLTO-g             + 0.77%        - 0.33%
RelThinLTO (link only)   + 0.87%        - 0.42%
RelLTO-g (link only)     + 0.78%        - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
                                     New Pass Manager
                       exec instrs    size-text
O3                       + 0.95%       - 0.25%
ReleaseThinLTO           + 1.34%       - 0.41%
ReleaseLTO-g             + 1.71%       - 0.35%
RelThinLTO (link only)   + 0.96%       - 0.41%
RelLTO-g (link only)     + 2.21%       - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions

Reviewed By: asbirlea, xbolva00, nikic

Differential Revision: https://reviews.llvm.org/D87163
2020-09-10 22:24:32 +01:00
Bryan Chan c9826829d7 [EarlyCSE] Equivalent SELECTs should hash equally
DenseMap<SimpleValue> assumes that, if its isEqual method returns true
for two elements, then its getHashValue method must return the same value
for them. This invariant is broken when one SELECT node is a min/max
operation, and the other can be transformed into an equivalent min/max by
inverting its predicate and swapping its operands. This patch fixes an
assertion failure that would occur intermittently while compiling the
following IR:

    define i32 @t(i32 %i) {
      %cmp = icmp sle i32 0, %i
      %twin1 = select i1 %cmp, i32 %i, i32 0
      %cmpinv = icmp sgt i32 0, %i
      %twin2 = select i1 %cmpinv,  i32 0, i32 %i
      %sink = add i32 %twin1, %twin2
      ret i32 %sink
    }

Differential Revision: https://reviews.llvm.org/D86843
2020-09-10 16:59:24 -04:00
Christopher Tetreault 7ddfd9b3eb [SVE] Bail from VectorUtils heuristics for scalable vectors
Bail from maskIsAllZeroOrUndef and maskIsAllOneOrUndef prior to iterating over the number of
elements for scalable vectors.

Assert that the mask type is not scalable in possiblyDemandedEltsInMask.

Assert that the types are correct in all three functions.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87424
2020-09-10 12:29:37 -07:00